Power ISA™ Version 2.06
January 30, 2009

Softcopy Distribution: http://www.power.org/resources/reading/

© Copyright International Business Machines Corporation, 1994, 2009. All rights reserved.

The following paragraph does not apply to the United Kingdom or any country or state where such provisions are inconsistent with local law.

The specifications in this manual are subject to change without notice. This manual is provided "AS IS". International Business Machines Corp. makes no warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose.

International Business Machines Corp. does not warrant that the contents of this publication or the accompanying source code examples, whether individually or as one or more groups, will meet your requirements or that the publication or the accompanying source code examples are error-free.

This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication.

Address comments to IBM Corporation, 11400 Burnett Road, Austin, Texas 78758-3493. IBM may use or distribute whatever information you supply in any way it believes appropriate without incurring any obligation to you.

The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:

IBM®
Power ISA
PowerPC®
Power Architecture
PowerPC Architecture
Power Family
RISC/System 6000®
POWER
POWER2
POWER4
POWER4+
POWER5
POWER5+
POWER6
System/370
System z

The POWER ARCHITECTURE and POWER.ORG word marks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.

AltiVec is a trademark of Freescale Semiconductor, Inc. used under license.

Notice to U.S.
Government Users--Documentation Related to Restricted Rights--Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corporation.

Preface

The roots of the Power ISA (Instruction Set Architecture) extend back over a quarter of a century, to IBM Research. The POWER (Performance Optimization With Enhanced RISC) Architecture was introduced with the RISC System/6000 product family in early 1990. In 1991, Apple, IBM, and Motorola began the collaboration to evolve to the PowerPC Architecture, expanding the architecture's applicability. In 1997, Motorola and IBM began another collaboration, focused on optimizing PowerPC for embedded systems, which produced Book E.

In 2006, Freescale and IBM collaborated on the creation of the Power ISA Version 2.03, which represented the reunification of the architecture by combining Book E content with the more general purpose PowerPC Version 2.02. A significant benefit of the reunification is the establishment of a single, compatible, 64-bit programming model. The combining also extends explicit architectural endorsement and control to Auxiliary Processing Units (APUs), units of function that were originally developed as implementation- or product family-specific extensions in the context of the Book E allocated opcode space. With the resulting architectural superset comes a framework that clearly establishes requirements and identifies options.

To a very large extent, application program compatibility has been maintained throughout the history of the architecture, with the main exception being application exploitation of APUs. The framework identifies the base, pervasive, part of the architecture, and differentiates it from "categories" of optional function (see Section 1.3.5 of Book I). Because of the substantial differences in the supervisor (privileged) architecture that developed as Book E was optimized for embedded systems, the supervisor architectures for embedded and general purpose implementations are represented as mutually exclusive categories. Future versions of the architecture will seek to converge on a common solution where possible.

This document defines the Power ISA Version 2.06. It comprises five books and a set of appendices.

Book I, Power ISA User Instruction Set Architecture, covers the base instruction set and related facilities available to the application programmer. It includes five chapters derived from APU function, including the vector extension also known as AltiVec.

Book II, Power ISA Virtual Environment Architecture, defines the storage model and related instructions and facilities available to the application programmer.

Book III-S, Power ISA Operating Environment Architecture, defines the supervisor instructions and related facilities used for general purpose implementations.

Book III-E, Power ISA Operating Environment Architecture, defines the supervisor instructions and related facilities used for embedded implementations. It was derived from Book E and extended to include APU function.

Book VLE, Power ISA Variable Length Encoded Instructions Architecture, defines alternative instruction encodings and definitions intended to increase instruction density for very low end implementations. It was derived from an APU description developed by Freescale Semiconductor.

As used in this document, the term "Power ISA" refers to the instructions and facilities described in Books I, II, III-S, III-E, and VLE.

Usage of the phrase "Book III" refers to both Book III-S and Book III-E. An exception to this rule is when, at the beginning of a Section or Book, it is specified that usage of the phrase "Book III" implies only either "Book III-S" or "Book III-E".

Change bars have been included to indicate changes from the Power ISA Version 2.05.
Summary of Changes in Power ISA Version 2.06

This version of the Power ISA was created by applying the following requests for change (RFCs) to Power ISA Version 2.05.

Fixed-Point Extended Precision Instructions: Four new divide instructions are added. The divide instructions assist in dividing 64-bit and 128-bit dividends by 32-bit and 64-bit divisors, respectively. Among other uses, these instructions can be used to provide better performance for extended-precision fixed-point divide operations. See Sections 3.3.8 and 3.3.8.1 of Book I.

Bit Permute Instruction: A new instruction is added to provide high-speed bit permutations. See Section 3.3.13.1 of Book I.

Additional Population Count Instructions: 32-bit and 64-bit versions of the Population Count instructions are added. See Sections 3.3.12 and 3.3.13.1 of Book I.

Software Initiated Stride-N Prefetching: A new stream variant of dcbt and dcbtst enables the specification of stride and first element offset. See Section 4.3.2 of Book II.

tlbiel with Invalidation Selector (IS): An IS field is added to the GPR operand of tlbiel to specify whether the invalidation is for a specific virtual address, for all entries in a specific congruence class that have matching LPID, or for all entries in a specific congruence class regardless of LPID. See Section 5.9.3.3 of Book III-S.

tlbie[l] instructions with LPID operand: The tlbie[l] instructions are enhanced to allow a GPR to specify an LPID value. The L field in the tlbie[l] instructions is moved to GPR RB. See Section 5.9.3.3 of Book III-S.

Privileged, nonhypervisor tlbiel: The tlbiel instruction is enhanced to allow an OS to execute it. See Section 5.9.3.3 of Book III-S.

Edge-triggered Hypervisor Decrementer Exception: A Hypervisor Decrementer interrupt occurs when HDEC_32 changes from 0 to 1, except when a processor is in a power-saving mode. See Section 7.4 of Book III-S.
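The edge-triggered rule for the Hypervisor Decrementer can be illustrated with a small model. The sketch below is not taken from the architecture: it assumes that HDEC is a 32-bit register whose most significant bit is the one numbered 32, and all function names are hypothetical. It models only the case in which the processor is not in a power-saving mode, because the summary does not say what happens in that mode. Section 7.4 of Book III-S remains the authoritative definition.

```python
# Illustrative model only; names are hypothetical, not architected.
HDEC_WIDTH = 32  # assumption: HDEC is 32 bits wide, so its top bit is "bit 32"


def hdec_bit32(hdec):
    """Return the most significant bit of a 32-bit HDEC value."""
    return (hdec >> (HDEC_WIDTH - 1)) & 1


def decrement(hdec):
    """Decrement a 32-bit value, wrapping from 0 to 0xFFFFFFFF."""
    return (hdec - 1) & 0xFFFFFFFF


def hdec_exception_occurs(old_hdec, new_hdec):
    """Edge-triggered rule: only a 0 -> 1 change of bit 32 counts.

    A value that already has bit 32 set and keeps it set does not
    produce a new exception in this model.
    """
    return hdec_bit32(old_hdec) == 0 and hdec_bit32(new_hdec) == 1
```

In this model, stepping from 0 to `decrement(0)` reports an exception, while a further step from 0xFFFFFFFF to 0xFFFFFFFE does not, because bit 32 was already 1 before the step.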
Load/Store Doubleword Byte-Reverse Instructions: Doubleword versions of the byte-reversed Load and Store instructions are added. See Section 3.3.4.1 of Book I.

Data Cache Block Touch - Transient: A new variant of dcbt is added that hints that usage of a soon-to-be-referenced datum is transient, so that the processor can avoid displacing non-transient data from the cache. See Section 4.3.2 of Book II.

AMR-Related Architecture Changes (AMOR, UAMOR, etc.): Two new SPRs, called Authority Mask Override Register (AMOR) and User Authority Mask Override Register (UAMOR), are added to the architecture to restrict updates to the AMR by operating systems and application programs respectively. A new, non-privileged, SPR number for the AMR is added to permit application programs to access the AMR. See Section 5.7.9.1 of Book III-S.

Real Mode Storage Control Extension: A more flexible history-based approach to RMSC is added that can cover all of well-behaved data storage. If the initial data access to any portion of storage is not Caching Inhibited, it is performed as Guarded, but subsequent accesses will be performed as non-Guarded. If the initial access is Caching Inhibited, all accesses will be performed as Guarded. See Section 5.7.3.3.1 of Book III-S.

MPSS Extension: More combinations of page sizes are allowed in a segment. See Section 5.7.7.1 of Book III-S.

Strong Access Ordering: An assist for X86 and Sparc application emulation and porting provides TSO (Total Store Order, reference Sparc architecture) for accesses to designated pages in memory. See Section 1.6.7 of Book II.

Extensions to BFP Instruction Set to Support Software Divide and Square Root: Two new instructions are added to enable effective, IEEE-compliant software divide and square root, which can more effectively utilize hardware pipeline latencies. See Section 4.6.6.1 of Book I.

Remove Hypervisor Data and Instruction Segment Interrupts and Reuse Interrupt Vectors: The Hypervisor Data and Instruction Segment interrupts that were added for Virtualized Partition Memory in V. 2.04 are removed.

Change Category of VRSAVE Register to Base: The VRSAVE register, which had been part of the Vector and Embedded categories, is made Category: Base. See Section 3.2.3 of Book I.

Vector-Scalar Floating-Point Operations: The FPRs are doubled in width and the VRs appended to them to form a 64-entry by 128-bit register file. Existing BFP, DFP, and Vector instructions continue to operate on their respective parts of the expanded register file. Double-precision vector and scalar instructions and single-precision vector instructions are added which operate on the entire register file. See Chapter 7 of Book I.

Extensions to BFP Integer Conversion Instruction Set: New double-precision floating-point to unsigned integer word/doubleword conversion instructions and new signed and unsigned integer word/doubleword to double-precision/single-precision conversion instructions are added. See Sections 4.6.2 and 4.6.7.2 of Book I.

Processor Compatibility Register: The PCR is updated to control newly added facilities in Version 2.06. See Section 2.6 of Book III-S.

Remove the L Field from fre and frsqrte: The L field that was added to fre and frsqrte in V. 2.05 is removed.

Mismatched Store Conditional: A new Store Conditional Page Mobility (SCPM) category is defined to specify that if a reservation exists, then a Store Conditional instruction's store is not performed if the storage operand is in a different real memory block than the reservation, where the size of the block is the size of the smallest real page supported by the implementation. In this case CR0_EQ is set to indicate that the store is not performed. Also, if a Store Conditional instruction has a different storage operand length than the previous Load and Reserve instruction and the reservation exists, both the setting of CR0_EQ and whether the store is performed are undefined. See Sections 1.7.3 and 1.7.3.1 of Book II.

Miscellaneous Changes: Various minor editorial corrections are made.

Embedded Hypervisor: Logical partitioning / hypervisor capabilities are added to the Embedded environment. See Chapter 2 of Book III-E.

Byte and Halfword Reservations: New byte and halfword Load and Reserve and Store Conditional instructions are added to accelerate byte and halfword atomic update processing. Also, in Book III-S Appendix D, the recommended treatment of DSISR values that correspond to multiple instructions is changed, to state that in these cases the interrupt handler should load the instruction from storage and, if the instruction is Load and Reserve, should treat the case as a programming error. See Section 4.4.2 of Book II.

Decorated Storage: New fixed-point Load and Store instructions are added that operate on "decorated storage" and allow the programmer to supply meta-data to accompany the storage operation. See Chapter 6 of Book II.

Embedded Floating Point efscfd Change: When an efscfd instruction operates on a double-precision value that is smaller than the smallest normalized single-precision value, the result of the conversion is the appropriate signed zero. See Section 9.3.4 of Book I.

Delete mfapidi instruction: The mfapidi instruction is removed from the architecture.
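The block comparison in the Mismatched Store Conditional item above can be sketched as follows. This is a minimal illustration rather than the architected definition: the function names are hypothetical, the 4 KB default for the smallest real page is an assumption (the actual size is implementation-dependent), and the model covers only the SCPM rule itself, not the other reasons a Store Conditional can fail. Sections 1.7.3 and 1.7.3.1 of Book II are authoritative.

```python
# Illustrative model only; names and the 4 KB default are assumptions.
SMALLEST_REAL_PAGE = 4096  # assumed smallest real page size, in bytes


def real_block(real_addr, block_size=SMALLEST_REAL_PAGE):
    """Return the index of the real memory block that contains real_addr."""
    return real_addr // block_size


def scpm_forbids_store(reservation_addr, store_addr,
                       block_size=SMALLEST_REAL_PAGE):
    """Return True when the SCPM rule says the store must not be performed.

    reservation_addr is the real address of the reservation, or None when
    no reservation exists. The rule applies only when a reservation
    exists, so None yields False here; a real Store Conditional without a
    reservation would still fail for the ordinary reason.
    """
    if reservation_addr is None:
        return False
    return real_block(reservation_addr, block_size) != real_block(store_addr, block_size)
```

With the assumed 4 KB block, a reservation at 0x1000 and a store to 0x1FF8 fall in the same block, so the rule does not apply; a store to 0x2000 falls in the next block, so the store is not performed.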
Embedded Interrupt Fixed Offsets: Embedded IVORs are replaced with fixed offsets. An optional Machine Check Interrupt Vector Prefix Register is also added so that the Machine Check handler can be located in storage that is not Caching Inhibited. See Sections 7.2.18.4 and 7.6.2 of Book III-E.

Multi-threading Architecture: Multi-threading and associated resources are defined for the Embedded environment. The necessary definitions are also added to Book III-S, and the terminology in both Book IIIs is modified to use "thread" instead of "processor" where appropriate. See Chapter 3 of Book III-E.

FSL MMU with Embedded.Hypervisor Support & Extensions plus Embedded Page Table: The FSL Type MMU becomes the standard for the Embedded environment. Hypervisor capability is added to it, and the allowance for multiple PID registers is removed. A new MMU Architecture Version 2.0 is defined that includes an optional hardware page table, an optional conditional TLB write based on a TLB-reservation, a Logical to Real Address Translation facility to permit OS TLB and/or page table management, and additional 2^n KB page sizes. It also allows for a different TLB scheme, including a hardware entry select capability. The tlbivax TLB invalidate all function is removed from Version 2.0. MAS register pairs can be read/written by mfspr/mtspr on 64-bit implementations. See Chapter 6 of Book III-E.

Move evlddepx and evstddepx Opcodes: The opcodes for evlddepx and evstddepx are changed. See Section 5.4.3 of Book III-E.

Make mftb Phased-Out for both Embedded and Server Environments: mftb was formerly Phased-Out for the Server environment, and is now Phased-Out for both environments.

wait Instruction Changes: A new field is added to the wait instruction to add reservation loss and an implementation-specific condition as causes for resuming instruction execution. Also, wait is no longer context synchronizing. See Section 4.4.4 of Book II.

Remove the Version Number from Phased-In Category: The "(sV2.0x)" suffix is removed from the Phased-In Category.

CFAR Relaxation: B-form branches within the current cache block or to an adjacent cache block need not set the CFAR. See Section 8.1.1 of Book III-S.

Limitations on Implementation-Specific Features: An Engineering Note is added in Book III-E specifying the SPR numbers that Embedded designs should use for implementation-specific SPRs. The Engineering Note in the Preface is made consistent with the needs of Embedded designs. See Section 5.4.1 of Book III-E.

Handling of High-Order GPR Bits in 32-bit Mode of Embedded Designs: For Embedded 64-bit designs, an option is added for a 32-bit mode that is compatible with that of the Server environment. See Section 1.5.2 of Book I.

Interrupts and DABR Match for Quadword and Zero-Length Storage Operands: DABR match and [H]DAR setting by [H]DSI and Data Segment interrupts have quadword granularity for quadword storage operands. 0-length storage operands do not cause [H]DSI or Alignment interrupts, and do not cause Reference and Change bits to be set.

Adjustments for Secure Systems: The option for some instructions to cause reservation loss is removed. dcbtst, dcbtstep, and dcbtstls are changed to require write authority but prevented from setting the Change bit. The descriptions of dcbtep and dcbtstep are updated to clarify that they do not support stream prefetch. See Section 1.7.3.1 of Book II.

Restore bcl 20,31,$+4 Idiom to Programming Note: Restore the idiom used to get the next instruction address. See Section 2.4 of Book I.

Table of Contents

Preface
Summary of Changes in Power ISA Version 2.06
Table of Contents
Figures

Book I: Power ISA User Instruction Set Architecture

Chapter 1. Introduction
1.1 Overview
1.2 Instruction Mnemonics and Operands
1.3 Document Conventions
1.3.1 Definitions
1.3.2 Notation
1.3.3 Reserved Fields and Reserved Values
1.3.4 Description of Instruction Operation
1.3.5 Categories
1.3.5.1 Phased-In/Phased-Out
1.3.5.2 Corequisite Category
1.3.5.3 Category Notation
1.3.6 Environments
1.4 Processor Overview
1.5 Computation modes
1.5.1 Modes [Category: Server]
1.5.2 Modes [Category: Embedded]
1.6 Instruction formats
1.6.1 I-FORM
1.6.2 B-FORM
1.6.3 SC-FORM
1.6.4 D-FORM
1.6.5 DS-FORM
1.6.6 DQ-FORM
1.6.7 X-FORM
1.6.8 XL-FORM
1.6.9 XFX-FORM
1.6.10 XFL-FORM
1.6.11 XX1-FORM
1.6.12 XX2-FORM
1.6.13 XX3-FORM
1.6.14 XX4-FORM
1.6.15 XS-FORM
1.6.16 XO-FORM
1.6.17 A-FORM
1.6.18 M-FORM
1.6.19 MD-FORM
1.6.20 MDS-FORM
1.6.21 VA-FORM
1.6.22 VC-FORM
1.6.23 VX-FORM
1.6.24 EVX-FORM
1.6.25 EVS-FORM
1.6.26 Z22-FORM
1.6.27 Z23-FORM
1.6.28 Instruction Fields
1.7 Classes of Instructions
1.7.1 Defined Instruction Class
1.7.2 Illegal Instruction Class
1.7.3 Reserved Instruction Class
1.8 Forms of Defined Instructions
1.8.1 Preferred Instruction Forms
1.8.2 Invalid Instruction Forms
1.8.3 Reserved-no-op Instructions [Category: Phased-In]
1.9 Exceptions
1.10 Storage Addressing
1.10.1 Storage Operands
1.10.2 Instruction Fetches
1.10.3 Effective Address Calculation

Chapter 2. Branch Facility
2.1 Branch Facility Overview
2.2 Instruction Execution Order
2.3 Branch Facility Registers
2.3.1 Condition Register
2.3.2 Link Register
2.3.3 Count Register
2.4 Branch Instructions
2.5 Condition Register Instructions
2.5.1 Condition Register Logical Instructions
2.5.2 Condition Register Field Instruction
2.6 System Call Instruction

Chapter 3. Fixed-Point Facility
3.1 Fixed-Point Facility Overview
3.2 Fixed-Point Facility Registers
3.2.1 General Purpose Registers
3.2.2 Fixed-Point Exception Register
3.2.3 VR Save Register
3.2.4 Software Use SPRs [Category: Embedded]
3.2.5 Device Control Registers [Category: Embedded.Device Control]
3.3 Fixed-Point Facility Instructions
3.3.1 Fixed-Point Storage Access Instructions
3.3.1.1 Storage Access Exceptions
3.3.2 Fixed-Point Load Instructions
3.3.2.1 64-bit Fixed-Point Load Instructions [Category: 64-Bit]
3.3.3 Fixed-Point Store Instructions
3.3.3.1 64-bit Fixed-Point Store Instructions [Category: 64-Bit]
3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions
3.3.4.1 64-Bit Load and Store with Byte Reversal Instructions [Category: 64-bit]
3.3.5 Fixed-Point Load and Store Multiple Instructions
3.3.6 Fixed-Point Move Assist Instructions [Category: Move Assist]
3.3.7 Other Fixed-Point Instructions
3.3.8 Fixed-Point Arithmetic Instructions
3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit]
3.3.9 Fixed-Point Compare Instructions
3.3.10 Fixed-Point Trap Instructions
3.3.10.1 64-bit Fixed-Point Trap Instructions [Category: 64-Bit]
3.3.11 Fixed-Point Select [Category: Phased-In (sV2.06)]
3.3.12 Fixed-Point Logical Instructions
3.3.12.1 64-bit Fixed-Point Logical Instructions [Category: 64-Bit]
3.3.13 Fixed-Point Rotate and Shift Instructions
3.3.13.1 Fixed-Point Rotate Instructions
3.3.13.1.1 64-bit Fixed-Point Rotate Instructions [Category: 64-Bit]
3.3.13.2 Fixed-Point Shift Instructions
3.3.13.2.1 64-bit Fixed-Point Shift Instructions [Category: 64-Bit]
3.3.14 Binary Coded Decimal (BCD) Assist Instructions [Category: Embedded.Phased-in, Server]
3.3.15 Move To/From System Register Instructions
3.3.15.1 Move To/From One Condition Register Field Instructions
3.3.15.2 Move To/From System Registers [Category: Embedded]

Chapter 4. Floating-Point Facility [Category: Floating-Point]
4.1 Floating-Point Facility Overview
4.2 Floating-Point Facility Registers
4.2.1 Floating-Point Registers
4.2.2 Floating-Point Status and Control Register
4.3 Floating-Point Data
4.3.1 Data Format
4.3.2 Value Representation
4.3.3 Sign of Result
4.3.4 Normalization and Denormalization
4.3.5 Data Handling and Precision
4.3.5.1 Single-Precision Operands
4.3.5.2 Integer-Valued Operands
4.3.6 Rounding
4.4 Floating-Point Exceptions
4.4.1 Invalid Operation Exception
4.4.1.1 Definition
4.4.1.2 Action
4.4.2 Zero Divide Exception
4.4.2.1 Definition
4.4.2.2 Action
4.4.3 Overflow Exception
4.4.3.1 Definition
4.4.3.2 Action
4.4.4 Underflow Exception
4.4.4.1 Definition
4.4.4.2 Action
4.4.5 Inexact Exception
4.4.5.1 Definition
4.4.5.2 Action
4.5 Floating-Point Execution Models
4.5.1 Execution Model for IEEE Operations
4.5.2 Execution Model for Multiply-Add Type Instructions
4.6 Floating-Point Facility Instructions
4.6.1 Floating-Point Storage Access Instructions
4.6.1.1 Storage Access Exceptions
4.6.2 Floating-Point Load Instructions
4.6.3 Floating-Point Store Instructions
4.6.4 Floating-Point Load Store Doubleword Pair Instructions [Category: Floating-Point.Phased-Out]
4.6.5 Floating-Point Move Instructions
4.6.6 Floating-Point Arithmetic Instructions
4.6.6.1 Floating-Point Elementary Arithmetic Instructions
4.6.6.2 Floating-Point Multiply-Add Instructions
4.6.7 Floating-Point Rounding and Conversion Instructions
4.6.7.1 Floating-Point Rounding Instruction
4.6.7.2 Floating-Point Convert To/From Integer Instructions
4.6.7.3 Floating Round to Integer Instructions
4.6.8 Floating-Point Compare Instructions
4.6.9 Floating-Point Select Instruction
4.6.10 Floating-Point Status and Control Register Instructions

Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point]
5.1 Decimal Floating-Point (DFP) Facility Overview
5.2 DFP Register Handling
5.2.1 DFP Usage of Floating-Point Registers
5.3 DFP Support for Non-DFP Data Types
5.4 DFP Number Representation
5.4.1 DFP Data Format
5.4.1.1 Fields Within the Data Format
5.4.1.2 Summary of DFP Data Formats
5.4.1.3 Preferred DPD Encoding
5.4.2 Classes of DFP Data
5.5 DFP Execution Model
5.5.1 Rounding
5.5.2 Rounding Mode Specification
5.5.3 Formation of Final Result
5.5.3.1 Use of Ideal Exponent
5.5.4 Arithmetic Operations
5.5.4.1 Sign of Arithmetic Result
5.5.5 Compare Operations
5.5.6 Test Operations
5.5.7 Quantum Adjustment Operations
5.5.8 Conversion Operations
5.5.8.1 Data-Format Conversion
5.5.8.2 Data-Type Conversion
5.5.9 Format Operations
5.5.10 DFP Exceptions
5.5.10.1 Invalid Operation Exception
5.5.10.2 Zero Divide Exception
5.5.10.3 Overflow Exception
5.5.10.4 Underflow Exception
5.5.10.5 Inexact Exception
5.5.11 Summary of Normal Rounding And Range Actions
5.6 DFP Instruction Descriptions
5.6.1 DFP Arithmetic Instructions
5.6.2 DFP Compare Instructions
5.6.3 DFP Test Instructions
5.6.4 DFP Quantum Adjustment Instructions
5.6.5 DFP Conversion Instructions
5.6.5.1 DFP Data-Format Conversion Instructions
5.6.5.2 DFP Data-Type Conversion Instructions
5.6.6 DFP Format Instructions
5.6.7 DFP Instruction Summary

Chapter 6. Vector Facility [Category: Vector]
6.1 Vector Facility Overview
6.2 Chapter Conventions
6.2.1 Description of Instruction Operation
6.3 Vector Facility Registers
6.3.1 Vector Registers
6.3.2 Vector Status and Control Register
6.3.3 VR Save Register
6.4 Vector Storage Access Operations
6.4.1 Accessing Unaligned Storage Operands
6.5 Vector Integer Operations
6.5.1 Integer Saturation
6.6 Vector Floating-Point Operations
6.6.1 Floating-Point Overview
6.6.2 Floating-Point Exceptions
6.6.2.1 NaN Operand Exception
6.6.2.2 Invalid Operation Exception
6.6.2.3 Zero Divide Exception
6.6.2.4 Log of Zero Exception
6.6.2.5 Overflow Exception
6.6.2.6 Underflow Exception
6.7 Vector Storage Access Instructions
6.7.1 Storage Access Exceptions
6.7.2 Vector Load Instructions
6.7.3 Vector Store Instructions
6.7.4 Vector Alignment Support Instructions
6.8 Vector Permute and Formatting Instructions
6.8.1 Vector Pack and Unpack Instructions
6.8.2 Vector Merge Instructions
6.8.3 Vector Splat Instructions
6.8.4 Vector Permute Instruction
6.8.5 Vector Select Instruction
6.8.6 Vector Shift Instructions
6.9 Vector Integer Instructions
6.9.1 Vector Integer Arithmetic Instructions
6.9.1.1 Vector Integer Add Instructions
6.9.1.2 Vector Integer Subtract Instructions
6.9.1.3 Vector Integer Multiply Instructions
6.9.1.4 Vector Integer Multiply-Add/Sum Instructions
6.9.1.5 Vector Integer Sum-Across Instructions
6.9.1.6 Vector Integer Average Instructions
6.9.1.7 Vector Integer Maximum and Minimum Instructions
6.9.2 Vector Integer Compare Instructions
6.9.3 Vector Logical Instructions
6.9.4 Vector Integer Rotate and Shift Instructions
6.10 Vector Floating-Point Instruction Set
6.10.1 Vector Floating-Point Arithmetic Instructions
6.10.2 Vector Floating-Point Maximum and Minimum Instructions
6.10.3 Vector Floating-Point Rounding and Conversion Instructions
6.10.4 Vector Floating-Point Compare Instructions
6.10.5 Vector Floating-Point Estimate Instructions
6.11 Vector Status and Control Register Instructions

Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX]
7.1 Introduction
7.1.1 Overview of the Vector-Scalar Extension
7.1.1.1 Compatibility with Category Floating-Point and Category Decimal Floating-Point Operations
7.1.1.2 Compatibility with Category Vector Operations
7.2 VSX Registers
7.2.1 Vector-Scalar Registers
7.2.1.1 Floating-Point Registers
7.2.1.2 Vector Registers
7.2.2 Floating-Point Status and Control Register
7.3 VSX Operations
7.3.1 VSX Floating-Point Arithmetic Overview
7.3.2 VSX Floating-Point Data
7.3.2.1 Data Format
7.3.2.2 Value Representation
7.3.2.3 Sign of Result
7.3.2.4 Normalization and Denormalization
7.3.2.5 Data Handling and Precision
7.3.2.6 Rounding
7.3.3 VSX Floating-Point Execution Models
7.3.3.1 VSX Execution Model for IEEE Operations
7.3.3.2 VSX Execution Model for Multiply-Add Type Instructions
7.4 VSX Floating-Point Exceptions
7.4.1 Floating-Point Invalid Operation Exception
7.4.1.1 Definition
7.4.1.2 Action for VE=1
7.4.1.3 Action for VE=0
7.4.2 Floating-Point Zero Divide Exception
7.4.2.1 Definition
7.4.2.2 Action for ZE=1
7.4.2.3 Action for ZE=0
7.4.3 Floating-Point Overflow Exception
7.4.3.1 Definition
7.4.3.2 Action for OE=1
7.4.3.3 Action for OE=0
7.4.4 Floating-Point Underflow Exception
7.4.4.1 Definition
7.4.4.2 Action for UE=1
7.4.4.3 Action for UE=0
7.4.5 Floating-Point Inexact Exception
7.4.5.1 Definition
7.4.5.2 Action for XE=1
7.4.5.3 Action for XE=0
7.5 Storage Access Operations
7.5.1 Accessing Aligned Storage Operands
7.5.2 Accessing Unaligned Storage Operands
7.5.3 Storage Access Exceptions
7.6 VSX Instruction Set Summary
7.6.1 VSX Storage Access
7.7.1 Instruction Description Conventions
7.7.1.1 Instruction RTL Operators
7.7.1.2 Instruction RTL Function Calls

Chapter 8. Signal Processing Engine (SPE) [Category: Signal Processing Engine]
8.1 Overview
8.2 Nomenclature and Conventions
8.3 Programming Model
504 Instructions . . . . . . . . . . . . . . . . . . . . . 313 8.3.1 General Operation . . . . . . . . . . . 504 7.6.1.1 VSX Scalar Storage Access 8.3.2 GPR Registers . . . . . . . . . . . . . . 504 Instructions . . . . . . . . . . . . . . . . . . . . . 313 8.3.3 Accumulator Register . . . . . . . . . 504 7.6.1.2 VSX Vector Storage Access 8.3.4 Signal Processing Embedded Instructions . . . . . . . . . . . . . . . . . . . . . 313 Floating-Point Status and Control Register 7.6.2 VSX Move Instructions . . . . . . . 314 (SPEFSCR) . . . . . . . . . . . . . . . . . . . . . 504 7.6.2.1 VSX Scalar Move 8.3.5 Data Formats . . . . . . . . . . . . . . . 507 Instructions . . . . . . . . . . . . . . . . . . . . . 314 8.3.5.1 Integer Format. . . . . . . . . . . . . 507 7.6.2.2 VSX Vector Move 8.3.5.2 Fractional Format . . . . . . . . . . 507 Instructions . . . . . . . . . . . . . . . . . . . . . 314 8.3.6 Computational Operations . . . . . 508 7.6.3 VSX Floating-Point Arithmetic 8.3.7 SPE Instructions. . . . . . . . . . . . . 509 Instructions . . . . . . . . . . . . . . . . . . . . . 315 8.3.8 Saturation, Shift, and Bit Reverse 7.6.3.1 VSX Scalar Floating-Point Models . . . . . . . . . . . . . . . . . . . . . . . . . 509 Arithmetic Instructions. . . . . . . . . . . . . 315 8.3.8.1 Saturation . . . . . . . . . . . . . . . . 509 7.6.3.2 VSX Vector Floating-Point 8.3.8.2 Shift Left . . . . . . . . . . . . . . . . . 509 Arithmetic Instructions. . . . . . . . . . . . . 315 8.3.8.3 Bit Reverse . . . . . . . . . . . . . . . 509 7.6.4 VSX Floating-Point Compare 8.3.9 SPE Instruction Set . . . . . . . . . . 510 Instructions . . . . . . . . . . . . . . . . . . . . . 317 7.6.4.1 VSX Scalar Floating-Point Chapter 9. Embedded Floating-Point Compare Instructions . . . . . . . . . . . . . 317 [Category: SPE.Embedded Float 7.6.4.2 VSX Vector Floating-Point Scalar Double] Compare Instructions . . . . . . . . . . . . . 317 7.6.5 VSX DP-SP Conversion [Category: SPE.Embedded Float Instructions . . 
. . . . . . . . . . . . . . . . . . . 318 Scalar Single] 7.6.5.1 VSX Scalar DP-SP Conversion [Category: SPE.Embedded Float Instructions . . . . . . . . . . . . . . . . . . . . . 318 7.6.5.2 VSX Vector DP-SP Conversion Vector]. . . . . . . . . . . . . . . . . . . . . . . 557 Instructions . . . . . . . . . . . . . . . . . . . . . 318 9.1 Overview . . . . . . . . . . . . . . . . . . . . 557 7.6.6 VSX Integer Conversion 9.2 Programming Model . . . . . . . . . . . 558 Instructions . . . . . . . . . . . . . . . . . . . . . 318 9.2.1 Signal Processing Embedded 7.6.6.1 VSX Scalar Integer Conversion Floating-Point Status and Control Register Instructions . . . . . . . . . . . . . . . . . . . . . 318 (SPEFSCR) . . . . . . . . . . . . . . . . . . . . . 558 7.6.6.2 VSX Vector Integer Conversion 9.2.2 Floating-Point Data Formats . . . 558 Instructions . . . . . . . . . . . . . . . . . . . . . 319 9.2.3 Exception Conditions . . . . . . . . . 559 7.6.7 VSX Round to Floating-Point Integer 9.2.3.1 Denormalized Values on Instructions . . . . . . . . . . . . . . . . . . . . . 320 Input. . . . . . . . . . . . . . . . . . . . . . . . . . . 559 7.6.7.1 VSX Scalar Round to Floating- 9.2.3.2 Embedded Floating-Point Overflow Point Integer Instructions . . . . . . . . . . 320 and Underflow . . . . . . . . . . . . . . . . . . . 559 7.6.7.2 VSX Vector Round to Floating- 9.2.3.3 Embedded Floating-Point Invalid Point Integer Instructions . . . . . . . . . . 320 Operation/Input Errors . . . . . . . . . . . . . 560 7.6.8 VSX Logical Instructions . . . . . . 320 9.2.3.4 Embedded Floating-Point Round 7.6.9 VSX Permute Instructions . . . . . 321 (Inexact). . . . . . . . . . . . . . . . . . . . . . . . 560 7.7 VSX Instruction Descriptions . . . . 322 9.2.3.5 Embedded Floating-Point Divide by Zero . . . . . . . . . . . . . . . . . . . . . . . . . . . 560 Table of Contents xiii Version 2.06 9.2.3.6 Default Results . . . . . . . . . . . . .560 Appendix C. Vector RTL Functions 9.2.4 IEEE 754 Compliance . 
. . . . . . . .560 [Category: Vector] . . . . . . . . . . . . 619 9.2.4.1 Sticky Bit Handling For Exception Conditions. . . . . . . . . . . . . . . . . . . . . . .561 9.3 Embedded Floating-Point Appendix D. Embedded Floating- Instructions . . . . . . . . . . . . . . . . . . . . . .562 Point RTL Functions 9.3.1 Load/Store Instructions . . . . . . . .562 9.3.2 SPE.Embedded Float Vector [Category: SPE.Embedded Float Instructions [Category: SPE.Embedded Float Vector] . . . . . . . . . . . . . . . . . . . . .562 Scalar Double] 9.3.3 SPE.Embedded Float Scalar Single [Category: SPE.Embedded Float Instructions Scalar Single] [Category: SPE.Embedded Float Scalar [Category: SPE.Embedded Float Single]. . . . . . . . . . . . . . . . . . . . . . . . . .570 9.3.4 SPE.Embedded Float Scalar Double Vector] . . . . . . . . . . . . . . . . . . . . . . 621 Instructions D.1 Common Functions . . . . . . . . . . . 621 [Category: SPE.Embedded Float Scalar D.2 Convert from Single-Precision Double] . . . . . . . . . . . . . . . . . . . . . . . . .577 Embedded Floating-Point to Integer Word 9.4 Embedded Floating-Point Results with Saturation . . . . . . . . . . . . . . . . . . 622 Summary . . . . . . . . . . . . . . . . . . . . . . .586 D.3 Convert from Double-Precision Embedded Floating-Point to Integer Word Chapter 10. Legacy Move Assist with Saturation . . . . . . . . . . . . . . . . . . 623 D.4 Convert from Double-Precision Instruction [Category: Legacy Move Embedded Floating-Point to Integer Assist] . . . . . . . . . . . . . . . . . . . . . . 591 Doubleword with Saturation . . . . . . . . 624 D.5 Convert to Single-Precision Chapter 11. Legacy Integer Multiply- Embedded Floating-Point from Integer Accumulate Instructions [Category: Word . . . . . . . . . . . . . . . . . . . . . . . . . . 625 D.6 Convert to Double-Precision Legacy Integer Embedded Floating-Point from Integer Multiply-Accumulate] . . . . . . . . . . 593 Word . . . . . . . . . . . . . . . . . . . . . . . . . . 
625 D.7 Convert to Double-Precision Appendix A. Suggested Floating- Embedded Floating-Point from Integer Doubleword . . . . . . . . . . . . . . . . . . . . . 626 Point Models [Category: Floating- Point] . . . . . . . . . . . . . . . . . . . . . . . 603 Appendix E. Assembler Extended A.1 Floating-Point Round to Single- Precision Model . . . . . . . . . . . . . . . . . .603 Mnemonics . . . . . . . . . . . . . . . . . . 627 A.2 Floating-Point Convert to Integer E.1 Symbols . . . . . . . . . . . . . . . . . . . . 627 Model . . . . . . . . . . . . . . . . . . . . . . . . . .607 E.2 Branch Mnemonics. . . . . . . . . . . . 628 A.3 Floating-Point Convert from Integer E.2.1 BO and BI Fields . . . . . . . . . . . . 628 Model . . . . . . . . . . . . . . . . . . . . . . . . . .610 E.2.2 Simple Branch Mnemonics . . . . 628 A.4 Floating-Point Round to Integer E.2.3 Branch Mnemonics Incorporating Model . . . . . . . . . . . . . . . . . . . . . . . . . .612 Conditions . . . . . . . . . . . . . . . . . . . . . . 629 E.2.4 Branch Prediction . . . . . . . . . . . 630 E.3 Condition Register Logical Appendix B. Densely Packed Mnemonics . . . . . . . . . . . . . . . . . . . . . 631 Decimal. . . . . . . . . . . . . . . . . . . . . . 615 E.4 Subtract Mnemonics. . . . . . . . . . . 631 B.1 BCD-to-DPD Translation . . . . . . . .615 E.4.1 Subtract Immediate . . . . . . . . . . 631 E.4.2 Subtract . . . . . . . . . . . . . . . . . . . 631 B.2 DPD-to-BCD Translation . . . . . . . .615 E.5 Compare Mnemonics . . . . . . . . . . 632 B.3 Preferred DPD encoding . . . . . . . .616 E.5.1 Doubleword Comparisons . . . . . 632 E.5.2 Word Comparisons . . . . . . . . . . 632 E.6 Trap Mnemonics . . . . . . . . . . . . . . 633 E.7 Rotate and Shift Mnemonics . . . . 635 E.7.1 Operations on Doublewords . . . 635 E.7.2 Operations on Words . . . . . . . . 636 E.8 Move To/From Special Purpose Register Mnemonics . . . . . . . . . . . . . . 637 xiv Power ISATM I-III, VLE Version 2.06 E.9 Miscellaneous Mnemonics . . 
. . . . 637 Book II: Appendix F. Programming Power ISA Virtual Environment Examples . . . . . . . . . . . . . . . . . . . . 641 Architecture . . . . . . . . . . . . . . . . . . 651 F.1 Multiple-Precision Shifts . . . . . . . . 641 F.2 Floating-Point Conversions [Category: Floating-Point] . . . . . . . . . . . . . . . . . . . 644 Chapter 1. Storage Model. . . . . . . 653 F.2.1 Conversion from 1.1 Definitions . . . . . . . . . . . . . . . . . . . 653 Floating-Point Number to 1.2 Introduction . . . . . . . . . . . . . . . . . . 654 Floating-Point Integer . . . . . . . . . . . . . 644 1.3 Virtual Storage . . . . . . . . . . . . . . . 655 F.2.2 Conversion from 1.4 Single-copy Atomicity . . . . . . . . . 655 Floating-Point Number to Signed Fixed- 1.5 Cache Model . . . . . . . . . . . . . . . . . 656 Point Integer Doubleword . . . . . . . . . . 644 1.6 Storage Control Attributes . . . . . . 656 F.2.3 Conversion from 1.6.1 Write Through Required . . . . . . 657 Floating-Point Number to Unsigned Fixed- 1.6.2 Caching Inhibited . . . . . . . . . . . 657 Point Integer Doubleword . . . . . . . . . . 644 1.6.3 Memory Coherence Required F.2.4 Conversion from [Category: Memory Coherence] . . . . . 657 Floating-Point Number to Signed Fixed- 1.6.4 Guarded . . . . . . . . . . . . . . . . . . 658 Point Integer Word . . . . . . . . . . . . . . . 644 1.6.5 Endianness [Category: F.2.5 Conversion from Embedded.Little-Endian] . . . . . . . . . . . 658 Floating-Point Number to Unsigned Fixed- 1.6.6 Variable Length Encoded (VLE) Point Integer Word . . . . . . . . . . . . . . . 645 Instructions . . . . . . . . . . . . . . . . . . . . . 658 F.2.6 Conversion from Signed Fixed-Point 1.6.7 Strong Access Order [Category: Integer Doubleword to Floating-Point SAO] . . . . . . . . . . . . . . . . . . . . . . . . . . 659 Number . . . . . . . . . . . . . . . . . . . . . . . . 645 1.7 Shared Storage . . . . . . . . . . . . . . 660 F.2.7 Conversion from Unsigned Fixed- 1.7.1 Storage Access Ordering . . . 
. 660 Point Integer Doubleword to Floating-Point 1.7.2 Storage Ordering of I/O Number . . . . . . . . . . . . . . . . . . . . . . . . 645 Accesses . . . . . . . . . . . . . . . . . . . . . . . 662 F.2.8 Conversion from Signed Fixed-Point 1.7.3 Atomic Update . . . . . . . . . . . . . . 662 Integer Word to Floating-Point Number 645 1.7.3.1 Reservations . . . . . . . . . . . . . 663 F.2.9 Conversion from Unsigned Fixed- 1.7.3.2 Forward Progress . . . . . . . . . . 665 Point Integer Word to Floating-Point 1.8 Instruction Storage . . . . . . . . . . . . 665 Number . . . . . . . . . . . . . . . . . . . . . . . . 645 1.8.1 Concurrent Modification and F.2.10 Unsigned Single-Precision BCD Execution of Instructions . . . . . . . . . . . 668 Arithmetic . . . . . . . . . . . . . . . . . . . . . . 646 F.2.11 Signed Single-Precision BCD Chapter 2. Effect of Operand Arithmetic . . . . . . . . . . . . . . . . . . . . . . 646 Placement on Performance . . . . . . 671 F.2.12 Unsigned Extended-Precision BCD 2.1 Instruction Restart . . . . . . . . . . . . 673 Arithmetic . . . . . . . . . . . . . . . . . . . . . . 646 F.3 Floating-Point Selection [Category: Floating-Point] . . . . . . . . . . . . . . . . . . . 648 Chapter 3. Management of Shared F.3.1 Comparison to Zero . . . . . . . . . . 648 Resources. . . . . . . . . . . . . . . . . . . . 675 F.3.2 Minimum and Maximum . . . . . . . 648 3.1 Program Priority Registers . . . . . . 675 F.3.3 Simple if-then-else 3.2 "or" Instruction . . . . . . . . . . . . . . . . 676 Constructions . . . . . . . . . . . . . . . . . . . 648 F.3.4 Notes . . . . . . . . . . . . . . . . . . . . . 648 Chapter 4. Storage Control F.4 Vector Unaligned Storage Operations Instructions . . . . . . . . . . . . . . . . . . 677 [Category: Vector] . . . . . . . . . . . . . . . . 649 4.1 Parameters Useful to Application F.4.1 Loading a Unaligned Quadword Programs . . . . . . . . . . . . . . . . . . . . . . 
677 Using Permute from Big-Endian 4.2 Data Stream Control Register (DSCR) Storage . . . . . . . . . . . . . . . . . . . . . . . . 649 [Category: Stream] . . . . . . . . . . . . . . . 678 4.3 Cache Management Instructions . 679 4.3.1 Instruction Cache Instructions . . 680 4.3.2 Data Cache Instructions . . . . . . 681 Table of Contents xv Version 2.06 4.3.2.1 Obsolete Data Cache Instructions B.2.2.1 Export Shared Storage and [Category: Vector.Phased-Out]. . . . . . .692 Release Lock . . . . . . . . . . . . . . . . . . . 722 4.4 Synchronization Instructions . . . . .693 B.2.2.2 Export Shared Storage and 4.4.1 Instruction Synchronize Release Lock using lwsync . . . . . . . . . 722 Instruction . . . . . . . . . . . . . . . . . . . . . . .693 B.2.3 Safe Fetch . . . . . . . . . . . . . . . . . 722 4.4.2 Load and Reserve and Store B.3 List Insertion. . . . . . . . . . . . . . . . . 723 Conditional Instructions . . . . . . . . . . . .693 B.4 Notes . . . . . . . . . . . . . . . . . . . . . . 723 4.4.2.1 64-Bit Load and Reserve and Store Conditional Instructions [Category: Book III-S: 64-Bit] . . . . . . . . . . . . . . . . . . . . . . . . . .699 4.4.3 Memory Barrier Instructions . . . .701 4.4.4 Wait Instruction . . . . . . . . . . . . . .704 Power ISA Operating Environment Architecture - Server Environment Chapter 5. Time Base . . . . . . . . . 707 [Category: Server] . . . . . . . . . . . . 725 5.1 Time Base Overview . . . . . . . . . . .707 5.2 Time Base . . . . . . . . . . . . . . . . . . .707 Chapter 1. Introduction . . . . . . . . 727 5.2.1 Time Base Instructions . . . . . . . .708 1.1 Overview. . . . . . . . . . . . . . . . . . . . 727 5.3 Alternate Time Base [Category: 1.2 Document Conventions . . . . . . . . 727 Alternate Time Base] . . . . . . . . . . . . . .710 1.2.1 Definitions and Notation . . . . . . 727 1.2.2 Reserved Fields. . . . . . . . . . . . . 728 Chapter 6. Decorated Storage 1.3 General Systems Overview . . . . . 
729 Facility [Category: Decorated 1.4 Exceptions . . . . . . . . . . . . . . . . . . 729 Storage] . . . . . . . . . . . . . . . . . . . . . 711 1.5 Synchronization . . . . . . . . . . . . . . 729 1.5.1 Context Synchronization . . . . . . 729 6.1 Decorated Load Instructions . . . . .712 1.5.2 Execution Synchronization . . . . 730 6.2 Decorated Store Instructions . . . . .713 6.3 Decorated Notify Instructions . . . . .714 Chapter 2. Logical Partitioning Chapter 7. External Control (LPAR) . . . . . . . . . . . . . . . . . . . . . . 731 [Category: External Control] . . . . 715 2.1 Overview. . . . . . . . . . . . . . . . . . . . 731 2.2 Logical Partitioning Control Register 7.1 External Access Instructions . . . . .716 (LPCR) . . . . . . . . . . . . . . . . . . . . . . . . 731 2.3 Real Mode Offset Register (RMOR) . . Appendix A. Assembler Extended 734 Mnemonics . . . . . . . . . . . . . . . . . . 717 2.4 Hypervisor Real Mode Offset Register A.1 Data Cache Block Flush (HRMOR) . . . . . . . . . . . . . . . . . . . . . . 734 Mnemonics . . . . . . . . . . . . . . . . . . . . . .717 2.5 Logical Partition A.2 Load and Reserve Identification Register (LPIDR) . . . . . . 734 Mnemonics . . . . . . . . . . . . . . . . . . . . . .717 2.6 Processor Compatibility Register A.3 Synchronize Mnemonics . . . . . . . .717 (PCR) . . . . . . . . . . . . . . . . . . . . . . . . . 735 A.4 Wait Mnemonics . . . . . . . . . . . . . .717 2.7 Other Hypervisor Resources . . . . 737 2.8 Sharing Hypervisor Resources. . . 738 Appendix B. Programming Examples 2.9 Hypervisor Interrupt Little-Endian (HILE) Bit . . . . . . . . . . . . . . . . . . . . . . 739 for Sharing Storage . . . . . . . . . . . 719 B.1 Atomic Update Primitives. . . . . . . .719 B.2 Lock Acquisition and Release, and Chapter 3. Branch Facility . . . . . 741 Related Techniques . . . . . . . . . . . . . . .721 3.1 Branch Facility Overview . . . . . . . 741 B.2.1 Lock Acquisition and Import 3.2 Branch Facility Registers . . . . . . . 
741 Barriers . . . . . . . . . . . . . . . . . . . . . . . . .721 3.2.1 Machine State Register . . . . . . . 741 B.2.1.1 Acquire Lock and Import Shared 3.3 Branch Facility Instructions . . . . . . 745 Storage . . . . . . . . . . . . . . . . . . . . . . . . .721 3.3.1 System Linkage Instructions . . . 745 B.2.1.2 Obtain Pointer and Import Shared 3.3.2 Power-Saving Mode Storage . . . . . . . . . . . . . . . . . . . . . . . . .721 Instructions . . . . . . . . . . . . . . . . . . . . . 747 B.2.2 Lock Release and Export 3.3.2.1 Entering and Exiting Power-Saving Barriers . . . . . . . . . . . . . . . . . . . . . . . . .722 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . 751 xvi Power ISATM I-III, VLE Version 2.06 Chapter 4. Fixed-Point Facility . . 753 5.7.7 Virtual to Real Translation . . . . . 780 4.1 Fixed-Point Facility Overview . . . . 753 5.7.7.1 Page Table . . . . . . . . . . . . . . . 782 4.2 Special Purpose Registers . . . . . . 753 5.7.7.2 Storage Description 4.3 Fixed-Point Facility Registers . . . . 753 Register 1 . . . . . . . . . . . . . . . . . . . . . . 784 4.3.1 Processor Version Register . . . . 753 5.7.7.3 Page Table Search . . . . . . . . . 785 4.3.2 Processor Identification 5.7.7.4 Relaxed Page Table Alignment Register. . . . . . . . . . . . . . . . . . . . . . . . 754 [Category: Server.Relaxed Page Table 4.3.3 Control Register. . . . . . . . . . . . . 754 Alignment] . . . . . . . . . . . . . . . . . . . . . . 787 4.3.4 Program Priority Register . . . . . 754 5.7.8 Reference and Change 4.3.5 Software-use SPRs . . . . . . . . . . 755 Recording . . . . . . . . . . . . . . . . . . . . . . 787 4.4 Fixed-Point Facility Instructions . . 756 5.7.9 Storage Protection . . . . . . . . . . . 790 4.4.1 Fixed-Point Load and Store Caching 5.7.9.1 Virtual Page Class Key Inhibited Instructions . . . . . . . . . . . . . . 756 Protection . . . . . . . . . . . . . . . . . . . . . . 
790 4.4.2 Fixed-Point Load and Store 5.7.9.2 Basic Storage Protection, Address Quadword Instructions [Category: Load/ Translation Enabled . . . . . . . . . . . . . . . 794 Store Quadword] . . . . . . . . . . . . . . . . . 759 5.7.9.3 Basic Storage Protection, Address 4.4.3 OR Instruction . . . . . . . . . . . . . . 760 Translation Disabled . . . . . . . . . . . . . . 794 4.4.4 Move To/From System Register 5.8 Storage Control Attributes . . . . . . . 796 Instructions . . . . . . . . . . . . . . . . . . . . . 760 5.8.1 Guarded Storage . . . . . . . . . . . . 796 5.8.1.1 Out-of-Order Accesses to Guarded Chapter 5. Storage Control . . . . . 769 Storage . . . . . . . . . . . . . . . . . . . . . . . . 796 5.8.2 Storage Control Bits . . . . . . . . . . 797 5.1 Overview. . . . . . . . . . . . . . . . . . . . 770 5.8.2.1 Storage Control Bit 5.2 Storage Exceptions. . . . . . . . . . . . 770 Restrictions . . . . . . . . . . . . . . . . . . . . . 798 5.3 Instruction Fetch . . . . . . . . . . . . . 770 5.8.2.2 Altering the Storage Control 5.3.1 Implicit Branch . . . . . . . . . . . . . . 770 Bits. . . . . . . . . . . . . . . . . . . . . . . . . . . . 798 5.3.2 Address Wrapping Combined with 5.9 Storage Control Instructions . . . . . 800 Changing MSR Bit SF . . . . . . . . . . . . . 770 5.9.1 Cache Management Instructions 800 5.4 Data Access . . . . . . . . . . . . . . . . . 770 5.9.2 Synchronize Instruction . . . . . . . 800 5.5 Performing Operations 5.9.3 Lookaside Buffer Out-of-Order . . . . . . . . . . . . . . . . . . . . 770 Management . . . . . . . . . . . . . . . . . . . . 801 5.6 Invalid Real Address . . . . . . . . . . . 771 5.9.3.1 SLB Management 5.7 Storage Addressing . . . . . . . . . . . 773 Instructions . . . . . . . . . . . . . . . . . . . . . 802 5.7.1 32-Bit Mode . . . . . . . . . . . . . . . . 773 5.9.3.2 Bridge to SLB Architecture 5.7.2 Virtualized Partition Memory (VPM) [Category:Server.Phased-Out] . . . . . . 808 Mode . . . . . . . . . . . . . . . . . . . . 
. . . . . . 773 5.9.3.2.1 Segment Register 5.7.3 Real And Virtual Real Addressing Manipulation Instructions. . . . . . . . . . . 808 Modes . . . . . . . . . . . . . . . . . . . . . . . . . 774 5.9.3.3 TLB Management 5.7.3.1 Hypervisor Offset Real Mode Instructions . . . . . . . . . . . . . . . . . . . . . 811 Address . . . . . . . . . . . . . . . . . . . . . . . . 774 5.10 Page Table Update Synchronization 5.7.3.2 Offset Real Mode Address . . . 774 Requirements . . . . . . . . . . . . . . . . . . . 818 5.7.3.3 Storage Control Attributes for 5.10.1 Page Table Updates . . . . . . . . . 818 Accesses in Real and Hypervisor Real 5.10.1.1 Adding a Page Table Entry . . 819 Addressing Modes . . . . . . . . . . . . . . . 775 5.10.1.2 Modifying a Page Table 5.7.3.3.1 Hypervisor Real Mode Storage Entry . . . . . . . . . . . . . . . . . . . . . . . . . . 820 Control . . . . . . . . . . . . . . . . . . . . . . . . 775 5.10.1.3 Deleting a Page Table Entry . 820 5.7.3.4 Virtual Real Mode Addressing Mechanism . . . . . . . . . . . . . . . . . . . . . 776 5.7.3.5 Storage Control Attributes for Chapter 6. Interrupts. . . . . . . . . . . 821 Implicit Storage Accesses . . . . . . . . . . 777 6.1 Overview . . . . . . . . . . . . . . . . . . . . 821 5.7.4 Address Ranges Having Defined 6.2 Interrupt Registers. . . . . . . . . . . . . 822 Uses . . . . . . . . . . . . . . . . . . . . . . . . . . 777 6.2.1 Machine Status Save/Restore 5.7.5 Address Translation Overview . . 777 Registers . . . . . . . . . . . . . . . . . . . . . . . 822 5.7.6 Virtual Address Generation . . . . 778 6.2.2 Hypervisor Machine Status Save/ 5.7.6.1 Segment Lookaside Buffer Restore Registers . . . . . . . . . . . . . . . . 822 (SLB) . . . . . . . . . . . . . . . . . . . . . . . . . . 778 6.2.3 Data Address Register . . . . . . . . 822 5.7.6.2 SLB Search . . . . . . . . . . . . . . 779 Table of Contents xvii Version 2.06 6.2.4 Hypervisor Data Address 6.7.1 Unordered Exceptions . . . . . . . . 845 Register . . . . . . 
. . . . . . . . . . . . . . . . . .822 6.7.2 Ordered Exceptions . . . . . . . . . . 845 6.2.5 Data Storage Interrupt 6.8 Interrupt Priorities . . . . . . . . . . . . . 845 Status Register . . . . . . . . . . . . . . . . . . .822 6.2.6 Hypervisor Data Storage Interrupt Chapter 7. Timer Facilities . . . . . 849 Status Register . . . . . . . . . . . . . . . . . . .823 7.1 Overview. . . . . . . . . . . . . . . . . . . . 849 6.2.7 Hypervisor Emulation Instruction 7.2 Time Base (TB) . . . . . . . . . . . . . . 849 Register . . . . . . . . . . . . . . . . . . . . . . . .823 7.2.1 Writing the Time Base . . . . . . . . 850 6.2.8 Hypervisor Maintenance Exception 7.3 Decrementer . . . . . . . . . . . . . . . . . 850 Register . . . . . . . . . . . . . . . . . . . . . . . .823 7.3.1 Writing and Reading the 6.2.9 Hypervisor Maintenance Exception Decrementer . . . . . . . . . . . . . . . . . . . . 851 Enable Register . . . . . . . . . . . . . . . . . .823 7.4 Hypervisor Decrementer. . . . . . . . 851 6.3 Interrupt Synchronization . . . . . . . .824 7.5 Processor Utilization of Resources 6.4 Interrupt Classes . . . . . . . . . . . . . .824 Register (PURR) . . . . . . . . . . . . . . . . . 852 6.4.1 Precise Interrupt . . . . . . . . . . . . .824 7.6 Scaled Processor Utilization of 6.4.2 Imprecise Interrupt . . . . . . . . . . .824 Resources Register (SPURR) . . . . . . 853 6.4.3 Interrupt Processing . . . . . . . . . .825 6.4.4 Implicit alteration of HSRR0 and Chapter 8. Debug Facilities . . . . 855 HSRR1 . . . . . . . . . . . . . . . . . . . . . . . . .827 6.5 Interrupt Definitions . . . . . . . . . . . .828 8.1 Overview. . . . . . . . . . . . . . . . . . . . 855 6.5.1 System Reset Interrupt . . . . . . . .829 8.1.1 Come-From Address Register . . 855 6.5.2 Machine Check Interrupt . . . . . . .831 8.1.2 Data Address Breakpoint. . . . . . 855 6.5.3 Data Storage Interrupt. . . . . . . . .832 6.5.4 Data Segment Interrupt. . . . . . . .833 Chapter 9. 
External Control 6.5.5 Instruction Storage Interrupt . . . .834 [Category: External Control] . . . . 859 6.5.6 Instruction Segment 9.1 External Access Register . . . . . . . 859 Interrupt . . . . . . . . . . . . . . . . . . . . . . . .834 9.2 External Access Instructions . . . . 859 6.5.7 External Interrupt . . . . . . . . . . . .835 6.5.8 Alignment Interrupt . . . . . . . . . . .835 Chapter 10. Synchronization 6.5.9 Program Interrupt . . . . . . . . . . . .837 6.5.10 Floating-Point Unavailable Requirements for Context Interrupt . . . . . . . . . . . . . . . . . . . . . . . .838 Alterations. . . . . . . . . . . . . . . . . . . 861 6.5.11 Decrementer Interrupt . . . . . . . .839 6.5.12 Hypervisor Decrementer Appendix A. Assembler Extended Interrupt . . . . . . . . . . . . . . . . . . . . . . . .839 Mnemonics . . . . . . . . . . . . . . . . . . 867 6.5.13 System Call Interrupt . . . . . . . .839 A.1 Move To/From Special Purpose 6.5.14 Trace Interrupt [Category: Register Mnemonics . . . . . . . . . . . . . . 867 Trace] . . . . . . . . . . . . . . . . . . . . . . . . . .839 6.5.15 Hypervisor Data Storage Interrupt . . . . . . . . . . . . . . . . . . . . . . . .840 Appendix B. Example Performance 6.5.16 Hypervisor Instruction Storage Monitor . . . . . . . . . . . . . . . . . . . . . 869 Interrupt . . . . . . . . . . . . . . . . . . . . . . . .841 B.1 PMM Bit of the Machine State 6.5.17 Hypervisor Emulation Assistance Register. . . . . . . . . . . . . . . . . . . . . . . . 870 Interrupt . . . . . . . . . . . . . . . . . . . . . . . .842 B.2 Special Purpose Registers . . . . . . 870 6.5.18 Hypervisor Maintenance B.2.1 Performance Monitor Counter Interrupt . . . . . . . . . . . . . . . . . . . . . . . .842 Registers . . . . . . . . . . . . . . . . . . . . . . . 872 6.5.19 Performance Monitor B.2.2 Monitor Mode Control Interrupt [Category: Server.Performance Register 0 . . . . . . . . . . . . . . . . . . . . . . 872 Monitor]. . . . . . . . . . . . . . . . . . . . . . . . 
.843
6.5.20 Vector Unavailable Interrupt [Category: Vector] . . . 843
6.5.21 VSX Unavailable Interrupt [Category: VSX] . . . 843
6.6 Partially Executed Instructions . . . 844
6.7 Exception Ordering . . . 845
B.2.3 Monitor Mode Control Register 1 . . . 874
B.2.4 Monitor Mode Control Register A . . . 875
B.2.5 Sampled Instruction Address Register . . . 876
B.2.6 Sampled Data Address Register . . . 876
B.3 Performance Monitor Interrupt . . . 876
B.4 Interaction with the Trace Facility . . . 877

Appendix C. Example Trace Extensions . . . 879

Appendix D. Interpretation of the DSISR as Set by an Alignment Interrupt . . . 881

Appendix E. Programming Examples . . . 885
E.1 Unsigned Single-Precision BCD Arithmetic . . . 885
E.2 Signed Single-Precision BCD Arithmetic . . . 885
E.3 Unsigned Extended-Precision BCD Arithmetic . . . 886

Book III-E: Power ISA Operating Environment Architecture - Embedded Environment [Category: Embedded] . . . 887

Chapter 1. Introduction . . . 889
1.1 Overview . . . 889
1.2 32-Bit Implementations . . . 889
1.3 Document Conventions . . . 889
1.3.1 Definitions and Notation . . . 889
1.3.2 Reserved Fields . . . 891
1.4 General Systems Overview . . . 891
1.5 Exceptions . . . 892
1.6 Synchronization . . . 892
1.6.1 Context Synchronization . . . 892
1.6.2 Execution Synchronization . . . 893

Chapter 2. Logical Partitioning [Category: Embedded.Hypervisor] . . . 895
2.1 Overview . . . 895
2.2 Registers . . . 896
2.2.1 Register Mapping . . . 896
2.2.2 Logical Partition Identification Register (LPIDR) . . . 896
2.3 Interrupts and Exceptions . . . 896
2.3.1 Directed Interrupts . . . 896
2.3.2 Hypervisor Service Interrupts . . . 897
2.4 Instruction Mapping . . . 897

Chapter 3. Thread Control [Category: Embedded Multi-Threading] . . . 899
3.1 Overview . . . 899
3.2 Thread Identification Register (TIR) . . . 899
3.3 Thread Enable Register (TEN) . . . 899
3.4 Thread Enable Status Register (TENSR) . . . 900
3.5 Disabling and Enabling Threads . . . 900
3.6 Sharing of Multi-Threaded Processor Resources . . . 900
3.7 Thread Management Facility [Category: Embedded Multithreading.Thread Management] . . . 901
3.7.1 Initialize Next Instruction Address Registers . . . 901
3.7.2 Thread Management Instructions . . . 902

Chapter 4. Branch Facility . . . 903
4.1 Branch Facility Overview . . . 903
4.2 Branch Facility Registers . . . 903
4.2.1 Machine State Register . . . 903
4.2.2 Machine State Register Protect Register (MSRP) . . . 905
4.2.3 Embedded Processor Control Register (EPCR) . . . 906
4.3 Branch Facility Instructions . . . 908
4.3.1 System Linkage Instructions . . . 908

Chapter 5. Fixed-Point Facility . . . 913
5.1 Fixed-Point Facility Overview . . . 913
5.2 Special Purpose Registers . . . 913
5.3 Fixed-Point Facility Registers . . . 913
5.3.1 Processor Version Register . . . 913
5.3.2 Processor Identification Register . . . 914
5.3.3 Guest Processor Identification Register [Category: Embedded.Hypervisor] . . . 914
5.3.4 Program Priority Register 32-bit [Category: Phased-In] . . . 914
5.3.5 Software-use SPRs . . . 915
5.3.6 External Process ID Registers [Category: Embedded.External PID] . . . 916
5.3.6.1 External Process ID Load Context (EPLC) Register . . . 916
5.3.6.2 External Process ID Store Context (EPSC) Register . . . 917
5.4 Fixed-Point Facility Instructions . . . 918
5.4.1 Move To/From System Register Instructions . . . 918
5.4.2 OR Instruction . . . 926
5.4.3 External Process ID Instructions [Category: Embedded.External PID] . . . 927

Chapter 6. Storage Control . . . 939
6.1 Overview . . . 940
6.2 Storage Exceptions . . . 942
6.3 Instruction Fetch . . . 942
6.3.1 Implicit Branch . . . 942
6.3.2 Address Wrapping Combined with Changing MSR Bit CM . . . 943
6.4 Data Access . . . 943
6.5 Performing Operations Out-of-Order . . . 943
6.6 Invalid Real Address . . . 944
6.7 Storage Control . . . 944
6.7.1 Translation Lookaside Buffer . . . 944
6.7.2 Virtual Address Spaces . . . 949
6.7.3 TLB Address Translation . . . 950
6.7.4 Page Table Address Translation [Category: Embedded.Page Table] . . . 953
6.7.5 Page Table Update Synchronization Requirements [Category: Embedded.Page Table] . . . 961
6.7.5.1 Page Table Updates . . . 962
6.7.5.1.1 Adding a Page Table Entry . . . 962
6.7.5.1.2 Deleting a Page Table Entry . . . 963
6.7.5.1.3 Modifying a Page Table Entry . . . 963
6.7.5.2 Invalidating an Indirect TLB Entry . . . 963
6.7.6 Storage Access Control . . . 964
6.7.6.1 Execute Access . . . 964
6.7.6.2 Write Access . . . 964
6.7.6.3 Read Access . . . 965
6.7.6.4 Virtualized Access . . . 965
6.7.6.5 Storage Access Control Applied to Cache Management Instructions . . . 965
6.7.6.6 Storage Access Control Applied to String Instructions . . . 966
6.8 Storage Control Attributes . . . 966
6.8.1 Guarded Storage . . . 966
6.8.1.1 Out-of-Order Accesses to Guarded Storage . . . 966
6.8.2 User-Definable . . . 966
6.8.3 Storage Control Bits . . . 966
6.8.3.1 Storage Control Bit Restrictions . . . 967
6.8.3.2 Altering the Storage Control Bits . . . 968
6.9 Logical to Real Address Translation [Category: Embedded.Hypervisor.LRAT] . . . 971
6.10 Storage Control Registers . . . 973
6.10.1 Process ID Register . . . 973
6.10.2 MMU Assist Registers . . . 973
6.10.3 MMU Configuration and Control Registers . . . 974
6.10.3.1 MMU Configuration Register (MMUCFG) . . . 974
6.10.3.2 TLB Configuration Registers (TLBnCFG) . . . 974
6.10.3.3 TLB Page Size Registers (TLBnPS) [MAV=2.0] . . . 976
6.10.3.4 Embedded Page Table Configuration Register (EPTCFG) . . . 976
6.10.3.5 LRAT Configuration Register (LRATCFG) [Category: Embedded.Hypervisor.LRAT] . . . 977
6.10.3.6 LRAT Page Size Register (LRATPS) [Category: Embedded.Hypervisor.LRAT] . . . 977
6.10.3.7 MMU Control and Status Register (MMUCSR0) . . . 978
6.10.3.8 MAS0 Register . . . 978
6.10.3.9 MAS1 Register . . . 980
6.10.3.10 MAS2 Register . . . 980
6.10.3.11 MAS3 Register . . . 981
6.10.3.12 MAS4 Register . . . 982
6.10.3.13 MAS5 Register [Category: Embedded.Hypervisor] . . . 983
6.10.3.14 MAS6 Register . . . 983
6.10.3.15 MAS7 Register . . . 984
6.10.3.16 MAS8 Register [Category: Embedded.Hypervisor] . . . 984
6.10.3.17 Accesses to Paired MAS Registers . . . 985
6.10.3.18 MAS Register Update Summary . . . 985
6.11 Storage Control Instructions . . . 988
6.11.1 Cache Management Instructions . . . 988
6.11.2 Cache Locking [Category: Embedded Cache Locking] . . . 989
6.11.2.1 Lock Setting and Clearing . . . 989
6.11.2.2 Error Conditions . . . 989
6.11.2.2.1 Overlocking . . . 990
6.11.2.2.2 Unable-to-lock and Unable-to-unlock Conditions . . . 990
6.11.2.3 Cache Locking Instructions . . . 991
6.11.3 Synchronize Instruction . . . 993
6.11.4 LRAT [Category: Embedded.Hypervisor.LRAT] and TLB Management . . . 993
6.11.4.1 Reading TLB or LRAT Entries . . . 993
6.11.4.2 Writing TLB or LRAT Entries . . . 993
6.11.4.2.1 TLB Write Conditional [Embedded.TLB Write Conditional] . . . 994
6.11.4.3 Invalidating TLB Entries . . . 997
6.11.4.4 TLB Lookaside Information . . . 999
6.11.4.5 Invalidating LRAT Entries . . . 999
6.11.4.6 Searching TLB Entries . . . 999
6.11.4.7 TLB Replacement Hardware Assist . . . 999
6.11.4.8 32-bit and 64-bit Specific MMU Behavior . . . 1000
6.11.4.9 TLB Management Instructions . . . 1001

Chapter 7. Interrupts and Exceptions . . . 1013
7.1 Overview . . . 1014
7.2 Interrupt Registers . . . 1014
7.2.1 Save/Restore Register 0 . . . 1014
7.2.2 Save/Restore Register 1 . . . 1015
7.2.3 Guest Save/Restore Register 0 [Category: Embedded.Hypervisor] . . . 1015
7.2.4 Guest Save/Restore Register 1 [Category: Embedded.Hypervisor] . . . 1015
7.2.5 Critical Save/Restore Register 0 . . . 1016
7.2.6 Critical Save/Restore Register 1 . . . 1016
7.2.7 Debug Save/Restore Register 0 [Category: Embedded.Enhanced Debug] . . . 1016
7.2.8 Debug Save/Restore Register 1 [Category: Embedded.Enhanced Debug] . . . 1017
7.2.9 Data Exception Address Register . . . 1017
7.2.10 Guest Data Exception Address Register [Category: Embedded.Hypervisor] . . . 1017
7.2.11 Interrupt Vector Prefix Register . . . 1018
7.2.12 Guest Interrupt Vector Prefix Register [Category: Embedded.Hypervisor.Phased-Out] . . . 1018
7.2.13 Exception Syndrome Register . . . 1019
7.2.14 Guest Exception Syndrome Register [Category: Embedded.Hypervisor] . . . 1020
7.2.15 Interrupt Vector Offset Registers [Category: Embedded.Phased-Out] . . . 1020
7.2.16 Guest Interrupt Vector Offset Register [Category: Embedded.Hypervisor.Phased-Out] . . . 1021
7.2.17 Logical Page Exception Register [Category: Embedded.Hypervisor and Embedded.Page Table] . . . 1022
7.2.18 Machine Check Registers . . . 1022
7.2.18.1 Machine Check Save/Restore Register 0 . . . 1022
7.2.18.2 Machine Check Save/Restore Register 1 . . . 1022
7.2.18.3 Machine Check Syndrome Register . . . 1023
7.2.18.4 Machine Check Interrupt Vector Prefix Register . . . 1023
7.2.19 External Proxy Register [Category: External Proxy] . . . 1023
7.2.20 Guest External Proxy Register [Category: Embedded Hypervisor, External Proxy] . . . 1023
7.3 Exceptions . . . 1025
7.4 Interrupt Classification . . . 1025
7.4.1 Asynchronous Interrupts . . . 1025
7.4.2 Synchronous Interrupts . . . 1025
7.4.2.1 Synchronous, Precise Interrupts . . . 1026
7.4.2.2 Synchronous, Imprecise Interrupts . . . 1026
7.4.3 Interrupt Classes . . . 1026
7.4.4 Machine Check Interrupts . . . 1026
7.5 Interrupt Processing . . . 1027
7.6 Interrupt Definitions . . . 1030
7.6.1 Interrupt Fixed Offsets [Category: Embedded.Phased-In] . . . 1033
7.6.2 Critical Input Interrupt . . . 1034
7.6.3 Machine Check Interrupt . . . 1034
7.6.4 Data Storage Interrupt . . . 1035
7.6.5 Instruction Storage Interrupt . . . 1037
7.6.6 External Input Interrupt . . . 1039
7.6.7 Alignment Interrupt . . . 1039
7.6.8 Program Interrupt . . . 1040
7.6.9 Floating-Point Unavailable Interrupt . . . 1042
7.6.10 System Call Interrupt . . . 1042
7.6.11 Auxiliary Processor Unavailable Interrupt . . . 1042
7.6.12 Decrementer Interrupt . . . 1043
7.6.13 Fixed-Interval Timer Interrupt . . . 1043
7.6.14 Watchdog Timer Interrupt . . . 1044
7.6.15 Data TLB Error Interrupt . . . 1044
7.6.16 Instruction TLB Error Interrupt . . . 1045
7.6.17 Debug Interrupt . . . 1046
7.6.18 SPE/Embedded Floating-Point/Vector Unavailable Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, Vector] . . . 1047
7.6.19 Embedded Floating-Point Data Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector] . . . 1048
7.6.20 Embedded Floating-Point Round Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector] . . . 1048
7.6.21 Performance Monitor Interrupt [Category: Embedded.Performance Monitor] . . . 1049
7.6.22 Processor Doorbell Interrupt [Category: Embedded.Processor Control] . . . 1049
7.6.23 Processor Doorbell Critical Interrupt [Category: Embedded.Processor Control] . . . 1049
7.6.24 Guest Processor Doorbell Interrupt [Category: Embedded.Hypervisor, Embedded.Processor Control] . . . 1049
7.6.25 Guest Processor Doorbell Critical Interrupt [Category: Embedded.Hypervisor, Embedded.Processor Control] . . . 1051
7.6.26 Guest Processor Doorbell Machine Check Interrupt [Category: Embedded.Hypervisor, Embedded.Processor Control] . . . 1051
7.6.27 Embedded Hypervisor System Call Interrupt [Category: Embedded.Hypervisor] . . . 1051
7.6.28 Embedded Hypervisor Privilege Interrupt [Category: Embedded.Hypervisor] . . . 1052
7.6.29 LRAT Error Interrupt [Category: Embedded.Hypervisor.LRAT] . . . 1052
7.7 Partially Executed Instructions . . . 1054
7.8 Interrupt Ordering and Masking . . . 1055
7.8.1 Guidelines for System Software . . . 1056
7.8.2 Interrupt Order . . . 1058
7.9 Exception Priorities . . . 1059
7.9.1 Exception Priorities for Defined Instructions . . . 1059
7.9.1.1 Exception Priorities for Defined Floating-Point Load and Store Instructions . . . 1059
7.9.1.2 Exception Priorities for Other Defined Load and Store Instructions and Defined Cache Management Instructions . . . 1059
7.9.1.3 Exception Priorities for Other Defined Floating-Point Instructions . . . 1060
7.9.1.4 Exception Priorities for Defined Privileged Instructions . . . 1060
7.9.1.5 Exception Priorities for Defined Trap Instructions . . . 1060
7.9.1.6 Exception Priorities for Defined System Call Instruction . . . 1060
7.9.1.7 Exception Priorities for Defined Branch Instructions . . . 1060
7.9.1.8 Exception Priorities for Defined Return From Interrupt Instructions . . . 1061
7.9.1.9 Exception Priorities for Other Defined Instructions . . . 1061
7.9.2 Exception Priorities for Reserved Instructions . . . 1061

Chapter 8. Reset and Initialization . . . 1063
8.1 Background . . . 1063
8.2 Reset Mechanisms . . . 1063
8.3 Thread State after Reset . . . 1063
8.4 Software Initialization Requirements . . . 1065

Chapter 9. Timer Facilities . . . 1067
9.1 Overview . . . 1067
9.2 Time Base (TB) . . . 1067
9.2.1 Writing the Time Base . . . 1068
9.3 Decrementer . . . 1069
9.3.1 Writing and Reading the Decrementer . . . 1069
9.3.2 Decrementer Events . . . 1069
9.4 Decrementer Auto-Reload Register . . . 1070
9.5 Timer Control Register . . . 1070
9.5.1 Timer Status Register . . . 1072
9.6 Fixed-Interval Timer . . . 1073
9.7 Watchdog Timer . . . 1073
9.8 Freezing the Timer Facilities . . . 1075

Chapter 10. Debug Facilities . . . 1077
10.1 Overview . . . 1077
10.2 Internal Debug Mode . . . 1077
10.3 External Debug Mode [Category: Embedded.Enhanced Debug] . . . 1078
10.4 Debug Events . . . 1078
10.4.1 Instruction Address Compare Debug Event . . . 1079
10.4.2 Data Address Compare Debug Event . . . 1081
10.4.3 Trap Debug Event . . . 1082
10.4.4 Branch Taken Debug Event . . . 1083
10.4.5 Instruction Complete Debug Event . . . 1083
10.4.6 Interrupt Taken Debug Event . . . 1083
10.4.6.1 Causes of Interrupt Taken Debug Events . . . 1083
10.4.6.2 Interrupt Taken Debug Event Description . . . 1083
10.4.7 Return Debug Event . . . 1084
10.4.8 Unconditional Debug Event . . . 1084
10.4.9 Critical Interrupt Taken Debug Event [Category: Embedded.Enhanced Debug] . . . 1084
10.4.10 Critical Interrupt Return Debug Event [Category: Embedded.Enhanced Debug] . . . 1085
10.5 Debug Registers . . . 1085
10.5.1 Debug Control Registers . . . 1085
10.5.1.1 Debug Control Register 0 (DBCR0) . . . 1085
10.5.1.2 Debug Control Register 1 (DBCR1) . . . 1087
10.5.1.3 Debug Control Register 2 (DBCR2) . . . 1088
10.5.2 Debug Status Register . . . 1089
10.5.3 Debug Status Register Write Register (DBSRWR) . . . 1091
10.5.4 Instruction Address Compare Registers . . . 1091
10.5.5 Data Address Compare Registers . . . 1091
10.5.6 Data Value Compare Registers . . . 1091
10.6 Debugger Notify Halt Instruction [Category: Embedded.Enhanced Debug] . . . 1092

Chapter 11. Processor Control [Category: Embedded.Processor Control] . . . 1093
11.1 Overview . . . 1093
11.2 Programming Model . . . 1093
11.2.1 Message Handling and Filtering . . . 1093
11.2.2 Doorbell Message Filtering . . . 1094
11.2.2.1 Doorbell Critical Message Filtering . . . 1095
11.2.2.2 Guest Doorbell Message Filtering [Category: Embedded.Hypervisor] . . . 1095
11.2.2.3 Guest Doorbell Critical Message Filtering [Category: Embedded.Hypervisor] . . . 1096
11.2.2.4 Guest Doorbell Machine Check Message Filtering [Category: Embedded.Hypervisor] . . . 1096
11.3 Processor Control Instructions . . . 1098

Chapter 12. Synchronization Requirements for Context Alterations . . . 1099

Appendix A. Implementation-Dependent Instructions . . . 1103
A.1 Embedded Cache Initialization [Category: Embedded.Cache Initialization] . . . 1103
A.2 Embedded Cache Debug Facility [Category: Embedded.Cache Debug] . . . 1104
A.2.1 Embedded Cache Debug Registers . . . 1104
A.2.1.1 Data Cache Debug Tag Register High . . . 1104
A.2.1.2 Data Cache Debug Tag Register Low . . . 1104
A.2.1.3 Instruction Cache Debug Data Register . . . 1105
A.2.1.4 Instruction Cache Debug Tag Register High . . . 1105
A.2.1.5 Instruction Cache Debug Tag Register Low . . . 1105
A.2.2 Embedded Cache Debug Instructions . . . 1106

Appendix B. Assembler Extended Mnemonics . . . 1109
B.1 Move To/From Special Purpose Register Mnemonics . . . 1110

Appendix C. Guidelines for 64-bit Implementations in 32-bit Mode and 32-bit Implementations . . . 1111
C.1 Hardware Guidelines . . . 1111
C.1.1 64-bit Specific Instructions . . . 1111
C.1.2 Registers on 32-bit Implementations . . . 1111
C.1.3 Addressing on 32-bit Implementations . . . 1111
C.1.4 TLB Fields on 32-bit Implementations . . . 1111
C.2 32-bit Software Guidelines . . . 1111
C.2.1 32-bit Instruction Selection . . . 1111

Appendix D. Example Performance Monitor [Category: Embedded.Performance Monitor] . . . 1113
D.1 Overview . . . 1113
D.2 Programming Model . . . 1113
D.2.1 Event Counting . . . 1114
D.2.2 Thread Context Configurability . . . 1114
D.2.3 Event Selection . . . 1115
D.2.4 Thresholds . . . 1115
D.2.5 Performance Monitor Exception . . . 1115
D.2.6 Performance Monitor Interrupt . . . 1115
D.3 Performance Monitor Registers . . . 1115
D.3.1 Performance Monitor Global Control Register 0 . . . 1115
D.3.2 Performance Monitor Local Control A Registers . . . 1116
D.3.3 Performance Monitor Local Control B Registers . . . 1116
D.3.4 Performance Monitor Counter Registers . . . 1117
D.4 Performance Monitor Instructions . . . 1118
D.5 Performance Monitor Software Usage Notes . . . 1119
D.5.1 Chaining Counters . . . 1119
D.5.2 Thresholding . . . 1119

Book VLE: Power ISA Operating Environment Architecture - Variable Length Encoding (VLE) Environment [Category: Variable Length Encoding] . . . 1121

Chapter 1. Variable Length Encoding Introduction . . . 1123
1.1 Overview . . . 1123
1.2 Documentation Conventions . . . 1124
1.2.1 Description of Instruction Operation . . . 1124
1.3 Instruction Mnemonics and Operands . . . 1124
1.4 VLE Instruction Formats . . . 1124
1.4.1 BD8-form (16-bit Branch Instructions) . . . 1124
1.4.2 C-form (16-bit Control Instructions) . . . 1124
1.4.3 IM5-form (16-bit register + immediate Instructions) . . . 1124
1.4.4 OIM5-form (16-bit register + offset immediate Instructions) . . . 1124
1.4.5 IM7-form (16-bit Load immediate Instructions) . . . 1124
1.4.6 R-form (16-bit Monadic Instructions) . . . 1125
1.4.7 RR-form (16-bit Dyadic Instructions) . . . 1125
1.4.8 SD4-form (16-bit Load/Store Instructions) . . . 1125
1.4.9 BD15-form . . . 1125
1.4.10 BD24-form . . . 1125
1.4.11 D8-form . . . 1125
1.4.12 ESC-form . . . 1125
1.4.13 I16A-form . . . 1125
1.4.14 I16L-form . . . 1125
1.4.15 M-form . . . 1125
1.4.16 SCI8-form . . . 1125
1.4.17 LI20-form . . . 1125
1.4.18 X-form . . . 1126
1.4.19 Instruction Fields . . . 1126

Chapter 2. VLE Storage Addressing . . . 1129
2.1 Data Storage Addressing Modes . . . 1129
2.2 Instruction Storage Addressing Modes . . . 1130
2.2.1 Misaligned, Mismatched, and Byte Ordering Instruction Storage Exceptions . . . 1130
2.2.2 VLE Exception Syndrome Bits . . . 1130

Chapter 3. VLE Compatibility with Books I-III . . . 1133
3.1 Overview . . . 1133
3.2 VLE Processor and Storage Control Extensions . . . 1133
3.2.1 Instruction Extensions . . . 1133
3.2.2 MMU Extensions . . . 1133
3.3 VLE Limitations . . . 1134

Chapter 4. Branch Operation Instructions . . . 1135
4.1 Branch Facility Registers . . . 1135
4.1.1 Condition Register (CR) . . . 1135
4.1.1.1 Condition Register Setting for Compare Instructions . . . 1136
4.1.1.2 Condition Register Setting for the Bit Test Instruction . . . 1136
4.1.2 Link Register (LR) . . . 1136
4.1.3 Count Register (CTR) . . . 1136
4.2 Branch Instructions . . . 1137
4.3 System Linkage Instructions . . . 1140
4.4 Condition Register Instructions . . . 1144

Chapter 5. Fixed-Point Instructions . . . 1147
5.1 Fixed-Point Load Instructions . . . 1147
5.2 Fixed-Point Store Instructions . . . 1151
5.3 Fixed-Point Load and Store with Byte Reversal Instructions . . . 1154
5.4 Fixed-Point Load and Store Multiple Instructions . . . 1154
5.5 Fixed-Point Arithmetic Instructions . . . 1155
5.6 Fixed-Point Compare and Bit Test Instructions . . . 1159
5.7 Fixed-Point Trap Instructions . . . 1163
5.8 Fixed-Point Select Instruction . . . 1163
5.9 Fixed-Point Logical, Bit, and Move Instructions . . . 1164
5.10 Fixed-Point Rotate and Shift Instructions . . . 1169
5.11 Move To/From System Register Instructions . . . 1172

Chapter 6. Storage Control Instructions . . . 1173
6.1 Storage Synchronization Instructions . . . 1173
6.2 Cache Management Instructions . . . 1174
6.3 Cache Locking Instructions . . . 1174
6.4 TLB Management Instructions . . . 1174
6.5 Instruction Alignment and Byte Ordering . . . 1174

Chapter 7. Additional Categories Available in VLE . . . 1175
7.1 Move Assist . . . 1175
7.2 Vector . . . 1175
7.3 Signal Processing Engine . . . 1175
7.4 Embedded Floating Point . . . 1175
7.5 Embedded Hypervisor . . . 1175
7.6 Legacy Move Assist . . . 1175
7.7 External PID . . . 1176
7.8 Embedded Performance Monitor . . . 1176
7.9 Processor Control . . . 1176
7.10 Decorated Storage . . . 1176
7.11 Embedded Cache Initialization . . . 1176
7.12 Embedded Cache Debug . . . 1176

Appendix A. VLE Instruction Set Sorted by Mnemonic . . . 1177

Appendix B. VLE Instruction Set Sorted by Opcode . . . 1193

Appendices: Power ISA Book I-III Appendices . . . 1209

Appendix A. Incompatibilities with the POWER Architecture . . . 1211
A.1 New Instructions, Formerly Privileged Instructions . . . 1211
A.2 Newly Privileged Instructions . . . 1211
A.3 Reserved Fields in Instructions . . . 1211
A.4 Reserved Bits in Registers . . . 1211
A.5 Alignment Check . . . 1211
A.6 Condition Register . . . 1212
A.7 LK and Rc Bits . . . 1212
A.8 BO Field . . . 1212
A.9 BH Field . . . 1212
A.10 Branch Conditional to Count Register . . . 1212
A.11 System Call . . . 1212
A.12 Fixed-Point Exception Register (XER) . . . 1213
A.13 Update Forms of Storage Access Instructions . . . 1213
A.14 Multiple Register Loads . . . 1213
A.15 Load/Store Multiple Instructions . . . 1213
A.16 Move Assist Instructions . . . 1213
A.17 Move To/From SPR . . . 1213
A.18 Effects of Exceptions on FPSCR Bits FR and FI . . . 1214
A.19 Store Floating-Point Single Instructions . . . 1214
A.20 Move From FPSCR . . . 1214
A.21 Zeroing Bytes in the Data Cache . . . 1214
A.22 Synchronization . . . 1214
A.23 Move To Machine State Register Instruction . . . 1215
A.24 Direct-Store Segments . . . 1215
A.25 Segment Register Manipulation Instructions . . . 1215
A.26 TLB Entry Invalidation . . . 1215
A.27 Alignment Interrupts . . . 1215
A.28 Floating-Point Interrupts . . . 1215
A.29 Timing Facilities . . . 1215
A.29.1 Real-Time Clock . . . 1215
A.29.2 Decrementer . . . 1216
A.30 Deleted Instructions . . . 1216
A.31 Discontinued Opcodes . . . 1216
A.32 POWER2 Compatibility . . . 1218
A.32.1 Cross-Reference for Changed POWER2 Mnemonics . . . 1218
A.32.2 Load/Store Floating-Point Double . . . 1218
A.32.3 Floating-Point Conversion to Integer . . . 1218
A.32.4 Floating-Point Interrupts . . . 1219
A.32.5 Trace . . . 1219
A.33 Deleted Instructions . . . 1219
A.33.1 Discontinued Opcodes . . . 1219

Appendix B. Platform Support Requirements . . . 1221

Appendix C. Complete SPR List . . . 1225

Appendix D. Illegal Instructions . . . 1229

Appendix E. Reserved Instructions . . . 1231

Appendix F. Opcode Maps . . . 1233

Appendix G. Power ISA Instruction Set Sorted by Category . . . 1259

Appendix H. Power ISA Instruction Set Sorted by Opcode . . . 1281

Appendix I. Power ISA Instruction Set Sorted by Mnemonic . . . 1303

Index . . . 1325

Last Page - End of Document . . . 1335

Figures

Preface . . . iii

Table of Contents . . . ix

Figures . . . xxvii

Book I: Power ISA User Instruction Set Architecture . . . 1
1. Category Listing . . . 9
2. Logical processing model . . . 12
3. Registers that are defined in Book I . . . 13
4. I instruction format . . . 15
5. B instruction format . . . 15
6. SC instruction format . . . 15
7. D instruction format . . . 15
8. DS instruction format . . . 15
9. DQ instruction format . . . 15
10. X Instruction Format . . . 16
11. XL instruction format . . . 16
12. XFX instruction format . . . 16
13. XFL instruction format . . . 16
14. XX1 Instruction Format . . . 17
15. XX2 Instruction Format . . . 17
16. XX3 Instruction Format . . . 17
17. XX4-Form Instruction Format . . . 17
18. XS instruction format . . . 17
19. XO instruction format . . . 17
20. A instruction format . . . 17
21. M instruction format . . . 17
22. MD instruction format . . . 17
23. MDS instruction format . . . 17
24. VA instruction format . . . 17
25. VC instruction format . . . 17
26. VX instruction format . . . 18
27. EVX instruction format . . . 18
28. EVS instruction format . . . 18
29. Z22 instruction format . . . 18
30. Z23 instruction format . . . 18
31. Storage operands and byte ordering . . . 24
32. C structure `s', showing values of elements . . . 25
33. Big-Endian mapping of structure `s' . . . 25
34. Little-Endian mapping of structure `s' . . . 25
35. Instructions and byte ordering . . . 25
36. Assembly language program `p' . . . 26
37. Big-Endian mapping of program `p' . . . 26
38. Little-Endian mapping of program `p' . . . 26
39. Condition Register . . . 30
40. Link Register . . . 31
41. Count Register . . . 31
42. BO field encodings . . . 32
43. "at" bit encodings . . . 32
44. BH field encodings . . . 32
45. General Purpose Registers . . . 42
46. Fixed-Point Exception Register . . . 42
47. Software-use SPRs . . . 43
48. Floating-Point Registers . . . 107
49. Floating-Point Status and Control Register . . . 107
50. Floating-Point Result Flags . . . 109
51. Floating-point single format . . . 109
52. Floating-point double format . . . 110
53. IEEE floating-point fields . . . 110
54. Approximation to real numbers . . . 110
55. Selection of Z1 and Z2 . . . 114
56. IEEE 64-bit execution model . . . 120
57. Interpretation of G, R, and X bits . . . 120
58. Location of the Guard, Round, and Sticky bits in the IEEE execution model . . . 120
59. Multiply-add 64-bit execution model . . . 121
60. Location of the Guard, Round, and Sticky bits in the multiply-add execution model . . . 121
62. Format for Unsigned Decimal Data . . . 157
63. Format for Signed Decimal Data . . . 157
64. Summary of BCD Digit and Sign Codes . . . 157
65. DFP Short format . . . 158
66. DFP Long format . . . 158
67. DFP Extended format . . . 158
68. Encoding of the G field for Special Symbols . . . 158
69. Encoding of bits 0:4 of the G field for Finite Numbers . . . 158
70. Summary of DFP Formats . . . 159
71. Value Ranges for Finite Number Data Classes . . . 160
72. Encoding of NaN and Infinity Data Classes . . . 160
73. Rounding . . . 161
74. Encoding of DFP Rounding-Mode Control (DRN) . . . 161
75. Primary Encoding of Rounding-Mode Control . . . 162
76. Secondary Encoding of Rounding-Mode Control . . . 162
77. Summary of Ideal Exponents . . . 162
78. Overflow Results When Exception Is Disabled . . . 168
79. Rounding and Range Actions (Part 1) . . . 170
80. Rounding and Range Actions (Part 2) . . . 171
81. Actions: Add . . . 174
82. Actions: Multiply . . . 175
83. Actions: Divide . . . 176
84. Actions: Compare Unordered . . . 178
85. Actions: Compare Ordered . . . 179
86. Actions: Test Exponent . . . 181
87. Actions: Test Significance . . . 182
88. DFP Quantize examples . . . 184
89. Actions (part 1) Quantize . . . 185
90. Actions (part 2) Quantize . . . 185
91. DFP Reround examples . . . 187
92. Actions: Reround . . . 188
93. Actions: Round to FP Integer With Inexact . . . 190
94. Actions: Round to FP Integer Without Inexact . . . 191
95. Actions: Data-Format Conversion Instructions . . . 192
96. Actions: Convert To Fixed . . . 196
97. Actions: Insert Biased Exponent . . . 199
98. Decimal Floating-Point Instructions Summary . . . 201
124. GPR . . . 504
125. Accumulator . . . 504
126. Signal Processing and Embedded Floating-Point Status and Control Register . . . 504
127. Floating-Point Data Format . . . 558

Book II: Power ISA Virtual Environment Architecture . . . 651
1. Performance effects of storage operand placement . . . 672
2. Program Priority Register . . . 675
3. Priority levels for or Rx,Rx,Rx . . . 676
4. Data Stream Control Register . . . 678
5. Time Base . . . 707
6. Alternate Time Base . . . 710

Book III-S:
99. Vector Register elements . . . .
. . . . . . . . . . . 205 Power ISA Operating Environment Archi- 100. Vector Registers . . . . . . . . . . . . . . . . . . . . . 205 tecture - Server Environment [Category: 101. Vector Status and Control Register. . . . . . . 205 Server].............................................. 725 102. Aligned quadword storage operand . . . . . . 207 103. Vector Register contents for aligned quadword 1. Logical Partitioning Control Register . . . . . . . . 731 Load or Store . . . . . . . . . . . . . . . . . . . . . . . 207 2. Real Mode Offset Register. . . . . . . . . . . . . . . . 734 104. Unaligned quadword storage operand . . . . 207 3. Hypervisor Real Mode Offset Register. . . . . . . 734 105. Vector Register contents . . . . . . . . . . . . . . . 207 4. Logical Partition Identification Register . . . . . . 734 106. Vector-Scalar Registers . . . . . . . . . . . . . . . 274 5. Processor Compatibility Register . . . . . . . . . . . 735 107. Vector-Scalar Register Elements . . . . . . . . 274 6. Machine State Register . . . . . . . . . . . . . . . . . . 741 108. Floating-Point Registers as part of VSRs . . 274 7. Processor Version Register . . . . . . . . . . . . . . . 753 109. Vector Registers as part of VSRs . . . . . . . . 275 8. Processor Identification Register . . . . . . . . . . . 754 110. Floating-point single-precision format . . . . 283 9. Control Register . . . . . . . . . . . . . . . . . . . . . . . . 754 111. Floating-point double-precision format . . . . 283 10. Software-use SPRs . . . . . . . . . . . . . . . . . . . . 755 112. Approximation to real numbers . . . . . . . . . . 284 11. SPRs for use by hypervisor programs . . . . . . 755 113. Selection of Z1 and Z2 . . . . . . . . . . . . . . . . 288 12. Priority levels for or Rx,Rx,Rx . . . . . . . . . . . . 760 114. IEEE floating-point execution model . . . . . . 289 13. SPR encodings . . . . . . . . . . . . . . . . . . . . . . . 761 115. Multiply-add 64-bit execution model . . . . . . 290 14. SLBE for VRMA . . 
. . . . . . . . . . . . . . . . . . . . . 776 116. Big-endian storage image of array AW . . . . 310 15. Address translation overview . . . . . . . . . . . . . 778 117. Little-endian storage image of array AW . . . 310 16. Translation of 64-bit effective address to 118. Vector-Scalar Register contents for aligned quad- 78 bit virtual address. . . . . . . . . . . . . . . . . . 778 word Load or Store VSX Vector. . . . . . . . . 310 17. SLB Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . 779 119. Storage images of array B. . . . . . . . . . . . . . 311 18. Page Size Encoding. . . . . . . . . . . . . . . . . . . . 779 120. Process to load misaligned quadword from big-en- 19. Translation of 78-bit virtual address to 60-bit real dian storage using Load VSX Vector Word*4 In- address . . . . . . . . . . . . . . . . . . . . . . . . . . . . 781 dexed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311 20. Page Table Entry . . . . . . . . . . . . . . . . . . . . . . 782 121. Process to load misaligned quadword from little- 21. Format of PTELP when PTEL=1. . . . . . . . . . . 783 endian storage Load VSX Vector Word*4 In- 22. SDR1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784 dexed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311 23. Setting the Reference and Change bits . . . . . 789 122. Process to store misaligned quadword to big-endi- 24. Authority Mask Register (AMR) . . . . . . . . . . . 790 an storage using Store VSX Vector Word*4 In- 25. Authority Mask Override Register (AMOR) . . 791 dexed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312 26. User Authority Mask Override Register 123. Process to store misaligned quadword to little-en- (UAMOR) . . . . . . . . . . . . . . . . . . . . . . . . . . 791 dian storage Store VSX Vector Word*4 Indexed 312 xxviii Power ISATM Book I-III, VLE Version 2.06 27. PP bit protection states, address 6. Thread Management Register Numbers . . . . . 902 translation enabled. . 
. . . . . . . . . . . . . . . . . 794 7. Machine State Register . . . . . . . . . . . . . . . . . . 903 28. Protection states, address translation 8. Machine State Register Protect Register . . . . . 905 disabled . . . . . . . . . . . . . . . . . . . . . . . . . . . 795 9. Embedded Processor Control Register . . . . . . 906 29. Storage control bits . . . . . . . . . . . . . . . . . . . . 797 10. Processor Version Register . . . . . . . . . . . . . . 913 30. GPR contents for slbmte . . . . . . . . . . . . . . . . 805 11. Processor Identification Register . . . . . . . . . . 914 31. GPR contents for slbmfev . . . . . . . . . . . . . . . 806 12. Guest Processor Identification Register. . . . . 914 32. GPR contents for slbmfee . . . . . . . . . . . . . . . 806 13. Special Purpose Registers. . . . . . . . . . . . . . . 915 33. GPR contents for slbfee. . . . . . . . . . . . . . . . . 807 14. External Process ID Load Context Register. . 916 34. GPR contents for mtsr, mtsrin, mfsr, and 15. External Process ID Store Context Register . 917 mfsrin . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808 16. SPR Numbers . . . . . . . . . . . . . . . . . . . . . . . . 918 35. Save/Restore Registers . . . . . . . . . . . . . . . . 822 17. Priority levels for or Rx,Rx,Rx . . . . . . . . . . . . 926 36. Hypervisor Save/Restore Registers . . . . . . . 822 18. Address translation with page table . . . . . . . . 941 37. Data Address Register . . . . . . . . . . . . . . . . . 822 19. Overlaid TLB Field Example . . . . . . . . . . . . . 949 38. Hypervisor Data Address Register . . . . . . . . 822 20. Effective-to-Virtual-to-Real TLB Address Transla- 39. Data Storage Interrupt Status Register . . . . . 822 tion Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . 950 40. Hypervisor Data Storage Interrupt Status 21. Address Translation: Virtual Address to direct TLB Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 823 Entry Match Process. . . . . . . . . . . . . . 
. . . . 953 41. Hypervisor Emulation Instruction Register . . 823 22. Page Table Translation . . . . . . . . . . . . . . . . . 955 42. Hypervisor Maintenance Exception Register 823 23. Page Table Entry . . . . . . . . . . . . . . . . . . . . . . 958 43. Hypervisor Maintenance Exception Enable 24. Access Control Process . . . . . . . . . . . . . . . . . 964 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 823 25. Storage control bits . . . . . . . . . . . . . . . . . . . . 967 44. MSR setting due to interrupt . . . . . . . . . . . . . 828 26. Processor ID Register (PID). . . . . . . . . . . . . . 973 45. Effective address of interrupt vector by 27. MMU Configuration Register [MAV=1.0] . . . . 974 interrupt type . . . . . . . . . . . . . . . . . . . . . . . 829 28. MMU Configuration Register [MAV=2.0] . . . . 974 46. Time Base . . . . . . . . . . . . . . . . . . . . . . . . . . . 849 29. TLB Configuration Register [MAV=1.0] . . . . . 975 47. Decrementer . . . . . . . . . . . . . . . . . . . . . . . . . 850 30. TLB Configuration Register [MAV=2.0] . . . . . 975 48. Hypervisor Decrementer . . . . . . . . . . . . . . . . 851 31. TLB n Page Size Register . . . . . . . . . . . . . . . 976 49. Processor Utilization of Resources Register . 852 32. Embedded Page Table Configuration 50. Scaled Processor Utilization of Resources Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 977 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 853 33. LRAT Configuration Register . . . . . . . . . . . . . 977 51. Come-From Address Register. . . . . . . . . . . . 855 34. LRAT Page Size Register . . . . . . . . . . . . . . . 977 52. Data Address Breakpoint Register . . . . . . . . 856 35. MMU Control and Status Register 0 53. Data Address Breakpoint Register Extension 856 [MAV=1.0] . . . . . . . . . . . . . . . . . . . . . . . . . 978 54. External Access Register . . . . . . . . . . . . . . . 859 36. MMU Control and Status Register 0 55. 
Performance Monitor SPR encodings for [MAV=2.0]. . . . . . . . . . . . . . . . . . . . . . . . . . 978 mfspr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871 37. MAS0 register . . . . . . . . . . . . . . . . . . . . . . . . 979 56. Performance Monitor SPR encodings for 38. MAS1 register [MAV=1.0] . . . . . . . . . . . . . . . 980 mtspr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871 39. MAS1 register [MAV=2.0] . . . . . . . . . . . . . . . 980 57. Performance Monitor Counter registers . . . . 872 40. MAS2 register [MAV = 1.0] . . . . . . . . . . . . . . 980 58. Monitor Mode Control Register 0 . . . . . . . . . 872 41. MAS2 register [MAV = 2.0] . . . . . . . . . . . . . . 980 59. Monitor Mode Control Register 1 . . . . . . . . . 874 42. MAS3 register for MAS1IND=0 [MAV=1.0] . . 981 60. Monitor Mode Control Register A . . . . . . . . . 875 43. MAS3 register for MAS1IND=0 [MAV=2.0] . . 981 61. Sampled Instruction Address Register . . . . . 876 44. MAS3 register for MAS1IND=1 [MAV=2.0 and Cat- 62. Sampled Data Address Register . . . . . . . . . . 876 egory: E.PT] . . . . . . . . . . . . . . . . . . . . . . . . 981 45. MAS4 register [MAV=1.0] . . . . . . . . . . . . . . . 982 46. MAS4 register [MAV=2.0] . . . . . . . . . . . . . . . 982 Book III-E: 47. MAS5 register . . . . . . . . . . . . . . . . . . . . . . . . 983 48. MAS6 register [MAV = 1.0] . . . . . . . . . . . . . . 983 Power ISA Operating Environment Archi- 49. MAS6 register [MAV = 2.0] . . . . . . . . . . . . . . 983 tecture - Embedded Environment [Cate- 50. MAS7 register . . . . . . . . . . . . . . . . . . . . . . . . 984 51. MAS8 register . . . . . . . . . . . . . . . . . . . . . . . . 984 gory: Embedded] .............................. 887 52. Exception Syndrome Register Definitions . . . . . . . . . . . . . . . . . . . . . . . . 1020 1. Logical Partition Identification Register . . . . . . 896 53. Interrupt Vector Offset Register 2. Thread Identification Register . . . . . . . . . 
. . . . 899 Assignments . . . . . . . . . . . . . . . . . . . . . . . 1021 3. Thread Enable Register . . . . . . . . . . . . . . . . . 899 54. Guest Interrupt Vector Offset Register 4. Thread Enable Status Register . . . . . . . . . . . . 900 Assignments . . . . . . . . . . . . . . . . . . . . . . . 1021 5. Initialize Next Instruction Address Register. . . 901 55. Logical Page Exception Register . . . . . . . . . 1022 Figures xxix Version 2.06 56. External Proxy Register. . . . . . . . . . . . . . . . 1023 19. BO32 field encodings . . . . . . . . . . . . . . . . . . 1137 57. Guest External Proxy Register . . . . . . . . . . 1024 20. BO16 field encodings . . . . . . . . . . . . . . . . . . 1137 58. Interrupt and Exception Types . . . . . . . . . . 1032 59. Interrupt Vector Offsets . . . . . . . . . . . . . . . . 1034 Appendices: 60. Interrupt Hierarchy. . . . . . . . . . . . . . . . . . . . 1055 61. Machine State Register Initial Values . . . . . 1063 62. TLB Initial Values . . . . . . . . . . . . . . . . . . . . 1064 Power ISA Book I-III Appendices ... 1209 63. Time Base . . . . . . . . . . . . . . . . . . . . . . . . . . 1067 64. Decrementer . . . . . . . . . . . . . . . . . . . . . . . . 1069 21. Platform Support Requirements. . . . . . . . . . 1222 65. Decrementer . . . . . . . . . . . . . . . . . . . . . . . . 1070 22. SPR Numbers . . . . . . . . . . . . . . . . . . . . . . . 1225 66. Relationships of the Timer Facilities . . . . . . 1071 67. Watchdog State Machine . . . . . . . . . . . . . . 1074 68. Watchdog Timer Controls . . . . . . . . . . . . . . 1074 Index............................................... 1325 69. Debug Status Register Write Register . . . . 1091 70. Data Cache Debug Tag Register High . . . . 1104 Last Page - End of Document ........ 1335 71. Data Cache Debug Tag Register Low. . . . . 1104 72. Instruction Cache Debug Data Register . . . 1105 73. Instruction Cache Debug Tag Register High 1105 74. 
Instruction Cache Debug Tag Register Low 1105 75. Thread States and PMLCan Bit Settings. . . 1114 76. [User] Performance Monitor Global Control Register 0. . . . . . . . . . . . . . . . . . . . . . . . . 1115 77. [User] Performance Monitor Local Control A Registers . . . . . . . . . . . . . . . . . . . . . . . . . 1116 78. [User] Performance Monitor Local Control B Register . . . . . . . . . . . . . . . . . . . . . . . . . . 1116 79. [User] Performance Monitor Counter Registers . . . . . . . . . . . . . . . . . . . . . . . . . 1117 80. Embedded.Peformance Monitor PMRs . . . . 1118 Book VLE: Power ISA Operating Environment Archi- tecture - Variable Length Encoding (VLE) Environment [Category: Variable Length Encoding] ........................................ 1121 1. BD8 instruction format. . . . . . . . . . . . . . . . . . 1124 2. C instruction format . . . . . . . . . . . . . . . . . . . . 1124 3. IM5 instruction format . . . . . . . . . . . . . . . . . . 1124 4. OIM5 instruction format . . . . . . . . . . . . . . . . . 1124 5. IM7 instruction format . . . . . . . . . . . . . . . . . . 1124 6. R instruction format . . . . . . . . . . . . . . . . . . . . 1125 7. RR instruction format. . . . . . . . . . . . . . . . . . . 1125 8. SD4 instruction format. . . . . . . . . . . . . . . . . . 1125 9. BD15 instruction format. . . . . . . . . . . . . . . . . 1125 10. BD24 instruction format. . . . . . . . . . . . . . . . 1125 11. D8 instruction format . . . . . . . . . . . . . . . . . . 1125 12. I16A instruction format . . . . . . . . . . . . . . . . 1125 13. I16L instruction format. . . . . . . . . . . . . . . . . 1125 14. M instruction format. . . . . . . . . . . . . . . . . . . 1125 15. SC18 instruction format. . . . . . . . . . . . . . . . 1125 16. LI20 instruction format. . . . . . . . . . . . . . . . . 1125 17. X instruction format . . . . . . . . . . . . . . . . . . . 1126 18. Condition Register. . . . . . . . . . . . . . . . . . . . 
Book I:
Power ISA User Instruction Set Architecture

Chapter 1. Introduction

1.1 Overview . . . 3
1.2 Instruction Mnemonics and Operands . . . 3
1.3 Document Conventions . . . 4
1.3.1 Definitions . . . 4
1.3.2 Notation . . . 4
1.3.3 Reserved Fields and Reserved Values . . . 6
1.3.4 Description of Instruction Operation . . . 7
1.3.5 Categories . . . 9
1.3.5.1 Phased-In/Phased-Out . . . 10
1.3.5.2 Corequisite Category . . . 11
1.3.5.3 Category Notation . . . 11
1.3.6 Environments . . . 11
1.4 Processor Overview . . . 12
1.5 Computation modes . . . 14
1.5.1 Modes [Category: Server] . . . 14
1.5.2 Modes [Category: Embedded] . . . 14
1.6 Instruction formats . . . 15
1.6.1 I-FORM . . . 15
1.6.2 B-FORM . . . 15
1.6.3 SC-FORM . . . 15
1.6.4 D-FORM . . . 15
1.6.5 DS-FORM . . . 15
1.6.6 DQ-FORM . . . 15
1.6.7 X-FORM . . . 16
1.6.8 XL-FORM . . . 16
1.6.9 XFX-FORM . . . 16
1.6.10 XFL-FORM . . . 16
1.6.11 XX1-FORM . . . 17
1.6.12 XX2-FORM . . . 17
1.6.13 XX3-FORM . . . 17
1.6.14 XX4-FORM . . . 17
1.6.15 XS-FORM . . . 17
1.6.16 XO-FORM . . . 17
1.6.17 A-FORM . . . 17
1.6.18 M-FORM . . . 17
1.6.19 MD-FORM . . . 17
1.6.20 MDS-FORM . . . 17
1.6.21 VA-FORM . . . 17
1.6.22 VC-FORM . . . 17
1.6.23 VX-FORM . . . 18
1.6.24 EVX-FORM . . . 18
1.6.25 EVS-FORM . . . 18
1.6.26 Z22-FORM . . . 18
1.6.27 Z23-FORM . . . 18
1.6.28 Instruction Fields . . . 18
1.7 Classes of Instructions . . . 21
1.7.1 Defined Instruction Class . . . 21
1.7.2 Illegal Instruction Class . . . 22
1.7.3 Reserved Instruction Class . . . 22
1.8 Forms of Defined Instructions . . . 23
1.8.1 Preferred Instruction Forms . . . 23
1.8.2 Invalid Instruction Forms . . . 23
1.8.3 Reserved-no-op Instructions [Category: Phased-In] . . . 23
1.9 Exceptions . . . 23
1.10 Storage Addressing . . . 24
1.10.1 Storage Operands . . . 24
1.10.2 Instruction Fetches . . . 25
1.10.3 Effective Address Calculation . . . 27

1.1 Overview

This chapter describes computation modes, document conventions, a processor overview, instruction formats, storage addressing, and instruction fetching.

1.2 Instruction Mnemonics and Operands

The description of each instruction includes the mnemonic and a formatted list of operands. Some examples are the following.

    stw    RS,D(RA)
    addis  RT,RA,SI

Power ISA-compliant Assemblers will support the mnemonics and operand lists exactly as shown. They should also provide certain extended mnemonics, such as the ones described in Appendix E of Book I.

1.3 Document Conventions

1.3.1 Definitions

The following definitions are used throughout this document.

program
    A sequence of related instructions.

application program
    A program that uses only the instructions and resources described in Books I and II.

Processor
    The hardware component that implements the instruction set, storage model, and other facilities defined in the Power ISA architecture, and executes the instructions specified in a program.

quadwords, doublewords, words, halfwords, and bytes
    128 bits, 64 bits, 32 bits, 16 bits, and 8 bits, respectively.

positive
    Means greater than zero.

negative
    Means less than zero.

floating-point single format (or simply single format)
    Refers to the representation of a single-precision binary floating-point value in a register or storage.

floating-point double format (or simply double format)
    Refers to the representation of a double-precision binary floating-point value in a register or storage.

system library program
    A component of the system software that can be called by an application program using a Branch instruction.

system service program
    A component of the system software that can be called by an application program using a System Call instruction.

system trap handler
    A component of the system software that receives control when the conditions specified in a Trap instruction are satisfied.

system error handler
    A component of the system software that receives control when an error occurs. The system error handler includes a component for each of the various kinds of error. These error-specific components are referred to as the system alignment error handler, the system data storage error handler, etc.

latency
    Refers to the interval from the time an instruction begins execution until it produces a result that is available for use by a subsequent instruction.

unavailable
    Refers to a resource that cannot be used by the program. For example, storage is unavailable if access to it is denied. See Book III.

undefined value
    May vary between implementations, and between different executions on the same implementation, and similarly for register contents, storage contents, etc., that are specified as being undefined.

boundedly undefined
    The results of executing a given instruction are said to be boundedly undefined if they could have been achieved by executing an arbitrary finite sequence of instructions (none of which yields boundedly undefined results) in the state the processor was in before executing the given instruction. Boundedly undefined results may include the presentation of inconsistent state to the system error handler as described in Section 1.8.1 of Book II. Boundedly undefined results for a given instruction may vary between implementations, and between different executions on the same implementation.

"must"
    If software violates a rule that is stated using the word "must" (e.g., "this field must be set to 0"), the results are boundedly undefined unless otherwise stated.

sequential execution model
    The model of program execution described in Section 2.2, "Instruction Execution Order" on page 29.

Auxiliary Processor
    An implementation-specific processing unit. Previous versions of the architecture use the term Auxiliary Processing Unit (APU) to describe this extension of the architecture. Architectural support for auxiliary processors is part of the Embedded category.

1.3.2 Notation

The following notation is used throughout the Power ISA documents.

All numbers are decimal unless specified in some special way.
    - 0bnnnn means a number expressed in binary format.
    - 0xnnnn means a number expressed in hexadecimal format.
Underscores may be used between digits.

RT, RA, R1, ... refer to General Purpose Registers.

FRT, FRA, FR1, ... refer to Floating-Point Registers.

FRTp, FRAp, FRBp, ... refer to an even-odd pair of Floating-Point Registers. Values must be even, otherwise the instruction form is invalid.

VRT, VRA, VR1, ... refer to Vector Registers.

(x) means the contents of register x, where x is the name of an instruction field. For example, (RA) means the contents of register RA, and (FRA) means the contents of register FRA, where RA and FRA are instruction fields. Names such as LR and CTR denote registers, not fields, so parentheses are not used with them. Parentheses are also omitted when register x is the register into which the result of an operation is placed.

(RA|0) means the contents of register RA if the RA field has the value 1-31, or the value 0 if the RA field is 0.

Bytes in instructions, fields, and bit strings are numbered from left to right, starting with byte 0 (most significant).

Bits in registers, instructions, fields, and bit strings are specified as follows. In the last three items (definition of X_p etc.), if X is a field that specifies a GPR, FPR, or VR (e.g., the RS field of an instruction), the definitions apply to the register, not to the field.
    - Bits in instructions, fields, and bit strings are numbered from left to right, starting with bit 0.
    - For all registers except the Vector category, bits in registers that are less than 64 bits start with bit number 64-L, where L is the register length; for the Vector category, bits in registers that are less than 128 bits start with bit number 128-L.
    - The leftmost bit of a sequence of bits is the most significant bit of the sequence.
    - X_p means bit p of register/instruction/field/bit_string X.
    - X_p:q means bits p through q of register/instruction/field/bit_string X.
    - X_p q ... means bits p, q, ... of register/instruction/field/bit_string X.

¬(RA) means the one's complement of the contents of register RA.

A period (.) as the last character of an instruction mnemonic means that the instruction records status information in certain fields of the Condition Register as a side effect of execution.

The symbol || is used to describe the concatenation of two values. For example, 010 || 111 is the same as 010111.

xⁿ means x raised to the nth power.

ⁿx means the replication of x, n times (i.e., x concatenated to itself n-1 times). ⁿ0 and ⁿ1 are special cases:
    - ⁿ0 means a field of n bits with each bit equal to 0. Thus ⁵0 is equivalent to 0b00000.
    - ⁿ1 means a field of n bits with each bit equal to 1. Thus ⁵1 is equivalent to 0b11111.

Each bit and field in instructions, and in status and control registers (e.g., XER, FPSCR) and Special Purpose Registers, is either defined or reserved. Some defined fields contain reserved values. In such cases when this document refers to the specific field, it refers only to the defined values, unless otherwise specified.

/, //, ///, ... denotes a reserved field, in a register, instruction, field, or bit string.

?, ??, ???, ... denotes an implementation-dependent field in a register, instruction, field or bit string.

1.3.3 Reserved Fields and Reserved Values

Reserved fields in instructions are ignored by the processor. This is a requirement in the Server environment and is being phased into the Embedded environment.

In some cases a defined field of an instruction has certain values that are reserved. This includes cases in which the field is shown in the instruction layout as containing a particular value; in such cases all other values of the field are reserved. In general, if an instruction is coded such that a defined field contains a reserved value the instruction form is invalid; see Section 1.8.2 on page 23. The only exception to the preceding rule is that it does not apply to Reserved and Illegal classes of instructions (see Section 1.7) or to portions of defined fields that are specified, in the instruction description, as being treated as reserved fields.

To maximize compatibility with future architecture extensions, software must ensure that reserved fields in instructions contain zero and that defined fields of instructions do not contain reserved values.

The handling of reserved bits in System Registers (e.g., XER, FPSCR) is implementation-dependent. Unless otherwise stated, software is permitted to write any value to such a bit. A subsequent reading of the bit returns 0 if the value last written to the bit was 0 and returns an undefined value (0 or 1) otherwise.

In some cases a defined field of a System Register has certain values that are reserved. Software must not set a defined field of a System Register to a reserved value.

References elsewhere in this document to a defined field (in an instruction or System Register) that has reserved values assume the field does not contain a reserved value, unless otherwise stated or obvious from context.

Assembler Note
    Assemblers should report uses of reserved values of defined fields of instructions as errors.

Programming Note
    It is the responsibility of software to preserve bits that are now reserved in System Registers, because they may be assigned a meaning in some future version of the architecture.

    In order to accomplish this preservation in implementation-independent fashion, software should do the following.
    - Initialize each such register supplying zeros for all reserved bits.
    - Alter (defined) bit(s) in the register by reading the register, altering only the desired bit(s), and then writing the new value back to the register.

    The XER and FPSCR are partial exceptions to this recommendation. Software can alter the status bits in these registers, preserving the reserved bits, by executing instructions that have the side effect of altering the status bits. Similarly, software can alter any defined bit in the FPSCR by executing a Floating-Point Status and Control Register instruction. Using such instructions is likely to yield better performance than using the method described in the second item above.

1.3.4 Description of Instruction Operation

Instruction descriptions (including related material such as the introduction to the section describing the instructions) mention that the instruction may cause a system error handler to be invoked, under certain conditions, if and only if the system error handler may treat the case as a programming error. (An instruction may cause a system error handler to be invoked under other conditions as well; see Chapter 6 of Book III-S and Chapter 7 of Book III-E).

A formal description is given of the operation of each instruction. In addition, the operation of most instructions is described by a semiformal language at the register transfer level (RTL). This RTL uses the notation given below, in addition to the notation described in Section 1.3.2. Some of this notation is also used in the formal descriptions of instructions. RTL notation not summarized here should be self-explanatory.

The RTL descriptions cover the normal execution of the instruction, except that "standard" setting of status registers, such as the Condition Register, is not shown. ("Non-standard" setting of these registers, such as the setting of the Condition Register by the Compare instructions, is shown.) The RTL descriptions do not cover cases in which the system error handler is invoked, or for which the results are boundedly undefined.

The RTL descriptions specify the architectural transformation performed by the execution of an instruction. They do not imply any particular implementation.

Notation              Meaning
←                     Assignment
←iea                  Assignment of an instruction effective address. In 32-bit mode the high-order 32 bits of the 64-bit target address are set to 0.
¬                     NOT logical operator
+                     Two's complement addition
-                     Two's complement subtraction, unary minus
×                     Multiplication
×si                   Signed-integer multiplication
×ui                   Unsigned-integer multiplication
/                     Division
÷                     Division, with result truncated to integer
√                     Square root
=, ≠                  Equals, Not Equals relations
<, ≤, >, ≥            Signed comparison relations
<u, >u                Unsigned comparison relations
?                     Unordered comparison relation
&, |                  AND, OR logical operators
⊕, ≡                  Exclusive OR, Equivalence logical operators ((a≡b) = (a⊕¬b))
ABS(x)                Absolute value of x
CEIL(x)               Least integer ≥ x
DCR(x)                Device Control Register x
DOUBLE(x)             Result of converting x from floating-point single format to floating-point double format, using the model shown on page 123
EXTS(x)               Result of extending x on the left with sign bits
FLOOR(x)              Greatest integer ≤ x
GPR(x)                General Purpose Register x
MASK(x, y)            Mask having 1s in positions x through y (wrapping if x > y) and 0s elsewhere
MEM(x, y)             Contents of a sequence of y bytes of storage. The sequence depends on the byte ordering used for storage access, as follows.
                      Big-Endian byte ordering: The sequence starts with the byte at address x and ends with the byte at address x+y-1.
                      Little-Endian byte ordering: The sequence starts with the byte at address x+y-1 and ends with the byte at address x.
MEM_DECORATED(x,y,z)  Contents of a sequence of y bytes of storage, where the storage is accessed with decoration z applied. The sequence depends on the byte ordering used for storage access, as follows.
                      Big-Endian byte ordering: The sequence starts with the byte at address x and ends with the byte at address x+y-1.
                      Little-Endian byte ordering: The sequence starts with the byte at address x+y-1 and ends with the byte at address x.
MEM_NOTIFY(x,z)       The decoration z is sent to storage location x.
ROTL64(x, y)          Result of rotating the 64-bit value x left y positions
ROTL32(x, y)          Result of rotating the 64-bit value x||x left y positions, where x is 32 bits long
SINGLE(x)             Result of converting x from floating-point double format to floating-point single format, using the model shown on page 127
SPR(x)                Special Purpose Register x
TRAP                  Invoke the system trap handler
characterization      Reference to the setting of status bits, in a standard way that is explained in the text
undefined             An undefined value.
CIA                   Current Instruction Address, which is the 64-bit address of the instruction being described by a sequence of RTL. Used by relative branches to set the Next Instruction Address (NIA), and by Branch instructions with LK=1 to set the Link Register. Does not correspond to any architected register.
NIA                   Next Instruction Address, which is the 64-bit address of the next instruction to be executed. For a successful branch, the next instruction address is the branch target address: in RTL, this is indicated by assigning a value to NIA. For other instructions that cause non-sequential instruction fetching (see Book III), the RTL is similar. For instructions that do not branch, and do not otherwise cause instruction fetching to be non-sequential, the next instruction address is CIA+4 (VLE behavior is different; see Book VLE). Does not correspond to any architected register.
if... then... else... Conditional execution, indenting shows range; else is optional.
do                    Do loop, indenting shows range. "To" and/or "by" clauses specify incrementing an iteration variable, and a "while" clause gives termination conditions.
leave                 Leave innermost do loop, or do loop described in leave statement.
for                   For loop, indenting shows range. Clause after "for" specifies the entities for which to execute the body of the loop.

The precedence rules for RTL operators are summarized in Table 1. Operators higher in the table are applied before those lower in the table. Operators at the same level in the table associate from left to right, from right to left, or not at all, as shown.
(For example, - | left to right associates from left to right, so a-b-c = (a-b)-c.) : (range) none Parentheses are used to override the evaluation order implied by the table or to increase clarity; parenthe- , iea none sized expressions are evaluated before serving as operands. 1.3.5 Categories Table 1: Operator precedence Each facility (including registers and fields therein) and Operators Associativity instruction is in exactly one of the categories listed in Figure 1. subscript, function evaluation left to right pre-superscript (replication), right to left A category may be defined as a dependent category. post-superscript (exponentiation) These are categories that are supported only if the cat- egory they are dependent on is also supported. Depen- unary -, ¬ right to left dent categories are identified by the "." in their category ×, ÷ left to right name, e.g., if an implementation supports the Float- +, -, left to right ing-Point.Record category, then the Floating-Point cat- egory is also supported. || left to right =, , <, , >, ,u,? left to right An implementation that supports a facility or instruction in a given category, except for the two categories Category Abvr. 
Notes Base B Required for all implementations Server S Required for Server implementations Embedded E Required for Embedded implementations Alternate Time Base ATB An additional Time Base; see Book II Cache Specification CS Specify a specific cache for some instructions; see Book II Decimal Floating-Point DFP Decimal Floating-Point facilities Decorated Storage DS Decorated Storage facilities Embedded.Cache Debug E.CD Provides direct access to cache data and directory content Embedded.Cache Initialization E.CI Instructions that invalidate the entire cache Embedded.Device Control E.DC Legacy Device Control bus support Embedded.Enhanced Debug E.ED Embedded Enhanced Debug facility; see Book III-E Embedded.External PID E.PD Embedded External PID facility; see Book III-E Embedded.Hypervisor E.HV Embedded Logical Partitioning and hypervisor facilities Embedded.Hypervisor.LRAT E.HV.LRAT Embedded Hypervisor Logical to Real Address Translation facility; see Book III-E Embedded.Little-Endian E.LE Embedded Little-Endian page attribute; see Book III-E Embedded.Page Table E.PT Embedded Page Table facility; see Book III-E Embedded.TLB Write Conditional E.TWC Embedded TLB Write Conditional facility; see Book III-E Embedded.Performance Monitor E.PM Embedded Performance Monitor example; see Book III-E Embedded.Processor Control E.PC Embedded Processor Control facility; see Book III-E Embedded Cache Locking ECL Embedded Cache Locking facility; see Book III-E Embedded Multi-Threading EM Embedded Multi-Threading; see Book III-E Embedded Multi-Threading.Thread EM.TM Embedded Multi-Threading Thread Management Facility Management External Control EC External Control facility; see Book II Figure 1. Category Listing (Sheet 1 of 2) 8 Power ISATM Book I Version 2.06 Category Abvr. 
Notes External Proxy EXP External Proxy facility; see Book III-E Floating-Point FP Floating-Point Facilities Floating-Point.Record FP.R Floating-Point instructions with Rc=1 Legacy Integer Multiply-Accumulate1 LMA Legacy Integer Multiply-accumulate instructions Legacy Move Assist LMV Determine Left most Zero Byte instruction Load/Store Quadword LSQ Load/Store Quadword instructions; see Book III-S Memory Coherence MMC Requirement for Memory Coherence; see Book II Move Assist MA Move Assist instructions Processor Compatibility PCR Processor Compatibility Register Server.Performance Monitor S.PM Server Performance Monitor example; see Book III-S Server.Relaxed Page Table Alignment S.RPTA HTAB alignment on 256 KB boundary; see Book III-S Signal Processing Engine1, 2 SP Facility for signal processing SPE.Embedded Float Scalar Double SP.FD GPR-based Floating-Point double-precision instruction set SPE.Embedded Float Scalar Single SP.FS GPR-based Floating-Point single-precision instruction set SPE.Embedded Float Vector SP.FV GPR-based Floating-Point Vector instruction set Store Conditional Page Mobility SCPM Store Conditional accounting for page movement; see Book II Stream STM Stream variant of dcbt instruction; see Book II Strong Access Order SAO Assist for X86 and Sparc emulation; see Book II Trace TRC Trace Facility; see Book III-S Variable Length Encoding VLE Variable Length Encoding facility; see Book VLE Vector-Scalar Extension VSX Vector-Scalar Extension Requires implementation of Floating-Point and Vector catego- ries Vector1 V Vector facilities Vector.Little-Endian V.LE Little-Endian support for Vector storage operations. Wait WT wait instruction; see Book II 64-Bit 64 Required for 64-bit implementations; not defined for 32-bit impl's 1 Because of overlapping opcode usage, SPE is mutually exclusive with Vector and with Legacy Integer Multi- ply-Accumulate, and Legacy Integer Multiply-Accumulate is mutually exclusive with Vector. 
[2] The SPE-dependent Floating-Point categories are collectively referred to as SPE.Embedded Float_* or SP.*.

Figure 1. Category Listing

An instruction in a category that is not supported by the implementation is treated as an illegal instruction or an unimplemented instruction on that implementation (see Section 1.7.2).

For an instruction that is supported by the implementation with field values that are defined by the architecture, the field values defined as part of a category that is not supported by the implementation are treated as reserved values on that implementation (see Section 1.3.3 and Section 1.8.2).

Bits in a register that are in a category that is not supported by the implementation are treated as reserved.

1.3.5.1 Phased-In/Phased-Out

There are two special categories, Phased-In and Phased-Out, as well as two additional variations of Phased-In as defined below. Abbreviations, if applicable, are shown in parentheses.

Phased-In
    These are facilities and instructions that, in the next version of the architecture, will be required as part of the category they are dependent on.
    Servers do not implement a facility or instruction in this category. Servers that comply with earlier versions of this architecture may have optionally implemented facilities or instructions that were category Phased-In.

Server, Embedded.Phased-In (S,E.PI)
    These are facilities and instructions that are part of the Server environment and, in the next version of the architecture, will be required for the Embedded environment.
    It is implementation-dependent whether Embedded processors implement a facility or instruction in this category.

Embedded, Server.Phased-In (E,S.PI)
    These are facilities and instructions that are part of the Embedded environment and, in the next version of the architecture, will be required for the Server environment.
    Servers do not implement a facility or instruction in this category.

Phased-Out
    These are facilities and instructions that, in some future version of the architecture, will be dropped out of the architecture. System developers should develop a migration plan to eliminate use of them in new systems.
    For Server platforms, Phased-Out facilities and instructions must be implemented if the facility or instruction is part of another category (including the Base category) that is supported by the Server platform.

Programming Note
    Warning: Instructions and facilities being phased out of the architecture are likely to perform poorly on future implementations. New programs should not use them.

Programming Note
    Facilities are categorized as Phased-In only in cases where there is a difference between the Server and Embedded environments.

1.3.5.2 Corequisite Category

A corequisite category is an additional category that is associated with an instruction or facility, and must be implemented if the instruction or facility is implemented.

1.3.5.3 Category Notation

Instructions and facilities are considered part of the Base category unless otherwise marked. If a section is marked with a specific category tag, all material in that section and its subsections is considered part of the category, unless otherwise marked. Overview sections may contain discussion of instructions and facilities from various categories without being explicitly marked.

An example of a category tag is: [Category: Server]. Alternatively, a shorthand notation of a category tag includes the category name in angled brackets "<>", such as <Server>.

An example of a dependent category is: [Category: Server.Phased-In]

The shorthand <E> and <S> may also be used for Category: Embedded and Server respectively.

1.3.6 Environments

All implementations support one of the two defined environments, Server or Embedded. Environments refer to common subsets of instructions that are shared across many implementations. The Server environment describes implementations that support Category: Base and Server. The Embedded environment describes implementations that support Category: Base and Embedded.

1.4 Processor Overview

The basic classes of instructions are as follows:

- branch instructions (Chapter 2)
- GPR-based scalar fixed-point instructions (Chapter 3, Chapter 9, and Chapter 11)
- GPR-based vector fixed-point instructions (Chapter 8)
- GPR-based scalar and vector floating-point instructions (Chapter 10)
- FPR-based scalar floating-point instructions (Chapter 4)
- FPR-based scalar decimal floating-point instructions (Chapter 5)
- VR-based vector fixed-point and floating-point instructions (Chapter 6)
- VSR-based scalar and vector floating-point instructions (Chapter 7)

Scalar fixed-point instructions operate on byte, halfword, word, doubleword, and quadword (see Book III-S) operands, where each operand is contained in a GPR. Vector fixed-point instructions operate on vectors of byte, halfword, and word operands, where each vector is contained in a GPR or VR. Scalar floating-point instructions operate on single-precision or double-precision floating-point operands, where each operand is contained in a GPR, FPR, or VSR. Vector floating-point instructions operate on vectors of single-precision and double-precision floating-point operands, where each vector is contained in a GPR, VR, or VSR.

The Power ISA uses instructions that are four bytes long and word-aligned (VLE has different instruction characteristics; see Book VLE). It provides for byte, halfword, word, doubleword, and quadword operand loads and stores between storage and a set of 32 General Purpose Registers (GPRs). It provides for word and doubleword operand loads and stores between storage and a set of 32 Floating-Point Registers (FPRs). It also provides for byte, halfword, word, and quadword operand loads and stores between storage and a set of 32 Vector Registers (VRs). It provides for doubleword and quadword operand loads and stores between storage and a set of 64 Vector-Scalar Registers (VSRs).

Signed integers are represented in two's complement form.

There are no computational instructions that modify storage; instructions that reference storage may reformat the data (e.g. load halfword algebraic). To use a storage operand in a computation and then modify the same or another storage location, the contents of the storage operand must be loaded into a register, modified, and then stored back to the target location.

Figure 2 is a logical representation of instruction processing. Figure 3 shows the registers that are defined in Book I. (A few additional registers that are available to application programs are defined in other Books, and are not shown in the figure.)

[Diagram. Blocks: branch instruction processing; GPR-based, FPR-based, VR-based, and VSR-based instruction processing; storage.]

Figure 2. Logical processing model

Register              Bits      Described in
CR                    32:63     "Condition Register" on page 30
LR                    0:63      "Link Register" on page 31
CTR                   0:63      "Count Register" on page 31
GPR 0 ... GPR 31      0:63      "General Purpose Registers" on page 42
XER                   0:63      "Fixed-Point Exception Register" on page 42
VRSAVE                32:63     "VR Save Register" on page 206
Category: Embedded:
SPRG4 ... SPRG7       0:63      "Software-use SPRs" on page 43
Category: Floating-Point:
FPR 0 ... FPR 31      0:63      "Floating-Point Registers" on page 107
FPSCR                 32:63     "Floating-Point Status and Control Register" on page 107
Category: Vector:
VR 0 ... VR 31        0:127     "Vector Registers" on page 205
VSCR                  96:127    "Vector Status and Control Register" on page 205
Category: Vector-Scalar Extension:
VSR 0 ... VSR 63      0:127     "Vector-Scalar Registers" on page 273
Category: SPE:
Accumulator           0:63      "Accumulator" on page 504
SPEFSCR               32:63     "Signal Processing and Embedded Floating-Point Status and Control Register" on page 504

Figure 3. Registers that are defined in Book I

1.5 Computation modes

1.5.1 Modes [Category: Server]

Processors provide two execution modes, 64-bit mode and 32-bit mode. In both of these modes, instructions that set a 64-bit register affect all 64 bits. The computational mode controls how the effective address is interpreted, how Condition Register bits and XER bits are set, how the Link Register is set by Branch instructions in which LK=1, and how the Count Register is tested by Branch Conditional instructions. Nearly all instructions are available in both modes (the only exceptions are a few instructions that are defined in Book III-S). In both modes, effective address computations use all 64 bits of the relevant registers (General Purpose Registers, Link Register, Count Register, etc.) and produce a 64-bit result. However, in 32-bit mode the high-order 32 bits of the computed effective address are ignored for the purpose of addressing storage; see Section 1.10.3 for additional details.

Programming Note
    Although instructions that set a 64-bit register affect all 64 bits in both 32-bit and 64-bit modes, operating systems often do not preserve the upper 32 bits of all registers across context switches done in 32-bit mode. For this reason, application programs operating in 32-bit mode should not assume that the upper 32 bits of the GPRs are preserved from instruction to instruction unless the operating system is known to preserve these bits.

1.5.2 Modes [Category: Embedded]

64-bit processors provide 64-bit mode and 32-bit mode. The differences between the two modes are described below. 32-bit processors provide only 32-bit mode, and do so as described at the end of this section.

- In 64-bit mode, the processor behaves as described for 64-bit mode in the Server environment; see Section 1.5.1.
- In 32-bit mode, the processor behavior depends on whether the high-order 32 bits of GPRs are implemented in 32-bit mode, as follows.
    - If these bits are implemented in 32-bit mode, the processor behaves as described for 32-bit mode in the Server environment.
    - If these bits are not implemented in 32-bit mode, the processor behaves as described for 32-bit mode in the Server Environment except for the following.
        - When an effective address is placed in a register other than the Initialize Next Instruction register (see Appendix D of Book III-E) by an instruction or event, the high-order 32 bits are set to an undefined value (see Section 1.10.3).
        - Except for instructions in the SPE category, instructions that operate on GPRs and SPRs use only the low-order 32 bits of the source GPR or SPR and produce a 32-bit result; the high-order 32 bits of target GPRs are set to an undefined value, and the high-order 32 bits of target SPRs are preserved. Instructions in the 64-Bit category are treated as illegal instructions.

          Programming Note
              The high-order 32 bits of 64-bit SPRs are not modified in 32-bit mode because for some 64-bit SPRs, such as the Thread Enable Register (see Book III-E), these bits control facilities that are active in 32-bit mode. Treating all 64-bit SPRs the same way in this regard simplifies architecture and implementation.

  Implementations may provide a means for selecting between the two treatments of the high-order 32 bits of GPRs in 32-bit mode (i.e., for selecting between the behavior described in the first sub-bullet and the behavior described in the second sub-bullet). The means, if provided, is implementation-specific (including any software synchronization requirements for changing the selection), but must be hypervisor privileged, and the hypervisor must ensure that the selection is constant for a given partition.

32-bit processors provide only 32-bit mode, and provide it as described by the second sub-bullet of the 32-bit mode bullet above.

1.6 Instruction formats

All instructions are four bytes long and word-aligned (except for VLE instructions; see Book VLE). Thus, whenever instruction addresses are presented to the processor (as in Branch instructions) the low-order two bits are ignored. Similarly, whenever the processor develops an instruction address the low-order two bits are zero.

Bits 0:5 always specify the opcode (OPCD, below). Many instructions also have an extended opcode (XO, below). The remaining bits of the instruction contain one or more fields as shown below for the different instruction formats.

The format diagrams given below show horizontally all valid combinations of instruction fields. The diagrams include instruction fields that are used only by instructions defined in Book II or in Book III.

Split Field Notation

In some cases an instruction field occupies more than one contiguous sequence of bits, or occupies one contiguous sequence of bits that are used in permuted order. Such a field is called a split field. In the format diagrams given below and in the individual instruction layouts, the name of a split field is shown in small letters, once for each of the contiguous sequences. In the RTL description of an instruction having a split field, and in certain other places where individual bits of a split field are identified, the name of the field in small letters represents the concatenation of the sequences from left to right. In all other places, the name of the field is capitalized and represents the concatenation of the sequences in some order, which need not be left to right, as described for each affected instruction.

1.6.1 I-FORM

Bits: 0 6 30 31
    OPCD LI AA LK

Figure 4. I instruction format

1.6.2 B-FORM

Bits: 0 6 11 16 30 31
    OPCD BO BI BD AA LK

Figure 5. B instruction format

1.6.3 SC-FORM

Bits: 0 6 11 16 20 27 30 31
    OPCD /// /// // LEV // 1 /
    OPCD /// /// /// /// // 1 /

Figure 6. SC instruction format

1.6.4 D-FORM

Bits: 0 6 11 16 31
    OPCD RT RA D
    OPCD RT RA SI
    OPCD RS RA D
    OPCD RS RA UI
    OPCD BF / L RA SI
    OPCD BF / L RA UI
    OPCD TO RA SI
    OPCD FRT RA D
    OPCD FRS RA D

Figure 7. D instruction format

1.6.5 DS-FORM

Bits: 0 6 11 16 30 31
    OPCD RT RA DS XO
    OPCD RS RA DS XO
    OPCD RSp RA DS XO
    OPCD FRTp RA DS XO
    OPCD FRSp RA DS XO

Figure 8. DS instruction format

1.6.6 DQ-FORM

Bits: 0 6 11 16 28 31
    OPCD RTp RA DQ ///

Figure 9. DQ instruction format

1.6.7 X-FORM

Bits: 0 6 11 16 21 31
    OPCD RT RA /// XO /
    OPCD RT RA RB XO /
    OPCD RT RA RB XO EH
    OPCD RT RA NB XO /
    OPCD RT / SR /// XO /
    OPCD RT /// RB XO /
    OPCD RT /// RB XO 1
    OPCD RT /// /// XO /
    OPCD RS RA RB XO Rc
    OPCD RT RA RB XO Rc
    OPCD RS RA RB XO 1
    OPCD RS RA RB XO /
    OPCD RS RA NB XO /
    OPCD RS RA SH XO Rc
    OPCD RS RA /// XO Rc
    OPCD RS RA /// XO /
    OPCD RS / SR /// XO /
    OPCD RS /// RB XO /
    OPCD RS /// /// XO /
    OPCD RS /// L /// XO /
    OPCD TH RA RB XO /
    OPCD BF / L RA RB XO /
    OPCD BF // FRA FRB XO /
    OPCD BF // BFA // /// XO /
    OPCD BF // /// W U / XO Rc
    OPCD BF // /// /// XO /
    OPCD TH RA RB XO /
    OPCD / CT /// /// XO /
    OPCD / CT RA RB XO /
    OPCD /// L RA RB XO /
    OPCD /// L /// RB XO /
    OPCD /// L /// /// XO /
    OPCD TO RA RB XO /
    OPCD FRT RA RB XO /
    OPCD FRT FRA FRB XO /
    OPCD FRTp RA RB XO /
    OPCD FRT /// FRB XO Rc
    OPCD FRT /// FRBp XO Rc
    OPCD FRT /// /// XO Rc
    OPCD FRTp /// FRB XO Rc
    OPCD FRTp /// FRBp XO Rc
    OPCD FRTp FRA FRBp XO Rc
    OPCD FRTp FRAp FRBp XO Rc
    OPCD BF // FRA FRBp XO /
    OPCD BF // FRAp FRBp XO /
    OPCD FRT S FRB XO Rc
    OPCD FRTp S FRBp XO Rc
    OPCD FRS RA RB XO /
    OPCD FRSp RA RB XO /
    OPCD BT /// /// XO Rc
    OPCD /// RA RB XO /
    OPCD /// /// RB XO /
    OPCD /// /// /// XO /
    OPCD /// /// E /// XO /
    OPCD // IH /// /// XO /
    OPCD /// RA RB XO 1
    OPCD /// WC /// /// XO /
    OPCD /// T RA RB XO /
    OPCD VRT RA RB XO /
    OPCD VRS RA RB XO /
    OPCD MO /// /// XO /

Figure 10. X Instruction Format

1.6.8 XL-FORM

Bits: 0 6 11 16 21 31
    OPCD BT BA BB XO /
    OPCD BO BI /// BH XO LK
    OPCD BF // BFA // /// XO /
    OPCD /// /// /// XO /
    OPCD OC XO /

Figure 11. XL instruction format

1.6.9 XFX-FORM

Bits: 0 6 11 21 31
    OPCD RT spr XO /
    OPCD RT tbr XO /
    OPCD RT 0 /// XO /
    OPCD RT 1 FXM / XO /
    OPCD RT dcr XO /
    OPCD RT pmrn XO /
    OPCD DUI DUIS XO /
    OPCD RS 0 FXM / XO /
    OPCD RS 1 FXM / XO /
    OPCD RS spr XO /
    OPCD RS dcr XO /
    OPCD RS pmrn XO /

Figure 12. XFX instruction format

1.6.10 XFL-FORM

Bits: 0 6 7 15 16 21 31
    OPCD L FLM W FRB XO Rc

Figure 13. XFL instruction format

1.6.11 XX1-FORM

Bits: 0 6 11 16 21 31
    OPCD T RA RB XO TX
    OPCD S RA RB XO SX

Figure 14. XX1 Instruction Format

1.6.12 XX2-FORM

Bits: 0 6 11 14 16 21 30 31
    OPCD T /// B XO BX TX
    OPCD T /// UIM B XO BX TX

Figure 15. XX2 Instruction Format

1.6.13 XX3-FORM

Bits: 0 6 9 11 14 16 21 22 23 29 30 31
    OPCD T A B XO AX BX TX
    OPCD T A B Rc XO AX BX TX
    OPCD BF // A B XO AX BX /
    OPCD T A B XO SHW XO AX BX TX
    OPCD T A B XO DM XO AX BX TX

Figure 16. XX3 Instruction Format

1.6.14 XX4-FORM

Bits: 0 6 11 16 21 26 28 29 30 31
    OPCD T A B C XO CX AX BX TX

Figure 17. XX4-Form Instruction Format

1.6.15 XS-FORM

Bits: 0 6 11 16 21 30 31
    OPCD RS RA sh XO sh Rc

Figure 18. XS instruction format

1.6.16 XO-FORM

Bits: 0 6 11 16 21 22 31
    OPCD RT RA RB OE XO Rc
    OPCD RT RA RB / XO Rc
    OPCD RT RA RB / XO /
    OPCD RT RA /// OE XO Rc

Figure 19. XO instruction format

1.6.17 A-FORM

Bits: 0 6 11 16 21 26 31
    OPCD FRT FRA FRB FRC XO Rc
    OPCD FRT FRA FRB /// XO Rc
    OPCD FRT FRA /// FRC XO Rc
    OPCD FRT /// FRB /// XO Rc
    OPCD RT RA RB BC XO /

Figure 20. A instruction format

1.6.18 M-FORM

Bits: 0 6 11 16 21 26 31
    OPCD RS RA RB MB ME Rc
    OPCD RS RA SH MB ME Rc

Figure 21. M instruction format

1.6.19 MD-FORM

Bits: 0 6 11 16 21 27 30 31
    OPCD RS RA sh mb XO sh Rc
    OPCD RS RA sh me XO sh Rc

Figure 22. MD instruction format

1.6.20 MDS-FORM

Bits: 0 6 11 16 21 27 31
    OPCD RS RA RB mb XO Rc
    OPCD RS RA RB me XO Rc

Figure 23. MDS instruction format

1.6.21 VA-FORM

Bits: 0 6 11 16 21 26 31
    OPCD VRT VRA VRB VRC XO
    OPCD VRT VRA VRB / SHB XO

Figure 24. VA instruction format

1.6.22 VC-FORM

Bits: 0 6 11 16 21 22 31
    OPCD VRT VRA VRB Rc XO

Figure 25. VC instruction format

1.6.23 VX-FORM

Bits: 0 6 11 16 21 31
    OPCD VRT VRA VRB XO
    OPCD VRT /// VRB XO
    OPCD VRT UIM VRB XO
    OPCD VRT / UIM VRB XO
    OPCD VRT // UIM VRB XO
    OPCD VRT /// UIM VRB XO
    OPCD VRT SIM /// XO
    OPCD VRT /// XO
    OPCD /// VRB XO

Figure 26. VX instruction format

1.6.24 EVX-FORM

Bits: 0 6 11 16 21 31
    OPCD RS RA RB XO
    OPCD RS RA UI XO
    OPCD RT /// RB XO
    OPCD RT RA RB XO
    OPCD RT RA /// XO
    OPCD RT UI RB XO
    OPCD BF // RA RB XO
    OPCD RT RA UI XO
    OPCD RT SI /// XO

Figure 27. EVX instruction format

1.6.25 EVS-FORM

Bits: 0 6 11 16 21 29 31
    OPCD RT RA RB XO BFA

Figure 28. EVS instruction format

1.6.26 Z22-FORM

Bits: 0 6 11 15 16 22 31
    OPCD BF // FRA DCM XO /
    OPCD BF // FRAp DCM XO /
    OPCD BF // FRA DGM XO /
    OPCD BF // FRAp DGM XO /
    OPCD FRT FRA SH XO Rc
    OPCD FRTp FRAp SH XO Rc

Figure 29. Z22 instruction format

1.6.27 Z23-FORM

Bits: 0 6 11 16 21 23 31
    OPCD FRT TE FRB RMC XO Rc
    OPCD FRTp TE FRBp RMC XO Rc
    OPCD FRT FRA FRB RMC XO Rc
    OPCD FRTp FRA FRBp RMC XO Rc
    OPCD FRTp FRAp FRBp RMC XO Rc
    OPCD FRT /// R FRB RMC XO Rc
    OPCD FRTp /// R FRBp RMC XO Rc

Figure 30. Z23 instruction format

1.6.28 Instruction Fields

AA (30)
    Absolute Address bit.
    0   The immediate field represents an address relative to the current instruction address. For I-form branches the effective address of the branch target is the sum of the LI field sign-extended to 64 bits and the address of the branch instruction. For B-form branches the effective address of the branch target is the sum of the BD field sign-extended to 64 bits and the address of the branch instruction.
    1   The immediate field represents an absolute address. For I-form branches the effective address of the branch target is the LI field sign-extended to 64 bits. For B-form branches the effective address of the branch target is the BD field sign-extended to 64 bits.

AX (29) & A (11:15)
    Fields that are concatenated to specify a VSR to be used as a source.

BA (11:15)
    Field used to specify a bit in the CR to be used as a source.

BB (16:20)
    Field used to specify a bit in the CR to be used as a source.

BC (21:25)
    Field used to specify a bit in the CR to be used as a source.

BD (16:29)
    Immediate field used to specify a 14-bit signed two's complement branch displacement which is concatenated on the right with 0b00 and sign-extended to 64 bits.

BF (6:8)
    Field used to specify one of the CR fields or one of the FPSCR fields to be used as a target.

BFA (11:13 or 29:31)
    Field used to specify one of the CR fields or one of the FPSCR fields to be used as a source.

BH (19:20)
    Field used to specify a hint in the Branch Conditional to Link Register and Branch Conditional to Count Register instructions. The encoding is described in Section 2.4, "Branch Instructions".

BI (11:15)
    Field used to specify a bit in the CR to be tested by a Branch Conditional instruction.

BO (6:10)
    Field used to specify options for the Branch Conditional instructions. The encoding is described in Section 2.4, "Branch Instructions".

BT (6:10)
    Field used to specify a bit in the CR or in the FPSCR to be used as a target.

BX (30) & B (16:20)
    Fields that are concatenated to specify a VSR to be used as a source.

CT (7:10)
    Field used in X-form instructions to specify a cache target (see Section 4.3.2 of Book II).

CX (28) & C (21:25)
    Fields that are concatenated to specify a VSR to be used as a source.

D (16:31)
    Immediate field used to specify a 16-bit signed two's complement integer which is sign-extended to 64 bits.

DCM (16:21)
    Immediate field used as the Data Class Mask.

DCR (11:20)
    Field used by the Move To/From Device Control Register instructions (see Book III-E).

DGM (16:21)
    Immediate field used as the Data Group Mask.

DM (24:25)
    Immediate field used by xxpermdi instruction as doubleword permute control.

DQ (16:27)
    Immediate field used to specify a 12-bit signed two's complement integer which is concatenated on the right with 0b0000 and sign-extended to 64 bits.

DS (16:29)
    Immediate field used to specify a 14-bit signed two's complement integer which is concatenated on the right with 0b00 and sign-extended to 64 bits.

DUI (6:10)
    Field used by the dnh instruction (see Book II).

DUIS (11:20)
    Field used by the dnh instruction (see Book II).

E (16)
    Field used by the Write MSR External Enable instruction (see Book III-E).

EH (31)
    Field used to specify a hint in the Load and Reserve instructions. The meaning is described in Section 4.4.2, "Load and Reserve and Store Conditional Instructions", in Book II.

FLM (7:14)
    Field mask used to identify the FPSCR fields that are to be updated by the mtfsf instruction.

FRA (11:15)
    Field used to specify an FPR to be used as a source.

FRAp (11:15)
    Field used to specify an even/odd pair of FPRs to be concatenated and used as a source.

FRB (16:20)
    Field used to specify an FPR to be used as a source.

FRBp (16:20)
    Field used to specify an even/odd pair of FPRs to be concatenated and used as a source.

FRC (21:25)
    Field used to specify an FPR to be used as a source.

FRS (6:10)
    Field used to specify an FPR to be used as a source.

FRSp (6:10)
    Field used to specify an even/odd pair of FPRs to be concatenated and used as a source.

FRT (6:10)
    Field used to specify an FPR to be used as a target.

FRTp (6:10)
    Field used to specify an even/odd pair of FPRs to be concatenated and used as a target.

FXM (12:19)
    Field mask used to identify the CR fields that are to be written by the mtcrf and mtocrf instructions, or read by the mfocrf instruction.

IH (8:10)
    Field used to specify a hint in the SLB Invalidate All instruction. The meaning is described in Section 5.9.3.1, "SLB Management Instructions", in Book III-S.

L (6)
    Field used to specify whether the mtfsf instruction updates the entire FPSCR.

L (10 or 15)
    Field used to specify whether a fixed-point Compare instruction is to compare 64-bit numbers or 32-bit numbers.
    Field used by the Data Cache Block Flush instruction (see Section 4.3.2 of Book II).
    Field used by the Move To Machine State Register and TLB Invalidate Entry instructions (see Book III).

L (9:10)
    Field used by the Synchronize instruction (see Section 4.4.1 of Book II).

LEV (20:26)
    Field used by the System Call instruction.

LI (6:29)
    Immediate field used to specify a 24-bit signed two's complement integer which is concatenated on the right with 0b00 and sign-extended to 64 bits.

LK (31)
    LINK bit.
    0   Do not set the Link Register.
    1   Set the Link Register. The address of the instruction following the Branch instruction is placed into the Link Register.

MB (21:25) and ME (26:30)
    Fields used in M-form instructions to specify a 64-bit mask consisting of 1-bits from bit MB+32 through bit ME+32 inclusive and 0-bits elsewhere, as described in Section 3.3.13, "Fixed-Point Rotate and Shift Instructions" on page 87.

MB (21:26)
    Field used in MD-form and MDS-form instructions to specify the first 1-bit of a 64-bit mask, as described in Section 3.3.13, "Fixed-Point Rotate and Shift Instructions" on page 87.

ME (21:26)
    Field used in MD-form and MDS-form instructions to specify the last 1-bit of a 64-bit mask, as described in Section 3.3.13, "Fixed-Point Rotate and Shift Instructions" on page 87.

MO (6:10)
    Field used in X-form instructions to specify a subset of storage accesses.

NB (16:20)
    Field used to specify the number of bytes to move in an immediate Move Assist instruction.

OC (6:20)
    Field used by the Embedded Hypervisor Privilege instruction.

OPCD (0:5)
    Primary opcode field.

OE (21)
    Field used by XO-form instructions to enable setting OV and SO in the XER.

PMRN (11:20)
    Field used to specify a Performance Monitor Register for the mfpmr and mtpmr instructions.

R (15)
    Immediate field that specifies whether the RMC is specifying the primary or secondary encoding.

RA (11:15)
    Field used to specify a GPR to be used as a source or as a target.

RB (16:20)
    Field used to specify a GPR to be used as a source.

Rc (21 or 31)
    RECORD bit.
    0   Do not alter the Condition Register.
    1   Set Condition Register Field 0, Field 1, or Field 6 as described in Section 2.3.1, "Condition Register" on page 30.

RMC (21:22)
    Immediate field used for DFP rounding mode control.

RS (6:10)
    Field used to specify a GPR to be used as a source.

RSp (6:10)
    Field used to specify an even/odd pair of GPRs to be concatenated and used as a source.

RT (6:10)
    Field used to specify a GPR to be used as a target.

RTp (6:10)
    Field used to specify an even/odd pair of GPRs to be concatenated and used as a target.

S (11)
    Immediate field that specifies signed versus unsigned conversion.
SH (16:20, or 16:20 and 30, or 16:21)
    Field used to specify a shift amount.

SHB (22:25)
    Field used to specify a shift amount in bytes.

SHW (24:25)
    Field used to specify a shift amount in words.

SI (16:31 or 11:15)
    Immediate field used to specify a 16-bit signed integer.

SIM (11:15)
    Immediate field used to specify a 5-bit signed integer.

SP (11:12)
    Immediate field that specifies signed versus unsigned conversion.

SPR (11:20)
    Field used to specify a Special Purpose Register for the mtspr and mfspr instructions.

SR (12:15)
    Field used by the Segment Register Manipulation instructions (see Book III-S).

SX (31) & S (6:10)
    Fields that are concatenated to specify a VSR to be used as a source.

T (9:10)
    Field used to specify the type of invalidation done by a TLB Invalidate Local instruction (see Book III-E).

TBR (11:20)
    Field used by the Move From Time Base instruction (see Section 5.2.1 of Book II).

TE (11:15)
    Immediate field that specifies a DFP exponent.

TH (6:10)
    Field used by the data stream variant of the dcbt and dcbtst instructions (see Section 4.3.2 of Book II).

TO (6:10)
    Field used to specify the conditions on which to trap. The encoding is described in Section 3.3.10, "Fixed-Point Trap Instructions" on page 76.

TX (31) & T (6:10)
    Fields that are concatenated to specify a VSR to be used as a target.

U (16:19)
    Immediate field used as the data to be placed into a field in the FPSCR.

UI (11:15, 16:20, or 16:31)
    Immediate field used to specify an unsigned integer.

UIM (11:15, 12:15, 13:15, 14:15)
    Immediate field used to specify an unsigned integer.

VRA (11:15)
    Field used to specify a VR to be used as a source.

VRB (16:20)
    Field used to specify a VR to be used as a source.

VRC (21:25)
    Field used to specify a VR to be used as a source.

VRS (6:10)
    Field used to specify a VR to be used as a source.

VRT (6:10)
    Field used to specify a VR to be used as a target.

W (15)
    Field used by the mtfsfi and mtfsf instructions to specify the target word in the FPSCR.

WC (9:10)
    Field used to specify the condition or conditions that cause instruction execution to resume after executing a wait [Category: Wait] instruction (see Section 4.4.4 of Book II).

XO (21, 21:28, 21:29, 21:30, 21:31, 22:28, 22:30, 22:31, 23:30, 24:28, 26:27, 26:30, 26:31, 27:29, 27:30, or 30:31)
    Extended opcode field.

1.7 Classes of Instructions

An instruction falls into exactly one of the following three classes:

- Defined
- Illegal
- Reserved

The class is determined by examining the opcode, and the extended opcode if any. If the opcode, or combination of opcode and extended opcode, is not that of a defined instruction or a reserved instruction, the instruction is illegal.

1.7.1 Defined Instruction Class

This class of instructions contains all the instructions defined in this document.

A defined instruction can have preferred and/or invalid forms, as described in Section 1.8.1, "Preferred Instruction Forms" and Section 1.8.2, "Invalid Instruction Forms". Instructions that are part of a category that is not supported are treated as illegal instructions.

1.7.2 Illegal Instruction Class

This class of instructions contains the set of instructions described in Appendix D of Book Appendices. Illegal instructions are available for future extensions of the Power ISA; that is, some future version of the Power ISA may define any of these instructions to perform new functions.

Any attempt to execute an illegal instruction will cause the system illegal instruction error handler to be invoked and will have no other effect.

An instruction consisting entirely of binary 0s is guaranteed always to be an illegal instruction. This increases the probability that an attempt to execute data or uninitialized storage will result in the invocation of the system illegal instruction error handler.
1.7.3 Reserved Instruction Class

This class of instructions contains the set of instructions described in Appendix E of Book Appendices.

Reserved instructions are allocated to specific purposes that are outside the scope of the Power ISA.

Any attempt to execute a reserved instruction will:

- perform the actions described by the implementation if the instruction is implemented; or
- cause the system illegal instruction error handler to be invoked if the instruction is not implemented.

1.8 Forms of Defined Instructions

1.8.1 Preferred Instruction Forms

Some of the defined instructions have preferred forms. For such an instruction, the preferred form will execute in an efficient manner, but any other form may take significantly longer to execute than the preferred form.

Instructions having preferred forms are:

- the Condition Register Logical instructions
- the Load/Store Multiple instructions
- the Load/Store String instructions
- the Or Immediate instruction (preferred form of no-op)
- the Move To Condition Register Fields instruction

1.8.2 Invalid Instruction Forms

Some of the defined instructions can be coded in a form that is invalid. An instruction form is invalid if one or more fields of the instruction, excluding the opcode field(s), are coded incorrectly in a manner that can be deduced by examining only the instruction encoding.

In general, any attempt to execute an invalid form of an instruction will either cause the system illegal instruction error handler to be invoked or yield boundedly undefined results. Exceptions to this rule are stated in the instruction descriptions.

Some instruction forms are invalid because the instruction contains a reserved value in a defined field (see Section 1.3.3 on page 6); these invalid forms are not discussed further. All other invalid forms are identified in the instruction descriptions.

References to instructions elsewhere in this document assume the instruction form is not invalid, unless otherwise stated or obvious from context.

Assembler Note

    Assemblers should report uses of invalid instruction forms as errors.

1.8.3 Reserved-no-op Instructions [Category: Phased-In]

Reserved-no-op instructions include the following extended opcodes under primary opcode 31: 530, 562, 594, 626, 658, 690, 722, and 754.

Reserved-no-op instructions are provided in the architecture to anticipate the eventual adoption of performance hint instructions to the architecture. For these instructions, which cause no visible change to architected state, employing a reserved-no-op opcode will allow software to use this new capability on new implementations that support it while remaining compatible with existing implementations that may not support the new function.

When a reserved-no-op instruction is executed, no operation is performed.

Reserved-no-op instructions are not assigned instruction names or mnemonics. There are no individual descriptions of reserved-no-op instructions in this document.

1.9 Exceptions

There are two kinds of exception, those caused directly by the execution of an instruction and those caused by an asynchronous event. In either case, the exception may cause one of several components of the system software to be invoked.

The exceptions that can be caused directly by the execution of an instruction include the following:

- an attempt to execute an illegal instruction, or an attempt by an application program to execute a "privileged" instruction (see Book III) (system illegal instruction error handler or system privileged instruction error handler)
- the execution of a defined instruction using an invalid form (system illegal instruction error handler or system privileged instruction error handler)
- an attempt to execute an instruction that is not provided by the implementation (system illegal instruction error handler)
- an attempt to access a storage location that is unavailable (system instruction storage error handler or system data storage error handler)
- an attempt to access storage with an effective address alignment that is invalid for the instruction (system alignment error handler)
- the execution of a System Call instruction (system service program)
- the execution of a Trap instruction that traps (system trap handler)
- the execution of a floating-point instruction that causes a floating-point enabled exception to exist (system floating-point enabled exception error handler)
- the execution of an auxiliary processor instruction that causes an auxiliary processor enabled exception to exist (system auxiliary processor enabled exception error handler)

The exceptions that can be caused by an asynchronous event are described in Book III.

The invocation of the system error handler is precise, except that the invocation of the auxiliary processor enabled exception error handler may be imprecise, and if one of the imprecise modes for invoking the system floating-point enabled exception error handler is in effect (see page 115), then the invocation of the system floating-point enabled exception error handler may also be imprecise. When the system error handler is invoked imprecisely, the excepting instruction does not appear to complete before the next instruction starts (because one of the effects of the excepting instruction, namely the invocation of the system error handler, has not yet occurred).

Additional information about exception handling can be found in Book III.

1.10 Storage Addressing

A program references storage using the effective address computed by the processor when it executes a Storage Access or Branch instruction (or certain other instructions described in Book II and Book III), or when it fetches the next sequential instruction.

Bytes in storage are numbered consecutively starting with 0. Each number is the address of the corresponding byte.

The byte ordering (Big-Endian or Little-Endian) for a storage access is specified by the operating system. In the Embedded environment this ordering is a page attribute (see Book II) and is specified independently for each virtual page, while in the Server environment it is a mode (see Book III-S) and applies to all storage.

1.10.1 Storage Operands

A storage operand may be a byte, a halfword, a word, a doubleword, or a quadword, or, for the Load/Store Multiple and Move Assist instructions, a sequence of bytes (Move Assist) or words (Load/Store Multiple). For example, the storage operand of a Load Floating-Point Double Pair instruction is a quadword, not two doublewords. The address of a storage operand is the address of its first byte (i.e., of its lowest-numbered byte). An instruction for which the storage operand is a byte is said to cause a byte access, and similarly for halfword, word, doubleword, and quadword.

Operand length is implicit for each instruction.

The storage operand of a single-register Storage Access instruction or quadword Load or Store instruction has a "natural" alignment boundary equal to the operand length. In other words, the "natural" address of such an operand is an integral multiple of the operand length. Such an operand is said to be aligned if it is aligned at its natural boundary; otherwise it is said to be unaligned. See the following table.

    Operand       Length      Addr_{60:63} if aligned
    Byte          8 bits      xxxx
    Halfword      2 bytes     xxx0
    Word          4 bytes     xx00
    Doubleword    8 bytes     x000
    Quadword      16 bytes    0000

    Note: An "x" in an address bit position indicates that the bit can be 0 or 1 independent of the contents of other bits in the address.

The concept of alignment is also applied more generally, to any datum in storage. For example, a 12-byte datum in storage is said to be word-aligned if its address is an integral multiple of 4.

Some instructions require their storage operands to have certain alignments. In addition, alignment may affect performance. For single-register Storage Access instructions and quadword Load and Store instructions, the best performance is obtained when storage operands are aligned. Additional effects of data placement on performance are described in Chapter 2 of Book II.

When a storage operand of length N bytes starting at effective address EA is copied between storage and a register that is R bytes long (i.e., the register contains bytes numbered from 0, most significant, through R-1, least significant), the bytes of the operand are placed into the register or into storage in a manner that depends on the byte ordering for the storage access as shown in Figure 31, unless otherwise specified in the instruction description.

    Big-Endian Byte Ordering
        Load                                  Store
        for i=0 to N-1:                       for i=0 to N-1:
          RT_{(R-N)+i} ← MEM(EA+i,1)            MEM(EA+i,1) ← (RS)_{(R-N)+i}

    Little-Endian Byte Ordering
        Load                                  Store
        for i=0 to N-1:                       for i=0 to N-1:
          RT_{(R-1)-i} ← MEM(EA+i,1)            MEM(EA+i,1) ← (RS)_{(R-1)-i}

    Notes:
    1. In this table, subscripts refer to bytes in a register rather than to bits as defined in Section 1.3.2.
    2. This table does not apply to the lvebx, lvehx, lvewx, stvebx, stvehx, and stvewx instructions.

Figure 31. Storage operands and byte ordering

Figure 32 shows an example of a C language structure s containing an assortment of scalars and one character string. The value assumed to be in each structure element is shown in hex in the C comments; these values are used below to show how the bytes making up each structure element are mapped into storage. It is assumed that structure s is compiled for 32-bit mode or for a 32-bit implementation. (This affects the length of the pointer to c.)

C structure mapping rules permit the use of padding (skipped bytes) in order to align the scalars on desirable boundaries. Figures 33 and 34 show each scalar aligned at its natural boundary. This alignment introduces padding of four bytes between a and b, one byte between d and e, and two bytes between e and f. The same amount of padding is present for both Big-Endian and Little-Endian mappings.

The Big-Endian mapping of structure s is shown in Figure 33. Addresses are shown in hex at the left of each doubleword, and in small figures below each byte. The contents of each byte, as indicated in the C example in Figure 32, are shown in hex (as characters for the elements of the string).

The Little-Endian mapping of structure s is shown in Figure 34. Doublewords are shown laid out from right to left, which is the common way of showing storage maps for processors that implement only Little-Endian byte ordering.

    struct {
        int    a;     /* 0x1112_1314            word           */
        double b;     /* 0x2122_2324_2526_2728  doubleword     */
        char  *c;     /* 0x3132_3334            word           */
        char   d[7];  /* 'A', 'B', 'C', 'D', 'E', 'F', 'G'  array of bytes */
        short  e;     /* 0x5152                 halfword       */
        int    f;     /* 0x6162_6364            word           */
    } s;

Figure 32. C structure 's', showing values of elements

    Addr   +0   +1   +2   +3   +4   +5   +6   +7
    00     11   12   13   14   --   --   --   --
    08     21   22   23   24   25   26   27   28
    10     31   32   33   34   'A'  'B'  'C'  'D'
    18     'E'  'F'  'G'  --   51   52   --   --
    20     61   62   63   64

    (-- marks a padding byte)

Figure 33. Big-Endian mapping of structure 's'

    +7   +6   +5   +4   +3   +2   +1   +0   Addr
    --   --   --   --   11   12   13   14   00
    21   22   23   24   25   26   27   28   08
    'D'  'C'  'B'  'A'  31   32   33   34   10
    --   --   51   52   --   'G'  'F'  'E'  18
                        61   62   63   64   20

    (-- marks a padding byte)

Figure 34. Little-Endian mapping of structure 's'

1.10.2 Instruction Fetches

Instructions are always four bytes long and word-aligned (except for VLE instructions; see Book VLE).

When an instruction starting at effective address EA is fetched from storage, the relative order of the bytes within the instruction depends on the byte ordering for the storage access as shown in Figure 35.

    Big-Endian Byte Ordering
        for i=0 to 3:
          inst_i ← MEM(EA+i,1)

    Little-Endian Byte Ordering
        for i=0 to 3:
          inst_{3-i} ← MEM(EA+i,1)

    Note: In this table, subscripts refer to bytes of the instruction rather than to bits as defined in Section 1.3.2.

Figure 35. Instructions and byte ordering

Figure 36 shows an example of a small assembly language program p.

    loop:
        cmplwi r5,0
        beq    done
        lwzux  r4,r5,r6
        add    r7,r7,r4
        subi   r5,r5,4
        b      loop
    done:
        stw    r7,total

Figure 36. Assembly language program 'p'

The Big-Endian mapping of program p is shown in Figure 37 (assuming the program starts at address 0).

    Addr   Bytes +0 to +3            Bytes +4 to +7
    00     loop: cmplwi r5,0         beq done
    08     lwzux r4,r5,r6            add r7,r7,r4
    10     subi r5,r5,4              b loop
    18     done: stw r7,total

Figure 37. Big-Endian mapping of program 'p'

The Little-Endian mapping of program p is shown in Figure 38.

    Bytes +7 to +4            Bytes +3 to +0            Addr
    beq done                  loop: cmplwi r5,0         00
    add r7,r7,r4              lwzux r4,r5,r6            08
    b loop                    subi r5,r5,4              10
                              done: stw r7,total        18

Figure 38. Little-Endian mapping of program 'p'
Programming Note

    The terms Big-Endian and Little-Endian come from Part I, Chapter 4, of Jonathan Swift's Gulliver's Travels. Here is the complete passage, from the edition printed in 1734 by George Faulkner in Dublin.

    ... our Histories of six Thousand Moons make no Mention of any other Regions, than the two great Empires of Lilliput and Blefuscu. Which two mighty Powers have, as I was going to tell you, been engaged in a most obstinate War for six and thirty Moons past. It began upon the following Occasion. It is allowed on all Hands, that the primitive Way of breaking Eggs before we eat them, was upon the larger End: But his present Majesty's Grand-father, while he was a Boy, going to eat an Egg, and breaking it according to the ancient Practice, happened to cut one of his Fingers. Whereupon the Emperor his Father, published an Edict, commanding all his Subjects, upon great Penalties, to break the smaller End of their Eggs. The People so highly resented this Law, that our Histories tell us, there have been six Rebellions raised on that Account; wherein one Emperor lost his Life, and another his Crown. These civil Commotions were constantly fomented by the Monarchs of Blefuscu; and when they were quelled, the Exiles always fled for Refuge to that Empire. It is computed that eleven Thousand Persons have, at several Times, suffered Death, rather than submit to break their Eggs at the smaller End. Many hundred large Volumes have been published upon this Controversy: But the Books of the Big-Endians have been long forbidden, and the whole Party rendered incapable by Law of holding Employments. During the Course of these Troubles, the Emperors of Blefuscu did frequently expostulate by their Ambassadors, accusing us of making a Schism in Religion, by offending against a fundamental Doctrine of our great Prophet Lustrog, in the fifty-fourth Chapter of the Brundrecal, (which is their Alcoran.) This, however, is thought to be a mere Strain upon the text: For the Words are these; That all true Believers shall break their Eggs at the convenient End: and which is the convenient End, seems, in my humble Opinion, to be left to every Man's Conscience, or at least in the Power of the chief Magistrate to determine. Now the Big-Endian Exiles have found so much Credit in the Emperor of Blefuscu's Court; and so much private Assistance and Encouragement from their Party here at home, that a bloody War has been carried on between the two Empires for six and thirty Moons with various Success; during which Time we have lost Forty Capital Ships, and a much greater Number of smaller Vessels, together with thirty thousand of our best Seamen and Soldiers; and the Damage received by the Enemy is reckoned to be somewhat greater than ours. However, they have now equipped a numerous Fleet, and are just preparing to make a Descent upon us: and his Imperial Majesty, placing great Confidence in your Valour and Strength, hath commanded me to lay this Account of his Affairs before you.

1.10.3 Effective Address Calculation

An effective address is computed by the processor when executing a Storage Access or Branch instruction (or certain other instructions described in Book II, Book III, and Book VLE), when fetching the next sequential instruction, or when invoking a system error handler. The following provides an overview of this process. More detail is provided in the individual instruction descriptions.

Effective address calculations, for both data and instruction accesses, use 64-bit two's complement addition. All 64 bits of each address component participate in the calculation regardless of mode (32-bit or 64-bit). In this computation one operand is an address (which is by definition an unsigned number) and the second is a signed offset. Carries out of the most significant bit are ignored.

In 64-bit mode, the entire 64-bit result comprises the 64-bit effective address. The effective address arithmetic wraps around from the maximum address, 2^64 - 1, to address 0, except that if the current instruction is at effective address 2^64 - 4 the effective address of the next sequential instruction is undefined.

In 32-bit mode, the low-order 32 bits of the 64-bit result, preceded by 32 0 bits, comprise the 64-bit effective address for the purpose of addressing storage. When an effective address is placed into a register by an instruction or event, the value placed into the high-order 32 bits of the register differs between the Server environment and the Embedded environment.

- Server environment, and Embedded environment when the high-order 32 bits of GPRs are implemented:
    - Load with Update and Store with Update instructions set the high-order 32 bits of register RA to the high-order 32 bits of the 64-bit result.
    - In all other cases (e.g., the Link Register when set by Branch instructions having LK=1, Special Purpose Registers when set to an effective address by invocation of a system error handler) the high-order 32 bits of the register are set to 0s except as described in the last sentence of this paragraph.
- Embedded environment when the high-order 32 bits of GPRs are not implemented for the following cases:
  The high-order 32 bits of the register are set to an undefined value except for the Initialize Next Instruction register [Category: Embedded.Multithreading] (see Section 1.5.2 and Book III), and for the following case. For a register that is loaded with an effective address by the invocation of a system error handler, the high-order 32 bits of the register are set to 0s if the computation mode is 64-bit after the system error is invoked.

The 64-bit current instruction address is not affected by a change from 32-bit mode to 64-bit mode, but is affected by a change from 64-bit mode to 32-bit mode. In the latter case, the high-order 32 bits are set to 0. The same is true for the 64-bit next instruction address, except as described in the last item of the list below.

As used to address storage, the effective address arithmetic appears to wrap around from the maximum address, 2^32 - 1, to address 0, except that if the current instruction is at effective address 2^32 - 4 the effective address of the next sequential instruction is undefined.

RA is a field in the instruction which specifies an address component in the computation of an effective address. A zero in the RA field indicates the absence of the corresponding address component. A value of zero is substituted for the absent component of the effective address computation. This substitution is shown in the instruction descriptions as (RA|0).

Effective addresses are computed as follows. In the descriptions below, it should be understood that "the contents of a GPR" refers to the entire 64-bit contents, independent of mode, but that in 32-bit mode only bits 32:63 of the 64-bit result of the computation are used to address storage.

- With X-form instructions, in computing the effective address of a data element, the contents of the GPR designated by RB (or the value zero for lswi and stswi) are added to the contents of the GPR designated by RA or to zero if RA=0 or RA is not used in forming the EA.
- With D-form instructions, the 16-bit D field is sign-extended to form a 64-bit address component. In computing the effective address of a data element, this address component is added to the contents of the GPR designated by RA or to zero if RA=0.
- With DS-form instructions, the 14-bit DS field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address component. In computing the effective address of a data element, this address component is added to the contents of the GPR designated by RA or to zero if RA=0.
- With I-form Branch instructions, the 24-bit LI field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address component. If AA=0, this address component is added to the address of the Branch instruction to form the effective address of the target instruction. If AA=1, this address component is the effective address of the target instruction.
- With B-form Branch instructions, the 14-bit BD field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address component. If AA=0, this address component is added to the address of the Branch instruction to form the effective address of the target instruction. If AA=1, this address component is the effective address of the target instruction.
- With XL-form Branch instructions, bits 0:61 of the Link Register or the Count Register are concatenated on the right with 0b00 to form the effective address of the target instruction.
- With sequential instruction fetching, the value 4 is added to the address of the current instruction to form the effective address of the next instruction, except that if the current instruction is at the maximum instruction effective address for the mode (2^64 - 4 in 64-bit mode, 2^32 - 4 in 32-bit mode) the effective address of the next sequential instruction is undefined. (There is one other exception to this rule; this exception involves changing between 32-bit mode and 64-bit mode and is described in Section 6.3.2 of Book III-E and Section 6.3.2 of Book III-E.)

If the size of the operand of a storage access instruction is more than one byte, the effective address for each byte after the first is computed by adding 1 to the effective address of the preceding byte.

Chapter 2. Branch Facility

2.1 Branch Facility Overview
2.2 Instruction Execution Order
2.3 Branch Facility Registers
2.3.1 Condition Register
2.3.2 Link Register
2.3.3 Count Register
2.4 Branch Instructions
2.5 Condition Register Instructions
2.5.1 Condition Register Logical Instructions
2.5.2 Condition Register Field Instruction
2.6 System Call Instruction

2.1 Branch Facility Overview

This chapter describes the registers and instructions that make up the Branch Facility.

2.2 Instruction Execution Order

In general, instructions appear to execute sequentially, in the order in which they appear in storage. The exceptions to this rule are listed below.

- Branch instructions for which the branch is taken cause execution to continue at the target address specified by the Branch instruction.
- Trap instructions for which the trap conditions are satisfied, and System Call instructions, cause the appropriate system handler to be invoked.
- Exceptions can cause the system error handler to be invoked, as described in Section 1.9, "Exceptions" on page 23.
- Returning from a system service program, system trap handler, or system error handler causes execution to continue at a specified address.

The model of program execution in which the processor appears to execute one instruction at a time, completing each instruction before beginning to execute the next instruction, is called the "sequential execution model". In general, the processor obeys the sequential execution model. For the instructions and facilities defined in this Book, the only exceptions to this rule are the following.

- A floating-point exception occurs when the processor is running in one of the Imprecise floating-point exception modes (see Section 4.4). The instruction that causes the exception need not complete before the next instruction begins execution, with respect to setting exception bits and (if the exception is enabled) invoking the system error handler.
- A Store instruction modifies one or more bytes in an area of storage that contains instructions that will subsequently be executed. Before an instruction in that area of storage is executed, software synchronization is required to ensure that the instructions executed are consistent with the results produced by the Store instruction.

  Programming Note

      This software synchronization will generally be provided by system library programs (see Section 1.8 of Book II). Application programs should call the appropriate system library program before attempting to execute modified instructions.

2.3 Branch Facility Registers

2.3.1 Condition Register

The Condition Register (CR) is a 32-bit register which reflects the result of certain operations, and provides a mechanism for testing (and branching).

    [CR: bits 32 through 63]

Figure 39. Condition Register

The bits in the Condition Register are grouped into eight 4-bit fields, named CR Field 0 (CR0), ..., CR Field 7 (CR7), which are set in one of the following ways.

- Specified fields of the CR can be set by a move to the CR from a GPR (mtcrf, mtocrf).
- A specified field of the CR can be set by a move to the CR from another CR field (mcrf), from XER_{32:35} (mcrxr), or from the FPSCR (mcrfs).
- CR Field 0 can be set as the implicit result of a fixed-point instruction.
- CR Field 1 can be set as the implicit result of a floating-point instruction.
- CR Field 6 can be set as the implicit result of a vector instruction.
- A specified CR field can be set as the result of a Compare instruction.
- CR Field 1 can be set as the implicit result of a decimal floating-point instruction.

Instructions are provided to perform logical operations on individual CR bits and to test individual CR bits.

For all fixed-point instructions in which Rc=1, and for addic., andi., and andis., the first three bits of CR Field 0 (bits 32:34 of the Condition Register) are set by signed comparison of the result to zero, and the fourth bit of CR Field 0 (bit 35 of the Condition Register) is copied from the SO field of the XER. "Result" here refers to the entire 64-bit value placed into the target register in 64-bit mode, and to bits 32:63 of the 64-bit value placed into the target register in 32-bit mode.

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if (target_register)_{M:63} < 0 then c ← 0b100
    else if (target_register)_{M:63} > 0 then c ← 0b010
    else c ← 0b001
    CR0 ← c || XER_SO

If any portion of the result is undefined, then the value placed into the first three bits of CR Field 0 is undefined.

The bits of CR Field 0 are interpreted as follows.

    Bit   Description
    0     Negative (LT)
          The result is negative.
    1     Positive (GT)
          The result is positive.
    2     Zero (EQ)
          The result is zero.
    3     Summary Overflow (SO)
          This is a copy of the contents of XER_SO at the completion of the instruction.

The stbcx., sthcx., stwcx., and stdcx. instructions (see Section 4.4.2, "Load and Reserve and Store Conditional Instructions", in Book II) also set CR Field 0.

For all floating-point instructions in which Rc=1, CR Field 1 (bits 36:39 of the Condition Register) is set to the Floating-Point exception status, copied from bits 0:3 of the Floating-Point Status and Control Register. This occurs regardless of whether any exceptions are enabled, and regardless of whether the writing of the result is suppressed (see Section 4.4, "Floating-Point Exceptions" on page 114). These bits are interpreted as follows.

    Bit   Description
    0     Floating-Point Exception Summary (FX)
          This is a copy of the contents of FPSCR_FX at the completion of the instruction.
    1     Floating-Point Enabled Exception Summary (FEX)
          This is a copy of the contents of FPSCR_FEX at the completion of the instruction.
    2     Floating-Point Invalid Operation Exception Summary (VX)
          This is a copy of the contents of FPSCR_VX at the completion of the instruction.
    3     Floating-Point Overflow Exception (OX)
          This is a copy of the contents of FPSCR_OX at the completion of the instruction.

For Compare instructions, a specified CR field is set to reflect the result of the comparison. The bits of the specified CR field are interpreted as follows. A complete description of how the bits are set is given in the instruction descriptions in Section 3.3.9, "Fixed-Point Compare Instructions" on page 74, Section 4.6.8, "Floating-Point Compare Instructions" on page 148, and Section 8.3.9, "SPE Instruction Set" on page 510.

    Bit   Description
    0     Less Than, Floating-Point Less Than (LT, FL)
          For fixed-point Compare instructions, (RA) < SI or (RB) (signed comparison) or (RA) <u UI or (RB) (unsigned comparison). For floating-point Compare instructions, (FRA) < (FRB).
    1     Greater Than, Floating-Point Greater Than (GT, FG)
          For fixed-point Compare instructions, (RA) > SI or (RB) (signed comparison) or (RA) >u UI or (RB) (unsigned comparison). For floating-point Compare instructions, (FRA) > (FRB).
    2     Equal, Floating-Point Equal (EQ, FE)
          For fixed-point Compare instructions, (RA) = SI, UI, or (RB). For floating-point Compare instructions, (FRA) = (FRB).
    3     Summary Overflow, Floating-Point Unordered (SO, FU)
          For fixed-point Compare instructions, this is a copy of the contents of XER_SO at the completion of the instruction. For floating-point Compare instructions, one or both of (FRA) and (FRB) is a NaN.

2.3.2 Link Register

The Link Register (LR) is a 64-bit register. It can be used to provide the branch target address for the Branch Conditional to Link Register instruction, and it holds the return address after Branch instructions for which LK=1.

    [LR: bits 0 through 63]

Figure 40. Link Register

2.3.3 Count Register

The Count Register (CTR) is a 64-bit register. It can be used to hold a loop count that can be decremented during execution of Branch instructions that contain an appropriately coded BO field. If the value in the Count Register is 0 before being decremented, it is -1 afterward. The Count Register can also be used to provide the branch target address for the Branch Conditional to Count Register instruction.

    CTR

2.4 Branch Instructions

The sequence of instruction execution can be changed by the Branch instructions. Because all instructions are on word boundaries, bits 62 and 63 of the generated branch target address are ignored by the processor in performing the branch.

The Branch instructions compute the effective address (EA) of the target in one of the following four ways, as described in Section 1.10.3, "Effective Address Calculation" on page 27.

1. Adding a displacement to the address of the Branch instruction (Branch or Branch Conditional with AA=0).
2. Specifying an absolute address (Branch or Branch Conditional with AA=1).
3. Using the address contained in the Link Register (Branch Conditional to Link Register).
4. Using the address contained in the Count Register (Branch Conditional to Count Register).

In all four cases, in 32-bit mode the final step in the address computation is setting the high-order 32 bits of the target address to 0.

For the first two methods, the target addresses can be computed sufficiently ahead of the Branch instruction that instructions can be prefetched along the target path. For the third and fourth methods, prefetching instructions along the target path is also possible provided the Link Register or the Count Register is loaded sufficiently ahead of the Branch instruction.

Branching can be conditional or unconditional, and the return address can optionally be provided. If the return address is to be provided (LK=1), the effective address of the instruction following the Branch instruction is placed into the Link Register after the branch target address has been computed; this is done regardless of whether the branch is taken.

For Branch Conditional instructions, the BO field specifies the conditions under which the branch is taken, as shown in Figure 42.
In the figure, M=0 in 64-bit mode 0 63 and M=32 in 32-bit mode. Figure 41. Count Register Chapter 2. Branch Facility 31 Version 2.06 provides a hint about the use of the instruction, as shown in Figure 44. BO Description BH Hint 0000z Decrement the CTR, then branch if the dec- 00 bclr[l]: The instruction is a subroutine remented CTRM:630 and CRBI=0 return 0001z Decrement the CTR, then branch if the dec- bcctr[l]: The instruction is not a subroutine remented CTRM:63=0 and CRBI=0 return; the target address is likely 001at Branch if CRBI=0 to be the same as the target 0100z Decrement the CTR, then branch if the dec- address used the preceding time remented CTRM:630 and CRBI=1 the branch was taken 0101z Decrement the CTR, then branch if the dec- 01 bclr[l]: The instruction is not a subroutine remented CTRM:63=0 and CRBI=1 return; the target address is likely to be the same as the target 011at Branch if CRBI=1 address used the preceding time 1a00t Decrement the CTR, then branch if the dec- the branch was taken remented CTRM:630 bcctr[l]: Reserved 1a01t Decrement the CTR, then branch if the dec- remented CTRM:63=0 10 Reserved 1z1zz Branch always 11 bclr[l] and bcctr[l]: The target address is not predictable Notes: 1. "z" denotes a bit that is ignored. Figure 44. BH field encodings 2. The "a" and "t" bits are used as described below. Programming Note Figure 42. BO field encodings The hint provided by the BH field is independent of The "a" and "t" bits of the BO field can be used by soft- the hint provided by the "at" bits (e.g., the BH field ware to provide a hint about whether the branch is provides no indication of whether the branch is likely to be taken or is likely not to be taken, as shown likely to be taken). in Figure 43. 
at Hint Extended mnemonics for branches 00 No hint is given Many extended mnemonics are provided so that 01 Reserved Branch Conditional instructions can be coded with por- tions of the BO and BI fields as part of the mnemonic 10 The branch is very likely not to be taken rather than as part of a numeric operand. Some of 11 The branch is very likely to be taken these are shown as examples with the Branch instruc- Figure 43. "at" bit encodings tions. See Appendix E for additional extended mne- monics. Programming Note Programming Note Many implementations have dynamic mechanisms for predicting whether a branch will be taken. The hints provided by the "at" bits and by the BH Because the dynamic prediction is likely to be very field do not affect the results of executing the accurate, and is likely to be overridden by any hint instruction. provided by the "at" bits, the "at" bits should be set The "z" bits should be set to 0, because they may to 0b00 unless the static prediction implied by be assigned a meaning in some future version of at=0b10 or at=0b11 is highly likely to be correct. the architecture. For Branch Conditional to Link Register and Branch Conditional to Count Register instructions, the BH field 32 Power ISATM Book I Version 2.06 Programming Note Many implementations have dynamic mechanisms for branch to, and use a bcctr instruction (LK=0, and predicting the target addresses of bclr[l] and bcctr[l] BH=0b11 if appropriate) to branch to the selected instructions. These mechanisms may cache return address. addresses (i.e., Link Register values set by Branch Direct subroutine linkage: instructions for which LK=1 and for which the branch Here A calls B and B returns to A. The two was taken, other than the special form shown in the branches should be as follows. first example below) and recently used branch target - A calls B: use a bl or bcl instruction (LK=1). addresses. 
To obtain the best performance across the - B returns to A: use a bclr instruction (LK=0) widest range of implementations, the programmer (the return address is in, or can be restored to, should obey the following rules. the Link Register). Use Branch instructions for which LK=1 only as Indirect subroutine linkage: subroutine calls (including function calls, etc.), or in Here A calls Glue, Glue calls B, and B returns to A the special form shown in the first example below. rather than to Glue. (Such a calling sequence is Pair each subroutine call (i.e., each Branch common in linkage code used when the subroutine instruction for which LK=1 and the branch is taken, that the programmer wants to call, here B, is in a other than the special form shown in the first different module from the caller; the Binder inserts example below) with a bclr instruction that returns "glue" code to mediate the branch.) The three from the subroutine and has BH=0b00. branches should be as follows. Do not use bclrl as a subroutine call. (Some imple- - A calls Glue: use a bl or bcl instruction mentations access the return address cache at (LK=1). most once per instruction; such implementations - Glue calls B: place the address of B into the are likely to treat bclrl as a subroutine return, and Count Register, and use a bcctr instruction not as a subroutine call.) (LK=0). For bclr[l] and bcctr[l], use the appropriate value - B returns to A: use a bclr instruction (LK=0) in the BH field. (the return address is in, or can be restored to, The following are examples of programming conven- the Link Register). tions that obey these rules. In the examples, BH is assumed to contain 0b00 unless otherwise stated. In Function call: addition, the "at" bits are assumed to be coded appro- Here A calls a function, the identity of which may priately. vary from one instance of the call to another, instead of calling a specific program B. This case Let A, B, and Glue be specific programs. 
should be handled using the conventions of the Obtaining the address of the next instruction: preceding two bullets, depending on whether the Use the following form of Branch and Link. call is direct or indirect, with the following differ- bcl 20,31,$+4 ences. Loop counts: - If the call is direct, place the address of the Keep them in the Count Register, and use a bc function into the Count Register, and use a instruction (LK=0) to decrement the count and to bcctrl instruction (LK=1) instead of a bl or bcl branch back to the beginning of the loop if the dec- instruction. remented count is nonzero. - For the bcctr[l] instruction that branches to the function, use BH=0b11 if appropriate. Computed goto's, case statements, etc.: Use the Count Register to hold the address to Chapter 2. Branch Facility 33 Version 2.06 Compatibility Note The bits corresponding to the current "a" and "t" bits, and to the current "z" bits except in the "branch always" BO encoding, had different meanings in versions of the architecture that precede Version 2.00. The bit corresponding to the "t" bit was called the "y" bit. The "y" bit indicated whether to use the architected default prediction (y=0) or to use the complement of the default prediction (y=1). The default prediction was defined as follows. - If the instruction is bc[l][a] with a negative value in the displacement field, the branch is taken. (This is the only case in which the prediction corresponding to the "y" bit differs from the prediction corresponding to the "t" bit.) - In all other cases (bc[l][a] with a nonnega- tive value in the displacement field, bclr[l], or bcctr[l]), the branch is not taken. The BO encodings that test both the Count Register and the Condition Register had a "y" bit in place of the current "z" bit. The meaning of the "y" bit was as described in the preceding item. The "a" bit was a "z" bit. 
Because these bits have always been defined either to be ignored or to be treated as hints, a given program will produce the same result on any implementation regardless of the values of the bits. Also, because even the "y" bit is ignored, in practice, by most processors that comply with versions of the architecture that precede Version 2.00, the performance of a given program on those processors will not be affected by the values of the bits.

Branch I-form

    b    target_addr    (AA=0 LK=0)
    ba   target_addr    (AA=1 LK=0)
    bl   target_addr    (AA=0 LK=1)
    bla  target_addr    (AA=1 LK=1)

    18   LI   AA   LK
    0    6    30   31

    if AA then NIA ←iea EXTS(LI || 0b00)
    else       NIA ←iea CIA + EXTS(LI || 0b00)
    if LK then LR ←iea CIA + 4

target_addr specifies the branch target address.

If AA=0 then the branch target address is the sum of LI || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If AA=1 then the branch target address is the value LI || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

Branch Conditional B-form

    bc    BO,BI,target_addr    (AA=0 LK=0)
    bca   BO,BI,target_addr    (AA=1 LK=0)
    bcl   BO,BI,target_addr    (AA=0 LK=1)
    bcla  BO,BI,target_addr    (AA=1 LK=1)

    16   BO   BI   BD   AA   LK
    0    6    11   16   30   31

    if (64-bit mode)
       then M ← 0
       else M ← 32
    if ¬BO[2] then CTR ← CTR - 1
    ctr_ok ← BO[2] | ((CTR[M:63] ≠ 0) ⊕ BO[3])
    cond_ok ← BO[0] | (CR[BI+32] ≡ BO[1])
    if ctr_ok & cond_ok then
       if AA then NIA ←iea EXTS(BD || 0b00)
       else       NIA ←iea CIA + EXTS(BD || 0b00)
    if LK then LR ←iea CIA + 4

BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 42. target_addr specifies the branch target address.

If AA=0 then the branch target address is the sum of BD || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If AA=1 then the branch target address is the value BD || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    CTR (if BO[2]=0)
    LR (if LK=1)

Extended Mnemonics:

Examples of extended mnemonics for Branch Conditional:

    Extended:          Equivalent to:
    blt   target       bc  12,0,target
    bne   cr2,target   bc  4,10,target
    bdnz  target       bc  16,0,target

Branch Conditional to Link Register XL-form

    bclr   BO,BI,BH    (LK=0)
    bclrl  BO,BI,BH    (LK=1)

    19   BO   BI   ///  BH   16   LK
    0    6    11   16   19   21   31

    if (64-bit mode)
       then M ← 0
       else M ← 32
    if ¬BO[2] then CTR ← CTR - 1
    ctr_ok ← BO[2] | ((CTR[M:63] ≠ 0) ⊕ BO[3])
    cond_ok ← BO[0] | (CR[BI+32] ≡ BO[1])
    if ctr_ok & cond_ok then NIA ←iea LR[0:61] || 0b00
    if LK then LR ←iea CIA + 4

BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 42. The BH field is used as described in Figure 44. The branch target address is LR[0:61] || 0b00, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    CTR (if BO[2]=0)
    LR (if LK=1)

Extended Mnemonics:

Examples of extended mnemonics for Branch Conditional to Link Register:

    Extended:     Equivalent to:
    bclr    4,6   bclr  4,6,0
    bltlr         bclr  12,0,0
    bnelr   cr2   bclr  4,10,0
    bdnzlr        bclr  16,0,0

Branch Conditional to Count Register XL-form

    bcctr   BO,BI,BH    (LK=0)
    bcctrl  BO,BI,BH    (LK=1)

    19   BO   BI   ///  BH   528  LK
    0    6    11   16   19   21   31

    cond_ok ← BO[0] | (CR[BI+32] ≡ BO[1])
    if cond_ok then NIA ←iea CTR[0:61] || 0b00
    if LK then LR ←iea CIA + 4

BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 42. The BH field is used as described in Figure 44. The branch target address is CTR[0:61] || 0b00, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

If the "decrement and test CTR" option is specified (BO[2]=0), the instruction form is invalid.

Special Registers Altered:
    LR (if LK=1)

Extended Mnemonics:

Examples of extended mnemonics for Branch Conditional to Count Register:

    Extended:     Equivalent to:
    bcctr   4,6   bcctr  4,6,0
    bltctr        bcctr  12,0,0
    bnectr  cr2   bcctr  4,10,0

Programming Note
bclr, bclrl, bcctr, and bcctrl each serve as both a basic and an extended mnemonic. The Assembler will recognize a bclr, bclrl, bcctr, or bcctrl mnemonic with three operands as the basic form, and a bclr, bclrl, bcctr, or bcctrl mnemonic with two operands as the extended form. In the extended form the BH operand is omitted and assumed to be 0b00.

2.5 Condition Register Instructions

2.5.1 Condition Register Logical Instructions

The Condition Register Logical instructions have preferred forms; see Section 1.8.1. In the preferred forms, the BT and BB fields satisfy the following rule.

- The bit specified by BT is in the same Condition Register field as the bit specified by BB.

Extended mnemonics for Condition Register logical operations

A set of extended mnemonics is provided that allows additional Condition Register logical operations, beyond those provided by the basic Condition Register Logical instructions, to be coded easily. Some of these are shown as examples with the Condition Register Logical instructions.
See Appendix E for additional extended mnemonics.

Condition Register AND XL-form

    crand BT,BA,BB

    19   BT   BA   BB   257  /
    0    6    11   16   21   31

    CR[BT+32] ← CR[BA+32] & CR[BB+32]

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Condition Register NAND XL-form

    crnand BT,BA,BB

    19   BT   BA   BB   225  /
    0    6    11   16   21   31

    CR[BT+32] ← ¬(CR[BA+32] & CR[BB+32])

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Condition Register OR XL-form

    cror BT,BA,BB

    19   BT   BA   BB   449  /
    0    6    11   16   21   31

    CR[BT+32] ← CR[BA+32] | CR[BB+32]

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Extended Mnemonics:

Example of extended mnemonics for Condition Register OR:

    Extended:       Equivalent to:
    crmove Bx,By    cror Bx,By,By

Condition Register XOR XL-form

    crxor BT,BA,BB

    19   BT   BA   BB   193  /
    0    6    11   16   21   31

    CR[BT+32] ← CR[BA+32] ⊕ CR[BB+32]

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Extended Mnemonics:

Example of extended mnemonics for Condition Register XOR:

    Extended:    Equivalent to:
    crclr Bx     crxor Bx,Bx,Bx
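The following is a minimal Python sketch, not part of the architecture, that models the Condition Register behavior described so far: recording CR Field 0 for a fixed-point result (Rc=1), the crand, crnand, cror, and crxor operations, and the ctr_ok/cond_ok resolution used by Branch Conditional. The names ConditionRegister, record_cr0, and resolve_bc are hypothetical and chosen only for illustration. The sketch assumes BO and the CR bit operands are passed as plain 5-bit integers, with BO bit 0 being the most significant bit of the field, and it does not model branch target computation, hints, or the Link Register.

```python
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class ConditionRegister:
    """CR bits indexed 32..63, matching the bit numbering used in the text."""

    def __init__(self):
        self.bits = [0] * 32

    def get(self, n):
        return self.bits[n - 32]

    def put(self, n, v):
        self.bits[n - 32] = v & 1

    def field(self, f):
        """Return CR field f (0..7) as a 4-bit integer, LT bit most significant."""
        value = 0
        for bit in self.bits[4 * f:4 * f + 4]:
            value = (value << 1) | bit
        return value

    def record_cr0(self, result, xer_so, mode64=True):
        """CR0 <- c || XER[SO], as for a fixed-point instruction with Rc=1."""
        signed = to_signed(result, 64 if mode64 else 32)
        c = (1, 0, 0) if signed < 0 else (0, 1, 0) if signed > 0 else (0, 0, 1)
        for i, bit in enumerate(c + (xer_so,)):
            self.put(32 + i, bit)

    # Condition Register Logical instructions; bt, ba, bb are 5-bit operands.
    def crand(self, bt, ba, bb):
        self.put(bt + 32, self.get(ba + 32) & self.get(bb + 32))

    def crnand(self, bt, ba, bb):
        self.put(bt + 32, 1 ^ (self.get(ba + 32) & self.get(bb + 32)))

    def cror(self, bt, ba, bb):
        self.put(bt + 32, self.get(ba + 32) | self.get(bb + 32))

    def crxor(self, bt, ba, bb):
        self.put(bt + 32, self.get(ba + 32) ^ self.get(bb + 32))


def resolve_bc(bo, bi, cr, ctr, mode64=True):
    """Resolve a Branch Conditional; returns (taken, new_ctr)."""
    def bo_bit(i):
        return (bo >> (4 - i)) & 1           # BO[0] is the leftmost bit

    if not bo_bit(2):
        ctr = (ctr - 1) & MASK64             # 0 becomes -1 (all ones)
    tested = ctr if mode64 else ctr & 0xFFFFFFFF   # CTR[M:63]
    ctr_ok = bo_bit(2) | ((tested != 0) ^ bo_bit(3))
    cond_ok = bo_bit(0) | (cr.get(bi + 32) == bo_bit(1))
    return bool(ctr_ok and cond_ok), ctr
```

For example, after record_cr0 stores a negative result, resolve_bc(12, 0, cr, ctr) corresponds to the extended mnemonic blt and reports the branch as taken, while cror(8, 2, 2) plays the role of crmove 8,2. Keeping CR bits in their architected 32:63 numbering is a design choice made so the code can be read side by side with the instruction descriptions.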
Condition Register NOR XL-form

    crnor BT,BA,BB

    19   BT   BA   BB   33   /
    0    6    11   16   21   31

    CR[BT+32] ← ¬(CR[BA+32] | CR[BB+32])

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Extended Mnemonics:

Example of extended mnemonics for Condition Register NOR:

    Extended:      Equivalent to:
    crnot Bx,By    crnor Bx,By,By

Condition Register Equivalent XL-form

    creqv BT,BA,BB

    19   BT   BA   BB   289  /
    0    6    11   16   21   31

    CR[BT+32] ← CR[BA+32] ≡ CR[BB+32]

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Extended Mnemonics:

Example of extended mnemonics for Condition Register Equivalent:

    Extended:    Equivalent to:
    crset Bx     creqv Bx,Bx,Bx

Condition Register AND with Complement XL-form

    crandc BT,BA,BB

    19   BT   BA   BB   129  /
    0    6    11   16   21   31

    CR[BT+32] ← CR[BA+32] & ¬CR[BB+32]

The bit in the Condition Register specified by BA+32 is ANDed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Condition Register OR with Complement XL-form

    crorc BT,BA,BB

    19   BT   BA   BB   417  /
    0    6    11   16   21   31

    CR[BT+32] ← CR[BA+32] | ¬CR[BB+32]

The bit in the Condition Register specified by BA+32 is ORed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

2.5.2 Condition Register Field Instruction

Move Condition Register Field XL-form

    mcrf BF,BFA

    19   BF   //   BFA  //   ///  0    /
    0    6    9    11   14   16   21   31

    CR[4×BF+32:4×BF+35] ← CR[4×BFA+32:4×BFA+35]

The contents of Condition Register field BFA are copied to Condition Register field BF.

Special Registers Altered:
    CR field BF

2.6 System Call Instruction

This instruction provides the means by which a program can call upon the system to perform a service.

System Call SC-form

    sc LEV

    17   ///  ///  //   LEV  //   1    /
    0    6    11   16   20   27   30   31

This instruction calls the system to perform a service. A complete description of this instruction can be found in Book III.

The use of the LEV field is described in Book III. The LEV values greater than 1 are reserved, and bits 0:5 of the LEV field (instruction bits 20:25) are treated as a reserved field.

When control is returned to the program that executed the System Call instruction, the contents of the registers will depend on the register conventions used by the program providing the system service.

This instruction is context synchronizing (see Book III).

Special Registers Altered:
    Dependent on the system service

Programming Note
sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form the LEV operand is omitted and assumed to be 0.

In application programs the value of the LEV operand for sc should be 0.

Chapter 3. Fixed-Point Facility

3.1 Fixed-Point Facility Overview
3.2 Fixed-Point Facility Registers
3.2.1 General Purpose Registers
3.2.2 Fixed-Point Exception Register
3.2.3 VR Save Register
3.2.4 Software Use SPRs [Category: Embedded]
3.2.5 Device Control Registers [Category: Embedded.Device Control]
3.3 Fixed-Point Facility Instructions
3.3.1 Fixed-Point Storage Access Instructions
3.3.1.1 Storage Access Exceptions
3.3.2 Fixed-Point Load Instructions
3.3.2.1 64-bit Fixed-Point Load Instructions [Category: 64-Bit]
3.3.3 Fixed-Point Store Instructions
3.3.3.1 64-bit Fixed-Point Store Instructions [Category: 64-Bit]
3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions
3.3.4.1 64-Bit Load and Store with Byte Reversal Instructions [Category: 64-bit]
3.3.5 Fixed-Point Load and Store Multiple Instructions
3.3.6 Fixed-Point Move Assist Instructions [Category: Move Assist]
3.3.7 Other Fixed-Point Instructions
3.3.8 Fixed-Point Arithmetic Instructions
3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit]
3.3.9 Fixed-Point Compare Instructions
3.3.10 Fixed-Point Trap Instructions
3.3.10.1 64-bit Fixed-Point Trap Instructions [Category: 64-Bit]
3.3.11 Fixed-Point Select [Category: Phased-In (sV2.06)]
3.3.12 Fixed-Point Logical Instructions
3.3.12.1 64-bit Fixed-Point Logical Instructions [Category: 64-Bit]
3.3.13 Fixed-Point Rotate and Shift Instructions
3.3.13.1 Fixed-Point Rotate Instructions
3.3.13.1.1 64-bit Fixed-Point Rotate Instructions [Category: 64-Bit]
3.3.13.2 Fixed-Point Shift Instructions
3.3.13.2.1 64-bit Fixed-Point Shift Instructions [Category: 64-Bit]
3.3.14 Binary Coded Decimal (BCD) Assist Instructions [Category: Embedded.Phased-in, Server]
3.3.15 Move To/From System Register Instructions
3.3.15.1 Move to/From One Condition Register Field Instructions
3.3.15.2 Move To/From System Registers [Category: Embedded]

3.1 Fixed-Point Facility Overview

This chapter describes the registers and instructions that make up the Fixed-Point Facility.

3.2 Fixed-Point Facility Registers

3.2.1 General Purpose Registers

All manipulation of information is done in registers internal to the Fixed-Point Facility. The principal storage internal to the Fixed-Point Facility is a set of 32 General Purpose Registers (GPRs). See Figure 45.

    GPR 0
    GPR 1
    ...
    ...
    GPR 30
    GPR 31
    0                                 63
Figure 45. General Purpose Registers

Each GPR is a 64-bit register.

3.2.2 Fixed-Point Exception Register

The Fixed-Point Exception Register (XER) is a 64-bit register.

    XER
    0                                 63
Figure 46. Fixed-Point Exception Register

The bit definitions for the Fixed-Point Exception Register are shown below. Here M=0 in 64-bit mode and M=32 in 32-bit mode.

The bits are set based on the operation of an instruction considered as a whole, not on intermediate results (e.g., the Subtract From Carrying instruction, the result of which is specified as the sum of three values, sets bits in the Fixed-Point Exception Register based on the entire operation, not on an intermediate sum).

    Bit(s)  Description
    0:31    Reserved
    32      Summary Overflow (SO)
            The Summary Overflow bit is set to 1 whenever an instruction (except mtspr) sets the Overflow bit. Once set, the SO bit remains set until it is cleared by an mtspr instruction (specifying the XER) or an mcrxr instruction. It is not altered by Compare instructions, nor by other instructions (except mtspr to the XER, and mcrxr) that cannot overflow. Executing an mtspr instruction to the XER, supplying the values 0 for SO and 1 for OV, causes SO to be set to 0 and OV to be set to 1.
    33      Overflow (OV)
            The Overflow bit is set to indicate that an overflow has occurred during execution of an instruction.
            XO-form Add, Subtract From, and Negate instructions having OE=1 set it to 1 if the carry out of bit M is not equal to the carry out of bit M+1, and set it to 0 otherwise.
            XO-form Multiply Low and Divide instructions having OE=1 set it to 1 if the result cannot be represented in 64 bits (mulld, divd, divde, divdu, divdeu) or in 32 bits (mullw, divw, divwe, divwu, divweu), and set it to 0 otherwise. The OV bit is not altered by Compare instructions, nor by other instructions (except mtspr to the XER, and mcrxr) that cannot overflow.
            [Category: Legacy Integer Multiply-Accumulate]
            XO-form Legacy Integer Multiply-Accumulate instructions set OV when OE=1 to reflect overflow of the 32-bit result. For signed-integer accumulation, overflow occurs when the add produces a carry out of bit 32 that is not equal to the carry out of bit 33. For unsigned-integer accumulation, overflow occurs when the add produces a carry out of bit 32.
    34      Carry (CA)
            The Carry bit is set as follows, during execution of certain instructions. Add Carrying, Subtract From Carrying, Add Extended, and Subtract From Extended types of instructions set it to 1 if there is a carry out of bit M, and set it to 0 otherwise. Shift Right Algebraic instructions set it to 1 if any 1-bits have been shifted out of a negative operand, and set it to 0 otherwise. The CA bit is not altered by Compare instructions, nor by other instructions (except Shift Right Algebraic, mtspr to the XER, and mcrxr) that cannot carry.
    35:56   Reserved
    57:63   This field specifies the number of bytes to be transferred by a Load String Indexed or Store String Indexed instruction.
            [Category: Legacy Move Assist]
            This field is used as a target by dlmzb to indicate the byte location of the leftmost zero byte found.

3.2.3 VR Save Register

    VRSAVE
    32                                63

The VR Save Register (VRSAVE) is a 32-bit register that can be used as a software use SPR; see Sections 3.2.4 and 6.3.3.

Architecture Note
In versions of the Architecture that precede Version 2.05, the VRSAVE Register was in the Vector category (see Section 6.3.3). It is now included in the Base category to simplify operating system support of and migration between heterogeneous systems containing processors with and without support for the Vector category.

3.2.4 Software Use SPRs [Category: Embedded]

Software Use SPRs are 64-bit registers that have no defined functionality. SPRG4-7 can be read by application programs. Additional Software Use SPRs are defined in Book III.

    SPRG4
    SPRG5
    SPRG6
    SPRG7
    0                                 63
Figure 47. Software-use SPRs

Programming Note
USPRG0 was made a 32-bit register and renamed to VRSAVE; see Sections 3.2.3 and 6.3.3.

3.2.5 Device Control Registers [Category: Embedded.Device Control]

Device Control Registers (DCRs) are on-chip registers that exist architecturally outside the processor and thus are not actually part of the processor architecture. This specification simply defines the existence of a Device Control Register "address space" and the instructions to access them and does not define the Device Control Registers themselves.

Device Control Registers may control the use of on-chip peripherals, such as memory controllers (the definition of specific Device Control Registers is implementation-dependent).

The contents of user-mode-accessible Device Control Registers can be read using mfdcrux and written using mtdcrux.

3.3 Fixed-Point Facility Instructions

3.3.1 Fixed-Point Storage Access Instructions

The Storage Access instructions compute the effective address (EA) of the storage to be accessed as described in Section 1.10.3 on page 27.

Programming Note
The la extended mnemonic permits computing an effective address as a Load or Store instruction would, but loads the address itself into a GPR rather than loading the value that is in storage at that address.

Programming Note
The DS field in DS-form Storage Access instructions is a word offset, not a byte offset like the D field in D-form Storage Access instructions. However, for programming convenience, Assemblers should support the specification of byte offsets for both forms of instruction.

3.3.1.1 Storage Access Exceptions

Storage accesses will cause the system data storage error handler to be invoked if the program is not allowed to modify the target storage (Store only), or if the program attempts to access storage that is unavailable.

3.3.2 Fixed-Point Load Instructions

The byte, halfword, word, or doubleword in storage addressed by EA is loaded into register RT.

Many of the Load instructions have an "update" form, in which register RA is updated with the effective address. For these forms, if RA≠0 and RA≠RT, the effective address is placed into register RA and the storage element (byte, halfword, word, or doubleword) addressed by EA is loaded into RT.

Programming Note
In some implementations, the Load Algebraic and Load with Update instructions may have greater latency than other types of Load instructions. Moreover, Load with Update instructions may take longer to execute in some implementations than the corresponding pair of a non-update Load instruction and an Add instruction.
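As an illustration of the load behavior described in Section 3.3.2, the following Python sketch (not part of the architecture) computes a D-form effective address and models the zero-extending, algebraic (sign-extending), and update variants. The function name load, its parameters, the list-of-integers register file, and the bytearray standing in for storage are all assumptions made for this example. The sketch also assumes 64-bit mode and big-endian byte ordering, and it simply rejects the invalid update forms instead of modeling what a processor does with them.

```python
MASK64 = (1 << 64) - 1


def exts(value, bits):
    """Sign-extend a `bits`-wide field to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def load(gpr, mem, rt, ra, d, size, algebraic=False, update=False):
    """D-form load sketch. Mutates gpr and returns the effective address.

    gpr  -- list of 32 register values (unsigned 64-bit integers)
    mem  -- bytes-like storage indexed by byte address (hypothetical)
    d    -- raw 16-bit D field from the instruction
    size -- number of bytes loaded (1, 2, 4, or 8)
    """
    if update and (ra == 0 or ra == rt):
        # Update forms require RA != 0 and RA != RT; this sketch rejects them.
        raise ValueError("invalid instruction form")
    base = gpr[ra] if (update or ra != 0) else 0   # (RA|0) for non-update forms
    ea = (base + exts(d, 16)) & MASK64             # EA <- b + EXTS(D)
    raw = int.from_bytes(bytes(mem[ea + i] for i in range(size)), "big")
    # Algebraic loads sign-extend; the others zero-extend into RT.
    gpr[rt] = exts(raw, 8 * size) & MASK64 if algebraic else raw
    if update:
        gpr[ra] = ea                               # RA <- EA
    return ea
```

With gpr[1] holding 16 and a D field of 0xFFF8 (that is, -8), the call load(gpr, mem, 3, 1, 0xFFF8, 2) reads the halfword at address 8 in the manner of lhz, and passing algebraic=True gives the lha behavior. Treating RA=0 as the constant 0 only in the non-update forms mirrors the (RA|0) notation used in the instruction descriptions.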
Load Byte and Zero D-form

lbz RT,D(RA)

    34   RT   RA   D
    0    6    11   16        31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0) + D. The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0.

Special Registers Altered:
    None

Load Byte and Zero Indexed X-form

lbzx RT,RA,RB

    31   RT   RA   RB   87   /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0) + (RB). The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0.

Special Registers Altered:
    None

Load Byte and Zero with Update D-form

lbzu RT,D(RA)

    35   RT   RA   D
    0    6    11   16        31

    EA ← (RA) + EXTS(D)
    RT ← ⁵⁶0 || MEM(EA, 1)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Byte and Zero with Update Indexed X-form

lbzux RT,RA,RB

    31   RT   RA   RB   119  /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    RT ← ⁵⁶0 || MEM(EA, 1)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword and Zero D-form

lhz RT,D(RA)

    40   RT   RA   D
    0    6    11   16        31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0.

Load Halfword and Zero Indexed X-form

lhzx RT,RA,RB

    31   RT   RA   RB   279  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0) + (RB). The halfword in storage addressed by EA is loaded into RT48:63.
RT0:47 are set to 0.

Special Registers Altered (lhz and lhzx):
    None

Load Halfword and Zero with Update D-form

lhzu RT,D(RA)

    41   RT   RA   D
    0    6    11   16        31

    EA ← (RA) + EXTS(D)
    RT ← ⁴⁸0 || MEM(EA, 2)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword and Zero with Update Indexed X-form

lhzux RT,RA,RB

    31   RT   RA   RB   311  /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    RT ← ⁴⁸0 || MEM(EA, 2)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword Algebraic D-form

lha RT,D(RA)

    42   RT   RA   D
    0    6    11   16        31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
    None

Load Halfword Algebraic Indexed X-form

lhax RT,RA,RB

    31   RT   RA   RB   343  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0) + (RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
    None

Load Halfword Algebraic with Update D-form

lhau RT,D(RA)

    43   RT   RA   D
    0    6    11   16        31

    EA ← (RA) + EXTS(D)
    RT ← EXTS(MEM(EA, 2))
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword Algebraic with Update Indexed X-form

lhaux RT,RA,RB

    31   RT   RA   RB   375  /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    RT ← EXTS(MEM(EA, 2))
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Word and Zero D-form

lwz RT,D(RA)

    32   RT   RA   D
    0    6    11   16        31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0) + D. The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0.

Special Registers Altered:
    None

Load Word and Zero Indexed X-form

lwzx RT,RA,RB

    31   RT   RA   RB   23   /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0) + (RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0.

Special Registers Altered:
    None

Load Word and Zero with Update D-form

lwzu RT,D(RA)

    33   RT   RA   D
    0    6    11   16        31

    EA ← (RA) + EXTS(D)
    RT ← ³²0 || MEM(EA, 4)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Load Word and Zero with Update Indexed X-form

lwzux RT,RA,RB

    31   RT   RA   RB   55   /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    RT ← ³²0 || MEM(EA, 4)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.
Special Registers Altered (lwzu and lwzux):
    None

3.3.2.1 64-bit Fixed-Point Load Instructions [Category: 64-Bit]

Load Word Algebraic DS-form

lwa RT,DS(RA)

    58   RT   RA   DS   2
    0    6    11   16   30   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    RT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
    None

Load Word Algebraic Indexed X-form

lwax RT,RA,RB

    31   RT   RA   RB   341  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0) + (RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
    None

Load Word Algebraic with Update Indexed X-form

lwaux RT,RA,RB

    31   RT   RA   RB   373  /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    RT ← EXTS(MEM(EA, 4))
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are filled with a copy of bit 0 of the loaded word.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Doubleword DS-form

ld RT,DS(RA)

    58   RT   RA   DS   0
    0    6    11   16   30   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The doubleword in storage addressed by EA is loaded into RT.

Load Doubleword Indexed X-form

ldx RT,RA,RB

    31   RT   RA   RB   21   /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0) + (RB). The doubleword in storage addressed by EA is loaded into RT.
Special Registers Altered (ld and ldx):
    None

Load Doubleword with Update DS-form

ldu RT,DS(RA)

    58   RT   RA   DS   1
    0    6    11   16   30   31

    EA ← (RA) + EXTS(DS || 0b00)
    RT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (DS||0b00). The doubleword in storage addressed by EA is loaded into RT.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Doubleword with Update Indexed X-form

ldux RT,RA,RB

    31   RT   RA   RB   53   /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    RT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The doubleword in storage addressed by EA is loaded into RT.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

3.3.3 Fixed-Point Store Instructions

The contents of register RS are stored into the byte, halfword, word, or doubleword in storage addressed by EA.

Many of the Store instructions have an "update" form, in which register RA is updated with the effective address. For these forms, the following rules apply.

    - If RA≠0, the effective address is placed into register RA.
    - If RS=RA, the contents of register RS are copied to the target storage element and then EA is placed into RA (RS).

Store Byte D-form

stb RS,D(RA)

    38   RS   RA   D
    0    6    11   16        31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 1) ← (RS)56:63

Let the effective address (EA) be the sum (RA|0) + D. (RS)56:63 are stored into the byte in storage addressed by EA.

Store Byte Indexed X-form

stbx RS,RA,RB

    31   RS   RA   RB   215  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 1) ← (RS)56:63

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)56:63 are stored into the byte in storage addressed by EA.
Special Registers Altered (stb and stbx):
    None

Store Byte with Update D-form

stbu RS,D(RA)

    39   RS   RA   D
    0    6    11   16        31

    EA ← (RA) + EXTS(D)
    MEM(EA, 1) ← (RS)56:63
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. (RS)56:63 are stored into the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Byte with Update Indexed X-form

stbux RS,RA,RB

    31   RS   RA   RB   247  /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    MEM(EA, 1) ← (RS)56:63
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). (RS)56:63 are stored into the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Halfword D-form

sth RS,D(RA)

    44   RS   RA   D
    0    6    11   16        31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 2) ← (RS)48:63

Let the effective address (EA) be the sum (RA|0) + D. (RS)48:63 are stored into the halfword in storage addressed by EA.

Special Registers Altered:
    None

Store Halfword Indexed X-form

sthx RS,RA,RB

    31   RS   RA   RB   407  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 2) ← (RS)48:63

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)48:63 are stored into the halfword in storage addressed by EA.

Special Registers Altered:
    None

Store Halfword with Update D-form

sthu RS,D(RA)

    45   RS   RA   D
    0    6    11   16        31

    EA ← (RA) + EXTS(D)
    MEM(EA, 2) ← (RS)48:63
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. (RS)48:63 are stored into the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Store Halfword with Update Indexed X-form

sthux RS,RA,RB

    31   RS   RA   RB   439  /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    MEM(EA, 2) ← (RS)48:63
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). (RS)48:63 are stored into the halfword in storage addressed by EA.

EA is placed into register RA.
If RA=0, the instruction form is invalid.

Special Registers Altered (sthu and sthux):
    None

Store Word D-form

stw RS,D(RA)

    36   RS   RA   D
    0    6    11   16        31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 4) ← (RS)32:63

Let the effective address (EA) be the sum (RA|0) + D. (RS)32:63 are stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Word Indexed X-form

stwx RS,RA,RB

    31   RS   RA   RB   151  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (RS)32:63

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)32:63 are stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Word with Update D-form

stwu RS,D(RA)

    37   RS   RA   D
    0    6    11   16        31

    EA ← (RA) + EXTS(D)
    MEM(EA, 4) ← (RS)32:63
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. (RS)32:63 are stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Word with Update Indexed X-form

stwux RS,RA,RB

    31   RS   RA   RB   183  /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    MEM(EA, 4) ← (RS)32:63
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). (RS)32:63 are stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

3.3.3.1 64-bit Fixed-Point Store Instructions [Category: 64-Bit]

Store Doubleword DS-form

std RS,DS(RA)

    62   RS   RA   DS   0
    0    6    11   16   30   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). (RS) is stored into the doubleword in storage addressed by EA.

Store Doubleword Indexed X-form

stdx RS,RA,RB

    31   RS   RA   RB   149  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0) + (RB). (RS) is stored into the doubleword in
storage addressed by EA.

Special Registers Altered (std and stdx):
    None

Store Doubleword with Update DS-form

stdu RS,DS(RA)

    62   RS   RA   DS   1
    0    6    11   16   30   31

    EA ← (RA) + EXTS(DS || 0b00)
    MEM(EA, 8) ← (RS)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (DS||0b00). (RS) is stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Doubleword with Update Indexed X-form

stdux RS,RA,RB

    31   RS   RA   RB   181  /
    0    6    11   16   21   31

    EA ← (RA) + (RB)
    MEM(EA, 8) ← (RS)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). (RS) is stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions

Programming Note
These instructions have the effect of loading and storing data in the opposite byte ordering from that which would be used by other Load and Store instructions.

Programming Note
In some implementations, the Load Byte-Reverse instructions may have greater latency than other Load instructions.

Load Halfword Byte-Reverse Indexed X-form

lhbrx RT,RA,RB

    31   RT   RA   RB   790  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    load_data ← MEM(EA, 2)
    RT ← ⁴⁸0 || load_data8:15 || load_data0:7

Let the effective address (EA) be the sum (RA|0) + (RB). Bits 0:7 of the halfword in storage addressed by EA are loaded into RT56:63. Bits 8:15 of the halfword in storage addressed by EA are loaded into RT48:55. RT0:47 are set to 0.

Special Registers Altered:
    None

Store Halfword Byte-Reverse Indexed X-form

sthbrx RS,RA,RB

    31   RS   RA   RB   918  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 2) ← (RS)56:63 || (RS)48:55

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)56:63 are stored into bits 0:7 of the halfword in storage addressed by EA. (RS)48:55 are stored into bits 8:15 of the halfword in storage addressed by EA.

Special Registers Altered:
    None

Load Word Byte-Reverse Indexed X-form

lwbrx RT,RA,RB

    31   RT   RA   RB   534  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    load_data ← MEM(EA, 4)
    RT ← ³²0 || load_data24:31 || load_data16:23
             || load_data8:15 || load_data0:7

Let the effective address (EA) be the sum (RA|0) + (RB). Bits 0:7 of the word in storage addressed by EA are loaded into RT56:63. Bits 8:15 of the word in storage addressed by EA are loaded into RT48:55. Bits 16:23 of the word in storage addressed by EA are loaded into RT40:47. Bits 24:31 of the word in storage addressed by EA are loaded into RT32:39. RT0:31 are set to 0.

Special Registers Altered:
    None

Store Word Byte-Reverse Indexed X-form

stwbrx RS,RA,RB

    31   RS   RA   RB   662  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (RS)56:63 || (RS)48:55 || (RS)40:47 || (RS)32:39

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)56:63 are stored into bits 0:7 of the word in storage addressed by EA. (RS)48:55 are stored into bits 8:15 of the word in storage addressed by EA. (RS)40:47 are stored into bits 16:23 of the word in storage addressed by EA. (RS)32:39 are stored into bits 24:31 of the word in storage addressed by EA.

Special Registers Altered:
    None
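The byte-reversal semantics of lhbrx, lwbrx, sthbrx and stwbrx can be illustrated with a small Python sketch. It assumes Big-Endian mode, in which an ordinary load treats the byte at EA as the most significant; reading the same bytes with the byte at EA as the least significant gives the reversed value. The function names and the bytearray storage model are hypothetical, chosen only for this illustration.

```python
def load_byte_reverse(mem, ea, size):
    """Model of lhbrx (size=2) or lwbrx (size=4).

    The byte at EA lands in the low-order byte of the result and the
    high-order bits of the 64-bit register are zero.
    """
    return int.from_bytes(mem[ea:ea + size], "little")


def store_byte_reverse(mem, ea, rs, size):
    """Model of sthbrx (size=2) or stwbrx (size=4).

    The low-order byte of RS is stored at EA, the next byte at EA+1,
    and so on; high-order bytes of RS beyond `size` are ignored.
    """
    low = rs & ((1 << (8 * size)) - 1)
    mem[ea:ea + size] = low.to_bytes(size, "little")
```

For example, with the bytes 0x11 0x22 0x33 0x44 in storage at EA, an ordinary lwz would yield 0x11223344 while this model of lwbrx yields 0x44332211.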
3.3.4.1 64-Bit Load and Store with Byte Reversal Instructions [Category: 64-bit]

Load Doubleword Byte-Reverse Indexed X-form

ldbrx RT,RA,RB

    31   RT   RA   RB   532  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    load_data ← MEM(EA, 8)
    RT ← load_data56:63 || load_data48:55
         || load_data40:47 || load_data32:39
         || load_data24:31 || load_data16:23
         || load_data8:15 || load_data0:7

Let the effective address (EA) be the sum (RA|0) + (RB). Bits 0:7 of the doubleword in storage addressed by EA are loaded into RT56:63. Bits 8:15 of the doubleword in storage addressed by EA are loaded into RT48:55. Bits 16:23 of the doubleword in storage addressed by EA are loaded into RT40:47. Bits 24:31 of the doubleword in storage addressed by EA are loaded into RT32:39. Bits 32:39 of the doubleword in storage addressed by EA are loaded into RT24:31. Bits 40:47 of the doubleword in storage addressed by EA are loaded into RT16:23. Bits 48:55 of the doubleword in storage addressed by EA are loaded into RT8:15. Bits 56:63 of the doubleword in storage addressed by EA are loaded into RT0:7.

Special Registers Altered:
    None

Store Doubleword Byte-Reverse Indexed X-form

stdbrx RS,RA,RB

    31   RS   RA   RB   660  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 8) ← (RS)56:63 || (RS)48:55
                 || (RS)40:47 || (RS)32:39
                 || (RS)24:31 || (RS)16:23
                 || (RS)8:15 || (RS)0:7

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)56:63 are stored into bits 0:7 of the doubleword in storage addressed by EA. (RS)48:55 are stored into bits 8:15 of the doubleword in storage addressed by EA. (RS)40:47 are stored into bits 16:23 of the doubleword in storage addressed by EA. (RS)32:39 are stored into bits 24:31 of the doubleword in storage addressed by EA. (RS)24:31 are stored into bits 32:39 of the doubleword in storage addressed by EA. (RS)16:23 are stored into bits 40:47 of the doubleword in storage addressed by EA. (RS)8:15 are stored into bits 48:55 of the doubleword in storage addressed by EA. (RS)0:7 are stored into bits 56:63 of the doubleword in storage addressed by EA.

Special Registers Altered:
    None

3.3.5 Fixed-Point Load and Store Multiple Instructions

The Load/Store Multiple instructions have preferred forms; see Section 1.8.1, "Preferred Instruction Forms" on page 23. In the preferred forms, storage alignment satisfies the following rule.

    - The combination of the EA and RT (RS) is such that the low-order byte of GPR 31 is loaded (stored) from (into) the last byte of an aligned quadword in storage.

For the Server environment, the Load/Store Multiple instructions are not supported in Little-Endian mode. If they are executed in Little-Endian mode, the system alignment error handler is invoked.

Load Multiple Word D-form

lmw RT,D(RA)

    46   RT   RA   D
    0    6    11   16        31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    r ← RT
    do while r ≤ 31
        GPR(r) ← ³²0 || MEM(EA, 4)
        r ← r + 1
        EA ← EA + 4

Let n = (32-RT). Let the effective address (EA) be the sum (RA|0) + D.

n consecutive words starting at EA are loaded into the low-order 32 bits of GPRs RT through 31. The high-order 32 bits of these GPRs are set to zero.

If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Multiple Word D-form

stmw RS,D(RA)

    47   RS   RA   D
    0    6    11   16        31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    r ← RS
    do while r ≤ 31
        MEM(EA, 4) ← GPR(r)32:63
        r ← r + 1
        EA ← EA + 4

Let n = (32-RS). Let the effective address (EA) be the sum (RA|0) + D.

n consecutive words starting at EA are stored from the low-order 32 bits of GPRs RS through 31.

Special Registers Altered:
    None
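The lmw and stmw loops above translate almost line for line into Python. The sketch below is a hedged illustration only: the function names, the list-of-integers register file and the bytearray storage are assumptions made for the example, Big-Endian mode is assumed, and the invalid-form check reads "RA is in the range of registers to be loaded" as RT ≤ RA ≤ 31, with RA=0 counted as register number 0.

```python
MASK64 = (1 << 64) - 1


def exts16(value):
    """Sign-extend a 16-bit D field."""
    return (value & 0x7FFF) - (value & 0x8000)


def lmw(gpr, mem, rt, ra, d):
    """Load words into GPRs RT..31, zeroing the high-order 32 bits."""
    if rt <= ra <= 31:
        raise ValueError("invalid instruction form: RA is in RT..31")
    base = 0 if ra == 0 else gpr[ra]
    ea = (base + exts16(d)) & MASK64
    for r in range(rt, 32):
        gpr[r] = int.from_bytes(mem[ea:ea + 4], "big")
        ea += 4


def stmw(gpr, mem, rs, ra, d):
    """Store the low-order 32 bits of GPRs RS..31 to consecutive words."""
    base = 0 if ra == 0 else gpr[ra]
    ea = (base + exts16(d)) & MASK64
    for r in range(rs, 32):
        mem[ea:ea + 4] = (gpr[r] & 0xFFFFFFFF).to_bytes(4, "big")
        ea += 4
```

A typical use is saving and restoring the non-volatile registers in a function prologue and epilogue: stmw with RS=29 writes three words, and the matching lmw with RT=29 reads them back.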
3.3.6 Fixed-Point Move Assist Instructions [Category: Move Assist]

The Move Assist instructions allow movement of data from storage to registers or from registers to storage without concern for alignment. These instructions can be used for a short move between arbitrary storage locations or to initiate a long move between unaligned storage fields.

The Load/Store String instructions have preferred forms; see Section 1.8.1, "Preferred Instruction Forms" on page 23. In the preferred forms, register usage satisfies the following rules.

    - RS = 4 or 5
    - RT = 4 or 5
    - last register loaded/stored ≤ 12

For some implementations, using GPR 4 for RS and RT may result in slightly faster execution than using GPR 5.

For the Server environment, the Move Assist instructions are not supported in Little-Endian mode. If they are executed in Little-Endian mode, the system alignment error handler may be invoked or the instructions may be treated as no-ops if the number of bytes specified by the instruction is 0.

Load String Word Immediate X-form

lswi RT,RA,NB

    31   RT   RA   NB   597  /
    0    6    11   16   21   31

    if RA = 0 then EA ← 0
    else           EA ← (RA)
    if NB = 0 then n ← 32
    else           n ← NB
    r ← RT - 1
    i ← 32
    do while n > 0
        if i = 32 then
            r ← r + 1 (mod 32)
            GPR(r) ← 0
        GPR(r)i:i+7 ← MEM(EA, 1)
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n - 1

Let the effective address (EA) be (RA|0). Let n = NB if NB≠0, n = 32 if NB=0; n is the number of bytes to load. Let nr=CEIL(n/4); nr is the number of registers to receive data.

n consecutive bytes starting at EA are loaded into GPRs RT through RT+nr-1. Data are loaded into the low-order four bytes of each GPR; the high-order four bytes are set to 0.

Bytes are loaded left to right in each register. The sequence of registers wraps around to GPR 0 if required. If the low-order four bytes of register RT+nr-1 are only partially filled, the unfilled low-order byte(s) of that register are set to 0.

If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load String Word Indexed X-form

lswx RT,RA,RB

    31   RT   RA   RB   533  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    n ← XER57:63
    r ← RT - 1
    i ← 32
    RT ← undefined
    do while n > 0
        if i = 32 then
            r ← r + 1 (mod 32)
            GPR(r) ← 0
        GPR(r)i:i+7 ← MEM(EA, 1)
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n - 1

Let the effective address (EA) be the sum (RA|0) + (RB). Let n=XER57:63; n is the number of bytes to load. Let nr=CEIL(n/4); nr is the number of registers to receive data.

If n>0, n consecutive bytes starting at EA are loaded into GPRs RT through RT+nr-1. Data are loaded into the low-order four bytes of each GPR; the high-order four bytes are set to 0.

Bytes are loaded left to right in each register. The sequence of registers wraps around to GPR 0 if required. If the low-order four bytes of register RT+nr-1 are only partially filled, the unfilled low-order byte(s) of that register are set to 0.

If n=0, the contents of register RT are undefined.

If RA or RB is in the range of registers to be loaded, including the case in which RA=0, the instruction is treated as if the instruction form were invalid. If RT=RA or RT=RB, the instruction form is invalid.

Special Registers Altered:
    None

Store String Word Immediate X-form

stswi RS,RA,NB

    31   RS   RA   NB   725  /
    0    6    11   16   21   31

    if RA = 0 then EA ← 0
    else           EA ← (RA)
    if NB = 0 then n ← 32
    else           n ← NB
    r ← RS - 1
    i ← 32
    do while n > 0
        if i = 32 then r ← r + 1 (mod 32)
        MEM(EA, 1) ← GPR(r)i:i+7
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n - 1

Let the effective address (EA) be (RA|0). Let n = NB if NB≠0, n = 32 if NB=0; n is the number of bytes to store. Let nr = CEIL(n/4); nr is the number of registers to supply data.

n consecutive bytes starting at EA are stored from GPRs RS through RS+nr-1. Data are stored from the low-order four bytes of each GPR.

Bytes are stored left to right from each register. The sequence of registers wraps around to GPR 0 if required.

Special Registers Altered:
    None

Store String Word Indexed X-form

stswx RS,RA,RB

    31   RS   RA   RB   661  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    n ← XER57:63
    r ← RS - 1
    i ← 32
    do while n > 0
        if i = 32 then r ← r + 1 (mod 32)
        MEM(EA, 1) ← GPR(r)i:i+7
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n - 1

Let the effective address (EA) be the sum (RA|0) + (RB). Let n = XER57:63; n is the number of bytes to store. Let nr = CEIL(n/4); nr is the number of registers to supply data.

If n>0, n consecutive bytes starting at EA are stored from GPRs RS through RS+nr-1. Data are stored from the low-order four bytes of each GPR.

Bytes are stored left to right from each register. The sequence of registers wraps around to GPR 0 if required.

If n=0, no bytes are stored.

Special Registers Altered:
    None

3.3.7 Other Fixed-Point Instructions

The remainder of the fixed-point instructions use the contents of the General Purpose Registers (GPRs) as source operands, and place results into GPRs, into the Fixed-Point Exception Register (XER), and into Condition Register fields. In addition, the Trap instructions test the contents of a GPR or XER bit, invoking the system trap handler if the result of the specified test is true.

These instructions treat the source operands as signed integers unless the instruction is explicitly identified as performing an unsigned operation.

The X-form and XO-form instructions with Rc=1, and the D-form instructions addic., andi., and andis., set the first three bits of CR Field 0 to characterize the result placed into the target register. In 64-bit mode, these bits are set by signed comparison of the result to zero. In 32-bit mode, these bits are set by signed comparison of the low-order 32 bits of the result to zero.

Unless otherwise noted and when appropriate, when CR Field 0 and the XER are set they reflect the value placed into the target register.

Programming Note
Instructions with the OE bit set or that set CA may execute slowly or may prevent the execution of subsequent instructions until the instruction has completed.

3.3.8 Fixed-Point Arithmetic Instructions

The XO-form Arithmetic instructions with Rc=1, and the D-form Arithmetic instruction addic., set the first three bits of CR Field 0 as described in Section 3.3.7, "Other Fixed-Point Instructions".

addic, addic., subfic, addc, subfc, adde, subfe, addme, subfme, addze, and subfze always set CA, to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32-bit mode. The XO-form Arithmetic instructions set SO and OV when OE=1 to reflect overflow of the result. Except for the Multiply Low and Divide instructions, the setting of these bits is mode-dependent, and reflects overflow of the 64-bit result in 64-bit mode and overflow of the low-order 32-bit result in 32-bit mode. For XO-form Multiply Low and Divide instructions, the setting of these bits is mode-independent, and reflects overflow of the 64-bit result for mulld, divd, divde, divdu and divdeu, and overflow of the low-order 32-bit result for mullw, divw, divwe, divwu, and divweu.

Programming Note
Notice that CR Field 0 may not reflect the "true" (infinitely precise) result if overflow occurs.

Extended mnemonics for addition and subtraction

Several extended mnemonics are provided that use the Add Immediate and Add Immediate Shifted instructions to load an immediate value or an address into a target register. Some of these are shown as examples with the two instructions.

The Power ISA supplies Subtract From instructions, which subtract the second operand from the third. A set of extended mnemonics is provided that use the more "normal" order, in which the third operand is subtracted from the second, with the third operand being either an immediate field or a register. Some of these are shown as examples with the appropriate Add and Subtract From instructions.

See Appendix E for additional extended mnemonics.

Add Immediate D-form

addi RT,RA,SI

    14   RT   RA   SI
    0    6    11   16        31

    if RA = 0 then RT ← EXTS(SI)
    else           RT ← (RA) + EXTS(SI)

The sum (RA|0) + SI is placed into register RT.

Special Registers Altered:
    None

Extended Mnemonics:
Examples of extended mnemonics for Add Immediate:

    Extended:            Equivalent to:
    li   Rx,value        addi Rx,0,value
    la   Rx,disp(Ry)     addi Rx,Ry,disp
    subi Rx,Ry,value     addi Rx,Ry,-value

Add Immediate Shifted D-form

addis RT,RA,SI

    15   RT   RA   SI
    0    6    11   16        31

    if RA = 0 then RT ← EXTS(SI || ¹⁶0)
    else           RT ← (RA) + EXTS(SI || ¹⁶0)

The sum (RA|0) + (SI || 0x0000) is placed into register RT.

Special Registers Altered:
    None

Extended Mnemonics:
Examples of extended mnemonics for Add Immediate Shifted:

    Extended:            Equivalent to:
    lis   Rx,value       addis Rx,0,value
    subis Rx,Ry,value    addis Rx,Ry,-value

Programming Note
addi, addis, add, and subf are the preferred instructions for addition and subtraction, because they set few status bits.

Notice that addi and addis use the value 0, not the contents of GPR 0, if RA=0.

Add XO-form

    add   RT,RA,RB    (OE=0 Rc=0)
    add.  RT,RA,RB    (OE=0 Rc=1)
    addo  RT,RA,RB    (OE=1 Rc=0)
    addo. RT,RA,RB    (OE=1 Rc=1)

    31   RT   RA   RB   OE   266  Rc
    0    6    11   16   21   22   31

    RT ← (RA) + (RB)

The sum (RA) + (RB) is placed into register RT.

Subtract From XO-form

    subf   RT,RA,RB    (OE=0 Rc=0)
    subf.  RT,RA,RB    (OE=0 Rc=1)
    subfo  RT,RA,RB    (OE=1 Rc=0)
    subfo. RT,RA,RB    (OE=1 Rc=1)

    31   RT   RA   RB   OE   40   Rc
    0    6    11   16   21   22   31

    RT ← ¬(RA) + (RB) + 1

The sum ¬(RA) + (RB) + 1 is placed into register RT.
Special Registers Altered (add and subf):
    CR0 (if Rc=1)
    SO OV (if OE=1)

Extended Mnemonics:
Example of extended mnemonics for Subtract From:

    Extended:            Equivalent to:
    sub Rx,Ry,Rz         subf Rx,Rz,Ry

Add Immediate Carrying D-form

addic RT,RA,SI

    12   RT   RA   SI
    0    6    11   16        31

    RT ← (RA) + EXTS(SI)

The sum (RA) + SI is placed into register RT.

Special Registers Altered:
    CA

Extended Mnemonics:
Example of extended mnemonics for Add Immediate Carrying:

    Extended:            Equivalent to:
    subic Rx,Ry,value    addic Rx,Ry,-value

Add Immediate Carrying and Record D-form

addic. RT,RA,SI

    13   RT   RA   SI
    0    6    11   16        31

    RT ← (RA) + EXTS(SI)

The sum (RA) + SI is placed into register RT.

Special Registers Altered:
    CR0 CA

Extended Mnemonics:
Example of extended mnemonics for Add Immediate Carrying and Record:

    Extended:            Equivalent to:
    subic. Rx,Ry,value   addic. Rx,Ry,-value

Subtract From Immediate Carrying D-form

subfic RT,RA,SI

    8    RT   RA   SI
    0    6    11   16        31

    RT ← ¬(RA) + EXTS(SI) + 1

The sum ¬(RA) + SI + 1 is placed into register RT.

Special Registers Altered:
    CA

Add Carrying XO-form

    addc   RT,RA,RB    (OE=0 Rc=0)
    addc.  RT,RA,RB    (OE=0 Rc=1)
    addco  RT,RA,RB    (OE=1 Rc=0)
    addco. RT,RA,RB    (OE=1 Rc=1)

    31   RT   RA   RB   OE   10   Rc
    0    6    11   16   21   22   31

    RT ← (RA) + (RB)

The sum (RA) + (RB) is placed into register RT.

Special Registers Altered:
    CA
    CR0 (if Rc=1)
    SO OV (if OE=1)

Subtract From Carrying XO-form

    subfc   RT,RA,RB    (OE=0 Rc=0)
    subfc.  RT,RA,RB    (OE=0 Rc=1)
    subfco  RT,RA,RB    (OE=1 Rc=0)
    subfco. RT,RA,RB    (OE=1 Rc=1)

    31   RT   RA   RB   OE   8    Rc
    0    6    11   16   21   22   31

    RT ← ¬(RA) + (RB) + 1

The sum ¬(RA) + (RB) + 1 is placed into register RT.

Special Registers Altered:
    CA
    CR0 (if Rc=1)
    SO OV (if OE=1)

Extended Mnemonics:
Example of extended mnemonics for Subtract From Carrying:

    Extended:            Equivalent to:
    subc Rx,Ry,Rz        subfc Rx,Rz,Ry

Add Extended XO-form

    adde   RT,RA,RB    (OE=0 Rc=0)
    adde.  RT,RA,RB    (OE=0 Rc=1)
    addeo  RT,RA,RB    (OE=1 Rc=0)
    addeo. RT,RA,RB    (OE=1 Rc=1)

    31   RT   RA   RB   OE   138  Rc
    0    6    11   16   21   22   31

    RT ← (RA) + (RB) + CA

The sum (RA) + (RB) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0 (if Rc=1)
    SO OV (if OE=1)

Subtract From Extended XO-form

    subfe   RT,RA,RB    (OE=0 Rc=0)
    subfe.  RT,RA,RB    (OE=0 Rc=1)
    subfeo  RT,RA,RB    (OE=1 Rc=0)
    subfeo. RT,RA,RB    (OE=1 Rc=1)

    31   RT   RA   RB   OE   136  Rc
    0    6    11   16   21   22   31

    RT ← ¬(RA) + (RB) + CA

The sum ¬(RA) + (RB) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0 (if Rc=1)
    SO OV (if OE=1)

Add to Minus One Extended XO-form

    addme   RT,RA    (OE=0 Rc=0)
    addme.  RT,RA    (OE=0 Rc=1)
    addmeo  RT,RA    (OE=1 Rc=0)
    addmeo. RT,RA    (OE=1 Rc=1)

    31   RT   RA   ///  OE   234  Rc
    0    6    11   16   21   22   31

    RT ← (RA) + CA - 1

The sum (RA) + CA + ⁶⁴1 is placed into register RT.

Special Registers Altered:
    CA
    CR0 (if Rc=1)
    SO OV (if OE=1)

Subtract From Minus One Extended XO-form

    subfme   RT,RA    (OE=0 Rc=0)
    subfme.  RT,RA    (OE=0 Rc=1)
    subfmeo  RT,RA    (OE=1 Rc=0)
    subfmeo. RT,RA    (OE=1 Rc=1)

    31   RT   RA   ///  OE   232  Rc
    0    6    11   16   21   22   31

    RT ← ¬(RA) + CA - 1

The sum ¬(RA) + CA + ⁶⁴1 is placed into register RT.

Special Registers Altered:
    CA
    CR0 (if Rc=1)
    SO OV (if OE=1)

Add to Zero Extended XO-form

    addze   RT,RA    (OE=0 Rc=0)
    addze.  RT,RA    (OE=0 Rc=1)
    addzeo  RT,RA    (OE=1 Rc=0)
    addzeo. RT,RA    (OE=1 Rc=1)

    31   RT   RA   ///  OE   202  Rc
    0    6    11   16   21   22   31

    RT ← (RA) + CA

The sum (RA) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0 (if Rc=1)
    SO OV (if OE=1)

Subtract From Zero Extended XO-form

    subfze   RT,RA    (OE=0 Rc=0)
    subfze.  RT,RA    (OE=0 Rc=1)
    subfzeo  RT,RA    (OE=1 Rc=0)
    subfzeo. RT,RA    (OE=1 Rc=1)

    31   RT   RA   ///  OE   200  Rc
    0    6    11   16   21   22   31

    RT ← ¬(RA) + CA

The sum ¬(RA) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0 (if Rc=1)
    SO OV (if OE=1)

Programming Note
The setting of CA by the Add and Subtract From instructions, including the Extended versions thereof, is mode-dependent.
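As an illustration of how CA chains the Carrying and Extended instructions together, the following Python sketch models a 128-bit addition built from addc and adde, and a 128-bit subtraction built from subfc and subfe. It assumes 64-bit mode, so CA is the carry out of bit 0 of the 64-bit sum. The function names and the (high, low) tuple convention are hypothetical and only serve the example.

```python
MASK64 = (1 << 64) - 1


def addc(ra, rb):
    """RT <- (RA) + (RB); returns (RT, CA)."""
    total = ra + rb
    return total & MASK64, total >> 64


def adde(ra, rb, ca):
    """RT <- (RA) + (RB) + CA; returns (RT, CA)."""
    total = ra + rb + ca
    return total & MASK64, total >> 64


def subfc(ra, rb):
    """RT <- not(RA) + (RB) + 1, that is (RB) - (RA); returns (RT, CA)."""
    total = (~ra & MASK64) + rb + 1
    return total & MASK64, total >> 64


def subfe(ra, rb, ca):
    """RT <- not(RA) + (RB) + CA; returns (RT, CA)."""
    total = (~ra & MASK64) + rb + ca
    return total & MASK64, total >> 64


def add128(a_hi, a_lo, b_hi, b_lo):
    """a + b on 128-bit values: addc on the low halves, adde on the high."""
    lo, ca = addc(a_lo, b_lo)
    hi, ca = adde(a_hi, b_hi, ca)
    return hi, lo, ca


def sub128(a_hi, a_lo, b_hi, b_lo):
    """a - b on 128-bit values: subfc on the low halves, subfe on the high."""
    lo, ca = subfc(b_lo, a_lo)
    hi, ca = subfe(b_hi, a_hi, ca)
    return hi, lo, ca
```

In this model a final CA of 1 after sub128 means no borrow occurred, mirroring the way the Subtract From instructions form the difference as a sum with the one's complement of (RA).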
If a sequence of these instructions is used to perform extended-precision addition or subtraction, the same mode should be used throughout the sequence.

Negate XO-form

    neg   RT,RA   (OE=0 Rc=0)
    neg.  RT,RA   (OE=0 Rc=1)
    nego  RT,RA   (OE=1 Rc=0)
    nego. RT,RA   (OE=1 Rc=1)

    31 | RT | RA | /// | OE | 104 | Rc
    0    6    11   16    21   22    31

    RT ← ¬(RA) + 1

The sum ¬(RA) + 1 is placed into register RT.

If the processor is in 64-bit mode and register RA contains the most negative 64-bit number (0x8000_0000_0000_0000), the result is the most negative number and, if OE=1, OV is set to 1. Similarly, if the processor is in 32-bit mode and (RA)32:63 contain the most negative 32-bit number (0x8000_0000), the low-order 32 bits of the result contain the most negative 32-bit number and, if OE=1, OV is set to 1.

Special Registers Altered:
    CR0 (if Rc=1)
    SO OV (if OE=1)

Multiply Low Immediate D-form

    mulli RT,RA,SI

    7 | RT | RA | SI
    0   6    11   16    31

    prod0:127 ← (RA) × EXTS(SI)
    RT ← prod64:127

The 64-bit first operand is (RA). The 64-bit second operand is the sign-extended value of the SI field. The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    None

Multiply Low Word XO-form

    mullw   RT,RA,RB   (OE=0 Rc=0)
    mullw.  RT,RA,RB   (OE=0 Rc=1)
    mullwo  RT,RA,RB   (OE=1 Rc=0)
    mullwo. RT,RA,RB   (OE=1 Rc=1)

    31 | RT | RA | RB | OE | 235 | Rc
    0    6    11   16   21   22    31

    RT ← (RA)32:63 × (RB)32:63

The 32-bit operands are the low-order 32 bits of RA and of RB. The 64-bit product of the operands is placed into register RT.

If OE=1 then OV is set to 1 if the product cannot be represented in 32 bits.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    CR0 (if Rc=1)
    SO OV (if OE=1)

Multiply High Word XO-form

    mulhw  RT,RA,RB   (Rc=0)
    mulhw. RT,RA,RB   (Rc=1)

    31 | RT | RA | RB | / | 75 | Rc
    0    6    11   16   21  22   31

    prod0:63 ← (RA)32:63 × (RB)32:63
    RT32:63 ← prod0:31
    RT0:31 ← undefined

The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT32:63. The contents of RT0:31 are undefined.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)

Multiply High Word Unsigned XO-form

    mulhwu  RT,RA,RB   (Rc=0)
    mulhwu. RT,RA,RB   (Rc=1)

    31 | RT | RA | RB | / | 11 | Rc
    0    6    11   16   21  22   31

    prod0:63 ← (RA)32:63 × (RB)32:63
    RT32:63 ← prod0:31
    RT0:31 ← undefined

The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT32:63. The contents of RT0:31 are undefined.

Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)

Programming Note

For mulli and mullw, the low-order 32 bits of the product are the correct 32-bit product for 32-bit mode.

For mulli and mulld, the low-order 64 bits of the product are independent of whether the operands are regarded as signed or unsigned 64-bit integers. For mulli and mullw, the low-order 32 bits of the product are independent of whether the operands are regarded as signed or unsigned 32-bit integers.

Divide Word XO-form

    divw   RT,RA,RB   (OE=0 Rc=0)
    divw.  RT,RA,RB   (OE=0 Rc=1)
    divwo  RT,RA,RB   (OE=1 Rc=0)
    divwo. RT,RA,RB   (OE=1 Rc=1)

    31 | RT | RA | RB | OE | 491 | Rc
    0    6    11   16   21   22    31

Divide Word Unsigned XO-form

    divwu   RT,RA,RB   (OE=0 Rc=0)
    divwu.  RT,RA,RB   (OE=0 Rc=1)
    divwuo  RT,RA,RB   (OE=1 Rc=0)
    divwuo. RT,RA,RB   (OE=1 Rc=1)

    31 | RT | RA | RB | OE | 459 | Rc
    0    6    11   16   21   22    31

Both instructions share the same register transfer description:

    dividend0:31 ← (RA)32:63
    divisor0:31 ← (RB)32:63
    RT32:63 ← dividend ÷ divisor
    RT0:31 ← undefined

Description of divw: The 32-bit dividend is (RA)32:63.
The 32-bit divisor is (RB)32:63. The 32-bit quotient is placed into RT32:63. The contents of RT0:31 are undefined. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < |divisor| if the dividend is nonnegative, and -|divisor| < r ≤ 0 if the dividend is negative.

If an attempt is made to perform any of the divisions

    0x8000_0000 ÷ -1
    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)
    SO OV (if OE=1)

Programming Note

The 32-bit signed remainder of dividing (RA)32:63 by (RB)32:63 can be computed as follows, except in the case that (RA)32:63 = -2³¹ and (RB)32:63 = -1.

    divw  RT,RA,RB   # RT = quotient
    mullw RT,RT,RB   # RT = quotient×divisor
    subf  RT,RT,RA   # RT = remainder

Description of divwu: The 32-bit dividend is (RA)32:63. The 32-bit divisor is (RB)32:63. The 32-bit quotient is placed into RT32:63. The contents of RT0:31 are undefined. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < divisor.

If an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In this case, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)
    SO OV (if OE=1)

Programming Note

The 32-bit unsigned remainder of dividing (RA)32:63 by (RB)32:63 can be computed as follows.

    divwu RT,RA,RB   # RT = quotient
    mullw RT,RT,RB   # RT = quotient×divisor
    subf  RT,RT,RA   # RT = remainder

Divide Word Extended XO-form

    divwe   RT,RA,RB   (OE=0 Rc=0)
    divwe.  RT,RA,RB   (OE=0 Rc=1)
    divweo  RT,RA,RB   (OE=1 Rc=0)
    divweo. RT,RA,RB   (OE=1 Rc=1)

    [Category: Server]
    [Category: Embedded.Phased-In]

    31 | RT | RA | RB | OE | 427 | Rc
    0    6    11   16   21   22    31

Divide Word Extended Unsigned XO-form

    divweu   RT,RA,RB   (OE=0 Rc=0)
    divweu.  RT,RA,RB   (OE=0 Rc=1)
    divweuo  RT,RA,RB   (OE=1 Rc=0)
    divweuo. RT,RA,RB   (OE=1 Rc=1)

    [Category: Server]
    [Category: Embedded.Phased-In]

    31 | RT | RA | RB | OE | 395 | Rc
    0    6    11   16   21   22    31

Both instructions share the same register transfer description:

    dividend0:63 ← (RA)32:63 || ³²0
    divisor0:31 ← (RB)32:63
    RT32:63 ← dividend ÷ divisor
    RT0:31 ← undefined

For both instructions, the 64-bit dividend is (RA)32:63 || ³²0. The 32-bit divisor is (RB)32:63. If the quotient can be represented in 32 bits, it is placed into RT32:63. The contents of RT0:31 are undefined. The remainder is not supplied as a result.

For divwe, both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < |divisor| if the dividend is nonnegative, and -|divisor| < r ≤ 0 if the dividend is negative.

For divweu, both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < divisor.
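The quotient definitions and the three-instruction remainder sequences above can be illustrated with a small Python model. This is only a sketch of the defined behaviour on the low-order 32 bits; the helper names (to_signed32, divw, divwu, remainder_via_mul_sub) are hypothetical and not part of the architecture, and the cases for which the architecture leaves RT undefined are modelled by returning None.

```python
MASK32 = 0xFFFF_FFFF

def to_signed32(x):
    """Interpret the low-order 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def divw(ra, rb):
    """Model of divw: signed quotient truncated toward zero (low 32 bits)."""
    dividend, divisor = to_signed32(ra), to_signed32(rb)
    if divisor == 0 or (dividend == -(1 << 31) and divisor == -1):
        return None  # RT is undefined in these cases
    q = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q
    return q & MASK32

def divwu(ra, rb):
    """Model of divwu: unsigned quotient (low 32 bits)."""
    dividend, divisor = ra & MASK32, rb & MASK32
    if divisor == 0:
        return None  # RT is undefined in this case
    return dividend // divisor

def remainder_via_mul_sub(div, ra, rb):
    """Model of the div / mullw / subf sequence from the Programming Notes."""
    q = div(ra, rb)
    if q is None:
        return None
    prod = (q * (rb & MASK32)) & MASK32   # mullw: low-order 32 bits of product
    return ((ra & MASK32) - prod) & MASK32  # subf RT,RT,RA computes RA - RT
```

In this model the signed remainder takes the sign of the dividend, matching the stated bounds on r, and the same multiply-and-subtract steps serve both the signed and the unsigned case because only the low-order 32 bits of the product are used.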
For divwe, if the quotient cannot be represented in 32 bits, or if an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

For divweu, if (RA) ≥ (RB), or if an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

Special Registers Altered (divwe and divweu):
    CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)
    SO OV (if OE=1)

Programming Note

Unsigned long division of a 64-bit dividend contained in two 32-bit registers by a 32-bit divisor can be computed as follows. The algorithm is shown first, followed by Assembler code that implements the algorithm. The dividend is Dh || Dl, the divisor is Dv, and the quotient and remainder are Q and R respectively, where these variables and all intermediate variables represent unsigned 32-bit integers. It is assumed that Dv > Dh, and that assigning a value to an intermediate variable assigns the low-order 32 bits of the value and ignores any higher-order bits of the value. (In both the algorithm and the Assembler code, "r1" and "r2" refer to "remainder 1" and "remainder 2", rather than to GPRs 1 and 2.)

Algorithm:

    1. q1 ← divweu Dh, Dv
    2. r1 ← -(q1 × Dv)              # remainder of step 1 divide operation (see Note 1)
    3. q2 ← divwu Dl, Dv
    4. r2 ← Dl - (q2 × Dv)          # remainder of step 3 divide operation
    5. Q ← q1 + q2
    6. R ← r1 + r2
    7. if (R < r2) | (R ≥ Dv) then  # (see Note 2)
           Q ← Q + 1                # increment quotient
           R ← R - Dv               # decrement remainder

Assembler Code:

    # Dh in r4, Dl in r5
    # Dv in r6
    divweu r3,r4,r6     # q1
    divwu  r7,r5,r6     # q2
    mullw  r8,r3,r6     # -r1 = q1 * Dv
    mullw  r0,r7,r6     # q2 * Dv
    subf   r10,r0,r5    # r2 = Dl - (q2 * Dv)
    add    r3,r3,r7     # Q = q1 + q2
    subf   r4,r8,r10    # R = r1 + r2
    cmplw  r4,r10       # R < r2 ?
    blt    *+12         # must adjust Q and R if yes
    cmplw  r4,r6        # R ≥ Dv ?
    blt    *+12         # no adjustment needed if R < Dv
    addi   r3,r3,1      # Q = Q + 1
    subf   r4,r6,r4     # R = R - Dv
    # Quotient in r3
    # Remainder in r4

Notes:

    1. The remainder is Dh || ³²0 - (q1 × Dv). Because the remainder must be less than Dv and Dv < 2³², the remainder is representable in 32 bits. Because the low-order 32 bits of Dh || ³²0 are 0s, the remainder is therefore equal to the low-order 32 bits of -(q1 × Dv). Thus assigning -(q1 × Dv) to r1 yields the correct remainder.
    2. R is less than r2 (and also less than r1) if and only if the addition at step 6 carried out of 32 bits, that is, if and only if the correct sum could not be represented in 32 bits, in which case the correct sum is necessarily greater than Dv.
    3. For additional information see the book Hacker's Delight, by Henry S. Warren, Jr., as potentially amended at the web site http://www.hackersdelight.org.

3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit]

Multiply Low Doubleword XO-form

    mulld   RT,RA,RB   (OE=0 Rc=0)
    mulld.  RT,RA,RB   (OE=0 Rc=1)
    mulldo  RT,RA,RB   (OE=1 Rc=0)
    mulldo. RT,RA,RB   (OE=1 Rc=1)

    31 | RT | RA | RB | OE | 233 | Rc
    0    6    11   16   21   22    31

    prod0:127 ← (RA) × (RB)
    RT ← prod64:127

The 64-bit operands are (RA) and (RB). The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

If OE=1 then OV is set to 1 if the product cannot be represented in 64 bits.

Multiply High Doubleword XO-form

    mulhd  RT,RA,RB   (Rc=0)
    mulhd. RT,RA,RB   (Rc=1)

    31 | RT | RA | RB | / | 73 | Rc
    0    6    11   16   21  22   31

    prod0:127 ← (RA) × (RB)
    RT ← prod0:63

The 64-bit operands are (RA) and (RB). The high-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.
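The long-division Programming Note given with divweu above can be checked with a short Python model. This is a sketch under stated assumptions: divweu and divwu here are hypothetical helper functions that model only the defined results of those instructions on 32-bit operands, long_divide is an illustrative name, and every assignment is truncated to 32 bits as the note requires.

```python
MASK32 = 0xFFFF_FFFF

def divweu(ra, rb):
    """Model of divweu: ((RA)32:63 || 32 zero bits) / (RB)32:63, unsigned."""
    ra &= MASK32
    rb &= MASK32
    if rb == 0 or ra >= rb:
        return None  # quotient not representable in 32 bits, or divide by 0
    return (ra << 32) // rb

def divwu(ra, rb):
    """Model of divwu: unsigned 32-bit quotient."""
    ra &= MASK32
    rb &= MASK32
    return None if rb == 0 else ra // rb

def long_divide(dh, dl, dv):
    """Divide the 64-bit value Dh || Dl by Dv, assuming Dv > Dh."""
    assert 0 <= dh < dv <= MASK32 and 0 <= dl <= MASK32
    q1 = divweu(dh, dv)                 # step 1
    r1 = (-(q1 * dv)) & MASK32          # step 2: low 32 bits of -(q1 x Dv)
    q2 = divwu(dl, dv)                  # step 3
    r2 = (dl - q2 * dv) & MASK32        # step 4
    q = (q1 + q2) & MASK32              # step 5
    r = (r1 + r2) & MASK32              # step 6: may carry out of 32 bits
    if r < r2 or r >= dv:               # step 7: carry occurred, or R >= Dv
        q = (q + 1) & MASK32
        r = (r - dv) & MASK32
    return q, r
```

Comparing long_divide(dh, dl, dv) with Python's divmod((dh << 32) | dl, dv) for operands satisfying Dv > Dh is a convenient way to convince oneself that the single adjustment in step 7 is sufficient.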
Special Registers Altered (mulhd):
    CR0 (if Rc=1)

For mulld, both operands and the product are interpreted as signed integers.

Special Registers Altered (mulld):
    CR0 (if Rc=1)
    SO OV (if OE=1)

Programming Note

The XO-form Multiply instructions may execute faster on some implementations if RB contains the operand having the smaller absolute value.

Multiply High Doubleword Unsigned XO-form

    mulhdu  RT,RA,RB   (Rc=0)
    mulhdu. RT,RA,RB   (Rc=1)

    31 | RT | RA | RB | / | 9 | Rc
    0    6    11   16   21  22  31

    prod0:127 ← (RA) × (RB)
    RT ← prod0:63

The 64-bit operands are (RA) and (RB). The high-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero.

Special Registers Altered:
    CR0 (if Rc=1)

Divide Doubleword XO-form

    divd   RT,RA,RB   (OE=0 Rc=0)
    divd.  RT,RA,RB   (OE=0 Rc=1)
    divdo  RT,RA,RB   (OE=1 Rc=0)
    divdo. RT,RA,RB   (OE=1 Rc=1)

    31 | RT | RA | RB | OE | 489 | Rc
    0    6    11   16   21   22    31

    dividend0:63 ← (RA)
    divisor0:63 ← (RB)
    RT ← dividend ÷ divisor

The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit quotient is placed into register RT. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < |divisor| if the dividend is nonnegative, and -|divisor| < r ≤ 0 if the dividend is negative.

If an attempt is made to perform any of the divisions

    0x8000_0000_0000_0000 ÷ -1
    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0 (if Rc=1)
    SO OV (if OE=1)

Programming Note

The 64-bit signed remainder of dividing (RA) by (RB) can be computed as follows, except in the case that (RA) = -2⁶³ and (RB) = -1.

    divd  RT,RA,RB   # RT = quotient
    mulld RT,RT,RB   # RT = quotient×divisor
    subf  RT,RT,RA   # RT = remainder

Divide Doubleword Unsigned XO-form

    divdu   RT,RA,RB   (OE=0 Rc=0)
    divdu.  RT,RA,RB   (OE=0 Rc=1)
    divduo  RT,RA,RB   (OE=1 Rc=0)
    divduo. RT,RA,RB   (OE=1 Rc=1)

    31 | RT | RA | RB | OE | 457 | Rc
    0    6    11   16   21   22    31

    dividend0:63 ← (RA)
    divisor0:63 ← (RB)
    RT ← dividend ÷ divisor

The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit quotient is placed into register RT. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < divisor.

If an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In this case, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0 (if Rc=1)
    SO OV (if OE=1)

Programming Note

The 64-bit unsigned remainder of dividing (RA) by (RB) can be computed as follows.

    divdu RT,RA,RB   # RT = quotient
    mulld RT,RT,RB   # RT = quotient×divisor
    subf  RT,RT,RA   # RT = remainder

Divide Doubleword Extended XO-form

    divde   RT,RA,RB   (OE=0 Rc=0)
    divde.  RT,RA,RB   (OE=0 Rc=1)
    divdeo  RT,RA,RB   (OE=1 Rc=0)
    divdeo. RT,RA,RB   (OE=1 Rc=1)

    [Category: Server]
    [Category: Embedded.Phased-In]

    31 | RT | RA | RB | OE | 425 | Rc
    0    6    11   16   21   22    31

Divide Doubleword Extended Unsigned XO-form

    divdeu   RT,RA,RB   (OE=0 Rc=0)
    divdeu.  RT,RA,RB   (OE=0 Rc=1)
    divdeuo  RT,RA,RB   (OE=1 Rc=0)
    divdeuo. RT,RA,RB   (OE=1 Rc=1)

    [Category: Server]
    [Category: Embedded.Phased-In]

    31 | RT | RA | RB | OE | 393 | Rc
    0    6    11   16   21   22    31

Both instructions share the same register transfer description:

    dividend0:127 ← (RA) || ⁶⁴0
    divisor0:63 ← (RB)
    RT ← dividend ÷ divisor

Description of divde: The 128-bit dividend is (RA) || ⁶⁴0.
The 64-bit divisor is (RB). If the quotient can be represented in 64 bits, it is placed into register RT. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < |divisor| if the dividend is nonnegative, and -|divisor| < r ≤ 0 if the dividend is negative.

If the quotient cannot be represented in 64 bits, or if an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0 (if Rc=1)
    SO OV (if OE=1)

Description of divdeu: The 128-bit dividend is (RA) || ⁶⁴0. The 64-bit divisor is (RB). If the quotient can be represented in 64 bits, it is placed into register RT. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < divisor.

If (RA) ≥ (RB), or if an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0 (if Rc=1)
    SO OV (if OE=1)

Programming Note

Unsigned long division of a 128-bit dividend contained in two 64-bit registers by a 64-bit divisor can be accomplished using the technique described in the Programming Note with the divweu instruction description: divd[e]u would be used instead of divw[e]u (and cmpld instead of cmplw, etc.).

3.3.9 Fixed-Point Compare Instructions

The fixed-point Compare instructions compare the contents of register RA with (1) the sign-extended value of the SI field, (2) the zero-extended value of the UI field, or (3) the contents of register RB. The comparison is signed for cmpi and cmp, and unsigned for cmpli and cmpl.

The L field controls whether the operands are treated as 64-bit or 32-bit quantities, as follows:

    L    Operand length
    0    32-bit operands
    1    64-bit operands

L=1 is part of Category: 64-Bit.

When the operands are treated as 32-bit signed quantities, bit 32 of the register (RA or RB) is the sign bit.

The Compare instructions set one bit in the leftmost three bits of the designated CR field to 1, and the other two to 0. XERSO is copied to bit 3 of the designated CR field.

The CR field is set as follows.

    Bit  Name  Description
    0    LT    (RA) < SI or (RB) (signed comparison)
               (RA) <u UI or (RB) (unsigned comparison)
    1    GT    (RA) > SI or (RB) (signed comparison)
               (RA) >u UI or (RB) (unsigned comparison)
    2    EQ    (RA) = SI, UI, or (RB)
    3    SO    Summary Overflow from the XER

Extended mnemonics for compares

A set of extended mnemonics is provided so that compares can be coded with the operand length as part of the mnemonic rather than as a numeric operand. Some of these are shown as examples with the Compare instructions. See Appendix E for additional extended mnemonics.

Compare Immediate D-form

    cmpi BF,L,RA,SI

    11 | BF | / | L | RA | SI
    0    6    9   10  11   16    31

    if L = 0 then a ← EXTS((RA)32:63)
             else a ← (RA)
    if      a < EXTS(SI) then c ← 0b100
    else if a > EXTS(SI) then c ← 0b010
    else                      c ← 0b001
    CR4×BF+32:4×BF+35 ← c || XERSO

The contents of register RA ((RA)32:63 sign-extended to 64 bits if L=0) are compared with the sign-extended value of the SI field, treating the operands as signed integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:
    CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare Immediate:

    Extended:              Equivalent to:
    cmpdi Rx,value         cmpi 0,1,Rx,value
    cmpwi cr3,Rx,value     cmpi 3,0,Rx,value

Compare X-form

    cmp BF,L,RA,RB

    31 | BF | / | L | RA | RB | 0 | /
    0    6    9   10  11   16   21  31

    if L = 0 then a ← EXTS((RA)32:63)
                  b ← EXTS((RB)32:63)
             else a ← (RA)
                  b ← (RB)
    if      a < b then c ← 0b100
    else if a > b then c ← 0b010
    else               c ← 0b001
    CR4×BF+32:4×BF+35 ← c || XERSO

The contents of register RA ((RA)32:63 if L=0) are compared with the contents of register RB ((RB)32:63 if L=0), treating the operands as signed integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:
    CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare:

    Extended:              Equivalent to:
    cmpd Rx,Ry             cmp 0,1,Rx,Ry
    cmpw cr3,Rx,Ry         cmp 3,0,Rx,Ry

Compare Logical Immediate D-form

    cmpli BF,L,RA,UI

    10 | BF | / | L | RA | UI
    0    6    9   10  11   16    31

    if L = 0 then a ← ³²0 || (RA)32:63
             else a ← (RA)
    if      a <u (⁴⁸0 || UI) then c ← 0b100
    else if a >u (⁴⁸0 || UI) then c ← 0b010
    else                          c ← 0b001
    CR4×BF+32:4×BF+35 ← c || XERSO

The contents of register RA ((RA)32:63 zero-extended to 64 bits if L=0) are compared with ⁴⁸0 || UI, treating the operands as unsigned integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:
    CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare Logical Immediate:

    Extended:              Equivalent to:
    cmpldi Rx,value        cmpli 0,1,Rx,value
    cmplwi cr3,Rx,value    cmpli 3,0,Rx,value

Compare Logical X-form

    cmpl BF,L,RA,RB

    31 | BF | / | L | RA | RB | 32 | /
    0    6    9   10  11   16   21   31

    if L = 0 then a ← ³²0 || (RA)32:63
                  b ← ³²0 || (RB)32:63
             else a ← (RA)
                  b ← (RB)
    if      a <u b then c ← 0b100
    else if a >u b then c ← 0b010
    else                c ← 0b001
    CR4×BF+32:4×BF+35 ← c || XERSO

The contents of register RA ((RA)32:63 if L=0) are compared with the contents of register RB ((RB)32:63 if L=0), treating the operands as unsigned integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:
    CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare Logical:

    Extended:              Equivalent to:
    cmpld Rx,Ry            cmpl 0,1,Rx,Ry
    cmplw cr3,Rx,Ry        cmpl 3,0,Rx,Ry

3.3.10 Fixed-Point Trap Instructions

The Trap instructions are provided to test for a specified set of conditions. If any of the conditions tested by a Trap instruction are met, the system trap handler is invoked. If none of the tested conditions are met, instruction execution continues normally.

The contents of register RA are compared with either the sign-extended value of the SI field or the contents of register RB, depending on the Trap instruction. For tdi and td, the entire contents of RA (and RB) participate in the comparison; for twi and tw, only the contents of the low-order 32 bits of RA (and RB) participate in the comparison.

This comparison results in five conditions which are ANDed with TO. If the result is not 0 the system trap handler is invoked. These conditions are as follows.

    TO Bit  ANDed with Condition
    0       Less Than, using signed comparison
    1       Greater Than, using signed comparison
    2       Equal
    3       Less Than, using unsigned comparison
    4       Greater Than, using unsigned comparison

Extended mnemonics for traps

A set of extended mnemonics is provided so that traps can be coded with the condition as part of the mnemonic rather than as a numeric operand. Some of these are shown as examples with the Trap instructions. See Appendix E for additional extended mnemonics.

Trap Word Immediate D-form

    twi TO,RA,SI

    3 | TO | RA | SI
    0   6    11   16    31

    a ← EXTS((RA)32:63)
    if (a < EXTS(SI)) & TO0 then TRAP
    if (a > EXTS(SI)) & TO1 then TRAP
    if (a = EXTS(SI)) & TO2 then TRAP
    if (a <u EXTS(SI)) & TO3 then TRAP
    if (a >u EXTS(SI)) & TO4 then TRAP

Trap Word X-form

    tw TO,RA,RB

    31 | TO | RA | RB | 4 | /
    0    6    11   16   21  31

    a ← EXTS((RA)32:63)
    b ← EXTS((RB)32:63)
    if (a < b) & TO0 then TRAP
    if (a > b) & TO1 then TRAP
    if (a = b) & TO2 then TRAP
    if (a <u b) & TO3 then TRAP
    if (a >u b) & TO4 then TRAP

Description of twi: The contents of RA32:63 are compared with the sign-extended value of the SI field. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

Description of tw: The contents of RA32:63 are compared with the contents of RB32:63. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.
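As an informal illustration of how the five comparison conditions are ANDed with the TO field, the following Python sketch models the trap decision for tw. The names exts32 and tw_traps are hypothetical; the model only reports whether the trap handler would be invoked and does not model the handler or context synchronization.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def exts32(x):
    """Sign-extend the low-order 32 bits of x (returns a Python int)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def tw_traps(to, ra, rb):
    """Return True if 'tw TO,RA,RB' would invoke the system trap handler."""
    a, b = exts32(ra), exts32(rb)
    ua, ub = a & MASK64, b & MASK64      # same values viewed as unsigned
    conditions = [
        a < b,      # TO bit 0: Less Than, signed
        a > b,      # TO bit 1: Greater Than, signed
        a == b,     # TO bit 2: Equal
        ua < ub,    # TO bit 3: Less Than, unsigned
        ua > ub,    # TO bit 4: Greater Than, unsigned
    ]
    # TO is a 5-bit field; bit 0 is its most significant bit.
    return any(cond and ((to >> (4 - i)) & 1) == 1
               for i, cond in enumerate(conditions))
```

With this model, a TO value of 4 selects only the Equal condition, 5 selects Equal plus unsigned Greater Than, and 31 selects every condition so that the trap is unconditional; twi can be modelled the same way by passing the sign-extended SI value as the second operand.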
For both twi and tw, if the trap conditions are met, the instruction is context synchronizing (see Book III).

Special Registers Altered (twi and tw):
    None

Extended Mnemonics:

Examples of extended mnemonics for Trap Word Immediate:

    Extended:            Equivalent to:
    twgti Rx,value       twi 8,Rx,value
    twllei Rx,value      twi 6,Rx,value

Examples of extended mnemonics for Trap Word:

    Extended:            Equivalent to:
    tweq Rx,Ry           tw 4,Rx,Ry
    twlge Rx,Ry          tw 5,Rx,Ry
    trap                 tw 31,0,0

3.3.10.1 64-bit Fixed-Point Trap Instructions [Category: 64-Bit]

Trap Doubleword Immediate D-form

    tdi TO,RA,SI

    2 | TO | RA | SI
    0   6    11   16    31

    a ← (RA)
    b ← EXTS(SI)
    if (a < b) & TO0 then TRAP
    if (a > b) & TO1 then TRAP
    if (a = b) & TO2 then TRAP
    if (a <u b) & TO3 then TRAP
    if (a >u b) & TO4 then TRAP

The contents of register RA are compared with the sign-extended value of the SI field. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

If the trap conditions are met, this instruction is context synchronizing (see Book III).

Special Registers Altered:
    None

Extended Mnemonics:

Examples of extended mnemonics for Trap Doubleword Immediate:

    Extended:            Equivalent to:
    tdlti Rx,value       tdi 16,Rx,value
    tdnei Rx,value       tdi 24,Rx,value

Trap Doubleword X-form

    td TO,RA,RB

    31 | TO | RA | RB | 68 | /
    0    6    11   16   21   31

    a ← (RA)
    b ← (RB)
    if (a < b) & TO0 then TRAP
    if (a > b) & TO1 then TRAP
    if (a = b) & TO2 then TRAP
    if (a <u b) & TO3 then TRAP
    if (a >u b) & TO4 then TRAP

The contents of register RA are compared with the contents of register RB. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

If the trap conditions are met, this instruction is context synchronizing (see Book III).

Special Registers Altered:
    None

Extended Mnemonics:

Examples of extended mnemonics for Trap Doubleword:

    Extended:            Equivalent to:
    tdge Rx,Ry           td 12,Rx,Ry
    tdlnl Rx,Ry          td 5,Rx,Ry

3.3.11 Fixed-Point Select [Category: Phased-In (sV2.06)]

Integer Select A-form

    isel RT,RA,RB,BC

    31 | RT | RA | RB | BC | 15 | /
    0    6    11   16   21   26   31

    if RA=0 then a ← 0 else a ← (RA)
    if CRBC+32=1 then RT ← a
    else RT ← (RB)

If the contents of bit BC+32 of the Condition Register are equal to 1, then the contents of register RA (or 0) are placed into register RT. Otherwise, the contents of register RB are placed into register RT.

Special Registers Altered:
    None

3.3.12 Fixed-Point Logical Instructions

The Logical instructions perform bit-parallel operations on 64-bit operands.

The X-form Logical instructions with Rc=1, and the D-form Logical instructions andi. and andis., set the first three bits of CR Field 0 as described in Section 3.3.7, "Other Fixed-Point Instructions" on page 61. The Logical instructions do not change the SO, OV, and CA bits in the XER.

Extended mnemonics for logical operations

Extended mnemonics are provided that generate two different types of "no-ops" (instructions that do nothing). The first type is the preferred form, which is optimized to minimize its use of the processor's execution resources. This form is based on the OR Immediate instruction. The second type is the executed form, which is intended to consume the same amount of the processor's execution resources as if it were not a no-op. This form is based on the XOR Immediate instruction. (There are also no-ops which affect program priority, for which extended mnemonics have not been assigned.)

Extended mnemonics are provided that use the OR and NOR instructions to copy the contents of one register to another, with and without complementing. These are shown as examples with the two instructions.

See Appendix E, "Assembler Extended Mnemonics" on page 627 for additional extended mnemonics.

AND Immediate D-form

    andi. RA,RS,UI

    28 | RS | RA | UI
    0    6    11   16    31

    RA ← (RS) & (⁴⁸0 || UI)

The contents of register RS are ANDed with ⁴⁸0 || UI and the result is placed into register RA.

Special Registers Altered:
    CR0

AND Immediate Shifted D-form

    andis. RA,RS,UI

    29 | RS | RA | UI
    0    6    11   16    31

    RA ← (RS) & (³²0 || UI || ¹⁶0)

The contents of register RS are ANDed with ³²0 || UI || ¹⁶0 and the result is placed into register RA.

Special Registers Altered:
    CR0

OR Immediate D-form

    ori RA,RS,UI

    24 | RS | RA | UI
    0    6    11   16    31

    RA ← (RS) | (⁴⁸0 || UI)

The contents of register RS are ORed with ⁴⁸0 || UI and the result is placed into register RA.

The preferred "no-op" (an instruction that does nothing) is:

    ori 0,0,0

Special Registers Altered:
    None

Extended Mnemonics:

Example of extended mnemonics for OR Immediate:

    Extended:            Equivalent to:
    no-op                ori 0,0,0

OR Immediate Shifted D-form

    oris RA,RS,UI

    25 | RS | RA | UI
    0    6    11   16    31

    RA ← (RS) | (³²0 || UI || ¹⁶0)

The contents of register RS are ORed with ³²0 || UI || ¹⁶0 and the result is placed into register RA.

Special Registers Altered:
    None

XOR Immediate Shifted D-form

    xoris RA,RS,UI

    27 | RS | RA | UI
    0    6    11   16    31

    RA ← (RS) XOR (³²0 || UI || ¹⁶0)

The contents of register RS are XORed with ³²0 || UI || ¹⁶0 and the result is placed into register RA.

XOR Immediate D-form

    xori RA,RS,UI

    26 | RS | RA | UI
    0    6    11   16    31

    RA ← (RS) XOR (⁴⁸0 || UI)

The contents of register RS are XORed with ⁴⁸0 || UI and the result is placed into register RA.
For xori, the executed form of a "no-op" (an instruction that does nothing, but consumes execution resources nevertheless) is:

    xori 0,0,0

Special Registers Altered (xori and xoris):
    None

Extended Mnemonics:

Example of extended mnemonics for XOR Immediate:

    Extended:            Equivalent to:
    xnop                 xori 0,0,0

Programming Note

The executed form of no-op should be used only when the intent is to alter the timing of a program.

AND X-form

    and  RA,RS,RB   (Rc=0)
    and. RA,RS,RB   (Rc=1)

    31 | RS | RA | RB | 28 | Rc
    0    6    11   16   21   31

    RA ← (RS) & (RB)

The contents of register RS are ANDed with the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

OR X-form

    or  RA,RS,RB   (Rc=0)
    or. RA,RS,RB   (Rc=1)

    31 | RS | RA | RB | 444 | Rc
    0    6    11   16   21    31

    RA ← (RS) | (RB)

The contents of register RS are ORed with the contents of register RB and the result is placed into register RA.

Some forms of or Rx,Rx,Rx provide special functions. See Section 3.2, ""or" Instruction", in Book II.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for OR:

    Extended:            Equivalent to:
    mr Rx,Ry             or Rx,Ry,Ry

Programming Note

Warning: Some forms of or Rx,Rx,Rx have undesired side effects. See Section 3.2, ""or" Instruction", in Book II for details. If a no-op is needed, the preferred no-op (ori 0,0,0) should be used.

XOR X-form

    xor  RA,RS,RB   (Rc=0)
    xor. RA,RS,RB   (Rc=1)

    31 | RS | RA | RB | 316 | Rc
    0    6    11   16   21    31

    RA ← (RS) ⊕ (RB)

The contents of register RS are XORed with the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

NAND X-form

    nand  RA,RS,RB   (Rc=0)
    nand. RA,RS,RB   (Rc=1)

    31 | RS | RA | RB | 476 | Rc
    0    6    11   16   21    31

    RA ← ¬((RS) & (RB))

The contents of register RS are ANDed with the contents of register RB and the complemented result is placed into register RA.
Special Registers Altered: CR0 (if Rc=1) Programming Note nand or nor with RS=RB can be used to obtain the one's complement. 80 Power ISATM Book I Version 2.06 NOR X-form Equivalent X-form nor RA,RS,RB (Rc=0) eqv RA,RS,RB (Rc=0) nor. RA,RS,RB (Rc=1) eqv. RA,RS,RB (Rc=1) 31 RS RA RB 124 Rc 31 RS RA RB 284 Rc 0 6 11 16 21 31 0 6 11 16 21 31 RA ¬((RS) | (RB)) RA (RS) (RB) The contents of register RS are ORed with the contents The contents of register RS are XORed with the con- of register RB and the complemented result is placed tents of register RB and the complemented result is into register RA. placed into register RA. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) Extended Mnemonics: Example of extended mnemonics for NOR: Extended: Equivalent to: not Rx,Ry nor Rx,Ry,Ry AND with Complement X-form OR with Complement X-form andc RA,RS,RB (Rc=0) orc RA,RS,RB (Rc=0) andc. RA,RS,RB (Rc=1) orc. RA,RS,RB (Rc=1) 31 RS RA RB 60 Rc 31 RS RA RB 412 Rc 0 6 11 16 21 31 0 6 11 16 21 31 RA (RS) & ¬(RB) RA (RS) | ¬(RB) The contents of register RS are ANDed with the com- The contents of register RS are ORed with the comple- plement of the contents of register RB and the result is ment of the contents of register RB and the result is placed into register RA. placed into register RA. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) Extend Sign Byte X-form Extend Sign Halfword X-form extsb RA,RS (Rc=0) extsh RA,RS (Rc=0) extsb. RA,RS (Rc=1) extsh. RA,RS (Rc=1) 31 RS RA /// 954 Rc 31 RS RA /// 922 Rc 0 6 11 16 21 31 0 6 11 16 21 31 s (RS)56 s (RS)48 RA56:63 (RS)56:63 RA48:63 (RS)48:63 RA0:55 56s RA0:47 48s (RS)56:63 are placed into RA56:63. RA0:55 are filled with (RS)48:63 are placed into RA48:63. RA0:47 are filled with a copy of (RS)56. a copy of (RS)48. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) Count Leading Zeros Word X-form cntlzw RA,RS (Rc=0) Chapter 3. 
    cntlzw.  RA,RS   (Rc=1)

    31    RS    RA    ///   26    Rc
    0     6     11    16    21    31

    n ← 32
    do while n < 64
       if (RS)n = 1 then leave
       n ← n + 1
    RA ← n - 32

A count of the number of consecutive zero bits starting at bit 32 of register RS is placed into register RA. This number ranges from 0 to 32, inclusive.

If Rc=1, CR Field 0 is set to reflect the result.

Special Registers Altered:
    CR0 (if Rc=1)

Programming Note
    For both Count Leading Zeros instructions, if Rc=1 then LT is set to 0 in CR Field 0.

Compare Bytes X-form

    cmpb   RA,RS,RB

    31    RS    RA    RB    508   /
    0     6     11    16    21    31

    do n = 0 to 7
       if RS8×n:8×n+7 = (RB)8×n:8×n+7 then
          RA8×n:8×n+7 ← ⁸1
       else
          RA8×n:8×n+7 ← ⁸0

Each byte of the contents of register RS is compared to each corresponding byte of the contents in register RB. If they are equal, the corresponding byte in RA is set to 0xFF. Otherwise the corresponding byte in RA is set to 0x00.

Special Registers Altered:
    None

Population Count Bytes X-form

    popcntb   RA, RS

    31    RS    RA    ///   122   /
    0     6     11    16    21    31

    do i = 0 to 7
       n ← 0
       do j = 0 to 7
          if (RS)(i×8)+j = 1 then
             n ← n+1
       RA(i×8):(i×8)+7 ← n

A count of the number of one bits in each byte of register RS is placed into the corresponding byte of register RA. This number ranges from 0 to 8, inclusive.

Special Registers Altered:
    None

Population Count Words X-form

    popcntw   RA, RS
    [Category: Server]
    [Category: Embedded.Phased-In]

    31    RS    RA    ///   378   /
    0     6     11    16    21    31

    do i = 0 to 1
       n ← 0
       do j = 0 to 31
          if (RS)(i×32)+j = 1 then
             n ← n+1
       RA(i×32):(i×32)+31 ← n

A count of the number of one bits in each word of register RS is placed into the corresponding word of register RA. This number ranges from 0 to 32, inclusive.

Special Registers Altered:
    None
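The counting and byte-compare operations described above can be modelled in a few lines of Python. This is only an illustrative sketch, not part of the architecture: the function names mirror the mnemonics, registers are modelled as 64-bit unsigned integers, and the helper bit() is a hypothetical convenience that applies the bit numbering used in this Book (bit 0 is the most significant bit).

```python
MASK64 = (1 << 64) - 1  # a GPR is modelled as a 64-bit unsigned integer


def bit(value, i):
    """Bit i of a 64-bit value, where bit 0 is the most significant bit."""
    return (value >> (63 - i)) & 1


def cntlzw(rs):
    """Count consecutive zero bits starting at bit 32 (result 0..32)."""
    n = 32
    while n < 64 and bit(rs, n) == 0:
        n += 1
    return n - 32


def cmpb(rs, rb):
    """Set each result byte to 0xFF where the bytes of rs and rb match, else 0x00."""
    ra = 0
    for n in range(8):
        shift = 56 - 8 * n  # byte n occupies bits 8n:8n+7
        if (rs >> shift) & 0xFF == (rb >> shift) & 0xFF:
            ra |= 0xFF << shift
    return ra


def popcntb(rs):
    """Place the count of one bits of each byte into the corresponding result byte."""
    ra = 0
    for i in range(8):
        shift = 56 - 8 * i
        ra |= bin((rs >> shift) & 0xFF).count("1") << shift
    return ra


def popcntw(rs):
    """Place the count of one bits of each word into the corresponding result word."""
    high = bin((rs >> 32) & 0xFFFFFFFF).count("1")
    low = bin(rs & 0xFFFFFFFF).count("1")
    return (high << 32) | low
```

Note that cntlzw ignores the high-order word of RS entirely, which is why the model starts scanning at bit 32.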
Parity Doubleword X-form

    prtyd   RA,RS
    [Category: 64-bit]

    31    RS    RA    ///   186   /
    0     6     11    16    21    31

    s ← 0
    do i = 0 to 7
       s ← s ⊕ (RS)i×8+7
    RA ← ⁶³0 || s

The least significant bit in each byte of the contents of register RS is examined. If there is an odd number of one bits the value 1 is placed into register RA; otherwise the value 0 is placed into register RA.

Special Registers Altered:
    None

Parity Word X-form

    prtyw   RA,RS

    31    RS    RA    ///   154   /
    0     6     11    16    21    31

    s ← 0
    t ← 0
    do i = 0 to 3
       s ← s ⊕ (RS)i×8+7
    do i = 4 to 7
       t ← t ⊕ (RS)i×8+7
    RA0:31 ← ³¹0 || s
    RA32:63 ← ³¹0 || t

The least significant bit in each byte of (RS)0:31 is examined. If there is an odd number of one bits the value 1 is placed into RA0:31; otherwise the value 0 is placed into RA0:31. The least significant bit in each byte of (RS)32:63 is examined. If there is an odd number of one bits the value 1 is placed into RA32:63; otherwise the value 0 is placed into RA32:63.

Special Registers Altered:
    None

Programming Note
    The Parity instructions are designed to be used in conjunction with the Population Count instruction to compute the parity of words or a doubleword. The parity of the upper and lower words in (RS) can be computed as follows.

        popcntb RA, RS
        prtyw   RA, RA

    The parity of (RS) can be computed as follows.

        popcntb RA, RS
        prtyd   RA, RA

3.3.12.1 64-bit Fixed-Point Logical Instructions [Category: 64-Bit]

Extend Sign Word X-form

    extsw   RA,RS   (Rc=0)
    extsw.  RA,RS   (Rc=1)

    31    RS    RA    ///   986   Rc
    0     6     11    16    21    31

    s ← (RS)32
    RA32:63 ← (RS)32:63
    RA0:31 ← ³²s

(RS)32:63 are placed into RA32:63. RA0:31 are filled with a copy of (RS)32.

Special Registers Altered:
    CR0 (if Rc=1)

Population Count Doubleword X-form

    popcntd   RA, RS
    [Category: Server.64-bit]
    [Category: Embedded.64-bit.Phased-In]

    31    RS    RA    ///   506   /
    0     6     11    16    21    31

    n ← 0
    do i = 0 to 63
       if (RS)i = 1 then
          n ← n+1
    RA ← n

A count of the number of one bits in register RS is placed into register RA. This number ranges from 0 to 64, inclusive.

Special Registers Altered:
    None

Count Leading Zeros Doubleword X-form

    cntlzd   RA,RS   (Rc=0)
    cntlzd.  RA,RS   (Rc=1)

    31    RS    RA    ///   58    Rc
    0     6     11    16    21    31

    n ← 0
    do while n < 64
       if (RS)n = 1 then leave
       n ← n + 1
    RA ← n

A count of the number of consecutive zero bits starting at bit 0 of register RS is placed into register RA. This number ranges from 0 to 64, inclusive.

If Rc=1, CR Field 0 is set to reflect the result.

Special Registers Altered:
    CR0 (if Rc=1)

Bit Permute Doubleword X-form

    bpermd   RA,RS,RB
    [Category: Embedded.Phased-in, Server]

    31    RS    RA    RB    252   /
    0     6     11    16    21    31

    For i = 0 to 7
       index ← (RS)8*i:8*i+7
       If index < 64
          then perm_i ← (RB)index
          else perm_i ← 0
    RA ← ⁵⁶0 || perm0:7

Eight permuted bits are produced. For each permuted bit i where i ranges from 0 to 7 and for each byte i of RS, do the following.

    If byte i of RS is less than 64, permuted bit i is set to the bit of RB specified by byte i of RS; otherwise permuted bit i is set to 0.

The permuted bits are placed in the least-significant byte of RA, and the remaining bits are filled with 0s.

Special Registers Altered:
    None

Programming Note
    The fact that the permuted bit is 0 if the corresponding index value exceeds 63 permits the permuted bits to be selected from a 128-bit quantity, using a single index register. For example, assume that the 128-bit quantity Q, from which the permuted bits are to be selected, is in registers r2 (high-order 64 bits of Q) and r3 (low-order 64 bits of Q), that the index values are in register r1, with each byte of r1 containing a value in the range 0:127, and that each byte of register r4 contains the value 64. The following code sequence selects eight permuted bits from Q and places them into the low-order byte of r6.

        bpermd r6,r1,r2   # select from high-order half of Q
        xor    r0,r1,r4   # adjust index values
        bpermd r5,r0,r3   # select from low-order half of Q
        or     r6,r6,r5   # merge the two selections

3.3.13 Fixed-Point Rotate and Shift Instructions

The Fixed-Point Facility performs rotation operations on data from a GPR and returns the result, or a portion of the result, to a GPR.

The rotation operations rotate a 64-bit quantity left by a specified number of bit positions. Bits that exit from position 0 enter at position 63.

Two types of rotation operation are supported.

For the first type, denoted rotate64 or ROTL64, the value rotated is the given 64-bit value. The rotate64 operation is used to rotate a given 64-bit quantity.

For the second type, denoted rotate32 or ROTL32, the value rotated consists of two copies of bits 32:63 of the given 64-bit value, one copy in bits 0:31 and the other in bits 32:63. The rotate32 operation is used to rotate a given 32-bit quantity.

The Rotate and Shift instructions employ a mask generator. The mask is 64 bits long, and consists of 1-bits from a start bit, mstart, through and including a stop bit, mstop, and 0-bits elsewhere. The values of mstart and mstop range from 0 to 63. If mstart > mstop, the 1-bits wrap around from position 63 to position 0. Thus the mask is formed as follows:

    if mstart ≤ mstop then
       mask_mstart:mstop = ones
       mask_all other bits = zeros
    else
       mask_mstart:63 = ones
       mask_0:mstop = ones
       mask_all other bits = zeros

There is no way to specify an all-zero mask.

For instructions that use the rotate32 operation, the mask start and stop positions are always in the low-order 32 bits of the mask.

The use of the mask is described in following sections.

The Rotate and Shift instructions with Rc=1 set the first three bits of CR field 0 as described in Section 3.3.7, "Other Fixed-Point Instructions" on page 61. Rotate and Shift instructions do not change the OV and SO bits. Rotate and Shift instructions, except algebraic right shifts, do not change the CA bit.

Extended mnemonics for rotates and shifts

The Rotate and Shift instructions, while powerful, can be complicated to code (they have up to five operands). A set of extended mnemonics is provided that allow simpler coding of often-used functions such as clearing the leftmost or rightmost bits of a register, left justifying or right justifying an arbitrary field, and performing simple rotates and shifts. Some of these are shown as examples with the Rotate instructions. See Appendix E, "Assembler Extended Mnemonics" on page 627 for additional extended mnemonics.

3.3.13.1 Fixed-Point Rotate Instructions

These instructions rotate the contents of a register. The result of the rotation is

    - inserted into the target register under control of a mask (if a mask bit is 1 the associated bit of the rotated data is placed into the target register, and if the mask bit is 0 the associated bit in the target register remains unchanged); or
    - ANDed with a mask before being placed into the target register.

The Rotate Left instructions allow right-rotation of the contents of a register to be performed (in concept) by a left-rotation of 64-n, where n is the number of bits by which to rotate right. They allow right-rotation of the contents of the low-order 32 bits of a register to be performed (in concept) by a left-rotation of 32-n, where n is the number of bits by which to rotate right.

Rotate Left Word Immediate then AND with Mask M-form

    rlwinm   RA,RS,SH,MB,ME   (Rc=0)
    rlwinm.  RA,RS,SH,MB,ME   (Rc=1)

    21    RS    RA    SH    MB    ME    Rc
    0     6     11    16    21    26    31

    n ← SH
    r ← ROTL32((RS)32:63, n)
    m ← MASK(MB+32, ME+32)
    RA ← r & m

The contents of register RS are rotated32 left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Word Immediate then AND with Mask:

    Extended:            Equivalent to:
    extlwi Rx,Ry,n,b     rlwinm Rx,Ry,b,0,n-1
    srwi Rx,Ry,n         rlwinm Rx,Ry,32-n,n,31
    clrrwi Rx,Ry,n       rlwinm Rx,Ry,0,0,31-n

Programming Note
    Let RSL represent the low-order 32 bits of register RS, with the bits numbered from 0 through 31.

    rlwinm can be used to extract an n-bit field that starts at bit position b in RSL, right-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting SH=b+n, MB=32-n, and ME=31. It can be used to extract an n-bit field that starts at bit position b in RSL, left-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting SH=b, MB=0, and ME=n-1. It can be used to rotate the contents of the low-order 32 bits of a register left (right) by n bits, by setting SH=n (32-n), MB=0, and ME=31. It can be used to shift the contents of the low-order 32 bits of a register right by n bits, by setting SH=32-n, MB=n, and ME=31. It can be used to clear the high-order b bits of the low-order 32 bits of the contents of a register and then shift the result left by n bits, by setting SH=n, MB=b-n, and ME=31-n. It can be used to clear the low-order n bits of the low-order 32 bits of a register, by setting SH=0, MB=0, and ME=31-n.

    For all the uses given above, the high-order 32 bits of register RA are cleared.

    Extended mnemonics are provided for all of these uses; see Appendix E, "Assembler Extended Mnemonics" on page 627.

Rotate Left Word then AND with Mask M-form

    rlwnm   RA,RS,RB,MB,ME   (Rc=0)
    rlwnm.  RA,RS,RB,MB,ME   (Rc=1)

    23    RS    RA    RB    MB    ME    Rc
    0     6     11    16    21    26    31

    n ← (RB)59:63
    r ← ROTL32((RS)32:63, n)
    m ← MASK(MB+32, ME+32)
    RA ← r & m

The contents of register RS are rotated32 left the number of bits specified by (RB)59:63. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Word then AND with Mask:

    Extended:            Equivalent to:
    rotlw Rx,Ry,Rz       rlwnm Rx,Ry,Rz,0,31

Programming Note
    Let RSL represent the low-order 32 bits of register RS, with the bits numbered from 0 through 31.

    rlwnm can be used to extract an n-bit field that starts at variable bit position b in RSL, right-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting RB59:63=b+n, MB=32-n, and ME=31. It can be used to extract an n-bit field that starts at variable bit position b in RSL, left-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting RB59:63=b, MB=0, and ME=n-1. It can be used to rotate the contents of the low-order 32 bits of a register left (right) by variable n bits, by setting RB59:63=n (32-n), MB=0, and ME=31.

    For all the uses given above, the high-order 32 bits of register RA are cleared.

    Extended mnemonics are provided for some of these uses; see Appendix E, "Assembler Extended Mnemonics" on page 627.

Rotate Left Word Immediate then Mask Insert M-form

    rlwimi   RA,RS,SH,MB,ME   (Rc=0)
    rlwimi.  RA,RS,SH,MB,ME   (Rc=1)

    20    RS    RA    SH    MB    ME    Rc
    0     6     11    16    21    26    31

    n ← SH
    r ← ROTL32((RS)32:63, n)
    m ← MASK(MB+32, ME+32)
    RA ← r&m | (RA)&¬m

The contents of register RS are rotated32 left SH bits.
A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are inserted into register RA under control of the generated mask.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Word Immediate then Mask Insert:

    Extended:            Equivalent to:
    inslwi Rx,Ry,n,b     rlwimi Rx,Ry,32-b,b,b+n-1

Programming Note
    Let RAL represent the low-order 32 bits of register RA, with the bits numbered from 0 through 31.

    rlwimi can be used to insert an n-bit field that is left-justified in the low-order 32 bits of register RS, into RAL starting at bit position b, by setting SH=32-b, MB=b, and ME=(b+n)-1. It can be used to insert an n-bit field that is right-justified in the low-order 32 bits of register RS, into RAL starting at bit position b, by setting SH=32-(b+n), MB=b, and ME=(b+n)-1.

    Extended mnemonics are provided for both of these uses; see Appendix E, "Assembler Extended Mnemonics" on page 627.

3.3.13.1.1 64-bit Fixed-Point Rotate Instructions [Category: 64-Bit]

Rotate Left Doubleword Immediate then Clear Left MD-form

    rldicl   RA,RS,SH,MB   (Rc=0)
    rldicl.  RA,RS,SH,MB   (Rc=1)

    30    RS    RA    sh    mb    0     sh    Rc
    0     6     11    16    21    27    30    31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), n)
    b ← mb5 || mb0:4
    m ← MASK(b, 63)
    RA ← r & m

The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Left:

    Extended:            Equivalent to:
    extrdi Rx,Ry,n,b     rldicl Rx,Ry,b+n,64-n
    srdi Rx,Ry,n         rldicl Rx,Ry,64-n,n
    clrldi Rx,Ry,n       rldicl Rx,Ry,0,n

Programming Note
    rldicl can be used to extract an n-bit field that starts at bit position b in register RS, right-justified into register RA (clearing the remaining 64-n bits of RA), by setting SH=b+n and MB=64-n. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64-n) and MB=0. It can be used to shift the contents of a register right by n bits, by setting SH=64-n and MB=n. It can be used to clear the high-order n bits of a register, by setting SH=0 and MB=n.

    Extended mnemonics are provided for all of these uses; see Appendix E, "Assembler Extended Mnemonics" on page 627.

Rotate Left Doubleword Immediate then Clear Right MD-form

    rldicr   RA,RS,SH,ME   (Rc=0)
    rldicr.  RA,RS,SH,ME   (Rc=1)

    30    RS    RA    sh    me    1     sh    Rc
    0     6     11    16    21    27    30    31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), n)
    e ← me5 || me0:4
    m ← MASK(0, e)
    RA ← r & m

The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit 0 through bit ME and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Right:

    Extended:            Equivalent to:
    extldi Rx,Ry,n,b     rldicr Rx,Ry,b,n-1
    sldi Rx,Ry,n         rldicr Rx,Ry,n,63-n
    clrrdi Rx,Ry,n       rldicr Rx,Ry,0,63-n

Programming Note
    rldicr can be used to extract an n-bit field that starts at bit position b in register RS, left-justified into register RA (clearing the remaining 64-n bits of RA), by setting SH=b and ME=n-1. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64-n) and ME=63. It can be used to shift the contents of a register left by n bits, by setting SH=n and ME=63-n. It can be used to clear the low-order n bits of a register, by setting SH=0 and ME=63-n.

    Extended mnemonics are provided for all of these uses (some devolve to rldicl); see Appendix E, "Assembler Extended Mnemonics" on page 627.
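The mask generator and the rotate-then-mask instructions described above lend themselves to a small executable model. The following Python sketch is illustrative only: the function names mask, rotl64, and rotl32 are hypothetical stand-ins for MASK, ROTL64, and ROTL32, registers are modelled as 64-bit unsigned integers, and bit 0 is taken to be the most significant bit as elsewhere in this Book. It can be used to check the extended-mnemonic equivalences listed with rlwinm, rldicl, and rldicr.

```python
M64 = (1 << 64) - 1  # a GPR is modelled as a 64-bit unsigned integer


def mask(mstart, mstop):
    """1-bits from mstart through mstop (bit 0 = MSB), wrapping if mstart > mstop."""
    from_start = M64 >> mstart                 # bits mstart..63
    to_stop = (M64 << (63 - mstop)) & M64      # bits 0..mstop
    if mstart <= mstop:
        return from_start & to_stop
    return from_start | to_stop


def rotl64(x, n):
    """Rotate a 64-bit value left by n bit positions."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & M64


def rotl32(x, n):
    """Rotate two copies of the low-order word of x, as rotate32 does."""
    w = x & 0xFFFFFFFF
    return rotl64((w << 32) | w, n)


def rlwinm(rs, sh, mb, me):
    return rotl32(rs, sh) & mask(mb + 32, me + 32)


def rldicl(rs, sh, mb):
    return rotl64(rs, sh) & mask(mb, 63)


def rldicr(rs, sh, me):
    return rotl64(rs, sh) & mask(0, me)


# Extended mnemonics expressed through the base instructions.
def srwi(ry, n):
    return rlwinm(ry, 32 - n, n, 31)


def srdi(ry, n):
    return rldicl(ry, 64 - n, n)


def sldi(ry, n):
    return rldicr(ry, n, 63 - n)


def extrdi(ry, n, b):
    return rldicl(ry, b + n, 64 - n)
```

Because mstart = mstop + 1 wraps around to cover every bit position, the model also reflects the statement that an all-zero mask cannot be specified.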
Rotate Left Doubleword Immediate then Clear MD-form

    rldic   RA,RS,SH,MB   (Rc=0)
    rldic.  RA,RS,SH,MB   (Rc=1)

    30    RS    RA    sh    mb    2     sh    Rc
    0     6     11    16    21    27    30    31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), n)
    b ← mb5 || mb0:4
    m ← MASK(b, ¬n)
    RA ← r & m

The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63-SH and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Doubleword Immediate then Clear:

    Extended:            Equivalent to:
    clrlsldi Rx,Ry,b,n   rldic Rx,Ry,n,b-n

Programming Note
    rldic can be used to clear the high-order b bits of the contents of a register and then shift the result left by n bits, by setting SH=n and MB=b-n. It can be used to clear the high-order n bits of a register, by setting SH=0 and MB=n.

    Extended mnemonics are provided for both of these uses (the second devolves to rldicl); see Appendix E, "Assembler Extended Mnemonics" on page 627.

Rotate Left Doubleword then Clear Left MDS-form

    rldcl   RA,RS,RB,MB   (Rc=0)
    rldcl.  RA,RS,RB,MB   (Rc=1)

    30    RS    RA    RB    mb    8     Rc
    0     6     11    16    21    27    31

    n ← (RB)58:63
    r ← ROTL64((RS), n)
    b ← mb5 || mb0:4
    m ← MASK(b, 63)
    RA ← r & m

The contents of register RS are rotated64 left the number of bits specified by (RB)58:63. A mask is generated having 1-bits from bit MB through bit 63 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Doubleword then Clear Left:

    Extended:            Equivalent to:
    rotld Rx,Ry,Rz       rldcl Rx,Ry,Rz,0

Programming Note
    rldcl can be used to extract an n-bit field that starts at variable bit position b in register RS, right-justified into register RA (clearing the remaining 64-n bits of RA), by setting RB58:63=b+n and MB=64-n. It can be used to rotate the contents of a register left (right) by variable n bits, by setting RB58:63=n (64-n) and MB=0.

    Extended mnemonics are provided for some of these uses; see Appendix E, "Assembler Extended Mnemonics" on page 627.

Rotate Left Doubleword then Clear Right MDS-form

    rldcr   RA,RS,RB,ME   (Rc=0)
    rldcr.  RA,RS,RB,ME   (Rc=1)

    30    RS    RA    RB    me    9     Rc
    0     6     11    16    21    27    31

    n ← (RB)58:63
    r ← ROTL64((RS), n)
    e ← me5 || me0:4
    m ← MASK(0, e)
    RA ← r & m

The contents of register RS are rotated64 left the number of bits specified by (RB)58:63. A mask is generated having 1-bits from bit 0 through bit ME and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Programming Note
    rldcr can be used to extract an n-bit field that starts at variable bit position b in register RS, left-justified into register RA (clearing the remaining 64-n bits of RA), by setting RB58:63=b and ME=n-1. It can be used to rotate the contents of a register left (right) by variable n bits, by setting RB58:63=n (64-n) and ME=63.

    Extended mnemonics are provided for some of these uses (some devolve to rldcl); see Appendix E, "Assembler Extended Mnemonics" on page 627.

Rotate Left Doubleword Immediate then Mask Insert MD-form

    rldimi   RA,RS,SH,MB   (Rc=0)
    rldimi.  RA,RS,SH,MB   (Rc=1)

    30    RS    RA    sh    mb    3     sh    Rc
    0     6     11    16    21    27    30    31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), n)
    b ← mb5 || mb0:4
    m ← MASK(b, ¬n)
    RA ← r&m | (RA)&¬m

The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63-SH and 0-bits elsewhere. The rotated data are inserted into register RA under control of the generated mask.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Doubleword Immediate then Mask Insert:

    Extended:            Equivalent to:
    insrdi Rx,Ry,n,b     rldimi Rx,Ry,64-(b+n),b

Programming Note
    rldimi can be used to insert an n-bit field that is right-justified in register RS, into register RA starting at bit position b, by setting SH=64-(b+n) and MB=b.

    An extended mnemonic is provided for this use; see Appendix E, "Assembler Extended Mnemonics" on page 627.

3.3.13.2 Fixed-Point Shift Instructions

The instructions in this section perform left and right shifts.

Extended mnemonics for shifts

Immediate-form logical (unsigned) shift operations are obtained by specifying appropriate masks and shift values for certain Rotate instructions. A set of extended mnemonics is provided to make coding of such shifts simpler and easier to understand. Some of these are shown as examples with the Rotate instructions. See Appendix E, "Assembler Extended Mnemonics" on page 627 for additional extended mnemonics.

Programming Note
    Any Shift Right Algebraic instruction, followed by addze, can be used to divide quickly by 2ⁿ. The setting of the CA bit by the Shift Right Algebraic instructions is independent of mode.

Programming Note
    Multiple-precision shifts can be programmed as shown in Section F.1, "Multiple-Precision Shifts" on page 641.

Shift Left Word X-form

    slw   RA,RS,RB   (Rc=0)
    slw.  RA,RS,RB   (Rc=1)

    31    RS    RA    RB    24    Rc
    0     6     11    16    21    31

    n ← (RB)59:63
    r ← ROTL32((RS)32:63, n)
    if (RB)58 = 0 then
       m ← MASK(32, 63-n)
    else m ← ⁶⁴0
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted left the number of bits specified by (RB)58:63. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RA32:63. RA0:31 are set to zero. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Right Word X-form

    srw   RA,RS,RB   (Rc=0)
    srw.  RA,RS,RB   (Rc=1)

    31    RS    RA    RB    536   Rc
    0     6     11    16    21    31

    n ← (RB)59:63
    r ← ROTL32((RS)32:63, 64-n)
    if (RB)58 = 0 then
       m ← MASK(n+32, 63)
    else m ← ⁶⁴0
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted right the number of bits specified by (RB)58:63. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RA32:63. RA0:31 are set to zero. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Right Algebraic Word Immediate X-form

    srawi   RA,RS,SH   (Rc=0)
    srawi.  RA,RS,SH   (Rc=1)

    31    RS    RA    SH    824   Rc
    0     6     11    16    21    31

    n ← SH
    r ← ROTL32((RS)32:63, 64-n)
    m ← MASK(n+32, 63)
    s ← (RS)32
    RA ← r&m | (⁶⁴s)&¬m
    CA ← s & ((r&¬m)32:63 ≠ 0)

The contents of the low-order 32 bits of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 32 of RS is replicated to fill the vacated positions on the left. The 32-bit result is placed into RA32:63. Bit 32 of RS is replicated to fill RA0:31. CA is set to 1 if the low-order 32 bits of (RS) contain a negative number and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to receive EXTS((RS)32:63), and CA to be set to 0.

Special Registers Altered:
    CA
    CR0 (if Rc=1)

Shift Right Algebraic Word X-form

    sraw   RA,RS,RB   (Rc=0)
    sraw.  RA,RS,RB   (Rc=1)

    31    RS    RA    RB    792   Rc
    0     6     11    16    21    31

    n ← (RB)59:63
    r ← ROTL32((RS)32:63, 64-n)
    if (RB)58 = 0 then
       m ← MASK(n+32, 63)
    else m ← ⁶⁴0
    s ← (RS)32
    RA ← r&m | (⁶⁴s)&¬m
    CA ← s & ((r&¬m)32:63 ≠ 0)

The contents of the low-order 32 bits of register RS are shifted right the number of bits specified by (RB)58:63. Bits shifted out of position 63 are lost. Bit 32 of RS is replicated to fill the vacated positions on the left. The 32-bit result is placed into RA32:63. Bit 32 of RS is replicated to fill RA0:31. CA is set to 1 if the low-order 32 bits of (RS) contain a negative number and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to receive EXTS((RS)32:63), and CA to be set to 0. Shift amounts from 32 to 63 give a result of 64 sign bits, and cause CA to receive the sign bit of (RS)32:63.

Special Registers Altered:
    CA
    CR0 (if Rc=1)

3.3.13.2.1 64-bit Fixed-Point Shift Instructions [Category: 64-Bit]

Shift Left Doubleword X-form

    sld   RA,RS,RB   (Rc=0)
    sld.  RA,RS,RB   (Rc=1)

    31    RS    RA    RB    27    Rc
    0     6     11    16    21    31

    n ← (RB)58:63
    r ← ROTL64((RS), n)
    if (RB)57 = 0 then
       m ← MASK(0, 63-n)
    else m ← ⁶⁴0
    RA ← r & m

The contents of register RS are shifted left the number of bits specified by (RB)57:63. Bits shifted out of position 0 are lost. Zeros are supplied to the vacated positions on the right. The result is placed into register RA. Shift amounts from 64 to 127 give a zero result.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Right Doubleword X-form

    srd   RA,RS,RB   (Rc=0)
    srd.  RA,RS,RB   (Rc=1)

    31    RS    RA    RB    539   Rc
    0     6     11    16    21    31

    n ← (RB)58:63
    r ← ROTL64((RS), 64-n)
    if (RB)57 = 0 then
       m ← MASK(n, 63)
    else m ← ⁶⁴0
    RA ← r & m

The contents of register RS are shifted right the number of bits specified by (RB)57:63. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The result is placed into register RA. Shift amounts from 64 to 127 give a zero result.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Right Algebraic Doubleword Immediate XS-form

    sradi   RA,RS,SH   (Rc=0)
    sradi.  RA,RS,SH   (Rc=1)

    31    RS    RA    sh    413   sh    Rc
    0     6     11    16    21    30    31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), 64-n)
    m ← MASK(n, 63)
    s ← (RS)0
    RA ← r&m | (⁶⁴s)&¬m
    CA ← s & ((r&¬m) ≠ 0)

The contents of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 0 of RS is replicated to fill the vacated positions on the left. The result is placed into register RA. CA is set to 1 if (RS) is negative and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to be set equal to (RS), and CA to be set to 0.

Special Registers Altered:
    CA
    CR0 (if Rc=1)

Shift Right Algebraic Doubleword X-form

    srad   RA,RS,RB   (Rc=0)
    srad.  RA,RS,RB   (Rc=1)

    31    RS    RA    RB    794   Rc
    0     6     11    16    21    31

    n ← (RB)58:63
    r ← ROTL64((RS), 64-n)
    if (RB)57 = 0 then
       m ← MASK(n, 63)
    else m ← ⁶⁴0
    s ← (RS)0
    RA ← r&m | (⁶⁴s)&¬m
    CA ← s & ((r&¬m) ≠ 0)

The contents of register RS are shifted right the number of bits specified by (RB)57:63. Bits shifted out of position 63 are lost. Bit 0 of RS is replicated to fill the vacated positions on the left. The result is placed into register RA. CA is set to 1 if (RS) is negative and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to be set equal to (RS), and CA to be set to 0. Shift amounts from 64 to 127 give a result of 64 sign bits in RA, and cause CA to receive the sign bit of (RS).

Special Registers Altered:
    CA
    CR0 (if Rc=1)

3.3.14 Binary Coded Decimal (BCD) Assist Instructions [Category: Embedded.Phased-in, Server]

The Binary Coded Decimal Assist instructions operate on Binary Coded Decimal operands (cbcdtd and addg6s) and Decimal Floating-Point operands (cdtbcd). See Chapter 5 for additional information.

Convert Declets To Binary Coded Decimal X-form

    cdtbcd   RA, RS

    31    RS    RA    /////  282   /
    0     6     11    16     21    31

    do i = 0 to 1
       n ← i × 32
       RAn+0:n+7 ← 0
       RAn+8:n+19 ← DEC_TO_BCD( (RS)n+12:n+21 )
       RAn+20:n+31 ← DEC_TO_BCD( (RS)n+22:n+31 )

The low-order 20 bits of each word of register RS contain two declets which are converted to six, 4-bit BCD fields; each set of six, 4-bit BCD fields is placed into the low-order 24 bits of the corresponding word in RA. The high-order 8 bits in each word of RA are set to 0.

Special Registers Altered:
    None

Add and Generate Sixes XO-form

    addg6s   RT,RA,RB

    31    RT    RA    RB    /     74    /
    0     6     11    16    21    22    31

    do i = 0 to 15
       dc_i ← carry_out(RA4×i:63 + RB4×i:63)
    c ← ⁴(dc_0) || ⁴(dc_1) || ... || ⁴(dc_15)
    RT ← (¬c) & 0x6666_6666_6666_6666

The contents of register RA are added to the contents of register RB. Sixteen carry bits are produced, one for each carry out of decimal position n (bit position 4×n).

A doubleword is composed from the 16 carry bits, and placed into RT. The doubleword consists of a decimal six (0b0110) in every decimal digit position for which the corresponding carry bit is 0, and a zero (0b0000) in every position for which the corresponding carry bit is 1.

Special Registers Altered:
    None

Convert Binary Coded Decimal To Declets X-form

    cbcdtd   RA, RS

    31    RS    RA    /////  314   /
    0     6     11    16     21    31

    do i = 0 to 1
       n ← i × 32
       RAn+0:n+11 ← 0
       RAn+12:n+21 ← BCD_TO_DEC( (RS)n+8:n+19 )
       RAn+22:n+31 ← BCD_TO_DEC( (RS)n+20:n+31 )

The low-order 24 bits of each word of register RS contain six, 4-bit BCD fields which are converted to two declets; each set of two declets is placed into the low-order 20 bits of the corresponding word in RA. The high-order 12 bits in each word of RA are set to 0.

If a 4-bit BCD field has a value greater than 9 the results are undefined.

Special Registers Altered:
    None

Programming Note
    addg6s can be used to add or subtract two BCD operands. In these examples it is assumed that r0 contains 0x666...666. (BCD data formats are described in Section 5.3.)

    Addition of the unsigned BCD operand in register RA to the unsigned BCD operand in register RB can be accomplished as follows.

        add    r1,RA,r0
        add    r2,r1,RB
        addg6s RT,r1,RB
        subf   RT,RT,r2   # RT = RA +BCD RB

    Subtraction of the unsigned BCD operand in register RA from the unsigned BCD operand in register RB can be accomplished as follows. (In this example it is assumed that RB is not register 0.)
addi r1,RB,1 nor r2,RA,RA# one's complement of RA add r3,r1,r2 addg6s RT,r1,r2 subf RT,RT,r3# RT = RB -BCD RA Additional instructions are needed to handle signed BCD operands, and BCD operands that occupy more than one register (e.g., unsigned BCD oper- ands that have more than 16 decimal digits). 98 Power ISATM Book I Version 2.06 3.3.15 Move To/From System Register Instructions The Move To Condition Register Fields instruction has SPR name as part of the mnemonic rather than as a a preferred form; see Section 1.8.1, "Preferred Instruc- numeric operand. An extended mnemonic is provided tion Forms" on page 23. In the preferred form, the FXM for the mtcrf instruction for compatibility with old soft- field satisfies the following rule. ware (written for a version of the architecture that pre- Exactly one bit of the FXM field is set to 1. cedes Version 2.00) that uses it to set the entire Condition Register. Some of these extended mnemon- Extended mnemonics ics are shown as examples with the relevant instruc- tions. See Appendix E, "Assembler Extended Extended mnemonics are provided for the mtspr and Mnemonics" on page 627 for additional extended mne- mfspr instructions so that they can be coded with the monics. Chapter 3. Fixed-Point Facility 99 Version 2.06 Move To Special Purpose Register Extended Mnemonics: XFX-form Examples of extended mnemonics for Move To Special Purpose Register: mtspr SPR,RS Extended: Equivalent to: 31 RS spr 467 / mtxer Rx mtspr 1,Rx 0 6 11 21 31 mtlr Rx mtspr 8,Rx mtctr Rx mtspr 9,Rx n spr5:9 || spr0:4 mtppr Rx mtspr 896,Rx if n = 13 then see Book III-S mtppr32 Rx mtspr 898,Rx else if length(SPR(n)) = 64 then SPR(n) (RS) Programming Note else The AMR is part of the "context" of the program SPR(n) (RS)32:63 (see Book III-S). Therefore modification of the AMR The SPR field denotes a Special Purpose Register, requires "synchronization" by software. For this encoded as shown in the table below. 
Unless the SPR reason, most operating systems provide a system field contains 13 (denoting the AMR), the contents library program that application programs can use of register RS are placed into the designated Special to modify the AMR. Purpose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed Compiler and Assembler Note into the SPR. For the mtspr and mfspr instructions, the SPR The AMR (Authority Mask Register) is used for "stor- number coded in Assembler language does not age protection" in the Server environment. This use, appear directly as a 10-bit binary number in the and operation of mtspr for the AMR, are described in instruction. The number coded is split into two 5-bit Book III-S. halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16:20 of the SPR1 Register decimal instruction and the low-order 5 bits in bits 11:15. spr5:9 spr0:4 Name 1 00000 00001 XER 8 00000 01000 LR 9 00000 01001 CTR 13 00000 01101 AMR5 256 01000 00000 VRSAVE 512 10000 00000 SPEFSCR2 896 11100 00000 PPR3 898 11100 00010 PPR324 1 Note that the order of the two 5-bit halves of the SPR number is reversed. 2 Category: SPE. 3 Category: Server; see Book III-S. 4 Category: Phased-In. See Section 3.1 of Book II. If execution of this instruction is attempted specifying an SPR number that is not shown above, or an SPR number that is shown above but is in a category that is not supported by the implementation, one of the follow- ing occurs. If spr0 = 0, the illegal instruction error handler is invoked. If spr0 = 1, the system privileged instruction error handler is invoked. A complete description of this instruction can be found in Book III. Special Registers Altered: See above 100 Power ISATM Book I Version 2.06 Move From Special Purpose Register A complete description of this instruction can be found XFX-form in Book III. 
Special Registers Altered: mfspr RT,SPR None 31 RT spr 339 / Extended Mnemonics: 0 6 11 21 31 Examples of extended mnemonics for Move From Spe- cial Purpose Register: n spr5:9 || spr0:4 if length(SPR(n)) = 64 then Extended: Equivalent to: RT SPR(n) mfxer Rx mfspr Rx,1 else mflr Rx mfspr Rx,8 RT 32 0 || SPR(n) mfctr Rx mfspr Rx,9 The SPR field denotes a Special Purpose Register, encoded as shown in the table below. The contents of Note the designated Special Purpose Register are placed See the Notes that appear with mtspr. into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero. SPR1 Register decimal spr5:9 spr0:4 Name 1 00000 00001 XER 8 00000 01000 LR 9 00000 01001 CTR 13 00000 01101 AMR8 136 00100 01000 CTRL 256 01000 00000 VRSAVE 259 01000 00011 SPRG3 260 01000 00100 SPRG42 261 01000 00101 SPRG52 262 01000 00110 SPRG62 263 01000 00111 SPRG72 268 01000 01100 TB3 269 01000 01101 TBU3 512 10000 00000 SPEFSCR4 526 10000 01110 ATB3,5 527 10000 01111 ATBU3,5 896 11100 00000 PPR6 898 11100 00010 PPR327 1 Note that the order of the two 5-bit halves of the SPR number is reversed. 2 Category: Embedded. 3 See Chapter 5 of Book II. 4 Category: SPE. 5 Category: Alternate Time Base. 6 Category: Server; see Book III-S. 7 Category: Phased-In. See Section 3.1 of Book II. If execution of this instruction is attempted specifying an SPR number that is not shown above, or an SPR number that is shown above but is in a category that is not supported by the implementation, one of the follow- ing occurs. If spr0 = 0, the illegal instruction error handler is invoked. If spr0 = 1, the system privileged instruction error handler is invoked. Chapter 3. 
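The reversal of the two 5-bit halves described in the Compiler and Assembler Note for mtspr can be made concrete with a small model. The Python sketch below is illustrative only and is not part of the architecture. The helper names (spr_field, mtspr_word, mfspr_word) are hypothetical, and the instruction-word layout is an assumption read off the XFX-form diagrams above: primary opcode 31 in bits 0:5, the register in bits 6:10, the spr field in bits 11:20, and extended opcode 467 (mtspr) or 339 (mfspr) in bits 21:30.

```python
def spr_field(spr_number):
    """Return the 10-bit spr field (instruction bits 11:20) for an SPR number.

    The SPR number coded in assembler is n = spr5:9 || spr0:4, so its two
    5-bit halves are swapped when they are placed into the instruction.
    """
    if not 0 <= spr_number <= 0x3FF:
        raise ValueError("SPR number must fit in 10 bits")
    high_half = (spr_number >> 5) & 0x1F  # ends up in instruction bits 16:20
    low_half = spr_number & 0x1F          # ends up in instruction bits 11:15
    return (low_half << 5) | high_half


def _xfx_word(reg, spr_number, extended_opcode):
    """Assemble a 32-bit XFX-form word (bit 0 is the most significant bit)."""
    if not 0 <= reg <= 31:
        raise ValueError("register number must be in 0..31")
    return (
        (31 << 26)                       # primary opcode, bits 0:5
        | (reg << 21)                    # RS or RT, bits 6:10
        | (spr_field(spr_number) << 11)  # spr, bits 11:20
        | (extended_opcode << 1)         # extended opcode, bits 21:30
    )


def mtspr_word(spr_number, rs):
    """Hypothetical helper: encode 'mtspr SPR,RS'."""
    return _xfx_word(rs, spr_number, 467)


def mfspr_word(rt, spr_number):
    """Hypothetical helper: encode 'mfspr RT,SPR'."""
    return _xfx_word(rt, spr_number, 339)


if __name__ == "__main__":
    # LR is SPR 8: spr5:9 = 00000, spr0:4 = 01000, so the field is 01000 00000.
    print(format(spr_field(8), "010b"))  # 0100000000
    print(hex(mtspr_word(8, 0)))         # 0x7c0803a6, i.e. mtlr with RS = 0
```

Applying spr_field twice returns the original number, which is one quick way to check that only the order of the halves changes and no bits are lost.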
Move To Condition Register Fields XFX-form

mtcrf FXM,RS

   | 31 | RS | 0 | FXM | / | 144 | / |
   0    6    11  12    20  21    31

   mask ← ⁴(FXM0) || ⁴(FXM1) || ... ⁴(FXM7)
   CR ← ((RS)32:63 & mask) | (CR & ¬mask)

The contents of bits 32:63 of register RS are placed into the Condition Register under control of the field mask specified by FXM. The field mask identifies the 4-bit fields affected. Let i be an integer in the range 0-7. If FXMi=1 then CR field i (CR bits 4×i+32:4×i+35) is set to the contents of the corresponding field of the low-order 32 bits of RS.

Special Registers Altered:
   CR fields selected by mask

Extended Mnemonics:

Example of extended mnemonics for Move To Condition Register Fields:

   Extended:      Equivalent to:
   mtcr Rx        mtcrf 0xFF,Rx

Programming Note

In the preferred form of this instruction (mtocrf), only one Condition Register field is updated.

Move From Condition Register XFX-form

mfcr RT

   | 31 | RT | 0 | /// | 19 | / |
   0    6    11  12    21   31

   RT ← ³²0 || CR

The contents of the Condition Register are placed into RT32:63. RT0:31 are set to 0.

Special Registers Altered:
   None

3.3.15.1 Move To/From One Condition Register Field Instructions

Move To One Condition Register Field XFX-form

mtocrf FXM,RS

   | 31 | RS | 1 | FXM | / | 144 | / |
   0    6    11  12    20  21    31

   count ← 0
   do i = 0 to 7
      if FXMi = 1 then
         n ← i
         count ← count + 1
   if count = 1 then
      CR4×n+32:4×n+35 ← (RS)4×n+32:4×n+35
   else CR ← undefined

If exactly one bit of the FXM field is set to 1, let n be the position of that bit in the field (0 ≤ n ≤ 7). The contents of bits 4×n+32:4×n+35 of register RS are placed into CR field n (CR bits 4×n+32:4×n+35). Otherwise, the contents of the Condition Register are undefined.

Special Registers Altered:
   CR field selected by FXM

Move From One Condition Register Field XFX-form

mfocrf RT,FXM

   | 31 | RT | 1 | FXM | / | 19 | / |
   0    6    11  12    20  21   31

   RT ← undefined
   count ← 0
   do i = 0 to 7
      if FXMi = 1 then
         n ← i
         count ← count + 1
   if count = 1 then
      RT4×n+32:4×n+35 ← CR4×n+32:4×n+35

If exactly one bit of the FXM field is set to 1, let n be the position of that bit in the field (0 ≤ n ≤ 7). The contents of CR field n (CR bits 4×n+32:4×n+35) are placed into bits 4×n+32:4×n+35 of register RT and the contents of the remaining bits of register RT are undefined. Otherwise, the contents of register RT are undefined.

Special Registers Altered:
   None

Programming Note

These forms of the mtcrf and mfcr instructions are intended to replace the old forms of the instructions (the forms shown in page 102), which will eventually be phased out of the architecture. The new forms are backward compatible with most processors that comply with versions of the architecture that precede Version 2.00. On those processors, the new forms are treated as the old forms.

However, on some processors that comply with versions of the architecture that precede Version 2.00 the new forms may be treated as follows:

   mtocrf: may cause the system illegal instruction error handler to be invoked
   mfocrf: may place an undefined value into register RT

3.3.15.2 Move To/From System Registers [Category: Embedded]

Move to Condition Register from XER X-form

mcrxr BF

   | 31 | BF | // | /// | /// | 512 | / |
   0    6    9    11    16    21    31

   CR4×BF+32:4×BF+35 ← XER32:35
   XER32:35 ← 0b0000

The contents of XER32:35 are copied to Condition Register field BF. XER32:35 are set to zero.

Special Registers Altered:
   CR field BF
   XER32:35

Move From Device Control Register User-mode Indexed X-form

mfdcrux RT,RA [Category: Embedded.Device Control]

   | 31 | RT | RA | /// | 291 | / |
   0    6    11   16    21    31

   DCRN ← (RA)
   RT ← DCR(DCRN)

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of the designated Device Control Register are placed into RT.
For 32-bit Device Control Registers, the contents of bits 32:63 of the designated Device Control Register are placed into RT.

See "Move From Device Control Register Indexed X-form" on page 923 in Book III for more information on this instruction.

Special Registers Altered:
   Implementation-dependent

Move To Device Control Register User-mode Indexed X-form

mtdcrux RS,RA [Category: Embedded.Device Control]

   | 31 | RS | RA | /// | 419 | / |
   0    6    11   16    21    31

   DCRN ← (RA)
   DCR(DCRN) ← RS

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of bits 32:63 of RS are placed into the Device Control Register.

See "Move To Device Control Register Indexed X-form" on page 922 in Book III for more information on this instruction.

Special Registers Altered:
   Implementation-dependent

Chapter 4. Floating-Point Facility [Category: Floating-Point]

4.1 Floating-Point Facility Overview
4.2 Floating-Point Facility Registers
   4.2.1 Floating-Point Registers
   4.2.2 Floating-Point Status and Control Register
4.3 Floating-Point Data
   4.3.1 Data Format
   4.3.2 Value Representation
   4.3.3 Sign of Result
   4.3.4 Normalization and Denormalization
   4.3.5 Data Handling and Precision
      4.3.5.1 Single-Precision Operands
      4.3.5.2 Integer-Valued Operands
   4.3.6 Rounding
4.4 Floating-Point Exceptions
   4.4.1 Invalid Operation Exception
      4.4.1.1 Definition
      4.4.1.2 Action
   4.4.2 Zero Divide Exception
      4.4.2.1 Definition
      4.4.2.2 Action
   4.4.3 Overflow Exception
      4.4.3.1 Definition
      4.4.3.2 Action
   4.4.4 Underflow Exception
      4.4.4.1 Definition
      4.4.4.2 Action
   4.4.5 Inexact Exception
      4.4.5.1 Definition
      4.4.5.2 Action
4.5 Floating-Point Execution Models
   4.5.1 Execution Model for IEEE Operations
   4.5.2 Execution Model for Multiply-Add Type Instructions
4.6 Floating-Point Facility Instructions
   4.6.1 Floating-Point Storage Access Instructions
      4.6.1.1 Storage Access Exceptions
   4.6.2 Floating-Point Load Instructions
   4.6.3 Floating-Point Store Instructions
   4.6.4 Floating-Point Load Store Doubleword Pair Instructions [Category: Floating-Point.Phased-Out]
   4.6.5 Floating-Point Move Instructions
   4.6.6 Floating-Point Arithmetic Instructions
      4.6.6.1 Floating-Point Elementary Arithmetic Instructions
      4.6.6.2 Floating-Point Multiply-Add Instructions
   4.6.7 Floating-Point Rounding and Conversion Instructions
      4.6.7.1 Floating-Point Rounding Instruction
      4.6.7.2 Floating-Point Convert To/From Integer Instructions
      4.6.7.3 Floating Round to Integer Instructions
   4.6.8 Floating-Point Compare Instructions
   4.6.9 Floating-Point Select Instruction
   4.6.10 Floating-Point Status and Control Register Instructions

4.1 Floating-Point Facility Overview

This chapter describes the registers and instructions that make up the Floating-Point Facility.

The processor (augmented by appropriate software support, where required) implements a floating-point system compliant with the ANSI/IEEE Standard 754-1985, "IEEE Standard for Binary Floating-Point Arithmetic" (hereafter referred to as "the IEEE standard"). That standard defines certain required "operations" (addition, subtraction, etc.). Herein, the term "floating-point operation" is used to refer to one of these required operations and to additional operations defined (e.g., those performed by Multiply-Add or Reciprocal Estimate instructions). A Non-IEEE mode is also provided. This mode, which may produce results not in strict compliance with the IEEE standard, allows shorter latency.

Instructions are provided to perform arithmetic, rounding, conversion, comparison, and other operations in floating-point registers; to move floating-point data between storage and these registers; and to manipulate the Floating-Point Status and Control Register explicitly.

These instructions are divided into two categories.

   computational instructions

   The computational instructions are those that perform addition, subtraction, multiplication, division, extracting the square root, rounding, conversion, comparison, and combinations of these operations. These instructions provide the floating-point operations. They place status information into the Floating-Point Status and Control Register. They are the instructions described in Sections 4.6.6 through 4.6.8.

   non-computational instructions

   The non-computational instructions are those that perform loads and stores, move the contents of a floating-point register to another floating-point register possibly altering the sign, manipulate the Floating-Point Status and Control Register explicitly, and select the value from one of two floating-point registers based on the value in a third floating-point register. The operations performed by these instructions are not considered floating-point operations. With the exception of the instructions that manipulate the Floating-Point Status and Control Register explicitly, they do not alter the Floating-Point Status and Control Register. They are the instructions described in Sections 4.6.2 through 4.6.5, and 4.6.10.

A floating-point number consists of a signed exponent and a signed significand. The quantity expressed by this number is the product of the significand and the number 2^exponent. Encodings are provided in the data format to represent finite numeric values, ±Infinity, and values that are "Not a Number" (NaN). Operations involving infinities produce results obeying traditional mathematical conventions. NaNs have no mathematical interpretation. Their encoding permits a variable diagnostic information field. They may be used to indicate such things as uninitialized variables and can be produced by certain invalid operations.

There is one class of exceptional events that occur during instruction execution that is unique to the Floating-Point Facility: the Floating-Point Exception. Floating-point exceptions are signaled with bits set in the Floating-Point Status and Control Register (FPSCR). They can cause the system floating-point enabled exception error handler to be invoked, precisely or imprecisely, if the proper control bits are set.

Floating-Point Exceptions

The following floating-point exceptions are detected by the processor:

   Invalid Operation Exception (VX)
      SNaN (VXSNAN)
      Infinity-Infinity (VXISI)
      Infinity÷Infinity (VXIDI)
      Zero÷Zero (VXZDZ)
      Infinity×Zero (VXIMZ)
      Invalid Compare (VXVC)
      Software-Defined Condition (VXSOFT)
      Invalid Square Root (VXSQRT)
      Invalid Integer Convert (VXCVI)
   Zero Divide Exception (ZX)
   Overflow Exception (OX)
   Underflow Exception (UX)
   Inexact Exception (XX)

Each floating-point exception, and each category of Invalid Operation Exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a corresponding enable bit in the FPSCR. See Section 4.2.2, "Floating-Point Status and Control Register" on page 107 for a description of these exception and enable bits, and Section 4.4, "Floating-Point Exceptions" on page 114 for a detailed discussion of floating-point exceptions, including the effects of the enable bits.

4.2 Floating-Point Facility Registers

4.2.1 Floating-Point Registers

Implementations of this architecture provide 32 floating-point registers (FPRs). The floating-point instruction formats provide 5-bit fields for specifying the FPRs to be used in the execution of the instruction. The FPRs are numbered 0-31. See Figure 48 on page 107.

Each FPR contains 64 bits that support the floating-point double format. Every instruction that interprets the contents of an FPR as a floating-point value uses the floating-point double format for this interpretation.

The computational instructions, and the Move and Select instructions, operate on data located in FPRs and, with the exception of the Compare instructions, place the result value into an FPR and optionally (when Rc=1) place status information into the Condition Register. Instruction forms with Rc=1 are part of Category: Floating-Point.Record.

Load Double and Store Double instructions are provided that transfer 64 bits of data between storage and the FPRs with no conversion.

Load Single instructions are provided to transfer and convert floating-point values in floating-point single format from storage to the same value in floating-point double format in the FPRs.
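The value-preserving nature of the single-to-double conversion performed on a Load Single can be illustrated with a short model. This Python sketch is an illustration only: the function name load_single_as_double is hypothetical, it relies on the host's IEEE 754 conversion via the standard struct module rather than on any Power ISA definition, and it does not attempt to model how NaN operands are handled.

```python
import struct


def load_single_as_double(single_bits):
    """Model the conversion of a single-format storage image to double format.

    single_bits is the 32-bit image in storage; the return value is the
    64-bit image that represents the same numeric value in double format.
    NaN inputs are outside the scope of this sketch (host behavior may vary).
    """
    if not 0 <= single_bits <= 0xFFFFFFFF:
        raise ValueError("single_bits must be a 32-bit value")
    # Interpret the 32 bits as a single-format number (widened exactly to a
    # Python float, which is a double on common platforms).
    value = struct.unpack(">f", struct.pack(">I", single_bits))[0]
    # Re-encode that same value as a 64-bit double-format image.
    return struct.unpack(">Q", struct.pack(">d", value))[0]


if __name__ == "__main__":
    # 1.5 in single format is 0x3FC00000.
    print(hex(load_single_as_double(0x3FC00000)))  # 0x3ff8000000000000
    # The smallest single-format denormalized number, 2^-149, is a
    # normalized number in double format.
    print(hex(load_single_as_double(0x00000001)))  # 0x36a0000000000000
```

Every single-format value is exactly representable in double format, so this direction of conversion never rounds.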
Store Single instructions are provided to transfer and convert floating-point values in floating-point double format from the FPRs to the same value in floating-point single format in storage.

Instructions are provided that manipulate the Floating-Point Status and Control Register and the Condition Register explicitly. Some of these instructions copy data from an FPR to the Floating-Point Status and Control Register or vice versa.

The computational instructions and the Select instruction accept values from the FPRs in double format. For single-precision arithmetic instructions, all input values must be representable in single format; if they are not, the result placed into the target FPR, and the setting of status bits in the FPSCR and in the Condition Register (if Rc=1), are undefined.

   | FPR 0  |
   | FPR 1  |
   | ...    |
   | ...    |
   | FPR 30 |
   | FPR 31 |
   0        63

Figure 48. Floating-Point Registers

4.2.2 Floating-Point Status and Control Register

The Floating-Point Status and Control Register (FPSCR) controls the handling of floating-point exceptions and records status resulting from the floating-point operations. Bits 32:55 are status bits. Bits 56:63 are control bits.

The exception bits in the FPSCR (bits 35:44, 53:55) are sticky; that is, once set to 1 they remain set to 1 until they are set to 0 by an mcrfs, mtfsfi, mtfsf, or mtfsb0 instruction. The exception summary bits in the FPSCR (FX, FEX, and VX, which are bits 32:34) are not considered to be "exception bits", and only FX is sticky.

FEX and VX are simply the ORs of other FPSCR bits. Therefore these two bits are not listed among the FPSCR bits affected by the various instructions.

   | FPSCR |
   0       63

Figure 49. Floating-Point Status and Control Register

The bit definitions for the FPSCR are as follows.

Bit(s)  Description

0:31    Reserved

32      Floating-Point Exception Summary (FX)
        Every floating-point instruction, except mtfsfi and mtfsf, implicitly sets FPSCRFX to 1 if that instruction causes any of the floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter FPSCRFX explicitly.

        Programming Note
        FPSCRFX is defined not to be altered implicitly by mtfsfi and mtfsf because permitting these instructions to alter FPSCRFX implicitly could cause a paradox. An example is an mtfsfi or mtfsf instruction that supplies 0 for FPSCRFX and 1 for FPSCROX, and is executed when FPSCROX=0. See also the Programming Notes with the definition of these two instructions.

33      Floating-Point Enabled Exception Summary (FEX)
        This bit is the OR of all the floating-point exception bits masked by their respective enable bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCRFEX explicitly.

34      Floating-Point Invalid Operation Exception Summary (VX)
        This bit is the OR of all the Invalid Operation exception bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCRVX explicitly.

35      Floating-Point Overflow Exception (OX)
        See Section 4.4.3, "Overflow Exception" on page 117.

36      Floating-Point Underflow Exception (UX)
        See Section 4.4.4, "Underflow Exception" on page 118.

37      Floating-Point Zero Divide Exception (ZX)
        See Section 4.4.2, "Zero Divide Exception" on page 117.

38      Floating-Point Inexact Exception (XX)
        See Section 4.4.5, "Inexact Exception" on page 119.

        FPSCRXX is a sticky version of FPSCRFI (see below). Thus the following rules completely describe how FPSCRXX is set by a given instruction.

           If the instruction affects FPSCRFI, the new value of FPSCRXX is obtained by ORing the old value of FPSCRXX with the new value of FPSCRFI.
           If the instruction does not affect FPSCRFI, the value of FPSCRXX is unchanged.

39      Floating-Point Invalid Operation Exception (SNaN) (VXSNAN)
        See Section 4.4.1, "Invalid Operation Exception" on page 116.

40      Floating-Point Invalid Operation Exception (∞ - ∞) (VXISI)
        See Section 4.4.1.

41      Floating-Point Invalid Operation Exception (∞ ÷ ∞) (VXIDI)
        See Section 4.4.1.

42      Floating-Point Invalid Operation Exception (0 ÷ 0) (VXZDZ)
        See Section 4.4.1.

43      Floating-Point Invalid Operation Exception (∞ × 0) (VXIMZ)
        See Section 4.4.1.

44      Floating-Point Invalid Operation Exception (Invalid Compare) (VXVC)
        See Section 4.4.1.

45      Floating-Point Fraction Rounded (FR)
        The last Arithmetic or Rounding and Conversion instruction incremented the fraction during rounding. See Section 4.3.6, "Rounding" on page 113. This bit is not sticky.

46      Floating-Point Fraction Inexact (FI)
        The last Arithmetic or Rounding and Conversion instruction either produced an inexact result during rounding or caused a disabled Overflow Exception. See Section 4.3.6. This bit is not sticky.

        See the definition of FPSCRXX, above, regarding the relationship between FPSCRFI and FPSCRXX.

47:51   Floating-Point Result Flags (FPRF)
        Arithmetic, rounding, and Convert From Integer instructions set this field based on the result placed into the target register and on the target precision, except that if any portion of the result is undefined then the value placed into FPRF is undefined. Floating-point Compare instructions set this field based on the relative values of the operands being compared. For Convert To Integer instructions, the value placed into FPRF is undefined. Additional details are given below.

        Programming Note
        A single-precision operation that produces a denormalized result sets FPRF to indicate a denormalized number. When possible, single-precision denormalized numbers are represented in normalized double format in the target register.

   47      Floating-Point Result Class Descriptor (C)
           Arithmetic, rounding, and Convert From Integer instructions may set this bit with the FPCC bits, to indicate the class of the result as shown in Figure 50 on page 109.

   48:51   Floating-Point Condition Code (FPCC)
           Floating-point Compare instructions set one of the FPCC bits to 1 and the other three FPCC bits to 0. Arithmetic, rounding, and Convert From Integer instructions may set the FPCC bits with the C bit, to indicate the class of the result as shown in Figure 50 on page 109. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero.

      48   Floating-Point Less Than or Negative (FL or <)
      49   Floating-Point Greater Than or Positive (FG or >)
      50   Floating-Point Equal or Zero (FE or =)
      51   Floating-Point Unordered or NaN (FU or ?)

52      Reserved

53      Floating-Point Invalid Operation Exception (Software-Defined Condition) (VXSOFT)
        This bit can be altered only by mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1. See Section 4.4.1.

        Programming Note
        FPSCRVXSOFT can be used by software to indicate the occurrence of an arbitrary, software-defined, condition that is to be treated as an Invalid Operation Exception. For example, the bit could be set by a program that computes a base 10 logarithm if the supplied input is negative.

54      Floating-Point Invalid Operation Exception (Invalid Square Root) (VXSQRT)
        See Section 4.4.1.

55      Floating-Point Invalid Operation Exception (Invalid Integer Convert) (VXCVI)
        See Section 4.4.1.

56      Floating-Point Invalid Operation Exception Enable (VE)
        See Section 4.4.1.

57      Floating-Point Overflow Exception Enable (OE)
        See Section 4.4.3, "Overflow Exception" on page 117.

58      Floating-Point Underflow Exception Enable (UE)
        See Section 4.4.4, "Underflow Exception" on page 118.

59      Floating-Point Zero Divide Exception Enable (ZE)
        See Section 4.4.2, "Zero Divide Exception" on page 117.

60      Floating-Point Inexact Exception Enable (XE)
        See Section 4.4.5, "Inexact Exception" on page 119.

61      Floating-Point Non-IEEE Mode (NI)
        Floating-point non-IEEE mode is optional. If floating-point non-IEEE mode is not implemented, this bit is treated as reserved, and the remainder of the definition of this bit does not apply.

        If floating-point non-IEEE mode is implemented, this bit has the following meaning.

        0   The processor is not in floating-point non-IEEE mode (i.e., all floating-point operations conform to the IEEE standard).
        1   The processor is in floating-point non-IEEE mode.

        When the processor is in floating-point non-IEEE mode, the remaining FPSCR bits may have meanings different from those given in this document, and floating-point operations need not conform to the IEEE standard. The effects of executing a given floating-point instruction with FPSCRNI=1, and any additional requirements for using non-IEEE mode, are implementation-dependent. The results of executing a given instruction in non-IEEE mode may vary between implementations, and between different executions on the same implementation.

        Programming Note
        When the processor is in floating-point non-IEEE mode, the results of floating-point operations may be approximate, and performance for these operations may be better, more predictable, or less data-dependent than when the processor is not in non-IEEE mode. For example, in non-IEEE mode an implementation may return 0 instead of a denormalized number, and may return a large number instead of an infinity.

62:63   Floating-Point Rounding Control (RN)
        See Section 4.3.6, "Rounding" on page 113.

        00   Round to Nearest
        01   Round toward Zero
        10   Round toward +Infinity
        11   Round toward -Infinity

   Result Flags
   C  <  >  =  ?     Result Value Class
   1  0  0  0  1     Quiet NaN
   0  1  0  0  1     - Infinity
   0  1  0  0  0     - Normalized Number
   1  1  0  0  0     - Denormalized Number
   1  0  0  1  0     - Zero
   0  0  0  1  0     + Zero
   1  0  1  0  0     + Denormalized Number
   0  0  1  0  0     + Normalized Number
   0  0  1  0  1     + Infinity

Figure 50. Floating-Point Result Flags

4.3 Floating-Point Data

4.3.1 Data Format

This architecture defines the representation of a floating-point value in two different binary fixed-length formats. The format may be a 32-bit single format for a single-precision value or a 64-bit double format for a double-precision value. The single format may be used for data in storage. The double format may be used for data in storage and for data in floating-point registers.

The lengths of the exponent and the fraction fields differ between these two formats. The structure of the single and double formats is shown below.

   | S | EXP | FRACTION |
   0   1     9          31

Figure 51. Floating-point single format

   | S | EXP | FRACTION |
   0   1     12         63

Figure 52. Floating-point double format

Values in floating-point format are composed of three fields:

   S          sign bit
   EXP        exponent+bias
   FRACTION   fraction

Representation of numeric values in the floating-point formats consists of a sign bit (S), a biased exponent (EXP), and the fraction portion (FRACTION) of the significand. The significand consists of a leading implied bit concatenated on the right with the FRACTION. This leading implied bit is 1 for normalized numbers and 0 for denormalized numbers and is located in the unit bit position (i.e., the first bit to the left of the binary point). Values representable within the two floating-point formats can be specified by the parameters listed in Figure 53.

                        Format
                        Single    Double
   Exponent Bias        +127      +1023
   Maximum Exponent     +127      +1023
   Minimum Exponent     -126      -1022
   Widths (bits)
      Format            32        64
      Sign              1         1
      Exponent          8         11
      Fraction          23        52
      Significand       24        53

Figure 53. IEEE floating-point fields

The architecture requires that the FPRs of the Floating-Point Facility support the floating-point double format only.

4.3.2 Value Representation

This architecture defines numeric and non-numeric values representable within each of the two supported formats. The numeric values are approximations to the real numbers and include the normalized numbers, denormalized numbers, and zero values. The non-numeric values representable are the infinities and the Not a Numbers (NaNs). The infinities are adjoined to the real numbers, but are not numbers themselves, and the standard rules of arithmetic do not hold when they are used in an operation. They are related to the real numbers by order alone. It is possible however to define restricted operations among numbers and infinities as defined below. The relative location on the real number line for each of the defined entities is shown in Figure 54.

   -INF   -NOR   -DEN   -0   +0   +DEN   +NOR   +INF

Figure 54. Approximation to real numbers

The NaNs are not related to the numeric values or infinities by order or value but are encodings used to convey diagnostic information such as the representation of uninitialized variables.

The following is a description of the different floating-point values defined in the architecture:

Binary floating-point numbers
Machine representable values used as approximations to real numbers. Three categories of numbers are supported: normalized numbers, denormalized numbers, and zero values.

Normalized numbers (± NOR)
These are values that have a biased exponent value in the range:

   1 to 254 in single format
   1 to 2046 in double format

They are values in which the implied unit bit is 1. Normalized numbers are interpreted as follows:

   NOR = (-1)^s × 2^E × (1.fraction)

where s is the sign, E is the unbiased exponent, and 1.fraction is the significand, which is composed of a leading unit bit (implied bit) and a fraction part.

The ranges covered by the magnitude (M) of a normalized floating-point number are approximately equal to:

   Single Format:
      1.2×10^-38 ≤ M ≤ 3.4×10^38
   Double Format:
      2.2×10^-308 ≤ M ≤ 1.8×10^308

Zero values (± 0)
These are values that have a biased exponent value of zero and a fraction value of zero. Zeros can have a positive or negative sign. The sign of zero is ignored by comparison operations (i.e., comparison regards +0 as equal to -0).

Denormalized numbers (± DEN)
These are values that have a biased exponent value of zero and a nonzero fraction value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They are values in which the implied unit bit is 0. Denormalized numbers are interpreted as follows:

   DEN = (-1)^s × 2^Emin × (0.fraction)

where Emin is the minimum representable exponent value (-126 for single-precision, -1022 for double-precision).

Infinities (± ∞)
These are values that have the maximum biased exponent value:

   255 in single format
   2047 in double format

and a zero fraction value. They are used to approximate values greater in magnitude than the maximum normalized value.

Infinity arithmetic is defined as the limiting case of real arithmetic, with restricted operations defined among numbers and infinities. Infinities and the real numbers can be related by ordering in the affine sense:

   -∞ < every finite number < +∞

Arithmetic on infinities is always exact and does not signal any exception, except when an exception occurs due to the invalid operations as described in Section 4.4.1, "Invalid Operation Exception" on page 116.

For comparison operations, +Infinity compares equal to +Infinity and -Infinity compares equal to -Infinity.

Not a Numbers (NaNs)
These are values that have the maximum biased exponent value and a nonzero fraction value. The sign bit is ignored (i.e., NaNs are neither positive nor negative). If the high-order bit of the fraction field is 0 then the NaN is a Signaling NaN; otherwise it is a Quiet NaN.

Signaling NaNs are used to signal exceptions when they appear as operands of computational instructions.

Quiet NaNs are used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when Invalid Operation Exception is disabled (FPSCRVE=0). Quiet NaNs propagate through all floating-point operations except ordered comparison, Floating Round to Single-Precision, and conversion to integer. Quiet NaNs do not signal exceptions, except for ordered comparison and conversion to integer operations. Specific encodings in QNaNs can thus be preserved through a sequence of floating-point operations, and used to convey diagnostic information to help identify results from invalid operations.

When a QNaN is the result of a floating-point operation because one of the operands is a NaN or because a QNaN was generated due to a disabled Invalid Operation Exception, then the following rule is applied to determine the NaN with the high-order fraction bit set to 1 that is to be stored as the result.

   if (FRA) is a NaN
      then FRT ← (FRA)
   else if (FRB) is a NaN
      then if instruction is frsp
         then FRT ← (FRB)0:34 || ²⁹0
         else FRT ← (FRB)
   else if (FRC) is a NaN
      then FRT ← (FRC)
   else if generated QNaN
      then FRT ← generated QNaN

If the operand specified by FRA is a NaN, then that NaN is stored as the result. Otherwise, if the operand specified by FRB is a NaN (if the instruction specifies an FRB operand), then that NaN is stored as the result, with the low-order 29 bits of the result set to 0 if the instruction is frsp. Otherwise, if the operand specified by FRC is a NaN (if the instruction specifies an FRC operand), then that NaN is stored as the result. Otherwise, if a QNaN was generated due to a disabled Invalid Operation Exception, then that QNaN is stored as the result. If a QNaN is to be generated as a result, then the QNaN generated has a sign bit of 0, an exponent field of all 1s, and a high-order fraction bit of 1 with all other fraction bits 0. Any instruction that generates a QNaN as the result of a disabled Invalid Operation Exception generates this QNaN (i.e., 0x7FF8_0000_0000_0000).

A double-precision NaN is considered to be representable in single format if and only if the low-order 29 bits of the double-precision NaN's fraction are zero.

4.3.3 Sign of Result

The following rules govern the sign of the result of an arithmetic, rounding, or conversion operation, when the operation does not yield an exception. They apply even when the operands or results are zeros or infinities.

   The sign of the result of an add operation is the sign of the operand having the larger absolute value. If both operands have the same sign, the sign of the result of an add operation is the same as the sign of the operands. The sign of the result of the subtract operation x-y is the same as the sign of the result of the add operation x+(-y).

   When the sum of two operands with opposite sign, or the difference of two operands with the same sign, is exactly zero, the sign of the result is positive in all rounding modes except Round toward -Infinity, in which mode the sign is negative.

   The sign of the result of a multiply or divide operation is the Exclusive OR of the signs of the operands.

   The sign of the result of a Square Root or Reciprocal Square Root Estimate operation is always positive, except that the square root of -0 is -0 and the reciprocal square root of -0 is -Infinity.

   The sign of the result of a Round to Single-Precision, or Convert From Integer, or Round to Integer operation is the sign of the operand being converted.

For the Multiply-Add instructions, the rules given above are applied first to the multiply operation and then to the add or subtract operation (one of the inputs to the add or subtract operation is the result of the multiply operation).

access a true single-precision representation in storage, and a fixed-point integer representation in GPRs.
4.3.5.1 Single-Precision Operands For single format data, a format conversion from single to double is performed when loading from storage into 4.3.4 Normalization and an FPR and a format conversion from double to single Denormalization is performed when storing from an FPR to storage. No floating-point exceptions are caused by these instruc- The intermediate result of an arithmetic or frsp instruc- tions. An instruction is provided to explicitly convert a tion may require normalization and/or denormalization double format operand in an FPR to single-precision. as described below. Normalization and denormalization Floating-point single-precision is enabled with four do not affect the sign of the result. types of instruction. When an arithmetic or rounding instruction produces an intermediate result which carries out of the significand, or in which the significand is nonzero but has a leading 1. Load Floating-Point Single zero bit, it is not a normalized number and must be nor- This form of instruction accesses a single-preci- malized before it is stored. For the carry-out case, the sion operand in single format in storage, converts significand is shifted right one bit, with a one shifted it to double format, and loads it into an FPR. No into the leading significand bit, and the exponent is floating-point exceptions are caused by these incremented by one. For the leading-zero case, the sig- instructions. nificand is shifted left while decrementing its exponent by one for each bit shifted, until the leading significand 2. Round to Floating-Point Single-Precision bit becomes one. The Guard bit and the Round bit (see The Floating Round to Single-Precision instruction Section 4.5.1, "Execution Model for IEEE Operations" rounds a double-precision operand to single-preci- on page 119) participate in the shift with zeros shifted sion, checking the exponent for single-precision into the Round bit. 
The exponent is regarded as if its range and handling any exceptions according to range were unlimited. respective enable bits, and places that operand After normalization, or if normalization was not into an FPR in double format. For results produced required, the intermediate result may have a nonzero by single-precision arithmetic instructions, sin- significand and an exponent value that is less than the gle-precision loads, and other instances of the minimum value that can be represented in the format Floating Round to Single-Precision instruction, this specified for the result. In this case, the intermediate operation does not alter the value. result is said to be "Tiny" and the stored result is deter- 3. Single-Precision Arithmetic Instructions mined by the rules described in Section 4.4.4, "Under- flow Exception". These rules may require This form of instruction takes operands from the denormalization. FPRs in double format, performs the operation as if it produced an intermediate result having infinite A number is denormalized by shifting its significand precision and unbounded exponent range, and right while incrementing its exponent by 1 for each bit then coerces this intermediate result to fit in single shifted, until the exponent is equal to the format's mini- format. Status bits, in the FPSCR and optionally in mum value. If any significant bits are lost in this shifting the Condition Register, are set to reflect the sin- process then "Loss of Accuracy" has occurred (See gle-precision result. The result is then converted to Section 4.4.4, "Underflow Exception" on page 118) and double format and placed into an FPR. The result Underflow Exception is signaled. lies in the range supported by the single format. 
All input values must be representable in single 4.3.5 Data Handling and Precision format; if they are not, the result placed into the target FPR, and the setting of status bits in the Most of the Floating-Point Facility Architecture, includ- FPSCR and in the Condition Register (if Rc=1), ing all computational, Move, and Select instructions, are undefined. use the floating-point double format to represent data in the FPRs. Single-precision and integer-valued oper- 4. Store Floating-Point Single ands may be manipulated using double-precision oper- This form of instruction converts a double-preci- ations. Instructions are provided to coerce these values sion operand to single format and stores that oper- from a double format operand. Instructions are also and into storage. No floating-point exceptions are provided for manipulations which do not require dou- caused by these instructions. (The value being ble-precision. In addition, instructions are provided to stored is effectively assumed to be the result of an instruction of one of the preceding three types.) 112 Power ISATM Book I Version 2.06 When the result of a Load Floating-Point Single, Float- The Floating Convert To Integer instructions con- ing Round to Single-Precision, or single-precision arith- vert a double-precision operand to a 32-bit or metic instruction is stored in an FPR, the low-order 29 64-bit signed fixed-point integer format. Variants FRACTION bits are zero. are provided both to perform rounding based on the value of FPSCRRN and to round toward zero. Programming Note These instructions may cause Invalid Operation The Floating Round to Single-Precision instruction (VXSNaN, VXCVI) and Inexact exceptions. The is provided to allow value conversion from dou- Floating Convert From Integer instruction converts ble-precision to single-precision with appropriate a 64-bit signed fixed-point integer to a double-pre- exception checking and rounding. This instruction cision floating-point integer. 
Because of the limita- should be used to convert double-precision float- tions of the source format, only an Inexact ing-point values (produced by double-precision exception may be generated. load and arithmetic instructions and by fcfid) to sin- gle-precision values prior to storing them into single 4.3.6 Rounding format storage elements or using them as oper- ands for single-precision arithmetic instructions. The material in this section applies to operations that Values produced by single-precision load and arith- have numeric operands (i.e., operands that are not metic instructions are already single-precision val- infinities or NaNs). Rounding the intermediate result of ues and can be stored directly into single format such an operation may cause an Overflow Exception, storage elements, or used directly as operands for an Underflow Exception, or an Inexact Exception. The single-precision arithmetic instructions, without pre- remainder of this section assumes that the operation ceding the store, or the arithmetic instruction, by a causes no exceptions and that the result is numeric. Floating Round to Single-Precision instruction. See Section 4.3.2, "Value Representation" and Section 4.4, "Floating-Point Exceptions" for the cases Programming Note not covered here. A single-precision value can be used in double-pre- The Arithmetic and Rounding and Conversion instruc- cision arithmetic operations. The reverse is true tions round their intermediate results. With the excep- only if the double-precision value is representable tion of the Estimate instructions, these instructions in single format. produce an intermediate result that can be regarded as having infinite precision and unbounded exponent Some implementations may execute single-preci- range. All but two groups of these instructions normal- sion arithmetic instructions faster than double-pre- ize or denormalize the intermediate result prior to cision arithmetic instructions. 
Therefore, if rounding and then place the final result into the target double-precision accuracy is not required, sin- FPR in double format. The Floating Round to Integer gle-precision data and instructions should be used. and Floating Convert To Integer instructions with biased exponents ranging from 1022 through 1074 are prepared for rounding by repetitively shifting the signifi- 4.3.5.2 Integer-Valued Operands cand right one position and incrementing the biased Instructions are provided to round floating-point oper- exponent until it reaches a value of 1075. (Intermediate ands to integer values in floating-point format. To facili- results with biased exponents 1075 or larger are tate exchange of data between the floating-point and already integers, and with biased exponents 1021 or fixed-Point facilities, instructions are provided to con- less round to zero.) After rounding, the final result for vert between floating-point double format and Floating Round to Integer is normalized and put in dou- fixed-point integer format in an FPR. Computation on ble format, and for Floating Convert To Integer is con- integer-valued operands may be performed using arith- verted to a signed fixed-point integer. metic instructions of the required precision. (The results FPSCR bits FR and FI generally indicate the results of may not be integer values.) The two groups of instruc- rounding. Each of the instructions which rounds its tions provided specifically to support integer-valued intermediate result sets these bits. If the fraction is operands are described below. incremented during rounding then FR is set to 1, other- 1. Floating Round to Integer wise FR is set to 0. If the result is inexact then FI is set to 1, otherwise FI is set to zero. The Round to Integer The Floating Round to Integer instructions round a instructions are exceptions to this rule, setting FR and double-precision operand to an integer value in FI to 0. 
The Estimate instructions set FR and FI to floating-point double format. These instructions undefined values. The remaining floating-point instruc- may cause Invalid Operation (VXSNAN) excep- tions do not alter FR and FI. tions. See Sections 4.3.6 and 4.5.1 for more infor- mation about rounding. Four user-selectable rounding modes are provided through the Floating-Point Rounding Control field in the 2. Floating Convert To/From Integer Chapter 4. Floating-Point Facility [Category: Floating-Point] 113 Version 2.06 FPSCR. See Section 4.2.2, "Floating-Point Status and Control Register". These are encoded as follows. 4.4 Floating-Point Exceptions This architecture defines the following floating-point exceptions: RN Rounding Mode Invalid Operation Exception 00 Round to Nearest SNaN 01 Round toward Zero Infinity-Infinity 10 Round toward +Infinity Infinity÷Infinity 11 Round toward -Infinity Zero÷Zero Let Z be the intermediate arithmetic result or the oper- Infinity×Zero and of a convert operation. If Z can be represented Invalid Compare exactly in the target format, then the result in all round- Software-Defined Condition ing modes is Z as represented in the target format. If Z Invalid Square Root cannot be represented exactly in the target format, let Invalid Integer Convert Z1 and Z2 bound Z as the next larger and next smaller Zero Divide Exception numbers representable in the target format. Then Z1 or Overflow Exception Z2 can be used to approximate the result in the target Underflow Exception format. Inexact Exception Figure 55 shows the relation of Z, Z1, and Z2 in this These exceptions, other than Invalid Operation Excep- case. The following rules specify the rounding in the tion due to Software-Defined Condition, may occur dur- four modes. "LSB" means "least significant bit". ing execution of computational instructions. 
An Invalid Operation Exception due to Software-Defined Condi- tion occurs when a Move To FPSCR instruction sets By Incrementing LSB of Z FPSCRVXSOFT to 1. Infinitely Precise Value By Truncating after LSB Each floating-point exception, and each category of Invalid Operation Exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a Z2 Z1 0 Z2 Z1 corresponding enable bit in the FPSCR. The exception Z Z bit indicates occurrence of the corresponding excep- Negative values Positive values tion. If an exception occurs, the corresponding enable bit governs the result produced by the instruction and, Figure 55. Selection of Z1 and Z2 in conjunction with the FE0 and FE1 bits (see page 115), whether and how the system floating-point Round to Nearest enabled exception error handler is invoked. (In general, Choose the value that is closer to Z (Z1 or the enabling specified by the enable bit is of invoking Z2). In case of a tie, choose the one that is the system error handler, not of permitting the excep- even (least significant bit 0). tion to occur. The occurrence of an exception depends Round toward Zero only on the instruction and its inputs, not on the setting Choose the smaller in magnitude (Z1 or Z2). of any control bits. The only deviation from this general rule is that the occurrence of an Underflow Exception Round toward +Infinity may depend on the setting of the enable bit.) Choose Z1. A single instruction, other than mtfsfi or mtfsf, may set Round toward -Infinity more than one exception bit only in the following cases: Choose Z2. Inexact Exception may be set with Overflow See Section 4.5.1, "Execution Model for IEEE Opera- Exception. tions" on page 119 for a detailed explanation of round- Inexact Exception may be set with Underflow ing. Exception. 
Invalid Operation Exception (SNaN) is set with Invalid Operation Exception (×0) for Multiply-Add instructions for which the values being multiplied are infinity and zero and the value being added is an SNaN. Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Compare) for Compare Ordered instructions. Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Integer Convert) for Convert To Integer instructions. 114 Power ISATM Book I Version 2.06 When an exception occurs the writing of a result to the ing-point exception occurs. The system floating-point target register may be suppressed or a result may be enabled exception error handler is also invoked if a delivered, depending on the exception. Move To FPSCR instruction causes an exception bit and the corresponding enable bit both to be 1; the The writing of a result to the target register is sup- Move To FPSCR instruction is considered to cause the pressed for the following kinds of exception, so that enabled exception. there is no possibility that one of the operands is lost: The FE0 and FE1 bits control whether and how the Enabled Invalid Operation system floating-point enabled exception error handler Enabled Zero Divide is invoked if an enabled floating-point exception occurs. For the remaining kinds of exception, a result is gener- The location of these bits and the requirements for ated and written to the destination specified by the altering them are described in Book III. (The system instruction causing the exception. The result may be a floating-point enabled exception error handler is never different value for the enabled and disabled conditions invoked because of a disabled floating-point excep- for some of these exceptions. The kinds of exception tion.) The effects of the four possible settings of these that deliver a result are the following: bits are as follows. 
Disabled Invalid Operation Disabled Zero Divide FE0 FE1 Description Disabled Overflow 0 0 Ignore Exceptions Mode Disabled Underflow Floating-point exceptions do not cause Disabled Inexact the system floating-point enabled excep- Enabled Overflow tion error handler to be invoked. Enabled Underflow 0 1 Imprecise Nonrecoverable Mode Enabled Inexact The system floating-point enabled excep- Subsequent sections define each of the floating-point tion error handler is invoked at some point exceptions and specify the action that is taken when at or beyond the instruction that caused they are detected. the enabled exception. It may not be pos- sible to identify the excepting instruction The IEEE standard specifies the handling of excep- or the data that caused the exception. tional conditions in terms of "traps" and "trap handlers". Results produced by the excepting In this architecture, an FPSCR exception enable bit of 1 instruction may have been used by or may causes generation of the result value specified in the have affected subsequent instructions IEEE standard for the "trap enabled" case; the expecta- that are executed before the error handler tion is that the exception will be detected by software, is invoked. which will revise the result. An FPSCR exception 1 0 Imprecise Recoverable Mode enable bit of 0 causes generation of the "default result" The system floating-point enabled excep- value specified for the "trap disabled" (or "no trap tion error handler is invoked at some point occurs" or "trap is not implemented") case; the expecta- at or beyond the instruction that caused tion is that the exception will not be detected by soft- the enabled exception. Sufficient informa- ware, which will simply use the default result. The result tion is provided to the error handler that it to be delivered in each case for each exception is can identify the excepting instruction and described in the sections below. the operands, and correct the result. 
No The IEEE default behavior when an exception occurs is results produced by the excepting instruc- to generate a default value and not to notify software. tion have been used by or have affected In this architecture, if the IEEE default behavior when subsequent instructions that are executed an exception occurs is desired for all exceptions, all before the error handler is invoked. FPSCR exception enable bits should be set to 0 and 1 1 Precise Mode Ignore Exceptions Mode (see below) should be used. The system floating-point enabled excep- In this case the system floating-point enabled exception tion error handler is invoked precisely at error handler is not invoked, even if floating-point the instruction that caused the enabled exceptions occur: software can inspect the FPSCR exception. exception bits if necessary, to determine whether exceptions have occurred. In all cases, the question of whether a floating-point result is stored, and what value is stored, is governed In this architecture, if software is to be notified that a by the FPSCR exception enable bits, as described in given kind of exception has occurred, the correspond- subsequent sections, and is not affected by the value of ing FPSCR exception enable bit must be set to 1 and a the FE0 and FE1 bits. mode other than Ignore Exceptions Mode must be used. In this case the system floating-point enabled In all cases in which the system floating-point enabled exception error handler is invoked if an enabled float- exception error handler is invoked, all instructions Chapter 4. Floating-Point Facility [Category: Floating-Point] 115 Version 2.06 before the instruction at which the system floating-point 4.4.1 Invalid Operation Exception enabled exception error handler is invoked have com- pleted, and no instruction after the instruction at which the system floating-point enabled exception error han- 4.4.1.1 Definition dler is invoked has begun execution. 
The instruction at An Invalid Operation Exception occurs when an oper- which the system floating-point enabled exception error and is invalid for the specified operation. The invalid handler is invoked has completed if it is the excepting operations are: instruction and there is only one such instruction. Oth- Any floating-point operation on a Signaling NaN erwise it has not begun execution (or may have been (SNaN) partially executed in some cases, as described in Book For add or subtract operations, magnitude subtrac- III). tion of infinities ( - ) Division of infinity by infinity ( ÷ ) Programming Note Division of zero by zero (0 ÷ 0) In any of the three non-Precise modes, a Float- Multiplication of infinity by zero ( × 0) ing-Point Status and Control Register instruction Ordered comparison involving a NaN (Invalid can be used to force any exceptions, due to Compare) instructions initiated before the Floating-Point Sta- Square root or reciprocal square root of a negative tus and Control Register instruction, to be recorded (and nonzero) number (Invalid Square Root) in the FPSCR. (This forcing is superfluous for Pre- Integer convert involving a number too large in cise Mode.) magnitude to be represented in the target format, In either of the Imprecise modes, a Floating-Point or involving an infinity or a NaN (Invalid Integer Status and Control Register instruction can be Convert) used to force any invocations of the system float- An Invalid Operation Exception also occurs when an ing-point enabled exception error handler, due to mtfsfi, mtfsf, or mtfsb1 instruction is executed that instructions initiated before the Floating-Point Sta- sets FPSCRVXSOFT to 1 (Software-Defined Condition). tus and Control Register instruction, to occur. (This forcing has no effect in Ignore Exceptions Mode, and is superfluous for Precise Mode.) 
4.4.1.2 Action The last sentence of the paragraph preceding this The action to be taken depends on the setting of the Programming Note can apply only in the Imprecise Invalid Operation Exception Enable bit of the FPSCR. modes, or if the mode has just been changed from When Invalid Operation Exception is enabled Ignore Exceptions Mode to some other mode. (It (FPSCRVE=1) and an Invalid Operation Exception always applies in the latter case.) occurs, the following actions are taken: In order to obtain the best performance across the wid- 1. One or two Invalid Operation Exceptions are set est range of implementations, the programmer should FPSCRVXSNAN (if SNaN) obey the following guidelines. FPSCRVXISI (if - ) FPSCRVXIDI (if ÷ ) If the IEEE default results are acceptable to the FPSCRVXZDZ (if 0 ÷ 0) application, Ignore Exceptions Mode should be FPSCRVXIMZ (if × 0) used with all FPSCR exception enable bits set to FPSCRVXVC (if invalid comp) 0. FPSCRVXSOFT (if sfw-def cond) If the IEEE default results are not acceptable to the FPSCRVXSQRT (if invalid sqrt) application, Imprecise Nonrecoverable Mode FPSCRVXCVI (if invalid int cvrt) should be used, or Imprecise Recoverable Mode if 2. If the operation is an arithmetic, Floating Round to recoverability is needed, with FPSCR exception Single-Precision, Floating Round to Integer, or enable bits set to 1 for those exceptions for which convert to integer operation, the system floating-point enabled exception error the target FPR is unchanged handler is to be invoked. FPSCRFR FI are set to zero Ignore Exceptions Mode should not, in general, be FPSCRFPRF is unchanged used when any FPSCR exception enable bits are 3. If the operation is a compare, set to 1. FPSCRFR FI C are unchanged Precise Mode may degrade performance in some FPSCRFPCC is set to reflect unordered implementations, perhaps substantially, and there- 4. 
If an mtfsfi, mtfsf, or mtfsb1 instruction is exe- fore should be used only for debugging and other cuted that sets FPSCRVXSOFT to 1, specialized applications. The FPSCR is set as specified in the instruc- tion description. 116 Power ISATM Book I Version 2.06 When Invalid Operation Exception is disabled 4.4.2.2 Action (FPSCRVE=0) and an Invalid Operation Exception occurs, the following actions are taken: The action to be taken depends on the setting of the Zero Divide Exception Enable bit of the FPSCR. 1. One or two Invalid Operation Exceptions are set FPSCRVXSNAN (if SNaN) When Zero Divide Exception is enabled (FPSCRZE=1) FPSCRVXISI (if - ) and a Zero Divide Exception occurs, the following FPSCRVXIDI (if ÷ ) actions are taken: FPSCRVXZDZ (if 0 ÷ 0) 1. Zero Divide Exception is set FPSCRVXIMZ (if × 0) FPSCRZX 1 FPSCRVXVC (if invalid comp) 2. The target FPR is unchanged FPSCRVXSOFT (if sfw-def cond) 3. FPSCRFR FI are set to zero FPSCRVXSQRT (if invalid sqrt) 4. FPSCRFPRF is unchanged FPSCRVXCVI (if invalid int cvrt) 2. If the operation is an arithmetic or Floating Round When Zero Divide Exception is disabled (FPSCRZE=0) to Single-Precision operation, and a Zero Divide Exception occurs, the following the target FPR is set to a Quiet NaN actions are taken: FPSCRFR FI are set to zero 1. Zero Divide Exception is set FPSCRFPRF is set to indicate the class of the FPSCRZX 1 result (Quiet NaN) 2. The target FPR is set to ± Infinity, where the sign is 3. If the operation is a convert to 64-bit integer opera- determined by the XOR of the signs of the oper- tion, ands the target FPR is set as follows: 3. FPSCRFR FI are set to zero FRT is set to the most positive 64-bit integer 4. FPSCRFPRF is set to indicate the class and sign of if the operand in FRB is a positive number the result (± Infinity) or + , and to the most negative 64-bit inte- ger if the operand in FRB is a negative num- ber, - , or NaN 4.4.3 Overflow Exception FPSCRFR FI are set to zero FPSCRFPRF is undefined 4. 
If the operation is a convert to 32-bit integer opera- 4.4.3.1 Definition tion, An Overflow Exception occurs when the magnitude of the target FPR is set as follows: what would have been the rounded result if the expo- FRT0:31 undefined nent range were unbounded exceeds that of the largest FRT32:63 are set to the most positive 32-bit finite number of the specified result precision. integer if the operand in FRB is a positive number or +infinity, and to the most nega- tive 32-bit integer if the operand in FRB is a 4.4.3.2 Action negative number, -infinity, or NaN The action to be taken depends on the setting of the FPSCRFR FI are set to zero Overflow Exception Enable bit of the FPSCR. FPSCRFPRF is undefined 5. If the operation is a compare, When Overflow Exception is enabled (FPSCROE=1) FPSCRFR FI C are unchanged and an Overflow Exception occurs, the following FPSCRFPCC is set to reflect unordered actions are taken: 1. Overflow Exception is set FPSCROX 1 6. If an mtfsfi, mtfsf, or mtfsb1 instruction is exe- 2. For double-precision arithmetic instructions, the cuted that sets FPSCRVXSOFT to 1, exponent of the normalized intermediate result is The FPSCR is set as specified in the instruc- adjusted by subtracting 1536 tion description. 3. For single-precision arithmetic instructions and the Floating Round to Single-Precision instruction, the 4.4.2 Zero Divide Exception exponent of the normalized intermediate result is adjusted by subtracting 192 4. The adjusted rounded result is placed into the tar- 4.4.2.1 Definition get FPR 5. FPSCRFPRF is set to indicate the class and sign of A Zero Divide Exception occurs when a Divide instruc- the result (± Normal Number) tion is executed with a zero divisor value and a finite nonzero dividend value. It also occurs when a Recipro- When Overflow Exception is disabled (FPSCROE=0) cal Estimate instruction (fre[s] or frsqrte[s]) is exe- and an Overflow Exception occurs, the following cuted with an operand value of zero. 
actions are taken: Chapter 4. Floating-Point Facility [Category: Floating-Point] 117 Version 2.06 1. Overflow Exception is set 4.4.4 Underflow Exception FPSCROX 1 2. Inexact Exception is set FPSCRXX 1 4.4.4.1 Definition 3. The result is determined by the rounding mode Underflow Exception is defined separately for the (FPSCRRN) and the sign of the intermediate result enabled and disabled states: as follows: - Round to Nearest Enabled: Store ± Infinity, where the sign is the sign Underflow occurs when the intermediate result is of the intermediate result "Tiny". - Round toward Zero Disabled: Store the format's largest finite number Underflow occurs when the intermediate result is with the sign of the intermediate result "Tiny" and there is "Loss of Accuracy". - Round toward + Infinity For negative overflow, store the format's A "Tiny" result is detected before rounding, when a most negative finite number; for positive nonzero intermediate result computed as though both overflow, store +Infinity the precision and the exponent range were unbounded - Round toward -Infinity would be less in magnitude than the smallest normal- For negative overflow, store -Infinity; for ized number. positive overflow, store the format's larg- If the intermediate result is "Tiny" and Underflow est finite number Exception is disabled (FPSCRUE=0) then the interme- 4. The result is placed into the target FPR diate result is denormalized (see Section 4.3.4, "Nor- 5. FPSCRFR is undefined malization and Denormalization" on page 112) and 6. FPSCRFI is set to 1 rounded (see Section 4.3.6, "Rounding" on page 113) 7. FPSCRFPRF is set to indicate the class and sign of before being placed into the target FPR. the result (± Infinity or ± Normal Number) "Loss of Accuracy" is detected when the delivered result value differs from what would have been com- puted were both the precision and the exponent range unbounded. 
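
The "Tiny" and "Loss of Accuracy" tests of Section 4.4.4.1 can be illustrated with a minimal sketch for double format. It is not part of the architecture: the names `MIN_NORMAL`, `classify_underflow`, and `underflow_exception` are hypothetical, the exact intermediate result is modeled with Python's `fractions.Fraction`, and the delivered value is modeled in Round to Nearest mode only (the other three rounding modes are not covered).

```python
from fractions import Fraction

# Smallest normalized double-format magnitude, 2**-1022
# (Minimum Exponent for double format in Figure 53).
MIN_NORMAL = Fraction(1, 2**1022)


def classify_underflow(exact: Fraction):
    """Return (tiny, loss_of_accuracy) for an exact, unrounded intermediate result.

    Intended for small magnitudes; very large values would overflow the division.
    """
    # "Tiny": nonzero and smaller in magnitude than the smallest normalized number.
    tiny = exact != 0 and abs(exact) < MIN_NORMAL
    # CPython's int / int division is correctly rounded (nearest, ties to even),
    # so this models denormalization plus rounding in Round to Nearest mode only.
    delivered = exact.numerator / exact.denominator
    # "Loss of Accuracy": the delivered value differs from the exact value.
    loss = tiny and Fraction(delivered) != exact
    return tiny, loss


def underflow_exception(exact: Fraction, ue: int) -> bool:
    """Apply the enabled (ue=1) or disabled (ue=0) definition of Underflow."""
    tiny, loss = classify_underflow(exact)
    return tiny if ue == 1 else (tiny and loss)
```

For example, 2^-1023 is Tiny but exactly representable as a denormalized number, so under this sketch it underflows only when the exception is enabled, whereas 3 x 2^-1076 is Tiny and cannot be delivered exactly, so it underflows in both states.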
4.4.4.2 Action

The action to be taken depends on the setting of the Underflow Exception Enable bit of the FPSCR.

When Underflow Exception is enabled (FPSCR_UE=1) and an Underflow Exception occurs, the following actions are taken:
1. Underflow Exception is set
      FPSCR_UX ← 1
2. For double-precision arithmetic instructions, the exponent of the normalized intermediate result is adjusted by adding 1536
3. For single-precision arithmetic instructions and the Floating Round to Single-Precision instruction, the exponent of the normalized intermediate result is adjusted by adding 192
4. The adjusted rounded result is placed into the target FPR
5. FPSCR_FPRF is set to indicate the class and sign of the result (± Normalized Number)

Programming Note
   The FR and FI bits are provided to allow the system floating-point enabled exception error handler, when invoked because of an Underflow Exception, to simulate a "trap disabled" environment. That is, the FR and FI bits allow the system floating-point enabled exception error handler to unround the result, thus allowing the result to be denormalized.

When Underflow Exception is disabled (FPSCR_UE=0) and an Underflow Exception occurs, the following actions are taken:
1. Underflow Exception is set
      FPSCR_UX ← 1
2. The rounded result is placed into the target FPR
3. FPSCR_FPRF is set to indicate the class and sign of the result (± Normalized Number, ± Denormalized Number, or ± Zero)

4.4.5 Inexact Exception

4.4.5.1 Definition

An Inexact Exception occurs when one of two conditions occurs during rounding:
1. The rounded result differs from the intermediate result assuming both the precision and the exponent range of the intermediate result to be unbounded. In this case the result is said to be inexact. (If the rounding causes an enabled Overflow Exception or an enabled Underflow Exception, an Inexact Exception also occurs only if the significands of the rounded result and the intermediate result differ.)
2. The rounded result overflows and Overflow Exception is disabled.

4.4.5.2 Action

The action to be taken does not depend on the setting of the Inexact Exception Enable bit of the FPSCR.

When an Inexact Exception occurs, the following actions are taken:
1. Inexact Exception is set
      FPSCR_XX ← 1
2. The rounded or overflowed result is placed into the target FPR
3. FPSCR_FPRF is set to indicate the class and sign of the result

Programming Note
   In some implementations, enabling Inexact Exceptions may degrade performance more than does enabling other types of floating-point exception.

4.5 Floating-Point Execution Models

All implementations of this architecture must provide the equivalent of the following execution models to ensure that identical results are obtained.

Special rules are provided in the definition of the computational instructions for the infinities, denormalized numbers and NaNs. The material in the remainder of this section applies to instructions that have numeric operands and a numeric result (i.e., operands and result that are not infinities or NaNs), and that cause no exceptions. See Section 4.3.2 and Section 4.4 for the cases not covered here.

Although the double format specifies an 11-bit exponent, exponent arithmetic makes use of two additional bits to avoid potential transient overflow conditions. One extra bit is required when denormalized double-precision numbers are prenormalized. The second bit is required to permit the computation of the adjusted exponent value in the following cases when the corresponding exception enable bit is 1:
- Underflow during multiplication using a denormalized operand.
- Overflow during division using a denormalized divisor.

The IEEE standard includes 32-bit and 64-bit arithmetic. The standard requires that single-precision arithmetic be provided for single-precision operands. The standard permits double-precision floating-point operations to have either (or both) single-precision or double-precision operands, but states that single-precision floating-point operations should not accept double-precision operands. The Power ISA follows these guidelines; double-precision arithmetic instructions can have operands of either or both precisions, while single-precision arithmetic instructions require all operands to be single-precision. Double-precision arithmetic instructions and fcfid produce double-precision values, while single-precision arithmetic instructions produce single-precision values.

For arithmetic instructions, conversions from double-precision to single-precision must be done explicitly by software, while conversions from single-precision to double-precision are done implicitly.

4.5.1 Execution Model for IEEE Operations

The following description uses 64-bit arithmetic as an example. 32-bit arithmetic is similar except that the FRACTION is a 23-bit field, and the single-precision Guard, Round, and Sticky bits (described in this section) are logically adjacent to the 23-bit FRACTION field.

IEEE-conforming significand arithmetic is considered to be performed with a floating-point accumulator having the following format, where bits 0:55 comprise the significand of the intermediate result.

   S | C | L | FRACTION | G  | R  | X
           0   1          53   54   55

Figure 56. IEEE 64-bit execution model

The S bit is the sign bit.

The C bit is the carry bit, which captures the carry out of the significand.

The L bit is the leading unit bit of the significand, which receives the implicit bit from the operand.

The FRACTION is a 52-bit field that accepts the fraction of the operand.

The Guard (G), Round (R), and Sticky (X) bits are extensions to the low-order bits of the accumulator. The G and R bits are required for postnormalization of the result. The G, R, and X bits are required during rounding to determine if the intermediate result is equally near the two nearest representable values. The X bit serves as an extension to the G and R bits by representing the logical OR of all bits that may appear to the low-order side of the R bit, due either to shifting the accumulator right or to other generation of low-order result bits. The G and R bits participate in the left shifts with zeros being shifted into the R bit. Figure 57 shows the significance of the G, R, and X bits with respect to the intermediate result (IR), the representable number next lower in magnitude (NL), and the representable number next higher in magnitude (NH).

   G R X           Interpretation
   000             IR is exact
   001, 010, 011   IR closer to NL
   100             IR midway between NL and NH
   101, 110, 111   IR closer to NH

Figure 57. Interpretation of G, R, and X bits

Figure 58 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers relative to the accumulator illustrated in Figure 56.

   Format   Guard   Round   Sticky
   Double   G bit   R bit   X bit
   Single   24      25      OR of 26:52, G, R, X

Figure 58. Location of the Guard, Round, and Sticky bits in the IEEE execution model

The significand of the intermediate result is prepared for rounding by shifting its contents right, if required, until the least significant bit to be retained is in the low-order bit position of the fraction. Four user-selectable rounding modes are provided through FPSCR_RN as described in Section 4.3.6, "Rounding" on page 113. Using Z1 and Z2 as defined on page 113, the rules for rounding in each mode are as follows.

Round to Nearest
   Guard bit = 0
      The result is truncated. (Result exact (GRX=000) or closest to next lower value in magnitude (GRX=001, 010, or 011))
   Guard bit = 1
      Depends on Round and Sticky bits:
      Case a
         If the Round or Sticky bit is 1 (inclusive), the result is incremented. (Result closest to next higher value in magnitude (GRX=101, 110, or 111))
      Case b
         If the Round and Sticky bits are 0 (result midway between closest representable values), then if the low-order bit of the result is 1 the result is incremented. Otherwise (the low-order bit of the result is 0) the result is truncated (this is the case of a tie rounded to even).

Round toward Zero
   Choose the smaller in magnitude of Z1 or Z2. If the Guard, Round, or Sticky bit is nonzero, the result is inexact.

Round toward + Infinity
   Choose Z1.

Round toward - Infinity
   Choose Z2.

If rounding results in a carry into C, the significand is shifted right one position and the exponent is incremented by one. This yields an inexact result, and possibly also exponent overflow. If any of the Guard, Round, or Sticky bits is nonzero, then the result is also inexact. Fraction bits are stored to the target FPR. For Floating Round to Integer, Floating Round to Single-Precision, and single-precision arithmetic instructions, low-order zeros must be appended as appropriate to fill out the double-precision fraction.

4.5.2 Execution Model for Multiply-Add Type Instructions

The Power ISA provides a special form of instruction that performs up to three operations in one instruction (a multiplication, an addition, and a negation). With this added capability comes the special ability to produce a more exact intermediate result as input to the rounder. 32-bit arithmetic is similar except that the FRACTION field is smaller.

Multiply-add significand arithmetic is considered to be performed with a floating-point accumulator having the following format, where bits 0:106 comprise the significand of the intermediate result.

   S | C | L | FRACTION | X'
   (bit-position labels as printed under the figure: 0 1 2 3 106)

Figure 59. Multiply-add 64-bit execution model

The first part of the operation is a multiplication. The multiplication has two 53-bit significands as inputs, which are assumed to be prenormalized, and produces a result conforming to the above model. If there is a carry out of the significand (into the C bit), then the significand is shifted right one position, shifting the L bit (leading unit bit) into the most significant bit of the FRACTION and shifting the C bit (carry out) into the L bit. All 106 bits (L bit, the FRACTION) of the product take part in the add operation. If the exponents of the two inputs to the adder are not equal, the significand of the operand with the smaller exponent is aligned (shifted) to the right by an amount that is added to that exponent to make it equal to the other input's exponent. Zeros are shifted into the left of the significand as it is aligned and bits shifted out of bit 105 of the significand are ORed into the X' bit. The add operation also produces a result conforming to the above model with the X' bit taking part in the add operation.

The result of the addition is then normalized, with all bits of the addition result, except the X' bit, participating in the shift. The normalized result serves as the intermediate result that is input to the rounder.

For rounding, the conceptual Guard, Round, and Sticky bits are defined in terms of accumulator bits. Figure 60 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers in the multiply-add execution model.

   Format   Guard   Round   Sticky
   Double   53      54      OR of 55:105, X'
   Single   24      25      OR of 26:105, X'

Figure 60. Location of the Guard, Round, and Sticky bits in the multiply-add execution model

The rules for rounding the intermediate result are the same as those given in Section 4.5.1.

If the instruction is Floating Negative Multiply-Add or Floating Negative Multiply-Subtract, the final result is negated.
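The rounding rules of Section 4.5.1, which both execution models share, can be sketched in Python on an integer significand together with its Guard, Round, and Sticky bits. This is an illustration only: the function name and mode labels are hypothetical, and the two directed modes are expressed in sign-magnitude terms (the magnitude is incremented when an inexact result must move away from zero), which is an assumed restatement of the Z1/Z2 rule. A carry out of the significand is left to the caller.

```python
def round_significand(sig, g, r, x, mode, negative=False):
    """Round a truncated integer significand using its G, R, and X bits.

    Returns (rounded_significand, inexact). The caller handles a carry
    into C (shift right one position, increment the exponent).
    """
    inexact = bool(g or r or x)
    if mode == "nearest":
        # G=0: truncate. G=1: increment if R or X is set (case a),
        # or on an exact tie when the low-order bit is 1 (case b).
        increment = bool(g and (r or x or (sig & 1)))
    elif mode == "zero":
        increment = False
    elif mode == "+inf":
        increment = inexact and not negative
    elif mode == "-inf":
        increment = inexact and negative
    else:
        raise ValueError(mode)
    return sig + int(increment), inexact
```

For example, `round_significand(0b1011, 1, 0, 0, "nearest")` is a tie with an odd low-order bit, so the significand is incremented to `0b1100`; the same tie on `0b1010` is truncated.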
4.6 Floating-Point Facility Instructions

For each instruction in this section that defines the use of an Rc bit, the behavior defined for the instruction corresponding to Rc=1 is considered part of the Floating-Point.Record category.

4.6.1 Floating-Point Storage Access Instructions

The Storage Access instructions compute the effective address (EA) of the storage to be accessed as described in Section 1.10.3, "Effective Address Calculation" on page 27.

Programming Note
   The la extended mnemonic permits computing an effective address as a Load or Store instruction would, but loads the address itself into a GPR rather than loading the value that is in storage at that address. This extended mnemonic is described in Section E.9, "Miscellaneous Mnemonics" on page 637.

4.6.1.1 Storage Access Exceptions

Storage accesses will cause the system data storage error handler to be invoked if the program is not allowed to modify the target storage (Store only), or if the program attempts to access storage that is unavailable.

4.6.2 Floating-Point Load Instructions

There are three basic forms of load instruction: single-precision, double-precision, and integer. The integer form is provided by the Load Floating-Point as Integer Word Algebraic instruction, described on page 126. Because the FPRs support only floating-point double format, single-precision Load Floating-Point instructions convert single-precision data to double format prior to loading the operand into the target FPR. The conversion and loading steps are as follows.

Let WORD_0:31 be the floating-point single-precision operand accessed from storage.

Normalized Operand
   if WORD_1:8 > 0 and WORD_1:8 < 255 then
      FRT_0:1 ← WORD_0:1
      FRT_2 ← ¬WORD_1
      FRT_3 ← ¬WORD_1
      FRT_4 ← ¬WORD_1
      FRT_5:63 ← WORD_2:31 || ²⁹0

Denormalized Operand
   if WORD_1:8 = 0 and WORD_9:31 ≠ 0 then
      sign ← WORD_0
      exp ← -126
      frac_0:52 ← 0b0 || WORD_9:31 || ²⁹0
      normalize the operand
         do while frac_0 = 0
            frac_0:52 ← frac_1:52 || 0b0
            exp ← exp - 1
      FRT_0 ← sign
      FRT_1:11 ← exp + 1023
      FRT_12:63 ← frac_1:52

Zero / Infinity / NaN
   if WORD_1:8 = 255 or WORD_1:31 = 0 then
      FRT_0:1 ← WORD_0:1
      FRT_2 ← WORD_1
      FRT_3 ← WORD_1
      FRT_4 ← WORD_1
      FRT_5:63 ← WORD_2:31 || ²⁹0

For double-precision Load Floating-Point instructions and for the Load Floating-Point as Integer Word Algebraic instruction no conversion is required, as the data from storage are copied directly into the FPR.

Many of the Load Floating-Point instructions have an "update" form, in which register RA is updated with the effective address. For these forms, if RA≠0, the effective address is placed into register RA and the storage element (word or doubleword) addressed by EA is loaded into FRT.

Note: Recall that RA and RB denote General Purpose Registers, while FRT denotes a Floating-Point Register.

Load Floating-Point Single D-form

lfs FRT,D(RA)

   48 (0:5) | FRT (6:10) | RA (11:15) | D (16:31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   FRT ← DOUBLE(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+D.

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see page 123) and placed into register FRT.

Load Floating-Point Single Indexed X-form

lfsx FRT,RA,RB

   31 (0:5) | FRT (6:10) | RA (11:15) | RB (16:20) | 535 (21:30) | / (31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   FRT ← DOUBLE(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+(RB).

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see page 123) and placed into register FRT.
Special Registers Altered (lfs and lfsx):
   None

Load Floating-Point Single with Update D-form

lfsu FRT,D(RA)

   49 (0:5) | FRT (6:10) | RA (11:15) | D (16:31)

   EA ← (RA) + EXTS(D)
   FRT ← DOUBLE(MEM(EA, 4))
   RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see page 123) and placed into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Load Floating-Point Single with Update Indexed X-form

lfsux FRT,RA,RB

   31 (0:5) | FRT (6:10) | RA (11:15) | RB (16:20) | 567 (21:30) | / (31)

   EA ← (RA) + (RB)
   FRT ← DOUBLE(MEM(EA, 4))
   RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see page 123) and placed into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Load Floating-Point Double D-form

lfd FRT,D(RA)

   50 (0:5) | FRT (6:10) | RA (11:15) | D (16:31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   FRT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+D.

The doubleword in storage addressed by EA is loaded into register FRT.

Load Floating-Point Double Indexed X-form

lfdx FRT,RA,RB

   31 (0:5) | FRT (6:10) | RA (11:15) | RB (16:20) | 599 (21:30) | / (31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   FRT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(RB).

The doubleword in storage addressed by EA is loaded into register FRT.
Special Registers Altered (lfd and lfdx):
   None

Load Floating-Point Double with Update D-form

lfdu FRT,D(RA)

   51 (0:5) | FRT (6:10) | RA (11:15) | D (16:31)

   EA ← (RA) + EXTS(D)
   FRT ← MEM(EA, 8)
   RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The doubleword in storage addressed by EA is loaded into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Load Floating-Point Double with Update Indexed X-form

lfdux FRT,RA,RB

   31 (0:5) | FRT (6:10) | RA (11:15) | RB (16:20) | 631 (21:30) | / (31)

   EA ← (RA) + (RB)
   FRT ← MEM(EA, 8)
   RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The doubleword in storage addressed by EA is loaded into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Load Floating-Point as Integer Word Algebraic Indexed X-form

lfiwax FRT,RA,RB

   31 (0:5) | FRT (6:10) | RA (11:15) | RB (16:20) | 855 (21:30) | / (31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   FRT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+(RB).

The word in storage addressed by EA is loaded into FRT_32:63. FRT_0:31 are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
   None

Load Floating-Point as Integer Word and Zero Indexed X-form

lfiwzx FRT,RA,RB [Category: Floating-Point.Phased-in]

   31 (0:5) | FRT (6:10) | RA (11:15) | RB (16:20) | 887 (21:30) | / (31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   FRT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0)+(RB).

The word in storage addressed by EA is loaded into FRT_32:63. FRT_0:31 are set to 0.

Special Registers Altered:
   None

4.6.3 Floating-Point Store Instructions

There are three basic forms of store instruction: single-precision, double-precision, and integer. The integer form is provided by the Store Floating-Point as Integer Word instruction, described on page 130. Because the FPRs support only floating-point double format for floating-point data, single-precision Store Floating-Point instructions convert double-precision data to single format prior to storing the operand into storage. The conversion steps are as follows.

Let WORD_0:31 be the word in storage written to.

No Denormalization Required (includes Zero / Infinity / NaN)
   if FRS_1:11 > 896 or FRS_1:63 = 0 then
      WORD_0:1 ← FRS_0:1
      WORD_2:31 ← FRS_5:34

Denormalization Required
   if 874 ≤ FRS_1:11 ≤ 896 then
      sign ← FRS_0
      exp ← FRS_1:11 - 1023
      frac_0:52 ← 0b1 || FRS_12:63
      denormalize operand
         do while exp < -126
            frac_0:52 ← 0b0 || frac_0:51
            exp ← exp + 1
      WORD_0 ← sign
      WORD_1:8 ← 0x00
      WORD_9:31 ← frac_1:23
   else WORD ← undefined

Notice that if the value to be stored by a single-precision Store Floating-Point instruction is larger in magnitude than the maximum number representable in single format, the first case above (No Denormalization Required) applies. The result stored in WORD is then a well-defined value, but is not numerically equal to the value in the source register (i.e., the result of a single-precision Load Floating-Point from WORD will not compare equal to the contents of the original source register).

For double-precision Store Floating-Point instructions and for the Store Floating-Point as Integer Word instruction no conversion is required, as the data from the FPR are copied directly into storage.

Many of the Store Floating-Point instructions have an "update" form, in which register RA is updated with the effective address. For these forms, if RA≠0, the effective address is placed into register RA.

Note: Recall that RA and RB denote General Purpose Registers, while FRS denotes a Floating-Point Register.
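The store conversion above, and the load conversion of Section 4.6.2, operate purely on bit patterns, so they can be sketched in Python with integers. This is an illustrative sketch, not architected text: the function names `double_from_single`, `single_from_double`, `bits32`, and `bits64` are hypothetical, bit 0 is taken as the most significant bit as in the text, and the "undefined" store case is modeled by returning `None`.

```python
import struct

MASK52 = (1 << 52) - 1
MASK53 = (1 << 53) - 1

def double_from_single(word):
    """DOUBLE(WORD): 32-bit single image to 64-bit double image."""
    sign, exp8, frac23 = word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF
    if 0 < exp8 < 255:
        # Normalized: rebias by 896, equivalent to copying WORD_1 into FRT_1
        # and replicating its complement into FRT_2:4.
        return (sign << 63) | ((exp8 + 896) << 52) | (frac23 << 29)
    if exp8 == 0 and frac23 != 0:
        # Denormalized: frac_0:52 = 0b0 || WORD_9:31 || 29 zeros, then normalize.
        exp, frac = -126, frac23 << 29
        while ((frac >> 52) & 1) == 0:
            frac = (frac << 1) & MASK53
            exp -= 1
        return (sign << 63) | ((exp + 1023) << 52) | (frac & MASK52)
    # Zero / Infinity / NaN: WORD_1 is replicated into FRT_2:4.
    exp11 = 0x7FF if exp8 == 255 else 0
    return (sign << 63) | (exp11 << 52) | (frac23 << 29)

def single_from_double(dword):
    """SINGLE(FRS): 64-bit double image to 32-bit single image, or None."""
    sign, exp11 = dword >> 63, (dword >> 52) & 0x7FF
    if exp11 > 896 or (dword & ((1 << 63) - 1)) == 0:
        # WORD_0:1 <- FRS_0:1 and WORD_2:31 <- FRS_5:34
        return ((dword >> 62) << 30) | ((dword >> 29) & ((1 << 30) - 1))
    if 874 <= exp11 <= 896:
        exp, frac = exp11 - 1023, (1 << 52) | (dword & MASK52)
        while exp < -126:
            frac >>= 1          # frac_0:52 <- 0b0 || frac_0:51
            exp += 1
        return (sign << 31) | ((frac >> 29) & 0x7FFFFF)   # WORD_9:31 <- frac_1:23
    return None                 # WORD is undefined in this case

def bits32(v):
    return struct.unpack(">I", struct.pack(">f", v))[0]

def bits64(v):
    return struct.unpack(">Q", struct.pack(">d", v))[0]
```

For values exactly representable in single format, `double_from_single(bits32(v))` matches `bits64(v)` and `single_from_double(bits64(v))` matches `bits32(v)`, including single-format denormalized values such as 2**-149.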
Store Floating-Point Single D-form

stfs FRS,D(RA)

   52 (0:5) | FRS (6:10) | RA (11:15) | D (16:31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   MEM(EA, 4) ← SINGLE((FRS))

Let the effective address (EA) be the sum (RA|0)+D.

The contents of register FRS are converted to single format (see page 127) and stored into the word in storage addressed by EA.

Special Registers Altered:
   None

Store Floating-Point Single Indexed X-form

stfsx FRS,RA,RB

   31 (0:5) | FRS (6:10) | RA (11:15) | RB (16:20) | 663 (21:30) | / (31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   MEM(EA, 4) ← SINGLE((FRS))

Let the effective address (EA) be the sum (RA|0)+(RB).

The contents of register FRS are converted to single format (see page 127) and stored into the word in storage addressed by EA.

Special Registers Altered:
   None

Store Floating-Point Single with Update D-form

stfsu FRS,D(RA)

   53 (0:5) | FRS (6:10) | RA (11:15) | D (16:31)

   EA ← (RA) + EXTS(D)
   MEM(EA, 4) ← SINGLE((FRS))
   RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The contents of register FRS are converted to single format (see page 127) and stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Store Floating-Point Single with Update Indexed X-form

stfsux FRS,RA,RB

   31 (0:5) | FRS (6:10) | RA (11:15) | RB (16:20) | 695 (21:30) | / (31)

   EA ← (RA) + (RB)
   MEM(EA, 4) ← SINGLE((FRS))
   RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The contents of register FRS are converted to single format (see page 127) and stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.
Special Registers Altered (stfsu and stfsux):
   None

Store Floating-Point Double D-form

stfd FRS,D(RA)

   54 (0:5) | FRS (6:10) | RA (11:15) | D (16:31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   MEM(EA, 8) ← (FRS)

Let the effective address (EA) be the sum (RA|0)+D.

The contents of register FRS are stored into the doubleword in storage addressed by EA.

Special Registers Altered:
   None

Store Floating-Point Double Indexed X-form

stfdx FRS,RA,RB

   31 (0:5) | FRS (6:10) | RA (11:15) | RB (16:20) | 727 (21:30) | / (31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   MEM(EA, 8) ← (FRS)

Let the effective address (EA) be the sum (RA|0)+(RB).

The contents of register FRS are stored into the doubleword in storage addressed by EA.

Special Registers Altered:
   None

Store Floating-Point Double with Update D-form

stfdu FRS,D(RA)

   55 (0:5) | FRS (6:10) | RA (11:15) | D (16:31)

   EA ← (RA) + EXTS(D)
   MEM(EA, 8) ← (FRS)
   RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The contents of register FRS are stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Store Floating-Point Double with Update Indexed X-form

stfdux FRS,RA,RB

   31 (0:5) | FRS (6:10) | RA (11:15) | RB (16:20) | 759 (21:30) | / (31)

   EA ← (RA) + (RB)
   MEM(EA, 8) ← (FRS)
   RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The contents of register FRS are stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Store Floating-Point as Integer Word Indexed X-form

stfiwx FRS,RA,RB

   31 (0:5) | FRS (6:10) | RA (11:15) | RB (16:20) | 983 (21:30) | / (31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   MEM(EA, 4) ← (FRS)_32:63

Let the effective address (EA) be the sum (RA|0)+(RB).

(FRS)_32:63 are stored, without conversion, into the word in storage addressed by EA.
If the contents of register FRS were produced, either directly or indirectly, by a Load Floating-Point Single instruction, a single-precision Arithmetic instruction, or frsp, then the value stored is undefined. (The contents of register FRS are produced directly by such an instruction if FRS is the target register for the instruction. The contents of register FRS are produced indirectly by such an instruction if FRS is the final target register of a sequence of one or more Floating-Point Move instructions, with the input to the sequence having been produced directly by such an instruction.)

Special Registers Altered:
   None

4.6.4 Floating-Point Load Store Doubleword Pair Instructions [Category: Floating-Point.Phased-Out]

For lfdp[x], the doubleword-pair in storage addressed by EA is loaded into an even-odd pair of FPRs with the even-numbered FPR being loaded with the leftmost doubleword from storage and the odd-numbered FPR being loaded with the rightmost doubleword.

For stfdp[x], the content of an even-odd pair of FPRs is stored into the doubleword-pair in storage addressed by EA, with the even-numbered FPR being stored into the leftmost doubleword in storage and the odd-numbered FPR being stored into the rightmost doubleword.

Programming Note
   The instructions described in this section should not be used to access an operand in DFP128 format when MSR_LE=1.

Load Floating-Point Double Pair DS-form

lfdp FRTp,DS(RA)

   57 (0:5) | FRTp (6:10) | RA (11:15) | DS (16:29) | 00 (30:31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(DS||0b00)
   FRTp ← MEM(EA, 16)

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The doubleword-pair in storage addressed by EA is placed into register-pair FRTp.

If FRTp is odd, the instruction form is invalid.

Special Registers Altered:
   None

Store Floating-Point Double Pair DS-form

stfdp FRSp,DS(RA)

   61 (0:5) | FRSp (6:10) | RA (11:15) | DS (16:29) | 00 (30:31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(DS||0b00)
   MEM(EA, 16) ← FRSp

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The contents of register-pair FRSp are stored into the doubleword-pair in storage addressed by EA.

If FRSp is odd, the instruction form is invalid.

Special Registers Altered:
   None

Load Floating-Point Double Pair Indexed X-form

lfdpx FRTp,RA,RB

   31 (0:5) | FRTp (6:10) | RA (11:15) | RB (16:20) | 791 (21:30) | / (31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   FRTp ← MEM(EA, 16)

Let the effective address (EA) be the sum (RA|0) + (RB). The doubleword-pair in storage addressed by EA is placed into register-pair FRTp.

If FRTp is odd, the instruction form is invalid.

Special Registers Altered:
   None

Store Floating-Point Double Pair Indexed X-form

stfdpx FRSp,RA,RB

   31 (0:5) | FRSp (6:10) | RA (11:15) | RB (16:20) | 919 (21:30) | / (31)

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   MEM(EA, 16) ← FRSp

Let the effective address (EA) be the sum (RA|0) + (RB). The contents of register-pair FRSp are stored into the doubleword-pair in storage addressed by EA.

If FRSp is odd, the instruction form is invalid.

Special Registers Altered:
   None

4.6.5 Floating-Point Move Instructions

These instructions copy data from one floating-point register to another, altering the sign bit (bit 0) as described below for fneg, fabs, fnabs, and fcpsgn. These instructions treat NaNs just like any other kind of value (e.g., the sign bit of a NaN may be altered by fneg, fabs, fnabs, and fcpsgn). These instructions do not alter the FPSCR.

Floating Move Register X-form

fmr FRT,FRB (Rc=0)
fmr. FRT,FRB (Rc=1)

   63 (0:5) | FRT (6:10) | /// (11:15) | FRB (16:20) | 72 (21:30) | Rc (31)

The contents of register FRB are placed into register FRT.

Floating Negate X-form

fneg FRT,FRB (Rc=0)
fneg. FRT,FRB (Rc=1)

   63 (0:5) | FRT (6:10) | /// (11:15) | FRB (16:20) | 40 (21:30) | Rc (31)

The contents of register FRB with bit 0 inverted are placed into register FRT.
Special Registers Altered: Special Registers Altered: CR1 (if Rc=1) CR1 (if Rc=1) Floating Absolute Value X-form Floating Copy Sign X-form fabs FRT,FRB (Rc=0) fcpsgn FRT, FRA, FRB (Rc=0) fabs. FRT,FRB (Rc=1) fcpsgn. FRT, FRA, FRB (Rc=1) 63 FRT /// FRB 264 Rc 63 FRT FRA FRB 8 Rc 0 6 11 16 21 31 0 6 11 16 21 31 The contents of register FRB with bit 0 set to zero are The contents of register FRB with bit 0 set to the value placed into register FRT. of bit 0 of register FRA are placed into register FRT. Special Registers Altered: Special Registers Altered: CR1 (if Rc=1) CR1 (if Rc=1) Floating Negative Absolute Value X-form fnabs FRT,FRB (Rc=0) fnabs. FRT,FRB (Rc=1) 63 FRT /// FRB 136 Rc 0 6 11 16 21 31 The contents of register FRB with bit 0 set to one are placed into register FRT. Special Registers Altered: CR1 (if Rc=1) 132 Power ISATM Book I Version 2.06 4.6.6 Floating-Point Arithmetic Instructions 4.6.6.1 Floating-Point Elementary Arithmetic Instructions Floating Add [Single] A-form Floating Subtract [Single] A-form fadd FRT,FRA,FRB (Rc=0) fsub FRT,FRA,FRB (Rc=0) fadd. FRT,FRA,FRB (Rc=1) fsub. FRT,FRA,FRB (Rc=1) 63 FRT FRA FRB /// 21 Rc 63 FRT FRA FRB /// 20 Rc 0 6 11 16 21 26 31 0 6 11 16 21 26 31 fadds FRT,FRA,FRB (Rc=0) fsubs FRT,FRA,FRB (Rc=0) fadds. FRT,FRA,FRB (Rc=1) fsubs. FRT,FRA,FRB (Rc=1) 59 FRT FRA FRB /// 21 Rc 59 FRT FRA FRB /// 20 Rc 0 6 11 16 21 26 31 0 6 11 16 21 26 31 The floating-point operand in register FRA is added to The floating-point operand in register FRB is subtracted the floating-point operand in register FRB. from the floating-point operand in register FRA. If the most significant bit of the resultant significand is If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to not 1, the result is normalized. 
The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT. This holds for both Floating Add and Floating Subtract.

Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.

If a carry occurs, the sum's significand is shifted right one bit position and the exponent is increased by one.

The execution of the Floating Subtract instruction is identical to that of Floating Add, except that the contents of FRB participate in the operation with the sign bit (bit 0) inverted.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered (Floating Add and Floating Subtract):
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1 (if Rc=1)


Floating Multiply [Single] A-form

fmul     FRT,FRA,FRC    (Rc=0)
fmul.    FRT,FRA,FRC    (Rc=1)

    63    FRT   FRA   ///   FRC   25    Rc
    0     6     11    16    21    26    31

fmuls    FRT,FRA,FRC    (Rc=0)
fmuls.   FRT,FRA,FRC    (Rc=1)

    59    FRT   FRA   ///   FRC   25    Rc
    0     6     11    16    21    26    31

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

Floating-point multiplication is based on exponent addition and multiplication of the significands.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXIMZ
    CR1 (if Rc=1)


Floating Divide [Single] A-form

fdiv     FRT,FRA,FRB    (Rc=0)
fdiv.    FRT,FRA,FRB    (Rc=1)

    63    FRT   FRA   FRB   ///   18    Rc
    0     6     11    16    21    26    31

fdivs    FRT,FRA,FRB    (Rc=0)
fdivs.   FRT,FRA,FRB    (Rc=1)

    59    FRT   FRA   FRB   ///   18    Rc
    0     6     11    16    21    26    31

The floating-point operand in register FRA is divided by the floating-point operand in register FRB. The remainder is not supplied as a result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

Floating-point division is based on exponent subtraction and division of the significands.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1 and Zero Divide Exceptions when FPSCR[ZE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX ZX XX
    VXSNAN VXIDI VXZDZ
    CR1 (if Rc=1)


Floating Square Root [Single] A-form

fsqrt    FRT,FRB    (Rc=0)
fsqrt.   FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   ///   22    Rc
    0     6     11    16    21    26    31

fsqrts   FRT,FRB    (Rc=0)
fsqrts.  FRT,FRB    (Rc=1)

    59    FRT   ///   FRB   ///   22    Rc
    0     6     11    16    21    26    31

The square root of the floating-point operand in register FRB is placed into register FRT.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

Operation with various special values of the operand is summarized below.

    Operand   Result     Exception
    -∞        QNaN(1)    VXSQRT
    <0        QNaN(1)    VXSQRT
    -0        -0         None
    +∞        +∞         None
    SNaN      QNaN(1)    VXSNAN
    QNaN      QNaN       None

    (1) No result if FPSCR[VE] = 1.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX XX
    VXSNAN VXSQRT
    CR1 (if Rc=1)


Floating Reciprocal Estimate [Single] A-form

fre      FRT,FRB    (Rc=0)
fre.     FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   ///   24    Rc
    0     6     11    16    21    26    31

fres     FRT,FRB    (Rc=0)
fres.    FRT,FRB    (Rc=1)

    59    FRT   ///   FRB   ///   24    Rc
    0     6     11    16    21    26    31

An estimate of the reciprocal of the square root of the floating-point operand in register FRB is placed into register FRT. The estimate placed into register FRT is correct to a precision of one part in 32 of the reciprocal of the square root of (FRB), i.e.,

    ABS( (estimate - 1/sqrt(x)) / (1/sqrt(x)) ) ≤ 1/32

where x is the initial value in FRB.

Operation with various special values of the operand is summarized below.

    Operand   Result     Exception
    -∞        QNaN(2)    VXSQRT
    <0        QNaN(2)    VXSQRT
    -0        -∞(1)      ZX
    +0        +∞(1)      ZX
    +∞        +0         None
    SNaN      QNaN(2)    VXSNAN
    QNaN      QNaN       None

    (1) No result if FPSCR[ZE] = 1.
    (2) No result if FPSCR[VE] = 1.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1 and Zero Divide Exceptions when FPSCR[ZE]=1.

The results of executing this instruction may vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FPRF FR (undefined) FI (undefined)
    FX OX UX ZX XX (undefined)
    VXSNAN
    CR1 (if Rc=1)
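The accuracy bound quoted above can be checked numerically. The following Python sketch is illustrative only and is not part of the architecture: the names relative_error, newton_rsqrt_step, and ARCHITECTED_BOUND are hypothetical, and the Newton-Raphson refinement step is a commonly used software technique assumed here for illustration, not something this section specifies.

```python
import math

# Bound quoted in the text: one part in 32 (assumed to apply to the estimate).
ARCHITECTED_BOUND = 1.0 / 32.0


def relative_error(estimate: float, exact: float) -> float:
    """ABS((estimate - exact) / exact), the quantity bounded in the text."""
    return abs(estimate - exact) / abs(exact)


def newton_rsqrt_step(x: float, y: float) -> float:
    """One Newton-Raphson step refining y as an approximation of 1/sqrt(x).

    Hypothetical helper: each step roughly squares the relative error,
    which is why a more precise starting estimate needs fewer iterations.
    """
    return y * (1.5 - 0.5 * x * y * y)


x = 2.0
exact = 1.0 / math.sqrt(x)

coarse = 0.70                        # stand-in for a hardware estimate
refined = newton_rsqrt_step(x, coarse)

print(relative_error(coarse, exact) <= ARCHITECTED_BOUND)
print(relative_error(refined, exact))
```

Software written against the bound alone would size its iteration count for a starting error of 1/32, whatever precision a given implementation actually delivers.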
Programming Note

For the Floating-Point Estimate instructions, some implementations might implement a precision higher than the minimum architected precision. Thus, a program may take advantage of the higher precision instructions to increase performance by decreasing the iterations needed for software emulation of floating-point instructions. However, there is no guarantee given about the precision, which may vary (up or down) between implementations. Only programs targeted at a specific implementation (i.e., the program will not be migrated to another implementation) should take advantage of the higher precision of the instructions. All other programs should rely on the minimum architected precision, which will guarantee the program to run properly across different implementations.


Floating Reciprocal Square Root Estimate [Single] A-form

frsqrte    FRT,FRB    (Rc=0)
frsqrte.   FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   ///   26    Rc
    0     6     11    16    21    26    31

frsqrtes   FRT,FRB    (Rc=0)
frsqrtes.  FRT,FRB    (Rc=1)

    59    FRT   ///   FRB   ///   26    Rc
    0     6     11    16    21    26    31

An estimate of the reciprocal of the square root of the floating-point operand in register FRB is placed into register FRT. The estimate placed into register FRT is correct to a precision of one part in 32 of the reciprocal of the square root of (FRB), i.e.,

    ABS( (estimate - 1/sqrt(x)) / (1/sqrt(x)) ) ≤ 1/32

where x is the initial value in FRB.

Operation with various special values of the operand is summarized below.

    Operand   Result     Exception
    -∞        QNaN(2)    VXSQRT
    <0        QNaN(2)    VXSQRT
    -0        -∞(1)      ZX
    +0        +∞(1)      ZX
    +∞        +0         None
    SNaN      QNaN(2)    VXSNAN
    QNaN      QNaN       None

    (1) No result if FPSCR[ZE] = 1.
    (2) No result if FPSCR[VE] = 1.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1 and Zero Divide Exceptions when FPSCR[ZE]=1.

The results of executing this instruction may vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FPRF FR (undefined) FI (undefined)
    FX ZX XX (undefined)
    VXSNAN VXSQRT
    CR1 (if Rc=1)

Note
See the Notes that appear with fre[s].


Floating Test for software Divide X-form
[Category: Floating-Point.Phased-In]

ftdiv    BF,FRA,FRB

    63    BF    //    FRA   FRB   128   /
    0     6     9     11    16    21    31

Let e_a be the unbiased exponent of the double-precision floating-point operand in register FRA.

Let e_b be the unbiased exponent of the double-precision floating-point operand in register FRB.

fe_flag is set to 1 if any of the following conditions occurs.

- The double-precision floating-point operand in register FRA is a NaN or an Infinity.
- The double-precision floating-point operand in register FRB is a Zero, a NaN, or an Infinity.
- e_b is less than or equal to -1022.
- e_b is greater than or equal to 1021.
- The double-precision floating-point operand in register FRA is not a zero and the difference, e_a - e_b, is greater than or equal to 1023.
- The double-precision floating-point operand in register FRA is not a zero and the difference, e_a - e_b, is less than or equal to -1021.
- The double-precision floating-point operand in register FRA is not a zero and e_a is less than or equal to -970.

Otherwise fe_flag is set to 0.

fg_flag is set to 1 if either of the following conditions occurs.

- The double-precision floating-point operand in register FRA is an Infinity.
- The double-precision floating-point operand in register FRB is a Zero, an Infinity, or a denormalized value.

Otherwise fg_flag is set to 0.

If the implementation guarantees a relative error of fre[s][.] of less than or equal to 2^-14, then fl_flag is set to 1. Otherwise fl_flag is set to 0.

CR field BF is set to the value fl_flag || fg_flag || fe_flag || 0b0.

Special Registers Altered:
    CR field BF


Floating Test for software Square Root X-form
[Category: Floating-Point.Phased-In]

ftsqrt   BF,FRB

    63    BF    //    ///   FRB   160   /
    0     6     9     11    16    21    31

Let e_b be the unbiased exponent of the double-precision floating-point operand in register FRB.

fe_flag is set to 1 if either of the following conditions occurs.

- The double-precision floating-point operand in register FRB is a Zero, a NaN, an Infinity, or a negative value.
- e_b is less than or equal to -970.

Otherwise fe_flag is set to 0.

fg_flag is set to 1 if the following condition occurs.

- The double-precision floating-point operand in register FRB is a Zero, an Infinity, or a denormalized value.

Otherwise fg_flag is set to 0.

If the implementation guarantees a relative error of frsqrte[s][.] of less than or equal to 2^-14, then fl_flag is set to 1. Otherwise fl_flag is set to 0.

CR field BF is set to the value fl_flag || fg_flag || fe_flag || 0b0.

Special Registers Altered:
    CR field BF

Programming Note

ftdiv and ftsqrt are provided to accelerate software emulation of divide and square root operations, by performing the requisite special case checking. Software needs only a single branch, on FE=1 (in CR[BF]), to a special case handler. FG and FL may provide further acceleration opportunities.


4.6.6.2 Floating-Point Multiply-Add Instructions

These instructions combine a multiply and an add operation without an intermediate rounding operation. The fraction part of the intermediate product is 106 bits wide (L bit, FRACTION), and all 106 bits take part in the add/subtract portion of the instruction.

Status bits are set as follows.

- Overflow, Underflow, and Inexact Exception bits, the FR and FI bits, and the FPRF field are set based on the final result of the operation, and not on the result of the multiplication.
- Invalid Operation Exception bits are set as if the multiplication and the addition were performed using two separate instructions (fmul[s], followed by fadd[s] or fsub[s]). That is, multiplication of infinity by 0 or of anything by an SNaN, and/or addition of an SNaN, cause the corresponding exception bits to be set.


Floating Multiply-Add [Single] A-form

fmadd    FRT,FRA,FRC,FRB    (Rc=0)
fmadd.   FRT,FRA,FRC,FRB    (Rc=1)

    63    FRT   FRA   FRB   FRC   29    Rc
    0     6     11    16    21    26    31

fmadds   FRT,FRA,FRC,FRB    (Rc=0)
fmadds.  FRT,FRA,FRC,FRB    (Rc=1)

    59    FRT   FRA   FRB   FRC   29    Rc
    0     6     11    16    21    26    31

The operation

    FRT ← [(FRA)×(FRC)] + (FRB)

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is added to this intermediate result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.


Floating Multiply-Subtract [Single] A-form

fmsub    FRT,FRA,FRC,FRB    (Rc=0)
fmsub.   FRT,FRA,FRC,FRB    (Rc=1)

    63    FRT   FRA   FRB   FRC   28    Rc
    0     6     11    16    21    26    31

fmsubs   FRT,FRA,FRC,FRB    (Rc=0)
fmsubs.  FRT,FRA,FRC,FRB    (Rc=1)

    59    FRT   FRA   FRB   FRC   28    Rc
    0     6     11    16    21    26    31

The operation

    FRT ← [(FRA)×(FRC)] - (FRB)

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is subtracted from this intermediate result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.
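The effect of omitting the intermediate rounding, as described in Section 4.6.6.2, can be illustrated with a small Python sketch. It is a model under stated assumptions, not the architected behavior: exact rational arithmetic stands in for the 106-bit intermediate product, only round-to-nearest-even and finite double-precision operands are modeled, no FPSCR bits are tracked, and the function names are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a: float, c: float, b: float) -> float:
    """Model of (a * c) + b with a single rounding at the end.

    Fraction holds the product and sum exactly; float() then rounds the
    exact value once to double precision (round to nearest even).
    Finite operands only.
    """
    return float(Fraction(a) * Fraction(c) + Fraction(b))


def separate_multiply_add(a: float, c: float, b: float) -> float:
    """Multiply then add as two operations: the product is rounded first."""
    return a * c + b


# (1 + 2^-27)^2 = 1 + 2^-26 + 2^-54; the 2^-54 term is lost when the
# product alone is rounded to double precision.
a = c = 1.0 + 2.0 ** -27
b = -(1.0 + 2.0 ** -26)

print(fused_multiply_add(a, c, b))     # the low-order term survives
print(separate_multiply_add(a, c, b))  # the low-order term was rounded away
```

The two results differ only because the separate sequence rounds the product before the addition takes place.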
Special Registers Altered (fmadd[s][.] and fmsub[s][.]):
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1 (if Rc=1)


Floating Negative Multiply-Add [Single] A-form

fnmadd    FRT,FRA,FRC,FRB    (Rc=0)
fnmadd.   FRT,FRA,FRC,FRB    (Rc=1)

    63    FRT   FRA   FRB   FRC   31    Rc
    0     6     11    16    21    26    31

fnmadds   FRT,FRA,FRC,FRB    (Rc=0)
fnmadds.  FRT,FRA,FRC,FRB    (Rc=1)

    59    FRT   FRA   FRB   FRC   31    Rc
    0     6     11    16    21    26    31

The operation

    FRT ← - ( [(FRA)×(FRC)] + (FRB) )

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is added to this intermediate result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR, then negated and placed into register FRT.

This instruction produces the same result as would be obtained by using the Floating Multiply-Add instruction and then negating the result, with the following exceptions.

- QNaNs propagate with no effect on their "sign" bit.
- QNaNs that are generated as the result of a disabled Invalid Operation Exception have a "sign" bit of 0.
- SNaNs that are converted to QNaNs as the result of a disabled Invalid Operation Exception retain the "sign" bit of the SNaN.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1 (if Rc=1)


Floating Negative Multiply-Subtract [Single] A-form

fnmsub    FRT,FRA,FRC,FRB    (Rc=0)
fnmsub.   FRT,FRA,FRC,FRB    (Rc=1)

    63    FRT   FRA   FRB   FRC   30    Rc
    0     6     11    16    21    26    31

fnmsubs   FRT,FRA,FRC,FRB    (Rc=0)
fnmsubs.  FRT,FRA,FRC,FRB    (Rc=1)

    59    FRT   FRA   FRB   FRC   30    Rc
    0     6     11    16    21    26    31

The operation

    FRT ← - ( [(FRA)×(FRC)] - (FRB) )

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is subtracted from this intermediate result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR, then negated and placed into register FRT.

This instruction produces the same result as would be obtained by using the Floating Multiply-Subtract instruction and then negating the result, with the following exceptions.

- QNaNs propagate with no effect on their "sign" bit.
- QNaNs that are generated as the result of a disabled Invalid Operation Exception have a "sign" bit of 0.
- SNaNs that are converted to QNaNs as the result of a disabled Invalid Operation Exception retain the "sign" bit of the SNaN.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1 (if Rc=1)


4.6.7 Floating-Point Rounding and Conversion Instructions

Programming Note

Examples of uses of these instructions to perform various conversions can be found in Section F.2, "Floating-Point Conversions [Category: Floating-Point]" on page 644.


4.6.7.1 Floating-Point Rounding Instruction

Floating Round to Single-Precision X-form

frsp     FRT,FRB    (Rc=0)
frsp.    FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   12    Rc
    0     6     11    16    21    31

The floating-point operand in register FRB is rounded to single-precision, using the rounding mode specified by FPSCR[RN], and placed into register FRT.

The rounding is described fully in Section A.1, "Floating-Point Round to Single-Precision Model" on page 603.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN
    CR1 (if Rc=1)


4.6.7.2 Floating-Point Convert To/From Integer Instructions

Floating Convert To Integer Doubleword X-form

fctid    FRT,FRB    (Rc=0)
fctid.   FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   814   Rc
    0     6     11    16    21    31

Let src be the double-precision floating-point value in FRB.

If src is a NaN, then the result is 0x8000_0000_0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode specified by FPSCR[RN].

If the rounded value is greater than 2^63 - 1, then the result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1.

Otherwise, if the rounded value is less than -2^63, then the result is 0x8000_0000_0000_0000 and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 64-bit signed-integer format.

If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 607.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)


Floating Convert To Integer Doubleword with round toward Zero X-form

fctidz   FRT,FRB    (Rc=0)
fctidz.  FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   815   Rc
    0     6     11    16    21    31

Let src be the double-precision floating-point value in FRB.

If src is a NaN, then the result is 0x8000_0000_0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode Round toward Zero.

If the rounded value is greater than 2^63 - 1, then the result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1.

Otherwise, if the rounded value is less than -2^63, then the result is 0x8000_0000_0000_0000 and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 64-bit signed-integer format.

If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 607.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)


Floating Convert To Integer Doubleword Unsigned X-form
[Category: Floating-Point.Phased-In]

fctidu   FRT,FRB    (Rc=0)
fctidu.  FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   942   Rc
    0     6     11    16    21    31

Let src be the double-precision floating-point value in FRB.

If src is a NaN, then the result is 0x0000_0000_0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode specified by FPSCR[RN].

If the rounded value is greater than 2^64 - 1, then the result is 0xFFFF_FFFF_FFFF_FFFF, and VXCVI is set to 1.

Otherwise, if the rounded value is less than 0, then the result is 0x0000_0000_0000_0000, and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format.

If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 607.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)


Floating Convert To Integer Doubleword Unsigned with round toward Zero X-form
[Category: Floating-Point.Phased-In]

fctiduz  FRT,FRB    (Rc=0)
fctiduz. FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   943   Rc
    0     6     11    16    21    31

Let src be the double-precision floating-point value in FRB.

If src is a NaN, then the result is 0x0000_0000_0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode Round toward Zero.

If the rounded value is greater than 2^64 - 1, then the result is 0xFFFF_FFFF_FFFF_FFFF, and VXCVI is set to 1.

Otherwise, if the rounded value is less than 0, then the result is 0x0000_0000_0000_0000, and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format.

If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 607.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)


Floating Convert To Integer Word X-form

fctiw    FRT,FRB    (Rc=0)
fctiw.   FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   14    Rc
    0     6     11    16    21    31

Let src be the double-precision floating-point value in FRB.

If src is a NaN, then the result is 0x8000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode specified by FPSCR[RN].

If the rounded value is greater than 2^31 - 1, then the result is 0x7FFF_FFFF, and VXCVI is set to 1.

Otherwise, if the rounded value is less than -2^31, then the result is 0x8000_0000, and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 32-bit signed-integer format.

If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT[32:63] and FRT[0:31] is undefined.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 607.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)


Floating Convert To Integer Word with round toward Zero X-form

fctiwz   FRT,FRB    (Rc=0)
fctiwz.  FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   15    Rc
    0     6     11    16    21    31

Let src be the double-precision floating-point value in FRB.

If src is a NaN, then the result is 0x8000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode Round toward Zero.

If the rounded value is greater than 2^31 - 1, then the result is 0x7FFF_FFFF, and VXCVI is set to 1.

Otherwise, if the rounded value is less than -2^31, then the result is 0x8000_0000, and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 32-bit signed-integer format.

If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT[32:63] and FRT[0:31] is undefined.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 607.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)


Floating Convert To Integer Word Unsigned X-form
[Category: Floating-Point.Phased-In]

fctiwu   FRT,FRB    (Rc=0)
fctiwu.  FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   142   Rc
    0     6     11    16    21    31

Let src be the double-precision floating-point value in FRB.

If src is a NaN, then the result is 0x0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode specified by FPSCR[RN].

If the rounded value is greater than 2^32 - 1, then the result is 0xFFFF_FFFF and VXCVI is set to 1.

Otherwise, if the rounded value is less than 0.0, then the result is 0x0000_0000 and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format.

If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT[32:63] and FRT[0:31] is undefined.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 607.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)


Floating Convert To Integer Word Unsigned with round toward Zero X-form
[Category: Floating-Point.Phased-In]

fctiwuz  FRT,FRB    (Rc=0)
fctiwuz. FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   143   Rc
    0     6     11    16    21    31

Let src be the double-precision floating-point value in FRB.

If src is a NaN, then the result is 0x0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode Round toward Zero.

If the rounded value is greater than 2^32 - 1, then the result is 0xFFFF_FFFF and VXCVI is set to 1.

Otherwise, if the rounded value is less than 0.0, then the result is 0x0000_0000 and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format.

If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT[32:63] and FRT[0:31] is undefined.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 607.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)


Floating Convert From Integer Doubleword X-form

fcfid    FRT,FRB    (Rc=0)
fcfid.   FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   846   Rc
    0     6     11    16    21    31

The 64-bit signed fixed-point operand in register FRB is converted to an infinitely precise floating-point integer. The result of the conversion is rounded to double-precision, using the rounding mode specified by FPSCR[RN], and placed into register FRT.

The conversion is described fully in Section A.3, "Floating-Point Convert from Integer Model".

FPSCR[FPRF] is set to the class and sign of the result. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF FR FI
    FX XX
    CR1 (if Rc=1)

Programming Note

Converting a signed integer word to double-precision floating-point can be accomplished by loading the word from storage using Load Float Word Algebraic Indexed and then using fcfid.


Floating Convert From Integer Doubleword Unsigned X-form
[Category: Floating-Point.Phased-In]

fcfidu   FRT,FRB    (Rc=0)
fcfidu.  FRT,FRB    (Rc=1)

    63    FRT   ///   FRB   974   Rc
    0     6     11    16    21    31

The 64-bit unsigned fixed-point operand in register FRB is converted to an infinitely precise floating-point integer. The result of the conversion is rounded to double-precision, using the rounding mode specified by FPSCR[RN], and placed into register FRT.

The conversion is described fully in Section A.3, "Floating-Point Convert from Integer Model".

FPSCR[FPRF] is set to the class and sign of the result. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF FR FI
    FX XX
    CR1 (if Rc=1)

Programming Note

Converting an unsigned integer word to double-precision floating-point can be accomplished by loading the word from storage using Load Float Word and Zero Indexed and then using fcfidu.


Floating Convert From Integer Doubleword Single X-form
[Category: Floating-Point.Phased-In]

fcfids   FRT,FRB    (Rc=0)
fcfids.  FRT,FRB    (Rc=1)

    59    FRT   ///   FRB   846   Rc
    0     6     11    16    21    31

The 64-bit signed fixed-point operand in register FRB is converted to an infinitely precise floating-point integer. The result of the conversion is rounded to single-precision, using the rounding mode specified by FPSCR[RN], and placed into register FRT.

The conversion is described fully in Section A.3, "Floating-Point Convert from Integer Model".

FPSCR[FPRF] is set to the class and sign of the result. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF FR FI
    FX XX
    CR1 (if Rc=1)

Programming Note

Converting a signed integer word to single-precision floating-point can be accomplished by loading the word from storage using Load Float Word Algebraic Indexed and then using fcfids.
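The result-selection rules shared by the Floating Convert To Integer instructions earlier in this section (NaN handling, saturation, and the VXCVI indication) can be sketched in Python. This is a simplified model, not the architected algorithm of Section A.2: the function name and parameters are hypothetical, only Round toward Zero and round-to-nearest-even are modeled (the latter assumes FPSCR[RN] selects Round to Nearest), and SNaN versus QNaN, FR, FI, and enabled exceptions are not modeled.

```python
import math


def convert_to_integer(src: float, bits: int = 64, signed: bool = True,
                       toward_zero: bool = False) -> tuple[int, bool]:
    """Return (result bit pattern, vxcvi) for a convert-to-integer operation.

    bits/signed select the target format (e.g. 64/True for fctid,
    32/False for fctiwu); toward_zero selects the "z" variants.
    """
    mask = (1 << bits) - 1
    if signed:
        lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        nan_result = 1 << (bits - 1)          # 0x8000...0
    else:
        lowest, highest = 0, mask
        nan_result = 0                        # 0x0000...0

    if math.isnan(src):
        return nan_result, True
    if math.isinf(src):                       # saturates like any huge value
        return (highest if src > 0 else lowest) & mask, True

    # round() on a float uses ties-to-even; math.trunc() rounds toward zero.
    rounded = math.trunc(src) if toward_zero else round(src)
    if rounded > highest:
        return highest & mask, True
    if rounded < lowest:
        return lowest & mask, True
    return rounded & mask, False              # two's complement bit pattern


print(hex(convert_to_integer(1e30, bits=32)[0]))
print(convert_to_integer(-1.0, signed=False))
```

Returning the bit pattern rather than a signed value mirrors the way the text states results (for example, 0x8000_0000_0000_0000).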
Floating-Point Facility [Category: Floating-Point] 145 Version 2.06 Floating Convert From Integer 4 .6 .7 .3 F lo a tin g R o u n d to In te g e r Doubleword Unsigned Single X-form In s tr u c tio n s [Category: Floating-Point.Phased-In] The Floating Round to Integer instructions provide fcfidus FRT,FRB (Rc=0) direct support for rounding functions found in high level fcfidus. FRT,FRB (Rc=1) languages. For example, frin, friz, frip, and frim imple- ment C++ round(), trunc(), ceil(), and floor(), respec- 59 FRT /// FRB 974 Rc tively. Note that frin does not implement the IEEE 0 6 11 16 21 31 Round to Nearest function, which is often further described as "ties to even." The rounding performed by The 64-bit unsigned fixed-point operand in register these instructions is described fully in Section A.4, FRB is converted to an infinitely precise floating-point "Floating-Point Round to Integer Model" on page 612. integer. The result of the conversion is rounded to sin- gle-precision, using the rounding mode specified by Programming Note FPSCRRN, and placed into register FRT. These instructions set FPSCRFR FI to 0b00 regard- less of whether the result is inexact or rounded The conversion is described fully in Section A.3, "Float- because there is a desire to preserve the value of ing-Point Convert from Integer Model". FPSCRXX. Furthermore, it is believed that most FPSCRFPRF is set to the class and sign of the result. programs do not need to know whether these FPSCRFR is set if the result is incremented when rounding operations produce inexact or rounded rounded. FPSCRFI is set if the result is inexact. results. If it is necessary to determine whether the result is inexact or rounded, software must com- Special Registers Altered: pare the result with the original source operand. 
    FPRF FR FI FX XX
    CR1                          (if Rc=1)

Programming Note
Converting an unsigned integer word to single-precision floating-point can be accomplished by loading the word from storage using Load Float Word and Zero Indexed and then using fcfidus.

Floating Round to Integer Nearest X-form

    frin     FRT,FRB     (Rc=0)
    frin.    FRT,FRB     (Rc=1)

    Field    63    FRT    ///    FRB    392    Rc
    Bit      0     6      11     16     21     31

The floating-point operand in register FRB is rounded to an integral value as follows, with the result placed into register FRT. If the sign of the operand is positive, (FRB) + 0.5 is truncated to an integral value, otherwise (FRB) - 0.5 is truncated to an integral value.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE = 1.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX VXSNAN
    CR1                          (if Rc = 1)

Floating Round to Integer Toward Zero X-form

    friz     FRT,FRB     (Rc=0)
    friz.    FRT,FRB     (Rc=1)

    Field    63    FRT    ///    FRB    424    Rc
    Bit      0     6      11     16     21     31

The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward zero, and the result is placed into register FRT.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE = 1.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX VXSNAN
    CR1                          (if Rc = 1)

Floating Round to Integer Plus X-form

    frip     FRT,FRB     (Rc=0)
    frip.    FRT,FRB     (Rc=1)

    Field    63    FRT    ///    FRB    456    Rc
    Bit      0     6      11     16     21     31

The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward +infinity, and the result is placed into register FRT.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE = 1.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX VXSNAN
    CR1                          (if Rc = 1)

Floating Round to Integer Minus X-form

    frim     FRT,FRB     (Rc=0)
    frim.    FRT,FRB     (Rc=1)

    Field    63    FRT    ///    FRB    488    Rc
    Bit      0     6      11     16     21     31

The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward -infinity, and the result is placed into register FRT.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE = 1.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX VXSNAN
    CR1                          (if Rc = 1)

4.6.8 Floating-Point Compare Instructions

The floating-point Compare instructions compare the contents of two floating-point registers. Comparison ignores the sign of zero (i.e., regards +0 as equal to -0). The comparison can be ordered or unordered.

The comparison sets one bit in the designated CR field to 1 and the other three to 0. The FPCC is set in the same way.

The CR field and the FPCC are set as follows.

    Bit   Name   Description
    0     FL     (FRA) < (FRB)
    1     FG     (FRA) > (FRB)
    2     FE     (FRA) = (FRB)
    3     FU     (FRA) ? (FRB) (unordered)

Floating Compare Unordered X-form

    fcmpu    BF,FRA,FRB

    Field    63    BF    //    FRA    FRB    0     /
    Bit      0     6     9     11     16     21    31

    if (FRA) is a NaN or (FRB) is a NaN then c ← 0b0001
    else if (FRA) < (FRB) then c ← 0b1000
    else if (FRA) > (FRB) then c ← 0b0100
    else c ← 0b0010
    FPCC ← c
    CR_4×BF:4×BF+3 ← c
    if (FRA) is an SNaN or (FRB) is an SNaN then
        VXSNAN ← 1

The floating-point operand in register FRA is compared to the floating-point operand in register FRB. The result of the compare is placed into CR field BF and the FPCC.

If either of the operands is a NaN, either quiet or signaling, then CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, then VXSNAN is set.

Special Registers Altered:
    CR field BF
    FPCC
    FX
    VXSNAN

Floating Compare Ordered X-form

    fcmpo    BF,FRA,FRB

    Field    63    BF    //    FRA    FRB    32    /
    Bit      0     6     9     11     16     21    31

    if (FRA) is a NaN or (FRB) is a NaN then c ← 0b0001
    else if (FRA) < (FRB) then c ← 0b1000
    else if (FRA) > (FRB) then c ← 0b0100
    else c ← 0b0010
    FPCC ← c
    CR_4×BF:4×BF+3 ← c
    if (FRA) is an SNaN or (FRB) is an SNaN then
        VXSNAN ← 1
        if VE = 0 then VXVC ← 1
    else if (FRA) is a QNaN or (FRB) is a QNaN then VXVC ← 1

The floating-point operand in register FRA is compared to the floating-point operand in register FRB. The result of the compare is placed into CR field BF and the FPCC.

If either of the operands is a NaN, either quiet or signaling, then CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, then VXSNAN is set and, if Invalid Operation is disabled (VE=0), VXVC is set. If neither operand is a Signaling NaN but at least one operand is a Quiet NaN, then VXVC is set.

Special Registers Altered:
    CR field BF
    FPCC
    FX
    VXSNAN
    VXVC

4.6.9 Floating-Point Select Instruction

Floating Select A-form

    fsel     FRT,FRA,FRC,FRB     (Rc=0)
    fsel.    FRT,FRA,FRC,FRB     (Rc=1)

    Field    63    FRT    FRA    FRB    FRC    23    Rc
    Bit      0     6      11     16     21     26    31

    if (FRA) ≥ 0.0 then FRT ← (FRC)
    else FRT ← (FRB)

The floating-point operand in register FRA is compared to the value zero. If the operand is greater than or equal to zero, register FRT is set to the contents of register FRC. If the operand is less than zero or is a NaN, register FRT is set to the contents of register FRB. The comparison ignores the sign of zero (i.e., regards +0 as equal to -0).

Special Registers Altered:
    CR1                          (if Rc=1)

Programming Note
Examples of uses of this instruction can be found in Sections F.2, "Floating-Point Conversions [Category: Floating-Point]" on page 644 and F.3, "Floating-Point Selection [Category: Floating-Point]" on page 648.

Warning: Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section F.3.4, "Notes" on page 648.
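As an informal illustration of the compare and select rules above, the following Python sketch models only the value semantics: the 4-bit condition c produced by fcmpu and fcmpo, and the choice made by fsel. The names fcmp_condition and fsel_model are hypothetical, Python floats stand in for FPR contents, and the FPSCR side effects (VXSNAN, VXVC, FX) are deliberately not modeled, since Python floats do not distinguish signaling from quiet NaNs.

```python
import math


def fcmp_condition(fra: float, frb: float) -> int:
    """Return the 4-bit condition c, with bits FL, FG, FE, FU from left to right.

    Hypothetical helper: models only the value placed in the CR field and
    the FPCC, not the exception bits that fcmpu and fcmpo may also set.
    """
    if math.isnan(fra) or math.isnan(frb):
        return 0b0001  # FU: unordered
    if fra < frb:
        return 0b1000  # FL: less than
    if fra > frb:
        return 0b0100  # FG: greater than
    return 0b0010      # FE: equal (+0 and -0 compare equal)


def fsel_model(fra: float, frc: float, frb: float) -> float:
    """Return frc if fra >= 0.0, otherwise frb.

    A NaN fails the >= test and therefore selects frb, and -0.0 >= 0.0
    is true, which matches the rule that the sign of zero is ignored.
    """
    return frc if fra >= 0.0 else frb
```

For example, fcmp_condition(1.0, 2.0) gives 0b1000 (FL), and fsel_model(float("nan"), 1.0, 2.0) gives 2.0, reflecting the rule that a NaN in FRA selects the contents of FRB.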
4.6.10 Floating-Point Status and Control Register Instructions

Every Floating-Point Status and Control Register instruction synchronizes the effects of all floating-point instructions executed by a given processor. Executing a Floating-Point Status and Control Register instruction ensures that all floating-point instructions previously initiated by the given processor have completed before the Floating-Point Status and Control Register instruction is initiated, and that no subsequent floating-point instructions are initiated by the given processor until the Floating-Point Status and Control Register instruction has completed. In particular:

- All exceptions that will be caused by the previously initiated instructions are recorded in the FPSCR before the Floating-Point Status and Control Register instruction is initiated.
- All invocations of the system floating-point enabled exception error handler that will be caused by the previously initiated instructions have occurred before the Floating-Point Status and Control Register instruction is initiated.
- No subsequent floating-point instruction that depends on or alters the settings of any FPSCR bits is initiated until the Floating-Point Status and Control Register instruction has completed.

(Floating-point Storage Access instructions are not affected.)

The instruction descriptions in this section refer to "FPSCR fields," where FPSCR field k is FPSCR bits 4×k:4×k+3.

Move From FPSCR X-form

    mffs     FRT     (Rc=0)
    mffs.    FRT     (Rc=1)

    Field    63    FRT    ///    ///    583    Rc
    Bit      0     6      11     16     21     31

The contents of the FPSCR are placed into register FRT.

Special Registers Altered:
    CR1                          (if Rc=1)

Move to Condition Register from FPSCR X-form

    mcrfs    BF,BFA

    Field    63    BF    //    BFA    //    ///    64    /
    Bit      0     6     9     11     14    16     21    31

The contents of FPSCR_32:63 field BFA are copied to Condition Register field BF. All exception bits copied are set to 0 in the FPSCR. If the FX bit is copied, it is set to 0 in the FPSCR.

Special Registers Altered:
    CR field BF
    FX OX                        (if BFA=0)
    UX ZX XX VXSNAN              (if BFA=1)
    VXISI VXIDI VXZDZ VXIMZ      (if BFA=2)
    VXVC                         (if BFA=3)
    VXSOFT VXSQRT VXCVI          (if BFA=5)

Move To FPSCR Field Immediate X-form

    mtfsfi     BF,U,W     (Rc=0)
    mtfsfi.    BF,U,W     (Rc=1)

    Field    63    BF    //    ///    W     U     /     134    Rc
    Bit      0     6     9     11     15    16    20    21     31

The value of the U field is placed into FPSCR field BF+8×(1-W).

FPSCR_FX is altered only if BF = 0 and W = 0.

Special Registers Altered:
    FPSCR field BF + 8×(1-W)
    CR1                          (if Rc=1)

Programming Note
mtfsfi serves as both a basic and an extended mnemonic. The Assembler will recognize a mtfsfi mnemonic with three operands as the basic form, and a mtfsfi mnemonic with two operands as the extended form. In the extended form the W operand is omitted and assumed to be 0.

Programming Note
When FPSCR_32:35 is specified, bits 32 (FX) and 35 (OX) are set to the values of U_0 and U_3 (i.e., even if this instruction causes OX to change from 0 to 1, FX is set from U_0 and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 33 and 34 (FEX and VX) are set according to the usual rule, given on page 107, and not from U_1:2.

Move To FPSCR Fields XFL-form

    mtfsf     FLM,FRB,L,W     (Rc=0)
    mtfsf.    FLM,FRB,L,W     (Rc=1)

    Field    63    L     FLM    W     FRB    711    Rc
    Bit      0     6     7      15    16     21     31

The FPSCR is modified as specified by the FLM, L, and W fields.

L=0
    The contents of register FRB are placed into the FPSCR under control of the W field and the field mask specified by FLM. W and the field mask identify the 4-bit fields affected. Let i be an integer in the range 0-7. If FLM_i=1 then FPSCR field k is set to the contents of the corresponding field of register FRB, where k = i+8×(1-W).

L=1
    The contents of register FRB are placed into the FPSCR.

FPSCR_FX is not altered implicitly by this instruction.

Special Registers Altered:
    FPSCR fields selected by mask, L, and W
    CR1                          (if Rc=1)

Programming Note
mtfsf serves as both a basic and an extended mnemonic. The Assembler will recognize a mtfsf mnemonic with four operands as the basic form, and a mtfsf mnemonic with two operands as the extended form. In the extended form the W and L operands are omitted and both are assumed to be 0.

Programming Note
Updating fewer than eight fields of the FPSCR may have substantially poorer performance on some implementations than updating eight fields or all of the fields.

Programming Note
If L=1 or if L=0 and FPSCR_32:35 is specified, bits 32 (FX) and 35 (OX) are set to the values of (FRB)_32 and (FRB)_35 (i.e., even if this instruction causes OX to change from 0 to 1, FX is set from (FRB)_32 and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 33 and 34 (FEX and VX) are set according to the usual rule, given on page 107, and not from (FRB)_33:34.

Move To FPSCR Bit 0 X-form

    mtfsb0     BT     (Rc=0)
    mtfsb0.    BT     (Rc=1)

    Field    63    BT    ///    ///    70    Rc
    Bit      0     6     11     16     21    31

Bit BT+32 of the FPSCR is set to 0.

Special Registers Altered:
    FPSCR bit BT+32
    CR1                          (if Rc=1)

Programming Note
Bits 33 and 34 (FEX and VX) cannot be explicitly reset.

Move To FPSCR Bit 1 X-form

    mtfsb1     BT     (Rc=0)
    mtfsb1.    BT     (Rc=1)

    Field    63    BT    ///    ///    38    Rc
    Bit      0     6     11     16     21    31

Bit BT+32 of the FPSCR is set to 1.

Special Registers Altered:
    FPSCR bits BT+32 and FX
    CR1                          (if Rc=1)

Programming Note
Bits 33 and 34 (FEX and VX) cannot be explicitly set.

Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point]

5.1 Decimal Floating-Point (DFP) Facility Overview . . . 153
5.2 DFP Register Handling . . . 154
5.2.1 DFP Usage of Floating-Point Registers . . . 154
5.3 DFP Support for Non-DFP Data Types . . . 156
5.4 DFP Number Representation . . . 157
5.4.1 DFP Data Format . . . 158
5.4.1.1 Fields Within the Data Format . . . 158
5.4.1.2 Summary of DFP Data Formats . . . 159
5.4.1.3 Preferred DPD Encoding . . . 159
5.4.2 Classes of DFP Data . . . 159
5.5 DFP Execution Model . . . 160
5.5.1 Rounding . . . 160
5.5.2 Rounding Mode Specification . . . 161
5.5.3 Formation of Final Result . . . 162
5.5.3.1 Use of Ideal Exponent . . . 162
5.5.4 Arithmetic Operations . . . 162
5.5.4.1 Sign of Arithmetic Result . . . 162
5.5.5 Compare Operations . . . 163
5.5.6 Test Operations . . . 163
5.5.7 Quantum Adjustment Operations . . . 163
5.5.8 Conversion Operations . . . 163
5.5.8.1 Data-Format Conversion . . . 163
5.5.8.2 Data-Type Conversion . . . 164
5.5.9 Format Operations . . . 164
5.5.10 DFP Exceptions . . . 164
5.5.10.1 Invalid Operation Exception . . . 166
5.5.10.2 Zero Divide Exception . . . 167
5.5.10.3 Overflow Exception . . . 167
5.5.10.4 Underflow Exception . . . 168
5.5.10.5 Inexact Exception . . . 169
5.5.11 Summary of Normal Rounding And Range Actions . . . 170
5.6 DFP Instruction Descriptions . . . 172
5.6.1 DFP Arithmetic Instructions . . . 173
5.6.2 DFP Compare Instructions . . . 177
5.6.3 DFP Test Instructions . . . 180
5.6.4 DFP Quantum Adjustment Instructions . . . 183
5.6.5 DFP Conversion Instructions . . . 192
5.6.5.1 DFP Data-Format Conversion Instructions . . . 192
5.6.5.2 DFP Data-Type Conversion Instructions . . . 195
5.6.6 DFP Format Instructions . . . 197
5.6.7 DFP Instruction Summary . . . 201
5.1 Decimal Floating-Point (DFP) Facility Overview

This chapter describes the behavior of the decimal floating-point facility, the supported data types, formats, and classes, and the usage of registers. Also included are the execution model, exceptions, and instructions supported by the decimal floating-point facility.

The decimal floating-point (DFP) facility shares the 32 floating-point registers (FPRs) and the Floating-Point Status and Control Register (FPSCR) with the floating-point (BFP) facility. However, the interpretation of data formats in the FPRs, and the meaning of some control and status bits in the FPSCR are different between the BFP and DFP facilities.

The DFP facility also shares the Condition Register (CR) with the fixed-point facility, the BFP facility, and the vector facility.

The DFP facility supports three DFP data formats: DFP Short (single precision), DFP Long (double precision), and DFP Extended (quad precision). Most operations are performed on DFP Long or DFP Extended format directly. Support for DFP Short is limited to conversion to and from DFP Long. Some DFP instructions operate on other data types, including signed or unsigned binary fixed-point data, and signed or unsigned decimal data.

DFP instructions are provided to perform arithmetic, compare, test, quantum-adjustment, conversion, and format operations on operands held in FPRs or FPR pairs.

Arithmetic instructions
    These instructions perform addition, subtraction, multiplication, and division operations.

Compare instructions
    These instructions perform a comparison operation on the numerical value of two DFP operands.

Test instructions
    These instructions test the data class, the data group, the exponent, or the number of significant digits of a DFP operand.

Quantum-adjustment instructions
    These instructions convert a DFP number to a result in the form that has the designated exponent, which may be explicitly or implicitly specified.

Conversion instructions
    These instructions perform conversion between different data formats or data types.

Format instructions
    These instructions facilitate composing or decomposing a DFP operand.

These instructions are described in Section 5.6 "DFP Instruction Descriptions" on page 172.

The three DFP data formats allow finite numbers to be represented with different precision and ranges. Special codes are also provided to represent +Infinity, -Infinity, Quiet NaN (Not-a-Number), and Signaling NaN. Operations involving infinities produce results obeying traditional mathematical conventions. NaNs have no mathematical interpretation. The encoding of NaNs provides a diagnostic information field. This diagnostic field may be used to indicate such things as the source of an uninitialized variable or the reason an invalid result was produced.

The DFP processor recognizes a set of DFP exceptions which are indicated via bits set in the FPSCR. Additionally, the DFP exception actions depend on the setting of the various exception enable bits in the FPSCR.

The following DFP exceptions are detected by the DFP processor. The exception status bits in the FPSCR are indicated in parentheses.

    Invalid Operation Exception     (VX)
        SNaN                        (VXSNAN)
        ∞ - ∞                       (VXISI)
        ∞ ÷ ∞                       (VXIDI)
        0 ÷ 0                       (VXZDZ)
        ∞ × 0                       (VXIMZ)
        Invalid Compare             (VXVC)
        Invalid conversion          (VXCVI)
    Zero Divide Exception           (ZX)
    Overflow Exception              (OX)
    Underflow Exception             (UX)
    Inexact Exception               (XX)

Each DFP exception and each category of Invalid Operation Exception has an exception status bit in the FPSCR. In addition, each of the five DFP exceptions has a corresponding enable bit in the FPSCR. These enable bits enable or disable the invocation of the system floating-point enabled exception error handler, and may affect the setting of some exception status bits in the FPSCR.

The usage of these bits by the DFP facility differs from the usage by the BFP facility. Section 5.5.10 "DFP Exceptions" on page 164 provides a detailed discussion of DFP exceptions, including the effects of the enable bits.

5.2 DFP Register Handling

The following sections describe first how the floating-point registers are utilized by the DFP facility. The subsequent section covers the DFP usage of CR and FPSCR.

5.2.1 DFP Usage of Floating-Point Registers

The DFP facility shares the same 32 64-bit FPRs with the BFP facility. Like the FP instructions, DFP instructions also use 5-bit fields for designating the FPRs to hold the source or target operands.

When data in DFP Short format is held in a FPR, it occupies the rightmost 32 bits of the FPR. The Load Floating-Point as Integer Word Algebraic instruction is provided to load the rightmost 32 bits of a FPR with a single-word data from storage. The Store Floating-Point as Integer Word instruction is available to store the rightmost 32 bits of a FPR to a storage location.

Data in DFP Long format, 64-bit binary fixed-point values, or 64-bit BCD values is held in a FPR using all 64 bits. Data of 64 bits may be loaded from storage via any of the Load Floating-Point Double instructions and stored via any of the Store Floating-Point Double instructions.

Data in DFP Extended format or 128-bit BCD values is held in an even-odd FPR pair using all 128 bits. Data of 128 bits must be loaded into the desired even-odd pair of floating-point registers using an appropriate sequence of the Load Floating-Point Double instructions and stored using an appropriate sequence of the Store Floating-Point Double instructions.

Data used as a source operand by any Decimal Floating-Point instruction that was produced, either directly or indirectly, by a Load Floating-Point Single instruction, a Floating Round to Single-Precision instruction, or a binary floating-point single-precision arithmetic instruction is boundedly undefined.

When an even-odd FPR pair is used to hold a 128-bit operand, the even-numbered FPR is used to hold the leftmost doubleword of the operand and the next higher-numbered FPR is used to hold the rightmost doubleword. A DFP instruction designating an odd-numbered FPR for a 128-bit operand is an invalid instruction form.

Programming Note
The Floating-Point Move instructions can be used to move operands between FPRs.

The bit definitions for the FPSCR are as follows.

Bit(s)  Description

0:28    Reserved

29:31   DFP Rounding Control (DRN)
        See Section 5.5.2, "Rounding Mode Specification" on page 161.
            000  Round to Nearest, Ties to Even
            001  Round toward Zero
            010  Round toward +Infinity
            011  Round toward -Infinity
            100  Round to Nearest, Ties away from 0
            101  Round to Nearest, Ties toward 0
            110  Round to away from Zero
            111  Round to Prepare for Shorter Precision

        Programming Note
        FPSCR_28 is reserved for extension of the DRN field, therefore DRN may be set using the mtfsfi instruction to set the rounding mode.

32      Floating-Point Exception Summary (FX)
        Every floating-point instruction, except mtfsfi and mtfsf, implicitly sets FPSCR_FX to 1 if that instruction causes any of the floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter FPSCR_FX explicitly.

33      Floating-Point Enabled Exception Summary (FEX)
        This bit is the OR of all the floating-point exception bits masked by their respective enable bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR_FEX explicitly.

34      Floating-Point Invalid Operation Exception Summary (VX)
        This bit is the OR of all the Invalid Operation exception bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR_VX explicitly.

35      Floating-Point Overflow Exception (OX)
        See Section 5.5.10.3, "Overflow Exception" on page 167.

36      Floating-Point Underflow Exception (UX)
        See Section 5.5.10.4, "Underflow Exception" on page 168.

37      Floating-Point Zero Divide Exception (ZX)
        See Section 5.5.10.2, "Zero Divide Exception" on page 167.

38      Floating-Point Inexact Exception (XX)
        See Section 5.5.10.5, "Inexact Exception" on page 169.
        FPSCR_XX is a sticky version of FPSCR_FI (see below). Thus the following rules completely describe how FPSCR_XX is set by a given instruction.
        - If the instruction affects FPSCR_FI, the new value of FPSCR_XX is obtained by ORing the old value of FPSCR_XX with the new value of FPSCR_FI.
        - If the instruction does not affect FPSCR_FI, the value of FPSCR_XX is unchanged.

39      Floating-Point Invalid Operation Exception (SNaN) (VXSNAN)
        See Section 5.5.10.1, "Invalid Operation Exception" on page 166.

40      Floating-Point Invalid Operation Exception (∞ - ∞) (VXISI)
        See Section 5.5.10.1.

41      Floating-Point Invalid Operation Exception (∞ ÷ ∞) (VXIDI)
        See Section 5.5.10.1.

42      Floating-Point Invalid Operation Exception (0 ÷ 0) (VXZDZ)
        See Section 5.5.10.1.

43      Floating-Point Invalid Operation Exception (∞ × 0) (VXIMZ)
        See Section 5.5.10.1.

44      Floating-Point Invalid Operation Exception (Invalid Compare) (VXVC)
        See Section 5.5.10.1.

45      Floating-Point Fraction Rounded (FR)
        The last Arithmetic or Rounding and Conversion instruction incremented the fraction during rounding. See Section 5.5.1, "Rounding" on page 160. This bit is not sticky.

46      Floating-Point Fraction Inexact (FI)
        The last Arithmetic or Rounding and Conversion instruction either produced an inexact result during rounding or caused a disabled Overflow Exception. See Section 5.5.1. This bit is not sticky.
        See the definition of FPSCR_XX, above, regarding the relationship between FPSCR_FI and FPSCR_XX.

47:51   Floating-Point Result Flags (FPRF)
        This field is set as described below. For arithmetic, rounding, and conversion instructions, the field is set based on the result placed into the target register, except that if any portion of the result is undefined then the value placed into FPRF is undefined.

47      Floating-Point Result Class Descriptor (C)
        Arithmetic, rounding, and conversion instructions may set this bit with the FPCC bits, to indicate the class of the result as shown in Figure 61 on page 156.

48:51   Floating-Point Condition Code (FPCC)
        Floating-point Compare and DFP Test instructions set one of the FPCC bits to 1 and the other three FPCC bits to 0. Arithmetic, rounding, and conversion instructions may set the FPCC bits with the C bit, to indicate the class of the result as shown in Figure 61 on page 156. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero.

48      Floating-Point Less Than or Negative (FL or <)

49      Floating-Point Greater Than or Positive (FG or >)

50      Floating-Point Equal or Zero (FE or =)

51      Floating-Point Unordered or NaN (FU or ?)

52      Reserved

53      Floating-Point Invalid Operation Exception (Software Request) (VXSOFT)
        This bit can be altered only by mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1. See Section 5.5.10.1, "Invalid Operation Exception" on page 166.

54      Neither used nor changed by DFP.

        Programming Note
        Although the architecture does not provide a DFP square root instruction, if software simulates such an instruction, it should set bit 54 whenever the source operand of the square root function is invalid.

55      Floating-Point Invalid Operation Exception (Invalid Conversion) (VXCVI)
        See Section 5.5.10.1.

56      Floating-Point Invalid Operation Exception Enable (VE)
        See Section 5.5.10.1.

57      Floating-Point Overflow Exception Enable (OE)
        See Section 5.5.10.3, "Overflow Exception" on page 167.

58      Floating-Point Underflow Exception Enable (UE)
        See Section 5.5.10.4, "Underflow Exception" on page 168.

59      Floating-Point Zero Divide Exception Enable (ZE)
        See Section 5.5.10.2, "Zero Divide Exception" on page 167.

60      Floating-Point Inexact Exception Enable (XE)
        See Section 5.5.10.5, "Inexact Exception" on page 169.

61      Reserved (not used by DFP)

62:63   Binary Floating-Point Rounding Control (RN)
        See Section 5.5.1, "Rounding" on page 160.
            00  Round to Nearest
            01  Round toward Zero
            10  Round toward +Infinity
            11  Round toward -Infinity

    Result Flags
    C  <  >  =  ?     Result Value Class
    0  0  0  0  1     Signaling NaN (DFP only)
    1  0  0  0  1     Quiet NaN
    0  1  0  0  1     - Infinity
    0  1  0  0  0     - Normal Number
    1  1  0  0  0     - Subnormal Number
    1  0  0  1  0     - Zero
    0  0  0  1  0     + Zero
    1  0  1  0  0     + Subnormal Number
    0  0  1  0  0     + Normal Number
    0  0  1  0  1     + Infinity

Figure 61. Floating-Point Result Flags

5.3 DFP Support for Non-DFP Data Types

In addition to the DFP data types, the DFP processor provides limited support for the following non-DFP data types: signed or unsigned binary fixed-point data, and signed or unsigned decimal data.

In unsigned binary fixed-point data, all bits are used to express the absolute value of the number. For signed binary fixed-point data, the leftmost bit represents the sign, which is followed by the numeric field. Positive numbers are represented in true binary notation with the sign bit set to zero. When the value is zero, all bits are zeros, including the sign bit. Negative numbers are represented in two's complement binary notation with a one in the sign-bit position.

For decimal data, each byte contains a pair of four-bit nibbles; each four-bit nibble contains a binary-coded-decimal (BCD) code. There are two kinds of BCD codes: digit code and sign code. For unsigned decimal data, all nibbles contain a digit code (D) as shown in Figure 62.

    D  D  D  D  ...  D  D  D  D

Figure 62. Format for Unsigned Decimal Data

For signed decimal data, the rightmost nibble contains a sign code (S) and all other nibbles contain a digit code as shown in Figure 63.

    D  D  D  D  ...  D  D  D  S

Figure 63. Format for Signed Decimal Data

The decimal digits 0-9 have the binary encoding 0000-1001. The preferred plus-sign codes are 1100 and 1111. The preferred minus sign code is 1101. These are the sign codes generated for the results of the Decode DPD To BCD instruction. A selection is provided by this instruction to specify which of the two preferred plus sign codes is to be generated. Alternate sign codes are also recognized as valid in the sign position: 1010 and 1110 are alternate sign codes for plus, and 1011 is an alternate sign code for minus. Alternate sign codes are accepted for any source operand, but are not generated as a result by the instruction. When an invalid digit or sign code is detected by the Encode BCD To DPD instruction, an invalid-operation exception occurs. A summary of digit and sign codes is provided in Figure 64.

    Binary     Recognized As
    Code       Digit      Sign
    0000       0          Invalid
    0001       1          Invalid
    0010       2          Invalid
    0011       3          Invalid
    0100       4          Invalid
    0101       5          Invalid
    0110       6          Invalid
    0111       7          Invalid
    1000       8          Invalid
    1001       9          Invalid
    1010       Invalid    Plus
    1011       Invalid    Minus
    1100       Invalid    Plus (preferred; option 1)
    1101       Invalid    Minus (preferred)
    1110       Invalid    Plus
    1111       Invalid    Plus (preferred; option 2)

Figure 64. Summary of BCD Digit and Sign Codes

5.4 DFP Number Representation

A DFP finite number consists of three components: a sign bit, a signed exponent, and a significand. The signed exponent is a signed binary integer. The significand consists of a number of decimal digits, which are to the left of the implied decimal point.
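The three-component view of a finite number can be illustrated, by analogy only, with Python's decimal module, whose Decimal.as_tuple() method exposes a sign, a tuple of decimal digits, and an integer exponent. The sketch assumes that Python's software decimal type is an acceptable stand-in for the abstract model; it says nothing about how the DFP formats encode these components in storage, and the helper name components is hypothetical.

```python
from decimal import Decimal


def components(text: str) -> tuple[int, int, int]:
    """Split a finite decimal literal into (sign, significand, exponent).

    Hypothetical helper for illustration. The significand is returned as an
    integer, so all of its digits lie to the left of the implied decimal
    point. Only finite values are handled; NaNs and infinities report a
    non-integer exponent marker in as_tuple() and are out of scope here.
    """
    sign, digits, exponent = Decimal(text).as_tuple()
    significand = int("".join(str(d) for d in digits))
    return sign, significand, exponent
```

For instance, components("-12.50") yields (1, 1250, -2): a sign of 1 (minus), the integer significand 1250, and a signed exponent of -2.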
The rightmost digit of the significand is called the units digit. The numerical value of a DFP finite number is represented as (-1)sign % significand % 10exponent and the unit value of this number is (1 % 10exponent), which is called the quantum. DFP finite numbers are not normalized. This allows leading zeros and trailing zeros to exist in the signifi- cand. This unnormalized DFP number representation allows some values to have redundant forms; each form represents the DFP number with a different com- bination of the significand value and the exponent value. For example, 1000000 % 105 and 10 % 1010 are two different forms of the same numerical value. A form of this number representation carries information about both the numerical value and the quantum of a DFP finite number. The significant digits of a DFP finite number are the digits in the significand beginning with the leftmost non- zero digit and ending with the units digit. Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point] 157 Version 2.06 5.4.1 DFP Data Format for denoting the value as either a Not-a-Number or an Infinity. DFP numbers and NaNs may be represented in FPRs The first 5 bits of the combination field contain the in any of the three data formats: DFP Short, DFP Long, encoding of NaN or infinity, or the two leftmost bits of or DFP Extended. The contents of each data format the biased exponent and the leftmost digit (LMD) of the represent encoded information. Special codes are significand. The following tables show the encoding: assigned to NaNs and infinities. Different formats sup- port different sizes in both significand and exponent. Arithmetic, compare, test, quantum-adjustment, and G0:4 Description format instructions are provided for DFP Long and DFP 11111 NaN Extended formats only. 11110 Infinity The sign is encoded as a one bit binary value. Signifi- cand is encoded as an unsigned decimal integer in two All others Finite Number (see Figure 69) distinct parts. 
The leftmost digit (LMD) of the signifi- Figure 68. Encoding of the G field for Special cand is encoded as part of the combination field; the Symbols remaining digits of the significand are encoded in the trailing significand field. The exponent is contained in Leftmost 2-bits of biased exponent the combination field in two parts. However, prior to LMD encoding, the exponent is converted to an unsigned 00 01 10 binary value called the biased exponent by adding a 0 00000 01000 10000 bias value which is a constant for each format. The two 1 00001 01001 10001 leftmost bits of the biased exponent are encoded with 2 00010 01010 10010 the leftmost digit of the significand in the leftmost bits of the combination field. The rest of the biased exponent 3 00011 01011 10011 occupies the remaining portion of the combination field. 4 00100 01100 10100 5 00101 01101 10101 5.4.1.1 Fields Within the Data Format 6 00110 01110 10110 The DFP data representation comprises three fields, as 7 00111 01111 10111 diagrammed below for each of the three formats: 8 11000 11010 11100 9 11001 11011 11101 S G T Figure 69. Encoding of bits 0:4 of the G field for 0 1 12 31 Finite Numbers Figure 65. DFP Short format For DFP finite numbers, the rightmost N-5 bits of the N-bit combination field contain the remaining bits of the biased exponent. For NaNs, bit 5 of the combination S G T field is used to distinguish a Quiet NaN from a Signal- 0 1 14 63 ing NaN; the remaining bits in a source operand are Figure 66. DFP Long format ignored and they are set to zeros in a target operand by most operations. For infinities, the rightmost N-5 bits of the N-bit combination field of a source operand are S G T ignored and they are set to zeros in a target operand by 0 1 18 63 most operations. T (continued) Trailing Significand field (T) 64 127 For DFP finite numbers, this field contains the remain- Figure 67. DFP Extended format ing significand digits. 
For NaNs, this field may be used to contain diagnostic information. For infinities, con- The fields are defined as follows: tents in this field of a source operand are ignored and they are set to zeros in a target operand by most oper- Sign bit (S) ations. The trailing significand field is a multiple of The sign bit is in bit 0 of each format, and is zero for 10-bit blocks. The multiple depends on the format. plus and one for minus. Each 10-bit block is called a declet and represents Combination field (G) three decimal digits, using the Densely Packed Deci- As the name implies, this field provides a combination mal (DPD) encoding defined in Appendix B. of the exponent and the left-most digit (LMD) of the sig- nificand, for finite numbers, or provides a special code 158 Power ISATM Book I Version 2.06 5.4.1.2 Summary of DFP Data Formats The properties of the three DFP formats are summa- rized in the following table:. Format DFP Short DFP Long DFP Extended Widths (bits): Format 32 64 128 Sign (S) 1 1 1 Combination (G) 11 13 17 Trailing Significand (T) 20 50 110 Exponent: Maximum biased 191 767 12,287 Maximum (Xmax) 90 369 6111 Minimum (Xmin) -101 -398 -6176 Bias 101 398 6176 Precision (p) (digits) 7 16 34 Magnitude: Maximum normal number (Nmax) (107 - 1) x 1090 (1016 - 1) x 10369 (1034 - 1) x 106111 -95 -383 Minimum normal number (Nmin) 1 x 10 1 x 10 1 x 10-6143 Minimum subnormal number (Dmin) 1 x 10-101 1 x 10-398 1 x 10-6176 Figure 70. Summary of DFP Formats 5.4.1.3 Preferred DPD Encoding 5.4.2 Classes of DFP Data Execution of DFP instructions decodes source oper- There are six classes of DFP data, which include ands from DFP data formats to an internal format for numerical and nonnumeric entities. The numerical enti- processing, and encodes the operation result before ties include zero, subnormal number, normal number, the final result is returned as the target operand. and infinity data classes. 
The nonnumeric entities include quiet and signaling NaNs data classes. The value of a DFP finite number, including zero, subnormal number, and normal number, is a quantization of the real number based on the data format. The Test Data Class instruction may be used to determine the class of a DFP operand. In general, an operation that returns a DFP result sets the FPSCR_FPRF field to indicate the data class of the result.

As part of the decoding process, declets in the trailing significand field of source operands are decoded to their corresponding BCD digit codes using the DPD-to-BCD decoding algorithm. As part of the encoding process, BCD digit codes to be stored into the trailing significand field of the target operand are encoded into declets using the BCD-to-DPD encoding algorithm. Both the decoding and encoding algorithms are defined in Appendix B.

As explained in Appendix B, there are eight 3-digit decimal values that have redundant DPD codes and one preferred DPD code. All redundant DPD codes are recognized in source operands for the associated 3-digit decimal number. DFP operations will always generate the preferred DPD codes for the trailing significand field of the target operand.

The following tables show the value ranges for finite-number data classes, and the codes for NaNs and infinities.

Data Class    Sign    Magnitude
Zero          ±       0*
Subnormal     ±       Dmin ≤ |X| < Nmin
Normal        ±       Nmin ≤ |Y| ≤ Nmax

* The significand is zero and the exponent is any representable value.

Figure 71. Value Ranges for Finite Number Data Classes

Data Class       S    G                      T
+Infinity        0    11110xxx . . . xxx     xxx . . . xxx
-Infinity        1    11110xxx . . . xxx     xxx . . . xxx
Quiet NaN        x    111110xx . . . xxx     xxx . . . xxx
Signaling NaN    x    111111xx . . . xxx     xxx . . . xxx

x  Don't care

Figure 72. Encoding of NaN and Infinity Data Classes

Zeros
Zeros have a zero significand and any representable value in the exponent. A +0 is distinct from -0, and zeros with different exponents are distinct, except that comparison treats them as equal.

Subnormal Numbers
Subnormal numbers have values that are smaller than Nmin and greater than zero in magnitude.

Normal Numbers
Normal numbers are nonzero finite numbers whose magnitude is between Nmin and Nmax inclusively.

Infinities
Infinities are represented by 0b11110 in the leftmost 5 bits of the combination field. When an operation is defined to generate an infinity as the result, a default infinity is sometimes supplied. A default infinity has all remaining bits in the combination field and trailing significand field set to zeros.

When infinities are used as source operands, only the leftmost 5 bits of the combination field are interpreted (i.e., 0b11110 indicates the value is an infinity). The trailing significand field of infinities is usually ignored. For generated infinities, the leftmost 5 bits of the combination field are set to 0b11110 and all remaining combination bits are set to zero.

Infinities can participate in most arithmetic operations and give a consistent result. In comparisons, any +Infinity compares greater than any finite number, and any -Infinity compares less than any finite number. All +Infinity are compared equal and all -Infinity are compared equal.

Signaling and Quiet NaNs
There are two types of Not-a-Numbers (NaNs), Signaling (SNaN) and Quiet (QNaN).

0b111110 in the leftmost 6 bits of the combination field indicates a Quiet NaN, whereas 0b111111 indicates a Signaling NaN.

A special QNaN is sometimes supplied as the default QNaN for a disabled invalid-operation exception; it has a plus sign, the leftmost 6 bits of the combination field set to 0b111110, and the remaining bits in the combination field and the trailing significand field set to zero.

Normally, source QNaNs are propagated during operations so that they will remain visible at the end. When a QNaN is propagated, the sign is preserved, the decimal value of the trailing significand field is preserved but reencoded using the preferred DPD codes, and the contents of the rightmost N-6 bits of the combination field are set to zero, where N is the width of the combination field for the format.

A source SNaN generally causes an invalid-operation exception. If the exception is disabled, the SNaN is converted to the corresponding QNaN and propagated. The primary encoding difference between an SNaN and a QNaN is that bit 5 of an SNaN is 1 and bit 5 of a QNaN is 0. When an SNaN is propagated as a QNaN, bit 5 is set to 0, and, just as with QNaN propagation, the sign is preserved, the decimal value of the trailing significand field is preserved but reencoded using the preferred DPD codes, and the contents of the rightmost N-6 bits of the combination field are set to zero, where N is the width of the combination field for the format. For some format-conversion instructions, a source SNaN does not cause an invalid-operation exception, and an SNaN is returned as the target operand.

For instructions with two source NaNs, when a NaN is to be propagated as the result, the following rules apply.
- If there is a QNaN in FRA and an SNaN in FRB, the SNaN in FRB is propagated.
- Otherwise, the NaN in FRA is propagated.

5.5 DFP Execution Model

DFP operations are performed as if they first produce an intermediate result correct to infinite precision and with unbounded range. The intermediate result is then rounded to the destination's precision according to one of the eight DFP rounding modes. If the rounded result has only one form, it is delivered as the final result; if the rounded result has redundant forms, then an ideal exponent is used to select the form of the final result. The ideal exponent determines the form, not the value, of the final result. (See Section 5.5.3, "Formation of Final Result".)

5.5.1 Rounding

Rounding takes a number regarded as infinitely precise and, if necessary, modifies it to fit the destination's precision. The destination's precision of an operation defines the set of permissible resultant values. For most operations, the destination's precision is the target-format precision and the permissible resultant values are those values representable in the target format. For some special operations, the destination precision is constrained by both the target format and some additional restrictions, and the permissible resultant values are a subset of the values representable in the target format.

Rounding sets FPSCR bits FR and FI. When an inexact exception occurs, FI is set to one; otherwise, FI is set to zero. When an inexact exception occurs and the rounded result is greater in magnitude than the intermediate result, FR is set to one; otherwise, FR is set to zero. The exception is the Round to FP Integer Without Inexact instruction, which always sets FR and FI to zero. Rounding may cause an overflow exception or underflow exception; it may also cause an inexact exception.

Refer to Figure 73 below for rounding. Let Z be the intermediate result of a DFP operation. Z may or may not fit in the destination's precision. If Z is exactly one of the permissible representable resultant values, then the final result in all rounding modes is Z. Otherwise, either Z1 or Z2 is chosen to approximate the result, where Z1 and Z2 are the next larger and smaller permissible resultant values, respectively.

[Figure: number line through 0 showing the infinitely precise value Z lying between Z2 and Z1, once among the negative values and once among the positive values, with directions labeled "By increasing |Z|" and "By decreasing |Z|".]

Figure 73. Rounding

Round to Nearest, Ties to Even
Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one whose units digit would have been even in the form with the largest common quantum of the two permissible resultant values. However, an infinitely precise result with magnitude at least (Nmax + 0.5Q(Nmax)) is rounded to infinity with no change in sign; where Q(Nmax) is the quantum of Nmax.

of a tie, choose the larger in magnitude (Z1 or Z2). However, an infinitely precise result with magnitude at least (Nmax + 0.5Q(Nmax)) is rounded to infinity with no change in sign; where Q(Nmax) is the quantum of Nmax.

Round to Nearest, Ties toward 0
Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the smaller in magnitude (Z1 or Z2). However, an infinitely precise result with magnitude greater than (Nmax + 0.5Q(Nmax)) is rounded to infinity with no change in sign; where Q(Nmax) is the quantum of Nmax.

Round away from 0
Choose the larger in magnitude (Z1 or Z2).

Round to prepare for shorter precision
Choose the smaller in magnitude (Z1 or Z2). If the selected value is inexact and the units digit of the selected value is either 0 or 5, then the digit is incremented by one and the incremented result is delivered. In all other cases, the selected value is delivered. When a value has redundant forms, the units digit is determined by using the form that has the smallest exponent.

5.5.2 Rounding Mode Specification

Unless otherwise specified in the instruction definition, the rounding mode used by an operation is specified in the DFP rounding control (DRN) field of the FPSCR. The eight DFP rounding modes are encoded in the DRN field as specified in the table below.

DRN    Rounding Mode
000    Round to Nearest, Ties to Even
001    Round toward 0
010    Round toward +Infinity
011    Round toward -Infinity
100    Round to Nearest, Ties away from 0
101    Round to Nearest, Ties toward 0
110    Round away from 0
111    Round to Prepare for Shorter Precision

Figure 74. Encoding of DFP Rounding-Mode Control (DRN)

For the quantum-adjustment instructions, a 2-bit immediate field in the instruction, called RMC (Rounding Mode Control), specifies the rounding mode used. The RMC field may contain a primary encoding or a secondary encoding.
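The eight DRN encodings of Figure 74 can be tried out in software. The sketch below is an illustration, not the architecture's definition: it assumes that the eight rounding constants of Python's standard `decimal` module correspond one-to-one to the DFP rounding modes (in particular, that `ROUND_05UP` matches Round to Prepare for Shorter Precision). The names `DRN_TO_PYTHON` and `round_with_drn` are hypothetical.

```python
from decimal import (
    Context, Decimal,
    ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR,
    ROUND_HALF_UP, ROUND_HALF_DOWN, ROUND_UP, ROUND_05UP,
)

# Assumed correspondence between DRN encodings (Figure 74) and the
# rounding constants of Python's decimal module.
DRN_TO_PYTHON = {
    0b000: ROUND_HALF_EVEN,  # Round to Nearest, Ties to Even
    0b001: ROUND_DOWN,       # Round toward 0
    0b010: ROUND_CEILING,    # Round toward +Infinity
    0b011: ROUND_FLOOR,      # Round toward -Infinity
    0b100: ROUND_HALF_UP,    # Round to Nearest, Ties away from 0
    0b101: ROUND_HALF_DOWN,  # Round to Nearest, Ties toward 0
    0b110: ROUND_UP,         # Round away from 0
    0b111: ROUND_05UP,       # Round to Prepare for Shorter Precision
}


def round_with_drn(value: str, digits: int, drn: int) -> Decimal:
    """Round `value` to `digits` significant digits under the given DRN code."""
    ctx = Context(prec=digits, rounding=DRN_TO_PYTHON[drn])
    return ctx.create_decimal(value)


for drn in range(8):
    print(format(drn, "03b"),
          round_with_drn("1.25", 2, drn),
          round_with_drn("-1.25", 2, drn))
```

The tie value 1.25 separates the three round-to-nearest modes, and its negation separates the two directed modes. Rounding 1.01 to two digits under DRN 111 yields 1.1, because the truncated units digit is 0 and the value is inexact.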
Round toward 0
Choose the smaller in magnitude (Z1 or Z2).

Round toward +∞
Choose Z1.

Round toward -∞
Choose Z2.

Round to Nearest, Ties away from 0
Choose the value that is closer to Z (Z1 or Z2). In case

For Quantize, Quantize Immediate, and Reround, the RMC field contains the primary encoding. For Round to FP Integer the field contains either encoding, depending on the setting of a RMC-encoding-selection bit. The following tables define the primary encoding and the secondary encoding.

Primary RMC    Rounding Mode
00             Round to nearest, ties to even
01             Round toward 0
10             Round to nearest, ties away from 0
11             Round according to FPSCR_DRN

Figure 75. Primary Encoding of Rounding-Mode Control

Secondary RMC    Rounding Mode
00               Round to +∞
01               Round to -∞
10               Round away from 0
11               Round to nearest, ties toward 0

Figure 76. Secondary Encoding of Rounding-Mode Control

5.5.3 Formation of Final Result

An ideal exponent is defined for each DFP instruction that returns a DFP data operand.

5.5.3.1 Use of Ideal Exponent

For all DFP operations,
- if the rounded intermediate result has only one form, then that form is delivered as the final result.
- if the rounded intermediate result has redundant forms and is exact, then the form with the exponent closest to the ideal exponent is delivered.
- if the rounded intermediate result has redundant forms and is inexact, then the form with the smallest exponent is delivered.

The following table specifies the ideal exponent for each instruction.

Operations                 Ideal Exponent
Add                        min(E(FRA), E(FRB))
Subtract                   min(E(FRA), E(FRB))
Multiply                   E(FRA) + E(FRB)
Divide                     E(FRA) - E(FRB)
Quantize-Immediate         See Instruction Description
Quantize                   E(FRA)
Reround                    See Instruction Description
Round to FP Integer        max(0, E(FRA))
Convert to DFP Long        E(FRA)
Convert to DFP Extended    E(FRA)
Round to DFP Short         E(FRA)
Round to DFP Long          E(FRA)
Convert from Fixed         0
Encode BCD to DPD          0
Insert Biased Exponent     E(FRA)

Notes: E(x) - exponent of the DFP operand in register x.

Figure 77. Summary of Ideal Exponents

5.5.4 Arithmetic Operations

Four arithmetic operations are provided: Add, Subtract, Multiply, and Divide.

5.5.4.1 Sign of Arithmetic Result

The following rules govern the sign of an arithmetic operation when the operation does not yield an exception. They apply even when the operands or results are zeros or infinities.
- The sign of the result of an add operation is the sign of the source operand having the larger absolute value. If both source operands have the same sign, the sign of the result of an add operation is the same as the sign of the source operands. When the sum of two operands with opposite signs is exactly zero, the sign of the result is positive in all rounding modes except Round toward -∞, in which case the sign is negative.
- The sign of the result of the subtract operation x - y is the same as the sign of the result of the add operation x + (-y).
- The sign of the result of a multiply or divide operation is the exclusive-OR of the signs of the source operands.

the sign and significand of operands. Infinities compare equal, and NaNs compare equal. The test result is indicated in the FPSCR_FPCC field and CR field BF.

The Test Significance instruction compares the number of significant digits of one source operand with the referenced number of significant digits in another source operand. The test result is indicated in the FPSCR_FPCC field and CR field BF.

Execution of a test instruction does not cause any DFP exception.

5.5.7 Quantum Adjustment Operations

Four kinds of quantum-adjustment operations are provided: Quantize, Quantize Immediate, Reround, and Round To FP Integer. Each of them has an immediate field which specifies whether the rounding mode in FPSCR or a different one is to be used.

The Quantize instruction is used to adjust a DFP number to the form that has the specified target exponent. The Quantize Immediate instruction is similar to the Quantize instruction, except that the target exponent is specified in a 5-bit immediate field as a signed binary integer and has a limited range.

The Reround instruction is used to simulate a DFP operation of a precision other than that of DFP Long or DFP Extended. For the Reround instruction to produce a result which accurately reflects that which would have resulted from a DFP operation of the desired precision d in the range {1: 33} inclusively, the following conditions must be met:

5.5.5 Compare Operations

Two sets of instructions are provided for comparing numerical values: Compare Ordered and Compare Unordered. In the absence of NaNs, these instructions work the same. These instructions work differently when either of the following is true:
1. At least one source operand of the instruction is an SNaN and the invalid-operation exception is disabled.
2. When there is no SNaN in any source operand, at least one source operand of the instruction is a QNaN.

In case 1, Compare Unordered recognizes an invalid-operation exception and sets the FPSCR_VXSNAN flag, but Compare Ordered recognizes the exception and sets both the FPSCR_VXSNAN and FPSCR_VXVC flags. In case 2, Compare Unordered does not recognize an exception, but Compare Ordered recognizes an invalid-operation exception and sets the FPSCR_VXVC flag.

For finite numbers, comparisons are performed on values, that is, all redundant forms of a DFP number are treated as equal.

Comparisons are always exact and cannot cause an inexact exception.

Comparison ignores the sign of zero, that is, +0 equals -0.

Infinities with like sign compare equal, that is, +∞ equals +∞, and -∞ equals -∞.

A NaN compares as unordered with any other operand, whether a finite number, an infinity, or another NaN, including itself.
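The comparison rules above can be exercised with Python's standard `decimal` module, which follows the same decimal arithmetic model. This is a sketch under stated assumptions: `Decimal.compare` stands in for Compare Unordered and `Decimal.compare_signal` for Compare Ordered, and Python's single `InvalidOperation` flag stands in for the separate FPSCR_VXSNAN and FPSCR_VXVC flags, which Python does not distinguish. The function names are hypothetical.

```python
from decimal import Decimal, InvalidOperation, localcontext


def _compare(a: Decimal, b: Decimal, signal_all_nans: bool):
    """Return (relation, invalid) with the invalid-operation trap disabled."""
    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = False  # model the disabled exception
        ctx.clear_flags()
        op = a.compare_signal if signal_all_nans else a.compare
        result = op(b, context=ctx)
        invalid = bool(ctx.flags[InvalidOperation])
    if result.is_nan():
        relation = "unordered"
    elif result == 0:
        relation = "equal"
    elif result < 0:
        relation = "less"
    else:
        relation = "greater"
    return relation, invalid


def compare_unordered(a: Decimal, b: Decimal):
    # Quiet NaNs do not raise the invalid flag; signaling NaNs do.
    return _compare(a, b, signal_all_nans=False)


def compare_ordered(a: Decimal, b: Decimal):
    # Any NaN operand raises the invalid flag.
    return _compare(a, b, signal_all_nans=True)


print(compare_unordered(Decimal("1.0"), Decimal("1.00")))   # redundant forms
print(compare_unordered(Decimal("0"), Decimal("-0")))       # sign of zero
print(compare_unordered(Decimal("NaN"), Decimal("NaN")))    # quiet NaN
print(compare_ordered(Decimal("NaN"), Decimal("1")))        # ordered compare
```

Redundant forms and zeros of either sign report "equal". A quiet NaN reports "unordered" in both functions, but only the ordered variant records an invalid operation.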
Execution of a compare instruction always completes, regardless of whether any DFP exception occurs or not, and whether the exception is enabled or not.

5.5.6 Test Operations

Four kinds of test operations are provided: Test Data Class, Test Data Group, Test Exponent, and Test Significance.

The Test Data Class instruction examines the contents of a source operand and determines if the operand is one of the specified data classes. The test result and the sign of the source operand are indicated in the FPSCR_FPCC field and CR field BF.

The Test Data Group instruction examines the contents of a source operand and determines if the operand is one of the specified data groups. The test result and the sign of the source operand are indicated in the FPSCR_FPCC field and CR field BF.

The Test Exponent instruction compares the exponent of the two source operands. The test operation ignores

- The precision of the preceding DFP operation must be at least one digit larger than d.
- The rounding mode used by the preceding DFP operation must be round-to-prepare-for-shorter-precision.

The Round To FP Integer instruction is used to round a DFP number to an integer value of the same format. The target exponent is implicitly specified, and is greater than or equal to zero.

5.5.8 Conversion Operations

There are two kinds of conversion operations: data-format conversion and data-type conversion.

5.5.8.1 Data-Format Conversion

The instructions Convert To DFP Long and Convert To DFP Extended convert DFP operands to wider formats; the instructions Round To DFP Short and Round To DFP Long convert DFP operands to narrower formats.

When converting a finite number to a wider format, the result is exact. When converting a finite number to a narrower format, the source operand is rounded to the target-format precision, which is specified by the instruction, not by the target register size.

When converting a finite number, the ideal exponent of the result is the source exponent.

Conversion of an infinity or NaN to a different format does not preserve the source combination field. Let N be the width of the target format's combination field.
- When the result is an infinity or a QNaN, the contents of the rightmost N-5 bits of the N-bit target combination field are set to zero.
- When the result is an SNaN, bit 5 of the target format's combination field is set to one and the rightmost N-6 bits of the N-bit target combination field are set to zero.

When converting a NaN to a wider format or when converting an infinity from DFP Short to DFP Long, digits in the source trailing significand field are reencoded using the preferred DPD codes with sufficient zeros appended on the left to form the target trailing significand field. When converting a NaN to a narrower format or when converting an infinity from DFP Long to DFP Short, the appropriate number of leftmost digits of the source trailing significand field are removed and the remaining digits of the field are reencoded using the preferred DPD codes to form the target trailing significand field.

When converting an infinity between DFP Long and DFP Extended, a default infinity with the same sign is produced.

When converting an SNaN between DFP Short and DFP Long, it is converted to an SNaN without causing an invalid-operation exception. When converting an SNaN between DFP Long and DFP Extended, the invalid-operation exception occurs; if the invalid-operation exception is disabled, the result is converted to the corresponding QNaN.

5.5.8.2 Data-Type Conversion

The instructions Convert From Fixed and Convert To Fixed are provided to convert a number between the DFP data type and the signed 64-bit binary-integer data type.

Conversion of a signed 64-bit binary integer to a DFP Extended number is always exact.

Conversion of a DFP number to a signed 64-bit binary integer results in an invalid-operation exception when the converted value does not fit into the target format, or when the source operand is an infinity or NaN. When the exception is disabled, the most positive integer is returned if the source operand is a positive number or +∞, and the most negative integer is returned if the source operand is a negative number, -∞, or NaN.

5.5.9 Format Operations

The format instructions are provided to facilitate composing or decomposing a DFP number, and consist of Encode BCD To DPD, Decode DPD To BCD, Extract Biased Exponent, Insert Biased Exponent, Shift Significand Left Immediate, and Shift Significand Right Immediate. A source operand of SNaN does not cause an invalid-operation exception, and an SNaN may be produced as the target operand.

5.5.10 DFP Exceptions

This architecture defines the following DFP exceptions:
- Invalid Operation Exception
    SNaN
    ∞ - ∞
    ∞ ÷ ∞
    0 ÷ 0
    ∞ × 0
    Invalid Compare
    Invalid Conversion
- Zero Divide Exception
- Overflow Exception
- Underflow Exception
- Inexact Exception

These exceptions may occur during execution of a DFP instruction.

Each DFP exception, and each category of the Invalid Operation Exception, has an exception status bit in the FPSCR. In addition, each DFP exception has a corresponding enable bit in the FPSCR. The exception status bit indicates occurrence of the corresponding exception. If an exception occurs, the corresponding enable bit governs the result produced by the instruction and, in conjunction with the FE0 and FE1 bits (see the discussion of FE0 and FE1 below), whether and how the system floating-point enabled exception error handler is invoked. (In general, the enabling specified by the enable bit is of invoking the system error handler, not of permitting the exception to occur. The occurrence of an exception depends only on the instruction and its source operands, not on the setting of any control bits. The only deviation from this general rule is that the occurrence of an Underflow Exception may depend on the setting of the enable bit.)

A single instruction, other than mtfsfi or mtfsf, may set more than one exception bit only in the following cases:
- Inexact Exception may be set with Overflow Exception.
- Inexact Exception may be set with Underflow Exception.
- Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Compare) for Compare Ordered instructions.
- Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Conversion) for Convert To Fixed instructions.

When an exception occurs the instruction execution may be completed or partially completed, depending on the exception and the operation.

For all instructions, except for the Compare and Test instructions, the following exceptions cause the instruction execution to be partially completed. That is, setting of CR field 1 (when Rc=1) and exception status flags is performed, but no result is stored into the target FPR or FPR pair. For Compare and Test instructions, instruction execution is always completed, regardless of whether any DFP exception occurs or not, and whether the exception is enabled or not.
- Enabled Invalid Operation
- Enabled Zero Divide

For the remaining kinds of exceptions, instruction execution is completed, a result, if specified by the instruction, is generated and stored into the target FPR or FPR pair, and appropriate status flags are set. The result may be a different value for the enabled and disabled conditions for some of these exceptions. The kinds of exceptions that deliver a result in the target FPR are the following:
- Disabled Invalid Operation
- Disabled Zero Divide
- Disabled Overflow
- Disabled Underflow
- Disabled Inexact
- Enabled Overflow
- Enabled Underflow
- Enabled Inexact

Subsequent sections define each of the DFP exceptions and specify the action that is taken when they are detected.

The IEEE standard specifies the handling of exceptional conditions in terms of "traps" and "trap handlers". In this architecture, a FPSCR exception enable bit of 1 causes generation of the result value specified in the IEEE standard for the "trap enabled" case: the expectation is that the exception will be detected by software, which will revise the result. A FPSCR exception enable bit of 0 causes generation of the "default result" value specified for the "trap disabled" (or "no trap occurs" or "trap is not implemented") case: the expectation is that the exception will not be detected by software, which will simply use the default result. The result to be delivered in each case for each exception is described in the sections below.

The IEEE default behavior when an exception occurs is to generate a default value and not to notify software. In this architecture, if the IEEE default behavior when an exception occurs is desired for all exceptions, all FPSCR exception enable bits should be set to zero and Ignore Exceptions Mode (see below) should be used. In this case the system floating-point enabled exception error handler is not invoked, even if DFP exceptions occur: software can inspect the FPSCR exception bits if necessary, to determine whether exceptions have occurred.

In this architecture, if software is to be notified that a given kind of exception has occurred, the corresponding FPSCR exception enable bit must be set to one and a mode other than Ignore Exceptions Mode must be used. In this case the system floating-point enabled exception error handler is invoked if an enabled DFP exception occurs. The system floating-point enabled exception error handler is also invoked if a Move To FPSCR instruction causes an exception bit and the corresponding enable bit both to be 1; the Move To FPSCR instruction is considered to cause the enabled exception.

The FE0 and FE1 bits control whether and how the system floating-point enabled exception error handler is invoked if an enabled DFP exception occurs. The location of these bits and the requirements for altering them are described in Book III, Power ISA Operating Environment Architecture. (The system floating-point enabled exception error handler is never invoked because of a disabled DFP exception.) The effects of the four possible settings of these bits are as follows.

FE0  FE1  Description
0    0    Ignore Exceptions Mode
          DFP exceptions do not cause the system floating-point enabled exception error handler to be invoked.
0    1    Imprecise Nonrecoverable Mode
          The system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. It may not be possible to identify the excepting instruction or the data that caused the exception. Results produced by the excepting instruction may have been used by or may have affected subsequent instructions that are executed before the error handler is invoked.
1    0    Imprecise Recoverable Mode
          The system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. Sufficient information is provided to the error handler that it can identify the excepting instruction and the operands, and correct the result. No results produced by the excepting instruction have been used by or have affected subsequent instructions that are executed before the error handler is invoked.
1    1    Precise Mode
          The system floating-point enabled exception error handler is invoked precisely at the instruction that caused the enabled exception.

In all cases, the question of whether a DFP result is stored, and what value is stored, is governed by the FPSCR exception enable bits, as described in subsequent sections, and is not affected by the value of the FE0 and FE1 bits.

In all cases in which the system floating-point enabled exception error handler is invoked, all instructions before the instruction at which the system floating-point enabled exception error handler is invoked have completed, and no instruction after the instruction at which the system floating-point enabled exception error handler is invoked has begun execution. (Recall that, for the two Imprecise modes, the instruction at which the system floating-point enabled exception error handler is invoked need not be the instruction that caused the exception.) The instruction at which the system floating-point enabled exception error handler is invoked has not been executed unless it is the excepting instruction, in which case it has been executed if the exception is not among those listed earlier in this section as suppressed.

Programming Note
In the ignore and both imprecise modes, a Floating-Point Status and Control Register instruction can be used to force any exceptions, due to instructions initiated before the Floating-Point Status and Control Register instruction, to be recorded in the FPSCR. (This forcing is superfluous for Precise Mode.)

In either of the Imprecise modes, a Floating-Point Status and Control Register instruction can be used to force any invocations of the system floating-point enabled exception error handler, due to instructions initiated before the Floating-Point Status and Control Register instruction, to occur. (This forcing has no effect in Ignore Exceptions Mode, and is superfluous for Precise Mode.)

In order to obtain the best performance across the widest range of implementations, the programmer should obey the following guidelines.
- If the IEEE default results are acceptable to the application, Ignore Exceptions Mode should be used with all FPSCR exception enable bits set to zero.
- If the IEEE default results are not acceptable to the application, Imprecise Nonrecoverable Mode should be used, or Imprecise Recoverable Mode if recoverability is needed, with FPSCR exception enable bits set to one for those exceptions for which the system floating-point enabled exception error handler is to be invoked.
- Ignore Exceptions Mode should not, in general, be used when any FPSCR exception enable bits are set to one.
- Precise Mode may degrade performance in some implementations, perhaps substantially, and therefore should be used only for debugging and other specialized applications.

5.5.10.1 Invalid Operation Exception

Definition

An Invalid Operation Exception occurs when an operand is invalid for the specified DFP operation. The invalid DFP operations are:
- Any DFP operation on a signaling NaN (SNaN), except for Test, Round To DFP Short, Convert To DFP Long, Decode DPD To BCD, Extract Biased Exponent, Insert Biased Exponent, Shift Significand Left Immediate, and Shift Significand Right Immediate
- For add or subtract operations, magnitude subtraction of infinities, (+∞) + (-∞)
- Division of infinity by infinity (∞ ÷ ∞)
- Division of zero by zero (0 ÷ 0)
- Multiplication of infinity by zero (∞ × 0)
- Ordered comparison involving a NaN (Invalid Compare)
- The Quantize operation detects that the significand associated with the specified target exponent would have more significant digits than the target-format precision
- For the Quantize operation, when one source operand specifies an infinity and the other specifies a finite number
- The Reround operation detects that the target exponent associated with the specified target significance would be greater than Xmax
- The Encode BCD To DPD operation detects an invalid BCD digit or sign code
- The Convert To Fixed operation involving a number too large in magnitude to be represented in the target format, or involving a NaN

Programming Note
In addition, an Invalid Operation Exception occurs if software explicitly requests this by executing an mtfsfi, mtfsf, or mtfsb1 instruction that sets FPSCR_VXSOFT to 1 (Software Request). The purpose of FPSCR_VXSOFT is to allow software to cause an Invalid Operation Exception for a condition that is not necessarily associated with the execution of a DFP instruction. For example, it might be set by a program that computes a square root, if the source operand is negative.

Action

The action to be taken depends on the setting of the Invalid Operation Exception Enable bit of the FPSCR.

When Invalid Operation Exception is enabled (FPSCR_VE=1) and Invalid Operation occurs, the following actions are taken:
1. One or two Invalid Operation Exceptions are set:
     FPSCR_VXSNAN (if SNaN)
     FPSCR_VXISI (if ∞ - ∞)
     FPSCR_VXIDI (if ∞ ÷ ∞)
     FPSCR_VXZDZ (if 0 ÷ 0)
     FPSCR_VXIMZ (if ∞ × 0)
     FPSCR_VXVC (if invalid compare)
     FPSCR_VXCVI (if invalid conversion)
2. If the operation is an arithmetic, quantum-adjustment, conversion, or format operation,
     the target FPR is unchanged,
     FPSCR_FR and FPSCR_FI are set to zero, and
     FPSCR_FPRF is unchanged.
3. If the operation is a compare,
     FPSCR_FR, FPSCR_FI, and FPSCR_C are unchanged, and
     FPSCR_FPCC is set to reflect unordered.

When Invalid Operation Exception is disabled (FPSCR_VE=0) and Invalid Operation occurs, the following actions are taken:
1. One or two Invalid Operation Exceptions are set:
     FPSCR_VXSNAN (if SNaN)
     FPSCR_VXISI (if ∞ - ∞)
     FPSCR_VXIDI (if ∞ ÷ ∞)
     FPSCR_VXZDZ (if 0 ÷ 0)
     FPSCR_VXIMZ (if ∞ × 0)
     FPSCR_VXVC (if invalid compare)
     FPSCR_VXCVI (if invalid conversion)
2. If the operation is an arithmetic, quantum-adjustment, Round to DFP Long, Convert to DFP Extended, or format operation,
     the target FPR is set to a Quiet NaN,
     FPSCR_FR and FPSCR_FI are set to zero, and
     FPSCR_FPRF is set to indicate the class of the result (Quiet NaN).
3. If the operation is a Convert To Fixed, the target FPR is set as follows:
     FRT is set to the most positive 64-bit binary integer if the operand in FRB is a positive number or +∞, and to the most negative 64-bit binary integer if the operand in FRB is a negative number, -∞, or NaN.
     FPSCR_FR and FPSCR_FI are set to zero.
     FPSCR_FPRF is unchanged.
4. If the operation is a compare,
     FPSCR_FR, FPSCR_FI, and FPSCR_C are unchanged, and
     FPSCR_FPCC is set to reflect unordered.

5.5.10.2 Zero Divide Exception

Definition

A Zero Divide Exception occurs when a Divide instruction is executed with a zero divisor value and a finite nonzero dividend value.

Action

The action to be taken depends on the setting of the Zero Divide Exception Enable bit of the FPSCR.

When Zero Divide Exception is enabled (FPSCR_ZE=1) and Zero Divide occurs, the following actions are taken:
1. Zero Divide Exception is set
     FPSCR_ZX ← 1
2. The target FPR is unchanged
3. FPSCR_FR and FPSCR_FI are set to zero
4. FPSCR_FPRF is unchanged

When Zero Divide Exception is disabled (FPSCR_ZE=0) and Zero Divide occurs, the following actions are taken:
1. Zero Divide Exception is set
     FPSCR_ZX ← 1
2. The target FPR is set to ±∞, where the sign is determined by the XOR of the signs of the operands
3. FPSCR_FR and FPSCR_FI are set to zero
4. FPSCR_FPRF is set to indicate the class and sign of the result (±∞)

5.5.10.3 Overflow Exception

Definition

An overflow exception occurs whenever the target format's largest finite number is exceeded in magnitude by what would have been the rounded result if the exponent range were unbounded.

Action

Except for Reround, the following describes the handling of the IEEE overflow exception condition. The Reround operation does not recognize an overflow exception condition.

The action to be taken depends on the setting of the Overflow Exception Enable bit of the FPSCR.

4. The result is placed into the target FPR
5. FPSCR_FR is set to one if the returned result is ±∞, and is set to zero if the returned result is ±Nmax
6. FPSCR_FI is set to one
7. FPSCR_FPRF is set to indicate the class and sign of the result (±∞ or ± Normal number)

When Overflow Exception is enabled (FPSCR_OE=1) and overflow occurs, the following actions are taken:
1. Overflow Exception is set
     FPSCR_OX ← 1
2. The infinitely precise result is divided by 10 raised to the power of the exponent adjustment. That is, the exponent adjustment is subtracted from the exponent. This is called the wrapped result.
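As a rough illustration of the wrapped result, the following Python sketch shifts the exponent of a decimal value by an exponent adjustment. It is a sketch under stated assumptions, not the architecture's algorithm: the function and table names are hypothetical, Python's `Decimal` merely stands in for the infinitely precise intermediate result, and the adjustment values are the ones this section gives for operations other than Round To DFP Short and Round To DFP Long (576 for DFP Long, 9216 for DFP Extended).

```python
from decimal import Decimal

# Exponent adjustments for operations other than Round To DFP Short and
# Round To DFP Long, as given in this section.
EXPONENT_ADJUSTMENT = {"DFP Long": 576, "DFP Extended": 9216}


def _shift_exponent(value: Decimal, delta: int) -> Decimal:
    """Return `value` with `delta` added to its exponent (finite values only)."""
    if not value.is_finite():
        raise ValueError("wrapping applies to finite intermediate results")
    parts = value.as_tuple()
    return Decimal(parts._replace(exponent=parts.exponent + delta))


def wrap_overflow(value: Decimal, adjustment: int) -> Decimal:
    # Enabled overflow: divide by 10**adjustment (subtract from the exponent).
    return _shift_exponent(value, -adjustment)


def wrap_underflow(value: Decimal, adjustment: int) -> Decimal:
    # Enabled underflow: multiply by 10**adjustment (add to the exponent).
    return _shift_exponent(value, adjustment)


too_large = Decimal("9.5E+400")  # exceeds Nmax of DFP Long
wrapped = wrap_overflow(too_large, EXPONENT_ADJUSTMENT["DFP Long"])
print(wrapped)                   # back inside the DFP Long exponent range
```

Only the exponent changes; the significand digits are untouched. A handler can therefore recover the original magnitude by applying the opposite adjustment.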
The exponent adjustment for all operations, except 5.5.10.4 Underflow Exception for Round To DFP Short and Round To DFP Long, is 576 for DFP Long and 9216 for DFP Extended. Definition For Round To DFP Short and Round To DFP Long, the exponent adjustment is 192 for the Except for Reround, the following describes the han- source format of DFP Long and 3072 for the dling of the IEEE underflow exception condition. The source format of DFP Extended. Reround operation does not recognize an underflow 3. The wrapped result is rounded to the target-format exception condition. precision. This is called the wrapped rounded The Underflow Exception is defined differently for the result. enabled and disabled states. However, a tininess con- 4. If the wrapped rounded result has only one form, it dition is recognized in both states when a result com- is the delivered result. If the wrapped rounded puted as though both the precision and exponent range result has redundant forms and is exact, the result were unbounded would be nonzero and less than the of the form that has the exponent closest to the target format's smallest normal number, Nmin, in magni- wrapped ideal exponent is returned. If the wrapped tude. rounded result has redundant forms and is inexact, the result of the form that has the smallest expo- Unless otherwise defined in the instruction description, nent is returned. The wrapped ideal exponent is an underflow exception occurs as follows: the result of subtracting the exponent adjustment Enabled: from the ideal exponent. When the tininess condition is recognized. 5. FPSCRFPRF is set to indicate the class and sign of the result (± Normal Number) Disabled: When the tininess condition is recognized and When Overflow Exception is disabled (FPSCROE=0) when the delivered result value differs from what and overflow occurs, the following actions are taken: would have been computed were both the preci- 1. Overflow Exception is set sion and the exponent range unbounded. 
FPSCROX 1 2. Inexact Exception is set Action FPSCRXX 1 3. The result is determined by the rounding mode The action to be taken depends on the setting of the and the sign of the intermediate result as follows. Underflow Exception Enable bit of the FPSCR. When Underflow Exception is enabled (FPSCRUE=1) Sign of inter- and underflow occurs, the following actions are taken: mediate result 1. Underflow Exception is set Rounding Mode Plus Minus FPSCRUX 1 2. The infinitely precise result is multiplied by 10. Round to Nearest, Ties to Even + - That is, the exponent adjustment is added to the Round toward 0 +Nmax -Nmax exponent. This is called the wrapped result. The exponent adjustment for all operations, except for Round toward + + -Nmax Round To DFP Short and Round To DFP Long, is Round toward - +Nmax - 576 for DFP Long and 9216 for DFP Extended. For Round to Nearest, Ties away + - Round To DFP Short and Round To DFP Long, from 0 the exponent adjustment is 192 for the source for- mat of DFP Long and 3072 for the source format of Round to Nearest, Ties toward 0 + - DFP Extended. Round away from 0 + - 3. The wrapped result is rounded to the target-format precision. This is called the wrapped rounded Round to prepare for shorter pre- +Nmax -Nmax result. cision 4. If the wrapped rounded result has only one form, it Figure 78. Overflow Results When Exception Is is the delivered result. If the wrapped rounded Disabled result has redundant forms and is exact, the result of the form that has the exponent closest to the 168 Power ISATM Book I Version 2.06 wrapped ideal exponent is returned. If the wrapped rounded result has redundant forms and is inexact, the result of the form that has the smallest expo- nent is returned. The wrapped ideal exponent is the result of adding the exponent adjustment to the ideal exponent. 5. 
FPSCRFPRF is set to indicate the class and sign of the result (± Normal number) When Underflow Exception is disabled (FPSCRUE=0) and underflow occurs, the following actions are taken: 1. Underflow Exception is set FPSCRUX 1 2. The infinitely precise result is rounded to the tar- get-format precision. 3. The rounded result is returned. If this result has redundant forms, the result of the form that is clos- est to the ideal exponent is returned. 4. FPSCRFPRF is set to indicate the class and sign of the result (± Normal number, ± Subnormal Num- ber, or ± Zero) 5.5.10.5 Inexact Exception Definition Except for Round to FP Integer Without Inexact, the fol- lowing describes the handling of the IEEE inexact exception condition. The Round to FP Integer Without Inexact does not recognize an inexact exception condi- tion. An Inexact Exception occurs when either of two condi- tions occur during rounding: 1. The delivered result differs from what would have been computed were both the precision and expo- nent range unbounded. 2. The rounded result overflows and Overflow Excep- tion is disabled. Action The action to be taken does not depend on the setting of the Inexact Exception Enable bit of the FPSCR. When Inexact Exception occurs, the following actions are taken: 1. Inexact Exception is set FPSCRXX 1 2. The rounded or overflowed result is placed into the target FPR 3. FPSCRFPRF is set to indicate the class and sign of the result Programming Note In some implementations, enabling Inexact Excep- tions may degrade performance more than does enabling other types of floating-point exception. Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point] 169 Version 2.06 5.5.11 Summary of Normal Rounding And Range Actions Figure 79 and Figure 80 summarize rounding and range actions, with the following exceptions: The Reround operation recognizes neither an underflow nor an overflow exception. 
   The Round to FP Integer Without Inexact operation does not recognize the inexact operation exception.

                                  Result (r) when Rounding Mode Is
Range of v              Case      RNE     RNTZ    RNAZ    RAFZ    RTMI    RFSP    RTPI    RTZ
v < -Nmax, q < -Nmax    Overflow  -∞¹     -∞¹     -∞¹     -∞¹     -∞¹     -Nmax   -Nmax   -Nmax
v < -Nmax, q = -Nmax    Normal    -Nmax   -Nmax   -Nmax   --      --      -Nmax   -Nmax   -Nmax
-Nmax ≤ v ≤ -Nmin       Normal    b       b       b       b       b       b       b       b
-Nmin < v ≤ -Dmin       Tiny      b*      b*      b*      b*      b*      b*      b       b
-Dmin < v < -Dmin/2     Tiny      -Dmin   -Dmin   -Dmin   -Dmin   -Dmin   -Dmin   -0      -0
v = -Dmin/2             Tiny      -0      -0      -Dmin   -Dmin   -Dmin   -Dmin   -0      -0
-Dmin/2 < v < 0         Tiny      -0      -0      -0      -Dmin   -Dmin   -Dmin   -0      -0
v = 0                   EZD       +0      +0      +0      +0      -0      +0      +0      +0
0 < v < +Dmin/2         Tiny      +0      +0      +0      +Dmin   +0      +Dmin   +Dmin   +0
v = +Dmin/2             Tiny      +0      +0      +Dmin   +Dmin   +0      +Dmin   +Dmin   +0
+Dmin/2 < v < +Dmin     Tiny      +Dmin   +Dmin   +Dmin   +Dmin   +0      +Dmin   +Dmin   +0
+Dmin ≤ v < +Nmin       Tiny      b*      b*      b*      b*      b       b*      b*      b
+Nmin ≤ v ≤ +Nmax       Normal    b       b       b       b       b       b       b       b
+Nmax < v, q = +Nmax    Normal    +Nmax   +Nmax   +Nmax   --      +Nmax   +Nmax   --      +Nmax
+Nmax < v, q > +Nmax    Overflow  +∞¹     +∞¹     +∞¹     +∞¹     +Nmax   +Nmax   +∞¹     +Nmax

Explanation:
--     This situation cannot occur.
1      The normal result r is considered to have been incremented.
*      The rounded value, in the extreme case, may be Nmin. In this case, the exception conditions are underflow, inexact, and incremented.
b      The value derived when the precise result v is rounded to the destination's precision, including both bounded precision and bounded exponent range.
q      The value derived when the precise result v is rounded to the destination's precision, but assuming an unbounded exponent range.
r      This is the returned value when neither overflow nor underflow is enabled.
v      Precise result before rounding, assuming unbounded precision and an unbounded exponent range. For data-format conversion operations, v is the source value.
Dmin   Smallest (in magnitude) representable subnormal number in the target format.
EZD    The result r of the exact-zero-difference case applies only to ADD and SUBTRACT with both source operands having opposite signs. (For ADD and SUBTRACT, when both source operands have the same sign, the sign of the zero result is the same sign as the sign of the source operands.)
Nmax   Largest (in magnitude) representable finite number in the target format.
Nmin   Smallest (in magnitude) representable normalized number in the target format.
RAFZ   Round away from 0.
RFSP   Round to Prepare for Shorter Precision.
RNAZ   Round to Nearest, Ties away from 0.
RNE    Round to Nearest, Ties to even.
RNTZ   Round to Nearest, Ties toward 0.
RTPI   Round toward +∞.
RTMI   Round toward -∞.
RTZ    Round toward 0.

Figure 79. Rounding and Range Actions (Part 1)

Column headings: "r≠v" is "Is r inexact", "|r|>|v|" is "Is r incremented", "q≠v" is "Is q inexact", and "|q|>|v|" is "Is q incremented".

Case      r≠v   OE=1  UE=1  XE=1  |r|>|v|  q≠v   |q|>|v|  Returned Results and Status Setting*
Overflow  Yes¹  No    --    No    No       --    --       T(r), OX←1, FI←1, FR←0, XX←1
Overflow  Yes¹  No    --    No    Yes      --    --       T(r), OX←1, FI←1, FR←1, XX←1
Overflow  Yes¹  No    --    Yes   No       --    --       T(r), OX←1, FI←1, FR←0, XX←1, TX
Overflow  Yes¹  No    --    Yes   Yes      --    --       T(r), OX←1, FI←1, FR←1, XX←1, TX
Overflow  Yes¹  Yes   --    --    --       No    No¹      Tw(q÷β), OX←1, FI←0, FR←0, TO
Overflow  Yes¹  Yes   --    --    --       Yes   No       Tw(q÷β), OX←1, FI←1, FR←0, XX←1, TO
Overflow  Yes¹  Yes   --    --    --       Yes   Yes      Tw(q÷β), OX←1, FI←1, FR←1, XX←1, TO
Normal    No    --    --    --    --       --    --       T(r), FI←0, FR←0
Normal    Yes   --    --    No    No       --    --       T(r), FI←1, FR←0, XX←1
Normal    Yes   --    --    No    Yes      --    --       T(r), FI←1, FR←1, XX←1
Normal    Yes   --    --    Yes   No       --    --       T(r), FI←1, FR←0, XX←1, TX
Normal    Yes   --    --    Yes   Yes      --    --       T(r), FI←1, FR←1, XX←1, TX
Tiny      No    --    No    --    --       --    --       T(r), FI←0, FR←0
Tiny      No    --    Yes   --    --       No¹   No¹      Tw(q·β), UX←1, FI←0, FR←0, TU
Tiny      Yes   --    No    No    No       --    --       T(r), UX←1, FI←1, FR←0, XX←1
Tiny      Yes   --    No    No    Yes      --    --       T(r), UX←1, FI←1, FR←1, XX←1
Tiny      Yes   --    No    Yes   No       --    --       T(r), UX←1, FI←1, FR←0, XX←1, TX
Tiny      Yes   --    No    Yes   Yes      --    --       T(r), UX←1, FI←1, FR←1, XX←1, TX
Tiny      Yes   --    Yes   --    --       No    No¹      Tw(q·β), UX←1, FI←0, FR←0, TU
Tiny      Yes   --    Yes   --    --       Yes   No       Tw(q·β), UX←1, FI←1, FR←0, XX←1, TU
Tiny      Yes   --    Yes   --    --       Yes   Yes      Tw(q·β), UX←1, FI←1, FR←1, XX←1, TU

Explanation:
--     The results do not depend on this condition.
1      This condition is true by virtue of the state of some condition to the left of this column.
*      Rounding sets only the FI and FR status flags. Setting of the OX, XX, or UX flag is part of the exception actions. They are listed here for reference.
β      Wrap adjust, which depends on the type of operation and operand format. For all operations except Round to DFP Short and Round to DFP Long, the wrap adjust depends on the target format: β = 10^α, where α is 576 for DFP Long, and 9216 for DFP Extended. For Round to DFP Short and Round to DFP Long, the wrap adjust depends on the source format: β = 10^α, where α is 192 for DFP Long and 3072 for DFP Extended.
q      The value derived when the precise result v is rounded to destination's precision, but assuming an unbounded exponent range.
r      The result as defined in Part 1 of this figure.
v      Precise result before rounding, assuming unbounded precision and unbounded exponent range.
FI     Floating-Point-Fraction-Inexact status flag, FPSCR[FI]. This status flag is non-sticky.
FR     Floating-Point-Fraction-Rounded status flag, FPSCR[FR].
OX     Floating-Point Overflow Exception status flag, FPSCR[OX].
TO     The system floating-point enabled exception error handler is invoked for the overflow exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
TU     The system floating-point enabled exception error handler is invoked for the underflow exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
TX     The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
T(x)   The value x is placed at the target operand location.
Tw(x)  The wrapped rounded result x is placed at the target operand location.
       For all operations except data format conversions, the wrapped rounded result is in the same format and length as normal results at the target location. For data format conversions, the wrapped rounded result is in the same format and length as the source, but rounded to the target-format precision.
UX     Floating-Point-Underflow-Exception status flag, FPSCR[UX].
XX     Floating-Point-Inexact-Exception status flag, FPSCR[XX]. The flag is a sticky version of FPSCR[FI]. When FPSCR[FI] is set to a new value, the new value of FPSCR[XX] is set to the result of ORing the old value of FPSCR[XX] with the new value of FPSCR[FI].

Figure 80. Rounding and Range Actions (Part 2)

5.6 DFP Instruction Descriptions

The following sections describe the DFP instructions. When a 128-bit operand is used, it is held in a FPR pair and the instruction mnemonic uses a letter "q" to mean the quad-precision operation. Note that in the following descriptions, FPXp denotes a FPR pair and must address an even-odd pair. If the FPXp field specifies an odd-numbered register, then the instruction form is invalid. The notation FPX[p] means either a FPR, FPX, or a FPR pair, FPXp.

For DFP instructions, if a DFP operand is returned, the trailing significand field of the target operand is encoded using preferred DPD codes.

5.6.1 DFP Arithmetic Instructions

The arithmetic instructions consist of Add, Divide, Multiply, and Subtract.

All DFP arithmetic instructions are X-form instructions. They all set the FI and FR status flags, and also set the FPSCR[FPRF] field. Furthermore, they all have an ideal exponent assigned and employ the record bit (Rc).

DFP Add [Quad] X-form

dadd     FRT,FRA,FRB       (Rc=0)
dadd.    FRT,FRA,FRB       (Rc=1)

   59    FRT   FRA   FRB   2     Rc
   0     6     11    16    21    31

daddq    FRTp,FRAp,FRBp    (Rc=0)
daddq.   FRTp,FRAp,FRBp    (Rc=1)

   63    FRTp  FRAp  FRBp  2     Rc
   0     6     11    16    21    31

The DFP operand in FRA[p] is added to the DFP operand in FRB[p].

The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p].

The ideal exponent is the smaller exponent of the two source operands.

Figure 81 summarizes the actions for Add. Figure 81 does not include the setting of the FPSCR[FPRF] field. The FPSCR[FPRF] field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
   FPRF FR FI
   FX OX UX XX
   VXSNAN VXISI
   CR1 (if Rc=1)

DFP Subtract [Quad] X-form

dsub     FRT,FRA,FRB       (Rc=0)
dsub.    FRT,FRA,FRB       (Rc=1)

   59    FRT   FRA   FRB   514   Rc
   0     6     11    16    21    31

dsubq    FRTp,FRAp,FRBp    (Rc=0)
dsubq.   FRTp,FRAp,FRBp    (Rc=1)

   63    FRTp  FRAp  FRBp  514   Rc
   0     6     11    16    21    31

The DFP operand in FRB[p] is subtracted from the DFP operand in FRA[p].

The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p].

The ideal exponent is the smaller exponent of the two source operands.

The execution of Subtract is identical to that of Add, except that the operand in FRB participates in the operation with its sign bit inverted. See Figure 81. The table does not include the setting of the FPSCR[FPRF] field. The FPSCR[FPRF] field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
   FPRF FR FI
   FX OX UX XX
   VXSNAN VXISI
   CR1 (if Rc=1)

Operand a      Actions for Add (a + b) when operand b in FRB[p] is
in FRA[p] is   -∞               F                +∞               QNaN   SNaN
-∞             T(-dINF)         T(-dINF)         VXISI: T(dNaN)   P(b)   VXSNAN: U(b)
F              T(-dINF)         S(a + b)         T(+dINF)         P(b)   VXSNAN: U(b)
+∞             VXISI: T(dNaN)   T(+dINF)         T(+dINF)         P(b)   VXSNAN: U(b)
QNaN           P(a)             P(a)             P(a)             P(a)   VXSNAN: U(b)
SNaN           VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)   VXSNAN: U(a)

Explanation:
a+b      The value a added to b, rounded to the target-format precision and returned in the appropriate form. (See Section 5.5.11 on page 170)
+dINF    Default plus infinity.
-dINF    Default minus infinity.
dNaN     Default quiet NaN.
F        All finite numbers, including zeros.
P(x)     The QNaN of operand x is propagated and placed in FRT[p].
S(x)     The value x is placed in FRT[p] with the sign set by the rules of algebra. When the source operands have the same sign, the sign of the result is the same as the sign of the operands, including the case when the result is zero. When the operands have opposite signs, the sign of a zero result is positive in all rounding modes, except round toward -∞, in which case, the sign is minus.
T(x)     The value x is placed in FRT[p].
U(x)     The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
VXISI    The Invalid-Operation Exception (VXISI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 166 for the exception actions.)
VXSNAN   The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 166 for the exception actions.)

Figure 81. Actions: Add

DFP Multiply [Quad] X-form

dmul     FRT,FRA,FRB       (Rc=0)
dmul.    FRT,FRA,FRB       (Rc=1)

   59    FRT   FRA   FRB   34    Rc
   0     6     11    16    21    31

dmulq    FRTp,FRAp,FRBp    (Rc=0)
dmulq.   FRTp,FRAp,FRBp    (Rc=1)

   63    FRTp  FRAp  FRBp  34    Rc
   0     6     11    16    21    31

The DFP operand in FRA[p] is multiplied by the DFP operand in FRB[p].

The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p].

The ideal exponent is the sum of the two exponents of the source operands.

Figure 82 summarizes the actions for Multiply. Figure 82 does not include the setting of the FPSCR[FPRF] field. The FPSCR[FPRF] field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
   FPRF FR FI
   FX OX UX XX
   VXSNAN VXIMZ
   CR1 (if Rc=1)

Operand a      Actions for Multiply (a*b) when operand b in FRB[p] is
in FRA[p] is   0                Fn               ∞                QNaN   SNaN
0              S(a * b)         S(a * b)         VXIMZ: T(dNaN)   P(b)   VXSNAN: U(b)
Fn             S(a * b)         S(a * b)         S(dINF)          P(b)   VXSNAN: U(b)
∞              VXIMZ: T(dNaN)   S(dINF)          S(dINF)          P(b)   VXSNAN: U(b)
QNaN           P(a)             P(a)             P(a)             P(a)   VXSNAN: U(b)
SNaN           VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)   VXSNAN: U(a)

Explanation:
a*b      The value a multiplied by b, rounded to the target-format precision and returned in the appropriate form. (See Section 5.5.11 on page 170)
dINF     Default infinity.
dNaN     Default quiet NaN.
Fn       Finite nonzero number (includes both normal and subnormal numbers).
P(x)     The QNaN of operand x is propagated and placed in FRT[p].
S(x)     The value x is placed in FRT[p] with the sign set to the exclusive-OR of the source-operand signs.
T(x)     The value x is placed in FRT[p].
U(x)     The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
VXIMZ:   The Invalid-Operation Exception (VXIMZ) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 166 for the exception actions.)
VXSNAN:  The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled.
         (See Section 5.5.10.1 "Invalid Operation Exception" on page 166 for the exception actions.)

Figure 82. Actions: Multiply

DFP Divide [Quad] X-form

ddiv     FRT,FRA,FRB       (Rc=0)
ddiv.    FRT,FRA,FRB       (Rc=1)

   59    FRT   FRA   FRB   546   Rc
   0     6     11    16    21    31

ddivq    FRTp,FRAp,FRBp    (Rc=0)
ddivq.   FRTp,FRAp,FRBp    (Rc=1)

   63    FRTp  FRAp  FRBp  546   Rc
   0     6     11    16    21    31

The DFP operand in FRA[p] is divided by the DFP operand in FRB[p].

The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p].

The ideal exponent is the difference obtained by subtracting the exponent of the divisor from the exponent of the dividend.

Figure 83 summarizes the actions for Divide. Figure 83 does not include the setting of the FPSCR[FPRF] field. The FPSCR[FPRF] field is always set to the class and sign of the result, except for enabled invalid-operation and enabled zero-divide exceptions, in which cases the field remains unchanged.

Special Registers Altered:
   FPRF FR FI
   FX OX UX ZX XX
   VXSNAN VXIDI VXZDZ
   CR1 (if Rc=1)

Operand a      Actions for Divide (a ÷ b) when operand b in FRB[p] is
in FRA[p] is   0                Fn               ∞                QNaN   SNaN
0              VXZDZ: T(dNaN)   S(a ÷ b)         S(zt)            P(b)   VXSNAN: U(b)
Fn             Zx: S(dINF)      S(a ÷ b)         S(zt)            P(b)   VXSNAN: U(b)
∞              S(dINF)          S(dINF)          VXIDI: T(dNaN)   P(b)   VXSNAN: U(b)
QNaN           P(a)             P(a)             P(a)             P(a)   VXSNAN: U(b)
SNaN           VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)   VXSNAN: U(a)

Explanation:
a ÷ b    The value a divided by b, rounded to the target-format precision and returned in the appropriate form. (See Section 5.5.11 on page 170.)
dINF     Default infinity.
dNaN     Default quiet NaN.
Fn       Finite nonzero number (includes both normal and subnormal numbers).
P(x)     The QNaN of operand x is propagated and placed in FRT[p].
S(x)     The value x is placed in FRT[p] with the sign set to the exclusive-OR of the source-operand signs.
T(x)     The value x is placed in FRT[p].
U(x)     The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
VXIDI:   The Invalid-Operation Exception (VXIDI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 166 for the exception actions.)
VXSNAN:  The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 166 for the exception actions.)
VXZDZ:   The Invalid-Operation Exception (VXZDZ) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 166 for the exception actions.)
zt       True zero (zero significand and most negative exponent).
Zx       The Zero-Divide Exception occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.2 "Zero Divide Exception" on page 167 for the exception actions.)

Figure 83. Actions: Divide

5.6.2 DFP Compare Instructions

The DFP compare instructions consist of the Compare Ordered and Compare Unordered instructions. The compare instructions do not provide the record bit.

The comparison sets the designated CR field to indicate the result. The FPSCR[FPCC] is set in the same way.

The codes in the CR field BF and FPSCR[FPCC] are defined for the DFP compare operations as follows.

Bit   Name   Description
0     FL     (FRA[p]) < (FRB[p])
1     FG     (FRA[p]) > (FRB[p])
2     FE     (FRA[p]) = (FRB[p])
3     FU     (FRA[p]) ? (FRB[p])

DFP Compare Unordered [Quad] X-form

dcmpu    BF,FRA,FRB

   59    BF    //    FRA   FRB   642   /
   0     6     9     11    16    21    31

dcmpuq   BF,FRAp,FRBp

   63    BF    //    FRAp  FRBp  642   /
   0     6     9     11    16    21    31

The DFP operand in FRA[p] is compared to the DFP operand in FRB[p]. The result of the compare is placed into CR field BF and the FPSCR[FPCC].
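As an informal illustration only (it is not part of the architecture), the compare result codes can be modeled with Python's standard decimal module. The function name dfp_compare_unordered and its (code, vxsnan) return shape are assumptions made for this sketch; the 4-bit codes follow the FL, FG, FE, FU bit assignment given in Section 5.6.2, with bit 0 as the leftmost bit of the field.

```python
from decimal import Decimal

# 4-bit CR field codes; bit 0 (FL) is the leftmost bit of the field.
FL, FG, FE, FU = 0b1000, 0b0100, 0b0010, 0b0001

def dfp_compare_unordered(a: Decimal, b: Decimal):
    """Hypothetical software model of the Compare Unordered result.

    Returns (code, vxsnan): the 4-bit code placed in CR field BF and
    FPSCR[FPCC], and whether a signaling NaN operand was seen.
    """
    vxsnan = a.is_snan() or b.is_snan()
    # is_nan() is true for both quiet and signaling NaNs.
    if a.is_nan() or b.is_nan():
        return FU, vxsnan
    if a < b:
        return FL, False
    if a > b:
        return FG, False
    # Equal in value, even when the two forms differ (1.0 and 1.00).
    return FE, False
```

For instance, under these assumptions dfp_compare_unordered(Decimal("1.0"), Decimal("1.00")) reports "equal" because the comparison is algebraic and ignores which member of the cohort is used.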
Special Registers Altered:
   CR field BF
   FPCC
   FX
   VXSNAN

Operand a in   Actions for Compare Unordered (a:b) when operand b in FRB[p] is
FRA[p] is      -∞           F            +∞           QNaN         SNaN
-∞             AeqB         AltB         AltB         AuoB         Fu, VXSNAN
F              AgtB         C(a:b)       AltB         AuoB         Fu, VXSNAN
+∞             AgtB         AgtB         AeqB         AuoB         Fu, VXSNAN
QNaN           AuoB         AuoB         AuoB         AuoB         Fu, VXSNAN
SNaN           Fu, VXSNAN   Fu, VXSNAN   Fu, VXSNAN   Fu, VXSNAN   Fu, VXSNAN

Explanation:
C(a:b)   Algebraic comparison. See the table below.
F        All finite numbers, including zeros.
AeqB     CR field BF and FPSCR[FPCC] are set to 0b0010.
AgtB     CR field BF and FPSCR[FPCC] are set to 0b0100.
AltB     CR field BF and FPSCR[FPCC] are set to 0b1000.
AuoB     CR field BF and FPSCR[FPCC] are set to 0b0001.
VXSNAN   The invalid-operation exception (VXSNAN) occurs. See Section 5.5.10.1 for actions.

Relation of Value a to Value b   Action for C(a:b)
a = b                            AeqB
a < b                            AltB
a > b                            AgtB

Figure 84. Actions: Compare Unordered

DFP Compare Ordered [Quad] X-form

dcmpo    BF,FRA,FRB

   59    BF    //    FRA   FRB   130   /
   0     6     9     11    16    21    31

dcmpoq   BF,FRAp,FRBp

   63    BF    //    FRAp  FRBp  130   /
   0     6     9     11    16    21    31

The DFP operand in FRA[p] is compared to the DFP operand in FRB[p]. The result of the compare is placed into CR field BF and the FPSCR[FPCC].

Special Registers Altered:
   CR field BF
   FPCC
   FX
   VXSNAN VXVC

Operand a in   Actions for Compare Ordered (a:b) when operand b in FRB[p] is
FRA[p] is      -∞           F            +∞           QNaN         SNaN
-∞             AeqB         AltB         AltB         AuoB, VXVC   AuoB, VXSV
F              AgtB         C(a:b)       AltB         AuoB, VXVC   AuoB, VXSV
+∞             AgtB         AgtB         AeqB         AuoB, VXVC   AuoB, VXSV
QNaN           AuoB, VXVC   AuoB, VXVC   AuoB, VXVC   AuoB, VXVC   AuoB, VXSV
SNaN           AuoB, VXSV   AuoB, VXSV   AuoB, VXSV   AuoB, VXSV   AuoB, VXSV

Explanation:
C(a:b)   Algebraic comparison. See the table below.
F        All finite numbers, including zeros.
AeqB     CR field BF and FPSCR[FPCC] are set to 0b0010.
AgtB     CR field BF and FPSCR[FPCC] are set to 0b0100.
AltB     CR field BF and FPSCR[FPCC] are set to 0b1000.
AuoB     CR field BF and FPSCR[FPCC] are set to 0b0001.
VXSV     The invalid-operation exception (VXSNAN) occurs.
         Additionally, if the exception is disabled (FPSCR[VE]=0), then FPSCR[VXVC] is also set to one. See Section 5.5.10.1 for actions.
VXVC     The invalid-operation exception (VXVC) occurs. See Section 5.5.10.1 for actions.

Relation of Value a to Value b   Action for C(a:b)
a = b                            AeqB
a < b                            AltB
a > b                            AgtB

Figure 85. Actions: Compare Ordered

5.6.3 DFP Test Instructions

The DFP test instructions consist of the Test Data Class, Test Data Group, Test Exponent, and Test Significance instructions, and they do not provide the record bit.

The test instructions set the designated CR field to indicate the result. The FPSCR[FPCC] is set in the same way.

DFP Test Data Class [Quad] Z22-form

dtstdc   BF,FRA,DCM

   59    BF    //    FRA   DCM   194   /
   0     6     9     11    16    22    31

dtstdcq  BF,FRAp,DCM

   63    BF    //    FRAp  DCM   194   /
   0     6     9     11    16    22    31

Let the DCM (Data Class Mask) field specify one or more of the 6 possible data classes, where each bit corresponds to a specific data class.

DCM Bit   Data Class
0         Zero
1         Subnormal
2         Normal
3         Infinity
4         Quiet NaN
5         Signaling NaN

CR field BF and FPSCR[FPCC] are set to indicate the sign of the DFP operand in FRA[p] and whether the data class of the DFP operand in FRA[p] matches any of the data classes specified by DCM.

Field   Meaning
0000    Operand positive with no match
0010    Operand positive with match
1000    Operand negative with no match
1010    Operand negative with match

Special Registers Altered:
   CR field BF
   FPCC

DFP Test Data Group [Quad] Z22-form

dtstdg   BF,FRA,DGM

   59    BF    //    FRA   DGM   226   /
   0     6     9     11    16    22    31

dtstdgq  BF,FRAp,DGM

   63    BF    //    FRAp  DGM   226   /
   0     6     9     11    16    22    31

Let the DGM (Data Group Mask) field specify one or more of the 6 possible data groups, where each bit corresponds to a specific data group.

The term extreme exponent means either the maximum exponent, Xmax, or the minimum exponent, Xmin.

DGM Bit   Data Group
0         Zero with non-extreme exponent
1         Zero with extreme exponent
2         Subnormal or (Normal with extreme exponent)
3         Normal with non-extreme exponent and leftmost zero digit in significand
4         Normal with non-extreme exponent and leftmost nonzero digit in significand
5         Special symbol (Infinity, QNaN, or SNaN)

CR field BF and FPSCR[FPCC] are set to indicate the sign of the DFP operand in FRA[p] and whether the data group of the DFP operand in FRA[p] matches any of the data groups specified by DGM.

Field   Meaning
0000    Operand positive with no match
0010    Operand positive with match
1000    Operand negative with no match
1010    Operand negative with match

Special Registers Altered:
   CR field BF
   FPCC

DFP Test Exponent [Quad] X-form

dtstex   BF,FRA,FRB

   59    BF    //    FRA   FRB   162   /
   0     6     9     11    16    21    31

dtstexq  BF,FRAp,FRBp

   63    BF    //    FRAp  FRBp  162   /
   0     6     9     11    16    21    31

The exponent value (Ea) of the DFP operand in FRA[p] is compared to the exponent value (Eb) of the DFP operand in FRB[p]. The result of the compare is placed into CR field BF and the FPSCR[FPCC].

The codes in the CR field BF and FPSCR[FPCC] are defined for the DFP Test Exponent operations as follows.

Bit   Description
0     Ea < Eb
1     Ea > Eb
2     Ea = Eb
3     Ea ? Eb

Special Registers Altered:
   CR field BF
   FPCC

Operand a in   Actions for Test Exponent (Ea:Eb) when operand b in FRB[p] is
FRA[p] is      F           ∞      QNaN   SNaN
F              C(Ea:Eb)    AuoB   AuoB   AuoB
∞              AuoB        AeqB   AuoB   AuoB
QNaN           AuoB        AuoB   AeqB   AeqB
SNaN           AuoB        AuoB   AeqB   AeqB

Explanation:
C(Ea:Eb)   Algebraic comparison. See the table below.
F          All finite numbers, including zeros.
AeqB       CR field BF and FPSCR[FPCC] are set to 0b0010.
AgtB       CR field BF and FPSCR[FPCC] are set to 0b0100.
AltB       CR field BF and FPSCR[FPCC] are set to 0b1000.
AuoB       CR field BF and FPSCR[FPCC] are set to 0b0001.

Relation of Value Ea to Value Eb   Action for C(Ea:Eb)
Ea = Eb                            AeqB
Ea < Eb                            AltB
Ea > Eb                            AgtB

Figure 86. Actions: Test Exponent
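The Test Exponent actions in Figure 86 can be sketched informally in Python. This is only a model under stated assumptions: the function name dfp_test_exponent is hypothetical, and the exponent reported by Python's decimal module stands in for the DFP exponent (its range and encoding differ from the hardware formats).

```python
from decimal import Decimal

# Result codes as defined in the explanation of Figure 86.
AltB, AgtB, AeqB, AuoB = 0b1000, 0b0100, 0b0010, 0b0001

def dfp_test_exponent(a: Decimal, b: Decimal) -> int:
    """Hypothetical model: 4-bit code for comparing exponents Ea and Eb."""
    # NaN (quiet or signaling) matches only another NaN.
    if a.is_nan() or b.is_nan():
        return AeqB if (a.is_nan() and b.is_nan()) else AuoB
    # Infinity matches only another infinity, regardless of sign.
    if a.is_infinite() or b.is_infinite():
        return AeqB if (a.is_infinite() and b.is_infinite()) else AuoB
    # Both finite: algebraic comparison of the exponents.
    ea, eb = a.as_tuple().exponent, b.as_tuple().exponent
    if ea < eb:
        return AltB
    if ea > eb:
        return AgtB
    return AeqB
```

Note that the sketch compares exponents, not values: 1.00 and 5.25 have the same exponent and therefore yield AeqB, while 1 and 1.0 do not.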
DFP Test Significance [Quad] X-form

dtstsf   BF,FRA,FRB

   59    BF    //    FRA   FRB   674   /
   0     6     9     11    16    21    31

dtstsfq  BF,FRA,FRBp

   63    BF    //    FRA   FRBp  674   /
   0     6     9     11    16    21    31

Let k be the contents of bits 58:63 of FRA that specifies the reference significance.

The number of significant digits of the DFP operand in FRB[p], NSDb, is compared to the reference significance, k. For this instruction, the number of significant digits of the value 0 is considered to be zero. The result of the compare is placed into CR field BF and the FPSCR[FPCC] as follows.

Bit   Description
0     k ≠ 0 and k < NSDb
1     k ≠ 0 and k > NSDb, or k = 0
2     k ≠ 0 and k = NSDb
3     k ? NSDb

Special Registers Altered:
   CR field BF
   FPCC

Programming Note
   The reference significance can be loaded into a FPR using a Load Float as Integer Word Algebraic instruction

Actions for Test Significance when the operand in FRB[p] is
F             ∞      QNaN   SNaN
C(k: NSDb)    AuoB   AuoB   AuoB

Explanation:
C(k: NSDb)   Algebraic comparison. See the table below.
F            All finite numbers, including zeros.
AeqB         CR field BF and FPSCR[FPCC] are set to 0b0010.
AgtB         CR field BF and FPSCR[FPCC] are set to 0b0100.
AltB         CR field BF and FPSCR[FPCC] are set to 0b1000.
AuoB         CR field BF and FPSCR[FPCC] are set to 0b0001.

Relation of Value NSDb to Value k   Action for C(k:NSDb)
k ≠ 0 and k = NSDb                  AeqB
k ≠ 0 and k < NSDb                  AltB
k ≠ 0 and k > NSDb, or k = 0        AgtB

Figure 87. Actions: Test Significance

5.6.4 DFP Quantum Adjustment Instructions

The Quantum Adjustment operations consist of the Quantize, Quantize Immediate, Reround, and Round To FP Integer operations.

The Quantum Adjustment instructions are Z23-form instructions and have an immediate RMC (Rounding-Mode-Control) field, which specifies the rounding mode used. For Quantize, Quantize Immediate, and Reround, the RMC field contains the primary encoding. For Round to FP Integer, the field contains either primary or secondary encoding, depending on the setting of a RMC-encoding-selection bit. See Section 5.5.2 "Rounding Mode Specification" on page 161 for the definition of RMC encoding.

All Quantum Adjustment instructions set the FI and FR status flags, and also set the FPSCR[FPRF] field. The record bit is provided to each of these instructions. They return the target operand in a form with the ideal exponent.

DFP Quantize Immediate [Quad] Z23-form

dquai    TE,FRT,FRB,RMC       (Rc=0)
dquai.   TE,FRT,FRB,RMC       (Rc=1)

   59    FRT   TE    FRB   RMC   67    Rc
   0     6     11    16    21    23    31

dquaiq   TE,FRTp,FRBp,RMC     (Rc=0)
dquaiq.  TE,FRTp,FRBp,RMC     (Rc=1)

   63    FRTp  TE    FRBp  RMC   67    Rc
   0     6     11    16    21    23    31

The DFP operand in FRB[p] is converted and rounded to the form with the exponent specified by TE based on the rounding mode specified in the RMC field. TE is a 5-bit signed binary integer. The result of that form is placed in FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. The ideal exponent is the exponent specified by TE.

When the value of the operand in FRB[p] is greater than (10^p - 1) × 10^TE, where p is the format precision, an invalid operation exception is recognized.

When the delivered result differs in value from the operand in FRB[p], an inexact exception is recognized. No underflow exception is recognized by this operation, regardless of the value of the operand in FRB[p].

The FPSCR[FPRF] field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
   FPRF FR FI
   FX XX
   VXSNAN VXCVI
   CR1 (if Rc=1)

Programming Note
   DFP Quantize Immediate can be used to adjust values to a form having the specified exponent in the range -16 to 15. If the adjustment requires the significand to be shifted left, then:
      if the result would cause overflow from the most significant digit, the result is a default QNaN;
      otherwise the result is the adjusted value (left shifted with matching exponent).
   If the adjustment requires the significand to be shifted right, the result is rounded based on the value of the RMC field.

   DFP Quantize Immediate can round a value to a specific number of fractional digits. Consider the computation of sales tax. Values expressed in U.S. dollars have 2 fractional digits, and sales tax rates typically have 3 fractional digits. The product of value and rate will yield 5 fractional digits. For example:

      39.95 * 0.075 = 2.99625

   This result needs to be rounded to the penny to compute the correct tax of $3.00.

   The following sequence computes the sales tax assuming the pre-tax total is in FRA and the tax rate is in FRB. The DFP Quantize Immediate instruction rounds the product (FRA * FRB) to 2 fractional digits (TE field = -2) using Round to nearest, ties away from 0 (RMC field = 2). The quantized and rounded result is placed in FRT.

      dmul   f0,FRA,FRB
      dquai  -2,FRT,f0,2

DFP Quantize [Quad] Z23-form

dqua     FRT,FRA,FRB,RMC      (Rc=0)
dqua.    FRT,FRA,FRB,RMC      (Rc=1)

   59    FRT   FRA   FRB   RMC   3     Rc
   0     6     11    16    21    23    31

dquaq    FRTp,FRAp,FRBp,RMC   (Rc=0)
dquaq.   FRTp,FRAp,FRBp,RMC   (Rc=1)

   63    FRTp  FRAp  FRBp  RMC   3     Rc
   0     6     11    16    21    23    31

The DFP operand in register FRB[p] is converted and rounded to the form with the same exponent as that of the DFP operand in FRA[p] based on the rounding mode specified in the RMC field. The result of that form is placed in FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. The ideal exponent is the exponent specified in FRA[p].

When the value of the operand in FRB[p] is greater than (10^p - 1) × 10^Ea, where p is the format precision and Ea is the exponent of the operand in FRA[p], an invalid operation exception is recognized.

When the delivered result differs in value from the operand in FRB[p], an inexact exception is recognized. No underflow exception is recognized by this operation, regardless of the value of the operand in FRB[p].

Figure 89 and Figure 90 summarize the actions. The tables do not include the setting of the FPSCR[FPRF] field. The FPSCR[FPRF] field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
   FPRF FR FI
   FX XX
   VXSNAN VXCVI
   CR1 (if Rc=1)

Programming Note
   DFP Quantize can be used to adjust one DFP value (FRB[p]) to a form having the same exponent as a second DFP value (FRA[p]). If the adjustment requires the significand to be shifted left, then:
      if the result would cause overflow from the most significant digit, the result is a default QNaN;
      otherwise the result is the adjusted value (left shifted with matching exponent).
   If the adjustment requires the significand to be shifted right, the result is rounded based on the value of the RMC field.

   Figure 88 shows examples of these adjustments.

FRA FRB FRT when RMC=1 FRT when RMC=2 1 (1 x 100) 9. (9 x 100) 9 (9 x 100) 9 (9 x 100) 1.00 (100 x 10-2) 9.
(9 x 100) 9.00 (900 x 10-2) 9.00 (900 x 10-2) 1 (1 x 100) 49.1234 (491234 x 10-4) 49 (49 x 100) 49 (49 x 100) 1.00 (100 x 10-2) 49.1234 (491234 x 10-4) 49.12 (4912 x 10-2) 49.12 (4912 x 10-2) 1 (1 x 100) 49.9876 (499876 x 10-4) 49 (49 x 100) 50 (50 x 100) 1.00 (100 x 10-2) 49.9876 (499876 x 10-4) 49.98 (4998 x 10-2) 49.99 (4999 x 10-2) 0.01 (1 x 10-2) 49.9876 (499876 x 10-4) 49.98 (4998 x 10-2) 49.99 (4999 x 10-2) 9999999999999999 9999999999999999 9999999999999999 1 (1 x 100) (9999999999999999 x 100) (9999999999999999 x 100) (9999999999999999 x 100) 9999999999999999 1.0 (10 x 10-1) QNaN QNaN (9999999999999999 x 100) Figure 88. DFP Quantize examples 184 Power ISATM Book I Version 2.06 Operand a Actions for Quantize when operand b in FRB[p] is in FRA[p] is 0 Fn QNaN SNaN 0 * * VXCVI: T(dNaN) P(b) VXSNAN: U(b) Fn * * VXCVI: T(dNaN) P(b) VXSNAN: U(b) · VXCVI: T(dNaN) VXCVI: T(dNaN) T(dINF) P(b) VXSNAN: U(b) QNaN P(a) P(a) P(a) P(a) VXSNAN: U(b) SNaN VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a) Explanation: * See next table. dINF Default infinity dNaN Default quiet NaN Fn Finite nonzero numbers (includes both subnormal and normal numbers) P(x) The QNaN of operand x is propagated and placed in FRT[p] T(x) The value x is placed in FRT[p] U(x) The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p]. VXCVI The Invalid-Operation Exception (VXCVI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions) VXSNAN The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions) Figure 89. Actions (part 1) Quantize Actions for Quantize when operand b in FRB[p] is 0 Fn Te < Se Vb > (10p - 1) % 10Te E(0) VXCVI: T(dNaN) Vb [ (10p - 1) % 10Te E(0) L(b) Te = Se E(0) W(b) Te > Se E(0) QR(b) Explanation: dNaN Default quiet NaN E(0) The value of zero with the exponent value Te is placed in FRT[p]. 
    L(x)    The operand x is converted to the form with the exponent value Te.
    p       The precision of the format.
    QR(x)   The operand x is rounded to the result of the form with the exponent value Te based on the specified rounding mode. The result of that form is placed in FRT[p].
    Se      The exponent of the operand in FRB[p].
    Te      The target exponent: the exponent of the operand in FRA[p] for dqua[q], or TE, a 5-bit signed binary integer, for dquai[q].
    T(x)    The value x is placed in FRT[p].
    Vb      The value of the operand in FRB[p].
    W(x)    The value and the form of operand x are placed in FRT[p].
    VXCVI   The Invalid-Operation Exception (VXCVI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions.)

Figure 90. Actions (part 2): Quantize

DFP Reround [Quad] Z23-form

drrnd     FRT,FRA,FRB,RMC     (Rc=0)
drrnd.    FRT,FRA,FRB,RMC     (Rc=1)

    59    FRT   FRA   FRB   RMC   35    Rc
    0     6     11    16    21    23    31

drrndq    FRTp,FRA,FRBp,RMC   (Rc=0)
drrndq.   FRTp,FRA,FRBp,RMC   (Rc=1)

    63    FRTp  FRA   FRBp  RMC   35    Rc
    0     6     11    16    21    23    31

Let k be the contents of bits 58:63 of FRA, which specify the reference significance.

When the DFP operand in FRB[p] is a finite number, and if the reference significance is zero, or if the reference significance is nonzero and the number of significant digits of the source operand is less than or equal to the reference significance, then the value and the form of the source operand are placed in FRT[p]. If the reference significance is nonzero and the number of significant digits of the source operand is greater than the reference significance, then the source operand is converted and rounded to the number of significant digits specified in the reference significance based on the rounding mode specified in the RMC field. The result of the form with the specified number of significant digits is placed in FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p].

For this instruction, the number of significant digits of the value 0 is considered to be zero. The ideal exponent is the greater value of the exponent of the operand in FRB[p] and the referenced exponent. The referenced exponent is the exponent that would result if the operand in FRB[p] were converted and rounded to the number of significant digits specified in the reference significance based on the rounding mode specified in the RMC field.

If the exponent of the rounded result of the form that has the specified number of significant digits would be greater than Xmax, an invalid operation exception (VXCVI) occurs. When the invalid-operation exception occurs, and if the exception is disabled, a default QNaN is returned. When an invalid-operation exception occurs, no inexact exception is recognized.

In the absence of an invalid-operation exception, if the result differs in value from the operand in FRB[p], an inexact exception is recognized.

This operation causes neither an overflow nor an underflow exception.

Figure 92 summarizes the actions for Reround. The table does not include the setting of the FPSCR_FPRF field. The FPSCR_FPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
    FPRF FR FI
    FX XX
    VXSNAN VXCVI
    CR1                       (if Rc=1)

Programming Note
    DFP Reround can be used to adjust a DFP value (FRB[p]) to have no more than a specified number (bits 58:63 of FRA) of significant digits. The result (FRT[p]) is right-justified leaving the specified number of digits and rounded as specified by the RMC field. If rounding increases the number of significant digits, the result is adjusted again (the significand is shifted right 1 digit and the exponent is incremented by 1). Figure 91 has example results from DFP Reround for 1, 2, and 10 significant digits.

Programming Note
    DFP Reround is primarily used to round a DFP value to a specific number of digits before conversion to string format for printing or display. Another use for DFP Reround is to obtain the effective exponent of the most significant digit by specifying a reference significance of 1. The exponent can be extracted and used to compute the number of significant digits or to left-justify a value.

    For example, the following sequence computes the number of significant digits and returns it as an integer. FRB is the DFP value for which we want the number of significant digits; f13 contains the reference significance value 0x0000000000000001; and r1 is the stack pointer, with free space for doublewords at offsets -8 and -16. These doublewords are used to transfer the biased exponents from the FPRs to GPRs for integer computation. R3 contains the result of E(reround(1,FRB)) - E(FRB) + 1, where E(x) represents the biased exponent of x.

        dxex   f0,FRB           # f0 = biased exponent of FRB
        stfd   f0,-16(r1)
        drrnd  f1,f13,FRB,1     # reround 1 digit toward 0
        dxex   f1,f1            # f1 = biased exponent of rerounded value
        stfd   f1,-8(r1)
        ld     r11,-16(r1)      # r11 = E(FRB)
        ld     r3,-8(r1)        # r3 = E(reround(1,FRB))
        subf   r3,r11,r3        # r3 = r3 - r11
        addi   r3,r3,1

    Given the value 412.34 the result is E(4 × 10^2) - E(41234 × 10^-2) + 1 = (398+2) - (398-2) + 1 = 400 - 396 + 1 = 5. Additional code is required to detect and handle special values like Subnormal, Infinity, and NaN.

    FRA bits 58:63   FRB                          FRT when RMC=1             FRT when RMC=2
    (binary)
    1                0.41234 (41234 × 10^-5)      0.4 (4 × 10^-1)            0.4 (4 × 10^-1)
    1                4.1234 (41234 × 10^-4)       4 (4 × 10^0)               4 (4 × 10^0)
    1                41.234 (41234 × 10^-3)       4 (4 × 10^1)               4 (4 × 10^1)
    1                412.34 (41234 × 10^-2)       4 (4 × 10^2)               4 (4 × 10^2)
    2                0.491234 (491234 × 10^-6)    0.49 (49 × 10^-2)          0.49 (49 × 10^-2)
    2                0.499876 (499876 × 10^-6)    0.49 (49 × 10^-2)          0.50 (50 × 10^-2)
    2                0.999876 (999876 × 10^-6)    0.99 (99 × 10^-2)          1.0 (10 × 10^-1)
    10               0.491234 (491234 × 10^-6)    0.491234 (491234 × 10^-6)  0.491234 (491234 × 10^-6)
    10               999.999 (999999 × 10^-3)     999.999 (999999 × 10^-3)   999.999 (999999 × 10^-3)
    10               9999999999999999             9.999999999E+14            1.000000000E+15
                     (9999999999999999 × 10^0)    (9999999999 × 10^5)        (1000000000 × 10^6)

Figure 91. DFP Reround examples

Programming Note
    DFP Reround combined with DFP Quantize can be used to left justify a value (as needed by the frexp function). FRB is the DFP value that we want to left justify; f13 contains the reference significance value 0x0000000000000001; and r1 is the stack pointer, with free space for a doubleword at offset -8. This doubleword is used to transfer the biased exponent from the FPR to a GPR for integer computation. The adjusted biased exponent (biased exponent - format precision + 1) is transferred back into an FPR so it can be inserted into the rerounded value. The adjusted rerounded value becomes the quantize reference value. The quantize instruction returns the left-justified result in FRT.

        drrnd  f1,f13,FRB,1     # reround 1 digit toward 0
        dxex   f0,f1            # f0 = biased exponent of rerounded value
        stfd   f0,-8(r1)
        ld     r11,-8(r1)       # move biased exponent to a GPR
        addi   r11,r11,-15      # biased exp - (precision - 1), DFP Long
        std    r11,-8(r1)
        lfd    f0,-8(r1)        # move adjusted exponent back to an FPR
        diex   f1,f0,f1         # adjust exponent
        dqua   FRT,f1,FRB,1     # quantize FRB to adjusted exponent
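The Reround programming notes can be checked against a small software model. The sketch below uses Python's `decimal` module as a stand-in for DFP Long; it is an illustration under stated assumptions, not the architected behavior. The function names `reround`, `significant_digits`, and `left_justify` are hypothetical, only finite operands are handled, and no exceptions or status flags are modeled. A left-justified significand needs the exponent of the most significant digit lowered by (precision - 1), which is what `left_justify` computes.

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_UP

def reround(value: Decimal, k: int, rounding: str) -> Decimal:
    """Hypothetical model of DFP Reround for finite operands."""
    # The number of significant digits of zero is taken to be zero.
    digits = len(value.as_tuple().digits) if value != 0 else 0
    if k == 0 or digits <= k:
        return value  # value and form pass through unchanged
    # Context.plus rounds its operand to the context precision (k digits).
    return Context(prec=k, rounding=rounding).plus(value)

def significant_digits(value: Decimal) -> int:
    """E(reround(1, x)) - E(x) + 1; the exponent bias cancels out."""
    msd = reround(value, 1, ROUND_DOWN)  # RMC=1: round toward 0
    return msd.as_tuple().exponent - value.as_tuple().exponent + 1

def left_justify(value: Decimal, precision: int = 16) -> Decimal:
    """Reround to 1 digit, lower the exponent by precision - 1, quantize."""
    msd = reround(value, 1, ROUND_DOWN)
    target_exp = msd.as_tuple().exponent - (precision - 1)
    # Decimal((sign, digits, exponent)) builds the quantize reference value.
    return value.quantize(Decimal((0, (1,), target_exp)))

print(reround(Decimal("0.999876"), 2, ROUND_HALF_UP))  # RMC=2 row of Figure 91
print(significant_digits(Decimal("412.34")))
print(left_justify(Decimal("412.34")).as_tuple())
```

The default precision of 16 corresponds to DFP Long; passing a different `precision` is only meaningful while the result fits in Python's current context precision.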
                                 Actions for Reround when operand b in FRB[p] is
                                 0*      Fn                         ∞         QNaN    SNaN
    k ≠ 0, k < m                 -       RR(b) or VXCVI: T(dNaN)    T(dINF)   P(b)    VXSNAN: U(b)
    k ≠ 0, k = m                 -       W(b)                       T(dINF)   P(b)    VXSNAN: U(b)
    k ≠ 0 and k > m, or k = 0    W(b)    W(b)                       T(dINF)   P(b)    VXSNAN: U(b)

    Explanation:
    *       The number of significant digits of the value 0 is considered to be zero for this instruction.
    -       Not applicable.
    dINF    Default infinity.
    Fn      Finite nonzero numbers (includes both subnormal and normal numbers).
    k       Reference significance, which specifies the number of significant digits in the target operand.
    m       Number of significant digits in the operand in FRB[p].
    P(x)    The QNaN of operand x is propagated and placed in FRT[p].
    RR(x)   The value x is rounded to the form that has the specified number of significant digits. If RR(x) ≤ (10^k - 1) × 10^Xmax, then RR(x) is returned; otherwise an invalid-operation exception is recognized.
    T(x)    The value x is placed in FRT[p].
    U(x)    The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
    VXCVI   The Invalid-Operation Exception (VXCVI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions.)
    VXSNAN  The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions.)
    W(x)    The value and the form of x are placed in FRT[p].

Figure 92. Actions: Reround

DFP Round To FP Integer With Inexact [Quad] Z23-form

drintx    R,FRT,FRB,RMC       (Rc=0)
drintx.   R,FRT,FRB,RMC       (Rc=1)

    59    FRT   ///   R     FRB   RMC   99    Rc
    0     6     11    15    16    21    23    31

drintxq   R,FRTp,FRBp,RMC     (Rc=0)
drintxq.  R,FRTp,FRBp,RMC     (Rc=1)

    63    FRTp  ///   R     FRBp  RMC   99    Rc
    0     6     11    15    16    21    23    31

The DFP operand in FRB[p] is rounded to a floating-point integer and placed into FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. The ideal exponent is the larger value of zero and the exponent of the operand in FRB[p].

The rounding mode used is specified in the RMC field. When the RMC-encoding-selection (R) bit is zero, the RMC field contains the primary encoding; when the bit is one, the field contains the secondary encoding.

In addition to coercion of the converted value to fit the target format, the special rounding used by Round To FP Integer also coerces the target exponent to the ideal exponent.

When the operand in FRB[p] is a finite number and the exponent is less than zero, the operand is rounded to the result with an exponent of zero. When the exponent is greater than or equal to zero, the result is set to the numerical value and the form of the operand in FRB[p].

When the result differs in value from the operand in FRB[p], an inexact exception is recognized. No underflow exception is recognized by this operation, regardless of the value of the operand in FRB[p].

Figure 93 summarizes the actions for Round To FP Integer With Inexact. The table does not include the setting of the FPSCR_FPRF field. The FPSCR_FPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
    FPRF FR FI
    FX XX
    VXSNAN
    CR1                       (if Rc=1)

Programming Note
    The DFP Round To FP Integer With Inexact and DFP Round To FP Integer With Inexact Quad instructions can be used to implement the decimal equivalent of the C99 rint function by specifying the primary RMC encoding for round according to FPSCR_DRN (R=0, RMC=0b11). The specification for rint requires the inexact exception be raised if detected.

    Operand b   Is n not    Inv.-Op.    Inexact     Is n          Actions*
    in FRB is   precise     Exception   Exception   Incremented
                (n ≠ b)     Enabled     Enabled     (|n| > |b|)
    -∞          No [1]      -           -           -             T(-dINF), FI←0, FR←0
    F           No          -           -           -             W(n), FI←0, FR←0
    F           Yes         -           No          No            W(n), FI←1, FR←0, XX←1
    F           Yes         -           No          Yes           W(n), FI←1, FR←1, XX←1
    F           Yes         -           Yes         No            W(n), FI←1, FR←0, XX←1, TX
    F           Yes         -           Yes         Yes           W(n), FI←1, FR←1, XX←1, TX
    +∞          No [1]      -           -           -             T(+dINF), FI←0, FR←0
    QNaN        No [1]      -           -           -             P(b), FI←0, FR←0
    SNaN        No [1]      No          -           -             U(b), FI←0, FR←0, VXSNAN←1
    SNaN        No [1]      Yes         -           -             VXSNAN←1, TV

    Explanation:
    *       Setting of XX and VXSNAN is part of the corresponding exception actions. Also, when an invalid-operation exception occurs, setting of FI and FR is part of the exception actions. (See the sections "Inexact Exception" and "Invalid Operation Exception" for more details.)
    -       The actions do not depend on this condition.
    [1]     This condition is true by virtue of the state of some condition to the left of this column.
    dINF    Default infinity.
    F       All finite numbers, including zeros.
    FI      Floating-Point-Fraction-Inexact status flag, FPSCR_FI.
    FR      Floating-Point-Fraction-Rounded status flag, FPSCR_FR.
    n       The value derived when the source operand, b, is rounded to an integer using the special rounding for Round To FP Integer.
    P(x)    The QNaN of operand x is propagated and placed in FRT[p].
    T(x)    The value x is placed in FRT[p].
    TV      The system floating-point enabled exception error handler is invoked for the invalid-operation exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
    TX      The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
    U(x)    The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
    W(x)    The value x in the form of zero exponent or the source exponent is placed in FRT[p].
    XX      Floating-Point-Inexact-Exception status flag, FPSCR_XX.

Figure 93. Actions: Round To FP Integer With Inexact

DFP Round To FP Integer Without Inexact [Quad] Z23-form

drintn    R,FRT,FRB,RMC       (Rc=0)
drintn.   R,FRT,FRB,RMC       (Rc=1)

    59    FRT   ///   R     FRB   RMC   227   Rc
    0     6     11    15    16    21    23    31

drintnq   R,FRTp,FRBp,RMC     (Rc=0)
drintnq.  R,FRTp,FRBp,RMC     (Rc=1)

    63    FRTp  ///   R     FRBp  RMC   227   Rc
    0     6     11    15    16    21    23    31

This operation is the same as the Round To FP Integer With Inexact operation, except that this operation does not recognize an inexact exception.

Figure 94 summarizes the actions for Round To FP Integer Without Inexact. The table does not include the setting of the FPSCR_FPRF field. The FPSCR_FPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX
    VXSNAN
    CR1                       (if Rc=1)

Programming Note
    The DFP Round To FP Integer Without Inexact and DFP Round To FP Integer Without Inexact Quad instructions can be used to implement decimal equivalents of several C99 rounding functions by specifying the appropriate R and RMC field values.

        Function    R    RMC
        Ceil        1    0b00
        Floor       1    0b01
        Nearbyint   0    0b11
        Round       0    0b10
        Trunc       0    0b01

    Note that nearbyint is similar to the rint function but without raising the inexact exception. Similarly, ceil, floor, round, and trunc do not require the inexact exception.

    Operand b   Inv.-Op. Exception   Actions*
    in FRB is   Enabled
    -∞          -                    T(-dINF), FI←0, FR←0
    F           -                    W(n), FI←0, FR←0
    +∞          -                    T(+dINF), FI←0, FR←0
    QNaN        -                    P(b), FI←0, FR←0
    SNaN        No                   U(b), FI←0, FR←0, VXSNAN←1
    SNaN        Yes                  VXSNAN←1, TV

    Explanation:
    *       Setting of VXSNAN is part of the corresponding exception actions. Also, when an invalid-operation exception occurs, setting of FI and FR bits is part of the exception actions. (See the section "Invalid Operation Exception" for more details.)
    -       The actions do not depend on this condition.
    dINF    Default infinity.
    F       All finite numbers, including zeros.
    FI      Floating-Point-Fraction-Inexact status flag, FPSCR_FI.
    FR      Floating-Point-Fraction-Rounded status flag, FPSCR_FR.
    n       The value derived when the source operand, b, is rounded to an integer using the special rounding for Round To FP Integer.
    P(x)    The QNaN of operand x is propagated and placed in FRT[p].
    T(x)    The value x is placed in FRT[p].
    TV      The system floating-point enabled exception error handler is invoked for the invalid-operation exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
    U(x)    The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
    W(x)    The value x in the form of zero exponent or the source exponent is placed in FRT[p].

Figure 94. Actions: Round To FP Integer Without Inexact

5.6.5 DFP Conversion Instructions

The DFP conversion instructions consist of data-format conversion instructions and data-type conversion instructions. They are all X-form instructions and employ the record bit (Rc).

5.6.5.1 DFP Data-Format Conversion Instructions

The data-format conversion instructions consist of Convert To DFP Long, Convert To DFP Extended, Round To DFP Short, and Round To DFP Long. Figure 95 summarizes the actions for these instructions.

Programming Note
    DFP does not provide operations on short operands, so they must be converted to long format, and then converted back to be stored. Preserving correct signaling NaN semantics requires that signaling NaNs be propagated from the source to the result without recognizing an exception during widening from short to long or narrowing from long to short. Because DFP does not provide equivalents to the FP Load Floating-Point Single and Store Floating-Point Single functions, the widening is performed by loading the DFP short value with a Load Floating as Integer Word Indexed followed by a DFP Convert To DFP Long, and narrowing is performed by a DFP Round To DFP Short followed by a Store Floating-Point as Integer Word Indexed.

    If the SNaN or infinity in DFP short format uses the preferred DPD encoding, then converting this operand to DFP long format and back to DFP short will result in the original bit pattern.

                                Actions when operand b in FRB[p] is
    Instruction                 F            ∞              QNaN           SNaN
    Convert To DFP Long         T(b) [1]     P(b) [2,4]     P(b) [2,4]     P(b) [3,4]
    Convert To DFP Extended     T(b) [1]     T(dINF)        P(b) [2,4]     VXSNAN: U(b) [2,4]
    Round To DFP Short          R(b) [1]     P(b) [2,5]     P(b) [2,5]     P(b) [3,5]
    Round To DFP Long           R(b) [1]     T(dINF)        P(b) [2,5]     VXSNAN: U(b) [2,5]

    Explanation:
    [1]     The ideal exponent is the exponent of the source operand.
    [2]     Bits 5:N-1 of the N-bit combination field are set to zero.
    [3]     Bit 5 of the N-bit combination field is set to one. Bits 6:N-1 of the combination field are set to zero.
    [4]     The trailing significand field is padded on the left with zeros.
    [5]     Leftmost digits in the trailing significand field are removed.
    dINF    Default infinity.
    F       All finite numbers, including zeros.
    P(x)    The special symbol in operand x is propagated into FRT[p].
    R(x)    The value x is rounded to the target-format precision; see Section 5.5.11.
    T(x)    The value x is placed in FRT[p].
    U(x)    The SNaN of operand x is converted to the corresponding QNaN.
    VXSNAN  The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions.)

Figure 95. Actions: Data-Format Conversion Instructions

DFP Convert To DFP Long X-form

dctdp     FRT,FRB             (Rc=0)
dctdp.    FRT,FRB             (Rc=1)

    59    FRT   ///   FRB   258   Rc
    0     6     11    16    21    31

The DFP short operand in bits 32:63 of FRB is converted to DFP long format and the converted result is placed into FRT. The sign of the result is the same as the sign of the source operand. The ideal exponent is the exponent of the source operand.

If the operand in FRB is an SNaN, it is converted to an SNaN in DFP long format and does not cause an invalid-operation exception.

Special Registers Altered:
    FPRF FR (undefined) FI (undefined)
    CR1                       (if Rc=1)

Programming Note
    Note that DFP short format is a storage-only format. Therefore, conversion of a short SNaN to long format will not cause an exception and the SNaN is preserved. A subsequent operation on that SNaN in long format will cause an exception.

DFP Convert To DFP Extended X-form

dctqpq    FRTp,FRB            (Rc=0)
dctqpq.   FRTp,FRB            (Rc=1)

    63    FRTp  ///   FRB   258   Rc
    0     6     11    16    21    31

The DFP long operand in FRB is converted to DFP extended format and placed into FRTp. The sign of the result is the same as the sign of the operand in FRB. The ideal exponent is the exponent of the operand in FRB.

If the operand in FRB is an SNaN, an invalid-operation exception is recognized. If the exception is disabled, the SNaN is converted to the corresponding QNaN in DFP extended format.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX
    VXSNAN
    CR1                       (if Rc=1)

DFP Round To DFP Short X-form

drsp      FRT,FRB             (Rc=0)
drsp.     FRT,FRB             (Rc=1)

    59    FRT   ///   FRB   770   Rc
    0     6     11    16    21    31

The DFP long operand in FRB is converted and rounded to DFP short format. The DFP short value is extended on the left with zeros to form a 64-bit entity and placed into FRT. The sign of the result is the same as the sign of the source operand. The ideal exponent is the exponent of the source operand.

If the operand in FRB is an SNaN, it is converted to an SNaN in DFP short format and does not cause an invalid-operation exception.

Normally, the result is in the format and length of the target. However, when an overflow or underflow exception occurs and if the exception is enabled, the operation is completed by producing a wrapped rounded result in the same format and length as the source but rounded to the target-format precision.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    CR1                       (if Rc=1)

Programming Note
    Note that DFP short format is a storage-only format. Therefore, conversion of a long SNaN to short format will not cause an exception. Converting a long format SNaN to short format is an implied move operation.

DFP Round To DFP Long X-form

drdpq     FRTp,FRBp           (Rc=0)
drdpq.    FRTp,FRBp           (Rc=1)

    63    FRTp  ///   FRBp  770   Rc
    0     6     11    16    21    31

The DFP extended operand in FRBp is converted and rounded to DFP long format. The result concatenated with 64 0s is placed in FRTp. The sign of the result is the same as the sign of the source operand. The ideal exponent is the exponent of the operand in FRBp.

If the operand in FRBp is an SNaN, an invalid-operation exception is recognized. If the exception is disabled, the SNaN is converted to the corresponding QNaN in DFP long format.

Normally, the result is in the format and length of the target. However, when an overflow or underflow exception occurs and if the exception is enabled, the operation is completed by producing a wrapped rounded result in the same format and length as the source but rounded to the target-format precision.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN
    CR1                       (if Rc=1)

Programming Note
    Note that DFP Round To DFP Long, while producing a result in DFP long format, actually targets a register pair, writing 64 0s in FRTp+1.

5.6.5.2 DFP Data-Type Conversion Instructions

The DFP data-type conversion instructions are used to convert data type between DFP and fixed. The data-type conversion instructions consist of Convert From Fixed and Convert To Fixed.

DFP Convert From Fixed X-form

dcffix    FRT,FRB             (Rc=0)
dcffix.   FRT,FRB             (Rc=1)

    59    FRT   ///   FRB   802   Rc
    0     6     11    16    21    31

The 64-bit signed binary integer in FRB is converted and rounded to a DFP Long value and placed into FRT. The sign of the result is the same as the sign of the source operand. The ideal exponent is zero.

If the source operand is a zero, then a plus zero with a zero exponent is returned.

The FPSCR_FPRF field is set to the class and sign of the result.

Special Registers Altered:
    FPRF FR FI
    FX XX
    CR1                       (if Rc=1)

DFP Convert From Fixed Quad X-form

dcffixq   FRTp,FRB            (Rc=0)
dcffixq.  FRTp,FRB            (Rc=1)

    63    FRTp  ///   FRB   802   Rc
    0     6     11    16    21    31

The 64-bit signed binary integer in FRB is converted and rounded to a DFP Extended value and placed into FRTp. The sign of the result is the same as the sign of the source operand. The ideal exponent is zero.

If the source operand is a zero, then a plus zero with a zero exponent is returned.

The FPSCR_FPRF field is set to the class and sign of the result.

Special Registers Altered:
    FPRF FR (undefined) FI (undefined)
    CR1                       (if Rc=1)

DFP Convert To Fixed [Quad] X-form

dctfix    FRT,FRB             (Rc=0)
dctfix.   FRT,FRB             (Rc=1)

    59    FRT   ///   FRB   290   Rc
    0     6     11    16    21    31

dctfixq   FRT,FRBp            (Rc=0)
dctfixq.  FRT,FRBp            (Rc=1)

    63    FRT   ///   FRBp  290   Rc
    0     6     11    16    21    31

The DFP operand in FRB[p] is rounded to an integer value and is placed into FRT in the 64-bit signed binary integer format. The sign of the result is the same as the sign of the source operand, except when the source operand is a NaN or a zero.

Figure 96 summarizes the actions for Convert To Fixed.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1                       (if Rc=1)

Programming Note
    It is recommended that software pre-round the operand to a floating-point integral value using drintx[q] or drintn[q] if a rounding mode other than the current rounding mode specified by FPSCR_DRN is needed. Saving, modifying, and restoring the FPSCR just to temporarily change the rounding mode is less efficient than employing drintx[q] or drintn[q], which override the current rounding mode using an immediate control field.

    For example, if the desired rounding is Round to Nearest, Ties away from 0 but the default rounding (from FPSCR_DRN) is Round to Nearest, Ties to Even, then the following is preferred.

        drintn  0,f1,f1,2
        dctfix  f1,f1
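The pre-round-then-convert pattern recommended above can be modelled in software. The following is a minimal sketch using Python's `decimal` module, not a description of the hardware. The names `RMC_TO_PYTHON`, `round_to_fp_integer`, and `convert_to_fixed` are hypothetical; the mapping covers only the (R, RMC) pairs listed in the Round To FP Integer programming note (ceil, floor, round, trunc); only finite operands are handled; and status flags and exceptions are not modeled. Saturation at MN and MP follows the disabled-exception rows of Figure 96.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_FLOOR,
                     ROUND_DOWN, ROUND_HALF_UP)

# (R, RMC) pairs from the C99 function table, mapped to the Python
# rounding constant that plays the same role (an assumed correspondence).
RMC_TO_PYTHON = {
    (1, 0b00): ROUND_CEILING,  # ceil
    (1, 0b01): ROUND_FLOOR,    # floor
    (0, 0b10): ROUND_HALF_UP,  # round: nearest, ties away from 0
    (0, 0b01): ROUND_DOWN,     # trunc: toward 0
}

MN = -(2 ** 63)      # most negative 64-bit signed integer
MP = 2 ** 63 - 1     # most positive 64-bit signed integer

def round_to_fp_integer(value: Decimal, r: int, rmc: int) -> Decimal:
    """Model of drintn: integral value, exponent max(0, source exponent)."""
    return value.to_integral_value(rounding=RMC_TO_PYTHON[(r, rmc)])

def convert_to_fixed(value: Decimal) -> int:
    """Model of dctfix on an already-integral finite operand."""
    return max(MN, min(MP, int(value)))

# Equivalent of: drintn 0,f1,f1,2 ; dctfix f1,f1
print(convert_to_fixed(round_to_fp_integer(Decimal("2.5"), 0, 0b10)))
```

Because the operand is already integral after the first step, the rounding mode in effect during the conversion no longer influences the result, which is the point of the recommendation.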
Decimal Floating-Point [Category: Decimal Floating-Point] 195 Version 2.06 Is n not Inv.-Op. Inexact Is n Incre- Operand b q is precise Except. Except. mented Actions * in FRB[p] is (n b) Enabled Enabled (|n| > |b|) - b < MN < MN - No - - T(MN), FI 0, FR 0, VXCVI 1 - b < MN < MN - Yes - - VXCVI 1, TV - < b < MN = MN - - No - T(MN), FI 1, FR 0, XX 1 - < b < MN = MN - - Yes - T(MN), FI 1, FR 0, XX 1,TX MN b < 0 - No - - - T(n), FI 0, FR 0 MN b < 0 - Yes - No No T(n), FI 1, FR 0, XX 1 MN b < 0 - Yes - No Yes T(n), FI 1, FR 1, XX 1 MN b < 0 - Yes - Yes No T(n), FI 1, FR 0, XX 1, TX MN b < 0 - Yes - Yes Yes T(n), FI 1, FR 1, XX 1, TX ±0 - No - - - T(0), FI 0, FR 0 0 < b MP - No - - - T(n), FI 0, FR 0 0 < b MP - Yes - No No T(n), FI 1, FR 0, XX 1 0 < b MP - Yes - No Yes T(n), FI 1, FR 1, XX 1 0 < b MP - Yes - Yes No T(n), FI 1, FR 0, XX 1, TX 0 < b MP - Yes - Yes Yes T(n), FI 1, FR 1, XX 1, TX MP < b < + = MP - - No - T(MP), FI 1, FR 0, XX 1 MP < b < + = MP - - Yes - T(MP), FI 1, FR 0, XX 1, TX MP < b + > MP - No - - T(MP), FI 0, FR 0, VXCVI 1 MP < b + > MP - Yes - - VXCVI 1, TV QNaN - - No - - T(MN), FI 0, FR 0, VXCVI 1 QNaN - - Yes - - VXCVI 1, TV SNaN - - No - - T(MN),FI 0, FR 0, VXCVI 1,VXSNAN 1 SNaN - - Yes - - VXCVI 1,VXSNAN 1, TV Explanation: * Setting of XX, VXCVI, and VXSNAN is part of the corresponding exception actions. Also, when an invalid-operation exception occurs, setting of FI and FR bits is part of the exception actions. (See the sections, "Inexact Exception" and "Invalid Operation Exception" for more details.) - The actions do not depend on this condition. FI Floating-Point-Fraction-Inexact status flag, FPSCRFI. FR Floating-Point-Fraction-Rounded status flag, FPSCRFR. MN Maximum negative number representable by the 64-bit binary integer format MP Maximum positive number representable by the 64-bit binary integer format. n The value q converted to a fixed-point result. 
q       The value derived when the source value b is rounded to an integer using the specified rounding mode.
T(x)    The value x is placed in FRT[p].
TV      The system floating-point enabled exception error handler is invoked for the invalid-operation exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
TX      The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
VXCVI   The FPSCR_VXCVI invalid-operation exception status bit.
VXSNAN  The FPSCR_VXSNAN invalid-operation exception status bit.
XX      Floating-Point-Inexact-Exception status flag, FPSCR_XX.

Figure 96. Actions: Convert To Fixed

5.6.6 DFP Format Instructions

The DFP format instructions are used to compose or decompose a DFP operand. A source operand of SNaN does not cause an invalid-operation exception. All format instructions employ the record bit (Rc).

The format instructions consist of Decode DPD To BCD, Encode BCD To DPD, Extract Biased Exponent, Insert Biased Exponent, Shift Significand Left Immediate, and Shift Significand Right Immediate.

DFP Decode DPD To BCD [Quad] X-form

ddedpd    SP,FRT,FRB      (Rc=0)
ddedpd.   SP,FRT,FRB      (Rc=1)

    | 59 | FRT | SP | /// | FRB | 322 | Rc |
      0    6     11   13    16    21    31

ddedpdq   SP,FRTp,FRBp    (Rc=0)
ddedpdq.  SP,FRTp,FRBp    (Rc=1)

    | 63 | FRTp | SP | /// | FRBp | 322 | Rc |
      0    6      11   13    16     21    31

A portion of the significand of the DFP operand in FRB[p] is converted to a signed or unsigned BCD number depending on the SP field. For infinity and NaN, the significand is considered to be the contents in the trailing significand field padded on the left by a zero digit.

SP0 = 0 (unsigned conversion)
    The rightmost 16 digits of the significand (32 digits for ddedpdq) are converted to an unsigned BCD number and the result is placed into FRT[p].

SP0 = 1 (signed conversion)
    The rightmost 15 digits of the significand (31 digits for ddedpdq) are converted to a signed BCD number with the same sign as the DFP operand, and the result is placed into FRT[p]. If the DFP operand is negative, the sign is encoded as 0b1101. If the DFP operand is positive, SP1 indicates which preferred plus sign encoding is used. If SP1 = 0, the plus sign is encoded as 0b1100 (the option-1 preferred sign code), otherwise the plus sign is encoded as 0b1111 (the option-2 preferred sign code).

Special Registers Altered:
    CR1 (if Rc=1)

DFP Encode BCD To DPD [Quad] X-form

denbcd    S,FRT,FRB       (Rc=0)
denbcd.   S,FRT,FRB       (Rc=1)

    | 59 | FRT | S | /// | FRB | 834 | Rc |
      0    6     11  12    16    21    31

denbcdq   S,FRTp,FRBp     (Rc=0)
denbcdq.  S,FRTp,FRBp     (Rc=1)

    | 63 | FRTp | S | /// | FRBp | 834 | Rc |
      0    6      11  12    16     21    31

The signed or unsigned BCD operand, depending on the S field, in FRB[p] is converted to a DFP number. The ideal exponent is zero.

S = 0 (unsigned BCD operand)
    The unsigned BCD operand in FRB[p] is converted to a positive DFP number of the same magnitude and the result is placed into FRT[p].

S = 1 (signed BCD operand)
    The signed BCD operand in FRB[p] is converted to the corresponding DFP number and the result is placed into FRT[p].

If an invalid BCD digit or sign code is detected in the source operand, an invalid-operation exception (VXCVI) occurs.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exception when FPSCR_VE=1.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX VXCVI
    CR1 (if Rc=1)

DFP Extract Biased Exponent [Quad] X-form

dxex      FRT,FRB         (Rc=0)
dxex.     FRT,FRB         (Rc=1)

    | 59 | FRT | /// | FRB | 354 | Rc |
      0    6     11    16    21    31

dxexq     FRT,FRBp        (Rc=0)
dxexq.    FRT,FRBp        (Rc=1)

    | 63 | FRT | /// | FRBp | 354 | Rc |
      0    6     11    16     21    31

The biased exponent of the operand in FRB[p] is extracted and placed into FRT in the 64-bit signed binary integer format. When the operand in FRB[p] is an infinity, QNaN, or SNaN, a special code is returned.

    Operand          Result
    Finite Number    biased exponent value
    Infinity         -1
    QNaN             -2
    SNaN             -3

Special Registers Altered:
    CR1 (if Rc=1)

Programming Note
    The exponent bias value is 101 for DFP Short, 398 for DFP Long, and 6176 for DFP Extended.

DFP Insert Biased Exponent [Quad] X-form

diex      FRT,FRA,FRB     (Rc=0)
diex.     FRT,FRA,FRB     (Rc=1)

    | 59 | FRT | FRA | FRB | 866 | Rc |
      0    6     11    16    21    31

diexq     FRTp,FRA,FRBp   (Rc=0)
diexq.    FRTp,FRA,FRBp   (Rc=1)

    | 63 | FRTp | FRA | FRBp | 866 | Rc |
      0    6      11    16     21    31

Let a be the value of the 64-bit signed binary integer in FRA.

    a                Result
    a > MBE (1)      QNaN
    MBE ≥ a ≥ 0      Finite number with biased exponent a
    a = -1           Infinity
    a = -2           QNaN
    a = -3           SNaN
    a < -3           QNaN

    (1) Maximum biased exponent for the target format

When 0 ≤ a ≤ MBE, a is the biased target exponent that is combined with the sign bit and the significand value of the DFP operand in FRB[p] to form the DFP result in FRT[p]. The ideal exponent is the specified target exponent.

When a specifies a special code (a < 0 or a > MBE), an infinity, QNaN, or SNaN is formed in FRT[p] with the trailing significand field containing the value from the trailing significand field of the source operand in FRB[p], and with an N-bit combination field set as follows.

    For an Infinity result, the leftmost 5 bits are set to 0b11110, and the rightmost N-5 bits are set to zero.
    For a QNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to zero, and the rightmost N-6 bits are set to zero.
    For an SNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to one, and the rightmost N-6 bits are set to zero.

Special Registers Altered:
    CR1 (if Rc=1)

Programming Note
    The exponent bias value is 101 for DFP Short, 398 for DFP Long, and 6176 for DFP Extended.
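The special-code conventions shared by dxex and diex can be sketched in a few lines. This is only an illustrative model under stated assumptions, not an implementation of the instructions: the operand class is passed in as a string, and the names dxex_result, diex_result_class, unbiased_exponent, and the mbe parameter are hypothetical helpers introduced here.

```python
# Special codes from the dxex and diex tables above.
SPECIAL_CODES = {"infinity": -1, "qnan": -2, "snan": -3}

# Exponent bias per format, taken from the Programming Note.
BIAS = {"short": 101, "long": 398, "extended": 6176}


def dxex_result(operand_class, biased_exponent=None):
    """Model the integer dxex places in FRT for a given operand class."""
    if operand_class == "finite":
        return biased_exponent
    return SPECIAL_CODES[operand_class]


def diex_result_class(a, mbe):
    """Classify the diex result for the signed integer a taken from FRA.

    mbe is the maximum biased exponent of the target format; the caller
    supplies it (an assumption of this sketch).
    """
    if a > mbe or a < -3:
        return "qnan"
    if a >= 0:
        return "finite"
    return {-1: "infinity", -2: "qnan", -3: "snan"}[a]


def unbiased_exponent(biased_exponent, fmt):
    """Remove the format's bias from a biased exponent value."""
    return biased_exponent - BIAS[fmt]
```

Because dxex returns exactly the codes that diex accepts, a dxex result can be fed back to diex to rebuild an operand of the same class, provided the value lies within the range of the target format.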
    Operand a in       Actions for Insert Biased Exponent when operand b in FRB[p] specifies
    FRA[p] specifies   F        ∞        QNaN     SNaN
    F                  N, Rb    Z, Rb    Z, Rb    Z, Rb
    ∞                  I, Rb    I, Rb    I, Rb    I, Rb
    QNaN               Q, Rb    Q, Rb    Q, Rb    Q, Rb
    SNaN               S, Rb    S, Rb    S, Rb    S, Rb

Explanation:
F    All finite numbers, including zeros.
I    The combination field in FRT[p] is set to indicate a default Infinity.
N    The combination field in FRT[p] is set to the specified biased exponent in FRA and the leftmost significand digit in FRB[p].
Q    The combination field in FRT[p] is set to indicate a default QNaN.
S    The combination field in FRT[p] is set to indicate a default SNaN.
Z    The combination field in FRT[p] is set to indicate the specified biased exponent in FRA and a leftmost coefficient digit of zero.
Rb   The contents of the trailing significand field in FRB[p] are reencoded using preferred DPD encodings and the reencoded result is placed in the same field in FRT[p]. The sign bit of FRB[p] is copied into the sign bit in FRT[p].

Figure 97. Actions: Insert Biased Exponent

DFP Shift Significand Left Immediate [Quad] Z22-form

dscli     FRT,FRA,SH      (Rc=0)
dscli.    FRT,FRA,SH      (Rc=1)

    | 59 | FRT | FRA | SH | 66 | Rc |
      0    6     11    16   22   31

dscliq    FRTp,FRAp,SH    (Rc=0)
dscliq.   FRTp,FRAp,SH    (Rc=1)

    | 63 | FRTp | FRAp | SH | 66 | Rc |
      0    6      11     16   22   31

The significand of the DFP operand in FRA[p] is shifted left SH digits. For a NaN or infinity, all significand digits are in the trailing significand field. SH is a 6-bit unsigned binary integer. Digits shifted out of the leftmost digit are lost. Zeros are supplied to the vacated positions on the right. The result is placed into FRT[p]. The sign of the result is the same as the sign of the source operand in FRA[p].

If the source operand in FRA[p] is a finite number, the exponent of the result is the same as the exponent of the source operand.

For an Infinity, QNaN or SNaN result, the target format's N-bit combination field is set as follows.

    For an Infinity result, the leftmost 5 bits are set to 0b11110, and the rightmost N-5 bits are set to zero.
    For a QNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to zero, and the rightmost N-6 bits are set to zero.
    For an SNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to one, and the rightmost N-6 bits are set to zero.

DFP Shift Significand Right Immediate [Quad] Z22-form

dscri     FRT,FRA,SH      (Rc=0)
dscri.    FRT,FRA,SH      (Rc=1)

    | 59 | FRT | FRA | SH | 98 | Rc |
      0    6     11    16   22   31

dscriq    FRTp,FRAp,SH    (Rc=0)
dscriq.   FRTp,FRAp,SH    (Rc=1)

    | 63 | FRTp | FRAp | SH | 98 | Rc |
      0    6      11     16   22   31

The significand of the DFP operand in FRA[p] is shifted right SH digits. For a NaN or infinity, all significand digits are in the trailing significand field. SH is a 6-bit unsigned binary integer. Digits shifted out of the units digit are lost. Zeros are supplied to the vacated positions on the left. The result is placed into FRT[p]. The sign of the result is the same as the sign of the source operand in FRA[p].

If the source operand in FRA[p] is a finite number, the exponent of the result is the same as the exponent of the source operand.

For an Infinity, QNaN or SNaN result, the target format's N-bit combination field is set as follows.

    For an Infinity result, the leftmost 5 bits are set to 0b11110, and the rightmost N-5 bits are set to zero.
    For a QNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to zero, and the rightmost N-6 bits are set to zero.
    For an SNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to one, and the rightmost N-6 bits are set to zero.
Special Registers Altered (both shift instructions):
    CR1 (if Rc=1)

5.6.7 DFP Instruction Summary

Mnemonic | Full Name | Form | Operands | SNaN Vs | SNaN G | Encoding | FPRF C | FPRF FPCC | Exception | FR\FI | IE | Rc
dadd | DFP Add | X | FRT, FRA, FRB | Y | N | RE | Y | Y | V O U X | Y | Y | Y
daddq | DFP Add Quad | X | FRTp, FRAp, FRBp | Y | N | RE | Y | Y | V O U X | Y | Y | Y
dsub | DFP Subtract | X | FRT, FRA, FRB | Y | N | RE | Y | Y | V O U X | Y | Y | Y
dsubq | DFP Subtract Quad | X | FRTp, FRAp, FRBp | Y | N | RE | Y | Y | V O U X | Y | Y | Y
dmul | DFP Multiply | X | FRT, FRA, FRB | Y | N | RE | Y | Y | V O U X | Y | Y | Y
dmulq | DFP Multiply Quad | X | FRTp, FRAp, FRBp | Y | N | RE | Y | Y | V O U X | Y | Y | Y
ddiv | DFP Divide | X | FRT, FRA, FRB | Y | N | RE | Y | Y | V Z O U X | Y | Y | Y
ddivq | DFP Divide Quad | X | FRTp, FRAp, FRBp | Y | N | RE | Y | Y | V Z O U X | Y | Y | Y
dcmpo | DFP Compare Ordered | X | BF, FRA, FRB | Y | - | - | N | Y | V | - | - | N
dcmpoq | DFP Compare Ordered Quad | X | BF, FRAp, FRBp | Y | - | - | N | Y | V | - | - | N
dcmpu | DFP Compare Unordered | X | BF, FRA, FRB | Y | - | - | N | Y | V | - | - | N
dcmpuq | DFP Compare Unordered Quad | X | BF, FRAp, FRBp | Y | - | - | N | Y | V | - | - | N
dtstdc | DFP Test Data Class | Z22 | BF, FRA, DCM | N | - | - | N | Y1 | | - | - | N
dtstdcq | DFP Test Data Class Quad | Z22 | BF, FRAp, DCM | N | - | - | N | Y1 | | - | - | N
dtstdg | DFP Test Data Group | Z22 | BF, FRA, DGM | N | - | - | N | Y1 | | - | - | N
dtstdgq | DFP Test Data Group Quad | Z22 | BF, FRAp, DGM | N | - | - | N | Y1 | | - | - | N
dtstex | DFP Test Exponent | X | BF, FRA, FRB | N | - | - | N | Y | | - | - | N
dtstexq | DFP Test Exponent Quad | X | BF, FRAp, FRBp | N | - | - | N | Y | | - | - | N
dtstsf | DFP Test Significance | X | BF, FRA(FIX), FRB | N | - | - | N | Y | | - | - | N
dtstsfq | DFP Test Significance Quad | X | BF, FRA(FIX), FRBp | N | - | - | N | Y | | - | - | N
dquai | DFP Quantize Immediate | Z23 | TE, FRT, FRB, RMC | Y | N | RE | Y | Y | V X | Y | Y | Y
dquaiq | DFP Quantize Immediate Quad | Z23 | TE, FRTp, FRBp, RMC | Y | N | RE | Y | Y | V X | Y | Y | Y
dqua | DFP Quantize | Z23 | FRT, FRA, FRB, RMC | Y | N | RE | Y | Y | V X | Y | Y | Y
dquaq | DFP Quantize Quad | Z23 | FRTp, FRAp, FRBp, RMC | Y | N | RE | Y | Y | V X | Y | Y | Y
drrnd | DFP Reround | Z23 | FRT, FRA(FIX), FRB, RMC | Y | N | RE | Y | Y | V X | Y | Y | Y
drrndq | DFP Reround Quad | Z23 | FRTp, FRA(FIX), FRBp, RMC | Y | N | RE | Y | Y | V X | Y | Y | Y
drintx | DFP Round To FP Integer With Inexact | Z23 | R, FRT, FRB, RMC | Y | N | RE | Y | Y | V X | Y | Y | Y
drintxq | DFP Round To FP Integer With Inexact Quad | Z23 | R, FRTp, FRBp, RMC | Y | N | RE | Y | Y | V X | Y | Y | Y
drintn | DFP Round To FP Integer Without Inexact | Z23 | R, FRT, FRB, RMC | Y | N | RE | Y | Y | V | Y# | Y | Y
drintnq | DFP Round To FP Integer Without Inexact Quad | Z23 | R, FRTp, FRBp, RMC | Y | N | RE | Y | Y | V | Y# | Y | Y
dctdp | DFP Convert To DFP Long | X | FRT, FRB (DFP Short) | N | Y | RE | Y | Y2 | | U | Y | Y
dctqpq | DFP Convert To DFP Extended | X | FRTp, FRB | Y | N | RE | Y | Y | V | Y# | Y | Y
drsp | DFP Round To DFP Short | X | FRT (DFP Short), FRB | N | Y | RE | Y | Y2 | O U X | Y | Y | Y
drdpq | DFP Round To DFP Long | X | FRTp, FRBp | Y | N | RE | Y | Y | V O U X | Y | Y | Y
dcffixq | DFP Convert From Fixed Quad | X | FRTp, FRB (FIX) | - | N | RE | Y | Y | | U | Y | Y
dctfix | DFP Convert To Fixed | X | FRT (FIX), FRB | Y | N | - | U | U | V X | Y | - | Y
dctfixq | DFP Convert To Fixed Quad | X | FRT (FIX), FRBp | Y | N | - | U | U | V X | Y | - | Y
ddedpd | DFP Decode DPD To BCD | X | SP, FRT(BCD), FRB | N | - | - | N | N | | - | - | Y
ddedpdq | DFP Decode DPD To BCD Quad | X | SP, FRTp(BCD), FRBp | N | - | - | N | N | | - | - | Y
denbcd | DFP Encode BCD To DPD | X | S, FRT, FRB (BCD) | - | N | RE | Y | Y | V | Y# | Y | Y
denbcdq | DFP Encode BCD To DPD Quad | X | S, FRTp, FRBp (BCD) | - | N | RE | Y | Y | V | Y# | Y | Y
dxex | DFP Extract Biased Exponent | X | FRT (FIX), FRB | N | N | - | N | N | | - | - | Y
dxexq | DFP Extract Biased Exponent Quad | X | FRT (FIX), FRBp | N | N | - | N | N | | - | - | Y
diex | DFP Insert Biased Exponent | X | FRT, FRA(FIX), FRB | N | Y | RE | N | N | | - | Y | Y
diexq | DFP Insert Biased Exponent Quad | X | FRTp, FRA(FIX), FRBp | N | Y | RE | N | N | | - | Y | Y
dscli | DFP Shift Significand Left Immediate | Z22 | FRT, FRA, SH | N | Y | RE | N | N | | - | - | Y
dscliq | DFP Shift Significand Left Immediate Quad | Z22 | FRTp, FRAp, SH | N | Y | RE | N | N | | - | - | Y
dscri | DFP Shift Significand Right Immediate | Z22 | FRT, FRA, SH | N | Y | RE | N | N | | - | - | Y
dscriq | DFP Shift Significand Right Immediate Quad | Z22 | FRTp, FRAp, SH | N | Y | RE | N | N | | - | - | Y

Figure 98. Decimal Floating-Point Instructions Summary

Explanation:
#    FI and FR are set to zeros for these instructions.
-    Not applicable.
1    A unique definition of the FPSCR_FPCC field is provided for the instruction.
2    These are the only instructions that may generate an SNaN and also set the FPSCR_FPRF field. Since the BFP FPSCR_FPRF field does not include a code for SNaN, these instructions cause the need for redefining the FPSCR_FPRF field for DFP.
DCM  A 6-bit immediate operand specifying the data-class mask.
DGM  A 6-bit immediate operand specifying the data-group mask.
G    An SNaN can be generated as the target operand.
IE   An ideal exponent is defined for the instruction.
FI   Setting of the FPSCR_FI flag.
FR   Setting of the FPSCR_FR flag.
N    No.
O    An overflow exception may be recognized.
Rc   The record bit, Rc, is provided to record FPSCR[0:3] in CR field 1.
RE   The trailing significand field is reencoded using preferred DPD encodings. The preferred DPD encodings are also used for propagated NaNs, or converted NaNs and infinities.
RMC  A 2-bit immediate operand specifying the rounding-mode control.
S    A one-bit immediate operand specifying if the operation is signed or unsigned.
SP   A two-bit immediate operand: one bit specifies if the operation is signed or unsigned and, for signed operations, another bit specifies which preferred plus sign code is generated.
U    (Exception column) An underflow exception may be recognized.
U    (Other columns) Undefined.
V    An invalid-operation exception may be recognized.
Vs   An input operand of SNaN causes an invalid-operation exception.
X    An inexact exception may be recognized.
Y    Yes.
Z    A zero-divide exception may be recognized.

Chapter 6. Vector Facility [Category: Vector]

6.1 Vector Facility Overview
6.2 Chapter Conventions
    6.2.1 Description of Instruction Operation
6.3 Vector Facility Registers
    6.3.1 Vector Registers
    6.3.2 Vector Status and Control Register
    6.3.3 VR Save Register
6.4 Vector Storage Access Operations
    6.4.1 Accessing Unaligned Storage Operands
6.5 Vector Integer Operations
    6.5.1 Integer Saturation
6.6 Vector Floating-Point Operations
    6.6.1 Floating-Point Overview
    6.6.2 Floating-Point Exceptions
        6.6.2.1 NaN Operand Exception
        6.6.2.2 Invalid Operation Exception
        6.6.2.3 Zero Divide Exception
        6.6.2.4 Log of Zero Exception
        6.6.2.5 Overflow Exception
        6.6.2.6 Underflow Exception
6.7 Vector Storage Access Instructions
    6.7.1 Storage Access Exceptions
    6.7.2 Vector Load Instructions
    6.7.3 Vector Store Instructions
    6.7.4 Vector Alignment Support Instructions
6.8 Vector Permute and Formatting Instructions
    6.8.1 Vector Pack and Unpack Instructions
    6.8.2 Vector Merge Instructions
    6.8.3 Vector Splat Instructions
    6.8.4 Vector Permute Instruction
    6.8.5 Vector Select Instruction
    6.8.6 Vector Shift Instructions
6.9 Vector Integer Instructions
    6.9.1 Vector Integer Arithmetic Instructions
        6.9.1.1 Vector Integer Add Instructions
        6.9.1.2 Vector Integer Subtract Instructions
        6.9.1.3 Vector Integer Multiply Instructions
        6.9.1.4 Vector Integer Multiply-Add/Sum Instructions
        6.9.1.5 Vector Integer Sum-Across Instructions
        6.9.1.6 Vector Integer Average Instructions
        6.9.1.7 Vector Integer Maximum and Minimum Instructions
    6.9.2 Vector Integer Compare Instructions
    6.9.3 Vector Logical Instructions
    6.9.4 Vector Integer Rotate and Shift Instructions
6.10 Vector Floating-Point Instruction Set
    6.10.1 Vector Floating-Point Arithmetic Instructions
    6.10.2 Vector Floating-Point Maximum and Minimum Instructions
    6.10.3 Vector Floating-Point Rounding and Conversion Instructions
    6.10.4 Vector Floating-Point Compare Instructions
    6.10.5 Vector Floating-Point Estimate Instructions
6.11 Vector Status and Control Register Instructions

6.1 Vector Facility Overview

This chapter describes the registers and instructions that make up the Vector Facility.

6.2 Chapter Conventions

6.2.1 Description of Instruction Operation

The following notation, in addition to that described in Section 1.3.2, is used in this chapter. Additional RTL functions are described in Appendix C.

Notation     Meaning
x?y:z        If the value of x is true, then the value of y, otherwise the value z.
+int         Integer addition.
+fp          Floating-point addition.
-fp          Floating-point subtraction.
×sui         Multiplication of a signed-integer (first operand) by an unsigned-integer (second operand).
×fp          Floating-point multiplication.
=int         Integer equals relation.
=fp          Floating-point equals relation.
<ui, ≤ui, >ui, ≥ui
             Unsigned-integer comparison relations.
<si, ≤si, >si, ≥si
             Signed-integer comparison relations.
<fp, ≤fp, >fp, ≥fp
             Floating-point comparison relations.
LENGTH( x )  Length of x, in bits. If x is the word "element", LENGTH( x ) is the length, in bits, of the element implied by the instruction mnemonic.
x << y       Result of shifting x left by y bits, filling vacated bits with zeros.
                 b ← LENGTH(x)
                 result ← (y < b) ? (x[y:b-1] || ʸ0) : ᵇ0
x >>ui y     Result of shifting x right by y bits, filling vacated bits with zeros.
                 b ← LENGTH(x)
                 result ← (y < b) ? (ʸ0 || x[0:(b-y)-1]) : ᵇ0
x >> y       Result of shifting x right by y bits, filling vacated bits with copies of bit 0 (sign bit) of x.
                 b ← LENGTH(x)
                 result ← (y [...]
Clamp(x, y, z)
             x is interpreted as a signed integer. If the value of x is less than y, then the value y is returned, else if the value of x is greater than z, the value z is returned, else the value x is returned.
                 if (x < y) then
                    result ← y
                    VSCR_SAT ← 1
                 else if (x > z) then
                    result ← z
                    VSCR_SAT ← 1
                 else result ← x
RoundToSPIntCeil(x)
             The value x if x is a single-precision floating-point integer; otherwise the smallest single-precision floating-point integer that is greater than x.
RoundToSPIntFloor(x)
             The value x if x is a single-precision floating-point integer; otherwise the largest single-precision floating-point integer that is less than x.
RoundToSPIntNear(x)
             The value x if x is a single-precision floating-point integer; otherwise the single-precision floating-point integer that is nearest in value to x (in case of a tie, the even single-precision floating-point integer is used).
RoundToSPIntTrunc(x)
             The value x if x is a single-precision floating-point integer; otherwise the largest single-precision floating-point integer that is less than x if x>0, or the smallest single-precision floating-point integer that is greater than x if x<0.
RoundToNearSP(x)
             The single-precision floating-point number that is nearest in value to the infinitely-precise floating-point intermediate result x (in case of a tie, the single-precision floating-point value with the least-significant bit equal to 0 is used).
ReciprocalEstimateSP(x)
             A single-precision floating-point estimate of the reciprocal of the single-precision floating-point number x.
ReciprocalSquareRootEstimateSP(x)
             A single-precision floating-point estimate [...]

[...]

    [...] >>ui ( shb || 0b000 )

The contents of VRA are shifted right by the number of bytes specified in (VRB)[121:124].
    - Bytes shifted out of byte 15 are lost.
    - Zeros are supplied to the vacated bytes on the left.

The result is placed into VRT.

Special Registers Altered:
    None

[...]

    do i=0 to 127 by 8
       t ← t & ((VRB)[i+5:i+7] = sh)
    if t=1 then VRT ← (VRA) >>ui sh
    else VRT ← undefined

The contents of VRA are shifted right by the number of bits specified in (VRB)[125:127].
    - Bits shifted out of bit 127 are lost.
    - Zeros are supplied to the vacated bits on the left.

The result is placed into VRT, except if, for any byte element in register VRB, the low-order 3 bits are not equal to the shift amount, then VRT is undefined.

Special Registers Altered:
    None

Programming Note
    A double-register shift by a dynamically specified number of bits (0-127) can be performed in six instructions. The following example shifts Vw || Vx left by the number of bits specified in Vy and places the high-order 128 bits of the result into Vz.

    vslo    Vt1,Vw,Vy     #shift high-order reg left
    vsl     Vt1,Vt1,Vy
    vsububm Vt3,V0,Vy     #adjust shift count ((V0)=0)
    vsro    Vt2,Vx,Vt3    #shift low-order reg right
    vsr     Vt2,Vt2,Vt3
    vor     Vz,Vt1,Vt2    #merge to get final result
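The two right-shift descriptions above (a shift by whole bytes taken from (VRB)[121:124], and a shift by bits taken from (VRB)[125:127]) can be modelled on plain integers. The following is a minimal sketch, assuming a vector register is represented as a 128-bit Python integer with bit 127 as the least-significant bit. The function names shift_right_by_octet and shift_right_by_bits are hypothetical, and None stands in for an undefined result.

```python
MASK128 = (1 << 128) - 1


def shift_right_by_octet(vra, vrb):
    """Shift vra right by the byte count held in bits 121:124 of vrb."""
    shb = (vrb >> 3) & 0xF  # bits 121:124 in big-endian bit numbering
    return (vra & MASK128) >> (8 * shb)


def shift_right_by_bits(vra, vrb):
    """Shift vra right by the bit count held in bits 125:127 of vrb.

    Returns None (modelling "undefined") unless the low-order 3 bits of
    every byte element of vrb equal the shift amount.
    """
    sh = vrb & 0x7  # bits 125:127
    for k in range(16):
        if (vrb >> (8 * k)) & 0x7 != sh:
            return None
    return (vra & MASK128) >> sh
```

Applying the byte shift and then the bit shift with the same count register gives a right shift by up to 127 bits, which is the pairing used for the low-order register in the Programming Note above.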
6.9 Vector Integer Instructions

6.9.1 Vector Integer Arithmetic Instructions

6.9.1.1 Vector Integer Add Instructions

Vector Add and Write Carry-Out Unsigned Word VX-form

vaddcuw   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 384 |
      0   6     11    16    21  31

    do i=0 to 127 by 32
       aop ← EXTZ((VRA)[i:i+31])
       bop ← EXTZ((VRB)[i:i+31])
       VRT[i:i+31] ← Chop( ( aop +int bop ) >>ui 32, 1 )

For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. The carry out of the 32-bit sum is zero-extended to 32 bits and placed into word element i of VRT.

Special Registers Altered:
    None

Vector Add Signed Byte Saturate VX-form

vaddsbs   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 768 |
      0   6     11    16    21  31

    do i=0 to 127 by 8
       aop ← EXTS((VRA)[i:i+7])
       bop ← EXTS((VRB)[i:i+7])
       VRT[i:i+7] ← Clamp( aop +int bop, -128, 127 )[24:31]

For each vector element i from 0 to 15, do the following.
    Signed-integer byte element i in VRA is added to signed-integer byte element i in VRB.
    - If the sum is greater than 127 the result saturates to 127.
    - If the sum is less than -128 the result saturates to -128.
    The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
    SAT

Vector Add Signed Halfword Saturate VX-form

vaddshs   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 832 |
      0   6     11    16    21  31

    do i=0 to 127 by 16
       aop ← EXTS((VRA)[i:i+15])
       bop ← EXTS((VRB)[i:i+15])
       VRT[i:i+15] ← Clamp( aop +int bop, -2^15, 2^15-1 )[16:31]

For each vector element i from 0 to 7, do the following.
    Signed-integer halfword element i in VRA is added to signed-integer halfword element i in VRB.
    - If the sum is greater than 2^15-1 the result saturates to 2^15-1.
    - If the sum is less than -2^15 the result saturates to -2^15.
    The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
    SAT

Vector Add Signed Word Saturate VX-form

vaddsws   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 896 |
      0   6     11    16    21  31

    do i=0 to 127 by 32
       aop ← EXTS((VRA)[i:i+31])
       bop ← EXTS((VRB)[i:i+31])
       VRT[i:i+31] ← Clamp( aop +int bop, -2^31, 2^31-1 )

For each vector element i from 0 to 3, do the following.
    Signed-integer word element i in VRA is added to signed-integer word element i in VRB.
    - If the sum is greater than 2^31-1 the result saturates to 2^31-1.
    - If the sum is less than -2^31 the result saturates to -2^31.
    The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
    SAT

Vector Add Unsigned Byte Modulo VX-form

vaddubm   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 0 |
      0   6     11    16    21  31

    do i=0 to 127 by 8
       aop ← EXTZ((VRA)[i:i+7])
       bop ← EXTZ((VRB)[i:i+7])
       VRT[i:i+7] ← Chop( aop +int bop, 8 )

For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB.
    The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
    None

Programming Note
    vaddubm can be used for unsigned or signed-integers.

Vector Add Unsigned Halfword Modulo VX-form

vadduhm   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 64 |
      0   6     11    16    21  31

    do i=0 to 127 by 16
       aop ← EXTZ((VRA)[i:i+15])
       bop ← EXTZ((VRB)[i:i+15])
       VRT[i:i+15] ← Chop( aop +int bop, 16 )

For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB.
    The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
    None

Programming Note
    vadduhm can be used for unsigned or signed-integers.

Vector Add Unsigned Word Modulo VX-form

vadduwm   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 128 |
      0   6     11    16    21  31

    do i=0 to 127 by 32
       aop ← EXTZ((VRA)[i:i+31])
       bop ← EXTZ((VRB)[i:i+31])
       temp ← aop +int bop
       VRT[i:i+31] ← Chop( temp, 32 )

For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB.
    The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
    None

Programming Note
    vadduwm can be used for unsigned or signed-integers.

Vector Add Unsigned Byte Saturate VX-form

vaddubs   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 512 |
      0   6     11    16    21  31

    do i=0 to 127 by 8
       aop ← EXTZ((VRA)[i:i+7])
       bop ← EXTZ((VRB)[i:i+7])
       VRT[i:i+7] ← Clamp( aop +int bop, 0, 255 )[24:31]

For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB.
    - If the sum is greater than 255 the result saturates to 255.
    The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
    SAT

Vector Add Unsigned Halfword Saturate VX-form

vadduhs   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 576 |
      0   6     11    16    21  31

    do i=0 to 127 by 16
       aop ← EXTZ((VRA)[i:i+15])
       bop ← EXTZ((VRB)[i:i+15])
       VRT[i:i+15] ← Clamp( aop +int bop, 0, 2^16-1 )[16:31]

For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB.
    - If the sum is greater than 2^16-1 the result saturates to 2^16-1.
    The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
    SAT

Vector Add Unsigned Word Saturate VX-form

vadduws   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 640 |
      0   6     11    16    21  31

    do i=0 to 127 by 32
       aop ← EXTZ((VRA)[i:i+31])
       bop ← EXTZ((VRB)[i:i+31])
       VRT[i:i+31] ← Clamp( aop +int bop, 0, 2^32-1 )

For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB.
    - If the sum is greater than 2^32-1 the result saturates to 2^32-1.
    The low-order 32 bits of the result are placed into word element i of VRT.
Special Registers Altered:
    SAT

6.9.1.2 Vector Integer Subtract Instructions

Vector Subtract and Write Carry-Out Unsigned Word VX-form

vsubcuw   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1408 |
      0   6     11    16    21   31

    do i=0 to 127 by 32
       aop ← (VRA)[i:i+31]
       bop ← (VRB)[i:i+31]
       temp ← (EXTZ(aop) +int EXTZ(¬bop) +int 1) >> 32
       VRT[i:i+31] ← temp & 0x0000_0001

For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA. The complement of the borrow out of bit 0 of the 32-bit difference is zero-extended to 32 bits and placed into word element i of VRT.

Special Registers Altered:
    None

Vector Subtract Signed Byte Saturate VX-form

vsubsbs   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1792 |
      0   6     11    16    21   31

    do i=0 to 127 by 8
       aop ← EXTS((VRA)[i:i+7])
       bop ← EXTS((VRB)[i:i+7])
       VRT[i:i+7] ← Clamp( aop +int ¬bop +int 1, -128, 127 )[24:31]

For each vector element i from 0 to 15, do the following.
    Signed-integer byte element i in VRB is subtracted from signed-integer byte element i in VRA.
    - If the intermediate result is greater than 127 the result saturates to 127.
    - If the intermediate result is less than -128 the result saturates to -128.
    The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
    SAT

Vector Subtract Signed Halfword Saturate VX-form

vsubshs   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1856 |
      0   6     11    16    21   31

    do i=0 to 127 by 16
       aop ← EXTS((VRA)[i:i+15])
       bop ← EXTS((VRB)[i:i+15])
       VRT[i:i+15] ← Clamp( aop +int ¬bop +int 1, -2^15, 2^15-1 )[16:31]

For each vector element i from 0 to 7, do the following.
    Signed-integer halfword element i in VRB is subtracted from signed-integer halfword element i in VRA.
    - If the intermediate result is greater than 2^15-1 the result saturates to 2^15-1.
    - If the intermediate result is less than -2^15 the result saturates to -2^15.
    The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
    SAT

Vector Subtract Signed Word Saturate VX-form

vsubsws   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1920 |
      0   6     11    16    21   31

    do i=0 to 127 by 32
       aop ← EXTS((VRA)[i:i+31])
       bop ← EXTS((VRB)[i:i+31])
       VRT[i:i+31] ← Clamp( aop +int ¬bop +int 1, -2^31, 2^31-1 )

For each vector element i from 0 to 3, do the following.
    Signed-integer word element i in VRB is subtracted from signed-integer word element i in VRA.
    - If the intermediate result is greater than 2^31-1 the result saturates to 2^31-1.
    - If the intermediate result is less than -2^31 the result saturates to -2^31.
    The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
    SAT

Vector Subtract Unsigned Byte Modulo VX-form

vsububm   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1024 |
      0   6     11    16    21   31

    do i=0 to 127 by 8
       aop ← EXTZ((VRA)[i:i+7])
       bop ← EXTZ((VRB)[i:i+7])
       VRT[i:i+7] ← Chop( aop +int ¬bop +int 1, 8 )

For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRB is subtracted from unsigned-integer byte element i in VRA. The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
    None

Vector Subtract Unsigned Halfword Modulo VX-form

vsubuhm   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1088 |
      0   6     11    16    21   31

    do i=0 to 127 by 16
       aop ← EXTZ((VRA)[i:i+15])
       bop ← EXTZ((VRB)[i:i+15])
       VRT[i:i+15] ← Chop( aop +int ¬bop +int 1, 16 )

For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRB is subtracted from unsigned-integer halfword element i in VRA. The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
    None

Vector Subtract Unsigned Word Modulo VX-form

vsubuwm   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1152 |
      0   6     11    16    21   31

    do i=0 to 127 by 32
       aop ← EXTZ((VRA)[i:i+31])
       bop ← EXTZ((VRB)[i:i+31])
       VRT[i:i+31] ← Chop( aop +int ¬bop +int 1, 32 )

For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA. The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
    None

Vector Subtract Unsigned Byte Saturate VX-form

vsububs   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1536 |
      0   6     11    16    21   31

    do i=0 to 127 by 8
       aop ← EXTZ((VRA)[i:i+7])
       bop ← EXTZ((VRB)[i:i+7])
       VRT[i:i+7] ← Clamp( aop +int ¬bop +int 1, 0, 255 )[24:31]

For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRB is subtracted from unsigned-integer byte element i in VRA. If the intermediate result is less than 0 the result saturates to 0. The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
    SAT

Vector Subtract Unsigned Halfword Saturate VX-form

vsubuhs   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1600 |
      0   6     11    16    21   31

    do i=0 to 127 by 16
       aop ← EXTZ((VRA)[i:i+15])
       bop ← EXTZ((VRB)[i:i+15])
       VRT[i:i+15] ← Clamp( aop +int ¬bop +int 1, 0, 2^16-1 )[16:31]

For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRB is subtracted from unsigned-integer halfword element i in VRA. If the intermediate result is less than 0 the result saturates to 0. The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
    SAT

Vector Subtract Unsigned Word Saturate VX-form

vsubuws   VRT,VRA,VRB

    | 4 | VRT | VRA | VRB | 1664 |
      0   6     11    16    21   31

    do i=0 to 127 by 32
       aop ← EXTZ((VRA)[i:i+31])
       bop ← EXTZ((VRB)[i:i+31])
       VRT[i:i+31] ← Clamp( aop +int ¬bop +int 1, 0, 2^32-1 )

For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA.
    - If the intermediate result is less than 0 the result saturates to 0.
    The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
    SAT
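The saturating, modulo, and carry-out behaviour described for the add and subtract instructions can be illustrated with a small element-wise model. This is a sketch under stated assumptions rather than a definition of the instructions: a vector is represented as a list of unsigned element values, the saturation indication is returned as a boolean instead of being written to VSCR_SAT, subtraction is written as x - y rather than as the RTL's aop +int ¬bop +int 1, and all function names are hypothetical.

```python
def clamp(x, lo, hi):
    """Clamp x to [lo, hi]; the flag models VSCR_SAT being set."""
    if x < lo:
        return lo, True
    if x > hi:
        return hi, True
    return x, False


def to_signed(value, bits):
    """Interpret an unsigned bits-wide value as two's complement (EXTS)."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def add_signed_saturate(a, b, bits):
    """Signed saturating add, as described for vaddsbs/vaddshs/vaddsws."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    out, sat = [], False
    for x, y in zip(a, b):
        r, s = clamp(to_signed(x, bits) + to_signed(y, bits), lo, hi)
        out.append(r & mask)  # keep the low-order bits of the result
        sat = sat or s
    return out, sat


def sub_unsigned_saturate(a, b, bits):
    """Unsigned saturating subtract, as described for vsububs/vsubuhs/vsubuws."""
    out, sat = [], False
    for x, y in zip(a, b):
        r, s = clamp(x - y, 0, (1 << bits) - 1)
        out.append(r)
        sat = sat or s
    return out, sat


def add_modulo(a, b, bits):
    """Modulo add, as described for vaddubm/vadduhm/vadduwm."""
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


def add_carry_out_word(a, b):
    """Carry out of each 32-bit sum, as described for vaddcuw."""
    return [(x + y) >> 32 for x, y in zip(a, b)]


def sub_carry_out_word(a, b):
    """Complement of the borrow of each 32-bit difference (vsubcuw)."""
    return [((x + (~y & 0xFFFFFFFF) + 1) >> 32) & 1 for x, y in zip(a, b)]
```

For example, adding the byte elements 0x7F and 0x01 with add_signed_saturate yields 0x7F with the saturation flag set, while add_modulo on the same inputs wraps to 0x80.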
6.9.1.3 Vector Integer Multiply Instructions

Vector Multiply Even Signed Byte VX-form

vmulesb VRT,VRA,VRB

4     VRT   VRA   VRB   776
0     6     11    16    21    31

do i=0 to 127 by 16
   prod ← EXTS((VRA)_{i:i+7}) ×si EXTS((VRB)_{i:i+7})
   VRT_{i:i+15} ← Chop( prod, 16 )

For each vector element i from 0 to 7, do the following.

Signed-integer byte element i×2 in VRA is multiplied by signed-integer byte element i×2 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.

Special Registers Altered: None

Vector Multiply Even Signed Halfword VX-form

vmulesh VRT,VRA,VRB

4     VRT   VRA   VRB   840
0     6     11    16    21    31

do i=0 to 127 by 32
   prod ← EXTS((VRA)_{i:i+15}) ×si EXTS((VRB)_{i:i+15})
   VRT_{i:i+31} ← Chop( prod, 32 )

For each vector element i from 0 to 3, do the following.

Signed-integer halfword element i×2 in VRA is multiplied by signed-integer halfword element i×2 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.

Special Registers Altered: None

Vector Multiply Even Unsigned Byte VX-form

vmuleub VRT,VRA,VRB

4     VRT   VRA   VRB   520
0     6     11    16    21    31

do i=0 to 127 by 16
   prod ← EXTZ((VRA)_{i:i+7}) ×ui EXTZ((VRB)_{i:i+7})
   VRT_{i:i+15} ← Chop(prod, 16)

For each vector element i from 0 to 7, do the following.

Unsigned-integer byte element i×2 in VRA is multiplied by unsigned-integer byte element i×2 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.

Vector Multiply Even Unsigned Halfword VX-form

vmuleuh VRT,VRA,VRB

4     VRT   VRA   VRB   584
0     6     11    16    21    31

do i=0 to 127 by 32
   prod ← EXTZ((VRA)_{i:i+15}) ×ui EXTZ((VRB)_{i:i+15})
   VRT_{i:i+31} ← Chop(prod, 32)

For each vector element i from 0 to 3, do the following.

Unsigned-integer halfword element i×2 in VRA is multiplied by unsigned-integer halfword element i×2 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.
Special Registers Altered (vmuleub and vmuleuh): None

Vector Multiply Odd Signed Byte VX-form

vmulosb VRT,VRA,VRB

4     VRT   VRA   VRB   264
0     6     11    16    21    31

do i=0 to 127 by 16
   prod ← EXTS((VRA)_{i+8:i+15}) ×si EXTS((VRB)_{i+8:i+15})
   VRT_{i:i+15} ← Chop( prod, 16 )

For each vector element i from 0 to 7, do the following.

Signed-integer byte element i×2+1 in VRA is multiplied by signed-integer byte element i×2+1 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.

Special Registers Altered: None

Vector Multiply Odd Signed Halfword VX-form

vmulosh VRT,VRA,VRB

4     VRT   VRA   VRB   328
0     6     11    16    21    31

do i=0 to 127 by 32
   prod ← EXTS((VRA)_{i+16:i+31}) ×si EXTS((VRB)_{i+16:i+31})
   VRT_{i:i+31} ← Chop( prod, 32 )

For each vector element i from 0 to 3, do the following.

Signed-integer halfword element i×2+1 in VRA is multiplied by signed-integer halfword element i×2+1 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.

Special Registers Altered: None

Vector Multiply Odd Unsigned Byte VX-form

vmuloub VRT,VRA,VRB

4     VRT   VRA   VRB   8
0     6     11    16    21    31

do i=0 to 127 by 16
   prod ← EXTZ((VRA)_{i+8:i+15}) ×ui EXTZ((VRB)_{i+8:i+15})
   VRT_{i:i+15} ← Chop( prod, 16 )

For each vector element i from 0 to 7, do the following.

Unsigned-integer byte element i×2+1 in VRA is multiplied by unsigned-integer byte element i×2+1 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.

Special Registers Altered: None

Vector Multiply Odd Unsigned Halfword VX-form

vmulouh VRT,VRA,VRB

4     VRT   VRA   VRB   72
0     6     11    16    21    31

do i=0 to 127 by 32
   prod ← EXTZ((VRA)_{i+16:i+31}) ×ui EXTZ((VRB)_{i+16:i+31})
   VRT_{i:i+31} ← Chop( prod, 32 )

For each vector element i from 0 to 3, do the following.

Unsigned-integer halfword element i×2+1 in VRA is multiplied by unsigned-integer halfword element i×2+1 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.

Special Registers Altered: None
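The even/odd element selection used by the multiply instructions above can be sketched as follows. This is a minimal Python model under stated assumptions: registers are lists of unsigned element values with element 0 first, and the names `sign_extend` and `multiply_even_odd` are hypothetical helpers, not architected names.

```python
def sign_extend(value, bits):
    """Interpret the low-order `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_even_odd(a, b, bits, odd, signed):
    """Multiply the even (or odd) numbered elements of a and b.

    Each product is twice as wide as a source element, so the result
    has half as many elements as the inputs.
    """
    out_mask = (1 << (2 * bits)) - 1
    start = 1 if odd else 0          # element i*2 or i*2+1
    result = []
    for k in range(start, len(a), 2):
        x, y = a[k], b[k]
        if signed:
            x, y = sign_extend(x, bits), sign_extend(y, bits)
        result.append((x * y) & out_mask)
    return result


a = [0xFF, 2, 3, 4]
b = [2, 5, 0xFF, 6]
print(multiply_even_odd(a, b, 8, odd=False, signed=True))   # elements 0 and 2, signed
print(multiply_even_odd(a, b, 8, odd=True, signed=False))   # elements 1 and 3, unsigned
```

Because the destination element is twice the width of the source elements, the product always fits and no saturation is involved.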
6.9.1.4 Vector Integer Multiply-Add/Sum Instructions

Vector Multiply-High-Add Signed Halfword Saturate VA-form

vmhaddshs VRT,VRA,VRB,VRC

4     VRT   VRA   VRB   VRC   32
0     6     11    16    21    26    31

do i=0 to 127 by 16
   prod ← EXTS((VRA)_{i:i+15}) ×si EXTS((VRB)_{i:i+15})
   sum ← (prod >>si 15) +int EXTS((VRC)_{i:i+15})
   VRT_{i:i+15} ← Clamp(sum, -2^{15}, 2^{15}-1)_{16:31}

For each vector element i from 0 to 7, do the following.

Signed-integer halfword element i in VRA is multiplied by signed-integer halfword element i in VRB, producing a 32-bit signed-integer product. Bits 0:16 of the product are added to signed-integer halfword element i in VRC.

- If the intermediate result is greater than 2^{15}-1 the result saturates to 2^{15}-1.
- If the intermediate result is less than -2^{15} the result saturates to -2^{15}.

The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered: SAT

Vector Multiply-High-Round-Add Signed Halfword Saturate VA-form

vmhraddshs VRT,VRA,VRB,VRC

4     VRT   VRA   VRB   VRC   33
0     6     11    16    21    26    31

do i=0 to 127 by 16
   prod ← EXTS((VRA)_{i:i+15}) ×si EXTS((VRB)_{i:i+15})
   sum ← ((prod +int 0x0000_4000) >>si 15) +int EXTS((VRC)_{i:i+15})
   VRT_{i:i+15} ← Clamp(sum, -2^{15}, 2^{15}-1)_{16:31}

For each vector element i from 0 to 7, do the following.

Signed-integer halfword element i in VRA is multiplied by signed-integer halfword element i in VRB, producing a 32-bit signed-integer product. The value 0x0000_4000 is added to the product, producing a 32-bit signed-integer sum. Bits 0:16 of the sum are added to signed-integer halfword element i in VRC.

- If the intermediate result is greater than 2^{15}-1 the result saturates to 2^{15}-1.
- If the intermediate result is less than -2^{15} the result saturates to -2^{15}.

The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered: SAT

Vector Multiply-Low-Add Unsigned Halfword Modulo VA-form

vmladduhm VRT,VRA,VRB,VRC

4     VRT   VRA   VRB   VRC   34
0     6     11    16    21    26    31

do i=0 to 127 by 16
   prod ← EXTZ((VRA)_{i:i+15}) ×ui EXTZ((VRB)_{i:i+15})
   sum ← Chop( prod, 16 ) +int (VRC)_{i:i+15}
   VRT_{i:i+15} ← Chop( sum, 16 )

For each vector element i from 0 to 7, do the following.

Unsigned-integer halfword element i in VRA is multiplied by unsigned-integer halfword element i in VRB, producing a 32-bit unsigned-integer product. The low-order 16 bits of the product are added to unsigned-integer halfword element i in VRC.

The low-order 16 bits of the sum are placed into halfword element i of VRT.

Special Registers Altered: None

Programming Note
vmladduhm can be used for unsigned or signed integers.

Vector Multiply-Sum Unsigned Byte Modulo VA-form

vmsumubm VRT,VRA,VRB,VRC

4     VRT   VRA   VRB   VRC   36
0     6     11    16    21    26    31

do i=0 to 127 by 32
   temp ← EXTZ((VRC)_{i:i+31})
   do j=0 to 31 by 8
      prod ← EXTZ((VRA)_{i+j:i+j+7}) ×ui EXTZ((VRB)_{i+j:i+j+7})
      temp ← temp +int prod
   VRT_{i:i+31} ← Chop( temp, 32 )

For each word element in VRT the following operations are performed, in the order shown.

- Each of the four unsigned-integer byte elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer byte element in VRB, producing an unsigned-integer halfword product.
- The sum of these four unsigned-integer halfword products is added to the unsigned-integer word element in VRC.
- The unsigned-integer word result is placed into the corresponding word element of VRT.

Special Registers Altered: None
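The two modulo operations described above can be modelled in a few lines of Python. This is a sketch under assumptions: registers are lists of unsigned element values (8 halfwords, 16 bytes, or 4 words, element 0 first), and the function names are hypothetical. The first function also illustrates why the Programming Note holds: the low-order 16 bits of a product and sum are the same whether the operands are read as signed or unsigned.

```python
def mladd_halfword_modulo(a, b, c):
    """Model of the multiply-low-add step: low 16 bits of a*b, plus c, chopped to 16 bits."""
    result = []
    for x, y, z in zip(a, b, c):
        prod = x * y
        total = (prod & 0xFFFF) + z      # Chop(prod, 16) +int VRC element
        result.append(total & 0xFFFF)    # Chop(sum, 16)
    return result


def msum_unsigned_byte_modulo(a, b, c):
    """Model of the multiply-sum step: four byte products per word, added to c, mod 2**32."""
    result = []
    for w in range(4):
        temp = c[w]
        for j in range(4):
            temp += a[4 * w + j] * b[4 * w + j]
        result.append(temp & 0xFFFFFFFF)  # Chop(temp, 32)
    return result


# 0xFFFE is -2 when read as a signed halfword: (-2 * 3) + 1 = -5 = 0xFFFB.
print([hex(v) for v in mladd_halfword_modulo([0xFFFE], [3], [1])])

a = [1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
b = [1] * 16
c = [10, 0, 0, 0xFFFFFFFF]
print(msum_unsigned_byte_modulo(a, b, c))  # last word wraps around to 0
```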
Vector Multiply-Sum Mixed Byte Modulo VA-form

vmsummbm VRT,VRA,VRB,VRC

4     VRT   VRA   VRB   VRC   37
0     6     11    16    21    26    31

do i=0 to 127 by 32
   temp ← (VRC)_{i:i+31}
   do j=0 to 31 by 8
      prod_{0:15} ← (VRA)_{i+j:i+j+7} ×sui (VRB)_{i+j:i+j+7}
      temp ← temp +int EXTS(prod)
   VRT_{i:i+31} ← temp

For each word element in VRT the following operations are performed, in the order shown.

- Each of the four signed-integer byte elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer byte element in VRB, producing a signed-integer product.
- The sum of these four signed-integer halfword products is added to the signed-integer word element in VRC.
- The signed-integer result is placed into the corresponding word element of VRT.

Vector Multiply-Sum Signed Halfword Modulo VA-form

vmsumshm VRT,VRA,VRB,VRC

4     VRT   VRA   VRB   VRC   40
0     6     11    16    21    26    31

do i=0 to 127 by 32
   temp ← (VRC)_{i:i+31}
   do j=0 to 31 by 16
      prod_{0:31} ← (VRA)_{i+j:i+j+15} ×si (VRB)_{i+j:i+j+15}
      temp ← temp +int prod
   VRT_{i:i+31} ← temp

For each word element in VRT the following operations are performed, in the order shown.

- Each of the two signed-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding signed-integer halfword element in VRB, producing a signed-integer product.
- The sum of these two signed-integer word products is added to the signed-integer word element in VRC.
- The signed-integer word result is placed into the corresponding word element of VRT.
Special Registers Altered (vmsummbm and vmsumshm): None

Vector Multiply-Sum Signed Halfword Saturate VA-form

vmsumshs VRT,VRA,VRB,VRC

4     VRT   VRA   VRB   VRC   41
0     6     11    16    21    26    31

do i=0 to 127 by 32
   temp ← EXTS((VRC)_{i:i+31})
   do j=0 to 31 by 16
      prod ← EXTS((VRA)_{i+j:i+j+15}) ×si EXTS((VRB)_{i+j:i+j+15})
      temp ← temp +int prod
   VRT_{i:i+31} ← Clamp(temp, -2^{31}, 2^{31}-1)

For each word element in VRT the following operations are performed, in the order shown.

- Each of the two signed-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding signed-integer halfword element in VRB, producing a signed-integer product.
- The sum of these two signed-integer word products is added to the signed-integer word element in VRC.
- If the intermediate result is greater than 2^{31}-1 the result saturates to 2^{31}-1 and if it is less than -2^{31} it saturates to -2^{31}.
- The result is placed into the corresponding word element of VRT.

Special Registers Altered: SAT

Vector Multiply-Sum Unsigned Halfword Modulo VA-form

vmsumuhm VRT,VRA,VRB,VRC

4     VRT   VRA   VRB   VRC   38
0     6     11    16    21    26    31

do i=0 to 127 by 32
   temp ← EXTZ((VRC)_{i:i+31})
   do j=0 to 31 by 16
      prod ← EXTZ((VRA)_{i+j:i+j+15}) ×ui EXTZ((VRB)_{i+j:i+j+15})
      temp ← temp +int prod
   VRT_{i:i+31} ← Chop( temp, 32 )

For each word element in VRT the following operations are performed, in the order shown.

- Each of the two unsigned-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer halfword element in VRB, producing an unsigned-integer word product.
- The sum of these two unsigned-integer word products is added to the unsigned-integer word element in VRC.
- The unsigned-integer result is placed into the corresponding word element of VRT.

Special Registers Altered: None
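The halfword multiply-sum forms differ only in how operands are extended (signed or unsigned) and in how the final word is reduced (Clamp or Chop). The Python sketch below folds these variants into one hypothetical function, `msum_halfword`, with two flags; it assumes registers are lists of unsigned element values (8 halfwords for a and b, 4 words for c) and returns a flag in place of the SAT update.

```python
def sign_extend(value, bits):
    """Interpret the low-order `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def msum_halfword(a, b, c, signed, saturate):
    """Two halfword products per word are added to the matching word of c.

    Returns (result_words, saturated). With saturate=False the sum is
    reduced modulo 2**32 and `saturated` is always False.
    """
    if signed:
        lo, hi = -(1 << 31), (1 << 31) - 1
    else:
        lo, hi = 0, (1 << 32) - 1
    result, saturated = [], False
    for w in range(4):
        temp = sign_extend(c[w], 32) if signed else c[w]
        for j in range(2):
            x, y = a[2 * w + j], b[2 * w + j]
            if signed:
                x, y = sign_extend(x, 16), sign_extend(y, 16)
            temp += x * y
        if saturate:
            clamped = min(max(temp, lo), hi)
            saturated = saturated or clamped != temp
            temp = clamped
        result.append(temp & 0xFFFFFFFF)
    return result, saturated


big = [0xFFFF, 0xFFFF, 0, 0, 0, 0, 0, 0]
acc = [0xFFFFFFFF, 0, 0, 0]
print(msum_halfword(big, big, acc, signed=False, saturate=False))  # wraps
print(msum_halfword(big, big, acc, signed=False, saturate=True))   # clamps
```

Python integers do not overflow, so the intermediate sum is exact and the clamp or mask is applied once at the end, mirroring the order of operations in the RTL.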
Vector Multiply-Sum Unsigned Halfword Saturate VA-form

vmsumuhs VRT,VRA,VRB,VRC

4     VRT   VRA   VRB   VRC   39
0     6     11    16    21    26    31

do i=0 to 127 by 32
   temp ← EXTZ((VRC)_{i:i+31})
   do j=0 to 31 by 16
      prod ← EXTZ((VRA)_{i+j:i+j+15}) ×ui EXTZ((VRB)_{i+j:i+j+15})
      temp ← temp +int prod
   VRT_{i:i+31} ← Clamp(temp, 0, 2^{32}-1)

For each word element in VRT the following operations are performed, in the order shown.

- Each of the two unsigned-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer halfword element in VRB, producing an unsigned-integer product.
- The sum of these two unsigned-integer word products is added to the unsigned-integer word element in VRC.
- If the intermediate result is greater than 2^{32}-1 the result saturates to 2^{32}-1.
- The result is placed into the corresponding word element of VRT.

Special Registers Altered: SAT

6.9.1.5 Vector Integer Sum-Across Instructions

Vector Sum across Signed Word Saturate VX-form

vsumsws VRT,VRA,VRB

4     VRT   VRA   VRB   1928
0     6     11    16    21    31

temp ← EXTS((VRB)_{96:127})
do i=0 to 127 by 32
   temp ← temp +int EXTS((VRA)_{i:i+31})
VRT_{0:31} ← 0x0000_0000
VRT_{32:63} ← 0x0000_0000
VRT_{64:95} ← 0x0000_0000
VRT_{96:127} ← Clamp(temp, -2^{31}, 2^{31}-1)

The sum of the four signed-integer word elements in VRA is added to signed-integer word element 3 of VRB.

- If the intermediate result is greater than 2^{31}-1 the result saturates to 2^{31}-1.

Vector Sum across Half Signed Word Saturate VX-form

vsum2sws VRT,VRA,VRB

4     VRT   VRA   VRB   1672
0     6     11    16    21    31

do i=0 to 127 by 64
   temp ← EXTS((VRB)_{i+32:i+63})
   do j=0 to 63 by 32
      temp ← temp +int EXTS((VRA)_{i+j:i+j+31})
   VRT_{i:i+63} ← 0x0000_0000 || Clamp(temp, -2^{31}, 2^{31}-1)

Word elements 0 and 2 of VRT are set to 0.

The sum of the signed-integer word elements 0 and 1 in VRA is added to the signed-integer word element in bits 32:63 of VRB.

- If the intermediate result is greater than 2^{31}-1 the result saturates to 2^{31}-1.
- If the intermediate result is less than -2^{31} the result saturates to -2^{31}.

The low-order 32 bits of the result are placed into word element 1 of VRT.

The sum of signed-integer word elements 2 and 3 in VRA is added to the signed-integer word element in bits 96:127 of VRB.

- If the intermediate result is greater than 2^{31}-1 the result saturates to 2^{31}-1.
- If the intermediate result is less than -2^{31} the result saturates to -2^{31}.

The low-order 32 bits of the result are placed into word element 3 of VRT.

Special Registers Altered: SAT

vsumsws (continued):

- If the intermediate result is less than -2^{31} the result saturates to -2^{31}.

The low-order 32 bits of the result are placed into word element 3 of VRT. Word elements 0 to 2 of VRT are set to 0.

Special Registers Altered: SAT

Vector Sum across Quarter Signed Byte Saturate VX-form

vsum4sbs VRT,VRA,VRB

4     VRT   VRA   VRB   1800
0     6     11    16    21    31

do i=0 to 127 by 32
   temp ← EXTS((VRB)_{i:i+31})
   do j=0 to 31 by 8
      temp ← temp +int EXTS((VRA)_{i+j:i+j+7})
   VRT_{i:i+31} ← Clamp(temp, -2^{31}, 2^{31}-1)

For each vector element i from 0 to 3, do the following.

The sum of the four signed-integer byte elements contained in word element i of VRA is added to signed-integer word element i in VRB.

- If the intermediate result is greater than 2^{31}-1 the result saturates to 2^{31}-1.
- If the intermediate result is less than -2^{31} the result saturates to -2^{31}.

Vector Sum across Quarter Signed Halfword Saturate VX-form

vsum4shs VRT,VRA,VRB

4     VRT   VRA   VRB   1608
0     6     11    16    21    31

do i=0 to 127 by 32
   temp ← EXTS((VRB)_{i:i+31})
   do j=0 to 31 by 16
      temp ← temp +int EXTS((VRA)_{i+j:i+j+15})
   VRT_{i:i+31} ← Clamp(temp, -2^{31}, 2^{31}-1)

For each vector element i from 0 to 3, do the following.

The sum of the two signed-integer halfword elements contained in word element i of VRA is added to signed-integer word element i in VRB.

- If the intermediate result is greater than 2^{31}-1 the result saturates to 2^{31}-1.
- If the intermediate result is less than -2^{31}
the result saturates to -2^{31}.

The low-order 32 bits of the result are placed into the corresponding word element of VRT.

Special Registers Altered: SAT

vsum4sbs (continued):

The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered: SAT

Vector Sum across Quarter Unsigned Byte Saturate VX-form

vsum4ubs VRT,VRA,VRB

4     VRT   VRA   VRB   1544
0     6     11    16    21    31

do i=0 to 127 by 32
   temp ← EXTZ((VRB)_{i:i+31})
   do j=0 to 31 by 8
      temp ← temp +int EXTZ((VRA)_{i+j:i+j+7})
   VRT_{i:i+31} ← Clamp( temp, 0, 2^{32}-1 )

For each vector element i from 0 to 3, do the following.

The sum of the four unsigned-integer byte elements contained in word element i of VRA is added to unsigned-integer word element i in VRB.

- If the intermediate result is greater than 2^{32}-1 it saturates to 2^{32}-1.

The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered: SAT

6.9.1.6 Vector Integer Average Instructions

Vector Average Signed Byte VX-form

vavgsb VRT,VRA,VRB

4     VRT   VRA   VRB   1282
0     6     11    16    21    31

do i=0 to 127 by 8
   aop ← EXTS((VRA)_{i:i+7})
   bop ← EXTS((VRB)_{i:i+7})
   VRT_{i:i+7} ← Chop(( aop +int bop +int 1 ) >> 1, 8)

For each vector element i from 0 to 15, do the following.

Signed-integer byte element i in VRA is added to signed-integer byte element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

The low-order 8 bits of the result are placed into byte element i of VRT.

Vector Average Signed Halfword VX-form

vavgsh VRT,VRA,VRB

4     VRT   VRA   VRB   1346
0     6     11    16    21    31

do i=0 to 127 by 16
   aop ← EXTS((VRA)_{i:i+15})
   bop ← EXTS((VRB)_{i:i+15})
   VRT_{i:i+15} ← Chop(( aop +int bop +int 1 ) >> 1, 16)

For each vector element i from 0 to 7, do the following.

Signed-integer halfword element i in VRA is added to signed-integer halfword element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered (vavgsb and vavgsh): None

Vector Average Signed Word VX-form

vavgsw VRT,VRA,VRB

4     VRT   VRA   VRB   1410
0     6     11    16    21    31

do i=0 to 127 by 32
   aop ← EXTS((VRA)_{i:i+31})
   bop ← EXTS((VRB)_{i:i+31})
   VRT_{i:i+31} ← Chop(( aop +int bop +int 1 ) >> 1, 32)

For each vector element i from 0 to 3, do the following.

Signed-integer word element i in VRA is added to signed-integer word element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered: None

Vector Average Unsigned Byte VX-form

vavgub VRT,VRA,VRB

4     VRT   VRA   VRB   1026
0     6     11    16    21    31

do i=0 to 127 by 8
   aop ← EXTZ((VRA)_{i:i+7})
   bop ← EXTZ((VRB)_{i:i+7})
   VRT_{i:i+7} ← Chop((aop +int bop +int 1) >>ui 1, 8)

For each vector element i from 0 to 15, do the following.

Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered: None

Vector Average Unsigned Halfword VX-form

vavguh VRT,VRA,VRB

4     VRT   VRA   VRB   1090
0     6     11    16    21    31

do i=0 to 127 by 16
   aop ← EXTZ((VRA)_{i:i+15})
   bop ← EXTZ((VRB)_{i:i+15})
   VRT_{i:i+15} ← Chop((aop +int bop +int 1) >>ui 1, 16)

For each vector element i from 0 to 7, do the following.

Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered: None

Vector Average Unsigned Word VX-form

vavguw VRT,VRA,VRB

4     VRT   VRA   VRB   1154
0     6     11    16    21    31

do i=0 to 127 by 32
   aop ← EXTZ((VRA)_{i:i+31})
   bop ← EXTZ((VRB)_{i:i+31})
   VRT_{i:i+31} ← Chop((aop +int bop +int 1) >>ui 1, 32)

For each vector element i from 0 to 3, do the following.
Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered: None

6.9.1.7 Vector Integer Maximum and Minimum Instructions

Vector Maximum Signed Byte VX-form

vmaxsb VRT,VRA,VRB

4     VRT   VRA   VRB   258
0     6     11    16    21    31

do i=0 to 127 by 8
   aop ← EXTS((VRA)_{i:i+7})
   bop ← EXTS((VRB)_{i:i+7})
   VRT_{i:i+7} ← ( aop >si bop ) ? (VRA)_{i:i+7} : (VRB)_{i:i+7}

For each vector element i from 0 to 15, do the following.

Signed-integer byte element i in VRA is compared to signed-integer byte element i in VRB. The larger of the two values is placed into byte element i of VRT.

Special Registers Altered: None

Vector Maximum Signed Halfword VX-form

vmaxsh VRT,VRA,VRB

4     VRT   VRA   VRB   322
0     6     11    16    21    31

do i=0 to 127 by 16
   aop ← EXTS((VRA)_{i:i+15})
   bop ← EXTS((VRB)_{i:i+15})
   VRT_{i:i+15} ← ( aop >si bop ) ? (VRA)_{i:i+15} : (VRB)_{i:i+15}

For each vector element i from 0 to 7, do the following.

Signed-integer halfword element i in VRA is compared to signed-integer halfword element i in VRB. The larger of the two values is placed into halfword element i of VRT.

Special Registers Altered: None

Vector Maximum Signed Word VX-form

vmaxsw VRT,VRA,VRB

4     VRT   VRA   VRB   386
0     6     11    16    21    31

do i=0 to 127 by 32
   aop ← EXTS((VRA)_{i:i+31})
   bop ← EXTS((VRB)_{i:i+31})
   VRT_{i:i+31} ← ( aop >si bop ) ? (VRA)_{i:i+31} : (VRB)_{i:i+31}

For each vector element i from 0 to 3, do the following.

Signed-integer word element i in VRA is compared to signed-integer word element i in VRB. The larger of the two values is placed into word element i of VRT.

Special Registers Altered: None
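The rounding behaviour of the average instructions and the effect of signed versus unsigned interpretation on the maximum instructions can be checked with a small Python model. This is a sketch only: the names `sign_extend`, `average`, and `maximum` are hypothetical, and registers are modelled as lists of unsigned element values.

```python
def sign_extend(value, bits):
    """Interpret the low-order `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def average(a, b, bits, signed):
    """(a + b + 1) >> 1 per element, computed without intermediate overflow."""
    mask = (1 << bits) - 1
    result = []
    for x, y in zip(a, b):
        if signed:
            x, y = sign_extend(x, bits), sign_extend(y, bits)
        # Python's >> on a negative int is an arithmetic (sign-preserving) shift.
        result.append(((x + y + 1) >> 1) & mask)
    return result


def maximum(a, b, bits, signed):
    """Per-element maximum; the comparison depends on the signed/unsigned reading."""
    def key(v):
        return sign_extend(v, bits) if signed else v
    return [x if key(x) > key(y) else y for x, y in zip(a, b)]


print(average([1], [2], 8, signed=False))                        # 1.5 rounds up
print([hex(v) for v in maximum([0xFF], [0x01], 8, signed=True)])   # 0xFF read as -1
print([hex(v) for v in maximum([0xFF], [0x01], 8, signed=False)])  # 0xFF read as 255
```

The "+ 1" before the shift is what makes a fractional average round toward the larger value rather than truncate.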
Vector Maximum Unsigned Byte VX-form

vmaxub VRT,VRA,VRB

4     VRT   VRA   VRB   2
0     6     11    16    21    31

do i=0 to 127 by 8
   aop ← EXTZ((VRA)_{i:i+7})
   bop ← EXTZ((VRB)_{i:i+7})
   VRT_{i:i+7} ← (aop >ui bop) ? (VRA)_{i:i+7} : (VRB)_{i:i+7}

For each vector element i from 0 to 15, do the following.

Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. The larger of the two values is placed into byte element i of VRT.

Special Registers Altered: None

Vector Maximum Unsigned Halfword VX-form

vmaxuh VRT,VRA,VRB

4     VRT   VRA   VRB   66
0     6     11    16    21    31

do i=0 to 127 by 16
   aop ← EXTZ((VRA)_{i:i+15})
   bop ← EXTZ((VRB)_{i:i+15})
   VRT_{i:i+15} ← (aop >ui bop) ? (VRA)_{i:i+15} : (VRB)_{i:i+15}

For each vector element i from 0 to 7, do the following.

Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element i in VRB. The larger of the two values is placed into halfword element i of VRT.

Special Registers Altered: None

Vector Maximum Unsigned Word VX-form

vmaxuw VRT,VRA,VRB

4     VRT   VRA   VRB   130
0     6     11    16    21    31

do i=0 to 127 by 32
   aop ← EXTZ((VRA)_{i:i+31})
   bop ← EXTZ((VRB)_{i:i+31})
   VRT_{i:i+31} ← (aop >ui bop) ? (VRA)_{i:i+31} : (VRB)_{i:i+31}

For each vector element i from 0 to 3, do the following.

Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. The larger of the two values is placed into word element i of VRT.

Special Registers Altered: None

Vector Minimum Signed Byte VX-form

vminsb VRT,VRA,VRB

4     VRT   VRA   VRB   770
0     6     11    16    21    31

do i=0 to 127 by 8
   aop ← EXTS((VRA)_{i:i+7})
   bop ← EXTS((VRB)_{i:i+7})
   VRT_{i:i+7} ← (aop

Vector Minimum Signed Halfword VX-form

vminsh VRT,VRA,VRB

4     VRT   VRA   VRB   834
0     6     11    16    21    31

do i=0 to 127 by 16
   aop ← EXTS((VRA)_{i:i+15})

   >si (VRB)_{i:i+7}) ? ^{8}1 : ^{8}0
if Rc=1 then do
   t ← (VRT = ^{128}1)
   f ← (VRT = ^{128}0)
   CR6 ← t || 0b0 || f || 0b0

if Rc=1 then do
   t ← (VRT = ^{128}1)
   f ← (VRT = ^{128}0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. Word element i in VRT is set to all 1s if unsigned-integer word element i in VRA is equal to unsigned-integer word element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

For each vector element i from 0 to 15, do the following.

Signed-integer byte element i in VRA is compared to signed-integer byte element i in VRB. Byte element i in VRT is set to all 1s if signed-integer byte element i in VRA is greater than signed-integer byte element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

Vector Compare Greater Than Signed Halfword VC-form

vcmpgtsh VRT,VRA,VRB (Rc=0)
vcmpgtsh. VRT,VRA,VRB (Rc=1)

4     VRT   VRA   VRB   Rc    838
0     6     11    16    21    22    31

do i=0 to 127 by 16
   VRT_{i:i+15} ← ((VRA)_{i:i+15} >si (VRB)_{i:i+15}) ? ^{16}1 : ^{16}0
if Rc=1 then do
   t ← (VRT = ^{128}1)
   f ← (VRT = ^{128}0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 7, do the following.

Signed-integer halfword element i in VRA is compared to signed-integer halfword element i in VRB. Halfword element i in VRT is set to all 1s if signed-integer halfword element i in VRA is greater than signed-integer halfword element i in VRB, and is set to all 0s otherwise.

Vector Compare Greater Than Signed Word VC-form

vcmpgtsw VRT,VRA,VRB (Rc=0)
vcmpgtsw. VRT,VRA,VRB (Rc=1)

4     VRT   VRA   VRB   Rc    902
0     6     11    16    21    22    31

do i=0 to 127 by 32
   VRT_{i:i+31} ← ((VRA)_{i:i+31} >si (VRB)_{i:i+31}) ? ^{32}1 : ^{32}0
if Rc=1 then do
   t ← (VRT = ^{128}1)
   f ← (VRT = ^{128}0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.

Signed-integer word element i in VRA is compared to signed-integer word element i in VRB. Word element i in VRT is set to all 1s if signed-integer word element i in VRA is greater than signed-integer word element i in VRB, and is set to all 0s otherwise.
Special Registers Altered (vcmpgtsh and vcmpgtsw): CR6 (if Rc=1)

Vector Compare Greater Than Unsigned Byte VC-form

vcmpgtub VRT,VRA,VRB (Rc=0)
vcmpgtub. VRT,VRA,VRB (Rc=1)

4     VRT   VRA   VRB   Rc    518
0     6     11    16    21    22    31

do i=0 to 127 by 8
   VRT_{i:i+7} ← ((VRA)_{i:i+7} >ui (VRB)_{i:i+7}) ? ^{8}1 : ^{8}0
if Rc=1 then do
   t ← (VRT = ^{128}1)
   f ← (VRT = ^{128}0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 15, do the following.

Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. Byte element i in VRT is set to all 1s if unsigned-integer byte element i in VRA is greater than unsigned-integer byte element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

Vector Compare Greater Than Unsigned Halfword VC-form

vcmpgtuh VRT,VRA,VRB (Rc=0)
vcmpgtuh. VRT,VRA,VRB (Rc=1)

4     VRT   VRA   VRB   Rc    582
0     6     11    16    21    22    31

do i=0 to 127 by 16
   VRT_{i:i+15} ← ((VRA)_{i:i+15} >ui (VRB)_{i:i+15}) ? ^{16}1 : ^{16}0
if Rc=1 then do
   t ← (VRT = ^{128}1)
   f ← (VRT = ^{128}0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 7, do the following.

Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element i in VRB. Halfword element i in VRT is set to all 1s if unsigned-integer halfword element i in VRA is greater than unsigned-integer halfword element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

Vector Compare Greater Than Unsigned Word VC-form

vcmpgtuw VRT,VRA,VRB (Rc=0)
vcmpgtuw. VRT,VRA,VRB (Rc=1)

4     VRT   VRA   VRB   Rc    646
0     6     11    16    21    22    31

do i=0 to 127 by 32
   VRT_{i:i+31} ← ((VRA)_{i:i+31} >ui (VRB)_{i:i+31}) ? ^{32}1 : ^{32}0
if Rc=1 then do
   t ← (VRT = ^{128}1)
   f ← (VRT = ^{128}0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.

Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB.
Word element i in VRT is set to all 1s if unsigned-integer word element i in VRA is greater than unsigned-integer word element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

6.9.3 Vector Logical Instructions

Vector Logical AND with Complement VX-form

vandc VRT,VRA,VRB

4     VRT   VRA   VRB   1092
0     6     11    16    21    31

VRT ← (VRA) & ¬(VRB)

The contents of VRA are ANDed with the complement of the contents of VRB and the result is placed into VRT.

Special Registers Altered: None

Vector Logical NOR VX-form

vnor VRT,VRA,VRB

4     VRT   VRA   VRB   1284
0     6     11    16    21    31

VRT ← ¬( (VRA) | (VRB) )

The contents of VRA are ORed with the contents of VRB and the complemented result is placed into VRT.

Special Registers Altered: None

Extended mnemonics for vector logical operations

Extended mnemonics are provided that use the Vector OR and Vector NOR instructions to copy the contents of one Vector Register to another, with and without complementing. These are shown as examples with the two instructions.

Vector Move Register

Several vector instructions can be coded in a way such that they simply copy the contents of one Vector Register to another. An extended mnemonic is provided to convey the idea that no computation is being performed but merely data movement (from one register to another).

The following instruction copies the contents of register Vy to register Vx.

   vmr Vx,Vy (equivalent to: vor Vx,Vy,Vy)

Vector Complement Register

The Vector NOR instruction can be coded in a way such that it complements the contents of one Vector Register and places the result into another Vector Register. An extended mnemonic is provided that allows this operation to be coded easily.

The following instruction complements the contents of register Vy and places the result into register Vx.
   vnot Vx,Vy (equivalent to: vnor Vx,Vy,Vy)

Vector Logical AND VX-form

vand VRT,VRA,VRB

4     VRT   VRA   VRB   1028
0     6     11    16    21    31

VRT ← (VRA) & (VRB)

The contents of VRA are ANDed with the contents of VRB and the result is placed into VRT.

Special Registers Altered: None

Vector Logical OR VX-form

vor VRT,VRA,VRB

4     VRT   VRA   VRB   1156
0     6     11    16    21    31

VRT ← (VRA) | (VRB)

The contents of VRA are ORed with the contents of VRB and the result is placed into VRT.

Special Registers Altered: None

Vector Logical XOR VX-form

vxor VRT,VRA,VRB

4     VRT   VRA   VRB   1220
0     6     11    16    21    31

VRT ← (VRA) ⊕ (VRB)

The contents of VRA are XORed with the contents of VRB and the result is placed into VRT.

Special Registers Altered: None

6.9.4 Vector Integer Rotate and Shift Instructions

Vector Rotate Left Byte VX-form

vrlb VRT,VRA,VRB

4     VRT   VRA   VRB   4
0     6     11    16    21    31

do i=0 to 127 by 8
   sh ← (VRB)_{i+5:i+7}
   VRT_{i:i+7} ← (VRA)_{i:i+7} <<< sh

For each vector element i from 0 to 15, do the following.

Byte element i in VRA is rotated left by the number of bits specified in the low-order 3 bits of the corresponding byte element i in VRB.

The result is placed into byte element i in VRT.

Special Registers Altered: None

Vector Rotate Left Halfword VX-form

vrlh VRT,VRA,VRB

4     VRT   VRA   VRB   68
0     6     11    16    21    31

do i=0 to 127 by 16
   sh ← (VRB)_{i+12:i+15}
   VRT_{i:i+15} ← (VRA)_{i:i+15} <<< sh

For each vector element i from 0 to 7, do the following.

Halfword element i in VRA is rotated left by the number of bits specified in the low-order 4 bits of the corresponding halfword element i in VRB.

The result is placed into halfword element i in VRT.

Special Registers Altered: None

Vector Rotate Left Word VX-form

vrlw VRT,VRA,VRB

4     VRT   VRA   VRB   132
0     6     11    16    21    31

do i=0 to 127 by 32
   sh ← (VRB)_{i+27:i+31}
   VRT_{i:i+31} ← (VRA)_{i:i+31} <<< sh

For each vector element i from 0 to 3, do the following.

Word element i in VRA is rotated left by the number of bits specified in the low-order 5 bits of the corresponding word element i in VRB.
The result is placed into word element i in VRT.

Special Registers Altered: None

Vector Shift Left Byte VX-form

vslb VRT,VRA,VRB

4     VRT   VRA   VRB   260
0     6     11    16    21    31

do i=0 to 127 by 8
   sh ← (VRB)_{i+5:i+7}
   VRT_{i:i+7} ← (VRA)_{i:i+7} << sh

For each vector element i from 0 to 15, do the following.

Byte element i in VRA is shifted left by the number of bits specified in the low-order 3 bits of byte element i in VRB.

- Bits shifted out of bit 0 are lost.
- Zeros are supplied to the vacated bits on the right.

The result is placed into byte element i of VRT.

Special Registers Altered: None

Vector Shift Left Halfword VX-form

vslh VRT,VRA,VRB

4     VRT   VRA   VRB   324
0     6     11    16    21    31

do i=0 to 127 by 16
   sh ← (VRB)_{i+12:i+15}
   VRT_{i:i+15} ← (VRA)_{i:i+15} << sh

For each vector element i from 0 to 7, do the following.

Halfword element i in VRA is shifted left by the number of bits specified in the low-order 4 bits of halfword element i in VRB.

- Bits shifted out of bit 0 are lost.
- Zeros are supplied to the vacated bits on the right.

The result is placed into halfword element i of VRT.

Special Registers Altered: None

Vector Shift Left Word VX-form

vslw VRT,VRA,VRB

4     VRT   VRA   VRB   388
0     6     11    16    21    31

do i=0 to 127 by 32
   sh ← (VRB)_{i+27:i+31}
   VRT_{i:i+31} ← (VRA)_{i:i+31} << sh

For each vector element i from 0 to 3, do the following.

Word element i in VRA is shifted left by the number of bits specified in the low-order 5 bits of word element i in VRB.

- Bits shifted out of bit 0 are lost.
- Zeros are supplied to the vacated bits on the right.

The result is placed into word element i of VRT.
Special Registers Altered: None

Vector Shift Right Byte VX-form

vsrb VRT,VRA,VRB

4     VRT   VRA   VRB   516
0     6     11    16    21    31

do i=0 to 127 by 8
   sh ← (VRB)_{i+5:i+7}
   VRT_{i:i+7} ← (VRA)_{i:i+7} >>ui sh

For each vector element i from 0 to 15, do the following.

Byte element i in VRA is shifted right by the number of bits specified in the low-order 3 bits of byte element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into byte element i of VRT.

Special Registers Altered: None

Vector Shift Right Halfword VX-form

vsrh VRT,VRA,VRB

4     VRT   VRA   VRB   580
0     6     11    16    21    31

do i=0 to 127 by 16
   sh ← (VRB)_{i+12:i+15}
   VRT_{i:i+15} ← (VRA)_{i:i+15} >>ui sh

For each vector element i from 0 to 7, do the following.

Halfword element i in VRA is shifted right by the number of bits specified in the low-order 4 bits of halfword element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into halfword element i of VRT.

Special Registers Altered: None

Vector Shift Right Word VX-form

vsrw VRT,VRA,VRB

4     VRT   VRA   VRB   644
0     6     11    16    21    31

do i=0 to 127 by 32
   sh ← (VRB)_{i+27:i+31}
   VRT_{i:i+31} ← (VRA)_{i:i+31} >>ui sh

For each vector element i from 0 to 3, do the following.

Word element i in VRA is shifted right by the number of bits specified in the low-order 5 bits of word element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into word element i of VRT.

Special Registers Altered: None
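In the rotate and logical shift instructions, the shift count for each element comes from the low-order 3, 4, or 5 bits of the matching VRB element and all higher bits of that element are ignored. A minimal Python sketch of that rule follows; the helper names are hypothetical and registers are modelled as lists of unsigned element values.

```python
def shift_count(b_elem, bits):
    """Low-order log2(bits) bits of the VRB element (bits is 8, 16, or 32)."""
    return b_elem & (bits - 1)


def rotate_left(a, b, bits):
    """Per-element rotate left; bits leaving the high end re-enter at the low end."""
    mask = (1 << bits) - 1
    result = []
    for x, y in zip(a, b):
        sh = shift_count(y, bits)
        result.append(((x << sh) | (x >> (bits - sh))) & mask)
    return result


def shift_left(a, b, bits):
    """Per-element shift left; bits shifted out are lost, zeros enter on the right."""
    mask = (1 << bits) - 1
    return [(x << shift_count(y, bits)) & mask for x, y in zip(a, b)]


def shift_right(a, b, bits):
    """Per-element logical shift right; zeros enter on the left."""
    return [x >> shift_count(y, bits) for x, y in zip(a, b)]


print([hex(v) for v in rotate_left([0x81], [1], 8)])
print([hex(v) for v in shift_left([0x81], [9], 8)])      # count 9 is used as 1
print([hex(v) for v in shift_right([0x81], [0xF1], 8)])  # count 0xF1 is used as 1
```

One consequence of the masked count is that an element can never be shifted by its full width, so a shift count equal to the element size behaves as a shift by zero.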
Vector Shift Right Algebraic Byte VX-form

vsrab VRT,VRA,VRB

   4    VRT   VRA   VRB   772
   0    6     11    16    21    31

   do i=0 to 127 by 8
      sh ← (VRB)[i+5:i+7]
      VRT[i:i+7] ← (VRA)[i:i+7] >>si sh

For each vector element i from 0 to 15, do the following.

   Byte element i in VRA is shifted right by the number of bits specified in the low-order 3 bits of the corresponding byte element i in VRB. Bits shifted out of bit 7 of the byte element are lost. Bit 0 of the byte element is replicated to fill the vacated bits on the left. The result is placed into byte element i of VRT.

Special Registers Altered:
   None

Vector Shift Right Algebraic Halfword VX-form

vsrah VRT,VRA,VRB

   4    VRT   VRA   VRB   836
   0    6     11    16    21    31

   do i=0 to 127 by 16
      sh ← (VRB)[i+12:i+15]
      VRT[i:i+15] ← (VRA)[i:i+15] >>si sh

For each vector element i from 0 to 7, do the following.

   Halfword element i in VRA is shifted right by the number of bits specified in the low-order 4 bits of the corresponding halfword element i in VRB. Bits shifted out of bit 15 of the halfword are lost. Bit 0 of the halfword is replicated to fill the vacated bits on the left. The result is placed into halfword element i of VRT.

Special Registers Altered:
   None

Vector Shift Right Algebraic Word VX-form

vsraw VRT,VRA,VRB

   4    VRT   VRA   VRB   900
   0    6     11    16    21    31

   do i=0 to 127 by 32
      sh ← (VRB)[i+27:i+31]
      VRT[i:i+31] ← (VRA)[i:i+31] >>si sh

For each vector element i from 0 to 3, do the following.

   Word element i in VRA is shifted right by the number of bits specified in the low-order 5 bits of the corresponding word element i in VRB. Bits shifted out of bit 31 of the word are lost. Bit 0 of the word is replicated to fill the vacated bits on the left. The result is placed into word element i of VRT.
Special Registers Altered:
   None

6.10 Vector Floating-Point Instruction Set

6.10.1 Vector Floating-Point Arithmetic Instructions

Vector Add Single-Precision VX-form

vaddfp VRT,VRA,VRB

   4    VRT   VRA   VRB   10
   0    6     11    16    21    31

   do i=0 to 127 by 32
      VRT[i:i+31] ← RoundToNearSP((VRA)[i:i+31] +fp (VRB)[i:i+31])

For each vector element i from 0 to 3, do the following.

   Single-precision floating-point element i in VRA is added to single-precision floating-point element i in VRB. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT.

Special Registers Altered:
   None

Vector Subtract Single-Precision VX-form

vsubfp VRT,VRA,VRB

   4    VRT   VRA   VRB   74
   0    6     11    16    21    31

   do i=0 to 127 by 32
      VRT[i:i+31] ← RoundToNearSP((VRA)[i:i+31] -fp (VRB)[i:i+31])

For each vector element i from 0 to 3, do the following.

   Single-precision floating-point element i in VRB is subtracted from single-precision floating-point element i in VRA. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT.

Special Registers Altered:
   None

Vector Multiply-Add Single-Precision VA-form

vmaddfp VRT,VRA,VRC,VRB

   4    VRT   VRA   VRB   VRC   46
   0    6     11    16    21    26    31

   do i=0 to 127 by 32
      prod ← (VRA)[i:i+31] ×fp (VRC)[i:i+31]
      VRT[i:i+31] ← RoundToNearSP( prod +fp (VRB)[i:i+31] )

For each vector element i from 0 to 3, do the following.

   Single-precision floating-point element i in VRA is multiplied by single-precision floating-point element i in VRC. Single-precision floating-point element i in VRB is added to the infinitely-precise product. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT.

Special Registers Altered:
   None

Programming Note
   To use a multiply-add to perform an IEEE or Java compliant multiply, the addend must be -0.0. This is necessary to ensure that the sign of a zero result will be correct when the product is -0.0 (+0.0 + -0.0 → +0.0, and -0.0 + -0.0 → -0.0). When the sign of a resulting 0.0 is not important, then +0.0 can be used as an addend which may, in some cases, avoid the need for a second register to hold a -0.0 in addition to the integer 0/floating-point +0.0 that may already be available.

Vector Negative Multiply-Subtract Single-Precision VA-form

vnmsubfp VRT,VRA,VRC,VRB

   4    VRT   VRA   VRB   VRC   47
   0    6     11    16    21    26    31

   do i=0 to 127 by 32
      prod[0:inf] ← (VRA)[i:i+31] ×fp (VRC)[i:i+31]
      VRT[i:i+31] ← -RoundToNearSP( prod[0:inf] -fp (VRB)[i:i+31] )

For each vector element i from 0 to 3, do the following.

   Single-precision floating-point element i in VRA is multiplied by single-precision floating-point element i in VRC. Single-precision floating-point element i in VRB is subtracted from the infinitely-precise product. The intermediate result is rounded to the nearest single-precision floating-point number, then negated and placed into word element i of VRT.

Special Registers Altered:
   None

6.10.2 Vector Floating-Point Maximum and Minimum Instructions

Vector Maximum Single-Precision VX-form

vmaxfp VRT,VRA,VRB

   4    VRT   VRA   VRB   1034
   0    6     11    16    21    31

   do i=0 to 127 by 32
      VRT[i:i+31] ← ( (VRA)[i:i+31] >fp (VRB)[i:i+31] ) [...]

Vector Minimum Single-Precision VX-form

vminfp VRT,VRA,VRB

   4    VRT   VRA   VRB   1098
   0    6     11    16    21    31

   do i=0 to 127 by 32
      VRT[i:i+31] ← ( (VRA)[i:i+31] [...]

[...]

   if Rc=1 then do
      t ← ( VRT=¹²⁸1 )
      f ← ( VRT=¹²⁸0 )
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.

   Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. Word element i in VRT is set to all 1s if single-precision floating-point element i in VRA is greater than or equal to single-precision floating-point element i in VRB, and is set to all 0s otherwise.

   If the source element i in VRA or the source element i in VRB is a NaN, VRT is set to all 0s, indicating "not greater than or equal to". If the source element i in VRA and the source element i in VRB are both infinity with the same sign, VRT is set to all 1s, indicating "greater than or equal to".

Special Registers Altered:
   CR6 (if Rc=1)

      [...] >fp (VRB)[i:i+31] ) ? ³²1 : ³²0
   if Rc=1 then do
      t ← ( VRT=¹²⁸1 )
      f ← ( VRT=¹²⁸0 )
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.

   Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. Word element i in VRT is set to all 1s if single-precision floating-point element i in VRA is greater than single-precision floating-point element i in VRB, and is set to all 0s otherwise.

   If the source element i in VRA or the source element i in VRB is a NaN, VRT is set to all 0s, indicating "not greater than". If the source element i in VRA and the source element i in VRB are both infinity with the same sign, VRT is set to all 0s, indicating "not greater than".

Special Registers Altered:
   CR6 (if Rc=1)

6.10.5 Vector Floating-Point Estimate Instructions

Vector 2 Raised to the Exponent Estimate Floating-Point VX-form

vexptefp VRT,VRB

   4    VRT   ///   VRB   394
   0    6     11    16    21    31

   do i=0 to 127 by 32
      VRT[i:i+31] ← Power2EstimateSP( (VRB)[i:i+31] )

For each vector element i from 0 to 3, do the following.

   The single-precision floating-point estimate of 2 raised to the power of single-precision floating-point element i in VRB is placed into word element i of VRT.

Let x be any single-precision floating-point input value. Unless x < -146 or the single-precision floating-point result of computing 2 raised to the power x would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16. The most significant 12 bits of the estimate's significand are monotonic. An integral input value returns an integral value when the result is representable.

The result for various special cases of the source value is given below.

   Value        Result
   -Infinity    +0
   -0           +1
   +0           +1
   +Infinity    +Infinity
   NaN          QNaN

Special Registers Altered:
   None

Vector Log Base 2 Estimate Floating-Point VX-form

vlogefp VRT,VRB

   4    VRT   ///   VRB   458
   0    6     11    16    21    31

   do i=0 to 127 by 32
      VRT[i:i+31] ← LogBase2EstimateSP( (VRB)[i:i+31] )

For each vector element i from 0 to 3, do the following.

   The single-precision floating-point estimate of the base 2 logarithm of single-precision floating-point element i in VRB is placed into the corresponding word element of VRT.

Let x be any single-precision floating-point input value. Unless | x-1 | is less than or equal to 0.125 or the single-precision floating-point result of computing the base 2 logarithm of x would be an infinity or a QNaN, the estimate has an absolute error in precision (absolute value of the difference between the estimate and the infinitely precise value) no greater than 2^-5. Under the same conditions, the estimate has a relative error in precision no greater than one part in 8.

The most significant 12 bits of the estimate's significand are monotonic. The estimate is exact if x=2^y, where y is an integer between -149 and +127 inclusive. Otherwise the value placed into the element of register VRT may vary between implementations, and between different executions on the same implementation.

The result for various special cases of the source value is given below.

   Value        Result
   -Infinity    QNaN
   <0           QNaN
   -0           -Infinity
   +0           -Infinity
   +Infinity    +Infinity
   NaN          QNaN

Special Registers Altered:
   None
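The special-case tables for vexptefp and vlogefp, together with the relative-error bound stated for vexptefp, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: it works on Python floats (double precision) rather than single-precision elements, it does not model the `x < -146` exclusion or the zero/infinity/QNaN result exclusions, and the function names `exp2_estimate_special`, `log2_estimate_special`, and `exp2_within_bound` are hypothetical.

```python
import math


def exp2_estimate_special(x):
    """Return the tabulated vexptefp result for a special input, else None."""
    if math.isnan(x):
        return math.nan          # NaN -> QNaN
    if x == -math.inf:
        return 0.0               # -Infinity -> +0
    if x == math.inf:
        return math.inf          # +Infinity -> +Infinity
    if x == 0:
        return 1.0               # both -0 and +0 -> +1
    return None                  # not a special case


def log2_estimate_special(x):
    """Return the tabulated vlogefp result for a special input, else None."""
    if math.isnan(x):
        return math.nan          # NaN -> QNaN
    if x == 0:
        return -math.inf         # both -0 and +0 -> -Infinity
    if x < 0:
        return math.nan          # -Infinity and any negative value -> QNaN
    if x == math.inf:
        return math.inf          # +Infinity -> +Infinity
    return None                  # not a special case


def exp2_within_bound(x, estimate):
    """Check the 'one part in 16' relative-error bound for an estimate of 2**x."""
    exact = 2.0 ** x
    return abs(estimate - exact) <= exact / 16
```

A test harness could call `exp2_within_bound` on each element of an observed result; since estimates may differ between implementations, comparing against a bound is more robust than comparing against a fixed expected value.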
Vector Reciprocal Estimate Single-Precision VX-form

vrefp VRT,VRB

   4    VRT   ///   VRB   266
   0    6     11    16    21    31

   do i=0 to 127 by 32
      VRT[i:i+31] ← ReciprocalEstimateSP( (VRB)[i:i+31] )

For each vector element i from 0 to 3, do the following.

   The single-precision floating-point estimate of the reciprocal of single-precision floating-point element i in VRB is placed into word element i of VRT.

Unless the single-precision floating-point result of computing the reciprocal of a value would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 4096.

Note that results may vary between implementations, and between different executions on the same implementation.

The result for various special cases of the source value is given below.

   Value        Result
   -Infinity    -0
   -0           -Infinity
   +0           +Infinity
   +Infinity    +0
   NaN          QNaN

Special Registers Altered:
   None

Vector Reciprocal Square Root Estimate Single-Precision VX-form

vrsqrtefp VRT,VRB

   4    VRT   ///   VRB   330
   0    6     11    16    21    31

   do i=0 to 127 by 32
      VRT[i:i+31] ← ReciprocalSquareRootEstimateSP( (VRB)[i:i+31] )

For each vector element i from 0 to 3, do the following.

   The single-precision floating-point estimate of the reciprocal of the square root of single-precision floating-point element i in VRB is placed into word element i of VRT.

Let x be any single-precision floating-point value. Unless the single-precision floating-point result of computing the reciprocal of the square root of x would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 4096.

Note that results may vary between implementations, and between different executions on the same implementation.

The result for various special cases of the source value is given below.

   Value        Result
   -Infinity    QNaN
   <0           QNaN
   -0           -Infinity
   +0           +Infinity
   +Infinity    +0
   NaN          QNaN

Special Registers Altered:
   None

6.11 Vector Status and Control Register Instructions

Move To Vector Status and Control Register VX-form

mtvscr VRB

   4    ///   VRB   1604
   0    6     16    21    31

   VSCR ← (VRB)[96:127]

The contents of word element 3 of VRB are placed into the VSCR.

Special Registers Altered:
   None

Move From Vector Status and Control Register VX-form

mfvscr VRT

   4    VRT   ///   1540
   0    6     11    21    31

   VRT ← ⁹⁶0 || (VSCR)

The contents of the VSCR are placed into word element 3 of VRT.

The remaining word elements in VRT are set to 0.

Special Registers Altered:
   None

Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX]

7.1 Introduction
   7.1.1 Overview of the Vector-Scalar Extension
      7.1.1.1 Compatibility with Category Floating-Point and Category Decimal Floating-Point Operations
      7.1.1.2 Compatibility with Category Vector Operations
7.2 VSX Registers
   7.2.1 Vector-Scalar Registers
      7.2.1.1 Floating-Point Registers
      7.2.1.2 Vector Registers
   7.2.2 Floating-Point Status and Control Register
7.3 VSX Operations
   7.3.1 VSX Floating-Point Arithmetic Overview
   7.3.2 VSX Floating-Point Data
      7.3.2.1 Data Format
      7.3.2.2 Value Representation
      7.3.2.3 Sign of Result
      7.3.2.4 Normalization and Denormalization
      7.3.2.5 Data Handling and Precision
      7.3.2.6 Rounding
   7.3.3 VSX Floating-Point Execution Models
      7.3.3.1 VSX Execution Model for IEEE Operations
      7.3.3.2 VSX Execution Model for Multiply-Add Type Instructions
7.4 VSX Floating-Point Exceptions
   7.4.1 Floating-Point Invalid Operation Exception
      7.4.1.1 Definition
      7.4.1.2 Action for VE=1
      7.4.1.3 Action for VE=0
   7.4.2 Floating-Point Zero Divide Exception
      7.4.2.1 Definition
      7.4.2.2 Action for ZE=1
      7.4.2.3 Action for ZE=0
   7.4.3 Floating-Point Overflow Exception
      7.4.3.1 Definition
      7.4.3.2 Action for OE=1
      7.4.3.3 Action for OE=0
   7.4.4 Floating-Point Underflow Exception
      7.4.4.1 Definition
      7.4.4.2 Action for UE=1
      7.4.4.3 Action for UE=0
   7.4.5 Floating-Point Inexact Exception
      7.4.5.1 Definition
      7.4.5.2 Action for XE=1
      7.4.5.3 Action for XE=0
7.5 Storage Access Operations
   7.5.1 Accessing Aligned Storage Operands
   7.5.2 Accessing Unaligned Storage Operands
   7.5.3 Storage Access Exceptions
7.6 VSX Instruction Set Summary
   7.6.1 VSX Storage Access Instructions
      7.6.1.1 VSX Scalar Storage Access Instructions
      7.6.1.2 VSX Vector Storage Access Instructions
   7.6.2 VSX Move Instructions
      7.6.2.1 VSX Scalar Move Instructions
      7.6.2.2 VSX Vector Move Instructions
   7.6.3 VSX Floating-Point Arithmetic Instructions
      7.6.3.1 VSX Scalar Floating-Point Arithmetic Instructions
      7.6.3.2 VSX Vector Floating-Point Arithmetic Instructions
   7.6.4 VSX Floating-Point Compare Instructions
      7.6.4.1 VSX Scalar Floating-Point Compare Instructions
      7.6.4.2 VSX Vector Floating-Point Compare Instructions
   7.6.5 VSX DP-SP Conversion Instructions
      7.6.5.1 VSX Scalar DP-SP Conversion Instructions
      7.6.5.2 VSX Vector DP-SP Conversion Instructions
   7.6.6 VSX Integer Conversion Instructions
      7.6.6.1 VSX Scalar Integer Conversion Instructions
      7.6.6.2 VSX Vector Integer Conversion Instructions
   7.6.7 VSX Round to Floating-Point Integer Instructions
      7.6.7.1 VSX Scalar Round to Floating-Point Integer Instructions
      7.6.7.2 VSX Vector Round to Floating-Point Integer Instructions
   7.6.8 VSX Logical Instructions
   7.6.9 VSX Permute Instructions
7.7 VSX Instruction Descriptions
   7.7.1 Instruction Description Conventions
      7.7.1.1 Instruction RTL Operators
      7.7.1.2 Instruction RTL Function Calls

7.1 Introduction

7.1.1 Overview of the Vector-Scalar Extension

Category Vector-Scalar Extension (VSX) provides facilities supporting vector and scalar binary floating-point operations. The following VSX features are provided to increase opportunities for vectorization.

- A unified register file, a set of Vector-Scalar Registers (VSR), supporting both scalar and vector operations is provided, eliminating the overhead of vector-scalar data transfer through storage.
- Support for word-aligned storage accesses for both scalar and vector operations is provided.
- Robust support for IEEE-754 for both vector and scalar floating-point operations is provided.

Combining the Floating-Point Registers (FPR) defined in Chapter 4. Floating-Point Facility [Category: Floating-Point] and the Vector Registers (VR) defined in Chapter 6. Vector Facility [Category: Vector] provides additional registers to support more aggressive compiler optimizations for both vector and scalar operations.

Implementations of VSX must also implement the Floating-Point (Chapter 4) and Vector (Chapter 6) categories.

7.1.1.1 Compatibility with Category Floating-Point and Category Decimal Floating-Point Operations

The instruction sets defined in Chapter 4. Floating-Point Facility [Category: Floating-Point] and Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point] retain their definition with one primary difference. The FPRs are mapped to doubleword element 0 of VSRs 0-31. The contents of doubleword 1 of the VSR corresponding to a source FPR specified by an instruction are ignored. The contents of doubleword 1 of a VSR corresponding to the target FPR specified by an instruction are undefined.

Programming Note
   Application binary interfaces extended to support VSX require special care of vector data written to VSRs 0-31 (i.e., VSRs corresponding to FPRs). Legacy scalar function calls employ doubleword-based loads and stores to preserve the contents of any nonvolatile registers. This has the adverse effect of not preserving the contents of doubleword 1 of these VSRs.

7.1.1.2 Compatibility with Category Vector Operations

The instruction set defined in Chapter 6. Vector Facility [Category: Vector] retains its definition with one primary difference. The VRs are mapped to VSRs 32-63.

7.2 VSX Registers

7.2.1 Vector-Scalar Registers

Sixty-four 128-bit VSRs are provided. See Figure 106. All VSX floating-point computations and other data manipulation are performed on data residing in Vector-Scalar Registers, and results are placed into a VSR.

Depending on the instruction, the contents of a VSR are interpreted as a sequence of equal-length elements (words or doublewords) or as a quadword. Each of the elements is aligned at its natural boundary within the VSR, as shown in Figure 106. Many instructions perform a given operation in parallel on all elements in a VSR. Depending on the instruction, a word element can be interpreted as a signed integer word (SW), an unsigned integer word (UW), a logical mask value (MW), or a single-precision floating-point value (SP); a doubleword element can be interpreted as a doubleword signed integer (SD), a doubleword unsigned integer (UD), a doubleword mask (MD), or a double-precision floating-point value (DP). In the instruction descriptions, phrases like "signed integer word element" are used as shorthand for "word element, interpreted as a signed integer".
Load and Store instructions are provided that transfer a byte, halfword, word, doubleword, or quadword between storage and a VSR.

   VSR[0]
   VSR[1]
   ...
   VSR[62]
   VSR[63]
   0                    127

Figure 106. Vector-Scalar Registers

   |          SD/UD/MD/DP 0          |          SD/UD/MD/DP 1          |
   | SW/UW/MW/SP 0 | SW/UW/MW/SP 1 | SW/UW/MW/SP 2 | SW/UW/MW/SP 3 |
   0               32              64              96            127

Figure 107. Vector-Scalar Register Elements

7.2.1.1 Floating-Point Registers

Chapter 4. Floating-Point Facility [Category: Floating-Point] provides 32 64-bit FPRs. Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point] also employs FPRs in decimal floating-point (DFP) operations. When VSX is implemented, the 32 FPRs are mapped to doubleword 0 of VSRs 0-31. For example, FPR[0] is located in doubleword element 0 of VSR[0], FPR[1] is located in doubleword element 0 of VSR[1], and so forth.

All instructions that operate on an FPR are redefined to operate on doubleword element 0 of the corresponding VSR. The contents of doubleword element 1 of the VSR corresponding to a source FPR or FPR pair for these instructions are ignored and the contents of doubleword element 1 of the VSR corresponding to the target FPR or FPR pair for these instructions are undefined.

   VSR[0]     FPR[0]
   VSR[1]     FPR[1]
   ...        ...
   VSR[30]    FPR[30]
   VSR[31]    FPR[31]
   VSR[32]
   VSR[33]
   ...
   VSR[62]
   VSR[63]
   0          63          127

Figure 108. Floating-Point Registers as part of VSRs

7.2.1.2 Vector Registers

Chapter 6. Vector Facility [Category: Vector] provides 32 128-bit VRs. When VSX is implemented, the 32 VRs are mapped to VSRs 32-63. For example, VR[0] is located in VSR[32], VR[1] is located in VSR[33], and so forth.

All instructions that operate on a VR are redefined to operate on the corresponding VSR.

   VSR[0]
   VSR[1]
   ...
   VSR[30]
   VSR[31]
   VSR[32]    VR[0]
   VSR[33]    VR[1]
   ...        ...
   VSR[62]    VR[30]
   VSR[63]    VR[31]
   0                    127

Figure 109. Vector Registers as part of VSRs
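The register mapping described in Sections 7.2.1.1 and 7.2.1.2, and the element numbering of Figure 107, can be sketched in a few lines of Python. This is a minimal illustration, assuming each VSR is held as a 128-bit Python integer with bit 0 as the most-significant bit; the function names `fpr_to_vsr`, `vr_to_vsr`, `doubleword_element`, `word_element`, and `read_fpr` are hypothetical.

```python
def fpr_to_vsr(n):
    """FPR[n] lives in doubleword element 0 of VSR[n], for n in 0..31."""
    if not 0 <= n <= 31:
        raise ValueError("FPR number must be in 0..31")
    return n


def vr_to_vsr(n):
    """VR[n] is VSR[n + 32], for n in 0..31."""
    if not 0 <= n <= 31:
        raise ValueError("VR number must be in 0..31")
    return n + 32


def doubleword_element(vsr_value, i):
    """Extract doubleword element i (0 or 1); element 0 holds bits 0:63."""
    return (vsr_value >> (64 * (1 - i))) & ((1 << 64) - 1)


def word_element(vsr_value, i):
    """Extract word element i (0..3); element 0 holds bits 0:31."""
    return (vsr_value >> (32 * (3 - i))) & 0xFFFFFFFF


def read_fpr(vsrs, n):
    """Read FPR[n] from a list of 64 VSR values; doubleword 1 is ignored."""
    return doubleword_element(vsrs[fpr_to_vsr(n)], 0)
```

Note that `read_fpr` never looks at doubleword element 1, which mirrors the rule that its contents are ignored for a source FPR.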
7.2.2 Floating-Point Status and Control Register

The Floating-Point Status and Control Register (FPSCR) controls the handling of floating-point exceptions and records status resulting from the floating-point operations. Bits 0:19 and 32:55 are status bits. Bits 56:63 are control bits.

The exception status bits in the FPSCR (bits 35:44, 53:55) are sticky; that is, once set to 1 they remain set to 1 until they are set to 0 by an mcrfs, mtfsfi, mtfsf, or mtfsb0 instruction. The exception summary bits in the FPSCR (FX, FEX, and VX, which are bits 32:34) are not considered to be "exception status bits", and only FX is sticky.

Programming Note
   Access to Move To FPSCR and Move From FPSCR instructions requires FP=1.

FEX and VX are simply the ORs of other FPSCR bits. Therefore these two bits are not listed among the FPSCR bits affected by the various instructions.

The bit definitions for the FPSCR are as follows.

Bits   Definition

0:28   Decimal Floating-Point Rounding Control (DRN)
       This field is not used by VSX instructions.

32     Floating-Point Exception Summary (FX)
       Every floating-point instruction, except mtfsfi and mtfsf, implicitly sets FX to 1 if that instruction causes any of the floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter FX explicitly.

       Programming Note
          FX is defined not to be altered implicitly by mtfsfi and mtfsf because permitting these instructions to alter FX implicitly can cause a paradox. An example is an mtfsfi or mtfsf instruction that supplies 0 for FX and 1 for OX, and is executed when OX=0. See also the Programming Notes with the definition of these two instructions.

33     Floating-Point Enabled Exception Summary (FEX)
       This bit is the OR of all the floating-point exception bits masked by their respective enable bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FEX explicitly.

34     Floating-Point Invalid Operation Exception Summary (VX)
       This bit is the OR of all the Invalid Operation exception bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter VX explicitly.

35     Floating-Point Overflow Exception (OX)
       This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic, VSX Vector Floating-Point Arithmetic, VSX Scalar DP-SP Conversion or VSX Vector DP-SP Conversion class instruction causes an Overflow exception. See Section 7.4.3, "Floating-Point Overflow Exception" on page 304.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

36     Floating-Point Underflow Exception (UX)
       This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic, VSX Vector Floating-Point Arithmetic, VSX Scalar DP-SP Conversion or VSX Vector DP-SP Conversion class instruction causes an Underflow exception. See Section 7.4.4, "Floating-Point Underflow Exception" on page 306.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

37     Floating-Point Zero Divide Exception (ZX)
       This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic or VSX Vector Floating-Point Arithmetic class instruction causes a Zero Divide exception. See Section 7.4.2, "Floating-Point Zero Divide Exception" on page 302.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

38     Floating-Point Inexact Exception (XX)
       This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic, VSX Vector Floating-Point Arithmetic, VSX Scalar Integer Conversion, VSX Vector Integer Conversion, VSX Scalar Round to Floating-Point Integer, or VSX Vector Round to Floating-Point Integer class instruction causes an Inexact exception. See Section 7.4.5, "Floating-Point Inexact Exception" on page 308.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.
Bits   Definition

39     Floating-Point Invalid Operation Exception (SNAN) (VXSNAN)
       This bit is set to 1 when a VSX Scalar Floating-Point or VSX Vector Floating-Point class instruction causes an SNaN type Invalid Operation exception. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

40     Floating-Point Invalid Operation Exception (Inf-Inf) (VXISI)
       This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic or VSX Vector Floating-Point Arithmetic class instruction causes an Infinity - Infinity type Invalid Operation exception. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

41     Floating-Point Invalid Operation Exception (Inf÷Inf) (VXIDI)
       This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic or VSX Vector Floating-Point Arithmetic class instruction causes an Infinity ÷ Infinity type Invalid Operation exception. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

42     Floating-Point Invalid Operation Exception (Zero÷Zero) (VXZDZ)
       This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic or VSX Vector Floating-Point Arithmetic class instruction causes a Zero ÷ Zero type Invalid Operation exception. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

43     Floating-Point Invalid Operation Exception (Inf×Zero) (VXIMZ)
       This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic or VSX Vector Floating-Point Arithmetic class instruction causes an Infinity × Zero type Invalid Operation exception. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

44     Floating-Point Invalid Operation Exception (Invalid Compare) (VXVC)
       This bit is set to 1 when a VSX Scalar Compare Double-Precision, VSX Vector Compare Double-Precision, or VSX Vector Compare Single-Precision class instruction causes an Invalid Compare type Invalid Operation exception. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

45     Floating-Point Fraction Rounded (FR)
       This bit is set to 0 or 1 by VSX Scalar Floating-Point Arithmetic, VSX Scalar Integer Conversion, and VSX Scalar Round to Floating-Point Integer class instructions to indicate whether or not the fraction was incremented during rounding. See Section 7.3.2.6, "Rounding" on page 287. This bit is not sticky.

46     Floating-Point Fraction Inexact (FI)
       This bit is set to 0 or 1 by VSX Scalar Floating-Point Arithmetic, VSX Scalar Integer Conversion, and VSX Scalar Round to Floating-Point Integer class instructions to indicate whether or not the rounded result is inexact or the instruction caused a disabled Overflow exception. See Section 7.3.2.6 on page 287. This bit is not sticky.
       See the definition of XX, above, regarding the relationship between FI and XX.
Vector-Scalar Floating-Point Operations [Category: VSX] 277 Version 2.06

Bits   Definition

47:51  Floating-Point Result Flags (FPRF)
       VSX Scalar Floating-Point Arithmetic, VSX Scalar DP-SP Conversion, VSX Scalar Convert Integer to Double-Precision, and VSX Scalar Round to Double-Precision Integer class instructions set this field based on the result placed into the target register and on the target precision, except that if any portion of the result is undefined then the value placed into FPRF is undefined.
       For VSX Scalar Convert Double-Precision to Integer class instructions, the value placed into FPRF is undefined.
       Additional details are as follows.

  47     Floating-Point Result Class Descriptor (C)
         VSX Scalar Floating-Point Arithmetic, VSX Scalar DP-SP Conversion, VSX Scalar Convert Integer to Double-Precision, and VSX Scalar Round to Double-Precision Integer class instructions set this bit with the FPCC bits, to indicate the class of the result as shown in Table 2, "Floating-Point Result Flags," on page 281.

  48:51  Floating-Point Condition Code (FPCC)
         A VSX Scalar Compare Double-Precision instruction sets one of the FPCC bits to 1 and the other three FPCC bits to 0 based on the relative values of the operands being compared.
         VSX Scalar Floating-Point Arithmetic, VSX Scalar DP-SP Conversion, VSX Scalar Convert Integer to Double-Precision, and VSX Scalar Round to Double-Precision Integer class instructions set the FPCC bits with the C bit, to indicate the class of the result as shown in Table 2, "Floating-Point Result Flags," on page 281. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero.

    48   Floating-Point Less Than or Negative (FL)
    49   Floating-Point Greater Than or Positive (FG)
    50   Floating-Point Equal or Zero (FE)
    51   Floating-Point Unordered or NaN (FU)

52     Reserved

53     Floating-Point Invalid Operation Exception (Software-Defined Condition) (VXSOFT)
       This bit can be altered only by mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.

       Programming Note
       VXSOFT can be used by software to indicate the occurrence of an arbitrary, software-defined condition that is to be treated as an Invalid Operation exception. For example, the bit could be set by a program that computes a base 10 logarithm if the supplied input is negative.

54     Floating-Point Invalid Operation Exception (Invalid Square Root) (VXSQRT)
       This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic or VSX Vector Floating-Point Arithmetic class instruction causes an Invalid Square Root type Invalid Operation exception. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

55     Floating-Point Invalid Operation Exception (Invalid Integer Convert) (VXCVI)
       This bit is set to 1 when a VSX Scalar Convert Double-Precision to Integer, VSX Vector Convert Double-Precision to Integer, or VSX Vector Convert Single-Precision to Integer class instruction causes an Invalid Integer Convert type Invalid Operation exception. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.
       This bit can be set to 0 or 1 by a Move To FPSCR class instruction.

56     Floating-Point Invalid Operation Exception Enable (VE)
       This bit is used by VSX Scalar Floating-Point and VSX Vector Floating-Point class instructions to enable trapping on Invalid Operation exceptions. See Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.

57     Floating-Point Overflow Exception Enable (OE)
       This bit is used by VSX Scalar Floating-Point and VSX Vector Floating-Point class instructions to enable trapping on Overflow exceptions. See Section 7.4.3, "Floating-Point Overflow Exception" on page 304.

58     Floating-Point Underflow Exception Enable (UE)
       This bit is used by VSX Scalar Floating-Point and VSX Vector Floating-Point class instructions to enable trapping on Underflow exceptions. See Section 7.4.4, "Floating-Point Underflow Exception" on page 306.

59     Floating-Point Zero Divide Exception Enable (ZE)
       This bit is used by VSX Scalar Floating-Point and VSX Vector Floating-Point class instructions to enable trapping on Zero Divide exceptions. See Section 7.4.2, "Floating-Point Zero Divide Exception" on page 302.

60     Floating-Point Inexact Exception Enable (XE)
       This bit is used by VSX Scalar Floating-Point and VSX Vector Floating-Point class instructions to enable trapping on Inexact exceptions. See Section 7.4.5, "Floating-Point Inexact Exception" on page 308.

61     Floating-Point Non-IEEE Mode (NI)
       Floating-point non-IEEE mode is optional. If floating-point non-IEEE mode is not implemented, this bit is treated as reserved, and the remainder of the definition of this bit does not apply.
       If floating-point non-IEEE mode is implemented, this bit has the following meaning.
       0  The processor is not in floating-point non-IEEE mode (i.e., all floating-point operations conform to the IEEE standard).
       1  The processor is in floating-point non-IEEE mode.
       When the processor is in floating-point non-IEEE mode, the remaining FPSCR bits are permitted to have meanings different from those given in this document, and floating-point operations need not conform to the IEEE standard.
       The effects of executing a given floating-point instruction with NI=1, and any additional requirements for using non-IEEE mode, are implementation-dependent. The results of executing a given instruction in non-IEEE mode are permitted to vary between implementations, and between different executions on the same implementation.

       Programming Note
       When the processor is in floating-point non-IEEE mode, the results of floating-point operations are permitted to be approximate, and performance for these operations might be better, more predictable, or less data-dependent than when the processor is not in non-IEEE mode. For example, in non-IEEE mode an implementation is permitted to return 0 instead of a denormalized number and return a large number instead of an infinity.

62:63  Floating-Point Rounding Control (RN)
       This field is used by VSX Scalar Floating-Point and VSX Vector Floating-Point class instructions that round their result when the rounding mode is not implied by the opcode. This field can be explicitly set or reset by a new Move To FPSCR class instruction. See Section 7.3.2.6, "Rounding" on page 287.
       00  Round to Nearest Even
       01  Round toward Zero
       10  Round toward +Infinity
       11  Round toward -Infinity

Result Flags
C  FL FG FE FU   Result Value Class
1  0  0  0  1    Quiet NaN
0  1  0  0  1    - Infinity
0  1  0  0  0    - Normalized Number
1  1  0  0  0    - Denormalized Number
1  0  0  1  0    - Zero
0  0  0  1  0    + Zero
1  0  1  0  0    + Denormalized Number
0  0  1  0  0    + Normalized Number
0  0  1  0  1    + Infinity

Table 2. Floating-Point Result Flags

7.3 VSX Operations

7.3.1 VSX Floating-Point Arithmetic Overview

The noncomputational instructions are those listed in Sections 7.6.1, 7.6.2.1, and 7.6.8 through 7.6.9.

A floating-point number consists of a signed exponent and a signed significand. The quantity expressed by this number is the product of the significand and the number $2^{\text{exponent}}$.
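The result classes of Table 2 can be illustrated with a short sketch. This is only an illustration, not part of the architecture: the function name `fprf` is hypothetical, the value is taken from a host double rather than a VSR, and Table 2 lists no entry for a Signaling NaN, so the sketch refuses that input rather than guess.

```python
import struct

def fprf(x: float) -> str:
    """Return the FPRF bits as the string 'C FL FG FE FU' (no spaces), per Table 2.

    Hypothetical helper for illustration; it classifies a host double.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF          # biased exponent (11 bits)
    frac = bits & ((1 << 52) - 1)       # fraction (52 bits)

    if exp == 0x7FF and frac != 0:
        if not (frac >> 51):
            # Assumption: Table 2 only lists Quiet NaN, so SNaN is not handled.
            raise ValueError("Signaling NaN is not listed in Table 2")
        return "10001"                  # Quiet NaN
    if exp == 0x7FF:
        return "01001" if sign else "00101"   # -Infinity / +Infinity
    if exp == 0 and frac == 0:
        return "10010" if sign else "00010"   # -Zero / +Zero
    if exp == 0:
        return "11000" if sign else "10100"   # -Denormalized / +Denormalized
    return "01000" if sign else "00100"       # -Normalized / +Normalized
```

For example, `fprf(-0.0)` yields `"10010"`: C=1 and FE=1, which is the "- Zero" row of Table 2.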
Encodings are provided in the data format to represent finite numeric values, ±Infinity, and values that are "Not a Number" (NaN). Operations involving infinities produce results obeying traditional mathematical conventions. NaNs have no mathematical interpretation. Their encoding permits a variable diagnostic information field. NaNs might be used to indicate such things as uninitialized variables and can be produced by certain invalid operations.

There is one class of exceptional events that occur during instruction execution that is unique to categories Vector-Scalar Extension and Floating-Point: the Floating-Point Exception. Floating-point exceptions are signaled with bits set in the FPSCR. They can cause the system floating-point enabled exception error handler to be invoked, precisely or imprecisely, if the proper control bits are set.

This section describes the floating-point arithmetic and exception model supported by category Vector-Scalar Extension. Except for extensions to support 32-bit single-precision floating-point vector operations, the models are identical to those described in Chapter 4. Floating-Point Facility [Category: Floating-Point].

The processor (augmented by appropriate software support, where required) implements a floating-point system compliant with the ANSI/IEEE Standard 754-1985, IEEE Standard for Binary Floating-Point Arithmetic (hereafter referred to as the IEEE standard). That standard defines certain required "operations" (addition, subtraction, and so on). Herein, the term floating-point operation is used to refer to one of these required operations and to additional operations defined (e.g., those performed by Multiply-Add or Reciprocal Estimate instructions). A Non-IEEE mode is also provided. This mode, which is permitted to produce results not in strict compliance with the IEEE standard, allows shorter latency.

Floating-Point Exceptions

The following floating-point exceptions are detected by
the processor:

- Invalid Operation exception (VX)
    SNaN (VXSNAN)
    Infinity-Infinity (VXISI)
    Infinity÷Infinity (VXIDI)
    Zero÷Zero (VXZDZ)
    Infinity×Zero (VXIMZ)
    Invalid Compare (VXVC)
    Software-Defined Condition (VXSOFT)
    Invalid Square Root (VXSQRT)
    Invalid Integer Convert (VXCVI)
- Zero Divide exception (ZX)
- Overflow exception (OX)
- Underflow exception (UX)
- Inexact exception (XX)

Each floating-point exception, and each category of Invalid Operation exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a corresponding enable bit in the FPSCR. See Section 7.2.2, "Floating-Point Status and Control Register" on page 276 for a description of these exception and enable bits, and Section 7.3.3, "VSX Floating-Point Execution Models" on page 289 for a detailed discussion of floating-point exceptions, including the effects of the enable bits.

Instructions are provided to perform arithmetic, rounding, conversion, comparison, and other operations in VSRs, and to move floating-point data between storage and these registers.

These instructions are divided into two categories.

- computational instructions

  The computational instructions are those that perform addition, subtraction, multiplication, division, extracting the square root, rounding, conversion, comparison, and combinations of these operations. These instructions provide the floating-point operations. There are two forms of computational instructions: scalar, which perform a single floating-point operation, and vector, which perform either two double-precision floating-point operations or four single-precision operations. Computational instructions place status information into the Floating-Point Status and Control Register. They are the instructions described in Sections 7.6.3 through 7.6.7.2.

- noncomputational instructions

  The noncomputational instructions are those that
perform loads and stores, move the contents of a VSR to another floating-point register possibly altering the sign, and select the value from one of two VSRs based on the value in a third VSR. The operations performed by these instructions are not considered floating-point operations. These instructions do not alter the Floating-Point Status and Control Register. They are the instructions listed in Sections 7.6.1, 7.6.2.1, and 7.6.8 through 7.6.9.

7.3.2 VSX Floating-Point Data

7.3.2.1 Data Format

This architecture defines the representation of a floating-point value in two different binary fixed-length formats, 32-bit single-precision format and 64-bit double-precision format. The single-precision format is used for SP data in storage and registers. The double-precision format is used for DP data in storage and registers.

The lengths of the exponent and the fraction fields differ between these two formats. The structure of the single-precision and double-precision formats is shown below.

Values in floating-point format are composed of three fields:

S          sign bit
EXP        exponent+bias
FRACTION   fraction

Representation of numeric values in the floating-point formats consists of a sign bit (S), a biased exponent (EXP), and the fraction portion (FRACTION) of the significand. The significand consists of a leading implied bit concatenated on the right with the FRACTION. This leading implied bit is 1 for normalized numbers and 0 for denormalized numbers and is located in the unit bit position (that is, the first bit to the left of the binary point). Values representable within the two floating-point formats can be specified by the parameters listed in Table 3.

S (bit 0) | EXP (bits 1:8) | FRACTION (bits 9:31)
Figure 110.
Floating-point single-precision format

S (bit 0) | EXP (bits 1:11) | FRACTION (bits 12:63)
Figure 111. Floating-point double-precision format

                           Single-Precision Format                             Double-Precision Format
Exponent Bias              +127                                                +1023
Maximum Exponent (Emax)    +127                                                +1023
Minimum Exponent (Emin)    -126                                                -1022
Widths (bits):
  Format                   32                                                  64
  Sign                     1                                                   1
  Exponent                 8                                                   11
  Fraction                 23                                                  52
  Significand              24                                                  53
Nmax                       $(1-2^{-24}) \times 2^{128}$, about $3.4 \times 10^{38}$      $(1-2^{-53}) \times 2^{1024}$, about $1.8 \times 10^{308}$
Nmin                       $1.0 \times 2^{-126}$, about $1.2 \times 10^{-38}$           $1.0 \times 2^{-1022}$, about $2.2 \times 10^{-308}$
Dmin                       $1.0 \times 2^{-149}$, about $1.4 \times 10^{-45}$           $1.0 \times 2^{-1074}$, about $4.9 \times 10^{-324}$

The decimal values are approximate.
Dmin  Smallest (in magnitude) representable denormalized number.
Nmax  Largest (in magnitude) representable number.
Nmin  Smallest (in magnitude) representable normalized number.

Table 3. IEEE floating-point fields

Zero values (±0)
  These are values that have a biased exponent value of zero and a fraction value of zero. Zeros can have a positive or negative sign. The sign of zero is ignored by comparison operations (that is, comparison regards +0 as equal to -0).

Denormalized numbers (±DEN)
  These are values that have a biased exponent value of zero and a nonzero fraction value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They are values in which the implied unit bit is 0.

7.3.2.2 Value Representation

This architecture defines numeric and nonnumeric values representable within each of the two supported formats. The numeric values are approximations to the real numbers and include the normalized numbers, denormalized numbers, and zero values. The nonnumeric values representable are the infinities and the Not a Numbers (NaNs). The infinities are adjoined to the real numbers, but are not numbers themselves, and the standard rules of arithmetic do not hold when they are used in an operation. They are related to the real numbers by order alone. It is possible however to
define restricted operations among numbers and infinities as defined below. The relative location on the real number line for each of the defined entities is shown in Figure 112.

-INF   -NOR   -DEN   -0   +0   +DEN   +NOR   +INF
Figure 112. Approximation to real numbers

The NaNs are not related to the numeric values or infinities by order or value but are encodings used to convey diagnostic information such as the representation of uninitialized variables.

The following is a description of the different floating-point values defined in the architecture:

Binary floating-point numbers
  Machine representable values used as approximations to real numbers. Three categories of numbers are supported: normalized numbers, denormalized numbers, and zero values.

Normalized numbers (±NOR)
  These are values that have a biased exponent value in the range:
    1 to 254 in single-precision format
    1 to 2046 in double-precision format
  They are values in which the implied unit bit is 1.

Denormalized numbers are interpreted as follows:

  $\mathrm{DEN} = (-1)^{s} \times 2^{E_{min}} \times (0.\mathrm{fraction})$

  where Emin is the minimum representable exponent value (-126 for single-precision, -1022 for double-precision).

Infinities (±INF)
  These are values that have the maximum biased exponent value:
    255 in single-precision format
    2047 in double-precision format
  and a zero fraction value. They are used to approximate values greater in magnitude than the maximum normalized value.

  Infinity arithmetic is defined as the limiting case of real arithmetic, with restricted operations defined among numbers and infinities. Infinities and the real numbers can be related by ordering in the affine sense:

    -Infinity < every finite number < +Infinity

  Arithmetic on infinities is always exact and does not signal any exception, except when an exception occurs due to the invalid operations as described in Section 7.4.1, "Floating-Point Invalid Operation Exception" on page 295.
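The parameters of Table 3 and the denormalized interpretation given in this section can be cross-checked numerically. The sketch below is illustrative only: the names `FORMATS`, `params`, and `denormal_value` are hypothetical, and exact rational arithmetic is used so that no host rounding interferes.

```python
import sys
from fractions import Fraction

# (exponent width, fraction width) in bits, taken from Table 3.
FORMATS = {"single": (8, 23), "double": (11, 52)}

def params(exp_bits: int, frac_bits: int):
    """Derive bias, Emax, Emin, Nmax, Nmin, Dmin from the field widths."""
    bias = (1 << (exp_bits - 1)) - 1
    emax, emin = bias, 1 - bias
    p = frac_bits + 1                                   # significand width
    nmax = (1 - Fraction(1, 2**p)) * Fraction(2) ** (emax + 1)
    nmin = Fraction(2) ** emin
    dmin = Fraction(2) ** (emin - frac_bits)
    return bias, emax, emin, nmax, nmin, dmin

def denormal_value(sign: int, fraction: int, exp_bits: int, frac_bits: int) -> Fraction:
    """DEN = (-1)^s x 2^Emin x (0.fraction), with fraction given as an integer field."""
    emin = 2 - (1 << (exp_bits - 1))
    return (-1) ** sign * Fraction(2) ** emin * Fraction(fraction, 2**frac_bits)

bias, emax, emin, nmax, nmin, dmin = params(*FORMATS["double"])
# On a host whose float is IEEE double, these agree with the Table 3 column.
print(nmax == Fraction(sys.float_info.max), nmin == Fraction(sys.float_info.min))
print(denormal_value(0, 1, 11, 52) == dmin)
```

Deriving the limits from the widths, rather than hard-coding them, makes the relation between the rows of Table 3 explicit: Dmin is Nmin scaled down by the fraction width.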
  For comparison operations, +Infinity compares equal to +Infinity and -Infinity compares equal to -Infinity.

Normalized numbers are interpreted as follows:

  $\mathrm{NOR} = (-1)^{s} \times 2^{E} \times (1.\mathrm{fraction})$

  where s is the sign, E is the unbiased exponent, and 1.fraction is the significand, which is composed of a leading unit bit (implied bit) and a fraction part.

Not a Numbers (NaNs)
  These are values that have the maximum biased exponent value and a nonzero fraction value. The sign bit is ignored (that is, NaNs are neither positive nor negative). If the high-order bit of the fraction field is 0, the NaN is a Signaling NaN; otherwise it is a Quiet NaN.

  Signaling NaNs are used to signal exceptions when they appear as operands of computational instructions.

  Quiet NaNs are used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when Invalid Operation exception is disabled (VE=0). Quiet NaNs propagate through all floating-point operations except ordered comparison and conversion to integer. Quiet NaNs do not signal exceptions, except for ordered comparison and conversion to integer operations. Specific encodings in QNaNs can thus be preserved through a sequence of floating-point operations, and used to convey diagnostic information to help identify results from invalid operations.

  Assume the following generic arithmetic templates.

    f(src1,src3,src2)
      ex: result = (src1 x src3) - src2
    f(src1,src2)
      ex: result = src1 x src2
      ex: result = src1 + src2
    f(src1)
      ex: result = f(src1)

  When a QNaN is the result of a floating-point operation because one of the operands is a NaN or because a QNaN was generated due to a trap-disabled Invalid Operation exception, the following rule is applied to determine the NaN with the high-order fraction bit set to 1 that is to be stored as the result.

    if src1 is a NaN
      then result = Quiet(src1)
    else if src2 is a NaN (if there is a src2)
      then result = Quiet(src2)
    else if src3 is a NaN (if there is a src3)
      then result = Quiet(src3)
    else if disabled invalid operation exception
      then result = generated QNaN

  where Quiet(x) means x if x is a QNaN and x converted to a QNaN if x is an SNaN. Any instruction that generates a QNaN as the result of a disabled Invalid Operation exception generates the value 0x7FF8_0000_0000_0000 for double-precision and 0x7FC0_0000 for single-precision.

  Note that the M-form multiply-add-type instructions use the B source operand to specify src3 and the T target operand to specify src2, whereas A-form multiply-add-type instructions use the B source operand to specify src2 and the T target operand to specify src3.

  A double-precision NaN is considered to be representable in single-precision format if and only if the low-order 29 bits of the double-precision NaN's fraction are zero.

7.3.2.3 Sign of Result

The following rules govern the sign of the result of an arithmetic, rounding, or conversion operation, when the operation does not yield an exception. They apply even when the operands or results are zeros or infinities.

- The sign of the result of an add operation is the sign of the operand having the larger absolute value. If both operands have the same sign, the sign of the result of an add operation is the same as the sign of the operands. The sign of the result of the subtract operation x-y is the same as the sign of the result of the add operation x+(-y).

  When the sum of two operands with opposite sign, or the difference of two operands with the same sign, is exactly zero, the sign of the result is positive in all rounding modes except Round toward -Infinity, in which mode the sign is negative.

- The sign of the result of a multiply or divide operation is the Exclusive OR of the signs of the operands.

- The sign of the result of a Square Root or Reciprocal Square Root Estimate operation is always positive, except that the square root of -0 is -0 and the reciprocal square root of -0 is -Infinity.

- The sign of the result of a Convert From Integer or Round to Floating-Point Integer operation is the sign of the operand being converted.

For the Multiply-Add instructions, the rules given above are applied first to the multiply operation and then to the add or subtract operation (one of the inputs to the add or subtract operation is the result of the multiply operation).

7.3.2.4 Normalization and Denormalization

The intermediate result of an arithmetic instruction can require normalization and/or denormalization as described below. Normalization and denormalization do not affect the sign of the result.

When an arithmetic or rounding instruction produces an intermediate result which carries out of the significand, or in which the significand is nonzero but has a leading zero bit, it is not a normalized number and must be normalized before it is stored. For the carry-out case, the significand is shifted right one bit, with a one shifted into the leading significand bit, and the exponent is incremented by one. For the leading-zero case, the significand is shifted left while decrementing its exponent by one for each bit shifted, until the leading significand bit becomes one. The Guard bit and the Round bit (see Section 7.3.3.1, "VSX Execution Model for IEEE Operations" on page 289) participate in the shift with zeros shifted into the Round bit. The exponent is regarded as if its range were unlimited.

After normalization, or if normalization was not required, the intermediate result can have a nonzero significand and an exponent value that is less than the minimum value that can be represented in the format specified for the result. In this case, the intermediate result is said to be "Tiny" and the stored result is determined by the rules described in Section 7.4.4, "Floating-Point Underflow Exception" on page 306. These rules can require denormalization.

A number is denormalized by shifting its significand right while incrementing its exponent by 1 for each bit shifted, until the exponent is equal to the format's minimum value. If any significant bits are lost in this shifting process, "Loss of Accuracy" has occurred (see Section 7.4.4, "Floating-Point Underflow Exception" on page 306) and Underflow exception is signaled.

Engineering Note
When denormalized numbers are operands of multiply, divide, and square root operations, some implementations might prenormalize the operands internally before performing the operations.

1. Rounding to a floating-point integer

   The VSX Scalar Round to Floating-Point Integer instructions round a double-precision operand to an integer value in double-precision format.

   The VSX Vector Round to Double-Precision Integer instructions round each double-precision vector operand element to an integer value in double-precision format.

   The VSX Vector Round to Single-Precision Integer instructions round each single-precision vector operand element to an integer value in single-precision format.

   Except for xsrdpic, xvrdpic, and xvrspic, rounding is performed using the rounding mode specified by the opcode. For xsrdpic, xvrdpic, and xvrspic, rounding is performed using the rounding mode specified by RN.

   These instructions can cause Invalid Operation (VXSNAN) exceptions. xsrdpic, xvrdpic, and xvrspic can also cause Inexact exception. See Sections 7.3.2.6 and 7.3.3.1 for more information about rounding.

2. Converting floating-point format to integer format

   The VSX Scalar Convert Double-Precision to Integer instructions convert a double-precision operand to 32-bit or 64-bit signed or unsigned integer format.

   The VSX Vector Convert Double-Precision to
Integer and VSX Vector Convert Single-Precision to Integer instructions convert either double-precision or single-precision vector operand elements to 32-bit or 64-bit signed or unsigned integer format.

   Rounding is performed based on the value of RN. These instructions can cause Invalid Operation (VXSNAN, VXCVI) and Inexact exceptions.

3. Converting integer format to floating-point format

   The VSX Scalar Convert Integer to Double-Precision instructions convert a 32-bit or 64-bit signed or unsigned integer to double-precision floating-point format.

   The VSX Vector Convert Integer to Double-Precision instructions convert each 32-bit or 64-bit signed or unsigned integer vector operand element to double-precision floating-point format.

   The VSX Vector Convert Integer to Single-Precision instructions convert each 32-bit or 64-bit signed or unsigned integer vector operand element to single-precision floating-point format.

   Rounding is performed using Round towards Zero rounding mode. Because of the limitations of the source format, only an Inexact exception can be generated.

7.3.2.5 Data Handling and Precision

Instructions are also provided for manipulations which do not require double-precision or single-precision. In addition, instructions are provided to access an integer representation in GPRs.

Integer-Valued Operands

Instructions are provided to round floating-point operands to integer values in floating-point format. To facilitate exchange of data between the floating-point and integer processing, instructions are provided to convert between floating-point double and single-precision format and integer word and doubleword format in a VSR. Computation on integer-valued operands can be performed using arithmetic instructions of the required precision. (The results might not be integer values.) The three groups of instructions provided specifically to support integer-valued operands are described in numbered items 1 through 3 of this section.

7.3.2.6 Rounding

The material in this section applies to operations that have numeric operands (that is, operands that are not infinities or NaNs). Rounding the intermediate result of such an operation can cause an Overflow exception, an Underflow exception, or an Inexact exception. The remainder of this section assumes that the operation causes no exceptions and that the result is numeric. See Section 7.3.2.2, "Value Representation" and Section 7.4, "VSX Floating-Point Exceptions" for the cases not covered here.

The floating-point arithmetic, and rounding and conversion instructions round their intermediate results. With the exception of the estimate instructions, these instructions produce an intermediate result that can be regarded as having unbounded precision and exponent range. All but two groups of these instructions normalize or denormalize the intermediate result prior to rounding and then place the final result into the target element of the target VSR in either double or single-precision format.

The scalar round to double-precision integer, vector round to double-precision integer, and convert double-precision to integer instructions with biased exponents ranging from 1022 through 1074 are prepared for rounding by repetitively shifting the significand right one position and incrementing the biased exponent until it reaches a value of 1075. (Intermediate results with biased exponents 1075 or larger are already integers, and with biased exponents 1021 or less round to zero.) After rounding, the final result for round to double-precision integer instructions is normalized and put in double-precision format, and, for the convert double-precision to integer instructions, is converted to a signed or unsigned integer.

The vector round to single-precision integer and vector convert single-precision to integer instructions with biased exponents ranging from 126 through 178 are prepared for rounding by repetitively shifting the significand right one position and incrementing the biased exponent until it reaches a value of 179. (Intermediate results with biased exponents 179 or larger are already integers, and with biased exponents 125 or less round to zero.) After rounding, the final result for vector round to single-precision integer is normalized and put in double-precision format, and for vector convert single-precision to integer is converted to a signed or unsigned integer.

FR and FI generally indicate the results of rounding. Each of the scalar instructions which rounds its intermediate result sets these bits. There are no vector instructions that modify FR and FI. If the fraction is incremented during rounding, FR is set to 1, otherwise FR is set to 0. If the result is inexact, FI is set to 1, otherwise FI is set to 0. The scalar round to double-precision integer instructions are exceptions to this rule, setting FR and FI to 0. The scalar double-precision estimate instructions set FR and FI to undefined values. The remaining scalar floating-point instructions do not alter FR and FI.

Four user-selectable rounding modes are provided through the Floating-Point Rounding Control field in the FPSCR. See Section 7.2.2, "Floating-Point Status and Control Register" on page 276. These are encoded as follows.

RN   Rounding Mode
00   Round to Nearest Even
01   Round towards Zero
10   Round towards +Infinity
11   Round towards -Infinity

A fifth rounding mode is provided in the round to floating-point integer instructions (Section 7.6.7.2 on page 320), Round to Nearest Away.

Let Z be the intermediate arithmetic result or the operand of a convert operation. If Z can be represented exactly in the target format, the result in all rounding modes is Z as represented in the target format. If Z cannot be represented exactly in the target format, let Z1 and Z2 bound Z as the next larger and next smaller numbers representable in the target format. Then Z1 or Z2 can be used to approximate the result in the target format.

Figure 113 shows the relation of Z, Z1, and Z2 in this case. The following rules specify the rounding in the four modes.

See Section 7.3.3.1, "VSX Execution Model for IEEE Operations" on page 289 for a detailed explanation of rounding.

Figure 113 also summarizes the rounding actions for floating-point intermediate result for all supported rounding modes.

[Figure: number line through 0 showing the infinitely precise value Z lying between Z2 and Z1, once for negative values and once for positive values; the neighbors of Z are obtained by incrementing the least-significant bit of Z or by truncating after the least-significant bit.]

Round to Nearest Away    Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one that is furthest away from 0.
Round to Nearest Even    Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one that is even (least significant bit is 0).
Round toward Zero        Choose the smaller in magnitude (Z1 or Z2).
Round toward +Infinity   Choose Z1.
Round toward -Infinity   Choose Z2.

Figure 113. Selection of Z1 and Z2

7.3.3 VSX Floating-Point Execution Models

All implementations of this architecture must provide the equivalent of the following execution models to ensure that identical results are obtained.

[Figure 114 layout: fields S, C, L, FRACTION, G, R, X, with bit position labels 0, 1, 53, 54, 55; bits 0:55 comprise the significand of the intermediate result.]

Special rules are provided in the definition of the computational instructions for the infinities, denormalized numbers and NaNs.
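The selection rules summarized in Figure 113 can be sketched for the simplest target "format", the integers, which is the situation of the round to floating-point integer instructions. This is an illustrative sketch only: the function name `round_to_integer` and the mode strings are hypothetical, and the infinitely precise value Z is modeled as an exact rational.

```python
import math
from fractions import Fraction

def round_to_integer(z: Fraction, mode: str) -> int:
    """Pick Z1 (next larger) or Z2 (next smaller) integer neighbor of z."""
    z2 = math.floor(z)            # Z2: next smaller representable value
    z1 = math.ceil(z)             # Z1: next larger representable value
    if z1 == z2:
        return z1                 # z is exactly representable in every mode
    if mode == "toward_zero":
        return z1 if z < 0 else z2    # the smaller in magnitude
    if mode == "toward_plus_inf":
        return z1
    if mode == "toward_minus_inf":
        return z2
    d1, d2 = z1 - z, z - z2       # distances to the two neighbors
    if d1 != d2:
        return z1 if d1 < d2 else z2  # both nearest modes agree off a tie
    if mode == "nearest_even":
        return z1 if z1 % 2 == 0 else z2
    if mode == "nearest_away":
        return z1 if z > 0 else z2    # the neighbor farther from 0
    raise ValueError(f"unknown mode: {mode}")

for m in ("nearest_even", "nearest_away", "toward_zero",
          "toward_plus_inf", "toward_minus_inf"):
    print(m, round_to_integer(Fraction(5, 2), m), round_to_integer(Fraction(-5, 2), m))
```

The two nearest modes differ only on ties: 5/2 goes to 2 under Round to Nearest Even and to 3 under Round to Nearest Away.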
Figure 114. IEEE floating-point execution model

The S bit is the sign bit.

The C bit is the carry bit, which captures the carry out of the significand.

The L bit is the leading unit bit of the significand, which receives the implicit bit from the operand.

The FRACTION is a 52-bit field that accepts the fraction of the operand.

The Guard (G), Round (R), and Sticky (X) bits are extensions to the low-order bits of the accumulator. The G and R bits are required for postnormalization of the result. The G, R, and X bits are required during rounding to determine if the intermediate result is equally near the two nearest representable values. The X bit serves as an extension to the G and R bits by representing the logical OR of all bits that appear to the low-order side of the R bit, resulting from either shifting the accumulator right or from other generation of low-order result bits. The G and R bits participate in the left shifts with zeros being shifted into the R bit.

The material in the remainder of this section applies to instructions that have numeric operands and a numeric result (that is, operands and result that are not infinities or NaNs), and that cause no exceptions. See Section 7.3.2.2 and Section 7.4 for the cases not covered here.

Although the double-precision format specifies an 11-bit exponent, exponent arithmetic makes use of two additional bits to avoid potential transient overflow and underflow conditions. One extra bit is required when denormalized double-precision numbers are prenormalized. The second bit is required to permit the computation of the adjusted exponent value in the following cases when the corresponding exception enable bit is 1:

- Underflow during multiplication using a denormalized operand.
- Overflow during division using a denormalized divisor.
- Underflow during division using denormalized dividend and a large divisor.

The IEEE standard includes 32-bit and 64-bit arithmetic.
The standard requires that single-precision Table 4 shows the significance of the G, R, and X bits arithmetic be provided for single-precision operands. with respect to the intermediate result (IR), the representable number next lower in magnitude (NL), VSX defines both scalar and vector double-precision and the representable number next higher in floating-point operations to operate only on magnitude (NH). double-precision operands. VSX also defines vector single-precision floating-point operations to operate G R X Interpretation only on single-precision operands. 0 0 0 IR is exact 7.3.3.1 VSX Execution Model for IEEE 0 0 1 Operations 0 1 0 IR closer to NL The following description uses 64-bit arithmetic as an 0 1 1 example. 32-bit arithmetic is similar except that the 1 0 0 IR midway between NL and NH FRACTION is a 23-bit field, and the single-precision 1 0 1 Guard, Round, and Sticky bits (described in this section) are logically adjacent to the 23-bit FRACTION 1 1 0 IR closer to NH field. 1 1 1 IEEE-conforming significand arithmetic is considered Table 4. Interpretation of G, R, and X bits to be performed with a floating-point accumulator Table 5 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers relative to the accumulator illustrated in Figure 114. Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 289 Version 2.06 ­ Round to Nearest Away Format Guard Round Sticky Guard bit = 0 Double G bit R bit X bit The result is truncated. Single 24 25 OR of bits 26:52, G, R, X Guard bit = 1 Table 5. Location of the Guard, Round, and The result is incremented. Sticky bits in the IEEE execution model If any of the Guard, Round, or Sticky bits is nonzero, The significand of the intermediate result is prepared the result is also inexact. 
for rounding by shifting its contents right, if required, until the least significant bit to be retained is in the If rounding results in a carry into C, the significand is low-order bit position of the fraction. shifted right one position and the exponent is incremented by one. This yields an inexact result, and Four user-selectable rounding modes are provided possibly also exponent overflow. Fraction bits are through RN as described in Section 7.3.2.6, stored to the target VSR. "Rounding" on page 287. The rules for rounding in each mode are as follows. 7.3.3.2 VSX Execution Model for ­ Round to Nearest Even Multiply-Add Type Instructions Guard bit = 0 This architecture provides a special form of instruction The result is truncated. that performs up to three operations in one instruction (a multiplication, an addition, and a negation). With this Guard bit = 1 added capability comes the special ability to produce a Depends on Round and Sticky bits: more exact intermediate result as input to the rounder. 32-bit arithmetic is similar, except that the FRACTION Case a field is smaller. If the Round or Sticky bit is 1 (inclusive), the result is incremented. Multiply-add significand arithmetic is considered to be performed with a floating-point accumulator having the Case b following format, where bits 0:106 comprise the If the Round and Sticky bits are 0 (result significand of the intermediate result. midway between closest representable values), if the low-order bit of the result is 1, the result is incremented. Otherwise (the S C L FRACTION X' low-order bit of the result is 0), the result is 0 1 2 3 106 truncated. This is the case of a tie rounded to Figure 115.Multiply-add 64-bit execution model even. The first part of the operation is a multiplication. The ­ Round toward Zero multiplication has two 53-bit significands as inputs, Choose the smaller in magnitude of Z1 or Z2. 
If which are assumed to be prenormalized, and produces the Guard, Round, or Sticky bit is nonzero, the a result conforming to the above model. If there is a result is inexact. carry out of the significand (into the C bit), the The result is truncated. significand is shifted right one position, shifting the L bit (leading unit bit) into the most significant bit of the ­ Round toward +Infinity FRACTION and shifting the C bit (carry out) into the L If positive, the result is incremented. bit. All 106 bits (L bit, the FRACTION) of the product If negative, the result is truncated. take part in the add operation. If the exponents of the two inputs to the adder are not equal, the significand of ­ Round toward -Infinity the operand with the smaller exponent is aligned If positive, the result is truncated. (shifted) to the right by an amount that is added to that If negative, the result is incremented. exponent to make it equal to the other input's A fifth rounding mode is provided in the VSX Round to exponent. Zeros are shifted into the left of the Floating-Point Integer instructions (Section 7.6.7.2 on significand as it is aligned and bits shifted out of bit page 320) with the rules for rounding as follows. 105 of the significand are ORed into the X' bit. The add operation also produces a result conforming to the above model with the X' bit taking part in the add operation. 290 Power ISATM Book I Version 2.06 The result of the addition is then normalized, with all bits of the addition result, except the X' bit, participating in the shift. The normalized result serves as the intermediate result that is input to the rounder. For rounding, the conceptual Guard, Round, and Sticky bits are defined in terms of accumulator bits. Figure 6 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers in the multiply-add execution model. 
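As an illustration of the rounding rules in Section 7.3.3.1, the following Python sketch applies each rounding mode to an integer significand accompanied by its Guard, Round, and Sticky bits. It is a minimal model, not an implementation of any instruction. The function names and mode strings are hypothetical, and the sketch assumes that the directed modes increment only when at least one of the three bits is nonzero (an exact result is left unchanged).

```python
def round_intermediate(significand, guard, round_bit, sticky, negative, mode):
    """Round a truncated significand using the Guard/Round/Sticky rules.

    significand holds the retained bits (L bit plus FRACTION) as an integer.
    Returns (rounded_significand, inexact).
    """
    inexact = bool(guard or round_bit or sticky)
    if mode == "nearest_even":
        # Guard=1 with Round or Sticky set: above the midpoint, increment.
        # Guard=1 with Round=Sticky=0: tie, increment only if the low bit is 1.
        increment = bool(guard) and bool(round_bit or sticky or (significand & 1))
    elif mode == "nearest_away":
        increment = bool(guard)
    elif mode == "toward_zero":
        increment = False
    elif mode == "toward_plus_infinity":
        # Assumption: an exact result is never changed.
        increment = inexact and not negative
    elif mode == "toward_minus_infinity":
        increment = inexact and negative
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return significand + int(increment), inexact


def renormalize_after_round(significand, exponent, width=53):
    """If rounding carried out of the L bit (into C), shift right and bump the exponent."""
    if significand >> width:
        return significand >> 1, exponent + 1
    return significand, exponent
```

For a double-precision result, `width=53` covers the L bit and the 52-bit FRACTION; a rounded value of `1 << 53` therefore represents a carry into C and is shifted back with the exponent incremented.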
    Format   Guard   Round   Sticky
    Double   53      54      OR of 55:105, X'
    Single   24      25      OR of 26:105, X'

Table 6. Location of the Guard, Round, and Sticky bits in the multiply-add execution model

The rules for rounding the intermediate result are the same as those given in Section 7.3.3.1.

If the instruction is a negative multiply-add or negative multiply-subtract type instruction, the final result is negated.

7.4 VSX Floating-Point Exceptions

This architecture defines the following floating-point exceptions under the IEEE-754 exception model:

- Invalid Operation exception
    SNaN
    Infinity - Infinity
    Infinity ÷ Infinity
    Zero ÷ Zero
    Infinity × Zero
    Invalid Compare
    Software-Defined Condition
    Invalid Square Root
    Invalid Integer Convert
- Zero Divide exception
- Overflow exception
- Underflow exception
- Inexact exception

These exceptions, other than an Invalid Operation exception resulting from a Software-Defined Condition, can occur during execution of computational instructions. An Invalid Operation exception resulting from a Software-Defined Condition occurs when a Move To FPSCR instruction sets VXSOFT to 1.

Each floating-point exception, and each category of Invalid Operation exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a corresponding enable bit in the FPSCR. The exception bit indicates the occurrence of the corresponding exception. If an exception occurs, the corresponding enable bit governs the result produced by the instruction and, in conjunction with the FE0 and FE1 bits (see page 293), whether and how the system floating-point enabled exception error handler is invoked. In general, what the enable bit enables is invocation of the system error handler, not the occurrence of the exception. The occurrence of an exception depends only on the instruction and its inputs, not on the setting of any control bits. The only deviation from this general rule is that the occurrence of an Underflow exception depends on the setting of the enable bit.

A single instruction, other than mtfsfi or mtfsf, can set more than one exception bit only in the following cases:

- An Inexact exception can be set with an Overflow exception.
- An Inexact exception can be set with an Underflow exception.
- An Invalid Operation exception (SNaN) can be set with an Invalid Operation exception (Infinity × Zero) for multiply-add class instructions for which the values being multiplied are infinity and zero and the value being added is an SNaN.
- An Invalid Operation exception (SNaN) can be set with an Invalid Operation exception (Invalid Compare) for ordered comparison instructions.
- An Invalid Operation exception (SNaN) can be set with an Invalid Operation exception (Invalid Integer Convert) for convert to integer instructions.

When an exception occurs, the writing of a result to the target register can be suppressed, or a result can be delivered, depending on the exception.

The writing of a result to the target register is suppressed for certain kinds of exceptions, based on whether the instruction is a vector or a scalar instruction, so that there is no possibility that one of the operands is lost. For other kinds of exceptions, and also depending on whether the instruction is a vector or a scalar instruction, a result is generated and written to the destination specified by the instruction causing the exception. The result can be a different value for the enabled and disabled conditions for some of these exceptions. Table 7 lists the types of exceptions and indicates whether a result is written to the target VSR or suppressed.

    On exception type...         Scalar Instruction Results   Vector Instruction Results
    Enabled Invalid Operation    suppressed                   suppressed
    Enabled Zero Divide          suppressed                   suppressed
    Enabled Overflow             written                      suppressed
    Enabled Underflow            written                      suppressed
    Enabled Inexact              written                      suppressed
    Disabled Invalid Operation   written                      written
    Disabled Zero Divide         written                      written
    Disabled Overflow            written                      written
    Disabled Underflow           written                      written
    Disabled Inexact             written                      written

Table 7. Exception Types Result Suppression

The subsequent sections define each of the floating-point exceptions and specify the action that is taken when they are detected.

The IEEE standard specifies the handling of exceptional conditions in terms of traps and trap handlers. In this architecture, an FPSCR exception enable bit of 1 causes generation of the result value specified in the IEEE standard for the trap enabled case; the expectation is that the exception is detected by software, which revises the result. An FPSCR exception enable bit of 0 causes generation of the default result value specified for the trap disabled (or no trap occurs or trap is not implemented) case. The expectation is that the exception is not detected by software, which uses the default result. The result to be delivered in each case for each exception is described in the following sections.

The IEEE default behavior when an exception occurs is to generate a default value and not to notify software. In this architecture, if the IEEE default behavior when an exception occurs is required for all exceptions, all FPSCR exception enable bits must be set to 0, and Ignore Exceptions Mode (see below) should be used. In this case, the system floating-point enabled exception error handler is not invoked, even if floating-point exceptions occur: software can inspect the FPSCR exception bits, if necessary, to determine whether exceptions have occurred.

In this architecture, if software is to be notified that a given kind of exception has occurred, the corresponding FPSCR exception enable bit must be set to 1, and a mode other than Ignore Exceptions Mode must be used. In this case, the system floating-point enabled exception error handler is invoked if an enabled floating-point exception occurs. The system floating-point enabled exception error handler is also invoked if a Move To FPSCR instruction causes an exception bit and the corresponding enable bit both to be 1. The Move To FPSCR instruction is considered to cause the enabled exception.

The FE0 and FE1 bits control whether and how the system floating-point enabled exception error handler is invoked if an enabled floating-point exception occurs. The location of these bits and the requirements for altering them are described in Book III. The system floating-point enabled exception error handler is never invoked because of a disabled floating-point exception. The effects of the four possible settings of these bits are as follows.

    FE0 FE1   Description

    0   0     Ignore Exceptions Mode
              Floating-point exceptions do not cause the system floating-point enabled exception error handler to be invoked.

    0   1     Imprecise Nonrecoverable Mode
              The system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. It may not be possible to identify the excepting instruction or the data that caused the exception. Results produced by the excepting instruction might have been used by or might have affected subsequent instructions that are executed before the error handler is invoked.

    1   0     Imprecise Recoverable Mode
              The system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. Sufficient information is provided to the error handler for it to identify the excepting instruction and the operands, and to correct the result. No results produced by the excepting instruction have been used by or affected subsequent instructions that are executed before the error handler is invoked.

    1   1     Precise Mode
              The system floating-point enabled exception error handler is invoked precisely at the instruction that caused the enabled exception.

Architecture Note
    The FE0 and FE1 bits must be defined in Book III in a manner such that they can be changed dynamically and can easily be treated as part of a process' state.

In all cases, the question of whether a floating-point result is stored, and what value is stored, is governed by the FPSCR exception enable bits, as described in subsequent sections, and is not affected by the value of the FE0 and FE1 bits.

In all cases in which the system floating-point enabled exception error handler is invoked, all instructions before the instruction at which the system floating-point enabled exception error handler is invoked have been completed, and no instruction after the instruction at which the system floating-point enabled exception error handler is invoked has begun execution. The instruction at which the system floating-point enabled exception error handler is invoked has completed if it is the excepting instruction, and there is only one such instruction. Otherwise, it has not begun execution, or has been partially executed in some cases, as described in Book III.

Programming Note
    In any of the three non-Precise modes, a Floating-Point Status and Control Register instruction can be used to force any exceptions, because of instructions initiated before the Floating-Point Status and Control Register instruction, to be recorded in the FPSCR. (This forcing is superfluous for Precise Mode.)

    In both Imprecise modes, a Floating-Point Status and Control Register instruction can be used to force any invocations of the system floating-point enabled exception error handler that result from instructions initiated before the Floating-Point Status and Control Register instruction to occur. This forcing has no effect in Ignore Exceptions Mode, and is superfluous for Precise Mode.

    The last sentence of the paragraph preceding this Programming Note can apply only in the Imprecise modes, or if the mode has just been changed from Ignore Exceptions Mode to some other mode. It always applies in the latter case.

    To obtain the best performance across the widest range of implementations, the programmer should obey the following guidelines.

    - If the IEEE default results are acceptable to the application, Ignore Exceptions Mode should be used with all FPSCR exception enable bits set to 0.
    - If the IEEE default results are not acceptable to the application, Imprecise Nonrecoverable Mode should be used, or Imprecise Recoverable Mode if recoverability is needed, with FPSCR exception enable bits set to 1 for those exceptions for which the system floating-point enabled exception error handler is to be invoked.
    - Ignore Exceptions Mode should not, in general, be used when any FPSCR exception enable bits are set to 1.
    - Precise Mode can degrade performance in some implementations, perhaps substantially, and therefore should be used only for debugging and other specialized applications.

Engineering Note
    It is permissible for the implementation to be precise in any of the three modes that permit interrupts, or to be recoverable in Nonrecoverable Mode.
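The interplay between the enable bits, the FE0 and FE1 bits, and Table 7 can be summarized in a short Python sketch. It only models the decisions described above (whether a result is written, and whether the error handler is invoked at all); it says nothing about when an imprecise mode delivers the interrupt. All names (`FE_MODES`, `result_written`, `handler_invoked`, and the exception strings) are hypothetical and chosen for illustration.

```python
# Mode names for each (FE0, FE1) setting, as listed in the table above.
FE_MODES = {
    (0, 0): "Ignore Exceptions Mode",
    (0, 1): "Imprecise Nonrecoverable Mode",
    (1, 0): "Imprecise Recoverable Mode",
    (1, 1): "Precise Mode",
}


def result_written(exception, enabled, is_vector):
    """Table 7: is a result written to the target VSR?

    exception is one of "invalid_operation", "zero_divide", "overflow",
    "underflow", "inexact". The FE0/FE1 bits play no part in this decision.
    """
    if not enabled:
        return True  # disabled exceptions always deliver a default result
    if exception in ("invalid_operation", "zero_divide"):
        return False  # suppressed for scalar and vector instructions
    return not is_vector  # overflow/underflow/inexact: scalar written, vector suppressed


def handler_invoked(fe0, fe1, exception_bit, enable_bit):
    """True if the enabled exception error handler is invoked (at some point)."""
    if (fe0, fe1) == (0, 0):
        return False  # Ignore Exceptions Mode
    return bool(exception_bit and enable_bit)
```

For example, `result_written("overflow", enabled=True, is_vector=True)` is `False`, while the same exception on a scalar instruction delivers an (adjusted) result.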
7.4.1 Floating-Point Invalid Operation Exception

7.4.1.1 Definition

An Invalid Operation exception occurs when an operand is invalid for the specified operation. The invalid operations are:

SNaN
    Any floating-point operation on a Signaling NaN.
Infinity - Infinity
    Magnitude subtraction of infinities.
Infinity ÷ Infinity
    Floating-point division of infinity by infinity.
Zero ÷ Zero
    Floating-point division of zero by zero.
Infinity × Zero
    Floating-point multiplication of infinity by zero.
Invalid Compare
    Floating-point ordered comparison involving a NaN.
Invalid Square Root
    Floating-point square root or reciprocal square root of a nonzero negative number.
Invalid Integer Convert
    Floating-point-to-integer convert involving a number too large in magnitude to be represented in the target format, or involving an infinity or a NaN.

An Invalid Operation exception also occurs when an mtfsfi, mtfsf, or mtfsb1 instruction is executed that sets VXSOFT to 1 (Software-Defined Condition).

The action to be taken depends on the setting of the Invalid Operation Exception Enable bit of the FPSCR.

7.4.1.2 Action for VE=1

When Invalid Operation exception is enabled (VE=1) and an Invalid Operation exception occurs, the following actions are taken:

For VSX Scalar Floating-Point Arithmetic, VSX Scalar DP-SP Conversion, VSX Scalar Convert Floating-Point to Integer, and VSX Scalar Round to Floating-Point Integer instructions:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXISI (if Infinity - Infinity)
       VXIDI (if Infinity ÷ Infinity)
       VXZDZ (if Zero ÷ Zero)
       VXIMZ (if Infinity × Zero)
       VXSQRT (if Invalid Square Root)
       VXCVI (if Invalid Integer Convert)
2. Update of VSR[XT] is suppressed.
3. FR and FI are set to zero.
4. FPRF is unchanged.

For VSX Scalar Floating-Point Compare instructions:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXVC (if Invalid Compare)
2. FR, FI, and C are unchanged.
3. FPCC is set to reflect unordered.

For VSX Vector Floating-Point Arithmetic, VSX Vector Floating-Point Compare, VSX Vector DP-SP Conversion, VSX Vector Convert Floating-Point to Integer, and VSX Vector Round to Floating-Point Integer instructions:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXISI (if Infinity - Infinity)
       VXIDI (if Infinity ÷ Infinity)
       VXZDZ (if Zero ÷ Zero)
       VXIMZ (if Infinity × Zero)
       VXVC (if Invalid Compare)
       VXSQRT (if Invalid Square Root)
       VXCVI (if Invalid Integer Convert)
2. Update of VSR[XT] is suppressed for all vector elements.
3. FR and FI are unchanged.
4. FPRF is unchanged.

7.4.1.3 Action for VE=0

When Invalid Operation exception is disabled (VE=0) and an Invalid Operation exception occurs, the following actions are taken:

For the VSX Scalar Double-Precision Arithmetic instructions, VSX Scalar Double-Precision Maximum/Minimum instructions, the VSX Scalar Convert Single-Precision to Double-Precision format (xscvspdp) instruction, and the VSX Scalar Round to Double-Precision Integer instructions:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXISI (if Infinity - Infinity)
       VXIDI (if Infinity ÷ Infinity)
       VXZDZ (if Zero ÷ Zero)
       VXIMZ (if Infinity × Zero)
       VXSQRT (if Invalid Square Root)
2. The double-precision representation of a Quiet NaN is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined.
3. FR and FI are set to 0.
4. FPRF is set to indicate the class of the result (Quiet NaN).

For the VSX Vector Double-Precision Arithmetic instructions, VSX Vector Double-Precision Maximum/Minimum instructions, the VSX Vector Convert Single-Precision to Double-Precision format (xvcvspdp) instruction, and the VSX Vector Round to Double-Precision Integer instructions:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXISI (if Infinity - Infinity)
       VXIDI (if Infinity ÷ Infinity)
       VXZDZ (if Zero ÷ Zero)
       VXIMZ (if Infinity × Zero)
       VXSQRT (if Invalid Square Root)
2. The double-precision representation of a Quiet NaN is placed into its respective doubleword element of VSR[XT].
3. FR, FI, and FPRF are not modified.

For the VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp) instruction:

1. VXSNAN is set to 1.
2. The single-precision representation of a Quiet NaN is placed into word element 0 of VSR[XT]. The contents of word elements 1-3 of VSR[XT] are undefined.
3. FR and FI are set to 0.
4. FPRF is set to indicate the class of the result (Quiet NaN).

For the VSX Vector Single-Precision Arithmetic instructions, VSX Vector Single-Precision Maximum/Minimum instructions, the VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp) instruction, and the VSX Vector Round to Single-Precision Integer instructions:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXISI (if Infinity - Infinity)
       VXIDI (if Infinity ÷ Infinity)
       VXZDZ (if Zero ÷ Zero)
       VXIMZ (if Infinity × Zero)
       VXSQRT (if Invalid Square Root)
2. The single-precision representation of a Quiet NaN is placed into its respective word element of VSR[XT].
3. FR, FI, and FPRF are not modified.

For the VSX Scalar Convert Double-Precision to Signed Integer Doubleword (xscvdpsxd) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0x7FFF_FFFF_FFFF_FFFF is placed into doubleword element 0 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a positive number or +Infinity.
   0x8000_0000_0000_0000 is placed into doubleword element 0 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a negative number, -Infinity, or NaN.
   The contents of doubleword element 1 of VSR[XT] are undefined.
3. FR and FI are set to 0.
4. FPRF is undefined.

For the VSX Scalar Convert Double-Precision to Unsigned Integer Doubleword (xscvdpuxd) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0xFFFF_FFFF_FFFF_FFFF is placed into doubleword element 0 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a positive number or +Infinity.
   0x0000_0000_0000_0000 is placed into doubleword element 0 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a negative number, -Infinity, or NaN.
   The contents of doubleword element 1 of VSR[XT] are undefined.
3. FR and FI are set to 0.
4. FPRF is undefined.

For the VSX Scalar Convert Double-Precision to Signed Integer Word (xscvdpsxw) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0x7FFF_FFFF is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a positive number or +Infinity.
   0x8000_0000 is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a negative number, -Infinity, or NaN.
   The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
3. FR and FI are set to 0.
4. FPRF is undefined.

For the VSX Scalar Convert Double-Precision to Unsigned Integer Word (xscvdpuxw) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0xFFFF_FFFF is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a positive number or +Infinity.
   0x0000_0000 is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a negative number, -Infinity, or NaN.
   The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
3. FR and FI are set to 0.
4. FPRF is undefined.

For the VSX Vector Convert Double-Precision to Signed Integer Doubleword (xvcvdpsxd) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0x7FFF_FFFF_FFFF_FFFF is placed into doubleword element i of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a positive number or +Infinity.
   0x8000_0000_0000_0000 is placed into doubleword element i of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a negative number, -Infinity, or NaN.
3. FR, FI, and FPRF are not modified.

For the VSX Vector Convert Double-Precision to Unsigned Integer Doubleword (xvcvdpuxd) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0xFFFF_FFFF_FFFF_FFFF is placed into doubleword element i of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a positive number or +Infinity.
   0x0000_0000_0000_0000 is placed into doubleword element i of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a negative number, -Infinity, or NaN.
3. FR, FI, and FPRF are not modified.
For the VSX Vector Convert Double-Precision to Signed Integer Word (xvcvdpsxw) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0x7FFF_FFFF is placed into word element i×2 of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a positive number or +Infinity.
   0x8000_0000 is placed into word element i×2 of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a negative number, -Infinity, or NaN.
   The contents of word element i×2+1 of VSR[XT] are undefined.
3. FR, FI, and FPRF are not modified.

For the VSX Vector Convert Double-Precision to Unsigned Integer Word (xvcvdpuxw) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0xFFFF_FFFF is placed into word element i×2 of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a positive number or +Infinity.
   0x0000_0000 is placed into word element i×2 of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a negative number, -Infinity, or NaN.
   The contents of word element i×2+1 of VSR[XT] are undefined.
3. FR, FI, and FPRF are not modified.

For the VSX Vector Convert Single-Precision to Signed Integer Doubleword (xvcvspsxd) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0x7FFF_FFFF_FFFF_FFFF is placed into doubleword element i of VSR[XT] if the single-precision operand in word element i×2 of VSR[XB] is a positive number or +Infinity.
   0x8000_0000_0000_0000 is placed into doubleword element i of VSR[XT] if the single-precision operand in word element i×2 of VSR[XB] is a negative number, -Infinity, or NaN.
3. FR, FI, and FPRF are not modified.

For the VSX Vector Convert Single-Precision to Unsigned Integer Doubleword (xvcvspuxd) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0xFFFF_FFFF_FFFF_FFFF is placed into doubleword element i of VSR[XT] if the single-precision operand in word element i×2 of VSR[XB] is a positive number or +Infinity.
   0x0000_0000_0000_0000 is placed into doubleword element i of VSR[XT] if the single-precision operand in word element i×2 of VSR[XB] is a negative number, -Infinity, or NaN.
3. FR, FI, and FPRF are not modified.

For the VSX Vector Convert Single-Precision to Signed Integer Word (xvcvspsxw) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0x7FFF_FFFF is placed into word element i of VSR[XT] if the single-precision operand in word element i of VSR[XB] is a positive number or +Infinity.
   0x8000_0000 is placed into word element i of VSR[XT] if the single-precision operand in word element i of VSR[XB] is a negative number, -Infinity, or NaN.
   The contents of word element i×2+1 of VSR[XT] are undefined.
3. FR, FI, and FPRF are not modified.

For the VSX Vector Convert Single-Precision to Unsigned Integer Word (xvcvspuxw) instruction:

1. One or two of the following Invalid Operation exceptions are set to 1.
       VXSNAN (if SNaN)
       VXCVI (if Invalid Integer Convert)
2. 0xFFFF_FFFF is placed into word element i of VSR[XT] if the single-precision operand in word element i of VSR[XB] is a positive number or +Infinity.
   0x0000_0000 is placed into word element i of VSR[XT] if the single-precision operand in word element i of VSR[XB] is a negative number, -Infinity, or NaN.
   The contents of word element i×2+1 of VSR[XT] are undefined.
3. FR, FI, and FPRF are not modified.
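The VE=0 results for the convert-to-integer instructions above all follow one pattern: the largest representable integer for a positive operand or +Infinity, and the smallest for a negative operand, -Infinity, or NaN. The Python sketch below captures that pattern. The function name and parameters are hypothetical, and the function is meant to be called only when an Invalid Integer Convert (or SNaN) condition has already been detected; it does not itself decide whether the operand is out of range.

```python
import math


def invalid_convert_default(operand, signed, bits):
    """Default (VE=0) integer result, as an unsigned bit pattern of width `bits`.

    operand: the floating-point source value (out of range, infinite, or NaN).
    signed:  True for the signed-integer converts, False for the unsigned ones.
    bits:    32 for word results, 64 for doubleword results.
    """
    if math.isnan(operand) or operand < 0:
        # Negative number, -Infinity, or NaN: most negative value (signed) or zero (unsigned).
        return (1 << (bits - 1)) if signed else 0
    # Positive number or +Infinity: largest representable value.
    return (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
```

For instance, `invalid_convert_default(float("inf"), signed=True, bits=64)` yields `0x7FFF_FFFF_FFFF_FFFF`, and a NaN operand with `signed=False` yields zero at either width.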
For the VSX Scalar Floating-Point Compare instructions:

1. One or two of the following Invalid Operation exceptions are set to 1.
   VXSNAN (if SNaN)
   VXVC (if invalid compare)
2. FR, FI, and C are unchanged.
3. FPCC is set to reflect unordered.

For the VSX Vector Compare Single-Precision instructions:

1. One or two of the following Invalid Operation exceptions are set to 1.
   VXSNAN (if SNaN)
   VXVC (if invalid compare)
2. 0x0000_0000 is placed into its respective word element of VSR[XT].
3. FR, FI, and FPRF are not modified.

For the VSX Vector Compare Double-Precision instructions:

1. One or two of the following Invalid Operation exceptions are set to 1.
   VXSNAN (if SNaN)
   VXVC (if invalid compare)
2. 0x0000_0000_0000_0000 is placed into its respective doubleword element of VSR[XT].
3. FR, FI, and FPRF are not modified.

7.4.2 Floating-Point Zero Divide Exception

7.4.2.1 Definition

A Zero Divide exception occurs when a VSX Scalar Divide Double-Precision (xsdivdp), VSX Vector Divide Double-Precision (xvdivdp), or VSX Vector Divide Single-Precision (xvdivsp) instruction is executed with a zero divisor value and a finite nonzero dividend value.

It also occurs when a VSX Scalar Reciprocal Estimate Double-Precision (xsredp), VSX Vector Reciprocal Estimate Double-Precision (xvredp), VSX Vector Reciprocal Estimate Single-Precision (xvresp), VSX Scalar Reciprocal Square Root Estimate Double-Precision (xsrsqrtedp), VSX Vector Reciprocal Square Root Estimate Double-Precision (xvrsqrtedp), or VSX Vector Reciprocal Square Root Estimate Single-Precision (xvrsqrtesp) instruction is executed with an operand value of zero.

The action to be taken depends on the setting of the Zero Divide Exception Enable bit of the FPSCR.

7.4.2.2 Action for ZE=1

When Zero Divide exception is enabled (ZE=1) and a Zero Divide exception occurs, the following actions are taken:

For the VSX Scalar Divide Double-Precision (xsdivdp), VSX Scalar Reciprocal Estimate Double-Precision (xsredp), and VSX Scalar Reciprocal Square Root Estimate Double-Precision (xsrsqrtedp) instructions:

1. ZX is set to 1.
2. Update of VSR[XT] is suppressed.
3. FR and FI are set to 0.
4. FPRF is unchanged.

For the VSX Vector Divide Double-Precision (xvdivdp), VSX Vector Divide Single-Precision (xvdivsp), VSX Vector Reciprocal Estimate Double-Precision (xvredp), VSX Vector Reciprocal Estimate Single-Precision (xvresp), VSX Vector Reciprocal Square Root Estimate Double-Precision (xvrsqrtedp), and VSX Vector Reciprocal Square Root Estimate Single-Precision (xvrsqrtesp) instructions:

1. ZX is set to 1.
2. Update of VSR[XT] is suppressed for all vector elements.
3. FR and FI are unchanged.
4. FPRF is unchanged.

7.4.2.3 Action for ZE=0

When Zero Divide exception is disabled (ZE=0) and a Zero Divide exception occurs, the following actions are taken:

For the VSX Scalar Divide Double-Precision (xsdivdp) instruction:

1. ZX is set to 1.
2. The double-precision representation of Infinity, where the sign is determined by the XOR of the signs of the operands, is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined.
3. FR and FI are set to 0.
4. FPRF is set to indicate the class and sign of the result (± Infinity).

For the VSX Vector Divide Double-Precision (xvdivdp) instruction:

1. ZX is set to 1.
2. The double-precision representation of Infinity, where the sign is determined by the XOR of the signs of the operands, is placed into its respective doubleword element of VSR[XT].
3. FR, FI, and FPRF are not modified.

For the VSX Vector Divide Single-Precision (xvdivsp) instruction:

1. ZX is set to 1.
2. The single-precision representation of Infinity, where the sign is determined by the XOR of the signs of the operands, is placed into its respective word element of VSR[XT] as a single-precision value.
3. FR, FI, and FPRF are not modified.

For the VSX Scalar Reciprocal Estimate Double-Precision (xsredp) and VSX Scalar Reciprocal Square Root Estimate Double-Precision (xsrsqrtedp) instructions:

1. ZX is set to 1.
2. The double-precision representation of Infinity, where the sign is the sign of the operand, is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined.
3. FR and FI are set to 0.
4. FPRF is set to indicate the class and sign of the result (± Infinity).

For the VSX Vector Reciprocal Estimate Double-Precision (xvredp) and VSX Vector Reciprocal Square Root Estimate Double-Precision (xvrsqrtedp) instructions:

1. ZX is set to 1.
2. The double-precision representation of Infinity, where the sign is the sign of the operand, is placed into its respective doubleword element of VSR[XT].
3. FR, FI, and FPRF are not modified.

For the VSX Vector Reciprocal Estimate Single-Precision (xvresp) and VSX Vector Reciprocal Square Root Estimate Single-Precision (xvrsqrtesp) instructions:

1. ZX is set to 1.
2. The single-precision representation of Infinity, where the sign is the sign of the operand, is placed into its respective word element of VSR[XT] as a single-precision value.
3. FR, FI, and FPRF are not modified.
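The Zero Divide rules for the vector divide instructions can be illustrated with a small behavioral model. The following Python sketch is not part of the architecture; the function names (`is_zero_divide`, `signed_infinity`, `vector_divide`) are hypothetical, Python floats stand in for double-precision elements, and only the ZX flag and the target update are modeled. Other exceptions (for example 0/0, which is not a Zero Divide case) are outside its scope.

```python
import math


def is_zero_divide(dividend, divisor):
    """Zero Divide condition: zero divisor with a finite, nonzero dividend."""
    return divisor == 0.0 and dividend != 0.0 and math.isfinite(dividend)


def signed_infinity(dividend, divisor):
    """Infinity whose sign is the XOR of the signs of the two operands."""
    negative = (math.copysign(1.0, dividend) < 0) != (math.copysign(1.0, divisor) < 0)
    return -math.inf if negative else math.inf


def vector_divide(target, a, b, ze):
    """Model of a vector divide. Returns (new_target, zx).

    target: current element values of the target register (hypothetical model)
    a, b:   dividend and divisor elements
    ze:     True models ZE=1 (enabled), False models ZE=0 (disabled)
    """
    zx = any(is_zero_divide(x, y) for x, y in zip(a, b))
    if zx and ze:
        # ZE=1: update of the target is suppressed for all vector elements.
        return list(target), True
    result = []
    for x, y in zip(a, b):
        if is_zero_divide(x, y):
            # ZE=0: the affected element receives a signed Infinity.
            result.append(signed_infinity(x, y))
        elif y == 0.0:
            # Not Zero Divide cases (0/0, NaN/0, Infinity/0); handled only
            # so that the sketch does not raise ZeroDivisionError.
            bad = math.isnan(x) or x == 0.0
            result.append(math.nan if bad else signed_infinity(x, y))
        else:
            result.append(x / y)
    return result, zx
```

For example, dividing `[1.0, 6.0]` by `[-0.0, 3.0]` with ZE=0 yields `[-inf, 2.0]` and sets ZX, whereas with ZE=1 the previous target contents are returned unchanged and ZX is still set.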
7.4.3 Floating-Point Overflow Exception

7.4.3.1 Definition

An Overflow exception occurs when the magnitude of what would have been the rounded result if the exponent range were unbounded exceeds that of the largest finite number of the specified result precision.

The action to be taken depends on the setting of the Overflow Exception Enable bit of the FPSCR.

7.4.3.2 Action for OE=1

When Overflow exception is enabled (OE=1) and an Overflow exception occurs, the following actions are taken:

For the VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp) instruction:

1. OX is set to 1.
2. The exponent of the normalized intermediate result is adjusted by subtracting 192.
3. The adjusted rounded result is placed into word element 0 of VSR[XT] as a single-precision value. The contents of word elements 1-3 of VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result (±Normal Number).

For VSX Scalar Add Double-Precision (xsadddp), VSX Scalar Divide Double-Precision (xsdivdp), VSX Scalar Multiply Double-Precision (xsmuldp), VSX Scalar Reciprocal Estimate Double-Precision (xsredp), VSX Scalar Subtract Double-Precision (xssubdp), and VSX Scalar Double-Precision Multiply-Add Arithmetic instructions:

1. OX is set to 1.
2. The exponent of the normalized intermediate result is adjusted by subtracting 1536.
3. The adjusted rounded result is placed into doubleword element 0 of VSR[XT] as a double-precision value. The contents of doubleword element 1 of VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result (±Normal Number).

For VSX Vector Add Double-Precision (xvadddp), VSX Vector Add Single-Precision (xvaddsp), VSX Vector Divide Double-Precision (xvdivdp), VSX Vector Divide Single-Precision (xvdivsp), VSX Vector Multiply Double-Precision (xvmuldp), VSX Vector Multiply Single-Precision (xvmulsp), VSX Vector Reciprocal Estimate Double-Precision (xvredp), VSX Vector Reciprocal Estimate Single-Precision (xvresp), VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp), VSX Vector Subtract Double-Precision (xvsubdp), VSX Vector Subtract Single-Precision (xvsubsp), VSX Vector Double-Precision Multiply-Add Arithmetic, and VSX Vector Single-Precision Multiply-Add Arithmetic instructions:

1. OX is set to 1.
2. Update of VSR[XT] is suppressed for all vector elements.
3. FR, FI, and FPRF are not modified.

7.4.3.3 Action for OE=0

When Overflow exception is disabled (OE=0) and an Overflow exception occurs, the following actions are taken:

1. OX and XX are set to 1.
2. The result is determined by the rounding mode (RN) and the sign of the intermediate result as follows:

   Round to Nearest Even
      For negative overflow, the result is -Infinity.
      For positive overflow, the result is +Infinity.

   Round toward Zero
      For negative overflow, the result is the format's most negative finite number.
      For positive overflow, the result is the format's most positive finite number.

   Round toward +Infinity
      For negative overflow, the result is the format's most negative finite number.
      For positive overflow, the result is +Infinity.

   Round toward -Infinity
      For negative overflow, the result is -Infinity.
      For positive overflow, the result is the format's most positive finite number.

For the VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp) instruction:

3. The result is placed into word element 0 of VSR[XT] as a single-precision value. The contents of word elements 1-3 of VSR[XT] are undefined.
4. FR is undefined.
5. FI is set to 1.
6. FPRF is set to indicate the class and sign of the result (±Infinity or ±Normal Number).

For VSX Scalar Add Double-Precision (xsadddp), VSX Scalar Divide Double-Precision (xsdivdp), VSX Scalar Multiply Double-Precision (xsmuldp), VSX Scalar Reciprocal Estimate Double-Precision (xsredp), VSX Scalar Subtract Double-Precision (xssubdp), and VSX Scalar Double-Precision Multiply-Add Arithmetic instructions:

3. The result is placed into doubleword element 0 of VSR[XT] as a double-precision value. The contents of doubleword element 1 of VSR[XT] are undefined.
4. FR is undefined.
5. FI is set to 1.
6. FPRF is set to indicate the class and sign of the result (±Infinity or ±Normal Number).

For VSX Vector Add Single-Precision (xvaddsp), VSX Vector Divide Single-Precision (xvdivsp), VSX Vector Multiply Single-Precision (xvmulsp), VSX Vector Reciprocal Estimate Single-Precision (xvresp), VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp), VSX Vector Subtract Single-Precision (xvsubsp), and VSX Vector Single-Precision Multiply-Add Arithmetic instructions:

3. The result is placed into its respective word element of VSR[XT] as a single-precision value.
4. FR, FI, and FPRF are not modified.

For VSX Vector Add Double-Precision (xvadddp), VSX Vector Divide Double-Precision (xvdivdp), VSX Vector Multiply Double-Precision (xvmuldp), VSX Vector Reciprocal Estimate Double-Precision (xvredp), VSX Vector Subtract Double-Precision (xvsubdp), and VSX Vector Double-Precision Multiply-Add Arithmetic instructions:

3. The result is placed into its respective doubleword element of VSR[XT] as a double-precision value.
4. FR, FI, and FPRF are not modified.
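The selection of the delivered result when Overflow exception is disabled (OE=0) can be written as a small lookup. The following Python sketch is illustrative only: the function name `overflow_result`, the rounding-mode strings, and the constants `MAX_DP` and `MAX_SP` are hypothetical names chosen for this example, and the sketch does not model the OX, XX, FR, FI, or FPRF updates.

```python
import math

# Largest finite magnitudes of the double-precision and single-precision
# formats: (2 - 2**-52) * 2**1023 and (2 - 2**-23) * 2**127.
MAX_DP = (2.0 - 2.0**-52) * 2.0**1023
MAX_SP = (2.0 - 2.0**-23) * 2.0**127


def overflow_result(rounding_mode, negative, single_precision=False):
    """Result delivered on a disabled overflow (OE=0).

    rounding_mode: "nearest_even", "toward_zero",
                   "toward_plus_inf", or "toward_minus_inf"
    negative:      True if the intermediate result is negative
    """
    largest = MAX_SP if single_precision else MAX_DP
    infinity = -math.inf if negative else math.inf
    finite = -largest if negative else largest
    if rounding_mode == "nearest_even":
        return infinity
    if rounding_mode == "toward_zero":
        return finite
    if rounding_mode == "toward_plus_inf":
        # Negative overflow stays finite; positive overflow goes to +Infinity.
        return finite if negative else infinity
    if rounding_mode == "toward_minus_inf":
        # Negative overflow goes to -Infinity; positive overflow stays finite.
        return infinity if negative else finite
    raise ValueError("unknown rounding mode: %r" % (rounding_mode,))
```

The design point worth noting is that a directed rounding mode never rounds away from its direction: only the overflow on the side the mode rounds toward produces an Infinity.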
7.4.4 Floating-Point Underflow Exception

7.4.4.1 Definition

Underflow exception is defined separately for the enabled and disabled states:

Enabled:
   Underflow occurs when the intermediate result is "Tiny".

Disabled:
   Underflow occurs when the intermediate result is "Tiny" and there is "Loss of Accuracy".

A tiny result is detected before rounding, when a nonzero intermediate result computed as though both the precision and the exponent range were unbounded would be less in magnitude than the smallest normalized number.

If the intermediate result is tiny and Underflow exception is disabled (UE=0), the intermediate result is denormalized (see Section 7.3.2.4, "Normalization and Denormalization" on page 285) and rounded (see Section 7.3.2.6, "Rounding" on page 287) before being placed into the target VSR.

Loss of accuracy is detected when the delivered result value differs from what would have been computed were both the precision and the exponent range unbounded.

The action to be taken depends on the setting of the Underflow Exception Enable bit of the FPSCR.

7.4.4.2 Action for UE=1

When Underflow exception is enabled (UE=1) and an Underflow exception occurs, the following actions are taken:

For the VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp) instruction:

1. UX is set to 1.
2. The exponent of the normalized intermediate result is adjusted by adding 192.
3. The adjusted rounded result is placed into word element 0 of VSR[XT] as a single-precision value. The contents of word elements 1-3 of VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result (±Normalized Number).

For VSX Scalar Add Double-Precision (xsadddp), VSX Scalar Divide Double-Precision (xsdivdp), VSX Scalar Multiply Double-Precision (xsmuldp), VSX Scalar Reciprocal Estimate Double-Precision (xsredp), VSX Scalar Subtract Double-Precision (xssubdp), and VSX Scalar Double-Precision Multiply-Add Arithmetic instructions:

1. UX is set to 1.
2. The exponent of the normalized intermediate result is adjusted by adding 1536.
3. The adjusted rounded result is placed into doubleword element 0 of VSR[XT] as a double-precision value. The contents of doubleword element 1 of VSR[XT] are undefined.
4. FPRF is set to indicate the class and sign of the result (±Normalized Number).

Programming Note
   The FR and FI bits are provided to allow the system floating-point enabled exception error handler, when invoked because of an Underflow exception, to simulate a "trap disabled" environment. That is, the FR and FI bits allow the system floating-point enabled exception error handler to unround the result, thus allowing the result to be denormalized.

For VSX Vector Add Double-Precision (xvadddp), VSX Vector Add Single-Precision (xvaddsp), VSX Vector Divide Double-Precision (xvdivdp), VSX Vector Divide Single-Precision (xvdivsp), VSX Vector Multiply Double-Precision (xvmuldp), VSX Vector Multiply Single-Precision (xvmulsp), VSX Vector Reciprocal Estimate Double-Precision (xvredp), VSX Vector Reciprocal Estimate Single-Precision (xvresp), VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp), VSX Vector Subtract Double-Precision (xvsubdp), VSX Vector Subtract Single-Precision (xvsubsp), VSX Vector Double-Precision Multiply-Add Arithmetic, and VSX Vector Single-Precision Multiply-Add Arithmetic instructions:

1. UX is set to 1.
2. Update of VSR[XT] is suppressed for all vector elements.
3. FR, FI, and FPRF are not modified.
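The "Tiny" test and the exponent adjustment applied when Underflow exception is enabled can be sketched for the scalar double-precision case as follows. This is a minimal Python illustration, not the architected algorithm: the names `is_tiny` and `enabled_underflow_result` are hypothetical, the unbounded intermediate result is represented as an exact `fractions.Fraction`, and the conversion to `float` rounds to nearest even only (the rounding mode selected by RN is not modeled). The single-precision case would use 192 in place of 1536.

```python
from fractions import Fraction

# Smallest normalized double-precision magnitude, 2**-1022, held exactly.
DP_MIN_NORMAL = Fraction(1, 2**1022)

# Exponent adjustment for double-precision results when UE=1.
DP_EXPONENT_ADJUST = 1536


def is_tiny(exact):
    """A nonzero unbounded intermediate result below the smallest normal."""
    return exact != 0 and abs(exact) < DP_MIN_NORMAL


def enabled_underflow_result(exact):
    """Value delivered for a scalar double-precision underflow with UE=1.

    The exponent of the intermediate result is adjusted by adding 1536,
    i.e. the value is scaled by 2**1536, and the scaled value is rounded
    to double precision (round to nearest even in this sketch).
    """
    if not is_tiny(exact):
        raise ValueError("intermediate result is not tiny")
    return float(exact * 2**DP_EXPONENT_ADJUST)
```

For instance, an intermediate result of 3 × 2^-1100 is tiny, and the adjusted result delivered to the handler is 3 × 2^436, from which the handler can recover the original value by scaling back.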
7.4.4.3 Action for UE=0

When Underflow exception is disabled (UE=0) and an Underflow exception occurs, the following actions are taken:

For the VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp) instruction:

1. UX is set to 1.
2. The result is placed into word element 0 of VSR[XT] as a single-precision value. The contents of word elements 1-3 of VSR[XT] are undefined.
3. FPRF is set to indicate the class and sign of the result (± Normalized Number, ± Denormalized Number, or ± Zero).

For VSX Vector Add Double-Precision (xvadddp), VSX Vector Divide Double-Precision (xvdivdp), VSX Vector Multiply Double-Precision (xvmuldp), VSX Vector Reciprocal Estimate Double-Precision (xvredp), VSX Vector Subtract Double-Precision (xvsubdp), and VSX Vector Double-Precision Multiply-Add Arithmetic instructions:

1. UX is set to 1.
2. The result is placed into its respective doubleword element of VSR[XT] as a double-precision value.
3. FR, FI, and FPRF are not modified.

For VSX Scalar Add Double-Precision (xsadddp), VSX Scalar Divide Double-Precision (xsdivdp), VSX Scalar Multiply Double-Precision (xsmuldp), VSX Scalar Reciprocal Estimate Double-Precision (xsredp), VSX Scalar Subtract Double-Precision (xssubdp), and VSX Scalar Double-Precision Multiply-Add Arithmetic instructions:

1. UX is set to 1.
2. The result is placed into doubleword element 0 of VSR[XT] as a double-precision value. The contents of doubleword element 1 of VSR[XT] are undefined.
3. FPRF is set to indicate the class and sign of the result (± Normalized Number, ± Denormalized Number, or ± Zero).
For VSX Vector Add Single-Precision (xvaddsp), VSX Vector Divide Single-Precision (xvdivsp), VSX Vector Multiply Single-Precision (xvmulsp), VSX Vector Reciprocal Estimate Single-Precision (xvresp), VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp), VSX Vector Subtract Single-Precision (xvsubsp), and VSX Vector Single-Precision Multiply-Add Arithmetic instructions:

1. UX is set to 1.
2. The result is placed into its respective word element of VSR[XT] as a single-precision value.
3. FR, FI, and FPRF are not modified.

7.4.5 Floating-Point Inexact Exception

7.4.5.1 Definition

An Inexact exception occurs when one of two conditions occurs during rounding:

1. The rounded result differs from the intermediate result assuming both the precision and the exponent range of the intermediate result to be unbounded. In this case the result is said to be inexact. (If the rounding causes an enabled Overflow exception or an enabled Underflow exception, an Inexact exception also occurs only if the significands of the rounded result and the intermediate result differ.)
2. The rounded result overflows and Overflow exception is disabled.

The action to be taken depends on the setting of the Inexact Exception Enable bit of the FPSCR.

7.4.5.2 Action for XE=1

Programming Note
   In some implementations, enabling Inexact exceptions can degrade performance more than does enabling other types of floating-point exception.

When Inexact exception is enabled (XE=1) and an Inexact exception occurs, the following actions are taken:

For the VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp) instruction:

1. XX is set to 1.
2. The result is placed into word element 0 of VSR[XT] as a single-precision value. The contents of word elements 1-3 of VSR[XT] are undefined.
3. FPRF is set to indicate the class and sign of the result.

For VSX Scalar Add Double-Precision (xsadddp), VSX Scalar Subtract Double-Precision (xssubdp), VSX Scalar Multiply Double-Precision (xsmuldp), VSX Scalar Divide Double-Precision (xsdivdp), VSX Scalar Square Root Double-Precision (xssqrtdp), VSX Scalar Double-Precision Multiply-Add Arithmetic, VSX Scalar Round to Double-Precision Integer Exact using Current rounding mode (xsrdpic), and VSX Scalar Convert Integer to Double-Precision instructions:

1. XX is set to 1.
2. The result is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined.
3. FPRF is set to indicate the class and sign of the result.

For the VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate (xscvdpsxws) and VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate (xscvdpuxws) instructions:

1. XX is set to 1.
2. The result is placed into word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
3. FPRF is set to indicate the class and sign of the result.

For VSX vector floating-point operations:

1. XX is set to 1.
2. Update of VSR[XT] is suppressed for all vector elements.
3. FR, FI, and FPRF are not modified.

7.4.5.3 Action for XE=0

When Inexact exception is disabled (XE=0) and an Inexact exception occurs, the following actions are taken:

For the VSX Scalar round and Convert Double-Precision to Single-Precision format (xscvdpsp) instruction:

1. XX is set to 1.
2. The result is placed into word element 0 of VSR[XT] as a single-precision value. The contents of word elements 1-3 of VSR[XT] are undefined.
3. FPRF is set to indicate the class and sign of the result.

For VSX scalar double-precision floating-point operations:

1. XX is set to 1.
2. The result is placed into doubleword element 0 of VSR[XT] as a double-precision value. The contents of doubleword element 1 of VSR[XT] are undefined.
3. FPRF is set to indicate the class and sign of the result (±Normalized Number, ± Denormalized Number, or ± Zero).

For the VSX Scalar truncate Floating-Point to integer and Convert to Signed Fixed-Point Word format with Saturate instructions:

1. XX is set to 1.
2. The result is placed into word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
3. FPRF is set to indicate the class and sign of the result.

For VSX vector single-precision floating-point operations:

1. XX is set to 1.
2. The result is placed into its respective word element of VSR[XT] as a single-precision value.
3. FR, FI, and FPRF are not modified.

For VSX vector double-precision floating-point operations:

1. XX is set to 1.
2. The result is placed into its respective doubleword element of VSR[XT] as a double-precision value.
3. FR, FI, and FPRF are not modified.

7.5 Storage Access Operations

The VSX Storage Access instructions compute the effective address (EA) of the storage to be accessed as described in Power ISA Book I.

7.5.1 Accessing Aligned Storage Operands

The following quadword-aligned array, AW, consists of 4 words.

    int AW[4] = { 0x0001_0203,
                  0x0405_0607,
                  0x0809_0A0B,
                  0x0C0D_0E0F };

Figure 116 illustrates the big-endian storage image of array AW.

    0x0000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    0x0010:
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

Figure 116. Big-endian storage image of array AW

Figure 117 illustrates the little-endian storage image of array AW.

    0x0000: 03 02 01 00 07 06 05 04 0B 0A 09 08 0F 0E 0D 0C
    0x0010:
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

Figure 117. Little-endian storage image of array AW

Figure 118 shows the result of loading that quadword into a VSR or, equivalently, shows the contents that must be in a VSR if storing that VSR is to produce the storage contents shown in Figure 116 for big-endian (or in Figure 117 for little-endian). Note that Figure 118 shows the effect of loading the quadword from both big-endian storage and little-endian storage.

    VSR contents when accessing aligned quadword in big-endian storage from Figure 116
    Vt,Vs:  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

    VSR contents when accessing aligned quadword in little-endian storage from Figure 117
    Vt,Vs:  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

Figure 118. Vector-Scalar Register contents for aligned quadword Load or Store VSX Vector

7.5.2 Accessing Unaligned Storage Operands

The following array, B, consists of 5 word elements.

    int B[5];
    B[0] = 0x01234567;
    B[1] = 0x00112233;
    B[2] = 0x44556677;
    B[3] = 0x8899AABB;
    B[4] = 0xCCDDEEFF;

Figure 119 illustrates both big-endian and little-endian storage images of array B.

    Big-endian storage image of array B
    0x0000: 01 23 45 67 00 11 22 33 44 55 66 77 88 99 AA BB
    0x0010: CC DD EE FF
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

    Little-endian storage image of array B
    0x0000: 67 45 23 01 33 22 11 00 77 66 55 44 BB AA 99 88
    0x0010: FF EE DD CC
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

Figure 119. Storage images of array B

Though this example shows the array starting at a quadword-aligned address, if the subject data of interest are elements 1 through 4, accessing elements 1 through 4 of array B involves an unaligned quadword storage access that spans two aligned quadwords.

Loading an Unaligned Quadword from Big-Endian Storage

Loading elements 1 through 4 of B (see Figure 119) into VSR[XT] involves an unaligned quadword storage access. VSX supports word-aligned vector and scalar storage accesses using big-endian byte ordering.

    Big-endian storage image of array B
    0x0000: 01 23 45 67 00 11 22 33 44 55 66 77 88 99 AA BB
    0x0010: CC DD EE FF
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

    # Assumptions
    GPR[Ra] = address of B
    GPR[Rb] = 4 (index to B[1])

    lxvw4x Xt,Ra,Rb

    Xt:     00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

Figure 120. Process to load misaligned quadword from big-endian storage using Load VSX Vector Word*4 Indexed

Loading an Unaligned Quadword from Little-Endian Storage

Loading elements 1 through 4 of B (see Figure 119) into VSR[XT] involves an unaligned quadword storage access. VSX supports word-aligned vector and scalar storage accesses using little-endian byte ordering.

    Little-endian storage image of array B
    0x0000: 67 45 23 01 33 22 11 00 77 66 55 44 BB AA 99 88
    0x0010: FF EE DD CC
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

    # Assumptions
    GPR[Ra] = address of B
    GPR[Rb] = 4 (index to B[1])

    lxvw4x Xt,Ra,Rb

    Xt:     00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

Figure 121. Process to load misaligned quadword from little-endian storage using Load VSX Vector Word*4 Indexed
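The two load examples above produce the same register contents because the load interprets each word according to the byte ordering of storage. The following Python sketch models this behavior; it is an illustration under stated assumptions, not a definition of the instruction. The helper names `storage_image` and `lxvw4x_model` are hypothetical, storage is modeled as a `bytes` object, and the returned register image lists bytes from word element 0 (leftmost) to word element 3.

```python
import struct

# Array B from the example above.
B = [0x01234567, 0x00112233, 0x44556677, 0x8899AABB, 0xCCDDEEFF]


def storage_image(words, little_endian):
    """Byte image of an array of 32-bit words in the given byte ordering."""
    byte_order = "<" if little_endian else ">"
    return struct.pack(f"{byte_order}{len(words)}I", *words)


def lxvw4x_model(storage, ea, little_endian):
    """Model of loading four words starting at effective address ea.

    Each word is read using the storage byte ordering and placed into
    word elements 0 through 3 of the returned 16-byte register image.
    """
    if ea % 4 != 0:
        raise ValueError("this sketch only models word-aligned addresses")
    byte_order = "<" if little_endian else ">"
    words = struct.unpack_from(f"{byte_order}4I", storage, ea)
    # The register image is written with word element 0 leftmost and the
    # most significant byte of each word first.
    return struct.pack(">4I", *words)


big = lxvw4x_model(storage_image(B, little_endian=False), 4, little_endian=False)
little = lxvw4x_model(storage_image(B, little_endian=True), 4, little_endian=True)
print(big.hex(), little.hex())
```

Both calls return the bytes 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF, matching the Xt contents shown in Figures 120 and 121.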
Storing an Unaligned Quadword to Big-Endian Storage

Storing a VSR to elements 1 through 4 of B (see Figure 119) involves an unaligned quadword storage access. VSX supports word-aligned vector and scalar storage accesses using big-endian byte ordering.

    Big-endian storage image of array B
    0x0000: 01 23 45 67 00 11 22 33 44 55 66 77 88 99 AA BB
    0x0010: CC DD EE FF
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

    Xs:     F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

    # Assumptions
    GPR[Ra] = address of B
    GPR[Rb] = 4 (index to B[1])

    stxvw4x Xs,Ra,Rb

    0x0000: 01 23 45 67 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB
    0x0010: FC FD FE FF
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

Figure 122. Process to store misaligned quadword to big-endian storage using Store VSX Vector Word*4 Indexed

Storing an Unaligned Quadword to Little-Endian Storage

Storing a VSR to elements 1 through 4 of B (see Figure 119) involves an unaligned quadword storage access. VSX supports word-aligned vector and scalar storage accesses using little-endian byte ordering.

    Little-endian storage image of array B
    0x0000: 67 45 23 01 33 22 11 00 77 66 55 44 BB AA 99 88
    0x0010: FF EE DD CC
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

    Xs:     F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

    # Assumptions
    GPR[Ra] = address of B
    GPR[Rb] = 4 (index to B[1])

    stxvw4x Xs,Ra,Rb

    0x0000: 67 45 23 01 F3 F2 F1 F0 F7 F6 F5 F4 FB FA F9 F8
    0x0010: FF FE FD FC
             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

Figure 123. Process to store misaligned quadword to little-endian storage using Store VSX Vector Word*4 Indexed

7.5.3 Storage Access Exceptions

Storage accesses cause the system data storage error handler to be invoked if the program is not allowed to modify the target storage (Store only), or if the program attempts to access storage that is unavailable.
The address used for VSX storage accesses must be word-aligned to guarantee the access can be performed in hardware rather than result in an Alignment interrupt. 312 Power ISATM Book I Version 2.06 7.6 VSX Instruction Set Summary 7.6.1 VSX Storage Access Instructions There are two basic forms of scalar load and scalar There are two basic forms of vector load and vector store instructions, word and doubleword. VSX Scalar store instructions, a vector of 4 word elements and a Load instructions place a copy of the contents of the vector of two doublewords. Both forms access a addressed word or doubleword in storage into the quadword in storage. left-most word or doubleword element of the target VSR. The contents of the right-most element(s) of the There is one basic form of scalar load and splat target VSR are undefined. VSX Scalar Store instruction, doubleword. VSX Scalar Load and Splat instructions place a copy of the contents of the instruction places a copy of the contents of the left-most word or doubleword element in the source addressed doubleword in storage into both doubleword VSR into the addressed word or doubleword in elements of the target VSR. storage. 7.6.1.1 VSX Scalar Storage Access Instructions Mnemonic Instruction Name Page lxsdux Load VSX Scalar Doubleword with Update Indexed 338 lxsdx Load VSX Scalar Doubleword Indexed 338 Table 8. VSX Scalar Load Instructions Mnemonic Instruction Name Page stxsdux Store VSX Scalar Doubleword with Update Indexed 340 stxsdx Store VSX Scalar Doubleword Indexed 340 Table 9. VSX Scalar Store Instructions 7.6.1.2 VSX Vector Storage Access Instructions Mnemonic Instruction Name Page lxvd2ux Load VSX Vector Doubleword*2 with Update Indexed 338 lxvd2x Load VSX Vector Doubleword*2 Indexed 338 lxvw4ux Load VSX Vector Word*4 with Update Indexed 339 lxvw4x Load VSX Vector Word*4 Indexed 339 Table 10. VSX Vector Load Instructions Mnemonic Instruction Name Page lxvdsx Load VSX Vector Doubleword and Splat Indexed 339 Table 11. 
VSX Vector Load and Splat Instruction Mnemonic Instruction Name Page stxvd2ux Store VSX Vector Doubleword*2 with Update Indexed 340 stxvd2x Store VSX Vector Doubleword*2 Indexed 340 stxvw4ux Store VSX Vector Word*4 with Update Indexed 341 stxvw4x Store VSX Vector Word*4 Indexed 341 Table 12. VSX Vector Store Instructions Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 313 Version 2.06 7.6.2 VSX Move Instructions 7.6.2.1 VSX Scalar Move Instructions Mnemonic Instruction Name Page xsabsdp VSX Scalar Absolute Value Double-Precision 341 xscpsgndp VSX Scalar Copy Sign Double-Precision 351 xsnabsdp VSX Scalar Negative Absolute Value Double-Precision 377 xsnegdp VSX Scalar Negate Double-Precision 377 Table 13. VSX Scalar Double-Precision Move Instructions 7.6.2.2 VSX Vector Move Instructions Mnemonic Instruction Name Page xvabsdp VSX Vector Absolute Value Double-Precision 397 xvcpsgndp VSX Vector Copy Sign Double-Precision 410 xvnabsdp VSX Vector Negative Absolute Value Double-Precision 461 xvnegdp VSX Vector Negate Double-Precision 462 Table 14. VSX Vector Double-Precision Move Instructions Mnemonic Instruction Name Page xvabssp VSX Vector Absolute Value Single-Precision 397 xvcpsgnsp VSX Vector Copy Sign Single-Precision 410 xvnabssp VSX Vector Negative Absolute Value Single-Precision 461 xvnegsp VSX Vector Negate Single-Precision 462 Table 15. 
VSX Vector Single-Precision Move Instructions 314 Power ISATM Book I Version 2.06 7.6.3 VSX Floating-Point Arithmetic Instructions 7.6.3.1 VSX Scalar Floating-Point Arithmetic Instructions Mnemonic Instruction Name Page xsadddp VSX Scalar Add Double-Precision 342 xsdivdp VSX Scalar Divide Double-Precision 363 xsmuldp VSX Scalar Multiply Double-Precision 375 xsredp VSX Scalar Reciprocal Estimate Double-Precision 390 xsrsqrtedp VSX Scalar Reciprocal Square Root Estimate Double-Precision 391 xssqrtdp VSX Scalar Square Root Double-Precision 392 xssubdp VSX Scalar Subtract Double-Precision 393 xstdivdp VSX Scalar Test for software Divide Double-Precision 395 xstsqrtdp VSX Scalar Test for software Square Root Double-Precision 396 Table 16. VSX Scalar Double-Precision Elementary Arithmetic Instructions Mnemonic Instruction Name Page xsmaddadp VSX Scalar Multiply-Add Type-A Double-Precision 365 xsmaddmdp VSX Scalar Multiply-Add Type-M Double-Precision 365 xsmsubadp VSX Scalar Multiply-Subtract Type-A Double-Precision 372 xsmsubmdp VSX Scalar Multiply-Subtract Type-M Double-Precision 372 xsnmaddadp VSX Scalar Negative Multiply-Add Type-A Double-Precision 378 xsnmaddmdp VSX Scalar Negative Multiply-Add Type-M Double-Precision 378 xsnmsubadp VSX Scalar Negative Multiply-Subtract Type-A Double-Precision 383 xsnmsubmdp VSX Scalar Negative Multiply-Subtract Type-M Double-Precision 383 Table 17. 
VSX Scalar Double-Precision Multiply-Add Arithmetic Instructions 7.6.3.2 VSX Vector Floating-Point Arithmetic Instructions Mnemonic Instruction Name Page xvadddp VSX Vector Add Double-Precision 398 xvdivdp VSX Vector Divide Double-Precision 433 xvmuldp VSX Vector Multiply Double-Precision 457 xvredp VSX Vector Reciprocal Estimate Double-Precision 480 xvrsqrtedp VSX Vector Reciprocal Square Root Estimate Double-Precision 485 xvsqrtdp VSX Vector Square Root Double-Precision 487 xvsubdp VSX Vector Subtract Double-Precision 489 xvtdivdp VSX Vector Test for software Divide Double-Precision 493 xvtsqrtdp VSX Vector Test for software Square Root Double-Precision 495 Table 18. VSX Vector Double-Precision Elementary Arithmetic Instructions Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 315 Version 2.06 Mnemonic Instruction Name Page xvaddsp VSX Vector Add Single-Precision 402 xvdivsp VSX Vector Divide Single-Precision 435 xvmulsp VSX Vector Multiply Single-Precision 459 xvresp VSX Vector Reciprocal Estimate Single-Precision 481 xvrsqrtesp VSX Vector Reciprocal Square Root Estimate Single-Precision 486 xvsqrtsp VSX Vector Square Root Single-Precision 488 xvsubsp VSX Vector Subtract Single-Precision 491 xvtdivsp VSX Vector Test for software Divide Single-Precision 494 xvtsqrtsp VSX Vector Test for software Square Root Single-Precision 495 Table 19. 
VSX Vector Single-Precision Elementary Arithmetic Instructions Mnemonic Instruction Name Page xvmaddadp VSX Vector Multiply-Add Type-A Double-Precision 437 xvmaddmdp VSX Vector Multiply-Add Type-M Double-Precision 437 xvmsubadp VSX Vector Multiply-Subtract Type-A Double-Precision 451 xvmsubmdp VSX Vector Multiply-Subtract Type-M Double-Precision 451 xvnmaddadp VSX Vector Negative Multiply-Add Type-A Double-Precision 463 xvnmaddmdp VSX Vector Negative Multiply-Add Type-M Double-Precision 463 xvnmsubadp VSX Vector Negative Multiply-Subtract Type-A Double-Precision 471 xvnmsubmdp VSX Vector Negative Multiply-Subtract Type-M Double-Precision 471 Table 20. VSX Vector Double-Precision Multiply-Add Arithmetic Instructions Mnemonic Instruction Name Page xvmaddasp VSX Vector Multiply-Add Type-A Single-Precision 440 xvmaddmsp VSX Vector Multiply-Add Type-M Single-Precision 440 xvmsubasp VSX Vector Multiply-Subtract Type-A Single-Precision 454 xvmsubmsp VSX Vector Multiply-Subtract Type-M Single-Precision 454 xvnmaddasp VSX Vector Negative Multiply-Add Type-A Single-Precision 468 xvnmaddmsp VSX Vector Negative Multiply-Add Type-M Single-Precision 468 xvnmsubasp VSX Vector Negative Multiply-Subtract Type-A Single-Precision 474 xvnmsubmsp VSX Vector Negative Multiply-Subtract Type-M Single-Precision 474 Table 21. VSX Vector Single-Precision Multiply-Add Arithmetic Instructions 316 Power ISATM Book I Version 2.06 7.6.4 VSX Floating-Point Compare Instructions 7.6.4.1 VSX Scalar Floating-Point Compare Instructions Mnemonic Instruction Name Page xscmpodp VSX Scalar Compare Ordered Double-Precision 347 xscmpudp VSX Scalar Compare Unordered Double-Precision 349 Table 22. VSX Scalar Compare Double-Precision Instructions Mnemonic Instruction Name Page xsmaxdp VSX Scalar Maximum Double-Precision 368 xsmindp VSX Scalar Minimum Double-Precision 370 Table 23. 
VSX Scalar Double-Precision Maximum/Minimum Instructions 7.6.4.2 VSX Vector Floating-Point Compare Instructions Mnemonic Instruction Name Page xvcmpeqdp[.] VSX Vector Compare Equal To Double-Precision 404 xvcmpgedp[.] VSX Vector Compare Greater Than or Equal To Double-Precision 406 xvcmpgtdp[.] VSX Vector Compare Greater Than Double-Precision 408 Table 24. VSX Vector Compare Double-Precision Instructions Mnemonic Instruction Name Page xvcmpeqsp[.] VSX Vector Compare Equal To Single-Precision 405 xvcmpgesp[.] VSX Vector Compare Greater Than or Equal To Single-Precision 407 xvcmpgtsp[.] VSX Vector Compare Greater Than Single-Precision 409 Table 25. VSX Vector Compare Single-Precision Instructions Mnemonic Instruction Name Page xvmaxdp VSX Vector Maximum Double-Precision 443 xvmindp VSX Vector Minimum Double-Precision 447 Table 26. VSX Vector Double-Precision Maximum/Minimum Instructions Mnemonic Instruction Name Page xvmaxsp VSX Vector Maximum Single-Precision 445 xvminsp VSX Vector Minimum Single-Precision 449 Table 27. VSX Vector Single-Precision Maximum/Minimum Instructions Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 317 Version 2.06 7.6.5 VSX DP-SP Conversion Instructions 7.6.5.1 VSX Scalar DP-SP Conversion Instructions Mnemonic Instruction Name Page xscvdpsp VSX Scalar round and Convert Double-Precision to Single-Precision format 352 xscvspdp VSX Scalar Convert Single-Precision to Double-Precision format 361 Table 28. VSX Scalar DP-SP Conversion Instructions 7.6.5.2 VSX Vector DP-SP Conversion Instructions Mnemonic Instruction Name Page xvcvdpsp VSX Vector round and Convert Double-Precision to Single-Precision format 411 xvcvspdp VSX Vector Convert Single-Precision to Double-Precision format 420 Table 29. 
VSX Vector DP-SP Conversion Instructions

7.6.6 VSX Integer Conversion Instructions

7.6.6.1 VSX Scalar Integer Conversion Instructions

Mnemonic     Instruction Name                                                                                                     Page
xscvdpsxds   VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate    353
xscvdpsxws   VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate          355
xscvdpuxds   VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate  357
xscvdpuxws   VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate        359
Table 30. VSX Scalar Convert Double-Precision to Integer Instructions

Mnemonic     Instruction Name                                                                          Page
xscvsxddp    VSX Scalar round and Convert Signed Fixed-Point Doubleword to Double-Precision format     361
xscvuxddp    VSX Scalar round and Convert Unsigned Fixed-Point Doubleword to Double-Precision format   362
Table 31. VSX Scalar Convert Integer to Double-Precision Instructions

7.6.6.2 VSX Vector Integer Conversion Instructions

Mnemonic     Instruction Name                                                                                                     Page
xvcvdpsxds   VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate    412
xvcvdpsxws   VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate          414
xvcvdpuxds   VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate  416
xvcvdpuxws   VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate        418
Table 32.
VSX Vector Convert Double-Precision to Integer Instructions

Mnemonic     Instruction Name                                                                                                     Page
xvcvspsxds   VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate    421
xvcvspsxws   VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate          423
xvcvspuxds   VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate  425
xvcvspuxws   VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate        427
Table 33. VSX Vector Convert Single-Precision to Integer Instructions

Mnemonic     Instruction Name                                                                          Page
xvcvsxddp    VSX Vector Convert and round Signed Fixed-Point Doubleword to Double-Precision format     429
xvcvsxwdp    VSX Vector Convert Signed Fixed-Point Word to Double-Precision format                     430
xvcvuxddp    VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format   431
xvcvuxwdp    VSX Vector Convert Unsigned Fixed-Point Word to Double-Precision format                   432
Table 34. VSX Vector Convert Integer to Double-Precision Instructions

Mnemonic     Instruction Name                                                                          Page
xvcvsxdsp    VSX Vector Convert and round Signed Fixed-Point Doubleword to Single-Precision format     429
xvcvsxwsp    VSX Vector Convert and round Signed Fixed-Point Word to Single-Precision format           430
xvcvuxdsp    VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Single-Precision format   431
xvcvuxwsp    VSX Vector Convert and round Unsigned Fixed-Point Word to Single-Precision format         432
Table 35. VSX Vector Convert Integer to Single-Precision Instructions

7.6.7 VSX Round to Floating-Point Integer Instructions

7.6.7.1 VSX Scalar Round to Floating-Point Integer Instructions

Mnemonic     Instruction Name                                                                             Page
xsrdpi       VSX Scalar Round to Double-Precision Integer using round to Nearest Away                     386
xsrdpic      VSX Scalar Round to Double-Precision Integer Exact using Current rounding mode               387
xsrdpim      VSX Scalar Round to Double-Precision Integer using round towards -Infinity rounding mode     388
xsrdpip      VSX Scalar Round to Double-Precision Integer using round towards +Infinity rounding mode     388
xsrdpiz      VSX Scalar Round to Double-Precision Integer using round towards Zero rounding mode          389
Table 36. VSX Scalar Round to Double-Precision Integer Instructions

7.6.7.2 VSX Vector Round to Floating-Point Integer Instructions

Mnemonic     Instruction Name                                                                             Page
xvrdpi       VSX Vector Round to Double-Precision Integer using round to Nearest Away                     477
xvrdpic      VSX Vector Round to Double-Precision Integer Exact using Current rounding mode               478
xvrdpim      VSX Vector Round to Double-Precision Integer using round towards -Infinity rounding mode     478
xvrdpip      VSX Vector Round to Double-Precision Integer using round towards +Infinity rounding mode     479
xvrdpiz      VSX Vector Round to Double-Precision Integer using round towards Zero rounding mode          479
Table 37. VSX Vector Round to Double-Precision Integer Instructions

Mnemonic     Instruction Name                                                                             Page
xvrspi       VSX Vector Round to Single-Precision Integer using round to Nearest Away                     482
xvrspic      VSX Vector Round to Single-Precision Integer Exact using Current rounding mode               482
xvrspim      VSX Vector Round to Single-Precision Integer using round towards -Infinity rounding mode     483
xvrspip      VSX Vector Round to Single-Precision Integer using round towards +Infinity rounding mode     483
xvrspiz      VSX Vector Round to Single-Precision Integer using round towards Zero rounding mode          484
Table 38.
VSX Vector Round to Single-Precision Integer Instructions

7.6.8 VSX Logical Instructions

Mnemonic     Instruction Name                   Page
xxland       VSX Logical AND                    496
xxlandc      VSX Logical AND with Complement    496
xxlnor       VSX Logical NOR                    497
xxlor        VSX Logical OR                     497
xxlxor       VSX Logical XOR                    498
Table 39. VSX Logical Instructions

Mnemonic     Instruction Name                   Page
xxsel        VSX Select                         500
Table 40. VSX Vector Select Instruction

7.6.9 VSX Permute Instructions

Mnemonic     Instruction Name                          Page
xxmrghw      VSX Merge High Word                       499
xxmrglw      VSX Merge Low Word                        499
Table 41. VSX Merge Instructions

Mnemonic     Instruction Name                          Page
xxspltw      VSX Splat Word                            501
Table 42. VSX Splat Instruction

Mnemonic     Instruction Name                          Page
xxpermdi     VSX Permute Doubleword Immediate          500
Table 43. VSX Permute Instruction

Mnemonic     Instruction Name                          Page
xxsldwi      VSX Shift Left Double by Word Immediate   501
Table 44. VSX Shift Instruction

7.7 VSX Instruction Descriptions

7.7.1 Instruction Description Conventions

7.7.1.1 Instruction RTL Operators

x{y}
    Return bit y of x.

x{y:z}
    Return bits y:z of x.

x=y
    The value of y is placed into x.

x |= y
    The value of y is ORed with the value x and placed into x.

~x
    Return the one's complement of x.

!x
    Return 1 if the contents of x are equal to 0, otherwise return 0.

x || y
    Return the value of x concatenated with the value of y. For example, 0b010 || 0b111 is the same as 0b010111.

x^y
    Return the value of x exclusive ORed with the value of y.

x?y:z
    If the value of x is true, return the value of y, otherwise return the value z.

x+y
    x and y are integer values.
    Return the sum of x and y.

x-y
    x and y are integer values.
    Return the difference of x and y.

x!=y
    x and y are integer values.
    Return 1 if x is not equal to y, otherwise return 0.

x<=y
    x and y are integer values.
    Return 1 if x is less than or equal to y, otherwise return 0.

x>=y
    x and y are integer values.
    Return 1 if x is greater than or equal to y, otherwise return 0.

7.7.1.2 Instruction RTL Function Calls

AddDP(x,y)
    x and y are double-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is an Infinity and y is an Infinity of the opposite sign, vxisi_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return y.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, if x and y are infinities of opposite sign, return the standard QNaN.
    Otherwise, return the normalized sum of x and y, having unbounded range and precision.

AddSP(x,y)
    x and y are single-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is an Infinity and y is an Infinity of the opposite sign, vxisi_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return y.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, if x and y are infinities of opposite sign, return the standard QNaN.
    Otherwise, return the normalized sum of x added to y, having unbounded range and precision.

ClassDP(x,y)
    Return a 5-bit characterization of the double-precision floating-point number x.
        0b10001 = Quiet NaN
        0b01001 = -Infinity
        0b01000 = -Normalized Number
        0b11000 = -Denormalized Number
        0b10010 = -Zero
        0b00010 = +Zero
        0b10100 = +Denormalized Number
        0b00100 = +Normalized Number
        0b00101 = +Infinity

ClassSP(x,y)
    Return a 5-bit characterization of the single-precision floating-point number x.
        0b10001 = Quiet NaN
        0b01001 = -Infinity
        0b01000 = -Normalized Number
        0b11000 = -Denormalized Number
        0b10010 = -Zero
        0b00010 = +Zero
        0b10100 = +Denormalized Number
        0b00100 = +Normalized Number
        0b00101 = +Infinity

CompareEQDP(x,y)
    x and y are double-precision floating-point values.
    If x or y is a NaN, return 0.
    Otherwise, if x is equal to y, return 1.
    Otherwise, return 0.

CompareEQSP(x,y)
    x and y are single-precision floating-point values.
    If x or y is a NaN, return 0.
    Otherwise, if x is equal to y, return 1.
    Otherwise, return 0.

CompareGTDP(x,y)
    x and y are double-precision floating-point values.
    If x or y is a NaN, return 0.
    Otherwise, if x is greater than y, return 1.
    Otherwise, return 0.

CompareGTSP(x,y)
    x and y are single-precision floating-point values.
    If x or y is a NaN, return 0.
    Otherwise, if x is greater than y, return 1.
    Otherwise, return 0.

CompareLTDP(x,y)
    x and y are double-precision floating-point values.
    If x or y is a NaN, return 0.
    Otherwise, if x is less than y, return 1.
    Otherwise, return 0.

CompareLTSP(x,y)
    x and y are single-precision floating-point values.
    If x or y is a NaN, return 0.
    Otherwise, if x is less than y, return 1.
    Otherwise, return 0.

ConvertDPtoSD(x)
    x is a double-precision integer value.
    If x is a NaN, return 0x8000_0000_0000_0000, vxcvi_flag is set to 1, and vxsnan_flag is set to 1 if x is an SNaN.
    Otherwise, do the following.
        If x is greater than 2^63-1, return 0x7FFF_FFFF_FFFF_FFFF and vxcvi_flag is set to 1.
        Otherwise, if x is less than -2^63, return 0x8000_0000_0000_0000 and vxcvi_flag is set to 1.
        Otherwise, return the value x in 64-bit signed integer format.

ConvertDPtoSW(x)
    x is a double-precision integer value.
    If x is a NaN, return 0x8000_0000, vxcvi_flag is set to 1, and vxsnan_flag is set to 1 if x is an SNaN.
    Otherwise, do the following.
        If x is greater than 2^31-1, return 0x7FFF_FFFF and vxcvi_flag is set to 1.
        Otherwise, if x is less than -2^31, return 0x8000_0000 and vxcvi_flag is set to 1.
        Otherwise, return the value x in 32-bit signed integer format.

ConvertDPtoUD(x)
    x is a double-precision integer value.
    If x is a NaN, return 0x0000_0000_0000_0000, vxcvi_flag is set to 1, and vxsnan_flag is set to 1 if x is an SNaN.
    Otherwise, do the following.
        If x is greater than 2^64-1, return 0xFFFF_FFFF_FFFF_FFFF and vxcvi_flag is set to 1.
        Otherwise, if x is less than 0, return 0x0000_0000_0000_0000 and vxcvi_flag is set to 1.
        Otherwise, return the value x in 64-bit unsigned integer format.

ConvertDPtoUW(x)
    x is a double-precision integer value.
    If x is a NaN, return 0x0000_0000, vxcvi_flag is set to 1, and vxsnan_flag is set to 1 if x is an SNaN.
    Otherwise, do the following.
        If x is greater than 2^32-1, return 0xFFFF_FFFF and vxcvi_flag is set to 1.
        Otherwise, if x is less than 0, return 0x0000_0000 and vxcvi_flag is set to 1.
        Otherwise, return the value x in 32-bit unsigned integer format.

ConvertFPtoDP(x)
    Return the floating-point value x in DP format.

ConvertFPtoSP(x)
    Return the floating-point value x in single-precision format.

ConvertSDtoFP(x)
    x is a 64-bit signed integer value.
    Return the value x converted to floating-point format having unbounded significand precision.

ConvertSPtoDP(x)
    x is a single-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is an SNaN, return x represented as a QNaN in double-precision floating-point format.
    Otherwise, if x is a QNaN, return x in double-precision floating-point format.
    Otherwise, return the value x in double-precision floating-point format.

ConvertSPtoSD(x)
    x is a single-precision integer value.
    If x is a NaN, return 0x8000_0000_0000_0000, vxcvi_flag is set to 1, and vxsnan_flag is set to 1 if x is an SNaN.
    Otherwise, do the following.
        If x is greater than 2^63-1, return 0x7FFF_FFFF_FFFF_FFFF and vxcvi_flag is set to 1.
        Otherwise, if x is less than -2^63, return 0x8000_0000_0000_0000 and vxcvi_flag is set to 1.
        Otherwise, return the value x in 64-bit signed integer format.

ConvertSPtoSW(x)
    x is a single-precision integer value.
    If x is a NaN, return 0x8000_0000, vxcvi_flag is set to 1, and vxsnan_flag is set to 1 if x is an SNaN.
    Otherwise, do the following.
        If x is greater than 2^31-1, return 0x7FFF_FFFF and vxcvi_flag is set to 1.
        Otherwise, if x is less than -2^31, return 0x8000_0000 and vxcvi_flag is set to 1.
        Otherwise, return the value x in 32-bit signed integer format.

ConvertSPtoUD(x)
    x is a single-precision integer value.
    If x is a NaN, return 0x0000_0000_0000_0000, vxcvi_flag is set to 1, and vxsnan_flag is set to 1 if x is an SNaN.
    Otherwise, do the following.
        If x is greater than 2^64-1, return 0xFFFF_FFFF_FFFF_FFFF and vxcvi_flag is set to 1.
        Otherwise, if x is less than 0, return 0x0000_0000_0000_0000 and vxcvi_flag is set to 1.
        Otherwise, return the value x in 64-bit unsigned integer format.

ConvertSPtoUW(x)
    x is a single-precision integer value.
    If x is a NaN, return 0x0000_0000, vxcvi_flag is set to 1, and vxsnan_flag is set to 1 if x is an SNaN.
    Otherwise, do the following.
        If x is greater than 2^32-1, return 0xFFFF_FFFF and vxcvi_flag is set to 1.
        Otherwise, if x is less than 0, return 0x0000_0000 and vxcvi_flag is set to 1.
        Otherwise, return the value x in 32-bit unsigned integer format.

ConvertSWtoFP(x)
    x is a 32-bit signed integer value.
    Return the value x converted to floating-point format having unbounded significand precision.

ConvertUDtoFP(x)
    x is a 64-bit unsigned integer value.
    Return the value x converted to floating-point format having unbounded significand precision.

ConvertUWtoFP(x)
    x is a 32-bit unsigned integer value.
    Return the value x converted to floating-point format having unbounded significand precision.

DivideDP(x,y)
    x and y are double-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero and y is a Zero, vxzdz_flag is set to 1.
    If x is a finite, nonzero value and y is a Zero, zx_flag is set to 1.
    If x is an Infinity and y is an Infinity, vxidi_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return y.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, if x is a Zero and y is a Zero, return the standard QNaN.
    Otherwise, if x is a finite, nonzero value and y is a Zero with the same sign as x, return +Infinity.
    Otherwise, if x is a finite, nonzero value and y is a Zero with the opposite sign as x, return -Infinity.
    Otherwise, if x is an Infinity and y is an Infinity, return the standard QNaN.
    Otherwise, return the normalized quotient of x divided by y, having unbounded range and precision.

DivideSP(x,y)
    x and y are single-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero and y is a Zero, vxzdz_flag is set to 1.
    If x is a finite, nonzero value and y is a Zero, zx_flag is set to 1.
    If x is an Infinity and y is an Infinity, vxidi_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return y.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, if x is a Zero and y is a Zero, return the standard QNaN.
    Otherwise, if x is a finite, nonzero value and y is a Zero with the same sign as x, return +Infinity.
    Otherwise, if x is a finite, nonzero value and y is a Zero with the opposite sign as x, return -Infinity.
    Otherwise, if x is an Infinity and y is an Infinity, return the standard QNaN.
    Otherwise, return the normalized quotient of x divided by y, having unbounded range and precision.

DenormDP(x)
    x is a floating-point value having unbounded range and precision.
    Return the value x with its significand shifted right by a number of bits equal to the difference of -1022 and the unbiased exponent of x, and its unbiased exponent set to -1022.

DenormSP(x)
    x is a floating-point value having unbounded range and precision.
    Return the value x with its significand shifted right by a number of bits equal to the difference of -126 and the unbiased exponent of x, and its unbiased exponent set to -126.

IsInf(x)
    Return 1 if x is an Infinity, otherwise return 0.

IsNaN(x)
    Return 1 if x is either an SNaN or a QNaN, otherwise return 0.

IsNeg(x)
    Return 1 if x is a negative, nonzero value, otherwise return 0.

IsSNaN(x)
    Return 1 if x is an SNaN, otherwise return 0.

IsZero(x)
    Return 1 if x is a Zero, otherwise return 0.

MaximumDP(x,y)
    x and y are double-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN and y is not a NaN, return y.
    Otherwise, if x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return x.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, return the greater of x and y, where +0 is considered greater than -0.

MaximumSP(x,y)
    x and y are single-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN and y is not a NaN, return y.
    Otherwise, if x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return x.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, return the greater of x and y, where +0 is considered greater than -0.

MEM(x,y)
    Contents of a sequence of y bytes of storage. The sequence depends on the endianness of the storage access as follows.
    - For big-endian storage accesses, the sequence starts with the byte at address x and ends with the byte at address x+y-1.
    - For little-endian storage accesses, the sequence starts with the byte at address x+y-1 and ends with the byte at address x.

MinimumDP(x,y)
    x and y are double-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN and y is not a NaN, return y.
    Otherwise, if x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return x.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, return the lesser of x and y, where -0 is considered less than +0.

MinimumSP(x,y)
    x and y are single-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN and y is not a NaN, return y.
    Otherwise, if x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return x.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, return the lesser of x and y, where -0 is considered less than +0.

MultiplyAddDP(x,y,z)
    x, y and z are double-precision floating-point values.
    If x, y or z is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero and y is an Infinity, or x is an Infinity and y is a Zero, vximz_flag is set to 1.
    If the product of x and y is an Infinity and z is an Infinity of the opposite sign, vxisi_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if z is a QNaN, return z.
    Otherwise, if z is an SNaN, return z represented as a QNaN.
    Otherwise, if y is a QNaN, return y.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, if x is a Zero and y is an Infinity, or x is an Infinity and y is a Zero, return the standard QNaN.
    Otherwise, if the product of x and y is an Infinity, and z is an Infinity of the opposite sign, return the standard QNaN.
    Otherwise, return the normalized sum of z and the product of x and y, having unbounded range and precision.

MultiplyAddSP(x,y,z)
    x, y and z are single-precision floating-point values.
    If x, y or z is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero and y is an Infinity, or x is an Infinity and y is a Zero, vximz_flag is set to 1.
    If the product of x and y is an Infinity and z is an Infinity of the opposite sign, vxisi_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if z is a QNaN, return z.
    Otherwise, if z is an SNaN, return z represented as a QNaN.
    Otherwise, if y is a QNaN, return y.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, if x is a Zero and y is an Infinity, or x is an Infinity and y is a Zero, return the standard QNaN.
    Otherwise, if the product of x and y is an Infinity, and z is an Infinity of the opposite sign, return the standard QNaN.
    Otherwise, return the normalized sum of z and the product of x and y, having unbounded range and precision.

MultiplyDP(x,y)
    x and y are double-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero and y is an Infinity, or x is an Infinity and y is a Zero, vximz_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return y.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, if x is a Zero and y is an Infinity, or x is an Infinity and y is a Zero, return the standard QNaN.
    Otherwise, return the normalized product of x and y, having unbounded range and precision.

MultiplySP(x,y)
    x and y are single-precision floating-point values.
    If x or y is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero and y is an Infinity, or x is an Infinity and y is a Zero, vximz_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if y is a QNaN, return y.
    Otherwise, if y is an SNaN, return y represented as a QNaN.
    Otherwise, if x is a Zero and y is an Infinity, or x is an Infinity and y is a Zero, return the standard QNaN.
    Otherwise, return the normalized product of x and y, having unbounded range and precision.
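
The flag-then-result ordering that MultiplyDP(x,y) and MultiplySP(x,y) describe can be sketched in Python as follows. This is an illustrative sketch only, not part of the architecture: the helper names (`to_bits`, `to_float`, `multiply_dp`), the flag dictionary, the use of 0x7FF8_0000_0000_0000 as the standard QNaN, and the reading of "represented as a QNaN" as setting the most-significant fraction bit are all assumptions made for the example. The final product is also rounded to double precision by the host, whereas the RTL function returns a value with unbounded range and precision.

```python
import struct

ABS_MASK = 0x7FFF_FFFF_FFFF_FFFF   # every bit except the sign bit
EXP_MASK = 0x7FF0_0000_0000_0000   # all-ones exponent: Infinity or NaN
QUIET_BIT = 0x0008_0000_0000_0000  # most-significant fraction bit
STD_QNAN = 0x7FF8_0000_0000_0000   # assumed encoding of the standard QNaN


def to_bits(value):
    """Return the 64-bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def to_float(pattern):
    """Return the Python float encoded by a 64-bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", pattern))[0]


def is_nan(b):
    return (b & ABS_MASK) > EXP_MASK


def is_snan(b):
    return is_nan(b) and not (b & QUIET_BIT)


def is_inf(b):
    return (b & ABS_MASK) == EXP_MASK


def is_zero(b):
    return (b & ABS_MASK) == 0


def multiply_dp(x, y):
    """Hypothetical model of MultiplyDP on 64-bit patterns.

    Returns (result_pattern, flags).
    """
    flags = {"vxsnan_flag": 0, "vximz_flag": 0}

    # Flags are set first, independently of which result is selected.
    if is_snan(x) or is_snan(y):
        flags["vxsnan_flag"] = 1
    if (is_zero(x) and is_inf(y)) or (is_inf(x) and is_zero(y)):
        flags["vximz_flag"] = 1

    # Result selection: a NaN in x takes priority over a NaN in y.
    # ORing in QUIET_BIT leaves a QNaN unchanged and quiets an SNaN.
    if is_nan(x):
        return x | QUIET_BIT, flags
    if is_nan(y):
        return y | QUIET_BIT, flags
    if flags["vximz_flag"]:
        return STD_QNAN, flags

    # Simplification: the host rounds this product to double precision.
    return to_bits(to_float(x) * to_float(y)), flags
```

Operands are passed as integer bit patterns rather than Python floats so that a signaling NaN reaches the function unmodified. For example, `multiply_dp(0x7FF0_0000_0000_0001, to_bits(1.0))` selects the quieted form of x and reports `vxsnan_flag` as 1.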
NegateDP(x)
    If the double-precision floating-point value x is a NaN, return x.
    Otherwise, return the double-precision floating-point value x with its sign bit complemented.

NegateSP(x)
    If the single-precision floating-point value x is a NaN, return x.
    Otherwise, return the single-precision floating-point value x with its sign bit complemented.

ReciprocalEstimateDP(x)
    x is a double-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero, zx_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is a Zero, return an Infinity with the sign of x.
    Otherwise, if x is an Infinity, return a Zero with the sign of x.
    Otherwise, return an estimate of the reciprocal of x having unbounded exponent range.

ReciprocalEstimateSP(x)
    x is a single-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero, zx_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is a Zero, return an Infinity with the sign of x.
    Otherwise, if x is an Infinity, return a Zero with the sign of x.
    Otherwise, return an estimate of the reciprocal of x having unbounded exponent range.

ReciprocalSquareRootEstimateDP(x)
    x is a double-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero, zx_flag is set to 1.
    If x is a negative, nonzero number, vxsqrt_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is a negative, nonzero value, return the default QNaN.
    Otherwise, return an estimate of the reciprocal of the square root of x having unbounded exponent range.

ReciprocalSquareRootEstimateSP(x)
    x is a single-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a Zero, zx_flag is set to 1.
    If x is a negative, nonzero number, vxsqrt_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is a negative, nonzero value, return the default QNaN.
    Otherwise, return an estimate of the reciprocal of the square root of x having unbounded exponent range.

reset_xflags()
    vxsnan_flag is set to 0.
    vximz_flag is set to 0.
    vxidi_flag is set to 0.
    vxisi_flag is set to 0.
    vxzdz_flag is set to 0.
    vxsqrt_flag is set to 0.
    vxcvi_flag is set to 0.
    vxvc_flag is set to 0.
    ox_flag is set to 0.
    ux_flag is set to 0.
    xx_flag is set to 0.
    zx_flag is set to 0.

RoundToDP(x,y)
    x is a 2-bit unsigned integer specifying one of four rounding modes.
        0b00  Round to Nearest Even
        0b01  Round towards Zero
        0b10  Round towards +Infinity
        0b11  Round towards -Infinity
    y is a normalized floating-point value having unbounded range and precision.
    Return the value y rounded to double-precision under control of the rounding mode specified by x.

    if IsQNaN(y) then return ConvertFPtoDP(y)
    if IsInf(y) then return ConvertFPtoDP(y)
    if IsZero(y) then return ConvertFPtoDP(y)
    if yNmax then do
       if OE=0 then do
          if x=0b00 then r ← sign ? -Inf  : +Inf
          if x=0b01 then r ← sign ? -Nmax : +Nmax
          if x=0b10 then r ← sign ? -Nmax : +Inf
          if x=0b11 then r ← sign ? -Inf  : +Nmax
          ox_flag  ← 0b1
          xx_flag  ← 0b1
          inc_flag ← 0bU
          return(ConvertFPtoDP(r))
       end
       else do
          r ← Scalb(r,-1536)
          ox_flag ← 1
       end
    end
    return(ConvertFPtoDP(r))

RoundToDPCeil(x)
    x is a floating-point value having unbounded range and precision.
    If x is a QNaN, return x.
    Otherwise, if x is an Infinity, return x.
    Otherwise, do the following.
        Return the smallest floating-point number having unbounded exponent range but double-precision significand precision that is greater or equal in value to x.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToDPFloor(x)
    x is a floating-point value having unbounded range and precision.
    If x is a QNaN, return x.
    Otherwise, if x is an Infinity, return x.
    Otherwise, do the following.
        Return the largest floating-point number having unbounded exponent range but double-precision significand precision that is lesser or equal in value to x.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToDPIntegerCeil(x)
    x is a double-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is an infinity, return x.
    Otherwise, do the following.
        Return the smallest double-precision floating-point integer value that is greater or equal in value to x.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToDPIntegerFloor(x)
    x is a double-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is an infinity, return x.
    Otherwise, do the following.
        Return the largest double-precision floating-point integer value that is lesser or equal in value to x.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToDPIntegerNearAway(x)
    x is a double-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is an infinity, return x.
    Otherwise, do the following.
        Return the largest double-precision floating-point integer value that is lesser or equal in value to x+0.5 if x>0, or the smallest double-precision floating-point integer that is greater or equal in value to x-0.5 if x<0.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToDPIntegerNearEven(x)
    x is a double-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is an infinity, return x.
    Otherwise, do the following.
        Return the double-precision floating-point integer value that is nearest in value to x (in case of a tie, the double-precision floating-point integer value with the least-significant bit equal to 0 is used).
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToDPIntegerTrunc(x)
    x is a double-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is an infinity, return x.
    Otherwise, do the following.
        Return the largest double-precision floating-point integer value that is lesser or equal in value to x if x>0, or the smallest double-precision floating-point integer value that is greater or equal in value to x if x<0.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToDPNearEven(x)
    x is a floating-point value having unbounded range and precision.
    If x is a QNaN, return x.
    Otherwise, if x is an Infinity, return x.
    Otherwise, do the following.
        Return the floating-point number having unbounded exponent range but double-precision significand precision that is nearest in value to x (in case of a tie, the floating-point number having unbounded exponent range but double-precision significand precision with the least-significant bit equal to 0 is used).
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToDPTrunc(x)
    x is a floating-point value having unbounded range and precision.
    If x is a QNaN, return x.
    Otherwise, if x is an Infinity, return x.
    Otherwise, do the following.
        Return the largest floating-point number having unbounded exponent range but double-precision significand precision that is lesser or equal in value to x if x>0, or the smallest floating-point number having unbounded exponent range but double-precision significand precision that is greater or equal in value to x if x<0.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToSP(x,y)
    x is a 2-bit unsigned integer specifying one of four rounding modes.
        0b00  Round to Nearest Even
        0b01  Round towards Zero
        0b10  Round towards +Infinity
        0b11  Round towards -Infinity
    y is a normalized floating-point value having unbounded range and precision.
    Return the value y rounded to single-precision under control of the rounding mode specified by x.

    if IsQNaN(y) then return ConvertFPtoSP(y)
    if IsInf(y) then return ConvertFPtoSP(y)
    if IsZero(y) then return ConvertFPtoSP(y)
    if yNmax then do
       if OE=0 then do
          if x=0b00 then r ← sign ? -Inf  : +Inf
          if x=0b01 then r ← sign ? -Nmax : +Nmax
          if x=0b10 then r ← sign ? -Nmax : +Inf
          if x=0b11 then r ← sign ? -Inf  : +Nmax
          ox_flag  ← 0b1
          xx_flag  ← 0b1
          inc_flag ← 0bU
          return(ConvertFPtoSP(r))
       end
       else do
          r ← Scalb(r,-192)
          ox_flag ← 1
       end
    end
    return(ConvertFPtoSP(r))

RoundToSPCeil(x)
    x is a floating-point value having unbounded range and precision.
    If x is a QNaN, return x.
    Otherwise, if x is an Infinity, return x.
    Otherwise, do the following.
        Return the smallest floating-point number having unbounded exponent range but single-precision significand precision that is greater or equal in value to x.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToSPFloor(x)
    x is a floating-point value having unbounded range and precision.
    If x is a QNaN, return x.
    Otherwise, if x is an Infinity, return x.
    Otherwise, do the following.
        Return the largest floating-point number having unbounded exponent range but single-precision significand precision that is lesser or equal in value to x.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToSPIntegerCeil(x)
    x is a single-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is an infinity, return x.
    Otherwise, do the following.
        Return the smallest single-precision floating-point integer value that is greater or equal in value to x.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToSPIntegerFloor(x)
    x is a single-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is an infinity, return x.
    Otherwise, do the following.
        Return the largest single-precision floating-point integer value that is lesser or equal in value to x.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToSPIntegerNearAway(x)
    x is a single-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN.
    Otherwise, if x is an infinity, return x.
    Otherwise, do the following.
        Return x if x is a floating-point integer; otherwise return the largest single-precision floating-point integer value that is lesser or equal in value to x+0.5 if x>0, or the smallest single-precision floating-point integer value that is greater or equal in value to x-0.5 if x<0.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToSPIntegerTrunc(x)
    x is a single-precision floating-point value.
    If x is a QNaN, return x.
    Otherwise, if x is an SNaN, return x represented as a QNaN, and vxsnan_flag is set to 1.
    Otherwise, if x is an infinity, return x.
    Otherwise, do the following.
        Return the largest single-precision floating-point integer value that is lesser or equal in value to x if x>0, or the smallest single-precision floating-point integer value that is greater or equal in value to x if x<0.
        If the magnitude of the value returned is greater than x, inc_flag is set to 1.
        If the value returned is not equal to x, xx_flag is set to 1.

RoundToSPNearEven(x)
    x is a floating-point value having unbounded range and precision.
    If x is a QNaN, return x.
    Otherwise, if x is an Infinity, return x.

RoundToSPIntegerNearEven(x)
    x is a single-precision floating-point value.
    If x is an SNaN, vxsnan_flag is set to 1.
If x is a QNaN, return x. Otherwise, do the following. Return the floating-point number having Otherwise, if x is an SNaN, return x represented unbounded exponent range but as a QNaN. single-precision significand precision that is nearest in value to x (in case of a tie, the Otherwise, if x is an infinity, return x. floating-point number having unbounded exponent range but single-precision Otherwise, do the following. significand precision with the least-significant Return x if x is a floating-point integer; bit equal to 0 is used). otherwise return the single-precision floating-point integer value that is nearest in If the magnitude of the value returned is value to x (in case of a tie, the greater than x, inc_flag is set to 1. single-precision floating-point integer value with the least-significant bit equal to 0 is If the value returned is not equal to x, used). xx_flag is set to 1. If the magnitude of the value returned is greater than x, inc_flag is set to 1. If the value returned is not equal to x, xx_flag is set to 1. 336 Power ISATM Book I Version 2.06 RoundToSPTrunc(x) SquareRootSP(x) x is a floating-point value having unbounded range x is a single-precision floating-point value. and precision. If x is an SNaN, vxsnan_flag is set to 1. If x is a QNaN, return x. If x is a negative, nonzero value, vxsqrt_flag is Otherwise, if x is an Infinity, return x. set to 1. Otherwise, do the following. If x is a QNaN, return x. Return the largest floating-point number having unbounded exponent range but Otherwise, if x is an SNaN, return x represented single-precision significand precision that is as a QNaN. lesser or equal in value to x if x>0, or the smallest single-precision floating-point Otherwise, if x is a negative, nonzero value, return number that is greater or equal in value to x if the default QNaN. x<0. Otherwise, return the normalized square root of x, If the magnitude of the value returned is having unbounded range and precision. 
greater than x, inc_flag is set to 1. If the value returned is not equal to x, xx_flag is set to 1. Scalb(x,y) x is a floating-point value having unbounded range and precision. y is a signed integer. Result of multiplying the floating-point value x by 2 y. SetFX(x) x is one of the exception flags in the FPSCR. If the contents of x is 0, FX and x are set to 1. SquareRootDP(x) x is a double-precision floating-point value. If x is an SNaN, vxsnan_flag is set to 1. If x is a negative, nonzero value, vxsqrt_flag is set to 1. If x is a QNaN, return x. Otherwise, if x is an SNaN, return x represented as a QNaN. Otherwise, if x is a negative, nonzero value, return the default QNaN. Otherwise, return the normalized square root of x, having unbounded range and precision. Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 337 Version 2.06 Load VSX Scalar Doubleword [ with Update ] Load VSX Vector Doubleword*2 Indexed XX1-form [ with Update ] Indexed XX1-form lxsdx XT,RA,RB (0x7C00_0498) lxvd2x XT,RA,RB (0x7C00_0698) 31 T RA RB 588 TX 31 T RA RB 844 TX 0 6 11 16 21 31 0 6 11 16 21 31 lxsdux XT,RA,RB (0x7C00_04D8) lxvd2ux XT,RA,RB (0x7C00_06D8) 31 T RA RB 620 TX 31 T RA RB 876 TX 0 6 11 16 21 31 0 6 11 16 21 31 XT TX || T XT TX || T a{0:63} (RA=0) ? 0 : GPR[RA] a{0:63} (RA=0) ? 0 : GPR[RA] EA{0:63} a + GPR[RB] EA{0:63} a + GPR[RB] VSR[XT] MEM(EA,8) || 0xUUUU_UUUU_UUUU_UUUU VSR[XT]{0:63} MEM(EA,8) if "lxsdux" then GPR[RA] EA VSR[XT]{64:127} MEM(EA+8,8) if "lxvd2ux" then GPR[RA] EA Let XT be the value TX concatenated with T. Let XT be the value TX concatenated with T. Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. The contents of the doubleword in storage at address EA are placed in doubleword element 0 of VSR[XT]. 
The contents of the doubleword in storage at address EA are placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are The contents of the doubleword in storage at address undefined. EA+8 are placed into doubleword element 1 of VSR[XT]. For lxsdux, ­ EA is placed into GPR[RA]. For lxvd2ux, ­ If RA is equal to 0, the instruction form is invalid. ­ EA is placed into GPR[RA]. ­ If RA is equal to 0, the instruction form is invalid. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be If EA is not word-aligned (that is, EA{62:63}g0b00), invoked instead of performing the storage access. the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None Special Registers Altered: None VSR Data Layout for lxsd[u]x tgt = VSR[XT] VSR Data Layout for lxvd2[u]x tgt = VSR[XT] MEM(EA,8) undefined 0 64 127 MEM(EA,8) MEM(EA+8,8) 0 64 127 338 Power ISATM Book I Version 2.06 Load VSX Vector Doubleword & Splat Indexed Load VSX Vector Word*4 [ with Update ] XX1-form Indexed XX1-form lxvdsx XT,RA,RB (0x7C00_0298) lxvw4x XT,RA,RB (0x7C00_0618) 31 T RA RB 332 TX 31 T RA RB 780 TX 0 6 11 16 21 31 0 6 11 16 21 31 XT TX || T lxvw4ux XT,RA,RB (0x7C00_0658) a{0:63} (RA=0) ? 0 : GPR[RA] EA{0:63} a + GPR[RB] 31 T RA RB 812 TX load_data{0:63} MEM(EA,8) 0 6 11 16 21 31 VSR[XT]{0:63} load_data VSR[XT]{64:127} load_data XT TX || T a{0:63} (RA=0) ? 0 : GPR[RA] Let XT be the value TX concatenated with T. EA{0:63} a + GPR[RB] VSR[XT]{0:31} MEM(EA,4) Let EA be the sum of the contents of GPR[RA], or 0 if VSR[XT]{32:63} MEM(EA+4,4) RA is equal to 0, and the contents of GPR[RB]. VSR[XT]{64:95} MEM(EA+8,4) VSR[XT]{96:127} MEM(EA+12,4) The contents of the doubleword in storage at address if "lxvw4ux" then GPR[RA] EA EA are copied into doubleword elements 0 and 1 of VSR[XT]. Let XT be the value TX concatenated with T. 
If EA is not word-aligned (that is, EA{62:63}g0b00), Let EA be the sum of the contents of GPR[RA], or 0 if the system alignment error handler is permitted to be RA is equal to 0, and the contents of GPR[RB]. invoked instead of performing the storage access. The contents of the word in storage at address EA are Special Registers Altered: placed into word element 0 of VSR[XT]. None The contents of the word in storage at address EA+4 are placed into word element 1 of VSR[XT]. VSR Data Layout for lxvdsx tgt = VSR[XT] The contents of the word in storage at address EA+8 MEM(EA,8) MEM(EA,8) are placed into word element 2 of VSR[XT]. 0 64 127 The contents of the word in storage at address EA+12 are placed into word element 3 of VSR[XT]. For lxvw4ux, ­ EA is placed into GPR[RA]. ­ If RA is equal to 0, the instruction form is invalid. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None VSR Data Layout for lxvw4[u]x tgt = VSR[XT] MEM(EA,4) MEM(EA+4,4) MEM(EA+8,4) MEM(EA+12,4) 0 32 64 96 127 Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 339 Version 2.06 Store VSX Scalar Doubleword [ with Update ] Store VSX Vector Doubleword*2 Indexed XX1-form [ with Update ] Indexed XX1-form stxsdx XS,RA,RB (0x7C00_0598) stxvd2x XS,RA,RB (0x7C00_0798) 31 S RA RB 716 SX 31 S RA RB 972 SX 0 6 11 16 21 31 0 6 11 16 21 31 stxsdux XS,RA,RB (0x7C00_05D8) stxvd2ux XS,RA,RB (0x7C00_07D8) 31 S RA RB 748 SX 31 S RA RB 1004 SX 0 6 11 16 21 31 0 6 11 16 21 31 XS SX || S XS SX || S a{0:63} (RA=0) ? 0 : GPR[RA] a{0:63} (RA=0) ? 0 : GPR[RA] EA{0:63} a + GPR[RB] EA{0:63} a + GPR[RB] MEM(EA,8) VSR[XS]{0:63} MEM(EA,8) VSR[XS]{0:63} if "stxsdux" then GPR[RA] EA MEM(EA+8,8) VSR[XS]{64:127} if "stxvd2ux" then GPR[RA] EA Let XS be the value SX concatenated with S. Let XS be the value SX concatenated with S. 
Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. The contents of doubleword element 0 of VSR[XS] are placed in the doubleword in storage at address EA. The contents of doubleword element 0 of VSR[XS] are placed in the doubleword in storage at address EA. For stxsdux, ­ EA is placed into GPR[RA]. The contents of doubleword element 1 of VSR[XS] are ­ If RA is equal to 0, the instruction form is invalid. placed in the doubleword in storage at address EA+8. If EA is not word-aligned (that is, EA{62:63}g0b00), For stxvd2ux, the system alignment error handler is permitted to be ­ EA is placed into GPR[RA]. invoked instead of performing the storage access. ­ If RA is equal to 0, the instruction form is invalid. Special Registers Altered: If EA is not word-aligned (that is, EA{62:63}g0b00), None the system alignment error handler is permitted to be invoked instead of performing the storage access. VSR Data Layout for stxsd[u]x Special Registers Altered: src = VSR[XS] None DP/SD/UD/MD unused 0 64 127 VSR Data Layout for stxvd2[u]x src = VSR[XS] DP/SD/UD/MD DP/SD/UD/MD 0 64 127 340 Power ISATM Book I Version 2.06 Store VSX Vector Word*4 [ with Update ] VSX Scalar Absolute Value Double-Precision Indexed XX1-form XX2-form stxvw4x XS,RA,RB (0x7C00_0718) xsabsdp XT,XB (0xF000_0564) 31 S RA RB 908 SX 60 T /// B 345 BX TX 0 6 11 16 21 31 0 6 11 16 21 30 31 stxvw4ux XS,RA,RB (0x7C00_0758) XT TX || T XB BX || B 31 S RA RB 940 SX result{0:63} 0b0 || VSR[XB]{1:63} VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU 0 6 11 16 21 31 XS SX || S Let XT be the value TX concatenated with T. a{0:63} (RA=0) ? 0 : GPR[RA] Let XB be the value BX concatenated with B. 
EA{0:63} a + GPR[RB] MEM(EA,4) VSR[XS]{0:31} The absolute value of the double-precision MEM(EA+4,4) VSR[XS]{32:63} floating-point operand in doubleword element 0 of MEM(EA+8,4) VSR[XS]{64:95} VSR[XB] is placed into doubleword element 0 of MEM(EA+12,4) VSR[XS]{96:127} VSR[XT] in double-precision format. if "stxvw4ux" then GPR[RA] EA The contents of doubleword element 1 of VSR[XT] are Let XS be the value SX concatenated with S. undefined. Let EA be the sum of the contents of GPR[RA], or 0 if Special Registers Altered: RA is equal to 0, and the contents of GPR[RB]. None The contents of word element 0 of VSR[XS] are placed in the word in storage at address EA. VSR Data Layout for xsabsdp src = VSR[XB] The contents of word element 1 of VSR[XS] are placed in the word in storage at address EA+4. DP unused tgt = VSR[XT] The contents of word element 2 of VSR[XS] are placed in the word in storage at address EA+8. DP undefined 0 64 127 The contents of word element 3 of VSR[XS] are placed in the word in storage at address EA+12. For stxvw4ux, ­ EA is placed into GPR[RA]. ­ If RA is equal to 0, the instruction form is invalid. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None VSR Data Layout for stxvw4[u]x src = VSR[XS] SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW 0 32 64 96 127 Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 341 Version 2.06 VSX Scalar Add Double-Precision XX3-form The result is placed into doubleword element 0 of VSR[XT] in double-precision format. xsadddp XT,XA,XB (0xF000_0100) The contents of doubleword element 1 of VSR[XT] are 60 T A B 32 AXBX TX undefined. 0 6 11 16 21 29 30 31 FPRF is set to the class and sign of the result. FR is XT TX || T set to indicate if the result was incremented when XA AX || A rounded. FI is set to indicate the result is inexact. 
XB BX || B reset_xflags() src1 VSR[XA]{0:63} If a trap-enabled invalid operation exception occurs, src2 VSR[XB]{0:63} VSR[XT] and FPRF are not modified, and FR and FI v{0:inf} AddDP(src1,src2) are set to 0. result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) See Table 47, "Scalar Floating-Point Final Result," on if(vxisi_flag) then SetFX(VXISI) page 345. if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) Special Registers Altered: if(xx_flag) then SetFX(XX) FPRF FR FI FX OX UX XX vex_flag VE & (vxsnan_flag | vxisi_flag) VXSNAN VXISI if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU VSR Data Layout for xsadddp FPRF ClassSP(result) src1 = VSR[XA] FR inc_flag FI xx_flag DP unused end else do src2 = VSR[XB] FR 0b0 DP unused FI 0b0 end tgt = VSR[XT] Let XT be the value TX concatenated with T. DP undefined Let XA be the value AX concatenated with A. 0 64 127 Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src2 is added1 to src1, producing a sum having unbounded range and precision. The sum is normalized2. See Table 45, "Actions for xsadddp," on page 343. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, "Floating-Point Intermediate Result Handling," on page 344. 1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo- nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermedi- ate sum. 
All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

           src2
src1       -Infinity       -NZF            -Zero           +Zero           +NZF            +Infinity       QNaN            SNaN
-Infinity  v←-Infinity     v←-Infinity     v←-Infinity     v←-Infinity     v←-Infinity     v←dQNaN         v←src2          v←Q(src2)
                                                                                           vxisi_flag←1                    vxsnan_flag←1
-NZF       v←-Infinity     v←A(src1,src2)  v←src1          v←src1          v←A(src1,src2)  v←+Infinity     v←src2          v←Q(src2)
                                                                                                                           vxsnan_flag←1
-Zero      v←-Infinity     v←src2          v←-Zero         v←Rezd          v←src2          v←+Infinity     v←src2          v←Q(src2)
                                                                                                                           vxsnan_flag←1
+Zero      v←-Infinity     v←src2          v←Rezd          v←+Zero         v←src2          v←+Infinity     v←src2          v←Q(src2)
                                                                                                                           vxsnan_flag←1
+NZF       v←-Infinity     v←A(src1,src2)  v←src1          v←src1          v←A(src1,src2)  v←+Infinity     v←src2          v←Q(src2)
                                                                                                                           vxsnan_flag←1
+Infinity  v←dQNaN         v←+Infinity     v←+Infinity     v←+Infinity     v←+Infinity     v←+Infinity     v←src2          v←Q(src2)
           vxisi_flag←1                                                                                                    vxsnan_flag←1
QNaN       v←src1          v←src1          v←src1          v←src1          v←src1          v←src1          v←src1          v←src1
                                                                                                                           vxsnan_flag←1
SNaN       v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)
           vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1

Explanation:
src1      The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2      The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN     Default quiet NaN (0x7FF8_0000_0000_0000).
NZF       Nonzero finite number.
Rezd      Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
A(x,y)    Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
          Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x)      Return a QNaN with the payload of x.
v         The intermediate result having unbounded significand precision and unbounded exponent range.

Table 45. Actions for xsadddp

                                                Rounding Mode
Range of v                            Case      RTN                       RTZ                       RTP                       RTM
v is a QNaN                           Special   r←v                       r←v                       r←v                       r←v
v = -Infinity                         Special   r←v                       r←v                       r←v                       r←v
-Infinity < v ≤ -(Nmax + 1ulp)        Overflow  q←Rnd(v), r←-Infinity     q←Rnd(v), r←-Nmax         q←Rnd(v), r←-Nmax         q←Rnd(v), r←-Infinity
-(Nmax + 1ulp) < v ≤ -(Nmax + ½ulp)   Overflow  q←Rnd(v), r←-Infinity     --                        --                        q←Rnd(v), r←-Infinity
                                      Normal    --                        r←-Nmax                   r←-Nmax                   --
-(Nmax + ½ulp) < v < -Nmax            Overflow  --                        --                        --                        q←Rnd(v), r←-Infinity
                                      Normal    r←-Nmax                   r←-Nmax                   r←-Nmax                   --
v = -Nmax                             Normal    r←-Nmax                   r←-Nmax                   r←-Nmax                   r←-Nmax
-Nmax < v < -Nmin                     Normal    r←Rnd(v)                  r←Rnd(v)                  r←Rnd(v)                  r←Rnd(v)
v = -Nmin                             Normal    r←-Nmin                   r←-Nmin                   r←-Nmin                   r←-Nmin
-Nmin < v < -Zero                     Tiny      q←Rnd(v), r←Rnd(Den(v))   q←Rnd(v), r←Rnd(Den(v))   q←Rnd(v), r←Rnd(Den(v))   q←Rnd(v), r←Rnd(Den(v))
v = -Zero                             Special   r←v                       r←v                       r←v                       r←v
v = Rezd                              Special   r←+Zero                   r←+Zero                   r←+Zero                   r←-Zero
v = +Zero                             Special   r←v                       r←v                       r←v                       r←v
+Zero < v < +Nmin                     Tiny      q←Rnd(v), r←Rnd(Den(v))   q←Rnd(v), r←Rnd(Den(v))   q←Rnd(v), r←Rnd(Den(v))   q←Rnd(v), r←Rnd(Den(v))
v = +Nmin                             Normal    r←+Nmin                   r←+Nmin                   r←+Nmin                   r←+Nmin
+Nmin < v < +Nmax                     Normal    r←Rnd(v)                  r←Rnd(v)                  r←Rnd(v)                  r←Rnd(v)
v = +Nmax                             Normal    r←+Nmax                   r←+Nmax                   r←+Nmax                   r←+Nmax
+Nmax < v < +(Nmax + ½ulp)            Overflow  --                        --                        q←Rnd(v), r←+Infinity     --
                                      Normal    r←+Nmax                   r←+Nmax                   --                        r←+Nmax
+(Nmax + ½ulp) ≤ v < +(Nmax + 1ulp)   Overflow  q←Rnd(v), r←+Infinity     --                        q←Rnd(v), r←+Infinity     --
                                      Normal    --                        r←+Nmax                   --                        r←+Nmax
+(Nmax + 1ulp) ≤ v < +Infinity        Overflow  q←Rnd(v), r←+Infinity     q←Rnd(v), r←+Nmax         q←Rnd(v), r←+Infinity     q←Rnd(v), r←+Nmax
v = +Infinity                         Special   r←v                       r←v                       r←v                       r←v

Explanation:
--        This situation cannot occur.
v         The precise intermediate result defined in the instruction having unbounded range and precision.
Den(x)    The value x is denormalized.
          The significand is shifted right by the amount of the difference between Emin for the target precision (that is, -1022 for double-precision, -126 for single-precision) and the unbiased exponent of x. The unbiased exponent of the denormalized value is Emin. The significand of the denormalized value has unbounded significand precision.
Rezd      Exact-zero-difference result. Applies only to add operations involving source operands having the same magnitude and different signs.
Rnd(x)    The significand of x is rounded to the target precision according to the rounding mode specified in FPSCR[RN]. Exponent range of the rounded result is unbounded. See Section 7.3.2.6.
Nmax      Largest (in magnitude) representable normalized number in the target precision format.
Nmin      Smallest (in magnitude) representable normalized number in the target precision format.
ulp       Least significant bit in the target precision format's significand (Unit in the Last Position).
RTN       Round To Nearest, ties to Even.
RTZ       Round Toward Zero.
RTP       Round Toward +infinity.
RTM       Round Toward -infinity.

Table 46. Floating-Point Intermediate Result Handling

Case     VE OE UE ZE XE vxsnan vximz  vxisi  vxidi  vxzdz  vxsqrt zx     Is r          Is r          Is q          Is q          Returned Results and Status Setting
                        _flag  _flag  _flag  _flag  _flag  _flag  _flag  inexact?      incremented?  inexact?      incremented?
                                                                         (r ≠ v)       (|r| > |v|)   (q ≠ v)       (|q| > |v|)
Special  -  -  -  -  -  0      0      0      0      0      0      0      -             -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0
         -  -  -  0  -  -      -      -      -      -      -      1      -             -             -             -             T(r), FI←0, FR←0, fx(ZX)
         -  -  -  1  -  -      -      -      -      -      -      1      -             -             -             -             fx(ZX), error()
         0  -  -  -  -  -      -      -      -      -      1      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXSQRT)
         0  -  -  -  -  -      -      -      -      1      -      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXZDZ)
         0  -  -  -  -  -      -      -      1      -      -      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXIDI)
         0  -  -  -  -  -      -      1      -      -      -      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXISI)
         0  -  -  -  -  0      1      -      -      -      -      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXIMZ)
         0  -  -  -  -  1      0      -      -      -      -      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXSNAN)
         0  -  -  -  -  1      1      -      -      -      -      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXSNAN), fx(VXIMZ)
         1  -  -  -  -  -      -      -      -      -      1      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXSQRT)
         1  -  -  -  -  -      -      -      -      1      -      -      -             -             -             -             fx(VXZDZ), error()
         1  -  -  -  -  -      -      -      1      -      -      -      -             -             -             -             fx(VXIDI), error()
         1  -  -  -  -  -      -      1      -      -      -      -      -             -             -             -             fx(VXISI), error()
         1  -  -  -  -  0      1      -      -      -      -      -      -             -             -             -             fx(VXIMZ), error()
         1  -  -  -  -  1      0      -      -      -      -      -      -             -             -             -             fx(VXSNAN), error()
         1  -  -  -  -  1      1      -      -      -      -      -      -             -             -             -             fx(VXSNAN), fx(VXIMZ), error()

Explanation:
-           The results do not depend on this condition.
ClassFP(x)  Classifies the floating-point value x as defined in Table 2, "Floating-Point Result Flags," on page 281.
fx(x)       FX is set to 1 if x=0. x is set to 1.
β           Wrap adjust, where β = 2^1536 for double-precision and β = 2^192 for single-precision.
q           The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, significand rounded to the target precision, unbounded exponent range.
r           The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, significand rounded to the target precision, bounded exponent range.
v           The precise intermediate result defined in the instruction having unbounded significand precision, unbounded exponent range.
FI          Floating-Point Fraction Inexact status flag, FPSCR[FI]. This status flag is nonsticky.
FR          Floating-Point Fraction Rounded status flag, FPSCR[FR].
OX          Floating-Point Overflow exception status flag, FPSCR[OX].
error()     The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode.
T(x)        The value x is placed in element 0 of VSR[XT] in the target precision format. The contents of the remaining element(s) of VSR[XT] are undefined.
UX          Floating-Point Underflow exception status flag, FPSCR[UX].
VXSNAN      Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCR[VXSNAN].
VXSQRT      Floating-Point Invalid Operation Exception (Invalid Square Root) status flag, FPSCR[VXSQRT].
VXIDI       Floating-Point Invalid Operation Exception (Infinity ÷ Infinity) status flag, FPSCR[VXIDI].
VXIMZ       Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCR[VXIMZ].
VXISI       Floating-Point Invalid Operation Exception (Infinity - Infinity) status flag, FPSCR[VXISI].
VXZDZ       Floating-Point Invalid Operation Exception (Zero ÷ Zero) status flag, FPSCR[VXZDZ].
XX          Floating-Point Inexact Exception status flag, FPSCR[XX]. The flag is a sticky version of FPSCR[FI]. When FPSCR[FI] is set to a new value, the new value of FPSCR[XX] is set to the result of ORing the old value of FPSCR[XX] with the new value of FPSCR[FI].
ZX          Floating-Point Zero Divide Exception status flag, FPSCR[ZX].

Table 47. Scalar Floating-Point Final Result

Case     VE OE UE ZE XE vxsnan vximz  vxisi  vxidi  vxzdz  vxsqrt zx     Is r          Is r          Is q          Is q          Returned Results and Status Setting
                        _flag  _flag  _flag  _flag  _flag  _flag  _flag  inexact?      incremented?  inexact?      incremented?
                                                                         (r ≠ v)       (|r| > |v|)   (q ≠ v)       (|q| > |v|)
Normal   -  -  -  -  -  -      -      -      -      -      -      -      no            -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0
         -  -  -  -  0  -      -      -      -      -      -      -      yes           no            -             -             T(r), FPRF←ClassFP(r), FI←1, FR←0, fx(XX)
         -  -  -  -  0  -      -      -      -      -      -      -      yes           yes           -             -             T(r), FPRF←ClassFP(r), FI←1, FR←1, fx(XX)
         -  -  -  -  1  -      -      -      -      -      -      -      yes           no            -             -             T(r), FPRF←ClassFP(r), FI←1, FR←0, fx(XX), error()
         -  -  -  -  1  -      -      -      -      -      -      -      yes           yes           -             -             T(r), FPRF←ClassFP(r), FI←1, FR←1, fx(XX), error()
Overflow -  0  -  -  0  -      -      -      -      -      -      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←1, FR←?, fx(OX), fx(XX)
         -  0  -  -  1  -      -      -      -      -      -      -      -             -             -             -             T(r), FPRF←ClassFP(r), FI←1, FR←?, fx(OX), fx(XX), error()
         -  1  -  -  -  -      -      -      -      -      -      -      -             -             no            -             T(q÷β), FPRF←ClassFP(q÷β), FI←0, FR←0, fx(OX), error()
         -  1  -  -  -  -      -      -      -      -      -      -      -             -             yes           no            T(q÷β), FPRF←ClassFP(q÷β), FI←1, FR←0, fx(OX), fx(XX), error()
         -  1  -  -  -  -      -      -      -      -      -      -      -             -             yes           yes           T(q÷β), FPRF←ClassFP(q÷β), FI←1, FR←1, fx(OX), fx(XX), error()
Tiny     -  -  0  -  -  -      -      -      -      -      -      -      no            -             -             -             T(r), FPRF←ClassFP(r), FI←0, FR←0
         -  -  0  -  0  -      -      -      -      -      -      -      yes           no            -             -             T(r), FPRF←ClassFP(r), FI←1, FR←0, fx(UX), fx(XX)
         -  -  0  -  0  -      -      -      -      -      -      -      yes           yes           -             -             T(r), FPRF←ClassFP(r), FI←1, FR←1, fx(UX), fx(XX)
         -  -  0  -  1  -      -      -      -      -      -      -      yes           no            -             -             T(r), FPRF←ClassFP(r), FI←1, FR←0, fx(UX), fx(XX), error()
         -  -  0  -  1  -      -      -      -      -      -      -      yes           yes           -             -             T(r), FPRF←ClassFP(r), FI←1, FR←1, fx(UX), fx(XX), error()
         -  -  1  -  -  -      -      -      -      -      -      -      yes           -             no            -             T(q×β), FPRF←ClassFP(q×β), FI←0, FR←0, fx(UX), error()
         -  -  1  -  -  -      -      -      -      -      -      -      yes           -             yes           no            T(q×β), FPRF←ClassFP(q×β), FI←1, FR←0, fx(UX), fx(XX), error()
         -  -  1  -  -  -      -      -      -      -      -      -      yes           -             yes           yes           T(q×β), FPRF←ClassFP(q×β), FI←1, FR←1, fx(UX), fx(XX), error()

Table 47.
Scalar Floating-Point Final Result (Continued)


VSX Scalar Compare Ordered Double-Precision XX3-form

xscmpodp BF,XA,XB (0xF000_0158)

    60      BF      //      A       B       43      AX      BX      /
    0       6       9       11      16      21      29      30      31

    XA ← AX || A
    XB ← BX || B
    reset_xflags()
    src1 ← VSR[XA]{0:63}
    src2 ← VSR[XB]{0:63}
    if( IsSNaN(src1) | IsSNaN(src2) ) then do
        vxsnan_flag ← 0b1
        if(VE=0) then vxvc_flag ← 0b1
    end
    else if( IsQNaN(src1) | IsQNaN(src2) ) then
        vxvc_flag ← 0b1
    FL ← CompareLTDP(src1,src2)
    FG ← CompareGTDP(src1,src2)
    FE ← CompareEQDP(src1,src2)
    FU ← IsNAN(src1) | IsNAN(src2)
    CR[BF] ← FL || FG || FE || FU
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vxvc_flag) then SetFX(VXVC)

Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src1 is compared to src2.

Zeros of same or opposite signs compare equal. Infinities of same signs compare equal.

See Table 48, "Actions for xscmpodp - Part 1: Compare Ordered," on page 348.

The result of the compare is placed into CR field BF and the FPCC.

If either of the operands is a NaN, either quiet or signaling, CR field BF and the FPCC are set to reflect unordered.

If either of the operands is a Signaling NaN, VXSNAN is set and, if Invalid Operation is disabled (VE=0), VXVC is also set.

If neither operand is a Signaling NaN but at least one operand is a Quiet NaN, VXVC is set.

See Table 49, "Actions for xscmpodp - Part 2: Result," on page 348.

Special Registers Altered:
    CR[BF] FPCC FX VXSNAN VXVC

VSR Data Layout for xscmpodp
    src1 = VSR[XA]
        DP              unused
    src2 = VSR[XB]
        DP              undefined
        0               64              127

           src2
src1       -Infinity         -NZF              -Zero             +Zero             +NZF              +Infinity         QNaN              SNaN
-Infinity  cc←0b0010         cc←0b1000         cc←0b1000         cc←0b1000         cc←0b1000         cc←0b1000         cc←0b0001         cc←0b0001
                                                                                                                       vxvc_flag←1       vxsnan_flag←1
                                                                                                                                         vxvc_flag←(VE=0)
-NZF       cc←0b0100         cc←C(src1,src2)   cc←0b1000         cc←0b1000         cc←0b1000         cc←0b1000         cc←0b0001         cc←0b0001
                                                                                                                       vxvc_flag←1       vxsnan_flag←1
                                                                                                                                         vxvc_flag←(VE=0)
-Zero      cc←0b0100         cc←0b0100         cc←0b0010         cc←0b0010         cc←0b1000         cc←0b1000         cc←0b0001         cc←0b0001
                                                                                                                       vxvc_flag←1       vxsnan_flag←1
                                                                                                                                         vxvc_flag←(VE=0)
+Zero      cc←0b0100         cc←0b0100         cc←0b0010         cc←0b0010         cc←0b1000         cc←0b1000         cc←0b0001         cc←0b0001
                                                                                                                       vxvc_flag←1       vxsnan_flag←1
                                                                                                                                         vxvc_flag←(VE=0)
+NZF       cc←0b0100         cc←0b0100         cc←0b0100         cc←0b0100         cc←C(src1,src2)   cc←0b1000         cc←0b0001         cc←0b0001
                                                                                                                       vxvc_flag←1       vxsnan_flag←1
                                                                                                                                         vxvc_flag←(VE=0)
+Infinity  cc←0b0100         cc←0b0100         cc←0b0100         cc←0b0100         cc←0b0100         cc←0b0010         cc←0b0001         cc←0b0001
                                                                                                                       vxvc_flag←1       vxsnan_flag←1
                                                                                                                                         vxvc_flag←(VE=0)
QNaN       cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001
           vxvc_flag←1       vxvc_flag←1       vxvc_flag←1       vxvc_flag←1       vxvc_flag←1       vxvc_flag←1       vxvc_flag←1       vxsnan_flag←1
                                                                                                                                         vxvc_flag←(VE=0)
SNaN       cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001         cc←0b0001
           vxsnan_flag←1     vxsnan_flag←1     vxsnan_flag←1     vxsnan_flag←1     vxsnan_flag←1     vxsnan_flag←1     vxsnan_flag←1     vxsnan_flag←1
           vxvc_flag←(VE=0)  vxvc_flag←(VE=0)  vxvc_flag←(VE=0)  vxvc_flag←(VE=0)  vxvc_flag←(VE=0)  vxvc_flag←(VE=0)  vxvc_flag←(VE=0)  vxvc_flag←(VE=0)

Explanation:
src1      The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2      The double-precision floating-point value in doubleword element 0 of VSR[XB].
NZF       Nonzero finite number.
C(x,y)    The floating-point value x is compared to the floating-point value y, returning one of three 4-bit results.
              0b1000 when x is less than y
              0b0100 when x is greater than y
              0b0010 when x is equal to y
cc        The 4-bit result compare code.

Table 48.
Actions for xscmpodp - Part 1: Compare Ordered

VE   vxsnan_flag   vxvc_flag   Returned Results and Status Setting
-    0             0           FPCC←cc, CR[BF]←cc
0    0             1           FPCC←cc, CR[BF]←cc, fx(VXVC)
0    1             0           FPCC←cc, CR[BF]←cc, fx(VXSNAN)
0    1             1           FPCC←cc, CR[BF]←cc, fx(VXSNAN), fx(VXVC)
1    0             1           FPCC←cc, CR[BF]←cc, fx(VXVC), error()
1    1             -           FPCC←cc, CR[BF]←cc, fx(VXSNAN), error()

Explanation:
-        The results do not depend on this condition.
cc       The 4-bit result as defined in Table 48.
fx(x)    FX is set to 1 if x=0. x is set to 1.
error()  The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode.
FX       Floating-Point Summary Exception status flag, FPSCR[FX].
VXSNAN   Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCR[VXSNAN]. See Section 7.4.1.
VXVC     Floating-Point Invalid Operation Exception (Invalid Compare) status flag, FPSCR[VXVC]. See Section 7.4.1.

Table 49. Actions for xscmpodp - Part 2: Result

VSX Scalar Compare Unordered Double-Precision XX3-form

xscmpudp BF,XA,XB (0xF000_0118)

    60   BF   //   A    B    35   AX  BX  /
    0    6    9    11   16   21   29  30  31

    XA ← AX || A
    XB ← BX || B
    reset_xflags()
    src1 ← VSR[XA]{0:63}
    src2 ← VSR[XB]{0:63}
    if( IsSNaN(src1) | IsSNaN(src2) ) then vxsnan_flag ← 1
    FL ← CompareLTDP(src1,src2)
    FG ← CompareGTDP(src1,src2)
    FE ← CompareEQDP(src1,src2)
    FU ← IsNAN(src1) | IsNAN(src2)
    CR[BF] ← FL || FG || FE || FU
    if(vxsnan_flag) then SetFX(VXSNAN)

VSR Data Layout for xscmpudp

    src1 = VSR[XA]    DP          unused
    src2 = VSR[XB]    DP          undefined
                      0           64        127

Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src1 is compared to src2. Zeros of same or opposite signs compare equal. Infinities of same signs compare equal.
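The comparison that forms the 4-bit code FL || FG || FE || FU can be sketched in Python as follows. This is an illustrative model only, not part of the architecture: the function name compare_cc is hypothetical, Python floats stand in for the double-precision operands, and the status flags (vxsnan_flag, FX) are not modeled because a Python float does not reliably distinguish a Signaling NaN from a Quiet NaN.

```python
import math

def compare_cc(src1: float, src2: float) -> int:
    """Return FL || FG || FE || FU as a 4-bit integer (hypothetical helper)."""
    # FU: either operand is a NaN, so the operands are unordered.
    if math.isnan(src1) or math.isnan(src2):
        return 0b0001
    # FL: src1 is less than src2.
    if src1 < src2:
        return 0b1000
    # FG: src1 is greater than src2.
    if src1 > src2:
        return 0b0100
    # FE: equal; this includes zeros of opposite sign and like-signed infinities.
    return 0b0010
```

For example, compare_cc(-0.0, 0.0) yields 0b0010, reflecting that zeros of opposite signs compare equal.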
See Table 50, "Actions for xscmpudp - Part 1: Compare Unordered," on page 350.

The result of the compare is placed into CR field BF and the FPCC.

If either of the operands is a NaN, either quiet or signaling, CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, VXSNAN is set.

See Table 51, "Actions for xscmpudp - Part 2: Result," on page 350.

Special Registers Altered:
    CR[BF] FPCC FX VXSNAN

Each entry is the value assigned to cc. Entries marked [S] also set vxsnan_flag = 1.

                                                 src2
src1        -Infinity   -NZF          -Zero       +Zero       +NZF          +Infinity   QNaN        SNaN
-Infinity   0b0010      0b1000        0b1000      0b1000      0b1000        0b1000      0b0001      0b0001 [S]
-NZF        0b0100      C(src1,src2)  0b1000      0b1000      0b1000        0b1000      0b0001      0b0001 [S]
-Zero       0b0100      0b0100        0b0010      0b0010      0b1000        0b1000      0b0001      0b0001 [S]
+Zero       0b0100      0b0100        0b0010      0b0010      0b1000        0b1000      0b0001      0b0001 [S]
+NZF        0b0100      0b0100        0b0100      0b0100      C(src1,src2)  0b1000      0b0001      0b0001 [S]
+Infinity   0b0100      0b0100        0b0100      0b0100      0b0100        0b0010      0b0001      0b0001 [S]
QNaN        0b0001      0b0001        0b0001      0b0001      0b0001        0b0001      0b0001      0b0001 [S]
SNaN        0b0001 [S]  0b0001 [S]    0b0001 [S]  0b0001 [S]  0b0001 [S]    0b0001 [S]  0b0001 [S]  0b0001 [S]

Explanation:
src1    The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2    The double-precision floating-point value in doubleword element 0 of VSR[XB].
NZF     Nonzero finite number.
C(x,y)  The floating-point value x is compared to the floating-point value y, returning one of three 4-bit results.
          0b1000 when x is less than y
          0b0100 when x is greater than y
          0b0010 when x is equal to y
cc      The 4-bit result compare code.

Table 50. Actions for xscmpudp - Part 1: Compare Unordered

VE   vxsnan_flag   Returned Results and Status Setting
-    0             FPCC←cc, CR[BF]←cc
0    1             FPCC←cc, CR[BF]←cc, fx(VXSNAN)
1    1             FPCC←cc, CR[BF]←cc, fx(VXSNAN), error()

Explanation:
-        The results do not depend on this condition.
cc       The 4-bit result as defined in Table 50.
fx(x)    FX is set to 1 if x=0. x is set to 1.
error()  The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode.
FX       Floating-Point Summary Exception status flag, FPSCR[FX].
VXSNAN   Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCR[VXSNAN]. See Section 7.4.1.

Table 51. Actions for xscmpudp - Part 2: Result

VSX Scalar Copy Sign Double-Precision XX3-form

xscpsgndp XT,XA,XB (0xF000_0580)

    60   T    A    B    176  AX  BX  TX
    0    6    11   16   21   29  30  31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    result{0:63} ← VSR[XA]{0} || VSR[XB]{1:63}
    VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Bit 0 of VSR[XT] is set to the contents of bit 0 of VSR[XA].

Bits 1:63 of VSR[XT] are set to the contents of bits 1:63 of VSR[XB].

The contents of doubleword element 1 of VSR[XT] are undefined.

Special Registers Altered:
    None

VSR Data Layout for xscpsgndp

    src1 = VSR[XA]    DP          unused
    src2 = VSR[XB]    DP          unused
    tgt  = VSR[XT]    DP          undefined
                      0           64        127

Chapter 7.
Vector-Scalar Floating-Point Operations [Category: VSX]

VSX Scalar round Double-Precision to single-precision and Convert to Single-Precision format XX2-form

xscvdpsp XT,XB (0xF000_0424)

    60   T    ///  B    265  BX  TX
    0    6    11   16   21   30  31

    XT ← TX || T
    XB ← BX || B
    reset_xflags()
    src ← VSR[XB]{0:63}
    result{0:31} ← RoundToSP(RN,src)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(xx_flag) then SetFX(XX)
    if(ox_flag) then SetFX(OX)
    if(ux_flag) then SetFX(UX)
    vex_flag ← VE & vxsnan_flag
    if( ~vex_flag ) then do
       VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU_UUUU_UUUU
       FPRF ← ClassSP(result)
       FR ← inc_flag
       FI ← xx_flag
    end
    else do
       FR ← 0b0
       FI ← 0b0
    end

Special Registers Altered:
    FPRF FR FI FX OX UX XX VXSNAN

VSR Data Layout for xscvdpsp

    src = VSR[XB]    DP                      unused
    tgt = VSR[XT]    SP          undefined   undefined
                     0           32          64        127

Programming Note
xscvdpsp can be used to convert the result of a category Floating-Point single-precision operation from 64-bit double-precision format to 32-bit single-precision format for eventual use in VSX vector single-precision operations. The 32-bit single-precision format is not compatible with category Floating-Point single-precision operations.

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

The result is placed into word element 0 of VSR[XT] in single-precision format.

The contents of word elements 1, 2, and 3 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.
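As an illustration of the rounding step, the following Python sketch narrows a double-precision value to single-precision and reports whether the result is inexact. It is a simplified model under stated assumptions, not the architected behavior: the function name round_to_sp_nearest is hypothetical, only the Round to Nearest mode is modeled (the instruction itself uses the mode selected by RN), overflow is mapped to infinity as that mode would, and the FPRF, FR, OX, UX, and VXSNAN effects are not modeled.

```python
import math
import struct

def round_to_sp_nearest(src: float):
    """Return (result, inexact) for a double rounded to single-precision.

    Hypothetical helper; models Round to Nearest only.
    """
    # Infinities and NaNs pass through; NaN signaling is not modeled here.
    if math.isnan(src) or math.isinf(src):
        return src, False
    try:
        # Packing as a 32-bit float performs the narrowing conversion.
        result = struct.unpack(">f", struct.pack(">f", src))[0]
    except OverflowError:
        # A finite value too large for single-precision rounds to infinity.
        result = math.copysign(math.inf, src)
    # The conversion is inexact when the narrowed value differs from src.
    return result, result != src
```

A value such as 1.5 converts exactly, whereas 0.1 does not, so the second element of the returned pair corresponds to the inexact indication (xx_flag) for the modeled cases.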
See Table 47, "Scalar Floating-Point Final Result," on page 345.

VSX Scalar truncate Double-Precision to integer and Convert to Signed Integer Doubleword format with Saturate XX2-form

xscvdpsxds XT,XB (0xF000_0560)

    60   T    ///  B    344  BX  TX
    0    6    11   16   21   30  31

    XT ← TX || T
    XB ← BX || B
    inc_flag ← 0b0
    reset_xflags()
    rnd{0:63} ← RoundToDPIntegerTrunc(VSR[XB]{0:63})
    result{0:63} ← ConvertDPtoSD(rnd)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vxcvi_flag) then SetFX(VXCVI)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vxcvi_flag)
    if( ~vex_flag ) then do
       VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
       FPRF ← 0bUUUUU
       FR ← inc_flag
       FI ← xx_flag
    end
    else do
       FR ← 0b0
       FI ← 0b0
    end

If no trap-enabled invalid operation exception occurs, FR is set to indicate if the result was incremented when rounded, and FI is set to indicate the result is inexact. See Table 52.

Special Registers Altered:
    FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI

VSR Data Layout for xscvdpsxds

    src = VSR[XB]    DP          unused
    tgt = VSR[XT]    SD          undefined
                     0           64        127

Programming Note
xscvdpsxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xsrdpic which uses the rounding mode specified by the RN field in the FPSCR.

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

If src is a NaN, the result is the value 0x8000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.

If the rounded value is greater than 2^63-1, the result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1.

Otherwise, if the rounded value is less than -2^63, the result is 0x8000_0000_0000_0000 and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 64-bit signed-integer format.
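The saturating conversion described above can be sketched in Python as follows. This is a minimal model for illustration, not the architected definition: the function name cvt_dp_to_sxd and the returned tuple layout are hypothetical, Python integers stand in for the 64-bit signed result, and the VXSNAN, FX, and trap-enable (VE) effects are not modeled.

```python
import math

NMIN = -(2**63)     # 0x8000_0000_0000_0000 as a signed doubleword
NMAX = 2**63 - 1    # 0x7FFF_FFFF_FFFF_FFFF

def cvt_dp_to_sxd(src: float):
    """Return (result, vxcvi, inexact) for a truncating, saturating conversion.

    Hypothetical helper; exception enables and SNaN detection are not modeled.
    """
    # A NaN produces the most negative value and an invalid-conversion flag.
    if math.isnan(src):
        return NMIN, True, False
    # Infinities saturate; math.trunc would raise on them, so test first.
    if math.isinf(src):
        return (NMAX if src > 0 else NMIN), True, False
    # Round Toward Zero to an integer value.
    rnd = math.trunc(src)
    if rnd > NMAX:
        return NMAX, True, False
    if rnd < NMIN:
        return NMIN, True, False
    # In range: inexact when truncation discarded a fraction.
    return rnd, False, rnd != src
```

For instance, cvt_dp_to_sxd(2.0**63) saturates to NMAX with the invalid-conversion indication, while cvt_dp_to_sxd(-2.0**63) is exactly representable and returns NMIN with no flags.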
If a trap-enabled invalid operation exception occurs,
  - VSR[XT] and FPRF are not modified
  - FR and FI are set to 0.

Otherwise,
  - The result is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined.
  - FPRF is set to an undefined value.

Inexact? is the condition ( RoundToDPintegerTrunc(src) ≠ src ).

Condition              VE  XE  Inexact?  Returned Results and Status Setting
src ≤ Nmin-1           0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
Nmin-1 < src < Nmin    -   0   yes       T(Nmin), FR←0, FI←1, fx(XX)
                       -   1   yes       T(Nmin), FR←0, FI←1, fx(XX), error()
src = Nmin             -   -   no        T(Nmin), FR←0, FI←0
Nmin < src < Nmax      -   -   no        T(ConvertDPtoSD(RoundToDPintegerTrunc(src))), FR←0, FI←0
                       -   0   yes       T(ConvertDPtoSD(RoundToDPintegerTrunc(src))), FR←0, FI←1, fx(XX)
                       -   1   yes       T(ConvertDPtoSD(RoundToDPintegerTrunc(src))), FR←0, FI←1, fx(XX), error()
src = Nmax             -   -   no        T(Nmax), FR←0, FI←0
                                         Note: This case cannot occur as Nmax is not representable in DP format but is included here for completeness.
Nmax < src < Nmax+1    -   0   yes       T(Nmax), FR←0, FI←1, fx(XX)
                       -   1   yes       T(Nmax), FR←0, FI←1, fx(XX), error()
src ≥ Nmax+1           0   -   -         T(Nmax), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
src is a QNaN          0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
src is a SNaN          0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI), fx(VXSNAN)
                       1   -   -         FR←0, FI←0, fx(VXCVI), fx(VXSNAN), error()

Explanation:
fx(x)    FX is set to 1 if x=0. x is set to 1.
error()  The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode.
Nmin     The smallest signed integer doubleword value, -2^63 (0x8000_0000_0000_0000).
Nmax     The largest signed integer doubleword value, 2^63-1 (0x7FFF_FFFF_FFFF_FFFF).
src      The double-precision floating-point value in doubleword element 0 of VSR[XB].
T(x)     The signed integer doubleword value x is placed in doubleword element 0 of VSR[XT].
The contents of doubleword element 1 of VSR[XT] are undefined.

Table 52. Actions for xscvdpsxds

VSX Scalar truncate Double-Precision to integer and Convert to Signed Integer Word format with Saturate XX2-form

xscvdpsxws XT,XB (0xF000_0160)

    60   T    ///  B    88   BX  TX
    0    6    11   16   21   30  31

    XT ← TX || T
    XB ← BX || B
    inc_flag ← 0b0
    reset_xflags()
    rnd{0:63} ← RoundToDPIntegerTrunc(VSR[XB]{0:63})
    result{0:31} ← ConvertDPtoSW(rnd)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vxcvi_flag) then SetFX(VXCVI)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vxcvi_flag)
    if( ~vex_flag ) then do
       VSR[XT] ← 0xUUUU_UUUU || result || 0xUUUU_UUUU_UUUU_UUUU
       FPRF ← 0bUUUUU
       FR ← inc_flag
       FI ← xx_flag
    end
    else do
       FR ← 0b0
       FI ← 0b0
    end

If no trap-enabled invalid operation exception occurs, FR is set to indicate if the result was incremented when rounded, and FI is set to indicate the result is inexact. See Table 53.

Special Registers Altered:
    FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI

VSR Data Layout for xscvdpsxws

    src = VSR[XB]    DP                      unused
    tgt = VSR[XT]    undefined   SW          undefined
                     0           32          64        127

Programming Note
xscvdpsxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xsrdpic which uses the rounding mode specified by the RN field in the FPSCR.

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

If src is a NaN, the result is the value 0x8000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.

If the rounded value is greater than 2^31-1, the result is 0x7FFF_FFFF and VXCVI is set to 1.

Otherwise, if the rounded value is less than -2^31, the result is 0x8000_0000 and VXCVI is set to 1.
Otherwise, the result is the rounded value converted to 32-bit signed-integer format.

If a trap-enabled invalid operation exception occurs,
  - VSR[XT] and FPRF are not modified
  - FR and FI are set to 0.

Otherwise,
  - The result is placed into word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
  - FPRF is set to an undefined value.

Inexact? is the condition ( RoundToDPintegerTrunc(src) ≠ src ).

Condition              VE  XE  Inexact?  Returned Results and Status Setting
src ≤ Nmin-1           0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
Nmin-1 < src < Nmin    -   0   yes       T(Nmin), FR←0, FI←1, fx(XX)
                       -   1   yes       T(Nmin), FR←0, FI←1, fx(XX), error()
src = Nmin             -   -   no        T(Nmin), FR←0, FI←0
Nmin < src < Nmax      -   -   no        T(ConvertDPtoSW(RoundToDPintegerTrunc(src))), FR←0, FI←0
                       -   0   yes       T(ConvertDPtoSW(RoundToDPintegerTrunc(src))), FR←0, FI←1, fx(XX)
                       -   1   yes       T(ConvertDPtoSW(RoundToDPintegerTrunc(src))), FR←0, FI←1, fx(XX), error()
src = Nmax             -   -   no        T(Nmax), FR←0, FI←0
Nmax < src < Nmax+1    -   0   yes       T(Nmax), FR←0, FI←1, fx(XX)
                       -   1   yes       T(Nmax), FR←0, FI←1, fx(XX), error()
src ≥ Nmax+1           0   -   -         T(Nmax), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
src is a QNaN          0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
src is a SNaN          0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI), fx(VXSNAN)
                       1   -   -         FR←0, FI←0, fx(VXCVI), fx(VXSNAN), error()

Explanation:
fx(x)    FX is set to 1 if x=0. x is set to 1.
error()  The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode.
Nmin     The smallest signed integer word value, -2^31 (0x8000_0000).
Nmax     The largest signed integer word value, 2^31-1 (0x7FFF_FFFF).
src      The double-precision floating-point value in doubleword element 0 of VSR[XB].
T(x)     The signed integer word value x is placed in word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
Table 53. Actions for xscvdpsxws

VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Integer Doubleword format with Saturate XX2-form

xscvdpuxds XT,XB (0xF000_0520)

    60   T    ///  B    328  BX  TX
    0    6    11   16   21   30  31

    XT ← TX || T
    XB ← BX || B
    inc_flag ← 0b0
    reset_xflags()
    rnd{0:63} ← RoundToDPIntegerTrunc(VSR[XB]{0:63})
    result{0:63} ← ConvertDPtoUD(rnd)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vxcvi_flag) then SetFX(VXCVI)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vxcvi_flag)
    if( ~vex_flag ) then do
       VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
       FPRF ← 0bUUUUU
       FR ← inc_flag
       FI ← xx_flag
    end
    else do
       FR ← 0b0
       FI ← 0b0
    end

If no trap-enabled invalid operation exception occurs, FR is set to indicate if the result was incremented when rounded, and FI is set to indicate the result is inexact. See Table 54.

Special Registers Altered:
    FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI

VSR Data Layout for xscvdpuxds

    src = VSR[XB]    DP          unused
    tgt = VSR[XT]    UD          undefined
                     0           64        127

Programming Note
xscvdpuxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xsrdpic which uses the rounding mode specified by the RN field in the FPSCR.

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

If src is a NaN, the result is the value 0x0000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.

If the rounded value is greater than 2^64-1, the result is 0xFFFF_FFFF_FFFF_FFFF and VXCVI is set to 1.

Otherwise, if the rounded value is less than 0, the result is 0x0000_0000_0000_0000 and VXCVI is set to 1.

Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format.
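One consequence of the upper saturation bound is that 2^64-1 itself has no exact double-precision representation, so an in-range source can never convert to exactly 0xFFFF_FFFF_FFFF_FFFF. The following Python sketch illustrates this; it assumes Python 3.9 or later for math.nextafter, and the helper name largest_dp_below is hypothetical.

```python
import math

NMAX_UD = 2**64 - 1  # largest unsigned integer doubleword value

def largest_dp_below(limit: float) -> int:
    """Return, as an exact integer, the largest double strictly below a positive limit.

    Hypothetical helper used only for this illustration.
    """
    return int(math.nextafter(limit, 0.0))

# Converting 2^64-1 to a double rounds up to 2^64 (round to nearest).
rounds_to_two_pow_64 = float(NMAX_UD) == 2.0**64

# Doubles in [2^63, 2^64) are spaced 2^11 apart, so the largest
# double-precision value below 2^64 is 2^64 - 2048.
largest_in_range = largest_dp_below(2.0**64)
```

Under these assumptions, the largest source value that converts without saturation is 2^64-2048, and any source at or above 2^64 saturates with VXCVI set.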
If a trap-enabled invalid operation exception occurs,
  - VSR[XT] and FPRF are not modified
  - FR and FI are set to 0.

Otherwise,
  - The result is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined.
  - FPRF is set to an undefined value.

Inexact? is the condition ( RoundToDPintegerTrunc(src) ≠ src ).

Condition              VE  XE  Inexact?  Returned Results and Status Setting
src ≤ Nmin-1           0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
Nmin-1 < src < Nmin    -   0   yes       T(Nmin), FR←0, FI←1, fx(XX)
                       -   1   yes       T(Nmin), FR←0, FI←1, fx(XX), error()
src = Nmin             -   -   no        T(Nmin), FR←0, FI←0
Nmin < src < Nmax      -   -   no        T(ConvertDPtoUD(RoundToDPintegerTrunc(src))), FR←0, FI←0
                       -   0   yes       T(ConvertDPtoUD(RoundToDPintegerTrunc(src))), FR←0, FI←1, fx(XX)
                       -   1   yes       T(ConvertDPtoUD(RoundToDPintegerTrunc(src))), FR←0, FI←1, fx(XX), error()
src = Nmax             -   -   no        T(Nmax), FR←0, FI←0
                                         Note: This case cannot occur as Nmax is not representable in DP format but is included here for completeness.
Nmax < src < Nmax+1    -   0   yes       T(Nmax), FR←0, FI←1, fx(XX)
                       -   1   yes       T(Nmax), FR←0, FI←1, fx(XX), error()
src ≥ Nmax+1           0   -   -         T(Nmax), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
src is a QNaN          0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
src is a SNaN          0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI), fx(VXSNAN)
                       1   -   -         FR←0, FI←0, fx(VXCVI), fx(VXSNAN), error()

Explanation:
fx(x)    FX is set to 1 if x=0. x is set to 1.
error()  The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode.
Nmin     The smallest unsigned integer doubleword value, 0 (0x0000_0000_0000_0000).
Nmax     The largest unsigned integer doubleword value, 2^64-1 (0xFFFF_FFFF_FFFF_FFFF).
src      The double-precision floating-point value in doubleword element 0 of VSR[XB].
T(x)     The unsigned integer doubleword value x is placed in doubleword element 0 of VSR[XT].
The contents of doubleword element 1 of VSR[XT] are undefined.

Table 54. Actions for xscvdpuxds

VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Integer Word format with Saturate XX2-form

xscvdpuxws XT,XB (0xF000_0120)

    60   T    ///  B    72   BX  TX
    0    6    11   16   21   30  31

    XT ← TX || T
    XB ← BX || B
    inc_flag ← 0b0
    reset_xflags()
    rnd{0:63} ← RoundToDPIntegerTrunc(VSR[XB]{0:63})
    result{0:31} ← ConvertDPtoUW(rnd)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vxcvi_flag) then SetFX(VXCVI)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vxcvi_flag)
    if( ~vex_flag ) then do
       VSR[XT] ← 0xUUUU_UUUU || result || 0xUUUU_UUUU_UUUU_UUUU
       FPRF ← 0bUUUUU
       FR ← inc_flag
       FI ← xx_flag
    end
    else do
       FR ← 0b0
       FI ← 0b0
    end

If no trap-enabled invalid operation exception occurs, FR is set to indicate if the result was incremented when rounded, and FI is set to indicate the result is inexact. See Table 55.

Special Registers Altered:
    FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI

VSR Data Layout for xscvdpuxws

    src = VSR[XB]    DP                      unused
    tgt = VSR[XT]    undefined   UW          undefined
                     0           32          64        127

Programming Note
xscvdpuxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xsrdpic which uses the rounding mode specified by the RN field in the FPSCR.

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

If src is a NaN, the result is the value 0x0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.

Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.

If the rounded value is greater than 2^32-1, the result is 0xFFFF_FFFF and VXCVI is set to 1.

Otherwise, if the rounded value is less than 0, the result is 0x0000_0000 and VXCVI is set to 1.
Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format.

If a trap-enabled invalid operation exception occurs,
  - VSR[XT] and FPRF are not modified
  - FR and FI are set to 0.

Otherwise,
  - The result is placed into word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
  - FPRF is set to an undefined value.

Inexact? is the condition ( RoundToDPintegerTrunc(src) ≠ src ).

Condition              VE  XE  Inexact?  Returned Results and Status Setting
src ≤ Nmin-1           0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
Nmin-1 < src < Nmin    -   0   yes       T(Nmin), FR←0, FI←1, fx(XX)
                       -   1   yes       T(Nmin), FR←0, FI←1, fx(XX), error()
src = Nmin             -   -   no        T(Nmin), FR←0, FI←0
Nmin < src < Nmax      -   -   no        T(ConvertDPtoUW(RoundToDPintegerTrunc(src))), FR←0, FI←0
                       -   0   yes       T(ConvertDPtoUW(RoundToDPintegerTrunc(src))), FR←0, FI←1, fx(XX)
                       -   1   yes       T(ConvertDPtoUW(RoundToDPintegerTrunc(src))), FR←0, FI←1, fx(XX), error()
src = Nmax             -   -   no        T(Nmax), FR←0, FI←0
Nmax < src < Nmax+1    -   0   yes       T(Nmax), FR←0, FI←1, fx(XX)
                       -   1   yes       T(Nmax), FR←0, FI←1, fx(XX), error()
src ≥ Nmax+1           0   -   -         T(Nmax), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
src is a QNaN          0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI)
                       1   -   -         FR←0, FI←0, fx(VXCVI), error()
src is a SNaN          0   -   -         T(Nmin), FR←0, FI←0, fx(VXCVI), fx(VXSNAN)
                       1   -   -         FR←0, FI←0, fx(VXCVI), fx(VXSNAN), error()

Explanation:
fx(x)    FX is set to 1 if x=0. x is set to 1.
error()  The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode.
Nmin     The smallest unsigned integer word value, 0 (0x0000_0000).
Nmax     The largest unsigned integer word value, 2^32-1 (0xFFFF_FFFF).
src      The double-precision floating-point value in doubleword element 0 of VSR[XB].
T(x)     The unsigned integer word value x is placed in word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
Table 55. Actions for xscvdpuxws 360 Power ISATM Book I Version 2.06 VSX Scalar Convert Single-Precision to VSX Scalar Convert and round Signed Integer Double-Precision format XX2-form Doubleword to Double-Precision format XX2-form xscvspdp XT,XB (0xF000_0524) xscvsxddp XT,XB (0xF000_05E0) 60 T /// B 329 BX TX 0 6 11 16 21 30 31 60 T /// B 376 BX TX 0 6 11 16 21 30 31 XT TX || T XB BX || B XT TX || T reset_xflags() XB BX || B result{0:31} ConvertSPtoDP(VSR[XB]{0:31}) reset_xflags() if(vxsnan_flag) then SetFX(VXSNAN) v{0:inf} ConvertSDtoFP(VSR[XB]{0:63}) FR 0b0 result{0:63} RoundToDP(RN,v) FI 0b0 VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU vex_flag VE & vxsnan_flag if(xx_flag) then SetFX(XX) FPRF ClassDP(result) if( ~vex_flag ) then do FR inc_flag VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FI xx_flag FPRF ClassDP(result) end Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the signed integer value in doubleword element 0 of VSR[XB]. Let src be the single-precision floating-point value in word element 0 of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to double-precision src is placed into doubleword element 0 of VSR[XT] in using the rounding mode specified by the double-precision format. Floating-Point Rounding Control field RN of the FPSCR. The contents of doubleword element 1 of VSR[XT] are undefined. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0. The contents of doubleword element 1 of VSR[XT] are undefined. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI FPRF is set to the class and sign of the result. FR is are set to 0. set to indicate if the result was incremented when rounded. 
FI is set to indicate the result is inexact. Special Registers Altered: FPRF FR=0b0 FI=0b0 FX VXSNAN Special Registers Altered: FPRF FR FI FX XX VSR Data Layout for xscvspdp src = VSR[XB] VSR Data Layout for xscvsxddp SP unused unused src = VSR[XB] tgt = VSR[XT] SD unused DP undefined tgt = VSR[XT] 0 32 64 127 DP undefined 0 64 127 Programming Note xscvspdp can be used to convert a single-preci- sion value in single-precision format to double-pre- cision format for use by category Floating-Point scalar single-precision operations. Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 361 Version 2.06 VSX Scalar Convert and round Unsigned Integer Doubleword to Double-Precision format XX2-form xscvuxddp XT,XB (0xF000_05A0) 60 T /// B 360 BX TX 0 6 11 16 21 30 31 XT TX || T XB BX || B reset_xflags() src{0:inf} ConvertUDtoFP(VSR[XB]{0:63}) result{0:63} RoundToDP(RN,src) VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU if(xx_flag) then SetFX(XX) FPRF ClassDP(result) FR inc_flag FI xx_flag Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the unsigned integer value in doubleword element 0 of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. Special Registers Altered: FPRF FR FI FX XX VSR Data Layout for xscvuxddp src = VSR[XB] UD unused tgt = VSR[XT] DP undefined 0 64 127 362 Power ISATM Book I Version 2.06 VSX Scalar Divide Double-Precision XX3-form The result is placed into doubleword element 0 of VSR[XT] in double-precision format. 
xsdivdp XT,XA,XB (0xF000_01C0) The contents of doubleword element 1 of VSR[XT] are 60 T A B 56 AXBX TX undefined. 0 6 11 16 21 29 30 31 FPRF is set to the class and sign of the result. FR is XT TX || T set to indicate if the result was incremented when XA AX || A rounded. FI is set to indicate the result is inexact. XB BX || B reset_xflags() src1 VSR[XA]{0:63} If a trap-enabled invalid operation exception or a src2 VSR[XB]{0:63} trap-enabled zero divide exception occurs, VSR[XT] v{0:inf} DivideFP(src1,src2) and FPRF are not modified, and FR and FI are set to result{0:63} RoundToDP(RN,v) 0. if(vxsnan_flag) then SetFX(VXSNAN) if(vxidi_flag) then SetFX(VXIDI) See Table 47, "Scalar Floating-Point Final Result," on if(vxzdz_flag) then SetFX(VXZDZ) page 345. if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) Special Registers Altered: if(xx_flag) then SetFX(XX) FPRF FR FI FX OX UX ZX XX if(zx_flag) then SetFX(ZX) VXSNAN VXIDI VXZDZ vex_flag VE & (vxsnan_flag | vxidi_flag | vxzdz_flag) zex_flag ZE & zx_flag VSR Data Layout for xsdivdp if( ~vex_flag & ~zex_flag ) then do src1 = VSR[XA] VSR[XT] = result || 0xUUUU_UUUU_UUUU_UUUU FPRF = ClassDP(result) DP unused FR = inc_flag FI = xx_flag src2 = VSR[XB] end DP unused else do FR = 0b0 tgt = VSR[XT] FI = 0b0 end DP undefined 0 64 127 Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src1 is divided1 by src2, producing a quotient having unbounded range and precision. The quotient is normalized2. See Actions for xsdivdp (p. 364). The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, "Floating-Point Intermediate Result Handling," on page 344. 1. 
Floating-point division is based on exponent subtraction and division of the significands.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

Each entry is the value assigned to v. Marked entries also set the flags listed below the table.

                                                          src2
src1        -Infinity     -NZF          -Zero          +Zero          +NZF          +Infinity     QNaN       SNaN
-Infinity   dQNaN [I]     +Infinity     +Infinity      -Infinity      -Infinity     dQNaN [I]     src2       Q(src2) [S]
-NZF        +Zero         D(src1,src2)  +Infinity [Z]  -Infinity [Z]  D(src1,src2)  -Zero         src2       Q(src2) [S]
-Zero       +Zero         +Zero         dQNaN [N]      dQNaN [N]      -Zero         -Zero         src2       Q(src2) [S]
+Zero       -Zero         -Zero         dQNaN [N]      dQNaN [N]      +Zero         +Zero         src2       Q(src2) [S]
+NZF        -Zero         D(src1,src2)  -Infinity [Z]  +Infinity [Z]  D(src1,src2)  +Zero         src2       Q(src2) [S]
+Infinity   dQNaN [I]     -Infinity     -Infinity      +Infinity      +Infinity     dQNaN [I]     src2       Q(src2) [S]
QNaN        src1          src1          src1           src1           src1          src1          src1       src1 [S]
SNaN        Q(src1) [S]   Q(src1) [S]   Q(src1) [S]    Q(src1) [S]    Q(src1) [S]   Q(src1) [S]   Q(src1) [S]  Q(src1) [S]

[I]  vxidi_flag ← 1
[Z]  zx_flag ← 1
[N]  vxzdz_flag ← 1
[S]  vxsnan_flag ← 1

Explanation:
src1    The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2    The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN   Default quiet NaN (0x7FF8_0000_0000_0000).
NZF     Nonzero finite number.
D(x,y)  Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision.
Q(x)    Return a QNaN with the payload of x.
v       The intermediate result having unbounded significand precision and unbounded exponent range.

Table 56.
Actions for xsdivdp 364 Power ISATM Book I Version 2.06 VSX Scalar Multiply-Add Double-Precision ­ Let src3 be the double-precision floating-point XX3-form value in doubleword element 0 of VSR[XT]. xsmaddadp XT,XA,XB (0xF000_0108) src1 is multiplied1 by src3, producing a product having unbounded range and precision. 60 T A B 33 AXBX TX 0 6 11 16 21 29 30 31 See part 1 of Table 57. xsmaddmdp XT,XA,XB (0xF000_0148) src2 is added2 to the product, producing a sum having unbounded range and precision. 60 T A B 41 AXBX TX 0 6 11 16 21 29 30 31 The sum is normalized3. XT TX || T See part 2 of Table 57. XA AX || A XB BX || B The intermediate result is rounded to double-precision reset_xflags() using the rounding mode specified by the src1 VSR[XA]{0:63} src2 "xsmaddadp" ? VSR[XT]{0:63} : VSR[XB]{0:63} Floating-Point Rounding Control field RN of the src3 "xsmaddadp" ? VSR[XB]{0:63} : VSR[XT]{0:63} FPSCR. v{0:inf} MultiplyAddFP(src1,src3,src2) result{0:63} RoundToDP(RN,v) See Table 46, "Floating-Point Intermediate Result if(vxsnan_flag) then SetFX(VXSNAN) Handling," on page 344. if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) The result is placed into doubleword element 0 of if(ox_flag) then SetFX(OX) VSR[XT] in double-precision format. if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) The contents of doubleword element 1 of VSR[XT] are vex_flag VE & (vxsnan_flag | vximz_flag | vxisi_flag) undefined. if( ~vex_flag ) then do FPRF is set to the class and sign of the result. FR is VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU set to indicate if the result was incremented when FPRF ClassDP(result) FR inc_flag rounded. FI is set to indicate the result is inexact. FI xx_flag end If a trap-enabled invalid operation exception occurs, else do VSR[XT] and FPRF are not modified, and FR and FI FR 0b0 are set to 0. FI 0b0 end See Table 47, "Scalar Floating-Point Final Result," on page 345. Let XT be the value TX concatenated with T. 
Let XA be the value AX concatenated with A. Special Registers Altered: Let XB be the value BX concatenated with B. FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. For xsmaddadp, do the following. ­ Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XT]. ­ Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsmaddmdp, do the following. ­ Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. 1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo- nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermedi- ate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted. Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 365 Version 2.06 VSR Data Layout for xsmadd(a|m)dp src1 = VSR[XA] DP unused src2 = xsmaddadp ? VSR[XT] : VSR[XB] DP unused src3 = xsmaddadp ? 
VSR[XB] : VSR[XT] DP unused tgt = VSR[XT] DP undefined 0 64 127 366 Power ISATM Book I Version 2.06 Part 1: src3 Multiply ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN p dQNaN p dQNaN p Q(src3) ­Infinity p +Infinity p +Infinity p ­Infinity p ­Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p Q(src3) ­NZF p +Infinity p M(src1,src3) p +Zero p ­Zero p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) ­Zero p +Zero p +Zero p ­Zero p ­Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Zero p ­Zero p ­Zero p +Zero p +Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 src1 p Q(src3) +NZF p ­Infinity p M(src1,src3) p ­Zero p +Zero p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Infinity p ­Infinity p +Infinity p +Infinity p +Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p src1 QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1 vxsnan_flag 1 p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) SNaN vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 Part 2: src2 Add ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN v dQNaN v Q(src2) ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v src2 vxisi_flag 1 vxsnan_flag 1 v Q(src2) ­NZF v ­Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2 vxsnan_flag 1 v Q(src2) ­Zero v ­Infinity v src2 v ­Zero v Rezd v src2 v +Infinity v src2 vxsnan_flag 1 v Q(src2) +Zero v ­Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2 vxsnan_flag 1 p v Q(src2) +NZF v ­Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2 vxsnan_flag 1 v dQNaN v Q(src2) +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2 vxisi_flag 1 vxsnan_flag 1 QNaN & vp vp vp vp vp vp vp vp src1 is a NaN vxsnan_flag 1 QNaN & v Q(src2) vp vp vp vp vp vp v src2 src1 not a NaN vxsnan_flag 1 Explanation: src1 The double-precision 
floating-point value in doubleword element 0 of VSR[XA].
src2     For xsmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
         For xsmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3     For xsmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
         For xsmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN    Default quiet NaN (0x7FF8_0000_0000_0000).
NZF      Nonzero finite number.
Rezd     Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)     Return a QNaN with the payload of x.
A(x,y)   Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
         Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p        The intermediate product having unbounded range and precision.
v        The intermediate result having unbounded range and precision.

Table 57. Actions for xsmadd(a|m)dp

VSX Scalar Maximum Double-Precision XX3-form

xsmaxdp XT,XA,XB (0xF000_0500)

  | 60 | T  | A  | B  | 160 | AX | BX | TX |
  0    6    11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    reset_xflags()
    src1 ← VSR[XA]{0:63}
    src2 ← VSR[XB]{0:63}
    result{0:63} ← MaximumDP(src1,src2)
    if(vxsnan_flag) then SetFX(VXSNAN)
    vex_flag ← VE & vxsnan_flag
    if( ~vex_flag ) then do
       VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
    end

VSR Data Layout for xsmaxdp
                     0:63   64:127
    src1 = VSR[XA]   DP     unused
    src2 = VSR[XB]   DP     unused
    tgt  = VSR[XT]   DP     undefined

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

If src1 is greater than src2, src1 is placed into doubleword element 0 of VSR[XT] in double-precision format. Otherwise, src2 is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

The maximum of +0 and -0 is +0. The maximum of a QNaN and any value is that value. The maximum of any value and an SNaN is that SNaN converted to a QNaN.

FPRF, FR and FI are not modified.

If a trap-enabled invalid operation exception occurs, VSR[XT] is not modified.

See Table 58.

Special Registers Altered:
    FX VXSNAN

Rows select src1, columns select src2. The second line of a row lists the status action for that cell.

src1 \ src2 -Infinity       -NZF            -Zero           +Zero           +NZF            +Infinity       QNaN            SNaN
-Infinity   T(src1)         T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
-NZF        T(src1)         T(M(src1,src2)) T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
-Zero       T(src1)         T(src1)         T(src1)         T(src2)         T(src2)         T(src2)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
+Zero       T(src1)         T(src1)         T(src1)         T(src1)         T(src2)         T(src2)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
+NZF        T(src1)         T(src1)         T(src1)         T(src1)         T(M(src1,src2)) T(src2)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
+Infinity   T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
QNaN        T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(src1)
                                                                                                                            fx(VXSNAN)
SNaN        T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))
            fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)

Explanation:
src1     The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2     The double-precision floating-point value in doubleword element 0 of VSR[XB].
NZF      Nonzero finite number.
Q(x)     Return a QNaN with the payload of x.
M(x,y)   Return the greater of floating-point value x and floating-point value y.
T(x)     The value x is placed in doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF, FR and FI are not modified.
fx(x)    If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN   Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCR[VXSNAN]. If VE=1, update of VSR[XT] is suppressed.

Table 58. Actions for xsmaxdp

VSX Scalar Minimum Double-Precision XX3-form

xsmindp XT,XA,XB (0xF000_0540)

  | 60 | T  | A  | B  | 168 | AX | BX | TX |
  0    6    11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    reset_xflags()
    src1 ← VSR[XA]{0:63}
    src2 ← VSR[XB]{0:63}
    result{0:63} ← MinimumDP(src1,src2)
    if(vxsnan_flag) then SetFX(VXSNAN)
    vex_flag ← VE & vxsnan_flag
    if( ~vex_flag ) then do
       VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
    end

VSR Data Layout for xsmindp
                     0:63   64:127
    src1 = VSR[XA]   DP     unused
    src2 = VSR[XB]   DP     unused
    tgt  = VSR[XT]   DP     undefined

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

If src1 is less than src2, src1 is placed into doubleword element 0 of VSR[XT] in double-precision format. Otherwise, src2 is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

The minimum of +0 and -0 is -0. The minimum of a QNaN and any value is that value. The minimum of any value and an SNaN is that SNaN converted to a QNaN.

FPRF, FR and FI are not modified.

If a trap-enabled invalid operation exception occurs, VSR[XT] is not modified.

See Table 59.
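The selection rules for xsmaxdp and xsmindp can be sketched on raw 64-bit patterns. The following is a minimal illustration only, assuming the cell-by-cell actions printed in Tables 58 and 59 (including the QNaN-row, SNaN-column cell, which returns src1). The helper names (bits, order_key, xs_minmax_dp) are hypothetical and not part of the architecture, and the FX update performed by SetFX is not modeled.

```python
import struct

SIGN = 1 << 63
MASK64 = (1 << 64) - 1
EXP_MASK = 0x7FF0_0000_0000_0000
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF
QUIET_BIT = 0x0008_0000_0000_0000


def bits(x: float) -> int:
    """Return the 64-bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def is_nan(b: int) -> bool:
    return (b & EXP_MASK) == EXP_MASK and (b & FRAC_MASK) != 0


def is_snan(b: int) -> bool:
    return is_nan(b) and (b & QUIET_BIT) == 0


def order_key(b: int) -> int:
    """Map a non-NaN pattern to an integer that sorts like the value, with -0 < +0."""
    return (~b & MASK64) if (b & SIGN) else (b | SIGN)


def xs_minmax_dp(src1: int, src2: int, want_max: bool, ve: int = 0):
    """Return (doubleword 0 written to VSR[XT], or None if suppressed; vxsnan_flag)."""
    vxsnan = is_snan(src1) or is_snan(src2)
    if is_snan(src1):
        result = src1 | QUIET_BIT          # Q(src1)
    elif is_nan(src1):
        # QNaN row of the tables: src2, unless src2 is itself a NaN, then src1
        result = src1 if is_nan(src2) else src2
    elif is_snan(src2):
        result = src2 | QUIET_BIT          # Q(src2)
    elif is_nan(src2):
        result = src1                      # QNaN column: the other value
    else:
        k1, k2 = order_key(src1), order_key(src2)
        pick_src1 = (k1 > k2) if want_max else (k1 < k2)
        result = src1 if pick_src1 else src2
    if ve and vxsnan:
        return None, True                  # trap-enabled: VSR[XT] is not modified
    return result, vxsnan
```

For example, xs_minmax_dp(bits(0.0), bits(-0.0), want_max=True) selects the +0 pattern, and the same call with want_max=False selects the -0 pattern.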
Special Registers Altered:
    FX VXSNAN

Rows select src1, columns select src2. The second line of a row lists the status action for that cell.

src1 \ src2 -Infinity       -NZF            -Zero           +Zero           +NZF            +Infinity       QNaN            SNaN
-Infinity   T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
-NZF        T(src2)         T(M(src1,src2)) T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
-Zero       T(src2)         T(src2)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
+Zero       T(src2)         T(src2)         T(src2)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
+NZF        T(src2)         T(src2)         T(src2)         T(src2)         T(M(src1,src2)) T(src1)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
+Infinity   T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(src1)         T(Q(src2))
                                                                                                                            fx(VXSNAN)
QNaN        T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(src1)
                                                                                                                            fx(VXSNAN)
SNaN        T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))      T(Q(src1))
            fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)      fx(VXSNAN)

Explanation:
src1     The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2     The double-precision floating-point value in doubleword element 0 of VSR[XB].
NZF      Nonzero finite number.
Q(x)     Return a QNaN with the payload of x.
M(x,y)   Return the lesser of floating-point value x and floating-point value y.
T(x)     The value x is placed in doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF, FR and FI are not modified.
fx(x)    If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN   Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCR[VXSNAN]. If VE=1, update of VSR[XT] is suppressed.

Table 59. Actions for xsmindp

VSX Scalar Multiply-Subtract Double-Precision XX3-form

xsmsubadp XT,XA,XB (0xF000_0188)

  | 60 | T  | A  | B  | 49 | AX | BX | TX |
  0    6    11   16   21   29   30   31

xsmsubmdp XT,XA,XB (0xF000_01C8)

  | 60 | T  | A  | B  | 57 | AX | BX | TX |
  0    6    11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    reset_xflags()
    src1 ← VSR[XA]{0:63}
    src2 ← "xsmsubadp" ? VSR[XT]{0:63} : VSR[XB]{0:63}
    src3 ← "xsmsubadp" ? VSR[XB]{0:63} : VSR[XT]{0:63}
    v{0:inf} ← MultiplyAddDP(src1,src3,NegateDP(src2))
    result{0:63} ← RoundToDP(RN,v)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vximz_flag) then SetFX(VXIMZ)
    if(vxisi_flag) then SetFX(VXISI)
    if(ox_flag) then SetFX(OX)
    if(ux_flag) then SetFX(UX)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vximz_flag | vxisi_flag)
    if( ~vex_flag ) then do
       VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
       FPRF ← ClassDP(result)
       FR ← inc_flag
       FI ← xx_flag
    end
    else do
       FR ← 0b0
       FI ← 0b0
    end

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

For xsmsubadp, do the following.
- Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XT].
- Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

For xsmsubmdp, do the following.
- Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].
- Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XT].

src1 is multiplied¹ by src3, producing a product having unbounded range and precision.

See part 1 of Table 60.

src2 is negated and added² to the product, producing a sum having unbounded range and precision.

The result, having unbounded range and precision, is normalized³.

See part 2 of Table 60.

The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

The result is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

See Table 47, "Scalar Floating-Point Final Result," on page 345.

Special Registers Altered:
    FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

VSR Data Layout for xsmsub(a|m)dp
                                             0:63   64:127
    src1 = VSR[XA]                           DP     unused
    src2 = xsmsubadp ? VSR[XT] : VSR[XB]     DP     unused
    src3 = xsmsubadp ? VSR[XB] : VSR[XT]     DP     unused
    tgt  = VSR[XT]                           DP     undefined
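The operand selection and the single rounding step of xsmsub(a|m)dp can be illustrated with exact rational arithmetic. This is only a sketch under stated assumptions: operands are finite, the rounding mode is round-to-nearest-even, and no status flags, signed-zero rules, or overflow handling are modeled. The function name xsmsub_dp is hypothetical.

```python
from fractions import Fraction


def xsmsub_dp(form: str, xa: float, xb: float, xt: float) -> float:
    """Sketch of src1*src3 - src2 with one rounding, for finite operands only."""
    if form == "a":          # xsmsubadp: src2 = VSR[XT], src3 = VSR[XB]
        src2, src3 = xt, xb
    elif form == "m":        # xsmsubmdp: src2 = VSR[XB], src3 = VSR[XT]
        src2, src3 = xb, xt
    else:
        raise ValueError("form must be 'a' or 'm'")
    # Product and sum are formed exactly (unbounded range and precision) ...
    v = Fraction(xa) * Fraction(src3) - Fraction(src2)
    # ... and rounded to double-precision once (round to nearest even).
    return float(v)


a = 1.0 + 2.0**-30
b = 1.0 + 2.0**-29
fused = xsmsub_dp("a", xa=a, xb=a, xt=b)   # a*a - b with a single rounding
unfused = a * a - b                        # the product is rounded before the subtraction
```

Here the exact product is 1 + 2^-29 + 2^-60, so the single-rounding result is 2^-60, while rounding the product first loses the 2^-60 term and yields 0.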
Vector-Scalar Floating-Point Operations [Category: VSX] 373 Version 2.06 Part 1: src3 Multiply ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN p dQNaN p dQNaN p Q(src3) ­Infinity p +Infinity p +Infinity p ­Infinity p ­Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p Q(src3) ­NZF p +Infinity p M(src1,src3) p +Zero p ­Zero p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) ­Zero p +Zero p +Zero p ­Zero p ­Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Zero p ­Zero p ­Zero p +Zero p +Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 src1 p Q(src3) +NZF p ­Infinity p M(src1,src3) p ­Zero p +Zero p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Infinity p ­Infinity p +Infinity p +Infinity p +Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p src1 QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1 vxsnan_flag 1 p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) SNaN vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 Part 2: src2 Subtract ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN v dQNaN v Q(src2) ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v src2 vxisi_flag 1 vxsnan_flag 1 v Q(src2) ­NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v ­Infinity v src2 vxsnan_flag 1 v Q(src2) ­Zero v +Infinity v ­src2 v Rezd v ­Zero v ­src2 v ­Infinity v src2 vxsnan_flag 1 v Q(src2) +Zero v +Infinity v ­src2 v +Zero v Rezd v ­src2 v ­Infinity v src2 vxsnan_flag 1 p v Q(src2) +NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v ­Infinity v src2 vxsnan_flag 1 v dQNaN v Q(src2) +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2 vxisi_flag 1 vxsnan_flag 1 QNaN & vp vp vp vp vp vp vp vp src1 is a NaN vxsnan_flag 1 QNaN & v Q(src2) vp vp vp vp vp vp v src2 src1 not a NaN vxsnan_flag 1 Explanation: src1 The double-precision floating-point value in 
doubleword element 0 of VSR[XA].
src2     For xsmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
         For xsmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3     For xsmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
         For xsmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN    Default quiet NaN (0x7FF8_0000_0000_0000).
NZF      Nonzero finite number.
Rezd     Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)     Return a QNaN with the payload of x.
S(x,y)   Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
         Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p        The intermediate product having unbounded range and precision.
v        The intermediate result having unbounded range and precision.

Table 60. Actions for xsmsub(a|m)dp

VSX Scalar Multiply Double-Precision XX3-form

xsmuldp XT,XA,XB (0xF000_0180)

  | 60 | T  | A  | B  | 48 | AX | BX | TX |
  0    6    11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    reset_xflags()
    src1 ← VSR[XA]{0:63}
    src2 ← VSR[XB]{0:63}
    v{0:inf} ← MultiplyFP(src1,src2)
    result{0:63} ← RoundToDP(RN,v)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vximz_flag) then SetFX(VXIMZ)
    if(vxisi_flag) then SetFX(VXISI)
    if(ox_flag) then SetFX(OX)
    if(ux_flag) then SetFX(UX)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vximz_flag)
    if( ~vex_flag ) then do
       VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
       FPRF ← ClassDP(result)
       FR ← inc_flag
       FI ← xx_flag
    end
    else do
       FR ← 0b0
       FI ← 0b0
    end

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src1 is multiplied¹ by src2, producing a product having unbounded range and precision.

The product is normalized².

See Table 61.

The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

The result is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

See Table 47, "Scalar Floating-Point Final Result," on page 345.

Special Registers Altered:
    FPRF FR FI FX OX UX XX VXSNAN VXIMZ

VSR Data Layout for xsmuldp
                     0:63   64:127
    src1 = VSR[XA]   DP     unused
    src2 = VSR[XB]   DP     unused
    tgt  = VSR[XT]   DP     undefined

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
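The meaning of FI ("the result is inexact") and FR ("the result was incremented when rounded") for xsmuldp can be illustrated by comparing the rounded result with the exact product. This is a minimal sketch, assuming finite operands, round-to-nearest-even, and a product within the double-precision range. The name xsmuldp_finite is hypothetical, and special operands, signed zeros, and the exception enables are not modeled.

```python
from fractions import Fraction


def xsmuldp_finite(src1: float, src2: float):
    """Return (result, FI, FR) for finite operands under round-to-nearest-even."""
    v = Fraction(src1) * Fraction(src2)      # exact product, unbounded precision
    result = float(v)                        # single rounding to double-precision
    fi = Fraction(result) != v               # inexact: rounding changed the value
    fr = abs(Fraction(result)) > abs(v)      # incremented: magnitude grew when rounded
    return result, fi, fr


exact_case = xsmuldp_finite(1.5, 2.0)
rounded_down = xsmuldp_finite(1.0 + 2.0**-52, 1.0 + 2.0**-52)
rounded_up = xsmuldp_finite(3.0, 1 / 3)
```

In the last case the exact product is 1 - 2^-54, which lies halfway between two representable values and rounds to 1.0, so both FI and FR would be set.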
Rows select src1, columns select src2. The second line of a row lists the flags set for that cell.

src1 \ src2 -Infinity       -NZF            -Zero           +Zero           +NZF            +Infinity       QNaN            SNaN
-Infinity   v←+Infinity     v←+Infinity     v←dQNaN         v←dQNaN         v←-Infinity     v←-Infinity     v←src2          v←Q(src2)
                                            vximz_flag←1    vximz_flag←1                                                    vxsnan_flag←1
-NZF        v←+Infinity     v←M(src1,src2)  v←+Zero         v←-Zero         v←M(src1,src2)  v←-Infinity     v←src2          v←Q(src2)
                                                                                                                            vxsnan_flag←1
-Zero       v←dQNaN         v←+Zero         v←+Zero         v←-Zero         v←-Zero         v←dQNaN         v←src2          v←Q(src2)
            vximz_flag←1                                                                    vximz_flag←1                    vxsnan_flag←1
+Zero       v←dQNaN         v←-Zero         v←-Zero         v←+Zero         v←+Zero         v←dQNaN         v←src2          v←Q(src2)
            vximz_flag←1                                                                    vximz_flag←1                    vxsnan_flag←1
+NZF        v←-Infinity     v←M(src1,src2)  v←-Zero         v←+Zero         v←M(src1,src2)  v←+Infinity     v←src2          v←Q(src2)
                                                                                                                            vxsnan_flag←1
+Infinity   v←-Infinity     v←-Infinity     v←dQNaN         v←dQNaN         v←+Infinity     v←+Infinity     v←src2          v←Q(src2)
                                            vximz_flag←1    vximz_flag←1                                                    vxsnan_flag←1
QNaN        v←src1          v←src1          v←src1          v←src1          v←src1          v←src1          v←src1          v←src1
                                                                                                                            vxsnan_flag←1
SNaN        v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)       v←Q(src1)
            vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1   vxsnan_flag←1

Explanation:
src1     The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2     The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN    Default quiet NaN (0x7FF8_0000_0000_0000).
NZF      Nonzero finite number.
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
Q(x)     Return a QNaN with the payload of x.
v        The intermediate result having unbounded significand precision and unbounded exponent range.

Table 61.
Actions for xsmuldp

VSX Scalar Negative Absolute Value Double-Precision XX2-form

xsnabsdp XT,XB (0xF000_05A4)

  | 60 | T  | /// | B  | 361 | BX | TX |
  0    6    11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    result{0:63} ← 0b1 || VSR[XB]{1:63}
    VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

The contents of doubleword element 0 of VSR[XB], with bit 0 set to 1, is placed into doubleword element 0 of VSR[XT].

The contents of doubleword element 1 of VSR[XT] are undefined.

Special Registers Altered:
    None

VSR Data Layout for xsnabsdp
                    0:63   64:127
    src = VSR[XB]   DP     unused
    tgt = VSR[XT]   DP     undefined

VSX Scalar Negate Double-Precision XX2-form

xsnegdp XT,XB (0xF000_05E4)

  | 60 | T  | /// | B  | 377 | BX | TX |
  0    6    11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    result{0:63} ← ~VSR[XB]{0} || VSR[XB]{1:63}
    VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

The contents of doubleword element 0 of VSR[XB], with bit 0 complemented, is placed into doubleword element 0 of VSR[XT].

The contents of doubleword element 1 of VSR[XT] are undefined.

Special Registers Altered:
    None

VSR Data Layout for xsnegdp
                    0:63   64:127
    src = VSR[XB]   DP     unused
    tgt = VSR[XT]   DP     undefined

VSX Scalar Negative Multiply-Add Double-Precision XX3-form

xsnmaddadp XT,XA,XB (0xF000_0508)

  | 60 | T  | A  | B  | 161 | AX | BX | TX |
  0    6    11   16   21    29   30   31

xsnmaddmdp XT,XA,XB (0xF000_0548)

  | 60 | T  | A  | B  | 169 | AX | BX | TX |
  0    6    11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    reset_xflags()
    src1 ← VSR[XA]{0:63}
    src2 ← "xsnmaddadp" ? VSR[XT]{0:63} : VSR[XB]{0:63}
    src3 ← "xsnmaddadp" ? VSR[XB]{0:63} : VSR[XT]{0:63}
    v{0:inf} ← MultiplyAddDP(src1,src3,src2)
    result{0:63} ← NegateDP(RoundToDP(RN,v))
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vximz_flag) then SetFX(VXIMZ)
    if(vxisi_flag) then SetFX(VXISI)
    if(ox_flag) then SetFX(OX)
    if(ux_flag) then SetFX(UX)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vximz_flag | vxisi_flag)
    if( ~vex_flag ) then do
       VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
       FPRF ← ClassDP(result)
       FR ← inc_flag
       FI ← xx_flag
    end
    else do
       FR ← 0b0
       FI ← 0b0
    end

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

For xsnmaddadp, do the following.
- Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XT].
- Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

For xsnmaddmdp, do the following.
- Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].
- Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XT].

src1 is multiplied¹ by src3, producing a product having unbounded range and precision.

See part 1 of Table 62.

src2 is added² to the product, producing a sum having unbounded range and precision.

The sum is normalized³.

See part 2 of Table 62.

The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

The result is negated and placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

See Table 63, "Scalar Floating-Point Final Result with Negation," on page 381.

Special Registers Altered:
    FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

VSR Data Layout for xsnmadd(a|m)dp
                                              0:63   64:127
    src1 = VSR[XA]                            DP     unused
    src2 = xsnmaddadp ? VSR[XT] : VSR[XB]     DP     unused
    src3 = xsnmaddadp ? VSR[XB] : VSR[XT]     DP     unused
    tgt  = VSR[XT]                            DP     undefined
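The sign manipulations performed by xsnabsdp and xsnegdp, and the negation applied after rounding by xsnmadd(a|m)dp, can be sketched as operations on bit 0 of the doubleword (the most-significant bit in this numbering). This is an illustration only. The helper names are hypothetical, and the multiply-add part assumes finite operands, round-to-nearest-even, and no flag or signed-zero modeling.

```python
import struct
from fractions import Fraction

SIGN = 1 << 63  # bit 0 in big-endian bit numbering is the sign bit


def bits(x: float) -> int:
    """Return the 64-bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def value(b: int) -> float:
    """Return the Python float whose 64-bit pattern is b."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xsnabsdp(src: int) -> int:
    return src | SIGN        # 0b1 || VSR[XB]{1:63}


def xsnegdp(src: int) -> int:
    return src ^ SIGN        # ~VSR[XB]{0} || VSR[XB]{1:63}


def xsnmadd_dp(form: str, xa: float, xb: float, xt: float) -> float:
    """Sketch of -(round(src1*src3 + src2)) for finite operands."""
    # A-form: src2 = VSR[XT], src3 = VSR[XB]; M-form: src2 = VSR[XB], src3 = VSR[XT]
    src2, src3 = (xt, xb) if form == "a" else (xb, xt)
    rounded = float(Fraction(xa) * Fraction(src3) + Fraction(src2))
    return value(bits(rounded) ^ SIGN)   # negate by complementing the sign bit
```

Because both scalar sign instructions only touch the sign bit, they apply equally to NaN patterns and alter no special registers, which matches "Special Registers Altered: None" in their descriptions.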
Vector-Scalar Floating-Point Operations [Category: VSX] 379 Version 2.06 Part 1: src3 Multiply ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN p dQNaN p dQNaN p Q(src3) ­Infinity p +Infinity p +Infinity p ­Infinity p ­Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p Q(src3) ­NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) ­Zero p +Zero p +Zero p ­Zero p ­Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Zero p ­Zero p ­Zero p +Zero p +Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 src1 p Q(src3) +NZF p ­Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Infinity p ­Infinity p +Infinity p +Infinity p +Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p src1 QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1 vxsnan_flag 1 p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) SNaN vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 Part 2: src2 Add ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN v dQNaN v Q(src2) ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v src2 vxisi_flag 1 vxsnan_flag 1 v Q(src2) ­NZF v ­Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2 vxsnan_flag 1 v Q(src2) ­Zero v ­Infinity v src2 v ­Zero v Rezd v src2 v +Infinity v src2 vxsnan_flag 1 v Q(src2) +Zero v ­Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2 vxsnan_flag 1 p v Q(src2) +NZF v ­Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2 vxsnan_flag 1 v dQNaN v Q(src2) +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2 vxisi_flag 1 vxsnan_flag 1 QNaN & vp vp vp vp vp vp vp vp src1 is a NaN vxsnan_flag 1 QNaN & v Q(src2) vp vp vp vp vp vp v src2 src1 not a NaN vxsnan_flag 1 Explanation: src1 The double-precision floating-point value in doubleword 
element 0 of VSR[XA].
src2     For xsnmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
         For xsnmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3     For xsnmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
         For xsnmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN    Default quiet NaN (0x7FF8_0000_0000_0000).
NZF      Nonzero finite number.
Rezd     Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)     Return a QNaN with the payload of x.
A(x,y)   Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
         Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p        The intermediate product having unbounded range and precision.
v        The intermediate result having unbounded range and precision.

Table 62. Actions for xsnmadd(a|m)dp

Column key: vxsnan, vximz, vxisi are vxsnan_flag, vximz_flag, vxisi_flag. r≠v is "Is r inexact? (r ≠ v)". |r|>|v| is "Is r incremented? (|r| > |v|)". q≠v is "Is q inexact? (q ≠ v)". |q|>|v| is "Is q incremented? (|q| > |v|)".

Case     VE OE UE ZE XE vxsnan vximz vxisi r≠v |r|>|v| q≠v |q|>|v| Returned Results and Status Setting
Special  -  -  -  -  -  0      0     0     -   -       -   -       T(N(r)), FPRF←ClassFP(r), FI←0, FR←0
Special  0  -  -  -  -  -      -     1     -   -       -   -       T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXISI)
Special  0  -  -  -  -  0      1     -     -   -       -   -       T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXIMZ)
Special  0  -  -  -  -  1      0     -     -   -       -   -       T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXSNAN)
Special  0  -  -  -  -  1      1     -     -   -       -   -       T(r), FPRF←ClassFP(r), FI←0, FR←0, fx(VXSNAN), fx(VXIMZ)
Special  1  -  -  -  -  -      -     1     -   -       -   -       fx(VXISI), error()
Special  1  -  -  -  -  0      1     -     -   -       -   -       fx(VXIMZ), error()
Special  1  -  -  -  -  1      0     -     -   -       -   -       fx(VXSNAN), error()
Special  1  -  -  -  -  1      1     -     -   -       -   -       fx(VXSNAN), fx(VXIMZ), error()
Normal   -  -  -  -  -  -      -     -     no  -       -   -       T(N(r)), FPRF←ClassFP(N(r)), FI←0, FR←0
Normal   -  -  -  -  0  -      -     -     yes no      -   -       T(N(r)), FPRF←ClassFP(N(r)), FI←1, FR←0, fx(XX)
Normal   -  -  -  -  0  -      -     -     yes yes     -   -       T(N(r)), FPRF←ClassFP(N(r)), FI←1, FR←1, fx(XX)
Normal   -  -  -  -  1  -      -     -     yes no      -   -       T(N(r)), FPRF←ClassFP(N(r)), FI←1, FR←0, fx(XX), error()
Normal   -  -  -  -  1  -      -     -     yes yes     -   -       T(N(r)), FPRF←ClassFP(N(r)), FI←1, FR←1, fx(XX), error()
Overflow -  0  -  -  0  -      -     -     -   -       -   -       T(N(r)), FPRF←ClassFP(N(r)), FI←1, FR←?, fx(OX), fx(XX)
Overflow -  0  -  -  1  -      -     -     -   -       -   -       T(N(r)), FPRF←ClassFP(N(r)), FI←1, FR←?, fx(OX), fx(XX), error()
Overflow -  1  -  -  -  -      -     -     -   -       no  -       T(N(q)÷β), FPRF←ClassFP(N(q)÷β), FI←0, FR←0, fx(OX), error()
Overflow -  1  -  -  -  -      -     -     -   -       yes no      T(N(q)÷β), FPRF←ClassFP(N(q)÷β), FI←1, FR←0, fx(OX), fx(XX), error()
Overflow -  1  -  -  -  -      -     -     -   -       yes yes     T(N(q)÷β), FPRF←ClassFP(N(q)÷β), FI←1, FR←1, fx(OX), fx(XX), error()

Explanation:
-           The results do not depend on this condition.
ClassFP(x)  Classifies the floating-point value x as defined in Table 2, "Floating-Point Result Flags," on page 281.
fx(x)       FX is set to 1 if x=0. x is set to 1.
β           Wrap adjust, where β = 2^1536 for double-precision and β = 2^192 for single-precision.
q           The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, significand rounded to the target precision, unbounded exponent range.
r The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, signficand rounded to the target precision, bounded exponent range. v The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. FI Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky. FR Floating-Point Fraction Rounded status flag, FPSCRFR. OX Floating-Point Overflow Exception status flag, FPSCROX. error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. N(x) The value x is is negated by complementing the sign bit of x. T(x) The value x is placed in element 0 of VSR[XT] in the target precision format. The contents of the remaining element(s) of VSR[XT] are undefined. UX Floating-Point Underflow Exception status flag, FPSCRUX VXSNAN Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. VXIMZ Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCRVXIMZ. VXISI Floating-Point Invalid Operation Exception (Infinity ­ Infinity) status flag, FPSCRVXISI. XX Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI. Table 63. Scalar Floating-Point Final Result with Negation Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 381 Version 2.06 Is r incremented? (|r| > |v|) Is q incremented? (|q| > |v|) Is r inexact? (r g v) Is q inexact? 
(q g v) vxsnan_flag vximz_flag vxisi_flag OE UE VE XE ZE Case Returned Results and Status Setting ­ ­ 0 ­ ­ ­ ­ ­ no ­ ­ ­ T(N(r)), FPRFClassFP(N(r)), FI0, FR0 ­ ­ 0 ­ 0 ­ ­ ­ yes no ­ ­ T(N(r)), FPRFClassFP(N(r)), FI1, FR0, fx(UX), fx(XX) ­ ­ 0 ­ 0 ­ ­ ­ yes yes ­ ­ T(N(r)), FPRFClassFP(N(r)), FI1, FR1, fx(UX), fx(XX) ­ ­ 0 ­ 1 ­ ­ ­ yes no ­ ­ T(N(r)), FPRFClassFP(N(r)), FI1, FR0, fx(UX), fx(XX), error() Tiny ­ ­ 0 ­ 1 ­ ­ ­ yes yes ­ ­ T(N(r)), FPRFClassFP(N(r)), FI1, FR1, fx(UX), fx(XX), error() ­ ­ 1 ­ ­ ­ ­ ­ yes ­ no ­ T(N(q)×), FPRFClassFP(N(q)×), FI0, FR0, fx(UX), error() ­ ­ 1 ­ ­ ­ ­ ­ yes ­ yes no T(N(q)×), FPRFClassFP(N(q)×), FI1, FR0, fx(UX), fx(XX), error() ­ ­ 1 ­ ­ ­ ­ ­ yes ­ yes yes T(N(q)×), FPRFClassFP(N(q)×), FI1, FR1, fx(UX), fx(XX), error() Explanation: ­ The results do not depend on this condition. ClassFP(x) Classifies the floating-point value x as defined in Table 2, "Floating-Point Result Flags," on page 281. fx(x) FX is set to 1 if x=0. x is set to 1. Wrap adjust, where = 21536 for double-precision and = 2192 for single-precision. q The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, signficand rounded to the target precision, unbounded exponent range. r The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, signficand rounded to the target precision, bounded exponent range. v The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. FI Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky. FR Floating-Point Fraction Rounded status flag, FPSCRFR. OX Floating-Point Overflow Exception status flag, FPSCROX. error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. N(x) The value x is is negated by complementing the sign bit of x. 
VSX Scalar Negative Multiply-Subtract Double-Precision XX3-form

xsnmsubadp XT,XA,XB (0xF000_0588)

    Field      | 60 | T | A  | B  | 177 | AX | BX | TX |
    Start bit  | 0  | 6 | 11 | 16 | 21  | 29 | 30 | 31 |

xsnmsubmdp XT,XA,XB (0xF000_05C8)

    Field      | 60 | T | A  | B  | 185 | AX | BX | TX |
    Start bit  | 0  | 6 | 11 | 16 | 21  | 29 | 30 | 31 |

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    reset_xflags()
    src1 ← VSR[XA]{0:63}
    src2 ← VSR[XT]{0:63}
    src3 ← VSR[XB]{0:63}
    src2 ← "xsnmsubadp" ? VSR[XT]{0:63} : VSR[XB]{0:63}
    src3 ← "xsnmsubadp" ? VSR[XB]{0:63} : VSR[XT]{0:63}
    v{0:inf} ← MultiplyAddDP(src1,src3,NegateDP(src2))
    result{0:63} ← NegateDP(RoundToDP(RN,v))
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vximz_flag) then SetFX(VXIMZ)
    if(vxisi_flag) then SetFX(VXISI)
    if(ox_flag) then SetFX(OX)
    if(ux_flag) then SetFX(UX)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vximz_flag | vxisi_flag)
    if( ~vex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassDP(result)
        FR ← inc_flag
        FI ← xx_flag
    end
    else do
        FR ← 0b0
        FI ← 0b0
    end

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

For xsnmsubadp, do the following.
  - Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XT].
  - Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

For xsnmsubmdp, do the following.
  - Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].
  - Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XT].

src1 is multiplied[1] by src3, producing a product having unbounded range and precision.

See part 1 of Table 64.

src2 is negated and added[2] to the product, producing a sum having unbounded range and precision.

The sum is normalized[3].

See part 2 of Table 64.

The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

The result is negated and placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

See Table 63, "Scalar Floating-Point Final Result with Negation," on page 381.

Special Registers Altered:
    FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

VSR Data Layout for xsnmsub(a|m)dp

    src1 = VSR[XA]                             | DP | unused    |
    src2 = xsnmsubadp ? VSR[XT] : VSR[XB]      | DP | unused    |
    src3 = xsnmsubadp ? VSR[XB] : VSR[XT]      | DP | unused    |
    tgt  = VSR[XT]                             | DP | undefined |
    bit                                        0    64        127

Part 1: Multiply (rows: src1, columns: src3)

src1 \ src3 | -Infinity                | -NZF             | -Zero                    | +Zero                    | +NZF             | +Infinity                | QNaN     | SNaN
------------+--------------------------+------------------+--------------------------+--------------------------+------------------+--------------------------+----------+-------------------------------
-Infinity   | p←+Infinity              | p←+Infinity      | p←dQNaN; vximz_flag←1    | p←dQNaN; vximz_flag←1    | p←-Infinity      | p←-Infinity              | p←src3   | p←Q(src3); vxsnan_flag←1
-NZF        | p←+Infinity              | p←M(src1,src3)   | p←+Zero                  | p←-Zero                  | p←M(src1,src3)   | p←-Infinity              | p←src3   | p←Q(src3); vxsnan_flag←1
-Zero       | p←dQNaN; vximz_flag←1    | p←+Zero          | p←+Zero                  | p←-Zero                  | p←-Zero          | p←dQNaN; vximz_flag←1    | p←src3   | p←Q(src3); vxsnan_flag←1
+Zero       | p←dQNaN; vximz_flag←1    | p←-Zero          | p←-Zero                  | p←+Zero                  | p←+Zero          | p←dQNaN; vximz_flag←1    | p←src3   | p←Q(src3); vxsnan_flag←1
+NZF        | p←-Infinity              | p←M(src1,src3)   | p←-Zero                  | p←+Zero                  | p←M(src1,src3)   | p←+Infinity              | p←src3   | p←Q(src3); vxsnan_flag←1
+Infinity   | p←-Infinity              | p←-Infinity      | p←dQNaN; vximz_flag←1    | p←dQNaN; vximz_flag←1    | p←+Infinity      | p←+Infinity              | p←src3   | p←Q(src3); vxsnan_flag←1
QNaN        | p←src1                   | p←src1           | p←src1                   | p←src1                   | p←src1           | p←src1                   | p←src1   | p←src1; vxsnan_flag←1
SNaN        | p←Q(src1); vxsnan_flag←1 in every column

Part 2: Subtract (rows: p, columns: src2)

p \ src2                  | -Infinity                | -NZF           | -Zero        | +Zero        | +NZF           | +Infinity                | QNaN     | SNaN
--------------------------+--------------------------+----------------+--------------+--------------+----------------+--------------------------+----------+-------------------------------
-Infinity                 | v←dQNaN; vxisi_flag←1    | v←-Infinity    | v←-Infinity  | v←-Infinity  | v←-Infinity    | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
-NZF                      | v←+Infinity              | v←S(p,src2)    | v←p          | v←p          | v←S(p,src2)    | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
-Zero                     | v←+Infinity              | v←-src2        | v←Rezd       | v←-Zero      | v←-src2        | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
+Zero                     | v←+Infinity              | v←-src2        | v←+Zero      | v←Rezd       | v←-src2        | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
+NZF                      | v←+Infinity              | v←S(p,src2)    | v←p          | v←p          | v←S(p,src2)    | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
+Infinity                 | v←+Infinity              | v←+Infinity    | v←+Infinity  | v←+Infinity  | v←+Infinity    | v←dQNaN; vxisi_flag←1    | v←src2   | v←Q(src2); vxsnan_flag←1
QNaN & src1 is a NaN      | v←p                      | v←p            | v←p          | v←p          | v←p            | v←p                      | v←p      | v←p; vxsnan_flag←1
QNaN & src1 not a NaN     | v←p                      | v←p            | v←p          | v←p          | v←p            | v←p                      | v←src2   | v←Q(src2); vxsnan_flag←1

Explanation:
src1      The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2      For xsnmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
          For xsnmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
src3      For xsnmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XB].
          For xsnmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT].
dQNaN     Default quiet NaN (0x7FF8_0000_0000_0000).
NZF       Nonzero finite number.
Rezd      Exact-zero-difference result (addition of two finite numbers having the same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)      Return a QNaN with the payload of x.
S(x,y)    Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
          Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)    Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p         The intermediate product having unbounded range and precision.
v         The intermediate result having unbounded range and precision.

Table 64. Actions for xsnmsub(a|m)dp
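The operand selection and the single-rounding behaviour of xsnmsub(a|m)dp can be illustrated with a short Python sketch. It is not part of the architecture definition: the function names are hypothetical, exact rational arithmetic stands in for the "unbounded range and precision" intermediate, only finite operands and the Round to Nearest Even mode are covered, and neither the status flags nor the sign rules for zero results in Table 64 are modelled.

```python
from fractions import Fraction


def operands(mnemonic: str, xt: float, xa: float, xb: float):
    """Map register contents to (src1, src2, src3) for the two forms (hypothetical helper)."""
    if mnemonic == "xsnmsubadp":
        return xa, xt, xb          # src2 comes from VSR[XT], src3 from VSR[XB]
    if mnemonic == "xsnmsubmdp":
        return xa, xb, xt          # src2 comes from VSR[XB], src3 from VSR[XT]
    raise ValueError(mnemonic)


def xsnmsub_dp(src1: float, src2: float, src3: float) -> float:
    """Finite-operand sketch of NegateDP(RoundToDP(RN, src1*src3 - src2)), RN = nearest even."""
    # Fractions are exact, so the product and difference are never rounded here.
    v = Fraction(src1) * Fraction(src3) - Fraction(src2)
    # float(Fraction) performs one correctly rounded conversion (ties to even).
    return -float(v)


# The product 1 + 2^-29 + 2^-60 is not representable, so a separately rounded
# multiply followed by a subtract loses the 2^-60 term; the fused form keeps it.
a = 1.0 + 2.0 ** -30
c = 1.0 + 2.0 ** -29
fused = xsnmsub_dp(a, c, a)
unfused = -(a * a - c)
```

Comparing `fused` with `unfused` shows why the description insists on an unrounded product: the value is rounded exactly once, after the subtraction.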
VSX Scalar Round to Double-Precision Integer using round to Nearest Away XX2-form

xsrdpi XT,XB (0xF000_0124)

    Field      | 60 | T | /// | B  | 73 | BX | TX |
    Start bit  | 0  | 6 | 11  | 16 | 21 | 30 | 31 |

    XT ← TX || T
    XB ← BX || B
    reset_xflags()
    result{0:63} ← RoundToDPIntegerNearAway(VSR[XB]{0:63})
    if(vxsnan_flag) then SetFX(VXSNAN)
    FR ← 0b0
    FI ← 0b0
    vex_flag ← VE & vxsnan_flag
    if( ~vex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassFP(result)
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src is rounded to an integer using the rounding mode Round to Nearest Away.

The result is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

Special Registers Altered:
    FPRF FR=0b0 FI=0b0 FX VXSNAN

VSR Data Layout for xsrdpi

    src = VSR[XB]    | DP | unused    |
    tgt = VSR[XT]    | DP | undefined |
    bit              0    64        127

VSX Scalar Round to Double-Precision Integer exact using Current rounding mode XX2-form

xsrdpic XT,XB (0xF000_01AC)

    Field      | 60 | T | /// | B  | 107 | BX | TX |
    Start bit  | 0  | 6 | 11  | 16 | 21  | 30 | 31 |

    XT ← TX || T
    XB ← BX || B
    reset_xflags()
    src ← VSR[XB]{0:63}
    if(RN=0b00) then result{0:63} ← RoundToDPIntegerNearEven(src)
    if(RN=0b01) then result{0:63} ← RoundToDPIntegerTrunc(src)
    if(RN=0b10) then result{0:63} ← RoundToDPIntegerCeil(src)
    if(RN=0b11) then result{0:63} ← RoundToDPIntegerFloor(src)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & vxsnan_flag
    if( ~vex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassDP(result)
        FR ← inc_flag
        FI ← xx_flag
    end
    else do
        FR ← 0b0
        FI ← 0b0
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src is rounded to an integer using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

The result is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

Special Registers Altered:
    FPRF FR FI FX XX VXSNAN

VSR Data Layout for xsrdpic

    src = VSR[XB]    | DP | unused    |
    tgt = VSR[XT]    | DP | undefined |
    bit              0    64        127
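The RN dispatch of xsrdpic, together with the FI and FR settings, can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference model: the function name and the tuple return shape are invented for the example, only finite inputs are handled (NaNs, infinities, VXSNAN and trap enables are left out), and FR is taken to mean "the magnitude grew when rounding", following the incremented condition |r| > |v| used in the final-result tables.

```python
import math
from fractions import Fraction


def xsrdpic(src: float, rn: int):
    """Return (result, FI, FR) for a finite src; rn is the 2-bit FPSCR[RN] value."""
    exact = Fraction(src)              # exact value of the source operand
    if rn == 0b00:
        n = round(exact)               # Fraction rounding: nearest, ties to even
    elif rn == 0b01:
        n = math.trunc(exact)          # toward zero
    elif rn == 0b10:
        n = math.ceil(exact)           # toward +Infinity
    elif rn == 0b11:
        n = math.floor(exact)          # toward -Infinity
    else:
        raise ValueError("RN is a 2-bit field")
    # Rounding to an integer never flips the sign, so copysign also keeps the
    # sign of a zero result (for example -0.3 rounded toward +Infinity is -0.0).
    result = math.copysign(float(n), src)
    fi = int(n != exact)               # inexact: the rounded value differs
    fr = int(abs(n) > abs(exact))      # incremented: the magnitude increased
    return result, fi, fr
```

For example, `xsrdpic(2.5, 0b00)` rounds to 2.0 with FI=1 and FR=0, while `xsrdpic(2.5, 0b10)` rounds to 3.0 with FI=1 and FR=1.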
VSX Scalar Round to Double-Precision Integer using round toward -Infinity XX2-form

xsrdpim XT,XB (0xF000_01E4)

    Field      | 60 | T | /// | B  | 121 | BX | TX |
    Start bit  | 0  | 6 | 11  | 16 | 21  | 30 | 31 |

    XT ← TX || T
    XB ← BX || B
    reset_xflags()
    result{0:63} ← RoundToDPIntegerFloor(VSR[XB]{0:63})
    if(vxsnan_flag) then SetFX(VXSNAN)
    FR ← 0b0
    FI ← 0b0
    vex_flag ← VE & vxsnan_flag
    if( ~vex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassDP(result)
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src is rounded to an integer using the rounding mode Round toward -Infinity.

The result is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

Special Registers Altered:
    FPRF FR=0b0 FI=0b0 FX VXSNAN

VSR Data Layout for xsrdpim

    src = VSR[XB]    | DP | unused    |
    tgt = VSR[XT]    | DP | undefined |
    bit              0    64        127

VSX Scalar Round to Double-Precision Integer using round toward +Infinity XX2-form

xsrdpip XT,XB (0xF000_01A4)

    Field      | 60 | T | /// | B  | 105 | BX | TX |
    Start bit  | 0  | 6 | 11  | 16 | 21  | 30 | 31 |

    XT ← TX || T
    XB ← BX || B
    reset_xflags()
    result{0:63} ← RoundToDPIntegerCeil(VSR[XB]{0:63})
    if(vxsnan_flag) then SetFX(VXSNAN)
    FR ← 0b0
    FI ← 0b0
    vex_flag ← VE & vxsnan_flag
    if( ~vex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassDP(result)
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src is rounded to an integer using the rounding mode Round toward +Infinity.

The result is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

Special Registers Altered:
    FPRF FR=0b0 FI=0b0 FX VXSNAN

VSR Data Layout for xsrdpip

    src = VSR[XB]    | DP | unused    |
    tgt = VSR[XT]    | DP | undefined |
    bit              0    64        127

VSX Scalar Round to Double-Precision Integer using round toward Zero XX2-form

xsrdpiz XT,XB (0xF000_0164)

    Field      | 60 | T | /// | B  | 89 | BX | TX |
    Start bit  | 0  | 6 | 11  | 16 | 21 | 30 | 31 |

    XT ← TX || T
    XB ← BX || B
    reset_xflags()
    result{0:63} ← RoundToDPIntegerTrunc(VSR[XB]{0:63})
    if(vxsnan_flag) then SetFX(VXSNAN)
    FR ← 0b0
    FI ← 0b0
    vex_flag ← VE & vxsnan_flag
    if( ~vex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassDP(result)
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src is rounded to an integer using the rounding mode Round toward Zero.

The result is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

Special Registers Altered:
    FPRF FR=0b0 FI=0b0 FX VXSNAN

VSR Data Layout for xsrdpiz

    src = VSR[XB]    | DP | unused    |
    tgt = VSR[XT]    | DP | undefined |
    bit              0    64        127
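The four fixed-mode rounding instructions differ only in the rounding function applied, which the following Python sketch makes concrete. The `ROUNDERS` table and helper names are assumptions made for this illustration; only finite inputs are covered, and the status effects are omitted (these forms set FR and FI to 0 and do not report inexact). Round to Nearest Away is computed on the exact value because adding 0.5 in binary floating point can itself round, which would misplace inputs just below a tie.

```python
import math
from fractions import Fraction


def _signed(n: int, src: float) -> float:
    """Convert the integer n to a double carrying the sign of src (keeps -0.0)."""
    return math.copysign(float(n), src)


def round_nearest_away(src: float) -> float:
    """Nearest integer, ties away from zero, evaluated without intermediate rounding."""
    n = math.floor(abs(Fraction(src)) + Fraction(1, 2))
    return _signed(n, src)


# Hypothetical lookup from mnemonic to rounding behaviour.
ROUNDERS = {
    "xsrdpi":  round_nearest_away,                      # Round to Nearest Away
    "xsrdpim": lambda x: _signed(math.floor(x), x),     # Round toward -Infinity
    "xsrdpip": lambda x: _signed(math.ceil(x), x),      # Round toward +Infinity
    "xsrdpiz": lambda x: _signed(math.trunc(x), x),     # Round toward Zero
}
```

With this table, `ROUNDERS["xsrdpi"](2.5)` gives 3.0 and `ROUNDERS["xsrdpi"](-2.5)` gives -3.0, whereas Round to Nearest Even would give 2.0 and -2.0.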
VSX Scalar Reciprocal Estimate Double-Precision XX2-form

xsredp XT,XB (0xF000_0168)

    Field      | 60 | T | /// | B  | 90 | BX | TX |
    Start bit  | 0  | 6 | 11  | 16 | 21 | 30 | 31 |

    XT ← TX || T
    XB ← BX || B
    reset_xflags()
    v{0:inf} ← ReciprocalEstimateDP(VSR[XB]{0:63})
    result{0:63} ← RoundToDP(RN,v)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(ox_flag) then SetFX(OX)
    if(ux_flag) then SetFX(UX)
    if(zx_flag) then SetFX(ZX)
    vex_flag ← VE & vxsnan_flag
    zex_flag ← ZE & zx_flag
    if( ~vex_flag & ~zex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassDP(result)
        FR ← 0bU
        FI ← 0bU
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

A double-precision floating-point estimate of the reciprocal of src is placed into doubleword element 0 of VSR[XT] in double-precision format.

Unless the reciprocal of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of src. That is,

    \left| \frac{\text{estimate} - \frac{1}{\text{src}}}{\frac{1}{\text{src}}} \right| \le \frac{1}{16384}

Operation with various special values of the operand is summarized below.

    Source Value | Result        | Exception
    -------------+---------------+----------
    -Infinity    | -Zero         | None
    -Zero        | -Infinity (1) | ZX
    +Zero        | +Infinity (1) | ZX
    +Infinity    | +Zero         | None
    SNaN         | QNaN (2)      | VXSNAN
    QNaN         | QNaN          | None

    1. No result if ZE=1.
    2. No result if VE=1.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to an undefined value. FI is set to an undefined value.

If a trap-enabled invalid operation exception or a trap-enabled zero divide exception occurs, VSR[XT] and FPRF are not modified.

The results of executing this instruction are permitted to vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FPRF FR=0bU FI=0bU FX OX UX ZX VXSNAN

VSR Data Layout for xsredp

    src = VSR[XB]    | DP | unused    |
    tgt = VSR[XT]    | DP | undefined |
    bit              0    64        127

VSX Scalar Reciprocal Square Root Estimate Double-Precision XX2-form

xsrsqrtedp XT,XB (0xF000_0128)

    Field      | 60 | T | /// | B  | 74 | BX | TX |
    Start bit  | 0  | 6 | 11  | 16 | 21 | 30 | 31 |

    XT ← TX || T
    XB ← BX || B
    reset_xflags()
    v{0:inf} ← ReciprocalSquareRootEstimateDP(VSR[XB]{0:63})
    result{0:63} ← RoundToDP(RN,v)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vxsqrt_flag) then SetFX(VXSQRT)
    if(zx_flag) then SetFX(ZX)
    vex_flag ← VE & (vxsnan_flag | vxsqrt_flag)
    zex_flag ← ZE & zx_flag
    if( ~vex_flag & ~zex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassDP(result)
        FR ← 0bU
        FI ← 0bU
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

A double-precision floating-point estimate of the reciprocal square root of src is placed into doubleword element 0 of VSR[XT] in double-precision format.

Unless the reciprocal of the square root of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of the square root of src. That is,

    \left| \frac{\text{estimate} - \frac{1}{\sqrt{\text{src}}}}{\frac{1}{\sqrt{\text{src}}}} \right| \le \frac{1}{16384}

Operation with various special values of the operand is summarized below.

    Source Value | Result        | Exception
    -------------+---------------+----------
    -Infinity    | QNaN (1)      | VXSQRT
    -Finite      | QNaN (1)      | VXSQRT
    -Zero        | -Infinity (2) | ZX
    +Zero        | +Infinity (2) | ZX
    +Infinity    | +Zero         | None
    SNaN         | QNaN (1)      | VXSNAN
    QNaN         | QNaN          | None

    1. No result if VE=1.
    2. No result if ZE=1.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to an undefined value. FI is set to an undefined value.

If a trap-enabled invalid operation exception or a trap-enabled zero divide exception occurs, VSR[XT] and FPRF are not modified.

The results of executing this instruction are permitted to vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FPRF FR=0bU FI=0bU FX ZX VXSNAN VXSQRT

VSR Data Layout for xsrsqrtedp

    src = VSR[XB]    | DP | unused    |
    tgt = VSR[XT]    | DP | undefined |
    bit              0    64        127

VSX Scalar Square Root Double-Precision XX2-form

xssqrtdp XT,XB (0xF000_012C)

    Field      | 60 | T | /// | B  | 75 | BX | TX |
    Start bit  | 0  | 6 | 11  | 16 | 21 | 30 | 31 |

    XT ← TX || T
    XB ← BX || B
    reset_xflags()
    v{0:inf} ← SquareRootFP(VSR[XB]{0:63})
    result{0:63} ← RoundToDP(RN,v)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vxsqrt_flag) then SetFX(VXSQRT)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vxsqrt_flag)
    if( ~vex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassDP(result)
        FR ← inc_flag
        FI ← xx_flag
    end
    else do
        FR ← 0b0
        FI ← 0b0
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB].

The unbounded-precision square root of src is produced.

See Table 65.

The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

The result is placed into doubleword element 0 of VSR[XT] in double-precision format.

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

See Table 47, "Scalar Floating-Point Final Result," on page 345.

Special Registers Altered:
    FPRF FR FI FX XX VXSNAN VXSQRT

VSR Data Layout for xssqrtdp

    src = VSR[XB]    | DP | unused    |
    tgt = VSR[XT]    | DP | undefined |
    bit              0    64        127

    src        | Action
    -----------+------------------------------
    -Infinity  | v←dQNaN; vxsqrt_flag←1
    -NZF       | v←dQNaN; vxsqrt_flag←1
    -Zero      | v←-Zero
    +Zero      | v←+Zero
    +NZF       | v←SQRT(src)
    +Infinity  | v←+Infinity
    QNaN       | v←src
    SNaN       | v←Q(src); vxsnan_flag←1

Explanation:
src       The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN     Default quiet NaN (0x7FF8_0000_0000_0000).
NZF       Nonzero finite number.
SQRT(x)   The unbounded-precision square root of the floating-point value x.
Q(x)      Return a QNaN with the payload of x.
v         The intermediate result having unbounded significand precision and unbounded exponent range.

Table 65. Actions for xssqrtdp

VSX Scalar Subtract Double-Precision XX3-form

xssubdp XT,XA,XB (0xF000_0140)

    Field      | 60 | T | A  | B  | 40 | AX | BX | TX |
    Start bit  | 0  | 6 | 11 | 16 | 21 | 29 | 30 | 31 |

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    reset_xflags()
    src1 ← VSR[XA]{0:63}
    src2 ← VSR[XB]{0:63}
    v{0:inf} ← AddDP(src1,NegateDP(src2))
    result{0:63} ← RoundToDP(RN,v)
    if(vxsnan_flag) then SetFX(VXSNAN)
    if(vxisi_flag) then SetFX(VXISI)
    if(ox_flag) then SetFX(OX)
    if(ux_flag) then SetFX(UX)
    if(xx_flag) then SetFX(XX)
    vex_flag ← VE & (vxsnan_flag | vxisi_flag)
    if( ~vex_flag ) then do
        VSR[XT] ← result || 0xUUUU_UUUU_UUUU_UUUU
        FPRF ← ClassDP(result)
        FR ← inc_flag
        FI ← xx_flag
    end
    else do
        FR ← 0b0
        FI ← 0b0
    end

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

src2 is negated and added[1] to src1, producing a sum having unbounded range and precision.

See Table 66.

The sum is normalized[2].

The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

The result is placed into doubleword element 0 of VSR[XT].

The contents of doubleword element 1 of VSR[XT] are undefined.

FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact.

If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0.

See Table 47, "Scalar Floating-Point Final Result," on page 345.

Special Registers Altered:
    FPRF FR FI FX OX UX XX VXSNAN VXISI

VSR Data Layout for xssubdp

    src1 = VSR[XA]   | DP | unused    |
    src2 = VSR[XB]   | DP | unused    |
    tgt  = VSR[XT]   | DP | undefined |
    bit              0    64        127

1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

src1 \ src2 | -Infinity                | -NZF              | -Zero        | +Zero        | +NZF              | +Infinity                | QNaN     | SNaN
------------+--------------------------+-------------------+--------------+--------------+-------------------+--------------------------+----------+-------------------------------
-Infinity   | v←dQNaN; vxisi_flag←1    | v←-Infinity       | v←-Infinity  | v←-Infinity  | v←-Infinity       | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
-NZF        | v←+Infinity              | v←S(src1,src2)    | v←src1       | v←src1       | v←S(src1,src2)    | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
-Zero       | v←+Infinity              | v←-src2           | v←Rezd       | v←-Zero      | v←-src2           | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
+Zero       | v←+Infinity              | v←-src2           | v←+Zero      | v←Rezd       | v←-src2           | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
+NZF        | v←+Infinity              | v←S(src1,src2)    | v←src1       | v←src1       | v←S(src1,src2)    | v←-Infinity              | v←src2   | v←Q(src2); vxsnan_flag←1
+Infinity   | v←+Infinity              | v←+Infinity       | v←+Infinity  | v←+Infinity  | v←+Infinity       | v←dQNaN; vxisi_flag←1    | v←src2   | v←Q(src2); vxsnan_flag←1
QNaN        | v←src1                   | v←src1            | v←src1       | v←src1       | v←src1            | v←src1                   | v←src1   | v←src1; vxsnan_flag←1
SNaN        | v←Q(src1); vxsnan_flag←1 in every column

Explanation:
src1      The double-precision floating-point value in doubleword element 0 of VSR[XA].
src2      The double-precision floating-point value in doubleword element 0 of VSR[XB].
dQNaN     Default quiet NaN (0x7FF8_0000_0000_0000).
NZF       Nonzero finite number.
Rezd      Exact-zero-difference result (addition of two finite numbers having the same magnitude but different signs).
S(x,y)    The floating-point value y is negated and then added to the floating-point value x. Return the normalized sum, having unbounded range and precision.
          Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
Q(x)      Return a QNaN with the payload of x.
v         The intermediate result having unbounded significand precision and unbounded exponent range.

Table 66. Actions for xssubdp

VSX Scalar Test for software Divide Double-Precision XX3-form

xstdivdp BF,XA,XB (0xF000_01E8)

    Field      | 60 | BF | // | A  | B  | 61 | AX | BX | /  |
    Start bit  | 0  | 6  | 9  | 11 | 16 | 21 | 29 | 30 | 31 |

    XA ← AX || A
    XB ← BX || B
    src1 ← VSR[XA]{0:63}
    src2 ← VSR[XB]{0:63}
    e_a ← VSR[XA]{1:11} - 1023
    e_b ← VSR[XB]{1:11} - 1023
    fe_flag ← IsNaN(src1) | IsInf(src1) |
              IsNaN(src2) | IsInf(src2) | IsZero(src2) |
              ( e_b <= -1022 ) |
              ( e_b >= 1021 ) |
              ( !IsZero(src1) & ( (e_a - e_b) >= 1023 ) ) |
              ( !IsZero(src1) & ( (e_a - e_b) <= -1021 ) ) |
              ( !IsZero(src1) & ( e_a <= -970 ) )
    fg_flag ← IsInf(src1) | IsInf(src2) |
              IsZero(src2) | IsDen(src2)
    fl_flag ← xsredp_error() <= 2^-14
    CR[BF] ← 0b1 || fg_flag || fe_flag || 0b0

Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA].

Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].

Let e_a be the unbiased exponent of src1.
Let e_b be the unbiased exponent of src2.

fe_flag is set to 1 for any of the following conditions.
  - src1 is a NaN or an infinity.
  - src2 is a zero, a NaN, or an infinity.
  - e_b is less than or equal to -1022.
  - e_b is greater than or equal to 1021.
  - src1 is not a zero and the difference, e_a - e_b, is greater than or equal to 1023.
  - src1 is not a zero and the difference, e_a - e_b, is less than or equal to -1021.
  - src1 is not a zero and e_a is less than or equal to -970.
Otherwise fe_flag is set to 0.

fg_flag is set to 1 for any of the following conditions.
  - src1 is an infinity.
  - src2 is a zero, an infinity, or a denormalized value.
Otherwise fg_flag is set to 0.

CR field BF is set to the value 0b1 || fg_flag || fe_flag || 0b0.

Special Registers Altered:
    CR[BF]

VSR Data Layout for xstdivdp

    src1 = VSR[XA]   | DP | unused    |
    src2 = VSR[XB]   | DP | undefined |
    bit              0    64        127
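The fe_flag and fg_flag conditions of xstdivdp translate almost line for line into Python. The sketch below is illustrative only: the function and helper names are hypothetical, the operands are passed as Python floats rather than VSR contents, and the returned integer is the 4-bit value 0b1 || fg_flag || fe_flag || 0b0 with the leftmost bit as the most significant.

```python
import math
import struct


def _biased_exponent(x: float) -> int:
    """Bits 1:11 of the double-precision image of x (hypothetical helper)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


def xstdivdp(src1: float, src2: float) -> int:
    """Return the value written to CR field BF: 0b1 || fg_flag || fe_flag || 0b0."""
    e_a = _biased_exponent(src1) - 1023
    e_b = _biased_exponent(src2) - 1023
    zero1 = src1 == 0.0
    zero2 = src2 == 0.0
    # A denormalized value has a zero exponent field and a nonzero fraction.
    den2 = (not zero2) and _biased_exponent(src2) == 0

    fe_flag = (
        math.isnan(src1) or math.isinf(src1)
        or math.isnan(src2) or math.isinf(src2) or zero2
        or e_b <= -1022
        or e_b >= 1021
        or (not zero1 and (e_a - e_b) >= 1023)
        or (not zero1 and (e_a - e_b) <= -1021)
        or (not zero1 and e_a <= -970)
    )
    fg_flag = math.isinf(src1) or math.isinf(src2) or zero2 or den2
    return 0b1000 | (int(fg_flag) << 2) | (int(fe_flag) << 1)
```

For ordinary operands such as `xstdivdp(1.0, 2.0)` both flags are clear and the field value is 0b1000; a zero or denormalized divisor sets both flags and gives 0b1110.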
Vector-Scalar Floating-Point Operations [Category: VSX] 395 Version 2.06 VSX Scalar Test for software Square Root Double-Precision XX2-form xstsqrtdp BF,XB (0xF000_01A8) 60 BF // /// B 106 BX / 0 6 9 11 16 21 30 31 XB BX || B src VSR[XB]{0:63} e_b VSR[XB]{1:11} - 1023 fe_flag IsNaN(src) | IsInf(src) | IsZero(src) | IsNeg(src) | ( e_b <= -970 ) fg_flag IsInf(src) | IsZero(src) | IsDen(src) fl_flag xsrsqrtedp_error() <= 2-14 CR[BF] 0b1 || fg_flag || fe_flag || 0b0 Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. Let e_b be the unbiased exponent of src. fe_flag is set to 1 for any of the following conditions. ­ src is a zero, a NaN, an infinity, or a negative value. ­ e_b is less than or equal to -970 Otherwise fe_flag is set to 0. fg_flag is set to 1 for any of the following conditions. ­ src is a zero, an infinity, or a denormalized value. Otherwise fg_flag is set to 0. CR field BF is set to the value 0b1 || fg_flag || fe_flag || 0b0. Special Registers Altered: CR[BF] VSR Data Layout for xstsqrtdp src = VSR[XB] DP unused 0 64 127 396 Power ISATM Book I Version 2.06 VSX Vector Absolute Value Double-Precision VSX Vector Absolute Value Single-Precision XX2-form XX2-form xvabsdp XT,XB (0xF000_0764) xvabssp XT,XB (0xF000_0664) 60 T /// B 473 BX TX 60 T /// B 409 BX TX 0 6 11 16 21 30 31 0 6 11 16 21 30 31 XT TX || T XT TX || T XB BX || B XB BX || B do i=0 to 127 by 64 do i=0 to 127 by 32 VSR[XT]{i:i+63} 0b0 || VSR[XB]{i+1:i+63} VSR[XT]{i:i+31} 0b0 || VSR[XB]{i+1:i+31} end end Let XT be the value TX concatenated with T. Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. For each vector element i from 0 to 3, do the following. 
The contents of doubleword element i of VSR[XB], The contents of word element i of VSR[XB], with with bit 0 set to 0, is placed into doubleword bit 0 set to 0, is placed into word element i of element i of VSR[XT]. VSR[XT]. Special Registers Altered: Special Registers Altered: None None VSR Data Layout for xvabsdp VSR Data Layout for xvabssp src = VSR[XB] src = VSR[XB] DP DP SP SP SP SP tgt = VSR[XT] tgt = VSR[XT] DP DP SP SP SP SP 0 64 127 0 32 64 96 127 Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 397 Version 2.06 VSX Vector Add Double-Precision XX3-form The result is placed into doubleword element i of VSR[XT] in double-precision format. xvadddp XT,XA,XB (0xF000_0300) See Table 68, "Vector Floating-Point Final Result," 60 T A B 96 AXBX TX on page 400. 0 6 11 16 21 29 30 31 If a trap-enabled exception occurs in any element of XT TX || T the vector, no results are written to VSR[XT]. XA AX || A XB BX || B Special Registers Altered: ex_flag 0b0 FX OX UX XX VXSNAN VXISI do i=0 to 127 by 64 reset_xflags() VSR Data Layout for xvadddp src1 VSR[XA]{i:i+63} src1 = VSR[XA] src2 VSR[XB]{i:i+63} v{0:inf} AddDP(src1,src2) DP DP result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) src2 = VSR[XB] if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) DP DP if(ux_flag) then SetFX(UX) tgt = VSR[XT] if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) DP DP ex_flag ex_flag | (VE & vxisi_flag) 0 64 127 ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. 
src2 is added [1] to src1, producing a sum having unbounded range and precision. The sum is normalized [2]. See Table 67. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

[1] Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.

[2] Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
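The per-element behavior described above can be sketched in Python. This is an illustrative model only, not part of the architecture: the helper names (to_bits, to_float, is_nan, is_snan, add_dp_element) and the tuple representation of a VSR are assumptions made for this example. It relies on Python float addition being IEEE double-precision with round-to-nearest-even, so it corresponds only to RN=0b00, and it models only the vxsnan_flag and vxisi_flag conditions of Table 67 (OX, UX, and XX are not modeled).

```python
import math
import struct

DQNAN = 0x7FF8_0000_0000_0000      # default quiet NaN (Table 67)
EXP_MASK = 0x7FF0_0000_0000_0000   # exponent field of a double
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF  # fraction field of a double
QUIET_BIT = 0x0008_0000_0000_0000  # most-significant fraction bit


def to_bits(x):
    """Return the 64-bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def to_float(bits):
    """Return the Python float for a 64-bit pattern (non-NaN inputs only here)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def is_nan(bits):
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_snan(bits):
    return is_nan(bits) and (bits & QUIET_BIT) == 0


def add_dp_element(src1, src2):
    """Hypothetical model of one element: (result_bits, vxsnan_flag, vxisi_flag)."""
    vxsnan = is_snan(src1) or is_snan(src2)
    if is_nan(src1):                       # rows QNaN / SNaN: src1 takes priority
        return src1 | QUIET_BIT, vxsnan, False
    if is_nan(src2):                       # columns QNaN / SNaN
        return src2 | QUIET_BIT, vxsnan, False
    a, b = to_float(src1), to_float(src2)
    if math.isinf(a) and math.isinf(b) and a != b:
        return DQNAN, False, True          # Infinity - Infinity
    return to_bits(a + b), False, False    # round-to-nearest-even only


def xvadddp(xa, xb, ve=0):
    """xa, xb: tuples of two 64-bit patterns. Returns None when a trap-enabled
    invalid-operation exception suppresses the write to VSR[XT]."""
    results, trap = [], False
    for s1, s2 in zip(xa, xb):
        r, vxsnan, vxisi = add_dp_element(s1, s2)
        results.append(r)
        trap = trap or (bool(ve) and (vxsnan or vxisi))
    return None if trap else tuple(results)
```

Working on integer bit patterns rather than Python floats keeps signaling NaN payloads intact, which matters because Q(x) must preserve the payload of x.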
                                                                          src2
src1       -Infinity        -NZF             -Zero            +Zero            +NZF             +Infinity        QNaN             SNaN
-Infinity  v ← -Infinity    v ← -Infinity    v ← -Infinity    v ← -Infinity    v ← -Infinity    v ← dQNaN        v ← src2         v ← Q(src2)
                                                                                                vxisi_flag ← 1                    vxsnan_flag ← 1
-NZF       v ← -Infinity    v ← A(src1,src2) v ← src1         v ← src1         v ← A(src1,src2) v ← +Infinity    v ← src2         v ← Q(src2)
                                                                                                                                  vxsnan_flag ← 1
-Zero      v ← -Infinity    v ← src2         v ← -Zero        v ← Rezd         v ← src2         v ← +Infinity    v ← src2         v ← Q(src2)
                                                                                                                                  vxsnan_flag ← 1
+Zero      v ← -Infinity    v ← src2         v ← Rezd         v ← +Zero        v ← src2         v ← +Infinity    v ← src2         v ← Q(src2)
                                                                                                                                  vxsnan_flag ← 1
+NZF       v ← -Infinity    v ← A(src1,src2) v ← src1         v ← src1         v ← A(src1,src2) v ← +Infinity    v ← src2         v ← Q(src2)
                                                                                                                                  vxsnan_flag ← 1
+Infinity  v ← dQNaN        v ← +Infinity    v ← +Infinity    v ← +Infinity    v ← +Infinity    v ← +Infinity    v ← src2         v ← Q(src2)
           vxisi_flag ← 1                                                                                                         vxsnan_flag ← 1
QNaN       v ← src1         v ← src1         v ← src1         v ← src1         v ← src1         v ← src1         v ← src1         v ← src1
                                                                                                                                  vxsnan_flag ← 1
SNaN       v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)
           vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1

Explanation:
  src1    The double-precision floating-point value in doubleword element i of VSR[XA] (where i ∈ {0,1}).
  src2    The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
  dQNaN   Default quiet NaN (0x7FF8_0000_0000_0000).
  NZF     Nonzero finite number.
  Rezd    Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
  A(x,y)  Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
          Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
  Q(x)    Return a QNaN with the payload of x.
  v       The intermediate result having unbounded significand precision and unbounded exponent range.

Table 67. Actions for xvadddp (element i)

Column key for the table below: VE, OE, UE, ZE, and XE are the exception enable bits; vxsnan, vximz, vxisi, vxidi, vxzdz, vxsqrt, and zx stand for vxsnan_flag, vximz_flag, vxisi_flag, vxidi_flag, vxzdz_flag, vxsqrt_flag, and zx_flag; r≠v is "Is r inexact?"; |r|>|v| is "Is r incremented?"; q≠v is "Is q inexact?"; |q|>|v| is "Is q incremented?".

Case      VE OE UE ZE XE  vxsnan vximz vxisi vxidi vxzdz vxsqrt zx   r≠v  |r|>|v|  q≠v  |q|>|v|  Returned Results and Status Setting
Special   -  -  -  -  -   0      0     0     0     0     0      0    -    -        -    -        T(r)
          -  -  -  0  -   -      -     -     -     -     -      1    -    -        -    -        T(r), fx(ZX)
          -  -  -  1  -   -      -     -     -     -     -      1    -    -        -    -        fx(ZX), error()
          0  -  -  -  -   -      -     -     -     -     1      -    -    -        -    -        T(r), fx(VXSQRT)
          0  -  -  -  -   -      -     -     -     1     -      -    -    -        -    -        T(r), fx(VXZDZ)
          0  -  -  -  -   -      -     -     1     -     -      -    -    -        -    -        T(r), fx(VXIDI)
          0  -  -  -  -   -      -     1     -     -     -      -    -    -        -    -        T(r), fx(VXISI)
          0  -  -  -  -   0      1     -     -     -     -      -    -    -        -    -        T(r), fx(VXIMZ)
          0  -  -  -  -   1      0     -     -     -     -      -    -    -        -    -        T(r), fx(VXSNAN)
          0  -  -  -  -   1      1     -     -     -     -      -    -    -        -    -        T(r), fx(VXSNAN), fx(VXIMZ)
          1  -  -  -  -   -      -     -     -     -     1      -    -    -        -    -        T(r), fx(VXSQRT)
          1  -  -  -  -   -      -     -     -     1     -      -    -    -        -    -        fx(VXZDZ), error()
          1  -  -  -  -   -      -     -     1     -     -      -    -    -        -    -        fx(VXIDI), error()
          1  -  -  -  -   -      -     1     -     -     -      -    -    -        -    -        fx(VXISI), error()
          1  -  -  -  -   0      1     -     -     -     -      -    -    -        -    -        fx(VXIMZ), error()
          1  -  -  -  -   1      0     -     -     -     -      -    -    -        -    -        fx(VXSNAN), error()
          1  -  -  -  -   1      1     -     -     -     -      -    -    -        -    -        fx(VXSNAN), fx(VXIMZ), error()
Normal    -  -  -  -  -   -      -     -     -     -     -      -    no   -        -    -        T(r)
          -  -  -  -  0   -      -     -     -     -     -      -    yes  no       -    -        T(r), fx(XX)
          -  -  -  -  0   -      -     -     -     -     -      -    yes  yes      -    -        T(r), fx(XX)
          -  -  -  -  1   -      -     -     -     -     -      -    yes  no       -    -        T(r), fx(XX), error()
          -  -  -  -  1   -      -     -     -     -     -      -    yes  yes      -    -        T(r), fx(XX), error()

Explanation:
  -       The results do not depend on this condition.
  fx(x)   FX is set to 1 if x=0. x is set to 1.
  q       The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, significand rounded to the target precision, unbounded exponent range.
  r       The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, significand rounded to the target precision, bounded exponent range.
  v       The precise intermediate result defined in the instruction having unbounded significand precision, unbounded exponent range.
  OX      Floating-Point Overflow Exception status flag, FPSCR[OX].
  error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements.
  T(x)    The value x is placed in element i of VSR[XT] in the target precision format (where i ∈ {0,1} for results with 64-bit elements, and i ∈ {0,1,2,3} for results with 32-bit elements).
  UX      Floating-Point Underflow Exception status flag, FPSCR[UX].
  VXSNAN  Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCR[VXSNAN].
  VXSQRT  Floating-Point Invalid Operation Exception (Invalid Square Root) status flag, FPSCR[VXSQRT].
  VXIDI   Floating-Point Invalid Operation Exception (Infinity ÷ Infinity) status flag, FPSCR[VXIDI].
  VXIMZ   Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCR[VXIMZ].
  VXISI   Floating-Point Invalid Operation Exception (Infinity - Infinity) status flag, FPSCR[VXISI].
  VXZDZ   Floating-Point Invalid Operation Exception (Zero ÷ Zero) status flag, FPSCR[VXZDZ].
  XX      Floating-Point Inexact Exception status flag, FPSCR[XX]. The flag is a sticky version of FPSCR[FI]. When FPSCR[FI] is set to a new value, the new value of FPSCR[XX] is set to the result of ORing the old value of FPSCR[XX] with the new value of FPSCR[FI].
  ZX      Floating-Point Zero Divide Exception status flag, FPSCR[ZX].

Table 68. Vector Floating-Point Final Result

Column key for the table below: VE, OE, UE, ZE, and XE are the exception enable bits; vxsnan, vximz, vxisi, vxidi, vxzdz, vxsqrt, and zx stand for vxsnan_flag, vximz_flag, vxisi_flag, vxidi_flag, vxzdz_flag, vxsqrt_flag, and zx_flag; r≠v is "Is r inexact?"; |r|>|v| is "Is r incremented?"; q≠v is "Is q inexact?"; |q|>|v| is "Is q incremented?".

Case      VE OE UE ZE XE  vxsnan vximz vxisi vxidi vxzdz vxsqrt zx   r≠v  |r|>|v|  q≠v  |q|>|v|  Returned Results and Status Setting
Overflow  -  0  -  -  0   -      -     -     -     -     -      -    -    -        -    -        T(r), fx(OX), fx(XX)
          -  0  -  -  1   -      -     -     -     -     -      -    -    -        -    -        T(r), fx(OX), fx(XX), error()
          -  1  -  -  -   -      -     -     -     -     -      -    -    -        no   -        fx(OX), error()
          -  1  -  -  -   -      -     -     -     -     -      -    -    -        yes  no       fx(OX), fx(XX), error()
          -  1  -  -  -   -      -     -     -     -     -      -    -    -        yes  yes      fx(OX), fx(XX), error()
Tiny      -  -  0  -  -   -      -     -     -     -     -      -    no   -        -    -        T(r)
          -  -  0  -  0   -      -     -     -     -     -      -    yes  no       -    -        T(r), fx(UX), fx(XX)
          -  -  0  -  0   -      -     -     -     -     -      -    yes  yes      -    -        T(r), fx(UX), fx(XX)
          -  -  0  -  1   -      -     -     -     -     -      -    yes  no       -    -        T(r), fx(UX), fx(XX), error()
          -  -  0  -  1   -      -     -     -     -     -      -    yes  yes      -    -        T(r), fx(UX), fx(XX), error()
          -  -  1  -  -   -      -     -     -     -     -      -    yes  -        no   -        fx(UX), error()
          -  -  1  -  -   -      -     -     -     -     -      -    yes  -        yes  no       fx(UX), fx(XX), error()
          -  -  1  -  -   -      -     -     -     -     -      -    yes  -        yes  yes      fx(UX), fx(XX), error()

Explanation:
  -       The results do not depend on this condition.
  fx(x)   FX is set to 1 if x=0. x is set to 1.
  q       The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, significand rounded to the target precision, unbounded exponent range.
  r       The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, significand rounded to the target precision, bounded exponent range.
  v       The precise intermediate result defined in the instruction having unbounded significand precision, unbounded exponent range.
  OX      Floating-Point Overflow Exception status flag, FPSCR[OX].
  error() The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements.
  T(x)    The value x is placed in element i of VSR[XT] in the target precision format (where i ∈ {0,1} for results with 64-bit elements, and i ∈ {0,1,2,3} for results with 32-bit elements).
  UX      Floating-Point Underflow Exception status flag, FPSCR[UX].
  VXSNAN  Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCR[VXSNAN].
  VXSQRT  Floating-Point Invalid Operation Exception (Invalid Square Root) status flag, FPSCR[VXSQRT].
  VXIDI   Floating-Point Invalid Operation Exception (Infinity ÷ Infinity) status flag, FPSCR[VXIDI].
  VXIMZ   Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCR[VXIMZ].
  VXISI   Floating-Point Invalid Operation Exception (Infinity - Infinity) status flag, FPSCR[VXISI].
  VXZDZ   Floating-Point Invalid Operation Exception (Zero ÷ Zero) status flag, FPSCR[VXZDZ].
  XX      Floating-Point Inexact Exception status flag, FPSCR[XX]. The flag is a sticky version of FPSCR[FI]. When FPSCR[FI] is set to a new value, the new value of FPSCR[XX] is set to the result of ORing the old value of FPSCR[XX] with the new value of FPSCR[FI].
  ZX      Floating-Point Zero Divide Exception status flag, FPSCR[ZX].

Table 68. Vector Floating-Point Final Result (Continued)

VSX Vector Add Single-Precision XX3-form

xvaddsp XT,XA,XB (0xF000_0200)

  60 | T | A  | B  | 64 | AX | BX | TX
  0    6   11   16   21   29   30   31

XT ← TX || T
XA ← AX || A
XB ← BX || B
ex_flag ← 0b0
do i=0 to 127 by 32
   reset_xflags()
   src1 ← VSR[XA]{i:i+31}
   src2 ← VSR[XB]{i:i+31}
   v{0:inf} ← AddSP(src1,src2)
   result{i:i+31} ← RoundToSP(RN,v)
   if(vxsnan_flag) then SetFX(VXSNAN)
   if(vxisi_flag) then SetFX(VXISI)
   if(ox_flag) then SetFX(OX)
   if(ux_flag) then SetFX(UX)
   if(xx_flag) then SetFX(XX)
   ex_flag ← ex_flag | (VE & vxsnan_flag)
   ex_flag ← ex_flag | (VE & vxisi_flag)
   ex_flag ← ex_flag | (OE & ox_flag)
   ex_flag ← ex_flag | (UE & ux_flag)
   ex_flag ← ex_flag | (XE & xx_flag)
end
if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

   Let src1 be the single-precision floating-point operand in word element i of VSR[XA].

   Let src2 be the single-precision floating-point operand in word element i of VSR[XB].

   src2 is added [1] to src1, producing a sum having unbounded range and precision.

   The sum is normalized [2].

   See Table 69.

   The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

   See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

   The result is placed into word element i of VSR[XT] in single-precision format.

   See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
   FX OX UX XX VXSNAN VXISI

VSR Data Layout for xvaddsp
   src1 = VSR[XA]   SP | SP | SP | SP
   src2 = VSR[XB]   SP | SP | SP | SP
   tgt  = VSR[XT]   SP | SP | SP | SP
   bit positions    0    32   64   96   127

[1] Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.

[2] Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

                                                                          src2
src1       -Infinity        -NZF             -Zero            +Zero            +NZF             +Infinity        QNaN             SNaN
-Infinity  v ← -Infinity    v ← -Infinity    v ← -Infinity    v ← -Infinity    v ← -Infinity    v ← dQNaN        v ← src2         v ← Q(src2)
                                                                                                vxisi_flag ← 1                    vxsnan_flag ← 1
-NZF       v ← -Infinity    v ← A(src1,src2) v ← src1         v ← src1         v ← A(src1,src2) v ← +Infinity    v ← src2         v ← Q(src2)
                                                                                                                                  vxsnan_flag ← 1
-Zero      v ← -Infinity    v ← src2         v ← -Zero        v ← Rezd         v ← src2         v ← +Infinity    v ← src2         v ← Q(src2)
                                                                                                                                  vxsnan_flag ← 1
+Zero      v ← -Infinity    v ← src2         v ← Rezd         v ← +Zero        v ← src2         v ← +Infinity    v ← src2         v ← Q(src2)
                                                                                                                                  vxsnan_flag ← 1
+NZF       v ← -Infinity    v ← A(src1,src2) v ← src1         v ← src1         v ← A(src1,src2) v ← +Infinity    v ← src2         v ← Q(src2)
                                                                                                                                  vxsnan_flag ← 1
+Infinity  v ← dQNaN        v ← +Infinity    v ← +Infinity    v ← +Infinity    v ← +Infinity    v ← +Infinity    v ← src2         v ← Q(src2)
           vxisi_flag ← 1                                                                                                         vxsnan_flag ← 1
QNaN       v ← src1         v ← src1         v ← src1         v ← src1         v ← src1         v ← src1         v ← src1         v ← src1
                                                                                                                                  vxsnan_flag ← 1
SNaN       v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)      v ← Q(src1)
           vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1  vxsnan_flag ← 1

Explanation:
  src1    The single-precision floating-point value in word element i of VSR[XA] (where i ∈ {0,1,2,3}).
  src2    The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
  dQNaN   Default quiet NaN (0x7FC0_0000).
  NZF     Nonzero finite number.
  Rezd    Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
  A(x,y)  Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision.
          Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
  Q(x)    Return a QNaN with the payload of x.
  v       The intermediate result having unbounded significand precision and unbounded exponent range.

Table 69. Actions for xvaddsp (element i) Chapter 7.
Vector-Scalar Floating-Point Operations [Category: VSX] 403 Version 2.06 VSX Vector Compare Equal To Two zero inputs of same or different signs return Double-Precision [ & Record ] XX3-form true for that element. xvcmpeqdp XT,XA,XB (Rc=0) (0xF000_0318) Two infinity inputs of same signs return true for xvcmpeqdp. XT,XA,XB (Rc=1) (0xF000_0718) that element. 60 T A B Rc 99 AXBX TX If Rc=1, CR Field 6 is set as follows. 0 6 11 16 21 22 29 30 31 ­ Bit 0 of CR[6] is set to indicate all vector elements compared true. XT TX || T ­ Bit 1 of CR[6] is set to 0. XA AX || A XB BX || B ­ Bit 2 of CR[6] is set to indicate all vector elements ex_flag 0b0 compared false. all_false 0b1 ­ Bit 3 of CR[6] is set to 0. all_true 0b1 If a trap-enabled exception occurs in any element of do i0 to 127 by 64 the vector, no results are written to VSR[XT] and the reset_xflags() contents of CR[6] are undefined if Rc is equal to 1. src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} Special Registers Altered: vxsnan_flag IsSNaN(src1) | IsSNaN(src2) CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN if( CompareEQDP(src1,src2) ) then result{i:i+63} 0xFFFF_FFFF_FFFF_FFFF all_false 0b0 VSR Data Layout for xvcmpeqdp[.] end src1 = VSR[XA] else do result{i:i+63} 0x0000_0000_0000_0000 DP DP all_true 0b0 end src2 = VSR[XB] if(vxsnan_flag) then SetFX(VXSNAN) DP DP ex_flag ex_flag | (VE & vxsnan_flag) end tgt = VSR[XT] if( ex_flag = 0 ) then VSR[XT] result MD MD 0 64 127 if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src1 is compared to src2. 
The contents of doubleword element i of VSR[XT] are set to all 1s if src1 is equal to src2, and is set to all 0s otherwise. A NaN input causes the comparison to return false for that element. 404 Power ISATM Book I Version 2.06 VSX Vector Compare Equal To Two zero inputs of same or different signs return Single-Precision [ & Record ] XX3-form true for that element. xvcmpeqsp XT,XA,XB (Rc=0) (0xF000_0218) Two infinity inputs of same signs return true for xvcmpeqsp. XT,XA,XB (Rc=1) (0xF000_0618) that element. 60 T A B Rc 67 AXBX TX If Rc=1, CR Field 6 is set as follows. 0 6 11 16 21 22 29 30 31 ­ Bit 0 of CR[6] is set to indicate all vector elements compared true. XT TX || T ­ Bit 1 of CR[6] is set to 0. XA AX || A XB BX || B ­ Bit 2 of CR[6] is set to indicate all vector elements ex_flag 0b0 compared false. all_false 0b1 ­ Bit 3 of CR[6] is set to 0. all_true 0b1 If a trap-enabled exception occurs in any element of do i=0 to 127 by 32 the vector, no results are written to VSR[XT] and the reset_xflags() contents of CR[6] are undefined if Rc is equal to 1. src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} Special Registers Altered: vxsnan_flag IsSNaN(src1) | IsSNaN(src2) CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN if( CompareEQSP(src1,src2) ) then result{i:i+31} 0xFFFF_FFFF all_false 0b0 VSR Data Layout for xvcmpeqsp[.] end src1 = VSR[XA] else do result{i:i+31} 0x0000_0000 SP SP SP SP all_true 0b0 end src2 = VSR[XB] if(vxsnan_flag) then SetFX(VXSNAN) SP SP SP SP ex_flag ex_flag | (VE & vxsnan_flag) end tgt = VSR[XT] if( ex_flag = 0 ) then VSR[XT] result MW MW MW MW 0 32 64 96 127 if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. 
Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src1 is compared to src2. The contents of word element i of VSR[XT] are set to all 1s if src1 is equal to src2, and is set to all 0s otherwise. A NaN input causes the comparison to return false for that element. Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 405 Version 2.06 VSX Vector Compare Greater Than or Equal The contents of doubleword element i of VSR[XT] To Double-Precision [ & Record ] XX3-form are set to all 1s if src1 is greater than or equal to the double-precision floating-point operand in xvcmpgedp XT,XA,XB (Rc=0) (0xF000_0398) doubleword element i of VSR[XB]src2, and is set xvcmpgedp. XT,XA,XB (Rc=1) (0xF000_0798) to all 0s otherwise. 60 T A B Rc 115 AXBX TX A NaN input causes the comparison to return false 0 6 11 16 21 22 29 30 31 for that element. XT TX || T Two zero inputs of same or different signs return XA AX || A XB BX || B true for that element. ex_flag 0b0 all_false 0b1 Two infinity inputs of same signs return true for all_true 0b1 that element. do i=0 to 127 by 64 If Rc=1, CR Field 6 is set as follows. reset_xflags() ­ Bit 0 of CR[6] is set to indicate all vector elements src1 VSR[XA]{i:i+63} compared true. src2 VSR[XB]{i:i+63} ­ Bit 1 of CR[6] is set to 0. ­ Bit 2 of CR[6] is set to indicate all vector elements if( IsSNaN(src1) | IsSNaN(src2) ) then do compared false. vxsnan_flag 0b1 ­ Bit 3 of CR[6] is set to 0. if(VE=0) then vxvc_flag 0b1 end else vxvc_flag IsQNaN(src1) | IsQNaN(src2) If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT] and the if( CompareGEDP(src1,src2) ) then contents of CR[6] are undefined if Rc is equal to 1. result{i:i+63} 0xFFFF_FFFF_FFFF_FFFF all_false 0b0 Special Registers Altered: end CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . 
(if Rc=1) else do FX VXSNAN VXVC result{i:i+63} 0x0000_0000_0000_0000 all_true 0b0 end VSR Data Layout for xvcmpgedp[.] if(vxsnan_flag) then SetFX(VXSNAN) src1 = VSR[XA] if(vxvc_flag) then SetFX(VXVC) ex_flag ex_flag | (VE & vxsnan_flag) DP DP ex_flag ex_flag | (VE & vxvc_flag) src2 = VSR[XB] end DP DP if( ex_flag = 0 ) then VSR[XT] result tgt = VSR[XT] if(Rc=1) then do MD MD if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 0 64 127 else CR[6] 0bUUUU end Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src1 is compared to src2. 406 Power ISATM Book I Version 2.06 VSX Vector Compare Greater Than or Equal The contents of word element i of VSR[XT] are set To Single-Precision [ & record CR6 ] to all 1s if src1 is greater than or equal to src2, XX3-form and is set to all 0s otherwise. xvcmpgesp XT,XA,XB (Rc=0) (0xF000_0298) A NaN input causes the comparison to return false xvcmpgesp. XT,XA,XB (Rc=1) (0xF000_0698) for that element. 60 T A B Rc 83 AXBX TX Two zero inputs of same or different signs return 0 6 11 16 21 22 29 30 31 true for that element. XT TX || T Two infinity inputs of same signs return true for XA AX || A XB BX || B that element. ex_flag 0b0 all_false 0b1 If Rc=1, CR Field 6 is set as follows. all_true 0b1 ­ Bit 0 of CR[6] is set to indicate all vector elements compared true. do i=0 to 127 by 32 ­ Bit 1 of CR[6] is set to 0. reset_xflags() ­ Bit 2 of CR[6] is set to indicate all vector elements src1 VSR[XA]{i:i+31} compared false. src2 VSR[XB]{i:i+31} ­ Bit 3 of CR[6] is set to 0. 
if( IsSNaN(src1) | IsSNaN(src2) ) then do If a trap-enabled exception occurs in any element of vxsnan_flag 0b1 the vector, no results are written to VSR[XT] and the if(VE=0) then vxvc_flag 0b1 end contents of CR[6] are undefined if Rc is equal to 1. else vxvc_flag IsQNaN(src1) | IsQNaN(src2) Special Registers Altered: if( CompareGESP(src1,src2) ) then CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) result{i:i+31} 0xFFFF_FFFF FX VXSNAN VXVC all_false 0b0 end VSR Data Layout for xvcmpgesp[.] else do result{i:i+31} 0x0000_0000 src1 = VSR[XA] all_true 0b0 SP SP SP SP end if(vxsnan_flag) then SetFX(VXSNAN) src2 = VSR[XB] if(vxvc_flag) then SetFX(VXVC) ex_flag ex_flag | (VE & vxsnan_flag) SP SP SP SP ex_flag ex_flag | (VE & vxvc_flag) end tgt = VSR[XT] MW MW MW MW if( ex_flag = 0 ) then VSR[XT] result 0 32 64 96 127 if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src1 is compared to src2. Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 407 Version 2.06 VSX Vector Compare Greater Than The contents of doubleword element i of VSR[XT] Double-Precision [ & record CR6 ] XX3-form are set to all 1s if src1 is greater than src2, and is set to all 0s otherwise. xvcmpgtdp XT,XA,XB (Rc=0) (0xF000_0358) xvcmpgtdp. XT,XA,XB (Rc=1) (0xF000_0758) A NaN input causes the comparison to return false for that element. 60 T A B Rc 107 AXBX TX 0 6 11 16 21 22 29 30 31 Two zero inputs of same or different signs return false for that element. XT TX || T XA AX || A If Rc=1, CR Field 6 is set as follows. 
XB BX || B ex_flag 0b0 ­ Bit 0 of CR[6] is set to indicate all vector elements all_false 0b1 compared true. all_true 0b1 ­ Bit 1 of CR[6] is set to 0. ­ Bit 2 of CR[6] is set to indicate all vector elements do i=0 to 127 by 64 compared false. reset_xflags() ­ Bit 3 of CR[6] is set to 0. src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT] and the if( IsSNaN(src1) | IsSNaN(src2) ) then do contents of CR[6] are undefined if Rc is equal to 1. vxsnan_flag 0b1 if(VE=0) then vxvc_flag 0b1 Special Registers Altered: end else vxvc_flag IsQNaN(src1) | IsQNaN(src2) CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN VXVC if( CompareGTDP(src1,src2) ) then do result{i:i+63} 0xFFFF_FFFF_FFFF_FFFF VSR Data Layout for xvcmpgtdp[.] all_false 0b0 end src1 = VSR[XA] else do DP DP result{i:i+63} 0x0000_0000_0000_0000 all_true 0b0 src2 = VSR[XB] end if(vxsnan_flag) then SetFX(VXSNAN) DP DP if(vxvc_flag) then SetFX(VXVC) tgt = VSR[XT] ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxvc_flag) MD MD end 0 64 127 if( ex_flag = 0 ) then VSR[XT] result if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src1 is compared to src2. 408 Power ISATM Book I Version 2.06 VSX Vector Compare Greater Than The contents of word element i of VSR[XT] are set Single-Precision [ & record CR6 ] XX3-form to all 1s if src1 is greater than src2, and is set to all 0s otherwise. xvcmpgtsp XT,XA,XB (Rc=0) (0xF000_0258) xvcmpgtsp. 
XT,XA,XB (Rc=1) (0xF000_0658) A NaN input causes the comparison to return false for that element. 60 T A B Rc 75 AXBX TX 0 6 11 16 21 22 29 30 31 Two zero inputs of same or different signs return false for that element. XT TX || T XA AX || A If Rc=1, CR Field 6 is set as follows. XB BX || B ex_flag 0b0 ­ Bit 0 of CR[6] is set to indicate all vector elements all_false 0b1 compared true. all_true 0b1 ­ Bit 1 of CR[6] is set to 0. ­ Bit 2 of CR[6] is set to indicate all vector elements do i=0 to 127 by 32 compared false. reset_xflags() ­ Bit 3 of CR[6] is set to 0. src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT] and the if( IsSNaN(src1) | IsSNaN(src2) ) then do contents of CR[6] are undefined if Rc is equal to 1. vxsnan_flag 0b1 if(VE=0) then vxvc_flag 0b1 Special Registers Altered: end else vxvc_flag IsQNaN(src1) | IsQNaN(src2) CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN VXVC if( CompareGTSP(src1,src2) ) then do result{i:i+31} 0xFFFF_FFFF VSR Data Layout for xvcmpgtsp[.] all_false 0b0 end src1 = VSR[XA] else do SP SP SP SP result{i:i+31} 0x0000_0000 all_true 0b0 src2 = VSR[XB] end if(vxsnan_flag) then SetFX(VXSNAN) SP SP SP SP if(vxvc_flag) then SetFX(VXVC) tgt = VSR[XT] ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxvc_flag) MW MW MW MW end 0 32 64 96 127 if( ex_flag = 0 ) then VSR[XT] result if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src1 is compared to src2. Chapter 7. 
Vector-Scalar Floating-Point Operations [Category: VSX] 409 Version 2.06 VSX Vector Copy Sign Double-Precision VSX Vector Copy Sign Single-Precision XX3-form XX3-form xvcpsgndp XT,XA,XB (0xF000_0780) xvcpsgnsp XT,XA,XB (0xF000_0680) 60 T A B 240 AXBX TX 60 T A B 208 AXBX TX 0 6 11 16 21 29 30 31 0 6 11 16 21 29 30 31 XT TX || T XT TX || T XA AX || A XA AX || A XB BX || B XB BX || B do i=0 to 127 by 64 do i=0 to 127 by 32 VSR[XT]{i:i+63} VSR[XA]{i} || VSR[XB]{i+1:i+63} VSR[XT]{i:i+31} VSR[XA]{i} || VSR[XB]{i+1:i+31} end end Let XT be the value TX concatenated with T. Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. For each vector element i from 0 to 3, do the following. The contents of bit 0 of doubleword element i of The contents of bit 0 of word element i of VSR[XA] VSR[XA] are concatenated with the contents of are concatenated with the contents of bits 1:31 of bits 1:63 of doubleword element i of VSR[XB] and word element i of VSR[XB] and placed into word placed into doubleword element i of VSR[XT]. element i of VSR[XT]. 
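The copy-sign operation described above is pure bit manipulation, which a short Python sketch can make concrete. The function names mirror the mnemonics for readability only; representing a VSR as a tuple of unsigned integers (two 64-bit or four 32-bit elements, element 0 first) and the pack_dp/unpack_dp helpers are assumptions of this example, not something the architecture defines.

```python
import struct

SIGN64, MAG64 = 1 << 63, (1 << 63) - 1   # bit 0 and bits 1:63 of a doubleword
SIGN32, MAG32 = 1 << 31, (1 << 31) - 1   # bit 0 and bits 1:31 of a word


def pack_dp(values):
    """Hypothetical helper: floats -> tuple of 64-bit element patterns."""
    return tuple(struct.unpack(">Q", struct.pack(">d", v))[0] for v in values)


def unpack_dp(elems):
    """Hypothetical helper: tuple of 64-bit element patterns -> floats."""
    return tuple(struct.unpack(">d", struct.pack(">Q", e))[0] for e in elems)


def xvcpsgndp(xa, xb):
    """Sign bit from each element of xa, remaining 63 bits from xb."""
    return tuple((a & SIGN64) | (b & MAG64) for a, b in zip(xa, xb))


def xvcpsgnsp(xa, xb):
    """Sign bit from each word of xa, remaining 31 bits from xb."""
    return tuple((a & SIGN32) | (b & MAG32) for a, b in zip(xa, xb))


xa = pack_dp((-1.0, 2.0))
xb = pack_dp((3.5, -4.5))
signed = unpack_dp(xvcpsgndp(xa, xb))   # magnitudes of xb with the signs of xa
moved = xvcpsgndp(xb, xb)               # same source twice: a plain register copy
```

Because no floating-point arithmetic is performed, NaN payloads and denormalized values pass through unchanged and no status flags are involved. Passing the same register as both sources returns that register unchanged, which is why this instruction can double as a vector move.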
Special Registers Altered: Special Registers Altered: None None Extended Mnemonic Equivalent To Extended Mnemonic Equivalent To xvmovdp XT,XB xvcpsgndp XT,XB,XB xvmovsp XT,XB xvcpsgnsp XT,XB,XB VSR Data Layout for xvcpsgndp VSR Data Layout for xvcpsgnsp src1 = VSR[XA] src1 = VSR[XA] DP DP SP SP SP SP src2 = VSR[XB] src2 = VSR[XB] DP DP SP SP SP SP tgt = VSR[XT] tgt = VSR[XT] DP DP SP SP SP SP 0 64 127 0 32 64 96 127 410 Power ISATM Book I Version 2.06 VSX Vector round Double-Precision to single-precision and Convert to Single-Precision format XX2-form xvcvdpsp XT,XB (0xF000_0624) 60 T /// B 393 BX TX 0 6 11 16 21 30 31 XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 64 reset_xflags() src VSR[XB]{i:i+63} result{i:i+31} RoundToSP(RN,src) result{i+32:i+63} 0xUUUU_UUUU if(vxsnan_flag) then SetFX(VXSNAN) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. src is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into bits 0:31 of doubleword element i of VSR[XT] in single-precision format. The contents of bits 32:63 of doubleword element i of VSR[XT] are undefined. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VSR Data Layout for xvcvdpsp src = VSR[XB] DP DP tgt = VSR[XT] SP undefined SP undefined 0 32 64 96 127 Chapter 7. 
VSX Vector truncate Double-Precision to integer and Convert to Signed Integer Doubleword format with Saturate XX2-form

xvcvdpsxds XT,XB (0xF000_0760)

    60   T   ///   B    472   BX   TX
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       rnd{0:63} ← RoundToDPIntegerTrunc(VSR[XB]{i:i+63})
       result{i:i+63} ← ConvertDPtoSD(rnd)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxcvi_flag) then SetFX(VXCVI)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxcvi_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    If src is a NaN, the result is the value 0x8000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.
    Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.
    If the rounded value is greater than 2^63-1, the result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1.
    Otherwise, if the rounded value is less than -2^63, the result is 0x8000_0000_0000_0000 and VXCVI is set to 1.
    Otherwise, the result is the rounded value converted to 64-bit signed-integer format.
    The result is placed into doubleword element i of VSR[XT].
    See Table 70.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXCVI

VSR Data Layout for xvcvdpsxds
    src = VSR[XB]:    DP, DP
    tgt = VSR[XT]:    SD, SD
    bit boundaries:   0, 64, 127

Programming Note
    xvcvdpsxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xvrdpic which uses the rounding mode specified by the RN field in the FPSCR.

Inexact? = ( RoundToDPintegerTrunc(src) ≠ src )

    src                   VE  XE  Inexact?  Returned Results and Status Setting
    src ≤ Nmin-1          0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    Nmin-1 < src < Nmin   -   0   yes       T(Nmin), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmin            -   -   no        T(Nmin)
    Nmin < src < Nmax     -   -   no        T(ConvertDPtoSD(RoundToDPintegerTrunc(src)))
                          -   0   yes       T(ConvertDPtoSD(RoundToDPintegerTrunc(src))), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmax            -   -   no        T(Nmax)
    Nmax < src < Nmax+1   -   0   yes       T(Nmax), fx(XX)
                          -   1   yes       fx(XX), error()
    src ≥ Nmax+1          0   -   -         T(Nmax), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a QNaN         0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a SNaN         0   -   -         T(Nmin), fx(VXCVI), fx(VXSNAN)
                          1   -   -         fx(VXCVI), fx(VXSNAN), error()

    Note (src = Nmax): This case cannot occur as Nmax is not representable in DP format but is included here for completeness.

Explanation:
    fx(x)     FX is set to 1 if x=0. x is set to 1.
    error()   The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed.
    Nmin      The smallest signed integer doubleword value, -2^63 (0x8000_0000_0000_0000).
    Nmax      The largest signed integer doubleword value, 2^63-1 (0x7FFF_FFFF_FFFF_FFFF).
    src       The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
    T(x)      The signed integer doubleword value x is placed in doubleword element i of VSR[XT] (where i ∈ {0,1}).

Table 70. Actions for xvcvdpsxds
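The rows of Table 70 can be read as a small decision procedure. The Python sketch below is an illustrative model only. Its assumptions: the function names are hypothetical, status bits are represented as a set of strings, FX is not modelled, Python floats cannot distinguish QNaN from SNaN so VXSNAN is never reported, and a suppressed register update is represented by returning None.

```python
import math

NMIN, NMAX = -(2**63), 2**63 - 1

def cvt_dp_to_sd(src: float):
    """One element: returns (integer result, set of status bits raised)."""
    if math.isnan(src):
        return NMIN, {"VXCVI"}          # VXSNAN not modelled (see lead-in)
    if math.isinf(src):
        return (NMAX if src > 0 else NMIN), {"VXCVI"}
    rnd = math.trunc(src)               # Round toward Zero
    if rnd > NMAX:
        return NMAX, {"VXCVI"}
    if rnd < NMIN:
        return NMIN, {"VXCVI"}
    return rnd, ({"XX"} if rnd != src else set())

def xvcvdpsxds(xb, ve=0, xe=0):
    """Returns (results or None if the update is suppressed, status bits)."""
    results, status, ex_flag = [], set(), False
    for src in xb:
        value, flags = cvt_dp_to_sd(src)
        results.append(value)
        status |= flags
        ex_flag |= bool(ve and "VXCVI" in flags) or bool(xe and "XX" in flags)
    return (None if ex_flag else results), status

print(xvcvdpsxds([2.75, -2.75]))        # ([2, -2], {'XX'})
print(xvcvdpsxds([2.75, 1.0], xe=1))    # (None, {'XX'})
```

The second call shows the all-or-nothing behaviour: one inexact element with XE=1 suppresses the write of both elements, while the status bit is still recorded.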
VSX Vector truncate Double-Precision to integer and Convert to Signed Integer Word format with Saturate XX2-form

xvcvdpsxws XT,XB (0xF000_0360)

    60   T   ///   B    216   BX   TX
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       rnd{0:63} ← RoundToDPIntegerTrunc(VSR[XB]{i:i+63})
       result{i:i+31} ← ConvertDPtoSW(rnd)
       result{i+32:i+63} ← 0xUUUU_UUUU
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxcvi_flag) then SetFX(VXCVI)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxcvi_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    If src is a NaN, the result is the value 0x8000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.
    Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.
    If the rounded value is greater than 2^31-1, the result is 0x7FFF_FFFF and VXCVI is set to 1.
    Otherwise, if the rounded value is less than -2^31, the result is 0x8000_0000 and VXCVI is set to 1.
    Otherwise, the result is the rounded value converted to 32-bit signed-integer format.
    The result is placed into bits 0:31 of doubleword element i of VSR[XT].
    The contents of bits 32:63 of doubleword element i of VSR[XT] are undefined.
    See Table 71.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXCVI

VSR Data Layout for xvcvdpsxws
    src = VSR[XB]:    DP, DP
    tgt = VSR[XT]:    SW, undefined, SW, undefined
    bit boundaries:   0, 32, 64, 96, 127

Programming Note
    xvcvdpsxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xvrdpic which uses the rounding mode specified by the RN field in the FPSCR.

Inexact? = ( RoundToDPintegerTrunc(src) ≠ src )

    src                   VE  XE  Inexact?  Returned Results and Status Setting
    src ≤ Nmin-1          0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    Nmin-1 < src < Nmin   -   0   yes       T(Nmin), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmin            -   -   no        T(Nmin)
    Nmin < src < Nmax     -   -   no        T(ConvertDPtoSW(RoundToDPintegerTrunc(src)))
                          -   0   yes       T(ConvertDPtoSW(RoundToDPintegerTrunc(src))), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmax            -   -   no        T(Nmax)
    Nmax < src < Nmax+1   -   0   yes       T(Nmax), fx(XX)
                          -   1   yes       fx(XX), error()
    src ≥ Nmax+1          0   -   -         T(Nmax), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a QNaN         0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a SNaN         0   -   -         T(Nmin), fx(VXCVI), fx(VXSNAN)
                          1   -   -         fx(VXCVI), fx(VXSNAN), error()

Explanation:
    ConvertDPtoSW(x)           The double-precision floating-point integer value x converted to signed integer word format.
    fx(x)                      FX is set to 1 if x=0. x is set to 1.
    error()                    The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed.
    Nmin                       The smallest signed integer word value, -2^31 (0x8000_0000).
    Nmax                       The largest signed integer word value, 2^31-1 (0x7FFF_FFFF).
    RoundToDPintegerTrunc(x)   The double-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
    src                        The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
    T(x)                       The signed integer word value x is placed in word element i of VSR[XT] (where i ∈ {0,2}).

Table 71. Actions for xvcvdpsxws
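For the word-sized variant, the point that is easiest to get wrong is where the two results land: word elements 0 and 2 of VSR[XT], with word elements 1 and 3 undefined. The Python sketch below illustrates only that placement and the saturation bounds. The function names are hypothetical, undefined words are shown as None, and status bits and trap handling are not modelled.

```python
import math

def sat_trunc_i32(src: float) -> int:
    """Truncate toward zero, saturating to the signed word range."""
    if math.isnan(src):
        return -(2**31)                       # NaN yields Nmin
    if math.isinf(src):
        return 2**31 - 1 if src > 0 else -(2**31)
    return max(-(2**31), min(2**31 - 1, math.trunc(src)))

def xvcvdpsxws_words(xb):
    """Returns the four word elements of VSR[XT] as 32-bit patterns.

    None marks a word whose contents the architecture leaves undefined.
    """
    words = [None] * 4
    for i, src in enumerate(xb):              # i is the doubleword element
        words[2 * i] = sat_trunc_i32(src) & 0xFFFF_FFFF
    return words

print(xvcvdpsxws_words([-1.5, 3e9]))
# [4294967295, None, 2147483647, None]
```

Here -1.5 truncates to -1 (pattern 0xFFFF_FFFF) and 3e9 saturates to Nmax (0x7FFF_FFFF).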
VSX Vector truncate Double-Precision to integer and Convert to Unsigned Integer Doubleword format with Saturate XX2-form

xvcvdpuxds XT,XB (0xF000_0720)

    60   T   ///   B    456   BX   TX
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       rnd{0:63} ← RoundToDPIntegerTrunc(VSR[XB]{i:i+63})
       result{i:i+63} ← ConvertDPtoUD(rnd)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxcvi_flag) then SetFX(VXCVI)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxcvi_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    If src is a NaN, the result is the value 0x0000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.
    Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.
    If the rounded value is greater than 2^64-1, the result is 0xFFFF_FFFF_FFFF_FFFF and VXCVI is set to 1.
    Otherwise, if the rounded value is less than 0, the result is 0x0000_0000_0000_0000 and VXCVI is set to 1.
    Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format.
    The result is placed into doubleword element i of VSR[XT].
    See Table 72.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXCVI

VSR Data Layout for xvcvdpuxds
    src = VSR[XB]:    DP, DP
    tgt = VSR[XT]:    UD, UD
    bit boundaries:   0, 64, 127

Programming Note
    xvcvdpuxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xvrdpic which uses the rounding mode specified by the RN field in the FPSCR.

Inexact? = ( RoundToDPintegerTrunc(src) ≠ src )

    src                   VE  XE  Inexact?  Returned Results and Status Setting
    src ≤ Nmin-1          0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    Nmin-1 < src < Nmin   -   0   yes       T(Nmin), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmin            -   -   no        T(Nmin)
    Nmin < src < Nmax     -   -   no        T(ConvertDPtoUD(RoundToDPintegerTrunc(src)))
                          -   0   yes       T(ConvertDPtoUD(RoundToDPintegerTrunc(src))), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmax            -   -   no        T(Nmax)
    Nmax < src < Nmax+1   -   0   yes       T(Nmax), fx(XX)
                          -   1   yes       fx(XX), error()
    src ≥ Nmax+1          0   -   -         T(Nmax), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a QNaN         0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a SNaN         0   -   -         T(Nmin), fx(VXCVI), fx(VXSNAN)
                          1   -   -         fx(VXCVI), fx(VXSNAN), error()

    Note (src = Nmax): This case cannot occur as Nmax is not representable in DP format but is included here for completeness.

Explanation:
    ConvertDPtoUD(x)           The double-precision floating-point integer value x converted to unsigned integer doubleword format.
    fx(x)                      FX is set to 1 if x=0. x is set to 1.
    error()                    The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed.
    Nmin                       The smallest unsigned integer doubleword value, 0 (0x0000_0000_0000_0000).
    Nmax                       The largest unsigned integer doubleword value, 2^64-1 (0xFFFF_FFFF_FFFF_FFFF).
    RoundToDPintegerTrunc(x)   The double-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
    src                        The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
    T(x)                       The unsigned integer doubleword value x is placed in doubleword element i of VSR[XT] (where i ∈ {0,1}).

Table 72. Actions for xvcvdpuxds
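The note that Nmax is not representable in DP format can be checked numerically. The Python sketch below is illustrative only: cvt_dp_to_ud is a hypothetical name, status bits are modelled as a set of strings, VXSNAN is not modelled, and math.nextafter requires Python 3.9 or later.

```python
import math

NMAX_UD = 2**64 - 1

def cvt_dp_to_ud(src: float):
    """One element: returns (unsigned result, set of status bits raised)."""
    if math.isnan(src):
        return 0, {"VXCVI"}
    if math.isinf(src):
        return (NMAX_UD if src > 0 else 0), {"VXCVI"}
    rnd = math.trunc(src)                     # Round toward Zero
    if rnd > NMAX_UD:
        return NMAX_UD, {"VXCVI"}
    if rnd < 0:
        return 0, {"VXCVI"}
    return rnd, ({"XX"} if rnd != src else set())

# 2^64-1 needs 64 significant bits, but DP carries 53, so it rounds up to 2^64.
print(float(NMAX_UD) == 2.0**64)              # True

# The largest double below 2^64 is 2^64 - 2^11.
largest = math.nextafter(2.0**64, 0.0)
print(int(largest) == 2**64 - 2048)           # True
print(cvt_dp_to_ud(largest))                  # (18446744073709549568, set())
print(cvt_dp_to_ud(2.0**64))                  # (18446744073709551615, {'VXCVI'})
```

So the result 0xFFFF_FFFF_FFFF_FFFF is only ever produced by saturation, never by an in-range conversion.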
VSX Vector truncate Double-Precision to integer and Convert to Unsigned Integer Word format with Saturate XX2-form

xvcvdpuxws XT,XB (0xF000_0320)

    60   T   ///   B    200   BX   TX
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       rnd{0:63} ← RoundToDPIntegerTrunc(VSR[XB]{i:i+63})
       result{i:i+31} ← ConvertDPtoUW(rnd)
       result{i+32:i+63} ← 0xUUUU_UUUU
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxcvi_flag) then SetFX(VXCVI)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxcvi_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    If src is a NaN, the result is the value 0x0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.
    Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.
    If the rounded value is greater than 2^32-1, the result is 0xFFFF_FFFF and VXCVI is set to 1.
    Otherwise, if the rounded value is less than 0, the result is 0x0000_0000 and VXCVI is set to 1.
    Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format.
    The result is placed into bits 0:31 of doubleword element i of VSR[XT].
    The contents of bits 32:63 of doubleword element i of VSR[XT] are undefined.
    See Table 73.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXCVI

VSR Data Layout for xvcvdpuxws
    src = VSR[XB]:    DP, DP
    tgt = VSR[XT]:    UW, undefined, UW, undefined
    bit boundaries:   0, 32, 64, 96, 127

Programming Note
    xvcvdpuxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xvrdpic which uses the rounding mode specified by the RN field in the FPSCR.

Inexact? = ( RoundToDPintegerTrunc(src) ≠ src )

    src                   VE  XE  Inexact?  Returned Results and Status Setting
    src ≤ Nmin-1          0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    Nmin-1 < src < Nmin   -   0   yes       T(Nmin), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmin            -   -   no        T(Nmin)
    Nmin < src < Nmax     -   -   no        T(ConvertDPtoUW(RoundToDPintegerTrunc(src)))
                          -   0   yes       T(ConvertDPtoUW(RoundToDPintegerTrunc(src))), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmax            -   -   no        T(Nmax)
    Nmax < src < Nmax+1   -   0   yes       T(Nmax), fx(XX)
                          -   1   yes       fx(XX), error()
    src ≥ Nmax+1          0   -   -         T(Nmax), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a QNaN         0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a SNaN         0   -   -         T(Nmin), fx(VXCVI), fx(VXSNAN)
                          1   -   -         fx(VXCVI), fx(VXSNAN), error()

Explanation:
    ConvertDPtoUW(x)           The double-precision floating-point integer value x converted to unsigned integer word format.
    fx(x)                      FX is set to 1 if x=0. x is set to 1.
    error()                    The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed.
    Nmin                       The smallest unsigned integer word value, 0 (0x0000_0000).
    Nmax                       The largest unsigned integer word value, 2^32-1 (0xFFFF_FFFF).
    RoundToDPintegerTrunc(x)   The double-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
    src                        The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
    T(x)                       The unsigned integer word value x is placed in word element i of VSR[XT] (where i ∈ {0,2}).

Table 73. Actions for xvcvdpuxws
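The Programming Note describes a two-step idiom: first round to a floating-point integer in the desired mode, then apply the truncating conversion, which is exact on an already-integral value. The Python sketch below illustrates the idea for finite inputs only. The mode names and the function round_then_convert_uw are hypothetical; math.floor and math.ceil merely stand in for a Round to Double-Precision Integer instruction of the corresponding mode, and status bits are not modelled.

```python
import math

# Stand-ins for a Round to Double-Precision Integer step (finite inputs only).
ROUNDERS = {
    "toward_zero": math.trunc,
    "toward_minus_inf": math.floor,
    "toward_plus_inf": math.ceil,
}

def round_then_convert_uw(src: float, mode: str) -> int:
    """Step 1: round in the chosen mode. Step 2: truncate and saturate."""
    rounded = float(ROUNDERS[mode](src))      # still a DP value, now integral
    return max(0, min(2**32 - 1, math.trunc(rounded)))

print(round_then_convert_uw(2.25, "toward_plus_inf"))    # 3
print(round_then_convert_uw(2.75, "toward_zero"))        # 2
print(round_then_convert_uw(-0.5, "toward_minus_inf"))   # 0
```

The last call rounds -0.5 down to -1.0, which then saturates to 0. Converting -0.5 directly with truncation would instead give 0 as an in-range inexact result, so the two sequences can report different exception status even when the stored value is the same.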
VSX Vector Convert Single-Precision to Double-Precision format XX2-form

xvcvspdp XT,XB (0xF000_0724)

    60   T   ///   B    457   BX   TX
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       result{i:i+63} ← ConvertSPtoDP(VSR[XB]{i:i+31})
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the single-precision floating-point operand in bits 0:31 of doubleword element i of VSR[XB].
    src is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvcvspdp
    src = VSR[XB]:    SP, unused, SP, unused
    tgt = VSR[XT]:    DP, DP
    bit boundaries:   0, 32, 64, 96, 127

VSX Vector truncate Single-Precision to integer and Convert to Signed Integer Doubleword format with Saturate XX2-form

xvcvspsxds XT,XB (0xF000_0660)

    60   T   ///   B    408   BX   TX
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       rnd{0:31} ← RoundToSPIntegerTrunc(VSR[XB]{i:i+31})
       result{i:i+63} ← ConvertSPtoSD(rnd)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxcvi_flag) then SetFX(VXCVI)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxcvi_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the single-precision floating-point operand in word element i×2 of VSR[XB].
    If src is a NaN, the result is the value 0x8000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.
    Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.
    If the rounded value is greater than 2^63-1, the result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1.
    Otherwise, if the rounded value is less than -2^63, the result is 0x8000_0000_0000_0000 and VXCVI is set to 1.
    Otherwise, the result is the rounded value converted to 64-bit signed-integer format.
    The result is placed into doubleword element i of VSR[XT].
    See Table 74.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXCVI

VSR Data Layout for xvcvspsxds
    src = VSR[XB]:    SP, unused, SP, unused
    tgt = VSR[XT]:    SD, SD
    bit boundaries:   0, 32, 64, 96, 127

Programming Note
    xvcvspsxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Single-Precision Integer instruction that corresponds to the desired rounding mode, including xvrspic which uses the rounding mode specified by the RN field in the FPSCR.

Inexact? = ( RoundToSPintegerTrunc(src) ≠ src )

    src                   VE  XE  Inexact?  Returned Results and Status Setting
    src ≤ Nmin-1          0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    Nmin-1 < src < Nmin   -   0   yes       T(Nmin), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmin            -   -   no        T(Nmin)
    Nmin < src < Nmax     -   -   no        T(ConvertSPtoSD(RoundToSPintegerTrunc(src)))
                          -   0   yes       T(ConvertSPtoSD(RoundToSPintegerTrunc(src))), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmax            -   -   no        T(Nmax)
    Nmax < src < Nmax+1   -   0   yes       T(Nmax), fx(XX)
                          -   1   yes       fx(XX), error()
    src ≥ Nmax+1          0   -   -         T(Nmax), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a QNaN         0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a SNaN         0   -   -         T(Nmin), fx(VXCVI), fx(VXSNAN)
                          1   -   -         fx(VXCVI), fx(VXSNAN), error()

    Note (src = Nmax): This case cannot occur as Nmax is not representable in SP format but is included here for completeness.

Explanation:
    ConvertSPtoSD(x)           The single-precision floating-point integer value x converted to signed integer doubleword format.
    fx(x)                      FX is set to 1 if x=0. x is set to 1.
    error()                    The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed.
    Nmin                       The smallest signed integer doubleword value, -2^63 (0x8000_0000_0000_0000).
    Nmax                       The largest signed integer doubleword value, 2^63-1 (0x7FFF_FFFF_FFFF_FFFF).
    RoundToSPintegerTrunc(x)   The single-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
    src                        The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,2}).
    T(x)                       The signed integer doubleword value x is placed in doubleword element i of VSR[XT] (where i ∈ {0,1}).

Table 74. Actions for xvcvspsxds

VSX Vector truncate Single-Precision to integer and Convert to Signed Integer Word format with Saturate XX2-form

xvcvspsxws XT,XB (0xF000_0260)

    60   T   ///   B    152   BX   TX
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       rnd{0:31} ← RoundToSPIntegerTrunc(VSR[XB]{i:i+31})
       result{i:i+31} ← ConvertSPtoSW(rnd)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxcvi_flag) then SetFX(VXCVI)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxcvi_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    If src is a NaN, the result is the value 0x8000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.
    Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.
    If the rounded value is greater than 2^31-1, the result is 0x7FFF_FFFF and VXCVI is set to 1.
    Otherwise, if the rounded value is less than -2^31, the result is 0x8000_0000 and VXCVI is set to 1.
    Otherwise, the result is the rounded value converted to 32-bit signed-integer format.
    The result is placed into word element i of VSR[XT].
    See Table 75.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXCVI

VSR Data Layout for xvcvspsxws
    src = VSR[XB]:    SP, SP, SP, SP
    tgt = VSR[XT]:    SW, SW, SW, SW
    bit boundaries:   0, 32, 64, 96, 127

Programming Note
    xvcvspsxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Single-Precision Integer instruction that corresponds to the desired rounding mode, including xvrspic which uses the rounding mode specified by the RN field in the FPSCR.

Inexact? = ( RoundToSPintegerTrunc(src) ≠ src )

    src                   VE  XE  Inexact?  Returned Results and Status Setting
    src ≤ Nmin-1          0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    Nmin-1 < src < Nmin   -   0   yes       T(Nmin), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmin            -   -   no        T(Nmin)
    Nmin < src < Nmax     -   -   no        T(ConvertSPtoSW(RoundToSPintegerTrunc(src)))
                          -   0   yes       T(ConvertSPtoSW(RoundToSPintegerTrunc(src))), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmax            -   -   no        T(Nmax)
    Nmax < src < Nmax+1   -   0   yes       T(Nmax), fx(XX)
                          -   1   yes       fx(XX), error()
    src ≥ Nmax+1          0   -   -         T(Nmax), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a QNaN         0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a SNaN         0   -   -         T(Nmin), fx(VXCVI), fx(VXSNAN)
                          1   -   -         fx(VXCVI), fx(VXSNAN), error()

    Note (src = Nmax): This case cannot occur as Nmax is not representable in SP format but is included here for completeness.

Explanation:
    ConvertSPtoSW(x)           The single-precision floating-point integer value x converted to signed integer word format.
    fx(x)                      FX is set to 1 if x=0. x is set to 1.
    error()                    The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed.
    Nmin                       The smallest signed integer word value, -2^31 (0x8000_0000).
    Nmax                       The largest signed integer word value, 2^31-1 (0x7FFF_FFFF).
    RoundToSPintegerTrunc(x)   The single-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
    src                        The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
    T(x)                       The signed integer word value x is placed in word element i of VSR[XT] (where i ∈ {0,1,2,3}).

Table 75. Actions for xvcvspsxws

VSX Vector truncate Single-Precision to integer and Convert to Unsigned Integer Doubleword format with Saturate XX2-form

xvcvspuxds XT,XB (0xF000_0620)

    60   T   ///   B    392   BX   TX
    0    6   11    16   21    30   31

VSR Data Layout for xvcvspuxds
    src = VSR[XB]:    SP, unused, SP, unused
    tgt = VSR[XT]:    UD, UD
    bit boundaries:   0, 32, 64, 96, 127

Programming Note
    xvcvspuxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Single-Precision Integer instruction that corresponds to the desired rounding mode, including xvrspic which uses the rounding mode specified by the RN field in the FPSCR.

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       rnd{0:31} ← RoundToSPIntegerTrunc(VSR[XB]{i:i+31})
       result{i:i+63} ← ConvertSPtoUD(rnd)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxcvi_flag) then SetFX(VXCVI)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxcvi_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the single-precision floating-point operand in word element i×2 of VSR[XB].
    If src is a NaN, the result is the value 0x0000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.
    Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.
    If the rounded value is greater than 2^64-1, the result is 0xFFFF_FFFF_FFFF_FFFF and VXCVI is set to 1.
    Otherwise, if the rounded value is less than 0, the result is 0x0000_0000_0000_0000 and VXCVI is set to 1.
    Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format.
    The result is placed into doubleword element i of VSR[XT].
    See Table 76.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXCVI

Inexact? = ( RoundToSPintegerTrunc(src) ≠ src )

    src                   VE  XE  Inexact?  Returned Results and Status Setting
    src ≤ Nmin-1          0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    Nmin-1 < src < Nmin   -   0   yes       T(Nmin), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmin            -   -   no        T(Nmin)
    Nmin < src < Nmax     -   -   no        T(ConvertSPtoUD(RoundToSPintegerTrunc(src)))
                          -   0   yes       T(ConvertSPtoUD(RoundToSPintegerTrunc(src))), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmax            -   -   no        T(Nmax)
    Nmax < src < Nmax+1   -   0   yes       T(Nmax), fx(XX)
                          -   1   yes       fx(XX), error()
    src ≥ Nmax+1          0   -   -         T(Nmax), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a QNaN         0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a SNaN         0   -   -         T(Nmin), fx(VXCVI), fx(VXSNAN)
                          1   -   -         fx(VXCVI), fx(VXSNAN), error()

    Note (src = Nmax): This case cannot occur as Nmax is not representable in SP format but is included here for completeness.

Explanation:
    ConvertSPtoUD(x)           The single-precision floating-point integral value x converted to unsigned integer doubleword format.
    fx(x)                      FX is set to 1 if x=0. x is set to 1.
    error()                    The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed.
    Nmin                       The smallest unsigned integer doubleword value, 0 (0x0000_0000_0000_0000).
    Nmax                       The largest unsigned integer doubleword value, 2^64-1 (0xFFFF_FFFF_FFFF_FFFF).
    RoundToSPintegerTrunc(x)   The single-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
    src                        The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,2}).
    T(x)                       The unsigned integer doubleword value x is placed in doubleword element i of VSR[XT] (where i ∈ {0,1}).

Table 76. Actions for xvcvspuxds

VSX Vector truncate Single-Precision to integer and Convert to Unsigned Integer Word format with Saturate XX2-form

xvcvspuxws XT,XB (0xF000_0220)

    60   T   ///   B    136   BX   TX
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       rnd{0:31} ← RoundToSPIntegerTrunc(VSR[XB]{i:i+31})
       result{i:i+31} ← ConvertSPtoUW(rnd)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxcvi_flag) then SetFX(VXCVI)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxcvi_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    If src is a NaN, the result is the value 0x0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1.
    Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero.
    If the rounded value is greater than 2^32-1, the result is 0xFFFF_FFFF and VXCVI is set to 1.
    Otherwise, if the rounded value is less than 0, the result is 0x0000_0000 and VXCVI is set to 1.
    Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format.
    The result is placed into word element i of VSR[XT].
    See Table 77.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXCVI

VSR Data Layout for xvcvspuxws
    src = VSR[XB]:    SP, SP, SP, SP
    tgt = VSR[XT]:    UW, UW, UW, UW
    bit boundaries:   0, 32, 64, 96, 127

Programming Note
    xvcvspuxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Single-Precision Integer instruction that corresponds to the desired rounding mode, including xvrspic which uses the rounding mode specified by the RN field in the FPSCR.

Inexact? = ( RoundToSPintegerTrunc(src) ≠ src )

    src                   VE  XE  Inexact?  Returned Results and Status Setting
    src ≤ Nmin-1          0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    Nmin-1 < src < Nmin   -   0   yes       T(Nmin), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmin            -   -   no        T(Nmin)
    Nmin < src < Nmax     -   -   no        T(ConvertSPtoUW(RoundToSPintegerTrunc(src)))
                          -   0   yes       T(ConvertSPtoUW(RoundToSPintegerTrunc(src))), fx(XX)
                          -   1   yes       fx(XX), error()
    src = Nmax            -   -   no        T(Nmax)
    Nmax < src < Nmax+1   -   0   yes       T(Nmax), fx(XX)
                          -   1   yes       fx(XX), error()
    src ≥ Nmax+1          0   -   -         T(Nmax), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a QNaN         0   -   -         T(Nmin), fx(VXCVI)
                          1   -   -         fx(VXCVI), error()
    src is a SNaN         0   -   -         T(Nmin), fx(VXCVI), fx(VXSNAN)
                          1   -   -         fx(VXCVI), fx(VXSNAN), error()

    Note (src = Nmax): This case cannot occur as Nmax is not representable in SP format but is included here for completeness.

Explanation:
    ConvertSPtoUW(x)           The single-precision floating-point integer value x converted to unsigned integer word format.
    fx(x)                      FX is set to 1 if x=0. x is set to 1.
    error()                    The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed.
    Nmin                       The smallest unsigned integer word value, 0 (0x0000_0000).
    Nmax                       The largest unsigned integer word value, 2^32-1 (0xFFFF_FFFF).
    RoundToSPintegerTrunc(x)   The single-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
    src                        The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
    T(x)                       The unsigned integer word value x is placed in word element i of VSR[XT] (where i ∈ {0,1,2,3}).

Table 77.
Actions for xvcvspuxws

VSX Vector Convert and round Signed Integer Doubleword to Double-Precision format XX2-form

xvcvsxddp XT,XB (0xF000_07E0)

    60 | T | /// | B  | 504 | BX | TX
    0  | 6 | 11  | 16 | 21  | 30 | 31

    XT      ← TX || T
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       v{0:inf}       ← ConvertSDtoFP(VSR[XB]{i:i+63})
       result{i:i+63} ← RoundToDP(RN,v)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src be the signed integer in doubleword element i of VSR[XB].

    src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    The result is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX

VSR Data Layout for xvcvsxddp

    src = VSR[XB]   | SD | SD |
    tgt = VSR[XT]   | DP | DP |
    bit               0    64   127

VSX Vector Convert and round Signed Integer Doubleword to Single-Precision format XX2-form

xvcvsxdsp XT,XB (0xF000_06E0)

    60 | T | /// | B  | 440 | BX | TX
    0  | 6 | 11  | 16 | 21  | 30 | 31

    XT      ← TX || T
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       v{0:inf}          ← ConvertSDtoFP(VSR[XB]{i:i+63})
       result{i:i+31}    ← RoundToSP(RN,v)
       result{i+32:i+63} ← 0xUUUU_UUUU
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src be the signed integer in doubleword element i of VSR[XB].

    src is converted to an unbounded-precision floating-point value and rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    The result is placed into bits 0:31 of doubleword element i of VSR[XT] in single-precision format.

    The contents of bits 32:63 of doubleword element i of VSR[XT] are undefined.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX

VSR Data Layout for xvcvsxdsp

    src = VSR[XB]   | SD             | SD             |
    tgt = VSR[XT]   | SP | undefined | SP | undefined |
    bit               0    32          64   96          127

VSX Vector Convert Signed Integer Word to Double-Precision format XX2-form

xvcvsxwdp XT,XB (0xF000_03E0)

    60 | T | /// | B  | 248 | BX | TX
    0  | 6 | 11  | 16 | 21  | 30 | 31

    XT      ← TX || T
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       v{0:inf}       ← ConvertSWtoFP(VSR[XB]{i:i+31})
       result{i:i+63} ← RoundToDP(RN,v)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src be the signed integer in bits 0:31 of doubleword element i of VSR[XB].

    src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    The result is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX

VSR Data Layout for xvcvsxwdp

    src = VSR[XB]   | SW | unused | SW | unused |
    tgt = VSR[XT]   | DP          | DP          |
    bit               0    32       64   96       127

VSX Vector Convert and round Signed Integer Word to Single-Precision format XX2-form

xvcvsxwsp XT,XB (0xF000_02E0)

    60 | T | /// | B  | 184 | BX | TX
    0  | 6 | 11  | 16 | 21  | 30 | 31

    XT      ← TX || T
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       v{0:inf}       ← ConvertSWtoFP(VSR[XB]{i:i+31})
       result{i:i+31} ← RoundToSP(RN,v)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

    Let src be the signed integer in word element i of VSR[XB].

    src is converted to an unbounded-precision floating-point value and rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    The result is placed into word element i of VSR[XT] in single-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX

VSR Data Layout for xvcvsxwsp

    src = VSR[XB]   | SW | SW | SW | SW |
    tgt = VSR[XT]   | SP | SP | SP | SP |
    bit               0    32   64   96   127

VSX Vector Convert and round Unsigned Integer Doubleword to Double-Precision format XX2-form

xvcvuxddp XT,XB (0xF000_07A0)

    60 | T | /// | B  | 488 | BX | TX
    0  | 6 | 11  | 16 | 21  | 30 | 31

    XT      ← TX || T
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       v{0:inf}       ← ConvertUDtoFP(VSR[XB]{i:i+63})
       result{i:i+63} ← RoundToDP(RN,v)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src be the unsigned integer in doubleword element i of VSR[XB].

    src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    The result is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX

VSR Data Layout for xvcvuxddp

    src = VSR[XB]   | UD | UD |
    tgt = VSR[XT]   | DP | DP |
    bit               0    64   127

VSX Vector Convert and round Unsigned Integer Doubleword to Single-Precision format XX2-form

xvcvuxdsp XT,XB (0xF000_06A0)

    60 | T | /// | B  | 424 | BX | TX
    0  | 6 | 11  | 16 | 21  | 30 | 31

    XT      ← TX || T
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       v{0:inf}          ← ConvertUDtoFP(VSR[XB]{i:i+63})
       result{i:i+31}    ← RoundToSP(RN,v)
       result{i+32:i+63} ← 0xUUUU_UUUU
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src be the unsigned integer in doubleword element i of VSR[XB].

    src is converted to an unbounded-precision floating-point value and rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    The result is placed into bits 0:31 of doubleword element i of VSR[XT] in single-precision format.

    The contents of bits 32:63 of doubleword element i of VSR[XT] are undefined.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX

VSR Data Layout for xvcvuxdsp

    src = VSR[XB]   | UD             | UD             |
    tgt = VSR[XT]   | SP | undefined | SP | undefined |
    bit               0    32          64   96          127
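The conversions above are inexact whenever the source integer carries more significant bits than the target format can hold (53 for double-precision, 24 for single-precision). The following Python sketch illustrates this for individual elements. It assumes RN selects round to nearest even and models only the XX flag and the XE suppression rule described above; the names `round_int_to_precision` and `convert_vector` are hypothetical helpers, not architected functions.

```python
def round_int_to_precision(n, p):
    """Round integer n to p significant bits (round to nearest, ties to even).

    Returns (rounded_value, inexact). Assumption: RN = round to nearest even.
    """
    sign = -1 if n < 0 else 1
    magnitude = abs(n)
    excess = magnitude.bit_length() - p      # low-order bits that do not fit
    if excess <= 0:
        return n, False                      # exactly representable
    kept, dropped = divmod(magnitude, 1 << excess)
    half = 1 << (excess - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1                            # round up (or tie to even)
    return sign * (kept << excess), dropped != 0


def convert_vector(elements, p, xe=False):
    """Hypothetical model of one conversion instruction.

    p is 53 for a double-precision target and 24 for single-precision.
    Returns (results, xx); results is None when the write is suppressed.
    """
    results = []
    xx = False
    for n in elements:
        value, inexact = round_int_to_precision(n, p)
        results.append(value)
        xx = xx or inexact
    if xe and xx:
        return None, xx                      # trap-enabled: VSR[XT] not written
    return results, xx
```

For example, `convert_vector([2**53 + 1, -5], 53)` rounds the first element to `2**53` and reports an inexact result, while the same call with `xe=True` returns no results, mirroring the rule that no element is written when a trap-enabled exception occurs in any element.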
VSX Vector Convert and round Unsigned Integer Word to Double-Precision format XX2-form

xvcvuxwdp XT,XB (0xF000_03A0)

    60 | T | /// | B  | 232 | BX | TX
    0  | 6 | 11  | 16 | 21  | 30 | 31

    XT      ← TX || T
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       v{0:inf}       ← ConvertUWtoFP(VSR[XB]{i:i+31})
       result{i:i+63} ← RoundToDP(RN,v)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src be the unsigned integer in bits 0:31 of doubleword element i of VSR[XB].

    src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    The result is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX

VSR Data Layout for xvcvuxwdp

    src = VSR[XB]   | UW | unused | UW | unused |
    tgt = VSR[XT]   | DP          | DP          |
    bit               0    32       64   96       127

VSX Vector Convert and round Unsigned Integer Word to Single-Precision format XX2-form

xvcvuxwsp XT,XB (0xF000_02A0)

    60 | T | /// | B  | 168 | BX | TX
    0  | 6 | 11  | 16 | 21  | 30 | 31

    XT      ← TX || T
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       v{0:inf}       ← ConvertUWtoFP(VSR[XB]{i:i+31})
       result{i:i+31} ← RoundToSP(RN,v)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

    Let src be the unsigned integer in word element i of VSR[XB].

    src is converted to an unbounded-precision floating-point value and rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    The result is placed into word element i of VSR[XT] in single-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX

VSR Data Layout for xvcvuxwsp

    src = VSR[XB]   | UW | UW | UW | UW |
    tgt = VSR[XT]   | SP | SP | SP | SP |
    bit               0    32   64   96   127

VSX Vector Divide Double-Precision XX3-form

xvdivdp XT,XA,XB (0xF000_03C0)

    60 | T | A  | B  | 120 | AX | BX | TX
    0  | 6 | 11 | 16 | 21  | 29 | 30 | 31

    XT      ← TX || T
    XA      ← AX || A
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       src1           ← VSR[XA]{i:i+63}
       src2           ← VSR[XB]{i:i+63}
       v{0:inf}       ← DivideDP(src1,src2)
       result{i:i+63} ← RoundToDP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxidi_flag) then SetFX(VXIDI)
       if(vxzdz_flag) then SetFX(VXZDZ)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       if(zx_flag) then SetFX(ZX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxidi_flag)
       ex_flag ← ex_flag | (VE & vxzdz_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (ZE & zx_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].

    Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].

    src1 is divided¹ by src2, producing a quotient having unbounded range and precision.

    The quotient is normalized².

    See Table 78.

    The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

    The result is placed into doubleword element i of VSR[XT] in double-precision format.

    See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX ZX XX
    VXSNAN VXIDI VXZDZ

VSR Data Layout for xvdivdp

    src1 = VSR[XA]   | DP | DP |
    src2 = VSR[XB]   | DP | DP |
    tgt  = VSR[XT]   | DP | DP |
    bit                0    64   127

1. Floating-point division is based on exponent subtraction and division of the significands.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

src1 \ src2 | -Infinity | -NZF | -Zero | +Zero | +NZF | +Infinity | QNaN | SNaN
-Infinity | v ← dQNaN; vxidi_flag ← 1 | v ← +Infinity | v ← +Infinity | v ← -Infinity | v ← -Infinity | v ← dQNaN; vxidi_flag ← 1 | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
-NZF | v ← +Zero | v ← D(src1,src2) | v ← +Infinity; zx_flag ← 1 | v ← -Infinity; zx_flag ← 1 | v ← D(src1,src2) | v ← -Zero | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
-Zero | v ← +Zero | v ← +Zero | v ← dQNaN; vxzdz_flag ← 1 | v ← dQNaN; vxzdz_flag ← 1 | v ← -Zero | v ← -Zero | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+Zero | v ← -Zero | v ← -Zero | v ← dQNaN; vxzdz_flag ← 1 | v ← dQNaN; vxzdz_flag ← 1 | v ← +Zero | v ← +Zero | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+NZF | v ← -Zero | v ← D(src1,src2) | v ← -Infinity; zx_flag ← 1 | v ← +Infinity; zx_flag ← 1 | v ← D(src1,src2) | v ← +Zero | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+Infinity | v ← dQNaN; vxidi_flag ← 1 | v ← -Infinity | v ← -Infinity | v ← +Infinity | v ← +Infinity | v ← dQNaN; vxidi_flag ← 1 | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
QNaN | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1; vxsnan_flag ← 1
SNaN | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1

Explanation:

src1 The double-precision floating-point value in doubleword element i of VSR[XA]
(where i ∈ {0,1}).
src2    The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
dQNaN   Default quiet NaN (0x7FF8_0000_0000_0000).
NZF     Nonzero finite number.
Rezd    Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
D(x,y)  Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision.
Q(x)    Return a QNaN with the payload of x.
v       The intermediate result having unbounded significand precision and unbounded exponent range.

Table 78. Actions for xvdivdp (element i)

VSX Vector Divide Single-Precision XX3-form

xvdivsp XT,XA,XB (0xF000_02C0)

    60 | T | A  | B  | 88 | AX | BX | TX
    0  | 6 | 11 | 16 | 21 | 29 | 30 | 31

    XT      ← TX || T
    XA      ← AX || A
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       src1           ← VSR[XA]{i:i+31}
       src2           ← VSR[XB]{i:i+31}
       v{0:inf}       ← DivideSP(src1,src2)
       result{i:i+31} ← RoundToSP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxidi_flag) then SetFX(VXIDI)
       if(vxzdz_flag) then SetFX(VXZDZ)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       if(zx_flag) then SetFX(ZX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxidi_flag)
       ex_flag ← ex_flag | (VE & vxzdz_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (ZE & zx_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

    Let src1 be the single-precision floating-point operand in word element i of VSR[XA].

    Let src2 be the single-precision floating-point operand in word element i of VSR[XB].

    src1 is divided¹ by src2, producing a quotient having unbounded range and precision.

    The quotient is normalized².

    See Table 79.

    The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

    The result is placed into word element i of VSR[XT] in single-precision format.

    See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX ZX XX
    VXSNAN VXIDI VXZDZ

VSR Data Layout for xvdivsp

    src1 = VSR[XA]   | SP | SP | SP | SP |
    src2 = VSR[XB]   | SP | SP | SP | SP |
    tgt  = VSR[XT]   | SP | SP | SP | SP |
    bit                0    32   64   96   127

1. Floating-point division is based on exponent subtraction and division of the significands.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

src1 \ src2 | -Infinity | -NZF | -Zero | +Zero | +NZF | +Infinity | QNaN | SNaN
-Infinity | v ← dQNaN; vxidi_flag ← 1 | v ← +Infinity | v ← +Infinity | v ← -Infinity | v ← -Infinity | v ← dQNaN; vxidi_flag ← 1 | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
-NZF | v ← +Zero | v ← D(src1,src2) | v ← +Infinity; zx_flag ← 1 | v ← -Infinity; zx_flag ← 1 | v ← D(src1,src2) | v ← -Zero | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
-Zero | v ← +Zero | v ← +Zero | v ← dQNaN; vxzdz_flag ← 1 | v ← dQNaN; vxzdz_flag ← 1 | v ← -Zero | v ← -Zero | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+Zero | v ← -Zero | v ← -Zero | v ← dQNaN; vxzdz_flag ← 1 | v ← dQNaN; vxzdz_flag ← 1 | v ← +Zero | v ← +Zero | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+NZF | v ← -Zero | v ← D(src1,src2) | v ← -Infinity; zx_flag ← 1 | v ← +Infinity; zx_flag ← 1 | v ← D(src1,src2) | v ← +Zero | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+Infinity | v ← dQNaN; vxidi_flag ← 1 | v ← -Infinity | v ← -Infinity | v ← +Infinity | v ← +Infinity | v ← dQNaN; vxidi_flag ← 1 | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
QNaN | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1; vxsnan_flag ← 1
SNaN | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1 | v ← Q(src1); vxsnan_flag ← 1

Explanation:

src1    The single-precision floating-point value in word element i of VSR[XA] (where i ∈ {0,1,2,3}).
src2    The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
dQNaN   Default quiet NaN (0x7FC0_0000).
NZF     Nonzero finite number.
Rezd    Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
D(x,y)  Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x)    Return a QNaN with the payload of x.
v       The intermediate result having unbounded significand precision and unbounded exponent range.

Table 79. Actions for xvdivsp (element i)

VSX Vector Multiply-Add Double-Precision XX3-form

xvmaddadp XT,XA,XB (0xF000_0308)

    60 | T | A  | B  | 97 | AX | BX | TX
    0  | 6 | 11 | 16 | 21 | 29 | 30 | 31

xvmaddmdp XT,XA,XB (0xF000_0348)

    60 | T | A  | B  | 105 | AX | BX | TX
    0  | 6 | 11 | 16 | 21  | 29 | 30 | 31

    XT      ← TX || T
    XA      ← AX || A
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       src1           ← VSR[XA]{i:i+63}
       src2           ← "xvmaddadp" ? VSR[XT]{i:i+63} : VSR[XB]{i:i+63}
       src3           ← "xvmaddadp" ? VSR[XB]{i:i+63} : VSR[XT]{i:i+63}
       v{0:inf}       ← MultiplyAddDP(src1,src3,src2)
       result{i:i+63} ← RoundToDP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vximz_flag) then SetFX(VXIMZ)
       if(vxisi_flag) then SetFX(VXISI)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vximz_flag)
       ex_flag ← ex_flag | (VE & vxisi_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].

    For xvmaddadp, do the following.
    - Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XT].
    - Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XB].

    For xvmaddmdp, do the following.
    - Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].
    - Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XT].

    src1 is multiplied¹ by src3, producing a product having unbounded range and precision.

    See part 1 of Table 80.

    src2 is added² to the product, producing a sum having unbounded range and precision.

    The sum is normalized³.

    See part 2 of Table 80.

    The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

    The result is placed into doubleword element i of VSR[XT] in double-precision format.

    See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX XX
    VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
3.
Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

VSR Data Layout for xvmadd(a|m)dp

    src1 = VSR[XA]                         | DP | DP |
    src2 = xvmaddadp ? VSR[XT] : VSR[XB]   | DP | DP |
    src3 = xvmaddadp ? VSR[XB] : VSR[XT]   | DP | DP |
    tgt  = VSR[XT]                         | DP | DP |
    bit                                      0    64   127

Part 1: Multiply

src1 \ src3 | -Infinity | -NZF | -Zero | +Zero | +NZF | +Infinity | QNaN | SNaN
-Infinity | p ← +Infinity | p ← +Infinity | p ← dQNaN; vximz_flag ← 1 | p ← dQNaN; vximz_flag ← 1 | p ← -Infinity | p ← -Infinity | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
-NZF | p ← +Infinity | p ← M(src1,src3) | p ← +Zero | p ← -Zero | p ← M(src1,src3) | p ← -Infinity | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
-Zero | p ← dQNaN; vximz_flag ← 1 | p ← +Zero | p ← +Zero | p ← -Zero | p ← -Zero | p ← dQNaN; vximz_flag ← 1 | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
+Zero | p ← dQNaN; vximz_flag ← 1 | p ← -Zero | p ← -Zero | p ← +Zero | p ← +Zero | p ← dQNaN; vximz_flag ← 1 | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
+NZF | p ← -Infinity | p ← M(src1,src3) | p ← -Zero | p ← +Zero | p ← M(src1,src3) | p ← +Infinity | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
+Infinity | p ← -Infinity | p ← -Infinity | p ← dQNaN; vximz_flag ← 1 | p ← dQNaN; vximz_flag ← 1 | p ← +Infinity | p ← +Infinity | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
QNaN | p ← src1 | p ← src1 | p ← src1 | p ← src1 | p ← src1 | p ← src1 | p ← src1 | p ← src1; vxsnan_flag ← 1
SNaN | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1

Part 2: Add

p \ src2 | -Infinity | -NZF | -Zero | +Zero | +NZF | +Infinity | QNaN | SNaN
-Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← dQNaN; vxisi_flag ← 1 | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
-NZF | v ← -Infinity | v ← A(p,src2) | v ← p | v ← p | v ← A(p,src2) | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
-Zero | v ← -Infinity | v ← src2 | v ← -Zero | v ← Rezd | v ← src2 | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+Zero | v ← -Infinity | v ← src2 | v ← Rezd | v ← +Zero | v ← src2 | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+NZF | v ← -Infinity | v ← A(p,src2) | v ← p | v ← p | v ← A(p,src2) | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+Infinity | v ← dQNaN; vxisi_flag ← 1 | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
QNaN & src1 is a NaN | v ← p | v ← p | v ← p | v ← p | v ← p | v ← p | v ← p | v ← p; vxsnan_flag ← 1
QNaN & src1 not a NaN | v ← p | v ← p | v ← p | v ← p | v ← p | v ← p | v ← src2 | v ← Q(src2); vxsnan_flag ← 1

Explanation:

src1    The double-precision floating-point value in doubleword element i of VSR[XA] (where i ∈ {0,1}).
src2    For xvmaddadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i ∈ {0,1}).
        For xvmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
src3    For xvmaddadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
        For xvmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i ∈ {0,1}).
dQNaN   Default quiet NaN (0x7FF8_0000_0000_0000).
NZF     Nonzero finite number.
Rezd    Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)    Return a QNaN with the payload of x.
A(x,y)  Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)  Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p       The intermediate product having unbounded range and precision.
v       The intermediate result having unbounded range and precision.

Table 80. Actions for xvmadd(a|m)dp

VSX Vector Multiply-Add Single-Precision XX3-form

xvmaddasp XT,XA,XB (0xF000_0208)

    60 | T | A  | B  | 65 | AX | BX | TX
    0  | 6 | 11 | 16 | 21 | 29 | 30 | 31

xvmaddmsp XT,XA,XB (0xF000_0248)

    60 | T | A  | B  | 73 | AX | BX | TX
    0  | 6 | 11 | 16 | 21 | 29 | 30 | 31

    XT      ← TX || T
    XA      ← AX || A
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       src1           ← VSR[XA]{i:i+31}
       src2           ← "xvmaddasp" ? VSR[XT]{i:i+31} : VSR[XB]{i:i+31}
       src3           ← "xvmaddasp" ? VSR[XB]{i:i+31} : VSR[XT]{i:i+31}
       v{0:inf}       ← MultiplyAddSP(src1,src3,src2)
       result{i:i+31} ← RoundToSP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vximz_flag) then SetFX(VXIMZ)
       if(vxisi_flag) then SetFX(VXISI)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vximz_flag)
       ex_flag ← ex_flag | (VE & vxisi_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

    Let src1 be the single-precision floating-point operand in word element i of VSR[XA].

    For xvmaddasp, do the following.
    - Let src2 be the single-precision floating-point operand in word element i of VSR[XT].
    - Let src3 be the single-precision floating-point operand in word element i of VSR[XB].

    For xvmaddmsp, do the following.
    - Let src2 be the single-precision floating-point operand in word element i of VSR[XB].
    - Let src3 be the single-precision floating-point operand in word element i of VSR[XT].

    src1 is multiplied¹ by src3, producing a product having unbounded range and precision.

    See part 1 of Table 81.

    src2 is added² to the product, producing a sum having unbounded range and precision.

    The sum is normalized³.

    See part 2 of Table 81.

    The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

    The result is placed into word element i of VSR[XT] in single-precision format.

    See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX XX
    VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

VSR Data Layout for xvmadd(a|m)sp

    src1 = VSR[XA]                         | SP | SP | SP | SP |
    src2 = xvmaddasp ? VSR[XT] : VSR[XB]   | SP | SP | SP | SP |
    src3 = xvmaddasp ? VSR[XB] : VSR[XT]   | SP | SP | SP | SP |
    tgt  = VSR[XT]                         | SP | SP | SP | SP |
    bit                                      0    32   64   96   127
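The two properties that distinguish the multiply-add instructions, the A-form versus M-form operand selection and the single rounding of the unbounded intermediate result, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it handles finite operands only, it uses Python floats (so it corresponds to the double-precision forms; the operand selection is the same for the single-precision forms), it assumes RN selects round to nearest even, and it does not model the sign of an exact-zero-difference result or any status flags. The function names `fused_multiply_add` and `xvmadd` are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(src1, src3, src2):
    """Compute src1*src3 + src2 exactly, then round once to double.

    Assumption: all operands are finite; rounding is to nearest even.
    """
    exact = Fraction(src1) * Fraction(src3) + Fraction(src2)
    return float(exact)          # the only rounding step


def xvmadd(form, xa, xb, xt):
    """Element-wise model of the A-form ("a") and M-form ("m") variants."""
    if form not in ("a", "m"):
        raise ValueError("form must be 'a' or 'm'")
    result = []
    for a, b, t in zip(xa, xb, xt):
        if form == "a":
            src2, src3 = t, b    # A-form: addend from XT, multiplier from XB
        else:
            src2, src3 = b, t    # M-form: addend from XB, multiplier from XT
        result.append(fused_multiply_add(a, src3, src2))
    return result
```

With `xa = [2.0, 3.0]`, `xb = [4.0, 5.0]` and `xt = [1.0, 1.0]`, the A-form computes XA×XB+XT and the M-form computes XA×XT+XB. The single rounding matters when the product is not representable: for `a = 1 + 2**-30` and `b = 1 - 2**-30`, `fused_multiply_add(a, b, -1.0)` preserves the `-2**-60` term that a separately rounded multiply followed by an add would lose.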
Part 1: Multiply

src1 \ src3 | -Infinity | -NZF | -Zero | +Zero | +NZF | +Infinity | QNaN | SNaN
-Infinity | p ← +Infinity | p ← +Infinity | p ← dQNaN; vximz_flag ← 1 | p ← dQNaN; vximz_flag ← 1 | p ← -Infinity | p ← -Infinity | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
-NZF | p ← +Infinity | p ← M(src1,src3) | p ← +Zero | p ← -Zero | p ← M(src1,src3) | p ← -Infinity | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
-Zero | p ← dQNaN; vximz_flag ← 1 | p ← +Zero | p ← +Zero | p ← -Zero | p ← -Zero | p ← dQNaN; vximz_flag ← 1 | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
+Zero | p ← dQNaN; vximz_flag ← 1 | p ← -Zero | p ← -Zero | p ← +Zero | p ← +Zero | p ← dQNaN; vximz_flag ← 1 | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
+NZF | p ← -Infinity | p ← M(src1,src3) | p ← -Zero | p ← +Zero | p ← M(src1,src3) | p ← +Infinity | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
+Infinity | p ← -Infinity | p ← -Infinity | p ← dQNaN; vximz_flag ← 1 | p ← dQNaN; vximz_flag ← 1 | p ← +Infinity | p ← +Infinity | p ← src3 | p ← Q(src3); vxsnan_flag ← 1
QNaN | p ← src1 | p ← src1 | p ← src1 | p ← src1 | p ← src1 | p ← src1 | p ← src1 | p ← src1; vxsnan_flag ← 1
SNaN | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1 | p ← Q(src1); vxsnan_flag ← 1

Part 2: Add

p \ src2 | -Infinity | -NZF | -Zero | +Zero | +NZF | +Infinity | QNaN | SNaN
-Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← dQNaN; vxisi_flag ← 1 | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
-NZF | v ← -Infinity | v ← A(p,src2) | v ← p | v ← p | v ← A(p,src2) | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
-Zero | v ← -Infinity | v ← src2 | v ← -Zero | v ← Rezd | v ← src2 | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+Zero | v ← -Infinity | v ← src2 | v ← Rezd | v ← +Zero | v ← src2 | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+NZF | v ← -Infinity | v ← A(p,src2) | v ← p | v ← p | v ← A(p,src2) | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
+Infinity | v ← dQNaN; vxisi_flag ← 1 | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← src2 | v ← Q(src2); vxsnan_flag ← 1
QNaN & src1 is a NaN | v ← p | v ← p | v ← p | v ← p | v ← p | v ← p | v ← p | v ← p; vxsnan_flag ← 1
QNaN & src1 not a NaN | v ← p | v ← p | v ← p | v ← p | v ← p | v ← p | v ← src2 | v ← Q(src2); vxsnan_flag ← 1

Explanation:

src1    The single-precision floating-point value in word element i of VSR[XA] (where i ∈ {0,1,2,3}).
src2    For xvmaddasp, the single-precision floating-point value in word element i of VSR[XT] (where i ∈ {0,1,2,3}).
        For xvmaddmsp, the single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
src3    For xvmaddasp, the single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
        For xvmaddmsp, the single-precision floating-point value in word element i of VSR[XT] (where i ∈ {0,1,2,3}).
dQNaN   Default quiet NaN (0x7FC0_0000).
NZF     Nonzero finite number.
Rezd    Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)    Return a QNaN with the payload of x.
A(x,y)  Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)  Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p       The intermediate product having unbounded range and precision.
v       The intermediate result having unbounded range and precision.

Table 81. Actions for xvmadd(a|m)sp

VSX Vector Maximum Double-Precision XX3-form

xvmaxdp XT,XA,XB (0xF000_0700)

    60 | T | A  | B  | 224 | AX | BX | TX
    0  | 6 | 11 | 16 | 21  | 29 | 30 | 31

    XT      ← TX || T
    XA      ← AX || A
    XB      ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       src1           ← VSR[XA]{i:i+63}
       src2           ← VSR[XB]{i:i+63}
       result{i:i+63} ← MaximumDP(src1,src2)
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].

    Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].

    If src1 is greater than src2, src1 is placed into doubleword element i of VSR[XT] in double-precision format. Otherwise, src2 is placed into doubleword element i of VSR[XT] in double-precision format.

    The maximum of +0 and -0 is +0. The maximum of a QNaN and any value is that value. The maximum of any value and an SNaN when VE=0 is that SNaN converted to a QNaN.

    See Table 82.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvmaxdp

    src1 = VSR[XA]   | DP | DP |
    src2 = VSR[XB]   | DP | DP |
    tgt  = VSR[XT]   | DP | DP |
    bit                0    64   127

src1 \ src2 | -Infinity | -NZF | -Zero | +Zero | +NZF | +Infinity | QNaN | SNaN
-Infinity | T(src1) | T(src2) | T(src2) | T(src2) | T(src2) | T(src2) | T(src1) | T(Q(src2)); fx(VXSNAN)
-NZF | T(src1) | T(M(src1,src2)) | T(src2) | T(src2) | T(src2) | T(src2) | T(src1) | T(Q(src2)); fx(VXSNAN)
-Zero | T(src1) | T(src1) | T(src1) | T(src2) | T(src2) | T(src2) | T(src1) | T(Q(src2)); fx(VXSNAN)
+Zero | T(src1) | T(src1) | T(src1) | T(src1) | T(src2) | T(src2) | T(src1) | T(Q(src2)); fx(VXSNAN)
+NZF | T(src1) | T(src1) | T(src1) | T(src1) | T(M(src1,src2)) | T(src2) | T(src1) | T(Q(src2)); fx(VXSNAN)
+Infinity | T(src1) | T(src1) | T(src1) | T(src1) | T(src1) | T(src1) | T(src1) | T(Q(src2)); fx(VXSNAN)
QNaN | T(src2) | T(src2) | T(src2) | T(src2) | T(src2) | T(src2) | T(src1) | T(src1); fx(VXSNAN)
SNaN | T(Q(src1)); fx(VXSNAN) | T(Q(src1)); fx(VXSNAN) | T(Q(src1)); fx(VXSNAN) | T(Q(src1)); fx(VXSNAN) | T(Q(src1)); fx(VXSNAN) | T(Q(src1)); fx(VXSNAN) | T(Q(src1)); fx(VXSNAN) | T(Q(src1)); fx(VXSNAN)

Explanation:

src1    The double-precision floating-point value in doubleword element i of VSR[XA] (where i ∈ {0,1}).
src2    The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
NZF     Nonzero finite number.
Q(x)    Return a QNaN with the payload of x.
M(x,y)  Return the greater of floating-point value x and floating-point value y.
T(x) The value x is placed in doubleword element i (i{0,1}) of VSR[XT] in double-precision format. FPRF, FR and FI are not modified. fx(x) If x is equal to 0, FX is set to 1. x is set to 1. VXSNAN Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed. Table 82. Actions for xvmaxdp 444 Power ISATM Book I Version 2.06 VSX Vector Maximum Single-Precision XX3-form VSR Data Layout for xvmaxsp src1 = VSR[XA] xvmaxsp XT,XA,XB (0xF000_0600) SP SP SP SP 60 T A B 192 AXBX TX src2 = VSR[XB] 0 6 11 16 21 29 30 31 SP SP SP SP XT TX || T XA AX || A tgt = VSR[XT] XB BX || B SP SP SP SP ex_flag 0b0 0 32 64 96 127 do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} result{i:i+63} MaximumSP(src1,src2) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. If src1 is greater than src2, src1 is placed into word element i of VSR[XT] in single-precision format. Otherwise, src2 is placed into word element i of VSR[XT] in single-precision format. The maximum of +0 and ­0 is +0. The maximum of a QNaN and any value is that value. The maximum of any value and an SNaN when VE=0 is that SNaN converted to a QNaN. See Table 83. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN Chapter 7. 
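The per-element maximum rule described above can be sketched in Python on raw 64-bit patterns. This is an illustrative model only, not the architected MaximumDP() routine: the names `max_dp`, `f64_bits` and `bits_f64` are hypothetical helpers, and the QNaN/SNaN case follows the T(src1) entry listed in Table 82.

```python
import struct

QUIET_BIT = 1 << 51                     # most-significant fraction bit of a double
EXP_MASK = 0x7FF0_0000_0000_0000
FRAC_MASK = (1 << 52) - 1

def f64_bits(x: float) -> int:
    """Bit pattern of a Python float (hypothetical helper)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def bits_f64(b: int) -> float:
    """Float value of a 64-bit pattern (only called for non-NaN patterns)."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def is_nan(b: int) -> bool:
    return (b & EXP_MASK) == EXP_MASK and (b & FRAC_MASK) != 0

def is_snan(b: int) -> bool:
    return is_nan(b) and not (b & QUIET_BIT)

def max_dp(src1: int, src2: int):
    """Return (result_bits, vxsnan_flag) for one doubleword element."""
    if is_snan(src1):                   # SNaN row: T(Q(src1)), fx(VXSNAN)
        return src1 | QUIET_BIT, True
    if is_snan(src2):                   # SNaN column
        if is_nan(src1):                # QNaN row lists T(src1) in Table 82
            return src1, True
        return src2 | QUIET_BIT, True
    if is_nan(src1):                    # QNaN vs number: the number wins
        return (src1 if is_nan(src2) else src2), False
    if is_nan(src2):
        return src1, False
    a, b = bits_f64(src1), bits_f64(src2)
    if a == b:                          # covers +0 vs -0: prefer the positive sign
        return (src1 if (src1 >> 63) == 0 else src2), False
    return (src1 if a > b else src2), False
```

For example, `max_dp(f64_bits(2.0), 0x7FF0_0000_0000_0001)` returns the quieted pattern `0x7FF8_0000_0000_0001` together with a set VXSNAN indication. Whether the target register is actually updated then depends on VE, which this element-level sketch does not model.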
                                                              src2
src1       -Infinity       -NZF            -Zero           +Zero           +NZF            +Infinity       QNaN            SNaN
-Infinity  T(src1)         T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(Q(src2)), fx(VXSNAN)
-NZF       T(src1)         T(M(src1,src2)) T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(Q(src2)), fx(VXSNAN)
-Zero      T(src1)         T(src1)         T(src1)         T(src2)         T(src2)         T(src2)         T(src1)         T(Q(src2)), fx(VXSNAN)
+Zero      T(src1)         T(src1)         T(src1)         T(src1)         T(src2)         T(src2)         T(src1)         T(Q(src2)), fx(VXSNAN)
+NZF       T(src1)         T(src1)         T(src1)         T(src1)         T(M(src1,src2)) T(src2)         T(src1)         T(Q(src2)), fx(VXSNAN)
+Infinity  T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
QNaN       T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(src1), fx(VXSNAN)
SNaN       T(Q(src1)), fx(VXSNAN) in every column

Explanation:

src1     The single-precision floating-point value in word element i of VSR[XA] (where i ∈ {0,1,2,3}).
src2     The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
NZF      Nonzero finite number.
Q(x)     Return a QNaN with the payload of x.
M(x,y)   Return the greater of floating-point value x and floating-point value y.
T(x)     The value x is placed in word element i (i ∈ {0,1,2,3}) of VSR[XT] in single-precision format. FPRF, FR and FI are not modified.
fx(x)    If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN   Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.

Table 83. Actions for xvmaxsp

VSX Vector Minimum Double-Precision XX3-form

xvmindp XT,XA,XB (0xF000_0740)

    Field      60   T   A    B    232   AX   BX   TX
    Start bit  0    6   11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       src1 ← VSR[XA]{i:i+63}
       src2 ← VSR[XB]{i:i+63}
       result{i:i+63} ← MinimumDP(src1,src2)
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].

    Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].

    If src1 is less than src2, src1 is placed into doubleword element i of VSR[XT] in double-precision format. Otherwise, src2 is placed into doubleword element i of VSR[XT] in double-precision format.

    The minimum of +0 and -0 is -0. The minimum of a QNaN and any value is that value. The minimum of any value and an SNaN when VE=0 is that SNaN converted to a QNaN.

    See Table 84.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvmindp
    src1 = VSR[XA]   DP | DP
    src2 = VSR[XB]   DP | DP
    tgt  = VSR[XT]   DP | DP
    Bit boundaries: 0, 64, 127
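The whole-vector behaviour of xvmindp, including suppression of the target update when a trap-enabled exception occurs in any element, can be sketched as follows. This is a minimal model under stated assumptions: VSRs are represented as 128-bit Python integers with doubleword element 0 in the most-significant half, VE is passed in as a plain argument, and all function names (`xvmindp`, `min_dp`, `dp_bits`) are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1
QUIET = 1 << 51
EXP = 0x7FF0_0000_0000_0000

def dp_bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def _is_nan(b): return (b & EXP) == EXP and (b & ((1 << 52) - 1)) != 0
def _is_snan(b): return _is_nan(b) and not (b & QUIET)
def _val(b): return struct.unpack(">d", struct.pack(">Q", b))[0]

def min_dp(a: int, b: int):
    """One element of Table 84: returns (result_bits, vxsnan_flag)."""
    if _is_snan(a):
        return a | QUIET, True
    if _is_snan(b):                       # QNaN row keeps src1, otherwise Q(src2)
        return (a if _is_nan(a) else b | QUIET), True
    if _is_nan(a):
        return (a if _is_nan(b) else b), False
    if _is_nan(b):
        return a, False
    if _val(a) == _val(b):                # min(+0, -0) is -0
        return (a if a >> 63 else b), False
    return (a if _val(a) < _val(b) else b), False

def xvmindp(xt: int, xa: int, xb: int, ve: int):
    """Return (new VSR[XT], whether VXSNAN was raised in any element)."""
    result, ex_flag, vxsnan_any = 0, False, False
    for shift in (64, 0):                 # element 0 first, then element 1
        r, vxsnan = min_dp((xa >> shift) & MASK64, (xb >> shift) & MASK64)
        result |= r << shift
        vxsnan_any = vxsnan_any or vxsnan
        ex_flag = ex_flag or (bool(ve) and vxsnan)
    # With a trap-enabled exception in any element, VSR[XT] is left unchanged.
    return (xt if ex_flag else result), vxsnan_any
```

With VE=0 an SNaN in either source is quieted and written to the corresponding element; with VE=1 the same inputs leave the old contents of VSR[XT] in place, even for elements that had no exception.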
                                                              src2
src1       -Infinity       -NZF            -Zero           +Zero           +NZF            +Infinity       QNaN            SNaN
-Infinity  T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
-NZF       T(src2)         T(M(src1,src2)) T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
-Zero      T(src2)         T(src2)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
+Zero      T(src2)         T(src2)         T(src2)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
+NZF       T(src2)         T(src2)         T(src2)         T(src2)         T(M(src1,src2)) T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
+Infinity  T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
QNaN       T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(src1), fx(VXSNAN)
SNaN       T(Q(src1)), fx(VXSNAN) in every column

Explanation:

src1     The double-precision floating-point value in doubleword element i of VSR[XA] (where i ∈ {0,1}).
src2     The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
NZF      Nonzero finite number.
Q(x)     Return a QNaN with the payload of x.
M(x,y)   Return the lesser of floating-point value x and floating-point value y.
T(x)     The value x is placed in doubleword element i (i ∈ {0,1}) of VSR[XT] in double-precision format. FPRF, FR and FI are not modified.
fx(x)    If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN   Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.

Table 84. Actions for xvmindp

VSX Vector Minimum Single-Precision XX3-form

xvminsp XT,XA,XB (0xF000_0640)

    Field      60   T   A    B    200   AX   BX   TX
    Start bit  0    6   11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       src1 ← VSR[XA]{i:i+31}
       src2 ← VSR[XB]{i:i+31}
       result{i:i+31} ← MinimumSP(src1,src2)
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

    Let src1 be the single-precision floating-point operand in word element i of VSR[XA].

    Let src2 be the single-precision floating-point operand in word element i of VSR[XB].

    If src1 is less than src2, src1 is placed into word element i of VSR[XT] in single-precision format. Otherwise, src2 is placed into word element i of VSR[XT] in single-precision format.

    The minimum of +0 and -0 is -0. The minimum of a QNaN and any value is that value. The minimum of any value and an SNaN when VE=0 is that SNaN converted to a QNaN.

    See Table 85.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvminsp
    src1 = VSR[XA]   SP | SP | SP | SP
    src2 = VSR[XB]   SP | SP | SP | SP
    tgt  = VSR[XT]   SP | SP | SP | SP
    Bit boundaries: 0, 32, 64, 96, 127
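The word-element addressing used by the single-precision form (VSR{i:i+31} with bit 0 as the most-significant bit) and the signed-zero rule for the minimum can be illustrated with a short Python sketch. It assumes NaN-free operands so that no exception handling is needed, and the names `word_element`, `pack_sp` and `xvminsp_no_nan` are hypothetical.

```python
import struct

def word_element(vsr: int, i: int) -> int:
    """Bits 32*i .. 32*i+31 of a 128-bit VSR, where bit 0 is the most significant."""
    return (vsr >> (96 - 32 * i)) & 0xFFFF_FFFF

def sp_value(bits: int) -> float:
    """Numeric value of a 32-bit single-precision pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def pack_sp(w0: float, w1: float, w2: float, w3: float) -> int:
    """Build a 128-bit VSR image from four single-precision values."""
    return int.from_bytes(struct.pack(">4f", w0, w1, w2, w3), "big")

def xvminsp_no_nan(xa: int, xb: int) -> int:
    """Element-wise minimum for operands that contain no NaNs (assumption)."""
    result = 0
    for i in range(4):
        a, b = word_element(xa, i), word_element(xb, i)
        if sp_value(a) == sp_value(b):
            r = a if a >> 31 else b        # equal values: min(+0, -0) is -0
        else:
            r = a if sp_value(a) < sp_value(b) else b
        result |= r << (96 - 32 * i)
    return result
```

As a usage example, `xvminsp_no_nan(pack_sp(1.0, 0.0, -2.5, 4.0), pack_sp(3.0, -0.0, -2.0, float("-inf")))` yields the image of (1.0, -0.0, -2.5, -Infinity).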
                                                              src2
src1       -Infinity       -NZF            -Zero           +Zero           +NZF            +Infinity       QNaN            SNaN
-Infinity  T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
-NZF       T(src2)         T(M(src1,src2)) T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
-Zero      T(src2)         T(src2)         T(src1)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
+Zero      T(src2)         T(src2)         T(src2)         T(src1)         T(src1)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
+NZF       T(src2)         T(src2)         T(src2)         T(src2)         T(M(src1,src2)) T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
+Infinity  T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(src1)         T(Q(src2)), fx(VXSNAN)
QNaN       T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src2)         T(src1)         T(src1), fx(VXSNAN)
SNaN       T(Q(src1)), fx(VXSNAN) in every column

Explanation:

src1     The single-precision floating-point value in word element i of VSR[XA] (where i ∈ {0,1,2,3}).
src2     The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
NZF      Nonzero finite number.
Q(x)     Return a QNaN with the payload of x.
M(x,y)   Return the lesser of floating-point value x and floating-point value y.
T(x)     The value x is placed in word element i (i ∈ {0,1,2,3}) of VSR[XT] in single-precision format. FPRF, FR and FI are not modified.
fx(x)    If x is equal to 0, FX is set to 1. x is set to 1.
VXSNAN   Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.

Table 85. Actions for xvminsp

VSX Vector Multiply-Subtract Double-Precision XX3-form

xvmsubadp XT,XA,XB (0xF000_0388)

    Field      60   T   A    B    113   AX   BX   TX
    Start bit  0    6   11   16   21    29   30   31

xvmsubmdp XT,XA,XB (0xF000_03C8)

    Field      60   T   A    B    121   AX   BX   TX
    Start bit  0    6   11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       src1 ← VSR[XA]{i:i+63}
       src2 ← "xvmsubadp" ? VSR[XT]{i:i+63} : VSR[XB]{i:i+63}
       src3 ← "xvmsubadp" ? VSR[XB]{i:i+63} : VSR[XT]{i:i+63}
       v{0:inf} ← MultiplyAddDP(src1,src3,NegateDP(src2))
       result{i:i+63} ← RoundToDP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vximz_flag) then SetFX(VXIMZ)
       if(vxisi_flag) then SetFX(VXISI)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vximz_flag)
       ex_flag ← ex_flag | (VE & vxisi_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].

    For xvmsubadp, do the following.
      - Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XT].
      - Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XB].

    For xvmsubmdp, do the following.
      - Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].
      - Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XT].

    src1 is multiplied[1] by src3, producing a product having unbounded range and precision.

    See part 1 of Table 86.

    src2 is negated and added[2] to the product, producing a sum having unbounded range and precision.

    The sum is normalized[3].

    See part 2 of Table 86.

    The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

    The result is placed into doubleword element i of VSR[XT] in double-precision format.

    See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX XX VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

VSR Data Layout for xvmsub(a|m)dp
    src1 = VSR[XA]                         DP | DP
    src2 = xvmsubadp ? VSR[XT] : VSR[XB]   DP | DP
    src3 = xvmsubadp ? VSR[XB] : VSR[XT]   DP | DP
    tgt  = VSR[XT]                         DP | DP
    Bit boundaries: 0, 64, 127

Part 1: Multiply
                                                                      src3
src1       -Infinity         -NZF              -Zero             +Zero             +NZF              +Infinity         QNaN              SNaN
-Infinity  p ← +Infinity     p ← +Infinity     p ← dQNaN         p ← dQNaN         p ← -Infinity     p ← -Infinity     p ← src3          p ← Q(src3), vxsnan_flag ← 1
                                               vximz_flag ← 1    vximz_flag ← 1
-NZF       p ← +Infinity     p ← M(src1,src3)  p ← +Zero         p ← -Zero         p ← M(src1,src3)  p ← -Infinity     p ← src3          p ← Q(src3), vxsnan_flag ← 1
-Zero      p ← dQNaN         p ← +Zero         p ← +Zero         p ← -Zero         p ← -Zero         p ← dQNaN         p ← src3          p ← Q(src3), vxsnan_flag ← 1
           vximz_flag ← 1                                                                            vximz_flag ← 1
+Zero      p ← dQNaN         p ← -Zero         p ← -Zero         p ← +Zero         p ← +Zero         p ← dQNaN         p ← src3          p ← Q(src3), vxsnan_flag ← 1
           vximz_flag ← 1                                                                            vximz_flag ← 1
+NZF       p ← -Infinity     p ← M(src1,src3)  p ← -Zero         p ← +Zero         p ← M(src1,src3)  p ← +Infinity     p ← src3          p ← Q(src3), vxsnan_flag ← 1
+Infinity  p ← -Infinity     p ← -Infinity     p ← dQNaN         p ← dQNaN         p ← +Infinity     p ← +Infinity     p ← src3          p ← Q(src3), vxsnan_flag ← 1
                                               vximz_flag ← 1    vximz_flag ← 1
QNaN       p ← src1          p ← src1          p ← src1          p ← src1          p ← src1          p ← src1          p ← src1          p ← src1, vxsnan_flag ← 1
SNaN       p ← Q(src1), vxsnan_flag ← 1 in every column

Part 2: Subtract
                                                                           src2
p               -Infinity         -NZF              -Zero             +Zero             +NZF              +Infinity         QNaN              SNaN
-Infinity       v ← dQNaN         v ← -Infinity     v ← -Infinity     v ← -Infinity     v ← -Infinity     v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
                vxisi_flag ← 1
-NZF            v ← +Infinity     v ← S(p,src2)     v ← p             v ← p             v ← S(p,src2)     v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
-Zero           v ← +Infinity     v ← -src2         v ← Rezd          v ← -Zero         v ← -src2         v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
+Zero           v ← +Infinity     v ← -src2         v ← +Zero         v ← Rezd          v ← -src2         v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
+NZF            v ← +Infinity     v ← S(p,src2)     v ← p             v ← p             v ← S(p,src2)     v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
+Infinity       v ← +Infinity     v ← +Infinity     v ← +Infinity     v ← +Infinity     v ← +Infinity     v ← dQNaN         v ← src2          v ← Q(src2), vxsnan_flag ← 1
                                                                                                          vxisi_flag ← 1
QNaN &          v ← p             v ← p             v ← p             v ← p             v ← p             v ← p             v ← p             v ← p, vxsnan_flag ← 1
src1 is a NaN
QNaN &          v ← p             v ← p             v ← p             v ← p             v ← p             v ← p             v ← src2          v ← Q(src2), vxsnan_flag ← 1
src1 not a NaN

Explanation:

src1     The double-precision
floating-point value in doubleword element i of VSR[XA] (where i ∈ {0,1}).
src2     For xvmsubadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i ∈ {0,1}).
         For xvmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
src3     For xvmsubadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
         For xvmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i ∈ {0,1}).
dQNaN    Default quiet NaN (0x7FF8_0000_0000_0000).
NZF      Nonzero finite number.
Rezd     Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)     Return a QNaN with the payload of x.
S(x,y)   Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
         Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p        The intermediate product having unbounded range and precision.
v        The intermediate result having unbounded range and precision.

Table 86. Actions for xvmsub(a|m)dp

VSX Vector Multiply-Subtract Single-Precision XX3-form

xvmsubasp XT,XA,XB (0xF000_0288)

    Field      60   T   A    B    81    AX   BX   TX
    Start bit  0    6   11   16   21    29   30   31

xvmsubmsp XT,XA,XB (0xF000_02C8)

    Field      60   T   A    B    89    AX   BX   TX
    Start bit  0    6   11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       src1 ← VSR[XA]{i:i+31}
       src2 ← "xvmsubasp" ? VSR[XT]{i:i+31} : VSR[XB]{i:i+31}
       src3 ← "xvmsubasp" ? VSR[XB]{i:i+31} : VSR[XT]{i:i+31}
       v{0:inf} ← MultiplyAddSP(src1,src3,NegateSP(src2))
       result{i:i+31} ← RoundToSP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vximz_flag) then SetFX(VXIMZ)
       if(vxisi_flag) then SetFX(VXISI)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vximz_flag)
       ex_flag ← ex_flag | (VE & vxisi_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

    Let src1 be the single-precision floating-point operand in word element i of VSR[XA].

    For xvmsubasp, do the following.
      - Let src2 be the single-precision floating-point operand in word element i of VSR[XT].
      - Let src3 be the single-precision floating-point operand in word element i of VSR[XB].

    For xvmsubmsp, do the following.
      - Let src2 be the single-precision floating-point operand in word element i of VSR[XB].
      - Let src3 be the single-precision floating-point operand in word element i of VSR[XT].

    src1 is multiplied[1] by src3, producing a product having unbounded range and precision.

    See part 1 of Table 87.

    src2 is negated and added[2] to the product, producing a sum having unbounded range and precision.

    The sum is normalized[3].

    See part 2 of Table 87.

    The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

    The result is placed into word element i of VSR[XT] in single-precision format.

    See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX XX VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

VSR Data Layout for xvmsub(a|m)sp
    src1 = VSR[XA]                         SP | SP | SP | SP
    src2 = xvmsubasp ? VSR[XT] : VSR[XB]   SP | SP | SP | SP
    src3 = xvmsubasp ? VSR[XB] : VSR[XT]   SP | SP | SP | SP
    tgt  = VSR[XT]                         SP | SP | SP | SP
    Bit boundaries: 0, 32, 64, 96, 127

Part 1: Multiply
                                                                      src3
src1       -Infinity         -NZF              -Zero             +Zero             +NZF              +Infinity         QNaN              SNaN
-Infinity  p ← +Infinity     p ← +Infinity     p ← dQNaN         p ← dQNaN         p ← -Infinity     p ← -Infinity     p ← src3          p ← Q(src3), vxsnan_flag ← 1
                                               vximz_flag ← 1    vximz_flag ← 1
-NZF       p ← +Infinity     p ← M(src1,src3)  p ← +Zero         p ← -Zero         p ← M(src1,src3)  p ← -Infinity     p ← src3          p ← Q(src3), vxsnan_flag ← 1
-Zero      p ← dQNaN         p ← +Zero         p ← +Zero         p ← -Zero         p ← -Zero         p ← dQNaN         p ← src3          p ← Q(src3), vxsnan_flag ← 1
           vximz_flag ← 1                                                                            vximz_flag ← 1
+Zero      p ← dQNaN         p ← -Zero         p ← -Zero         p ← +Zero         p ← +Zero         p ← dQNaN         p ← src3          p ← Q(src3), vxsnan_flag ← 1
           vximz_flag ← 1                                                                            vximz_flag ← 1
+NZF       p ← -Infinity     p ← M(src1,src3)  p ← -Zero         p ← +Zero         p ← M(src1,src3)  p ← +Infinity     p ← src3          p ← Q(src3), vxsnan_flag ← 1
+Infinity  p ← -Infinity     p ← -Infinity     p ← dQNaN         p ← dQNaN         p ← +Infinity     p ← +Infinity     p ← src3          p ← Q(src3), vxsnan_flag ← 1
                                               vximz_flag ← 1    vximz_flag ← 1
QNaN       p ← src1          p ← src1          p ← src1          p ← src1          p ← src1          p ← src1          p ← src1          p ← src1, vxsnan_flag ← 1
SNaN       p ← Q(src1), vxsnan_flag ← 1 in every column

Part 2: Subtract
                                                                           src2
p               -Infinity         -NZF              -Zero             +Zero             +NZF              +Infinity         QNaN              SNaN
-Infinity       v ← dQNaN         v ← -Infinity     v ← -Infinity     v ← -Infinity     v ← -Infinity     v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
                vxisi_flag ← 1
-NZF            v ← +Infinity     v ← S(p,src2)     v ← p             v ← p             v ← S(p,src2)     v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
-Zero           v ← +Infinity     v ← -src2         v ← Rezd          v ← -Zero         v ← -src2         v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
+Zero           v ← +Infinity     v ← -src2         v ← +Zero         v ← Rezd          v ← -src2         v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
+NZF            v ← +Infinity     v ← S(p,src2)     v ← p             v ← p             v ← S(p,src2)     v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
+Infinity       v ← +Infinity     v ← +Infinity     v ← +Infinity     v ← +Infinity     v ← +Infinity     v ← dQNaN         v ← src2          v ← Q(src2), vxsnan_flag ← 1
                                                                                                          vxisi_flag ← 1
QNaN &          v ← p             v ← p             v ← p             v ← p             v ← p             v ← p             v ← p             v ← p, vxsnan_flag ← 1
src1 is a NaN
QNaN &          v ← p             v ← p             v ← p             v ← p             v ← p             v ← p             v ← src2          v ← Q(src2), vxsnan_flag ← 1
src1 not a NaN

Explanation:

src1     The single-precision floating-point value in word element i of VSR[XA] (where i ∈ {0,1,2,3}).
src2     For xvmsubasp, the single-precision floating-point value in word element i of VSR[XT] (where i ∈ {0,1,2,3}).
         For xvmsubmsp, the single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
src3     For xvmsubasp, the single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
         For xvmsubmsp, the single-precision floating-point value in word element i of VSR[XT] (where i ∈ {0,1,2,3}).
dQNaN    Default quiet NaN (0x7FC0_0000).
NZF      Nonzero finite number.
Rezd     Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)     Return a QNaN with the payload of x.
S(x,y)   Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
         Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p        The intermediate product having unbounded range and precision.
v        The intermediate result having unbounded range and precision.

Table 87. Actions for xvmsub(a|m)sp

VSX Vector Multiply Double-Precision XX3-form

xvmuldp XT,XA,XB (0xF000_0380)

    Field      60   T   A    B    112   AX   BX   TX
    Start bit  0    6   11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       src1 ← VSR[XA]{i:i+63}
       src3 ← VSR[XB]{i:i+63}
       v{0:inf} ← MultiplyDP(src1,src3)
       result{i:i+63} ← RoundToDP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vximz_flag) then SetFX(VXIMZ)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vximz_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].

    Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].

    src1 is multiplied[1] by src2, producing a product having unbounded range and precision.

    The product is normalized[2].

    See Table 88.

    The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

    The result is placed into doubleword element i of VSR[XT] in double-precision format.

    See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX XX VXSNAN VXIMZ

VSR Data Layout for xvmuldp
    src1 = VSR[XA]   DP | DP
    src2 = VSR[XB]   DP | DP
    tgt  = VSR[XT]   DP | DP
    Bit boundaries: 0, 64, 127

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

                                                                      src2
src1       -Infinity         -NZF              -Zero             +Zero             +NZF              +Infinity         QNaN              SNaN
-Infinity  v ← +Infinity     v ← +Infinity     v ← dQNaN         v ← dQNaN         v ← -Infinity     v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
                                               vximz_flag ← 1    vximz_flag ← 1
-NZF       v ← +Infinity     v ← M(src1,src2)  v ← +Zero         v ← -Zero         v ← M(src1,src2)  v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
-Zero      v ← dQNaN         v ← +Zero         v ← +Zero         v ← -Zero         v ← -Zero         v ← dQNaN         v ← src2          v ← Q(src2), vxsnan_flag ← 1
           vximz_flag ← 1                                                                            vximz_flag ← 1
+Zero      v ← dQNaN         v ← -Zero         v ← -Zero         v ← +Zero         v ← +Zero         v ← dQNaN         v ← src2          v ← Q(src2), vxsnan_flag ← 1
           vximz_flag ← 1                                                                            vximz_flag ← 1
+NZF       v ← -Infinity     v ← M(src1,src2)  v ← -Zero         v ← +Zero         v ← M(src1,src2)  v ← +Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
+Infinity  v ← -Infinity     v ← -Infinity     v ← dQNaN         v ← dQNaN         v ← +Infinity     v ← +Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
                                               vximz_flag ← 1    vximz_flag ← 1
QNaN       v ← src1          v ← src1          v ← src1          v ← src1          v ← src1          v ← src1          v ← src1          v ← src1, vxsnan_flag ← 1
SNaN       v ← Q(src1), vxsnan_flag ← 1 in every column

Explanation:

src1     The double-precision floating-point value in doubleword element i of VSR[XA] (where i ∈ {0,1}).
src2     The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
dQNaN    Default quiet NaN (0x7FF8_0000_0000_0000).
NZF      Nonzero finite number.
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
Q(x)     Return a QNaN with the payload of x.
v        The intermediate result having unbounded significand precision and unbounded exponent range.

Table 88. Actions for xvmuldp

VSX Vector Multiply Single-Precision XX3-form

xvmulsp XT,XA,XB (0xF000_0280)

    Field      60   T   A    B    80    AX   BX   TX
    Start bit  0    6   11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       src1 ← VSR[XA]{i:i+31}
       src3 ← VSR[XB]{i:i+31}
       v{0:inf} ← MultiplySP(src1,src3)
       result{i:i+31} ← RoundToSP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vximz_flag) then SetFX(VXIMZ)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vximz_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

    Let src1 be the single-precision floating-point operand in word element i of VSR[XA].

    Let src2 be the single-precision floating-point operand in word element i of VSR[XB].

    src1 is multiplied[1] by src2, producing a product having unbounded range and precision.

    The product is normalized[2].

    See Table 89.

    The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

    See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

    The result is placed into word element i of VSR[XT] in single-precision format.

    See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX XX VXSNAN VXIMZ

VSR Data Layout for xvmulsp
    src1 = VSR[XA]   SP | SP | SP | SP
    src2 = VSR[XB]   SP | SP | SP | SP
    tgt  = VSR[XT]   SP | SP | SP | SP
    Bit boundaries: 0, 32, 64, 96, 127

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

                                                                      src2
src1       -Infinity         -NZF              -Zero             +Zero             +NZF              +Infinity         QNaN              SNaN
-Infinity  v ← +Infinity     v ← +Infinity     v ← dQNaN         v ← dQNaN         v ← -Infinity     v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
                                               vximz_flag ← 1    vximz_flag ← 1
-NZF       v ← +Infinity     v ← M(src1,src2)  v ← +Zero         v ← -Zero         v ← M(src1,src2)  v ← -Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
-Zero      v ← dQNaN         v ← +Zero         v ← +Zero         v ← -Zero         v ← -Zero         v ← dQNaN         v ← src2          v ← Q(src2), vxsnan_flag ← 1
           vximz_flag ← 1                                                                            vximz_flag ← 1
+Zero      v ← dQNaN         v ← -Zero         v ← -Zero         v ← +Zero         v ← +Zero         v ← dQNaN         v ← src2          v ← Q(src2), vxsnan_flag ← 1
           vximz_flag ← 1                                                                            vximz_flag ← 1
+NZF       v ← -Infinity     v ← M(src1,src2)  v ← -Zero         v ← +Zero         v ← M(src1,src2)  v ← +Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
+Infinity  v ← -Infinity     v ← -Infinity     v ← dQNaN         v ← dQNaN         v ← +Infinity     v ← +Infinity     v ← src2          v ← Q(src2), vxsnan_flag ← 1
                                               vximz_flag ← 1    vximz_flag ← 1
QNaN       v ← src1          v ← src1          v ← src1          v ← src1          v ← src1          v ← src1          v ← src1          v ← src1, vxsnan_flag ← 1
SNaN       v ← Q(src1), vxsnan_flag ← 1 in every column

Explanation:

src1     The single-precision floating-point value in word element i of VSR[XA] (where i ∈ {0,1,2,3}).
src2     The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
dQNaN    Default quiet NaN (0x7FC0_0000).
NZF      Nonzero finite number.
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
Q(x)     Return a QNaN with the payload of x.
v        The intermediate result having unbounded significand precision and unbounded exponent range.

Table 89. Actions for xvmulsp

VSX Vector Negative Absolute Value Double-Precision XX2-form

xvnabsdp XT,XB (0xF000_07A4)

    Field      60   T   ///  B    489   BX   TX
    Start bit  0    6   11   16   21    30   31

    XT ← TX || T
    XB ← BX || B
    do i=0 to 127 by 64
       VSR[XT]{i:i+63} ← 0b1 || VSR[XB]{i+1:i+63}
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

    The contents of doubleword element i of VSR[XB], with bit 0 set to 1, are placed into doubleword element i of VSR[XT].

Special Registers Altered:
    None

VSR Data Layout for xvnabsdp
    src = VSR[XB]   DP | DP
    tgt = VSR[XT]   DP | DP
    Bit boundaries: 0, 64, 127

VSX Vector Negative Absolute Value Single-Precision XX2-form

xvnabssp XT,XB (0xF000_06A4)

    Field      60   T   ///  B    425   BX   TX
    Start bit  0    6   11   16   21    30   31

    XT ← TX || T
    XB ← BX || B
    do i=0 to 127 by 32
       VSR[XT]{i:i+31} ← 0b1 || VSR[XB]{i+1:i+31}
    end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

    The contents of word element i of VSR[XB], with bit 0 set to 1, are placed into word element i of VSR[XT].

Special Registers Altered:
    None

VSR Data Layout for xvnabssp
    src = VSR[XB]   SP | SP | SP | SP
    tgt = VSR[XT]   SP | SP | SP | SP
    Bit boundaries: 0, 32, 64, 96, 127
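Because xvnabsdp and xvnabssp only force bit 0 of each element to 1, they reduce to a single OR with a sign-bit mask. The Python sketch below illustrates this on 128-bit integers. The mask names and the integer representation of a VSR (bit 0 of the register as the most-significant bit of the integer) are assumptions made for illustration.

```python
import struct

# Sign-bit positions: bit 0 of each element, counted from the most-significant end.
DP_SIGN_MASK = (1 << 127) | (1 << 63)
SP_SIGN_MASK = sum(1 << (127 - 32 * i) for i in range(4))

def xvnabsdp(xb: int) -> int:
    """Set the sign bit of both doubleword elements; all other bits are copied."""
    return xb | DP_SIGN_MASK

def xvnabssp(xb: int) -> int:
    """Set the sign bit of all four word elements; all other bits are copied."""
    return xb | SP_SIGN_MASK

# Usage: (1.5, -2.0) becomes (-1.5, -2.0); no exception flags are involved.
vsr = int.from_bytes(struct.pack(">2d", 1.5, -2.0), "big")
print(struct.unpack(">2d", xvnabsdp(vsr).to_bytes(16, "big")))
```

Since the operation is purely bitwise, NaN payloads pass through unchanged and only the sign bit differs, which is consistent with "Special Registers Altered: None".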
VSX Vector Negate Double-Precision XX2-form

xvnegdp XT,XB (0xF000_07E4)

   Field:  60 | T | /// | B  | 505 | BX | TX
   Bit:    0    6   11    16   21    30   31

XT ← TX || T
XB ← BX || B
do i=0 to 127 by 64
   VSR[XT]{i:i+63} ← ~VSR[XB]{i} || VSR[XB]{i+1:i+63}
end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
   The contents of doubleword element i of VSR[XB], with bit 0 complemented, is placed into doubleword element i of VSR[XT].

Special Registers Altered:
   None

VSR Data Layout for xvnegdp
   src = VSR[XB]   DP | DP
   tgt = VSR[XT]   DP | DP
   Bit:            0    64   127

VSX Vector Negate Single-Precision XX2-form

xvnegsp XT,XB (0xF000_06E4)

   Field:  60 | T | /// | B  | 441 | BX | TX
   Bit:    0    6   11    16   21    30   31

XT ← TX || T
XB ← BX || B
do i=0 to 127 by 32
   VSR[XT]{i:i+31} ← ~VSR[XB]{i} || VSR[XB]{i+1:i+31}
end

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
   The contents of word element i of VSR[XB], with bit 0 complemented, is placed into word element i of VSR[XT].

Special Registers Altered:
   None

VSR Data Layout for xvnegsp
   src = VSR[XB]   SP | SP | SP | SP
   tgt = VSR[XT]   SP | SP | SP | SP
   Bit:            0    32   64   96   127

VSX Vector Negative Multiply-Add Double-Precision XX3-form

xvnmaddadp XT,XA,XB (0xF000_0708)

   Field:  60 | T | A  | B  | 225 | AX | BX | TX
   Bit:    0    6   11   16   21    29   30   31

xvnmaddmdp XT,XA,XB (0xF000_0748)

   Field:  60 | T | A  | B  | 233 | AX | BX | TX
   Bit:    0    6   11   16   21    29   30   31

XT ← TX || T
XA ← AX || A
XB ← BX || B
ex_flag ← 0b0
do i=0 to 127 by 64
   reset_xflags()
   src1 ← VSR[XA]{i:i+63}
   src2 ← "xvnmaddadp" ? VSR[XT]{i:i+63} : VSR[XB]{i:i+63}
   src3 ← "xvnmaddadp" ? VSR[XB]{i:i+63} : VSR[XT]{i:i+63}
   v{0:inf} ← MultiplyAddDP(src1,src3,src2)
   result{i:i+63} ← NegateDP(RoundToDP(RN,v))
   if(vxsnan_flag) then SetFX(VXSNAN)
   if(vximz_flag) then SetFX(VXIMZ)
   if(vxisi_flag) then SetFX(VXISI)
   if(ox_flag) then SetFX(OX)
   if(ux_flag) then SetFX(UX)
   if(xx_flag) then SetFX(XX)
   ex_flag ← ex_flag | (VE & vxsnan_flag)
   ex_flag ← ex_flag | (VE & vximz_flag)
   ex_flag ← ex_flag | (VE & vxisi_flag)
   ex_flag ← ex_flag | (OE & ox_flag)
   ex_flag ← ex_flag | (UE & ux_flag)
   ex_flag ← ex_flag | (XE & xx_flag)
end
if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

   Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].

   For xvnmaddadp, do the following.
   - Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XT].
   - Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XB].

   For xvnmaddmdp, do the following.
   - Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].
   - Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XT].

   src1 is multiplied[1] by src3, producing a product having unbounded range and precision.

   See part 1 of Table 90.

   src2 is added[2] to the product, producing a sum having unbounded range and precision.

   The sum is normalized[3].

   See part 2 of Table 90.

   The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

   See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

   The result is negated and placed into doubleword element i of VSR[XT] in double-precision format.

   See Table 91, "Vector Floating-Point Final Result with Negation," on page 466.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
   FX OX UX XX VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands.
The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo- nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermedi- ate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted. Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 463 Version 2.06 VSR Data Layout for xvnmadd(a|m)dp src1 = VSR[XA] DP DP src2 = xsmaddadp ? VSR[XT] : VSR[XB] DP DP src3 = xsmaddadp ? VSR[XB] : VSR[XT] DP DP tgt = VSR[XT] DP DP 0 64 127 464 Power ISATM Book I Version 2.06 Part 1: src3 Multiply ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN p dQNaN p dQNaN p Q(src3) ­Infinity p +Infinity p +Infinity p ­Infinity p ­Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p Q(src3) ­NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) ­Zero p +Zero p +Zero p ­Zero p ­Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Zero p ­Zero p ­Zero p +Zero p +Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 src1 p Q(src3) +NZF p ­Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Infinity p ­Infinity p +Infinity p +Infinity p +Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p src1 QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1 vxsnan_flag 1 p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) SNaN vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 Part 2: src2 Add 
­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN v dQNaN v Q(src2) ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v src2 vxisi_flag 1 vxsnan_flag 1 v Q(src2) ­NZF v ­Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2 vxsnan_flag 1 v Q(src2) ­Zero v ­Infinity v src2 v ­Zero v Rezd v src2 v +Infinity v src2 vxsnan_flag 1 v Q(src2) +Zero v ­Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2 vxsnan_flag 1 p v Q(src2) +NZF v ­Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2 vxsnan_flag 1 v dQNaN v Q(src2) +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2 vxisi_flag 1 vxsnan_flag 1 QNaN & vp vp vp vp vp vp vp vp src1 is a NaN vxsnan_flag 1 QNaN & v Q(src2) vp vp vp vp vp vp v src2 src1 not a NaN vxsnan_flag 1 Explanation: src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). src2 For xvnmaddadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). For xvnmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). src3 For xvnmaddadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvnmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). dQNaN Default quiet NaN (0x7FF8_0000_0000_0000). NZF Nonzero finite number. Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Q(x) Return a QNaN with the payload of x. A(x,y) Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). M(x,y) Return the product of floating-point value x and floating-point value y, having unbounded range and precision. 
p        The intermediate product having unbounded range and precision.
v        The intermediate result having unbounded range and precision.

Table 90. Actions for xvnmadd(a|m)dp

Column key for Table 91:
   vxsnan, vximz, vxisi   vxsnan_flag, vximz_flag, vxisi_flag
   r-inx                  Is r inexact? (r ≠ v)
   r-inc                  Is r incremented? (|r| > |v|)
   q-inx                  Is q inexact? (q ≠ v)
   q-inc                  Is q incremented? (|q| > |v|)

Case      VE  OE  UE  ZE  XE  vxsnan  vximz  vxisi  r-inx  r-inc  q-inx  q-inc  Returned Results and Status Setting
Special   -   -   -   -   -   0       0      0      -      -      -      -      T(N(r))
          0   -   -   -   -   -       -      1      -      -      -      -      T(r), fx(VXISI)
          0   -   -   -   -   0       1      -      -      -      -      -      T(r), fx(VXIMZ)
          0   -   -   -   -   1       0      -      -      -      -      -      T(r), fx(VXSNAN)
          0   -   -   -   -   1       1      -      -      -      -      -      T(r), fx(VXSNAN), fx(VXIMZ)
          1   -   -   -   -   -       -      1      -      -      -      -      fx(VXISI), error()
          1   -   -   -   -   0       1      -      -      -      -      -      fx(VXIMZ), error()
          1   -   -   -   -   1       0      -      -      -      -      -      fx(VXSNAN), error()
          1   -   -   -   -   1       1      -      -      -      -      -      fx(VXSNAN), fx(VXIMZ), error()
Normal    -   -   -   -   -   -       -      -      no     -      -      -      T(N(r))
          -   -   -   -   0   -       -      -      yes    no     -      -      T(N(r)), fx(XX)
          -   -   -   -   0   -       -      -      yes    yes    -      -      T(N(r)), fx(XX)
          -   -   -   -   1   -       -      -      yes    no     -      -      T(N(r)), fx(XX), error()
          -   -   -   -   1   -       -      -      yes    yes    -      -      T(N(r)), fx(XX), error()
Overflow  -   0   -   -   0   -       -      -      -      -      -      -      T(N(r)), fx(OX), fx(XX)
          -   0   -   -   1   -       -      -      -      -      -      -      T(N(r)), fx(OX), fx(XX), error()
          -   1   -   -   -   -       -      -      -      -      no     -      fx(OX), error()
          -   1   -   -   -   -       -      -      -      -      yes    no     fx(OX), fx(XX), error()
          -   1   -   -   -   -       -      -      -      -      yes    yes    fx(OX), fx(XX), error()
Tiny      -   -   0   -   -   -       -      -      no     -      -      -      T(N(r))
          -   -   0   -   0   -       -      -      yes    no     -      -      T(N(r)), fx(UX), fx(XX)
          -   -   0   -   0   -       -      -      yes    yes    -      -      T(N(r)), fx(UX), fx(XX)
          -   -   0   -   1   -       -      -      yes    no     -      -      T(N(r)), fx(UX), fx(XX), error()
          -   -   0   -   1   -       -      -      yes    yes    -      -      T(N(r)), fx(UX), fx(XX), error()
          -   -   1   -   -   -       -      -      yes    -      no     -      fx(UX), error()
          -   -   1   -   -   -       -      -      yes    -      yes    no     fx(UX), fx(XX), error()
          -   -   1   -   -   -       -      -      yes    -      yes    yes    fx(UX), fx(XX), error()

Explanation:
-        The results do not depend on this condition.
fx(x)    FX is set to 1 if x=0. x is set to 1.
q        The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, significand rounded to the target precision, unbounded exponent range.
r        The value defined in Table 46, "Floating-Point Intermediate Result Handling," on page 344, significand rounded to the target precision, bounded exponent range.
v        The precise intermediate result defined in the instruction having unbounded significand precision, unbounded exponent range.
FI       Floating-Point Fraction Inexact status flag, FPSCR[FI]. This status flag is nonsticky.
FR       Floating-Point Fraction Rounded status flag, FPSCR[FR].
OX       Floating-Point Overflow Exception status flag, FPSCR[OX].
error()  The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements.
N(x)     The value x is negated by complementing the sign bit of x.
T(x)     The value x is placed in element i of VSR[XT] in the target precision format (where i ∈ {0,1} for results with 64-bit elements, and i ∈ {0,1,2,3} for results with 32-bit elements).
UX       Floating-Point Underflow Exception status flag, FPSCR[UX].
VXSNAN   Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCR[VXSNAN].
VXIMZ    Floating-Point Invalid Operation Exception (Infinity × Zero) status flag, FPSCR[VXIMZ].
VXISI    Floating-Point Invalid Operation Exception (Infinity - Infinity) status flag, FPSCR[VXISI].
XX       Floating-Point Inexact Exception status flag, FPSCR[XX]. The flag is a sticky version of FPSCR[FI]. When FPSCR[FI] is set to a new value, the new value of FPSCR[XX] is set to the result of ORing the old value of FPSCR[XX] with the new value of FPSCR[FI].

Table 91. Vector Floating-Point Final Result with Negation
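The fx(x) and XX entries in the explanation of Table 91 describe sticky-flag bookkeeping that is easy to misread. The sketch below illustrates just those two rules. The class FpscrModel and its method names are hypothetical, only the status bits named in Table 91 are modeled, and nothing here is meant to describe how an implementation stores the FPSCR.

```python
class FpscrModel:
    """Toy model of the FPSCR status bits named in Table 91 (hypothetical helper)."""

    STICKY = ("OX", "UX", "XX", "VXSNAN", "VXIMZ", "VXISI")

    def __init__(self):
        self.FX = 0  # exception summary
        self.FI = 0  # fraction inexact (nonsticky)
        for name in self.STICKY:
            setattr(self, name, 0)

    def fx(self, name):
        """fx(x): FX is set to 1 if x = 0, then x is set to 1."""
        if getattr(self, name) == 0:
            self.FX = 1
        setattr(self, name, 1)

    def set_fi(self, value):
        """Setting FI ORs the new FI value into the sticky XX flag."""
        self.FI = value
        self.XX |= value


fpscr = FpscrModel()
fpscr.fx("XX")           # XX goes 0 -> 1, so FX is raised
first = (fpscr.FX, fpscr.XX)
fpscr.FX = 0             # software clears the summary bit only
fpscr.fx("XX")           # XX was already 1, so FX stays 0
second = (fpscr.FX, fpscr.XX)
```

The second call shows why FX is described as conditional on x=0: FX records a 0-to-1 transition of an exception bit, not the mere occurrence of the exception.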
VSX Vector Negative Multiply-Add Single-Precision XX3-form

xvnmaddasp XT,XA,XB (0xF000_0608)

   Field:  60 | T | A  | B  | 193 | AX | BX | TX
   Bit:    0    6   11   16   21    29   30   31

xvnmaddmsp XT,XA,XB (0xF000_0648)

   Field:  60 | T | A  | B  | 201 | AX | BX | TX
   Bit:    0    6   11   16   21    29   30   31

XT ← TX || T
XA ← AX || A
XB ← BX || B
ex_flag ← 0b0
do i=0 to 127 by 32
   reset_xflags()
   src1 ← VSR[XA]{i:i+31}
   src2 ← "xvnmaddasp" ? VSR[XT]{i:i+31} : VSR[XB]{i:i+31}
   src3 ← "xvnmaddasp" ? VSR[XB]{i:i+31} : VSR[XT]{i:i+31}
   v{0:inf} ← MultiplyAddSP(src1,src3,src2)
   result{i:i+31} ← NegateSP(RoundToSP(RN,v))
   if(vxsnan_flag) then SetFX(VXSNAN)
   if(vximz_flag) then SetFX(VXIMZ)
   if(vxisi_flag) then SetFX(VXISI)
   if(ox_flag) then SetFX(OX)
   if(ux_flag) then SetFX(UX)
   if(xx_flag) then SetFX(XX)
   ex_flag ← ex_flag | (VE & vxsnan_flag)
   ex_flag ← ex_flag | (VE & vximz_flag)
   ex_flag ← ex_flag | (VE & vxisi_flag)
   ex_flag ← ex_flag | (OE & ox_flag)
   ex_flag ← ex_flag | (UE & ux_flag)
   ex_flag ← ex_flag | (XE & xx_flag)
end
if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

   Let src1 be the single-precision floating-point operand in word element i of VSR[XA].

   For xvnmaddasp, do the following.
   - Let src2 be the single-precision floating-point operand in word element i of VSR[XT].
   - Let src3 be the single-precision floating-point operand in word element i of VSR[XB].

   For xvnmaddmsp, do the following.
   - Let src2 be the single-precision floating-point operand in word element i of VSR[XB].
   - Let src3 be the single-precision floating-point operand in word element i of VSR[XT].

   src1 is multiplied[1] by src3, producing a product having unbounded range and precision.

   See part 1 of Table 92.

   src2 is added[2] to the product, producing a sum having unbounded range and precision.

   The sum is normalized[3].

   See part 2 of Table 92.

   The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

   See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

   The result is negated and placed into word element i of VSR[XT] in single-precision format.

   See Table 91, "Vector Floating-Point Final Result with Negation," on page 466.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
   FX OX UX XX VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

VSR Data Layout for xvnmadd(a|m)sp
   src1 = VSR[XA]                             SP | SP | SP | SP
   src2 = xvnmaddasp ? VSR[XT] : VSR[XB]      SP | SP | SP | SP
   src3 = xvnmaddasp ? VSR[XB] : VSR[XT]      SP | SP | SP | SP
   tgt  = VSR[XT]                             SP | SP | SP | SP
   Bit:                                       0    32   64   96   127
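The operand selection for the A-form and M-form, and the single rounding of the unbounded-precision sum, can be sketched with exact rational arithmetic. This is an illustrative model only, under several assumptions: operands are finite, results are nonzero and within the normal single-precision range (so it models the value q of Table 46 rather than r), the rounding mode is Round to Nearest Even, and no exception flags are produced. The function names round_to_precision, nmadd_lane, and xvnmadd_sp_model are hypothetical.

```python
from fractions import Fraction

def round_to_precision(v, p):
    """Round exact value v to a p-bit significand (ties to even), unbounded exponent."""
    if v == 0:
        return Fraction(0)
    m = abs(v)
    # First guess for the exponent that puts the significand in [2**(p-1), 2**p).
    e = m.numerator.bit_length() - m.denominator.bit_length() - p
    scaled = m / Fraction(2) ** e
    while scaled >= 2 ** p:
        scaled /= 2
        e += 1
    while scaled < 2 ** (p - 1):
        scaled *= 2
        e -= 1
    n = round(scaled)  # round() on a Fraction rounds half to even
    q = n * Fraction(2) ** e
    return -q if v < 0 else q

def nmadd_lane(src1, src2, src3, p=24):
    """-(round(src1*src3 + src2)) with one rounding, p=24 for single-precision."""
    v = Fraction(src1) * Fraction(src3) + Fraction(src2)
    return -float(round_to_precision(v, p))

def xvnmadd_sp_model(xt, xa, xb, form):
    """form 'a': src2=XT, src3=XB (xvnmaddasp); form 'm': src2=XB, src3=XT (xvnmaddmsp)."""
    out = []
    for t, a, b in zip(xt, xa, xb):
        src2, src3 = (t, b) if form == "a" else (b, t)
        out.append(nmadd_lane(a, src2, src3))
    return out
```

With XT = 1.0, XA = 2.0, and XB = 3.0 in every lane, the A-form computes -(2×3 + 1) and the M-form computes -(2×1 + 3). The two forms differ only in which of XT and XB is the addend, so a compiler can pick whichever form overwrites the operand it no longer needs.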
Vector-Scalar Floating-Point Operations [Category: VSX] 469 Version 2.06 Part 1: src3 Multiply ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN p dQNaN p dQNaN p Q(src3) ­Infinity p +Infinity p +Infinity p ­Infinity p ­Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p Q(src3) ­NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) ­Zero p +Zero p +Zero p ­Zero p ­Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Zero p ­Zero p ­Zero p +Zero p +Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 src1 p Q(src3) +NZF p ­Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Infinity p ­Infinity p +Infinity p +Infinity p +Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p src1 QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1 vxsnan_flag 1 p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) SNaN vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 Part 2: src2 Add ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN v dQNaN v Q(src2) ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v src2 vxisi_flag 1 vxsnan_flag 1 v Q(src2) ­NZF v ­Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2 vxsnan_flag 1 v Q(src2) ­Zero v ­Infinity v src2 v ­Zero v Rezd v src2 v +Infinity v src2 vxsnan_flag 1 v Q(src2) +Zero v ­Infinity v src2 v Rezd v +Zero v src2 v +Infinity v src2 vxsnan_flag 1 p v Q(src2) +NZF v ­Infinity v A(p,src2) vp vp v A(p,src2) v +Infinity v src2 vxsnan_flag 1 v dQNaN v Q(src2) +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2 vxisi_flag 1 vxsnan_flag 1 QNaN & vp vp vp vp vp vp vp vp src1 is a NaN vxsnan_flag 1 QNaN & v Q(src2) vp vp vp vp vp vp v src2 src1 not a NaN vxsnan_flag 1 Explanation: src1 The single-precision floating-point value in word element i 
of VSR[XA] (where i ∈ {0,1,2,3}).
src2     For xvnmaddasp, the single-precision floating-point value in word element i of VSR[XT] (where i ∈ {0,1,2,3}). For xvnmaddmsp, the single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
src3     For xvnmaddasp, the single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}). For xvnmaddmsp, the single-precision floating-point value in word element i of VSR[XT] (where i ∈ {0,1,2,3}).
dQNaN    Default quiet NaN (0x7FC0_0000).
NZF      Nonzero finite number.
Rezd     Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)     Return a QNaN with the payload of x.
A(x,y)   Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p        The intermediate product having unbounded range and precision.
v        The intermediate result having unbounded range and precision.

Table 92. Actions for xvnmadd(a|m)sp

VSX Vector Negative Multiply-Subtract Double-Precision XX3-form

xvnmsubadp XT,XA,XB (0xF000_0788)

   Field:  60 | T | A  | B  | 241 | AX | BX | TX
   Bit:    0    6   11   16   21    29   30   31

xvnmsubmdp XT,XA,XB (0xF000_07C8)

   Field:  60 | T | A  | B  | 249 | AX | BX | TX
   Bit:    0    6   11   16   21    29   30   31

XT ← TX || T
XA ← AX || A
XB ← BX || B
ex_flag ← 0b0
do i=0 to 127 by 64
   reset_xflags()
   src1 ← VSR[XA]{i:i+63}
   src2 ← "xvnmsubadp" ? VSR[XT]{i:i+63} : VSR[XB]{i:i+63}
   src3 ← "xvnmsubadp" ? VSR[XB]{i:i+63} : VSR[XT]{i:i+63}
   v{0:inf} ← MultiplyAddDP(src1,src3,NegateDP(src2))
   result{i:i+63} ← NegateDP(RoundToDP(RN,v))
   if(vxsnan_flag) then SetFX(VXSNAN)
   if(vximz_flag) then SetFX(VXIMZ)
   if(vxisi_flag) then SetFX(VXISI)
   if(ox_flag) then SetFX(OX)
   if(ux_flag) then SetFX(UX)
   if(xx_flag) then SetFX(XX)
   ex_flag ← ex_flag | (VE & vxsnan_flag)
   ex_flag ← ex_flag | (VE & vximz_flag)
   ex_flag ← ex_flag | (VE & vxisi_flag)
   ex_flag ← ex_flag | (OE & ox_flag)
   ex_flag ← ex_flag | (UE & ux_flag)
   ex_flag ← ex_flag | (XE & xx_flag)
end
if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

   Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].

   For xvnmsubadp, do the following.
   - Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XT].
   - Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XB].

   For xvnmsubmdp, do the following.
   - Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].
   - Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XT].

   src1 is multiplied[1] by src3, producing a product having unbounded range and precision.

   See part 1 of Table 93.

   src2 is negated and added[2] to the product, producing a sum having unbounded range and precision.

   The sum is normalized[3].

   See part 2 of Table 93.

   The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

   See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

   The result is negated and placed into doubleword element i of VSR[XT] in double-precision format.

   See Table 91, "Vector Floating-Point Final Result with Negation," on page 466.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
   FX OX UX XX VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands.
The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two expo- nents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermedi- ate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted. Chapter 7. Vector-Scalar Floating-Point Operations [Category: VSX] 471 Version 2.06 VSR Data Layout for xvnmsub(a|m)dp src1 = VSR[XA] DP DP src2 = xvnmsubadp ? VSR[XT] : VSR[XB] DP DP src3 = xvnmsubadp ? VSR[XB] : VSR[XB] DP DP tgt = VSR[XT] DP DP 0 64 127 472 Power ISATM Book I Version 2.06 Part 1: src3 Multiply ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN p dQNaN p dQNaN p Q(src3) ­Infinity p +Infinity p +Infinity p ­Infinity p ­Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p Q(src3) ­NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) ­Zero p +Zero p +Zero p ­Zero p ­Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Zero p ­Zero p ­Zero p +Zero p +Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 src1 p Q(src3) +NZF p ­Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Infinity p ­Infinity p +Infinity p +Infinity p +Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p src1 QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1 vxsnan_flag 1 p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) SNaN vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 Part 2: src2 
Subtract ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN v dQNaN v Q(src2) ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v src2 vxisi_flag 1 vxsnan_flag 1 v Q(src2) ­NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v ­Infinity v src2 vxsnan_flag 1 v Q(src2) ­Zero v +Infinity v ­src2 v ­Zero v Rezd v ­src2 v ­Infinity v src2 vxsnan_flag 1 v Q(src2) +Zero v +Infinity v ­src2 v Rezd v +Zero v ­src2 v ­Infinity v src2 vxsnan_flag 1 p v Q(src2) +NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v ­Infinity v src2 vxsnan_flag 1 v dQNaN v Q(src2) +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2 vxisi_flag 1 vxsnan_flag 1 QNaN & vp vp vp vp vp vp vp vp src1 is a NaN vxsnan_flag 1 QNaN & v Q(src2) vp vp vp vp vp vp v src2 src1 not a NaN vxsnan_flag 1 Explanation: src1 The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). src2 For xvnmsubadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). For xvnmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). src3 For xvnmsubadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvnmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). dQNaN Default quiet NaN (0x7FF8_0000_0000_0000). NZF Nonzero finite number. Rezd Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Q(x) Return a QNaN with the payload of x. S(x,y) Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). 
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p        The intermediate product having unbounded range and precision.
v        The intermediate result having unbounded range and precision.

Table 93. Actions for xvnmsub(a|m)dp

VSX Vector Negative Multiply-Subtract Single-Precision XX3-form

xvnmsubasp XT,XA,XB (0xF000_0688)

   Field:  60 | T | A  | B  | 209 | AX | BX | TX
   Bit:    0    6   11   16   21    29   30   31

xvnmsubmsp XT,XA,XB (0xF000_06C8)

   Field:  60 | T | A  | B  | 217 | AX | BX | TX
   Bit:    0    6   11   16   21    29   30   31

XT ← TX || T
XA ← AX || A
XB ← BX || B
ex_flag ← 0b0
do i=0 to 127 by 32
   reset_xflags()
   src1 ← VSR[XA]{i:i+31}
   src2 ← "xvnmsubasp" ? VSR[XT]{i:i+31} : VSR[XB]{i:i+31}
   src3 ← "xvnmsubasp" ? VSR[XB]{i:i+31} : VSR[XT]{i:i+31}
   v{0:inf} ← MultiplyAddSP(src1,src3,NegateSP(src2))
   result{i:i+31} ← NegateSP(RoundToSP(RN,v))
   if(vxsnan_flag) then SetFX(VXSNAN)
   if(vximz_flag) then SetFX(VXIMZ)
   if(vxisi_flag) then SetFX(VXISI)
   if(ox_flag) then SetFX(OX)
   if(ux_flag) then SetFX(UX)
   if(xx_flag) then SetFX(XX)
   ex_flag ← ex_flag | (VE & vxsnan_flag)
   ex_flag ← ex_flag | (VE & vximz_flag)
   ex_flag ← ex_flag | (VE & vxisi_flag)
   ex_flag ← ex_flag | (OE & ox_flag)
   ex_flag ← ex_flag | (UE & ux_flag)
   ex_flag ← ex_flag | (XE & xx_flag)
end
if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.

   Let src1 be the single-precision floating-point operand in word element i of VSR[XA].

   For xvnmsubasp, do the following.
   - Let src2 be the single-precision floating-point operand in word element i of VSR[XT].
   - Let src3 be the single-precision floating-point operand in word element i of VSR[XB].

   For xvnmsubmsp, do the following.
   - Let src2 be the single-precision floating-point operand in word element i of VSR[XB].
   - Let src3 be the single-precision floating-point operand in word element i of VSR[XT].

   src1 is multiplied[1] by src3, producing a product having unbounded range and precision.

   See part 1 of Table 94.

   src2 is negated and added[2] to the product, producing a sum having unbounded range and precision.

   The sum is normalized[3].

   See part 2 of Table 94.

   The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.

   See Table 46, "Floating-Point Intermediate Result Handling," on page 344.

   The result is negated and placed into word element i of VSR[XT] in single-precision format.

   See Table 91, "Vector Floating-Point Final Result with Negation," on page 466.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
   FX OX UX XX VXSNAN VXISI VXIMZ

1. Floating-point multiplication is based on exponent addition and multiplication of the significands.
2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

VSR Data Layout for xvnmsub(a|m)sp
   src1 = VSR[XA]                             SP | SP | SP | SP
   src2 = xvnmsubasp ? VSR[XT] : VSR[XB]      SP | SP | SP | SP
   src3 = xvnmsubasp ? VSR[XB] : VSR[XT]      SP | SP | SP | SP
   tgt  = VSR[XT]                             SP | SP | SP | SP
   Bit:                                       0    32   64   96   127
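The ex_flag accumulation in the RTL means the target register update is all-or-nothing: one enabled exception in any lane suppresses the write for every lane. The sketch below illustrates this for the two invalid-operation cases that depend only on operand classes (Infinity × Zero and Infinity - Infinity). It is a simplified illustration, not the architected behavior in full: NaN operands, overflow, underflow, and inexact are left out, and the names invalid_flags_nmsub and writeback are hypothetical.

```python
import math

def invalid_flags_nmsub(src1, src2, src3):
    """Return (vximz_flag, vxisi_flag) for one lane of -(src1*src3 - src2).

    NaN operands are outside the scope of this sketch.
    """
    # Infinity x Zero in the multiply step.
    vximz = (math.isinf(src1) and src3 == 0.0) or (src1 == 0.0 and math.isinf(src3))
    # The unbounded-range product is infinite only if an operand is infinite.
    product_is_inf = (not vximz) and (math.isinf(src1) or math.isinf(src3))
    product_negative = (math.copysign(1.0, src1) < 0) != (math.copysign(1.0, src3) < 0)
    # Subtracting an infinity of the same sign as the product is Infinity - Infinity.
    vxisi = product_is_inf and math.isinf(src2) and product_negative == (src2 < 0)
    return vximz, vxisi

def writeback(xt_old, lane_results, lane_flags, ve):
    """Return the new VSR[XT] lanes: unchanged if any enabled invalid operation occurred."""
    ex_flag = 0
    for vximz, vxisi in lane_flags:
        ex_flag |= ve & int(vximz)
        ex_flag |= ve & int(vxisi)
    return list(xt_old) if ex_flag else list(lane_results)
```

With VE=1, a single offending lane leaves every lane of VSR[XT] at its old value. With VE=0 all lanes are written, and the offending lane would hold the default quiet NaN per the Actions table.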
Vector-Scalar Floating-Point Operations [Category: VSX] 475 Version 2.06 Part 1: src3 Multiply ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN p dQNaN p dQNaN p Q(src3) ­Infinity p +Infinity p +Infinity p ­Infinity p ­Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p Q(src3) ­NZF p +Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) ­Zero p +Zero p +Zero p ­Zero p ­Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Zero p ­Zero p ­Zero p +Zero p +Zero p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 src1 p Q(src3) +NZF p ­Infinity p M(src1,src3) p src1 p src1 p M(src1,src3) p +Infinity p src3 vxsnan_flag 1 p dQNaN p dQNaN p Q(src3) +Infinity p ­Infinity p +Infinity p +Infinity p +Infinity p src3 vximz_flag 1 vximz_flag 1 vxsnan_flag 1 p src1 QNaN p src1 p src1 p src1 p src1 p src1 p src1 p src1 vxsnan_flag 1 p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) p Q(src1) SNaN vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 vxsnan_flag 1 Part 2: src2 Subtract ­Infinity ­NZF ­Zero +Zero +NZF +Infinity QNaN SNaN v dQNaN v Q(src2) ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v ­Infinity v src2 vxisi_flag 1 vxsnan_flag 1 v Q(src2) ­NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v ­Infinity v src2 vxsnan_flag 1 v Q(src2) ­Zero v +Infinity v ­src2 v ­Zero v Rezd v ­src2 v ­Infinity v src2 vxsnan_flag 1 v Q(src2) +Zero v +Infinity v ­src2 v Rezd v +Zero v ­src2 v ­Infinity v src2 vxsnan_flag 1 p v Q(src2) +NZF v +Infinity v S(p,src2) vp vp v S(p,src2) v ­Infinity v src2 vxsnan_flag 1 v dQNaN v Q(src2) +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src2 vxisi_flag 1 vxsnan_flag 1 QNaN & vp vp vp vp vp vp vp vp src1 is a NaN vxsnan_flag 1 QNaN & v Q(src2) vp vp vp vp vp vp v src2 src1 not a NaN vxsnan_flag 1 Explanation: src1 The single-precision floating-point value in word 
element i of VSR[XA] (where i ∈ {0,1,2,3}).
src2     For xvnmsubasp, the single-precision floating-point value in word element i of VSR[XT] (where i ∈ {0,1,2,3}). For xvnmsubmsp, the single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
src3     For xvnmsubasp, the single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}). For xvnmsubmsp, the single-precision floating-point value in word element i of VSR[XT] (where i ∈ {0,1,2,3}).
dQNaN    Default quiet NaN (0x7FC0_0000).
NZF      Nonzero finite number.
Rezd     Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands.
Q(x)     Return a QNaN with the payload of x.
S(x,y)   Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
M(x,y)   Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision.
p        The intermediate product having unbounded range and precision.
v        The intermediate result having unbounded range and precision.

Table 94. Actions for xvnmsub(a|m)sp

VSX Vector Round to Double-Precision Integer using round to Nearest Away XX2-form

xvrdpi XT,XB (0xF000_0324)

   Field:  60 | T | /// | B  | 201 | BX | TX
   Bit:    0    6   11    16   21    30   31

XT ← TX || T
XB ← BX || B
ex_flag ← 0b0
do i=0 to 127 by 64
   reset_xflags()
   result{i:i+63} ← RoundToDPIntegerNearAway(VSR[XB]{i:i+63})
   if(vxsnan_flag) then SetFX(VXSNAN)
   ex_flag ← ex_flag | (VE & vxsnan_flag)
end
if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.

   Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].

   src is rounded to an integer using the rounding mode Round to Nearest Away.

   The result is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
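Round to Nearest Away differs from the more familiar round-half-to-even only on exact halfway cases, which it moves away from zero. The following is a small sketch of that rule for one doubleword element. The function name is borrowed from the RTL for readability, but this Python version is an assumption-laden illustration: it handles finite values, infinities, and quiet NaNs only, and it does not model SNaN detection or any FPSCR update.

```python
import math

def round_to_dp_integer_near_away(x):
    """Round a double to the nearest integral double, ties away from zero."""
    if math.isnan(x) or math.isinf(x):
        return x  # SNaN quieting and VXSNAN are not modeled here
    a = abs(x)
    f = math.floor(a)      # exact integer below |x|
    if a - f >= 0.5:       # the fractional part of a double is exactly representable
        f += 1
    return math.copysign(float(f), x)  # keeps the sign, including for -0.0

def xvrdpi_model(xb):
    """Apply the rounding to each of the two doubleword elements."""
    return tuple(round_to_dp_integer_near_away(x) for x in xb)
```

For comparison, Python's built-in round(2.5) returns 2 (ties to even), whereas this mode gives 3.0. The explicit comparison of the fractional part against 0.5 avoids the double-rounding error that adding 0.5 and flooring would introduce for inputs just below one half.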
Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvrdpi
    src = VSR[XB]:  | DP | DP |
    tgt = VSR[XT]:  | DP | DP |
    bits:           0    64   127

VSX Vector Round to Double-Precision Integer Exact using Current rounding mode XX2-form

xvrdpic XT,XB (0xF000_03AC)

    | 60 | T | /// | B  | 235 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       src{0:63} ← VSR[XB]{i:i+63}
       if(RN=0b00) then result{i:i+63} ← RoundToDPIntegerNearEven(src)
       if(RN=0b01) then result{i:i+63} ← RoundToDPIntegerTrunc(src)
       if(RN=0b10) then result{i:i+63} ← RoundToDPIntegerCeil(src)
       if(RN=0b11) then result{i:i+63} ← RoundToDPIntegerFloor(src)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    src is rounded to an integer using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.
    The result is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN

VSR Data Layout for xvrdpic
    src = VSR[XB]:  | DP | DP |
    tgt = VSR[XT]:  | DP | DP |
    bits:           0    64   127

VSX Vector Round to Double-Precision Integer using round toward -Infinity XX2-form

xvrdpim XT,XB (0xF000_03E4)

    | 60 | T | /// | B  | 249 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       result{i:i+63} ← RoundToDPIntegerFloor(VSR[XB]{i:i+63})
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    src is rounded to an integer using the rounding mode Round toward -Infinity.
    The result is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvrdpim
    src = VSR[XB]:  | DP | DP |
    tgt = VSR[XT]:  | DP | DP |
    bits:           0    64   127

VSX Vector Round to Double-Precision Integer using round toward +Infinity XX2-form

xvrdpip XT,XB (0xF000_03A4)

    | 60 | T | /// | B  | 233 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       result{i:i+63} ← RoundToDPIntegerCeil(VSR[XB]{i:i+63})
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    src is rounded to an integer using the rounding mode Round toward +Infinity.
    The result is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvrdpip
    src = VSR[XB]:  | DP | DP |
    tgt = VSR[XT]:  | DP | DP |
    bits:           0    64   127

VSX Vector Round to Double-Precision Integer using round toward Zero XX2-form

xvrdpiz XT,XB (0xF000_0364)

    | 60 | T | /// | B  | 217 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       result{i:i+63} ← RoundToDPIntegerTrunc(VSR[XB]{i:i+63})
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    src is rounded to an integer using the rounding mode Round toward Zero.
    The result is placed into doubleword element i of VSR[XT] in double-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvrdpiz
    src = VSR[XB]:  | DP | DP |
    tgt = VSR[XT]:  | DP | DP |
    bits:           0    64   127

VSX Vector Reciprocal Estimate Double-Precision XX2-form

xvredp XT,XB (0xF000_0368)

    | 60 | T | /// | B  | 218 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       v{0:inf} ← ReciprocalEstimateDP(VSR[XB]{i:i+63})
       result{i:i+63} ← RoundToDP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(zx_flag) then SetFX(ZX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (ZE & zx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    A double-precision floating-point estimate of the reciprocal of src is placed into doubleword element i of VSR[XT] in double-precision format.
    Unless the reciprocal of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of src. That is,

    \[ \left| \frac{\mathrm{estimate} - \frac{1}{\mathrm{src}}}{\frac{1}{\mathrm{src}}} \right| \le \frac{1}{16384} \]

Operation with various special values of the operand is summarized below.

    Source Value | Result        | Exception
    -Infinity    | -Zero         | None
    -Zero        | -Infinity (1) | ZX
    +Zero        | +Infinity (1) | ZX
    +Infinity    | +Zero         | None
    SNaN         | QNaN (2)      | VXSNAN
    QNaN         | QNaN          | None

    1. No result if ZE=1.
    2. No result if VE=1.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

The results of executing this instruction are permitted to vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FX OX UX ZX VXSNAN

VSR Data Layout for xvredp
    src = VSR[XB]:  | DP | DP |
    tgt = VSR[XT]:  | DP | DP |
    bits:           0    64   127

VSX Vector Reciprocal Estimate Single-Precision XX2-form

xvresp XT,XB (0xF000_0268)

    | 60 | T | /// | B  | 154 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       v{0:inf} ← ReciprocalEstimateSP(VSR[XB]{i:i+31})
       result{i:i+31} ← RoundToSP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(zx_flag) then SetFX(ZX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (ZE & zx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    A single-precision floating-point estimate of the reciprocal of src is placed into word element i of VSR[XT] in single-precision format.
    Unless the reciprocal of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of src. That is,

    \[ \left| \frac{\mathrm{estimate} - \frac{1}{\mathrm{src}}}{\frac{1}{\mathrm{src}}} \right| \le \frac{1}{16384} \]

Operation with various special values of the operand is summarized below.

    Source Value | Result        | Exception
    -Infinity    | -Zero         | None
    -Zero        | -Infinity (1) | ZX
    +Zero        | +Infinity (1) | ZX
    +Infinity    | +Zero         | None
    SNaN         | QNaN (2)      | VXSNAN
    QNaN         | QNaN          | None

    1. No result if ZE=1.
    2. No result if VE=1.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

The results of executing this instruction are permitted to vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FX OX UX ZX VXSNAN

VSR Data Layout for xvresp
    src = VSR[XB]:  | SP | SP | SP | SP |
    tgt = VSR[XT]:  | SP | SP | SP | SP |
    bits:           0    32   64   96   127

VSX Vector Round to Single-Precision Integer using round to Nearest Away XX2-form

xvrspi XT,XB (0xF000_0224)

    | 60 | T | /// | B  | 137 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       result{i:i+31} ← RoundToSPIntegerNearAway(VSR[XB]{i:i+31})
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    src is rounded to an integer using the rounding mode Round to Nearest Away.
    The result is placed into word element i of VSR[XT] in single-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvrspi
    src = VSR[XB]:  | SP | SP | SP | SP |
    tgt = VSR[XT]:  | SP | SP | SP | SP |
    bits:           0    32   64   96   127

VSX Vector Round to Single-Precision Integer Exact using Current rounding mode XX2-form

xvrspic XT,XB (0xF000_02AC)

    | 60 | T | /// | B  | 171 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       src{0:31} ← VSR[XB]{i:i+31}
       if(RN=0b00) then result{i:i+31} ← RoundToSPIntegerNearEven(src)
       if(RN=0b01) then result{i:i+31} ← RoundToSPIntegerTrunc(src)
       if(RN=0b10) then result{i:i+31} ← RoundToSPIntegerCeil(src)
       if(RN=0b11) then result{i:i+31} ← RoundToSPIntegerFloor(src)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    src is rounded to an integer value using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.
    The result is placed into word element i of VSR[XT] in single-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN

VSR Data Layout for xvrspic
    src = VSR[XB]:  | SP | SP | SP | SP |
    tgt = VSR[XT]:  | SP | SP | SP | SP |
    bits:           0    32   64   96   127

VSX Vector Round to Single-Precision Integer using round toward -Infinity XX2-form

xvrspim XT,XB (0xF000_02E4)

    | 60 | T | /// | B  | 185 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       result{i:i+31} ← RoundToSPIntegerFloor(VSR[XB]{i:i+31})
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    src is rounded to an integer using the rounding mode Round toward -Infinity.
    The result is placed into word element i of VSR[XT] in single-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvrspim
    src = VSR[XB]:  | SP | SP | SP | SP |
    tgt = VSR[XT]:  | SP | SP | SP | SP |
    bits:           0    32   64   96   127

VSX Vector Round to Single-Precision Integer using round toward +Infinity XX2-form

xvrspip XT,XB (0xF000_02A4)

    | 60 | T | /// | B  | 169 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       result{i:i+31} ← RoundToSPIntegerCeil(VSR[XB]{i:i+31})
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    src is rounded to an integer using the rounding mode Round toward +Infinity.
    The result is placed into word element i of VSR[XT] in single-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvrspip
    src = VSR[XB]:  | SP | SP | SP | SP |
    tgt = VSR[XT]:  | SP | SP | SP | SP |
    bits:           0    32   64   96   127

VSX Vector Round to Single-Precision Integer using round toward Zero XX2-form

xvrspiz XT,XB (0xF000_0264)

    | 60 | T | /// | B  | 153 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       result{i:i+31} ← RoundToSPIntegerTrunc(VSR[XB]{i:i+31})
       if(vxsnan_flag) then SetFX(VXSNAN)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    src is rounded to an integer using the rounding mode Round toward Zero.
    The result is placed into word element i of VSR[XT] in single-precision format.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX VXSNAN

VSR Data Layout for xvrspiz
    src = VSR[XB]:  | SP | SP | SP | SP |
    tgt = VSR[XT]:  | SP | SP | SP | SP |
    bits:           0    32   64   96   127

VSX Vector Reciprocal Square Root Estimate Double-Precision XX2-form

xvrsqrtedp XT,XB (0xF000_0328)

    | 60 | T | /// | B  | 202 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       v{0:inf} ← RecipSquareRootEstimateDP(VSR[XB]{i:i+63})
       result{i:i+63} ← RoundToDP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxsqrt_flag) then SetFX(VXSQRT)
       if(zx_flag) then SetFX(ZX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxsqrt_flag)
       ex_flag ← ex_flag | (ZE & zx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    A double-precision floating-point estimate of the reciprocal square root of src is placed into doubleword element i of VSR[XT] in double-precision format.
    Unless the reciprocal of the square root of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of the square root of src. That is,

    \[ \left| \frac{\mathrm{estimate} - \frac{1}{\sqrt{\mathrm{src}}}}{\frac{1}{\sqrt{\mathrm{src}}}} \right| \le \frac{1}{16384} \]

Operation with various special values of the operand is summarized below.

    Source Value | Result        | Exception
    -Infinity    | QNaN (1)      | VXSQRT
    +Infinity    | +Zero         | None
    -Finite      | QNaN (1)      | VXSQRT
    -Zero        | -Infinity (2) | ZX
    +Zero        | +Infinity (2) | ZX
    SNaN         | QNaN (1)      | VXSNAN
    QNaN         | QNaN          | None

    1. No result if VE=1.
    2. No result if ZE=1.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

The results of executing this instruction are permitted to vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FX ZX VXSNAN VXSQRT

VSR Data Layout for xvrsqrtedp
    src = VSR[XB]:  | DP | DP |
    tgt = VSR[XT]:  | DP | DP |
    bits:           0    64   127

VSX Vector Reciprocal Square Root Estimate Single-Precision XX2-form

xvrsqrtesp XT,XB (0xF000_0228)

    | 60 | T | /// | B  | 138 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       v{0:inf} ← RecipSquareRootEstimateSP(VSR[XB]{i:i+31})
       result{i:i+31} ← RoundToSP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxsqrt_flag) then SetFX(VXSQRT)
       if(zx_flag) then SetFX(ZX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxsqrt_flag)
       ex_flag ← ex_flag | (ZE & zx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    A single-precision floating-point estimate of the reciprocal square root of src is placed into word element i of VSR[XT] in single-precision format.
    Unless the reciprocal of the square root of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of the square root of src. That is,

    \[ \left| \frac{\mathrm{estimate} - \frac{1}{\sqrt{\mathrm{src}}}}{\frac{1}{\sqrt{\mathrm{src}}}} \right| \le \frac{1}{16384} \]

Operation with various special values of the operand is summarized below.

    Source Value | Result        | Exception
    -Infinity    | QNaN (1)      | VXSQRT
    +Infinity    | +Zero         | None
    -Finite      | QNaN (1)      | VXSQRT
    -Zero        | -Infinity (2) | ZX
    +Zero        | +Infinity (2) | ZX
    SNaN         | QNaN (1)      | VXSNAN
    QNaN         | QNaN          | None

    1. No result if VE=1.
    2. No result if ZE=1.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

The results of executing this instruction are permitted to vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FX ZX VXSNAN VXSQRT

VSR Data Layout for xvrsqrtesp
    src = VSR[XB]:  | SP | SP | SP | SP |
    tgt = VSR[XT]:  | SP | SP | SP | SP |
    bits:           0    32   64   96   127

VSX Vector Square Root Double-Precision XX2-form

xvsqrtdp XT,XB (0xF000_032C)

    | 60 | T | /// | B  | 203 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       v{0:inf} ← SquareRootDP(VSR[XB]{i:i+63})
       result{i:i+63} ← RoundToDP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxsqrt_flag) then SetFX(VXSQRT)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxsqrt_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB].
    The unbounded-precision square root of src is produced. See Table 95.
    The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, "Floating-Point Intermediate Result Handling," on page 344.
    The result is placed into doubleword element i of VSR[XT] in double-precision format. See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXSQRT

VSR Data Layout for xvsqrtdp
    src = VSR[XB]:  | DP | DP |
    tgt = VSR[XT]:  | DP | DP |
    bits:           0    64   127

    src       | Action
    -Infinity | v ← dQNaN, vxsqrt_flag ← 1
    -NZF      | v ← dQNaN, vxsqrt_flag ← 1
    -Zero     | v ← -Zero
    +Zero     | v ← +Zero
    +NZF      | v ← SQRT(src)
    +Infinity | v ← +Infinity
    QNaN      | v ← src
    SNaN      | v ← Q(src), vxsnan_flag ← 1

Explanation:
    src       The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
    dQNaN     Default quiet NaN (0x7FF8_0000_0000_0000).
    NZF       Nonzero finite number.
    SQRT(x)   The unbounded-precision square root of the floating-point value x.
    Q(x)      Return a QNaN with the payload of x.
    v         The intermediate result having unbounded significand precision and unbounded exponent range.

Table 95. Actions for xvsqrtdp

VSX Vector Square Root Single-Precision XX2-form

xvsqrtsp XT,XB (0xF000_022C)

    | 60 | T | /// | B  | 139 | BX | TX |
    0    6   11    16   21    30   31

    XT ← TX || T
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       v{0:inf} ← SquareRootSP(VSR[XB]{i:i+31})
       result{i:i+31} ← RoundToSP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxsqrt_flag) then SetFX(VXSQRT)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxsqrt_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src be the single-precision floating-point operand in word element i of VSR[XB].
    The unbounded-precision square root of src is produced. See Table 96.
    The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, "Floating-Point Intermediate Result Handling," on page 344.
    The result is placed into word element i of VSR[XT] in single-precision format. See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX XX VXSNAN VXSQRT

VSR Data Layout for xvsqrtsp
    src = VSR[XB]:  | SP | SP | SP | SP |
    tgt = VSR[XT]:  | SP | SP | SP | SP |
    bits:           0    32   64   96   127

    src       | Action
    -Infinity | v ← dQNaN, vxsqrt_flag ← 1
    -NZF      | v ← dQNaN, vxsqrt_flag ← 1
    -Zero     | v ← -Zero
    +Zero     | v ← +Zero
    +NZF      | v ← SQRT(src)
    +Infinity | v ← +Infinity
    QNaN      | v ← src
    SNaN      | v ← Q(src), vxsnan_flag ← 1

Explanation:
    src       The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
    dQNaN     Default quiet NaN (0x7FC0_0000).
    NZF       Nonzero finite number.
    SQRT(x)   The unbounded-precision square root of the floating-point value x.
    Q(x)      Return a QNaN with the payload of x.
    v         The intermediate result having unbounded significand precision and unbounded exponent range.

Table 96. Actions for xvsqrtsp

VSX Vector Subtract Double-Precision XX3-form

xvsubdp XT,XA,XB (0xF000_0340)

    | 60 | T | A  | B  | 104 | AX | BX | TX |
    0    6   11   16   21    29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 64
       reset_xflags()
       src1 ← VSR[XA]{i:i+63}
       src2 ← VSR[XB]{i:i+63}
       v{0:inf} ← AddDP(src1,NegateDP(src2))
       result{i:i+63} ← RoundToDP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxisi_flag) then SetFX(VXISI)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxisi_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 1, do the following.
    Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].
    Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].
    src2 is negated and added [1] to src1, producing a sum having unbounded range and precision.
    The sum is normalized [2]. See Table 97.
    The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, "Floating-Point Intermediate Result Handling," on page 344.
    The result is placed into doubleword element i of VSR[XT] in double-precision format. See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX XX VXSNAN VXISI

VSR Data Layout for xvsubdp
    src1 = VSR[XA]:  | DP | DP |
    src2 = VSR[XB]:  | DP | DP |
    tgt  = VSR[XT]:  | DP | DP |
    bits:            0    64   127

    1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
    2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

src1 \ src2 | -Infinity | -NZF | -Zero | +Zero | +NZF | +Infinity | QNaN | SNaN
-Infinity | v ← dQNaN, vxisi_flag ← 1 | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
-NZF | v ← +Infinity | v ← S(src1,src2) | v ← src1 | v ← src1 | v ← S(src1,src2) | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
-Zero | v ← +Infinity | v ← -src2 | v ← Rezd | v ← -Zero | v ← -src2 | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
+Zero | v ← +Infinity | v ← -src2 | v ← +Zero | v ← Rezd | v ← -src2 | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
+NZF | v ← +Infinity | v ← S(src1,src2) | v ← src1 | v ← src1 | v ← S(src1,src2) | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
+Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← dQNaN, vxisi_flag ← 1 | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
QNaN | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1, vxsnan_flag ← 1
SNaN | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1

Explanation:
    src1     The double-precision floating-point value in doubleword element i of VSR[XA] (where i ∈ {0,1}).
    src2     The double-precision floating-point value in doubleword element i of VSR[XB] (where i ∈ {0,1}).
    dQNaN    Default quiet NaN (0x7FF8_0000_0000_0000).
    NZF      Nonzero finite number.
    Rezd     Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
    S(x,y)   Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = y, v is considered to be an exact-zero-difference result (Rezd).
    Q(x)     Return a QNaN with the payload of x.
    v        The intermediate result having unbounded significand precision and unbounded exponent range.
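A few of the special-operand actions for vector subtraction can be checked against ordinary IEEE 754 binary64 arithmetic. The following Python sketch is illustrative only: `classify` and `sub_element` are hypothetical helper names, Python floats stand in for one double-precision element, the default round-to-nearest mode is assumed, and NaN payload propagation (the Q(x) entries), FPSCR updates, and trap suppression are not modeled.

```python
import math

def classify(x: float) -> str:
    """Name the operand class of x using the labels of the actions table."""
    if math.isnan(x):
        return "NaN"
    sign = "-" if math.copysign(1.0, x) < 0 else "+"
    if math.isinf(x):
        return sign + "Infinity"
    if x == 0.0:
        return sign + "Zero"
    return sign + "NZF"

def sub_element(src1: float, src2: float):
    """Return (src1 - src2, vxisi_flag) for a single element."""
    # Subtracting infinities of the same sign is the invalid case (VXISI).
    vxisi = math.isinf(src1) and math.isinf(src2) and src1 == src2
    return src1 - src2, vxisi

inf = math.inf
result, vxisi = sub_element(inf, inf)
print(classify(result), vxisi)           # invalid: Infinity minus Infinity
print(classify(sub_element(1.5, 1.5)[0]))  # exact-zero difference of two NZF operands
```

Under round-to-nearest an exact-zero difference of two equal nonzero operands comes out as +Zero; the sign of such a result depends on the rounding mode, which is why the table treats Rezd as a separate case.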
Table 97. Actions for xvsubdp

VSX Vector Subtract Single-Precision XX3-form

xvsubsp XT,XA,XB (0xF000_0240)

    | 60 | T | A  | B  | 72 | AX | BX | TX |
    0    6   11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    ex_flag ← 0b0
    do i=0 to 127 by 32
       reset_xflags()
       src1 ← VSR[XA]{i:i+31}
       src2 ← VSR[XB]{i:i+31}
       v{0:inf} ← AddSP(src1,NegateSP(src2))
       result{i:i+31} ← RoundToSP(RN,v)
       if(vxsnan_flag) then SetFX(VXSNAN)
       if(vxisi_flag) then SetFX(VXISI)
       if(ox_flag) then SetFX(OX)
       if(ux_flag) then SetFX(UX)
       if(xx_flag) then SetFX(XX)
       ex_flag ← ex_flag | (VE & vxsnan_flag)
       ex_flag ← ex_flag | (VE & vxisi_flag)
       ex_flag ← ex_flag | (OE & ox_flag)
       ex_flag ← ex_flag | (UE & ux_flag)
       ex_flag ← ex_flag | (XE & xx_flag)
    end
    if( ex_flag = 0 ) then VSR[XT] ← result

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

For each vector element i from 0 to 3, do the following.
    Let src1 be the single-precision floating-point operand in word element i of VSR[XA].
    Let src2 be the single-precision floating-point operand in word element i of VSR[XB].
    src2 is negated and added [1] to src1, producing a sum having unbounded range and precision.
    The sum is normalized [2]. See Table 98.
    The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, "Floating-Point Intermediate Result Handling," on page 344.
    The result is placed into word element i of VSR[XT] in single-precision format. See Table 68, "Vector Floating-Point Final Result," on page 400.

If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].

Special Registers Altered:
    FX OX UX XX VXSNAN VXISI

VSR Data Layout for xvsubsp
    src1 = VSR[XA]:  | SP | SP | SP | SP |
    src2 = VSR[XB]:  | SP | SP | SP | SP |
    tgt  = VSR[XT]:  | SP | SP | SP | SP |
    bits:            0    32   64   96   127

    1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.
    2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.

src1 \ src2 | -Infinity | -NZF | -Zero | +Zero | +NZF | +Infinity | QNaN | SNaN
-Infinity | v ← dQNaN, vxisi_flag ← 1 | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
-NZF | v ← +Infinity | v ← S(src1,src2) | v ← src1 | v ← src1 | v ← S(src1,src2) | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
-Zero | v ← +Infinity | v ← -src2 | v ← Rezd | v ← -Zero | v ← -src2 | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
+Zero | v ← +Infinity | v ← -src2 | v ← +Zero | v ← Rezd | v ← -src2 | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
+NZF | v ← +Infinity | v ← S(src1,src2) | v ← src1 | v ← src1 | v ← S(src1,src2) | v ← -Infinity | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
+Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← +Infinity | v ← dQNaN, vxisi_flag ← 1 | v ← src2 | v ← Q(src2), vxsnan_flag ← 1
QNaN | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1 | v ← src1, vxsnan_flag ← 1
SNaN | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1 | v ← Q(src1), vxsnan_flag ← 1

Explanation:
    src1     The single-precision floating-point value in word element i of VSR[XA] (where i ∈ {0,1,2,3}).
    src2     The single-precision floating-point value in word element i of VSR[XB] (where i ∈ {0,1,2,3}).
    dQNaN    Default quiet NaN (0x7FC0_0000).
    NZF      Nonzero finite number.
Rezd    Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs).
S(x,y)  Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision.
        Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd).
Q(x)    Return a QNaN with the payload of x.
v       The intermediate result having unbounded significand precision and unbounded exponent range.

Table 98. Actions for xvsubsp


VSX Vector Test for software Divide Double-Precision XX3-form

xvtdivdp BF,XA,XB (0xF000_03E8)

    60   BF   //   A    B    125  AX   BX   /
    0    6    9    11   16   21   29   30   31

    XA      ← AX || A
    XB      ← BX || B
    fe_flag ← 0b0
    fg_flag ← 0b0
    do i=0 to 127 by 64
       src1    ← VSR[XA]{i:i+63}
       src2    ← VSR[XB]{i:i+63}
       e_a     ← src1{1:11} - 1023
       e_b     ← src2{1:11} - 1023
       fe_flag ← fe_flag | IsNaN(src1) | IsInf(src1) |
                 IsNaN(src2) | IsInf(src2) | IsZero(src2) |
                 ( e_b <= -1022 ) |
                 ( e_b >= 1021 ) |
                 ( !IsZero(src1) & ( (e_a - e_b) >= 1023 ) ) |
                 ( !IsZero(src1) & ( (e_a - e_b) <= -1021 ) ) |
                 ( !IsZero(src1) & ( e_a <= -970 ) )
       fg_flag ← fg_flag | IsInf(src1) | IsInf(src2) |
                 IsZero(src2) | IsDen(src2)
    end
    fl_flag ← xvredp_error() <= 2^-14
    CR[BF]  ← 0b1 || fg_flag || fe_flag || 0b0

Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

fe_flag is initialized to 0.
fg_flag is initialized to 0.

For each vector element i from 0 to 1, do the following.

    Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA].
    Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB].
    Let e_a be the unbiased exponent of src1.
    Let e_b be the unbiased exponent of src2.

    fe_flag is set to 1 for any of the following conditions.
    - src1 is a NaN or an infinity.
    - src2 is a zero, a NaN, or an infinity.
    - e_b is less than or equal to -1022.
    - e_b is greater than or equal to 1021.
    - src1 is not a zero and the difference, e_a - e_b, is greater than or equal to 1023.
    - src1 is not a zero and the difference, e_a - e_b, is less than or equal to -1021.
    - src1 is not a zero and e_a is less than or equal to -970.

    fg_flag is set to 1 for any of the following conditions.
    - src1 is an infinity.
    - src2 is a zero, an infinity, or a denormalized value.

CR field BF is set to the value 0b1 || fg_flag || fe_flag || 0b0.

Special Registers Altered:
    CR[BF]

VSR Data Layout for xvtdivdp

    src1 = VSR[XA]   |   DP   |   DP   |
    src2 = VSR[XB]   |   DP   |   DP   |
                     0        64     127


VSX Vector Test for software Divide Single-Precision XX3-form

xvtdivsp BF,XA,XB (0xF000_02E8)

    60   BF   //   A    B    93   AX   BX   /
    0    6    9    11   16   21   29   30   31

    XA      ← AX || A
    XB      ← BX || B
    fe_flag ← 0b0
    fg_flag ← 0b0
    do i=0 to 127 by 32
       src1    ← VSR[XA]{i:i+31}
       src2    ← VSR[XB]{i:i+31}
       e_a     ← src1{1:8} - 127
       e_b     ← src2{1:8} - 127
       fe_flag ← fe_flag | IsNaN(src1) | IsInf(src1) |
                 IsNaN(src2) | IsInf(src2) | IsZero(src2) |
                 ( e_b <= -126 ) |
                 ( e_b >= 125 ) |
                 ( !IsZero(src1) & ( (e_a - e_b) >= 127 ) ) |
                 ( !IsZero(src1) & ( (e_a - e_b) <= -125 ) ) |
                 ( !IsZero(src1) & ( e_a <= -103 ) )
       fg_flag ← fg_flag | IsInf(src1) | IsInf(src2) |
                 IsZero(src2) | IsDen(src2)
    end
    fl_flag ← xvredp_error() <= 2^-14
    CR[BF]  ← 0b1 || fg_flag || fe_flag || 0b0

CR field BF is set to the value 0b1 || fg_flag || fe_flag || 0b0.

fg_flag is set to 1 for any of the following conditions.
- src1 is an infinity.
- src2 is a zero, an infinity, or a denormalized value.

Special Registers Altered:
    CR[BF]

VSR Data Layout for xvtdivsp

    src1 = VSR[XA]   |  SP  |  SP  |  SP  |  SP  |
    src2 = VSR[XB]   |  SP  |  SP  |  SP  |  SP  |
                     0      32     64     96   127

Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

fe_flag is initialized to 0.
fg_flag is initialized to 0.

For each vector element i from 0 to 3, do the following.

    Let src1 be the single-precision floating-point operand in word element i of VSR[XA].
    Let src2 be the single-precision floating-point operand in word element i of VSR[XB].
    Let e_a be the unbiased exponent of src1.
    Let e_b be the unbiased exponent of src2.

    fe_flag is set to 1 for any of the following conditions.
    - src1 is a NaN or an infinity.
    - src2 is a zero, a NaN, or an infinity.
    - e_b is less than or equal to -126.
    - e_b is greater than or equal to 125.
    - src1 is not a zero and the difference, e_a - e_b, is greater than or equal to 127.
    - src1 is not a zero and the difference, e_a - e_b, is less than or equal to -125.
    - src1 is not a zero and e_a is less than or equal to -103.


VSX Vector Test for software Square Root Double-Precision XX2-form

xvtsqrtdp BF,XB (0xF000_03A8)

    60   BF   //   ///  B    234  BX   /
    0    6    9    11   16   21   30   31

    XB      ← BX || B
    fe_flag ← 0b0
    fg_flag ← 0b0
    do i=0 to 127 by 64
       src     ← VSR[XB]{i:i+63}
       e_b     ← src{1:11} - 1023
       fe_flag ← fe_flag | IsNaN(src) | IsInf(src) |
                 IsZero(src) | IsNeg(src) | ( e_b <= -970 )
       fg_flag ← fg_flag | IsInf(src) | IsZero(src) | IsDen(src)
    end
    fl_flag ← xvrsqrtedp_error() <= 2^-14
    CR[BF]  ← 0b1 || fg_flag || fe_flag || 0b0

VSX Vector Test for software Square Root Single-Precision XX2-form

xvtsqrtsp BF,XB (0xF000_02A8)

    60   BF   //   ///  B    170  BX   /
    0    6    9    11   16   21   30   31

    XB      ← BX || B
    fe_flag ← 0b0
    fg_flag ← 0b0
    do i=0 to 127 by 32
       src     ← VSR[XB]{i:i+31}
       e_b     ← src{1:8} - 127
       fe_flag ← fe_flag | IsNaN(src) | IsInf(src) |
                 IsZero(src) | IsNeg(src) | ( e_b <= -103 )
       fg_flag ← fg_flag | IsInf(src) | IsZero(src) | IsDen(src)
    end
    fl_flag ← xvrsqrtesp_error() <= 2^-14
    CR[BF]  ← 0b1 || fg_flag || fe_flag || 0b0

The following description applies to both xvtsqrtdp and xvtsqrtsp.

Let XB be the value BX concatenated with B.

fe_flag is initialized to 0.
fg_flag is initialized to 0.

For each vector element i (from 0 to 1 for xvtsqrtdp, from 0 to 3 for xvtsqrtsp), do the following.

    Let src be the double-precision floating-point operand in doubleword element i of VSR[XB] (xvtsqrtdp), or the single-precision floating-point operand in word element i of VSR[XB] (xvtsqrtsp).
    Let e_b be the unbiased exponent of src.

    fe_flag is set to 1 for any of the following conditions.
    - src is a zero, a NaN, an infinity, or a negative value.
    - e_b is less than or equal to -970 (xvtsqrtdp) or -103 (xvtsqrtsp).

    fg_flag is set to 1 for the following condition.
    - src is a zero, an infinity, or a denormalized value.

CR field BF is set to the value 0b1 || fg_flag || fe_flag || 0b0.

Special Registers Altered:
    CR[BF]

VSR Data Layout for xvtsqrtdp

    src = VSR[XB]    |   DP   |   DP   |
                     0        64     127

VSR Data Layout for xvtsqrtsp

    src = VSR[XB]    |  SP  |  SP  |  SP  |  SP  |
                     0      32     64     96   127


VSX Logical AND XX3-form

xxland XT,XA,XB (0xF000_0410)

    60   T    A    B    130  AX   BX   TX
    0    6    11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    VSR[XT] ← VSR[XA] & VSR[XB]

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

The contents of VSR[XA] are ANDed with the contents of VSR[XB] and the result is placed into VSR[XT].

VSX Logical AND with Complement XX3-form

xxlandc XT,XA,XB (0xF000_0450)

    60   T    A    B    138  AX   BX   TX
    0    6    11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    VSR[XT] ← VSR[XA] & ~VSR[XB]

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

The contents of VSR[XA] are ANDed with the complement of the contents of VSR[XB] and the result is placed into VSR[XT].
Special Registers Altered (xxland and xxlandc):
    None

VSR Data Layout for xxland and xxlandc

    src1 = VSR[XA]
    src2 = VSR[XB]
    tgt  = VSR[XT]
    0                                            127


VSX Logical NOR XX3-form

xxlnor XT,XA,XB (0xF000_0510)

    60   T    A    B    162  AX   BX   TX
    0    6    11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    VSR[XT] ← ~( VSR[XA] | VSR[XB] )

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

The contents of VSR[XA] are ORed with the contents of VSR[XB] and the complemented result is placed into VSR[XT].

Special Registers Altered:
    None

VSR Data Layout for xxlnor

    src1 = VSR[XA]
    src2 = VSR[XB]
    tgt  = VSR[XT]
    0                                            127

VSX Logical OR XX3-form

xxlor XT,XA,XB (0xF000_0490)

    60   T    A    B    146  AX   BX   TX
    0    6    11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    VSR[XT] ← VSR[XA] | VSR[XB]

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

The contents of VSR[XA] are ORed with the contents of VSR[XB] and the result is placed into VSR[XT].

Special Registers Altered:
    None

VSR Data Layout for xxlor

    src1 = VSR[XA]
    src2 = VSR[XB]
    tgt  = VSR[XT]
    0                                            127


VSX Logical XOR XX3-form

xxlxor XT,XA,XB (0xF000_04D0)

    60   T    A    B    154  AX   BX   TX
    0    6    11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    VSR[XT] ← VSR[XA] ^ VSR[XB]

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

The contents of VSR[XA] are XORed with the contents of VSR[XB] and the result is placed into VSR[XT].
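The five VSX logical instructions can be modeled in a few lines. The following is a minimal sketch, assuming each VSR is represented as a Python integer holding 128 bits; the function names simply reuse the mnemonics and the MASK128 constant is a helper introduced here for illustration, not part of the architecture.

```python
# Illustrative model only: a VSR is treated as a 128-bit unsigned integer.
MASK128 = (1 << 128) - 1  # hypothetical helper: keeps results within 128 bits


def xxland(xa, xb):
    """VSR[XT] <- VSR[XA] & VSR[XB]"""
    return (xa & xb) & MASK128


def xxlandc(xa, xb):
    """VSR[XT] <- VSR[XA] & ~VSR[XB]"""
    return (xa & ~xb) & MASK128


def xxlnor(xa, xb):
    """VSR[XT] <- ~(VSR[XA] | VSR[XB])"""
    return ~(xa | xb) & MASK128


def xxlor(xa, xb):
    """VSR[XT] <- VSR[XA] | VSR[XB]"""
    return (xa | xb) & MASK128


def xxlxor(xa, xb):
    """VSR[XT] <- VSR[XA] ^ VSR[XB]"""
    return (xa ^ xb) & MASK128
```

Under this model, xxlxor of a register with itself yields zero and xxlor of a register with itself returns the register unchanged, which follows directly from the RTL.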
Special Registers Altered:
    None

VSR Data Layout for xxlxor

    src1 = VSR[XA]
    src2 = VSR[XB]
    tgt  = VSR[XT]
    0                                            127


VSX Merge High Word XX3-form

xxmrghw XT,XA,XB (0xF000_0090)

    60   T    A    B    18   AX   BX   TX
    0    6    11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    VSR[XT]{0:31}   ← VSR[XA]{0:31}
    VSR[XT]{32:63}  ← VSR[XB]{0:31}
    VSR[XT]{64:95}  ← VSR[XA]{32:63}
    VSR[XT]{96:127} ← VSR[XB]{32:63}

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

The contents of word element 0 of VSR[XA] are placed into word element 0 of VSR[XT].
The contents of word element 0 of VSR[XB] are placed into word element 1 of VSR[XT].
The contents of word element 1 of VSR[XA] are placed into word element 2 of VSR[XT].
The contents of word element 1 of VSR[XB] are placed into word element 3 of VSR[XT].

VSX Merge Low Word XX3-form

xxmrglw XT,XA,XB (0xF000_0190)

    60   T    A    B    50   AX   BX   TX
    0    6    11   16   21   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    VSR[XT]{0:31}   ← VSR[XA]{64:95}
    VSR[XT]{32:63}  ← VSR[XB]{64:95}
    VSR[XT]{64:95}  ← VSR[XA]{96:127}
    VSR[XT]{96:127} ← VSR[XB]{96:127}

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

The contents of word element 2 of VSR[XA] are placed into word element 0 of VSR[XT].
The contents of word element 2 of VSR[XB] are placed into word element 1 of VSR[XT].
The contents of word element 3 of VSR[XA] are placed into word element 2 of VSR[XT].
The contents of word element 3 of VSR[XB] are placed into word element 3 of VSR[XT].
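The word interleaving performed by the two merge instructions is easy to see in a small model. This is a sketch only, assuming a VSR is represented as a Python list of four 32-bit words where index 0 is word element 0 (bits 0:31); the list representation is an assumption made for illustration.

```python
# Illustrative model only: a VSR is a list [w0, w1, w2, w3] of 32-bit words,
# where w0 is word element 0 (bits 0:31) and w3 is word element 3 (bits 96:127).

def xxmrghw(xa, xb):
    """Interleave the high words (elements 0 and 1) of XA and XB."""
    return [xa[0], xb[0], xa[1], xb[1]]


def xxmrglw(xa, xb):
    """Interleave the low words (elements 2 and 3) of XA and XB."""
    return [xa[2], xb[2], xa[3], xb[3]]
```

For example, with xa = [0xA0, 0xA1, 0xA2, 0xA3] and xb = [0xB0, 0xB1, 0xB2, 0xB3], xxmrghw gives [0xA0, 0xB0, 0xA1, 0xB1] and xxmrglw gives [0xA2, 0xB2, 0xA3, 0xB3].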
Special Registers Altered (xxmrghw and xxmrglw):
    None

VSR Data Layout for xxmrghw

    src1 = VSR[XA]   | SP/SW/UW/MW | SP/SW/UW/MW | unused      | unused      |
    src2 = VSR[XB]   | SP/SW/UW/MW | SP/SW/UW/MW | unused      | unused      |
    tgt  = VSR[XT]   | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW |
                     0             32            64            96          127

VSR Data Layout for xxmrglw

    src1 = VSR[XA]   | unused      | unused      | SP/SW/UW/MW | SP/SW/UW/MW |
    src2 = VSR[XB]   | unused      | unused      | SP/SW/UW/MW | SP/SW/UW/MW |
    tgt  = VSR[XT]   | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW |
                     0             32            64            96          127


VSX Permute Doubleword Immediate XX3-form

xxpermdi XT,XA,XB,DM (0xF000_0050)

    60   T    A    B    0    DM   10   AX   BX   TX
    0    6    11   16   21   22   24   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    if(DM=0b00) then VSR[XT] ← VSR[XA]{0:63}   || VSR[XB]{0:63}
    if(DM=0b01) then VSR[XT] ← VSR[XA]{0:63}   || VSR[XB]{64:127}
    if(DM=0b10) then VSR[XT] ← VSR[XA]{64:127} || VSR[XB]{0:63}
    if(DM=0b11) then VSR[XT] ← VSR[XA]{64:127} || VSR[XB]{64:127}

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

If DM0=0, the contents of doubleword element 0 of VSR[XA] are placed into doubleword element 0 of VSR[XT]. Otherwise the contents of doubleword element 1 of VSR[XA] are placed into doubleword element 0 of VSR[XT].

If DM1=0, the contents of doubleword element 0 of VSR[XB] are placed into doubleword element 1 of VSR[XT]. Otherwise the contents of doubleword element 1 of VSR[XB] are placed into doubleword element 1 of VSR[XT].

Special Registers Altered:
    None

    Extended Mnemonic       Equivalent To
    xxspltd T,A,0           xxpermdi T,A,A,0b00
    xxspltd T,A,1           xxpermdi T,A,A,0b11
    xxmrghd T,A,B           xxpermdi T,A,B,0b00
    xxmrgld T,A,B           xxpermdi T,A,B,0b11
    xxswapd T,A             xxpermdi T,A,A,0b10

VSR Data Layout for xxpermdi

    src1 = VSR[XA]   | DP/SD/UD/MD | DP/SD/UD/MD |
    src2 = VSR[XB]   | DP/SD/UD/MD | DP/SD/UD/MD |
    tgt  = VSR[XT]   | DP/SD/UD/MD | DP/SD/UD/MD |
                     0             64          127


VSX Select XX4-form

xxsel XT,XA,XB,XC (0xF000_0030)

    60   T    A    B    C    3    CX   AX   BX   TX
    0    6    11   16   21   26   28   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    XC ← CX || C
    do i=0 to 127
       VSR[XT]{i} ← (VSR[XC]{i}=0) ? VSR[XA]{i} : VSR[XB]{i}
    end

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.
Let XC be the value CX concatenated with C.

For each bit of VSR[XC] that contains the value 0, the corresponding bit of VSR[XA] is placed into the corresponding bit of VSR[XT]. Otherwise, the corresponding bit of VSR[XB] is placed into the corresponding bit of VSR[XT].

Special Registers Altered:
    None

VSR Data Layout for xxsel

    src1 = VSR[XA]
    src2 = VSR[XB]
    src3 = VSR[XC]
    tgt  = VSR[XT]
    0                                            127


VSX Shift Left Double by Word Immediate XX3-form

xxsldwi XT,XA,XB,SHW (0xF000_0010)

    60   T    A    B    0    SHW  2    AX   BX   TX
    0    6    11   16   21   22   24   29   30   31

    XT ← TX || T
    XA ← AX || A
    XB ← BX || B
    source{0:255} ← VSR[XA] || VSR[XB]
    VSR[XT] ← source{32×SHW:32×SHW+127}

Let XT be the value TX concatenated with T.
Let XA be the value AX concatenated with A.
Let XB be the value BX concatenated with B.

Let the source vector be the concatenation of the contents of VSR[XA] followed by the contents of VSR[XB]. Words SHW:SHW+3 of the source vector are placed into VSR[XT].

VSX Splat Word XX2-form

xxspltw XT,XB,UIM (0xF000_0290)

    60   T    ///  UIM  B    164  BX   TX
    0    6    11   14   16   21   30   31

    XT ← TX || T
    XB ← BX || B
    VSR[XT]{0:31}   ← VSR[XB]{32×UIM:32×UIM+31}
    VSR[XT]{32:63}  ← VSR[XB]{32×UIM:32×UIM+31}
    VSR[XT]{64:95}  ← VSR[XB]{32×UIM:32×UIM+31}
    VSR[XT]{96:127} ← VSR[XB]{32×UIM:32×UIM+31}

Let XT be the value TX concatenated with T.
Let XB be the value BX concatenated with B.

The contents of word element UIM of VSR[XB] are replicated in each word element of VSR[XT].
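The permute, select, shift, and splat operations above can be sketched with integer arithmetic. This is an illustrative model, not a definitive implementation: it assumes a VSR is a 128-bit Python integer with bit 0 as the most significant bit (matching the RTL's big-endian bit numbering), and the helpers dword and word are hypothetical names introduced here.

```python
# Illustrative model only: a VSR is a 128-bit unsigned integer, bit 0 = MSB.
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def dword(v, i):
    """Hypothetical helper: doubleword element i (0 = bits 0:63)."""
    return (v >> (64 * (1 - i))) & MASK64


def word(v, i):
    """Hypothetical helper: word element i (0 = bits 0:31)."""
    return (v >> (32 * (3 - i))) & MASK32


def xxpermdi(xa, xb, dm):
    """DM bit 0 picks the doubleword of XA, DM bit 1 picks the doubleword of XB."""
    hi = dword(xa, (dm >> 1) & 1)
    lo = dword(xb, dm & 1)
    return (hi << 64) | lo


def xxsel(xa, xb, xc):
    """Bitwise select: XC bit 0 takes the XA bit, XC bit 1 takes the XB bit."""
    return ((xa & ~xc) | (xb & xc)) & MASK128


def xxsldwi(xa, xb, shw):
    """Take words SHW:SHW+3 of the 256-bit concatenation XA || XB."""
    source = (xa << 128) | xb
    return (source >> (32 * (4 - shw))) & MASK128


def xxspltw(xb, uim):
    """Replicate word element UIM of XB into all four word elements."""
    w = word(xb, uim)
    return (w << 96) | (w << 64) | (w << 32) | w
```

In this model the extended mnemonic xxswapd T,A corresponds to xxpermdi(a, a, 0b10), which exchanges the two doublewords of a, and xxsldwi(a, b, 0) simply returns a.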
Special Registers Altered (xxsldwi and xxspltw):
    None

VSR Data Layout for xxsldwi

    src1 = VSR[XA]   | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW |
    src2 = VSR[XB]   | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW |
    tgt  = VSR[XT]   | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW |
                     0             32            64            96          127

VSR Data Layout for xxspltw

    src  = VSR[XB]   | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW |
    tgt  = VSR[XT]   | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW | SP/SW/UW/MW |
                     0             32            64            96          127


Chapter 8. Signal Processing Engine (SPE) [Category: Signal Processing Engine]

8.1 Overview
8.2 Nomenclature and Conventions
8.3 Programming Model
8.3.1 General Operation
8.3.2 GPR Registers
8.3.3 Accumulator Register
8.3.4 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
8.3.5 Data Formats
8.3.5.1 Integer Format
8.3.5.2 Fractional Format
8.3.6 Computational Operations
8.3.7 SPE Instructions
8.3.8 Saturation, Shift, and Bit Reverse Models
8.3.8.1 Saturation
8.3.8.2 Shift Left
8.3.8.3 Bit Reverse
8.3.9 SPE Instruction Set


8.1 Overview

The Signal Processing Engine (SPE) accelerates signal processing applications normally suited to DSP operation. This is accomplished using short vectors (two element) within 64-bit GPRs and using single instruction multiple data (SIMD) operations to perform the requisite computations. SPE also architects an Accumulator register to allow for back to back operations without loop unrolling.


8.2 Nomenclature and Conventions

Several conventions regarding nomenclature are used for SPE:

- The Signal Processing Engine category is abbreviated as SPE.
- Bits 0 to 31 of a 64-bit register are referenced as upper word, even word or high word element of the register. Bits 32:63 are referred to as lower word, odd word or low word element of the register. Each half is an element of a 64-bit GPR.
- Bits 0 to 15 and bits 32 to 47 are referenced as even halfwords. Bits 16 to 31 and bits 48 to 63 are referenced as odd halfwords.
- Mnemonics for SPE instructions generally begin with the letters `ev' (embedded vector).

The RTL conventions described below are used in addition to those described in Section 1.3. Additional RTL functions are described in Appendix D.

Notation  Meaning

×sf       Signed fractional multiplication. Result of multiplying 2 signed fractional quantities having bit length n taking the least significant 2n-1 bits of the sign extended product and concatenating a 0 to the least significant bit forming a signed fractional result of 2n bits. Two 16-bit signed fractional quantities, a and b are multiplied, as shown below:

              ea[0:31]     = EXTS(a)
              eb[0:31]     = EXTS(b)
              prod[0:63]   = ea X eb
              eprod[0:63]  = EXTS(prod[32:63])
              result[0:31] = eprod[33:63] || 0b0

×gsf      Guarded signed fractional multiplication. Result of multiplying 2 signed fractional quantities having bit length 16 taking the least significant 31 bits of the sign extended product and concatenating a 0 to the least significant bit forming a guarded signed fractional result of 64 bits. Since guarded signed fractional multiplication produces a 64-bit result, fractional input quantities of -1 and -1 can produce +1 in the intermediate product. Two 16-bit fractional quantities, a and b are multiplied, as shown below:

              ea[0:31]     = EXTS(a)
              eb[0:31]     = EXTS(b)
              prod[0:63]   = ea X eb
              eprod[0:63]  = EXTS(prod[32:63])
              result[0:63] = eprod[1:63] || 0b0

<<        Logical shift left. x << y shifts value x left by y bits, leaving zeros in the vacated bits.

>>        Logical shift right. x >> y shifts value x right by y bits, leaving zeros in the vacated bits.


8.3 Programming Model

8.3.1 General Operation

SPE instructions generally take elements from one source register and operate on them with the corresponding elements of a second source register (and/or the accumulator) to produce results. Results are placed in the destination register and/or the accumulator. Instructions that are vector in nature (i.e. produce results of more than one element) provide results for each element that are independent of the computation of the other elements. These instructions can also be used to perform scalar DSP operations by ignoring the results of the upper 32-bit half of the register file.

There are no record forms of SPE instructions. As a result, the meaning of bits in the CR is different than for other categories. SPE Compare instructions specify a CR field, two source registers, and the type of compare: greater than, less than, or equal. Two bits of the CR field are written with the result of the vector compare, one for each element. The remaining two bits reflect the ANDing and ORing of the vector compare results.

8.3.2 GPR Registers

The SPE requires a GPR register file with thirty-two 64-bit registers. For 32-bit implementations, instructions that normally operate on a 32-bit register file access and change only the least significant 32 bits of the GPRs, leaving the most significant 32 bits unchanged. For 64-bit implementations, operation of these instructions is unchanged, i.e. those instructions continue to operate on the 64-bit registers as they would if the SPE was not implemented. Most SPE instructions view the 64-bit register as being composed of a vector of two elements, each of which is 32 bits wide (some instructions read or write 16-bit elements). The most significant 32 bits are called the upper word, high word or even word. The least significant 32 bits are called the lower word, low word or odd word. Unless otherwise specified, SPE instructions write all 64 bits of the destination register.

    |     GPR Upper Word     |     GPR Lower Word     |
    0                        32                      63

Figure 124. GPR

8.3.3 Accumulator Register

A partially visible accumulator register (ACC) is provided for some SPE instructions. The accumulator is a 64-bit register that holds the results of the Multiply Accumulate (MAC) forms of SPE Fixed-Point instructions. The accumulator allows the back-to-back execution of dependent MAC instructions, something that is found in the inner loops of DSP code such as FIR and FFT filters. The accumulator is partially visible to the programmer in the sense that its results do not have to be explicitly read to use them. Instead they are always copied into a 64-bit destination GPR which is specified as part of the instruction. Based upon the type of instruction, the accumulator can hold either a single 64-bit value or a vector of two 32-bit elements.

    |     ACC Upper Word     |     ACC Lower Word     |
    0                        32                      63

Figure 125. Accumulator

8.3.4 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)

Status and control for SPE uses the SPEFSCR register. This register is also used by the SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, and SPE.Embedded Float Vector categories. Status and control bits are shared with these categories. The SPEFSCR register is implemented as special purpose register (SPR) number 512 and is read and written by the mfspr and mtspr instructions. SPE instructions affect both the high element (bits 32:33) and low element status flags (bits 48:49) of the SPEFSCR.

    |                     SPEFSCR                     |
    32                                               63

Figure 126. Signal Processing and Embedded Floating-Point Status and Control Register

The SPEFSCR bits are defined as shown below.
Bit Description 32 Summary Integer Overflow High (SOVH) SOVH is set to 1 when an SPE instruction sets OVH. This is a sticky bit. 504 Power ISATM Book I Version 2.06 33 Integer Overflow High (OVH) Execution of an SPE.Embedded Float Scalar OVH is set to 1 to indicate that an overflow instruction leaves FDBZH undefined. has occurred in the upper element during exe- 38 Embedded Floating-Point Underflow High cution of an SPE instruction. The bit is set to 1 (FUNFH) [Category: SP.FV] if a result of an operation performed by the The FUNFH bit is set to 1 when the execution instruction cannot be represented in the num- of an SPE.Embedded Float Vector instruction ber of bits into which the result is to be placed, results in an underflow on the high word oper- and is set to 0 otherwise. The OVH bit is not ation. altered by Modulo instructions, nor by other instructions that cannot overflow. Execution of an SPE.Embedded Float Scalar instruction leaves FUNFH undefined. 34 Embedded Floating-Point Guard Bit High (FGH) [Category: SP.FV] 39 Embedded Floating-Point Overflow High FGH is supplied for use by the Embedded (FOVFH) [Category: SP.FV] Floating-Point Round interrupt handler. FGH The FOVFH bit is set to 1 when the execution is an extension of the low-order bits of the of an SPE.Embedded Float Vector instruction fractional result produced from an results in an overflow on the high word opera- SPE.Embedded Float Vector instruction on tion. the high word. FGH is zeroed if an overflow, Execution of an SPE.Embedded Float Scalar underflow, or invalid input error is detected on instruction leaves FOVFH undefined. the high element of an SPE.Embedded Float Vector instruction. 40:41 Reserved Execution of an SPE.Embedded Float Scalar 42 Embedded Floating-Point Inexact Sticky instruction leaves FGH undefined. 
Flag (FINXS) [Categories: SP.FV, SP.FD, SP.FS] 35 Embedded Floating-Point Inexact Bit High The FINXS bit is set to 1 whenever the execu- (FXH) [Category: SP.FV] tion of an Embedded Floating-Point instruction FXH is supplied for use by the Embedded delivers an inexact result for either the low or Floating-Point Round interrupt handler. FXH is high element and no Embedded Float- an extension of the low-order bits of the frac- ing-Point Data interrupt is taken for either ele- tional result produced from an SPE.Embed- ment, or if an Embedded Floating-Point ded Float Vector instruction on the high word. instruction results in overflow (FOVF=1 or FXH represents the logical `or' of all the bits FOVFH=1), but Embedded Floating-Point shifted right from the Guard bit when the frac- Overflow exceptions are disabled (FOVFE=0), tional result is normalized. FXH is zeroed if an or if an Embedded Floating-Point instruction overflow, underflow, or invalid input error is results in underflow (FUNF=1 or FUNFH=1), detected on the high element of an but Embedded Floating-Point Underflow SPE.Embedded Float Vector instruction. exceptions are disabled (FUNFE=0), and no Execution of an SPE.Embedded Float Scalar Embedded Floating-Point Data interrupt instruction leaves FXH undefined. occurs. This is a sticky bit. 36 Embedded Floating-Point Invalid Opera- 43 Embedded Floating-Point Invalid Opera- tion/Input Error High (FINVH) [Category: tion/Input Sticky Flag (FINVS) [Categories: SP.FV] SP.FV, SP.FD, SP.FS] The FINVH bit is set to 1 if any high word The FINVS bit is defined to be the sticky result operand of an SPE.Embedded Float Vector of any Embedded Floating-Point instruction instruction is infinity, NaN, or a denormalized that causes FINVH or FINV to be set to 1. value, or if the instruction is a divide and the That is, FINVS FINVS | FINV | FINVH. This dividend and divisor are both 0, or if a conver- is a sticky bit. sion to integer or fractional value overflows. 
44 Embedded Floating-Point Divide By Zero Execution of an SPE.Embedded Float Scalar Sticky Flag (FDBZS) [Categories: SP.FV, instruction leaves FINVH undefined. SP.FD, SP.FS] The FDBZS bit is set to 1 when an Embedded 37 Embedded Floating-Point Divide By Zero Floating-Point Divide instruction sets FDBZH High (FDBZH) [Category: SP.FV] or FDBZ to 1. That is, FDBZS FDBZS | The FDBZH bit is set to 1 when an FDBZ | FDBZH. This is a sticky bit. SPE.Embedded Vector Floating-Point Divide instruction is executed with a divisor of 0 in the 45 Embedded Floating-Point Underflow high word operand, and the dividend is a finite Sticky Flag (FUNFS) [Categories: SP.FV, nonzero number. SP.FD, SP.FS] The FUNFS bit is defined to be the sticky Chapter 8. Signal Processing Engine (SPE) 505 Version 2.06 result of any Embedded Floating-Point or if the operation is a divide and the dividend instruction that causes FUNFH or FUNF to be and divisor are both 0, or if a conversion to set to 1. That is, FUNFS FUNFS | FUNF | integer or fractional value overflows. FUNFH. This is a sticky bit. 53 Embedded Floating-Point Divide By Zero 46 Embedded Floating-Point Overflow Sticky (Low/scalar) (FDBZ) [Categories: SP.FV, Flag (FOVFS) [Categories: SP.FV, SP.FD, SP.FD, SP.FS] SP.FS] The FDBZ bit is set to 1 when an Embedded The FOVFS bit is defined to be the sticky Floating-Point Divide instruction is executed result of any Embedded Floating-Point with a divisor of 0 in the low word operand, instruction that causes FOVH or FOVF to be and the dividend is a finite nonzero number. set to 1. That is, FOVFS FOVFS | FOVF | 54 Embedded Floating-Point Underflow (Low/ FOVFH. This is a sticky bit. scalar) (FUNF) [Categories: SP.FV, SP.FD, 47 Reserved SP.FS] The FUNF bit is set to 1 when the execution of 48 Summary Integer Overflow (SOV) an Embedded Floating-Point instruction SOV is set to 1 when an SPE instruction sets results in an underflow on the low word opera- OV to 1. This is a sticky bit. tion. 
49 Integer Overflow (OV) 55 Embedded Floating-Point Overflow (Low/ OV is set to 1 to indicate that an overflow has scalar) (FOVF) [Categories: SP.FV, SP.FD, occurred in the lower element during execu- SP.FS] tion of an SPE instruction. The bit is set to 1 if The FOVF bit is set to 1 when the execution of a result of an operation performed by the an Embedded Floating-Point instruction instruction cannot be represented in the num- results in an overflow on the low word opera- ber of bits into which the result is to be placed, tion. and is set to 0 otherwise. The OV bit is not altered by Modulo instructions, nor by other 56 Reserved instructions that cannot overflow. 57 Embedded Floating-Point Round (Inexact) 50 Embedded Floating-Point Guard Bit (Low/ Exception Enable (FINXE) [Categories: scalar) (FG) [Categories: SP.FV, SP.FD, SP.FV, SP.FD, SP.FS] SP.FS] 0 Exception disabled FG is supplied for use by the Embedded 1 Exception enabled Floating-Point Round interrupt handler. FG is an extension of the low-order bits of the frac- The Embedded Floating-Point Round interrupt tional result produced from an Embedded is taken if the exception is enabled and if FG | Floating-Point instruction on the low word. FG FGH | FX | FXH (signifying an inexact result) is zeroed if an overflow, underflow, or invalid is set to 1 as a result of an Embedded Float- input error is detected on the low element of ing-Point instruction. an Embedded Floating-Point instruction. If an Embedded Floating-Point instruction 51 Embedded Floating-Point Inexact Bit (Low/ results in overflow or underflow and the corre- scalar) (FX) [Categories: SP.FV, SP.FD, sponding Embedded Floating-Point Underflow SP.FS] or Embedded Floating-Point Overflow excep- FX is supplied for use by the Embedded Float- tion is disabled then the Embedded Float- ing-Point Round interrupt handler. FX is an ing-Point Round interrupt is taken. 
extension of the low-order bits of the fractional result produced from an Embedded Floating-Point instruction on the low word. FX represents the logical OR of all the bits shifted right from the Guard bit when the fractional result is normalized. FX is zeroed if an overflow, underflow, or invalid input error is detected on an Embedded Floating-Point instruction.

52      Embedded Floating-Point Invalid Operation/Input Error (Low/scalar) (FINV) [Categories: SP.FV, SP.FD, SP.FS]
        The FINV bit is set to 1 if any low word operand of an Embedded Floating-Point instruction is infinity, NaN, or a denormalized value,

58      Embedded Floating-Point Invalid Operation/Input Error Exception Enable (FINVE) [Categories: SP.FV, SP.FD, SP.FS]
        0   Exception disabled
        1   Exception enabled
        If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FINV or FINVH bit is set to 1 by an Embedded Floating-Point instruction.

59      Embedded Floating-Point Divide By Zero Exception Enable (FDBZE) [Categories: SP.FV, SP.FD, SP.FS]
        0   Exception disabled
        1   Exception enabled
        If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FDBZ or FDBZH bit is set to 1 by an Embedded Floating-Point instruction.

60      Embedded Floating-Point Underflow Exception Enable (FUNFE) [Categories: SP.FV, SP.FD, SP.FS]
        0   Exception disabled
        1   Exception enabled
        If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FUNF or FUNFH bit is set to 1 by an Embedded Floating-Point instruction.

61      Embedded Floating-Point Overflow Exception Enable (FOVFE) [Categories: SP.FV, SP.FD, SP.FS]
        0   Exception disabled
        1   Exception enabled
        If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FOVF or FOVFH bit is set to 1 by an Embedded Floating-Point instruction.

62:63   Embedded Floating-Point Rounding Mode Control (FRMC) [Categories: SP.FV, SP.FD, SP.FS]
        00  Round to Nearest
        01  Round toward Zero
        10  Round toward +Infinity
        11  Round toward -Infinity

        Programming Note
        Rounding modes 0b10 (+Infinity) and 0b11 (-Infinity) may not be supported by some implementations. If an implementation does not support these, Embedded Floating-Point Round interrupts are generated for every Embedded Floating-Point instruction for which rounding is required when +Infinity or -Infinity modes are set, and software is required to produce the correctly rounded result.

8.3.5 Data Formats

The SPE provides two different data formats, integer and fractional. Both data formats can be treated as signed or unsigned quantities.

8.3.5.1 Integer Format

Unsigned integers consist of 16, 32, or 64-bit binary integer values. The largest representable value is 2^n - 1, where n represents the number of bits in the value. The smallest representable value is 0. Computations that produce values larger than 2^n - 1 or smaller than 0 may set OV or OVH in the SPEFSCR.

Signed integers consist of 16, 32, or 64-bit binary values in two's complement form. The largest representable value is 2^(n-1) - 1, where n represents the number of bits in the value. The smallest representable value is -2^(n-1). Computations that produce values larger than 2^(n-1) - 1 or smaller than -2^(n-1) may set OV or OVH in the SPEFSCR.

8.3.5.2 Fractional Format

Fractional data format is conventionally used for DSP fractional arithmetic. Fractional data is useful for representing data converted from analog devices.

Unsigned fractions consist of 16, 32, or 64-bit binary fractional values that range from 0 to less than 1. Unsigned fractions place the radix point immediately to the left of the most significant bit. The most significant bit of the value represents the value 2^(-1), the next most significant bit represents the value 2^(-2), and so on. The largest representable value is 1 - 2^(-n), where n represents the number of bits in the value. The smallest representable value is 0. Computations that produce values larger than 1 - 2^(-n) or smaller than 0 may set OV or OVH in the SPEFSCR. The SPE category does not define unsigned fractional forms of instructions to manipulate unsigned fractional data since the unsigned integer forms of the instructions produce the same results as would the unsigned fractional forms.

Guarded unsigned fractions are 64-bit binary fractional values. Guarded unsigned fractions place the decimal point immediately to the left of bit 32. The largest representable value is 2^32 - 2^(-32). The smallest representable value is 0. Guarded unsigned fractional computations are always modulo and do not set OV or OVH in the SPEFSCR.

Signed fractions consist of 16, 32, or 64-bit binary fractional values in two's-complement form that range from -1 to less than 1. Signed fractions place the decimal point immediately to the right of the most significant bit. The largest representable value is 1 - 2^(-(n-1)), where n represents the number of bits in the value. The smallest representable value is -1. Computations that produce values larger than 1 - 2^(-(n-1)) or smaller than -1 may set OV or OVH in the SPEFSCR. Multiplication of two signed fractional values causes the result to be shifted left one bit to remove the resultant redundant sign bit in the product. In this case, a 0 bit is concatenated as the least significant bit of the shifted result.

Guarded signed fractions are 64-bit binary fractional values. Guarded signed fractions place the decimal point immediately to the left of bit 33. The largest representable value is 2^32 - 2^(-31). The smallest representable value is -2^32 - 1 + 2^(-31). Guarded signed fractional computations are always modulo and do not set OV or OVH in the SPEFSCR.
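The integer and fractional ranges given above can be checked with a short sketch. This is an illustration only, not part of the architecture: the function names `representable_range` and `signed_fraction_value` are hypothetical, and exact rational arithmetic is used so that values such as 1 - 2^(-15) are not rounded. Guarded formats are not modeled.

```python
from fractions import Fraction


def representable_range(n: int, signed: bool, fractional: bool):
    """Return (smallest, largest) representable value of an n-bit operand.

    Hypothetical helper: integers have scale 1; unsigned fractions put the
    radix point left of the MSB (scale 2^n); signed fractions put it right
    of the MSB (scale 2^(n-1)).
    """
    if signed:
        lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
        scale = (1 << (n - 1)) if fractional else 1
    else:
        lo, hi = 0, (1 << n) - 1
        scale = (1 << n) if fractional else 1
    return Fraction(lo, scale), Fraction(hi, scale)


def signed_fraction_value(raw: int, n: int = 16) -> Fraction:
    """Interpret the low n bits of raw as a two's-complement signed fraction."""
    raw &= (1 << n) - 1
    if raw >> (n - 1):      # sign bit set, so the value is negative
        raw -= 1 << n
    return Fraction(raw, 1 << (n - 1))
```

For a 16-bit signed fraction, the pattern 0x8000 evaluates to -1 and 0x7FFF to 1 - 2^(-15), matching the limits stated in the text.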
8.3.6 Computational Operations

The SPE category supports several different computational capabilities. Both modulo and saturating results can be produced. Modulo results truncate the overflow bits of a calculation; therefore overflow does not occur and no saturation is performed. For instructions for which overflow occurs, saturation provides a maximum or minimum representable value (for the data type) in the case of overflow. Instructions are provided for a wide range of computational capability. The operation types can be divided into 4 basic categories:

- Simple Vector instructions. These instructions use the corresponding low and high word elements of the operands to produce a vector result that is placed in the destination register, the accumulator, or both.

- Multiply and Accumulate instructions. These instructions perform multiply operations, optionally add the result to the accumulator, and place the result into the destination register and optionally into the accumulator. These instructions are composed of different multiply forms, data formats, and data accumulate options. The mnemonics for these instructions indicate their various characteristics. These are shown in Table 99.

- Load and Store instructions. These instructions provide load and store capabilities for moving data to and from memory. A variety of forms are provided that position data for efficient computation.

- Compare and miscellaneous instructions. These instructions perform miscellaneous functions such as field manipulation, bit reversed incrementing, and vector compares.
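The difference between modulo and saturating results can be illustrated on a single 32-bit word element. The sketch below is an assumption-laden model, not architected behavior: `add_modulo` and `add_signed_saturate` are hypothetical names, and the returned flag only stands in for the overflow indication that a saturating instruction would record in the SPEFSCR.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def add_modulo(a: int, b: int) -> int:
    """Modulo result: overflow bits are truncated, nothing is flagged."""
    return (a + b) & MASK32


def add_signed_saturate(a: int, b: int):
    """Saturating result: clamp to the signed 32-bit limits.

    Returns (result_pattern, overflowed).
    """
    s = to_signed32(a) + to_signed32(b)
    if s > 0x7FFF_FFFF:
        return 0x7FFF_FFFF, True
    if s < -0x8000_0000:
        return 0x8000_0000, True
    return s & MASK32, False
```

Adding 1 to 0x7FFF_FFFF wraps to 0x8000_0000 in the modulo form, while the saturating form returns 0x7FFF_FFFF and reports the overflow.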
Table 99: Mnemonic Extensions for Multiply Accumulate Instructions

Extension   Meaning                                        Comments
---------   --------------------------------------------   ----------------------------------------------------
Multiply Form
he          halfword even                                  16 × 16 → 32
heg         halfword even guarded                          16 × 16 → 32, 64-bit final accumulate result
ho          halfword odd                                   16 × 16 → 32
hog         halfword odd guarded                           16 × 16 → 32, 64-bit final accumulate result
w           word                                           32 × 32 → 64
wh          word high                                      32 × 32 → 32 (high-order 32 bits of product)
wl          word low                                       32 × 32 → 32 (low-order 32 bits of product)

Data Format
smf         signed modulo fractional                       modulo, no saturation or overflow
smi         signed modulo integer                          modulo, no saturation or overflow
ssf         signed saturate fractional                     saturation on product and accumulate
ssi         signed saturate integer                        saturation on product and accumulate
umi         unsigned modulo integer                        modulo, no saturation or overflow
usi         unsigned saturate integer                      saturation on product and accumulate

Accumulate Option
a           place in accumulator                           result → accumulator
aa          add to accumulator                             accumulator + result → accumulator
aaw         add to accumulator as word elements            accumulator[0:31] + result[0:31] → accumulator[0:31]
                                                           accumulator[32:63] + result[32:63] → accumulator[32:63]
an          add negated to accumulator                     accumulator - result → accumulator
anw         add negated to accumulator as word elements    accumulator[0:31] - result[0:31] → accumulator[0:31]
                                                           accumulator[32:63] - result[32:63] → accumulator[32:63]

8.3.7 SPE Instructions

8.3.8 Saturation, Shift, and Bit Reverse Models

For saturation, left shifts, and bit reversal, the pseudo RTL is provided here to more accurately describe those functions that are referenced in the instruction pseudo RTL.
8.3.8.1 Saturation

    SATURATE(ov, carry, sat_ovn, sat_ov, val)
    if ov then
        if carry then
            return sat_ovn
        else
            return sat_ov
    else
        return val

8.3.8.2 Shift Left

    SL(value, cnt)
    if cnt > 31 then
        return 0
    else
        return (value << cnt)

8.3.8.3 Bit Reverse

    BITREVERSE(value)
    result ← 0
    mask ← 1
    shift ← 31
    cnt ← 32
    while cnt > 0 then do
        t ← value & mask
        if shift >= 0 then
            result ← (t << shift) | result
        else
            result ← (t >> -shift) | result
        cnt ← cnt - 1
        shift ← shift - 2
        mask ← mask << 1
    return result

8.3.9 SPE Instruction Set

Bit Reversed Increment EVX-form

brinc RT,RA,RB

    4     RT    RA    RB    527
    0     6     11    16    21      31

    n ← implementation-dependent number of mask bits
    mask ← (RB)[64-n:63]
    a ← (RA)[64-n:63]
    d ← BITREVERSE(1 + BITREVERSE(a | (¬ mask)))
    RT ← (RA)[0:63-n] || (d & mask)

brinc computes a bit-reverse index based on the contents of RA and a mask specified in RB. The new index is written to RT.

The number of bits in the mask is implementation-dependent but may not exceed 32.

Special Registers Altered:
    None

Programming Note
brinc provides a way for software to access FFT data in a bit-reversed manner. RA contains the index into a buffer that contains data on which FFT is to be performed. RB contains a mask that allows the index to be updated with bit-reversed addressing. Typically this instruction precedes a load with index instruction; for example,

    brinc r2, r3, r4
    lhax  r8, r5, r2

RB contains a bit-mask that is based on the number of points in an FFT. To access a buffer containing n byte sized data that is to be accessed with bit-reversed addressing, the mask has log2(n) 1s in the least significant bit positions and 0s in the remaining most significant bit positions. If, however, the data size is a multiple of a halfword or a word, the mask is constructed so that the 1s are shifted left by log2(size of the data) and 0s are placed in the least significant bit positions.

Architecture Note
This instruction only modifies the lower 32 bits of the destination register in 32-bit implementations. For 64-bit implementations in 32-bit mode, the contents of the upper 32 bits of the destination register are undefined.

Programming Note
Execution of brinc does not cause SPE Unavailable exceptions regardless of MSR[SPV].

Vector Absolute Value EVX-form

evabs RT,RA

    4     RT    RA    ///   520
    0     6     11    16    21      31

    RT[0:31] ← ABS((RA)[0:31])
    RT[32:63] ← ABS((RA)[32:63])

The absolute value of each element of RA is placed in the corresponding elements of RT. An absolute value of 0x8000_0000 (most negative number) returns 0x8000_0000.

Special Registers Altered:
    None

Vector Add Immediate Word EVX-form

evaddiw RT,RB,UI

    4     RT    UI    RB    514
    0     6     11    16    21      31

    RT[0:31] ← (RB)[0:31] + EXTZ(UI)
    RT[32:63] ← (RB)[32:63] + EXTZ(UI)

UI is zero-extended and added to both the high and low elements of RB and the results are placed in RT. Note that the same value is added to both elements of the register.

Special Registers Altered:
    None

Vector Add Signed, Modulo, Integer to Accumulator Word EVX-form

evaddsmiaaw RT,RA

    4     RT    RA    ///   1225
    0     6     11    16    21      31

    RT[0:31] ← (ACC)[0:31] + (RA)[0:31]
    RT[32:63] ← (ACC)[32:63] + (RA)[32:63]
    ACC[0:63] ← (RT)[0:63]

Each word element in RA is added to the corresponding element in the accumulator and the results are placed in RT and into the accumulator.

Special Registers Altered:
    ACC
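The BITREVERSE model of 8.3.8.3 and the brinc RTL can be transcribed into an executable sketch to see the index sequence they produce. The following is a minimal model under stated assumptions: the number of mask bits n is implementation-dependent, so the default of 16 here is purely illustrative, and the function names are not architected.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def bitreverse(value: int) -> int:
    """32-bit bit reversal, following the BITREVERSE pseudo RTL step by step."""
    result, mask, shift = 0, 1, 31
    for _ in range(32):
        t = value & mask
        if shift >= 0:
            result |= t << shift
        else:
            result |= t >> -shift
        shift -= 2
        mask <<= 1
    return result & MASK32


def brinc(ra: int, rb: int, n: int = 16) -> int:
    """Model of brinc on 64-bit register values.

    n is the implementation-dependent number of mask bits (assumed 16 here).
    """
    low = (1 << n) - 1
    mask = rb & low                      # (RB)[64-n:63]
    a = ra & low                         # (RA)[64-n:63]
    d = bitreverse((1 + bitreverse((a | ~mask) & MASK32)) & MASK32)
    return (ra & MASK64 & ~low) | (d & mask)
```

With a mask of 0b111 (eight byte-sized points), repeatedly applying `brinc` to an index that starts at 0 yields 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed ordering an FFT uses. With halfword data the mask becomes 0b1110 and the first step yields 8.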
Vector Add Signed, Saturate, Integer to Accumulator Word EVX-form

evaddssiaaw RT,RA

    4     RT    RA    ///   1217
    0     6     11    16    21      31

    temp[0:63] ← EXTS((ACC)[0:31]) + EXTS((RA)[0:31])
    ovh ← temp[31] ⊕ temp[32]
    RT[0:31] ← SATURATE(ovh, temp[31], 0x8000_0000, 0x7FFF_FFFF, temp[32:63])
    temp[0:63] ← EXTS((ACC)[32:63]) + EXTS((RA)[32:63])
    ovl ← temp[31] ⊕ temp[32]
    RT[32:63] ← SATURATE(ovl, temp[31], 0x8000_0000, 0x7FFF_FFFF, temp[32:63])
    ACC[0:63] ← (RT)[0:63]
    SPEFSCR[OVH] ← ovh
    SPEFSCR[OV] ← ovl
    SPEFSCR[SOVH] ← SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] ← SPEFSCR[SOV] | ovl

Each signed-integer word element in RA is sign-extended and added to the corresponding sign-extended element in the accumulator, saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Add Unsigned, Saturate, Integer to Accumulator Word EVX-form

evaddusiaaw RT,RA

    4     RT    RA    ///   1216
    0     6     11    16    21      31

    temp[0:63] ← EXTZ((ACC)[0:31]) + EXTZ((RA)[0:31])
    ovh ← temp[31]
    RT[0:31] ← SATURATE(ovh, temp[31], 0xFFFF_FFFF, 0xFFFF_FFFF, temp[32:63])
    temp[0:63] ← EXTZ((ACC)[32:63]) + EXTZ((RA)[32:63])
    ovl ← temp[31]
    RT[32:63] ← SATURATE(ovl, temp[31], 0xFFFF_FFFF, 0xFFFF_FFFF, temp[32:63])
    ACC[0:63] ← (RT)[0:63]
    SPEFSCR[OVH] ← ovh
    SPEFSCR[OV] ← ovl
    SPEFSCR[SOVH] ← SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] ← SPEFSCR[SOV] | ovl

Each unsigned-integer word element in RA is zero-extended and added to the corresponding zero-extended element in the accumulator, saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Add Unsigned, Modulo, Integer to Accumulator Word EVX-form

evaddumiaaw RT,RA

    4     RT    RA    ///   1224
    0     6     11    16    21      31

    RT[0:31] ← (ACC)[0:31] + (RA)[0:31]
    RT[32:63] ← (ACC)[32:63] + (RA)[32:63]
    ACC[0:63] ← (RT)[0:63]

Each unsigned-integer word element in RA is added to the corresponding element in the accumulator and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC

Vector Add Word EVX-form

evaddw RT,RA,RB

    4     RT    RA    RB    512
    0     6     11    16    21      31

    RT[0:31] ← (RA)[0:31] + (RB)[0:31]
    RT[32:63] ← (RA)[32:63] + (RB)[32:63]

The corresponding elements of RA and RB are added and the results are placed in RT. The sum is a modulo sum.

Special Registers Altered:
    None

Vector AND EVX-form

evand RT,RA,RB

    4     RT    RA    RB    529
    0     6     11    16    21      31

    RT[0:31] ← (RA)[0:31] & (RB)[0:31]
    RT[32:63] ← (RA)[32:63] & (RB)[32:63]

The corresponding elements of RA and RB are ANDed bitwise and the results are placed in the corresponding element of RT.

Special Registers Altered:
    None

Vector AND with Complement EVX-form

evandc RT,RA,RB

    4     RT    RA    RB    530
    0     6     11    16    21      31

    RT[0:31] ← (RA)[0:31] & (¬(RB)[0:31])
    RT[32:63] ← (RA)[32:63] & (¬(RB)[32:63])

The word elements of RA are ANDed bitwise with the complement of the corresponding elements of RB. The results are placed in the corresponding element of RT.

Special Registers Altered:
    None

Vector Compare Equal EVX-form

evcmpeq BF,RA,RB

    4     BF    //    RA    RB    564
    0     6     9     11    16    21      31

    ah ← (RA)[0:31]
    al ← (RA)[32:63]
    bh ← (RB)[0:31]
    bl ← (RB)[32:63]
    if (ah = bh) then ch ← 1
    else ch ← 0
    if (al = bl) then cl ← 1
    else cl ← 0
    CR[4×BF+32:4×BF+35] ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is equal to the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is equal to the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Special Registers Altered:
    CR field BF

Vector Compare Greater Than Signed EVX-form

evcmpgts BF,RA,RB

    4     BF    //    RA    RB    561
    0     6     9     11    16    21      31

    ah ← (RA)[0:31]
    al ← (RA)[32:63]
    bh ← (RB)[0:31]
    bl ← (RB)[32:63]
    if (ah > bh) then ch ← 1
    else ch ← 0
    if (al > bl) then cl ← 1
    else cl ← 0
    CR[4×BF+32:4×BF+35] ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is greater than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is greater than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Special Registers Altered:
    CR field BF

Vector Compare Greater Than Unsigned EVX-form

evcmpgtu BF,RA,RB

    4     BF    //    RA    RB    560
    0     6     9     11    16    21      31

    ah ← (RA)[0:31]
    al ← (RA)[32:63]
    bh ← (RB)[0:31]
    bl ← (RB)[32:63]
    if (ah >u bh) then ch ← 1
    else ch ← 0
    if (al >u bl) then cl ← 1
    else cl ← 0
    CR[4×BF+32:4×BF+35] ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is greater than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is greater than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Special Registers Altered:
    CR field BF

Vector Compare Less Than Signed EVX-form

evcmplts BF,RA,RB

    4     BF    //    RA    RB    563
    0     6     9     11    16    21      31

    ah ← (RA)[0:31]
    al ← (RA)[32:63]
    bh ← (RB)[0:31]
    bl ← (RB)[32:63]
    if (ah < bh) then ch ← 1
    else ch ← 0
    if (al < bl) then cl ← 1
    else cl ← 0
    CR[4×BF+32:4×BF+35] ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is less than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is less than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Special Registers Altered:
    CR field BF

Vector Compare Less Than Unsigned EVX-form

evcmpltu BF,RA,RB

    4     BF    //    RA    RB    562
    0     6     9     11    16    21      31

    ah ← (RA)[0:31]
    al ← (RA)[32:63]
    bh ← (RB)[0:31]
    bl ← (RB)[32:63]
    if (ah ...

[...]

    ...
    RT[32:63] ← n

The leading sign bits in each element of RA are counted, and the respective count is placed into each element of RT.

Special Registers Altered:
    None

Programming Note
evcntlzw is used for unsigned operands; evcntlsw is used for signed operands.

Vector Count Leading Zeros Word EVX-form

evcntlzw RT,RA

    4     RT    RA    ///   525
    0     6     11    16    21      31

    n ← 0
    do while n < 32
        if (RA)[n] = 1 then leave
        n ← n + 1
    RT[0:31] ← n
    n ← 0
    do while n < 32
        if (RA)[n+32] = 1 then leave
        n ← n + 1
    RT[32:63] ← n

The leading zero bits in each element of RA are counted, and the respective count is placed into each element of RT.

Special Registers Altered:
    None

[...]

    ... >= 0) & (dvh = 0)) then
        RT[0:31] ← 0x7FFFFFFF
        ovh ← 1
    else if (ddh = 0x8000_0000) & (dvh = 0xFFFF_FFFF) then
        RT[0:31] ← 0x7FFFFFFF
        ovh ← 1
    if ((ddl < 0) & (dvl = 0)) then
        RT[32:63] ← 0x8000_0000
        ovl ← 1
    else if ((ddl >= 0) & (dvl = 0)) then
        RT[32:63] ← 0x7FFFFFFF
        ovl ← 1
    else if (ddl = 0x8000_0000) & (dvl = 0xFFFF_FFFF) then
        RT[32:63] ← 0x7FFFFFFF
        ovl ← 1
    SPEFSCR[OVH] ← ovh
    SPEFSCR[OV] ← ovl
    SPEFSCR[SOVH] ← SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] ← SPEFSCR[SOV] | ovl

The two dividends are the two elements of the contents of RA. The two divisors are the two elements of the contents of RB. The resulting two 32-bit quotients on each element are placed into RT. The remainders are not supplied. The operands and quotients are interpreted as signed integers.

Special Registers Altered:
    OV OVH SOV SOVH

Programming Note
Note that any overflow indication is always set as a side effect of this instruction. No form is defined that disables the setting of the overflow bits. In case of overflow, a saturated value is delivered into the destination register.

Vector Divide Word Unsigned EVX-form

evdivwu RT,RA,RB

    4     RT    RA    RB    1223
    0     6     11    16    21      31

    ddh ← (RA)[0:31]
    ddl ← (RA)[32:63]
    dvh ← (RB)[0:31]
    dvl ← (RB)[32:63]
    RT[0:31] ← ddh ÷ dvh
    RT[32:63] ← ddl ÷ dvl
    ovh ← 0
    ovl ← 0
    if (dvh = 0) then
        RT[0:31] ← 0xFFFFFFFF
        ovh ← 1
    if (dvl = 0) then
        RT[32:63] ← 0xFFFFFFFF
        ovl ← 1
    SPEFSCR[OVH] ← ovh
    SPEFSCR[OV] ← ovl
    SPEFSCR[SOVH] ← SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] ← SPEFSCR[SOV] | ovl

The two dividends are the two elements of the contents of RA. The two divisors are the two elements of the contents of RB. Two 32-bit quotients are formed as a result of the division on each of the high and low elements and the quotients are placed into RT. Remainders are not supplied. Operands and quotients are interpreted as unsigned integers.

Special Registers Altered:
    OV OVH SOV SOVH

Programming Note
Note that any overflow indication is always set as a side effect of this instruction. No form is defined that disables the setting of the overflow bits. In case of overflow, a saturated value is delivered into the destination register.

Vector Equivalent EVX-form

eveqv RT,RA,RB

    4     RT    RA    RB    537
    0     6     11    16    21      31

    RT[0:31] ← (RA)[0:31] ≡ (RB)[0:31]
    RT[32:63] ← (RA)[32:63] ≡ (RB)[32:63]

The corresponding elements of RA and RB are XORed bitwise, and the complemented results are placed in RT.

Special Registers Altered:
    None

Vector Extend Sign Byte EVX-form

evextsb RT,RA

    4     RT    RA    ///   522
    0     6     11    16    21      31

    RT[0:31] ← EXTS((RA)[24:31])
    RT[32:63] ← EXTS((RA)[56:63])

The signs of the low-order byte in each of the elements in RA are extended, and the results are placed in RT.

Special Registers Altered:
    None

Vector Extend Sign Halfword EVX-form

evextsh RT,RA

    4     RT    RA    ///   523
    0     6     11    16    21      31

    RT[0:31] ← EXTS((RA)[16:31])
    RT[32:63] ← EXTS((RA)[48:63])

The signs of the odd halfwords in each of the elements in RA are extended, and the results are placed in RT.

Special Registers Altered:
    None
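The divide-by-zero handling of evdivwu can be sketched as follows. This is an illustrative model only: `evdivwu` here is a hypothetical Python function that takes the 64-bit contents of RA and RB and returns the RT value together with the ovh and ovl indications; the update of the sticky SOVH and SOV bits is left out.

```python
MASK32 = 0xFFFF_FFFF


def _divwu(dd: int, dv: int):
    """Unsigned 32-bit divide of one element; returns (quotient, overflow)."""
    if dv == 0:
        return 0xFFFF_FFFF, 1   # saturated value delivered on divide by zero
    return dd // dv, 0


def evdivwu(ra: int, rb: int):
    """Model of evdivwu; returns (rt, ovh, ovl)."""
    ddh, ddl = (ra >> 32) & MASK32, ra & MASK32
    dvh, dvl = (rb >> 32) & MASK32, rb & MASK32
    qh, ovh = _divwu(ddh, dvh)
    ql, ovl = _divwu(ddl, dvl)
    return (qh << 32) | ql, ovh, ovl
```

For example, dividing the elements (10, 7) by (2, 0) gives a high quotient of 5 with no overflow and a low result of 0xFFFF_FFFF with ovl set.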
Vector Load Double Word into Double Word EVX-form

evldd RT,D(RA)

    4     RT    RA    UI    769
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×8)
    RT ← MEM(EA, 8)

D in the instruction mnemonic is UI × 8. The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double Word into Double Word Indexed EVX-form

evlddx RT,RA,RB

    4     RT    RA    RB    768
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA, 8)

The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Four Halfwords EVX-form

evldh RT,D(RA)

    4     RT    RA    UI    773
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×8)
    RT[0:15] ← MEM(EA, 2)
    RT[16:31] ← MEM(EA+2, 2)
    RT[32:47] ← MEM(EA+4, 2)
    RT[48:63] ← MEM(EA+6, 2)

D in the instruction mnemonic is UI × 8. The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Four Halfwords Indexed EVX-form

evldhx RT,RA,RB

    4     RT    RA    RB    772
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:15] ← MEM(EA, 2)
    RT[16:31] ← MEM(EA+2, 2)
    RT[32:47] ← MEM(EA+4, 2)
    RT[48:63] ← MEM(EA+6, 2)

The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Two Words EVX-form

evldw RT,D(RA)

    4     RT    RA    UI    771
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×8)
    RT[0:31] ← MEM(EA, 4)
    RT[32:63] ← MEM(EA+4, 4)

D in the instruction mnemonic is UI × 8. The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Two Words Indexed EVX-form

evldwx RT,RA,RB

    4     RT    RA    RB    770
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:31] ← MEM(EA, 4)
    RT[32:63] ← MEM(EA+4, 4)

The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfwords Even and Splat EVX-form

evlhhesplat RT,D(RA)

    4     RT    RA    UI    777
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×2)
    RT[0:15] ← MEM(EA, 2)
    RT[16:31] ← 0x0000
    RT[32:47] ← MEM(EA, 2)
    RT[48:63] ← 0x0000

D in the instruction mnemonic is UI × 2. The halfword addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Halfword into Halfwords Even and Splat Indexed EVX-form

evlhhesplatx RT,RA,RB

    4     RT    RA    RB    776
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:15] ← MEM(EA, 2)
    RT[16:31] ← 0x0000
    RT[32:47] ← MEM(EA, 2)
    RT[48:63] ← 0x0000

The halfword addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Signed and Splat EVX-form

evlhhossplat RT,D(RA)

    4     RT    RA    UI    783
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×2)
    RT[0:31] ← EXTS(MEM(EA, 2))
    RT[32:63] ← EXTS(MEM(EA, 2))

D in the instruction mnemonic is UI × 2. The halfword addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Signed and Splat Indexed EVX-form

evlhhossplatx RT,RA,RB

    4     RT    RA    RB    782
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:31] ← EXTS(MEM(EA, 2))
    RT[32:63] ← EXTS(MEM(EA, 2))

The halfword addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Unsigned and Splat EVX-form

evlhhousplat RT,D(RA)

    4     RT    RA    UI    781
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×2)
    RT[0:31] ← EXTZ(MEM(EA, 2))
    RT[32:63] ← EXTZ(MEM(EA, 2))

D in the instruction mnemonic is UI × 2. The halfword addressed by EA is loaded from memory and placed in the odd halfwords zero-extended in each element of RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed EVX-form

evlhhousplatx RT,RA,RB

    4     RT    RA    RB    780
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:31] ← EXTZ(MEM(EA, 2))
    RT[32:63] ← EXTZ(MEM(EA, 2))

The halfword addressed by EA is loaded from memory and placed in the odd halfwords zero-extended in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Even EVX-form

evlwhe RT,D(RA)

    4     RT    RA    UI    785
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT[0:15] ← MEM(EA, 2)
    RT[16:31] ← 0x0000
    RT[32:47] ← MEM(EA+2, 2)
    RT[48:63] ← 0x0000

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Even Indexed EVX-form

evlwhex RT,RA,RB

    4     RT    RA    RB    784
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:15] ← MEM(EA, 2)
    RT[16:31] ← 0x0000
    RT[32:47] ← MEM(EA+2, 2)
    RT[48:63] ← 0x0000

The word addressed by EA is loaded from memory and placed in the even halfwords in each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Signed (with sign extension) EVX-form

evlwhos RT,D(RA)

    4     RT    RA    UI    791
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT[0:31] ← EXTS(MEM(EA, 2))
    RT[32:63] ← EXTS(MEM(EA+2, 2))

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) EVX-form

evlwhosx RT,RA,RB

    4     RT    RA    RB    790
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:31] ← EXTS(MEM(EA, 2))
    RT[32:63] ← EXTS(MEM(EA+2, 2))

The word addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) EVX-form

evlwhou RT,D(RA)

    4     RT    RA    UI    789
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT[0:31] ← EXTZ(MEM(EA, 2))
    RT[32:63] ← EXTZ(MEM(EA+2, 2))

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in the odd halfwords zero-extended in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) EVX-form

evlwhoux RT,RA,RB

    4     RT    RA    RB    788
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:31] ← EXTZ(MEM(EA, 2))
    RT[32:63] ← EXTZ(MEM(EA+2, 2))

The word addressed by EA is loaded from memory and placed in the odd halfwords zero-extended in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords and Splat EVX-form

evlwhsplat RT,D(RA)

    4     RT    RA    UI    797
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT[0:15] ← MEM(EA, 2)
    RT[16:31] ← MEM(EA, 2)
    RT[32:47] ← MEM(EA+2, 2)
    RT[48:63] ← MEM(EA+2, 2)

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in both the even and odd halfwords in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords and Splat Indexed EVX-form

evlwhsplatx RT,RA,RB

    4     RT    RA    RB    796
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:15] ← MEM(EA, 2)
    RT[16:31] ← MEM(EA, 2)
    RT[32:47] ← MEM(EA+2, 2)
    RT[48:63] ← MEM(EA+2, 2)

The word addressed by EA is loaded from memory and placed in both the even and odd halfwords in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Word and Splat EVX-form

evlwwsplat RT,D(RA)

    4     RT    RA    UI    793
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT[0:31] ← MEM(EA, 4)
    RT[32:63] ← MEM(EA, 4)

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in both elements of RT.

Special Registers Altered:
    None

Vector Load Word into Word and Splat Indexed EVX-form

evlwwsplatx RT,RA,RB

    4     RT    RA    RB    792
    0     6     11    16    21      31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT[0:31] ← MEM(EA, 4)
    RT[32:63] ← MEM(EA, 4)

The word addressed by EA is loaded from memory and placed in both elements of RT.

Special Registers Altered:
    None

Vector Merge High EVX-form

evmergehi RT,RA,RB

    4     RT    RA    RB    556
    0     6     11    16    21      31

    RT[0:31] ← (RA)[0:31]
    RT[32:63] ← (RB)[0:31]

The high-order elements of RA and RB are merged and placed in RT.

Vector Merge Low EVX-form

evmergelo RT,RA,RB

    4     RT    RA    RB    557
    0     6     11    16    21      31

    RT[0:31] ← (RA)[32:63]
    RT[32:63] ← (RB)[32:63]

The low-order elements of RA and RB are merged and placed in RT.
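The halfword placement performed by the load forms described above (even halfwords, sign-extended odd halfwords, splat) is easier to see on concrete bytes. The sketch below is a model under stated assumptions: memory is a Python `bytes` object, MEM(EA, n) is read in big-endian byte order (an assumption made for illustration), `b` is the base value already selected by the RA = 0 test, and the function names simply reuse the mnemonics.

```python
def mem(buf: bytes, ea: int, n: int) -> int:
    """MEM(EA, n): n bytes starting at ea, read big-endian (assumed)."""
    return int.from_bytes(buf[ea:ea + n], "big")


def exts16(h: int) -> int:
    """Sign-extend a 16-bit halfword to a 32-bit word pattern."""
    return h | 0xFFFF_0000 if h & 0x8000 else h


def evlhhesplat(buf: bytes, b: int, ui: int) -> int:
    """Halfword into the even halfword of both elements; odd halfwords are 0."""
    h = mem(buf, b + ui * 2, 2)
    return (h << 48) | (h << 16)


def evlhhossplat(buf: bytes, b: int, ui: int) -> int:
    """Halfword sign-extended into both word elements."""
    w = exts16(mem(buf, b + ui * 2, 2))
    return (w << 32) | w


def evlwhe(buf: bytes, b: int, ui: int) -> int:
    """Two halfwords of a word into the even halfwords of the two elements."""
    ea = b + ui * 4
    return (mem(buf, ea, 2) << 48) | (mem(buf, ea + 2, 2) << 16)
```

With the bytes 12 34 80 01 at address 0, `evlhhesplat(buf, 0, 1)` reads the halfword 0x8001 and produces 0x8001_0000_8001_0000, while `evlhhossplat(buf, 0, 1)` produces 0xFFFF_8001_FFFF_8001.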
Special Registers Altered: Special Registers Altered: None None Programming Note Programming Note A vector splat high can be performed by specifying A vector splat low can be performed by specifying the same register in RA and RB. the same register in RA and RB. Chapter 8. Signal Processing Engine (SPE) 521 Version 2.06 Vector Merge High/Low EVX-form Vector Merge Low/High EVX-form evmergehilo RT,RA,RB evmergelohi RT,RA,RB 4 RT RA RB 558 4 RT RA RB 559 0 6 11 16 21 31 0 6 11 16 21 31 RT0:31 (RA)0:31 RT0:31 (RA)32:63 RT32:63 (RB)32:63 RT32:63 (RB)0:31 The high-order element of RA and the low-order ele- The low-order element of RA and the high-order ele- ment of RB are merged and placed in RT. ment of RB are merged and placed in RT. Special Registers Altered: Special Registers Altered: None None Programming Note Programming Note With appropriate specification of RA and RB, A vector swap can be performed by specifying the evmergehi, evmergelo, evmergehilo, and same register in RA and RB. evmergelohi provide a full 32-bit permute of two source operands. Vector Multiply Halfwords, Even, Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Guarded, Signed, Modulo, Fractional and Accumulate EVX-form Accumulate Negative EVX-form evmhegsmfaa RT,RA,RB evmhegsmfan RT,RA,RB 4 RT RA RB 1323 4 RT RA RB 1451 0 6 11 16 21 31 0 6 11 16 21 31 temp0:63 (RA)32:47 ×gsf (RB)32:47 temp0:63 (RA)32:47 ×gsf (RB)32:47 RT0:63 (ACC)0:63 + temp0:63 RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 ACC0:63 (RT)0:63 The corresponding low even-numbered, halfword The corresponding low even-numbered, halfword signed fractional elements in RA and RB are multiplied signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication produc- using guarded signed fractional multiplication produc- ing a sign extended 64-bit fractional product with the ing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. 
The product is added decimal between bits 32 and 33. The product is sub- to the contents of the 64-bit accumulator and the result tracted from the contents of the 64-bit accumulator and is placed in RT and the accumulator the result is placed in RT and the accumulator. Special Registers Altered: Special Registers Altered: ACC ACC Note Note If the two input operands are both -1.0, the interme- If the two input operands are both -1.0, the interme- diate product is represented as +1.0. diate product is represented as +1.0. 522 Power ISATM Book I Version 2.06 Vector Multiply Halfwords, Even, Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Guarded, Signed, Modulo, Integer and Accumulate EVX-form Accumulate Negative EVX-form evmhegsmiaa RT,RA,RB evmhegsmian RT,RA,RB 4 RT RA RB 1321 4 RT RA RB 1449 0 6 11 16 21 31 0 6 11 16 21 31 temp0:31 (RA)32:47 ×si (RB)32:47 temp0:31 (RA)32:47 ×si (RB)32:47 temp0:63 EXTS(temp0:31) temp0:63 EXTS(temp0:31) RT0:63 (ACC)0:63 + temp0:63 RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 ACC0:63 (RT)0:63 The corresponding low even-numbered halfword The corresponding low even-numbered halfword signed-integer elements in RA and RB are multiplied. signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended and added The intermediate product is sign-extended and sub- to the contents of the 64-bit accumulator, and the tracted from the contents of the 64-bit accumulator, and resulting sum is placed in RT and into the accumulator. the result is placed in RT and into the accumulator. 
Special Registers Altered: Special Registers Altered: ACC ACC Vector Multiply Halfwords, Even, Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Guarded, Unsigned, Modulo, Integer and Accumulate EVX-form Accumulate Negative EVX-form evmhegumiaa RT,RA,RB evmhegumian RT,RA,RB 4 RT RA RB 1320 4 RT RA RB 1448 0 6 11 16 21 31 0 6 11 16 21 31 temp0:31 (RA)32:47 ×ui (RB)32:47 temp0:31 (RA)32:47 ×ui (RB)32:47 temp0:63 EXTZ(temp0:31) temp0:63 EXTZ(temp0:31) RT0:63 (ACC)0:63 + temp0:63 RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 ACC0:63 (RT)0:63 The corresponding low even-numbered halfword The corresponding low even-numbered unsigned-inte- unsigned-integer elements in RA and RB are multi- ger elements in RA and RB are multiplied. The interme- plied. The intermediate product is zero-extended and diate product is zero-extended and subtracted from the added to the contents of the 64-bit accumulator. The contents of the 64-bit accumulator. The result is placed resulting sum is placed in RT and into the accumulator. in RT and into the accumulator. Special Registers Altered: Special Registers Altered: ACC ACC Chapter 8. Signal Processing Engine (SPE) 523 Version 2.06 Vector Multiply Halfwords, Even, Signed, Vector Multiply Halfwords, Even, Signed, Modulo, Fractional EVX-form Modulo, Fractional to Accumulator EVX-form evmhesmf RT,RA,RB evmhesmfa RT,RA,RB 4 RT RA RB 1035 0 6 11 16 21 31 4 RT RA RB 1067 0 6 11 16 21 31 RT0:31 (RA)0:15 ×sf (RB)0:15 RT32:63 (RA)32:47 ×sf (RB)32:47 RT0:31 (RA)0:15 ×sf (RB)0:15 The corresponding even-numbered halfword signed RT32:63 (RA)32:47 ×sf (RB)32:47 fractional elements in RA and RB are multiplied then ACC0:63 (RT)0:63 placed into the corresponding words of RT. The corresponding even-numbered halfword signed Special Registers Altered: fractional elements in RA and RB are multiplied then None placed into the corresponding words of RT and into the accumulator. 
Special Registers Altered:
ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words EVX-form

evmhesmfaaw RT,RA,RB

4 RT RA RB 1291
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×sf (RB)0:15
RT0:31 ← (ACC)0:31 + temp0:31
temp0:31 ← (RA)32:47 ×sf (RB)32:47
RT32:63 ← (ACC)32:63 + temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are added to the contents of the accumulator words to form intermediate sums, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words EVX-form

evmhesmfanw RT,RA,RB

4 RT RA RB 1419
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×sf (RB)0:15
RT0:31 ← (ACC)0:31 - temp0:31
temp0:31 ← (RA)32:47 ×sf (RB)32:47
RT32:63 ← (ACC)32:63 - temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32-bit intermediate products are subtracted from the contents of the accumulator words to form intermediate differences, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Integer EVX-form

evmhesmi RT,RA,RB

4 RT RA RB 1033
0 6 11 16 21 31

RT0:31 ← (RA)0:15 ×si (RB)0:15
RT32:63 ← (RA)32:47 ×si (RB)32:47

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator EVX-form

evmhesmia RT,RA,RB

4 RT RA RB 1065
0 6 11 16 21 31

RT0:31 ← (RA)0:15 ×si (RB)0:15
RT32:63 ← (RA)32:47 ×si (RB)32:47
ACC0:63 ← (RT)0:63

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words EVX-form

evmhesmiaaw RT,RA,RB

4 RT RA RB 1289
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×si (RB)0:15
RT0:31 ← (ACC)0:31 + temp0:31
temp0:31 ← (RA)32:47 ×si (RB)32:47
RT32:63 ← (ACC)32:63 + temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is added to the contents of the accumulator words to form intermediate sums, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words EVX-form

evmhesmianw RT,RA,RB

4 RT RA RB 1417
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×si (RB)0:15
RT0:31 ← (ACC)0:31 - temp0:31
temp0:31 ← (RA)32:47 ×si (RB)32:47
RT32:63 ← (ACC)32:63 - temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is subtracted from the contents of the accumulator words to form intermediate differences, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
ACC
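The modulo "accumulate into words" forms above can be modeled in a few lines. The following is a minimal sketch of evmhesmiaaw and evmhesmianw, assuming registers are held as Python integers and using the document's big-endian bit numbering (bit 0 is the most significant bit). The helper names `halfword`, `to_signed16`, and the `negate` parameter are illustrative and do not come from the architecture.

```python
MASK32 = 0xFFFF_FFFF


def halfword(reg64, first_bit):
    """Extract the 16-bit field starting at big-endian bit first_bit (bit 0 = MSB)."""
    return (reg64 >> (48 - first_bit)) & 0xFFFF


def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement signed integer."""
    return value - 0x1_0000 if value & 0x8000 else value


def evmhesmiaaw(acc, ra, rb, negate=False):
    """Return the 64-bit value written to both RT and ACC.

    negate=True models evmhesmianw (product subtracted from the accumulator).
    """
    words = []
    # Even halfwords sit at bits 0:15 and 32:47; they pair with ACC words 0:31 and 32:63.
    for hw_bit, shift in ((0, 32), (32, 0)):
        product = to_signed16(halfword(ra, hw_bit)) * to_signed16(halfword(rb, hw_bit))
        acc_word = (acc >> shift) & MASK32
        total = acc_word - product if negate else acc_word + product
        words.append(total & MASK32)  # modulo form: wrap to 32 bits, no saturation
    return (words[0] << 32) | words[1]
```

Because the form is modulo, each word simply wraps on overflow and no SPEFSCR bits are involved, which is why only ACC appears under Special Registers Altered.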
Vector Multiply Halfwords, Even, Signed, Saturate, Fractional EVX-form

evmhessf RT,RA,RB

4 RT RA RB 1027
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×sf (RB)0:15
if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:31 ← (RA)32:47 ×sf (RB)32:47
if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered:
OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator EVX-form

evmhessfa RT,RA,RB

4 RT RA RB 1059
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×sf (RB)0:15
if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:31 ← (RA)32:47 ×sf (RB)32:47
if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
ACC0:63 ← (RT)0:63
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.
Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words EVX-form

evmhessfaaw RT,RA,RB

4 RT RA RB 1283
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×sf (RB)0:15
if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    movh ← 0
temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)32:47 ×sf (RB)32:47
if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movl ← 1
else
    movl ← 0
temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh | movh
SPEFSCROV ← ovl | movl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words EVX-form

evmhessfanw RT,RA,RB

4 RT RA RB 1411
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×sf (RB)0:15
if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    movh ← 0
temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)32:47 ×sf (RB)32:47
if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movl ← 1
else
    movl ← 0
temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh | movh
SPEFSCROV ← ovl | movl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words EVX-form

evmhessiaaw RT,RA,RB

4 RT RA RB 1281
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×si (RB)0:15
temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)32:47 ×si (RB)32:47
temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words EVX-form

evmhessianw RT,RA,RB

4 RT RA RB 1409
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×si (RB)0:15
temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)32:47 ×si (RB)32:47
temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer EVX-form

evmheumi RT,RA,RB

4 RT RA RB 1032
0 6 11 16 21 31

RT0:31 ← (RA)0:15 ×ui (RB)0:15
RT32:63 ← (RA)32:47 ×ui (RB)32:47

The corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator EVX-form

evmheumia RT,RA,RB

4 RT RA RB 1064
0 6 11 16 21 31

RT0:31 ← (RA)0:15 ×ui (RB)0:15
RT32:63 ← (RA)32:47 ×ui (RB)32:47
ACC0:63 ← (RT)0:63

The corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words EVX-form

evmheumiaaw RT,RA,RB

4 RT RA RB 1288
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×ui (RB)0:15
RT0:31 ← (ACC)0:31 + temp0:31
temp0:31 ← (RA)32:47 ×ui (RB)32:47
RT32:63 ← (ACC)32:63 + temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is added to the contents of the corresponding accumulator words and the sums are placed into the corresponding RT and accumulator words.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX-form

evmheumianw RT,RA,RB

4 RT RA RB 1416
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×ui (RB)0:15
RT0:31 ← (ACC)0:31 - temp0:31
temp0:31 ← (RA)32:47 ×ui (RB)32:47
RT32:63 ← (ACC)32:63 - temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is subtracted from the contents of the corresponding accumulator words. The differences are placed into the corresponding RT and accumulator words.

Special Registers Altered:
ACC
Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words EVX-form

evmheusiaaw RT,RA,RB

4 RT RA RB 1280
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×ui (RB)0:15
temp0:63 ← EXTZ((ACC)0:31) + EXTZ(temp0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
temp0:31 ← (RA)32:47 ×ui (RB)32:47
temp0:63 ← EXTZ((ACC)32:63) + EXTZ(temp0:31)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX-form

evmheusianw RT,RA,RB

4 RT RA RB 1408
0 6 11 16 21 31

temp0:31 ← (RA)0:15 ×ui (RB)0:15
temp0:63 ← EXTZ((ACC)0:31) - EXTZ(temp0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63)
temp0:31 ← (RA)32:47 ×ui (RB)32:47
temp0:63 ← EXTZ((ACC)32:63) - EXTZ(temp0:31)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate EVX-form

evmhogsmfaa RT,RA,RB

4 RT RA RB 1327
0 6 11 16 21 31

temp0:63 ← (RA)48:63 ×gsf (RB)48:63
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The corresponding low odd-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC

Note
If the two input operands are both -1.0, the intermediate product is represented as +1.0.

Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX-form

evmhogsmfan RT,RA,RB

4 RT RA RB 1455
0 6 11 16 21 31

temp0:63 ← (RA)48:63 ×gsf (RB)48:63
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The corresponding low odd-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC

Note
If the two input operands are both -1.0, the intermediate product is represented as +1.0.
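The guarded fractional forms are easiest to see with concrete bit patterns. The sketch below models ×gsf as a signed 16x16 multiply shifted left by one and sign-extended to 64 bits, which is one reading of the description above (binary point between bits 32 and 33, so +1.0 is 0x0000_0000_8000_0000). The function names and the `negate` parameter are illustrative assumptions, not architected names.

```python
MASK64 = (1 << 64) - 1


def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement signed integer."""
    return value - 0x1_0000 if value & 0x8000 else value


def mul_gsf(a16, b16):
    """Guarded signed fractional multiply, returned as a 64-bit two's-complement pattern.

    The shift by one aligns the binary point; the 64-bit width is the guard
    that lets -1.0 * -1.0 come out as +1.0 instead of overflowing.
    """
    product = (to_signed16(a16) * to_signed16(b16)) << 1
    return product & MASK64


def evmhogsmfaa(acc, ra, rb, negate=False):
    """Return the value written to RT and ACC; negate=True models evmhogsmfan."""
    # Bits 48:63 in big-endian numbering are the low 16 bits of the register.
    temp = mul_gsf(ra & 0xFFFF, rb & 0xFFFF)
    total = acc - temp if negate else acc + temp
    return total & MASK64  # modulo 2**64, no saturation
```

With this model, `mul_gsf(0x8000, 0x8000)` yields 0x0000_0000_8000_0000, matching the Note that -1.0 times -1.0 is represented as +1.0.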
Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate EVX-form

evmhogsmiaa RT,RA,RB

4 RT RA RB 1325
0 6 11 16 21 31

temp0:31 ← (RA)48:63 ×si (RB)48:63
temp0:63 ← EXTS(temp0:31)
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended to 64 bits then added to the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX-form

evmhogsmian RT,RA,RB

4 RT RA RB 1453
0 6 11 16 21 31

temp0:31 ← (RA)48:63 ×si (RB)48:63
temp0:63 ← EXTS(temp0:31)
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended to 64 bits then subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate EVX-form

evmhogumiaa RT,RA,RB

4 RT RA RB 1324
0 6 11 16 21 31

temp0:31 ← (RA)48:63 ×ui (RB)48:63
temp0:63 ← EXTZ(temp0:31)
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended to 64 bits then added to the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX-form

evmhogumian RT,RA,RB

4 RT RA RB 1452
0 6 11 16 21 31

temp0:31 ← (RA)48:63 ×ui (RB)48:63
temp0:63 ← EXTZ(temp0:31)
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended to 64 bits then subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional EVX-form

evmhosmf RT,RA,RB

4 RT RA RB 1039
0 6 11 16 21 31

RT0:31 ← (RA)16:31 ×sf (RB)16:31
RT32:63 ← (RA)48:63 ×sf (RB)48:63

The corresponding odd-numbered, halfword signed fractional elements in RA and RB are multiplied. Each product is placed into the corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator EVX-form

evmhosmfa RT,RA,RB

4 RT RA RB 1071
0 6 11 16 21 31

RT0:31 ← (RA)16:31 ×sf (RB)16:31
RT32:63 ← (RA)48:63 ×sf (RB)48:63
ACC0:63 ← (RT)0:63

The corresponding odd-numbered, halfword signed fractional elements in RA and RB are multiplied. Each product is placed into the corresponding words of RT and into the accumulator.
Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words EVX-form

evmhosmfaaw RT,RA,RB

4 RT RA RB 1295
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×sf (RB)16:31
RT0:31 ← (ACC)0:31 + temp0:31
temp0:31 ← (RA)48:63 ×sf (RB)48:63
RT32:63 ← (ACC)32:63 + temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are added to the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words EVX-form

evmhosmfanw RT,RA,RB

4 RT RA RB 1423
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×sf (RB)16:31
RT0:31 ← (ACC)0:31 - temp0:31
temp0:31 ← (RA)48:63 ×sf (RB)48:63
RT32:63 ← (ACC)32:63 - temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are subtracted from the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer EVX-form

evmhosmi RT,RA,RB

4 RT RA RB 1037
0 6 11 16 21 31

RT0:31 ← (RA)16:31 ×si (RB)16:31
RT32:63 ← (RA)48:63 ×si (RB)48:63

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator EVX-form

evmhosmia RT,RA,RB

4 RT RA RB 1069
0 6 11 16 21 31

RT0:31 ← (RA)16:31 ×si (RB)16:31
RT32:63 ← (RA)48:63 ×si (RB)48:63
ACC0:63 ← (RT)0:63

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words EVX-form

evmhosmiaaw RT,RA,RB

4 RT RA RB 1293
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×si (RB)16:31
RT0:31 ← (ACC)0:31 + temp0:31
temp0:31 ← (RA)48:63 ×si (RB)48:63
RT32:63 ← (ACC)32:63 + temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is added to the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words EVX-form

evmhosmianw RT,RA,RB

4 RT RA RB 1421
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×si (RB)16:31
RT0:31 ← (ACC)0:31 - temp0:31
temp0:31 ← (RA)48:63 ×si (RB)48:63
RT32:63 ← (ACC)32:63 - temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is subtracted from the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional EVX-form

evmhossf RT,RA,RB

4 RT RA RB 1031
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×sf (RB)16:31
if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:31 ← (RA)48:63 ×sf (RB)48:63
if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered:
OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator EVX-form

evmhossfa RT,RA,RB

4 RT RA RB 1063
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×sf (RB)16:31
if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:31 ← (RA)48:63 ×sf (RB)48:63
if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
ACC0:63 ← (RT)0:63
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered:
ACC OV OVH SOV SOVH
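The saturating fractional multiply described for evmhossf can be sketched as follows. This is an illustrative model only: ×sf is taken to be a signed 16x16 multiply shifted left by one and truncated to 32 bits, SPEFSCR is represented as a plain dictionary of the four affected bits, and the names `mul_sf_saturate` and `spefscr` are assumptions made for the example.

```python
MASK32 = 0xFFFF_FFFF


def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement signed integer."""
    return value - 0x1_0000 if value & 0x8000 else value


def mul_sf_saturate(a16, b16):
    """Return (32-bit fractional product, overflow flag) for one halfword pair."""
    if a16 == 0x8000 and b16 == 0x8000:
        # -1.0 * -1.0 cannot be represented; clamp to the largest positive fraction.
        return 0x7FFF_FFFF, 1
    product = (to_signed16(a16) * to_signed16(b16)) << 1
    return product & MASK32, 0


def evmhossf(ra, rb, spefscr):
    """Return (RT, updated flag dictionary) for the odd halfwords of RA and RB."""
    # Odd halfwords: bits 16:31 (low half of the upper word) and bits 48:63.
    high, movh = mul_sf_saturate((ra >> 32) & 0xFFFF, (rb >> 32) & 0xFFFF)
    low, movl = mul_sf_saturate(ra & 0xFFFF, rb & 0xFFFF)
    flags = dict(spefscr)
    flags["OVH"] = movh                  # OVH and OV reflect this instruction only
    flags["OV"] = movl
    flags["SOVH"] = flags["SOVH"] | movh  # SOVH and SOV are sticky
    flags["SOV"] = flags["SOV"] | movl
    return (high << 32) | low, flags
```

The evmhossfa form would additionally copy the returned RT value into the accumulator.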
Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words EVX-form

evmhossfaaw RT,RA,RB

4 RT RA RB 1287
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×sf (RB)16:31
if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    movh ← 0
temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×sf (RB)48:63
if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movl ← 1
else
    movl ← 0
temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh | movh
SPEFSCROV ← ovl | movl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words EVX-form

evmhossfanw RT,RA,RB

4 RT RA RB 1415
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×sf (RB)16:31
if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    movh ← 0
temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×sf (RB)48:63
if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movl ← 1
else
    movl ← 0
temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh | movh
SPEFSCROV ← ovl | movl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words EVX-form

evmhossiaaw RT,RA,RB

4 RT RA RB 1285
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×si (RB)16:31
temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×si (RB)48:63
temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words EVX-form

evmhossianw RT,RA,RB

4 RT RA RB 1413
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×si (RB)16:31
temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×si (RB)48:63
temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer EVX-form

evmhoumi RT,RA,RB

4 RT RA RB 1036
0 6 11 16 21 31

RT0:31 ← (RA)16:31 ×ui (RB)16:31
RT32:63 ← (RA)48:63 ×ui (RB)48:63

The corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator EVX-form

evmhoumia RT,RA,RB

4 RT RA RB 1068
0 6 11 16 21 31

RT0:31 ← (RA)16:31 ×ui (RB)16:31
RT32:63 ← (RA)48:63 ×ui (RB)48:63
ACC0:63 ← (RT)0:63

The corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into RT and into the accumulator.

Special Registers Altered:
ACC
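The signed saturating accumulate used by evmhossiaaw and evmhossianw relies on comparing bits 31 and 32 of a 64-bit intermediate. The sketch below is one way to model that check, assuming SATURATE(ov, carry, neg_sat, pos_sat, value) selects neg_sat when ov and carry are both set, pos_sat when only ov is set, and value otherwise. The helper names and the `negate` parameter are illustrative.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret a bit pattern of the given width as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def saturate(ov, carry, neg_sat, pos_sat, value):
    """Assumed behavior of the RTL SATURATE helper."""
    if ov:
        return neg_sat if carry else pos_sat
    return value


def sat_accumulate_word(acc_word, product32, negate=False):
    """Return (32-bit result, overflow flag) for one accumulator word."""
    a = to_signed(acc_word, 32)      # EXTS((ACC) word)
    p = to_signed(product32, 32)     # EXTS(temp0:31)
    temp = (a - p if negate else a + p) & MASK64
    bit31 = (temp >> 32) & 1         # big-endian bit 31: true sign of the wide result
    bit32 = (temp >> 31) & 1         # big-endian bit 32: sign of the low 32 bits
    ov = bit31 ^ bit32               # they differ exactly when 32 bits cannot hold it
    return saturate(ov, bit31, 0x8000_0000, 0x7FFF_FFFF, temp & MASK32), ov


def evmhossiaaw(acc, ra, rb, negate=False):
    """Return (RT, ovh, ovl); negate=True models evmhossianw."""
    prod_h = to_signed((ra >> 32) & 0xFFFF, 16) * to_signed((rb >> 32) & 0xFFFF, 16)
    prod_l = to_signed(ra & 0xFFFF, 16) * to_signed(rb & 0xFFFF, 16)
    high, ovh = sat_accumulate_word((acc >> 32) & MASK32, prod_h & MASK32, negate)
    low, ovl = sat_accumulate_word(acc & MASK32, prod_l & MASK32, negate)
    return (high << 32) | low, ovh, ovl
```

The returned ovh and ovl values would then be written to OVH and OV and ORed into the sticky SOVH and SOV bits, as the SPEFSCR assignments in the RTL show.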
Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words EVX-form

evmhoumiaaw RT,RA,RB

4 RT RA RB 1292
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
RT0:31 ← (ACC)0:31 + temp0:31
temp0:31 ← (RA)48:63 ×ui (RB)48:63
RT32:63 ← (ACC)32:63 + temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is added to the contents of the corresponding accumulator word. The sums are placed into the corresponding RT and accumulator words.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX-form

evmhoumianw RT,RA,RB

4 RT RA RB 1420
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
RT0:31 ← (ACC)0:31 - temp0:31
temp0:31 ← (RA)48:63 ×ui (RB)48:63
RT32:63 ← (ACC)32:63 - temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is subtracted from the contents of the corresponding accumulator word. The results are placed into the corresponding RT and accumulator words.

Special Registers Altered:
ACC

Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words EVX-form

evmhousiaaw RT,RA,RB

4 RT RA RB 1284
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
temp0:63 ← EXTZ((ACC)0:31) + EXTZ(temp0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×ui (RB)48:63
temp0:63 ← EXTZ((ACC)32:63) + EXTZ(temp0:31)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX-form

evmhousianw RT,RA,RB

4 RT RA RB 1412
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
temp0:63 ← EXTZ((ACC)0:31) - EXTZ(temp0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63)
temp0:31 ← (RA)48:63 ×ui (RB)48:63
temp0:63 ← EXTZ((ACC)32:63) - EXTZ(temp0:31)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Initialize Accumulator EVX-form

evmra RT,RA

4 RT RA /// 1220
0 6 11 16 21 31

ACC0:63 ← (RA)0:63
RT0:63 ← (RA)0:63

The contents of RA are placed into the accumulator and RT. This is the method for initializing the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word High Signed, Modulo, Fractional EVX-form

evmwhsmf RT,RA,RB

4 RT RA RB 1103
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT32:63 ← temp0:31

The corresponding word signed fractional elements in RA and RB are multiplied and bits 0:31 of the two products are placed into the two corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Word High Signed, Modulo, Fractional to Accumulator EVX-form

evmwhsmfa RT,RA,RB

4 RT RA RB 1135
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT32:63 ← temp0:31
ACC0:63 ← (RT)0:63

The corresponding word signed fractional elements in RA and RB are multiplied and bits 0:31 of the two products are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word High Signed, Modulo, Integer EVX-form

evmwhsmi RT,RA,RB

4 RT RA RB 1101
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← temp0:31

The corresponding word signed-integer elements in RA and RB are multiplied. Bits 0:31 of the two 64-bit products are placed into the two corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Word High Signed, Modulo, Integer to Accumulator EVX-form

evmwhsmia RT,RA,RB

4 RT RA RB 1133
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← temp0:31
ACC0:63 ← (RT)0:63

The corresponding word signed-integer elements in RA and RB are multiplied. Bits 0:31 of the two 64-bit products are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word High Signed, Saturate, Fractional EVX-form

evmwhssf RT,RA,RB

4 RT RA RB 1095
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
if ((RA)0:31 = 0x8000_0000) & ((RB)0:31 = 0x8000_0000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding word signed fractional elements in RA and RB are multiplied. Bits 0:31 of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered:
OV OVH SOV SOVH

Vector Multiply Word High Signed, Saturate, Fractional to Accumulator EVX-form

evmwhssfa RT,RA,RB

4 RT RA RB 1127
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
if ((RA)0:31 = 0x8000_0000) & ((RB)0:31 = 0x8000_0000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
ACC0:63 ← (RT)0:63
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding word signed fractional elements in RA and RB are multiplied. Bits 0:31 of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Word High Unsigned, Modulo, Integer EVX-form

evmwhumi RT,RA,RB

4 RT RA RB 1100
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp0:31

The corresponding word unsigned-integer elements in RA and RB are multiplied. Bits 0:31 of the two products are placed into the two corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator EVX-form

evmwhumia RT,RA,RB

4 RT RA RB 1132
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp0:31
ACC0:63 ← (RT)0:63

The corresponding word unsigned-integer elements in RA and RB are multiplied. Bits 0:31 of the two products are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words EVX-form

evmwlsmiaaw RT,RA,RB

4 RT RA RB 1353
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← (ACC)0:31 + temp32:63
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← (ACC)32:63 + temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word signed-integer elements in RA and RB are multiplied. The least significant 32 bits of each intermediate product are added to the contents of the corresponding accumulator words, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words EVX-form

evmwlsmianw RT,RA,RB

4 RT RA RB 1481
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← (ACC)0:31 - temp32:63
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← (ACC)32:63 - temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word elements in RA and RB are multiplied. The least significant 32 bits of each intermediate product are subtracted from the contents of the corresponding accumulator words and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words EVX-form

evmwlssiaaw RT,RA,RB

4 RT RA RB 1345
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
temp0:63 ← EXTS((ACC)0:31) + EXTS(temp32:63)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:63 ← (RA)32:63 ×si (RB)32:63
temp0:63 ← EXTS((ACC)32:63) + EXTS(temp32:63)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words EVX-form

evmwlssianw RT,RA,RB

4 RT RA RB 1473
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
temp0:63 ← EXTS((ACC)0:31) - EXTS(temp32:63)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:63 ← (RA)32:63 ×si (RB)32:63
temp0:63 ← EXTS((ACC)32:63) - EXTS(temp32:63)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding word signed-integer elements in RA The corresponding word signed-integer elements in RA and RB are multiplied producing a 64-bit product. The and RB are multiplied producing a 64-bit product.
The least significant 32 bits of each product are then added least significant 32 bits of each product are then sub- to the corresponding word in the accumulator saturat- tracted from the corresponding word in the accumulator ing if overflow occurs, and the result is placed in RT saturating if overflow occurs, and the result is placed in and the accumulator. RT and the accumulator. Special Registers Altered: Special Registers Altered: ACC OV OVH SOV SOVH ACC OV OVH SOV SOVH Chapter 8. Signal Processing Engine (SPE) 541 Version 2.06 Vector Multiply Word Low Unsigned, Vector Multiply Word Low Unsigned, Modulo, Integer EVX-form Modulo, Integer to AccumulatorEVX-form evmwlumi RT,RA,RB evmwlumia RT,RA,RB 4 RT RA RB 1096 4 RT RA RB 1128 0 6 11 16 21 31 0 6 11 16 21 31 temp0:63 (RA)0:31 ×ui (RB)0:31 temp0:63 (RA)0:31 ×ui (RB)0:31 RT0:31 temp32:63 RT0:31 temp32:63 temp0:63 (RA)32:63 ×ui (RB)32:63 temp0:63 (RA)32:63 ×ui (RB)32:63 RT32:63 temp32:63 RT32:63 temp32:63 ACC0:63 (RT)0:63 The corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits The corresponding word unsigned-integer elements in of each product are placed into the two corresponding RA and RB are multiplied. The least significant 32 bits words of RT. of each product are placed into the two corresponding words of RT and into the accumulator. Special Registers Altered: None Special Registers Altered: ACC Programming Note The least significant 32 bits of the product are inde- Programming Note pendent of whether the word elements in RA and The least significant 32 bits of the product are inde- RB are treated as signed or unsigned 32-bit inte- pendent of whether the word elements in RA and gers. RB are treated as signed or unsigned 32-bit inte- gers. Note that evmwlumi can be used for signed or unsigned integers. Note that evmwlumia can be used for signed or unsigned integers. 
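The Programming Note for evmwlumi and evmwlumia can be checked with a small model. The following is a minimal sketch, not part of the architecture: the function names are hypothetical, registers are modelled as Python integers, and bits 0:31 are taken to be the most significant word of the 64-bit value, following the bit numbering used in the descriptions above.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def low_word_product(a, b, signed):
    """Least significant 32 bits of the 64-bit product of two word elements."""
    if signed:
        a, b = to_signed32(a), to_signed32(b)
    else:
        a, b = a & MASK32, b & MASK32
    return (a * b) & MASK32


def evmwlumi(ra, rb):
    """Hypothetical model of evmwlumi: returns the new 64-bit value of RT."""
    hi = low_word_product(ra >> 32, rb >> 32, signed=False)  # bits 0:31
    lo = low_word_product(ra, rb, signed=False)              # bits 32:63
    return (hi << 32) | lo
```

For any pair of word patterns, `low_word_product(a, b, True)` and `low_word_product(a, b, False)` return the same value, which is why one instruction serves both signed and unsigned operands.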
Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words EVX-form

evmwlumiaaw RT,RA,RB

4      RT     RA     RB     1352
0      6      11     16     21     31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← (ACC)0:31 + temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← (ACC)32:63 + temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are added to the contents of the corresponding accumulator word and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words EVX-form

evmwlumianw RT,RA,RB

4      RT     RA     RB     1480
0      6      11     16     21     31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← (ACC)0:31 - temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← (ACC)32:63 - temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are subtracted from the contents of the corresponding accumulator word and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words EVX-form

evmwlusiaaw RT,RA,RB

4      RT     RA     RB     1344
0      6      11     16     21     31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
temp0:63 ← EXTZ((ACC)0:31) + EXTZ(temp32:63)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
temp0:63 ← (RA)32:63 ×ui (RB)32:63
temp0:63 ← EXTZ((ACC)32:63) + EXTZ(temp32:63)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCR_OVH ← ovh
SPEFSCR_OV ← ovl
SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
SPEFSCR_SOV ← SPEFSCR_SOV | ovl

For each word element in the accumulator, corresponding word unsigned-integer elements in RA and RB are multiplied, producing a 64-bit product. The least significant 32 bits of each product are then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words EVX-form

evmwlusianw RT,RA,RB

4      RT     RA     RB     1472
0      6      11     16     21     31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
temp0:63 ← EXTZ((ACC)0:31) - EXTZ(temp32:63)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63)
temp0:63 ← (RA)32:63 ×ui (RB)32:63
temp0:63 ← EXTZ((ACC)32:63) - EXTZ(temp32:63)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCR_OVH ← ovh
SPEFSCR_OV ← ovl
SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
SPEFSCR_SOV ← SPEFSCR_SOV | ovl

For each word element in the accumulator, corresponding word unsigned-integer elements in RA and RB are multiplied, producing a 64-bit product. The least significant 32 bits of each product are then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.
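The unsigned saturating behaviour described for evmwlusiaaw and evmwlusianw can be sketched for a single word lane as follows. This is an illustrative model only: the function name and the `negative` parameter are hypothetical, and the returned flag stands in for the ovh or ovl value that the instruction copies into SPEFSCR.

```python
MASK32 = 0xFFFF_FFFF


def accumulate_unsigned_saturate(acc_word, a, b, negative=False):
    """One word lane of evmwlusiaaw (negative=False) or evmwlusianw (negative=True).

    Returns (result_word, overflow_flag).
    """
    # Only the least significant 32 bits of the 64-bit product are used.
    product_low = ((a & MASK32) * (b & MASK32)) & MASK32
    if negative:
        total = (acc_word & MASK32) - product_low
        if total < 0:                 # borrow: saturate to 0x0000_0000
            return 0x0000_0000, 1
    else:
        total = (acc_word & MASK32) + product_low
        if total > MASK32:            # carry out: saturate to 0xFFFF_FFFF
            return 0xFFFF_FFFF, 1
    return total, 0
```

The carry or borrow out of the 32-bit result plays the role of temp31 in the descriptions above.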
Special Registers Altered (evmwlusiaaw and evmwlusianw): ACC OV OVH SOV SOVH

Vector Multiply Word Signed, Modulo, Fractional EVX-form

evmwsmf RT,RA,RB

4      RT     RA     RB     1115
0      6      11     16     21     31

RT0:63 ← (RA)32:63 ×sf (RB)32:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The product is placed in RT.

Special Registers Altered: None

Vector Multiply Word Signed, Modulo, Fractional to Accumulator EVX-form

evmwsmfa RT,RA,RB

4      RT     RA     RB     1147
0      6      11     16     21     31

RT0:63 ← (RA)32:63 ×sf (RB)32:63
ACC0:63 ← (RT)0:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The product is placed in RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Modulo, Fractional and Accumulate EVX-form

evmwsmfaa RT,RA,RB

4      RT     RA     RB     1371
0      6      11     16     21     31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative EVX-form

evmwsmfan RT,RA,RB

4      RT     RA     RB     1499
0      6      11     16     21     31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Modulo, Integer EVX-form

evmwsmi RT,RA,RB

4      RT     RA     RB     1113
0      6      11     16     21     31

RT0:63 ← (RA)32:63 ×si (RB)32:63

The low word signed-integer elements in RA and RB are multiplied. The product is placed in RT.

Special Registers Altered: None

Vector Multiply Word Signed, Modulo, Integer to Accumulator EVX-form

evmwsmia RT,RA,RB

4      RT     RA     RB     1145
0      6      11     16     21     31

RT0:63 ← (RA)32:63 ×si (RB)32:63
ACC0:63 ← (RT)0:63

The low word signed-integer elements in RA and RB are multiplied. The product is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Modulo, Integer and Accumulate EVX-form

evmwsmiaa RT,RA,RB

4      RT     RA     RB     1369
0      6      11     16     21     31

temp0:63 ← (RA)32:63 ×si (RB)32:63
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The low word signed-integer elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative EVX-form

evmwsmian RT,RA,RB

4      RT     RA     RB     1497
0      6      11     16     21     31

temp0:63 ← (RA)32:63 ×si (RB)32:63
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The low word signed-integer elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.
Special Registers Altered (evmwsmiaa and evmwsmian): ACC

Vector Multiply Word Signed, Saturate, Fractional EVX-form

evmwssf RT,RA,RB

4      RT     RA     RB     1107
0      6      11     16     21     31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    RT0:63 ← temp0:63
    mov ← 0
SPEFSCR_OVH ← 0
SPEFSCR_OV ← mov
SPEFSCR_SOV ← SPEFSCR_SOV | mov

The low word signed fractional elements in RA and RB are multiplied. The 64-bit product is placed in RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered: OV OVH SOV

Vector Multiply Word Signed, Saturate, Fractional to Accumulator EVX-form

evmwssfa RT,RA,RB

4      RT     RA     RB     1139
0      6      11     16     21     31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    RT0:63 ← temp0:63
    mov ← 0
ACC0:63 ← (RT)0:63
SPEFSCR_OVH ← 0
SPEFSCR_OV ← mov
SPEFSCR_SOV ← SPEFSCR_SOV | mov

The low word signed fractional elements in RA and RB are multiplied. The 64-bit product is placed in RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered: ACC OV OVH SOV

Vector Multiply Word Signed, Saturate, Fractional and Accumulate EVX-form

evmwssfaa RT,RA,RB

4      RT     RA     RB     1363
0      6      11     16     21     31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    temp0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    mov ← 0
temp0:64 ← EXTS((ACC)0:63) + EXTS(temp0:63)
ov ← (temp0 ⊕ temp1)
RT0:63 ← temp1:64
ACC0:63 ← (RT)0:63
SPEFSCR_OVH ← 0
SPEFSCR_OV ← ov | mov
SPEFSCR_SOV ← SPEFSCR_SOV | ov | mov

The low word signed fractional elements in RA and RB are multiplied, producing a 64-bit product. If both inputs are -1.0, the product saturates to the largest positive signed fraction. The 64-bit product is then added to the accumulator and the result is placed in RT and the accumulator.

Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative EVX-form

evmwssfan RT,RA,RB

4      RT     RA     RB     1491
0      6      11     16     21     31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    temp0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    mov ← 0
temp0:64 ← EXTS((ACC)0:63) - EXTS(temp0:63)
ov ← (temp0 ⊕ temp1)
RT0:63 ← temp1:64
ACC0:63 ← (RT)0:63
SPEFSCR_OVH ← 0
SPEFSCR_OV ← ov | mov
SPEFSCR_SOV ← SPEFSCR_SOV | ov | mov

The low word signed fractional elements in RA and RB are multiplied, producing a 64-bit product. If both inputs are -1.0, the product saturates to the largest positive signed fraction. The 64-bit product is then subtracted from the accumulator and the result is placed in RT and the accumulator.
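The saturation rule for the signed fractional multiplies (both inputs equal to -1.0) can be illustrated with a small model of evmwssf. This sketch rests on an assumption that is not stated in this section: the ×sf operator is modelled as the signed integer product shifted left by one bit, so that a 32-bit fraction with one sign bit and 31 fraction bits yields a 64-bit fraction with one sign bit and 63 fraction bits. The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def evmwssf(ra, rb):
    """Hypothetical model of evmwssf: returns (RT, mov) for 64-bit inputs."""
    a = ra & MASK32   # low word, bits 32:63
    b = rb & MASK32
    if a == 0x8000_0000 and b == 0x8000_0000:
        # -1.0 x -1.0 would be +1.0, which is not representable: saturate.
        return 0x7FFF_FFFF_FFFF_FFFF, 1
    # Assumed definition of the fractional product (see lead-in).
    product = (to_signed32(a) * to_signed32(b)) << 1
    return product & MASK64, 0
```

Under that assumption, 0x4000_0000 (0.5) times 0x4000_0000 gives 0x2000_0000_0000_0000 (0.25), and the only input pair that sets the overflow flag is the -1.0 by -1.0 case.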
Special Registers Altered: Special Registers Altered: ACC OV OVH SOV ACC OV OVH SOV Vector Multiply Word Unsigned, Modulo, Vector Multiply Word Unsigned, Modulo, Integer EVX-form Integer to Accumulator EVX-form evmwumi RT,RA,RB evmwumia RT,RA,RB 4 RT RA RB 1112 4 RT RA RB 1144 0 6 11 16 21 31 0 6 11 16 21 31 RT0:63 (RA)32:63 ×ui (RB)32:63 RT0:63 (RA)32:63 ×ui (RB)32:63 ACC0:63 (RT)0:63 The low word unsigned-integer elements in RA and RB are multiplied to form a 64-bit product that is placed in The low word unsigned-integer elements in RA and RB RT. are multiplied to form a 64-bit product that is placed in RT and into the accumulator. Special Registers Altered: None Special Registers Altered: ACC 546 Power ISATM Book I Version 2.06 Vector Multiply Word Unsigned, Modulo, Vector Multiply Word Unsigned, Modulo, Integer and Accumulate EVX-form Integer and Accumulate Negative EVX-form evmwumiaa RT,RA,RB evmwumian RT,RA,RB 4 RT RA RB 1368 0 6 11 16 21 31 4 RT RA RB 1496 0 6 11 16 21 31 temp0:63 (RA)32:63 ×ui (RB)32:63 RT0:63 (ACC)0:63 + temp0:63 temp0:63 (RA)32:63 ×ui (RB)32:63 ACC0:63 (RT)0:63 RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 The low word unsigned-integer elements in RA and RB are multiplied. The intermediate product is added to the The low word unsigned-integer elements in RA and RB contents of the 64-bit accumulator, and the resulting are multiplied. The intermediate product is subtracted value is placed into the accumulator and in RT. from the contents of the 64-bit accumulator, and the resulting value is placed into the accumulator and in Special Registers Altered: RT. ACC Special Registers Altered: ACC Vector NAND EVX-form Vector Negate EVX-form evnand RT,RA,RB evneg RT,RA 4 RT RA RB 542 4 RT RA /// 521 0 6 11 16 21 31 0 6 11 16 21 31 RT0:31 ¬((RA)0:31 & (RB)0:31) RT0:31 NEG((RA)0:31) RT32:63 ¬((RA)32:63 & (RB)32:63) RT32:63 NEG((RA)32:63) Each element of RA and RB is bitwise NANDed. The The negative of each element of RA is placed in RT. 
result is placed in the corresponding element of RT. The negative of 0x8000_0000 (most negative number) returns 0x8000_0000. Special Registers Altered: None Special Registers Altered: None Chapter 8. Signal Processing Engine (SPE) 547 Version 2.06 Vector NOR EVX-form Vector OR EVX-form evnor RT,RA,RB evor RT,RA,RB 4 RT RA RB 536 4 RT RA RB 535 0 6 11 16 21 31 0 6 11 16 21 31 RT0:31 ¬((RA)0:31 | (RB)0:31) RT0:31 (RA)0:31 | (RB)0:31 RT32:63 ¬((RA)32:63 | (RB)32:63) RT32:63 (RA)32:63 | (RB)32:63 Each element of RA and RB is bitwise NORed. The Each element of RA and RB is bitwise ORed. The result is placed in the corresponding element of RT. result is placed in the corresponding element of RT. Special Registers Altered: Special Registers Altered: None None Extended Mnemonics: Extended Mnemonics: Extended mnemonics are provided for the Vector NOR Extended mnemonics are provided for the Vector OR instruction to produce a vector bitwise complement instruction to provide a 64-bit vector move instruction. operation. Extended: Equivalent to: Extended: Equivalent to: evmr RT,RA evor RT,RA,RA evnot RT,RA evnor RT,RA,RA Vector OR with Complement EVX-form Vector Rotate Left Word EVX-form evorc RT,RA,RB evrlw RT,RA,RB 4 RT RA RB 539 4 RT RA RB 552 0 6 11 16 21 31 0 6 11 16 21 31 RT0:31 (RA)0:31 | (¬(RB)0:31) nh (RB)27:31 RT32:63 (RA)32:63 | (¬(RB)32:63) nl (RB)59:63 RT0:31 ROTL((RA)0:31, nh) Each element of RA is bitwise ORed with the comple- RT32:63 ROTL((RA)32:63, nl) ment of RB. The result is placed in the corresponding element of RT. Each of the high and low elements of RA is rotated left by an amount specified in RB. The result is placed in Special Registers Altered: RT. Rotate values for each element of RA are found in None bit positions RB27:31 and RB59:63. 
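The per-element rotate of evrlw can be sketched as below. This is an illustrative model with hypothetical function names; registers are Python integers, and bit positions 27:31 and 59:63 are mapped to the low five bits of the high and low words respectively, following the bit numbering in which bit 0 is the most significant bit.

```python
MASK32 = 0xFFFF_FFFF


def rotl32(value, n):
    """Rotate a 32-bit value left by n bit positions (n taken modulo 32)."""
    n &= 31
    value &= MASK32
    return ((value << n) | (value >> (32 - n))) & MASK32


def evrlw(ra, rb):
    """Hypothetical model of evrlw: returns the new 64-bit value of RT."""
    nh = (rb >> 32) & 0x1F   # (RB)27:31, rotate amount for the high element
    nl = rb & 0x1F           # (RB)59:63, rotate amount for the low element
    hi = rotl32(ra >> 32, nh)
    lo = rotl32(ra, nl)
    return (hi << 32) | lo
```

Each element uses its own rotate amount, so a single instruction can rotate the two words by different distances.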
Special Registers Altered (evrlw): None

Vector Rotate Left Word Immediate EVX-form

evrlwi RT,RA,UI

4      RT     RA     UI     554
0      6      11     16     21     31

n ← UI
RT0:31 ← ROTL((RA)0:31, n)
RT32:63 ← ROTL((RA)32:63, n)

Both the high and low elements of RA are rotated left by an amount specified by UI.

Special Registers Altered: None

Vector Round Word EVX-form

evrndw RT,RA

4      RT     RA     ///    524
0      6      11     16     21     31

RT0:31 ← ((RA)0:31 + 0x00008000) & 0xFFFF0000
RT32:63 ← ((RA)32:63 + 0x00008000) & 0xFFFF0000

The 32-bit elements of RA are rounded into 16 bits. The result is placed in RT. The resulting 16 bits are placed in the most significant 16 bits of each element of RT, zeroing out the low-order 16 bits of each element.

Special Registers Altered: None

Vector Select EVS-form

evsel RT,RA,RB,BFA

4      RT     RA     RB     79     BFA
0      6      11     16     21     29     31

ch ← CR_BFA×4
cl ← CR_BFA×4+1
if (ch = 1) then RT0:31 ← (RA)0:31
else RT0:31 ← (RB)0:31
if (cl = 1) then RT32:63 ← (RA)32:63
else RT32:63 ← (RB)32:63

If the most significant bit in the BFA field of CR is set to 1, the high-order element of RA is placed in the high-order element of RT; otherwise, the high-order element of RB is placed into the high-order element of RT. If the next most significant bit in the BFA field of CR is set to 1, the low-order element of RA is placed in the low-order element of RT; otherwise, the low-order element of RB is placed into the low-order element of RT.

Special Registers Altered: None

Vector Shift Left Word EVX-form

evslw RT,RA,RB

4      RT     RA     RB     548
0      6      11     16     21     31

nh ← (RB)26:31
nl ← (RB)58:63
RT0:31 ← SL((RA)0:31, nh)
RT32:63 ← SL((RA)32:63, nl)

Each of the high and low elements of RA is shifted left by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63.

Shift amounts from 32 to 63 give a zero result.

Special Registers Altered: None

Vector Shift Left Word Immediate EVX-form

evslwi RT,RA,UI

4      RT     RA     UI     550
0      6      11     16     21     31

n ← UI
RT0:31 ← SL((RA)0:31, n)
RT32:63 ← SL((RA)32:63, n)

Both high and low elements of RA are shifted left by the 5-bit UI value and the results are placed in RT.

Special Registers Altered: None

Vector Splat Fractional Immediate EVX-form

evsplatfi RT,SI

4      RT     SI     ///    555
0      6      11     16     21     31

RT0:31 ← SI || (27 zero bits)
RT32:63 ← SI || (27 zero bits)

The value specified by SI is padded with trailing zeros and placed in both elements of RT. The SI ends up in bit positions RT0:4 and RT32:36.

Special Registers Altered: None

Vector Splat Immediate EVX-form

evsplati RT,SI

4      RT     SI     ///    553
0      6      11     16     21     31

RT0:31 ← EXTS(SI)
RT32:63 ← EXTS(SI)

The value specified by SI is sign extended and placed in both elements of RT.

Special Registers Altered: None

Vector Shift Right Word Immediate Signed EVX-form

evsrwis RT,RA,UI

4      RT     RA     UI     547
0      6      11     16     21     31

n ← UI
RT0:31 ← EXTS((RA)0:31-n)
RT32:63 ← EXTS((RA)32:63-n)

Both high and low elements of RA are shifted right by the 5-bit UI value. Bits in the most significant positions vacated by the shift are filled with a copy of the sign bit.

Special Registers Altered: None

Vector Shift Right Word Immediate Unsigned EVX-form

evsrwiu RT,RA,UI

4      RT     RA     UI     546
0      6      11     16     21     31

n ← UI
RT0:31 ← EXTZ((RA)0:31-n)
RT32:63 ← EXTZ((RA)32:63-n)

Both high and low elements of RA are shifted right by the 5-bit UI value; zeros are shifted into the most significant position.

Special Registers Altered: None

Vector Shift Right Word Signed EVX-form

evsrws RT,RA,RB

4      RT     RA     RB     545
0      6      11     16     21     31

nh ← (RB)26:31
nl ← (RB)58:63
RT0:31 ← EXTS((RA)0:31-nh)
RT32:63 ← EXTS((RA)32:63-nl)

Both the high and low elements of RA are shifted right by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63. The sign bits are shifted into the most significant position.

Shift amounts from 32 to 63 give a result of 32 sign bits.

Special Registers Altered: None

Vector Shift Right Word Unsigned EVX-form

evsrwu RT,RA,RB

4      RT     RA     RB     544
0      6      11     16     21     31

nh ← (RB)26:31
nl ← (RB)58:63
RT0:31 ← EXTZ((RA)0:31-nh)
RT32:63 ← EXTZ((RA)32:63-nl)

Both the high and low elements of RA are shifted right by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63. Zeros are shifted into the most significant position.

Shift amounts from 32 to 63 give a zero result.

Special Registers Altered: None

Vector Store Double of Double EVX-form

evstdd RS,D(RA)

4      RS     RA     UI     801
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×8)
MEM(EA,8) ← (RS)0:63

D in the instruction mnemonic is UI × 8. The contents of RS are stored as a doubleword in storage addressed by EA.

Special Registers Altered: None

Vector Store Double of Double Indexed EVX-form

evstddx RS,RA,RB

4      RS     RA     RB     800
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,8) ← (RS)0:63

The contents of RS are stored as a doubleword in storage addressed by EA.

Special Registers Altered: None

Vector Store Double of Four Halfwords EVX-form

evstdh RS,D(RA)

4      RS     RA     UI     805
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×8)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)16:31
MEM(EA+4,2) ← (RS)32:47
MEM(EA+6,2) ← (RS)48:63

D in the instruction mnemonic is UI × 8. The contents of RS are stored as four halfwords in storage addressed by EA.

Vector Store Double of Four Halfwords Indexed EVX-form

evstdhx RS,RA,RB

4      RS     RA     RB     804
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)16:31
MEM(EA+4,2) ← (RS)32:47
MEM(EA+6,2) ← (RS)48:63

The contents of RS are stored as four halfwords in storage addressed by EA.
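The effective-address rule shared by the D-form stores above (b is 0 when the RA field is 0, and the 5-bit UI field is scaled by the access size) and the halfword layout of evstdh can be sketched as follows. This is a minimal model under stated assumptions: storage is a Python `bytearray`, each halfword is written in big-endian byte order, and the effective address is truncated to 64 bits. The function names are hypothetical.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def effective_address(ra_field, gpr, ui, scale):
    """EA <- b + EXTZ(UI x scale), where b is 0 if the RA field is 0."""
    base = 0 if ra_field == 0 else gpr[ra_field]
    # UI is a 5-bit field; truncating EA to 64 bits is an assumption.
    return (base + (ui & 0x1F) * scale) & MASK64


def evstdh(memory, rs, ea):
    """Store (RS)0:15, 16:31, 32:47, 48:63 at EA, EA+2, EA+4, EA+6."""
    for i in range(4):
        half = (rs >> (48 - 16 * i)) & 0xFFFF
        # Big-endian byte order within each halfword is an assumption.
        memory[ea + 2 * i: ea + 2 * i + 2] = half.to_bytes(2, "big")
```

For example, with `gpr[1] = 0x1000` and `ui = 3`, the doubleword forms (scale 8) address 0x1018, while the word forms (scale 4) would address 0x100C.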
Special Registers Altered (evstdh and evstdhx): None

Vector Store Double of Two Words EVX-form

evstdw RS,D(RA)

4      RS     RA     UI     803
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×8)
MEM(EA,4) ← (RS)0:31
MEM(EA+4,4) ← (RS)32:63

D in the instruction mnemonic is UI × 8. The contents of RS are stored as two words in storage addressed by EA.

Special Registers Altered: None

Vector Store Double of Two Words Indexed EVX-form

evstdwx RS,RA,RB

4      RS     RA     RB     802
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,4) ← (RS)0:31
MEM(EA+4,4) ← (RS)32:63

The contents of RS are stored as two words in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Two Halfwords from Even EVX-form

evstwhe RS,D(RA)

4      RS     RA     UI     817
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)32:47

D in the instruction mnemonic is UI × 4. The even halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Two Halfwords from Even Indexed EVX-form

evstwhex RS,RA,RB

4      RS     RA     RB     816
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)32:47

The even halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Two Halfwords from Odd EVX-form

evstwho RS,D(RA)

4      RS     RA     UI     821
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,2) ← (RS)16:31
MEM(EA+2,2) ← (RS)48:63

D in the instruction mnemonic is UI × 4. The odd halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Two Halfwords from Odd Indexed EVX-form

evstwhox RS,RA,RB

4      RS     RA     RB     820
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,2) ← (RS)16:31
MEM(EA+2,2) ← (RS)48:63

The odd halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Word from Even EVX-form

evstwwe RS,D(RA)

4      RS     RA     UI     825
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,4) ← (RS)0:31

D in the instruction mnemonic is UI × 4. The even word of RS is stored in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Word from Even Indexed EVX-form

evstwwex RS,RA,RB

4      RS     RA     RB     824
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,4) ← (RS)0:31

The even word of RS is stored in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Word from Odd EVX-form

evstwwo RS,D(RA)

4      RS     RA     UI     829
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,4) ← (RS)32:63

D in the instruction mnemonic is UI × 4. The odd word of RS is stored in storage addressed by EA.

Vector Store Word of Word from Odd Indexed EVX-form

evstwwox RS,RA,RB

4      RS     RA     RB     828
0      6      11     16     21     31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,4) ← (RS)32:63

The odd word of RS is stored in storage addressed by EA.
Special Registers Altered (evstwwo and evstwwox): None

Vector Subtract Signed, Modulo, Integer to Accumulator Word EVX-form

evsubfsmiaaw RT,RA

4      RT     RA     ///    1227
0      6      11     16     21     31

RT0:31 ← (ACC)0:31 - (RA)0:31
RT32:63 ← (ACC)32:63 - (RA)32:63
ACC0:63 ← (RT)0:63

Each word element in RA is subtracted from the corresponding element in the accumulator and the difference is placed into the corresponding RT word and into the accumulator.

Special Registers Altered: ACC

Vector Subtract Signed, Saturate, Integer to Accumulator Word EVX-form

evsubfssiaaw RT,RA

4      RT     RA     ///    1219
0      6      11     16     21     31

temp0:63 ← EXTS((ACC)0:31) - EXTS((RA)0:31)
ovh ← temp31 ⊕ temp32
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:63 ← EXTS((ACC)32:63) - EXTS((RA)32:63)
ovl ← temp31 ⊕ temp32
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCR_OVH ← ovh
SPEFSCR_OV ← ovl
SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
SPEFSCR_SOV ← SPEFSCR_SOV | ovl

Each signed-integer word element in RA is sign-extended and subtracted from the corresponding sign-extended element in the accumulator, saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Subtract from Word EVX-form

evsubfw RT,RA,RB

4      RT     RA     RB     516
0      6      11     16     21     31

RT0:31 ← (RB)0:31 - (RA)0:31
RT32:63 ← (RB)32:63 - (RA)32:63

Each signed-integer element of RA is subtracted from the corresponding element of RB and the results are placed in RT.

Special Registers Altered: None

Vector Subtract Unsigned, Modulo, Integer to Accumulator Word EVX-form

evsubfumiaaw RT,RA

4      RT     RA     ///    1226
0      6      11     16     21     31

RT0:31 ← (ACC)0:31 - (RA)0:31
RT32:63 ← (ACC)32:63 - (RA)32:63
ACC0:63 ← (RT)0:63

Each unsigned-integer word element in RA is subtracted from the corresponding element in the accumulator and the results are placed in RT and into the accumulator.
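The difference between the modulo and saturating accumulator subtracts (evsubfsmiaaw and evsubfssiaaw) can be seen in a one-lane model. This is an illustrative sketch with hypothetical function names; the flag returned by the saturating version stands in for ovh or ovl.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def subtract_modulo(acc_word, a):
    """One word lane of evsubfsmiaaw: the result wraps modulo 2**32."""
    return (acc_word - a) & MASK32


def subtract_signed_saturate(acc_word, a):
    """One word lane of evsubfssiaaw: returns (result_word, overflow_flag)."""
    diff = to_signed32(acc_word) - to_signed32(a)
    if diff > 0x7FFF_FFFF:        # positive overflow
        return 0x7FFF_FFFF, 1
    if diff < -0x8000_0000:       # negative overflow
        return 0x8000_0000, 1
    return diff & MASK32, 0
```

Subtracting 1 from an accumulator word of 0x8000_0000 wraps to 0x7FFF_FFFF in the modulo form, but stays at 0x8000_0000 with the overflow flag set in the saturating form.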
Special Registers Altered (evsubfumiaaw): ACC

Vector Subtract Unsigned, Saturate, Integer to Accumulator Word EVX-form

evsubfusiaaw RT,RA

4      RT     RA     ///    1218
0      6      11     16     21     31

temp0:63 ← EXTZ((ACC)0:31) - EXTZ((RA)0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, temp31, 0x0000_0000, 0x0000_0000, temp32:63)
temp0:63 ← EXTZ((ACC)32:63) - EXTZ((RA)32:63)
ovl ← temp31
RT32:63 ← SATURATE(ovl, temp31, 0x0000_0000, 0x0000_0000, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCR_OVH ← ovh
SPEFSCR_OV ← ovl
SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
SPEFSCR_SOV ← SPEFSCR_SOV | ovl

Each unsigned-integer word element in RA is zero-extended and subtracted from the corresponding zero-extended element in the accumulator, saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Subtract Immediate from Word EVX-form

evsubifw RT,UI,RB

4      RT     UI     RB     518
0      6      11     16     21     31

RT0:31 ← (RB)0:31 - EXTZ(UI)
RT32:63 ← (RB)32:63 - EXTZ(UI)

UI is zero-extended and subtracted from both the high and low elements of RB. Note that the same value is subtracted from both elements of the register.

Special Registers Altered: None

Vector XOR EVX-form

evxor RT,RA,RB

4      RT     RA     RB     534
0      6      11     16     21     31

RT0:31 ← (RA)0:31 ⊕ (RB)0:31
RT32:63 ← (RA)32:63 ⊕ (RB)32:63

Each element of RA and RB is exclusive-ORed. The results are placed in RT.

Special Registers Altered: None

Chapter 9. Embedded Floating-Point
[Category: SPE.Embedded Float Scalar Double]
[Category: SPE.Embedded Float Scalar Single]
[Category: SPE.Embedded Float Vector]

9.1 Overview
9.2 Programming Model
9.2.1 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
9.2.2 Floating-Point Data Formats
9.2.3 Exception Conditions
9.2.3.1 Denormalized Values on Input
9.2.3.2 Embedded Floating-Point Overflow and Underflow
9.2.3.3 Embedded Floating-Point Invalid Operation/Input Errors
9.2.3.4 Embedded Floating-Point Round (Inexact)
9.2.3.5 Embedded Floating-Point Divide by Zero
9.2.3.6 Default Results
9.2.4 IEEE 754 Compliance
9.2.4.1 Sticky Bit Handling For Exception Conditions
9.3 Embedded Floating-Point Instructions
9.3.1 Load/Store Instructions
9.3.2 SPE.Embedded Float Vector Instructions [Category: SPE.Embedded Float Vector]
9.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single]
9.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double]
9.4 Embedded Floating-Point Results Summary

9.1 Overview

Single-precision floating-point is handled by the SPE.Embedded Float Vector and SPE.Embedded Float Scalar Single categories; double-precision floating-point is handled by the SPE.Embedded Float Scalar Double category.

The Embedded Floating-Point categories require the implementation of the Signal Processing Engine (SPE) category and consist of three distinct categories:
Embedded vector single-precision floating-point (SPE.Embedded Float Vector [SP.FV]) Embedded scalar single-precision floating-point (SPE.Embedded Float Scalar Single [SP.FS]) Embedded scalar double-precision floating-point (SPE.Embedded Float Scalar Double [SP.FD]) Although each of these may be implemented indepen- dently, they are defined in a single chapter because it is likely that they may be implemented together. References to Embedded Floating-Point categories, Embedded Floating-Point instructions, or Embedded Floating-Point operations apply to all 3 categories. Chapter 9. Embedded Floating-Point 557 Version 2.06 9.2 Programming Model ing-point data elements are 64 bits wide with 1 sign bit (s), 11 bits of biased exponent (e) and 52 bits of frac- Embedded floating-point operations are performed in tion (f). the GPRs of the processor. In the IEEE 754 specification, floating-point values are The SPE.Embedded Float Vector and SPE.Embedded represented in a format consisting of three explicit Float Scalar Double categories require a GPR register fields (sign field, biased exponent field, and fraction file with thirty-two 64-bit registers as required by the field) and an implicit hidden bit. Signal Processing Engine category. hidden bit The SPE.Embedded Float Scalar Single category 0 1 8 9 31 (or 32:63) s exp fraction Single-precision requires a GPR register file with thirty-two 32-bit regis- ters. When implemented with a 64-bit register file on a 0 1 11 12 63 32-bit implementation, instructions in this category only s exp fraction Double-precision use and modify bits 32:63 of the GPR. In this case, bits s - sign bit; 0 = positive; 1 = negative 0:31 of the GPR are left unchanged by the operation. exp - biased exponent field For 64-bit implementations, bits 0:31 are unchanged fraction - fractional portion of number after the operation. 
Figure 127.Floating-Point Data Format Instructions in the SPE.Embedded Float Scalar Double category operate on the entire 64 bits of the GPRs. For single-precision normalized numbers, the biased exponent value e lies in the range of 1 to 254 corre- Instructions in the SPE.Embedded Float Vector cate- sponding to an actual exponent value E in the range gory operate on the entire 64 bits of the GPRs as well, -126 to +127. For double-precision normalized num- but contain two 32-bit data items that are operated on bers, the biased exponent value e lies in the range of 1 independently of each other in a SIMD fashion. The for- to 2046 corresponding to an actual exponent value E in mat of both data items is the same as the format of a the range -1022 to +1023. With the hidden bit implied to data item in the SPE.Embedded Float Scalar Single be `1' (for normalized numbers), the value of the num- category. The data item contained in bits 0:31 is called ber is interpreted as follows: the `high word'. The data item contained in bits 32:63 is called the `low word'. s ( ­ 1 ) × 2 E × ( 1.fraction ) There are no record forms of Embedded Floating-Point instructions. Embedded Floating-Point Compare where E is the unbiased exponent and 1.fraction is the instructions treat NaNs, Infinity, and Denorm as nor- mantissa (or significand) consisting of a leading `1' (the malized numbers for the comparison calculation when hidden bit) and a fractional part (fraction field). For the default results are provided. single-precision format, the maximum positive normal- ized number (pmax) is represented by the encoding 9.2.1 Signal Processing Embed- 0x7F7FFFFF which is approximately 3.4E+38 (2128), and the minimum positive normalized value (pmin) is ded Floating-Point Status and Con- represented by the encoding 0x00800000 which is trol Register (SPEFSCR) approximately 1.2E-38 (2-126). 
For the double-precision format, the maximum positive normalized number Status and control for the Embedded Floating-Point (pmax) is represented by the encoding categories uses the SPEFSCR. This register is defined 0x7feFFFFF_FFFFFFFF which is approximately by the Signal Processing Engine category in Section 1.8E+307 (21024), and the minimum positive normal- 8.3.4. Status and control bits are shared for Embedded ized value (pmin) is represented by the encoding Floating-Point and SPE operations. Instructions in the 0x00100000_00000000 which is approximately SPE.Embedded Float Vector category affect both the 2.2E-308 (2-1022). high element (bits 34:39) and low element floating-point Two specific values of the biased exponent are status flags (bits 50:55). Instructions in the reserved (0 and 255 for single-precision; 0 and 2047 for SPE.Embedded Float Scalar Double and SPE.Embed- double-precision) for encoding special values of +0, -0, ded Float Scalar Single categories affect only the low +infinity, -infinity, and NaNs. element floating-point status flags and leave the high element floating-point status flags undefined. Zeros of both positive and negative sign are repre- sented by a biased exponent value e of 0 and a fraction f which is 0. 9.2.2 Floating-Point Data Formats Infinities of both positive and negative sign are repre- Single-precision floating-point data elements are 32 sented by a maximum exponent field value (255 for sin- bits wide with 1 sign bit (s), 8 bits of biased exponent gle-precision, 2047 for double-precision) and a fraction (e) and 23 bits of fraction (f). Double-precision float- which is 0. 558 Power ISATM Book I Version 2.06 Denormalized numbers of both positive and negative Programming Note sign are represented by a biased exponent value e of 0 and a fraction f, which is nonzero. 
For these numbers, On some implementations, operations that result in the hidden bit is defined by the IEEE 754 standard to overflow or underflow are likely to take significantly be 0. This number type is not directly supported in longer than operations that do not. For example, hardware. Instead, either a software interrupt handler is these operations may cause a system error handler invoked, or a default value is defined. to be invoked; on such implementations, the sys- tem error handler updates the overflow bits appro- Not-a-Numbers (NaNs) are represented by a maximum priately. exponent field value (255 for single-precision, 2047 for double-precision) and a fraction f which is nonzero. 9.2.3 Exception Conditions 9.2.3.3 Embedded Floating-Point Invalid Operation/Input Errors 9.2.3.1 Denormalized Values on Input Embedded Floating-Point Invalid Operation/Input Any denormalized value used as an operand may be errors occur when an operand to an operation contains truncated by the implementation to a properly signed an invalid input value. If any of the input values are zero value. Infinity, Denorm, or NaN, or for an Embedded Float- ing-Point Divide instruction both operands are +/-0, SPEFSCRFINV FINVH are set to 1 appropriately, and 9.2.3.2 Embedded Floating-Point Over- SPEFSCRFGH FXH FG FX are set to 0 appropriately. If flow and Underflow SPEFSCRFINVE=1, an Embedded Floating-Point Data interrupt is taken and the destination register is not Defining pmax to be the most positive normalized value updated. 
(farthest from zero), pmin the smallest positive normal- ized value (closest to zero), nmax the most negative normalized value (farthest from zero) and nmin the 9.2.3.4 Embedded Floating-Point smallest normalized negative value (closest to zero), Round (Inexact) an overflow is said to have occurred if the numerically correct result (r) of an instruction is such that r>pmax or If any result element of an Embedded Floating-Point r bh) then ch 1 if (ah < bh) then ch 1 else ch 0 else ch 0 if (al > bl) then cl 1 if (al < bl) then cl 1 else cl 0 else cl 0 CR4×BF:4×BF+3 ch || cl || (ch | cl) || (ch & cl) CR4×BF:4×BF+3 ch || cl || (ch | cl) || (ch & cl) Each element of register RA is compared against the Each element of register RA is compared against the corresponding element of register RB. The results of corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 the comparisons are placed into CR field BF. If RA0:31 is greater than RB0:31, bit 0 of CR field BF is set to 1, is less than RB0:31, bit 0 of CR field BF is set to 1, oth- otherwise it is set to 0. If RA32:63 is greater than erwise it is set to 0. If RA32:63 is less than RB32:63, bit 1 RB32:63, bit 1 of CR field BF is set to 1, otherwise it is of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of set to 0. Bit 2 of CR field BF is set to the OR of both CR field BF is set to the OR of both result bits and Bit 3 result bits and Bit 3 of CR field BF is set to the AND of of CR field BF is set to the AND of both result bits. both result bits. Comparison ignores the sign of 0 Comparison ignores the sign of 0 (+0 = -0). (+0 = -0). If an input error occurs and default results are gener- If an input error occurs and default results are gener- ated, NaNs, Infinities, and Denorms as treated as nor- ated, NaNs, Infinities, and Denorms as treated as nor- malized numbers, using their values of `e' and `f' malized numbers, using their values of `e' and `f' directly. 
directly. Special Registers Altered: Special Registers Altered: FINV FINVH FINVS FINV FINVH FINVS FGH FXH FG FX FGH FXH FG FX CR field BF CR field BF Chapter 9. Embedded Floating-Point 563 Version 2.06 Vector Floating-Point Single-Precision Vector Floating-Point Single-Precision Compare Equal EVX-form Test Greater Than EVX-form evfscmpeq BF,RA,RB evfststgt BF,RA,RB 4 BF // RA RB 654 4 BF // RA RB 668 0 6 9 11 16 21 31 0 6 9 11 16 21 31 ah (RA)0:31 ah (RA)0:31 al (RA)32:63 al (RA)32:63 bh (RB)0:31 bh (RB)0:31 bl (RB)32:63 bl (RB)32:63 if (ah = bh) then ch 1 if (ah > bh) then ch 1 else ch 0 else ch 0 if (al = bl) then cl 1 if (al > bl) then cl 1 else cl 0 else cl 0 CR4×BF:4×BF+3 ch || cl || (ch | cl) || (ch & cl) CR4×BF:4×BF+3 ch || cl || (ch | cl) || (ch & cl) Each element of register RA is compared against the Each element of register RA is compared against the corresponding element of register RB. The results of corresponding element of register RB.The results of the the comparisons are placed into CR field BF. If RA0:31 comparisons are placed into CR field BF. If RA0:31 is is equal to RB0:31, bit 0 of CR field BF is set to 1, other- greater than RB0:31, bit 0 of CR field BF is set to 1, oth- wise it is set to 0. If RA32:63 is equal to RB32:63, bit 1 of erwise it is set to 0. If RA32:63 is greater than RB32:63, CR field BF is set to 1, otherwise it is set to 0. Bit 2 of bit 1 of CR field BF is set to 1, otherwise it is set to 0. CR field BF is set to the OR of both result bits and Bit 3 Bit 2 of CR field BF is set to the OR of both result bits of CR field BF is set to the AND of both result bits. and Bit 3 of CR field BF is set to the AND of both result Comparison ignores the sign of 0 (+0 = -0). bits. Comparison ignores the sign of 0 (+0 = -0). 
The comparison proceeds after treating NaNs, Infinities, If an input error occurs and default results are gener- and Denorms as normalized numbers, using their val- ated, NaNs, Infinities, and Denorms as treated as nor- ues of `e' and `f' directly. malized numbers, using their values of `e' and `f' directly. No exceptions are taken during the execution of evfst- stgt. Special Registers Altered: FINV FINVH FINVS Special Registers Altered: FGH FXH FG FX CR field BF CR field BF Programming Note In an implementation, the execution of evfststgt is likely to be faster than the execution of evfscmpgt; however, if strict IEEE 754 compliance is required, the program should use evfscmpgt. 564 Power ISATM Book I Version 2.06 Vector Floating-Point Single-Precision Vector Floating-Point Single-Precision Test Less Than EVX-form Test Equal EVX-form evfststlt BF,RA,RB evfststeq BF,RA,RB 4 BF // RA RB 669 4 BF // RA RB 670 0 6 9 11 16 21 31 0 6 9 11 16 21 31 ah (RA)0:31 ah (RA)0:31 al (RA)32:63 al (RA)32:63 bh (RB)0:31 bh (RB)0:31 bl (RB)32:63 bl (RB)32:63 if (ah < bh) then ch 1 if (ah = bh) then ch 1 else ch 0 else ch 0 if (al < bl) then cl 1 if (al = bl) then cl 1 else cl 0 else cl 0 CR4×BF:4×BF+3 ch || cl || (ch | cl) || (ch & cl) CR4×BF:4×BF+3 ch || cl || (ch | cl) || (ch & cl) Each element of register RA is compared with the cor- Each element of register RA is compared against the responding element of register RB. The results of the corresponding element of register RB. The results of comparisons are placed into CR field BF. If RA0:31 is the comparisons are placed into CR field BF. If RA0:31 less than RB0:31, bit 0 of CR field BF is set to 1, other- is equal to RB0:31, bit 0 of CR field BF is set to 1, other- wise it is set to 0. If RA32:63 is less than RB32:63, bit 1 of wise it is set to 0. If RA32:63 is equal to RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to 1, otherwise it is set to 0. 
Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The com- Comparison ignores the sign of 0 (+0 = -0). The com- parison proceeds after treating NaNs, Infinities, and parison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of Denorms as normalized numbers, using their values of `e' and `f' directly. `e' and `f' directly. No exceptions are taken during the execution of evfst- No exceptions are taken during the execution of evfst- stlt. steq. Special Registers Altered: Special Registers Altered: CR field BF CR field BF Programming Note Programming Note In an implementation, the execution of evfststlt is In an implementation, the execution of evfststeq is likely to be faster than the execution of evfscmplt; likely to be faster than the execution of evfsc- however, if strict IEEE 754 compliance is required, mpeq; however, if strict IEEE 754 compliance is the program should use evfscmplt. required, the program should use evfscmpeq. Chapter 9. 
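The vector test instructions above (evfststgt, evfststlt, evfststeq) can be modelled in a few lines. The following is a minimal sketch, not an architected definition: the function names `_key` and `vector_test` are hypothetical, registers are passed as 64-bit Python integers, and the sketch assumes that "treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly" amounts to ordering the raw patterns by sign and magnitude. The 4-bit return value places bit 0 of CR field BF in the most significant position.

```python
def _key(word: int) -> int:
    """Order a raw 32-bit single-precision pattern by sign and magnitude.

    The exponent and fraction bits are used as-is, so NaN, Infinity and
    Denorm patterns are ranked as if they were normalized numbers.
    +0 and -0 both map to 0, which makes them compare equal.
    """
    magnitude = word & 0x7FFFFFFF
    return -magnitude if word & 0x80000000 else magnitude


def vector_test(ra: int, rb: int, op: str) -> int:
    """Return the 4-bit CR field value ch || cl || (ch | cl) || (ch & cl).

    op is one of "gt", "lt", "eq" (an assumption of this sketch).
    """
    ah, al = (ra >> 32) & 0xFFFFFFFF, ra & 0xFFFFFFFF   # high word, low word
    bh, bl = (rb >> 32) & 0xFFFFFFFF, rb & 0xFFFFFFFF
    compare = {
        "gt": lambda a, b: a > b,
        "lt": lambda a, b: a < b,
        "eq": lambda a, b: a == b,
    }[op]
    ch = int(compare(_key(ah), _key(bh)))
    cl = int(compare(_key(al), _key(bl)))
    return (ch << 3) | (cl << 2) | ((ch | cl) << 1) | (ch & cl)


# High words hold 2.0 and 1.0, low words hold 1.0 and 2.0.
ra = (0x40000000 << 32) | 0x3F800000
rb = (0x3F800000 << 32) | 0x40000000
print(format(vector_test(ra, rb, "gt"), "04b"))
```

A caller interested in "any element greater" would read bit 2 of the CR field, and "all elements greater" bit 3. No SPEFSCR flags are modelled, which matches the statement that the test forms take no exceptions.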
Vector Convert Floating-Point Single-Precision from Signed Integer    EVX-form

evfscfsi RT,RB

    4 | RT | /// | RB | 657
    0   6    11    16   21   31

    RT_0:31 ← CnvtI32ToFP32((RB)_0:31, S, HI, I)
    RT_32:63 ← CnvtI32ToFP32((RB)_32:63, S, LO, I)

Each signed integer element of register RB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding element of register RT.

Special Registers Altered:
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision from Unsigned Integer    EVX-form

evfscfui RT,RB

    4 | RT | /// | RB | 656
    0   6    11    16   21   31

    RT_0:31 ← CnvtI32ToFP32((RB)_0:31, U, HI, I)
    RT_32:63 ← CnvtI32ToFP32((RB)_32:63, U, LO, I)

Each unsigned integer element of register RB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered:
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision from Signed Fraction    EVX-form

evfscfsf RT,RB

    4 | RT | /// | RB | 659
    0   6    11    16   21   31

    RT_0:31 ← CnvtI32ToFP32((RB)_0:31, S, HI, F)
    RT_32:63 ← CnvtI32ToFP32((RB)_32:63, S, LO, F)

Each signed fractional element of register RB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered:
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision from Unsigned Fraction    EVX-form

evfscfuf RT,RB

    4 | RT | /// | RB | 658
    0   6    11    16   21   31

    RT_0:31 ← CnvtI32ToFP32((RB)_0:31, U, HI, F)
    RT_32:63 ← CnvtI32ToFP32((RB)_32:63, U, LO, F)

Each unsigned fractional element of register RB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered:
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Signed Integer    EVX-form

evfsctsi RT,RB

    4 | RT | /// | RB | 661
    0   6    11    16   21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, S, HI, RND, I)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, S, LO, RND, I)

Each single-precision floating-point element in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero    EVX-form

evfsctsiz RT,RB

    4 | RT | /// | RB | 666
    0   6    11    16   21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, S, HI, ZER, I)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, S, LO, ZER, I)

Each single-precision floating-point element in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Unsigned Integer    EVX-form

evfsctui RT,RB

    4 | RT | /// | RB | 660
    0   6    11    16   21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, U, HI, RND, I)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, U, LO, RND, I)

Each single-precision floating-point element in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero    EVX-form

evfsctuiz RT,RB

    4 | RT | /// | RB | 664
    0   6    11    16   21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, U, HI, ZER, I)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, U, LO, ZER, I)

Each single-precision floating-point element in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS
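The saturating behaviour of the round-toward-zero conversions (evfsctsiz, evfsctuiz) can be illustrated for one 32-bit element. This is a sketch under stated assumptions, since CnvtFP32ToI32Sat itself is defined elsewhere in the architecture: the names `f32_bits` and `fp32_to_i32_sat_rz` are hypothetical, the saturation bounds are taken to be the ends of the signed or unsigned 32-bit range, infinities are assumed to saturate like any other out-of-range value, and no SPEFSCR flags (FINV, FINXS, FG, FX) are modelled.

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Return the 32-bit single-precision pattern nearest to x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def fp32_to_i32_sat_rz(word: int, signed: bool = True) -> int:
    """Convert one single-precision pattern to an integer, rounding toward zero.

    NaNs convert as though they were zero; results outside the 32-bit
    range saturate to the nearest representable bound (assumed bounds).
    """
    value = struct.unpack(">f", struct.pack(">I", word & 0xFFFFFFFF))[0]
    lo, hi = (-2**31, 2**31 - 1) if signed else (0, 2**32 - 1)
    if math.isnan(value):
        return 0
    if math.isinf(value):
        return hi if value > 0 else lo
    return max(lo, min(hi, math.trunc(value)))


print(fp32_to_i32_sat_rz(f32_bits(-1.9)))          # truncates toward zero
print(fp32_to_i32_sat_rz(f32_bits(3.0e9)))         # saturates in the signed case
print(fp32_to_i32_sat_rz(f32_bits(-5.0), False))   # negative input, unsigned target
```

The vector forms would apply the same per-element conversion independently to bits 0:31 and bits 32:63 of RB.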
Vector Convert Floating-Point Single-Precision to Signed Fraction    EVX-form

evfsctsf RT,RB

    4 | RT | /// | RB | 663
    0   6    11    16   21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, S, HI, RND, F)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, S, LO, RND, F)

Each single-precision floating-point element in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit signed fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Unsigned Fraction    EVX-form

evfsctuf RT,RB

    4 | RT | /// | RB | 662
    0   6    11    16   21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, U, HI, RND, F)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, U, LO, RND, F)

Each single-precision floating-point element in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

9.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single]

Floating-Point Single-Precision Absolute Value    EVX-form

efsabs RT,RA

    4 | RT | RA | /// | 708
    0   6    11   16    21   31

    RT_32:63 ← 0b0 || (RA)_33:63

The sign bit of the low element of register RA is set to 0 and the result is placed into the low element of register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Single-Precision Negative Absolute Value    EVX-form

efsnabs RT,RA

    4 | RT | RA | /// | 709
    0   6    11   16    21   31

    RT_32:63 ← 0b1 || (RA)_33:63

The sign bit of the low element of register RA is set to 1 and the result is placed into the low element of register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Single-Precision Negate    EVX-form

efsneg RT,RA

    4 | RT | RA | /// | 710
    0   6    11   16    21   31

    RT_32:63 ← ¬(RA)_32 || (RA)_33:63

The sign bit of the low element of register RA is complemented and the result is placed into the low element of register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Single-Precision Add    EVX-form

efsadd RT,RA,RB

    4 | RT | RA | RB | 704
    0   6    11   16   21   31

    RT_32:63 ← (RA)_32:63 +sp (RB)_32:63

The low element of register RA is added to the low element of register RB and the result is stored in the low element of register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Single-Precision Subtract    EVX-form

efssub RT,RA,RB

    4 | RT | RA | RB | 705
    0   6    11   16   21   31

    RT_32:63 ← (RA)_32:63 -sp (RB)_32:63

The low element of register RB is subtracted from the low element of register RA and the result is stored in the low element of register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Single-Precision Multiply    EVX-form

efsmul RT,RA,RB

    4 | RT | RA | RB | 712
    0   6    11   16   21   31

    RT_32:63 ← (RA)_32:63 ×sp (RB)_32:63

The low element of register RA is multiplied by the low element of register RB and the result is stored in the low element of register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Single-Precision Divide    EVX-form

efsdiv RT,RA,RB

    4 | RT | RA | RB | 713
    0   6    11   16   21   31

    RT_32:63 ← (RA)_32:63 ÷sp (RB)_32:63

The low element of register RA is divided by the low element of register RB and the result is stored in the low element of register RT.

Special Registers Altered:
    FINV FINVS
    FG FX FINXS
    FDBZ FDBZS
    FOVF FOVFS
    FUNF FUNFS

Floating-Point Single-Precision Compare Greater Than    EVX-form

efscmpgt BF,RA,RB

    4 | BF | // | RA | RB | 716
    0   6    9    11   16   21    31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al > bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. The results of the comparisons are placed into CR field BF. If RA_32:63 is greater than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an Input Error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Single-Precision Compare Less Than    EVX-form

efscmplt BF,RA,RB

    4 | BF | // | RA | RB | 717
    0   6    9    11   16   21    31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al < bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA_32:63 is less than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an Input Error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF
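The sign-manipulation instructions earlier in this section (efsabs, efsnabs, efsneg) only touch bit 32 of the register, which is the sign bit of the low word. A minimal sketch follows, with registers modelled as 64-bit Python integers. The function names mirror the mnemonics for readability, but the helper `_merge` and the explicit `rt_old` parameter are assumptions of this sketch; they express the statement in Section 9.2 that bits 0:31 of the destination are left unchanged by scalar single-precision instructions.

```python
LOW_SIGN = 0x0000_0000_8000_0000    # bit 32 in the architecture's bit numbering
HIGH_WORD = 0xFFFF_FFFF_0000_0000   # bits 0:31
LOW_WORD = 0x0000_0000_FFFF_FFFF    # bits 32:63


def _merge(rt_old: int, low: int) -> int:
    """Keep bits 0:31 of the old RT value and replace bits 32:63."""
    return (rt_old & HIGH_WORD) | (low & LOW_WORD)


def efsabs(rt_old: int, ra: int) -> int:
    """RT_32:63 <- 0b0 || (RA)_33:63"""
    return _merge(rt_old, ra & ~LOW_SIGN)


def efsnabs(rt_old: int, ra: int) -> int:
    """RT_32:63 <- 0b1 || (RA)_33:63"""
    return _merge(rt_old, ra | LOW_SIGN)


def efsneg(rt_old: int, ra: int) -> int:
    """RT_32:63 <- not (RA)_32 || (RA)_33:63"""
    return _merge(rt_old, ra ^ LOW_SIGN)


# The low word of RA holds -1.0 (0xBF800000); the high word of RA is ignored.
print(hex(efsabs(0xAAAAAAAA_00000000, 0x12345678_BF800000)))
```

Because these are pure bit operations, they behave the same for NaN, Infinity, and Denorm patterns, which is consistent with the statement that no exceptions are taken regardless of the value of RA.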
Floating-Point Single-Precision Compare Equal    EVX-form

efscmpeq BF,RA,RB

    4 | BF | // | RA | RB | 718
    0   6    9    11   16   21    31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al = bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA_32:63 is equal to RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an Input Error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Single-Precision Test Greater Than    EVX-form

efststgt BF,RA,RB

    4 | BF | // | RA | RB | 732
    0   6    9    11   16   21    31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al > bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA_32:63 is greater than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are generated during the execution of efststgt.

Special Registers Altered:
    CR field BF

Programming Note
    In an implementation, the execution of efststgt is likely to be faster than the execution of efscmpgt; however, if strict IEEE 754 compliance is required, the program should use efscmpgt.
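To make "using their values of 'e' and 'f' directly" concrete, the sketch below decodes a single-precision pattern into its s, e, and f fields, classifies it according to the reserved exponent values of Section 9.2.2, and evaluates it with the normalized-number formula (-1)^s × 2^E × (1.fraction) even when the pattern is a NaN, Infinity, or Denorm. The function names are hypothetical, and exact rational arithmetic is used only so that the values can be checked without rounding.

```python
from fractions import Fraction


def fields(word: int) -> tuple[int, int, int]:
    """Split a 32-bit pattern into sign, biased exponent and fraction."""
    return (word >> 31) & 1, (word >> 23) & 0xFF, word & 0x7FFFFF


def classify(word: int) -> str:
    """Name the IEEE 754 class that the reserved exponent values encode."""
    _, e, f = fields(word)
    if e == 0:
        return "zero" if f == 0 else "denorm"
    if e == 255:
        return "infinity" if f == 0 else "NaN"
    return "normalized"


def value_as_normalized(word: int) -> Fraction:
    """Evaluate (-1)^s * 2^(e-127) * 1.f for any nonzero pattern."""
    s, e, f = fields(word)
    if e == 0 and f == 0:
        return Fraction(0)                      # +0 and -0 are both zero
    return (-1) ** s * Fraction(2) ** (e - 127) * (1 + Fraction(f, 2**23))


for pattern in (0x3F800000, 0x7F7FFFFF, 0x7F800000, 0x7FC00000, 0x00000001):
    print(hex(pattern), classify(pattern), float(value_as_normalized(pattern)) if
          classify(pattern) in ("normalized", "zero") else "treated as normalized")
```

Under this reading, the Infinity pattern 0x7F800000 is simply the next value above pmax (2^128), and every NaN pattern is larger still, which is why the test and compare instructions can order them without special cases.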
Floating-Point Single-Precision Test Less Than    EVX-form

efststlt BF,RA,RB

    4 | BF | // | RA | RB | 733
    0   6    9    11   16   21    31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al < bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA_32:63 is less than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are generated during the execution of efststlt.

Special Registers Altered:
    CR field BF

Programming Note
    In an implementation, the execution of efststlt is likely to be faster than the execution of efscmplt; however, if strict IEEE 754 compliance is required, the program should use efscmplt.

Floating-Point Single-Precision Test Equal    EVX-form

efststeq BF,RA,RB

    4 | BF | // | RA | RB | 734
    0   6    9    11   16   21    31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al = bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA_32:63 is equal to RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are generated during the execution of efststeq.

Special Registers Altered:
    CR field BF

Programming Note
    In an implementation, the execution of efststeq is likely to be faster than the execution of efscmpeq; however, if strict IEEE 754 compliance is required, the program should use efscmpeq.
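The Programming Notes above warn that the test forms are not a substitute for IEEE 754 comparison. One place where the difference shows up is equality of NaNs. The sketch below is an illustration under assumptions, not an architected definition: `efststeq_low` is a hypothetical model of the low-word equality described for efststeq (raw e and f compared directly, with +0 equal to -0), and `ieee_equal` uses the host's IEEE 754 comparison as the reference.

```python
import struct


def efststeq_low(ra_low: int, rb_low: int) -> bool:
    """Model of the efststeq comparison on two 32-bit low words."""
    a, b = ra_low & 0xFFFFFFFF, rb_low & 0xFFFFFFFF
    if (a & 0x7FFFFFFF) == 0 and (b & 0x7FFFFFFF) == 0:
        return True                 # +0 = -0
    return a == b                   # sign, e and f compared as-is


def ieee_equal(ra_low: int, rb_low: int) -> bool:
    """Reference: IEEE 754 equality as implemented by the host."""
    a = struct.unpack(">f", struct.pack(">I", ra_low & 0xFFFFFFFF))[0]
    b = struct.unpack(">f", struct.pack(">I", rb_low & 0xFFFFFFFF))[0]
    return a == b


QNAN = 0x7FC00000
print(efststeq_low(QNAN, QNAN), ieee_equal(QNAN, QNAN))
```

Identical NaN patterns compare equal in the model but unequal under IEEE 754, so code that relies on `x != x` to detect a NaN would need the compare form together with an interrupt handler, or an explicit check of the exponent and fraction fields.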
Embedded Floating-Point 573 Version 2.06 Convert Floating-Point Single-Precision Convert Floating-Point Single-Precision from Signed Integer EVX-form from Unsigned Integer EVX-form efscfsi RT,RB efscfui RT,RB 4 RT /// RB 721 4 RT /// RB 720 0 6 11 16 21 31 0 6 11 16 21 31 RT32:63 CnvtI32ToFP32((RB)32:63, S, LO, I) RT32:63 CnvtI32ToFP32((RB)32:63, U, LO, I) The signed integer low element in register RB is con- The unsigned integer low element in register RB is con- verted to a single-precision floating-point value using verted to a single-precision floating-point value using the current rounding mode and the result is placed into the current rounding mode and the result is placed into the low element of register RT. the low element of register RT. Special Registers Altered: Special Registers Altered: FINXS FG FX FINXS FG FX Convert Floating-Point Single-Precision Convert Floating-Point Single-Precision from Signed Fraction EVX-form from Unsigned Fraction EVX-form efscfsf RT,RB efscfuf RT,RB 4 RT /// RB 723 4 RT /// RB 722 0 6 11 16 21 31 0 6 11 16 21 31 RT32:63 CnvtI32ToFP32((RB)32:63, S, LO, F) RT32:63 CnvtI32ToFP32((RB)32:63, U, LO, F) The signed fractional low element in register RB is con- The unsigned fractional low element in register RB is verted to a single-precision floating-point value using converted to a single-precision floating-point value the current rounding mode and the result is placed into using the current rounding mode and the result is the low element of register RT. placed into the low element of register RT. 
Special Registers Altered: Special Registers Altered: FINXS FG FX FINXS FG FX

Convert Floating-Point Single-Precision to Signed Integer EVX-form

efsctsi RT,RB

    4   RT   ///   RB   725
    0   6    11    16   21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, S, LO, RND, I)

The single-precision floating-point low element in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Unsigned Integer EVX-form

efsctui RT,RB

    4   RT   ///   RB   724
    0   6    11    16   21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, U, LO, RND, I)

The single-precision floating-point low element in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero EVX-form

efsctsiz RT,RB

    4   RT   ///   RB   730
    0   6    11    16   21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, S, LO, ZER, I)

The single-precision floating-point low element in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX-form

efsctuiz RT,RB

    4   RT   ///   RB   728
    0   6    11    16   21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, U, LO, ZER, I)

The single-precision floating-point low element in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Signed Fraction EVX-form

efsctsf RT,RB

    4   RT   ///   RB   727
    0   6    11    16   21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, S, LO, RND, F)

The single-precision floating-point low element in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Unsigned Fraction EVX-form

efsctuf RT,RB

    4   RT   ///   RB   726
    0   6    11    16   21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, U, LO, RND, F)

The single-precision floating-point low element in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit unsigned fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

9.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double]

Floating-Point Double-Precision Absolute Value EVX-form

efdabs RT,RA

    4   RT   RA   ///   740
    0   6    11   16    21    31

    RT0:63 ← 0b0 || (RA)1:63

The sign bit of register RA is set to 0 and the result is placed in register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Double-Precision Negative Absolute Value EVX-form

efdnabs RT,RA

    4   RT   RA   ///   741
    0   6    11   16    21    31

    RT0:63 ← 0b1 || (RA)1:63

The sign bit of register RA is set to 1 and the result is placed in register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Double-Precision Negate EVX-form

efdneg RT,RA

    4   RT   RA   ///   742
    0   6    11   16    21    31

    RT0:63 ← ¬(RA)0 || (RA)1:63

The sign bit of register RA is complemented and the result is placed in register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Double-Precision Add EVX-form

efdadd RT,RA,RB

    4   RT   RA   RB   736
    0   6    11   16   21    31

    RT0:63 ← (RA)0:63 +dp (RB)0:63

RA is added to RB and the result is stored in register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Double-Precision Subtract EVX-form

efdsub RT,RA,RB

    4   RT   RA   RB   737
    0   6    11   16   21    31

    RT0:63 ← (RA)0:63 -dp (RB)0:63

RB is subtracted from RA and the result is stored in register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Double-Precision Multiply EVX-form

efdmul RT,RA,RB

    4   RT   RA   RB   744
    0   6    11   16   21    31

    RT0:63 ← (RA)0:63 ×dp (RB)0:63

RA is multiplied by RB and the result is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Double-Precision Divide EVX-form

efddiv RT,RA,RB

    4   RT   RA   RB   745
    0   6    11   16   21    31

    RT0:63 ← (RA)0:63 ÷dp (RB)0:63

RA is divided by RB and the result is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FG FX FINXS
    FDBZ FDBZS
    FOVF FOVFS
    FUNF FUNFS
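The sign-manipulation instructions of this section (efdabs, efdnabs, efdneg) touch only bit 0 of the 64-bit register image, which is why they can never raise an exception, even for NaN or infinity patterns. The following is a minimal sketch of that behaviour on raw 64-bit patterns. The function names simply mirror the mnemonics and are illustrative assumptions, not part of the architecture; Power ISA bit 0 is taken to be the most-significant bit.

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63  # Power ISA bit 0 = most-significant bit of the register


def efdabs(ra: int) -> int:
    """RT <- 0b0 || (RA)1:63: force the sign bit to 0."""
    return (ra & MASK64) & ~SIGN_BIT


def efdnabs(ra: int) -> int:
    """RT <- 0b1 || (RA)1:63: force the sign bit to 1."""
    return (ra & MASK64) | SIGN_BIT


def efdneg(ra: int) -> int:
    """RT <- not(RA)0 || (RA)1:63: complement the sign bit."""
    return (ra & MASK64) ^ SIGN_BIT


# 0xBFF8_0000_0000_0000 is the IEEE 754 double encoding of -1.5.
print(hex(efdabs(0xBFF8_0000_0000_0000)))
```

Because the operations are plain bit edits, applying the efdneg sketch twice returns the original pattern for every input, including NaN encodings.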
Floating-Point Double-Precision Compare Greater Than EVX-form

efdcmpgt BF,RA,RB

    4   BF   //   RA   RB   748
    0   6    9    11   16   21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al > bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is greater than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Double-Precision Compare Less Than EVX-form

efdcmplt BF,RA,RB

    4   BF   //   RA   RB   749
    0   6    9    11   16   21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al < bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is less than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Double-Precision Compare Equal EVX-form

efdcmpeq BF,RA,RB

    4   BF   //   RA   RB   750
    0   6    9    11   16   21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al = bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is equal to RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Double-Precision Test Greater Than EVX-form

efdtstgt BF,RA,RB

    4   BF   //   RA   RB   764
    0   6    9    11   16   21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al > bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is greater than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are generated during the execution of efdtstgt.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of efdtstgt is likely to be faster than the execution of efdcmpgt; however, if strict IEEE 754 compliance is required, the program should use efdcmpgt.

Floating-Point Double-Precision Test Less Than EVX-form

efdtstlt BF,RA,RB

    4   BF   //   RA   RB   765
    0   6    9    11   16   21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al < bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is less than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are generated during the execution of efdtstlt.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of efdtstlt is likely to be faster than the execution of efdcmplt; however, if strict IEEE 754 compliance is required, the program should use efdcmplt.

Floating-Point Double-Precision Test Equal EVX-form

efdtsteq BF,RA,RB

    4   BF   //   RA   RB   766
    0   6    9    11   16   21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al = bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is equal to RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are generated during the execution of efdtsteq.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of efdtsteq is likely to be faster than the execution of efdcmpeq; however, if strict IEEE 754 compliance is required, the program should use efdcmpeq.

Convert Floating-Point Double-Precision from Signed Integer EVX-form

efdcfsi RT,RB

    4   RT   ///   RB   753
    0   6    11    16   21    31

    RT0:63 ← CnvtI32ToFP64((RB)32:63, S, I)

The signed integer low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision from Unsigned Integer EVX-form

efdcfui RT,RB

    4   RT   ///   RB   752
    0   6    11    16   21    31

    RT0:63 ← CnvtI32ToFP64((RB)32:63, U, I)

The unsigned integer low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None
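As an illustration of efdcfsi and efdcfui, the sketch below converts the low word (bits 32:63) of a 64-bit register image to an IEEE 754 double-precision bit pattern. The function name `cnvt_i32_to_fp64` only echoes the RTL helper CnvtI32ToFP64 and is an assumption for illustration; the integer (I) mode alone is modelled, not the fractional (F) mode. Every 32-bit integer fits in the 53-bit significand of a double, so the conversion is exact, which is consistent with these two instructions altering no status flags.

```python
import struct


def cnvt_i32_to_fp64(rb: int, signed: bool) -> int:
    """Return the 64-bit double pattern for the integer in bits 32:63 of rb.

    Hypothetical helper: models only the integer (I) mode of the conversion.
    """
    low = rb & 0xFFFF_FFFF          # bits 32:63 are the low-order word
    if signed and (low & 0x8000_0000):
        low -= 1 << 32              # reinterpret as two's complement
    # Pack as a big-endian double and read the same 8 bytes back as an integer.
    return struct.unpack(">Q", struct.pack(">d", float(low)))[0]


# The same low word gives different results for the signed and unsigned forms.
print(hex(cnvt_i32_to_fp64(0xFFFF_FFFF, signed=True)))   # pattern of -1.0
print(hex(cnvt_i32_to_fp64(0xFFFF_FFFF, signed=False)))  # pattern of 4294967295.0
```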
Convert Floating-Point Double-Precision from Signed Integer Doubleword EVX-form

efdcfsid RT,RB

    4   RT   ///   RB   739
    0   6    11    16   21    31

    RT0:63 ← CnvtI64ToFP64((RB)0:63, S)

The signed integer doubleword in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Double-Precision from Unsigned Integer Doubleword EVX-form

efdcfuid RT,RB

    4   RT   ///   RB   738
    0   6    11    16   21    31

    RT0:63 ← CnvtI64ToFP64((RB)0:63, U)

The unsigned integer doubleword in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Double-Precision from Signed Fraction EVX-form

efdcfsf RT,RB

    4   RT   ///   RB   755
    0   6    11    16   21    31

    RT0:63 ← CnvtI32ToFP64((RB)32:63, S, F)

The signed fractional low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision to Signed Integer EVX-form

efdctsi RT,RB

    4   RT   ///   RB   757
    0   6    11    16   21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, S, RND, I)

The double-precision floating-point value in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision from Unsigned Fraction EVX-form

efdcfuf RT,RB

    4   RT   ///   RB   754
    0   6    11    16   21    31

    RT0:63 ← CnvtI32ToFP64((RB)32:63, U, F)

The unsigned fractional low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision to Unsigned Integer EVX-form

efdctui RT,RB

    4   RT   ///   RB   756
    0   6    11    16   21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, U, RND, I)

The double-precision floating-point value in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero EVX-form

efdctsidz RT,RB

    4   RT   ///   RB   747
    0   6    11    16   21    31

    RT0:63 ← CnvtFP64ToI64Sat((RB)0:63, S, ZER)

The double-precision floating-point value in register RB is converted to a signed integer doubleword using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 64-bit integer. NaNs are converted as though they were zero.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero EVX-form

efdctuidz RT,RB

    4   RT   ///   RB   746
    0   6    11    16   21    31

    RT0:63 ← CnvtFP64ToI64Sat((RB)0:63, U, ZER)

The double-precision floating-point value in register RB is converted to an unsigned integer doubleword using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 64-bit integer. NaNs are converted as though they were zero.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero EVX-form

efdctsiz RT,RB

    4   RT   ///   RB   762
    0   6    11    16   21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, S, ZER, I)

The double-precision floating-point value in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero EVX-form

efdctuiz RT,RB

    4   RT   ///   RB   760
    0   6    11    16   21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, U, ZER, I)

The double-precision floating-point value in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Signed Fraction EVX-form

efdctsf RT,RB

    4   RT   ///   RB   759
    0   6    11    16   21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, S, RND, F)

The double-precision floating-point value in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Unsigned Fraction EVX-form

efdctuf RT,RB

    4   RT   ///   RB   758
    0   6    11    16   21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, U, RND, F)

The double-precision floating-point value in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit unsigned fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Floating-Point Double-Precision Convert from Single-Precision EVX-form

efdcfs RT,RB

    4   RT   ///   RB   751
    0   6    11    16   21    31

    FP32format f;
    FP64format result;
    f ← (RB)32:63
    if (fexp = 0) & (ffrac = 0) then
        result ← fsign || ⁶³0
    else if Isa32NaNorInfinity(f) | Isa32Denorm(f) then
        SPEFSCRFINV ← 1
        result ← fsign || 0b11111111110 || ⁵²1
    else if Isa32Denorm(f) then
        SPEFSCRFINV ← 1
        result ← fsign || ⁶³0
    else
        resultsign ← fsign
        resultexp ← fexp - 127 + 1023
        resultfrac ← ffrac || ²⁹0
    RT0:63 ← result

The single-precision floating-point value in the low element of register RB is converted to a double-precision floating-point value and the result is placed in register RT.

Corequisite Categories:
    SPE.Embedded Float Scalar Single or
    SPE.Embedded Float Vector

Special Registers Altered:
    FINV FINVS
    FG FX

Floating-Point Single-Precision Convert from Double-Precision EVX-form

efscfd RT,RB

    4   RT   ///   RB   719
    0   6    11    16   21    31

    FP64format f;
    FP32format result;
    f ← (RB)0:63
    if (fexp = 0) & (ffrac = 0) then
        result ← fsign || ³¹0
    else if Isa64NaNorInfinity(f) then
        SPEFSCRFINV ← 1
        result ← fsign || 0b11111110 || ²³1
    else if Isa64Denorm(f) then
        SPEFSCRFINV ← 1
        result ← fsign || ³¹0
    else
        unbias ← fexp - 1023
        if unbias > 127 then
            result ← fsign || 0b11111110 || ²³1
            SPEFSCRFOVF ← 1
        else if unbias < -126 then
            result ← fsign || ³¹0
            SPEFSCRFUNF ← 1
        else
            resultsign ← fsign
            resultexp ← unbias + 127
            resultfrac ← ffrac[0:22]
            guard ← ffrac[23]
            sticky ← (ffrac[24:51] ≠ 0)
            result ← Round32(result, LO, guard, sticky)
            SPEFSCRFG ← guard
            SPEFSCRFX ← sticky
            if guard | sticky then
                SPEFSCRFINXS ← 1
    RT32:63 ← result

The double-precision floating-point value in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low
element of register RT.

Corequisite Categories:
    SPE.Embedded Float Scalar Scalar

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

9.4 Embedded Floating-Point Results Summary

The following tables summarize the results of various types of Embedded Floating-Point operations on various combinations of input operands. Flag settings are performed on appropriate element flags. For all the tables the following annotation and general rules apply:

  * denotes that this status flag is set based on the results of the calculation.
  _Calc_ denotes that the result is updated with the results of the computation.
  max denotes the maximum normalized number with the sign set to the computation [sign(operand A) XOR sign(operand B)].
  amax denotes the maximum normalized number with the sign set to the sign of Operand A.
  bmax denotes the maximum normalized number with the sign set to the sign of Operand B.
  pmax denotes the maximum normalized positive number. The encoding for single-precision is: 0x7F7FFFFF. The encoding for double-precision is: 0x7FEFFFFF_FFFFFFFF.
  nmax denotes the maximum normalized negative number. The encoding for single-precision is: 0xFF7FFFFF. The encoding for double-precision is: 0xFFEFFFFF_FFFFFFFF.
  pmin denotes the minimum normalized positive number. The encoding for single-precision is: 0x00800000. The encoding for double-precision is: 0x00100000_00000000.
  nmin denotes the minimum normalized negative number. The encoding for single-precision is: 0x80800000. The encoding for double-precision is: 0x80100000_00000000.
  Calculations that overflow or underflow saturate. Overflow for operations that have a floating-point result forces the result to max. Underflow for operations that have a floating-point result forces the result to zero. Overflow for operations that have a signed integer result forces the result to 0x7FFFFFFF (positive) or 0x80000000 (negative). Overflow for operations that have an unsigned integer result forces the result to 0xFFFFFFFF (positive) or 0x00000000 (negative).
  ¹ (superscript) denotes that the sign of the result is positive when the sign of Operand A and the sign of Operand B are different, for all rounding modes except round to -infinity, where the sign of the result is then negative.
  ² (superscript) denotes that the sign of the result is positive when the sign of Operand A and the sign of Operand B are the same, for all rounding modes except round to -infinity, where the sign of the result is then negative.
  ³ (superscript) denotes that the sign for any multiply or divide is always the result of the operation [sign(Operand A) XOR sign(Operand B)].
  ⁴ (superscript) denotes that if an overflow is detected, the result may be saturated.

Table 100: Embedded Floating-Point Results Summary--Add, Sub, Mul, Div

Operation  Operand A  Operand B  Result        FINV  FOVF  FUNF  FDBZ  FINX
Add
Add        ∞          ∞          amax          1     0     0     0     0
Add        ∞          NaN        amax          1     0     0     0     0
Add        ∞          denorm     amax          1     0     0     0     0
Add        ∞          zero       amax          1     0     0     0     0
Add        ∞          norm       amax          1     0     0     0     0
Add        NaN        ∞          amax          1     0     0     0     0
Add        NaN        NaN        amax          1     0     0     0     0
Add        NaN        denorm     amax          1     0     0     0     0
Add        NaN        zero       amax          1     0     0     0     0
Add        NaN        norm       amax          1     0     0     0     0
Add        denorm     ∞          bmax          1     0     0     0     0
Add        denorm     NaN        bmax          1     0     0     0     0
Add        denorm     denorm     zero¹         1     0     0     0     0
Add        denorm     zero       zero¹         1     0     0     0     0
Add        denorm     norm       operand_b⁴    1     0     0     0     0
Add        zero       ∞          bmax          1     0     0     0     0
Add        zero       NaN        bmax          1     0     0     0     0
Add        zero       denorm     zero¹         1     0     0     0     0
Add        zero       zero       zero¹         0     0     0     0     0
Add        zero       norm       operand_b⁴    0     0     0     0     0
Add        norm       ∞          bmax          1     0     0     0     0
Add        norm       NaN        bmax          1     0     0     0     0
Add        norm       denorm     operand_a⁴    1     0     0     0     0
Add        norm       zero       operand_a⁴    0     0     0     0     0
Add        norm       norm       _Calc_        0     *     *     0     *
Subtract
Sub        ∞          ∞          amax          1     0     0     0     0
Sub        ∞          NaN        amax          1     0     0     0     0
Sub        ∞          denorm     amax          1     0     0     0     0
Sub        ∞          zero       amax          1     0     0     0     0
Sub        ∞          norm       amax          1     0     0     0     0
Sub        NaN        ∞          amax          1     0     0     0     0
Sub        NaN        NaN        amax          1     0     0     0     0
Sub        NaN        denorm     amax          1     0     0     0     0
Sub        NaN        zero       amax          1     0     0     0     0
Sub        NaN        norm       amax          1     0     0     0     0
Sub        denorm     ∞          -bmax         1     0     0     0     0
Sub        denorm     NaN        -bmax         1     0     0     0     0
Sub        denorm     denorm     zero²         1     0     0     0     0
Sub        denorm     zero       zero²         1     0     0     0     0
Sub        denorm     norm       -operand_b⁴   1     0     0     0     0
Sub        zero       ∞          -bmax         1     0     0     0     0
Sub        zero       NaN        -bmax         1     0     0     0     0
Sub        zero       denorm     zero²         1     0     0     0     0
Sub        zero       zero       zero²         0     0     0     0     0
Sub        zero       norm       -operand_b⁴   0     0     0     0     0
Sub        norm       ∞          -bmax         1     0     0     0     0
Sub        norm       NaN        -bmax         1     0     0     0     0
Sub        norm       denorm     operand_a⁴    1     0     0     0     0
Sub        norm       zero       operand_a⁴    0     0     0     0     0
Sub        norm       norm       _Calc_        0     *     *     0     *
Multiply³
Mul        ∞          ∞          max           1     0     0     0     0
Mul        ∞          NaN        max           1     0     0     0     0
Mul        ∞          denorm     zero          1     0     0     0     0
Mul        ∞          zero       zero          1     0     0     0     0
Mul        ∞          norm       max           1     0     0     0     0
Mul        NaN        ∞          max           1     0     0     0     0
Mul        NaN        NaN        max           1     0     0     0     0
Mul        NaN        denorm     zero          1     0     0     0     0
Mul        NaN        zero       zero          1     0     0     0     0
Mul        NaN        norm       max           1     0     0     0     0
Mul        denorm     ∞          zero          1     0     0     0     0
Mul        denorm     NaN        zero          1     0     0     0     0
Mul        denorm     denorm     zero          1     0     0     0     0
Mul        denorm     zero       zero          1     0     0     0     0
Mul        denorm     norm       zero          1     0     0     0     0
Mul        zero       ∞          zero          1     0     0     0     0
Mul        zero       NaN        zero          1     0     0     0     0
Mul        zero       denorm     zero          1     0     0     0     0
Mul        zero       zero       zero          0     0     0     0     0
Mul        zero       norm       zero          0     0     0     0     0
Mul        norm       ∞          max           1     0     0     0     0
Mul        norm       NaN        max           1     0     0     0     0
Mul        norm       denorm     zero          1     0     0     0     0
Mul        norm       zero       zero          0     0     0     0     0
Mul        norm       norm       _Calc_        0     *     *     0     *
Divide³
Div        ∞          ∞          zero          1     0     0     0     0
Div        ∞          NaN        zero          1     0     0     0     0
Div        ∞          denorm     max           1     0     0     0     0
Div        ∞          zero       max           1     0     0     0     0
Div        ∞          norm       max           1     0     0     0     0
Div        NaN        ∞          zero          1     0     0     0     0
Div        NaN        NaN        zero          1     0     0     0     0
Div        NaN        denorm     max           1     0     0     0     0
Div        NaN        zero       max           1     0     0     0     0
Div        NaN        norm       max           1     0     0     0     0
Div        denorm     ∞          zero          1     0     0     0     0
Div        denorm     NaN        zero          1     0     0     0     0
Div        denorm     denorm     max           1     0     0     0     0
Div        denorm     zero       max           1     0     0     0     0
Div        denorm     norm       zero          1     0     0     0     0
Div        zero       ∞          zero          1     0     0     0     0
Div        zero       NaN        zero          1     0     0     0     0
Div        zero       denorm     max           1     0     0     0     0
Div        zero       zero       max           1     0     0     0     0
Div        zero       norm       zero          0     0     0     0     0
Div        norm       ∞          zero          1     0     0     0     0
Div        norm       NaN        zero          1     0     0     0     0
Div        norm       denorm     max           1     0     0     0     0
Div        norm       zero       max           0     0     0     1     0
Div        norm       norm       _Calc_        0     *     *     0     *

Table 101: Embedded Floating-Point Results Summary--Single Convert from Double

Operand B  efscfd result  FINV  FOVF  FUNF  FDBZ  FINX
+∞         pmax           1     0     0     0     0
-∞         nmax           1     0     0     0     0
+NaN       pmax           1     0     0     0     0
-NaN       nmax           1     0     0     0     0
+denorm    +zero          1     0     0     0     0
-denorm    -zero          1     0     0     0     0
+zero      +zero          0     0     0     0     0
-zero      -zero          0     0     0     0     0
norm       _Calc_         0     *     *     0     *

Table 102: Embedded Floating-Point Results Summary--Double Convert from Single

Operand B  efdcfs result  FINV  FOVF  FUNF  FDBZ  FINX
+∞         pmax           1     0     0     0     0
-∞         nmax           1     0     0     0     0
+NaN       pmax           1     0     0     0     0
-NaN       nmax           1     0     0     0     0
+denorm    +zero          1     0     0     0     0
-denorm    -zero          1     0     0     0     0
+zero      +zero          0     0     0     0     0
-zero      -zero          0     0     0     0     0
norm       _Calc_         0     0     0     0     0

Table 103: Embedded Floating-Point Results Summary--Convert to Unsigned

Operand B  Integer Result          Fractional Result  FINV  FOVF  FUNF  FDBZ  FINX
           ctui[d][z]              ctuf
+∞         0xFFFF_FFFF             0x7FFF_FFFF        1     0     0     0     0
           0xFFFF_FFFF_FFFF_FFFF
-∞         0                       0                  1     0     0     0     0
+NaN       0                       0                  1     0     0     0     0
-NaN       0                       0                  1     0     0     0     0
denorm     0                       0                  1     0     0     0     0
zero       0                       0                  0     0     0     0     0
+norm      _Calc_                  _Calc_             *     0     0     0     *
-norm      _Calc_                  _Calc_             *     0     0     0     *
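The special-operand rows of Table 103 can be sketched for the 32-bit, round-toward-zero integer case (efdctuiz) as follows. This is a minimal illustration under stated assumptions, not a definitive model: the names `efdctuiz_sketch` and `double_bits` are hypothetical, and only the result word and the FINV column for special operands are modelled. For normalized operands the table gives `*` (flag depends on the calculation), so the sketch returns `None` for the flag in that case.

```python
import math
import struct
from typing import Optional, Tuple

EXP_MASK = 0x7FF
FRAC_MASK = (1 << 52) - 1


def double_bits(value: float) -> int:
    """Helper (assumption): IEEE 754 double bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def efdctuiz_sketch(rb: int) -> Tuple[int, Optional[bool]]:
    """Return (32-bit unsigned result, FINV) for a 64-bit register image rb.

    FINV is None for normalized operands, where Table 103 shows '*'.
    """
    negative = bool((rb >> 63) & 1)
    exp = (rb >> 52) & EXP_MASK
    frac = rb & FRAC_MASK
    if exp == EXP_MASK:
        if frac == 0:                       # +infinity / -infinity rows
            return (0 if negative else 0xFFFF_FFFF), True
        return 0, True                      # NaN rows: converted as zero
    if exp == 0:
        return 0, frac != 0                 # denorm: FINV=1; zero: FINV=0
    value = struct.unpack(">d", struct.pack(">Q", rb))[0]
    truncated = math.trunc(value)           # Round toward Zero
    # Saturate to 0xFFFFFFFF (positive) or 0x00000000 (negative).
    return max(0, min(truncated, 0xFFFF_FFFF)), None


print(efdctuiz_sketch(double_bits(3.9)))
print(efdctuiz_sketch(double_bits(1e10)))
```

The signed variants follow the same shape with the saturation bounds 0x7FFFFFFF and 0x80000000 given in the general rules of Section 9.4.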
Table 104: Embedded Floating-Point Results Summary--Convert to Signed

Operand B  Integer Result          Fractional Result  FINV  FOVF  FUNF  FDBZ  FINX
           ctsi[d][z]              ctsf
+∞         0x7FFF_FFFF             0x7FFF_FFFF        1     0     0     0     0
           0x7FFF_FFFF_FFFF_FFFF
-∞         0x8000_0000             0x8000_0000        1     0     0     0     0
           0x8000_0000_0000_0000
+NaN       0                       0                  1     0     0     0     0
-NaN       0                       0                  1     0     0     0     0
denorm     0                       0                  1     0     0     0     0
zero       0                       0                  0     0     0     0     0
+norm      _Calc_                  _Calc_             *     0     0     0     *
-norm      _Calc_                  _Calc_             *     0     0     0     *

Table 105: Embedded Floating-Point Results Summary--Convert from Unsigned

Operand B  Integer Source  Fractional Source  FINV  FOVF  FUNF  FDBZ  FINX
           cfui            cfuf
zero       zero            zero               0     0     0     0     0
norm       _Calc_          _Calc_             0     0     0     0     *

Table 106: Embedded Floating-Point Results Summary--Convert from Signed

Operand B  Integer Source  Fractional Source  FINV  FOVF  FUNF  FDBZ  FINX
           cfsi            cfsf
zero       zero            zero               0     0     0     0     0
norm       _Calc_          _Calc_             0     0     0     0     *

Table 107: Embedded Floating-Point Results Summary--*abs, *nabs, *neg

Operand A  *abs              *nabs             *neg              FINV  FOVF  FUNF  FDBZ  FINX
+∞         pmax | +∞         nmax | -∞         -amax | -∞        1     0     0     0     0
-∞         pmax | +∞         nmax | -∞         -amax | +∞        1     0     0     0     0
+NaN       pmax | NaN        nmax | -NaN       -amax | -NaN      1     0     0     0     0
-NaN       pmax | NaN        nmax | -NaN       -amax | +NaN      1     0     0     0     0
+denorm    +zero | +denorm   -zero | -denorm   -zero | -denorm   1     0     0     0     0
-denorm    +zero | +denorm   -zero | -denorm   +zero | +denorm   1     0     0     0     0
+zero      +zero             -zero             -zero             0     0     0     0     0
-zero      +zero             -zero             +zero             0     0     0     0     0
+norm      +norm             -norm             -norm             0     0     0     0     0
-norm      +norm             -norm             +norm             0     0     0     0     0

Chapter 10. Legacy Move Assist Instruction [Category: Legacy Move Assist]

Determine Leftmost Zero Byte X-form

dlmzb  RA,RS,RB (Rc=0)
dlmzb. RA,RS,RB (Rc=1)

    31  RS   RA   RB   78   Rc
    0   6    11   16   21   31

    d0:63 ← (RS)32:63 || (RB)32:63
    i ← 0
    x ← 0
    y ← 0
    do while (x<8) & (y=0)
        x ← x + 1
        if di+32:i+39 = 0 then
            y ← 1
        else
            i ← i + 8
    RA ← x
    XER57:63 ← x
    if Rc = 1 then do
        CR35 ← SO
        if y = 1 then do
            if x<5 then CR32:34 ← 0b010
            else CR32:34 ← 0b100
        else
            CR32:34 ← 0b001

The contents of bits 32:63 of register RS and the contents of bits 32:63 of register RB are concatenated to form an 8-byte operand. The operand is searched for the leftmost byte in which each bit is 0 (i.e., a null byte).

Bytes in the operand are numbered from left to right starting with 1. If a null byte is found, its byte number is placed into bits 57:63 of the XER and into register RA. Otherwise, the value 0b000_1000 is placed into both bits 57:63 of the XER and register RA.

If Rc is equal to 1, SO is copied into bit 35 of the CR and bits 32:34 of the CR are updated as follows:

  If no null byte is found, bits 32:34 of the CR are set to 0b001.
  If the leftmost null byte is in the first 4 bytes (i.e., from register RS), bits 32:34 of the CR are set to 0b010.
  If the leftmost null byte is in the last 4 bytes (i.e., from register RB), bits 32:34 of the CR are set to 0b100.

Special Registers Altered:
    XER57:63
    CR0 (if Rc=1)

Chapter 11. Legacy Integer Multiply-Accumulate Instructions [Category: Legacy Integer Multiply-Accumulate]

The Legacy Integer Multiply-Accumulate instructions with Rc=1 set the first three bits of CR Field 0 based on the 32-bit result, as described in Section 3.3.7, "Other Fixed-Point Instructions".

The XO-form Legacy Integer Multiply-Accumulate instructions set SO and OV when OE=1 to reflect overflow of the 32-bit result.

Programming Note
Notice that CR Field 0 may not reflect the "true" (infinitely precise) result if overflow occurs.
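The instructions of this chapter come in pairs that differ only in how a sum outside the 32-bit range is handled: the modulo form keeps the low-order 32 bits, while the saturate form clamps to the nearest representable value. The sketch below illustrates this for the signed cross-halfword pair (macchw and macchws), which multiplies bits 48:63 of RA by bits 32:47 of RB and adds the word in bits 32:63 of RT. The function name, the `saturate` parameter, and the returned overflow flag are assumptions made for illustration; the flag stands for the condition that the OE=1 forms record in OV and SO, and CR0 updates are not modelled.

```python
from typing import Tuple

INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def _s16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed halfword."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x


def _s32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed word."""
    x &= 0xFFFF_FFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


def macchw_sketch(rt: int, ra: int, rb: int, saturate: bool = False) -> Tuple[int, bool]:
    """Return (new bits 32:63 of RT, overflow) for macchw / macchws."""
    a = _s16(ra)           # (RA)48:63: low halfword of the low word
    b = _s16(rb >> 16)     # (RB)32:47: high halfword of the low word
    total = a * b + _s32(rt)            # exact sum (the RTL's 33-bit temp)
    overflow = not (INT32_MIN <= total <= INT32_MAX)
    if saturate:
        total = max(INT32_MIN, min(total, INT32_MAX))
    return total & 0xFFFF_FFFF, overflow


# 0x7FFF_FFFF + 1*1 wraps in the modulo form and clamps in the saturate form.
print(macchw_sketch(0x7FFF_FFFF, 0x0000_0001, 0x0001_0000))
print(macchw_sketch(0x7FFF_FFFF, 0x0000_0001, 0x0001_0000, saturate=True))
```

The unsigned pairs follow the same pattern with zero-extended halfwords and a single upper bound of 2³²-1.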
Multiply Accumulate Cross Halfword to Word Modulo Signed XO-form

macchw   RT,RA,RB (OE=0 Rc=0)
macchw.  RT,RA,RB (OE=0 Rc=1)
macchwo  RT,RA,RB (OE=1 Rc=0)
macchwo. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   172   Rc
    0   6    11   16   21   22    31

    prod0:31 ← (RA)48:63 ×si (RB)32:47
    temp0:32 ← prod0:31 + (RT)32:63
    RT32:63 ← temp1:32
    RT0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Cross Halfword to Word Saturate Signed XO-form

macchws   RT,RA,RB (OE=0 Rc=0)
macchws.  RT,RA,RB (OE=0 Rc=1)
macchwso  RT,RA,RB (OE=1 Rc=0)
macchwso. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   236   Rc
    0   6    11   16   21   22    31

    prod0:31 ← (RA)48:63 ×si (RB)32:47
    temp0:32 ← prod0:31 + RT32:63
    if temp < -2³¹ then RT32:63 ← 0x8000_0000
    else if temp > 2³¹-1 then RT32:63 ← 0x7FFF_FFFF
    else RT32:63 ← temp1:32
    RT0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

If the sum is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the sum is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Cross Halfword to Word Modulo Unsigned XO-form

macchwu   RT,RA,RB (OE=0 Rc=0)
macchwu.  RT,RA,RB (OE=0 Rc=1)
macchwuo  RT,RA,RB (OE=1 Rc=0)
macchwuo. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   140   Rc
    0   6    11   16   21   22    31

    prod0:31 ← (RA)48:63 ×ui (RB)32:47
    temp0:32 ← prod0:31 + (RT)32:63
    RT ← temp1:32

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Cross Halfword to Word Saturate Unsigned XO-form

macchwsu   RT,RA,RB (OE=0 Rc=0)
macchwsu.  RT,RA,RB (OE=0 Rc=1)
macchwsuo  RT,RA,RB (OE=1 Rc=0)
macchwsuo. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   204   Rc
    0   6    11   16   21   22    31

    prod0:31 ← (RA)48:63 ×ui (RB)32:47
    temp0:32 ← prod0:31 + (RT)32:63
    if temp > 2³²-1 then RT ← 0xFFFF_FFFF
    else RT ← temp1:32

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

If the sum is greater than 2³²-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate High Halfword to Word Modulo Signed XO-form

machhw   RT,RA,RB (OE=0 Rc=0)
machhw.  RT,RA,RB (OE=0 Rc=1)
machhwo  RT,RA,RB (OE=1 Rc=0)
machhwo. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   44   Rc
    0   6    11   16   21   22   31

    prod0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:32 ← prod0:31 + (RT)32:63
    RT32:63 ← temp1:32
    RT0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate High Halfword to Word Saturate Signed XO-form

machhws   RT,RA,RB (OE=0 Rc=0)
machhws.  RT,RA,RB (OE=0 Rc=1)
machhwso  RT,RA,RB (OE=1 Rc=0)
machhwso. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   108   Rc
    0   6    11   16   21   22    31

    prod0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:32 ← prod0:31 + (RT)32:63
    if temp < -2³¹ then RT32:63 ← 0x8000_0000
    else if temp > 2³¹-1 then RT32:63 ← 0x7FFF_FFFF
    else RT32:63 ← temp1:32
    RT0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

If the sum is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the sum is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate High Halfword to Word Modulo Unsigned XO-form

machhwu   RT,RA,RB (OE=0 Rc=0)
machhwu.  RT,RA,RB (OE=0 Rc=1)
machhwuo  RT,RA,RB (OE=1 Rc=0)
machhwuo. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   12   Rc
    0   6    11   16   21   22   31

    prod0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:32 ← prod0:31 + (RT)32:63
    RT32:63 ← temp1:32
    RT0:31 ← undefined

The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate High Halfword to Word Saturate Unsigned XO-form

machhwsu   RT,RA,RB (OE=0 Rc=0)
machhwsu.  RT,RA,RB (OE=0 Rc=1)
machhwsuo  RT,RA,RB (OE=1 Rc=0)
machhwsuo. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   76   Rc
    0   6    11   16   21   22   31

    prod0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:32 ← prod0:31 + (RT)32:63
    if temp > 2³²-1 then RT ← 0xFFFF_FFFF
    else RT ← temp1:32

The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

If the sum is greater than 2³²-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Low Halfword to Word Modulo Signed XO-form

maclhw   RT,RA,RB (OE=0 Rc=0)
maclhw.  RT,RA,RB (OE=0 Rc=1)
maclhwo  RT,RA,RB (OE=1 Rc=0)
maclhwo. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   428   Rc
    0   6    11   16   21   22    31

    prod0:31 ← (RA)48:63 ×si (RB)48:63
    temp0:32 ← prod0:31 + (RT)32:63
    RT32:63 ← temp1:32
    RT0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Low Halfword to Word Saturate Signed XO-form

maclhws   RT,RA,RB (OE=0 Rc=0)
maclhws.  RT,RA,RB (OE=0 Rc=1)
maclhwso  RT,RA,RB (OE=1 Rc=0)
maclhwso. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   492   Rc
    0   6    11   16   21   22    31

    prod0:31 ← (RA)48:63 ×si (RB)48:63
    temp0:32 ← prod0:31 + (RT)32:63
    if temp < -2³¹ then RT32:63 ← 0x8000_0000
    else if temp > 2³¹-1 then RT32:63 ← 0x7FFF_FFFF
    else RT32:63 ← temp1:32
    RT0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

If the sum is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the sum is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Low Halfword to Word Modulo Unsigned XO-form

maclhwu   RT,RA,RB (OE=0 Rc=0)
maclhwu.  RT,RA,RB (OE=0 Rc=1)
maclhwuo  RT,RA,RB (OE=1 Rc=0)
maclhwuo. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   396   Rc
    0   6    11   16   21   22    31

    prod0:31 ← (RA)48:63 ×ui (RB)48:63
    temp0:32 ← prod0:31 + (RT)32:63
    RT32:63 ← temp1:32
    RT0:31 ← undefined

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Low Halfword to Word Saturate Unsigned XO-form

maclhwsu   RT,RA,RB (OE=0 Rc=0)
maclhwsu.  RT,RA,RB (OE=0 Rc=1)
maclhwsuo  RT,RA,RB (OE=1 Rc=0)
maclhwsuo. RT,RA,RB (OE=1 Rc=1)

    4   RT   RA   RB   OE   460   Rc
    0   6    11   16   21   22    31

    prod0:31 ← (RA)48:63 ×ui (RB)48:63
    temp0:32 ← prod0:31 + (RT)32:63
    if temp > 2³²-1 then RT ← 0xFFFF_FFFF
    else RT ← temp1:32

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

If the sum is greater than 2³²-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Cross Halfword to Word Signed X-form

mulchw  RT,RA,RB (Rc=0)
mulchw. RT,RA,RB (Rc=1)

    4   RT   RA   RB   168   Rc
    0   6    11   16   21    31

    RT32:63 ← (RA)48:63 ×si (RB)32:47
    RT0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB and the signed-integer word result is placed into bits 32:63 of register RT.

Multiply Cross Halfword to Word Unsigned X-form

mulchwu  RT,RA,RB (Rc=0)
mulchwu. RT,RA,RB (Rc=1)

    4   RT   RA   RB   136   Rc
    0   6    11   16   21    31

    RT32:63 ← (RA)48:63 ×ui (RB)32:47
    RT0:31 ← undefined

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB and the unsigned-integer word
result is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) 598 Power ISATM Book I Version 2.06 Multiply High Halfword to Word Signed Multiply High Halfword to Word Unsigned X-form X-form mulhhw RT,RA,RB (Rc=0) mulhhwu RT,RA,RB (Rc=0) mulhhw. RT,RA,RB (Rc=1) mulhhwu. RT,RA,RB (Rc=1) 4 RT RA RB 40 Rc 4 RT RA RB 8 Rc 0 6 11 16 21 31 0 6 11 16 21 31 RT32:63 (RA)32:47 ×si (RB)32:47 RT32:63 (RA)32:47 ×ui (RB)32:47 RT0:31 undefined RT0:31 undefined The signed-integer halfword in bits 32:47 of register RA The unsigned-integer halfword in bits 32:47 of register is multiplied by the signed-integer halfword in bits 32:47 RA is multiplied by the unsigned-integer halfword in bits of register RB and the signed-integer word result is 32:47 of register RB and the unsigned-integer word placed into bits 32:63 of register RT. result is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) Multiply Low Halfword to Word Signed Multiply Low Halfword to Word Unsigned X-form X-form mullhw RT,RA,RB (Rc=0) mullhwu RT,RA,RB (Rc=0) mullhw. RT,RA,RB (Rc=1) mullhwu. RT,RA,RB (Rc=1) 4 RT RA RB 424 Rc 4 RT RA RB 392 Rc 0 6 11 16 21 31 0 6 11 16 21 31 RT32:63 (RA)48:63 ×si (RB)48:63 RT32:63 (RA)48:63 ×ui (RB)48:63 RT0:31 undefined RT0:31 undefined The signed-integer halfword in bits 48:63 of register RA The unsigned-integer halfword in bits 48:63 of register is multiplied by the signed-integer halfword in bits 48:63 RA is multiplied by the unsigned-integer halfword in bits of register RB and the signed-integer word result is 48:63 of register RB and the unsigned-integer word placed into bits 32:63 of register RT. 
result is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) Chapter 11. Legacy Integer Multiply-Accumulate Instructions 599 Version 2.06 Negative Multiply Accumulate Cross Negative Multiply Accumulate Cross Halfword to Word Modulo Signed Halfword to Word Saturate Signed XO-form XO-form nmacchw RT,RA,RB (OE=0 Rc=0) nmacchws RT,RA,RB (OE=0 Rc=0) nmacchw. RT,RA,RB (OE=0 Rc=1) nmacchws. RT,RA,RB (OE=0 Rc=1) nmacchwo RT,RA,RB (OE=1 Rc=0) nmacchwso RT,RA,RB (OE=1 Rc=0) nmacchwo. RT,RA,RB (OE=1 Rc=1) nmacchwso. RT,RA,RB (OE=1 Rc=1) 4 RT RA RB OE 174 Rc 4 RT RA RB OE 238 Rc 0 6 11 16 21 22 31 0 6 11 16 21 22 31 prod0:31 (RA)48:63 ×si (RB)32:47 prod0:31 (RA)48:63 ×si (RB)32:47 temp0:32 (RT)32:63 -si prod0:31 temp0:32 (RT)32:63 -si prod0:31 RT32:63 temp1:32 if temp < -231 then RT32:63 0x8000_0000 RT0:31 undefined else if temp > 231-1 then RT32:63 0x7FFF_FFFF else RT32:63 temp1:32 The signed-integer halfword in bits 48:63 of register RA RT0:31 undefined is multiplied by the signed-integer halfword in bits 32:47 of register RB. The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 The 32-bit signed-integer product is subtracted from of register RB. the signed-integer word in bits 32:63 of register RT. The 32-bit signed-integer product is subtracted from The low-order 32 bits of the difference are placed into the signed-integer word in bits 32:63 of register RT. bits 32:63 of register RT. If the difference is less than -231, then the value The contents of bits 0:31 of register RT are undefined. 0x8000_0000 is placed into bits 32:63 of register RT. Special Registers Altered: If the difference is greater than 231-1, then the value SO OV (if OE=1) 0x7FFF_FFFF is placed into bits 32:63 of register RT. 
CR0 (if Rc=1) Otherwise, the difference is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV (if OE=1) CR0 (if Rc=1) 600 Power ISATM Book I Version 2.06 Negative Multiply Accumulate High Negative Multiply Accumulate High Halfword to Word Modulo Signed Halfword to Word Saturate Signed XO-form XO-form nmachhw RT,RA,RB (OE=0 Rc=0) nmachhws RT,RA,RB (OE=0 Rc=0) nmachhw. RT,RA,RB (OE=0 Rc=1) nmachhws. RT,RA,RB (OE=0 Rc=1) nmachhwo RT,RA,RB (OE=1 Rc=0) nmachhwso RT,RA,RB (OE=1 Rc=0) nmachhwo. RT,RA,RB (OE=1 Rc=1) nmachhwso. RT,RA,RB (OE=1 Rc=1) 4 RT RA RB OE 46 Rc 4 RT RA RB OE 110 Rc 0 6 11 16 21 22 31 0 6 11 16 21 22 31 prod0:31 (RA)32:47 ×si (RB)32:47 prod0:31 (RA)32:47 ×si (RB)32:47 temp0:32 (RT)32:63 -si prod0:31 temp0:32 (RT)32:63 -si prod0:31 RT32:63 temp1:32 if temp < -231 then RT32:63 0x8000_0000 RT0:31 undefined else if temp > 231-1 then RT32:63 0x7FFF_FFFF else RT32:63 temp1:32 The signed-integer halfword in bits 32:47 of register RA RT0:31 undefined is multiplied by the signed-integer halfword in bits 32:47 of register RB. The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 The 32-bit signed-integer product is subtracted from of register RB. the signed-integer word in bits 32:63 of register RT. The 32-bit signed-integer product is subtracted from The low-order 32 bits of the difference are placed into the signed-integer word in bits 32:63 of register RT. bits 32:63 of register RT. If the difference is less than -231, then the value The contents of bits 0:31 of register RT are undefined. 0x8000_0000 is placed into bits 32:63 of register RT. Special Registers Altered: If the difference is greater than 231-1, then the value SO OV (if OE=1) 0x7FFF_FFFF is placed into bits 32:63 of register RT. CR0 (if Rc=1) Otherwise, the difference is placed into bits 32:63 of register RT. 
The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV (if OE=1) CR0 (if Rc=1) Chapter 11. Legacy Integer Multiply-Accumulate Instructions 601 Version 2.06 Negative Multiply Accumulate Low Negative Multiply Accumulate Low Halfword to Word Modulo Signed Halfword to Word Saturate Signed XO-form XO-form nmaclhw RT,RA,RB (OE=0 Rc=0) nmaclhws RT,RA,RB (OE=0 Rc=0) nmaclhw. RT,RA,RB (OE=0 Rc=1) nmaclhws. RT,RA,RB (OE=0 Rc=1) nmaclhwo RT,RA,RB (OE=1 Rc=0) nmaclhwso RT,RA,RB (OE=1 Rc=0) nmaclhwo. RT,RA,RB (OE=1 Rc=1) nmaclhwso. RT,RA,RB (OE=1 Rc=1) 4 RT RA RB OE 430 Rc 4 RT RA RB OE 494 Rc 0 6 11 16 21 22 31 0 6 11 16 21 22 31 prod0:31 (RA)48:63 ×si (RB)48:63 prod0:31 (RA)48:63 ×si (RB)48:63 temp0:32 (RT)32:63 -si prod0:31 temp0:32 (RT)32:63 -si prod0:31 RT32:63 temp1:32 if temp < -231 then RT32:63 0x8000_0000 RT0:31 undefined else if temp > 231-1 then RT32:63 0x7FFF_FFFF else RT32:63 temp1:32 The signed-integer halfword in bits 48:63 of register RA RT0:31 undefined is multiplied by the signed-integer halfword in bits 48:63 of register RB. The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 The 32-bit signed-integer product is subtracted from of register RB. the signed-integer word in bits 32:63 of register RT. The 32-bit signed-integer product is subtracted from The low-order 32 bits of the difference are placed into the signed-integer word in bits 32:63 of register RT. bits 32:63 of register RT. If the difference is less than -231, then the value The contents of bits 0:31 of register RT are undefined. 0x8000_0000 is placed into bits 32:63 of register RT. Special Registers Altered: If the difference is greater than 231-1, then the value SO OV (if OE=1) 0x7FFF_FFFF is placed into bits 32:63 of register RT. CR0 (if Rc=1) Otherwise, the difference is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. 
Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)


Appendix A. Suggested Floating-Point Models [Category: Floating-Point]

A.1 Floating-Point Round to Single-Precision Model

The following describes algorithmically the operation of the Floating Round to Single-Precision instruction.

If (FRB)1:11 < 897 and (FRB)1:63 > 0 then
   Do
      If FPSCRUE = 0 then goto Disabled Exponent Underflow
      If FPSCRUE = 1 then goto Enabled Exponent Underflow
   End

If (FRB)1:11 > 1150 and (FRB)1:11 < 2047 then
   Do
      If FPSCROE = 0 then goto Disabled Exponent Overflow
      If FPSCROE = 1 then goto Enabled Exponent Overflow
   End

If (FRB)1:11 > 896 and (FRB)1:11 < 1151 then goto Normal Operand

If (FRB)1:63 = 0 then goto Zero Operand

If (FRB)1:11 = 2047 then
   Do
      If (FRB)12:63 = 0 then goto Infinity Operand
      If (FRB)12 = 1 then goto QNaN Operand
      If (FRB)12 = 0 and (FRB)13:63 > 0 then goto SNaN Operand
   End

Disabled Exponent Underflow:
   sign ← (FRB)0
   If (FRB)1:11 = 0 then
      Do
         exp ← -1022
         frac0:52 ← 0b0 || (FRB)12:63
      End
   If (FRB)1:11 > 0 then
      Do
         exp ← (FRB)1:11 - 1023
         frac0:52 ← 0b1 || (FRB)12:63
      End
   Denormalize operand:
      G || R || X ← 0b000
      Do while exp < -126
         exp ← exp + 1
         frac0:52 || G || R || X ← 0b0 || frac0:52 || G || (R | X)
      End
   FPSCRUX ← (frac24:52 || G || R || X) > 0
   Round Single(sign,exp,frac0:52,G,R,X)
   FPSCRXX ← FPSCRXX | FPSCRFI
   If frac0:52 = 0 then
      Do
         FRT0 ← sign
         FRT1:63 ← 0
         If sign = 0 then FPSCRFPRF ← "+ zero"
         If sign = 1 then FPSCRFPRF ← "- zero"
      End
   If frac0:52 > 0 then
      Do
         If frac0 = 1 then
            Do
               If sign = 0 then FPSCRFPRF ← "+ normal number"
               If sign = 1 then FPSCRFPRF ← "- normal number"
            End
         If frac0 = 0 then
            Do
               If sign = 0 then FPSCRFPRF ← "+ denormalized number"
               If sign = 1 then FPSCRFPRF ← "- denormalized number"
            End
         Normalize operand:
            Do while frac0 = 0
               exp ← exp-1
               frac0:52 ← frac1:52 || 0b0
            End
         FRT0 ← sign
         FRT1:11 ← exp + 1023
         FRT12:63 ← frac1:52
      End
   Done

Enabled Exponent Underflow:
   FPSCRUX ← 1
   sign ← (FRB)0
   If (FRB)1:11 = 0 then
      Do
         exp ← -1022
         frac0:52 ← 0b0 || (FRB)12:63
      End
   If (FRB)1:11 > 0 then
      Do
         exp ← (FRB)1:11 - 1023
         frac0:52 ← 0b1 || (FRB)12:63
      End
   Normalize operand:
      Do while frac0 = 0
         exp ← exp - 1
         frac0:52 ← frac1:52 || 0b0
      End
   Round Single(sign,exp,frac0:52,0,0,0)
   FPSCRXX ← FPSCRXX | FPSCRFI
   exp ← exp + 192
   FRT0 ← sign
   FRT1:11 ← exp + 1023
   FRT12:63 ← frac1:52
   If sign = 0 then FPSCRFPRF ← "+ normal number"
   If sign = 1 then FPSCRFPRF ← "- normal number"
   Done

Disabled Exponent Overflow:
   FPSCROX ← 1
   If FPSCRRN = 0b00 then   /* Round to Nearest */
      Do
         If (FRB)0 = 0 then FRT ← 0x7FF0_0000_0000_0000
         If (FRB)0 = 1 then FRT ← 0xFFF0_0000_0000_0000
         If (FRB)0 = 0 then FPSCRFPRF ← "+ infinity"
         If (FRB)0 = 1 then FPSCRFPRF ← "- infinity"
      End
   If FPSCRRN = 0b01 then   /* Round toward Zero */
      Do
         If (FRB)0 = 0 then FRT ← 0x47EF_FFFF_E000_0000
         If (FRB)0 = 1 then FRT ← 0xC7EF_FFFF_E000_0000
         If (FRB)0 = 0 then FPSCRFPRF ← "+ normal number"
         If (FRB)0 = 1 then FPSCRFPRF ← "- normal number"
      End
   If FPSCRRN = 0b10 then   /* Round toward +Infinity */
      Do
         If (FRB)0 = 0 then FRT ← 0x7FF0_0000_0000_0000
         If (FRB)0 = 1 then FRT ← 0xC7EF_FFFF_E000_0000
         If (FRB)0 = 0 then FPSCRFPRF ← "+ infinity"
         If (FRB)0 = 1 then FPSCRFPRF ← "- normal number"
      End
   If FPSCRRN = 0b11 then   /* Round toward -Infinity */
      Do
         If (FRB)0 = 0 then FRT ← 0x47EF_FFFF_E000_0000
         If (FRB)0 = 1 then FRT ← 0xFFF0_0000_0000_0000
         If (FRB)0 = 0 then FPSCRFPRF ← "+ normal number"
         If (FRB)0 = 1 then FPSCRFPRF ← "- infinity"
      End
   FPSCRFR ← undefined
   FPSCRFI ← 1
   FPSCRXX ← 1
   Done

Enabled Exponent Overflow:
   sign ← (FRB)0
   exp ← (FRB)1:11 - 1023
   frac0:52 ← 0b1 || (FRB)12:63
   Round Single(sign,exp,frac0:52,0,0,0)
   FPSCRXX ← FPSCRXX | FPSCRFI
Enabled Overflow:
   FPSCROX ← 1
   exp ← exp - 192
   FRT0 ← sign
   FRT1:11 ← exp + 1023
   FRT12:63 ← frac1:52
   If sign = 0 then FPSCRFPRF ← "+ normal number"
   If sign = 1 then FPSCRFPRF ← "- normal number"
   Done

Zero Operand:
   FRT ← (FRB)
   If (FRB)0 = 0 then FPSCRFPRF ← "+ zero"
   If (FRB)0 = 1 then FPSCRFPRF ← "- zero"
   FPSCRFR FI ← 0b00
   Done

Infinity Operand:
   FRT ← (FRB)
   If (FRB)0 = 0 then FPSCRFPRF ← "+ infinity"
   If (FRB)0 = 1 then FPSCRFPRF ← "- infinity"
   FPSCRFR FI ← 0b00
   Done

QNaN Operand:
   FRT ← (FRB)0:34 || ²⁹0
   FPSCRFPRF ← "QNaN"
   FPSCRFR FI ← 0b00
   Done

SNaN Operand:
   FPSCRVXSNAN ← 1
   If FPSCRVE = 0 then
      Do
         FRT0:11 ← (FRB)0:11
         FRT12 ← 1
         FRT13:63 ← (FRB)13:34 || ²⁹0
         FPSCRFPRF ← "QNaN"
      End
   FPSCRFR FI ← 0b00
   Done

Normal Operand:
   sign ← (FRB)0
   exp ← (FRB)1:11 - 1023
   frac0:52 ← 0b1 || (FRB)12:63
   Round Single(sign,exp,frac0:52,0,0,0)
   FPSCRXX ← FPSCRXX | FPSCRFI
   If exp > 127 and FPSCROE = 0 then go to Disabled Exponent Overflow
   If exp > 127 and FPSCROE = 1 then go to Enabled Overflow
   FRT0 ← sign
   FRT1:11 ← exp + 1023
   FRT12:63 ← frac1:52
   If sign = 0 then FPSCRFPRF ← "+ normal number"
   If sign = 1 then FPSCRFPRF ← "- normal number"
   Done

Round Single(sign,exp,frac0:52,G,R,X):
   inc ← 0
   lsb ← frac23
   gbit ← frac24
   rbit ← frac25
   xbit ← (frac26:52||G||R||X)≠0
   If FPSCRRN = 0b00 then   /* Round to Nearest */
      Do   /* comparisons ignore u bits */
         If sign || lsb || gbit || rbit || xbit = 0bu11uu then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0bu011u then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0bu01u1 then inc ← 1
      End
   If FPSCRRN = 0b10 then   /* Round toward + Infinity */
      Do   /* comparisons ignore u bits */
         If sign || lsb || gbit || rbit || xbit = 0b0u1uu then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b0uu1u then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
      End
   If FPSCRRN = 0b11 then   /* Round toward - Infinity */
      Do   /* comparisons ignore u bits */
         If sign || lsb || gbit || rbit || xbit = 0b1u1uu then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b1uu1u then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
      End
   frac0:23 ← frac0:23 + inc
   If carry_out = 1 then
      Do
         frac0:23 ← 0b1 || frac0:22
         exp ← exp + 1
      End
   frac24:52 ← ²⁹0
   FPSCRFR ← inc
   FPSCRFI ← gbit | rbit | xbit
   Return


A.2 Floating-Point Convert to Integer Model

The following describes algorithmically the operation of the Floating Convert To Integer instructions.

if Floating Convert To Integer Word then do
   round_mode ← FPSCRRN
   tgt_precision ← "32-bit signed integer"
end
if Floating Convert To Integer Word Unsigned then do
   round_mode ← FPSCRRN
   tgt_precision ← "32-bit unsigned integer"
end
if Floating Convert To Integer Word with round toward Zero then do
   round_mode ← 0b01
   tgt_precision ← "32-bit signed integer"
end
if Floating Convert To Integer Word Unsigned with round toward Zero then do
   round_mode ← 0b01
   tgt_precision ← "32-bit unsigned integer"
end
if Floating Convert To Integer Doubleword then do
   round_mode ← FPSCRRN
   tgt_precision ← "64-bit signed integer"
end
if Floating Convert To Integer Doubleword Unsigned then do
   round_mode ← FPSCRRN
   tgt_precision ← "64-bit unsigned integer"
end
if Floating Convert To Integer Doubleword with round toward Zero then do
   round_mode ← 0b01
   tgt_precision ← "64-bit signed integer"
end
if Floating Convert To Integer Doubleword Unsigned with round toward Zero then do
   round_mode ← 0b01
   tgt_precision ← "64-bit unsigned integer"
end

sign ← (FRB)0
if (FRB)1:11 = 2047 and (FRB)12:63 = 0 then goto Infinity Operand
if (FRB)1:11 = 2047 and (FRB)12 = 0 then goto SNaN Operand
if (FRB)1:11 = 2047 and (FRB)12 = 1 then goto QNaN Operand
if (FRB)1:11 > 1086 then goto Large Operand

if (FRB)1:11 > 0 then exp ← (FRB)1:11 - 1023   /* exp - bias */
if (FRB)1:11 = 0 then exp ← -1022
if (FRB)1:11 > 0 then frac0:64 ← 0b01 || (FRB)12:63 || ¹¹0   /* normal */
if (FRB)1:11 = 0 then frac0:64 ← 0b00 || (FRB)12:63 || ¹¹0   /* denormal */

gbit || rbit || xbit ← 0b000
do i=1,63-exp   /* do the loop 0 times if exp = 63 */
   frac0:64 || gbit || rbit || xbit ← 0b0 || frac0:64 || gbit || (rbit | xbit)
end

Round Integer( sign, frac0:64, gbit, rbit, xbit, round_mode )

if sign = 1 then frac0:64 ← ¬frac0:64 + 1   /* needed leading 0 for -2⁶⁴<(FRB)<-2⁶³ */

if tgt_precision = "32-bit signed integer" and frac0:64 > 2³¹-1 then goto Large Operand
if tgt_precision = "64-bit signed integer" and frac0:64 > 2⁶³-1 then goto Large Operand
if tgt_precision = "32-bit signed integer" and frac0:64 < -2³¹ then goto Large Operand
if tgt_precision = "64-bit signed integer" and frac0:64 < -2⁶³ then goto Large Operand
if tgt_precision = "32-bit unsigned integer" & frac0:64 > 2³²-1 then goto Large Operand
if tgt_precision = "64-bit unsigned integer" & frac0:64 > 2⁶⁴-1 then goto Large Operand
if tgt_precision = "32-bit unsigned integer" & frac0:64 < 0 then goto Large Operand
if tgt_precision = "64-bit unsigned integer" & frac0:64 < 0 then goto Large Operand

FPSCRXX ← FPSCRXX | FPSCRFI

if tgt_precision = "32-bit signed integer" then FRT ← 0xUUUU_UUUU || frac33:64
if tgt_precision = "32-bit unsigned integer" then FRT ← 0xUUUU_UUUU || frac33:64
if tgt_precision = "64-bit signed integer" then FRT ← frac1:64
if tgt_precision = "64-bit unsigned integer" then FRT ← frac1:64
FPSCRFPRF ← 0bUUUUU
done

Round Integer( sign, frac0:64, gbit, rbit, xbit, round_mode ):
   inc ← 0
   if round_mode = 0b00 then do   /* Round to Nearest */
      if sign || frac64 || gbit || rbit || xbit = 0bU11UU then inc ← 1
      if sign || frac64 || gbit || rbit || xbit = 0bU011U then inc ← 1
      if sign || frac64 || gbit || rbit || xbit = 0bU01U1 then inc ← 1
   end
   if round_mode = 0b10 then do   /* Round toward +Infinity */
      if sign || frac64 || gbit || rbit || xbit = 0b0U1UU then inc ← 1
      if sign || frac64 || gbit || rbit || xbit = 0b0UU1U then inc ← 1
      if sign || frac64 || gbit || rbit || xbit = 0b0UUU1 then inc ← 1
   end
   if round_mode = 0b11 then do   /* Round toward -Infinity */
      if sign || frac64 || gbit || rbit || xbit = 0b1U1UU then inc ← 1
      if sign || frac64 || gbit || rbit || xbit = 0b1UU1U then inc ← 1
      if sign || frac64 || gbit || rbit || xbit = 0b1UUU1 then inc ← 1
   end
   frac0:64 ← frac0:64 + inc
   FPSCRFR ← inc
   FPSCRFI ← gbit | rbit | xbit
   return

Infinity Operand:
   FPSCRFR ← 0b0
   FPSCRFI ← 0b0
   FPSCRVXCVI ← 0b1
   if FPSCRVE = 0 then do
      if tgt_precision = "32-bit signed integer" then do
         if sign=0 then FRT ← 0xUUUU_UUUU_7FFF_FFFF
         if sign=1 then FRT ← 0xUUUU_UUUU_8000_0000
      end
      else if tgt_precision = "32-bit unsigned integer" then do
         if sign=0 then FRT ← 0xUUUU_UUUU_FFFF_FFFF
         if sign=1 then FRT ← 0xUUUU_UUUU_0000_0000
      end
      else if tgt_precision = "64-bit signed integer" then do
         if sign=0 then FRT ← 0x7FFF_FFFF_FFFF_FFFF
         if sign=1 then FRT ← 0x8000_0000_0000_0000
      end
      else if tgt_precision = "64-bit unsigned integer" then do
         if sign=0 then FRT ← 0xFFFF_FFFF_FFFF_FFFF
         if sign=1 then FRT ← 0x0000_0000_0000_0000
      end
      FPSCRFPRF ← 0bUUUUU
   end
   done

SNaN Operand:
   FPSCRFR ← 0b0
   FPSCRFI ← 0b0
   FPSCRVXSNAN ← 0b1
   FPSCRVXCVI ← 0b1
   if FPSCRVE = 0 then do
      if tgt_precision = "32-bit signed integer" then FRT ← 0xUUUU_UUUU_8000_0000
      if tgt_precision = "64-bit signed integer" then FRT ← 0x8000_0000_0000_0000
      if tgt_precision = "32-bit unsigned integer" then FRT ← 0xUUUU_UUUU_0000_0000
      if tgt_precision = "64-bit unsigned integer" then FRT ← 0x0000_0000_0000_0000
      FPSCRFPRF ← 0bUUUUU
   end
   done

QNaN Operand:
   FPSCRFR ← 0b0
   FPSCRFI ← 0b0
   FPSCRVXCVI ← 0b1
   if FPSCRVE = 0 then do
      if tgt_precision = "32-bit signed integer" then FRT ← 0xUUUU_UUUU_8000_0000
      if tgt_precision = "64-bit signed integer" then FRT ← 0x8000_0000_0000_0000
      if tgt_precision = "32-bit unsigned integer" then FRT ← 0xUUUU_UUUU_0000_0000
      if tgt_precision = "64-bit unsigned integer" then FRT ← 0x0000_0000_0000_0000
      FPSCRFPRF ← 0bUUUUU
   end
   done

Large Operand:
   FPSCRFR ← 0b0
   FPSCRFI ← 0b0
   FPSCRVXCVI ← 0b1
   if FPSCRVE = 0 then do
      if tgt_precision = "32-bit signed integer" then do
         if sign = 0 then FRT ← 0xUUUU_UUUU_7FFF_FFFF
         if sign = 1 then FRT ← 0xUUUU_UUUU_8000_0000
      end
      else if tgt_precision = "64-bit signed integer" then do
         if sign = 0 then FRT ← 0x7FFF_FFFF_FFFF_FFFF
         if sign = 1 then FRT ← 0x8000_0000_0000_0000
      end
      else if tgt_precision = "32-bit unsigned integer" then do
         if sign = 0 then FRT ← 0xUUUU_UUUU_FFFF_FFFF
         if sign = 1 then FRT ← 0xUUUU_UUUU_0000_0000
      end
      else if tgt_precision = "64-bit unsigned integer" then do
         if sign = 0 then FRT ← 0xFFFF_FFFF_FFFF_FFFF
         if sign = 1 then FRT ← 0x0000_0000_0000_0000
      end
      FPSCRFPRF ← 0bUUUUU
   end
   done


A.3 Floating-Point Convert from Integer Model

The following describes algorithmically the operation of the Floating Convert From Integer instructions.

if Floating Convert From Integer Doubleword then do
   tgt_precision ← "double-precision"
   sign ← (FRB)0
   exp ← 63
   frac0:63 ← (FRB)
end
if Floating Convert From Integer Doubleword Single then do
   tgt_precision ← "single-precision"
   sign ← (FRB)0
   exp ← 63
   frac0:63 ← (FRB)
end
if Floating Convert From Integer Doubleword Unsigned then do
   tgt_precision ← "double-precision"
   sign ← 0
   exp ← 63
   frac0:63 ← (FRB)
end
if Floating Convert From Integer Doubleword Unsigned Single then do
   tgt_precision ← "single-precision"
   sign ← 0
   exp ← 63
   frac0:63 ← (FRB)
end

if frac0:63 = 0 then go to Zero Operand
if sign = 1 then frac0:63 ← ¬frac0:63 + 1

/* do the loop 0 times if (FRB) = max negative 64-bit integer or */
/*                     if (FRB) = max unsigned 64-bit integer    */
do while frac0 = 0
   frac0:63 ← frac1:63 || 0b0
   exp ← exp - 1
end

Round Float( sign, exp, frac0:63, RN )

if sign = 0 then FPSCRFPRF ← "+normal number"
if sign = 1 then FPSCRFPRF ← "-normal number"
FRT0 ← sign
FRT1:11 ← exp + 1023   /* exp + bias */
FRT12:63 ← frac1:52
done

Zero Operand:
   FPSCRFR ← 0b00
   FPSCRFI ← 0b00
   FPSCRFPRF ← "+ zero"
   FRT ← 0x0000_0000_0000_0000
   done

Round Float( sign, exp, frac0:63, round_mode ):
   inc ← 0
   if tgt_precision = "single-precision" then do
      lsb ← frac23
      gbit ← frac24
      rbit ← frac25
      xbit ← frac26:63 > 0
   end
   else do   /* tgt_precision = "double-precision" */
      lsb ← frac52
      gbit ← frac53
      rbit ← frac54
      xbit ← frac55:63 > 0
   end
   if round_mode = 0b00 then do   /* Round to Nearest */
      if sign || lsb || gbit || rbit || xbit = 0bU11UU then inc ← 1
      if sign || lsb || gbit || rbit || xbit = 0bU011U then inc ← 1
      if sign || lsb || gbit || rbit || xbit = 0bU01U1 then inc ← 1
   end
   if round_mode = 0b10 then do   /* Round toward + Infinity */
      if sign || lsb || gbit || rbit || xbit = 0b0U1UU then inc ← 1
      if sign || lsb || gbit || rbit || xbit = 0b0UU1U then inc ← 1
      if sign || lsb || gbit || rbit || xbit = 0b0UUU1 then inc ← 1
   end
   if round_mode = 0b11 then do   /* Round toward - Infinity */
      if sign || lsb || gbit || rbit || xbit = 0b1U1UU then inc ← 1
      if sign || lsb || gbit || rbit || xbit = 0b1UU1U then inc ← 1
      if sign || lsb || gbit || rbit || xbit = 0b1UUU1 then inc ← 1
   end
   if tgt_precision = "single-precision" then
      frac0:23 ← frac0:23 + inc
   else   /* tgt_precision = "double-precision" */
      frac0:52 ← frac0:52 + inc
   if carry_out = 1 then exp ← exp + 1
   FPSCRFR ← inc
   FPSCRFI ← gbit | rbit | xbit
   FPSCRXX ← FPSCRXX | FPSCRFI
   return


A.4 Floating-Point Round to Integer Model

The following describes algorithmically the operation of the Floating Round To Integer instructions.
If (FRB)1:11 = 2047 and (FRB)12:63 = 0, then goto Infinity Operand If (FRB)1:11 = 2047 and (FRB)12 = 0, then goto SNaN Operand If (FRB)1:11 = 2047 and (FRB)12 = 1, then goto QNaN Operand if (FRB)1:63 = 0 then goto Zero Operand If (FRB)1:11 < 1023 then goto Small Operand /* exp < 0; |value| < 1*/ If (FRB)1:11 > 1074 then goto Large Operand /* exp > 51; integral value */ sign (FRB)0 exp (FRB)1:11 - 1023 /* exp - bias */ frac0:52 0b1 || (FRB)12:63 gbit || rbit || xbit 0b000 Do i = 1, 52 - exp frac0:52 || gbit || rbit || xbit 0b0 || frac0:52 || gbit || (rbit | xbit) End Round Integer (sign, frac0:52, gbit, rbit, xbit) Do i = 2, 52 - exp frac0:52 frac1:52 || 0b0 End If frac0 = 1, then exp exp + 1 Else frac0:52 frac1:52 || 0b0 FRT0 sign FRT1:11 exp + 1023 FRT12:63 frac1:52 If (FRT)0 = 0 then FPSCRFPRF "+ normal number" Else FPSCRFPRF "- normal number" FPSCRFR FI 0b00 Done Round Integer(sign, frac0:52, gbit, rbit, xbit): inc 0 If inst = Floating Round to Integer Nearest then /* ties away from zero */ Do /* comparisons ignore u bits */ If sign || frac52 || gbit || rbit || xbit = 0buu1uu then inc 1 End If inst = Floating Round to Integer Plus then Do /* comparisons ignore u bits */ If sign || frac52 || gbit || rbit || xbit = 0b0u1uu then inc 1 If sign || frac52 || gbit || rbit || xbit = 0b0uu1u then inc 1 If sign || frac52 || gbit || rbit || xbit = 0b0uuu1 then inc 1 End If inst = Floating Round to Integer Minus then Do /* comparisons ignore u bits */ If sign || frac52 || gbit || rbit || xbit = 0b1u1uu then inc 1 If sign || frac52 || gbit || rbit || xbit = 0b1uu1u then inc 1 If sign || frac52 || gbit || rbit || xbit = 0b1uuu1 then inc 1 End frac0:52 frac0:52 + inc Return 612 Power ISATM Book I Version 2.06 Infinity Operand: If FRT0 = 0 then FPSCRFPRF "+ normal num- FRT (FRB) ber" If (FRB)0 = 0 then FPSCRFPRF "+ infinity" Else FPSCRFPRF "- normal number" If (FRB)0 = 1 then FPSCRFPRF "- infinity" FPSCRFR FI 0b00 FPSCRFR FI 0b00 Done Done SNaN Operand: FPSCRVXSNAN 1 If FPSCRVE 
= 0 then
  Do
    FRT ← (FRB)
    FRT_12 ← 1
    FPSCR_FPRF ← "QNaN"
  End
FPSCR_FR FI ← 0b00
Done

QNaN Operand:
  FRT ← (FRB)
  FPSCR_FPRF ← "QNaN"
  FPSCR_FR FI ← 0b00
  Done

Zero Operand:
  If (FRB)_0 = 0 then
    Do
      FRT ← 0x0000_0000_0000_0000
      FPSCR_FPRF ← "+ zero"
    End
  Else
    Do
      FRT ← 0x8000_0000_0000_0000
      FPSCR_FPRF ← "- zero"
    End
  FPSCR_FR FI ← 0b00
  Done

Small Operand:
  If inst = Floating Round to Integer Nearest and (FRB)_1:11 < 1022 then goto Zero Operand
  If inst = Floating Round to Integer Toward Zero then goto Zero Operand
  If inst = Floating Round to Integer Plus and (FRB)_0 = 1 then goto Zero Operand
  If inst = Floating Round to Integer Minus and (FRB)_0 = 0 then goto Zero Operand
  If (FRB)_0 = 0 then
    Do
      FRT ← 0x3FF0_0000_0000_0000 /* value = 1.0 */
      FPSCR_FPRF ← "+ normal number"
    End
  Else
    Do
      FRT ← 0xBFF0_0000_0000_0000 /* value = -1.0 */
      FPSCR_FPRF ← "- normal number"
    End
  FPSCR_FR FI ← 0b00
  Done

Large Operand:
  FRT ← (FRB)

Appendix B. Densely Packed Decimal

The trailing significand field of the decimal floating-point data format is encoded using Densely Packed Decimal (DPD). DPD encoding is a compression technique which supports the representation of decimal integers of arbitrary length. Translation operates on three Binary Coded Decimal (BCD) digits at a time compressing the 12 bits into 10 bits with an algorithm that can be applied or reversed using simple Boolean operations. In the following examples, a 3-digit BCD number is represented as (abcd)(efgh)(ijkm), a 10-bit DPD number is represented as (pqr)(stu)(v)(wxy), and the Boolean operations, & (AND), | (OR), and ¬ (NOT) are used.

B.1 BCD-to-DPD Translation

The translation from a 3-digit BCD number to a 10-bit DPD can be performed through the following Boolean operations.

  p = (f & a & i & ¬e) | (j & a & ¬i) | (b & ¬a)
  q = (g & a & i & ¬e) | (k & a & ¬i) | (c & ¬a)
  r = d
  s = (j & ¬a & e & ¬i) | (f & ¬i & ¬e) | (f & ¬a & ¬e) | (e & i)
  t = (k & ¬a & e & ¬i) | (g & ¬i & ¬e) | (g & ¬a & ¬e) | (a & i)
  u = h
  v = a | e | i
  w = (¬e & j & ¬i) | (e & i) | a
  x = (¬a & k & ¬i) | (a & i) | e
  y = m

Alternatively, the following table can be used to perform the translation. The most significant bit of the three BCD digits (left column) is used to select a specific 10-bit encoding (right column) of the DPD.

  aei   pqr stu v wxy
  000   bcd fgh 0 jkm
  001   bcd fgh 1 00m
  010   bcd jkh 1 01m
  011   bcd 10h 1 11m
  100   jkd fgh 1 10m
  101   fgd 01h 1 11m
  110   jkd 00h 1 11m
  111   00d 11h 1 11m

The full translation of a 3-digit BCD number (000 - 999) to a 10-bit DPD is shown in Table 108 on page 617, with the DPD entries shown in hexadecimal format. The BCD number is produced by replacing `_' in the leftmost column with the corresponding digit along the top row. The table is split into two halves, with the right half being a continuation of the left half.

B.2 DPD-to-BCD Translation

The translation from a 10-bit DPD to a 3-digit BCD number can be performed through the following Boolean operations.

  a = (¬s & v & w) | (t & v & w & s) | (v & w & ¬x)
  b = (p & s & x & ¬t) | (p & ¬w) | (p & ¬v)
  c = (q & s & x & ¬t) | (q & ¬w) | (q & ¬v)
  d = r
  e = (v & ¬w & x) | (s & v & w & x) | (¬t & v & x & w)
  f = (p & t & v & w & x & ¬s) | (s & ¬x & v) | (s & ¬v)
  g = (q & t & w & v & x & ¬s) | (t & ¬x & v) | (t & ¬v)
  h = u
  i = (t & v & w & x) | (s & v & w & x) | (v & ¬w & ¬x)
  j = (p & ¬s & ¬t & w & v) | (s & v & ¬w & x) | (p & w & ¬x & v) | (w & ¬v)
  k = (q & ¬s & ¬t & v & w) | (t & v & ¬w & x) | (q & v & w & ¬x) | (x & ¬v)
  m = y

Alternatively, the following table can be used to perform the translation. A combination of five bits in the DPD encoding (leftmost column) are used to specify a translation to the 3-digit BCD encoding. Dashes (-) in the table are don't cares, and can be either one or zero.

  vwxst   abcd efgh ijkm
  0----   0pqr 0stu 0wxy
  100--   0pqr 0stu 100y
  101--   0pqr 100u 0sty
  110--   100r 0stu 0pqy
  11100   100r 100u 0pqy
  11101   100r 0pqu 100y
  11110   0pqr 100u 100y
  11111   100r 100u 100y

The full translation of the 10-bit DPD to a 3-digit BCD number is shown in Table 109 on page 618. The 10-bit DPD index is produced by concatenating the 6-bit value shown in the left column with the 4-bit index along the top row, both represented in hexadecimal. The values in parentheses are non-preferred translations and are explained further in the following section.

B.3 Preferred DPD encoding

Translating from a 3-digit BCD number (1000 numbers) to a 10-bit DPD encoding (1024 combinations) leaves 24 redundant translations. The 24 redundant combinations are evenly assigned to eight BCD numbers and are shown in the following table, with the non-preferred encoding in parentheses.

  DPD Code                          BCD Value
  0x06E (0x16E) (0x26E) (0x36E)     888
  0x06F (0x16F) (0x26F) (0x36F)     889
  0x07E (0x17E) (0x27E) (0x37E)     898
  0x07F (0x17F) (0x27F) (0x37F)     899
  0x0EE (0x1EE) (0x2EE) (0x3EE)     988
  0x0EF (0x1EF) (0x2EF) (0x3EF)     989
  0x0FE (0x1FE) (0x2FE) (0x3FE)     998
  0x0FF (0x1FF) (0x2FF) (0x3FF)     999

The preferred encoding is produced by translating a 3-digit BCD number with the translation table or Boolean operations shown in Section B.1. The redundant DPD encodings are all valid and will be correctly translated to their respective BCD value through the mechanisms provided in Section B.2. For decimal floating-point operations all DPD encodings are recognized as source operands.
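The Boolean operations of Sections B.1 and B.2 can be exercised directly in software. The following Python sketch is illustrative only and is not part of the architecture; the function names bcd_to_dpd and dpd_to_bcd and the helpers _not and _digit_bits are assumptions made for this example. It takes p as the most significant bit of the 10-bit DPD value, matching the (pqr)(stu)(v)(wxy) notation.

```python
def _not(bit: int) -> int:
    """Boolean NOT on a single bit (the ¬ operator of the equations)."""
    return bit ^ 1


def _digit_bits(digit: int) -> tuple:
    """Return the four BCD bits of one decimal digit, most significant first."""
    return tuple((digit >> shift) & 1 for shift in (3, 2, 1, 0))


def bcd_to_dpd(value: int) -> int:
    """Encode a decimal value 0..999 as its preferred 10-bit DPD code (B.1)."""
    if not 0 <= value <= 999:
        raise ValueError("value must be in the range 0..999")
    a, b, c, d = _digit_bits(value // 100)
    e, f, g, h = _digit_bits(value // 10 % 10)
    i, j, k, m = _digit_bits(value % 10)
    n = _not
    p = (f & a & i & n(e)) | (j & a & n(i)) | (b & n(a))
    q = (g & a & i & n(e)) | (k & a & n(i)) | (c & n(a))
    r = d
    s = (j & n(a) & e & n(i)) | (f & n(i) & n(e)) | (f & n(a) & n(e)) | (e & i)
    t = (k & n(a) & e & n(i)) | (g & n(i) & n(e)) | (g & n(a) & n(e)) | (a & i)
    u = h
    v = a | e | i
    w = (n(e) & j & n(i)) | (e & i) | a
    x = (n(a) & k & n(i)) | (a & i) | e
    y = m
    dpd = 0
    for bit in (p, q, r, s, t, u, v, w, x, y):  # p ends up as the top bit
        dpd = (dpd << 1) | bit
    return dpd


def dpd_to_bcd(dpd: int) -> int:
    """Decode any 10-bit DPD code (preferred or not) to 0..999 (B.2)."""
    if not 0 <= dpd <= 0x3FF:
        raise ValueError("dpd must be a 10-bit value")
    p, q, r, s, t, u, v, w, x, y = ((dpd >> sh) & 1 for sh in range(9, -1, -1))
    n = _not
    a = (n(s) & v & w) | (t & v & w & s) | (v & w & n(x))
    b = (p & s & x & n(t)) | (p & n(w)) | (p & n(v))
    c = (q & s & x & n(t)) | (q & n(w)) | (q & n(v))
    d = r
    e = (v & n(w) & x) | (s & v & w & x) | (n(t) & v & x & w)
    f = (p & t & v & w & x & n(s)) | (s & n(x) & v) | (s & n(v))
    g = (q & t & w & v & x & n(s)) | (t & n(x) & v) | (t & n(v))
    h = u
    i = (t & v & w & x) | (s & v & w & x) | (v & n(w) & n(x))
    j = (p & n(s) & n(t) & w & v) | (s & v & n(w) & x) | (p & w & n(x) & v) | (w & n(v))
    k = (q & n(s) & n(t) & v & w) | (t & v & n(w) & x) | (q & v & w & n(x)) | (x & n(v))
    m = y
    hundreds = a * 8 + b * 4 + c * 2 + d
    tens = e * 8 + f * 4 + g * 2 + h
    units = i * 8 + j * 4 + k * 2 + m
    return hundreds * 100 + tens * 10 + units
```

With these definitions, bcd_to_dpd(888) yields 0x06E, the preferred encoding, while dpd_to_bcd accepts the non-preferred code 0x16E and also returns 888, in line with Section B.3.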
Table 108: BCD-to-DPD translation

      0   1   2   3   4   5   6   7   8   9          0   1   2   3   4   5   6   7   8   9
00_  000 001 002 003 004 005 006 007 008 009    50_  280 281 282 283 284 285 286 287 288 289
01_  010 011 012 013 014 015 016 017 018 019    51_  290 291 292 293 294 295 296 297 298 299
02_  020 021 022 023 024 025 026 027 028 029    52_  2A0 2A1 2A2 2A3 2A4 2A5 2A6 2A7 2A8 2A9
03_  030 031 032 033 034 035 036 037 038 039    53_  2B0 2B1 2B2 2B3 2B4 2B5 2B6 2B7 2B8 2B9
04_  040 041 042 043 044 045 046 047 048 049    54_  2C0 2C1 2C2 2C3 2C4 2C5 2C6 2C7 2C8 2C9
05_  050 051 052 053 054 055 056 057 058 059    55_  2D0 2D1 2D2 2D3 2D4 2D5 2D6 2D7 2D8 2D9
06_  060 061 062 063 064 065 066 067 068 069    56_  2E0 2E1 2E2 2E3 2E4 2E5 2E6 2E7 2E8 2E9
07_  070 071 072 073 074 075 076 077 078 079    57_  2F0 2F1 2F2 2F3 2F4 2F5 2F6 2F7 2F8 2F9
08_  00A 00B 02A 02B 04A 04B 06A 06B 04E 04F    58_  28A 28B 2AA 2AB 2CA 2CB 2EA 2EB 2CE 2CF
09_  01A 01B 03A 03B 05A 05B 07A 07B 05E 05F    59_  29A 29B 2BA 2BB 2DA 2DB 2FA 2FB 2DE 2DF
10_  080 081 082 083 084 085 086 087 088 089    60_  300 301 302 303 304 305 306 307 308 309
11_  090 091 092 093 094 095 096 097 098 099    61_  310 311 312 313 314 315 316 317 318 319
12_  0A0 0A1 0A2 0A3 0A4 0A5 0A6 0A7 0A8 0A9    62_  320 321 322 323 324 325 326 327 328 329
13_  0B0 0B1 0B2 0B3 0B4 0B5 0B6 0B7 0B8 0B9    63_  330 331 332 333 334 335 336 337 338 339
14_  0C0 0C1 0C2 0C3 0C4 0C5 0C6 0C7 0C8 0C9    64_  340 341 342 343 344 345 346 347 348 349
15_  0D0 0D1 0D2 0D3 0D4 0D5 0D6 0D7 0D8 0D9    65_  350 351 352 353 354 355 356 357 358 359
16_  0E0 0E1 0E2 0E3 0E4 0E5 0E6 0E7 0E8 0E9    66_  360 361 362 363 364 365 366 367 368 369
17_  0F0 0F1 0F2 0F3 0F4 0F5 0F6 0F7 0F8 0F9    67_  370 371 372 373 374 375 376 377 378 379
18_  08A 08B 0AA 0AB 0CA 0CB 0EA 0EB 0CE 0CF    68_  30A 30B 32A 32B 34A 34B 36A 36B 34E 34F
19_  09A 09B 0BA 0BB 0DA 0DB 0FA 0FB 0DE 0DF    69_  31A 31B 33A 33B 35A 35B 37A 37B 35E 35F
20_  100 101 102 103 104 105 106 107 108 109    70_  380 381 382 383 384 385 386 387 388 389
21_  110 111 112 113 114 115 116 117 118 119    71_  390 391 392 393 394 395 396 397 398 399
22_  120 121 122 123 124 125 126 127 128 129    72_  3A0 3A1 3A2 3A3 3A4 3A5 3A6 3A7 3A8 3A9
23_  130 131 132 133 134 135 136 137 138 139    73_  3B0 3B1 3B2 3B3 3B4 3B5 3B6 3B7 3B8 3B9
24_  140 141 142 143 144 145 146 147 148 149    74_  3C0 3C1 3C2 3C3 3C4 3C5 3C6 3C7 3C8 3C9
25_  150 151 152 153 154 155 156 157 158 159    75_  3D0 3D1 3D2 3D3 3D4 3D5 3D6 3D7 3D8 3D9
26_  160 161 162 163 164 165 166 167 168 169    76_  3E0 3E1 3E2 3E3 3E4 3E5 3E6 3E7 3E8 3E9
27_  170 171 172 173 174 175 176 177 178 179    77_  3F0 3F1 3F2 3F3 3F4 3F5 3F6 3F7 3F8 3F9
28_  10A 10B 12A 12B 14A 14B 16A 16B 14E 14F    78_  38A 38B 3AA 3AB 3CA 3CB 3EA 3EB 3CE 3CF
29_  11A 11B 13A 13B 15A 15B 17A 17B 15E 15F    79_  39A 39B 3BA 3BB 3DA 3DB 3FA 3FB 3DE 3DF
30_  180 181 182 183 184 185 186 187 188 189    80_  00C 00D 10C 10D 20C 20D 30C 30D 02E 02F
31_  190 191 192 193 194 195 196 197 198 199    81_  01C 01D 11C 11D 21C 21D 31C 31D 03E 03F
32_  1A0 1A1 1A2 1A3 1A4 1A5 1A6 1A7 1A8 1A9    82_  02C 02D 12C 12D 22C 22D 32C 32D 12E 12F
33_  1B0 1B1 1B2 1B3 1B4 1B5 1B6 1B7 1B8 1B9    83_  03C 03D 13C 13D 23C 23D 33C 33D 13E 13F
34_  1C0 1C1 1C2 1C3 1C4 1C5 1C6 1C7 1C8 1C9    84_  04C 04D 14C 14D 24C 24D 34C 34D 22E 22F
35_  1D0 1D1 1D2 1D3 1D4 1D5 1D6 1D7 1D8 1D9    85_  05C 05D 15C 15D 25C 25D 35C 35D 23E 23F
36_  1E0 1E1 1E2 1E3 1E4 1E5 1E6 1E7 1E8 1E9    86_  06C 06D 16C 16D 26C 26D 36C 36D 32E 32F
37_  1F0 1F1 1F2 1F3 1F4 1F5 1F6 1F7 1F8 1F9    87_  07C 07D 17C 17D 27C 27D 37C 37D 33E 33F
38_  18A 18B 1AA 1AB 1CA 1CB 1EA 1EB 1CE 1CF    88_  00E 00F 10E 10F 20E 20F 30E 30F 06E 06F
39_  19A 19B 1BA 1BB 1DA 1DB 1FA 1FB 1DE 1DF    89_  01E 01F 11E 11F 21E 21F 31E 31F 07E 07F
40_  200 201 202 203 204 205 206 207 208 209    90_  08C 08D 18C 18D 28C 28D 38C 38D 0AE 0AF
41_  210 211 212 213 214 215 216 217 218 219    91_  09C 09D 19C 19D 29C 29D 39C 39D 0BE 0BF
42_  220 221 222 223 224 225 226 227 228 229    92_  0AC 0AD 1AC 1AD 2AC 2AD 3AC 3AD 1AE 1AF
43_  230 231 232 233 234 235 236 237 238 239    93_  0BC 0BD 1BC 1BD 2BC 2BD 3BC 3BD 1BE 1BF
44_  240 241 242 243 244 245 246 247 248 249    94_  0CC 0CD 1CC 1CD 2CC 2CD 3CC 3CD 2AE 2AF
45_  250 251 252 253 254 255 256 257 258 259    95_  0DC 0DD 1DC 1DD 2DC 2DD 3DC 3DD 2BE 2BF
46_  260 261 262 263 264 265 266 267 268 269    96_  0EC 0ED 1EC 1ED 2EC 2ED 3EC 3ED 3AE 3AF
47_  270 271 272 273 274 275 276 277 278 279    97_  0FC 0FD 1FC 1FD 2FC 2FD 3FC 3FD 3BE 3BF
48_  20A 20B 22A 22B 24A 24B 26A 26B 24E 24F    98_  08E 08F 18E 18F 28E 28F 38E 38F 0EE 0EF
49_  21A 21B 23A 23B 25A 25B 27A 27B 25E 25F    99_  09E 09F 19E 19F 29E 29F 39E 39F 0FE 0FF

Table 109: DPD-to-BCD translation

      0   1   2   3   4   5   6   7   8   9   A   B   C   D   E     F
00_  000 001 002 003 004 005 006 007 008 009 080 081 800 801 880 881
01_  010 011 012 013 014 015 016 017 018 019 090 091 810 811 890 891
02_  020 021 022 023 024 025 026 027 028 029 082 083 820 821 808 809
03_  030 031 032 033 034 035 036 037 038 039 092 093 830 831 818 819
04_  040 041 042 043 044 045 046 047 048 049 084 085 840 841 088 089
05_  050 051 052 053 054 055 056 057 058 059 094 095 850 851 098 099
06_  060 061 062 063 064 065 066 067 068 069 086 087 860 861 888 889
07_  070 071 072 073 074 075 076 077 078 079 096 097 870 871 898 899
08_  100 101 102 103 104 105 106 107 108 109 180 181 900 901 980 981
09_  110 111 112 113 114 115 116 117 118 119 190 191 910 911 990 991
0A_  120 121 122 123 124 125 126 127 128 129 182 183 920 921 908 909
0B_  130 131 132 133 134 135 136 137 138 139 192 193 930 931 918 919
0C_  140 141 142 143 144 145 146 147 148 149 184 185 940 941 188 189
0D_  150 151 152 153 154 155 156 157 158 159 194 195 950 951 198 199
0E_  160 161 162 163 164 165 166 167 168 169 186 187 960 961 988 989
0F_  170 171 172 173 174 175 176 177 178 179 196 197 970 971 998 999
10_  200 201 202 203 204 205 206 207 208 209 280 281 802 803 882 883
11_  210 211 212 213 214 215 216 217 218 219 290 291 812 813 892 893
12_  220 221 222 223 224 225 226 227 228 229 282 283 822 823 828 829
13_  230 231 232 233 234 235 236 237 238 239 292 293 832 833 838 839
14_  240 241 242 243 244 245 246 247 248 249 284 285 842 843 288 289
15_  250 251 252 253 254 255 256 257 258 259 294 295 852 853 298 299
16_  260 261 262 263 264 265 266 267 268 269 286 287 862 863 (888) (889)
17_  270 271 272 273 274 275 276 277 278 279 296 297 872 873 (898) (899)
18_  300 301 302 303 304 305 306 307 308 309 380 381 902 903 982 983
19_  310 311 312 313 314 315 316 317 318 319 390 391 912 913 992 993
1A_  320 321 322 323 324 325 326 327 328 329 382 383 922 923 928 929
1B_  330 331 332 333 334 335 336 337 338 339 392 393 932 933 938 939
1C_  340 341 342 343 344 345 346 347 348 349 384 385 942 943 388 389
1D_  350 351 352 353 354 355 356 357 358 359 394 395 952 953 398 399
1E_  360 361 362 363 364 365 366 367 368 369 386 387 962 963 (988) (989)
1F_  370 371 372 373 374 375 376 377 378 379 396 397 972 973 (998) (999)
20_  400 401 402 403 404 405 406 407 408 409 480 481 804 805 884 885
21_  410 411 412 413 414 415 416 417 418 419 490 491 814 815 894 895
22_  420 421 422 423 424 425 426 427 428 429 482 483 824 825 848 849
23_  430 431 432 433 434 435 436 437 438 439 492 493 834 835 858 859
24_  440 441 442 443 444 445 446 447 448 449 484 485 844 845 488 489
25_  450 451 452 453 454 455 456 457 458 459 494 495 854 855 498 499
26_  460 461 462 463 464 465 466 467 468 469 486 487 864 865 (888) (889)
27_  470 471 472 473 474 475 476 477 478 479 496 497 874 875 (898) (899)
28_  500 501 502 503 504 505 506 507 508 509 580 581 904 905 984 985
29_  510 511 512 513 514 515 516 517 518 519 590 591 914 915 994 995
2A_  520 521 522 523 524 525 526 527 528 529 582 583 924 925 948 949
2B_  530 531 532 533 534 535 536 537 538 539 592 593 934 935 958 959
2C_  540 541 542 543 544 545 546 547 548 549 584 585 944 945 588 589
2D_  550 551 552 553 554 555 556 557 558 559 594 595 954 955 598 599
2E_  560 561 562 563 564 565 566 567 568 569 586 587 964 965 (988) (989)
2F_  570 571 572 573 574 575 576 577 578 579 596 597 974 975 (998) (999)
30_  600 601 602 603 604 605 606 607 608 609 680 681 806 807 886 887
31_  610 611 612 613 614 615 616 617 618 619 690 691 816 817 896 897
32_  620 621 622 623 624 625 626 627 628 629 682 683 826 827 868 869
33_  630 631 632 633 634 635 636 637 638 639 692 693 836 837 878 879
34_  640 641 642 643 644 645 646 647 648 649 684 685 846 847 688 689
35_  650 651 652 653 654 655 656 657 658 659 694 695 856 857 698 699
36_  660 661 662 663 664 665 666 667 668 669 686 687 866 867 (888) (889)
37_  670 671 672 673 674 675 676 677 678 679 696 697 876 877 (898) (899)
38_  700 701 702 703 704 705 706 707 708 709 780 781 906 907 986 987
39_  710 711 712 713 714 715 716 717 718 719 790 791 916 917 996 997
3A_  720 721 722 723 724 725 726 727 728 729 782 783 926 927 968 969
3B_  730 731 732 733 734 735 736 737 738 739 792 793 936 937 978 979
3C_  740 741 742 743 744 745 746 747 748 749 784 785 946 947 788 789
3D_  750 751 752 753 754 755 756 757 758 759 794 795 956 957 798 799
3E_  760 761 762 763 764 765 766 767 768 769 786 787 966 967 (988) (989)
3F_  770 771 772 773 774 775 776 777 778 779 796 797 976 977 (998) (999)

Appendix C. Vector RTL Functions [Category: Vector]

ConvertSPtoSXWsaturate( X, Y )
  sign = X_0
  exp_0:7 = X_1:8
  frac_0:30 = X_9:31 || 0b0000_0000
  if((exp==255)&(frac!=0)) then return(0x0000_0000) // NaN operand
  if((exp==255)&(frac==0)) then do // infinity operand
    VSCR_SAT = 1
    return( (sign==1) ? 0x8000_0000 : 0x7FFF_FFFF )
  if((exp+Y-127)>30) then do // large operand
    VSCR_SAT = 1
    return( (sign==1) ? 0x8000_0000 : 0x7FFF_FFFF )
  if((exp+Y-127)<0) then return(0x0000_0000) // -1.0 < value < 1.0 (value rounds to 0)
  significand_0:31 = 0b1 || frac
  do i=1 to 31-(exp+Y-127)
    significand = significand >>ui 1
  return( (sign==0) ? significand : (¬significand + 1) )

ConvertSPtoUXWsaturate( X, Y )
  sign = X_0
  exp_0:7 = X_1:8
  frac_0:30 = X_9:31 || 0b0000_0000
  if((exp==255)&&(frac!=0)) then return(0x0000_0000) // NaN operand
  if((exp==255)&&(frac==0)) then do // infinity operand
    VSCR_SAT = 1
    return( (sign==1) ? 0x0000_0000 : 0xFFFF_FFFF )
  if((exp+Y-127)>31) then do // large operand
    VSCR_SAT = 1
    return( (sign==1) ? 0x0000_0000 : 0xFFFF_FFFF )
  if((exp+Y-127)<0) then return(0x0000_0000) // -1.0 < value < 1.0
                                             // value rounds to 0
  if( sign==1 ) then do // negative operand
    VSCR_SAT = 1
    return(0x0000_0000)
  significand_0:31 = 0b1 || frac
  do i=1 to 31-(exp+Y-127)
    significand = significand >>ui 1
  return( significand )

ConvertSXWtoSP( X )
  sign = X_0
  exp_0:7 = 32 + 127
  frac_0:32 = X_0 || X_0:31
  if( frac==0 ) return( 0x0000_0000 ) // Zero operand
  if( sign==1 ) then frac = ¬frac + 1
  do while( frac_0==0 )
    frac = frac << 1
    exp = exp - 1
  lsb = frac_23
  gbit = frac_24
  xbit = frac_25:32!=0
  inc = ( lsb && gbit ) | ( gbit && xbit )
  frac_0:23 = frac_0:23 + inc
  if( carry_out==1 ) exp = exp + 1
  return( sign || exp || frac_1:23 )

ConvertUXWtoSP( X )
  exp_0:7 = 31 + 127
  frac_0:31 = X_0:31
  if( frac==0 ) return( 0x0000_0000 ) // Zero Operand
  do while( frac_0==0 )
    frac = frac << 1
    exp = exp - 1
  lsb = frac_23
  gbit = frac_24
  xbit = frac_25:31!=0
  inc = ( lsb && gbit ) | ( gbit && xbit )
  frac_0:23 = frac_0:23 + inc
  if( carry_out==1 ) exp = exp + 1
  return( 0b0 || exp || frac_1:23 )

Appendix D.
Embedded Floating-Point RTL Functions
[Category: SPE.Embedded Float Scalar Double]
[Category: SPE.Embedded Float Scalar Single]
[Category: SPE.Embedded Float Vector]

D.1 Common Functions

// Check if 32-bit fp value is a NaN or Infinity
Isa32NaNorInfinity(fp)
  return (fp_exp = 255)

Isa32NaN(fp)
  return ((fp_exp = 255) & (fp_frac ≠ 0))

// Check if 32-bit fp value is denormalized
Isa32Denorm(fp)
  return ((fp_exp = 0) & (fp_frac ≠ 0))

// Check if 64-bit fp value is a NaN or Infinity
Isa64NaNorInfinity(fp)
  return (fp_exp = 2047)

Isa64NaN(fp)
  return ((fp_exp = 2047) & (fp_frac ≠ 0))

// Check if 64-bit fp value is denormalized
Isa64Denorm(fp)
  return ((fp_exp = 0) & (fp_frac ≠ 0))

// Signal an error in the SPEFSCR
SignalFPError(upper_lower, bits)
  if (upper_lower = HI) then
    bits ← bits << 15
  SPEFSCR ← SPEFSCR | bits
  bits ← (FG | FX)
  if (upper_lower = HI) then
    bits ← bits << 15
  SPEFSCR ← SPEFSCR & ¬bits

// Round a 32-bit fp result
Round32(fp, guard, sticky)
  FP32format fp;
  if (SPEFSCR_FINXE = 0) then
    if (SPEFSCR_FRMC = 0b00) then // nearest
      if (guard) then
        if (sticky | fp_frac[22]) then
          v_0:23 ← fp_frac + 1
          if v_0 then
            if (fp_exp >= 254) then
              // overflow
              fp ← fp_sign || 0b11111110 || ²³1
            else
              fp_exp ← fp_exp + 1
              fp_frac ← v_1:23
          else
            fp_frac ← v_1:23
    else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
      // infinity modes
      // implementation dependent
  return fp

// Round a 64-bit fp result
Round64(fp, guard, sticky)
  FP64format fp;
  if (SPEFSCR_FINXE = 0) then
    if (SPEFSCR_FRMC = 0b00) then // nearest
      if (guard) then
        if (sticky | fp_frac[51]) then
          v_0:52 ← fp_frac + 1
          if v_0 then
            if (fp_exp >= 2046) then
              // overflow
              fp ← fp_sign || 0b11111111110 || ⁵²1
            else
              fp_exp ← fp_exp + 1
              fp_frac ← v_1:52
          else
            fp_frac ← v_1:52
    else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
      // infinity modes
      // implementation dependent
  return fp

D.2 Convert from Single-Precision Embedded Floating-Point to Integer Word with Saturation

// Convert 32-bit Floating-Point to 32-bit integer
// or fractional
// signed = S (signed) or U (unsigned)
// upper_lower = HI (high word) or LO (low word)
// round = RND (round) or ZER (truncate)
// fractional = F (fractional) or I (integer)

CnvtFP32ToI32Sat(fp, signed, upper_lower, round, fractional)
  FP32format fp;
  if (Isa32NaNorInfinity(fp)) then
    SignalFPError(upper_lower, FINV)
    if (Isa32NaN(fp)) then
      return 0x00000000 // all NaNs
    if (signed = S) then
      if (fp_sign = 1) then
        return 0x80000000
      else
        return 0x7fffffff
    else
      if (fp_sign = 1) then
        return 0x00000000
      else
        return 0xffffffff
  if (Isa32Denorm(fp)) then
    SignalFPError(upper_lower, FINV)
    return 0x00000000 // regardless of sign
  if ((signed = U) & (fp_sign = 1)) then
    SignalFPError(upper_lower, FOVF) // overflow
    return 0x00000000
  if ((fp_exp = 0) & (fp_frac = 0)) then
    return 0x00000000 // all zero values
  if (fractional = I) then // convert to integer
    max_exp ← 158
    shift ← 158 - fp_exp
    if (signed = S) then
      if ((fp_exp ≠ 158) | (fp_frac ≠ 0) | (fp_sign ≠ 1)) then
        max_exp ← max_exp - 1
  else // fractional conversion
    max_exp ← 126
    shift ← 126 - fp_exp
    if (signed = S) then
      shift ← shift + 1
  if (fp_exp > max_exp) then
    SignalFPError(upper_lower, FOVF) // overflow
    if (signed = S) then
      if (fp_sign = 1) then
        return 0x80000000
      else
        return 0x7fffffff
    else
      return 0xffffffff
  result ← 0b1 || fp_frac || 0b00000000 // add U bit
  guard ← 0
  sticky ← 0
  for (n ← 0; n < shift; n ← n + 1) do
    sticky ← sticky | guard
    guard ← result & 0x00000001
    result ← result >> 1
  // Report sticky and guard bits
  if (upper_lower = HI) then
    SPEFSCR_FGH ← guard
    SPEFSCR_FXH ← sticky
  else
    SPEFSCR_FG ← guard
    SPEFSCR_FX ← sticky
  if (guard | sticky) then
    SPEFSCR_FINXS ← 1
  // Round the integer result
  if ((round = RND) & (SPEFSCR_FINXE = 0)) then
    if (SPEFSCR_FRMC = 0b00) then // nearest
      if (guard) then
        if (sticky | (result & 0x00000001)) then
          result ← result + 1
    else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
      // infinity modes
      // implementation dependent
  if (signed = S) then
    if (fp_sign = 1) then
      result ← ¬result + 1
  return result

D.3 Convert from Double-Precision Embedded Floating-Point to Integer Word with Saturation

// Convert 64-bit Floating-Point to 32-bit integer
// or fractional
// signed = S (signed) or U (unsigned)
// round = RND (round) or ZER (truncate)
// fractional = F (fractional) or I (integer)

CnvtFP64ToI32Sat(fp, signed, round, fractional)
  FP64format fp;
  if (Isa64NaNorInfinity(fp)) then
    SignalFPError(LO, FINV)
    if (Isa64NaN(fp)) then
      return 0x00000000 // all NaNs
    if (signed = S) then
      if (fp_sign = 1) then
        return 0x80000000
      else
        return 0x7fffffff
    else
      if (fp_sign = 1) then
        return 0x00000000
      else
        return 0xffffffff
  if (Isa64Denorm(fp)) then
    SignalFPError(LO, FINV)
    return 0x00000000 // regardless of sign
  if ((signed = U) & (fp_sign = 1)) then
    SignalFPError(LO, FOVF) // overflow
    return 0x00000000
  if ((fp_exp = 0) & (fp_frac = 0)) then
    return 0x00000000 // all zero values
  if (fractional = I) then // convert to integer
    max_exp ← 1054
    shift ← 1054 - fp_exp
    if (signed = S) then
      if ((fp_exp ≠ 1054) | (fp_frac ≠ 0) | (fp_sign ≠ 1)) then
        max_exp ← max_exp - 1
  else // fractional conversion
    max_exp ← 1022
    shift ← 1022 - fp_exp
    if (signed = S) then
      shift ← shift + 1
  if (fp_exp > max_exp) then
    SignalFPError(LO, FOVF) // overflow
    if (signed = S) then
      if (fp_sign = 1) then
        return 0x80000000
      else
        return 0x7fffffff
    else
      return 0xffffffff
  result ← 0b1 || fp_frac[0:30] // add U to frac
  guard ← fp_frac[31]
  sticky ← (fp_frac[32:63] ≠ 0)
  for (n ← 0; n < shift; n ← n + 1) do
    sticky ← sticky | guard
    guard ← result & 0x00000001
    result ← result >> 1
  // Report sticky and guard bits
  SPEFSCR_FG ← guard
  SPEFSCR_FX ← sticky
  if (guard | sticky) then
    SPEFSCR_FINXS ← 1
  // Round the result
  if ((round = RND) & (SPEFSCR_FINXE = 0)) then
    if (SPEFSCR_FRMC = 0b00) then // nearest
      if (guard) then
        if (sticky | (result & 0x00000001)) then
          result ← result + 1
    else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
      // infinity modes
      // implementation dependent
  if (signed = S) then
    if (fp_sign = 1) then
      result ← ¬result + 1
  return result

D.4 Convert from Double-Precision Embedded Floating-Point to Integer Doubleword with Saturation

// Convert 64-bit Floating-Point to 64-bit integer
// signed = S (signed) or U (unsigned)
// round = RND (round) or ZER (truncate)

CnvtFP64ToI64Sat(fp, signed, round)
  FP64format fp;
  if (Isa64NaNorInfinity(fp)) then
    SignalFPError(LO, FINV)
    if (Isa64NaN(fp)) then
      return 0x00000000_00000000 // all NaNs
    if (signed = S) then
      if (fp_sign = 1) then
        return 0x80000000_00000000
      else
        return 0x7fffffff_ffffffff
    else
      if (fp_sign = 1) then
        return 0x00000000_00000000
      else
        return 0xffffffff_ffffffff
  if (Isa64Denorm(fp)) then
    SignalFPError(LO, FINV)
    return 0x00000000_00000000
  if ((signed = U) & (fp_sign = 1)) then
    SignalFPError(LO, FOVF) // overflow
    return 0x00000000_00000000
  if ((fp_exp = 0) & (fp_frac = 0)) then
    return 0x00000000_00000000 // all zero values
  max_exp ← 1086
  shift ← 1086 - fp_exp
  if (signed = S) then
    if ((fp_exp ≠ 1086) | (fp_frac ≠ 0) | (fp_sign ≠ 1)) then
      max_exp ← max_exp - 1
  if (fp_exp > max_exp) then
    SignalFPError(LO, FOVF) // overflow
    if (signed = S) then
      if (fp_sign = 1) then
        return 0x80000000_00000000
      else
        return 0x7fffffff_ffffffff
    else
      return 0xffffffff_ffffffff
  result ← 0b1 || fp_frac || 0b00000000000 // add U bit
  guard ← 0
  sticky ← 0
  for (n ← 0; n < shift; n ← n + 1) do
    sticky ← sticky | guard
    guard ← result & 0x00000000_00000001
    result ← result >> 1
  // Report sticky and guard bits
  SPEFSCR_FG ← guard
  SPEFSCR_FX ← sticky
  if (guard | sticky) then
    SPEFSCR_FINXS ← 1
  // Round the result
  if ((round = RND) & (SPEFSCR_FINXE = 0)) then
    if (SPEFSCR_FRMC = 0b00) then // nearest
      if (guard) then
        if (sticky | (result & 0x00000000_00000001)) then
          result ← result + 1
    else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
      // infinity modes
      // implementation dependent
  if (signed = S) then
    if (fp_sign = 1) then
      result ← ¬result + 1
  return result

D.5 Convert to Single-Precision Embedded Floating-Point from Integer Word

// Convert from 32-bit integer or fractional to
// 32-bit Floating-Point
// signed = S (signed) or U (unsigned)
// round = RND (round) or ZER (truncate)
// fractional = F (fractional) or I (integer)

CnvtI32ToFP32(v, signed, upper_lower, fractional)
  FP32format result;
  result_sign ← 0
  if (v = 0) then
    result ← 0
    if (upper_lower = HI) then
      SPEFSCR_FGH ← 0
      SPEFSCR_FXH ← 0
    else
      SPEFSCR_FG ← 0
      SPEFSCR_FX ← 0
  else
    if (signed = S) then
      if (v_0 = 1) then
        v ← ¬v + 1
        result_sign ← 1
    if (fractional = F) then // frac bit align
      maxexp ← 127
      if (signed = U) then
        maxexp ← maxexp - 1
    else
      maxexp ← 158 // integer bit alignment
    sc ← 0
    while (v_0 = 0)
      v ← v << 1
      sc ← sc + 1
    v_0 ← 0 // clear U bit
    result_exp ← maxexp - sc
    guard ← v_24
    sticky ← (v_25:31 ≠ 0)
    // Report sticky and guard bits
    if (upper_lower = HI) then
      SPEFSCR_FGH ← guard
      SPEFSCR_FXH ← sticky
    else
      SPEFSCR_FG ← guard
      SPEFSCR_FX ← sticky
    if (guard | sticky) then
      SPEFSCR_FINXS ← 1
    // Round the result
    result_frac ← v_1:23
    result ← Round32(result, guard, sticky)
  return result

D.6 Convert to Double-Precision Embedded Floating-Point from Integer Word

// Convert from integer or fractional to 64 bit
// Floating-Point
// signed = S (signed) or U (unsigned)
// fractional = F (fractional) or I (integer)

CnvtI32ToFP64(v, signed, fractional)
  FP64format result;
  result_sign ← 0
  if (v = 0) then
    result ← 0
    SPEFSCR_FG ← 0
    SPEFSCR_FX ← 0
  else
    if (signed = S) then
      if (v_0 = 1) then
        v ← ¬v + 1
        result_sign ← 1
    if (fractional = F) then // frac bit align
      maxexp ← 1023
      if (signed = U) then
        maxexp ← maxexp - 1
    else
      maxexp ← 1054 // integer bit align
    sc ← 0
    while (v_0 = 0)
      v ← v << 1
      sc ← sc + 1
    v_0 ← 0 // clear U bit
    result_exp ← maxexp - sc
    // Report sticky and guard bits
    SPEFSCR_FG ← 0
    SPEFSCR_FX ← 0
    result_frac ← v_1:31 || ²¹0
  return result

D.7 Convert to Double-Precision Embedded Floating-Point from Integer Doubleword

// Convert from 64-bit integer to 64-bit
// floating-point
// signed = S (signed) or U (unsigned)

CnvtI64ToFP64(v, signed)
  FP64format result;
  result_sign ← 0
  if (v = 0) then
    result ← 0
    SPEFSCR_FG ← 0
    SPEFSCR_FX ← 0
  else
    if (signed = S) then
      if (v_0 = 1) then
        v ← ¬v + 1
        result_sign ← 1
    maxexp ← 1054
    sc ← 0
    while (v_0 = 0)
      v ← v << 1
      sc ← sc + 1
    v_0 ← 0 // clear U bit
    result_exp ← maxexp - sc
    guard ← v_53
    sticky ← (v_54:63 ≠ 0)
    // Report sticky and guard bits
    SPEFSCR_FG ← guard
    SPEFSCR_FX ← sticky
    if (guard | sticky) then
      SPEFSCR_FINXS ← 1
    // Round the result
    result_frac ← v_1:52
    result ← Round64(result, guard, sticky)
  return result

Appendix E. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided that defines simple shorthand for the most frequently used forms of Branch Conditional, Compare, Trap, Rotate and Shift, and certain other instructions. Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

E.1 Symbols

The following symbols are defined for use in instructions (basic or extended mnemonics) that specify a Condition Register field or a Condition Register bit. The first five (lt, ..., un) identify a bit number within a CR field. The remainder (cr0, ..., cr7) identify a CR field. An expression in which a CR field symbol is multiplied by 4 and then added to a bit-number-within-CR-field symbol and 32 can be used to identify a CR bit.
  Symbol  Value  Meaning
  lt      0      Less than
  gt      1      Greater than
  eq      2      Equal
  so      3      Summary overflow
  un      3      Unordered (after floating-point comparison)
  cr0     0      CR Field 0
  cr1     1      CR Field 1
  cr2     2      CR Field 2
  cr3     3      CR Field 3
  cr4     4      CR Field 4
  cr5     5      CR Field 5
  cr6     6      CR Field 6
  cr7     7      CR Field 7

The extended mnemonics in Sections E.2.2 and E.3 require identification of a CR bit: if one of the CR field symbols is used, it must be multiplied by 4 and added to a bit-number-within-CR-field (value in the range 0-3, explicit or symbolic) and 32. The extended mnemonics in Sections E.2.3 and E.5 require identification of a CR field: if one of the CR field symbols is used, it must not be multiplied by 4 or added to 32. (For the extended mnemonics in Section E.2.3, the bit number within the CR field is part of the extended mnemonic. The programmer identifies the CR field, and the Assembler does the multiplication and addition required to produce a CR bit number for the BI field of the underlying basic mnemonic.)

E.2 Branch Mnemonics

The mnemonics discussed in this section are variations of the Branch Conditional instructions.

Note: bclr, bclrl, bcctr, and bcctrl each serve as both a basic and an extended mnemonic. The Assembler will recognize a bclr, bclrl, bcctr, or bcctrl mnemonic with three operands as the basic form, and a bclr, bclrl, bcctr, or bcctrl mnemonic with two operands as the extended form. In the extended form the BH operand is omitted and assumed to be 0b00. Similarly, for all the extended mnemonics described in Sections E.2.2 - E.2.4 that devolve to any of these four basic mnemonics the BH operand can either be coded or omitted. If it is omitted it is assumed to be 0b00.

E.2.1 BO and BI Fields

The 5-bit BO and BI fields control whether the branch is taken. Providing an extended mnemonic for every possible combination of these fields would be neither useful nor practical.
The mnemonics described in Sections E.2.2 - E.2.4 include the most useful cases. Other cases can be coded using a basic Branch Conditional mnemonic (bc[l][a], bclr[l], bcctr[l]) with the appropriate operands.

E.2.2 Simple Branch Mnemonics

Instructions using one of the mnemonics in Table 110 that tests a Condition Register bit specify the corresponding bit as the first operand. The symbols defined in Section E.1 can be used in this operand.

Notice that there are no extended mnemonics for relative and absolute unconditional branches. For these the basic mnemonics b, ba, bl, and bla should be used.

Table 110: Simple branch mnemonics

                                                  LR not Set                              LR Set
                                                  bc        bca       bclr      bcctr     bcl       bcla      bclrl     bcctrl
Branch Semantics                                  Relative  Absolute  To LR     To CTR    Relative  Absolute  To LR     To CTR
Branch unconditionally                            -         -         blr       bctr      -         -         blrl      bctrl
Branch if CR_BI=1                                 bt        bta       btlr      btctr     btl       btla      btlrl     btctrl
Branch if CR_BI=0                                 bf        bfa       bflr      bfctr     bfl       bfla      bflrl     bfctrl
Decrement CTR, branch if CTR nonzero              bdnz      bdnza     bdnzlr    -         bdnzl     bdnzla    bdnzlrl   -
Decrement CTR, branch if CTR nonzero and CR_BI=1  bdnzt     bdnzta    bdnztlr   -         bdnztl    bdnztla   bdnztlrl  -
Decrement CTR, branch if CTR nonzero and CR_BI=0  bdnzf     bdnzfa    bdnzflr   -         bdnzfl    bdnzfla   bdnzflrl  -
Decrement CTR, branch if CTR zero                 bdz       bdza      bdzlr     -         bdzl      bdzla     bdzlrl    -
Decrement CTR, branch if CTR zero and CR_BI=1     bdzt      bdzta     bdztlr    -         bdztl     bdztla    bdztlrl   -
Decrement CTR, branch if CTR zero and CR_BI=0     bdzf      bdzfa     bdzflr    -         bdzfl     bdzfla    bdzflrl   -

Examples

1. Decrement CTR and branch if it is still nonzero (closure of a loop controlled by a count loaded into CTR).
     bdnz target              (equivalent to: bc 16,0,target)
2. Same as (1) but branch only if CTR is nonzero and condition in CR0 is "equal".
     bdnzt eq,target          (equivalent to: bc 8,2,target)
3. Same as (2), but "equal" condition is in CR5.
     bdnzt 4×cr5+eq,target    (equivalent to: bc 8,22,target)
4. Branch if bit 59 of CR is 0.
     bf 27,target             (equivalent to: bc 4,27,target)
5. Same as (4), but set the Link Register. This is a form of conditional "call".
     bfl 27,target            (equivalent to: bcl 4,27,target)

E.2.3 Branch Mnemonics Incorporating Conditions

In the mnemonics defined in Table 111, the test of a bit in a Condition Register field is encoded in the mnemonic.

Instructions using the mnemonics in Table 111 specify the CR field as an optional first operand. One of the CR field symbols defined in Section E.1 can be used for this operand. If the CR field being tested is CR Field 0, this operand need not be specified unless the resulting basic mnemonic is bclr[l] or bcctr[l] and the BH operand is specified.

A standard set of codes has been adopted for the most common combinations of branch conditions.

  Code  Meaning
  lt    Less than
  le    Less than or equal
  eq    Equal
  ge    Greater than or equal
  gt    Greater than
  nl    Not less than
  ne    Not equal
  ng    Not greater than
  so    Summary overflow
  ns    Not summary overflow
  un    Unordered (after floating-point comparison)
  nu    Not unordered (after floating-point comparison)

These codes are reflected in the mnemonics shown in Table 111.
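How a condition code and a CR field combine into the BO and BI operands of the basic mnemonic can be sketched in a few lines of Python. This is an illustrative sketch only, not assembler source: the names CR_BIT, COND, expand_branch and cr_bit_number are assumptions made for the example. It uses BO=12 (branch if the tested bit is 1) and BO=4 (branch if the tested bit is 0), and treats each negated code as a test of the complementary bit for 0 (for example, "le" tests the gt bit for 0).

```python
# Bit numbers within a CR field, as defined in Section E.1.
CR_BIT = {"lt": 0, "gt": 1, "eq": 2, "so": 3, "un": 3}

BO_IF_TRUE = 12   # branch if the tested CR bit is 1
BO_IF_FALSE = 4   # branch if the tested CR bit is 0

# Condition code -> (BO value, CR-field bit that is tested).
COND = {
    "lt": (BO_IF_TRUE, "lt"), "le": (BO_IF_FALSE, "gt"),
    "eq": (BO_IF_TRUE, "eq"), "ge": (BO_IF_FALSE, "lt"),
    "gt": (BO_IF_TRUE, "gt"), "nl": (BO_IF_FALSE, "lt"),
    "ne": (BO_IF_FALSE, "eq"), "ng": (BO_IF_FALSE, "gt"),
    "so": (BO_IF_TRUE, "so"), "ns": (BO_IF_FALSE, "so"),
    "un": (BO_IF_TRUE, "un"), "nu": (BO_IF_FALSE, "un"),
}


def expand_branch(code, target, cr_field=0, basic="bc"):
    """Return the basic-mnemonic text for a condition-code branch.

    code     -- one of the condition codes listed above
    cr_field -- CR field number 0..7 (the optional first operand)
    basic    -- basic mnemonic that carries a target operand (bc, bca, bcl, bcla)
    """
    bo, bit = COND[code]
    bi = 4 * cr_field + CR_BIT[bit]   # the Assembler's multiply-and-add
    return f"{basic} {bo},{bi},{target}"


def cr_bit_number(bi):
    """CR bit number (32..63) selected by a BI operand value (0..31)."""
    return bi + 32
```

For instance, expand_branch("ne", "target", cr_field=3) produces "bc 4,14,target", and cr_bit_number(27) gives 59, the CR bit tested by "bf 27,target".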
Table 111: Branch mnemonics incorporating conditions

                                    LR not Set                          LR Set
                                    bc        bca       bclr     bcctr    bcl       bcla      bclrl    bcctrl
Branch Semantics                    Relative  Absolute  To LR    To CTR   Relative  Absolute  To LR    To CTR
Branch if less than                 blt       blta      bltlr    bltctr   bltl      bltla     bltlrl   bltctrl
Branch if less than or equal        ble       blea      blelr    blectr   blel      blela     blelrl   blectrl
Branch if equal                     beq       beqa      beqlr    beqctr   beql      beqla     beqlrl   beqctrl
Branch if greater than or equal     bge       bgea      bgelr    bgectr   bgel      bgela     bgelrl   bgectrl
Branch if greater than              bgt       bgta      bgtlr    bgtctr   bgtl      bgtla     bgtlrl   bgtctrl
Branch if not less than             bnl       bnla      bnllr    bnlctr   bnll      bnlla     bnllrl   bnlctrl
Branch if not equal                 bne       bnea      bnelr    bnectr   bnel      bnela     bnelrl   bnectrl
Branch if not greater than          bng       bnga      bnglr    bngctr   bngl      bngla     bnglrl   bngctrl
Branch if summary overflow          bso       bsoa      bsolr    bsoctr   bsol      bsola     bsolrl   bsoctrl
Branch if not summary overflow      bns       bnsa      bnslr    bnsctr   bnsl      bnsla     bnslrl   bnsctrl
Branch if unordered                 bun       buna      bunlr    bunctr   bunl      bunla     bunlrl   bunctrl
Branch if not unordered             bnu       bnua      bnulr    bnuctr   bnul      bnula     bnulrl   bnuctrl

Examples

1. Branch if CR0 reflects condition "not equal".
     bne target               (equivalent to: bc 4,2,target)
2. Same as (1), but condition is in CR3.
     bne cr3,target           (equivalent to: bc 4,14,target)
3. Branch to an absolute target if CR4 specifies "greater than", setting the Link Register. This is a form of conditional "call".
     bgtla cr4,target         (equivalent to: bcla 12,17,target)
4. Same as (3), but target address is in the Count Register.
     bgtctrl cr4              (equivalent to: bcctrl 12,17,0)

E.2.4 Branch Prediction

Software can use the "at" bits of Branch Conditional instructions to provide a hint to the processor about the behavior of the branch. If, for a given such instruction, the branch is almost always taken or almost always not taken, a suffix can be added to the mnemonic indicating the value to be used for the "at" bits.
  +   Predict branch to be taken (at=0b11)
  -   Predict branch not to be taken (at=0b10)

Such a suffix can be added to any Branch Conditional mnemonic, either basic or extended, that tests either the Count Register or a CR bit (but not both). Assemblers should use 0b00 as the default value for the "at" bits, indicating that software has offered no prediction.

Examples

1. Branch if CR0 reflects condition "less than", specifying that the branch should be predicted to be taken.
     blt+ target
2. Same as (1), but target address is in the Link Register and the branch should be predicted not to be taken.
     bltlr-

E.3 Condition Register Logical Mnemonics

The Condition Register Logical instructions can be used to set (to 1), clear (to 0), copy, or invert a given Condition Register bit. Extended mnemonics are provided that allow these operations to be coded easily.

Table 112: Condition Register logical mnemonics

  Operation                 Extended Mnemonic  Equivalent to
  Condition Register set    crset bx           creqv bx,bx,bx
  Condition Register clear  crclr bx           crxor bx,bx,bx
  Condition Register move   crmove bx,by       cror bx,by,by
  Condition Register not    crnot bx,by        crnor bx,by,by

The symbols defined in Section E.1 can be used to identify the Condition Register bits.

Examples

1. Set CR bit 57.
     crset 25                 (equivalent to: creqv 25,25,25)
2. Clear the SO bit of CR0.
     crclr so                 (equivalent to: crxor 3,3,3)
3. Same as (2), but SO bit to be cleared is in CR3.
     crclr 4×cr3+so           (equivalent to: crxor 15,15,15)
4. Invert the EQ bit.
     crnot eq,eq              (equivalent to: crnor 2,2,2)
5. Same as (4), but EQ bit to be inverted is in CR4, and the result is to be placed into the EQ bit of CR5.
     crnot 4×cr5+eq,4×cr4+eq  (equivalent to: crnor 22,18,18)

E.4 Subtract Mnemonics

E.4.1 Subtract Immediate

Although there is no "Subtract Immediate" instruction, its effect can be achieved by using an Add Immediate instruction with the immediate operand negated.
Extended mnemonics are provided that include this negation, making the intent of the computation clearer.

    subi    Rx,Ry,value          (equivalent to: addi   Rx,Ry,-value)
    subis   Rx,Ry,value          (equivalent to: addis  Rx,Ry,-value)
    subic   Rx,Ry,value          (equivalent to: addic  Rx,Ry,-value)
    subic.  Rx,Ry,value          (equivalent to: addic. Rx,Ry,-value)

E.4.2 Subtract

The Subtract From instructions subtract the second operand (RA) from the third (RB). Extended mnemonics are provided that use the more "normal" order, in which the third operand is subtracted from the second. Both these mnemonics can be coded with a final "o" and/or "." to cause the OE and/or Rc bit to be set in the underlying instruction.

    sub     Rx,Ry,Rz             (equivalent to: subf  Rx,Rz,Ry)
    subc    Rx,Ry,Rz             (equivalent to: subfc Rx,Rz,Ry)

E.5 Compare Mnemonics

The L field in the fixed-point Compare instructions controls whether the operands are treated as 64-bit quantities or as 32-bit quantities. Extended mnemonics are provided that represent the L value in the mnemonic rather than requiring it to be coded as a numeric operand.

The BF field can be omitted if the result of the comparison is to be placed into CR Field 0. Otherwise the target CR field must be specified as the first operand. One of the CR field symbols defined in Section E.1 can be used for this operand.

Note: The basic Compare mnemonics of Power ISA are the same as those of POWER, but the POWER instructions have three operands while the Power ISA instructions have four. The Assembler will recognize a basic Compare mnemonic with three operands as the POWER form, and will generate the instruction with L=0. (Thus the Assembler must require that the BF field, which normally can be omitted when CR Field 0 is the target, be specified explicitly if L is.)
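As a rough illustration of the two rules above (the L value is carried by the mnemonic, and an omitted BF operand means CR Field 0), the following Python sketch expands an extended Compare mnemonic into its basic form. The name expand_compare and the string-based operand handling are assumptions made for this example only; they are not part of the architecture or of any particular assembler.

```python
# Hypothetical lookup: extended Compare mnemonic -> (basic mnemonic, L value),
# following the doubleword (L=1) and word (L=0) mnemonics of this section.
COMPARE_MNEMONICS = {
    "cmpdi": ("cmpi", 1), "cmpd": ("cmp", 1),
    "cmpldi": ("cmpli", 1), "cmpld": ("cmpl", 1),
    "cmpwi": ("cmpi", 0), "cmpw": ("cmp", 0),
    "cmplwi": ("cmpli", 0), "cmplw": ("cmpl", 0),
}


def expand_compare(mnemonic, *operands):
    """Return the basic Compare instruction text for an extended mnemonic."""
    basic, l_value = COMPARE_MNEMONICS[mnemonic]
    ops = list(operands)
    if len(ops) == 2:
        ops.insert(0, "0")          # BF omitted: result goes to CR Field 0
    elif len(ops) != 3:
        raise ValueError("expected [bf,] ra, rb-or-immediate")
    bf = ops[0]
    if bf.startswith("cr"):         # CR field symbols cr0..cr7 stand for 0..7
        bf = bf[2:]
    return f"{basic} {bf},{l_value},{ops[1]},{ops[2]}"


print(expand_compare("cmpldi", "Rx", "100"))
print(expand_compare("cmpwi", "cr4", "Rx", "100"))
```

The two calls reproduce the forms "cmpli 0,1,Rx,100" and "cmpi 4,0,Rx,100" used in the examples of this section.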
E.5.1 Doubleword Comparisons

Table 113: Doubleword compare mnemonics

Operation                               Extended Mnemonic    Equivalent to
Compare doubleword immediate            cmpdi  bf,ra,si      cmpi  bf,1,ra,si
Compare doubleword                      cmpd   bf,ra,rb      cmp   bf,1,ra,rb
Compare logical doubleword immediate    cmpldi bf,ra,ui      cmpli bf,1,ra,ui
Compare logical doubleword              cmpld  bf,ra,rb      cmpl  bf,1,ra,rb

Examples

1. Compare register Rx and immediate value 100 as unsigned 64-bit integers and place result into CR0.
       cmpldi  Rx,100            (equivalent to: cmpli 0,1,Rx,100)
2. Same as (1), but place result into CR4.
       cmpldi  cr4,Rx,100        (equivalent to: cmpli 4,1,Rx,100)
3. Compare registers Rx and Ry as signed 64-bit integers and place result into CR0.
       cmpd    Rx,Ry             (equivalent to: cmp 0,1,Rx,Ry)

E.5.2 Word Comparisons

Table 114: Word compare mnemonics

Operation                               Extended Mnemonic    Equivalent to
Compare word immediate                  cmpwi  bf,ra,si      cmpi  bf,0,ra,si
Compare word                            cmpw   bf,ra,rb      cmp   bf,0,ra,rb
Compare logical word immediate          cmplwi bf,ra,ui      cmpli bf,0,ra,ui
Compare logical word                    cmplw  bf,ra,rb      cmpl  bf,0,ra,rb

Examples

1. Compare bits 32:63 of register Rx and immediate value 100 as signed 32-bit integers and place result into CR0.
       cmpwi   Rx,100            (equivalent to: cmpi 0,0,Rx,100)
2. Same as (1), but place result into CR4.
       cmpwi   cr4,Rx,100        (equivalent to: cmpi 4,0,Rx,100)
3. Compare bits 32:63 of registers Rx and Ry as unsigned 32-bit integers and place result into CR0.
       cmplw   Rx,Ry             (equivalent to: cmpl 0,0,Rx,Ry)

E.6 Trap Mnemonics

The mnemonics defined in Table 115 are variations of the Trap instructions, with the most useful values of TO represented in the mnemonic rather than specified as a numeric operand.

A standard set of codes has been adopted for the most common combinations of trap conditions.
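Each TO value used with these codes is the sum of one weight per enabled condition: 16 for signed less than, 8 for signed greater than, 4 for equal, 2 for logically (unsigned) less than, and 1 for logically greater than. The Python sketch below is a small model of that rule, not an assembler feature. The names to_encoding and trap_taken are hypothetical, and operands are passed as unsigned integers of the stated width.

```python
# Weight of each condition bit in the 5-bit TO field.
TO_BITS = {"<": 16, ">": 8, "=": 4, "<u": 2, ">u": 1}


def to_encoding(*conditions):
    """Combine condition symbols into a TO value, e.g. ("<", "=") -> 20."""
    value = 0
    for condition in conditions:
        value |= TO_BITS[condition]
    return value


def trap_taken(to, a, b, width=64):
    """Model whether a trap with the given TO value fires for operands a, b.

    a and b are the raw register/immediate values; they are compared both
    as signed and as unsigned integers of `width` bits (64 or 32).
    """
    mask = (1 << width) - 1
    ua, ub = a & mask, b & mask
    sign = 1 << (width - 1)
    sa = ua - (1 << width) if ua & sign else ua
    sb = ub - (1 << width) if ub & sign else ub
    return bool(
        (to & 16 and sa < sb)
        or (to & 8 and sa > sb)
        or (to & 4 and sa == sb)
        or (to & 2 and ua < ub)
        or (to & 1 and ua > ub)
    )


print(to_encoding("<", "="))    # code "le"
print(to_encoding("<", ">"))    # code "ne"
print(trap_taken(24, 5, 0))     # models tdnei Rx,0 with Rx = 5
```

Under this model a TO value of 31 always traps, because every pair of operands is either less than, greater than, or equal.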
Code    Meaning                               TO encoding    <    >    =    <u   >u
lt      Less than                             16             1    0    0    0    0
le      Less than or equal                    20             1    0    1    0    0
eq      Equal                                 4              0    0    1    0    0
ge      Greater than or equal                 12             0    1    1    0    0
gt      Greater than                          8              0    1    0    0    0
nl      Not less than                         12             0    1    1    0    0
ne      Not equal                             24             1    1    0    0    0
ng      Not greater than                      20             1    0    1    0    0
llt     Logically less than                   2              0    0    0    1    0
lle     Logically less than or equal          6              0    0    1    1    0
lge     Logically greater than or equal       5              0    0    1    0    1
lgt     Logically greater than                1              0    0    0    0    1
lnl     Logically not less than               5              0    0    1    0    1
lng     Logically not greater than            6              0    0    1    1    0
u       Unconditionally with parameters       31             1    1    1    1    1
(none)  Unconditional                         31             1    1    1    1    1

These codes are reflected in the mnemonics shown in Table 115.

Table 115: Trap mnemonics

                                            64-bit Comparison       32-bit Comparison
                                            tdi         td          twi         tw
Trap Semantics                              Immediate   Register    Immediate   Register
Trap unconditionally                        -           -           -           trap
Trap unconditionally with parameters        tdui        tdu         twui        twu
Trap if less than                           tdlti       tdlt        twlti       twlt
Trap if less than or equal                  tdlei       tdle        twlei       twle
Trap if equal                               tdeqi       tdeq        tweqi       tweq
Trap if greater than or equal               tdgei       tdge        twgei       twge
Trap if greater than                        tdgti       tdgt        twgti       twgt
Trap if not less than                       tdnli       tdnl        twnli       twnl
Trap if not equal                           tdnei       tdne        twnei       twne
Trap if not greater than                    tdngi       tdng        twngi       twng
Trap if logically less than                 tdllti      tdllt       twllti      twllt
Trap if logically less than or equal        tdllei      tdlle       twllei      twlle
Trap if logically greater than or equal     tdlgei      tdlge       twlgei      twlge
Trap if logically greater than              tdlgti      tdlgt       twlgti      twlgt
Trap if logically not less than             tdlnli      tdlnl       twlnli      twlnl
Trap if logically not greater than          tdlngi      tdlng       twlngi      twlng

Examples

1. Trap if register Rx is not 0.
       tdnei   Rx,0              (equivalent to: tdi 24,Rx,0)
2. Same as (1), but comparison is to register Ry.
       tdne    Rx,Ry             (equivalent to: td 24,Rx,Ry)
3. Trap if bits 32:63 of register Rx, considered as a 32-bit quantity, are logically greater than 0x7FF.
       twlgti  Rx,0x7FF          (equivalent to: twi 1,Rx,0x7FF)
4. Trap unconditionally.
       trap                      (equivalent to: tw 31,0,0)
5. Trap unconditionally with parameters Rx and Ry.
       tdu     Rx,Ry             (equivalent to: td 31,Rx,Ry)

E.7 Rotate and Shift Mnemonics

The Rotate and Shift instructions provide powerful and general ways to manipulate register contents, but can be difficult to understand. Extended mnemonics are provided that allow some of the simpler operations to be coded easily.

Mnemonics are provided for the following types of operation.

Extract
    Select a field of n bits starting at bit position b in the source register; left or right justify this field in the target register; clear all other bits of the target register to 0.

Insert
    Select a left-justified or right-justified field of n bits in the source register; insert this field starting at bit position b of the target register; leave other bits of the target register unchanged. (No extended mnemonic is provided for insertion of a left-justified field when operating on doublewords, because such an insertion requires more than one instruction.)

Rotate
    Rotate the contents of a register right or left n bits without masking.

Shift
    Shift the contents of a register right or left n bits, clearing vacated bits to 0 (logical shift).

Clear
    Clear the leftmost or rightmost n bits of a register to 0.

Clear left and shift left
    Clear the leftmost b bits of a register, then shift the register left by n bits. This operation can be used to scale a (known nonnegative) array index by the width of an element.

E.7.1 Operations on Doublewords

All these mnemonics can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.
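To make the doubleword rotate-and-mask forms easier to follow, the Python sketch below models rldicl and rldicr on 64-bit integers and defines four of the extended mnemonics in terms of them, using the equivalences of Table 116. It is a simplified behavioral model under stated assumptions: bit 0 is the most significant bit, the Rc bit is ignored, and the helper names rotl64 and mask64 are invented for this illustration.

```python
M64 = (1 << 64) - 1


def rotl64(x, n):
    """Rotate the 64-bit value x left by n bits (n is taken modulo 64)."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & M64


def mask64(mb, me):
    """Ones in bit positions mb..me, bit 0 being most significant (mb <= me)."""
    return ((1 << (64 - mb)) - 1) & ~((1 << (63 - me)) - 1)


def rldicl(rs, sh, mb):
    """Rotate left, then clear the bits to the left of position mb."""
    return rotl64(rs, sh) & mask64(mb, 63)


def rldicr(rs, sh, me):
    """Rotate left, then clear the bits to the right of position me."""
    return rotl64(rs, sh) & mask64(0, me)


# Extended mnemonics expressed through the basic forms.
def extrdi(rs, n, b):
    return rldicl(rs, b + n, 64 - n)


def sldi(rs, n):
    return rldicr(rs, n, 63 - n)


def srdi(rs, n):
    return rldicl(rs, 64 - n, n)


def clrldi(rs, n):
    return rldicl(rs, 0, n)


print(hex(extrdi(0x8000000000000000, 1, 0)))   # sign bit, right-justified
print(hex(sldi(0xFF, 8)))
print(hex(clrldi(M64, 32)))
```

In this model sldi and srdi agree with ordinary logical shifts for every n from 0 through 63, which is why the assembler can offer them as shorthand for the rotate-and-mask encodings.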
Table 116: Doubleword rotate and shift mnemonics

Operation                              Extended Mnemonic                   Equivalent to
Extract and left justify immediate     extldi   ra,rs,n,b (n > 0)          rldicr ra,rs,b,n-1
Extract and right justify immediate    extrdi   ra,rs,n,b (n > 0)          rldicl ra,rs,b+n,64-n
Insert from right immediate            insrdi   ra,rs,n,b (n > 0)          rldimi ra,rs,64-(b+n),b
Rotate left immediate                  rotldi   ra,rs,n                    rldicl ra,rs,n,0
Rotate right immediate                 rotrdi   ra,rs,n                    rldicl ra,rs,64-n,0
Rotate left                            rotld    ra,rs,rb                   rldcl  ra,rs,rb,0
Shift left immediate                   sldi     ra,rs,n (n < 64)           rldicr ra,rs,n,63-n
Shift right immediate                  srdi     ra,rs,n (n < 64)           rldicl ra,rs,64-n,n
Clear left immediate                   clrldi   ra,rs,n (n < 64)           rldicl ra,rs,0,n
Clear right immediate                  clrrdi   ra,rs,n (n < 64)           rldicr ra,rs,0,63-n
Clear left and shift left immediate    clrlsldi ra,rs,b,n (n <= b < 64)    rldic  ra,rs,n,b-n

Examples

1. Extract the sign bit (bit 0) of register Ry and place the result right-justified into register Rx.
       extrdi  Rx,Ry,1,0         (equivalent to: rldicl Rx,Ry,1,63)
2. Insert the bit extracted in (1) into the sign bit (bit 0) of register Rz.
       insrdi  Rz,Rx,1,0         (equivalent to: rldimi Rz,Rx,63,0)
3. Shift the contents of register Rx left 8 bits.
       sldi    Rx,Rx,8           (equivalent to: rldicr Rx,Rx,8,55)
4. Clear the high-order 32 bits of register Ry and place the result into register Rx.
       clrldi  Rx,Ry,32          (equivalent to: rldicl Rx,Ry,0,32)

E.7.2 Operations on Words

All these mnemonics can be coded with a final "." to cause the Rc bit to be set in the underlying instruction. The operations as described above apply to the low-order 32 bits of the registers, as if the registers were 32-bit registers. The Insert operations either preserve the high-order 32 bits of the target register or place rotated data there; the other operations clear these bits.
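The treatment of the high-order 32 bits can be seen in a small model. The Python sketch below represents a register as a 64-bit integer and models rlwinm and rlwimi for the non-wrapping mask case (mb <= me), then expresses a few word mnemonics through them as in Table 117. The helper names and the restriction to mb <= me are assumptions of this sketch; the wrap-around mask case, in which an Insert can place rotated data in the high-order half, is deliberately left out.

```python
M32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate the low-order 32 bits of x left by n bits (n modulo 32)."""
    x &= M32
    n %= 32
    return ((x << n) | (x >> (32 - n))) & M32


def mask32(mb, me):
    """Ones in word bit positions mb..me, bit 0 being the word's MSB (mb <= me)."""
    return ((1 << (32 - mb)) - 1) & ~((1 << (31 - me)) - 1)


def rlwinm(rs, sh, mb, me):
    """Rotate the low word and mask it; the high-order 32 bits become 0."""
    return rotl32(rs, sh) & mask32(mb, me)


def rlwimi(ra, rs, sh, mb, me):
    """Insert masked rotated bits into ra; bits outside the mask are kept."""
    m = mask32(mb, me)
    return (ra & ~m) | (rotl32(rs, sh) & m)


def extrwi(rs, n, b):
    return rlwinm(rs, b + n, 32 - n, 31)


def insrwi(ra, rs, n, b):
    return rlwimi(ra, rs, 32 - (b + n), b, b + n - 1)


def slwi(rs, n):
    return rlwinm(rs, n, 0, 31 - n)


def clrlwi(rs, n):
    return rlwinm(rs, 0, n, 31)


print(hex(slwi(0xDEADBEEF000000FF, 8)))            # high half cleared
print(hex(insrwi(0xAAAAAAAA00000000, 1, 1, 0)))    # high half preserved
```

The first call shows a non-Insert operation clearing the high-order half, and the second shows an Insert operation leaving it unchanged.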
Table 117: Word rotate and shift mnemonics

Operation                              Extended Mnemonic                   Equivalent to
Extract and left justify immediate     extlwi   ra,rs,n,b (n > 0)          rlwinm ra,rs,b,0,n-1
Extract and right justify immediate    extrwi   ra,rs,n,b (n > 0)          rlwinm ra,rs,b+n,32-n,31
Insert from left immediate             inslwi   ra,rs,n,b (n > 0)          rlwimi ra,rs,32-b,b,(b+n)-1
Insert from right immediate            insrwi   ra,rs,n,b (n > 0)          rlwimi ra,rs,32-(b+n),b,(b+n)-1
Rotate left immediate                  rotlwi   ra,rs,n                    rlwinm ra,rs,n,0,31
Rotate right immediate                 rotrwi   ra,rs,n                    rlwinm ra,rs,32-n,0,31
Rotate left                            rotlw    ra,rs,rb                   rlwnm  ra,rs,rb,0,31
Shift left immediate                   slwi     ra,rs,n (n < 32)           rlwinm ra,rs,n,0,31-n
Shift right immediate                  srwi     ra,rs,n (n < 32)           rlwinm ra,rs,32-n,n,31
Clear left immediate                   clrlwi   ra,rs,n (n < 32)           rlwinm ra,rs,0,n,31
Clear right immediate                  clrrwi   ra,rs,n (n < 32)           rlwinm ra,rs,0,0,31-n
Clear left and shift left immediate    clrlslwi ra,rs,b,n (n <= b < 32)    rlwinm ra,rs,n,b-n,31-n

Examples

1. Extract the sign bit (bit 32) of register Ry and place the result right-justified into register Rx.
       extrwi  Rx,Ry,1,0         (equivalent to: rlwinm Rx,Ry,1,31,31)
2. Insert the bit extracted in (1) into the sign bit (bit 32) of register Rz.
       insrwi  Rz,Rx,1,0         (equivalent to: rlwimi Rz,Rx,31,0,0)
3. Shift the contents of register Rx left 8 bits, clearing the high-order 32 bits.
       slwi    Rx,Rx,8           (equivalent to: rlwinm Rx,Rx,8,0,23)
4. Clear the high-order 16 bits of the low-order 32 bits of register Ry and place the result into register Rx, clearing the high-order 32 bits of register Rx.
       clrlwi  Rx,Ry,16          (equivalent to: rlwinm Rx,Ry,0,16,31)

E.8 Move To/From Special Purpose Register Mnemonics

The mtspr and mfspr instructions specify a Special Purpose Register (SPR) as a numeric operand. Extended mnemonics are provided that represent the SPR in the mnemonic rather than requiring it to be coded as an operand.
Table 118: Extended mnemonics for moving to/from an SPR

                                                 Move To SPR                     Move From SPR
Special Purpose Register                         Extended      Equivalent to     Extended      Equivalent to
Fixed-Point Exception Register (XER)             mtxer Rx      mtspr 1,Rx        mfxer Rx      mfspr Rx,1
Link Register (LR)                               mtlr Rx       mtspr 8,Rx        mflr Rx       mfspr Rx,8
Count Register (CTR)                             mtctr Rx      mtspr 9,Rx        mfctr Rx      mfspr Rx,9
Authority Mask Register (AMR)                    mtuamr Rx     mtspr 13,Rx       mfuamr Rx     mfspr Rx,13
Program Priority Register (PPR)                  mtppr Rx      mtspr 896,Rx      mfppr Rx      mfspr Rx,896
Program Priority Register 32-Bit (PPR32) (1)     mtppr32 Rx    mtspr 898,Rx      mfppr32 Rx    mfspr Rx,898

(1) Category: Phased-In

Examples

1. Copy the contents of register Rx to the XER.
       mtxer   Rx                (equivalent to: mtspr 1,Rx)
2. Copy the contents of the LR to register Rx.
       mflr    Rx                (equivalent to: mfspr Rx,8)
3. Copy the contents of register Rx to the CTR.
       mtctr   Rx                (equivalent to: mtspr 9,Rx)

E.9 Miscellaneous Mnemonics

No-op

Many Power ISA instructions can be coded in a way such that, effectively, no operation is performed. An extended mnemonic is provided for the preferred form of no-op. If an implementation performs any type of run-time optimization related to no-ops, the preferred form is the no-op that will trigger this.

    nop                          (equivalent to: ori 0,0,0)

For some uses of a no-op instruction, optimizations related to no-ops, such as removal from the execution stream, are not desirable. An extended mnemonic is provided for the executed form of no-op. This form of no-op will still consume execution resources.

    xnop                         (equivalent to: xori 0,0,0)

Load Immediate

The addi and addis instructions can be used to load an immediate value into a register. Extended mnemonics are provided to convey the idea that no addition is being performed but merely data movement (from the immediate field of the instruction to a register).

Load a 16-bit signed immediate value into register Rx.

    li      Rx,value             (equivalent to: addi Rx,0,value)

Load a 16-bit signed immediate value, shifted left by 16 bits, into register Rx.

    lis     Rx,value             (equivalent to: addis Rx,0,value)

Load Address

This mnemonic permits computing the value of a base-displacement operand, using the addi instruction which normally requires separate register and immediate operands.

    la      Rx,D(Ry)             (equivalent to: addi Rx,Ry,D)

The la mnemonic is useful for obtaining the address of a variable specified by name, allowing the Assembler to supply the base register number and compute the displacement. If the variable v is located at offset Dv bytes from the address in register Rv, and the Assembler has been told to use register Rv as a base for references to the data structure containing v, then the following line causes the address of v to be loaded into register Rx.

    la      Rx,v                 (equivalent to: addi Rx,Rv,Dv)

Move Register

Several Power ISA instructions can be coded in a way such that they simply copy the contents of one register to another. An extended mnemonic is provided to convey the idea that no computation is being performed but merely data movement (from one register to another).

The following instruction copies the contents of register Ry to register Rx. This mnemonic can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.

    mr      Rx,Ry                (equivalent to: or Rx,Ry,Ry)

Complement Register

Several Power ISA instructions can be coded in a way such that they complement the contents of one register and place the result into another register. An extended mnemonic is provided that allows this operation to be coded easily.

The following instruction complements the contents of register Ry and places the result into register Rx. This mnemonic can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.

    not     Rx,Ry                (equivalent to: nor Rx,Ry,Ry)

Move To/From Condition Register

This mnemonic permits copying the contents of the low-order 32 bits of a GPR to the Condition Register, using the same style as the mfcr instruction.
    mtcr    Rx                   (equivalent to: mtcrf 0xFF,Rx)

The following instructions may generate either the (old) mtcrf or mfcr instructions or the (new) mtocrf or mfocrf instruction, respectively, depending on the target machine type assembler parameter.

    mtcrf   FXM,Rx
    mfcr    Rx

All three extended mnemonics in this subsection are being phased out. In future assemblers the form "mtcr Rx" may not exist, and the mtcrf and mfcr mnemonics may generate the old form instructions (with bit 11 = 0) regardless of the target machine type assembler parameter, or may cease to exist.

Appendix F. Programming Examples

F.1 Multiple-Precision Shifts

This section gives examples of how multiple-precision shifts can be programmed.

A multiple-precision shift is defined to be a shift of an N-doubleword quantity (64-bit mode) or an N-word quantity (32-bit mode), where N>1. The quantity to be shifted is contained in N registers. The shift amount is specified either by an immediate value in the instruction, or by a value in a register.

The examples shown below distinguish between the cases N=2 and N>2. If N=2, the shift amount may be in the range 0 through 127 (64-bit mode) or 0 through 63 (32-bit mode), which are the maximum ranges supported by the Shift instructions used. However if N>2, the shift amount must be in the range 0 through 63 (64-bit mode) or 0 through 31 (32-bit mode), in order for the examples to yield the desired result. The specific instance shown for N>2 is N=3; extending those code sequences to larger N is straightforward, as is reducing them to the case N=2 when the more stringent restriction on shift amount is met. For shifts with immediate shift amounts only the case N=3 is shown, because the more stringent restriction on shift amount is always met.

In the examples it is assumed that GPRs 2 and 3 (and 4) contain the quantity to be shifted, and that the result is to be placed into the same registers, except for the immediate left shifts in 64-bit mode for which the result is placed into GPRs 3, 4, and 5. In all cases, for both input and result, the lowest-numbered register contains the highest-order part of the data and highest-numbered register contains the lowest-order part. For non-immediate shifts, the shift amount is assumed to be in GPR 6. For immediate shifts, the shift amount is assumed to be greater than 0. GPRs 0 and 31 are used as scratch registers.

For N>2, the number of instructions required is 2N-1 (immediate shifts) or 3N-1 (non-immediate shifts).

Multiple-precision shifts in 64-bit mode            Multiple-precision shifts in 32-bit mode
[Category: 64-Bit]

Shift Left Immediate, N = 3                         Shift Left Immediate, N = 3
(shift amount < 64)                                 (shift amount < 32)
    rldicr  r5,r4,sh,63-sh                              rlwinm  r2,r2,sh,0,31-sh
    rldimi  r4,r3,0,sh                                  rlwimi  r2,r3,sh,32-sh,31
    rldicl  r4,r4,sh,0                                  rlwinm  r3,r3,sh,0,31-sh
    rldimi  r3,r2,0,sh                                  rlwimi  r3,r4,sh,32-sh,31
    rldicl  r3,r3,sh,0                                  rlwinm  r4,r4,sh,0,31-sh

Shift Left, N = 2                                   Shift Left, N = 2
(shift amount < 128)                                (shift amount < 64)
    subfic  r31,r6,64                                   subfic  r31,r6,32
    sld     r2,r2,r6                                    slw     r2,r2,r6
    srd     r0,r3,r31                                   srw     r0,r3,r31
    or      r2,r2,r0                                    or      r2,r2,r0
    addi    r31,r6,-64                                  addi    r31,r6,-32
    sld     r0,r3,r31                                   slw     r0,r3,r31
    or      r2,r2,r0                                    or      r2,r2,r0
    sld     r3,r3,r6                                    slw     r3,r3,r6

Shift Left, N = 3                                   Shift Left, N = 3
(shift amount < 64)                                 (shift amount < 32)
    subfic  r31,r6,64                                   subfic  r31,r6,32
    sld     r2,r2,r6                                    slw     r2,r2,r6
    srd     r0,r3,r31                                   srw     r0,r3,r31
    or      r2,r2,r0                                    or      r2,r2,r0
    sld     r3,r3,r6                                    slw     r3,r3,r6
    srd     r0,r4,r31                                   srw     r0,r4,r31
    or      r3,r3,r0                                    or      r3,r3,r0
    sld     r4,r4,r6                                    slw     r4,r4,r6

Shift Right Immediate, N = 3                        Shift Right Immediate, N = 3
(shift amount < 64)                                 (shift amount < 32)
    rldimi  r4,r3,0,64-sh                               rlwinm  r4,r4,32-sh,sh,31
    rldicl  r4,r4,64-sh,0                               rlwimi  r4,r3,32-sh,0,sh-1
    rldimi  r3,r2,0,64-sh                               rlwinm  r3,r3,32-sh,sh,31
    rldicl  r3,r3,64-sh,0                               rlwimi  r3,r2,32-sh,0,sh-1
    rldicl  r2,r2,64-sh,sh                              rlwinm  r2,r2,32-sh,sh,31

Shift Right, N = 2                                  Shift Right, N = 2
(shift amount < 128)                                (shift amount < 64)
    subfic  r31,r6,64                                   subfic  r31,r6,32
    srd     r3,r3,r6                                    srw     r3,r3,r6
    sld     r0,r2,r31                                   slw     r0,r2,r31
    or      r3,r3,r0                                    or      r3,r3,r0
    addi    r31,r6,-64                                  addi    r31,r6,-32
    srd     r0,r2,r31                                   srw     r0,r2,r31
    or      r3,r3,r0                                    or      r3,r3,r0
    srd     r2,r2,r6                                    srw     r2,r2,r6

Shift Right, N = 3                                  Shift Right, N = 3
(shift amount < 64)                                 (shift amount < 32)
    subfic  r31,r6,64                                   subfic  r31,r6,32
    srd     r4,r4,r6                                    srw     r4,r4,r6
    sld     r0,r3,r31                                   slw     r0,r3,r31
    or      r4,r4,r0                                    or      r4,r4,r0
    srd     r3,r3,r6                                    srw     r3,r3,r6
    sld     r0,r2,r31                                   slw     r0,r2,r31
    or      r3,r3,r0                                    or      r3,r3,r0
    srd     r2,r2,r6                                    srw     r2,r2,r6

Shift Right Algebraic Immediate, N = 3              Shift Right Algebraic Immediate, N = 3
(shift amount < 64)                                 (shift amount < 32)
    rldimi  r4,r3,0,64-sh                               rlwinm  r4,r4,32-sh,sh,31
    rldicl  r4,r4,64-sh,0                               rlwimi  r4,r3,32-sh,0,sh-1
    rldimi  r3,r2,0,64-sh                               rlwinm  r3,r3,32-sh,sh,31
    rldicl  r3,r3,64-sh,0                               rlwimi  r3,r2,32-sh,0,sh-1
    sradi   r2,r2,sh                                    srawi   r2,r2,sh

Shift Right Algebraic, N = 2                        Shift Right Algebraic, N = 2
(shift amount < 128)                                (shift amount < 64)
    subfic  r31,r6,64                                   subfic  r31,r6,32
    srd     r3,r3,r6                                    srw     r3,r3,r6
    sld     r0,r2,r31                                   slw     r0,r2,r31
    or      r3,r3,r0                                    or      r3,r3,r0
    addic.  r31,r6,-64                                  addic.  r31,r6,-32
    srad    r0,r2,r31                                   sraw    r0,r2,r31
    ble     $+8                                         ble     $+8
    ori     r3,r0,0                                     ori     r3,r0,0
    srad    r2,r2,r6                                    sraw    r2,r2,r6

Shift Right Algebraic, N = 3                        Shift Right Algebraic, N = 3
(shift amount < 64)                                 (shift amount < 32)
    subfic  r31,r6,64                                   subfic  r31,r6,32
    srd     r4,r4,r6                                    srw     r4,r4,r6
    sld     r0,r3,r31                                   slw     r0,r3,r31
    or      r4,r4,r0                                    or      r4,r4,r0
    srd     r3,r3,r6                                    srw     r3,r3,r6
    sld     r0,r2,r31                                   slw     r0,r2,r31
    or      r3,r3,r0                                    or      r3,r3,r0
    srad    r2,r2,r6                                    sraw    r2,r2,r6
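The 64-bit "Shift Left, N = 2" sequence relies on sld and srd examining the low-order 7 bits of the shift-amount register and producing 0 when that amount is 64 or more. The Python sketch below models the sequence step by step under that assumption and compares it with a plain 128-bit shift. The function names are hypothetical, and registers are represented as unsigned 64-bit integers.

```python
M64 = (1 << 64) - 1


def sld(rs, rb):
    """Model of sld: shift left by the low-order 7 bits of rb; 0 if >= 64."""
    n = rb & 0x7F
    return 0 if n >= 64 else (rs << n) & M64


def srd(rs, rb):
    """Model of srd: shift right by the low-order 7 bits of rb; 0 if >= 64."""
    n = rb & 0x7F
    return 0 if n >= 64 else rs >> n


def shift_left_n2(r2, r3, r6):
    """Shift the 128-bit quantity r2||r3 left by r6 bits (0 <= r6 < 128)."""
    r31 = (64 - r6) & M64       # subfic r31,r6,64
    r2 = sld(r2, r6)            # sld    r2,r2,r6
    r0 = srd(r3, r31)           # srd    r0,r3,r31
    r2 |= r0                    # or     r2,r2,r0
    r31 = (r6 - 64) & M64       # addi   r31,r6,-64
    r0 = sld(r3, r31)           # sld    r0,r3,r31
    r2 |= r0                    # or     r2,r2,r0
    r3 = sld(r3, r6)            # sld    r3,r3,r6
    return r2, r3


hi, lo = 0x0123456789ABCDEF, 0xFEDCBA9876543210
for amount in range(128):
    expected = (((hi << 64) | lo) << amount) & ((1 << 128) - 1)
    assert shift_left_n2(hi, lo, amount) == (expected >> 64, expected & M64)
print("all 128 shift amounts agree")
```

For shift amounts below 64 the second sld contributes 0, and for amounts of 64 or more the first sld and the srd contribute 0, so exactly one path supplies the bits that cross from the low doubleword into the high one.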
F.2 Floating-Point Conversions [Category: Floating-Point]

This section gives examples of how the Floating-Point Conversion instructions can be used to perform various conversions.

Warning: Some of the examples use the fsel instruction. Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section F.3.4, "Notes" on page 648.

F.2.1 Conversion from Floating-Point Number to Floating-Point Integer

The full convert to floating-point integer function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1 and the result is returned in FPR 3.

    mtfsb0   23                #clear VXCVI
    fctid[z] f3,f1             #convert to fx int
    fcfid    f3,f3             #convert back again
    mcrfs    7,5               #VXCVI to CR
    bf       31,$+8            #skip if VXCVI was 0
    fmr      f3,f1             #input was fp int

F.2.2 Conversion from Floating-Point Number to Signed Fixed-Point Integer Doubleword

The full convert to signed fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fctid[z] f2,f1             #convert to dword int
    stfd     f2,disp(r1)       #store float
    ld       r3,disp(r1)       #load dword

F.2.3 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Doubleword

The full convert to unsigned fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the value 0 is in FPR 0, the value 2^64 - 2048 is in FPR 3, the value 2^63 is in FPR 4 and GPR 4, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fsel     f2,f1,f1,f0       #use 0 if < 0
    fsub     f5,f3,f1          #use max if > max
    fsel     f2,f5,f2,f3
    fsub     f5,f2,f4          #subtract 2^63
    fcmpu    cr2,f2,f4         #use diff if >= 2^63
    fsel     f2,f5,f5,f2
    fctid[z] f2,f2             #convert to fx int
    stfd     f2,disp(r1)       #store float
    ld       r3,disp(r1)       #load dword
    blt      cr2,$+8           #add 2^63 if input
    add      r3,r3,r4          # was >= 2^63

F.2.4 Conversion from Floating-Point Number to Signed Fixed-Point Integer Word

The full convert to signed fixed-point integer word function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fctiw[z] f2,f1             #convert to fx int
    stfd     f2,disp(r1)       #store float
    lwa      r3,disp+4(r1)     #load word algebraic

F.2.5 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Word

The full convert to unsigned fixed-point integer word function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the value 0 is in FPR 0, the value 2^32 - 1 is in FPR 3, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fsel     f2,f1,f1,f0       #use 0 if < 0
    fsub     f4,f3,f1          #use max if > max
    fsel     f2,f4,f2,f3
    fctid[z] f2,f2             #convert to fx int
    stfd     f2,disp(r1)       #store float
    lwz      r3,disp+4(r1)     #load word and zero

F.2.6 Conversion from Signed Fixed-Point Integer Doubleword to Floating-Point Number

The full convert from signed fixed-point integer doubleword function, using the rounding mode specified by FPSCR[RN], can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the result is returned in FPR 1, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    std      r3,disp(r1)       #store dword
    lfd      f1,disp(r1)       #load float
    fcfid    f1,f1             #convert to fp int

F.2.7 Conversion from Unsigned Fixed-Point Integer Doubleword to Floating-Point Number

The full convert from unsigned fixed-point integer doubleword function, using the rounding mode specified by FPSCR[RN], can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the value 2^32 is in FPR 4, the result is returned in FPR 1, and two doublewords at displacement "disp" from the address in GPR 1 can be used as scratch space.

    rldicl   r2,r3,32,32       #isolate high half
    rldicl   r0,r3,0,32        #isolate low half
    std      r2,disp(r1)       #store dword both
    std      r0,disp+8(r1)
    lfd      f2,disp(r1)       #load float both
    lfd      f1,disp+8(r1)
    fcfid    f2,f2             #convert each half to
    fcfid    f1,f1             # fp int (exact result)
    fmadd    f1,f4,f2,f1       #(2^32)×high + low

An alternative, shorter, sequence can be used if rounding according to FPSCR[RN] is desired and FPSCR[RN] specifies Round toward +Infinity or Round toward -Infinity, or if it is acceptable for the rounded answer to be either of the two representable floating-point integers nearest to the given fixed-point integer. In this case the full convert from unsigned fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the value 2^64 is in FPR 2.

    std      r3,disp(r1)       #store dword
    lfd      f1,disp(r1)       #load float
    fcfid    f1,f1             #convert to fp int
    fadd     f4,f1,f2          #add 2^64
    fsel     f1,f1,f1,f4       # if r3 < 0

F.2.8 Conversion from Signed Fixed-Point Integer Word to Floating-Point Number

The full convert from signed fixed-point integer word function can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the result is returned in FPR 1, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space. (The result is exact.)

    extsw    r3,r3             #extend sign
    std      r3,disp(r1)       #store dword
    lfd      f1,disp(r1)       #load float
    fcfid    f1,f1             #convert to fp int

The following sequence can be used, assuming a word at the address in GPR 1 + GPR 2 can be used as scratch space.

    stwx     r3,r1,r2          # store word
    lfiwax   f1,r1,r2          # load float
    fcfid    f1,f1             # convert to fp int

F.2.9 Conversion from Unsigned Fixed-Point Integer Word to Floating-Point Number

The full convert from unsigned fixed-point integer word function can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the result is returned in FPR 1, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space. (The result is exact.)

    rldicl   r0,r3,0,32        #zero-extend
    std      r0,disp(r1)       #store dword
    lfd      f1,disp(r1)       #load float
    fcfid    f1,f1             #convert to fp int

F.2.10 Unsigned Single-Precision BCD Arithmetic

addg6s can be used to add or subtract two BCD operands. In these examples it is assumed that r0 contains 0x666...666. (BCD data formats are described in Section 5.3 of Book I.)

Addition of the unsigned BCD operand in register RA to the unsigned BCD operand in register RB can be accomplished as follows.

    add      r1,RA,r0
    add      r2,r1,RB
    addg6s   RT,r1,RB
    subf     RT,RT,r2          # RT = RA +BCD RB

Subtraction of the unsigned BCD operand in register RA from the unsigned BCD operand in register RB can be accomplished as follows. (In this example it is assumed that RB is not register 0.)

    addi     r1,RB,1
    nor      r2,RA,RA          # one's complement of RA
    add      r3,r1,r2
    addg6s   RT,r1,r2
    subf     RT,RT,r3          # RT = RB -BCD RA

Additional instructions are needed to handle signed BCD operands, and BCD operands that occupy more than one register (e.g., unsigned BCD operands that have more than 16 decimal digits).

F.2.11 Signed Single-Precision BCD Arithmetic

Addition of the signed 15-digit BCD operand in register RA to the signed BCD operand in register RB can be accomplished as follows. If the signs of operands are different, then the operand of smaller magnitude is subtracted from the operand of larger magnitude and the sign of the larger operand is preserved; otherwise the operands are added and the sign is preserved.

The sign code is in the low order 4 bits of the operands and uses one of the standard encodings. (See Section 5.3 of Book I for a description of BCD and sign encodings.) This example assumes preferred sign option 1 (0b1100 is plus and 0b1101 is minus). For preferred sign option 2 (0b1111 is plus and 0b1101 is minus), replace the xori after the "SignedSub" label with "xori RA,RA,2".

Preserving the appropriate sign code is accomplished by zeroing the sign code of the other operand before performing a 16 digit BCD addition/subtraction. Other addends (ones complement or 6's) must leave the sign code position as zero.

(In this example r11 contains 0x6666 6666 6666 6660.)

    SignedSub:
        xori     RA,RA,1
    SignedAdd:
        xor      r5,RA,RB
        andi.    r5,r5,15      # compare sign codes
        cmpld    cr1,RA,RB     # compare magnitudes
        beq      cr0,samesign
        ble      cr1,BminusA
                               # set up for RT = RA -BCD RB
        nor      r9,RB,RB      # one's complement of RB
        addi     r10,RA,16     # generate the carry in
        b        submag
    BminusA:
                               # set up for RT = RB -BCD RA
        nor      r9,RA,RA      # one's complement of RA
        addi     r10,RB,16     # generate the carry in
    submag:
        rldicr   r9,r9,0,59    # remove the sign code
        add      r8,r10,r9
        addg6s   RT,r10,r9
        rldicr   RT,RT,0,59    # remove generated 6 from
                               # sign position
        subf     RT,RT,r8
        b        done
    samesign:
        rldicr   r8,RB,0,59    # remove the sign code
        add      r10,RA,r11    # add 6's
        add      r9,r10,r8
        addg6s   RT,r10,RB
        subf     RT,RT,r9      # RT = RA +BCD RB
    done:

F.2.12 Unsigned Extended-Precision BCD Arithmetic

Multiple precision BCD arithmetic requires additional code to add/subtract higher order digits and handle the carry between 16 digit groups. For example, the following sequence implements a 32-digit BCD add. In this example the contents of register R3 concatenated with the contents of R4 represent the first 32-digit operand and the contents of register R5 concatenated with the contents of R6 represent the second operand. The contents of register R3 concatenated with the contents of register R4 represent the result.

(In this example r0 contains 0x6666 6666 6666 6666.)

    add      r10,R4,r0
    addc     r9,r10,R6         # generate the carry
    addg6s   R4,r10,R6
    subf     R4,R4,r9          # RT1 = RA1 +BCD RB1
    addze    R5,R5             # propagate the carry
    add      r10,R3,r0
    add      r9,r10,R5
    addg6s   R3,r10,R5
    subf     R3,R3,r9          # RT0 = RA0 +BCD RB0

Note that an extra instruction (addze) is required to propagate the carry so that the same value is used in the subsequent add and addg6s.

The following sequence implements a 32-digit BCD subtraction. In this example the first operand in R3 and R4 is subtracted from the second operand in R5 and R6. The result is in R3 and R4.

    addi     r10,R6,1
    nor      r9,R4,R4          # one's complement of RA1
    addc     r8,r10,r9         # generate the carry
    addg6s   R4,r10,r9
    subf     R4,R4,r8          # RT1 = RB1 -BCD RA1
    addze    r10,R5            # propagate the carry
    nor      r9,R3,R3          # one's complement of RA0
    add      r8,r10,r9
    addg6s   R3,r10,r9
    subf     R3,R3,r8          # RT0 = RB0 -BCD RA0

F.3 Floating-Point Selection [Category: Floating-Point]

This section gives examples of how the Floating Select instruction can be used to implement floating-point minimum and maximum functions, and certain simple forms of if-then-else constructions, without branching.

The examples show program fragments in an imaginary, C-like, high-level programming language, and the corresponding program fragment using fsel and other Power ISA instructions. In the examples, a, b, x, y, and z are floating-point variables, which are assumed to be in FPRs fa, fb, fx, fy, and fz. FPR fs is assumed to be available for scratch space.

Additional examples can be found in Section F.2, "Floating-Point Conversions [Category: Floating-Point]" on page 644.

Warning: Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section F.3.4.

F.3.1 Comparison to Zero

High-level language:             Power ISA:              Notes
if a ≥ 0.0 then x ← y            fsel fx,fa,fy,fz        (1)
else x ← z

if a > 0.0 then x ← y            fneg fs,fa              (1,2)
else x ← z                       fsel fx,fs,fz,fy

if a = 0.0 then x ← y            fsel fx,fa,fy,fz        (1)
else x ← z                       fneg fs,fa
                                 fsel fx,fs,fx,fz

F.3.2 Minimum and Maximum

High-level language:             Power ISA:              Notes
x ← min(a,b)                     fsub fs,fa,fb           (3,4,5)
                                 fsel fx,fs,fb,fa

x ← max(a,b)                     fsub fs,fa,fb           (3,4,5)
                                 fsel fx,fs,fa,fb

F.3.3 Simple if-then-else Constructions

High-level language:             Power ISA:              Notes
if a ≥ b then x ← y              fsub fs,fa,fb           (4,5)
else x ← z                       fsel fx,fs,fy,fz

if a > b then x ← y              fsub fs,fb,fa           (3,4,5)
else x ← z                       fsel fx,fs,fz,fy

if a = b then x ← y              fsub fs,fa,fb           (4,5)
else x ← z                       fsel fx,fs,fy,fz
                                 fneg fs,fs
                                 fsel fx,fs,fx,fz

F.3.4 Notes

The following Notes apply to the preceding examples and to the corresponding cases using the other three arithmetic relations (<, ≤, and ≠). They should also be considered when any other use of fsel is contemplated.

In these Notes, the "optimized program" is the Power ISA program shown, and the "unoptimized program" (not shown) is the corresponding Power ISA program that uses fcmpu and Branch Conditional instructions instead of fsel.

1. The unoptimized program affects the VXSNAN bit of the FPSCR, and therefore may cause the system error handler to be invoked if the corresponding exception is enabled, while the optimized program does not affect this bit. This property of the optimized program is incompatible with the IEEE standard.
2. The optimized program gives the incorrect result if a is a NaN.
3. The optimized program gives the incorrect result if a and/or b is a NaN (except that it may give the correct result in some cases for the minimum and maximum functions, depending on how those functions are defined to operate on NaNs).
4. The optimized program gives the incorrect result if a and b are infinities of the same sign. (Here it is assumed that Invalid Operation Exceptions are disabled, in which case the result of the subtraction is a NaN. The analysis is more complicated if Invalid Operation Exceptions are enabled, because in that case the target register of the subtraction is unchanged.)
5. The optimized program affects the OX, UX, XX, and VXISI bits of the FPSCR, and therefore may cause the system error handler to be invoked if the corresponding exceptions are enabled, while the unoptimized program does not affect these bits. This property of the optimized program is incompatible with the IEEE standard.

F.4 Vector Unaligned Storage Operations [Category: Vector]

F.4.1 Loading an Unaligned Quadword Using Permute from Big-Endian Storage

The following sequence of instructions copies the unaligned quadword storage operand into VRT.

    # Assumptions:
    # Rb != 0 and contents of Rb = 0xB
    lvx      Vhi,0,Rb          # load MSQ
    lvsl     Vp,0,Rb           # set permute control vector
    addi     Rb,Rb,16          # address of LSQ
    lvx      Vlo,0,Rb          # load LSQ
    vperm    Vt,Vhi,Vlo,Vp     # align the data

Book II: Power ISA Virtual Environment Architecture

Chapter 1. Storage Model

1.1 Definitions . .
1.2 Introduction
1.3 Virtual Storage
1.4 Single-copy Atomicity
1.5 Cache Model
1.6 Storage Control Attributes
    1.6.1 Write Through Required
    1.6.2 Caching Inhibited
    1.6.3 Memory Coherence Required [Category: Memory Coherence]
    1.6.4 Guarded
    1.6.5 Endianness [Category: Embedded.Little-Endian]
    1.6.6 Variable Length Encoded (VLE) Instructions
    1.6.7 Strong Access Order [Category: SAO]
1.7 Shared Storage
    1.7.1 Storage Access Ordering
    1.7.2 Storage Ordering of I/O Accesses
    1.7.3 Atomic Update
        1.7.3.1 Reservations
        1.7.3.2 Forward Progress
1.8 Instruction Storage
    1.8.1 Concurrent Modification and Execution of Instructions

1.1 Definitions

The following definitions, in addition to those specified in Book I, are used in this Book. In these definitions, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

system
    A combination of processors, storage, and associated mechanisms that is capable of executing programs. Sometimes the reference to system includes services provided by the privileged software.

main storage
    The level of storage hierarchy in which all storage state is visible to all processors and mechanisms in the system.

primary cache
    The level of cache closest to the processor.

secondary cache
    After the primary cache, the next closest level of cache to the processor.

instruction storage
    The view of storage as seen by the mechanism that fetches instructions.

data storage
    The view of storage as seen by a Load or Store instruction.

program order
    The execution of instructions in the order required by the sequential execution model. (See Section 2.2 of Book I. A dcbz instruction that modifies storage which contains instructions has the same effect with respect to the sequential execution model as a Store instruction as described there.)

storage location
    A contiguous sequence of one or more bytes in storage. When used in association with a specific instruction or the instruction fetching mechanism, the length of the sequence of one or more bytes is typically implied by the operation. In other uses, it may refer more abstractly to a group of bytes which share common storage attributes.

storage access
    An access to a storage location. There are three (mutually exclusive) kinds of storage access.

    - data access
      An access to the storage location specified by a Load or Store instruction, or, if the access is performed "out-of-order" (see Section 5.5 of Book III-S and Section 6.5 of Book III-E), an access to a storage location as if it were the storage location specified by a Load or Store instruction.

    - instruction fetch
      An access for the purpose of fetching an instruction.

    - implicit access
      An access by the processor for the purpose of address translation or reference and change recording (see Book III-S).

caused by, associated with

    - caused by
      A storage access is said to be caused by an instruction if the instruction is a Load or Store and the access (data access) is to the storage location specified by the instruction.

    - associated with
      A storage access is said to be associated with an instruction if the access is for the purpose of fetching the instruction (instruction fetch), or is a data access caused by the instruction, or is an implicit access that occurs as a side effect of fetching or executing the instruction.

prefetched instructions
    Instructions for which a copy of the instruction has been fetched from instruction storage, but the instruction has not yet been executed.

uniprocessor
    A system that contains one processor.

multiprocessor
    A system that contains two or more processors.

shared storage multiprocessor
    A multiprocessor that contains some common storage, which all the processors in the system can access.

performed
    A load or instruction fetch by a processor or mechanism (P1) is performed with respect to any processor or mechanism (P2) when the value to be returned by the load or instruction fetch can no longer be changed by a store by P2. A store by P1 is performed with respect to P2 when a load by P2 from the location accessed by the store will return the value stored (or a value stored subsequently). An instruction cache block invalidation by P1 is performed with respect to P2 when an instruction fetch by P2 will not be satisfied from the copy of the block that existed in its instruction cache when the instruction causing the invalidation was executed, and similarly for a data cache block invalidation.

    The preceding definitions apply regardless of whether P1 and P2 are the same entity.

page (virtual page)
    2^n contiguous bytes of storage aligned such that the effective address of the first byte in the page is an integral multiple of the page size for which protection and control attributes are independently specifiable and for which reference and change status are independently recorded.

block
    The aligned unit of storage operated on by the Cache Management instructions. The size of an instruction cache block may differ from the size of a data cache block, and both sizes may vary between implementations. The maximum block size is equal to the minimum page size.

aligned storage access
    A load or store is aligned if the address of the target storage location is a multiple of the size of the transfer effected by the instruction.

1.2 Introduction

The Power ISA User Instruction Set Architecture, discussed in Book I, defines storage as a linear array of bytes indexed from 0 to a maximum of 2^64-1. Each byte is identified by its index, called its address, and each byte contains a value. This information is sufficient to allow the programming of applications that require no special features of any particular system environment. The Power ISA Virtual Environment Architecture, described herein, expands this simple storage model to include caches, virtual storage, and shared storage multiprocessors. The Power ISA Virtual Environment Architecture, in conjunction with services based on the Power ISA Operating Environment Architecture (see Book III) and provided by the operating system, permits explicit control of this expanded storage model. A simple model for sequential execution allows at most one storage access to be performed at a time and requires that all storage accesses appear to be performed in program order. In contrast to this simple model, the Power ISA specifies a relaxed model of storage consistency. In a multiprocessor system that allows multiple copies of a storage location, aggressive implementations of the architecture can permit intervals of time during which different copies of a storage location have different values. This chapter describes features of the Power ISA that enable programmers to write correct programs for this storage model.

1.3 Virtual Storage

The Power ISA system implements a virtual storage model for applications. This means that a combination of hardware and software can present a storage model that allows applications to exist within a "virtual" address space larger than either the effective address space or the real address space.

Each program can access 2^64 bytes of "effective address" (EA) space, subject to limitations imposed by the operating system. In a typical Power ISA system, each program's EA space is a subset of a larger "virtual address" (VA) space managed by the operating system.

Each effective address is translated to a real address (i.e., to an address of a byte in real storage or on an I/O device) before being used to access storage. The hardware accomplishes this, using the address translation mechanism described in Book III. The operating system manages the real (physical) storage resources of the system, by setting up the tables and other information used by the hardware address translation mechanism.

In general, real storage may not be large enough to map all the virtual pages used by the currently active applications. With support provided by hardware, the operating system can attempt to use the available real pages to map a sufficient set of virtual pages of the applications. If a sufficient set is maintained, "paging" activity is minimized. If not, performance degradation is likely.

The operating system can support restricted access to virtual pages (including read/write, read only, and no access; see Book III), based on system standards (e.g., program code might be read only) and application requests.

1.4 Single-copy Atomicity

An access is single-copy atomic, or simply atomic, if it is always performed in its entirety with no visible fragmentation. Atomic accesses are thus serialized: each happens in its entirety in some order, even when that order is not specified in the program or enforced between processors.

Vector storage accesses are not guaranteed to be atomic. The following other types of single-register accesses are always atomic:

- byte accesses (all bytes are aligned on byte boundaries)
- halfword accesses aligned on halfword boundaries
- word accesses aligned on word boundaries
- doubleword accesses aligned on doubleword boundaries (64-bit implementations only; see Section 1.2 of Book III-E)

No other accesses are guaranteed to be atomic. For example, the access caused by the following instructions is not guaranteed to be atomic.

- any Load or Store instruction for which the operand is unaligned
- lmw, stmw, lswi, lswx, stswi, stswx
- lfdp, lfdpx, stfdp, stfdpx
- any Cache Management instruction

An access that is not atomic is performed as a set of smaller disjoint atomic accesses. The number and alignment of these accesses are implementation-dependent, as is the relative order in which they are performed. Accesses that are aligned on a doubleword boundary for lfdp, lfdpx, stfdp, and stfdpx are performed as a pair of disjoint atomic doubleword accesses.

The results for several combinations of loads and stores to the same or overlapping locations are described below.

1. When two processors execute atomic stores to locations that do not overlap, and no other stores are performed to those locations, the contents of those locations are the same as if the two stores were performed by a single processor.

2. When two processors execute atomic stores to the same storage location, and no other store is performed to that location, the contents of that location are the result stored by one of the processors.

3. When two processors execute stores that have the same target location and are not guaranteed to be atomic, and no other store is performed to that location, the result is some combination of the bytes stored by both processors.

4. When two processors execute stores to overlapping locations, and no other store is performed to those locations, the result is some combination of the bytes stored by the processors to the overlapping bytes. The portions of the locations that do not overlap contain the bytes stored by the processor storing to the location.

5. When a processor executes an atomic store to a location, a second processor executes an atomic load from that location, and no other store is performed to that location, the value returned by the load is the contents of the location before the store or the contents of the location after the store.

6. When a load and a store with the same target location can be executed simultaneously, and no other store is performed to that location, the value returned by the load is some combination of the contents of the location before the store and the contents of the location after the store.

1.5 Cache Model

A cache model in which there is one cache for instructions and another cache for data is called a "Harvard-style" cache. This is the model assumed by the Power ISA, e.g., in the descriptions of the Cache Management instructions in Section 4.3. Alternative cache models may be implemented (e.g., a "combined cache" model, in which a single cache is used for both instructions and data, or a model in which there are several levels of caches), but they support the programming model implied by a Harvard-style cache.

The processor is not required to maintain copies of storage locations in the instruction cache consistent with modifications to those storage locations (e.g., modifications caused by Store instructions).

A location in the data cache is considered to be modified in that cache if the location has been modified (e.g., by a Store instruction) and the modified data have not been written to main storage.

Cache Management instructions are provided so that programs can manage the caches when needed. For example, program management of the caches is needed when a program generates or modifies code that will be executed (i.e., when the program modifies data in storage and then attempts to execute the modified data as instructions). The Cache Management instructions are also useful in optimizing the use of memory bandwidth in such applications as graphics and numerically intensive computing. The functions performed by these instructions depend on the storage control attributes associated with the specified storage location (see Section 1.6, "Storage Control Attributes").

The Cache Management instructions allow the program to do the following.

- invalidate the copy of storage in an instruction cache block (icbi)
- provide a hint that an instruction will probably soon be accessed from a specified instruction cache block (icbt)
- provide a hint that the program will probably soon access a specified data cache block (dcbt, dcbtst)
- allocate a data cache block and set the contents of that block to zeros, but perform no operation if no write access is allowed to the data cache block (dcba)
- set the contents of a data cache block to zeros (dcbz)
- copy the contents of a modified data cache block to main storage (dcbst)
- copy the contents of a modified data cache block to main storage and make the copy of the block in the data cache invalid (dcbf or dcbfl)

1.6 Storage Control Attributes

Some operating systems may provide a means to allow programs to specify the storage control attributes described in this section. Because the support provided for these attributes by the operating system may vary between systems, the details of the specific system being used must be known before these attributes can be used.

Storage control attributes are associated with units of storage that are multiples of the page size. Each storage access is performed according to the storage control attributes of the specified storage location, as described below. The storage control attributes are the following.

- Write Through Required
- Caching Inhibited
- Memory Coherence Required
- Guarded
- Endianness
- Strong Access Order [Category: SAO]

These attributes have meaning only when an effective address is translated by the processor performing the storage access.

Additional storage control attributes may be defined for some implementations. See Section 6.8 of Book III-E for additional information.

Programming Note

    The Write Through Required and Caching Inhibited attributes are mutually exclusive because, as described below, the Write Through Required attribute permits the storage location to be in the data cache while the Caching Inhibited attribute does not.

    Storage that is Write Through Required or Caching Inhibited is not intended to be used for general-purpose programming. For example, the lbarx, lharx, lwarx, ldarx, stbcx., sthcx., stwcx., and stdcx. instructions may cause the system data storage error handler to be invoked if they specify a location in storage having either of these attributes.

In the remainder of this section, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load" unless they are explicitly excluded, and similarly for "Store instruction".

1.6.1 Write Through Required

A store to a Write Through Required storage location is performed in main storage. A Store instruction that specifies a location in Write Through Required storage may cause additional locations in main storage to be accessed.
If a copy of the block containing the specified location is retained in the data cache, the store is also performed in the data cache. The store does not cause the block to be considered to be modified in the data cache.

In general, accesses caused by separate Store instructions that specify locations in Write Through Required storage may be combined into one access. Such combining does not occur if the Store instructions are separated by a sync, eieio, or mbar instruction.

1.6.2 Caching Inhibited

An access to a Caching Inhibited storage location is performed in main storage. A Load instruction that specifies a location in Caching Inhibited storage may cause additional locations in main storage to be accessed unless the specified location is also Guarded. An instruction fetch from Caching Inhibited storage may cause additional words in main storage to be accessed. No copy of the accessed locations is placed into the caches.

In general, non-overlapping accesses caused by separate Load instructions that specify locations in Caching Inhibited storage may be combined into one access, as may non-overlapping accesses caused by separate Store instructions that specify locations in Caching Inhibited storage. Such combining does not occur if the Load or Store instructions are separated by a sync or mbar instruction. Combining may also occur among such accesses from multiple processors that share a common memory interface. No combining occurs if the storage is also Guarded.

Programming Note

    None of the memory barrier instructions prevent the combining of accesses from different processors. The Guarded storage attribute must be used in combination with Caching Inhibited to prevent such combining.

1.6.3 Memory Coherence Required [Category: Memory Coherence]

An access to a Memory Coherence Required storage location is performed coherently, as follows.

Memory coherence refers to the ordering of stores to a single location. Atomic stores to a given location are coherent if they are serialized in some order, and no processor or mechanism is able to observe any subset of those stores as occurring in a conflicting order. This serialization order is an abstract sequence of values; the physical storage location need not assume each of the values written to it. For example, a processor may update a location several times before the value is written to physical storage. The result of a store operation is not available to every processor or mechanism at the same instant, and it may be that a processor or mechanism observes only some of the values that are written to a location. However, when a location is accessed atomically and coherently by all processors and mechanisms, the sequence of values loaded from the location by any processor or mechanism during any interval of time forms a subsequence of the sequence of values that the location logically held during that interval. That is, a processor or mechanism can never load a "newer" value first and then, later, load an "older" value.

Memory coherence is managed in blocks called coherence blocks. Their size is implementation-dependent, but is larger than a word and is usually the size of a cache block.

For storage that is not Memory Coherence Required, software must explicitly manage memory coherence to the extent required by program correctness. The operations required to do this may be system-dependent.

Because the Memory Coherence Required attribute for a given storage location is of little use unless all processors that access the location do so coherently, in statements about Memory Coherence Required storage elsewhere in this document it is generally assumed that the storage has the Memory Coherence Required attribute for all processors that access it.

Programming Note

    Operating systems that allow programs to request that storage not be Memory Coherence Required should provide services to assist in managing memory coherence for such storage, including all system-dependent aspects thereof.

    In most systems the default is that all storage is Memory Coherence Required. For some applications in some systems, software management of coherence may yield better performance. In such cases, a program can request that a given unit of storage not be Memory Coherence Required, and can manage the coherence of that storage by using the sync instruction, the Cache Management instructions, and services provided by the operating system.

1.6.4 Guarded

A data access to a Guarded storage location is performed only if either (a) the access is caused by an instruction that is known to be required by the sequential execution model, or (b) the access is a load and the storage location is already in a cache. If the storage is also Caching Inhibited, only the storage location specified by the instruction is accessed; otherwise any storage location in the cache block containing the specified storage location may be accessed.

For the Server environment, instructions are not fetched from virtual storage that is Guarded. If the instruction addressed by the current instruction address is in such storage, the system instruction storage error handler may be invoked (see Section 6.5.5 of Book III-S).

Programming Note

    In some implementations, instructions may be executed before they are known to be required by the sequential execution model. Because the results of instructions executed in this manner are discarded if it is later determined that those instructions would not have been executed in the sequential execution model, this behavior does not affect most programs.

    This behavior does affect programs that access storage locations that are not "well-behaved" (e.g., a storage location that represents a control register on an I/O device that, when accessed, causes the device to perform an operation).
    To avoid unintended results, programs that access such storage locations should request that the storage be Guarded, and should prevent such storage locations from being in a cache (e.g., by requesting that the storage also be Caching Inhibited).

1.6.5 Endianness [Category: Embedded.Little-Endian]

The Endianness storage control attribute specifies the byte ordering (Big-Endian or Little-Endian) that is used when the storage location is accessed; see Section 1.10 of Book I.

1.6.6 Variable Length Encoded (VLE) Instructions

VLE storage is used to store VLE instructions. Instructions fetched from VLE storage are processed as VLE instructions. VLE storage must also be Big-Endian. Instructions fetched from VLE storage that is Little-Endian cause a Byte-ordering exception, and the system instruction storage error handler will be invoked. The VLE attribute has no effect on data accesses. See Chapter 1 of Book VLE.

1.6.7 Strong Access Order [Category: SAO]

All accesses to storage with the Strong Access Order (SAO) attribute (referred to as SAO storage) will be performed using a set of ordering rules different from that of the weakly consistent model that is described in Section 1.7.1, "Storage Access Ordering". These rules apply only to accesses that are caused by a Load or a Store, and not to accesses associated with those instructions. Furthermore, these rules do not apply to accesses that are caused by or associated with instructions that are stated in their descriptions to be "treated as a Load" or "treated as a Store." The details are described below, from the programmer's point of view. (The processor may deviate from these rules if the programmer cannot detect the deviation.)

The SAO attribute is not intended to be used for general purpose programming. It is provided in a manner that is not fully independent of the other storage attributes. Specifically, it is only provided for storage that is Memory Coherence Required, but not Write Through Required, not Caching Inhibited, and not Guarded. See Section 5.8.2.1, "Storage Control Bit Restrictions", in Book III-S for more details. Accesses to SAO storage are likely to be performed more slowly than similar accesses to non-SAO storage.

The order in which a processor performs storage accesses to SAO storage, the order in which those accesses are performed with respect to other processors and mechanisms, and the order in which those accesses are performed in main storage are the same except in the circumstances described in the following paragraph. The ordering rules for accesses performed by a single processor to SAO storage are as follows.

- Stores are performed in program order. When a store accesses data adjacent to that which is accessed by the next store in program order, the two storage accesses may be combined into a single larger access.
- Loads are performed in program order. When a load accesses data adjacent to that which is accessed by the next load in program order, the two storage accesses may be combined into a single larger access.
- Stores may not be performed before loads which precede them in program order.
- Loads may be performed before stores which precede them in program order, with the provision that a load which follows a store of the same datum (to the same address) must obtain a value which is no older (in consideration of the possibility of programs on other processors sharing the same storage) than the value stored by the preceding store.

When any given processor loads the datum it just stored, as described above, the load may be performed by the processor before the preceding store has been performed with respect to other processors and mechanisms, and in main storage. This may cause the processor to see its store earlier relative to stores performed by other processors than it is observed by other processors and mechanisms, and than it is performed in memory. A direct consequence of this consideration is that although programs running on each processor will see the same sequence of accesses from any individual processor to SAO storage, each may in general see a different interleaving of the individual sequences. The memory barrier instructions may be used to establish stronger ordering, as described in Section 1.7.1, "Storage Access Ordering", beginning with the third major bullet.

1.7 Shared Storage

This architecture supports the sharing of storage between programs, between different instances of the same program, and between processors and other mechanisms. It also supports access to a storage location by one or more programs using different effective addresses. All these cases are considered storage sharing. Storage is shared in blocks that are an integral number of pages.

When the same storage location has different effective addresses, the addresses are said to be aliases. Each application can be granted separate access privileges to aliased pages.

1.7.1 Storage Access Ordering

The Power ISA defines two models for the ordering of storage accesses: weakly consistent and strong access ordering. The predominant model is weakly consistent. This model provides an opportunity for improved performance over a model that has stronger consistency rules, but places the responsibility on the program to ensure that ordering or synchronization instructions are properly placed when storage is shared by two or more programs. Implementations which support Category SAO apply a stronger consistency model among accesses to SAO storage. The order between accesses to SAO storage and those performed using the weakly consistent model is characteristic of the weakly consistent model. The following description, through the second major bullet, applies only to the weakly consistent model. The corresponding description for SAO storage is found in Section 1.6.7, "Strong Access Order [Category: SAO]". The rest of the description following the second bulleted item applies to both models.

The order in which the processor performs storage accesses, the order in which those accesses are performed with respect to another processor or mechanism, and the order in which those accesses are performed in main storage may all be different. Several means of enforcing an ordering of storage accesses are provided to allow programs to share storage with other programs, or with mechanisms such as I/O devices. These means are listed below. The phrase "to the extent required by the associated Memory Coherence Required attributes" refers to the Memory Coherence Required attribute, if any, associated with each access.

- If two Store instructions or two Load instructions specify storage locations that are both Caching Inhibited and Guarded, the corresponding storage accesses are performed in program order with respect to any processor or mechanism.

- If a Load instruction depends on the value returned by a preceding Load instruction (because the value is used to compute the effective address specified by the second Load), the corresponding storage accesses are performed in program order with respect to any processor or mechanism to the extent required by the associated Memory Coherence Required attributes. This applies even if the dependency has no effect on program logic (e.g., the value returned by the first Load is ANDed with zero and then added to the effective address specified by the second Load).

- When a processor (P1) executes a Synchronize, eieio, or mbar instruction a memory barrier is created, which orders applicable storage accesses pairwise, as follows. Let A be a set of storage accesses that includes all storage accesses associated with instructions preceding the barrier-creating instruction, and let B be a set of storage accesses that includes all storage accesses associated with instructions following the barrier-creating instruction. For each applicable pair a_i, b_j of storage accesses such that a_i is in A and b_j is in B, the memory barrier ensures that a_i will be performed with respect to any processor or mechanism, to the extent required by the associated Memory Coherence Required attributes, before b_j is performed with respect to that processor or mechanism.

  The ordering done by a memory barrier is said to be "cumulative" if it also orders storage accesses that are performed by processors and mechanisms other than P1, as follows.

  - A includes all applicable storage accesses by any such processor or mechanism that have been performed with respect to P1 before the memory barrier is created.
  - B includes all applicable storage accesses by any such processor or mechanism that are performed after a Load instruction executed by that processor or mechanism has returned the value stored by a store that is in B.

No ordering should be assumed among the storage accesses caused by a single instruction (i.e., by an instruction for which the access is not atomic), even if the accesses are to SAO storage, and no means are provided for controlling that order.

Programming Note

    Because stores cannot be performed "out-of-order" (see Book III), if a Store instruction depends on the value returned by a preceding Load instruction (because the value returned by the Load is used to compute either the effective address specified by the Store or the value to be stored), the corresponding storage accesses are performed in program order. The same applies if whether the Store instruction is executed depends on a conditional Branch instruction that in turn depends on the value returned by a preceding Load instruction.

    Because an isync instruction prevents the execution of instructions following the isync until instructions preceding the isync have completed, if an isync follows a conditional Branch instruction that depends on the value returned by a preceding Load instruction, the load on which the Branch depends is performed before any loads caused by instructions following the isync. This applies even if the effects of the "dependency" are independent of the value loaded (e.g., the value is compared to itself and the Branch tests the EQ bit in the selected CR field), and even if the branch target is the sequentially next instruction.

    not order the Store Conditional's store with respect to storage accesses caused by instructions that follow the Branch.

    Because processors may predict branch target addresses and branch condition resolution, control dependencies (e.g., branches) do not order storage accesses except as described above. For example, when a subroutine returns to its caller the return address may be predicted, with the result that loads caused by instructions at or after the return address may be performed before the load that obtains the return address is performed.

    Because processors may implement nonarchitected duplicates of architected resources (e.g., GPRs, CR fields, and the Link Register), resource dependencies (e.g., specification of the same target register for two Load instructions) do not order storage accesses.

    Examples of correct uses of dependencies, sync, lwsync, and eieio to order storage accesses can be found in Appendix B, "Programming Examples for Sharing Storage".
Because the storage model is weakly consistent, the With the exception of the cases described above and sequential execution model as applied to instructions earlier in this section, data dependencies and control that cause storage accesses guarantees only that dependencies do not order storage accesses. Exam- those accesses appear to be performed in program ples include the following. order with respect to the processor executing the instructions. For example, an instruction may com- If a Load instruction specifies the same storage plete, and subsequent instructions may be executed, location as a preceding Store instruction and the before storage accesses caused by the first instruction location is in storage that is not Caching Inhibited, have been performed. However, for a sequence of the load may be satisfied from a "store queue" (a atomic accesses to the same storage location, if the buffer into which the processor places stored val- location is in storage that is Memory Coherence ues before presenting them to the storage sub- Required the definition of coherence guarantees that system), and not be visible to other processors the accesses are performed in program order with and mechanisms. A consequence is that if a sub- respect to any processor or mechanism that accesses sequent Store depends on the value returned by the location coherently, and similarly if the location is in the Load, the two stores need not be performed in storage that is Caching Inhibited. program order with respect to other processors and mechanisms. 
Because accesses to storage that is Caching Inhibited Because a Store Conditional instruction may com- are performed in main storage, memory barriers and plete before its store has been performed, a condi- dependencies on Load instructions order such tional Branch instruction that depends on the CR0 accesses with respect to any processor or mechanism value set by a Store Conditional instruction does even if the storage is not Memory Coherence Required. Chapter 1. Storage Model 661 Version 2.06 word forms lbarx, stbcx., lharx, sthcx., ldarx, and Programming Note stdcx. is the same except for obvious substitutions. The first example below illustrates cumulative ordering of storage accesses preceding a memory The lwarx instruction is a load from a word-aligned barrier, and the second illustrates cumulative location that has two side effects. Both of these side ordering of storage accesses following a memory effects occur at the same time that the load is per- barrier. Assume that locations X, Y, and Z initially formed. contain the value 0. Example 1: Processor A: stores the value 1 to location X Processor B: loads from location X obtaining the value 1, executes a sync instruction, then stores the value 2 to location Y Processor C: loads from location Y obtaining the value 2, executes a sync instruction, then loads from location X Example 2: Processor A: stores the value 1 to location X, executes a sync instruction, then stores the value 2 to location Y Processor B: loops loading from location Y until the value 2 is obtained, then stores the value 3 to location Z Processor C: loads from location Z obtaining the value 3, executes a sync instruction, then loads from location X In both cases, cumulative ordering dictates that the value loaded from location X by processor C is 1. 1.7.2 Storage Ordering of I/O Accesses A "coherence domain" consists of all processors and all interfaces to main storage. 
Memory reads and writes initiated by mechanisms outside the coherence domain are performed within the coherence domain in the order in which they enter the coherence domain and are performed as coherent accesses. 1.7.3 Atomic Update The Load And Reserve and Store Conditional instruc- tions together permit atomic update of a shared storage location. There are byte, halfword, word, and double- word forms of each of these instructions. Described here is the operation of the word forms lwarx and stwcx.; operation of the byte, halfword, and double- 662 Power ISATM Book II Version 2.06 1. A reservation for a subsequent stwcx. instruction Programming Note is created. Before reassigning a virtual address to a different 2. The memory coherence mechanism is notified that real page, privileged software may need to clear all a reservation exists for the storage location speci- processors' reservations for the original real page fied by the lwarx. in order to avoid a Store Conditional being suc- The stwcx. instruction is a store to a word-aligned loca- cessful only because the corresponding reserva- tion that is conditioned on the existence of the reserva- tion for the original location is not cleared by a store tion created by the lwarx and on whether the same to the new real page by some other processor or storage location is specified by both instructions. To mechanism. This clearing of reservations is unnec- emulate an atomic operation with these instructions, it essary on processors that support the Store Condi- is necessary that both the lwarx and the stwcx. specify tional Page Mobility category. the same storage location. The Store Conditional Page Mobility category does A stwcx. 
performs a store to the target storage location not provide a mechanism for the Store Conditional only if the storage location specified by the lwarx that instruction to detect that a virtual page has been established the reservation has not been stored into by moved to a new real page and back again to the another processor or mechanism since the reservation original real page that was accessed by a Load and was created. If the storage locations specified by the Reserve instruction. Privileged software that moves two instructions differ, the store is not necessarily per- a virtual page could clear the reservation on the formed except that if the Store Conditional Page Mobil- processor it is running on in order to ensure that a ity category is supported and the storage locations are Store Conditional instruction executed by that pro- in different naturally aligned blocks of real storage cessor does not succeed in this case. (The stores whose size is the smallest real page size supported by that occur naturally as part of moving the virtual the implementation, the store is not performed. page will cause any reservations, held by other processors, in the target real page to be lost.) A stwcx. that performs its store is said to "succeed". Examples of the use of lwarx and stwcx. are given in Appendix B. "Programming Examples for Sharing Stor- 1.7.3.1 Reservations age" on page 719. The ability to emulate an atomic operation using lwarx A successful stwcx. to a given location may complete and stwcx. is based on the conditional behavior of before its store has been performed with respect to stwcx., the reservation created by lwarx, and the other processors and mechanisms. As a result, a sub- clearing of that reservation if the target storage location sequent load or lwarx from the given location by is modified by another processor or mechanism before another processor may return a "stale" value. However, the stwcx. performs its store. 
a subsequent lwarx from the given location by the A reservation is held on an aligned unit of real storage other processor followed by a successful stwcx. by called a reservation granule. The size of the reserva- that processor is guaranteed to have returned the value tion granule is 2n bytes, where n is implementation- stored by the first processor's stwcx. (in the absence of dependent but is always at least 4 (thus the minimum other stores to the given location). reservation granule size is a quadword) and, if the If a Store Conditional instruction is used with a preced- Store Conditional Page Mobility category is supported, ing Load and Reserve instruction that has a different where 2n is not larger than the smallest real page size storage operand length (e.g., stwcx. with ldarx), the supported by the implementation. The reservation reservation is cleared and it is undefined whether the granule associated with effective address EA contains store is performed. the real address to which EA maps. ("real_addr(EA)" in the RTL for the Load And Reserve and Store Condi- Programming Note tional instructions stands for "real address to which EA maps".) The reservation also has an associated length, The store caused by a successful stwcx. is which is equal to the storage operand length, in bytes, ordered, by a dependence on the reservation, with of the Load and Reserve instruction that established respect to the load caused by the lwarx that estab- the reservation. lished the reservation, such that the two storage accesses are performed in program order with A processor has at most one reservation at any time. A respect to any processor or mechanism. reservation is established by executing a lbarx, lharx, lwarx, or ldarx instruction, as described below, and is lost (or may be lost, in the case of the fourth, fifth, sixth, seventh and ninth item) if any of the following occur. Items 1-8 apply only if the relevant access is performed. 
(For example, an access that would ordinarily be caused by an instruction might not be performed if the instruction causes the system error handler to be invoked.)

1. The processor holding the reservation executes another lbarx, lharx, lwarx, or ldarx: this clears the first reservation and establishes a new one.

2. The processor holding the reservation executes any stbcx., sthcx., stwcx., or stdcx., regardless of whether the specified address matches the address specified by the lbarx, lharx, lwarx, or ldarx that established the reservation, and regardless of whether the storage operand lengths of the two instructions are the same.

3. Some other processor executes a Store, dcbz, or dcbzep that specifies a location in the same reservation granule.

4. Some other processor executes a dcbtst, dcbtstep, or dcbtstls that specifies a location in the same reservation granule: whether the reservation is lost is undefined. (For a dcbtst instruction that specifies a data stream, "location" in the preceding sentence includes all locations in the data stream.)

5. Some other processor executes a dcba that specifies a location in the same reservation granule: the reservation is lost if the instruction causes the target block to be newly established in a data cache or to be modified; otherwise whether the reservation is lost is undefined.

6. Some other processor executes a dcbi that specifies a location in the same reservation granule: the reservation may be lost if the instruction is treated as a Store.

7. Any processor modifies a Reference or Change bit (see Book III-S) in the same reservation granule: whether the reservation is lost is undefined.

8. Some mechanism other than a processor modifies a storage location in the same reservation granule.

9. An interrupt (see Book III) occurs on the processor holding the reservation: for the Embedded environment, whether the reservation is lost is undefined. (For the Server environment the reservation is not lost. However, for both environments, system software invoked by interrupts may clear the reservation.)

10. Implementation-specific characteristics of the coherence mechanism cause the reservation to be lost.

Programming Note

    One use of lwarx and stwcx. is to emulate a "Compare and Swap" primitive like that provided by the IBM System/370 Compare and Swap instruction; see Section B.1, "Atomic Update Primitives" on page 719. A System/370-style Compare and Swap checks only that the old and current values of the word being tested are equal, with the result that programs that use such a Compare and Swap to control a shared resource can err if the word has been modified and the old value subsequently restored. The combination of lwarx and stwcx. improves on such a Compare and Swap, because the reservation reliably binds the lwarx and stwcx. together. The reservation is always lost if the word is modified by another processor or mechanism between the lwarx and stwcx., so the stwcx. never succeeds unless the word has not been stored into (by another processor or mechanism) since the lwarx.

Programming Note

    In general, programming conventions must ensure that lwarx and stwcx. specify addresses that match; a stwcx. should be paired with a specific lwarx to the same storage location. Situations in which a stwcx. may erroneously be issued after some lwarx other than that with which it is intended to be paired must be scrupulously avoided. For example, there must not be a context switch in which the processor holds a reservation in behalf of the old context, and the new context resumes after a lwarx and before the paired stwcx.. The stwcx. in the new context might succeed, which is not what was intended by the programmer. Such a situation must be prevented by executing a stbcx., sthcx., stwcx., or stdcx. that specifies a dummy writable aligned location as part of the context switch; see Section 6.4.3 of Book III-S and Section 7.5 of Book III-E.

Programming Note

    Because the reservation is lost if another processor stores anywhere in the reservation granule, lock words (or bytes, halfwords, or doublewords) should be allocated such that few such stores occur, other than perhaps to the lock word itself. (Stores by other processors to the lock word result from contention for the lock, and are an expected consequence of using locks to control access to shared storage; stores to other locations in the reservation granule can cause needless reservation loss.) Such allocation can most easily be accomplished by allocating an entire reservation granule for the lock and wasting all but one word. Because reservation granule size is implementation-dependent, portable code must do such allocation dynamically.

    Similar considerations apply to other data that are shared directly using lwarx and stwcx. (e.g., pointers in certain linked lists; see Section B.3, "List Insertion" on page 723).

1.7.3.2 Forward Progress

Forward progress in loops that use lwarx and stwcx. is achieved by a cooperative effort among hardware, system software, and application software.

The architecture guarantees that when a processor executes a lwarx to obtain a reservation for location X and then a stwcx. to store a value to location X, either

1. the stwcx. succeeds and the value is written to location X, or

2. the stwcx. fails because some other processor or mechanism modified location X, or

3. the stwcx. fails because the processor's reservation was lost for some other reason.

In Cases 1 and 2, the system as a whole makes progress in the sense that some processor successfully modifies location X. Case 3 covers reservation loss required for correct operation of the rest of the system. This includes cancellation caused by some other processor or mechanism writing elsewhere in the reservation granule, cancellation caused by the operating system in managing certain limited resources such as real storage, and cancellation caused by any of the other effects listed in Section 1.7.3.1.

An implementation may make a forward progress guarantee, defining the conditions under which the system as a whole makes progress. Such a guarantee must specify the possible causes of reservation loss in Case 3. While the architecture alone cannot provide such a guarantee, the characteristics listed in Cases 1 and 2 are necessary conditions for any forward progress guarantee. An implementation and operating system can build on them to provide such a guarantee.

Programming Note

    The architecture does not include a "fairness guarantee". In competing for a reservation, two processors can indefinitely lock out a third.

1.8 Instruction Storage

The instruction execution properties and requirements described in this section, including its subsections, apply only to instruction execution that is required by the sequential execution model.

In this section, including its subsections, it is assumed that all instructions for which execution is attempted are in storage that is not Caching Inhibited and (unless instruction address translation is disabled; see Book III-S) is not Guarded, and from which instruction fetching does not cause the system error handler to be invoked (e.g., from which instruction fetching is not prohibited by the "address translation mechanism" or the "storage protection mechanism"; see Book III).

Programming Note

    The results of attempting to execute instructions from storage that does not satisfy this assumption are described in Section 1.6.2 and Section 1.6.4 of this Book and in Book III.

For each instance of executing an instruction from location X, the instruction may be fetched multiple times.

The instruction cache is not necessarily kept consistent with the data cache or with main storage. It is the responsibility of software to ensure that instruction storage is consistent with data storage when such consistency is required for program correctness.

After one or more bytes of a storage location have been modified and before an instruction located in that storage location is executed, software must execute the appropriate sequence of instructions to make instruction storage consistent with data storage. Otherwise the result of attempting to execute the instruction is boundedly undefined except as described in Section 1.8.1, "Concurrent Modification and Execution of Instructions" on page 668.

Programming Note

    Following are examples of how to make instruction storage consistent with data storage. Because the optimal instruction sequence to make instruction storage consistent with data storage may vary between systems, many operating systems will provide a system service to perform this function.

    Case 1: The given program does not modify instructions executed by another program nor does another program modify the instructions executed by the given program.

    Assume that location X previously contained the instruction A_0; the program modified one or more bytes of that location such that, in data storage, the location contains the instruction A_1; and location X is wholly contained in a single cache block. The following instruction sequence will make instruction storage consistent with data storage such that if the isync was in location X-4, the instruction A_1 in location X would be executed immediately after the isync.

        dcbst X       #copy the block to main storage
        sync          #order copy before invalidation
        icbi  X       #invalidate copy in instr cache
        isync         #discard prefetched instructions

    Case 2: One or more programs execute the instructions that are concurrently being modified by another program.

    Assume program A has modified the instruction at location X and other programs are waiting for program A to signal that the new instruction is ready to execute. The following instruction sequence will make instruction storage consistent with data storage and then set a flag to indicate to the waiting programs that the new instruction can be executed.

        li    r0,1    #put a 1 value in r0
        dcbst X       #copy the block to main storage
        sync          #order copy before invalidation
        icbi  X       #invalidate copy in instr cache
        sync          #order invalidation before store
                      # to flag
        stw   r0,flag #set flag indicating instruction
                      # storage is now consistent

    The following instruction sequence, executed by the waiting program, will prevent the waiting programs from executing the instruction at location X until location X in instruction storage is consistent with data storage, and then will cause any prefetched instructions to be discarded.

        lwz   r0,flag #loop until flag = 1 (when 1 is
        cmpwi r0,1    # loaded, location X in inst'n
        bne   $-8     # storage is consistent with
                      # location X in data storage)
        isync         #discard any prefetched inst'ns

    In the preceding instruction sequence any context synchronizing instruction (e.g., rfid) can be used instead of isync. (For Case 1 only isync can be used.)

    For both cases, if two or more instructions in separate data cache blocks have been modified, the dcbst instruction in the examples must be replaced by a sequence of dcbst instructions such that each block containing the modified instructions is copied back to main storage. Similarly, for icbi the sequence must invalidate each instruction cache block containing a location of an instruction that was modified. The sync instruction that appears above between "dcbst X" and "icbi X" would be placed between the sequence of dcbst instructions and the sequence of icbi instructions.

1.8.1 Concurrent Modification and Execution of Instructions

The phrase "concurrent modification and execution of instructions" (CMODX) refers to the case in which a processor fetches and executes an instruction from instruction storage which is not consistent with data storage or which becomes inconsistent with data storage prior to the completion of its processing. This section describes the only case in which executing this instruction under these conditions produces defined results.

In the remainder of this section the following terminology is used.

- Location X is an arbitrary word-aligned storage location.

- X_0 is the value of the contents of location X for which software has made the location X in instruction storage consistent with data storage.

- X_1, X_2, ..., X_n are the sequence of the first n values occupying location X after X_0.

- X_n is the first value of X subsequent to X_0 for which software has again made instruction storage consistent with data storage.

- The "patch class" of instructions consists of the I-form Branch instruction (b[l][a]) and the preferred no-op instruction (ori 0,0,0).

If the instruction from location X is executed after the copy of location X in instruction storage is made consistent for the value X_0 and before it is made consistent for the value X_n, the results of executing the instruction are defined if and only if the following conditions are satisfied.

1. The stores that place the values X_1, ..., X_n into location X are atomic stores that modify all four bytes of location X.

2. Each X_i, 0 ≤ i ≤ n, is a patch class instruction.

3. Location X is in storage that is Memory Coherence Required.

If these conditions are satisfied, the result of each execution of an instruction from location X will be the execution of some X_i, 0 ≤ i ≤ n. The value of the ordinate i associated with each value executed may be different and the sequence of ordinates i associated with a sequence of values executed is not constrained (e.g., a valid sequence of executions of the instruction at location X could be the sequence X_i, X_{i+2}, then X_{i-1}). If these conditions are not satisfied, the results of each such execution of an instruction from location X are boundedly undefined, and may include causing inconsistent information to be presented to the system error handler.

Programming Note

    An example of how failure to satisfy the requirements given above can cause inconsistent information to be presented to the system error handler is as follows. If the value X_0 (an illegal instruction) is executed, causing the system illegal instruction handler to be invoked, and before the error handler can load X_0 into a register, X_0 is replaced with X_1, an Add Immediate instruction, it will appear that a legal instruction caused an illegal instruction exception.

Programming Note

    It is possible to apply a patch or to instrument a given program without the need to suspend or halt the program. This can be accomplished by modifying the example shown in the Programming Note at the end of Section 1.8 where one program is creating instructions to be executed by one or more other programs.

    In place of the Store to a flag to indicate to the other programs that the code is ready to be executed, the program that is applying the patch would replace a patch class instruction in the original program with a Branch instruction that would cause any program executing the Branch to branch to the newly created code. The first instruction in the newly created code must be an isync, which will cause any prefetched instructions to be discarded, ensuring that the execution is consistent with the newly created code. The instruction storage location containing the isync instruction in the patch area must be consistent with data storage with respect to the processor that will execute the patched code before the Store which stores the new Branch instruction is performed.

Programming Note

    It is believed that all processors that comply with versions of the architecture that precede Version 2.01 support concurrent modification and execution of instructions as described in this section if the requirements given above are satisfied, and that most such processors yield boundedly undefined results if the requirements given above are not satisfied. However, in general such support has not been verified by processor testing. Also, one such processor is known to yield undefined results in certain cases if the requirements given above are not satisfied.

Chapter 2. Effect of Operand Placement on Performance

2.1 Instruction Restart

The placement (location and alignment) of operands in storage affects relative performance of storage accesses, and may affect it significantly. The best performance is guaranteed if storage operands are aligned. In order to obtain the best performance across the widest range of implementations, the programmer should assume the performance model described in Figure 1 with respect to the placement of storage operands for the Embedded environment. For the Server environment, Figure 1 applies for Big-Endian byte ordering, and Little-Endian byte ordering.

Performance of storage accesses varies depending on the following:

1. Operand Size
2. Operand Alignment
3. Crossing no boundary
4. Crossing a cache block boundary
5. Crossing a virtual page boundary

The Move Assist instructions have no alignment requirements.

             Operand                           Boundary Crossing
             Size              Byte Align.     None         Cache Block   Virtual Page(2)
    Integer  8 Byte            8               optimal      -             -
                               4               good         good          good
                               <4              good         good          good
             4 Byte            4               optimal      -             -
                               <4              good         good          good
             2 Byte            2               optimal      -             -
                               <2              good         good          good
             1 Byte            1               optimal      -             -
             lmw, stmw         4               good         good          good
                               <4              poor         poor          poor
             string                            good         good          good
    Float    lfdp, lfdpx,      16              optimal      -             -
             stfdp, stfdpx     <16             poor         poor          poor
             8 Byte            8               optimal      -             -
                               4               good         good          poor
                               <4              poor         poor          poor
             4 Byte            4               optimal      -             -
                               <4              poor         poor          poor
    Vector   any               any             optimal(3)   -             -

    (1) If an instruction causes an access that is not atomic and any portion of the operand is in storage that is Write Through Required or Caching Inhibited, performance is likely to be poor.
    (2) If the storage operand spans two virtual pages that have different storage control attributes or, in the Server environment, spans two segments, performance is likely to be poor.
    (3) The storage operands for Vector instructions are all assumed to be aligned (see Section 6.4 of Book I).

Figure 1. Performance effects of storage operand placement

2.1 Instruction Restart

In this section, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

The following instructions are never restarted after having accessed any portion of the storage operand (unless the instruction causes a "Data Address Breakpoint match", for which the corresponding rules are given in Book III).

1. A Store instruction that causes an atomic access and, for the Embedded environment, accesses storage that is Guarded

2. A Load instruction that causes an atomic access to storage that is Guarded and, for the Server environment, is also Caching Inhibited.

Any other Load or Store instruction may be partially executed and then aborted after having accessed a portion of the storage operand, and then re-executed (i.e., restarted, by the processor or the operating system). If an instruction is partially executed, the contents of registers are preserved to the extent that the correct result will be produced when the instruction is re-executed. Additional restrictions on the partial execution of instructions are described in Section 6.6 of Book III-S and Section 7.7 of Book III-E.

Programming Note

    There are many events that might cause a Load or Store instruction to be restarted. For example, a hardware error may cause execution of the instruction to be aborted after part of the access has been performed, and the recovery operation could then cause the aborted instruction to be re-executed.

    When an instruction is aborted after being partially executed, the contents of the instruction pointer indicate that the instruction has not been executed; however, the contents of some registers may have been altered and some bytes within the storage operand may have been accessed. The following are examples of an instruction being partially executed and altering the program state even though it appears that the instruction has not been executed.

    1. Load Multiple, Load String: Some registers in the range of registers to be loaded may have been altered.

    2. Any Store instruction, dcbz: Some bytes of the storage operand may have been altered.

Programming Note

    In order to ensure that the contents of registers are preserved to the extent that a partially executed instruction can be re-executed correctly, the registers that are preserved must satisfy the following conditions. For any given instruction, zero or more of the conditions applies.
For a fixed-point Load instruction that is not a multiple or string form, or for an eciwx instruc- tion, if RT=RA or RT=RB then the contents of register RT are not altered. For an update form Load or Store instruction, the contents of register RA are not altered. Chapter 2. Effect of Operand Placement on Performance 673 Version 2.06 674 Power ISATM Book II Version 2.06 Chapter 3. Management of Shared Resources 3.1 Program Priority Registers . . . . . . 675 3.2 "or" Instruction . . . . . . . . . . . . . . . 676 The facilities described in this section provide the Programming Note means to control the use of resources that are shared with other processors. The ability to access the low-order half of the PPR (and thus the use of mfppr and mtppr) might be phased out in a future version of the architecture. 3.1 Program Priority Registers Programming Note The Program Priority Register (PPR) is a 64-bit register that controls the program's priority. The PPR provides By setting the PRI field, a programmer may be able access to the full 64-bit PPR, and the Program Priority to improve system throughput by causing system Register 32-bit (PPR32) provides access to the upper resources to be used more efficiently. 32 bits of the PPR. The Embedded environment only E.g., if a program is waiting on a lock (see Section provides access to PPR32. The layouts of the PPR and B.2), it could set low priority, with the result that PPR32 are shown in Figure 2. more processor resources would be diverted to the program that holds the lock. This diversion of PPR [Category: Server]: resources may enable the lock-holding program to /// PRI /// ??? /// ??? complete the operation under the lock more 0 11 14 26 32 44 63 quickly, and then relinquish the lock to the waiting PPR32 [Category: Phased-In] program. /// PRI /// ??? 32 43 46 58 63 Programming Note Bit(s) Description or Rx,Rx,Rx can be used to modify the PRI field; see Section 3.2. 
11:13 Program Priority (PRI) (PPR3243:45) Programming Note 010 low 011 medium low (normal) When the system error handler is invoked, the PRI 100 medium field may be set to an undefined value. If other values are written to this field, the PRI field is not changed. (See Section 4.3.4 of Book III-S for additional information.) 26:31 Implementation-specific (PPR3258:63) 44:63 implementation-specific All other fields are reserved. Figure 2. Program Priority Register Chapter 3. Management of Shared Resources 675 Version 2.06 3.2 "or" Instruction Setting the PPR The or Rx,Rx,Rx (see Book I) instruction can be used to set PPRPRI as shown in Figure 3. or. Rx,Rx,Rx does not set PPRPRI. Rx PPRPRI Priority 1 010 low 6 011 medium low (normal) 2 100 medium Figure 3. Priority levels for or Rx,Rx,Rx The following forms of or Rx,Rx,Rx provide hints about usage of shared processor resources. "or" Shared Resource Hints or 27,27,27 This form of or provides a hint that performance will probably be improved if shared resources ded- icated to the executing processor are released for use by other processors. or 29,29,29 This form of or provides a hint that performance will probably be improved if shared resources ded- icated to the executing processor are released until all outstanding storage accesses to caching- inhibited storage have been completed. or 30,30,30 This form of or provides a hint that performance will probably be improved if shared resources ded- icated to the executing processor are released until all outstanding storage accesses to cache- able storage for which the data is not in the cache have been completed. Extended Mnemonics: Additional extended mnemonics for the or hints: Extended: Equivalent to: yield or 27,27,27 mdoio or 29,29,29 mdoom or 30,30,30 Programming Note Warning: Other forms of or Rx,Rx,Rx that are not described in this section may also cause program priority to change. 
Use of these forms should be avoided except when software explicitly intends to alter program priority. If a no-op is needed, the preferred no-op (ori 0,0,0) should be used.

Chapter 4. Storage Control Instructions

  4.1 Parameters Useful to Application Programs
  4.2 Data Stream Control Register (DSCR) [Category: Stream]
  4.3 Cache Management Instructions
  4.3.1 Instruction Cache Instructions
  4.3.2 Data Cache Instructions
  4.3.2.1 Obsolete Data Cache Instructions [Category: Vector.Phased-Out]
  4.4 Synchronization Instructions
  4.4.1 Instruction Synchronize Instruction
  4.4.2 Load and Reserve and Store Conditional Instructions
  4.4.2.1 64-Bit Load and Reserve and Store Conditional Instructions [Category: 64-Bit]
  4.4.3 Memory Barrier Instructions
  4.4.4 Wait Instruction

4.1 Parameters Useful to Application Programs

It is suggested that the operating system provide a service that allows an application program to obtain the following information.

1. The virtual page sizes
2. Coherence block size
3. Reservation granule size
4. An indication of the cache model implemented (e.g., Harvard-style cache, combined cache)
5. Instruction cache size
6. Data cache size
7. Instruction cache block size
8. Data cache block size
9. Instruction cache associativity
10. Data cache associativity
11. Number of stream IDs supported for the stream variant of dcbt
12. Factors for converting the Time Base to seconds

If the caches are combined, the same value should be given for an instruction cache attribute and the corresponding data cache attribute.

4.2 Data Stream Control Register (DSCR) [Category: Stream]

The layout of the Data Stream Control Register (DSCR) is shown in Figure 4 below. See Section 4.3.2 for information on streams.

  //     SNSE   SSE    DPFD
  0      59     60     61     63

Figure 4. Data Stream Control Register

  Bits    Name   Description
  59      SNSE   Stride-N Stream Enable
                 This bit enables the hardware detection and initiation of load and store streams that have a stride greater than a single cache block. Such store streams are detected only when SSE is also one.
  60      SSE    Store Stream Enable
                 This bit enables hardware detection and initiation of store streams.
  61:63   DPFD   Default Prefetch Depth
                 This field supplies a prefetch depth for hardware-detected streams and for software-defined streams for which a depth of zero is specified or for which dcbt/dcbtst with TH=0b01010 is not used in their description. Values and their meanings are as follows.
                   0  default (LPCR[DPFD])
                   1  none
                   2  shallowest
                   3  shallow
                   4  medium
                   5  deep
                   6  deeper
                   7  deepest

Programming Note
The SNSE and SSE fields do not affect the initiation of streams specified using the dcbt and dcbtst instructions.

Note that even when SNSE is not set, hardware may detect Stride-N streams in intervals when they access elements that map to sequential cache blocks.

Programming Note
The contents of the DSCR are intended to be managed by application programs. Access to the DSCR is privileged because, when the DSCR was added to the architecture, adding it as non-privileged would have been incompatible with the application binary interface (ABI) of some operating systems. Operating systems will provide a service that allows application programs to manage the contents of the DSCR.

The contents of the DSCR affect how a processor handles hardware-detected and software-defined data streams. A move to the DSCR causes all active and nascent data streams to cease to exist. Access to this SPR is privileged; see Book III.
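The DSCR layout in Figure 4 can be illustrated with a small bit-packing sketch. This is only an illustration of the field positions (SNSE at bit 59, SSE at bit 60, DPFD at bits 61:63, with bit 0 as the most significant bit); the names pack_dscr, unpack_dscr, and PREFETCH_DEPTHS are hypothetical and are not part of the architecture or of any operating system service.

```python
# Illustrative sketch only: packs and unpacks the DSCR fields of Figure 4.
# Power ISA numbers bits from the most significant end, so bit 63 is the
# least significant bit of the 64-bit register.

REG_BITS = 64

# DPFD encodings as listed in the field description (names are hypothetical).
PREFETCH_DEPTHS = {
    "default": 0, "none": 1, "shallowest": 2, "shallow": 3,
    "medium": 4, "deep": 5, "deeper": 6, "deepest": 7,
}


def field_shift(last_bit: int) -> int:
    """Left-shift amount for a field whose lowest-order bit is `last_bit`."""
    return REG_BITS - 1 - last_bit


def pack_dscr(snse: bool, sse: bool, dpfd: int) -> int:
    """Build a DSCR image from SNSE (bit 59), SSE (bit 60), DPFD (bits 61:63)."""
    if not 0 <= dpfd <= 7:
        raise ValueError("DPFD is a 3-bit field")
    return (
        (int(snse) << field_shift(59))
        | (int(sse) << field_shift(60))
        | (dpfd << field_shift(63))
    )


def unpack_dscr(value: int) -> dict:
    """Split a DSCR image back into its three defined fields."""
    return {
        "snse": bool((value >> field_shift(59)) & 1),
        "sse": bool((value >> field_shift(60)) & 1),
        "dpfd": (value >> field_shift(63)) & 0b111,
    }


if __name__ == "__main__":
    image = pack_dscr(snse=True, sse=True, dpfd=PREFETCH_DEPTHS["medium"])
    print(hex(image), unpack_dscr(image))
```

Because access to the DSCR is privileged, an application would pass such a value to whatever service its operating system provides rather than writing the register directly.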
4.3 Cache Management Instructions

The Cache Management instructions obey the sequential execution model except as described in Section 4.3.1.

In the instruction descriptions the statements "this instruction is treated as a Load" and "this instruction is treated as a Store" mean that the instruction is treated as a Load (Store) from (to) the addressed byte with respect to address translation, the definition of program order on page 653, storage protection, reference and change recording, and the storage access ordering described in Section 1.7.1, and is treated as a Read (Write) from (to) the addressed byte with respect to debug events unless otherwise specified. (See Book III-E.)

Programming Note
Accesses that are caused by or associated with Cache Management instructions that are "treated as a Load" or "treated as a Store" are not subject to the special ordering rules described for SAO storage. These accesses are always performed in accordance with the weakly consistent storage model.

Some Cache Management instructions contain a CT field that is used to specify a cache level within a cache hierarchy or a portion of a cache structure to which the instruction is to be applied. The correspondence between the CT value specified and the cache level is shown below.

  CT Field Value   Cache Level
  0                Primary Cache
  2                Secondary Cache

CT values not shown above may be used to specify implementation-dependent cache levels or implementation-dependent portions of a cache structure.

4.3.1 Instruction Cache Instructions

Instruction Cache Block Invalidate X-form

icbi RA,RB

  31     ///    RA     RB     982    /
  0      6      11     16     21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any processor, the block is invalidated in those instruction caches.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the instruction cache of this processor, the block is invalidated in that instruction cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

This instruction is treated as a Load (see Section 4.3), except that reference and change recording need not be done.

Special Registers Altered:
  None

Programming Note
Because the instruction is treated as a Load, the effective address is translated using translation resources that are used for data accesses, even though the block being invalidated was copied into the instruction cache based on translation resources used for instruction fetches (see Book III).

Programming Note
The invalidation of the specified block need not have been performed with respect to the processor executing the icbi instruction until a subsequent isync instruction has been executed by that processor. No other instruction or event has the corresponding effect.

Instruction Cache Block Touch X-form

icbt CT,RA,RB [Category: Embedded]

  31     /      CT     RA     RB     22     /
  0      6      7      11     16     21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

If CT=0, this instruction provides a hint that the program will probably soon execute code from the addressed location.

If CT≠0, the operation performed by this instruction is implementation-dependent, except that the instruction is treated as a no-op for values of CT that are not implemented.

The hint is ignored if the block is Caching Inhibited.

This instruction is treated as a Load (see Section 4.3), except that the system instruction storage error handler is not invoked.

Special Registers Altered:
  None
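The X-form layouts given above for icbi and icbt can be illustrated by packing the fields into a 32-bit instruction word. The sketch below assumes only the bit positions shown in the layouts (primary opcode in bits 0:5, RA in 11:15, RB in 16:20, extended opcode in 21:30, and for icbt the CT field in bits 7:10); the function names are hypothetical helpers, not part of any assembler.

```python
# Illustrative sketch only: encodes icbi and icbt from the X-form layouts.
# Instruction bits are numbered 0 (most significant) to 31 (least significant).

def _check(name: str, value: int, width: int) -> int:
    """Reject values that do not fit in an unsigned field of `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value


def encode_x_form(opcode: int, bits_6_10: int, ra: int, rb: int, xo: int) -> int:
    """Pack the X-form fields; bit 31 (the trailing '/') is left as zero."""
    return (
        (_check("opcode", opcode, 6) << 26)       # bits 0:5
        | (_check("bits_6_10", bits_6_10, 5) << 21)  # bits 6:10
        | (_check("RA", ra, 5) << 16)             # bits 11:15
        | (_check("RB", rb, 5) << 11)             # bits 16:20
        | (_check("XO", xo, 10) << 1)             # bits 21:30
    )


def encode_icbi(ra: int, rb: int) -> int:
    """icbi RA,RB: opcode 31, extended opcode 982, bits 6:10 reserved."""
    return encode_x_form(31, 0, ra, rb, 982)


def encode_icbt(ct: int, ra: int, rb: int) -> int:
    """icbt CT,RA,RB: opcode 31, extended opcode 22, CT in bits 7:10."""
    # CT is 4 bits wide, so bit 6 of the 5-bit field stays zero (reserved).
    return encode_x_form(31, _check("CT", ct, 4), ra, rb, 22)


if __name__ == "__main__":
    print(hex(encode_icbi(0, 3)))     # icbi 0,3
    print(hex(encode_icbt(0, 0, 3)))  # icbt 0,0,3
```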
4.3.2 Data Cache Instructions

The Data Cache instructions control various aspects of the data cache.

TH field in the dcbt and dcbtst instructions

Described below are the TH field values for the dcbt and dcbtst instructions. For all TH field values which are not listed, the hint provided by the instruction is undefined.

TH=0b00000

If TH=0b00000, the dcbt/dcbtst instruction provides a hint that the program will probably soon access the block containing the byte addressed by EA.

TH=0b00000 - 0b00111 [Category: Cache Specification]

In addition to the hint specified above for the TH field value of 0b00000, an additional hint is provided indicating that placement of the block in the cache specified by the TH field might also improve performance. The correspondence between each value of the TH field and the cache to be specified is the same as the correspondence between each value of the CT field and the cache to be specified as defined in Section 4.3. The hints corresponding to values of the TH field not supported by the implementation are undefined.

TH=0b01000 - 0b01111 [Category: Stream]

The dcbt/dcbtst instructions provide hints regarding a sequence of accesses to data elements, or indicate the expected use thereof. Such a sequence is called a "data stream", and a dcbt/dcbtst instruction in which TH is set to one of these values is said to be a "data stream variant" of dcbt/dcbtst. In the remainder of this section, "data stream" may be abbreviated to "stream".

A data stream to which a program may perform Load accesses is said to be a "load data stream", and is described using the data stream variants of the dcbt instruction. A data stream to which a program may perform Store accesses is said to be a "store data stream", and is described using the data stream variants of the dcbtst instruction.

When, and how often, effective addresses for a data stream are translated is implementation-dependent.

Each data element is associated with a unit of storage, which is the aligned 128-byte location in storage that contains the first byte of the element. The data stream variants may be used to specify the address of the beginning of the data stream, the displacement (stride) between the first byte of successive elements, and the number of unique units of storage that are associated with all of the data elements. If the stride is specified, both the stride and the address of the first element are specified at 4 byte granularity. If the stride is not specified, the address of the first element is the address of the first unit.

Programming Note
The architecture does not provide a way to specify the size of the data elements that compose a stream. An implementation may assume some fixed size for all data elements. As a result, depending on the offset, stride, and size (and in particular whether the elements are aligned), the implementation may reduce the latency for accessing only a portion of some of the elements. A future version of the architecture may enable the specification of element size to avoid this limitation.

Each such data stream is associated, by software, with a stream ID, which is a resource that the processor uses to distinguish the data stream from other such data streams. The number of stream IDs is an implementation-dependent value in the range 1:16. Stream IDs are numbered sequentially starting from 0.

The encodings of the TH field and of the corresponding EA values are as follows. In the EA layout diagrams, fields shown as "/"s are reserved. These reserved fields are treated in the same manner as the corresponding case for instruction fields (see Section 1.3.3 of Book I). If a reserved value is specified for a defined EA field, or if a TH value is specified that is not explicitly defined below, the hint provided by the instruction is undefined.

TH      Description

01000   The dcbt/dcbtst instruction provides a hint that describes certain attributes of a data stream, and may indicate that the program will probably soon access the stream.

        The EA is interpreted as follows.

          EATRUNC   D      UG     /      ID
          0         57     58     59     60     63

        Bit(s)   Description
        0:56     EATRUNC
                 High-order 57 bits of the effective address of the first element of the data stream (i.e., the effective address of the first unit of the stream is EATRUNC followed by seven 0 bits).
        57       Direction (D)
                 0  Subsequent elements have increasing addresses.
                 1  Subsequent elements have decreasing addresses.
        58       Unlimited/GO (UG)
                 0  No information is provided by the UG field.
                 1  The number of elements in the data stream is unlimited, the elements are adjacent to each other, the program's need for each element of the stream is not likely to be transient, and the program will probably soon access the stream.
        59       Reserved
        60:63    Stream ID (ID)
                 Stream ID to use for this data stream.

01010   The dcbt/dcbtst instruction provides a hint that describes certain attributes of a data stream, or indicates that the program will probably soon access data streams that have been described using data stream variants of the dcbt/dcbtst instruction, or will probably no longer access such data streams.

        The EA is interpreted as follows. If GO=1 and S≠0b00 the hint provided by the instruction is undefined; the remainder of this instruction description assumes that this combination is not used.

          ///    GO     S      /      DEP    //     UNITCNT   T      U      /      ID
          0      32     33     35     36     39     47        57     58     59     60     63

        Bit(s)   Description
        0:31     Reserved
        32       GO
                 0  No information is provided by the GO field.
                 1  For dcbt, the program will probably soon access all nascent load and store data streams that have been completely described, and will probably no longer access all other nascent load and store data streams. All other fields of the EA are ignored. ("Nascent" and "completely described" are defined below.) For dcbtst, this field value holds no meaning and is treated as though it were zero.
        33:34    Stop (S)
                 00  No information is provided by the S field.
                 01  Reserved
                 10  The program will probably no longer access the data stream (if any) associated with the specified stream ID. (All other fields of the EA except the ID field are ignored.)
                 11  For dcbt, the program will probably no longer access the load and store data streams associated with all stream IDs. (All other fields of the EA are ignored.) For dcbtst, this field value holds no meaning, and is treated as though it were 0b00.
        35       Reserved
        36:38    Depth (DEP)
                 The DEP field provides a relative estimate of how many elements ahead of the point of stream use the latency-reducing actions should go. This value reflects a comparison of the rate of consumption of the elements of the data stream and the latency to bring an arbitrary element of the stream into cache. The values are as follows.
                   0  default = DSCR[DPFD]
                   1  none
                   2  shallowest
                   3  shallow
                   4  medium
                   5  deep
                   6  deeper
                   7  deepest
        39:46    Reserved
        47:56    UNITCNT
                 Number of units in data stream.
        57       Transient (T)
                 If T=1, the program's need for each element of the data stream is likely to be transient (i.e., the time interval during which the program accesses the element is likely to be short).
        58       Unlimited (U)
                 If U=1, the number of units in the data stream is unlimited (and the UNITCNT field is ignored).
        59       Reserved
        60:63    Stream ID (ID)
                 Stream ID to use for this data stream (GO=0 and S=0b00), or stream ID associated with the data stream which the program will probably no longer access (S=0b10).

Programming Note
To maximize the utility of the Depth control mechanism, the architecture provides a hierarchy of three ways to program it. In the server environment, the DPFD field in the LPCR is used by the provisory/firmware to set a safe or appropriate default depth for unaware operating systems and applications. (The corresponding default in the embedded environment is implementation specific.) The DPFD field in the DSCR may be initialized by the aware OS and overwritten by an application via the OS-provided service when per stream control is unnecessary or unaffordable. The DEP field in the EA specification when TH=0b01010 may be used by the application to specify the depth on a per-stream basis.

The number of elements ahead of the point of stream use indicated by a given depth value may differ across implementations, as may the latency to bring a given element into the cache. To achieve optimum performance, some experimentation with different depth values may be necessary.

01011   The dcbt/dcbtst instruction provides a hint that describes certain attributes of a data stream.

        The EA is interpreted as follows.

          ///    STRIDE   /      OFFSET   //     ID
          0      32       50     51       56     60     63

        Bit(s)   Description
        0:31     Reserved
        32:49    Stride
                 The displacement, in words, between the first byte of successive elements in the stream. The effective address of the Nth element in the stream is (N-1)×STRIDE×4 greater than or less than the effective address of the first element of the stream, depending on the direction specified for the stream.
        50       Reserved
        51:55    Offset
                 The word-offset of the first element of the stream in its unit (i.e., the effective address of the first element of the stream is (EATRUNC || OFFSET || 0b00)).
        56:59    Reserved
        60:63    Stream ID (ID)
                 Stream ID to use for this data stream.

Programming Note
A program should use a dcbt/dcbtst instruction with TH=0b01011 only when the stride is larger than 128 bytes. Otherwise, consecutive units will be accessed, so the additional stream information has no benefit.

If the specified stream ID value is greater than m-1, where m is the number of stream IDs provided by the implementation, and either (a) TH=0b01000 or TH=0b01011, or (b) TH=0b01010 with GO=0 and S≠0b11, no hint is provided by the instruction.

The following terminology is used to describe the state of a data stream. Except as described in the paragraph after the next paragraph, the state of a data stream at a given time is determined by the most recently provided hint(s) for the stream.

- A data stream for which only descriptive hints have been provided (by dcbt/dcbtst instructions with TH=0b01000 and UG=0, TH=0b01010 and GO=0 and S=0b00, and/or with TH=0b01011) is said to be "nascent". A nascent data stream for which all relevant descriptive hints have been provided (by the dcbt/dcbtst usages listed in the preceding sentence) is considered to be "completely described". The order of descriptive hints with respect to one another is unimportant.
- A data stream for which a hint has been provided (by a dcbt/dcbtst instruction with TH=0b01000 and UG=1 or dcbt with TH=0b01010 and GO=1) that the program will probably soon access it is said to be "active".
- A data stream that is either nascent or active is considered to "exist".
- A data stream for which a hint has been provided (e.g., by a dcbt instruction with TH=0b01010 and S≠0b00) that the program will probably no longer access it is considered no longer to exist.

The hint provided by a dcbt/dcbtst instruction with TH=0b01000 and UG=1 implicitly includes a hint that the program will probably no longer access the data stream (if any) previously associated with the specified stream ID. The hint provided by a dcbt/dcbtst instruction with TH=0b01000 and UG=0, or with TH=0b01010 and GO=0 and S=0b00, or with TH=0b01011 implicitly includes a hint that the program will probably no longer access the active data stream (if any) previously associated with the specified stream ID.

If a data stream is specified without using a dcbt/dcbtst instruction with TH=0b01010 and GO=0 and S=0b00, then the number of elements in the stream is unlimited, and the program's need for each element of the stream is not likely to be transient. If a data stream is specified without using a dcbt/dcbtst instruction with TH=0b01011, then the stream will access consecutive units of storage.

Interrupts (see Book III) cause all existing data streams to cease to exist. In addition, depending on the implementation, certain conditions and events may cause an existing data stream to cease to exist; for example, in some implementations an existing data stream ceases to exist when it comes to the end of a page.

Programming Note
To obtain the best performance across the widest range of implementations that support the data stream variants of dcbt/dcbtst, the programmer should assume the following model when using those variants.

- The processor's response to a hint that the program will probably soon access a given data stream is to take actions that reduce the latency of accesses to the first few elements of the stream. (Such actions may include prefetching cache blocks into levels of the storage hierarchy that are "near" the processor.) Thereafter, as the program accesses each successive element of the stream, the processor takes latency-reducing actions for additional elements of the stream, pacing these actions with the program's accesses (i.e., taking the actions for only a limited number of elements ahead of the element that the program is currently accessing).
- The processor's response to a hint that the program will probably no longer access a given data stream, or to the cessation of existence of a data stream, is to stop taking latency-reducing actions for the stream.
- A data stream having finite length ceases to exist when the latency-reducing actions have been taken for all elements of the stream.
- If the program ceases to need a given data stream before having accessed all elements of the stream (always the case for streams having unlimited length), performance may be improved if the program then provides a hint that it will no longer access the stream (e.g., by executing the appropriate dcbt instruction with TH=0b01010 and S≠0b00).
- At each level of the storage hierarchy that is "near" the processor, elements of a data stream that is specified as transient are most likely to be replaced. As a result, it may be desirable to stagger addresses of streams (choose addresses that map to different cache congruence classes) to reduce the likelihood that an element of a transient stream will be replaced prior to being accessed by the program.
- Processors that comply with versions of the architecture that do not support the TH field at all treat TH = 0b01000, 0b01010, and 0b01011 as if TH = 0b00000.
- A single set of stream IDs is shared between the dcbt and dcbtst instructions.
- On some implementations, data streams that are not specified by software may be detected by the processor. Such data streams are called "hardware-detected data streams". On some such implementations, data stream resources (resources that are used primarily to support data streams) are shared between software-specified data streams and hardware-detected data streams. On these latter implementations, the programming model includes the following.
  - Software-specified data streams take precedence over hardware-detected data streams in use of data stream resources.
  - The processor's response to a hint that the program will probably no longer access a given data stream, or to the cessation of existence of a data stream, includes releasing the associated data stream resources, so that they can be used by hardware-detected data streams.

Programming Note
This Programming Note describes several aspects of using the data stream variants of the dcbt and dcbtst instructions.

- A non-transient data stream having unlimited length and which will access consecutive units in storage can be completely specified, including providing the hint that the program will probably soon access it, using one dcbt instruction. The corresponding specification for a data stream having other attributes requires two or three dcbt/dcbtst instructions to describe the stream and one additional dcbt instruction to start the stream. However, one dcbt instruction with TH=0b01010 and GO=1 can apply to a set of the data streams described in the preceding sentence, so the corresponding specification for n such data streams requires 2×n to 3×n dcbt/dcbtst instructions plus one dcbt instruction. (There is no need to execute a dcbt/dcbtst instruction with TH=0b01010 and S=0b10 for a given stream ID before using the stream ID for a new data stream; the implicit portion of the hint provided by dcbt/dcbtst instructions that describe data streams suffices.)
- If it is desired that the hint provided by a given dcbt/dcbtst instruction be provided in program order with respect to the hint provided by another dcbt/dcbtst instruction, the two instructions must be separated by an eieio (or sync) instruction. For example, if a dcbt instruction with TH=0b01010 and GO=1 is intended to indicate that the program will probably soon access nascent data streams described (completely) by preceding dcbt/dcbtst instructions, and is intended not to indicate that the program will probably soon access nascent data streams described (completely) by following dcbt/dcbtst instructions, an eieio (or sync) instruction must separate the dcbt instruction with GO=1 from the preceding dcbt/dcbtst instructions, and another eieio (or sync) instruction must separate that dcbt instruction from the following dcbt/dcbtst instructions.
- In practice, the second eieio (or sync) described above can sometimes be omitted. For example, if the program consists of an outer loop that contains the dcbt/dcbtst instructions and an inner loop that contains the Load or Store instructions that access the data streams, the characteristics of the inner loop and of the implementation's branch prediction mechanisms may make it highly unlikely that hints corresponding to a given iteration of the outer loop will be provided out of program order with respect to hints corresponding to the previous iteration of the outer loop. (Also, any providing of hints out of program order affects only performance, not program correctness.)
- To mitigate the effects of interrupts on data streams, it may be desirable to specify a given "logical" data stream as a sequence of shorter, component data streams. Similar considerations apply to conditions and events that, depending on the implementation, may cause an existing data stream to cease to exist; for example, in some implementations an existing data stream ceases to exist when it comes to the end of a virtual page.
- If it is desired to specify data streams without regard to the number of stream IDs provided by the implementation, stream IDs should be assigned to data streams in order of decreasing stream importance (stream ID 0 to the most important stream, stream ID 1 to the next most important stream, etc.). This order ensures that the hints for the most important data streams will be provided.

TH=0b10000

If TH=0b10000, the dcbt instruction provides a hint that the program will probably soon load from the block containing the byte addressed by EA, and that the program's need for the block will be transient (i.e., the time interval during which the program accesses the block is likely to be short).

Programming Note
The processor's response to the hint that access to the block will be transient is to prefetch data into the cache hierarchy in a way that minimizes the displacement of data that has not been identified as transient.
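The EA layouts for the data stream variants (TH=0b01000, 0b01010, and 0b01011) described earlier in this section can be sketched as bit-packing helpers, together with the (N-1)×STRIDE×4 element address rule. This is a minimal illustration that assumes only the field positions listed in the EA descriptions; the function and parameter names are hypothetical, and the sketch does not issue any dcbt/dcbtst instruction.

```python
# Illustrative sketch only: builds the 64-bit EA operand values used by the
# data stream variants of dcbt/dcbtst. Bit 0 is the most significant bit.

MASK64 = (1 << 64) - 1


def _shift(last_bit: int) -> int:
    """Left-shift amount for a field whose lowest-order bit is `last_bit`."""
    return 63 - last_bit


def _check(name: str, value: int, width: int) -> int:
    """Reject values that do not fit in an unsigned field of `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value


def ea_th01000(first_unit_addr: int, descending: bool = False,
               ug: bool = False, stream_id: int = 0) -> int:
    """TH=0b01000: EATRUNC (0:56), D (57), UG (58), ID (60:63)."""
    eatrunc = first_unit_addr & MASK64 & ~0x7F  # keep the high-order 57 bits
    return (eatrunc
            | (int(descending) << _shift(57))
            | (int(ug) << _shift(58))
            | _check("stream_id", stream_id, 4))


def ea_th01010(go: int = 0, stop: int = 0, depth: int = 0, unit_count: int = 0,
               transient: bool = False, unlimited: bool = False,
               stream_id: int = 0) -> int:
    """TH=0b01010: GO (32), S (33:34), DEP (36:38), UNITCNT (47:56),
    T (57), U (58), ID (60:63)."""
    if go and stop:
        raise ValueError("GO=1 with S != 0b00 gives an undefined hint")
    return ((_check("go", go, 1) << _shift(32))
            | (_check("stop", stop, 2) << _shift(34))
            | (_check("depth", depth, 3) << _shift(38))
            | (_check("unit_count", unit_count, 10) << _shift(56))
            | (int(transient) << _shift(57))
            | (int(unlimited) << _shift(58))
            | _check("stream_id", stream_id, 4))


def ea_th01011(stride_words: int, offset_words: int, stream_id: int = 0) -> int:
    """TH=0b01011: STRIDE (32:49), OFFSET (51:55), ID (60:63)."""
    return ((_check("stride_words", stride_words, 18) << _shift(49))
            | (_check("offset_words", offset_words, 5) << _shift(55))
            | _check("stream_id", stream_id, 4))


def element_address(first_element_addr: int, n: int, stride_words: int,
                    descending: bool = False) -> int:
    """EA of the Nth element (N counted from 1): (N-1) x STRIDE x 4 bytes
    above or below the first element, depending on the stream direction."""
    delta = (n - 1) * stride_words * 4
    return first_element_addr - delta if descending else first_element_addr + delta
```

For example, a stream with a 256-byte stride would be described with stride_words=64, and its third element lies 512 bytes from the first.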
Data Cache Block Allocate X-form [Category: Embedded] dcba RA,RB 31 /// RA RB 758 / 686 Power ISATM Book II Version 2.06 0 6 11 16 21 31 Data Cache Block Touch X-form Let the effective address (EA) be the sum (RA|0)+(RB). dcbt RA,RB,TH [Category: Server] dcbt TH,RA,RB [Category: Embedded] This instruction provides a hint that the program will probably soon store into a portion of the block and the 31 TH RA RB 278 / contents of the rest of the block are not meaningful to 0 6 11 16 21 31 the program. The contents of the block are undefined when the instruction completes. The hint is ignored if Let the effective address (EA) be the sum (RA|0)+(RB). the block is Caching Inhibited. The dcbt instruction provides a hint that describes a This instruction is treated as a Store (see Section 4.3) block or data stream to which the program may perform except that the instruction is treated as a no-op if exe- a Load access. The instruction is also used to indicate cution of the instruction would cause the system data imminent access or end of access to described load storage error handler to be invoked. and store data streams. A hint that the program will Special Registers Altered: probably soon load from a given storage location is None ignored if the location is Caching Inhibited or Guarded. The only operation that is "caused" by the dcbt instruc- tion is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be "caused by" or "associated with" the dcbt instruction (e.g., dcbt is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execu- tion of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction. The dcbt instruction may complete before the opera- tion it causes has been performed. 
The nature of the hint depends, in part, on the value of the TH field, as specified at the beginning of this section. If TH≠0b01010 and TH≠0b01011, this instruction is treated as a Load (see Section 4.3), except that the system data storage error handler is not invoked, and reference and change recording need not be done.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Touch instruction so that it can be coded with the TH value as the last operand for all categories, and so that the transient hint can be specified without coding the TH field explicitly.

    Extended:           Equivalent to:
    dcbtct RA,RB,TH     dcbt for TH values of 0b00000 - 0b00111; other TH values are invalid.
    dcbtds RA,RB,TH     dcbt for TH values of 0b00000 or 0b01000 - 0b01111; other TH values are invalid.
    dcbtt RA,RB         dcbt for TH value of 0b10000

Programming Notes
New programs should avoid using the dcbt and dcbtst mnemonics; one of the extended mnemonics should be used exclusively.

If the dcbt mnemonic is used with only two operands, the TH operand is assumed to be 0b00000.

Processors that comply with versions of the architecture that precede Version 2.01 do not necessarily ignore the hint provided by dcbt and dcbtst if the specified block is in storage that is Guarded and not Caching Inhibited.

Programming Note
See the Programming Notes at the beginning of this section.

Data Cache Block Touch for Store X-form

dcbtst RA,RB,TH [Category: Server]
dcbtst TH,RA,RB [Category: Embedded]

    31    TH    RA    RB    246   /
    0     6     11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtst instruction provides a hint that describes a block or data stream to which the program may perform a Store access, or indicates the expected use thereof. A hint that the program will soon store to a given storage location is ignored if the location is Caching Inhibited or Guarded.

The only operation that is "caused by" the dcbtst instruction is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be "caused by" or "associated with" the dcbtst instruction (e.g., dcbtst is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by memory barriers.

The dcbtst instruction may complete before the operation it causes has been performed.

The nature of the hint depends, in part, on the value of the TH field, as specified at the beginning of this section. If TH≠0b01010 and TH≠0b01011, this instruction is treated as a Store (see Section 4.3), except that the system data storage error handler is not invoked, reference recording need not be done, and change recording is not done.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Touch for Store instruction so that it can be coded with the TH value as the last operand for all categories, and so that the transient hint can be specified without coding the TH field explicitly.

    Extended:             Equivalent to:
    dcbtstct RA,RB,TH     dcbtst for TH values of 0b00000 - 0b00111; other TH values are invalid.
    dcbtstds RA,RB,TH     dcbtst for TH values of 0b00000 or 0b01000 - 0b01111; other TH values are invalid.
    dcbtstt RA,RB         dcbtst for TH value of 0b10000.

Programming Note
See the Programming Notes at the beginning of this section.

Data Cache Block set to Zero X-form

dcbz RA,RB

    31    ///   RA    RB    1014  /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    n ← block size (bytes)
    m ← log2(n)
    ea ← EA[0:63-m] || ᵐ0
    MEM(ea, n) ← ⁿ0x00

Let the effective address (EA) be the sum (RA|0)+(RB).
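The dcbz RTL can be read as two steps: round EA down to a block boundary, then store n zero bytes starting there. The Python sketch below models only that address arithmetic. The function name, the bytearray standing in for storage, and the 128-byte default block size are assumptions made for illustration (the block size is implementation-dependent), and address translation and storage attributes are ignored.

```python
def dcbz_model(memory, ra_value, rb_value, ra_is_zero=False, block_size=128):
    """Zero the block containing EA in `memory` (a bytearray).

    Illustrative model only; returns the block-aligned address `ea`.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    b = 0 if ra_is_zero else ra_value              # (RA|0)
    ea_full = (b + rb_value) & 0xFFFF_FFFF_FFFF_FFFF  # 64-bit effective address
    block_addr = ea_full & ~(block_size - 1)       # clear the low m bits
    if block_addr + block_size > len(memory):
        raise IndexError("block lies outside the modeled storage")
    memory[block_addr:block_addr + block_size] = bytes(block_size)
    return block_addr


storage = bytearray(b"\xff" * 512)
addr = dcbz_model(storage, ra_value=0x100, rb_value=0x2A)
```

Here EA is 0x12A, so with the assumed 128-byte block the bytes at 0x100 through 0x17F are cleared and the neighbouring blocks are left untouched.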
All bytes in the block containing the byte addressed by EA are set to zero. This instruction is treated as a Store (see Section 4.3). Special Registers Altered: None Programming Note dcbz does not cause the block to exist in the data cache if the block is in storage that is Caching Inhibited. For storage that is neither Write Through Required nor Caching Inhibited, dcbz provides an efficient means of setting blocks of storage to zero. It can be used to initialize large areas of such storage, in a manner that is likely to consume less memory bandwidth than an equivalent sequence of Store instructions. For storage that is either Write Through Required or Caching Inhibited, dcbz is likely to take signifi- cantly longer to execute than an equivalent sequence of Store instructions. For example, on some implementations dcbz for such storage may cause the system alignment error handler to be invoked; on such implementations the system alignment error handler sets the specified block to zero using Store instructions. See Section 5.9.1 of Book III-S and Section 6.11.1 of Book III-E for additional information about dcbz. Chapter 4. Storage Control Instructions 689 Version 2.06 Data Cache Block Store X-form Data Cache Block Flush X-form dcbst RA,RB dcbf RA,RB,L 31 /// RA RB 54 / 31 /// L RA RB 86 / 0 6 11 16 21 31 0 6 9 11 16 21 31 Let the effective address (EA) be the sum (RA|0)+(RB). Let the effective address (EA) be the sum (RA|0)+(RB). 
If the block containing the byte addressed by EA is in L=0 storage that is Memory Coherence Required and a If the block containing the byte addressed by EA is block containing the byte addressed by EA is in the in storage that is Memory Coherence Required data cache of any processor and any locations in the and a block containing the byte addressed by EA block are considered to be modified there, those loca- is in the data cache of any processor and any loca- tions are written to main storage, additional locations in tions in the block are considered to be modified the block may be written to main storage, and the block there, those locations are written to main storage ceases to be considered to be modified in that data and additional locations in the block may be written cache. to main storage. The block is invalidated in the If the block containing the byte addressed by EA is in data caches of all processors. storage that is not Memory Coherence Required and If the block containing the byte addressed by EA is the block is in the data cache of this processor and any in storage that is not Memory Coherence Required locations in the block are considered to be modified and the block is in the data cache of this processor there, those locations are written to main storage, addi- and any locations in the block are considered to be tional locations in the block may be written to main stor- modified there, those locations are written to main age, and the block ceases to be considered to be storage and additional locations in the block may modified in that data cache. be written to main storage. The block is invalidated The function of this instruction is independent of in the data cache of this processor. whether the block containing the byte addressed by EA L=1 ("dcbf local") [Category: Server ] is in storage that is Write Through Required or Caching Inhibited. 
The L=1 form of the dcbf instruction permits a pro- gram to limit the scope of the "flush" operation to This instruction is treated as a Load (see Section 4.3), the data cache of this processor. If the block con- except that reference and change recording need taining the byte addressed by EA is in the data not be done, and it is treated as a Write with respect to cache of this processor, it is removed from this debug events. cache. The coherence of the block is maintained to Special Registers Altered: the extent required by the Memory Coherence None Required storage attribute. L = 3 ("dcbf local primary") [Category: Server] The L=3 form of the dcbf instruction permits a pro- gram to limit the scope of the "flush" operation to the primary data cache of this processor. If the block containing the byte addressed by EA is in the primary data cache of this processor, it is removed from this cache. The coherence of the block is maintained to the extent required by the Memory Coherence Required storage attribute. For the L operand, the value 2 is reserved. The results of executing a dcbf instruction with L=2 are boundedly undefined. The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited. This instruction is treated as a Load (see Section 4.3), except that reference and change recording need 690 Power ISATM Book II Version 2.06 not be done, and it is treated as a Write with respect to The treatment of these instructions is independent of debug events. whether other Vector instructions are available (i.e., is independent of the contents of MSRVEC (see Book Special Registers Altered: III-S) or MSRSPV (see Book III-E). 
None Extended Mnemonics: Programming Note Extended mnemonics are provided for the Data Cache These instructions merely provided hints, and thus Block Flush instruction so that it can be coded with the were permitted to be treated as no-ops even on L value as part of the mnemonic rather than as a processors that implemented them. numeric operand. These are shown as examples with The treatment of these instructions is independent the instruction. See Appendix A. "Assembler Extended of whether other Vector instructions are available Mnemonics" on page 717. The extended mnemonics because, on processors that implemented the are shown below. instructions, the instructions were available even when other Vector instructions were not. Extended: Equivalent to: dcbf RA,RB dcbf RA,RB,0 The extended mnemonics for these instructions dcbfl RA,RB dcbf RA,RB,1 were dstt, dststt, and dssall. dcbflp RA,RB dcbf RA,RB,3 Except in the dcbf instruction description in this sec- tion, references to "dcbf" in Books I-III imply L=0 unless otherwise stated or obvious from context; "dcbfl" is used for L=1 and "dcbflp" is used for L=3. Programming Note dcbf serves as both a basic and an extended mne- monic. The Assembler will recognize a dcbf mne- monic with three operands as the basic form, and a dcbf mnemonic with two operands as the extended form. In the extended form the L operand is omitted and assumed to be 0. Programming Note [Category: Server] dcbf with L=1 can be used to provide a hint that a block in this processor's data cache will not be reused soon. dcbf with L=3 can be used to flush a block from the processor's primary data cache but reduce the latency of a subsequent access. For example, the block may be evicted from the primary data cache but a copy retained in a lower level of the cache hierarchy. Programs which manage coherence in software must use dcbf with L=0. 
4.3.2.1 Obsolete Data Cache Instruc- tions [Category: Vector.Phased-Out] The Data Stream Touch (dst), Data Stream Touch for Store (dstst), and Data Stream Stop (dss) instructions (primary opcode 31, extended opcodes 342, 374, and 822 respectively), which were proposed for addition to the Power ISA and were implemented by some proces- sors, must be treated as no-ops (rather than as illegal instructions). Chapter 4. Storage Control Instructions 691 Version 2.06 4.4 Synchronization Instructions The synchronization instructions are used to ensure instructions are initiated, or to control storage access that certain instructions have completed before other ordering, or to support debug operations. 4.4.1 Instruction Synchronize 4.4.2 Load and Reserve and Store Instruction Conditional Instructions The Load And Reserve and Store Conditional instruc- Instruction Synchronize XL-form tions can be used to construct a sequence of instruc- tions that appears to perform an atomic update isync operation on an aligned storage location. See Section 1.7.3, "Atomic Update" for additional informa- 19 /// /// /// 150 / tion about these instructions. 0 6 11 16 21 31 The Load And Reserve and Store Conditional instruc- Executing an isync instruction ensures that all instruc- tions are fixed-point Storage Access instructions; see tions preceding the isync instruction have completed Section 3.3.1, "Fixed-Point Storage Access Instruc- before the isync instruction completes, and that no tions", in Book I. subsequent instructions are initiated until after the The storage location specified by the Load And isync instruction completes. It also ensures that all Reserve and Store Conditional instructions must be in instruction cache block invalidations caused by icbi storage that is Memory Coherence Required if the loca- instructions preceding the isync instruction have been tion may be modified by another processor or mecha- performed with respect to the processor executing the nism. 
If the specified location is in storage that is Write isync instruction, and then causes any prefetched Through Required or Caching Inhibited, the system instructions to be discarded. data storage error handler or the system alignment Except as described in the preceding sentence, the error handler is invoked for the Server environment and isync instruction may complete before storage may be invoked for the Embedded environment. accesses associated with instructions preceding the The Load and Reserve instructions include an Exclu- isync instruction have been performed. sive Access hint (EH), which can be used to indicate This instruction is context synchronizing (see Book III). that the instruction sequence being executed is imple- menting one of two types of algorithms: Special Registers Altered: None Atomic Update (EH=0) This hint indicates that the program is using a fetch and operate (e.g., fetch and add) or some similar algorithm and that all programs accessing the shared variable are likely to use a similar operation to access the shared variable for some time. Exclusive Access (EH=1) This hint indicates that the program is attempting to acquire a lock and if it succeeds, will perform another store to the lock variable (releasing the lock) before another program attempts to modify the lock variable. Programming Note The Memory Coherence Required attribute on other processors and mechanisms ensures that their stores to the reservation granule will cause the reservation created by the Load And Reserve instruction to be lost. 692 Power ISATM Book II Version 2.06 RESERVE 1 Programming Note RESERVE_LENGTH 1 Because the Load And Reserve and Store Condi- RESERVE_ADDR real_addr(EA) tional instructions have implementation dependen- RT 56 0 || MEM(EA, 1) cies (e.g., the granularity at which reservations are managed), they must be used with care. The oper- Let the effective address (EA) be the sum (RA|0)+(RB). 
ating system should provide system library pro- The byte in storage addressed by EA is loaded into grams that use these instructions to implement the RT56:63. RT0:55 are set to 0. high-level synchronization functions (Test and Set, This instruction creates a reservation for use by a Compare and Swap, locking, etc.; see Appendix B) stbcx. instruction. A real address computed from the that are needed by application programs. Applica- tion programs should use these library programs, EA as described in Section 1.7.3.1 is associated with rather than use the Load And Reserve and Store the reservation, and replaces any address previously Conditional instructions directly. associated with the reservation. A length of 1 byte is associated with the reservation, and replaces any length previously associated with the reservation. Programming Note The value of EH provides a hint as to whether the pro- EH = 1 should be used when the program is obtain- gram will perform a subsequent store to the byte in ing a lock variable which it will subsequently release before another program attempts to per- storage addressed by EA before some other processor form a store to it. When contention for a lock is sig- attempts to modify it. nificant, using this hint may reduce the number of 0 Other programs might attempt to modify times a cache block is transferred between proces- the byte in storage addressed by EA sor caches. regardless of the result of the correspond- ing stbcx. instruction. EH = 0 should be used when all accesses to a mutex variable are performed using an instruction 1 Other programs will not attempt to modify sequence with Load and Reserve followed by Store the byte in storage addressed by EA until Conditional (e.g., emulating atomic update primi- the program that has acquired the lock tives such as "Fetch and Add;" see Appendix B). performs a subsequent store releasing the The processor may use this hint to optimize the lock. 
cache to cache transfer of the block containing the Special Registers Altered: mutex variable, thus reducing the latency of per- forming an operation such as `Fetch and Add'. None Programming Note Programming Note Engineering Note lbarx serves as both a basic and an extended Either value of the EH field is appropriate for a mnemonic. The Assembler will recognize a lbarx Load and Reserve instruction that is intended to mnemonic with four operands as the basic form, establish a reservation for a subsequent waitrsv and a lbarx mnemonic with three operands as the and not a subsequent Store Conditional instruction. extended form. In the extended form the EH oper- and is omitted and assumed to be 0. Programming Note Warning: On some processors that comply with versions of the architecture that precede Version 2.00, executing a Load And Reserve instruction in which EH = 1 will cause the illegal instruction error handler to be invoked. Load Byte And Reserve Indexed X-form [Category: Phased-In] lbarx RT,RA,RB,EH 31 RT RA RB 52 EH 0 6 11 16 21 31 if RA = 0 then b 0 else b (RA) EA b +(RB) Chapter 4. Storage Control Instructions 693 Version 2.06 Load Halfword And Reserve Indexed X- Load Word And Reserve Indexed X-form form [Category: Phased-In] lwarx RT,RA,RB,EH lharx RT,RA,RB,EH 31 RT RA RB 20 EH 0 6 11 16 21 31 31 RT RA RB 116 EH 0 6 11 16 21 31 if RA = 0 then b 0 else b (RA) if RA = 0 then b 0 EA b +(RB) else b (RA) RESERVE 1 EA b +(RB) RESERVE_LENGTH 4 RESERVE 1 RESERVE_ADDR real_addr(EA) RT 320 || MEM(EA, 4) RESERVE_LENGTH 2 RESERVE_ADDR real_addr(EA) Let the effective address (EA) be the sum (RA|0)+(RB). RT 48 0 || MEM(EA, 2) The word in storage addressed by EA is loaded into Let the effective address (EA) be the sum (RA|0)+(RB). RT32:63. RT0:31 are set to 0. The halfword in storage addressed by EA is loaded into This instruction creates a reservation for use by a RT48:63. RT0:47 are set to 0. stwcx. instruction. 
A real address computed from the This instruction creates a reservation for use by a EA as described in Section 1.7.3.1 is associated with sthcx. instruction. A real address computed from the the reservation, and replaces any address previously EA as described in Section 1.7.3.1 is associated with associated with the reservation. A length of 4 bytes is the reservation, and replaces any address previously associated with the reservation, and replaces any associated with the reservation. A length of 2 bytes is length previously associated with the reservation. associated with the reservation, and replaces any The value of EH provides a hint as to whether the pro- length previously associated with the reservation. gram will perform a subsequent store to the word in The value of EH provides a hint as to whether the pro- storage addressed by EA before some other processor gram will perform a subsequent store to the halfword in attempts to modify it. storage addressed by EA before some other processor 0 Other programs might attempt to modify attempts to modify it. the word in storage addressed by EA 0 Other programs might attempt to modify regardless of the result of the correspond- the halfword in storage addressed by EA ing stwcx. instruction. regardless of the result of the correspond- 1 Other programs will not attempt to modify ing sthcx. instruction. the word in storage addressed by EA until 1 Other programs will not attempt to modify the program that has acquired the lock the halfword in storage addressed by EA performs a subsequent store releasing the until the program that has acquired the lock. lock performs a subsequent store releas- EA must be a multiple of 4. If it is not, either the system ing the lock. alignment error handler is invoked or the results are EA must be a multiple of 2. If it is not, either the system boundedly undefined. alignment error handler is invoked or the results are Special Registers Altered: boundedly undefined. 
None Special Registers Altered: None Programming Note lwarx serves as both a basic and an extended Programming Note mnemonic. The Assembler will recognize a lwarx lharx serves as both a basic and an extended mnemonic with four operands as the basic form, mnemonic. The Assembler will recognize a lharx and a lwarx mnemonic with three operands as the mnemonic with four operands as the basic form, extended form. In the extended form the EH oper- and a lharx mnemonic with three operands as the and is omitted and assumed to be 0. extended form. In the extended form the EH oper- and is omitted and assumed to be 0. 694 Power ISATM Book II Version 2.06 Store Byte Conditional Indexed X-form established the reservation, it is undefined whether [Category: Phased-In] (RS)56:63 are stored into the byte in storage addressed by EA. Otherwise, no store is per- stbcx. RS,RA,RB formed. If the Store Conditional Page Mobility category is 31 RS RA RB 694 1 not supported, it is undefined whether (RS)56:63 0 6 11 16 21 31 are stored into the byte in storage addressed by EA. if RA = 0 then b 0 If a reservation exists and the length associated with else b (RA) the reservation is not 1 byte, it is undefined whether EA b + (RB) (RS)56:63 are stored into the byte in storage addressed if RESERVE then by EA. if RESERVE_LENGTH = 1 then if RESERVE_ADDR = real_addr(EA) then If a reservation does not exist, no store is performed. MEM(EA, 1) (RS)56:63 undefined_case 0 CR Field 0 is set as follows. n is a 1-bit value that indi- store_performed 1 cates whether the store was performed, except that if, else per the preceding description, it is undefined whether if SCPM category supported then the store is performed, the value of n is undefined (and z smallest real page size supported by need not reflect whether the store was performed). implementation if RESERVE_ADDR ÷ z = real_addr(EA) ÷ z then undefined_case 1 CR0LT GT EQ SO = 0b00 || n || XERSO else The reservation is cleared. 
undefined_case 0 store_performed 0 Special Registers Altered: else CR0 undefined_case 1 else undefined_case 1 else undefined_case 0 store_performed 0 if undefined_case then u1 undefined 1-bit value if u1 then MEM(EA, 1) (RS)56:63 u2 undefined 1-bit value CR0 0b00 || u2 || XERSO else CR0 0b00 || store_performed || XERSO RESERVE 0 Let the effective address (EA) be the sum (RA|0)+(RB). If a reservation exists, the length associated with the reservation is 1 byte, and the real storage location specified by the stbcx. is the same as the real storage location specified by the lbarx instruction that estab- lished the reservation, (RS)56:63 are stored into the byte in storage addressed by EA. If a reservation exists, the length associated with the reservation is 1 byte, and the real storage location specified by the stbcx. is not the same as the real stor- age location specified by the lbarx instruction that established the reservation, the following applies. If the Store Conditional Page Mobility category is supported, the following applies. Let z denote a naturally aligned block of real storage whose size is the smallest real page size supported by the implementation. If the real storage location speci- fied by the stbcx. is in the same z as the real stor- age location specified by the lbarx instruction that Chapter 4. Storage Control Instructions 695 Version 2.06 Store Halfword Conditional Indexed X- age location specified by the lharx instruction that form established the reservation, it is undefined whether [Category: Phased-In] (RS)48:63 are stored into the halfword in storage addressed by EA. Otherwise, no store is per- sthcx. RS,RA,RB formed. If the Store Conditional Page Mobility category is 31 RS RA RB 726 1 not supported, it is undefined whether (RS)48:63 0 6 11 16 21 31 are stored into the halfword in storage addressed by EA. 
if RA = 0 then b 0 If a reservation exists and the length associated with else b (RA) the reservation is not 2 bytes, it is undefined whether EA b + (RB) (RS)48:63 are stored into the halfword in storage if RESERVE then addressed by EA. if RESERVE_LENGTH = 2 then if RESERVE_ADDR = real_addr(EA) then If a reservation does not exist, no store is performed. MEM(EA, 2) (RS)48:63 undefined_case 0 CR Field 0 is set as follows. n is a 1-bit value that indi- store_performed 1 cates whether the store was performed, except that if, else per the preceding description, it is undefined whether if SCPM category supported then the store is performed, the value of n is undefined (and z smallest real page size supported by need not reflect whether the store was performed). implementation if RESERVE_ADDR ÷ z = real_addr(EA) ÷ z then CR0LT GT EQ SO = 0b00 || n || XERSO undefined_case 1 else The reservation is cleared. undefined_case 0 store_performed 0 EA must be a multiple of 2. If it is not, either the system else alignment error handler is invoked or the results are undefined_case 1 boundedly undefined. else undefined_case 1 Special Registers Altered: else CR0 undefined_case 0 store_performed 0 if undefined_case then u1 undefined 1-bit value if u1 then MEM(EA, 2) (RS)48:63 u2 undefined 1-bit value CR0 0b00 || u2 || XERSO else CR0 0b00 || store_performed || XERSO RESERVE 0 Let the effective address (EA) be the sum (RA|0)+(RB). If a reservation exists, the length associated with the reservation is 2 bytes, and the real storage location specified by the sthcx. is the same as the real storage location specified by the lharx instruction that estab- lished the reservation, (RS)48:63 are stored into the halfword in storage addressed by EA. If a reservation exists, the length associated with the reservation is 2 bytes, and the real storage location specified by the sthcx. 
is not the same as the real stor- age location specified by the lharx instruction that established the reservation, the following applies. If the Store Conditional Page Mobility category is supported, the following applies. Let z denote a naturally aligned block of real storage whose size is the smallest real page size supported by the implementation. If the real storage location speci- fied by the sthcx. is in the same z as the real stor- 696 Power ISATM Book II Version 2.06 Store Word Conditional Indexed X-form (RS)32:63 are stored into the word in storage addressed by EA. Otherwise, no store is per- stwcx. RS,RA,RB formed. If the Store Conditional Page Mobility category is 31 RS RA RB 150 1 not supported, it is undefined whether (RS)32:63 0 6 11 16 21 31 are stored into the word in storage addressed by EA. if RA = 0 then b 0 If a reservation exists and the length associated with else b (RA) the reservation is not 4 bytes, it is undefined whether EA b + (RB) (RS)32:63 are stored into the word in storage addressed if RESERVE then by EA. if RESERVE_LENGTH = 4 then if RESERVE_ADDR = real_addr(EA) then If a reservation does not exist, no store is performed. MEM(EA, 4) (RS)32:63 undefined_case 0 CR Field 0 is set as follows. n is a 1-bit value that indi- store_performed 1 cates whether the store was performed, except that if, else per the preceding description, it is undefined whether if SCPM category supported then the store is performed, the value of n is undefined (and z smallest real page size supported by need not reflect whether the store was performed). implementation if RESERVE_ADDR ÷ z = real_addr(EA) ÷ z then undefined_case 1 CR0LT GT EQ SO = 0b00 || n || XERSO else The reservation is cleared. undefined_case 0 store_performed 0 EA must be a multiple of 4. If it is not, either the system else alignment error handler is invoked or the results are undefined_case 1 boundedly undefined. 
        else
            undefined_case ← 1
    else
        undefined_case ← 0
        store_performed ← 0
    if undefined_case then
        u1 ← undefined 1-bit value
        if u1 then
            MEM(EA, 4) ← (RS)_{32:63}
        u2 ← undefined 1-bit value
        CR0 ← 0b00 || u2 || XER_{SO}
    else
        CR0 ← 0b00 || store_performed || XER_{SO}
    RESERVE ← 0

Let the effective address (EA) be the sum (RA|0)+(RB).

If a reservation exists, the length associated with the reservation is 4 bytes, and the real storage location specified by the stwcx. is the same as the real storage location specified by the lwarx instruction that established the reservation, (RS)_{32:63} are stored into the word in storage addressed by EA.

If a reservation exists, the length associated with the reservation is 4 bytes, and the real storage location specified by the stwcx. is not the same as the real storage location specified by the lwarx instruction that established the reservation, the following applies.

If the Store Conditional Page Mobility category is supported, the following applies. Let z denote a naturally aligned block of real storage whose size is the smallest real page size supported by the implementation. If the real storage location specified by the stwcx. is in the same z as the real storage location specified by the lwarx instruction that established the reservation, it is undefined whether

Special Registers Altered:
    CR0

4.4.2.1 64-Bit Load and Reserve and Store Conditional Instructions [Category: 64-Bit]

Load Doubleword And Reserve Indexed X-form

ldarx RT,RA,RB,EH

    31    RT    RA    RB    84    EH
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RESERVE ← 1
    RESERVE_LENGTH ← 8
    RESERVE_ADDR ← real_addr(EA)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

This instruction creates a reservation for use by a stdcx. instruction. A real address computed from the EA as described in Section 1.7.3.1 is associated with the reservation, and replaces any address previously associated with the reservation. A length of 8 bytes is associated with the reservation, and replaces any length previously associated with the reservation.

The value of EH provides a hint as to whether the program will perform a subsequent store to the doubleword in storage addressed by EA before some other processor attempts to modify it.

    0   Other programs might attempt to modify the doubleword in storage addressed by EA regardless of the result of the corresponding stdcx. instruction.
    1   Other programs will not attempt to modify the doubleword in storage addressed by EA until the program that has acquired the lock performs a subsequent store releasing the lock.

EA must be a multiple of 8. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    None

Programming Note
ldarx serves as both a basic and an extended mnemonic. The Assembler will recognize a ldarx mnemonic with four operands as the basic form, and a ldarx mnemonic with three operands as the extended form. In the extended form the EH operand is omitted and assumed to be 0.

Store Doubleword Conditional Indexed X-form

stdcx. RS,RA,RB

    31    RS    RA    RB    214   1
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    if RESERVE then
        if RESERVE_LENGTH = 8 then
            if RESERVE_ADDR = real_addr(EA) then
                MEM(EA, 8) ← (RS)
                undefined_case ← 0
                store_performed ← 1
            else
                if SCPM category supported then
                    z ← smallest real page size supported by implementation
                    if RESERVE_ADDR ÷ z = real_addr(EA) ÷ z then
                        undefined_case ← 1
                    else
                        undefined_case ← 0
                        store_performed ← 0
                else
                    undefined_case ← 1
        else
            undefined_case ← 1
    else
        undefined_case ← 0
        store_performed ← 0
    if undefined_case then
        u1 ← undefined 1-bit value
        if u1 then
            MEM(EA, 8) ← (RS)
        u2 ← undefined 1-bit value
        CR0 ← 0b00 || u2 || XER_{SO}
    else
        CR0 ← 0b00 || store_performed || XER_{SO}
    RESERVE ← 0

Let the effective address (EA) be the sum (RA|0)+(RB).

If a reservation exists, the length associated with the reservation is 8 bytes, and the real storage location specified by the stdcx. is the same as the real storage location specified by the ldarx instruction that established the reservation, (RS) is stored into the doubleword in storage addressed by EA.

If a reservation exists, the length associated with the reservation is 8 bytes, and the real storage location specified by the stdcx. is not the same as the real storage location specified by the ldarx instruction that established the reservation, the following applies.

If the Store Conditional Page Mobility category is supported, the following applies. Let z denote a naturally aligned block of real storage whose size is the smallest real page size supported by the implementation. If the real storage location specified by the stdcx. is in the same z as the real storage location specified by the ldarx instruction that established the reservation, it is undefined whether (RS) is stored into the doubleword in storage addressed by EA. Otherwise, no store is performed.

If the Store Conditional Page Mobility category is not supported, it is undefined whether (RS) is stored into the doubleword in storage addressed by EA.

If a reservation exists and the length associated with the reservation is not 8 bytes, it is undefined whether (RS) is stored into the doubleword in storage addressed by EA.

If a reservation does not exist, no store is performed.

CR Field 0 is set as follows. n is a 1-bit value that indicates whether the store was performed, except that if, per the preceding description, it is undefined whether the store is performed, the value of n is undefined (and need not reflect whether the store was performed).

    CR0_{LT GT EQ SO} = 0b00 || n || XER_{SO}

The reservation is cleared.

EA must be a multiple of 8. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    CR0
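The case analysis for stdcx. above can be sketched as a small decision function. This is only an illustrative model, not part of the architecture: the `Reservation` class, the function names, the outcome strings, and the default block size `z=4096` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """Hypothetical model of the reservation state (RESERVE, RESERVE_LENGTH, RESERVE_ADDR)."""
    exists: bool = False
    length: int = 0       # reservation length in bytes
    real_addr: int = 0    # real address associated with the reservation


def stdcx_outcome(res, real_addr, scpm_supported, z=4096):
    """Classify a stdcx. to real_addr as 'performed', 'not performed', or 'undefined'.

    z stands for the smallest real page size supported; 4096 is an assumed value.
    """
    if not res.exists:
        outcome = "not performed"
    elif res.length != 8:
        outcome = "undefined"
    elif res.real_addr == real_addr:
        outcome = "performed"
    elif scpm_supported:
        # Same naturally aligned block z: undefined; otherwise no store.
        same_block = res.real_addr // z == real_addr // z
        outcome = "undefined" if same_block else "not performed"
    else:
        outcome = "undefined"
    res.exists = False    # the reservation is cleared in every case
    return outcome


def cr0_after(outcome, xer_so):
    """Return CR0 as a 4-bit integer (LT GT EQ SO), or None when n is undefined."""
    if outcome == "undefined":
        return None
    n = 1 if outcome == "performed" else 0
    return (n << 1) | xer_so
```

The sketch returns "undefined" rather than picking a value, since in those cases neither the store nor the EQ bit can be relied upon by software.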
4.4.3 Memory Barrier Instructions

The Memory Barrier instructions can be used to control the order in which storage accesses are performed. Additional information about these instructions and about related aspects of storage management can be found in Book III.

Extended mnemonics for Synchronize

Extended mnemonics are provided for the Synchronize instruction so that it can be supported by assemblers that recognize only the msync mnemonic and so that it can be coded with the L value as part of the mnemonic rather than as a numeric operand. These are shown as examples with the instruction. See Appendix A. "Assembler Extended Mnemonics" on page 717. The only reason for the msync mnemonic is compatibility with Book E assembler code.

Synchronize X-form

sync L

    31    ///   L     ///   ///   598   /
    0     6     9     11    16    21    31

The sync instruction creates a memory barrier (see Section 1.7.1). The set of storage accesses that is ordered by the memory barrier depends on the value of the L field.

L=0 ("heavyweight sync")
    The memory barrier provides an ordering function for the storage accesses associated with all instructions that are executed by the processor executing the sync instruction. The applicable pairs are all pairs a_i,b_j in which b_j is a data access, except that if a_i is the storage access caused by an icbi instruction then b_j may be performed with respect to the processor executing the sync instruction before a_i is performed with respect to that processor.

L=1 ("lightweight sync")
    The memory barrier provides an ordering function for the storage accesses caused by Load, Store, and dcbz instructions that are executed by the processor executing the sync instruction and for which the specified storage location is in storage that is Memory Coherence Required and is neither Write Through Required nor Caching Inhibited. The applicable pairs are all pairs a_i,b_j of such accesses except those in which a_i is an access caused by a Store or dcbz instruction and b_j is an access caused by a Load instruction.

L=2
    The set of storage accesses that is ordered by the memory barrier is described in Section 5.9.2 of Book III-S, as are additional properties of the sync instruction with L=2.

The ordering done by the memory barrier is cumulative.

If L=0 (or L=2), the sync instruction has the following additional properties.

    - Executing the sync instruction ensures that all instructions preceding the sync instruction have completed before the sync instruction completes, and that no subsequent instructions are initiated until after the sync instruction completes. The sync instruction is execution synchronizing (see Book III). However, address translation and reference and change recording (see Book III) associated with subsequent instructions may be performed before the sync instruction completes.

    - The memory barrier provides the additional ordering function such that if a given instruction that is the result of a store in set B is executed, all applicable storage accesses in set A have been performed with respect to the processor executing the instruction to the extent required by the associated memory coherence properties. The single exception is that any storage access in set A that is caused by an icbi instruction executed by the processor executing the sync instruction (P1) may not have been performed with respect to P1 (see the description of the icbi instruction on page 680).

      The cumulative properties of the barrier apply to the execution of the given instruction as they would to a load that returned a value that was the result of a store in set B.

    - The sync instruction provides an ordering function for the operations caused by the stream variants of the dcbt and dcbtst instructions (i.e. the providing of hints).

The value L=3 is reserved.

The sync instruction may complete before storage accesses associated with instructions preceding the sync instruction have been performed. The sync instruction may complete before operations caused by dcbt and dcbtst instructions preceding the sync instruction have been performed.

See Section 6.11.3 of Book III-E for additional information related to sync with L=0 for the Embedded environment.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics for Synchronize:

    Extended:    Equivalent to:
    sync         sync 0
    msync        sync 0
    lwsync       sync 1
    ptesync      sync 2

Except in the sync instruction description in this section, references to "sync" in Books I-III imply L=0 unless otherwise stated or obvious from context; the appropriate extended mnemonics are used when other L values are intended.

Programming Note
Section 1.8 contains a detailed description of how to modify instructions such that a well-defined result is obtained.

Programming Note
sync serves as both a basic and an extended mnemonic. The Assembler will recognize a sync mnemonic with one operand as the basic form, and a sync mnemonic with no operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Programming Note
The functions provided by sync with L=1 are a strict subset of those provided by sync with L=0. (The functions provided by sync with L=2 are a strict superset of those provided by sync with L=0; see Book III.)

Programming Note
The sync instruction can be used to ensure that all stores into a data structure, caused by Store instructions executed in a "critical section" of a program, will be performed with respect to another processor before the store that releases the lock is performed with respect to that processor; see Section B.2, "Lock Acquisition and Release, and Related Techniques" on page 721.

The memory barrier created by a sync instruction with L=0 or L=1 does not order implicit storage accesses. The memory barrier created by a sync instruction with any L value does not order instruction fetches.

(The memory barrier created by a sync instruction with L=0, or L=2 (see Book III), appears to order instruction fetches for instructions preceding the sync instruction with respect to data accesses caused by instructions following the sync instruction. However, this ordering is a consequence of the first "additional property" of sync with L=0, not a property of the memory barrier.)

In order to obtain the best performance across the widest range of implementations, the programmer should use the sync instruction with L=1, or the eieio or mbar instruction, if any of these is sufficient for his needs; otherwise he should use sync with L=0. sync with L=2 should not be used by application programs.

Enforce In-order Execution of I/O X-form

eieio
[Category: Server]

    31    ///   ///   ///   854   /
    0     6     11    16    21    31

The eieio instruction creates a memory barrier (see Section 1.7.1, "Storage Access Ordering"), which provides an ordering function for the storage accesses caused by Load, Store, dcbz, eciwx, and ecowx instructions executed by the processor executing the eieio instruction. These storage accesses are divided into the two sets listed below. The storage access caused by an eciwx instruction is ordered as a load, and the storage access caused by a dcbz or ecowx instruction is ordered as a store.

    1. Loads and stores to storage that is both Caching Inhibited and Guarded, and stores to main storage caused by stores to storage that is Write Through Required.
       The applicable pairs are all pairs a_i,b_j of such accesses.

    2. Stores to storage that is Memory Coherence Required and is neither Write Through Required nor Caching Inhibited.
       The applicable pairs are all pairs a_i,b_j of such accesses.

The operations caused by the stream variants of the dcbt and dcbtst instructions (i.e. the providing of hints) are ordered by eieio as a third set of operations, and the operations caused by tlbie and tlbsync instructions (see Book III-S) are ordered by eieio as a fourth set of operations.

Each of the four sets of storage accesses or operations is ordered independently of the other three sets. The ordering done by eieio's memory barrier for the second set is cumulative; the ordering done by eieio's memory barrier for the other three sets is not cumulative.

The eieio instruction may complete before storage accesses associated with instructions preceding the eieio instruction have been performed. The eieio instruction may complete before operations caused by dcbt and dcbtst instructions preceding the eieio instruction have been performed.

Special Registers Altered:
    None

Memory Barrier X-form

mbar MO
[Category: Embedded]

    31    MO    ///   ///   854   /
    0     6     11    16    21    31

When MO=0, the mbar instruction creates a cumulative memory barrier (see Section 1.7.1, "Storage Access Ordering"), which provides an ordering function for the storage accesses executed by the processor executing the mbar instruction.

When MO≠0, an implementation may support the mbar instruction ordering a particular subset of storage accesses. An implementation may also support multiple, non-zero values of MO that each specify a different subset of storage accesses that are ordered by the mbar instruction. Which subsets of storage accesses are ordered, and which values of MO specify these subsets, is implementation-dependent.

The mbar instruction may complete before storage accesses associated with instructions preceding the mbar instruction have been performed. The mbar instruction may complete before operations caused by dcbt and dcbtst instructions preceding the mbar instruction have been performed.

Special Registers Altered:
    None

Programming Note
The eieio and mbar instructions are intended for use in doing memory-mapped I/O. Because loads, and separately stores, to storage that is both Caching Inhibited and Guarded are performed in program order (see Section 1.7.1, "Storage Access Ordering" on page 660), eieio or mbar is needed for such storage only when loads must be ordered with respect to stores.

For the eieio instruction, accesses in set 1, a_i and b_j need not be the same kind of access or be to storage having the same storage control attributes. For example, a_i can be a load to Caching Inhibited, Guarded storage, and b_j a store to Write Through Required storage.

If stronger ordering is desired than that provided by eieio or mbar, the sync instruction must be used, with the appropriate value in the L field.

Programming Note
The functions provided by eieio and mbar are a strict subset of those provided by sync with L=0. The functions provided by eieio for its second set are a strict subset of those provided by sync with L=1.

Since eieio and mbar share the same opcode, software designed for both server and embedded environments must assume that only the eieio functionality applies since the functions provided by eieio are a subset of those provided by mbar with MO=0.

4.4.4 Wait Instruction

Wait X-form

wait WC
[Category: Wait.Phased-In]

    31    ///   WC    ///   ///   62    /
    0     6     9     11    16    21    31

wait
[Category: Wait.Phased-Out]

    31    ///   ///   ///   62    /
    0     6     11    16    21    31

The wait instruction allows instruction fetching and execution to be suspended under certain conditions, depending on the value of the WC field. A wait instruction without the WC field is treated as a wait instruction with WC=0.

The defined values for WC are as follows.

    0b00    Resume instruction fetching and execution when an interrupt occurs.
    0b01    Resume instruction fetching and execution when an interrupt occurs or when a reservation made by the processor does not exist (see Section 1.7.3). It is implementation-dependent whether this WC value is supported or wait with this WC value is treated as a no-op.
    0b10    Resume instruction fetching and execution when an interrupt occurs or when an implementation-specific condition exists. It is implementation-dependent whether this WC value is supported or is treated as reserved.
    0b11    This WC value is treated as a no-op.

If WC=0, or if WC=1 and a reservation made by the processor exists when the wait instruction is executed, or if WC=2 and the associated implementation-specific condition does not exist when the wait instruction is executed, the following applies.

    - Instruction fetching and execution is suspended.
    - Once the wait instruction has completed, the NIA will point to the next sequential instruction.
    - Instruction fetching and execution resumes when any of the following conditions are met.
        - An interrupt occurs.
        - WC=1 and a reservation made by the processor does not exist.
        - WC=2 and the associated implementation-specific condition exists.

When the wait instruction is executed, if WC=1 and a reservation made by the processor does not exist, or if WC=2 and the associated implementation-specific condition exists, the instruction is treated as a no-op.

Programming Note
wait frees computational resources which might be allocated to another program or converted into power savings.

If an interrupt causes resumption of instruction execution, the interrupt handler will return to the instruction after the wait.

Programming Note
In previous versions of the architecture the wait instruction was context synchronizing.

Programming Note
Causing the system illegal instruction error handler to be invoked if an attempt is made to execute wait with WC=0b10 on an implementation that does not support that form of wait facilitates the debugging of software.

Special Registers Altered:
    None

Extended Mnemonics:

Examples of extended mnemonics for Wait:

    Extended:    Equivalent to:
    wait         wait 0
    waitrsv      wait 1
    waitimpl     wait 2

Programming Note
The wait instruction with WC=0b00 can be used in verification test cases to signal the end of a test case. The encoding for the instruction is the same in both Big-Endian and Little-Endian modes.

Programming Note
The wait instruction may be useful as the primary instruction of an "idle process" or the completion of processing for a cooperative processor. However, overall system performance may be better served if the wait instruction is used by applications only for idle times that are expected to be short.

Programming Note
Execution of a wait instruction indicates that no further instruction fetching will occur until the condition(s) associated with the WC field value for the instruction take place. The main purpose of the wait instruction is to enable power savings.

On implementations which do not support the wait instruction with WC=0b10, the behavior for non-support (treated as reserved) differs from the non-support of the other non-zero WC values (treated as no-ops). The possibility of boundedly undefined behavior such as causing the system illegal instruction error handler to be invoked is meant to discourage the use of WC=0b10 in programs that are intended to be portable.

Only programs that are implementation-aware should use WC=0b10.

Programming Note
The wait instruction is not execution synchronizing and does not cause a memory barrier.
waitrsv behavior relative to a preceding Load and Reserve instruction or Store Conditional instruction has a data dependency on the reservation. When execution proceeds past waitrsv as the result of another processor storing to the reservation granule, a subsequent load from the same storage location may return stale data. It is also possible that execution could proceed past the waitrsv for other reasons such as the occurrence of an interrupt. There are no architecturally defined means to determine what terminated the wait. Moreover, even if software were to attempt to determine what caused the wait to terminate, by the time the check occurred, both causes (interrupt and storage modification) might be true. Software must be designed to deal with the various causes of wait termination. In general, if the program that performed wait does not see the new value of the storage location for which the reservation was held, it should re-establish the reservation by repeating the Load and Reserve instruction, and then perform another waitrsv.

The following code waits for a device to update a memory location and assumes that r3 contains the address of the word to be updated. This assumes that software has already set this word to zero and is waiting for the device to update the word to a non-zero value.

    loop:
        lwarx   r4,0,r3     # load and reserve
        cmpwi   r4,0        # exit if nonzero
        bne-    exit
        waitrsv             # wait for reservation loss
        b       loop
    exit: ...

The b instruction results in re-execution of the waitrsv if instruction execution had resumed for some reason other than loss of the reservation made by the processor. This branch instruction is also necessary because the reservation might have been lost for reasons other than the device updating the memory location addressed by r3. Also, even if the device updated this memory location, the lwarx and waitrsv instructions may need to be re-executed until the lwarx returns the current data.
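As a rough illustration of the WC rules in Section 4.4.4 and of why the loop above re-checks the word after every wake-up, the behavior can be modeled as follows. The function names, the return strings, and the `wc1_supported` / `wc2_supported` flags are hypothetical and exist only for this sketch; the second function simply treats each list element as the value returned by one lwarx.

```python
def wait_action(wc, reservation_exists=False, impl_condition=False,
                wc1_supported=True, wc2_supported=True):
    """Return 'suspend', 'no-op', or 'reserved' for a wait with the given WC value."""
    if wc == 0b00:
        return "suspend"
    if wc == 0b01:
        # Unsupported WC=0b01 is treated as a no-op.
        if not wc1_supported:
            return "no-op"
        return "suspend" if reservation_exists else "no-op"
    if wc == 0b10:
        # Unsupported WC=0b10 is treated as reserved, not as a no-op.
        if not wc2_supported:
            return "reserved"
        return "no-op" if impl_condition else "suspend"
    return "no-op"  # WC=0b11


def wait_for_nonzero(lwarx_results):
    """Model of the lwarx / waitrsv retry loop.

    lwarx_results lists what successive lwarx instructions would return.
    Returns (value, number_of_loop_iterations).
    """
    iterations = 0
    for value in lwarx_results:
        iterations += 1           # lwarx r4,0,r3
        if value != 0:            # cmpwi r4,0 / bne- exit
            return value, iterations
        # waitrsv resumed for an unknown reason (reservation loss or
        # interrupt), so branch back and load again: b loop
    raise RuntimeError("no nonzero value observed")
```

For instance, `wait_for_nonzero([0, 0, 7])` models two wake-ups that still observed zero (an interrupt, or stale data) before the device's value became visible on the third load.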
Programming Note
A wait instruction without the WC field is treated as a wait instruction with WC=0b00 because older processors that comply with Power ISA 2.06 do not support the WC field.

Chapter 5. Time Base

    5.1 Time Base Overview
    5.2 Time Base
    5.2.1 Time Base Instructions
    5.3 Alternate Time Base [Category: Alternate Time Base]

5.1 Time Base Overview

The time base facilities include a Time Base and an Alternate Time Base which is category: Alternate Time Base. The Alternate Time Base is analogous to the Time Base except that it may count at a different frequency and is not writable.

5.2 Time Base

The Time Base (TB) is a 64-bit register (see Figure 5) containing a 64-bit unsigned integer that is incremented periodically as described below.

    TBU                     TBL
    0                       32                    63

    Field   Description
    TBU     Upper 32 bits of Time Base
    TBL     Lower 32 bits of Time Base

Figure 5. Time Base

The Time Base bits 0:59 increment until their value becomes 0xFFF_FFFF_FFFF_FFFF (2^60 - 1); at the next increment their value becomes 0x000_0000_0000_0000. There is no interrupt or other indication when this occurs.

Time Base bits 60:63 may increment at a variable rate. When the value of bit 59 changes, bits 60:63 are set to zero; if bits 60:63 increment to 0xF before the value of bit 59 changes, they remain at 0xF until the value of bit 59 changes.

As an order-of-magnitude example, for a Time Base that is incremented once every 32 cycles of a 1 GHz clock, the period of the Time Base would be

    $$T_{TB} = \frac{2^{64} \times 32}{1\ \text{GHz}} = 5.90 \times 10^{11}\ \text{seconds}$$

which is approximately 18,700 years.

The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following.

    - The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base bits 0:59 changes, and a means to determine what the current update frequency is.
    - The update frequency of the Time Base bits 0:59 is under the control of the system software.

Programming Note
If the operating system initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries.

Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 2^64 - 1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values.

Successive readings of the Time Base may return identical values.

5.2.1 Time Base Instructions

Move From Time Base XFX-form

mftb RT,TBR
[Category: Phased-Out]

    31    RT    tbr         371   /
    0     6     11          21    31

This instruction behaves as if it were an mfspr instruction; see the mfspr instruction description in Section 3.3.15 of Book I.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics for Move From Time Base:

    Extended:    Equivalent to:
    mftb Rx      mftb Rx,268
                 mfspr Rx,268
    mftbu Rx     mftb Rx,269
                 mfspr Rx,269

Programming Note
New programs should use mfspr instead of mftb to access the Time Base.

Programming Note
mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB).
Programming Note
The mfspr instruction can be used to read the Time Base on all processors that comply with Version 2.01 of the architecture or with any subsequent version.

It is believed that the mfspr instruction can be used to read the Time Base on most processors that comply with versions of the architecture that precede Version 2.01. Processors for which mfspr cannot be used to read the Time Base include the following.

    - 601
    - POWER3

(601 implements neither the Time Base nor mftb, but depends on software using mftb to read the Time Base, so that the attempt causes the Illegal Instruction error handler to be invoked and thereby permits the operating system to emulate the Time Base.)

Programming Note
Since the update frequency of the Time Base is implementation-dependent, the algorithm for converting the current value in the Time Base to time of day is also implementation-dependent.

As an example, assume that the Time Base increments at the constant rate of 512 MHz. (Note, however, that programs should allow for the possibility that some implementations may not increment the least-significant 4 bits of the Time Base at a constant rate.) What is wanted is the pair of 32-bit values comprising a POSIX standard clock:[1] the number of whole seconds that have passed since 00:00:00 January 1, 1970, UTC, and the remaining fraction of a second expressed as a number of nanoseconds.

Assume that:

    - The value 0 in the Time Base represents the start time of the POSIX clock (if this is not true, a simple 64-bit subtraction will make it so).
    - The integer constant ticks_per_sec contains the value 512,000,000, which is the number of times the Time Base is updated each second.
    - The integer constant ns_adj contains the value

          $$\frac{1{,}000{,}000{,}000}{512{,}000{,}000} \times 2^{32} / 2 = 4194304000$$

      which is the number of nanoseconds per tick of the Time Base, multiplied by 2^32 for use in mulhwu (see below), and then divided by 2 in order to fit, as an unsigned integer, into 32 bits.

When the processor is in 64-bit mode, the POSIX clock can be computed with an instruction sequence such as this:

    mfspr   Ry,268              # Ry = Time Base
    lwz     Rx,ticks_per_sec
    divdu   Rz,Ry,Rx            # Rz = whole seconds
    stw     Rz,posix_sec
    mulld   Rz,Rz,Rx            # Rz = quotient * divisor
    sub     Rz,Ry,Rz            # Rz = excess ticks
    lwz     Rx,ns_adj
    slwi    Rz,Rz,1             # Rz = 2 * excess ticks
    mulhwu  Rz,Rz,Rx            # mul by (ns/tick)/2 * 2^32
    stw     Rz,posix_ns         # product[0:31] = excess ns

For the Embedded environment when the processor is in 32-bit mode, it is not possible to read the Time Base using a single instruction. Instead, two instructions must be used, one of which reads TBL and the other of which reads TBU. Because of the possibility of a carry from TBL to TBU occurring between the two reads, a sequence such as the following must be used to read the Time Base.

    loop:
        mfspr   Rx,TBU          # load from TBU
        mfspr   Ry,TB           # load from TB
        mfspr   Rz,TBU          # load from TBU
        cmp     cr0,0,Rx,Rz     # check if 'old'='new'
        bne     loop            # branch if carry occurred

Non-constant update frequency

In a system in which the update frequency of the Time Base may change over time, it is not possible to convert an isolated Time Base value into time of day. Instead, a Time Base value has meaning only with respect to the current update frequency and the time of day that the update frequency was last changed. Each time the update frequency changes, either the system software is notified of the change via an interrupt (see Book III), or the change was instigated by the system software itself. At each such change, the system software must compute the current time of day using the old update frequency, compute a new value of ticks_per_sec for the new frequency, and save the time of day, Time Base value, and tick rate. Subsequent calls to compute Time of Day use the current Time Base value and the saved values.

[1] Described in POSIX Draft Standard P1003.4/D12, Draft Standard for Information Technology -- Portable Operating System Interface (POSIX) -- Part 1: System Application Program Interface (API) - Amendment 1: Real-time Extension [C Language]. Institute of Electrical and Electronics Engineers, Inc., Feb. 1992.

5.3 Alternate Time Base [Category: Alternate Time Base]

The Alternate Time Base (ATB) is a 64-bit register (see Figure 6) containing a 64-bit unsigned integer that is incremented periodically. The frequency at which the integer is updated is implementation-dependent.

    ATBU                    ATBL
    0                       32                    63

Figure 6. Alternate Time Base

The ATBL register is an aliased name for the ATB.

The Alternate Time Base increments until its value becomes 0xFFFF_FFFF_FFFF_FFFF (2^64 - 1). At the next increment, its value becomes 0x0000_0000_0000_0000. There is no explicit indication (such as an interrupt; see Book III) that this has occurred.

The Alternate Time Base is accessible in both user and supervisor mode. The counter can be read by executing a mfspr instruction specifying the ATB (or ATBL) register, but cannot be written. A second SPR, ATBU, is defined that accesses only the upper 32 bits of the counter. Thus the upper 32 bits of the counter may be read into a register by reading the ATBU register.

The effect of entering a power-savings mode or of processor frequency changes on counting in the Alternate Time Base is implementation-dependent.

Chapter 6. Decorated Storage Facility [Category: Decorated Storage]

    6.1 Decorated Load Instructions
    6.2 Decorated Store Instructions
    6.3 Decorated Notify Instructions
The Decorated Storage facility provides Load, Store, and Notify operations to storage that have additional semantics other than the reading and writing of data values to the addressed storage locations. A decoration is specified that provides semantic information about how the operation is to be performed. A decorated device is a device that implements an address range of storage, and applies decorations to operations performed on the address range of storage.

A Decorated Storage instruction specifies the following attributes:

    - The type of access, which is either a Decorated Load, Decorated Store, or a Decorated Notify.
    - The EA in register RB, to which the operation is to be performed.
    - The decoration in register RA, which further defines what operation should be performed by the decorated device.
    - The data itself, either data provided by the processor to the decorated device (in the case of a Decorated Store), or the data provided by the decorated device to be consumed by the processor (in the case of a Decorated Load). Decorated Notify operations do not contain data.

The semantics of any Decorated Storage operation that is Caching Inhibited are defined by the decorated device depending on whether it is a Decorated Load, Decorated Store, or Decorated Notify, and the value supplied as a decoration. Such semantics may differ from decorated device to decorated device, similar to how devices other than well-behaved memory may treat Load and Store operations. The semantics of any operation associated with a Decorated Storage operation that is not Caching Inhibited are the same as an analogous Load or Store instruction of the same data size.

The results of a Decorated Storage operation that is Caching Inhibited to a device that does not support decorations are boundedly undefined. The results of a Load or Store operation that is Caching Inhibited to a decorated device that requires a decoration are boundedly undefined.

For Decorated Load operations, a Load operation with the specified decoration is performed to the EA and the data provided by the decorated device is placed in the target register.

For Decorated Store operations, a Store operation using the data specified in the source register with the specified decoration is performed to the EA.

Decorated Load instructions are treated as Load instructions for address translation, access control, debug events, storage attributes, alignment, and memory access ordering. Decorated Store instructions are treated as Store instructions for address translation, access control, debug events, storage attributes, alignment, and memory access ordering. A Decorated Notify instruction is treated as a zero byte Store for address translation, access control, debug events, storage attributes, alignment, and memory access ordering.

Programming Note
Software should be acutely aware of how transactions to a decorated device that implements Decorated Storage will occur. Not only does this imply knowing the particular decorated device's semantics, but also ensuring that the transactions are appropriately issued by the processor. This includes alignment, speculative accesses, and ordering. In general, Caching Inhibited accesses are required to be Guarded and properly aligned.

6.1 Decorated Load Instructions

Load Byte with Decoration Indexed X-form

lbdx RT,RA,RB

    31    RT    RA    RB    515   /
    0     6     11    16    21    31

    EA ← (RB)
    RT ← ^{56}0 || MEM_DECORATED(EA,1,(RA))

Let the effective address (EA) be the contents of RB. The byte in storage addressed by EA is loaded using the decoration supplied by (RA) into RT_{56:63}. RT_{0:55} are set to 0.

Special Registers Altered:
    None

Load Halfword with Decoration Indexed X-form

lhdx RT,RA,RB

    31    RT    RA    RB    547   /
    0     6     11    16    21    31

    EA ← (RB)
    RT ← ^{48}0 || MEM_DECORATED(EA,2,(RA))

Let the effective address (EA) be the contents of RB. The halfword in storage addressed by EA is loaded using the decoration supplied by (RA) into RT_{48:63}. RT_{0:47} are set to 0.

Special Registers Altered:
    None

Load Word with Decoration Indexed X-form

lwdx RT,RA,RB

    31    RT    RA    RB    579   /
    0     6     11    16    21    31

    EA ← (RB)
    RT ← ^{32}0 || MEM_DECORATED(EA,4,(RA))

Let the effective address (EA) be the contents of RB. The word in storage addressed by EA is loaded using the decoration supplied by (RA) into RT_{32:63}. RT_{0:31} are set to 0.

Special Registers Altered:
    None

Load Doubleword with Decoration Indexed X-form

lddx RT,RA,RB
[Co-requisite category: 64-Bit]

    31    RT    RA    RB    611   /
    0     6     11    16    21    31

    EA ← (RB)
    RT ← MEM_DECORATED(EA,8,(RA))

Let the effective address (EA) be the contents of RB. The doubleword in storage addressed by EA is loaded using the decoration supplied by (RA) into RT.

Special Registers Altered:
    None

Load Floating Doubleword with Decoration Indexed X-form

lfddx FRT,RA,RB
[Co-requisite category: FP]

    31    FRT   RA    RB    803   /
    0     6     11    16    21    31

    EA ← (RB)
    FRT ← MEM_DECORATED(EA,8,(RA))

Let the effective address (EA) be the contents of RB. The doubleword in storage addressed by EA is loaded using the decoration supplied by (RA) into FRT.

Special Registers Altered:
    None

6.2 Decorated Store Instructions

Store Byte with Decoration Indexed X-form

stbdx RS,RA,RB

    31    RS    RA    RB    643   /
    0     6     11    16    21    31

    EA ← (RB)
    MEM_DECORATED(EA,1,(RA)) ← (RS)_{56:63}

Let the effective address (EA) be the contents of RB. (RS)_{56:63} are stored to the byte in storage addressed by EA using the decoration supplied by (RA).

Special Registers Altered:
    None

Store Halfword with Decoration Indexed X-form

sthdx RS,RA,RB

    31    RS    RA    RB    675   /
    0     6     11    16    21    31

    EA ← (RB)
    MEM_DECORATED(EA,2,(RA)) ← (RS)_{48:63}

Let the effective address (EA) be the contents of RB. (RS)_{48:63} are stored to the halfword in storage addressed by EA using the decoration supplied by (RA).

Special Registers Altered:
    None

Store Word with Decoration Indexed X-form

stwdx RS,RA,RB

    31    RS    RA    RB    707   /
    0     6     11    16    21    31

    EA ← (RB)
    MEM_DECORATED(EA,4,(RA)) ← (RS)_{32:63}

Let the effective address (EA) be the contents of RB. (RS)_{32:63} are stored to the word in storage addressed by EA using the decoration supplied by (RA).

Special Registers Altered:
    None

Store Doubleword with Decoration Indexed X-form

stddx RS,RA,RB
[Co-requisite category: 64-Bit]

    31    RS    RA    RB    739   /
    0     6     11    16    21    31

    EA ← (RB)
    MEM_DECORATED(EA,8,(RA)) ← (RS)

Let the effective address (EA) be the contents of RB. (RS) is stored to the doubleword in storage addressed by EA using the decoration supplied by (RA).

Special Registers Altered:
    None

Store Floating Doubleword with Decoration Indexed X-form

stfddx FRS,RA,RB
[Co-requisite category: FP]

    31    FRS   RA    RB    931   /
    0     6     11    16    21    31

    EA ← (RB)
    MEM_DECORATED(EA,8,(RA)) ← (FRS)

Let the effective address (EA) be the contents of RB. (FRS) is stored to the doubleword in storage addressed by EA using the decoration supplied by (RA).

Special Registers Altered:
    None

6.3 Decorated Notify Instructions

Decorated Storage Notify X-form

dsn RA,RB

    31    ///   RA    RB    483   /
    0     6     11    16    21    31

    EA ← (RB)
    MEM_NOTIFY(EA,(RA))

Let the effective address (EA) be the contents of RB. The decoration supplied by (RA) is sent to the address in storage specified by EA.

Special Registers Altered:
    None

Chapter 7. External Control [Category: External Control]

The External Control category of facilities and instructions permits a program to communicate with a special-purpose device. Two instructions are provided, both of which must be implemented if the facility is provided.

The ecowx instruction might be used to send the device the translated real address of a buffer containing graphics data, and the word transmitted from the
General Purpose Register might be control information that tells the adapter what operation to perform on the External Control In Word Indexed (eciwx), which data in the buffer. The eciwx instruction might be used does the following: to load status information from the adapter. - Computes an effective address (EA) like most A device designed to be used with the External Control X-form instructions facility may also recognize events that indicate that the - Validates the EA as would be done for a load address translation being used by the processor has from that address changed. In this case the operating system need not - Translates the EA to a real address "pin" the area of storage identified by an eciwx or - Transmits the real address to the device ecowx instruction (i.e., need not protect it from being - Accepts a word of data from the device and paged out). places it into a General Purpose Register External Control Out Word Indexed (ecowx), which does the following: - Computes an effective address (EA) like most X-form instructions - Validates the EA as would be done for a store to that address - Translates the EA to a real address - Transmits the real address and a word of data from a General Purpose Register to the device Permission to execute these instructions and identifica- tion of the target device are controlled by two fields, called the E bit and the RID field respectively. If attempt is made to execute either of these instructions when E=0 the system data storage error handler is invoked. The location of these fields is described in Book III. The storage access caused by eciwx and ecowx is performed as though the specified storage location is Caching Inhibited and Guarded, and is neither Write Through Required nor Memory Coherence Required. Interpretation of the real address transmitted by eciwx and ecowx and of the 32-bit value transmitted by ecowx is up to the target device, and is not specified by the Power ISA. 
See the System Architecture documen- tation for a given Power ISA system for details on how the External Control facility can be used with devices on that system. Example An example of a device designed to be used with the External Control facility might be a graphics adapter. Chapter 7. External Control [Category: External Control] 715 Version 2.06 7.1 External Access Instructions In the instruction descriptions the statements "this treated as a Store" have the same meanings as for the instruction is treated as a Load" and "this instruction is Cache Management instructions; see Section 4.3. External Control In Word Indexed X-form else b (RA) EA b + (RB) eciwx RT,RA,RB raddr address translation of EA send store word request for raddr to device identified by RID send (RS)32:63 to device 31 RT RA RB 310 / 0 6 11 16 21 31 Let the effective address (EA) be the sum (RA|0)+(RB). A store word request for the real address correspond- if RA = 0 then b 0 ing to EA and the contents of RS32:63 are sent to the else b (RA) device identified by RID, bypassing the cache. EA b + (RB) raddr address translation of EA The E bit must be 1. If it is not, the data storage error send load word request for raddr to handler is invoked. device identified by RID RT 320 || word from device EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are Let the effective address (EA) be the sum (RA|0)+(RB). boundedly undefined. A load word request for the real address corresponding This instruction is treated as a Store, except that its to EA is sent to the device identified by RID, bypassing storage access is not performed in program order with the cache. The word returned by the device is placed respect to accesses to other Caching Inhibited and into RT32:63. RT0:31 are set to 0. Guarded storage locations unless software explicitly The E bit must be 1. If it is not, the data storage error imposes that order. handler is invoked. 
See Book III-S for additional information about this EA must be a multiple of 4. If it is not, either the system instruction. alignment error handler is invoked or the results are Special Registers Altered: boundedly undefined. None This instruction is treated as a Load. See Book III-S for additional information about this instruction. Special Registers Altered: None Programming Note The eieio or mbar instruction can be used to ensure that the storage accesses caused by eciwx and ecowx are performed in program order with respect to other Caching Inhibited and Guarded storage accesses. External Control Out Word Indexed X-form ecowx RS,RA,RB 31 RS RA RB 438 / 0 6 11 16 21 31 if RA = 0 then b 0 716 Power ISATM Book II Version 2.06 Appendix A. Assembler Extended Mnemonics In order to make assembler language programs simpler symbols related to instructions defined in Book II. to write and easier to understand, a set of extended Assemblers should provide the extended mnemonics mnemonics and symbols is provided for certain instruc- and symbols listed here, and may provide others. tions. This appendix defines extended mnemonics and A.1 Data Cache Block Flush A.3 Synchronize Mnemonics Mnemonics The L field in the Synchronize instruction controls the scope of the synchronization function performed by the The L field in the Data Cache Block Flush instruction instruction. Extended mnemonics are provided that controls the scope of the flush function performed by represent the L value in the mnemonic rather than the instruction. Extended mnemonics are provided that requiring it to be coded as a numeric operand. Two represent the L value in the mnemonic rather than extended mnemonics are provided for the L=0 value in requiring it to be coded as a numeric operand. order to support Assemblers that do not recognize the Note: dcbf serves as both a basic and an extended sync mnemonic. mnemonic. 
The Assembler will recognize a dcbf mne- Note: sync serves as both a basic and an extended monic with three operands as the basic form, and a mnemonic. The Assembler will recognize a sync mne- dcbf mnemonic with two operands as the extended monic with one operand as the basic form, and a sync form. In the extended form the L operand is omitted mnemonic with no operand as the extended form. In and assumed to be 0. the extended form the L operand is omitted and assumed to be 0. dcbf RA,RB (equivalent to: dcbf RA,RB,0) dcbfl RA,RB (equivalent to: dcbfl RA,RB,1) sync (equivalent to: sync 0) msync (equivalent to: sync 0) lwsync (equivalent to: sync 1) A.2 Load and Reserve ptesync (equivalent to: sync 2) Mnemonics The EH field in the Load and Reserve instructions pro- A.4 Wait Mnemonics vides a hint regarding the type of algorithm imple- mented by the instruction sequence being executed. The WC field in the wait instruction determines the Extended mnemonics are provided that allow the EH condition that causes instruction execution to resume. value to be omitted and assumed to be 0b0. Extended mnemonics are provided that represent the WC value in the mnemonic rather than requiring it to be Note: lbarx, lharx, lwarx, and ldarx serve as both coded as a numeric operand. basic and extended mnemonics. The Assembler will recognize these mnemonics with four operands as the Note: wait serves as both a basic and an extended basic form, and these mnemonics with three operands mnemonic. The Assembler will recognize a wait mne- as the extended form. In the extended form the EH monic with one operand as the basic form, and a wait operand is omitted and assumed to be 0. mnemonic with no operands as the extended form. In the extended form the WC operand is omitted and lbarx RT,RA,RB (equivalent to: lbarx RT,RA,RB,0) assumed to be 0. 
lharx RT,RA,RB (equivalent to: lharx RT,RA,RB,0) lwarx RT,RA,RB (equivalent to: lwarx RT,RA,RB,0) wait (equivalent to: wait 0) ldarx RT,RA,RB (equivalent to: ldarx RT,RA,RB,0) waitrsv (equivalent to: wait 1) waitimpl (equivalent to: wait 2) Appendix A. Assembler Extended Mnemonics 717 Version 2.06 718 Power ISATM Book II Version 2.06 Appendix B. Programming Examples for Sharing Storage This appendix gives examples of how dependencies In these examples it is assumed that contention for the and the Synchronization instructions can be used to shared resource is low; the conditional branches are control storage access ordering when storage is shared optimized for this case by using "+" and "-" suffixes between programs. appropriately. Many of the examples use extended mnemonics (e.g., The examples deal with words; they can be used for bne, bne-, cmpw) that are defined in Appendix E of doublewords by changing all word-specific mnemonics Book I. to the corresponding doubleword-specific mnemonics (e.g., lwarx to ldarx, cmpw to cmpd). Many of the examples use the Load And Reserve and Store Conditional instructions, in a sequence that In this appendix it is assumed that all shared storage begins with a Load And Reserve instruction and ends locations are in storage that is Memory Coherence with a Store Conditional instruction (specifying the Required, and that the storage locations specified by same storage location as the Load Conditional) fol- Load And Reserve and Store Conditional instructions lowed by a Branch Conditional instruction that tests are in storage that is neither Write Through Required whether the Store Conditional instruction succeeded. nor Caching Inhibited. B.1 Atomic Update Primitives An atomic read/modify/write operation reads a storage location and writes its next value, which may be a func- This section gives examples of how the Load And tion of its current value, all as a single atomic operation. 
Reserve and Store Conditional instructions can be The examples shown provide the effect of an atomic used to emulate atomic read/modify/write operations. read/modify/write operation, but use several instruc- tions rather than a single atomic instruction. Fetch and No-op Fetch and Store The "Fetch and No-op" primitive atomically loads the The "Fetch and Store" primitive atomically loads and current value in a word in storage. replaces a word in storage. In this example it is assumed that the address of the In this example it is assumed that the address of the word to be loaded is in GPR 3 and the data loaded are word to be loaded and replaced is in GPR 3, the new returned in GPR 4. value is in GPR 4, and the old value is returned in GPR 5. loop: lwarx r4,0,r3 #load and reserve loop: stwcx. r4,0,r3 #store old value if lwarx r5,0,r3 #load and reserve # still reserved stwcx. r4,0,r3 #store new value if bne- loop #loop if lost reservation # still reserved bne- loop loop if lost reservation Note: 1. The stwcx., if it succeeds, stores to the target location the same value that was loaded by the preceding lwarx. While the store is redundant with respect to the value in the location, its success ensures that the value loaded by the lwarx is still the current value at the time the stwcx. is exe- cuted. Appendix B. Programming Examples for Sharing Storage 719 Version 2.06 Fetch and Add Compare and Swap The "Fetch and Add" primitive atomically increments a The "Compare and Swap" primitive atomically com- word in storage. pares a value in a register with a word in storage, if they are equal stores the value from a second register In this example it is assumed that the address of the into the word in storage, if they are unequal loads the word to be incremented is in GPR 3, the increment is in word from storage into the first register, and sets the GPR 4, and the old value is returned in GPR 5. EQ bit of CR Field 0 to indicate the result of the com- loop: parison. 
lwarx r5,0,r3 #load and reserve In this example it is assumed that the address of the add r0,r4,r5#increment word stwcx. r0,0,r3 #store new value if still res'ved word to be tested is in GPR 3, the comparand is in bne- loop #loop if lost reservation GPR 4 and the old value is returned there, and the new value is in GPR 5. Fetch and AND loop: The "Fetch and AND" primitive atomically ANDs a lwarx r6,0,r3 #load and reserve value into a word in storage. cmpw r4,r6 #1st 2 operands equal? bne- exit #skip if not In this example it is assumed that the address of the stwcx. r5,0,r3 #store new value if still res'ved word to be ANDed is in GPR 3, the value to AND into it bne- loop #loop if lost reservation is in GPR 4, and the old value is returned in GPR 5. exit: mr r4,r6 #return value from storage loop: Notes: lwarx r5,0,r3 #load and reserve and r0,r4,r5#AND word 1. The semantics given for "Compare and Swap" stwcx. r0,0,r3 #store new value if still res'ved above are based on those of the IBM System/370 bne- loop #loop if lost reservation Compare and Swap instruction. Other architec- Note: tures may define a Compare and Swap instruction differently. 1. The sequence given above can be changed to per- form another Boolean operation atomically on a 2. "Compare and Swap" is shown primarily for peda- word in storage, simply by changing the and gogical reasons. It is useful on machines that lack instruction to the desired Boolean instruction (or, the better synchronization facilities provided by xor, etc.). lwarx and stwcx.. 
A major weakness of a Sys- tem/370-style Compare and Swap instruction is that, although the instruction itself is atomic, it Test and Set checks only that the old and current values of the This version of the "Test and Set" primitive atomically word being tested are equal, with the result that loads a word from storage, sets the word in storage to a programs that use such a Compare and Swap to nonzero value if the value loaded is zero, and sets the control a shared resource can err if the word has EQ bit of CR Field 0 to indicate whether the value been modified and the old value subsequently loaded is zero. restored. The sequence shown above has the same weakness. In this example it is assumed that the address of the word to be tested is in GPR 3, the new value (nonzero) 3. In some applications the second bne- instruction is in GPR 4, and the old value is returned in GPR 5. and/or the mr instruction can be omitted. The bne- is needed only if the application requires that loop: if the EQ bit of CR Field 0 on exit indicates "not lwarx r5,0,r3 #load and reserve equal" then (r4) and (r6) are in fact not equal. The cmpwi r5,0 #done if word not equal to 0 mr is needed only if the application requires that if bne- exit the comparands are not equal then the word from stwcx. r4,0,r3 #try to store non-0 storage is loaded into the register with which it was bne- loop #loop if lost reservation compared (rather than into a third register). If exit: ... either or both of these instructions is omitted, the resulting Compare and Swap does not obey Sys- tem/370 semantics. 720 Power ISATM Book II Version 2.06 B.2 Lock Acquisition and Release, and Related Techniques This section gives examples of how dependencies and ment locks, import and export barriers, and similar con- the Synchronization instructions can be used to imple- structs. 
B.2.1 Lock Acquisition and Import Barriers

An "import barrier" is an instruction or sequence of instructions that prevents storage accesses caused by instructions following the barrier from being performed before storage accesses that acquire a lock have been performed. An import barrier can be used to ensure that a shared data structure protected by a lock is not accessed until the lock has been acquired. A sync instruction can be used as an import barrier, but the approaches shown below will generally yield better performance because they order only the relevant storage accesses.

B.2.1.1 Acquire Lock and Import Shared Storage

If lwarx and stwcx. instructions are used to obtain the lock, an import barrier can be constructed by placing an isync instruction immediately following the loop containing the lwarx and stwcx.. The following example uses the "Compare and Swap" primitive to acquire the lock.

In this example it is assumed that the address of the lock is in GPR 3, the value indicating that the lock is free is in GPR 4, the value to which the lock should be set is in GPR 5, the old value of the lock is returned in GPR 6, and the address of the shared data structure is in GPR 9.

    loop:
        lwarx   r6,0,r3,1       #load lock and reserve
        cmpw    r4,r6           #skip ahead if
        bne-    wait            # lock not free
        stwcx.  r5,0,r3         #try to set lock
        bne-    loop            #loop if lost reservation
        isync                   #import barrier
        lwz     r7,data1(r9)    #load shared data
        .
        .
    wait...                     #wait for lock to free

The hint provided with lwarx indicates that after the program acquires the lock variable (i.e. stwcx. is successful), it will release it (i.e. store to it) prior to another program attempting to modify it.

The second bne- does not complete until CR0 has been set by the stwcx.. The stwcx. does not set CR0 until it has completed (successfully or unsuccessfully). The lock is acquired when the stwcx. completes successfully. Together, the second bne- and the subsequent isync create an import barrier that prevents the load from "data1" from being performed until the branch has been resolved not to be taken.

If the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, an lwsync instruction can be used instead of the isync instruction. If lwsync is used, the load from "data1" may be performed before the stwcx.. But if the stwcx. fails, the second branch is taken and the lwarx is re-executed. If the stwcx. succeeds, the value returned by the load from "data1" is valid even if the load is performed before the stwcx., because the lwsync ensures that the load is performed after the instance of the lwarx that created the reservation used by the successful stwcx..

B.2.1.2 Obtain Pointer and Import Shared Storage

If lwarx and stwcx. instructions are used to obtain a pointer into a shared data structure, an import barrier is not needed if all the accesses to the shared data structure depend on the value obtained for the pointer. The following example uses the "Fetch and Add" primitive to obtain and increment the pointer.

In this example it is assumed that the address of the pointer is in GPR 3, the value to be added to the pointer is in GPR 4, and the old value of the pointer is returned in GPR 5.

    loop:
        lwarx   r5,0,r3         #load pointer and reserve
        add     r0,r4,r5        #increment the pointer
        stwcx.  r0,0,r3         #try to store new value
        bne-    loop            #loop if lost reservation
        lwz     r7,data1(r5)    #load shared data

The load from "data1" cannot be performed until the pointer value has been loaded into GPR 5 by the lwarx. The load from "data1" may be performed before the stwcx.. But if the stwcx. fails, the branch is taken and the value returned by the load from "data1" is discarded. If the stwcx. succeeds, the value returned by the load from "data1" is valid even if the load is performed before the stwcx., because the load uses the pointer value returned by the instance of the lwarx that created the reservation used by the successful stwcx..

An isync instruction could be placed between the bne- and the subsequent lwz, but no isync is needed if all accesses to the shared data structure depend on the value returned by the lwarx.

B.2.2 Lock Release and Export Barriers

An "export barrier" is an instruction or sequence of instructions that prevents the store that releases a lock from being performed before stores caused by instructions preceding the barrier have been performed. An export barrier can be used to ensure that all stores to a shared data structure protected by a lock will be performed with respect to any other processor before the store that releases the lock is performed with respect to that processor.

B.2.2.1 Export Shared Storage and Release Lock

A sync instruction can be used as an export barrier independent of the storage control attributes (e.g., presence or absence of the Caching Inhibited attribute) of the storage containing the shared data structure. Because the lock must be in storage that is neither Write Through Required nor Caching Inhibited, if the shared data structure is in storage that is Write Through Required or Caching Inhibited a sync instruction must be used as the export barrier.

In this example it is assumed that the shared data structure is in storage that is Caching Inhibited, the address of the lock is in GPR 3, the value indicating that the lock is free is in GPR 4, and the address of the shared data structure is in GPR 9.

        stw     r7,data1(r9)    #store shared data (last)
        sync                    #export barrier
        stw     r4,lock(r3)     #release lock

The sync ensures that the store that releases the lock will not be performed with respect to any other processor until all stores caused by instructions preceding the sync have been performed with respect to that processor.

B.2.2.2 Export Shared Storage and Release Lock using lwsync

If the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, an lwsync instruction can be used as the export barrier. Using lwsync rather than sync will yield better performance in most systems.

In this example it is assumed that the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, the address of the lock is in GPR 3, the value indicating that the lock is free is in GPR 4, and the address of the shared data structure is in GPR 9.

        stw     r7,data1(r9)    #store shared data (last)
        lwsync                  #export barrier
        stw     r4,lock(r3)     #release lock

The lwsync ensures that the store that releases the lock will not be performed with respect to any other processor until all stores caused by instructions preceding the lwsync have been performed with respect to that processor.

B.2.3 Safe Fetch

If a load must be performed before a subsequent store (e.g., the store that releases a lock protecting a shared data structure), a technique similar to the following can be used.

In this example it is assumed that the address of the storage operand to be loaded is in GPR 3, the contents of the storage operand are returned in GPR 4, and the address of the storage operand to be stored is in GPR 5.

        lwz     r4,0(r3)        #load shared data
        cmpw    r4,r4           #set CR0 to "equal"
        bne-    $-8             #branch never taken
        stw     r7,0(r5)        #store other shared data

An alternative is to use a technique similar to that described in Section B.2.1.2, by causing the stw to depend on the value returned by the lwz and omitting the cmpw and bne-. The dependency could be created by ANDing the value returned by the lwz with zero and then adding the result to the value to be stored by the stw. If both storage operands are in storage that is neither Write Through Required nor Caching Inhibited, another alternative is to replace the cmpw and bne- with an lwsync instruction.

B.3 List Insertion

This section shows how the lwarx and stwcx. instructions can be used to implement simple insertion into a singly linked list. (Complicated list insertion, in which multiple values must be changed atomically, or in which the correct order of insertion depends on the contents of the elements, cannot be implemented in the manner shown below and requires a more complicated strategy such as using locks.)

The "next element pointer" from the list element after which the new element is to be inserted, here called the "parent element", is stored into the new element, so that the new element points to the next element in the list; this store is performed unconditionally. Then the address of the new element is conditionally stored into the parent element, thereby adding the new element to the list.

In this example it is assumed that the address of the parent element is in GPR 3, the address of the new element is in GPR 4, and the next element pointer is at offset 0 from the start of the element. It is also assumed that the next element pointer of each list element is in a reservation granule separate from that of the next element pointer of all other list elements.

    loop:
        lwarx   r2,0,r3     #get next pointer
        stw     r2,0(r4)    #store in new element
        lwsync or sync      #order stw before stwcx
        stwcx.  r4,0,r3     #add new element to list
        bne-    loop        #loop if stwcx. failed

In the preceding example, if two list elements have next element pointers in the same reservation granule then, in a multiprocessor, "livelock" can occur. (Livelock is a state in which processors interact in a way such that no processor makes forward progress.)

If it is not possible to allocate list elements such that each element's next element pointer is in a different reservation granule, then livelock can be avoided by using the following, more complicated, sequence.

        lwz     r2,0(r3)    #get next pointer
    loop1:
        mr      r5,r2       #keep a copy
        stw     r2,0(r4)    #store in new element
        sync                #order stw before stwcx.
                            # and before lwarx
    loop2:
        lwarx   r2,0,r3     #get it again
        cmpw    r2,r5       #loop if changed (someone
        bne-    loop1       # else progressed)
        stwcx.  r4,0,r3     #add new element to list
        bne-    loop2       #loop if failed

In the preceding example, livelock is avoided by the fact that each processor re-executes the stw only if some other processor has made forward progress.

B.4 Notes

1. To increase the likelihood that forward progress is made, it is important that looping on lwarx/stwcx. pairs be minimized. For example, in the "Test and Set" sequence shown in Section B.1, this is achieved by testing the old value before attempting the store; were the order reversed, more stwcx. instructions might be executed, and reservations might more often be lost between the lwarx and the stwcx.

2. The manner in which lwarx and stwcx. are communicated to other processors and mechanisms, and between levels of the storage hierarchy within a given processor, is implementation-dependent. In some implementations performance may be improved by minimizing looping on a lwarx instruction that fails to return a desired value. For example, in the "Test and Set" sequence shown in Section B.1, if the programmer wishes to stay in the loop until the word loaded is zero, he could change the "bne- exit" to "bne- loop". However, in some implementations better performance may be obtained by using an ordinary Load instruction to do the initial checking of the value, as follows.

    loop:
        lwz     r5,0(r3)    #load the word
        cmpwi   r5,0        #loop back if word
        bne-    loop        # not equal to 0
        lwarx   r5,0,r3     #try again, reserving
        cmpwi   r5,0        # (likely to succeed)
        bne-    loop
        stwcx.  r4,0,r3     #try to store non-0
        bne-    loop        #loop if lost reserv'n

3. In a multiprocessor, livelock is possible if there is a Store instruction (or any other instruction that can clear another processor's reservation; see Section 1.7.3.1) between the lwarx and the stwcx. of a lwarx/stwcx. loop and any byte of the storage location specified by the Store is in the reservation granule. For example, the first code sequence shown in Section B.3 can cause livelock if two list elements have next element pointers in the same reservation granule.

Book III-S: Power ISA Operating Environment Architecture - Server Environment [Category: Server]

Chapter 1. Introduction

    1.1 Overview
    1.2 Document Conventions
    1.2.1 Definitions and Notation
    1.2.2 Reserved Fields
    1.3 General Systems Overview
    1.4 Exceptions
    1.5 Synchronization
    1.5.1 Context Synchronization
    1.5.2 Execution Synchronization

1.1 Overview

Chapter 1 of Book I describes computation modes, document conventions, a general systems overview, instruction formats, and storage addressing. This chapter augments that description as necessary for the Power ISA Operating Environment Architecture.

1.2 Document Conventions

The notation and terminology used in Book I apply to this Book also, with the following substitutions.

- For "system alignment error handler" substitute "Alignment interrupt".
- For "system data storage error handler" substitute "Data Storage interrupt", "Hypervisor Data Storage interrupt", or "Data Segment interrupt", as appropriate.
- For "system error handler" substitute "interrupt".
- For "system floating-point enabled exception error handler" substitute "Floating-Point Enabled Exception type Program interrupt".
- For "system illegal instruction error handler" substitute "Hypervisor Emulation Assistance interrupt".
- For "system instruction storage error handler" substitute "Instruction Storage interrupt", "Hypervisor Instruction Storage interrupt", or "Instruction Segment interrupt", as appropriate.
- For "system privileged instruction error handler" substitute "Privileged Instruction type Program interrupt".
- For "system service program" substitute "System Call interrupt".
- For "system trap handler" substitute "Trap type Program interrupt".

1.2.1 Definitions and Notation

The definitions and notation given in Book I are augmented by the following.

Threaded processor, single-threaded processor, thread
    A threaded processor implements one or more "threads", where a thread corresponds to the Book I/II concept of "processor". That is, the definition of "thread" is the same as the Book I definition of "processor", and "processor" as used in Books I and II can be thought of as either a single-threaded processor or as one thread of a multi-threaded processor. The only unqualified uses of "processor" in Book III are in resource names (e.g. Processor Identification Register); such uses should be regarded as meaning "threaded processor". The threads of a multi-threaded processor typically share certain resources, such as the hardware components that execute certain kinds of instructions (e.g., Fixed-Point instructions), certain caches, the address translation mechanism, and certain hypervisor resources.

real page
    A unit of real storage that is aligned at a boundary that is a multiple of its size. The real page size is 4KB.

context of a program
    The state (e.g., privilege and relocation) in which the program executes. The context is controlled by the contents of certain System Registers, such as the MSR and SDR1, of certain lookaside buffers, such as the SLB and TLB, and of the Page Table.

exception
    An error, unusual condition, or external signal, that may set a status bit and may or may not cause an interrupt, depending upon whether the corresponding interrupt is enabled.

interrupt
    The act of changing the machine state in response to an exception, as described in Chapter 6. "Interrupts" on page 821.

trap interrupt
    An interrupt that results from execution of a Trap instruction.

Additional exceptions to the sequential execution model, beyond those described in the section entitled "Instruction Fetching" in Book I, are the following.

- A System Reset or Machine Check interrupt may occur. The determination of whether an instruction is required by the sequential execution model is not affected by the potential occurrence of a System Reset or Machine Check interrupt. (The determination is affected by the potential occurrence of any other kind of interrupt.)
- A context-altering instruction is executed (Chapter 10. "Synchronization Requirements for Context Alterations" on page 861). The context alteration need not take effect until the required subsequent synchronizing operation has occurred.
- A Reference and Change bit is updated by the thread. The update need not be performed with respect to that thread until the required subsequent synchronizing operation has occurred.
- A Branch instruction is executed and the branch is taken. The update of the Come-From Address Register (see Section 8.1.1 of Book III-S) need not occur until a subsequent context synchronizing operation has occurred.

architected location or to an implementation-dependent location. Any use of emulation assists or interrupts to implement the architecture is implementation-dependent.

hypervisor privileged
    A term used to describe an instruction or facility that is available only when the thread is in hypervisor state.

privileged state and supervisor mode
    Used interchangeably to refer to a state in which privileged facilities are available.

problem state and user mode
    Used interchangeably to refer to a state in which privileged facilities are not available.

/, //, ///, ... denotes a field that is reserved in an instruction, in a register, or in an architected storage table.

?, ??, ???, ... denotes a field that is implementation-dependent in an instruction, in a register, or in an architected storage table.

1.2.2 Reserved Fields

Book I's description of the handling of reserved bits in System Registers, and of reserved values of defined fields of System Registers, applies also to the SLB. Book I's description of the handling of reserved values of defined fields of System Registers applies also to architected storage tables (e.g., the Page Table).

Some fields of certain architected storage tables may be written to automatically by the hardware, e.g., Reference and Change bits in the Page Table. When the hardware writes to such a table, the following rules are obeyed.

- Unless otherwise stated, no defined field other than the one(s) specifically being updated are modified.
- Contents of reserved fields are either preserved or written as zero.

Programming Note
Software should set reserved fields in the SLB and in architected storage tables to zero, because these fields may be assigned a meaning in some future version of the architecture.

"must"
    If hypervisor software violates a rule that is stated using the word "must" (e.g., "this field must be set to 0"), and the rule pertains to the contents of a hypervisor resource, to executing an instruction
that can be executed only in hypervisor state, or to accessing storage in real addressing mode, the results are undefined, and may include altering resources belonging to other partitions, causing 1.3 General Systems Overview the system to "hang", etc. The hardware contains the sequencing and processing hardware controls for instruction fetch, instruction execution, and Any combination of hard-wired implementation, interrupt action. Most implementations also contain emulation assist, or interrupt for software assis- data and instruction caches. Instructions that the pro- tance. In the last case, the interrupt may be to an cessing unit can execute fall into the following classes: 728 Power ISATM Book III-S Version 2.06 instructions executed in the Branch Facility precede the operation have completed to a point at instructions executed in the Fixed-Point Facility which they have reported all exceptions they will instructions executed in the Floating-Point Facility cause. instructions executed in the Vector Facility 3. The operation ensures that the instructions that Almost all instructions executed in the Branch Facility, precede the operation will complete execution in Fixed-Point Facility, Floating-Point Facility, and Vector the context (privilege, relocation, storage protec- Facility are nonprivileged and are described in Book I. tion, etc.) in which they were initiated, except that Book II may describe additional nonprivileged instruc- the operation has no effect on the context in which tions (e.g., Book II describes some nonprivileged the associated Reference and Change bit updates instructions for cache management). Instructions are performed. related to the privileged state, control of hardware 4. 
If the operation directly causes an interrupt (e.g., resources, control of the storage hierarchy, and all sc directly causes a System Call interrupt) or is an other privileged instructions are described here or are interrupt, the operation is not initiated until no implementation-dependent. exception exists having higher priority than the exception associated with the interrupt (see Sec- tion 6.8). 1.4 Exceptions 5. The operation ensures that the instructions that fol- The following augments the exceptions defined in Book low the operation will be fetched and executed in I that can be caused directly by the execution of an the context established by the operation. (This instruction: requirement dictates that any prefetched instruc- the execution of a floating-point instruction when tions be discarded and that any effects and side MSRFP=0 (Floating-Point Unavailable interrupt) effects of executing them out-of-order also be dis- carded, except as described in Section 5.5, "Per- an attempt to modify a hypervisor resource when forming Operations Out-of-Order".) the thread is in privileged but non-hypervisor state (see Chapter 2), or an attempt to execute a hyper- Programming Note visor-only instruction (e.g., tlbie) when the thread A context synchronizing operation is necessarily is in privileged but non-hypervisor state execution synchronizing; see Section 1.5.2. the execution of a traced instruction (Trace inter- Unlike the Synchronize instruction, a context syn- rupt) chronizing operation does not affect the order in the execution of a Vector instruction when the vec- which storage accesses are performed. tor facility is unavailable (Vector Unavailable inter- rupt) Item 2 permits a choice only for isync (and sync and ptesync; see Section 1.5.2) because all other execution synchronizing operations also alter con- 1.5 Synchronization text. The synchronization described in this section refers to the state of the thread that is performing the synchroni- zation. 
1.5.1 Context Synchronization An instruction or event is context synchronizing if it sat- isfies the requirements listed below. Such instructions and events are collectively called context synchronizing operations. The context synchronizing operations are the isync instruction, the System Linkage instructions, the mtmsr[d] instructions with L=0, and most interrupts (see Section 6.4). 1. The operation causes instruction dispatching (the issuance of instructions by the instruction fetching mechanism to any instruction execution mecha- nism) to be halted. 2. The operation is not initiated or, in the case of isync, does not complete, until all instructions that Chapter 1. Introduction 729 Version 2.06 1.5.2 Execution Synchronization An instruction is execution synchronizing if it satisfies items 2 and 3 of the definition of context synchroniza- tion (see Section 1.5.1). sync and ptesync are treated like isync with respect to item 2. The execution syn- chronizing instructions are sync, ptesync, the mtmsr[d] instructions with L=1, and all context syn- chronizing instructions. Programming Note Unlike a context synchronizing operation, an exe- cution synchronizing instruction does not ensure that the instructions following that instruction will execute in the context established by that instruc- tion. This new context becomes effective some- time after the execution synchronizing instruction completes and before or at a subsequent context synchronizing operation. 730 Power ISATM Book III-S Version 2.06 Chapter 2. Logical Partitioning (LPAR) 2.1 Overview. . . . . . . . . . . . . . . . . . . . 731 2.6 Processor Compatibility Register 2.2 Logical Partitioning Control Register (PCR). . . . . . . . . . . . . . . . . . . . . . . . . . 735 (LPCR) . . . . . . . . . . . . . . . . . . . . . . . . 731 2.7 Other Hypervisor Resources. . . . . 737 2.3 Real Mode Offset Register (RMOR) . . 2.8 Sharing Hypervisor Resources . . . 
2.1 Overview

The Logical Partitioning (LPAR) facility permits threads and portions of real storage to be assigned to logical collections called partitions, such that a program executing on a thread in one partition cannot interfere with any program executing on a thread in a different partition. This isolation can be provided for both problem state and privileged state programs, by using a layer of trusted software, called a hypervisor program (or simply a "hypervisor"), and the resources provided by this facility to manage system resources. (A hypervisor is a program that runs in hypervisor state; see below.)

The number of partitions supported is implementation-dependent.

A thread is assigned to one partition at any given time. A thread can be assigned to any given partition without consideration of the physical configuration of the system (e.g., shared registers, caches, organization of the storage hierarchy), except that threads that share certain hypervisor resources may need to be assigned to the same partition; see Section 2.8. The registers and facilities used to control Logical Partitioning are listed below and described in the following subsections.

Except in the following subsections, references to the "operating system" in this document include the hypervisor unless otherwise stated or obvious from context.

2.2 Logical Partitioning Control Register (LPCR)

The layout of the Logical Partitioning Control Register (LPCR) is shown in Figure 1 below.

    Bits     Field
    0:2      VC
    3:8      ///
    9:11     DPFD
    12:16    VRMASD
    17:33    ///
    34:37    RMLS
    38       ILE
    39:48    ///
    49:51    PECE
    52       MER
    53       ///
    54       TC
    55:59    ///
    60:61    LPES
    62       ///
    63       HDICE

Figure 1. Logical Partitioning Control Register

The contents of the LPCR control a number of aspects of the operation of the thread with respect to a logical partition. Below are shown the bit definitions for the LPCR.

Bit     Description

0:2     Virtualization Control (VC)
        Controls the virtualization of partition memory. This field contains two subfields, VPM and ISL.

        0:1   Virtualized Partition Memory (VPM)
              This field controls whether VPM mode is enabled as specified below. (See Section 5.7.3.4 and Section 5.7.2, "Virtualized Partition Memory (VPM) Mode" for additional information on VPM mode.)

              Bit   Description
              0     This bit controls whether VPM mode is enabled when address translation is disabled.
                    0 - VPM mode disabled
                    1 - VPM mode enabled
              1     This bit controls whether VPM mode is enabled when address translation is enabled.
                    0 - VPM mode disabled
                    1 - VPM mode enabled

        2     Ignore SLB Large Page Specification (ISL)
              Controls whether ISL mode is enabled as specified below.
                    0 - ISL mode disabled
                    1 - ISL mode enabled
              When ISL mode is enabled and address translation is enabled and the thread is not in hypervisor state, address translation is performed as if the contents of SLB[L||LP] were 0b000. When address translation is disabled, the setting of the ISL bit has no effect. ISL mode has no effect on SLB, TLB, and ERAT entry invalidations caused by slbie, slbia, tlbia, and tlbie.

3:8     Reserved

9:11    Default Prefetch Depth (DPFD)
        The DPFD field is used as the default prefetch depth for data stream prefetching when DSCR[DPFD]=0; see page 682.

12:16   Virtual Real Mode Area Segment Descriptor (VRMASD)
        When address translation is disabled and VPM[0]=1, the contents of this field specify the L and LP fields of the segment descriptor that apply for storage references to the virtualized real mode area (VRMA). See Section 5.7.3.4 for additional information. The definitions and allowed values of the L and LP fields are the same as for the corresponding fields in the segment descriptor. (See Section 5.7.7.) If VPM[0]=0 or address translation is enabled, the setting of the VRMASD has no effect.

        Bit   Description
        0     Virtual Page Size Selector Bit 0 (L)
        1:2   Reserved
        3:4   Virtual Page Size Selector Bits 1:2 (LP)

17:33   Reserved

34:37   Real Mode Limit Selector (RMLS)
        The RMLS field specifies the largest effective address that can be used by partition software when address translation is disabled. The valid RMLS values are implementation-dependent, and each value corresponds to a maximum effective address of 2^m, where m has a minimum value of 12 and a maximum value equal to the number of bits in the real address size supported by the implementation.

38      Interrupt Little-Endian (ILE)
        The contents of the ILE bit are copied into MSR[LE] by interrupts that set MSR[HV] to 0 (see Section 6.5), to establish the Endian mode for the interrupt handler.

39:48   Reserved

49:51   Power-saving mode Exit Cause Enable (PECE)

        49    If PECE[0] = 1 when a power-saving mode instruction is executed, External exceptions are enabled to cause exit from power-saving mode; otherwise External exceptions are disabled from causing exit from power-saving mode.
        50    If PECE[1] = 1 when a power-saving mode instruction is executed, Decrementer exceptions are enabled to cause exit from power-saving mode; otherwise Decrementer exceptions are disabled from causing exit from power-saving mode. (In sleep and rvwinkle power-saving levels, Decrementer exceptions do not occur if the state of the Decrementer is not maintained and updated as if the thread was not in power-saving mode.)
        51    If PECE[2] = 1 when a power-saving mode instruction is executed, Machine Check, Hypervisor Maintenance, and certain implementation-specific exceptions are enabled to cause exit from power-saving mode; otherwise Machine Check, Hypervisor Maintenance, and the same implementation-specific exceptions are disabled from causing exit from power-saving mode.

        It is implementation-specific whether the exceptions enabled by the PECE field cause exit from sleep and rvwinkle power-saving levels. See Section 6.5.1 and Section 6.5.2 for additional information about exit from power-saving mode.

52      Mediated External Exception Request (MER)
        0     A Mediated External exception is not requested.
        1     A Mediated External exception is requested.

        The exception effects of this bit are said to be consistent with the contents of this bit if one of the following statements is true.
        - LPCR[MER] = 1 and a Mediated External exception exists.
        - LPCR[MER] = 0 and a Mediated External exception does not exist.

        A context synchronizing instruction or event that is executed or occurs when LPCR[MER] = 0 ensures that the exception effects of LPCR[MER] are consistent with the contents of LPCR[MER]. Otherwise, when an instruction changes the contents of LPCR[MER], the exception effects of LPCR[MER] become consistent with the new contents of LPCR[MER] reasonably soon after the change.

        Programming Note
            LPCR[MER] provides a means for the hypervisor to direct an external exception to a partition independent of the partition's MSR[EE] setting. (When MSR[EE]=0, it is inappropriate for the hypervisor to deliver the exception.) Using LPCR[MER], the partition can be interrupted upon enabling external interrupts. Without using LPCR[MER], the hypervisor must check the state of MSR[EE] whenever it gets control, which will result in less timely delivery of the exception to the partition.

53      Reserved

54      Translation Control (TC)
        0     The secondary Page Table search is enabled.
        1     The secondary Page Table search is disabled.

55:59   Reserved

60:61   Logical Partitioning Environment Selector (LPES)
        Three of the four LPES values are supported. The 0b10 value is reserved.

        60    LPES[0]
              Controls whether External interrupts set MSR[HV] to 1 and MSR[RI] to 0, or leave them unchanged.
        61    LPES[1]
              Controls how storage is accessed when address translation is disabled, and whether a subset of interrupts set MSR[HV] to 1.

              Programming Note
                  LPES[1]=0 provides an environment in which only the hypervisor can run with address translation disabled and in which all interrupts invoke the hypervisor. This value (along with MSR[HV]=1) can also be used in a system that is not partitioned, to permit the operating system to access all system resources.

62      Reserved

63      Hypervisor Decrementer Interrupt Conditionally Enable (HDICE)
        0     Hypervisor Decrementer interrupts are disabled.
        1     Hypervisor Decrementer interrupts are enabled if permitted by MSR[EE], MSR[HV], and MSR[PR]; see Section 6.5.12 on page 839.

See Section 5.7.3 on page 774 (including subsections) and Section 5.7.9 on page 790 for a description of how storage accesses are affected by the setting of LPES[1] and RMLS. See Section 6.5 on page 828 for a description of how the setting of LPES[0:1] affects the processing of interrupts.

2.3 Real Mode Offset Register (RMOR)

The layout of the Real Mode Offset Register (RMOR) is shown in Figure 2 below.

    Bits     Name    Description
    0:3      //      Reserved
    4:63     RMO     Real Mode Offset

Figure 2. Real Mode Offset Register

All other fields are reserved.

The supported RMO values are the non-negative multiples of 2^s, where 2^s is the smallest implementation-dependent limit value representable by the contents of the Real Mode Limit Selector field of the LPCR.
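The LPCR field positions given in Section 2.2, together with the RMO constraint just stated, can be sketched in C as follows. This is only an illustrative sketch, not part of the architecture: the helper names (`field`, `decode_lpcr`, `rmo_is_supported`), the `lpcr_fields` structure, and the parameter `s` are hypothetical, and the sketch assumes the register image is already held in a 64-bit integer. The one architectural fact it relies on is that bit 0 is the most-significant bit of the register.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits first..last (inclusive) of a 64-bit register image,
   using the numbering in which bit 0 is the most-significant bit.
   Hypothetical helper, for illustration only. */
static uint64_t field(uint64_t reg, unsigned first, unsigned last)
{
    assert(first <= last && last <= 63);
    unsigned width = last - first + 1;
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (reg >> (63 - last)) & mask;
}

/* Hypothetical container for the defined LPCR fields. */
struct lpcr_fields {
    unsigned vc;      /* bits 0:2   Virtualization Control (VPM || ISL) */
    unsigned dpfd;    /* bits 9:11  Default Prefetch Depth              */
    unsigned vrmasd;  /* bits 12:16 VRMA Segment Descriptor             */
    unsigned rmls;    /* bits 34:37 Real Mode Limit Selector            */
    unsigned ile;     /* bit  38    Interrupt Little-Endian             */
    unsigned pece;    /* bits 49:51 Power-saving mode Exit Cause Enable */
    unsigned mer;     /* bit  52    Mediated External Exception Request */
    unsigned tc;      /* bit  54    Translation Control                 */
    unsigned lpes;    /* bits 60:61 Logical Partitioning Env. Selector  */
    unsigned hdice;   /* bit  63    Hypervisor Decrementer Int. Enable  */
};

static struct lpcr_fields decode_lpcr(uint64_t lpcr)
{
    struct lpcr_fields f;
    f.vc     = (unsigned)field(lpcr, 0, 2);
    f.dpfd   = (unsigned)field(lpcr, 9, 11);
    f.vrmasd = (unsigned)field(lpcr, 12, 16);
    f.rmls   = (unsigned)field(lpcr, 34, 37);
    f.ile    = (unsigned)field(lpcr, 38, 38);
    f.pece   = (unsigned)field(lpcr, 49, 51);
    f.mer    = (unsigned)field(lpcr, 52, 52);
    f.tc     = (unsigned)field(lpcr, 54, 54);
    f.lpes   = (unsigned)field(lpcr, 60, 61);
    f.hdice  = (unsigned)field(lpcr, 63, 63);
    return f;
}

/* An RMO value is supported only if it is a multiple of 2^s, where s is
   implementation-dependent (passed in here as an assumption). */
static int rmo_is_supported(uint64_t rmo, unsigned s)
{
    assert(s < 64);
    return (rmo & ((1ULL << s) - 1)) == 0;
}
```

A caller would pass a saved LPCR image to `decode_lpcr` and inspect, for example, `lpes == 2` to detect the reserved 0b10 encoding. The mapping from an RMLS value to its limit 2^m is implementation-dependent and is deliberately not modeled here.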
The contents of the RMOR affect how some storage accesses are performed as described in Section 5.7.3 on page 774 and Section 5.7.4 on page 777.

2.4 Hypervisor Real Mode Offset Register (HRMOR)

The layout of the Hypervisor Real Mode Offset Register (HRMOR) is shown in Figure 3 below.

    Bits     Name    Description
    0:3      //      Reserved
    4:63     HRMO    Real Mode Offset

Figure 3. Hypervisor Real Mode Offset Register

All other fields are reserved.

The supported HRMO values are the non-negative multiples of 2^r, where r is an implementation-dependent value and 12 ≤ r ≤ 26.

The contents of the HRMOR affect how some storage accesses are performed as described in Section 5.7.3 on page 774 and Section 5.7.4 on page 777.

2.5 Logical Partition Identification Register (LPIDR)

The layout of the Logical Partition Identification Register (LPIDR) is shown in Figure 4 below.

    Bits     Name    Description
    32:63    LPID    Logical Partition Identifier

Figure 4. Logical Partition Identification Register

The contents of the LPIDR identify the partition to which the thread is assigned, affecting operations necessary to manage the coherency of some translation lookaside buffers (see Section 5.10.1 and Chapter 10.). The number of LPIDR bits supported is implementation-dependent.

Programming Note
    On some implementations, software must prevent the execution of a tlbie instruction with an LPID operand value which matches the contents of another thread's LPIDR that is being modified or is the same as the new value being written to the LPIDR. This restriction can be met with less effort if one partition identity is used only on threads on which no tlbie instruction is ever executed. This partition can be thought of as the transfer partition used exclusively to move a thread from one partition to another.

2.6 Processor Compatibility Register (PCR)

The layout of the Processor Compatibility Register (PCR) is shown in Figure 5 below.

    Bits     Field
    0        VEC
    1        VSX
    2:61     ///
    62       v2.05
    63       //

Figure 5. Processor Compatibility Register

Starting from bit 0, high-order PCR bits are assigned to control the availability of certain categories. Starting from bit 63, low-order PCR bits are assigned to control the availability of resources that are new in a specified version of the Architecture. These low-order bits have the possibility of subsetting the function provided by a category for which availability is controlled by the high-order bits. For example, if a new function is added to the Vector category in V 2.06, the V 2.05 bit and the VEC bit may be used together to enable a version of the Vector category that was available in V 2.05.

Each defined bit in the PCR controls whether certain instructions, SPRs, and other related facilities are available in problem state. Except as specified elsewhere in this section, the PCR has no effect on facilities when the thread is not in problem state. Facilities that are made unavailable by the PCR are treated as follows when the thread is in problem state.

- Instructions are treated as illegal instructions.
- SPRs are treated as if they were not defined for the implementation.
- Fields in instructions are treated as if they were 0s.

A PCR bit may also determine how an instruction field value is interpreted or may define other behavior as specified in the bit definitions below.

The PCR has no effect on the setting of the MSR and [H]SRR1 by interrupts, and by the [h]rfid and mtmsr[d] instructions, except as specified elsewhere in this section.

When facilities that have enable bits in the MSR are made unavailable by the value in the PCR (e.g. Vector Facility), they become unavailable as specified above regardless of whether they are enabled by the corresponding MSR bit.

The bit definitions for the PCR are shown below.

Bit     Description

0       Vector (VEC) [Category: Vector]
        This bit controls the availability, in problem state, of the instructions and facilities in the Vector category as it is defined in the latest version of the architecture for which new problem-state resources are made available.
        0     The instructions and facilities in the Vector category are available in problem state.
        1     The instructions and facilities in the Vector category are unavailable in problem state.

        Programming Note
            Since the category of the VRSAVE register was changed from Vector to Base in versions of the architecture subsequent to Version 2.05, the availability of the VRSAVE register in problem state is controlled by the Vector bit only when the Version 2.05 bit is set to 1.

1       VSX [Category: VSX]
        This bit controls the availability, in problem state, of the instructions and facilities in the VSX category as it was defined in the latest version of the architecture for which new problem-state resources are made available; if the VSX category was not defined in that version of the architecture, then VSX instructions and facilities are unavailable.
        0     The instructions and facilities in the VSX category are available in problem state.
        1     The instructions and facilities in the VSX category are unavailable in problem state.

        Programming Note
            Since facilities in the VSX category were not defined in Version 2.05, these facilities are not available in problem state when the v2.05 bit is set to 1 regardless of the value of the VSX bit.

2:61    Reserved

62      Version 2.05 (v2.05)
        This bit controls the availability, in problem state, of the following instructions, facilities, and behaviors that were newly available in problem state in the version of the architecture subsequent to Version 2.05.
        - AMR access using SPR 13
        - addg6s
        - bperm
        - cdtbcd, cbcdtd
        - dcffix[.]
        - divde[o][.], divdeu[o][.], divwe[o][.], divweu[o][.]
        - isel
        - lfiwzx [Category: Floating-Point: Phased-In]
        - fctidu[.], fctiduz[.], fctiwu[.], fctiwuz[.], fcfids[.], fcfidu[.], fcfidus[.], ftdiv, ftsqrt [Category: Floating-Point: Phased-In]
        - ldbrx, stdbrx [Category: 64-bit]
        - popcntw, popcntd

        0     The listed instructions, facilities, and behaviors are available in the privilege states specified above.
        1     The listed instructions, facilities, and behaviors are unavailable in the privilege states specified above.

        Programming Note
            The definition of the VSX bit makes facilities in the VSX category unavailable in problem state when the v2.05 bit is set to 1 since these facilities were not defined in Version 2.05. See the VSX bit definition for additional information.

63      Reserved

The initial state of the PCR is all 0s.

Programming Note
    Because the PCR has no effect on privileged instructions except as specified above, privileged instructions that are available on newer implementations but not available on older implementations will behave differently when the thread is in problem state. On older implementations, either an Illegal Instruction type Program interrupt or a Hypervisor Emulation Assistance interrupt will occur because the instruction is undefined; on newer implementations, a Privileged Instruction type Program interrupt will occur because the instruction is implemented. (On older implementations the interrupt will be an Illegal Instruction type Program interrupt if the implementation complies with a version of the architecture that precedes V. 2.05, or complies with V. 2.05 and does not support the Hypervisor Emulation Assistance category, and will be a Hypervisor Emulation Assistance interrupt otherwise.)

    In future versions of the architecture, in general the lowest-order reserved bit of the PCR will be used to control the availability of the instructions and related resources that are new in that version of the architecture; the name of the bit will correspond to the previous version of the architecture (i.e. the newest version in which the instructions and related resources were not available).

    In these future versions of the architecture, there will be a requirement that if any bit of the low-order defined bits is set to 1 then all higher-order bits of the defined low-order bits must also be set to 1, and the architecture version with which the implementation appears to comply, in problem state, will be the version corresponding to the name of the lowest-order 1 bit in the set of defined low-order PCR bits, or the current architecture version if none of these bits are 1. Also, in general the highest-order reserved bits will be used to control the availability of sets of instructions and related resources having the requirement that their availability be independent of versions of the architecture.

2.7 Other Hypervisor Resources

In addition to the resources described above, all hypervisor privileged instructions as well as the following resources are hypervisor resources, accessible to software only when the thread is in hypervisor state except as noted below.

- All implementation-specific resources, including implementation-specific registers (e.g., "HID" registers), that control hardware functions or affect the results of instruction execution. Examples include resources that disable caches, disable hardware error detection, set breakpoints, control power management, or significantly affect performance.
- ME bit of the MSR
- SPRs defined as hypervisor-privileged in Section 4.4.4. (Note: Although the Time Base, the PURR, and the SPURR can be altered only by a hypervisor program, the Time Base can be read by all programs and the PURR and SPURR can be read when the thread is in privileged state.)

The contents of a hypervisor resource can be modified by the execution of an instruction (e.g., mtspr) only in hypervisor state (MSR[HV PR] = 0b10). An attempt to modify the contents of a given hypervisor resource, other than MSR[ME], in privileged but non-hypervisor state (MSR[HV PR] = 0b00) causes a Privileged Instruction type Program interrupt. An attempt to modify MSR[ME] in privileged but non-hypervisor state is ignored (i.e., the bit is not changed).

Programming Note
    Because the SPRs listed above are privileged for writing, an attempt to modify the contents of any of these SPRs in problem state (MSR[PR]=1) using mtspr causes a Privileged Instruction type Program exception, and similarly for MSR[ME].

2.8 Sharing Hypervisor Resources

Some hypervisor resources may be shared among threads. Programs that modify these resources must be aware of this sharing, and must allow for the fact that changes to these resources may affect more than one thread. The following resources may be shared among threads.

- RMOR (see Section 2.3)
- HRMOR (see Section 2.4)
- LPIDR (see Section 2.5)
- PCR [Category: Processor Control] (see Section 2.6)
- PVR (see Section 4.3.1)
- SDR1 (see Section 5.7.7.2)
- AMOR (see Section 5.7.9.1)
- HMEER (see Section 6.2.9)
- Time Base (see Section 7.2)
- Hypervisor Decrementer (see Section 7.4)
- certain implementation-specific registers or implementation-specific fields in architected registers

The set of resources that are shared is implementation-dependent.

Threads that share any of the resources listed above, with the exception of the PVR and the HRMOR, must be in the same partition.

For each field of the LPCR, except the HDICE field and the MER field, software must ensure that the contents of the field are identical among all threads that are in the same partition and are in a state such that the contents of the field could have side effects. (E.g., software must ensure that the contents of LPCR[LPES] are identical among all threads that are in the same partition and are not in hypervisor state.)
For the HDICE field, software must ensure that the contents of the field are identical among all threads that share the Hypervisor Decrementer and are in a state such that the contents of the field could have side effects. (There are no identity requirements for the MER field.)

2.9 Hypervisor Interrupt Little-Endian (HILE) Bit

The Hypervisor Interrupt Little-Endian (HILE) bit is a bit in an implementation-dependent register or similar mechanism. The contents of the HILE bit are copied into MSR[LE] by interrupts that set MSR[HV] to 1 (see Section 6.5), to establish the Endian mode for the interrupt handler. The HILE bit is set, by an implementation-dependent method, during system initialization, and cannot be modified after system initialization.

The contents of the HILE bit must be the same for all threads under the control of a given instance of the hypervisor; otherwise all results are undefined.

Chapter 3. Branch Facility

    3.1 Branch Facility Overview
    3.2 Branch Facility Registers
    3.2.1 Machine State Register
    3.3 Branch Facility Instructions
    3.3.1 System Linkage Instructions
    3.3.2 Power-Saving Mode Instructions
    3.3.2.1 Entering and Exiting Power-Saving Mode

3.1 Branch Facility Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Branch Facility that are not covered in Book I.

3.2 Branch Facility Registers

3.2.1 Machine State Register

The Machine State Register (MSR) is a 64-bit register. This register defines the state of the thread. On interrupt, the MSR bits are altered in accordance with Figure 44 on page 828. The MSR can also be modified by the mtmsr[d], rfid, and hrfid instructions. It can be read by the mfmsr instruction.

Figure 6. Machine State Register (a single 64-bit field, MSR, bits 0:63)

Below are shown the bit definitions for the Machine State Register.

Bit     Description

0       Sixty-Four-Bit Mode (SF)
        0  The thread is in 32-bit mode.
        1  The thread is in 64-bit mode.

1:2     Reserved

3       Hypervisor State (HV)
        0  The thread is not in hypervisor state.
        1  If MSR[PR]=0 the thread is in hypervisor state; otherwise the thread is not in hypervisor state.

        Programming Note
        The privilege state of the thread is determined by MSR[HV] and MSR[PR], as follows.

            HV  PR
            0   0   privileged
            0   1   problem
            1   0   privileged and hypervisor
            1   1   problem

        MSR[HV] can be set to 1 only by the System Call instruction and some interrupts. It can be set to 0 only by rfid and hrfid.

4:37    Reserved

38      Vector Available (VEC) [Category: Vector]
        0  The thread cannot execute any vector instructions, including vector loads, stores, and moves.
        1  The thread can execute vector instructions.

39      Reserved

40      VSX Available (VSX)
        0  The thread cannot execute any VSX instructions, including VSX loads, stores, and moves.
        1  The thread can execute VSX instructions.

41:46   Reserved

47      Reserved

48      External Interrupt Enable (EE)
        0  External and Decrementer interrupts are disabled.
        1  External and Decrementer interrupts are enabled.
        This bit also affects whether Hypervisor Decrementer and Hypervisor Maintenance interrupts are enabled; see Section 6.5.12 on page 839 and Section 6.2.9 on page 823.

49      Problem State (PR)
        0  The thread is in privileged state.
        1  The thread is in problem state.

        Programming Note
        Any instruction that sets MSR[PR] to 1 also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

50      Floating-Point Available (FP) [Category: Floating-Point]
        0  The thread cannot execute any floating-point instructions, including floating-point loads, stores, and moves.
        1  The thread can execute floating-point instructions.

51      Machine Check Interrupt Enable (ME)
        0  Machine Check interrupts are disabled.
        1  Machine Check interrupts are enabled.
        This bit is a hypervisor resource; see Chapter 2., "Logical Partitioning (LPAR)", on page 731.

        Programming Note
        The only instructions that can alter MSR[ME] are rfid and hrfid.

52      Floating-Point Exception Mode 0 (FE0) [Category: Floating-Point]
        See below.

53      Single-Step Trace Enable (SE) [Category: Trace]
        0  The thread executes instructions normally.
        1  The thread generates a Single-Step type Trace interrupt after successfully completing the execution of the next instruction, unless that instruction is hrfid or rfid, which are never traced. Successful completion means that the instruction caused no other interrupt.

54      Branch Trace Enable (BE) [Category: Trace]
        0  The thread executes branch instructions normally.
        1  The thread generates a Branch type Trace interrupt after completing the execution of a branch instruction, whether or not the branch is taken.
        Branch tracing need not be supported on all implementations that support the Trace category. If the function is not implemented, this bit is treated as reserved.

55      Floating-Point Exception Mode 1 (FE1) [Category: Floating-Point]
        See below.

56:57   Reserved

58      Instruction Relocate (IR)
        0  Instruction address translation is disabled.
        1  Instruction address translation is enabled.

        Programming Note
        See the Programming Note in the definition of MSR[PR].

59      Data Relocate (DR)
        0  Data address translation is disabled. Effective Address Overflow (EAO) (see Book I) does not occur.
        1  Data address translation is enabled. EAO causes a Data Storage interrupt.

        Programming Note
        See the Programming Note in the definition of MSR[PR].

60      Reserved

61      Performance Monitor Mark (PMM) [Category: Server.Performance Monitor]
        See Appendix B of Book III-S.

62      Recoverable Interrupt (RI)
        0  Interrupt is not recoverable.
        1  Interrupt is recoverable.
        Additional information about the use of this bit is given in Sections 6.4.3, "Interrupt Processing" on page 825, 6.5.1, "System Reset Interrupt" on page 829, and 6.5.2, "Machine Check Interrupt" on page 831.

63      Little-Endian Mode (LE)
        0  The thread is in Big-Endian mode.
        1  The thread is in Little-Endian mode.

        Programming Note
        The only instructions that can alter MSR[LE] are rfid and hrfid.

The Floating-Point Exception Mode bits FE0 and FE1 are interpreted as shown below. For further details see Book I.

    FE0  FE1  Mode
    0    0    Ignore Exceptions
    0    1    Imprecise Nonrecoverable
    1    0    Imprecise Recoverable
    1    1    Precise

3.3 Branch Facility Instructions

3.3.1 System Linkage Instructions

These instructions provide the means by which a program can call upon the system to perform a service, and by which the system can return from performing a service or from processing an interrupt.

The System Call instruction is described in Book I, but only at the level required by an application programmer. A complete description of this instruction appears below.

System Call SC-form

sc LEV

    17 | /// | /// | //  | LEV | //  | 1  | /
    0    6     11    16    20    27    30   31

Programming Note
sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form the LEV operand is omitted and assumed to be 0.

    SRR0 ←iea CIA + 4
    SRR1[33:36, 42:47] ← 0
    SRR1[0:32, 37:41, 48:63] ← MSR[0:32, 37:41, 48:63]
    MSR ← new_value (see below)
    NIA ← 0x0000_0000_0000_0C00

The effective address of the instruction following the System Call instruction is placed into SRR0. Bits 0:32, 37:41, and 48:63 of the MSR are placed into the corresponding bits of SRR1, and bits 33:36 and 42:47 of SRR1 are set to zero.

Then a System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 6.5, "Interrupt Definitions" on page 828. The setting of the MSR is affected by the contents of the LEV field. LEV values greater than 1 are reserved.
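As a hedged illustration of the register updates described so far for sc, the following Python sketch models how SRR0 and SRR1 are filled. It is not part of the architecture: the names ibm_bits and sc_save_state are hypothetical, the sketch assumes 64-bit mode (no truncation of the effective address is modeled), and it stops before the interrupt sets the MSR. Its main purpose is to show the bit numbering used throughout this chapter, in which bit 0 is the most significant bit of a 64-bit register.

```python
MASK64 = (1 << 64) - 1


def ibm_bits(first, last, width=64):
    """Mask covering bits first..last in IBM numbering (bit 0 is the most significant)."""
    nbits = last - first + 1
    return ((1 << nbits) - 1) << (width - 1 - last)


# MSR bits that sc copies into SRR1; SRR1 bits 33:36 and 42:47 are set to zero.
SRR1_COPY_MASK = ibm_bits(0, 32) | ibm_bits(37, 41) | ibm_bits(48, 63)


def sc_save_state(cia, msr):
    """Return (SRR0, SRR1) as written by sc, assuming 64-bit mode (illustrative only)."""
    srr0 = (cia + 4) & MASK64      # address of the instruction following sc
    srr1 = msr & SRR1_COPY_MASK    # copied MSR bits, with 33:36 and 42:47 cleared
    return srr0, srr1
```

With every MSR bit set to 1, the sketch produces an SRR1 value of 0xFFFF_FFFF_87C0_FFFF, which makes the two cleared bit ranges visible.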
Bits 0:5 of the LEV field (instruction bits 20:25) are treated as a reserved field.

The interrupt causes the next instruction to be fetched from effective address 0x0000_0000_0000_0C00.

This instruction is context synchronizing.

Special Registers Altered:
    SRR0 SRR1 MSR

Programming Note
If LEV=1 the hypervisor is invoked. If LPES[1]=1, executing this instruction with LEV=1 is the only way that executing an instruction can cause hypervisor state to be entered.

Because this instruction is not privileged, it is possible for application software to invoke the hypervisor. However, such invocation should be considered a programming error.

Return From Interrupt Doubleword XL-form

rfid

    19 | /// | /// | /// | 18  | /
    0    6     11    16    21    31

    MSR[51] ← (MSR[3] & SRR1[51]) | ((¬MSR[3]) & MSR[51])
    MSR[3] ← MSR[3] & SRR1[3]
    MSR[48] ← SRR1[48] | SRR1[49]
    MSR[58] ← SRR1[58] | SRR1[49]
    MSR[59] ← SRR1[59] | SRR1[49]
    MSR[0:2, 4:32, 37:41, 49:50, 52:57, 60:63] ← SRR1[0:2, 4:32, 37:41, 49:50, 52:57, 60:63]
    NIA ←iea SRR0[0:61] || 0b00

If MSR[3]=1 then bits 3 and 51 of SRR1 are placed into the corresponding bits of the MSR. The result of ORing bits 48 and 49 of SRR1 is placed into MSR[48]. The result of ORing bits 58 and 49 of SRR1 is placed into MSR[58]. The result of ORing bits 59 and 49 of SRR1 is placed into MSR[59]. Bits 0:2, 4:32, 37:41, 49:50, 52:57, and 60:63 of SRR1 are placed into the corresponding bits of the MSR.

If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR0[0:61] || 0b00 (when SF=1 in the new MSR value) or ³²0 || SRR0[32:61] || 0b00 (when SF=0 in the new MSR value). If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or HSRR0 by the interrupt processing mechanism (see Section 6.4.3) is the address of the instruction that would have been executed next had the interrupt not occurred.

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Programming Note
If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

Hypervisor Return From Interrupt Doubleword XL-form

hrfid

    19 | /// | /// | /// | 274 | /
    0    6     11    16    21    31

    MSR[48] ← HSRR1[48] | HSRR1[49]
    MSR[58] ← HSRR1[58] | HSRR1[49]
    MSR[59] ← HSRR1[59] | HSRR1[49]
    MSR[0:32, 37:41, 49:57, 60:63] ← HSRR1[0:32, 37:41, 49:57, 60:63]
    NIA ←iea HSRR0[0:61] || 0b00

The result of ORing bits 48 and 49 of HSRR1 is placed into MSR[48]. The result of ORing bits 58 and 49 of HSRR1 is placed into MSR[58]. The result of ORing bits 59 and 49 of HSRR1 is placed into MSR[59]. Bits 0:32, 37:41, 49:57, and 60:63 of HSRR1 are placed into the corresponding bits of the MSR.

If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address HSRR0[0:61] || 0b00 (when SF=1 in the new MSR value) or ³²0 || HSRR0[32:61] || 0b00 (when SF=0 in the new MSR value). If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or HSRR0 by the interrupt processing mechanism (see Section 6.4.3) is the address of the instruction that would have been executed next had the interrupt not occurred.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    MSR

Programming Note
If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

3.3.2 Power-Saving Mode Instructions

The power-saving mode instructions provide a means by which the hypervisor can put the thread into power-saving mode. When the thread is in power-saving mode it does not execute instructions, and it may consume less power than it would consume when it is not in power-saving mode.

There are four levels of power-savings, called doze, nap, sleep, and rvwinkle. For each level in this list, the power consumed is less than or equal to the power consumed in the preceding level, and the time required for the thread to exit from the level and for software then to resume normal operation is greater than or equal to the corresponding time for the preceding level. Doze power-saving level requires a minimum amount of such time, while the other levels may require more time.

Resources other than those listed in the instruction descriptions that are maintained in each level other than doze, and the actions required by the hypervisor in order for software to resume normal operation after the thread exits from power-saving mode, are implementation-specific.

Read-only resources (including the HILE bit) are maintained in all power-saving levels. Descriptions of resource state loss in the power-saving mode instruction descriptions do not apply to read-only resources.

Programming Note
The hypervisor determines which power-saving level to enter based on how responsive the system needs to be. If the hypervisor decides that some loss of state is acceptable, it can use the nap instruction rather than the doze instruction, and when the thread exits from power-saving mode the hypervisor can quickly determine whether any resources need to be restored.

Doze XL-form

doze

    19 | /// | /// | /// | 402 | /
    0    6     11    16    21    31

The thread is placed into doze power-saving level.

When the thread is in doze power-saving level, the state of all thread resources is maintained as if the thread was not in power-saving mode.

When the interrupt that causes exit from doze power-saving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the thread was not in power-saving mode may be lost.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    None

Nap XL-form

nap

    19 | /// | /// | /// | 434 | /
    0    6     11    16    21    31

The thread is placed into nap power-saving level.

When the thread is in nap power-saving level, the state of the Decrementer and all hypervisor resources is maintained as if the thread was not in power-saving mode, and sufficient information is maintained to allow the hypervisor to resume execution.

When the interrupt that causes exit from nap power-saving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the thread was not in power-saving mode may be lost.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    None

Programming Note
If the state of the Decrementer were not maintained and updated as if the thread was not in power-saving mode, Decrementer exceptions would not reliably cause exit from nap power-saving level even if Decrementer exceptions were enabled to cause exit.

Sleep XL-form

sleep

    19 | /// | /// | /// | 466 | /
    0    6     11    16    21    31

The thread is placed into sleep power-saving level.

When the thread is in sleep power-saving level, the state of all resources may be lost except for the HRMOR.

When the interrupt that causes exit from sleep power-saving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the thread was not in power-saving mode may be lost.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    None

Programming Note
If the state of the Decrementer is not maintained and updated, in sleep or rvwinkle power-saving level, as if the thread was not in power-saving mode, Decrementer exceptions will not reliably cause exit from power-saving mode even if Decrementer exceptions are enabled to cause exit.

Note
See the Notes that appear in the rvwinkle instruction description.

Rip Van Winkle XL-form

rvwinkle

    19 | /// | /// | /// | 498 | /
    0    6     11    16    21    31

The thread is placed into rvwinkle power-saving level.

When the thread is in rvwinkle power-saving level, the state of all resources may be lost except for the HRMOR.

When the interrupt that causes exit from rvwinkle power-saving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the thread was not in power-saving mode may be lost.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    None

Programming Note
In the short story by Washington Irving, Rip Van Winkle is a man who fell asleep on a green knoll and awoke twenty years later.

Note
See the Notes that appear in the sleep instruction description.

3.3.2.1 Entering and Exiting Power-Saving Mode

In order to enter power-saving mode, the hypervisor must use the instruction sequence shown below.
Before executing this sequence, the hypervisor must ensure that LPCR[MER] contains the value 0, LPCR[PECE] contains the desired value if doze or nap power-saving level is to be entered, MSR[SF], MSR[HV], and MSR[ME] contain the value 1, and all other bits of the MSR contain the value 0 except for MSR[RI], which may contain either 0 or 1. Depending on the implementation and on the power-saving mode being entered, it may also be necessary for the hypervisor to save the state of certain resources before entering the sequence. The sequence must be exactly as shown, with no intervening instructions, except that any GPR may be used as Rx and as Ry, and any value may be used for "save_area" provided the resulting effective address is double-word aligned and corresponds to a valid real address.

        std   Rx,save_area(Ry)     /* last store necessary to save state */
        ptesync                    /* order load after last store */
        ld    Rx,save_area(Ry)     /* reload from last store location, for synchronization */
    loop:
        cmp   Rx,Rx                /* create dependency */
        bne   loop
        nap/doze/sleep/rvwinkle    /* enter power-saving mode */
        b     $                    /* branch to self */

After the thread has entered power-saving mode as specified above, various exceptions may cause exit from power-saving mode. The exceptions include System Reset, Machine Check, Decrementer, External, Hypervisor Maintenance, and implementation-specific exceptions. Upon exit from power-saving mode, if the exception was a Machine Check exception, then a Machine Check interrupt occurs; otherwise a System Reset interrupt occurs, and the contents of SRR1 indicate the type of exception that caused exit from power-saving mode. See Section 6.5.1 for additional information.

Programming Note
The ptesync instruction (see Book III-S, Section 5.9.2) in the preceding sequence, in conjunction with the ld instruction and the loop, ensures that all storage accesses associated with instructions preceding the ptesync instruction, and all Reference and Change bit updates associated with additional address translations that were performed, by the thread executing the ptesync instruction, before the ptesync instruction is executed, have been performed with respect to all threads and mechanisms, to the extent required by the associated Memory Coherence Required attributes, before the thread enters power-saving mode. The b instruction (branch to self) is not executed since the preceding power-saving mode instruction puts the thread in a power-saving mode in which instructions are not executed. Even though it is not executed, requiring it to be present simplifies implementation and testing because it reduces the synchronization needed between execution of the instruction stream and entry into power-saving mode.

If the performance monitor is in use when the thread enters power-saving mode, the Performance Monitor data obtainable when the thread exits from power-saving mode may be incomplete or otherwise misleading.

Programming Note
Software is not required to set the RI bit to any particular value prior to entering power-saving mode because the setting of SRR1[62] upon exit from power-saving mode is independent of the value of the RI bit upon entry into power-saving mode.

Chapter 4. Fixed-Point Facility

    4.1 Fixed-Point Facility Overview
    4.2 Special Purpose Registers
    4.3 Fixed-Point Facility Registers
    4.3.1 Processor Version Register
    4.3.2 Processor Identification Register
    4.3.3 Control Register
    4.3.4 Program Priority Register
    4.3.5 Software-use SPRs
    4.4 Fixed-Point Facility Instructions
    4.4.1 Fixed-Point Load and Store Caching Inhibited Instructions
    4.4.2 Fixed-Point Load and Store Quadword Instructions [Category: Load/Store Quadword]
    4.4.3 OR Instruction
    4.4.4 Move To/From System Register Instructions

4.1 Fixed-Point Facility Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Fixed-Point Facility that are not covered in Book I.

4.2 Special Purpose Registers

Special Purpose Registers (SPRs) are read and written using the mfspr (page 764) and mtspr (page 763) instructions. Most SPRs are defined in other chapters of this book; see the index to locate those definitions.

4.3 Fixed-Point Facility Registers

4.3.1 Processor Version Register

The Processor Version Register (PVR) is a 32-bit read-only register that contains a value identifying the version and revision level of the implementation. The contents of the PVR can be copied to a GPR by the mfspr instruction. Read access to the PVR is privileged; write access is not provided.

Figure 7. Processor Version Register (Version in bits 32:47, Revision in bits 48:63)

The PVR distinguishes between implementations that differ in attributes that may affect software. It contains two fields.

    Version   A 16-bit number that identifies the version of the implementation. Different version numbers indicate major differences between implementations, such as which categories are supported.

    Revision  A 16-bit number that distinguishes between implementations of the version. Different revision numbers indicate minor differences between implementations having the same version number, such as clock rate and Engineering Change level.

Version numbers are assigned by the Power ISA process. Revision numbers are assigned by an implementation-defined process.

4.3.2 Processor Identification Register

The Processor Identification Register (PIR) is a 32-bit register that contains a value that can be used to distinguish the thread from other threads in the system. The contents of the PIR can be copied to a GPR by the mfspr instruction. Read access to the PIR is privileged; write access, if provided, is hypervisor privileged. It is implementation-dependent whether write access is provided.

    Bits   Name     Description
    0:31   PROCID   Thread ID

Figure 8. Processor Identification Register (PROCID, shown as bits 32:63)

The means by which the PIR is initialized are implementation-dependent.

The PIR is a hypervisor resource; see Chapter 2.

4.3.3 Control Register

The Control Register (CTRL) is a 32-bit register that controls an external I/O pin. This signal may be used for the following:

    driving the RUN Light on a system operator panel
    Direct External exception routing
    Performance Monitor Counter incrementing (see Appendix B)

    Bit   Name   Description
    63    RUN    Run state bit

    All other fields are implementation-dependent.

Figure 9. Control Register (bits 32:63, with RUN in bit 63)

CTRL[RUN] can be used by the operating system to indicate when the thread is doing useful work.

The contents of the CTRL can be written by the mtspr instruction and read by the mfspr instruction. Write access to the CTRL is privileged. Reads can be performed in privileged or problem state.

4.3.4 Program Priority Register

Privileged programs may set a wider range of program priorities in the PRI field of PPR and PPR32 than may be set by problem-state programs (see Chapter 3 of Book II). Problem-state programs may only set values in the range of 0b010 to 0b100. Privileged programs may set values in the range of 0b001 to 0b110. Hypervisor software may also set 0b111. If a program attempts to set a value that is not available to it, the PRI field remains unchanged. The values and their corresponding meanings are as follows.
    Bit(s)  Description
    11:13   Program Priority (PRI)
            001  very low
            010  low
            011  medium low (normal)
            100  medium
            101  medium high
            110  high
            111  very high

4.3.5 Software-use SPRs

Software-use SPRs are 64-bit registers provided for use by software.

Figure 10. Software-use SPRs (SPRG0, SPRG1, SPRG2, SPRG3; each bits 0:63)

SPRG0, SPRG1, and SPRG2 are privileged registers. SPRG3 is a privileged register except that the contents may be copied to a GPR in Problem state when accessed using the mfspr instruction.

Programming Note
Neither the contents of the SPRGs, nor accessing them using mtspr or mfspr, has a side effect on the operation of the thread. One or more of the registers is likely to be needed by non-hypervisor interrupt handler programs (e.g., as scratch registers and/or pointers to per thread save areas).

Operating systems must ensure that no sensitive data are left in SPRG3 when a problem state program is dispatched, and operating systems for secure systems must ensure that SPRG3 cannot be used to implement a "covert channel" between problem state programs. These requirements can be satisfied by clearing SPRG3 before passing control to a program that will run in problem state.

HSPRG0 and HSPRG1 are 64-bit registers provided for use by hypervisor programs.

Figure 11. SPRs for use by hypervisor programs (HSPRG0, HSPRG1; each bits 0:63)

Programming Note
Neither the contents of the HSPRGs, nor accessing them using mtspr or mfspr, has a side effect on the operation of the thread. One or more of the registers is likely to be needed by hypervisor interrupt handler programs (e.g., as scratch registers and/or pointers to per thread save areas).
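Returning to the priority rules of Section 4.3.4, the following Python sketch models which PRI values each privilege state may set and what happens on a disallowed attempt. It is a minimal illustration, not an architected algorithm: the names Priv, ALLOWED, and set_pri are hypothetical, and the sketch assumes that hypervisor software may set every value available to privileged programs plus 0b111, and that 0b000 is available to no one.

```python
from enum import Enum


class Priv(Enum):
    PROBLEM = 0
    PRIVILEGED = 1
    HYPERVISOR = 2


# Inclusive PRI ranges from Section 4.3.4 (the hypervisor range is an assumption:
# the privileged range extended by 0b111).
ALLOWED = {
    Priv.PROBLEM: range(0b010, 0b100 + 1),
    Priv.PRIVILEGED: range(0b001, 0b110 + 1),
    Priv.HYPERVISOR: range(0b001, 0b111 + 1),
}


def set_pri(current_pri, requested_pri, state):
    """Return the PRI field after an attempted update by a program in the given state."""
    if requested_pri in ALLOWED[state]:
        return requested_pri
    return current_pri  # value not available to this state: PRI remains unchanged
```

For example, a privileged (non-hypervisor) program that requests 0b111 leaves PRI at its current value, while the same request made in hypervisor state takes effect.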
4.4 Fixed-Point Facility Instructions

4.4.1 Fixed-Point Load and Store Caching Inhibited Instructions

The storage accesses caused by the instructions described in this section are performed as though the specified storage location is Caching Inhibited and Guarded. The instructions can be executed only in hypervisor state. If any of the following restrictions on execution of these instructions (while in hypervisor state) are violated, the results are undefined.

    They must be executed only when MSR[DR]=0.
    The specified storage location must not be in storage specified by the Hypervisor Real Mode Storage Control facility to be treated as non-Guarded.
    Software must ensure that the specified storage location is not in the caches.

Programming Note
The instructions described in this section can be used to permit a control register on an I/O device to be accessed without permitting the corresponding storage location to be copied into the caches.

See also, in Book I, the introductions to Section 3.3.1, "Fixed-Point Storage Access Instructions", Section 3.3.2, "Fixed-Point Load Instructions", and Section 3.3.3, "Fixed-Point Store Instructions".

Load Byte and Zero Caching Inhibited Indexed X-form

lbzcix RT,RA,RB

    31 | RT  | RA  | RB  | 853 | /
    0    6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0)+(RB). The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Load Halfword and Zero Caching Inhibited Indexed X-form

lhzcix RT,RA,RB

    31 | RT  | RA  | RB  | 821 | /
    0    6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0)+(RB). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Load Word and Zero Caching Inhibited Indexed X-form

lwzcix RT,RA,RB

    31 | RT  | RA  | RB  | 789 | /
    0    6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Load Doubleword Caching Inhibited Indexed X-form

ldcix RT,RA,RB

    31 | RT  | RA  | RB  | 885 | /
    0    6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Store Byte Caching Inhibited Indexed X-form

stbcix RS,RA,RB

    31 | RS  | RA  | RB  | 981 | /
    0    6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 1) ← (RS)[56:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[56:63] are stored into the byte in storage addressed by EA.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Store Halfword Caching Inhibited Indexed X-form

sthcix RS,RA,RB

    31 | RS  | RA  | RB  | 949 | /
    0    6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 2) ← (RS)[48:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[48:63] are stored into the halfword in storage addressed by EA.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Store Word Caching Inhibited Indexed X-form

stwcix RS,RA,RB

    31 | RS  | RA  | RB  | 917 | /
    0    6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (RS)[32:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[32:63] are stored into the word in storage addressed by EA.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Store Doubleword Caching Inhibited Indexed X-form

stdcix RS,RA,RB

    31 | RS  | RA  | RB  | 1013 | /
    0    6     11    16    21     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

4.4.2 Fixed-Point Load and Store Quadword Instructions [Category: Load/Store Quadword]

Load Quadword DQ-form

lq RTp,DQ(RA)

    56 | RTp | RA  | DQ  | ///
    0    6     11    16    28  31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(DQ || 0b0000)
    RTp ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(DQ||0b0000). The quadword in storage addressed by EA is loaded into register pair RTp.

EA must be a multiple of 16. If it is not, an Alignment interrupt occurs.

If RTp is odd or RTp=RA, the instruction form is invalid. If RTp=RA, an attempt to execute this instruction causes an Illegal Instruction type Program interrupt. (The RTp=RA case includes the case of RTp=RA=0.)

This instruction is not supported in Little-Endian mode. Execution of this instruction in Little-Endian mode causes either an Alignment interrupt or the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered:
    None

Store Quadword DS-form

stq RSp,DS(RA)

    62 | RSp | RA  | DS  | 2
    0    6     11    16    30  31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    MEM(EA, 8) ← RSp

Let the effective address (EA) be the sum (RA|0)+(DS||0b00). The contents of register pair RSp are stored into the quadword in storage addressed by EA.

EA must be a multiple of 16. If it is not, an Alignment interrupt occurs.

If RSp is odd, the instruction form is invalid.

This instruction is not supported in Little-Endian mode. Execution of this instruction in Little-Endian mode causes either an Alignment interrupt or the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered:
    None

4.4.3 OR Instruction

or Rx,Rx,Rx can be used to set PPR[PRI] (see Section 4.3.4) as shown in Figure 12. PPR[PRI] remains unchanged if the privilege state of the thread executing the instruction is lower than the privilege indicated in the figure. (The encodings available to problem-state programs, as well as encodings for additional shared resource hints not shown here, are described in Chapter 3 of Book II.)

    Rx   PPR[PRI]   Priority          Privileged
    31   001        very low          yes
    1    010        low               no
    6    011        medium low        no
    2    100        medium (normal)   no
    5    101        medium high       yes
    3    110        high              yes
    7    111        very high         hypv

Figure 12. Priority levels for or Rx,Rx,Rx

4.4.4 Move To/From System Register Instructions

The Move To Special Purpose Register and Move From Special Purpose Register instructions are described in Book I, but only at the level available to an application programmer. For example, no mention is made there of registers that can be accessed only in privileged state. The descriptions of these instructions given below extend the descriptions given in Book I, but do not list Special Purpose Registers that are implementation-dependent. In the descriptions of these instructions given below, the "defined" SPR numbers are the SPR numbers shown in the figure for the instruction and the implementation-specific SPR numbers that are implemented, and similarly for "defined" registers.

Extended mnemonics

Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand. See Appendix A, "Assembler Extended Mnemonics" on page 867.

Figure 13.
Figure 13. SPR encodings

    decimal   SPR¹                  Register     Privileged          Length   Cat²
              spr[5:9]  spr[0:4]    Name         mtspr     mfspr     (bits)
    1         00000     00001       XER          no        no        64       B
    8         00000     01000       LR           no        no        64       B
    9         00000     01001       CTR          no        no        64       B
    13        00000     01101       AMR          no⁶       no        64       S
    17        00000     10001       DSCR         yes       yes       64       STM
    18        00000     10010       DSISR        yes       yes       32       S
    19        00000     10011       DAR          yes       yes       64       S
    22        00000     10110       DEC          yes       yes       32       B
    25        00000     11001       SDR1         hypv³     hypv³     64       S
    26        00000     11010       SRR0         yes       yes       64       B
    27        00000     11011       SRR1         yes       yes       64       B
    28        00000     11100       CFAR         yes       yes       64       S
    29        00000     11101       AMR          yes⁶      yes       64       S
    136       00100     01000       CTRL         -         no        32       S
    152       00100     11000       CTRL         yes       -         32       S
    157       00100     11101       UAMOR        yes⁷      yes       64       S
    256       01000     00000       VRSAVE       no        no        32       B
    259       01000     00011       SPRG3        -         no        64       B
    268       01000     01100       TB           -         no        64       B
    269       01000     01101       TBU          -         no        32       B
    272-275   01000     100xx       SPRG[0-3]    yes       yes       64       B
    282       01000     11010       EAR          hypv³     hypv³     32       EC
    284       01000     11100       TBL          hypv³     -         32       B
    285       01000     11101       TBU          hypv³     -         32       B
    286       01000     11110       TBU40        hypv      -         64       S
    287       01000     11111       PVR          -         yes       32       B
    304       01001     10000       HSPRG0       hypv³     hypv³     64       S
    305       01001     10001       HSPRG1       hypv³     hypv³     64       S
    306       01001     10010       HDSISR       hypv³     hypv³     32       S
    307       01001     10011       HDAR         hypv³     hypv³     64       S
    308       01001     10100       SPURR        hypv³     yes       64       S
    309       01001     10101       PURR         hypv³     yes       64       S
    310       01001     10110       HDEC         hypv³     hypv³     32       S
    312       01001     11000       RMOR         hypv³     hypv³     64       S
    313       01001     11001       HRMOR        hypv³     hypv³     64       S
    314       01001     11010       HSRR0        hypv³     hypv³     64       S
    315       01001     11011       HSRR1        hypv³     hypv³     64       S
    318       01001     11110       LPCR         hypv³     hypv³     64       S
    319       01001     11111       LPIDR        hypv³     hypv³     32       S
    336       01010     10000       HMER         hypv³,⁴   hypv³     64       S
    337       01010     10001       HMEER        hypv³     hypv³     64       S
    338       01010     10010       PCR          hypv³     hypv³     64       S
    339       01010     10011       HEIR         hypv³     hypv³     32       S
    349       01010     11101       AMOR         hypv³     hypv³     64       S
    512       10000     00000       SPEFSCR      no        no        32       SP
    526       10000     01110       ATB/ATBL     -         no        64       ATB
    527       10000     01111       ATBU         -         no        32       ATB
    768-783   11000     0xxxx       perf_mon     -         no        64       S.PM
    784-799   11000     1xxxx       perf_mon     yes       yes       64       S.PM
    896       11100     00000       PPR          no        no        64       S
    898       11100     00010       PPR32        no        no        32       B⁵
    1013      11111     10101       DABR         hypv³     hypv³     64       S
    1015      11111     10111       DABRX        hypv³     hypv³     64       S
    1023      11111     11111       PIR          -         yes       32       S

    -  This register is not defined for this instruction.
    1  Note that the order of the two 5-bit halves of the SPR number is reversed.
    2  See Section 1.3.5 of Book I.
    3  This register is a hypervisor resource, and can be accessed by this instruction only in hypervisor state (see Chapter 2).
    4  This register cannot be directly written. Instead, bits in the register corresponding to 0 bits in (RS) can be cleared using mtspr SPR,RS.
    5  The register is Category: Phased-in.
    6  The value specified in register RS may be masked by the contents of the [U]AMOR before being placed into the AMR; see the mtspr instruction description.
    7  The value specified in register RS may be ANDed with the contents of the AMOR before being placed into the UAMOR; see the mtspr instruction description.

    All SPR numbers that are not shown above and are not implementation-specific are reserved.

Move To Special Purpose Register XFX-form

mtspr SPR,RS

    Field:  31  | RS   | spr   | 467   | /
    Bits:   0:5 | 6:10 | 11:20 | 21:30 | 31

    n ← spr[5:9] || spr[0:4]
    if (n ≠ 13) & (n ≠ 29) & (n ≠ 157) & (n ≠ 336) then
        if length(SPR(n)) = 64 then
            SPR(n) ← (RS)
        else
            SPR(n) ← (RS)[32:63]
    else
        if n = 13 then
            if MSR[HV PR] = 0b10 then
                SPR(13) ← (RS)
            else
                if MSR[HV PR] = 0b00 then
                    SPR(13) ← ((RS) & AMOR) | ((SPR(13)) & ¬AMOR)
                else
                    SPR(13) ← ((RS) & UAMOR) | ((SPR(13)) & ¬UAMOR)
        if n = 29 then
            if MSR[HV PR] = 0b10 then
                SPR(29) ← (RS)
            else
                SPR(29) ← ((RS) & AMOR) | ((SPR(29)) & ¬AMOR)
        if n = 157 then
            if MSR[HV PR] = 0b10 then
                SPR(157) ← (RS)
            else
                SPR(157) ← (RS) & AMOR
        if n = 336 then
            SPR(336) ← (SPR(336)) & (RS)

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 13. The contents of register RS are placed into the designated Special Purpose Register, except as described in the next four paragraphs. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR.

When the designated SPR is the Authority Mask Register (AMR), using SPR 13 or SPR 29, and MSR[HV PR]=0b00, the contents of bit positions of register RS corresponding to 1 bits in the Authority Mask Override Register (AMOR) are placed into the corresponding bits of the AMR; the other AMR bits are not modified.

When the designated SPR is the AMR, using SPR 13, and MSR[PR]=1, the contents of bit positions of register RS corresponding to 1 bits in the User Authority Mask Override Register (UAMOR) are placed into the corresponding bits of the AMR; the other AMR bits are not modified.

When the designated SPR is the UAMOR and MSR[HV PR]=0b00, the contents of register RS are ANDed with the contents of the AMOR and the result is placed into the UAMOR.

When the designated SPR is the Hypervisor Maintenance Exception Register (HMER), the contents of register RS are ANDed with the contents of the HMER and the result is placed into the HMER.

For this instruction, SPRs TBL and TBU are treated as separate 32-bit registers; setting one leaves the other unaltered.

spr[0]=1 if and only if writing the register is privileged. Execution of this instruction specifying an SPR number with spr[0]=1 causes a Privileged Instruction type Program interrupt when MSR[PR]=1 and, if the SPR is a hypervisor resource (see Figure 13), when MSR[HV PR]=0b00.

Execution of this instruction specifying an SPR number that is not defined for the implementation, including SPR numbers that are shown in Figure 13 but are in a category that is not supported by the implementation, causes one of the following.

    if spr[0]=0:
        - if MSR[PR]=1: Hypervisor Emulation Assistance interrupt
        - if MSR[PR]=0: Hypervisor Emulation Assistance interrupt for SPR 0 and no operation (i.e. the instruction is treated as a no-op) for all other SPRs
    if spr[0]=1:
        - if MSR[PR]=1: Privileged Instruction type Program interrupt
        - if MSR[PR]=0: no operation (i.e. the instruction is treated as a no-op)

Special Registers Altered:
    See Figure 13

Programming Note
    For a discussion of software synchronization requirements when altering certain Special Purpose Registers, see Chapter 10. "Synchronization Requirements for Context Alterations" on page 861.

Move From Special Purpose Register XFX-form

mfspr RT,SPR

    Field:  31  | RT   | spr   | 339   | /
    Bits:   0:5 | 6:10 | 11:20 | 21:30 | 31

    n ← spr[5:9] || spr[0:4]
    if length(SPR(n)) = 64 then
        RT ← SPR(n)
    else
        RT ← ³²0 || SPR(n)

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 13. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero.

spr[0]=1 if and only if reading the register is privileged. Execution of this instruction specifying an SPR number with spr[0]=1 causes a Privileged Instruction type Program interrupt when MSR[PR]=1 and, if the SPR is a hypervisor resource (see Figure 13), when MSR[HV PR]=0b00.
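The reversed SPR field encoding and the spr[0] privilege rule described above can be illustrated in software. The following Python sketch is not part of the architecture and its function names are hypothetical. It assumes a 32-bit instruction word in which instruction bit 0 is the most significant bit, and it models only SPR numbers that are defined for the implementation.

```python
# Illustrative sketch (not architected; function names are hypothetical).

def spr_number(insn):
    """Extract the SPR number n from a 32-bit mtspr/mfspr instruction word."""
    field = (insn >> 11) & 0x3FF     # instruction bits 11:20 hold spr[0:9]
    spr_0_4 = (field >> 5) & 0x1F    # first 5-bit half of the field
    spr_5_9 = field & 0x1F           # second 5-bit half of the field
    return (spr_5_9 << 5) | spr_0_4  # n = spr[5:9] || spr[0:4]


def access_is_privileged(n):
    """spr[0] is the high-order bit of spr[0:4], i.e. the 0x10 bit of n."""
    return bool(n & 0x10)


def privilege_interrupt(n, msr_hv, msr_pr, hypervisor_resource):
    """True if accessing defined SPR n causes a Privileged Instruction
    type Program interrupt in the state given by MSR[HV] and MSR[PR]."""
    if not access_is_privileged(n):
        return False
    if msr_pr == 1:
        return True
    return bool(hypervisor_resource) and msr_hv == 0
```

For example, the word 0x7C6802A6 (mfspr with RT=3 and SPR 8, the LR) decodes to n = 8, which is not privileged, while SPR 287 (PVR) has spr[0]=1 and so is privileged.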
Execution of this instruction specifying an SPR number that is not defined for the implementation, including SPR numbers that are shown in Figure 13 but are in a category that is not supported by the implementation, causes one of the following.

    if spr[0]=0:
        - if MSR[PR]=1: Hypervisor Emulation Assistance interrupt
        - if MSR[PR]=0: Hypervisor Emulation Assistance interrupt for SPRs 0, 4, 5, and 6 and no operation (i.e. the instruction is treated as a no-op) for all other SPRs
    if spr[0]=1:
        - if MSR[PR]=1: Privileged Instruction type Program interrupt
        - if MSR[PR]=0: no operation (i.e. the instruction is treated as a no-op)

Special Registers Altered:
    None

Note
    See the Notes that appear with mtspr.

Move To Machine State Register X-form

mtmsr RS,L

    Field:  31  | RS   | ///   | L  | ///   | 146   | /
    Bits:   0:5 | 6:10 | 11:14 | 15 | 16:20 | 21:30 | 31

    if L = 0 then
        MSR[48] ← (RS)[48] | (RS)[49]
        MSR[58] ← (RS)[58] | (RS)[49]
        MSR[59] ← (RS)[59] | (RS)[49]
        MSR[32:47 49:50 52:57 60:62] ← (RS)[32:47 49:50 52:57 60:62]
    else
        MSR[48 62] ← (RS)[48 62]

The MSR is set based on the contents of register RS and of the L field.

L=0:
    The result of ORing bits 48 and 49 of register RS is placed into MSR[48]. The result of ORing bits 58 and 49 of register RS is placed into MSR[58]. The result of ORing bits 59 and 49 of register RS is placed into MSR[59]. Bits 32:47, 49:50, 52:57, and 60:62 of register RS are placed into the corresponding bits of the MSR.

L=1:
    Bits 48 and 62 of register RS are placed into the corresponding bits of the MSR. The remaining bits of the MSR are unchanged.

This instruction is privileged.

If L=0 this instruction is context synchronizing. If L=1 this instruction is execution synchronizing; in addition, the alterations of the EE and RI bits take effect as soon as the instruction completes.

Special Registers Altered:
    MSR

Except in the mtmsr instruction description in this section, references to "mtmsr" in this document imply either L value unless otherwise stated or obvious from context (e.g., a reference to an mtmsr instruction that modifies an MSR bit other than the EE or RI bit implies L=0).

Programming Note
    If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

    This instruction does not alter MSR[ME] or MSR[LE]. (This instruction does not alter MSR[HV] because it does not alter any of the high-order 32 bits of the MSR.)

    If the only MSR bits to be altered are MSR[EE RI], to obtain the best performance L=1 should be used.

Programming Note
    If MSR[EE]=0 and an External or Decrementer exception is pending, executing an mtmsr instruction that sets MSR[EE] to 1 will cause the External or Decrementer interrupt to occur before the next instruction is executed, if no higher priority exception exists (see Section 6.8, "Interrupt Priorities" on page 845). Similarly, if a Hypervisor Decrementer interrupt is pending, execution of the instruction by the hypervisor causes a Hypervisor Decrementer interrupt to occur if HDICE=1.

    For a discussion of software synchronization requirements when altering certain MSR bits, see Chapter 10.

Programming Note
    mtmsr serves as both a basic and an extended mnemonic. The Assembler will recognize an mtmsr mnemonic with two operands as the basic form, and an mtmsr mnemonic with one operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Programming Note
    There is no need for an analogous version of the mfmsr instruction, because the existing instruction copies the entire contents of the MSR to the selected GPR.
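The effect of mtmsr on the MSR can be modeled bit by bit. The following Python sketch follows the RTL given in the mtmsr description; it is an illustration only, not an architected definition. The helper names bit, with_bit, and mtmsr are hypothetical, registers are modeled as 64-bit integers with bit 0 as the most significant bit, and interrupts and synchronization effects are not modeled.

```python
# Illustrative model of the mtmsr RTL (not architected; names are hypothetical).

def bit(value, b):
    """Return bit b of a 64-bit value, using Power ISA numbering (bit 0 = MSB)."""
    return (value >> (63 - b)) & 1


def with_bit(value, b, v):
    """Return value with bit b (Power ISA numbering) replaced by v."""
    mask = 1 << (63 - b)
    return (value & ~mask) | (v << (63 - b))


# Bits copied directly from RS when L=0: 32:47, 49:50, 52:57, 60:62.
L0_COPIED = list(range(32, 48)) + [49, 50] + list(range(52, 58)) + [60, 61, 62]


def mtmsr(msr, rs, l):
    """Return the new MSR value after 'mtmsr RS,L'."""
    if l == 0:
        for b in L0_COPIED:
            msr = with_bit(msr, b, bit(rs, b))
        for b in (48, 58, 59):  # EE, IR, DR are each ORed with RS bit 49 (PR)
            msr = with_bit(msr, b, bit(rs, b) | bit(rs, 49))
    else:
        for b in (48, 62):      # only EE and RI are altered when L=1
            msr = with_bit(msr, b, bit(rs, b))
    return msr
```

In this model, a source value with only bit 49 (PR) set and L=0 yields an MSR with bits 48, 49, 58, and 59 set, which matches the statement that setting MSR[PR] to 1 also sets MSR[EE], MSR[IR], and MSR[DR] to 1. Bit 51 (ME) and bit 63 (LE) are never written.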
Move To Machine State Register Doubleword X-form

mtmsrd RS,L

    Field:  31  | RS   | ///   | L  | ///   | 178   | /
    Bits:   0:5 | 6:10 | 11:14 | 15 | 16:20 | 21:30 | 31

    if L = 0 then
        MSR[48] ← (RS)[48] | (RS)[49]
        MSR[58] ← (RS)[58] | (RS)[49]
        MSR[59] ← (RS)[59] | (RS)[49]
        MSR[0:2 4:47 49:50 52:57 60:62] ← (RS)[0:2 4:47 49:50 52:57 60:62]
    else
        MSR[48 62] ← (RS)[48 62]

The MSR is set based on the contents of register RS and of the L field.

L=0:
    The result of ORing bits 48 and 49 of register RS is placed into MSR[48]. The result of ORing bits 58 and 49 of register RS is placed into MSR[58]. The result of ORing bits 59 and 49 of register RS is placed into MSR[59]. Bits 0:2, 4:47, 49:50, 52:57, and 60:62 of register RS are placed into the corresponding bits of the MSR.

L=1:
    Bits 48 and 62 of register RS are placed into the corresponding bits of the MSR. The remaining bits of the MSR are unchanged.

This instruction is privileged.

If L=0 this instruction is context synchronizing. If L=1 this instruction is execution synchronizing; in addition, the alterations of the EE and RI bits take effect as soon as the instruction completes.

Special Registers Altered:
    MSR

Except in the mtmsrd instruction description in this section, references to "mtmsrd" in this document imply either L value unless otherwise stated or obvious from context (e.g., a reference to an mtmsrd instruction that modifies an MSR bit other than the EE or RI bit implies L=0).

Programming Note
    If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

    This instruction does not alter MSR[LE], MSR[ME] or MSR[HV].

    If the only MSR bits to be altered are MSR[EE RI], to obtain the best performance L=1 should be used.

Programming Note
    If MSR[EE]=0 and an External or Decrementer exception is pending, executing an mtmsrd instruction that sets MSR[EE] to 1 will cause the External or Decrementer interrupt to occur before the next instruction is executed, if no higher priority exception exists (see Section 6.8, "Interrupt Priorities" on page 845). Similarly, if a Hypervisor Decrementer interrupt is pending, execution of the instruction by the hypervisor causes a Hypervisor Decrementer interrupt to occur if HDICE=1.

    For a discussion of software synchronization requirements when altering certain MSR bits, see Chapter 10.

Programming Note
    mtmsrd serves as both a basic and an extended mnemonic. The Assembler will recognize an mtmsrd mnemonic with two operands as the basic form, and an mtmsrd mnemonic with one operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Move From Machine State Register X-form

mfmsr RT

    Field:  31  | RT   | ///   | ///   | 83    | /
    Bits:   0:5 | 6:10 | 11:15 | 16:20 | 21:30 | 31

    RT ← MSR

The contents of the MSR are placed into register RT.

This instruction is privileged.

Special Registers Altered:
    None

Chapter 5. Storage Control

    5.1 Overview
    5.2 Storage Exceptions
    5.3 Instruction Fetch
        5.3.1 Implicit Branch
        5.3.2 Address Wrapping Combined with Changing MSR Bit SF
    5.4 Data Access
    5.5 Performing Operations Out-of-Order
    5.6 Invalid Real Address
    5.7 Storage Addressing
        5.7.1 32-Bit Mode
        5.7.2 Virtualized Partition Memory (VPM) Mode
        5.7.3 Real And Virtual Real Addressing Modes
            5.7.3.1 Hypervisor Offset Real Mode Address
            5.7.3.2 Offset Real Mode Address
            5.7.3.3 Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes
                5.7.3.3.1 Hypervisor Real Mode Storage Control
            5.7.3.4 Virtual Real Mode Addressing Mechanism
            5.7.3.5 Storage Control Attributes for Implicit Storage Accesses
        5.7.4 Address Ranges Having Defined Uses
        5.7.5 Address Translation Overview
        5.7.6 Virtual Address Generation
            5.7.6.1 Segment Lookaside Buffer (SLB)
            5.7.6.2 SLB Search
        5.7.7 Virtual to Real Translation
            5.7.7.1 Page Table
            5.7.7.2 Storage Description Register 1
            5.7.7.3 Page Table Search
            5.7.7.4 Relaxed Page Table Alignment [Category: Server.Relaxed Page Table Alignment]
        5.7.8 Reference and Change Recording
        5.7.9 Storage Protection
            5.7.9.1 Virtual Page Class Key Protection
            5.7.9.2 Basic Storage Protection, Address Translation Enabled
            5.7.9.3 Basic Storage Protection, Address Translation Disabled
    5.8 Storage Control Attributes
        5.8.1 Guarded Storage
            5.8.1.1 Out-of-Order Accesses to Guarded Storage
        5.8.2 Storage Control Bits
            5.8.2.1 Storage Control Bit Restrictions
            5.8.2.2 Altering the Storage Control Bits
    5.9 Storage Control Instructions
        5.9.1 Cache Management Instructions
        5.9.2 Synchronize Instruction
        5.9.3 Lookaside Buffer Management
            5.9.3.1 SLB Management Instructions
            5.9.3.2 Bridge to SLB Architecture [Category: Server.Phased-Out]
                5.9.3.2.1 Segment Register Manipulation Instructions
            5.9.3.3 TLB Management Instructions
    5.10 Page Table Update Synchronization Requirements
        5.10.1 Page Table Updates
            5.10.1.1 Adding a Page Table Entry
            5.10.1.2 Modifying a Page Table Entry
            5.10.1.3 Deleting a Page Table Entry

5.1 Overview

A program references storage using the effective address computed by the hardware when it executes a Load, Store, Branch, or Cache Management instruction, or when it fetches the next sequential instruction. The effective address is translated to a real address according to procedures described in Section 5.7.3, in Section 5.7.5 and in the following sections. The real address is what is presented to the storage subsystem.

For a complete discussion of storage addressing and effective address calculation, see Section 1.10 of Book I.

5.2 Storage Exceptions

A storage exception results when the sequential execution model requires that a storage access be performed but the access is not permitted (e.g., is not permitted by the storage protection mechanism), the access cannot be performed because the effective address cannot be translated to a real address, or the access matches some tracking mechanism criteria (e.g., Data Address Breakpoint).

In certain cases a storage exception may result in the "restart" of (re-execution of at least part of) a Load or Store instruction. See Section 2.1 of Book II, and Section 6.6 in this Book.

5.3 Instruction Fetch

Instructions are fetched under control of MSR[IR].

MSR[IR]=0
    The effective address of the instruction is interpreted as described in Section 5.7.3.

MSR[IR]=1
    The effective address of the instruction is translated by the Address Translation mechanism described beginning in Section 5.7.5.

5.3.1 Implicit Branch

Explicitly altering certain MSR bits (using mtmsr[d]), or explicitly altering SLB entries, Page Table Entries, or certain System Registers (including the HRMOR, and possibly other implementation-dependent registers), may have the side effect of changing the addresses, effective or real, from which the current instruction stream is being fetched. This side effect is called an implicit branch. For example, an mtmsrd instruction that changes the value of MSR[SF] may change the effective addresses from which the current instruction stream is being fetched. The MSR bits and System Registers (excluding implementation-dependent registers) for which alteration can cause an implicit branch are indicated as such in Chapter 10. "Synchronization Requirements for Context Alterations" on page 861. Implicit branches are not supported by the Power ISA. If an implicit branch occurs, the results are boundedly undefined.

5.3.2 Address Wrapping Combined with Changing MSR Bit SF

If the current instruction is at effective address 2^32 - 4 and is an mtmsrd instruction that changes the contents of MSR[SF], the effective address of the next sequential instruction is undefined.

Programming Note
    In the case described in the preceding paragraph, if an interrupt occurs before the next sequential instruction is executed, the contents of SRR0, or HSRR0, as appropriate to the interrupt, are undefined.

5.4 Data Access

Data accesses are controlled by MSR[DR].

MSR[DR]=0
    The effective address of the data is interpreted as described in Section 5.7.3.

MSR[DR]=1
    The effective address of the data is translated by the Address Translation mechanism described in Section 5.7.5.

5.5 Performing Operations Out-of-Order

An operation is said to be performed "in-order" if, at the time that it is performed, it is known to be required by the sequential execution model. An operation is said to be performed "out-of-order" if, at the time that it is performed, it is not known to be required by the sequential execution model.

Operations are performed out-of-order on the expectation that the results will be needed by an instruction that will be required by the sequential execution model. Whether the results are really needed is contingent on everything that might divert the control flow away from the instruction, such as Branch, Trap, System Call, and Return From Interrupt instructions, and interrupts, and on everything that might change the context in which the instruction is executed.

Typically, operations are performed out-of-order when resources are available that would otherwise be idle, so the operation incurs little or no cost. If subsequent events such as branches or interrupts indicate that the operation would not have been performed in the sequential execution model, any results of the operation are abandoned (except as described below).

In the remainder of this section, including its subsections, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

A data access that is performed out-of-order may correspond to an arbitrary Load or Store instruction (e.g., a Load or Store instruction that is not in the instruction stream being executed). Similarly, an instruction fetch that is performed out-of-order may be for an arbitrary instruction (e.g., the aligned word at an arbitrary location in instruction storage).

Most operations can be performed out-of-order, as long as the machine appears to follow the sequential execution model. Certain out-of-order operations are restricted, as follows.

- Stores
    Stores are not performed out-of-order (even if the Store instructions that caused them were executed out-of-order).

- Accessing Guarded Storage
    The restrictions for this case are given in Section 5.8.1.1.

The only permitted side effects of performing an operation out-of-order are the following.

- A Machine Check or Checkstop that could be caused by in-order execution may occur out-of-order, except as described in Section 5.7.3.3.1 for the Hypervisor Real Mode Storage Control facility.

- On implementations which support Reference and Change bits, these bits may be set as described in Section 5.7.8.

- Non-Guarded storage locations that could be fetched into a cache by in-order fetching or execution of an arbitrary instruction may be fetched out-of-order into that cache.

5.6 Invalid Real Address

A storage access (including an access that is performed out-of-order; see Section 5.5) may cause a Machine Check if the accessed storage location contains an uncorrectable error or does not exist.

Programming Note
    In configurations supporting multiple partitions, hypervisor software must ensure that a storage access by a program in one partition will not cause a Checkstop or other system-wide event that could affect the integrity of other partitions (see Chapter 2). For example, such an event could occur if a real address placed in a Page Table Entry or made accessible to a partition using the Offset Real Mode Address mechanism (see Section 5.7.3.2) does not exist.
In the case that the accessed storage location does not exist, the Checkstop state may be entered. See Section 6.5.2 on page 831.

5.7 Storage Addressing

Storage Control Overview

- Real address space size is 2^m bytes, m ≤ 60; see Note 1.
- Real page size is 2^12 bytes (4 KB).
- Effective address space size is 2^64 bytes.
- An effective address is translated to a virtual address via the Segment Lookaside Buffer (SLB).
    - Virtual address space size is 2^n bytes, 65 ≤ n ≤ 78; see Note 2.
    - Segment size is 2^s bytes, s=28 or 40.
    - 2^(n-40) ≤ number of virtual segments ≤ 2^(n-28); see Note 2.
    - Virtual page size is 2^p bytes, where 12 ≤ p, and 2^p is no larger than either the size of the biggest segment or the real address space; a size of 4 KB, 64 KB, and an implementation-dependent number of other sizes are supported; see Note 3. The Page Table specifies the virtual page size. The SLB specifies the base virtual page size, which is the smallest virtual page size that the segment can contain. The base virtual page size is 2^b bytes.
    - Segments contain pages of a single size, a mixture of 4 KB and 64 KB pages, or a mixture of page sizes that include implementation-dependent page sizes.
- A virtual address is translated to a real address via the Page Table.
    - Virtual page size is 2^p bytes, where 12 ≤ p, and 2^p is no larger than either the size of the biggest segment or the real address space; a size of 4 KB, 64 KB, and an implementation-dependent number of other sizes are supported; see Note 3.

Notes:

1. The value of m is implementation-dependent (subject to the maximum given above). When used to address storage, the high-order 60-m bits of the "60-bit" real address must be zeros.

2. The value of n is implementation-dependent (subject to the range given above). In references to 78-bit virtual addresses elsewhere in this Book, the high-order 78-n bits of the "78-bit" virtual address are assumed to be zeros.

3. The supported values of p for the larger virtual page sizes are implementation-dependent (subject to the limitations given above).

5.7.1 32-Bit Mode

The computation of the 64-bit effective address is independent of whether the thread is in 32-bit mode or 64-bit mode. In 32-bit mode (MSR[SF]=0), the high-order 32 bits of the 64-bit effective address are treated as zeros for the purpose of addressing storage. This applies to both data accesses and instruction fetches. It applies independent of whether address translation is enabled or disabled. This truncation of the effective address is the only respect in which storage accesses in 32-bit mode differ from those in 64-bit mode.

Programming Note
    Treating the high-order 32 bits of the effective address as zeros effectively truncates the 64-bit effective address to a 32-bit effective address such as would have been generated on a 32-bit implementation of the Power ISA. Thus, for example, the ESID in 32-bit mode is the high-order four bits of this truncated effective address; the ESID thus lies in the range 0-15. When address translation is enabled, these four bits would select a Segment Register on a 32-bit implementation of the Power ISA. The SLB entries that translate these 16 ESIDs can be used to emulate these Segment Registers.

5.7.2 Virtualized Partition Memory (VPM) Mode

VPM mode enables the hypervisor to reassign all or part of a partition's memory transparently so that the reassignment is not visible to the partition. When this is done, the partition's memory is said to be "virtualized." The VPM field in the LPCR enables VPM mode separately when address translation is enabled and when translation is disabled.

If the thread is not in hypervisor state, and either address translation is enabled and VPM[1]=1, or address translation is disabled and VPM[0]=1, conditions that would have caused a Data Storage or an Instruction Storage interrupt if the affected memory were not virtualized instead cause a Hypervisor Data Storage or a Hypervisor Instruction Storage interrupt respectively. Because the Hypervisor Data Storage and Hypervisor Instruction Storage interrupts always put the thread in hypervisor state, they permit the hypervisor to handle the condition if appropriate (e.g., to restore the contents of a page that was reassigned), and to reflect it to the operating system's Data Storage or Instruction Storage interrupt handler otherwise.

When address translation is enabled, VPM mode has no effect on address translation. When address translation is disabled, addressing is controlled as specified in Section 5.7.3.

5.7.3 Real And Virtual Real Addressing Modes

When a storage access is an instruction fetch performed when instruction address translation is disabled, or if the access is a data access and data address translation is disabled, it is said to be performed in "real addressing mode" if VPM[0]=0 and the thread is not in hypervisor state. If the thread is in hypervisor state, the access is said to be performed in "hypervisor real addressing mode" regardless of the value of VPM[0]. If the thread is not in hypervisor state and VPM[0]=1, the access is said to be performed in "virtual real addressing mode."

Storage accesses in real, hypervisor real, and virtual real addressing modes are performed in a manner that depends on the contents of MSR[HV], LPES, VPM, VRMASD, HRMOR, RMLS, RMOR (see Chapter 2), bit 0 of the effective address (EA[0]), and the state of the Real Mode Storage Control Facility as described below. Bits 1:3 of the effective address are ignored.

MSR[HV]=1
    - If EA[0]=0, the Hypervisor Offset Real Mode Address mechanism, described in Section 5.7.3.1, controls the access.
    - If EA[0]=1, bits 4:63 of the effective address are used as the real address for the access.

MSR[HV]=0
    - If LPES[1]=0, the access causes a storage exception as described in Section 5.7.9.3.
    - If LPES[1]=1 and VPM[0]=0, the Offset Real Mode Address mechanism, described in Section 5.7.3.2, controls the access.
    - If LPES[1]=1 and VPM[0]=1, the Virtual Real Mode Addressing mechanism, described in Section 5.7.3.4, controls the access.

5.7.3.1 Hypervisor Offset Real Mode Address

If MSR[HV] = 1 and EA[0] = 0, the access is controlled by the contents of the Hypervisor Real Mode Offset Register, as follows.

Hypervisor Real Mode Offset Register (HRMOR)
    Bits 4:63 of the effective address for the access are ORed with the 60-bit offset represented by the contents of the HRMOR, and the 60-bit result is used as the real address for the access. The supported offset values are all values of the form i×2^r, where 0 ≤ i < 2^j, and j and r are implementation-dependent values having the properties that 12 ≤ r ≤ 26 (i.e., the minimum offset granularity is 4 KB and the maximum offset granularity is 64 MB) and j+r = m, where the real address size supported by the implementation is m bits.

Programming Note
    EA[4:63-r] should equal ⁶⁰⁻ʳ0 (that is, 60-r zero bits). If this condition is satisfied, ORing the effective address with the offset produces a result that is equivalent to adding the effective address and the offset.

    If m<60, EA[4:63-m] and HRMOR[4:63-m] must be zeros.

5.7.3.2 Offset Real Mode Address

If VPM[0]=0, MSR[HV]=0, and LPES[1]=1, the access is controlled by the contents of the Real Mode Limit Selector and Real Mode Offset Register, as specified below, and the set of storage locations accessible by code is referred to as the Real Mode Area (RMA).

Real Mode Limit Selector (RMLS)
    If bits 4:63 of the effective address for the access are greater than or equal to the value (limit) represented by the contents of the LPCR[RMLS], the access causes a storage exception (see Section 5.7.9.3). In this comparison, if m<60, bits 4:63-m of the effective address may be ignored (i.e., treated as if they were zeros), where the real address size supported by the implementation is m bits. The supported limit values are of the form 2^j, where 12 ≤ j ≤ 60. Subject to the preceding sentence, the number and values of the limits supported are implementation-dependent.

Real Mode Offset Register (RMOR)
    If the access is permitted by the LPCR[RMLS], bits 4:63 of the effective address for the access are ORed with the 60-bit offset represented by the contents of the RMOR, and the low-order m bits of the 60-bit result are used as the real address for the access. The supported offset values are all values of the form i×2^s, where 0 ≤ i < 2^k, and k and s are implementation-dependent values having the properties that 2^s is the minimum limit value supported by the implementation (i.e., the minimum value representable by the contents of the LPCR[RMLS]) and k+s = m.

Programming Note
    The offset specified by the RMOR should be a nonzero multiple of the limit specified by the RMLS. If these registers are set thus, ORing the effective address with the offset produces a result that is equivalent to adding the effective address and the offset. (The offset must not be zero, because real page 0 contains the fixed interrupt vectors and real pages 1 and 2 may be used for implementation-specific purposes; see Section 5.7.4, "Address Ranges Having Defined Uses" on page 777.)

5.7.3.3 Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes

Storage accesses in hypervisor real addressing mode are performed as though all of storage had the following storage control attributes, except as modified by the Hypervisor Real Mode Storage Control facility (see Section 5.7.3.3.1). (The storage control attributes are defined in Book II.)

- not Write Through Required
- not Caching Inhibited, for instruction fetches
- not Caching Inhibited, for data accesses except those caused by the Load/Store Caching Inhibited instructions; Caching Inhibited, for data accesses caused by the Load/Store Caching Inhibited instructions
- Memory Coherence Required, for data accesses
- Guarded
- not SAO

Storage accesses in real addressing mode are performed as though all of storage had the following storage control attributes. (Such accesses use the Offset Real Mode Address mechanism.)

- not Write Through Required
- not Caching Inhibited
- Memory Coherence Required, for data accesses
- not Guarded
- not SAO

Additionally, storage accesses in real or hypervisor real addressing modes are performed as though all storage was not No-execute.

5.7.3.3.1 Hypervisor Real Mode Storage Control

The Hypervisor Real Mode Storage Control facility provides a means of specifying portions of real storage that are treated as non-Guarded in hypervisor real addressing mode (MSR[HV PR]=0b10, and MSR[IR]=0 or MSR[DR]=0, as appropriate for the type of access). The remaining portions are treated as Guarded in hypervisor real addressing mode. The means is a hypervisor resource (see Chapter 2), and may also be system-specific.

Implementations may use the first, or both, of two techniques to specify portions of real storage that are treated as non-Guarded in hypervisor real addressing mode. For the first technique, the facility provides for the specification, at coarse granularity, of the boundary between non-Guarded and Guarded real storage. Any storage location below the specified boundary is treated as non-Guarded in hypervisor real addressing mode, and any storage location at or above the boundary is treated as Guarded in hypervisor real addressing mode. For the second technique, which cannot be applied to instruction storage, the facility divides real storage into history blocks, in implementation-specific sizes. If there is no history for a block and it is accessed using a Load/Store Caching Inhibited instruction, the access is performed as though the block is Guarded, and the block is treated as Guarded for subsequent accesses on a best effort basis, limited by the amount of history that the facility can maintain. If there is no history for a block and it is accessed using any other Load or Store instruction, the access is performed as though the block is Guarded, but the block is treated as non-Guarded for subsequent accesses on a best effort basis, limited by the amount of history that the facility can maintain.

The storage location specified by a Load/Store Caching Inhibited instruction must not be in storage that is specified by the Hypervisor Real Mode Storage Control facility to be treated as non-Guarded. If the second technique is used, the storage location specified by any other Load or Store instruction must not be in storage that is specified by the Hypervisor Real Mode Storage Control facility to be treated as Guarded. (For the second technique, "specified by the Hypervisor Real Mode Storage Control facility" means "specified in a history block".) For the second technique, the history can be erased using an slbia instruction; see Section 5.9.3.1.

The facility does not apply to implicit accesses to the Page Table performed during address translation or in recording reference and change information. These accesses are performed as described in Section 5.7.3.5.

Programming Note
    Because storage accesses in real addressing mode and hypervisor real addressing mode do not use the SLB or the Page Table, accesses in these modes bypass all checking and recording of information contained therein (e.g., storage protection
checks that use information contained therein are not performed, and reference and change informa- tion is not recorded). 774 Power ISATM Book III-S Version 2.06 Reference and change recording are handled as if Programming Note address translation were enabled. The preceding capability can be used to improve the performance of hypervisor software that runs in Field Value hypervisor real addressing mode, by causing 36 ESID 0 accesses to instructions and data that occupy well- behaved storage to be treated as non-Guarded. V 1 B 0b01 - 1 TB VSID 0b00 || 0x0_01FF_FFFF Ks 0 5.7.3.4 Virtual Real Mode Addressing Kp undefined Mechanism N 0 L VRMASDL If VPM0=1, MSRHV=0, LPES1=1, and MSRDR=0 or MSRIR=0 as appropriate for the type of access, the C 0 access is said to be made in virtual real addressing LP VRMASDLP mode and is controlled by the mechanism specified below. The set of storage locations accessible by Figure 14. SLBE for VRMA code is referred to as the Virtualized Real Mode Area (VRMA). In virtual real addressing mode, address translation, Programming Note storage protection, and reference and change record- The C bit in Figure 14 is set to 0 because the imple- ing are handled as follows. mentation-dependent lookaside information associ- Address translation and storage protection are ated with the VRMA is expected to be long-lived. handled as if address translation were enabled, See Section 5.9.3.1. except that translation of effective addresses to vir- tual addresses use the SLBE values in Figure 14 Programming Note instead of the entry in the SLB corresponding to the ESID. In this translation, bits 0:23 of the effec- The 1 TB VSID 0x0_01FF_FFFF should not be tive address are ignored (i.e. treated as if they used by the operating system for purposes other were 0s), bits 24:63-m may be ignored if m < 40, than mapping the VRMA when address translation and the Virtual Page Class Key Protection mecha- is enabled. nism does not apply. 
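The mode selection at the start of Section 5.7.3, the two offset mechanisms of Sections 5.7.3.1 and 5.7.3.2, and the fixed VRMA segment of Figure 14 can be summarized in a short Python sketch. This is an illustration only, not part of the architecture: the function names and the returned mechanism labels are hypothetical, the RMLS limit is passed in already decoded (the encoding is implementation-dependent), and the sketch assumes that the Figure 14 VSID value is the 38-bit 1 TB VSID that supplies the high-order 38 bits of the virtual address.

```python
MASK60 = (1 << 60) - 1          # selects EA bits 4:63 (bit 0 is the most significant bit)
VRMA_VSID_1TB = 0x0_01FF_FFFF   # 0b00 || 0x0_01FF_FFFF from Figure 14 (assumed 38-bit VSID)


def real_mode_mechanism(msr_hv, lpes1, vpm0, ea):
    """Return a (hypothetical) label for the mechanism that controls the access."""
    ea0 = (ea >> 63) & 1        # EA_0 is the most significant bit of the 64-bit EA
    if msr_hv == 1:
        return "hypervisor-offset" if ea0 == 0 else "ea-as-real"
    if lpes1 == 0:
        return "storage-exception"
    return "offset" if vpm0 == 0 else "virtual-real"


def hypervisor_offset_real_address(ea, hrmor):
    """Section 5.7.3.1: EA bits 4:63 ORed with the 60-bit HRMOR offset."""
    return ((ea & MASK60) | hrmor) & MASK60


def offset_real_address(ea, rmor, limit, m=60):
    """Section 5.7.3.2: limit check against the decoded RMLS value, then OR with RMOR.

    `limit` is the decoded limit in bytes and `m` is the real address size in bits.
    """
    ea60 = ea & MASK60
    if ea60 >= limit:
        raise ValueError("storage exception: EA is at or above the RMLS limit")
    return (ea60 | rmor) & ((1 << m) - 1)   # low-order m bits of the 60-bit result


def vrma_virtual_address(ea):
    """Section 5.7.3.4: EA bits 0:23 are ignored; the segment is the fixed 1 TB VSID."""
    return (VRMA_VSID_1TB << 40) | (ea & ((1 << 40) - 1))


# With RMOR a nonzero multiple of the limit, ORing equals adding.
print(hex(offset_real_address(0x1234, rmor=0x1000_0000, limit=1 << 28)))  # 0x10001234
```

Passing the decoded limit rather than the raw LPCR_RMLS field keeps the sketch independent of any particular implementation's encoding.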
Programming Note
The Virtual Page Class Key Protection mechanism does not apply because the authority mask that an OS has set for application programs executing with address translation enabled may not be the same as the authority mask required by the OS when address translation is disabled, such as when first entering an interrupt handler.

Programming Note
Software should specify PTE_B = 0b01 for all Page Table Entries that map the VRMA in order to be consistent with the values in Figure 14.

Programming Note
All accesses to the RMA are considered not Guarded. The G bit of the associated Page Table Entry determines whether an access to the VRMA is Guarded. Therefore, if an instruction is fetched from the VRMA, a Hypervisor Instruction Storage interrupt will result if G=1 in the associated Page Table Entry.

Programming Note
The RMA is considered non-SAO storage. However, any page in the VRMA is treated as SAO storage if WIMG = 0b1110 in the associated Page Table Entry.

5.7.3.5 Storage Control Attributes for Implicit Storage Accesses

Implicit accesses to the Page Table during address translation and in recording reference and change information are performed as though the storage occupied by the Page Table had the following storage control attributes.
- not Write Through Required
- not Caching Inhibited
- Memory Coherence Required
- not Guarded
- not SAO

The definition of "performed" given in Book II applies also to these implicit accesses; accesses for performing address translation are considered to be loads in this respect, and accesses for recording reference and change information are considered to be stores. These implicit accesses are ordered by the ptesync instruction as described in Section 5.9.2.

5.7.4 Address Ranges Having Defined Uses

The address ranges described below have uses that are defined by the architecture.

Fixed interrupt vectors
Except for the first 256 bytes, which are reserved for software use, the real page beginning at real address 0x0000_0000_0000_0000 is either used for interrupt vectors or reserved for future interrupt vectors.

Implementation-specific use
The two contiguous real pages beginning at real address 0x0000_0000_0000_1000 are reserved for implementation-specific purposes.

Offset Real Mode interrupt vectors
The real pages beginning at the real address specified by the HRMOR and RMOR are used similarly to the page for the fixed interrupt vectors.

Page Table
A contiguous sequence of real pages beginning at the real address specified by SDR1 contains the Page Table.

5.7.5 Address Translation Overview

The effective address (EA) is the address generated by the hardware for an instruction fetch or for a data access. If address translation is enabled, this address is passed to the Address Translation mechanism, which attempts to convert the address to a real address which is then used to access storage.

The first step in address translation is to convert the effective address to a virtual address (VA), as described in Section 5.7.6. The second step, conversion of the virtual address to a real address (RA), is described in Section 5.7.7.

If the effective address cannot be translated, a storage exception (see Section 5.2) occurs.

Figure 15 gives an overview of the address translation process.

5.7.6 Virtual Address Generation

Conversion of a 64-bit effective address to a virtual address is done by searching the Segment Lookaside Buffer (SLB) as shown in Figure 16.
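As an illustration of the conversion that Figure 16 depicts, the following Python sketch applies the match rule and the virtual address formation given in Section 5.7.6.2. It is a minimal model under stated assumptions, not the hardware algorithm: the `SLBEntry` class, the `SegmentFault` exception and the function name are hypothetical, only the two defined B encodings are handled, and when several entries match the sketch simply uses the first one (the architecture allows either that or a Machine Check).

```python
from dataclasses import dataclass

# B field encoding -> s, the log2 of the segment size (256 MB and 1 TB).
SEGMENT_SHIFT = {0b00: 28, 0b01: 40}


@dataclass
class SLBEntry:
    esid: int   # 36-bit ESID field (SLBE bits 0:35)
    v: int      # valid bit
    b: int      # segment size selector
    vsid: int   # 50-bit VSID field (SLBE bits 39:88)


class SegmentFault(Exception):
    """Raised when no SLB entry translates the effective address."""


def slb_translate(slb, ea):
    """Return the 78-bit virtual address for a 64-bit effective address."""
    for entry in slb:
        s = SEGMENT_SHIFT.get(entry.b)
        if s is None or entry.v != 1:
            continue  # reserved B encodings and invalid entries never match here
        # ESID_{0:63-s} = EA_{0:63-s}: drop the low-order s-28 bits of the ESID field.
        if (entry.esid >> (s - 28)) == (ea >> s):
            vsid_high = entry.vsid >> (s - 28)             # VSID_{0:77-s}
            return (vsid_high << s) | (ea & ((1 << s) - 1))  # || EA_{64-s:63}
    raise SegmentFault(hex(ea))


slb = [SLBEntry(esid=0x1, v=1, b=0b00, vsid=0xABC)]
print(hex(slb_translate(slb, 0x1000_1234)))  # 0xabc0001234
```

For a 1 TB segment the low-order 12 bits of both the ESID and VSID fields are expected to be zero, which is why the sketch shifts them away before comparing and concatenating.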
[Figure 15. Address translation overview: Effective Address, Lookup in SLB, Virtual Address, Lookup in Page Table, Real Address.]

[Figure 16. Translation of 64-bit effective address to 78 bit virtual address: the 64-bit Effective Address consists of ESID (bits 0:63-s, 64-s bits), Page (bits 64-s:63-p, s-p bits), and Byte (bits 64-p:63, p bits). The ESID selects an entry SLBE0 through SLBEn of the Segment Lookaside Buffer, which supplies VSID_{0:77-s}. The 78-bit Virtual Address consists of VSID (78-s bits), Page (s-p bits), and Byte (p bits); VSID and Page together form the Virtual Page Number (VPN).]

5.7.6.1 Segment Lookaside Buffer (SLB)

The Segment Lookaside Buffer (SLB) specifies the mapping between Effective Segment IDs (ESIDs) and Virtual Segment IDs (VSIDs). The number of SLB entries is implementation-dependent, except that all implementations provide at least 32 entries.

The contents of the SLB are managed by software, using the instructions described in Section 5.9.3.1. See Chapter 10. "Synchronization Requirements for Context Alterations" on page 861 for the rules that software must follow when updating the SLB.

SLB Entry

Each SLB entry (SLBE, sometimes referred to as a "segment descriptor") maps one ESID to one VSID. Figure 17 shows the layout of an SLB entry.

Field layout (starting bit): ESID (0), V (36), B (37), VSID (39), Ks Kp N L C (89), / (94), LP (95:96)

Bit(s)   Name   Description
0:35     ESID   Effective Segment ID
36       V      Entry valid (V=1) or invalid (V=0)
37:38    B      Segment Size Selector
                0b00 - 256 MB (s=28)
                0b01 - 1 TB (s=40)
                0b10 - reserved
                0b11 - reserved
39:88    VSID   Virtual Segment ID
89       Ks     Supervisor (privileged) state storage key (see Section 5.7.9.2)
90       Kp     Problem state storage key (see Section 5.7.9.2)
91       N      No-execute segment if N=1
92       L      Virtual page size selector bit 0
93       C      Class
95:96    LP     Virtual page size selector bits 1:2

All other fields are reserved. B_0 (SLBE_37) is treated as a reserved field.

Figure 17. SLB Entry

Instructions cannot be executed from a No-execute (N=1) segment.

Segments may contain a mixture of page sizes. The L and LP bits specify the base virtual page size that the segment may contain. The SLBE_{L||LP} encodings are those shown in Figure 18. The base virtual page size (also referred to as the "base page size") is the smallest virtual page size for the segment. The base virtual page size is 2^b bytes. The actual virtual page size (also referred to as the "actual page size" or "virtual page size") is specified by PTE_{L LP}.

encoding               page size
0b000                  4 KB
0b101                  64 KB
additional values (1)  2^b bytes, where b > 12 and b may differ among encoding values

(1) The "additional values" are implementation-dependent, as are the corresponding base virtual page sizes. Any values that are not supported by a given implementation are reserved in that implementation.

Figure 18. Page Size Encoding

For each SLB entry, software must ensure the following requirements are satisfied.
- L||LP contains a value supported by the implementation.
- The base virtual page size selected by the L and LP fields does not exceed the segment size selected by the B field.
- If s=40, the following bits of the SLB entry contain 0s.
  - ESID_{24:35}
  - VSID_{38:49}
  The bits in the above two items are ignored by the hardware.

The Class field of the SLBE is used in conjunction with the slbie and slbia instructions (see Section 5.9.3.1). "Class" refers to a grouping of SLB entries and implementation-specific lookaside information so that only entries in a certain group need be invalidated and others might be preserved. The Class value assigned to an implementation-specific lookaside entry derived from an SLB entry must match the Class value of that SLB entry. The Class value assigned to an implementation-specific lookaside entry that is not derived from an SLB entry (such as real mode address "translations") is 0.

Software must ensure that the SLB contains at most one entry that translates a given effective address, and that if the SLB contains an entry that translates a given effective address, then any previously existing translation of that effective address has been invalidated. An attempt to create an SLB entry that violates this requirement may cause a Machine Check.

Programming Note
It is permissible for software to replace the contents of a valid SLB entry without invalidating the translation specified by that entry provided the specified restrictions are followed. See Chapter 10 Note 11.

5.7.6.2 SLB Search

When the hardware searches the SLB, all entries are tested for a match with the EA. For a match to exist, the following conditions must be satisfied for indicated fields in the SLBE.
- V=1
- ESID_{0:63-s}=EA_{0:63-s}, where the value of s is specified by the B field in the SLBE being tested

If no match is found, the search fails. If one match is found, the search succeeds. If more than one match is found, one of the matching entries is used as if it were the only matching entry, or a Machine Check occurs.

If the SLB search succeeds, the virtual address (VA) is formed from the EA and the matching SLB entry fields as follows.

VA = VSID_{0:77-s} || EA_{64-s:63}

The Virtual Page Number (VPN) is bits 0:77-p of the virtual address. The value of p is log2 of the actual virtual page size specified by the PTE used to translate the virtual address (see Section 5.7.7.1). If SLBE_N = 1, the N (No-execute) value used for the storage access is 1.

If the SLB search fails, a segment fault occurs. This is an Instruction Segment exception or a Data Segment exception, depending on whether the effective address is for an instruction fetch or for a data access.

5.7.7 Virtual to Real Translation

Conversion of a 78-bit virtual address to a real address is done by searching the Page Table as shown in Figure 19.
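The computation that Figure 19 depicts can be sketched in Python using the hash, PTEG address and real address rules stated in Section 5.7.7.3. This is a hedged illustration rather than a definitive implementation: all function names are hypothetical, SDR1 is modelled as a plain 64-bit integer, the standard (non-relaxed) Page Table alignment is assumed so that the index is ORed rather than added, and `arpn_lp` stands for the 48-bit concatenation ARPN||LP of the matching PTE.

```python
def bits(value, width, hi, lo):
    """Bits hi:lo of a width-bit value, using bit 0 = most significant bit."""
    return (value >> (width - 1 - lo)) & ((1 << (lo - hi + 1)) - 1)


def primary_hash(va, s, b):
    """39-bit primary hash of a 78-bit VA; s = log2(segment size), b = log2(base page size)."""
    if s == 28:
        # VA_{11:49} XOR (zero-extended VA_{50:77-b})
        return bits(va, 78, 11, 49) ^ bits(va, 78, 50, 77 - b)
    if s == 40:
        # (VA_{24:37} || 25 zero bits) XOR (0 || VA_{0:37}) XOR (zero-extended VA_{38:77-b})
        return ((bits(va, 78, 24, 37) << 25)
                ^ bits(va, 78, 0, 37)
                ^ bits(va, 78, 38, 77 - b))
    raise ValueError("s must be 28 or 40")


def secondary_hash(va, s, b):
    """Ones complement of the primary hash, kept to 39 bits."""
    return ~primary_hash(va, s, b) & ((1 << 39) - 1)


def pteg_address(sdr1, hash39):
    """60-bit real address of the PTEG selected by a 39-bit hash value."""
    htaborg = bits(sdr1, 64, 4, 45)       # 42-bit HTABORG
    htabsize = bits(sdr1, 64, 59, 63)     # number of extra hash bits used (at most 28)
    mask = (1 << htabsize) - 1            # 28-HTABSIZE 0-bits followed by HTABSIZE 1-bits
    high14 = htaborg >> 28                # SDR1 bits 4:17
    low28 = htaborg & ((1 << 28) - 1)     # SDR1 bits 18:45
    mid28 = ((hash39 >> 11) & mask) | low28   # hash bits 0:27 ANDed with mask, then ORed
    low11 = hash39 & 0x7FF                # hash bits 28:38
    return (high14 << 46) | (mid28 << 18) | (low11 << 7)   # seven 0-bits at the bottom


def real_address(arpn_lp, ea, p):
    """RA = (ARPN||LP)_{0:59-p} || EA_{64-p:63} for an actual page size of 2^p bytes."""
    return ((arpn_lp >> (p - 12)) << p) | (ea & ((1 << p) - 1))


# A 2 MB table (HTABSIZE = 3) placed, as an assumption, at real address 0x40_0000.
sdr1 = 0x40_0000 | 3
va = (0x123 << 28) | 0x45678              # 256 MB segment, VSID 0x123
h = primary_hash(va, s=28, b=12)
print(hex(h), hex(pteg_address(sdr1, h)))  # 0x166 0x40b300
```

With HTABSIZE = 3 the index is the low-order 14 bits of the hash, so the PTEG lies at the table base plus 128 times that index, which is what the final line prints for the primary PTEG.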
[Figure 19. Translation of 78-bit virtual address to 60-bit real address: the figure shows SDR1 (HTABORG and HTABSIZE), the 78-bit Virtual Address split into the Virtual Page Number (VPN, 78-p bits) and Byte (p bits), the Hash Function (see Section 5.7.7.3), the mask decoded from HTABSIZE and ANDed with the high-order 28 bits of the hash, the OR with HTABORG that forms the 60-bit Real Address of a Page Table Entry Group (PTEG, 128 bytes, PTE0 through PTE7 of 16 bytes each), and the 60-bit Real Address formed from (ARPN||LP)_{0:59-p} and the p-bit Byte offset.]

* If the Server.Relaxed Page Table Alignment category is supported, low order HTABORG bits are not necessarily zero; the OR block in the figure is replaced with a full adder, and the carry out is added to bits HTABORG_{4:17} to form RA_{0:13} of the PTEG.

5.7.7.1 Page Table

The Hashed Page Table (HTAB) is a variable-sized data structure that specifies the mapping between virtual page numbers and real page numbers, where the real page number of a real page is bits 0:47 of the address of the first byte in the real page. The HTAB's size can be any size 2^n bytes where 18 ≤ n ≤ 46. The HTAB must be located in storage having the storage control attributes that are used for implicit accesses to it (see Section 5.7.3.5). The starting address must be a multiple of its size unless the implementation supports the Server.Relaxed Page Table Alignment category, in which case its starting address is a multiple of 2^18 bytes (see Section 5.7.7.4).

The HTAB contains Page Table Entry Groups (PTEGs). A PTEG contains 8 Page Table Entries (PTEs) of 16 bytes each; each PTEG is thus 128 bytes long. PTEGs are entry points for searches of the Page Table.

See Section 5.10 for the rules that software must follow when updating the Page Table.

Programming Note
The Page Table must be treated as a hypervisor resource (see Chapter 2), and therefore must be placed in real storage to which only the hypervisor has write access. Moreover, the contents of the Page Table must be such that non-hypervisor software cannot modify storage that contains hypervisor programs or data.

Page Table Entry

Each Page Table Entry (PTE) maps one VPN to one RPN. Figure 20 shows the layout of a PTE. This layout is independent of the Endian mode of the thread.

Dword 0: B | AVA | SW | L | H | V
Dword 1: pp | / | key | ARPN | LP | key | R | C | WIMG | N | pp

Dword  Bit(s)  Name  Description
0      0:1     B     Segment Size
                     0b00 - 256 MB
                     0b01 - 1 TB
                     0b10 - reserved
                     0b11 - reserved
       2:56    AVA   Abbreviated Virtual Address
       57:60   SW    Available for software use
       61      L     Virtual page size
                     0b0 - 4 KB
                     0b1 - greater than 4 KB (large page)
       62      H     Hash function identifier
       63      V     Entry valid (V=1) or invalid (V=0)
1      0       pp    Page Protection bit 0
       2:3     key   KEY bits 0:1
       4:43    ARPN  Abbreviated Real Page Number
       44:51   LP    Large page size selector
       52:54   key   KEY bits 2:4
       55      R     Reference bit
       56      C     Change bit
       57:60   WIMG  Storage control bits
       61      N     No-execute page if N=1
       62:63   pp    Page Protection bits 1:2

All other fields are reserved.

Figure 20. Page Table Entry

Programming Note
The H bit in the Page Table entry should not be set to one unless the secondary Page Table search has been enabled.

If b ≤ 23, the Abbreviated Virtual Address (AVA) field contains bits 0:54 of the VA. Otherwise bits 0:77-b of the AVA field contain bits 0:77-b of the VA, and bits 78-b:54 of the AVA field must be zero.

Programming Note
The AVA field omits the low-order 23 bits of the VA. These bits are not needed in the PTE, because the low-order b of these bits are part of the byte offset into the virtual page and, if b<23, the high-order 23-b of these bits are always used in selecting the PTEGs to be searched (see Section 5.7.7.3).

On implementations that support a virtual address size of only n bits, n<78, bits 0:77-n of the AVA field must be zeros.

A virtual page is mapped to a sequence of 2^{p-12} contiguous real pages such that the low-order p-12 bits of the real page number of the first real page in the sequence are 0s.

PTE_{L LP} specify both a base virtual page size (henceforth referred to as the "base page size") and an actual virtual page size (henceforth referred to as the "actual page size" or "virtual page size"). The actual page size is the size of the virtual page mapped by the PTE. The base page size is the smallest actual page size that a segment can contain. See Section 5.7.6.

If PTE_L=0, the base virtual page size and actual virtual page size are 4 KB, and ARPN concatenated with LP (ARPN||LP) contains the page number of the real page that maps the virtual page described by the entry.

If PTE_L=1, the base page size and actual page size are specified by PTE_LP. In this case, the contents of PTE_LP have the format shown in Figure 21. Bits labelled "r" are bits of the real page number. Bits labelled "z" specify the base page size and actual page size. The values of the "z" bits used to specify each size are implementation-dependent. The values of the "z" bits used to specify each size, along with all possible values of "r" bits in the LP field, must result in LP values distinct from other LP values for other sizes. Actual page sizes 4 KB and 64 KB are always supported; other actual page sizes are implementation-dependent. If PTE_L=1, the actual page size must be greater than 4 KB. Which combinations of different base page size and actual page size are supported is implementation-dependent, except that the combination of a base page size of 4 KB with an actual page size of 64 KB is always supported.

PTE_LP       actual page size
rrrr_rrrz    ≥ 8 KB
rrrr_rrzz    ≥ 16 KB
rrrr_rzzz    ≥ 32 KB
rrrr_zzzz    ≥ 64 KB
rrrz_zzzz    ≥ 128 KB
rrzz_zzzz    ≥ 256 KB
rzzz_zzzz    ≥ 512 KB
zzzz_zzzz    ≥ 1 MB

Figure 21. Format of PTE_LP when PTE_L=1

There are at least 2 formats of PTE_LP that specify a 64 KB page. One format is used with SLBE_{L||LP} = 0b000 and one format is used with SLBE_{L||LP} = 0b101.

The actual page size selected by the LP field must not exceed the segment size selected by the B field. Forms of PTE_LP not supported by a given implementation are treated as reserved values for that implementation.

The concatenation of the ARPN field and bits labeled "r" in the LP field contains the high-order bits of the real page number of the real page that maps the first 4 KB of the virtual page described by the entry.

The low-order p-12 bits of the real page number contained in the ARPN and LP fields must be 0s and are ignored by the hardware.

Programming Note
The actual page size specified by a given PTE_LP format is at least 2^{12+(8-c)}, where c is the number of r bits in the format.

Programming Note
Implementations often have implementation-dependent lookaside buffers (e.g., TLBs and ERATs) used to cache translations of recently used storage addresses. Mapping virtual storage to large pages may increase the effectiveness of such lookaside buffers, improving performance, because it is possible for such buffers to translate a larger range of addresses, reducing the frequency that the Page Table must be searched to translate an address.

Instructions cannot be executed from a No-execute (N=1) page.

Page Table Size

The number of entries in the Page Table directly affects performance because it influences the hit ratio in the Page Table and thus the rate of page faults. If the table is too small, it is possible that not all the virtual pages that actually have real pages assigned can be mapped via the Page Table. This can happen if too many hash collisions occur and there are more than 16 entries for the same primary/secondary pair of PTEGs (when the secondary Page Table search is enabled) or more than 8 entries for the same primary PTEG (when the secondary Page Table search is disabled).

While this situation cannot be guaranteed not to occur for any size Page Table, making the Page Table larger than the minimum size (see Section 5.7.7.2) will reduce the frequency of occurrence of such collisions.

Programming Note
If large pages are not used, it is recommended that the number of PTEGs in the Page Table be at least half the number of real pages to be accessed. For example, if the amount of real storage to be accessed is 2^31 bytes (2 GB), then we have 2^{31-12} = 2^19 real pages. The minimum recommended Page Table size would be 2^18 PTEGs, or 2^25 bytes (32 MB).

5.7.7.2 Storage Description Register 1

The Storage Description Register 1 (SDR1) register is shown in Figure 22.

Field layout (starting bit): // (0), HTABORG (4), /// (46), HTABSIZE (59:63)

Bits    Name      Description
4:45    HTABORG   Real address of Page Table
59:63   HTABSIZE  Encoded size of Page Table

All other fields are reserved.

Figure 22. SDR1

SDR1 is a hypervisor resource; see Chapter 2.

The HTABORG field in SDR1 contains the high-order 42 bits of the 60-bit real address of the Page Table. The Page Table is thus constrained to lie on a 2^18 byte (256 KB) boundary at a minimum. At least 11 bits from the hash function (see Figure 19) are used to index into the Page Table. The minimum size Page Table is 256 KB (2^11 PTEGs of 128 bytes each).

The Page Table can be any size 2^n bytes where 18 ≤ n ≤ 46. As the table size is increased, more bits are used from the hash to index into the table and the value in HTABORG must have more of its low-order bits equal to 0 unless the implementation supports the Server.Relaxed Page Table Alignment category; see Section 5.7.7.4.

The HTABSIZE field in SDR1 contains an integer giving the number of bits (in addition to the minimum of 11 bits) from the hash that are used in the Page Table index. This number must not exceed 28. HTABSIZE is used to generate a mask of the form 0b00...011...1, which is a string of 28 - HTABSIZE 0-bits followed by a string of HTABSIZE 1-bits. The 1-bits determine which additional bits (beyond the minimum of 11) from the hash are used in the index (see Figure 19). The number of low-order 0 bits in HTABORG must be greater than or equal to the value in HTABSIZE.

On implementations that support a real address size of only m bits, m<60, bits 0:59-m of the HTABORG field are treated as reserved bits, and software must set them to zeros.

Programming Note
Let n equal the virtual address size (in bits) supported by the implementation. If n<67, software should set the HTABSIZE field to a value that does not exceed n-39. Because the high-order 78-n bits of the VSID are assumed to be zeros, the hash value used in the Page Table search will have the high-order 67-n bits either all 0s (primary hash; see Section 5.7.7.3) or all 1s (secondary hash). If HTABSIZE > n-39, some of these hash value bits will be used to index into the Page Table, with the result that certain PTEGs will not be searched.

Example:
Suppose that the Page Table is 16,384 (2^14) 128-byte PTEGs, for a total size of 2^21 bytes (2 MB). A 14-bit index is required. Eleven bits are provided from the hash to start with, so 3 additional bits from the hash must be selected. Thus the value in HTABSIZE must be 3 and the value in HTABORG must have its low-order 3 bits (bits 43:45 of SDR1) equal to 0. This means that the Page Table must begin on a 2^{3+11+7} = 2^21 = 2 MB boundary.

5.7.7.3 Page Table Search

When the hardware searches the Page Table, the accesses are performed as described in Section 5.7.3.5.

An outline of the HTAB search process is shown in Figure 19. If the implementation supports the Server.Relaxed Page Table Alignment category, see Section 5.7.7.4. Up to two hash functions are used to locate a PTE that may translate the given virtual address. In the expressions below, ^{n}0 denotes a string of n 0-bits.

1. A 39-bit hash value is computed from the VA. The value of s is the value specified in the SLBE that was used to generate the virtual address; the value of b is equal to log2(base page size specified in the SLBE that was used to translate the address).

   Primary Hash:
   If s=28, the hash value is computed by Exclusive ORing VA_{11:49} with (^{11+b}0 || VA_{50:77-b}).
   If s=40, the hash value is computed by Exclusive ORing the following three quantities: (VA_{24:37} || ^{25}0), (0 || VA_{0:37}), and (^{b-1}0 || VA_{38:77-b}).

   The 60-bit real address of a PTEG is formed by concatenating the following values:
   - Bits 4:17 of SDR1 (the high-order 14 bits of HTABORG).
   - Bits 0:27 of the 39-bit hash value ANDed with the mask generated from bits 59:63 of SDR1 (HTABSIZE) and then ORed with bits 18:45 of SDR1 (the low-order 28 bits of HTABORG).
   - Bits 28:38 of the 39-bit hash value.
   - Seven 0-bits.

   This operation identifies a particular PTEG, called the "primary PTEG", whose eight PTEs will be tested.

2. Secondary Hash:
   If the secondary Page Table search is enabled (LPCR_TC=0), perform the secondary hash function as follows; otherwise do not perform step 2 and proceed to step 3 below.

   If s=28, the hash value is computed by taking the ones complement of the Exclusive OR of VA_{11:49} with (^{11+b}0 || VA_{50:77-b}).
   If s=40, the hash value is computed by taking the ones complement of the Exclusive OR of the following three quantities: (VA_{24:37} || ^{25}0), (0 || VA_{0:37}), and (^{b-1}0 || VA_{38:77-b}).

   The 60-bit real address of a PTEG is formed by concatenating the following values:
   - Bits 4:17 of SDR1 (the high-order 14 bits of HTABORG).
   - Bits 0:27 of the 39-bit hash value ANDed with the mask generated from bits 59:63 of SDR1 (HTABSIZE) and then ORed with bits 18:45 of SDR1 (the low-order 28 bits of HTABORG).
   - Bits 28:38 of the 39-bit hash value.
   - Seven 0-bits.

   This operation identifies the "secondary PTEG".

3. As many as 8 PTEs in the primary PTEG and, if the secondary Page Table search is enabled, 8 PTEs in the secondary PTEG are tested to determine if any translate the given virtual address. Let q = minimum(54, 77-b). For a match to exist, the following conditions must be satisfied, where SLBE is the SLBE used to form the virtual address.
   - PTE_H=0 for the primary PTEG, 1 for the secondary PTEG
   - PTE_V=1
   - PTE_B=SLBE_B
   - PTE_AVA[0:q]=VA_{0:q}
   - if PTE_L=0 then SLBE_{L||LP}=0b000, else PTE_LP specifies a base page size specified by SLBE_{L||LP}

   If no match is found, the search fails. If one match is found, the search succeeds. If more than one match is found, one of the matching entries is used as if it were the only matching entry, or a Machine Check occurs.

If the Page Table search succeeds, the real address (RA) is formed by concatenating bits 0:59-p of ARPN||LP from the matching PTE with bits 64-p:63 of the effective address (the byte offset), where the p value is log2(actual page size specified by PTE_{L LP}).

RA = (ARPN || LP)_{0:59-p} || EA_{64-p:63}

Programming Note
If PTE_L = 0, the actual page size (and base page size) are 4 KB. Otherwise the actual page size and base page size are specified by PTE_LP.

Since hardware searches the Page Table using a value of b equal to log2(base page size specified in the SLBE that was used to translate the address) regardless of the actual page size, the hardware page table search will identify different PTEs for VAs in different 2^b-byte blocks of the virtual page if the actual page size is larger than the base page size. Therefore, there may need to be a valid PTE corresponding to each 2^b block of the virtual page that is referenced. For an actual page size that is larger than 2^23 (8 MB), the PTE_AVA will differ among some or all of these PTEs. Depending on the Page Table size, some or all of these PTEs may be in the same PTEG. Any such PTEs that are in the same PTEG will differ in the value of PTE_H or PTE_AVA or both.

All PTEs for the same virtual page should have the same values in the Page Protection, KEY, ARPN, WIMG, and N fields. A set of values from any one of the PTEs that maps the virtual page may be used for an access in the virtual page since lookaside buffer information may be used to translate the virtual address.

To avoid creating multiple matching PTEs, software should not create PTEs for each of two different virtual pages that overlap in the virtual address space. If the virtual page sizes differ, two virtual pages overlap if the values of virtual address bits 0:77-p for both virtual pages are the same, where 2^p is the actual virtual page size of the larger page.

The N (No-execute) value used for the storage access is the result of ORing the N bit from the matching PTE with the N bit from the SLB entry that was used to translate the effective address.

Programming Note
The value of b used when searching the Page Table to identify the PTEGs is log2(base page size). Since a segment may contain pages of different sizes, the hardware searches for PTEs specifying pages of any supported size greater than or equal to the base page size, and the real address is formed using the value of p specified by the matching PTE.

A virtual page of 2^p bytes in a segment with a base page size of 2^b bytes may be mapped by as many as 2^{p-b} PTEs.

If the Page Table search fails, a page fault occurs. This is a [Hypervisor] Instruction Storage exception or a [Hypervisor] Data Storage exception, depending on whether the effective address is for an instruction fetch or for a data access. The N value used for the storage access is the N bit from the SLB entry that was used to translate the effective address.

Programming Note
To obtain the best performance, Page Table Entries should be allocated beginning with the first empty entry in the primary PTEG, or with the first empty entry in the secondary PTEG if the primary PTEG is full and the secondary Page Table search is enabled (LPCR_TC=0).

Translation Lookaside Buffer

Conceptually, the Page Table is searched by the address relocation hardware to translate every reference. For performance reasons, the hardware usually keeps a Translation Lookaside Buffer (TLB) that holds PTEs that have recently been used. Even though multiple PTEs may be needed for a virtual page whose size is larger than the base page size, one TLB entry derived from a single PTE may be used to translate all of the virtual addresses in the entire virtual page. The TLB is searched prior to searching the Page Table. As a consequence, when software makes changes to the Page Table it must perform the appropriate TLB invalidate operations to maintain the consistency of the TLB with the Page Table (see Section 5.10).

In the TLB search, the match criteria include virtual address bits 0:(77-q) where q is an implementation-dependent integer such that b ≤ q ≤ p. As a result of a Page Table search, multiple matching TLB entries are not created for the same virtual page, except that multiple matching TLB entries may be created if the Page Table contains PTEs that map different-sized virtual pages that overlap in the virtual address space. (If the virtual page sizes differ, two virtual pages overlap if the values of virtual address bits 0:77-p for both virtual pages are the same, where 2^p is the actual virtual page size of the larger page.) If a TLB search finds multiple matching TLB entries created from such PTEs, one of the matching TLB entries is used as if it were the only matching entry, or a Machine Check occurs.

As a result of a Page Table search in a Page Table that does not contain different-sized virtual pages that overlap, it is implementation-dependent whether multiple non-matching TLB entries are created for the same virtual page. However, in this case if multiple TLB entries are created for a given virtual page, at most one matching TLB entry is created for any given virtual address in that virtual page, and q for that TLB entry is less than p.

An implementation may associate each of its TLB entries with the partition for which the TLB entry was created, so that the entries can be retained while other partitions are executing. In this case, when a valid TLB entry is created, the LPID value from LPIDR is written into the TLB entry.

Programming Notes
1. Page Table Entries may or may not be cached in a TLB.
2. It is possible that the hardware implements more than one TLB, such as one for data and one for instructions. In this case the size and

generated from bits 59:63 of SDR1 (HTABSIZE) and then added to the value of bits 4:45 of SDR1 (HTABORG). This part of the real address differs from Section 5.7.7.2.
- Bits 28:38 of the 39-bit hash value.
- Seven 0-bits.

An outline of the PTEG real address computation is shown in Figure 19.

5.7.8 Reference and Change Recording

If address translation is enabled, Reference (R) and Change (C) bits are updated in any one of what could be multiple Page Table Entries that map the virtual page that is being accessed. If the storage operand of a Load or Store instruction crosses a virtual page boundary, the accesses to the components of the operand in each page are treated as separate and independent accesses to each of the pages for the purpose of setting the Reference and Change bits.

Reference and Change bits are set by the hardware as described below. Setting the bits need not be atomic with respect to performing the access that caused the bits to be updated.
An attempt to access storage may shape of the TLBs may differ, as may the val- cause one or more of the bits to be set (as described ues contained therein. below) even if the access is not performed. The bits are 3. Use the tlbie or tlbia instruction to ensure that updated in the Page Table Entry if the new value would the TLB no longer contains a mapping for a otherwise be different from the old value for the virtual particular virtual page. page, as determined by examining either the Page Table Entry or any lookaside information for the virtual page (e.g., TLB) maintained by the hardware. 5.7.7.4 Relaxed Page Table Align- Reference Bit ment [Category: Server.Relaxed Page The Reference bit is set to 1 if the corresponding Table Alignment] access (load, store, or instruction fetch) is required The Page Table can be aligned on any 218 byte (256 by the sequential execution model and is per- KB) boundary regardless of the HTAB size. formed. Otherwise the Reference bit may be set to 1 if the corresponding access is attempted, either Section 5.7.7.2 describes the Storage Description Reg- in-order or out-of-order, even if the attempt causes ister, which includes the HTABORG field. That descrip- an exception, except that the Reference bit is not tion generally applies except for the following set to 1 for the access caused by an indexed Move difference. As the Page Table size is increased beyond Assist instruction for which the XER specifies a 256 KB, the value in HTABORG need not have more of length of zero. its low-order bits equal to 0. Instead, (HTABORG || 180) is the real address of the start of the Page Table Change Bit regardless of the Page Table size. The Change bit is set to 1 if a Store instruction is A Page Table search is performed as described in Sec- executed and the store is performed. 
Otherwise in tion 5.7.7.3 except the 60-bit real address of a PTEG general the Change bit may be set to 1 if a Store for both the primary and, if the secondary Page Table instruction is executed and the store is permitted search is enabled, the secondary hash is formed by by the storage protection mechanism and, if the concatenating the following values: Store instruction is executed out-of-order, the Bits 0:27 of the 39-bit appropriate primary or instruction would be required by the sequential secondary hash value ANDed with the mask Chapter 5. Storage Control 785 Version 2.06 execution model in the absence of the following Programming Note kinds of interrupts: system-caused interrupts (see Section 6.4 on The following cases apply to virtual pages in a seg- page 824) ment with a smaller page size, in which case there Floating-Point Enabled Exception type Pro- may be multiple PTEs for the virtual page. gram interrupts when the thread is in an When a virtual page is accessed, hardware Imprecise mode. maintains the R and C bits in one or more of the PTEs corresponding to the referenced vir- The only exception to the preceding statement is tual page, and not necessarily the PTE that that the Change bit is not set to 1 if the instruction would be used if the Page Table were is a Store String Indexed instruction for which the searched for the storage access (see Section XER specifies a length of zero. 5.7.7.1). Therefore the R and C bits for a given virtual page in a segment with a smaller base Programming Note page size may each be updated in a different A virtual page in a segment with a smaller base PTE (subject to the requirements in Figure 28). page size may be mapped by multiple PTEs. For If a virtual page access results in the R or C each access of a virtual page, hardware may bits needing to be set to 1, the Page Table may search the Page Table to update the R and C bits. 
If be searched for any PTE corresponding to the lookaside buffer information for the virtual page virtual page accessed, and if the PTE is not already indicates that all such bits to be set have found, a Data Storage Interrupt or Hypervisor already been set in a PTE that maps the virtual Data Storage Interrupt may occur. page, hardware need not make an update. Con- sider the following sequence of events: When the hardware updates the Reference and 1. A virtual page is mapped by 2 PTEs A and B Change bits in the Page Table Entry, the accesses are and the R and C bits in both PTEs are 0. performed as described in Section 5.7.3.5, "Storage 2. A Load instruction accesses the virtual page Control Attributes for Implicit Storage Accesses" on and the R bit is updated in PTE A. page 777. The accesses may be performed using oper- 3. A Load instruction accesses the virtual page ations equivalent to a store to a byte, halfword, word, or and the R bit is updated in PTE B. doubleword, and are not necessarily performed as an 4. A Store instruction accesses the virtual page atomic read/modify/write of the affected bytes. and the C bit is updated in PTE B. These Reference and Change bit updates are not nec- 5. The virtual page is paged out. Software must essarily immediately visible to software. Executing a examine both PTE A and B to get the state of sync instruction ensures that all Reference and the R and C bits for the virtual page. Change bit updates associated with address transla- Furthermore, if in event 2, PTE A was not found, a tions that were performed, by the thread executing the Data Storage interrupt or Hypervisor Data Storage sync instruction, before the sync instruction is exe- interrupt may occur. Subsequently, if in event 3 or cuted will be performed with respect to that thread 4, PTE B was not found, a Data Storage interrupt or before the sync instruction's memory barrier is created. Hypervisor Data Storage interrupt may occur. 
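As a rough illustration of the two-PTE sequence in the Programming Note above (events 1 to 5), the following C sketch combines the R and C state over every PTE that maps one virtual page. The struct pte_status type and the function name are hypothetical and are not defined by the architecture; the sketch also assumes that software has already executed any required sync and followed the rules of Section 5.10 before reading the PTEs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical software view of one PTE: only the two status bits matter here. */
struct pte_status {
    bool r; /* Reference bit */
    bool c; /* Change bit */
};

/*
 * Combine the R and C bits over all PTEs that map one virtual page.
 * Hardware may have recorded R in one PTE and C in another, so the
 * page-level state is the OR over every mapping PTE.
 */
static inline struct pte_status page_rc_state(const struct pte_status *ptes,
                                              size_t n)
{
    struct pte_status page = { false, false };
    assert(ptes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        page.r = page.r || ptes[i].r;
        page.c = page.c || ptes[i].c;
    }
    return page;
}
```

With PTE A holding R=1, C=0 and PTE B holding R=1, C=1 (the state after event 4), examining only PTE A would miss the Change bit; the combined state reports both bits set.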
There are additional requirements for synchronizing Reference and Change bit updates in multi-threaded systems; see Section 5.10, "Page Table Update Synchronization Requirements" on page 818.

Programming Note
Because the sync instruction is execution synchronizing, the set of Reference and Change bit updates that are performed with respect to the thread executing the sync instruction before the memory barrier is created includes all Reference and Change bit updates associated with instructions preceding the sync instruction.

If software refers to a Page Table Entry when MSR_DR=1, the Reference and Change bits in the associated Page Table Entry are set as for ordinary loads and stores. See Section 5.10 for the rules software must follow when updating Reference and Change bits.

Figure 23 on page 789 summarizes the rules for setting the Reference and Change bits. The table applies to each atomic storage reference. It should be read from the top down; the first line matching a given situation applies. For example, if stwcx. fails due to both a storage protection violation and the lack of a reservation, the Change bit is not altered.

In the figure, the "Load-type" instructions are the Load instructions described in Books I, II, and III-S, eciwx, and the Cache Management instructions that are treated as Loads. The "Store-type" instructions are the Store instructions described in Books I, II, and III-S, ecowx, and the Cache Management instructions that are treated as Stores. The "ordinary" Load and Store instructions are those described in Books I, II, and III-S. "set" means "set to 1".
Status of Access                                          R        C
Indexed Move Assist insn w 0 len in XER                   No       No
Storage protection violation                              Acc(1)   No
Out-of-order I-fetch or Load-type insn or dcbtst          Acc      No
Out-of-order Store-type insn (except dcbtst)
    Would be required by the sequential execution
    model in the absence of system-caused or
    imprecise interrupts(3)                               Acc      Acc(1,2)
    All other cases                                       Acc      No
In-order Load-type or Store-type insn,
access not performed(4)
    Load-type insn                                        Acc      No
    Store-type insn                                       Acc      Acc(2)
Other in-order access
    I-fetch                                               Yes      No
    Ordinary Load, eciwx                                  Yes      No
    Other ordinary Store, ecowx, dcbz                     Yes      Yes
    icbi, dcbt, dcbtst, dcbst, dcbf[l]                    Acc      No

"Acc" means that it is acceptable to set the bit.
(1) It is preferable not to set the bit.
(2) If C is set, R is also set unless it is already set.
(3) For Floating-Point Enabled Exception type Program interrupts, "imprecise" refers to the exception mode controlled by MSR_{FE0 FE1}.
(4) This case does not apply to the Touch instructions, because they do not cause a storage access.

Figure 23. Setting the Reference and Change bits

5.7.9 Storage Protection

The storage protection mechanism provides a means for selectively granting instruction fetch access, granting read access, granting write access, and prohibiting access to areas of storage based on a number of control criteria.

The operation of the storage protection mechanism depends on the contents of one or more of the following.

- MSR bits HV, IR, DR, PR
- the key bits in the associated SLB entry
- the page protection bits and key bits in the associated PTE
- the AMR, AMOR, and UAMOR
- LPCR bits LPES_1 and VPM_0

The storage protection mechanism consists of the Virtual Page Class Key Protection mechanism, described in Section 5.7.9.1, and the Basic Storage Protection mechanism, described in Section 5.7.9.2 and Section 5.7.9.3.

When address translation is enabled for an access, the access is permitted if and only if the access is permitted by both the Virtual Page Class Key Protection mechanism and the Basic Storage Protection mechanism. When address translation is disabled for an access, the access is permitted if and only if the access is permitted by the Basic Storage Protection mechanism. If an instruction fetch is not permitted, an Instruction Storage exception or a Hypervisor Instruction Storage exception is generated. If a data access is not permitted, a Data Storage exception or a Hypervisor Data Storage exception is generated.

A protection domain is a maximal range of effective addresses for which variables related to storage protection can be independently specified (including by default, as in real and hypervisor real addressing modes), or a maximal range of addresses, effective or virtual, for which variables related to storage protection cannot be specified. Examples include: a segment, a virtual page (including for a virtualized Real Mode Area), the Real Mode Area (regardless of whether the RMA is virtualized), the effective address range 0:2^60-1 in hypervisor real addressing mode, and a maximal range of effective or virtual addresses that cannot be mapped to real addresses. A protection boundary is a boundary between protection domains.

5.7.9.1 Virtual Page Class Key Protection

The Virtual Page Class Key Protection mechanism provides the means to assign virtual pages to one of 32 classes, and to modify access permissions for each class quickly by modifying the Authority Mask Register (AMR), shown in Figure 24. The access permissions associated with the Virtual Page Class Key Protection mechanism apply only to data accesses, and only when address translation is enabled. The Virtual Page Class Key Protection mechanism has no effect on instruction fetches.

Programming Note
If address translation is disabled for a given access, the access is not affected by the Virtual Page Class Key Protection mechanism even if the access is made in virtual real addressing mode.

| Key0 | Key1 | Key2 | ... | Key29 | Key30 | Key31 |
0      2      4      6     58      60      62

Bits       Name    Description
0:1        Key0    Access mask for class number 0
2:3        Key1    Access mask for class number 1
...        ...     ...
2n:2n+1    Keyn    Access mask for class number n
...        ...     ...
62:63      Key31   Access mask for class number 31

Figure 24. Authority Mask Register (AMR)

The access mask for each class defines the access permissions that apply to loads and stores for which the virtual address is translated using a Page Table Entry that contains a KEY field value equal to the class number. The access permissions associated with each class are defined as follows, where AMR_{2n} and AMR_{2n+1} refer to the first and second bits of the access mask corresponding to class number n.

- A store is permitted if AMR_{2n}=0b0; otherwise the store is not permitted.
- A load is permitted if AMR_{2n+1}=0b0; otherwise the load is not permitted.

The AMR can be accessed using either SPR 13 or SPR 29. Access to the AMR using SPR 29 is privileged.

Programming Note
Because the AMR is part of the program context (if address translation is enabled), and because it is desirable for most application programmers not to have to understand the software synchronization requirements for context alterations (or the nuances of address translation and storage protection), operating systems should provide a system library program that application programs can use to modify the AMR.

The Authority Mask Override Register (AMOR) and the User Authority Mask Override Register (UAMOR), shown in Figure 25 and Figure 26 respectively, can be used to restrict modifications (mtspr) of the AMR. Also, the AMOR can be used to restrict modifications of the UAMOR. Access to both registers is privileged. The AMOR is a hypervisor resource.

| AMOR |
0      63

Figure 25. Authority Mask Override Register (AMOR)

| UAMOR |
0       63

Figure 26. User Authority Mask Override Register (UAMOR)

The bits of the AMOR and UAMOR are in 1-1 correspondence with the bits of the AMR (i.e., [U]AMOR_i corresponds to AMR_i). The AMOR affects modifications of the AMR and UAMOR in privileged but non-hypervisor state; the UAMOR affects modifications of the AMR in problem state.

- When mtspr specifying the AMR (using either SPR 13 or SPR 29) is executed in privileged but non-hypervisor state, the AMOR is used as a mask that controls which bits of the resulting AMR contents come from register RS and which AMR bits are not modified.
- Similarly, when mtspr specifying the AMR (using SPR 13) is executed in problem state, the UAMOR is used as a mask that controls which bits of the resulting AMR contents come from register RS and which AMR bits are not modified.
- When mtspr specifying the UAMOR is executed in privileged but non-hypervisor state, the AMOR is ANDed with the contents of register RS and the result is placed into the UAMOR; the AMOR thereby controls which bits of the resulting UAMOR contents come from register RS and which UAMOR bits are set to zero.

A complete description of these effects can be found in the description of the mtspr instruction on page 763.

Software must ensure that both bits of each even/odd bit pair of the AMOR contain the same value -- i.e., the contents of register RS for mtspr specifying the AMOR must be such that (RS)_{2n} = (RS)_{2n+1} for every n in the range 0:31 -- and likewise for the UAMOR. If this requirement is violated for the UAMOR the results of accessing the UAMOR (including implicitly by the hardware as described in the second item of the preceding list) are boundedly undefined; if the requirement is violated for the AMOR the results of accessing the AMOR (including implicitly by the hardware as described in the first and third items of the list) are undefined.

Programming Note
The preceding requirement permits designs to implement the AMOR and/or UAMOR as 32-bit registers -- specifically, to implement only the even-numbered bits (or only the odd-numbered bits) of the register -- in a manner such that the reduction, from the architecturally-required 64 bits to 32 bits, is not visible to (correct) software. This implementation technique saves space in the hardware. (A design that uses this technique does the appropriate "fan in/out" when the register is accessed, to provide the appearance, to (correct) software, of supporting all 64 bits of the register.)

Permitting designs to implement the [U]AMOR as 32-bit registers by virtue of the software requirement specified above, rather than by defining the [U]AMOR as 32-bit registers, permits the architecture to be extended in the future to support controlling modification of the "read access" AMR bits (the odd-numbered bits) independently from the "write access" AMR bits (the even-numbered bits), if that proves desirable. If this independent control does prove desirable, the only architecture change would be to eliminate the software requirement.

Programming Note
When modifying the AMOR and/or UAMOR, the hypervisor should ensure that the two registers are consistent with one another before giving control to a non-hypervisor program. In particular, the hypervisor should ensure that if AMOR_i=0 then UAMOR_i=0, for all i in the range 0:63. (Having AMOR_i=0 and UAMOR_i=1 would permit problem state programs, but not the operating system, to modify AMR bit i.)
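The AMR permission rules and the masking performed by mtspr, as described in this section, can be sketched in C as follows. This is only an illustrative software model, not hardware behavior or an operating system interface: the function names are hypothetical, registers are modeled as uint64_t values, and the helper reg_bit applies the Power ISA convention that bit 0 is the most significant bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i of a 64-bit register in Power ISA numbering (bit 0 = most significant). */
static inline unsigned reg_bit(uint64_t reg, unsigned i)
{
    assert(i < 64);
    return (unsigned)((reg >> (63u - i)) & 1u);
}

/* A store is permitted for class `key` if AMR bit 2*key is 0. */
static inline bool amr_store_permitted(uint64_t amr, unsigned key)
{
    assert(key < 32);
    return reg_bit(amr, 2u * key) == 0;
}

/* A load is permitted for class `key` if AMR bit 2*key+1 is 0. */
static inline bool amr_load_permitted(uint64_t amr, unsigned key)
{
    assert(key < 32);
    return reg_bit(amr, 2u * key + 1u) == 0;
}

/*
 * Model of mtspr to the AMR under a mask: bits selected by the mask come
 * from RS, the remaining AMR bits are not modified. The mask is the AMOR
 * in privileged non-hypervisor state and the UAMOR in problem state.
 */
static inline uint64_t amr_after_mtspr(uint64_t amr, uint64_t rs, uint64_t mask)
{
    return (rs & mask) | (amr & ~mask);
}

/* Model of mtspr to the UAMOR in privileged non-hypervisor state: RS AND AMOR. */
static inline uint64_t uamor_after_mtspr(uint64_t rs, uint64_t amor)
{
    return rs & amor;
}

/* Software requirement for AMOR/UAMOR values: bit 2n equals bit 2n+1 for all n. */
static inline bool pairs_consistent(uint64_t v)
{
    const uint64_t even = 0xAAAAAAAAAAAAAAAAull; /* ISA bits 0, 2, 4, ... */
    return ((v & even) >> 1) == (v & ~even);
}
```

For example, an AMR value with only bit 2 set denies stores but still permits loads for class 1, and a mask of 0x3000000000000000 lets an mtspr change only the two bits of class 1.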
Programming Note
The Virtual Page Class Key Protection mechanism replaces the Data Address Compare mechanism that was defined in versions of the architecture that precede Version 2.04 (e.g., the two facilities use some of the same resources, as described below). However, the Virtual Page Class Key Protection mechanism can be used to emulate the Data Address Compare mechanism. Moreover, programs that use the Data Address Compare mechanism can be modified in a manner such that they will work correctly both on implementations that comply with versions of the architecture that precede Version 2.04 (and hence implement the Data Address Compare mechanism) and on implementations that comply with Version 2.04 of the architecture or with any subsequent version (and hence instead implement the Virtual Page Class Key Protection mechanism). The technique takes advantage of the facts that the SPR number for privileged access to the AMR (29) is the same as the SPR number for the Data Address Compare mechanism's ACCR (Address Compare Control Register), that KEY_4 occupies the same bit in the PTE as the Data Address Compare mechanism's AC (Address Compare) bit, and that the definition of ACCR_{62:63} is very similar to the definition of each even-odd pair of AMR bits. The technique is as follows, where PTE1 refers to doubleword 1 of the PTE.

- Set bits 2:3 and 62:63 of SPR 29 (which is either the ACCR or the AMR) to x, where x is the desired 2-bit value for controlling Data Address Compare matches, and set bits 0:1 to 0s.
- Set PTE1_{54} (which is either the AC bit or KEY_4) to the same value that the AC bit would be set to, and set PTE1_{2:3} (which are either RPN bits, that correspond to a real address size larger than the size supported by any implementation that supports the Data Address Compare mechanism, or KEY_{0:1}) and PTE1_{52:53} (which are either reserved bits or KEY_{2:3}) to 0s.
- Use PTE_KEY values 0 and 1 only for purposes of emulating the Data Address Compare mechanism, except that PTE_KEY value 0 may also be used for any virtual pages for which it is desired that the Virtual Page Class Key Protection mechanism permit all accesses. Do not use PTE_KEY=31.
- When a Data Storage interrupt occurs, if DSISR_{42}=1 then ignore the interrupt for Cache Management instructions other than dcbz. (These instructions can cause a virtual page class key protection violation but cannot cause a Data Address Compare match.) Otherwise treat the interrupt as if a Data Address Compare match had occurred. (Note: Cases for which it is undefined whether a Data Address Compare match occurs do not necessarily cause a virtual page class key protection violation.)

(Because privileged software can access the AMR using either SPR 13 or SPR 29, it might seem that, when SPR 13 was added to the architecture (in Version 2.06), SPR 29 should have been removed. SPR 29 is retained for two reasons: first, to avoid requiring privileged software to change to use the newer SPR number; and second, to retain the ability to emulate the Data Address Compare mechanism as described above.)

Programming Note
An example of the use of the AMOR (and UAMOR) is to support lightweight partitions, here called "adjunct" partitions, that provide services (e.g., device drivers) to "client" partitions. The adjunct partition would be managed by the hypervisor. It would run in problem state with MSR_{HV PR}=0b11, thereby restricting the resources it can modify (MSR_PR=1) and causing its interrupts to go to the hypervisor (MSR_HV=1), and it would share a Page Table with the client partition it serves. Typically, each of the two partitions would have data storage that the other partition must not be able to access. The hypervisor can use the AMOR, UAMOR, AMR, and PTE KEY field to provide the required protection. (The adjunct partition's lightness of weight derives from not requiring an operating system, and especially from not requiring a full partition context switch (SLB flush, TLB flush, SDR1 change, etc.) when the client partition invokes the services of the adjunct partition.)

For example, suppose each of the two partitions must not be able to access any of the other partition's data storage. The hypervisor could use KEY value j for all data virtual pages that only the adjunct partition must be able to access. Before dispatching the client partition for the first time, the hypervisor would initialize the three registers as follows.

AMR:   all 0s except bits 2j and 2j+1, which would contain 1s
UAMOR: all 0s
AMOR:  all 1s except bits 2j and 2j+1, which would contain 0s

Before dispatching the adjunct partition, the hypervisor would set UAMOR to all 0s, and would set the AMR to all 1s except bits 2j and 2j+1, which would be set to 0s. (Because the adjunct partition would run in problem state, there is no need for the hypervisor to modify the AMOR, and the adjunct partition cannot modify the UAMOR.) In addition, the hypervisor would prevent the client partition from modifying or deleting PTEs that contain translations used by the adjunct partition.

Programming Note
Initialization of the UAMOR to all 0s, by the hypervisor before dispatching a partition for the first time, as described in the preceding Programming Note, permits operating systems (in partitions that run in a compatibility mode corresponding to Version 2.06 of the architecture or a subsequent version) to migrate gradually to supporting problem state access to the AMR -- specifically, to avoid having to be changed immediately to modify the UAMOR and to save the AMR contents when an interrupt occurs from problem state. Relatedly, having the UAMOR contain all 0s while an application program is running protects old application programs that are "AMR-unaware". In the absence of programming errors, such application programs would not attempt to read or modify the AMR. However, having the UAMOR contain all 0s protects such programs against modifying the AMR inadvertently. Permitting an "AMR-unaware" application program to modify the AMR (inadvertently) is potentially harmful for the obvious reasons. (The program might set to 1 an AMR bit corresponding to accesses that are necessary in order for the program to work correctly.) Moreover, even for an operating system that includes support for problem state modification of the AMR, having the UAMOR contain all 0s allows the operating system to avoid saving and restoring the AMR for "AMR-unaware" application programs. Such an operating system would provide a system service program that allows an application program to declare itself to be "AMR-aware" -- i.e., potentially to need to modify the AMR. When an application program invokes this service, the operating system would set the UAMOR to the non-zero value appropriate to the access authorities (load and/or store, for one or more key values) that the application program is allowed to modify, and thereafter would save and restore the AMR (and preserve the UAMOR) for this application program. (Having the UAMOR contain all 0s does not prevent an "AMR-unaware" program from reading the AMR, but inadvertent reading of the AMR is likely to be much less harmful than inadvertently modifying it.)
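For the adjunct-partition example in the Programming Note above, the register values for a given KEY value j can be computed as in the following C sketch. The struct and function names are hypothetical, registers are modeled as uint64_t values with Power ISA bit 0 as the most significant bit, and the AMOR value for the adjunct dispatch is simply passed through unchanged, since the note says the hypervisor need not modify it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three registers discussed in the note. */
struct key_regs {
    uint64_t amr;
    uint64_t uamor;
    uint64_t amor;
};

/* 1s in ISA bits 2j and 2j+1 (bit 0 is the most significant bit), 0s elsewhere. */
static inline uint64_t key_pair_mask(unsigned j)
{
    assert(j < 32);
    return 0x3ull << (62u - 2u * j);
}

/* Values set before the client partition is dispatched for the first time. */
static inline struct key_regs client_initial_regs(unsigned j)
{
    struct key_regs r;
    r.amr   = key_pair_mask(j);   /* all 0s except bits 2j, 2j+1 = 1s */
    r.uamor = 0;                  /* all 0s */
    r.amor  = ~key_pair_mask(j);  /* all 1s except bits 2j, 2j+1 = 0s */
    return r;
}

/* Values set before the adjunct partition is dispatched; AMOR is left as is. */
static inline struct key_regs adjunct_dispatch_regs(unsigned j, uint64_t current_amor)
{
    struct key_regs r;
    r.amr   = ~key_pair_mask(j);  /* all 1s except bits 2j, 2j+1 = 0s */
    r.uamor = 0;                  /* all 0s */
    r.amor  = current_amor;       /* not modified by the hypervisor here */
    return r;
}
```

With these values the client can neither load from nor store to class j pages, and because the AMOR bits for class j are 0 its operating system cannot change that through mtspr; the adjunct partition, in turn, can access only class j pages.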
(It may be desirable to avoid using KEY values 0, 1, and 31 for storage that only the adjunct partition can access, because these KEY values may be needed by the client partition to emulate the Data Address Compare mechanism, as described above. Also, old software, that was written for an implementation that complies with a version of the architecture that precedes Version 2.04 (the version in which virtual page class keys were added), effectively uses KEY 0 for all virtual pages.)

(For partitions that run in a compatibility mode corresponding to a version of the architecture that precedes Version 2.06, the PCR provides sufficient protection to application programs.)

5.7.9.2 Basic Storage Protection, Address Translation Enabled

When address translation is enabled, the Basic Storage Protection mechanism is controlled by the following.

- MSR_PR, which distinguishes between supervisor (privileged) state and problem state
- Ks and Kp, the supervisor (privileged) state and problem state storage key bits in the SLB entry used to translate the effective address
- PP, page protection bits 0:2 in the Page Table Entry used to translate the effective address
- For instruction fetches only:
  - the N (No-execute) value used for the access (see Sections 5.7.6.1 and 5.7.7.3)
  - PTE_G, the G (Guarded) bit in the Page Table Entry used to translate the effective address

Using the above values, the following rules are applied.

1. For an instruction fetch, the access is not permitted if the N value is 1 or if PTE_G=1.
2. For any access except an instruction fetch that is not permitted by rule 1, a "Key" value is computed using the following formula:

   Key ← (Kp & MSR_PR) | (Ks & ¬MSR_PR)

   Using the computed Key, Figure 27 is applied. An instruction fetch is permitted for any entry in the figure except "no access". A load is permitted for any entry except "no access". A store is permitted only for entries with "read/write".

Key   PP    Access Authority
0     000   read/write
0     001   read/write
0     010   read/write
0     011   read only
0     110   read only
1     000   no access
1     001   read only
1     010   read/write
1     011   read only
1     110   no access

All PP encodings not shown above are reserved. The results of using reserved PP encodings are boundedly undefined.

Figure 27. PP bit protection states, address translation enabled

5.7.9.3 Basic Storage Protection, Address Translation Disabled

When address translation is disabled, the Basic Storage Protection mechanism is controlled by the following (see Chapter 2 and Section 5.7.3, "Real And Virtual Real Addressing Modes").

- MSR_HV, which (when MSR_PR=0) distinguishes between hypervisor state and privileged but non-hypervisor state
- LPES_1, which controls whether storage accesses are permitted in real addressing mode
- VPM_0, which distinguishes between real addressing mode and virtual real addressing mode
- RMLS, which specifies the real mode limit value

Using the above values, the following rules are applied.

1. If MSR_HV=0 and LPES_1=1 and VPM_0=1, access authority is determined as described in Section 5.7.3.4.
2. If MSR_HV=1 or LPES_1=0 or VPM_0=0, Figure 28 is applied. The access is permitted for any entry in the figure except "no access".

LPES_1   HV   Access Authority
0        0    no access
0        1    read/write
1        0    read/write or no access(1)
1        1    read/write

(1) If the effective address for the access is less than the value specified by the RMLS, the access authority is read/write; otherwise the access is not permitted.

Figure 28. Protection states, address translation disabled

Programming Note
The comparison described in note 1 in Figure 28 ignores bits 0:3 of the effective address and may ignore bits 4:63-m; see Section 5.7.3.

5.8 Storage Control Attributes

This section describes aspects of the storage control attributes that are relevant only to privileged software programmers. The rest of the description of storage control attributes may be found in Section 1.6 of Book II and subsections.

5.8.1 Guarded Storage

Storage is said to be "well-behaved" if the corresponding real storage exists and is not defective, and if the effects of a single access to it are indistinguishable from the effects of multiple identical accesses to it. Data and instructions can be fetched out-of-order from well-behaved storage without causing undesired side effects.

Storage is said to be Guarded if any of the following conditions is satisfied.

- MSR bit IR or DR is 1 for instruction fetches or data accesses respectively, and the G bit is 1 in the relevant Page Table Entry.
- MSR bit IR or DR is 0 for instruction fetches or data accesses respectively, MSR_HV=1, and the storage is outside the range(s) specified by the Hypervisor Real Mode Storage Control facility (see Section 5.7.3.3.1).

In general, storage that is not well-behaved should be Guarded. Because such storage may represent a control register on an I/O device or may include locations that do not exist, an out-of-order access to such storage may cause an I/O device to perform unintended operations or may result in a Machine Check.

The following rules apply to in-order execution of Load and Store instructions for which the first byte of the storage operand is in storage that is both Caching Inhibited and Guarded.

- Load or Store instruction that causes an atomic access
  If any portion of the storage operand has been accessed and an External, Decrementer, Hypervisor Decrementer, or Imprecise mode Floating-Point Enabled exception is pending, the instruction completes before the interrupt occurs.
- Load or Store instruction that causes an Alignment exception, or that causes a [Hypervisor] Data Storage exception for reasons other than Data Address Breakpoint match.
  The portion of the storage operand that is in Caching Inhibited and Guarded storage is not accessed.

(The corresponding rules for instructions that cause a Data Address Breakpoint match are given in Section 8.1.2.)

5.8.1.1 Out-of-Order Accesses to Guarded Storage

In general, Guarded storage is not accessed out-of-order. The only exceptions to this rule are the following.

Load Instruction
If a copy of any byte of the storage operand is in a cache then that byte may be accessed in the cache or in main storage.

Instruction Fetch
If MSR_{HV IR}=0b10 then an instruction may be fetched if any of the following conditions are met.

1. The instruction is in a cache. In this case it may be fetched from the cache or from main storage.
2. The instruction is in a real page from which an instruction has previously been fetched, except that if that previous fetch was based on condition 1 then the previously fetched instruction must have been in the instruction cache.
3. The instruction is in the same real page as an instruction that is required by the sequential execution model, or is in the real page immediately following such a page.

Programming Note
Software should ensure that only well-behaved storage is copied into a cache, either by accessing as Caching Inhibited (and Guarded) all storage that may not be well-behaved, or by accessing such storage as not Caching Inhibited (but Guarded) and referring only to cache blocks that are well-behaved.

If a real page contains instructions that will be executed when MSR_IR=0 and MSR_HV=1, software should ensure that this real page and the next real page contain only well-behaved storage (or that the Hypervisor Real Mode Storage Control facility specifies that this real page is not Guarded).

5.8.2 Storage Control Bits

When address translation is enabled, each storage access is performed under the control of the Page Table Entry used to translate the effective address. Each Page Table Entry contains storage control bits that specify the presence or absence of the corresponding storage control for all accesses translated by the entry as shown in Figure 29.

... combination of the attributes normally identified using the WIMG bits. That combination would normally be indicated by WIMG = 0b0010.

References to Caching Inhibited storage (or storage with I=1) elsewhere in the Power ISA have no application to SAO storage or its WIMG encoding, despite the encoding using I=1. Conversely, references to storage that is not Caching Inhibited (or storage with I=0) apply to SAO storage or its WIMG encoding. References to Write Through Required storage (or storage with W=1) elsewhere in the Power ISA have no application to SAO storage or its WIMG encoding, despite the fact that the encoding uses W=1. Conversely, references to storage that is not Write Through Required (or storage with W=0) apply to SAO storage or its WIMG encoding.

Bit      Storage Control Attribute
W(1,3)   0 - not Write Through Required
         1 - Write Through Required
I(3)     0 - not Caching Inhibited
         1 - Caching Inhibited
M(2)     0 - not Memory Coherence Required
         1 - Memory Coherence Required
G        0 - not Guarded
         1 - Guarded

(1) Support for the 1 value of the W bit is optional. Implementations that do not support the 1 value treat the bit as reserved and assume its value to be 0.
2 If a given real page is accessed concurrently as SAO [Category: Memory Coherence] Support for the 0 storage and as non-SAO storage, the result may be value of the M bit is optional, implementations that characteristic of the weakly consistent model. do not support the 0 value assume the value of the bit to be 1, and may either preserve the value of the bit or write it as 1. 3 [Category: SAO] The combination WIMG = Programming Note 0b1110 has behavior unrelated to the meanings of If an application program requests both the Write the individual bits. See see Section 5.8.2.1, "Stor- Through Required and the Caching Inhibited age Control Bit Restrictions" for additional informa- attributes for a given storage location, the operating tion. system should set the I bit to 1 and the W bit to 0. Figure 29. Storage control bits For implementations that support the SAO cate- gory, the operating system should provide a means When address translation is enabled, instructions are by which application programs can request SAO not fetched from storage for which the G bit in the Page storage, in order to avoid confusion with the pre- Table Entry is set to 1; see Section 5.7.9. ceding guideline (since SAO is encoded using WI=0b11). When address translation is disabled, the storage con- trol attributes are implicit; see Section 5.7.3.3. At any given time, the value of the W bit must be the In Sections 5.8.2.1 and 5.8.2.2, "access" includes same for all accesses to a given real page. accesses that are performed out-of-order, and refer- ences to W, I, M, and G bits include the values of those At any given time, the value of the I bit must be the bits that are implied when address translation is dis- same for all accesses to a given real page. abled. 
5.8.2.2 Altering the Storage Control Programming Note Bits In a system consisting of only a single-threaded When changing the value of the W bit for a given real processor which has caches, correct coherent exe- page from 0 to 1, software must ensure that no thread cution does not require storage to be accessed as modifies any location in the page until after all copies of Memory Coherence Required, and accessing stor- locations in the page that are considered to be modified age as not Memory Coherence Required may give in the data caches have been copied to main storage better performance. using dcbst or dcbf[l]. When changing the value of the I bit for a given real 5.8.2.1 Storage Control Bit Restrictions page from 0 to 1, software must set the I bit to 1 and then flush all copies of locations in the page from the All combinations of W, I, M, and G values are permitted caches using dcbf[l] and icbi before permitting any except those for which both W and I are 1 and other accesses to the page. M||G 0b10. The combination WIMG = 0b1110 is used to identify the Strong Access Ordering (SAO) storage attribute (see Section 1.7.1, "Storage Access Ordering", in Book II). Because this attribute is not intended for general purpose programming, it is provided only for a single 794 Power ISATM Book III-S Version 2.06 Programming Note The storage control bit alterations described above are examples of cases in which the directives for application of statements about the W and I bits to SAO given in the third paragraph of the preceding subsection must be applied. A transition from the typical WIMG=0b0010 for ordinary storage to WIMG=0b1110 for SAO storage does not require the flush described above because both WIMG combinations indicate storage that is not Caching Inhibited. Programming Note It is recommended that dcbf be used, rather than dcbfl, when changing the value of the I or W bit from 0 to 1. 
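As an illustration of the restriction in Section 5.8.2.1 and of the W and I transitions in Section 5.8.2.2, the following Python sketch checks whether a WIMG combination is permitted and lists the cache actions called for when W or I changes from 0 to 1. The function names are hypothetical and chosen for this example only. The sketch treats the SAO encoding as W=0, I=0 (per the directives on applying W and I statements to SAO) and does not model the system-dependent requirements for changing the M bit.

```python
# Illustrative model only; these helpers are hypothetical, not architected interfaces.
SAO_WIMG = 0b1110  # Strong Access Ordering encoding of WIMG


def wimg_is_permitted(wimg):
    """W=1 together with I=1 is permitted only when M||G = 0b10 (the SAO encoding)."""
    w = (wimg >> 3) & 1
    i = (wimg >> 2) & 1
    mg = wimg & 0b11
    if w and i:
        return mg == 0b10
    return True


def effective_w_i(wimg):
    """Return (W, I) as rules stated elsewhere should read them; SAO counts as W=0, I=0."""
    if wimg == SAO_WIMG:
        return (0, 0)
    return ((wimg >> 3) & 1, (wimg >> 2) & 1)


def actions_for_change(old_wimg, new_wimg):
    """List the cache actions required when W or I goes from 0 to 1."""
    old_w, old_i = effective_w_i(old_wimg)
    new_w, new_i = effective_w_i(new_wimg)
    actions = []
    if old_w == 0 and new_w == 1:
        actions.append("copy modified data to main storage with dcbst or dcbf[l]")
    if old_i == 0 and new_i == 1:
        actions.append("set I=1, then flush the page with dcbf[l] and icbi")
    return actions
```

For example, `actions_for_change(0b0010, 0b1110)` returns an empty list, matching the statement that a transition from ordinary storage to SAO storage requires no flush.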
dcbf is preferred because dcbfl would have to be executed on all threads for which the contents of the data cache may be inconsistent with the new value of the bit, whereas, if the M bit for the page is 1, dcbf need be executed on only one thread in the system.

When changing the value of the M bit for a given real page, software must ensure that all data caches are consistent with main storage. The actions required to do this are system-dependent.

Programming Note
For example, when changing the M bit in some directory-based systems, software may be required to execute dcbf[l] on each thread to flush all storage locations accessed with the old M value before permitting the locations to be accessed with the new M value.

Additional requirements for changing the storage control bits in the Page Table are given in Section 5.10.

5.9 Storage Control Instructions

5.9.1 Cache Management Instructions

This section describes aspects of cache management that are relevant only to privileged software programmers.

For a dcbz instruction that causes the target block to be newly established in the data cache without being fetched from main storage, the hardware need not verify that the associated real address is valid. The existence of a data cache block that is associated with an invalid real address (see Section 5.6) can cause a delayed Machine Check interrupt or a delayed Checkstop.

Each implementation provides an efficient means by which software can ensure that all blocks that are considered to be modified in the data cache have been copied to main storage before the thread enters any power conserving mode in which data cache contents are not maintained.

5.9.2 Synchronize Instruction

The Synchronize instruction is described in Section 4.4.3 of Book II, but only at the level required by an application programmer (sync with L=0 or L=1). This section describes properties of the instruction that are relevant only to operating system and hypervisor software programmers. This variant of the Synchronize instruction is designated the Page Table Entry sync and is specified by the extended mnemonic ptesync (equivalent to sync with L=2).

The ptesync instruction has all of the properties of sync with L=0 and also the following additional properties.

- The memory barrier created by the ptesync instruction provides an ordering function for the storage accesses associated with all instructions that are executed by the thread executing the ptesync instruction and, as elements of set A, for all Reference and Change bit updates associated with additional address translations that were performed, by the thread executing the ptesync instruction, before the ptesync instruction is executed. The applicable pairs are all pairs a_i,b_j in which b_j is a data access and a_i is not an instruction fetch.
- The ptesync instruction causes all Reference and Change bit updates associated with address translations that were performed, by the thread executing the ptesync instruction, before the ptesync instruction is executed, to be performed with respect to that thread before the ptesync instruction's memory barrier is created.
- The ptesync instruction provides an ordering function for all stores to the Page Table caused by Store instructions preceding the ptesync instruction with respect to searches of the Page Table that are performed, by the thread executing the ptesync instruction, after the ptesync instruction completes. Executing a ptesync instruction ensures that all such stores will be performed, with respect to the thread executing the ptesync instruction, before any implicit accesses to the affected Page Table Entries, by such Page Table searches, are performed with respect to that thread.
- In conjunction with the tlbie and tlbsync instructions, the ptesync instruction provides an ordering function for TLB invalidations and related storage accesses on other threads as described in the tlbsync instruction description on page 817.

Programming Note
For instructions following a ptesync instruction, the memory barrier need not order implicit storage accesses for purposes of address translation and reference and change recording.

The functions performed by the ptesync instruction may take a significant amount of time to complete, so this form of the instruction should be used only if the functions listed above are needed. Otherwise sync with L=0 should be used (or sync with L=1, or eieio, if appropriate).

Section 5.10, "Page Table Update Synchronization Requirements" on page 818 gives examples of uses of ptesync.

5.9.3 Lookaside Buffer Management

All implementations have a Segment Lookaside Buffer (SLB). For performance reasons, most implementations also have implementation-specific lookaside information that is used in address translation. This lookaside information may be: a Translation Lookaside Buffer (TLB) which is a cache of recently used Page Table Entries (PTEs); a cache of recently used translations of effective addresses to real addresses; etc.; or any combination of these. Lookaside information, including the SLB, is managed using the instructions described in the subsections of this section.

Lookaside information derived from PTEs is not necessarily kept consistent with the Page Table. When software alters the contents of a PTE, in general it must also invalidate all corresponding implementation-specific lookaside information; exceptions to this rule are described in Section 5.10.1.2.

The effects of the slbie, slbia, and TLB Management instructions on address translations, as specified in Sections 5.9.3.1 and 5.9.3.3 for the SLB and TLB respectively, apply to all implementation-specific lookaside information that is used in address translation. Unless otherwise stated or obvious from context, references to SLB entry invalidation and TLB entry invalidation elsewhere in the Books apply also to all implementation-specific lookaside information that is derived from SLB entries and PTEs respectively.

The tlbia instruction is optional. However, all implementations provide a means by which software can invalidate all implementation-specific lookaside information that is derived from PTEs.

Implementation-specific lookaside information that contains translations of effective addresses to real addresses may include "translations" that apply in real addressing mode. Because such "translations" are affected by the contents of the LPCR, RMOR, and HRMOR, when software alters the contents of these registers it must also invalidate the corresponding implementation-specific lookaside information. Software can invalidate all such lookaside information by using the slbia instruction with IH=0b000. However, performance is likely to be better if other, appropriate, IH values are used to limit the amount of lookaside information that is invalidated.

All implementations that have such lookaside information provide a means by which software can invalidate all such lookaside information.

For simplicity, elsewhere in the Books it is assumed that the TLB exists.

Programming Note
Because the instructions used to manage implementation-specific lookaside information that is derived from PTEs may be changed in a future version of the architecture, it is recommended that software "encapsulate" uses of the TLB Management instructions into subroutines.

Programming Note
The function of all the instructions described in Sections 5.9.3.1 - 5.9.3.3 is independent of whether address translation is enabled or disabled.

For a discussion of software synchronization requirements when invalidating SLB and TLB entries, see Chapter 10.

5.9.3.1 SLB Management Instructions

Programming Note
Accesses to a given SLB entry caused by the instructions described in this section obey the sequential execution model with respect to the contents of the entry and with respect to data dependencies on those contents. That is, if an instruction sequence contains two or more of these instructions, when the sequence has completed, the final contents of the SLB entry and of General Purpose Registers are as if the instructions had been executed in program order.

However, software synchronization is required in order to ensure that any alterations of the entry take effect correctly with respect to address translation; see Chapter 10.

SLB Invalidate Entry X-form

slbie RB

31    ///   ///   RB    434   /
0     6     11    16    21    31

ea[0:35] ← (RB)[0:35]
if, for SLB entry that translates
    or most recently translated ea,
  entry_class = (RB)[36] and
  entry_seg_size = size specified in (RB)[37:38]
then for SLB entry (if any) that translates ea
  SLBE[V] ← 0
  all other fields of SLBE ← undefined
else
  s ← log_base_2(entry_seg_size)
  esid ← (RB)[0:63-s]
  u ← undefined 1-bit value
  if u then
    if an SLB entry translates esid
      SLBE[V] ← 0
      all other fields of SLBE ← undefined

Let the Effective Address (EA) be any EA for which EA[0:35] = (RB)[0:35]. Let the class be (RB)[36]. Let the segment size be equal to the segment size specified in (RB)[37:38]; the allowed values of (RB)[37:38], and the correspondence between the values and the segment size, are the same as for the B field in the SLBE (see Figure 17 on page 779).

The class value and segment size must be the same as the class value and segment size in the SLB entry that translates the EA, or the values that were in the SLB entry that most recently translated the EA if the translation is no longer in the SLB; if these values are not the same, it is implementation-dependent whether the SLB entry (or implementation-dependent translation information) that translates the EA is invalidated, and the next paragraph need not apply.

If the SLB contains only a single entry that translates the EA, then that is the only SLB entry that is invalidated, except that it is implementation-dependent whether an implementation-specific lookaside entry for a real mode address "translation" is invalidated. If the SLB contains more than one such entry, then zero or more such entries are invalidated, and similarly for any implementation-specific lookaside information used in address translation; additionally, a machine check may occur.

SLB entries are invalidated by setting the V bit in the entry to 0, and the remaining fields of the entry are set to undefined values.

The hardware ignores the contents of RB listed below and software must set them to 0s.

- (RB)[37]
- (RB)[39:63]
- If s = 40, (RB)[24:35]

If this instruction is executed in 32-bit mode, (RB)[0:31] must be zeros.

This instruction is privileged.

Special Registers Altered:
None

Programming Note
slbie does not affect SLBs on other threads.

Programming Note
The reason the class value specified by slbie must be the same as the Class value that is or was in the relevant SLB entry is that the hardware may use these values to optimize invalidation of implementation-specific lookaside information used in address translation. If the value specified by slbie differs from the value that is or was in the relevant SLB entry, these optimizations may produce incorrect results. (An example of implementation-specific address translation lookaside information is the set of recently used translations of effective addresses to real addresses that some implementations maintain in an Effective to Real Address Translation (ERAT) lookaside buffer.)

When switching tasks in certain cases, it may be advantageous to preserve some implementation-specific lookaside entries while invalidating others. The IH=0b001 invalidation hint of the slbia instruction can be used for this purpose if SLB class values are appropriately assigned, i.e. a class value of 0 gives the hint that the entry should be preserved and a class value of 1 indicates the entry must be invalidated. Also, it is advantageous to assign a class value of 1 to entries that need to be invalidated via an slbie instruction while preserving implementation-specific lookaside entries that are not derived from an SLB entry since such entries are assigned a class value of 0.

The Move To Segment Register instructions (see Section 5.9.3.2.1) create SLB entries in which the Class value is 0.

Programming Note
The B value in register RB may be needed for invalidating ERAT entries corresponding to the translation being invalidated.

SLB Invalidate All X-form

slbia IH

31    //    IH    ///   ///   498   /
0     6     8     11    16    21    31

for each SLB entry except SLB entry 0
  SLBE[V] ← 0
  all other fields of SLBE ← undefined

For all SLB entries except SLB entry 0, the V bit in the entry is set to 0, making the entry invalid, and the remaining fields of the entry are set to undefined values. SLB entry 0 is not altered.

On implementations that have implementation-specific lookaside information for effective to real address translations, the IH field provides a hint that can be used to invalidate entries selectively in such lookaside information. The defined values for IH are as follows.

0b000   All such implementation-specific lookaside information is invalidated. (This value is not a hint.)
0b001   Preserve such implementation-specific lookaside information having a Class value of 0.
0b010   Preserve such implementation-specific lookaside information created when MSR[IR/DR]=0.
0b110   Preserve such implementation-specific lookaside information created when MSR[HV]=1, MSR[PR]=0, and MSR[IR/DR]=0.

All other IH values are reserved. If the IH field contains a reserved value, the hint provided by the IH field is undefined.

Implementation-specific lookaside information for which preservation is not requested is invalidated. Implementation-specific lookaside information for which preservation is requested may be invalidated.

When IH=0b000, execution of this instruction has the side effect of clearing the storage access history associated with the Hypervisor Real Mode Storage Control facility. See Section 5.7.3.3.1, "Hypervisor Real Mode Storage Control" for more details.

This instruction is privileged.

Special Registers Altered:
None

Programming Note
slbia does not affect SLBs on other threads.

Programming Note
If slbia is executed when instruction address translation is enabled, software can ensure that attempting to fetch the instruction following the slbia does not cause an Instruction Segment interrupt by placing the slbia and the subsequent instruction in the effective segment mapped by SLB entry 0. (The preceding assumes that no other interrupts occur between executing the slbia and executing the subsequent instruction.)

Programming Note
The defined values for IH are as follows.

0b000   All ERAT entries are invalidated. (This value is not a hint.) This value should be used by the hypervisor when relocating itself (i.e. when modifying the HRMOR) or when reconfiguring real storage.
0b001   Preserve ERAT entries with a Class value of 0. This value should be used by an operating system when switching tasks in certain cases; for example, if SLBE[C]=0 is used for SLB translations shared between the tasks.
0b010   Preserve ERAT entries created when MSR[IR/DR]=0. This value should generally be used by an operating system when switching tasks.
0b110   Preserve ERAT entries created when MSR[HV]=1 and MSR[IR/DR]=0. This value should be used by the hypervisor when switching partitions.

Programming Note
slbia serves as both a basic and an extended mnemonic. The Assembler will recognize an slbia mnemonic with one operand as the basic form, and an slbia mnemonic with no operand as the extended form. In the extended form the IH operand is omitted and assumed to be 0.

SLB Move To Entry X-form

slbmte RS,RB

31    RS    ///   RB    402   /
0     6     11    16    21    31

The SLB entry specified by bits 52:63 of register RB is loaded from register RS and from the remainder of register RB. The contents of these registers are interpreted as shown in Figure 30.

RS
B     VSID    Ks Kp N L C    0     LP    0s
0     2       52             57    58    60   63

RB
ESID    V     0s    index
0       36    37    52      63

RS[0:1]     B
RS[2:51]    VSID
RS[52]      Ks
RS[53]      Kp
RS[54]      N
RS[55]      L
RS[56]      C
RS[57]      must be 0b0
RS[58:59]   LP
RS[60:63]   must be 0b0000
RB[0:35]    ESID
RB[36]      V
RB[37:51]   must be 0b000 || 0x000
RB[52:63]   index, which selects the SLB entry

Figure 30. GPR contents for slbmte

On implementations that support a virtual address size of only n bits, n<78, (RS)[0:77-n] must be zeros.

(RS)[57] and (RS)[60:63] are ignored by the hardware.

High-order bits of (RB)[52:63] that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros.

If this instruction is executed in 32-bit mode, (RB)[0:31] must be zeros (i.e., the ESID must be in the range 0:15).

This instruction cannot be used to invalidate an SLB entry.

This instruction is privileged.

Special Registers Altered:
None

Programming Note
The reason slbmte cannot be used to invalidate an SLB entry is that it does not necessarily affect implementation-specific address translation lookaside information. slbie (or slbia) must be used for this purpose.

SLB Move From Entry VSID X-form

slbmfev RT,RB

31    RT    ///   RB    851   /
0     6     11    16    21    31

If the SLB entry specified by bits 52:63 of register RB is valid (V=1), the contents of the B, VSID, Ks, Kp, N, L, C, and LP fields of the entry are placed into register RT.
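The register layout of Figure 30, which slbmfev also uses when returning the entry in RT, can be sketched in Python as follows. This is a minimal illustration, assuming the big-endian bit numbering used throughout the Books (bit 0 is the most significant bit of a 64-bit register). The helper names `put_bits`, `get_bits`, `pack_slbmte`, and `unpack_slbmfev` are hypothetical and exist only for this example.

```python
# Illustrative helpers only; names are hypothetical, not architected interfaces.

def put_bits(value, first, last):
    """Place value into bits first:last of a 64-bit image (bit 0 = most significant)."""
    width = last - first + 1
    if value < 0 or value >> width:
        raise ValueError(f"value {value:#x} does not fit in bits {first}:{last}")
    return value << (63 - last)


def get_bits(reg, first, last):
    """Extract bits first:last of a 64-bit image (bit 0 = most significant)."""
    width = last - first + 1
    return (reg >> (63 - last)) & ((1 << width) - 1)


def pack_slbmte(esid, vsid, index, b=0, ks=0, kp=0, n=0, l=0, c=0, lp=0, v=1):
    """Build the (RS, RB) operand pair for slbmte from the fields of Figure 30."""
    rs = (put_bits(b, 0, 1) | put_bits(vsid, 2, 51) | put_bits(ks, 52, 52)
          | put_bits(kp, 53, 53) | put_bits(n, 54, 54) | put_bits(l, 55, 55)
          | put_bits(c, 56, 56) | put_bits(lp, 58, 59))   # bits 57 and 60:63 stay 0
    rb = (put_bits(esid, 0, 35) | put_bits(v, 36, 36)
          | put_bits(index, 52, 63))                       # bits 37:51 stay 0
    return rs, rb


def unpack_slbmfev(rt):
    """Split an RT value returned by slbmfev into its named fields."""
    return {
        "B": get_bits(rt, 0, 1), "VSID": get_bits(rt, 2, 51),
        "Ks": get_bits(rt, 52, 52), "Kp": get_bits(rt, 53, 53),
        "N": get_bits(rt, 54, 54), "L": get_bits(rt, 55, 55),
        "C": get_bits(rt, 56, 56), "LP": get_bits(rt, 58, 59),
    }
```

Because the reserved positions are never written, the packed values satisfy the "must be 0" requirements of Figure 30 by construction. The sketch does not check implementation limits such as the supported virtual address size or SLB size.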
The contents of these registers are interpreted as shown in Figure 31.

RT
B     VSID    Ks Kp N L C    0     LP    0s
0     2       52             57    58    60   63

RB
0s    index
0     52      63

RT[0:1]     B
RT[2:51]    VSID
RT[52]      Ks
RT[53]      Kp
RT[54]      N
RT[55]      L
RT[56]      C
RT[57]      set to 0b0
RT[58:59]   LP
RT[60:63]   set to 0b0000
RB[0:51]    must be 0x0_0000_0000_0000
RB[52:63]   index, which selects the SLB entry

Figure 31. GPR contents for slbmfev

On implementations that support a virtual address size of only n bits, n<78, RT[0:77-n] are set to zeros.

If the SLB entry specified by bits 52:63 of register RB is invalid (V=0), the contents of register RT are set to 0.

High-order bits of (RB)[52:63] that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros.

This instruction is privileged.

Special Registers Altered:
None

SLB Move From Entry ESID X-form

slbmfee RT,RB

31    RT    ///   RB    915   /
0     6     11    16    21    31

If the SLB entry specified by bits 52:63 of register RB is valid (V=1), the contents of the ESID and V fields of the entry are placed into register RT. The contents of these registers are interpreted as shown in Figure 32.

RT
ESID    V     0s
0       36    37    63

RB
0s    index
0     52      63

RT[0:35]    ESID
RT[36]      V
RT[37:63]   set to 0b000 || 0x00_0000
RB[0:51]    must be 0x0_0000_0000_0000
RB[52:63]   index, which selects the SLB entry

Figure 32. GPR contents for slbmfee

If the SLB entry specified by bits 52:63 of register RB is invalid (V=0), the contents of register RT are set to 0.

High-order bits of (RB)[52:63] that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros.

This instruction is privileged.

Special Registers Altered:
None

SLB Find Entry ESID X-form

slbfee. RT,RB

31    RT    ///   RB    979   1
0     6     11    16    21    31

The SLB is searched for an entry that matches the effective address specified by register RB. The search is performed as if it were being performed for purposes of address translation. E.g., in order for a given entry to satisfy the search, the entry must be valid (V=1), and (RB)[0:63-s] must equal SLBE[ESID[0:63-s]] (where 2^s is the segment size selected by the B field in the entry). If exactly one matching entry is found, the contents of the B, VSID, Ks, Kp, N, L, C, and LP fields of the entry are placed into register RT. If no matching entry is found, register RT is set to 0. If more than one matching entry is found, either one of the matching entries is used, as if it were the only matching entry, or a Machine Check occurs. If a Machine Check occurs, register RT and CR Field 0 are set to undefined values, and the description below of how this register and this field are set does not apply.

The contents of registers RT and RB are interpreted as shown in Figure 33.

RT
B     VSID    Ks Kp N L C    0     LP    0s
0     2       52             57    58    60   63

RB
ESID    0s
0       40    63

RT[0:1]     B
RT[2:51]    VSID
RT[52]      Ks
RT[53]      Kp
RT[54]      N
RT[55]      L
RT[56]      C
RT[57]      set to 0b0
RT[58:59]   LP
RT[60:63]   set to 0b0000
RB[0:35]    ESID
RB[36:63]   must be 0x0000000

Figure 33. GPR contents for slbfee.

If s > 28, RT[80-s:51] are set to zeros. On implementations that support a virtual address size of only n bits, n < 78, RT[2:79-n] are set to zeros.

CR Field 0 is set as follows. j is a 1-bit value that is equal to 0b1 if a matching entry was found. Otherwise, j is 0b0.

CR0[LT GT EQ SO] = 0b00 || j || XER[SO]

If this instruction is executed in 32-bit mode, (RB)[0:31] must be zeros (i.e., the ESID must be in the range 0-15).

This instruction is privileged.

Special Registers Altered:
CR0

5.9.3.2 Bridge to SLB Architecture [Category:Server.Phased-Out]

The facility described in this section can be used to ease the transition to the current Power ISA software-managed Segment Lookaside Buffer (SLB) architecture, from the Segment Register architecture provided by 32-bit PowerPC implementations. A complete description of the Segment Register architecture may be found in "Segmented Address Translation, 32-Bit Implementations," Section 4.5, Book III of Version 1.10 of the PowerPC architecture, referenced in the introduction to this architecture.

The facility permits the operating system to continue to use the 32-bit PowerPC implementation's Segment Register Manipulation instructions.

5.9.3.2.1 Segment Register Manipulation Instructions

The instructions described in this section -- mtsr, mtsrin, mfsr, and mfsrin -- allow software to associate effective segments 0 through 15 with any of virtual segments 0 through 2^27-1. SLB entries 0:15 serve as virtual Segment Registers, with SLB entry i used to emulate Segment Register i. The mtsr and mtsrin instructions move 32 bits from a selected GPR to a selected SLB entry. The mfsr and mfsrin instructions move 32 bits from a selected SLB entry to a selected GPR.

The contents of the GPRs used by the instructions described in this section are shown in Figure 34. Fields shown as zeros must be zero for the Move To Segment Register instructions. Fields shown as hyphens are ignored. Fields shown as periods are ignored by the Move To Segment Register instructions and set to zero by the Move From Segment Register instructions. Fields shown as colons are ignored by the Move To Segment Register instructions and set to undefined values by the Move From Segment Register instructions.

RS/RT
:::   .     Ks Kp N    0     VSID[23:49]
0     32    33         36    37            63

RB
---   ESID   ---
0     32     36    63

Figure 34. GPR contents for mtsr, mtsrin, mfsr, and mfsrin

Programming Note
The "Segment Register" format used by the instructions described in this section corresponds to the low-order 32 bits of RS and RT shown in the figure. This format is essentially the same as that for the Segment Registers of 32-bit PowerPC implementations. The only differences are the following.

- Bit 36 corresponds to a reserved bit in Segment Registers. Software must supply 0 for the bit because it corresponds to the L bit in SLB entries, and large pages are not supported for SLB entries created by the Move To Segment Register instructions.
- VSID bits 23:25 correspond to reserved bits in Segment Registers. Software can use these extra VSID bits to create VSIDs that are larger than those supported by the Segment Register Manipulation instructions of 32-bit PowerPC implementations.

Bit 32 of RS and RT corresponds to the T (direct-store) bit of early 32-bit PowerPC implementations. No corresponding bit exists in SLB entries.

Programming Note
The Programming Note in the introduction to Section 5.9.3.1 applies also to the Segment Register Manipulation instructions described in this section, and to any combination of the instructions described in the two sections, except as specified below for mfsr and mfsrin.

The requirement that the SLB contain at most one entry that translates a given effective address (see Section 5.7.6.1) applies to SLB entries created by mtsr and mtsrin. This requirement is satisfied naturally if only mtsr and mtsrin are used to create SLB entries for a given ESID, because for these instructions the association between SLB entries and ESID values is fixed (SLB entry i is used for ESID i). However, care must be taken if slbmte is also used to create SLB entries for the ESID, because for slbmte the association between SLB entries and ESID values is specified by software.

Move To Segment Register X-form

mtsr SR,RS

31    RS    /     SR    ///   210   /
0     6     11    12    16    21    31

The SLB entry specified by SR is loaded from register RS, as follows.

SLBE Bit(s)   Set to             SLB Field(s)
0:31          0x0000_0000        ESID[0:31]
32:35         SR                 ESID[32:35]
36            0b1                V
37:38         0b00               B
39:61         0b000||0x0_0000    VSID[0:22]
62:88         (RS)[37:63]        VSID[23:49]
89:91         (RS)[33:35]        KsKpN
92            (RS)[36]           L ((RS)[36] must be 0b0)
93            0b0                C
94            0b0                reserved
95:96         0b00               LP

MSR[SF] must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered:
None

Move To Segment Register Indirect X-form

mtsrin RS,RB

31    RS    ///   RB    242   /
0     6     11    16    21    31

The SLB entry specified by (RB)[32:35] is loaded from register RS, as follows.

SLBE Bit(s)   Set to             SLB Field(s)
0:31          0x0000_0000        ESID[0:31]
32:35         (RB)[32:35]        ESID[32:35]
36            0b1                V
37:38         0b00               B
39:61         0b000||0x0_0000    VSID[0:22]
62:88         (RS)[37:63]        VSID[23:49]
89:91         (RS)[33:35]        KsKpN
92            (RS)[36]           L ((RS)[36] must be 0b0)
93            0b0                C
94            0b0                reserved
95:96         0b00               LP

MSR[SF] must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered:
None

Move From Segment Register X-form

mfsr RT,SR

31    RT    /     SR    ///   595   /
0     6     11    12    16    21    31

The contents of the low-order 27 bits of the VSID field and the contents of the Ks, Kp, N, and L fields of the SLB entry specified by SR are placed into register RT as follows.

SLBE Bit(s)   Copied to    SLB Field(s)
62:88         RT[37:63]    VSID[23:49]
89:91         RT[33:35]    KsKpN
92            RT[36]       L (SLBE[L] must be 0b0)

RT[32] is set to 0. The contents of RT[0:31] are undefined.

MSR[SF] must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction must be used only to read an SLB entry that was, or could have been, created by mtsr or mtsrin and has not subsequently been invalidated (i.e., an SLB entry in which ESID<16, V=1, VSID<2^27, L=0, and C=0). If the SLB entry is invalid (V=0), RT[33:63] are set to 0. Otherwise the contents of register RT are undefined.

This instruction is privileged.

Special Registers Altered:
None

Move From Segment Register Indirect X-form

mfsrin RT,RB

31    RT    ///   RB    659   /
0     6     11    16    21    31

The contents of the low-order 27 bits of the VSID field and the contents of the Ks, Kp, N, and L fields of the SLB entry specified by (RB)[32:35] are placed into register RT as follows.

SLBE Bit(s)   Copied to    SLB Field(s)
62:88         RT[37:63]    VSID[23:49]
89:91         RT[33:35]    KsKpN
92            RT[36]       L (SLBE[L] must be 0b0)

RT[32] is set to 0. The contents of RT[0:31] are undefined.

MSR[SF] must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction must be used only to read an SLB entry that was, or could have been, created by mtsr or mtsrin and has not subsequently been invalidated (i.e., an SLB entry in which ESID<16, V=1, VSID<2^27, L=0, and C=0). If the SLB entry is invalid (V=0), RT[33:63] are set to 0. Otherwise the contents of register RT are undefined.

This instruction is privileged.

Special Registers Altered:
None

5.9.3.3 TLB Management Instructions

TLB Invalidate Entry X-form

tlbie RB,RS

31    RS    ///   RB    306   /
0     6     11    16    21    31

RB if L=1:
AVA   LP    0s    B     AVAL  L
0     44    52    54    56    63

(RS)[32:63] contains an LPID value. The supported (RS)[32:63] values are the same as the LPID values supported in LPIDR. (RS)[0:31] must contain zeros and are ignored by the hardware.

If the L field in RB contains 0, the base page size is 4 KB and RB[56:58] (AP - Actual Page size field) must be set to the SLBE[L||LP] encoding for the page size corresponding to the actual page size specified by the PTE that was used to create the TLB entry to be invalidated. Thus, b is equal to 12 and p is equal to log2 (actual page size specified by (RB)[56:58]). The Abbreviated Virtual Address (AVA) field in register RB must contain bits 14:65 of the virtual address translated by the TLB entry to be invalidated. Variable i is equal to 51.

L ← (RB)[63]
if L = 0
then
  base_pg_size = 4K
  actual_pg_size =
    page size specified in (RB)[56:58]
  i = 51
else
  base_pg_size =
    base page size specified in (RB)[44:51]
  actual_pg_size =
actual page size specified in (RB)44:51 i = max(min(43,63-b),63-p) If the L field in RB contains 1, the following rules apply. b log_base_2(base_pg_size) The base page size and actual page size are spec- p log_base_2(actual_pg_size) ified in the LP field in register RB, where the rela- sg_size segment size specified in (RB)54:55 tionship between (RB)44:51 (LP - Large Page size for each thread selector field) and the base page size and actual for each TLB entry page size is the same as the relationship between if (entry_VA14:i+14 = (RB)0:i) & PTELP and the base page size and actual page (entry_sg_size = sg_size) & size, except for the "r" bits (see Section 5.7.7.1 on (entry_base_pg_size = base_pg_size) & page 782 and Figure 21 on page 783). Thus, b is (entry_actual_pg_size = actual_pg_size) & equal to log2 (base page size specified by ( ( TLBEs contain LPID & (TLBELPID = (RS)32:63) ) | (RB)44:51) and p is equal to log2 (actual page size ( TLBEs do not contain LPID & specified by (RB)44:51). Specifically, (RB)44+c:51 (LPIDRLPID = (RS)32:63) ) ) must be equal to the contents of bits c:7 of the LP then field of the PTE that was used to create the TLB if ((L = 0)|(b 20)) entry to be invalidated, where c is the maximum of then 0 and (20-p). TLB entry invalid Variable i is the larger of (63-p) and the value that else is the smaller of 43 and (63-b). (RB)0:i must con- if (entry_VA58:77-b = (RB)56:75-b) tain bits 14:(i+14) of the virtual address translated then TLB entry invalid by the TLB to be invalidated. If b>20, RB64-b:43 may contain any value and are ignored by the The operation performed by this instruction is based on hardware. the contents of registers RS and RB. The contents of If b<20, (RB)56:75-b must contain bits 58:77-b of these registers are shown below, where L is (RB)63. the virtual address translated by the TLB to be invalidated, and other bits in (RB)56:62 may contain RS: any value and are ignored by the hardware. 
If b20, (RB)56:62 (AVAL - Abbreviated Virtual 0s LPID Address, Lower) may contain any value and are 0 32 63 ignored by the hardware. Let the segment size be equal to the segment size RB if L=0: specified in (RB)54:55 (B field). The contents of RB54:55 must be the same as the contents of the B field of the AVA 0s B AP 0s L PTE that was used to create the TLB entry to be invali- 0 52 54 56 59 63 dated. RB52:53 and RB59:62 (when (RB)63 = 0) must contain zeros and are ignored by the hardware. 806 Power ISATM Book III-S Version 2.06 All TLB entries on all threads that have all of the follow- Programming Note ing properties are made invalid. The entry translates a virtual address for which all For tlbie[l] instructions in which (RB)63=0, the AP the following are true. value in RB is provided to make it easier for the VA14:14+i is equal to (RB)0:i. hardware to locate address translations, in looka- side buffers, corresponding to the address transla- L=0 or b20 or, if L=1 and b<20, tion being invalidated. VA58:77-b is equal to (RB)56:75-b. The segment size of the entry is the same as the For tlbie[l] instructions the AP specification is not segment size specified in (RB)54:55. binary compatible with versions of the architecture Either of the following is true: that precede Version 2.06. As an example, for an The L field in RB is 0, the base page size of actual page size of 64 KB AP=0b101, whereas the entry is 4 KB, and the actual page size of software written for an implementation that com- plies with a version of the architecture that pre- the entry matches the actual page size speci- cedes V. 2.06 would have AP=100 since AP was a fied in (RB)56:58. 1 bit value followed by 0s in RB57:58. 
If binary com- The L field in RB is 1, the base page size of patibility is important, for a 64 KB page software the entry matches the base page size speci- can use AP=0b101 on these earlier implementa- fied in (RB)44:51, and the actual page size of tions since these implementations were required to the entry matches the actual page size speci- ignore RB57:58. fied in (RB)44:51. Either of the following is true: The implementation's TLB entries contain Programming Note LPID values and TLBELPID = (RS)32:63. For tlbie[l] instructions the AVA and AVAL fields in The implementation's TLB entries do not con- RB contain different VA bits from those in PTEAVA. tain LPID values, and LPIDRLPID = (RS)32:63. The LPIDR used for this comparison is in the same thread as the TLB entry being tested. If the implementation's TLB entries contain LPID val- ues, additional TLB entries may also be made invalid if those TLB entries contain an LPID that matches (RS)32:63. If the implementation's TLB entries do not contain LPID values, additional TLB entries may also be made invalid on any thread that is in the partition specified by (RS)32:63. MSRSF must be 1 when this instruction is executed; otherwise the results are undefined. If the value specified in RS32:63, RB54:55, RB56:58 (when RB63=0), or RB44:51 (when RB63=1) is not sup- ported by the implementation, the instruction is treated as if the instruction form were invalid. The operation performed by this instruction is ordered by the eieio (or sync or ptesync) instruction with respect to a subsequent tlbsync instruction executed by the thread executing the tlbie instruction. The opera- tions caused by tlbie and tlbsync are ordered by eieio as a fourth set of operations, which is independent of the other three sets that eieio orders. This instruction is hypervisor privileged. See Section 5.10, "Page Table Update Synchronization Requirements" for a description of other requirements associated with the use of this instruction. 
Special Registers Altered:
    None

TLB Invalidate Entry Local X-form

tlbiel RB

    | 31 | /// | /// | RB | 274 | /  |
    0    6     11    16   21    31

    IS ← (RB)[52:53]
    switch(IS)
      case (0b00):
        L ← (RB)[63]
        if L = 0
        then
            base_pg_size = 4K
            actual_pg_size = page size specified in (RB)[56:58]
            i = 51
        else
            base_pg_size = base page size specified in (RB)[44:51]
            actual_pg_size = actual page size specified in (RB)[44:51]
            i = max(min(43,63-b),63-p)
        b ← log_base_2(base_pg_size)
        p ← log_base_2(actual_pg_size)
        sg_size ← segment size specified in (RB)[54:55]
        for each TLB entry
            if (entry_VA[14:i+14] = (RB)[0:i]) &
               (entry_sg_size = sg_size) &
               (entry_base_pg_size = base_pg_size) &
               (entry_actual_pg_size = actual_pg_size) &
               (TLBEs do not contain LPID |
                (TLBEs contain LPID & (TLBE[LPID]=LPIDR[LPID])))
            then
                if ((L = 0)|(b ≥ 20))
                then
                    TLB entry ← invalid
                else
                    if (entry_VA[58:77-b] = (RB)[56:75-b])
                    then
                        TLB entry ← invalid
      case (0b10):
        i ← implementation-dependent number, 40≤i≤51
        for each TLB entry in set (RB)[i:51]
            if (TLBEs do not contain LPID |
                (TLBEs contain LPID & (TLBE[LPID]=LPIDR[LPID])))
            then TLB entry ← invalid
      case (0b11):
        i ← implementation-dependent number, 40≤i≤51
        if MSR[HV] then
            TLBEs in set (RB)[i:51] ← invalid
        else
            for each TLB entry in set (RB)[i:51]
                if (TLBEs do not contain LPID |
                    (TLBEs contain LPID & (TLBE[LPID]=LPIDR[LPID])))
                then TLB entry ← invalid

The operation performed by this instruction is based on the contents of register RB. The contents of RB are shown below, where IS is (RB)[52:53] and L is (RB)[63].

    IS=0b00 and L=0:
        | AVA        | IS | B  | AP | 0s | L  |
        0            52   54   56   59   63

    IS=0b00 and L=1:
        | AVA   | LP | IS | B  | AVAL | L  |
        0       44   52   54   56     63

    IS=0b10 or 0b11:
        | 0s    | SET | IS | 0s         |
        0       40    52   54           63

The Invalidation Selector (IS) field in RB has three defined values (0b00, 0b10, and 0b11). The IS value of 0b01 is reserved and is treated in the same manner as the corresponding case for instruction fields (see Section 1.3.3, "Reserved Fields and Reserved Values" on page 6 in Book I).

IS field in RB contains 0b00

    If the L field in RB contains 0, the base page size is 4 KB and RB[56:58] (AP - Actual Page size field) must be set to the SLBE[L]||LP encoding for the page size corresponding to the actual page size specified by the PTE that was used to create the TLB entry to be invalidated. Thus, b is equal to 12 and p is equal to log2 (actual page size specified by (RB)[56:58]). The Abbreviated Virtual Address (AVA) field in register RB must contain bits 14:65 of the virtual address translated by the TLB entry to be invalidated. Variable i is equal to 51.

    If the L field in RB contains 1, the following rules apply.

      - The base page size and actual page size are specified in the LP field in register RB, where the relationship between (RB)[44:51] (LP - Large Page size selector field) and the base page size and actual page size is the same as the relationship between PTE[LP] and the base page size and actual page size, except for the "r" bits (see Section 5.7.7.1 on page 782 and Figure 21 on page 783). Thus, b is equal to log2 (base page size specified by (RB)[44:51]) and p is equal to log2 (actual page size specified by (RB)[44:51]). Specifically, (RB)[44+c:51] must be equal to the contents of bits c:7 of the LP field of the PTE that was used to create the TLB entry to be invalidated, where c is the maximum of 0 and (20-p).
      - Variable i is the larger of (63-p) and the value that is the smaller of 43 and (63-b). (RB)[0:i] must contain bits 14:(i+14) of the virtual address translated by the TLB to be invalidated. If b>20, RB[64-b:43] may contain any value and are ignored by the hardware.
      - If b<20, (RB)[56:75-b] must contain bits 58:77-b of the virtual address translated by the TLB to be invalidated, and other bits in (RB)[56:62] may contain any value and are ignored by the hardware.
      - If b≥20, (RB)[56:62] (AVAL - Abbreviated Virtual Address, Lower) may contain any value and are ignored by the hardware.

    Let the segment size be equal to the segment size specified in (RB)[54:55] (B field). The contents of RB[54:55] must be the same as the contents of the B field of the PTE that was used to create the TLB entry to be invalidated.

    All TLB entries that have all of the following properties are made invalid on the thread executing the tlbiel instruction.

      - The entry translates a virtual address for which all the following are true.
          - VA[14:14+i] is equal to (RB)[0:i].
          - L=0 or b≥20 or, if L=1 and b<20, VA[58:77-b] is equal to (RB)[56:75-b].
      - The segment size of the entry is the same as the segment size specified in (RB)[54:55].
      - Either of the following is true:
          - The L field in RB is 0, the base page size of the entry is 4 KB, and the actual page size of the entry matches the actual page size specified in (RB)[56:58].
          - The L field in RB is 1, the base page size of the entry matches the base page size specified in (RB)[44:51], and the actual page size of the entry matches the actual page size specified in (RB)[44:51].
      - Either of the following is true:
          - The implementation's TLB entries do not contain LPID values.
          - The implementation's TLB entries contain LPID values and TLBE[LPID] = LPIDR[LPID].

IS field in RB contains 0b10 or 0b11

    (RB)[i:51] (bits i-40:11 of the SET field in (RB)) specify a set of TLB entries, where i is an implementation-dependent value in the range 40:51. Each entry in the set is invalidated if any of the following conditions are met for the entry.

      - The implementation's TLB entries do not contain an LPID value.
      - The IS field in RB contains 0b10 or MSR[HV]=0, the implementation's TLB entries contain an LPID value, and TLBE[LPID] = LPIDR[LPID].
      - The IS field in RB contains 0b11 and MSR[HV]=1.

    How the TLB is divided into the 2^(52-i) sets is implementation-dependent. The relationship of virtual addresses to these sets is also implementation-dependent. However, if, in an implementation, there can be multiple TLB entries for the same virtual address and same partition, then all these entries must be in a single set.

    If the IS field in RB contains 0b10 or 0b11, it is implementation-dependent whether implementation-specific lookaside information that contains translations of effective addresses to real addresses is invalidated.

RB[0:39] (when (RB)[52:53] = 0b10 or 0b11), RB[59:62] (when (RB)[52:53] = 0b00 and (RB)[63]=0), and RB[54:63] (when (RB)[52:53] = 0b10 or 0b11) must contain 0s and are ignored by the hardware. When i>40 and (RB)[52:53] = 0b10 or 0b11, RB[40:i-1] may contain any value and are ignored by the hardware.

Only TLB entries on the thread executing the tlbiel instruction are affected.

MSR[SF] must be 1 when this instruction is executed; otherwise the results are boundedly undefined.

If the value specified in RB[54:55], RB[56:58] (when RB[52:53]=0b00 and (RB)[63] is 0), or RB[44:51] (when RB[52:53]=0b00 and (RB)[63] is 1) is not supported by the implementation, the instruction is treated as if the instruction form were invalid.

This instruction is privileged.

See Section 5.10, "Page Table Update Synchronization Requirements" on page 818 for a description of other requirements associated with the use of this instruction.

Special Registers Altered:
    None

Programming Note

The primary use of this instruction by hypervisor software is to invalidate TLB entries prior to reassigning a thread to a new logical partition.

For IS = 0b10 or 0b11, it is implementation-dependent whether ERAT entries are invalidated. If the tlbiel instruction is being executed due to a partition swap, an slbia instruction can be used to invalidate the pertinent ERAT entries. If the tlbiel instruction is being executed to invalidate TLB entries with parity or ECC errors, the fact that the corresponding ERAT entries are not invalidated is immaterial. If the tlbiel instruction is being executed to invalidate multiple matching TLB entries, the fact that the corresponding ERAT entries are not invalidated is immaterial for implementations that never create multiple matching ERAT entries.

The primary use of this instruction by operating system software is to invalidate TLB entries that were created by the hypervisor using an implementation-specific hypervisor-managed TLB facility, if such a facility is provided.

tlbiel may be executed on a given thread even if the sequence tlbie - eieio - tlbsync - ptesync is concurrently being executed on another thread.

See also the Programming Notes with the description of the tlbie instruction.

TLB Invalidate All X-form

tlbia

    | 31 | /// | /// | /// | 370 | /  |
    0    6     11    16    21    31

    all TLB entries ← invalid

All TLB entries are made invalid on the thread executing the tlbia instruction.

This instruction is hypervisor privileged.

This instruction is optional, and need not be implemented.

Special Registers Altered:
    None

Programming Note

tlbia does not affect TLBs on other threads.

TLB Synchronize X-form

tlbsync

    | 31 | /// | /// | /// | 566 | /  |
    0    6     11    16    21    31

The tlbsync instruction provides an ordering function for the effects of all tlbie instructions executed by the thread executing the tlbsync instruction, with respect to the memory barrier created by a subsequent ptesync instruction executed by the same thread. Executing a tlbsync instruction ensures that all of the following will occur.

  - All TLB invalidations caused by tlbie instructions preceding the tlbsync instruction will have completed on any other thread before any data accesses caused by instructions following the ptesync instruction are performed with respect to that thread.
  - All storage accesses by other threads for which the address was translated using the translations being invalidated, and all Reference and Change bit updates associated with address translations that were performed by other threads using the translations being invalidated, will have been performed with respect to the thread executing the ptesync instruction, to the extent required by the associated Memory Coherence Required attributes, before the ptesync instruction's memory barrier is created.

The operation performed by this instruction is ordered by the eieio (or sync or ptesync) instruction with respect to preceding tlbie instructions executed by the thread executing the tlbsync instruction. The operations caused by tlbie and tlbsync are ordered by eieio as a fourth set of operations, which is independent of the other three sets that eieio orders.

The tlbsync instruction may complete before operations caused by tlbie instructions preceding the tlbsync instruction have been performed.

This instruction is hypervisor privileged.

See Section 5.10 for a description of other requirements associated with the use of this instruction.

Special Registers Altered:
    None

Programming Note

tlbsync should not be used to synchronize the completion of tlbiel.

5.10 Page Table Update Synchronization Requirements

This section describes rules that software must follow when updating the Page Table, and includes suggested sequences of operations for some representative cases.

5.10.1 Page Table Updates

TLBs are non-coherent caches of the HTAB. TLB entries must be invalidated explicitly with one of the TLB Invalidate instructions.
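The consequence of a non-coherent TLB can be seen in a toy model. The sketch below is only an illustration in Python under loud simplifying assumptions: the class ToyMMU and its methods are hypothetical, a dictionary stands in for the HTAB, a second dictionary stands in for a single thread's TLB, and threads, ordering, and the synchronization instructions are ignored entirely.

```python
class ToyMMU:
    """Toy single-thread model: a TLB that caches page table lookups."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> real page number
        self.tlb = {}         # cached copies of page table translations

    def update_pte(self, vpn, rpn):
        """Change the page table only; the TLB is not told about it."""
        self.page_table[vpn] = rpn

    def invalidate(self, vpn):
        """Explicitly drop a cached translation (the role of a TLB Invalidate)."""
        self.tlb.pop(vpn, None)

    def translate(self, vpn):
        """Use the cached translation if present, else search the page table."""
        if vpn not in self.tlb:
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn]


mmu = ToyMMU()
mmu.update_pte(5, 100)
first = mmu.translate(5)   # translation is now cached

mmu.update_pte(5, 200)     # page table changes underneath the cache
stale = mmu.translate(5)   # still the old translation

mmu.invalidate(5)          # explicit invalidation
fresh = mmu.translate(5)   # page table is searched again

print(first, stale, fresh)  # 100 100 200
```

The stale result in the middle is the behavior that the rules in this section are designed to rule out: a page table update alone does not change what an already-cached translation returns.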
Unsynchronized lookups in the HTAB continue even while it is being modified. Any thread, including a thread on which software is modifying the HTAB, may look in the HTAB at any time in an attempt to translate a virtual address. When modifying a PTE, software must ensure that the PTE's V bit is 0 if the PTE is inconsistent (e.g., if the RPN field is not correct for the current AVA field).

Updates of Reference and Change bits by the hardware are not synchronized with the accesses that cause the updates. When modifying doubleword 1 of a PTE, software must take care to avoid overwriting a hardware update of these bits and to avoid having the value written by a Store instruction overwritten by a hardware update.

Software must execute tlbie and tlbsync instructions only as part of the following sequence, and must ensure that no other thread will execute a "conflicting instruction" while the instructions in the sequence are executing on the given thread.

    tlbie instruction(s) specifying the same LPID operand value
    eieio
    tlbsync
    ptesync

Let L be the LPID value specified by the above tlbie instruction(s). The "conflicting instructions" in this case are the following.

  - a tlbie instruction that specifies an LPID value that matches the value L
  - a tlbsync instruction that is part of a tlbie-eieio-tlbsync-ptesync sequence in which the tlbie instruction(s) specify an LPID value that matches the value L
  - an mtspr instruction that modifies the LPIDR, if the modification has either of the following properties.
      - The old LPID value (i.e., the contents of the LPIDR just before the mtspr instruction is executed) is the value L
      - The new LPID value (i.e., the value specified by the mtspr instruction) is the value L

Other instructions (excluding mtspr instructions that modify the LPIDR as described above, and excluding tlbie instructions except as shown) may be interleaved with the instruction sequence shown above, but the instructions in the sequence must appear in the order shown. On systems consisting of only a single-threaded processor, the eieio and tlbsync instructions can be omitted.

Programming Note

The eieio instruction prevents the reordering of tlbie instructions previously executed with respect to the subsequent tlbsync instruction. The tlbsync instruction and the subsequent ptesync instruction together ensure that all storage accesses for which the address was translated using the translations being invalidated, and all Reference and Change bit updates associated with address translations that were performed using the translations being invalidated, will be performed with respect to any thread or mechanism, to the extent required by the associated Memory Coherence Required attributes, before any data accesses caused by instructions following the ptesync instruction are performed with respect to that thread or mechanism.

Before permitting an mtspr instruction that modifies the LPIDR to be executed on a given thread, software must ensure that no other thread will execute a "conflicting instruction" until after the mtspr instruction followed by a context synchronizing instruction have been executed on the given thread (a context synchronizing event can be used instead of the context synchronizing instruction; see Chapter 10).

The "conflicting instructions" in this case are the following.

  - a tlbie instruction specifying an LPID operand value that matches either the old or the new LPIDR[LPID] value
  - a tlbsync instruction that is part of a tlbie-eieio-tlbsync-ptesync sequence in which the tlbie instruction(s) specify an LPID value that matches either the old or the new LPIDR[LPID] value

Programming Note

The restrictions specified above regarding modifying the LPIDR apply even on systems consisting of only a single-threaded processor, and even if the new LPID value is equal to the old LPID value.

In the sequences of operations shown in the following subsections, the Page Table Entry is assumed to be for a virtual page for which the base page size is equal to the actual page size. If these page sizes are different, multiple tlbie instructions are needed, one for each PTE corresponding to the virtual page.

In the sequences of operations shown in the following subsections, any alteration of a Page Table Entry (PTE) that corresponds to a single line in the sequence is assumed to be done using a Store instruction for which the access is atomic. Appropriate modifications must be made to these sequences if this assumption is not satisfied (e.g., if a store doubleword operation is done using two Store Word instructions).

Stores are not performed out-of-order, as described in Section 5.5, "Performing Operations Out-of-Order" on page 770. Moreover, address translations associated with instructions preceding the corresponding Store instructions are not performed again after the stores have been performed. (These address translations must have been performed before the store was determined to be required by the sequential execution model, because they might have caused an exception.) As a result, an update to a PTE need not be preceded by a context synchronizing operation.

All of the sequences require a context synchronizing operation after the sequence if the new contents of the PTE are to be used for address translations associated with subsequent instructions.

As noted in the description of the Synchronize instruction in Section 4.4.3 of Book II, address translation associated with instructions which occur in program order subsequent to the Synchronize (and this includes the ptesync variant) may be performed prior to the completion of the Synchronize. To ensure that these instructions and data which may have been speculatively fetched are discarded, a context synchronizing operation is required.

Programming Note

In many cases this context synchronization will occur naturally; for example, if the sequence is executed within an interrupt handler the rfid or hrfid instruction that returns from the interrupt handler may provide the required context synchronization.

Page Table Entries must not be changed in a manner that causes an implicit branch.

The sequences of operations shown in the following subsections assume a multi-threaded environment. In an environment consisting of only a single-threaded processor, the tlbsync must be omitted, and the eieio that separates the tlbie from the tlbsync can be omitted. In a multi-threaded environment, when tlbiel is used instead of tlbie in a Page Table update, the synchronization requirements are the same as when tlbie is used in an environment consisting of only a single-threaded processor.

Programming Note

For all of the sequences shown in the following subsections, if it is necessary to communicate completion of the sequence to software running on another thread, the ptesync instruction at the end of the sequence should be followed by a Store instruction that stores a chosen value to some chosen storage location X. The memory barrier created by the ptesync instruction ensures that if a Load instruction executed by another thread returns the chosen value from location X, the sequence's stores to the Page Table have been performed with respect to that other thread. The Load instruction that returns the chosen value should be followed by a context synchronizing instruction in order to ensure that all instructions following the context synchronizing instruction will be fetched and executed using the values stored by the sequence (or values stored subsequently). (These instructions may have been fetched or executed out-of-order using the old contents of the PTE.)

This Note assumes that the Page Table and location X are in storage that is Memory Coherence Required.

5.10.1.1 Adding a Page Table Entry

This is the simplest Page Table case. The V bit of the old entry is assumed to be 0. The following sequence can be used to create a PTE, maintain a consistent state, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes.

    PTE[ARPN,LP,AC,R,C,WIMG,N,PP] ← new values
    eieio      /* order 1st update before 2nd */
    PTE[B,AVA,SW,L,H,V] ← new values (V=1)
    ptesync    /* order updates before next Page Table search
                  and before next data access */

5.10.1.2 Modifying a Page Table Entry

General Case

If a valid entry is to be modified and the translation instantiated by the entry being modified is to be invalidated, the following sequence can be used to modify the PTE, maintain a consistent state, ensure that the translation instantiated by the old entry is no longer available, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes. (The sequence is equivalent to deleting the PTE and then adding a new one; see Sections 5.10.1.1 and 5.10.1.3.)

    PTE[V] ← 0 /* (other fields don't matter) */
    ptesync    /* order update before tlbie and
                  before next Page Table search */
    tlbie(old_B,old_VA[14:77-b],old_L,old_LP,old_AP,old_LPID)
               /* invalidate old translation */
    eieio      /* order tlbie before tlbsync */
    tlbsync    /* order tlbie before ptesync */
    ptesync    /* order tlbie, tlbsync and 1st
                  update before 2nd update */
    PTE[ARPN,LP,AC,R,C,WIMG,N,PP] ← new values
    eieio      /* order 2nd update before 3rd */
    PTE[B,AVA,SW,L,H,V] ← new values (V=1)
    ptesync    /* order 2nd and 3rd updates before
                  next Page Table search and
                  before next data access */

Resetting the Reference Bit

If the only change being made to a valid entry is to set the Reference bit to 0, a simpler sequence suffices because the Reference bit need not be maintained exactly.

    oldR ← PTE[R]      /* get old R */
    if oldR = 1 then
        PTE[R] ← 0     /* store byte (R=0, other bits unchanged) */
        tlbie(B,VA[14:77-b],L,LP,AP,LPID) /* invalidate entry */
        eieio          /* order tlbie before tlbsync */
        tlbsync        /* order tlbie before ptesync */
        ptesync        /* order tlbie, tlbsync, and update
                          before next Page Table search
                          and before next data access */

Modifying the SW field

If the only change being made to a valid entry is to modify the SW field, the following sequence suffices, because the SW field is not used by the hardware and doubleword 0 of the PTE is not modified by the hardware.

    loop: ldarx  r1 ← PTE_dwd_0      /* load dwd 0 of PTE */
          r1[57:60] ← new SW value   /* replace SW, in r1 */
          stdcx. PTE_dwd_0 ← r1      /* store dwd 0 of PTE if still reserved
                                        (new SW value, other fields unchanged) */
          bne-   loop                /* loop if lost reservation */

A lbarx/stbcx., lharx/sthcx., or lwarx/stwcx. pair (specifying the low-order byte, halfword, or word respectively of doubleword 0 of the PTE) can be used instead of the ldarx/stdcx. pair shown above.

Modifying the Virtual Address

If the virtual address translated by a valid PTE is to be modified and the new virtual address hashes to the same PTEG (or the same two PTEGs if the secondary Page Table search is enabled) as does the old virtual address, the following sequence can be used to modify the PTE, maintain a consistent state, ensure that the translation instantiated by the old entry is no longer available, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes.

    PTE[AVA,SW,L,H,V] ← new values (V=1)
    ptesync    /* order update before tlbie and
                  before next Page Table search */
    tlbie(old_B,old_VA[14:77-b],old_L,old_LP,old_AP,old_LPID)
               /* invalidate old translation */
    eieio      /* order tlbie before tlbsync */
    tlbsync    /* order tlbie before ptesync */
    ptesync    /* order tlbie, tlbsync, and update
                  before next data access */

5.10.1.3 Deleting a Page Table Entry

The following sequence can be used to ensure that the translation instantiated by an existing entry is no longer available.

    PTE[V] ← 0 /* (other fields don't matter) */
    ptesync    /* order update before tlbie and
                  before next Page Table search */
    tlbie(old_B,old_VA[14:77-b],old_L,old_LP,old_AP,old_LPID)
               /* invalidate old translation */
    eieio      /* order tlbie before tlbsync */
    tlbsync    /* order tlbie before ptesync */
    ptesync    /* order tlbie, tlbsync, and update
                  before next data access */

Chapter 6. Interrupts

6.1 Overview
6.2 Interrupt Registers
6.2.1 Machine Status Save/Restore Registers
6.2.2 Hypervisor Machine Status Save/Restore Registers
6.2.3 Data Address Register
6.2.4 Hypervisor Data Address Register
6.2.5 Data Storage Interrupt Status Register
6.2.6 Hypervisor Data Storage Interrupt Status Register
6.2.7 Hypervisor Emulation Instruction Register
6.2.8 Hypervisor Maintenance Exception Register
6.2.9 Hypervisor Maintenance Exception Enable Register
6.3 Interrupt Synchronization
6.4 Interrupt Classes
6.4.1 Precise Interrupt
6.4.2 Imprecise Interrupt
6.4.3 Interrupt Processing
6.4.4 Implicit alteration of HSRR0 and HSRR1
6.5 Interrupt Definitions
6.5.7 External Interrupt
6.5.8 Alignment Interrupt
6.5.9 Program Interrupt
6.5.10 Floating-Point Unavailable Interrupt
6.5.11 Decrementer Interrupt
6.5.12 Hypervisor Decrementer Interrupt
6.5.13 System Call Interrupt
6.5.14 Trace Interrupt [Category: Trace]
6.5.15 Hypervisor Data Storage Interrupt
6.5.16 Hypervisor Instruction Storage Interrupt
6.5.17 Hypervisor Emulation Assistance Interrupt
6.5.18 Hypervisor Maintenance Interrupt
6.5.19 Performance Monitor Interrupt [Category: Server.Performance Monitor]
6.5.20 Vector Unavailable Interrupt [Category: Vector]
6.5.21 VSX Unavailable Interrupt [Category: VSX]
. . . . . . . . 828 6.6 Partially Executed 6.5.1 System Reset Interrupt . . . . . . . 829 Instructions . . . . . . . . . . . . . . . . . . . . . 844 6.5.2 Machine Check Interrupt . . . . . . 831 6.7 Exception Ordering . . . . . . . . . . . . 845 6.5.3 Data Storage Interrupt . . . . . . . . 832 6.7.1 Unordered Exceptions . . . . . . . . 845 6.5.4 Data Segment Interrupt . . . . . . . 833 6.7.2 Ordered Exceptions . . . . . . . . . . 845 6.5.5 Instruction Storage Interrupt . . . 834 6.8 Interrupt Priorities . . . . . . . . . . . . . 845 6.5.6 Instruction Segment Interrupt. . . . . . . . . . . . . . . . . . . . . . . . 834 6.1 Overview one interrupt is reported, and when it is processed (taken) no program state is lost. Since Save/Restore The Power ISA provides an interrupt mechanism to Registers SRR0 and SRR1 are serially reusable allow the thread to change state as a result of external resources used by most interrupts, program state may signals, errors, or unusual conditions arising in the exe- be lost when an unordered interrupt is taken. cution of instructions. System Reset and Machine Check interrupts are not ordered. All other interrupts are ordered such that only Chapter 6. Interrupts 821 Version 2.06 6.2 Interrupt Registers Programming Note Execution of some instructions, and fetching instructions when MSRIR=1, may have the side 6.2.1 Machine Status Save/ effect of modifying HSRR0 and HSRR1; see Sec- Restore Registers tion 6.4.4. When various interrupts occur, the state of the machine is saved in the Machine Status Save/Restore registers (SRR0 and SRR1). Section 6.5 describes which regis- 6.2.3 Data Address Register ters are altered by each interrupt. The Data Address Register (DAR) is a 64-bit register that is set by the Machine Check, Data Storage, Data SRR0 // Segment, and Alignment interrupts; see Sections 6.5.2, 0 62 63 6.5.3, 6.5.4, and 6.5.8. 
In general, when one of these interrupts occurs the DAR is set to an effective address SRR1 associated with the storage access that caused the 0 63 interrupt, with the high-order 32 bits of the DAR set to 0 if the interrupt occurs in 32-bit mode. Figure 35. Save/Restore Registers SRR1 bits may be treated as reserved in a given imple- DAR mentation if they correspond to MSR bits that are 0 63 reserved or are treated as reserved in that implementa- Figure 37. Data Address Register tion and, for SRR1 bits in the range 33:36, 42:43, and 45:47, they are specified as being set either to 0 or to an undefined value for all interrupts that set SRR1 6.2.4 Hypervisor Data Address (including implementation-dependent setting, e.g. by the Machine Check interrupt or by implementation-spe- Register cific interrupts). SRR144 cannot be treated as reserved, The Hypervisor Data Address Register (HDAR) is a 64- regardless of how it is set by interrupts, because it is bit register that is set by the Hypervisor Data Storage used by software, as described in a Programming Note Interrupt; see Section 6.5.15. In general, when this near the end of Section 6.5.9, "Program Interrupt" on interrupt occurs, the HDAR is set to an effective page 837. address associated with the storage access that caused the interrupt, with the high-order 32 bits of the 6.2.2 Hypervisor Machine Status HDAR set to 0 if the interrupt occurs in 32-bit mode. Save/Restore Registers HDAR When various interrupts occur, the state of the machine 0 63 is saved in the Hypervisor Machine Status Save/ Figure 38. Hypervisor Data Address Register Restore registers (HSRR0 and HSRR1). Section 6.5 describes which registers are altered by each interrupt. 6.2.5 Data Storage Interrupt HSRR0 // Status Register 0 62 63 The Data Storage Interrupt Status Register (DSISR) is HSRR1 a 32-bit register that is set by the Machine Check, Data 0 63 Storage, Data Segment, and Alignment interrupts; see Sections 6.5.2, 6.5.3, 6.5.4, and 6.5.8. 
In general, when Figure 36. Hypervisor Save/Restore Registers one of these interrupts occurs the DSISR is set to indi- cate the cause of the interrupt. HSRR1 bits may be treated as reserved in a given implementation if they correspond to MSR bits that are reserved or are treated as reserved in that implementa- DSISR tion and, for HSRR1 bits in the range 33:36 and 42:47, 32 63 they are specified as being set either to 0 or to an Figure 39. Data Storage Interrupt Status Register undefined value for all interrupts that set HSRR1 (including implementation-dependent setting, e.g. by DSISR bits may be treated as reserved in a given implementation-specific interrupts). implementation if they are specified as being set either to 0 or to an undefined value for all interrupts that set The HSRR0 and HSRR1 are hypervisor resources; see the DSISR (including implementation-dependent set- Chapter 2. 822 Power ISATM Book III-S Version 2.06 ting, e.g. by the Machine Check interrupt or by imple- Others Implementation-specific. mentation-specific interrupts). When the mtspr instruction is executed with the HMER as the encoded Special Purpose Register, the contents 6.2.6 Hypervisor Data Storage of register RS are ANDed with the contents of the HMER and the result is placed into the HMER. Interrupt Status Register The exception bits in the HMER are sticky; that is, once The Hypervisor Data Storage Interrupt Status Register set to 1 they remain set to 1 until they are set to 0 by an (HDSISR) is a 32-bit register that is set by the Hypervi- mthmer instruction. sor Data Storage interrupt. In general, when one of these interrupts occurs the HDSISR is set to indicate Programming Note the cause of the interrupt. An access to the HMER is likely to be very slow. Software should access it sparingly. HDSISR 32 63 Figure 40. 
Hypervisor Data Storage Interrupt Status Register 6.2.9 Hypervisor Maintenance 6.2.7 Hypervisor Emulation Exception Enable Register Instruction Register The Hypervisor Maintenance Exception Enable Regis- ter (HMEER) is a 64-bit register in which each bit The Hypervisor Emulation Instruction Register (HEIR) enables the corresponding exception in the HMER to is a 32-bit register that is set by the Hypervisor Emula- cause the Hypervisor Maintenance interrupt, potentially tion Assistance interrupt; see Section 6.5.17. The causing exit from power-saving mode; see Section image of the instruction that caused the interrupt is 6.5.18 and Section 3.3.2. loaded into the register. HMEER HEIR 0 63 0 31 Figure 43. Hypervisor Maintenance Exception Figure 41. Hypervisor Emulation Instruction Enable Register Register 6.2.8 Hypervisor Maintenance Exception Register Each bit in the Hypervisor Maintenance Exception Register (HMER) is associated with one or more causes of the Hypervisor Maintenance exception, and is set when the associated exception(s) occur. If the corresponding bit in the Hypervisor Maintenance Exception Enable Register (HMEER) is set, a Hypervisor Maintenance Inter- rupt (HMI) may occur. If the thread is in a power- saving mode when the interrupt would have occurred, the thread will exit the power-saving mode; see Section 6.5.18 and Section 3.3.2. HMER 0 63 Figure 42. Hypervisor Maintenance Exception Register The contents of the HMER are as follows: 0 Set to 1 for a Malfunction Alert. 1 Set to 1 when performance is degraded for thermal reasons. 2 Set to 1 when thread recovery is invoked. Chapter 6. Interrupts 823 Version 2.06 6.3 Interrupt Synchronization appear to have completed with respect to the exe- cuting thread. When an interrupt occurs, SRR0 or HSRR0 is set to 3. 
The instruction causing the exception may appear point to an instruction such that all preceding instruc- not to have begun execution (except for causing tions have completed execution, no subsequent the exception), may have been partially executed, instruction has begun execution, and the instruction or may have completed, depending on the inter- addressed by SRR0 or HSRR0 may or may not have rupt type. completed execution, depending on the interrupt type. 4. Architecturally, no subsequent instruction has With the exception of System Reset and Machine begun execution. Check interrupts, all interrupts are context synchroniz- ing as defined in Section 1.5.1. System Reset and Machine Check interrupts are context synchronizing if 6.4.2 Imprecise Interrupt they are recoverable (i.e., if bit 62 of SRR1 is set to 1 This architecture defines one imprecise interrupt, the by the interrupt). If a System Reset or Machine Check Imprecise Mode Floating-Point Enabled Exception type interrupt is not recoverable (i.e., if bit 62 of SRR1 is set Program interrupt. to 0 by the interrupt), it acts like a context synchronizing operation with respect to subsequent instructions. That When an Imprecise Mode Floating-Point Enabled is, a non-recoverable System Reset or Machine Check Exception type Program interrupt occurs, the following interrupt need not satisfy items 1 through 3 of Section conditions exist at the interrupt point. 1.5.1, but does satisfy items 4 and 5. 1. SRR0 addresses either the instruction causing the exception or some instruction following that instruction; see Section 6.5.9, "Program Interrupt" 6.4 Interrupt Classes on page 837. Interrupts are classified by whether they are directly 2. An interrupt is generated such that all instructions caused by the execution of an instruction or are caused preceding the instruction addressed by SRR0 by some other system exception. Those that are "sys- appear to have completed with respect to the exe- tem-caused" are: cuting thread. 
System Reset 3. The instruction addressed by SRR0 may appear Machine Check not to have begun execution (except, in some External cases, for causing the interrupt to occur), may Decrementer have been partially executed, or may have com- Hypervisor Decrementer pleted; see Section 6.5.9. Hypervisor Maintenance 4. No instruction following the instruction addressed External, Decrementer, Hypervisor Decrementer, and by SRR0 appears to have begun execution. Hypervisor Maintenance interrupts are maskable inter- rupts. Therefore, software may delay the generation of All Floating-Point Enabled Exception type Program these interrupts. System Reset and Machine Check interrupts are maskable using the MSR bits FE0 and interrupts are not maskable. FE1. Although these interrupts are maskable, they dif- fer significantly from the other maskable interrupts in "Instruction-caused" interrupts are further divided into that the masking of these interrupts is usually con- two classes, precise and imprecise. trolled by the application program, whereas the mask- ing of all other maskable interrupts is controlled by either the operating system or the hypervisor. 6.4.1 Precise Interrupt Except for the Imprecise Mode Floating-Point Enabled Exception type Program interrupt, all instruction- caused interrupts are precise. When the fetching or execution of an instruction causes a precise interrupt, the following conditions exist at the interrupt point. 1. SRR0 addresses either the instruction causing the exception or the immediately following instruction. Which instruction is addressed can be determined from the interrupt type and status bits. 2. An interrupt is generated such that all instructions preceding the instruction causing the exception 824 Power ISATM Book III-S Version 2.06 6.4.3 Interrupt Processing Associated with each kind of interrupt is an interrupt vector, which contains the initial sequence of instruc- tions that is executed when the corresponding interrupt occurs. 
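As an illustration of the register-saving part of interrupt processing described in this section (bits 33:36 and 42:47 of SRR1 or HSRR1 carry interrupt-specific information; bits 0:32, 37:41, and 48:63 are a copy of the corresponding MSR bits), the following Python sketch builds such an image. It is a minimal model, not an architected definition: the names ibm_mask, srr1_image, and is_recoverable are hypothetical, the sketch assumes the 64-bit numbering in which bit 0 is the most significant bit, and it does not model Machine Check interrupts, whose handling differs.

```python
WIDTH = 64  # SRR1, HSRR1, and the MSR are modelled as 64-bit integers


def ibm_mask(first: int, last: int) -> int:
    """Mask for bits first:last, where bit 0 is the most significant bit."""
    count = last - first + 1
    return ((1 << count) - 1) << (WIDTH - 1 - last)


# Bits loaded with information specific to the interrupt type.
INFO_MASK = ibm_mask(33, 36) | ibm_mask(42, 47)

# Bits loaded with a copy of the corresponding MSR bits.
MSR_COPY_MASK = ibm_mask(0, 32) | ibm_mask(37, 41) | ibm_mask(48, 63)


def srr1_image(msr: int, info: int) -> int:
    """Combine the MSR copy with interrupt-specific bits (hypothetical helper)."""
    return (msr & MSR_COPY_MASK) | (info & INFO_MASK)


def is_recoverable(srr1: int) -> bool:
    """Bit 62 of SRR1 set to 1 marks a recoverable interrupt."""
    return bool(srr1 & ibm_mask(62, 62))
```

Because the two masks are disjoint and together cover all 64 bits, every bit of the image comes from exactly one of the two sources.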
Interrupt processing consists of saving a small part of the thread's state in certain registers, identifying the cause of the interrupt in other registers, and continuing execution at the corresponding interrupt vector location.

When an exception exists that will cause an interrupt to be generated and it has been determined that the interrupt will occur, the following actions are performed. The handling of Machine Check interrupts (see Section 6.5.2) differs from the description given below in several respects.

1. SRR0 or HSRR0 is loaded with an instruction address that depends on the type of interrupt; see the specific interrupt description for details.
2. Bits 33:36 and 42:47 of SRR1 or HSRR1 are loaded with information specific to the interrupt type.
3. Bits 0:32, 37:41, and 48:63 of SRR1 or HSRR1 are loaded with a copy of the corresponding bits of the MSR.
4. The MSR is set as shown in Figure 44 on page 828. In particular, MSR bits IR and DR are set to 0, disabling relocation, and MSR bit SF is set to 1, selecting 64-bit mode. The new values take effect beginning with the first instruction executed following the interrupt.
5. Instruction fetch and execution resume, using the new MSR value, at the effective address specific to the interrupt type. These effective addresses are shown in Figure 45 on page 829.

Interrupts do not clear reservations obtained with lbarx, lharx, lwarx, or ldarx.

Programming Note
In general, when an interrupt occurs, the following instructions should be executed by the operating system before dispatching a "new" program.

- stbcx., sthcx., stwcx., or stdcx., to clear the reservation if one is outstanding, to ensure that a lbarx, lharx, lwarx, or ldarx in the interrupted program is not paired with a stbcx., sthcx., stwcx., or stdcx. in the "new" program.
- sync, to ensure that all storage accesses caused by the interrupted program will be performed with respect to another thread before the program is resumed on that other thread.
- isync or rfid, to ensure that the instructions in the "new" program execute in the "new" context.

Programming Note
For instruction-caused interrupts, in some cases it may be desirable for the operating system to emulate the instruction that caused the interrupt, while in other cases it may be desirable for the operating system not to emulate the instruction. The following list, while not complete, illustrates criteria by which decisions regarding emulation should be made. The list applies to general execution environments; it does not necessarily apply to special environments such as program debugging, bring-up, etc.

In general, the instruction should be emulated if:

- The interrupt is caused by a condition for which the instruction description (including related material such as the introduction to the section describing the instruction) implies that the instruction works correctly. Example: Alignment interrupt caused by lmw for which the storage operand is not aligned, or by dcbz for which the storage operand is in storage that is Write Through Required or Caching Inhibited.
- The instruction is an illegal instruction that should appear, to the program executing it, as if it were supported by the implementation. Example: A Hypervisor Emulation Assistance interrupt is caused by an instruction that has been phased out of the architecture but is still used by some programs that the operating system supports, or by an instruction that is in a category that the implementation does not support but is used by some programs that the operating system supports.

In general, the instruction should not be emulated if:

- The purpose of the instruction is to cause an interrupt. Example: System Call interrupt caused by sc.
- The interrupt is caused by a condition that is stated, in the instruction description, potentially to cause the interrupt. Example: Alignment interrupt caused by lwarx for which the storage operand is not aligned.
- The program is attempting to perform a function that it should not be permitted to perform. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should not be permitted to access. (If the function is one that the program should be permitted to perform, the conditions that caused the interrupt should be corrected and the program re-dispatched such that the instruction will be re-executed. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should be permitted to access but for which there currently is no PTE that satisfies the Page Table search.)

Programming Note
If a program modifies an instruction that it or another program will subsequently execute and the execution of the instruction causes an interrupt, the state of storage and the content of some registers may appear to be inconsistent to the interrupt handler program. For example, this could be the result of one program executing an instruction that causes a Hypervisor Emulation Assistance interrupt just before another instance of the same program stores an Add Immediate instruction in that storage location. To the interrupt handler code, it would appear that the hardware generated the interrupt as the result of executing a valid instruction.

Programming Note
In order to handle Machine Check and System Reset interrupts correctly, the operating system should manage MSR_RI as follows.

- In the Machine Check and System Reset interrupt handlers, interpret SRR1 bit 62 (where MSR_RI is placed) as:
  - 0: interrupt is not recoverable
  - 1: interrupt is recoverable
- In each interrupt handler, when enough state has been saved that a Machine Check or System Reset interrupt can be recovered from, set MSR_RI to 1.
- In each interrupt handler, do the following (in order) just before returning.
  1. Set MSR_RI to 0.
  2. Set SRR0 and SRR1 to the values to be used by rfid. The new value of SRR1 should have bit 62 set to 1 (which will happen naturally if SRR1 is restored to the value saved there by the interrupt, because the interrupt handler will not be executing this sequence unless the interrupt is recoverable).
  3. Execute rfid.

For interrupts that set the SRRs other than Machine Check or System Reset, MSR_RI can be managed similarly when these interrupts occur within interrupt handlers for other interrupts that set the SRRs.

This Note does not apply to interrupts that set the HSRRs because these interrupts put the thread into hypervisor state, and either do not occur or can be prevented from occurring within interrupt handlers for other interrupts that set the HSRRs.

6.4.4 Implicit alteration of HSRR0 and HSRR1

Executing some of the more complex instructions may have the side effect of altering the contents of HSRR0 and HSRR1. The instructions listed below are guaranteed not to have this side effect. Any omission of instruction suffixes is significant; e.g., add is listed but add. is excluded.

1. Branch instructions
   b[l][a], bc[l][a], bclr[l], bcctr[l]
2. Fixed-Point Load and Store Instructions
   lbz, lbzx, lhz, lhzx, lwz, lwzx, ld<64>, ldx<64>, stb, stbx, sth, sthx, stw, stwx, std<64>, stdx<64>
   Execution of these instructions is guaranteed not to have the side effect of altering HSRR0 and HSRR1 only if the storage operand is aligned and MSR_DR=0.
3. Arithmetic instructions
   addi, addis, add, subf, neg
4. Compare instructions
   cmpi, cmp, cmpli, cmpl
5. Logical and Extend Sign instructions
   ori, oris, xori, xoris, and, or, xor, nand, nor, eqv, andc, orc, extsb, extsh, extsw
6. Rotate and Shift instructions
   rldicl<64>, rldicr<64>, rldic<64>, rlwinm, rldcl<64>, rldcr<64>, rlwnm, rldimi<64>, rlwimi, sld<64>, slw, srd<64>, srw
7. Other instructions
   isync
   rfid, hrfid
   mtspr, mfspr, mtmsrd, mfmsr

Programming Note
Instructions excluded from the list include the following.

- instructions that set or use XER_CA
- instructions that set XER_OV or XER_SO
- andi., andis., and fixed-point instructions with Rc=1 (Fixed-point instructions with Rc=1 can be replaced by the corresponding instruction with Rc=0 followed by a Compare instruction.)
- all floating-point instructions
- mftb

These instructions, and the other excluded instructions, may be implemented with the assistance of the Hypervisor Emulation Assistance interrupt, or of implementation-specific interrupts that modify HSRR0 and HSRR1. The included instructions are guaranteed not to be implemented thus. (The included instructions are sufficiently simple as to be unlikely to need such assistance. Moreover, they are likely to be needed in interrupt handlers before HSRR0 and HSRR1 have been saved or after HSRR0 and HSRR1 have been restored.)

Similarly, fetching instructions may have the side effect of altering the contents of HSRR0 and HSRR1 unless MSR_IR=0.

6.5 Interrupt Definitions

Figure 44 shows all the types of interrupts and the values assigned to the MSR for each. Figure 45 shows the effective address of the interrupt vector for each interrupt type. (Section 5.7.4 on page 777 summarizes all architecturally defined uses of effective addresses, including those implied by Figure 45.)
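The interrupt vector addresses listed in Figure 45 can be captured in a small lookup table. The following Python sketch is illustrative only: the names VECTORS, vector_address, and format_ea are hypothetical, the addresses are the ones given in Figure 45, and the entries that Figure 45 marks as reserved are deliberately omitted.

```python
# Low-order 16 bits of each interrupt vector's effective address (Figure 45).
# Figure 45 writes these as 00..0000_nnnn, meaning 0x0000_0000_0000_nnnn.
VECTORS = {
    "System Reset": 0x0100,
    "Machine Check": 0x0200,
    "Data Storage": 0x0300,
    "Data Segment": 0x0380,
    "Instruction Storage": 0x0400,
    "Instruction Segment": 0x0480,
    "External": 0x0500,
    "Alignment": 0x0600,
    "Program": 0x0700,
    "Floating-Point Unavailable": 0x0800,
    "Decrementer": 0x0900,
    "Hypervisor Decrementer": 0x0980,
    "System Call": 0x0C00,
    "Trace": 0x0D00,
    "Hypervisor Data Storage": 0x0E00,
    "Hypervisor Instruction Storage": 0x0E20,
    "Hypervisor Emulation Assistance": 0x0E40,
    "Hypervisor Maintenance": 0x0E60,
    "Performance Monitor": 0x0F00,
    "Vector Unavailable": 0x0F20,
    "VSX Unavailable": 0x0F40,
}


def vector_address(interrupt_type: str) -> int:
    """Return the 64-bit effective address of the vector for an interrupt type."""
    return VECTORS[interrupt_type]


def format_ea(ea: int) -> str:
    """Render a 64-bit effective address in the 0xhhhh_hhhh_hhhh_hhhh style."""
    groups = [f"{(ea >> shift) & 0xFFFF:04X}" for shift in (48, 32, 16, 0)]
    return "0x" + "_".join(groups)
```

For example, format_ea(vector_address("Machine Check")) yields the address at which execution resumes after a Machine Check interrupt.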
                              MSR Bit
Interrupt Type                IR   DR   FE0  FE1  EE   RI   ME   HV
System Reset                  0    0    0    0    0    0    p    1
Machine Check                 0    0    0    0    0    0    0    1
Data Storage                  0    0    0    0    0    0    -    m
Data Segment                  0    0    0    0    0    0    -    m
Instruction Storage           0    0    0    0    0    0    -    m
Instruction Segment           0    0    0    0    0    0    -    m
External                      0    0    0    0    0    h    -    e
Alignment                     0    0    0    0    0    0    -    m
Program                       0    0    0    0    0    0    -    m
FP Unavailable(3)             0    0    0    0    0    0    -    m
Decrementer                   0    0    0    0    0    0    -    m
Hypervisor Decrementer        0    0    0    0    0    -    -    1
System Call                   0    0    0    0    0    0    -    s
Trace                         0    0    0    0    0    0    -    m
Hypervisor Data Storage       0    0    0    0    0    -    -    1
Hypervisor Instr. Storage     0    0    0    0    0    -    -    1
Hypv Emulation Assistance     0    0    0    0    0    -    -    1
Hypervisor Maintenance        0    0    0    0    0    -    -    1
Performance Monitor           0    0    0    0    0    0    -    m
Vector Unavailable(1)         0    0    0    0    0    0    -    m
VSX Unavailable(2)            0    0    0    0    0    0    -    m

0   bit is set to 0
1   bit is set to 1
p   bit is set to 1 if interrupt occurred while the thread was in power-saving mode; otherwise not altered
-   bit is not altered
m   if LPES_1=0, set to 1; otherwise not altered
e   if LPES_0=0, set to 1; otherwise not altered
h   if LPES_0=1, set to 0; otherwise not altered
s   if LEV=1 or LPES_1=0, set to 1; otherwise not altered

Settings for Other Bits
Bits BE, FP, PMM, PR, SE, VEC(1), and VSX(2) are set to 0.
If the interrupt results in HV being equal to 1, the LE bit is copied from the HILE bit; otherwise the LE bit is copied from the LPCR_ILE bit.
The SF bit is set to 1.
Reserved bits are set as if written as 0.

(1) Category: Vector
(2) Category: Vector-Scalar Extension
(3) Category: Floating-Point

Figure 44. MSR setting due to interrupt

Effective Address(1)    Interrupt Type
00..0000_0100           System Reset
00..0000_0200           Machine Check
00..0000_0300           Data Storage
00..0000_0380           Data Segment
00..0000_0400           Instruction Storage
00..0000_0480           Instruction Segment
00..0000_0500           External
00..0000_0600           Alignment
00..0000_0700           Program
00..0000_0800           Floating-Point Unavailable(5)
00..0000_0900           Decrementer
00..0000_0980           Hypervisor Decrementer
00..0000_0A00           Reserved
00..0000_0B00           Reserved
00..0000_0C00           System Call
00..0000_0D00           Trace
00..0000_0E00           Hypervisor Data Storage
00..0000_0E20           Hypervisor Instruction Storage
00..0000_0E40           Hypervisor Emulation Assistance
00..0000_0E60           Hypervisor Maintenance
00..0000_0E80           Reserved
. . .                   ...
00..0000_0EFF           Reserved
00..0000_0F00           Performance Monitor
00..0000_0F20           Vector Unavailable(3)
00..0000_0F40           VSX Unavailable(4)
. . .                   ...
00..0000_0FFF           Reserved

(1) The values in the Effective Address column are interpreted as follows. 00...0000_nnnn means 0x0000_0000_0000_nnnn.
(2) Effective addresses 0x0000_0000_0000_0000 through 0x0000_0000_0000_00FF are used by software and will not be assigned as interrupt vectors.
(3) Category: Vector
(4) Category: Vector-Scalar Extension
(5) Category: Floating-Point

Figure 45. Effective address of interrupt vector by interrupt type

Programming Note
When address translation is disabled, use of any of the effective addresses that are shown as reserved in Figure 45 risks incompatibility with future implementations.

6.5.1 System Reset Interrupt

If a System Reset exception causes an interrupt that is not context synchronizing or causes the loss of a Machine Check exception or a Direct External exception, or if the state of the thread has been corrupted, the interrupt is not recoverable.

When the thread is in any power-saving level, a System Reset interrupt occurs when a System Reset exception exists. When the thread is in doze or nap power-saving levels, a System Reset interrupt occurs when any of the following exceptions exists provided that the exception is enabled to cause exit from power-saving mode (see Section 2.2, "Logical Partitioning Control Register (LPCR)"). When the thread is in sleep or rvwinkle power-saving level, it is implementation-specific whether the following exceptions, when enabled, cause exit, or whether only a System Reset exception causes exit.

- External
- Decrementer
- Hypervisor Maintenance
- Implementation-specific

SRR1 indicates the exception that caused exit from power-saving mode as specified below.

The following registers are set:

SRR0    If the interrupt did not occur when the thread was in power-saving mode, set to the effective address of the instruction that the thread would have attempted to execute next if no interrupt conditions were present; otherwise, set to an undefined value.

SRR1
33:36   Set to 0.
42:44   If the interrupt did not occur when the thread was in power-saving mode, set to an implementation-specific value. If the interrupt occurred when the thread was in power-saving mode, set to indicate the exception that caused exit from power-saving mode as shown below:

        SRR1_42:44   Exception
        000          Reserved
        001          Implementation specific
        010          System Reset
        011          Decrementer
        100          External
        101          Hypervisor Maintenance
        110          Implementation specific
        111          Implementation specific

        If multiple exceptions that cause exit from power-saving mode exist, the exception reported is the exception corresponding to the interrupt that would have occurred if the same exceptions existed and the thread was not in power-saving mode.
45      Set to 0.
46:47   Set to indicate whether the interrupt occurred when the thread was in power-saving mode and, if so, the extent to which resource state was maintained while the thread was in power-saving mode, as follows:

        00  The interrupt did not occur when the thread was in power-saving mode.
        01  The interrupt occurred when the thread was in power-saving mode. The state of all resources was maintained as if the thread was not in power-saving mode.
        10  The interrupt occurred when the thread was in power-saving mode. The state of some resources was not maintained, but the state of all hypervisor resources was maintained as if the thread was not in power-saving mode and the state of all other resources is such that the hypervisor can resume execution.
        11  The interrupt occurred when the thread was in power-saving mode. The state of some resources was not maintained, and the state of some hypervisor resources was not maintained or the state of some resources is such that the hypervisor cannot resume execution.

        Programming Note
        Although the resources that are maintained in power-saving mode (except in doze power-saving level) are implementation-dependent, the hypervisor can avoid implementation-dependence in the portion of the System Reset and Machine Check interrupt handlers that recover from having been in power-saving mode by using the contents of SRR1_46:47 to determine what state to restore. (To avoid implementation-dependence in the portion of the hypervisor that enters power-saving mode, the hypervisor must use the specification of the four instructions to determine what state to save.)
62      If the interrupt did not occur while the thread was in power-saving mode, loaded from bit 62 of the MSR if the thread is in a recoverable state; otherwise set to 0. If the interrupt occurred while the thread was in power-saving mode, set to 1 if the thread is in a recoverable state; otherwise set to 0.
Others  Loaded from the MSR.

MSR     See Figure 44 on page 828.

In addition, if the interrupt occurs when the thread is in power-saving mode and is caused by an exception other than a System Reset exception, all other registers, except HSRR0 and HSRR1, that would be set by the corresponding interrupt if the exception occurred when the thread was not in power-saving mode are set by the System Reset interrupt, and are set to the values to which they would be set if the exception occurred when the thread was not in power-saving mode.

Execution resumes at effective address 0x0000_0000_0000_0100.

The means for software to distinguish between power-on Reset and other types of System Reset are implementation-dependent.

6.5.2 Machine Check Interrupt

The causes of Machine Check interrupts are implementation-dependent. For example, a Machine Check interrupt may be caused by a reference to a storage location that contains an uncorrectable error or does not exist (see Section 5.6), or by an error in the storage subsystem.

When the thread is not in power-saving mode, Machine Check interrupts are enabled when MSR_ME=1; if MSR_ME=0 and a Machine Check exception occurs, the thread enters the Checkstop state. When the thread is in doze or nap power-saving levels, Machine Check interrupts are treated as enabled when LPCR_PECE[2]=1 and cannot occur when LPCR_PECE[2]=0. When the thread is in sleep or rvwinkle power-saving level, it is implementation-specific whether Machine Check interrupts are treated as enabled under the same conditions as in doze and nap power-saving level or if they cannot occur. If a Machine Check exception occurs while the thread is in power-saving mode and the Machine Check exception is not enabled to cause exit from power-saving mode, the result is implementation-specific.

The Checkstop state may also be entered if an access is attempted to a storage location that does not exist (see Section 5.6), or if an implementation-dependent hardware error occurs that prevents continued operation.

Disabled Machine Check (Checkstop State)

When a thread is in Checkstop state, instruction processing is suspended and generally cannot be restarted without resetting the thread. Some implementations may preserve some or all of the internal state of the thread when entering Checkstop state, so that the state can be analyzed as an aid in problem determination.

Enabled Machine Check

If a Machine Check exception causes an interrupt that is not context synchronizing or causes the loss of a Direct External exception, or if the state of the thread has been corrupted, the interrupt is not recoverable.

In some systems, the operating system may attempt to identify and log the cause of the Machine Check.

The following registers are set:

SRR0    If the interrupt did not occur while the thread was in power-saving mode, set on a "best effort" basis to the effective address of some instruction that was executing or was about to be executed when the Machine Check exception occurred; otherwise set to an undefined value.

        Programming Note
        Since the hypervisor can save the address of the instruction following the power-saving mode instruction if needed, there is no need for the thread to preserve it and store it into SRR0. Therefore, for ease of implementation, the contents of SRR0 upon exit from power-saving mode are specified to be undefined.

SRR1
46:47   Set to indicate whether the interrupt occurred when the thread was in power-saving mode and, if so, the extent to which resource state was maintained while the thread was in power-saving mode, as follows.

        00  The interrupt did not occur when the thread was in power-saving mode.
        01  The interrupt occurred when the thread was in power-saving mode. The state of all resources was maintained as if the thread was not in power-saving mode.
        10  The interrupt occurred when the thread was in power-saving mode. The state of some resources was not maintained, but the state of all hypervisor resources was maintained as if the thread was not in power-saving mode and the state of all other resources is such that the hypervisor can resume execution.
        11  The interrupt occurred when the thread was in power-saving mode. The state of some resources was not maintained, and the state of some hypervisor resources was not maintained or the state of some resources is such that the hypervisor cannot resume execution.

        Programming Note
        Although the resources that are maintained in power-saving mode (except in the doze power-saving level) are implementation-dependent, the hypervisor can avoid implementation-dependence in the portion of the System Reset and Machine Check interrupt handlers that recover from having been in power-saving mode by using the contents of SRR1_46:47 to determine what state to restore. (To avoid implementation-dependence in the portion of the hypervisor that enters power-saving mode, the hypervisor must use the specification of the four instructions to determine what state to save.)
62      If the interrupt did not occur while the thread was in power-saving mode, loaded from bit 62 of the MSR if the thread is in a recoverable state; otherwise set to 0. If the interrupt occurred while the thread was in power-saving mode, set to 1 if the thread is in a recoverable state; otherwise set to 0.
Others  Set to an implementation-dependent value.

MSR     See Figure 44.
DSISR   Set to an implementation-dependent value.
DAR     Set to an implementation-dependent value.

Execution resumes at effective address 0x0000_0000_0000_0200.

A Machine Check interrupt caused by the existence of multiple SLB entries or TLB entries (or similar entries in implementation-specific translation caches) which translate a given effective or virtual address (see Sections 5.7.6.2 and 5.7.7.3) must occur while still in the context of the partition that caused it. The interrupt must be presented in a way that permits continuing execution, with damage limited to the causing partition. Treating the exception as instruction-caused will achieve these requirements.

DSISR
32      Set to 0.
33      Set to 1 if MSR_DR=1 and the translation for an attempted access is not found in the Page Table; otherwise set to 0.
34:35   Set to 0.
36      Set to 1 if the access is not permitted by Figure 27 or 28, as appropriate; otherwise set to 0.

Programming Note
If a Machine Check interrupt is caused by an error
in the storage subsystem, the storage subsystem 37 Set to 1 if the access is due to a lq, stq, may return incorrect data, which may be placed lbarx, lharx, lwarx, ldarx, stbcx., sthcx., into registers. This corruption of register contents stwcx., or stdcx. instruction that may occur even if the interrupt is recoverable. addresses storage that is Write Through Required or Caching Inhibited; otherwise set to 0. 38 Set to 1 for a Store, dcbz, or ecowx 6.5.3 Data Storage Interrupt instruction; otherwise set to 0. A Data Storage interrupt occurs when no higher priority 39:40 Set to 0. exception exists, the value of the expression 41 Set to 1 if a Data Address Breakpoint match occurs; otherwise set to 0. (MSRHV PR = 0b10)|(¬VPM0 & ¬MSRDR) 42 Set to 1 if the access is not permitted by vir- | (¬VPM1 & MSRDR) tual page class key protection; otherwise set to 0. is 1, and a data access cannot be performed for any of the following reasons. Programming Note Storage protection violations for the Data address translation is enabled (MSRDR=1) Data Storage interrupt are reported in and the virtual address of any byte of the storage DSISR36 and DSISR42, whereas stor- location specified by a Load, Store, icbi, dcbz, age protection violations for the Instruc- dcbst, dcbf[l], eciwx, or ecowx instruction cannot tion Storage interrupt are reported in be translated to a real address. SRR135 and SRR136. The effective address specified by a lq, stq, lbarx, lharx, lwarx, ldarx, stbcx., sthcx., stwcx., or stdcx. instruction refers to storage that is Write 43 Set to 1 if execution of an eciwx or ecowx Through Required or Caching Inhibited. instruction is attempted when EARE=0; oth- The access violates storage protection. erwise set to 0. A Data Address Breakpoint match occurs. 44:63 Set to 0. Execution of an eciwx or ecowx instruction is dis- DAR Set to the effective address of a storage allowed because EARE=0. element as described in the following list. If a stbcx., sthcx., stwcx., or stdcx. 
would not perform The list should be read from the top down; its store in the absence of a Data Storage interrupt, and the DAR is set as described by the first item either (a) the specified effective address refers to stor- that corresponds to an exception that is age that is Write Through Required or Caching Inhib- reported in the DSISR. For example, if a ited, or (b) a non-conditional Store to the specified Load Word instruction causes a storage effective address would cause a Data Storage interrupt, protection violation and a Data Address it is implementation-dependent whether a Data Storage Breakpoint match (and both are reported in interrupt occurs. the DSISR), the DAR is set to the effective address of a byte in the first aligned double- If the XER specifies a length of zero for an indexed word for which access was attempted in the Move Assist instruction, a Data Storage interrupt does page that caused the exception. not occur. a Data Storage exception occurs for The following registers are set: reasons other than a Data Address Breakpoint match or, for eciwx and SRR0 Set to the effective address of the instruc- ecowx, EARE=0 tion that caused the interrupt. - a byte in the block that caused the SRR1 exception, for a Cache Manage- 33:36 Set to 0. ment instruction 42:47 Set to 0. - a byte in the first aligned quad- Others Loaded from the MSR. word for which access was attempted in the page that caused MSR See Figure 44. 832 Power ISATM Book III-S Version 2.06 the exception, for a quadword DAR Set to the effective address of a storage Load or Store instruction (i.e., a element as described in the following list. 
Load or Store instruction for which a byte in the block that caused the the storage operand is a quad- exception, for a Cache Management word; "first" refers to address instruction order: see Section 6.7) a byte in the first aligned quadword for - a byte in the first aligned double- which access was attempted in the word for which access was segment that caused the exception, for attempted in the page that caused a quadword Load or Store instruction the exception, for a non-quadword (i.e., a Load or Store instruction for Load or Store instruction or an which the storage operand is a quad- eciwx or ecowx instruction word; "first" refers to address order: undefined, for a Data Address Break- see Section 6.7) point match, or if eciwx or ecowx is a byte in the first aligned doubleword executed when EARE=0 for which access was attempted in the segment that caused the exception, for For the cases in which the DAR is specified a non-quadword Load or Store instruc- above to be set to a defined value, if the tion or an eciwx or ecowx instruction interrupt occurs in 32-bit mode the high- order 32 bits of the DAR are set to 0. If the interrupt occurs in 32-bit mode the high-order 32 bits of the DAR are set to 0. If multiple Data Storage exceptions occur for a given effective address, any one or more of the bits corre- Execution resumes at effective address sponding to these exceptions may be set to 1 in the 0x0000_0000_0000_0380. DSISR. Programming Note Execution resumes at effective address 0x0000_0000_0000_0300. A Data Segment interrupt occurs if MSRDR=1 and the translation of the effective address of any byte of the specified storage location is not found in the 6.5.4 Data Segment Interrupt SLB (or in any implementation-specific address translation lookaside information). 
A Data Segment interrupt occurs when no higher prior- ity exception exists and a data access cannot be per- formed because data address translation is enabled and the effective address of any byte of the storage 6.5.5 Instruction Storage Interrupt location specified by a Load, Store, icbi, dcbz, dcbst, An Instruction Storage interrupt occurs when no higher dcbf[l] eciwx, or ecowx instruction cannot be trans- priority exception exists, the value of the expression lated to a virtual address. (MSRHV PR = 0b10)|(¬VPM0 & ¬MSRIR) If a stbcx., sthcx., stwcx., or stdcx. would not perform its store in the absence of a Data Segment interrupt | (¬VPM1 & MSRIR) and a non-conditional Store to the specified effective is 1, and the next instruction to be executed cannot be address would cause a Data Segment interrupt, it is fetched for any of the following reasons. implementation-dependent whether a Data Segment interrupt occurs. Instruction address translation is enabled and the virtual address cannot be translated to a real If the XER specifies a length of zero for an indexed address. Move Assist instruction, a Data Segment interrupt does The fetch access violates storage protection. not occur. The following registers are set: The following registers are set: SRR0 Set to the effective address of the instruction SRR0 Set to the effective address of the instruc- that the thread would have attempted to exe- tion that caused the interrupt. cute next if no interrupt conditions were SRR1 present (if the interrupt occurs on attempting 33:36 Set to 0. to fetch a branch target, SRR0 is set to the 42:47 Set to 0. branch target address). Others Loaded from the MSR. SRR1 MSR See Figure 44. 33 Set to 1 if MSRIR=1 and the translation for an attempted access is not found in the DSISR Set to an undefined value. Page Table; otherwise set to 0. 34 Set to 0. Chapter 6. 
  35      Set to 1 if the access is to No-execute or Guarded storage;
          otherwise set to 0.
  36      Set to 1 if the access is not permitted by Figure 27 or 28,
          as appropriate; otherwise set to 0.

          Programming Note
              Storage protection violations for the Instruction
              Storage interrupt are reported in SRR1[35] and SRR1[36],
              whereas storage protection violations for the Data
              Storage interrupt are reported in DSISR[36] and
              DSISR[42].

  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44.

If multiple Instruction Storage exceptions occur due to attempting to
fetch a single instruction, any one or more of the bits corresponding
to these exceptions may be set to 1 in SRR1.

Execution resumes at effective address 0x0000_0000_0000_0400.

6.5.6 Instruction Segment Interrupt

An Instruction Segment interrupt occurs when no higher priority
exception exists and the next instruction to be executed cannot be
fetched because instruction address translation is enabled and the
effective address cannot be translated to a virtual address.

The following registers are set:

SRR0    Set to the effective address of the instruction that the
        thread would have attempted to execute next if no interrupt
        conditions were present (if the interrupt occurs on attempting
        to fetch a branch target, SRR0 is set to the branch target
        address).
SRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0480.

Programming Note
    An Instruction Segment interrupt occurs if MSR[IR]=1 and the
    translation of the effective address of the next instruction to be
    executed is not found in the SLB (or in any implementation-specific
    address translation lookaside information).

6.5.7 External Interrupt

An External interrupt is classified as being either a Direct External
interrupt or a Mediated External interrupt. Throughout this Book, usage
of the phrase "External interrupt", without further classification,
refers to both a Direct External interrupt and a Mediated External
interrupt.

6.5.7.1 Direct External Interrupt

A Direct External interrupt occurs when no higher priority exception
exists, a Direct External exception exists, and the value of the
expression

    MSR[EE] | (¬(LPES0) & (¬(MSR[HV]) | MSR[PR]))

is one. The occurrence of the interrupt does not cause the exception to
cease to exist.

When LPES0=0, the following registers are set:

HSRR0   Set to the effective address of the instruction that the
        thread would have attempted to execute next if no interrupt
        conditions were present.
HSRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

When LPES0=1, the following registers are set:

SRR0    Set to the effective address of the instruction that the
        thread would have attempted to execute next if no interrupt
        conditions were present.
SRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0500.

Programming Note
    Because the value of MSR[EE] is always 1 when the thread is in
    problem state, the simpler expression

        MSR[EE] | ¬(LPES0 | MSR[HV])

    is equivalent to the expression given above.

Programming Note
    The Direct External exception has the same meaning as the External
    exception in versions of the architecture prior to Version 2.05.

6.5.7.2 Mediated External Interrupt

A Mediated External interrupt occurs when no higher priority exception
exists, a Mediated External exception exists (see the definition of
LPCR[MER] in Section 2.2), and the value of the expression

    MSR[EE] & (¬(MSR[HV]) | MSR[PR])

is one. The occurrence of the interrupt does not cause the exception to
cease to exist.

When LPES0=0, the following registers are set:

HSRR0   Set to the effective address of the instruction that the
        thread would have attempted to execute next if no interrupt
        conditions were present.
HSRR1
  33:36   Set to 0.
  42      Set to 1.
  43:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

When LPES0=1, the following registers are set:

SRR0    Set to the effective address of the instruction that the
        thread would have attempted to execute next if no interrupt
        conditions were present.
SRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0500.

6.5.8 Alignment Interrupt

An Alignment interrupt occurs when no higher priority exception exists
and a data access cannot be performed for any of the following reasons.

  • The operand of a Floating-Point Load, a Floating-Point Store, a
    Vector-Scalar Extension Load, or a Vector-Scalar Extension Store is
    not word-aligned, or crosses a virtual page boundary.
  • The operand of lq, stq, lmw, stmw, lharx, lwarx, ldarx, sthcx.,
    stwcx., stdcx., eciwx, ecowx, lfdp, lfdpx, stfdp, or stfdpx is not
    aligned.
  • The operand of a single-register Load or Store is not aligned and
    the thread is in Little-Endian mode.
  • The instruction is lq, stq, lmw, stmw, lswi, lswx, stswi, or stswx,
    and the operand is in storage that is Write Through Required or
    Caching Inhibited, or the thread is in Little-Endian mode.
  • The operand of a Load or Store crosses a segment boundary, or
    crosses a boundary between virtual pages that have different
    storage control attributes.
  • The operand of a Load or Store is not aligned and is in storage
    that is Write Through Required or Caching Inhibited.
  • The operand of lfdp, lfdpx, stfdp, stfdpx, dcbz, lbarx, lharx,
    lwarx, ldarx, stbcx., sthcx., stwcx., or stdcx. is in storage that
    is Write Through Required or Caching Inhibited.

If a stbcx., sthcx., stwcx., or stdcx. would not perform its store in
the absence of an Alignment interrupt and the specified effective
address refers to storage that is Write Through Required or Caching
Inhibited, it is implementation-dependent whether an Alignment
interrupt occurs.

If the XER specifies a length of zero for an indexed Move Assist
instruction, an Alignment interrupt does not occur.

Setting the DSISR and DAR as described below is optional for
implementations on which Alignment interrupts occur rarely, if ever,
for cases that the Alignment interrupt handler emulates. For such
implementations, if the DSISR and DAR are not set as described below
they are set to undefined values.

The following registers are set:

SRR0    Set to the effective address of the instruction that caused
        the interrupt.
SRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44.
DSISR
  32:43   Set to 0.
  44:45   Set to bits 30:31 of the instruction if DS-form. Set to 0b00
          if D- or X-form.
  46      Set to 0.
  47:48   Set to bits 29:30 of the instruction if X-form. Set to 0b00
          if D- or DS-form.
  49      Set to bit 25 of the instruction if X-form. Set to bit 5 of
          the instruction if D- or DS-form.
  50:53   Set to bits 21:24 of the instruction if X-form. Set to bits
          1:4 of the instruction if D- or DS-form.
  54:58   Set to bits 6:10 of the instruction (RT/RS/FRT/FRS), except
          undefined for dcbz.
  59:63   Set to bits 11:15 of the instruction (RA) for update form
          instructions; set to either bits 11:15 of the instruction or
          to any register number not in the range of registers to be
          loaded for a valid form lmw, a valid form lswi, or a valid
          form lswx for which neither RA nor RB is in the range of
          registers to be loaded; otherwise undefined.
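
The Alignment interrupt DSISR layout above can be sketched in Python as
a check on the bit arithmetic. This is a minimal illustration, not part
of the architecture: the helper names ibits and alignment_dsisr are
hypothetical, the instruction form is supplied by the caller rather than
decoded, DSISR[59:63] is simply left as 0 (the RA rules are omitted),
and the lwax and lwa encodings used in the checks (primary opcode 31
with extended opcode 341, and primary opcode 58 with extended opcode 2)
are taken from Book I.

```python
def ibits(instr: int, first: int, last: int) -> int:
    """Return bits first:last of a 32-bit instruction word.

    Uses the architecture's numbering, in which bit 0 is the most
    significant bit of the word.
    """
    width = last - first + 1
    return (instr >> (31 - last)) & ((1 << width) - 1)


def alignment_dsisr(instr: int, form: str) -> int:
    """Compose DSISR[32:58] for an Alignment interrupt (sketch only).

    form is "D", "DS" or "X" and is an input here; a real handler or
    model would derive it from the opcode. DSISR[59:63] is left as 0.
    """
    if form not in ("D", "DS", "X"):
        raise ValueError("form must be 'D', 'DS' or 'X'")
    is_x = form == "X"
    f44_45 = ibits(instr, 30, 31) if form == "DS" else 0
    f47_48 = ibits(instr, 29, 30) if is_x else 0
    f49 = ibits(instr, 25, 25) if is_x else ibits(instr, 5, 5)
    f50_53 = ibits(instr, 21, 24) if is_x else ibits(instr, 1, 4)
    f54_58 = ibits(instr, 6, 10)  # RT/RS/FRT/FRS
    # The DSISR is a 32-bit register numbered 32..63, so a field whose
    # last bit is b sits (63 - b) positions above the low-order bit.
    return ((f44_45 << 18) | (f47_48 << 15) | (f49 << 14)
            | (f50_53 << 10) | (f54_58 << 5))


# lwax RT=3, RA=4, RB=5 (X-form: opcode 31, extended opcode 341)
lwax = (31 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (341 << 1)
# lwa RT=3, 0(RA=4) (DS-form: opcode 58, extended opcode 2)
lwa = (58 << 26) | (3 << 21) | (4 << 16) | 2

print(format(alignment_dsisr(lwax, "X"), "032b"))
print(format(alignment_dsisr(lwa, "DS"), "032b"))
```

With RT=3 the two printed values differ only in the fields that depend
on the instruction form (bits 44:45 and 47:53), which is why a handler
that keys on those fields has to accept either encoding for a
D-/DS-form and X-form pair of the same operation.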
DAR     Set to the effective address computed by the instruction,
        except that if the interrupt occurs in 32-bit mode the
        high-order 32 bits of the DAR are set to 0.

For an X-form Load or Store, it is acceptable for the thread to set the
DSISR to the same value that would have resulted if the corresponding
D- or DS-form instruction had caused the interrupt. Similarly, for a D-
or DS-form Load or Store, it is acceptable for the thread to set the
DSISR to the value that would have resulted for the corresponding
X-form instruction. For example, an unaligned lwax (that crosses a
protection boundary) would normally, following the description above,
cause the DSISR to be set to binary:

    000000000000 00 0 01 0 0101 ttttt ?????

where "ttttt" denotes the RT field, and "?????" denotes an undefined
5-bit value. However, it is acceptable if it causes the DSISR to be set
as for lwa, which is

    000000000000 10 0 00 0 1101 ttttt ?????

If there is no corresponding alternative form instruction (e.g., for
lwaux), the value described above is set in the DSISR.

The instruction pairs that may use the same DSISR value are:

    lhz/lhzx     lhzu/lhzux     lha/lhax     lhau/lhaux
    lwz/lwzx     lwzu/lwzux     lwa/lwax
    ld/ldx       ldu/ldux
    sth/sthx     sthu/sthux     stw/stwx     stwu/stwux
    std/stdx     stdu/stdux
    lfs/lfsx     lfsu/lfsux     lfd/lfdx     lfdu/lfdux
    stfs/stfsx   stfsu/stfsux   stfd/stfdx   stfdu/stfdux

Execution resumes at effective address 0x0000_0000_0000_0600.

Programming Note
    The architecture does not support the use of an unaligned effective
    address by lharx, lwarx, ldarx, sthcx., stwcx., stdcx., eciwx, and
    ecowx. If an Alignment interrupt occurs because one of these
    instructions specifies an unaligned effective address, the
    Alignment interrupt handler must not attempt to simulate the
    instruction, but instead should treat the instruction as a
    programming error.

6.5.9 Program Interrupt

A Program interrupt occurs when no higher priority exception exists and
one of the following exceptions arises during execution of an
instruction:

Floating-Point Enabled Exception
    A Floating-Point Enabled Exception type Program interrupt is
    generated when the value of the expression

        (MSR[FE0] | MSR[FE1]) & FPSCR[FEX]

    is 1. FPSCR[FEX] is set to 1 by the execution of a floating-point
    instruction that causes an enabled exception, including the case of
    a Move To FPSCR instruction that causes an exception bit and the
    corresponding enable bit both to be 1.

Privileged Instruction
    The following applies if the instruction is executed when
    MSR[PR] = 1.

        A Privileged Instruction type Program interrupt is generated
        when execution is attempted of a privileged instruction, or of
        an mtspr or mfspr instruction with an SPR field that contains a
        value having spr[0]=1.

    The following applies if the instruction is executed when
    MSR[HV PR] = 0b00.

        A Privileged Instruction type Program interrupt is generated
        when execution is attempted of an mtspr or mfspr instruction
        with an SPR field that designates an SPR that is accessible by
        the instruction only when the thread is in hypervisor state, or
        when execution of a hypervisor-privileged instruction is
        attempted.

        Programming Note
            These are the only cases in which a Privileged Instruction
            type Program interrupt can be generated when MSR[PR]=0.
            They can be distinguished from other causes of Privileged
            Instruction type Program interrupts by examining SRR1[49]
            (the bit in which MSR[PR] was saved by the interrupt).

Trap
    A Trap type Program interrupt is generated when any of the
    conditions specified in a Trap instruction is met.

The following registers are set:

SRR0    For all Program interrupts except a Floating-Point Enabled
        Exception type Program interrupt, set to the effective address
        of the instruction that caused the corresponding exception.

        For a Floating-Point Enabled Exception type Program interrupt,
        set as described in the following list.

        - If MSR[FE0 FE1] = 0b00, FPSCR[FEX] = 1, and an instruction is
          executed that changes MSR[FE0 FE1] to a nonzero value, set to
          the effective address of the instruction that the thread
          would have attempted to execute next if no interrupt
          conditions were present.

          Programming Note
              Recall that all instructions that can alter MSR[FE0 FE1]
              are context synchronizing, and therefore are not
              initiated until all preceding instructions have reported
              all exceptions they will cause.

        - If MSR[FE0 FE1] = 0b11, set to the effective address of the
          instruction that caused the Floating-Point Enabled Exception.
        - If MSR[FE0 FE1] = 0b01 or 0b10, set to the effective address
          of the first instruction that caused a Floating-Point Enabled
          Exception since the most recent time FPSCR[FEX] was changed
          from 1 to 0 or of some subsequent instruction.

          Programming Note
              If SRR0 is set to the effective address of a subsequent
              instruction, that instruction will not be beyond the
              first such instruction at which synchronization of
              floating-point instructions occurs. (Recall that such
              synchronization is caused by Floating-Point Status and
              Control Register instructions, as well as by execution
              synchronizing instructions and events.)

SRR1
  33:36   Set to 0.
  42      Set to 0.
  43      Set to 1 for a Floating-Point Enabled Exception type Program
          interrupt; otherwise set to 0.
  44      Set to 0.
  45      Set to 1 for a Privileged Instruction type Program interrupt;
          otherwise set to 0.
  46      Set to 1 for a Trap type Program interrupt; otherwise set to
          0.
  47      Set to 0 if SRR0 contains the address of the instruction
          causing the exception and there is only one such instruction;
          otherwise set to 1.

          Programming Note
              SRR1[47] can be set to 1 only if the exception is a
              Floating-Point Enabled Exception and either
              MSR[FE0 FE1] = 0b01 or 0b10 or MSR[FE0 FE1] has just been
              changed from 0b00 to a nonzero value. (SRR1[47] is always
              set to 1 in the last case.)

  Others  Loaded from the MSR.

  Exactly one of bits 43, 45, and 46 is set to 1.
MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0700.

Programming Note
    In versions of the architecture that precede V. 2.05, the
    conditions that now cause a Hypervisor Emulation Assistance
    interrupt instead caused an "Illegal Instruction type Program
    interrupt". This was a Program interrupt for which registers (SRR0,
    SRR1, and the MSR) were set as described above for the Privileged
    Instruction type Program interrupt, except that SRR1[44] was set to
    1 and SRR1[45] was set to 0. Thus operating systems have code to
    handle these conditions, at the Program interrupt vector location.
    For this reason, if a Hypervisor Emulation Assistance interrupt
    occurs, when the thread is not in hypervisor state, for an
    instruction that the hypervisor does not emulate, the hypervisor
    should pass control to the operating system at the operating
    system's Program interrupt vector location, with all registers
    (SRR0, SRR1, MSR, GPRs, etc.) set as if the instruction had caused
    a Privileged Instruction type Program interrupt, except with
    SRR1[44:45] set to 0b10. (The Hypervisor Emulation Assistance
    interrupt was added to the architecture in V. 2.05, and the Illegal
    Instruction type Program interrupt was removed from the
    architecture in V. 2.06. In V. 2.05 the Hypervisor Emulation
    Assistance interrupt was optional: implementations that supported
    it generated it as described in V. 2.06, and never generated an
    Illegal Instruction type Program interrupt; implementations that
    did not support it generated an Illegal Instruction type Program
    interrupt as described above.)

6.5.10 Floating-Point Unavailable Interrupt

A Floating-Point Unavailable interrupt occurs when no higher priority
exception exists, an attempt is made to execute a floating-point
instruction (including floating-point loads, stores, and moves), and
MSR[FP]=0.

The following registers are set:

SRR0    Set to the effective address of the instruction that caused
        the interrupt.
SRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0800.

6.5.11 Decrementer Interrupt

A Decrementer interrupt occurs when no higher priority exception
exists, a Decrementer exception exists, and MSR[EE]=1.

The following registers are set:

SRR0    Set to the effective address of the instruction that the
        thread would have attempted to execute next if no interrupt
        conditions were present.
SRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0900.

6.5.12 Hypervisor Decrementer Interrupt

A Hypervisor Decrementer interrupt occurs when no higher priority
exception exists, a Hypervisor Decrementer exception exists, and the
value of the following expression is 1.

    (MSR[EE] | ¬(MSR[HV]) | MSR[PR]) & HDICE

The following registers are set:

HSRR0   Set to the effective address of the instruction that the
        thread would have attempted to execute next if no interrupt
        conditions were present.
HSRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0980.

Programming Note
    Because the value of MSR[EE] is always 1 when the thread is in
    problem state, the simpler expression

        (MSR[EE] | ¬(MSR[HV])) & HDICE

    is equivalent to the expression given above.

6.5.13 System Call Interrupt

A System Call interrupt occurs when a System Call instruction is
executed.

The following registers are set:

SRR0    Set to the effective address of the instruction following the
        System Call instruction.
SRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0C00.

Programming Note
    An attempt to execute an sc instruction with LEV=1 in problem state
    should be treated as a programming error.

6.5.14 Trace Interrupt [Category: Trace]

A Trace interrupt occurs when no higher priority exception exists and
either MSR[SE]=1 and any instruction except rfid or hrfid is
successfully completed, or MSR[BE]=1 and a Branch instruction is
completed. Successful completion means that the instruction caused no
other interrupt. Thus a Trace interrupt never occurs for a System Call
instruction, or for a Trap instruction that traps. The instruction that
causes a Trace interrupt is called the "traced instruction".

When a Trace interrupt occurs, the following registers are set:

SRR0    Set to the effective address of the instruction that the
        thread would have attempted to execute next if no interrupt
        conditions were present.
SRR1
  33:36 and 42:47
          Set to an implementation-dependent value.
  Others  Loaded from the MSR.
MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0D00.

Extensions to the Trace facility are described in Appendix C.

Programming Note
    The following instructions are not traced.

    • rfid
    • hrfid
    • sc, and Trap instructions that trap
    • other instructions that cause interrupts (other than Trace
      interrupts)
    • the first instructions of any interrupt handler
    • instructions that are emulated by software

    In general, interrupt handlers can achieve the effect of tracing
    these instructions.

6.5.15 Hypervisor Data Storage Interrupt

A Hypervisor Data Storage interrupt occurs when the thread is not in
hypervisor state, no higher priority exception exists, the value of the
expression

    (VPM0 & ¬MSR[DR]) | (VPM1 & MSR[DR])

is 1, and a data access cannot be performed for any of the following
reasons.

  • Data address translation is enabled (MSR[DR]=1) and the virtual
    address of any byte of the storage location specified by a Load,
    Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction
    cannot be translated to a real address.
  • Data address translation is disabled (MSR[DR]=0), LPES1=1, and the
    virtual address of any byte of the storage location specified by a
    Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx
    instruction cannot be translated to a real address by means of the
    virtual real addressing mechanism.
  • The effective address specified by a lq, stq, lbarx, lharx, lwarx,
    ldarx, stbcx., sthcx., stwcx., or stdcx. instruction refers to
    storage that is Write Through Required or Caching Inhibited.
  • The access violates storage protection.
  • A Data Address Breakpoint match occurs.
  • Execution of an eciwx or ecowx instruction is disallowed because
    EAR[E]=0.

If a stbcx., sthcx., stwcx., or stdcx. would not perform its store in
the absence of a Hypervisor Data Storage interrupt, and either (a) the
specified effective address refers to storage that is Write Through
Required or Caching Inhibited, or (b) a non-conditional Store to the
specified effective address would cause a Hypervisor Data Storage
interrupt, it is implementation-dependent whether a Hypervisor Data
Storage interrupt occurs.

If the XER specifies a length of zero for an indexed Move Assist
instruction, a Hypervisor Data Storage interrupt does not occur.

The following registers are set:

HSRR0   Set to the effective address of the instruction that caused
        the interrupt.
HSRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 44.
HDSISR
  32      Set to 0.
  33      Set to 1 if the value of the expression

              (MSR[DR]) | ((¬MSR[DR] & VPM0) & LPES1)

          is 1 and the translation for an attempted access is not
          found in the Page Table; otherwise set to 0.
  34:35   Set to 0.
  36      Set to 1 if the access is not permitted by Figure 27 or 28,
          as appropriate; otherwise set to 0.
  37      Set to 1 if the access is due to a lq, stq, lbarx, lharx,
          lwarx, ldarx, stbcx., sthcx., stwcx., or stdcx. instruction
          that addresses storage that is Write Through Required or
          Caching Inhibited; otherwise set to 0.
  38      Set to 1 for a Store, dcbz, or ecowx instruction; otherwise
          set to 0.
  39:40   Set to 0.
  41      Set to 1 if a Data Address Breakpoint match occurs;
          otherwise set to 0.
  42      Set to 1 if the access is not permitted by virtual page
          class key protection; otherwise set to 0.

          Programming Note
              Storage protection violations for the Hypervisor Data
              Storage interrupt are reported in HDSISR[36] and
              HDSISR[42], whereas storage protection violations for
              the Hypervisor Instruction Storage interrupt are
              reported in HSRR1[35] and HSRR1[36].

  43      Set to 1 if execution of an eciwx or ecowx instruction is
          attempted when EAR[E]=0; otherwise set to 0.
  44:63   Set to 0.
HDAR    Set to the effective address of a storage element as described
        in the following list. The list should be read from the top
        down; the HDAR is set as described by the first item that
        corresponds to an exception that is reported in the HDSISR.
        For example, if a Load Word instruction causes a storage
        protection violation and a Data Address Breakpoint match (and
        both are reported in
Interrupts 839 Version 2.06 the HDSISR), the HDAR is set to the effec- cannot be translated to a real address by means of tive address of a byte in the first aligned the virtual real addressing mechanism. doubleword for which access was The fetch access violates storage protection. attempted in the page that caused the The following registers are set: exception. a Hypervisor Data Storage exception HSRR0 Set to the effective address of the instruction occurs for reasons other than a Data that the thread would have attempted to exe- Address Breakpoint match or, for cute next if no interrupt conditions were eciwx and ecowx, EARE=0 present (if the interrupt occurs on attempting - a byte in the block that caused the to fetch a branch target, HSRR0 is set to the exception, for a Cache Manage- branch target address). ment instruction HSRR1 - a byte in the first aligned quad- 33 Set to 1 if the value of the expression word for which access was (MSRIR) | ((¬MSRIR & VPM0) attempted in the page that caused & LPES1) the exception, for a quadword is 1 and the translation for an attempted Load or Store instruction (i.e., a access is not found in the Page Table; oth- Load or Store instruction for which erwise set to 0. the storage operand is a quad- 34 Set to 0. word; "first" refers to address 35 Set to 1 if the access is to No-execute or order: see Section 6.7) Guarded storage; otherwise set to 0. - a byte in the first aligned double- 36 Set to 1 if the access is not permitted by word for which access was Figure 27 or 28, as appropriate; otherwise attempted in the page that caused set to 0. 
the exception, for a non-quadword Load or Store instruction or an Programming Note eciwx or ecowx instruction Storage protection violations for the undefined, for a Data Address Break- Hypervisor Instruction Storage interrupt point match, or if eciwx or ecowx is are reported in HSRR135 and executed when EARE=0 HSRR136, whereas storage protection For the cases in which the HDAR is speci- violations for the Hypervisor Data Stor- fied above to be set to a defined value, if age interrupt are reported in HDSISR36 the interrupt occurs in 32-bit mode the high- and HDSISR42. order 32 bits of the HDAR are set to 0. If multiple Hypervisor Data Storage exceptions occur 42:46 Set to 0. for a given effective address, any one or more of the 47 Set to 0. bits corresponding to these exceptions may be set to 1 Others Loaded from the MSR. in the HDSISR. MSR See Figure 44. Execution resumes at effective address If multiple Hypervisor Instruction Storage exceptions 0x0000_0000_0000_0E00. occur due to attempting to fetch a single instruction, any one or more of the bits corresponding to these 6.5.16 Hypervisor Instruction exceptions may be set to 1 in HSRR1. Storage Interrupt Execution resumes at effective address 0x0000_0000_0000_0E10. A Hypervisor Instruction Storage interrupt occurs when the thread is not in hypervisor state, no higher priority exception exists, the value of the expression (VPM0 & ¬MSRIR) | (VPM1 & MSRIR) 6.5.17 Hypervisor Emulation is 1, and the next instruction to be executed cannot be Assistance Interrupt fetched for any of the following reasons. A Hypervisor Emulation Assistance interrupt is gener- Instruction address translation is enabled ated when execution is attempted of an illegal instruc- (MSRIR=1) and the virtual address cannot be tion, or of a reserved instruction or an instruction that is translated to a real address. not provided by the implementation. It is also generated Instruction address translation is disabled under the following conditions. 
(MSRIR=0), LPES1 =1, and the virtual address 840 Power ISATM Book III-S Version 2.06 an mtspr or mfspr instruction is executed when Others Loaded from the MSR. MSRPR=1 if the instruction specifies an SPR with MSR See Figure 44 on page 828. spr0=0 that is not provided by the implementation an mtspr or mfspr instruction is executed when HMER See Section 6.2.8 on page 823. MSRPR=0 if the instruction specifies SPR 0 an mfspr instruction is executed when MSRPR=0 if The exception bits in the HMER are sticky; that is, once the instruction specifies SPR 4, 5, or 6 set to 1 they remain set to 1 until they are set to 0 by an A Hypervisor Emulation Assistance interrupt may be mthmer instruction. generated when execution is attempted of any of the Execution resumes at effective address following kinds of instruction. 0x0000_0000_0000_0E60. an instruction that is in invalid form an lswx instruction for which RA or RB is in the range of registers to be loaded Programming Note The following registers are set: Because the value of MSREE is always 1 when the HSRR0 Set to the effective address of the instruc- thread is in problem state, the simpler expression tion that caused the interrupt. (MSREE | ¬(MSRHV)) HSRR1 33:36 Set to 0. is equivalent to the expression given above. 42:47 Set to 0. Others Loaded from the MSR. Programming Note MSR See Figure 44 on page 828. If an implementation uses the HMER to record that HEIR Set to a copy of the instruction that caused a readable resource, such as the Time Base, has the interrupt been corrupted, then, because the HMI is disabled in the hypervisor state, it is necessary for the Execution resumes at effective address hypervisor to check HMER after reading that 0x0000_0000_0000_0E40. resource to be sure an error has not occurred. 
Programming Note If a Hypervisor Emulation Assistance interrupt occurs, when the thread is not in hypervisor state, for an instruction that the hypervisor does not emu- 6.5.19 Performance Monitor late, the hypervisor should pass control to the oper- Interrupt [Category: Server.Perfor- ating system as if the instruction had caused an "Illegal Instruction type Program interrupt", as mance Monitor] described in a Programming Note near the end of The Performance Monitor interrupt is part of the Perfor- Section 6.5.9, "Program Interrupt" on page 837. mance Monitor facility; see Appendix C. If the Perfor- mance Monitor facility is not implemented or does not use this interrupt, the corresponding interrupt vector 6.5.18 Hypervisor Maintenance (see Figure 45 on page 829) is treated as reserved. Interrupt A Hypervisor Maintenance interrupt occurs when no 6.5.20 Vector Unavailable Inter- higher priority exception exists, a Hypervisor Mainte- rupt [Category: Vector] nance exception exists (a bit in the HMER is set to one), the exception is enabled in the HMEER, and the A Vector Unavailable interrupt occurs when no higher value of the following expression is 1. priority exception exists, an attempt is made to execute a Vector instruction (including Vector loads, stores, and (MSREE | ¬(MSRHV) | MSRPR ) moves), and MSRVEC=0. The following registers are set: The following registers are set: HSRR0 Set to the effective address of the instruc- SRR0 Set to the effective address of the instruc- tion that the thread would have attempted tion that caused the interrupt. to execute next if no interrupt conditions were present. SRR1 33:36 Set to 0. HSRR1 42:47 Set to 0. 33:36 Set to 0. Others Loaded from the MSR. 42:47 Set to 0. Chapter 6. Interrupts 841 Version 2.06 MSR See Figure 44 on page 828. Execution resumes at effective address 0x0000_0000_0000_0F20. 
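As an illustration of the HDSISR layout given in Section 6.5.15, the following sketch decodes an HDSISR value into the conditions listed there and evaluates the enabling expression for the Hypervisor Data Storage interrupt. It is not part of the architecture. The function and table names are hypothetical, and the sketch assumes the HDSISR value has been read into an integer whose least-significant bit is architecture bit 63, so that bit n corresponds to the mask 1 << (63 - n).

```python
# Illustrative sketch only; names below are hypothetical, not architected.
# Bits are numbered from the most-significant end, and HDSISR bits are
# numbered 32:63, so bit n corresponds to the mask 1 << (63 - n).

HDSISR_BITS = {
    33: "translation not found in the Page Table",
    36: "access not permitted by Figure 27 or 28",
    37: "lq/stq/l*arx/st*cx. to Write Through Required or Caching Inhibited storage",
    38: "Store, dcbz, or ecowx instruction",
    41: "Data Address Breakpoint match",
    42: "access not permitted by virtual page class key protection",
    43: "eciwx or ecowx attempted when EAR_E=0",
}


def hdsisr_mask(bit):
    """Return the integer mask for HDSISR bit `bit` (32..63)."""
    if not 32 <= bit <= 63:
        raise ValueError("HDSISR bits are numbered 32:63")
    return 1 << (63 - bit)


def decode_hdsisr(value):
    """List the descriptions of all documented bits that are set in `value`."""
    return [text for bit, text in sorted(HDSISR_BITS.items())
            if value & hdsisr_mask(bit)]


def hdsi_expression(vpm0, vpm1, msr_dr):
    """Evaluate (VPM_0 & ¬MSR_DR) | (VPM_1 & MSR_DR); each argument is 0 or 1."""
    return (vpm0 & (1 - msr_dr)) | (vpm1 & msr_dr)
```

For example, a value with bits 36 and 42 set decodes to the two storage protection entries, which matches the Programming Note stating that protection violations are reported in HDSISR_36 and HDSISR_42.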
6.5.21 VSX Unavailable Interrupt [Category: VSX]

A VSX Unavailable interrupt occurs when no higher priority exception exists, an attempt is made to execute a VSX instruction (including VSX loads, stores, and moves), and MSR_VSX=0.

The following registers are set:

SRR0    Set to the effective address of the instruction that caused the interrupt.

SRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.

MSR     See Figure 44 on page 828.

Execution resumes at effective address 0x0000_0000_0000_0F40.

6.6 Partially Executed Instructions

If a Data Storage, Data Segment, Alignment, system-caused, or imprecise exception occurs while a Load or Store instruction is executing, the instruction may be aborted. In such cases the instruction is not completed, but may have been partially executed in the following respects.

- Some of the bytes of the storage operand may have been accessed, except that if access to a given byte of the storage operand would violate storage protection, that byte is neither copied to a register by a Load instruction nor modified by a Store instruction. Also, the rules for storage accesses given in Section 5.8.1, "Guarded Storage" and in Section 2.1 of Book II are obeyed.
- Some registers may have been altered as described in the Book II section cited above.
- Reference and Change bits may have been updated as described in Section 5.7.8.
- For a stbcx., sthcx., stwcx., or stdcx. instruction that is executed in-order, CR0 may have been set to an undefined value and the reservation may have been cleared.

The architecture does not support continuation of an aborted instruction but intends that the aborted instruction be re-executed if appropriate.

Programming Note
    An exception may result in the partial execution of a Load or Store instruction. For example, if the Page Table Entry that translates the address of the storage operand is altered, by a program running on another thread, such that the new contents of the Page Table Entry preclude performing the access, the alteration could cause the Load or Store instruction to be aborted after having been partially executed.

    As stated in the Book II section cited above, if an instruction is partially executed the contents of registers are preserved to the extent that the instruction can be re-executed correctly. The consequent preservation is described in the following list. For any given instruction, zero, one, or two items in the list apply.

    - For a fixed-point Load instruction that is not a multiple or string form, or for an eciwx instruction, if RT=RA or RT=RB then the contents of register RT are not altered.
    - For an lq instruction, if RT+1 = RA then the contents of register RT+1 are not altered.
    - For an update form Load or Store instruction, the contents of register RA are not altered.

6.7 Exception Ordering

Since multiple exceptions can exist at the same time and the architecture does not provide for reporting more than one interrupt at a time, the generation of more than one interrupt is prohibited. Some exceptions, such as the Mediated External exception, persist and can be deferred. However, other exceptions would be lost if they were not recognized and handled when they occur. For example, if an External interrupt was generated when a Data Storage exception existed, the Data Storage exception would be lost. If the Data Storage exception was caused by a Store Multiple instruction for which the storage operand crosses a virtual page boundary and the exception was a result of attempting to access the second virtual page, the store could have modified locations in the first virtual page even though it appeared that the Store Multiple instruction was never executed.

For the above reasons, all exceptions are prioritized with respect to other exceptions that may exist at the same instant to prevent the loss of any exception that is not persistent. Some exceptions cannot exist at the same instant as some others.

Data Storage, Hypervisor Data Storage, and Alignment exceptions occur as if the storage operand were accessed one byte at a time in order of increasing effective address (with the obvious caveat if the operand includes both the maximum effective address and effective address 0).

6.7.1 Unordered Exceptions

The exceptions listed here are unordered, meaning that they may occur at any time regardless of the state of the interrupt processing mechanism. These exceptions are recognized and processed when presented.

1. System Reset
2. Machine Check

6.7.2 Ordered Exceptions

The exceptions listed here are ordered with respect to the state of the interrupt processing mechanism. In the following list, the hypervisor forms of the Data Storage and Instruction Storage exceptions can be substituted for the non-hypervisor forms since the hypervisor forms cannot be caused by the same instruction and have the same ordering.

System-Caused or Imprecise

1. Program
   - Imprecise Mode Floating-Point Enabled Exception
2. Hypervisor Maintenance
3. External and [Hypervisor] Decrementer

Instruction-Caused and Precise

1. Instruction Segment
2. [Hypervisor] Instruction Storage
3.a Hypervisor Emulation Assistance
3.b Program
    - Privileged Instruction
4. Function-Dependent
   4.a Fixed-Point and Branch
       1a Program
          - Trap
       1b System Call
       1c [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
       2  Trace
   4.b Floating-Point
       1  FP Unavailable
       2a Program
          - Precise Mode Floating-Point Enabled Exception
       2b [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
       3  Trace
   4.c Vector
       1  Vector Unavailable
       2a [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
       3  Trace

For implementations that execute multiple instructions in parallel using pipeline or superscalar techniques, or combinations of these, it can be difficult to understand the ordering of exceptions. To understand this ordering it is useful to consider a model in which each instruction is fetched, then decoded, then executed, all before the next instruction is fetched. In this model, the exceptions that a single instruction would generate are in the order shown in the list of instruction-caused exceptions. Exceptions with different numbers have different ordering. Exceptions with the same numbering but different lettering are mutually exclusive and cannot be caused by the same instruction. The External, Decrementer, and Hypervisor Decrementer interrupts have equal ordering. Similarly, where Data Storage, Data Segment, and Alignment exceptions are listed in the same item they have equal ordering.

Even on threads that are capable of executing several instructions simultaneously, or out of order, instruction-caused interrupts (precise and imprecise) occur in program order.

6.8 Interrupt Priorities

This section describes the relationship of nonmaskable, maskable, precise, and imprecise interrupts. In the following descriptions, the interrupt mechanism waiting for all possible exceptions to be reported includes only exceptions caused by previously initiated instructions (e.g., it does not include waiting for the Decrementer to step through zero). The exceptions are listed in order of highest to lowest priority. The phrase "corresponding interrupt" means the interrupt having the same name as the exception unless the thread is in power-saving mode, in which case the phrase means the System Reset interrupt.

Unless otherwise stated or obvious from context, it is assumed below that one of the following conditions is satisfied.

- The thread is not in power-saving mode and the interrupt, unless it is the Machine Check interrupt, is not disabled. (For the Machine Check interrupt no assumption is made regarding enablement.)
- The thread is in power-saving mode and the exception is enabled to cause exit from the mode.

In the following list, the hypervisor forms of the Data Storage and Instruction Storage exceptions can be substituted for the non-hypervisor forms since the hypervisor forms cannot be caused by the same instruction and have the same priority.

1. System Reset

   System Reset exception has the highest priority of all exceptions. If this exception exists, the interrupt mechanism ignores all other exceptions and generates a System Reset interrupt.

   Once the System Reset interrupt is generated, no nonmaskable interrupts are generated due to exceptions caused by instructions issued prior to the generation of this interrupt.

2. Machine Check

   Machine Check exception is the second highest priority exception. If this exception exists and a System Reset exception does not exist, the interrupt mechanism ignores all other exceptions and generates a Machine Check interrupt.

   Once the Machine Check interrupt is generated, no nonmaskable interrupts are generated due to exceptions caused by instructions issued prior to the generation of this interrupt.

3. Instruction-Dependent

   This exception is the third highest priority exception. When this exception is created, the interrupt mechanism waits for all possible Imprecise exceptions to be reported. It then generates the appropriate ordered interrupt if no higher priority exception exists when the interrupt is to be generated. Within this category a particular instruction may present more than a single exception. When this occurs, those exceptions are ordered in priority as indicated in the following lists. Where [Hypervisor] Data Storage, Data Segment, and Alignment exceptions are listed in the same item they have equal priority (i.e., the hardware may generate any one of the three interrupts for which an exception exists).

   A. Fixed-Point Loads and Stores
      a. These exceptions are mutually exclusive and have the same priority:
         - Hypervisor Emulation Assistance
         - Program - Privileged Instruction
      b. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
      c. Trace

   B. Floating-Point Loads and Stores
      a. Hypervisor Emulation Assistance
      b. Floating-Point Unavailable
      c. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
      d. Trace

   C. Vector Loads and Stores
      a. Hypervisor Emulation Assistance
      b. Vector Unavailable
      c. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
      d. Trace

   D. Other Floating-Point Instructions
      a. Floating-Point Unavailable
      b. Program - Precise Mode Floating-Point Enabled Exception
      c. Trace

   E. Other Vector Instructions
      a. Vector Unavailable
      b. Trace

   F. rfid, hrfid and mtmsr[d]
      a. Program - Privileged Instruction
      b. Program - Floating-Point Enabled Exception
      c. Trace, for mtmsr[d] only

   G. Other Instructions
      a. These exceptions are mutually exclusive and have the same priority:
         - Program - Trap
         - System Call
         - Program - Privileged Instruction
         - Hypervisor Emulation Assistance
      b. Trace

   H. [Hypervisor] Instruction Storage and Instruction Segment

      These exceptions have the lowest priority in this category. They are recognized only when all instructions prior to the instruction causing one of these exceptions appear to have completed and that instruction is the next instruction to be executed. The two exceptions are mutually exclusive.

      The priority of these exceptions is specified for completeness and to ensure that they are not given more favorable treatment. It is acceptable for an implementation to treat these exceptions as though they had a lower priority.

4. Program - Imprecise Mode Floating-Point Enabled Exception

   This exception is the fourth highest priority exception. When this exception is created, the interrupt mechanism waits for all other possible exceptions to be reported. It then generates this interrupt if no higher priority exception exists when the interrupt is to be generated.

5. Hypervisor Maintenance

   This exception is the fifth highest priority exception. When this exception is created, the interrupt mechanism waits for all other possible exceptions to be reported. It then generates this interrupt if no higher priority exception exists when the interrupt is to be generated.

   If a Hypervisor Maintenance exception exists and each attempt to execute an instruction when the Hypervisor Maintenance interrupt is enabled causes an exception (see the Programming Note below), the Hypervisor Maintenance interrupt is not delayed indefinitely.

6. Direct External, Mediated External, and [Hypervisor] Decrementer

   These exceptions are the lowest priority exceptions. All have equal priority (i.e., the hardware may generate any one of the corresponding interrupts for which an exception exists). When one of these exceptions is created, the interrupt processing mechanism waits for all other possible exceptions to be reported. It then generates the corresponding interrupt if no higher priority exception exists when the interrupt is to be generated.

   If a Hypervisor Decrementer exception exists and each attempt to execute an instruction when the Hypervisor Decrementer interrupt is enabled causes an exception (see the Programming Note below), the Hypervisor Decrementer interrupt is not delayed indefinitely.

   If LPES_0=1 and a Direct External exception exists and each attempt to execute an instruction when this interrupt is enabled causes an exception (see the Programming Note below), the Direct External interrupt is not delayed indefinitely.

Programming Note
    An incorrect or malicious operating system could corrupt the first instruction in the interrupt vector location for an instruction-caused interrupt such that the attempt to execute the instruction causes the same exception that caused the interrupt (a looping interrupt; e.g., Trap instruction and Program interrupt). Similarly, the first instruction of the interrupt vector for one instruction-caused interrupt could cause a different instruction-caused interrupt, and the first instruction of the interrupt vector for the second instruction-caused interrupt could cause the first instruction-caused interrupt (e.g., Program interrupt and Floating-Point Unavailable interrupt). Similarly, if the Real Mode Area is virtualized and there is no PTE for the page containing the interrupt vectors, every attempt to execute the first instruction of the OS's Instruction Storage interrupt handler would cause a Hypervisor Instruction Storage interrupt; if the Hypervisor Instruction Storage interrupt handler returns to the OS's Instruction Storage interrupt handler without the relevant PTE having been created, another Hypervisor Instruction Storage interrupt would occur immediately. The looping caused by these and similar cases is terminated by the occurrence of a System Reset or Hypervisor Decrementer interrupt.

Chapter 7. Timer Facilities

7.1 Overview
7.2 Time Base (TB)
7.2.1 Writing the Time Base
7.3 Decrementer
7.3.1 Writing and Reading the Decrementer
7.4 Hypervisor Decrementer
7.5 Processor Utilization of Resources Register (PURR)
7.6 Scaled Processor Utilization of Resources Register (SPURR)

7.1 Overview

See Chapter 5 of Book II for information about the update frequency of the Time Base.
The Time Base, Decrementer, Hypervisor Decrementer, Processor Utilization of Resources, and Scaled Processor Utilization of Resources registers provide timing functions for the system. The remainder of this section describes these registers and related facilities.

7.2 Time Base (TB)

The Time Base (TB) is a 64-bit register (see Figure 46) containing a 64-bit unsigned integer that is incremented periodically.

    Field   Bits    Description
    TBU40   0:39    Upper 40 bits of Time Base
    TBU     0:31    Upper 32 bits of Time Base
    TBL     32:63   Lower 32 bits of Time Base

Figure 46. Time Base

The Time Base is a hypervisor resource; see Chapter 2.

The SPRs TBU40, TBU, and TBL provide access to the fields of the Time Base shown in Figure 46. When a mtspr instruction is executed specifying one of these SPRs, the associated field of the Time Base is altered and the remaining bits of the Time Base are not affected.

The Time Base is implemented such that:

1. Loading a GPR from the Time Base has no effect on the accuracy of the Time Base.
2. Copying the contents of a GPR to the Time Base replaces the contents of the Time Base with the contents of the GPR.

The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock in a Power ISA system. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following.

- The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base changes, and a means to determine what the current update frequency is.
- The update frequency of the Time Base is under the control of the system software.

Implementations must provide a means for either preventing the Time Base from incrementing or preventing it from being read in problem state (MSR_PR=1). If the means is under software control, it must be privileged and, in implementations of the Server environment, must be accessible only in hypervisor state (MSR_HV PR = 0b10). There must be a method for getting all Time Bases in the system to start incrementing with values that are identical or almost identical.

Programming Note
    If software initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries.

    Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 2^64 - 1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values.

    Successive readings of the Time Base may return identical values.

    If Time Base bits 60:63 are used as part of a random number generator, software must account for the fact that these bits are set to 0x0 only when bit 59 changes state regardless of whether or not they incremented to 0xF since they were previously set to 0x0.

    See the description of the Time Base in Chapter of Book II for ways to compute time of day in POSIX format from the Time Base.

7.2.1 Writing the Time Base

Writing the Time Base is privileged, and can be done only in hypervisor state. Reading the Time Base is not privileged; it is discussed in Chapter 5 of Book II.

It is not possible to write the entire 64-bit Time Base using a single instruction. The mttbl and mttbu extended mnemonics write the lower and upper halves of the Time Base (TBL and TBU), respectively, preserving the other half. These are extended mnemonics for the mtspr instruction; see Appendix A, "Assembler Extended Mnemonics" on page 867.

The Time Base can be written by a sequence such as:

    lwz     Rx,upper      # load 64-bit value for
    lwz     Ry,lower      # TB into Rx and Ry
    li      Rz,0
    mttbl   Rz            # set TBL to 0
    mttbu   Rx            # set TBU
    mttbl   Ry            # set TBL

Provided that no interrupts occur while the last three instructions are being executed, loading 0 into TBL prevents the possibility of a carry from TBL to TBU while the Time Base is being initialized.

The preferred method of changing the Time Base utilizes the TBU40 facility. The following code sequence demonstrates the process. Assume the upper 40 bits of Rx contain the desired value of the upper 40 bits of the Time Base.

    mftb    Ry            # read 64-bit Time Base value
    clrldi  Ry,Ry,40      # lower 24 bits of old TB
    mttbu40 Rx            # write upper 40 bits of TB
    mftb    Rz            # read TB value again
    clrldi  Rz,Rz,40      # lower 24 bits of new TB
    cmpld   Rz,Ry         # compare new and old lower 24 bits
    bge     done          # no carry out of low 24 bits
    addis   Rx,Rx,0x0100  # increment upper 40 bits
    mttbu40 Rx            # update to adjust for carry
  done:

Programming Note
    The instructions for writing the Time Base are mode-independent. Thus code written to set the Time Base will work correctly in either 64-bit or 32-bit mode.

7.3 Decrementer

The Decrementer (DEC) is a 32-bit decrementing counter that provides a mechanism for causing a Decrementer interrupt after a programmable delay. The contents of the Decrementer are treated as a signed integer.

    Field   Bits
    DEC     32:63

Figure 47. Decrementer

The Decrementer counts down.

Decrementer bits 32:59 count down until their value becomes 0x000_0000; at the next decrement their value becomes 0xFFF_FFFF. Decrementer bits 60:63 may decrement at a variable rate. When the value of bit 59 changes, bits 60:63 are set to 0xF; if bits 60:63 decrement to 0x0 before the value of bit 59 changes, they remain at 0x0 until the value of bit 59 changes.

The Decrementer is driven at the same frequency as the Time Base. The period of the Decrementer will depend on the driving frequency, but if the same values are used as given earlier for the Time Base (see Section 5.2 of Book II), and if the Time Base update frequency is constant, the period would be

    T_DEC = (2^32 × 32) / 1 GHz = 137 seconds.

When the contents of DEC_32 change from 0 to 1, a Decrementer exception will come into existence within a reasonable period of time. When the contents of DEC_32 change from 1 to 0, the existing Decrementer exception, if any, will cease to exist within a reasonable period of time, but not later than the completion of the next context synchronizing instruction or event.

The preceding paragraph applies regardless of whether the change in the contents of DEC_32 is the result of decrementation of the Decrementer by the hardware or of modification of the Decrementer caused by execution of an mtspr instruction.

The operation of the Decrementer has the following additional properties.

1. Loading a GPR from the Decrementer has no effect on the accuracy of the Time Base.
2. Copying the contents of a GPR to the Decrementer replaces the contents of the Decrementer with the contents of the GPR.

Programming Note
    In systems that change the Time Base update frequency for purposes such as power management, the Decrementer input frequency will also change. Software must be aware of this in order to set interval timers.

    If Decrementer bits 60:63 are used as part of a random number generator, software must account for the fact that these bits are set to 0xF only when bit 59 changes state regardless of whether or not they decremented to 0x0 since they were previously set to 0xF.

7.3.1 Writing and Reading the Decrementer

The contents of the Decrementer can be read or written using the mfspr and mtspr instructions, both of which are privileged when they refer to the Decrementer. Using an extended mnemonic (see Appendix A, "Assembler Extended Mnemonics" on page 867), the Decrementer can be written from GPR Rx using:

    mtdec   Rx

The Decrementer can be read into GPR Rx using:

    mfdec   Rx

Copying the Decrementer to a GPR has no effect on the Decrementer contents or on the interrupt mechanism.

7.4 Hypervisor Decrementer

The Hypervisor Decrementer (HDEC) is a 32-bit decrementing counter that provides a mechanism for causing a Hypervisor Decrementer interrupt after a programmable delay. The contents of the Hypervisor Decrementer are treated as a signed integer.

    Field   Bits
    HDEC    32:63

Figure 48. Hypervisor Decrementer

The Hypervisor Decrementer is a hypervisor resource; see Chapter 2.

Hypervisor Decrementer bits 32:59 count down until their value becomes 0x000_0000; at the next decrement their value becomes 0xFFF_FFFF. Bits 60:63 may decrement at a variable rate. When the value of bit 59 changes, bits 60:63 are set to 0xF; if bits 60:63 decrement to 0x0 before the value of bit 59 changes, they remain at 0x0 until the value of bit 59 changes.

The Hypervisor Decrementer is driven at the same frequency as the Time Base. The period of the Hypervisor Decrementer will depend on the driving frequency, but if the same values are used as given above for the Time Base (see Section 7.2), and if the Time Base update frequency is constant, the period would be

    T_DEC = (2^32 × 32) / 1 GHz = 137 seconds.

When the contents of HDEC_32 change from 0 to 1 and the thread is not in a power-saving mode, a Hypervisor Decrementer exception will come into existence within a reasonable period of time. When a Hypervisor Decrementer interrupt occurs, the existing Hypervisor Decrementer exception will cease to exist within a reasonable period of time, but not later than the completion of the next context synchronizing instruction or event. Even if multiple HDEC_32 transitions from 0 to 1 occur before a Hypervisor Decrementer interrupt occurs, at most one Hypervisor Decrementer exception exists.

The preceding paragraph applies regardless of whether the change in the contents of HDEC_32 is the result of decrementation of the Hypervisor Decrementer by the hardware or of modification of the Hypervisor Decrementer caused by execution of an mtspr instruction.

The operation of the Hypervisor Decrementer has the following additional properties.

1. Loading a GPR from the Hypervisor Decrementer has no effect on the accuracy of the Hypervisor Decrementer.
2. Copying the contents of a GPR to the Hypervisor Decrementer replaces the contents of the Hypervisor Decrementer with the contents of the GPR.

Programming Note
    In systems that change the Time Base update frequency for purposes such as power management, the Hypervisor Decrementer update frequency will also change. Software must be aware of this in order to set interval timers.

    If Hypervisor Decrementer bits 60:63 are used as part of a random number generator, software must account for the fact that these bits are set to 0xF only when bit 59 changes state regardless of whether or not they decremented to 0x0 since they were previously set to 0xF.

Programming Note
    A Hypervisor Decrementer exception is not created if the thread is in a power-saving mode when HDEC_32 changes from 0 to 1 because having a Hypervisor Decrementer interrupt occur almost immediately after exiting the power-saving mode in this case is deemed unnecessary. The hypervisor already has control, and if a timed exit from the power-saving mode is necessary and possible, the hypervisor can use the Decrementer to exit the power-saving mode at the appropriate time. For sleep and rvwinkle power-saving levels, the state of the Hypervisor Decrementer and Decrementer is not necessarily maintained and updated.

7.5 Processor Utilization of Resources Register (PURR)

The Processor Utilization of Resources Register (PURR) is a 64-bit counter, the contents of which provide an estimate of the resources used by the thread. The contents of the PURR are treated as a 64-bit unsigned integer.

    Field   Bits
    PURR    0:63

Figure 49. Processor Utilization of Resources Register

The PURR is a hypervisor resource; see Chapter 2.

The contents of the PURR increase monotonically, unless altered by software, until the sum of the contents plus the amount by which it is to be increased exceeds 0xFFFF_FFFF_FFFF_FFFF (2^64 - 1), at which point the contents are replaced by that sum modulo 2^64. There is no interrupt or other indication when this occurs.

The rate at which the value represented by the contents of the PURR increases is an estimate of the portion of resources used by the thread per unit time with respect to other threads that share those resources monitored by the PURR.

which the PURR value increases is implementation dependent.

Let the difference between the value represented by the contents of the Time Base at times Ta and Tb be Tab. Let the difference between the value represented by the contents of the PURR at time Ta and Tb be the value Pab. The ratio of Pab/Tab is an estimate of the percentage of shared resources used by the thread during the interval Tab. For the set {S} of threads that share the resources monitored by the PURR, the sum of the usage estimates for all the threads in the set is 1.0.

The definition of the set of threads S, the shared resources corresponding to the set S, and specifics of the algorithm for incrementing the PURR are implementation-specific.

The PURR is implemented such that:

1. Loading a GPR from the PURR has no effect on the accuracy of the PURR.
2. Copying the contents of a GPR to the PURR replaces the contents of the PURR with the contents of the GPR.

Programming Note
    Estimates computed as described above may be useful for purposes related to resource utilization, including utilization-based system management and planning.

    Because the rate at which the PURR accumulates resource usage estimates is dependent on the frequency at which the Time Base is incremented, and the frequency of the oscillator that drives instruction execution may vary independently from that of the Time Base, the interpretation of the contents of the PURR may be inaccurate as a measurement of capacity consumption for accounting purposes. The SPURR should be used for accounting purposes.

7.6 Scaled Processor Utilization of Resources Register (SPURR)

The Scaled Processor Utilization of Resources Register (SPURR) is a 64-bit counter, the contents of which provide an estimate of the resources used by the thread. The contents of the SPURR are treated as a 64-bit unsigned integer.

    Field   Bits
    SPURR   0:63

Figure 50. Scaled Processor Utilization of Resources Register

The SPURR is a hypervisor resource; see Section 2.7.
When the thread is idle, the rate at 852 Power ISATM Book III-S Version 2.06 The contents of the SPURR increase monotonically, unless altered by software, until the sum of the contents plus the amount by which it is to be increased exceed 0xFFFF_FFFF_FFFF_FFFF (264 - 1) at which point the contents are replaced by that sum modulo 264. There is no interrupt or other indication when this occurs. The rate at which the value represented by the contents of the SPURR increases is an estimate of the portion of resources used by the thread with respect to other threads that share those resources monitored by the SPURR, and relative to the computational capacity pro- vided by those resources. The computational capacity provided by the shared resources may vary as a func- tion of the frequency of the oscillator which drives the resources or as a result of deliberate delays in process- ing that are created to reduce power consumption. When the thread is idle, the rate at which the SPURR value increases is implementation dependent. Let the difference between the value represented by the contents of the Time Base at times Ta and Tb be Tab. Let the ratio of the effective and nominal frequen- cies of the oscillator driving instruction execution fe/fn be fr. Let the ratio of delay cycles created by power reduction circuitry and total cycles cd/ct be cr. Let the difference between the value represented by the con- tents of the SPURR at time Ta and Tb be the value Sab. The ratio of Sab/(Tab x fr x (1 - cr)) is an estimate of the percentage of shared resource capacity used by the thread during the interval Tab. For the set {S} of threads that share the resources monitored by the SPURR, the sum of the usage estimates for all the threads in the set is 1.0. The definition of the set of threads S, the shared resources corresponding to the set S, and specifics of the algorithm for incrementing the SPURR are imple- mentation-specific. The SPURR is implemented such that: 1. 
Loading a GPR from the SPURR has no effect on the accuracy of the SPURR.
2. Copying the contents of a GPR to the SPURR replaces the contents of the SPURR with the contents of the GPR.

Programming Note

    Estimates computed as described above may be useful for purposes of resource use accounting, program dispatching, etc.

Chapter 8. Debug Facilities

8.1 Overview
8.1.1 Come-From Address Register
8.1.2 Data Address Breakpoint

8.1 Overview

Implementations provide debug facilities to enable hardware and software debug functions, such as control flow tracing, data address breakpoints, and program single-stepping. The debug facilities described in this section consist of the Come-From Address Register (see Section 8.1.1), and the Data Address Breakpoint Register (DABR) and Data Address Breakpoint Register Extension (DABRX) (see Section 8.1.2). The interrupt associated with the Data Address Breakpoint registers is described in Section 6.5.3. The Trace facility, which can be used for single-stepping as well as for control flow tracing, is described in Section 6.5.14.

The mfspr and mtspr instructions (see Section 4.4.4) provide access to the registers of the debug facilities.

In addition to the facilities mentioned above, implementations typically provide debug facilities, modes, and access mechanisms that are implementation-specific. For example, implementations typically provide facilities for instruction address tracing, and also access to certain debug facilities via a dedicated interface such as the IEEE 1149.1 Test Access Port (JTAG).

8.1.1 Come-From Address Register

The Come-From Address Register (CFAR) is a 64-bit register. When an rfid instruction is executed, the register is set to the effective address of the instruction. When a Branch instruction is executed and the branch is taken, the register is set to the effective address of an instruction in the instruction cache block containing the Branch instruction, except that if the Branch instruction is a B-form Branch (i.e. bc, bca, bcl, or bcla) for which the target address is in the instruction cache block containing the Branch instruction or is in the previous or next cache block, the register is not necessarily set. For Branch instructions, the setting need not occur until a subsequent context synchronizing operation has occurred.

    CFAR                                                    //
    0                                                       62 63

Figure 51. Come-From Address Register

The contents of the CFAR can be read and written using the mfspr and mtspr instructions. Access to the CFAR is privileged.

Programming Note

    This register can be used for purposes of debugging software. For example, often a software bug results in the program executing a portion of the code that it should not have reached or causing an unexpected interrupt. In the former case, a breakpoint can be placed in the portion of the code that was erroneously reached and the program reexecuted. In either case, the interrupt handler can save the contents of the CFAR (before executing the first instruction that would modify the register), and then make the saved contents available for a debugger to use in determining the control flow path by which the exception was reached.

    In order to preserve the CFAR's contents for each partition and to prevent it from being used to implement a "covert channel" between partitions, the hypervisor should initialize/save/restore the CFAR when switching partitions on a given thread.

8.1.2 Data Address Breakpoint

The Data Address Breakpoint mechanism provides a means of detecting load and store accesses to a designated doubleword. The address comparison is done on an effective address (EA).

The Data Address Breakpoint mechanism is controlled by the Data Address Breakpoint Register (DABR), shown in Figure 52, and the Data Address Breakpoint Register Extension (DABRX), shown in Figure 53.

    DAB                                          BT   DW   DR
    0                                            61   62   63

    Bit(s)   Name   Description
    0:60     DAB    Data Address Breakpoint
    61       BT     Breakpoint Translation
    62       DW     Data Write
    63       DR     Data Read

Figure 52. Data Address Breakpoint Register

    ///                                          BTI   PRIVM
    0                                            60    61  63

    Bit(s)   Name    Description
    60       BTI     Breakpoint Translation Ignore
    61:63    PRIVM   Privilege Mask
               61    HYP   Hypervisor state
               62    PNH   Privileged but Non-Hypervisor state
               63    PRO   Problem state

    All other fields are reserved.

Figure 53. Data Address Breakpoint Register Extension

The DABR and DABRX are hypervisor resources; see Section 2.7.

The supported PRIVM values are 0b000, 0b001, 0b010, 0b011, 0b100, and 0b111. If the PRIVM field does not contain one of the supported values, then whether a match occurs for a given storage access is undefined. Elsewhere in this section it is assumed that the PRIVM field contains one of the supported values.

Programming Note

    PRIVM value 0b000 causes matches not to occur regardless of the contents of other DABR and DABRX fields. PRIVM values 0b101 and 0b110 are not supported because a storage location that is shared between the hypervisor and non-hypervisor software is unlikely to be accessed using the same EA by both the hypervisor and the non-hypervisor software. (PRIVM value 0b111 is supported primarily for reasons of software compatibility, as described in a subsequent Programming Note.)

A Data Address Breakpoint match occurs for a Load or Store instruction if, for any byte accessed, all of the following conditions are satisfied.

- the access is
    - a quadword access and EA[0:59] = DABR[0:59], or
    - not a quadword access and EA[0:60] = DABR[0:60] (= DABR[DAB])
- (MSR[DR] = DABR[BT]) | DABRX[BTI]
- the thread is in
    - hypervisor state and DABRX[HYP] = 1, or
    - privileged but non-hypervisor state and DABRX[PNH] = 1, or
    - problem state and DABRX[PR] = 1
- the instruction is a Store and DABR[DW] = 1, or the instruction is a Load and DABR[DR] = 1.

In 32-bit mode the high-order 32 bits of the EA are treated as zeros for the purpose of detecting a match.

If the above conditions are satisfied, a match also occurs for eciwx and ecowx. For the purpose of determining whether a match occurs, eciwx is treated as a Load, and ecowx is treated as a Store.

If the above conditions are satisfied, it is undefined whether a match occurs in the following cases.

- The instruction is Store Conditional but the store is not performed.
- The instruction is dcbz. (For the purpose of determining whether a match occurs, dcbz is treated as a Store.)

The Cache Management instructions other than dcbz never cause a match.

A Data Address Breakpoint match causes a Data Storage exception or a Hypervisor Data Storage exception (see Section 6.5.3, "Data Storage Interrupt" and Section 6.5.15, "Hypervisor Data Storage Interrupt"). If a match occurs, some or all of the bytes of the storage operand may have been accessed; however, if a Store or ecowx instruction causes the match, the storage operand is not modified if the instruction is one of the following:

- any Store instruction that causes an atomic access
- ecowx

Programming Note

    The Data Address Breakpoint mechanism does not apply to instruction fetches.

Programming Note

    Before setting a breakpoint requested by the operating system, the hypervisor must verify that the requested contents of the DABR and DABRX cannot cause the hypervisor to receive a Data Storage interrupt that it is not prepared to handle, or that it intrinsically cannot handle (e.g., the EA is in the range of EAs at which the hypervisor's Data Storage interrupt handler saves registers, DABR[BT] || DABRX[BTI] ≠ 0b10, DABR[DW] = 1, and DABRX[HYP] = 1).

Programming Note

    Implementations that comply with versions of the architecture that precede Version 2.02 do not provide the DABRX. Forward compatibility for software that was written for such implementations (and uses the Data Address Breakpoint facility) can be obtained by setting DABRX[60:63] to 0b0111.

Chapter 9. External Control [Category: External Control]

The External Control facility permits a program to communicate with a special-purpose device. The facility consists of a Special Purpose Register, called EAR, and two instructions, called External Control In Word Indexed (eciwx) and External Control Out Word Indexed (ecowx).

This facility must provide a means of synchronizing the devices with the hardware to prevent the use of an address by the device when the translation that produced that address is being invalidated.

9.1 External Access Register

This 32-bit Special Purpose Register controls access to the External Control facility and, for external control operations that are permitted, identifies the target device.

    E    ///                                     RID
    32   33                                      58      63

    Bit(s)   Name   Description
    32       E      Enable bit
    58:63    RID    Resource ID

    All other fields are reserved.

Figure 54. External Access Register

The External Access Register (EAR) is a hypervisor resource; see Chapter 2.

The high-order bits of the RID field that correspond to bits of the Resource ID beyond the width of the Resource ID supported by the implementation are treated as reserved bits.

Programming Note

    The hypervisor can use the EAR to control which programs are allowed to execute External Access instructions, when they are allowed to do so, and which devices they are allowed to communicate with using these instructions.

9.2 External Access Instructions

The External Access instructions, External Control In Word Indexed (eciwx) and External Control Out Word Indexed (ecowx), are described in Book II. Additional information about them is given below.

If an attempt is made to execute either of these instructions when EAR[E]=0, a Data Storage interrupt occurs with bit 43 of the DSISR set to 1.

The instructions are supported whenever MSR[DR]=1. If either instruction is executed when MSR[DR]=0 (real addressing mode), the results are boundedly undefined.

Chapter 10. Synchronization Requirements for Context Alterations

Changing the contents of certain System Registers, the contents of SLB entries, or the contents of other system resources that control the context in which a program executes can have the side effect of altering the context in which data addresses and instruction addresses are interpreted, and in which instructions are executed and data accesses are performed. For example, changing MSR[IR] from 0 to 1 has the side effect of enabling translation of instruction addresses. These side effects need not occur in program order, and therefore may require explicit synchronization by software. (Program order is defined in Book II.)

An instruction that alters the context in which data addresses or instruction addresses are interpreted, or in which instructions are executed or data accesses are performed, is called a context-altering instruction. This chapter covers all the context-altering instructions. The software synchronization required for them is shown in Table 1 (for data access) and Table 2 (for instruction fetch and execution).

The notation "CSI" in the tables means any context synchronizing instruction (e.g., sc, isync, or rfid). A context synchronizing interrupt (i.e., any interrupt except non-recoverable System Reset or non-recoverable Machine Check) can be used instead of a context synchronizing instruction. If it is, phrases like "the synchronizing instruction", below, should be interpreted as meaning the instruction at which the interrupt occurs. If no software synchronization is required before (after) a context-altering instruction, "the synchronizing instruction before (after) the context-altering instruction" should be interpreted as meaning the context-altering instruction itself.

The synchronizing instruction before the context-altering instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alteration. The synchronizing instruction after the context-altering instruction ensures that all instructions after that synchronizing instruction are fetched and executed in the context established by the alteration. Instructions after the first synchronizing instruction, up to and including the second synchronizing instruction, may be fetched or executed in either context.

If a sequence of instructions contains context-altering instructions and contains no instructions that are affected by any of the context alterations, no software synchronization is required within the sequence.

Programming Note

    Sometimes advantage can be taken of the fact that certain events, such as interrupts, and certain instructions that occur naturally in the program, such as the rfid that returns from an interrupt handler, provide the required synchronization.

No software synchronization is required before or after a context-altering instruction that is also context synchronizing or when altering the MSR in most cases (see the tables). No software synchronization is required before most of the other alterations shown in Table 2, because all instructions preceding the context-altering instruction are fetched and decoded before the context-altering instruction is executed (the hardware must determine whether any of these preceding instructions are context synchronizing).

Unless otherwise stated, the material in this chapter assumes a single-threaded environment.

    Instruction or Event   Required Before   Required After   Notes
    interrupt              none              none
    rfid                   none              none
    hrfid                  none              none
    sc                     none              none
    Trap                   none              none
    mtmsrd (SF)            none              none
    mtmsr[d] (PR)          none              none
    mtmsr[d] (DR)          none              none
    mtsr[in]               CSI               CSI
    mtspr (SDR1)           ptesync           CSI              3,4
    mtspr (AMR)            CSI               CSI              15
    mtspr (EAR)            CSI               CSI
    mtspr (RMOR)           CSI               CSI              13
    mtspr (HRMOR)          CSI               CSI              13
    mtspr (LPCR)           CSI               CSI              13
    mtspr (DABR)           --                --               2
    mtspr (DABRX)          --                --               2
    slbie                  CSI               CSI
    slbia                  CSI               CSI
    slbmte                 CSI               CSI              11
    tlbie                  CSI               CSI              5,7
    tlbiel                 CSI               ptesync          5
    tlbia                  CSI               CSI              5
    Store(PTE)             none              {ptesync, CSI}   6,7

Table 1: Synchronization requirements for data access

    Instruction or Event   Required Before   Required After   Notes
    interrupt              none              none
    rfid                   none              none
    hrfid                  none              none
    sc                     none              none
    Trap                   none              none
    mtmsrd (SF)            none              none             8
    mtmsr[d] (EE)          none              none             1
    mtmsr[d] (PR)          none              none             9
    mtmsr[d] (FP)          none              none
    mtmsr[d] (FE0,FE1)     none              none
    mtmsr[d] (SE, BE)      none              none
    mtmsr[d] (IR)          none              none             9
    mtmsr[d] (RI)          none              none
    mtsr[in]               none              CSI              9
    mtspr (DEC)            none              none             10
    mtspr (SDR1)           ptesync           CSI              3,4
    mtspr (CTRL)           none              none
    mtspr (HDEC)           none              none             10
    mtspr (RMOR)           none              CSI              14
    mtspr (HRMOR)          none              CSI              9,13
    mtspr (LPCR)           none              CSI              13,14
    mtspr (LPIDR)          CSI               CSI              7,12,16
    mtspr (PCR)            none              CSI
    slbie                  none              CSI
    slbia                  none              CSI
    slbmte                 none              CSI              9,11
    tlbie                  none              CSI              5,7
    tlbiel                 none              CSI              5
    tlbia                  none              CSI              5
    Store(PTE)             none              {ptesync, CSI}   6,7,9

Table 2: Synchronization requirements for instruction fetch and/or execution

Notes:

1. The effect of changing the EE bit is immediate, even if the mtmsr[d] instruction is not context synchronizing (i.e., even if L=1).

    - If an mtmsr[d] instruction sets the EE bit to 0, neither an External interrupt nor a Decrementer interrupt occurs after the mtmsr[d] is executed.
    - If an mtmsr[d] instruction changes the EE bit from 0 to 1 when an External, Decrementer, or higher priority exception exists, the corresponding interrupt occurs immediately after the mtmsr[d] is executed, and before the next instruction is executed in the program that set EE to 1.
    - If a hypervisor executes the mtmsr[d] instruction that sets the EE bit to 0, a Hypervisor Decrementer interrupt does not occur after mtmsr[d] is executed as long as the thread remains in hypervisor state.
    - If the hypervisor executes an mtmsr[d] instruction that changes the EE bit from 0 to 1 when a Hypervisor Decrementer or higher priority exception exists, the corresponding interrupt occurs immediately after the mtmsr[d] instruction is executed, and before the next instruction is executed, provided HDICE is 1.

2. Synchronization requirements for this instruction are implementation-dependent.

3. SDR1 must not be altered when MSR[DR]=1 or MSR[IR]=1; if it is, the results are undefined.

4. A ptesync instruction is required before the mtspr instruction because (a) SDR1 identifies the Page Table and thereby the location of Reference and Change bits, and (b) on some implementations, use of SDR1 to update Reference and Change bits may be independent of translating the virtual address. (For example, an implementation might identify the PTE in which to update the Reference and Change bits in terms of its offset in the Page Table, instead of its real address, and then add the Page Table address from SDR1 to the offset to determine the real address at which to update the bits.) To ensure that Reference and Change bits are updated in the correct Page Table, SDR1 must not be altered until all Reference and Change bit updates associated with address translations that were performed, by the thread executing the mtspr instruction, before the mtspr instruction is executed have been performed with respect to that thread. A ptesync instruction guarantees this synchronization of Reference and Change bit updates, while neither a context synchronizing operation nor the instruction fetching mechanism does so.

5. For data accesses, the context synchronizing instruction before the tlbie, tlbiel, or tlbia instruction ensures that all preceding instructions that access data storage have completed to a point at which they have reported all exceptions they will cause.

    The context synchronizing instruction after the tlbie, tlbiel, or tlbia instruction ensures that storage accesses associated with instructions following the context synchronizing instruction will not use the TLB entry(s) being invalidated.

    (If it is necessary to order storage accesses associated with preceding instructions, or Reference and Change bit updates associated with preceding address translations, with respect to subsequent data accesses, a ptesync instruction must also be used, either before or after the tlbie, tlbiel, or tlbia instruction. These effects of the ptesync instruction are described in the last paragraph of Note 6.)

6. The notation "{ptesync,CSI}" denotes an instruction sequence. Other instructions may be interleaved with this sequence, but these instructions must appear in the order shown.

    No software synchronization is required before the Store instruction because (a) stores are not performed out-of-order and (b) address translations associated with instructions preceding the Store instruction are not performed again after the store has been performed (see Section 5.5). These properties ensure that all address translations associated with instructions preceding the Store instruction will be performed using the old contents of the PTE.

    The ptesync instruction after the Store instruction ensures that all searches of the Page Table that are performed after the ptesync instruction completes will use the value stored (or a value stored subsequently). The context synchronizing instruction after the ptesync instruction ensures that any address translations associated with instructions following the context synchronizing instruction that were performed using the old contents of the PTE will be discarded, with the result that these address translations will be performed again and, if there is no corresponding entry in any implementation-specific address translation lookaside information, will use the value stored (or a value stored subsequently).

    The ptesync instruction also ensures that all storage accesses associated with instructions preceding the ptesync instruction, and all Reference and Change bit updates associated with additional address translations that were performed, by the thread executing the ptesync instruction, before the ptesync instruction is executed, will be performed with respect to any thread or mechanism, to the extent required by the associated Memory Coherence Required attributes, before any data accesses caused by instructions following the ptesync instruction are performed with respect to that thread or mechanism.

7. There are additional software synchronization requirements for this instruction in multi-threaded environments (e.g., it may be necessary to invalidate one or more TLB entries on all threads in the system and to be able to determine that the invalidations have completed and that all side effects of the invalidations have taken effect).

    Section 5.10 gives examples of using tlbie, Store, and related instructions to maintain the Page Table, in both multi-threaded environments and environments consisting of only a single-threaded processor.

    Programming Note

        In a multi-threaded system, if software locking is used to help ensure that the requirements described in Section 5.10 are satisfied, the isync instruction near the end of the lock acquisition sequence (see Section B.2.1.1 of Book II) may naturally provide the context synchronization that is required before the alteration.

8. The alteration must not cause an implicit branch in effective address space. Thus, when changing MSR[SF] from 1 to 0, the mtmsrd instruction must have an effective address that is less than $2^{32} - 4$. Furthermore, when changing MSR[SF] from 0 to 1, the mtmsrd instruction must not be at effective address $2^{32} - 4$ (see Section 5.3.2).

9. The alteration must not cause an implicit branch in real address space. Thus the real address of the context-altering instruction and of each subsequent instruction, up to and including the next context synchronizing instruction, must be independent of whether the alteration has taken effect.

    Programming Note

        If it is desired to set MSR[IR] to 1 early in an operating system interrupt handler, advantage can sometimes be taken of the fact that EA[0:3] are ignored when forming the real address when address translation is disabled and MSR[HV] = 0. For example, if address translation resources are set such that effective address 0xn000_0000_0000_0000 maps to real address 0x000_0000_0000_0000 when address translation is enabled, where n is an arbitrary 4-bit value, the following code sequence, in real page 0, can be used early in the interrupt handler.

                  la      rx,target
                  li      ry,0xn000
                  sldi    ry,ry,48
                  or      rx,rx,ry      # set high-order nibble of target addr to 0xn
                  mtctr   rx
                  bcctr                 # branch to targ

            targ: mfmsr   rx
                  ori     rx,rx,0x0020
                  mtmsrd  rx            # set MSR[IR] to 1

        The mtmsrd does not cause an implicit branch in real address space because the real address of the next sequential instruction is independent of MSR[IR]. Using mtmsrd, rather than rfid (or similar context synchronizing instruction that alters the control flow), may yield better performance on some implementations.

        (Variations on the technique are possible. For example, the target instruction of the bcctr can be in arbitrary real page P, where P is a 48-bit value, provided that effective address 0xn || P || 0x000 maps to real address P || 0x000 when address translation is enabled.)

10. The elapsed time between the contents of the Decrementer or Hypervisor Decrementer becoming negative and the signaling of the corresponding exception is not defined.

11. If an slbmte instruction alters the mapping, or associated attributes, of a currently mapped ESID, the slbmte must be preceded by an slbie (or slbia) instruction that invalidates the existing translation. This applies even if the corresponding entry is no longer in the SLB (the translation may still be in implementation-specific address translation lookaside information). No software synchronization is needed between the slbie and the slbmte, regardless of whether the index of the SLB entry (if any) containing the current translation is the same as the SLB index specified by the slbmte.

    No slbie (or slbia) is needed if the slbmte instruction replaces a valid SLB entry with a mapping of a different ESID (e.g., to satisfy an SLB miss). However, the slbie is needed later if and when the translation that was contained in the replaced SLB entry is to be invalidated.

12. The context synchronizing instruction before the mtspr instruction ensures that the LPIDR is not altered out-of-order. (Out-of-order alteration of the LPIDR could permit the requirements described in Section 5.10.1 to be violated. For the same reason, such a context synchronizing instruction may be needed even if the new LPID value is equal to the old LPID value.)

    See also Chapter 2, "Logical Partitioning (LPAR)", regarding moving a thread from one partition to another.

13. When the RMOR or HRMOR is modified, or the VC, VRMASD, RMLS, or LPES1 fields of the LPCR are modified, software must invalidate all implementation-specific lookaside information used in address translation that depends on the old contents of these registers or fields (i.e., the contents immediately before the modification). The slbia instruction can be used to invalidate all such implementation-specific lookaside information.

14. A context synchronizing instruction or event that is executed or occurs when LPCR[MER] = 1 does not necessarily ensure that the exception effects of LPCR[MER] are consistent with the contents of LPCR[MER]. See Section 2.2.

15. This line applies regardless of which SPR number (13 or 29) is used for the AMR.

16. LPIDR must not be altered when MSR[DR]=1 or MSR[IR]=1; if it is, the results are undefined.

Appendix A. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided for certain instructions. This appendix defines extended mnemonics and symbols related to instructions defined in Book III.

Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

A.1 Move To/From Special Purpose Register Mnemonics

This section defines extended mnemonics for the mtspr and mfspr instructions, including the Special Purpose Registers (SPRs) defined in Book I and certain privileged SPRs, and for the Move From Time Base instruction defined in Book II.

The mtspr and mfspr instructions specify an SPR as a numeric operand; extended mnemonics are provided that represent the SPR in the mnemonic rather than requiring it to be coded as an operand. Similar extended mnemonics are provided for the Move From Time Base instruction, which specifies the portion of the Time Base as a numeric operand.

Note: mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB).

Programming Note

    The extended mnemonics in Table 3 for SPRs associated with the Performance Monitor facility are based on the definitions in Appendix B.

    Other versions of Performance Monitor facilities used different sets of SPR numbers (all 32-bit PowerPC implementations used a different set, and some early Power ISA implementations used yet a different set).
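As an illustration of the rule that an extended mnemonic carries the SPR in the mnemonic instead of in an operand, the following is a minimal sketch of the expansion an assembler might perform. It is not taken from any real assembler: the names `SPR_NUMBERS`, `expand`, `encode_mtspr`, and `encode_mfspr` are hypothetical, and only a handful of the SPR numbers listed in Table 3 are included. The two encoding helpers additionally assume the XFX-form layout from Book I (primary opcode 31, extended opcodes 467 and 339, and the two 5-bit halves of the SPR number swapped in the instruction word); that layout is not restated in this appendix.

```python
# Hypothetical sketch: expand a few extended mnemonics into basic mtspr/mfspr.
# Each entry maps a mnemonic stem to (SPR number for mtspr, SPR number for mfspr);
# None means the form is not provided. Numbers are the ones listed in Table 3.
SPR_NUMBERS = {
    "xer": (1, 1),
    "lr": (8, 8),
    "ctr": (9, 9),
    "dec": (22, 22),
    "cfar": (28, 28),
    "ctrl": (152, 136),  # CTRL is written and read through different SPR numbers
    "pvr": (None, 287),  # Processor Version Register has no move-to form
}


def expand(line: str) -> str:
    """Rewrite e.g. 'mtdec r3' as 'mtspr 22,r3' and 'mfdec r3' as 'mfspr r3,22'."""
    mnemonic, operand = line.split()
    direction, stem = mnemonic[:2], mnemonic[2:]
    if direction not in ("mt", "mf") or stem not in SPR_NUMBERS:
        raise ValueError(f"not a known extended mnemonic: {mnemonic}")
    to_spr, from_spr = SPR_NUMBERS[stem]
    spr = to_spr if direction == "mt" else from_spr
    if spr is None:
        raise ValueError(f"{mnemonic} is not provided")
    if direction == "mt":
        return f"mtspr {spr},{operand}"
    return f"mfspr {operand},{spr}"


def _spr_field(spr: int) -> int:
    # Assumption (Book I, XFX-form): the two 5-bit halves of the SPR number
    # are stored swapped in the 10-bit spr field of the instruction.
    return ((spr & 0x1F) << 5) | (spr >> 5)


def encode_mtspr(spr: int, rs: int) -> int:
    """32-bit instruction word for 'mtspr spr,rs' (opcode 31, XO 467)."""
    return (31 << 26) | (rs << 21) | (_spr_field(spr) << 11) | (467 << 1)


def encode_mfspr(rt: int, spr: int) -> int:
    """32-bit instruction word for 'mfspr rt,spr' (opcode 31, XO 339)."""
    return (31 << 26) | (rt << 21) | (_spr_field(spr) << 11) | (339 << 1)
```

For example, `expand("mtctrl r4")` and `expand("mfctrl r4")` yield different SPR numbers (152 and 136), which is the kind of detail the extended mnemonics hide from the programmer. A table-driven design keeps the asymmetric cases (CTRL, and registers with only one direction) explicit rather than deriving them by rule.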
Table 3: Extended mnemonics for moving to/from an SPR

                                         Move To SPR                    Move From SPR(1)
Special Purpose Register                 Extended      Equivalent to    Extended      Equivalent to
Fixed-Point Exception Register           mtxer Rx      mtspr 1,Rx       mfxer Rx      mfspr Rx,1
Link Register                            mtlr Rx       mtspr 8,Rx       mflr Rx       mfspr Rx,8
Count Register                           mtctr Rx      mtspr 9,Rx       mfctr Rx      mfspr Rx,9
Data Stream Control Register             mtdscr Rx     mtspr 17,Rx      mfdscr Rx     mfspr Rx,17
Data Storage Interrupt Status Register   mtdsisr Rx    mtspr 18,Rx      mfdsisr Rx    mfspr Rx,18
Data Address Register                    mtdar Rx      mtspr 19,Rx      mfdar Rx      mfspr Rx,19
Decrementer                              mtdec Rx      mtspr 22,Rx      mfdec Rx      mfspr Rx,22
Storage Description Register 1           mtsdr1 Rx     mtspr 25,Rx      mfsdr1 Rx     mfspr Rx,25
Save/Restore Register 0                  mtsrr0 Rx     mtspr 26,Rx      mfsrr0 Rx     mfspr Rx,26
Save/Restore Register 1                  mtsrr1 Rx     mtspr 27,Rx      mfsrr1 Rx     mfspr Rx,27
Come-From Address Register               mtcfar Rx     mtspr 28,Rx      mfcfar Rx     mfspr Rx,28
AMR                                      mtamr Rx      mtspr 29,Rx      mfamr Rx      mfspr Rx,29
CTRL                                     mtctrl Rx     mtspr 152,Rx     mfctrl Rx     mfspr Rx,136
UAMOR                                    mtuamor Rx    mtspr 157,Rx     mfuamor Rx    mfspr Rx,157
Special Purpose Registers G0 through G3  mtsprg n,Rx   mtspr 272+n,Rx   mfsprg Rx,n   mfspr Rx,272+n
Time Base [Lower]                        mttbl Rx      mtspr 284,Rx     mftb Rx       mftb Rx,268(1)
                                                                                      mfspr Rx,268
Time Base Upper                          mttbu Rx      mtspr 285,Rx     mftbu Rx      mftb Rx,269(1)
                                                                                      mfspr Rx,269
Time Base Upper 40                       mttbu40 Rx    mtspr 286,Rx     -             -
Processor Version Register               -             -                mfpvr Rx      mfspr Rx,287
HMER                                     mthmer Rx     mtspr 336,Rx     mfhmer Rx     mfspr Rx,336
HMEER                                    mthmeer Rx    mtspr 337,Rx     mfhmeer Rx    mfspr Rx,337
AMOR                                     mtamor Rx     mtspr 349,Rx     mfamor Rx     mfspr Rx,349
MMCRA                                    mtmmcra Rx    mtspr 786,Rx     mfmmcra Rx    mfspr Rx,770
PMC1                                     mtpmc1 Rx     mtspr 787,Rx     mfpmc1 Rx     mfspr Rx,771
PMC2                                     mtpmc2 Rx     mtspr 788,Rx     mfpmc2 Rx     mfspr Rx,772
PMC3                                     mtpmc3 Rx     mtspr 789,Rx     mfpmc3 Rx     mfspr Rx,773
PMC4                                     mtpmc4 Rx     mtspr 790,Rx     mfpmc4 Rx     mfspr Rx,774
PMC5                                     mtpmc5 Rx     mtspr 791,Rx     mfpmc5 Rx     mfspr Rx,775
PMC6                                     mtpmc6 Rx     mtspr 792,Rx     mfpmc6 Rx     mfspr Rx,776
MMCR0                                    mtmmcr0 Rx    mtspr 795,Rx     mfmmcr0 Rx    mfspr Rx,779
MMCR1                                    mtmmcr1 Rx    mtspr 798,Rx     mfmmcr1 Rx    mfspr Rx,782
PPR                                      mtppr Rx      mtspr 896,Rx     mfppr Rx      mfspr Rx,896
PPR32(2)                                 mtppr32 Rx    mtspr 898,Rx     mfppr32 Rx    mfspr Rx,898
Processor Identification Register        -             -                mfpir Rx      mfspr Rx,1023

(1) The mftb instruction is Category: Phased-Out. Assemblers targeting version 2.03 or later of the architecture should generate an mfspr instruction for the mftb and mftbu extended mnemonics; see the corresponding Assembler Note in the mftb instruction description (see Section 5.2.1 of Book II).
(2) Category: Phased-In

Appendix B. Example Performance Monitor

Note
This Appendix describes an example implementation of a Performance Monitor. A subset of these requirements are being considered for inclusion in the Architecture as part of Category: Server.Performance Monitor.

A Performance Monitor facility provides a means of collecting information about program and system performance.

The resources (e.g., SPR numbers) that a Performance Monitor facility may use are identified elsewhere in this Book. All other aspects of any Performance Monitor facility are implementation-dependent.

This appendix provides an example of a Performance Monitor facility. It is only an example; implementations may provide all, some, or none of the features described here, or may provide features that are similar to those described here but differ in detail.

Programming Note
Because the features provided by a Performance Monitor facility are implementation-dependent, operating systems should provide services that support the useful performance monitoring functions in a generic fashion. Application programs should use these services, and should not depend on the features provided by a particular implementation.

The example Performance Monitor facility consists of the following features (described in detail in subsequent sections).

• one MSR bit
  - PMM (Performance Monitor Mark), which can be used to select one or more programs for monitoring
• SPRs
  - PMC1 - PMC6 (Performance Monitor Counter registers 1 - 6), which count events
  - MMCR0, MMCR1, and MMCRA (Monitor Mode Control Registers 0, 1, and A), which control the Performance Monitor facility
  - SIAR and SDAR (Sampled Instruction Address Register and Sampled Data Address Register), which contain the address of the "sampled instruction" and of the "sampled data"
• the Performance Monitor interrupt, which can be caused by monitored conditions and events

The minimal subset of the features that makes the resulting Performance Monitor useful to software consists of MSR[PMM], PMC1, PMC2, PMC3, PMC4, MMCR0, MMCR1, and MMCRA and certain bits and fields of these three Monitor Mode Control Registers, and the Performance Monitor Interrupt. These features are identified as the "basic" features below. The remaining features (the remaining SPRs, and the remaining bits and fields in the three Monitor Mode Control Registers) are considered "extensions".

The events that can be counted in the PMCs as well as the code that identifies each event are implementation-dependent. The events and codes may vary between PMCs, as well as between implementations. For the programmable PMCs, the event to be counted is selected by specifying the appropriate code in the MMCR "Selector" field for the PMC. Some events may include operations that are performed out-of-order.

Many aspects of the operation of the Performance Monitor are summarized by the following hierarchy, which is described starting at the lowest level.

• A "counter negative condition" exists when the value in a PMC is negative (i.e., when bit 0 of the PMC is 1). A "Time Base transition event" occurs when a selected bit of the Time Base changes from 0 to 1 (the bit is selected by an MMCR field). The term "condition or event" is used as an abbreviation for "counter negative condition or Time Base transition event". A condition or event can be caused implicitly by the hardware (e.g., incrementing a PMC) or explicitly by software (mtspr).

• A condition or event is enabled if the corresponding "Enable" bit in an MMCR is 1. The occurrence of an enabled condition or event can have side effects within the Performance Monitor, such as causing the PMCs to cease counting.

• An enabled condition or event causes a Performance Monitor alert if Performance Monitor alerts are enabled by the corresponding "Enable" bit in an MMCR. A single Performance Monitor alert may reflect multiple enabled conditions and events.

• A Performance Monitor alert causes a Performance Monitor exception.

  The exception effects of the Performance Monitor are said to be consistent with the contents of MMCR0[PMAO] if one of the following statements is true. (MMCR0[PMAO] reflects the occurrence of Performance Monitor alerts; see the definition of that bit in Section B.2.2.)

  - MMCR0[PMAO]=0 and a Performance Monitor exception does not exist.
  - MMCR0[PMAO]=1 and a Performance Monitor exception exists.

  A context synchronizing instruction or event that occurs when MMCR0[PMAO]=0 ensures that the exception effects of the Performance Monitor are consistent with the contents of MMCR0[PMAO].

  Even without software synchronization, when the contents of MMCR0[PMAO] change, the exception effects of the Performance Monitor become consistent with the new contents of MMCR0[PMAO] sufficiently soon that the Performance Monitor facility is useful to software for its intended purposes.

• A Performance Monitor exception causes a Performance Monitor interrupt when MSR[EE]=1.

Programming Note
The Performance Monitor can be effectively disabled (i.e., put into a state in which Performance Monitor SPRs are not altered and Performance Monitor interrupts do not occur) by setting MMCR0 to 0x0000_0000_8000_0000.

B.1 PMM Bit of the Machine State Register

The Performance Monitor uses MSR bit PMM, which is defined as follows.

Bit  Description

61   Performance Monitor Mark (PMM)
     This bit is a basic feature.
     This bit contains the Performance Monitor "mark" (0 or 1).

     Programming Note
     Software can use this bit as a process-specific marker which, in conjunction with MMCR0[FCM0 FCM1] (see Section B.2.2), permits events to be counted on a process-specific basis. (The bit is saved by interrupts and restored by rfid.)

     Common uses of the PMM bit include the following.

     • Count events for a few selected processes. This use requires the following bit settings.
       - MSR[PMM]=1 for the selected processes, MSR[PMM]=0 for all other processes
       - MMCR0[FCM0]=1
       - MMCR0[FCM1]=0
     • Count events for all but a few selected processes. This use requires the following bit settings.
       - MSR[PMM]=1 for the selected processes, MSR[PMM]=0 for all other processes
       - MMCR0[FCM0]=0
       - MMCR0[FCM1]=1

     Notice that for both of these uses a mark value of 1 identifies the "few" processes and a mark value of 0 identifies the remaining "many" processes. Because the PMM bit is set to 0 when an interrupt occurs (see Figure 44 on page 828), interrupt handlers are treated as one of the "many". If it is desired to treat interrupt handlers as one of the "few", the mark value convention just described would be reversed.

B.2 Special Purpose Registers

The Performance Monitor SPRs count events, control the operation of the Performance Monitor, and provide associated information.

The Performance Monitor SPRs can be read and written using the mfspr and mtspr instructions (see Section 4.4.4, "Move To/From System Register Instructions" on page 760). The Performance Monitor SPR numbers are shown in Figure 55. Writing any of the Performance Monitor SPRs is privileged. Reading any of the Performance Monitor SPRs is not privileged (however, the privileged SPR numbers used to write the SPRs can also be used to read them; see the figure).

The elapsed time between the execution of an instruction and the time at which events due to that instruction have been reflected in Performance Monitor SPRs is not defined. No means are provided by which software can ensure that all events due to preceding instructions have been reflected in Performance Monitor SPRs. Similarly, if the events being monitored may be caused by operations that are performed out-of-order, no means are provided by which software can prevent such events due to subsequent instructions from being reflected in Performance Monitor SPRs. Thus the contents obtained by reading a Performance Monitor SPR may not be precise: it may fail to reflect some events due to instructions that precede the mfspr and may reflect some events due to instructions that follow the mfspr. This lack of precision applies regardless of whether the state of the thread is such that the SPR is subject to change by the hardware at the time the mfspr is executed. Similarly, if an mtspr instruction is executed that changes the contents of the Time Base, the change is not guaranteed to have taken effect with respect to causing Time Base transition events until after a subsequent context synchronizing instruction has been executed.

If an mtspr instruction is executed that changes the value of a Performance Monitor SPR other than SIAR or SDAR, the change is not guaranteed to have taken effect until after a subsequent context synchronizing instruction has been executed (see Chapter 10. "Synchronization Requirements for Context Alterations" on page 861).

Programming Note
Depending on the events being monitored, the contents of Performance Monitor SPRs may be affected by aspects of the runtime environment (e.g., cache contents) that are not directly attributable to the programs being monitored.

SPR(1,2)                     Register
decimal    spr5:9   spr0:4   Name       Privileged
770,786    11000    n0010    MMCRA      no,yes
771,787    11000    n0011    PMC1       no,yes
772,788    11000    n0100    PMC2       no,yes
773,789    11000    n0101    PMC3       no,yes
774,790    11000    n0110    PMC4       no,yes
775,791    11000    n0111    PMC5       no,yes
776,792    11000    n1000    PMC6       no,yes
779,795    11000    n1011    MMCR0      no,yes
780,796    11000    n1100    SIAR       no,yes
781,797    11000    n1101    SDAR       no,yes
782,798    11000    n1110    MMCR1      no,yes

(1) Note that the order of the two 5-bit halves of the SPR number is reversed.
(2) Reading the SPR is privileged if and only if n=1.

Figure 55. Performance Monitor SPR encodings for mfspr

SPR(1)                       Register
decimal    spr5:9   spr0:4   Name       Privileged
786        11000    10010    MMCRA      yes
787        11000    10011    PMC1       yes
788        11000    10100    PMC2       yes
789        11000    10101    PMC3       yes
790        11000    10110    PMC4       yes
791        11000    10111    PMC5       yes
792        11000    11000    PMC6       yes
795        11000    11011    MMCR0      yes
796        11000    11100    SIAR       yes
797        11000    11101    SDAR       yes
798        11000    11110    MMCR1      yes

(1) Note that the order of the two 5-bit halves of the SPR number is reversed.

Figure 56. Performance Monitor SPR encodings for mtspr

B.2.1 Performance Monitor Counter Registers

The six Performance Monitor Counter registers, PMC1 through PMC6, are 32-bit registers that count events.

    PMC1, PMC2, PMC3, PMC4, PMC5, PMC6 (each)
    32                                     63

Figure 57. Performance Monitor Counter registers

PMC1, PMC2, PMC3, and PMC4 are basic features.

PMC5 and PMC6 are not programmable. PMC5 counts instructions completed and PMC6 counts cycles.

Normally each PMC is incremented each hardware cycle by the number of times the corresponding event occurred in that cycle. Other modes of incrementing may also be provided (e.g., see the description of MMCR1 bits PMC1HIST and PMCjHIST).

"PMCj" is used as an abbreviation for "PMCi, i > 1".

Programming Note
PMC5 and PMC6 are defined to facilitate calculating basic performance metrics such as cycles per instruction (CPI).

Programming Note
Software can use a PMC to "pace" the collection of Performance Monitor data. For example, if it is desired to collect event counts every n cycles, software can specify that a particular PMC count cycles and set that PMC to 0x8000_0000 - n. The events of interest would be counted in other PMCs. The counter negative condition that will occur after n cycles can, with the appropriate setting of MMCR bits, cause counter values to become frozen, cause a Performance Monitor interrupt to occur, etc.

B.2.2 Monitor Mode Control Register 0

Monitor Mode Control Register 0 (MMCR0) is a 64-bit register. This register, along with MMCR1 and MMCRA, controls the operation of the Performance Monitor.

    MMCR0
    0                                      63

Figure 58. Monitor Mode Control Register 0

MMCR0 is a basic feature. Within MMCR0, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below.

Some bits of MMCR0 are altered by the hardware when various events occur, as described below.

The bit definitions of MMCR0 are as follows. MMCR0 bits that are not implemented are treated as reserved.

Bit(s)  Description

0:31    Reserved

32      Freeze Counters (FC)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented.
        The hardware sets this bit to 1 when an enabled condition or event occurs and MMCR0[FCECE]=1.

33      Freeze Counters in Privileged State (FCS)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[HV PR]=0b00.

34      Freeze Counters in Problem State (FCP)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[PR]=1.

35      Freeze Counters while Mark = 1 (FCM1)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[PMM]=1.

36      Freeze Counters while Mark = 0 (FCM0)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[PMM]=0.

37      Performance Monitor Alert Enable (PMAE)
        This bit is a basic feature.
        0  Performance Monitor alerts are disabled.
        1  Performance Monitor alerts are enabled until a Performance Monitor alert occurs, at which time:
           • MMCR0[PMAE] is set to 0
           • MMCR0[PMAO] is set to 1

        Programming Note
        Software can set this bit and MMCR0[PMAO] to 0 to prevent Performance Monitor interrupts.
        Software can set this bit to 1 and then poll the bit to determine whether an enabled condition or event has occurred. This is especially useful for software that runs with MSR[EE]=0.
        In earlier versions of the architecture that lacked the concept of Performance Monitor alerts, this bit was called Performance Monitor Exception Enable (PMXE).

38      Freeze Counters on Enabled Condition or Event (FCECE)
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are incremented (if permitted by other MMCR bits) until an enabled condition or event occurs when MMCR0[TRIGGER]=0, at which time:
           • MMCR0[FC] is set to 1
        If the enabled condition or event occurs when MMCR0[TRIGGER]=1, the FCECE bit is treated as if it were 0.

39:40   Time Base Selector (TBSEL)
        This field selects the Time Base bit that can cause a Time Base transition event (the event occurs when the selected bit changes from 0 to 1).
        00  Time Base bit 63 is selected.
        01  Time Base bit 55 is selected.
        10  Time Base bit 51 is selected.
        11  Time Base bit 47 is selected.

        Programming Note
        Time Base transition events can be used to collect information about activity, as revealed by event counts in PMCs and by addresses in SIAR and SDAR, at periodic intervals.
        In multi-threaded systems in which the Time Base registers are synchronized among the threads, Time Base transition events can be used to correlate the Performance Monitor data obtained by the several threads. For this use, software must specify the same TBSEL value for all the threads in the system.
        Because the frequency of the Time Base is implementation-dependent, software should invoke a system service program to obtain the frequency before choosing a value for TBSEL.

41      Time Base Event Enable (TBEE)
        0  Time Base transition events are disabled.
        1  Time Base transition events are enabled.

42:47   Reserved

48      PMC1 Condition Enable (PMC1CE)
        This bit controls whether counter negative conditions due to a negative value in PMC1 are enabled.
        0  Counter negative conditions for PMC1 are disabled.
        1  Counter negative conditions for PMC1 are enabled.

49      PMCj Condition Enable (PMCjCE)
        This bit controls whether counter negative conditions due to a negative value in any PMCj (i.e., in any PMC except PMC1) are enabled.
        0  Counter negative conditions for all PMCjs are disabled.
        1  Counter negative conditions for all PMCjs are enabled.

50      Trigger (TRIGGER)
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  PMC1 is incremented (if permitted by other MMCR bits). The PMCjs are not incremented until PMC1 is negative or an enabled condition or event occurs, at which time:
           • the PMCjs resume incrementing (if permitted by other MMCR bits)
           • MMCR0[TRIGGER] is set to 0
        See the description of the FCECE bit, above, regarding the interaction between TRIGGER and FCECE.

        Programming Note
        Uses of TRIGGER include the following.

        • Resume counting in the PMCjs when PMC1 becomes negative, without causing a Performance Monitor interrupt. Then freeze all PMCs (and optionally cause a Performance Monitor interrupt) when a PMCj becomes negative. The PMCjs then reflect the events that occurred between the time PMC1 became negative and the time a PMCj becomes negative. This use requires the following MMCR0 bit settings.
          - TRIGGER=1
          - PMC1CE=0
          - PMCjCE=1
          - TBEE=0
          - FCECE=1
          - PMAE=1 (if a Performance Monitor interrupt is desired)
        • Resume counting in the PMCjs when PMC1 becomes negative, and cause a Performance Monitor interrupt without freezing any PMCs. The PMCjs then reflect the events that occurred between the time PMC1 became negative and the time the interrupt handler reads them. This use requires the following MMCR0 bit settings.
          - TRIGGER=1
          - PMC1CE=1
          - TBEE=0
          - FCECE=0
          - PMAE=1

51:52   Setting is implementation-dependent.

53:55   Reserved

56      Performance Monitor Alert Occurred (PMAO)
        This bit is a basic feature.
        0  A Performance Monitor alert has not occurred since the last time software set this bit to 0.
        1  A Performance Monitor alert has occurred since the last time software set this bit to 0.
        This bit is set to 1 by the hardware when a Performance Monitor alert occurs. This bit can be set to 0 only by the mtspr instruction.

        Programming Note
        Software can set this bit to 1 to simulate the occurrence of a Performance Monitor alert.
        Software should set this bit to 0 after handling the Performance Monitor alert.

57      Setting is implementation-dependent.

58      Freeze Counters 1-4 (FC1-4)
        0  PMC1 - PMC4 are incremented (if permitted by other MMCR bits).
        1  PMC1 - PMC4 are not incremented.

59      Freeze Counters 5-6 (FC5-6)
        0  PMC5 - PMC6 are incremented (if permitted by other MMCR bits).
        1  PMC5 - PMC6 are not incremented.

60:61   Reserved

62      Freeze Counters in Wait State (FCWAIT)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if CTRL[31]=0. Software is expected to set CTRL[31]=0 when it is in a "wait state", i.e., when there is no process ready to run.
        Only Branch Unit type of events do not increment if CTRL[31]=0. Other units continue to count.

63      Freeze Counters in Hypervisor State (FCH)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[HV PR]=0b10.

B.2.3 Monitor Mode Control Register 1

Monitor Mode Control Register 1 (MMCR1) is a 64-bit register. This register, along with MMCR0 and MMCRA, controls the operation of the Performance Monitor.

    MMCR1
    0                                      63

Figure 59. Monitor Mode Control Register 1

MMCR1 is a basic feature. Within MMCR1, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below.

Some bits of MMCR1 are altered by the hardware when various events occur, as described below.

The bit definitions of MMCR1 are as follows. MMCR1 bits that are not implemented are treated as reserved.

Bit(s)  Description

0:31    Implementation-Dependent Use
        These bits have implementation-dependent uses (e.g., extended event selection).

32:39   PMC1 Selector (PMC1SEL)
40:47   PMC2 Selector (PMC2SEL)
48:55   PMC3 Selector (PMC3SEL)
56:63   PMC4 Selector (PMC4SEL)
        Each of these fields contains a code that identifies the event to be counted by PMCs 1 through 4 respectively.
        PMC Selectors are basic features.

        Compatibility Note
        In versions of the architecture that precede Version 2.02 the PMC Selector Fields were six bits long, and were split between MMCR0 and MMCR1. PMC1-8 were all programmable.
        If more programmable PMCs are implemented in the future, additional MMCRs may be defined to cover the additional selectors.

B.2.4 Monitor Mode Control Register A

Monitor Mode Control Register A (MMCRA) is a 64-bit register. This register, along with MMCR0 and MMCR1, controls the operation of the Performance Monitor.

    MMCRA
    0                                      63

Figure 60. Monitor Mode Control Register A

MMCRA is a basic feature.
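The bit positions given for the Monitor Mode Control Registers use the numbering in which bit 0 is the most significant bit of the 64-bit register. The following Python sketch shows how such a bit number maps to a mask, and how the pacing value 0x8000_0000 - n described in Section B.2.1 reaches the counter negative condition after n increments. The helper names (mmcr_mask, pmc_pace_value, pmc_is_negative) are hypothetical and exist only for this illustration; the sketch also assumes that "bit 0 of the PMC" means the most significant bit of the 32-bit counter value, which is what the pacing value implies.

```python
# Illustrative helpers; the function names are assumptions, not part of the
# architecture. Bit 0 is taken to be the most significant bit of a register.

def mmcr_mask(bit, width=64):
    """Mask for one bit of a register numbered with bit 0 as the MSB."""
    return 1 << (width - 1 - bit)


MMCR0_FC = mmcr_mask(32)    # Freeze Counters
MMCR0_PMAE = mmcr_mask(37)  # Performance Monitor Alert Enable
MMCR0_PMAO = mmcr_mask(56)  # Performance Monitor Alert Occurred


def pmc_pace_value(n):
    """Initial 32-bit PMC value that becomes negative after n increments."""
    return (0x8000_0000 - n) & 0xFFFF_FFFF


def pmc_is_negative(value):
    """Counter negative condition: the top bit of the 32-bit PMC is 1."""
    return bool(value & 0x8000_0000)


# Setting only FC gives the value the text uses to disable the facility.
print(hex(MMCR0_FC))  # 0x80000000

start = pmc_pace_value(1000)
print(pmc_is_negative(start))                         # False
print(pmc_is_negative((start + 1000) & 0xFFFF_FFFF))  # True
```

The first printed value matches the MMCR0 setting 0x0000_0000_8000_0000 given in the Programming Note on disabling the Performance Monitor, which is a convenient check that the bit numbering has been applied in the right direction.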
Within MMCRA, some of the bits and fields are basic features and some are 874 Power ISATM Book III-S Version 2.06 extensions. The basic bits and fields are identified as B.2.5 Sampled Instruction such, below. Address Register Some bits of MMCRA are altered by the hardware when various events occur, as described below. The Sampled Instruction Address Register (SIAR) is a 64-bit register. It contains the address of the "sampled The bit definitions of MMCRA are as follows. MMCRA instruction" when a Performance Monitor alert occurs. bits that are not implemented are treated as reserved. Bit(s) Description SIAR 0 63 0:31 Reserved 32 Contents of SIAR and SDAR Are Related Figure 61. Sampled Instruction Address Register (CSSR) When a Performance Monitor alert occurs, SIAR is set Set to 1 by the hardware if the contents of to the effective address of an instruction that was being SIAR and SDAR are associated with the same executed, possibly out-of-order, at or around the time instruction; otherwise set to 0. that the Performance Monitor alert occurred. This instruction is called the "sampled instruction". 33:34 Setting is implementation-dependent. The contents of SIAR may be altered by the hardware if 35 Sampled MSRHV (SAMPHV) and only if MMCR0PMAE=1. Thus after the Perfor- Value of MSRHV when the Performance Moni- mance Monitor alert occurs, the contents of SIAR are tor Alert occurred. not altered by the hardware until software sets MMCR0PMAE to 1. After software sets MMCR0PMAE to 36 Sampled MSRPR (SAMPPR) 1, the contents of SIAR are undefined until the next Value of MSRPR when the Performance Moni- Performance Monitor alert occurs. tor Alert occurred. See Section B.4 regarding the effects of the Trace facil- 37:47 Setting is implementation-dependent. ity on SIAR. 
48:53 Threshold (THRESHOLD) Programming Note This field contains a "threshold value", which If the Performance Monitor alert causes a Perfor- is a value such that only events that exceed mance Monitor interrupt, the value of MSRHV PR the value are counted. The events to which a that was in effect when the sampled instruction was threshold value can apply are implementation- being executed is reported in MMCRA. dependent, as are the dimension of the threshold (e.g., duration in cycles) and the granularity with which the threshold value is interpreted. B.2.6 Sampled Data Address Reg- ister Programming Note The Sampled Data Address Register (SDAR) is a 64- By varying the threshold value, software bit register. It contains the address of the "sampled can obtain a profile of the characteristics data" when a Performance Monitor alert occurs. of the events subject to the threshold. For example, if PMC1 counts the number of cache misses for which the duration SDAR exceeds the threshold value, then soft- 0 63 ware can obtain the distribution of cache Figure 62. Sampled Data Address Register miss durations for a given program by monitoring the program repeatedly using When a Performance Monitor alert occurs, SDAR is set a different threshold value each time. to the effective address of the storage operand of an instruction that was being executed, possibly out-of- 54:59 Reserved for implementation-specific use. order, at or around the time that the Performance Mon- itor alert occurred. This storage operand is called the 60:62 Reserved "sampled data". The sampled data may be, but need 63 Setting is implementation-dependent. not be, the storage operand (if any) of the sampled instruction (see Section B.2.5). The contents of SDAR may be altered by the hardware if and only if MMCR0PMAE=1. Thus after the Perfor- mance Monitor alert occurs, the contents of SDAR are not altered by the hardware until software sets Appendix B. 
Example Performance Monitor 875 Version 2.06 MMCR0PMAE to 1. After software sets MMCR0PMAE to occur before the next instruction is executed (if no 1, the contents of SDAR are undefined until the next higher priority exception exists). Performance Monitor alert occurs. The priority of the Performance Monitor exception is See Section B.4 regarding the effects of the Trace facil- equal to that of the External, Decrementer, and Hyper- ity on SDAR. visor Decrementer exceptions (i.e., the hardware may generate any one of the four interrupts for which an Programming Note exception exists) (see Section 6.7.2, "Ordered Excep- If the Performance Monitor alert causes a Perfor- tions" on page 845 and Section 6.8, "Interrupt Priori- mance Monitor interrupt, MMCRA indicates ties" on page 845). whether the sampled data is the storage operand of the sampled instruction. B.4 Interaction with the Trace Facility B.3 Performance Monitor If the Trace facility includes setting SIAR and SDAR Interrupt (see Appendix C, "Example Trace Extensions" on page 879), and tracing is active (MSRSE=1 or The Performance Monitor interrupt is a system caused MSRBE=1), the contents of SIAR and SDAR as used by interrupt (Section 6.4). It is masked by MSREE in the the Performance Monitor facility are undefined and may same manner that External and Decrementer interrupts change even when MMCR0PMAE=0. are. The Performance Monitor interrupt is a basic feature. Programming Note A potential combined use of the Trace and Perfor- A Performance Monitor interrupt occurs when no higher mance Monitor facilities is to trace the control flow priority exception exists, a Performance Monitor excep- of a program and simultaneously count events for tion exists, and MSREE=1. that program. If multiple Performance Monitor exceptions occur before the first causes a Performance Monitor interrupt, the interrupt reflects the most recent Performance Mon- itor exception and the preceding Performance Monitor exceptions are lost. 
The following registers are set: SRR0 Set to the effective address of the instruc- tion that would have been attempted to be execute next if no interrupt conditions were present. SRR1 33:36 and 42:47 Implementation-specific. Others Loaded from the MSR. MSR See Figure 44 on page 828. SIAR Set to the effective address of the "sampled instruction" (see Section B.2.5). SDAR Set to the effective address of the "sampled data" (see Section B.2.6). Execution resumes at effective address 0x0000_0000_0000_0F00. In general, statements about External and Decre- menter interrupts elsewhere in this Book apply also to the Performance Monitor interrupt; for example, if a Performance Monitor exception exists when an mtm- srd[d] instruction is executed that changes MSREE from 0 to 1, the Performance Monitor interrupt will 876 Power ISATM Book III-S Version 2.06 Appendix C. Example Trace Extensions 34 Set to 1 if the traced instruction is dcbt, Note dcbtst, dcbz, dcbst, dcbf[l]; otherwise set This Appendix describes an example implementa- to 0. tion of Trace Extensions. A subset of these require- 35 Set to 1 if the traced instruction is a Load ments are being considered for inclusion in the instruction or eciwx; may be set to 1 if the Architecture as part of Category: Trace. traced instruction is icbi, dcbt, dcbtst, dcbst, dcbf[l]; otherwise set to 0. This appendix provides an example of extensions that 36 Set to 1 if the traced instruction is a Store may be added to the Trace facility described in instruction, dcbz, or ecowx; otherwise set Section 6.5.14, "Trace Interrupt [Category: Trace]" on to 0. page 839. It is only an example; implementations may 42 Set to 1 if the traced instruction is lswx or provide all, some, or none of the features described stswx; otherwise set to 0. here, or may provide features that are similar to those 43 Implementation-dependent. described here but differ in detail. 
44 Set to 1 if the traced instruction is a Branch instruction and the branch is taken; other- The extensions consist of the following features wise set to 0. (described in detail below). 45 Set to 1 if the traced instruction is eciwx or use of MSRSE BE=0b11 to specify new causes of ecowx; otherwise set to 0. Trace interrupts 46 Set to 1 if the traced instruction is lbarx, specification of how certain SRR1 bits are set lharx, lwarx, ldarx, stbcx., sthcx., stwcx., when a Trace interrupt occurs or stdcx.; otherwise set to 0. 47 Implementation-dependent. setting of SIAR and SDAR (see Appendix B, "Example Performance Monitor" on page 869) when a Trace interrupt occurs SIAR and SDAR If the Performance Monitor facility is implemented and MSRSE BE = 0b11 includes SIAR and SDAR (see Appendix B), the follow- ing additional registers are set when a Trace interrupt If MSRSE BE=0b11, the hardware generates a Trace occurs: exception under the conditions described in Section 6.5.14 for MSRSE BE=0b01, and also after successfully SIAR Set to the effective address of the traced completing the execution of any instruction that would instruction. cause at least one of SRR1 bits 33:36, 42, and 44:46 to SDAR Set to the effective address of the storage be set to 1 (see below) if the instruction were executed operand (if any) of the traced instruction; when MSRSE BE=0b10. otherwise undefined. This overrides the implicit statement in Section 6.5.14 If the state of the Performance Monitor is such that the that the effects of MSRSE BE=0b11 are the same as Performance Monitor may be altering these registers those of MSRSE BE=0b10. (i.e., if MMCR0PMAE=1), the contents of SIAR and SDAR as used by the Trace facility are undefined and SRR1 may change even when no Trace interrupt occurs. When a Trace interrupt occurs, the SRR1 bits that are not loaded from the MSR are set as follows instead of as described in Section 6.5.14. 33 Set to 1 if the traced instruction is icbi; oth- erwise set to 0. 
Appendix D. Interpretation of the DSISR as Set by an Alignment Interrupt

For most causes of Alignment interrupt, the interrupt handler will emulate the interrupting instruction. To do this, it needs the following characteristics of the interrupting instruction:

- Load or store
- Length (halfword, word, doubleword)
- String, multiple, or elementary
- Fixed-point or floating-point
- Update or non-update
- Byte reverse or not
- Is it dcbz?

The Power ISA optionally provides this information by setting bits in the DSISR that identify the interrupting instruction type. It is not necessary for the interrupt handler to load the interrupting instruction from storage. The mapping is unique except for a few exceptions that are discussed below. The near-uniqueness depends on the fact that many instructions, such as the fixed- and floating-point arithmetic instructions and the one-byte loads and stores, cannot cause an Alignment interrupt.

See Section 6.5.8 for a description of how the opcode and extended opcode are mapped to a DSISR value for an X-, D-, or DS-form instruction that causes an Alignment interrupt.

The table below shows the inverse mapping: how the DSISR bits identify the interrupting instruction. The following notes are cited in the table.

1. For cases in which multiple instructions can give the same value for the DSISR bits (44:45, 47:53) that are derived from the opcode, the Alignment interrupt handler should load the instruction from storage, using the effective address in SRR0, and treat the instruction appropriately (e.g., emulate the instruction, treat the case as a programming error, etc.). For example, if lbarx or lwarx causes an Alignment interrupt, it should not be emulated, but instead should be treated as a programming error.

2. These are distinguished by DSISR bits 44:45, which are not shown in the table.

The interrupt handler has no need to distinguish between an X-form instruction and the corresponding D- or DS-form instruction if one exists, and vice versa. Therefore two such instructions may yield the same DSISR value (all 32 bits). For example, stw and stwx may both yield either the DSISR value shown in the following table for stw, or that shown for stwx.

    If DSISR      then it is either    or D/DS-form    so the instruction
    47:53 is:     X-form opcode:       opcode:         is:

    00 0 0000     00000xxx00           x00000          lwarx, lwz, reserved (1)
    00 0 0001     00010xxx00           x00010          ldarx
    00 0 0010     00100xxx00           x00100          stw
    00 0 0011     00110xxx00           x00110          -
    00 0 0100     01000xxx00           x01000          lhz
    00 0 0101     01010xxx00           x01010          lha, lxvdsx (1)
    00 0 0110     01100xxx00           x01100          sth
    00 0 0111     01110xxx00           x01110          lmw
    00 0 1000     10000xxx00           x10000          lfs
    00 0 1001     10010xxx00           x10010          lfd, lxsdx (1)
    00 0 1010     10100xxx00           x10100          stdbrx, stfs (1)
    00 0 1011     10110xxx00           x10110          stfd, stxsdx (1)
    00 0 1100     11000xxx00           x11000          lq
    00 0 1101     11010xxx00           x11010          ld, ldu, lwa (2)
    00 0 1110     11100xxx00           x11100          stxvw4x
    00 0 1111     11110xxx00           x11110          std, stdu, stq (2)
    00 1 0000     00001xxx00           x00001          lbarx, lwzu (1)
    00 1 0001     00011xxx00           x00011          lharx
    00 1 0010     00101xxx00           x00101          stwu
    00 1 0011     00111xxx00           x00111          -
    00 1 0100     01001xxx00           x01001          lhzu
    00 1 0101     01011xxx00           x01011          lhau
    00 1 0110     01101xxx00           x01101          sthu
    00 1 0111     01111xxx00           x01111          stmw
    00 1 1000     10001xxx00           x10001          lfsu
    00 1 1001     10011xxx00           x10011          lfdu, lxsdux (1)
    00 1 1010     10101xxx00           x10101          stfsu
    00 1 1011     10111xxx00           x10111          stfdu, stxsdux (1)
    00 1 1100     11001xxx00           x11001          lfdp, lxvw4ux (1)
    00 1 1101     11011xxx00           x11011          lxvd2ux
    00 1 1110     11101xxx00           x11101          stfdp, stxvw4ux (1)
    00 1 1111     11111xxx00           x11111          stxvd2ux
    01 0 0000     00000xxx01           -               ldx
    01 0 0001     00010xxx01           -               -
    01 0 0010     00100xxx01           -               stdx
    01 0 0011     00110xxx01           -               -
    01 0 0100     01000xxx01           -               -
    01 0 0101     01010xxx01           -               lwax
    01 0 0110     01100xxx01           -               -
    01 0 0111     01110xxx01           -               -
    01 0 1000     10000xxx01           -               lswx
    01 0 1001     10010xxx01           -               lswi
    01 0 1010     10100xxx01           -               stswx
    01 0 1011     10110xxx01           -               stswi
    01 0 1100     11000xxx01           -               -
    01 0 1101     11010xxx01           -               -
    01 0 1110     11100xxx01           -               -
    01 0 1111     11110xxx01           -               -
    01 1 0000     00001xxx01           -               ldux
    01 1 0001     00011xxx01           -               -
    01 1 0010     00101xxx01           -               stdux
    01 1 0011     00111xxx01           -               -
    01 1 0100     01001xxx01           -               -
    01 1 0101     01011xxx01           -               lwaux
    01 1 0110     01101xxx01           -               -
    01 1 0111     01111xxx01           -               -
    01 1 1000     10001xxx01           -               -
    01 1 1001     10011xxx01           -               -
    01 1 1010     10101xxx01           -               -
    01 1 1011     10111xxx01           -               -
    01 1 1100     11001xxx01           -               -
    01 1 1101     11011xxx01           -               -
    01 1 1110     11101xxx01           -               -
    01 1 1111     11111xxx01           -               -
    10 0 0000     00000xxx10           -               -
    10 0 0001     00010xxx10           -               -
    10 0 0010     00100xxx10           -               stwcx.
    10 0 0011     00110xxx10           -               stdcx.
    10 0 0100     01000xxx10           -               -
    10 0 0101     01010xxx10           -               -
    10 0 0110     01100xxx10           -               -
    10 0 0111     01110xxx10           -               -
    10 0 1000     10000xxx10           -               lwbrx
    10 0 1001     10010xxx10           -               -
    10 0 1010     10100xxx10           -               stwbrx
    10 0 1011     10110xxx10           -               sthcx.
    10 0 1100     11000xxx10           -               lhbrx
    10 0 1101     11010xxx10           -               -
    10 0 1110     11100xxx10           -               sthbrx
    10 0 1111     11110xxx10           -               -
    10 1 0000     00001xxx10           -               -
    10 1 0001     00011xxx10           -               -
    10 1 0010     00101xxx10           -               -
    10 1 0011     00111xxx10           -               -
    10 1 0100     01001xxx10           -               eciwx
    10 1 0101     01011xxx10           -               -
    10 1 0110     01101xxx10           -               ecowx
    10 1 0111     01111xxx10           -               -
    10 1 1000     10001xxx10           -               -
    10 1 1001     10011xxx10           -               -
    10 1 1010     10101xxx10           -               stbcx.
    10 1 1011     10111xxx10           -               -
    10 1 1100     11001xxx10           -               -
    10 1 1101     11011xxx10           -               -
    10 1 1110     11101xxx10           -               -
    10 1 1111     11111xxx10           -               dcbz
    11 0 0000     00000xxx11           -               lwzx
    11 0 0001     00010xxx11           -               -
    11 0 0010     00100xxx11           -               stwx
    11 0 0011     00110xxx11           -               -
    11 0 0100     01000xxx11           -               lhzx
    11 0 0101     01010xxx11           -               lhax
    11 0 0110     01100xxx11           -               sthx
    11 0 0111     01110xxx11           -               -
    11 0 1000     10000xxx11           -               lfsx
    11 0 1001     10010xxx11           -               lfdx
    11 0 1010     10100xxx11           -               stfsx
    11 0 1011     10110xxx11           -               stfdx
    11 0 1100     11000xxx11           -               lfdpx
    11 0 1101     11010xxx11           -               lfiwax
    11 0 1110     11100xxx11           -               stfdpx
    11 0 1111     11110xxx11           -               stfiwx
    11 1 0000     00001xxx11           -               lwzux
    11 1 0001     00011xxx11           -               -
    11 1 0010     00101xxx11           -               stwux
    11 1 0011     00111xxx11           -               -
    11 1 0100     01001xxx11           -               lhzux
    11 1 0101     01011xxx11           -               lhaux
    11 1 0110     01101xxx11           -               sthux
    11 1 0111     01111xxx11           -               -
    11 1 1000     10001xxx11           -               lfsux
    11 1 1001     10011xxx11           -               lfdux
    11 1 1010     10101xxx11           -               stfsux
    11 1 1011     10111xxx11           -               stfdux
    11 1 1100     11001xxx11           -               -
    11 1 1101     11011xxx11           -               lfiwzx
    11 1 1110     11101xxx11           -               -
    11 1 1111     11111xxx11           -               -

Appendix E. Programming Examples

E.1 Unsigned Single-Precision BCD Arithmetic

addg6s can be used to add or subtract two BCD operands. In these examples it is assumed that r0 contains 0x666...666. (BCD data formats are described in Section 5.3 of Book I.)

Addition of the unsigned BCD operand in register RA to the unsigned BCD operand in register RB can be accomplished as follows.

    add    r1,RA,r0
    add    r2,r1,RB
    addg6s RT,r1,RB
    subf   RT,RT,r2      # RT = RA +BCD RB

Subtraction of the unsigned BCD operand in register RA from the unsigned BCD operand in register RB can be accomplished as follows. (In this example it is assumed that RB is not register 0.)

    addi   r1,RB,1
    nor    r2,RA,RA      # one's complement of RA
    add    r3,r1,r2
    addg6s RT,r1,r2
    subf   RT,RT,r3      # RT = RB -BCD RA

Additional instructions are needed to handle signed BCD operands, and BCD operands that occupy more than one register (e.g., unsigned BCD operands that have more than 16 decimal digits).

E.2 Signed Single-Precision BCD Arithmetic

Addition of the signed 15-digit BCD operand in register RA to the signed BCD operand in register RB can be accomplished as follows. If the signs of the operands are different, then the operand of smaller magnitude is subtracted from the operand of larger magnitude and the sign of the larger operand is preserved; otherwise the operands are added and the sign is preserved. The sign code is in the low order 4 bits of the operands and uses one of the standard encodings. (See Section 5.3 of Book I for a description of BCD and sign encodings.) This example assumes preferred sign option 1 (0b1100 is plus and 0b1101 is minus). For preferred sign option 2 (0b1111 is plus and 0b1101 is minus), replace the xori after the "SignedSub" label with "xori RA,RA,2".

Preserving the appropriate sign code is accomplished by zeroing the sign code of the other operand before performing a 16 digit BCD addition/subtraction. Other addends (one's complement or 6's) must leave the sign code position as zero.

(In this example r11 contains 0x6666 6666 6666 6660.)

    SignedSub:
        xori   RA,RA,1
    SignedAdd:
        xor    r5,RA,RB
        andi.  r5,r5,15      # compare sign codes
        cmpld  cr1,RA,RB     # compare magnitudes
        beq    cr0,samesign
        ble    cr1,BminusA
        # set up for RT = RA -BCD RB
        nor    r9,RB,RB      # one's complement of RB
        addi   r10,RA,16     # generate the carry in
        b      submag
    BminusA:
        # set up for RT = RB -BCD RA
        nor    r9,RA,RA      # one's complement of RA
        addi   r10,RB,16     # generate the carry in
    submag:
        rldicr r9,r9,0,59    # remove the sign code
        add    r8,r10,r9
        addg6s RT,r10,r9
        rldicr RT,RT,0,59    # remove generated 6 from sign position
        subf   RT,RT,r8
        b      done
    samesign:
        rldicr r8,RB,0,59    # remove the sign code
        add    r10,RA,r11    # add 6's
        add    r9,r10,r8
        addg6s RT,r10,RB
        subf   RT,RT,r9      # RT = RA +BCD RB
    done:
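The unsigned add and subtract sequences of Section E.1 can be checked off-target with a small software model. The Python sketch below is an illustration under stated assumptions, not a definition of the instruction: it assumes that addg6s produces, for each of the 16 four-bit digit positions, the value 6 when the binary addition of the two operands generates no carry out of that position and 0 otherwise. The function names (addg6s, bcd_add, bcd_sub) are hypothetical, and registers are modeled as 64-bit unsigned integers.

```python
MASK64 = (1 << 64) - 1
SIXES = 0x6666_6666_6666_6666   # value assumed to be held in r0 in Section E.1

def addg6s(ra, rb):
    """Assumed model of addg6s: 6 in every digit position whose binary
    addition of ra and rb produced no carry out, 0 elsewhere."""
    ra &= MASK64
    rb &= MASK64
    # Bit k of (ra ^ rb ^ sum) is the carry into bit k, so the carry out
    # of digit n (digit 0 is least significant) is bit 4*(n+1).
    carries = ra ^ rb ^ (ra + rb)
    result = 0
    for digit in range(16):
        if not (carries >> (4 * (digit + 1))) & 1:
            result |= 0x6 << (4 * digit)
    return result

def bcd_add(ra, rb):
    """Mirror of the E.1 addition sequence: RT = RA +BCD RB."""
    r1 = (ra + SIXES) & MASK64      # add    r1,RA,r0
    r2 = (r1 + rb) & MASK64         # add    r2,r1,RB
    rt = addg6s(r1, rb)             # addg6s RT,r1,RB
    return (r2 - rt) & MASK64       # subf   RT,RT,r2

def bcd_sub(ra, rb):
    """Mirror of the E.1 subtraction sequence: RT = RB -BCD RA."""
    r1 = (rb + 1) & MASK64          # addi   r1,RB,1
    r2 = ~ra & MASK64               # nor    r2,RA,RA
    r3 = (r1 + r2) & MASK64         # add    r3,r1,r2
    rt = addg6s(r1, r2)             # addg6s RT,r1,r2
    return (r3 - rt) & MASK64       # subf   RT,RT,r3
```

With these definitions, bcd_add(0x15, 0x27) yields 0x42 and bcd_sub(0x15, 0x42) yields 0x27. Adding 6 to every digit first forces a binary carry exactly where a decimal carry is due; the addg6s result then removes the 6 again from every digit that did not carry.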
E.3 Unsigned Extended-Precision BCD Arithmetic

Multiple precision BCD arithmetic requires additional code to add/subtract higher order digits and handle the carry between 16 digit groups. For example, the following sequence implements a 32-digit BCD add. In this example the contents of register R3 concatenated with the contents of R4 represent the first 32-digit operand and the contents of register R5 concatenated with the contents of R6 represent the second operand. The contents of register R3 concatenated with the contents of register R4 represent the result.

(In this example r0 contains 0x6666 6666 6666 6666.)

    add    r10,R4,r0
    addc   r9,r10,R6     # generate the carry
    addg6s R4,r10,R6
    subf   R4,R4,r9      # RT1 = RA1 +BCD RB1
    addze  R5,R5         # propagate the carry
    add    r10,R3,r0
    add    r9,r10,R5
    addg6s R3,r10,R5
    subf   R3,R3,r9      # RT0 = RA0 +BCD RB0

Note that an extra instruction (addze) is required to propagate the carry so that the same value is used in the subsequent add and addg6s.

The following sequence implements a 32-digit BCD subtraction. In this example the first operand in R3 and R4 is subtracted from the second operand in R5 and R6. The result is in R3 and R4.

    addi   r10,R6,1
    nor    r9,R4,R4      # one's complement of RA1
    addc   r8,r10,r9     # generate the carry
    addg6s R4,r10,r9
    subf   R4,R4,r8      # RT1 = RB1 -BCD RA1
    addze  r10,R5        # propagate the carry
    nor    r9,R3,R3      # one's complement of RA0
    add    r8,r10,r9
    addg6s R3,r10,r9
    subf   R3,R3,r8      # RT0 = RB0 -BCD RA0

Book III-E: Power ISA Operating Environment Architecture - Embedded Environment [Category: Embedded]

Chapter 1. Introduction
892 1.3 Document Conventions . . . . . . . . 889 1.6.1 Context Synchronization . . . . . . 892 1.3.1 Definitions and Notation . . . . . . 889 1.6.2 Execution Synchronization . . . . . 893 1.3.2 Reserved Fields. . . . . . . . . . . . . 891 1.4 General Systems Overview . . . . . 891 1.1 Overview interrupt" or "Unimplemented Operation exception type Program interrupt", as appropriate. Chapter 1 of Book I describes computation modes, For "system instruction storage error handler" sub- document conventions, a general systems overview, stitute "Instruction Storage interrupt" or "Instruction instruction formats, and storage addressing. This chap- TLB Error", as appropriate. ter augments that description as necessary for the Power ISA Operating Environment Architecture. For "system privileged instruction error handler" substitute "Privileged Instruction exception type Program interrupt". 1.2 32-Bit Implementations For "system service program" substitute "System Call interrupt". Though the specifications in this document assume a 64-bit implementation, 32-bit implementations are per- For "system trap handler" substitute "Trap type mitted as described in Appendix C, "Guidelines for Program interrupt". 64-bit Implementations in 32-bit Mode and 32-bit Imple- mentations" on page 1111. 1.3.1 Definitions and Notation The definitions and notation given in Books I and II are 1.3 Document Conventions augmented by the following. Threaded processor, single-threaded processor, The notation and terminology used in Book I apply to thread this Book also, with the following substitutions. For "system alignment error handler" substitute A threaded processor implements one or more "Alignment interrupt". "threads", where a thread corresponds to the Book I/II concept of "processor". 
That is, the definition of For "system auxiliary processor enabled exception "thread" is the same as the Book I definition of error handler" substitute "Auxiliary Processor "processor", and "processor" as used in Books I Enabled Exception type Program interrupt", and II can be thought of as either a single-threaded For "system data storage error handler" substitute processor or as one thread of a multi-threaded "Data Storage interrupt" or Data TLB Error inter- processor. The only unqualified uses of "proces- rupt" as appropriate. sor" in Book III are in resource names (e.g. Pro- cessor Identification Register); such uses should For "system error handler" substitute "interrupt". be regarded as meaning "threaded processor". For "system floating-point enabled exception error The threads of a multi-threaded processor typically handler" substitute "Floating-Point Enabled Excep- share certain resources, such as the hardware tion type Program interrupt". components that execute certain kinds of instruc- tions (e.g., Fixed-Point instructions), certain For "system illegal instruction error handler" substi- tute "Illegal Instruction exception type Program Chapter 1. Introduction 889 Version 2.06 caches, the address translation mechanism, and alteration, the result of accessing the associ- certain hypervisor resources. ated storage is undefined. In other cases of violation of a rule that is Thread enabled, thread disabled stated using the word "must", the results are A thread can be enabled or disabled. When boundedly undefined unless otherwise stated. enabled, the thread can prefetch and execute instructions; when disabled, prefetched instruc- Programming Note tions are discarded, and the thread cannot Contrary to the general principle of partition prefetch or execute instructions. 
isolation, the result of accessing storage Performed associated with violations of the require- ments for storage control bit values and their An explicit modification, by a thread T1, of a alteration is specified as "undefined". This is shared SPR (using mtspr) or an entry in a shared the only case where a guest operating sys- TLB (using tlbwe), is performed with respect to tem can cause "undefined" results. Embed- thread T2 when a read of the shared SPR or TLB ded architecture does this as a hardware entry (using mfspr or tlbre respectively) by T2 will simplification because of the limited amount return the result of the modification (or of a subse- of code involved with storage control bits and quent modification). For T2, the effects of such a because operating systems used in the modification having been performed with respect Embedded environment can be tested and to T2 are the same as if the mtspr or tlbwe were then controlled. in T2's instruction stream at the point at which the modification was performed with respect to T2. T1 context of a program and T2 may be any threads that share the SPR or TLB with one another, and may be the same The state (e.g., privilege and relocation) in which thread. the program executes. The context is controlled by the contents of certain System Registers, such as real page the MSR, of certain lookaside buffers, such as the A unit of real storage that is aligned at a boundary TLB, and of other resources. that is a multiple of its size. The real page size may exception range from 1KB to 1TB [MAV=1.0] or to 2TB [MAV=2.0]. An error, unusual condition, or external signal, that may set a status bit and may or may not cause an [MAV=x.x] interrupt, depending upon whether the correspond- Instructions and facilities are considered part of all ing interrupt is enabled. MMU architecture versions unless otherwise marked. 
If a facility or section is marked with a specific MMU Architecture version x.x, that facility interrupt or all material in that section and its subsections The act of changing the machine state in response are considered part of the specific MMU architec- to an exception, as described in Chapter ture version. 7. "Interrupts and Exceptions" on page 1013. "must" trap interrupt If the Embedded.Hypervisor category is not sup- An interrupt that results from execution of a Trap ported and privileged software violates a rule that instruction. is stated using the word "must", the results are undefined. If the Embedded.Hypervisor category is Additional exceptions to the sequential execution supported, the following applies. model, beyond those described in the section enti- If hypervisor software violates a rule that is tled "Instruction Fetching" in Book I, are the follow- stated using the word "must" (e.g., "this field ing. must be set to 0"), and the rule pertains to the - A reset or Machine Check interrupt may contents of a hypervisor resource, to execut- occur. The determination of whether an ing an instruction that can be executed only in instruction is required by the sequential exe- hypervisor state, or to accessing storage cution model is not affected by the potential using a TLB entry with a TGS value of 0, the occurrence of a reset or Machine Check inter- results are undefined, and may include alter- rupt. (The determination is affected by the ing resources belonging to other partitions, potential occurrence of any other kind of inter- causing the system to "hang", etc. rupt.) If supervisor software violates the require- ments for storage control bit values or their - A context-altering instruction is executed (Chapter 12. "Synchronization Requirements 890 Power ISATM Book III-E Version 2.06 for Context Alterations" on page 1099). 
The Unless otherwise stated, no defined field other context alteration need not take effect until the than the one(s) specifically being updated are required subsequent synchronizing operation modified. has occurred. Contents of reserved fields are either preserved or hardware written as zero. Any combination of hard-wired implementation, The reader should be aware that reading and writing of emulation assist, or interrupt for software assis- some of these registers (e.g., the MSR) can occur as a tance. In the last case, the interrupt may be to an side effect of processing an interrupt and of returning architected location or to an implementation- from an interrupt, as well as when requested explicitly dependent location. Any use of emulation assists by the appropriate instruction (e.g., mtmsr instruction). or interrupts to implement the architecture is imple- mentation-dependent. hypervisor privileged (or hypervisor-privileged) 1.4 General Systems Overview If category E.HV is implemented, this term The hardware contains the sequencing and processing describes an instruction, register, or facility that is controls for instruction fetch, instruction execution, and available only when the thread is in hypervisor interrupt action. Most implementations also contain state. Otherwise, this term describes an instruc- data and instruction caches. Instructions fall into the tion, register, or facility that is available only when following classes: the thread is in supervisor state. instructions executed in the Branch Facility privileged state and supervisor state instructions executed in the Fixed-Point Facility instructions executed in the Floating-Point Facility Used interchangeably to refer to a state in which instructions executed in the Vector Facility privileged facilities are available. 
instructions executed in an Auxiliary Processor guest state other instructions A state used to run software under control of a Almost all instructions executed in the Branch Facility, hypervisor program in which hypervisor-privileged Fixed-Point Facility, Floating-Point Facility, and Vector facilities are not available. Facility are nonprivileged and are described in Book I. guest supervisor state Book I may describe additional nonprivileged instruc- tions (e.g., Book II describes some nonprivileged A state which is in both the guest state and the instructions for cache management). Instructions exe- supervisor state. cuted in an Auxiliary Processor are implementation- problem state and user mode dependent. Instructions related to the supervisor mode, control of hardware resources, control of the storage Used interchangeably to refer to a state in which hierarchy, and all other privileged instructions are privileged facilities are not available. described here or are implementation-dependent. directed In a hypervised system, the attribute of an interrupt 1.5 Exceptions that execution occurs in the guest supervisor or hypervisor state as described in Section 2.3.1, The following augments the exceptions defined in Book "Directed Interrupts". I that can be caused directly by the execution of an /, //, ///, ... denotes a field that is reserved in an instruction: instruction, in a register, or in an architected stor- the execution of a floating-point instruction when age table. MSRFP=0 (Floating-Point Unavailable interrupt) ?, ??, ???, ... denotes a field that is implementa- execution of an instruction that causes a debug tion-dependent in an instruction, in a register, or in event (Debug interrupt). an architected storage table. 
the execution of an auxiliary processor instruction when the auxiliary processor is unavailable (Auxil- 1.3.2 Reserved Fields iary Processor Unavailable interrupt) Some fields of certain architected registers may be the execution of a Vector, SPE, or Embedded written to automatically by the hardware, e.g., Floating-Point instruction when MSRSPV=0 (SPE/ Reserved bits in System Registers. When the hardware Embedded Floating-Point/Vector Unavailable writes to such a register, the following rules are interrupt) obeyed. Chapter 1. Introduction 891 Version 2.06 1.6 Synchronization Programming Note The context established by a context synchronizing The synchronization described in this section refers to operation includes modifications to shared the state of the thread that is performing the synchroni- resources by other threads, that were performed zation. with respect to the context synchronizing thread before the operation was initiated. Examples of 1.6.1 Context Synchronization potentially shared resources include TLBs and SPRs such as the LPIDR. An instruction or event is context synchronizing if it sat- isfies the requirements listed below. Such instructions and events are collectively called context synchronizing Programming Note operations. The context synchronizing operations A context synchronizing operation is necessarily include the dnh instruction, the isync instruction, the execution synchronizing; see Section 1.6.2. System Linkage instructions, the mtmsr instruction, Unlike the Synchronize instruction, a context syn- and most interrupts (see Section 7.1). Also, the combi- chronizing operation does not affect the order in nation of disabling and enabling a thread is context- which storage accesses are performed. synchronizing for the thread being enabled (See Sec- tion 3). Item 2 permits a choice only for isync (and sync; 1. 
The operation causes instruction dispatching (the see Section 1.6.2) because all other execution syn- issuance of instructions by the instruction fetching chronizing operations also alter context. mechanism to any instruction execution mecha- nism) to be halted. 1.6.2 Execution Synchronization 2. The operation is not initiated or, in the case of dnh [Category: Embedded.Enhanced Debug], isync An instruction is execution synchronizing if it satisfies does not complete, until all instructions that pre- items 2 and 3 of the definition of context synchroniza- cede the operation have completed to a point at tion (see Section 1.6.1). sync is treated like isync with which they have reported all exceptions they will respect to item 2. The execution synchronizing instruc- cause. tions are sync, mtmsr and all context synchronizing instructions. 3. The operation ensures that the instructions that precede the operation will complete execution in Programming Note the context (privilege, relocation, storage protec- tion, etc.) in which they were initiated. 4. If the operation directly causes an interrupt (e.g., Unlike a context synchronizing operation, an exe- sc directly causes a System Call interrupt) or is an cution synchronizing instruction does not ensure interrupt, the operation is not initiated until no that the instructions following that instruction will exception exists having higher priority than the execute in the context established by that instruc- exception associated with the interrupt (see tion. This new context becomes effective some- Section 7.9, "Exception Priorities" on page 1059). time after the execution synchronizing instruction completes and before or at a subsequent context 5. The operation ensures that the instructions that fol- synchronizing operation. low the operation will be fetched and executed in the context established by the operation. 
(This requirement dictates that any prefetched instruc- tions be discarded and that any effects and side effects of executing them out-of-order also be dis- carded, except as described in Section 6.5, "Per- forming Operations Out-of-Order".) The operation ensures that all explicit modifica- tions of shared SPRs (mtspr) and shared TLBs (tlbwe), caused by instructions that precede the operation, have been performed with respect to all other threads that share the SPR or TLB. 892 Power ISATM Book III-E Version 2.06 Chapter 2. Logical Partitioning [Category: Embedded.Hypervisor] 2.1 Overview. . . . . . . . . . . . . . . . . . . . 895 2.3 Interrupts and Exceptions . . . . . . . 896 2.2 Registers . . . . . . . . . . . . . . . . . . . 896 2.3.1 Directed Interrupts . . . . . . . . . . . 896 2.2.1 Register Mapping . . . . . . . . . . . 896 2.3.2 Hypervisor Service Interrupts. . . 897 2.2.2 Logical Partition Identification Regis- 2.4 Instruction Mapping. . . . . . . . . . . . 897 ter (LPIDR) . . . . . . . . . . . . . . . . . . . . . 896 2.1 Overview ties associated with Logical Partitioning are described within the appropriate sections within this Book. The Embedded.Hypervisor category permits threads An instruction that is hypervisor privileged must be exe- and portions of real storage to be assigned to logical cute in the hypervisor state (MSRGS PR = 0b00). If an collections called partitions, such that a program exe- attempt is made to execute a hypervisor-privileged cuting in one partition cannot interfere with any pro- instruction in the guest supervisor state (MSRGS PR = gram executing in a different partition. This isolation 0b10), an Embedded Hypervisor Privilege exception can be provided for both problem state and privileged occurs. A register that is hypervisor privileged may only state programs, by using a layer of trusted software, be accessed in the hypervisor state (MSRGS PR = called a hypervisor program (or simply a "hypervisor"), 0b00). 
If a hypervisor-privileged register is accessed in and the resources provided by this facility to manage the guest supervisor state (MSRGS PR = 0b10), an system resources. The collection of software that runs Embedded Hypervisor Privilege exception occurs. in a given partition and its associated resources is called a guest. The guest normally includes an operat- When MSRGS PR = 0b01 or MSRGS PR = 0b11, the ing system (or other system software) running in privi- thread is in problem (user) state. The resources leged state and its associated processes running in the (instructions and registers) that are available are gener- problem state under the management of the hypervi- ally the same when MSRPR = 0b1 regardless of the sor. The thread is in the guest state when a guest is state of MSRGS, however when MSRGS PR = 0b11 executing and is in the hypervisor state when the some interrupts are directed to the guest supervisor hypervisor is executing. The thread is executing in the state. When MSRGS PR = 0b01 interrupts are always guest state when MSRGS=1. directed to the hypervisor (see Section 2.3.1, "Directed Interrupts"). The number of partitions supported is implementation- dependent. Category Embedded.Hypervisor changes the operating system programming model to allow for easier virtual- A thread is assigned to one partition at any given time. ization, while retaining a default backward compatible A thread can be assigned to any given partition without mode in which an operating system written for hard- consideration of the physical configuration of the sys- ware not implementing this category will still operate as tem (e.g. shared registers, caches, organization of the before without using the Logical Partitioning facilities. storage hierarchy), except that threads that share cer- tain hypervisor resources may need to be assigned to Category Embedded.Hypervisor requires that category the same partition. 
Embedded.Processor Control and category Embedded.MMU Type FSL are also supported.

Additionally, certain resources may be utilized by the guest at the discretion of the hypervisor. Such usage may cause interference between partitions and the hypervisor should allocate those resources accordingly. The primary registers and facilities used to control Logical Partitioning are listed below and described in the following subsections. Other facili-

2.2 Registers

Registers specific to Logical Partitioning and hypervisor control are defined in this section. Other registers which are hypervisor privileged or have hypervisor-only fields appear and are described in other sections in this Book.

2.2.1 Register Mapping

To facilitate better performance for operating systems executing in the guest supervisor state, some Special Purpose Register (SPR) accesses are redirected to analogous guest-state SPRs. An SPR is said to be mapped if this redirection takes place when executing in guest supervisor state. These guest-state SPRs separate performance critical state of the hypervisor and the operating system executing in guest supervisor state. The mapping of these register accesses allows the same programming model to be used for an operating system running in the guest supervisor state or in the hypervisor state.

For example, when a mtspr SRR0,r5 instruction is executed in guest supervisor state, the access to SRR0 is mapped to GSRR0. This produces the same operation as executing mtspr GSRR0,r5.

Programming Note
Since accesses to the mapped SPRs are automatically mapped to the appropriate guest-accessible SPR, guest supervisor software should use the original SPRs for accessing these registers (i.e. SRR0, not GSRR0). This facilitates using the same code in hypervisor or guest state.

SPR accesses that are mapped in guest supervisor state are listed in Table 1.

    SPR Accessed    SPR Mapped to    Type of Access
    SRR0            GSRR0            mtspr, mfspr
    SRR1            GSRR1            mtspr, mfspr
    EPR             GEPR             mfspr
    ESR             GESR             mtspr, mfspr
    DEAR            GDEAR            mtspr, mfspr
    PIR             GPIR             mfspr
    SPRG0           GSPRG0           mtspr, mfspr
    SPRG1           GSPRG1           mtspr, mfspr
    SPRG2           GSPRG2           mtspr, mfspr
    SPRG3           GSPRG3           mtspr, mfspr (see note 1)

    1 If an implementation permits user access to SPRG3, user access to GSPRG3 should also be permitted. In this case, the access is mapped when executing in guest state regardless of the setting of MSR[PR].

Table 1: Mapped SPRs

2.2.2 Logical Partition Identification Register (LPIDR)

The Logical Partition Identification Register (LPIDR) contains the Logical Partition ID (LPID) currently in use for the thread. The format of the LPIDR is shown in Figure 1 below.

    [Figure 1. Logical Partition Identification Register: field LPIDR, bits 32:63]

The LPIDR is part of the virtual address. During address translation, its content is compared to the TLPID field in the TLB entry to determine a matching TLB entry.

The LPIDR is hypervisor privileged.

The 12 least significant bits of LPIDR contain the LPID value. All 12 bits do not need to be implemented. Unimplemented bits should read as zero. The number of implemented bits is reported in MMUCFG[LPIDSIZE].

2.3 Interrupts and Exceptions

2.3.1 Directed Interrupts

Category Embedded.Hypervisor introduces new interrupt semantics. Interrupts are directed to either the guest state or the hypervisor state. The state to which interrupts are directed determines which SPRs are used to form the vector address, which save/restore registers are used to capture the thread state at the time of the interrupt, and which registers are used to post exception status.

- If IVORs [Category: Embedded.Phased-Out] are supported, interrupts directed to the guest state use the Guest Interrupt Vector Prefix Register (GIVPR) to determine the high-order 48 bits of the vector address and use Guest Interrupt Vector Registers (GIVORs) to provide the low-order 16 bits (of which the last 4 bits are 0).
- If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, interrupts directed to the guest state use the Guest Interrupt Vector Prefix Register (GIVPR) to determine the high-order 52 bits of the vector address and use the 12-bit exception vector offsets (described in Section 7.2.15) to provide the low-order 12 bits (of which the last 5 bits are 0).
- If IVORs [Category: Embedded.Phased-Out] are supported, interrupts directed to the embedded hypervisor state use the IVPR for the upper 48 bits of the address and the IVORs for the lower 16 bits of the address.
- If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, interrupts directed to the embedded hypervisor state use one of the following for the interrupt vector address.
    - If the Machine Check Interrupt Vector Prefix Register (see Section 7.2.18.4) is supported and the interrupt is a Machine Check, MCIVPR provides the high-order 52 bits of the vector address and the 12-bit exception vector offsets (described in Section 7.2.15) provide the low-order 12 bits (of which the last 5 bits are 0).
    - Otherwise, IVPR provides the high-order 52 bits of the vector address and the 12-bit exception vector offsets (described in Section 7.2.15) provide the low-order 12 bits (of which the last 5 bits are 0).

Interrupts that are directed to the guest state use GSRR0/GSRR1 registers to save the context at interrupt time. Interrupts directed to the embedded hypervisor state use SRR0/SRR1, with the exception of Guest Processor Doorbell interrupts which use GSRR0/GSRR1.

Most interrupts are directed to the hypervisor state. Some interrupts may be directed to the guest state if the thread is currently executing in guest state (MSR[GS] = 1) and the interrupt is configured by the EPCR to be directed to the guest state or the interrupt is a System Call Interrupt.

An Instruction Storage or Data Storage interrupt resulting from a TLB Ineligible exception is always directed to the hypervisor state. Debug interrupts, Critical interrupts, and Machine Check interrupts are always directed to the hypervisor state.

2.3.2 Hypervisor Service Interrupts

Two interrupts exist as mechanisms for the hypervisor to provide services to the guest.

The Embedded Hypervisor Privilege Interrupt occurs when guest supervisor state attempts execution of a hypervisor-privileged instruction or attempts to access a hypervisor-privileged resource. This can be used by the hypervisor to provide virtualization services for the guest. The Embedded Hypervisor Privilege Interrupt is described in Section 7.6.28, "Embedded Hypervisor Privilege Interrupt [Category: Embedded.Hypervisor]". An Embedded Hypervisor Privilege Interrupt will also occur if an ehpriv instruction is executed regardless of the thread state.

The Embedded Hypervisor System Call Interrupt occurs when an sc instruction is executed and LEV=1. The sc instruction is described in Section 4.3.1, "System Linkage Instructions".

2.4 Instruction Mapping

When executing in the guest supervisor state (MSR[GS,PR] = 0b10), execution of an rfi instruction is mapped to rfgi and the rfgi instruction is executed in place of the rfi. The mapping of these instructions allows the same programming model to be used for an operating system running in the guest supervisor state or in the hypervisor state.

Chapter 3. Thread Control [Category: Embedded Multi-Threading]

    3.1 Overview
    3.2 Thread Identification Register (TIR)
    3.3 Thread Enable Register (TEN)
    3.4 Thread Enable Status Register (TENSR)
    3.5 Disabling and Enabling Threads
    3.6 Sharing of Multi-Threaded Processor Resources
    3.7 Thread Management Facility [Category: Embedded Multithreading.Thread Management]
        3.7.1 Initialize Next Instruction Address Registers
        3.7.2 Thread Management Instructions

3.1 Overview

The Thread Control facility permits the hypervisor to control and monitor the execution, priority, and other aspects of threads.

3.2 Thread Identification Register (TIR)

The layout of the TIR is shown in Figure 2 below.

    [Figure 2. Thread Identification Register: field TIR, bits 0:63]

The TIR is a 64-bit read-only register that can be used to distinguish the thread from other threads on a multi-threaded processor. Threads are numbered sequentially, with valid values ranging from 0 to t-1, where t is the number of threads implemented. A thread for which TIR = n is referred to as "thread n."

The TIR is hypervisor privileged.

3.3 Thread Enable Register (TEN)

The layout of the TEN is shown in Figure 3 below.

    [Figure 3. Thread Enable Register: field TEN, bits 0:63]

The TEN is a 64-bit register. For t < T, where T is the number of threads supported by the implementation, bit 63-t corresponds to thread t. When TEN[63-t] is 0, thread t is disabled. When TEN[63-t] is 1, thread t is enabled. Software is permitted to write any value to bits 0:63-T; a subsequent reading of these bits always returns 0.

The TEN can be accessed using two SPR numbers.

- When SPR 438 (Thread Enable Set, or TENS) is written, threads for which the corresponding bit in TENS is 1 are enabled; threads for which the corresponding bit in TENS is 0 are unaffected.
- When SPR 439 (Thread Enable Clear, or TENC) is written, threads for which the corresponding bit in TENC is 1 are disabled; threads for which the corresponding bit in TENC is 0 are unaffected.

When each SPR is read, the current value of the TEN is returned.

The TEN is hypervisor privileged.

Programming Note
Software can determine the number of threads supported by the implementation by setting each progressively higher-order bit to 1, and testing whether a subsequent read returns a 1. Because this operation enables the thread, software should ensure that an acceptable instruction sequence is located at the thread's starting effective address. (See Section 8.3, "Thread State after Reset".)

3.4 Thread Enable Status Register (TENSR)

The layout of the TENSR is shown in Figure 4 below.

    [Figure 4. Thread Enable Status Register: field TENSR, bits 0:63]

The TENSR is a 64-bit read-only register. Bit 63-t of the TENSR corresponds to thread t. The contents of the TENSR are equal to the contents of the TEN, except that when TEN[63-t] changes from 1 to 0, TENSR[63-t] does not change from 1 to 0 until thread t is disabled.

The TENSR is hypervisor privileged.

3.5 Disabling and Enabling Threads

The combination of disabling and enabling a thread is context-synchronizing for the thread being enabled. Steps 1-3 of context synchronization (see Section 1.6.1) occur as a result of the thread being disabled, and updates to SPRs and shared TLBs caused by preceding instructions executed by the thread occur. When all updates to these shared SPRs and shared TLBs have been performed with respect to all other threads on a multi-threaded processor, the TENSR bit corresponding to the disabled thread is set to 0.

Asynchronous interrupts that occur after the thread is disabled are pended until the thread is enabled.

When a thread is enabled by setting the TEN bit corresponding to the thread to 1, the thread begins execution at the next instruction to be executed when it was disabled or at the effective address specified by the INIA [Category: Embedded Multi-threading.Thread Management] if the INIA corresponding to the thread was written while the thread was disabled.

Programming Note
The architecture provides no method to make a thread's updates to shared storage visible to other threads before it is disabled. Similarly, the architecture provides no method to make updates to shared storage made while a thread is disabled visible to a thread when it is subsequently enabled.

Programming Note
When thread T1 disables other threads, Tn, it sets the TEN bits corresponding to Tn to 0s. In order to ensure that all updates to shared SPRs and shared TLBs caused by instructions being performed by threads Tn have been performed with respect to all threads on a multi-threaded processor, thread T1 reads the TENSR until all the bits corresponding to the disabled threads, Tn, are 0s.

3.6 Sharing of Multi-Threaded Processor Resources

The PVR and TEN must be shared among all threads of a multi-threaded processor. Various other resources are allowed to be shared among threads. Programs that modify shared resources must be aware of such sharing, and must allow for the fact that changes to these resources may affect more than one thread.

Resources that may be shared are grouped into the following five groups of related resources. If any of the resources in a group are shared among threads, all of the resources in the group must be shared.

- ATB, ATBL, ATBU [Category: ATB]
- IVORs [Category: Phased-Out]
- IVPR
- TB, TBL, TBU
- MMUCFG, MMUCSR0, TLB, TLBnCFG, TLBnCFG2, TLBnEPT, [Category: Embedded.Hypervisor.LRAT]: LRAT, LRATCFG, LRATCFG2

If the implementation requires all threads to be in the same partition, the following additional groups of resources may be shared. If any of the resources in a group are shared among threads, all of the resources in the group must be shared.

- DAC1, DAC2, DVC1, DVC2, IAC1, IAC2, IAC3
- EHCSR [Category: Embedded.Hypervisor]
- GIVORs [Category: Phased-Out]
- GIVPR [Category: Embedded.Hypervisor]
- LPIDR [Category: Embedded.Hypervisor]

Certain implementation-dependent registers, instruction and Data Caches, and implementation-dependent look-aside information may also be shared.

The set of resources that are shared is implementation-dependent.

Programming Note
When software executing in thread T1 writes a new value in an SPR (mtspr) that is shared with other threads, or explicitly writes to an entry in a shared TLB (tlbwe), either of the following sequences of operations can be performed in order to ensure that the write operation has been performed with respect to other threads.

Sequence 1
- Disable all other threads (see Section 3.5)
- Write to the shared SPR (mtspr) or to the shared TLB (tlbwe)
- Perform a context synchronizing operation
- Enable the previously-disabled threads

In the above sequence, the context synchronizing operation ensures that the write operation has been performed with respect to all other threads that share the SPR or TLB; the enabling of other threads ensures that subsequent instructions of the enabled threads use the new SPR or TLB value since enabling a thread is a context synchronizing operation.

Sequence 2
- All threads are put in hypervisor state and begin polling a storage flag
- The thread updating the SPR or TLB does the following:
    - Writes to the SPR (mtspr) or the TLB (tlbwe)
    - Sets a storage flag indicating the write operation was done
    - Performs a context synchronizing operation
- When other threads see the updated storage flag, they perform context synchronizing operations.

In the above sequence, the context synchronizing operation by the thread that writes to the SPR or TLB ensures that the write operation has been performed with respect to all other threads that share the SPR or TLB; the context synchronizing operation by the other threads ensures that subsequent instructions for these threads use the updated value.

3.7 Thread Management Facility [Category: Embedded Multi-threading.Thread Management]

The thread management facility enables software to control features related to threads. The capabilities provided allow software, for a disabled thread, to specify the address of the instruction to be executed when the thread is enabled. Other implementation-dependent capabilities may also be provided.

3.7.1 Initialize Next Instruction Address Registers

The Initialize Next Instruction Address (INIAn, where n = 0..63) registers are 64-bit write-only registers that can be used to specify the effective address of the instruction to be executed when a currently-disabled thread is enabled. INIAn corresponds to thread n.

    [Figure 5. Initialize Next Instruction Address Register: Instruction Address in bits 0:62, 0 in bit 63]

Bit 63 is always 0. Bit 62 is part of the Instruction Address if Category: VLE is supported; otherwise bit 62 is always 0.

When the INIA is written in 32-bit mode, bits 0:31 are set to 0s.

The initial value of all INIAs is x'FFFF FFFF FFFF FFFC'.
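The register mapping of Section 2.2.1 (Table 1) can be sketched as a lookup, shown here purely as an illustration. The structure, the function name `map_spr`, and the use of strings for SPR mnemonics are assumptions made for the example. The sketch applies the mapping only in guest supervisor state (MSR[GS]=1, MSR[PR]=0), ignores the SPRG3 user-access footnote, and does not model any exception that an unmapped access might raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of Table 1. mtspr_mapped is false for rows listed as mfspr only. */
struct spr_map {
    const char *accessed;
    const char *mapped;
    bool mtspr_mapped;
};

static const struct spr_map mapped_sprs[] = {
    {"SRR0",  "GSRR0",  true},  {"SRR1",  "GSRR1",  true},
    {"EPR",   "GEPR",   false}, {"ESR",   "GESR",   true},
    {"DEAR",  "GDEAR",  true},  {"PIR",   "GPIR",   false},
    {"SPRG0", "GSPRG0", true},  {"SPRG1", "GSPRG1", true},
    {"SPRG2", "GSPRG2", true},  {"SPRG3", "GSPRG3", true},
};

/* Return the SPR that is actually accessed. gs and pr model MSR[GS] and MSR[PR]. */
static const char *map_spr(const char *spr, bool gs, bool pr, bool is_mtspr)
{
    assert(spr != NULL);
    if (!gs || pr)
        return spr; /* not in guest supervisor state: no mapping in this sketch */
    for (size_t i = 0; i < sizeof mapped_sprs / sizeof mapped_sprs[0]; i++) {
        const struct spr_map *m = &mapped_sprs[i];
        if (strcmp(spr, m->accessed) == 0 && (!is_mtspr || m->mtspr_mapped))
            return m->mapped;
    }
    return spr; /* not a mapped access */
}
```

A guest operating system therefore keeps using the SRR0 mnemonic, and the model returns GSRR0 only when the access is made in guest supervisor state.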
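The two vector-address schemes described in Section 2.3.1 differ only in how many high-order bits the prefix register supplies. The sketch below shows the bit arithmetic under that reading; the function names are hypothetical, the registers are modelled as plain 64-bit values, and the offset passed to the fixed-offset variant is assumed to come from the table in Section 7.2.15.

```c
#include <assert.h>
#include <stdint.h>

/* IVOR scheme [Category: Embedded.Phased-Out]: the prefix (IVPR or GIVPR)
 * supplies the high-order 48 bits, the IVOR or GIVOR supplies bits 48:59,
 * and the last 4 bits are 0. */
static uint64_t vector_from_ivor(uint64_t prefix, uint64_t ivor)
{
    return (prefix & 0xFFFFFFFFFFFF0000ull) | (ivor & 0xFFF0ull);
}

/* Fixed-offset scheme [Category: Embedded.Phased-In]: the prefix (IVPR,
 * GIVPR or MCIVPR) supplies the high-order 52 bits and a 12-bit offset
 * whose last 5 bits are 0 supplies the rest. */
static uint64_t vector_from_fixed_offset(uint64_t prefix, uint32_t offset)
{
    assert((offset & ~0xFE0u) == 0); /* 12-bit offset, low 5 bits zero */
    return (prefix & 0xFFFFFFFFFFFFF000ull) | offset;
}
```

Which prefix register is passed in (guest, hypervisor, or machine check) is decided by where the interrupt is directed, which this sketch leaves to the caller.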
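The set and clear semantics of the two TEN SPR numbers described in Section 3.3 can be modelled in a few lines. This is a software sketch only: the `ten_model` structure and function names are hypothetical, thread t is taken to be bit 63-t in the document's numbering (the bit with value `1 << t`), and the asynchronous behaviour of TENSR is not modelled.

```c
#include <assert.h>
#include <stdint.h>

struct ten_model {
    uint64_t ten;      /* current TEN contents */
    unsigned nthreads; /* T, the number of threads implemented (1..64) */
};

/* Mask of the bits that correspond to implemented threads. */
static uint64_t ten_implemented(const struct ten_model *m)
{
    assert(m->nthreads >= 1 && m->nthreads <= 64);
    return m->nthreads == 64 ? ~0ull : (1ull << m->nthreads) - 1ull;
}

/* Write to TENS (SPR 438): 1 bits enable threads, 0 bits leave them unaffected. */
static void write_tens(struct ten_model *m, uint64_t value)
{
    m->ten |= value & ten_implemented(m); /* bits 0:63-T always read back as 0 */
}

/* Write to TENC (SPR 439): 1 bits disable threads, 0 bits leave them unaffected. */
static void write_tenc(struct ten_model *m, uint64_t value)
{
    m->ten &= ~(value & ten_implemented(m));
}

/* Reading either SPR number returns the current TEN. */
static uint64_t read_ten(const struct ten_model *m)
{
    return m->ten;
}

static int thread_enabled(const struct ten_model *m, unsigned t)
{
    assert(t < m->nthreads);
    return (int)((m->ten >> t) & 1u);
}
```

Splitting the register into a set port and a clear port means one thread can enable or disable others without a read-modify-write of the shared TEN.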
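The write rules for the INIA registers in Section 3.7.1 amount to masking the source value. A minimal sketch follows, assuming the value is taken from a 64-bit source register; the function names, the `mode64` and `vle` flags, and the array used to hold the 64 registers are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INIA_INITIAL 0xFFFFFFFFFFFFFFFCull /* initial value stated in the text */

/* Value an INIA holds after a write of rs.
 * mode64: the write is performed in 64-bit mode.
 * vle:    Category VLE is supported, so bit 62 is part of the address. */
static uint64_t inia_value(uint64_t rs, bool mode64, bool vle)
{
    uint64_t v = rs;
    if (!mode64)
        v &= 0x00000000FFFFFFFFull; /* bits 0:31 are set to 0s in 32-bit mode */
    v &= ~1ull;                     /* bit 63 is always 0 */
    if (!vle)
        v &= ~2ull;                 /* bit 62 is always 0 without VLE */
    return v;
}

/* Write INIAn for thread n in a software model of the 64 registers. */
static void inia_write(uint64_t inia[64], unsigned n, uint64_t rs,
                       bool mode64, bool vle)
{
    assert(n < 64);
    inia[n] = inia_value(rs, mode64, vle);
}
```

Writing all ones in 64-bit mode without VLE yields the same pattern as the stated initial value, which is a quick consistency check on the masks.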
Thread Control [Category: Embedded Multi-Threading] 901 Version 2.06 3.7.2 Thread Management Instruc- tions Move To Thread Management Register XFX-form mttmr TMR,RS 31 RS tmr 494 / 0 6 11 21 31 n tmr5:9 || tmr0:4 TMR(n) (RS) The TMR field denotes a Thread Management Regis- ter, encoded as shown in the table below. The contents of register RS are placed into the designated Thread Management Register. TMR1 Register decimal tmr5:9 tmr0:4 Name 320 01010 00000 INIA0 ... ................... ... 383 10111 11111 INIA63 1 Note that the order of the two 5-bit halves of the SPR number is reversed. Figure 6. Thread Management Register Numbers All values of the TMR field not shown in Figure 6 are implementation-specific. An implementation only provides INIA registers corre- sponding to its implemented threads. Execution of this instruction specifying a TMR number that is not defined for the implementation causes an Illegal Instruction type Program interrupt if MSRGS PR=0b00. This instruction is hypervisor privileged. Special Registers Altered: See above 902 Power ISATM Book III-E Version 2.06 Chapter 4. Branch Facility 4.1 Branch Facility Overview . . . . . . . 903 4.2.3 Embedded Processor Control Regis- 4.2 Branch Facility Registers . . . . . . . 903 ter (EPCR) . . . . . . . . . . . . . . . . . . . . . . 906 4.2.1 Machine State Register . . . . . . . 903 4.3 Branch Facility Instructions . . . . . . 908 4.2.2 Machine State Register Protect Reg- 4.3.1 System Linkage Instructions . . . 908 ister (MSRP) . . . . . . . . . . . . . . . . . . . . 905 4.1 Branch Facility Overview 0 The thread is in hypervisor state if MSRPR = 0. This chapter describes the details concerning the regis- 1 The thread is in guest state. ters and the privileged instructions implemented in the MSRGS cannot be changed unless thread Branch Facility that are not covered in Book I. is in the hypervisor state. 
36 Implementation-dependent 4.2 Branch Facility Registers 37 User Cache Locking Enable (UCLE) [Category: Embedded Cache Locking.User Mode] 4.2.1 Machine State Register 0 Cache Locking instructions are privileged. The MSR (MSR) is a 32-bit register. MSR bits are num- 1 Cache Locking instructions can be exe- bered 32 (most-significant bit) to 63 (least-significant cuted in user mode (MSRPR=1). bit). This register defines the state of the thread. The MSR can also be modified by the mtmsr, rfi, rfci, rfdi If category Embedded Cache Locking.User [Category: Embedded.Enhanced Debug], rfmci, rfgi Mode is not supported, this bit is treated as [Category: Embedded.Hypervisor], wrtee and wrteei reserved. instructions and interrupts. It can be read by the mfmsr 38 SP/Embedded Floating-Point/Vector Avail- instruction. able (SPV) [Category: Signal Processing]: MSR 0 The thread cannot execute any SP 32 63 instructions except for the brinc instruc- Figure 7. Machine State Register tion. 1 The thread can execute all SP instruc- Below are shown the bit definitions for the Machine tions. State Register. [Category: Vector]: Bit Description 0 The thread cannot execute any Vector 32 Computation Mode (CM) instruction. 1 The thread can execute Vector instruc- 0 The thread runs in 32-bit mode. tions. 1 The thread runs in 64-bit mode. 39:45 Reserved 33 Reserved 34 Implementation-dependent 35 Guest State (GS) [Category: Embedded.Hypervisor] Chapter 4. Branch Facility 903 Version 2.06 46 Critical Enable (CE) 0 Critical Input, Watchdog Timer, Guest Processor Doorbell Critical , and 51 Machine Check Enable (ME) Processor Doorbell Critical interrupts are disabled. 0 Machine Check interrupts are disabled. 1 Critical Input, Watchdog Timer, Guest 1 Machine Check interrupts are enabled. Processor Doorbell Critical , and Processor Doorbell Critical interrupts are [Category: Embedded.Hypervisor] enabled. 
Machine Check interrupts with the excep- tion of Guest Processor Doorbell Machine [Category: Embedded.Hypervisor] Check are enabled regardless of the state Critical level interrupts with the exception of of MSRME when MSRGS = 1. Guest Processor Doorbell Critical are 52 Floating-Point Exception Mode 0 (FE0) enabled regardless of the state of MSRCE [Category: Floating-Point] when MSRGS = 1. (See below) 47 Reserved 53 Implementation-dependent 48 External Enable (EE) 54 Debug Interrupt Enable (DE) 0 External Input, Decrementer, Fixed-Inter- val Timer, Processor Doorbell, Guest Pro- 0 Debug interrupts are disabled cessor Doorbell and Embedded 1 Debug interrupts are enabled if Performance Monitor interrupts DBCR0IDM=1 are disabled. 55 Floating-Point Exception Mode 1 (FE1) 1 External Input, Decrementer, Fixed-Inter- [Category: Floating-Point] val Timer, Processor Doorbell, Guest Pro- cessor Doorbell , and Embedded (See below) Performance Monitor interrupts 56 Reserved are enabled. 57 Reserved [Category: Embedded.Hypervisor] 58 Instruction Address Space (IS) When an interrupt that is maskable by 0 The thread directs all instruction fetches to MSREE is directed to the hypervisor state, address space 0 (TS=0 in the relevant the interrupt is enabled if MSREE=1 or TLB entry). MSRGS=1 except for Guest Processor 1 The thread directs all instruction fetches to Doorbell which is enabled if MSREE=1 and address space 1 (TS=1 in the relevant MSRGS=1. When an interrupt that is TLB entry). maskable by MSREE is directed to the guest supervisor state, the interrupt is 59 Data Address Space (DS) enabled if MSREE=1 and MSRGS=1. Also, 0 The thread directs all data storage see the EXTGS bit in Section 4.2.3. accesses to address space 0 (TS=0 in the 49 Problem State (PR) relevant TLB entry). 1 The thread directs all data storage 0 The thread is in privileged state (supervi- accesses to address space 1 (TS=1 in the sor state). relevant TLB entry). 1 The thread is in problem state (user mode). 
60 Implementation-dependent MSRPR also affects storage access control, 61 Performance Monitor Mark (PMM) as described in Section 6.7.6 [Category: Embedded.Performance Monitor] 50 Floating-Point Available (FP) 0 Disable statistics gathering on marked [Category: Floating-Point] processes. 1 Enable statistics gathering on marked pro- 0 The thread cannot execute any floating- cesses point instructions, including floating-point loads, stores and moves. See Appendix D for additional information. 1 The thread can execute floating-point 62 Reserved instructions. 63 Reserved 904 Power ISATM Book III-E Version 2.06 The Floating-Point Exception Mode bits FE0 and FE1 Bit Definition are interpreted as shown below. For further details see 32:36 Reserved Book I. 37 User Cache Lock Enable Protect (UCLEP) FE0 FE1 Mode [Category: ECL] 0 0 Ignore Exceptions 0 MSRUCLE can be modified in guest super- 0 1 Imprecise Nonrecoverable visor state. 1 0 Imprecise Recoverable 1 MSRUCLE cannot be modified in guest 1 1 Precise supervisor state and guest state cache See Section 8.3 for the initial state of the MSR. locking using dcbtls, dcbtstls, dcblc, icbtls, and icblc is affected as descrbed [Category:Embedded.Hypervisor] later in this section.". Some bits in the MSR can only be changed when the thread is in hypervisor state or the MSRP register has 38:53 Reserved been configured to allow changes in the guest supervi- 54 Debug Enable Protect (DEP) sor state. See Section 4.2.2, "Machine State Register Protect Register (MSRP)". 0 MSRDE can be modified in guest supervi- sor state. Programming Note 1 MSRDE cannot be modified in guest supervisor state. A Machine State Register bit that is reserved may be altered by rfi/rfci/rfmci/rfdi [Category:Embed- 55:60 Reserved ded.Enhanced Debug]/rfgi [Category:Embed- 61 Performance Monitor Mark Protect (PMMP) ded.Hypervisor]. [Category: E.PM] 0 MSRPMM can be modified in guest super- visor state. 
4.2.2 Machine State Register Pro- 1 MSRPMM cannot be modified in guest tect Register (MSRP) supervisor state and guest state accesses to Performance Monitor Registers using The Machine State Register Protect Register (MSRP) mfpmr and mtpmr are affected as controls whether certain bits in the Machine State Reg- described later in this section. ister (MSR) can be modified in guest supervisor state. In addition, the MSRP impacts the behavior of cache 62:63 Reserved locking and performance monitor instructions in guest The MSRP is hypervisor privileged. state, as described below. The format of the MSRP is shown in Figure 8 below. A context synchronizing operation must be performed following a change to MSRP to ensure that its changes MSRP are visible in the current context. 32 63 The behavior of cache locking instructions (dcbtls, Figure 8. Machine State Register Protect Register dcbtstls, dcblc, icbtls, icblc) in guest privileged state is dependent on the setting of MSRPUCLEP. When The MSRP is used to prevent guest supervisor state MSRPUCLEP = 0, cache locking instructions are permit- program from modifying the UCLE, DE, or PMM bits in ted to execute normally in the guest privileged state. the MSR. The MSRP bits UCLEP, DEP, and PMMP When MSRPUCLEP = 1, cache locking instructions are control whether the guest can change the correspond- not permitted to execute in the guest privileged state ing MSR bits UCLE, DE, and PMM, respectively. When and cause an Embedded Hypervisor Privilege excep- the MSRP bit associated with a corresponding MSR bit tion. [Category: ECL] is 0, any operation in guest privileged state is allowed to modify that MSR bit, whether from an instruction that The behavior of Performance Monitor instructions modifies the MSR, or from an interrupt from the guest (mtpmr, mfpmr) is dependent on the setting of state which is taken in the guest supervisor state. MSRPPMMP. 
When MSRPPMMP = 0, Performance When the MSRP bit associated with a corresponding Monitor instructions are permitted to execute normally MSR bit is 1 no operation in guest privileged state is in the guest state. When MSRPPMMP = 1, Performance allowed to modify that MSR bit (i.e., it remains Monitor instructions are not permitted to execute unchanged), whether from an instruction that modifies normally in the guest state. Execution of a mfpmr the MSR, or from an interrupt from the guest state instruction which specifies a user Performance Monitor which is taken in the guest supervisor state. register produces a value of 0 in the destination GPR. In the guest supervisor state (MSRPR = 0 and MSRGS = These bits are interpreted as follows: 1), execution of any mtpmr instruction or execution of a mfpmr instruction which specifies a privileged Chapter 4. Branch Facility 905 Version 2.06 Performance Monitor Register produces an Embedded 0 Instruction TLB Error Interrupts that occur Hypervisor Privilege exception. [Category: E.PM] in the guest state are directed to the hypervisor state. Programming Note 1 Instruction TLB Error Interrupts that occur Setting the MSRP to 0 at initialization allows guest in the guest state are directed to the guest state access to MSRUCLE,DE,PMM and the associ- supervisor state. ated cache locking and performance monitor facili- 35 Data Storage Interrupt Directed to Guest ties. State (DSIGS) [Category: Embedded.Hypervisor] Controls whether a Data Storage Interrupt that 4.2.3 Embedded Processor Con- occurs in the guest state is taken in the guest state or the hypervisor state, except for an trol Register (EPCR) interrupt caused by a TLB Ineligible exception The Embedded Processor Control Register (EPCR) . provides general controls for privileged facilities. The 0 Data Storage Interrupts that occur in the format of the EPCR is shown in Figure 1 below. guest state are directed to the hypervisor state. 
EPCR 1 Data Storage Interrupts that occur in the 32 63 guest state are directed to the guest supervisor state except that a Data Stor- Figure 9. Embedded Processor Control Register age Interrupt due to a TLB Ineligible These bits are interpreted as follows: exception is directed to the hyper- visor state, regardless of the existence of Bit Definition other exceptions that cause a Data Stor- age interrupt. 32 External Input Interrupt Directed to Guest State (EXTGS) 36 Instruction Storage Interrupt Directed to [Category: Embedded.Hypervisor] Guest State (ISIGS) Controls whether an External Input Interrupt is [Category: Embedded.Hypervisor] taken in the guest supervisor state or the Controls whether an Instruction Storage Inter- hypervisor state. rupt that occurs in the guest state is taken in the guest state or the hypervisor state, except 0 External Input Interrupts are directed to for an interrupt caused by a TLB Ineligible the hypervisor state. External Input Inter- exception . rupts pend until MSRGS=1 or MSREE=1. 1 External Inputs interrupts are directed to 0 Instruction Storage Interrupts that occur in the guest supervisor state. External Input the guest supervisor state are directed to interrupts pend until MSRGS=1 and the hypervisor state. MSREE=1. 1 Instruction Storage Interrupts that occur in the guest state are directed to the guest 33 Data TLB Error Interrupt Directed to Guest supervisor state. State (DTLBGS) [Category: Embedded.Hypervisor] 37 Disable Embedded Hypervisor Debug Controls whether a Data TLB Error Interrupt (DUVD) that occurs in the guest state is taken in the [Category: Embedded.Hypervisor] guest supervisor state or the hypervisor state. Controls whether Debug Events occur in the hypervisor state. 0 Data TLB Error Interrupts that occur in the guest state are directed to the hypervisor 0 Debug events can occur in the hypervisor state. state. 
1 Data TLB Error Interrupts that occur in the 1 Debug events, except for the Uncondi- guest state are directed to the guest tional Debug Event, are suppressed in the supervisor state. hypervisor state. It is implementation- dependent whether the Unconditional 34 Instruction TLB Error Interrupt Directed to Debug Event is suppressed. Guest State (ITLBGS) [Category: Embedded.Hypervisor] 38 Interrupt Computation Mode (ICM) Controls whether an Instruction TLB Error [Category: 64-bit] Interrupt that occurs in the guest state is taken If category E.HV is implemented, this bit con- in the guest supervisor state or the hypervisor trols the computational mode of the thread state. when an interrupt occurs that is directed to the hypervisor state. At interrupt time, EPCRICM is 906 Power ISATM Book III-E Version 2.06 copied into MSRCM if the interrupt is directed Engineering Note to the hypervisor state. EPCR only needs to be implemented if category If category E.HV is not implemented, then this Embedded.Hypervisor or category 64-bit are imple- bit controls the computational mode of the mented. thread when any interrupt occurs. At interrupt time, EPCRICM is copied into MSRCM. 0 Interrupts will execute in 32-bit mode. 1 Interrupts will execute in 64-bit mode. 39 Guest Interrupt Computation Mode (GICM) [Category: E.Hypervisor] [Corequisite Category: 64-bit] Controls the computational mode of the thread when an interrupt occurs that is directed to the guest supervisor state. At inter- rupt time, EPCRGICM is copied into MSRCM if the interrupt is directed to the guest supervisor state 0 Interrupts will execute in 32-bit mode. 1 Interrupts will execute in 64-bit mode. 40 Disable Guest TLB Management Instruc- tions (DGTMI) [Category: Embedded.Hypervisor] Controls whether guest supervisor state can execute any TLB management instructions. 0 tlbilx, tlbsrx., and tlbwe (for a Logical to Real Address translation hit) are allowed to execute normally when MSRGS,PR = 0b10. 
1 tlbilx, tlbsrx., and tlbwe always cause an Embedded Hypervisor Privilege Interrupt when MSRGS,PR = 0b10. 41 Disable MAS Interrupt Updates for Hyper- visor (DMIUH) [Category: Embedded.Hypervisor] Controls whether MAS registers are updated by hardware when a Data or Instruction TLB Error Interrupt or a Data or Instruction Storage Interrupt is taken in the hypervisor. 0 MAS registers are set as described in Table 11 on page 986 when a Data or Instruction TLB Error Interrupt or a Data or Instruction Storage Interrupt is taken in the hypervisor. 1 MAS registers updates as described in Table 11 are disabled and MAS registers are left unchanged when a Data or Instruction TLB Error Interrupt or a Data or Instruction Storage Interrupt is taken in the hypervisor. 42:63 Reserved This register is hypervisor privileged. Chapter 4. Branch Facility 907 Version 2.06 4.3 Branch Facility Instructions and by which the system can return from performing a service or from processing an interrupt. The System Call instruction is described in Book I, but 4.3.1 System Linkage Instructions only at the level required by an application program- These instructions provide the means by which a pro- mer. A complete description of this instruction appears gram can call upon the system to perform a service, below. System Call SC-form is generated. The interrupt causes the MSR to be set as described in Section 7.6.10 and Section 7.6.27. sc If LEV=0 and the thread is in guest state, the interrupt causes the next instruction to be fetched from the effec- 17 /// /// /// /// // 1 / tive address based on one of the following. 0 6 11 16 20 27 30 31 GIVPR0:47||GIVOR848:59||0b0000 if IVORs [Cate- gory: Embedded.Phased-Out] are supported. sc LEV GIVPR0:51||0x120 if Interrupt Fixed Offsets [Cate- [Category:Embedded.Hypervisor] gory: Embedded.Phased-In] are supported. 
If LEV=0 and the thread is in hypervisor state, the inter- 17 /// /// /// LEV // 1 / rupt causes the next instruction to be fetched from the 0 6 11 16 20 27 30 31 effective address based on one of the following. IVPR0:47||IVOR848:59||0b0000 if IVORs [Category: if LEV = 0 then Embedded.Phased-Out] are supported. IVPR0:51||0x120 if Interrupt Fixed Offsets [Cate- if MSRGS = 1 then gory: Embedded.Phased-In] are supported. GSRR0 iea CIA + 4 GSRR1 MSR If LEV=1, the interrupt causes the next instruction to be if IVORs supported then fetched from the effective address based on one of the NIA GIVPR0:47 || GIVOR848:59 || 0b0000 following. else IVPR0:47||IVOR4048:59||0b0000 if IVORs [Cate- NIA GIVPR0:51||0x0120 gory: Embedded.Phased-Out] are supported. MSR new_value (see below) IVPR0:51||0x300 if Interrupt Fixed Offsets [Cate- gory: Embedded.Phased-In] are supported. else This instruction is context synchronizing. SRR0 iea CIA + 4 SRR1 MSR Special Registers Altered: if IVORs supported then SRR0 GSRR0 SRR1 GSRR1 MSR NIA IVPR0:47 || IVOR848:59 || 0b0000 else Programming Note NIA IVPR0:51||0x0120 sc serves as both a basic and an extended mne- MSR new_value (see below) monic. The Assembler will recognize an sc mne- monic with one operand as the basic form, and an else if LEV = 1 then sc mnemonic with no operand as the extended SRR0 iea CIA + 4 form. In the extended form, the LEV operand is SRR1 MSR omitted and assumed to be 0. if IVORs supported then NIA IVPR0:47 || IVOR4048:59 || 0b0000 else NIA IVPR0:51||0x300 MSR new_value (see below) If category E.HV is not implemented, the System Call instruction behaves as if MSRGS = 0 and LEV = 0. If MSRGS = 0 or if LEV = 1, the effective address of the instruction following the System Call instruction is placed into SRR0 and the contents of the MSR are copied into SRR1. Otherwise, the effective address of the instruction following the System Call instruction is placed into GSRR0 and the contents of the MSR are copied into GSRR1. 
If LEV=0, a System Call interrupt is generated. If LEV=1, an Embedded Hypervisor System Call interrupt is generated.

Return From Interrupt XL-form

rfi

    19    ///   ///   ///   50    /
    0     6     11    16    21    31

    MSR ← SRR1
    NIA ←iea SRR0[0:61] || 0b00

The rfi instruction is used to return from a base class interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context.

The contents of SRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR0[0:61] || 0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the applicable save/restore register 0 by the interrupt processing mechanism (see Section 7.6 on page 1030) is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in SRR0 at the time of the execution of the rfi).

This instruction is privileged and context synchronizing.

[Category: Embedded.Hypervisor]
When rfi is executed in guest state, the instruction is mapped to rfgi and rfgi is executed instead.

Special Registers Altered:
    MSR

Return From Critical Interrupt XL-form

rfci

    19    ///   ///   ///   51    /
    0     6     11    16    21    31

    MSR ← CSRR1
    NIA ←iea CSRR0[0:61] || 0b00

The rfci instruction is used to return from a critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of CSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address CSRR0[0:61] || 0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or CSRR0 by the interrupt processing mechanism (see Section 7.6 on page 1030) is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in CSRR0 at the time of the execution of the rfci).

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Debug Interrupt X-form

rfdi [Category: Embedded.Enhanced Debug]

    19    ///   ///   ///   39    /
    0     6     11    16    21    31

    MSR ← DSRR1
    NIA ←iea DSRR0[0:61] || 0b00

The rfdi instruction is used to return from a Debug interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of DSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address DSRR0[0:61] || 0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0, CSRR0, or DSRR0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in DSRR0 at the time of the execution of the rfdi).

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Machine Check Interrupt XL-form

rfmci

    19    ///   ///   ///   38    /
    0     6     11    16    21    31

    MSR ← MCSRR1
    NIA ←iea MCSRR0[0:61] || 0b00

The rfmci instruction is used to return from a Machine Check class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of MCSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address MCSRR0[0:61] || 0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0, CSRR0, MCSRR0, or DSRR0 [Category: Embedded.Enhanced Debug] by the interrupt processing mechanism (see Section 7.6 on page 1030) is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in MCSRR0 at the time of the execution of the rfmci).

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Guest Interrupt XL-form

rfgi [Category: Embedded.Hypervisor]

    19    ///   ///   ///   102   /
    0     6     11    16    21    31

    newmsr ← GSRR1
    if MSR[GS] = 1 then
        newmsr[GS,WE] ← MSR[GS,WE]
        prots ← MSRP[UCLEP,DEP,PMMP]
        newmsr ← prots & MSR | ~prots & newmsr
    MSR ← newmsr
    NIA ←iea GSRR0[0:61] || 0b00

The rfgi instruction is used to return from a guest state base class interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context.

The contents of Guest Save/Restore Register 1 are placed into the MSR. If the rfgi is executed in the guest supervisor state (MSR[GS,PR] = 0b10), the bits MSR[GS,WE] are not modified and the bits MSR[UCLE,DE,PMM] are modified only if the associated bits in the Machine State Register Protect (MSRP) Register are set to 0. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address GSRR0[0:61] || 0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the associated save/restore register 0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in GSRR0 at the time of the execution of the rfgi).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Embedded Hypervisor Privilege XL-form

ehpriv OC [Category: Embedded.Hypervisor]

    31    OC    270   /
    0     6     21    31

The ehpriv instruction generates an Embedded Hypervisor Privilege Exception resulting in an Embedded Hypervisor Privilege Interrupt.

The OC field may be used by hypervisor software to provide a facility for emulated virtual instructions.

Special Registers Altered:
    None

Programming Note
    The ehpriv instruction is analogous to a guaranteed illegal instruction encoding in that it guarantees that an Embedded Hypervisor Privilege exception is generated. The instruction is useful for programs that need to communicate information to the hypervisor software, particularly as a means for implementing breakpoint operations in a hypervisor managed debugger.

Programming Note
    ehpriv serves as both a basic and an extended mnemonic. The Assembler will recognize an ehpriv mnemonic with one operand as the basic form, and an ehpriv mnemonic with no operand as the extended form. In the extended form, the OC operand is omitted and assumed to be 0.

Chapter 5. Fixed-Point Facility

5.1 Fixed-Point Facility Overview
5.2 Special Purpose Registers
5.3.6 External Process ID Registers [Category: Embedded.External PID]
5.3 Fixed-Point Facility Registers
5.3.1 Processor Version Register
5.3.2 Processor Identification Register
5.3.3 Guest Processor Identification Register [Category: Embedded.Hypervisor]
5.3.4 Program Priority Register 32-bit [Category: Phased-In]
5.3.5 Software-use SPRs
5.3.6.1 External Process ID Load Context (EPLC) Register
5.3.6.2 External Process ID Store Context (EPSC) Register
5.4 Fixed-Point Facility Instructions
5.4.1 Move To/From System Register Instructions
5.4.2 OR Instruction
5.4.3 External Process ID Instructions [Category: Embedded.External PID]

5.1 Fixed-Point Facility Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Fixed-Point Facility that are not covered in Book I.

5.2 Special Purpose Registers

Special Purpose Registers (SPRs) are read and written using the mfspr (page 922) and mtspr (page 921) instructions. Most SPRs are defined in other chapters of this book; see the index to locate those definitions.

5.3 Fixed-Point Facility Registers

5.3.1 Processor Version Register

The Processor Version Register (PVR) is a 32-bit read-only register that contains a value identifying the version and revision level of the hardware. The contents of the PVR can be copied to a GPR by the mfspr instruction. Read access to the PVR is privileged; write access is not provided.

    Version     Revision
    32          48          63

Figure 10. Processor Version Register

The PVR distinguishes between implementations that differ in attributes that may affect software. It contains two fields.

Version    A 16-bit number that identifies the version of the implementation. Different version numbers indicate major differences between implementations, such as which optional facilities and instructions are supported.

Revision   A 16-bit number that distinguishes between implementations of the version. Different revision numbers indicate minor differences between implementations having the same version number, such as clock rate and Engineering Change level.

Version numbers are assigned by the Power ISA Architecture process. Revision numbers are assigned by an implementation-defined process.

5.3.2 Processor Identification Register

The Processor Identification Register (PIR) is a 32-bit register that contains a value that can be used to distinguish the thread from other threads in the system. The contents of the PIR can be read using mfspr and written using mtspr. Read access to the PIR is privileged; write access, if provided, is hypervisor privileged.

[Category: Embedded.Hypervisor]
Read accesses to the PIR in guest supervisor state are mapped to the GPIR.

    PROCID
    32          63

    Bits    Name     Description
    32:63   PROCID   Thread ID

Figure 11. Processor Identification Register

The means by which the PIR is initialized are implementation-dependent.

Programming Note
    The PIR can be used to identify the thread globally among all threads in a system that contains multiple threads. This facilitates more efficient usage of the Processor Control facility (see Section 11).

5.3.3 Guest Processor Identification Register [Category: Embedded.Hypervisor]

The Guest Processor Identification Register (GPIR) is a 32-bit register that contains a value that can be used to distinguish the thread from other threads in the system. The contents of the GPIR can be read using mfspr and written using mtspr. Read access to the GPIR is privileged; write access, if provided, is hypervisor privileged.

    PROCID
    32          63

    Bits    Name     Description
    32:63   PROCID   Thread ID

Figure 12. Guest Processor Identification Register

The means by which the GPIR is initialized are implementation-dependent.

Programming Note
    mfspr RT,PIR should be used to read GPIR in guest supervisor state. See Section 2.2.1, "Register Mapping". mtspr PIR,RS should be used to write GPIR in guest supervisor state.

Engineering Note
    Some previous implementations did not allow write access to the PIR. This allows the hypervisor to easily track changes to the GPIR. If write access is allowed to the PIR, write access should be allowed to the GPIR and should only be available to the hypervisor.

5.3.4 Program Priority Register 32-bit [Category: Phased-In]

Privileged programs may set a wider range of program priorities in the PRI field of PPR32 than may be set by problem-state programs (see Section 3.1 of Book II). Problem-state programs may only set values in the range of 0b010 to 0b100. Privileged programs may set values in the range of 0b001 to 0b110. Hypervisor software may also set 0b111. If a program attempts to set a value that is not available to it, the PRI field remains unchanged. The values and their corresponding meanings are as follows.

    001   very low
    010   low
    011   medium low (normal)
    100   medium
    101   medium high
    110   high
    111   very high

5.3.5 Software-use SPRs

Software-use SPRs are 64-bit registers provided for use by software.

    SPRG0
    SPRG1
    SPRG2
    SPRG3
    SPRG4
    SPRG5
    SPRG6
    SPRG7
    SPRG8
    SPRG9 [Category: Embedded.Enhanced Debug]
    GSPRG0 [Category: Embedded.Hypervisor]
    GSPRG1 [Category: Embedded.Hypervisor]
    GSPRG2 [Category: Embedded.Hypervisor]
    GSPRG3 [Category: Embedded.Hypervisor]
    0           63

Figure 13. Special Purpose Registers

Programming Note
    USPRG0 was made a 32-bit register and renamed to VRSAVE; see Sections 3.2.3 and 6.3.3 of Book I.

SPRG0 through SPRG2
    These 64-bit registers can be accessed only in supervisor mode.
    [Category: Embedded.Hypervisor]
    Access to these registers in guest supervisor state is mapped to GSPRG0 through GSPRG2.

SPRG3
    This 64-bit register can be read in supervisor mode and can be written only in supervisor mode. It is implementation-dependent whether or not this register can be read in user mode.
    [Category: Embedded.Hypervisor]
    Access to this register in guest state is mapped to GSPRG3.

SPRG4 through SPRG7
    These 64-bit registers can be written only in supervisor mode. These registers can be read in supervisor and user modes.

SPRG8 through SPRG9
    These 64-bit registers can be accessed only in supervisor mode.

GSPRG0 through GSPRG2 [Category: Embedded.Hypervisor]
    These 64-bit registers can be accessed only in supervisor mode.

GSPRG3 [Category: Embedded.Hypervisor]
    This 64-bit register can be read in supervisor mode and can be written only in supervisor mode. It is implementation-dependent whether or not this register can be read in user mode.

SPRGi or GSPRGi can be read using mfspr and written using mtspr.

Programming Note
    mfspr RT,SPRGi should be used to read GSPRGi in guest state. mtspr SPRGi,RS should be used to write GSPRGi in guest state. See Section 2.2.1, "Register Mapping".

5.3.6 External Process ID Registers [Category: Embedded.External PID]

The External Process ID Registers provide capabilities for loading and storing General Purpose Registers and performing cache management operations using a supplied context other than the context normally used by the programming model.

Two SPRs describe the context for loading and storing using external contexts. The External Process ID Load Context (EPLC) Register provides the context for External Process ID Load instructions, and the External Process ID Store Context (EPSC) Register provides the context for External Process ID Store instructions. Each of these registers contains a PR (privilege) bit, an AS (address space) bit, a Process ID, a GS (guest state) bit, and an LPID. Changes to the EPLC or the EPSC Register require that a context synchronizing operation be performed prior to using any External Process ID instructions that use these registers.

External Process ID instructions that use the context provided by the EPLC register include lbepx, lhepx, lwepx, ldepx, dcbtep, dcbtstep, dcbfep, dcbstep, icbiep, lfdepx, evlddepx, lvepx, and lvepxl, and those that use the context provided by the EPSC register include stbepx, sthepx, stwepx, stdepx, dcbzep, stfdepx, evstddepx, stvepx, and stvepxl. Instruction definitions appear in Section 5.4.3.

System software configures the EPLC register to reflect the Process ID, AS, PR, GS, and LPID state from the context that it wishes to perform loads from and configures the EPSC register to reflect the Process ID, AS, PR, GS, and LPID state from the context it wishes to perform stores to. Software then issues External Process ID instructions to manipulate data as required.

When an External Process ID Load instruction is executed, it uses the context information in the EPLC Register instead of the normal context with respect to address translation and storage access control. EPLC[EPR] is used in place of MSR[PR], EPLC[EAS] is used in place of MSR[DS], EPLC[EPID] is used in place of PID, EPLC[EGS] is used in place of MSR[GS], and EPLC[ELPID] is used in place of LPIDR. Similarly, when an External Process ID Store instruction is executed, it uses the context information in the EPSC Register instead of the normal context with respect to address translation and storage access control. EPSC[EPR] is used in place of MSR[PR], EPSC[EAS] is used in place of MSR[DS], EPSC[EPID] is used in place of PID, EPSC[EGS] is used in place of MSR[GS], and EPSC[ELPID] is used in place of LPIDR. Translation occurs using the new substituted values.

If the TLB lookup is successful, the storage access control mechanism grants or denies the access using context information from EPLC[EPR] or EPSC[EPR] for loads and stores respectively. If access is not granted, a Data Storage interrupt occurs, and the ESR[EPID] bit is set to 1. If the operation was a Store, the ESR[ST] bit is also set to 1.

5.3.6.1 External Process ID Load Context (EPLC) Register

The EPLC register contains fields to provide the context for External Process ID Load instructions.

    EPLC
    32          63

Figure 14. External Process ID Load Context Register

These bits are interpreted as follows:

Bit     Definition

32      External Load Context PR Bit (EPR)
        Used in place of MSR[PR] by the storage access control mechanism when an External Process ID Load instruction is executed.
        0   Supervisor mode
        1   User mode

33      External Load Context AS Bit (EAS)
        Used in place of MSR[DS] for translation when an External Process ID Load instruction is executed, and, if this Load instruction causes a Data TLB Error interrupt, loaded into MAS registers in place of MSR[DS].
        0   Address space 0
        1   Address space 1

34      External Load Context GS Bit (EGS) [Category: Embedded.Hypervisor]
        Used in place of MSR[GS] for translation when an External Process ID Load instruction is executed.
        0   Hypervisor state
        1   Guest state

35      Reserved

36:47   External Load Context LPID Value (ELPID) [Category: Embedded.Hypervisor]
        Used in place of the LPIDR register for translation when an External Process ID Load instruction is executed.

48:49   Reserved

50:63   External Load Context Process ID Value (EPID)
        Used in place of the Process ID register value for translation when an External Process ID Load instruction is executed, and, if this Load instruction causes a Data TLB Error interrupt, loaded into MAS registers in place of PID contents.

The corresponding EPID field of the EPSC register (Section 5.3.6.2) is used in place of the Process ID register value for translation when an External Process ID Store instruction is executed, and, if this Store instruction causes a Data TLB Error interrupt, is loaded into MAS registers in place of PID contents.
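The bit layout above can be unpacked with a small helper. The following Python sketch is illustrative only: `ExternalPidContext`, `_field`, and `decode_external_pid_context` are hypothetical names, and the shift arithmetic assumes the ISA's big-endian bit numbering in which bit 63 is the least significant bit of the register value.

```python
from typing import NamedTuple


class ExternalPidContext(NamedTuple):
    epr: int    # bit 32, used in place of MSR[PR]
    eas: int    # bit 33, used in place of MSR[DS]
    egs: int    # bit 34, used in place of MSR[GS]
    elpid: int  # bits 36:47, used in place of LPIDR
    epid: int   # bits 50:63, used in place of PID


def _field(value: int, first: int, last: int) -> int:
    """Extract bits first:last, where bit 63 is the least significant bit."""
    width = last - first + 1
    return (value >> (63 - last)) & ((1 << width) - 1)


def decode_external_pid_context(value: int) -> ExternalPidContext:
    """Split a 32-bit EPLC (or identically laid out EPSC) value into fields."""
    return ExternalPidContext(
        epr=_field(value, 32, 32),
        eas=_field(value, 33, 33),
        egs=_field(value, 34, 34),
        elpid=_field(value, 36, 47),
        epid=_field(value, 50, 63),
    )
```

Reserved bits 35 and 48:49 are simply ignored by this decoder.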
When a mtspr instruction is executed that targets EPLC, the EGS and ELPID fields are only modified if the thread is in hypervisor state.

5.3.6.2 External Process ID Store Context (EPSC) Register

The EPSC register contains fields to provide the context for External Process ID Store instructions. The field encoding is the same as the EPLC Register.

    EPSC
    32          63

Figure 15. External Process ID Store Context Register

These bits are interpreted as follows:

Bits    Definition

32      External Store Context PR Bit (EPR)
        Used in place of MSR[PR] by the storage access control mechanism when an External Process ID Store instruction is executed.
        0   Supervisor mode
        1   User mode

33      External Store Context AS Bit (EAS)
        Used in place of MSR[DS] for translation when an External Process ID Store instruction is executed, and, if this Store instruction causes a Data TLB Error interrupt, loaded into MAS registers in place of MSR[DS].
        0   Address space 0
        1   Address space 1

34      External Store Context GS Bit (EGS) [Category: Embedded.Hypervisor]
        Used in place of MSR[GS] for translation when an External Process ID Store instruction is executed.
        0   Hypervisor state
        1   Guest state

35      Reserved

36:47   External Store Context LPID Value (ELPID) [Category: Embedded.Hypervisor]
        Used in place of the LPIDR register for translation when an External Process ID Store instruction is executed.

48:49   Reserved

50:63   External Store Context Process ID Value (EPID)
        Used in place of the Process ID register value for translation when an External Process ID Store instruction is executed, and, if this Store instruction causes a Data TLB Error interrupt, loaded into MAS registers in place of PID contents.

When a mtspr instruction is executed that targets EPSC, the EGS and ELPID fields are only modified if the thread is in hypervisor state.

5.4 Fixed-Point Facility Instructions

5.4.1 Move To/From System Register Instructions

The Move To Special Purpose Register and Move From Special Purpose Register instructions are described in Book I, but only at the level available to an application programmer. For example, no mention is made there of registers that can be accessed only in supervisor mode. The descriptions of these instructions given below extend the descriptions given in Book I, but do not list Special Purpose Registers that are implementation-dependent. In the descriptions of these instructions given below, the "defined" SPR numbers are the SPR numbers shown in Figure 16 and the implementation-specific SPR numbers that are implemented, and similarly for "defined" registers.

Extended mnemonics

Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand; see Appendix B.

In the table below, a number in parentheses is a reference to the correspondingly numbered note that follows the table.

SPR(1)                    Register     Privileged            Length
decimal  spr5:9  spr0:4   Name         mtspr      mfspr      (bits)  Cat(2)
1        00000   00001    XER          no         no         64      B
8        00000   01000    LR           no         no         64      B
9        00000   01001    CTR          no         no         64      B
17       00000   10001    DSCR         yes        yes        64      STM
22       00000   10110    DEC          yes(9)     yes(9)     32      B
26       00000   11010    SRR0         yes(9)     yes(9)     64      B
27       00000   11011    SRR1         yes(9)     yes(9)     64      B
48       00001   10000    PID          yes        yes        32      E
54       00001   10110    DECAR        hypv(8)    hypv(8)    32      E
55       00001   10111    MCIVPR       hypv(8)    hypv(8)    64      E
56       00001   11000    LPER         hypv(8)    hypv(8)    64      E.HV; E.PT
57       00001   11001    LPERU        hypv(8)    hypv(8)    32      E.HV; E.PT
58       00001   11010    CSRR0        hypv(8)    hypv(8)    64      E
59       00001   11011    CSRR1        hypv(8)    hypv(8)    32      E
61       00001   11101    DEAR         yes(9)     yes(9)     64      E
62       00001   11110    ESR          yes(9)     yes(9)     32      E
63       00001   11111    IVPR         hypv(8)    hypv(8)    64      E
256      01000   00000    VRSAVE       no         no         32      B
259      01000   00011    SPRG3        -          no         64      B
260-263  01000   001xx    SPRG[4-7]    -          no         64      E
268      01000   01100    TB           -          no         64      B
269      01000   01101    TBU          -          no         32(5)   B
272-275  01000   100xx    SPRG[0-3]    yes(9)     yes(9)     64      B
276-279  01000   101xx    SPRG[4-7]    yes        yes        64      E
282      01000   11010    EAR          hypv(4)    hypv(4)    32      EC
284      01000   11100    TBL          hypv(4)    -          32      B
285      01000   11101    TBU          hypv(4)    -          32      B
286      01000   11110    PIR          hypv(8)    yes(9)     32      E
287      01000   11111    PVR          -          yes        32      B
304      01001   10000    DBSR         hypv(5,8)  hypv(8)    32      E
306      01001   10010    DBSRWR       hypv(3)    -          32      E.HV
307      01001   10011    EPCR         hypv(3)    hypv(3)    32      E.HV,(E;64)
308      01001   10100    DBCR0        hypv(8)    hypv(8)    32      E
309      01001   10101    DBCR1        hypv(8)    hypv(8)    32      E
310      01001   10110    DBCR2        hypv(8)    hypv(8)    32      E
311      01001   10111    MSRP         hypv(3)    hypv(3)    32      E.HV
312      01001   11000    IAC1         hypv(8)    hypv(8)    64      E
313      01001   11001    IAC2         hypv(8)    hypv(8)    64      E
314      01001   11010    IAC3         hypv(8)    hypv(8)    64      E
315      01001   11011    IAC4         hypv(8)    hypv(8)    64      E
316      01001   11100    DAC1         hypv(8)    hypv(8)    64      E
317      01001   11101    DAC2         hypv(8)    hypv(8)    64      E
318      01001   11110    DVC1         hypv(8)    hypv(8)    64      E
319      01001   11111    DVC2         hypv(8)    hypv(8)    64      E
336      01010   10000    TSR          hypv(5,8)  hypv(8)    32      E
338      01010   10010    LPIDR        hypv(3)    hypv(3)    32      E.HV
339      01010   10011    MAS5         hypv(3)    hypv(3)    32      E.HV
340      01010   10100    TCR          hypv(8)    hypv(8)    32      E
341      01010   10101    MAS8         hypv(3)    hypv(3)    32      E.HV
342      01010   10110    LRATCFG      -          hypv(3)    32      E.HV.LRAT
343      01010   10111    LRATPS       -          hypv(3)    32      E.HV.LRAT
344-347  01010   110xx    TLB[0-3]PS   -          hypv(3)    32      E.HV
348      01010   11100    MAS5||MAS6   hypv(3)    hypv(3)    64      E.HV; 64
349      01010   11101    MAS8||MAS1   hypv(3)    hypv(3)    64      E.HV; 64
350      01010   11110    EPTCFG       hypv(8)    hypv(8)    32      E.PT
368-371  01011   100xx    GSPRG0-3     yes        yes        64      E.HV
372      01011   10100    MAS7||MAS3   yes        yes        64      E; 64
373      01011   10101    MAS0||MAS1   yes        yes        64      E; 64
378      01011   11010    GSRR0        yes        yes        64      E.HV
379      01011   11011    GSRR1        yes        yes        32      E.HV
380      01011   11100    GEPR         yes        yes        32      E.HV;EXP
381      01011   11101    GDEAR        yes        yes        64      E.HV
382      01011   11110    GPIR         hypv(3)    yes        32      E.HV
383      01011   11111    GESR         yes        yes        32      E.HV
400-415  01100   1xxxx    IVOR0-15     hypv(8)    hypv(8)    32      E
432-435  01101   100xx    IVOR38-41    hypv(8)    hypv(8)    32      E.HV
436      01101   10100    IVOR42       hypv(8)    hypv(8)    32      E.HV.LRAT
437      01101   10101    TENSR        -          hypv(8)    64      E.MT
438      01101   10110    TENS         hypv(8)    hypv(8)    64      E.MT
439      01101   10111    TENC         hypv(8)    hypv(8)    64      E.MT
440-441  01101   1100x    GIVOR2-3     hypv(3)    yes        32      E.HV
442      01101   11010    GIVOR4       hypv(3)    yes        32      E.HV
443      01101   11011    GIVOR8       hypv(3)    yes        32      E.HV
444      01101   11100    GIVOR13      hypv(3)    yes        32      E.HV
445      01101   11101    GIVOR14      hypv(3)    yes        32      E.HV
446      01101   11110    TIR          -          hypv(9)    64      E.MT
447      01101   11111    GIVPR        hypv(3)    yes        64      E.HV
512      10000   00000    SPEFSCR      no         no         32      SP
526      10000   01110    ATB/ATBL     -          no         64      ATB
527      10000   01111    ATBU         -          no         32      ATB
528      10000   10000    IVOR32       hypv(8)    hypv(8)    32      SP
529      10000   10001    IVOR33       hypv(8)    hypv(8)    32      SP
530      10000   10010    IVOR34       hypv(8)    hypv(8)    32      SP
531      10000   10011    IVOR35       hypv(8)    hypv(8)    32      E.PM
532      10000   10100    IVOR36       hypv(8)    hypv(8)    32      E.PC
533      10000   10101    IVOR37       hypv(8)    hypv(8)    32      E.PC
570      10001   11010    MCSRR0       hypv(8)    hypv(8)    64      E
571      10001   11011    MCSRR1       hypv(8)    hypv(8)    32      E
572      10001   11100    MCSR         hypv(8)    hypv(8)    64      E
574      10001   11110    DSRR0        yes        yes        64      E.ED
575      10001   11111    DSRR1        yes        yes        32      E.ED
604      10010   11100    SPRG8        hypv(8)    hypv(8)    64      E
605      10010   11101    SPRG9        yes        yes        64      E.ED
624      10011   10000    MAS0         yes        yes        32      E
625      10011   10001    MAS1         yes        yes        32      E
626      10011   10010    MAS2         yes        yes        64      E
627      10011   10011    MAS3         yes        yes        32      E
628      10011   10100    MAS4         yes        yes        32      E
630      10011   10110    MAS6         yes        yes        32      E
688-691  10101   100xx    TLB[0-3]CFG  -          hypv(8)    32      E
702      10101   11110    EPR          -          yes(9)     32      EXP
898      11100   00010    PPR32        no         no         32      Phased-In
924      11100   11100    DCDBTRL      -(5)       hypv(8)    32      E.CD
925      11100   11101    DCDBTRH      -(5)       hypv(8)    32      E.CD
926      11100   11110    ICDBTRL      -(6)       hypv(8)    32      E.CD
927      11100   11111    ICDBTRH      -(6)       hypv(8)    32      E.CD
944      11101   10000    MAS7         yes        yes        32      E
947      11101   10011    EPLC         yes        yes        32      E.PD
948      11101   10100    EPSC         yes(6)     yes        32      E.PD
979      11110   10011    ICDBDR       -(7)       hypv(8)    32      E.CD
1012     11111   10100    MMUCSR0      hypv(8)    hypv(8)    32      E
1015     11111   10111    MMUCFG       -          hypv(8)    32      E

-   This register is not defined for this instruction.
1   Note that the order of the two 5-bit halves of the SPR number is reversed.
2   See Section 1.3.5 of Book I. If multiple categories are listed separated by a semicolon, all the listed categories must be implemented in order for the other columns of the line to apply. A comma separates two alternatives, and takes precedence over a semicolon; e.g., the EPCR (E.HV,E;64) must be implemented if either (a) category E.HV is implemented or (b) the implementation is Embedded and supports the 64-bit category.
3   This register is a hypervisor resource, and can be accessed by this instruction only in hypervisor state (see Chapter 2 of Book III-E).
4   If the Embedded.Hypervisor category is supported, this register is a hypervisor resource, and can be accessed by this instruction only in hypervisor state (see Chapter 2 of Book III-E). Otherwise, the register is privileged.
5   The register can be written by the dcread instruction.
6   The register can be written by the icread instruction.
7   The register is Category: Phased-in.
8   If the Embedded.Hypervisor category is supported, this register is a hypervisor resource, and can be accessed by this instruction only in hypervisor state (see Chapter 2 of Book III-E). Otherwise, the register is privileged for Embedded.
9   If the Embedded.Hypervisor category is supported, this register is a hypervisor resource and can be accessed by this instruction only in hypervisor state, and guest references to the register are redirected to the corresponding guest register (see Chapter 2 of Book III-E). Otherwise the register is privileged.

All SPR numbers that are not shown above and are not implementation-specific are reserved.

Figure 16. SPR Numbers

Move To Special Purpose Register XFX-form

mtspr SPR,RS

    31    RS    spr   467   /
    0     6     11    21    31

    n ← spr[5:9] || spr[0:4]
    if length(SPR(n)) = 64 then
        SPR(n) ← (RS)
    else
        SPR(n) ← (RS)[32:63]

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 16. The contents of register RS are placed into the designated Special Purpose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR.

For this instruction, SPRs TBL and TBU are treated as separate 32-bit registers; setting one leaves the other unaltered.

spr[0]=1 if and only if writing the register is privileged. Execution of this instruction specifying a defined and privileged register when MSR[PR]=1 causes a Privileged Instruction type Program interrupt.

Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.
- if spr[0]=0: boundedly undefined results
- if spr[0]=1:
  - if MSR[PR]=1: Privileged Instruction type Program interrupt
  - if MSR[PR]=0: boundedly undefined results

If the SPR number is set to a value that is shown in Figure 16 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered:
    See Figure 16

Compiler and Assembler Note
    For the mtspr and mfspr instructions, the SPR number coded in assembler language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16:20 of the instruction and the low-order 5 bits in bits 11:15.

Programming Note
    For a discussion of software synchronization requirements when altering certain Special Purpose Registers, see Chapter 12. "Synchronization Requirements for Context Alterations" on page 1099.

Move From Special Purpose Register XFX-form

mfspr RT,SPR

    31    RT    spr   339   /
    0     6     11    21    31

    n ← spr[5:9] || spr[0:4]
    if length(SPR(n)) = 64 then
        RT ← SPR(n)
    else
        RT ← ³²0 || SPR(n)

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 16. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero.

spr[0]=1 if and only if reading the register is privileged. Execution of this instruction specifying a defined and privileged register when MSR[PR]=1 causes a Privileged Instruction type Program interrupt.

Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.
- if spr[0]=0: boundedly undefined results
- if spr[0]=1:
  - if MSR[PR]=1: Privileged Instruction type Program interrupt
  - if MSR[PR]=0: boundedly undefined results

If the SPR field contains a value that is shown in Figure 16 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered:
    None

Note
    See the Notes that appear with mtspr.

Move To Device Control Register XFX-form

mtdcr DCRN,RS [Category: Embedded.Device Control]

    31    RS    dcr   451   /
    0     6     11    21    31

    DCRN ← dcr[0:4] || dcr[5:9]
    DCR(DCRN) ← (RS)

Let DCRN denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of register RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of bits 32:63 of (RS) are placed into the Device Control Register.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move To Device Control Register Indexed X-form

mtdcrx RA,RS [Category: Embedded.Device Control]

    31    RS    RA    ///   387   /
    0     6     11    16    21    31

    DCRN ← (RA)
    DCR(DCRN) ← (RS)

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of register RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of RS[32:63] are placed into the Device Control Register.

The specification of Device Control Registers using mtdcrx, mtdcrux (see Book I), and mtdcr is implementation-dependent. For example, mtdcr 105,r2 and mtdcrux r1,r2 (where register r1 contains the value 105) may not produce identical results on an implementation.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move From Device Control Register XFX-form

mfdcr RT,DCRN [Category: Embedded.Device Control]

    31    RT    dcr   323   /
    0     6     11    21    31

    DCRN ← dcr[0:4] || dcr[5:9]
    RT ← DCR(DCRN)

Let DCRN denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of the designated Device Control Register are placed into register RT. For 32-bit Device Control Registers, the contents of the Device Control Register are placed into bits 32:63 of RT. Bits 0:31 of RT are set to 0.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move From Device Control Register Indexed X-form

mfdcrx RT,RA [Category: Embedded.Device Control]

    31    RT    RA    ///   259   /
    0     6     11    16    21    31

    DCRN ← (RA)
    RT ← DCR(DCRN)

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of the designated Device Control Register are placed into register RT. For 32-bit Device Control Registers, the contents of bits 32:63 of the designated Device Control Register are placed into RT. Bits 0:31 of RT are set to 0.

The specification of Device Control Registers using mfdcrx and mfdcrux (see Book I) compared to the specification of Device Control Registers using mfdcr is implementation-dependent. For example, mfdcr r2,105 and mfdcrx r2,r1 (where register r1 contains the value 105) may not produce identical results on an implementation or between implementations. Also, accessing privileged Device Control Registers in supervisor mode with mfdcrux is implementation-dependent.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move To Machine State Register X-form

mtmsr RS

    31    RS    ///   ///   146   /
    0     6     11    16    21    31

    newmsr ← (RS)[32:63]
    if MSR[CM] = 0 & newmsr[CM] = 1 then NIA[0:31] ← 0
    if MSR[GS] = 1 then
        newmsr[GS,WE] ← MSR[GS,WE]
        prots[0:31] ← 0
        prots[UCLEP,DEP,PMMP] ← MSRP[UCLEP,DEP,PMMP]
        newmsr ← prots & MSR | ~prots & newmsr
    MSR ← newmsr

The contents of register RS[32:63] are placed into the MSR. If the thread is changing from 32-bit mode to 64-bit mode, the next instruction is fetched from ³²0 || NIA[32:63].

This instruction is privileged and execution synchronizing.

In addition, alterations to the EE or CE bits are effective as soon as the instruction completes. Thus if MSR[EE]=0 and an External interrupt is pending, executing an mtmsr that sets MSR[EE] to 1 will cause the External interrupt to be taken before the next instruction is executed, if no higher priority exception exists. Likewise, if MSR[CE]=0 and a Critical Input interrupt is pending, executing an mtmsr that sets MSR[CE] to 1 will cause the Critical Input interrupt to be taken before the next instruction is executed if no higher priority exception exists. (See Section 7.6 on page 1030.)

[Category: Embedded.Hypervisor]
GS, WE and bits protected with MSRP are only modified if mtmsr is executed in hypervisor state.

Special Registers Altered:
    MSR

Programming Note
    For a discussion of software synchronization requirements when altering certain MSR bits please refer to Chapter 12.

Move From Machine State Register X-form

mfmsr RT

    31    RT    ///   ///   83    /
    0     6     11    16    21    31

    RT ← ³²0 || MSR

The contents of the MSR are placed into bits 32:63 of register RT and bits 0:31 of RT are set to 0.

Write MSR External Enable X-form

wrtee RS

    31    RS    ///   ///   131   /
    0     6     11    16    21    31

    MSR[EE] ← (RS)[48]

The content of (RS)[48] is placed into MSR[EE].

Alteration of the MSR[EE] bit is effective as soon as the instruction completes.
Thus if MSREE=0 and an Exter- This instruction is privileged. nal interrupt is pending, executing a wrtee instruction that sets MSREE to 1 will cause the External interrupt to Special Registers Altered: occur before the next instruction is executed, if no None higher priority exception exists (Section 7.9, "Exception Priorities" on page 1059). This instruction is privileged. Special Registers Altered: MSR 924 Power ISATM Book III-E Version 2.06 Write MSR External Enable Immediate X-form wrteei E 31 /// /// E /// 163 / 0 6 11 16 17 21 31 MSREE E The value specified in the E field is placed into MSREE. Alteration of the MSREE bit is effective as soon as the instruction completes. Thus if MSREE=0 and an Exter- nal interrupt is pending, executing a wrtee instruction that sets MSREE to 1 will cause the External interrupt to occur before the next instruction is executed, if no higher priority exception exists (Section 7.9, "Exception Priorities" on page 1059). This instruction is privileged. Special Registers Altered: MSR Programming Note wrtee and wrteei are used to provide atomic update of MSREE. Typical usage is: mfmsr Rn #save EE in (Rn)48 wrteei 0 #turn off EE mfmsr Rn #save EE in (Rn)48 wrteei 0 #turn off EE : : : : #code with EE disabled wrtee Rn #restore EE without altering #other MSR bits that might #have changed Chapter 5. Fixed-Point Facility 925 Version 2.06 5.4.2 OR Instruction or Rx,Rx,Rx can be used to set PPRPRI (see Figure 2 in Section 3.1 of Book II) as shown in Figure 17. PPRPRI remains unchanged if the privilege state of the thread executing the instruction is lower than the privi- lege indicated in the figure. (The encodings available to problem-state programs, as well as encodings for addi- tional shared resource hints not shown here, are described in Section 3.2 of Book II.) 
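The priority-setting forms of or Rx,Rx,Rx can be illustrated by computing their instruction words. The following is a minimal sketch, not part of the architecture text: the helper names are hypothetical, the Rx and PRI values are transcribed from Figure 17, and the X-form layout of the or instruction (primary opcode 31, extended opcode 444, operands written RA,RS,RB) is taken from Book I rather than from this section.

```python
# Hypothetical helpers for illustration only.
OR_PRIMARY_OPCODE = 31   # bits 0:5 (from Book I)
OR_EXTENDED_OPCODE = 444  # bits 21:30 (from Book I)

# Rx -> (PPR32[PRI] value, priority name), transcribed from Figure 17.
PRIORITY_NOPS = {
    31: (0b001, "very low"),
    1: (0b010, "low"),
    6: (0b011, "medium low"),
    2: (0b100, "medium"),
    5: (0b101, "medium high"),
    3: (0b110, "high"),
    7: (0b111, "very high"),
}


def encode_or(ra: int, rs: int, rb: int, rc: int = 0) -> int:
    """Encode 'or RA,RS,RB' as a 32-bit word (bit 0 is the most significant)."""
    for reg in (ra, rs, rb):
        if not 0 <= reg <= 31:
            raise ValueError("register number out of range")
    return (
        (OR_PRIMARY_OPCODE << 26)  # bits 0:5
        | (rs << 21)               # bits 6:10
        | (ra << 16)               # bits 11:15
        | (rb << 11)               # bits 16:20
        | (OR_EXTENDED_OPCODE << 1)  # bits 21:30
        | rc                       # bit 31
    )


def priority_nop(rx: int) -> int:
    """Return the instruction word for 'or Rx,Rx,Rx' if Rx appears in Figure 17."""
    if rx not in PRIORITY_NOPS:
        raise ValueError(f"or {rx},{rx},{rx} is not listed in Figure 17")
    return encode_or(rx, rx, rx)


if __name__ == "__main__":
    for rx, (pri, name) in PRIORITY_NOPS.items():
        print(f"or {rx},{rx},{rx}  PRI={pri:03b}  {name:12s} 0x{priority_nop(rx):08X}")
```

Whether a given form actually changes the priority still depends on the privilege rule stated above; the sketch only produces encodings.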
    Rx    PPR32[PRI]    Priority       Privileged
    31    001           very low       yes
    1     010           low            no
    6     011           medium low     no
                        (normal)
    2     100           medium         no
    5     101           medium high    yes
    3     110           high           yes
    7     111           very high      hypv (1)

    (1) If the Embedded.Hypervisor category is supported, this value is hypervisor privileged. Otherwise, the value is privileged.

Figure 17. Priority levels for or Rx,Rx,Rx

5.4.3 External Process ID Instructions [Category: Embedded.External PID]

External Process ID instructions provide capabilities for loading and storing General Purpose Registers and performing cache management operations using a supplied context other than the context normally used by translation.

The EPLC and EPSC registers provide external contexts for performing loads and stores. The EPLC and the EPSC registers are described in Section 5.3.6.

If an Alignment interrupt, Data Storage interrupt, or Data TLB Error interrupt occurs while attempting to execute an External Process ID instruction, ESR[EPID] is set to 1, indicating that the instruction causing the interrupt was an External Process ID instruction; any other applicable ESR bits are also set.

Load Byte by External Process ID Indexed X-form

lbepx RT,RA,RB

    Fields (starting bit): 31 (0) | RT (6) | RA (11) | RB (16) | 95 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ⁵⁶0 || MEM(EA,1)

Let the effective address (EA) be the sum (RA|0)+(RB). The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

For lbepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an lbzx instruction except for using the EPLC register to provide the translation context.

Load Halfword by External Process ID Indexed X-form

lhepx RT,RA,RB

    Fields (starting bit): 31 (0) | RT (6) | RA (11) | RB (16) | 287 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ⁴⁸0 || MEM(EA,2)

Let the effective address (EA) be the sum (RA|0)+(RB). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

For lhepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an lhzx instruction except for using the EPLC register to provide the translation context.

Load Word by External Process ID Indexed X-form

lwepx RT,RA,RB

    Fields (starting bit): 31 (0) | RT (6) | RA (11) | RB (16) | 31 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ³²0 || MEM(EA,4)

Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

For lwepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an lwzx instruction except for using the EPLC register to provide the translation context.

Load Doubleword by External Process ID Indexed X-form

ldepx RT,RA,RB

    Fields (starting bit): 31 (0) | RT (6) | RA (11) | RB (16) | 29 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA,8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

For ldepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

This instruction is privileged.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an ldx instruction except for using the EPLC register to provide the translation context.

Store Byte by External Process ID Indexed X-form

stbepx RS,RA,RB

    Fields (starting bit): 31 (0) | RS (6) | RA (11) | RB (16) | 223 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA,1) ← (RS)[56:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[56:63] are stored into the byte in storage addressed by EA.

For stbepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of PID
    EPSC[EGS] is used in place of MSR[GS]
    EPSC[ELPID] is used in place of LPIDR

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an stbx instruction except for using the EPSC register to provide the translation context.

Store Halfword by External Process ID Indexed X-form

sthepx RS,RA,RB

    Fields (starting bit): 31 (0) | RS (6) | RA (11) | RB (16) | 415 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA,2) ← (RS)[48:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[48:63] are stored into the halfword in storage addressed by EA.

For sthepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of PID
    EPSC[EGS] is used in place of MSR[GS]
    EPSC[ELPID] is used in place of LPIDR

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an sthx instruction except for using the EPSC register to provide the translation context.

Store Word by External Process ID Indexed X-form

stwepx RS,RA,RB

    Fields (starting bit): 31 (0) | RS (6) | RA (11) | RB (16) | 159 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA,4) ← (RS)[32:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[32:63] are stored into the word in storage addressed by EA.

For stwepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of PID
    EPSC[EGS] is used in place of MSR[GS]
    EPSC[ELPID] is used in place of LPIDR

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an stwx instruction except for using the EPSC register to provide the translation context.

Store Doubleword by External Process ID Indexed X-form

stdepx RS,RA,RB

    Fields (starting bit): 31 (0) | RS (6) | RA (11) | RB (16) | 157 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA,8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

For stdepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of PID
    EPSC[EGS] is used in place of MSR[GS]
    EPSC[ELPID] is used in place of LPIDR

This instruction is privileged.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an stdx instruction except for using the EPSC register to provide the translation context.

Data Cache Block Store by External PID X-form

dcbstep RA,RB

    Fields (starting bit): 31 (0) | /// (6) | RA (11) | RB (16) | 63 (21) | / (31)

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of any thread, and any locations in the block are considered to be modified there, then those locations are written to main storage. Additional locations in the block may be written to main storage. The block ceases to be considered modified in that data cache.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this thread, and any locations in the block are considered to be modified there, those locations are written to main storage. Additional locations in the block may be written to main storage, and the block ceases to be considered modified in that data cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

The instruction is treated as a Load with respect to translation and memory protection, and is treated as a Write with respect to debug events.

This instruction is privileged.

For dcbstep, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to a dcbst instruction except for using the EPLC register to provide the translation context.

Data Cache Block Touch by External PID X-form

dcbtep TH,RA,RB

    Fields (starting bit): 31 (0) | TH (6) | RA (11) | RB (16) | 319 (21) | / (31)

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtep instruction provides a hint that the program will probably soon load from the block containing the byte addressed by EA. If the Cache Specification category is supported, the nature of the hint is affected by TH values of 0b00000 to 0b00111. Values associated with the Stream category are ignored. See Section 4.3.2 of Book II for more information.

If the block is in a storage location that is Caching Inhibited or Guarded, the hint is ignored.

The only operation that is "caused" by the dcbtep instruction is the providing of the hint. The actions (if any) taken in response to the hint are not considered to be "caused by" or "associated with" the dcbtep instruction (e.g., dcbtep is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction.

The dcbtep instruction may complete before the operation it causes has been performed.

The nature of the hint depends, in part, on the value of the TH field, as specified in the dcbt instruction in Section 4.3.2 of Book II.

The instruction is treated as a Load, except that no interrupt occurs if a protection violation occurs.

The instruction is privileged.

The normal address translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

Special Registers Altered:
    None

Extended Mnemonics:
Extended mnemonics are provided for the Data Cache Block Touch by External PID instruction so that it can be coded with the TH value as the last operand for all categories.

    Extended:              Equivalent to:
    dcbtctep RA,RB,TH      dcbtep for TH values of 0b0000 - 0b0111;
                           other TH values are invalid.
    dcbtdsep RA,RB,TH      dcbtep for TH values of 0b0000 or 0b1000 - 0b1010;
                           other TH values are invalid.

Programming Note
This instruction behaves identically to a dcbt instruction except for using the EPLC register to provide the translation context, and not supporting the Stream category.

Data Cache Block Flush by External PID X-form

dcbfep RA,RB

    Fields (starting bit): 31 (0) | /// (6) | RA (11) | RB (16) | 127 (21) | / (31)

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of any thread, and any locations in the block are considered to be modified there, then those locations are written to main storage. Additional locations in the
block may also be written to main storage. The block is invalidated in the data cache of all threads.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of this thread, and any locations in the block are considered to be modified there, then those locations are written to main storage. Additional locations in the block may also be written to main storage. The block is invalidated in the data cache of this thread.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

The instruction is treated as a Load with respect to translation and memory protection, and is treated as a Write with respect to debug events.

This instruction is privileged.

The normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to a dcbf instruction except for using the EPLC register to provide the translation context.

Data Cache Block Touch for Store by External PID X-form

dcbtstep TH,RA,RB

    Fields (starting bit): 31 (0) | TH (6) | RA (11) | RB (16) | 255 (21) | / (31)

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtstep instruction provides a hint that the program will probably soon store to the block containing the byte addressed by EA. If the Cache Specification category is supported, the nature of the hint is affected by TH values of 0b00000 to 0b00111. Values associated with the Stream category are ignored. See Section 4.3.2 of Book II for more information.

If the block is in a storage location that is Caching Inhibited or Guarded, the hint is ignored.

The only operation that is "caused" by the dcbtstep instruction is the providing of the hint. The actions (if any) taken in response to the hint are not considered to be "caused by" or "associated with" the dcbtstep instruction (e.g., dcbtstep is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction.

The dcbtstep instruction may complete before the operation it causes has been performed.

The instruction is treated as a Store, except that no interrupt occurs if a protection violation occurs.

The instruction is privileged.

The normal address translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

Special Registers Altered:
    None

Extended Mnemonics:
Extended mnemonics are provided for the Data Cache Block Touch for Store by External PID instruction so that it can be coded with the TH value as the last operand for all categories.

    Extended:               Equivalent to:
    dcbtstctep RA,RB,TH     dcbtstep for TH values of 0b0000 - 0b0111;
                            other TH values are invalid.

Programming Note
This instruction behaves identically to a dcbtst instruction except for using the EPLC register to provide the translation context, and not supporting the Stream category.
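The substitution rule shared by all External Process ID instructions can be summarized in a small model. The following is only a sketch under stated assumptions: the class and function names are hypothetical, register fields are passed as a plain dictionary keyed by the field names used in this section (the bit layout of EPLC and EPSC, described in Section 5.3.6, is deliberately not modeled), and the two mnemonic sets simply restate which register each instruction description in this section names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationContext:
    """Values consulted by translation and access control (hypothetical model)."""
    pr: int          # normally MSR[PR]
    addr_space: int  # normally MSR[DS]
    pid: int         # normally PID
    gs: int          # normally MSR[GS]
    lpid: int        # normally LPIDR


# Instructions whose descriptions name the EPLC register as the context source.
EPLC_INSTRUCTIONS = frozenset({
    "lbepx", "lhepx", "lwepx", "ldepx", "lfdepx", "evlddepx", "lvepx", "lvepxl",
    "dcbstep", "dcbtep", "dcbfep", "dcbtstep", "icbiep",
})

# Instructions whose descriptions name the EPSC register as the context source.
EPSC_INSTRUCTIONS = frozenset({
    "stbepx", "sthepx", "stwepx", "stdepx", "stfdepx", "evstddepx",
    "stvepx", "stvepxl", "dcbzep",
})


def context_register(mnemonic: str) -> str:
    """Return 'EPLC' or 'EPSC' for an External Process ID instruction."""
    if mnemonic in EPLC_INSTRUCTIONS:
        return "EPLC"
    if mnemonic in EPSC_INSTRUCTIONS:
        return "EPSC"
    raise ValueError(f"{mnemonic} is not an External Process ID instruction")


def external_context(fields: dict) -> TranslationContext:
    """Apply the five substitutions: EPR, EAS, EPID, EGS, ELPID replace
    MSR[PR], MSR[DS], PID, MSR[GS], and LPIDR respectively."""
    return TranslationContext(
        pr=fields["EPR"],
        addr_space=fields["EAS"],
        pid=fields["EPID"],
        gs=fields["EGS"],
        lpid=fields["ELPID"],
    )


if __name__ == "__main__":
    eplc = {"EPR": 1, "EAS": 0, "EPID": 5, "EGS": 0, "ELPID": 2}
    print(context_register("lwepx"), external_context(eplc))
```

The substitution applies only to translation and access control; the model says nothing about the privilege check on the instruction itself, which still uses the executing thread's state.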
Instruction Cache Block Invalidate by External PID X-form

icbiep RA,RB

    Fields (starting bit): 31 (0) | /// (6) | RA (11) | RB (16) | 991 (21) | / (31)

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any thread, the block is invalidated in those instruction caches.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of this thread, the block is invalidated in that instruction cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

The instruction is treated as a Load.

This instruction is privileged.

For icbiep, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an icbi instruction except for using the EPLC register to provide the translation context.

Data Cache Block set to Zero by External PID X-form

dcbzep RA,RB

    Fields (starting bit): 31 (0) | /// (6) | RA (11) | RB (16) | 1023 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    n ← block size (bytes)
    m ← log2(n)
    ea ← EA[0:63-m] || ᵐ0
    MEM(ea, n) ← ⁿ0x00

Let the effective address (EA) be the sum (RA|0)+(RB). All bytes in the block containing the byte addressed by EA are set to zero.

This instruction is treated as a Store.

This instruction is privileged.

The normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of PID
    EPSC[EGS] is used in place of MSR[GS]
    EPSC[ELPID] is used in place of LPIDR

Special Registers Altered:
    None

Programming Note
See the Programming Notes for the dcbz instruction.

Programming Note
This instruction behaves identically to a dcbz instruction except for using the EPSC register to provide the translation context.

Load Floating-Point Double by External Process ID Indexed X-form

lfdepx FRT,RA,RB

    Fields (starting bit): 31 (0) | FRT (6) | RA (11) | RB (16) | 607 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    FRT ← MEM(EA,8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into FRT.

For lfdepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

This instruction is privileged.

An attempt to execute lfdepx while MSR[FP]=0 will cause a Floating-Point Unavailable interrupt.

Corequisite Categories:
    Floating-Point

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an lfdx instruction except for using the EPLC register to provide the translation context.

Store Floating-Point Double by External Process ID Indexed X-form

stfdepx FRS,RA,RB

    Fields (starting bit): 31 (0) | FRS (6) | RA (11) | RB (16) | 735 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA,8) ← (FRS)

Let the effective address (EA) be the sum (RA|0)+(RB). (FRS) is stored into the doubleword in storage addressed by EA.

For stfdepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of PID
    EPSC[EGS] is used in place of MSR[GS]
    EPSC[ELPID] is used in place of LPIDR

This instruction is privileged.

An attempt to execute stfdepx while MSR[FP]=0 will cause a Floating-Point Unavailable interrupt.

Corequisite Categories:
    Floating-Point

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an stfdx instruction except for using the EPSC register to provide the translation context.

Vector Load Doubleword into Doubleword by External Process ID Indexed EVX-form

evlddepx RT,RA,RB

    Fields (starting bit): 31 (0) | RT (6) | RA (11) | RB (16) | 799 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA,8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

For evlddepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

This instruction is privileged.

An attempt to execute evlddepx while MSR[SPV]=0 will cause an SPE Unavailable interrupt.

Corequisite Categories:
    Signal Processing Engine

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an evlddx instruction except for using the EPLC register to provide the translation context.

Vector Store Doubleword into Doubleword by External Process ID Indexed EVX-form

evstddepx RS,RA,RB

    Fields (starting bit): 31 (0) | RS (6) | RA (11) | RB (16) | 927 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA,8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

For evstddepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of PID
    EPSC[EGS] is used in place of MSR[GS]
    EPSC[ELPID] is used in place of LPIDR

This instruction is privileged.

An attempt to execute evstddepx while MSR[SPV]=0 will cause an SPE Unavailable interrupt.

Corequisite Categories:
    Signal Processing Engine

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an evstddx instruction except for using the EPSC register to provide the translation context.

Load Vector by External Process ID Indexed X-form

lvepx VRT,RA,RB

    Fields (starting bit): 31 (0) | VRT (6) | RA (11) | RB (16) | 295 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    VRT ← MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16)

Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT.

For lvepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

This instruction is privileged.

An attempt to execute lvepx while MSR[SPV]=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an lvx instruction except for using the EPLC register to provide the translation context.

Load Vector by External Process ID Indexed LRU X-form

lvepxl VRT,RA,RB

    Fields (starting bit): 31 (0) | VRT (6) | RA (11) | RB (16) | 263 (21) | / (31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    VRT ← MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16)
    mark_as_not_likely_to_be_needed_again_anytime_soon(EA)

Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT.

lvepxl provides a hint that the quadword in storage addressed by EA will probably not be needed again by the program in the near future.

For lvepxl, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:
    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of PID
    EPLC[EGS] is used in place of MSR[GS]
    EPLC[ELPID] is used in place of LPIDR

This instruction is privileged.

An attempt to execute lvepxl while MSR[SPV]=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
See the Programming Notes for the lvxl instruction in Section 6.7.2 of Book I.
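The address computation used by the vector External Process ID loads and stores, the (RA|0)+(RB) sum followed by ANDing with 0xFFFF_FFFF_FFFF_FFF0, can be sketched as follows. This is an illustration only: the function names are hypothetical, the register file is modeled as a Python list of 32 integers, and the sum is assumed to wrap modulo 2^64 as in 64-bit mode (32-bit mode behavior is not modeled).

```python
MASK64 = (1 << 64) - 1
QUADWORD_MASK = 0xFFFF_FFFF_FFFF_FFF0  # clears the low-order 4 bits of EA


def effective_address(gpr: list, ra: int, rb: int) -> int:
    """Compute EA = (RA|0) + (RB): a zero RA field means the value 0, not GPR 0."""
    b = 0 if ra == 0 else gpr[ra]
    return (b + gpr[rb]) & MASK64  # assumption: 64-bit mode wraparound


def quadword_address(ea: int) -> int:
    """Return the address of the aligned quadword that contains EA."""
    return ea & QUADWORD_MASK


if __name__ == "__main__":
    gpr = [0] * 32
    gpr[0] = 0xDEAD_0000   # ignored when the RA field is 0
    gpr[4] = 0x1000
    gpr[5] = 0x001F
    ea = effective_address(gpr, 4, 5)
    print(hex(ea), hex(quadword_address(ea)))
```

Because the low-order four bits are discarded rather than checked, an unaligned EA selects the enclosing aligned quadword.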
Programming Note
    This instruction behaves identically to a lvxl instruction except for using the EPLC register to provide the translation context.

Store Vector by External Process ID Indexed X-form

stvepx VRS,RA,RB

    31   VRS  RA   RB   807  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) ← (VRS)

Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0.

For stvepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of PID
    EPSC[EGS] is used in place of MSR[GS]
    EPSC[ELPID] is used in place of LPIDR

This instruction is privileged.

An attempt to execute stvepx while MSR[SPV]=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a stvx instruction except for using the EPSC register to provide the translation context.

Store Vector by External Process ID Indexed LRU X-form

stvepxl VRS,RA,RB

    31   VRS  RA   RB   775  /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) ← (VRS)
    mark_as_not_likely_to_be_needed_again_anytime_soon(EA)

Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0.

The stvepxl instruction provides a hint that the quadword addressed by EA will probably not be needed again by the program in the near future.

For stvepxl, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of PID
    EPSC[EGS] is used in place of MSR[GS]
    EPSC[ELPID] is used in place of LPIDR

This instruction is privileged.

An attempt to execute stvepxl while MSR[SPV]=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
    See the Programming Notes for the lvxl instruction in Section 6.7.2 of Book I.

Programming Note
    This instruction behaves identically to a stvxl instruction except for using the EPSC register to provide the translation context.

Chapter 6. Storage Control

6.1 Overview . . . 940
6.2 Storage Exceptions . . . 942
6.3 Instruction Fetch . . . 942
6.3.1 Implicit Branch . . . 942
6.3.2 Address Wrapping Combined with Changing MSR Bit CM . . . 943
6.4 Data Access . . . 943
6.5 Performing Operations Out-of-Order . . . 943
6.6 Invalid Real Address . . . 944
6.7 Storage Control . . . 944
6.7.1 Translation Lookaside Buffer . . . 944
6.7.2 Virtual Address Spaces . . . 949
6.7.3 TLB Address Translation . . . 950
6.7.4 Page Table Address Translation [Category: Embedded.Page Table] . . . 953
6.7.5 Page Table Update Synchronization Requirements [Category: Embedded.Page Table] . . . 961
6.7.5.1 Page Table Updates . . . 962
6.7.5.1.1 Adding a Page Table Entry . . . 962
6.7.5.1.2 Deleting a Page Table Entry . . . 963
6.7.5.1.3 Modifying a Page Table Entry . . . 963
6.7.5.2 Invalidating an Indirect TLB Entry . . . 963
6.7.6 Storage Access Control . . . 964
6.7.6.1 Execute Access . . . 964
6.7.6.2 Write Access . . . 964
6.7.6.3 Read Access . . . 965
6.7.6.4 Virtualized Access . . . 965
6.7.6.5 Storage Access Control Applied to Cache Management Instructions . . . 965
6.7.6.6 Storage Access Control Applied to String Instructions . . . 966
6.8 Storage Control Attributes . . . 966
6.8.1 Guarded Storage . . . 966
6.8.1.1 Out-of-Order Accesses to Guarded Storage . . . 966
6.8.2 User-Definable . . . 966
6.8.3 Storage Control Bits . . . 966
6.8.3.1 Storage Control Bit Restrictions . . . 967
6.8.3.2 Altering the Storage Control Bits . . . 968
6.9 Logical to Real Address Translation [Category: Embedded.Hypervisor.LRAT] . . . 971
6.10 Storage Control Registers . . . 973
6.10.1 Process ID Register . . . 973
6.10.2 MMU Assist Registers . . . 973
6.10.3 MMU Configuration and Control Registers . . . 974
6.10.3.1 MMU Configuration Register (MMUCFG) . . . 974
6.10.3.2 TLB Configuration Registers (TLBnCFG) . . . 974
6.10.3.3 TLB Page Size Registers (TLBnPS) [MAV=2.0] . . . 976
6.10.3.4 Embedded Page Table Configuration Register (EPTCFG) . . . 976
6.10.3.5 LRAT Configuration Register (LRATCFG) [Category: Embedded.Hypervisor.LRAT] . . . 977
6.10.3.6 LRAT Page Size Register (LRATPS) [Category: Embedded.Hypervisor.LRAT] . . . 977
6.10.3.7 MMU Control and Status Register (MMUCSR0) . . . 978
6.10.3.8 MAS0 Register . . . 978
6.10.3.9 MAS1 Register . . . 980
6.10.3.10 MAS2 Register . . . 980
6.10.3.11 MAS3 Register . . . 981
6.10.3.12 MAS4 Register . . . 982
6.10.3.13 MAS5 Register [Category: Embedded.Hypervisor] . . . 983
6.10.3.14 MAS6 Register . . . 983
6.10.3.15 MAS7 Register . . . 984
6.10.3.16 MAS8 Register [Category: Embedded.Hypervisor] . . . 984
6.10.3.17 Accesses to Paired MAS Registers . . . 985
6.10.3.18 MAS Register Update Summary . . . 985
6.11 Storage Control Instructions . . . 988
6.11.1 Cache Management Instructions . . . 988
6.11.2 Cache Locking [Category: Embedded Cache Locking] . . . 989
6.11.2.1 Lock Setting and Clearing . . . 989
6.11.2.2 Error Conditions . . . 989
6.11.2.2.1 Overlocking . . . 990
6.11.2.2.2 Unable-to-lock and Unable-to-unlock Conditions . . . 990
6.11.2.3 Cache Locking Instructions . . . 991
6.11.3 Synchronize Instruction . . . 993
6.11.4 LRAT [Category: Embedded.Hypervisor.LRAT] and TLB Management . . . 993
6.11.4.1 Reading TLB or LRAT Entries . . . 993
6.11.4.2 Writing TLB or LRAT Entries . . . 993
6.11.4.2.1 TLB Write Conditional [Embedded.TLB Write Conditional] . . . 994
6.11.4.3 Invalidating TLB Entries . . . 997
6.11.4.4 TLB Lookaside Information . . . 999
6.11.4.5 Invalidating LRAT Entries . . . 999
6.11.4.6 Searching TLB Entries . . . 999
6.11.4.7 TLB Replacement Hardware Assist . . . 999
6.11.4.8 32-bit and 64-bit Specific MMU Behavior . . . 1000
6.11.4.9 TLB Management Instructions . . . 1001

6.1 Overview

Instruction effective addresses are generated for sequential instruction fetches and for addresses that correspond to a change in program flow (branches, interrupts). Data effective addresses are generated by Load, Store, and Cache Management instructions. TLB Management instructions generate effective addresses to determine the presence of or to invalidate a specific TLB entry associated with that address. For a complete discussion of storage addressing and effective address calculation, see Section 1.10 of Book I.

Portions of the context of an effective address are appended to it to form the virtual address. The context is provided by various registers. The virtual address consists of the Logical Partition ID (LPID), the Guest State, the address space identifier, the process identifier, and the effective address. The virtual address is translated to a real address by a matching "direct" entry in the Translation Lookaside Buffer (TLB) according to procedures described in Section 6.7.3. The Virtual Page Number (VPN) part of the virtual address is compared to the TLB contents to determine a match. The VPN consists of bits of the virtual address with the exception of the low-order effective address bits that correspond to the byte offset within the page. If the Embedded.Page Table category is supported, a virtual address can be translated by the Page Table pointed to by a matching "indirect" TLB entry as described in Section 6.7.4. As a result of a Page Table translation, a direct TLB entry is created, and this direct TLB entry can be used for subsequent translations. All virtual addresses are translated by the Page Table or the TLB, i.e. unlike the Server environment, there is no real mode. The real address that results from the translation is used to access main storage.

The Translation Lookaside Buffer is the hardware resource that also controls protection and storage control attributes. TLB permission bits control user and supervisor read, write and execute capability. If the Embedded.Hypervisor category is supported, the Virtualization Fault bit permits data accesses to pages to be trapped to the hypervisor, which allows the hypervisor to virtualize data accesses to specific pages, e.g. accesses to memory-mapped I/O. Storage control attributes described in Book II are supported by corresponding TLB bits, as are four optional implementation-dependent user-defined storage control attributes. The organization of the TLB (e.g. associativity, number of entries, number of arrays, etc.) is implementation-dependent. There is MMU configuration and TLB configuration information in various registers to describe this implementation-dependent organization.

Software manages translation directly by installing TLB entries, and indirectly by setting up the page tables, which the TLB will cache. TLB Management instructions are used by software to read, write, search and invalidate TLB contents. MMU Assist Registers (MAS) are used to transfer data to and from the TLB arrays by TLB Management instructions. If the Embedded.Hypervisor category is not supported, TLB Management instructions are privileged instructions.

A different MMU Architecture Version (MAV) is used to indicate that different register layouts and functions are provided. The MMU Architecture Version Number is specified by the read-only MMUCFG register. The Embedded.Hypervisor.LRAT, Embedded TLB Write Conditional, and Embedded.Page Table categories are available only in MMU Architecture Version 2.0.

If the Embedded.Hypervisor category is supported and the Embedded.Hypervisor.LRAT category is not supported, most TLB Management instructions are hypervisor instructions. TLB entries contain real addresses, and, to maintain isolation between partitions, guest operating systems are not given access to real addresses. In this case most TLB Management instructions trap to the hypervisor. The hypervisor can emulate a TLB Management instruction by swapping a Real Page Number for corresponding Logical Page Number (LPN) in the MAS registers or vice versa so that the guest OS only sees LPNs.

However, if the Embedded.Hypervisor.LRAT category is supported, hardware can perform the translation of an LPN into a corresponding RPN. In this case, a TLB Write Entry (tlbwe) instruction can be executed in guest supervisor state. The LPN in the MAS register is translated into a corresponding RPN by a hardware lookup in a Logical to Real Address Translation (LRAT) array, and the RPN is written to the TLB in place of the LPN when tlbwe is executed in guest supervisor state.

The Embedded.TLB Write Conditional (E.TWC) category provides a TLB write operation that is conditional on a TLB-reservation where the TLB-reservation is previously established by a tlbsrx. instruction. The TLB-reservation is cleared by TLB invalidations and TLB writes involving the same virtual page. Thus, without acquiring a software lock, software can use the E.TWC category to write a TLB entry while ensuring that the entry is not a duplicate of an entry created simultaneously by another thread that shares the TLB and is not a stale value for a virtual page that was concurrently invalidated.
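The TLB-reservation behavior described for the E.TWC category can be pictured with a small software model. The sketch below is a deliberate simplification, not the architected mechanism: the type tlb_reservation and the functions reserve, observe_tlb_update, and write_conditional are hypothetical names, a virtual page is reduced to a single integer, and the choice to leave the reservation untouched when a conditional write fails is an assumption made only for this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one thread's TLB-reservation. */
typedef struct {
    bool     valid;  /* true while a reservation is held */
    uint64_t vpn;    /* virtual page covered by the reservation */
} tlb_reservation;

/* Models tlbsrx.: establish a reservation for a virtual page. */
static void reserve(tlb_reservation *r, uint64_t vpn)
{
    r->valid = true;
    r->vpn = vpn;
}

/* Models a TLB invalidation or TLB write performed by any thread that
 * shares the TLB: the reservation is cleared only if the update
 * involves the same virtual page. */
static void observe_tlb_update(tlb_reservation *r, uint64_t vpn)
{
    if (r->valid && r->vpn == vpn)
        r->valid = false;
}

/* Models the conditional TLB write: it is performed only while the
 * reservation for that page is still held. A successful write is itself
 * a TLB write to the page, so it clears the reservation. On failure the
 * reservation is left unchanged (an assumption of this sketch). */
static bool write_conditional(tlb_reservation *r, uint64_t vpn)
{
    bool ok = r->valid && r->vpn == vpn;
    if (ok)
        r->valid = false;
    return ok;
}
```

In this model, a miss handler would call reserve, build the new entry, and then call write_conditional; a false result means another thread wrote or invalidated the same page in between, so the handler would search again instead of installing a duplicate or stale entry.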
Figure 18 gives an overview of address translation if the Embedded.Page Table category is supported. The IND bit in a TLB entry indicates whether the entry is a "direct" entry or "indirect" entry. When a virtual address is translated, the TLB arrays are searched for a matching entry. If there is one and only one matching direct entry, that entry is used to translate the VA. If there is no matching direct TLB entry, but there is one and only one matching indirect entry, the indirect entry is used to access a Page Table Entry (PTE). If the PTE is a valid entry (V bit = 1), the PTE is used to translate the address. If the Embedded.Page Table and Embedded.Hypervisor categories are both supported, the Embedded.Hypervisor.LRAT category is supported. In this case if the TGS bit of the indirect TLB entry is 1, the RPN from the PTE is treated as a Logical Page Number (LPN) and translated by the LRAT into an RPN.

If the Embedded.Page Table category is supported but the Embedded.Hypervisor category is not supported, supervisor software can create direct and indirect TLB entries and can control the Page Table Entries. If both categories are supported, guest supervisor software can still create direct and indirect TLB entries and control the Page Table Entries if guest execution of TLB Management instructions is enabled. However, depending on various factors such as the number of available LRAT entries, performance may be better if guest virtual addresses are translated by a Page Table that is managed by hypervisor software.

[Figure 18. Address translation with page table. Labels recoverable from the figure: "Category E.HV supported & TGS of indirect TLB entry = 1", "Category E.HV not supported", "Lookup LPN in LRAT [Category: E.HV.LRAT]", "Real Address".]

Address Size Overview

    Real address space size is 2^m bytes, m ≤ 64; see Note 1.
    In MMU Architecture Version 1.0, real page sizes are 4^p KB where 0 ≤ p ≤ 15 (i.e. 1KB, 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB, 64MB, 256MB, 1GB, 4GB, 16GB, 64GB, 256GB, 1TB); see Note 2. In MMU Architecture Version 2.0, real page sizes are 2^p KB where 0 ≤ p ≤ 31 (i.e. 1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, 2MB, 4MB, 8MB, 16MB, 32MB, 64MB, 128MB, 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB, 32GB, 64GB, 128GB, 256GB, 512GB, 1TB, 2TB); see Note 2. However, real page sizes supported by a Page Table are limited to values of p where 2 ≤ p ≤ 15.
    Effective address space size is 2^64 bytes in 64-bit implementations and 2^32 bytes in 32-bit implementations.
    The virtual address space size depends on the implementation.
    - Virtual address space size in 64-bit implementations is 2^v bytes, where:
        - 66 ≤ v ≤ 79 if the Embedded.Hypervisor Category is not supported; see Note 3.
        - 68 ≤ v ≤ 92 if the Embedded.Hypervisor Category is supported; see Note 3.
    - Virtual address space size in 32-bit implementations is 2^v bytes, where:
        - 34 ≤ v ≤ 47 if the Embedded.Hypervisor Category is not supported; see Note 3.
        - 36 ≤ v ≤ 60 if the Embedded.Hypervisor Category is supported; see Note 3.
    - The number of LPID bits is 1 ≤ g ≤ 12; see Note 3.
    - There is one GS bit.
    - There is one AS bit.
    - The number of PID bits is 1 ≤ d ≤ 14; see Note 3.
    - For any given real page, the virtual page size is the same as the real page size.
    If the Embedded.Hypervisor.LRAT category is supported, the following applies.
    - The logical page sizes allowed by the architecture are the same as the real page sizes. However, an implementation need not support the same logical and real page sizes.
    - The logical address space size is 2^q bytes, where q ≤ 64; see Note 4.

Notes:

1. The value of m is implementation-dependent (subject to the maximum given above). When used to address storage, the high-order 64-m bits of the "64-bit" real address must be zeros. A maximum of 64 bits of real address can be supported by the TLB. A maximum of 52 bits of real address can be supported by the Page Table.
2. Which of these page sizes are supported is implementation-dependent. If an implementation supports multiple TLB arrays, the page sizes supported by each array may be different. Supported page sizes are indicated by TLB configuration information (see Sections 6.10.3.3 and 6.10.3.4).
3. The values of v, g, and d are implementation-dependent (subject to the range given above). The value of v is a function of g, d, whether the implementation is 32-bit or 64-bit, and whether the Embedded.Hypervisor category is supported.
4. The value of q is implementation-dependent (subject to the maximum given above). A maximum of 64 bits of logical address can be supported by the LRAT. A maximum of 52 bits of logical address can be supported by the Page Table.

Programming Note
    [Category: Embedded.Hypervisor.LRAT]: The logical page sizes supported by an implementation are typically larger than the real page sizes supported. This implies that memory blocks must be assigned to a partition with larger granularity than the memory blocks that can be managed within a partition.

6.2 Storage Exceptions

A storage exception results when the sequential execution model requires that a storage access be performed but the access is not permitted (e.g., is not permitted by the storage protection mechanism), the access cannot be performed because the effective address cannot be translated to a real address, or the access matches some tracking mechanism criteria (e.g., Data Address Breakpoint).

In certain cases a storage exception may result in the "restart" of (re-execution of at least part of) a Load or Store instruction. See Section 2.1 of Book II and Section 7.7 on page 1054 in this Book.

6.3 Instruction Fetch

For an instruction fetch, MSR[IS] is appended to the effective address as part of the virtual address. The Address Translation mechanism is described in Section 6.7.2, Section 6.7.3, and, if the Embedded.Page Table category is supported, Section 6.7.4.

6.3.1 Implicit Branch

Explicitly altering certain MSR bits (using mtmsr), or explicitly altering TLB entries, certain System Registers and possibly other implementation-dependent registers, may have the side effect of changing the addresses, effective or real, from which the current instruction stream is being fetched. This side effect is called an implicit branch. For example, an mtmsr instruction that changes the value of MSR[CM] may change the real address from which the current instruction stream is being fetched. The MSR bits and System Registers (excluding implementation-dependent registers) for which alteration can cause an implicit branch are indicated as such in Chapter 12. "Synchronization Requirements for Context Alterations" on page 1099. Implicit branches are not supported by the Power ISA. If an implicit branch occurs, the results are boundedly undefined.

6.3.2 Address Wrapping Combined with Changing MSR Bit CM

If the current instruction is at effective address 2^32-4 and is an mtmsr instruction that changes the contents of MSR[CM], the effective address of the next sequential instruction is undefined.

Programming Note
    In the case described in the preceding paragraph, if an interrupt occurs before the next sequential instruction is executed, the contents of SRR0, CSRR0, or MCSRR0, as appropriate to the interrupt, are undefined if the Embedded.Hypervisor category is not supported or the interrupt is directed to the hypervisor state. If the Embedded.Hypervisor category is supported and the interrupt is directed to the guest state, the contents of GSRR0 are undefined.

6.4 Data Access

For a normal Load or Store instruction, MSR[DS] is appended to the effective address as part of the virtual address. The Address Translation mechanism is described in Section 6.7.2, Section 6.7.3, and, if the Embedded.Page Table category is supported, Section 6.7.4.

The Embedded.External PID category must be supported. The effective address for an External Process ID Load or Store instruction data access is processed under control of the EPLC or EPSC, respectively. See Section 5.3.6.1 and Section 5.3.6.2.

6.5 Performing Operations Out-of-Order

An operation is said to be performed "in-order" if, at the time that it is performed, it is known to be required by the sequential execution model. An operation is said to be performed "out-of-order" if, at the time that it is performed, it is not known to be required by the sequential execution model.

Operations are performed out-of-order on the expectation that the results will be needed by an instruction that will be required by the sequential execution model. Whether the results are really needed is contingent on everything that might divert the control flow away from the instruction, such as Branch, Trap, System Call, and Return From Interrupt instructions, and interrupts, and on everything that might change the context in which the instruction is executed.

Typically, operations are performed out-of-order when resources are available that would otherwise be idle, so the operation incurs little or no cost. If subsequent events such as branches or interrupts indicate that the operation would not have been performed in the sequential execution model, any results of the operation are abandoned (except as described below).

In the remainder of this section, including its subsections, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

A data access that is performed out-of-order may correspond to an arbitrary Load or Store instruction (e.g., a Load or Store instruction that is not in the instruction stream being executed). Similarly, an instruction fetch that is performed out-of-order may be for an arbitrary instruction (e.g., the aligned word at an arbitrary location in instruction storage).

Most operations can be performed out-of-order, as long as the machine appears to follow the sequential execution model. Certain out-of-order operations are restricted, as follows.

    Stores
        Stores are not performed out-of-order (even if the Store instructions that caused them were executed out-of-order).
    Accessing Guarded Storage
        The restrictions for this case are given in Section 6.8.1.1.

The only permitted side effects of performing an operation out-of-order are the following.

    - A Machine Check that could be caused by in-order execution may occur out-of-order except that, if category E.HV is supported and the Machine Check is the result of multiple TLB entries that translate the same VA, the Machine Check interrupt must occur in the context in which it was caused. Also, if category E.HV is supported, a Machine Check interrupt resulting from the following situations must be precise.
        - Execution of an External Process ID instruction that has an operand that can be translated by multiple TLB entries.
        - Execution of a tlbivax instruction that isn't a TLB invalidate all and there are multiple entries in a single thread's TLB array(s) that match the complete VPN.
        - Execution of a tlbilx instruction with T=3 and there are multiple entries in the TLB array(s) that match the complete VPN.
        - Execution of a tlbsx or tlbsrx. instruction and there are multiple matching TLB entries.
    - Non-Guarded storage locations that could be fetched into a cache by in-order fetching or execution of an arbitrary instruction may be fetched out-of-order into that cache.

6.6 Invalid Real Address

A storage access (including an access that is performed out-of-order; see Section 6.5) may cause a Machine Check if the accessed storage location contains an uncorrectable error or does not exist. See Section 7.6.3 on page 1034.

6.7 Storage Control

This section describes the address translation facility, access control, and storage control attributes.

Demand-paged virtual memory is supported, as well as a variety of other management schemes that depend on precise control of effective-to-real address translation and flexible memory protection. Translation misses and protection faults cause precise exceptions. Sufficient information is available to correct the fault and restart the faulting instruction.

The effective address space is divided into pages. The page represents the granularity of effective address translation, access control, and storage control attributes. In MMU Architecture Version 1.0, up to sixteen page sizes (1KB, 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB, 64MB, 256MB, 1GB, 4GB, 16GB, 64GB, 256GB, 1TB) may be simultaneously supported. In MMU Architecture Version 2.0, up to 32 page sizes (1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, 2MB, 4MB, 8MB, 16MB, 32MB, 64MB, 128MB, 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB, 32GB, 64GB, 128GB, 256GB, 512GB, 1TB, 2TB) may be simultaneously supported. In order for an effective to real translation to exist, a valid entry for the page containing the effective address must be in the Translation Lookaside Buffer (TLB). Addresses for which no TLB entry exists cause TLB Miss exceptions.

6.7.1 Translation Lookaside Buffer

The Translation Lookaside Buffer (TLB) is the hardware resource that controls translation, protection, and storage control attributes. The organization of the TLB (e.g. associativity, number of entries, number of arrays, etc.) is implementation-dependent. Thus, the software for updating the TLB is also implementation-dependent. However, MMU configuration and TLB configuration information is provided such that software written to handle various TLB organizations could potentially run on multiple MMU implementations. A unified TLB organization (one to four TLB arrays, called TLB0, TLB1, TLB2 and TLB3, where each contains translations for both instructions and data) is assumed in the following description. For details on how to synchronize TLB updates with instruction execution see Section 6.11.4.3 and Chapter 12.

Maintenance of TLB entries is under software control, except that if the Embedded.Page Table category is supported, hardware will write TLB entries for translations performed via the Page Table. System software determines TLB entry replacement strategy and the format and use of any page state information. If a TLB provides Next Victim (NV) information, software can optionally use NV to choose a TLB entry to be replaced. See Section 6.11.4.7. Some implementations allow software to specify that a hardware generated hash and hardware replacement algorithm should be used to select the entry. See Section 6.11.4.7. The TLB entry contains all the information required to identify the page, to specify the translation, to specify access controls, and to specify the storage control attributes.

A TLB entry is written by copying information from MAS registers, using a tlbwe instruction (see page 1010). A TLB entry is read by copying information to MAS registers, using a tlbre instruction (see page 1008). Software can also search for specific TLB entries using the tlbsx instruction (see page 1005) and, if the Embedded.TLB Write Conditional category is supported, tlbsrx. (see page 1007).

Each TLB entry describes a page. Fields in the TLB entry fall into five categories:

    - Page identification fields (information required to identify the page to the hardware translation mechanism)
    - Address translation fields
    - Access control fields
    - Storage control attribute fields
    - TLB management field

While the fields in the TLB entry are required, unless they are identified as part of a category that is not supported, no particular TLB entry format is formally specified. The tlbre and tlbwe instructions provide the ability to read or write individual entries. Below are shown the field definitions for the TLB entry. Some fields that are used only for indirect TLB entries can be overlaid with fields that are used only for direct TLB entries. Such overlap is implementation-dependent and an example is shown in Figure 19 on page 949.

Page Identification Fields for direct and indirect entries

Name    Description

EPN     Effective Page Number (up to 54 bits)
        Bits 0:n-1 of the EPN field are compared to bits 0:n-1 of the effective address (EA) of the storage access (where n = 64 - log2(page size in bytes) and page size is specified by the SIZE field of the TLB entry). See Table 2 and Table 3.
        Note: Bits X:Y of the EPN field are implemented, where X=0 (64-bit implementation) or X=32 (32-bit implementation), and Y ≤ 53. Y = p - 1 where p = 64 - log2(smallest page size in bytes) and smallest page size is the smallest page size supported by the implementation as specified by TLB array's TLBnCFG or TLBnPS. The number of bits implemented for EPN is not required to be the same number of bits as are implemented for RPN. Implemented bits that represent offsets within a page are ignored for address comparisons performed for translation, invalidation, and searches and software should set these bits to zero. Unimplemented EPN bits are treated as if they contain 0s.

TS      Translation Address Space
        This bit indicates the address space this TLB entry is associated with. For instruction storage accesses, MSR[IS] must match the value of TS in the TLB entry for that TLB entry to provide the translation. Likewise, for data storage accesses, MSR[DS] must match the value of TS in the TLB entry. For the tlbsrx. instruction, MAS1[TS] provides the address space specification that must match the value of TS. For the tlbsx instruction, MAS6[SAS] provides the address space specification that must match the value of TS. For the instructions tlbilx with T=3 and tlbivax with EA[61]=0, MAS6[SAS] provides the address space specification that is compared to the value of TS.

SIZE    Page Size
        For direct TLB entries, the SIZE field specifies the size of the virtual page associated with the TLB entry. For indirect TLB entries, the SIZE field specifies the maximum amount of virtual storage that can be mapped by the page table to which the indirect TLB entry points. The following applies in both cases:
        - For MAV = 1.0, the size is 4^SIZE KB, where 0 ≤ SIZE ≤ 15. See Table 2. For TLB arrays that contain fixed-size TLB entries, this field is treated as reserved for tlbwe and tlbre instructions and is treated as a fixed value for translations. For variable page size TLB arrays, this field must be a value between TLBnCFG[MINSIZE] and TLBnCFG[MAXSIZE].
        - For MAV = 2.0, the size is 2^SIZE KB, where 0 ≤ SIZE ≤ 31. See Table 3. This field must be one of the page sizes specified by the corresponding TLBnPS register.
        Implementations may support any one or more of the page sizes described above.

TID     Translation ID (implementation-dependent size)
        Field used to identify a shared page (TID=0) or the owner's process ID of a private page (TID≠0). See Section 6.7.2.

TLPID   Translation Logical Partition ID
        This field identifies a partition. The Translation Logical Partition ID is compared with LPIDR contents during translation. This allows for an efficient change of address space when a transition between partitions occurs. The number of bits in this field is an implementation-dependent number n, where 1 ≤ n ≤ 12. See Section 6.7.2.

TGS     Translation Guest State
        This 1-bit field indicates whether this TLB entry is valid for the guest space or for the hypervisor space. The Translation Guest Space field is compared with the MSR[GS] bit during translation. This allows for an efficient change of address space when a transition from guest state to hypervisor state occurs. See Section 6.7.2.
        0   Hypervisor space
        1   Guest space

V       Valid
        This bit indicates whether this TLB entry is valid and may be used for translation. The Valid bit for a given entry can be set or cleared with a tlbwe instruction; alternatively, the Valid bit for an entry may be cleared by a tlbilx or tlbivax instruction or by a MMUCSR0 TLB invalidate all.

IND     Indirect
        This bit distinguishes between an indirect TLB entry that points to a Page Table (IND=1) and a direct TLB entry that can be used directly to translate a virtual address (IND=0). If a TLB array does not support this bit (TLBnCFG[IND] = 0), the implied IND value is 0. For the tlbsx instruction, MAS6[SIND] provides the direct/indirect specification that must match the value of IND. For the instructions tlbilx with T=3 and tlbivax with EA[61]=0, MAS6[SIND] provides the direct/indirect specification that is compared to the value of IND. See Section 6.7.4.

Page Identification Field for indirect entry

Name    Description

SPSIZE  Sub-Page Size (IND=1)
        SPSIZE is a 5-bit field that specifies the minimum page size that can be specified by each Page Table Entry in the Page Table that is pointed to by the indirect TLB entry. This minimum page size is 2^SPSIZE KB and must be at least 4 KB. Thus SPSIZE must be at least 2. Valid values are specified by EPTCFG[SPS2 SPS1 SPS0]. See Section 6.7.4.

Translation Field

Name    Description

RPN     Real Page Number (up to 54 bits)
        For a direct TLB entry, bits 0:n-1 of the RPN field are used to replace bits 0:n-1 of the effective address to produce the real address for the storage access (where n = 64 - log2(page size in bytes) and page size is specified by the SIZE field of the TLB entry). See Section 6.7.3 for a requirement on unused low-order RPN bits (i.e. bits n:53) being 0.
        For an indirect TLB entry, bits 0:m-1 of the RPN field followed by 64-m 0s are the real address of the page table pointed to by the indirect TLB entry, where m = 61 - (SIZE - SPSIZE). RPN bits m:53 must be zero. See Section 6.7.4.
        Note: Bits X:Y of the RPN field are implemented, where X ≥ 0 and Y ≤ 53. X = 64 - MMUCFG[RASIZE]. Y is the larger of the following applicable values:
        - p - 1 where p = 64 - log2(smallest page size in bytes) and smallest page size is the smallest page size supported by the implementation as specified by TLB array's TLBnCFG or TLBnPS.
        - 52 if the Embedded.Page Table category is supported and a page table size of 2 KB is supported (EPTCFG[PSn] - EPTCFG[SPSn] = 8 for some value of n).
        The number of bits implemented for EPN is not required to be the same number of bits as are implemented for RPN. Unimplemented RPN bits are treated as if they contain 0s.

Storage Control Bits (see Section 6.8.3 on page 966)

Name    Description

W       Write-Through Required
        This bit indicates whether the page is Write-Through Required. See Section 1.6.1 of Book II.
        0   This page is not Write-Through Required storage.
        1   This page is Write-Through Required storage.

I       Caching Inhibited
        This bit indicates whether the page is Caching Inhibited. See Section 1.6.2 of Book II.
        0   This page is not Caching Inhibited storage.
        1   This page is Caching Inhibited storage.

M       Memory Coherence Required
        This bit indicates whether the page is Memory Coherence Required.

Access Control Fields for direct TLB entry

Name    Description
See Section 1.6.3 of UX User State Execute Enable (IND=0) Book II. See Section 6.7.6.1. 0 This page is not Memory Coherence 0 Instruction fetch and execution is not permit- Required storage. ted from this page while MSRPR=1 and will 1 This page is Memory Coherence cause an Execute Access Control exception Required storage. type Instruction Storage interrupt. G Guarded 1 Instruction fetch and execution is permitted This bit indicates whether the page is Guarded. from this page while MSRPR=1. See Section 1.6.4 of Book II and Section SX Supervisor State Execute Enable (IND=0) 6.8.1. See Section 6.7.6.1. 0 This page is not Guarded storage. 0 Instruction fetch and execution is not permit- 1 This page is Guarded storage. ted from this page while MSRPR=0 and will E Endian Mode cause an Execute Access Control exception This bit indicates whether the page is accessed type Instruction Storage interrupt. in Little-Endian or Big-Endian byte order. 1 Instruction fetch and execution is permitted See Section 1.10.1 of Book I and from this page while MSRPR=0. Section 1.6.5 of Book II. UW User State Write Enable (IND=0) 0 The page is accessed in Big-Endian See Section 6.7.6.2. byte order. 0 Store operations, including dcba, dcbz, and 1 The page is accessed in Little-Endian dcbzep are not permitted to this page when byte order. MSRPR=1 and will cause a Write Access U0:U3 User-Definable Storage Control Control exception. A Write Access Control Attributes See Section 6.8.2. exception will cause a Data Storage inter- Specifies implementation-dependent and sys- rupt. tem-dependent storage control attributes for 1 Store operations, including dcba, dcbz, and the page associated with the TLB entry. The dcbzep are permitted to this page when existence of these bits is implementation- MSRPR=1. dependent. SW Supervisor State Write Enable (IND=0) VLE Variable Length Encoding See Section 6.7.6.2. 
This bit specifies whether a page 0 Store operations, including dcba, dcbi, which contains instructions is to be dcbz, and dcbzep are not permitted to this decoded as VLE instructions (see page when MSRPR=0. Store operations, Chapter 1 of Book VLE). See Section including dcbi, dcbz, and dcbzep, will 6.8.3 and Chapter 1 of Book VLE. cause a Write Access Control exception. A 0 Instructions fetched from the page Write Access Control exception will cause a are decoded and executed as Data Storage interrupt. non-VLE instructions. 1 Store operations, including dcba, dcbi, 1 Instructions fetched from the page dcbz, and dcbzep, are permitted to this are decoded and executed as page when MSRPR=0. VLE instructions. UR User State Read Enable (IND=0) ACM Alternate Coherency Mode See Section 6.7.6.3. This bit allows an implementation to employ 0 Load operations (including load-class Cache more than a single coherency method. This Management instructions) are not permitted allows participation in multiple coherency from this page when MSRPR=1 and will protocols. If the M attribute (Memory Coher- cause a Read Access Control exception. A ence Required) is not set for a page (M=0), Read Access Control exception will cause a the page has no coherency associated with it Data Storage interrupt. and the ACM attribute is ignored. If the M 1 Load operations (including load-class Cache attribute is set to 1 for a page (M=1), the Management instructions) are permitted ACM attribute is used to determine the from this page when MSRPR=1. coherence domain (or protocol) used. The coherency method used in Alternate Coher- ency Mode is implementation-dependent. Chapter 6. Storage Control 947 Version 2.06 SR Supervisor State Read Enable (IND=0) Programming Note See Section 6.7.6.3. 
Any TLB entry with IPROT = 0 is volatile and may 0 Load operations (including load-class Cache be evicted for the following reasons even though Management instructions) are not permitted software didn't explicitly remove or invalidate the from this page when MSRPR=0 and will entry. cause a Read Access Control exception. A Generous TLB invalidations (tlbivax and Read Access Control exception will cause a tlbilx) Data Storage interrupt. TLB updates due to Page Table translations 1 Load operations (including load-class Cache Management instructions) are permitted Hardware replacement algorithm on a tlbwe from this page when MSRPR=0. instruction if MMUCFGHES=1 and MAS0HES =1. Access Control Field for direct and indirect entries Name Description VF Virtualization Fault TLB entry with IND=0 TLB entry with IND=1 See Section 6.7.6.4 This 1-bit field specifies whether the TLB entry UX SPSIZE0 is used by the hypervisor to virtualize data SX SPSIZE1 accesses, e.g. accesses to memory-mapped UW SPSIZE2 I/O. A translation of the operand address of a SW SPSIZE3 Load, Store, or Cache Management instruc- tion that uses a TLB entry with the Virtualiza- UR SPSIZE4 tion Fault field equal to 1 causes a SR RPN52 Virtualization Fault exception type Data Stor- age interrupt regardless of the settings of the Figure 19. Overlaid TLB Field Example permission bits. The interrupt is always directed to hypervisor state regardless of the setting of EPCRDSIGS. 0 A Load, Store, or Cache Management 6.7.2 Virtual Address Spaces access to this page does not cause a Virtual- ization Fault exception. There are two address spaces and their use is deter- 1 A Load, Store, or Cache Management mined by the operating system. Some operating sys- access to this page causes a Virtualization tems use one address space for user programs and Fault exception. one for certain supervisor software such as interrupt handlers. 
There are two bits in the Machine State Reg- ister, the Instruction Address Space bit (IS) and the TLB Management Field Data Address Space bit (DS), that control which Name Description address space instruction and data storage accesses, IPROT Invalidation Protection respectively, are performed in, and a bit in the TLB A TLB entry with this bit equal to 1 is protected entry (TS) that specifies which address space that TLB from all TLB invalidation mechanisms entry is associated with. except the explicit writing of a 0 to the V bit. See Section 6.11.4.3. IPROT is imple- Programming Note mented only for TLB entries in TLB arrays Because MSRIS and MSRDS are set to 0 by the where TLBnCFGIPROT is indicated. If IPROT hardware on interrupt, the Operating System soft- = 1, the TLB entry is protected from invali- ware that handles interrupts should be designed to date operations due to any of the following. run with AS=0. As a result, Operating System soft- execution of tlbivax ware that wishes to, for example, use one address execution of tlbilx space for user and the other for supervisor should tlbivax invalidations from use AS=0 for supervisor and AS=1 for user. another thread tlbilx invalidations from another thread when the If the Embedded.Hypervisor category is supported, the TLB is shared with that thread above two address spaces exist for each logical parti- TLB invalidate all operations tion and for both the guest and non-guest states within This bit is a hypervisor resource. each logical partition. The Logical Partition ID Register identifies the partition and a field in the TLB entry (TLPID) specifies which partition that TLB entry is associated with. The Guest State (GS) bit in the Machine State Register identifies the guest state or 948 Power ISATM Book III-E Version 2.06 non-guest state and a bit in the TLB entry (TGS) speci- Guest State bit. 
The Logical Partition ID is provided by fies which of these states that TLB entry is associated the contents of LPIDR and the Guest State bit is pro- with. vided by the MSRGS. For instruction fetches, the address space identifier is provided by MSRIS and the Load, Store, Cache Management, and Branch instruc- process identifier is provided by the contents of the tions and next-sequential-instruction fetches produce a Process ID Register. For data storage accesses, the 64-bit effective address. A one-bit address space iden- address space identifier is provided by the MSRDS and tifier and a process identifier are prepended to the the process identifier is provided by the contents of the effective address to form the virtual address. If the Process ID Register. Embedded.Hypervisor category is supported, this address is also prepended by a Logical Partition ID and MSRGS MSRDS for data storage accesses MSRIS for instruction fetch LPIDR PID 64-bit Effective Address LPID PID GS Effective Page Address Offset AS 0 n­1 n 63 GS || LPID included if Embedded.Hypervisor category is supported Virtual Address TLB multiple-entry RPN0:53 Real Page Number Offset 0 n­1 n 63 NOTE: n = 64­log2(page size) 64-bit Real Address Figure 20. Effective-to-Virtual-to-Real TLB Address Translation Flow 6.7.3 TLB Address Translation respectively, of each TLB entry. If the Embed- ded.Hypervisor category is supported, the Logical Par- A program references memory by using the effective tition ID and the Guest State bit are also compared to address computed by the hardware when it executes a the Translation Logical Partition ID (TLPID) and Trans- Load, Store, Cache Management, or Branch instruc- lation Guest State (TGS) of each TLB entry. Figure 21 tion, and when it fetches the next instruction. A virtual illustrates the criteria for a virtual address to match a address is formed from the effective address as specific TLB entry for a direct TLB entry (IND = 0). 
See described in Section 6.7.2 and the virtual address is Section 6.7.4 for details on Page Table translation using translated to a real address according to the proce- an indirect TLB entry. dures described in this section. The storage subsystem The virtual address of a storage access matches a uses the real address for the access. All storage direct TLB entry if the first four following conditions are access effective addresses are translated to real true, and, additionally, if the Embedded.Page Table addresses using the TLB mechanism. See Figure 20. category is supported, the fifth condition is true, and, The virtual address is used to locate the associated additionally, if the Embedded.Hypervisor category is entry in the TLB. The address space identifier, the pro- supported, the last two conditions are true. cess identifier, and the effective address of the storage The Valid bit of the TLB entry is 1. access are compared to the Translation Address The value of the address specifier for the storage Space bit (TS), the Translation ID field (TID), and the access (MSRIS for instruction fetches, MSRDS for value in the Effective Page Number field (EPN), Chapter 6. Storage Control 949 Version 2.06 data storage accesses) is equal to the value in the TS bit of the TLB entry. The value of the process identifier in the PID regis- Table 3: Page Size and Effective Address to TLB EPN ter is equal to the value in the TID field of the TLB Comparison for MAV = 2.0 entry or the value of the TID field of the TLB entry Page Size EA to EPN Comparison SIZE is equal to 0. (2SIZEKB) (bits 0:53­SIZE) 0b00000 1KB EPN0:53 =? EA0:53 The contents of bits 0:n­1 of the effective address 0b00001 2KB EPN0:52 =? EA0:52 of the storage access are equal to the value of bits 0b00010 4KB EPN0:51 =? EA0:51 0:n-1 of the EPN field of the TLB entry (where 0b00011 8KB EPN0:50 =? EA0:50 n=64-log2(page size in bytes) and page size is 0b00100 16KB EPN0:49 =? 
EA0:49 specified by the value of the SIZE field of the TLB 0b00101 32KB EPN0:48 =? EA0:48 entry). See Table 2 and Table 3. 0b00110 64KB EPN0:47 =? EA0:47 One of the following conditions is true. 0b00111 128KB EPN0:46 =? EA0:46 The TLB array supports the IND bit 0b01000 256KB EPN0:45 =? EA0:45 (TLBnCFGIND = 1) and the IND bit of the TLB 0b01001 512KB EPN0:44 =? EA0:44 entry is equal to 0. 0b01010 1MB EPN0:43 =? EA0:43 The TLB array does not support the IND bit 0b01011 2MB EPN0:42 =? EA0:42 (TLBnCFGIND = 0). 0b01100 4MB EPN0:41 =? EA0:41 Either the value of the logical partition identifier in 0b01101 8MB EPN0:40 =? EA0:40 LPIDR is equal to the value of the TLPID field of 0b01110 16MB EPN0:39 =? EA0:39 the TLB entry, or the value of the TLPID field of the 0b01111 32MB EPN0:38 =? EA0:38 TLB entry is equal to 0. 0b10000 64MB EPN0:37 =? EA0:37 The value of the guest state bit (MSRGS) is equal 0b10001 128MB EPN0:36 =? EA0:36 to the value of the TGS bit of the TLB entry. 0b10010 256MB EPN0:35 =? EA0:35 0b10011 512MB EPN0:34 =? EA0:34 0b10100 1GB EPN0:33 =? EA0:33 0b10101 2GB EPN0:32 =? EA0:32 Table 2: Page Size and Effective Address to TLB EPN 0b10110 4GB EPN0:31 =? EA0:31 Comparison for MAV = 1.0 0b10111 8GB EPN0:30 =? EA0:30 Page Size EA to EPN Comparison SIZE 0b11000 16GB EPN0:29 =? EA0:29 (4SIZEKB) (bits 0:53­2×SIZE) 0b11001 32GB EPN0:28 =? EA0:28 0b0000 1 KB EPN0:53 =? EA0:53 0b11010 64GB EPN0:27 =? EA0:27 0b0001 4KB EPN0:51 =? EA0:51 0b11011 128GB EPN0:26 =? EA0:26 0b0010 16KB EPN0:49 =? EA0:49 0b11100 256GB EPN0:25 =? EA0:25 0b0011 64KB EPN0:47 =? EA0:47 0b11101 512GB EPN0:24 =? EA0:24 0b0100 256KB EPN0:45 =? EA0:45 0b11110 1TB EPN0:23 =? EA0:23 0b0101 1MB EPN0:43 =? EA0:43 0b11111 2TB EPN0:22 =? EA0:22 0b0110 4MB EPN0:41 =? EA0:41 0b0111 16MB EPN0:39 =? EA0:39 Programming Note 0b1000 64MB EPN0:37 =? EA0:37 0b1001 256MB EPN0:35 =? EA0:35 An implementation need not support all page sizes. 0b1010 1GB EPN0:33 =? EA0:33 0b1011 4GB EPN0:31 =? 
EA0:31 If the virtual address of the storage access matches a 0b1100 16GB EPN0:29 =? EA0:29 TLB entry in accordance with the selection criteria 0b1101 64GB EPN0:27 =? EA0:27 specified in the preceding paragraph, the value of the 0b1110 256GB EPN0:25 =? EA0:25 Real Page Number field (RPN) of the matching TLB 0b1111 1TB EPN0:23 =? EA0:23 entry provides the real page number portion of the real address. Let n=64­log2(page size in bytes) where page size is specified by the SIZE field of the TLB entry. Bits n:63 of the effective address are appended to bits 0:n­1 of the 54-bit RPN field of the matching TLB entry to produce the 64-bit real address (i.e. RA = RPN0:n­1 || EAn:63) that is presented to main stor- age to perform the storage access. The page size is determined by the value of the SIZE field of the match- ing TLB entry. See Table 4 and Table 5. Depending on the page size, certain RPN bits of the matching TLB entry must be zero as shown in Table 4 and Table 5. Otherwise, it is implementation-dependent whether the 950 Power ISATM Book III-E Version 2.06 address translation is performed as if these RPN bits are 0 or as if the corresponding RA bits are undefined Table 5: Real Address Generation for MAV = 2.0 values, or either an Instruction Storage exception (for Page RPN Bits an instruction fetch) or Data Storage exception (for a Size Required data access) occurs. If the specified page size is not SIZE Real Address (4SIZE to be Equal supported by the implementation's TLB array, it is KB) to 0 implementation-dependent whether the address trans- lation is performed as if the page size was a smaller 0b00000 1KB none RPN0:53 || EA54:63 size or either an Instruction Storage exception (for an 0b00001 2KB RPN53:53=0 RPN0:52 || EA53:63 instruction fetch) or Data Storage exception (for a data 0b00010 4KB RPN52:53=0 RPN0:51 || EA52:63 access) occurs. 
0b00011 8KB RPN51:53=0 RPN0:50 || EA51:63 0b00100 16KB RPN50:53=0 RPN0:49 || EA50:63 0b00101 32KB RPN49:53=0 RPN0:48 || EA49:63 Table 4: Real Address Generation for MAV = 1.0 0b00110 64KB RPN48:53=0 RPN0:47 || EA48:63 Page RPN Bits 0b00111 128KB RPN47:53=0 RPN0:46 || EA47:63 Size Required 0b01000 256KB RPN46:53=0 RPN0:45 || EA46:63 SIZE Real Address 0b01001 512KB RPN45:53=0 RPN0:44 || EA45:63 (4SIZE to be Equal KB) to 0 0b01010 1MB RPN44:53=0 RPN0:43 || EA44:63 0b01011 2MB RPN43:53=0 RPN0:42 || EA43:63 0b0000 1 KB none RPN0:53 || EA54:63 0b01100 4MB RPN42:53=0 RPN0:41 || EA42:63 0b0001 4KB RPN52:53=0 RPN0:51 || EA52:63 0b01101 8MB RPN41:53=0 RPN0:40 || EA41:63 0b0010 16KB RPN50:53=0 RPN0:49 || EA50:63 0b01110 16MB RPN40:53=0 RPN0:39 || EA40:63 0b0011 64KB RPN48:53=0 RPN0:47 || EA48:63 0b01111 32MB RPN39:53=0 RPN0:38 || EA39:63 0b0100 256KB RPN46:53=0 RPN0:45 || EA46:63 0b10000 64MB RPN38:53=0 RPN0:37 || EA38:63 0b0101 1MB RPN44:53=0 RPN0:43 || EA44:63 0b10001 128MB RPN37:53=0 RPN0:36 || EA37:63 0b0110 4MB RPN42:53=0 RPN0:41 || EA42:63 0b10010 256MB RPN36:53=0 RPN0:35 || EA36:63 0b0111 16MB RPN40:53=0 RPN0:39 || EA40:63 0b10011 512MB RPN35:53=0 RPN0:34 || EA35:63 0b1000 64MB RPN38:53=0 RPN0:37 || EA38:63 0b10100 1GB RPN34:53=0 RPN0:33 || EA34:63 0b1001 256MB RPN36:53=0 RPN0:35 || EA36:63 0b10101 2GB RPN33:53=0 RPN0:32 || EA33:63 0b1010 1GB RPN34:53=0 RPN0:33 || EA34:63 0b10110 4GB RPN32:53=0 RPN0:31 || EA32:63 0b1011 4GB RPN32:53=0 RPN0:31 || EA32:63 0b10111 8GB RPN31:53=0 RPN0:30 || EA31:63 0b1100 16GB RPN30:53=0 RPN0:29 || EA30:63 0b11000 16GB RPN30:53=0 RPN0:29 || EA30:63 0b1101 64GB RPN28:53=0 RPN0:27 || EA28:63 0b11001 32GB RPN29:53=0 RPN0:28 || EA29:63 0b1110 256GB RPN26:53=0 RPN0:25 || EA26:63 0b11010 64GB RPN28:53=0 RPN0:27 || EA28:63 0b1111 1TB RPN24:53=0 RPN0:23 || EA24:63 0b11011 128GB RPN27:53=0 RPN0:26 || EA27:63 0b11100 256GB RPN26:53=0 RPN0:25 || EA26:63 0b11101 512GB RPN25:53=0 RPN0:24 || EA25:63 0b11110 1TB RPN24:53=0 RPN0:23 || EA24:63 0b11111 2TB 
RPN23:53=0 RPN0:22 || EA23:63 A TLB Miss exception occurs if there is no valid match- ing direct entry in the TLB for the page specified by the virtual address (Instruction or Data TLB Error interrupt) and, if the Embedded.Page Table category is sup- ported, there is no matching indirect entry (see Section 6.7.4). A TLB Miss exception for an instruction fetch will result in an Instruction TLB Miss exception type Instruc- tion TLB Error interrupt. A TLB Miss exception for a data storage access will result in a Data TLB Miss exception type Data TLB Error interrupt. Although the possibility exists to place multiple direct and/or multiple indirect entries into the TLB that match a specific virtual address, assuming a set-associative or fully-associa- tive organization, doing so is a programming error. Either one of the matching entries is used or a Machine Check exception occurs if there are multiple matching direct entries or multiple matching indirect entries for an instruction or data access. The rest of the matching TLB entry provides the access control bits (UX, SX, UW, SW, UR, SR, VF), and stor- Chapter 6. Storage Control 951 Version 2.06 age control attributes (ACM [implementation-depen- dent], VLE , U0, U1, U2, U3, W, I, M, G, E) for the storage access. The access control bits and stor- age control attribute bits specify whether or not the access is allowed and how the access is to be per- formed. See Sections 6.7.6 and 6.11.4. included if Category E.HV supported TLBentry[i][TGS] =? MSRGS partition LPIDRk:63 page =? non-guest page TLBentry[i][TLPID]k:63 =0? TLBentry[i][V] TLB entry i matches TLBentry[i][TS] =? { AS Legend: MSRIS for instruction fetches, or AS Process IDn:63 private page =? MSRDS for data storage accesses EA effective address of storage access shared page GS Guest State TLBentry[i][TID]n:63 =0? LPID Logical Partition ID IND Indirect bit { TLBentry[i][EPN]0:N-1 contents of Process ID Register for =? 
EA0:N-1 Process ID instruction fetches and data included if Category storage accesses E.PT supported TLBentry[i][IND] =0? N-1 63 ­ log2(page size) n 64 ­ # of implemented PID/TID bits k 64-# of implemented LPID bits Figure 21. Address Translation: Virtual Address to direct TLB Entry Match Process 6.7.4 Page Table Address Trans- entry is 1, a Virtualization Fault exception occurs. If the PTE is a valid entry (V bit = 1), the PTE is used to lation [Category: Embedded.Page translate the address. The PTE includes the abbrevi- Table] ated RPN (ARPN), page size (PS), storage control (WIMGE), implementation-dependent bits, and storage A hardware Page Table is a variable-sized data struc- access control bits (BAP,R,C) that are used for the ture that specifies the mapping between virtual page access. If the Embedded.Page Table and Embed- numbers and real page numbers. There can be many ded.Hypervisor categories are both supported, the hardware Page Tables. Each Page Table is defined by Embedded.Hypervisor.LRAT category is supported. In an indirect TLB entry. An indirect TLB entry is an entry this case, the RPN from the PTE is treated as a Logical that has its IND bit equal to 1. Page Number (LPN) and the LPN is translated by the LRAT into an RPN. See Section 6.9. If there is more An indirect TLB entry matches the virtual address if all than one matching direct TLB entry or more than one fields match per Section 6.7.4 except for the IND bit matching indirect TLB entry, any one of the duplicate and the IND bit of the TLB entry is 1. If there is no entries may be used or Machine Check exception may matching direct TLB entry, but there is one and only occur. one matching indirect entry, the indirect entry is used to access a Page Table Entry (PTE) if the VF bit of the See Section 6.7.5 for the rules that software must fol- indirect TLB entry is 0. If the VF bit of this indirect TLB low when updating the Page Table. 
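The direct-entry matching rules and the real address formation RA = RPN_0:n-1 || EA_n:63 of Section 6.7.3 can be illustrated with a small software model. The following is a minimal sketch, not part of the architecture: the TlbEntry class, the function names, and the hv parameter are hypothetical, the EPN and RPN fields are assumed to be held as 54-bit integers, and none of the implementation-dependent cases (unsupported page sizes, nonzero low-order RPN bits, multiple matching entries) are modeled.

```python
from dataclasses import dataclass


def page_size_bytes(size: int, mav2: bool) -> int:
    """Page size selected by SIZE: 4^SIZE KB (MAV 1.0) or 2^SIZE KB (MAV 2.0)."""
    kilobytes = (2 ** size) if mav2 else (4 ** size)
    return kilobytes * 1024


def compare_width(size: int, mav2: bool) -> int:
    """n = 64 - log2(page size in bytes): number of EA/EPN bits compared."""
    return 64 - (page_size_bytes(size, mav2).bit_length() - 1)


@dataclass
class TlbEntry:
    """Hypothetical software model of a direct TLB entry (not an architected layout)."""
    v: int
    ts: int
    tid: int
    epn: int        # 54-bit field EPN_0:53
    rpn: int        # 54-bit field RPN_0:53
    size: int
    ind: int = 0
    tlpid: int = 0
    tgs: int = 0


def matches(e: TlbEntry, ea: int, as_bit: int, pid: int, mav2: bool = True, hv=None) -> bool:
    """Apply the match conditions listed in Section 6.7.3 to one entry."""
    n = compare_width(e.size, mav2)
    ok = (e.v == 1
          and e.ts == as_bit                            # MSR_IS or MSR_DS
          and e.tid in (0, pid)                         # shared page or owner's PID
          and (ea >> (64 - n)) == (e.epn >> (54 - n))   # EA_0:n-1 =? EPN_0:n-1
          and e.ind == 0)                               # direct entry
    if hv is not None:  # assumed to be (LPIDR value, MSR_GS) when E.HV is supported
        lpid, gs = hv
        ok = ok and e.tlpid in (0, lpid) and e.tgs == gs
    return ok


def real_address(e: TlbEntry, ea: int, mav2: bool = True) -> int:
    """RA = RPN_0:n-1 || EA_n:63."""
    n = compare_width(e.size, mav2)
    offset_bits = 64 - n
    return ((e.rpn >> (54 - n)) << offset_bits) | (ea & ((1 << offset_bits) - 1))


# Example: a shared 4KB page (MAV 2.0, SIZE = 2) mapping EA page 0x12345 to real page 0xABCDE.
entry = TlbEntry(v=1, ts=0, tid=0, epn=0x12345000 >> 10, rpn=0xABCDE000 >> 10, size=2)
if matches(entry, 0x12345678, as_bit=0, pid=7):
    print(hex(real_address(entry, 0x12345678)))
```

Holding EPN and RPN as full 54-bit values keeps the shifts identical to the bit ranges in Tables 2 through 5; a model of a real implementation would also have to account for unimplemented bits.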
Programming Note
Even when the Embedded.Hypervisor category is supported, a Page Table can optionally be treated as a guest supervisor resource due to the LRAT. If the Page Table is treated as a hypervisor resource, the Page Table must be placed in storage to which only the hypervisor has access. Moreover, the contents of the Page Table must be such that non-hypervisor software cannot modify storage that contains hypervisor programs or data. An LRAT identity mapping (LPN=RPN) can be used when the Page Tables are treated as hypervisor resources, especially if only one LRAT entry is provided. If the LRAT identity mapping converts LPNs into RPNs that extend beyond the memory given to the partition, the Page Table Entries still provide the hypervisor with a mechanism to limit a guest's accesses to memory assigned to the partition, assuming guest execution of TLB Management instructions is disabled.

Programming Note
If storage accesses are to scattered virtual pages, an Embedded Page Table could be sparsely used, and, in the worst case, there could be only one valid PTE in the Page Table. In this case it would be more efficient for software to directly load TLB entries rather than have both an indirect TLBE and a direct TLBE, which is loaded from the Page Table.

[Figure 22 shows the Effective Address (Effective Page Number followed by a p-bit Byte offset) and the matching indirect TLB entry (TLBE) with fields TGS (1 bit), IND (1 bit), RPN (54 bits), SIZE (5 bits), and SPSIZE (5 bits). EA bits are shifted right according to a decode of SPSIZE, ANDed with an EA mask decoded from SIZE and SPSIZE, and ORed with RPN bits to form the 64-bit real address of the 8-byte Page Table Entry (PTE). The PTE fields ARPN, WIMGE, R, SW0, implementation-dependent bits, C, PS, BAP, SW1, and V then supply ARPN_0:51-p, which is concatenated with 0x000 and the byte offset to form the 64-bit real address (a logical address if the LRAT is used). Logical address translation by the LRAT is performed if Category E.HV is supported and the TGS bit of the associated indirect TLB entry is 1. NOTE: p = log2(page size specified by PTE_PS).]

Figure 22. Page Table Translation

Figure 22 depicts the Page Table translation for a matching indirect TLB entry (TLBE). The Page Table Entry that is used to translate the Effective Address is selected by a real address formed from some combination of RPN bits from the TLBE and some EA bits. The low-order m bits of the RPN field in the indirect TLB entry must be zeros, where m is (SIZE - SPSIZE) - 7. SIZE minus SPSIZE must be greater than 7 (corresponding to a page table size of at least 2 KB; see below under "Page Table Size and Alignment"). The SIZE and SPSIZE fields of the TLBE determine which bits of the RPN and EA are used in the following manner.

1. EA_23:51 are shifted right q bits, according to a decode of SPSIZE, to produce a 29-bit result S. The value of q is (SPSIZE - 2). Bits shifted out of the rightmost bit position are lost.
2. A 21-bit EA mask is formed based on a decode of SIZE and SPSIZE. The EA mask is (29 - (SIZE - SPSIZE)) 0 bits followed by ((SIZE - SPSIZE) - 8) 1 bits.
3. The EA mask from step 2 is ANDed with the high-order 21 bits of the shifted EA result (S_0:20) from step 1 to form a 21-bit result.
4. RPN_32:52 from the indirect TLB entry is ORed with the 21-bit result from step 3 to form a 21-bit result R.
5. The real address of the PTE is formed as follows:
   RA = TLBE_RPN_0:31 || R || S_21:28 || 0b000

The doubleword addressed by the real address result from step 5 is the PTE used to translate the EA if the PTE is valid (Valid bit = 1). If the PTE is valid, PTE_PS must be greater than or equal to the SPSIZE of the associated indirect TLB entry and must be less than or equal to the SIZE of the associated indirect TLB entry.

The real address (RA) result is formed by concatenating 0x000 with the ARPN_0:51-p from the PTE and with the low-order p bits of EA, where p is equal to log2(page size specified by PTE_PS).

   RA = 0x000 || ARPN_0:51-p || EA_64-p:63

However, if an implementation supports a real address with only r bits, r<52, and either the Embedded.Hypervisor category is not supported or the TGS bit of the corresponding indirect TLB entry is 0, the high-order 52-r bits of PTE_ARPN are ignored and treated as 0s. If the Embedded.Hypervisor category is supported, an implementation supports a logical address with only q bits, q<52, and the TGS bit of the corresponding indirect TLB entry is 1, the high-order 52-q bits of PTE_ARPN are ignored and treated as 0s.

If the Embedded.Hypervisor category is supported and the TGS bit of the associated indirect TLB entry is 1, the RA formed from the PTE is treated as a logical real address and translated by the LRAT. If there is no matching entry in the LRAT, an LRAT Miss exception occurs. See Section 6.9. If an LRAT Error interrupt results from this exception, ESR_PT is set to 1.

If the Page Table Entry that is accessed is invalid (Valid bit = 0), a Page Table Fault exception occurs. An Execute, Read, or Write Access Control exception occurs if a valid PTE is found but the access is not allowed by the access control mechanism. These exceptions are types of Instruction Storage exception or Data Storage exception, depending on whether the effective address is for an instruction fetch or for a data access. See Section 7.6.4 and Section 7.6.5 for additional information about these and other interrupt types. For either of these interrupts caused by a Page Table Fault exception or Execute, Read, or Write Access Control exception due to PTE permissions, ESR_PT or GESR_PT is set to 1 (GESR_PT if the Embedded.Hypervisor.LRAT category is supported and the interrupt is directed to the guest; otherwise, ESR_PT).

Programming Note
If PTE_PS is greater than the SPSIZE of the associated indirect TLB entry, 2^(PS - SPSIZE) PTEs are needed for the virtual page to ensure there is no Page Table Fault exception for accesses to the page regardless of the location of the access within the page. If a Page Table Fault exception for some accesses to the page is acceptable, there is no requirement that all such PTEs for the page be valid.

Programming Note
The computation of the real address of the PTE can be understood as follows. (Some of the facts mentioned below, such as the fact that the minimum Page Table size is 2K, are covered later in the section.)
1. q is the number of EA bits above bit 52 that are part of the byte offset within the effective page. (The minimum size of a page that is mapped by a PTE is 4K, so EA_52:63 are always part of the byte offset, and SPSIZE must be at least 2.) S is the low-order 29-q bits of the EPN, prepended with q 0s.
2. The EA mask has a number of low-order 1 bits equal to the difference between log2(# PTEs) and log2(minimum # PTEs) = 8. (The log2 of the number of PTEs in a Page Table is SIZE - SPSIZE. The minimum Page Table size is 2K and PTE size = 8 bytes, so the minimum number of PTEs is 2^11 ÷ 2^3 = 2^8.) Call this number s; i.e., s = (SIZE - SPSIZE) - 8, and the log2 of the number of PTEs in the Page Table is s+8.
3. The result is the low-order s bits of the EPN that are immediately above the lowest-order 8 EPN bits (the lowest-order 8 bits are always used to select the PTE), prepended with 21-s 0s. (If s could be greater than (29-q)-8, the "EPN" bits included in the result could include 0 bits that were shifted in step 1. However, this would correspond to (SIZE - SPSIZE) - 8 > (31 - SPSIZE) - 8, which would imply SIZE > 31, which is impossible.)
4. R consists of the high-order 21-s bits of RPN_32:52 followed by the low-order s bits of the EPN that are immediately above the lowest-order 8 EPN bits.
5. The real address of the PTE thus consists of the high-order 53-s bits of the RPN from the TLB entry, followed by the low-order s+8 bits of the EPN (recall that s+8 is the log2 of the number of PTEs in the Page Table), followed by 3 0s.

Storage Control Attributes for the Page Table

A Page Table must be located in storage that is Big-Endian, Memory Coherence Required, not Caching Inhibited and not Guarded. If the translation of a virtual address matches an indirect TLB entry that has its storage control attribute E bit equal to 1, M bit equal to 0, I bit equal to 1, or G bit equal to 1, it is implementation-dependent whether the translation is performed as if valid values were substituted for the invalid values or as if the entry doesn't match, or either an Instruction Storage exception (for an instruction fetch) or Data Storage exception (for a data access) occurs. The Page Table is allowed to be located in storage that is Write Through Required or Not Write Through Required. However, the same W value must be used for a single thread's indirect and direct TLB entries that map the same PTE. Implementations may require specific values for ACM and U0:U3.

Ordering of Implicit Accesses to the Page Table

The definition of "performed" given in Books II and III-E applies also to the implicit accesses to the Page Table by the thread in performing address translation. Accesses for performing address translation are considered to be loads in this respect. These implicit accesses are ordered by the sync instruction with L=0 as described below.

The Synchronize instruction is described in Section 4.4.3 of Book II, but only at the level required by an application programmer (sync with L=0 or L=1). This section describes properties of the instruction that are relevant only to operating system and hypervisor software programmers. The sync instruction with L=0 (sync) has the following additional properties.
- The sync instruction provides an ordering function for all stores to the Page Table caused by Store instructions preceding the sync instruction with respect to lookups of the Page Table that are performed, by the thread executing the sync instruction, after the sync instruction completes. Executing a sync instruction ensures that all such stores will be performed, with respect to the thread executing the sync instruction, before any implicit accesses to the affected Page Table Entries, by such Page Table lookups, are performed with respect to that thread.
- In conjunction with the tlbivax and tlbsync instructions, the sync instruction provides an ordering function for TLB invalidations and related storage accesses on other threads as described in the tlbsync instruction description on page 1010.

Programming Note
For instructions following a sync instruction, the memory barrier need not order implicit storage accesses for purposes of address translation.

Page Table Entry

Each Page Table Entry (PTE) maps a VPN to an RPN. If the corresponding indirect TLB entry has an LPID or PID value of zero, multiple VPNs are mapped by a single PTE in a Page Table pointed to by such an indirect TLB entry. Figure 23 shows the layout of a PTE.

[Figure 23 shows the 64-bit PTE with the fields ARPN, WIMGE, R, SW0, implementation-dependent bits, C, PS, BAP, SW1, and V; the bit positions marked in the figure are 0, 40, 45, 46, 50, 51, 52, 56, 62, and 63.]

Bit(s)  Name  Description

address with only q bits, q<52, and the TGS bit of the corresponding indirect TLB entry is 1, the high-order 52-q bits of PTE_ARPN are ignored and treated as 0s for address translation.

The Base Access Permission (BAP) bits are used together with the Reference (R) and Change (C) bits to derive the storage access control bits that are used for the access. Table 6 shows how the storage access control bits are derived from the BAP, R, and C bits of the Page Table Entry.
0:39 ARPN Abbreviated Real Page Number 40:44 WIMGE Storage control attributes Table 6: Storage Access Control Bits Derived from a 45 R Reference bit Page Table Entry 46:49 impl.- Implementation-dependent Derived Storage Access dep. These bits can be used to support Page Table Values Control User-Definable Storage Control UX BAP0 & R Attributes, ACM and VLE. These SX BAP1 & R bits are used in any combination of UW BAP2 & R & C the following or subsets of the fol- SW BAP3 & R & C lowing: UR BAP4 & R 46:49 - User-Definable Stor- SR BAP5 & R age Control Attributes (U0:U3) Programming Note 48 - ACM 49 - VLE Unlike many architectures, the R and C bits in a 50 SW0 Available for software use Page Table entry are not updated by hardware. 51 C Change bit 52:55 PS Page Size (real) Programming Note 56:61 BAP Base Access Permission bits: The page size specified by PTEPS must be consis- 0: Base UX tent with the page sizes supported by a direct TLB 1: Base SX entry of a TLB array that can be loaded from the 2: Base UW Page Table. 3: Base SW 4: Base UR An implementation need not support all page sizes. 5: Base SR 62 SW1 Available for software use Page Table Size and Alignment 63 V Entry valid (V=1) or invalid (V=0) A Page Table's size is 8 × 2(SIZE ­ SPSIZE). where SIZE Figure 23. Page Table Entry is the page size specified by the SIZE field of the indi- rect TLB entry used to access the Page Table and The Page Size (PS) field encodes page sizes using the SPSIZE is the sub-page size specified by the SPSIZE same encodes as the TLBSIZE, except that 0b0 is field of this indirect TLB entry. Page Table sizes smaller prepended to the 4-bit PS value (0 || PS) to form the than 2 KB are not allowed and SPSIZE must be greater equivalent 5-bit encode and PS must specify a page than or equal to 2. This implies that the Page Table size size of 4 KB or larger. See Table 3 on page 951. s is 2 KB s 4 GB. 
The Page Table is aligned on a The Abbreviated Real Page Number (ARPN) field con- boundary that is a multiple of its size. tains the least significant 40 bits of the RPN. The full RPN associated with the PTE is formed from the ARPN TLB Update prepended with 0x000, i.e. RPN = 0x000 || ARPN. As a result of a Page Table translation, a correspond- Depending on the page size, certain ARPN bits must ing direct TLB entry is created if no exception occurs, is be zero. Specifically, if p>12, ARPN52-p:39 must be optionally created if certain exceptions occur, and is not zeros, where p = log2(page size specified by PTEPS). If created if certain other exceptions occur. an implementation supports a real address with only r bits, r<52, and either the Embedded.Hypervisor cate- If no exception occurs, a direct TLB entry is written to gory is not supported or the TGS bit of the correspond- create an entry corresponding to the virtual address ing indirect TLB entry is 0, the high-order 52-r bits of and the contents of the PTE that was used to translate PTEARPN are ignored and treated as 0s for address the virtual address. In this case, hardware selects the translation. If the Embedded.Hypervisor category is TLB array and TLB entry to be written. Any TLB array supported, an implementation supports a logical Chapter 6. Storage Control 957 Version 2.06 that meets all the following criteria can be selected by the hardware. The TLB array supports the page size specified by PTEPS. The TLB array can be loaded from the Page Table (TLBnCFGPT = 1). If no TLB array can be selected based on these criteria, then a TLB Ineligible exception occurs. Hardware also selects the entry within the TLB array based on some implementation-dependent algorithm. However, a valid TLB entry with IPROT = 1 must not be overwritten. If all TLB entries that can be used for a specific virtual page have IPROT = 1, then a TLB Ineligible exception occurs. 
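As an illustration only, the PTE real address computation described earlier in this section (steps 2 to 5, together with the Programming Note that explains them) can be modeled with ordinary integer arithmetic. This is a minimal sketch, not a definition of the architecture. The function names, the representation of RPN[0:52] as a 53-bit integer (the Page Table base address shifted right by 11), and the example values are assumptions made for the sketch. Step 1 is not restated in this part of the text, so the sketch assumes, based on item 1 of the Programming Note, that S is EA[23:51] shifted right by q = SPSIZE - 2 bits with 0s shifted in.

```python
MASK29 = (1 << 29) - 1
MASK21 = (1 << 21) - 1


def page_table_bytes(size, spsize):
    """Page Table size in bytes: 8 x 2^(SIZE - SPSIZE)."""
    return 8 << (size - spsize)


def pte_real_address(ea, tlbe_rpn, size, spsize):
    """Return the real address of the PTE selected by effective address ea.

    tlbe_rpn models RPN[0:52] of the indirect TLB entry as a 53-bit integer
    (assumption of this sketch: Page Table base address >> 11).
    """
    d = size - spsize                      # log2(number of PTEs in the table)
    if spsize < 2 or size > 31 or d < 8:
        raise ValueError("need SPSIZE >= 2, SIZE <= 31, SIZE - SPSIZE >= 8")
    # Step 1 (assumed from the Programming Note): S = EA[23:51] shifted right
    # by q = SPSIZE - 2 bits, 0s shifted in, giving the 29 bits S[0:28].
    s = ((ea >> 12) & MASK29) >> (spsize - 2)
    # Step 2: 21-bit EA mask with (SIZE - SPSIZE) - 8 low-order 1 bits.
    ea_mask = (1 << (d - 8)) - 1
    # Step 3: AND the mask with S[0:20], the high-order 21 bits of S.
    masked = (s >> 8) & ea_mask
    # Step 4: OR with RPN[32:52], the low-order 21 bits of the 53-bit RPN.
    r = (tlbe_rpn & MASK21) | masked
    # Step 5: RA = RPN[0:31] || R || S[21:28] || 0b000.
    return ((tlbe_rpn >> 21) << 32) | (r << 11) | ((s & 0xFF) << 3)


# Hypothetical example: SIZE=12 and SPSIZE=2 give 2^10 PTEs (an 8 KB table).
base = 0x12346000                          # aligned to the table size
ea = 0x10035ABC
ra = pte_real_address(ea, base >> 11, size=12, spsize=2)
print(hex(page_table_bytes(12, 2)), hex(ra))   # prints: 0x2000 0x123461a8
```

Because the Page Table is aligned on a multiple of its size, the result is the same as adding 8 times the PTE index (the low-order SIZE - SPSIZE bits of the sub-page number) to the table base, which is a convenient cross-check when reading the steps.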
In the absence of a higher priority exception, an Instruction Storage or Data Storage interrupt occurs, depending on whether the Page Table translation was due to an instruction fetch or a data access, and ESR_TLBI is set to 1. An Instruction Storage or Data Storage interrupt resulting from a TLB Ineligible exception is always directed to hypervisor state.

It is implementation-dependent whether a TLB entry is written as a result of a Page Table translation if a Page Table Fault exception occurs, but, if written, the valid bit of the TLB entry is set to 0. It is implementation-dependent whether a TLB entry is written as a result of a Page Table translation if an Execute, Read, or Write Access Control exception occurs. A TLB entry is not written as a result of a Page Table translation if an LRAT Miss exception occurs or a TLB Ineligible exception occurs.

If a TLB entry is written, the entry is written based on the values shown in Table 7.

Table 7: TLB Update after Page Table Translation

    TLB field                Load Value
    EPN[0:p-1]               EA[0:p-1], where p = 64 - log2(page size in bytes)
                             and page size is specified by PTE_PS. Any low-order
                             EPN bits in the TLB entry that correspond to byte
                             offsets within the page are undefined.
    TS                       TS from indirect TLB entry
    SIZE                     PTE_PS
    TLPID [Category: E.HV]   TLPID from indirect TLB entry
    TGS [Category: E.HV]     TGS from indirect TLB entry
    TID                      TID from indirect TLB entry
    V                        PTE_V
    IND                      0
    RPN                      if E.HV.LRAT not supported, then
                               RPN = 0x000 || PTE_ARPN || 0b00
                             else
                               LPN = 0x000 || PTE_ARPN || 0b00
                               RPN = result of LRAT translation of LPN & PTE_PS
    WIMGE                    PTE_WIMGE
    U0:U3, ACM, VLE          PTE_impl.-dep. (which of the implementation-dependent
                             TLB bits are loaded and which of the PTE[46:49] bits
                             is used to load each TLB bit are implementation-
                             dependent)
    UR, UW, UX, SR, SW, SX   Derived Storage Access Control from PTE_BAP, PTE_R,
                             and PTE_C. See Table 6.
    VF                       0
    IPROT                    0

If implementations write TLB entries for out-of-order Page Table translations, a mechanism for disabling such TLB updates must be provided by the implementation in order for software to preload a TLB array without the possibility of creating multiple direct entries for the same virtual address.

Programming Note
As a hardware simplification the architecture allows a TLB entry to be written with the valid bit set to 0 if a Page Table Fault exception occurs. A replacement of a valid TLB entry by an invalid entry typically has no significant performance impact since software often swaps in the virtual page and creates a valid PTE for the page.

Programming Note
Only software creates indirect TLB entries, but both software and hardware create direct TLB entries. Unless a TLB Write Conditional instruction is used, software must avoid creating a direct TLB entry for a VPN that may also be simultaneously translated via a Page Table by a thread sharing the TLB. Otherwise multiple direct TLB entries could be created. If software is preloading a TLB with a direct TLB entry and there is already an indirect TLB entry that could be used to translate the same VPN, software must ensure that no program on any thread sharing the TLB is accessing the VPN. Otherwise multiple direct TLB entries could be created. If the Embedded.TLB Write Conditional category is supported, a TLB Write Conditional instruction can be used to create a direct TLB entry for the same VPN that may also be mapped by an existing indirect entry and Page Table Entry, assuming the page sizes specified by the TLB Write Conditional and the PTE are identical.

6.7.5 Page Table Update Synchronization Requirements [Category: Embedded.Page Table]

This section describes rules that software must follow when updating the Page Table. Otherwise, TLB entries for outdated PTEs may remain valid. This section includes suggested sequences of operations for some representative cases.

In the sequences of operations shown in the following subsections, any alteration of a Page Table Entry (PTE) that corresponds to a single line in the sequence is assumed to be done using a Store instruction for which the access is atomic. Appropriate modifications must be made to these sequences if this assumption is not satisfied (e.g., if a store doubleword operation is done using two Store Word instructions).

As described in Section 6.5, stores are not performed out-of-order. Moreover, address translations associated with instructions preceding the corresponding Store instructions are not performed again after the stores have been performed. (These address translations must have been performed before the store was determined to be required by the sequential execution model, because they might have caused an exception.) As a result, an update to a PTE need not be preceded by a context synchronizing operation.

All of the sequences require a context synchronizing operation after the sequence if the new contents of the PTE are to be used for address translations associated with subsequent instructions.

As noted in the description of the Synchronize instruction in Section 4.4.3 of Book II, address translation associated with instructions which occur in program order subsequent to the Synchronize may actually be performed prior to the completion of the Synchronize. To ensure that these instructions and data which may have been speculatively fetched are discarded, a context synchronizing operation is required.

Programming Note
In many cases this context synchronization will occur naturally; for example, if the sequence is executed within an interrupt handler the rfid or hrfid instruction that returns from the interrupt handler may provide the required context synchronization.

Page Table Entries must not be changed in a manner that causes an implicit branch.

6.7.5.1 Page Table Updates

TLBs are non-coherent caches of the Page Table. TLB entries must be invalidated explicitly with one of the methods described in Section 6.11.4.3.

Unsynchronized lookups in the Page Table continue even while it is being modified. Any thread, including a thread on which software is modifying the Page Table, may look in the Page Table at any time in an attempt to translate a virtual address. When modifying a PTE, software must ensure that the PTE's Valid bit is 0 if the PTE is inconsistent (e.g., if the BAP field is not correct for the current ARPN field).

The sequences of operations shown in the following subsections assume a multi-threaded processor environment. In a system consisting of only a single-threaded processor the tlbsync must be omitted, and the mbar that separates the tlbivax from the tlbsync can be omitted. In a multi-threaded processor environment, when tlbilx [Category: E.HV] is used instead of tlbivax in a Page Table update, the synchronization requirements are the same as when tlbivax is used in a system consisting of only a single-threaded processor.

Programming Note
For all of the sequences shown in the following subsections, if it is necessary to communicate completion of the sequence to software running on another thread, the sync instruction at the end of the sequence should be followed by a Store instruction that stores a chosen value to some chosen storage location X. The memory barrier created by the sync instruction ensures that if a Load instruction executed by another thread returns the chosen value from location X, the sequence's stores to the Page Table have been performed with respect to that other thread. The Load instruction that returns the chosen value should be followed by a context synchronizing instruction in order to ensure that all instructions following the context synchronizing instruction will be fetched and executed using the values stored by the sequence (or values stored subsequently). (These instructions may have been fetched or executed out-of-order using the old contents of the PTE.)

This Note assumes that the Page Table and location X are in storage that is Memory Coherence Required.

6.7.5.1.1 Adding a Page Table Entry

This is the simplest Page Table case. The Valid bit of the old entry is assumed to be 0. The following sequence can be used to create a PTE, maintain a consistent state, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes.

    PTE_ARPN,WIMGE,R,SW0,C,PS,BAP,SW1,V ← new values
    sync        /* order updates before next Page Table lookup
                   and before next data access. */

On a 32-bit implementation, the following sequence can be used.

    PTE_ARPN[0:31] ← new value
    mbar        /* order 1st update before 2nd */
    PTE_ARPN[32:39],WIMGE,R,SW0,C,PS,BAP,SW1,V ← new values
    sync        /* order updates before next Page Table lookup
                   and before next data access. */

6.7.5.1.2 Deleting a Page Table Entry

The following sequence can be used to ensure that the translation instantiated by an existing entry is no longer available.

    PTE_V ← 0   /* (other fields don't matter) */
    sync        /* order update before tlbivax and
                   before next Page Table lookup */
    tlbivax(old_LPID,old_GS,old_PID,old_AS,old_VA,
            old_ISIZE,old_IND)   /* invalidate old translation */
    mbar        /* order tlbivax before tlbsync */
    tlbsync     /* order tlbivax before sync */
    sync        /* order tlbivax, tlbsync, and update
                   before next data access */

6.7.5.1.3 Modifying a Page Table Entry

General Case

If a valid entry is to be modified and the translation instantiated by the entry being modified is to be invalidated, the old PTE can be deleted and a new one added using the sequences described in the two preceding sections, in order to ensure that the translation instantiated by the old entry is no longer available, maintain a consistent state, modify the PTE, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes.

Modifying the SW0 and SW1 Fields

If the only change being made to a valid entry is to modify the SW0 or SW1 fields, the following sequence suffices because the SW0 and SW1 fields are not used by the thread.

    loop: ldarx  r1 ← PTE        /* load of PTE */
          r1 ← new SW0,SW1       /* replace SW0,SW1 in r1 */
          stdcx. PTE ← r1        /* store of PTE if still reserved
                                    (new SW0 or SW1 values,
                                    other fields unchanged) */
          bne-   loop            /* loop if lost reservation */

A lwarx/stwcx. pair (specifying the low-order word of the PTE) can be used instead of the ldarx/stdcx. pair shown above.

Modifying a Reference or Change Bit

If the only change being made to a valid entry is to modify the R bit, the C bit or both, the preceding sequence suffices if the precise instant that hardware Page Table translations use the new value doesn't matter. Reference, Change, and Valid bits are in different bytes to facilitate the use of a Store instruction of a byte to modify a Reference or Change bit instead of a ldarx and stdcx. pair. However, the correctness of doing so is a software issue beyond the scope of this architecture.

Modifying the Virtual Address

If the virtual address translated by a valid PTE is to be modified and the new virtual address maps to the same PTE as does the old virtual address, the following sequence can be used to modify the PTE, maintain a consistent state, ensure that the translation instantiated by the old entry is no longer available, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes. This instruction sequence depends on an atomic doubleword store of the PTE. In 32-bit mode, the instruction sequence shown for the general case must be used.

    PTE_U0:U3,ARPN,PS,BAP,R,C,WIMGE,SW0:1,V ← new values
                /* Set PTE to new values with V=1 */
    sync        /* order update before tlbivax and
                   before next Page Table lookup */
    tlbivax(old_LPID,old_GS,old_PID,old_AS,old_VA,
            old_ISIZE,old_IND)   /* invalidate old translation */
    mbar        /* order tlbivax before tlbsync */
    tlbsync     /* order tlbivax before sync */
    sync        /* order tlbivax, tlbsync, and update
                   before next data access */

6.7.5.2 Invalidating an Indirect TLB Entry

The following sequence can be used to ensure that translations by a Page Table that is mapped via an indirect entry will no longer occur and that the storage used for the Page Table can then be re-used for other purposes.

    for all valid PTEs mapped by the indirect TLB entry
        PTE_V ← 0   /* (other fields don't matter) */
    sync            /* order stores to PTEs */
    for all valid PTEs mapped by the indirect TLB entry
        tlbivax(old_LPID,old_GS,old_PID,old_AS,old_VA,
                old_ISIZE,MAS6_SIND = 0)
                    /* invalidate old PTE translations */
    tlbivax(old_LPID,old_GS,old_PID,old_AS,old_VA,
            old_ISIZE,MAS6_SIND = 1)
                    /* invalidate old indirect TLB entry */
    mbar            /* order tlbivax before tlbsync */
    tlbsync         /* order tlbivax before sync */
    sync            /* order tlbivax, tlbsync, and update
                       before next data access to the storage
                       locations occupied by the Page Table
                       pointed to by the old indirect TLBE */
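The update sequences above manipulate individual PTE fields while leaving the rest of the doubleword unchanged. The following minimal sketch models only the bit-field arithmetic, using the Figure 23 layout and the Table 6 derivation; it does not model the ldarx/stdcx. reservation, the barriers, or the TLB invalidation. The helper names are hypothetical, and the sketch assumes the usual Power ISA convention that bit 0 is the most significant bit and that BAP0 is the leftmost bit of the BAP field.

```python
# Field positions from Figure 23, IBM bit numbering (bit 0 = most significant).
FIELDS = {
    "ARPN": (0, 39), "WIMGE": (40, 44), "R": (45, 45), "IMPL": (46, 49),
    "SW0": (50, 50), "C": (51, 51), "PS": (52, 55), "BAP": (56, 61),
    "SW1": (62, 62), "V": (63, 63),
}


def get_field(pte, name):
    """Extract a field from a 64-bit PTE value."""
    first, last = FIELDS[name]
    return (pte >> (63 - last)) & ((1 << (last - first + 1)) - 1)


def set_field(pte, name, value):
    """Return a copy of pte with one field replaced; other fields unchanged."""
    first, last = FIELDS[name]
    mask = ((1 << (last - first + 1)) - 1) << (63 - last)
    return (pte & ~mask) | ((value << (63 - last)) & mask)


def derived_access(pte):
    """Storage access control bits derived as in Table 6."""
    bap, r, c = get_field(pte, "BAP"), get_field(pte, "R"), get_field(pte, "C")
    b = [(bap >> (5 - i)) & 1 for i in range(6)]   # BAP0..BAP5 (assumed order)
    return {"UX": b[0] & r, "SX": b[1] & r, "UW": b[2] & r & c,
            "SW": b[3] & r & c, "UR": b[4] & r, "SR": b[5] & r}


# Hypothetical PTE: all base permissions set, referenced, not yet changed.
pte = 0
for name, value in (("ARPN", 0x12345), ("BAP", 0b111111), ("R", 1), ("V", 1)):
    pte = set_field(pte, name, value)
before = derived_access(pte)           # C=0, so UW and SW are 0
pte = set_field(pte, "C", 1)           # software sets the Change bit
after = derived_access(pte)            # UW and SW become 1

# Replacing SW0 and SW1 leaves every other field unchanged.
pte_sw = set_field(set_field(pte, "SW0", 1), "SW1", 1)

# R, C and V fall in different bytes of the doubleword (bytes 5, 6 and 7).
flag_bytes = {name: FIELDS[name][0] // 8 for name in ("R", "C", "V")}
```

Since hardware does not update R and C, the sketch also shows why software that sets the C bit changes the derived write permissions: UW and SW depend on both R and C, while the other derived bits depend on R only.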
6.7.6 Storage Access Control

After a matching TLB entry has been identified, the access control mechanism selectively grants execute access, read access, and write access separately for user mode versus supervisor mode. If the Embedded.Hypervisor category is supported, the access control mechanism selectively controls an access so that the access can be virtualized by the hypervisor if appropriate. Figure 24 illustrates the access control process and is described in detail in Sections 6.7.6.1 through 6.7.6.6.

An Execute, Read, or Write Access Control exception or Virtualization Fault exception occurs if the appropriate TLB entry is found but the access is not allowed by the access control mechanism (Instruction or Data Storage interrupt). See Section 7.6 for additional information about these and other interrupt types. In certain cases, Execute, Read, and Write Access Control exceptions and Virtualization Fault exceptions may result in the restart of (re-execution of at least part of) a Load or Store instruction.

Implementations may provide additional access control capabilities beyond those described here.

[Figure 24. Access Control Process. Inputs: TLB match (see Figure 21); MSR_PR; the access type (instruction fetch, load-class data storage access, store-class data storage access) with the corresponding TLB entry bits (UX and SX, UR and SR, UW and SW); and, if Category E.HV is supported, the TLB entry VF bit. Output: access granted.]

6.7.6.1 Execute Access

The UX and SX bits of the TLB entry control execute access to the page (see Table 8).

Instructions may be fetched and executed from a page in storage while in user state (MSR_PR=1) if the UX access control bit for that page is equal to 1. If the UX access control bit is equal to 0, then instructions from that page will not be fetched, and will not be placed into any cache as the result of a fetch request to that page while in user state.

Instructions may be fetched and executed from a page in storage while in supervisor state (MSR_PR=0) if the SX access control bit for that page is equal to 1. If the SX access control bit is equal to 0, then instructions from that page will not be fetched, and will not be placed into any cache as the result of a fetch request to that page while in supervisor state.

Instructions from no-execute storage may be in the instruction cache if they were fetched into that cache when their effective addresses were mapped to execute permitted storage. Software need not flush a page from the instruction cache before marking it no-execute.

Furthermore, if the sequential execution model calls for the execution of an instruction from a page that is not enabled for execution (i.e. UX=0 when MSR_PR=1 or SX=0 when MSR_PR=0), an Execute Access Control exception type Instruction Storage interrupt is taken.

6.7.6.2 Write Access

The UW and SW bits of the TLB entry control write access to the page (see Table 8).

Store operations (including Store-class Cache Management instructions) are permitted to a page in storage while in user state (MSR_PR=1) if the UW access control bit for that page is equal to 1. If the UW access control bit is equal to 0, then execution of the Store instruction is suppressed and a Write Access Control exception type Data Storage interrupt is taken.

Store operations (including Store-class Cache Management instructions) are permitted to a page in storage while in supervisor state (MSR_PR=0) if the SW access control bit for that page is equal to 1. If the SW access control bit is equal to 0, then execution of the Store instruction is suppressed and a Write Access Control exception type Data Storage interrupt is taken.

6.7.6.3 Read Access

The UR and SR bits of the TLB entry control read access to the page (see Table 8).

Load operations (including Load-class Cache Management instructions) are permitted from a page in storage while in user state (MSR_PR=1) if the UR access control bit for that page is equal to 1. If the UR access control bit is equal to 0, then execution of the Load instruction is suppressed and a Read Access Control exception type Data Storage interrupt is taken.

Load operations (including Load-class Cache Management instructions) are permitted from a page in storage while in supervisor state (MSR_PR=0) if the SR access control bit for that page is equal to 1. If the SR access control bit is equal to 0, then execution of the Load instruction is suppressed and a Read Access Control exception type Data Storage interrupt is taken.

6.7.6.4 Virtualized Access

The VF bit of the TLB entry prevents a Load or Store access to the page (see Table 8).

The translation of a Load or Store (including Cache Management instructions) operand address that uses a TLB entry with the Translation Virtualization Fault field equal to 1 causes a Virtualization Fault exception type Data Storage interrupt regardless of the settings of the permission bits and regardless of whether the TLB entry is a direct or indirect entry. The resulting Data Storage interrupt is directed to the hypervisor state.

6.7.6.5 Storage Access Control Applied to Cache Management Instructions

dcbi, dcbz, and dcbzep instructions are treated as Stores since they can change data (or cause loss of data by invalidating a dirty line). As such, they can cause Write Access Control exception type Data Storage interrupts. If an implementation first flushes a line before invalidating it during a dcbi, the dcbi is treated as a Load since the data is not modified.

dcba instructions are treated as Stores since they can change data. However, they do not cause Write Access Control exceptions. A dcba instruction will not cause a virtualization fault (TLB_VF = 1).

dcblc, dcbtls, icbi and icbiep instructions are treated as Loads with respect to protection. As such, they can cause Read Access Control exception type Data Storage interrupts and Virtualization Fault exception type Data Storage interrupts.

dcbt, dcbtep, dcbtst, dcbtstep, and icbt instructions are treated as Loads with respect to protection. However, they do not cause Read Access Control exceptions. A virtualization fault on these instructions will not result in a Data Storage interrupt.

It is implementation-dependent whether dcbtstls instructions are treated as Loads or Stores with respect to protection. As such, they can cause either Read Access Control exception type Data Storage interrupts or Write Access Control exception type Data Storage interrupts and can also cause Virtualization Fault exception type Data Storage interrupts.

dcbf, dcbfep, dcbst, and dcbstep instructions are treated as Loads with respect to protection. Flushing or storing a line from the cache is not considered a Store since the store has already been done to update the cache and the dcbf, dcbfep, dcbst, or dcbstep instruction is only updating the copy in main storage. As Loads, they can cause Read Access Control exception type Data Storage interrupts and Virtualization Fault exception type Data Storage interrupts.

Table 8: Storage Access Control Applied to Cache Instructions

    Instruction   Read Protection   Write Protection   Virtualization
                  Violation         Violation          Fault (1)
    dcba          No                No                 No
    dcbf          Yes               No                 Yes
    dcbfep        Yes               No                 Yes
    dcbi          Yes (3)           Yes (3)            Yes
    dcblc         Yes               No                 Yes
    dcbst         Yes               No                 Yes
    dcbstep       Yes               No                 Yes
    dcbt          No                No                 No
    dcbtep        No                No                 No
    dcbtls        Yes               No                 Yes
    dcbtst        No                No                 No
    dcbtstep      No                No                 No
    dcbtstls      Yes (4)           Yes (4)            Yes (4)
    dcbz          No                Yes                Yes
    dcbzep        No                Yes                Yes
    dci           No                Yes                Yes
    icbi          Yes               No                 Yes
    icbiep        Yes               No                 Yes
    icblc         Yes (2)           No                 Yes
    icbt          No                No                 No
    icbtls        Yes (2)           No                 Yes
    ici           No                No                 No
Storage Control 963 Version 2.06 Table 8: Storage Access Control Applied to Cache rupt is pending, the instruction completes before Instructions the interrupt occurs. Read Pro- Write Pro- Load or Store instruction that causes an Alignment Virtualiza- exception, a Data TLB Error exception, or that Instruction tection tection tion Fault1 causes a Data Storage exception. Violation Violation 1. Category: Embedded.Hypervisor The portion of the storage operand that is in Cach- 2. icbtls and icblc require execute or read access. ing Inhibited and Guarded storage is not accessed. 3. dcbi may cause a Read or Write Access Control Exception based on whether the data is flushed Programming Note prior to invalidation. Instruction fetching from Guarded storage is per- 4. It is implementation-dependent whether dcbtstls mitted. If instruction fetches from Guarded storage is treated as a Load or a Store. must be prevented, software must set access con- trol bits for such pages to no-execute (i.e. UX=0 6.7.6.6 Storage Access Control Applied and SX=0). to String Instructions When the string length is zero, neither lswx nor stswx 6.8.1.1 Out-of-Order Accesses to can cause Data Storage interrupts. Guarded Storage In general, Guarded storage is not accessed out-of- 6.8 Storage Control Attributes order. The only exception to this rule is the following. This section describes aspects of the storage control Load Instruction attributes that are relevant only to privileged software If a copy of any byte of the storage operand is in a programmers. The rest of the description of storage cache then that byte may be accessed in the cache or control attributes may be found in Section 1.6 of Book II in main storage. and subsections. 
6.8.1 Guarded Storage

Storage is said to be "well-behaved" if the corresponding real storage exists and is not defective, and if the effects of a single access to it are indistinguishable from the effects of multiple identical accesses to it. Data and instructions can be fetched out-of-order from well-behaved storage without causing undesired side effects.

Storage is said to be Guarded if the G bit is 1 in the TLB entry that translates the effective address.

In general, storage that is not well-behaved should be Guarded. Because such storage may represent a control register on an I/O device or may include locations that do not exist, an out-of-order access to such storage may cause an I/O device to perform unintended operations or may result in a Machine Check.

Instruction fetching is not affected by the G bit.

The following rules apply to in-order execution of Load and Store instructions for which the first byte of the storage operand is in storage that is both Caching Inhibited and Guarded.

- Load or Store instruction that causes an atomic access
  If any portion of the storage operand has been accessed and an asynchronous or imprecise interrupt ...

6.8.2 User-Definable

User-definable storage control attributes control user-definable and implementation-dependent behavior of the storage system. The existence of these bits is implementation-dependent. These bits are both implementation-dependent and system-dependent in their effect. These bits may be used in any combination and also in combination with the other storage control attribute bits.

6.8.3 Storage Control Bits

Storage control attributes are specified on a per-page basis. These attributes are specified in storage control bits in the TLB entries. The interpretation of their values is given in Figure 25.

Bit              Storage Control Attribute
W (notes 1, 6)   0 - not Write Through Required
                 1 - Write Through Required
I (note 6)       0 - not Caching Inhibited
                 1 - Caching Inhibited
M (note 2)       0 - not Memory Coherence Required
                 1 - Memory Coherence Required
G                0 - not Guarded
                 1 - Guarded
E (note 3)       0 - Big-Endian
                 1 - Little-Endian
U0-U3 (note 4)   User-Definable
VLE (note 5)     0 - non Variable Length Encoding (VLE)
                 1 - VLE
ACM (note 7)     0 - not Alternate Coherency Mode
                 1 - Alternate Coherency Mode (if M=1)

1 Support for the 1 value of the W bit is optional. Implementations that do not support the 1 value treat the bit as reserved and assume its value to be 0.
2 Support of the 1 value is optional for implementations that do not support multiprocessing, implementations that do not support this storage control attribute assume the value of the bit to be 0, and setting M=1 in a TLB entry will have no effect.
3 [Category: Embedded.Little-Endian]
4 Support for these attributes is optional.
5 [Category: VLE]
6 [Category: SAO] The combination WIMG = 0b1110 has behavior unrelated to the meanings of the individual bits. See Section 6.8.3.1, "Storage Control Bit Restrictions" for additional information.
7 The coherency method used in Alternate Coherency Mode is implementation-dependent.

Figure 25. Storage control bits

In Section 6.8.3.1 and 6.8.3.2, "access" includes accesses that are performed out-of-order.

Programming Note
    In a system consisting of only a single-threaded processor that has caches, correct coherent execution does not require storage to be accessed as Memory Coherence Required, and accessing storage as not Memory Coherence Required may give better performance.

6.8.3.1 Storage Control Bit Restrictions

All combinations of W, I, M, G, and E values are permitted except those for which both W and I are 1 and M||G ≠ 0b10.

The combination WIMG = 0b1110 is used to identify the Strong Access Ordering (SAO) storage attribute (see Section 1.7.1, "Storage Access Ordering", in Book II). Because this attribute is not intended for general purpose programming, it is provided only for a single combination of the attributes normally identified using the WIMG bits. That combination would normally be indicated by WIMG = 0b0010.

References to Caching Inhibited storage (or storage with I=1) elsewhere in the Power ISA have no application to SAO storage or its WIMG encoding, despite the fact that the encoding uses I=1. Conversely, references to storage that is not Caching Inhibited (or storage with I=0) apply to SAO storage or its WIMG encoding. References to Write Through Required storage (or storage with W=1) elsewhere in the Power ISA have no application to SAO storage or its WIMG encoding, despite the encoding using W=1. Conversely, references to storage that is not Write Through Required (or storage with W=0) apply to SAO storage or its WIMG encoding.

If a given real page is accessed concurrently as SAO storage and as non-SAO storage, the result may be characteristic of the weakly consistent model.

Programming Note
    If an application program requests both the Write Through Required and the Caching Inhibited attributes for a given storage location, the operating system should set the I bit to 1 and the W bit to 0.
    For implementations that support the SAO category, the operating system should provide a means by which application programs can request SAO storage, in order to avoid confusion with the preceding guideline (since SAO is encoded using WI=0b11).

Accesses to the same storage location using two effective addresses for which the W bit differs meet the memory coherence requirements described in Section 1.6.3 of Book II if the accesses are performed by a single thread. If the accesses are performed by two or more threads, coherence is enforced by the hardware only if the W bit is the same for all the accesses.

At any given time, the value of the I bit must be the same for all accesses to a given real page.

At any given time, data accesses to a given real page may use both Endian modes. When changing the Endian mode of a given real page for instruction fetching, care must be taken to prevent accesses while the change is made and to flush the instruction cache(s) after the change has been completed.

Setting the VLE attribute to 1 and setting the E attribute to 1 is considered a programming error and an attempt to fetch instructions from a page so marked produces an Instruction Storage Interrupt Byte Ordering Exception and sets ESR[BO] or GESR[BO] to 1 (GESR[BO] if the Embedded.Hypervisor category is supported and the interrupt is directed to the guest. Otherwise, ESR[BO]).

At any given time, the value of the VLE bit must be the same for all accesses to a given real page.

When changing the value of the VLE bit for a given real page, software must set the VLE bit to the new value, then, if the page was not Caching Inhibited, invalidate copies of all locations in the page from instruction cache using icbi or icbiep, and then execute an isync instruction before permitting any other accesses to the page.

Programming Note
    When changing the Endian mode of a given real page used for instruction fetching and the instruction cache is shared between threads, care must be taken to prevent accesses from any thread that shares the instruction cache while the change is made until the instruction cache flush has been completed.
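
The restriction on W, I, M, and G combinations in Section 6.8.3.1 can be illustrated with a short check. The following is only a sketch, not part of the architecture: the function name `classify_wimg` and the three result strings are hypothetical, and it assumes an implementation that supports the SAO category, so that WIMG = 0b1110 is meaningful. The E bit is omitted because the restriction does not depend on it.

```python
def classify_wimg(wimg: int) -> str:
    """Classify a 4-bit WIMG value (W is the most-significant bit).

    Returns "SAO" for WIMG = 0b1110, "not permitted" for the other
    combinations with W = I = 1, and "permitted" otherwise.
    """
    if not 0 <= wimg <= 0b1111:
        raise ValueError("WIMG must be a 4-bit value")
    w = (wimg >> 3) & 1
    i = (wimg >> 2) & 1
    mg = wimg & 0b11  # M||G, the two low-order bits
    if w == 1 and i == 1:
        # Only M||G = 0b10 is allowed when both W and I are 1,
        # and that single combination identifies SAO storage.
        return "SAO" if mg == 0b10 else "not permitted"
    return "permitted"


if __name__ == "__main__":
    for wimg in range(16):
        print(f"WIMG=0b{wimg:04b}: {classify_wimg(wimg)}")
```

Under these assumptions, 13 of the 16 WIMG values are accepted (12 ordinary combinations plus SAO), and the three rejected values are 0b1100, 0b1101, and 0b1111.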
6.8.3.2 Altering the Storage Control Bits

When changing the value of the W bit for a given real page from 0 to 1, software must ensure that no thread modifies any location in the page until after all copies of locations in the page that are considered to be modified in the data caches have been copied to main storage using dcbst, dcbstep, dcbf, dcbfep, or dcbi.

When changing the value of the I bit for a given real page from 0 to 1, software must set the I bit to 1 and then flush all copies of locations in the page from the caches using dcbf, dcbfep, or dcbi, and icbi or icbiep before permitting any other accesses to the page.

Programming Note
    The storage control bit alterations described above are examples of cases in which the directives for application of statements about the W and I bits to SAO given in the third paragraph of the preceding subsection must be applied. A transition from the typical WIMG=0b0010 for ordinary storage to WIMG=0b1110 for SAO storage does not require the flush described above because both WIMG combinations indicate storage that is not Caching Inhibited.

When changing the value of the M bit for a given real page, software must ensure that all data caches are consistent with main storage. The actions required to do this are system-dependent.

Programming Note
    For example, when changing the M bit in some directory-based systems, software may be required to execute dcbf or dcbfep on each thread to flush all storage locations accessed with the old M value before permitting the locations to be accessed with the new M value.

The actions required when changing the ACM bit for a given real page are system-dependent.

Programming Note
    This Note suggests one example for managing reference and change recording.

    When performing physical page management, it is useful to know whether a given physical page has been referenced or altered. Note that this may be more involved than knowing whether a given TLB entry has been used to reference or alter memory, since multiple TLB entries may translate to the same physical page. If it is necessary to replace the contents of some physical page with other contents, a page which has been referenced (accessed for any purpose) is more likely to be retained than a page which has never been referenced. If the contents of a given physical page are to be replaced, then the contents of that page must be written to the backing store before replacement, if anything in that page has been changed. Software must maintain records to control this process.

    Similarly, when performing TLB management, it is useful to know whether a given TLB entry has been referenced. When making a decision about which entry to cast-out of the TLB, an entry which has been referenced is more likely to be retained in the TLB than an entry which has never been referenced.

    Execute, Read and Write Access Control exceptions may be used to allow software to maintain reference information for a TLB entry and for its associated physical page. The entry is built, with its UX, SX, UR, SR, UW, and SW bits off, and the index and effective page number of the entry retained by software. The first attempt of application code to use the page will cause an Access Control exception (because the entry is marked "No Execute", "No Read", and "No Write"). The Instruction or Data Storage interrupt handler records the reference to the TLB entry and to the associated physical page in a software table, and then turns on the appropriate access control bit. An initial read from the page could be handled by only turning on the appropriate UR or SR access control bits, leaving the page "read-only".

    In a demand-paged environment, when the contents of a physical page are to be replaced, if any storage in that physical page has been altered, then the backing storage must be updated. The information that a physical page is dirty is typically recorded in a "Change" bit for that page.

    Write Access Control exceptions may be used to allow software to maintain change information for a physical page. For the example just given for reference recording, the first write access to the page via the TLB entry will create a Write Access Control exception type Data Storage interrupt. The Data Storage interrupt handler records the change status to the physical page in a software table, and then turns on the appropriate UW and SW bits.

6.9 Logical to Real Address Translation [Category: Embedded.Hypervisor.LRAT]

In a partitioned environment, a guest operating system is not allowed to manipulate real page numbers. Instead the hypervisor virtualizes the real memory and the guest operating system manages the virtualized memory using logical page numbers (LPNs). In MMU Architecture Version 2.0, a Logical to Real Address Translation (LRAT) array facilitates this virtualization by providing a hardware translation from an LPN to an RPN without trapping to the hypervisor for every TLB update.

LRAT Entry

Below are shown the field definitions for an LRAT entry.

Name    Description
V       Valid
        This bit indicates that this LRAT entry is valid and may be used for translation of an LPN to an RPN. The Valid bit for a given entry can be set or cleared with a tlbwe instruction.
LPID    Logical Partition ID
        This optional field identifies a partition. The Logical Partition ID is compared with LPIDR contents during an LRAT translation. This field is required if category E.PT is supported or if threads that share an LRAT can be in different partitions. Whether the LPID field is supported is indicated by LRATCFG[LPID].
        Note: The number of bits implemented for this field is required to be the same as the TLPID field in a TLB.
LPN     Logical Page Number (up to 54 bits)
        Bits 64-q:n-1 of the LPN field are compared to bits 64-q:n-1 of the Logical Page Number (LPN) for the tlbwe instruction or Page Table translation (where q = LRATCFG[LASIZE] and n = 64 - log2(logical page size in bytes) and logical page size is specified by the LSIZE field of the LRAT entry). See Section 6.7.2. Software must set unused low-order LPN bits to 0.
        Note: Bits X:Y of the LPN field are implemented, where X ≥ 0 and Y ≤ 53. The bits implemented for LPN are not required to be the same as those implemented for TLB[RPN]. Unimplemented LRPN bits are treated as if they contain 0s.
LSIZE   Logical Page Size
        The LSIZE field specifies the size of the logical page associated with the LRAT entry as 2^LSIZE KB, where 0 ≤ LSIZE ≤ 31. Implementations may support any one or more of these logical page sizes (see Section 6.10.3.6), and these logical page sizes need not be the same as the real page sizes that are implemented. However, the smallest logical page is no smaller than the smallest real page. The encodings for LSIZE are the same as the encodings for the TLB SIZE. See Section 6.7.2. This field must be one of the logical page sizes specified by the LRATPS register.
LRPN    LRAT Real Page Number (up to 54 bits)
        Bits 0:n-1 of the LRPN field are used to replace bits 0:n-1 of the LPN to produce the RPN that is written to TLB[RPN] by a tlbwe instruction or a Page Table translation (where n = 64 - log2(logical page size in bytes) and logical page size is specified by the LSIZE field of the LRAT entry). Software must set unused low-order LRPN bits to 0.
        Note: Bits X:Y of the LRPN field are implemented, where X ≥ 0 and Y ≤ 53. X = 64 - MMUCFG[RASIZE]. Y = p - 1 where p = 64 - log2(smallest logical page size in bytes) and smallest logical page size is the smallest page size supported by the implementation as specified by the LRATPS register. Unimplemented LRPN bits are treated as if they contain 0s.

An LRAT entry can be written by the hypervisor using the tlbwe instruction with MAS0[ATSEL] equal to 1. The contents of the LRAT entry specified by MAS0[ESEL] and MAS2[EPN] are written from MAS registers. See the tlbwe instruction description in Section 6.11.4.9.

An LRAT entry can be read by the hypervisor using the tlbre instruction with MAS0[ATSEL] equal to 1. The contents of the LRAT entry specified by MAS0[ESEL] and MAS2[EPN] are read and placed into the MAS registers. See the tlbre instruction description in Section 6.11.4.9.

Maintenance of LRAT entries is under hypervisor software control. Hypervisor software determines LRAT entry replacement strategy. There is no Next Victim support for the LRAT array.

The LRAT array is a hypervisor resource.

There is at most one LRAT array per thread.

Programming Note
    Hypervisor software should not create an LRAT entry that maps any real memory regions for which a TLB entry should have VF equal to 1. Otherwise, a guest operating system could incorrectly create TLB entries, for this memory, with VF=0, assuming hypervisor software normally sets MAS8[VF]=0 before giving control to a guest operating system.

TLB Write

When the guest operating system manipulates the values of RPN fields of MAS registers, the values are treated as forming an LPN. When guest supervisor software attempts to execute tlbwe (on an implementation that supports MMU Architecture Version 1), tlbre, or tlbsx, which operate on a TLB entry's real page number (RPN), or when guest supervisor software attempts to execute a TLB Management instruction with guest execution of TLB Management instructions disabled (EPCR[DGTMI]=1), an Embedded Hypervisor Privilege exception occurs. Also, if TLBnCFG[GTWE] = 0 for a TLB array and the guest supervisor executes a tlbwe to the TLB array, an Embedded Hypervisor Privilege exception occurs. If a tlbwe caused the exception, the hypervisor can replace the LPN value in the MAS registers with the corresponding RPN, execute a tlbwe, and restore the LPN in the MAS registers before returning to the guest operating system. If a tlbre or tlbsx caused the exception, the hypervisor can execute the exception-causing instruction and replace the RPN value in the MAS registers with the corresponding LPN before returning to the guest operating system.

A Logical to Real Address Translation (LRAT) array provides a mechanism that allows a guest operating system to write the TLB without trapping to the hypervisor. When guest supervisor software executes tlbwe on an implementation that supports MMU Architecture Version 2, guest execution of TLB Management instructions is enabled (EPCR[DGTMI]=0), and TLBnCFG[GTWE] = 1 for the TLB array to be written, an LPN is formed. If MAS7 is implemented, LPN = MAS7[RPNU] || MAS3[RPNL]. Otherwise, LPN = (32)0 || MAS3[RPNL], that is, 32 zero bits followed by MAS3[RPNL]. The LPN is translated into an RPN by the LRAT if a matching LRAT entry is found. A matching LRAT entry exists if the following conditions are all true for some LRAT entry.

- The Valid bit of the LRAT entry is one.
- Either the LPID field is not supported in the LRAT (LRATCFG[LPID]=0) or the value of LPIDR[LPID] is equal to the value of the LPID field of the LRAT entry.
- Bits 64-q:n-1 of the LPN match the corresponding bits of the LPN field of the LRAT entry where n = 64 - log2(logical page size in bytes), logical page size is specified by the LSIZE field of the LRAT entry, and q is specified by LRATCFG[LASIZE].
- Either of the following is true.
  - MAS1[IND] = 0 and the value of MAS1[TSIZE] is less than or equal to the value of the LSIZE field of the LRAT entry.
  - MAS1[IND] = 1 and the value of (3 + (MAS1[TSIZE] - MAS3[SPSIZE])) is less than or equal to the value of (10 + LSIZE of the LRAT entry).

If a matching LRAT entry is found, the LRPN from that LRAT entry provides the upper bits of the RPN that is written to the TLB, and the LPN provides the low order RPN bits written to the TLB. Let n = 64 - log2(logical page size in bytes) where logical page size is specified by the LSIZE field of the LRAT entry. Bits n:53 of the LPN are appended to bits 0:n-1 of the LRPN field of the selected LRAT entry to produce the RPN (i.e. RPN = LRPN[0:n-1] || LPN[n:53]). The page size specified by the LSIZE of the LRAT entry used to translate the LPN must be one of the values supported by the implementation's LRAT array. If the LRAT does not contain a matching entry for the LPN, an LRAT Miss exception occurs.

When the hypervisor executes a tlbwe instruction, no LRAT translation is performed and the RPN formed from MAS7[RPNU] and MAS3[RPNL] is written to the TLB.

Page Table

A Logical to Real Address Translation (LRAT) array provides a mechanism that allows guest Page Table management and translation without direct hypervisor involvement. When an instruction fetch address or a Load, Store, or Cache Management instruction operand address is translated by the Page Table, the Embedded.Hypervisor category is supported, and the TGS bit of the associated indirect TLB entry is 1, the RPN result of the Page Table translation is treated as an LPN that is translated into an RPN by the LRAT if a matching LRAT entry is found. A matching LRAT entry exists if the following conditions are all true for some LRAT entry.

- The Valid bit of the LRAT entry is one.
- Either the LPID field is not supported in the LRAT (LRATCFG[LPID]=0) or the value of LPIDR[LPID] is equal to the value of the LPID field of the LRAT entry.
- Bits 64-q:n-1 of the LPN match the corresponding bits of the LPN field of the LRAT entry where n = 64 - log2(logical page size in bytes), logical page size is specified by the LSIZE field of the LRAT entry, and q is specified by LRATCFG[LASIZE].
- The value of PTE[PS] is less than or equal to the value of the LSIZE field of the LRAT entry.

If a matching LRAT entry is found, the LRPN from that LRAT entry provides the upper bits of the RPN of the translation result and the LPN provides the low order RPN bits of the translation result. Let n = 64 - log2(logical page size in bytes) where logical page size is specified by the LSIZE field of the LRAT entry. Bits n:51 of the LPN are appended to bits 0:n-1 of the LRPN field of the selected LRAT entry to produce the RPN (i.e. RPN = LRPN[0:n-1] || LPN[n:51]). The page size specified by the LSIZE of the LRAT entry used to translate the LPN must be one of the values supported by the implementation's LRAT array. If the LRAT does not contain a matching entry for the LPN, an LRAT Miss exception occurs.

6.10 Storage Control Registers

In addition to the registers described below, the Machine State Register provides the IS and DS bits, which specify which of the two address spaces the respective instruction or data storage accesses are directed towards. The MSR[PR] bit is also used by the storage access control mechanism. If the Embedded.Hypervisor category is supported, the MSR[GS] bit is used to identify guest state. The guest supervisor state exists when MSR[PR] = 0 and MSR[GS] = 1. MSR[GS] is used to form the virtual address. Also, see Section 5.3.6 for the registers in the Embedded.External PID category.

6.10.1 Process ID Register

The Process ID Registers are 32-bit registers as shown in Figure 26. Process ID Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The number of bits implemented in a PID register is indicated by the value of MMUCFG[PIDSIZE]. The Process ID Register provides a value that is used to construct a virtual address for accessing storage.

The Process ID Register is a privileged register. This register can be read using mfspr and can be written using mtspr. An implementation may opt to implement only the least-significant n bits of the Process ID Register, where 1 ≤ n ≤ 14, and n must be the same as the number of implemented bits in the TID field of the TLB entry. The most-significant 32-n bits of the Process ID Register are treated as reserved.

    [Bit layout: /// 32:49 | PID 50:63]
Figure 26. Processor ID Register (PID)

The PID register fields are described below.

Bit     Description
50:63   Processor ID (PID)
        Identifies a unique process (except for the value of 0) and is used to construct a virtual address for storage accesses.

All other fields are reserved.

Programming Note
    The PID register was referred to as PID0 in the Type FSL Storage Control appendix of previous versions of the architecture.

6.10.2 MMU Assist Registers

The MMU Assist Registers (MAS) are used to transfer data to and from the TLB arrays. If the Embedded.Hypervisor.LRAT category is supported, MAS registers are also used to transfer data to and from the LRAT array. MAS registers can be read and written by software using mfspr and mtspr instructions. Execution of a tlbre instruction with MAS0[ATSEL]=0 causes the TLB entry specified by MAS0[TLBSEL], MAS0[ESEL], and MAS2[EPN] to be copied to the MAS registers if TLBnCFG[HES] = 0 for the TLB array specified by MAS0[TLBSEL], whereas the TLB entry is specified by MAS0[ESEL], MAS0[TLBSEL], and a hardware generated hash based on MAS2[EPN], MAS1[TID], and MAS1[TSIZE] if TLBnCFG[HES]=1. Execution of a tlbwe instruction with MAS0[ATSEL]=0 (or in hypervisor state) causes the TLB entry specified by MAS0[TLBSEL], MAS0[ESEL], and MAS2[EPN] to be written with contents of the MAS registers if TLBnCFG[HES] = 0 for the TLB array specified by MAS0[TLBSEL]. If TLBnCFG[HES] = 1 for a tlbwe, the TLB entry is selected by MAS0[TLBSEL], a hardware generated hash based on MAS2[EPN], MAS1[TID], and MAS1[TSIZE], and either a hardware replacement algorithm if MAS0[HES]=1 or MAS0[ESEL] if MAS0[HES]=0.

MAS registers may also be updated by hardware as the result of any of the following.

- a tlbsx instruction
- the occurrence of an Instruction or Data TLB Error interrupt if any of the following is true.
  - The Embedded.Hypervisor category is not supported.
  - MAS Register updates are enabled for interrupts directed to the hypervisor (EPCR[DMIUH] = 0).
  - The interrupt is directed to the guest state (EPCR[ITLBGS] = 1 for Instruction TLB Error interrupt and EPCR[DTLBGS] = 1 for Data TLB Error interrupt).

All MAS registers are privileged, except MAS5 and MAS8, which are hypervisor privileged and are only provided if category Embedded.Hypervisor is supported. All MAS registers with the exception of MAS7 and, if Embedded.Hypervisor category is not supported, MAS5 and MAS8, must be implemented. MAS7 is not required to be implemented if the hardware supports 32 bits or less of real address.

The necessary bits of any multi-bit field in a MAS register must be implemented such that only the resources supported are represented. Any non-implemented bits in a field should have no effect when writing and should always read as zero. For example, if only 2 TLB arrays are implemented, then only the lower-order bit of the MAS0[TLBSEL] field is implemented.

6.10.3 MMU Configuration and Control Registers

6.10.3.1 MMU Configuration Register (MMUCFG)

The read-only MMUCFG register provides information about the MMU and its arrays. MMUCFG is a privileged register except that if the Embedded.Hypervisor category is supported, MMUCFG is a hypervisor resource. The layout of the MMUCFG register is shown in Figure 27 for MAV=1.0 and in Figure 28 for MAV=2.0.

    [Bit layout: /// 32:35 | LPIDSIZE 36:39 | RASIZE 40:46 | /// 47:52 | PIDSIZE 53:57 | // 58:59 | NTLBS 60:61 | MAVN 62:63]
Figure 27. MMU Configuration Register [MAV=1.0]

    [Bit layout: /// 32:35 | LPIDSIZE 36:39 | RASIZE 40:46 | LRAT 47 | TWC 48 | /// 49:52 | PIDSIZE 53:57 | // 58:59 | NTLBS 60:61 | MAVN 62:63]
Figure 28. MMU Configuration Register [MAV=2.0]

The MMUCFG fields are described below.

Bit     Description
36:39   LPID Register Size (LPIDSIZE)
        The value of LPIDSIZE is the number of bits in LPIDR that are implemented. Only the least significant LPIDSIZE bits in LPIDR are implemented. The Embedded.Hypervisor category is supported if and only if LPIDSIZE > 0.
40:46   Real Address Size (RASIZE)
        Number of bits in a real address supported by the implementation.
47      LRAT Translation Supported (LRAT) [Category: Embedded.Hypervisor.LRAT]
        Indicates LRAT translation is supported.
        0  LRAT translation is not supported. A tlbwe executed in guest supervisor state results in an Embedded Hypervisor Privilege exception.
        1  LRAT translation is supported by one or more TLB arrays. See TLBnCFG[GTWE]. The LRATCFG and LRATPS registers are supported.
48      TLB Write Conditional (TWC) [MAV=2.0]
        Indicates whether the Embedded.TLB Write Conditional category is supported.
        0  E.TWC category is not supported.
        1  E.TWC category is supported. See Section 6.11.4.2.1 for a description of conditional TLB writes. This category also includes support for the tlbsrx. instruction, MAS0[WQ], and MAS6[ISIZE].
53:57   PID Register Size (PIDSIZE)
        The value of PIDSIZE is one less than the number of bits implemented for each of the PID registers implemented. Only the least significant PIDSIZE+1 bits in the PID registers are implemented. The maximum number of PID register bits that may be implemented is 14.
60:61   Number of TLBs (NTLBS)
        The value of NTLBS is one less than the number of software-accessible TLB structures that are implemented. NTLBS is set to one less than the number of TLB structures so that its value matches the maximum value of MAS0[TLBSEL].
        00  1 TLB
        01  2 TLBs
        10  3 TLBs
        11  4 TLBs
62:63   MMU Architecture Version Number (MAVN)
        Indicates the version number of the architecture of the MMU implemented.
        00  Version 1.0
        01  Version 2.0
        10  Reserved
        11  Reserved

All other fields are reserved.

6.10.3.2 TLB Configuration Registers (TLBnCFG)

Each TLBnCFG read-only register provides configuration information about each corresponding TLB array that is implemented. There is one TLBnCFG register implemented for each TLB array that is implemented. TLBnCFG corresponds to TLBn for 0 ≤ n ≤ MMUCFG[NTLBS]. TLBnCFG registers are privileged registers except that if the Embedded.Hypervisor category is supported, TLBnCFG registers are hypervisor resources. The layout of the TLBnCFG registers is shown in Figure 29 for MAV=1.0 and in Figure 30 for MAV=2.0.

    [Bit layout: ASSOC 32:39 | MINSIZE 40:43 | MAXSIZE 44:47 | IPROT 48 | AVAIL 49 | // 50:51 | NENTRY 52:63]
Figure 29. TLB Configuration Register [MAV=1.0]

    [Bit layout: ASSOC 32:39 | /// 40:44 | PT 45 | IND 46 | GTWE 47 | IPROT 48 | / 49 | HES 50 | / 51 | NENTRY 52:63]
Figure 30. TLB Configuration Register [MAV=2.0]

The TLBnCFG fields are described below.

Bit     Description
32:39   Associativity (ASSOC)
        Total number of entries in a TLB array which can be used for translating addresses with a given EPN. This number is referred to as the associativity level of the TLB array. A value equal to NENTRY or 0 indicates the array is fully-associative.
40:43   Minimum Page Size (MINSIZE) [MAV=1.0]
        This field defines the minimum page size of the TLB array. Page size encoding is defined in Section 6.7.2.
44:47   Maximum Page Size (MAXSIZE) [MAV=1.0]
        This field defines the maximum page size of the TLB array. Page size encoding is defined in Section 6.7.2.
45      Page Table (PT) [MAV=2.0 and Category: E.PT]
        Indicates that the TLB array can be loaded from the hardware Page Table.
        0  TLB array cannot be loaded from the Page Table.
        1  TLB array can be loaded from the Page Table.
46      Indirect (IND) [MAV=2.0 and Category: E.PT]
        Indicates that an indirect TLB entry can be created in the TLB array and that there is a corresponding EPTCFG register that defines the SIZE and Sub-Page Size values that are supported.
        0  The TLB array treats the IND bit as reserved.
        1  The TLB array supports indirect entries.
47      Guest TLB Write Enabled (GTWE) [MAV=2.0 and Category: Embedded.Hypervisor.LRAT]
        Indicates that a guest supervisor can write the TLB array because LRAT translation is supported for the TLB array.
        0  A guest supervisor cannot write the TLB array. A tlbwe executed in guest supervisor state results in an Embedded Hypervisor Privilege exception.
        1  A guest supervisor can write the TLB array if guest execution of TLB Management instructions is enabled (EPCR[DGTMI]=0).
48      Invalidate Protection (IPROT)
        Invalidate protect capability of the TLB array.
        0  Indicates invalidate protection capability not supported.
        1  Indicates invalidate protection capability supported.
49      Page Size Availability (AVAIL) [MAV=1.0]
        This defines the page size availability of the TLB array. If the Embedded.Page Table category is supported, this also defines the virtual address space size availability of TLB array. Otherwise, this field is reserved.
        0  Fixed selectable page size from MINSIZE to MAXSIZE (all TLB entries are the same size).
        1  Variable page size from MINSIZE to MAXSIZE (each TLB entry can be sized separately).
50      Hardware Entry Select (HES) [MAV=2.0]
        Indicates whether the TLB array supports MAS0[HES] and the associated method for hardware selecting a TLB entry based on MAS1[TID TSIZE] and MAS2[EPN] for a tlbwe instruction.
        0  MAS0[HES] is not supported.
        1  MAS0[HES] is supported for tlbwe. See Section 6.10.3.8. For tlbre, MAS0[ESEL] selects among the TLB entries that can be used for translating addresses with a given MAS1[TID TSIZE] and MAS2[EPN]. The set of TLB entries is determined by a hardware generated hash based on MAS1[TID TSIZE] and MAS2[EPN]. The hash is the same for tlbwe and tlbre for a given TLB array but could be different for each TLB array.
52:63   Number of Entries (NENTRY)
        Number of entries in the TLB array.

All other fields are reserved.

6.10.3.3 TLB Page Size Registers (TLBnPS) [MAV=2.0]

Each TLBnPS read-only register provides page size information about each corresponding TLB array that is implemented in MMU Architecture Version 2.0. Each Page Size bit (PS0-PS31) that is a one indicates that a specific page size is supported by the array. Multiple 1 bits indicate that multiple page sizes are supported concurrently. TLBnPS registers are privileged registers except that if the Embedded.Hypervisor category is supported, TLBnPS registers are hypervisor resources. The layout of the TLBnPS registers is shown in Figure 31.

    [Bit layout: PS31 - PS0 32:63]
Figure 31. TLB n Page Size Register

The TLBnPS fields are described below.

Bit     Description
32:63   Page Size 31 - Page Size 0 (PS31-PS0)
        PSm indicates whether a direct TLB entry page size of 2^m KB is supported by the TLB array. PSm corresponds to bit TLBnPS[63-m] for m = 0 to 31.
        0  Direct TLB entry page size of 2^m KB is not supported.
        1  Direct TLB entry page size of 2^m KB is supported.

Table 9 shows the relationship between the Page Size (PSm) bits in TLBnPS and page size. The existence and type of mechanism for configuring the use of a subset of supported page sizes is implementation-dependent.

Table 9: Relationship of TLBnPS PS bits and LRATPS PS bits to page size

TLBnPS or LRATPS bit   PSm    Page Size
32                     PS31   2TB
33                     PS30   1TB
34                     PS29   512GB
35                     PS28   256GB
36                     PS27   128GB
37                     PS26   64GB
38                     PS25   32GB
39                     PS24   16GB
40                     PS23   8GB
41                     PS22   4GB
42                     PS21   2GB
43                     PS20   1GB
44                     PS19   512MB
45                     PS18   256MB
46                     PS17   128MB
47                     PS16   64MB
48                     PS15   32MB
49                     PS14   16MB
50                     PS13   8MB
51                     PS12   4MB
52                     PS11   2MB
53                     PS10   1MB
54                     PS9    512KB
55                     PS8    256KB
56                     PS7    128KB
57                     PS6    64KB
58                     PS5    32KB
59                     PS4    16KB
60                     PS3    8KB
61                     PS2    4KB
62                     PS1    2KB
63                     PS0    1KB

6.10.3.4 Embedded Page Table Configuration Register (EPTCFG)

This read-only register consists of 3 pairs of page size (PSi) and sub-page size (SPSi) values, where i = 0 to 2. These combinations are supported for Page Table translations. The page size and sub-page size encodings for PSi and SPSi are the same as the MAS1[TSIZE] encodings, except that an SPSi value of 0b00001 is reserved and a value of zero for the SPSi field means there is no page size and sub-page size combination information supplied by that field. If SPSi is zero, PSi is zero. Zero values of PSi and SPSi pairs are the left-most fields. See Table 3. For nonzero values of SPSi, PSi minus SPSi is greater than 7.

The EPTCFG register is a privileged register except that if the Embedded.Hypervisor category is supported, the EPTCFG register is a hypervisor resource. The layout of the EPTCFG register is shown in Figure 32.

    [Bit layout: // 32:33 | PS2 34:38 | SPS2 39:43 | PS1 44:48 | SPS1 49:53 | PS0 54:58 | SPS0 59:63]
Figure 32. Embedded Page Table Configuration Register

The EPTCFG fields are described below.

Bit     Description
34:38   Page Size 2 (PS2)
        PS2 indicates whether an indirect TLB entry with a page size of 2^PS2 KB combined with the sub-page size specified by SPS2 is supported.
39:43   Sub-Page Size 2 (SPS2)
        SPS2 indicates whether an indirect TLB entry with a sub-page size of 2^SPS2 KB combined with the page size specified by PS2 is supported.
44:48   Page Size 1 (PS1)
        PS1 indicates whether an indirect TLB entry with a page size of 2^PS1 KB combined with the sub-page size specified by SPS1 is supported.
49:53   Sub-Page Size 1 (SPS1)
        SPS1 indicates whether an indirect TLB entry with a sub-page size of 2^SPS1 KB combined with the page size specified by PS1 is supported.
54:58   Page Size 0 (PS0)
        PS0 indicates whether an indirect TLB entry with a page size of 2^PS0 KB combined with the sub-page size specified by SPS0 is supported.
59:63   Sub-Page Size 0 (SPS0)
        SPS0 indicates whether an indirect TLB entry with a sub-page size of 2^SPS0 KB combined with the page size specified by PS0 is supported.

6.10.3.5 LRAT Configuration Register (LRATCFG) [Category: Embedded.Hypervisor.LRAT]

The LRATCFG read-only register provides configuration information about the LRAT array. LRATCFG is a hypervisor resource. The layout of the LRATCFG register is shown in Figure 33.

    [Bit layout: ASSOC 32:39 | LASIZE 40:46 | /// 47:49 | LPID 50 | / 51 | NENTRY 52:63]
Figure 33. LRAT Configuration Register

The LRATCFG fields are described below.

Bit     Description
32:39   Associativity (ASSOC)
        Total number of entries in the LRAT array which can be used for translating addresses with a given EPN. This number is referred to as the associativity level of the LRAT array. A value equal to NENTRY or 0 indicates the array is fully-associative.
40:46   Logical Address Size (LASIZE)
        Number of bits in a logical address supported by the implementation.
50      LPID Supported (LPID)
        Indicates whether the LPID field in the LRAT is supported.
        0  The LPID field in the LRAT is not supported.
        1  The LPID field in the LRAT is supported.
52:63   Number of Entries (NENTRY)
        Number of entries in LRAT array. At least one entry is supported.

All other fields are reserved.

6.10.3.6 LRAT Page Size Register (LRATPS) [Category: Embedded.Hypervisor.LRAT]

LRATPS is a read-only register that provides page size information about the LRAT that is implemented if the Embedded.Hypervisor.LRAT category is supported in MMU Architecture Version 2.0. Each Page Size bit (PS0-PS31) that is a one indicates that a specific logical page size is supported by the array. Multiple 1 bits indicate that multiple page sizes are supported concurrently. LRATPS is a hypervisor resource. The layout of the LRATPS register is shown in Figure 34.

    [Bit layout: PS31 - PS0 32:63]
Figure 34. LRAT Page Size Register

The LRATPS fields are described below.

Bit     Description
32:63   Page Size 31 - Page Size 0 (PS31-PS0)
        PSm indicates whether a logical page size of 2^m KB is supported by the LRAT array. PSm corresponds to bit LRATPS[63-m] for m = 0 to 31.
        0  Logical page size of 2^m KB is not supported.
        1  Logical page size of 2^m KB is supported.

All other fields are reserved.

Table 9 shows the relationship between the Page Size (PSm) bits in LRATPS and the logical page size.

6.10.3.7 MMU Control and Status Register (MMUCSR0)

The MMUCSR0 register is used for general control of the MMU including page sizes for programmable fixed size TLB arrays [MAV=1.0] and invalidation of the TLB array. For TLB arrays that have programmable fixed sizes, the TLBn_PS fields [MAV=1.0] allow software to specify the page size. MMUCSR0 is a privileged register except that if the Embedded.Hypervisor category is supported, MMUCSR0 is a hypervisor resource.

The layout of the MMUCSR0 is shown in Figure 35 for MAV=1.0 and in Figure 36 for MAV=2.0.

Bit     Description
41:44   TLB3 Array Page Size (TLB3_PS)
        Page size of the TLB3 array.
45:48   TLB2 Array Page Size (TLB2_PS)
        Page size of the TLB2 array.
49:52   TLB1 Array Page Size (TLB1_PS)
        Page size of the TLB1 array.
53:56   TLB0 Array Page Size (TLB0_PS)
        Page size of the TLB0 array.

        Programming Note
            Changing the fixed page size of an entire array must be done with great care. If any entries in the array are valid, changing the page size may cause those entries to overlap, creating a serious programming error. It is suggested that the entire TLB array be invalidated and any entries with IPROT have their V bits set to zero before changing page size.
57:62   TLBn Invalidate All
        TLB invalidate all bit for the TLBn array.
        0  If this bit reads as a 1, an invalidate all

    [Bit layout residue: fields ///, TLB3_PS, TLB2_PS, TLB1_PS, TLB0_PS, TLB2_FI, TLB3_FI, //, TLB0_FI, TLB1_FI, /; bit boundaries 32, 41, 45, 49, 53, 57, 58, 59, 61, 62, 63]
Figure 35.
MMU Control and Status Register 0 operation for the TLBn array is in [MAV=1.0] progress. Hardware will set this bit to 0 when the invalidate all operation is com- TLB2_FI TLB3_FI TLB0_FI TLB1_FI /// // / pleted. Writing a 0 to this bit during an invalidate all operation is ignored. 1 TLBn invalidation operation. Hardware ini- 32 57 58 59 61 62 63 tiates a TLBn invalidate all operation. When this operation is complete, this bit is Figure 36. MMU Control and Status Register 0 cleared. Writing a 1 during an invalidate [MAV=2.0] all operation produces an undefined The MMUCSR0 fields are described below. result. If the TLB array supports IPROT, entries that have IPROT set will not be Bit Description invalidated. 41:56 TLBn Array Page Size [MAV=1.0] 57 TLB2 Invalidate All (TLB2_FI) A 4-bit field specifies the page size for TLBn TLB invalidate all bit for the TLB2 array. array. Page size encoding is defined in Sec- tion 6.7.2. If the value of TLBn_PS is not 58 TLB3 Invalidate All (TLB3_FI) between TLBnCFGMINSIZE and TLB invalidate all bit for the TLB3 array. TLBnCFGMAXSIZE, the page size is set to 61 TLB0 Invalidate All (TLB0_FI) TLBnCFGMINSIZE. A TLBn_PS field is imple- TLB invalidate all bit for the TLB0 array. mented only for a TLB array that can be pro- grammed to support only one of several fixed 62 TLB1 Invalidate All (TLB1_FI) page sizes. For each TLB array n (for 0 n < TLB invalidate all bit for the TLB1 array. MMUCFGNTLBS), this field is implemented All other fields are reserved. only if the following are all true. TLBnCFGAVAIL = 0. 6.10.3.8 MAS0 Register TLBnCFGMINSIZE TLBnCFGMAXSIZE. The MAS0 register contains fields for identifying and selecting a TLB entry. If the Embedded.Hypervi- sor.LRAT category is supported, the MAS0 register is also used to select an LRAT entry as well as select between a TLB array and the LRAT array. 
MAS0 register fields are loaded by the execution of the tlbsx instruction and by the occurrence of an Instruction or Data TLB Error interrupt under certain conditions. MAS0 is a privileged register. The layout of the MAS0 register is shown in Figure 37.

  Bits    Field
  32      ATSEL
  33      /
  34:35   TLBSEL
  36:47   ESEL
  48      /
  49      HES
  50:51   WQ
  52:63   NV

Figure 37. MAS0 register

The MAS0 fields are described below.

Bit    Description
32     Array Type Select (ATSEL) [Category: Embedded.Hypervisor.LRAT]
       Selects LRAT or TLB for access for tlbwe and tlbre. In guest state, MAS0_ATSEL is treated as if it were zero such that a TLB array is always selected.
       0  TLB
       1  LRAT
34:35  TLB Select (TLBSEL)
       If ATSEL=0 or MSR_GS=1, selects TLB for access. If ATSEL=1, TLBSEL is treated as reserved.
       00  TLB0
       01  TLB1
       10  TLB2
       11  TLB3
36:47  Entry Select (ESEL)
       Identifies an entry in the selected array to be used for tlbwe and tlbre. Valid values for ESEL are from 0 to TLBnCFG_ASSOC - 1 for a TLB array and 0 to LRATCFG_ASSOC - 1 for the LRAT. That is, ESEL selects the entry in the selected array from the set of entries which can be used for translating addresses with the EPN (if TLBnCFG_HES=0 and either MAS0_ATSEL=0 or MSR_GS=1), the combination of EPN, SIZE, and PID (for tlbwe if TLBnCFG_HES=1 and either MAS0_ATSEL=0 or MSR_GS=1), or the LPN (if MAS0_ATSEL=1 and MSR_GS=0) specified by MAS2_EPN. For fully-associative TLB or LRAT arrays, ESEL ranges from 0 to TLBnCFG_NENTRY - 1 or 0 to LRATCFG_NENTRY - 1, respectively.
49     Hardware Entry Select (HES) [MAV=2.0]
       Determines how the TLB entry within the selected TLB array is selected by tlbwe if a TLB entry is to be written (MAS0_ATSEL=0 or MSR_GS=1). If an LRAT entry is to be written (MAS0_ATSEL=1 and MSR_GS=0) by a tlbwe, HES must be 0. Otherwise the result is undefined. This field has no effect on other TLB Management instructions. Whether an implementation supports this bit for a TLB array is indicated by TLBnCFG_HES. If TLBnCFG_HES = 0, HES is ignored and treated as 0 for tlbwe.
       0  The entry is selected by MAS0_ESEL and a hardware generated hash based on MAS1_TID, MAS1_TSIZE, and MAS2_EPN.
       1  The entry is selected by a hardware replacement algorithm and a hardware generated hash based on MAS1_TID, MAS1_TSIZE, and MAS2_EPN.
50:51  Write Qualifier (WQ) [MAV=2.0 and Category: Embedded.TLB Write Conditional]
       Qualifies the TLB write operation performed by tlbwe if a TLB entry is to be written (MAS0_ATSEL=0 or MSR_GS=1). If an LRAT entry is to be written (MAS0_ATSEL=1 and MSR_GS=0) by a tlbwe and the Embedded.TLB Write Conditional category is supported, WQ must be 0b00. Otherwise the result is undefined. This field has no effect on other TLB Management instructions. Whether an implementation supports this field is indicated by MMUCFG_TWC. If MMUCFG_TWC = 0, WQ is ignored and treated as 0b00 for tlbwe.
       00  The selected TLB entry is written regardless of the TLB-reservation. The TLB-reservation is cleared.
       01  The selected TLB entry is written if and only if the TLB-reservation exists. A tlbwe with this value is called a TLB Write Conditional. The TLB-reservation is cleared.
       10  The TLB-reservation is cleared; no TLB entry is written.
       11  Reserved
52:63  Next Victim (NV)
       NV is a hint to software to identify the next victim to be targeted for a TLB miss replacement operation for those TLBs that support the NV function. If the TLB selected by MAS0_TLBSEL does not support the NV function, this field is undefined. The method of determining the next victim is implementation-dependent. NV is updated on tlbsx hit and miss cases as shown in Table 11, on execution of tlbre if the TLB array being accessed supports the NV field, and on TLB Error interrupts if the Embedded.Hypervisor category is not supported, MAS Register updates are enabled for interrupts directed to the hypervisor (EPCR_DMIUH = 0), or the interrupt is directed to the guest state. When NV is updated by a supported TLB array, the NV field will always present a value that can be used in the MAS0_ESEL field. The LRAT array does not support Next Victim.

All other fields are reserved.

6.10.3.9 MAS1 Register

The MAS1 register contains fields used for reading and writing an LRAT or TLB entry. MAS1 register fields are also loaded by the execution of the tlbsx instruction and by the occurrence of an Instruction or Data TLB Error interrupt under certain conditions. TLB fields loaded from the MAS1 register are used for selecting a TLB entry during translation. If the Embedded.Hypervisor.LRAT category is supported, LRAT fields V and LSIZE, which are loaded from MAS1_V and MAS1_TSIZE, are used for selecting an LRAT entry for translating LPNs when tlbwe is executed in guest supervisor state and, if the Embedded.Page Table category is supported, during page table lookups performed when the PTE_ARPN is treated as an LPN (the Embedded.Hypervisor.LRAT category is supported and the TGS bit of the corresponding indirect TLB entry is 1). MAS1 is a privileged register. The layout of the MAS1 register is shown in Figure 38 for MAV=1.0 and in Figure 39 for MAV=2.0.

  Bits    Field
  32      V
  33      IPROT
  34:47   TID
  48:50   ///
  51      TS
  52:55   TSIZE
  56:63   ///

Figure 38. MAS1 register [MAV=1.0]

  Bits    Field
  32      V
  33      IPROT
  34:47   TID
  48:49   //
  50      IND
  51      TS
  52:56   TSIZE
  57:63   ///

Figure 39. MAS1 register [MAV=2.0]

The MAS1 fields are described below.

Bit    Description
32     Valid Bit (V)
       See the corresponding TLB bit definition in Section 6.7.1 and the corresponding LRAT bit definition in Section 6.9.
33     Invalidate Protect (IPROT)
       See the corresponding TLB bit definition in Section 6.7.1.
34:47  Translation Identity (TID)
       See the corresponding TLB field definition in Section 6.7.1.
50     Indirect (IND) [MAV=2.0 and Category: Embedded.Page Table]
       See the corresponding TLB bit definition in Section 6.7.1.
51     Translation Space (TS)
       See the corresponding TLB bit definition in Section 6.7.1.
52:55  Translation Size (TSIZE) [MAV=1.0]
       See the TLB SIZE field definition in Section 6.7.1.
52:56  Translation Size (TSIZE) [MAV=2.0]
       See the TLB SIZE field definition in Section 6.7.1 and the LRAT LSIZE field definition in Section 6.9.

All other fields are reserved.

6.10.3.10 MAS2 Register

The MAS2 register is a 64-bit register in 64-bit implementations and a 32-bit register in 32-bit implementations. The MAS2 register contains fields used for reading and writing an LRAT or TLB entry. MAS2 register fields are also loaded by the execution of the tlbsx instruction and by the occurrence of an Instruction or Data TLB Error interrupt under certain conditions. The register contains fields for specifying the effective page address and the storage control attributes for a TLB entry. If the Embedded.Hypervisor.LRAT category is supported, the MAS2 register EPN field can also be used for specifying the logical page number for an LRAT entry. The only MAS2 field used for the LRAT array is EPN. MAS2 is a privileged register. The layout of the MAS2 register is shown in Figure 40 for MAV=1.0 and in Figure 41 for MAV=2.0.

  Bits    Field
  0:51    EPN
  52:56   ///
  57      ACM
  58      VLE
  59      W
  60      I
  61      M
  62      G
  63      E

Figure 40. MAS2 register [MAV = 1.0]

  Bits    Field
  0:53    EPN
  54:56   ///
  57      ACM
  58      VLE
  59      W
  60      I
  61      M
  62      G
  63      E

Figure 41. MAS2 register [MAV = 2.0]

The MAS2 fields are described below.

Bit    Description
0:51   Effective Page Number (EPN) [MAV=1.0]
       See the corresponding TLB bit definition in Section 6.7.1. Bits that correspond to an offset within the smallest virtual page implemented need not be implemented. Unimplemented EPN bits are treated as 0s. EPN_0:31 are implemented only in 64-bit implementations as the upper 32 bits of the effective address of the page.
0:53   Effective Page Number (EPN) [MAV=2.0]
       See the corresponding TLB bit definition in Section 6.7.1 and the LRAT LPN field definition in Section 6.9. Bits that correspond to an offset within the smallest virtual page implemented need not be implemented. Unimplemented EPN bits are treated as 0s. EPN_0:31 are implemented only in 64-bit implementations as the upper 32 bits of the effective address of the page.
57     Alternate Coherency Mode (ACM)
       See the corresponding TLB bit definition in Section 6.7.1. If ACM is not supported by the implementation, this bit is treated as reserved.

       Programming Note
       Some previous implementations may have a TLB storage bit accessed via this bit position and labeled as X0. Software should not use the presence of this bit (the ability to set to 1 and read a 1) to determine if the implementation supports the Alternate Coherency Mode.

58     VLE Mode (VLE) [Category: VLE]
       See the corresponding TLB bit definition in Section 6.7.1. If the VLE category is not supported, this bit is treated as reserved.

       Programming Note
       Some previous implementations may have a TLB storage bit accessed via this position and labeled as X1. Software should not use the presence of this bit (the ability to set to 1 and read a 1) to determine if the implementation supports the VLE.

59     Write Through (W)
       See the corresponding TLB bit definition in Section 6.7.1.
60     Caching Inhibited (I)
       See the corresponding TLB bit definition in Section 6.7.1.
61     Memory Coherence Required (M)
       See the corresponding TLB bit definition in Section 6.7.1.
62     Guarded (G)
       See the corresponding TLB bit definition in Section 6.7.1.
63     Endianness (E)
       See the corresponding TLB bit definition in Section 6.7.1.

All other fields are reserved.

6.10.3.11 MAS3 Register

The MAS3 register contains fields used for reading and writing an LRAT or TLB entry. MAS3 register fields are also loaded by the execution of the tlbsx instruction and by the occurrence of an Instruction or Data TLB Error interrupt under certain conditions. The MAS3 register contains fields for specifying the real page address, user defined attributes, and the permission attributes for a TLB entry. If the Embedded.Page Table category is supported, MAS3 also contains a field specifying the minimum page size specified by each Page Table Entry that is mapped by the indirect TLB entry. If the Embedded.Hypervisor.LRAT category is supported, the low-order LRPN bits of the LRAT array can be read into or written from MAS3_RPNL by hypervisor software. If the Embedded.Hypervisor.LRAT category is supported, the RPN specified by MAS7 and MAS3 is treated as an LPN for tlbwe executed in guest supervisor state (see Section 6.9). MAS3 is a privileged register. If the Embedded.Page Table category is supported, MAS3 has different meanings depending on the MAS1_IND value. For MAS1_IND = 0, the layout of the MAS3 register is shown in Figure 42 for MAV=1.0 and in Figure 43 for MAV=2.0.

  Bits    Field
  32:51   RPNL
  52:53   //
  54:57   U0:U3
  58      UX
  59      SX
  60      UW
  61      SW
  62      UR
  63      SR

Figure 42. MAS3 register for MAS1_IND=0 [MAV=1.0]

  Bits    Field
  32:53   RPNL
  54:57   U0:U3
  58      UX
  59      SX
  60      UW
  61      SW
  62      UR
  63      SR

Figure 43. MAS3 register for MAS1_IND=0 [MAV=2.0]

For MAS1_IND = 1, the layout of the MAS3 register is shown in Figure 44.

  Bits    Field
  32:53   RPNL
  54:57   U0:U3
  58:62   SPSIZE
  63      UND

Figure 44. MAS3 register for MAS1_IND=1 [MAV=2.0 and Category: E.PT]

The MAS3 fields are described below.

Bit    Description
32:51  Real Page Number (bits 32:51) (RPNL or RPN_32:51) [MAV=1.0]
       The real page number is formed by the upper n bits of (MAS7_RPNU || MAS3_RPNL), where n = 64 - log2(page size in bytes) and page size is specified by MAS1_TSIZE for a tlbwe instruction and by the SIZE field of the TLB entry if a TLB entry is being read by a tlbre or tlbsx instruction. RPN_0:31 are accessed through MAS7. RPNL bits corresponding to bits that are not implemented in the RPN field of the TLB are treated as reserved.
32:53  Real Page Number (bits 32:53) (RPNL or RPN_32:53) [MAV=2.0]
       The real page number is formed by the upper n bits of (MAS7_RPNU || MAS3_RPNL), where n = 64 - log2(page size in bytes) and page size is specified by MAS1_TSIZE for a tlbwe instruction, by the SIZE field of the TLB entry if a TLB entry is being read by a tlbre or tlbsx instruction, or by the LSIZE field of the LRAT entry if an LRAT entry is being read by a tlbre instruction. RPNL bits corresponding to bits that are not implemented in the RPN field of the TLB are treated as reserved.
54:57  User Bits (U0:U3)
       See the corresponding TLB bit definition in Section 6.7.1. If one or more of these bits is not implemented in the TLB, the corresponding MAS3 bit is treated as reserved.

If MAS1_IND = 0, MAS3_58:63 are defined as follows:

58     User State Execute Enable (UX)
       See the corresponding TLB bit definition in Section 6.7.1.
59     Supervisor State Execute Enable (SX)
       See the corresponding TLB bit definition in Section 6.7.1.
60     User State Write Enable (UW)
       See the corresponding TLB bit definition in Section 6.7.1.
61     Supervisor State Write Enable (SW)
       See the corresponding TLB bit definition in Section 6.7.1.
62     User State Read Enable (UR)
       See the corresponding TLB bit definition in Section 6.7.1.
63     Supervisor State Read Enable (SR)
       See the corresponding TLB bit definition in Section 6.7.1.

If MAS1_IND = 1, MAS3_58:63 are defined as follows:

58:62  Sub-Page Size (SPSIZE)
       See the corresponding TLB field definition in Section 6.7.1.
63     Undefined (UND)
       The value of this bit is undefined after a tlbre or tlbsx.

All other fields are reserved.

6.10.3.12 MAS4 Register

The MAS4 register contains fields for specifying default information to be pre-loaded on an Instruction or Data TLB Error interrupt if the Embedded.Hypervisor category is not supported, MAS Register updates are enabled for interrupts directed to the hypervisor (EPCR_DMIUH = 0), or the interrupt is directed to the guest state. See Section 6.11.4.7 for more information. MAS4 is a privileged register. The layout of the MAS4 register is shown in Figure 45 for MAV=1.0 and in Figure 46 for MAV=2.0.

  Bits    Field
  32:33   //
  34:35   TLBSELD
  36:51   ///
  52:55   TSIZED
  56      /
  57      ACMD
  58      VLED
  59      WD
  60      ID
  61      MD
  62      GD
  63      ED

Figure 45. MAS4 register [MAV=1.0]

  Bits    Field
  32:33   //
  34:35   TLBSELD
  36:47   ///
  48      INDD
  49:51   //
  52:56   TSIZED
  57      ACMD
  58      VLED
  59      WD
  60      ID
  61      MD
  62      GD
  63      ED

Figure 46. MAS4 register [MAV=2.0]

The MAS4 fields are described below.

Bit    Description
34:35  TLBSEL Default Value (TLBSELD)
       Specifies the default value loaded in MAS0_TLBSEL on the interrupt.
48     IND Default Value (INDD)
       Specifies the default value loaded in MAS1_IND and MAS6_SIND on the interrupt.
52:55  Default TSIZE Value (TSIZED) [MAV=1.0]
       Specifies the default value loaded into MAS1_TSIZE on a TLB miss exception.
52:56  Default TSIZE Value (TSIZED) [MAV=2.0]
       Specifies the default value loaded into MAS1_TSIZE on a TLB miss exception. If MMUCFG_TWC = 1, TSIZED is also the default value loaded into MAS6_ISIZE on the interrupt.
57     Default ACM Value (ACMD)
       Specifies the default value loaded into MAS2_ACM on the interrupt.
58     Default VLE Value (VLED) [Category: VLE]
       Specifies the default value loaded into MAS2_VLE on the interrupt.
59     Default W Value (WD)
       Specifies the default value loaded into MAS2_W on the interrupt.
60     Default I Value (ID)
       Specifies the default value loaded into MAS2_I on the interrupt.
61     Default M Value (MD)
       Specifies the default value loaded into MAS2_M on the interrupt.
62     Default G Value (GD)
       Specifies the default value loaded into MAS2_G on the interrupt.
63     Default E Value (ED)
       Specifies the default value loaded into MAS2_E on the interrupt.

All other fields are reserved.

6.10.3.13 MAS5 Register [Category: Embedded.Hypervisor]

The MAS5 register contains fields for specifying LPID and GS values to be used when searching TLB entries with the tlbsrx. and tlbsx instructions. The SLPID and SGS fields are used to match TLPID and TGS fields in the TLB entry. The MAS5 fields are also used for selecting TLB entries to be invalidated by the tlbilx or tlbivax instructions. MAS5 is a hypervisor resource. The layout of the MAS5 register is shown in Figure 47.

  Bits    Field
  32      SGS
  33:51   ///
  52:63   SLPID

Figure 47. MAS5 register

The MAS5 fields are described below.

Bit    Description
32     Search GS (SGS)
       Specifies the GS value used when searching the TLB during execution of tlbsrx. and tlbsx and for selecting TLB entries to be invalidated by tlbilx or tlbivax. The SGS field is compared with the TGS field of each TLB entry to find a matching entry.
52:63  Search Logical Partition ID (SLPID)
       Specifies the LPID value used when searching the TLB during execution of tlbsrx. and tlbsx and for selecting TLB entries to be invalidated by tlbilx or tlbivax. The SLPID field is compared with the TLPID field of each TLB entry to find a matching entry. Only the least significant MMUCFG_LPIDSIZE bits of SLPID are implemented.

All other fields are reserved.

Programming Note
Hypervisor software should generally treat MAS5 as part of the partition state.

6.10.3.14 MAS6 Register

The MAS6 register contains fields for specifying PID, IND, and AS values to be used when searching TLB entries with the tlbsx instruction and, if MMUCFG_TWC = 1 or TLBnCFG_HES = 1, for specifying the PID, IND, AS, and size of the virtual address space to be used for selecting TLB entries to be invalidated by the tlbilx or tlbivax instructions. MAS6 is a privileged register. The layout of the MAS6 register is shown in Figure 48 for MAV=1.0 and in Figure 49 for MAV=2.0.

  Bits    Field
  32:33   //
  34:47   SPID
  48:62   ///
  63      SAS

Figure 48. MAS6 register [MAV = 1.0]

  Bits    Field
  32:33   //
  34:47   SPID
  48:51   ///
  52:56   ISIZE
  57:61   ///
  62      SIND
  63      SAS

Figure 49. MAS6 register [MAV = 2.0]

The MAS6 fields are described below.

Bit    Description
34:47  Search PID (SPID)
       Specifies the value of PID used when searching the TLB during execution of tlbsrx. and tlbsx. It also defines the PID of the TLB entry to be invalidated by tlbilx with T=1 or T=3 and tlbivax with EA_61=0. The number of bits implemented is the same as the number of bits implemented in the PID register.

       Programming Note
       The SPID field was referred to as SPID0 in previous versions of the architecture.

52:56  Invalidation Size (ISIZE) [MAV=2.0]
       ISIZE defines the size of the virtual address space mapped by the TLB entry to be invalidated by tlbilx T=3 and tlbivax. This field is only supported if MMUCFG_TWC = 1 or, for any TLB array, TLBnCFG_HES = 1. Otherwise this field is reserved.

       Programming Note
       To make code more portable across implementations, software should always set ISIZE before executing tlbilx T=3 and tlbivax.

62     Indirect Value for Searches (SIND) [MAV=2.0 and Category: Embedded.Page Table]
       Specifies the value of IND used when searching the TLB during execution of tlbsrx. and tlbsx. It also defines the Indirect (IND) value of the TLB entry to be invalidated by tlbilx T=3 and tlbivax.
63     Address Space Value for Searches (SAS)
       Specifies the value of AS used when searching the TLB during execution of tlbsrx. and tlbsx. It also defines the TS value of the TLB entry to be invalidated by tlbilx T=3 and tlbivax.

All other fields are reserved.

6.10.3.15 MAS7 Register

The MAS7 register contains a field used for reading and writing an LRAT or TLB entry. The MAS7 register field is also loaded by the execution of the tlbsx instruction. The MAS7 register contains the high order address bits of the RPN for a TLB entry in implementations that support more than 32 bits of physical address. If the Embedded.Hypervisor.LRAT category is supported by such implementations, the high-order LRPN bits of the LRAT array can be read into or written from MAS7 by hypervisor software. If the Embedded.Hypervisor.LRAT category is supported, the RPN specified by MAS7 and MAS3 is treated as an LPN for tlbwe executed in guest supervisor state (see Section 6.9). If no more than 32 bits of physical addressing are supported, it is implementation-dependent whether MAS7 is implemented. MAS7 is a privileged register. The layout of the MAS7 register is shown in Figure 50.

  Bits    Field
  32:63   RPNU

Figure 50. MAS7 register

The MAS7 fields are described below.

Bit    Description
32:63  Real Page Number (bits 0:31) (RPNU or RPN_0:31)
       RPN_32:53 are accessed through MAS3. RPNU bits corresponding to bits that are not implemented in the RPN field of the TLB are treated as reserved.

6.10.3.16 MAS8 Register [Category: Embedded.Hypervisor]

The MAS8 register contains fields used for reading and writing an LRAT or TLB entry. The MAS8 register fields are also loaded from a matching TLB entry by execution of a tlbsx instruction that is successful. The associated TLB fields are used to select a TLB entry during TLB address translation and TLB searches, and to force a Data Storage Interrupt to be directed to the hypervisor state. MAS8 is a hypervisor resource. The LPID field of the LRAT is used to select an LRAT entry during LRAT translation. The layout of the MAS8 register is shown in Figure 51.

  Bits    Field
  32      TGS
  33      VF
  34:51   ///
  52:63   TLPID

Figure 51. MAS8 register

The MAS8 fields are described below.

Bit    Description
32     Translation Guest State (TGS)
       See the corresponding TLB bit definition in Section 6.7.1.
33     Translation Virtualization Fault (VF)
       See the corresponding TLB bit definition in Section 6.7.1.
52:63  Translation Logical Partition ID (TLPID)
       See the corresponding TLB bit definition in Section 6.7.1 and the LRAT LPID field definition in Section 6.9.

All other fields are reserved.

Programming Note
Hypervisor software should generally treat MAS8 as part of the partition state. After executing tlbsx and tlbre, hypervisor software may need to restore MAS8 before returning to guest state. This is especially important if the Embedded.Hypervisor.LRAT category is supported because a guest can execute a tlbwe instruction that writes a TLB entry with the MAS8 values.

Programming Note
For a TLB entry with VF=1, hypervisor software should have the execution permission bits set so that an instruction fetch of the page is prevented. The VF bit can be used to force a Data Storage interrupt for virtualization of MMIO.

6.10.3.17 Accesses to Paired MAS Registers

In 64-bit mode, certain MAS registers can be accessed in pairs with a single mtspr or mfspr instruction. The registers that can be accessed this way are shown in Table 10. These register pairs are treated as if they are a single 64-bit register by a mtspr or mfspr instruction.

Table 10: MAS Register Pairs

  64-bit Pairs    SPR Number   Privileged mtspr & mfspr   Cat(2)
  MAS5 || MAS6    348          hypv(1)                    E.HV; 64
  MAS8 || MAS1    349          hypv(1)                    E.HV; 64
  MAS7 || MAS3    372          yes                        64
  MAS0 || MAS1    373          yes                        64

  1  This register is a hypervisor resource, and can be accessed by one of these instructions only in hypervisor state (see Chapter 2).
  2  See Section 1.3.5 of Book I. If multiple categories are listed, the register pair is only provided if all categories are supported. Otherwise the SPR number is treated as reserved.

6.10.3.18 MAS Register Update Summary

Table 11 summarizes how MAS registers are modified by Instruction TLB Error interrupt, Data TLB Error interrupt and the TLB Management instructions.
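The real page number rule in Section 6.10.3.11 and the MAS7 || MAS3 pair in Table 10 can be illustrated with a small model. The following is a minimal sketch, not architected behavior. It assumes the MAV=2.0 layout (RPNL in MAS3 bits 32:53) and a page size of 2^TSIZE KB, as in Table 9. The function names `split_pair` and `real_page_number` are hypothetical, and the 64-bit input stands in for the value a single mfspr of SPR 372 would return.

```python
# Illustrative model only: names are hypothetical, and the MAV=2.0
# field positions are taken from Figures 43 and 50 as an assumption.

def split_pair(pair64):
    """Split a 64-bit MAS7 || MAS3 value into its two 32-bit halves.

    MAS7 occupies the upper 32 bits of the pair, MAS3 the lower 32 bits.
    """
    mas7 = (pair64 >> 32) & 0xFFFFFFFF
    mas3 = pair64 & 0xFFFFFFFF
    return mas7, mas3


def real_page_number(mas7, mas3, tsize):
    """Return the upper n bits of (RPNU || RPNL), n = 64 - log2(page bytes).

    Assumes a page size of 2^tsize KB, so log2(page bytes) = tsize + 10.
    """
    rpnu = mas7 & 0xFFFFFFFF            # MAS7 bits 32:63 hold RPN bits 0:31
    rpnl = (mas3 >> 10) & 0x3FFFFF      # MAS3 bits 32:53 hold RPN bits 32:53
    concat = (rpnu << 22) | rpnl        # RPNU || RPNL, 54 bits wide
    n = 64 - (tsize + 10)               # number of significant RPN bits
    return concat >> (54 - n)           # keep only the upper n bits


# Example: real address 0xF_1234_5000 in a 4KB page (tsize = 2),
# with all six permission bits set in MAS3 bits 58:63.
mas7, mas3 = split_pair(0x0000000F1234503F)
rpn = real_page_number(mas7, mas3, tsize=2)
```

For larger pages the low-order RPNL bits fall below the page boundary and are dropped by the final shift. This matches the statement that only the upper n bits form the real page number.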
982 Power ISATM Book III-E Version 2.06 Table 11:MAS Register Update Summary for TLB operations Value Loaded on Event Data TLB Error Inter- Data TLB Data TLB rupt on Load MAS Field Error Inter- Error Inter- or Store that Updated rupt on Exter- rupt on Exter- is not cate- tlbsx hit tlbsx miss tlbre nal Process nal Process gory E.PD or ID Load ID Store Instruction 2 2 TLB Error Interrupt2 MAS0ATSEL 0 0 0 0 0 -- MAS0TLBSEL MAS4TLBSELD MAS4TLBSELD MAS4TLBSELD TLB array that MAS4TLBSELD -- hit MAS0ESEL if TLB array if TLB array if TLB array index of entry if TLB array -- [MAS4TLBSEL [MAS4TLBSEL [MAS4TLBSEL that hit [MAS4TLBSEL D] supports D] supports D] supports D] supports next victim next victim next victim next victim then hardware then hardware then hardware then hardware hint, hint, hint, hint, else unde- else unde- else unde- else unde- fined fined fined fined MAS0HES TLBnCFGHES TLBnCFGHES TLBnCFGHES 0 TLBnCFGHES -- for array spec- for array spec- for array spec- for array spec- ified by ified by ified by ified by MAS4TLBSELD MAS4TLBSELD MAS4TLBSELD MAS4TLBSELD MAS0WQ 0b01 0b01 0b01 0b01 0b01 -- MAS0NV if TLB array if TLB array if TLB array if TLB array if TLB array if TLB array [MAS4TLBSEL [MAS4TLBSEL [MAS4TLBSEL with the [MAS4TLBSEL [MAS0TLBSEL] D] supports D] supports D] supports matching entry D] supports supports next next victim next victim next victim supports next next victim victim then then next then next then next victim then then next hardware hint, hardware hint, hardware hint, hardware hint, hardware hint, hardware hint, else unde- else unde- else unde- else unde- else unde- else unde- fined fined fined fined fined fined MAS1V 1 1 1 1 0 TLBV MAS1IPROT 0 0 0 TLBIPROT 0 TLBIPROT MAS1TID PID EPLCEPID EPSCEPID TLBTID MAS6SPID TLBTID MAS1IND MAS4INDD MAS4INDD MAS4INDD TLBIND MAS4INDD TLBIND MAS1TS MSRIS or EPLCEAS EPSCEAS TLBTS MAS6SAS TLBTS MSRDS MAS1TSIZE MAS4TSIZED MAS4TSIZED MAS4TSIZED TLBSIZE MAS4TSIZED TLBSIZE MAS2EPN EA0:531 EA0:53 EA0:53 TLBEPN 
undefined TLBEPN MAS2ACM MAS4ACMD MAS4ACMD MAS4ACMD TLBACM MAS4ACMD TLBACM MAS2VLE MAS4VLED MAS4VLED MAS4VLED TLBVLE MAS4VLED TLBVLE MAS2W I M G E MAS4WD ID MD MAS4WD ID MD MAS4WD ID MD TLBW I M G E MAS4WD ID MD TLBW I M G E GD ED GD ED GD ED GD ED MAS3RPNL 0 0 0 TLBRPN[32:53] 0 TLBRPN[32:53] MAS3U0:U3 0 0 0 TLBU0:U3 0 TLBU0:U3 Chapter 6. Storage Control 983 Version 2.06 Table 11:MAS Register Update Summary for TLB operations Value Loaded on Event Data TLB Error Inter- Data TLB Data TLB rupt on Load MAS Field Error Inter- Error Inter- or Store that Updated rupt on Exter- rupt on Exter- is not cate- tlbsx hit tlbsx miss tlbre nal Process nal Process gory E.PD or ID Load ID Store Instruction 2 2 TLB Error Interrupt2 MAS3UX SX UW 0 0 0 if Category: 0 if Category: SW UR SR E.PT sup- E.PT sup- ported and ported and TLBIND TLBIND = 0 then = 0 then TLBUX SX UW TLBUX SX UW SW UR SR SW UR SR MAS3SPSIZE (see the entry (see the entry (see the entry if Category: (see the entry if Category: for MAS3UX SX for MAS3UX SX for MAS3UX SX E.PT sup- for MAS3UX SX E.PT sup- UW SW UR SR) UW SW UR SR) UW SW UR SR) ported and UW SW UR SR) ported and TLBIND TLBIND = 1 then TLB- = 1 then TLB- SPSIZE SPSIZE MAS3UND (see the entry (see the entry (see the entry if Category: (see the entry if Category: for MAS3UX SX for MAS3UX SX for MAS3UX SX E.PT sup- for MAS3UX SX E.PT sup- UW SW UR SR) UW SW UR SR) UW SW UR SR) ported and UW SW UR SR) ported and TLBIND TLBIND = 1 then = 1 then undefined undefined MAS5 & --3 --3 --3 -- -- -- MAS4 MAS6SPID PID EPLCEPID EPSCEPID -- -- -- MAS6ISIZE if MAS4TSIZED MAS4TSIZED MAS4TSIZED -- -- -- TLBnCFGHES=1 or MAS6SAS MSRIS or EPLCEAS EPSCEAS -- -- -- MSRDS MAS6SIND MAS4INDD MAS4INDD MAS4INDD -- -- -- MAS7RPNU 0 0 0 TLBRPN[0:31] 0 TLBRPN[0:31] MAS8TGS VF --3 --3 --3 TLBTGS VF -- TLBTGS VF TLPID TLPID TLPID 1. If MSRCM=0 (32-bit mode) at the time of the exception, EPN0:31 are set to 0. 2. 
If the E.HV category is not supported, MAS Register updates are enabled for interrupts directed to the hypervisor (EPCRDMIUH = 0), or the interrupt is directed to the guest state.
3. MAS5 and MAS8 are not updated on a Data or Instruction TLB Error interrupt. The hypervisor should ensure they already contain values appropriate to the partition.

6.11 Storage Control Instructions

6.11.1 Cache Management Instructions

This section describes aspects of cache management that are relevant only to privileged software programmers.

For a dcbz or dcba instruction that causes the target block to be newly established in the data cache without being fetched from main storage, the hardware need not verify that the associated real address is valid. The existence of a data cache block that is associated with an invalid real address (see Section 6.6) can cause a delayed Machine Check interrupt or a delayed Checkstop.

Each implementation provides an efficient means by which software can ensure that all blocks that are considered to be modified in the data cache have been copied to main storage before the thread enters any power conserving mode in which data cache contents are not retained.

Data Cache Block Invalidate X-form

dcbi RA,RB

    31    ///    RA    RB    470    /
    0     6      11    16    21     31

    if RA=0 then b ← 0
    else         b ← (RA)
    EA ← b + (RB)
    InvalidateDataCacheBlock( EA )

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any thread, then the block is invalidated in those data caches. On some implementations, before the block is invalidated, if any locations in the block are considered to be modified in any such data cache, those locations are written to main storage and additional locations in the block may be written to main storage.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of this thread, then the block is invalidated in that data cache. On some implementations, before the block is invalidated, if any locations in the block are considered to be modified in that data cache, those locations are written to main storage and additional locations in the block may be written to main storage.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

This instruction is treated as a Store (see Section 6.7.6.5) on implementations that invalidate a block without first writing to main storage all locations in the block that are considered to be modified in the data cache, except that the invalidation is not ordered by mbar. On other implementations this instruction is treated as a Load (see the section cited above).

If a thread holds a reservation and some other thread executes a dcbi that specifies a location in the same reservation granule, the reservation may be lost only if the dcbi is treated as a Store.

dcbi may cause a cache locking exception, the details of which are implementation-dependent.

This instruction is privileged.

Special Registers Altered:
    None

6.11.2 Cache Locking [Category: Embedded Cache Locking]

The Embedded Cache Locking category defines instructions and methods for locking cache blocks for frequently used instructions and data. Cache locking allows software to instruct the cache to keep latency sensitive data readily available for fast access. This is accomplished by marking individual cache blocks as locked.

A locked block differs from a normal block in the cache in the following way:
- blocks that are locked in the cache do not participate in the normal replacement policy when a block must be replaced.

6.11.2.1 Lock Setting and Clearing

Blocks are locked into the cache by software using Cache Locking instructions. The following instructions are provided to lock data items into the data and instruction cache:
- dcbtls - Data cache block touch and lock set.
- dcbtstls - Data cache block touch for store and lock set.
- icbtls - Instruction cache block touch and lock set.

The RA and RB operands in these instructions are used to identify the block to be locked. The CT field indicates which cache in the cache hierarchy should be targeted. (See Section 4.3 of Book II.)

These instructions are similar in nature to the dcbt, dcbtst, and icbt instructions, but are not hints and thus locking instructions do not execute speculatively and may cause additional exceptions. For unified caches, both the instruction lock set and the data lock set target the same cache.

Similarly, blocks are unlocked from the cache by software using Lock Clear instructions. The following instructions are provided to unlock instructions and data in their respective caches:
- dcblc - Data cache block lock clear.
- icblc - Instruction cache block lock clear.

The RA and RB operands in these instructions are used to identify the block to be unlocked. The CT field indicates which cache in the cache hierarchy should be targeted.

Additionally, an implementation-dependent method can be provided for software to clear all the locks in the cache.

An implementation is not required to unlock blocks that contain data that has been invalidated unless it is explicitly unlocked with a dcblc or icblc instruction; if the implementation does not unlock the block upon invalidation, the block remains locked even though it contains invalid data. If the implementation does not clear locks when the associated block is invalidated, the method of locking is said to be persistent; otherwise it is not persistent. An implementation may choose to implement locks as persistent or not persistent; however, the preferred method is persistent.

It is implementation-dependent if cache blocks are implicitly unlocked in the following ways:
- A locked block is invalidated as the result of a dcbi, dcbf, dcbfep, icbi, or icbiep instruction.
- A locked block is evicted because of an overlocking condition.
- A snoop hit on a locked block that requires the block to be invalidated. This can occur because the data the block contains has been modified external to the thread, or another thread has explicitly invalidated the block.
- The entire cache containing the locked block is invalidated.

6.11.2.2 Error Conditions

Setting locks in the cache can fail for a variety of reasons. A Lock Set instruction addressing a byte in storage that is not allowed to be accessed by the storage access control mechanism (see Section 6.7.6) will cause a Data Storage interrupt (DSI). Addresses referenced by Cache Locking instructions are always translated as data references; therefore, icbtls instructions that fail to translate or are not allowed by the storage access control mechanism cause Data TLB Error interrupts and Data Storage interrupts, respectively. Additionally, cache locking and clearing operations can fail due to non-privileged access. The methods for determining other failure conditions, such as unable-to-lock or overlocking (see below), are implementation-dependent.

If the Embedded.Hypervisor category is supported and MSRPUCLEP = 1, an attempt to execute a Cache Locking instruction in guest state results in an Embedded Hypervisor Privilege exception if MSRPR = 0 or a cache locking exception if MSRPR = 1. When the Embedded.Hypervisor category is not supported, MSRPUCLEP = 0, or MSRGS = 0, then if a Cache Locking instruction is executed in user mode and MSRUCLE is 0, a cache locking exception occurs. If a Data Storage interrupt occurs as a result of a cache locking exception, one of the following ESR or GESR bits is set to 1 (GESR if the Embedded.Hypervisor category is supported and the interrupt is directed to the guest; otherwise, ESR).

    Bit   Description
    42    DLK0
          0  Default setting.
          1  A dcbtls, dcbtstls, or dcblc instruction was executed in user mode.
    43    DLK1
          0  Default setting.
          1  An icbtls or icblc instruction was executed in user mode.

[Category: Embedded.Hypervisor]
The behavior of cache locking instructions in guest privileged state (dcbtls, dcbtstls, dcblc, icbtls, icblc) is dependent on the setting of MSRPUCLEP. When MSRPUCLEP = 0, cache locking instructions are permitted to execute normally in the guest privileged state. When MSRPUCLEP = 1, cache locking instructions are not permitted to execute in the guest privileged state and cause an Embedded Hypervisor Privilege exception when executed. See Section 4.2.2, "Machine State Register Protect Register (MSRP)".

6.11.2.2.1 Overlocking

If no exceptions occur for the execution of a dcbtls, dcbtstls, or icbtls instruction, an attempt is made to lock the specified block into the cache. If all of the available cache blocks into which the specified block may be loaded are already locked, an overlocking condition occurs. The overlocking condition may be reported in an implementation-dependent manner.

If an overlocking condition occurs, it is implementation-dependent whether the specified block is not locked into the cache or if another locked block is evicted and the specified block is locked.

The selection of which block is replaced in an overlocking situation is implementation-dependent. The overlocking condition is still said to exist, and is reflected in any implementation-dependent overlocking status.

An attempt to lock a block that is already present and valid in the cache will not cause an overlocking condition.

If a cache block is to be loaded because of an instruction other than a Cache Management or Cache Locking instruction and all available blocks into which the block can be loaded are locked, the instruction executes and completes, but no cache blocks are unlocked and the block is not loaded into the cache.

Programming Note
Since caches may be shared among threads, an overlocking condition may occur when loading a block even though a given thread has not locked all the available cache blocks. Similarly, blocks may be unlocked as a result of invalidations by other threads.

6.11.2.2.2 Unable-to-lock and Unable-to-unlock Conditions

If no exceptions occur and no overlocking condition exists, an attempt to set or unlock a lock may fail if any of the following are true:
- The target address is marked Caching Inhibited, or the storage control attributes of the address use a coherency protocol that does not support locking.
- The target cache is disabled or not present.
- The CT field of the instructions contains a value not supported by the implementation.
- Any other implementation-specific error conditions are detected.

If an unable-to-lock or unable-to-unlock condition occurs, the lock set or unlock instruction is treated as a no-op and the condition may be reported in an implementation-dependent manner.

6.11.2.3 Cache Locking Instructions

Data Cache Block Touch and Lock Set X-form

dcbtls CT,RA,RB

    31    /    CT    RA    RB    166    /
    0     6    7     11    16    21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtls instruction provides a hint that the program will probably soon load from the block containing the byte addressed by EA, and that the block containing the byte addressed by EA is to be loaded and locked into the cache specified by the CT field. (See Section 4.3 of Book II.) If the CT field is set to a value not supported by the implementation, no operation is performed.

If the block already exists in the cache, the block is locked without accessing storage. An unable-to-lock condition may occur (see Section 6.11.2.2.2), or an overlocking condition may occur (see Section 6.11.2.2.1).

The dcbtls instruction may complete before the operation it causes has been performed.

The instruction is treated as a Load.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0.

Special Registers Altered:
    None

Data Cache Block Touch for Store and Lock Set X-form

dcbtstls CT,RA,RB

    31    /    CT    RA    RB    134    /
    0     6    7     11    16    21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtstls instruction provides a hint that the program will probably soon store to the block containing the byte addressed by EA, and that the block containing the byte addressed by EA is to be loaded and locked into the cache specified by the CT field. (See Section 4.3 of Book II.) If the CT field is set to a value not supported by the implementation, no operation is performed.

If the block already exists in the cache, the block is locked without accessing storage. An unable-to-lock condition may occur (see Section 6.11.2.2.2), or an overlocking condition may occur (see Section 6.11.2.2.1).

The dcbtstls instruction may complete before the operation it causes has been performed.

The instruction is treated as a Store.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0.

Special Registers Altered:
    None

Instruction Cache Block Touch and Lock Set X-form

icbtls CT,RA,RB

    31    /    CT    RA    RB    486    /
    0     6    7     11    16    21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

The icbtls instruction causes the block containing the byte addressed by EA to be loaded and locked into the instruction cache specified by CT, and provides a hint that the program will probably soon execute code from the block. See Section 4.3 of Book II for a definition of the CT field.

If the block already exists in the cache, the block is locked without refetching from memory.

This instruction is treated as a Load (see Section 4.3 of Book II).

If an unable-to-lock condition occurs (see Section 6.11.2.2.2) no operation is performed.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0.

Special Registers Altered:
    None

Instruction Cache Block Lock Clear X-form

icblc CT,RA,RB

    31    /    CT    RA    RB    230    /
    0     6    7     11    16    21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

The block containing the byte addressed by EA in the instruction cache specified by the CT field is unlocked.

The instruction is treated as a Load.

If an unable-to-lock condition occurs (see Section 6.11.2.2.2) no operation is performed. If the block containing the byte addressed by EA is not locked in the specified cache, no cache operation is performed.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0.

Special Registers Altered:
    None

Data Cache Block Lock Clear X-form

dcblc CT,RA,RB

    31    /    CT    RA    RB    390    /
    0     6    7     11    16    21     31

Let the effective address (EA) be the sum (RA|0)+(RB).
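As a reading aid for the instruction layouts in this section, the following Python sketch packs the X-form fields of the Cache Locking instructions into a 32-bit word and evaluates the effective address rule (RA|0)+(RB). It is illustrative only: the names XO, encode_cache_lock, and effective_address are hypothetical, the general-purpose registers are modeled as a plain list, and the sum is assumed to wrap at 64 bits.

```python
# Illustrative sketch only; names are hypothetical, not part of the architecture.
MASK64 = (1 << 64) - 1

# Extended opcodes (bits 21:30) taken from the instruction layouts in this section.
XO = {"dcbtls": 166, "dcbtstls": 134, "icbtls": 486, "icblc": 230, "dcblc": 390}


def encode_cache_lock(mnemonic, ct, ra, rb):
    """Pack primary opcode 31, CT (bits 7:10), RA (11:15), RB (16:20), XO (21:30).

    Bit 0 is the most significant bit, so a field ending at bit k is shifted
    left by 31 - k. Reserved bits 6 and 31 are left as 0.
    """
    if not (0 <= ct < 16 and 0 <= ra < 32 and 0 <= rb < 32):
        raise ValueError("field out of range")
    return (31 << 26) | (ct << 21) | (ra << 16) | (rb << 11) | (XO[mnemonic] << 1)


def effective_address(gpr, ra, rb):
    """EA = (RA|0) + (RB): register RA contributes 0 when the RA field is 0."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & MASK64  # assumption: 64-bit wraparound


gpr = [0] * 32
gpr[0] = 0x1000          # ignored as a base because the RA field is 0
gpr[3] = 0x2000
print(hex(encode_cache_lock("dcbtls", ct=0, ra=0, rb=3)))  # 0x7c00194c
print(hex(effective_address(gpr, ra=0, rb=3)))             # 0x2000
```

The same packing applies to dcbi if the CT field is held at 0 and the extended opcode 470 is used, since bits 6:10 are reserved in that instruction.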
The block containing the byte addressed by EA in the data cache specified by the CT field is unlocked.

The instruction is treated as a Load.

If an unable-to-lock condition occurs (see Section 6.11.2.2.2) no operation is performed. If the block containing the byte addressed by EA is not locked in the specified cache, no cache operation is performed.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0.

Special Registers Altered:
    None

Programming Note
The dcblc and icblc instructions are used to remove locks previously set by the corresponding lock set instructions.

6.11.3 Synchronize Instruction

The Synchronize instruction is described in Section 4.3 of Book II, but only at the level required by an application programmer (sync with L=0 or L=1). Also see "Ordering of Implicit Accesses to the Page Table" on page 957.

In conjunction with the tlbivax and tlbsync instructions, the sync instruction provides an ordering function for TLB invalidations and related storage accesses on other threads as described in the tlbsync instruction description on page 1010.

6.11.4 LRAT [Category: Embedded.Hypervisor.LRAT] and TLB Management

Unless the Embedded.Page Table category is supported, no format for the Page Tables or the Page Table Entries is implied. Software has significant flexibility in implementing a custom replacement strategy. For example, software may choose to set IPROT=1 for TLB entries that correspond to frequently used storage, so that those entries are never cast out of the TLB and TLB Miss exceptions to those pages never occur. At a minimum, software must maintain a TLB entry or entries for the Instruction and Data TLB Error interrupt handlers.

TLB management is performed in software with some hardware assist. This hardware assist consists of a minimum of:
- Automatic recording of the effective address causing a TLB Error interrupt. For Instruction TLB Error interrupts, the address is saved in the Save/Restore Register 0. For Data TLB Error interrupts, the address is saved in the Data Exception Address Register.
- Automatic updating of the MAS register on the occurrence of a TLB Error interrupt if the Embedded.Hypervisor category is not supported, MAS Register updates are enabled for interrupts directed to the hypervisor (EPCRDMIUH = 0), or the interrupt is directed to the guest state.
- Instructions for reading, writing, searching, invalidating, and synchronizing the TLB. If the Embedded.Hypervisor.LRAT category is supported, a subset of these instructions can also be used for reading and writing the LRAT.

Programming Note
If the Embedded.Hypervisor category is supported and if EPCRITLBGS DTLBGS = 0b00, the hypervisor can virtualize the physical TLB by keeping a software copy of at least the guest operating system TLB entries with IPROT=1 and avoid keeping the guest Instruction and Data TLB Error interrupt handlers in the physical TLB.

6.11.4.1 Reading TLB or LRAT Entries

TLB entries can be read by executing tlbre instructions. If the Embedded.Hypervisor.LRAT category is supported, LRAT entries can also be read by executing tlbre instructions. At the time of tlbre execution, a TLB array is selected if MAS0ATSEL=0 or MSRGS=1. The LRAT array is selected if MAS0ATSEL=1 and MSRGS=0. The LRAT can be read only when in hypervisor state.

If a TLB array is selected, MAS0TLBSEL selects the TLB array to be read. If TLBnCFGHES=0, the TLB entry in the selected TLB array is selected by MAS0ESEL and MAS2EPN. In this case, MAS0ESEL selects one of the possible entries that can be used for a given MAS2EPN. If TLBnCFGHES=1, the TLB entry in the selected TLB array is selected by MAS0ESEL and by a hardware generated hash based on MAS1TID TSIZE, and MAS2EPN. In this case, MAS0ESEL selects one of the possible entries that can be used for a given MAS1TID TSIZE and MAS2EPN.

If an LRAT array is selected, the LRAT entry is selected by MAS0ESEL and MAS2EPN. In this case, MAS0ESEL selects one of the possible entries that can be used for a given MAS2EPN.

Specifying invalid values for MAS0TLBSEL, MAS0ESEL, MAS2EPN or, if TLBnCFGHES=1, MAS1TID or MAS1TSIZE, results in MAS1V being set to 0 and undefined results in the other MAS register fields that are loaded by the tlbre instruction. Which values are invalid is implementation-dependent. For example, even though an implementation has only one TLB array, the implementation could simply ignore MAS0TLBSEL, read the selected entry in the TLB array, and load the MAS registers from the TLB entry regardless of the MAS0TLBSEL value.

Programming Note
When reading TLB entries, MAS2EPN or a subset of the bits in MAS2EPN is used to form the index for accessing the TLB array, i.e. MAS2EPN isn't necessarily an effective page number.

6.11.4.2 Writing TLB or LRAT Entries

TLB entries can be written by executing tlbwe instructions. If the Embedded.Hypervisor.LRAT category is supported, LRAT entries can also be written by executing tlbwe instructions. At the time of tlbwe execution, a TLB array is selected if MAS0ATSEL=0 or MSRGS=1. The LRAT array is selected if MAS0ATSEL=1 and MSRGS=0. The LRAT can be written only when in hypervisor state.

If a TLB array is selected, MAS0TLBSEL selects the TLB array to be written. If TLBnCFGHES=0, the TLB entry in the selected TLB array is selected by MAS0ESEL and MAS2EPN. In this case, MAS0ESEL selects one of the possible entries that can be used for a given MAS2EPN. If TLBnCFGHES=1 and MAS0HES=0, the TLB entry in the selected TLB array is selected by a hardware generated hash based on MAS1TID TSIZE, MAS2EPN, and MAS0ESEL. In this case, MAS0ESEL selects one of the possible entries that can be used for a given MAS0TLBSEL, MAS1TID TSIZE, and MAS2EPN. If TLBnCFGHES=1 and MAS0HES=1, the TLB entry in the selected TLB array is selected by a hardware replacement algorithm and a hardware generated hash based on MAS1TID TSIZE and MAS2EPN.

If an LRAT array is selected, the LRAT entry is selected by MAS0ESEL and MAS2EPN. In this case, MAS0ESEL selects one of the possible entries that can be used for a given MAS2EPN. LRAT entries can be written only with MAS0HES=0.

At the time of tlbwe execution, the MAS registers contain the contents to be written to the indexed TLB entry. Upon completion of the tlbwe instruction, the contents of the MAS registers corresponding to TLB entry fields will be written to the indexed TLB entry, except that if the Embedded.Hypervisor.LRAT category is supported, guest execution of TLB Management instructions is enabled (EPCRDGTMI=0), MSRPR = 0, MSRGS = 1, and, for the TLB array to be written, TLBnCFGGTWE = 1, the RPN from the MAS registers is treated like an LPN and translated by the LRAT, and the RPN from the LRAT is written to the TLB entry if the translation is successful. See Section 6.9. If the LRAT translation fails, an LRAT Miss exception occurs.

If the Embedded.Hypervisor.LRAT category is supported and a guest supervisor executes a tlbwe instruction with MAS1IPROT=1 or for which the entry to be overwritten has IPROT=1, an Embedded Hypervisor Privilege exception occurs. However, if MAS0WQ=0b01, it is implementation-dependent whether the Embedded Hypervisor Privilege exception occurs.

If the Embedded.Hypervisor.LRAT category is supported and a guest supervisor executes a tlbwe instruction with MAS0HES=0, it is implementation-dependent whether the Embedded Hypervisor Privilege exception occurs.

If a TLB entry is being written with MAS0HES=1, the hardware replacement algorithm picks an entry in the selected array from the set of entries which can be used for translating addresses with the specified TID, TSIZE and EPN. Whenever possible, an entry with IPROT equal to 0 is selected. However, an Embedded Hypervisor Privilege exception occurs on a tlbwe if all the following conditions are met.
- The Embedded.Hypervisor.LRAT category is supported.
- The tlbwe is executed in guest supervisor state.
- IPROT=1 for all entries which can be used for translating addresses with the specified TID, TSIZE and EPN.
- MAS0WQ=0b00

If MAS0WQ=0b01, MAS0HES=1, and the first three of the above conditions are met, it is implementation-dependent whether an Embedded Hypervisor Privilege exception occurs.

For TLBs with TLBnCFGHES=1, the relationship between the TLB entry selected by a tlbwe with MAS0HES=0 versus the TLB entry selected by a tlbwe with MAS0HES=1 is implementation-dependent and may depend on the history of TLB use.

If an invalid value is specified for MAS0TLBSEL MAS0ESEL or MAS2EPN, either no TLB entry is written by the tlbwe, or the tlbwe is performed as if some implementation-dependent, valid value were substituted for the invalid value, or an Illegal Instruction exception occurs. If the page size specified by MAS1TSIZE is not supported by the specified array, the tlbwe may be performed as if TSIZE were some implementation-dependent value, or an Illegal Instruction exception occurs.

If the Embedded.Hypervisor category is supported but the Embedded.Hypervisor.LRAT category is not supported, the tlbwe instruction is hypervisor privileged. Otherwise, this instruction is privileged.

Programming Note
Since a hardware replacement algorithm selects the entry for a tlbwe instruction with MAS0HES = 1, it is typically not possible to write the same entry using a second tlbwe instruction with MAS0HES = 1. Doing so might create multiple entries for the same virtual page. If software needs to change the value of any of the TLB fields, software should generally invalidate the original entry before executing the second tlbwe instruction with the new values.

6.11.4.2.1 TLB Write Conditional [Embedded.TLB Write Conditional]

The tlbsrx. instruction and tlbwe instruction with MAS0WQ = 0b01 together permit a convenient way for software to write a TLB entry while ensuring that the entry is not a duplicate entry and is not a stale, invalid value. Without the TLB Write Conditional facility, software must hold a software lock during the process of creating a TLB entry in order to prevent other threads from updating a shared TLB or invalidating a TLB entry.

The tlbsrx. instruction has two effects that occur either at the same time or in the following order.
1. A TLB-reservation is established for a virtual address, and, if the Embedded Page Table category is supported, an associated IND value.
2. A search of the selected TLB array is performed.

The TLB-reservation is used by a subsequent tlbwe instruction that writes a TLB entry (i.e. MAS0ATSEL = 0 or MSRGS=1) with MAS0WQ = 0b01. The TLB is only written by this tlbwe if the TLB-reservation still exists at the instant the TLB is written. A tlbwe that writes the TLB is said to "succeed". TLB Write Conditional cannot be used for the LRAT.

TLB-reservation

A TLB-reservation is set by a tlbsrx. instruction. The TLB-reservation has an associated IND and virtual address consisting of AS, PID, EA0:53, LPID, and GS. These values come from MAS1IND, MAS1TS MAS1TID, MAS2EPN, MAS8TLPID, and MAS8TGS, respectively. There is no specific page size associated with the TLB-reservation. The TLB-reservation applies to any virtual page that contains the virtual address. There is only one TLB-reservation in a thread.

The TLB-reservation is cleared by any of the following.
- The thread holding the TLB-reservation executes another tlbsrx.: This clears the first TLB-reservation and establishes a new one.
- A tlbivax is executed by any thread and all the following conditions are met.
  - Either the Embedded.Hypervisor category is not supported or the MAS5SGS and MAS5SLPID values used by the tlbivax match the GS and LPID values associated with the TLB-reservation.
  - The MAS6SPID and MAS6SAS values used by the tlbivax match the PID and AS values associated with the TLB-reservation.
  - The EA0:n-1 values of the tlbivax match the EA0:n-1 values associated with the TLB-reservation, where n=64-log2(page size in bytes) and page size is specified by the MAS6ISIZE.
- The thread holding the TLB-reservation or another thread that shares the TLB with this thread executes a mtspr to MMUCSR0 that performs a TLB invalidate all operation and LPIDR contents of the thread executing the mtspr matches the LPID value associated with the TLB-reservation.
- If category Embedded.Hypervisor is supported, and a tlbilx with T = 0 is executed by the thread holding the TLB-reservation or by a thread that shares the TLB with this thread, and the MAS5SLPID value used by the tlbilx matches the LPID value associated with the TLB-reservation.
- If category Embedded.Hypervisor is supported, and a tlbilx with T = 1 is executed by the thread holding the TLB-reservation or by a thread that shares the TLB with this thread, and the MAS5SLPID and MAS6SPID values used by the tlbilx match the LPID and PID values associated with the TLB-reservation.
- If category Embedded.Hypervisor is supported, and a tlbilx with T = 3 is executed by the thread holding the TLB-reservation or by a thread that shares the TLB with this thread, the MAS5SGS, MAS5SLPID, MAS6SPID, and MAS6SAS values used by the tlbilx match the GS, LPID, PID, and AS values associated with the TLB-reservation, and EA0:n-1 values of the tlbilx match the EA0:n-1 values associated with the TLB-reservation, where n=64-log2(page size in bytes) and page size is specified by the MAS6ISIZE.
- A tlbwe instruction is executed by the thread holding the TLB-reservation or by a thread that shares the TLB with this thread, and all the following are true.
  - An interrupt does not occur as a result of the tlbwe instruction.
  - The Embedded.Hypervisor category is not supported or the MAS8TLPID value used by the tlbwe matches the LPID value associated with the TLB-reservation.
  - The Embedded.Hypervisor category is not supported or the MAS8TGS value used by the tlbwe matches the GS value associated with the TLB-reservation.
  - The MAS1TID value used by the tlbwe matches the PID value associated with the TLB-reservation.
  - The Embedded.Page Table category is not supported or the MAS1IND value used by the tlbwe matches the IND value associated with the TLB-reservation.
  - The MAS1TS value used by the tlbwe matches the AS value associated with the TLB-reservation.
  - Bits 0:(n-1) of MAS2EPN used by the tlbwe match the EA0:n-1 values associated with the TLB-reservation, where n=64-log2(page size in bytes) and page size is specified by the MAS1TSIZE used by the tlbwe.
  - Either of the following conditions is met.
    - The MAS0WQ used by the tlbwe instruction is 0b00.
    - The MAS0WQ used by the tlbwe instruction is 0b01 and the TLB-reservation for the thread executing the tlbwe exists.
- The thread that has the TLB-reservation or another thread that shares the TLB with this thread that, as a result of a Page Table translation, writes a TLB entry and all the following conditions are met.
  - The TS and EA0:n-1 values for the new TLB entry match the corresponding values associated with the TLB-reservation where n = 64-log2(page size in bytes), where page size is specified by the SIZE value written to the TLB entry.
  - The Embedded.Hypervisor category is not supported or TLPID for the new TLB entry matches the LPID associated with the TLB-reservation.
  - The Embedded.Hypervisor category is not supported or TGS for the new TLB entry matches the GS associated with the TLB-reservation.
  - The TID for the new TLB entry matches the PID associated with the TLB-reservation.
  - The Valid bit for the new TLB entry is 1.
  - The IND value associated with the TLB-reservation is 0.

Implementations are allowed to clear a TLB-reservation for conditions other than those specified above. The architecture assures that a TLB-reservation will be cleared when required per the above requirements, but does not guarantee that these are the only conditions for clearing a TLB-reservation. However, the occurrence of an interrupt does not clear a TLB-reservation.

Programming Note
Software running on two threads that share a TLB should not attempt to create two TLB entries that would both translate a specific virtual address and where the TID or LPID values differ, i.e. one of the values is zero and the other is nonzero. The TLB-reservation will not protect against this case, since a TLB-reservation is not cleared by a tlbwe unless there is an exact match on the PID and LPID values.

Likewise software should not attempt to create a Page Table entry and a TLB entry where both entries would translate a specific virtual address, where the TLB array written by the Page Table translation is used by the same thread that uses this TLB entry, and where the TID or LPID values of the PTE and TLB entry differ, i.e. one of the values is zero and the other is nonzero. The TLB-reservation will not protect against this case, since a TLB-reservation is not cleared by a TLB write resulting from a Page Table translation unless there is an exact match on the PID and LPID values.

Synchronization of TLB-reservation

The side-effect of a tlbsrx. instruction setting the TLB-reservation can be synchronized by a context synchronizing instruction or event.

Programming Note
A common operation is to ensure that the TLB-reservation has been set by a tlbsrx. instruction before executing a subsequent Load instruction of a software page table entry in order to ensure the TLB-reservation detects an invalidation of the entry that was accessed. Besides using a context synchronizing instruction, software can also ensure the TLB-reservation has been set by a tlbsrx. instruction by reading the CR0 field or CR with a mfocrf or mfcr instruction after the tlbsrx. and creating a dependency between the data read from CR0 or CR and the address used for the subsequent Load instruction.

Serialization of TLB operations

Regardless of which threads initiated the operations, all operations (reads, writes, invalidates, and searches) involving a single TLB are defined to be serialized such that only one operation occurs at a time. This also applies to a TLB that is shared by multiple threads. Even if there is no matching TLB entry on a tlbivax, the TLB is still searched to determine there is no matching entry and this search is still referred to as the TLB invalidation.

If two threads share a TLB and both simultaneously execute a tlbsrx. instruction for a virtual address in a virtual page V, and then both threads execute a TLB Write Conditional to create a TLB entry for the virtual page V, at most one of these tlbwe instructions succeeds.

If, after thread P1 establishes a TLB-reservation for a virtual address in a virtual page V, another thread P2 executes a tlbivax that invalidates a TLB entry for the virtual page V and thread P1 does a TLB Write Conditional to create a TLB entry for the virtual page V, then one of the following occurs.
- The TLB invalidation occurs before the TLB write. Thus the TLB-reservation is lost and the TLB Write Conditional does not succeed.
- The TLB write occurs before the TLB invalidation. Thus the TLB Write Conditional succeeds and the resulting TLB entry created by the tlbwe is invalidated by the tlbivax.

Forward progress

Forward progress in loops that use tlbsrx. and tlbwe with MAS0WQ = 0b01 is achieved by a cooperative effort among hardware and system software.

The architecture guarantees that when a thread executes a tlbsrx. to set a TLB-reservation for virtual address X and then a TLB Write Conditional to write a TLB entry, either
1. the TLB Write Conditional succeeds and the TLB entry is written, or
2. the TLB Write Conditional fails because the TLB-reservation was reset because some other thread invalidated all TLB entries in the system for the virtual page containing the virtual address X or some other thread wrote a shared TLB entry for the virtual page containing the virtual address X, or
3. the TLB Write Conditional fails because the thread's TLB-reservation was lost for some other reason.

In Case 1 forward progress is made in the sense that the thread successfully wrote the TLB entry. In Case 2, the system as a whole makes progress in the sense that either some thread successfully invalidated TLB

In systems consisting of a single-threaded processor as well as in systems consisting of multi-threaded processors, invalidations can occur on a wider set of TLB entries than intended. That is, a virtual address presented for invalidation may cause not only the intended TLB targeted for invalidation to be invalidated, but may also invalidate other TLB entries depending on the implementation. This is because parts of the translation mechanism may not be fully specified to the hardware at invalidate time. This is especially true in SMP systems, where the invalidation address must be supplied to all threads in the system, and there may be other limitations imposed by the hardware implementation. This phenomenon is known as generous invalidates.
The entries for virtual address X or some thread that shares architecture assures that the intended TLB will be inval- the TLB wrote a TLB entry for the virtual page contain- idated, but does not guarantee that it will be the only ing virtual address X. Case 3 covers TLB-reservation one. A TLB entry invalidated by writing the V bit of the loss required for correct operation of the rest of the sys- TLB entry to 0 by use of a tlbwe instruction is guaran- tem. This includes TLB-reservation loss caused by teed to invalidate only the selected TLB entry. Invali- some other thread invalidating all entries in a shared dates occurring from tlbilx or tlbivax TLB, as well as TLB-reservation loss caused by system instructions or from tlbivax instructions on another software invalidating all entries for the PID value asso- thread may cause generous invalidates. ciated with virtual address X. It may also include imple- The architecture provides a method to protect against mentation-dependent causes of reservation loss. generous invalidations. This is important since there An implementation may make a forward progress guar- are certain virtual memory regions that must be prop- antee, defining the conditions under which the system erly mapped to make forward progress. To prevent this, as a whole makes progress. Such a guarantee must the architecture specifies an IPROT bit for TLB entries. specify the possible causes of TLB-reservation loss in If the IPROT bit is set to 1 in a given TLB entry, that Case 3. While the architecture alone cannot provide entry is protected from invalidations resulting from such a guarantee, the characteristics listed in Cases 1 tlbilx and tlbivax instructions, or from invali- and 2 are necessary conditions for any forward date all operations. TLB entries with the IPROT field set progress guarantee. An implementation and operating may only be invalidated by explicitly writing the TLB system can build on them to provide such a guarantee. 
entry and specifying a 0 for the V (MAS1V) field. To invalidate one or more individual virtual pages from Programming Note all TLB arrays in all threads without the involvement of The architecture does not include a "fairness guar- software running on other threads, software can exe- antee". In competing for a TLB-reservation, two cute the following sequence of instructions. threads can indefinitely lock out a third. one or more tlbivax instructions mbar or sync tlbsync 6.11.4.3 Invalidating TLB Entries sync TLB entries may be invalidated by three different meth- Other instructions, excluding tlbivax, may be inter- ods or if the Embedded.Hypervisor category is sup- leaved with the instruction sequence shown above, but ported, by four different methods. the instructions in the sequence must appear in the The TLB entry can be invalidated as the result of a order shown. On systems consisting of only a single- tlbwe instruction that sets the MAS1V bit in the threaded processor, the tlbsync and the preceding entry to 0. mbar or sync can be omitted. TLB entries may be invalidated as a result of a tlbivax instruction or from an invalidation resulting from a tlbivax on another thread. TLB entries may be invalidated as a result of an invalidate all operation specified through appropri- ate settings in the MMUCSR0. If the Embedded.Hypervisor category is supported, TLB entries may be invalidated as a result of a tlbilx instruction. See Section 6.11.4.4 for the effects of the above meth- ods on TLB lookaside information. 
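The interaction between generous invalidates and the IPROT bit can be illustrated with a small software model. The following is only a sketch under stated assumptions, not a description of any implementation: the names TlbEntry, invalidate_va, write_invalid, and demo are hypothetical, the TLB is reduced to a flat list, and the one form of over-invalidation modeled (ignoring the TID and TS fields when matching) is just one behavior an implementation might exhibit.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    """Hypothetical, heavily simplified TLB entry (illustration only)."""
    epn: int             # effective page number (toy value)
    tid: int             # translation ID (PID)
    ts: int              # translation space (AS)
    iprot: bool = False  # invalidation protection
    valid: bool = True


def invalidate_va(tlb, epn, pid, as_, generous=False):
    """Toy model of an invalidate-by-address operation.

    With generous=True the model assumes the implementation ignores the
    TID and TS fields when matching, so extra entries may be invalidated.
    Entries with IPROT set are never invalidated by this path.
    """
    for entry in tlb:
        if not entry.valid or entry.iprot:
            continue
        exact = entry.epn == epn and entry.tid == pid and entry.ts == as_
        loose = generous and entry.epn == epn
        if exact or loose:
            entry.valid = False


def write_invalid(tlb, index):
    """Toy model of a TLB write with V = 0: only the selected entry is
    affected, and IPROT does not prevent it."""
    tlb[index].valid = False


def demo():
    tlb = [
        TlbEntry(epn=0x100, tid=1, ts=0),              # intended target
        TlbEntry(epn=0x100, tid=2, ts=0),              # same page, other PID
        TlbEntry(epn=0x100, tid=0, ts=0, iprot=True),  # protected mapping
    ]
    invalidate_va(tlb, epn=0x100, pid=1, as_=0, generous=True)
    after_generous = [e.valid for e in tlb]
    write_invalid(tlb, 2)
    after_write = [e.valid for e in tlb]
    return after_generous, after_write
```

In this model the generous invalidation removes both unprotected entries for the page, while the protected entry survives until it is explicitly written with V = 0.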
Programming Note
For the preceding instruction sequence, the mbar or first sync instruction prevents the reordering of tlbivax instructions previously executed by the thread with respect to the subsequent tlbsync instruction. The tlbsync instruction and the subsequent sync instruction together ensure that all storage accesses for which the address was translated using the translations being invalidated will be performed with respect to any thread or mechanism, to the extent required by the associated Memory Coherence Required and Alternate Coherency Mode attributes, before any data accesses caused by instructions following the sync instruction are performed with respect to that thread or mechanism.

Programming Note
Implementations are permitted to have a restriction on the number of threads doing a tlbivax-mbar/sync-tlbsync-sync sequence. This restriction could be imposed by the system or the hardware.

Programming Note
The most obvious issue with generous invalidations is the code memory region that serves as the exception handler for MMU faults. If this region does not have a valid mapping, an MMU exception cannot be handled because the first address of the exception handler will result in another MMU exception.

Programming Note
Not all TLB arrays in a given implementation will implement the IPROT attribute. It is likely that implementations that are suitable for demand page environments will implement it for only a single array, while not implementing it for other TLB arrays.

Programming Note
Operating systems and hypervisors need to use great care when using protected (IPROT) TLB entries, particularly in SMP systems. A system that contains TLB entries on other threads will require a cross thread interrupt or some other synchronization mechanism to assure that each thread performs the required invalidation by writing its own TLB entries.

Programming Note
For MMU Architecture Version 1.0, to ensure a TLB entry that is not protected by IPROT is invalidated if software does not know which TLB array the entry is in, software should issue a tlbivax instruction targeting each TLB in the implementation with the EA to be invalidated.

Programming Note
The preferred method of invalidating entire TLB arrays is invalidation using MMUCSR0.

Programming Note
Invalidations using MMUCSR0 only affect the TLB array on the thread that performs the invalidation. To perform invalidations on all threads in a coherence domain on a multi-threaded processor or on a system containing multiple single-threaded processors, software should use tlbivax. If a large number of TLB entries need to be invalidated, using MMUCSR0 or, if the Embedded.Hypervisor category is supported, tlbilx, on each thread may be more efficient.

Programming Note
Since a hardware replacement algorithm selects the entry for a tlbwe instruction with MAS0HES = 1, it is typically not possible to invalidate the entry using a second tlbwe instruction with MAS0HES = 1 and MAS1V = 0. If software needs to invalidate a single entry that was written with MAS0HES = 1, software should generally invalidate the entry using tlbilx with T=3 or tlbivax.

6.11.4.4 TLB Lookaside Information

For performance reasons, most implementations also have implementation-specific lookaside information that is used in address translation. This lookaside information is a cache of recently used TLB entries.

If TLBnCFGHES=0, lookaside information for the associated TLB array is kept coherent with the TLB and is invisible to software. Any write to the TLB array that displaces or updates an entry will be reflected in the lookaside information, invalidating the lookaside information corresponding to the previous TLB entry. Any type of invalidation of an entry in the TLB will also invalidate the corresponding entry in the lookaside information.

If TLBnCFGHES=1, lookaside information for the associated TLB array is not required to be kept coherent with the TLB. Only in the following conditions will the lookaside information be kept coherent with the TLB. The MMUCSR0 TLB invalidate all will invalidate all lookaside information. The tlbilx and tlbivax instructions invalidate lookaside information corresponding to TLB entry values that they are specified to invalidate as well as those TLB entry values that would have been invalidated except for their IPROT=1 value.

The same instructions that synchronize invalidations of TLB entries also synchronize invalidation of TLB lookaside information.

Programming Note
If TLBnCFGHES=1 for a TLB array and it is important that the lookaside information corresponding to a TLB entry be invalidated, software should use tlbilx or tlbivax to invalidate the virtual address.

6.11.4.5 Invalidating LRAT Entries

There is only one mechanism for invalidating LRAT entries. An LRAT entry can be invalidated as the result of a tlbwe instruction that overwrites the LRAT entry with a new valid entry or that sets LRATV = 0. Only one LRAT entry is invalidated by a single tlbwe.

6.11.4.6 Searching TLB Entries

Software may search the MMU by using the tlbsx instruction, and, if the Embedded.TLB Write Conditional category is supported, the tlbsrx. instruction.

The tlbsrx. and tlbsx instructions use IND, PID, and AS values from the MAS registers instead of the PID registers and the MSR, and, if the Embedded.Hypervisor category is supported, these instructions use an LPID and GS value from the MAS registers instead of LPIDR and MSR. This allows software to search address spaces that differ from the current address space defined by the PID registers. This is useful for TLB fault handling.

6.11.4.7 TLB Replacement Hardware Assist

The architecture provides mechanisms to assist software in creating and updating TLB entries when certain MMU related exceptions occur. This is called TLB Replacement Hardware Assist. Hardware will update the MAS registers on the occurrence of a Data TLB Error interrupt or Instruction TLB Error interrupt if the Embedded.Hypervisor category is not supported, MAS Register updates are enabled for interrupts directed to the hypervisor (EPCRDMIUH = 0), or the interrupt is directed to the guest state.

When a Data or Instruction TLB Error interrupt (TLB miss) occurs and if the Embedded.Hypervisor category is not supported, MAS Register updates are enabled for interrupts directed to the hypervisor (EPCRDMIUH = 0), or the interrupt is directed to the guest state, then MAS0, MAS1, and MAS2 are automatically updated using the defaults specified in MAS4 as well as the AS and EPN values corresponding to the access that caused the exception. MAS6 is updated to set MAS6SPID to the value of PID, MAS6SIND to the value of MAS4INDD, and MAS6SAS to the value of MSRDS or MSRIS depending on the type of access (data or instruction) that caused the error. In addition, if MAS4TLBSELD identifies a TLB array that supports NV (Next Victim), MAS0ESEL is loaded with a value that hardware predicts represents the best TLB entry to victimize to create a new TLB entry and MAS0NV is updated with the TLB entry index of what hardware predicts to be the next victim for the set of entries which can be used for translating addresses with the EPN that caused the exception. Thus MAS0ESEL identifies the current TLB entry to be replaced, and MAS0NV points to the next victim. When software writes the TLB entry, the MAS0NV field is written to the TLB array's set of next victim values. The algorithm used by the hardware to determine which TLB entry should be targeted for replacement is implementation-dependent.

Next Victim support is provided for TLB arrays that are set associative and that have TLBnCFGHES=0. Next Victim support is not provided for TLB arrays that are fully associative.

The automatic update of the MAS registers sets up all the necessary fields for creating a new TLB entry with the exception of RPN, the U0-U3 attribute bits, and the permission bits. With the exception of the upper 32 bits of RPN and the page attributes (should software desire to specify changes from the default attributes), all the remaining fields are located in MAS3, requiring only the single MAS register manipulation by software before writing the TLB entry.

For Instruction Storage interrupt (ISI) and Data Storage interrupt (DSI) related exceptions, the MAS registers are not updated. Software must explicitly search the TLB to find the appropriate entry.

The update of MAS registers through TLB Replacement Hardware Assist is summarized in Table 11 on page 986.

Programming Note
Next Victim support is not provided for a fully associative array because such an array is intended for mostly static mappings of addresses.

6.11.4.8 32-bit and 64-bit Specific MMU Behavior

MMU behavior is largely unaffected by whether the thread is in 32-bit computation mode (MSRCM=0) or 64-bit computation mode (MSRCM=1). The only differences occur in the EPN field of the TLB entry and the EPN field of MAS2. The differences are summarized here.
- Executing a tlbwe instruction in 32-bit mode will set bits 0:31 of the TLB EPN field to 0, regardless of the value of bits 0:31 of the EPN field in MAS2.
- For an update to MAS registers via TLB Replacement Hardware Assist (see Section 6.11.4.7), an update to bits 0:53 of the EPN field occurs regardless of the computation mode of the thread at the time of the exception or the interrupt computation mode in which the interrupt is taken.
  If the instruction causing the exception was executing in 32-bit mode, bits 0:31 of the EPN field in MAS2 will be set to 0.
- Executing a tlbre instruction in 32-bit mode will set bits 0:31 of the MAS2 EPN field to an undefined value.

Programming Note
This allows a 32-bit OS to operate seamlessly in 32-bit mode on a 64-bit implementation and a 64-bit OS to easily support 32-bit applications.

6.11.4.9 TLB Management Instructions

The tlbivax instruction is used to invalidate TLB entries. Additional instructions are used to read, write, and search TLB entries, and to provide an ordering function for the effects of tlbivax. If the Embedded.Hypervisor category is supported, the tlbilx instruction is used to invalidate TLB entries in the thread executing the tlbilx.

TLB Invalidate Virtual Address Indexed X-form

tlbivax RA,RB

    31    ///    RA    RB    786    /
    0     6      11    16    21     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    for each thread
      if MAV = 1.0 then for TLB array = EA59:60
      if MAV = 2.0 then for each TLB array
        for each TLB entry
          if MMUCFGTWC = 1 or TLBnCFGHES = 1 then
            c ← MAS6ISIZE
          else
            c ← entrySIZE
          if MAV = 1.0 then
            m ← ¬((1 << (2 × c)) - 1)
          else
            m ← ¬((1 << c) - 1)
          if ((EA0:53 & m) = (entryEPN & m)) &
             entryTID = MAS6SPID & entryTS = MAS6SAS &
             (E.PT not supported | entryIND = MAS6SIND) &
             (E.HV not supported |
              (entryTLPID = MAS5SLPID &
               entryTGS = MAS5SGS)) | ((MAV = 1.0) & (EA61 = 1))
          then
            if entryIPROT = 0 then entryV ← 0

Let the effective address (EA) be the sum (RA|0)+(RB). The EA is interpreted as shown below.

    EA0:53    EA0:53
    EA54:58   Reserved
    EA59:60   TLB array selector [MAV = 1.0]
                00  TLB0
                01  TLB1
                10  TLB2
                11  TLB3
    EA61      TLB invalidate all [MAV = 1.0]
    EA62:63   Reserved

If EA61=0, all TLB entries on all threads that have all of the following properties are made invalid. The MAS registers listed are those in the thread executing the tlbivax.
- The MMU architecture version is 2.0 or the entry is in the TLB array targeted by EA59:60.
- The logical AND of EA0:53 and m is equal to the logical AND of the EPN value of the TLB entry and m, where m is based on the following.
  - If MMUCFGTWC = 1 or TLBnCFGHES = 1, c is equal to MAS6ISIZE. Otherwise, c is equal to entrySIZE.
  - If MMU Architecture Version 1.0 is supported, m is equal to the logical NOT of ((1 << (2 × c)) - 1). Otherwise, m is equal to the logical NOT of ((1 << c) - 1).
- The TID value of the TLB entry is equal to MAS6SPID and the TS value of the TLB entry is equal to MAS6SAS.
- The implementation does not support the Embedded.Page Table category or the IND value of the TLB entry is equal to MAS6SIND.
- Either of the following is true:
  - The implementation does not support the Embedded.Hypervisor category.
  - The TLPID value of the TLB entry is equal to MAS5SLPID and the TGS value of the TLB entry is equal to MAS5SGS.
- entryIPROT = 0.

In MMU Architecture Version 1.0 if EA61=1, all entries in all threads not protected by the IPROT attribute in the TLB array targeted by EA59:60 are made invalid.

If the instruction specifies a TLB array that does not exist, the instruction is treated as if the instruction form is invalid.

If the implementation requires the page size to be specified by MAS6ISIZE (MMUCFGTWC = 1 or, for the specified TLB array, TLBnCFGHES = 1) and the page size specified by MAS6ISIZE is not supported by the implementation, the instruction is treated as if the instruction form is invalid.

If the operation isn't a TLB invalidate all and there are multiple entries in a single thread's TLB array(s) that match the complete VPN, then zero or more matching entries with IPROT=0 are invalidated or a Machine Check interrupt occurs. If the Embedded.Hypervisor category is supported, this Machine Check interrupt must be precise.

The operation performed by this instruction is ordered by the mbar (or sync) instruction with respect to a subsequent tlbsync instruction executed by the thread executing the tlbivax instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations which is independent of the other sets that mbar orders.

The effects of the invalidation on this thread are not guaranteed to be visible to the programming model until the completion of a context synchronizing operation.

Invalidations may occur for other TLB entries in the designated array, but in no case will any TLB entries with the IPROT attribute set be made invalid.

If RA does not equal 0, it is implementation-dependent whether an Illegal Instruction exception occurs.

If the Embedded.Hypervisor category is supported, this instruction is hypervisor privileged. Otherwise, this instruction is privileged.

Special Registers Altered:
    None

Programming Note
Care must be taken not to invalidate any TLB entry that contains the mapping for any interrupt vector.

For backward compatibility, implementations may ignore a TLB entry's TS and TID fields when determining whether an entry should be invalidated. Since this and other such generous invalidation can be performed, consideration should be given to protecting a TLB entry that maps an interrupt vector by setting TLBIPROT = 1.

Programming Note
The tlbilx instruction is the preferred way of performing TLB invalidations for operating systems running as a guest to the hypervisor since the invalidations are partitioned and do not require hypervisor privilege.

Programming Note
The TLB invalidate all function (EA61=1) only exists in MMU Architecture Version 1.0 implementations. It should only be used when running existing software is deemed important.

TLB Invalidate Local Indexed X-form

tlbilx T,RA,RB [Category: Embedded.Hypervisor]

    31    ///    T    RA    RB    18    /
    0     6      9    11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    for each TLB array
      for each TLB entry
        if MMUCFGTWC = 1 or TLBnCFGHES = 1 then
          c ← MAS6ISIZE
        else
          c ← entrySIZE
        if MAV = 1.0 then
          m ← ¬((1 << (2 × c)) - 1)
        else
          m ← ¬((1 << c) - 1)
        if (entryIPROT = 0) & (entryTLPID = MAS5SLPID) then
          if T = 0 then entryV ← 0
          if T = 1 & entryTID = MAS6SPID then entryV ← 0
          if T = 3 & entryTGS = MAS5SGS &
             ((EA0:53 & m) = (entryEPN & m)) &
             entryTID = MAS6SPID & entryTS = MAS6SAS &
             (E.PT not supported | entryIND = MAS6SIND)
          then
            entryV ← 0

Let the effective address (EA) be the sum (RA|0) + (RB).

The tlbilx instruction invalidates TLB entries in the thread that executes the tlbilx instruction. TLB entries which are protected by the IPROT attribute (entryIPROT = 1) are not invalidated.

If T = 0, all TLB entries that have all of the following properties are made invalid on the thread executing the tlbilx instruction.
- The TLPID of the entry matches MAS5SLPID.
- The IPROT of the entry is 0.

If T = 1, all TLB entries that have all of the following properties are made invalid on the thread executing the tlbilx instruction.
- The TLPID of the entry matches MAS5SLPID.
- The TID of the entry matches MAS6SPID.
- The IPROT of the entry is 0.

If T = 3, all TLB entries in the thread executing the tlbilx instruction that have all of the following properties are made invalid.
- The TLPID value of the TLB entry is equal to MAS5SLPID and the TGS value of the TLB entry is equal to MAS5SGS.
- The logical AND of EA0:53 and m is equal to the logical AND of the EPN value of the TLB entry and m, where m is based on the following.
  - If MMUCFGTWC = 1 or TLBnCFGHES = 1, c is equal to MAS6ISIZE. Otherwise, c is equal to entrySIZE.
  - If MMU Architecture Version 1.0 is supported, m is equal to the logical NOT of ((1 << (2 × c)) - 1). Otherwise, m is equal to the logical NOT of ((1 << c) - 1).
- The TID value of the TLB entry is equal to MAS6SPID and the TS value of the TLB entry is equal to MAS6SAS.
- The implementation does not support the Embedded.Page Table category or the IND value of the TLB entry is equal to MAS6SIND.
- The IPROT of the entry is 0.

The effects of the invalidation are not guaranteed to be visible to the programming model until the completion of a context synchronizing operation.

Invalidations may occur for other TLB entries on the thread executing the tlbilx instruction, but in no case will any TLB entries with the IPROT attribute set be made invalid.

If T = 2, the instruction form is invalid.

If T = 3 and the implementation requires the page size to be specified by MAS6ISIZE (MMUCFGTWC = 1 or, for any TLB array, TLBnCFGHES = 1) and the page size specified by MAS6ISIZE is not supported by the implementation, the instruction is treated as if the instruction form is invalid.

If T=3 and there are multiple entries in the TLB array(s) that match the complete VPN, then zero or more matching entries with IPROT=0 are invalidated or a Machine Check interrupt occurs. If the Embedded.Hypervisor category is supported, this Machine Check interrupt must be precise.

If RA does not equal 0, it is implementation-dependent whether an Illegal Instruction exception occurs.

If the Embedded.Hypervisor category is supported and guest execution of TLB Management instructions is disabled (EPCRDGTMI=1), this instruction is hypervisor privileged. Otherwise, this instruction is privileged.

Special Registers Altered:
    None

Extended Mnemonics:

Examples of extended mnemonics for TLB Invalidate Local:

    Extended:           Equivalent to:
    tlbilxlpid          tlbilx 0,0,0
    tlbilxpid           tlbilx 1,0,0
    tlbilxva RA,RB      tlbilx 3,RA,RB
    tlbilxva RB         tlbilx 3,0,RB

Programming Note
tlbilx is the preferred way of performing TLB invalidations, especially for operating systems running as a guest to the hypervisor since the invalidations are partitioned and do not require hypervisor privilege.

Programming Note
When dispatching a guest operating system, hypervisor software should always set MAS5SLPID to the guest's corresponding LPID value.

Programming Note
Executing a tlbilx instruction with T=0 or T=1 may take many cycles to perform. Software should only issue these operations when an LPID or a PID value is reused or taken out of use.

TLB Search Indexed X-form

tlbsx RA,RB

    31    ///    RA    RB    914    /
    0     6      11    16    21     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    Valid_matching_entry_exists ← 0
    for each TLB array
      for each TLB entry
        if MAV = 1.0 then
          m ← ¬((1 << (2 × entrySIZE)) - 1)
        else
          m ← ¬((1 << entrySIZE) - 1)
        if ((EA0:53 & m) = (entryEPN & m)) &
           (entryTID = MAS6SPID | entryTID = 0) &
           entryTS = MAS6SAS &
           (E.PT not supported | entryIND = MAS6SIND) &
           (E.HV not supported | (entryTGS = MAS5SGS &
           (entryTLPID = MAS5SLPID | entryTLPID = 0)))
        then
          Valid_matching_entry_exists ← 1
          exit for loops
    if Valid_matching_entry_exists
      entry ← matching entry found
      array ← TLB array number where TLB entry found
      index ← index into TLB array of TLB entry found
      if TLB array supports Next Victim then
        hint ← hardware hint for Next Victim
      else
        hint ← undefined
      rpn ← entryRPN
      MAS0ATSEL ← 0
      MAS0TLBSEL ← array
      MAS0ESEL ← index
      if MAS0HES supported
        MAS0HES ← 0
      if Next Victim supported then
        if TLB array specified by MAS0TLBSEL supports NV then
          MAS0NV ← hint
        else
          MAS0NV ← undefined
      MAS1V ← 1
      MAS1TID TS TSIZE ← entryTID TS SIZE
      if TLB array supports IPROT then
        MAS1IPROT ← entryIPROT
      else
        MAS1IPROT ← 0
      if category E.PT supported then
        if TLB array supports indirect entries then
          MAS1IND ← entryIND
          if entryIND = 1
            MAS3SPSIZE ← entrySPSIZE
          else
            MAS3UX SX UW SW UR SR ← entryUX SX UW SW UR SR
        else
          MAS1IND ← 0
          MAS3UX SX UW SW UR SR ← entryUX SX UW SW UR SR
      else
        MAS3UX SX UW SW UR SR ← entryUX SX UW SW UR SR
      MAS2EPN W I M G E ← entryEPN W I M G E
      if category VLE supported then MAS2VLE ← entryVLE
      if ACM supported then MAS2ACM ← entryACM
      MAS3RPNL ← rpn32:53
      MAS3U0:U3 ← entryU0:U3
      MAS7RPNU ← rpn0:31
      if category E.HV supported then
        MAS8TGS VF TLPID ← entryTGS VF TLPID
    else
      MAS0ATSEL ← 0
      MAS0TLBSEL ← MAS4TLBSELD
      if Next Victim supported then
        if TLB array specified by MAS4TLBSELD supports Next Victim then
          MAS0ESEL ← hint
          MAS0NV ← hint for next replacement
        else
          MAS0ESEL ← undefined
          MAS0NV ← undefined
      else
        MAS0ESEL ← undefined
      if MAS0HES supported
        MAS0HES ← TLBnCFGHES for the TLB array specified by MAS4TLBSELD
      MAS1V IPROT ← 0
      MAS1TID TS ← MAS6SPID SAS
      MAS1TSIZE ← MAS4TSIZED
      if Embedded.Page Table category supported then
        MAS1IND ← MAS4INDD
      MAS2W I M G E ← MAS4WD ID MD GD ED
      if category VLE supported then MAS2VLE ← MAS4VLED
      if ACM supported then MAS2ACM ← MAS4ACMD
      MAS2EPN ← undefined
      MAS3RPNL ← 0
      MAS3U0:U3 UX SX UW SW UR SR ← 0
      MAS7RPNU ← 0
      if category E.TWC supported then MAS0WQ ← 0b01

Let the effective address (EA) be the sum (RA|0)+(RB).

If any TLB array contains a valid entry matching the MAS6SIND and virtual address formed by MAS5SGS, MAS5SLPID, MAS6SAS SPID, and EA, the search is considered successful. A TLB entry matches if all the following conditions are met.
- The valid bit of the TLB entry is 1.
- The logical AND of EA0:53 and m is equal to the logical AND of the EPN value of the TLB entry and m, where m is determined as follows:
  - If MMU Architecture Version 1.0 is supported, m is equal to the logical NOT of ((1 << (2 × entrySIZE)) - 1). Otherwise, m is equal to the logical NOT of ((1 << entrySIZE) - 1).
- The TID value of the TLB entry is equal to MAS6SPID or is zero.
- The TS value of the TLB entry is equal to MAS6SAS.
- Either the Embedded.Page Table category is not supported or the IND value of the TLB entry is equal to MAS6SIND.
- Either of the following is true:
  - The implementation does not support the Embedded.Hypervisor category.
  - The TGS value of the TLB entry is equal to MAS5SGS and either the TLPID value of the TLB entry is equal to MAS5SLPID or is zero.
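The tlbsx match conditions listed above can be expressed as a single predicate. The following is a minimal sketch, assuming a software model in which EA is a 64-bit integer, EA0:53 is obtained as ea >> 10, and the entry's EPN is stored as a 54-bit value; the names Entry, SearchRegs, page_mask, and tlbsx_matches are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

EPN_BITS = 54                      # width of EA0:53
EPN_FIELD = (1 << EPN_BITS) - 1


def page_mask(size, mav1):
    """m = NOT((1 << (2*c)) - 1) for MAV 1.0, NOT((1 << c) - 1) otherwise,
    truncated to the 54-bit EPN width."""
    c = 2 * size if mav1 else size
    return ~((1 << c) - 1) & EPN_FIELD


@dataclass
class Entry:
    """Hypothetical TLB entry holding only the fields used for matching."""
    valid: bool
    epn: int      # 54-bit value corresponding to EA0:53
    size: int
    tid: int
    ts: int
    ind: int = 0
    tgs: int = 0
    tlpid: int = 0


@dataclass
class SearchRegs:
    """Hypothetical stand-in for the MAS6 and MAS5 search fields."""
    spid: int
    sas: int
    sind: int = 0
    sgs: int = 0
    slpid: int = 0


def tlbsx_matches(e, ea, r, mav1=True, has_pt=False, has_hv=False):
    """Return True if entry e satisfies all listed match conditions."""
    m = page_mask(e.size, mav1)
    ea_0_53 = (ea >> 10) & EPN_FIELD
    return (
        e.valid
        and (ea_0_53 & m) == (e.epn & m)
        and (e.tid == r.spid or e.tid == 0)        # TID 0 matches any PID
        and e.ts == r.sas
        and (not has_pt or e.ind == r.sind)
        and (not has_hv
             or (e.tgs == r.sgs
                 and (e.tlpid == r.slpid or e.tlpid == 0)))
    )
```

The has_pt and has_hv flags stand in for whether the Embedded.Page Table and Embedded.Hypervisor categories are supported; when a category is not supported, the corresponding comparison is skipped, mirroring the "not supported |" terms in the RTL.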
If the search is successful, MAS register fields are loaded from the matching TLB entry according to the following.

- MAS0ATSEL is set to 0.
- MAS0TLBSEL is set to the number of the TLB array with the matching entry.
- MAS0ESEL is set to the index of the matching entry.
- If MAS0HES is supported, MAS0HES is set to 0.
- If Next Victim is supported for any TLB array, the following applies.
  - If the TLB array with the matching entry supports Next Victim, MAS0NV is set to the hardware hint for the index of the entry to be replaced. Otherwise, MAS0NV is set to an implementation-dependent undefined value.
- MAS1V is set to 1.
- MAS1TID TS TSIZE are loaded from the TID, TS, and SIZE fields of the TLB entry.
- If the TLB array supports IPROT, MAS1IPROT is loaded from the IPROT bit of the TLB entry. Otherwise, MAS1IPROT is set to 0.
- MAS2EPN W I M G E are loaded from the EPN, W, I, M, G, and E fields of the TLB entry.
- If the VLE category is supported, MAS2VLE is loaded from the VLE bit of the TLB entry.
- If Alternate Coherency Mode is supported, MAS2ACM is loaded from the ACM bit of the TLB entry.
- MAS3RPNL is loaded from the lower 22 bits of the RPN field of the TLB entry, and, if implemented, MAS7RPNU is loaded from the upper 32 bits of the RPN field of the TLB entry.
- The supported User-Defined storage control bits in MAS3U0:U3 are loaded from the respective supported U0:U3 bits of the TLB entry.
- If the Embedded.Page Table category is not supported, MAS3UX SX UW SW UR SR are loaded from the UX, SX, UW, SW, UR, and SR bits of the TLB entry. Otherwise, the following applies.
  - If the TLB array does not support indirect entries, MAS1IND is set to 0 and MAS3UX SX UW SW UR SR are loaded from the UX, SX, UW, SW, UR, and SR bits of the TLB entry. Otherwise, the following applies.
    - MAS1IND is loaded from the IND bit of the TLB entry.
    - If the IND bit of the TLB entry is 1, MAS3SPSIZE is loaded from the SPSIZE field of the TLB entry, and MAS3UND is set to an implementation-dependent undefined value.
    - If the IND bit of the TLB entry is 0, MAS3UX SX UW SW UR SR are loaded from the UX, SX, UW, SW, UR, and SR bits of the TLB entry.
- If the Embedded.Hypervisor category is implemented, MAS8TGS VF TLPID are loaded from the TGS, VF, and TLPID fields of the TLB entry.

If no valid matching translation exists, MAS1V is set to 0 and the MAS register fields are loaded according to the following in order to facilitate a TLB replacement.

- MAS0ATSEL is set to 0.
- MAS0TLBSEL is loaded from MAS4TLBSELD.
- If Next Victim is not supported for any TLB array, MAS0ESEL is set to an implementation-dependent undefined value. Otherwise, the following applies.
  - If the TLB array specified by MAS4TLBSELD supports Next Victim, MAS0ESEL is set to the hardware hint for the index of the entry to be replaced and MAS0NV is set to the hardware hint for the index of the next entry to be replaced. Otherwise, MAS0ESEL and MAS0NV are set to implementation-dependent undefined values.
- If MAS0HES is supported, MAS0HES is set to the value of TLBnCFG for the TLB array specified by MAS4TLBSELD.
- MAS1IPROT is set to 0.
- MAS1TID TS are loaded from MAS6SPID SAS.
- MAS1TSIZE is loaded from MAS4TSIZED.
- If the Embedded.Page Table category is supported, MAS1IND is set to MAS4INDD.
- MAS2EPN is set to an implementation-dependent undefined value.
- MAS2W I M G E are loaded from MAS4WD ID MD GD ED.
- If the VLE category is supported, MAS2VLE is loaded from MAS4VLED.
- If Alternate Coherency Mode is supported, MAS2ACM is loaded from MAS4ACMD.
- MAS3RPNL and, if implemented, MAS7RPNU are set to 0s.
- The supported User-Defined storage control bits in MAS3U0:U3 are set to 0s.
- MAS3UX SX UW SW UR SR are set to 0s.
- If the Embedded.TLB Write Conditional category is supported, MAS0WQ is set to 0b01.

If a tlbsx is successful, it is considered to "hit". Otherwise, it is considered to "miss".

If there are multiple matching TLB entries, either one of the matching entries is used or a Machine Check exception occurs. If the Embedded.Hypervisor category is supported, this Machine Check interrupt must be precise.

If RA does not equal zero, it is implementation-dependent whether an Illegal Instruction exception occurs.

If the Embedded.Hypervisor category is supported, this instruction is hypervisor privileged. Otherwise, this instruction is privileged.

Special Registers Altered:
    MAS0 MAS1 MAS2 MAS3 MAS7
    MAS8 (if category E.HV supported)

TLB Search and Reserve Indexed X-form

tlbsrx. RA,RB [Category: Embedded.TLB Write Conditional]

    31     ///    RA     RB     850    1
    0      6      11     16     21     31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    pid ← MAS1TID
    as ← MAS1TS
    if Embedded.Page Table category supported then
      ind ← MAS1IND
    if category E.HV supported then
      gs ← MAS5SGS
      lpid ← MAS5SLPID
      va ← gs || lpid || as || pid || EA
    else
      va ← as || pid || EA
    TLB-RESERVE ← 1
    if Embedded.Page Table category supported then
      TLB-RESERVE_IND_N_ADDR ← ind || va
    else
      TLB-RESERVE_ADDR ← va
    Valid_matching_entry_exists ← 0
    for each TLB array
      for each TLB entry
        if MAV = 1.0 then
          m ← ¬((1 << (2 × entrySIZE)) - 1)
        else
          m ← ¬((1 << entrySIZE) - 1)
        if ((EA0:53 & m) = (entryEPN & m)) &
           (entryTID = MAS1TID | entryTID = 0) &
           entryTS = MAS1TS &
           (E.PT not supported | entryIND = MAS1IND) &
           (E.HV not supported | (entryTGS = MAS5SGS &
            (entryTLPID = MAS5SLPID | entryTLPID = 0))) then
          Valid_matching_entry_exists ← 1
          exit for loops
    if Valid_matching_entry_exists then
      CR0 ← 0b0010
    else
      CR0 ← 0b0000

Let the effective address (EA) be the sum (RA|0)+(RB).

If any TLB array contains a valid entry matching the MAS1IND and virtual address formed by MAS5SGS, MAS5SLPID, MAS1TS TID, and EA, the search is considered successful. A TLB entry matches if all the following conditions are met.

- The valid bit of the TLB entry is 1.
- Either the Embedded.Page Table category is not supported or the IND value of the TLB entry is equal to MAS1IND.
- The logical AND of EA0:53 and m is equal to the logical AND of the EPN value of the TLB entry and m, where m is determined as follows:
  - If MMU Architecture Version 1.0 is supported, m is equal to the logical NOT of ((1 << (2 × entrySIZE)) - 1). Otherwise, m is equal to the logical NOT of ((1 << entrySIZE) - 1).
- The TID value of the TLB entry is equal to MAS1TID or is zero.
- The TS value of the TLB entry is equal to MAS1TS.
- Either of the following is true:
  - The implementation does not support the Embedded.Hypervisor category.
  - The TGS value of the TLB entry is equal to MAS5SGS and either the TLPID value of the TLB entry is equal to MAS5SLPID or is zero.

CR Field 0 is set as follows. n is a 1-bit value that indicates whether the search was successful.

    CR0LT GT EQ SO = 0b00 || n || 0

This instruction creates a TLB-reservation for use by a TLB Write instruction. The virtual address described above is associated with the TLB-reservation, and replaces any address previously associated with the TLB-reservation. (The TLB-reservation is created regardless of whether the search succeeds.)

If there are multiple matching TLB entries, either one of the matching entries is used or a Machine Check exception occurs. If the Embedded.Hypervisor category is supported, this Machine Check interrupt must be precise.

If RA does not equal zero, it is implementation-dependent whether an Illegal Instruction exception occurs.

If the Embedded.Hypervisor category is supported and guest execution of TLB Management instructions is disabled (EPCRDGTMI=1), this instruction is hypervisor privileged. Otherwise, this instruction is privileged.

Special Registers Altered:
    CR0

TLB Read Entry X-form

tlbre

    31     ///    ///    ///    946    /
    0      6      11     16     21     31

    if MAS0ATSEL = 0 then
      if TLBnCFGHES = 0 then
        entry ← SelectTLB(MAS0TLBSEL, MAS0ESEL, MAS2EPN)
      else
        entry ← SelectTLB(MAS0TLBSEL, MAS1TID TSIZE, MAS2EPN, MAS0ESEL)
      if Next Victim supported then
        if TLB array specified by MAS0TLBSEL supports NV then
          MAS0NV ← hint
        else
          MAS0NV ← undefined
      if TLB entry is found then
        rpn ← entryRPN
        MAS1V TID TS TSIZE ← entryV TID TS SIZE
        if TLB array supports IPROT then
          MAS1IPROT ← entryIPROT
        else
          MAS1IPROT ← 0
        if category E.PT supported then
          if TLB array supports indirect entries then
            MAS1IND ← entryIND
            if entryIND = 1
              MAS3SPSIZE ← entrySPSIZE
            else
              MAS3UX SX UW SW UR SR ← entryUX SX UW SW UR SR
          else
            MAS1IND ← 0
            MAS3UX SX UW SW UR SR ← entryUX SX UW SW UR SR
        else
          MAS3UX SX UW SW UR SR ← entryUX SX UW SW UR SR
        MAS2EPN W I M G E ← entryEPN W I M G E
        if category VLE supported then MAS2VLE ← entryVLE
        if ACM supported then MAS2ACM ← entryACM
        MAS3RPNL ← rpn32:53
        MAS3U0:U3 ← entryU0:U3
        MAS7RPNU ← rpn0:31
        if category E.HV supported then
          MAS8TGS VF TLPID ← entryTGS VF TLPID
      else
        MAS1V ← 0
        MAS1IPROT TID TS TSIZE ← undefined
        if Embedded.Page Table supported then
          MAS1IND ← undefined
        MAS2EPN W I M G E ← undefined
        if category VLE supported then MAS2VLE ← undefined
        if ACM supported then MAS2ACM ← undefined
        MAS3RPNL U0:U3 UX SX UW SW UR SR ← undefined
        MAS7RPNU ← undefined
        if category E.HV supported then
          MAS8TGS VF TLPID ← undefined
    else
      entry ← SelectLRAT(MAS0ESEL, MAS2EPN)
      MAS0NV ← undefined
      if LRAT entry is found then
        rpn ← entryLRPN
        MAS1V TSIZE ← entryV LSIZE
        MAS1IPROT TID TS ← 0 0 0
        if Embedded.Page Table supported then
          MAS1IND ← 0
        MAS2EPN ← entryLPN
        MAS2W I M G E ← 0 0 0 0 0
        if category VLE supported then
          MAS2VLE ← 0
        if ACM supported then MAS2ACM ← 0
        MAS3RPNL ← rpn32:53
        MAS3U0:U3 UX SX UW SW UR SR ← 0 0 0 0 0 0 0
        MAS7RPNU ← rpn0:31
        MAS8TGS VF ← 0 0
        if the LRAT supports LPID
          MAS8TLPID ← entryLPID
        else
          MAS8TLPID ← undefined
      else
        MAS1V ← 0
        MAS1IPROT TID TS TSIZE ← undefined
        if Embedded.Page Table supported then
          MAS1IND ← undefined
        MAS2EPN W I M G E ← undefined
        if category VLE supported then
          MAS2VLE ← undefined
        if ACM supported then MAS2ACM ← undefined
        MAS3RPNL U0:U3 UX SX UW SW UR SR ← undefined
        MAS7RPNU ← undefined
        MAS8TGS VF TLPID ← undefined

If the Embedded.Hypervisor.LRAT category is not supported, MAS0ATSEL is treated as if it were zero in the following description.

If the Embedded.Hypervisor.LRAT category is not supported or MAS0ATSEL is 0, then the following applies.

- If Next Victim is supported for any TLB array, the following applies.
  - If the TLB array specified by MAS0TLBSEL supports Next Victim, MAS0NV is set to the hardware hint for the index of the entry to be replaced. Otherwise, MAS0NV is set to an implementation-dependent undefined value.
- A TLB entry is selected in one of the following ways.
  - If TLBnCFGHES=0 for the TLB array selected by MAS0TLBSEL, the TLB entry is specified by MAS0TLBSEL, MAS0ESEL and MAS2EPN.
  - If TLBnCFGHES=1 for the TLB array selected by MAS0TLBSEL, the TLB entry is specified by MAS0TLBSEL, MAS0ESEL and a hardware generated hash based on MAS1TID, MAS1TSIZE, and MAS2EPN.

If the selected TLB entry exists, MAS register fields are loaded according to the following.

- MAS1V TID TS TSIZE are loaded from the V, TID, TS, and SIZE fields of the TLB entry.
- If the TLB array supports IPROT, MAS1IPROT is loaded from the IPROT bit of the TLB entry. Otherwise, MAS1IPROT is set to 0.
- MAS2EPN W I M G E are loaded from the EPN, W, I, M, G, and E fields of the TLB entry.
- If the VLE category is supported, MAS2VLE is loaded from the VLE bit of the TLB entry.
- If Alternate Coherency Mode is supported, MAS2ACM is loaded from the ACM bit of the TLB entry.
- MAS3RPNL is loaded from the lower 22 bits of the RPN field of the TLB entry, and, if implemented, MAS7RPNU is loaded from the upper 32 bits of the RPN field of the TLB entry.
- The supported User-Defined storage control bits in MAS3U0:U3 are loaded from the respective supported U0:U3 bits of the TLB entry.
- If the Embedded.Page Table category is not supported, MAS3UX SX UW SW UR SR are loaded from the UX, SX, UW, SW, UR, and SR bits of the TLB entry. Otherwise, the following applies.
  - If the TLB array does not support indirect entries, MAS1IND is set to 0 and MAS3UX SX UW SW UR SR are loaded from the UX, SX, UW, SW, UR, and SR bits of the TLB entry. Otherwise, the following applies.
    - MAS1IND is loaded from the IND bit of the TLB entry.
    - If the IND bit of the TLB entry is 1, MAS3SPSIZE is loaded from the SPSIZE field of the TLB entry, and MAS3UND is set to an implementation-dependent undefined value.
    - If the IND bit of the TLB entry is 0, MAS3UX SX UW SW UR SR are loaded from the UX, SX, UW, SW, UR, and SR bits of the TLB entry.
- If the Embedded.Hypervisor category is implemented, MAS8TGS VF TLPID are loaded from the TGS, VF, and TLPID fields of the TLB entry.

If the Embedded.Hypervisor.LRAT category is supported and the LRAT array is specified (MAS0ATSEL = 1), then the following applies.

- MAS0NV is set to an implementation-dependent undefined value.
- If the LRAT entry specified by MAS0ESEL and MAS2EPN exists, MAS register fields are loaded from the LRAT entry according to the following.
  - MAS1V TSIZE are loaded from the V and LSIZE fields of the LRAT entry.
  - MAS1IPROT TID TS, MAS2W I M G E, MAS3UX SX UW SW UR SR, and MAS8TGS VF are set to 0s.
  - If the Embedded.Page Table category is supported, MAS1IND is set to 0.
  - MAS2EPN is loaded from the LPN field of the LRAT entry.
  - If the VLE category is supported, MAS2VLE is set to 0.
  - If Alternate Coherency Mode is supported, MAS2ACM is set to 0.
  - MAS3RPNL is loaded from the lower 22 bits of the LRPN field of the LRAT entry, and, if implemented, MAS7RPNU is loaded from the upper 32 bits of the LRPN field of the LRAT entry.
  - The supported User-Defined storage control bits in MAS3U0:U3 are set to 0s.
  - MAS8TGS VF are set to 0s.
  - If the LPID field in the LRAT is supported (LRATCFGLPID = 1), MAS8TLPID is loaded from the TLPID field of the LRAT entry.

If TLBnCFGHES = 1 and the page size specified by MAS1TSIZE is not supported by the specified array, the tlbre may be performed as if TSIZE were some implementation-dependent value or, as described below, as if the entry can not be found, or an Illegal Instruction exception occurs.

It is implementation-dependent whether a TLB or LRAT entry can not be found or whether larger values of the fields that select an entry are simply mapped to existing entries. If the specified TLB or LRAT entry does not exist, MAS1V is set to 0 and the following MAS register fields are set to implementation-dependent undefined values.

- MAS1IPROT TID TS TSIZE, MAS2EPN W I M G E, MAS3UX SX UW SW UR SR, MAS3RPNL, and, if implemented, MAS7RPNU
- If the VLE category is supported, MAS2VLE
- If Alternate Coherency Mode is supported, MAS2ACM
- The supported User-Defined storage control bits in MAS3U0:U3
- If the Embedded.Page Table category is supported, MAS1IND
- If the Embedded.Hypervisor category is implemented, MAS8TGS VF TLPID

If the Embedded.Hypervisor category is supported, this instruction is hypervisor privileged. Otherwise, this instruction is privileged.

Special Registers Altered:
    MAS0 MAS1 MAS2 MAS3 MAS7
    MAS8 (if category E.HV is supported)

Programming Note
Hypervisor software should generally prevent guest operating system visibility of the RPN. After executing a tlbsx or tlbre on behalf of a guest, the hypervisor should replace the RPN fields in the MAS3 and MAS7 registers with the corresponding values from the appropriate LPN.

TLB Synchronize X-form

tlbsync

    31     ///    ///    ///    566    /
    0      6      11     16     21     31

The tlbsync instruction provides an ordering function for the effects of all tlbivax instructions executed by the thread executing the tlbsync instruction, with respect to the memory barrier created by a subsequent sync instruction executed by the same thread. Executing a tlbsync instruction ensures that all of the following will occur.

- All TLB invalidations caused by tlbivax instructions preceding the tlbsync instruction will have completed on any other thread before any data accesses caused by instructions following the sync instruction are performed with respect to that thread.
- All storage accesses by other threads for which the address was translated using the translations being invalidated will have been performed with respect to the thread executing the sync instruction, to the extent required by the associated Memory Coherence Required attributes, before the sync instruction's memory barrier is created.

The operation performed by this instruction is ordered by the mbar or sync instruction with respect to preceding tlbivax instructions executed by the thread executing the tlbsync instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations, which is independent of the other sets that mbar orders.

The tlbsync instruction may complete before operations caused by tlbivax instructions preceding the tlbsync instruction have been performed.

If the Embedded.Hypervisor category is supported, this instruction is hypervisor privileged. Otherwise, this instruction is privileged.

Special Registers Altered:
    None

TLB Write Entry X-form

tlbwe

    31     ///    ///    ///    978    /
    0      6      11     16     21     31

    if MAS0WQ = 0b00 | MAS0WQ = 0b01 then
      if MAS0ATSEL = 0 or MSRGS = 1 then
        if TLBnCFGHES = 0 then
          entry ← SelectTLB(MAS0TLBSEL, MAS0ESEL, MAS2EPN)
        else
          if MAS0HES = 1 then
            entry ← SelectTLB(MAS0TLBSEL, MAS1TID TSIZE, MAS2EPN, hardware_replacement_algorithm)
          else
            entry ← SelectTLB(MAS0TLBSEL, MAS1TID TSIZE, MAS2EPN, MAS0ESEL)
        if TLB array specified by MAS0TLBSEL supports NV &
           ((MAS0WQ = 0b00) | (category E.TWC supported &
            (MAS0WQ = 0b01) & (TLB-reservation))) then
          hint ← MAS0NV
        if TLB entry is found &
           ((MAS0WQ = 0b00) | ((category E.TWC supported) &
            (MAS0WQ = 0b01) & (TLB-reservation))) then
          if category E.HV.LRAT supported & (MSRGS=1) & (MAS1V=1) then
            rpn ← translate_logical_to_real(MAS7RPNU || MAS3RPNL, MAS8TLPID)
          else if MAS7 implemented then
            rpn ← MAS7RPNU || MAS3RPNL
          else rpn ← ³²0 || MAS3RPNL
          entryV IPROT TID TS SIZE ← MAS1V IPROT TID TS TSIZE
          entryEPN VLE W I M G E ACM ← MAS2EPN VLE W I M G E ACM
          entryU0:U3 ← MAS3U0:U3
          if category E.PT supported and TLB array supports indirect entries then
            entryIND ← MAS1IND
          if MAS1IND = 0 then
            entryUX SX UW SW UR SR ← MAS3UX SX UW SW UR SR
          else
            entrySPSIZE ← MAS3SPSIZE
          entryRPN ← rpn
          if (category E.HV is supported)
            entryTGS VF TLPID ← MAS8TGS VF TLPID
          if category E.TWC supported
            TLB-reservation ← 0
      else
        entry ← SelectLRAT(MAS0ESEL, MAS2EPN)
        if LRAT entry is found & (MAS0WQ = 0b00) & (MAS0HES = 0b0) then
          hint ← MAS0NV
          entryV LSIZE ← MAS1V TSIZE
          entryLPN ← MAS2EPN
          entryRPN ← MAS7RPNU || MAS3RPNL
          if LRATCFGLPID = 1
            entryLPID ← MAS8TLPID
    else if category E.TWC supported
      TLB-reservation ← 0

If the Embedded.TLB Write Conditional category is not supported, MAS0WQ is treated as if it were 0b00 in the following description.

If the Embedded.Hypervisor.LRAT category is not supported or MSRGS=1, MAS0ATSEL is treated as if it were zero in the following description.

If a TLB array is specified (MAS0ATSEL = 0) and TLBnCFGHES=0 for the TLB array selected by MAS0TLBSEL, MAS0HES is treated as 0 in the following description.

If the Embedded.Page Table category is supported, a TLB array is specified (MAS0ATSEL = 0), and the specified TLB array does not support indirect entries, MAS1IND must be 0.

If a TLB array is specified (MAS0ATSEL = 0) and MAS0WQ is 0b00 or 0b01, the following applies.

- If the Embedded.Hypervisor category is not supported, the tlbwe instruction is executed in hypervisor state, or MAS1V=0, an RPN is formed by concatenating MAS7RPNU with MAS3RPNL (RPN = MAS7RPNU || MAS3RPNL).
- If the Embedded.Hypervisor category is supported, the tlbwe instruction is executed in guest state, and MAS1V=1, an LPN is formed by concatenating MAS7RPNU with MAS3RPNL (LPN = MAS7RPNU || MAS3RPNL). However, if MAS7 is not implemented, LPN = ³²0 || MAS3RPNL. This LPN is translated by the LRAT to obtain the RPN. If there is no LRAT entry that translates this LPN for the LPID specified by LPIDR, an LRAT Miss exception occurs. However, if MAS0WQ is 0b01 and no TLB-reservation exists, it is implementation-dependent whether the LRAT Miss exception occurs.
- If TLBnCFGHES for the TLB array selected by MAS0TLBSEL is 0, the TLB entry is specified by MAS0TLBSEL, MAS0ESEL, and MAS2EPN. If TLBnCFGHES is 1 and MAS0HES is 1, the TLB entry is selected by MAS0TLBSEL, a hardware replacement algorithm, and a hardware generated hash based on MAS1TID TSIZE, and MAS2EPN. If TLBnCFGHES is 1 and MAS0HES is 0, the TLB entry is selected by MAS0TLBSEL, MAS0ESEL and a hardware generated hash based on MAS0TLBSEL, MAS1TID TSIZE, and MAS2EPN.
- The selected TLB entry is written (see the following major bulleted item) if all the following conditions are met.
  - There is no LRAT Miss exception.
  - MAS0WQ is 0b00 or both the following are true.
    - MAS0WQ is 0b01.
    - A TLB-reservation exists.
  - MAS1IPROT is 0, the Embedded.Hypervisor category is not supported, or MSRGS = 0.
  - The selected TLB entry has IPROT = 0, the Embedded.Hypervisor category is not supported, or MSRGS = 0.
- Use the first of the following sub-bullets that applies.
  - If EPCRDGTMI=1, no TLB entry is written and a Hypervisor Privilege exception occurs.
  - If the selected entry exists, the selected entry has IPROT=1, MAS0WQ=0b00, and MSRGS=1, no TLB entry is written, and a Hypervisor Privilege exception occurs.
  - If MAS0WQ=0b00, MAS1V=1, MAS1IPROT=1, and MSRGS=1, no TLB entry is written, and a Hypervisor Privilege exception occurs.
  - If MAS0WQ=0b00, MAS1V=0, MAS1IPROT=1, and MSRGS=1, no TLB entry is written, and it is implementation-dependent whether a Hypervisor Privilege exception occurs.
  - If the selected entry has IPROT=1, MAS0WQ=0b01, and MSRGS=1, no TLB entry is written, and it is implementation-dependent whether a Hypervisor Privilege exception occurs.
  - If MAS0WQ=0b01, MAS1IPROT=1, and MSRGS=1, no TLB entry is written, and it is implementation-dependent whether a Hypervisor Privilege exception occurs.
  - If MAS0HES=0 and MSRGS=1, no TLB entry is written, and it is implementation-dependent whether a Hypervisor Privilege exception occurs.

If a TLB entry is to be written per the preceding description, then regardless of whether the selected TLB entry exists, MAS0NV provides a suggestion to hardware of what the hardware hint for replacement should be when the next Data or Instruction TLB Error Interrupt occurs for a virtual address that uses the set of TLB entries containing the entry written by the tlbwe instruction.

If the selected TLB entry exists and the TLB entry is to be written per the preceding description, the fields of the TLB entry are loaded from the MAS registers according to the following.

- The V, TID, TS, and SIZE fields of the TLB entry are loaded from MAS1V TID TS TSIZE.
- If the TLB array supports IPROT, the IPROT bit of the TLB entry is loaded from MAS1IPROT.
- The EPN, W, I, M, G, and E fields of the TLB entry are loaded from MAS2EPN W I M G E.
- If the VLE category is supported, the VLE bit of the TLB entry is loaded from MAS2VLE.
- If Alternate Coherency Mode is supported, the ACM bit of the TLB entry is loaded from MAS2ACM.
- The RPN field of the TLB entry is loaded from the RPN described above.
- The supported User-Defined storage control bits (U0:U3) of the TLB entry are loaded from the respective bits in MAS3U0:U3.
- If the Embedded.Page Table category is supported and the TLB array supports indirect entries, the following applies.
  - The IND of the TLB entry is loaded from MAS1IND.
  - If MAS1IND is 1, the SPSIZE field of the TLB entry is loaded from MAS3SPSIZE.
  - If MAS1IND is 0, the UX, SX, UW, SW, UR, and SR bits of the TLB entry are loaded from MAS3UX SX UW SW UR SR.
- If the Embedded.Hypervisor category is implemented, the TGS, VF, and TLPID fields of the TLB entry are loaded from MAS8TGS VF TLPID.

If the LRAT array is specified (MAS0ATSEL = 1) for a tlbwe, MAS0WQ must be 0b00 and MAS0HES must be 0. If the LRAT array is specified (MAS0ATSEL = 1), MAS0WQ is 0b00, MAS0HES is 0, and the tlbwe instruction is executed in hypervisor state, the following applies.

- An RPN is formed by concatenating MAS7RPNU with MAS3RPNL (RPN = MAS7RPNU || MAS3RPNL).
- The contents of the MAS1V TSIZE, MAS2EPN, and the RPN described above are written to the selected LRAT entryV LSIZE LPN RPN.
- If the LPID field in the LRAT is supported (LRATCFGLPID = 1), MAS8TLPID is written to the LPID field of the selected entry.

If no exception occurs, and either MAS0WQ is 0b10 or a TLB array was selected by the tlbwe (MAS0ATSEL=0 or MSRGS=1), the TLB-reservation is cleared.

If MAS0WQ is 0b10, no TLB entry is written. If MAS0WQ is 0b11, the instruction is treated as if the instruction form is invalid.

If the page size specified by MAS1TSIZE is not supported by the specified array, the tlbwe may be performed as if TSIZE were some implementation-dependent value, or an Illegal Instruction exception occurs.

If TLBnCFGHES=1 for the selected TLB array, a TLB write does not necessarily invalidate implementation-specific TLB lookaside information. See Section 6.11.4.4.

This instruction is hypervisor privileged if the Embedded.Hypervisor category is supported and any of the following is true.

- The Embedded.Hypervisor.LRAT category is not supported.
- MSRGS = 1 and, for the TLB array selected by MAS0TLBSEL, TLBnCFGGTWE = 0.
- Guest execution of TLB Management instructions is disabled (EPCRDGTMI=1).

Otherwise, this instruction is privileged.

Special Registers Altered:
    None

Programming Note
Care must be taken not to invalidate any TLB entry that contains the mapping for any interrupt vector.
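Two pieces of the tlbwe description lend themselves to a short worked sketch: forming the RPN from MAS7RPNU and MAS3RPNL, and the list of conditions under which the selected TLB entry is written. The Python below is a minimal illustration under stated assumptions, not a model of any implementation. The function names and parameters are hypothetical, the field widths (32 bits for RPNU, 22 bits for RPNL) are taken from the tlbre and tlbsx field descriptions, and the Hypervisor Privilege and LRAT Miss exception outcomes are reduced to a plain "not written" result.

```python
def form_rpn(mas7_rpnu: int, mas3_rpnl: int, mas7_implemented: bool = True) -> int:
    """RPN = MAS7[RPNU] || MAS3[RPNL]; 32 zero bits stand in for RPNU if MAS7 is absent."""
    upper = (mas7_rpnu & 0xFFFFFFFF) if mas7_implemented else 0
    return (upper << 22) | (mas3_rpnl & 0x3FFFFF)


def tlb_entry_is_written(wq: int, reservation: bool, lrat_miss: bool,
                         mas1_iprot: int, entry_iprot: int,
                         hv_supported: bool, msr_gs: int) -> bool:
    """Return True only if every listed write condition holds (hypothetical helper)."""
    if lrat_miss:
        return False
    # MAS0[WQ] is 0b00, or it is 0b01 and a TLB-reservation exists
    if not (wq == 0b00 or (wq == 0b01 and reservation)):
        return False
    guest = hv_supported and msr_gs == 1
    # IPROT in MAS1 and in the selected entry only block the write in guest state
    if guest and (mas1_iprot != 0 or entry_iprot != 0):
        return False
    return True


print(hex(form_rpn(0x1, 0x3FFFFF)))
print(tlb_entry_is_written(0b01, reservation=False, lrat_miss=False,
                           mas1_iprot=0, entry_iprot=0,
                           hv_supported=True, msr_gs=0))
```

Keeping the two helpers separate mirrors the text, which first derives the RPN (or an LPN to be translated by the LRAT) and only then decides whether the write takes place.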
If a TLB entry is to be written per the preceding description, MAS1IND=1, and values of I, M, G, and E to be written to the TLB entry are inconsistent with storage that is not Caching Inhibited, Memory Coherence Required, not Guarded, and Big-Endian, the tlbwe may be performed as described or an Illegal Instruction exception occurs. Also, if a TLB entry is to be written per the preceding description, MAS1IND=1, and values of ACM and U0:U3 to be written to the TLB entry are inconsistent with the requirements that an implementation has for storage control attributes for a Page Table, the tlbwe may be performed as described or an Illegal Instruction exception occurs.

If an invalid value is specified for MAS0TLBSEL MAS0ESEL or MAS2EPN, either no TLB entry is written by the tlbwe, or the tlbwe is performed as if some implementation-dependent, valid value were substituted for the invalid value, or an Illegal Instruction exception occurs.

A context synchronizing instruction is required after a tlbwe instruction to ensure any subsequent instructions that will use the updated TLB or LRAT values execute in the new context.

Chapter 7. Interrupts and Exceptions

7.1 Overview
7.2 Interrupt Registers
  7.2.1 Save/Restore Register 0
  7.2.2 Save/Restore Register 1
  7.2.3 Guest Save/Restore Register 0 [Category: Embedded.Hypervisor]
  7.2.4 Guest Save/Restore Register 1 [Category: Embedded.Hypervisor]
  7.2.5 Critical Save/Restore Register 0
  7.2.6 Critical Save/Restore Register 1
  7.2.7 Debug Save/Restore Register 0 [Category: Embedded.Enhanced Debug]
  7.2.8 Debug Save/Restore Register 1 [Category: Embedded.Enhanced Debug]
  7.2.9 Data Exception Address Register
  7.2.10 Guest Data Exception Address Register [Category: Embedded.Hypervisor]
  7.2.11 Interrupt Vector Prefix Register
  7.2.12 Guest Interrupt Vector Prefix Register [Category: Embedded.Hypervisor.Phased-Out]
  7.2.13 Exception Syndrome Register
  7.2.14 Guest Exception Syndrome Register [Category: Embedded.Hypervisor]
  7.2.15 Interrupt Vector Offset Registers [Category: Embedded.Phased-Out]
  7.2.16 Guest Interrupt Vector Offset Register [Category: Embedded.Hypervisor.Phased-Out]
  7.2.17 Logical Page Exception Register [Category: Embedded.Hypervisor and Embedded.Page Table]
  7.2.18 Machine Check Registers
    7.2.18.1 Machine Check Save/Restore Register 0
    7.2.18.2 Machine Check Save/Restore Register 1
    7.2.18.3 Machine Check Syndrome Register
    7.2.18.4 Machine Check Interrupt Vector Prefix Register
  7.2.19 External Proxy Register [Category: External Proxy]
  7.2.20 Guest External Proxy Register [Category: Embedded Hypervisor, External Proxy]
7.3 Exceptions
7.4 Interrupt Classification
  7.4.1 Asynchronous Interrupts
  7.4.2 Synchronous Interrupts
    7.4.2.1 Synchronous, Precise Interrupts
    7.4.2.2 Synchronous, Imprecise Interrupts
  7.4.3 Interrupt Classes
  7.4.4 Machine Check Interrupts
7.5 Interrupt Processing
7.6 Interrupt Definitions
  7.6.1 Interrupt Fixed Offsets [Category: Embedded.Phased-In]
  7.6.2 Critical Input Interrupt
  7.6.3 Machine Check Interrupt
  7.6.4 Data Storage Interrupt
  7.6.5 Instruction Storage Interrupt
  7.6.6 External Input Interrupt
  7.6.7 Alignment Interrupt
  7.6.8 Program Interrupt
  7.6.9 Floating-Point Unavailable Interrupt
  7.6.10 System Call Interrupt
  7.6.11 Auxiliary Processor Unavailable Interrupt
  7.6.12 Decrementer Interrupt
  7.6.13 Fixed-Interval Timer Interrupt
  7.6.14 Watchdog Timer Interrupt
  7.6.15 Data TLB Error Interrupt
  7.6.16 Instruction TLB Error Interrupt
  7.6.17 Debug Interrupt
  7.6.18 SPE/Embedded Floating-Point/Vector Unavailable Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, Vector]
  7.6.19 Embedded Floating-Point Data Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]
  7.6.20 Embedded Floating-Point Round Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]
  7.6.21 Performance Monitor Interrupt [Category: Embedded.Performance Monitor]
  7.6.22 Processor Doorbell Interrupt [Category: Embedded.Processor Control]
  7.6.23 Processor Doorbell Critical Interrupt [Category: Embedded.Processor Control]
  7.6.24 Guest Processor Doorbell Interrupt [Category: Embedded.Hypervisor, Embedded.Processor Control]
  7.6.25 Guest Processor Doorbell Critical Interrupt [Category: Embedded.Hypervisor, Embedded.Processor Control]
  7.6.26 Guest Processor Doorbell Machine Check Interrupt [Category: Embedded.Hypervisor, Embedded.Processor Control]
  7.6.27 Embedded Hypervisor System Call Interrupt [Category: Embedded.Hypervisor]
  7.6.28 Embedded Hypervisor Privilege Interrupt [Category: Embedded.Hypervisor]
  7.6.29 LRAT Error Interrupt [Category: Embedded.Hypervisor.LRAT]
7.7 Partially Executed Instructions
7.8 Interrupt Ordering and Masking
  7.8.1 Guidelines for System Software
  7.8.2 Interrupt Order
7.9 Exception Priorities
  7.9.1 Exception Priorities for Defined Instructions
    7.9.1.1 Exception Priorities for Defined Floating-Point Load and Store Instructions
    7.9.1.2 Exception Priorities for Other Defined Load and Store Instructions and Defined Cache Management Instructions
    7.9.1.3 Exception Priorities for Other Defined Floating-Point Instructions
    7.9.1.4 Exception Priorities for Defined Privileged Instructions
    7.9.1.5 Exception Priorities for Defined Trap Instructions
    7.9.1.6 Exception Priorities for Defined System Call Instruction
    7.9.1.7 Exception Priorities for Defined Branch Instructions
    7.9.1.8 Exception Priorities for Defined Return From Interrupt Instructions
    7.9.1.9 Exception Priorities for Other Defined Instructions
  7.9.2 Exception Priorities for Reserved Instructions

7.1 Overview

An interrupt is the action in which the thread saves its old context (MSR and next instruction address) and begins execution at a pre-determined interrupt-handler address, with a modified MSR. Exceptions are the events that will, if enabled, cause the thread to take an interrupt.

Exceptions are generated by signals from internal and external peripherals, instructions, the internal timer facility, debug events, or error conditions.

Interrupts are divided into 4 classes, as described in Section 7.4.3, such that only one interrupt of each class is reported, and when it is processed no program state is lost. Since Save/Restore register pairs GSRR0/GSRR1, SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1 [Category: E.ED], and MCSRR0/MCSRR1 are serially reusable resources used by guest, base, critical, debug [Category: E.ED], Machine Check interrupts, respectively, program state may be lost when an unordered interrupt is taken. (See Section 7.8, "Interrupt Ordering and Masking".)

7.2 Interrupt Registers

7.2.1 Save/Restore Register 0

Save/Restore Register 0 (SRR0) is a 64-bit register. SRR0 bits are numbered 0 (most-significant bit) to 63 (least-significant bit). The register is used to save machine state on non-critical interrupts, and to restore machine state when an rfi is executed. On a non-critical interrupt, SRR0 is set to the current or next instruction address. When rfi is executed, instruction execution continues at the address in SRR0.

In general, SRR0 contains the address of the instruction that caused the non-critical interrupt, or the address of the instruction to return to after a non-critical interrupt is serviced.

The contents of SRR0 when an interrupt is taken are mode dependent, reflecting the computation mode when the interrupt is taken and the computation mode entered for execution of the interrupt (specified by EPCRICM). When the computation mode when the interrupt is taken is 32-bit mode and the computation mode entered for execution of the interrupt is 64-bit mode, the high-order 32 bits of SRR0 are set to 0s. When the computation mode when the interrupt is taken is 64-bit mode and the computation mode entered for execution of the interrupt is 32-bit mode, the contents of SRR0 are undefined.

The contents of SRR0 upon interrupt can be described as follows (assuming Addr is the address to be put into SRR0):

    if (MSRCM = 0) & (EPCRICM = 0) then SRR0 ← ³²undefined || Addr32:63
    if (MSRCM = 0) & (EPCRICM = 1) then SRR0 ← ³²0 || Addr32:63
    if (MSRCM = 1) & (EPCRICM = 1) then SRR0 ← Addr0:63
    if (MSRCM = 1) & (EPCRICM = 0) then SRR0 ← undefined

The contents of SRR0 can be read into register RT using mfspr RT,SRR0. The contents of register RS can be written into the SRR0 using mtspr SRR0,RS.

This register is hypervisor privileged.

7.2.2 Save/Restore Register 1

Save/Restore Register 1 (SRR1) is a 32-bit register. SRR1 bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The register is used to save machine state on non-critical interrupts, and to restore machine state when an rfi is executed. When a non-critical interrupt is taken, the contents of the MSR are placed into SRR1. When rfi is executed, the contents of SRR1 are placed into the MSR.

Bits of SRR1 that correspond to reserved bits in the MSR are also reserved.

Programming Note
A MSR bit that is reserved may be inadvertently modified by rfi/rfci/rfmci.

The contents of SRR1 can be read into register RT using mfspr RT,SRR1.

7.2.3 Guest Save/Restore Register 0 [Category: Embedded.Hypervisor]

Guest Save/Restore Register 0 (GSRR0) is a 64-bit register. GSRR0 bits are numbered 0 (most-significant bit) to 63 (least-significant bit). The register is used to save machine state on guest interrupts, and to restore machine state when an rfgi is executed. On a guest interrupt, GSRR0 is set to the current or next instruction address. When rfgi is executed, instruction execution continues at the address in GSRR0.

In general, GSRR0 contains the address of the instruction that caused the guest interrupt, or the address of the instruction to return to after a guest interrupt is serviced.

The contents of GSRR0 when an interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSRCM) and the computation mode entered for execution of the interrupt (specified by EPCRGICM). The contents of GSRR0 upon interrupt can be described as follows (assuming Addr is the address to be put into GSRR0):

    if (MSRCM = 0) & (EPCRGICM = 0) then GSRR0 ← ³²undefined || Addr32:63
    if (MSRCM = 0) & (EPCRGICM = 1) then GSRR0 ← ³²0 || Addr32:63
    if (MSRCM = 1) & (EPCRGICM = 1) then GSRR0 ← Addr0:63
    if (MSRCM = 1) & (EPCRGICM = 0) then GSRR0 ← undefined

The contents of GSRR0 can be read into register RT using mfspr RT,GSRR0. The contents of register RS can be written into the GSRR0 using mtspr GSRR0,RS.

This register is privileged.

Programming Note
mfspr RT,SRR0 should be used to read GSRR0 in guest supervisor state. mtspr SRR0,RS should be used to write GSRR0 in guest supervisor state. See Section 2.2.1, "Register Mapping".

7.2.4 Guest Save/Restore Register 1 [Category: Embedded.Hypervisor]

Guest Save/Restore Register 1 (GSRR1) is a 32-bit register. The contents of register RS
GSRR1 bits are numbered 32 (most-signifi- can be written into the SRR1 using mtspr SRR1,RS. cant bit) to 63 (least-significant bit). The register is used This register is hypervisor privileged. to save machine state on guest interrupts, and to restore machine state when an rfgi is executed. When a guest interrupt is taken, the contents of the MSR are placed into GSRR1. When rfgi is executed, the con- tents of GSRR1 are placed into the MSR. Chapter 7. Interrupts and Exceptions 1015 Version 2.06 Bits of GSRR1 that correspond to reserved bits in the The contents of CSRR0 can be read into register RT MSR are also reserved. using mfspr RT,CSRR0. The contents of register RS can be written into CSRR0 using mtspr CSRR0,RS. Programming Note This register is hypervisor privileged. A MSR bit that is reserved may be inadvertently modified by rfi/rfgi/rfci/rfdi/rfmci. 7.2.6 Critical Save/Restore Regis- The contents of GSRR1 can be read into register RT ter 1 using mfspr RT,GSRR1. The contents of register RS can be written into the GSRR1 using mtspr Critical Save/Restore Register 1 (CSRR1) is a 32-bit GSRR1,RS. register. CSRR1 bits are numbered 32 (most-signifi- cant bit) to 63 (least-significant bit). The register is used This register is privileged. to save machine state on critical interrupts, and to restore machine state when an rfci is executed. When Programming Note a critical interrupt is taken, the contents of the MSR are mfspr RT,SRR1 should be used to read GSRR1 in placed into CSRR1. When rfci is executed, the con- guest supervisor state. mtspr SRR1,RS should be tents of CSRR1 are placed into the MSR. used to write GSRR1 in guest supervisor state. See Section 2.2.1, "Register Mapping". Bits of CSRR1 that correspond to reserved bits in the MSR are also reserved. Programming Note 7.2.5 Critical Save/Restore Regis- A MSR bit that is reserved may be inadvertently ter 0 modified by rfi/rfci/rfmci. Critical Save/Restore Register 0 (CSRR0) is a 64-bit register. 
CSRR0 bits are numbered 0 (most-significant The contents of CSRR1 can be read into bits 32:63 of bit) to 63 (least-significant bit). The register is used to register RT using mfspr RT,CSRR1, setting bits 0:31 save machine state on critical interrupts, and to restore of RT to zero. The contents of bits 32:63 of register RS machine state when an rfci is executed. When a critical can be written into the CSRR1 using mtspr interrupt is taken, the CSRR0 is set to the current or CSRR1,RS. next instruction address. When rfci is executed, This register is hypervisor privileged. instruction execution continues at the address in CSRR0. 7.2.7 Debug Save/Restore Regis- In general, CSRR0 contains the address of the instruc- tion that caused the critical interrupt, or the address of ter 0 [Category: Embed- the instruction to return to after a critical interrupt is ser- ded.Enhanced Debug] viced. Debug Save/Restore Register 0 (DSRR0) is a 64-bit The contents of CSRR0 when an interrupt is taken are register used to save machine state on Debug inter- mode dependent, reflecting the computation mode rupts, and to restore machine state when an rfdi is exe- when the interrupt is taken and the computation mode entered for execution of the interrupt (specified by cuted. When a Debug interrupt is taken, the DSRR0 is EPCRICM) [Category:Embedded.Hypervisor]. If compu- set to the current or next instruction address. When rfdi tation mode when the interrupt is taken is 32-bit mode is executed, instruction execution continues at the and the computation mode entered for execution of the address in DSRR0. interrupt is 64-bit mode, the high-order 32 bits of In general, DSRR0 contains the address of an instruc- CSRR0 are set to 0s. When computation mode when the interrupt is taken is 64-bit mode and the computa- tion that was executing or just finished execution when tion mode entered for execution of the interrupt is 32-bit the Debug exception occurred. mode, the contents CSRR0 are undefined. 
The contents of DSRR0 when an interrupt is taken are The contents of CSRR0 upon critical interrupt can be mode dependent, reflecting the computation mode described as follows (assuming Addr is the address to when the interrupt is taken and the computation mode be put into CSRR0): entered for execution of the interrupt (specified by EPCRICM) [Category:Embedded.Hypervisor]. If compu- tation mode when the interrupt is taken is 32-bit mode if (MSRCM = 0) & (EPCRICM = 0) then CSRR0 32undefined || Addr32:63 and the computation mode entered for execution of the if (MSRCM = 0) & (EPCRICM = 1) interrupt is 64-bit mode, the high-order 32 bits of then CSRR0 320 || Addr32:63 DSRR0 are set to 0s. When computation mode when if (MSRCM = 1) & (EPCRICM = 1) then CSRR0 Addr0:63 the interrupt is taken is 64-bit mode and the computa- if (MSRCM = 1)&(EPCRICM = 0) then CSRR0 undefined 1016 Power ISATM Book III-E Version 2.06 tion mode entered for execution of the interrupt is 32-bit mode, the contents DSRR0 are undefined. if (MSRCM = 0) & (EPCRICM = 0) then DEAR 32undefined || Addr32:63 The contents of DSRR0 upon Debug interrupt can be if (MSRCM = 0) & (EPCRICM = 1) described as follows (assuming Addr is the address to then DEAR 320 || Addr32:63 be put into DSRR0): if (MSRCM = 1) & (EPCRICM = 1) then DEAR Addr0:63 if (MSRCM = 1) & (EPCRICM = 0) then DEAR undefined 32 if (MSRCM = 0) & (EPCRICM = 0) then DSRR0 undefined || Addr32:63 The contents of DEAR can be read into register RT if (MSRCM = 0) & (EPCRICM = 1) then DSRR0 32 0 || Addr32:63 using mfspr RT,DEAR. The contents of register RS if (MSRCM = 1) & (EPCRICM = 1) then DSRR0 Addr0:63 can be written into the DEAR using mtspr DEAR,RS. if (MSRCM = 1) & (EPCRICM = 0) then DSRR0 undefined This register is hypervisor privileged. The contents of DSRR0 can be read into register RT using mfspr RT,DSRR0. The contents of register RS can be written into DSRR0 using mtspr DSRR0,RS. 
7.2.10 Guest Data Exception This register is hypervisor privileged. Address Register [Category: Embedded.Hypervisor] 7.2.8 Debug Save/Restore Regis- The Guest Data Exception Address Register (GDEAR) ter 1 [Category: Embed- is a 64-bit register. GDEAR bits are numbered 0 (most- significant bit) to 63 (least-significant bit). The GDEAR ded.Enhanced Debug] contains the address that was referenced by a Load, Store or Cache Management instruction that caused an Debug Save/Restore Register 1 (DSRR1) is a 32-bit Alignment, Data TLB Miss, or Data Storage interrupt register used to save machine state on Debug inter- that was directed to the guest supervisor state. The rupts, and to restore machine state when an rfdi is exe- GDEAR is identical in form and function to DEAR cuted. When a Debug interrupt is taken, the contents of the Machine State Register are placed into DSRR1. The contents of the GDEAR when an interrupt is taken When rfdi is executed, the contents of DSRR1 are are mode dependent, reflecting the computation mode placed into the Machine State Register. currently in use (specified by MSRCM) and the compu- tation mode entered for execution of the interrupt Bits of DSRR1 that correspond to reserved bits in the (specified by EPCRGICM). The contents of the GDEAR Machine State Register are also reserved. upon interrupt can be described as follows (assuming The contents of DSRR1 can be read into bits 32:63 of Addr is the address to be put into GDEAR): register RT using mfspr RT,DSRR1, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS if (MSRCM = 0) & (EPCRGICM = 0) can be written into the DSSR1 using mtspr then GDEAR 32undefined || Addr32:63 DSRR1,RS. if (MSRCM = 0) & (EPCRGICM = 1) then GDEAR 320 || Addr32:63 This register is hypervisor privileged. 
if (MSRCM = 1) & (EPCRGICM = 1) then GDEAR Addr0:63 if (MSRCM = 1)&(EPCRGICM = 0) 7.2.9 Data Exception Address then GDEAR undefined Register The contents of GDEAR can be read into register RT using mtspr RT,GDEAR. The contents of register RS The Data Exception Address Register (DEAR) is a 64- can be written into the GDEAR using mtspr bit register. DEAR bits are numbered 0 (most-signifi- GDEAR,RS. cant bit) to 63 (least-significant bit). The DEAR contains the address that was referenced by a Load, Store or This register is privileged. Cache Management instruction that caused an LRAT Error interrupt or that caused an Alignment, Programming Note Data TLB Miss, Data Storage interrupt if either the mfspr RT,DEAR should be used to read GDEAR in Embedded.Hypervisor category is not supported or the guest supervisor state. mtspr DEAR,RS should be interrupt is directed to the hypervisor. used to write GDEAR in guest supervisor state. The contents of the DEAR when an interrupt is taken See Section 2.2.1, "Register Mapping". are mode dependent, reflecting the computation mode currently in use (specified by MSRCM) and the compu- tation mode entered for execution of the critical inter- rupt (specified by EPCRICM). The contents of the DEAR upon interrupt can be described as follows (assuming Addr is the address to be put into DEAR): Chapter 7. Interrupts and Exceptions 1017 Version 2.06 7.2.11 Interrupt Vector Prefix Reg- Interrupt Vector Prefix Register to form the 64-bit address of the exception processing routine. ister If Interrupt Fixed Offsets [Category: Embed- The Interrupt Vector Prefix Register (IVPR) is a 64-bit ded.Phased-In] are supported, the following applies. register. Interrupt Vector Prefix Register bits are num- GIVPR52:63 are reserved. For interrupts directed to bered 0 (most-significant bit) to 63 (least-significant bit). 
guest state, bits 0:51 of the Guest Interrupt Vector Pre- fix Register provide the high-order 52 bits of the The IVPR is used for Machine Check interrupt if the address of the exception processing routines. The 12- MCIVPR is not supported. The IVPR is used for other bit exception vector offsets (provided in Section 7.2.15) interrupts if Category E.HV is not supported or if the are concatenated to the right of bits 0:47 of the Guest interrupt is directed to the hypervisor state. For these Interrupt Vector Prefix Register to form the 64-bit interrupts, the IVPR is used in one of the following address of the exception processing routine. ways. If Interrupt Vector Offset Registers [Category: The contents of Guest Interrupt Vector Prefix Register Embedded.Phased-Out] are supported, the follow- can be read into register RT using mfspr RT,GIVPR. ing applies. Bits 48:63 are reserved. Bits 0:47 of The contents of register RS can be written into Interrupt the Interrupt Vector Prefix Register provide the Vector Prefix Register using mtspr GIVPR,RS. high-order 48 bits of the address of the exception processing routines. The 16-bit exception vector Write access to this register is hypervisor privileged. offsets from the appropriate IVOR (provided in Read access to this register is privileged. Section 7.2.15) are concatenated to the right of bits 0:47 of the Interrupt Vector Prefix Register to Programming Note form the 64-bit address of the exception process- mfspr RT,IVPR should be used to read GIVPR in ing routine. guest supervisor state. mtspr IVPR,RS should be If Interrupt Fixed Offsets [Category: Embed- used to write GIVPR in guest supervisor state. ded.Phased-In] are supported, the following Hypervisor software should emulate the accesses applies. IVPR52:63 are reserved. Bits 0:51 of the for the guest. Interrupt Vector Prefix Register provide the high- order 52 bits of the address of the exception pro- cessing routines. 
The 12-bit exception vector off- sets (provided in Section 7.2.15) are concatenated to the right of bits 0:47 of the Interrupt Vector Prefix Register to form the 64-bit address of the excep- tion processing routine. The contents of Interrupt Vector Prefix Register can be read into register RT using mfspr RT,IVPR. The con- tents of register RS can be written into Interrupt Vector Prefix Register using mtspr IVPR,RS. This register is hypervisor privileged. 7.2.12 Guest Interrupt Vector Pre- fix Register [Category: Embed- ded.Hypervisor.Phased-Out] The Guest Interrupt Vector Prefix Register (GIVPR) is a 64-bit register. Interrupt Vector Prefix Register bits are numbered 0 (most-significant bit) to 63 (least-signifi- cant bit). If Interrupt Vector Offset Registers [Category: Embed- ded.Phased-Out] are supported, the following applies. GIVPR48:63 are reserved. For interrupts directed to guest state, bits 0:47 of the Guest Interrupt Vector Pre- fix Register provides the high-order 48 bits of the address of the exception processing routines. The 16- bit exception vector offsets (provided in Section 7.2.15) are concatenated to the right of bits 0:47 of the Guest 1018 Power ISATM Book III-E Version 2.06 7.2.13 Exception Syndrome Register The Exception Syndrome Register (ESR) is a 32-bit the bit or bits corresponding to the specific exception register. ESR bits are numbered 32 (most-significant that generated the interrupt is set, and all other ESR bit) to 63 (least-significant bit). The ESR provides a bits are cleared. Other interrupt types do not affect the syndrome to differentiate between the different kinds of contents of the ESR. The ESR does not need to be exceptions that can generate the same interrupt type. cleared by software. Figure 52 shows the bit definitions Upon the generation of one of these types of interrupts, for the ESR. 
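As an illustration of the ESR bit numbering (bit 32 is the most-significant bit of the 32-bit register, bit 63 the least-significant), the following Python sketch maps a raw ESR value to the bit names that Figure 52 defines. It is not part of the architecture: the names ESR_BITS, esr_mask, and decode_esr are hypothetical, and the implementation-dependent and reserved bit ranges are deliberately left out.

```python
# Hypothetical helper for illustration only; the names are not architected.
# Bit positions and mnemonics are taken from Figure 52. ESR bits are
# numbered 32 (most-significant bit) to 63 (least-significant bit).
ESR_BITS = {
    36: "PIL", 37: "PPR", 38: "PTR", 39: "FP", 40: "ST",
    42: "DLK0", 43: "DLK1", 44: "AP", 45: "PUO", 46: "BO", 47: "PIE",
    53: "DATA", 54: "TLBI", 55: "PT", 56: "SPV", 57: "EPID",
    58: "VLEMI", 62: "MIF",
}


def esr_mask(bit):
    """Return the 32-bit mask for architected ESR bit number `bit` (32..63)."""
    if not 32 <= bit <= 63:
        raise ValueError("ESR bits are numbered 32 to 63")
    return 1 << (63 - bit)


def decode_esr(value):
    """List the names of the named ESR bits that are set in a raw value."""
    return [name for bit, name in sorted(ESR_BITS.items())
            if value & esr_mask(bit)]


# Example: a store access that also raised a Byte Ordering exception.
print(decode_esr(esr_mask(40) | esr_mask(46)))
```

A handler would typically apply such a decode to the value obtained with mfspr RT,ESR. The set bits narrow down the cause, but they do not necessarily identify every exception that contributed to the interrupt.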
Bit(s) Name Meaning Associated Interrupt Type 32:35 Implementation-dependent (Implementation-dependent) 36 PIL Illegal Instruction exception Program 37 PPR Privileged Instruction exception Program 38 PTR Trap exception Program 39 FP Floating-point operation Alignment Data Storage Data TLB Program 40 ST Store operation Alignment Data Storage Data TLB Error 41 Reserved 42 DLK0 (Implementation-dependent) (Implementation-dependent) 43 DLK1 (implementation-dependent) (Implementation-dependent) 44 AP Auxiliary Processor operation Alignment Data Storage Data TLB Program 45 PUO Unimplemented Operation exception Program 46 BO Byte Ordering exception Data Storage Instruction Storage 47 PIE Imprecise exception Program 48:52 Reserved 53 DATA Data Access [Category: Embedded.Page LRAT Error Table] 54 TLBI TLB Ineligible [Category: Embedded.Page Data Storage Table] Instruction Storage 55 PT Page Table [Category: Embedded.Page Data Storage Table] Instruction Storage LRAT Error 56 SPV Signal Processing operation [Category: Sig- Alignment nal Processing Engine] Data Storage Vector operation [Category: Vector] Data TLB Embedded Floating-point Data Embedded Floating-point Round SPE/Embedded Floating-point/Vector Unavailable 57 EPID External Process ID operation [Category: Alignment Embedded.External Process ID] Data Storage Data TLB 58 VLEMI VLE operation [Category: VLE] Alignment Data Storage Data TLB SPE/Embedded Floating-point/Vector Unavailable Embedded Floating-point Data Embedded Floating-point Round Instruction Storage Program System Call Chapter 7. Interrupts and Exceptions 1019 Version 2.06 Bit(s) Name Meaning Associated Interrupt Type 59:61 Implementation-dependent (Implementation-dependent) 62 MIF Misaligned Instruction [Category: VLE] Instruction TLB Instruction Storage Figure 52. Exception Syndrome Register 7.2.15 Interrupt Vector Offset Definitions Registers [Category: Embed- Programming Note ded.Phased-Out] The information provided by the ESR is not com- plete. 
System software may also need to identify The Interrupt Vector Prefix Register (IVPR) is a 64-bit the type of instruction that caused the interrupt, register. Interrupt Vector Prefix Register bits are num- examine the TLB entry accessed by a data or bered 0 (most-significant bit) to 63 (least-significant bit). instruction storage access, as well as examine the The IVPR is used for Machine Check interrupt if the ESR to fully determine what exception or excep- MCIVPR is not supported. The IVPR is used for other tions caused the interrupt. For example, a Data interrupts if Category E.HV is not supported or if the Storage interrupt may be caused by both a Protec- interrupt is directed to the hypervisor state. For these tion Violation exception as well as a Byte Ordering interrupts, the IVPR is used in one of the following exception. System software would have to look ways. beyond ESRBO, such as the state of MSRPR in If Interrupt Vector Offset Registers [Category: SRR1 and the page protection bits in the TLB entry Embedded.Phased-Out] are supported, the follow- accessed by the storage access, to determine ing applies. Bits 48:63 are reserved. Bits 0:47 of whether or not a Protection Violation also occurred. the Interrupt Vector Prefix Register provide the high-order 48 bits of the address of the exception The contents of the ESR can be read into bits 32:63 of processing routines. The 16-bit exception vector register RT using mfspr RT,ESR, setting bits 0:31 of RT offsets from the appropriate IVOR (provided in to zero. The contents of bits 32:63 of register RS can Section 7.2.15) are concatenated to the right of be written into the ESR using mtspr ESR,RS. bits 0:47 of the Interrupt Vector Prefix Register to form the 64-bit address of the exception process- This register is hypervisor privileged. ing routine. If Interrupt Fixed Offsets [Category: Embed- 7.2.14 Guest Exception Syndrome ded.Phased-In] are supported, the following applies. IVPR52:63 are reserved. 
Bits 0:51 of the Register [Category: Embed- Interrupt Vector Prefix Register provide the high- ded.Hypervisor] order 52 bits of the address of the exception pro- cessing routines. The 12-bit exception vector off- The Guest Exception Syndrome Register (GESR) is a sets (provided in Section 7.2.15) are concatenated 32-bit register. GESR bits are numbered 32 (most-sig- to the right of bits 0:47 of the Interrupt Vector Prefix nificant bit) to 63 (least-significant bit). The GESR is Register to form the 64-bit address of the excep- identical in form and function to the ESR, but is tion processing routine. updated in place of the ESR when an interrupt is directed to the guest. For a description of bit settings and meanings see Section 7.2.13, "Exception Syn- drome Register". The contents of the GESR can be read into bits 32:63 of register RT using mfspr RT,GESR, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS can be written into the GESR using mtspr GESR,RS. This register is privileged. Programming Note mfspr RT,ESR should be used to read GESR in guest supervisor state. mtspr ESR,RS should be used to write GESR in guest supervisor state. See Section 2.2.1, "Register Mapping" 1020 Power ISATM Book III-E Version 2.06 Assignments Bits 48:59 of the contents of IVORi can be read into bits IVORi Interrupt 48:59 of register RT using mfspr RT,IVORi, setting bits IVOR0 Critical Input 0:47 and bits 60:63 of GPR(RT) to zero. Bits 48:59 of IVOR1 Machine Check the contents of register RS can be written into bits IVOR2 Data Storage 48:59 of IVORi using mtspr IVORi,RS. IVOR3 Instruction Storage IVOR4 External Input These registers are hypervisor privileged. 
IVOR5 Alignment IVOR6 Program 7.2.16 Guest Interrupt Vector Off- IVOR7 Floating-Point Unavailable IVOR8 System Call set Register [Category: Embed- IVOR9 Auxiliary Processor Unavailable ded.Hypervisor.Phased-Out] IVOR10 Decrementer IVOR11 Fixed-Interval Timer Interrupt The Guest Interrupt Vector Offset Registers (GIVORs) IVOR12 Watchdog Timer Interrupt are 32-bit registers. Guest Interrupt Vector Offset Reg- IVOR13 Data TLB Error ister bits are numbered 32 (most-significant bit) to 63 IVOR14 Instruction TLB Error (least-significant bit). Bits 32:47 and bits 60:63 are IVOR15 Debug reserved. A Guest Interrupt Vector Offset Register pro- IVOR16 Reserved vides the quadword index from the base address pro- : vided by the GIVPR (see Section 7.2.12) for its IVOR31 respective guest state interrupt. Guest Interrupt Vector Offset Registers are analogous to Interrupt Vector Off- [Category: Signal Processing Engine] set Registers except that they are used when an inter- [Category: Vector] rupt is directed to the guest supervisor state. Figure 54 IVOR32 SPE/Embedded Floating-Point/Vector provides the assignments of specific Guest Interrupt Unavailable Interrupt Vector Offset Registers to specific interrupts. [Category: SP.Embedded Float_*] (IVORs 33 & 34 are required if any SP.Float_ dependent category is supported.) IVORi Interrupt IVOR33 Embedded Floating-Point Data Interrupt GIVOR2 Data Storage IVOR34 Embedded Floating-Point Round Inter- GIVOR3 Instruction Storage rupt GIVOR4 External Input [Category: Embedded Performance Monitor] GIVOR8 System Call IVOR35 Embedded Performance Monitor Inter- GIVOR13 Data TLB Error rupt [Category: Embedded.Hypervisor.LRAT] [Category: Embedded.Processor Control] IVOR 42 LRAT Error Interrupt IVOR36 Processor Doorbell Interrupt IVOR43 Implementation-dependent IVOR37 Processor Doorbell Critical Interrupt : [Category: Embedded.Hypervisor, Embedded.Pro- IVOR63 cessor Control] Figure 54. 
Guest Interrupt Vector Offset Register IVOR38 Guest Processor Doorbell Interrupt Assignments IVOR39 Guest Processor Doorbell Critical/ Machine Check Interrupt Bits 48:59 of the contents of GIVORi can be read into bits 48:59 of register RT using mfspr RT,GIVORi, set- [Category: Embedded.Hypervisor] ting bits 0:47 and bits 60:63 of GPR(RT) to zero. Bits IVOR40 Embedded Hypervisor System Call 48:59 of the contents of register RS can be written into Interrupt bits 48:59 of GIVORi using mtspr GIVORi,RS. IVOR41 Embedded Hypervisor Privilege Inter- rupt Write access to these registers is hypervisor privileged. Read access to these registers is privileged. [Category: Embedded.Hypervisor.LRAT] IVOR 42 LRAT Error Interrupt IVOR43 Implementation-dependent : IVOR63 Figure 53. Interrupt Vector Offset Register Chapter 7. Interrupts and Exceptions 1021 Version 2.06 The contents of register RS can be written into the Programming Note LPER using mtspr LPER,RS. On a 32-bit implementa- mfspr RT,IVORi should be used to read GIVORi in tion, the contents of register RS32:63 can be written into guest supervisor state. mtspr IVORi,RS should be the LPER0:31 using mtspr LPERU,RS. used to write GIVOR in guest supervisor state. Hypervisor software should emulate the accesses On a 32-bit implementation that supports fewer than 33 for the guest. bits of real address, it is implementation-dependent whether the SPR number for LPERU is treated as a reserved value for mfspr and mtspr. Programming Note The LPER is a hypervisor resource. The architecture only provides a few GIVORs that are implemented in hardware that are performance critical. Hypervisor software should emulate access 7.2.18 Machine Check Registers to IVORs that do not have corresponding GIVORs. A set of Special Purpose Registers are provided to sup- port Machine Check interrupts. 
7.2.17 Logical Page Exception Register [Category: Embed- 7.2.18.1 Machine Check Save/Restore Register 0 ded.Hypervisor and Embed- Machine Check Save/Restore Register 0 (MCSRR0) is ded.Page Table] a 64-bit register used to save machine state on The Logical Page Exception Register (LPER) is a 64- Machine Check interrupts, and to restore machine state bit register that is required when both the Embed- when an rfmci is executed. When a Machine Check ded.Hypervisor and Embedded.Page Table categories interrupt is taken, the MCSRR0 is set to the current or are supported. LPER bits are numbered 0 (most-signif- next instruction address. When rfmci is executed, icant bit) to 63 (least-significant bit). instruction execution continues at the address in MCSRR0. /// ALPN /// LPS In general, MCSRR0 contains the address of an 0 12 52 60 63 instruction that was executing or about to be executed when the Machine Check exception occurred. Figure 55. Logical Page Exception Register The contents of MCSRR0 when a Machine Check The LPER fields are described below. interrupt is taken are mode dependent, reflecting the Bit Definition computation mode currently in use (specified by 12:52 Abbreviated Logical Page Number (ALPN) MSRCM) and the computation mode entered for execu- This field contains the Abbreviated Real Page tion of the Machine Check interrupt (specified by Number from the PTE which caused the LRAT EPCRICM) [Category:Embedded.Hypervisor]. The con- Error interrupt. Only bits corresponding to the tents of MCSRR0 upon Machine Check interrupt can PTEARPN bits supported by the implementa- be described as follows (assuming Addr is the address tion need be implemented. to be put into MCSRR0): 60:63 Logical Page Size (LPS) if (MSRCM = 0) & (EPCRICM = 0) This field contains the Page Size from the then MCSRR0 32undefined || Addr32:63 PTE that caused the LRAT Error interrupt. if (MSRCM = 0) & (EPCRICM = 1) then MCSRR0 320 || Addr32:63 All other fields are reserved. 
if (MSR[CM] = 1) & (EPCR[ICM] = 1) then MCSRR0 ← Addr[0:63]
if (MSR[CM] = 1) & (EPCR[ICM] = 0) then MCSRR0 ← undefined

The contents of MCSRR0 can be read into register RT using mfspr RT,MCSRR0. The contents of register RS can be written into MCSRR0 using mtspr MCSRR0,RS.

This register is hypervisor privileged.

The LPER contains the values of the ARPN and PS fields from the PTE that was used to translate a virtual address for an instruction fetch, Load, Store or Cache Management instruction that caused an LRAT Error interrupt as a result of an LRAT Miss exception. The contents of LPER are unchanged by an interrupt for any other type of exception. The LPER is a hypervisor resource.

The contents of the Logical Page Exception Register can be read into register RT using mfspr RT,LPER. On a 32-bit implementation, the contents of LPER[0:31] can be read into register RT[32:63] using mfspr RT,LPERU.

7.2.18.2 Machine Check Save/Restore Register 1

Machine Check Save/Restore Register 1 (MCSRR1) is a 32-bit register used to save machine state on Machine Check interrupts, and to restore machine state when an rfmci is executed. When a Machine Check interrupt is taken, the contents of the MSR are placed into MCSRR1. When rfmci is executed, the contents of MCSRR1 are placed into the MSR.

Bits of MCSRR1 that correspond to reserved bits in the MSR are also reserved.

Programming Note
An MSR bit that is reserved may be inadvertently modified by rfi/rfci/rfmci.

The contents of MCSRR1 can be read into register RT using mfspr RT,MCSRR1. The contents of register RS can be written into MCSRR1 using mtspr MCSRR1,RS.

This register is hypervisor privileged.

7.2.18.3 Machine Check Syndrome Register

The Machine Check Syndrome Register (MCSR) is a 64-bit register that is used to record the cause of the Machine Check interrupt. The specific definition of the contents of this register is implementation-dependent (see the User Manual of the implementation).

The contents of MCSR can be read into register RT using mfspr RT,MCSR. The contents of register RS can be written into the MCSR using mtspr MCSR,RS.

This register is hypervisor privileged.

7.2.18.4 Machine Check Interrupt Vector Prefix Register

The Machine Check Interrupt Vector Prefix Register (MCIVPR) is a 64-bit register. MCIVPR is supported only if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported. Whether the MCIVPR is supported is implementation-dependent.

Machine Check Interrupt Vector Prefix Register bits are numbered 0 (most-significant bit) to 63 (least-significant bit). MCIVPR[52:63] are reserved. Bits 0:51 of the Machine Check Interrupt Vector Prefix Register provide the high-order 52 bits of the address of the Machine Check exception processing routine. The 12-bit Machine Check exception vector offset (provided in Section 7.2.15) is concatenated to the right of bits 0:51 of the Machine Check Interrupt Vector Prefix Register to form the 64-bit address of the Machine Check exception processing routine.

The contents of Machine Check Interrupt Vector Prefix Register can be read into register RT using mfspr RT,MCIVPR. The contents of register RS can be written into Machine Check Interrupt Vector Prefix Register using mtspr MCIVPR,RS.

Programming Note
In some implementations that support Interrupt Fixed Offsets, certain instruction cache errors result in a Machine Check exception. The Machine Check interrupt handler needs to be in Caching Inhibited storage in order for the interrupt handler to operate despite an instruction cache error.

7.2.19 External Proxy Register [Category: External Proxy]

The External Proxy Register (EPR) contains implementation-dependent information related to an External Input interrupt when an External Input interrupt occurs. The EPR is only considered valid from the time that the External Input Interrupt occurs until MSR[EE] is set to 1 as the result of a mtmsr or a return from interrupt instruction.

The format of the EPR is shown below.

EPR (bits 32:63)
Figure 56. External Proxy Register

When the External Input interrupt is taken, the contents of the EPR provide information related to the External Input Interrupt.

This register is hypervisor privileged.

Programming Note
The EPR is provided for faster interrupt processing as well as situations where an interrupt must be taken, but software must delay the resultant processing for later.

The EPR contains the vector from the interrupt controller. The process of receiving the interrupt into the EPR acknowledges the interrupt to the interrupt controller. The method for enabling or disabling the acknowledgment of the interrupt by placing the interrupt-related information in the EPR is implementation-dependent. If this acknowledgement is disabled, then the EPR is set to 0 when the External Input interrupt occurs.

7.2.20 Guest External Proxy Register [Category: Embedded.Hypervisor, External Proxy]

The Guest External Proxy Register (GEPR) contains implementation-dependent information related to an External Input interrupt when an External Input interrupt directed to the guest occurs. The GEPR is only considered valid from the time that the External Input Interrupt occurs until MSR[EE] is set to 1 as the result of a mtmsr or a return from interrupt instruction.

The format of the GEPR is shown below.

GEPR (bits 32:63)
Figure 57. Guest External Proxy Register

When the External Input interrupt is taken in the guest supervisor state, the contents of the GEPR provide information related to the External Input Interrupt.

The contents of the GEPR can be read into bits 32:63 of register RT using mfspr RT,GEPR, setting bits 0:31 of RT to zero.
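As a minimal sketch of the read behaviour just described, the following Python model shows what a 64-bit RT would hold after reading the 32-bit GEPR. The function name `mfspr_gepr` is hypothetical and only models the register transfer; it is not part of the architecture or of any library.

```python
MASK32 = 0xFFFF_FFFF

def mfspr_gepr(gepr: int) -> int:
    """Model of the 64-bit value left in RT by `mfspr RT,GEPR` (illustrative only)."""
    # Bit 0 is the most-significant bit in Power ISA numbering, so bits 32:63
    # are the low-order 32 bits of RT; bits 0:31 are set to zero.
    return gepr & MASK32

rt = mfspr_gepr(0x0000_0042)
```

The mask makes the zeroing of RT bits 0:31 explicit even if the caller passes a wider value.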
The contents of bits 32:63 of register RS can be written into the GEPR using mtspr GEPR,RS.

The GEPR is identical in form and function to the EPR.

This register is privileged.

Programming Note
The GEPR is provided for faster interrupt processing as well as situations where an interrupt must be taken, but software must delay the resultant processing for later.

The GEPR contains the vector from the interrupt controller. The process of receiving the interrupt into the GEPR acknowledges the interrupt to the interrupt controller. The method for enabling or disabling the acknowledgment of the interrupt by placing the interrupt-related information in the GEPR is implementation-dependent. If this acknowledgement is disabled, then the GEPR is set to 0 when the External Input interrupt occurs.

Programming Note
mfspr RT,EPR should be used to read GEPR in guest supervisor state. Hypervisor software should emulate the accesses for the guest. This keeps the programming model consistent for an operating system running as a guest and running directly in hypervisor state.

Programming Note
Writing the GEPR register is allowed from both guest supervisor state and hypervisor state. Hypervisor must be able to write GEPR to virtualize External Input interrupt handling for the guest if the guest is using External Proxy. Writing to EPR from the guest is not mapped and results in the same behavior as any undefined supervisor level SPR.

7.3 Exceptions

There are two kinds of exceptions, those caused directly by the execution of an instruction and those caused by an asynchronous event. In either case, the exception may cause one of several types of interrupts to be invoked.

Examples of exceptions that can be caused directly by the execution of an instruction include but are not limited to the following:

- an attempt to execute a reserved-illegal instruction (Illegal Instruction exception type Program interrupt)
- an attempt by an application program to execute a `privileged' instruction (Privileged Instruction exception type Program interrupt)
- an attempt by an application program to access a `privileged' Special Purpose Register (Privileged Instruction exception type Program interrupt)
- an attempt by an application program to access a Special Purpose Register that does not exist (Unimplemented Operation Instruction exception type Program interrupt)
- an attempt by a system program to access a Special Purpose Register that does not exist (boundedly undefined results)
- the execution of a defined instruction using an invalid form (Illegal Instruction exception type Program interrupt, Unimplemented Operation exception type Program interrupt, or Privileged Instruction exception type Program interrupt)
- an attempt to access a storage location that is either unavailable (Instruction TLB Error interrupt or Data TLB Error interrupt) or not permitted (Instruction Storage interrupt or Data Storage interrupt)
- an attempt to access storage with an effective address alignment not supported by the implementation (Alignment interrupt)
- the execution of a System Call instruction (System Call interrupt)
- the execution of a Trap instruction whose trap condition is met (Trap type Program interrupt)
- the execution of a floating-point instruction when floating-point instructions are unavailable (Floating-point Unavailable interrupt)
- the execution of an instruction that is not implemented by the implementation (Illegal Instruction exception or Unimplemented Operation exception type of Program interrupt)
- the execution of an auxiliary processor instruction when the auxiliary processor instruction is unavailable (Auxiliary Processor Unavailable interrupt)
- the execution of an instruction that causes an auxiliary processor enabled exception (Enabled exception type Program interrupt)

The invocation of an interrupt is precise, except that if one of the imprecise modes for invoking the Floating-point Enabled Exception type Program interrupt is in effect then the invocation of the Floating-point Enabled Exception type Program interrupt may be imprecise. When the interrupt is invoked imprecisely, the excepting instruction does not appear to complete before the next instruction starts (because one of the effects of the excepting instruction, namely the invocation of the interrupt, has not yet occurred).

7.4 Interrupt Classification

All interrupts, except for Machine Check, can be classified as either Asynchronous or Synchronous. Independent from this classification, all interrupts, including Machine Check, can be classified into one of the following classes:

- Guest [Category: Embedded.Hypervisor]
- Base
- Critical
- Machine Check
- Debug [Category: Embedded.Enhanced Debug]

7.4.1 Asynchronous Interrupts

Asynchronous interrupts are caused by events that are independent of instruction execution. For asynchronous interrupts, the address reported to the exception handling routine is the address of the instruction that would have executed next, had the asynchronous interrupt not occurred.

7.4.2 Synchronous Interrupts

Synchronous interrupts are those that are caused directly by the execution (or attempted execution) of instructions, and are further divided into two classes, precise and imprecise.
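The two independent classifications above can be pictured as a small data structure. This is only an illustrative sketch: the type names `Timing`, `InterruptClass` and `Classification` are hypothetical, and the Critical Input example relies on the description of that interrupt in Section 7.6.2 (an asynchronous signal, saved in CSRR0/CSRR1).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Timing(Enum):
    ASYNCHRONOUS = "asynchronous"
    SYNCHRONOUS = "synchronous"

class InterruptClass(Enum):
    GUEST = "guest"                  # [Category: Embedded.Hypervisor]
    BASE = "base"
    CRITICAL = "critical"
    MACHINE_CHECK = "machine check"
    DEBUG = "debug"                  # [Category: Embedded.Enhanced Debug]

@dataclass(frozen=True)
class Classification:
    # None models the Machine Check interrupt, which is classified as
    # neither asynchronous nor synchronous.
    timing: Optional[Timing]
    interrupt_class: InterruptClass

CRITICAL_INPUT = Classification(Timing.ASYNCHRONOUS, InterruptClass.CRITICAL)
MACHINE_CHECK = Classification(None, InterruptClass.MACHINE_CHECK)
```

Keeping the two fields separate mirrors the text: the class of an interrupt says nothing about whether it is asynchronous or synchronous.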
- the execution of a floating-point instruction that causes a floating-point enabled exception to exist (Enabled exception type Program interrupt)
- the execution of a defined instruction that is not implemented by the implementation (Illegal Instruction exception or Unimplemented Operation exception type of Program interrupt)

Synchronous, precise interrupts are those that precisely indicate the address of the instruction causing the exception that generated the interrupt; or, for certain synchronous, precise interrupt types, the address of the immediately following instruction.

Synchronous, imprecise interrupts are those that may indicate the address of the instruction causing the exception that generated the interrupt, or some instruction after the instruction causing the exception.

7.4.2.1 Synchronous, Precise Interrupts

When the execution or attempted execution of an instruction causes a synchronous, precise interrupt, the following conditions exist at the interrupt point.

- GSRR0 [Category: Embedded.Hypervisor], SRR0, CSRR0, or DSRR0 [Category: Embedded.Enhanced Debug] addresses either the instruction causing the exception or the instruction immediately following the instruction causing the exception. Which instruction is addressed can be determined from the interrupt type and status bits.
- An interrupt is generated such that all instructions preceding the instruction causing the exception appear to have completed with respect to the executing thread. However, some storage accesses associated with these preceding instructions may not have been performed with respect to other threads and mechanisms.
- The instruction causing the exception may appear not to have begun execution (except for causing the exception), may have been partially executed, or may have completed, depending on the interrupt type. See Section 7.7 on page 1054.
- Architecturally, no subsequent instruction has executed beyond the instruction causing the exception.

7.4.2.2 Synchronous, Imprecise Interrupts

When the execution or attempted execution of an instruction causes an imprecise interrupt, the following conditions exist at the interrupt point.

- GSRR0 [Category: Embedded.Hypervisor], SRR0, or CSRR0 addresses either the instruction causing the exception or some instruction following the instruction causing the exception that generated the interrupt.
- An interrupt is generated such that all instructions preceding the instruction addressed by GSRR0 [Category: Embedded.Hypervisor], SRR0, or CSRR0 appear to have completed with respect to the executing thread.
- If the imprecise interrupt is forced by the context synchronizing mechanism, due to an instruction that causes another exception that generates an interrupt (e.g., Alignment, Data Storage), then GSRR0 [Category: Embedded.Hypervisor] or SRR0 addresses the interrupt-forcing instruction, and the interrupt-forcing instruction may have been partially executed (see Section 7.7 on page 1054).
- If the imprecise interrupt is forced by the execution synchronizing mechanism, due to executing an execution synchronizing instruction other than sync or isync, then GSRR0 [Category: Embedded.Hypervisor], SRR0, or CSRR0 addresses the interrupt-forcing instruction, and the interrupt-forcing instruction appears not to have begun execution (except for its forcing the imprecise interrupt).
- If the imprecise interrupt is forced by a sync or isync instruction, then GSRR0 [Category: Embedded.Hypervisor], SRR0, or CSRR0 may address either the sync or isync instruction, or the following instruction.
- If the imprecise interrupt is not forced by either the context synchronizing mechanism or the execution synchronizing mechanism, then the instruction addressed by GSRR0 [Category: Embedded.Hypervisor], SRR0, or CSRR0 may have been partially executed (see Section 7.7 on page 1054).
- No instruction following the instruction addressed by GSRR0 [Category: Embedded.Hypervisor], SRR0, or CSRR0 has executed.

7.4.3 Interrupt Classes

Interrupts can also be classified as guest [Category: Embedded.Hypervisor], base, critical, Machine Check, and Debug [Category: Embedded.Enhanced Debug].

Interrupt classes other than the guest [Category: Embedded.Hypervisor] or base class may demand immediate attention even if another class of interrupt is currently being processed and software has not yet had the opportunity to save the state of the machine (i.e. return address and captured state of the MSR). For this reason, the interrupts are organized into a hierarchy (see Section 7.8). To enable taking a critical, Machine Check, or Debug [Category: Embedded.Enhanced Debug] interrupt immediately after a guest [Category: Embedded.Hypervisor] or base class interrupt occurs (i.e. before software has saved the state of the machine), these interrupts use the Save/Restore Register pair CSRR0/CSRR1, MCSRR0/MCSRR1, or DSRR0/DSRR1 [Category: Embedded.Enhanced Debug], and guest [Category: Embedded.Hypervisor] and base class interrupts use Save/Restore Register pairs GSRR0/GSRR1 and SRR0/SRR1, respectively.

7.4.4 Machine Check Interrupts

Machine Check interrupts are a special case. They are typically caused by some kind of hardware or storage subsystem failure, or by an attempt to access an invalid address. A Machine Check may be caused indirectly by the execution of an instruction, but not be recognized and/or reported until long after the thread has executed past the instruction that caused the Machine Check. As such, Machine Check interrupts cannot properly be thought of as synchronous or asynchronous, nor as precise or imprecise. The following general rules apply to Machine Check interrupts:

1. No instruction after the one whose address is reported to the Machine Check interrupt handler in MCSRR0 has begun execution.
2. The instruction whose address is reported to the Machine Check interrupt handler in MCSRR0, and all prior instructions, may or may not have completed successfully. All those instructions that are ever going to complete appear to have done so already, and have done so within the context existing prior to the Machine Check interrupt. No further interrupt (other than possible additional Machine Check interrupts) will occur as a result of those instructions.

7.5 Interrupt Processing

Associated with each kind of interrupt is an interrupt vector, that is the address of the initial instruction that is executed when the corresponding interrupt occurs.

When category Embedded.Hypervisor is implemented, interrupts are directed (see Section 2.3.1, "Directed Interrupts") to the guest supervisor state or the hypervisor state, which affects how some MSR bits are set. The conditions under which a given interrupt is directed to the guest supervisor state or hypervisor state are more fully described in the interrupt definitions for each interrupt in Section 7.6, "Interrupt Definitions".

Interrupt processing consists of saving a small part of the thread's state in certain registers, identifying the cause of the interrupt in another register, and continuing execution at the corresponding interrupt vector location. When an exception exists that will cause an interrupt to be generated and it has been determined that the interrupt can be taken, the following actions are performed, in order:

1. GSRR0 [Category: Embedded.Hypervisor], SRR0, DSRR0 [Category: Embedded.Enhanced Debug], MCSRR0, or CSRR0 is loaded with an instruction address that depends on the interrupt; see the specific interrupt description for details.

2. The GESR [Category: Embedded.Hypervisor] or ESR is loaded with information specific to the exception. Note that many interrupts can only be caused by a single kind of exception event, and thus do not need nor use an ESR setting to indicate what the cause of the interrupt was.

3. GSRR1 [Category: Embedded.Hypervisor], SRR1, DSRR1 [Category: Embedded.Enhanced Debug], MCSRR1, or CSRR1 is loaded with a copy of the contents of the MSR.

4. The MSR is updated as described below. The new values take effect beginning with the first instruction following the interrupt. MSR bits of particular interest are the following.
   - MSR[EE,PR,FP,FE0,FE1,IS,DS,SPV] are set to 0 by all interrupts.
   - If Category E.HV is supported, MSR[WE] and MSR[GS] are left unchanged when an interrupt is directed to the guest supervisor state, otherwise they are set to 0 by all interrupts.
   - If Category E.HV is supported, MSR[PMM] is left unchanged when an interrupt is directed to the guest supervisor state and MSRP[PMMP] = 1, otherwise MSR[PMM] is set to 0 by all interrupts.
   - If Category E.HV is supported, MSR[UCLE] is left unchanged when an interrupt is directed to the guest supervisor state and MSRP[UCLEP] = 1, otherwise MSR[UCLE] is set to 0 by all interrupts.
   - MSR[ME] is set to 0 by Machine Check interrupts and left unchanged by all other interrupts.
   - MSR[CE] is set to 0 by critical class interrupts, Debug interrupts, and Machine Check interrupts, and is left unchanged by all other interrupts.
   - MSR[DE] is set to 0 by critical class interrupts unless Category E.ED is supported, by Debug interrupts, and by Machine Check interrupts, and is left unchanged by all other interrupts.
   - If Category E.HV is supported and the interrupt is directed to the guest supervisor state, MSR[CM] is set to EPCR[GICM], otherwise MSR[CM] is set to EPCR[ICM].
   - Other supported MSR bits are left unchanged by all interrupts.
   See Section 4.2.1 for more detail on the definition of the MSR.

5. Instruction fetching and execution resumes, using the new MSR value, at a location specific to the interrupt. If Category E.HV is supported, and the interrupt is directed to the guest state, the location is one of the following, where IVORi (GIVORi) is the (Guest) Interrupt Vector Offset Register for that interrupt (see Figure 54 on page 1021):
   - GIVPR[0:47] || GIVORi[48:59] || 0b0000 if Interrupt Vector Offset Registers [Category: Embedded.Phased-Out] are supported
   - GIVPR[0:51] || fixed offset shown in Figure 53 on page 1021 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported
   Otherwise, the location is one of the following:
   - IVPR[0:47] || IVORi[48:59] || 0b0000 if Interrupt Vector Offset Registers [Category: Embedded.Phased-Out] are supported
   - IVPR[0:51] || fixed offset shown in Figure 53 on page 1021 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported and either MCIVPR is not supported or the interrupt is not a Machine Check
   - MCIVPR[0:51] || fixed offset shown in Figure 53 on page 1021 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, MCIVPR is supported, and the interrupt is a Machine Check
   The contents of the (Guest) Interrupt Vector Prefix Register, Machine Check Interrupt Vector Prefix Register, and (Guest) Interrupt Vector Offset Registers are indeterminate upon power-on reset, and must be initialized by system software using the mtspr instruction.

It is implementation-dependent whether interrupts clear reservations obtained with lbarx, lharx, lwarx, or ldarx. Interrupts might not clear reservations obtained with Load and Reserve instructions. The operating system or hypervisor should do so at appropriate points, such as at process switch or a partition switch.

At the end of an interrupt handling routine, execution of an rfgi [Category: Embedded.Hypervisor], rfi, rfdi [Category: Embedded.Enhanced Debug], rfmci, or rfci causes the MSR to be restored from the contents of GSRR1 [Category: Embedded.Hypervisor], SRR1, DSRR1 [Category: Embedded.Enhanced Debug], MCSRR1, or CSRR1, and instruction execution to resume at the address contained in GSRR0 [Category: Embedded.Hypervisor], SRR0, DSRR0 [Category: Embedded.Enhanced Debug], MCSRR0, or CSRR0, respectively.

Programming Note
In general, at process switch (partition switch), due to possible process interlocks and possible data availability requirements, the operating system (hypervisor) needs to consider executing the following.
- stbcx., sthcx., stwcx. or stdcx., to clear the reservation if one is outstanding, to ensure that a lbarx, lharx, lwarx or ldarx in the "old" process (partition) is not paired with a stbcx., sthcx., stwcx., or stdcx. in the "new" process (partition).
- sync, to ensure that all storage operations of an interrupted process are complete with respect to other threads before that process begins executing on another thread.
- isync, rfgi, rfi, rfdi [Category: Embedded.Enhanced Debug], rfmci, or rfci to ensure that the instructions in the "new" process execute in the "new" context.

Programming Note
For instruction-caused interrupts, in some cases it may be desirable for the operating system to emulate the instruction that caused the interrupt, while in other cases it may be desirable for the operating system not to emulate the instruction. The following list, while not complete, illustrates criteria by which decisions regarding emulation should be made. The list applies to general execution environments; it does not necessarily apply to special environments such as program debugging, bring-up, etc.

In general, the instruction should be emulated if:

- The interrupt is caused by a condition for which the instruction description (including related material such as the introduction to the section describing the instruction) implies that the instruction works correctly. Example: Alignment interrupt caused by lmw for which the storage operand is not aligned, or by dcbz or dcbzep for which the storage operand is in storage that is Write Through Required or Caching Inhibited.
- The instruction is an illegal instruction that should appear, to the program executing it, as if it were supported by the implementation. Example: Illegal Instruction type Program interrupt caused by an instruction that has been phased out of the architecture but is still used by some programs that the operating system supports, or by an instruction that is in a category that the implementation does not support but is used by some programs that the operating system supports.

In general, the instruction should not be emulated if:

- The purpose of the instruction is to cause an interrupt. Example: System Call interrupt caused by sc.
- The interrupt is caused by a condition that is stated, in the instruction description, potentially to cause the interrupt. Example: Alignment interrupt caused by lwarx for which the storage operand is not aligned.
- The program is attempting to perform a function that it should not be permitted to perform. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should not be permitted to access. (If the function is one that the program should be permitted to perform, the conditions that caused the interrupt should be corrected and the program re-dispatched such that the instruction will be re-executed. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should be permitted to access but for which there currently is no TLB entry.)

7.6 Interrupt Definitions

Figure 58 provides a summary of each interrupt type, the various exception types that may cause that interrupt type, the classification of the interrupt, which ESR (GESR) bits can be set, if any, which MSR bits can mask the interrupt type and which Interrupt Vector Offset Register is used to specify that interrupt type's vector address.

IVOR | Interrupt | Exception | Asynchronous / Synchronous, Precise / Synchronous, Imprecise / Critical | ESR (GESR) (See Note 5) | MSR Mask Bit(s)(1) | DBCR0/TCR Mask Bit | Category (Section 1.3.5 of Book I) | Notes (see page 1032) | Page
IVOR0 | Critical Input | Critical Input | x x | - | CE or GS | - | E | 1 | 1034
IVOR1 | Machine Check | Machine Check | - | - | ME or GS | - | E | 2,4 | 1034
IVOR2, GIVOR2 [E.HV] | Data Storage | Access | x | [ST],[FP,AP,SPV],[PT],[VLEMI],[EPID] | - | - | E | 9 | 1035
 | | Load and Reserve or Store Conditional to `write-thru required' storage (W=1) | x | [ST],[VLEMI] | - | - | E | 9 | -
 | | Cache Locking | x | {DLK0,DLK1},[ST],[VLEMI] | - | - | E | 8 | -
 | | Byte Ordering | x | BO,[ST],[FP,AP,SPV],[VLEMI],[EPID] | - | - | E | - | -
 | | Virtualization Fault | - | [ST],[FP,AP,SPV],[VLEMI],[EPID] | - | - | E.PT | - | -
 | | Page Table Fault | - | PT,[ST],[FP,AP,SPV],[VLEMI],[EPID] | - | - | E.PT | - | -
 | | TLB Ineligible | - | TLBI,[ST],[FP,AP,SPV],[VLEMI],[EPID] | - | - | E.PT | - | -
IVOR3, GIVOR3 [E.HV] | Inst Storage | Access | x | [PT] | - | - | E | - | 1037
 | | Byte Ordering | x | BO,[VLEMI] | - | - | E | - | -
 | | Mismatched Instruction Storage (See Book VLE.) | x | BO,VLEMI | - | - | E, VLE | 1 | -
 | | Misaligned Instruction Storage (See Book VLE.) | x | MIF | - | - | E, VLE | 1 | -
 | | Page Table Fault | - | PT | - | - | E.PT | - | -
 | | TLB Ineligible | - | TLBI | - | - | E.PT | - | -
IVOR4 | External Input | External Input | x | - | EE or GS | - | E | 1 | 1039
GIVOR4 | External Input | External Input | x | - | EE and GS | - | E.HV | 1 | 1039
IVOR5 | Alignment | Alignment | x | [ST],[FP,AP,SPV],[EPID],[VLEMI] | - | - | E | - | 1040
IVOR6 | Program | Illegal | x | PIL,[VLEMI] | - | - | E | - | 1042
 | | Privileged | x | PPR,[AP],[VLEMI] | - | - | E | - | -
 | | Trap | x | PTR,[VLEMI] | - | - | E | - | -
 | | FP Enabled | x x | FP,[PIE] | FE0, FE1 | - | E | 6,7 | -
 | | AP Enabled | x x | AP | - | - | E | - | -
 | | Unimplemented Op | x | PUO,[VLEMI],[FP,AP,SPV] | - | - | E | 7 | -
IVOR7 | FP Unavailable | FP Unavailable | x | - | - | - | E | - | 1042
IVOR8, GIVOR8 [E.HV] | System Call | System Call | x | [VLEMI] | - | - | E, E.HV | - | 1042
IVOR9 | AP Unavailable | AP Unavailable | x | - | - | - | E | - | 1043
IVOR10 | Decrementer | - | x | - | EE or GS | DIE | E | - | 1043
IVOR11 | FIT | - | x | - | EE or GS | FIE | E | - | 1044
IVOR12 | Watchdog | - | x x | - | CE or GS | WIE | E | 10 | 1044
IVOR13, GIVOR13 | Data TLB Error | Data TLB Miss | x | [ST],[FP,AP,SPV],[VLEMI],[EPID] | - | - | E, E.HV | - | 1045
IVOR14, GIVOR14 | Inst TLB Error | Inst TLB Miss | x | [MIF] | - | - | E, E.HV | - | 1046
IVOR15 | Debug | Trap | x x | - | DE | IDM | E | 10 | 1046
 | | Inst Addr Compare | x x | - | DE | IDM | E | 10 | -
 | | Data Addr Compare | x x | - | DE | IDM | E | 10 | -
 | | Instruction Complete | x x | - | DE | IDM | E | 3,10 | -
 | | Branch Taken | x x | - | DE | IDM | E | 3,10 | -
 | | Return From Interrupt | x x | - | DE | IDM | E | 10 | -
 | | Interrupt Taken | x x | - | DE | IDM | E | 10 | -
 | | Uncond Debug Event | x x | - | DE | IDM | E.ED | 10 | -
 | | Critical Interrupt Taken | x | - | DE | IDM | E.ED | - | -
 | | Critical Interrupt Return | x | - | DE | IDM | E.ED | - | -
IVOR32 | SPE/Embedded Floating-Point/Vector Unavailable | SPE Unavailable | x | SPV,[VLEMI] | - | - | SPE | - | 1047
 | | Vector Unavailable | - | SPV | - | - | V | - | -
IVOR33 | Embedded Floating-Point Data | Embedded Floating-Point Data | x | SPV,[VLEMI] | - | - | SP.F* | - | 1048
IVOR34 | Embedded Floating-Point Round | Embedded Floating-Point Round | x | SPV,[VLEMI] | - | - | SP.F* | - | 1048
IVOR35 | Embedded Performance Monitor | Embedded Performance Monitor | x | - | - | - | E.PM | - | -
IVOR36 | Processor Doorbell | Processor Doorbell | x | - | EE or GS | - | E.PC | - | -
IVOR37 | Processor Doorbell Critical | Processor Doorbell Critical | x x | - | CE or GS | - | E.PC | - | -
IVOR38 | Guest Processor Doorbell | Guest Processor Doorbell | x | - | EE and GS | - | E.PC, E.HV | - | -
IVOR39 | Guest Processor Doorbell Critical | Guest Processor Doorbell Critical | x x | - | CE and GS | - | E.PC, E.HV | - | -
 | Guest Processor Doorbell Machine Check | Guest Processor Doorbell Machine Check | x x | - | ME and GS | - | E.PC, E.HV | - | -
IVOR40 | Embedded Hypervisor System Call | Embedded Hypervisor System Call | x | [VLEMI] | - | - | E.HV | - | -
IVOR41 | Embedded Hypervisor Privilege | Embedded Hypervisor Privilege | x | [VLEMI] | - | - | E.HV | - | -
IVOR42 | LRAT Error | LRAT Miss | x | [ST],[FP,AP,SPV],[DATA],[PT],[VLEMI],[EPID] | - | - | E.HV.LRAT | - | 1052

1. If an expression of MSR bits is provided, the interrupt is masked if the expression evaluates to 0 and is enabled if the expression evaluates to 1.

Figure 58. Interrupt and Exception Types

Figure 58 Notes

1. Although it is not specified, it is common for system implementations to provide, as part of the interrupt controller, independent mask and status bits for the various sources of Critical Input and External Input interrupts.

2. Machine Check interrupts are a special case and are not classified as asynchronous nor synchronous. See Section 7.4.4 on page 1026.

3. The Instruction Complete and Branch Taken debug events are only defined for MSR[DE]=1 when in Internal Debug Mode (DBCR0[IDM]=1). In other words, when in Internal Debug Mode with MSR[DE]=0, then Instruction Complete and Branch Taken debug events cannot occur, and no DBSR status bits are set and no subsequent imprecise Debug interrupt will occur (see Section 10.4 on page 1078).

4. Machine Check status information is commonly provided as part of the system implementation, but is implementation-dependent.

5. In general, when an interrupt causes a particular ESR (GESR) bit or bits to be set (or cleared) as indicated in the table, it also causes all other ESR (GESR) bits to be cleared. There may be special rules regarding the handling of implementation-specific ESR (GESR) bits.
   Legend:
   [xxx] means ESR(GESR)[xxx] could be set
   [xxx,yyy] means either ESR(GESR)[xxx] or ESR(GESR)[yyy] may be set, but never both
   (xxx,yyy) means either ESR(GESR)[xxx] or ESR(GESR)[yyy] will be set, but never both
   {xxx,yyy} means either ESR(GESR)[xxx] or ESR(GESR)[yyy] will be set, or possibly both
   xxx means ESR(GESR)[xxx] is set

6. The precision of the Floating-point Enabled Exception type Program interrupt is controlled by the MSR[FE0,FE1] bits. When MSR[FE0,FE1]=0b01 or 0b10, the interrupt may be imprecise. When such a Program interrupt is taken, if the address saved in SRR0 is not the address of the instruction that caused the exception (i.e. the instruction that caused FPSCR[FEX] to be set to 1), ESR[PIE] is set to 1. When MSR[FE0,FE1]=0b11, the interrupt is precise. When MSR[FE0,FE1]=0b00, the interrupt is masked, and the interrupt will subsequently occur imprecisely if and when Floating-point Enabled Exception type Program interrupts are enabled by setting either or both of MSR[FE0,FE1], and will also cause ESR[PIE] to be set to 1. See Section 7.6.8. Also, exception status on the exact cause is available in the Floating-Point Status and Control Register (see Section 4.2.2 and Section 4.4 of Book I).
   The precision of the Auxiliary Processor Enabled Exception type Program interrupt is implementation-dependent.

7. Auxiliary Processor exception status is commonly provided as part of the implementation.

8. Cache locking and cache locking exceptions are implementation-dependent.

9. Software must examine the instruction and the subject TLB entry to determine the exact cause of the interrupt.

10. If the Embedded.Enhanced Debug category is enabled, this interrupt is not a critical interrupt. DSRR0 and DSRR1 are used instead of CSRR0 and CSRR1.

7.6.1 Interrupt Fixed Offsets [Category: Embedded.Phased-In]

Figure 53 on page 1021 shows the 12-bit low-order effective address offset for each interrupt type. This value is the offset from the base address provided by either the IVPR (see Section 7.2.11) or the GIVPR (see Section 7.2.12).

offset  Interrupt
0x000   Machine Check
0x020   Critical Input
0x040   Debug
0x060   Data Storage(1)
0x080   Instruction Storage(1)
0x0A0   External Input(1)
0x0C0   Alignment
0x0E0   Program
0x100   Floating-Point Unavailable
0x120   System Call(1)
0x140   Auxiliary Processor Unavailable
0x160   Decrementer(1)
0x180   Fixed-Interval Timer Interrupt
0x1A0   Watchdog Timer Interrupt
0x1C0   Data TLB Error(1)
0x1E0   Instruction TLB Error(1)
[Category: Signal Processing Engine] [Category: Vector]
0x200   SPE/Embedded Floating-Point/Vector Unavailable Interrupt
[Category: SP.Embedded Float_*] (The following vector offsets are required if any SP.Float_ dependent category is supported.)
0x220   Embedded Floating-Point Data Interrupt
0x240   Embedded Floating-Point Round Interrupt
[Category: Embedded Performance Monitor]
0x260   Embedded Performance Monitor Interrupt
[Category: Embedded.Processor Control]
0x280   Processor Doorbell Interrupt
0x2A0   Processor Doorbell Critical Interrupt
[Category: Embedded.Hypervisor]
0x2C0   Guest Processor Doorbell

7.6.2 Critical Input Interrupt

A Critical Input interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059), a Critical Input exception is presented to the interrupt mechanism, and MSR[CE]=1. While the specific definition of a Critical Input exception is implementation-dependent, it would typically be caused by the activation of an asynchronous signal that is part of the system. Also, implementations may provide an alternative means (in addition to MSR[CE]) for masking the Critical Input interrupt.

CSRR0, CSRR1, and MSR are updated as follows:

CSRR0   Set to the effective address of the next instruction to be executed.
CSRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM   MSR[CM] is set to EPCR[ICM].
   ME   Unchanged.
   DE   Unchanged if category E.ED is supported; otherwise set to 0.
   All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR[0:51] || 0x020. Otherwise, instruction execution resumes at address IVPR[0:47] || IVOR0[48:59] || 0b0000.

Programming Note
Software is responsible for taking any action(s) that are required by the implementation in order to clear any Critical Input exception status prior to re-enabling MSR[CE] in order to avoid another, redundant Critical Input interrupt.
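The register updates and vector formation described for the Critical Input interrupt can be sketched as follows. This is a hedged illustration, not an implementation of the architecture: the function names are hypothetical, and the MSR is modelled as a dictionary of named bits (an assumption made to avoid committing to bit positions, which this section does not give).

```python
def ibm_bits(value: int, first: int, last: int, width: int = 64) -> int:
    """Bits first:last of value in Power ISA numbering (bit 0 is the MSB)."""
    length = last - first + 1
    return (value >> (width - 1 - last)) & ((1 << length) - 1)

CRITICAL_INPUT_FIXED_OFFSET = 0x020  # fixed offset listed for Critical Input

def critical_input_vector(ivpr: int, ivor0: int, fixed_offsets: bool) -> int:
    if fixed_offsets:
        # IVPR bits 0:51 || 0x020 (52-bit prefix, 12-bit offset)
        return (ibm_bits(ivpr, 0, 51) << 12) | CRITICAL_INPUT_FIXED_OFFSET
    # IVPR bits 0:47 || IVOR0 bits 48:59 || 0b0000
    return (ibm_bits(ivpr, 0, 47) << 16) | (ibm_bits(ivor0, 48, 59) << 4)

def take_critical_input(msr: dict, next_ea: int, epcr_icm: int, has_e_ed: bool):
    """Return (CSRR0, CSRR1, new MSR) for a taken Critical Input interrupt."""
    csrr0 = next_ea                      # address of the next instruction
    csrr1 = dict(msr)                    # copy of the MSR at interrupt time
    new_msr = {name: 0 for name in msr}  # all other defined bits set to 0
    new_msr["CM"] = epcr_icm             # CM taken from EPCR bit ICM
    new_msr["ME"] = msr["ME"]            # ME unchanged
    if has_e_ed:
        new_msr["DE"] = msr["DE"]        # DE unchanged only with category E.ED
    return csrr0, csrr1, new_msr
```

For example, with an IVPR value of 0x0000_0000_FFF0_1ABC the fixed-offset form discards the low 12 bits of the prefix register, while the IVOR form discards the low 16 bits and takes bits 48:59 of IVOR0 as a 16-byte-aligned offset.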
0x2E0    Guest Processor Doorbell Critical; Guest Processor Doorbell Machine Check
0x300    Embedded Hypervisor System Call
0x320    Embedded Hypervisor Privilege
[Category: Embedded.Hypervisor.LRAT]
0x340    LRAT Error interrupt
0x360 ... 0x7FF    Reserved
0x800 ... 0xFFF    Implementation-dependent

(1) MSR[GS] is not modified by the occurrence of the interrupt except as described in Section 7.5.

Figure 59. Interrupt Vector Offsets

7.6.3 Machine Check Interrupt

A Machine Check interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059), a Machine Check exception is presented to the interrupt mechanism, and MSR[ME]=1. The specific cause or causes of Machine Check exceptions are implementation-dependent, as are the details of the actions taken on a Machine Check interrupt.

If the Machine Check Extension is implemented, MCSRR0, MCSRR1, and MCSR are set, otherwise CSRR0, CSRR1, and ESR are set. The registers are updated as follows:

CSRR0/MCSRR0   Set to an instruction address. As closely as possible, set to the effective address of an instruction that was executing or about to be executed when the Machine Check exception occurred.
CSRR1/MCSRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM    MSR[CM] is set to EPCR[ICM].
    DE    Unchanged if category E.ED is supported; otherwise set to 0.
    All other defined MSR bits set to 0.
ESR/MCSR       Implementation-dependent.

Instruction execution resumes at address IVPR[0:47] || IVOR1[48:59] || 0b0000.

If the Embedded.Hypervisor category is supported, a Machine Check interrupt caused by the existence of multiple direct TLB entries or multiple indirect TLB entries (or similar entries in implementation-specific translation caches) which translate a given virtual address (see Section 6.7.3) must occur while still in the context of the partition or hypervisor state that caused it. In these cases, the interrupt must be presented in a way that permits continuing execution. Treating the exception as instruction-caused allows these requirements to be achieved. Also, if the Embedded.Hypervisor category is supported, a Machine Check interrupt resulting from the following situations must be precise.

- Execution of an External Process ID instruction that has an operand that can be translated by multiple TLB entries.
- Execution of a tlbivax instruction that isn't a TLB invalidate all and there are multiple entries in a single thread's TLB array(s) that match the complete VPN.
- Execution of a tlbilx instruction with T=3 and there are multiple entries in the TLB array(s) that match the complete VPN.
- Execution of a tlbsx or tlbsrx. instruction and there are multiple matching TLB entries.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported and the Machine Check Interrupt Vector Prefix Register is supported, instruction execution resumes at address MCIVPR[0:51] || 0x000. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported and the Machine Check Interrupt Vector Prefix Register is not implemented, instruction execution resumes at address IVPR[0:51] || 0x000. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are not supported, instruction execution resumes at address IVPR[0:47] || IVOR1[48:59] || 0b0000.

Programming Note
If a Machine Check interrupt is caused by an error in the storage subsystem, the storage subsystem may return incorrect data, which may be placed into registers and/or on-chip caches.

Programming Note
On implementations on which a Machine Check interrupt can be caused by referring to an invalid real address, executing a dcbz, dcbzep, or dcba instruction can cause a delayed Machine Check interrupt by establishing in the data cache a block that is associated with an invalid real address. See Section 4.3 of Book II. A Machine Check interrupt can eventually occur if and when a subsequent attempt is made to write that block to main storage, for example as the result of executing an instruction that causes a cache miss for which the block is the target for replacement or as the result of executing a dcbst, dcbstep, dcbf, or dcbfep instruction.

7.6.4 Data Storage Interrupt

A Data Storage interrupt may occur when no higher priority exception exists (see Section 7.9 on page 1059) and a Data Storage exception is presented to the interrupt mechanism. A Data Storage exception is caused when any of the following exceptions arises during execution of an instruction:

Read Access Control exception

A Read Access Control exception is caused when one of the following conditions exists.

- While in user mode (MSR[PR]=1), a Load or `load-class' Cache Management instruction attempts to access a location in storage that is not user mode read enabled (i.e. page access control bit UR=0).
- While in supervisor mode (MSR[PR]=0), a Load or `load-class' Cache Management instruction attempts to access a location in storage that is not supervisor mode read enabled (i.e. page access control bit SR=0).

Write Access Control exception

A Write Access Control exception is caused when one of the following conditions exists.

- While in user mode (MSR[PR]=1), a Store or `store-class' Cache Management instruction attempts to access a location in storage that is not user mode write enabled (i.e. page access control bit UW=0).
- While in supervisor mode (MSR[PR]=0), a Store or `store-class' Cache Management instruction attempts to access a location in storage that is not supervisor mode write enabled (i.e. page access control bit SW=0).

Byte Ordering exception

A Byte Ordering exception may occur when the implementation cannot perform the data storage access in the byte order specified by the Endian storage attribute of the page being accessed.
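As an informal illustration only, the two resume-address forms used in Sections 7.6.2 and 7.6.3 and the Read and Write Access Control conditions above can be sketched in Python. The function names and the dictionary used to model a page's access control bits are hypothetical and are not part of the architecture. The sketch assumes 64-bit registers with bit 0 as the most significant bit.

```python
# Illustrative sketch only. All names and the data layout are hypothetical.

def vector_from_ivor(ivpr: int, ivor: int) -> int:
    """IVPR[0:47] || IVORn[48:59] || 0b0000 (64-bit registers, bit 0 = MSB)."""
    high = ivpr & 0xFFFF_FFFF_FFFF_0000  # bits 0:47 of IVPR
    mid = ivor & 0x0000_0000_0000_FFF0   # bits 48:59 of IVORn; bits 60:63 become 0b0000
    return high | mid


def vector_from_fixed_offset(ivpr: int, offset: int) -> int:
    """IVPR[0:51] || 12-bit fixed offset, for example 0x020 for Critical Input."""
    return (ivpr & 0xFFFF_FFFF_FFFF_F000) | (offset & 0xFFF)


def access_control_exception(msr_pr: int, is_store: bool, page: dict) -> bool:
    """Return True if a Read or Write Access Control condition holds.

    `page` is a hypothetical model of the page access control bits,
    with keys "UR", "UW", "SR", and "SW" holding 0 or 1.
    """
    if msr_pr == 1:  # user mode
        bit = "UW" if is_store else "UR"
    else:            # supervisor mode
        bit = "SW" if is_store else "SR"
    return page[bit] == 0
```

For example, under these assumptions a user-mode store to a page with UW=0 satisfies the Write Access Control condition, while the same store in supervisor mode with SW=1 does not.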
Cache Locking exception

A Cache Locking exception may occur when the locked state of one or more cache lines has the potential to be altered. This exception is implementation-dependent.

Storage Synchronization exception

A Storage Synchronization exception will occur when an attempt is made to execute a Load and Reserve or Store Conditional instruction from or to a location that is Write Through Required or Caching Inhibited (if the interrupt does not occur then the instruction executes correctly: see Section 4.4.2 of Book II).

If a stbcx., sthcx., stwcx., or stdcx. would not perform its store in the absence of a Data Storage interrupt, and either (a) the specified effective address refers to storage that is Write Through Required or Caching Inhibited, or (b) a non-conditional Store to the specified effective address would cause a Data Storage interrupt, it is implementation-dependent whether a Data Storage interrupt occurs.

Page Table Fault exception

A Page Table Fault exception is caused when a Page Table translation occurs for a data access due to a Load, Store or Cache Management instruction and the Page Table Entry that is accessed is invalid (PTE Valid bit = 0).

TLB Ineligible exception

A TLB Ineligible exception is caused when a Page Table translation occurs for a data access due to a Load, Store or Cache Management instruction and any of the following conditions are true.

- The only TLB entries that can be used to hold the translation for the virtual address have IPROT=1.
- No TLB array can be loaded from the Page Table for the page size specified by the PTE.
- The PTE[ARPN] is treated as an LPN (the Embedded.Hypervisor category is supported) and there is no TLB array that meets all the following conditions.
    - The TLB array supports the page size specified by the PTE.
    - The TLB array can be loaded from the Page Table (TLBnCFG[PT] = 1).

If the Embedded.Hypervisor category is supported, a Data Storage interrupt resulting from a TLB Ineligible exception is always directed to hypervisor state regardless of the setting of EPCR[DSIGS].

Virtualization Fault exception [Category: Embedded.Hypervisor]

A Virtualization Fault exception will occur when a Load, Store, or Cache Management instruction attempts to access a location in storage that has the Virtualization Fault (VF) bit set. A Data Storage interrupt resulting from a Virtualization Fault exception is always directed to hypervisor state regardless of the setting of EPCR[DSIGS].

Instructions lswx or stswx with a length of zero, icbt, dcbt, dcbtep, dcbtst, dcbtstep, or dcba cannot cause a Data Storage interrupt, regardless of the effective address.

Programming Note
The icbi, icbiep, icbt, icbtls and icblc instructions are treated as Loads from the addressed byte with respect to address translation and protection. These Instruction Cache Management instructions use MSR[DS], not MSR[IS], to determine translation for their operands. Instruction Storage exceptions and Instruction TLB Miss exceptions are associated with the `fetching' of instructions, not with the `execution' of instructions. Data Storage exceptions and Data TLB Miss exceptions are associated with the `execution' of Instruction Cache Management instructions. One exception to the above is that icbtls and icblc only cause a Data Storage exception if they have neither execute access nor read access.

When a Data Storage interrupt occurs, the thread suppresses the execution of the instruction causing the Data Storage exception.

If category Embedded.Hypervisor is not supported or if category Embedded.Hypervisor is supported and the interrupt is directed to hypervisor state, SRR0, SRR1, MSR, DEAR, and ESR are updated as follows:

SRR0   Set to the effective address of the instruction causing the Data Storage interrupt.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM            MSR[CM] is set to EPCR[ICM].
    CE, ME, DE    Unchanged.
    All other defined MSR bits set to 0.
DEAR   Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Data Storage exception.
ESR
    FP        Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0.
    ST        Set to 1 if the instruction causing the interrupt is a Store or `store-class' Cache Management instruction; otherwise set to 0.
    DLK[0:1]  Set to an implementation-dependent value due to a Cache Locking exception causing the interrupt.
    AP        Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0.
    BO        Set to 1 if the instruction caused a Byte Ordering exception; otherwise set to 0.
    TLBI      Set to 1 if a TLB Ineligible exception occurred during a Page Table translation for the instruction causing the interrupt; otherwise set to 0.
    PT        If a Page Table Fault or Read or Write Access Control exception occurred during a Page Table translation for the instruction causing the interrupt, then PT is set to 1 if no TLB entry was created from the Page Table and is set to an implementation-dependent value if a TLB entry was created. See Section 6.7.4 for rules about TLB updates. If no Page Table Fault or Read or Write Access Control exception occurred during a Page Table translation for the instruction causing the interrupt, set to 0.
    SPV       Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation; otherwise set to 0.
    VLEMI     Set to 1 if the instruction causing the interrupt resides in VLE storage.
    EPID      Set to 1 if the instruction causing the interrupt is an External Process ID instruction; otherwise set to 0.
    All other defined ESR bits are set to 0.

If category Embedded.Hypervisor is supported and the interrupt is directed to guest supervisor state, GSRR0, GSRR1, GDEAR, and GESR are set in place of SRR0, SRR1, DEAR, and ESR, respectively. The MSR is set as follows:

MSR
    CM                    MSR[CM] is set to EPCR[GICM].
    CE, ME, WE, GS, DE    Unchanged.
    Bits in the MSR corresponding to set bits in the MSRP register are left unchanged.
    All other defined MSR bits set to 0.

The following is a prioritized listing of the various exceptions which cause a Data Storage interrupt and the corresponding ESR bit, if applicable. Even though multiple of these exceptions may occur, at most one of the following exceptions is reported in the ESR.

1. Cache Locking: DLK[0:1]
2. Page Table Fault: PT
3. Virtualization Fault
4. TLB Ineligible: TLBI
5. Byte Ordering: BO
6. Read Access or Write Access: If the exception occurred during a Page Table translation, PT

Programming Note
Since some Data Storage exceptions are not mutually exclusive, system software may need to examine the TLB entry or the Page Table entry accessed by the data storage access in order to determine whether additional exceptions may have also occurred.

If Category Embedded.Hypervisor is supported and the interrupt is directed to the guest state, instruction execution resumes at the address given by one of the following.

- GIVPR[0:47] || GIVOR2[48:59] || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
- GIVPR[0:51] || 0x060 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

Otherwise, instruction execution resumes at the address given by one of the following.

- IVPR[0:47] || IVOR2[48:59] || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
- IVPR[0:51] || 0x060 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

7.6.5 Instruction Storage Interrupt

An Instruction Storage interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059) and an Instruction Storage exception is presented to the interrupt mechanism. An Instruction Storage exception is caused when any of the following exceptions arises during execution of an instruction:

Execute Access Control exception

An Execute Access Control exception is caused when one of the following conditions exists.

- While in user mode (MSR[PR]=1), an instruction fetch attempts to access a location in storage that is not user mode execute enabled (i.e. page access control bit UX=0).
- While in supervisor mode (MSR[PR]=0), an instruction fetch attempts to access a location in storage that is not supervisor mode execute enabled (i.e. page access control bit SX=0).

Byte Ordering exception

A Byte Ordering exception may occur when the implementation cannot perform the instruction fetch in the byte order specified by the Endian storage attribute of the page being accessed.

Page Table Fault exception

A Page Table Fault exception is caused when a Page Table translation occurs for an instruction fetch and the Page Table Entry that is accessed is invalid (Valid bit = 0).

TLB Ineligible exception

A TLB Ineligible exception is caused when a Page Table translation occurs for an instruction fetch and any of the following conditions are true.

- The only TLB entries that can be used to hold the translation for the virtual address have IPROT=1.
- No TLB array can be loaded from the Page Table for the page size specified by the PTE.
- The PTE[ARPN] is treated as an LPN (the Embedded.Hypervisor category is supported) and there is no TLB array that meets all the following conditions.
    - The TLB array supports the page size specified by the PTE.
    - The TLB array can be loaded from the Page Table (TLBnCFG[PT] = 1).

If the Embedded.Hypervisor category is supported, an Instruction Storage interrupt resulting from a TLB Ineligible exception is always directed to hypervisor state regardless of the setting of EPCR[ISIGS].

When an Instruction Storage interrupt occurs, the thread suppresses the execution of the instruction causing the Instruction Storage exception.

If category Embedded.Hypervisor is not supported or if category Embedded.Hypervisor is supported and the interrupt is directed to hypervisor state, SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0   Set to the effective address of the instruction causing the Instruction Storage interrupt.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM            MSR[CM] is set to EPCR[ICM].
    CE, ME, DE    Unchanged.
    All other defined MSR bits set to 0.
ESR
    BO      Set to 1 if the instruction fetch caused a Byte Ordering exception; otherwise set to 0.
    TLBI    Set to 1 if a TLB Ineligible exception occurred during a Page Table translation for the instruction causing the interrupt; otherwise set to 0.
    PT      If a Page Table Fault or an Execute Access Control exception occurred during a Page Table translation for the instruction causing the interrupt, then PT is set to 1 if no TLB entry was created from the Page Table and is set to an implementation-dependent value if a TLB entry was created. See Section 6.7.4 for rules about TLB updates. If no Page Table Fault or Execute Access Control exception occurred during a Page Table translation for the instruction causing the interrupt, set to 0.
    VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
    All other defined ESR bits are set to 0.

If category Embedded.Hypervisor is supported and the interrupt is directed to guest supervisor state, GSRR0, GSRR1, and GESR are set in place of SRR0, SRR1, and ESR, respectively. The MSR is set as follows:

MSR
    CM                    MSR[CM] is set to EPCR[GICM].
    CE, ME, WE, GS, DE    Unchanged.
    Bits in the MSR corresponding to set bits in the MSRP register are left unchanged.
    All other defined MSR bits set to 0.

The following is a prioritized listing of the various exceptions which cause an Instruction Storage interrupt and the corresponding ESR bit, if applicable. Even though multiple of these exceptions may occur, at most one of the following exceptions is reported in the ESR.

1. Page Table Fault: PT
2. TLB Ineligible: TLBI
3. Byte Ordering exception: BO
4. Execute Access: If the exception occurred during a Page Table translation, PT

Programming Note
Since some Instruction Storage exceptions are not mutually exclusive, system software may need to examine the TLB entry or the Page Table entry accessed by the data storage access in order to determine whether additional exceptions may have also occurred.

If Category Embedded.Hypervisor is supported and the interrupt is directed to the guest state, instruction execution resumes at the address given by one of the following.

- GIVPR[0:47] || GIVOR3[48:59] || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
- GIVPR[0:51] || 0x080 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

Otherwise, instruction execution resumes at the address given by one of the following.

- IVPR[0:47] || IVOR3[48:59] || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
- IVPR[0:51] || 0x080 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

7.6.6 External Input Interrupt

An External Input interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059), an External Input exception is presented to the interrupt mechanism, and the interrupt is enabled. While the specific definition of an External Input exception is implementation-dependent, it would typically be caused by the activation of an asynchronous signal that is part of the processing system. Also, implementations may provide an alternative means (in addition to the enabled criteria) for masking the External Input interrupt.

If category Embedded.Hypervisor is supported, External Input interrupts are enabled if:

    (EPCR[EXTGS] = 0) & ((MSR[GS] = 1) | (MSR[EE] = 1))
or
    (EPCR[EXTGS] = 1) & (MSR[GS] = 1) & (MSR[EE] = 1)

Otherwise, External Input interrupts are enabled if MSR[EE]=1.

If category Embedded.Hypervisor is not supported or if category Embedded.Hypervisor is supported and the interrupt is directed to hypervisor state, SRR0, SRR1, and MSR are updated as follows:

SRR0   Set to the effective address of the next instruction to be executed.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM            MSR[CM] is set to EHSR[ICM].
    CE, ME, DE    Unchanged.
    All other defined MSR bits set to 0.

If category Embedded.Hypervisor is supported and the interrupt is directed to the guest supervisor state, GSRR0 and GSRR1 are set in place of SRR0 and SRR1, respectively. The MSR is set as follows:

MSR
    CM                    MSR[CM] is set to EPCR[GICM].
    CE, ME, WE, GS, DE    Unchanged.
    Bits in the MSR corresponding to set bits in the MSRP register are left unchanged.
    All other defined MSR bits set to 0.

If Category Embedded.Hypervisor is supported and the interrupt is directed to the guest state, instruction execution resumes at the address given by one of the following.

- GIVPR[0:47] || GIVOR4[48:59] || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
- GIVPR[0:51] || 0x0A0 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

Otherwise, instruction execution resumes at the address given by one of the following.

- IVPR[0:47] || IVOR4[48:59] || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
- IVPR[0:51] || 0x0A0 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

Programming Note
Software is responsible for taking whatever action(s) are required by the implementation in order to clear any External Input exception status prior to re-enabling MSR[EE] in order to avoid another, redundant External Input interrupt.

7.6.7 Alignment Interrupt

An Alignment interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059) and an Alignment exception is presented to the interrupt mechanism. An Alignment exception may be caused when the implementation cannot perform a data storage access for one of the following reasons:

- The operand of a single-register Load or Store is not aligned.
- The instruction is a Load Multiple or Store Multiple, or a Move Assist for which the length of the storage operand is not zero.
- The operand of dcbz or dcbzep is in storage that is Write Through Required or Caching Inhibited, or one of these instructions is executed in an implementation that has either no data cache or a Write Through data cache or the line addressed by the instruction cannot be established in the cache because the cache is disabled or locked.
- The operand of a Store, except Store Conditional, or Store String for which the length of the storage operand is zero, is in storage that is Write-Through Required.

For lmw and stmw with an operand that is not word-aligned, and for Load and Reserve and Store Conditional instructions with an operand that is not aligned, an implementation may yield boundedly undefined results instead of causing an Alignment interrupt. A Store Conditional to Write Through Required storage may either cause a Data Storage interrupt, cause an Alignment interrupt, or correctly execute the instruction. For all other cases listed above, an implementation may execute the instruction correctly instead of causing an Alignment interrupt. (For dcbz or dcbzep, `correct' execution means setting each byte of the block in main storage to 0x00.)

Programming Note
The architecture does not support the use of an unaligned effective address by Load and Reserve and Store Conditional instructions. If an Alignment interrupt occurs because one of these instructions specifies an unaligned effective address, the Alignment interrupt handler must not attempt to emulate the instruction, but instead should treat the instruction as a programming error.

When an Alignment interrupt occurs, the thread suppresses the execution of the instruction causing the Alignment exception.

SRR0, SRR1, MSR, DEAR, and ESR are updated as follows:

SRR0   Set to the effective address of the instruction causing the Alignment interrupt.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM                 MSR[CM] is set to EPCR[ICM].
    CE, ME, DE         Unchanged.
    CE, ME, DE, ICM    Unchanged.
    All other defined MSR bits set to 0.
DEAR   Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Alignment exception.
ESR
    FP      Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0.
    ST      Set to 1 if the instruction causing the interrupt is a Store; otherwise set to 0.
    AP      Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0.
    SPV     Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation; otherwise set to 0.
    VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
    EPID    Set to 1 if the instruction causing the interrupt is an External Process ID instruction; otherwise set to 0.
    All other defined ESR bits are set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR[0:51] || 0x0C0. Otherwise, instruction execution resumes at address IVPR[0:47] || IVOR5[48:59] || 0b0000.

7.6.8 Program Interrupt

A Program interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059), a Program exception is presented to the interrupt mechanism, and, for Floating-point Enabled exception, MSR[FE0,FE1] are non-zero. A Program exception is caused when any of the following exceptions arises during execution of an instruction:

Floating-point Enabled exception

A Floating-point Enabled exception is caused when FPSCR[FEX] is set to 1 by the execution of a floating-point instruction that causes an enabled exception, including the case of a Move To FPSCR instruction that causes an exception bit and the corresponding enable bit both to be 1. Note that in this context, the term `enabled exception' refers to the enabling provided by control bits in the Floating-Point Status and Control Register. See Section 4.2.2 of Book I.

Auxiliary Processor Enabled exception

The cause of an Auxiliary Processor Enabled exception is implementation-dependent.

Illegal Instruction exception

An Illegal Instruction exception does occur when execution is attempted of any of the following kinds of instructions.

- a reserved-illegal instruction
- when MSR[PR]=1 (user mode), an mtspr or mfspr that specifies an spr value with spr[5]=0 (user-mode accessible) that represents an unimplemented Special Purpose Register

An Illegal Instruction exception may occur when execution is attempted of any of the following kinds of instructions. If the exception does not occur, the alternative is shown in parentheses.

- an instruction that is in invalid form (boundedly undefined results)
- an lswx instruction for which register RA or register RB is in the range of registers to be loaded (boundedly undefined results)
- a defined instruction that is not implemented by the implementation (Unimplemented Operation exception)

Privileged Instruction exception

A Privileged Instruction exception occurs when MSR[PR]=1 and execution is attempted of any of the following kinds of instructions.

- a privileged instruction
- an mtspr or mfspr instruction that specifies an spr value with spr[5]=1

Trap exception

A Trap exception occurs when any of the conditions specified in a Trap instruction are met and the exception is not also enabled as a Debug interrupt. If enabled as a Debug interrupt (i.e. DBCR0[TRAP]=1, DBCR0[IDM]=1, and MSR[DE]=1), then a Debug interrupt will be taken instead of the Program interrupt.

Unimplemented Operation exception

An Unimplemented Operation exception may occur when execution is attempted of a defined instruction that is not implemented by the implementation. Otherwise an Illegal Instruction exception occurs.

An Unimplemented Operation exception may also occur when the thread is in 32-bit mode and execution is attempted of an instruction that is part of the 64-Bit category. Otherwise the instruction executes normally.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0   For all Program interrupts except an Enabled exception when in one of the imprecise modes (see Section 4.2.1 on page 903) or when a disabled exception is subsequently enabled, set to the effective address of the instruction that caused the Program interrupt.
       For an imprecise Enabled exception, set to the effective address of the excepting instruction or to the effective address of some subsequent instruction. If it points to a subsequent instruction, that instruction has not been executed, and ESR[PIE] is set to 1. If a subsequent instruction is a sync or isync, SRR0 will point at the sync or isync instruction, or at the following instruction.
       If FPSCR[FEX]=1 but both MSR[FE0]=0 and MSR[FE1]=0, an Enabled exception type Program interrupt will occur imprecisely prior to or at the next synchronizing event if these MSR bits are altered by any instruction that can set the MSR so that the expression
           (MSR[FE0] | MSR[FE1]) & FPSCR[FEX]
       is 1. When this occurs, SRR0 is loaded with the address of the instruction that would have executed next, not with the address of the instruction that modified the MSR causing the interrupt, and ESR[PIE] is set to 1.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM            MSR[CM] is set to EPCR[ICM].
    CE, ME, DE    Unchanged.
    All other defined MSR bits set to 0.
ESR
    PIL     Set to 1 if an Illegal Instruction exception type Program interrupt; otherwise set to 0.
    PPR     Set to 1 if a Privileged Instruction exception type Program interrupt; otherwise set to 0.
    PTR     Set to 1 if a Trap exception type Program interrupt; otherwise set to 0.
    PUO     Set to 1 if an Unimplemented Operation exception type Program interrupt; otherwise set to 0.
    FP      Set to 1 if the instruction causing the interrupt is a floating-point instruction; otherwise set to 0.
    PIE     Set to 1 if a Floating-point Enabled exception type Program interrupt, and the address saved in SRR0 is not the address of the instruction causing the exception (i.e. the instruction that caused FPSCR[FEX] to be set); otherwise set to 0.
    AP      Set to 1 if the instruction causing the interrupt is an Auxiliary Processor instruction; otherwise set to 0.
    SPV     Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation; otherwise set to 0.
    VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
    All other defined ESR bits are set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR[0:51] || 0x0E0. Otherwise, instruction execution resumes at address IVPR[0:47] || IVOR6[48:59] || 0b0000.

7.6.9 Floating-Point Unavailable Interrupt

A Floating-Point Unavailable interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059), an attempt is made to execute a floating-point instruction (i.e. any instruction listed in Section 4.6 of Book I), and MSR[FP]=0.

When a Floating-Point Unavailable interrupt occurs, the hardware suppresses the execution of the instruction causing the Floating-Point Unavailable interrupt.

SRR0, SRR1, and MSR are updated as follows:

SRR0   Set to the effective address of the instruction that caused the interrupt.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM            MSR[CM] is set to EPCR[ICM].
    CE, ME, DE    Unchanged.
    All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR[0:51] || 0x100. Otherwise, instruction execution resumes at address IVPR[0:47] || IVOR7[48:59] || 0b0000.

7.6.10 System Call Interrupt

A System Call interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059) and a System Call (sc) instruction is executed.

If category Embedded.Hypervisor is not supported or if category Embedded.Hypervisor is supported and the interrupt is directed to hypervisor state, SRR0, SRR1, and MSR are updated as follows:

SRR0   Set to the effective address of the instruction after the sc instruction.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM            MSR[CM] is set to EPCR[ICM].
    VLEMI         Set to 1 if the instruction causing the interrupt resides in VLE storage.
    CE, ME, DE    Unchanged.
    All other defined MSR bits set to 0.

If category Embedded.Hypervisor is supported and the interrupt is directed to guest supervisor state (MSR[GS] = 1), GSRR0 and GSRR1 are set in place of SRR0 and SRR1, respectively. The MSR is set as follows:

MSR
    CM                    MSR[CM] is set to EPCR[GICM].
    CE, ME, WE, GS, DE    Unchanged.
    Bits in the MSR corresponding to set bits in the MSRP register are left unchanged.
    All other defined MSR bits set to 0.

If Category Embedded.Hypervisor is supported and the interrupt is directed to the guest state, instruction execution resumes at the address given by one of the following.

- GIVPR[0:47] || GIVOR8[48:59] || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
- GIVPR[0:51] || 0x120 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

Otherwise, instruction execution resumes at the address given by one of the following.

- IVPR[0:47] || IVOR8[48:59] || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
- IVPR[0:51] || 0x120 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

7.6.11 Auxiliary Processor Unavailable Interrupt

An Auxiliary Processor Unavailable interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059), an attempt is made to execute an Auxiliary Processor instruction (including Auxiliary Processor loads, stores, and moves), the target Auxiliary Processor is present on the implementation, and the Auxiliary Processor is configured as unavailable. Details of the Auxiliary Processor, its instruction set, and its configuration are implementation-dependent. See the User's Manual for the implementation.

When an Auxiliary Processor Unavailable interrupt occurs, the hardware suppresses the execution of the instruction causing the Auxiliary Processor Unavailable interrupt.

Registers SRR0, SRR1, and MSR are updated as follows:

SRR0   Set to the effective address of the instruction that caused the interrupt.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM            MSR[CM] is set to EPCR[ICM].
    CE, ME, DE    Unchanged.
    All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR[0:51] || 0x140. Otherwise, instruction execution resumes at address IVPR[0:47] || IVOR9[48:59] || 0b0000.

7.6.12 Decrementer Interrupt

A Decrementer interrupt occurs when no higher priority interrupt exists (see Section 7.9 on page 1059), a Decrementer exception exists (TSR[DIS]=1), and the exception is enabled. If category Embedded.Hypervisor is supported, the interrupt is enabled by TCR[DIE]=1 and (MSR[EE]=1 or MSR[GS]=1). Otherwise, the interrupt is enabled by TCR[DIE]=1 and MSR[EE]=1. See Section 9.3 on page 1069.

Programming Note
MSR[EE] also enables the External Input and Fixed-Interval Timer interrupts.

SRR0, SRR1, MSR, and TSR are updated as follows:

SRR0   Set to the effective address of the next instruction to be executed.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
    CM            MSR[CM] is set to EPCR[ICM].
    CE, ME, DE    Unchanged.
    All other defined MSR bits set to 0.
TSR    (See Section 9.5.1 on page 1072.)
    DIS           Set to 1.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR[0:51] || 0x160. Otherwise, instruction execution resumes at address IVPR[0:47] || IVOR10[48:59] || 0b0000.

    CE, ME, DE    Unchanged.
    All other defined MSR bits set to 0.
TSR    (See Section 9.5.1 on page 1072.)
    FIS           Set to 1.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR[0:51] || 0x180. Otherwise, instruction execution resumes at address IVPR[0:47] || IVOR11[48:59] || 0b0000.

Programming Note
Software is responsible for clearing the Fixed-Interval Timer exception status prior to re-enabling the MSR[EE] bit in order to avoid another redundant Fixed-Interval Timer interrupt. To clear the Fixed-Interval Timer exception, the interrupt handling routine must clear TSR[FIS]. Clearing is done by writing
a word to TSR using mtspr with a 1 in any bit posi- tion that is to be cleared and 0 in all other bit posi- Programming Note tions. The write-data to the TSR is not direct data, Software is responsible for clearing the Decre- but a mask. A 1 causes the bit to be cleared, and a menter exception status prior to re-enabling the 0 has no effect. MSREE bit in order to avoid another redundant Decrementer interrupt. To clear the Decrementer exception, the interrupt handling routine must clear 7.6.14 Watchdog Timer Interrupt TSRDIS. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be A Watchdog Timer interrupt occurs when no higher pri- cleared and 0 in all other bit positions. The write- ority exception exists (see Section 7.9 on page 1059), a data to the TSR is not direct data, but a mask. A 1 Watchdog Timer exception exists (TSRWIS=1), and the causes the bit to be cleared, and a 0 has no effect. exception is enabled. If category Embedded.Hypervi- sor is supported, the interrupt is enabled by TCRWIE=1 and (MSRCE = 1 or MSRGS=1).Otherwise, the interrupt 7.6.13 Fixed-Interval Timer Inter- is enabled by TCRWIE=1 and MSRCE=1. See Section 9.7 on page 1073. rupt A Fixed-Interval Timer interrupt occurs when no higher Programming Note priority exception exists (see Section 7.9 on MSRCE also enables the Critical Input interrupt. page 1059), a Fixed-Interval Timer exception exists (TSRFIS=1), and the exception is enabled. If category CSRR0, CSRR1, MSR, and TSR are updated as fol- Embedded.Hypervisor is supported, the interrupt is lows: enabled by TCRFIE=1 and MSRGS=1. See Section 9.6 CSRR0 Set to the effective address of the next on page 1073. instruction to be executed. Programming Note CSRR1 Set to the contents of the MSR at the time of the interrupt. MSREE also enables the External Input and Decre- menter interrupts. MSR CM MSRCM is set to EPCRICM. ME Unchanged. 
SRR0, SRR1, MSR, and TSR are updated as follows: DE Unchanged if category E.ED is supported; SRR0 Set to the effective address of the next otherwise set to 0. instruction to be executed. All other defined MSR bits set to 0. SRR1 Set to the contents of the MSR at the time TSR (See Section 9.5.1 on page 1072.) of the interrupt. WIS Set to 1. MSR If Interrupt Fixed Offsets [Category: Embed- CM MSRCM is set to EPCRICM. ded.Phased-In] are supported, instruction execution Chapter 7. Interrupts and Exceptions 1043 Version 2.06 resumes at address IVPR0:51 || 0x1A0. Otherwise, Management instruction, and within the instruction execution resumes at address IVPR0:47 || page whose access caused the Data TLB IVOR1248:59|| 0b0000. Error exception. Programming Note ESR Software is responsible for clearing the Watchdog ST Set to 1 if the instruction causing the inter- Timer exception status prior to re-enabling the rupt is a Store, dcbi, dcbz, or dcbzep MSRCE bit in order to avoid another redundant instruction; otherwise set to 0. Watchdog Timer interrupt. To clear the Watchdog FP Set to 1 if the instruction causing the inter- Timer exception, the interrupt handling routine rupt is a floating-point load or store; other- must clear TSRWIS. Clearing is done by writing a wise set to 0. word to TSR using mtspr with a 1 in any bit posi- AP Set to 1 if the instruction causing the inter- tion that is to be cleared and 0 in all other bit posi- rupt is an Auxiliary Processor load or store; tions. The write-data to the TSR is not direct data, otherwise set to 0. but a mask. A 1 causes the bit to be cleared, and a SPV Set to 1 if the instruction causing the inter- 0 has no effect. rupt is a SPE operation or a Vector opera- tion; otherwise set to 0. VLEMI Set to 1 if the instruction causing the inter- rupt resides in VLE storage. 
7.6.15 Data TLB Error Interrupt EPID Set to 1 if the instruction causing the inter- A Data TLB Error interrupt occurs when no higher prior- rupt is an External Process ID instruction; ity exception exists (see Section 7.9 on page 1059) and otherwise set to 0. any of the following Data TLB Error exceptions is pre- All other defined ESR bits are set to 0. sented to the interrupt mechanism. If category Embedded.Hypervisor is supported and the TLB Miss exception interrupt is directed to guest supervisor state, GSRR0, GSRR1, GDEAR, and GESR are set in place of SRR0, Caused when the virtual address associated with a SRR1, DEAR, and ESR, respectively. The MSR is set data storage access does not match any valid entry in as follows: the TLB as specified in Section 6.7.2 on page 949. If a stbcx., sthcx., stwcx., or stdcx. would not perform MSR its store in the absence of a Data Storage interrupt, and CM MSRCM is set to EPCRGICM. a non-conditional Store to the specified effective CE, ME,WE,GS,DE address would cause a Data Storage interrupt, it is Unchanged. implementation dependent whether a Data Storage Bits in the MSR corresponding to set bits in the interrupt occurs. MSRP register are left unchanged. When a Data TLB Error interrupt occurs, the hardware All other defined MSR bits set to 0. suppresses the execution of the instruction causing the Data TLB Error interrupt. If Category Embedded.Hypervisor is supported and the interrupt is directed to the guest state, instruction exe- If category Embedded.Hypervisor is not supported or if cution resumes at the address given by one of the fol- category Embedded.Hypervisor is supported and the lowing. interrupt is directed to hypervisor state, SRR0, SRR1, GIVPR0:47 || GIVOR1348:59||0b0000 if IVORs [Cat- MSR, DEAR, and ESR are updated as follows: egory: Embedded.Phased-Out] are supported. 
SRR0 Set to the effective address of the instruc- GIVPR0:51||0x1C0 if Interrupt Fixed Offsets [Cate- tion causing the Data TLB Error interrupt gory: Embedded.Phased-In] are supported. SRR1 Set to the contents of the MSR at the time Otherwise, instruction execution resumes at the of the interrupt. address given by one of the following. IVPR0:47 || IVOR1348:59||0b0000 if IVORs [Cate- MSR gory: Embedded.Phased-Out] are supported. CM MSRCM is set to EHSRICM. IVPR0:51||0x1C0 if Interrupt Fixed Offsets [Cate- CE, ME, DE gory: Embedded.Phased-In] are supported. Unchanged. All other defined MSR bits set to 0. 7.6.16 Instruction TLB Error Inter- DEAR Set to the effective address of a byte that is rupt both within the range of the bytes being An Instruction TLB Error interrupt occurs when no accessed by the Storage Access or Cache higher priority exception exists (see Section 7.9 on 1044 Power ISATM Book III-E Version 2.06 page 1059) and any of the following Instruction TLB 7.6.17 Debug Interrupt Error exceptions is presented to the interrupt mecha- nism. A Debug interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059), a TLB Miss exception Debug exception exists in the DBSR, and Debug inter- Caused when the virtual address associated with an rupts are enabled (DBCR0IDM=1 and MSRDE=1). A instruction fetch does not match any valid entry in the Debug exception occurs when a Debug Event causes a TLB as specified in Section 6.7.2 on page 949. corresponding bit in the DBSR to be set. See Section 10.5. When an Instruction TLB Error interrupt occurs, the hardware suppresses the execution of the instruction If the Embedded.Enhanced Debug category is not sup- causing the Instruction TLB Miss exception. ported or is supported and is not enabled, CSRR0, CSRR1, MSR, and DBSR are updated as follows. 
If the If category Embedded.Hypervisor is not supported or if Embedded.Enhanced Debug category is supported category Embedded.Hypervisor is supported and the and is enabled, DSRR0 and DSRR1 are updated as interrupt is directed to hypervisor state, SRR0, SRR1, specified below and CSRR0 and CSRR1 are not and MSR are updated as follows: changed. The means by which the Embed- SRR0 Set to the effective address of the instruc- ded.Enhanced Debug category is enabled is imple- tion causing the Instruction TLB Error inter- mentation-dependent. rupt. CSRR0 or DSRR0 [Category: Embedded.Enhanced SRR1 Set to the contents of the MSR at the time Debug] of the interrupt. For Debug exceptions that occur while Debug interrupts are enabled MSR (DBCR0IDM=1 and MSRDE=1), CSRR0 is CM MSRCM is set to EPCRICM. set as follows: CE, ME,DE For Instruction Address Compare Unchanged. (IAC1, IAC2, IAC3, IAC4), Data All other defined MSR bits set to 0. Address Compare (DAC1R, DAC1W, DAC2R, DAC2W), Data Value Com- If category Embedded.Hypervisor is supported and the pare (DVC1, DVC2), Trap (TRAP), or interrupt is directed to guest supervisor state, GSRR0 Branch Taken (BRT) debug excep- and GSRR1 are set in place of SRR0 and SRR1, tions, set to the address of the instruc- respectively. The MSR is set as follows: tion causing the Debug interrupt. For Instruction Complete (ICMP) MSR debug exceptions, set to the address CM MSRCM is set to EPCRGICM. of the instruction that would have exe- CE, ME,WE,GS,DE cuted after the one that caused the Unchanged. Debug interrupt. Bits in the MSR corresponding to set bits in the For Unconditional Debug Event (UDE) MSRP register are left unchanged. debug exceptions, set to the address of the instruction that would have exe- All other defined MSR bits set to 0. cuted next if the Debug interrupt had not occurred. 
If Category Embedded.Hypervisor is supported and the For Interrupt Taken (IRPT) debug interrupt is directed to the guest state, instruction exe- exceptions, set to the interrupt vector cution resumes at the address given by one of the fol- value of the interrupt that caused the lowing. Interrupt Taken debug event. GIVPR0:47 || GIVOR1448:59||0b0000 if IVORs [Cat- For Return From Interrupt (RET) egory: Embedded.Phased-Out] are supported. debug exceptions, set to the address GIVPR0:51||0x1E0 if Interrupt Fixed Offsets [Cate- of the rfi instruction that caused the gory: Embedded.Phased-In] are supported. Debug interrupt. Otherwise, instruction execution resumes at the For Critical Interrupt Taken (CRPT) address given by one of the following. debug exceptions, DSRR0 is set to the IVPR0:47 || IVOR1448:59||0b0000 if IVORs [Cate- address of the first instruction of the gory: Embedded.Phased-Out] are supported. critical interrupt handler. CSRR0 is IVPR0:51||0x1E0 if Interrupt Fixed Offsets [Cate- unaffected. gory: Embedded.Phased-In] are supported. For Critical Interrupt Return (CRET) debug exceptions, DSRR0 is set to the address of the rfci instruction that Chapter 7. Interrupts and Exceptions 1045 Version 2.06 caused the Debug interrupt. See 7.6.18 SPE/Embedded Floating- Section 10.4.10, "Critical Interrupt Return Debug Event [Category: Point/Vector Unavailable Interrupt Embedded.Enhanced Debug]". [Categories: SPE.Embedded Float For Debug exceptions that occur while Scalar Double, SPE.Embedded Debug interrupts are disabled (DBCR0IDM=0 or MSRDE=0), a Debug Float Vector, Vector] interrupt will occur at the next synchroniz- The SPE/Embedded Floating-Point/Vector Unavail- ing event if DBCR0IDM and MSRDE are able interrupt occurs when no higher priority exception modified such that they are both 1 and if exists, and an attempt is made to execute an SPE, the Debug exception Status is still set in the SPE.Embedded Float Scalar Double, SPE.Embedded DBSR. 
When this occurs, CSRR0 or Float Vector, or Vector instruction and MSRSPV = 0. DSRR0 [Category:Embedded.Enhanced Debug] is set to the address of the instruc- When an Embedded Floating-Point Unavailable inter- tion that would have executed next, not rupt occurs, the hardware suppresses the execution of with the address of the instruction that the instruction causing the exception. modified the Debug Control Register 0 or SRR0, SRR1, MSR, and ESR are updated as follows: MSR and thus caused the interrupt. SRR0 Set to the effective address of the instruc- CSRR1 or DSRR1 [Category: Embedded.Enhanced tion causing the Embedded Floating-Point Debug] Unavailable interrupt. Set to the contents of the MSR at the time SRR1 Set to the contents of the MSR at the time of the interrupt. of the interrupt. MSR MSR CM MSRCM is set to EPCRICM. CM MSRCM is set to EPCRICM. ME Unchanged CE, ME,DE All other supported MSR bits set to 0. Unchanged DBSR Set to indicate type of Debug Event (see Section 10.5.2) ESR SPV Set to 1. If Interrupt Fixed Offsets [Category: Embed- VLEMI Set to 1 if the instruction causing the inter- ded.Phased-In] are supported, instruction execution rupt resides in VLE storage. resumes at address IVPR0:51 || 0x040. Otherwise, All other defined ESR bits are set to 0. instruction execution resumes at address IVPR0:47 || IVOR1548:59|| 0b0000. If Interrupt Fixed Offsets [Category: Embed- ded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x200. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR3248:59|| 0b0000. Programming Note This interrupt is also used by the Signal Processing Engine in the same manner. It should be used by software to determine if the application is using the upper 32 bits of the GPRs in a 32-bit implementa- tion and thus be required to save and restore them on context switch. 
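
The two vector-address forms that recur in the sections above (IVPR bits 0:47 concatenated with IVORn bits 48:59 and four zero bits, or IVPR bits 0:51 concatenated with a 12-bit fixed offset) can be sketched in C as plain bit masking. This is only an illustration: the function names are hypothetical, registers are modeled as ordinary integers with architecture bit 0 as the most significant bit of a 64-bit value, and any truncation of the address in 32-bit computation mode is not modeled.

```c
#include <stdint.h>

/* Phased-Out form: IVPR[0:47] || IVORn[48:59] || 0b0000.
 * IVPR supplies the upper 48 bits; IVORn supplies bits 48:59;
 * the low four bits are always zero. (Hypothetical helper.) */
static inline uint64_t vector_from_ivor(uint64_t ivpr, uint32_t ivor)
{
    return (ivpr & 0xFFFFFFFFFFFF0000ULL) | ((uint64_t)ivor & 0xFFF0u);
}

/* Phased-In form: IVPR[0:51] || 12-bit fixed offset (e.g. 0x200).
 * IVPR supplies the upper 52 bits. (Hypothetical helper.) */
static inline uint64_t vector_from_fixed_offset(uint64_t ivpr, uint32_t offset)
{
    return (ivpr & 0xFFFFFFFFFFFFF000ULL) | ((uint64_t)offset & 0xFFFu);
}
```

For example, with IVPR = 0x12345678 and IVOR32 = 0xABCD the first form yields 0x1234ABC0, and the second form with offset 0x200 yields 0x12345200.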
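
The Programming Notes for the Decrementer, Fixed-Interval Timer, and Watchdog Timer interrupts (Sections 7.6.12 through 7.6.14) describe the TSR write data as a mask: a 1 clears the corresponding status bit and a 0 has no effect. The following C sketch models only that rule. The mask values are an assumption taken from the commonly documented Book E layout (WIS, DIS, and FIS at bits 33, 36, and 37) and should be checked against Section 9.5.1; `tsr_after_write` is a hypothetical helper that models the register, not an mtspr wrapper.

```c
#include <stdint.h>

/* Assumed TSR bit masks for the 32-bit register image (bits 32:63);
 * verify against Section 9.5.1 for the implementation in use. */
#define TSR_WIS 0x40000000u  /* Watchdog Timer interrupt status */
#define TSR_DIS 0x08000000u  /* Decrementer interrupt status */
#define TSR_FIS 0x04000000u  /* Fixed-Interval Timer interrupt status */

/* Model of the effect of writing `mask` to TSR: every 1 bit in the
 * mask clears the matching status bit; 0 bits leave TSR unchanged. */
static inline uint32_t tsr_after_write(uint32_t tsr, uint32_t mask)
{
    return tsr & ~mask;
}
```

A Decrementer handler would therefore write a mask containing only the DIS bit, which leaves a pending Fixed-Interval Timer or Watchdog Timer status untouched.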
7.6.19 Embedded Floating-Point Data Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]

The Embedded Floating-Point Data interrupt occurs when no higher priority exception exists (see Section 7.9) and an Embedded Floating-Point Data exception is presented to the interrupt mechanism. The Embedded Floating-Point Data exception causing the interrupt is indicated in the SPEFSCR; these exceptions include Embedded Floating-Point Invalid Operation/Input Error (FINV, FINVH), Embedded Floating-Point Divide By Zero (FDBZ, FDBZH), Embedded Floating-Point Overflow (FOVF, FOVFH), and Embedded Floating-Point Underflow (FUNF, FUNFH).

When an Embedded Floating-Point Data interrupt occurs, the hardware suppresses the execution of the instruction causing the exception.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0    Set to the effective address of the instruction causing the Embedded Floating-Point Data interrupt.
SRR1    Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  CE, ME, DE    Unchanged.
  VLEMI         Set to 1 if the instruction causing the interrupt resides in VLE storage.
  All other defined MSR bits set to 0.
ESR
  SPV     Set to 1.
  All other defined ESR bits are set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x220. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR33_48:59 || 0b0000.

7.6.20 Embedded Floating-Point Round Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]

The Embedded Floating-Point Round interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059), SPEFSCR_FINXE is set to 1, and any of the following occurs:

- the unrounded result of an Embedded Floating-Point operation is not exact
- an overflow occurs and overflow exceptions are disabled (FOVF or FOVFH is set to 1 and FOVFE is set to 0)
- an underflow occurs and underflow exceptions are disabled (FUNF is set to 1 and FUNFE is set to 0).

The value of SPEFSCR_FINXS is 1, indicating that one of the above exceptions has occurred, and additional information about the exception is found in SPEFSCR_FGH,FG,FXH,FX.

When an Embedded Floating-Point Round interrupt occurs, the hardware completes the execution of the instruction causing the exception and writes the result to the destination register prior to taking the interrupt.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0    Set to the effective address of the instruction following the instruction causing the Embedded Floating-Point Round interrupt.
SRR1    Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  CE, ME, DE    Unchanged.
  All other defined MSR bits set to 0.
ESR
  SPV     Set to 1.
  VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
  All other defined ESR bits are set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x240. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR34_48:59 || 0b0000.

Programming Note
If an implementation does not support ±Infinity rounding modes and the rounding mode is set to be +Infinity or -Infinity, an Embedded Floating-Point Round interrupt occurs after every Embedded Floating-Point instruction for which rounding might occur regardless of the value of FINXE, provided no higher priority exception exists.

When an Embedded Floating-Point Round interrupt occurs, the unrounded (truncated) result of an inexact high or low element is placed in the target register. If only a single element is inexact, the other exact element is updated with the correctly rounded result, and the FG and FX bits corresponding to the other exact element will both be 0.

The bits FG (FGH) and FX (FXH) are provided so that an interrupt handler can round the result as it desires. FG (FGH) is the value of the bit immediately to the right of the least significant bit of the destination format mantissa from the infinitely precise intermediate calculation before rounding. FX (FXH) is the value of the 'or' of all the bits to the right of the FG (FGH) of the destination format mantissa from the infinitely precise intermediate calculation before rounding.

7.6.21 Performance Monitor Interrupt [Category: Embedded.Performance Monitor]

The Performance Monitor interrupt is part of the optional Performance Monitor facility; see Appendix D.

7.6.22 Processor Doorbell Interrupt [Category: Embedded.Processor Control]

A Processor Doorbell Interrupt occurs when no higher priority exception exists, a Processor Doorbell exception is present, and the interrupt is enabled (MSR_EE=1). Processor Doorbell exceptions are generated when DBELL messages (see Chapter 11) are received and accepted by the thread.

If category Embedded.Hypervisor is supported, the interrupt is enabled if MSR_GS=1 or MSR_EE=1.

SRR0, SRR1 and MSR are updated as follows:

SRR0    Set to the effective address of the next instruction to be executed.
SRR1    Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  CE, ME, DE    Unchanged.
  All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x280. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR36_48:59 || 0b0000.

7.6.23 Processor Doorbell Critical Interrupt [Category: Embedded.Processor Control]

A Processor Doorbell Critical Interrupt occurs when no higher priority exception exists, a Processor Doorbell Critical exception is present, and the interrupt is enabled (MSR_CE=1). Processor Doorbell Critical exceptions are generated when DBELL_CRIT messages (see Chapter 11) are received and accepted by the thread.

If category Embedded.Hypervisor is supported, the interrupt is enabled if MSR_GS=1 or MSR_CE=1.

CSRR0, CSRR1 and MSR are updated as follows:

CSRR0   Set to the effective address of the next instruction to be executed.
CSRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  ME            Unchanged.
  DE            Unchanged if category E.ED is supported; otherwise set to 0.
  All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x2A0. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR37_48:59 || 0b0000.

7.6.24 Guest Processor Doorbell Interrupt [Category: Embedded.Hypervisor, Embedded.Processor Control]

A Guest Processor Doorbell Interrupt occurs when no higher priority exception exists, a Guest Processor Doorbell exception is present, and the interrupt is enabled (MSR_GS=1 and MSR_EE=1). Guest Processor Doorbell exceptions are generated when G_DBELL messages (see Chapter 11) are received and accepted by the thread.

GSRR0, GSRR1 and MSR are updated as follows:

GSRR0   Set to the effective address of the next instruction to be executed.
GSRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  CE, ME, DE    Unchanged.
  All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x2C0. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR38_48:59 || 0b0000.

Programming Note
Guest Processor Doorbell interrupts are used by the hypervisor to be notified when the guest operating system has set MSR_EE to 1. This allows the hypervisor to reflect base class interrupts to the guest at a time when the guest is ready to accept them (MSR_GS=1 and MSR_EE=1).

Programming Note
Some guest operating systems running on a hypervisor may use lazy interrupt blocking. That is, when the operating system wants to block interrupts at the interrupt controller, it does not actually perform the blocking operation, but instead sets a value in memory that represents the level at which interrupts are to be blocked. When an actual interrupt occurs, this value is consulted to determine if the interrupt should have been blocked. If so, the current interrupt level that is to be blocked is set in the interrupt controller and the interrupt handling code returns without acknowledging the interrupt. When interrupts are unblocked at a later time, the interrupt will be reasserted by the interrupt controller.

When a hypervisor is taking external interrupts and then reflecting them to a guest, the hypervisor must acknowledge the interrupt before reflecting it to the guest since the external interrupt will occur again once MSR_GS=1 regardless of the state of MSR_EE.

To emulate the behavior required for lazy interrupt blocking by the guest, the hypervisor should execute another msgsnd instruction specifying a Guest Processor Doorbell at the time that it is reflecting the interrupt to the guest. When the guest performs its interrupt acknowledge (a hypercall or writing to an interrupt controller register emulated by the hypervisor), the hypervisor can execute a msgclr to clear a pending message if there are no other interrupts to be reflected to the guest.
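
The guest side of the lazy interrupt blocking scheme described in the Programming Note above can be sketched as a small C model. Everything here is hypothetical: the structure, the function names, and the convention that an interrupt whose level is less than or equal to the recorded block level counts as blocked. The hypervisor's msgsnd and msgclr handling is not modeled.

```c
#include <stdbool.h>

/* Hypothetical model of a guest's interrupt-blocking state. */
struct guest_irq_state {
    int soft_block_level; /* level recorded in memory by the guest */
    int hw_block_level;   /* level actually programmed into the controller */
};

/* Lazy block: only the in-memory value is updated; the interrupt
 * controller is deliberately left untouched. */
static void guest_block(struct guest_irq_state *s, int level)
{
    s->soft_block_level = level;
}

/* Called when an interrupt of priority `level` is actually delivered.
 * Returns true if the interrupt should be handled and acknowledged.
 * Returns false if it should have been blocked: the block is then
 * applied to the controller for real and the interrupt is left
 * unacknowledged so that it is reasserted after unblocking. */
static bool guest_take_interrupt(struct guest_irq_state *s, int level)
{
    if (level <= s->soft_block_level) {
        s->hw_block_level = s->soft_block_level;
        return false;
    }
    return true;
}
```

In this model the cost of touching the interrupt controller is paid only when an interrupt actually arrives while blocking is in effect, which is the point of the lazy scheme.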
7.6.25 Guest Processor Doorbell Critical Interrupt [Category: Embedded.Hypervisor, Embedded.Processor Control]

A Guest Processor Doorbell Critical Interrupt occurs when no higher priority exception exists, a Guest Processor Doorbell Critical exception is present, and the interrupt is enabled (MSR_GS=1 and MSR_CE=1). Guest Processor Doorbell Critical exceptions are generated when G_DBELL_CRIT messages (see Chapter 11) are received and accepted by the thread.

CSRR0, CSRR1 and MSR are updated as follows:

CSRR0   Set to the effective address of the next instruction to be executed.
CSRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  ME            Unchanged.
  DE            Unchanged if category E.ED is supported; otherwise set to 0.
  All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x2E0. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR39_48:59 || 0b0000.

Programming Note
Guest Processor Doorbell Critical interrupts are used by the hypervisor to be notified when the guest operating system has set MSR_CE to 1. This allows the hypervisor to reflect critical class interrupts to the guest at a time when the guest is ready to accept them (MSR_GS=1 and MSR_CE=1).

7.6.26 Guest Processor Doorbell Machine Check Interrupt [Category: Embedded.Hypervisor, Embedded.Processor Control]

A Guest Processor Doorbell Machine Check Interrupt occurs when no higher priority exception exists, a Guest Processor Doorbell Machine Check exception is present, and the interrupt is enabled (MSR_GS=1 and MSR_ME=1). Guest Processor Doorbell Machine Check exceptions are generated when G_DBELL_MC messages (see Chapter 11) are received and accepted by the thread.

CSRR0, CSRR1 and MSR are updated as follows:

CSRR0   Set to the effective address of the next instruction to be executed.
CSRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  ME            Unchanged.
  DE            Unchanged if category E.ED is supported; otherwise set to 0.
  All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x2E0. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR39_48:59 || 0b0000.

Programming Note
Guest Processor Doorbell Machine Check interrupts are used by the hypervisor to be notified when the guest operating system has set MSR_ME to 1. This allows the hypervisor to reflect machine check class interrupts to the guest at a time when the guest is ready to accept them (MSR_GS=1 and MSR_ME=1).

Programming Note
Guest Processor Doorbell Critical interrupts and Guest Processor Doorbell Machine Check interrupts share the same IVOR. Hypervisor software can differentiate between the two interrupts by comparing whether CE or ME is set in CSRR1 and which interrupt class is to be reflected.

7.6.27 Embedded Hypervisor System Call Interrupt [Category: Embedded.Hypervisor]

An Embedded Hypervisor System Call interrupt occurs when no higher priority exception exists (see Section 7.9) and a System Call (sc) instruction with LEV = 1 is executed.

SRR0, SRR1, and MSR are updated as follows:

SRR0    Set to the effective address of the instruction after the sc instruction.
SRR1    Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  VLEMI         Set to 1 if the instruction causing the interrupt resides in VLE storage.
  CE, ME, DE    Unchanged.
  All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x300. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR40_48:59 || 0b0000.

7.6.28 Embedded Hypervisor Privilege Interrupt [Category: Embedded.Hypervisor]

An Embedded Hypervisor Privilege interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059) and an Embedded Hypervisor Privilege exception is presented to the interrupt mechanism.

An Embedded Hypervisor Privilege exception occurs when MSR_GS=1 and MSR_PR=0 and execution is attempted of any of the following:

- a hypervisor-privileged instruction
- an mtspr or mfspr instruction that specifies an SPR that is hypervisor privileged
- a tlbwe instruction and category Embedded.Hypervisor.LRAT is not implemented
- a tlbwe, tlbsrx., or tlbilx instruction and EPCR_DGTMI=1
- a tlbwe instruction that attempts to write a TLB entry for which TLB_V=1 and TLB_IPROT=1 when MAS0_WQ=0b00
- a tlbwe instruction that attempts to write a TLB entry when MAS1_V=1, MAS1_IPROT=1, and MAS0_WQ=0b00
- a mtpmr or mfpmr instruction and MSRP_PMMP=1
- a Cache Locking instruction and MSRP_UCLEP=1

An Embedded Hypervisor Privilege exception may occur for the following implementation dependent reasons when MSR_GS=1 and MSR_PR=0 and execution is attempted of any of the following:

- a tlbwe instruction that attempts to write a TLB entry for which TLB_V=0 and TLB_IPROT=1 when MAS0_WQ=0b00
- a tlbwe instruction that attempts to write a TLB entry when MAS1_V=0, MAS1_IPROT=1, and MAS0_WQ=0b00
- a tlbwe instruction that attempts to write a TLB entry for which TLB_IPROT=1 when MAS0_WQ=0b01
- a tlbwe instruction that attempts to write a TLB entry when MAS1_IPROT=1 and MAS0_WQ=0b01
- a tlbwe instruction that attempts to write a TLB entry when MAS0_HES=0
- a tlbwe instruction that attempts to write a TLB entry to an array that is disallowed by the implementation
- an implementation dependent instruction or SPR which is hypervisor privileged

An Embedded Hypervisor Privilege exception also occurs when execution is attempted of an ehpriv instruction, regardless of the state of the thread.

Execution of the instruction causing the interrupt is suppressed and SRR0, SRR1, and MSR are updated as follows:

SRR0    Set to the effective address of the instruction causing the Embedded Hypervisor Privilege interrupt.
SRR1    Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  VLEMI         Set to 1 if the instruction causing the interrupt resides in VLE storage.
  CE, ME, DE    Unchanged.
  All other defined MSR bits set to 0.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x320. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR41_48:59 || 0b0000.

7.6.29 LRAT Error Interrupt [Category: Embedded.Hypervisor.LRAT]

An LRAT Error interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1059) and an LRAT Miss exception is presented to the interrupt mechanism.

An LRAT Miss exception is caused by either of the following.

- A tlbwe instruction is executed in guest supervisor state and the logical page number (RPN specified by MAS7 and MAS3 and page size specified by MAS1_TSIZE) does not match any valid entry in the LRAT.
- A Page Table translation is performed and the associated PTE_ARPN is to be treated as an LPN (the Embedded.Hypervisor category is supported) and the logical page number (RPN based on PTE_ARPN and page size specified by PTE_PS) does not match any valid entry in the LRAT.

When an LRAT Error interrupt occurs, the hardware suppresses the execution of the instruction causing the LRAT Error interrupt.

SRR0, SRR1, MSR, ESR, and LPER are updated as follows:

SRR0    Set to the effective address of the instruction causing the LRAT Error interrupt.
SRR1    Set to the contents of the MSR at the time of the interrupt.
MSR
  CM            MSR_CM is set to EPCR_ICM.
  CE, ME, DE    Unchanged.
  All other defined MSR bits are set to 0.
DEAR    If the LRAT Error interrupt occurred for a Page Table translation, set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the LRAT Miss exception. Otherwise, undefined.
ESR
  FP      Set to 1 if the instruction causing the interrupt is a floating-point load or store and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0.
  ST      Set to 1 if the instruction causing the interrupt is a Store or 'store-class' Cache Management instruction and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0.
  AP      Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0.
  SPV     Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation, the instruction is a Load or Store, and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0.
  DATA    Set to 1 if the interrupt is due to an LRAT miss resulting from a Page Table translation of a Load, Store or Cache Management operand address; otherwise set to 0.
  TLBI    Set to 1 if a TLB Ineligible exception occurred during a Page Table translation for the instruction causing the interrupt; otherwise set to 0.
  PT      Set to 1 if the cause of the interrupt is an LRAT miss exception on a Page Table translation. Set to 0 if the cause of the interrupt is an LRAT miss exception on a tlbwe.
  VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage, the instruction is a Load, Store, or Cache Management instruction, and the translation of the operand address causes the LRAT Miss exception.
  EPID    Set to 1 if the instruction causing the interrupt is an External Process ID instruction, the instruction is a Load, Store, or Cache Management instruction, and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0.
  All other defined ESR bits are set to 0.
LPER    Set to the values of the ARPN and PS fields from the PTE that was used to translate a virtual address for an instruction fetch, Load, Store or Cache Management instruction that caused an LRAT Error interrupt as a result of an LRAT Miss exception.

If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_0:51 || 0x340. Otherwise, instruction execution resumes at address IVPR_0:47 || IVOR42_48:59 || 0b0000.

7.7 Partially Executed Instructions

In general, the architecture permits load and store instructions to be partially executed, interrupted, and then to be restarted from the beginning upon return from the interrupt.

1. Any Load or Store (except elementary, aligned, guarded):
   Any asynchronous interrupt
Unaligned Load and Store instruc- Machine Check tions, or Load Multiple, Store Multiple, Load String, and Program (Imprecise Mode Floating-Point Store String instructions may be broken up into multi- Enabled) ple, smaller accesses, and these accesses may be per- Program (Imprecise Mode Auxiliary Processor formed in any order. In order to guarantee that a Enabled) particular load or store instruction will complete without 2. Unaligned elementary Load or Store, or any multi- being interrupted and restarted, software must mark ple or string: the storage being referred to as Guarded, and must use an elementary (non-string or non-multiple) load or All of the above listed under item 1, plus the store that is aligned on an operand-sized boundary. following: Data Storage (if the access crosses a protec- In order to guarantee that Load and Store instructions tion boundary) can, in general, be restarted and completed correctly Debug (Data Address Compare, Data Value without software intervention, the following rules apply Compare) when an execution is partially executed and then inter- rupted: 3. mtcrf may also be partially executed due to the occurrence of any of the interrupts listed under For an elementary Load, no part of the target reg- item 1 at the time the mtcrf was executing. ister RT or FRT, will have been altered. All instructions prior to the mtcrf have com- For `with update' forms of Load or Store, the pleted execution. (Some storage accesses update register, register RA, will not have been generated by these preceding instructions altered. may not have completed.) On the other hand, the following effects are permissible No subsequent instruction has begun execu- when certain instructions are partially executed and tion. 
then restarted: The mtcrf instruction (the address of which was saved in SRR0/CSRR0/MCSRR0/ For any Store, some of the bytes at the target stor- DSRR0 [Category: Embedded.Enhanced age location may have been altered (if write Debug] at the occurrence of the interrupt), access to that page in which bytes were altered is may appear not to have begun or may have permitted by the access control mechanism). In partially executed. addition, for Store Conditional instructions, CR0 has been set to an undefined value, and it is unde- fined whether the reservation has been cleared. For any Load, some of the bytes at the addressed storage location may have been accessed (if read access to that page in which bytes were accessed is permitted by the access control mechanism). For Load Multiple or Load String, some of the reg- isters in the range to be loaded may have been altered. Including the addressing registers (RA, and possibly RB) in the range to be loaded is a programming error, and thus the rules for partial execution do not protect against overwriting of these registers. In no case will access control be violated. As previously stated, the only load or store instructions that are guaranteed to not be interrupted after being partially executed are elementary, aligned, guarded loads and stores. All others may be interrupted after being partially executed. The following list identifies the specific instruction types for which interruption after partial execution may occur, as well as the specific interrupt types that could cause the interruption: Chapter 7. Interrupts and Exceptions 1053 Version 2.06 7.8 Interrupt Ordering and Masking It is possible for multiple exceptions to exist simulta- chy of interrupt classes is as follows from highest to neously, each of which could cause the generation of lowest: an interrupt. 
Furthermore, for interrupts classes other than the Machine Check interrupt and critical interrupts, MSR Enables Save/Restore the architecture does not provide for reporting more Interrupt Class Cleared Registers than one interrupt of the same class (unless the Machine Check ME,DE, CE, EE MSRR0/1 Embedded.Enhanced Debug category is supported). Debug1 DE,CE,EE DSRR0/1 Therefore, the architecture defines that interrupts are ordered with respect to each other, and provides a Critical CE,EE CSRR0/1 masking mechanism for certain persistent interrupt Base EE SRR0/1 types. Guest EE GSRR0/1 1 When an interrupt is masked (disabled), and an event The Debug interrupt class is Category: E.ED. causes an exception that would normally generate the Note: MSRDE may be cleared when a critical inter- interrupt, the exception persists as a status bit in a reg- rupt occurs if Category: E.ED is not supported. ister (which register depends upon the exception type). However, no interrupt is generated. Later, if the inter- Figure 60. Interrupt Hierarchy rupt is enabled (unmasked), and the exception status has not been cleared by software, the interrupt due to [Category: Embedded.Hypervisor] the original exception event will then finally be gener- The masking of interrupts is affected by MSRGS and ated. whether the interrupt is directed to the guest supervisor state or the hypervisor state. In general, interrupts All asynchronous interrupts can be masked. In addition, directed to the hypervisor state (with the exception of certain synchronous interrupts can be masked. An Guest Processor Doorbell type interrupts), are enabled example of such an interrupt is the Floating-Point if MSRGS = 1 regardless of the value of other MSR Enabled exception type Program interrupt. The execu- enables. 
Interrupts directed to the guest supervisor tion of a floating-point instruction that causes the state are enabled if the associated MSR enables are FPSCRFEX bit to be set to 1 is considered an exception set and MSRGS = 1. event, regardless of the setting of MSRFE0,FE1. If MSRFE0,FE1 are both 0, then the Floating-Point If the Embedded.Enhanced Debug category is not sup- Enabled exception type of Program interrupt is ported (or is supported and is not enabled), then the masked, but the exception persists in the FPSCRFEX Debug interrupt becomes a Critical class interrupt and bit. Later, if the MSRFE0,FE1 bits are enabled, the inter- all critical class interrupts will clear DE, CE, and EE in rupt will finally be generated. the MSR. The architecture enables implementations to avoid situ- Base Class interrupts that occur as a result of precise ations in which an interrupt would cause the state infor- exceptions are not masked by the EE bit in the MSR mation (saved in Save/Restore Registers) from a and any such exception that occurs prior to software previous interrupt to be overwritten and lost. In order to saving the state of SRR0/1 in a base class exception do this, the architecture defines interrupt classes in a handler will result in a situation that could result in the hierarchical manner. At each interrupt class, hardware loss of state information. automatically disables any further interrupts associated This first step of the hardware clearing the MSR enable with the interrupt class by masking the interrupt enable bits lower in the hierarchy shown in Figure 60 prevents in the MSR when the interrupt is taken. In addition, any subsequent asynchronous interrupts from overwrit- each interrupt class masks the interrupt enable in the ing the Save/Restore Registers (SRR0/SRR1, CSRR0/ MSR for each lower class in the hierarchy. The hierar- CSRR1, MCSRR0/MCSRR1, or DSRR0/DSRR1 [Cate- gory: Embedded.Enhanced Debug]), prior to software being able to save their contents. 
Hardware also auto- matically clears, on any interrupt, MSRPR,FP,FE0,FE1,IS,DS. The clearing of these bits assists in the avoidance of subsequent interrupts of certain other types. However, guaranteeing that inter- rupt classes lower in the hierarchy do not occur and thus do not overwrite the Save/Restore Registers (SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1 [Cate- gory: Embedded.Enhanced Debug], or MCSRR0/ MCSRR1) also requires the cooperation of system soft- ware. Specifically, system software must avoid the exe- 1054 Power ISATM Book III-E Version 2.06 cution of instructions that could cause (or enable) a execution of any floating-point instruction, due to subsequent interrupt, if the contents of the Save/ the automatic clearing of MSRFP. However, even if Restore Registers (SRR0/SRR1, CSRR0/CSRR1, software were to re-enable MSRFP, floating-point DSRR0/DSRR1 [Category: Embedded.Enhanced instructions must still be avoided in order to pre- Debug]), or MCSRR0/MCSRR1) have not yet been vent Program interrupts due to various possible saved. Program interrupt exceptions (Floating-Point Enabled, Unimplemented Operation). 7.8.1 Guidelines for System Soft- Re-enabling of MSRPR ware This prevents Privileged Instruction exception type Program interrupts. Alternatively, software could The following list identifies the actions that system soft- re-enable MSRPR, but avoid the execution of any ware must avoid, prior to having saved the Save/ privileged instructions. Restore Registers' contents: Execution of any Auxiliary Processor instruction Re-enabling an interrupt class that is at the same or a lower level in the interrupt hierarchy. This This prevents Auxiliary Processor Unavailable includes the following actions: interrupts, and Auxiliary Processor Enabled type and Unimplemented Operation type Program inter- - Re-enabling of MSREE rupts. 
- Re-enabling of MSRCE,EE in critical class Execution of any Illegal instructions interrupt handlers, and if the Embed- ded.Enhanced Debug category is not sup- This prevents Illegal Instruction exception type ported, re-enabling of MSRDE. Program interrupts. - Category: Embedded.Enhanced Debug: Re- Execution of any instruction that could cause an enabling of MSRCE,EE,DE in Debug class inter- Alignment interrupt rupt handlers This prevents Alignment interrupts. Included in this - Re-enabling of MSREE,CE,DE,ME in Machine category are any string or multiple instructions, Check interrupt handlers. and any unaligned elementary load or store instructions. See Section 7.6.7 on page 1039 for a Branching (or sequential execution) to addresses complete list of instructions that may cause Align- for which any of the following conditions are true. ment interrupts. The address is not mapped by the TLB or the Page Table or is mapped without UX=1 or It is not necessary for hardware or software to avoid SX=1 permission. interrupts higher in the interrupt hierarchy (see Both the Embedded.Hypervisor.LRAT and the Figure 60 on page 1055) from within interrupt handlers Embedded Page.Table category are sup- (and hence, for example, hardware does not automati- ported, MSRGS=1, and the effective address cally clear MSRCE,ME,DE upon a base class interrupt), is mapped by the Page Table but the LPN is since interrupts at each level of the hierarchy use differ- not mapped by the LRAT. ent pairs of Save/Restore Registers to save the instruc- tion address and MSR (i.e. SRR0/SRR1 for base class This prevents Instruction Storage, LRAT Error, and interrupts, and MCSRR0/MCSRR1,DSRR0/DSRR1 Instruction TLB Error interrupts. [Category: Embedded.Enhanced Debug], or CSRR0/ Load, Store or Cache Management instructions to CSRR1 for non-base class interrupts). The converse, addresses for which any of the following conditions however, is not true. That is, hardware and software are true. 
must cooperate in the avoidance of interrupts lower in The address is not mapped by the TLB or the the hierarchy from occurring within interrupt handlers, Page Table or is mapped without the required even though the these interrupts use different Save/ access permissions. Restore Register pairs. This is because the interrupt Both the Embedded.Hypervisor.LRAT and the higher in the hierarchy may have occurred from within a Embedded Page.Table category are sup- interrupt handler for an interrupt lower in the hierarchy ported, MSRGS=1, and the effective address prior to the interrupt handler having saved the Save/ is mapped by the Page Table but the LPN is Restore Registers. Therefore, within an interrupt han- not mapped by the LRAT. dler, Save/Restore Registers for all interrupts lower in the hierarchy may contain data that is necessary to the This prevents Data Storage, LRAT Error, and Data system software. TLB Error interrupts. Execution of any floating-point instruction This prevents Floating-Point Unavailable inter- rupts. Note that this interrupt would occur upon the Chapter 7. Interrupts and Exceptions 1055 Version 2.06 7.8.2 Interrupt Order priority existing interrupt in items 2-5, without executing any instructions at the base class interrupt handler. The following is a prioritized listing of the various This is because the base interrupt classes do not auto- enabled interrupts for which exceptions might exist matically disable the MSR mask bits for the interrupts simultaneously: listed in 2-5. In all other cases, a particular interrupt class from the above list will automatically disable any 1. Synchronous (Non-Debug) Interrupts: subsequent interrupts of the same class, as well as all Data Storage other interrupt classes that are listed below it in the pri- Instruction Storage ority order. 
Alignment Program Embedded Hypervisor Privilege Floating-Point Unit Unavailable Auxiliary Processor Unavailable Embedded Floating-Point Unavailable [SP.Category: SP.Embedded Float_*] SPE/Embedded Floating-Point/Vector Unavailable Embedded Floating-Point Data [Category: SP.Embedded Float_*] Embedded Floating-Point Round [Category: SP.Embedded Float_*] System Call Embedded Hypervisor System Call Data TLB Error Instruction TLB Error LRAT Error Only one of the above types of synchronous inter- rupts may have an existing exception generating it at any given time. This is guaranteed by the excep- tion priority mechanism (see Section 7.9 on page 1059) and the requirements of the Sequen- tial Execution Model. 2. Machine Check 3. Guest Processor Doorbell Machine Check 4. Debug 5. Critical Input 6. Watchdog Timer 7. Processor Doorbell Critical Category: Embedded.Processor Control 8. Guest Processor Doorbell Critical Cate- gory: Embedded.Processor Control 9. External Input Category: Embedded.Processor Control 10. Fixed-Interval Timer Category: Embedded.Proces- sor Control 11. Decrementer Category: Embedded.Processor Control 12. Processor Doorbell Category: Embed- ded.Processor Control 13. Guest Processor Doorbell 14. Embedded Performance Monitor Even though, as indicated above, the base, synchro- nous exception types listed under item 1 are generated with higher priority than the non-base interrupt classes listed in items 2-6, the fact is that these base class interrupts will immediately be followed by the highest 1056 Power ISATM Book III-E Version 2.06 7.9 Exception Priorities Programming Note Some exception types may even be mutually exclu- All synchronous (precise and imprecise) interrupts are sive of each other and could otherwise be consid- reported in program order, as required by the Sequen- ered the same priority. In these cases, the tial Execution Model. 
The one exception to this rule is exceptions are listed in the order suggested by the the case of multiple synchronous imprecise interrupts. sequential execution model. Upon a synchronizing event, all previously executed instructions are required to report any synchronous imprecise interrupt-generating exceptions, and the interrupt will then be generated with all of those excep- 7.9.1 Exception Priorities for tion types reported cumulatively, in both the ESR, and Defined Instructions any status registers associated with the particular exception type (e.g. the Floating-Point Status and Con- trol Register). 7.9.1.1 Exception Priorities for Defined Floating-Point Load and Store Instruc- For any single instruction attempting to cause multiple exceptions for which the corresponding synchronous tions interrupt types are enabled, this section defines the pri- The following prioritized list of exceptions may occur as ority order by which the instruction will be permitted to a result of the attempted execution of any defined cause a single enabled exception, thus generating a Floating-Point Load and Store instruction. particular synchronous interrupt. Note that it is this exception priority mechanism, along with the require- 1. Debug (Instruction Address Compare) ment that synchronous interrupts be generated in pro- 2. Instruction TLB Error gram order, that guarantees that at any given time, 3. Instruction Storage Interrupt (all types) there exists for consideration only one of the synchro- 4. LRAT Error for instruction fetch [Categories: E.PT nous interrupt types listed in item 1 of Section 7.8.2 on and E.HV.LRAT] page 1058. The exception priority mechanism also pre- 5. Program (Illegal Instruction) vents certain debug exceptions from existing in combi- 6. Floating-Point Unavailable nation with certain other synchronous interrupt- 7. Program (Unimplemented Operation) generating exceptions. 8. Data TLB Error 9. 
Data Storage (all types) Because unaligned Load and Store instructions, or 10. Alignment Load Multiple, Store Multiple, Load String, and Store 11. LRAT Error for data access [Categories: E.PT and Sting instructions may be broken up into multiple, E.HV.LRAT] smaller accesses, and these accesses may be per- 12. Debug (Data Address Compare, Data Value Com- formed in any order. The exception priority mechanism pare) applies to each of the multiple storage accesses in the 13. Debug (Instruction Complete) order they are performed by the implementation. If the instruction is causing both a Debug (Instruction This section does not define the permitted setting of Address Compare) and a Debug (Data Address Com- multiple exceptions for which the corresponding inter- pare) or Debug (Data Value Compare), and is not caus- rupt types are disabled. The generation of exceptions ing any of the exceptions listed in items 2-11, it is for which the corresponding interrupt types are dis- permissible for both exceptions to be generated and abled will have no effect on the generation of other recorded in the DBSR. A single Debug interrupt will exceptions for which the corresponding interrupt types result. are enabled. Conversely, if a particular exception for which the corresponding interrupt type is enabled is shown in the following sections to be of a higher priority 7.9.1.2 Exception Priorities for Other than another exception, it will prevent the setting of that Defined Load and Store Instructions and other exception, independent of whether that other Defined Cache Management Instructions exception's corresponding interrupt type is enabled or The following prioritized list of exceptions may occur as disabled. a result of the attempted execution of any other defined Except as specifically noted, only one of the exception Load or Store instruction, or defined Cache Manage- types listed for a given instruction type will be permitted ment instruction. to be generated at any given time. 
The priority of the 1. Debug (Instruction Address Compare) exception types are listed in the following sections 2. Instruction TLB Error ranging from highest to lowest, within each instruction 3. Instruction Storage Interrupt (all types) type. 4. LRAT Error for instruction fetch [Categories: E.PT and E.HV.LRAT] 5. Program (Illegal Instruction) Chapter 7. Interrupts and Exceptions 1057 Version 2.06 6. Program (Privileged Instruction) (dcbi only) Instruction) exceptions, it is possible that they are 7. Program (Unimplemented Operation) simultaneously enabling (via mask bits) multiple exist- 8. Data TLB Error ing exceptions (and at the same time possibly causing 9. Data Storage (all types) a Debug (Instruction Complete) exception). When this 10. Alignment occurs, the interrupts will be handled in the order 11. LRAT Error for data access [Categories: E.PT and defined by Section 7.8.2 on page 1058. E.HV.LRAT] 12. Debug (Data Address Compare, Data Value Com- pare) 7.9.1.5 Exception Priorities for Defined 13. Debug (Instruction Complete) Trap Instructions If the instruction is causing both a Debug (Instruction The following prioritized list of exceptions may occur as Address Compare) and a Debug (Data Address Com- a result of the attempted execution of a defined Trap pare) or Debug (Data Value Compare), and is not caus- instruction. ing any of the exceptions listed in items 2-11, it is 1. Debug (Instruction Address Compare) permissible for both exceptions to be generated and 2. Instruction TLB Error recorded in the DBSR. A single Debug interrupt will 3. Instruction Storage Interrupt (all types) result. 4. LRAT Error [Categories: E.PT and E.HV.LRAT] 5. Program (Illegal Instruction) 7.9.1.3 Exception Priorities for Other 6. Program (Unimplemented Operation) 7. Debug (Trap) Defined Floating-Point Instructions 8. Program (Trap) The following prioritized list of exceptions may occur as 9. 
Debug (Instruction Complete) a result of the attempted execution of any defined float- If the instruction is causing both a Debug (Instruction ing-point instruction other than a load or store. Address Compare) and a Debug (Trap), and is not 1. Debug (Instruction Address Compare) causing any of the exceptions listed in items 2-6, it is 2. Instruction TLB Error permissible for both exceptions to be generated and 3. Instruction Storage Interrupt (all types) recorded in the DBSR. A single Debug interrupt will 4. LRAT Error [Categories: E.PT and E.HV.LRAT] result. 5. Program (Illegal Instruction) 6. Floating-Point Unavailable 7. Program (Unimplemented Operation) 7.9.1.6 Exception Priorities for Defined 8. Program (Floating-point Enabled) System Call Instruction 9. Debug (Instruction Complete) The following prioritized list of exceptions may occur as a result of the attempted execution of a defined System 7.9.1.4 Exception Priorities for Defined Call instruction. Privileged Instructions 1. Debug (Instruction Address Compare) 2. Instruction TLB Error The following prioritized list of exceptions may occur as 3. Instruction Storage Interrupt (all types) a result of the attempted execution of any defined privi- 4. LRAT Error [Categories: E.PT and E.HV.LRAT] leged instruction, except dcbi, rfi, and rfci instructions. 5. Program (Illegal Instruction) 1. Debug (Instruction Address Compare) 6. Program (Unimplemented Operation) 2. Instruction TLB Error 7. System Call 3. Instruction Storage Interrupt (all types) 8. Embedded Hypervisor System Call [Category: 4. LRAT Error [Categories: E.PT and E.HV.LRAT] E.HV] (for hardware tablewalk) 9. Debug (Instruction Complete) 5. Program (Illegal Instruction, except for tlbwe with invalid MAS settings, see 9) 6. Program (Privileged Instruction) 7.9.1.7 Exception Priorities for Defined 7. Program (Unimplemented Operation) Branch Instructions 8. 
Embedded Hypervisor Privilege [Category: E.HV] TThe following prioritized list of exceptions may occur 9. Program (Illegal Instruction, special case for tlbwe as a result of the attempted execution of any defined with invalid MAS settings) branch instruction. 10. LRAT Error [Category: E.HV.LRAT] (for tlbwe) 11. Debug (Instruction Complete) 1. Debug (Instruction Address Compare) 2. Instruction TLB Error For mtmsr, mtspr (DBCR0, DBCR1, DBCR2), mtspr 3. Instruction Storage Interrupt (all types) (TCR), and mtspr (TSR), if they are not causing Debug 4. LRAT Error [Categories: E.PT and E.HV.LRAT] (Instruction Address Compare) nor Program (Privileged 5. Program (Illegal Instruction) 1058 Power ISATM Book III-E Version 2.06 6. Program (Unimplemented Operation) 7.9.2 Exception Priorities for 7. Debug (Branch Taken) 8. Debug (Instruction Complete) Reserved Instructions If the instruction is causing both a Debug (Instruction The following prioritized list of exceptions may occur as Address Compare) and a Debug (Branch Taken), and a result of the attempted execution of any reserved is not causing any of the exceptions listed in items 2-6, instruction. it is permissible for both exceptions to be generated 1. Debug (Instruction Address Compare) and recorded in the DBSR. A single Debug interrupt will 2. Instruction TLB Error result. 3. Instruction Storage Interrupt (all types) 4. Program (Illegal Instruction) 7.9.1.8 Exception Priorities for Defined Return From Interrupt Instructions The following prioritized list of exceptions may occur as a result of the attempted execution of an rfi, rfci, rfmci, rfdi [Category:Embedded.Enhanced Debug], rfgi [Cat- egory: Embedded.Hypervisor] instruction. 1. Debug (Instruction Address Compare) 2. Instruction TLB Error 3. Instruction Storage Interrupt (all types) 4. LRAT Error [Categories: E.PT and E.HV.LRAT] 5. Program (Illegal Instruction) 6. Program (Privileged Instruction) 7. Program (Unimplemented Operation) 8. 
Debug (Return From Interrupt)
9. Debug (Instruction Complete)

If the rfi, rfci, rfmci, rfdi [Category: Embedded.Enhanced Debug], or rfgi [Category: Embedded.Hypervisor] instruction is causing both a Debug (Instruction Address Compare) and a Debug (Return From Interrupt), and is not causing any of the exceptions listed in items 2-6, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.

7.9.1.9 Exception Priorities for Other Defined Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of all other instructions not listed above.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. LRAT Error instruction fetch [Categories: E.PT and E.HV.LRAT]
5. Program (Illegal Instruction)
6. Program (Unimplemented Operation)
7. Embedded Hypervisor Privilege
8. LRAT Error for data access for tlbwe [Category: E.HV.LRAT]
9. Debug (Instruction Complete)

Chapter 8. Reset and Initialization

8.1 Background

This chapter describes the requirements for thread reset. This includes both the means of causing reset, and the specific initialization that is required to be performed automatically by the hardware. This chapter also provides an overview of the operations that should be performed by initialization software.

In general, the specific actions taken by a thread upon reset are implementation-dependent. Also, it is the responsibility of system initialization software to initialize the majority of thread and system resources after reset. Implementations are required to provide a minimum thread initialization such that this system software may be fetched and executed, thereby accomplishing the rest of system initialization.

8.2 Reset Mechanisms

This specification defines two mechanisms for internally invoking a thread reset operation using either the Watchdog Timer (see Section 9.7 on page 1073) or the Debug facilities using DBCR0_RST (see Section 10.5.1.1 on page 1085). In addition, implementations will typically provide additional means for invoking a reset operation, using an external mechanism such as a signal pin, which when activated, causes the thread to be reset.

8.3 Thread State after Reset

The initial thread state is controlled by the register contents after reset. In general, the contents of most registers are undefined after reset.

The hardware is only guaranteed to initialize those registers (or specific bits in registers) which must be initialized in order for software to be able to reliably perform the rest of system initialization.

The Thread Enable Register, Machine State Register, Processor Version Register, and a TLB entry are updated as follows.

Thread Enable Register [Category: Embedded Multi-Threading]

The TEN is set to the value 0x0000_0000_0000_0001, indicating that only thread 0 is enabled.

Machine State Register

The state of the MSR for all threads is as shown in Figure 61.

Bit    Setting  Comments
CM     0        Computation Mode (set to 32-bit mode)
GS     0        Hypervisor state
UCLE   0        User Cache Locking Enable
SPV    0        SPE/Embedded Floating-Point/Vector Unavailable
CE     0        Critical Input interrupts disabled
DE     0        Debug interrupts disabled
EE     0        External Input interrupts disabled
PR     0        Supervisor mode
FP     0        FP unavailable
ME     0        Machine Check interrupts disabled
FE0    0        FP exception type Program interrupts disabled
FE1    0        FP exception type Program interrupts disabled
IS     0        Instruction Address Space 0
DS     0        Data Address Space 0
PMM    0        Performance Monitor Mark

Figure 61. Machine State Register Initial Values

Logical Partition Identification Register [Category: Embedded.Hypervisor]

The Logical Partition Identification Register (LPIDR) is set to 0.

Processor Version Register

Implementation-dependent. (This register is read-only, and contains a value which identifies the specific implementation.)

TLB entry

A TLB entry (which entry is implementation-dependent) is initialized in an implementation-dependent manner that maps the last page in the implemented effective storage address space, with the following field settings:

Field  Setting    Comments
V      1          valid
EPN    see below  Represents the last page in effective address space
RPN    see below  Represents the last page in physical address space
TS     0          translation address space 0
IND    0          direct entry
TLPID  0          translation logical partition ID
TGS    0          translation hypervisor state
SIZE   ?          smallest page size supported
W      ?          implementation-dependent value
I      ?          implementation-dependent value
M      ?          implementation-dependent value
G      ?          implementation-dependent value
E      ?          implementation-dependent value
U0     ?          implementation-dependent value
U1     ?          implementation-dependent value
U2     ?          implementation-dependent value
U3     ?          implementation-dependent value
TID    ?          implementation-dependent value, but page must be accessible
UX     ?          implementation-dependent value
UR     ?          implementation-dependent value
UW     ?          implementation-dependent value
SX     1          page is execute accessible in supervisor mode
SR     1          page is read accessible in supervisor mode
SW     1          page is write accessible in supervisor mode
VLE    ?          implementation-dependent value
ACM    ?          implementation-dependent value
IPROT  ?          implementation-dependent value
VF     0          no virtualization fault

Figure 62. TLB Initial Values

The initial settings of EPN and RPN are dependent upon the number of bits implemented in the EPN and RPN fields and the minimum page size supported by the implementation.
For example, an implementation that supports 64KB pages as the smallest size and 32 bits of effective address would implement a 16-bit EPN and set the initial value of the EPN field of the TLB boot entry to 2^16 - 1 (0xFFFF), while an implementation that supports 4KB pages as the smallest size and 32 bits of effective address would implement a 20-bit EPN and set the initial value of the EPN field of the boot entry to 2^20 - 1 (0xFFFFF).

Instruction execution begins at the last word address of the page mapped by the boot TLB entry. Note that this address is different from the System Reset interrupt vector specified in Book III-S.

An implementation may provide additional methods for initializing the TLB entry used for initial boot by providing an implementation-dependent RPN, or initializing other TLB entries.

If Category: Embedded Multi-threading.Thread Management is not supported, instruction execution for other threads begins at the last word address of the effective address space; otherwise execution begins at the address specified by the NIA register corresponding to the thread.

8.4 Software Initialization Requirements

When reset occurs, the thread is initialized to a minimum configuration to start executing initialization code. Initialization code is necessary to complete the thread and system configuration. The initialization code described in this section is the minimum recommended for configuring the thread to run application code.

Initialization code should configure the following resources:

- Invalidate the instruction cache and data cache (implementation-dependent).
- Initialize system memory as required by the operating system or application code.
- Initialize the Interrupt Vector Prefix Register and Interrupt Vector Offset Register.
- Initialize other registers as needed by the system.
- Initialize off-chip system facilities.
- Dispatch the operating system or application code.
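The EPN arithmetic from the boot TLB example in Section 8.3 can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the architecture: the function names `boot_epn` and `reset_fetch_address` are hypothetical, and it assumes a 4-byte word and a power-of-two page size.

```python
WORD_BYTES = 4  # assumption: one word (one instruction) is 4 bytes


def boot_epn(ea_bits: int, min_page_bytes: int) -> int:
    """Return the all-ones EPN value that selects the last effective page.

    ea_bits        -- number of implemented effective address bits
    min_page_bytes -- smallest supported page size, a power of two
    """
    if min_page_bytes <= 0 or min_page_bytes & (min_page_bytes - 1):
        raise ValueError("page size must be a power of two")
    offset_bits = min_page_bytes.bit_length() - 1  # log2(page size)
    epn_bits = ea_bits - offset_bits               # implemented EPN width
    if epn_bits <= 0:
        raise ValueError("page size exceeds the effective address space")
    return (1 << epn_bits) - 1


def reset_fetch_address(ea_bits: int, min_page_bytes: int) -> int:
    """Return the last word address of the page mapped by the boot entry."""
    last_page_base = boot_epn(ea_bits, min_page_bytes) * min_page_bytes
    return last_page_base + min_page_bytes - WORD_BYTES


if __name__ == "__main__":
    for page in (64 * 1024, 4 * 1024):
        print(hex(boot_epn(32, page)), hex(reset_fetch_address(32, page)))
```

For both page sizes in the example, the first instruction fetch address works out to the last word of a 32-bit effective address space; only the width of the EPN field differs.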
Chapter 9. Timer Facilities

9.1 Overview

The Time Base, Decrementer, Fixed-Interval Timer, and Watchdog Timer provide timing functions for the system. The remainder of this section describes these registers and related facilities.

9.2 Time Base (TB)

The Time Base (TB) is a 64-bit register (see Figure 63) containing a 64-bit unsigned integer that is incremented periodically. Each increment adds 1 to the low-order bit (bit 63). The frequency at which the integer is updated is implementation-dependent.

Bits   0:31   32:63
Field  TBU    TBL

Field  Description
TBU    Upper 32 bits of Time Base
TBL    Lower 32 bits of Time Base

Figure 63. Time Base

The Time Base bits 0:59 increment until their value becomes 0xFFF_FFFF_FFFF_FFFF (2^60 - 1); at the next increment their value becomes 0x000_0000_0000_0000. There is no interrupt or other indication when this occurs.

Time Base bits 60:63 may increment at a variable rate. When the value of bit 59 changes, bits 60:63 are set to zero; if bits 60:63 increment to 0xF before the value of bit 59 changes, they remain at 0xF until the value of bit 59 changes.

The period of the Time Base depends on the driving frequency. As an order of magnitude example, suppose that the CPU clock is 1 GHz and that the Time Base is driven by this frequency divided by 32. Then the period of the Time Base would be

$$T_{TB} = \frac{2^{64} \times 32}{1\ \text{GHz}} = 5.90 \times 10^{11}\ \text{seconds}$$

which is approximately 18,700 years.

The Time Base is implemented such that:

1. Loading a GPR from the Time Base has no effect on the accuracy of the Time Base.
2. Copying the contents of a GPR to the Time Base replaces the contents of the Time Base with the contents of the GPR.

The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock in a Power ISA system. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following.

- The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base bits 0:59 changes, and a means to determine what the current update frequency is.
- The update frequency of the Time Base bits 0:59 is under the control of the system software.

Implementations must provide a means for either preventing the Time Base from incrementing or preventing it from being read in user mode (MSR_PR=1). If the means is under software control, it must be privileged. There must be a method for getting all Time Bases in the system to start incrementing with values that are identical or almost identical.

Programming Note

If software initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries.
Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 2^64 - 1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values.

Successive readings of the Time Base may return identical values.

See the description of the Time Base in Book II for ways to compute time of day in POSIX format from the Time Base.

9.2.1 Writing the Time Base

Writing the Time Base is hypervisor privileged. Reading the Time Base is not privileged; it is discussed in Book II.

It is not possible to write the entire 64-bit Time Base using a single instruction. The mttbl and mttbu extended mnemonics write the lower and upper halves of the Time Base (TBL and TBU), respectively, preserving the other half. These are extended mnemonics for the mtspr instruction; see Appendix B, "Assembler Extended Mnemonics" on page 1109.

The Time Base can be written by a sequence such as:

    lwz    Rx,upper    # load 64-bit value for
    lwz    Ry,lower    #   TB into Rx and Ry
    li     Rz,0
    mttbl  Rz          # set TBL to 0
    mttbu  Rx          # set TBU
    mttbl  Ry          # set TBL

Provided that no interrupts occur while the last three instructions are being executed, loading 0 into TBL prevents the possibility of a carry from TBL to TBU while the Time Base is being initialized.

Programming Note

The instructions for writing the Time Base are mode-independent. Thus code written to set the Time Base will work correctly in either 64-bit or 32-bit mode.

9.3 Decrementer

The Decrementer (DEC) is a 32-bit decrementing counter that provides a mechanism for causing a Decrementer interrupt after a programmable delay. The contents of the Decrementer are treated as a signed integer.

Bits   32:63
Field  DEC

Figure 64. Decrementer

Decrementer bits 32:59 count down until their value becomes 0x000_0000; at the next decrement their value becomes 0xFFF_FFFF. Decrementer bits 60:63 may decrement at a variable rate. When the value of bit 59 changes, bits 60:63 are set to 0xF; if bits 60:63 decrement to 0x0 before the value of bit 59 changes, they remain at 0x0 until the value of bit 59 changes.

The Decrementer is driven by the same frequency as the Time Base. The period of the Decrementer will depend on the driving frequency, but if the same values are used as given above for the Time Base (see Section 9.2), and if the Time Base update frequency is constant, the period would be

$$T_{DEC} = \frac{2^{32} \times 32}{1\ \text{GHz}} = 137\ \text{seconds}.$$

The Decrementer counts down.

The operation of the Decrementer satisfies the following constraints.

1. The operation of the Time Base and the Decrementer is coherent, i.e., the counters are driven by the same fundamental time base.
2. Loading a GPR from the Decrementer has no effect on the accuracy of the Time Base.
3. Copying the contents of a GPR to the Decrementer replaces the contents of the Decrementer with the contents of the GPR.

Programming Note

In systems that change the Time Base update frequency for purposes such as power management, the Decrementer input frequency will also change. Software must be aware of this in order to set interval timers.

If Decrementer bits 60:63 are used as part of a random number generator, software must account for the fact that these bits are set to 0xF only when bit 59 changes state, regardless of whether or not they decremented to 0x0 since they were previously set to 0xF.

9.3.1 Writing and Reading the Decrementer

The contents of the Decrementer can be read or written using the mfspr and mtspr instructions, both of which are hypervisor privileged. Using an extended mnemonic (see Appendix B, "Assembler Extended Mnemonics" on page 1109), the Decrementer can be written from GPR Rx using:

    mtdec  Rx

The Decrementer can be read into GPR Rx using:

    mfdec  Rx

Copying the Decrementer to a GPR has no effect on the Decrementer contents or on the interrupt mechanism.

9.3.2 Decrementer Events

A Decrementer event occurs when a decrement occurs on a Decrementer value of 0x0000_0001.

Upon the occurrence of a Decrementer event, the Decrementer may be reloaded from a 32-bit Decrementer Auto-Reload Register (DECAR). See Section 9.4. Upon the occurrence of a Decrementer event, the Decrementer has the following basic modes of operation.

Decrement to one and stop on zero

If TCR_ARE=0, TSR_DIS is set to 1, the value 0x0000_0000 is then placed into the DEC, and the Decrementer stops decrementing.

A Decrementer interrupt occurs when no higher priority interrupt exists, a Decrementer exception exists, and the exception is enabled. If category Embedded.Hypervisor is supported, the interrupt is enabled by TCR_DIE=1 and (MSR_EE=1 or MSR_GS=1). Otherwise, the interrupt is enabled by TCR_DIE=1 and MSR_EE=1. See Section 7.6.12, "Decrementer Interrupt" on page 1043 for details of register behavior caused by the Decrementer interrupt.

Decrement to one and auto-reload

If TCR_ARE=1, TSR_DIS is set to 1, the contents of the Decrementer Auto-Reload Register are then placed into the DEC, and the Decrementer continues decrementing from the reloaded value.

A Decrementer interrupt occurs when no higher priority interrupt exists, a Decrementer exception exists, and the exception is enabled. If category Embedded.Hypervisor is supported, the interrupt is enabled by TCR_DIE=1 and (MSR_EE=1 or MSR_GS=1). Otherwise, the interrupt is enabled by TCR_DIE=1 and MSR_EE=1. See Section 7.6.12, "Decrementer Interrupt" on page 1043 for details of register behavior caused by the Decrementer interrupt.

Forcing the Decrementer to 0 using the mtspr instruction will not cause a Decrementer exception; however, decrementing which was in progress at the instant of the mtspr may cause the exception. To eliminate the Decrementer as a source of exceptions, set TCR_DIE to 0 (clear the Decrementer Interrupt Enable bit).

If it is desired to eliminate all Decrementer activity, the procedure is as follows:

1. Write 0 to TCR_DIE. This will prevent Decrementer activity from causing exceptions.
2. Write 0 to TCR_ARE to disable the Decrementer auto-reload.
3. Write 0 to the Decrementer. This will halt Decrementer decrementing. While this action will not cause a Decrementer exception to be set in TSR_DIS, a near simultaneous decrement may have done so.
4. Write 1 to TSR_DIS. This action will clear TSR_DIS to 0 (see Section 9.5.1 on page 1072). This will clear any Decrementer exception which may be pending. Because the Decrementer is frozen at zero, no further Decrementer events are possible.

If the auto-reload feature is disabled (TCR_ARE=0), then once the Decrementer decrements to zero, it will stay there until software reloads it using the mtspr instruction.

On reset, TCR_ARE is set to 0. This disables the auto-reload feature.

9.4 Decrementer Auto-Reload Register

The Decrementer Auto-Reload Register is a 32-bit register as shown below.

Bits   32:63
Field  DECAR

Figure 65. Decrementer

Bits of the Decrementer Auto-Reload Register are numbered 32 (most-significant bit) to 63 (least-significant bit). The Decrementer Auto-Reload Register is provided to support the auto-reload feature of the Decrementer. See Section 9.3.2.

The contents of the Decrementer Auto-Reload Register cannot be read. The contents of bits 32:63 of register RS can be written to the Decrementer Auto-Reload Register using the mtspr instruction.

This register is hypervisor privileged.

9.5 Timer Control Register

The Timer Control Register (TCR) is a 32-bit register. Timer Control Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The Timer Control Register controls Decrementer (see Section 9.3), Fixed-Interval Timer (see Section 9.6), and Watchdog Timer (see Section 9.7) options.

The relationship of the Timer facilities to the TCR and TB is shown in the figure below.

This register is hypervisor privileged.

[Figure 66 is a block diagram. The Timer Clock drives the Time Base (incrementer; TBU and TBL). Watchdog Timer events are based on one of 4 Time Base bits selected by TCR_WP (the 4 Time Base bits that can be selected by TCR_WP are implementation-dependent). Fixed-Interval Timer events are based on one of 4 Time Base bits selected by TCR_FP (the 4 Time Base bits that can be selected by TCR_FP are implementation-dependent). The Decrementer (DEC) produces a Decrementer event through a 0/1 detect and is auto-reloaded from DECAR.]

Figure 66. Relationships of the Timer Facilities

The contents of the Timer Control Register can be read using the mfspr instruction. The contents of bits 32:63 of register RS can be written to the Timer Control Register using the mtspr instruction.

The contents of the TCR are defined below:

Bit(s)  Description

32:33   Watchdog Timer Period (WP) (see Section 9.7 on page 1073)
        Specifies one of 4 bit locations of the Time Base used to signal a Watchdog Timer exception on a transition from 0 to 1. The 4 Time Base bits that can be specified to serve as the Watchdog Timer period are implementation-dependent.

34:35   Watchdog Timer Reset Control (WRC) (see Section 9.7 on page 1073)
        00     No Watchdog Timer reset will occur. TCR_WRC resets to 0b00. This field may be set by software, but cannot be cleared by software (except by a software-induced reset).
        01-11  Force thread to be reset on second time-out of Watchdog Timer. The exact function of any of these settings is implementation-dependent.
        The Watchdog Timer Reset Control field is cleared to zero by reset. These bits are set only by software. Once a 1 has been written to one of these bits, that bit remains a 1 until a reset occurs. This is to prevent errant code from disabling the Watchdog reset function.

36      Watchdog Timer Interrupt Enable (WIE) (see Section 9.7 on page 1073)
        0  Disable Watchdog Timer interrupt
        1  Enable Watchdog Timer interrupt

37      Decrementer Interrupt Enable (DIE) (see Section 9.3 on page 1069)
        0  Disable Decrementer interrupt
        1  Enable Decrementer interrupt

38:39   Fixed-Interval Timer Period (FP) (see Section 9.6 on page 1073)
        Specifies one of 4 bit locations of the Time Base used to signal a Fixed-Interval Timer exception on a transition from 0 to 1. The 4 Time Base bits that can be specified to serve as the Fixed-Interval Timer period are implementation-dependent.

40      Fixed-Interval Timer Interrupt Enable (FIE) (see Section 9.6 on page 1073)
        0  Disable Fixed-Interval Timer interrupt
        1  Enable Fixed-Interval Timer interrupt

41      Auto-Reload Enable (ARE)
        0  Disable auto-reload of the Decrementer
           Decrementer exception is presented (i.e. TSR_DIS is set to 1) when the Decrementer is decremented from a value of 0x0000_0001. The next value placed in the Decrementer is the value 0x0000_0000. If category Embedded.Hypervisor is supported, when (MSR_EE=1 or MSR_GS=1), TCR_DIE=1, and TSR_DIS=1, a Decrementer interrupt is taken. If category Embedded.Hypervisor is not supported, when MSR_EE=1, TCR_DIE=1, and TSR_DIS=1, a Decrementer interrupt is taken. Software must reset TSR_DIS.
        1  Enable auto-reload of the Decrementer
           Decrementer exception is presented (i.e. TSR_DIS is set to 1) when the Decrementer is decremented from a value of 0x0000_0001. The contents of the Decrementer Auto-Reload Register are placed in the Decrementer. If category Embedded.Hypervisor is supported, when (MSR_EE=1 or MSR_GS=1), TCR_DIE=1, and TSR_DIS=1, a Decrementer interrupt is taken. If category Embedded.Hypervisor is not supported, when MSR_EE=1, TCR_DIE=1, and TSR_DIS=1, a Decrementer interrupt is taken. Software must reset TSR_DIS.

42      Implementation-dependent

43:63   Reserved

9.5.1 Timer Status Register

The Timer Status Register (TSR) is a 32-bit register. Timer Status Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The Timer Status Register contains status on timer events and the most recent Watchdog Timer-initiated thread reset.

The Timer Status Register is set via hardware, and read and cleared via software. The contents of the Timer Status Register can be read using the mfspr instruction. Bits in the Timer Status Register can be cleared using the mtspr instruction. Clearing is done by writing bits 32:63 of a General Purpose Register to the Timer Status Register with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the Timer Status Register is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

The contents of the TSR are defined below:

Bit(s)  Description

32      Enable Next Watchdog Timer (ENW) (see Section 9.7 on page 1073)
        0  Action on next Watchdog Timer time-out is to set TSR_ENW
        1  Action on next Watchdog Timer time-out is governed by TSR_WIS

33      Watchdog Timer Interrupt Status (WIS) (see Section 9.7 on page 1073)
        0  A Watchdog Timer event has not occurred.
        1  A Watchdog Timer event has occurred. If category Embedded.Hypervisor is supported, when (MSR_CE=1 or MSR_GS=1) and TCR_WIE=1, a Watchdog Timer interrupt is taken. If category Embedded.Hypervisor is not supported, when MSR_CE=1 and TCR_WIE=1, a Watchdog Timer interrupt is taken.

34:35   Watchdog Timer Reset Status (WRS) (see Section 9.7 on page 1073)
        These two bits are set to one of three values when a reset is caused by the Watchdog Timer. These bits are undefined at power-up.
        00  No Watchdog Timer reset has occurred.
        01  Implementation-dependent reset information.
        10  Implementation-dependent reset information.
        11  Implementation-dependent reset information.

36      Decrementer Interrupt Status (DIS) (see Section 9.3.2 on page 1069)
        0  A Decrementer event has not occurred.
        1  A Decrementer event has occurred. When MSR_EE=1 and TCR_DIE=1, a Decrementer interrupt is taken.

37      Fixed-Interval Timer Interrupt Status (FIS) (see Section 9.6 on page 1073)
        0  A Fixed-Interval Timer event has not occurred.
        1  A Fixed-Interval Timer event has occurred. If category Embedded.Hypervisor is supported, when (MSR_EE=1 or MSR_GS=1) and TCR_FIE=1, a Fixed-Interval Timer interrupt is taken. If category Embedded.Hypervisor is not supported, when MSR_EE=1 and TCR_FIE=1, a Fixed-Interval Timer interrupt is taken.

38:63   Reserved

This register is hypervisor privileged.

9.6 Fixed-Interval Timer

The Fixed-Interval Timer (FIT) is a mechanism for providing timer interrupts with a repeatable period, to facilitate system maintenance. It is similar in function to an auto-reload Decrementer, except that there are fewer selections of interrupt period available. The Fixed-Interval Timer exception occurs on 0 to 1 transitions of a selected bit from the Time Base (see Section 9.5).

The Fixed-Interval Timer exception is logged by TSR_FIS. A Fixed-Interval Timer interrupt occurs when no higher priority interrupt exists (see Section 7.9 on page 1059), a Fixed-Interval Timer exception exists (TSR_FIS=1), and the exception is enabled. If category Embedded.Hypervisor is supported, the interrupt is enabled by TCR_FIE=1 and (MSR_EE=1 or MSR_GS=1). Otherwise, the interrupt is enabled by TCR_FIE=1 and MSR_EE=1. See Section 7.6.13 on page 1043 for details of register behavior caused by the Fixed-Interval Timer interrupt.

Note that a Fixed-Interval Timer exception will also occur if the selected Time Base bit transitions from 0 to 1 due to an mtspr instruction that writes a 1 to the bit when its previous value was 0.

9.7 Watchdog Timer

The Watchdog Timer is a facility intended to aid system recovery from faulty software or hardware. Watchdog time-outs occur on 0 to 1 transitions of selected bits from the Time Base (Section 9.5).

When a Watchdog Timer time-out occurs while Watchdog Timer Interrupt Status is clear (TSR_WIS=0) and the next Watchdog time-out is enabled (TSR_ENW=1), a Watchdog Timer exception is generated and logged by setting TSR_WIS to 1. This is referred to as a Watchdog Timer First Time Out. A Watchdog Timer interrupt occurs when no higher priority interrupt exists (see Section 7.9 on page 1059), a Watchdog Timer exception exists (TSR_WIS=1), and the exception is enabled. If category Embedded.Hypervisor is supported, the interrupt is enabled by TCR_WIE=1 and (MSR_CE=1 or MSR_GS=1). Otherwise, the interrupt is enabled by TCR_WIE=1 and MSR_CE=1. See Section 7.6.14 on page 1044 for details of register behavior caused by the Watchdog Timer interrupt. The purpose of the Watchdog Timer First Time Out is to give an indication that there may be a problem and give the system a chance to perform corrective action or capture a failure before a reset occurs from the Watchdog Timer Second Time Out, as explained further below.

Note that a Watchdog Timer exception will also occur if the selected Time Base bit transitions from 0 to 1 due to an mtspr instruction that writes a 1 to the bit when its previous value was 0.

When a Watchdog Timer time-out occurs while TSR_WIS=1 and TSR_ENW=1, a thread reset occurs if it is enabled by a non-zero value of the Watchdog Reset Control field in the Timer Control Register (TCR_WRC). This is referred to as a Watchdog Timer Second Time Out. The assumption is that TSR_WIS was not cleared because the thread was unable to execute the Watchdog Timer interrupt handler, leaving reset as the only available means to restart the system. Note that once TCR_WRC has been set to a non-zero value, it cannot be reset by software; this feature prevents errant software from disabling the Watchdog Timer reset capability.

A more complete view of Watchdog Timer behavior is afforded by Figure 67 and Figure 68, which describe the Watchdog Timer state machine and Watchdog Timer controls. The numbers in parentheses in the figure refer to the discussion of modes of operation which follow the table.

[Figure 67 is a state diagram over the four values of TSR_ENW,WIS. Its time-out transitions are:
 0b00 to 0b10: Time-out. No exception recorded in TSR_WIS. Set TSR_ENW so next time-out will cause exception.
 0b10 to 0b11: Watchdog Time-out. WDT exception recorded in TSR_WIS. If category Embedded.Hypervisor is supported, WDT interrupt will occur if enabled by TCR_WIE=1 and (MSR_CE=1 or MSR_GS=1). If category Embedded.Hypervisor is not supported, WDT interrupt will occur if enabled by TCR_WIE=1 and MSR_CE=1.
 0b01 to 0b11: Time-out. Set TSR_ENW so next time-out will cause reset.
 0b11: Time-out. If TCR_WRC is not 0b00 then RESET, including TSR_WRS set from TCR_WRC and TCR_WRC set to 0b00.
 The software transitions are labeled (1) Watchdog Interrupt Handler, (2) SW Loop, (2) Watchdog Interrupt Handler, and (3) SW Loop.]

Figure 67. Watchdog State Machine

Enable Next WDT  WDT Status  Action when timer interval expires
(TSR_ENW)        (TSR_WIS)
0                0           Set Enable Next Watchdog Timer (TSR_ENW=1).
0                1           Set Enable Next Watchdog Timer (TSR_ENW=1).
1                0           Set Watchdog Timer interrupt status bit (TSR_WIS=1). If category Embedded.Hypervisor is supported and Watchdog Timer interrupt is enabled (TCR_WIE=1 and (MSR_CE=1 or MSR_GS=1)), then interrupt. If category Embedded.Hypervisor is not supported and Watchdog Timer interrupt is enabled (TCR_WIE=1 and MSR_CE=1), then interrupt.
1                1           Cause Watchdog Timer reset action specified by TCR_WRC. Reset will copy pre-reset TCR_WRC into TSR_WRS, then clear TCR_WRC.

Figure 68. Watchdog Timer Controls

The controls described in the above table imply three different modes of operation that a programmer might select for the Watchdog Timer. Each of these modes assumes that TCR_WRC has been set to allow thread reset by the Watchdog facility:

1. Always take the Watchdog Timer interrupt when pending, and never attempt to prevent its occurrence. In this mode, the Watchdog Timer interrupt caused by a first time-out is used to clear TSR_WIS so a second time-out never occurs. TSR_ENW is not cleared, thereby allowing the next time-out to cause another interrupt.

2. Always take the Watchdog Timer interrupt when pending, but avoid when possible. In this mode a recurring code loop of reliable duration (or perhaps a periodic interrupt handler such as the Fixed-Interval Timer interrupt handler) is used to repeatedly clear TSR_ENW such that a first time-out exception is avoided, and thus no Watchdog Timer interrupt occurs. Once TSR_ENW has been cleared, software has between one and two full Watchdog periods before a Watchdog exception will be posted in TSR_WIS. If this occurs before the software is able to clear TSR_ENW again, a Watchdog Timer interrupt will occur. In this case, the Watchdog Timer interrupt handler will then clear both TSR_ENW and TSR_WIS, in order to (hopefully) avoid the next Watchdog Timer interrupt.

3. Never take the Watchdog Timer interrupt. In this mode, Watchdog Timer interrupts are disabled (via TCR_WIE=0), and the system depends upon a recurring code loop of reliable duration (or perhaps a periodic interrupt handler such as the Fixed-Interval Timer interrupt handler) to repeatedly clear TSR_WIS such that a second time-out is avoided, and thus no reset occurs. TSR_ENW is not cleared, thereby allowing the next time-out to set TSR_WIS again. The recurring code loop must have a period which is less than one Watchdog Timer period in order to guarantee that a Watchdog Timer reset will not occur.
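The time-out actions of Figure 68 and the write-one-to-clear behavior of the Timer Status Register can be explored with a small software model. The Python sketch below is illustrative only: the class `WatchdogModel` and the helper `run` are hypothetical names, interrupt enabling (TCR_WIE, MSR_CE, MSR_GS) is not modeled, and the model does not attempt to represent thread state after a reset. The helper applies a fixed TSR clear mask after every time-out, which stands in for a servicing loop that always runs within one Watchdog period.

```python
from dataclasses import dataclass

# TSR masks; register bits are numbered 32 (most significant) to 63.
TSR_ENW = 1 << (63 - 32)  # bit 32, Enable Next Watchdog Timer
TSR_WIS = 1 << (63 - 33)  # bit 33, Watchdog Timer Interrupt Status


@dataclass
class WatchdogModel:
    enw: int = 0  # TSR_ENW
    wis: int = 0  # TSR_WIS
    wrs: int = 0  # TSR_WRS, records the cause of a Watchdog reset
    wrc: int = 0  # TCR_WRC, non-zero enables reset on second time-out

    def time_out(self) -> str:
        """Apply the Figure 68 action for one Watchdog time-out."""
        if not self.enw:
            self.enw = 1          # rows ENW=0: arm the next time-out
            return "set ENW"
        if not self.wis:
            self.wis = 1          # first time-out: exception logged
            return "set WIS"
        if self.wrc:
            # Second time-out: copy pre-reset WRC into WRS, clear WRC.
            self.wrs, self.wrc = self.wrc, 0
            return "reset"
        return "no reset"         # WRC=0b00: no reset will occur

    def write_tsr(self, mask: int) -> None:
        """Model mtspr to the TSR: a 1 clears the bit, a 0 has no effect."""
        if mask & TSR_ENW:
            self.enw = 0
        if mask & TSR_WIS:
            self.wis = 0


def run(clear_mask: int, periods: int, wrc: int = 0b01) -> list:
    """Run time-outs, applying clear_mask to the TSR after each one."""
    wd = WatchdogModel(wrc=wrc)
    log = []
    for _ in range(periods):
        action = wd.time_out()
        log.append(action)
        if action == "reset":
            break
        wd.write_tsr(clear_mask)
    return log


if __name__ == "__main__":
    print(run(0, 5))          # no servicing at all
    print(run(TSR_ENW, 4))    # mode 2 style loop clearing ENW
    print(run(TSR_WIS, 4))    # mode 3 style loop clearing WIS
```

With no servicing, the model reaches reset on the third time-out after a start from TSR_ENW,WIS=0b00. Clearing TSR_ENW every period keeps the model from ever logging an exception, and clearing TSR_WIS every period lets exceptions be logged but never escalates to reset.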
9.8 Freezing the Timer Facilities

The debug mechanism provides a means of temporarily freezing the timers upon a debug event. Specifically, the Time Base and Decrementer can be frozen and prevented from incrementing/decrementing, respectively, whenever a debug event is set in the Debug Status Register. Note that this also freezes the FIT and Watchdog Timer. This allows a debugger to simulate the appearance of `real time', even though the application has been temporarily `halted' to service the debug event. See the description of bit 63 of the Debug Control Register 0 (Freeze Timers on Debug Event or DBCR0_FT) in Section 10.5.1.1 on page 1085.

Chapter 10. Debug Facilities

10.1 Overview

Debug facilities are provided to enable hardware and software debug functions, such as instruction and data breakpoints and program single stepping. The debug facilities consist of a set of Debug Control Registers (DBCR0, DBCR1, and DBCR2) (see Section 10.5.1 on page 1085), a set of Address and Data Value Compare Registers (IAC1, IAC2, IAC3, IAC4, DAC1, DAC2, DVC1, and DVC2) (see Section 10.4.3, Section 10.4.4, and Section 10.4.5), a Debug Status Register (DBSR) (see Section 10.5.2) for enabling and recording various kinds of debug events, and a special Debug interrupt type built into the interrupt mechanism (see Section 7.6.17). The debug facilities also provide a mechanism for software-controlled thread reset, and for controlling the operation of the timers in a debug environment.

The mfspr and mtspr instructions (see Section 5.4.1) provide access to the registers of the debug facilities.

In addition to the facilities described here, implementations will typically include debug facilities, modes, and access mechanisms which are implementation-specific. For example, implementations will typically provide access to the debug facilities via a dedicated interface such as the IEEE 1149.1 Test Access Port (JTAG).

10.2 Internal Debug Mode

Debug events include such things as instruction and data breakpoints. These debug events cause status bits to be set in the Debug Status Register. The existence of a set bit in the Debug Status Register is considered a Debug exception. Debug exceptions, if enabled, will cause Debug interrupts.

There are two different mechanisms that control whether Debug interrupts are enabled. The first is the MSR_DE bit, and this bit must be set to 1 to enable Debug interrupts. The second mechanism is an enable bit in the Debug Control Register 0 (DBCR0). This bit is the Internal Debug Mode bit (DBCR0_IDM), and it must also be set to 1 to enable Debug interrupts.

When DBCR0_IDM=1, the thread is in Internal Debug Mode. In this mode, debug events will (if also enabled by MSR_DE) cause Debug interrupts. Software at the Debug interrupt vector location will thus be given control upon the occurrence of a debug event, and can access (via the normal instructions) all architected resources. In this fashion, debug monitor software can control the thread and gather status, and interact with debugging hardware.

When the thread is not in Internal Debug Mode (DBCR0_IDM=0), debug events may still occur and be recorded in the Debug Status Register. These exceptions may be monitored via software by reading the Debug Status Register (using mfspr), or may eventually cause a Debug interrupt if later enabled by setting DBCR0_IDM=1 (and MSR_DE=1). Behavior when debug events occur while DBCR0_IDM=0 is implementation-dependent.

10.3 External Debug Mode [Category: Embedded.Enhanced Debug]

The External Debug Mode is a mode in which external facilities can control execution and access registers and other resources.

Debug Event (UDE) is an exception to this rule. Once a Debug Status Register bit is set, if Debug interrupts are enabled by MSR_DE, a Debug interrupt will be generated.

[Category: Embedded.Hypervisor]
To prevent spurious hypervisor debug events from occurring when a guest has been permitted to use the Debug facilities, if the thread is in hypervisor state (MSR_GS=0) and debug events are disabled for hypervisor (EPCR_DUVD=1), no debug events are allowed to occur except for the Unconditional Debug Event. It is implementation-dependent whether the Unconditional Debug Event is allowed to occur in hypervisor state when EPCR_DUVD=1.

Certain debug events are not allowed to occur when MSR_DE=0. In such situations, no Debug exception occurs and thus no Debug Status Register bit is set. Other debug events may cause Debug exceptions and set Debug Status Register bits regardless of the state of MSR_DE. The associated Debug interrupts that result from such Debug exceptions will be delayed until MSR_DE=1, provided the exceptions have not been cleared from the Debug Status Register in the meantime.

Any time that a Debug Status Register bit is allowed to be set while MSR_DE=0, a special Debug Status Register bit, Imprecise Debug Event (DBSR_IDE), will also be set. DBSR_IDE indicates that the associated Debug exception bit in the Debug Status Register was set while Debug interrupts were disabled via the MSR_DE bit. Debug interrupt handler software can use this bit to determine whether the address recorded in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] should be interpreted as the address associated with
These facilities are defined as the the instruction causing the Debug exception, or simply external debug facilities and are not defined here, how- the address of the instruction after the one which set ever some instructions and registers share internal and the MSRDE bit, thereby enabling the delayed Debug external debug roles and are briefly described as nec- interrupt. essary. Debug interrupts are ordered with respect to other A dnh instruction is provided to stop instruction fetch- interrupt types (see Section 7.8 on page 179). Debug ing and execution and allow the thread to be managed exceptions are prioritized with respect to other excep- by an external debug facility. After the dnh instruction tions (see Section 7.9 on page 183). is executed, instructions are not fetched, interrupts are not taken, and the thread does not execute instruc- There are eight types of debug events defined: tions. 1. Instruction Address Compare debug events 2. Data Address Compare debug events 10.4 Debug Events 3. 4. Trap debug events Branch Taken debug events Debug events are used to cause Debug exceptions to 5. Instruction Complete debug events be recorded in the Debug Status Register (see 6. Interrupt Taken debug events Section 10.5.2). In order for a debug event to be 7. Return debug events enabled to set a Debug Status Register bit and thereby 8. Unconditional debug events cause a Debug exception, the specific event type must 9. Critical Interrupt Taken debug events [Category be enabled by a corresponding bit or bits in the Debug Embedded.Enhanced Debug] Control Register DBCR0 (see Section 10.5.1.1), 10. 
Critical Interrupt Return debug events [Category DBCR1 (see Section 10.5.1.2), or DBCR2 (see Embedded.Enhanced Debug] Section 10.5.1.3), in most cases; the Unconditional 1078 Power ISATM Book III-E Version 2.06 Programming Note There are two classes of debug exception types: ysis it wants to, then clears all debug event enables in the DBCR except for the instruction Type 1: exception before instruction complete debug event enable. Type 2: exception after instruction 4. Software does an rfci or rfdi [Category: Embed- Almost all debug exceptions fall into the first type. That ded.Enhanced Debug]. is, they all take the interrupt upon encountering an 5. Hardware would execute and complete one instruction having the exception without updating any instruction (the branch taken in this case), and architectural state (other than DBSR, CSRR0/DSRR0 then take a Debug interrupt with CSRR0/DSRR0 [Category: Embedded.Enhanced Debug], CSRR1/ [Category: Embedded.Enhanced Debug] pointing DSRR1 [Category: Embedded.Enhanced Debug], to the target of the branch. MSR) for that instruction. 6. Software would see the instruction complete inter- The CSRR0/DSRR0 [Category: Embedded.Enhanced rupt type. It clears the instruction complete event Debug] for this type of exception points to the instruc- enable, then enables the branch taken interrupt tion that encountered the exception. This includes IAC, event again. DAC, branch taken, etc. 7. Software does an rfci or rfdi [Category: Embed- The only exception which fall into the second type is ded.Enhanced Debug]. the instruction complete debug exception. This excep- tion is taken upon completing and updating one instruc- 8. Hardware resumes on the target of the taken tion and then pointing CSRR0/DSRR0 [Category: branch and continues until another taken branch, Embedded.Enhanced Debug] to the next instruction to in which case we end up at step 2 again. execute. This, at first, seems like a double tax (i.e. 
2 debug inter- To make forward progress for any Type 1 debug rupts for every instance of a Type 1 exception), but exception one does the following: there doesn't seem like any other clean way to make forward progress on Type 1 debug exceptions. The 1. Software sets up Type 1 exceptions (e.g. branch only other way to avoid the double tax is to have the taken debug exceptions) and then returns to nor- debug handler routine actually emulate the instruction mal program operation pointed to for the Type 1 exceptions, determine the 2. Hardware takes Debug interrupt upon the first next instruction that would have been executed by the branch taken Debug exception, pointing to the interrupted program flow and load the CSRR0/DSRR0 branch with CSRR0/DSRR0 [Category: Embed- [Category: Embedded.Enhanced Debug] with that ded.Enhanced Debug]. address and do an rfci/rfdi [Category: Embed- ded.Enhanced Debug]; this is probably not faster. 3. Software, in the debug handler, sees the branch taken exception type, does whatever logging/anal- 10.4.1 Instruction Address Com- DBCR1IAC2US specifies whether IAC2 debug events can occur in user mode or supervisor mode, or both. pare Debug Event DBCR1IAC3US specifies whether IAC3 debug events One or more Instruction Address Compare debug can occur in user mode or supervisor mode, or both. events (IAC1, IAC2, IAC3 or IAC4) occur if they are enabled and execution is attempted of an instruction at DBCR1IAC4US specifies whether IAC4 debug events an address that meets the criteria specified in the can occur in user mode or supervisor mode, or both. DBCR0, DBCR1, IAC1, IAC2, IAC3, and IAC4 Regis- ters. Effective/Real Address Mode DBCR1IAC1ER specifies whether effective addresses, Instruction Address Compare User/ real addresses, effective addresses and MSRIS=0, or Supervisor Mode effective addresses and MSRIS=1 are used in deter- mining an address match on IAC1 debug events. 
DBCR1IAC1US specifies whether IAC1 debug events can occur in user mode or supervisor mode, or both. DBCR1IAC2ER specifies whether effective addresses, real addresses, effective addresses and MSRIS=0, or Chapter 10. Debug Facilities 1079 Version 2.06 effective addresses and MSRIS=1 are used in deter- address of the instruction fetch is greater than mining an address match on IAC2 debug events. or equal to the contents of the IAC1 and less than the contents of the IAC2, an instruction DBCR1IAC3ER specifies whether effective addresses, address match occurs. real addresses, effective addresses and MSRIS=0, or effective addresses and MSRIS=1 are used in deter- For IAC3 and IAC4 debug events, if the 64-bit mining an address match on IAC3 debug events. address of the instruction fetch is greater than or equal to the contents of the IAC3 and less DBCR1IAC4ER specifies whether effective addresses, than the contents of the IAC4, an instruction real addresses, effective addresses and MSRIS=0, or address match occurs. effective addresses and MSRIS=1 are used in deter- mining an address match on IAC4 debug events. - For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the thread is executing in 32-bit mode. Instruction Address Compare Mode - Exclusive address range compare mode DBCR1IAC12M specifies whether all or some of the bits For IAC1 and IAC2 debug events, if the 64-bit of the address of the instruction fetch must match the address of the instruction fetch is less than the contents of the IAC1 or IAC2, whether the address contents of the IAC1 or greater than or equal must be inside a specific range specified by the IAC1 to the contents of the IAC2, an instruction and IAC2 or outside a specific range specified by the address match occurs. IAC1 and IAC2 for an IAC1 or IAC2 debug event to occur. 
For IAC3 and IAC4 debug events, if the 64-bit address of the instruction fetch is less than the DBCR1IAC34M specifies whether all or some of the bits contents of the IAC3 or greater than or equal of the address of the instruction fetch must match the to the contents of the IAC4, an instruction contents of the IAC3 Register or IAC4 Register, address match occurs. whether the address must be inside a specific range specified by the IAC3 Register and IAC4 Register or For 64-bit implementations, the addresses are outside a specific range specified by the IAC3 Register masked to compare only bits 32:63 when the and IAC4 Register for an IAC3 or IAC4 debug event to thread is executing in 32-bit mode. occur. See the detailed description of DBCR0 (see There are four instruction address compare modes. Section 10.5.1.1, "Debug Control Register 0 (DBCR0)" on page 1085) and DBCR1 (see Section 10.5.1.2, There are four instruction address compare modes. "Debug Control Register 1 (DBCR1)" on page 1087) - Exact address compare mode and the modes for detecting IAC1, IAC2, IAC3 and If the address of the instruction fetch is equal IAC4 debug events. Instruction Address Compare to the value in the enabled IAC Register, an debug events can occur regardless of the setting of instruction address match occurs. For 64-bit MSRDE or DBCR0IDM. implementations, the addresses are masked When an Instruction Address Compare debug event to compare only bits 32:63 when the thread is occurs, the corresponding DBSRIAC1, DBSRIAC2, executing in 32-bit mode. DBSRIAC3, or DBSRIAC4 bit or bits are set to record the - Address bit match mode debug exception. If MSRDE=0, DBSRIDE is also set to 1 For IAC1 and IAC2 debug events, if the to record the imprecise debug event. address of the instruction fetch access, If MSRDE=1 (i.e. 
Debug interrupts are enabled) at the ANDed with the contents of the IAC2, are time of the Instruction Address Compare debug excep- equal to the contents of the IAC1, also ANDed tion, a Debug interrupt will occur immediately (provided with the contents of the IAC2, an instruction there exists no higher priority exception which is address match occurs. enabled to cause an interrupt). The execution of the For IAC3 and IAC4 debug events, if the instruction causing the exception will be suppressed, address of the instruction fetch, ANDed with and CSRR0/DSRR0 [Category: Embedded.Enhanced the contents of the IAC4, are equal to the con- Debug] will be set to the address of the excepting tents of the IAC3, also ANDed with the con- instruction. tents of the IAC4, an instruction address If MSRDE=0 (i.e. Debug interrupts are disabled) at the match occurs. time of the Instruction Address Compare debug excep- For 64-bit implementations, the addresses are tion, a Debug interrupt will not occur, and the instruc- masked to compare only bits 32:63 when the tion will complete execution (provided the instruction is thread is executing in 32-bit mode. not causing some other exception which will generate - Inclusive address range compare mode an enabled interrupt). For IAC1 and IAC2 debug events, if the 64-bit 1080 Power ISATM Book III-E Version 2.06 Later, if the debug exception has not been reset by debug events. Note that dcbf, dcbfep, dcbst, clearing DBSRIAC1, DBSRIAC2, DBSRIAC3, and and dcbstep are considered reads with DBSRIAC4, and MSRDE is set to 1, a delayed Debug respect to Data Storage exceptions, since interrupt will occur. In this case, CSRR0/DSRR0 [Cate- they do not actually change the data at a gory: Embedded.Enhanced Debug will contain the given address. However, since the execution address of the instruction after the one which enabled of these instructions may result in write activity the Debug interrupt by setting MSRDE to 1. 
Software in on the data bus, they are treated as writes the Debug interrupt handler can observe DBSRIDE to with respect to debug events. Note also that determine how to interpret the value in CSRR0/DSRR0 dcbtst and dcbtstep are treated as no-opera- [Category: Embedded.Enhanced Debug. tions when they report Data Storage or Data TLB Miss exceptions, instead of being allowed to cause interrupts. However, these instruc- 10.4.2 Data Address Compare tions are allowed to cause Debug interrupts, Debug Event even when they would otherwise have been no-op'ed due to a Data Storage or Data TLB One or more Data Address Compare debug events Miss exception. (DAC1R, DAC1W, DAC2R, DAC2W) occur if they are enabled, execution is attempted of a data storage access instruction, and the type, address, and possibly even the data value of the data storage access meet Data Address Compare User/Supervi- the criteria specified in the Debug Control Register 0, sor Mode Debug Control Register 2, and the DAC1, DAC2, DVC1, and DVC2 Registers. DBCR2DAC1US specifies whether DAC1R and DAC1W debug events can occur in user mode or supervisor mode, or both. Data Address Compare Read/Write Enable DBCR2DAC2US specifies whether DAC2R and DAC2W debug events can occur in user mode or DBCR0DAC1 specifies whether DAC1R debug events supervisor mode, or both. can occur on read-type data storage accesses and whether DAC1W debug events can occur on write-type data storage accesses. Effective/Real Address Mode DBCR2DAC1ER specifies whether effective DBCR0DAC2 specifies whether DAC2R debug events addresses, real addresses, effective addresses can occur on read-type data storage accesses and and MSRDS=0, or effective addresses and whether DAC2W debug events can occur on write-type MSRDS=1 are used to in determining an address data storage accesses. match on DAC1R and DAC1W debug events. 
Indexed-string instructions (lswx, stswx) for which the DBCR2DAC2ER specifies whether effective XER field specifies zero bytes as the length of the addresses, real addresses, effective addresses string are treated as no-ops, and are not allowed to and MSRDS=0, or effective addresses and cause Data Address Compare debug events. MSRDS=1 are used to in determining an address All Load instructions are considered reads with respect match on DAC2R and DAC2W debug events. to debug events, while all Store instructions are consid- ered writes with respect to debug events. In addition, Data Address Compare Mode the Cache Management instructions, and certain spe- cial cases, are handled as follows. DBCR2DAC12M specifies whether all or some of the bits of the address of the data storage access must - dcbt, dcbtls, dcbtep, icbt, icbtls, icbtep, match the contents of the DAC1 or DAC2, whether icbi, icblc, dcblc, and icbiep are all consid- the address must be inside a specific range speci- ered reads with respect to debug events. Note fied by the DAC1 and DAC2 or outside a specific that dcbt, dcbtep, icbt, and icbtep are range specified by the DAC1 and DAC2 for a treated as no-operations when they report DAC1R, DAC1W, DAC2R or DAC2W debug event Data Storage or Data TLB Miss exceptions, to occur. instead of being allowed to cause interrupts. However, these instructions are allowed to There are four data address compare modes. cause Debug interrupts, even when they - Exact address compare mode would otherwise have been no-op'ed due to a If the 64-bit address of the data storage Data Storage or Data TLB Miss exception. access is equal to the value in the enabled - dcbtst, dcbtstls, dcbtstep, dcbz, dcbzep, Data Address Compare Register, a data dcbi, dcbf, dcbfep, dcba, dcbst, and dcb- address match occurs. step are all considered writes with respect to Chapter 10. 
Debug Facilities 1081 Version 2.06 For 64-bit implementations, the addresses are exists no higher priority exception which is enabled to masked to compare only bits 32:63 when the cause an interrupt), the execution of the instruction thread is executing in 32-bit mode. causing the exception will be suppressed, and CSRR0/ DSRR0 [Category: Embedded.Enhanced Debug will be - Address bit match mode set to the address of the excepting instruction. Depend- If the address of the data storage access, ing on the type of instruction and/or the alignment of ANDed with the contents of the DAC2, are the data access, the instruction causing the exception equal to the contents of the DAC1, also may have been partially executed (see Section 7.7). ANDed with the contents of the DAC2, a data address match occurs. If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Data Address Compare debug exception, a For 64-bit implementations, the addresses are Debug interrupt will not occur, and the instruction will masked to compare only bits 32:63 when the complete execution (provided the instruction is not thread is executing in 32-bit mode. causing some other exception which will generate an - Inclusive address range compare mode enabled interrupt). Also, DBSRIDE is set to indicate that If the 64-bit address of the data storage the debug exception occurred while Debug interrupts access is greater than or equal to the contents were disabled by MSRDE=0. of the DAC1 and less than the contents of the Later, if the debug exception has not been reset by DAC2, a data address match occurs. clearing DBSRDAC1R, DBSRDAC1W, DBSRDAC2R, DBSRDAC2W, and MSRDE is set to 1, a delayed Debug For 64-bit implementations, the addresses are interrupt will occur. In this case, CSRR0/DSRR0 [Cate- masked to compare only bits 32:63 when the gory: Embedded.Enhanced Debug will contain the thread is executing in 32-bit mode. 
address of the instruction after the one which enabled - Exclusive address range compare mode the Debug interrupt by setting MSRDE to 1. Software in If the 64-bit address of the data storage the Debug interrupt handler can observe DBSRIDE to access is less than the contents of the DAC1 determine how to interpret the value in CSRR0/DSRR0 or greater than or equal to the contents of the [Category: Embedded.Enhanced Debug. DAC2, a data address match occurs. 10.4.3 Trap Debug Event For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the A Trap debug event (TRAP) occurs if DBCR0TRAP=1 thread is executing in 32-bit mode. (i.e. Trap debug events are enabled) and a Trap instruction (tw, twi, td, tdi) is executed and the condi- Data Value Compare Mode tions specified by the instruction for the trap are met. The event can occur regardless of the setting of DBCR2DVC1M and DBCR2DVC1BE specify whether MSRDE or DBCR0IDM. and how the data value being accessed by the storage access must match the contents of the When a Trap debug event occurs, DBSRTR is set to 1 DVC1 for a DAC1R or DAC1W debug event to to record the debug exception. If MSRDE=0, DBSRIDE occur. is also set to 1 to record the imprecise debug event. DBCR2DVC2M and DBCR2DVC2BE specify whether If MSRDE=1 (i.e. Debug interrupts are enabled) at the and how the data value being accessed by the time of the Trap debug exception, a Debug interrupt will storage access must match the contents of the occur immediately (provided there exists no higher pri- DVC2 for a DAC2R or DAC2W debug event to ority exception which is enabled to cause an interrupt), occur. and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting The description of DBCR0 (see Section 10.5.1.1) and instruction. DBCR2 (see Section 10.5.1.3) and the modes for detecting Data Address Compare debug events. Data If MSRDE=0 (i.e. 
Debug interrupts are disabled) at the Address Compare debug events can occur regardless time of the Trap debug exception, a Debug interrupt will of the setting of MSRDE or DBCR0IDM. not occur, and a Trap exception type Program interrupt will occur instead if the trap condition is met. When an Data Address Compare debug event occurs, the corresponding DBSRDAC1R, DBSRDAC1W, Later, if the debug exception has not been reset by DBSRDAC2R, or DBSRDAC2W bit or bits are set to 1 to clearing DBSRTR, and MSRDE is set to 1, a delayed record the debug exception. If MSRDE=0, DBSRIDE is Debug interrupt will occur. In this case, CSRR0/DSRR0 also set to 1 to record the imprecise debug event. [Category: Embedded.Enhanced Debug will contain the address of the instruction after the one which If MSRDE=1 (i.e. Debug interrupts are enabled) at the enabled the Debug interrupt by setting MSRDE to 1. time of the Data Address Compare debug exception, a Software in the debug interrupt handler can observe Debug interrupt will occur immediately (provided there 1082 Power ISATM Book III-E Version 2.06 DBSRIDE to determine how to interpret the value in When an Instruction Complete debug event occurs, CSRR0/DSRR0 [Category: Embedded.Enhanced DBSRICMP is set to 1 to record the debug exception, a Debug]. Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and CSRR0/DSRR0 [Category: 10.4.4 Branch Taken Debug Event Embedded.Enhanced Debug] will be set to the address A Branch Taken debug event (BRT) occurs if of the instruction after the one causing the Instruction DBCR0BRT=1 (i.e. Branch Taken Debug events are Complete debug exception. enabled), execution is attempted of a branch instruction whose direction will be taken (that is, either an uncondi- 10.4.6 Interrupt Taken Debug tional branch, or a conditional branch whose branch condition is met), and MSRDE=1. 
Event Branch Taken debug events are not recognized if MSRDE=0 at the time of the execution of the branch 10.4.6.1 Causes of Interrupt Taken instruction and thus DBSRIDE can not be set by a Debug Events Branch Taken debug event. This is because branch instructions occur very frequently. Allowing these com- Only base class interrupts can cause an Interrupt mon events to be recorded as exceptions in the DBSR Taken debug event. If the Embedded.Enhanced Debug while debug interrupts are disabled via MSRDE would category is not supported or is supported and not result in an inordinate number of imprecise Debug enabled, all other interrupts automatically clear interrupts. MSRDE, and thus would always prevent the associated Debug interrupt from occurring precisely. If the Embed- When a Branch Taken debug event occurs, the DBSR- ded.Enhanced Debug category is supported and BRT bit is set to 1 to record the debug exception and a enabled, then critical class interrupts do not automati- Debug interrupt will occur immediately (provided there cally clear MSRDE, but they cause Critical Interrupt exists no higher priority exception which is enabled to Taken debug events instead of Interrupt Taken debug cause an interrupt). The execution of the instruction events. causing the exception will be suppressed, and CSRR0/ DSRR0 [Category: Embedded.Enhanced Debug] will Also, if the Embedded.Enhanced Debug category is not be set to the address of the excepting instruction. supported or is supported and not enabled, Debug interrupts themselves are critical class interrupts, and thus any Debug interrupt (for any other debug event) 10.4.5 Instruction Complete would always end up setting the additional exception of DBSRIRPT upon entry to the Debug interrupt handler. Debug Event At this point, the Debug interrupt handler would be An Instruction Complete debug event (ICMP) occurs if unable to determine whether or not the Interrupt Taken DBCR0ICMP=1 (i.e. 
Instruction Complete debug events debug event was related to the original debug event. are enabled), execution of any instruction is completed, and MSRDE=1. Note that if execution of an instruction 10.4.6.2 Interrupt Taken Debug Event is suppressed due to the instruction causing some other exception which is enabled to generate an inter- Description rupt, then the attempted execution of that instruction An Interrupt Taken debug event (IRPT) occurs if does not cause an Instruction Complete debug event. DBCR0IRPT=1 (i.e. Interrupt Taken debug events are The sc instruction does not fall into the type of an enabled) and a base class interrupt occurs. Interrupt instruction whose execution is suppressed, since the Taken debug events can occur regardless of the set- instruction actually completes execution and then gen- ting of MSRDE. erates a System Call interrupt. In this case, the Instruc- tion Complete debug exception will also be set. When an Interrupt Taken debug event occurs, DBSR- IRPT is set to 1 to record the debug exception. If Instruction Complete debug events are not recognized MSRDE=0, DBSRIDE is also set to 1 to record the if MSRDE=0 at the time of the execution of the instruc- imprecise debug event. tion, DBSRIDE can not be set by an ICMP debug event. This is because allowing the common event of Instruc- If MSRDE=1 (i.e. Debug interrupts are enabled) at the tion Completion to be recorded as an exception in the time of the Interrupt Taken debug event, a Debug inter- DBSR while Debug interrupts are disabled via MSRDE rupt will occur immediately (provided there exists no would mean that the Debug interrupt handler software higher priority exception which is enabled to cause an would receive an inordinate number of imprecise interrupt), and Critical Save/Restore Register 0/Debug Debug interrupts every time Debug interrupts were re- Save/Restore Register 0 [Category: Embed- enabled via MSRDE. 
ded.Enhanced Debug] will be set to the address of the interrupt vector which caused the Interrupt Taken debug event. No instructions at the base interrupt handler will have been executed.

If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Interrupt Taken debug event, a Debug interrupt will not occur, and the handler for the interrupt which caused the Interrupt Taken debug event will be allowed to execute.

Later, if the debug exception has not been reset by clearing DBSRIRPT, and MSRDE is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSRDE to 1. Software in the Debug interrupt handler can observe the DBSRIDE bit to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

10.4.7 Return Debug Event

A Return debug event (RET) occurs if DBCR0RET=1 and an attempt is made to execute an rfi. Return debug events can occur regardless of the setting of MSRDE.

When a Return debug event occurs, DBSRRET is set to 1 to record the debug exception. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE=1 at the time of the Return Debug event, a Debug interrupt will occur immediately, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the rfi.

If MSRDE=0 at the time of the Return Debug event, a Debug interrupt will not occur.

Later, if the Debug exception has not been reset by clearing DBSRRET, and MSRDE is set to 1, a delayed imprecise Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSRDE to 1. An imprecise Debug interrupt can be caused by executing an rfi when DBCR0RET=1 and MSRDE=0, and the execution of that rfi happens to cause MSRDE to be set to 1. Software in the Debug interrupt handler can observe the DBSRIDE bit to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

10.4.8 Unconditional Debug Event

An Unconditional debug event (UDE) occurs when the Unconditional Debug Event (UDE) signal is activated by the debug mechanism. The exact definition of the UDE signal and how it is activated is implementation-dependent. The Unconditional debug event is the only debug event which does not have a corresponding enable bit for the event in DBCR0 (hence the name of the event). The Unconditional debug event can occur regardless of the setting of MSRDE.

When an Unconditional debug event occurs, the DBSRUDE bit is set to 1 to record the Debug exception. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE=1 (i.e. Debug interrupts are enabled) at the time of the Unconditional Debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the instruction which would have executed next had the interrupt not occurred.

If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Unconditional Debug exception, a Debug interrupt will not occur.

Later, if the Unconditional Debug exception has not been reset by clearing DBSRUDE, and MSRDE is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSRDE to 1. Software in the Debug interrupt handler can observe DBSRIDE to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

10.4.9 Critical Interrupt Taken Debug Event [Category: Embedded.Enhanced Debug]

A Critical Interrupt Taken debug event (CIRPT) occurs if DBCR0CIRPT = 1 (i.e. Critical Interrupt Taken debug events are enabled) and a critical interrupt occurs. A critical interrupt is any interrupt that saves state in CSRR0 and CSRR1 when the interrupt is taken. Critical Interrupt Taken debug events can occur regardless of the setting of MSRDE.

When a Critical Interrupt Taken debug event occurs, DBSRCIRPT is set to 1 to record the debug event. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE = 1 (i.e. Debug Interrupts are enabled) at the time of the Critical Interrupt Taken debug event, a Debug Interrupt will occur immediately (provided there is no higher priority exception which is enabled to cause an interrupt), and DSRR0 will be set to the address of the first instruction of the critical interrupt handler. No instructions at the critical interrupt handler will have been executed.

If MSRDE = 0 (i.e. Debug Interrupts are disabled) at the time of the Critical Interrupt Taken debug event, a Debug Interrupt will not occur, and the handler for the critical interrupt which caused the debug event will be allowed to execute normally. Later, if the debug exception has not been reset by clearing DBSRCIRPT and MSRDE is set to 1, a delayed Debug Interrupt will occur. In this case DSRR0 will contain the address of the instruction after the one that set MSRDE = 1. Software in the Debug Interrupt handler can observe DBSRIDE to determine how to interpret the value in DSRR0.

10.4.10 Critical Interrupt Return Debug Event [Category: Embedded.Enhanced Debug]

A Critical Interrupt Return debug event (CRET) occurs if DBCR0CRET = 1 (i.e. Critical Interrupt Return debug events are enabled) and an attempt is made to execute an rfci instruction. Critical Interrupt Return debug events can occur regardless of the setting of MSRDE.

When a Critical Interrupt Return debug event occurs, DBSRCRET is set to 1 to record the debug event. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE = 1 (i.e. Debug Interrupts are enabled) at the time of the Critical Interrupt Return debug event, a Debug Interrupt will occur immediately (provided there is no higher priority exception which is enabled to cause an interrupt), and DSRR0 will be set to the address of the rfci instruction.

If MSRDE = 0 (i.e. Debug Interrupts are disabled) at the time of the Critical Interrupt Return debug event, a Debug Interrupt will not occur. Later, if the debug exception has not been reset by clearing DBSRCRET and MSRDE is set to 1, a delayed Debug Interrupt will occur. In this case DSRR0 will contain the address of the instruction after the one that set MSRDE = 1. An imprecise Debug Interrupt can be caused by executing an rfci when DBCR0CRET = 1 and MSRDE = 0, and the execution of the rfci happens to cause MSRDE to be set to 1. Software in the Debug Interrupt handler can observe DBSRIDE to determine how to interpret the value in DSRR0.

10.5 Debug Registers

This section describes debug-related registers that are accessible to software. These registers are intended for use by special debug tools and debug software, and not by general application or operating system code.

10.5.1 Debug Control Registers

Debug Control Register 0 (DBCR0), Debug Control Register 1 (DBCR1), and Debug Control Register 2 (DBCR2) are each 32-bit registers. Bits of DBCR0, DBCR1, and DBCR2 are numbered 32 (most-significant bit) to 63 (least-significant bit). DBCR0, DBCR1, and DBCR2 are used to enable debug events, reset the thread, control timer operation during debug events, and set the debug mode of the thread.

10.5.1.1 Debug Control Register 0 (DBCR0)

The contents of the DBCR0 can be read into bits 32:63 of register RT using mfspr RT,DBCR0, setting bits 0:31 of RT to 0. The contents of bits 32:63 of register RS can be written to the DBCR0 using mtspr DBCR0,RS. The bit definitions for DBCR0 are shown below.

Bit(s)  Description

32  External Debug Mode (EDM) [Category: Embedded.Enhanced Debug]
    The EDM bit is a read-only bit that reflects whether the thread is controlled by an external debug facility. When EDM is set, internal debug mode is suppressed and the taking of debug interrupts does not occur.
    0  The thread is not in external debug mode.
    1  The thread is in external debug mode.

33  Internal Debug Mode (IDM)
    0  Debug interrupts are disabled.
    1  If MSRDE=1, then the occurrence of a debug event or the recording of an earlier debug event in the Debug Status Register when MSRDE=0 or DBCR0IDM=0 will cause a Debug interrupt.

    Programming Note
    Software must clear debug event status in the Debug Status Register in the Debug interrupt handler when a Debug interrupt is taken before re-enabling interrupts via MSRDE. Otherwise, redundant Debug interrupts will be taken for the same debug event.

34:35  Reset (RST)
    00  No action
    01  Implementation-specific
    10  Implementation-specific
    11  Implementation-specific
    Warning: Writing 0b01, 0b10, or 0b11 to these bits may cause a thread reset to occur.

36  Instruction Completion Debug Event (ICMP)
    0  ICMP debug events are disabled
    1  ICMP debug events are enabled
    Note: Instruction Completion will not cause an ICMP debug event if MSRDE=0.
37  Branch Taken Debug Event Enable (BRT)
    0  BRT debug events are disabled
    1  BRT debug events are enabled
    Note: Taken branches will not cause a BRT debug event if MSRDE=0.

38  Interrupt Taken Debug Event Enable (IRPT)
    0  IRPT debug events are disabled
    1  IRPT debug events are enabled
    Note: Critical interrupts will not cause an IRPT Debug event even if MSRDE=0. If the Embedded.Enhanced Debug category is supported, see Section 10.4.9.

39  Trap Debug Event Enable (TRAP)
    0  TRAP debug events cannot occur
    1  TRAP debug events can occur

40  Instruction Address Compare 1 Debug Event Enable (IAC1)
    0  IAC1 debug events cannot occur
    1  IAC1 debug events can occur

41  Instruction Address Compare 2 Debug Event Enable (IAC2)
    0  IAC2 debug events cannot occur
    1  IAC2 debug events can occur

42  Instruction Address Compare 3 Debug Event Enable (IAC3)
    0  IAC3 debug events cannot occur
    1  IAC3 debug events can occur

43  Instruction Address Compare 4 Debug Event Enable (IAC4)
    0  IAC4 debug events cannot occur
    1  IAC4 debug events can occur

44:45  Data Address Compare 1 Debug Event Enable (DAC1)
    00  DAC1 debug events cannot occur
    01  DAC1 debug events can occur only on a store-type data storage access
    10  DAC1 debug events can occur only on a load-type data storage access
    11  DAC1 debug events can occur on any data storage access

46:47  Data Address Compare 2 Debug Event Enable (DAC2)
    00  DAC2 debug events cannot occur
    01  DAC2 debug events can occur only on a store-type data storage access
    10  DAC2 debug events can occur only on a load-type data storage access
    11  DAC2 debug events can occur on any data storage access

48  Return Debug Event Enable (RET)
    0  RET debug events cannot occur
    1  RET debug events can occur
    Note: Return From Critical Interrupt will not cause an RET debug event if MSRDE=0. If the Embedded.Enhanced Debug category is supported, see Section 10.4.10.

49:56  Reserved

57  Critical Interrupt Taken Debug Event (CIRPT) [Category: Embedded.Enhanced Debug]
    A Critical Interrupt Taken Debug Event occurs when DBCR0CIRPT = 1 and a critical interrupt (any interrupt that uses the critical class, i.e. uses CSRR0 and CSRR1) occurs.
    0  Critical interrupt taken debug events are disabled.
    1  Critical interrupt taken debug events are enabled.

58  Critical Interrupt Return Debug Event (CRET) [Category: Embedded.Enhanced Debug]
    A Critical Interrupt Return Debug Event occurs when DBCR0CRET = 1 and a return from critical interrupt (an rfci instruction is executed) occurs.
    0  Critical interrupt return debug events are disabled.
    1  Critical interrupt return debug events are enabled.

59:62  Implementation-dependent

63  Freeze Timers on Debug Event (FT)
    0  Enable clocking of timers
    1  Disable clocking of timers if any DBSR bit is set (except MRR)

This register is hypervisor privileged.

10.5.1.2 Debug Control Register 1 (DBCR1)

The contents of the DBCR1 can be read into bits 32:63 of a register RT using mfspr RT,DBCR1, setting bits 0:31 of RT to 0. The contents of bits 32:63 of register RS can be written to the DBCR1 using mtspr DBCR1,RS. The bit definitions for DBCR1 are shown below.

Bit(s)  Description

32:33  Instruction Address Compare 1 User/Supervisor Mode (IAC1US)
    00  IAC1 debug events can occur
    01  Reserved
    10  IAC1 debug events can occur only if MSRPR=0
    11  IAC1 debug events can occur only if MSRPR=1

34:35  Instruction Address Compare 1 Effective/Real Mode (IAC1ER)
    00  IAC1 debug events are based on effective addresses
    01  IAC1 debug events are based on real addresses
    10  IAC1 debug events are based on effective addresses and can occur only if MSRIS=0
    11  IAC1 debug events are based on effective addresses and can occur only if MSRIS=1

36:37  Instruction Address Compare 2 User/Supervisor Mode (IAC2US)
    00  IAC2 debug events can occur
    01  Reserved
    10  IAC2 debug events can occur only if MSRPR=0
    11  IAC2 debug events can occur only if MSRPR=1

38:39  Instruction Address Compare 2 Effective/Real Mode (IAC2ER)
    00  IAC2 debug events are based on effective addresses
    01  IAC2 debug events are based on real addresses
    10  IAC2 debug events are based on effective addresses and can occur only if MSRIS=0
    11  IAC2 debug events are based on effective addresses and can occur only if MSRIS=1

40:41  Instruction Address Compare 1/2 Mode (IAC12M)
    00  Exact address compare
        IAC1 debug events can occur only if the address of the instruction fetch is equal to the value specified in IAC1.
        IAC2 debug events can occur only if the address of the instruction fetch is equal to the value specified in IAC2.
    01  Address bit match
        IAC1 and IAC2 debug events can occur only if the address of the instruction fetch, ANDed with the contents of IAC2, is equal to the contents of IAC1, also ANDed with the contents of IAC2.
        If IAC1US≠IAC2US or IAC1ER≠IAC2ER, results are boundedly undefined.
    10  Inclusive address range compare
        IAC1 and IAC2 debug events can occur only if the address of the instruction fetch is greater than or equal to the value specified in IAC1 and less than the value specified in IAC2.
        If IAC1US≠IAC2US or IAC1ER≠IAC2ER, results are boundedly undefined.
    11  Exclusive address range compare
        IAC1 and IAC2 debug events can occur only if the address of the instruction fetch is less than the value specified in IAC1 or is greater than or equal to the value specified in IAC2.
        If IAC1US≠IAC2US or IAC1ER≠IAC2ER, results are boundedly undefined.

42:47  Reserved

48:49  Instruction Address Compare 3 User/Supervisor Mode (IAC3US)
    00  IAC3 debug events can occur
    01  Reserved
    10  IAC3 debug events can occur only if MSRPR=0
    11  IAC3 debug events can occur only if MSRPR=1

50:51  Instruction Address Compare 3 Effective/Real Mode (IAC3ER)
    00  IAC3 debug events are based on effective addresses
    01  IAC3 debug events are based on real addresses
    10  IAC3 debug events are based on effective addresses and can occur only if MSRIS=0
    11  IAC3 debug events are based on effective addresses and can occur only if MSRIS=1

52:53  Instruction Address Compare 4 User/Supervisor Mode (IAC4US)
    00  IAC4 debug events can occur
    01  Reserved
    10  IAC4 debug events can occur only if MSRPR=0
    11  IAC4 debug events can occur only if MSRPR=1

54:55  Instruction Address Compare 4 Effective/Real Mode (IAC4ER)
    00  IAC4 debug events are based on effective addresses
    01  IAC4 debug events are based on real addresses
    10  IAC4 debug events are based on effective addresses and can occur only if MSRIS=0
    11  IAC4 debug events are based on effective addresses and can occur only if MSRIS=1

56:57  Instruction Address Compare 3/4 Mode (IAC34M)
    00  Exact address compare
        IAC3 debug events can occur only if the address of the instruction fetch is equal to the value specified in IAC3.
        IAC4 debug events can occur only if the address of the instruction fetch is equal to the value specified in IAC4.
    01  Address bit match
        IAC3 and IAC4 debug events can occur only if the address of the instruction fetch, ANDed with the contents of IAC4, is equal to the contents of IAC3, also ANDed with the contents of IAC4.
        If IAC3US≠IAC4US or IAC3ER≠IAC4ER, results are boundedly undefined.
    10  Inclusive address range compare
        IAC3 and IAC4 debug events can occur only if the address of the instruction fetch is greater than or equal to the value specified in IAC3 and less than the value specified in IAC4.
        If IAC3US≠IAC4US or IAC3ER≠IAC4ER, results are boundedly undefined.
    11  Exclusive address range compare
        IAC3 and IAC4 debug events can occur only if the address of the instruction fetch is less than the value specified in IAC3 or is greater than or equal to the value specified in IAC4.
        If IAC3US≠IAC4US or IAC3ER≠IAC4ER, results are boundedly undefined.

58:63  Reserved

This register is hypervisor privileged.

10.5.1.3 Debug Control Register 2 (DBCR2)

The contents of the DBCR2 can be copied into bits 32:63 of register RT using mfspr RT,DBCR2, setting bits 0:31 of register RT to 0. The contents of bits 32:63 of a register RS can be written to the DBCR2 using mtspr DBCR2,RS. The bit definitions for DBCR2 are shown below.

Bit(s)  Description

32:33  Data Address Compare 1 User/Supervisor Mode (DAC1US)
    00  DAC1 debug events can occur
    01  Reserved
    10  DAC1 debug events can occur only if MSRPR=0
    11  DAC1 debug events can occur only if MSRPR=1

34:35  Data Address Compare 1 Effective/Real Mode (DAC1ER)
    00  DAC1 debug events are based on effective addresses
    01  DAC1 debug events are based on real addresses
    10  DAC1 debug events are based on effective addresses and can occur only if MSRDS=0
    11  DAC1 debug events are based on effective addresses and can occur only if MSRDS=1

36:37  Data Address Compare 2 User/Supervisor Mode (DAC2US)
    00  DAC2 debug events can occur
    01  Reserved
    10  DAC2 debug events can occur only if MSRPR=0
    11  DAC2 debug events can occur only if MSRPR=1

38:39  Data Address Compare 2 Effective/Real Mode (DAC2ER)
    00  DAC2 debug events are based on effective addresses
    01  DAC2 debug events are based on real addresses
    10  DAC2 debug events are based on effective addresses and can occur only if MSRDS=0
    11  DAC2 debug events are based on effective addresses and can occur only if MSRDS=1

40:41  Data Address Compare 1/2 Mode (DAC12M)
    00  Exact address compare
        DAC1 debug events can occur only if the address of the data storage access is equal to the value specified in DAC1.
        DAC2 debug events can occur only if the address of the data storage access is equal to the value specified in DAC2.
    01  Address bit match
        DAC1 and DAC2 debug events can occur only if the address of the data storage access, ANDed with the contents of DAC2, is equal to the contents of DAC1, also ANDed with the contents of DAC2.
        If DAC1US≠DAC2US or DAC1ER≠DAC2ER, results are boundedly undefined.
    10  Inclusive address range compare
        DAC1 and DAC2 debug events can occur only if the address of the data storage access is greater than or equal to the value specified in DAC1 and less than the value specified in DAC2.
        If DAC1US≠DAC2US or DAC1ER≠DAC2ER, results are boundedly undefined.
    11  Exclusive address range compare
        DAC1 and DAC2 debug events can occur only if the address of the data storage access is less than the value specified in DAC1 or is greater than or equal to the value specified in DAC2.
        If DAC1US≠DAC2US or DAC1ER≠DAC2ER, results are boundedly undefined.

42:43  Reserved

44:45  Data Value Compare 1 Mode (DVC1M)
    00  DAC1 debug events can occur
    01  DAC1 debug events can occur only when all bytes specified in DBCR2DVC1BE in the data value of the data storage access match their corresponding bytes in DVC1
    10  DAC1 debug events can occur only when at least one of the bytes specified in DBCR2DVC1BE in the data value of the data storage access matches its corresponding byte in DVC1
    11  DAC1 debug events can occur only when all bytes specified in DBCR2DVC1BE within at least one of the halfwords of the data value of the data storage access match their corresponding bytes in DVC1

46:47  Data Value Compare 2 Mode (DVC2M)
    00  DAC2 debug events can occur
    01  DAC2 debug events can occur only when all bytes specified in DBCR2DVC2BE in the data value of the data storage access match their corresponding bytes in DVC2
    10  DAC2 debug events can occur only when at least one of the bytes specified in DBCR2DVC2BE in the data value of the data storage access matches its corresponding byte in DVC2
    11  DAC2 debug events can occur only when all bytes specified in DBCR2DVC2BE within at least one of the halfwords of the data value of the data storage access match their corresponding bytes in DVC2

48:55  Data Value Compare 1 Byte Enables (DVC1BE)
    Specifies which bytes in the aligned data value being read or written by the storage access are compared to the corresponding bytes in DVC1.

56:63  Data Value Compare 2 Byte Enables (DVC2BE)
    Specifies which bytes in the aligned data value being read or written by the storage access are compared to the corresponding bytes in DVC2.

This register is hypervisor privileged.

10.5.2 Debug Status Register

The Debug Status Register (DBSR) is a 32-bit register and contains status on debug events and the most recent thread reset.

The DBSR is set via hardware, and read and cleared via software. The contents of the DBSR can be read into bits 32:63 of a register RT using the mfspr instruction, setting bits 0:31 of RT to zero. Bits in the DBSR can be cleared using the mtspr instruction. Clearing is done by writing bits 32:63 of a register to the DBSR with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the DBSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

The bit definitions for the DBSR are shown below:

Bit(s)  Description

32  Imprecise Debug Event (IDE)
    Set to 1 if MSRDE=0 and a debug event causes its respective Debug Status Register bit to be set to 1.

33  Unconditional Debug Event (UDE)
    Set to 1 if an Unconditional debug event occurred. See Section 10.4.8.

34:35  Most Recent Reset (MRR)
    Set to one of three values when a reset occurs. These two bits are undefined at power-up.
    00  No reset occurred since these bits were last cleared by software
    01  Implementation-dependent reset information
    10  Implementation-dependent reset information
    11  Implementation-dependent reset information

36  Instruction Complete Debug Event (ICMP)
    Set to 1 if an Instruction Completion debug event occurred and DBCR0ICMP=1. See Section 10.4.5.

37  Branch Taken Debug Event (BRT)
    Set to 1 if a Branch Taken debug event occurred and DBCR0BRT=1. See Section 10.4.4.

38  Interrupt Taken Debug Event (IRPT)
    Set to 1 if an Interrupt Taken debug event occurred and DBCR0IRPT=1. See Section 10.4.6.

39  Trap Instruction Debug Event (TRAP)
    Set to 1 if a Trap Instruction debug event occurred and DBCR0TRAP=1. See Section 10.4.3.

40  Instruction Address Compare 1 Debug Event (IAC1)
    Set to 1 if an IAC1 debug event occurred and DBCR0IAC1=1. See Section 10.4.1.

41  Instruction Address Compare 2 Debug Event (IAC2)
    Set to 1 if an IAC2 debug event occurred and DBCR0IAC2=1. See Section 10.4.1.

42  Instruction Address Compare 3 Debug Event (IAC3)
    Set to 1 if an IAC3 debug event occurred and DBCR0IAC3=1. See Section 10.4.1.

43  Instruction Address Compare 4 Debug Event (IAC4)
    Set to 1 if an IAC4 debug event occurred and DBCR0IAC4=1. See Section 10.4.1.

44  Data Address Compare 1 Read Debug Event (DAC1R)
    Set to 1 if a read-type DAC1 debug event occurred and DBCR0DAC1=0b10 or DBCR0DAC1=0b11. See Section 10.4.2.

45  Data Address Compare 1 Write Debug Event (DAC1W)
    Set to 1 if a write-type DAC1 debug event occurred and DBCR0DAC1=0b01 or DBCR0DAC1=0b11. See Section 10.4.2.

46  Data Address Compare 2 Read Debug Event (DAC2R)
    Set to 1 if a read-type DAC2 debug event occurred and DBCR0DAC2=0b10 or DBCR0DAC2=0b11. See Section 10.4.2.

47  Data Address Compare 2 Write Debug Event (DAC2W)
    Set to 1 if a write-type DAC2 debug event occurred and DBCR0DAC2=0b01 or DBCR0DAC2=0b11. See Section 10.4.2.

48  Return Debug Event (RET)
    Set to 1 if a Return debug event occurred and DBCR0RET=1. See Section 10.4.7.

49:52  Reserved

53:56  Implementation-dependent

57  Critical Interrupt Taken Debug Event (CIRPT) [Category: Embedded.Enhanced Debug]
    A Critical Interrupt Taken Debug Event occurs when DBCR0CIRPT=1 and a critical interrupt (any interrupt that uses the critical class, i.e. uses CSRR0 and CSRR1) occurs.
    0  Critical interrupt taken debug events are disabled.
    1  Critical interrupt taken debug events are enabled.

58  Critical Interrupt Return Debug Event (CRET) [Category: Embedded.Enhanced Debug]
    A Critical Interrupt Return Debug Event occurs when DBCR0CRET=1 and a return from critical interrupt (an rfci instruction is executed) occurs.
    0  Critical interrupt return debug events are disabled.
    1  Critical interrupt return debug events are enabled.

59:63  Implementation-dependent

This register is hypervisor privileged.

10.5.3 Debug Status Register Write Register (DBSRWR)

The Debug Status Register Write Register (DBSRWR) allows a hypervisor state program to write the contents of the Debug Status Register (see Section 10.5.2, "Debug Status Register"). The format of the DBSRWR is shown in Figure 69 below.

    | DBSRWR                                  |
    32                                       63

Figure 69. Debug Status Register Write Register

The DBSRWR is provided as a means to restore the contents of the DBSR on a partition switch.

The DBSRWR is hypervisor privileged.

Writing DBSRWR changes the value in the DBSR. Writing non-zero bits may enable an imprecise Debug exception which may cause later imprecise Debug Interrupts. In order to correctly write DBSRWR, software should ensure that MSRDE = 0 when the value is written and perform a context synchronizing operation before setting MSRDE to 1.

10.5.4 Instruction Address Compare Registers

The Instruction Address Compare Register 1, 2, 3, and 4 (IAC1, IAC2, IAC3, and IAC4 respectively) are each 64 bits, with bit 63 being reserved.

A debug event may be enabled to occur upon an attempt to execute an instruction from an address specified in either IAC1, IAC2, IAC3, or IAC4, inside or outside a range specified by IAC1 and IAC2 or, inside or outside a range specified by IAC3 and IAC4, or to blocks of addresses specified by the combination of the IAC1 and IAC2, or to blocks of addresses specified by the combination of the IAC3 and IAC4. Since all instruction addresses are required to be word-aligned, the two low-order bits of the Instruction Address Compare Registers are reserved and do not participate in the comparison to the instruction address (see Section 10.4.1 on page 1079).

The contents of the Instruction Address Compare i Register (where i={1,2,3, or 4}) can be read into register RT using mfspr RT,IACi. The contents of register RS can be written to the Instruction Address Compare i Register using mtspr IACi,RS.

This register is hypervisor privileged.

10.5.5 Data Address Compare Registers

The Data Address Compare Register 1 and 2 (DAC1 and DAC2 respectively) are each 64 bits.

A debug event may be enabled to occur upon loads, stores, or cache operations to an address specified in either the DAC1 or DAC2, inside or outside a range specified by the DAC1 and DAC2, or to blocks of addresses specified by the combination of the DAC1 and DAC2 (see Section 10.4.2).

The contents of the Data Address Compare i Register (where i={1 or 2}) can be read into register RT using mfspr RT,DACi. The contents of register RS can be written to the Data Address Compare i Register using mtspr DACi,RS.

The contents of the DAC1 or DAC2 are compared to the address generated by a data storage access instruction.

These registers are hypervisor privileged.

10.5.6 Data Value Compare Registers

The Data Value Compare Register 1 and 2 (DVC1 and DVC2 respectively) are each 64 bits.

A DAC1R, DAC1W, DAC2R, or DAC2W debug event may be enabled to occur upon loads or stores of a specific data value specified in either or both of the DVC1 and DVC2. DBCR2DVC1M and DBCR2DVC1BE control how the contents of the DVC1 are compared with the value and DBCR2DVC2M and DBCR2DVC2BE control how the contents of the DVC2 are compared with the value (see Section 10.4.2 and Section 10.5.1.3).

The contents of the Data Value Compare i Register (where i={1 or 2}) can be read into register RT using mfspr RT,DVCi. The contents of register RS can be written to the Data Value Compare i Register using mtspr DVCi,RS.

These registers are hypervisor privileged.

10.6 Debugger Notify Halt Instruction [Category: Embedded.Enhanced Debug]

The dnh instruction provides the means for the transfer of information between the thread and an implementation-dependent external debug facility. dnh also causes the thread to stop fetching and executing instructions.

Debugger Notify Halt    XFX-form

dnh    DUI,DUIS

    | 19    | DUI   | DUIS   | 198    | / |
    0       6       11       21       31

    if enabled by implementation-dependent means
    then
        implementation-dependent register ← DUI
        halt thread
    else
        illegal instruction exception

Execution of the dnh instruction causes the thread to stop fetching instructions and taking interrupts if execution of the instruction has been enabled. The contents of the DUI field are sent to the external debug facility to identify the reason for the halt.
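As an illustration of the instruction layout above, the following C sketch packs the DUI and DUIS operands into a 32-bit dnh instruction word and pulls the fields back out of such a word. The helper names (encode_dnh, is_dnh, dnh_dui, dnh_duis) are hypothetical and are not defined by the architecture. The field positions are taken from the XFX-form layout (primary opcode 19 in bits 0:5, DUI in bits 6:10, DUIS in bits 11:20, extended opcode 198 in bits 21:30), with bit 0 taken as the most-significant bit of the word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; bit 0 is the most-significant bit of the word,
 * so a field whose last bit is n sits at shift (31 - n). */

/* Build the instruction word for "dnh DUI,DUIS". */
static uint32_t encode_dnh(uint32_t dui, uint32_t duis)
{
    assert(dui < 32u);    /* DUI occupies bits 6:10 (5 bits)    */
    assert(duis < 1024u); /* DUIS occupies bits 11:20 (10 bits) */
    return (19u << 26)    /* primary opcode, bits 0:5           */
         | (dui << 21)
         | (duis << 11)
         | (198u << 1);   /* extended opcode, bits 21:30        */
}

/* True if the word carries opcode 19 and extended opcode 198.
 * Bit 31 (shown as "/") is ignored by this check. */
static bool is_dnh(uint32_t insn)
{
    return (insn & 0xFC0007FEu) == ((19u << 26) | (198u << 1));
}

/* Extract the DUI field (bits 6:10). */
static uint32_t dnh_dui(uint32_t insn)
{
    return (insn >> 21) & 0x1Fu;
}

/* Extract the DUIS field (bits 11:20). */
static uint32_t dnh_duis(uint32_t insn)
{
    return (insn >> 11) & 0x3FFu;
}
```

Under these assumptions, encode_dnh(0, 0) gives 0x4C00018C, and software holding the word of a dnh instruction could recover its DUIS value with dnh_duis.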
If execution of the dnh instruction has not been previously enabled, executing the dnh instruction produces an Illegal Instruction exception. The means by which execution of the dnh instruction is enabled is implementation-dependent.

The current state of the debug facility, whether the thread is in IDM or EDM mode, has no effect on the execution of the dnh instruction.

The instruction is context synchronizing.

Programming Note
The DUIS field in the instruction may be used to pass information to an external debug facility. After the dnh instruction has executed, the instruction itself can be read back by the Illegal Instruction Interrupt handler or the external debug facility if the contents of the DUIS field are of interest. If the thread entered the Illegal Instruction Interrupt handler, software can use SRR0 to obtain the address of the dnh instruction which caused the handler to be invoked.

Special Registers Altered:
    None

Chapter 11. Processor Control [Category: Embedded.Processor Control]

11.1 Overview
11.2 Programming Model
11.2.1 Message Handling and Filtering
11.2.2 Doorbell Message Filtering
11.2.2.1 Doorbell Critical Message Filtering
11.2.2.2 Guest Doorbell Message Filtering [Category: Embedded.Hypervisor]
11.2.2.3 Guest Doorbell Critical Message Filtering [Category: Embedded.Hypervisor]
11.2.2.4 Guest Doorbell Machine Check Message Filtering [Category: Embedded.Hypervisor]
11.3 Processor Control Instructions
11.1 Overview

The Processor Control facility provides a mechanism for threads within a coherence domain to send messages to all devices in the coherence domain. The facility provides a mechanism for sending interrupts that are not dependent on the interrupt controller to threads and allows message filtering by the threads that receive the message.

The Processor Control facility is also useful for sending messages to a device that provides specialized services such as secure boot operations controlled by a security device.

The Processor Control facility defines how threads send messages and what actions threads take on the receipt of a message. The actions taken by devices other than threads are not defined.

11.2 Programming Model

Threads initiate a message by executing the msgsnd instruction and specifying a message type and message payload in a general purpose register. Sending a message causes the message to be sent to all the devices, including the sending thread, in the coherence domain in a reliable manner.

Each device receives all messages that are sent. The actions that a device takes are dependent on the message type and payload. There are no restrictions on what messages a thread can send.

To provide inter-thread interrupt capability the following doorbell message types are defined:

    Processor Doorbell
    Processor Doorbell Critical
    Guest Processor Doorbell
    Guest Processor Doorbell Critical
    Guest Processor Doorbell Machine Check

A doorbell message causes an interrupt to occur on threads when the message is received and the thread determines through examination of the payload that the message should be accepted. The examination of the payload for this purpose is termed filtering. The acceptance of a doorbell message causes an exception to be generated on the accepting thread.

Threads accept and filter messages defined in Section 11.2.1. Threads may also accept other implementation-dependent defined messages.

11.2.1 Message Handling and Filtering

Threads filter, accept, and handle message types defined as follows. The message type is specified in the message and is determined by the contents of register RB32:36 used as the operand in the msgsnd instruction. The message type is interpreted as follows:

Value  Description

0  Doorbell Interrupt (DBELL)
    A Processor Doorbell exception is generated on the thread when the thread has filtered the message based on the payload and has determined that it should accept the message. A Processor Doorbell Interrupt occurs when no higher priority exception exists, a Processor Doorbell exception exists, and the interrupt is enabled (MSREE=1). If category Embedded.Hypervisor is supported, the interrupt is enabled if (MSREE=1 or MSRGS=1).

1  Doorbell Critical Interrupt (DBELL_CRIT)
    A Processor Doorbell Critical exception is generated on the thread when the thread has filtered the message based on the payload and has determined that it should accept the message. A Processor Doorbell Critical Interrupt occurs when no higher priority exception exists, a Processor Doorbell Critical exception exists, and the interrupt is enabled (MSRCE=1). If category Embedded.Hypervisor is supported, the interrupt is enabled if (MSRCE=1 or MSRGS=1).

2  Guest Doorbell Interrupt (G_DBELL)
    A Guest Processor Doorbell exception is generated on the thread when the thread has filtered the message based on the payload and has determined that it should accept the message. A Guest Processor Doorbell Interrupt occurs when no higher priority exception exists, a Guest Processor Doorbell exception exists, and the interrupt is enabled (MSREE=1 and MSRGS=1).

3  Guest Doorbell Interrupt Critical (G_DBELL_CRIT)
    A Guest Processor Doorbell Critical exception is generated on the thread when the thread has filtered the message based on the payload and has determined that it should accept the message. A Guest Processor Doorbell Critical Interrupt occurs when no higher priority exception exists, a Guest Processor Doorbell Critical exception exists, and the interrupt is enabled (MSRCE=1 and MSRGS=1).

4  Guest Doorbell Interrupt Machine Check (G_DBELL_MC)
    A Guest Processor Doorbell Machine Check exception is generated on the thread when the thread has filtered the message based on the payload and has determined that it should accept the message. A Guest Processor Doorbell Machine Check Interrupt occurs when no higher priority exception exists, a Guest Processor Doorbell Machine Check exception exists, and the interrupt is enabled (MSRME=1 and MSRGS=1).

Message types other than these and their associated actions are implementation-dependent.

11.2.2 Doorbell Message Filtering

A thread receiving a DBELL message will filter the message and either ignore the message or accept the message and generate a Processor Doorbell exception based on the payload and the state of the thread at the time the message is received.

The payload is specified in the message and is determined by the contents of register RB37:63 used as the operand in the msgsnd instruction. The payload bits are defined below.

Bit  Description

37  Broadcast (BRDCAST)
    If set, the message is accepted by all threads regardless of the value of the PIR register and the value of PIRTAG.
    0  If the value of PIR and PIRTAG are equal a Processor Doorbell exception is generated.
    1  A Processor Doorbell exception is generated regardless of the value of PIRTAG and PIR.

38:49  LPID Tag (LPIDTAG)
    The contents of this field are compared with the contents of the LPIDR. If LPIDTAG = 0, it matches all values in the LPIDR register.

50:63  PIR Tag (PIRTAG)
    The contents of this field are compared with bits 50:63 of the PIR register.

If category E.HV is supported by the thread on which a DBELL message is received, the message will only be accepted if it is for this partition (payloadLPIDTAG = LPIDR) or it is for all partitions (payloadLPIDTAG = 0) and it meets the additional criteria for acceptance below.

If a DBELL message is received by a thread, the message is accepted and a Processor Doorbell exception is generated if one of the following conditions exists:

    This is a broadcast message (payloadBRDCAST=1);
    The message is intended for this thread (PIR50:63=payloadPIRTAG).

The exception condition remains until a Processor Doorbell Interrupt is taken, or a msgclr instruction is executed on the receiving thread with a message type of DBELL. A change to any of the filtering criteria (e.g. changing the PIR register) will not clear a pending Processor Doorbell exception.

DBELL messages are not cumulative. That is, if a DBELL message is accepted and the interrupt is pended because MSREE=0, additional DBELL messages that would be accepted are ignored until the Processor Doorbell exception is cleared by taking the interrupt or cleared by executing a msgclr with a message type of DBELL on the receiving thread.

The temporal relationship between when a DBELL message is sent and when it is received in a given thread is not defined.
The temporal relationship between when a 11.2.2.1 Doorbell Critical Message Fil- DBELL_CRIT message is sent and when it is received tering in a given thread is not defined. A thread receiving a DBELL_CRIT message type will filter the message and either ignore the message or 11.2.2.2 Guest Doorbell Message Filter- accept the message and generate a Processor Door- ing [Category: Embedded.Hypervisor] bell Critical exception based on the payload and the A thread receiving a G_DBELL message type will filter state of the thread at the time the message is received. the message and either ignore the message or accept The payload is specified in the message and is deter- the message and generate a Guest Processor Doorbell mined by the contents of register RB37:63 used as the Critical exception based on the payload and the state of operand in the msgsnd instruction. The payload bits the thread at the time the message is received. are defined below. The payload is specified in the message and is deter- Bit Description mined by the contents of register RB37:63 used as the 37 Broadcast (BRDCAST) operand in the msgsnd instruction. The payload bits If set, the message is accepted by all threads are defined below. regardless of the value of the PIR register and Bit Description the value of PIRTAG. 37 Broadcast (BRDCAST) 0 If the value of PIR and PIRTAG are equal If set, the message is accepted by all threads a Processor Doorbell Critical exception is regardless of the value of the GPIR register generated. and the value of PIRTAG. 1 A Processor Doorbell Critical exception is 0 If the value of GPIR and PIRTAG are generated regardless of the value of equal a Guest Processor Doorbell excep- PIRTAG and PIR. tion is generated. 38:49 LPID Tag (LPIDTAG) 1 A Guest Processor Doorbell exception is The contents of this field are compared with generated regardless of the value of the contents of the LPIDR. If LPIDTAG = 0, it PIRTAG and GPIR. matches all values in the LPIDR register. 
38:49 LPID Tag (LPIDTAG) 50:63 PIR Tag (PIRTAG) The contents of this field are compared with The contents of this field are compared with the contents of the LPIDR. If LPIDTAG = 0, it bits 50:63 of the PIR register. matches all values in the LPIDR register. If category E.HV is supported by the thread on which a 50:63 PIR Tag (PIRTAG) DBELL_CRIT message is received, the message will The contents of this field are compared with only be accepted if it is for this partition (payloadLPID- bits 50:63 of the GPIR register. TAG = LPIDR) or it is for all partitions (payloadLPIDTAG = When a G_DBELL message is received by a thread, 0) and it meets the additional criteria for acceptance the message will only be accepted if it is for this parti- below. tion (payloadLPIDTAG = LPIDR) or it is for all partitions If a DBELL_CRIT message is received by a thread, the (payloadLPIDTAG = 0) and it meets the additional criteria message is accepted and a Processor Doorbell Critical for acceptance below. exception is generated if one of the following conditions The message is accepted and a Guest Processor exist: Doorbell exception is generated if one of the following This is a broadcast message (payloadBRDCAST=1); conditions exist: The message is intended for this thread This is a broadcast message (payloadBRDCAST=1); (PIR50:63=payloadPIRTAG). The message is intended for this thread DBELL_CRIT messages are not cumulative. That is, if (GPIR50:63=payloadPIRTAG). a DBELL_CRIT message is accepted and the interrupt G_DBELL messages are not cumulative. 
That is, if a is pended because MSRCE=0, additional DBELL_CRIT G_DBELL message is accepted and the interrupt is messages that would be accepted are ignored until the pended because MSRCE=0, additional G_DBELL mes- Processor Doorbell Critical exception is cleared by tak- sages that would be accepted are ignored until the ing the interrupt or cleared by executing a msgclr with Guest Processor Doorbell exception is cleared by tak- a message type of DBELL_CRIT on the receiving ing the interrupt or cleared by executing a msgclr with thread. a message type of G_DBELL on the receiving thread. Chapter 11. Processor Control [Category: Embedded.Processor Control] 1095 Version 2.06 The temporal relationship between when a G_DBELL by executing a msgclr with a message type of message is sent and when it is received in a given G_DBELL_CRIT on the receiving thread. thread is not defined. The temporal relationship between when a G_DBELL_CRIT message is sent and when it is 11.2.2.3 Guest Doorbell Critical Mes- received in a given thread is not defined. sage Filtering [Category: Embed- ded.Hypervisor] 11.2.2.4 Guest Doorbell Machine Check A thread receiving a G_DBELL_CRIT message type Message Filtering [Category: Embed- will filter the message and either ignore the message or ded.Hypervisor] accept the message and generate a Guest Processor A thread receiving a G_DBELL_MC message type will Doorbell Critical exception based on the payload and filter the message and either ignore the message or the state of the thread at the time the message is accept the message and generate a Guest Processor received. Doorbell Machine Check exception based on the pay- The payload is specified in the message and is deter- load and the state of the thread at the time the mes- mined by the contents of register RB37:63 used as the sage is received. operand in the msgsnd instruction. The payload bits The payload is specified in the message and is deter- are defined below. 
mined by the contents of register RB37:63 used as the Bit Description operand in the msgsnd instruction. The payload bits 37 Broadcast (BRDCAST) are defined below. If set, the message is accepted by all threads Bit Description regardless of the value of the GPIR register 37 Broadcast (BRDCAST) and the value of PIRTAG. If set, the message is accepted by all threads 0 If the value of GPIR and PIRTAG are regardless of the value of the GPIR register equal a Guest Processor Doorbell Critical and the value of PIRTAG. exception is generated. 0 If the value of GPIR and PIRTAG are 1 A Guest Processor Doorbell Critical equal a Guest Processor Doorbell exception is generated regardless of the Machine Check exception is generated. value of PIRTAG and GPIR. 1 A Guest Processor Doorbell Machine 38:49 LPID Tag (LPIDTAG) Check exception is generated regardless The contents of this field are compared with of the value of PIRTAG and GPIR. the contents of the LPIDR. If LPIDTAG = 0, it 38:49 LPID Tag (LPIDTAG) matches all values in the LPIDR register. The contents of this field are compared with 50:63 PIR Tag (PIRTAG) the contents of the LPIDR. If LPIDTAG = 0, it The contents of this field are compared with matches all values in the LPIDR register. bits 50:63 of the GPIR register. 50:63 PIR Tag (PIRTAG) When a G_DBELL_CRIT message is received by a The contents of this field are compared with thread, the message will only be accepted if it is for this bits 50:63 of the GPIR register. partition (payloadLPIDTAG = LPIDR) or it is for all parti- When a G_DBELL_MC message is received by a tions (payloadLPIDTAG = 0) and it meets the additional thread, the message will only be accepted if it is for this criteria for acceptance below. partition (payloadLPIDTAG = LPIDR) or it is for all parti- If a G_DBELL_CRIT message is received by a thread, tions (payloadLPIDTAG = 0) and it meets the additional the message is accepted and a Guest Processor Door- criteria for acceptance below. 
bell Critical exception is generated if one of the follow- If a G_DBELL_MC message is received by a thread, ing conditions exist: the message is accepted and a Guest Processor Door- This is a broadcast message (payloadBRDCAST=1); bell Machine Check exception is generated if one of the The message is intended for this thread following conditions exist: (GPIR50:63=payloadPIRTAG). This is a broadcast message (payloadBRDCAST=1); G_DBELL_CRIT messages are not cumulative. That is, The message is intended for this thread if a G_DBELL_CRIT message is accepted and the (GPIR50:63=payloadPIRTAG). interrupt is pended because MSRCE=0, additional G_DBELL_MC messages are not cumulative. That is, if G_DBELL messages that would be accepted are a G_DBELL_MC message is accepted and the inter- ignored until the Guest Processor Doorbell Critical rupt is pended because MSRCE=0, additional exception is cleared by taking the interrupt or cleared G_DBELL_MC messages that would be accepted are 1096 Power ISATM Book III-E Version 2.06 ignored until the Guest Processor Doorbell Machine Check exception is cleared by taking the interrupt or cleared by executing a msgclr with a message type of G_DBELL_MC on the receiving thread. The temporal relationship between when a G_DBELL_MC message is sent and when it is received in a given thread is not defined. Chapter 11. Processor Control [Category: Embedded.Processor Control] 1097 Version 2.06 11.3 Processor Control Instructions msgsnd and msgclr instructions are provided for In the instruction descriptions the statement "this sending and clearing messages to threads and other instructions is treated as a Store" means that the devices in the coherence domain. These instructions instruction is treated as a Store with respect to the stor- are privileged. age access ordering mechanism caused by memory barriers in Section 1.7.1 of Book II. 
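The field layout and filtering rules of Section 11.2 can be illustrated with a small software model. The following C fragment is only a sketch under stated assumptions, not part of the architecture: the names doorbell_operand, doorbell_receive, doorbell_clear and struct thread_model are hypothetical, bit 63 is taken to be the least significant bit of the 64-bit operand, LPIDR is assumed to fit in the 12-bit LPIDTAG field, and implementation-dependent message types are simply ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Message type values from Section 11.2.1. */
enum msgtype { DBELL = 0, DBELL_CRIT = 1, G_DBELL = 2,
               G_DBELL_CRIT = 3, G_DBELL_MC = 4 };

/* Build the msgsnd/msgclr operand: msgtype in RB[32:36], BRDCAST in
 * RB[37], LPIDTAG in RB[38:49], PIRTAG in RB[50:63].  With bit 63 as
 * the least significant bit, these fields start at bit positions
 * 27, 26, 14 and 0 counted from the right. */
static uint64_t doorbell_operand(unsigned msgtype, bool brdcast,
                                 unsigned lpidtag, unsigned pirtag)
{
    assert(msgtype <= 0x1F && lpidtag <= 0xFFF && pirtag <= 0x3FFF);
    return ((uint64_t)msgtype << 27) | ((uint64_t)brdcast << 26) |
           ((uint64_t)lpidtag << 14) | (uint64_t)pirtag;
}

/* Hypothetical model of the receiving thread's relevant state. */
struct thread_model {
    bool     ehv;         /* category E.HV supported                  */
    unsigned lpidr;       /* LPIDR contents (assumed to fit 12 bits)  */
    unsigned pir;         /* PIR[50:63]                               */
    unsigned gpir;        /* GPIR[50:63]                              */
    bool     pending[5];  /* one pending exception per message type   */
};

/* Apply the acceptance rules.  Returns true if the message passes the
 * filter.  The pending exception is a single flag per type, so a second
 * accepted message of the same type adds nothing (not cumulative). */
static bool doorbell_receive(struct thread_model *t, uint64_t rb)
{
    unsigned msgtype = (unsigned)((rb >> 27) & 0x1F);
    bool     brdcast = ((rb >> 26) & 1) != 0;
    unsigned lpidtag = (unsigned)((rb >> 14) & 0xFFF);
    unsigned pirtag  = (unsigned)(rb & 0x3FFF);

    if (msgtype > G_DBELL_MC)
        return false;                 /* other types are not modelled */

    bool guest = msgtype >= G_DBELL;

    /* Partition check: always for guest types, and for DBELL and
     * DBELL_CRIT only when E.HV is supported.  LPIDTAG = 0 matches all. */
    if ((guest || t->ehv) && lpidtag != 0 && lpidtag != t->lpidr)
        return false;

    /* Thread check: broadcast, or PIRTAG equals PIR (GPIR for guests). */
    unsigned id = (guest ? t->gpir : t->pir) & 0x3FFF;
    if (!brdcast && pirtag != id)
        return false;

    t->pending[msgtype] = true;
    return true;
}

/* Model of msgclr: clear the pending exception for the given type. */
static void doorbell_clear(struct thread_model *t, uint64_t rb)
{
    unsigned msgtype = (unsigned)((rb >> 27) & 0x1F);
    if (msgtype <= G_DBELL_MC)
        t->pending[msgtype] = false;
}
```

For example, doorbell_operand(DBELL, false, 0, 7) describes a DBELL message addressed to the thread whose PIR[50:63] is 7 in any partition. Whether the resulting interrupt is taken still depends on the MSR enable bits, which this sketch does not model.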
Message Send X-form

msgsnd RB

  31    ///   ///   RB    206    /
  0     6     11    16    21     31

msgtype ← GPR(RB)[32:36]
payload ← GPR(RB)[37:63]
send_msg_to_coherence_domain(msgtype, payload)

msgsnd sends a message to all devices in the coherence domain. The message contains a type and a payload. The message type (msgtype) is defined by the contents of RB[32:36] and the message payload is defined by the contents of RB[37:63]. Message delivery is reliable and guaranteed. Each device may perform specific actions based on the message type and payload or may ignore messages. Consult the implementation user's manual for specific actions taken based on message type and payload.

For threads, actions taken on receipt of a message are defined in Section 11.2.1.

For storage access ordering, msgsnd is treated as a Store with respect to memory barriers.

This instruction is hypervisor privileged.

Special Registers Altered:
None

Message Clear X-form

msgclr RB

  31    ///   ///   RB    238    /
  0     6     11    16    21     31

msgtype ← GPR(RB)[32:36]
clear_received_message(msgtype)

msgclr clears a message of msgtype previously accepted by the thread executing the msgclr. msgtype is defined by the contents of RB[32:36]. A message is said to be cleared when a pending exception generated by an accepted message has not yet taken its associated interrupt.

If a pending exception exists for msgtype, that exception is cleared at the completion of the msgclr instruction.

For threads, the types of messages that can be cleared are defined in Section 11.2.1.

This instruction is hypervisor privileged.

Special Registers Altered:
None

Programming Note
Execution of a msgclr instruction that clears a pending exception when the associated interrupt is masked because the interrupt enable (MSR[EE] or MSR[CE]) is not set to 1 will always clear the pending exception. The interrupt will therefore not occur if a subsequent instruction causes MSR[EE] or MSR[CE] to be set to 1.

Chapter 12. Synchronization Requirements for Context Alterations

Changing the contents of certain System Registers, the contents of TLB entries, or the contents of other system resources that control the context in which a program executes can have the side effect of altering the context in which data addresses and instruction addresses are interpreted, and in which instructions are executed and data accesses are performed. For example, changing certain bits in the MSR has the side effect of changing how instruction addresses are calculated. These side effects need not occur in program order, and therefore may require explicit synchronization by software. (Program order is defined in Book II.)

An instruction that alters the context in which data addresses or instruction addresses are interpreted, or in which instructions are executed or data accesses are performed, is called a context-altering instruction. This chapter covers all the context-altering instructions. The software synchronization required for them is shown in Table 13 (for data access) and Table 12 (for instruction fetch and execution).

The notation "CSI" in the tables means any context synchronizing instruction (e.g., sc, isync, rfi, rfci, rfmci, or rfdi [Category: Embedded.Enhanced Debug]). A context synchronizing interrupt (i.e., any interrupt except non-recoverable Machine Check) can be used instead of a context synchronizing instruction. If it is, phrases like "the synchronizing instruction", below, should be interpreted as meaning the instruction at which the interrupt occurs. If no software synchronization is required before (after) a context-altering instruction, "the synchronizing instruction before (after) the context-altering instruction" should be interpreted as meaning the context-altering instruction itself.

The synchronizing instruction before the context-altering instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alteration. The synchronizing instruction after the context-altering instruction ensures that all instructions after that synchronizing instruction are fetched and executed in the context established by the alteration. Instructions after the first synchronizing instruction, up to and including the second synchronizing instruction, may be fetched or executed in either context.

If a sequence of instructions contains context-altering instructions and contains no instructions that are affected by any of the context alterations, no software synchronization is required within the sequence.

Programming Note
Sometimes advantage can be taken of the fact that certain events, such as interrupts, and certain instructions that occur naturally in the program, such as an rfi, rfgi [Category: Embedded.Hypervisor], rfci, rfmci, or rfdi [Category: Embedded.Enhanced Debug] that returns from an interrupt handler, provide the required synchronization.

No software synchronization is required before or after a context-altering instruction that is also context synchronizing (e.g., rfi, etc.) or when altering the MSR in most cases (see the tables). No software synchronization is required before most of the other alterations shown in Table 12, because all instructions preceding the context-altering instruction are fetched and decoded before the context-altering instruction is executed (the hardware must determine whether any of these preceding instructions are context synchronizing).

Unless otherwise stated, the material in this chapter assumes a single-threaded environment.
Instruction or Event            Required Before   Required After          Notes
interrupt                       none              none
rfi                             none              none
rfci                            none              none
rfmci                           none              none
rfdi [Category: E.ED]           none              none
rfgi                            none              none
sc                              none              none
mtmsr (GS)                      none              CSI
mtspr (LPIDR)                   none              CSI                     2
mtspr (GIVPR)                   none              none
mtspr (DBSRWR)                  none              CSI
mtspr (EPCR)                    none              CSI
mtspr (GIVORi)                  none              none
mtmsr (CM)                      none              none
mtmsr (UCLE)                    none              none
mtmsr (SPV)                     none              none
mtmsr (CE)                      none              none                    4
mtmsr (EE)                      none              none                    4
mtmsr (PR)                      none              CSI
mtmsr (FP)                      none              CSI
mtmsr (DE)                      none              CSI
mtmsr (ME)                      none              CSI                     3
mtmsr (FE0)                     none              CSI
mtmsr (FE1)                     none              CSI
mtmsr (IS)                      none              CSI                     2
mtspr (DEC)                     none              none                    7
mtspr (PID)                     none              CSI                     2
mtspr (IVPR)                    none              none
mtspr (DBSR)                    --                --                      5
mtspr (DBCR0,DBCR1)             --                --                      5
mtspr (IAC1,IAC2,IAC3,IAC4)     --                --                      5
mtspr (IVORi)                   none              none
mtspr (TSR)                     none              none                    7
mtspr (TCR)                     none              none                    7
mtspr (MMUCSR0)                 none              CSI, or CSI and sync    6,9
  TLB invalidate all
mtspr (MCIVPR)                  none              none
tlbilx                          none              CSI, or CSI and sync    6
Store(PTE)                      none              {sync, CSI}             1,8
tlbivax                         none              CSI, or CSI and sync    1,6
tlbwe                           none              CSI, or CSI and sync    1,6
wrtee                           none              none                    4
wrteei                          none              none                    4

Table 12: Synchronization requirements for instruction fetch and/or execution

Instruction or Event            Required Before   Required After          Notes
interrupt                       none              none
rfi                             none              none
rfci                            none              none
rfmci                           none              none
rfdi [Category: E.ED]           none              none
rfgi                            none              none
sc                              none              none
mtmsr (GS)                      none              CSI
mtspr (LPIDR)                   CSI               CSI
mtmsr (CM)                      none              CSI
mtmsr (PR)                      none              CSI
mtmsr (ME)                      none              CSI                     3
mtmsr (DS)                      none              CSI
mtspr (PID)                     CSI               CSI
mtspr (DBSR)                    --                --                      5
mtspr (DBCR0,DBCR2)             --                --                      5
mtspr (DAC1,DAC2,DVC1,DVC2)     --                --                      5
mtspr (MMUCSR0)                 CSI               CSI, or CSI and sync    6,9
  TLB invalidate all
tlbilx                          CSI               CSI, or CSI and sync    6
Store(PTE)                      none              {sync, CSI}             1,8
tlbivax                         CSI               CSI, or CSI and sync    1,6
tlbwe                           CSI               CSI, or CSI and sync    1,6

Table 13: Synchronization requirements for data access

Notes:

1. There are additional software synchronization requirements for this instruction in multi-threaded environments (e.g., it may be necessary to invalidate one or more TLB entries on all threads in the system and to be able to determine that the invalidations have completed and that all side effects of the invalidations have taken effect); it is also necessary to execute a tlbsync instruction.

2. The alteration must not cause an implicit branch in real address space. Thus the real address of the context-altering instruction and of each subsequent instruction, up to and including the next context synchronizing instruction, must be independent of whether the alteration has taken effect.

3. A context synchronizing instruction is required after altering MSR[ME] to ensure that the alteration takes effect for subsequent Machine Check interrupts, which may not be recoverable and therefore may not be context synchronizing.

4. The effect of changing MSR[EE] or MSR[CE] is immediate.

   If an mtmsr, wrtee, or wrteei instruction sets MSR[EE] to `0', an External Input, DEC or FIT interrupt does not occur after the instruction is executed.

   If an mtmsr, wrtee, or wrteei instruction changes MSR[EE] from `0' to `1' when an External Input, Decrementer, Fixed-Interval Timer, or higher priority enabled exception exists, the corresponding interrupt occurs immediately after the mtmsr, wrtee, or wrteei is executed, and before the next instruction is executed in the program that set MSR[EE] to `1'.

   If an mtmsr instruction sets MSR[CE] to `0', a Critical Input or Watchdog Timer interrupt does not occur after the instruction is executed.

   If an mtmsr instruction changes MSR[CE] from `0' to `1' when a Critical Input, Watchdog Timer or higher priority enabled exception exists, the corresponding interrupt occurs immediately after the mtmsr is executed, and before the next instruction is executed in the program that set MSR[CE] to `1'.

5. Synchronization requirements for changing any of the Debug Facility Registers are implementation-dependent.

6. For data accesses, the context synchronizing instruction before the tlbwe, tlbilx, or tlbivax instruction ensures that all storage accesses due to preceding instructions have completed to a point at which they have reported all exceptions they will cause.

   The context synchronizing instruction after the tlbwe, tlbilx, or tlbivax ensures that subsequent storage accesses (data and instruction) will use the updated value in the TLB entry(s) being affected. It does not ensure that all storage accesses previously translated by the TLB entry(s) being updated have completed with respect to storage; if these completions must be ensured, the tlbwe, tlbilx, or tlbivax must be followed by a sync instruction as well as by a context synchronizing instruction.

   Programming Note
   The following sequence illustrates why it is necessary, for data accesses, to ensure that all storage accesses due to instructions before the tlbwe or tlbivax have completed to a point at which they have reported all exceptions they will cause. Assume that valid TLB entries exist for the target storage location when the sequence starts.

   1. A program issues a load or store to a page.
   2. The same program executes a tlbwe or tlbivax that invalidates the corresponding TLB entry.
   3. The Load or Store instruction finally executes, and gets a TLB Miss exception.

   The TLB Miss exception is semantically incorrect. In order to prevent it, a context synchronizing instruction must be executed between steps 1 and 2.

7. The elapsed time between the Decrementer reaching zero, or the transition of the selected Time Base bit for the Fixed-Interval Timer or the Watchdog Timer, and the signalling of the Decrementer, Fixed-Interval Timer or the Watchdog Timer exception is not defined.

8. The notation "{sync, CSI}" denotes an instruction sequence. Other instructions may be interleaved with this sequence, but these instructions must appear in the order shown.

   No software synchronization is required before the Store instruction because (a) stores are not performed out-of-order and (b) address translations associated with instructions preceding the Store instruction are not performed again after the store has been performed (see Section 5.5). These properties ensure that all address translations associated with instructions preceding the Store instruction will be performed using the old contents of the PTE.

   The sync instruction after the Store instruction ensures that all lookups of the Page Table that are performed after the sync instruction completes will use the value stored (or a value stored subsequently). The context synchronizing instruction after the sync instruction ensures that any address translations associated with instructions following the context synchronizing instruction that were performed using the old contents of the PTE will be discarded, with the result that these address translations will be performed again and, if there is no corresponding entry in any implementation-specific address translation lookaside information, will use the value stored (or a value stored subsequently).

   The sync instruction also ensures that all storage accesses associated with instructions preceding the sync instruction, before the sync instruction is executed, will be performed with respect to any thread or mechanism, to the extent required by the associated Memory Coherence Required or Alternate Coherence Mode attributes, before any data accesses caused by instructions following the sync instruction are performed with respect to that thread or mechanism.

9. After executing a mtspr that sets one of the TLB invalidate all bits in the MMUCSR0 to a 1, software must read MMUCSR0 using a mfspr instruction until the corresponding bit is zero and then perform the CSI, or CSI and sync as indicated in the "Required After" column.

Appendix A. Implementation-Dependent Instructions

This appendix documents architectural resources that are allocated for specific implementation-sensitive functions which have scope-limited utility. Implementations may exercise reasonable flexibility in implementing these functions, but that flexibility should be limited to that allowed in this appendix.

A.1 Embedded Cache Initialization [Category: Embedded.Cache Initialization]

Data Cache Invalidate X-form

dci CT

  31    /    CT    ///   ///   454    /
  0     6    7     11    16    21     31

If CT is not supported by the implementation, this instruction designates the primary data cache as the target data cache.

If CT is supported by the implementation, let CT designate either the primary data cache or another level of the data cache hierarchy, as specified in Book II Section 3.2, as the target data cache.

The contents of the target data cache of the thread executing the dci instruction are invalidated.

Software must place a sync instruction before the dci to guarantee all previous data storage accesses complete before the dci is performed.

Software must place a sync instruction after the dci to guarantee that the dci completes before any subsequent data storage accesses are performed.

This instruction is hypervisor privileged.

Special Registers Altered:
None

Extended Mnemonics:
Extended mnemonic for Data Cache Invalidate

Extended:     Equivalent to:
dccci         dci 0

Instruction Cache Invalidate X-form

ici CT

  31    /    CT    ///   ///   966    /
  0     6    7     11    16    21     31

If CT is not supported by the implementation, this instruction designates the primary instruction cache as the target instruction cache.

If CT is supported by the implementation, let CT designate either the primary instruction cache or another level of the instruction cache hierarchy, as specified in Book II Section 3.2, as the target instruction cache.

The contents of the target instruction cache of the thread executing the ici instruction are invalidated.

Software must place a sync instruction before the ici to guarantee all previous instruction storage accesses complete before the ici is performed.

Software must place an isync instruction after the ici to invalidate any instructions that may have already been fetched from the previous contents of the instruction cache after the isync.

This instruction is hypervisor privileged.

Special Registers Altered:
None

Extended Mnemonics:
Extended mnemonic for Instruction Cache Invalidate

Extended:     Equivalent to:
iccci         ici 0
Implementation-Dependent Instructions 1103 Version 2.06 A.2 Embedded Cache Debug Facility [Category: Embedded.Cache Debug] A.2.1 Embedded Cache Debug Registers A.2.1.1 Data Cache Debug Tag Regis- A.2.1.2 Data Cache Debug Tag Regis- ter High ter Low The Data Cache Debug Tag Register High (DCDB- The Data Cache Debug Tag Register Low (DCDBTRL) TRH) is a 32-bit Special Purpose Register. The Data is a 32-bit Special Purpose Register. The Data Cache Cache Debug Tag Register High is read using mfspr Debug Tag Register Low is read using mfspr and is set and is set by dcread. by dcread. DCDBTRH DCDBTRL 32 63 32 63 Figure 70. Data Cache Debug Tag Register High Figure 71. Data Cache Debug Tag Register Low Programming Note Programming Note An example implementation of DCDBTRH could An example implementation of DCDBTRL could have the following content and format. have the following content and format. Bit(s) Description Bit(s) Description 32:55 Tag Real Address (TRA) 32:44 Reserved (TRA) Bits 0:23 of the lower 32 bits of the 36-bit 45 U bit parity (UPAR) real address associated with this cache block 46:47 Tag parity (TPAR) 56 Valid (V) 48:51 Data parity (DPAR) The valid indicator for the cache block (1 52:55 Modified (dirty) parity (MPAR) indicates valid) 56:59 Dirty Indicators (D) 57:59 Reserved The "dirty" (modified) indicators for each 60:63 Tag Extended Real Address (TERA) of the four doublewords in the cache block Upper 4 bits of the 36-bit real address 60 U0 Storage Attribute (U0) associated with this cache block The U0 storage attribute for the page Implementations may support different content and associated with this cache block format based on their cache implementation. 61 U1 Storage Attribute (U1) The U1 storage attribute for the page This register is hypervisor privileged. 
associated with this cache block
62      U2 Storage Attribute (U2)
        The U2 storage attribute for the page associated with this cache block
63      U3 Storage Attribute (U3)
        The U3 storage attribute for the page associated with this cache block

Implementations may support different content and format based on their cache implementation.

This register is hypervisor privileged.

A.2.1.3 Instruction Cache Debug Data Register

The Instruction Cache Debug Data Register (ICDBDR) is a read-only 32-bit Special Purpose Register. The Instruction Cache Debug Data Register can be read using mfspr and is set by icread.

    | ICDBDR |
    32      63

Figure 72. Instruction Cache Debug Data Register

This register is hypervisor privileged.

A.2.1.4 Instruction Cache Debug Tag Register High

The Instruction Cache Debug Tag Register High (ICDBTRH) is a 32-bit Special Purpose Register. The Instruction Cache Debug Tag Register High is read using mfspr and is set by icread.

    | ICDBTRH |
    32       63

Figure 73. Instruction Cache Debug Tag Register High

Programming Note
An example implementation of ICDBTRH could have the following content and format.

Bit(s)  Description
32:55   Tag Effective Address (TEA)
        Bits 0:23 of the 32-bit effective address associated with this cache block
56      Valid (V)
        The valid indicator for the cache block (1 indicates valid)
57:58   Tag parity (TPAR)
59      Instruction Data parity (DPAR)
60:63   Reserved

Implementations may support different content and format based on their cache implementation.

This register is hypervisor privileged.

A.2.1.5 Instruction Cache Debug Tag Register Low

The Instruction Cache Debug Tag Register Low (ICDBTRL) is a 32-bit Special Purpose Register. The Instruction Cache Debug Tag Register Low is read using mfspr and is set by icread.

    | ICDBTRL |
    32       63

Figure 74. Instruction Cache Debug Tag Register Low

Programming Note
An example implementation of ICDBTRL could have the following content and format.

Bit(s)  Description
32:53   Reserved
54      Translation Space (TS)
        The address space portion of the virtual address associated with this cache block.
55      Translation ID Disable (TD)
        TID Disable field for the memory page associated with this cache block
56:63   Translation ID (TID)
        TID field portion of the virtual address associated with this cache block

Other implementations may support different content and format based on their cache implementation.

This register is hypervisor privileged.

A.2.2 Embedded Cache Debug Instructions

Data Cache Read X-form

dcread RT,RA,RB

    |  31  |  RT  |  RA  |  RB  |  486  | / |
    0      6      11     16     21      31

[Alternative Encoding]

    |  31  |  RT  |  RA  |  RB  |  326  | / |
    0      6      11     16     21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    C ← log2(cache size)
    B ← log2(cache block size)
    IDX ← EA_{64-C:63-B}
    WD ← EA_{64-B:61}
    RT_{0:31} ← undefined
    RT_{32:63} ← (data cache data)[IDX]_{WD×32:WD×32+31}
    DCDBTRH ← (data cache tag high)[IDX]
    DCDBTRL ← (data cache tag low)[IDX]

Let the effective address (EA) be the sum of the contents of register RA, or 0 if RA is equal to 0, and the contents of register RB.

Let C = log2(cache size in bytes).

Let B = log2(cache block size in bytes).

EA_{64-C:63-B} selects one of the 2^{C-B} data cache blocks.

EA_{64-B:61} selects one of the data words in the selected data cache block.

The selected word in the selected data cache block is placed into register RT.

The contents of the data cache directory entry associated with the selected data cache block are placed into DCDBTRH and DCDBTRL (see Figure 70 and Figure 71).

dcread requires software to guarantee execution synchronization before subsequent mfspr instructions can read the results of the dcread instruction into GPRs. In order to guarantee that the mfspr instructions obtain the results of the dcread instruction, a sequence such as the following must be used:

    sync                     # ensure that all previous cache operations have completed
    dcread regT,regA,regB    # read cache information;
    isync                    # ensure dcread completes before attempting to read results
    mfspr  regD,dcdbtrh      # move high portion of tag into GPR D
    mfspr  regE,dcdbtrl      # move low portion of tag into GPR E

This instruction is hypervisor privileged.

Special Registers Altered:
    DCDBTRH DCDBTRL

Programming Note
dcread can be used by a debug tool to determine the contents of the data cache, without knowing the specific addresses of the blocks which are currently contained within the cache.

Programming Note
Execution of dcread before the data cache has completed all cache operations associated with previously executed instructions (such as block fills and block flushes) is undefined.

Instruction Cache Read X-form

icread RA,RB

    |  31  |  ///  |  RA  |  RB  |  998  | / |
    0      6       11     16     21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    C ← log2(cache size)
    B ← log2(cache block size)
    IDX ← EA_{64-C:63-B}
    WD ← EA_{64-B:61}
    ICDBDR ← (instruction cache data)[IDX]_{WD×32:WD×32+31}
    ICDBTRH ← (instruction cache tag high)[IDX]
    ICDBTRL ← (instruction cache tag low)[IDX]

Let the effective address (EA) be the sum of the contents of register RA, or 0 if RA is equal to 0, and the contents of register RB.

Let C = log2(cache size in bytes).

Let B = log2(cache block size in bytes).

EA_{64-C:63-B} selects one of the 2^{C-B} instruction cache blocks.

EA_{64-B:61} selects one of the data words in the selected instruction cache block.

The selected word in the selected instruction cache block is placed into ICDBDR.

The contents of the instruction cache directory entry associated with the selected cache block are placed into ICDBTRH and ICDBTRL (see Figure 73 and Figure 74).

Programming Note
icread can be used by a debug tool to determine the contents of the instruction cache, without knowing the specific addresses of the blocks which are currently contained within the cache.
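The block and word selection that dcread and icread perform on the effective address can be modeled in a few lines. The following is only an illustrative sketch, not part of the architecture: the function name cache_read_index and its parameters are hypothetical, and it assumes that the cache size and block size are powers of two and that a block holds at least one 32-bit word. It uses the document's bit numbering, in which bit 0 is the most significant bit of the 64-bit EA.

```python
def cache_read_index(ea, cache_size, block_size):
    """Return (IDX, WD) as selected from a 64-bit EA by dcread/icread.

    Hypothetical helper for illustration; sizes are in bytes.
    """
    for size in (cache_size, block_size):
        if size <= 0 or size & (size - 1):
            raise ValueError("sizes must be powers of two")
    if block_size < 4 or block_size > cache_size:
        raise ValueError("need 4 <= block_size <= cache_size")

    c = cache_size.bit_length() - 1   # C = log2(cache size in bytes)
    b = block_size.bit_length() - 1   # B = log2(cache block size in bytes)

    # EA bits 64-C:63-B are the C-B bits directly above the block offset.
    idx = (ea >> b) & ((1 << (c - b)) - 1)
    # EA bits 64-B:61 are the B-2 bits above the two byte-in-word bits.
    wd = (ea >> 2) & ((1 << (b - 2)) - 1)
    return idx, wd


# Example with an assumed 32 KB cache and 32-byte blocks.
idx, wd = cache_read_index(0x1234, 32 * 1024, 32)
print(idx, wd)
```

A debug tool would typically iterate IDX over all 2^{C-B} blocks and WD over all words in a block, issuing one dcread or icread per combination.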
icread requires software to guarantee execution synchronization before subsequent mfspr instructions can read the results of the icread instruction into GPRs. In order to guarantee that the mfspr instructions obtain the results of the icread instruction, a sequence such as the following must be used:

    icread    regA,regB    # read cache information
    isync                  # ensure icread completes before attempting to read results
    mficdbdr  regC         # move instruction information into GPR C
    mficdbtrh regD         # move high portion of tag into GPR D
    mficdbtrl regE         # move low portion of tag into GPR E

This instruction is hypervisor privileged.

Special Registers Altered:
    ICDBDR ICDBTRH ICDBTRL

Appendix B. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided for certain instructions. This appendix defines extended mnemonics and symbols related to instructions defined in Book III.

Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

B.1 Move To/From Special Purpose Register Mnemonics

This section defines extended mnemonics for the mtspr and mfspr instructions, including the Special Purpose Registers (SPRs) defined in Book I and certain privileged SPRs, and for the Move From Time Base instruction defined in Book II.

The mtspr and mfspr instructions specify an SPR as a numeric operand; extended mnemonics are provided that represent the SPR in the mnemonic rather than requiring it to be coded as an operand. Similar extended mnemonics are provided for the Move From Time Base instruction, which specifies the portion of the Time Base as a numeric operand.

Note: mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB).

Table 14: Extended mnemonics for moving to/from an SPR

                                  Move To SPR                     Move From SPR
Special Purpose Register          Extended      Equivalent to     Extended      Equivalent to
Fixed-Point Exception Register    mtxer Rx      mtspr 1,Rx        mfxer Rx      mfspr Rx,1
Link Register                     mtlr Rx       mtspr 8,Rx        mflr Rx       mfspr Rx,8
Count Register                    mtctr Rx      mtspr 9,Rx        mfctr Rx      mfspr Rx,9
Decrementer                       mtdec Rx      mtspr 22,Rx       mfdec Rx      mfspr Rx,22
Save/Restore Register 0           mtsrr0 Rx     mtspr 26,Rx       mfsrr0 Rx     mfspr Rx,26
Save/Restore Register 1           mtsrr1 Rx     mtspr 27,Rx       mfsrr1 Rx     mfspr Rx,27
Special Purpose Registers         mtsprg n,Rx   mtspr 272+n,Rx    mfsprg Rx,n   mfspr Rx,272+n
  G0 through G3
Time Base [Lower]                 mttbl Rx      mtspr 284,Rx      mftb Rx       mftb Rx,268 [1]
                                                                                mfspr Rx,268
Time Base Upper                   mttbu Rx      mtspr 285,Rx      mftbu Rx      mftb Rx,269 [1]
                                                                                mfspr Rx,269
PPR32                             mtppr32 Rx    mtspr 898,Rx      mfppr32 Rx    mfspr Rx,898 [2]
Processor Version Register        -             -                 mfpvr Rx      mfspr Rx,287

[1] The mftb instruction is Category: Phased-Out. Assemblers targeting version 2.03 or later of the architecture should generate an mfspr instruction for the mftb and mftbu extended mnemonics; see the corresponding Assembler Note in the mftb instruction description (see Section 5.2.1 of Book II).
[2] Category: Phased-In

Appendix C. Guidelines for 64-bit Implementations in 32-bit Mode and 32-bit Implementations

C.1 Hardware Guidelines

C.1.1 64-bit Specific Instructions

The instructions in the Category: 64-Bit are considered restricted only to 64-bit processing. A 32-bit implementation need not implement the group; likewise, the 32-bit applications will not utilize any of these instructions. All other instructions shall either be supported directly by the implementation, or sufficient infrastructure will be provided to enable software emulation of the instructions. A 64-bit implementation that is executing in 32-bit mode may choose to take an Unimplemented Instruction Exception when these 64-bit specific instructions are executed.

C.1.2 Registers on 32-bit Implementations

The Power ISA provides 32-bit and 64-bit registers. All 32-bit registers shall be supported as defined in the specification except the MSR and EPCR. The MSR shall be supported as defined in the specification except that CM is treated as a reserved bit. EPCR shall be supported as defined in the specification except that ICM and GICM are treated as reserved bits. Only bits 32:63 of the 64-bit registers are required to be implemented in hardware in a 32-bit implementation except for the 64-bit FPRs. Such 64-bit registers include the LR, the CTR, the XER, the 32 GPRs, SRR0, CSRR0, DSRR0, MCSRR0, MAS2, and GSRR0.

For additional information, see Section 1.5.2 of Book I.

C.1.3 Addressing on 32-bit Implementations

Only bits 32:63 of the 64-bit instruction and data storage effective addresses need to be calculated and presented to main storage. Given that the only branch and data storage access instructions that are not included in Section C.1.1 are defined to prepend 32 0s to bits 32:63 of the effective address computation, a 32-bit implementation can simply bypass the prepending of the 32 0s when implementing these instructions. For Branch to Link Register and Branch to Count Register instructions, given the LR and CTR are implemented only as 32-bit registers, only concatenating 2 0s to the right of bits 32:61 of these registers is necessary to form the 32-bit branch target address.

For next sequential instruction address computation, the behavior is the same as for 64-bit implementations in 32-bit mode.

C.1.4 TLB Fields on 32-bit Implementations

32-bit implementations should support bits 32:53 of the Effective Page Number (EPN) field in the TLB. This size provides support for a 32-bit effective address, which Power ISA ABIs may have come to expect to be available. 32-bit implementations may support greater than 32-bit real addresses by supporting more than bits 32:53 of the Real Page Number (RPN) field in the TLB.

C.2 32-bit Software Guidelines

C.2.1 32-bit Instruction Selection

Any software that uses any of the instructions listed in Category: 64-Bit shall be considered 64-bit software, and correct execution cannot be guaranteed on 32-bit implementations. Generally speaking, 32-bit software should avoid using any instruction or instructions that depend on any particular setting of bits 0:31 of any 64-bit application-accessible system register, including General Purpose Registers, for producing the correct 32-bit results. Context switching may or may not preserve the upper 32 bits of application-accessible 64-bit system registers, and insertion of arbitrary settings of those upper 32 bits at arbitrary times during the execution of the 32-bit application must not affect the final result.

Appendix D. Example Performance Monitor [Category: Embedded.Performance Monitor]

D.1 Overview

This appendix describes an example of a Performance Monitor facility. It defines an architecture suitable for performance monitoring facilities in the Embedded environment. The architecture itself presents only programming model visible features in conjunction with architecturally defined behavioral features. Much of the selection of events is by necessity implementation-dependent and is not described as part of the architecture; however, this document provides guidelines for some features of a performance monitor implementation that should be followed by all implementations.

The example Performance Monitor facility provides the ability to monitor and count predefined events such as clocks, misses in the instruction cache or data cache, types of instructions decoded, or mispredicted branches. The count of such events can be used to trigger the Performance Monitor exception. While most of the specific events are not architected, the mechanism of controlling data collection is.

The example Performance Monitor facility can be used to do the following:

- Improve system performance by monitoring software execution and then recoding algorithms for more efficiency. For example, memory hierarchy behavior can be monitored and analyzed to optimize task scheduling or data distribution algorithms.
- Characterize performance in environments not easily characterized by benchmarking.
- Help system developers bring up and debug their systems.

D.2 Programming Model

The example Performance Monitor facility defines a set of Performance Monitor Registers (PMRs) that are used to collect and control performance data collection and an interrupt to allow intervention by software. The PMRs provide various controls and access to collected data. They are categorized as follows:

- Counter registers. These registers are used for data collection. Occurrences of selected events are counted here. These registers are named PMC0..15. User and supervisor level access to these registers is through different PMR numbers allowing different access rights.
- Global controls. This register controls global settings of the Performance Monitor facility and affects all counters. This register is named PMGC0. User and supervisor level access to this register is through different PMR numbers allowing different access rights. In addition, a bit in the MSR (MSR_PMM) is defined to enable/disable counting.
- Local controls. These registers control settings that apply only to a particular counter. These registers are named PMLCa0..15 and PMLCb0..15. User and supervisor level access to these registers is through different PMR numbers allowing different access rights. Each set of local control registers (PMLCan and PMLCbn) contains controls that apply to the associated same numbered counter register (e.g. PMLCa0 and PMLCb0 contain controls for PMC0 while PMLCa1 and PMLCb1 contain controls for PMC1).

Assembler Note
The counter registers, global controls, and local controls have alias names which cause the assembler to use different PMR numbers. The names PMC0...15, PMGC0, PMLCa0...15, and PMLCb0...15 cause the assembler to use the supervisor level PMR number, and the names UPMC0...15, UPMGC0, UPMLCa0...15, and UPMLCb0...15 cause the assembler to use the user-level PMR number.

A given implementation may implement fewer counter registers (and their associated control registers) than are architected. Architected counter and counter control registers that are not implemented behave the same as unarchitected Performance Monitor Registers. PMRs are described in Section D.3.

Software uses the global and local controls to select which events are counted in the counter registers, when such events should be counted, and what action should be taken when a counter overflows. Software can use the collected information to determine performance attributes of a given segment of code, a process, or the entire software system. PMRs can be read by software using the mfpmr instruction and PMRs can be written by using the mtpmr instruction. Both instructions are described in Section D.4.

Since counters are defined as 32-bit registers, it is possible for the counting of some events to overflow. A Performance Monitor interrupt is provided that can be programmed to occur in the event of a counter overflow. The Performance Monitor interrupt is described in detail in Section D.2.5 and Section D.2.6.

D.2.1 Event Counting

Event counting can be configured in several different ways. This section describes configurability and specific unconditional counting modes.

Two unconditional counting modes may be specified:

- Counting is unconditionally enabled regardless of the states of MSR_PMM and MSR_PR. This can be accomplished by setting PMLCan_FCS, PMLCan_FCU, PMLCan_FCM1, and PMLCan_FCM0 to 0 for each counter control.
- Counting is unconditionally disabled regardless of the states of MSR_PMM and MSR_PR. This can be accomplished by setting PMGC0_FAC to 1 or by setting PMLCan_FC to 1 for each counter control. Alternatively, this can be accomplished by setting PMLCan_FCM1 to 1 and PMLCan_FCM0 to 1 for each counter control or by setting PMLCan_FCS to 1 and PMLCan_FCU to 1 for each counter control.

Programming Note
Events may be counted in a fuzzy manner. That is, events may not be counted precisely due to the nature of an implementation. Users of the Performance Monitor facility should be aware that an event may be counted even if it was precisely filtered, though it should not have been. In general such discrepancies are statistically unimportant and users should not assume that counts are explicitly accurate.

D.2.2 Thread Context Configurability

Counting can be enabled if conditions in the thread state match a software-specified condition. Because a software task scheduler may switch a thread's execution among multiple processes and because statistics on only a particular process may be of interest, a facility is provided to mark a process. The Performance Monitor mark bit, MSR_PMM, is used for this purpose. System software may set this bit to 1 when a marked process is running. This enables statistics to be gathered only during the execution of the marked process. The states of MSR_PR and MSR_PMM together define a state that the thread (supervisor or user) and the process (marked or unmarked) may be in at any time. If this state matches an individual state specified by the PMLCan_FCS, PMLCan_FCU, PMLCan_FCM1 and PMLCan_FCM0 fields in PMLCan (the state for which monitoring is enabled), counting is enabled for PMCn.

Each event, on an implementation basis, may count regardless of the value of MSR_PMM. The counting behavior of each event should be documented in the User's Manual.

The thread states and the settings of the PMLCan_FCS, PMLCan_FCU, PMLCan_FCM1 and PMLCan_FCM0 fields in PMLCan necessary to enable monitoring of each thread state are shown in Figure 75.

Thread State                  FCS   FCU   FCM1  FCM0
Marked                        0     0     0     1
Not marked                    0     0     1     0
Supervisor                    0     1     0     0
User                          1     0     0     0
Marked and supervisor         0     1     0     1
Marked and user               1     0     0     1
Not marked and supervisor     0     1     1     0
Not marked and user           1     0     1     0
All                           0     0     0     0
None                          X     X     1     1
None                          1     1     X     X

Figure 75. Thread States and PMLCan Bit Settings

D.2.3 Event Selection

Events to count are determined by placing an implementation defined event value into the PMLCa0..15_EVENT field. Which events may be programmed into which counter is implementation specific and should be defined in the User's Manual. In general, most events may be programmed into any of the implementation available counters. Programming a counter with an event that is not supported for that counter gives boundedly undefined results.

Programming Note
Event names and event numbers will differ greatly across implementations and software should not expect that events and event names will be consistent.

Programming Note
When taking a Performance Monitor interrupt, software should clear the overflow condition by reading the counter register and setting the counter register to a non-overflow value since the normal return from the interrupt will set MSR_EE back to 1.
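The thread-state filtering of Section D.2.2 and Figure 75 can be expressed as a small predicate. The sketch below is illustrative only: the function name counting_enabled and its argument names are hypothetical, the freeze-bit meanings are taken from the PMLCa and PMGC0 bit descriptions in Section D.3, and it ignores the allowance that an implementation may count some events regardless of MSR_PMM.

```python
def counting_enabled(pr, pmm, fcs=0, fcu=0, fcm1=0, fcm0=0, fc=0, fac=0):
    """Return True if PMCn may increment in thread state (MSR_PR, MSR_PMM).

    Hypothetical model: each argument is a single bit (0 or 1).
    """
    if fac or fc:            # PMGC0_FAC or PMLCan_FC freezes unconditionally
        return False
    if fcs and pr == 0:      # frozen in supervisor state
        return False
    if fcu and pr == 1:      # frozen in user state
        return False
    if fcm1 and pmm == 1:    # frozen while the mark bit is set
        return False
    if fcm0 and pmm == 0:    # frozen while the mark bit is cleared
        return False
    return True


# "Marked and user" row of Figure 75: FCS=1, FCU=0, FCM1=0, FCM0=1.
for pr in (0, 1):
    for pmm in (0, 1):
        print(pr, pmm, counting_enabled(pr, pmm, fcs=1, fcm0=1))
```

With these settings only the state MSR_PR=1, MSR_PMM=1 reports True, which matches the intent of that row of the figure.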
D.2.4 Thresholds

Thresholds are values that must be exceeded for an event to be counted. Threshold values are programmed in the PMLCb0..15_THRESHOLD field. The events which may be thresholded and the units of each event that may be thresholded are implementation-dependent. Programming a threshold value for an event that is not defined to use a threshold gives boundedly undefined results.

D.2.5 Performance Monitor Exception

A Performance Monitor exception occurs when counter overflow detection is enabled and a counter overflows. More specifically, for each counter register n, if PMGC0_PMIE=1 and PMLCan_CE=1 and PMCn_OV=1 and MSR_EE=1, a Performance Monitor exception is said to exist. The Performance Monitor exception condition will cause a Performance Monitor interrupt if the exception is the highest priority exception.

The Performance Monitor exception is level sensitive and the exception condition may cease to exist if any of the required conditions fail to be met. Thus it is possible for a counter to overflow and continue counting events until PMCn_OV becomes 0 without taking a Performance Monitor interrupt if MSR_EE = 0 during the overflow condition. To avoid this, software should program the counters to freeze if an overflow condition is detected (see Section D.3.4).

D.2.6 Performance Monitor Interrupt

A Performance Monitor interrupt occurs when a Performance Monitor exception exists and no higher priority exception exists. When a Performance Monitor interrupt occurs, SRR0 and SRR1 record the current state of the NIA and the MSR and the MSR is set to handle the interrupt. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR_{0:51} || 0x260. Otherwise, instruction execution resumes at address IVPR_{0:47} || IVOR35_{48:59} || 0b0000.

The Performance Monitor interrupt is precise and asynchronous.

D.3 Performance Monitor Registers

D.3.1 Performance Monitor Global Control Register 0

The Performance Monitor Global Control Register 0 (PMGC0) controls all Performance Monitor counters.

    | PMGC0 |
    32     63

Figure 76. [User] Performance Monitor Global Control Register 0

These bits are interpreted as follows:

Bit     Description
32      Freeze All Counters (FAC)
        The FAC bit is sticky; that is, once set to 1 it remains set to 1 until it is set to 0 by an mtpmr instruction.
        0  The PMCs can be incremented (if enabled by other Performance Monitor control fields).
        1  The PMCs cannot be incremented.
33      Performance Monitor Interrupt Enable (PMIE)
        0  Performance Monitor interrupts are disabled.
        1  Performance Monitor interrupts are enabled and occur when an enabled condition or event occurs. Enabled conditions and events are described in Section D.2.5.
34      Freeze Counters on Enabled Condition or Event (FCECE)
        Enabled conditions and events are described in Section D.2.5.
        0  The PMCs can be incremented (if enabled by other Performance Monitor control fields).
        1  The PMCs can be incremented (if enabled by other Performance Monitor control fields) only until an enabled condition or event occurs. When an enabled condition or event occurs, PMGC0_FAC is set to 1. It is the user's responsibility to set PMGC0_FAC to 0.
35:63   Reserved

The UPMGC0 register is an alias to the PMGC0 register for user mode read only access.

D.3.2 Performance Monitor Local Control A Registers

The Performance Monitor Local Control A Registers 0 through 15 (PMLCa0..15) function as event selectors and give local control for the corresponding numbered Performance Monitor counters. PMLCa works with the corresponding numbered PMLCb register.

    | PMLCa0..15 |
    32          63

Figure 77. [User] Performance Monitor Local Control A Registers

PMLCa is set to 0 at reset. These bits are interpreted as follows:

Bit     Description
32      Freeze Counter (FC)
        0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
        1  The PMC cannot be incremented.
33      Freeze Counter in Supervisor State (FCS)
        0  The PMC is incremented (if enabled by other Performance Monitor control fields).
        1  The PMC cannot be incremented if MSR_PR is 0.
34      Freeze Counter in User State (FCU)
        0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
        1  The PMC cannot be incremented if MSR_PR is 1.
35      Freeze Counter while Mark is Set (FCM1)
        0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
        1  The PMC cannot be incremented if MSR_PMM is 1.
36      Freeze Counter while Mark is Cleared (FCM0)
        0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
        1  The PMC cannot be incremented if MSR_PMM is 0.
37      Condition Enable (CE)
        0  Overflow conditions for PMCn cannot occur (PMCn cannot cause interrupts, cannot freeze counters)
        1  Overflow conditions occur when the most-significant-bit of PMCn is equal to 1.
        It is recommended that CE be set to 0 when counter PMCn is selected for chaining; see Section D.5.1.
38:40   Reserved
41:47   Event Selector (EVENT)
        Up to 128 events selectable; see Section D.2.3.
48:53   Setting is implementation-dependent.
54:63   Reserved

The UPMLCa0..15 registers are aliases to the PMLCa0..15 registers for user mode read only access.

D.3.3 Performance Monitor Local Control B Registers

The Performance Monitor Local Control B Registers 0 through 15 (PMLCb0..15) specify a threshold value and a multiple to apply to a threshold event selected for the corresponding Performance Monitor counter. Threshold capability is implementation counter dependent. Not all events or all counters of an implementation are guaranteed to support thresholds. PMLCb works with the corresponding numbered PMLCa register.

    | PMLCb0..15 |
    32          63

Figure 78. [User] Performance Monitor Local Control B Register

PMLCb is set to 0 at reset. These bits are interpreted as follows:

Bit     Description
32:52   Reserved
53:55   Threshold Multiple (THRESHMUL)
        000  Threshold field is multiplied by 1 (THRESHOLD × 1)
        001  Threshold field is multiplied by 2 (THRESHOLD × 2)
        010  Threshold field is multiplied by 4 (THRESHOLD × 4)
        011  Threshold field is multiplied by 8 (THRESHOLD × 8)
        100  Threshold field is multiplied by 16 (THRESHOLD × 16)
        101  Threshold field is multiplied by 32 (THRESHOLD × 32)
        110  Threshold field is multiplied by 64 (THRESHOLD × 64)
        111  Threshold field is multiplied by 128 (THRESHOLD × 128)
56:57   Reserved
58:63   Threshold (THRESHOLD)
        Only events that exceed the value THRESHOLD multiplied as described by THRESHMUL are counted. Events to which a threshold value applies are implementation-dependent as are the unit (for example duration in cycles) and the granularity with which the threshold value is interpreted.

Programming Note
By varying the threshold value, software can obtain a profile of the event characteristics subject to thresholding. For example, if PMC1 is configured to count cache misses that last longer than the threshold value, software can measure the distribution of cache miss durations for a given program by monitoring the program repeatedly using a different threshold value each time.

The UPMLCb0..15 registers are aliases to the PMLCb0..15 registers for user mode read only access.

D.3.4 Performance Monitor Counter Registers

The Performance Monitor Counter Registers (PMC0..15) are 32-bit counters that can be programmed to generate interrupt signals when they overflow. Each counter is enabled to count up to 128 events.

    | PMC0..15 |
    32        63

Figure 79. [User] Performance Monitor Counter Registers

PMCs are set to 0 at reset. These bits are interpreted as follows:

Bit     Description
32      Overflow (OV)
        0  Counter has not reached an overflow state.
        1  Counter has reached an overflow state.
33:63   Counter Value (CV)
        Indicates the number of occurrences of the specified event.

The minimum value for a counter is 0 (0x0000_0000) and the maximum value is 4,294,967,295 (0xFFFF_FFFF). A counter can increment up to the maximum value and then wraps to the minimum value. A counter enters the overflow state when the high-order bit is set to 1, which normally occurs only when the counter increments from a value below 2,147,483,648 (0x8000_0000) to a value greater than or equal to 2,147,483,648 (0x8000_0000).

Several different actions may occur when an overflow state is reached, depending on the configuration:

- If PMLCan_CE is 0, no special actions occur on overflow: the counter continues incrementing, and no exception is signaled.
- If PMLCan_CE and PMGC0_FCECE are 1, all counters are frozen when PMCn overflows.
- If PMLCan_CE, PMGC0_PMIE, and MSR_EE are 1, an exception is signaled when PMCn reaches overflow. Note that the interrupts are masked by setting MSR_EE to 0. An overflow condition may be present while MSR_EE is zero, but the interrupt is not taken until MSR_EE is set to 1.

If an overflow condition occurs while MSR_EE is 0 (the exception is masked), the exception is still signaled once MSR_EE is set to 1 if the overflow condition is still present and the configuration has not been changed in the meantime to disable the exception; however, if MSR_EE remains 0 until after the counter leaves the overflow state (MSB becomes 0), or if MSR_EE remains 0 until after PMLCan_CE or PMGC0_PMIE is set to 0, the exception does not occur.

Programming Note
Loading a PMC with an overflowed value can cause an immediate exception. For example, if PMLCan_CE, PMGC0_PMIE, and MSR_EE are all 1, and an mtpmr loads an overflowed value into a PMCn that previously held a non-overflowed value, then an interrupt will be generated before any event counting has occurred.

The following sequence is generally recommended for setting the counter values and configurations.

1. Set PMGC0_FAC to 1 to freeze the counters.
2. Perform a series of mtpmr operations to initialize counter values and configure the control registers.
3. Release the counters by setting PMGC0_FAC to 0 with a final mtpmr.

D.4 Performance Monitor Instructions

Move From Performance Monitor Register XFX-form

mfpmr RT,PMRN

    |  31  |  RT  |  pmrn  |  334  | / |
    0      6      11       21      31

    n ← pmrn_{5:9} || pmrn_{0:4}
    if length(PMR(n)) = 64 then
        RT ← PMR(n)
    else
        RT ← ^{32}0 || PMR(n)_{32:63}

Let PMRN denote a Performance Monitor Register number and PMR the set of Performance Monitor Registers.

The contents of the designated Performance Monitor Register are placed into register RT.

The list of defined Performance Monitor Registers and their privilege class is provided in Figure 80.

Execution of this instruction specifying a defined and privileged Performance Monitor Register when MSR_PR=1 will result in a Privileged Instruction exception.

[Category: Embedded.Hypervisor]
If MSRP_PMMP = 1 and MSR_GS = 1, execution of this instruction specifying a defined Performance Monitor Register sets RT to 0.

Execution of this instruction specifying an undefined Performance Monitor Register will either result in an Illegal Instruction exception or will produce an undefined value for register RT.

Special Registers Altered:
    None

Move To Performance Monitor Register XFX-form

mtpmr PMRN,RS

    |  31  |  RS  |  pmrn  |  462  | / |
    0      6      11       21      31

    n ← pmrn_{5:9} || pmrn_{0:4}
    if length(PMR(n)) = 64 then
        PMR(n) ← (RS)
    else
        PMR(n) ← (RS)_{32:63}

Let PMRN denote a Performance Monitor Register number and PMR the set of Performance Monitor Registers.

The contents of the register RS are placed into the designated Performance Monitor Register.

The list of defined Performance Monitor Registers and their privilege class is provided in Figure 80.

Execution of this instruction specifying a defined and privileged Performance Monitor Register when MSR_PR=1 will result in a Privileged Instruction exception.

[Category: Embedded.Hypervisor]
If MSRP_PMMP = 1 and MSR_GS = 1 and MSR_PR = 0, execution of this instruction specifying a defined Performance Monitor Register results in an Embedded Hypervisor Privilege exception.

Execution of this instruction specifying an undefined Performance Monitor Register will either result in an Illegal Instruction exception or will perform no operation.

Special Registers Altered:
    None

            PMR [1]                                Privileged
decimal     pmrn5:9   pmrn0:4   Register Name     mtpmr   mfpmr   Cat
0-15        00000     0xxxx     PMC0..15          -       no      E.PM
16-31       00000     1xxxx     PMC0..15          yes     yes     E.PM
128-143     00100     0xxxx     PMLCA0..15        -       no      E.PM
144-159     00100     1xxxx     PMLCA0..15        yes     yes     E.PM
256-271     01000     0xxxx     PMLCB0..15        -       no      E.PM
272-287     01000     1xxxx     PMLCB0..15        yes     yes     E.PM
384         01100     00000     PMGC0             -       no      E.PM
400         01100     10000     PMGC0             yes     yes     E.PM

-   This register is not defined for this instruction.
[1] Note that the order of the two 5-bit halves of the PMR number is reversed.

Figure 80. Embedded.Performance Monitor PMRs

D.5 Performance Monitor Software Usage Notes

D.5.1 Chaining Counters

An implementation may contain events that are used to "chain" counters together to provide a larger range of event counts. This is accomplished by programming the desired event into one counter and programming another counter with an event that occurs when the first counter transitions from 1 to 0 in the most significant bit.
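As a rough illustration of the chaining arrangement just described, the following sketch models two 32-bit counters in which the second counts the first counter's most-significant-bit transitions from 1 to 0. The class name ChainedCounters and its methods are hypothetical and exist only for this example; real chaining events are implementation-dependent.

```python
MASK32 = 0xFFFF_FFFF


class ChainedCounters:
    """Hypothetical model of a lower PMC chained into an upper PMC."""

    def __init__(self, low=0, high=0):
        self.low = low & MASK32
        self.high = high & MASK32

    def count(self, events=1):
        # When counting up by one, the MSB of the lower counter goes
        # from 1 to 0 exactly when it wraps from 0xFFFF_FFFF to 0,
        # so every wrap feeds one event to the upper counter.
        total = self.low + events
        self.high = (self.high + (total >> 32)) & MASK32
        self.low = total & MASK32

    def value(self):
        # The pair behaves like one 64-bit event count.
        return (self.high << 32) | self.low


pair = ChainedCounters(low=0xFFFF_FFFE)
pair.count(3)
print(hex(pair.value()))
```

In this model the lower counter wraps once during the three events, so the upper counter holds 1 and the lower counter holds 1.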
The counter chaining feature can be used to decrease the processing pollution caused by Performance Monitor interrupts (such as cache contamination and pipeline effects) by allowing a higher event count than is possible with a single counter. Chaining two counters together effectively adds 32 bits to a counter register where the first counter's carry-out event acts like a carry-out feeding the second counter. By defining the event of interest to be another PMC's overflow generation, the chained counter increments each time the first counter rolls over to zero. Multiple counters may be chained together.

Because the entire chained value cannot be read in a single instruction, an overflow may occur between counter reads, producing an inaccurate value. A sequence like the following is necessary to read the complete chained value when it spans multiple counters and the counters are not frozen. The example shown is for a two-counter case.

    loop:
        mfpmr  Rx,pmctr1     # load from upper counter
        mfpmr  Ry,pmctr0     # load from lower counter
        mfpmr  Rz,pmctr1     # load from upper counter
        cmp    cr0,0,Rz,Rx   # see if 'old' = 'new'
        bc     4,2,loop      # loop if carry occurred between reads

The comparison and loop are necessary to ensure that a consistent set of values has been obtained. The above sequence is not necessary if the counters are frozen.

D.5.2 Thresholding

Threshold event measurement enables the counting of duration and usage events. Assume an example event, dLFB load miss cycles, requires a threshold value. A dLFB load miss cycles event is counted only when the number of cycles spent recovering from the miss is greater than the threshold. If the event is counted on two counters and each counter has an individual threshold, one execution of a performance monitor program can sample two different threshold values. Measuring code performance with multiple concurrent thresholds expedites code profiling significantly.
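The threshold arithmetic can be sketched as follows, using the THRESHMUL (bits 53:55) and THRESHOLD (bits 58:63) fields of the example PMLCb layout. This is a hedged illustration, not a definition: the helper names and the list of miss durations are invented for the example, and the unit of the threshold is implementation-dependent.

```python
def pmlcb_fields(pmlcb):
    """Split a 32-bit PMLCb value into (THRESHMUL, THRESHOLD)."""
    threshmul = (pmlcb >> 8) & 0x7   # bits 53:55 of the 64-bit numbering
    threshold = pmlcb & 0x3F         # bits 58:63
    return threshmul, threshold


def effective_threshold(pmlcb):
    """THRESHOLD multiplied by 1, 2, 4, ... 128 as selected by THRESHMUL."""
    threshmul, threshold = pmlcb_fields(pmlcb)
    return threshold << threshmul


def count_events(durations, pmlcb):
    """Count only events whose duration exceeds the effective threshold."""
    limit = effective_threshold(pmlcb)
    return sum(1 for d in durations if d > limit)


# Two counters sampling the same (made-up) miss durations, in cycles,
# with different thresholds in a single run.
durations = [3, 10, 40, 100]
pmlcb_a = (1 << 8) | 4    # THRESHMUL=001, THRESHOLD=4 -> 8
pmlcb_b = (3 << 8) | 8    # THRESHMUL=011, THRESHOLD=8 -> 64
print(count_events(durations, pmlcb_a), count_events(durations, pmlcb_b))
```

Running both counters over the same event stream gives two points of the duration distribution from one execution, which is the benefit the section describes.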
Book VLE:

Power ISA Operating Environment Architecture - Variable Length Encoding (VLE) Environment
[Category: Variable Length Encoding]

Chapter 1. Variable Length Encoding Introduction

This chapter describes computation modes, document conventions, a processor overview, instruction formats, storage addressing, and instruction addressing.

1.1 Overview

Variable Length Encoding (VLE) is a code density optimized re-encoding of much of the instruction set defined by Books I, II, and III-E using both 16-bit and 32-bit instruction formats.

VLE offers more efficient binary representations of applications for the embedded processor spaces where code density plays a major role in affecting overall system cost, and to a somewhat lesser extent, performance.

VLE is a supplement to the instruction set defined by Book I-III, and code pages using VLE encoding or non-VLE encoding can be intermingled in a system, providing focus on both high performance and code density where most needed.

VLE provides alternative encodings to instructions defined in Books I-III to enable reduced code footprint. This set of alternative encodings is selected on a page basis. A single storage attribute bit selects between standard instruction encodings and VLE instructions for that page of memory.

Instruction encodings in pages marked as VLE are either 16 or 32 bits long, and are aligned on 16-bit boundaries. Because of this, all instruction pages marked as VLE are required to use Big-Endian byte ordering.

The programming model uses the same register set with both instruction set encodings, although some registers are not accessible by VLE instructions using the 16-bit formats and not all condition register (CR) fields are used by Conditional Branch instructions or instructions that access the condition register executing from a VLE instruction page. In addition, immediate fields and displacements differ in size and use, due to the more restrictive encodings imposed by VLE instruction formats.

VLE additional instruction fields are described in Section 1.4.19, "Instruction Fields".

Other than the requirement of Big-Endian byte ordering for instruction pages and the additional storage attribute to identify whether the instruction page corresponds to a VLE section of code, VLE complies with the memory model, register model, timer facilities, debug facilities, and interrupt/exception model defined in Book I-III, and VLE instructions therefore execute in the same environment as non-VLE instructions.

1.2 Documentation Conventions

Book VLE adheres to the documentation conventions defined in Section 1.3 of Book I. Note however that this book defines instructions that apply to the User Instruction Set Architecture, the Virtual Environment Architecture, and the Operating Environment Architecture.

1.2.1 Description of Instruction Operation

The RTL (register transfer language) descriptions in Book VLE conform to the conventions described in Section 1.3.4 of Book I.

1.3 Instruction Mnemonics and Operands

The description of each instruction includes the mnemonic and a formatted list of operands. VLE instruction semantics are either identical or similar to those of other instructions in the architecture. Where the semantics, side-effects, and binary encodings are identical, the standard mnemonics and formats are used. Such unchanged instructions are listed and appropriately referenced, but the instruction definitions are not replicated in this book. Where the semantics are similar but the binary encodings differ, the standard mnemonic is typically preceded with an e_ to denote a VLE instruction. To distinguish between similar instructions available in both 16- and 32-bit forms under VLE and standard instructions, VLE instructions encoded with 16 bits have an se_ prefix. The following are examples:

    stwx    RS,RA,RB        // standard Book I instruction
    e_stw   RS,D(RA)        // 32-bit VLE instruction
    se_stw  RZ,SD4(RX)      // 16-bit VLE instruction

1.4 VLE Instruction Formats

All VLE instructions to be executed are either two or four bytes long and are halfword-aligned in storage. Thus, whenever instruction addresses are presented to the processor (as in Branch instructions), the low-order bit is treated as 0. Similarly, whenever the processor generates an instruction address, the low-order bit is zero.

The format diagrams given below show horizontally all valid combinations of instruction fields. Only those formats that are unique to VLE-defined instructions are included here. Instruction forms that are available in VLE or non-VLE mode are described in Section 1.6 of Book I and are not repeated here.

In some cases an instruction field must contain a particular value. If a field that must contain a particular value does not contain that value, the instruction form is invalid and the results are as described for invalid instruction forms in Book I.

VLE instructions use split field notation as defined in Section 1.6 of Book I.

In each diagram below, the first row gives the bit positions marked in the figure and each following row lists the fields of one valid layout, from bit 0 upward.

1.4.1 BD8-form (16-bit Branch Instructions)

    Bit positions:  0  5  6  8  15
    OPCD | BO16 | BI16 | BD8
    OPCD | XO | LK | BD8

    Figure 1. BD8 instruction format

1.4.2 C-form (16-bit Control Instructions)

    Bit positions:  0  15
    OPCD
    OPCD | LK

    Figure 2. C instruction format

1.4.3 IM5-form (16-bit register + immediate Instructions)

    Bit positions:  0  6  7  12  15
    OPCD | XO | UI5 | RX

    Figure 3. IM5 instruction format

1.4.4 OIM5-form (16-bit register + offset immediate Instructions)

    Bit positions:  0  6  7  12  15
    OPCD | XO | OIM5 | RX
    OPCD | Rc | OIM5 | RX

    Figure 4. OIM5 instruction format

1.4.5 IM7-form (16-bit Load immediate Instructions)

    Bit positions:  0  5  12  15
    OPCD | UI7 | RX

    Figure 5. IM7 instruction format

1.4.6 R-form (16-bit Monadic Instructions)

    Bit positions:  0  6  12  15
    OPCD | XO | RX

    Figure 6. R instruction format

1.4.7 RR-form (16-bit Dyadic Instructions)

    Bit positions:  0  6  8  12  15
    OPCD | XO | RY | RX
    OPCD | XO | Rc | RY | RX
    OPCD | XO | ARY | RX
    OPCD | XO | RY | ARX

    Figure 7. RR instruction format

1.4.8 SD4-form (16-bit Load/Store Instructions)

    Bit positions:  0  4  8  12  15
    OPCD | SD4 | RZ | RX

    Figure 8. SD4 instruction format

1.4.9 BD15-form

    Bit positions:  0  6  10  12  16  31
    OPCD | XO | BO32 | BI32 | BD15 | LK

    Figure 9. BD15 instruction format

1.4.10 BD24-form

    Bit positions:  0  6  7  31
    OPCD | 0 | BD24 | LK

    Figure 10. BD24 instruction format

1.4.11 D8-form

    Bit positions:  0  6  11  16  24  31
    OPCD | RT | RA | XO | D8
    OPCD | RS | RA | XO | D8

    Figure 11. D8 instruction format

1.4.12 ESC-form

    Bit positions:  0  6  11  16  21  31
    OPCD | // | // | ELEV | XO

1.4.13 I16A-form

    Bit positions:  0  6  11  16  21  31
    OPCD | si | RA | XO | si
    OPCD | ui | RA | XO | ui

    Figure 12. I16A instruction format

1.4.14 I16L-form

    Bit positions:  0  6  11  16  21  31
    OPCD | RT | ui | XO | ui

    Figure 13. I16L instruction format

1.4.15 M-form

    Bit positions:  0  6  11  16  21  26  31
    OPCD | RS | RA | SH | MB | ME | XO
    OPCD | RS | RA | SH | MB | ME | XO

    Figure 14. M instruction format

1.4.16 SCI8-form

    Bit positions:  0  6  11  16  21  22  24  31
    OPCD | RT | RA | XO | Rc | F | SCL | UI8
    OPCD | RT | RA | XO | F | SCL | UI8
    OPCD | RS | RA | XO | Rc | F | SCL | UI8
    OPCD | RS | RA | XO | F | SCL | UI8
    OPCD | 000 | BF32 | RA | XO | F | SCL | UI8
    OPCD | 001 | BF32 | RA | XO | F | SCL | UI8
    OPCD | XO | RA | XO | F | SCL | UI8

    Figure 15. SCI8 instruction format

1.4.17 LI20-form

    Bit positions:  0  6  11  16  17  21  31
    OPCD | RT | li20 | XO | li20 | li20

    Figure 16. LI20 instruction format

1.4.18 X-form

    Bit positions:  0  6  9  11  16  21  31
    OPCD | BF | 0 | RA | RB | XO | /

    Figure 17. X instruction format

1.4.19 Instruction Fields

VLE uses instruction fields defined in Section 1.6.28 of Book I as well as VLE-defined instruction fields defined below.

ARX (12:15)
    Field used to specify an "alternate" General Purpose Register in the range R8:R23 to be used as a destination.

ARY (8:11)
    Field used to specify an "alternate" General Purpose Register in the range R8:R23 to be used as a source.

BD8 (8:15), BD15 (16:30), BD24 (7:30)
    Immediate field specifying a signed two's complement branch displacement which is concatenated on the right with 0b0 and sign-extended to 64 bits.

    BD15. (Used by 32-bit branch conditional class instructions) A 15-bit signed displacement that is sign-extended and shifted left one bit (concatenated with 0b0) and then added to the current instruction address to form the branch target address.

    BD24. (Used by 32-bit branch class instructions) A 24-bit signed displacement that is sign-extended and shifted left one bit (concatenated with 0b0) and then added to the current instruction address to form the branch target address.

    BD8. (Used by 16-bit branch and branch conditional class instructions) An 8-bit signed displacement that is sign-extended and shifted left one bit (concatenated with 0b0) and then added to the current instruction address to form the branch target address.

BI16 (6:7), BI32 (12:15)
    Field used to specify one of the Condition Register fields to be used as a condition of a Branch Conditional instruction.

BO16 (5), BO32 (10:11)
    Field used to specify whether to branch if the condition is true, false, or to decrement the Count Register and branch if the Count Register is not zero in a Branch Conditional instruction.

BF32 (9:10)
    Field used to specify one of the Condition Register fields to be used as a target of a compare instruction.

D8 (24:31)
    The D8 field is an 8-bit signed displacement which is sign-extended to 64 bits.

ELEV (16:20)
    Field used by the e_sc instruction.

F (21)
    Fill value used to fill the remaining 56 bits of a scaled-immediate 8 value.

LI20 (17:20 || 11:15 || 21:31)
    A 20-bit signed immediate value which is sign-extended to 64 bits for the e_li instruction.

LK (7, 15, 31)
    LINK bit.

    0   Do not set the Link Register.
    1   Set the Link Register. The sum of the value 2 or 4 and the address of the Branch instruction is placed into the Link Register.

OIM5 (7:11)
    Offset Immediate field used to specify a 5-bit unsigned fixed-point value in the range [1:32] encoded as [0:31]. Thus the binary encoding of 0b00000 represents an immediate value of 1, 0b00001 represents an immediate value of 2, and so on.

OPCD (0:3, 0:4, 0:5, 0:9, 0:14, 0:15)
    Primary opcode field.

Rc (6, 7, 20, 31)
    RECORD bit.

    0   Do not alter the Condition Register.
    1   Set Condition Register Field 0.

RX (12:15)
    Field used to specify a General Purpose Register in the ranges R0:R7 or R24:R31 to be used as a source or as a destination. R0 is encoded as 0b0000, R1 as 0b0001, etc. R24 is encoded as 0b1000, R25 as 0b1001, etc.

RY (8:11)
    Field used to specify a General Purpose Register in the ranges R0:R7 or R24:R31 to be used as a source. R0 is encoded as 0b0000, R1 as 0b0001, etc. R24 is encoded as 0b1000, R25 as 0b1001, etc.

RZ (8:11)
    Field used to specify a General Purpose Register in the ranges R0:R7 or R24:R31 to be used as a source or as a destination for load/store data. R0 is encoded as 0b0000, R1 as 0b0001, etc. R24 is encoded as 0b1000, R25 as 0b1001, etc.

SCL (22:23)
    Field used to specify a scale amount in Immediate instructions using the SCI8-form. Scaling involves left shifting by 0, 8, 16, or 24 bits.

SD4 (4:7)
    Used by 16-bit load and store class instructions. The SD4 field is a 4-bit unsigned immediate value zero-extended to 64 bits, shifted left according to the size of the operation, and then added to the base register to form a 64-bit EA. For byte operations, no shift is performed. For half-word operations, the immediate is shifted left one bit (concatenated with 0b0). For word operations, the immediate is shifted left two bits (concatenated with 0b00).

SI (6:10 || 21:31, 11:15 || 21:31)
    A 16-bit signed immediate value sign-extended to 64 bits and used as one operand of the instruction.

UI (6:10 || 21:31, 11:15 || 21:31)
    A 16-bit unsigned immediate value zero-extended to 64 bits or padded with 16 zeros and used as one operand of the instruction. The instruction encoding differs between the I16A and I16L instruction formats as shown in Section 1.4.13 and Section 1.4.14.

UI5 (7:11)
    Immediate field used to specify a 5-bit unsigned fixed-point value.

UI7 (5:11)
    Immediate field used to specify a 7-bit unsigned fixed-point value.

UI8 (24:31)
    Immediate field used to specify an 8-bit unsigned fixed-point value.

XO (6, 6:7, 6:10, 6:11, 16, 16:19, 16:20, 16:23, 31)
    Extended opcode field.

Assembler Note
    For scaled immediate instructions using the SCI8-form, the instruction assembly syntax requires a single immediate value, sci8, that the assembler will synthesize into the appropriate F, SCL, and UI8 fields. The F, SCL, and UI8 fields must be able to be formed correctly from the given sci8 value or the assembler will flag the assembly instruction as an error.

Chapter 2. VLE Storage Addressing

A program references memory using the effective address (EA) computed by the processor when it executes a Storage Access or Branch instruction (or certain other instructions described in Book II and Book III-E), or when it fetches the next sequential instruction.

2.1 Data Storage Addressing Modes

Table 1 lists data storage addressing modes supported by the VLE category.

Table 1: Data Storage Addressing Modes

Base+16-bit displacement (32-bit instruction format), D-form
    The 16-bit D field is sign-extended and added to the contents of the GPR designated by RA or to zero if RA = 0 to produce the EA.

Base+8-bit displacement (32-bit instruction format), D8-form
    The 8-bit D8 field is sign-extended and added to the contents of the GPR designated by RA or to zero if RA = 0 to produce the EA.

Base+scaled 4-bit displacement (16-bit instruction format), SD4-form
    The 4-bit SD4 field is zero-extended, scaled (shifted left) according to the size of the operand, and added to the contents of the GPR designated by RX to produce the EA. (Note that RX = 0 is not a special case.)

Base+Index (32-bit instruction format), X-form
    The GPR contents designated by RB are added to the GPR contents designated by RA or to zero if RA = 0 to produce the EA.

2.2 Instruction Storage Addressing Modes

Table 2 lists instruction storage addressing modes supported by the VLE category.

Table 2: Instruction Storage Addressing Modes

Taken BD24-form Branch instructions (32-bit instruction format)
    The 24-bit BD24 field is concatenated on the right with 0b0, sign-extended, and then added to the address of the branch instruction.

Taken BD15-form Branch instructions (32-bit instruction format)
    The 15-bit BD15 field is concatenated on the right with 0b0, sign-extended, and then added to the address of the branch instruction to form the EA of the next instruction.

Taken BD8-form Branch instructions (16-bit instruction format)
    The 8-bit BD8 field is concatenated on the right with 0b0, sign-extended, and then added to the address of the branch instruction to form the EA of the next instruction.

Sequential instruction fetching (or non-taken branch instructions)
    The value 4 [2] is added to the address of the current 32-bit [16-bit] instruction to form the EA of the next instruction. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFC [0xFFFF_FFFF_FFFF_FFFE] in 64-bit mode or 0xFFFF_FFFC [0xFFFF_FFFE] in 32-bit mode, the address of the next sequential instruction is undefined.

Any Branch instruction with LK = 1 (32-bit instruction format)
    The value 4 is added to the address of the current branch instruction and the result is placed into the LR. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFC in 64-bit mode or 0xFFFF_FFFC in 32-bit mode, the result placed into the LR is undefined.

Branch se_bl, se_blrl, se_bctrl instructions (16-bit instruction format)
    The value 2 is added to the address of the current branch instruction and the result is placed into the LR. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFE in 64-bit mode or 0xFFFF_FFFE in 32-bit mode, the result placed into the LR is undefined.

2.2.1 Misaligned, Mismatched, and Byte Ordering Instruction Storage Exceptions

A Misaligned Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that is not 32-bit aligned and the VLE storage attribute is not set for the page that corresponds to the effective address of the instruction. The attempted execution can be the result of a Branch instruction which has bit 62 of the target address set to 1 or the result of an rfi, se_rfi, rfci, se_rfci, rfdi, se_rfdi, rfgi, se_rfgi, rfmci, or se_rfmci instruction which has bit 62 set in SRR0, SRR0, CSRR0, CSRR0, DSRR0, DSRR0, GSRR0, GSRR0, MCSRR0, or MCSRR0 respectively. If a Misaligned Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur setting SRR0(GSRR0) to the misaligned address for which execution was attempted.

A Mismatched Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that crosses a page boundary for which the first page has the VLE storage attribute set to 1 and the second page has the VLE storage attribute bit set to 0. If a Mismatched Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur setting SRR0(GSRR0) to the misaligned address for which execution was attempted.

A Byte Ordering Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that has the VLE storage attribute set to 1 and the E (Endian) storage attribute set to 1 for the page that corresponds to the effective address of the instruction. If a Byte Ordering Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur setting SRR0(GSRR0) to the address for which execution was attempted.

2.2.2 VLE Exception Syndrome Bits

Two bits in the Exception Syndrome Register (ESR) (see Section 7.2.13 of Book III-E) are provided to facilitate VLE exception handling, VLEMI and MIF.

ESR(GESR)[VLEMI] is set when an exception and subsequent interrupt is caused by the execution or attempted execution of an instruction that resides in memory with the VLE storage attribute set.

ESR(GESR)[MIF] is set when an Instruction Storage Interrupt is caused by a Misaligned Instruction Storage Exception or when an Instruction TLB Error Interrupt was caused by a TLB miss on the second half of a misaligned 32-bit instruction.

ESR(GESR)[BO] is set when an Instruction Storage Interrupt is caused by a Mismatched Instruction Storage Exception or a Byte Ordering Instruction Storage Exception.

Programming Note
    When an Instruction TLB Error Interrupt occurs as the result of an Instruction TLB miss on the second half of a 32-bit VLE instruction that is aligned to only 16 bits, SRR0 will point to the first half of the instruction and ESR[MIF] will be set to 1. Any other status posted as a result of the TLB miss (such as MAS register updates described in Chapter 6 of Book III-E) will reflect the page corresponding to the second half of the instruction which caused the Instruction TLB miss.
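The address arithmetic summarized in Table 1 and Table 2 of Chapter 2 can be sketched in Python. This is an illustrative model only: the helper names (`exts`, `sd4_ea`, `d8_ea`, `branch_target`) are hypothetical, 64-bit mode is assumed, and the "RA = 0 means zero" rule is left to the caller, who passes the base value directly.

```python
MASK64 = (1 << 64) - 1

def exts(value, bits):
    """Sign-extend a `bits`-wide field to a signed Python integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def sd4_ea(rx_value, sd4, size_bytes):
    """SD4-form: zero-extended SD4, scaled by operand size, added to (RX)."""
    shift = {1: 0, 2: 1, 4: 2}[size_bytes]   # byte, halfword, word
    return (rx_value + ((sd4 & 0xF) << shift)) & MASK64

def d8_ea(base_value, d8):
    """D8-form: sign-extended 8-bit displacement added to the base value."""
    return (base_value + exts(d8, 8)) & MASK64

def branch_target(branch_address, bd, bits):
    """Taken branch: (BD || 0b0) sign-extended, added to the branch address."""
    displacement = exts(bd << 1, bits + 1)
    return (branch_address + displacement) & MASK64


word_ea = sd4_ea(0x1000, 0xF, 4)              # largest word offset is 15 * 4
backward = branch_target(0x2000, 0xFF, 8)     # BD8 of all ones is -2 bytes
```

Because the displacement is always shifted left by one bit before use, every computed branch target keeps the halfword alignment that VLE requires.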
Chapter 3. VLE Compatibility with Books I-III

This chapter addresses the relationship between VLE and Books I-III.

3.1 Overview

Category VLE uses the same semantics as Books I-III. Due to the limited instruction encoding formats, VLE instructions typically support reduced immediate fields and displacements, and not all operations defined by Books I-III are encoded in category VLE. The basic philosophy is to capture all useful operations, with most frequent operations given priority. Immediate fields and displacements are provided to cover the majority of ranges encountered in embedded control code. Instructions are encoded in either a 16- or 32-bit format, and these may be freely intermixed.

VLE instructions cannot access floating-point registers (FPRs). VLE instructions use GPRs and SPRs with the following limitations:

- VLE instructions using the 16-bit formats are limited to addressing GPR0-GPR7, and GPR24-GPR31 in most instructions. Move instructions are provided to transfer register contents between these registers and GPR8-GPR23.
- VLE compare and bit test instructions using the 16-bit formats implicitly set their results in CR0.

VLE instruction encodings are generally different than instructions defined by Books I-III, except that most instructions falling within primary opcode 31 are encoded identically and have identical semantics unless they affect or access a resource not supported by category VLE.

3.2 VLE Processor and Storage Control Extensions

This section describes additional functionality to support category VLE.

3.2.1 Instruction Extensions

This section describes extensions to support VLE operations. Because instructions may reside on a half-word boundary, bit 62 is not masked by instructions that read an instruction address from a register, such as the LR, CTR, or a save/restore register 0, that holds an instruction address.

The instruction set defined by Books I-III is modified to support halfword instruction addressing, as follows:

- Return From Interrupt instructions, such as rfi, rfci, rfdi, rfgi, and rfmci, no longer mask bit 62 of the respective save/restore register 0. The destination address is SRR0[0:62] || 0b0, CSRR0[0:62] || 0b0, DSRR0[0:62] || 0b0, GSRR0[0:62] || 0b0, and MCSRR0[0:62] || 0b0, respectively.
- bclr, bclrl, bcctr, and bcctrl no longer mask bit 62 of the LR or CTR. The destination address is LR[0:62] || 0b0 or CTR[0:62] || 0b0.

3.2.2 MMU Extensions

VLE operation is indicated by the VLE storage attribute. When the VLE storage attribute for a page is set to 1, instruction fetches from that page are decoded and processed as VLE instructions. See Section 6.8.3 of Book III-E.

When instructions are executing from a page that has the VLE storage attribute set to 1, the processor is said to be in VLE mode.

3.3 VLE Limitations

VLE instruction fetches are valid only when performed in a Big-Endian mode. Attempting to fetch an instruction in a Little-Endian mode from a page with the VLE storage attribute set causes an Instruction Storage Byte-ordering exception.

Support for concurrent modification and execution of VLE instructions is implementation-dependent.
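Two of the rules from Chapter 3 lend themselves to a short sketch: the restricted register set of the 16-bit formats (Section 3.1) and the halfword destination address formed from a register (Section 3.2.1). The function names `encode_rx` and `halfword_destination` are hypothetical, and 64-bit mode is assumed; the 4-bit encoding follows the R0 to 0b0000 and R24 to 0b1000 mapping given for the RX, RY, and RZ fields.

```python
MASK64 = (1 << 64) - 1

def encode_rx(gpr):
    """Map a GPR number to the 4-bit RX/RY/RZ encoding.

    R0-R7 encode as 0b0000-0b0111 and R24-R31 as 0b1000-0b1111;
    any other register is not reachable through these fields.
    """
    if 0 <= gpr <= 7:
        return gpr
    if 24 <= gpr <= 31:
        return gpr - 16
    raise ValueError(f"GPR{gpr} is not addressable by a 16-bit RX/RY/RZ field")

def halfword_destination(register_value):
    """Form reg[0:62] || 0b0: keep bit 62, force the low-order bit to 0."""
    return (register_value & MASK64) & ~1


rx_field = encode_rx(25)                      # R25 encodes as 0b1001
return_address = halfword_destination(0x1003) # only the lowest bit is cleared
```

Registers outside the two ranges raise an error in this sketch, mirroring the need for a separate move instruction to reach GPR8-GPR23.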
Chapter 4. Branch Operation Instructions

This section defines Branch instructions that can be executed when a processor is in VLE mode and the registers that support them.

4.1 Branch Facility Registers

The registers that support branch operations are:

- Section 4.1.1, "Condition Register (CR)"
- Section 4.1.2, "Link Register (LR)"
- Section 4.1.3, "Count Register (CTR)"

4.1.1 Condition Register (CR)

The Condition Register (CR) is a 32-bit register which reflects the result of certain operations, and provides a mechanism for testing (and branching). The CR is more fully defined in Book I.

Category VLE uses the entire CR, but some comparison operations and all Branch instructions are limited to using CR0-CR3. The full Book I condition register field and logical operations are provided however.

        CR
    32      63

    Figure 18. Condition Register

The bits in the Condition Register are grouped into eight 4-bit fields, CR Field 0 (CR0) ... CR Field 7 (CR7), which are set by VLE defined instructions in one of the following ways.

- Specified fields of the condition register can be set by a move to the CR from a GPR (mtcrf, mtocrf).
- A specified CR field can be set by a move to the CR from another CR field (e_mcrf) or from XER[32:35] (mcrxr).
- CR field 0 can be set as the implicit result of a fixed-point instruction.
- A specified CR field can be set as the result of a fixed-point compare instruction.
- CR field 0 can be set as the result of a fixed-point bit test instruction.

Other instructions from implemented categories may also set bits in the CR in the same manner that they would when not in VLE mode.

Instructions are provided to perform logical operations on individual CR bits and to test individual CR bits.

For all fixed-point instructions in which the Rc bit is defined and set, and for e_add2i., e_and2i., and e_and2is., the first three bits of CR field 0 (CR[32:34]) are set by signed comparison of the result to zero, and the fourth bit of CR field 0 (CR[35]) is copied from the final state of XER[SO]. "Result" here refers to the entire 64-bit value placed into the target register in 64-bit mode, and to bits 32:63 of the value placed into the target register in 32-bit mode.

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if (target_register)[M:63] < 0 then c ← 0b100
    else if (target_register)[M:63] > 0 then c ← 0b010
    else c ← 0b001
    CR0 ← c || XER[SO]

If any portion of the result is undefined, the value placed into the first three bits of CR field 0 is undefined.

The bits of CR field 0 are interpreted as shown below.

    CR Bit   Description
    32       Negative (LT)
             The result is negative.
    33       Positive (GT)
             The result is positive.
    34       Zero (EQ)
             The result is 0.
    35       Summary overflow (SO)
             This is a copy of the contents of XER[SO] at the completion of the instruction.

4.1.1.1 Condition Register Setting for Compare Instructions

For compare instructions, a CR field specified by the BF operand for the e_cmph, e_cmphl, e_cmpi, and e_cmpli instructions, or CR0 for the se_cmpl, e_cmp16i, e_cmph16i, e_cmphl16i, e_cmpl16i, se_cmp, se_cmph, se_cmphl, se_cmpi, and se_cmpli instructions, is set to reflect the result of the comparison. The CR field bits are interpreted as shown below. A complete description of how the bits are set is given in the instruction descriptions and Section 5.6, "Fixed-Point Compare and Bit Test Instructions".

Condition register bit settings for compare instructions are interpreted as follows. (Note: the e_cmpi and e_cmpli instructions have a BF32 field instead of a BF field; for these instructions, BF32 should be substituted for BF in the list below.)

    CR Bit       Description
    4×BF + 32    Less Than (LT)
                 For signed fixed-point compare, (RA) or (RX) < sci8, SI, (RB), or (RY).
                 For unsigned fixed-point compare, (RA) or (RX) <u sci8, UI, UI5, (RB), or (RY).
    4×BF + 33    Greater Than (GT)
                 For signed fixed-point compare, (RA) or (RX) > sci8, SI, (RB), or (RY).
                 For unsigned fixed-point compare, (RA) or (RX) >u sci8, UI, UI5, (RB), or (RY).
    4×BF + 34    Equal (EQ)
                 For fixed-point compare, (RA) or (RX) = sci8, UI, UI5, SI, (RB), or (RY).
    4×BF + 35    Summary Overflow (SO)
                 For fixed-point compare, this is a copy of the contents of XER[SO] at the completion of the instruction.

4.1.1.2 Condition Register Setting for the Bit Test Instruction

The Bit Test Immediate instruction, se_btsti, also sets CR field 0. See the instruction description and also Section 5.6, "Fixed-Point Compare and Bit Test Instructions".

4.1.2 Link Register (LR)

VLE instructions use the Link Register (LR) as defined in Book I, although category VLE defines a subset of all variants of Book I conditional branches involving the LR.

4.1.3 Count Register (CTR)

VLE instructions use the Count Register (CTR) as defined in Book I, although category VLE defines a subset of the variants of Book I conditional branches involving the CTR.

4.2 Branch Instructions

The sequence of instruction execution can be changed by the branch instructions. Because VLE instructions must be aligned on half-word boundaries, the low-order bit of the generated branch target address is forced to 0 by the processor in performing the branch.

The branch instructions compute the EA of the target in one of the following ways, as described in Section 2.2, "Instruction Storage Addressing Modes":

1. Adding a displacement to the address of the branch instruction.
2. Using the address contained in the LR (Branch to Link Register [and Link]).
3. Using the address contained in the CTR (Branch to Count Register [and Link]).

Encodings for the BO32 field for VLE are shown in Figure 19.

    BO32   Description
    00     Branch if the condition is false.
    01     Branch if the condition is true.
    10     Decrement CTR[M:63], then branch if the decremented CTR[M:63] ≠ 0.
    11     Decrement CTR[M:63], then branch if the decremented CTR[M:63] = 0.

    Figure 19. BO32 field encodings

Encodings for the BO16 field for VLE are shown in Figure 20.
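The four BO32 encodings in Figure 19 can be read as a small decision function. The sketch below is illustrative only: the name `resolve_bo32` and its parameters are hypothetical, the tested CR bit is passed in directly as 0 or 1, and M is taken as 0 in 64-bit mode and 32 in 32-bit mode, so that in 32-bit mode only the low-order 32 bits of the CTR are decremented and tested.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def resolve_bo32(bo32, cr_bit, ctr, mode64=True):
    """Return (branch_taken, new_ctr) for a 2-bit BO32 value.

    cr_bit is the value (0 or 1) of the CR bit under test; ctr is the
    full 64-bit Count Register value before the instruction executes.
    """
    if bo32 not in (0b00, 0b01, 0b10, 0b11):
        raise ValueError("BO32 is a 2-bit field")
    if bo32 == 0b00:                      # branch if the condition is false
        return cr_bit == 0, ctr
    if bo32 == 0b01:                      # branch if the condition is true
        return cr_bit == 1, ctr
    mask = MASK64 if mode64 else MASK32   # selects CTR[M:63]
    low = (ctr - 1) & mask                # decremented CTR[M:63]
    new_ctr = (ctr & ~mask & MASK64) | low
    if bo32 == 0b10:                      # branch if decremented CTR != 0
        return low != 0, new_ctr
    return low == 0, new_ctr              # 0b11: branch if decremented CTR = 0


taken, ctr_after = resolve_bo32(0b10, cr_bit=0, ctr=2)   # loop continues
```

Note that the two CTR encodings ignore the CR bit entirely, while the two condition encodings leave the CTR untouched.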
BO16   Description
0      Branch if the condition is false.
1      Branch if the condition is true.

Figure 20. BO16 field encodings

Branching can be conditional or unconditional, and the return address can optionally be provided. If the return address is to be provided (LK = 1), the EA of the instruction following the branch instruction is placed into the LR after the branch target address has been computed; this is done regardless of whether the branch is taken.

In branch conditional instructions, the BI32 or BI16 instruction field specifies the CR bit to be tested. For 32-bit instructions using BI32, CR_{32:47} (corresponding to bits in CR0:CR3) may be specified. For 16-bit instructions using BI16, only CR_{32:35} (bits within CR0) may be specified.

In branch conditional instructions, the BO32 or BO16 field specifies the conditions under which the branch is taken and how the branch is affected by or affects the CR and CTR. Note that VLE instructions also have different encodings for the BO32 and BO16 fields than in Book I's BO field.

If the BO32 field specifies that the CTR is to be decremented, in 64-bit mode CTR_{0:63} are decremented, and in 32-bit mode CTR_{32:63} are decremented.

If BO16 or BO32 specifies a condition that must be TRUE or FALSE, that condition is obtained from the contents of CR_{BI32+32} or CR_{BI16+32}. (Note that CR bits are numbered 32:63. BI32 or BI16 refers to the condition register bit field in the branch instruction encoding. For example, specifying BI32 = 2 refers to CR_34.)

For Figure 19 let M = 0 in 64-bit mode and M = 32 in 32-bit mode.

Branch [and Link] BD24-form

e_b     target_addr    (LK=0)
e_bl    target_addr    (LK=1)

Fields:        30 | 0 | BD24 | LK
Bit positions: 0 6 7 31

NIA ←iea CIA + EXTS(BD24 || 0b0)
if LK then LR ←iea CIA + 4

target_addr specifies the branch target address.

The branch target address is the sum of BD24 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
LR (if LK=1)

Branch [and Link] BD8-form

se_b    target_addr    (LK=0)
se_bl   target_addr    (LK=1)

Fields:        58 | 0 | LK | BD8
Bit positions: 0 6 7 8 15

NIA ←iea CIA + EXTS(BD8 || 0b0)
if LK then LR ←iea CIA + 2

target_addr specifies the branch target address.

The branch target address is the sum of BD8 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
LR (if LK=1)

Branch Conditional [and Link] BD15-form

e_bc    BO32,BI32,target_addr    (LK=0)
e_bcl   BO32,BI32,target_addr    (LK=1)

Fields:        30 | 8 | BO32 | BI32 | BD15 | LK
Bit positions: 0 6 10 12 16 31

if (64-bit mode)
  then M ← 0
  else M ← 32
if BO32_0 then CTR_{M:63} ← CTR_{M:63} - 1
ctr_ok ← ¬BO32_0 | ((CTR_{M:63} ≠ 0) ⊕ BO32_1)
cond_ok ← BO32_0 | (CR_{BI32+32} ≡ BO32_1)
if ctr_ok & cond_ok then
  NIA ←iea (CIA + EXTS(BD15 || 0b0))
else
  NIA ←iea CIA + 4
if LK then LR ←iea CIA + 4

The BI32 field specifies the Condition Register bit to be tested. The BO32 field is used to resolve the branch as described in Figure 19. target_addr specifies the branch target address.

The branch target address is the sum of BD15 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
CTR (if BO32_0=1)
LR (if LK=1)

Branch Conditional Short Form BD8-form

se_bc   BO16,BI16,target_addr

Fields:        28 | BO16 | BI16 | BD8
Bit positions: 0 5 6 8 15

cond_ok ← (CR_{BI16+32} ≡ BO16)
if cond_ok then
  NIA ←iea CIA + EXTS(BD8 || 0b0)
else
  NIA ←iea CIA + 2

The BI16 field specifies the Condition Register bit to be tested. The BO16 field is used to resolve the branch as described in Figure 20. target_addr specifies the branch target address.

The branch target address is the sum of BD8 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

Special Registers Altered:
None

Branch to Link Register [and Link] C-form

se_blr     (LK=0)
se_blrl    (LK=1)

Fields:        02 | LK
Bit positions: 0 15

NIA ←iea LR_{0:62} || 0b0
if LK then LR ←iea CIA + 2

The branch target address is LR_{0:62} || 0b0 with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
LR (if LK=1)

Branch to Count Register [and Link] C-form

se_bctr    (LK=0)
se_bctrl   (LK=1)

Fields:        03 | LK
Bit positions: 0 15

NIA ←iea CTR_{0:62} || 0b0
if LK then LR ←iea CIA + 2

The branch target address is CTR_{0:62} || 0b0 with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
LR (if LK=1)

4.3 System Linkage Instructions

The System Linkage instructions enable the program to call upon the system to perform a service and provide a means by which the system can return from performing a service or from processing an interrupt. System Linkage instructions defined by the VLE category are identical in semantics to System Linkage instructions defined in Book I and Book III-E with the exception of the LEV field, but are encoded differently.

se_sc provides the same functionality as the Book I (and Book III-E) instruction sc without the LEV field. se_rfi, se_rfci, se_rfdi, and se_rfmci provide the same functionality as the Book III-E instructions rfi, rfci, rfdi, and rfmci respectively.

System Call C-form, ESC-form

se_sc

Fields:        02
Bit positions: 0 15

e_sc    ELEV    [Category: Embedded.Hypervisor]

Fields:        31 | /// | /// | ELEV | 36 | /
Bit positions: 0 6 11 16 21 31

lev = ELEV
if 'se_sc' then
  lev ← 0
  rr0 ←iea CIA + 2
else if 'e_sc' then
  lev ← ELEV
  rr0 ←iea CIA + 4
if lev = 0 then
  if MSR_GS = 1 then
    GSRR0 ←iea rr0
    GSRR1 ← MSR
    if IVORs supported then
      NIA ← GIVPR_{0:47} || GIVOR8_{48:59} || 0b0000
    else
      NIA ← GIVPR_{0:51} || 0x120
    MSR ← new_value (see below)
  else
    SRR0 ←iea rr0
    SRR1 ← MSR
    if IVORs supported then
      NIA ← IVPR_{0:47} || IVOR8_{48:59} || 0b0000
    else
      NIA ← IVPR_{0:51} || 0x120
    MSR ← new_value (see below)
else if ELEV = 1 then
  SRR0 ←iea CIA + 4
  SRR1 ← MSR
  if IVORs supported then
    NIA ← IVPR_{0:47} || IVOR40_{48:59} || 0b0000
  else
    NIA ← IVPR_{0:51} || 0x300
  MSR ← new_value (see below)

If category E.HV is not implemented, the System Call instruction behaves as if MSR_GS = 0 and ELEV = 0.

If MSR_GS = 0 or if ELEV = 1, the effective address of the instruction following the System Call instruction is placed into SRR0 and the contents of the MSR are copied into SRR1. Otherwise, the effective address of the instruction following the System Call instruction is placed into GSRR0 and the contents of the MSR are copied into GSRR1.

ELEV values greater than 1 are reserved. Bits 0:3 of the ELEV field (instruction bits 16:19) are treated as a reserved field.

If ELEV=0, a System Call interrupt is generated. If ELEV=1, an Embedded Hypervisor System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 7.6.10 and Section 7.6.27 of Book III-E.

If ELEV=0 or se_sc is executed, and the processor is in guest state, instruction execution resumes at the address given by one of the following.
  - GIVPR_{0:47} || GIVOR8_{48:59} || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
  - GIVPR_{0:51} || 0x120 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

If ELEV=0 or se_sc is executed, and the processor is in hypervisor state, instruction execution resumes at the address given by one of the following.
  - IVPR_{0:47} || IVOR8_{48:59} || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
  - IVPR_{0:51} || 0x120 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

If ELEV=1, the interrupt causes instruction execution to resume at the address given by one of the following.
  - GIVPR_{0:47} || GIVOR40_{48:59} || 0b0000 if IVORs [Category: Embedded.Phased-Out] are supported.
  - GIVPR_{0:51} || 0x300 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.

This instruction is context synchronizing.

Special Registers Altered:
SRR0 GSRR0 SRR1 GSRR1 MSR

Programming Note
e_sc serves as both a basic and an extended mnemonic. The Assembler will recognize an e_sc mnemonic with one operand as the basic form, and an e_sc mnemonic with no operand as the extended form. In the extended form, the ELEV operand is omitted and assumed to be 0.

Illegal C-form

se_illegal

Fields:        0
Bit positions: 0 15

se_illegal is used to request an Illegal Instruction exception. The behavior is the same as if an illegal instruction was executed.

This instruction is context synchronizing.

Special Registers Altered:
SRR0 SRR1 MSR ESR

Return From Machine Check Interrupt C-form

se_rfmci

Fields:        11
Bit positions: 0 15

MSR ← MCSRR1
NIA ←iea MCSRR0_{0:62} || 0b0

The se_rfmci instruction is used to return from a machine check class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of MCSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address MCSRR0_{0:62} || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 7 of Book III-E) are the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in MCSRR0 at the time of the execution of the se_rfmci).

This instruction is privileged and context synchronizing.

Special Registers Altered:
MSR

Return From Critical Interrupt C-form

se_rfci

Fields:        09
Bit positions: 0 15

MSR ← CSRR1
NIA ←iea CSRR0_{0:62} || 0b0

The se_rfci instruction is used to return from a critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of CSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address CSRR0_{0:62} || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 7 of Book III-E) are the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in CSRR0 at the time of the execution of the se_rfci).

This instruction is privileged and context synchronizing.

Special Registers Altered:
MSR

Return From Interrupt C-form

se_rfi

Fields:        08
Bit positions: 0 15

MSR ← SRR1
NIA ←iea SRR0_{0:62} || 0b0

The se_rfi instruction is used to return from a non-critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of SRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR0_{0:62} || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 7 of Book III-E) are the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in SRR0 at the time of the execution of the se_rfi).

This instruction is privileged and context synchronizing.

Special Registers Altered:
MSR

Return From Debug Interrupt C-form

se_rfdi

Fields:        10
Bit positions: 0 15

MSR ← DSRR1
NIA ←iea DSRR0_{32:62} || 0b0

The se_rfdi instruction is used to return from a debug class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of DSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address DSRR0_{0:62} || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the save/restore registers by the interrupt processing mechanism (see Chapter 7 of Book III-E) is the address of the instruction that would have been executed next had the interrupt not occurred (that is, the address in DSRR0 at the time of the execution of se_rfdi).

This instruction is privileged and context synchronizing.

Special Registers Altered:
MSR

Corequisite Categories:
Embedded.Enhanced Debug

Return From Guest Interrupt C-form

se_rfgi

Fields:        12
Bit positions: 0 15

newmsr ← GSRR1
if MSR_GS = 1 then
  newmsr_{GS,WE} ← MSR_{GS,WE}
  prots ← MSRP_{UCLEP,DEP,PMMP}
  newmsr ← prots & MSR | ~prots & newmsr
MSR ← newmsr
NIA ←iea GSRR0_{0:62} || 0b0

The se_rfgi instruction is used to return from a guest state base class interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context.

The contents of Guest Save/Restore Register 1 are placed into the MSR. If the se_rfgi is executed in the guest supervisor state (MSR_{GS PR} = 0b10), the bits MSR_{GS WE} are not modified and the bits MSR_{UCLE DE PMM} are modified only if the associated bits in the Machine State Register Protect (MSRP) Register are set to 0. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address GSRR0_{0:62} || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the associated save/restore register 0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in GSRR0 at the time of the execution of the se_rfgi).

This instruction is privileged and context synchronizing.

Special Registers Altered:
MSR

Corequisite Categories:
Embedded.Hypervisor

4.4 Condition Register Instructions

Condition Register instructions are provided to transfer values to and from various portions of the CR. Category VLE does not introduce any additional functionality beyond that defined in Book I for CR operations, but does remap the CR-logical and mcrf instruction functionality into primary opcode 31. These instructions operate identically to the Book I instructions, but are encoded differently.

Condition Register AND XL-form

e_crand    BT,BA,BB

Fields:        31 | BT | BA | BB | 257 | /
Bit positions: 0 6 11 16 21 31

CR_{BT+32} ← CR_{BA+32} & CR_{BB+32}

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
CR_{BT+32}

Condition Register AND with Complement XL-form

e_crandc   BT,BA,BB

Fields:        31 | BT | BA | BB | 129 | /
Bit positions: 0 6 11 16 21 31

CR_{BT+32} ← CR_{BA+32} & ¬CR_{BB+32}

The bit in the Condition Register specified by BA+32 is ANDed with the one's complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
CR_{BT+32}

Condition Register Equivalent XL-form

e_creqv    BT,BA,BB

Fields:        31 | BT | BA | BB | 289 | /
Bit positions: 0 6 11 16 21 31

CR_{BT+32} ← CR_{BA+32} ≡ CR_{BB+32}

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
CR_{BT+32}

Condition Register NAND XL-form

e_crnand   BT,BA,BB

Fields:        31 | BT | BA | BB | 225 | /
Bit positions: 0 6 11 16 21 31

CR_{BT+32} ← ¬(CR_{BA+32} & CR_{BB+32})

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
CR_{BT+32}

Condition Register NOR XL-form

e_crnor    BT,BA,BB

Fields:        31 | BT | BA | BB | 33 | /
Bit positions: 0 6 11 16 21 31

CR_{BT+32} ← ¬(CR_{BA+32} | CR_{BB+32})

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
CR_{BT+32}

Condition Register OR XL-form

e_cror     BT,BA,BB

Fields:        31 | BT | BA | BB | 449 | /
Bit positions: 0 6 11 16 21 31

CR_{BT+32} ← CR_{BA+32} | CR_{BB+32}

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
CR_{BT+32}

Condition Register OR with Complement XL-form

e_crorc    BT,BA,BB

Fields:        31 | BT | BA | BB | 417 | /
Bit positions: 0 6 11 16 21 31

CR_{BT+32} ← CR_{BA+32} | ¬CR_{BB+32}

The bit in the Condition Register specified by BA+32 is ORed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
CR_{BT+32}

Condition Register XOR XL-form

e_crxor    BT,BA,BB

Fields:        31 | BT | BA | BB | 193 | /
Bit positions: 0 6 11 16 21 31

CR_{BT+32} ← CR_{BA+32} ⊕ CR_{BB+32}

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
CR_{BT+32}

Move CR Field XL-form

e_mcrf     BF,BFA

Fields:        31 | BF | // | BFA | /// | /// | 16 | /
Bit positions: 0 6 9 11 14 16 21 31

CR_{4×BF+32:4×BF+35} ← CR_{4×BFA+32:4×BFA+35}

The contents of Condition Register field BFA are copied to Condition Register field BF.

Special Registers Altered:
CR field BF

Chapter 5. Fixed-Point Instructions

5.1 Fixed-Point Load Instructions
5.2 Fixed-Point Store Instructions
5.3 Fixed-Point Load and Store with Byte Reversal Instructions
5.4 Fixed-Point Load and Store Multiple Instructions
5.5 Fixed-Point Arithmetic Instructions
5.6 Fixed-Point Compare and Bit Test Instructions
5.7 Fixed-Point Trap Instructions
5.8 Fixed-Point Select Instruction
5.9 Fixed-Point Logical, Bit, and Move Instructions
5.10 Fixed-Point Rotate and Shift Instructions
5.11 Move To/From System Register Instructions

This section lists the fixed-point instructions supported by category VLE.

5.1 Fixed-Point Load Instructions

The fixed-point Load instructions compute the effective address (EA) of the memory to be accessed as described in Section 2.1, "Data Storage Addressing Modes".

The byte, halfword, word, or doubleword in storage addressed by EA is loaded into RT or RZ.

Category VLE supports both Big- and Little-Endian byte ordering for data accesses.

Some fixed-point load instructions have an update form in which RA is updated with the EA. For these forms, if RA≠0 and RA≠RT, the EA is placed into RA and the memory element (byte, halfword, word, or doubleword) addressed by EA is loaded into RT. If RA=0 or RA=RT, the instruction form is invalid. This is the same behavior as specified for load with update instructions in Book I.

The fixed-point Load instructions from Book I, lbzx, lbzux, lhzx, lhzux, lhax, lhaux, lwzx, and lwzux are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I. See Section 3.3.2 of Book I for the instruction definitions.

The fixed-point Load instructions from Book I, lwax, lwaux, ldx, and ldux are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I. See Section 3.3.2 of Book I for the instruction definitions.

Load Byte and Zero D-form

e_lbz    RT,D(RA)

Fields:        12 | RT | RA | D
Bit positions: 0 6 11 16 31

if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(D)
RT ← ^{56}0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0) + D. The byte in storage addressed by EA is loaded into RT_{56:63}. RT_{0:55} are set to 0.

Special Registers Altered:
None

Load Byte and Zero Short Form SD4-form

se_lbz   RZ,SD4(RX)

Fields:        08 | SD4 | RZ | RX
Bit positions: 0 4 8 12 15

EA ← (RX) + (^{60}0 || SD4)
RZ ← ^{56}0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RX) + SD4. The byte in storage addressed by EA is loaded into RZ_{56:63}. RZ_{0:55} are set to 0.

Special Registers Altered:
None

Load Byte and Zero with Update D8-form

e_lbzu   RT,D8(RA)

Fields:        06 | RT | RA | 0 | D8
Bit positions: 0 6 11 16 24 31

EA ← (RA) + EXTS(D8)
RT ← ^{56}0 || MEM(EA, 1)
RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The byte in storage addressed by EA is loaded into RT_{56:63}. RT_{0:55} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
None

Load Halfword Algebraic D-form

e_lha    RT,D(RA)

Fields:        14 | RT | RA | D
Bit positions: 0 6 11 16 31

if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(D)
RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
None

Load Halfword and Zero D-form

e_lhz    RT,D(RA)

Fields:        22 | RT | RA | D
Bit positions: 0 6 11 16 31

if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(D)
RT ← ^{48}0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are set to 0.

Special Registers Altered:
None

Load Halfword and Zero Short Form SD4-form

se_lhz   RZ,SD4(RX)

Fields:        10 | SD4 | RZ | RX
Bit positions: 0 4 8 12 15

EA ← (RX) + (^{59}0 || SD4 || 0)
RZ ← ^{48}0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RX) + (SD4 || 0). The halfword in storage addressed by EA is loaded into RZ_{48:63}. RZ_{0:47} are set to 0.

Special Registers Altered:
None

Load Halfword Algebraic with Update D8-form

e_lhau   RT,D8(RA)

Fields:        06 | RT | RA | 03 | D8
Bit positions: 0 6 11 16 24 31

EA ← (RA) + EXTS(D8)
RT ← EXTS(MEM(EA, 2))
RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
None

Load Halfword and Zero with Update D8-form

e_lhzu   RT,D8(RA)

Fields:        06 | RT | RA | 01 | D8
Bit positions: 0 6 11 16 24 31

EA ← (RA) + EXTS(D8)
RT ← ^{48}0 || MEM(EA, 2)
RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are set to 0.

EA is placed into RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
None

Load Word and Zero D-form

e_lwz    RT,D(RA)

Fields:        20 | RT | RA | D
Bit positions: 0 6 11 16 31

if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(D)
RT ← ^{32}0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0) + D. The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are set to 0.

Special Registers Altered:
None

Load Word and Zero Short Form SD4-form

se_lwz   RZ,SD4(RX)

Fields:        12 | SD4 | RZ | RX
Bit positions: 0 4 8 12 15

EA ← (RX) + (^{58}0 || SD4 || ^{2}0)
RZ ← ^{32}0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RX) + (SD4 || 00). The word in storage addressed by EA is loaded into RZ_{32:63}. RZ_{0:31} are set to 0.

Special Registers Altered:
None

Load Word and Zero with Update D8-form

e_lwzu   RT,D8(RA)

Fields:        06 | RT | RA | 02 | D8
Bit positions: 0 6 11 16 24 31

EA ← (RA) + EXTS(D8)
RT ← ^{32}0 || MEM(EA, 4)
RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
None

5.2 Fixed-Point Store Instructions

The fixed-point Store instructions compute the EA of the memory to be accessed as described in Section 2.1, "Data Storage Addressing Modes".

The contents of register RS or RZ are stored into the byte, halfword, word, or doubleword in storage addressed by EA.

Category VLE supports both Big- and Little-Endian byte ordering for data accesses.

Some fixed-point store instructions have an update form, in which register RA is updated with the effective address. For these forms, the following rules (from Book I) apply.
  - If RA≠0, the effective address is placed into register RA.
  - If RS=RA, the contents of register RS are copied to the target memory element and then EA is placed into register RA (RS).

The fixed-point Store instructions from Book I, stbx, stbux, sthx, sthux, stwx, and stwux are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.3 of Book I for the instruction definitions.

The fixed-point Store instructions from Book I, stdx and stdux are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.3 of Book I for the instruction definitions.
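The update-form store rules can be illustrated with a small model. The following is a minimal Python sketch, not part of the architecture definition: the helper names `exts8` and `store_word_update`, the dict used as a register file, and the bytearray used as storage are all hypothetical. It assumes Big-Endian byte ordering, 64-bit GPRs, and an EA formed as (RA) plus a sign-extended 8-bit displacement, and it treats RA=0 as an invalid form.

```python
MASK64 = (1 << 64) - 1

def exts8(d8):
    """Sign-extend an 8-bit displacement to a Python int."""
    d8 &= 0xFF
    return d8 - 0x100 if d8 & 0x80 else d8

def store_word_update(gpr, mem, rs, ra, d8):
    """Hypothetical model of an update-form word store:
    store the low-order word of RS at EA, then place EA into RA."""
    if ra == 0:
        raise ValueError("RA=0 is an invalid form for store with update")
    ea = (gpr[ra] + exts8(d8)) & MASK64
    word = gpr[rs] & 0xFFFF_FFFF               # (RS)32:63, read before RA changes
    mem[ea:ea + 4] = word.to_bytes(4, "big")   # Big-Endian ordering assumed
    gpr[ra] = ea                               # RA is updated after the store
    return ea

gpr = {1: 0x40, 5: 0x1234_5678}
mem = bytearray(0x100)
store_word_update(gpr, mem, rs=5, ra=1, d8=0xF8)   # displacement of -8
# RS = RA case: the old value of r1 is stored, then r1 receives EA
store_word_update(gpr, mem, rs=1, ra=1, d8=0x04)
```

In the second call the word written to storage is the value r1 held before the update, which is the ordering the RS=RA rule describes.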
Store Byte D-form

e_stb    RS,D(RA)

Fields:        13 | RS | RA | D
Bit positions: 0 6 11 16 31

if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(D)
MEM(EA, 1) ← (RS)_{56:63}

Let the effective address (EA) be the sum (RA|0) + D. (RS)_{56:63} are stored in the byte in storage addressed by EA.

Special Registers Altered:
None

Store Byte Short Form SD4-form

se_stb   RZ,SD4(RX)

Fields:        09 | SD4 | RZ | RX
Bit positions: 0 4 8 12 15

EA ← (RX) + EXTS(SD4)
MEM(EA, 1) ← (RZ)_{56:63}

Let the effective address (EA) be the sum (RX) + SD4. (RZ)_{56:63} are stored in the byte in storage addressed by EA.

Special Registers Altered:
None

Store Byte with Update D8-form

e_stbu   RS,D8(RA)

Fields:        06 | RS | RA | 04 | D8
Bit positions: 0 6 11 16 24 31

EA ← (RA) + EXTS(D8)
MEM(EA, 1) ← (RS)_{56:63}
RA ← EA

Let the effective address (EA) be the sum (RA) + D8. (RS)_{56:63} are stored in the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
None

Store Halfword D-form

e_sth    RS,D(RA)

Fields:        23 | RS | RA | D
Bit positions: 0 6 11 16 31

if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(D)
MEM(EA, 2) ← (RS)_{48:63}

Let the effective address (EA) be the sum (RA|0) + D. (RS)_{48:63} are stored in the halfword in storage addressed by EA.

Special Registers Altered:
None

Store Halfword Short Form SD4-form

se_sth   RZ,SD4(RX)

Fields:        11 | SD4 | RZ | RX
Bit positions: 0 4 8 12 15

EA ← (RX) + (^{59}0 || SD4 || 0)
MEM(EA, 2) ← (RZ)_{48:63}

Let the effective address (EA) be the sum (RX) + (SD4 || 0). (RZ)_{48:63} are stored in the halfword in storage addressed by EA.

Special Registers Altered:
None

Store Halfword with Update D8-form

e_sthu   RS,D8(RA)

Fields:        06 | RS | RA | 05 | D8
Bit positions: 0 6 11 16 24 31

EA ← (RA) + EXTS(D8)
MEM(EA, 2) ← (RS)_{48:63}
RA ← EA

Let the effective address (EA) be the sum (RA) + D8. (RS)_{48:63} are stored in the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
None

Store Word D-form

e_stw    RS,D(RA)

Fields:        21 | RS | RA | D
Bit positions: 0 6 11 16 31

if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(D)
MEM(EA, 4) ← (RS)_{32:63}

Let the effective address (EA) be the sum (RA|0) + D. (RS)_{32:63} are stored in the word in storage addressed by EA.

Special Registers Altered:
None

Store Word Short Form SD4-form

se_stw   RZ,SD4(RX)

Fields:        13 | SD4 | RZ | RX
Bit positions: 0 4 8 12 15

EA ← (RX) + (^{58}0 || SD4 || ^{2}0)
MEM(EA, 4) ← (RZ)_{32:63}

Let the effective address (EA) be the sum (RX) + (SD4 || 00). (RZ)_{32:63} are stored in the word in storage addressed by EA.

Special Registers Altered:
None

Store Word with Update D8-form

e_stwu   RS,D8(RA)

Fields:        06 | RS | RA | 06 | D8
Bit positions: 0 6 11 16 24 31

EA ← (RA) + EXTS(D8)
MEM(EA, 4) ← (RS)_{32:63}
RA ← EA

Let the effective address (EA) be the sum (RA) + D8. (RS)_{32:63} are stored in the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
None

5.3 Fixed-Point Load and Store with Byte Reversal Instructions

The fixed-point Load with Byte Reversal and Store with Byte Reversal instructions from Book I, lhbrx, lwbrx, sthbrx, and stwbrx are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I. See Section 3.3.4 of Book I for the instruction definitions.

5.4 Fixed-Point Load and Store Multiple Instructions

The Load/Store Multiple instructions have preferred forms; see Section 1.8.1 of Book I. In the preferred forms storage alignment satisfies the following rule.

  - The combination of the EA and RT (RS) is such that the low-order byte of GPR 31 is loaded (stored) from (into) the last byte of an aligned quadword in storage.
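The preferred-form rule can be restated arithmetically: with one word transferred per register from RT (RS) through GPR 31, the transfer must end exactly on a 16-byte boundary. The following is a minimal Python sketch of that check; the function name `is_preferred_form` is hypothetical, and the sketch assumes a 16-byte quadword and 4 bytes transferred per register.

```python
def is_preferred_form(ea, first_reg):
    """Return True if the word for GPR 31 ends on the last byte of an
    aligned quadword, given the starting EA and the first register."""
    if not 0 <= first_reg <= 31:
        raise ValueError("register number must be in 0..31")
    n_words = 32 - first_reg      # registers first_reg through 31
    end = ea + 4 * n_words        # address just past the last byte transferred
    return end % 16 == 0          # last byte is byte 15 of a quadword

# Four registers (r28..r31) starting at 0x0FF0 end at 0x0FFF.
aligned = is_preferred_form(0x0FF0, 28)
# The same registers starting 4 bytes later end at 0x1003.
misaligned = is_preferred_form(0x0FF4, 28)
```

The check depends on both operands together, so the same EA can be preferred for one starting register and not for another.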
Load Multiple Word D8-form Store Multiple Word D8-form e_lmw RT,D8(RA) e_stmw RS,D8(RA) 06 RT RA 08 D8 06 RS RA 9 D8 0 6 11 16 24 31 0 6 11 16 24 31 if RA = 0 then b 0 if RA = 0 then b 0 else b (RA) else b (RA) EA b + EXTS(D8) EA b + EXTS(D8) r RT r RS do while r 31 do while r 31 GPR(r) 32 0 || MEM(EA,4) MEM(EA,4) GPR(r)32:63 r r + 1 r r + 1 EA EA + 4 EA EA + 4 Let n = (32-RT). Let the effective address (EA) be the Let n = (32-RS). Let the effective address (EA) be the sum (RA|0) + D8. sum (RA|0) + D8. n consecutive words starting at EA are loaded into the n consecutive words starting at EA are stored from the low-order 32 bits of GPRs RT through 31. The high- low-order 32 bits of GPRs RS through 31. order 32 bits of these GPRs are set to zero. Special Registers Altered: If RA is in the range of registers to be loaded, including None the case in which RA = 0, the instruction form is invalid. Special Registers Altered: None 1154 Power ISATM Book VLE Version 2.06 5.5 Fixed-Point Arithmetic e_addic[.] and e_subfic[.] always set CA to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32- Instructions bit mode. The fixed-point Arithmetic instructions use the contents The fixed-point Arithmetic instructions from Book I, of the GPRs as source operands, and place results into add[.], addo[.], addc[.], addco[.], adde[.], addeo[.], GPRs, into status bits in the XER and into CR0. addme[.], addmeo[.], addze[.], addzeo[.], divw[.], divwo[.], divwu[.], divwuo[.], mulhw[.], mulhwu[.], The fixed-point Arithmetic instructions treat source mullw[.], mullwo[.] neg[.], nego[.], subf[.], subfo[.] operands as signed integers unless the instruction is subfe[.], subfeo[.], subfme[.], subfmeo[.], subfze[.], explicitly identified as performing an unsigned opera- subfzeo[.], subfc[.], and subfco[.] are available while tion. executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to The e_add2i. 
instruction and other Arithmetic instruc- those in Book I; see Section 3.3.8 of Book I for the tions with Rc=1 set the first three bits of CR0 to charac- instruction definitions. terize the result placed into the target register. In 64-bit mode, these bits are set by signed comparison of the The fixed-point Arithmetic instructions from Book I, result to 0. In 32-bit mode, these bits are set by signed mulld[.], mulldo[.], mulhd[.], mulhdu[.], muldu[.], comparison of the low-order 32 bits of the result to divd[.], divdo[.], divdu[.], and divduo[.] are available zero. while executing in VLE mode on 64-bit implementa- tions. The mnemonics, decoding, and semantics for those instructions are identical to these in Book I; see Section 3.3.8 of Book I for the instruction definitions. Chapter 5. Fixed-Point Instructions 1155 Version 2.06 Add Short Form RR-form Add Immediate D-form se_add RX,RY e_add16i RT,RA,SI 01 0 RY RX 07 RT RA SI 0 6 8 12 15 0 6 11 16 31 RX (RX) + (RY) RT (RA) + EXTS(SI) The sum (RX) + (RY) is placed into register RX. The sum (RA) + SI is placed into register RT. Special Registers Altered: Special Registers Altered: None None Add (2 operand) Immediate and Record Add (2 operand) Immediate Shifted I16A-form I16A-form e_add2i. RA,si e_add2is RA,si 28 si RA 17 si 28 si RA 18 si 0 6 11 16 21 31 0 6 11 16 21 31 RA (RA) + EXTS(si) RA (RA) + EXTS(si || 160) The sum (RA) + si is placed into register RT. The sum (RA) + (si || 0x0000) is placed into register RA. Special Registers Altered: Special Registers Altered: CR0 None Add Scaled Immediate SCI8-form Add Immediate Short Form OIM5-form e_addi RT,RA,sci8 (Rc=0) se_addi RX,oimm e_addi. RT,RA,sci8 (Rc=1) 08 0 OIM5 RX 06 RT RA 8 Rc F SCL UI8 0 6 7 12 15 0 6 11 16 20 21 22 24 31 oimm (590 || OIM5) + 1 sci8 56-SCL×8F || UI8 ||SCL×8F RX (RX) + oimm RT (RA) + sci8 The sum (RX) + oimm is placed into RX. The value of The sum (RA) + sci8 is placed into register RT. oimm must be in the range of 1 to 32. 
Special Registers Altered (e_addi, e_addi.): CR0 (if Rc=1)
Special Registers Altered (se_addi): None

Add Scaled Immediate Carrying    SCI8-form

e_addic RT,RA,sci8 (Rc=0)
e_addic. RT,RA,sci8 (Rc=1)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 9 (16:19) | Rc (20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RT ← (RA) + sci8

The sum (RA) + sci8 is placed into register RT.

Special Registers Altered: CR0 (if Rc=1), CA

Subtract    RR-form

se_sub RX,RY

Encoding: 01 (0:5) | 2 (6:7) | RY (8:11) | RX (12:15)

    RX ← (RX) + ¬(RY) + 1

The sum (RX) + ¬(RY) + 1 is placed into register RX.

Special Registers Altered: None

Subtract From Short Form    RR-form

se_subf RX,RY

Encoding: 01 (0:5) | 3 (6:7) | RY (8:11) | RX (12:15)

    RX ← ¬(RX) + (RY) + 1

The sum ¬(RX) + (RY) + 1 is placed into register RX.

Special Registers Altered: None

Subtract Immediate    OIM5-form

se_subi RX,oimm (Rc=0)
se_subi. RX,oimm (Rc=1)

Encoding: 09 (0:5) | Rc (6) | OIM5 (7:11) | RX (12:15)

    oimm ← (^{59}0 || OIM5) + 1
    RX ← (RX) + ¬oimm + 1

The sum (RX) + ¬oimm + 1 is placed into register RX. The value of oimm must be in the range 1 to 32.

Special Registers Altered: CR0 (if Rc=1)

Subtract From Scaled Immediate Carrying    SCI8-form

e_subfic RT,RA,sci8 (Rc=0)
e_subfic. RT,RA,sci8 (Rc=1)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 11 (16:19) | Rc (20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RT ← ¬(RA) + sci8 + 1

The sum ¬(RA) + sci8 + 1 is placed into register RT.

Special Registers Altered: CR0 (if Rc=1), CA

Multiply Low Scaled Immediate    SCI8-form

e_mulli RT,RA,sci8

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 20 (16:20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    prod_{0:127} ← (RA) × sci8
    RT ← prod_{64:127}

The 64-bit first operand is (RA). The 64-bit second operand is the sci8 operand. The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.

Special Registers Altered: None

Multiply (2 operand) Low Immediate    I16A-form

e_mull2i RA,si

Encoding: 28 (0:5) | si (6:10) | RA (11:15) | 20 (16:20) | si (21:31)

    prod_{0:127} ← (RA) × EXTS(si)
    RA ← prod_{64:127}

The 64-bit first operand is (RA). The 64-bit second operand is the sign-extended value of the si operand. The low-order 64 bits of the 128-bit product of the operands are placed into register RA.

Both operands and the product are interpreted as signed integers.

Special Registers Altered: None

Multiply Low Word Short Form    RR-form

se_mullw RX,RY

Encoding: 01 (0:5) | 1 (6:7) | RY (8:11) | RX (12:15)

    RX ← (RX)_{32:63} × (RY)_{32:63}

The 32-bit operands are the low-order 32 bits of RX and of RY. The 64-bit product of the operands is placed into register RX.

Both operands and the product are interpreted as signed integers.

Special Registers Altered: None

Negate Short Form    R-form

se_neg RX

Encoding: 0 (0:5) | 03 (6:11) | RX (12:15)

    RX ← ¬(RX) + 1

The sum ¬(RX) + 1 is placed into register RX.

If the processor is in 64-bit mode and register RX contains the most negative 64-bit number (0x8000_0000_0000_0000), the result is the most negative 64-bit number. Similarly, if the processor is in 32-bit mode and register RX contains the most negative 32-bit number (0x8000_0000), the result is the most negative 32-bit number.

Special Registers Altered: None

5.6 Fixed-Point Compare and Bit Test Instructions

The fixed-point Compare instructions compare the contents of register RA or register RX with one of the following:

- The value of the scaled immediate field sci8, formed from the F, UI8, and SCL fields as:
      sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
- The zero-extended value of the UI field
- The zero-extended value of the UI5 field
- The sign-extended value of the SI field
- The contents of register RB or register RY.

The fixed-point Bit Test instruction tests the bit specified by the UI5 instruction field and sets the CR0 field as follows.

    Bit  Name  Description
    0    LT    Always set to 0
    1    GT    RX_{ui5} = 1
    2    EQ    RX_{ui5} = 0
    3    SO    Summary overflow from the XER
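The sci8 formation rule used by the SCI8-form instructions can be illustrated with a short sketch. The following Python is only an illustration of the formula sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F; the function name `sci8` and the constant `MASK64` are hypothetical and are not part of the architecture.

```python
MASK64 = (1 << 64) - 1  # helper constant (assumption): 64-bit register width


def sci8(f: int, scl: int, ui8: int) -> int:
    """Form the 64-bit scaled immediate from the F, SCL, and UI8 fields.

    UI8 is placed in byte position SCL (0 = least significant byte) and
    every other bit of the 64-bit value is filled with copies of F.
    """
    if f not in (0, 1) or not 0 <= scl <= 3 or not 0 <= ui8 <= 0xFF:
        raise ValueError("field value out of range")
    shift = scl * 8                               # SCL×8 fill bits to the right of UI8
    fill = MASK64 if f else 0                     # 64 copies of F
    low = fill & ((1 << shift) - 1)               # ^{SCL×8}F
    high = fill & ~((1 << (shift + 8)) - 1) & MASK64  # ^{56-SCL×8}F
    return high | (ui8 << shift) | low
```

Under these assumptions, `sci8(0, 1, 0x12)` gives 0x1200 and `sci8(1, 0, 0xFE)` gives 0xFFFF_FFFF_FFFF_FFFE, which is how a small negative constant can be expressed with F=1.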
The following comparisons are signed: e_cmph, e_cmpi, e_cmp16i, e_cmph16i, se_cmp, se_cmph, and se_cmpi.

The following comparisons are unsigned: e_cmphl, e_cmpli, e_cmphl16i, e_cmpl16i, se_cmpli, se_cmpl, and se_cmphl.

The fixed-point Compare instructions from Book I, cmp and cmpl, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.9 of Book I for the instruction definitions.

Bit Test Immediate    IM5-form

se_btsti RX,UI5

Encoding: 25 (0:5) | 1 (6) | UI5 (7:11) | RX (12:15)

    a ← UI5
    b ← ^{a+32}0 || 1 || ^{31-a}0
    c ← (RX) & b
    if c = 0 then d ← 0b001
    else d ← 0b010
    CR0 ← d || XER_{SO}

Bit UI5+32 of register RX is tested for equality to '0' and the result is recorded in CR0. EQ is set if the tested bit is 0, LT is cleared, and GT is set to the inverse value of EQ.

Special Registers Altered: CR0

Compare Immediate Word    I16A-form

e_cmp16i RA,si

Encoding: 28 (0:5) | si (6:10) | RA (11:15) | 19 (16:20) | si (21:31)

    b ← EXTS(si)
    if (RA)_{32:63} < b_{32:63} then c ← 0b100
    if (RA)_{32:63} > b_{32:63} then c ← 0b010
    if (RA)_{32:63} = b_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RA are compared with si, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

Compare Scaled Immediate Word    SCI8-form

e_cmpi BF32,RA,sci8

Encoding: 06 (0:5) | 000 (6:8) | BF32 (9:10) | RA (11:15) | 21 (16:20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    if (RA)_{32:63} < sci8_{32:63} then c ← 0b100
    if (RA)_{32:63} > sci8_{32:63} then c ← 0b010
    if (RA)_{32:63} = sci8_{32:63} then c ← 0b001
    CR_{4×BF32+32:4×BF32+35} ← c || XER_{SO}

The low-order 32 bits of register RA are compared with sci8, treating operands as signed integers. The result of the comparison is placed into CR field BF32.

Special Registers Altered: CR field BF32

Compare Word    RR-form

se_cmp RX,RY

Encoding: 3 (0:5) | 0 (6:7) | RY (8:11) | RX (12:15)

    if (RX)_{32:63} < (RY)_{32:63} then c ← 0b100
    if (RX)_{32:63} > (RY)_{32:63} then c ← 0b010
    if (RX)_{32:63} = (RY)_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RX are compared with the low-order 32 bits of register RY, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

Compare Immediate Word Short Form    IM5-form

se_cmpi RX,UI5

Encoding: 10 (0:5) | 1 (6) | UI5 (7:11) | RX (12:15)

    b ← ^{59}0 || UI5
    if (RX)_{32:63} < b_{32:63} then c ← 0b100
    if (RX)_{32:63} > b_{32:63} then c ← 0b010
    if (RX)_{32:63} = b_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RX are compared with UI5, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

Compare Logical Immediate Word    I16A-form

e_cmpl16i RA,ui

Encoding: 28 (0:5) | ui (6:10) | RA (11:15) | 21 (16:20) | ui (21:31)

    b ← ^{48}0 || ui
    if (RA)_{32:63} <u b_{32:63} then c ← 0b100
    if (RA)_{32:63} >u b_{32:63} then c ← 0b010
    if (RA)_{32:63} = b_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RA are compared with ui, treating operands as unsigned integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

Compare Logical Scaled Immediate Word    SCI8-form

e_cmpli BF32,RA,sci8

Encoding: 06 (0:5) | 001 (6:8) | BF32 (9:10) | RA (11:15) | 21 (16:20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    if (RA)_{32:63} <u sci8_{32:63} then c ← 0b100
    if (RA)_{32:63} >u sci8_{32:63} then c ← 0b010
    if (RA)_{32:63} = sci8_{32:63} then c ← 0b001
    CR_{4×BF32+32:4×BF32+35} ← c || XER_{SO}

The low-order 32 bits of register RA are compared with sci8, treating operands as unsigned integers. The result of the comparison is placed into CR field BF32.

Compare Logical Word    RR-form

se_cmpl RX,RY

Encoding: 3 (0:5) | 1 (6:7) | RY (8:11) | RX (12:15)

    if (RX)_{32:63} <u (RY)_{32:63} then c ← 0b100
    if (RX)_{32:63} >u (RY)_{32:63} then c ← 0b010
    if (RX)_{32:63} = (RY)_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RX are compared with the low-order 32 bits of register RY, treating operands as unsigned integers. The result of the comparison is placed into CR0.
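The word compare instructions in this section all produce the same 4-bit result, c || XER_{SO}, and differ only in operand source and signedness. As a rough illustration, the following Python sketch models that common step; the names `compare_word` and `to_signed32` are hypothetical helpers, and the register and immediate values are assumed to have already been fetched and extended.

```python
def to_signed32(x: int) -> int:
    """Interpret the low-order 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def compare_word(a: int, b: int, so: int, signed: bool = True) -> int:
    """Return the 4-bit CR field value c || SO for a word compare.

    Only the low-order 32 bits of a and b take part, as in the RTL
    above; `so` models the XER summary overflow bit.
    """
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    if signed:
        a, b = to_signed32(a), to_signed32(b)
    if a < b:
        c = 0b100   # LT
    elif a > b:
        c = 0b010   # GT
    else:
        c = 0b001   # EQ
    return (c << 1) | (so & 1)
```

With these helpers, 0xFFFF_FFFF compared with 1 yields LT when treated as signed (the se_cmp style) and GT when treated as unsigned (the se_cmpl style).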
Special Registers Altered (se_cmpl): CR0
Special Registers Altered (e_cmpli): CR field BF32

Compare Logical Immediate Word    OIM5-form

se_cmpli RX,oimm

Encoding: 08 (0:5) | 1 (6) | OIM5 (7:11) | RX (12:15)

    oimm ← ^{59}0 || (OIM5 + 1)
    if (RX)_{32:63} <u oimm_{32:63} then c ← 0b100
    if (RX)_{32:63} >u oimm_{32:63} then c ← 0b010
    if (RX)_{32:63} = oimm_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RX are compared with oimm, treating operands as unsigned integers. The result of the comparison is placed into CR0. The value of oimm must be in the range of 1 to 32.

Special Registers Altered: CR0

Compare Halfword    X-form

e_cmph BF,RA,RB

Encoding: 31 (0:5) | BF (6:8) | 0 (9:10) | RA (11:15) | RB (16:20) | 14 (21:30) | / (31)

    a ← EXTS((RA)_{48:63})
    b ← EXTS((RB)_{48:63})
    if a < b then c ← 0b100
    if a > b then c ← 0b010
    if a = b then c ← 0b001
    CR_{4×BF+32:4×BF+35} ← c || XER_{SO}

The low-order 16 bits of register RA are compared with the low-order 16 bits of register RB, treating operands as signed integers. The result of the comparison is placed into CR field BF.

Special Registers Altered: CR field BF

Compare Halfword Short Form    RR-form

se_cmph RX,RY

Encoding: 3 (0:5) | 2 (6:7) | RY (8:11) | RX (12:15)

    a ← EXTS((RX)_{48:63})
    b ← EXTS((RY)_{48:63})
    if a < b then c ← 0b100
    if a > b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RX are compared with the low-order 16 bits of register RY, treating operands as signed integers. The result of the comparison is placed into CR0.

Compare Halfword Immediate    I16A-form

e_cmph16i RA,si

Encoding: 28 (0:5) | si (6:10) | RA (11:15) | 22 (16:20) | si (21:31)

    a ← EXTS((RA)_{48:63})
    b ← EXTS(si)
    if a < b then c ← 0b100
    if a > b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RA are compared with si, treating operands as signed integers. The result of the comparison is placed into CR0.
Special Registers Altered (se_cmph): CR0
Special Registers Altered (e_cmph16i): CR0

Compare Halfword Logical    X-form

e_cmphl BF,RA,RB

Encoding: 31 (0:5) | BF (6:8) | 0 (9:10) | RA (11:15) | RB (16:20) | 46 (21:30) | / (31)

    a ← EXTZ((RA)_{48:63})
    b ← EXTZ((RB)_{48:63})
    if a <u b then c ← 0b100
    if a >u b then c ← 0b010
    if a = b then c ← 0b001
    CR_{4×BF+32:4×BF+35} ← c || XER_{SO}

The low-order 16 bits of register RA are compared with the low-order 16 bits of register RB, treating operands as unsigned integers. The result of the comparison is placed into CR field BF.

Special Registers Altered: CR field BF

Compare Halfword Logical Short Form    RR-form

se_cmphl RX,RY

Encoding: 3 (0:5) | 3 (6:7) | RY (8:11) | RX (12:15)

    a ← (RX)_{48:63}
    b ← (RY)_{48:63}
    if a <u b then c ← 0b100
    if a >u b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RX are compared with the low-order 16 bits of register RY, treating operands as unsigned integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

Compare Halfword Logical Immediate    I16A-form

e_cmphl16i RA,ui

Encoding: 28 (0:5) | ui (6:10) | RA (11:15) | 23 (16:20) | ui (21:31)

    a ← ^{48}0 || (RA)_{48:63}
    b ← ^{48}0 || ui
    if a <u b then c ← 0b100
    if a >u b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RA are compared with the ui field, treating operands as unsigned integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

5.7 Fixed-Point Trap Instructions

The fixed-point Trap instruction from Book I, tw, is available while executing in VLE mode. The mnemonics, decoding, and semantics for this instruction are identical to those in Book I; see Section 3.3.10 of Book I for the instruction definition.

The fixed-point Trap instruction from Book I, td, is available while executing in VLE mode on 64-bit implementations. The mnemonic, decoding, and semantics for the td instruction are identical to those in Book I; see Section 3.3.10 of Book I for the instruction definitions.

5.8 Fixed-Point Select Instruction

The fixed-point Select instruction provides a means to select one of two registers and place the result in a destination register under the control of a predicate value supplied by a CR bit.
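As an illustration of this predicate-controlled selection, the following Python sketch models the behaviour as the editor understands it from the Book I definition of isel (RA=0 is read as the value zero, and the predicate is CR bit BC+32). The function names and the list-based register file are hypothetical modelling choices, not part of the architecture.

```python
def cr_bit(cr: int, bc: int) -> int:
    """Return CR bit BC+32.

    Assumption: CR is held as a 32-bit integer whose most significant
    bit corresponds to CR bit 32.
    """
    return (cr >> (31 - bc)) & 1


def isel(gpr: list, cr: int, rt: int, ra: int, rb: int, bc: int) -> None:
    """Select (RA|0) or (RB) into RT under control of CR bit BC+32."""
    a = 0 if ra == 0 else gpr[ra]          # RA=0 selects the constant zero
    gpr[rt] = a if cr_bit(cr, bc) else gpr[rb]
```

For example, with the CR0 EQ bit set (bc=2), `isel(gpr, cr, 5, 3, 4, 2)` copies the contents of register 3 into register 5; with the bit clear it copies register 4 instead. This lets a compare followed by a select replace a short conditional branch.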
The fixed-point Select instruction from Book I, isel, is available while executing in VLE mode. The mnemonics, decoding, and semantics for this instruction are identical to those in Book I; see Book I for the instruction definition.

5.9 Fixed-Point Logical, Bit, and Move Instructions

The Logical instructions perform bit-parallel operations on 64-bit operands. The Bit instructions manipulate a bit, or create a bit mask, in a register. The Move instructions move a register or an immediate value into a register.

The X-form Logical instructions with Rc=1, the SCI8-form Logical instructions with Rc=1, the RR-form Logical instructions with Rc=1, the e_and2i. instruction, and the e_and2is. instruction set the first three bits of CR field 0 in the same way as the Arithmetic instructions described in Section 5.5, "Fixed-Point Arithmetic Instructions". (Also see Section 4.1.1.) The Logical instructions do not change the SO, OV, and CA bits in the XER.

The fixed-point Logical instructions from Book I, and[.], or[.], xor[.], nand[.], nor[.], eqv[.], andc[.], orc[.], extsb[.], extsh[.], cntlzw[.], and popcntb are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.12 of Book I for the instruction definitions.

The fixed-point Logical instructions from Book I, extsw[.] and cntlzd[.], are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.12 of Book I for the instruction definitions.

AND (two operand) Immediate    I16L-form

e_and2i. RT,ui

Encoding: 28 (0:5) | RT (6:10) | ui (11:15) | 25 (16:20) | ui (21:31)

    RT ← (RT) & (^{48}0 || ui)

The contents of register RT are ANDed with ^{48}0 || ui and the result is placed into register RT.

AND (2 operand) Immediate Shifted    I16L-form

e_and2is. RT,ui

Encoding: 28 (0:5) | RT (6:10) | ui (11:15) | 29 (16:20) | ui (21:31)

    RT ← (RT) & (^{32}0 || ui || ^{16}0)
For e_and2is., the contents of register RT are ANDed with ^{32}0 || ui || ^{16}0 and the result is placed into register RT.

Special Registers Altered (e_and2i.): CR0
Special Registers Altered (e_and2is.): CR0

AND Immediate Short Form    IM5-form

se_andi RX,UI5

Encoding: 11 (0:5) | 1 (6) | UI5 (7:11) | RX (12:15)

    RX ← (RX) & (^{59}0 || UI5)

The contents of register RX are ANDed with ^{59}0 || UI5 and the result is placed into register RX.

Special Registers Altered: None

AND Scaled Immediate    SCI8-form

e_andi RA,RS,sci8 (Rc=0)
e_andi. RA,RS,sci8 (Rc=1)

Encoding: 06 (0:5) | RS (6:10) | RA (11:15) | 12 (16:19) | Rc (20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RA ← (RS) & sci8

The contents of register RS are ANDed with sci8 and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

OR (two operand) Immediate    I16L-form

e_or2i RT,ui

Encoding: 28 (0:5) | RT (6:10) | ui (11:15) | 24 (16:20) | ui (21:31)

    RT ← (RT) | (^{48}0 || ui)

The contents of register RT are ORed with ^{48}0 || ui and the result is placed into register RT.

Special Registers Altered: None

OR (2 operand) Immediate Shifted    I16L-form

e_or2is RT,ui

Encoding: 28 (0:5) | RT (6:10) | ui (11:15) | 26 (16:20) | ui (21:31)

    RT ← (RT) | (^{32}0 || ui || ^{16}0)

The contents of register RT are ORed with ^{32}0 || ui || ^{16}0 and the result is placed into register RT.

Special Registers Altered: None

OR Scaled Immediate    SCI8-form

e_ori RA,RS,sci8 (Rc=0)
e_ori. RA,RS,sci8 (Rc=1)

Encoding: 06 (0:5) | RS (6:10) | RA (11:15) | 13 (16:19) | Rc (20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RA ← (RS) | sci8

The contents of register RS are ORed with sci8 and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

XOR Scaled Immediate    SCI8-form

e_xori RA,RS,sci8 (Rc=0)
e_xori. RA,RS,sci8 (Rc=1)

Encoding: 06 (0:5) | RS (6:10) | RA (11:15) | 14 (16:19) | Rc (20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RA ← (RS) ⊕ sci8

The contents of register RS are XORed with sci8 and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

AND Short Form    RR-form

se_and RX,RY (Rc=0)
se_and. RX,RY (Rc=1)

Encoding: 17 (0:5) | 1 (6) | Rc (7) | RY (8:11) | RX (12:15)

    RX ← (RX) & (RY)

The contents of register RX are ANDed with the contents of register RY and the result is placed into register RX.

Special Registers Altered: CR0 (if Rc=1)

AND with Complement Short Form    RR-form

se_andc RX,RY

Encoding: 17 (0:5) | 1 (6:7) | RY (8:11) | RX (12:15)

    RX ← (RX) & ¬(RY)

The contents of register RX are ANDed with the complement of the contents of register RY and the result is placed into register RX.

Special Registers Altered: None

OR Short Form    RR-form

se_or RX,RY

Encoding: 17 (0:5) | 0 (6:7) | RY (8:11) | RX (12:15)

    RX ← (RX) | (RY)

The contents of register RX are ORed with the contents of register RY and the result is placed into register RX.

Special Registers Altered: None

NOT Short Form    R-form

se_not RX

Encoding: 0 (0:5) | 02 (6:11) | RX (12:15)

    RX ← ¬(RX)

The contents of RX are complemented and placed into register RX.

Special Registers Altered: None

Bit Clear Immediate    IM5-form

se_bclri RX,UI5

Encoding: 24 (0:5) | 0 (6) | UI5 (7:11) | RX (12:15)

    a ← UI5
    RX ← (RX) & (^{a+32}1 || 0 || ^{31-a}1)

Bit UI5+32 of register RX is set to 0.

Special Registers Altered: None

Bit Generate Immediate    IM5-form

se_bgeni RX,UI5

Encoding: 24 (0:5) | 1 (6) | UI5 (7:11) | RX (12:15)

    a ← UI5
    RX ← (^{a+32}0 || 1 || ^{31-a}0)

Bit UI5+32 of register RX is set to 1. All other bits in register RX are set to 0.

Special Registers Altered: None

Bit Mask Generate Immediate    IM5-form

se_bmaski RX,UI5

Encoding: 11 (0:5) | 0 (6) | UI5 (7:11) | RX (12:15)

    a ← UI5
    if a = 0 then RX ← ^{64}1
    else RX ← ^{64-a}0 || ^{a}1

If UI5 is not zero, the low-order UI5 bits are set to 1 in register RX and all other bits in register RX are set to 0. If UI5 is 0, all bits in register RX are set to 1.

Special Registers Altered: None

Bit Set Immediate    IM5-form

se_bseti RX,UI5

Encoding: 25 (0:5) | 0 (6) | UI5 (7:11) | RX (12:15)

    a ← UI5
    RX ← (RX) | (^{a+32}0 || 1 || ^{31-a}0)

Bit UI5+32 of register RX is set to 1.

Special Registers Altered: None

Extend Sign Byte Short Form    R-form

se_extsb RX

Encoding: 0 (0:5) | 13 (6:11) | RX (12:15)

    s ← (RX)_{56}
    RX ← ^{56}s || (RX)_{56:63}

(RX)_{56:63} are placed into RX_{56:63}. Bit 56 of register RX is placed into RX_{0:55}.

Special Registers Altered: None

Extend Sign Halfword Short Form    R-form

se_extsh RX

Encoding: 0 (0:5) | 15 (6:11) | RX (12:15)

    s ← (RX)_{48}
    RX ← ^{48}s || (RX)_{48:63}

(RX)_{48:63} are placed into RX_{48:63}. Bit 48 of register RX is placed into RX_{0:47}.

Special Registers Altered: None

Extend Zero Byte    R-form

se_extzb RX

Encoding: 0 (0:5) | 12 (6:11) | RX (12:15)

    RX ← ^{56}0 || (RX)_{56:63}

(RX)_{56:63} are placed into RX_{56:63}. RX_{0:55} are set to 0.

Special Registers Altered: None

Extend Zero Halfword    R-form

se_extzh RX

Encoding: 0 (0:5) | 14 (6:11) | RX (12:15)

    RX ← ^{48}0 || (RX)_{48:63}

(RX)_{48:63} are placed into RX_{48:63}. RX_{0:47} are set to 0.

Special Registers Altered: None

Load Immediate    LI20-form

e_li RT,LI20

Encoding: 28 (0:5) | RT (6:10) | li20 (11:15) | 0 (16) | li20 (17:20) | li20 (21:31)

    RT ← EXTS(li20_{1:4} || li20_{5:8} || li20_{0} || li20_{9:19})

The sign-extended LI20 field is placed into RT.

Special Registers Altered: None

Load Immediate Short Form    IM7-form

se_li RX,UI7

Encoding: 09 (0:4) | UI7 (5:11) | RX (12:15)

    RX ← ^{57}0 || UI7

The zero-extended UI7 field is placed into RX.

Special Registers Altered: None

Load Immediate Shifted    I16L-form

e_lis RT,ui

Encoding: 28 (0:5) | RT (6:10) | ui (11:15) | 28 (16:20) | ui (21:31)

    RT ← ^{32}0 || ui || ^{16}0

The zero-extended value of ui shifted left 16 bits is placed into RT.

Special Registers Altered: None

Move from Alternate Register    RR-form

se_mfar RX,ARY

Encoding: 0 (0:5) | 3 (6:7) | ARY (8:11) | RX (12:15)

    r ← ARY+8
    RX ← GPR(r)

The contents of register ARY+8 are placed into RX. ARY specifies a register in the range R8:R23.

Special Registers Altered: None

Move Register    RR-form

se_mr RX,RY

Encoding: 0 (0:5) | 1 (6:7) | RY (8:11) | RX (12:15)

    RX ← (RY)

The contents of register RY are placed into RX.

Special Registers Altered: None

Move To Alternate Register    RR-form

se_mtar ARX,RY

Encoding: 0 (0:5) | 2 (6:7) | RY (8:11) | ARX (12:15)

    r ← ARX+8
    GPR(r) ← (RY)

The contents of register RY are placed into register ARX+8.
For se_mtar, ARX specifies a register in the range R8:R23.

Special Registers Altered (se_mtar): None

5.10 Fixed-Point Rotate and Shift Instructions

The fixed-point Shift instructions from Book I, slw[.], srw[.], srawi[.], and sraw[.], are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.13.2 of Book I for the instruction definitions.

The fixed-point Shift instructions from Book I, sld[.], srd[.], sradi[.], and srad[.], are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.13.2 of Book I for the instruction definitions.

Rotate Left Word    X-form

e_rlw RA,RS,RB (Rc=0)
e_rlw. RA,RS,RB (Rc=1)

Encoding: 31 (0:5) | RS (6:10) | RA (11:15) | RB (16:20) | 280 (21:30) | Rc (31)

    n ← (RB)_{59:63}
    RA ← ROTL32((RS)_{32:63}, n)

The contents of register RS are rotated_{32} left the number of bits specified by (RB)_{59:63} and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

Rotate Left Word Immediate    X-form

e_rlwi RA,RS,SH (Rc=0)
e_rlwi. RA,RS,SH (Rc=1)

Encoding: 31 (0:5) | RS (6:10) | RA (11:15) | SH (16:20) | 312 (21:30) | Rc (31)

    n ← SH
    RA ← ROTL32((RS)_{32:63}, n)

The contents of register RS are rotated_{32} left SH bits and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

Rotate Left Word Immediate then Mask Insert    M-form

e_rlwimi RA,RS,SH,MB,ME

Encoding: 29 (0:5) | RS (6:10) | RA (11:15) | SH (16:20) | MB (21:25) | ME (26:30) | 0 (31)

    n ← SH
    r ← ROTL32((RS)_{32:63}, n)
    m ← MASK(MB+32, ME+32)
    RA ← r&m | (RA)&¬m

The contents of register RS are rotated_{32} left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data is inserted into register RA under control of the generated mask.

Special Registers Altered: None

Rotate Left Word Immediate then AND with Mask    M-form

e_rlwinm RA,RS,SH,MB,ME

Encoding: 29 (0:5) | RS (6:10) | RA (11:15) | SH (16:20) | MB (21:25) | ME (26:30) | 1 (31)

    n ← SH
    r ← ROTL32((RS)_{32:63}, n)
    m ← MASK(MB+32, ME+32)
    RA ← r & m

The contents of register RS are rotated_{32} left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data is ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered: None

Shift Left Word Immediate    X-form

e_slwi RA,RS,SH (Rc=0)
e_slwi. RA,RS,SH (Rc=1)

Encoding: 31 (0:5) | RS (6:10) | RA (11:15) | SH (16:20) | 56 (21:30) | Rc (31)

    n ← SH
    r ← ROTL32((RS)_{32:63}, n)
    m ← MASK(32, 63-n)
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted left SH bits. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RA_{32:63}. RA_{0:31} are set to 0.

Special Registers Altered: CR0 (if Rc=1)

Shift Left Word Immediate Short Form    IM5-form

se_slwi RX,UI5

Encoding: 27 (0:5) | 0 (6) | UI5 (7:11) | RX (12:15)

    n ← UI5
    r ← ROTL32((RX)_{32:63}, n)
    m ← MASK(32, 63-n)
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted left UI5 bits. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RX_{32:63}. RX_{0:31} are set to 0.

Special Registers Altered: None

Shift Left Word    RR-form

se_slw RX,RY

Encoding: 16 (0:5) | 2 (6:7) | RY (8:11) | RX (12:15)

    n ← (RY)_{58:63}
    r ← ROTL32((RX)_{32:63}, n)
    if (RY)_{58} = 0 then m ← MASK(32, 63-n)
    else m ← ^{64}0
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted left the number of bits specified by (RY)_{58:63}. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RX_{32:63}. RX_{0:31} are set to 0. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered: None

Shift Right Algebraic Word Immediate    IM5-form

se_srawi RX,UI5

Encoding: 26 (0:5) | 1 (6) | UI5 (7:11) | RX (12:15)

    n ← UI5
    r ← ROTL32((RX)_{32:63}, 64-n)
    m ← MASK(n+32, 63)
    s ← (RX)_{32}
    RX ← r&m | (^{64}s)&¬m
    CA ← s & ((r&¬m)_{32:63} ≠ 0)

The contents of the low-order 32 bits of register RX are shifted right UI5 bits. Bits shifted out of position 63 are lost, and bit 32 of RX is replicated to fill the vacated positions on the left. Bit 32 of RX is replicated to fill RX_{0:31} and the 32-bit result is placed into RX_{32:63}. CA is set to 1 if the low-order 32 bits of register RX contain a negative value and any 1-bits are shifted out of bit position 63; otherwise CA is set to 0. A shift amount of zero causes RX to receive EXTS((RX)_{32:63}), and CA to be set to 0.

Special Registers Altered: CA

Shift Right Word Immediate    X-form

e_srwi RA,RS,SH (Rc=0)
e_srwi. RA,RS,SH (Rc=1)

Encoding: 31 (0:5) | RS (6:10) | RA (11:15) | SH (16:20) | 568 (21:30) | Rc (31)

    n ← SH
    r ← ROTL32((RS)_{32:63}, 64-n)
    m ← MASK(n+32, 63)
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RA_{32:63}. RA_{0:31} are set to 0.

Special Registers Altered: CR0 (if Rc=1)

Shift Right Algebraic Word    RR-form

se_sraw RX,RY

Encoding: 16 (0:5) | 1 (6:7) | RY (8:11) | RX (12:15)

    n ← (RY)_{59:63}
    r ← ROTL32((RX)_{32:63}, 64-n)
    if (RY)_{58} = 0 then m ← MASK(n+32, 63)
    else m ← ^{64}0
    s ← (RX)_{32}
    RX ← r&m | (^{64}s)&¬m
    CA ← s & ((r&¬m)_{32:63} ≠ 0)

The contents of the low-order 32 bits of register RX are shifted right the number of bits specified by (RY)_{58:63}. Bits shifted out of position 63 are lost, and bit 32 of RX is replicated to fill the vacated positions on the left. Bit 32 of RX is replicated to fill RX_{0:31} and the 32-bit result is placed into RX_{32:63}. CA is set to 1 if the low-order 32 bits of register RX contain a negative value and any 1-bits are shifted out of bit position 63; otherwise CA is set to 0. A shift amount of zero causes RX to receive EXTS((RX)_{32:63}), and CA to be set to 0. Shift amounts from 32 to 63 give a result of 64 sign bits, and cause CA to receive the sign bit of (RX)_{32:63}.
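The algebraic right shift described for se_sraw, including the CA rule and the behaviour for shift amounts of 32 to 63, can be modelled with ordinary integer arithmetic. The Python below is a sketch under that reading of the description, not a transcription of the RTL; the function name `se_sraw` and the tuple return value are modelling assumptions.

```python
MASK64 = (1 << 64) - 1  # helper constant (assumption): 64-bit register width


def se_sraw(rx: int, ry: int):
    """Model se_sraw: return (new RX, CA) for 64-bit register values."""
    word = rx & 0xFFFFFFFF                 # (RX)32:63
    amount = ry & 0x3F                     # (RY)58:63
    sign = word >> 31                      # (RX)32
    signed = word - (1 << 32) if sign else word
    if amount < 32:
        result = signed >> amount          # arithmetic shift keeps the sign
        shifted_out = word & ((1 << amount) - 1)
        ca = 1 if sign and shifted_out else 0
    else:
        result = -1 if sign else 0         # 64 copies of the sign bit
        ca = sign
    return result & MASK64, ca
```

For instance, shifting 0xFFFF_FFF5 (that is, -11 as a word) right by one gives 0xFFFF_FFFF_FFFF_FFFA with CA=1, because a 1-bit was shifted out of a negative value.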
Special Registers Altered (se_sraw): CA

Shift Right Word Immediate Short Form    IM5-form

se_srwi RX,UI5

Encoding: 26 (0:5) | 0 (6) | UI5 (7:11) | RX (12:15)

    n ← UI5
    r ← ROTL32((RX)_{32:63}, 64-n)
    m ← MASK(n+32, 63)
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted right UI5 bits. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RX_{32:63}. RX_{0:31} are set to 0.

Special Registers Altered: None

Shift Right Word    RR-form

se_srw RX,RY

Encoding: 16 (0:5) | 0 (6:7) | RY (8:11) | RX (12:15)

    n ← (RY)_{59:63}
    r ← ROTL32((RX)_{32:63}, 64-n)
    if (RY)_{58} = 0 then m ← MASK(n+32, 63)
    else m ← ^{64}0
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted right the number of bits specified by (RY)_{58:63}. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RX_{32:63}. RX_{0:31} are set to 0. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered: None

5.11 Move To/From System Register Instructions

The VLE category provides 16-bit forms of instructions to move to/from the LR and CTR.

The fixed-point Move To/From System Register instructions from Book III-E, mfspr, mtspr, mfdcr, mtdcr, mtmsr, mfmsr, wrtee, and wrteei, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book III-E; see Section 5.4.1 of Book III-E for the instruction definitions.

The fixed-point Move To/From System Register instructions from Book I, mfspr, mtcrf, mfcr, mfdcrx, mtocrf, mfocrf, mcrxr, mtdcrux, mfdcrux, and mtspr, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.15 of Book I for the instruction definitions.
Move From Count Register    R-form

se_mfctr RX

Encoding: 0 (0:5) | 10 (6:11) | RX (12:15)

    RX ← CTR

The CTR contents are placed into register RX.

Special Registers Altered: None

Move From Link Register    R-form

se_mflr RX

Encoding: 0 (0:5) | 8 (6:11) | RX (12:15)

    RX ← LR

The LR contents are placed into register RX.

Special Registers Altered: None

Move To Count Register    R-form

se_mtctr RX

Encoding: 0 (0:5) | 11 (6:11) | RX (12:15)

    CTR ← (RX)

The contents of register RX are placed into the CTR.

Special Registers Altered: CTR

Move To Link Register    R-form

se_mtlr RX

Encoding: 0 (0:5) | 9 (6:11) | RX (12:15)

    LR ← (RX)

The contents of register RX are placed into the LR.

Special Registers Altered: LR

Chapter 6. Storage Control Instructions

6.1 Storage Synchronization Instructions
6.2 Cache Management Instructions
6.3 Cache Locking Instructions
6.4 TLB Management Instructions
6.5 Instruction Alignment and Byte Ordering

6.1 Storage Synchronization Instructions

The memory synchronization instructions implemented by category VLE are identical in semantics to those defined in Book II and Book III-E. The se_isync instruction is defined by category VLE, but has the same semantics as isync.

The Load and Reserve and Store Conditional instructions from Book II, lbarx, lharx, lwarx, stbcx., sthcx., and stwcx., are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Section 4.4.2 of Book II for the instruction definitions.

The Load and Reserve and Store Conditional instructions from Book II, ldarx and stdcx., are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Section 4.4.2 of Book II for the instruction definitions.

The Memory Barrier instructions from Book II, sync and mbar, are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Section 4.4.3 of Book II for the instruction definitions.

The wait instruction from Book II is available while executing in VLE mode if the category Wait is implemented. The mnemonics, decoding, and semantics for wait are identical to those in Book II; see Section 4.4 of Book II for the instruction definition.

Instruction Synchronize    C-form

se_isync

Encoding: 01 (0:15)

Executing an se_isync instruction ensures that all instructions preceding the se_isync instruction have completed before the se_isync instruction completes, and that no subsequent instructions are initiated until after the se_isync instruction completes. It also ensures that all instruction cache block invalidations caused by icbi instructions preceding the se_isync instruction have been performed with respect to the processor executing the se_isync instruction, and then causes any prefetched instructions to be discarded.

Except as described in the preceding sentence, the se_isync instruction may complete before storage accesses associated with instructions preceding the se_isync instruction have been performed.

This instruction is context synchronizing.

The se_isync instruction has identical semantics to the Book II isync instruction, but has a different encoding.

Special Registers Altered: None

6.2 Cache Management Instructions

Cache management instructions implemented by category VLE are identical to those defined in Book II and Book III-E.

The Cache Management instructions from Book II, dcba, dcbf, dcbst, dcbt, dcbtst, dcbz, icbi, and icbt, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book II; see Section 4.3 of Book II for the instruction definitions.

The Cache Management instruction from Book III-E, dcbi, is available while executing in VLE mode. The mnemonics, decoding, and semantics for this instruction are identical to those in Book III-E; see Section 6.11.1 of Book III-E for the instruction definition.

6.3 Cache Locking Instructions

Cache locking instructions implemented by category VLE are identical to those defined in Book III-E. If the Cache Locking instructions are implemented in category VLE, the category Embedded Cache Locking must also be implemented.

The Cache Locking instructions from Book III-E, dcbtls, dcbtstls, dcblc, icbtls, and icblc, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book III-E; see Section 6.11.2 of Book III-E for the instruction definitions.

6.4 TLB Management Instructions

The TLB Management instructions implemented by category VLE are identical to those defined in Book III-E.

The TLB Management instructions from Book III-E, tlbre, tlbwe, tlbivax, tlbilx, tlbsync, tlbsrx., and tlbsx, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book III-E. See Section 6.11.4.9 of Book III-E.

Instructions and resources described in Chapter 6 of Book III-E are available if the appropriate category is implemented.

6.5 Instruction Alignment and Byte Ordering

Only Big-Endian instruction memory is supported when executing from a page of VLE instructions. Attempting to fetch VLE instructions from a page marked as Little-Endian generates an instruction storage interrupt byte-ordering exception.

Chapter 7. Additional Categories Available in VLE

7.1 Move Assist
7.2 Vector
7.3 Signal Processing Engine
7.4 Embedded Floating Point
7.5 Embedded Hypervisor
7.6 Legacy Move Assist
7.7 External PID
7.8 Embedded Performance Monitor
7.9 Processor Control
7.10 Decorated Storage
7.11 Embedded Cache Initialization
7.12 Embedded Cache Debug

Instructions and resources from categories other than Base and Embedded are available in VLE. These include categories for which all the instructions in the category use primary opcode 4 or primary opcode 31.

7.1 Move Assist

Move Assist instructions implemented by category VLE are identical to those defined in Book I. If category Move Assist is supported in non-VLE mode, Move Assist instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.6 of Book I for the instruction definitions.

7.2 Vector

Vector instructions implemented by category VLE are identical to those defined in Book I. If category Vector is supported in non-VLE mode, Vector instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 6 of Book I for the instruction definitions.

7.3 Signal Processing Engine

Signal Processing Engine instructions implemented by category VLE are identical to those defined in Book I. If category Signal Processing Engine is supported in non-VLE mode, Signal Processing Engine instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 8 of Book I for the instruction definitions.

7.4 Embedded Floating Point

Embedded Floating Point instructions implemented by category VLE are identical to those defined in Book I. If category SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, or SPE.Embedded Float Vector is supported in non-VLE mode, the appropriate Embedded Floating Point instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 9 of Book I for the instruction definitions.

7.5 Embedded Hypervisor

Embedded Hypervisor instructions implemented by category VLE are not identical to those defined in Book III-E. The ehpriv instruction is identical in mnemonics, decoding, and semantics to the instruction defined in Book III-E. See Section 4.3.1 of Book III-E for the instruction definition. The sc instruction, which provides a LEV field for executing calls to the hypervisor software, is implemented as e_sc, which is defined in Section 4.3 of Book VLE. The rfgi instruction is implemented as se_rfgi, which is also defined in Section 4.3 of Book VLE. If category Embedded Hypervisor is supported in non-VLE mode, Embedded Hypervisor instructions are also supported in VLE mode.

7.6 Legacy Move Assist

Legacy Move Assist instructions implemented by category VLE are identical to those defined in Book I. If category Legacy Move Assist is supported in non-VLE mode, Legacy Move Assist instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 10 of Book I for the instruction definitions.

7.11 Embedded Cache Initialization

Embedded Cache Initialization instructions implemented by category VLE are identical to those defined in Book III-E. If category Embedded.Cache Initialization is supported in non-VLE mode, Embedded Cache Initialization instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Chapter A.1 of Book III-E for the instruction definitions.

7.12 Embedded Cache Debug

Embedded Cache Debug instructions implemented by category VLE are identical to those defined in Book III-E. If category Embedded.Cache Debug is supported in non-VLE mode, Embedded Cache Debug instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Chapter A.2 of Book III-E for the instruction definitions.

7.7 External PID

External Process ID instructions implemented by category VLE are identical to those defined in Book III-E. If category Embedded.External PID is supported in non-VLE mode, External Process ID instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Chapter 5.3.6 of Book III-E for the instruction definitions.

7.8 Embedded Performance Monitor

Embedded Performance Monitor instructions implemented by category VLE are identical to those defined in Book III-E. If category Embedded.Performance Monitor is supported in non-VLE mode, Embedded Performance Monitor instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Appendix D of Book III-E for the instruction definitions.

7.9 Processor Control

Processor Control instructions implemented by category VLE are identical to those defined in Book III-E. If category Embedded.Processor Control is supported in non-VLE mode, Processor Control instructions are also supported in VLE mode.
The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Chapter 11 of Book III-E for the instruction definitions. 7.10 Decorated Storage Decorated Storage instructions implemented by cate- gory VLE are identical to those defined in Book II. If category Decorated Storage is supported in non-VLE mode, Decorated Storage instructions are also sup- ported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Chapter 6 of Book II for the instruction def- initions. 1176 Power ISATM Book VLE Version 2.06 Appendix A. VLE Instruction Set Sorted by Mnemonic This appendix lists all the instructions available in VLE mode in the Power ISA, in order by mnemonic. Opcodes that are not defined below are treated as illegal by category VLE. Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 XO 7C000214 SR B add[o][.] Add XO 7C000014 SR B addc[o][.] Add Carrying XO 7C000114 SR B adde[o][.] Add Extended XO 7C0001D4 SR B addme[o][.] Add to Minus One Extended XO 7C000194 SR B addze[o][.] Add to Zero Extended X 7C000038 SR B and[.] AND X 7C000078 SR B andc[.] AND with Complement EVX 1000020F SP brinc Bit Reversed Increment X 7C000000 B cmp Compare X 7C000040 B cmpl Compare Logical X 7C000074 SR 64 cntlzd[.] Count Leading Zeros Doubleword X 7C000034 SR B cntlzw[.] 
Count Leading Zeros Word X 7C0005EC E dcba Data Cache Block Allocate X 7C0000AC B dcbf Data Cache Block Flush X 7C0000FE P E.PD dcbfep Data Cache Block Flush by External PID X 7C0003AC P E dcbi Data Cache Block Invalidate X 7C00030C M ECL dcblc Data Cache Block Lock Clear X 7C00006C B dcbst Data Cache Block Store X 7C00007E E.PD dcbstep Data Cache Block Store by External PID X 7C00022C B dcbt Data Cache Block Touch X 7C00027E P E.PD dcbtep Data Cache Block Touch by External PID X 7C00014C M ECL dcbtls Data Cache Block Touch and Lock Set X 7C0001EC B dcbtst Data Cache Block Touch for Store X 7C0001FE P E.PD dcbtstep Data Cache Block Touch for Store by External PID X 7C00010C M ECL dcbtstls Data Cache Block Touch for Store and Lock Set X 7C0007EC B dcbz Data Cache Block set to Zero X 7C0007FE P E.PD dcbzep Data Cache Block set to Zero by External PID X 7C00038C H E.CI dci Data Cache Invalidate X 7C0003CC H E.CD dcread Data Cache Read [Alternative Encoding] XO 7C0003D2 SR 64 divd[o][.] Divide Doubleword XO 7C000392 SR 64 divdu[o][.] Divide Doubleword Unsigned XO 7C0003D6 SR B divw[o][.] Divide Word XO 7C000396 SR B divwu[o][.] Divide Word Unsigned X 7C00009C SR LMV dlmzb[.] Determine Leftmost Zero Byte X 7C0003C6 DS dsn Decorated Storage Notify D 1C000000 VLE e_add16i Add Immediate I16A 70008800 SR VLE e_add2i. Add (2 operand) Immediate and Record I16A 70009000 VLE e_add2is Add (2 operand) Immediate Shifted SCI8 18008000 SR VLE e_addi[.] Add Scaled Immediate SCI8 18009000 SR VLE e_addic[.] Add Scaled Immediate Carrying I16L 7000C800 SR VLE e_and2i. AND (two operand) Immediate I16L 7000E800 SR VLE e_and2is. AND (2 operand) Immediate Shifted SCI8 1800C000 SR VLE e_andi[.] AND Scaled Immediate BD24 78000000 VLE e_b[l] Branch [and Link] BD15 7A000000 CT VLE e_bc[l] Branch Conditional [and Link] Appendix A. 
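As an aside on reading the Opcode (hexadecimal) column, the following is a minimal Python sketch. It assumes that each eight-digit value in this appendix is a 32-bit instruction word with its operand fields left as zero, and that the primary opcode occupies the six most-significant bits of that word. The helper name `primary_opcode` is hypothetical and is not defined by the architecture. The sample values are copied from rows listed above.

```python
def primary_opcode(word: int) -> int:
    """Return the six most-significant bits of a 32-bit instruction word.

    Assumption: the primary opcode occupies the top six bits of the word.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit instruction word")
    return (word >> 26) & 0x3F


# Opcode values copied from rows of this appendix.
samples = {
    "add": 0x7C000214,
    "brinc": 0x1000020F,
    "e_addi": 0x18008000,
    "e_b": 0x78000000,
}

for mnemonic, word in samples.items():
    print(f"{mnemonic}: primary opcode {primary_opcode(word)}")
```

Under these assumptions add maps to primary opcode 31 and brinc to primary opcode 4, the two primary opcodes named in the introduction to Chapter 7. The sketch only extracts the field; it does not decide whether an encoding is legal in VLE mode.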
I16A  70009800  VLE  e_cmp16i  Compare Immediate Word
X  7C00001C  VLE  e_cmph  Compare Halfword
I16A  7000B000  VLE  e_cmph16i  Compare Halfword Immediate
X  7C00005C  VLE  e_cmphl  Compare Halfword Logical
I16A  7000B800  VLE  e_cmphl16i  Compare Halfword Logical Immediate
SCI8  1800A800  VLE  e_cmpi  Compare Scaled Immediate Word
I16A  7000A800  VLE  e_cmpl16i  Compare Logical Immediate Word
SCI8  1880A800  VLE  e_cmpli  Compare Logical Scaled Immediate Word
XL  7C000202  VLE  e_crand  Condition Register AND
XL  7C000102  VLE  e_crandc  Condition Register AND with Complement
XL  7C000242  VLE  e_creqv  Condition Register Equivalent
XL  7C0001C2  VLE  e_crnand  Condition Register NAND
XL  7C000042  VLE  e_crnor  Condition Register NOR
XL  7C000382  VLE  e_cror  Condition Register OR
XL  7C000342  VLE  e_crorc  Condition Register OR with Complement
XL  7C000182  VLE  e_crxor  Condition Register XOR
D  30000000  VLE  e_lbz  Load Byte and Zero
D8  18000000  VLE  e_lbzu  Load Byte and Zero with Update
D  38000000  VLE  e_lha  Load Halfword Algebraic
D8  18000300  VLE  e_lhau  Load Halfword Algebraic with Update
D  58000000  VLE  e_lhz  Load Halfword and Zero
D8  18000100  VLE  e_lhzu  Load Halfword and Zero with Update
LI20  70000000  VLE  e_li  Load Immediate
I16L  7000E000  VLE  e_lis  Load Immediate Shifted
D8  18000800  VLE  e_lmw  Load Multiple Word
D  50000000  VLE  e_lwz  Load Word and Zero
D8  18000200  VLE  e_lwzu  Load Word and Zero with Update
XL  7C000020  VLE  e_mcrf  Move CR Field
I16A  7000A000  VLE  e_mull2i  Multiply (2 operand) Low Immediate
SCI8  1800A000  VLE  e_mulli  Multiply Low Scaled Immediate
I16L  7000C000  VLE  e_or2i  OR (two operand) Immediate
I16L  7000D000  VLE  e_or2is  OR (2 operand) Immediate Shifted
SCI8  1800D000  SR  VLE  e_ori[.]  OR Scaled Immediate
X  7C000230  SR  VLE  e_rlw[.]  Rotate Left Word
X  7C000270  SR  VLE  e_rlwi[.]  Rotate Left Word Immediate
M  74000000  VLE  e_rlwimi  Rotate Left Word Immediate then Mask Insert
M  74000001  VLE  e_rlwinm  Rotate Left Word Immediate then AND with Mask
ESC  7C000048  VLE, E.HV  e_sc  System Call
X  7C000070  SR  VLE  e_slwi[.]  Shift Left Word Immediate
X  7C000470  SR  VLE  e_srwi[.]  Shift Right Word Immediate
D  34000000  VLE  e_stb  Store Byte
D8  18000400  VLE  e_stbu  Store Byte with Update
D  5C000000  VLE  e_sth  Store Halfword
D8  18000500  VLE  e_sthu  Store Halfword with Update
D8  18000900  VLE  e_stmw  Store Multiple Word
D  54000000  VLE  e_stw  Store Word
D8  18000600  VLE  e_stwu  Store Word with Update
SCI8  1800B000  SR  VLE  e_subfic[.]  Subtract From Scaled Immediate Carrying
SCI8  1800E000  SR  VLE  e_xori[.]  XOR Scaled Immediate
EVX  100002E4  SP.FD  efdabs  Floating-Point Double-Precision Absolute Value
EVX  100002E0  SP.FD  efdadd  Floating-Point Double-Precision Add
EVX  100002EF  SP.FD  efdcfs  Floating-Point Double-Precision Convert from Single-Precision
EVX  100002F3  SP.FD  efdcfsf  Convert Floating-Point Double-Precision from Signed Fraction
EVX  100002F1  SP.FD  efdcfsi  Convert Floating-Point Double-Precision from Signed Integer
EVX  100002E3  SP.FD  efdcfsid  Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX  100002F2  SP.FD  efdcfuf  Convert Floating-Point Double-Precision from Unsigned Fraction
EVX  100002F0  SP.FD  efdcfui  Convert Floating-Point Double-Precision from Unsigned Integer
EVX  100002E2  SP.FD  efdcfuid  Convert Floating-Point Double-Precision from Unsigned Integer Doubleword
EVX  100002EE  SP.FD  efdcmpeq  Floating-Point Double-Precision Compare Equal
EVX  100002EC  SP.FD  efdcmpgt  Floating-Point Double-Precision Compare Greater Than
EVX  100002ED  SP.FD  efdcmplt  Floating-Point Double-Precision Compare Less Than
EVX  100002F7  SP.FD  efdctsf  Convert Floating-Point Double-Precision to Signed Fraction
EVX  100002F5  SP.FD  efdctsi  Convert Floating-Point Double-Precision to Signed Integer
EVX  100002EB  SP.FD  efdctsidz  Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero
EVX  100002FA  SP.FD  efdctsiz  Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero
EVX  100002F6  SP.FD  efdctuf  Convert Floating-Point Double-Precision to Unsigned Fraction
EVX  100002F4  SP.FD  efdctui  Convert Floating-Point Double-Precision to Unsigned Integer
EVX  100002EA  SP.FD  efdctuidz  Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero
EVX  100002F8  SP.FD  efdctuiz  Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero
EVX  100002E9  SP.FD  efddiv  Floating-Point Double-Precision Divide
EVX  100002E8  SP.FD  efdmul  Floating-Point Double-Precision Multiply
EVX  100002E5  SP.FD  efdnabs  Floating-Point Double-Precision Negative Absolute Value
EVX  100002E6  SP.FD  efdneg  Floating-Point Double-Precision Negate
EVX  100002E1  SP.FD  efdsub  Floating-Point Double-Precision Subtract
EVX  100002FE  SP.FD  efdtsteq  Floating-Point Double-Precision Test Equal
EVX  100002FC  SP.FD  efdtstgt  Floating-Point Double-Precision Test Greater Than
EVX  100002FD  SP.FD  efdtstlt  Floating-Point Double-Precision Test Less Than
EVX  100002C4  SP.FS  efsabs  Floating-Point Single-Precision Absolute Value
EVX  100002C0  SP.FS  efsadd  Floating-Point Single-Precision Add
EVX  100002CF  SP.FD  efscfd  Floating-Point Single-Precision Convert from Double-Precision
EVX  100002D3  SP.FS  efscfsf  Convert Floating-Point Single-Precision from Signed Fraction
EVX  100002D1  SP.FS  efscfsi  Convert Floating-Point Single-Precision from Signed Integer
EVX  100002D2  SP.FS  efscfuf  Convert Floating-Point Single-Precision from Unsigned Fraction
EVX  100002D0  SP.FS  efscfui  Convert Floating-Point Single-Precision from Unsigned Integer
EVX  100002CE  SP.FS  efscmpeq  Floating-Point Single-Precision Compare Equal
EVX  100002CC  SP.FS  efscmpgt  Floating-Point Single-Precision Compare Greater Than
EVX  100002CD  SP.FS  efscmplt  Floating-Point Single-Precision Compare Less Than
EVX  100002D7  SP.FS  efsctsf  Convert Floating-Point Single-Precision to Signed Fraction
EVX  100002D5  SP.FS  efsctsi  Convert Floating-Point Single-Precision to Signed Integer
EVX  100002DA  SP.FS  efsctsiz  Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero
EVX  100002D6  SP.FS  efsctuf  Convert Floating-Point Single-Precision to Unsigned Fraction
EVX  100002D4  SP.FS  efsctui  Convert Floating-Point Single-Precision to Unsigned Integer
EVX  100002D8  SP.FS  efsctuiz  Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero
EVX  100002C9  SP.FS  efsdiv  Floating-Point Single-Precision Divide
EVX  100002C8  SP.FS  efsmul  Floating-Point Single-Precision Multiply
EVX  100002C5  SP.FS  efsnabs  Floating-Point Single-Precision Negative Absolute Value
EVX  100002C6  SP.FS  efsneg  Floating-Point Single-Precision Negate
EVX  100002C1  SP.FS  efssub  Floating-Point Single-Precision Subtract
EVX  100002DE  SP.FS  efststeq  Floating-Point Single-Precision Test Equal
EVX  100002DC  SP.FS  efststgt  Floating-Point Single-Precision Test Greater Than
EVX  100002DD  SP.FS  efststlt  Floating-Point Single-Precision Test Less Than
XL  7C00021C  E.HV  ehpriv  Embedded Hypervisor Privilege
X  7C000238  SR  B  eqv[.]
Equivalent
EVX  10000208  SP  evabs  Vector Absolute Value
EVX  10000202  SP  evaddiw  Vector Add Immediate Word
EVX  100004C9  SP  evaddsmiaaw  Vector Add Signed, Modulo, Integer to Accumulator Word
EVX  100004C1  SP  evaddssiaaw  Vector Add Signed, Saturate, Integer to Accumulator Word
EVX  100004C8  SP  evaddumiaaw  Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX  100004C0  SP  evaddusiaaw  Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX  10000200  SP  evaddw  Vector Add Word
EVX  10000211  SP  evand  Vector AND
EVX  10000212  SP  evandc  Vector AND with Complement
EVX  10000234  SP  evcmpeq  Vector Compare Equal
EVX  10000231  SP  evcmpgts  Vector Compare Greater Than Signed
EVX  10000230  SP  evcmpgtu  Vector Compare Greater Than Unsigned
EVX  10000233  SP  evcmplts  Vector Compare Less Than Signed
EVX  10000232  SP  evcmpltu  Vector Compare Less Than Unsigned
EVX  1000020E  SP  evcntlsw  Vector Count Leading Signed Bits Word
EVX  1000020D  SP  evcntlzw  Vector Count Leading Zeros Word
EVX  100004C6  SP  evdivws  Vector Divide Word Signed
EVX  100004C7  SP  evdivwu  Vector Divide Word Unsigned
EVX  10000219  SP  eveqv  Vector Equivalent
EVX  1000020A  SP  evextsb  Vector Extend Sign Byte
EVX  1000020B  SP  evextsh  Vector Extend Sign Halfword
EVX  10000284  SP.FV  evfsabs  Vector Floating-Point Single-Precision Absolute Value
EVX  10000280  SP.FV  evfsadd  Vector Floating-Point Single-Precision Add
EVX  10000293  SP.FV  evfscfsf  Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX  10000291  SP.FV  evfscfsi  Vector Convert Floating-Point Single-Precision from Signed Integer
EVX  10000292  SP.FV  evfscfuf  Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX  10000290  SP.FV  evfscfui  Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX  1000028E  SP.FV  evfscmpeq  Vector Floating-Point Single-Precision Compare Equal
EVX  1000028C  SP.FV  evfscmpgt  Vector Floating-Point Single-Precision Compare Greater Than
EVX  1000028D  SP.FV  evfscmplt  Vector Floating-Point Single-Precision Compare Less Than
EVX  10000297  SP.FV  evfsctsf  Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX  10000295  SP.FV  evfsctsi  Vector Convert Floating-Point Single-Precision to Signed Integer
EVX  1000029A  SP.FV  evfsctsiz  Vector Convert Floating-Point Single-Precision to Signed Integer with Round Toward Zero
EVX  10000296  SP.FV  evfsctuf  Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX  10000294  SP.FV  evfsctui  Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX  10000298  SP.FV  evfsctuiz  Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX  10000289  SP.FV  evfsdiv  Vector Floating-Point Single-Precision Divide
EVX  10000288  SP.FV  evfsmul  Vector Floating-Point Single-Precision Multiply
EVX  10000285  SP.FV  evfsnabs  Vector Floating-Point Single-Precision Negative Absolute Value
EVX  10000286  SP.FV  evfsneg  Vector Floating-Point Single-Precision Negate
EVX  10000281  SP.FV  evfssub  Vector Floating-Point Single-Precision Subtract
EVX  1000029E  SP.FV  evfststeq  Vector Floating-Point Single-Precision Test Equal
EVX  1000029C  SP.FV  evfststgt  Vector Floating-Point Single-Precision Test Greater Than
EVX  1000029D  SP.FV  evfststlt  Vector Floating-Point Single-Precision Test Less Than
EVX  10000301  SP  evldd  Vector Load Double Word into Double Word
EVX  7C00063E  P  E.PD  evlddepx  Vector Load Doubleword into Doubleword by External Process ID Indexed
EVX  10000300  SP  evlddx  Vector Load Double Word into Double Word Indexed
EVX  10000305  SP  evldh  Vector Load Double Word into Four Halfwords
EVX  10000304  SP  evldhx  Vector Load Double Word into Four Halfwords Indexed
EVX  10000303  SP  evldw  Vector Load Double Word into Two Words
EVX  10000302  SP  evldwx  Vector Load Double Word into Two Words Indexed
EVX  10000309  SP  evlhhesplat  Vector Load Halfword into Halfwords Even and Splat
EVX  10000308  SP  evlhhesplatx  Vector Load Halfword into Halfwords Even and Splat Indexed
EVX  1000030F  SP  evlhhossplat  Vector Load Halfword into Halfword Odd Signed and Splat
EVX  1000030E  SP  evlhhossplatx  Vector Load Halfword into Halfword Odd Signed and Splat Indexed
EVX  1000030D  SP  evlhhousplat  Vector Load Halfword into Halfword Odd Unsigned and Splat
EVX  1000030C  SP  evlhhousplatx  Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed
EVX  10000311  SP  evlwhe  Vector Load Word into Two Halfwords Even
EVX  10000310  SP  evlwhex  Vector Load Word into Two Halfwords Even Indexed
EVX  10000317  SP  evlwhos  Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX  10000316  SP  evlwhosx  Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX  10000315  SP  evlwhou  Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX  10000314  SP  evlwhoux  Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX  1000031D  SP  evlwhsplat  Vector Load Word into Two Halfwords and Splat
EVX  1000031C  SP  evlwhsplatx  Vector Load Word into Two Halfwords and Splat Indexed
EVX  10000319  SP  evlwwsplat  Vector Load Word into Word and Splat
EVX  10000318  SP  evlwwsplatx  Vector Load Word into Word and Splat Indexed
EVX  1000022C  SP  evmergehi  Vector Merge High
EVX  1000022E  SP  evmergehilo  Vector Merge High/Low
EVX  1000022D  SP  evmergelo  Vector Merge Low
EVX  1000022F  SP  evmergelohi  Vector Merge Low/High
EVX  1000052B  SP  evmhegsmfaa  Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX  100005AB  SP  evmhegsmfan  Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX  10000529  SP  evmhegsmiaa  Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX  100005A9  SP  evmhegsmian  Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX  10000528  SP  evmhegumiaa  Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX  100005A8  SP  evmhegumian  Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX  1000040B  SP  evmhesmf  Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX  1000042B  SP  evmhesmfa  Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX  1000050B  SP  evmhesmfaaw  Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX  1000058B  SP  evmhesmfanw  Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX  10000409  SP  evmhesmi  Vector Multiply Halfwords, Even, Signed, Modulo, Integer
EVX  10000429  SP  evmhesmia  Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX  10000509  SP  evmhesmiaaw  Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX  10000589  SP  evmhesmianw  Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX  10000403  SP  evmhessf  Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
EVX  10000423  SP  evmhessfa  Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX  10000503  SP  evmhessfaaw  Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX  10000583  SP  evmhessfanw  Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX  10000501  SP  evmhessiaaw  Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
EVX  10000581  SP  evmhessianw  Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
EVX  10000408  SP  evmheumi  Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX  10000428  SP  evmheumia  Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX  10000508  SP  evmheumiaaw  Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX  10000588  SP  evmheumianw  Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX  10000500  SP  evmheusiaaw  Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX  10000580  SP  evmheusianw  Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX  1000052F  SP  evmhogsmfaa  Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX  100005AF  SP  evmhogsmfan  Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX  1000052D  SP  evmhogsmiaa  Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX  100005AD  SP  evmhogsmian  Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX  1000052C  SP  evmhogumiaa  Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX  100005AC  SP  evmhogumian  Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX  1000040F  SP  evmhosmf  Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX  1000042F  SP  evmhosmfa  Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
EVX  1000050F  SP  evmhosmfaaw  Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX  1000058F  SP  evmhosmfanw  Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX  1000040D  SP  evmhosmi  Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX  1000042D  SP  evmhosmia  Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX  1000050D  SP  evmhosmiaaw  Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX  1000058D  SP  evmhosmianw  Vector Multiply
Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX  10000407  SP  evmhossf  Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional
EVX  10000427  SP  evmhossfa  Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator
EVX  10000507  SP  evmhossfaaw  Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX  10000587  SP  evmhossfanw  Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX  10000505  SP  evmhossiaaw  Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX  10000585  SP  evmhossianw  Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX  1000040C  SP  evmhoumi  Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
EVX  1000042C  SP  evmhoumia  Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX  1000050C  SP  evmhoumiaaw  Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX  1000058C  SP  evmhoumianw  Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX  10000504  SP  evmhousiaaw  Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
EVX  10000584  SP  evmhousianw  Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX  100004C4  SP  evmra  Initialize Accumulator
EVX  1000044F  SP  evmwhsmf  Vector Multiply Word High Signed, Modulo, Fractional
EVX  1000046F  SP  evmwhsmfa  Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX  1000044D  SP  evmwhsmi  Vector Multiply Word High Signed, Modulo, Integer
EVX  1000046D  SP  evmwhsmia  Vector Multiply Word High Signed, Modulo, Integer to Accumulator
EVX  10000447  SP  evmwhssf  Vector Multiply Word High Signed, Saturate, Fractional
EVX  10000467  SP  evmwhssfa  Vector Multiply Word High Signed, Saturate, Fractional to Accumulator
EVX  1000044C  SP  evmwhumi  Vector Multiply Word High Unsigned, Modulo, Integer
EVX  1000046C  SP  evmwhumia  Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX  10000549  SP  evmwlsmiaaw  Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX  100005C9  SP  evmwlsmianw  Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words
EVX  10000541  SP  evmwlssiaaw  Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
EVX  100005C1  SP  evmwlssianw  Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words
EVX  10000448  SP  evmwlumi  Vector Multiply Word Low Unsigned, Modulo, Integer
EVX  10000468  SP  evmwlumia  Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX  10000548  SP  evmwlumiaaw  Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX  100005C8  SP  evmwlumianw  Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX  10000540  SP  evmwlusiaaw  Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX  100005C0  SP  evmwlusianw  Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words
EVX  1000045B  SP  evmwsmf  Vector Multiply Word Signed, Modulo, Fractional
EVX  1000047B  SP  evmwsmfa  Vector Multiply Word Signed, Modulo, Fractional to Accumulator
EVX  1000055B  SP  evmwsmfaa  Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX  100005DB  SP  evmwsmfan  Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
EVX  10000459  SP  evmwsmi  Vector Multiply Word Signed, Modulo, Integer
EVX  10000479  SP  evmwsmia  Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX  10000559  SP  evmwsmiaa  Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX  100005D9  SP  evmwsmian  Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX  10000453  SP  evmwssf  Vector Multiply Word Signed, Saturate, Fractional
EVX  10000473  SP  evmwssfa  Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX  10000553  SP  evmwssfaa  Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX  100005D3  SP  evmwssfan  Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX  10000458  SP  evmwumi  Vector Multiply Word Unsigned, Modulo, Integer
EVX  10000478  SP  evmwumia  Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX  10000558  SP  evmwumiaa  Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX  100005D8  SP  evmwumian  Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX  1000021E  SP  evnand  Vector NAND
EVX  10000209  SP  evneg  Vector Negate
EVX  10000218  SP  evnor  Vector NOR
EVX  10000217  SP  evor  Vector OR
EVX  1000021B  SP  evorc  Vector OR with Complement
EVX  10000228  SP  evrlw  Vector Rotate Left Word
EVX  1000022A  SP  evrlwi  Vector Rotate Left Word Immediate
EVX  1000020C  SP  evrndw  Vector Round Word
EVS  10000278  SP  evsel  Vector Select
EVX  10000224  SP  evslw  Vector Shift Left Word
EVX  10000226  SP  evslwi  Vector Shift Left Word Immediate
EVX  1000022B  SP  evsplatfi  Vector Splat Fractional Immediate
EVX  10000229  SP  evsplati  Vector Splat Immediate
EVX  10000223  SP  evsrwis  Vector Shift Right Word Immediate Signed
EVX  10000222  SP  evsrwiu  Vector Shift Right Word Immediate Unsigned
EVX  10000221  SP  evsrws  Vector Shift Right Word Signed
EVX  10000220  SP  evsrwu  Vector Shift Right Word Unsigned
EVX  10000321  SP  evstdd  Vector Store Double of Double
EVX  7C00073E  P  E.PD  evstddepx  Vector Store Doubleword into Doubleword by External Process ID Indexed
EVX  10000320  SP  evstddx  Vector Store Doubleword of Doubleword Indexed
EVX  10000325  SP  evstdh  Vector Store Double of Four Halfwords
EVX  10000324  SP  evstdhx  Vector Store Double of Four Halfwords Indexed
EVX  10000323  SP  evstdw  Vector Store Double of Two Words
EVX  10000322  SP  evstdwx  Vector Store Double of Two Words Indexed
EVX  10000331  SP  evstwhe  Vector Store Word of Two Halfwords from Even
EVX  10000330  SP  evstwhex  Vector Store Word of Two Halfwords from Even Indexed
EVX  10000335  SP  evstwho  Vector Store Word of Two Halfwords from Odd
EVX  10000334  SP  evstwhox  Vector Store Word of Two Halfwords from Odd Indexed
EVX  10000339  SP  evstwwe  Vector Store Word of Word from Even
EVX  10000338  SP  evstwwex  Vector Store Word of Word from Even Indexed
EVX  1000033D  SP  evstwwo  Vector Store Word of Word from Odd
EVX  1000033C  SP  evstwwox  Vector Store Word of Word from Odd Indexed
EVX  100004CB  SP  evsubfsmiaaw  Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX  100004C3  SP  evsubfssiaaw  Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX  100004CA  SP  evsubfumiaaw  Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX  100004C2  SP  evsubfusiaaw  Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX  10000204  SP  evsubfw  Vector Subtract from Word
EVX  10000206  SP  evsubifw  Vector Subtract Immediate from Word
EVX  10000216  SP  evxor  Vector XOR
X  7C000774  SR  B  extsb[.]  Extend Sign Byte
X  7C000734  SR  B  extsh[.]  Extend Sign Halfword
X  7C0007B4  SR  64  extsw[.]
Extend Sign Word
X  7C0007AC  B  icbi  Instruction Cache Block Invalidate
X  7C0007BE  P  E.PD  icbiep  Instruction Cache Block Invalidate by External PID
X  7C0001CC  M  ECL  icblc  Instruction Cache Block Lock Clear
X  7C00002C  E  icbt  Instruction Cache Block Touch
X  7C0003CC  M  ECL  icbtls  Instruction Cache Block Touch and Lock Set
X  7C00078C  H  E.CI  ici  Instruction Cache Invalidate
X  7C0007CC  H  E.CD  icread  Instruction Cache Read
A  7C00001E  B  isel  Integer Select
X  7C000068  B  lbarx  Load Byte and Reserve Indexed
X  7C000406  DS  lbdx  Load Byte with Decoration Indexed
X  7C0000BE  P  E.PD  lbepx  Load Byte by External Process ID Indexed
X  7C0000EE  B  lbzux  Load Byte and Zero with Update Indexed
X  7C0000AE  B  lbzx  Load Byte and Zero Indexed
X  7C0000A8  64  ldarx  Load Doubleword And Reserve Indexed
X  7C0004C6  DS  lddx  Load Doubleword with Decoration Indexed
X  7C00003A  P  E.PD;64  ldepx  Load Doubleword by External Process ID Indexed
X  7C00006A  64  ldux  Load Doubleword with Update Indexed
X  7C00002A  64  ldx  Load Doubleword Indexed
X  7C000646  DS  lfddx  Load Floating Doubleword with Decoration Indexed
X  7C0004BE  P  E.PD  lfdepx  Load Floating-Point Double by External Process ID Indexed
X  7C0000E8  B  lharx  Load Halfword and Reserve Indexed
X  7C0002EE  B  lhaux  Load Halfword Algebraic with Update Indexed
X  7C0002AE  B  lhax  Load Halfword Algebraic Indexed
X  7C00062C  B  lhbrx  Load Halfword Byte-Reverse Indexed
X  7C000446  DS  lhdx  Load Halfword with Decoration Indexed
X  7C00023E  P  E.PD  lhepx  Load Halfword by External Process ID Indexed
X  7C00026E  B  lhzux  Load Halfword and Zero with Update Indexed
X  7C00022E  B  lhzx  Load Halfword and Zero Indexed
X  7C0004AA  MA  lswi  Load String Word Immediate
X  7C00042A  MA  lswx  Load String Word Indexed
X  7C00000E  V  lvebx  Load Vector Element Byte Indexed
X  7C00004E  V  lvehx  Load Vector Element Halfword Indexed
X  7C00024E  P  E.PD  lvepx  Load Vector by External Process ID Indexed
X  7C00020E  P  E.PD  lvepxl  Load Vector by External Process ID Indexed LRU
X  7C00008E  V  lvewx  Load Vector Element Word Indexed
X  7C00000C  V  lvsl  Load Vector for Shift Left Indexed
X  7C00004C  V  lvsr  Load Vector for Shift Right Indexed
X  7C0000CE  V  lvx  Load Vector Indexed
X  7C0002CE  V  lvxl  Load Vector Indexed LRU
X  7C000028  B  lwarx  Load Word And Reserve Indexed
X  7C0002EA  64  lwaux  Load Word Algebraic with Update Indexed
X  7C0002AA  64  lwax  Load Word Algebraic Indexed
X  7C00042C  B  lwbrx  Load Word Byte-Reverse Indexed
X  7C000486  DS  lwdx  Load Word with Decoration Indexed
X  7C00003E  P  E.PD  lwepx  Load Word by External Process ID Indexed
X  7C00006E  B  lwzux  Load Word and Zero with Update Indexed
X  7C00002E  B  lwzx  Load Word and Zero Indexed
XO  10000158  SR  LMA  macchw[o][.]  Multiply Accumulate Cross Halfword to Word Modulo Signed
XO  100001D8  SR  LMA  macchws[o][.]  Multiply Accumulate Cross Halfword to Word Saturate Signed
XO  10000198  SR  LMA  macchwsu[o][.]  Multiply Accumulate Cross Halfword to Word Saturate Unsigned
XO  10000118  SR  LMA  macchwu[o][.]  Multiply Accumulate Cross Halfword to Word Modulo Unsigned
XO  10000058  SR  LMA  machhw[o][.]  Multiply Accumulate High Halfword to Word Modulo Signed
XO  100000D8  SR  LMA  machhws[o][.]  Multiply Accumulate High Halfword to Word Saturate Signed
XO  10000098  SR  LMA  machhwsu[o][.]  Multiply Accumulate High Halfword to Word Saturate Unsigned
XO  10000018  SR  LMA  machhwu[o][.]  Multiply Accumulate High Halfword to Word Modulo Unsigned
XO  10000358  SR  LMA  maclhw[o][.]  Multiply Accumulate Low Halfword to Word Modulo Signed
XO  100003D8  SR  LMA  maclhws[o][.]  Multiply Accumulate Low Halfword to Word Saturate Signed
XO  10000398  SR  LMA  maclhwsu[o][.]  Multiply Accumulate Low Halfword to Word Saturate Unsigned
XO  10000318  SR  LMA  maclhwu[o][.]  Multiply Accumulate Low Halfword to Word Modulo Unsigned
X  7C0006AC  E  mbar  Memory Barrier
X  7C000400  E  mcrxr  Move To Condition Register from XER
XFX  7C000026  B  mfcr  Move From Condition Register
XFX  7C000286  P  E.DC  mfdcr  Move From Device Control Register
X  7C000246  E.DC  mfdcrux  Move From Device Control Register User-mode Indexed
X  7C000206  P  E.DC  mfdcrx  Move From Device Control Register Indexed
X  7C0000A6  P  B  mfmsr  Move From Machine State Register
XFX  7C100026  B  mfocrf  Move From One Condition Register Field
XFX  7C00029C  O  E.PM  mfpmr  Move From Performance Monitor Register
XFX  7C0002A6  O  B  mfspr  Move From Special Purpose Register
VX  10000604  V  mfvscr  Move From VSCR
X  7C0001DC  H  E.PC  msgclr  Message Clear
X  7C00019C  H  E.PC  msgsnd  Message Send
XFX  7C000120  B  mtcrf  Move To Condition Register Fields
XFX  7C000386  P  E.DC  mtdcr  Move To Device Control Register
X  7C000346  E.DC  mtdcrux  Move To Device Control Register User-mode Indexed
X  7C000306  P  E.DC  mtdcrx  Move To Device Control Register Indexed
X  7C000124  P  E  mtmsr  Move To Machine State Register
XFX  7C100120  B  mtocrf  Move To One Condition Register Field
XFX  7C00039C  O  E.PM  mtpmr  Move To Performance Monitor Register
XFX  7C0003A6  O  B  mtspr  Move To Special Purpose Register
VX  10000644  V  mtvscr  Move To VSCR
X  10000150  SR  LMA  mulchw[.]  Multiply Cross Halfword to Word Signed
X  10000110  SR  LMA  mulchwu[.]  Multiply Cross Halfword to Word Unsigned
XO  7C000092  SR  64  mulhd[.]  Multiply High Doubleword
XO  7C000012  SR  64  mulhdu[.]  Multiply High Doubleword Unsigned
X  10000050  SR  LMA  mulhhw[.]  Multiply High Halfword to Word Signed
X  10000010  SR  LMA  mulhhwu[.]  Multiply High Halfword to Word Unsigned
XO  7C000096  SR  B  mulhw[.]  Multiply High Word
XO  7C000016  SR  B  mulhwu[.]  Multiply High Word Unsigned
XO  7C0001D2  SR  64  mulld[o][.]  Multiply Low Doubleword
X  10000350  SR  LMA  mullhw[.]
Multiply Low Halfword to Word Signed X 10000310 SR LMA mullhwu[.] Multiply Low Halfword to Word Unsigned XO 7C0001D6 SR B mullw[o][.] Multiply Low Word X 7C0003B8 SR B nand[.] NAND XO 7C0000D0 SR B neg[o][.] Negate XO 1000015C SR LMA nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed XO 100001DC SR LMA nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed XO 1000005C SR LMA nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed XO 100000DC SR LMA nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed XO 1000035C SR LMA nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed XO 100003DC SR LMA nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Sat- urate Signed X 7C0000F8 SR B nor[.] NOR X 7C000378 SR B or[.] OR X 7C000338 SR B orc[.] OR with Complement X 7C0000F4 B popcntb Population Count Bytes RR 0400---- VLE se_add Add Short Form OIM5 2000---- VLE se_addi Add Immediate Short Form RR 4600---- SR VLE se_and[.] 
AND Short Form
RR    4500----              VLE        se_andc      AND with Complement Short Form
IM5   2E00----              VLE        se_andi      AND Immediate Short Form
BD8   E800----              VLE        se_b[l]      Branch [and Link]
BD8   E000----              VLE        se_bc        Branch Conditional Short Form
IM5   6000----              VLE        se_bclri     Bit Clear Immediate
C     0006----              VLE        se_bctr[l]   Branch to Count Register [and Link]
IM5   6200----              VLE        se_bgeni     Bit Generate Immediate
C     0004----              VLE        se_blr[l]    Branch to Link Register [and Link]
IM5   2C00----              VLE        se_bmaski    Bit Mask Generate Immediate
IM5   6400----              VLE        se_bseti     Bit Set Immediate
IM5   6600----              VLE        se_btsti     Bit Test Immediate
RR    0C00----              VLE        se_cmp       Compare Word
RR    0E00----              VLE        se_cmph      Compare Halfword Short Form
RR    0F00----              VLE        se_cmphl     Compare Halfword Logical Short Form
IM5   2A00----              VLE        se_cmpi      Compare Immediate Word Short Form
RR    0D00----              VLE        se_cmpl      Compare Logical Word
OIM5  2200----              VLE        se_cmpli     Compare Logical Immediate Word
R     00D0----              VLE        se_extsb     Extend Sign Byte Short Form
R     00F0----              VLE        se_extsh     Extend Sign Halfword Short Form
R     00C0----              VLE        se_extzb     Extend Zero Byte
R     00E0----              VLE        se_extzh     Extend Zero Halfword
C     0000----              VLE        se_illegal   Illegal
C     0001----              VLE        se_isync     Instruction Synchronize
SD4   8000----              VLE        se_lbz       Load Byte and Zero Short Form
SD4   A000----              VLE        se_lhz       Load Halfword and Zero Short Form
IM7   4800----              VLE        se_li        Load Immediate Short Form
SD4   C000----              VLE        se_lwz       Load Word and Zero Short Form
RR    0300----              VLE        se_mfar      Move From Alternate Register
R     00A0----              VLE        se_mfctr     Move From Count Register
R     0080----              VLE        se_mflr      Move From Link Register
RR    0100----              VLE        se_mr        Move Register
RR    0200----              VLE        se_mtar      Move To Alternate Register
R     00B0----              VLE        se_mtctr     Move To Count Register
R     0090----              VLE        se_mtlr      Move To Link Register
RR    0500----              VLE        se_mullw     Multiply Low Word Short Form
R     0030----              VLE        se_neg       Negate Short Form
R     0020----              VLE        se_not       NOT Short Form
RR    4400----              VLE        se_or        OR Short Form
C     0009----        H     VLE        se_rfci      Return From Critical Interrupt
C     000A----        H     VLE        se_rfdi      Return From Debug Interrupt
C     000C----        P     VLE, E.HV  se_rfgi      Return From Guest Interrupt
C     0008----        P     VLE        se_rfi       Return From Interrupt
C     000B----        H     VLE        se_rfmci     Return From Machine Check Interrupt
C     0002----              VLE        se_sc        System Call
RR    4200----              VLE        se_slw       Shift Left Word
IM5   6C00----              VLE        se_slwi      Shift Left Word Immediate Short Form
RR    4100----              VLE        se_sraw      Shift Right Algebraic Word
IM5   6A00----              VLE        se_srawi     Shift Right Algebraic Immediate
RR    4000----              VLE        se_srw       Shift Right Word
IM5   6800----              VLE        se_srwi      Shift Right Word Immediate Short Form
SD4   9000----              VLE        se_stb       Store Byte Short Form
SD4   B000----              VLE        se_sth       Store Halfword Short Form
SD4   D000----              VLE        se_stw       Store Word Short Form
RR    0600----              VLE        se_sub       Subtract
RR    0700----              VLE        se_subf      Subtract From Short Form
OIM5  2400----  SR          VLE        se_subi[.]   Subtract Immediate
X     7C000036  SR          64         sld[.]       Shift Left Doubleword
X     7C000030  SR          B          slw[.]       Shift Left Word
X     7C000634  SR          64         srad[.]      Shift Right Algebraic Doubleword
XS    7C000674  SR          64         sradi[.]     Shift Right Algebraic Doubleword Immediate
X     7C000630  SR          B          sraw[.]      Shift Right Algebraic Word
X     7C000670  SR          B          srawi[.]     Shift Right Algebraic Word Immediate
X     7C000436  SR          64         srd[.]       Shift Right Doubleword
X     7C000430  SR          B          srw[.]       Shift Right Word
X     7C00056D              B          stbcx.       Store Byte Conditional Indexed
X     7C000506              DS         stbdx        Store Byte with Decoration Indexed
X     7C0001BE        P     E.PD       stbepx       Store Byte by External Process ID Indexed
X     7C0001EE              B          stbux        Store Byte with Update Indexed
X     7C0001AE              B          stbx         Store Byte Indexed
X     7C0001AD              64         stdcx.
Store Doubleword Conditional Indexed X 7C0005C6 DS stddx Store Doubleword with Decoration Indexed X 7C00013A P E.PD;64 stdepx Store Doubleword by External Process ID Indexed X 7C00016A 64 stdux Store Doubleword with Update Indexed X 7C00012A 64 stdx Store Doubleword Indexed X 7C000746 DS stfddx Store Floating Doubleword with Decoration Indexed X 7C0005BE P E.PD stfdepx Store Floating-Point Double by External Process ID Indexed X 7C00072C B sthbrx Store Halfword Byte-Reverse Indexed X 7C0005AD B sthcx. Store Halfword Conditional Indexed X 7C000546 DS sthdx Store Halfword with Decoration Indexed X 7C00033E P E.PD sthepx Store Halfword by External Process ID Indexed X 7C00036E B sthux Store Halfword with Update Indexed X 7C00032E B sthx Store Halfword Indexed X 7C0005AA MA stswi Store String Word Immediate X 7C00052A MA stswx Store String Word Indexed X 7C00010E V stvebx Store Vector Element Byte Indexed X 7C00014E V stvehx Store Vector Element Halfword Indexed X 7C00064E P E.PD stvepx Store Vector by External Process ID Indexed X 7C00060E P E.PD stvepxl Store Vector by External Process ID Indexed LRU X 7C00018E V stvewx Store Vector Element Word Indexed X 7C0001CE V stvx Store Vector Indexed X 7C0003CE V stvxl Store Vector Indexed LRU X 7C00052C B stwbrx Store Word Byte-Reverse Indexed X 7C00012D B stwcx. Store Word Conditional Indexed X 7C000586 DS stwdx Store Word with Decoration Indexed X 7C00013E P E.PD stwepx Store Word by External Process ID Indexed X 7C00016E B stwux Store Word with Update Indexed 1188 Power ISATM Book VLE Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 X 7C00012E B stwx Store Word Indexed XO 7C000050 SR B subf[o][.] Subtract From XO 7C000010 SR B subfc[o][.] Subtract From Carrying XO 7C000110 SR B subfe[o][.] Subtract From Extended XO 7C0001D0 SR B subfme[o][.] Subtract From Minus One Extended XO 7C000190 SR B subfze[o][.] 
Subtract From Zero Extended
X     7C0004AC              B          sync         Synchronize
X     7C000088              64         td           Trap Doubleword
X     7C000624        H     E          tlbivax      TLB Invalidate Virtual Address Indexed
X     7C000764        H     E          tlbre        TLB Read Entry
X     7C000724        H     E          tlbsx        TLB Search Indexed
X     7C00046C        H     E          tlbsync      TLB Synchronize
X     7C0007A4        H     E          tlbwe        TLB Write Entry
X     7C000008              B          tw           Trap Word
VX    10000180              V          vaddcuw      Vector Add and Write Carry-Out Unsigned Word
VX    1000000A              V          vaddfp       Vector Add Single-Precision
VX    10000300              V          vaddsbs      Vector Add Signed Byte Saturate
VX    10000340              V          vaddshs      Vector Add Signed Halfword Saturate
VX    10000380              V          vaddsws      Vector Add Signed Word Saturate
VX    10000000              V          vaddubm      Vector Add Unsigned Byte Modulo
VX    10000200              V          vaddubs      Vector Add Unsigned Byte Saturate
VX    10000040              V          vadduhm      Vector Add Unsigned Halfword Modulo
VX    10000240              V          vadduhs      Vector Add Unsigned Halfword Saturate
VX    10000080              V          vadduwm      Vector Add Unsigned Word Modulo
VX    10000280              V          vadduws      Vector Add Unsigned Word Saturate
VX    10000404              V          vand         Vector Logical AND
VX    10000444              V          vandc        Vector Logical AND with Complement
VX    10000502              V          vavgsb       Vector Average Signed Byte
VX    10000542              V          vavgsh       Vector Average Signed Halfword
VX    10000582              V          vavgsw       Vector Average Signed Word
VX    10000402              V          vavgub       Vector Average Unsigned Byte
VX    10000442              V          vavguh       Vector Average Unsigned Halfword
VX    10000482              V          vavguw       Vector Average Unsigned Word
VX    1000034A              V          vcfsx        Vector Convert From Signed Fixed-Point Word
VX    1000030A              V          vcfux        Vector Convert From Unsigned Fixed-Point Word
VC    100003C6              V          vcmpbfp[.]   Vector Compare Bounds Single-Precision
VC    100000C6              V          vcmpeqfp[.]  Vector Compare Equal To Single-Precision
VC    10000006              V          vcmpequb[.]  Vector Compare Equal To Unsigned Byte
VC    10000046              V          vcmpequh[.]  Vector Compare Equal To Unsigned Halfword
VC    10000086              V          vcmpequw[.]  Vector Compare Equal To Unsigned Word
VC    100001C6              V          vcmpgefp[.]  Vector Compare Greater Than or Equal To Single-Precision
VC    100002C6              V          vcmpgtfp[.]  Vector Compare Greater Than Single-Precision
VC    10000306              V          vcmpgtsb[.]
Vector Compare Greater Than Signed Byte VC 10000346 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword VC 10000386 V vcmpgtsw[.] Vector Compare Greater Than Signed Word VC 10000206 V vcmpgtub[.] Vector Compare Greater Than Unsigned Byte VC 10000246 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword VC 10000286 V vcmpgtuw[.] Vector Compare Greater Than Unsigned Word VX 100003CA V vctsxs Vector Convert To Signed Fixed-Point Word Saturate VX 1000038A V vctuxs Vector Convert To Unsigned Fixed-Point Word Saturate VX 1000018A V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point VX 100001CA V vlogefp Vector Log Base 2 Estimate Floating-Point VA 1000002E V vmaddfp Vector Multiply-Add Single-Precision VX 1000040A V vmaxfp Vector Maximum Single-Precision VX 10000102 V vmaxsb Vector Maximum Signed Byte VX 10000142 V vmaxsh Vector Maximum Signed Halfword VX 10000182 V vmaxsw Vector Maximum Signed Word VX 10000002 V vmaxub Vector Maximum Unsigned Byte VX 10000042 V vmaxuh Vector Maximum Unsigned Halfword VX 10000082 V vmaxuw Vector Maximum Unsigned Word Appendix A. 
VLE Instruction Set Sorted by Mnemonic 1189 Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VA 10000020 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate VA 10000021 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Satu- rate VX 1000044A V vminfp Vector Minimum Single-Precision VX 10000302 V vminsb Vector Minimum Signed Byte VX 10000342 V vminsh Vector Minimum Signed Halfword VX 10000382 V vminsw Vector Minimum Signed Word VX 10000202 V vminub Vector Minimum Unsigned Byte VX 10000242 V vminuh Vector Minimum Unsigned Halfword VX 10000282 V vminuw Vector Minimum Unsigned Word VA 10000022 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo VX 1000000C V vmrghb Vector Merge High Byte VX 1000004C V vmrghh Vector Merge High Halfword VX 1000008C V vmrghw Vector Merge High Word VX 1000010C V vmrglb Vector Merge Low Byte VX 1000014C V vmrglh Vector Merge Low Halfword VX 1000018C V vmrglw Vector Merge Low Word VA 10000025 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo VA 10000028 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo VA 10000029 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate VA 10000024 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo VA 10000026 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo VA 10000027 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate VX 10000308 V vmulesb Vector Multiply Even Signed Byte VX 10000348 V vmulesh Vector Multiply Even Signed Halfword VX 10000208 V vmuleub Vector Multiply Even Unsigned Byte VX 10000248 V vmuleuh Vector Multiply Even Unsigned Halfword VX 10000108 V vmulosb Vector Multiply Odd Signed Byte VX 10000148 V vmulosh Vector Multiply Odd Signed Halfword VX 10000008 V vmuloub Vector Multiply Odd Unsigned Byte VX 10000048 V vmulouh Vector Multiply Odd Unsigned Halfword VA 1000002F V vnmsubfp Vector Negative Multiply-Subtract Single-Precision VX 10000504 V vnor Vector Logical NOR VX 10000484 V vor Vector Logical OR VA 1000002B 
V vperm Vector Permute
VX    1000030E              V          vpkpx        Vector Pack Pixel
VX    1000018E              V          vpkshss      Vector Pack Signed Halfword Signed Saturate
VX    1000010E              V          vpkshus      Vector Pack Signed Halfword Unsigned Saturate
VX    100001CE              V          vpkswss      Vector Pack Signed Word Signed Saturate
VX    1000014E              V          vpkswus      Vector Pack Signed Word Unsigned Saturate
VX    1000000E              V          vpkuhum      Vector Pack Unsigned Halfword Unsigned Modulo
VX    1000008E              V          vpkuhus      Vector Pack Unsigned Halfword Unsigned Saturate
VX    1000004E              V          vpkuwum      Vector Pack Unsigned Word Unsigned Modulo
VX    100000CE              V          vpkuwus      Vector Pack Unsigned Word Unsigned Saturate
VX    1000010A              V          vrefp        Vector Reciprocal Estimate Single-Precision
VX    100002CA              V          vrfim        Vector Round to Single-Precision Integer toward -Infinity
VX    1000020A              V          vrfin        Vector Round to Single-Precision Integer Nearest
VX    1000028A              V          vrfip        Vector Round to Single-Precision Integer toward +Infinity
VX    1000024A              V          vrfiz        Vector Round to Single-Precision Integer toward Zero
VX    10000004              V          vrlb         Vector Rotate Left Byte
VX    10000044              V          vrlh         Vector Rotate Left Halfword
VX    10000084              V          vrlw         Vector Rotate Left Word
VX    1000014A              V          vrsqrtefp    Vector Reciprocal Square Root Estimate Single-Precision
VA    1000002A              V          vsel         Vector Select
VX    100001C4              V          vsl          Vector Shift Left
VX    10000104              V          vslb         Vector Shift Left Byte
VA    1000002C              V          vsldoi       Vector Shift Left Double by Octet Immediate
VX    10000144              V          vslh         Vector Shift Left Halfword
VX    1000040C              V          vslo         Vector Shift Left by Octet
VX    10000184              V          vslw         Vector Shift Left Word
VX    1000020C              V          vspltb       Vector Splat Byte
VX    1000024C              V          vsplth       Vector Splat Halfword
VX    1000030C              V          vspltisb     Vector Splat Immediate Signed Byte
VX    1000034C              V          vspltish     Vector Splat Immediate Signed Halfword
VX    1000038C              V          vspltisw     Vector Splat Immediate Signed Word
VX    1000028C              V          vspltw       Vector Splat Word
VX    100002C4              V          vsr          Vector Shift Right
VX    10000304              V          vsrab        Vector Shift Right Algebraic Byte
VX    10000344              V          vsrah
Vector Shift Right Algebraic Halfword
VX    10000384              V          vsraw        Vector Shift Right Algebraic Word
VX    10000204              V          vsrb         Vector Shift Right Byte
VX    10000244              V          vsrh         Vector Shift Right Halfword
VX    1000044C              V          vsro         Vector Shift Right by Octet
VX    10000284              V          vsrw         Vector Shift Right Word
VX    10000580              V          vsubcuw      Vector Subtract and Write Carry-Out Unsigned Word
VX    1000004A              V          vsubfp       Vector Subtract Single-Precision
VX    10000700              V          vsubsbs      Vector Subtract Signed Byte Saturate
VX    10000740              V          vsubshs      Vector Subtract Signed Halfword Saturate
VX    10000780              V          vsubsws      Vector Subtract Signed Word Saturate
VX    10000400              V          vsububm      Vector Subtract Unsigned Byte Modulo
VX    10000600              V          vsububs      Vector Subtract Unsigned Byte Saturate
VX    10000440              V          vsubuhm      Vector Subtract Unsigned Halfword Modulo
VX    10000640              V          vsubuhs      Vector Subtract Unsigned Halfword Saturate
VX    10000480              V          vsubuwm      Vector Subtract Unsigned Word Modulo
VX    10000680              V          vsubuws      Vector Subtract Unsigned Word Saturate
VX    10000688              V          vsum2sws     Vector Sum across Half Signed Word Saturate
VX    10000708              V          vsum4sbs     Vector Sum across Quarter Signed Byte Saturate
VX    10000648              V          vsum4shs     Vector Sum across Quarter Signed Halfword Saturate
VX    10000608              V          vsum4ubs     Vector Sum across Quarter Unsigned Byte Saturate
VX    10000788              V          vsumsws      Vector Sum across Signed Word Saturate
VX    1000034E              V          vupkhpx      Vector Unpack High Pixel
VX    1000020E              V          vupkhsb      Vector Unpack High Signed Byte
VX    1000024E              V          vupkhsh      Vector Unpack High Signed Halfword
VX    100003CE              V          vupklpx      Vector Unpack Low Pixel
VX    1000028E              V          vupklsb      Vector Unpack Low Signed Byte
VX    100002CE              V          vupklsh      Vector Unpack Low Signed Halfword
VX    100004C4              V          vxor         Vector Logical XOR
X     7C00007C              WT         wait         Wait
X     7C000106        P     E          wrtee        Write MSR External Enable
X     7C000146        P     E          wrteei       Write MSR External Enable Immediate
X     7C000278  SR          B          xor[.]       XOR

1 See the key to the mode dependency and privilege columns on page 1324 and the key to the category column in Section 1.3.5 of Book I.
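As an illustration only, the following Python sketch shows one way a tool might read the "Opcode" column of the table above. It relies on what is visible in the table itself: a 16-bit encoding is printed as four hexadecimal digits followed by four dashes, and a 32-bit encoding is printed as eight hexadecimal digits. The function names are hypothetical and are not part of the architecture. The extended-opcode helper assumes the X-form layout (extended opcode in bits 21:30) and should not be applied to other forms.

```python
def parse_opcode_column(entry):
    """Split an "Opcode" column entry into (width_in_bits, value).

    '0400----' is read as the 16-bit encoding 0x0400;
    '7C000774' is read as the 32-bit encoding 0x7C000774.
    """
    entry = entry.strip()
    if len(entry) != 8:
        raise ValueError(f"expected 8 characters, got {entry!r}")
    if entry.endswith("----"):
        # Four hex digits followed by dashes mark a 16-bit instruction.
        return 16, int(entry[:4], 16)
    return 32, int(entry, 16)


def primary_opcode(value):
    """Bits 0:5 of a 32-bit encoding (bit 0 is the most significant bit)."""
    return (value >> 26) & 0x3F


def x_form_extended_opcode(value):
    """Bits 21:30 of a 32-bit encoding; assumes the instruction is X-form."""
    return (value >> 1) & 0x3FF


# Example using the extsb[.] row (Form X, Opcode 7C000774).
width, value = parse_opcode_column("7C000774")
print(width, primary_opcode(value), x_form_extended_opcode(value))
# prints: 32 31 954
```

The sketch deliberately does not extract an opcode field from 16-bit encodings, because the number of opcode bits differs between the 16-bit forms; it only separates the two widths.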
2 For 16-bit instructions, the "Opcode" column represents the 16-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits; dashes are used following the opcode to indicate the form is a 16-bit instruction. For 32-bit instructions, the "Opcode" column represents the 32-bit hexadecimal instruction encoding with the opcode, extended opcode, and other fields with fixed values in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode, extended opcode or fixed value bits.

Appendix B. VLE Instruction Set Sorted by Opcode

This appendix lists all the instructions available in VLE mode in the Power ISA, in order by opcode. Opcodes that are not defined below are treated as illegal by category VLE.

      Opcode
      (hexa-    Mode
Form  decimal)2 Dep.1 Priv1 Cat1       Mnemonic     Instruction
C     0000----              VLE        se_illegal   Illegal
C     0001----              VLE        se_isync     Instruction Synchronize
C     0002----              VLE        se_sc        System Call
C     0004----              VLE        se_blr[l]    Branch to Link Register [and Link]
C     0006----              VLE        se_bctr[l]   Branch to Count Register [and Link]
C     0008----        P     VLE        se_rfi       Return From Interrupt
C     0009----        H     VLE        se_rfci      Return From Critical Interrupt
C     000A----        H     VLE        se_rfdi      Return From Debug Interrupt
C     000B----        H     VLE        se_rfmci     Return From Machine Check Interrupt
C     000C----        P     VLE, E.HV  se_rfgi      Return From Guest Interrupt
R     0020----              VLE        se_not       NOT Short Form
R     0030----              VLE        se_neg       Negate Short Form
R     0080----              VLE        se_mflr      Move From Link Register
R     0090----              VLE        se_mtlr      Move To Link Register
R     00A0----              VLE        se_mfctr     Move From Count Register
R     00B0----              VLE        se_mtctr     Move To Count Register
R     00C0----              VLE        se_extzb     Extend Zero Byte
R     00D0----              VLE        se_extsb     Extend Sign Byte Short Form
R     00E0----              VLE        se_extzh     Extend Zero Halfword
R     00F0----              VLE        se_extsh     Extend Sign
Halfword Short Form RR 0100---- VLE se_mr Move Register RR 0200---- VLE se_mtar Move To Alternate Register RR 0300---- VLE se_mfar Move from Alternate Register RR 0400---- VLE se_add Add Short Form RR 0500---- VLE se_mullw Multiply Low Word Short Form RR 0600---- VLE se_sub Subtract RR 0700---- VLE se_subf Subtract From Short Form RR 0C00---- VLE se_cmp Compare Word RR 0D00---- VLE se_cmpl Compare Logical Word RR 0E00---- VLE se_cmph Compare Halfword Short Form RR 0F00---- VLE se_cmphl Compare Halfword Logical Short Form VX 10000000 V vaddubm Vector Add Unsigned Byte Modulo VX 10000002 V vmaxub Vector Maximum Unsigned Byte VX 10000004 V vrlb Vector Rotate Left Byte VC 10000006 V vcmpequb[.] Vector Compare Equal To Unsigned Byte VX 10000008 V vmuloub Vector Multiply Odd Unsigned Byte VX 1000000A V vaddfp Vector Add Single-Precision VX 1000000C V vmrghb Vector Merge High Byte VX 1000000E V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo X 10000010 SR LMA mulhhwu[.] Multiply High Halfword to Word Unsigned XO 10000018 SR LMA machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned VA 10000020 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate Appendix B. 
VLE Instruction Set Sorted by Opcode 1193 Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VA 10000021 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Satu- rate VA 10000022 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo VA 10000024 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo VA 10000025 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo VA 10000026 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo VA 10000027 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate VA 10000028 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo VA 10000029 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate VA 1000002A V vsel Vector Select VA 1000002B V vperm Vector Permute VA 1000002C V vsldoi Vector Shift Left Double by Octet Immediate VA 1000002E V vmaddfp Vector Multiply-Add Single-Precision VA 1000002F V vnmsubfp Vector Negative Multiply-Subtract Single-Precision VX 10000040 V vadduhm Vector Add Unsigned Halfword Modulo VX 10000042 V vmaxuh Vector Maximum Unsigned Halfword VX 10000044 V vrlh Vector Rotate Left Halfword VC 10000046 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword VX 10000048 V vmulouh Vector Multiply Odd Unsigned Halfword VX 1000004A V vsubfp Vector Subtract Single-Precision VX 1000004C V vmrghh Vector Merge High Halfword VX 1000004E V vpkuwum Vector Pack Unsigned Word Unsigned Modulo X 10000050 SR LMA mulhhw[.] Multiply High Halfword to Word Signed XO 10000058 SR LMA machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed XO 1000005C SR LMA nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed VX 10000080 V vadduwm Vector Add Unsigned Word Modulo VX 10000082 V vmaxuw Vector Maximum Unsigned Word VX 10000084 V vrlw Vector Rotate Left Word VC 10000086 V vcmpequw[.] 
Vector Compare Equal To Unsigned Word VX 1000008C V vmrghw Vector Merge High Word VX 1000008E V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate XO 10000098 SR LMA machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned VC 100000C6 V vcmpeqfp[.] Vector Compare Equal To Single-Precision VX 100000CE V vpkuwus Vector Pack Unsigned Word Unsigned Saturate XO 100000D8 SR LMA machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed XO 100000DC SR LMA nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Sat- urate Signed VX 10000102 V vmaxsb Vector Maximum Signed Byte VX 10000104 V vslb Vector Shift Left Byte VX 10000108 V vmulosb Vector Multiply Odd Signed Byte VX 1000010A V vrefp Vector Reciprocal Estimate Single-Precision VX 1000010C V vmrglb Vector Merge Low Byte VX 1000010E V vpkshus Vector Pack Signed Halfword Unsigned Saturate X 10000110 SR LMA mulchwu[.] Multiply Cross Halfword to Word Unsigned XO 10000118 SR LMA macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned VX 10000142 V vmaxsh Vector Maximum Signed Halfword VX 10000144 V vslh Vector Shift Left Halfword VX 10000148 V vmulosh Vector Multiply Odd Signed Halfword VX 1000014A V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision VX 1000014C V vmrglh Vector Merge Low Halfword VX 1000014E V vpkswus Vector Pack Signed Word Unsigned Saturate X 10000150 SR LMA mulchw[.] Multiply Cross Halfword to Word Signed XO 10000158 SR LMA macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed XO 1000015C SR LMA nmacchw[o][.] 
Negative Multiply Accumulate Cross Halfword to Word Modulo Signed 1194 Power ISATM Book VLE Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VX 10000180 V vaddcuw Vector Add and write Carry-out Unsigned Word VX 10000182 V vmaxsw Vector Maximum Signed Word VX 10000184 V vslw Vector Shift Left Word VX 1000018A V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point VX 1000018C V vmrglw Vector Merge Low Word VX 1000018E V vpkshss Vector Pack Signed Halfword Signed Saturate XO 10000198 SR LMA macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned VX 100001C4 V vsl Vector Shift Left VC 100001C6 V vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Preci- sion VX 100001CA V vlogefp Vector Log Base 2 Estimate Floating-Point VX 100001CE V vpkswss Vector Pack Signed Word Signed Saturate XO 100001D8 SR LMA macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed XO 100001DC SR LMA nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed EVX 10000200 SP evaddw Vector Add Word VX 10000200 V vaddubs Vector Add Unsigned Byte Saturate EVX 10000202 SP evaddiw Vector Add Immediate Word VX 10000202 V vminub Vector Minimum Unsigned Byte EVX 10000204 SP evsubfw Vector Subtract from Word VX 10000204 V vsrb Vector Shift Right Byte EVX 10000206 SP evsubifw Vector Subtract Immediate from Word VC 10000206 V vcmpgtub[.] 
Vector Compare Greater Than Unsigned Byte EVX 10000208 SP evabs Vector Absolute Value VX 10000208 V vmuleub Vector Multiply Even Unsigned Byte EVX 10000209 SP evneg Vector Negate EVX 1000020A SP evextsb Vector Extend Sign Byte VX 1000020A V vrfin Vector Round to Single-Precision Integer Nearest EVX 1000020B SP evextsh Vector Extend Sign Halfword EVX 1000020C SP evrndw Vector Round Word VX 1000020C V vspltb Vector Splat Byte EVX 1000020D SP evcntlzw Vector Count Leading Zeros Word EVX 1000020E SP evcntlsw Vector Count Leading Signed Bits Word VX 1000020E V vupkhsb Vector Unpack High Signed Byte EVX 1000020F SP brinc Bit Reversed Increment EVX 10000211 SP evand Vector AND EVX 10000212 SP evandc Vector AND with Complement EVX 10000216 SP evxor Vector XOR EVX 10000217 SP evor Vector OR EVX 10000218 SP evnor Vector NOR EVX 10000219 SP eveqv Vector Equivalent EVX 1000021B SP evorc Vector OR with Complement EVX 1000021E SP evnand Vector NAND EVX 10000220 SP evsrwu Vector Shift Right Word Unsigned EVX 10000221 SP evsrws Vector Shift Right Word Signed EVX 10000222 SP evsrwiu Vector Shift Right Word Immediate Unsigned EVX 10000223 SP evsrwis Vector Shift Right Word Immediate Signed EVX 10000224 SP evslw Vector Shift Left Word EVX 10000226 SP evslwi Vector Shift Left Word Immediate EVX 10000228 SP evrlw Vector Rotate Left Word EVX 10000229 SP evsplati Vector Splat Immediate EVX 1000022A SP evrlwi Vector Rotate Left Word Immediate EVX 1000022B SP evsplatfi Vector Splat Fractional Immediate EVX 1000022C SP evmergehi Vector Merge High EVX 1000022D SP evmergelo Vector Merge Low EVX 1000022E SP evmergehilo Vector Merge High/Low EVX 1000022F SP evmergelohi Vector Merge Low/High EVX 10000230 SP evcmpgtu Vector Compare Greater Than Unsigned EVX 10000231 SP evcmpgts Vector Compare Greater Than Signed Appendix B. 
VLE Instruction Set Sorted by Opcode 1195 Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 10000232 SP evcmpltu Vector Compare Less Than Unsigned EVX 10000233 SP evcmplts Vector Compare Less Than Signed EVX 10000234 SP evcmpeq Vector Compare Equal VX 10000240 V vadduhs Vector Add Unsigned Halfword Saturate VX 10000242 V vminuh Vector Minimum Unsigned Halfword VX 10000244 V vsrh Vector Shift Right Halfword VC 10000246 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword VX 10000248 V vmuleuh Vector Multiply Even Unsigned Halfword VX 1000024A V vrfiz Vector Round to Single-Precision Integer toward Zero VX 1000024C V vsplth Vector Splat Halfword VX 1000024E V vupkhsh Vector Unpack High Signed Halfword EVS 10000278 SP evsel Vector Select EVX 10000280 SP.FV evfsadd Vector Floating-Point Single-Precision Add VX 10000280 V vadduws Vector Add Unsigned Word Saturate EVX 10000281 SP.FV evfssub Vector Floating-Point Single-Precision Subtract VX 10000282 V vminuw Vector Minimum Unsigned Word EVX 10000284 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value VX 10000284 V vsrw Vector Shift Right Word EVX 10000285 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value EVX 10000286 SP.FV evfsneg Vector Floating-Point Single-Precision Negate VC 10000286 V vcmpgtuw[.] 
Vector Compare Greater Than Unsigned Word EVX 10000288 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply EVX 10000289 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide VX 1000028A V vrfip Vector Round to Single-Precision Integer toward +Infinity EVX 1000028C SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than VX 1000028C V vspltw Vector Splat Word EVX 1000028D SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than EVX 1000028E SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal VX 1000028E V vupklsb Vector Unpack Low Signed Byte EVX 10000290 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer EVX 10000291 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer EVX 10000292 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction EVX 10000293 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction EVX 10000294 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer EVX 10000295 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer EVX 10000296 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction EVX 10000297 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction EVX 10000298 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX 1000029A SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round Toward Zero EVX 1000029C SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than EVX 1000029D SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than EVX 1000029E SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal EVX 100002C0 SP.FS efsadd Floating-Point Single-Precision Add EVX 100002C1 SP.FS efssub Floating-Point Single-Precision Subtract EVX 
100002C4 SP.FS efsabs Floating-Point Single-Precision Absolute Value VX 100002C4 V vsr Vector Shift Right EVX 100002C5 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value 1196 Power ISATM Book VLE Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 100002C6 SP.FS efsneg Floating-Point Single-Precision Negate VC 100002C6 V vcmpgtfp[.] Vector Compare Greater Than Single-Precision EVX 100002C8 SP.FS efsmul Floating-Point Single-Precision Multiply EVX 100002C9 SP.FS efsdiv Floating-Point Single-Precision Divide VX 100002CA V vrfim Vector Round to Single-Precision Integer toward -Infinity EVX 100002CC SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than EVX 100002CD SP.FS efscmplt Floating-Point Single-Precision Compare Less Than EVX 100002CE SP.FS efscmpeq Floating-Point Single-Precision Compare Equal VX 100002CE V vupklsh Vector Unpack Low Signed Halfword EVX 100002CF SP.FD efscfd Floating-Point Single-Precision Convert from Double-Pre- cision EVX 100002D0 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Integer EVX 100002D1 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Inte- ger EVX 100002D2 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction EVX 100002D3 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Frac- tion EVX 100002D4 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Inte- ger EVX 100002D5 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer EVX 100002D6 SP.FS efsctuf Convert Floating-Point Single-Precision to Unsigned Frac- tion EVX 100002D7 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Frac- tion EVX 100002D8 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Inte- ger with Round Towards Zero EVX 100002DA SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero EVX 100002DC SP.FS efststgt 
Floating-Point Single-Precision Test Greater Than
EVX 100002DD SP.FS efststlt Floating-Point Single-Precision Test Less Than
EVX 100002DE SP.FS efststeq Floating-Point Single-Precision Test Equal
EVX 100002E0 SP.FD efdadd Floating-Point Double-Precision Add
EVX 100002E1 SP.FD efdsub Floating-Point Double-Precision Subtract
EVX 100002E2 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword
EVX 100002E3 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX 100002E4 SP.FD efdabs Floating-Point Double-Precision Absolute Value
EVX 100002E5 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value
EVX 100002E6 SP.FD efdneg Floating-Point Double-Precision Negate
EVX 100002E8 SP.FD efdmul Floating-Point Double-Precision Multiply
EVX 100002E9 SP.FD efddiv Floating-Point Double-Precision Divide
EVX 100002EA SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero
EVX 100002EB SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero
EVX 100002EC SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than
EVX 100002ED SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than
EVX 100002EE SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal
EVX 100002EF SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Precision
EVX 100002F0 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer
EVX 100002F1 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Integer
EVX 100002F2 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction
EVX 100002F3 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Fraction
EVX 100002F4 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Integer
EVX 100002F5 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer
EVX 100002F6 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Fraction
EVX 100002F7 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction
EVX 100002F8 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero
EVX 100002FA SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero
EVX 100002FC SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than
EVX 100002FD SP.FD efdtstlt Floating-Point Double-Precision Test Less Than
EVX 100002FE SP.FD efdtsteq Floating-Point Double-Precision Test Equal
EVX 10000300 SP evlddx Vector Load Double Word into Double Word Indexed
VX 10000300 V vaddsbs Vector Add Signed Byte Saturate
EVX 10000301 SP evldd Vector Load Double Word into Double Word
EVX 10000302 SP evldwx Vector Load Double Word into Two Words Indexed
VX 10000302 V vminsb Vector Minimum Signed Byte
EVX 10000303 SP evldw Vector Load Double Word into Two Words
EVX 10000304 SP evldhx Vector Load Double Word into Four Halfwords Indexed
VX 10000304 V vsrab Vector Shift Right Algebraic Byte
EVX 10000305 SP evldh Vector Load Double Word into Four Halfwords
VC 10000306 V vcmpgtsb[.] Vector Compare Greater Than Signed Byte
EVX 10000308 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed
VX 10000308 V vmulesb Vector Multiply Even Signed Byte
EVX 10000309 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat
VX 1000030A V vcfux Vector Convert From Unsigned Fixed-Point Word
EVX 1000030C SP evlhhousplatx Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed
VX 1000030C V vspltisb Vector Splat Immediate Signed Byte
EVX 1000030D SP evlhhousplat Vector Load Halfword into Halfword Odd Unsigned and Splat
EVX 1000030E SP evlhhossplatx Vector Load Halfword into Halfword Odd Signed and Splat Indexed
VX 1000030E V vpkpx Vector Pack Pixel
EVX 1000030F SP evlhhossplat Vector Load Halfword into Halfword Odd Signed and Splat
X 10000310 SR LMA mullhwu[.] Multiply Low Halfword to Word Unsigned
EVX 10000311 SP evlwhe Vector Load Word into Two Halfwords Even
EVX 10000314 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX 10000315 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX 10000316 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX 10000317 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX 10000318 SP evlwwsplatx Vector Load Word into Word and Splat Indexed
XO 10000318 SR LMA maclhwu[o][.]
Multiply Accumulate Low Halfword to Word Modulo Unsigned EVX 10000319 SP evlwwsplat Vector Load Word into Word and Splat EVX 1000031C SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed EVX 1000031D SP evlwhsplat Vector Load Word into Two Halfwords and Splat EVX 10000320 SP evstddx Vector Store Doubleword of Doubleword Indexed EVX 10000321 SP evstdd Vector Store Double of Double EVX 10000322 SP evstdwx Vector Store Double of Two Words Indexed EVX 10000323 SP evstdw Vector Store Double of Two Words 1198 Power ISATM Book VLE Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 10000324 SP evstdhx Vector Store Double of Four Halfwords Indexed EVX 10000325 SP evstdh Vector Store Double of Four Halfwords EVX 10000330 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed EVX 10000331 SP evstwhe Vector Store Word of Two Halfwords from Even EVX 10000334 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed EVX 10000335 SP evstwho Vector Store Word of Two Halfwords from Odd EVX 10000338 SP evstwwex Vector Store Word of Word from Even Indexed EVX 10000339 SP evstwwe Vector Store Word of Word from Even EVX 1000033C SP evstwwox Vector Store Word of Word from Odd Indexed EVX 1000033D SP evstwwo Vector Store Word of Word from Odd VX 10000340 V vaddshs Vector Add Signed Halfword Saturate VX 10000342 V vminsh Vector Minimum Signed Halfword VX 10000344 V vsrah Vector Shift Right Algebraic Halfword VC 10000346 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword VX 10000348 V vmulesh Vector Multiply Even Signed Halfword VX 1000034A V vcfsx Vector Convert From Signed Fixed-Point Word VX 1000034C V vspltish Vector Splat Immediate Signed Halfword VX 1000034E V vupkhpx Vector Unpack High Pixel X 10000350 SR LMA mullhw[.] Multiply Low Halfword to Word Signed XO 10000358 SR LMA maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed XO 1000035C SR LMA nmaclhw[o][.] 
Negative Multiply Accumulate Low Halfword to Word Mod- ulo Signed VX 10000380 V vaddsws Vector Add Signed Word Saturate VX 10000382 V vminsw Vector Minimum Signed Word VX 10000384 V vsraw Vector Shift Right Algebraic Word VC 10000386 V vcmpgtsw[.] Vector Compare Greater Than Signed Word VX 1000038A V vctuxs Vector Convert To Unsigned Fixed-Point Word Saturate VX 1000038C V vspltisw Vector Splat Immediate Signed Word XO 10000398 SR LMA maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned VC 100003C6 V vcmpbfp[.] Vector Compare Bounds Single-Precision VX 100003CA V vctsxs Vector Convert To Signed Fixed-Point Word Saturate VX 100003CE V vupklpx Vector Unpack Low Pixel XO 100003D8 SR LMA maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed XO 100003DC SR LMA nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Sat- urate Signed VX 10000400 V vsububm Vector Subtract Unsigned Byte Modulo VX 10000402 V vavgub Vector Average Unsigned Byte EVX 10000403 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Frac- tional VX 10000404 V vand Vector Logical AND EVX 10000407 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional EVX 10000408 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Inte- ger EVX 10000409 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer VX 1000040A V vmaxfp Vector Maximum Single-Precision EVX 1000040B SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional EVX 1000040C SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Inte- ger VX 1000040C V vslo Vector Shift Left by Octet EVX 1000040D SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer EVX 1000040F SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional EVX 10000423 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Frac- tional to Accumulator Appendix B. 
VLE Instruction Set Sorted by Opcode 1199 Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 10000427 SP evmhossfa Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional to Accumulator EVX 10000428 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Inte- ger to Accumulator EVX 10000429 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator EVX 1000042B SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional to Accumulator EVX 1000042C SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Inte- ger to Accumulator EVX 1000042D SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator EVX 1000042F SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional to Accumulator VX 10000440 V vsubuhm Vector Subtract Unsigned Halfword Modulo VX 10000442 V vavguh Vector Average Unsigned Halfword VX 10000444 V vandc Vector Logical AND with Complement EVX 10000447 SP evmwhssf Vector Multiply Word High Signed, Saturate, Fractional EVX 10000448 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer VX 1000044A V vminfp Vector Minimum Single-Precision EVX 1000044C SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer VX 1000044C V vsro Vector Shift Right by Octet EVX 1000044D SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer EVX 1000044F SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional EVX 10000453 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional EVX 10000458 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer EVX 10000459 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer EVX 1000045B SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional EVX 10000467 SP evmwhssfa Vector Multiply Word High Signed, Saturate, Fractional to Accumulator EVX 10000468 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator EVX 1000046C SP evmwhumia Vector 
Multiply Word High Unsigned, Modulo, Integer to Accumulator EVX 1000046D SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator EVX 1000046F SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator EVX 10000473 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator EVX 10000478 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accu- mulator EVX 10000479 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accumu- lator EVX 1000047B SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accu- mulator VX 10000480 V vsubuwm Vector Subtract Unsigned Word Modulo VX 10000482 V vavguw Vector Average Unsigned Word VX 10000484 V vor Vector Logical OR EVX 100004C0 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word EVX 100004C1 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word EVX 100004C2 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumula- tor Word EVX 100004C3 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word EVX 100004C4 SP evmra Initialize Accumulator VX 100004C4 V vxor Vector Logical XOR EVX 100004C6 SP evdivws Vector Divide Word Signed 1200 Power ISATM Book VLE Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 100004C7 SP evdivwu Vector Divide Word Unsigned EVX 100004C8 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word EVX 100004C9 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word EVX 100004CA SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumula- tor Word EVX 100004CB SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word EVX 10000500 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate, Inte- ger and Accumulate into Words EVX 10000501 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words VX 10000502 V 
vavgsb Vector Average Signed Byte EVX 10000503 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Frac- tional and Accumulate into Words EVX 10000504 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Inte- ger and Accumulate into Words VX 10000504 V vnor Vector Logical NOR EVX 10000505 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words EVX 10000507 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional and Accumulate into Words EVX 10000508 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Inte- ger and Accumulate into Words EVX 10000509 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words EVX 1000050B SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional and Accumulate into Words EVX 1000050C SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Inte- ger and Accumulate into Words EVX 1000050D SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words EVX 1000050F SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional and Accumulate into Words EVX 10000528 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate EVX 10000529 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Mod- ulo, Integer and Accumulate EVX 1000052B SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Mod- ulo, Fractional and Accumulate EVX 1000052C SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Mod- ulo, Integer and Accumulate EVX 1000052D SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Mod- ulo, Integer, and Accumulate EVX 1000052F SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Mod- ulo, Fractional and Accumulate EVX 10000540 SP evmwlusiaaw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words EVX 
10000541 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words VX 10000542 V vavgsh Vector Average Signed Halfword EVX 10000548 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words EVX 10000549 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words EVX 10000553 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate EVX 10000558 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Appendix B. VLE Instruction Set Sorted by Opcode 1201 Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 10000559 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accu- mulate EVX 1000055B SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate EVX 10000580 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate, Inte- ger and Accumulate Negative into Words VX 10000580 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word EVX 10000581 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words VX 10000582 V vavgsw Vector Average Signed Word EVX 10000583 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Frac- tional and Accumulate Negative into Words EVX 10000584 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Inte- ger and Accumulate Negative into Words EVX 10000585 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words EVX 10000587 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional and Accumulate Negative into Words EVX 10000588 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Inte- ger and Accumulate Negative into Words EVX 10000589 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words EVX 1000058B SP evmhesmfanw 
Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional and Accumulate Negative into Words EVX 1000058C SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Inte- ger and Accumulate Negative into Words EVX 1000058D SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words EVX 1000058F SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional and Accumulate Negative into Words EVX 100005A8 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX 100005A9 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Mod- ulo, Integer and Accumulate Negative EVX 100005AB SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Mod- ulo, Fractional and Accumulate Negative EVX 100005AC SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Mod- ulo, Integer and Accumulate Negative EVX 100005AD SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Mod- ulo, Integer and Accumulate Negative EVX 100005AF SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Mod- ulo, Fractional and Accumulate Negative EVX 100005C0 SP evmwlusianw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words EVX 100005C1 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words EVX 100005C8 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words EVX 100005C9 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words EVX 100005D3 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative EVX 100005D8 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative EVX 100005D9 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accu- mulate Negative EVX 100005DB SP evmwsmfan Vector Multiply Word Signed, Modulo, 
Fractional and Accumulate Negative VX 10000600 V vsububs Vector Subtract Unsigned Byte Saturate 1202 Power ISATM Book VLE Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VX 10000604 V mfvscr Move From VSCR VX 10000608 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate VX 10000640 V vsubuhs Vector Subtract Unsigned Halfword Saturate VX 10000644 V mtvscr Move To VSCR VX 10000648 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate VX 10000680 V vsubuws Vector Subtract Unsigned Word Saturate VX 10000688 V vsum2sws Vector Sum across Half Signed Word Saturate VX 10000700 V vsubsbs Vector Subtract Signed Byte Saturate VX 10000708 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate VX 10000740 V vsubshs Vector Subtract Signed Halfword Saturate VX 10000780 V vsubsws Vector Subtract Signed Word Saturate VX 10000788 V vsumsws Vector Sum across Signed Word Saturate D8 18000000 VLE e_lbzu Load Byte and Zero with Update D8 18000100 VLE e_lhzu Load Halfword and Zero with Update D8 18000200 VLE e_lwzu Load Word and Zero with Update D8 18000300 VLE e_lhau Load Halfword Algebraic with Update D8 18000400 VLE e_stbu Store Byte with Update D8 18000500 VLE e_sthu Store Halfword with Update D8 18000600 VLE e_stwu Store Word with Update D8 18000800 VLE e_lmw Load Multiple Word D8 18000900 VLE e_stmw Store Multiple Word SCI8 18008000 SR VLE e_addi[.] Add Scaled Immediate SCI8 18009000 SR VLE e_addic[.] Add Scaled Immediate Carrying SCI8 1800A000 VLE e_mulli Multiply Low Scaled Immediate SCI8 1800A800 VLE e_cmpi Compare Scaled Immediate Word SCI8 1800B000 SR VLE e_subfic[.] Subtract From Scaled Immediate Carrying SCI8 1800C000 SR VLE e_andi[.] AND Scaled Immediate SCI8 1800D000 SR VLE e_ori[.] OR Scaled Immediate SCI8 1800E000 SR VLE e_xori[.] 
XOR Scaled Immediate SCI8 1880A800 VLE e_cmpli Compare Logical Scaled Immediate Word D 1C000000 VLE e_add16i Add Immediate OIM5 2000---- VLE se_addi Add Immediate Short Form OIM5 2200---- VLE se_cmpli Compare Logical Immediate Word OIM5 2400---- SR VLE se_subi[.] Subtract Immediate IM5 2A00---- VLE se_cmpi Compare Immediate Word Short Form IM5 2C00---- VLE se_bmaski Bit Mask Generate Immediate IM5 2E00---- VLE se_andi AND Immediate Short Form D 30000000 VLE e_lbz Load Byte and Zero D 34000000 VLE e_stb Store Byte D 38000000 VLE e_lha Load Halfword Algebraic RR 4000---- VLE se_srw Shift Right Word RR 4100---- VLE se_sraw Shift Right Algebraic Word RR 4200---- VLE se_slw Shift Left Word RR 4400---- VLE se_or OR Short Form RR 4500---- VLE se_andc AND with Complement Short Form RR 4600---- SR VLE se_and[.] AND Short Form IM7 4800---- VLE se_li Load Immediate Short Form D 50000000 VLE e_lwz Load Word and Zero D 54000000 VLE e_stw Store Word D 58000000 VLE e_lhz Load Halfword and Zero D 5C000000 VLE e_sth Store Halfword IM5 6000---- VLE se_bclri Bit Clear Immediate IM5 6200---- VLE se_bgeni Bit Generate Immediate IM5 6400---- VLE se_bseti Bit Set Immediate IM5 6600---- VLE se_btsti Bit Test Immediate IM5 6800---- VLE se_srwi Shift Right Word Immediate Short Form IM5 6A00---- VLE se_srawi Shift Right Algebraic Word Immediate IM5 6C00---- VLE se_slwi Shift Left Word Immediate Short Form LI20 70000000 VLE e_li Load Immediate I16A 70008800 SR VLE e_add2i. Add (2 operand) Immediate and Record I16A 70009000 VLE e_add2is Add (2 operand) Immediate Shifted Appendix B. 
VLE Instruction Set Sorted by Opcode 1203 Version 2.06 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 I16A 70009800 VLE e_cmp16i Compare Immediate Word I16A 7000A000 VLE e_mull2i Multiply (2 operand) Low Immediate I16A 7000A800 VLE e_cmpl16i Compare Logical Immediate Word I16A 7000B000 VLE e_cmph16i Compare Halfword Immediate I16A 7000B800 VLE e_cmphl16i Compare Halfword Logical Immediate I16L 7000C000 VLE e_or2i OR (two operand) Immediate I16L 7000C800 SR VLE e_and2i. AND (two operand) Immediate I16L 7000D000 VLE e_or2is OR (2 operand) Immediate Shifted I16L 7000E000 VLE e_lis Load Immediate Shifted I16L 7000E800 SR VLE e_and2is. AND (2 operand) Immediate Shifted M 74000000 VLE e_rlwimi Rotate Left Word Immediate then Mask Insert M 74000001 VLE e_rlwinm Rotate Left Word Immediate then AND with Mask BD24 78000000 VLE e_b[l] Branch [and Link] BD15 7A000000 CT VLE e_bc[l] Branch Conditional [and Link] X 7C000000 B cmp Compare X 7C000008 B tw Trap Word X 7C00000C V lvsl Load Vector for Shift Left Indexed X 7C00000E V lvebx Load Vector Element Byte Indexed XO 7C000010 SR B subfc[o][.] Subtract From Carrying XO 7C000012 SR 64 mulhdu[.] Multiply High Doubleword Unsigned XO 7C000014 SR B addc[o][.] Add Carrying XO 7C000016 SR B mulhwu[.] Multiply High Word Unsigned X 7C00001C VLE e_cmph Compare Halfword A 7C00001E B isel Integer Select XL 7C000020 VLE e_mcrf Move CR Field XFX 7C000026 B mfcr Move From Condition Register X 7C000028 B lwarx Load Word And Reserve Indexed X 7C00002A 64 ldx Load Doubleword Indexed X 7C00002C E icbt Instruction Cache Block Touch X 7C00002E B lwzx Load Word and Zero Indexed X 7C000030 SR B slw[.] Shift Left Word X 7C000034 SR B cntlzw[.] Count Leading Zeros Word X 7C000036 SR 64 sld[.] Shift Left Doubleword X 7C000038 SR B and[.] 
AND
X 7C00003A P E.PD;64 ldepx Load Doubleword by External Process ID Indexed
X 7C00003E P E.PD lwepx Load Word by External Process ID Indexed
X 7C000040 B cmpl Compare Logical
XL 7C000042 VLE e_crnor Condition Register NOR
ESC 7C000048 VLE, E.HV e_sc System Call
X 7C00004C V lvsr Load Vector for Shift Right Indexed
X 7C00004E V lvehx Load Vector Element Halfword Indexed
XO 7C000050 SR B subf[o][.] Subtract From
X 7C00005C VLE e_cmphl Compare Halfword Logical
X 7C000068 B lbarx Load Byte and Reserve Indexed
X 7C00006A 64 ldux Load Doubleword with Update Indexed
X 7C00006C B dcbst Data Cache Block Store
X 7C00006E B lwzux Load Word and Zero with Update Indexed
X 7C000070 SR VLE e_slwi[.] Shift Left Word Immediate
X 7C000074 SR 64 cntlzd[.] Count Leading Zeros Doubleword
X 7C000078 SR B andc[.] AND with Complement
X 7C00007C WT wait Wait
X 7C00007E E.PD dcbstep Data Cache Block Store by External PID
X 7C000088 64 td Trap Doubleword
X 7C00008E V lvewx Load Vector Element Word Indexed
XO 7C000092 SR 64 mulhd[.] Multiply High Doubleword
XO 7C000096 SR B mulhw[.] Multiply High Word
X 7C00009C SR LMV dlmzb[.] Determine Leftmost Zero Byte
X 7C0000A6 P B mfmsr Move From Machine State Register
X 7C0000A8 64 ldarx Load Doubleword And Reserve Indexed
X 7C0000AC B dcbf Data Cache Block Flush
X 7C0000AE B lbzx Load Byte and Zero Indexed
X 7C0000BE P E.PD lbepx Load Byte by External Process ID Indexed
X 7C0000CE V lvx Load Vector Indexed
XO 7C0000D0 SR B neg[o][.] Negate
X 7C0000E8 B lharx Load Halfword and Reserve Indexed
X 7C0000EE B lbzux Load Byte and Zero with Update Indexed
X 7C0000F4 B popcntb Population Count Bytes
X 7C0000F8 SR B nor[.]
NOR X 7C0000FE P E.PD dcbfep Data Cache Block Flush by External PID XL 7C000102 VLE e_crandc Condition Register AND with Complement X 7C000106 P E wrtee Write MSR External Enable X 7C00010C M ECL dcbtstls Data Cache Block Touch for Store and Lock Set X 7C00010E V stvebx Store Vector Element Byte Indexed XO 7C000110 SR B subfe[o][.] Subtract From Extended XO 7C000114 SR B adde[o][.] Add Extended XFX 7C000120 B mtcrf Move To Condition Register Fields X 7C000124 P E mtmsr Move To Machine State Register X 7C00012A 64 stdx Store Doubleword Indexed X 7C00012D B stwcx. Store Word Conditional Indexed X 7C00012E B stwx Store Word Indexed X 7C00013A P E.PD;64 stdepx Store Doubleword by External Process ID Indexed X 7C00013E P E.PD stwepx Store Word by External Process ID Indexed X 7C000146 P E wrteei Write MSR External Enable Immediate X 7C00014C M ECL dcbtls Data Cache Block Touch and Lock Set X 7C00014E V stvehx Store Vector Element Halfword Indexed X 7C00016A 64 stdux Store Doubleword with Update Indexed X 7C00016E B stwux Store Word with Update Indexed XL 7C000182 VLE e_crxor Condition Register XOR X 7C00018E V stvewx Store Vector Element Word Indexed XO 7C000190 SR B subfze[o][.] Subtract From Zero Extended XO 7C000194 SR B addze[o][.] Add to Zero Extended X 7C00019C H E.PC msgsnd Message Send X 7C0001AD 64 stdcx. Store Doubleword Conditional Indexed X 7C0001AE B stbx Store Byte Indexed X 7C0001BE P E.PD stbepx Store Byte by External Process ID Indexed XL 7C0001C2 VLE e_crnand Condition Register NAND X 7C0001CC M ECL icblc Instruction Cache Block Lock Clear X 7C0001CE V stvx Store Vector Indexed XO 7C0001D0 SR B subfme[o][.] Subtract From Minus One Extended XO 7C0001D2 SR 64 mulld[o][.] Multiply Low Doubleword XO 7C0001D4 SR B addme[o][.] Add to Minus One Extended XO 7C0001D6 SR B mullw[o][.] 
Multiply Low Word
X    7C0001DC       H    E.PC  msgclr      Message Clear
X    7C0001EC            B     dcbtst      Data Cache Block Touch for Store
X    7C0001EE            B     stbux       Store Byte with Update Indexed
X    7C0001FE       P    E.PD  dcbtstep    Data Cache Block Touch for Store by External PID
XL   7C000202            VLE   e_crand     Condition Register AND
X    7C000206       P    E.DC  mfdcrx      Move From Device Control Register Indexed
X    7C00020E       P    E.PD  lvepxl      Load Vector by External Process ID Indexed LRU
XO   7C000214  SR        B     add[o][.]   Add
XL   7C00021C            E.HV  ehpriv      Embedded Hypervisor Privilege
X    7C00022C            B     dcbt        Data Cache Block Touch
X    7C00022E            B     lhzx        Load Halfword and Zero Indexed
X    7C000230  SR        VLE   e_rlw[.]    Rotate Left Word
X    7C000238  SR        B     eqv[.]      Equivalent
X    7C00023E       P    E.PD  lhepx       Load Halfword by External Process ID Indexed
XL   7C000242            VLE   e_creqv     Condition Register Equivalent
X    7C000246            E.DC  mfdcrux     Move From Device Control Register User-mode Indexed
X    7C00024E       P    E.PD  lvepx       Load Vector by External Process ID Indexed
X    7C00026E            B     lhzux       Load Halfword and Zero with Update Indexed
X    7C000270  SR        VLE   e_rlwi[.]   Rotate Left Word Immediate
X    7C000278  SR        B     xor[.]      XOR
X    7C00027E       P    E.PD  dcbtep      Data Cache Block Touch by External PID
XFX  7C000286       P    E.DC  mfdcr       Move From Device Control Register
X    7C00028C       P    E.CD  dcread      Data Cache Read
XFX  7C00029C       O    E.PM  mfpmr       Move From Performance Monitor Register
XFX  7C0002A6       O    B     mfspr       Move From Special Purpose Register
X    7C0002AA            64    lwax        Load Word Algebraic Indexed
X    7C0002AE            B     lhax        Load Halfword Algebraic Indexed
X    7C0002CE            V     lvxl        Load Vector Indexed LRU
X    7C0002EA            64    lwaux       Load Word Algebraic with Update Indexed
X    7C0002EE            B     lhaux       Load Halfword Algebraic with Update Indexed
X    7C000306       P    E.DC  mtdcrx      Move To Device Control Register Indexed
X    7C00030C       M    ECL   dcblc       Data Cache Block Lock Clear
X    7C00032E            B     sthx        Store Halfword Indexed
X    7C000338  SR        B     orc[.]      OR with Complement
X    7C00033E       P    E.PD  sthepx      Store Halfword by External Process ID Indexed
XL   7C000342            VLE   e_crorc     Condition Register OR with Complement
X    7C000346            E.DC  mtdcrux     Move To Device Control Register User-mode Indexed
X    7C00036E            B     sthux       Store Halfword with Update Indexed
X    7C000378  SR        B     or[.]       OR
XL   7C000382            VLE   e_cror      Condition Register OR
XFX  7C000386       P    E.DC  mtdcr       Move To Device Control Register
X    7C00038C       H    E.CI  dci         Data Cache Invalidate
XO   7C000392  SR        64    divdu[o][.] Divide Doubleword Unsigned
XO   7C000396  SR        B     divwu[o][.] Divide Word Unsigned
XFX  7C00039C       O    E.PM  mtpmr       Move To Performance Monitor Register
XFX  7C0003A6       O    B     mtspr       Move To Special Purpose Register
X    7C0003AC       P    E     dcbi        Data Cache Block Invalidate
X    7C0003C6            DS    dsn         Decorated Storage Notify
X    7C0003CC       M    ECL   icbtls      Instruction Cache Block Touch and Lock Set
X    7C0003CC       H    E.CD  dcread      Data Cache Read [Alternative Encoding]
X    7C0003CE            V     stvxl       Store Vector Indexed LRU
XO   7C0003D2  SR        64    divd[o][.]  Divide Doubleword
XO   7C0003D6  SR        B     divw[o][.]  Divide Word
X    7C000400            E     mcrxr       Move To Condition Register from XER
X    7C000406            DS    lbdx        Load Byte with Decoration Indexed
X    7C00042A            MA    lswx        Load String Word Indexed
X    7C00042C            B     lwbrx       Load Word Byte-Reverse Indexed
X    7C000430  SR        B     srw[.]      Shift Right Word
X    7C000436  SR        64    srd[.]      Shift Right Doubleword
X    7C000446            DS    lhdx        Load Halfword with Decoration Indexed
X    7C00046C       H    E     tlbsync     TLB Synchronize
X    7C000470  SR        VLE   e_srwi[.]   Shift Right Word Immediate
X    7C000486            DS    lwdx        Load Word with Decoration Indexed
X    7C0004AA            MA    lswi        Load String Word Immediate
X    7C0004AC            B     sync        Synchronize
X    7C0004BE       P    E.PD  lfdepx      Load Floating-Point Double by External Process ID Indexed
X    7C0004C6            DS    lddx        Load Doubleword with Decoration Indexed
X    7C000506            DS    stbdx       Store Byte with Decoration Indexed
X    7C00052A            MA    stswx       Store String Word Indexed
X    7C00052C            B     stwbrx      Store Word Byte-Reverse Indexed
X    7C000546            DS    sthdx       Store Halfword with Decoration Indexed
X    7C00056D            B     stbcx.      Store Byte Conditional Indexed
X    7C000586            DS    stwdx       Store Word with Decoration Indexed
X    7C0005AA            MA    stswi       Store String Word Immediate
X    7C0005AD            B     sthcx.      Store Halfword Conditional Indexed
X    7C0005BE       P    E.PD  stfdepx     Store Floating-Point Double by External Process ID Indexed
X    7C0005C6            DS    stddx       Store Doubleword with Decoration Indexed
X    7C0005EC            E     dcba        Data Cache Block Allocate
X    7C00060E       P    E.PD  stvepxl     Store Vector by External Process ID Indexed LRU
X    7C000624       H    E     tlbivax     TLB Invalidate Virtual Address Indexed
X    7C00062C            B     lhbrx       Load Halfword Byte-Reverse Indexed
X    7C000630  SR        B     sraw[.]     Shift Right Algebraic Word
X    7C000634  SR        64    srad[.]     Shift Right Algebraic Doubleword
EVX  7C00063E       P    E.PD  evlddepx    Vector Load Doubleword into Doubleword by External Process ID Indexed
X    7C000646            DS    lfddx       Load Floating Doubleword with Decoration Indexed
X    7C00064E       P    E.PD  stvepx      Store Vector by External Process ID Indexed
X    7C000670  SR        B     srawi[.]    Shift Right Algebraic Word Immediate
XS   7C000674  SR        64    sradi[.]    Shift Right Algebraic Doubleword Immediate
X    7C0006AC            E     mbar        Memory Barrier
X    7C000724       H    E     tlbsx       TLB Search Indexed
X    7C00072C            B     sthbrx      Store Halfword Byte-Reverse Indexed
X    7C000734  SR        B     extsh[.]    Extend Sign Halfword
EVX  7C00073E       P    E.PD  evstddepx   Vector Store Doubleword into Doubleword by External Process ID Indexed
X    7C000746            DS    stfddx      Store Floating Doubleword with Decoration Indexed
X    7C000764       H    E     tlbre       TLB Read Entry
X    7C000774  SR        B     extsb[.]    Extend Sign Byte
X    7C00078C       H    E.CI  ici         Instruction Cache Invalidate
X    7C0007A4       H    E     tlbwe       TLB Write Entry
X    7C0007AC            B     icbi        Instruction Cache Block Invalidate
X    7C0007B4  SR        64    extsw[.]    Extend Sign Word
X    7C0007BE       P    E.PD  icbiep      Instruction Cache Block Invalidate by External PID
X    7C0007CC       H    E.CD  icread      Instruction Cache Read
X    7C0007EC            B     dcbz        Data Cache Block set to Zero
X    7C0007FE       P    E.PD  dcbzep      Data Cache Block set to Zero by External PID
XFX  7C100026            B     mfocrf      Move From One Condition Register Field
XFX  7C100120            B     mtocrf      Move To One Condition Register Field
SD4  8000----            VLE   se_lbz      Load Byte and Zero Short Form
SD4  9000----            VLE   se_stb      Store Byte Short Form
SD4  A000----            VLE   se_lhz      Load Halfword and Zero Short Form
SD4  B000----            VLE   se_sth      Store Halfword Short Form
SD4  C000----            VLE   se_lwz      Load Word and Zero Short Form
SD4  D000----            VLE   se_stw      Store Word Short Form
BD8  E000----            VLE   se_bc       Branch Conditional Short Form
BD8  E800----            VLE   se_b[l]     Branch [and Link]

1. See the key to the mode dependency and privilege column below and the key to the category column in Section 1.3.5 of Book I.
2. For 16-bit instructions, the "Opcode" column represents the 16-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits; dashes are used following the opcode to indicate the form is a 16-bit instruction. For 32-bit instructions, the "Opcode" column represents the 32-bit hexadecimal instruction encoding with the opcode, extended opcode, and other fields with fixed values in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode, extended opcode or fixed value bits.

Mode Dependency and Privilege Abbreviations

Except as described below and in Section 1.10.3, "Effective Address Calculation", in Book I, all instructions are independent of whether the processor is in 32-bit or 64-bit mode.

Mode Dep.  Description
CT         If the instruction tests the Count Register, it tests the low-order 32 bits in 32-bit mode and all 64 bits in 64-bit mode.
SR         The setting of status registers (such as XER and CR0) is mode-dependent.
32         The instruction must be executed only in 32-bit mode.
64         The instruction must be executed only in 64-bit mode.

Key to Privilege Column

Priv.  Description
P      Denotes a privileged instruction.
O      Denotes an instruction that is treated as privileged or nonprivileged (or hypervisor, for mtspr), depending on the SPR or PMR number.
M      Denotes an instruction that is treated as privileged or nonprivileged, depending on the value of the UCLE bit of the MSR.
H      Denotes an instruction that can be executed only in hypervisor state.

Appendices: Power ISA Book I-III Appendices

Appendix A. Incompatibilities with the POWER Architecture

This appendix identifies the known incompatibilities that must be managed in the migration from the POWER Architecture to the Power ISA. Some of the incompatibilities can, at least in principle, be detected by the processor, which could trap and let software simulate the POWER operation. Others cannot be detected by the processor even in principle.

In general, the incompatibilities identified here are those that affect a POWER application program. Incompatibilities for instructions that can be used only by POWER operating system programs are not necessarily discussed. Discussion of incompatibilities that pertain only to operating system programs assumes the Server environment (because there is no need for POWER operating system programs to run in the Embedded environment).

A.1 New Instructions, Formerly Privileged Instructions

Instructions new to Power ISA typically use opcode values (including extended opcode) that are illegal in POWER. A few instructions that are privileged in POWER (e.g., dclz, called dcbz in Power ISA) have been made nonprivileged in Power ISA. Any POWER program that executes one of these now-valid or now-nonprivileged instructions, expecting to cause the system illegal instruction error handler or the system privileged instruction error handler to be invoked, will not execute correctly on Power ISA.

A.2 Newly Privileged Instructions

The following instructions are nonprivileged in POWER but privileged in Power ISA.

mfmsr
mfsr

A.3 Reserved Fields in Instructions

These fields are shown with "/"s in the instruction layouts. In both POWER and Power ISA these fields are ignored by the processor. The Power ISA states that these fields must contain zero. The POWER Architecture lacks such a statement, but it is expected that essentially all POWER programs contain zero in these fields.

In several cases the Power ISA assumes that reserved fields in POWER instructions indeed contain zero. The cases include the following.

- bclr[l] and bcctr[l] assume that bits 19:20 in the POWER instructions contain zero.
- cmpi, cmp, cmpli, and cmpl assume that bit 10 in the POWER instructions contains zero.
- mtspr and mfspr assume that bits 16:20 in the POWER instructions contain zero.
- mtcrf and mfcr assume that bit 11 in the POWER instructions contains zero.
- Synchronize assumes that bits 9:10 in the POWER instruction (dcs) contain zero. (This assumption provides compatibility for application programs, but not necessarily for operating system programs; see Section A.22.)
- mtmsr assumes that bit 15 in the POWER instruction contains zero.

A.4 Reserved Bits in Registers

Both POWER and Power ISA permit software to write any value to these bits. However, in POWER reading such a bit always returns 0, while in Power ISA reading it may return either 0 or the value that was last written to it.

A.5 Alignment Check

The POWER MSR AL bit (bit 24) is no longer supported; the corresponding Power ISA MSR bit, bit 56, is reserved. The low-order bits of the EA are always used. (Notice that the value 0 -- the normal value for a reserved bit -- means "ignore the low-order EA bits" in POWER, and the value 1 means "use the low-order EA bits".) POWER-compatible operating system code will probably write the value 1 to this bit.

A.6 Condition Register

The following instructions specify a field in the CR explicitly (via the BF field) and also, in POWER, use bit 31 as the Record bit. In Power ISA, bit 31 is a reserved field for these instructions and is ignored by the processor. In POWER, if bit 31 contains 1 the instructions execute normally (i.e., as if the bit contained 0) except as follows:

cmp     CR0 is undefined if Rc=1 and BF≠0
cmpl    CR0 is undefined if Rc=1 and BF≠0
mcrxr   CR0 is undefined if Rc=1 and BF≠0
fcmpu   CR1 is undefined if Rc=1
fcmpo   CR1 is undefined if Rc=1
mcrfs   CR1 is undefined if Rc=1 and BF≠1

A.7 LK and Rc Bits

For the instructions listed below, if bit 31 (LK or Rc bit in POWER) contains 1, in POWER the instruction executes as if the bit contained 0 except as follows: if LK=1, the Link Register is set (to an undefined value, except for svc); if Rc=1, Condition Register Field 0 or 1 is set to an undefined value. In Power ISA, bit 31 is a reserved field for these instructions and is ignored by the processor.

Power ISA instructions for which bit 31 is the LK bit in POWER:

- sc (svc in POWER)
- the Condition Register Logical instructions
- mcrf
- isync (ics in POWER)

Power ISA instructions for which bit 31 is the Rc bit in POWER:

- fixed-point X-form Load and Store instructions
- fixed-point X-form Compare instructions
- the X-form Trap instruction
- mtspr, mfspr, mtcrf, mcrxr, mfcr, mtocrf, mfocrf
- floating-point X-form Load and Store instructions
- floating-point Compare instructions
- mcrfs
- dcbz (dclz in POWER)

A.8 BO Field

POWER shows certain bits in the BO field -- used by Branch Conditional instructions -- as "x". Although the POWER Architecture does not say how these bits are to be interpreted, they are in fact ignored by the processor.

Power ISA shows these bits as "z", "a", or "t". The "z" bits are ignored, as in POWER. However, the "a" and "t" bits can be used by software to provide a hint about how the branch is likely to behave. If a POWER program has the "wrong" value for these bits, the program will produce the same results as on POWER but performance may be affected.

A.9 BH Field

Bits 19:20 of the Branch Conditional to Link Register and Branch Conditional to Count Register instructions are reserved in POWER but are defined as a branch hint (BH) field in Power ISA. Because these bits are hints, they may affect performance but do not affect the results of executing the instruction.

A.10 Branch Conditional to Count Register

For the case in which the Count Register is decremented and tested (i.e., the case in which BO[2]=0), POWER specifies only that the branch target address is undefined, with the implication that the Count Register, and the Link Register if LK=1, are updated in the normal way. Power ISA specifies that this instruction form is invalid.

A.11 System Call

There are several respects in which Power ISA is incompatible with POWER for System Call instructions -- which in POWER are called Supervisor Call instructions.

- POWER provides a version of the Supervisor Call instruction (bit 30 = 0) that allows instruction fetching to continue at any one of 128 locations. It is used for "fast SVCs". Power ISA provides no such version: if bit 30 of the instruction is 0 the instruction form is invalid.
- POWER provides a version of the Supervisor Call instruction (bits 30:31 = 0b11) that resumes instruction fetching at one location and sets the Link Register to the address of the next instruction. Power ISA provides no such version: bit 31 is a reserved field.
- For POWER, information from the MSR is saved in the Count Register. For Power ISA this information is saved in SRR1.
- In POWER bits 16:19 and 27:29 of the instruction comprise defined instruction fields or a portion thereof, while in Power ISA these bits comprise reserved fields.
- In POWER bits 20:26 of the instruction comprise a portion of the SV field, while in Power ISA these bits comprise the LEV field.
- POWER saves the low-order 16 bits of the instruction in the Count Register. Power ISA does not save them.
- The settings of MSR bits by the associated interrupt differ between POWER and Power ISA; see POWER Processor Architecture and Book III.

A.12 Fixed-Point Exception Register (XER)

Bits 48:55 of the XER are reserved in Power ISA, while in POWER the corresponding bits (16:23) are defined and contain the comparison byte for the lscbx instruction (which Power ISA lacks).

A.13 Update Forms of Storage Access Instructions

Power ISA requires that RA not be equal to either RT (fixed-point Load only) or 0. If the restriction is violated the instruction form is invalid. POWER permits these cases, and simply avoids saving the EA.

A.14 Multiple Register Loads

Power ISA requires that RA, and RB if present in the instruction format, not be in the range of registers to be loaded, while POWER permits this and does not alter RA or RB in this case. (The Power ISA restriction applies even if RA=0, although there is no obvious benefit to the restriction in this case since RA is not used to compute the effective address if RA=0.) If the Power ISA restriction is violated, either the system illegal instruction error handler is invoked or the results are boundedly undefined. The instructions affected are:

lmw (lm in POWER)
lswi (lsi in POWER)
lswx (lsx in POWER)

For example, an lmw instruction that loads all 32 registers is valid in POWER but is an invalid form in Power ISA.

A.15 Load/Store Multiple Instructions

There are two respects in which Power ISA is incompatible with POWER for Load Multiple and Store Multiple instructions.

- If the EA is not word-aligned, in Power ISA either an Alignment exception occurs or the addressed bytes are loaded, while in POWER an Alignment interrupt occurs if MSR[AL]=1 (the low-order two bits of the EA are ignored if MSR[AL]=0).
- In Power ISA the instruction may be interrupted by a system-caused interrupt, while in POWER the instruction cannot be thus interrupted.

A.16 Move Assist Instructions

There are several respects in which Power ISA is incompatible with POWER for Move Assist instructions.

- In Power ISA an lswx instruction with zero length leaves the contents of RT undefined (if RT≠RA and RT≠RB) or is an invalid instruction form (if RT=RA or RT=RB), while in POWER the corresponding instruction (lsx) is a no-op in these cases.
- In Power ISA a Move Assist instruction may be interrupted by a system-caused interrupt, while in POWER the instruction cannot be thus interrupted.

A.17 Move To/From SPR

There are several respects in which Power ISA is incompatible with POWER for Move To/From Special Purpose Register instructions.

- The SPR field is ten bits long in Power ISA, but only five in POWER (see also Section A.3, "Reserved Fields in Instructions").
- mfspr can be used to read the Decrementer in problem state in POWER, but only in privileged state in Power ISA.
- If the SPR value specified in the instruction is not one of the defined values, POWER behaves as follows.
  - If the instruction is executed in problem state and SPR[0]=1, a Privileged Instruction type Program interrupt occurs. No architected registers are altered except those set by the interrupt.
  - Otherwise no architected registers are altered.
  In this same case, Power ISA behaves as follows.
  - If the instruction is executed in problem state, a Hypervisor Emulation Assistance interrupt occurs if spr[0]=0 and a Privileged Instruction type Program interrupt occurs if spr[0]=1. No architected registers are altered except those set by the interrupt.
  - If the instruction is executed in privileged state, a Hypervisor Emulation Assistance interrupt occurs if the SPR value is 0 or, for mfspr only, if the SPR value is 4, 5, or 6. In these cases no architected registers are altered except those set by the interrupt. Otherwise no operation is performed. (See Section 4.4.4, "Move To/From System Register Instructions" in Book III-S.)

A.18 Effects of Exceptions on FPSCR Bits FR and FI

For the following cases, POWER does not specify how FR and FI are set, while Power ISA preserves them for Invalid Operation Exception caused by a Compare instruction, sets FI to 1 and FR to an undefined value for disabled Overflow Exception, and clears them otherwise.

- Invalid Operation Exception (enabled or disabled)
- Zero Divide Exception (enabled or disabled)
- Disabled Overflow Exception

A.19 Store Floating-Point Single Instructions

There are several respects in which Power ISA is incompatible with POWER for Store Floating-Point Single instructions.

- POWER uses FPSCR[UE] to help determine whether denormalization should be done, while Power ISA does not. Using FPSCR[UE] is in fact incorrect: if FPSCR[UE]=1 and a denormalized single-precision number is copied from one storage location to another by means of lfs followed by stfs, the two "copies" may not be the same.
- For an operand having an exponent that is less than 874 (unbiased exponent less than -149), POWER stores a zero (if FPSCR[UE]=0) while Power ISA stores an undefined value.

A.20 Move From FPSCR

POWER defines the high-order 32 bits of the result of mffs to be 0xFFFF_FFFF, while Power ISA copies the high-order 32 bits of the FPSCR.

A.21 Zeroing Bytes in the Data Cache

The dclz instruction of POWER and the dcbz instruction of Power ISA have the same opcode. However, the functions differ in the following respects.

- dclz clears a line while dcbz clears a block.
- dclz saves the EA in RA (if RA≠0) while dcbz does not.
- dclz is privileged while dcbz is not.

A.22 Synchronization

The Synchronize instruction (called dcs in POWER) and the isync instruction (called ics in POWER) cause more pervasive synchronization in Power ISA than in POWER. However, unlike dcs, Synchronize does not wait until data cache block writes caused by preceding instructions have been performed in main storage. Also, Synchronize has an L field while dcs does not, and some uses of the instruction by the operating system require L=2. (The L field corresponds to reserved bits in dcs and hence is expected to be zero in POWER programs; see Section A.3.)

A.23 Move To Machine State Register Instruction

The mtmsr instruction has an L field in Power ISA but not in POWER. The function of the variant of mtmsr with L=1 differs from the function of the instruction in the POWER architecture in the following ways.

- In Power ISA, this variant of mtmsr modifies only the EE and RI bits of the MSR, while in POWER mtmsr modifies all bits of the MSR.
- This variant of mtmsr is execution synchronizing in Power ISA but is context synchronizing in POWER. (The POWER architecture lacks Power ISA's distinction between execution synchronization and context synchronization. The statement in the POWER architecture specification that mtmsr is "synchronizing" is equivalent to stating that the instruction is context synchronizing.)

Also, mtmsr is optional in Power ISA but required in POWER.

A.24 Direct-Store Segments

POWER's direct-store segments are not supported in Power ISA.

A.25 Segment Register Manipulation Instructions

The definitions of the four Segment Register Manipulation instructions mtsr, mtsrin, mfsr, and mfsrin differ in two respects between POWER and Power ISA. Instructions similar to mtsrin and mfsrin are called mtsri and mfsri in POWER.

- privilege: mfsr and mfsri are problem state instructions in POWER, while mfsr and mfsrin are privileged in Power ISA.
- function: the "indirect" instructions (mtsri and mfsri) in POWER use an RA register in computing the Segment Register number, and the computed EA is stored into RA (if RA≠0 and RA≠RT), while in Power ISA mtsrin and mfsrin have no RA field and the EA is not stored.

mtsr, mtsrin (mtsri), and mfsr have the same opcodes in Power ISA as in POWER. mfsri (POWER) and mfsrin (Power ISA) have different opcodes.

Also, the Segment Register Manipulation instructions are required in POWER whereas they are optional in Power ISA.

A.26 TLB Entry Invalidation

The tlbi instruction of POWER and the tlbie instruction of Power ISA have the same opcode. However, the functions differ in the following respects.

- tlbi computes the EA as (RA|0) + (RB), while tlbie lacks an RA field and computes the EA and related information as (RB).
- tlbi saves the EA in RA (if RA≠0), while tlbie lacks an RA field and does not save the EA.
- For tlbi the high-order 36 bits of RB are used in computing the EA, while for tlbie these bits contain additional information that is not directly related to the EA.
- tlbi has no RS operand, while for tlbie (RS) is an LPID value used to qualify the TLB invalidation.

Also, tlbi is required in POWER whereas tlbie is optional in Power ISA.

A.27 Alignment Interrupts

Placing information about the interrupting instruction into the DSISR and the DAR when an Alignment interrupt occurs is optional in Power ISA but required in POWER.

A.28 Floating-Point Interrupts

POWER uses MSR bit 20 to control the generation of interrupts for floating-point enabled exceptions, and Power ISA uses the corresponding MSR bit, bit 52, for the same purpose. However, in Power ISA this bit is part of a two-bit value that controls the occurrence, precision, and recoverability of the interrupt, while in POWER this bit is used independently to control the occurrence of the interrupt (in POWER all floating-point interrupts are precise).

A.29 Timing Facilities

A.29.1 Real-Time Clock

The POWER Real-Time Clock is not supported in Power ISA. Instead, Power ISA provides a Time Base. Both the RTC and the TB are 64-bit Special Purpose Registers, but they differ in the following respects.

- The RTC counts seconds and nanoseconds, while the TB counts "ticks". The ticking rate of the TB is implementation-dependent.
- The RTC increments discontinuously: 1 is added to RTCU when the value in RTCL passes 999_999_999. The TB increments continuously: 1 is added to TBU when the value in TBL passes 0xFFFF_FFFF.
- The RTC is written and read by the mtspr and mfspr instructions, using SPR numbers that denote the RTCU and RTCL. The TB is written and read by the same instructions using different SPR numbers.
- The SPR numbers that denote POWER's RTCL and RTCU are invalid in Power ISA.
- The RTC is guaranteed to increment at least once in the time required to execute ten Add Immediate instructions. No analogous guarantee is made for the TB.
- Not all bits of RTCL need be implemented, while all bits of the TB must be implemented.

A.29.2 Decrementer

The Power ISA Decrementer differs from the POWER Decrementer in the following respects.

- The Power ISA DEC decrements at the same rate that the TB increments, while the POWER DEC decrements every nanosecond (which is the same rate that the RTC increments).
- Not all bits of the POWER DEC need be implemented, while all bits of the Power ISA DEC must be implemented.
- The interrupt caused by the DEC has its own interrupt vector location in Power ISA, but is considered an External interrupt in POWER.

A.30 Deleted Instructions

The following instructions are part of the POWER Architecture but have been dropped from the Power ISA.

abs      Absolute
clcs     Cache Line Compute Size
clf      Cache Line Flush
cli (*)  Cache Line Invalidate
dclst    Data Cache Line Store
div      Divide
divs     Divide Short
doz      Difference Or Zero
dozi     Difference Or Zero Immediate
lscbx    Load String And Compare Byte Indexed
maskg    Mask Generate
maskir   Mask Insert From Register
mfsri    Move From Segment Register Indirect
mul      Multiply
nabs     Negative Absolute
rac (*)  Real Address Compute
rfi (*)  Return From Interrupt
rfsvc    Return From SVC
rlmi     Rotate Left Then Mask Insert
rrib     Rotate Right And Insert Bit
sle      Shift Left Extended
sleq     Shift Left Extended With MQ
sliq     Shift Left Immediate With MQ
slliq    Shift Left Long Immediate With MQ
sllq     Shift Left Long With MQ
slq      Shift Left With MQ
sraiq    Shift Right Algebraic Immediate With MQ
sraq     Shift Right Algebraic With MQ
sre      Shift Right Extended
srea     Shift Right Extended Algebraic
sreq     Shift Right Extended With MQ
sriq     Shift Right Immediate With MQ
srliq    Shift Right Long Immediate With MQ
srlq     Shift Right Long With MQ
srq      Shift Right With MQ

(*) This instruction is privileged.

Note: Many of these instructions use the MQ register. The MQ is not defined in the Power ISA.

A.31 Discontinued Opcodes

The opcodes listed below are defined in the POWER Architecture but have been dropped from the Power ISA. The list contains the POWER mnemonic (MNEM), the primary opcode (PRI), and the extended opcode (XOP) if appropriate. The corresponding instructions are reserved in Power ISA.

MNEM     PRI  XOP
abs      31   360
clcs     31   531
clf      31   118
cli (*)  31   502
dclst    31   630
div      31   331
divs     31   363
doz      31   264
dozi     09   -
lscbx    31   277
maskg    31   29
maskir   31   541
mfsri    31   627
mul      31   107
nabs     31   488
rac (*)  31   818
rfi (*)  19   50
rfsvc    19   82
rlmi     22   -
rrib     31   537
sle      31   153
sleq     31   217
sliq     31   184
slliq    31   248
sllq     31   216
slq      31   152
sraiq    31   952
sraq     31   920
sre      31   665
srea     31   921
sreq     31   729
sriq     31   696
srliq    31   760
srlq     31   728
srq      31   664

(*) This instruction is privileged.

Assembler Note: It might be helpful to current software writers for the Assembler to flag the discontinued POWER instructions.

A.32 POWER2 Compatibility

The POWER2 instruction set is a superset of the POWER instruction set. Some of the instructions added for POWER2 are included in the Power ISA. Those that have been renamed in the Power ISA are listed in this section, as are the new POWER2 instructions that are not included in the Power ISA.

Other incompatibilities are also listed.

A.32.1 Cross-Reference for Changed POWER2 Mnemonics

The following table lists the new POWER2 instruction mnemonics that have been changed in the Power ISA User Instruction Set Architecture, sorted by POWER2 mnemonic.

To determine the Power ISA mnemonic for one of these POWER2 mnemonics, find the POWER2 mnemonic in the second column of the table: the remainder of the line gives the Power ISA mnemonic and the page on which the instruction is described, as well as the instruction names.

POWER2 mnemonics that have not changed are not listed.

Page  POWER2 Mnemonic  POWER2 Instruction                                     Power ISA Mnemonic  Power ISA Instruction
142   fcir[.]          Floating Convert Double to Integer with Round          fctiw[.]            Floating Convert To Integer Word
143   fcirz[.]         Floating Convert Double to Integer with Round to Zero  fctiwz[.]           Floating Convert To Integer Word with round toward Zero

A.32.2 Load/Store Floating-Point Double

Several of the opcodes for the Load/Store Floating-Point Quad instructions of the POWER2 architecture have been reclaimed by the Load/Store Floating-Point Double [Indexed] instructions (entries with a '-' in the Power ISA column have not been reclaimed):

POWER2 mnemonic  Power ISA mnemonic  PRI  XOP
lfq              lq                  56   -
lfqu             lfdp                57   0
lfqux            -                   31   823
lfqx             lfdpx               31   791
stfq             -                   60   -
stfqu            stfdp               61   -
stfqux           -                   31   951
stfqx            stfdpx              31   919

Differences between the l/stfdp[x] instructions and the POWER2 l/stfq[u][x] instructions include the following.

- The storage operand for the l/stfdp[x] instructions must be quadword aligned for optimal performance.
- The register pairs for the l/stfdp[x] instructions must be even-odd pairs, instead of any consecutive pair.
- The l/stfdp[x] instructions do not have update forms.

A.32.3 Floating-Point Conversion to Integer

The fcir and fcirz instructions of POWER2 have the same opcodes as do the fctiw and fctiwz instructions, respectively, of Power ISA. However, the functions differ in the following respects.

- fcir and fcirz set the high-order 32 bits of the target FPR to 0xFFFF_FFFF, while fctiw and fctiwz set them to an undefined value.
- Except for enabled Invalid Operation Exceptions, fcir and fcirz set the FPRF field of the FPSCR based on the result, while fctiw and fctiwz set it to an undefined value.
- fcir and fcirz do not affect the VXSNAN bit of the FPSCR, while fctiw and fctiwz do.
- fcir and fcirz set FPSCR[XX] to 1 for certain cases of "Large Operands" (i.e., operands that are too large to be represented as a 32-bit signed fixed-point integer), while fctiw and fctiwz do not alter it for any case of "Large Operand". (The IEEE standard requires not altering it for "Large Operands".)
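The difference in the high-order 32 bits of the target FPR (0xFFFF_FFFF for fcir and fcirz, undefined for fctiw and fctiwz) matters to software that reads the converted integer back from the 64-bit register image. The following is a minimal Python sketch, not part of the architecture: the helper name low_word_as_int32 is hypothetical, and it assumes the FPR contents are available as an unsigned 64-bit integer (for example after the register has been stored to memory). It uses only the low-order word, so the result does not depend on which architecture produced the high-order word.

```python
def low_word_as_int32(fpr_image):
    """Return the low-order 32 bits of a 64-bit FPR image as a signed integer.

    Hypothetical helper for illustration. The high-order 32 bits are ignored,
    because fctiw/fctiwz leave them undefined while fcir/fcirz set them to
    0xFFFF_FFFF.
    """
    low = fpr_image & 0xFFFF_FFFF  # discard the high-order word
    # Sign-extend from the most significant bit of the low-order word.
    return low - (1 << 32) if low & 0x8000_0000 else low


# Same low-order word, different high-order words.
power2_style = 0xFFFF_FFFF_0000_002A    # high word as fcir would set it
arbitrary_high = 0x1234_5678_0000_002A  # stands in for an undefined high word
print(low_word_as_int32(power2_style), low_word_as_int32(arbitrary_high))
```

Code migrated from POWER2 that compares the full 64-bit image against a constant with 0xFFFF_FFFF in the high-order word would need a change of this kind.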
A.32.4 Floating-Point Interrupts

POWER2 uses MSR bits 20 and 23 to control the generation of interrupts for floating-point enabled exceptions, and Power ISA uses the corresponding MSR bits, bits 52 and 55, for the same purpose. However, in Power ISA these bits comprise a two-bit value that controls the occurrence, precision, and recoverability of the interrupt, while in POWER2 these bits are used independently to control the occurrence (bit 20) and the precision (bit 23) of the interrupt. Moreover, in Power ISA all floating-point interrupts are considered Program interrupts, while in POWER2 imprecise floating-point interrupts have their own interrupt vector location.

A.32.5 Trace

The Trace interrupt vector location differs between the two architectures, and there are many other differences.

A.33 Deleted Instructions

The following instructions are new in POWER2 implementations of the POWER Architecture but have been dropped from the Power ISA.

lfq      Load Floating-Point Quad
lfqu     Load Floating-Point Quad with Update
lfqux    Load Floating-Point Quad with Update Indexed
lfqx     Load Floating-Point Quad Indexed
stfq     Store Floating-Point Quad
stfqu    Store Floating-Point Quad with Update
stfqux   Store Floating-Point Quad with Update Indexed
stfqx    Store Floating-Point Quad Indexed

A.33.1 Discontinued Opcodes

The opcodes listed below are new in POWER2 implementations of the POWER Architecture but have been dropped from the Power ISA. The list contains the POWER2 mnemonic (MNEM), the primary opcode (PRI), and the extended opcode (XOP) if appropriate. The instructions are either illegal or reserved in Power ISA; see Appendix D.

MNEM    PRI  XOP
lfq     56   -
lfqx    31   791
stfqx   31   919

Appendix B. Platform Support Requirements

As described in Chapter 1 of Book I, the architecture is structured as a collection of categories. Each category consists of facilities and/or instructions that together provide a unit of functionality. The Server and Embedded categories are referred to as "special" because all implementations must support at least one of these categories. Each special category, when taken together with the Base category, is referred to as an "environment", and provides the minimum functionality required to develop operating systems and applications.

Every processor implementation supports at least one of the environments, and may also support a set of categories chosen based on the target market for the implementation. However, a Server implementation supports only those categories designated as part of the Server platform in Figure 21. To facilitate the development of operating systems and applications for a well-defined purpose or customer set, usually embodied in a unique hardware platform, this appendix documents the association between a platform and the set of categories it requires.

Adding a new platform may permit cost-performance optimization by clearly identifying a unique set of categories. However, this has the potential to fragment the application base. As a result, new platforms will be added only when the optimization benefit clearly outweighs the loss due to fragmentation.

The platform support requirements are documented in Figure 21. An "x" in a column indicates that the category is required. A "+" in a column indicates that the requirement is being phased in.
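The "x" and "+" convention can be read as a simple lookup. The following is a minimal Python sketch under stated assumptions: only a handful of rows of Figure 21 are transcribed, their marks are assumed to belong to the Server Platform column, and the names SERVER_PLATFORM_EXCERPT and missing_required are hypothetical. It reports which required ("x") categories an implementation's category set lacks; categories marked "+" are not treated as required because the requirement is still being phased in.

```python
# Excerpt of Figure 21 (assumption: a few rows only, Server Platform column).
# "x" = required, "+" = requirement being phased in.
SERVER_PLATFORM_EXCERPT = {
    "Base": "x",
    "Server": "x",
    "Floating-Point": "x",
    "64-Bit": "x",
    "Vector": "+",
}


def missing_required(supported, platform):
    """Return the sorted categories marked "x" that are not in `supported`."""
    return sorted(
        category
        for category, mark in platform.items()
        if mark == "x" and category not in supported
    )


# A hypothetical implementation that lacks the 64-Bit category.
implementation = {"Base", "Server", "Floating-Point"}
print(missing_required(implementation, SERVER_PLATFORM_EXCERPT))
```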
Category                                      Server Platform  Embedded Platform
Base                                          x                x
Server                                        x
Embedded                                                       x
Alternate Time Base
Cache Specification
Decimal Floating-Point                        x
Embedded.Cache Debug
Embedded.Cache Initialization
Embedded.Device Control
Embedded.Enhanced Debug
Embedded.External PID
Embedded.Hypervisor
Embedded.Hypervisor.LRAT
Embedded.Little-Endian
Embedded.Page Table
Embedded.Performance Monitor
Embedded.Processor Control
Embedded Cache Locking
Embedded Multi-Threading
Embedded Multi-Threading.Thread Management
Embedded.TLB Write Conditional
External Control
External Proxy
Floating-Point                                x
Floating-Point.Record                         x
Legacy Move Assist
Legacy Integer Multiply-Accumulate
Load/Store Quadword                           x(2)
Memory Coherence                              x
Move Assist                                   x
Processor Compatibility
Server.Performance Monitor                    x
Signal Processing Engine
SPE.Embedded Float Scalar Double
SPE.Embedded Float Scalar Single
SPE.Embedded Float Vector
Store Conditional Page Mobility               x
Stream                                        x
Strong Access Order                           x
Trace                                         x
Variable Length Encoding
Vector                                        +
Vector.Little-Endian                          +(1)
Wait
64-Bit                                        x

1. If the Vector category is supported, Vector.Little-Endian is required on the Server platform.
2. Optional for the Server Platform.

Figure 21. Platform Support Requirements

Appendix C. Complete SPR List

This appendix lists all the Special Purpose Registers in the Power ISA, ordered by SPR number.
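The list gives each SPR number both in decimal and as two 5-bit halves, spr[5:9] and spr[0:4]. The following is a minimal Python sketch of that relationship, with two assumptions that are not taken from this appendix: the function names are hypothetical, and the placement of the RT and SPR fields in the mfspr encoding follows the XFX-form layout described in Book I (RT in instruction bits 6:10, the SPR field in bits 11:20 with the low-order half of the SPR number first). The constant 0x7C0002A6 is the mfspr opcode pattern given in the opcode tables.

```python
MFSPR_PATTERN = 0x7C0002A6  # mfspr row of the opcode tables


def split_spr(spr):
    """Return (spr[5:9], spr[0:4]) as 5-bit binary strings, as in the SPR list."""
    if not 0 <= spr < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    high, low = spr >> 5, spr & 0x1F  # decimal = high * 32 + low
    return format(high, "05b"), format(low, "05b")


def encode_mfspr(rt, spr):
    """Encode `mfspr RT,SPR` (assumed XFX layout; the two SPR halves are swapped)."""
    if not 0 <= rt < 32:
        raise ValueError("RT must be a register number 0..31")
    if not 0 <= spr < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    high, low = spr >> 5, spr & 0x1F
    spr_field = (low << 5) | high  # low-order half occupies the leftmost 5 bits
    return MFSPR_PATTERN | (rt << 21) | (spr_field << 11)


print(split_spr(268))               # the TB row of the list
print(hex(encode_mfspr(3, 8)))      # reading the LR (SPR 8) into register 3
```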
          SPR^1           Register     Privileged            Length  Cat^2
decimal   spr5:9  spr0:4  Name         mtspr      mfspr      (bits)
1         00000   00001   XER          no         no         64      B
8         00000   01000   LR           no         no         64      B
9         00000   01001   CTR          no         no         64      B
13        00000   01101   AMR          no^9       no         64      S
17        00000   10001   DSCR         yes        yes        64      STM
18        00000   10010   DSISR        yes        yes        32      S
19        00000   10011   DAR          yes        yes        64      S
22        00000   10110   DEC          yes^13     yes^13     32      B
25        00000   11001   SDR1         hypv^3     hypv^3     64      S
26        00000   11010   SRR0         yes^13     yes^13     64      B
27        00000   11011   SRR1         yes^13     yes^13     64      B
28        00000   11100   CFAR         yes        yes        64      S
29        00000   11101   AMR          yes^9      yes        64      S
48        00001   10000   PID          yes        yes        32      E
54        00001   10110   DECAR        hypv^12    hypv^12    32      E
55        00001   10111   MCIVPR       hypv^9     hypv^9     64      E
56        00001   11000   LPER         hypv^12    hypv^9     64      E.HV; E.PT
57        00001   11001   LPERU        hypv^9     hypv^9     32      E.HV; E.PT
58        00001   11010   CSRR0        hypv^12    hypv^12    64      E
59        00001   11011   CSRR1        hypv^12    hypv^12    32      E
61        00001   11101   DEAR         yes^13     yes^13     64      E
62        00001   11110   ESR          yes^13     yes^13     32      E
63        00001   11111   IVPR         hypv^12    hypv^12    64      E
136       00100   01000   CTRL         -          no         32      S
152       00100   11000   CTRL         yes        -          32      S
157       00100   11101   UAMOR        yes^10     yes        64      S
256       01000   00000   VRSAVE       no         no         32      B
259       01000   00011   SPRG3        -          no         64      B
260-263   01000   001xx   SPRG[4-7]    -          no         64      E
268       01000   01100   TB           -          no         64      B
269       01000   01101   TBU          -          no         32      B
272-275   01000   100xx   SPRG[0-3]    yes^13     yes^13     64      B
276-279   01000   101xx   SPRG[4-7]    yes        yes        64      E
282       01000   11010   EAR          hypv^4     hypv^4     32      EC
284       01000   11100   TBL          hypv^4     -          32      B
285       01000   11101   TBU          hypv^4     -          32      B
286       01000   11110   TBU40        hypv       -          64      S
286       01000   11110   PIR          hypv^12    yes^13     32      E
287       01000   11111   PVR          -          yes        32      B
304       01001   10000   HSPRG0       hypv^3     hypv^3     64      S
304       01001   10000   DBSR         hypv^5,12  hypv^9     32      E
305       01001   10001   HSPRG1       hypv^3     hypv^3     64      S
306       01001   10010   HDSISR       hypv^3     hypv^3     32      S
306       01001   10010   DBSRWR       hypv^3     -          32      E.HV
307       01001   10011   HDAR         hypv^3     hypv^3     64      S
307       01001   10011   EPCR         hypv^3     hypv^3     32      E.HV, (E;64)
308       01001   10100   DBCR0        yes        yes        32      E
308       01001   10100   DBCR0        hypv^12    hypv^9     32      E
309       01001   10101   PURR         hypv^3     yes        64      S
309       01001   10101   DBCR1        hypv^12    hypv^9     32      E
310       01001   10110   HDEC         hypv^3     hypv^3     32      S
310       01001   10110   DBCR2        hypv^12    hypv^9     32      E
311       01001   10111   MSRP         hypv^3     hypv^3     32      E.HV
312       01001   11000   RMOR         hypv^3     hypv^3     64      S
312       01001   11000   IAC1         hypv^12    hypv^9     64      E
313       01001   11001   HRMOR        hypv^3     hypv^3     64      S
313       01001   11001   IAC2         hypv^12    hypv^9     64      E
314       01001   11010   HSRR0        hypv^3     hypv^3     64      S
314       01001   11010   IAC3         hypv^12    hypv^9     64      E
315       01001   11011   HSRR1        hypv^3     hypv^3     64      S
315       01001   11011   IAC4         hypv^12    hypv^9     64      E
316       01001   11100   DAC1         hypv^12    hypv^9     64      E
317       01001   11101   DAC2         hypv^12    hypv^9     64      E
318       01001   11110   LPCR         hypv^3     hypv^3     64      S
318       01001   11110   DVC1         hypv^12    hypv^9     64      E
319       01001   11111   LPIDR        hypv^3     hypv^3     32      S
319       01001   11111   DVC2         hypv^12    hypv^9     64      E
336       01010   10000   TSR          hypv^5,12  hypv^12    32      E
336       01010   10000   HMER         hypv^3,8   hypv^3     64      S
337       01010   10001   HMEER        hypv^3     hypv^3     64      S
338       01010   10010   PCR          hypv^3     hypv^3     64      S
338       01010   10010   LPIDR        hypv^3     hypv^3     32      E.HV
339       01010   10011   HEIR         hypv^3     hypv^3     32      S
339       01010   10011   MAS5         hypv^3     hypv^3     32      E.HV
340       01010   10100   TCR          hypv^12    hypv^9     32      E
341       01010   10101   MAS8         hypv^3     hypv^3     32      E.HV
342       01010   10110   LRATCFG      -          hypv^3     32      E.HV.LRAT
343       01010   10111   LRATPS       -          hypv^3     32      E.HV.LRAT
344-347   01010   110xx   TLB[0-3]PS   -          hypv^3     32      E.HV
348       01010   11100   MAS5||MAS6   hypv^3     hypv^3     64      E.HV; 64
349       01010   11101   MAS8||MAS1   hypv^3     hypv^3     64      E.HV; 64
350       01010   11110   EPTCFG       hypv^9     hypv^9     32      E.PT
368-371   01011   100xx   GSPRG0-3     yes        yes        64      E.HV
372       01011   10100   MAS7||MAS3   yes        yes        64      E; 64
373       01011   10101   MAS0||MAS1   yes        yes        64      E; 64
378       01011   11010   GSRR0        yes        yes        64      E.HV
379       01011   11011   GSRR1        yes        yes        32      E.HV
380       01011   11100   GEPR         yes        yes        32      E.HV;EXP
381       01011   11101   GDEAR        yes        yes        64      E.HV
382       01011   11110   GPIR         hypv^3     yes        32      E.HV
383       01011   11111   GESR         yes        yes        32      E.HV
400-415   01100   1xxxx   IVOR[0-15]   hypv^12    hypv^9     32      E
432-435   01101   100xx   IVOR38-41    hypv^9     hypv^9     32      E.HV
436       01101   10100   IVOR42       hypv^12    hypv^12    32      E.HV.LRAT
437       01101   10101   TENSR        -          hypv^12    64      E.MT
438       01101   10110   TENS         hypv^12    hypv^12    64      E.MT
439       01101   10111   TENC         hypv^12    hypv^12    64      E.MT
440-441   01101   1100x   GIVOR2-3     hypv^3     yes        32      E.HV
442       01101   11010   GIVOR4       hypv^3     yes        32      E.HV
443       01101   11011   GIVOR8       hypv^3     yes        32      E.HV
444       01101   11100   GIVOR13      hypv^3     yes        32      E.HV
445       01101   11101   GIVOR14      hypv^3     yes        32      E.HV
446       01101   11110   TIR          -          hypv^12    64      E.MT
447       01101   11111   GIVPR        hypv^3     yes        64      E.HV
512       10000   00000   SPEFSCR      no         no         32      SP
526       10000   01110   ATB/ATBL     -          no         64      ATB
527       10000   01111   ATBU         -          no         32      ATB
528       10000   10000   IVOR32       hypv^12    hypv^9     32      SP
529       10000   10001   IVOR33       hypv^12    hypv^9     32      SP
530       10000   10010   IVOR34       hypv^12    hypv^9     32      SP
531       10000   10011   IVOR35       hypv^12    hypv^9     32      E.PM
532       10000   10100   IVOR36       hypv^12    hypv^9     32      E.PC
533       10000   10101   IVOR37       hypv^12    hypv^9     32      E.PC
570       10001   11010   MCSRR0       hypv^12    hypv^9     64      E
571       10001   11011   MCSRR1       hypv^12    hypv^9     32      E
572       10001   11100   MCSR         hypv^12    hypv^9     64      E
574       10001   11110   DSRR0        yes        yes        64      E.ED
575       10001   11111   DSRR1        yes        yes        32      E.ED
604       10010   11100   SPRG8        hypv^12    hypv^9     64      E
605       10010   11101   SPRG9        yes        yes        64      E.ED
624       10011   10000   MAS0         yes        yes        32      E
625       10011   10001   MAS1         yes        yes        32      E
626       10011   10010   MAS2         yes        yes        64      E
627       10011   10011   MAS3         yes        yes        32      E
628       10011   10100   MAS4         yes        yes        32      E
630       10011   10110   MAS6         yes        yes        32      E
688-691   10101   100xx   TLB[0-3]CFG  -          hypv^9     32      E
702       10101   11110   EPR          -          yes^13     32      EXP
768-783   11000   0xxxx   perf_mon     -          no         64      S.PM
784-799   11000   1xxxx   perf_mon     yes        yes        64      S.PM
896       11100   00000   PPR          no         no         64      S
898       11100   00010   PPR32        no         no         32      B^11
924       11100   11100   DCDBTRL      -^6        hypv^12    32      E.CD
925       11100   11101   DCDBTRH      -^6        hypv^12    32      E.CD
926       11100   11110   ICDBTRL      -^7        hypv^12    32      E.CD
927       11100   11111   ICDBTRH      -^7        hypv^12    32      E.CD
944       11101   10000   MAS7         yes        yes        32      E
947       11101   10011   EPLC         yes        yes        32      E.PD
948       11101   10100   EPSC         yes        yes        32      E.PD
979       11110   10011   ICDBDR       -^7        hypv^12    32      E.CD
1012      11111   10100   MMUCSR0      hypv^12    hypv^12    32      E
1013      11111   10101   DABR         hypv^3     hypv^3     64      S
1015      11111   10111   DABRX        hypv^3     hypv^3     64      S
1015      11111   10111   MMUCFG       -          hypv^12    32      E
1023      11111   11111   PIR          -          yes        32      S

-  This register is not defined for this instruction.

1  Note that the order of the two 5-bit halves of the SPR number is reversed.

2  See Section 1.3.5 of Book I. If multiple categories are listed separated by a semicolon, all the listed categories must be implemented in order for the other columns of the line to apply. A comma separates two alternatives, and takes precedence over a semicolon; e.g., the EPCR (E.HV,E;64) must be implemented if either (a) category E.HV is implemented or (b) the processor is an Embedded processor that implements the 64-Bit category.

3  This register is a hypervisor resource, and can be accessed by this instruction only in hypervisor state (see Chapter 2 of Book III-S or Chapter 2 of Book III-E as appropriate).

4  This register is a hypervisor resource, and can be accessed by this instruction only in hypervisor state (see Chapter 2 of Book III-S). If the Embedded.Hypervisor category is supported, this register is a hypervisor resource, and can be accessed by this instruction only in hypervisor state (see Chapter 2 of Book III-E). Otherwise the register is privileged.

5  This register cannot be directly written. Instead, bits in the register corresponding to 1 bits in (RS) can be cleared using mtspr SPR,RS.
6  The register can be written by the dcread instruction.

7  The register can be written by the icread instruction.

8  This register cannot be directly written. Instead, bits in the register corresponding to 0 bits in (RS) can be cleared using mtspr SPR,RS.

9  The value specified in register RS may be masked by the contents of the [U]AMOR before being placed into the AMR; see the mtspr instruction description in Book III-S.

10 The value specified in register RS may be ANDed with the contents of the AMOR before being placed into the UAMOR; see the mtspr instruction description in Book III-S.

11 The register is Category: Phased-in.

12 If the Embedded.Hypervisor category is supported, this register is a hypervisor resource, and can be accessed by this instruction only in hypervisor state (see Chapter 2 of Book III-E). Otherwise the register is privileged for Embedded.

13 If the Embedded.Hypervisor category is supported, this register is a hypervisor resource and can be accessed by this instruction only in hypervisor state, and guest references to the register are redirected to the corresponding guest register (see Chapter 2 of Book III-E). Otherwise the register is privileged.

All SPR numbers that are not shown above and are not implementation-specific are reserved.

Figure 22. SPR Numbers

Appendix D. Illegal Instructions

With the exception of the instruction consisting entirely of binary 0s, the instructions in this class are available for future extensions of the Power ISA; that is, some future version of the Power ISA may define any of these instructions to perform new functions.

The following primary opcodes are illegal.

    1, 5, 6

The following primary opcodes have unused extended opcodes. Their unused extended opcodes can be determined from the opcode maps in Appendix F of Book Appendices. All unused extended opcodes are illegal.

    4, 19, 30, 31, 56, 57, 58, 59, 60, 62, 63

An instruction consisting entirely of binary 0s is illegal, and is guaranteed to be illegal in all future versions of this architecture.

Appendix E. Reserved Instructions

The instructions in this class are allocated to specific purposes that are outside the scope of the Power ISA.

The following types of instruction are included in this class.

1. The instruction having primary opcode 0, except the instruction consisting entirely of binary 0s (which is an illegal instruction; see Section 1.7.2, "Illegal Instruction Class" on page 22) and the extended opcode shown below.

       256  Service Processor "Attention"

2. Instructions for the POWER Architecture that have not been included in the Power ISA. These are listed in Section A.31, "Discontinued Opcodes" and Section A.33.1, "Discontinued Opcodes".

3. Implementation-specific instructions used to conform to the Power ISA specification.

4. Any other implementation-dependent instructions that are not defined in the Power ISA.

Appendix F. Opcode Maps

This appendix contains tables showing the opcodes and extended opcodes.

For the primary opcode table (Table 3 on page 1235), each cell contains the following fields: the opcode in decimal, the opcode in hexadecimal, the instruction mnemonic, the category, and the instruction format.

The category abbreviations are shown in Section 1.3.5 of Book I. However, the categories "Phased-In", "Phased-Out", and floating-point "Record" are not listed in the opcode tables.

The extended opcode tables show the extended opcode in decimal, the instruction mnemonic, the category, and the instruction format. These tables appear in order of primary opcode within three groups. The first group consists of the primary opcodes that have small extended opcode fields (2-4 bits), namely 30, 58, and 62. The second group consists of primary opcodes that have 11-bit extended opcode fields. The third group consists of primary opcodes that have 10-bit extended opcode fields. The tables for the second and third groups are rotated.

In the extended opcode tables several special markings are used.

A prime (') following an instruction mnemonic denotes an additional cell, after the lowest-numbered one, used by the instruction. For example, subfc occupies cells 8 and 520 of primary opcode 31, with the former corresponding to OE=0 and the latter to OE=1. Similarly, sradi occupies cells 826 and 827, with the former corresponding to sh5=0 and the latter to sh5=1 (the 9-bit extended opcode 413, shown on page 96, excludes the sh5 bit).

Two vertical bars (||) are used instead of primed mnemonics when an instruction occupies an entire column of a table. The instruction mnemonic is repeated in the last cell of the column.

For primary opcode 31, an asterisk (*) in a cell that would otherwise be empty means that the cell is reserved because it is "overlaid", by a fixed-point or Storage Access instruction having only a primary opcode, by an instruction having an extended opcode in primary opcode 30, 58, or 62, or by a potential instruction in any of the categories just mentioned. The overlaying instruction, if any, is also shown. A cell thus reserved should not be assigned to an instruction having primary opcode 31. (The overlaying is a consequence of opcode decoding for fixed-point instructions: the primary opcode, and the extended opcode if any, are mapped internally to a 10-bit "compressed opcode" for ease of subsequent decoding on some implementations that complied with previous versions of the architecture.)

Parentheses around the opcode or extended opcode mean that the instruction was defined in earlier versions of the Power ISA but is no longer defined in the Power ISA.

Curly brackets around the opcode or extended opcode mean that the instruction will be defined in future versions of the Power ISA.

"long" is used as filler for mnemonics that are longer than a table cell.

An empty cell, a cell containing only an asterisk, or a cell in which the opcode or extended opcode is parenthesized, corresponds to an illegal instruction.

The instruction consisting entirely of binary 0s causes the system illegal instruction error handler to be invoked for all members of the POWER family, and this is likely to remain true in future models (it is guaranteed in the Power ISA). An instruction having primary opcode 0 but not consisting entirely of binary 0s is reserved except for the following extended opcode (instruction bits 21:30).

    256  Service Processor "Attention" (Power ISA only)

Table 3: Primary opcodes

Dec  Hex  Mnemonic  Cat         Form   Instruction
0    00                                Illegal, Reserved (see primary opcode 0 extensions on page 1233)
1    01
2    02   tdi       64          D      Trap Doubleword Immediate
3    03   twi       B           D      Trap Word Immediate
4    04             V, LMA, SP         Vector, LMA, SP (see Table 8 and Table 9)
5    05
6    06
7    07   mulli     B           D      Multiply Low Immediate
8    08   subfic    B           D      Subtract From Immediate Carrying
9    09
10   0A   cmpli     B           D      Compare Logical Immediate
11   0B   cmpi      B           D      Compare Immediate
12   0C   addic     B           D      Add Immediate Carrying
13   0D   addic.    B           D      Add Immediate Carrying and Record
14   0E   addi      B           D      Add Immediate
15   0F   addis     B           D      Add Immediate Shifted
16   10   bc        B           B      Branch Conditional
17   11   sc        B           SC     System Call
18   12   b         B           I      Branch
19   13                         XL     CR ops, etc. (see Table 10)
20   14   rlwimi    B           M      Rotate Left Word Imm. then Mask Insert
21   15   rlwinm    B           M      Rotate Left Word Imm. then AND with Mask
22   16
23   17   rlwnm     B           M      Rotate Left Word then AND with Mask
24   18   ori       B           D      OR Immediate
25   19   oris      B           D      OR Immediate Shifted
26   1A   xori      B           D      XOR Immediate
27   1B   xoris     B           D      XOR Immediate Shifted
28   1C   andi.     B           D      AND Immediate
29   1D   andis.    B           D      AND Immediate Shifted
30   1E                         MD[S]  FX Dwd Rot (see Table 4 on page 1237)
31   1F                                FX Extended Ops (see Table 11 on page 1248)
32   20   lwz       B           D      Load Word and Zero
33   21   lwzu      B           D      Load Word and Zero with Update
34   22   lbz       B           D      Load Byte and Zero
35   23   lbzu      B           D      Load Byte and Zero with Update
36   24   stw       B           D      Store Word
37   25   stwu      B           D      Store Word with Update
38   26   stb       B           D      Store Byte
39   27   stbu      B           D      Store Byte with Update
40   28   lhz       B           D      Load Half and Zero
41   29   lhzu      B           D      Load Half and Zero with Update
42   2A   lha       B           D      Load Half Algebraic
43   2B   lhau      B           D      Load Half Algebraic with Update
44   2C   sth       B           D      Store Half
45   2D   sthu      B           D      Store Half with Update
46   2E   lmw       B           D      Load Multiple Word
47   2F   stmw      B           D      Store Multiple Word
48   30   lfs       FP          D      Load Floating-Point Single
49   31   lfsu      FP          D      Load Floating-Point Single with Update
50   32   lfd       FP          D      Load Floating-Point Double
51   33   lfdu      FP          D      Load Floating-Point Double with Update
52   34   stfs      FP          D      Store Floating-Point Single
53   35   stfsu     FP          D      Store Floating-Point Single with Update
54   36   stfd      FP          D      Store Floating-Point Double
55   37   stfdu     FP          D      Store Floating-Point Double with Update
56   38   lq        LSQ         DQ     Load Quadword
57   39                                See Table 5 on page 1237
58   3A                         DS     FX DS-form Loads (see Table 6 on page 1237)
59   3B                                FP Single & DFP Ops (see Table 16 on page 1252)
60   3C                                VSX Extended Ops (see Table 17 on page 1254)
61   3D   stfdp     FP          DS     Store Floating-Point Double Pair
62   3E                         DS     FX DS-form Stores (see Table 7 on page 1237)
63   3F                                FP Double & DFP Ops (see Table 18 on page 1256)
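As a rough illustration of how the rules in Appendix D and the primary opcode 0 rule above combine, the following Python sketch sorts a 32-bit instruction word into coarse classes before any extended opcode table is consulted. The function names and the returned labels are hypothetical. The sketch assumes the architecture's bit numbering, in which bit 0 is the most significant bit, so the primary opcode is the top six bits of the word and instruction bits 21:30 sit one position above the least significant bit.

```python
ILLEGAL_PRIMARY_OPCODES = {1, 5, 6}   # listed as illegal in Appendix D
ATTENTION_XO = 256                    # Service Processor "Attention"


def primary_opcode(word: int) -> int:
    """Instruction bits 0:5 (the six most significant bits)."""
    return (word >> 26) & 0x3F


def extended_opcode_21_30(word: int) -> int:
    """Instruction bits 21:30, the 10-bit extended opcode field."""
    return (word >> 1) & 0x3FF


def classify(word: int) -> str:
    """Coarse classification that needs no extended opcode table."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction word must be 32 bits")
    if word == 0:
        return "illegal"                      # all binary 0s
    opcode = primary_opcode(word)
    if opcode in ILLEGAL_PRIMARY_OPCODES:
        return "illegal"
    if opcode == 0:
        if extended_opcode_21_30(word) == ATTENTION_XO:
            return "service processor attention"
        return "reserved"
    return "consult opcode maps"              # Table 3 and the extended tables


if __name__ == "__main__":
    for sample in (0x00000000, 0x00000200, 0x14000000, 0x7C0802A6):
        print(f"{sample:#010x}: {classify(sample)}")
```

Words that fall through to the last case still need a lookup in the primary opcode table and, where it points to one, the matching extended opcode table, because unused extended opcodes are also illegal.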
Opcode Maps 1235 Version 2.06 Table 5: Extended opcodes for primary opcode 57 (instruction bits 30:31) 0 1 0 lfdp 0 FP DS 1 Table 4: Extended opcodes for primary opcode 30 (instruction bits 27:30) 00 01 10 11 0 1 2 3 rldicl rldicl' rldicr rldicr' 00 64 64 MD MD MD MD 4 5 6 7 rldic rldic' rldimi rldimi' 01 64 64 MD MD MD MD 8 9 rldcl rldcr 10 64 64 MDS MDS 11 Table 6: Extended opcodes for primary opcode 58 (instruction bits 30:31) 0 1 0 1 ld ldu 0 64 64 DS DS 2 lwa 1 64 DS Table 7: Extended opcodes for primary opcode 62 (instruction bits 30:31) 0 1 0 1 std stdu 0 64 64 DS DS 2 stq 1 LSQ DS 1236 Power ISATM Book Appendices Version 2.06 Table 8: (Left) Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31) 000000 000001 000010 000011 000100 000101 000110 000111 001000 001001 001010 001011 001100 001101 001110 001111 0 2 4 6 8 10 12 14 00000 vaddubm vmaxub vrlb vcmpequb vmuloub vaddfp vmrghb vpkuhum V VX V VX V VX V VC V VX V VX V VX V VX 64 66 68 70 72 74 76 78 00001 vadduhm vmaxuh vrlh vcmpequh vmulouh vsubfp vmrghh vpkuwum V VX V VX V VX V VC V VX V VX V VX V VX 128 130 132 134 140 142 00010 vadduwm vmaxuw vrlw vcmpequw vmrghw vpkuhus V VX V VX V VX V VC V VX V VX 198 206 00011 vcmpeqfp vpkuwus V VC V VX 258 260 264 266 268 270 00100 vmaxsb vslb vmulosb vrefp vmrglb vpkshus V VX V VX V VX V VX V VX V VX 322 324 328 330 332 334 00101 vmaxsh vslh vmulosh vrsqrtefp vmrglh vpkswus V VX V VX V VX V VX V VX V VX 384 386 388 394 396 398 00110 vaddcuw vmaxsw vslw vexptefp vmrglw vpkshss V VX V VX V VX V VX V VX V VX 452 454 458 462 00111 vsl vcmpgefp vlogefp vpkswss V VX V VC V VX V VX 512 514 516 518 520 522 524 526 01000 vaddubs vminub vsrb vcmpgtub vmuleub vrfin vspltb vupkhsb V VX V VX V VX V VC V VX V VX V VX V VX 576 578 580 582 584 586 588 590 01001 vadduhs vminuh vsrh vcmpgtuh vmuleuh vrfiz vsplth vupkhsh V VX V VX V VX V VC V VX V VX V VX V VX 640 642 644 646 650 652 654 01010 vadduws vminuw vsrw vcmpgtuw vrfip vspltw vupklsb V VX V VX V 
VX V VC V VX V VX V VX 708 710 714 718 01011 vsr vcmpgtfp vrfim vupklsh V VX V VC V VX V VX 768 770 772 774 776 778 780 782 01100 vaddsbs vminsb vsrab vcmpgtsb vmulesb vcfux vspltisb vpkpx V VX V VX V VX V VC V VX V VX V VX V VX 832 834 836 838 840 842 844 846 01101 vaddshs vminsh vsrah vcmpgtsh vmulesh vcfsx vspltish vupkhpx V VX V VX V VX V VC V VX V VX V VX V VX 896 898 900 902 906 908 01110 vaddsws vminsw vsraw vcmpgtsw vctuxs vspltisw V VX V VX V VX V VC V VX V VX 966 970 974 01111 vcmpbfp vctsxs vupklpx V VC V VX V VX 1024 1026 1028 1030 1034 1036 10000 vsububm vavguh vand vcmpequb. vmaxfp vslo V VX V VX V VX V VC V VX V VX 1088 1090 1092 1094 1098 1100 10001 vsubuhm vavguw vandc vcmpequh. vminfp vsro V VX V VX V VX V VC V VX V VX 1152 1154 1156 1158 10010 vsubuwm vavgub vor vcmpequw. V VX V VX V VX V VC 1220 1222 10011 vxor vcmpeqfp. V VX V VC 1282 1284 10100 vavgsb vnor V VX V VX 1346 10101 vavgsh V VX 1408 1410 10110 vsubcuw vavgsw V VX V VX 1478 10111 vcmpgefp. V VC 1536 1540 1542 1544 11000 vsububs mfvscr vcmpgtub. vsum4ubs V VX V VX V VC V VX 1600 1604 1606 1608 11001 vsubuhs mtvscr vcmpgtuh. vsum4shs V VX V VX V VC V VX 1664 1670 1672 11010 vsubuws vcmpgtuw. vsum2sws V VX V VC V VX 1734 11011 vcmpgtfp. V VC 1792 1798 1800 11100 vsubsbs vcmpgtsb. vsum4sbs V VX V VC V VX 1856 1862 11101 vsubshs vcmpgtsh. V VX V VC 1920 1926 1928 11110 vsubsws vcmpgtsw. vsumsws V VX V VC V VX 1990 11111 vcmpbfp. V VC 1238 Power ISATM Book Appendices Version 2.06 Table 8 (Left-Center) Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31) 010000 010001 010010 010011 010100 010101 010110 010111 011000 011001 011010 011011 011100 011101 011110 011111 16 17 24 24 00000 mulhhwu mulhhwu. machhwu long LMA X LMA X LMA XO LMA XO 80 81 88 89 92 93 00001 mulhhw mulhhw. machhw machhw. 
nmachhw long LMA X LMA X LMA XO LMA XO LMA XO LMA XO 152 153 00010 long long LMA XO LMA XO 216 217 220 220 00011 machhws long long long LMA XO LMA XO LMA XO LMA XO 272 273 280 281 00100 mulchwu mulchwu. macchwu long LMA X LMA X LMA XO LMA XO 336 337 344 345 348 349 00101 mulchw mulchw. macchw macchw. nmacchw long LMA X LMA X LMA XO LMA XO LMA XO LMA XO 408 409 00110 long long LMA XO LMA XO 472 473 476 477 00111 macchws long long long LMA XO LMA XO LMA XO LMA XO 01000 01001 01010 01011 784 784 792 793 01100 mullhwu mullhwu. maclhwu maclhwu. LMA X LMA X LMA XO LMA XO 848 849 856 857 860 861 01101 mullhw mullhw. maclhw maclhw. nmaclhw nmaclhw. LMA X LMA X LMA XO LMA XO LMA XO LMA XO 920 921 01110 long long LMA XO LMA XO 984 985 988 989 01111 maclhws maclhws. long long LMA XO LMA XO LMA XO LMA XO 1048 1049 10000 long long LMA XO LMA XO 1112 1113 1116 1117 10001 machhw' long long long LMA XO LMA XO LMA XO LMA XO 1176 1177 10010 long long LMA XO LMA XO 1240 1241 1244 1245 10011 long long long long LMA XO LMA XO LMA XO LMA XO 1304 1305 10100 long long LMA XO LMA XO 1368 1369 1372 1373 10101 macchw' long long long LMA XO LMA XO LMA XO LMA XO 1432 1433 10110 long long LMA XO LMA XO 1496 1497 1500 1501 10111 long long long long LMA XO LMA XO LMA XO LMA XO 11000 11001 11010 11011 1816 1817 11100 long long LMA XO LMA XO 1880 1881 1884 1885 11101 maclhw' maclhw' long long LMA XO LMA XO LMA XO LMA XO 1944 1946 11110 long long LMA XO LMA XO 2008 2009 2012 2013 11111 long long long long LMA XO LMA XO LMA XO LMA XO Appendix G. 
Table 8 (Right-Center) Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31)

Columns 100000 through 101111. Each instruction listed below occupies its entire column, that is, every row from 00000 through 11111. The cell number shown is the one in row 00000.

Column   Cell  Mnemonic    Cat  Form
100000   32    vmhaddshs   V    VA
100001   33    vmhraddshs  V    VA
100010   34    vmladduhm   V    VA
100011         (empty)
100100   36    vmsumubm    V    VA
100101   37    vmsummbm    V    VA
100110   38    vmsumuhm    V    VA
100111   39    vmsumuhs    V    VA
101000   40    vmsumshm    V    VA
101001   41    vmsumshs    V    VA
101010   42    vsel        V    VA
101011   43    vperm       V    VA
101100   44    vsldoi      V    VA
101101         (empty)
101110   46    vmaddfp     V    VA
101111   47    vnmsubfp    V    VA

Table 8 (Right) Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31)

Columns 110000 through 111111. All cells in rows 00000 through 11111 are empty.
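The row and column labels of Table 8 can be read as the two parts of the 11-bit extended opcode in instruction bits 21:31: the 5-bit row label is the high part and the 6-bit column label is the low part. The Python sketch below shows that split under this reading. The helper names are hypothetical, and the sample word 0x1000002B is an illustrative encoding (primary opcode 4, all register fields zero, extended opcode 43) rather than a value taken from the tables.

```python
def xo11_from_word(word: int) -> int:
    """Instruction bits 21:31, the 11 least significant bits of the word."""
    return word & 0x7FF


def split_xo11(xo: int):
    """Return (row label, column label) as the bit strings used in Table 8."""
    if not 0 <= xo <= 0x7FF:
        raise ValueError("extended opcode must fit in 11 bits")
    row = format((xo >> 6) & 0x1F, "05b")   # instruction bits 21:25
    col = format(xo & 0x3F, "06b")          # instruction bits 26:31
    return row, col


def join_xo11(row: str, col: str) -> int:
    """Inverse of split_xo11: cell number from the two labels."""
    return (int(row, 2) << 6) | int(col, 2)


if __name__ == "__main__":
    print(split_xo11(64))                          # vadduhm
    print(split_xo11(1604))                        # mtvscr
    print(split_xo11(xo11_from_word(0x1000002B)))  # column 101011 (vperm)
```

For the instructions that occupy a whole column, only the column label identifies the instruction; the row bits vary from one encoding to the next.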
Opcode Maps 1241 Version 2.06 Table 9: (Left) Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31) 000000 000001 000010 000011 000100 000101 000110 000111 001000 001001 001010 001011 001100 001101 001110 001111 00000 00001 00010 00011 00100 00101 00110 00111 512 514 516 518 520 521 522 523 524 525 526 527 01000 evaddw evaddiw evsubfw evsubifw evabs evneg evextsb evextsh evrndw evcntlzw evcntlsw brinc SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 01001 640 641 644 645 646 648 649 652 653 654 01010 evfsadd evssub evfsabs evfsnabs evfsneg evfsmul evfsdiv long evfscmplt long sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX 704 705 708 709 710 712 713 716 717 718 719 01011 efsadd efssub efsabs efsnabs efsneg efsmul efsdiv efscmpgt efscmplt efscmpeq efscfd sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fd EVX 768 769 770 771 772 773 776 777 780 781 782 783 01100 evlddx evldd evldwx evldw evldhx evldh long long long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 01101 01110 01111 1027 1031 1032 1033 1035 1036 1037 1039 10000 evmhessf evmhossf long long long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 1095 1096 1100 1101 1103 10001 long long long long long SP EVX SP EVX SP EVX SP EVX SP EVX 10010 1216 1217 1218 1219 1220 1222 1223 1224 1225 1226 1227 10011 long long long long evmra evdivws evdivwu long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 1280 1281 1283 1284 1285 1287 1288 1289 1291 1292 1293 1295 10100 long long long long long long long long long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 1344 1345 1352 1353 10101 long long long long SP EVX SP EVX SP EVX SP EVX 1408 1409 1411 1412 1413 1415 1416 1417 1419 1420 1421 
1423 10110 long long long long long long long long long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 1472 1473 1480 1481 10111 long long long long SP EVX SP EVX SP EVX SP EVX 11000 11001 11010 11011 11100 11101 11110 11111 1242 Power ISATM Book Appendices Version 2.06 Table 9 (Left-Center) Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31) 010000 010001 010010 010011 010100 010101 010110 010111 011000 011001 011010 011011 011100 011101 011110 011111 00000 00001 00010 00011 00100 00101 00110 00111 529 530 534 535 536 537 539 542 01000 evand evandc evxor evor evnor eveqv evorc evnand SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 01001 656 657 658 659 660 661 662 663 664 666 668 669 670 01010 evfscfui evfscfsi evfscfuf evfscfsf evfsctui evfsctsi evfsctuf evfsctsf evfsctuiz evfsctsiz evfststgt evfststlt evfststeq sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX sp.fv EVX 720 721 722 723 724 725 726 727 728 730 732 733 734 01011 efscfui efscfsi efscfuf efscfsf efsctui efsctsi efsctuf efsctsf efsctuiz efsctsiz efststgt efststlt efststeq sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX sp.fs EVX 784 785 788 789 790 791 792 793 796 797 01100 evlwhex evlwhe evlwhoux evlwhou evlwhosx evlwhos long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 01101 01110 01111 10000 1107 1112 1113 1115 10001 evmwssf long long long SP EVX SP EVX SP EVX SP EVX 10010 10011 10100 1363 1368 1369 1371 10101 long long long long SP EVX SP EVX SP EVX SP EVX 10110 1491 1496 1497 1499 10111 long long long long SP EVX SP EVX SP EVX SP EVX 11000 11001 11010 11011 11100 11101 11110 11111 Appendix G. 
Opcode Maps 1243 Version 2.06 Table 9 (Right-Center) Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31) 100000 100001 100010 100011 100100 100101 100110 100111 101000 101001 101010 101011 101100 101101 101110 101111 00000 00001 00010 00011 00100 00101 00110 00111 544 545 546 547 548 550 552 553 554 555 556 557 558 559 01000 evsrwu evsrws evsrwiu evsrwis evslw evslwi evrlw evsplati evrlwi evsplatfi long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 01001 01010 736 737 738 739 740 741 742 744 745 746 747 748 749 750 751 01011 efdadd efdsub efdcfuid efdcfsid efdabs efdnabs efdneg efdmul efddiv efdctuidz efdctsidz efdcmpgt efdcmplt efdcmpeq efdcfs sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX 800 801 802 803 804 805 01100 evstddx evstdd evstdwx evstdw evstdhx evstdh SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 01101 01110 01111 1059 1063 1064 1065 1067 1068 1069 1071 10000 long long long long long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 1127 1128 1132 1133 1135 10001 long long long long long SP EVX SP EVX SP EVX SP EVX SP EVX 10010 10011 1320 1321 1323 1324 1325 1327 10100 long long long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 10101 1448 1449 1451 1452 1453 1455 10110 long long long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 10111 11000 11001 11010 11011 11100 11101 11110 11111 1244 Power ISATM Book Appendices Version 2.06 Table 9 (Right) Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31) 110000 110001 110010 110011 110100 110101 110110 110111 111000 111001 111010 111011 111100 111101 111110 111111 00000 00001 00010 00011 00100 00101 00110 00111 560 561 562 563 564 01000 evcmpgtu evcmpgts evcmpltu evcmplts evcmpeq SP EVX SP EVX SP EVX SP EVX SP EVX 632 633 634 635 636 637 638 639 01001 evsel evsel' 
evsel' evsel' evsel' evsel' evsel' evsel' SP EVS SP EVS SP EVS SP EVS SP EVS SP EVS SP EVS SP EVS 01010 752 753 754 755 756 757 758 759 760 762 764 765 766 01011 efdcfui efdcfsi efdcfuf efdcfsf efdctui efdctsi efdctuf efdctsf efdctuiz efdctsiz efdtstgt efdtstlt efdtsteq sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX 816 817 820 821 824 825 828 829 01100 evstwhex evstwhe evstwhox evstwho evstwwex evstwwe evstwwox evstwwo SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX 01101 01110 01111 10000 1139 1144 1145 1147 10001 long long long long SP EVX SP EVX SP EVX SP EVX 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111 Appendix G. Opcode Maps 1245 Version 2.06 Table 10: (Left) Extended opcodes for primary opcode 19 (instruction bits 21:30) 00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 0 00000 mcrf B XL 33 38 39 00001 crnor rfmci rfdi B XL E XL E.ED X 00010 00011 129 00100 crandc B XL 00101 193 crxor 198 00110 B XL dnh E.EDXFX 225 00111 crnand B XL 257 01000 crand B XL 289 01001 creqv B XL 332 01010 lxvdsx VSX XX 01011 01100 417 01101 crorc B XL 449 01110 cror B XL 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111 1246 Power ISATM Book Appendices Version 2.06 Table 10. (Right) Extended opcodes for primary opcode 19 (instruction bits 21:30) 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111 16 18 00000 bclr rfid B XL S XL 50 51 00001 rfi rfci E XL E XL (82) 00010 rfsvc XL 00011 00100 150 isync B XL 00101 00110 00111 274 01000 hrfid S XL 01001 01010 01011 402 01100 doze S XL 434 01101 nap S XL 466 01110 sleep S XL 498 01111 rvwinkle S XL 528 10000 bcctr B XL 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111 Appendix G. 
Opcode Maps 1247 Version 2.06 Table 11: (Left) Extended opcodes for primary opcode 31 (instruction bits 21:30) 00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 0 4 6 7 8 9 10 11 14 15 00000 cmp tw lvsl lvebx subfc mulhdu addc mulhwu Res'd See B X B X V X V X B XO 64 XO B XO B XO VLE Table 15 32 33 38 39 40 46 || 00001 cmpl Res'd lvsr lvehx subf Res'd || B X VLE V X V X B XO VLE || 68 71 73 74 75 78 || 00010 td lvewx mulhd addg6s' mulhw dlmzb || 64 X V X 64 XO BCDA XO B XO LMV X || 103 104 || 00011 lvx neg || V X B XO || 129 131 134 135 136 138 || 00100 Res'd wrtee dcbtstls stvebx subfe adde || VLE E X ECL X V X B XO B XO || 163 166 167 || 00101 wrteei dcbtls stvehx || E X ECL X V X || 193 199 200 202 206 || 00110 Res'd stvewx subfze addze msgsnd || VLE V X B XO B XO E.PC X || 225 230 231 232 233 234 235 238 || 00111 Res'd icblc stvx subfme mulld addme mullw msgclr || VLE ECL X V X B XO 64 XO B XO B XO E.PC X || 257 259 262 263 266 270 || 01000 Res'd mfdcrx Res'd lvepxl add ehpriv || VLE E.DC X AP E.PD X B XO E.HV XL || 289 291 295 || 01001 Res'd mfdcrux lvepx || VLE E.DC X E.PD X || 323 326 332 334 || 01010 mfdcr dcread lxvdsx mfpmr || E.DC XFX E.CD X VSX XX E.PM XFX || 359 {366} || 01011 lvxl mftmr || V X || 387 390 393 395 398 || 01100 mtdcrx dcblc divdeu' divweu mvptas || E.DC X ECL X 64 XO B XO || 417 419 425 427 || 01101 Res'd mtdcrux divde divwe || VLE E.DC X 64 XO B XO || 449 451 454 457 459 462 || 01110 Res'd mtdcr dci divdu divwu mtpmr || VLE E.DC XFX E.CI X 64 XO B XO E.PM XFX || 483 486 487 489 491 494 || 01111 dsn Res'd stvxl divd divw mttmr || DS X AP V X 64 XO B XO EMT XFX || 512 515 519 520 521 522 523 || 10000 mcrxr lbdx Res'd subfc' mulhdu' addc' mulhwu' || E X DS X V X B XO 64XO B XO B XO || 547 551 552 || 10001 lhdx Res'd subf' || DS X V X B XO || 579 585 586 587 588 || 10010 lwdx mulhd' addg6s mulhw' lxsdx || DS X 64 XO BCDA XO B XO VSX XX || 611 616 620 || 10011 lddx neg' lxsdux || DS X B XO VSX XX 
|| 643 647 648 650 || 10100 stbdx Res'd subfe' adde' || DS X V X B XO B XO || 675 679 || 10101 sthdx Res'd || DS X V X || 707 712 714 716 || 10110 stwdx subfze' addze' stxsdx || DS X B XO B XO VSX XX || 739 744 745 746 747 748 || 10111 stddx subfme' mulld' addme' mullw' stxsdux || DS X B XO 64 XO B XO B XO VSX XX || 775 778 780 || 11000 stvepxl add' lxvw4x || E.PD X B XO VSX XX || 802 803 807 812 || 11001 dcffix lfddx stvepx lxvw4ux || DFP X DS X E.PD X VSX XX || 844 || 11010 lxvd2x || VSX XX || 876 || 11011 lxvd2ux || VSX XX || 903 908 || 11100 stvlxl stxvw4x || V X VSX XX || 931 935 937 939 940 || 11101 stfddx stvrxl divde' divwe' stxvw4ux || DS X V X 64 XO 64 XO VSX XX || 966 969 971 972 || 11110 ici divdu' divwu' stxvd2ux || E.CI X 64XO 64XO VSX XX || 998 1001 1003 || 11111 icread divd' divw' See E.CD X 64 XO B XO Table 15 1248 Power ISATM Book Appendices Version 2.06 Table 11. (Right) Extended opcodes for primary opcode 31 (instruction bits 21:30) 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111 16 18 19 20 21 22 23 24 26 27 28 29 30 31 00000 Res'd tlbilx mfcr lwarx ldx icbt lwzx slw cntlzw sld and ldepx rldicl* lwepx VLE E.HV X B XFX B X 64 X E X B X B X B X 64 X B X E.PD X 64 MD E.PD X 52 53 54 55 56 58 60 62 00001 lbarx ldux dcbst lwzux Res'd cntlzd andc See B X 64 X B X B X VLE 64 X B X Table 12 (82) 83 84 86 87 94 95 00010 mtsrd mfmsr ldarx dcbf lbzx rldicr* lbepx X B X 64 X B X B X 64 MD E.PD X (114) 116 (118) 119 122 124 126 127 00011 mtsrdin lharx clf lbzux popcntb nor rldicr* dcbfep X B X X B X B X B X 64 MD E.PD X 144 146 149 150 151 154 157 158 159 00100 mtcrf mtmsr stdx stwcx. stwx prtyw stdepx rldic* See B XFX B X 64 X B X B X B X E.PD X 64 MD Table 14 178 181 183 186 190 191 00101 mtmsrd stdux stwux prtyd rldic* rlwinm* S X 64 X B X 64 X 64 MD B M 210 214 215 222 223 00110 mtsr stdcx. 
stbx rldimi* stbepx S X 64 X B X 64 MD E.PD X 242 246 247 250 252 254 255 00111 mtsrin dcbtst stbux Res'd bpermd rldimi* See S X B X B X popcnt 64 X 64 MD Table 14 274 278 279 280 282 284 285 286 286 01000 tlbiel dcbt lhzx Res'd cdtbcd eqv evlddepx rldcl* See S X B X B X VLE BCDA X B X E.PD evx 64 MDS Table 14 306 308 310 311 312 314 316 318 319 01001 tlbie Res'd eciwx lhzux Res'd cbcdtd xor rldcr* See S X EC X B X VLE BCDA X B X 64 MDS Table 14 339 341 342 343 350 351 01010 mfspr lwax Res'd lhax * xori* B XFX 64 X AP B X B D 370 371 373 374 375 378 382 383 01011 tlbia mftb lwaux Res'd lhaux popcntw * xoris* S X S XFX 64 X AP B X B X B D 402 407 412 413 414 415 01100 slbmte sthx orc evstddepx * See S X B X B X E.PD evx Table 14 434 438 439 444 446 447 01101 slbie ecowx sthux or * andis.* S X EC X B X B X B D 467 469 470 471 476 478 01110 mtspr * dcbi lmw* nand * B XFX E X All D B X 498 501 503 506 508 510 01111 slbia * stmw* popcntd cmpb * S X All D 64 X B X (530) 532 533 534 535 536 539 10000 no-op ldbrx lswx lwbrx lfsx srw srd 64 X MA X B X FP X B X 64 X (562) 566 567 568 10001 no-op tlbsync lfsux Res'd S X FP X VLE (594) 595 597 598 599 607 10010 no-op mfsr lswi sync lfdx lfdepx S X MA X B X FP X E.PD X (626) 631 10011 no-op lfdux FP X (658) 659 660 661 662 663 10100 no-op mfsrin stdbrx stswx stwbrx stfsx S X 64 X MA X B X FP X (690) 694 695 10101 no-op stbcx. stfsux B X FP X (722) 725 726 727 735 10110 no-op stswi sthcx. stfdx stfdepx MA X B X FP X E.PD X (754) 758 759 10111 no-op dcba stfdux E X FP X 786 789 790 791 792 794 11000 tlbivax lwzcix lhbrx lfdpx sraw srad E X S X B X FP X B X 64 X (818) 821 822 823 824 826 827 11001 rac lhzcix Res'd Res'd srawi sradi sradi' X S X B X 64 XS 64 XS 850 851 853 854 855 11010 tlbsrx. 
slbmfev lbzcix See lfiwax E.TWC X S X S X Table 13 FP X 885 887 11011 ldcix lfiwzx S X FP X 914 915 917 918 919 922 11100 tlbsx slbmfee stwcix sthbrx stfdpx extsh E X S X S X B X FP X B X 946 949 951 954 11101 tlbre sthcix Res'd extsb E X S X AP B X 978 979 981 982 983 986 991 11110 tlbwe slbfee. stbcix icbi stfiwx extsw icbiep E X S X S X B X FP X 64 X E.PD X 1010 1013 1014 1023 11111 Res'd stdcix dcbz dcbzep S X B X E.PD X Appendix G. Opcode Maps 1249 Version 2.06 Table 12: Opcode: 31, Extended Opcode: 62 0 00001 62 62 00001 rldicl* wait 64 MD WT X Table 13: Opcode: 31, Extended Opcode: 854 10110 854 854 11010 eieio mbar S X E X Table 14: Opcode: 31, Extended Opcode: 159 11111 159 159 00100 rlwimi* stwepx B M E.PD X 191 00101 rlwinm* B M 223 00110 stbepx E.PD X 255 00111 rlwnm* B M 287 287 01000 ori* lhepx B D E.PD X 319 319 01001 oris* dcbtep B D E.PD X 351 01010 xori* B D 383 01011 xoris* B D 415 415 01100 andi.* sthepx B D E.PD X 1250 Power ISATM Book Appendices Version 2.06 Table 15: Opcode: 31, Extended Opcode: 15 01111 15 00000 isel B A 47 || 00001 * || || 79 || 00010 tdi* || 64 D || 111 || 00011 twi* || B D || 143 || 00100 * || || 175 || 00101 * || || 207 || 00110 * || || 239 || 00111 mulli* || B D || 271 || 01000 subfic* || B D || || 01001 || || 335 || 01010 cmpli* || B D || 367 || 01011 cmpi* || B D || 399 || 01100 addic* || B D || 431 || 01101 addic.* || B D || 463 || 01110 addi* || B D || 495 || 01111 addis* || B D || || 10000 || || || 10001 || || || 10010 || || || 10011 || || || 10100 || || || 10101 || || || 10110 || || || 10111 || || || 11000 || || || 11001 || || || 11010 || || || 11011 || || || 11100 || || || 11101 || || || 11110 || || || 11111 || isel Appendix G. 
Opcode Maps 1251 Version 2.06 Table 16:(Left) Extended opcodes for primary opcode 59 (instruction bits 21:30) 00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 2 3 00000 dadd dqua DFP X DFP Z 34 35 00001 dmul drrnd DFP X DFP Z 66 67 00010 dscli dquai DFP Z22 DFP Z 98 99 00011 dscri drintx DFP Z DFP Z23 130 00100 dcmpo DFP X 162 00101 dtstex DFP X 194 00110 dtstdc DFP Z23 226 227 00111 dtstdg drintn DFP Z23 DFP Z23 258 259 01000 dctdps dqua' DFP X DFP Z 290 291 01001 dctfix drrnd' DFP X DFP Z 322 323 01010 ddedpd dquai' DFP X DFP Z 354 355 01011 dxex drintx' DFP X DFP Z23 01100 01101 01110 01111 514 515 10000 dsub dqua' DFP X DFP Z 546 547 10001 ddiv drrnd' DFP X DFP Z 578 579 10010 dscli' dquai' DFP Z22 DFP Z 610 611 10011 dscri' drintx' DFP Z22 DFP Z23 642 10100 dcmpu DFP X 674 10101 dtstsf DFP X 706 10110 dtstdc' DFP Z23 738 739 10111 dtstdg'drintn' DFP Z23DFP Z23 770 771 11000 drsp dqua' DFP X DFP Z 803 11001 drrnd' DFP Z 834 835 846 11010 denbcd dquai' fcfids DFP X DFP Z FP X 866 867 11011 diex drintx' DFP X DFP Z23 11100 11101 974 11110 fcfidus FP X 995 11111 drintn' DFP Z23 1252 Power ISATM Book Appendices Version 2.06 Table 16. 
(Right) Extended opcodes for primary opcode 59 (instruction bits 21:30)

Columns 10000 through 11111 (instruction bits 26:30); rows 00000 through 11111 (instruction bits 21:25). Every entry in this half of the table is an A-form instruction that occupies its entire column: the extended opcode is shown in row 00000, the mnemonic is repeated below row 11111, and each row in between carries a continuation mark (||). Columns 10000, 10001, 10011, 10111, and 11011 are empty.

Column  Ext  Mnemonic  Cat  Form
10010   18   fdivs     FP   A
10100   20   fsubs     FP   A
10101   21   fadds     FP   A
10110   22   fsqrts    FP   A
11000   24   fres      FP   A
11001   25   fmuls     FP   A
11010   26   frsqrtes  FP   A
11100   28   fmsubs    FP   A
11101   29   fmadds    FP   A
11110   30   fnmsubs   FP   A
11111   31   fnmadds   FP   A

Table 17.
(Left) Extended opcodes for primary opcode 60 (instruction bits 21:30) 00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 8 00000 xxsldwi VSX XX 40 00001 xxpermdi VSX XX 72 00010 xxmrglw VSX XX 00011 128 132 136 140 00100 xsadddp xsmaddadp xxsldwi xscmpudp VSX XX VSX XX VSX XX VSX XX 160 164 168 172 00101 xssubdp xsmaddmdp xxpermdi xscmpodp VSX XX VSX XX VSX XX VSX XX 192 196 200 00110 xsmuldp xsmsubadp xxmrglw VSX XX VSX XX VSX XX 224 228 00111 xsdivdp xsmsubmdp VSX XX VSX XX 256 260 264 268 01000 xvdivsp xvmaddasp xxsldwi xvcmpeqsp VSX XX VSX XX VSX XX VSX XX 288 292 296 300 01001 xvdivsp xvmaddmsp xxpermdi xvcmpgtsp VSX XX VSX XX VSX XX VSX XX 320 324 328 332 01010 xvdivsp xvmsubasp xxspltw xvcmpgesp VSX XX VSX XX VSX XX VSX XX 352 356 01011 xvdivsp xvmsubmsp VSX XX VSX XX 384 388 392 396 01100 xvdivdp xvmaddadp xxsldwi xvcmpeqdp VSX XX VSX XX VSX XX VSX XX 416 420 424 428 01101 xvdivdp xvmaddmdp xxpermdi xvcmpgtdp VSX XX VSX XX VSX XX VSX XX 448 452 460 01110 xvdivdp xvmsubadp xvcmpgedp VSX XX VSX XX VSX XX 480 484 01111 xvdivdp xvmsubmdp VSX XX VSX XX 520 10000 xxland VSX XX 552 10001 xxlandc VSX XX 584 10010 xxlor VSX XX 616 10011 xxlxor VSX XX 640 644 648 10100 xsmaxdp xsnmaddadp xxlnor VSX XX VSX XX VSX XX 672 676 10101 xsnibdp xsnmaddmdp VSX XX VSX XX 704 708 10110 xscpsgndp xsnmsubadp VSX XX VSX XX 740 10111 xsnmsubmdp VSX XX 768 772 780 11000 xvmaxsp xvnmaddasp xvcmpeqsp. VSX XX VSX XX VSX XX 800 804 812 11001 xvminsp xvnmaddmsp xvcmpgtsp. VSX XX VSX XX VSX XX 832 836 844 11010 xvcpsgnsp xvnmsubasp xvcmpgesp. VSX XX VSX XX VSX XX 868 11011 xvnmsubmsp VSX XX 896 900 908 11100 xvmaxdp xvnmaddadp xvcmpeqdp. VSX XX VSX XX VSX XX 928 932 940 11101 xvmindp xvnmaddmdp xvcmpgtdp. VSX XX VSX XX VSX XX 960 964 972 11110 xvcpsgndp xvnmsubadp xvcmpgedp. VSX XX VSX XX VSX XX 996 11111 xvnmsubmdp VSX XX 1254 Power ISATM Book Appendices Version 2.06 Table 17. 
(Right) Extended opcodes for primary opcode 60 (instruction bits 21:30) 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111 24 00000 xxsel VSX XA 00001 00010 00011 144 146 148 150 00100 xscvdpuxws xsrdpi xsrsqrtedp xssqrtdp* VSX XX VSX XX VSX XX VSX XX 176 178 180 00101 xscvdpsxws xsrdpiz xsredp VSX XX VSX XX VSX XX 210 212 214 00110 xsrdpip xstsqrtdp xsrdpic* VSX XX VSX XX VSX XX 242 244 00111 xsrdpim xstdivdp VSX XX VSX XX 272 274 278 01000 xvcvspuxws xvrspi xvsqrtsp* VSX XX VSX XX VSX XX 304 306 01001 xvcvspsxws xvrspiz VSX XX VSX XX 336 338 342 01010 xvcvuxwsp xvrspip xvrspic* VSX XX VSX XX VSX XX 368 370 372 01011 xvcvsxwsp xvrspim xvtdivsp VSX XX VSX XX VSX XX 400 402 404 406 01100 xvcvdpuxws xvrdpi xvrsqrtedp xvsqrtdp* VSX XX VSX XX VSX XX VSX XX 432 434 436 01101 xvcvdpsxws xvrdpiz xvredp VSX XX VSX XX VSX XX 464 466 468 470 01110 xvcvuxwdp xvrdpip xvtsqrtdp xvrdpic* VSX XX VSX XX VSX XX VSX XX 496 498 500 01111 xvcvsxwdp xvrdpim xvtdivdp VSX XX VSX XX VSX XX 530 10000 xscvdpsp VSX XX 10001 10010 10011 656 658 10100 xscvdpuxds xscvspdp VSX XX VSX XX 688 690 10101 xscvdpsxds xsabsdp VSX XX VSX XX 720 722 10110 xscvuxddp xsnabsdp VSX XX VSX XX 752 754 10111 xscvsxddp xsnegdp VSX XX VSX XX 784 786 11000 xvcvspuxds xvcvdpsp VSX XX VSX XX 816 818 11001 xvcvspsxds xvabssp VSX XX VSX XX 848 850 11010 xvcvuxdsp xvnabssp VSX XX VSX XX 880 882 11011 xvcvsxdsp xvnegsp VSX XX VSX XX 912 914 11100 xvcvdpuxds xvcvspdp VSX XX VSX XX 944 946 11101 xvcvdpsxds xvabsdp VSX XX VSX XX 976 978 11110 xvcvuxddp xvnabsdp VSX XX VSX XX 1008 1010 11111 xvcvsxddp xvnegdp VSX XX VSX XX Appendix G. 
Opcode Maps 1255 Version 2.06 Table 18:(Left) Extended opcodes for primary opcode 63 (instruction bits 21:30) 00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 0 2 3 8 12 14 15 00000 fcmpu daddq dquaq fcpsgn frsp fctiw fctiwz FP X DFP X DFP Z FP X FP X FP X FP X 32 34 35 38 40 00001 fcmpo dmulq drrndq mtfsb1 fneg FP X DFP X DFP Z23 FP X FP X 64 66 67 70 72 00010 mcrfs dscliq dquaiq mtfsb0 fmr FP X DFP Z22 DFP Z FP X FP X 98 99 00011 dscriq drintxq DFP Z DFP Z23 128 130 134 136 142 143 00100 ftdiv dcmpoq mtfsfi fnabs fctiwu fctiwuz FP X DFP X FP X FP X FP X FP X 160 162 00101 ftsqrt dtstexq FP X DFP X 194 00110 dtstdcq DFP Z22 226 227 00111 dtstdgq drintnq DFP Z22 DFP Z23 258 259 264 01000 dctqpq dqua' fabs DFP X DFP Z FP X 290 291 01001 dctfixq drrnd' DFP X DFP Z 322 323 01010 ddedpdq dquai' DFP X DFP Z 354 355 01011 dxexq drintx' DFP X DFP Z23 392 01100 frin FP X 424 01101 friz FP X 456 01110 frip FP X 483 488 01111 drintn' frim DFP Z23 FP X 514 515 10000 dsubq dqua' DFP X DFP Z 546 547 10001 ddivq drrnd' DFP X DFP Z 578 579 583 10010 dscli' dquai' mffs DFP Z22 DFP Z FP X 610 611 10011 dscri' drintx' DFP Z22 DFP Z23 642 10100 dcmpuq DFP X 674 10101 dtstsfq DFP X 706 711 10110 dtstdc' mtfsf DFP Z23 FP XFL 738 739 10111 dtstdg' drintn' DFP Z23 DFP Z23 770 771 11000 drdpq dqua' DFP X DFP Z 802 803 814 815 11001 dcffixq drrnd' fctid fctidz DFP X DFP Z FP X FP X 834 835 846 11010 denbcdq dquai' fcfid DFP X DFP Z FP X 866 867 11011 diexq drintx' DFP X DFP Z23 11100 942 943 11101 fctidu fctiduz FP X FP X 974 11110 fcfidu FP X 995 11111 drintn' DFP Z23 1256 Power ISATM Book Appendices Version 2.06 Table 18. 
(Right) Extended opcodes for primary opcode 63 (instruction bits 21:30)

Columns 10000 through 11111 (instruction bits 26:30); rows 00000 through 11111 (instruction bits 21:25). Every entry in this half of the table is an A-form instruction that occupies its entire column: the extended opcode is shown in row 00000, the mnemonic is repeated below row 11111, and each row in between carries a continuation mark (||). Columns 10000, 10001, 10011, and 11011 are empty.

Column  Ext  Mnemonic  Cat  Form
10010   18   fdiv      FP   A
10100   20   fsub      FP   A
10101   21   fadd      FP   A
10110   22   fsqrt     FP   A
10111   23   fsel      FP   A
11000   24   fre       FP   A
11001   25   fmul      FP   A
11010   26   frsqrte   FP   A
11100   28   fmsub     FP   A
11101   29   fmadd     FP   A
11110   30   fnmsub    FP   A
11111   31   fnmadd    FP   A
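The opcode maps above place each extended opcode at a row and a column label. As a rough illustration of how those labels relate to the decimal opcode numbers, the sketch below extracts instruction fields from a 32-bit word and splits an extended opcode into its row and column. The helper names `bits` and `map_position` are hypothetical, chosen only for this example; the bit numbering assumes the Power ISA convention that bit 0 is the most significant bit, and the `fmadd` word is assembled from arbitrary register numbers purely for demonstration.

```python
def bits(word, start, end):
    """Return instruction bits start:end of a 32-bit word.

    Assumes Power ISA bit numbering: bit 0 is the most significant bit.
    """
    width = end - start + 1
    return (word >> (31 - end)) & ((1 << width) - 1)


def map_position(ext_opcode, col_bits=5):
    """Split an extended opcode into (row, column) labels of an opcode map.

    The row label is 5 bits wide. col_bits is 5 for the maps over
    instruction bits 21:30 and 6 for Table 9 (instruction bits 21:31).
    """
    row = ext_opcode >> col_bits
    col = ext_opcode & ((1 << col_bits) - 1)
    return format(row, "05b"), format(col, "0{}b".format(col_bits))


# isync: primary opcode 19, extended opcode 150 (Table 10).
isync = 0x4C00012C
print(bits(isync, 0, 5), bits(isync, 21, 30), map_position(bits(isync, 21, 30)))

# evand: extended opcode 529 under primary opcode 4 (Table 9, bits 21:31).
print(map_position(529, col_bits=6))

# Illustrative A-form word: primary 63, FRT=1, FRA=2, FRB=3, FRC=4, XO=29.
# The register numbers are arbitrary; only the field positions matter here.
fmadd = (63 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (4 << 6) | (29 << 1)
print(bits(fmadd, 26, 30), map_position(bits(fmadd, 21, 30)))
```

In the last case bits 21:25 hold a register field rather than opcode bits, so the same column (11101, fmadd) is reached whatever the row turns out to be. That is consistent with the A-form entries filling whole columns in Tables 16 and 18.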
Opcode Maps 1257 Version 2.06 1258 Power ISATM Book Appendices Version 2.06 Appendix G. Power ISA Instruction Set Sorted by Category This appendix lists all the instructions in the Power ISA, grouped by category, and in order by mnemonic within cate- gory. Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 252 86 64 bpermd Bit Permute Doubleword X 31 58 SR 85 64 cntlzd[.] Count Leading Zeros Doubleword XO 31 489 SR 72 64 divd[o][.] Divide Doubleword XO 31 425 SR 73 64 divde[o][.] Divide Doubleword Extended XO 31 393 SR 73 64 divdeu[o][.] Divide Doubleword Extended Unsigned XO 31 457 SR 72 64 divdu[o][.] Divide Doubleword Unsigned X 31 986 SR 85 64 extsw[.] Extend Sign Word DS 58 0 50 64 ld Load Doubleword X 31 84 699 64 ldarx Load Doubleword And Reserve Indexed X 31 532 56 64 ldbrx Load Doubleword Byte-Reverse Indexed DS 58 1 50 64 ldu Load Doubleword with Update X 31 53 50 64 ldux Load Doubleword with Update Indexed X 31 21 50 64 ldx Load Doubleword Indexed DS 58 2 49 64 lwa Load Word Algebraic X 31 373 49 64 lwaux Load Word Algebraic with Update Indexed X 31 341 49 64 lwax Load Word Algebraic Indexed XO 31 73 SR 71 64 mulhd[.] Multiply High Doubleword XO 31 9 SR 71 64 mulhdu[.] Multiply High Doubleword Unsigned XO 31 233 SR 71 64 mulld[o][.] Multiply Low Doubleword X 31 186 84 64 prtyd Parity Doubleword MDS 30 8 SR 91 64 rldcl[.] Rotate Left Doubleword then Clear Left MDS 30 9 SR 92 64 rldcr[.] Rotate Left Doubleword then Clear Right MD 30 2 SR 91 64 rldic[.] Rotate Left Doubleword Immediate then Clear MD 30 0 SR 90 64 rldicl[.] Rotate Left Doubleword Immediate then Clear Left MD 30 1 SR 90 64 rldicr[.] Rotate Left Doubleword Immediate then Clear Right MD 30 3 SR 92 64 rldimi[.] Rotate Left Doubleword Immediate then Mask Insert X 31 27 SR 95 64 sld[.] Shift Left Doubleword X 31 794 SR 96 64 srad[.] Shift Right Algebraic Doubleword XS 31 413 SR 96 64 sradi[.] Shift Right Algebraic Doubleword Immediate X 31 539 SR 95 64 srd[.] 
Shift Right Doubleword DS 62 0 54 64 std Store Doubleword X 31 660 56 64 stdbrx Store Doubleword Byte-Reverse Indexed X 31 214 699 64 stdcx. Store Doubleword Conditional Indexed DS 62 1 54 64 stdu Store Doubleword with Update X 31 181 54 64 stdux Store Doubleword with Update Indexed X 31 149 54 64 stdx Store Doubleword Indexed X 31 68 77 64 td Trap Doubleword D 2 77 64 tdi Trap Doubleword Immediate XO 31 266 SR 63 B add[o][.] Add XO 31 10 SR 64 B addc[o][.] Add Carrying XO 31 138 SR 65 B adde[o][.] Add Extended D 14 62 B addi Add Immediate D 12 SR 63 B addic Add Immediate Carrying D 13 SR 63 B addic. Add Immediate Carrying and Record D 15 62 B addis Add Immediate Shifted Appendix G. Power ISA Instruction Set Sorted by Category 1259 Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext XO 31 234 SR 65 B addme[o][.] Add to Minus One Extended XO 31 202 SR 66 B addze[o][.] Add to Zero Extended X 31 28 SR 80 B and[.] AND X 31 60 SR 81 B andc[.] AND with Complement D 28 SR 78 B andi. AND Immediate D 29 SR 78 B andis. AND Immediate Shifted I 18 35 B b[l][a] Branch B 16 CT 35 B bc[l][a] Branch Conditional XL 19 528 CT 36 B bcctr[l] Branch Conditional to Count Register XL 19 16 CT 36 B bclr[l] Branch Conditional to Link Register X 31 0 74 B cmp Compare X 31 508 82 B cmpb Compare Bytes D 11 74 B cmpi Compare Immediate X 31 32 75 B cmpl Compare Logical D 10 75 B cmpli Compare Logical Immediate X 31 26 SR 81 B cntlzw[.] 
Count Leading Zeros Word XL 19 257 37 B crand Condition Register AND XL 19 129 38 B crandc Condition Register AND with Complement XL 19 289 38 B creqv Condition Register Equivalent XL 19 225 37 B crnand Condition Register NAND XL 19 33 38 B crnor Condition Register NOR XL 19 449 37 B cror Condition Register OR XL 19 417 38 B crorc Condition Register OR with Complement XL 19 193 37 B crxor Condition Register XOR X 31 86 691 B dcbf Data Cache Block Flush X 31 54 691 B dcbst Data Cache Block Store X 31 278 688 B dcbt Data Cache Block Touch X 31 246 689 B dcbtst Data Cache Block Touch for Store X 31 1014 691 B dcbz Data Cache Block set to Zero XO 31 491 SR 68 B divw[o][.] Divide Word XO 31 427 SR 69 B divwe[o][.] Divide Word Extended XO 31 395 SR 69 B divweu[o][.] Divide Word Extended Unsigned XO 31 459 SR 68 B divwu[o][.] Divide Word Unsigned X 31 284 SR 81 B eqv[.] Equivalent X 31 954 SR 81 B extsb[.] Extend Sign Byte X 31 922 SR 81 B extsh[.] Extend Sign Halfword X 31 982 680 B icbi Instruction Cache Block Invalidate A 31 15 77 B isel Integer Select XL 19 150 693 B isync Instruction Synchronize X 31 52 694 B lbarx Load Byte and Reserve Indexed D 34 45 B lbz Load Byte and Zero D 35 45 B lbzu Load Byte and Zero with Update X 31 119 45 B lbzux Load Byte and Zero with Update Indexed X 31 87 46 B lbzx Load Byte and Zero Indexed D 42 47 B lha Load Halfword Algebraic X 31 116 695 B lharx Load Halfword and Reserve Indexed D 43 47 B lhau Load Halfword Algebraic with Update X 31 375 47 B lhaux Load Halfword Algebraic with Update Indexed X 31 343 47 B lhax Load Halfword Algebraic Indexed X 31 790 55 B lhbrx Load Halfword Byte-Reverse Indexed D 40 46 B lhz Load Halfword and Zero D 41 46 B lhzu Load Halfword and Zero with Update X 31 311 46 B lhzux Load Halfword and Zero with Update Indexed X 31 279 46 B lhzx Load Halfword and Zero Indexed D 46 57 B lmw Load Multiple Word X 31 20 694 B lwarx Load Word And Reserve Indexed X 31 534 55 B lwbrx Load Word Byte-Reverse Indexed D 32 48 
B lwz Load Word and Zero D 33 48 B lwzu Load Word and Zero with Update X 31 55 48 B lwzux Load Word and Zero with Update Indexed X 31 23 48 B lwzx Load Word and Zero Indexed 1260 Power ISATM Book Appendices Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext XL 19 0 38 B mcrf Move Condition Register Field XFX 31 19 102 B mfcr Move From Condition Register X 31 83 P 767, B mfmsr Move From Machine State Register 923 XFX 31 19 103 B mfocrf Move From One Condition Register Field XFX 31 339 O 101, B mfspr Move From Special Purpose Register 708 XFX 31 144 102 B mtcrf Move To Condition Register Fields XFX 31 144 103 B mtocrf Move To One Condition Register Field XFX 31 467 O 100 B mtspr Move To Special Purpose Register XO 31 75 SR 67 B mulhw[.] Multiply High Word XO 31 11 SR 67 B mulhwu[.] Multiply High Word Unsigned D 7 67 B mulli Multiply Low Immediate XO 31 235 SR 67 B mullw[o][.] Multiply Low Word X 31 476 SR 80 B nand[.] NAND XO 31 104 SR 66 B neg[o][.] Negate X 31 124 SR 81 B nor[.] NOR X 31 444 SR 80 B or[.] OR X 31 412 SR 81 B orc[.] OR with Complement D 24 78 B ori OR Immediate D 25 79 B oris OR Immediate Shifted X 31 122 83 B popcntb Population Count Bytes X 31 506 85 B popcntd Population Count Doubleword X 31 378 83 B popcntw Population Count Word X 31 154 84 B prtyw Parity Word M 20 SR 89 B rlwimi[.] Rotate Left Word Immediate then Mask Insert M 21 SR 87 B rlwinm[.] Rotate Left Word Immediate then AND with Mask M 23 SR 88 B rlwnm[.] Rotate Left Word then AND with Mask SC 17 39, B sc System Call 745, 908 X 31 24 SR 93 B slw[.] Shift Left Word X 31 792 SR 94 B sraw[.] Shift Right Algebraic Word X 31 824 SR 94 B srawi[.] Shift Right Algebraic Word Immediate X 31 536 SR 93 B srw[.] Shift Right Word D 38 51 B stb Store Byte X 31 694 696 B stbcx. 
Store Byte Conditional Indexed D 39 51 B stbu Store Byte with Update X 31 247 51 B stbux Store Byte with Update Indexed X 31 215 51 B stbx Store Byte Indexed D 44 52 B sth Store Halfword X 31 918 55 B sthbrx Store Halfword Byte-Reverse Indexed X 31 726 697 B sthcx. Store Halfword Conditional Indexed D 45 52 B sthu Store Halfword with Update X 31 439 52 B sthux Store Halfword with Update Indexed X 31 407 52 B sthx Store Halfword Indexed D 47 57 B stmw Store Multiple Word D 36 53 B stw Store Word X 31 662 55 B stwbrx Store Word Byte-Reverse Indexed X 31 150 698 B stwcx. Store Word Conditional Indexed D 37 53 B stwu Store Word with Update X 31 183 53 B stwux Store Word with Update Indexed X 31 151 53 B stwx Store Word Indexed XO 31 40 SR 63 B subf[o][.] Subtract From XO 31 8 SR 64 B subfc[o][.] Subtract From Carrying XO 31 136 SR 65 B subfe[o][.] Subtract From Extended D 8 SR 64 B subfic Subtract From Immediate Carrying XO 31 232 SR 65 B subfme[o][.] Subtract From Minus One Extended XO 31 200 SR 66 B subfze[o][.] Subtract From Zero Extended X 31 598 701 B sync Synchronize Appendix G. Power ISA Instruction Set Sorted by Category 1261 Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 566 H 817, B tlbsync TLB Synchronize 1010 ,905 X 31 4 76 B tw Trap Word D 3 76 B twi Trap Word Immediate X 31 316 SR 80 B xor[.] XOR D 26 79 B xori XOR Immediate D 27 79 B xoris XOR Immediate Shifted XO 31 74 H 97 BCDA addg6s Add and Generate Sixes X 31 314 H 97 BCDA cbcdtd Convert Binary Coded Decimal to Declets X 31 282 H 97 BCDA cdtbcd Convert Declets To Binary Coded Decimal X 59 2 173 DFP dadd[.] DFP Add X 63 2 173 DFP daddq[.] DFP Add Quad X 59 802 195 DFP dcffix[.] DFP Convert From Fixed X 63 802 195 DFP dcffixq[.] 
DFP Convert From Fixed Quad
X 59 130 179 DFP dcmpo DFP Compare Ordered
X 63 130 179 DFP dcmpoq DFP Compare Ordered Quad
X 59 642 178 DFP dcmpu DFP Compare Unordered
X 63 642 179 DFP dcmpuq DFP Compare Unordered Quad
X 59 258 193 DFP dctdp[.] DFP Convert To DFP Long
X 59 290 195 DFP dctfix[.] DFP Convert To Fixed
X 63 290 195 DFP dctfixq[.] DFP Convert To Fixed Quad
X 63 258 193 DFP dctqpq[.] DFP Convert To DFP Extended
X 59 322 197 DFP ddedpd[.] DFP Decode DPD To BCD
X 63 322 197 DFP ddedpdq[.] DFP Decode DPD To BCD Quad
X 59 546 176 DFP ddiv[.] DFP Divide
X 63 546 176 DFP ddivq[.] DFP Divide Quad
X 59 834 197 DFP denbcd[.] DFP Encode BCD To DPD
X 63 834 197 DFP denbcdq[.] DFP Encode BCD To DPD Quad
X 59 866 198 DFP diex[.] DFP Insert Biased Exponent
X 63 866 198 DFP diexq[.] DFP Insert Biased Exponent Quad
X 59 34 175 DFP dmul[.] DFP Multiply
X 63 34 175 DFP dmulq[.] DFP Multiply Quad
Z 59 3 184 DFP dqua[.] DFP Quantize
Z23 59 67 183 DFP dquai[.] DFP Quantize Immediate
Z23 63 67 183 DFP dquaiq[.] DFP Quantize Immediate Quad
Z23 63 3 184 DFP dquaq[.] DFP Quantize Quad
X 63 770 194 DFP drdpq[.] DFP Round To DFP Long
Z23 59 227 191 DFP drintn[.] DFP Round To FP Integer Without Inexact
Z23 63 227 191 DFP drintnq[.] DFP Round To FP Integer Without Inexact Quad
Z23 59 99 189 DFP drintx[.] DFP Round To FP Integer With Inexact
Z23 63 99 189 DFP drintxq[.] DFP Round To FP Integer With Inexact Quad
Z 59 35 186 DFP drrnd[.] DFP Reround
Z23 63 35 186 DFP drrndq[.] DFP Reround Quad
X 59 770 194 DFP drsp[.] DFP Round To DFP Short
Z23 59 66 200 DFP dscli[.] DFP Shift Significand Left Immediate
Z23 63 66 200 DFP dscliq[.] DFP Shift Significand Left Immediate Quad
Z 59 98 200 DFP dscri[.] DFP Shift Significand Right Immediate
Z 63 98 200 DFP dscriq[.] DFP Shift Significand Right Immediate Quad
X 59 514 173 DFP dsub[.] DFP Subtract
X 63 514 173 DFP dsubq[.]
DFP Subtract Quad
Z23 59 194 180 DFP dtstdc DFP Test Data Class
Z23 63 194 180 DFP dtstdcq DFP Test Data Class Quad
Z23 59 226 180 DFP dtstdg DFP Test Data Group
Z23 63 226 180 DFP dtstdgq DFP Test Data Group Quad
X 59 162 181 DFP dtstex DFP Test Exponent
X 63 162 181 DFP dtstexq DFP Test Exponent Quad
X 59 674 182 DFP dtstsf DFP Test Significance
X 63 674 182 DFP dtstsfq DFP Test Significance Quad
X 59 354 198 DFP dxex[.] DFP Extract Biased Exponent
X 63 354 198 DFP dxexq[.] DFP Extract Biased Exponent Quad
X 31 483 714 DS dsn Decorated Storage Notify
X 31 515 712 DS lbdx Load Byte with Decoration Indexed
X 31 611 712 DS lddx Load Doubleword with Decoration Indexed
X 31 803 712 DS lfddx Load Floating Doubleword with Decoration Indexed
X 31 547 712 DS lhdx Load Halfword with Decoration Indexed
X 31 579 712 DS lwdx Load Word with Decoration Indexed
X 31 643 713 DS stbdx Store Byte with Decoration Indexed
X 31 739 713 DS stddx Store Doubleword with Decoration Indexed
X 31 931 713 DS stfddx Store Floating Doubleword with Decoration Indexed
X 31 675 713 DS sthdx Store Halfword with Decoration Indexed
X 31 707 713 DS stwdx Store Word with Decoration Indexed
X 31 758 687 E dcba Data Cache Block Allocate
X 31 470 P 988 E dcbi Data Cache Block Invalidate
X 31 22 680 E icbt Instruction Cache Block Touch
X 31 854 703 E mbar Memory Barrier
X 31 512 104 E mcrxr Move to Condition Register from XER
X 31 146 P 923 E mtmsr Move To Machine State Register
XL 19 51 P 909 E rfci Return From Critical Interrupt
XL 19 50 P 909 E rfi Return From Interrupt
XL 19 38 P 910 E rfmci Return From Machine Check Interrupt
X 31 786 P 1001, 903 E tlbivax TLB Invalidate Virtual Address Indexed
X 31 946 P 1008, 904 E tlbre TLB Read Entry
X 31 914 P 1005, 904 E tlbsx TLB Search Indexed
X 31 978 P 1010, 905 E tlbwe TLB Write Entry
X 31 131 P 924 E wrtee Write MSR External Enable
X 31 163 P
925 E wrteei Write MSR External Enable Immediate
X 31 326 P 1106 E.CD dcread Data Cache Read [Alternative Encoding]
X 31 486 P 1106 E.CD dcread Data Cache Read
X 31 998 P 1107 E.CD icread Instruction Cache Read
X 31 454 P 1103 E.CI dci Data Cache Invalidate
X 31 966 P 1103 E.CI ici Instruction Cache Invalidate
XFX 31 323 P 923 E.DC mfdcr Move From Device Control Register
X 31 291 104 E.DC mfdcrux Move From Device Control Register User-mode Indexed
X 31 259 P 923 E.DC mfdcrx Move From Device Control Register Indexed
XFX 31 451 P 922 E.DC mtdcr Move To Device Control Register
X 31 419 104 E.DC mtdcrux Move To Device Control Register User-mode Indexed
X 31 387 P 922 E.DC mtdcrx Move To Device Control Register Indexed
XFX 19 198 1092 E.ED dnh Debugger Notify Halt
X 19 39 P 910 E.ED rfdi Return From Debug Interrupt
XL 31 270 911 E.HV ehpriv Embedded Hypervisor Privilege
XL 19 102 911 E.HV rfgi Return From Guest Interrupt
X 31 18 P 1003 E.HV tlbilx TLB Invalidate Local
X 31 238 P 1098 E.PC msgclr Message Clear
X 31 206 P 1098 E.PC msgsnd Message Send
X 31 127 P 932 E.PD dcbfep Data Cache Block Flush by External PID
X 31 63 P 931 E.PD dcbstep Data Cache Block Store by External PID
X 31 319 P 931 E.PD dcbtep Data Cache Block Touch by External PID
X 31 255 P 933 E.PD dcbtstep Data Cache Block Touch for Store by External PID
X 31 1023 P 934 E.PD dcbzep Data Cache Block set to Zero by External PID
EVX 31 799 P 936 E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed
EVX 31 927 P 936 E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed
X 31 991 P 934 E.PD icbiep Instruction Cache Block Invalidate by External PID
X 31 95 P 927 E.PD lbepx Load Byte by External Process ID Indexed
X 31 607 P 935 E.PD lfdepx Load Floating-Point Double by External Process ID Indexed
X 31 287 P 927 E.PD lhepx Load Halfword by External Process ID Indexed
X 31 295 P 937 E.PD lvepx Load Vector by External Process ID Indexed
X 31 263 P 937 E.PD lvepxl Load Vector by External Process ID Indexed LRU
X 31 31 P 928 E.PD lwepx Load Word by External Process ID Indexed
X 31 223 P 929 E.PD stbepx Store Byte by External Process ID Indexed
X 31 735 P 935 E.PD stfdepx Store Floating-Point Double by External Process ID Indexed
X 31 415 P 929 E.PD sthepx Store Halfword by External Process ID Indexed
X 31 807 P 938 E.PD stvepx Store Vector by External Process ID Indexed
X 31 775 P 938 E.PD stvepxl Store Vector by External Process ID Indexed LRU
X 31 159 P 930 E.PD stwepx Store Word by External Process ID Indexed
X 31 29 P 928 E.PD;64 ldepx Load Doubleword by External Process ID Indexed
X 31 157 P 930 E.PD;64 stdepx Store Doubleword by External Process ID Indexed
XFX 31 334 O 1118 E.PM mfpmr Move From Performance Monitor Register
XFX 31 462 O 1118 E.PM mtpmr Move To Performance Monitor Register
X 31 850 P 1007 E.TWC tlbsrx. TLB Search and Reserve
X 31 310 716 EC eciwx External Control In Word Indexed
X 31 438 716 EC ecowx External Control Out Word Indexed
X 31 390 M 992 ECL dcblc Data Cache Block Lock Clear
X 31 166 M 991 ECL dcbtls Data Cache Block Touch and Lock Set
X 31 134 M 991 ECL dcbtstls Data Cache Block Touch for Store and Lock Set
X 31 230 M 993 ECL icblc Instruction Cache Block Lock Clear
X 31 486 M 992 ECL icbtls Instruction Cache Block Touch and Lock Set
X 59 846 145 FP fcfids[.] Floating Convert From Integer Doubleword Single
X 63 974 145 FP fcfidu[.] Floating Convert From Integer Doubleword Unsigned
X 59 974 146 FP fcfidus[.]
Floating Convert From Integer Doubleword Unsigned Single
X 63 32 148 FP fcmpo Floating Compare Ordered
X 63 0 148 FP fcmpu Floating Compare Unordered
X 63 942 141 FP fctidu[.] Floating Convert To Integer Doubleword Unsigned
X 63 943 142 FP fctiduz[.] Floating Convert To Integer Doubleword Unsigned with round toward Zero
X 63 142 143 FP fctiwu[.] Floating Convert To Integer Word Unsigned
X 63 143 144 FP fctiwuz[.] Floating Convert To Integer Word Unsigned with round toward Zero
X 63 128 137 FP ftdiv Floating Test for software Divide
X 63 160 137 FP ftsqrt Floating Test for software Square Root
D 50 125 FP lfd Load Floating-Point Double
DS 57 0 131 FP lfdp Load Floating-Point Double Pair
X 31 791 131 FP lfdpx Load Floating-Point Double Pair Indexed
D 51 125 FP lfdu Load Floating-Point Double with Update
X 31 631 125 FP lfdux Load Floating-Point Double with Update Indexed
X 31 599 125 FP lfdx Load Floating-Point Double Indexed
X 31 855 126 FP lfiwax Load Floating-Point as Integer Word Algebraic Indexed
X 31 887 126 FP lfiwzx Load Floating-Point as Integer Word and Zero Indexed
D 48 128 FP lfs Load Floating-Point Single
D 49 128 FP lfsu Load Floating-Point Single with Update
X 31 567 128 FP lfsux Load Floating-Point Single with Update Indexed
X 31 535 128 FP lfsx Load Floating-Point Single Indexed
X 63 64 150 FP mcrfs Move to Condition Register from FPSCR
D 54 129 FP stfd Store Floating-Point Double
DS 61 - 131 FP stfdp Store Floating-Point Double Pair
X 31 919 131 FP stfdpx Store Floating-Point Double Pair Indexed
D 55 129 FP stfdu Store Floating-Point Double with Update
X 31 759 129 FP stfdux Store Floating-Point Double with Update Indexed
X 31 727 129 FP stfdx Store Floating-Point Double Indexed
X 31 983 130 FP stfiwx Store Floating-Point as Integer Word Indexed
D 52 128 FP stfs Store Floating-Point Single
D 53 128 FP stfsu Store Floating-Point Single with Update
X 31 695 128 FP stfsux Store Floating-Point Single with Update Indexed
X 31 663 128 FP stfsx Store Floating-Point Single Indexed
X 63 264 132 FP[R] fabs[.] Floating Absolute Value
A 63 21 133 FP[R] fadd[.] Floating Add
A 59 21 133 FP[R] fadds[.] Floating Add Single
X 63 846 144 FP[R] fcfid[.] Floating Convert From Integer Doubleword
X 63 8 132 FP[R] fcpsgn[.] Floating Copy Sign
X 63 814 140 FP[R] fctid[.] Floating Convert To Integer Doubleword
X 63 815 141 FP[R] fctidz[.] Floating Convert To Integer Doubleword with round toward Zero
X 63 14 142 FP[R] fctiw[.] Floating Convert To Integer Word
X 63 15 143 FP[R] fctiwz[.] Floating Convert To Integer Word with round toward Zero
A 63 18 134 FP[R] fdiv[.] Floating Divide
A 59 18 134 FP[R] fdivs[.] Floating Divide Single
A 63 29 138 FP[R] fmadd[.] Floating Multiply-Add
A 59 29 138 FP[R] fmadds[.] Floating Multiply-Add Single
X 63 72 132 FP[R] fmr[.] Floating Move Register
A 63 28 138 FP[R] fmsub[.] Floating Multiply-Subtract
A 59 28 138 FP[R] fmsubs[.] Floating Multiply-Subtract Single
A 63 25 134 FP[R] fmul[.] Floating Multiply
A 59 25 134 FP[R] fmuls[.] Floating Multiply Single
X 63 136 132 FP[R] fnabs[.] Floating Negative Absolute Value
X 63 40 132 FP[R] fneg[.] Floating Negate
A 63 31 139 FP[R] fnmadd[.] Floating Negative Multiply-Add
A 59 31 139 FP[R] fnmadds[.] Floating Negative Multiply-Add Single
A 63 30 139 FP[R] fnmsub[.] Floating Negative Multiply-Subtract
A 59 30 139 FP[R] fnmsubs[.] Floating Negative Multiply-Subtract Single
A 63 24 135 FP[R] fre[.] Floating Reciprocal Estimate
A 59 24 135 FP[R] fres[.] Floating Reciprocal Estimate Single
X 63 488 147 FP[R] frim[.] Floating Round to Integer Minus
X 63 392 147 FP[R] frin[.] Floating Round to Integer Nearest
X 63 456 147 FP[R] frip[.] Floating Round to Integer Plus
X 63 424 147 FP[R] friz[.] Floating Round to Integer Toward Zero
X 63 12 140 FP[R] frsp[.] Floating Round to Single-Precision
A 63 26 136 FP[R] frsqrte[.]
Floating Reciprocal Square Root Estimate
A 59 26 136 FP[R] frsqrtes[.] Floating Reciprocal Square Root Estimate Single
A 63 23 149 FP[R] fsel[.] Floating Select
A 63 22 135 FP[R] fsqrt[.] Floating Square Root
A 59 22 135 FP[R] fsqrts[.] Floating Square Root Single
A 63 20 133 FP[R] fsub[.] Floating Subtract
A 59 20 133 FP[R] fsubs[.] Floating Subtract Single
X 63 583 150 FP[R] mffs[.] Move From FPSCR
X 63 70 152 FP[R] mtfsb0[.] Move To FPSCR Bit 0
X 63 38 152 FP[R] mtfsb1[.] Move To FPSCR Bit 1
XFL 63 711 151 FP[R] mtfsf[.] Move To FPSCR Fields
X 63 134 151 FP[R] mtfsfi[.] Move To FPSCR Field Immediate
XO 4 172 593 LMA macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed
XO 4 236 593 LMA macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed
XO 4 204 594 LMA macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned
XO 4 140 594 LMA macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned
XO 4 44 595 LMA machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed
XO 4 108 595 LMA machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed
XO 4 76 596 LMA machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned
XO 4 12 596 LMA machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned
XO 4 428 597 LMA maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed
XO 4 492 597 LMA maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed
XO 4 460 598 LMA maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned
XO 4 396 598 LMA maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned
X 4 168 598 LMA mulchw[.] Multiply Cross Halfword to Word Signed
X 4 136 598 LMA mulchwu[.] Multiply Cross Halfword to Word Unsigned
X 4 40 599 LMA mulhhw[.]
Multiply High Halfword to Word Signed
X 4 8 599 LMA mulhhwu[.] Multiply High Halfword to Word Unsigned
X 4 424 599 LMA mullhw[.] Multiply Low Halfword to Word Signed
X 4 392 599 LMA mullhwu[.] Multiply Low Halfword to Word Unsigned
XO 4 174 600 LMA nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed
XO 4 238 600 LMA nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed
XO 4 46 601 LMA nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed
XO 4 110 601 LMA nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed
XO 4 430 602 LMA nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed
XO 4 494 602 LMA nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed
X 31 78 591 LMV dlmzb[.] Determine Leftmost Zero Byte
DQ 56 P 759 LSQ lq Load Quadword
DS 62 2 P 759 LSQ stq Store Quadword
X 31 597 59 MA lswi Load String Word Immediate
X 31 533 59 MA lswx Load String Word Indexed
X 31 725 60 MA stswi Store String Word Immediate
X 31 661 60 MA stswx Store String Word Indexed
XL 19 402 H 748 S doze Doze
X 31 854 703 S eieio Enforce In-order Execution of I/O
XL 19 274 H 746 S hrfid Hypervisor Return From Interrupt Doubleword
X 31 853 H 757 S lbzcix Load Byte and Zero Caching Inhibited Indexed
X 31 885 H 757 S ldcix Load Doubleword Caching Inhibited Indexed
X 31 821 H 757 S lhzcix Load Halfword and Zero Caching Inhibited Indexed
X 31 789 H 757 S lwzcix Load Word and Zero Caching Inhibited Indexed
X 31 595 32 P 810 S mfsr Move From Segment Register
X 31 659 32 P 810 S mfsrin Move From Segment Register Indirect
XFX 31 371 708 S mftb Move From Time Base
X 31 146 P 765 S mtmsr Move To Machine State Register
X 31 178 P 766 S mtmsrd Move To Machine State Register Doubleword
X 31 210 32 P 809 S mtsr Move To Segment Register
X 31 242 32 P 809 S mtsrin Move To Segment Register Indirect
XL 19 434 H 748 S nap Nap
XL 19 18 P 746 S rfid Return
From Interrupt Doubleword
XL 19 498 H 749 S rvwinkle Rip Van Winkle
X 31 979 SR P 807 S slbfee. SLB Find Entry ESID
X 31 498 P 804 S slbia SLB Invalidate All
X 31 434 P 802 S slbie SLB Invalidate Entry
X 31 915 P 806 S slbmfee SLB Move From Entry ESID
X 31 851 P 806 S slbmfev SLB Move From Entry VSID
X 31 402 P 805 S slbmte SLB Move To Entry
XL 19 466 H 748 S sleep Sleep
X 31 981 H 758 S stbcix Store Byte Caching Inhibited Indexed
X 31 1013 H 758 S stdcix Store Doubleword Caching Inhibited Indexed
X 31 949 H 758 S sthcix Store Halfword Caching Inhibited Indexed
X 31 917 H 758 S stwcix Store Word Caching Inhibited Indexed
X 31 370 H 817 S tlbia TLB Invalidate All
X 31 306 64 H 811 S tlbie TLB Invalidate Entry
X 31 274 64 P 814 S tlbiel TLB Invalidate Entry Local
EVX 4 527 510 SP brinc Bit Reversed Increment
EVX 4 520 510 SP evabs Vector Absolute Value
EVX 4 514 510 SP evaddiw Vector Add Immediate Word
EVX 4 1225 510 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word
EVX 4 1217 511 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word
EVX 4 1224 511 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1216 511 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX 4 512 511 SP evaddw Vector Add Word
EVX 4 529 512 SP evand Vector AND
EVX 4 530 512 SP evandc Vector AND with Complement
EVX 4 564 512 SP evcmpeq Vector Compare Equal
EVX 4 561 512 SP evcmpgts Vector Compare Greater Than Signed
EVX 4 560 513 SP evcmpgtu Vector Compare Greater Than Unsigned
EVX 4 563 513 SP evcmplts Vector Compare Less Than Signed
EVX 4 562 513 SP evcmpltu Vector Compare Less Than Unsigned
EVX 4 526 514 SP evcntlsw Vector Count Leading Signed Bits Word
EVX 4 525 514 SP evcntlzw Vector Count Leading Zeros Word
EVX 4 1222 514 SP evdivws Vector Divide Word Signed
EVX 4 1223 515 SP evdivwu Vector
Divide Word Unsigned
EVX 4 537 515 SP eveqv Vector Equivalent
EVX 4 522 515 SP evextsb Vector Extend Sign Byte
EVX 4 523 515 SP evextsh Vector Extend Sign Halfword
EVX 4 769 516 SP evldd Vector Load Double Word into Double Word
EVX 4 768 516 SP evlddx Vector Load Double Word into Double Word Indexed
EVX 4 773 516 SP evldh Vector Load Double into Four Halfwords
EVX 4 772 516 SP evldhx Vector Load Double into Four Halfwords Indexed
EVX 4 771 517 SP evldw Vector Load Double into Two Words
EVX 4 770 517 SP evldwx Vector Load Double into Two Words Indexed
EVX 4 777 517 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat
EVX 4 776 517 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed
EVX 4 783 518 SP evlhhossplat Vector Load Halfword into Halfword Odd Signed and Splat
EVX 4 782 518 SP evlhhossplatx Vector Load Halfword into Halfword Odd Signed and Splat Indexed
EVX 4 781 518 SP evlhhousplat Vector Load Halfword into Halfword Odd Unsigned and Splat
EVX 4 780 518 SP evlhhousplatx Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed
EVX 4 785 519 SP evlwhe Vector Load Word into Two Halfwords Even
EVX 4 784 519 SP evlwhex Vector Load Word into Two Halfwords Even Indexed
EVX 4 791 519 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX 4 790 519 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX 4 789 520 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX 4 788 520 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX 4 797 520 SP evlwhsplat Vector Load Word into Two Halfwords and Splat
EVX 4 796 520 SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed
EVX 4 793 521 SP evlwwsplat Vector Load Word into Word and Splat
EVX 4 792 521 SP evlwwsplatx Vector Load Word into Word and Splat Indexed
EVX 4 556 521 SP evmergehi Vector Merge High
EVX 4 558 522 SP evmergehilo Vector Merge High/Low
EVX 4 557 521 SP evmergelo Vector Merge Low
EVX 4 559 522 SP evmergelohi Vector Merge Low/High
EVX 4 1323 522 SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1451 522 SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1321 523 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1449 523 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1320 523 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1448 523 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1035 524 SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX 4 1067 524 SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX 4 1291 524 SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1419 524 SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1033 525 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer
EVX 4 1065 525 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX 4 1289 525 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1417
525 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1027 526 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
EVX 4 1059 526 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX 4 1283 527 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1411 527 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1281 528 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1409 528 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1032 529 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX 4 1064 529 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX 4 1288 529 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1416 529 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1280 530 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1408 530 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1327 531 SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1455 531 SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1325 531 SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1453 531 SP evmhogsmian Vector Multiply Halfwords, Odd,
Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1324 532 SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1452 532 SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1039 532 SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX 4 1071 532 SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
EVX 4 1295 533 SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1423 533 SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1037 533 SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX 4 1069 533 SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX 4 1293 534 SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1421 533 SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1031 535 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional
EVX 4 1063 535 SP evmhossfa Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator
EVX 4 1287 536 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1415 536 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1285 537 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1413 537 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1036 537 SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
EVX 4 1068 537 SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo,
Integer to Accumulator
EVX 4 1292 538 SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1420 534 SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1284 538 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1412 538 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1220 539 SP evmra Initialize Accumulator
EVX 4 1103 539 SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional
EVX 4 1135 539 SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX 4 1101 539 SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer
EVX 4 1133 539 SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator
EVX 4 1095 540 SP evmwhssf Vector Multiply Word High Signed, Saturate, Fractional
EVX 4 1127 540 SP evmwhssfa Vector Multiply Word High Signed, Saturate, Fractional to Accumulator
EVX 4 1100 540 SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer
EVX 4 1132 540 SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX 4 1353 541 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX 4 1481 541 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words
EVX 4 1345 541 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
EVX 4 1473 541 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words
EVX 4 1096 542 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer
EVX 4 1128 542 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to
Accumulator
EVX 4 1352 542 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1480 542 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX 4 1344 543 SP evmwlusiaaw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1472 543 SP evmwlusianw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words
EVX 4 1115 543 SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional
EVX 4 1147 543 SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator
EVX 4 1371 544 SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX 4 1499 544 SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1113 544 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer
EVX 4 1145 544 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX 4 1369 544 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX 4 1497 544 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX 4 1107 545 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional
EVX 4 1139 545 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX 4 1363 545 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX 4 1491 546 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX 4 1112 546 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer
EVX 4 1144 546 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX 4 1368 547 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX 4 1496 547 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 542 547 SP evnand Vector NAND
EVX 4 521 547 SP evneg Vector Negate
EVX 4 536 547 SP evnor Vector NOR
EVX 4 535 548 SP
evor Vector OR
EVX 4 539 548 SP evorc Vector OR with Complement
EVX 4 552 548 SP evrlw Vector Rotate Left Word
EVX 4 554 549 SP evrlwi Vector Rotate Left Word Immediate
EVX 4 524 549 SP evrndw Vector Round Word
EVS 4 79 549 SP evsel Vector Select
EVX 4 548 550 SP evslw Vector Shift Left Word
EVX 4 550 550 SP evslwi Vector Shift Left Word Immediate
EVX 4 555 550 SP evsplatfi Vector Splat Fractional Immediate
EVX 4 553 550 SP evsplati Vector Splat Immediate
EVX 4 547 550 SP evsrwis Vector Shift Right Word Immediate Signed
EVX 4 546 550 SP evsrwiu Vector Shift Right Word Immediate Unsigned
EVX 4 545 551 SP evsrws Vector Shift Right Word Signed
EVX 4 544 551 SP evsrwu Vector Shift Right Word Unsigned
EVX 4 801 551 SP evstdd Vector Store Double of Double
EVX 4 800 551 SP evstddx Vector Store Double of Double Indexed
EVX 4 805 552 SP evstdh Vector Store Double of Four Halfwords
EVX 4 804 552 SP evstdhx Vector Store Double of Four Halfwords Indexed
EVX 4 803 552 SP evstdw Vector Store Double of Two Words
EVX 4 802 552 SP evstdwx Vector Store Double of Two Words Indexed
EVX 4 817 553 SP evstwhe Vector Store Word of Two Halfwords from Even
EVX 4 816 553 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed
EVX 4 821 553 SP evstwho Vector Store Word of Two Halfwords from Odd
EVX 4 820 553 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed
EVX 4 825 553 SP evstwwe Vector Store Word of Word from Even
EVX 4 824 553 SP evstwwex Vector Store Word of Word from Even Indexed
EVX 4 829 554 SP evstwwo Vector Store Word of Word from Odd
EVX 4 828 554 SP evstwwox Vector Store Word of Word from Odd Indexed
EVX 4 1227 554 SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX 4 1219 554 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX 4 1226 555 SP evsubfumiaaw Vector Subtract Unsigned,
Modulo, Integer to Accumulator Word
EVX 4 1218 555 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX 4 516 555 SP evsubfw Vector Subtract from Word
EVX 4 518 555 SP evsubifw Vector Subtract Immediate from Word
EVX 4 534 555 SP evxor Vector XOR
EVX 4 740 577 SP.FD efdabs Floating-Point Double-Precision Absolute Value
EVX 4 736 578 SP.FD efdadd Floating-Point Double-Precision Add
EVX 4 751 584 SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Precision
EVX 4 755 582 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Fraction
EVX 4 753 581 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Integer
EVX 4 739 582 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX 4 754 582 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction
EVX 4 752 581 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer
EVX 4 738 582 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword
EVX 4 750 579 SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal
EVX 4 748 579 SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than
EVX 4 749 579 SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than
EVX 4 759 584 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction
EVX 4 757 582 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer
EVX 4 747 583 SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero
EVX 4 762 584 SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero
EVX 4 758 584 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Fraction
EVX 4 756 582 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Integer
EVX 4 746 583 SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero
EVX 4 760 584 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero
EVX 4 745 578 SP.FD efddiv Floating-Point Double-Precision Divide
EVX 4 744 578 SP.FD efdmul Floating-Point Double-Precision Multiply
EVX 4 741 577 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value
EVX 4 742 577 SP.FD efdneg Floating-Point Double-Precision Negate
EVX 4 737 578 SP.FD efdsub Floating-Point Double-Precision Subtract
EVX 4 766 580 SP.FD efdtsteq Floating-Point Double-Precision Test Equal
EVX 4 764 579 SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than
EVX 4 765 580 SP.FD efdtstlt Floating-Point Double-Precision Test Less Than
EVX 4 719 585 SP.FD efscfd Floating-Point Single-Precision Convert from Double-Precision
EVX 4 708 570 SP.FS efsabs Floating-Point Single-Precision Absolute Value
EVX 4 704 571 SP.FS efsadd Floating-Point Single-Precision Add
EVX 4 723 575 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 721 575 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer
EVX 4 722 575 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 720 575 SP.FS efscfui Convert Floating-Point
Single-Precision from Unsigned Integer
EVX 4 718 573 SP.FS efscmpeq Floating-Point Single-Precision Compare Equal
EVX 4 716 572 SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than
EVX 4 717 572 SP.FS efscmplt Floating-Point Single-Precision Compare Less Than
EVX 4 727 576 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 725 575 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer
EVX 4 730 576 SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 726 576 SP.FS efsctuf Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 724 575 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 728 576 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 713 571 SP.FS efsdiv Floating-Point Single-Precision Divide
EVX 4 712 571 SP.FS efsmul Floating-Point Single-Precision Multiply
EVX 4 709 570 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value
EVX 4 710 570 SP.FS efsneg Floating-Point Single-Precision Negate
EVX 4 705 571 SP.FS efssub Floating-Point Single-Precision Subtract
EVX 4 734 574 SP.FS efststeq Floating-Point Single-Precision Test Equal
EVX 4 732 573 SP.FS efststgt Floating-Point Single-Precision Test Greater Than
EVX 4 733 574 SP.FS efststlt Floating-Point Single-Precision Test Less Than
EVX 4 644 562 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value
EVX 4 640 563 SP.FV evfsadd Vector Floating-Point Single-Precision Add
EVX 4 659 567 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 657 567 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer
EVX 4 658 567 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 656 567 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 654 565 SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal
EVX 4 652 564 SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than
EVX 4 653 564 SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than
EVX 4 663 569 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 661 568 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer
EVX 4 666 568 SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 662 569 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 660 568 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 664 568 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 649 563 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide
EVX 4 648 563 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply
EVX 4 645 562 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value
EVX 4 646 562 SP.FV evfsneg Vector Floating-Point Single-Precision Negate
EVX 4 641 563 SP.FV evfssub Vector Floating-Point Single-Precision Subtract
EVX 4 670 566 SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal
EVX 4 668 565 SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than
EVX 4 669 566 SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than
X 31 7 216 V lvebx Load Vector Element Byte Indexed
X 31 39 213 V lvehx Load Vector Element Halfword Indexed
X 31 71 213 V lvewx Load Vector Element Word Indexed
X 31 6 218 V lvsl Load Vector for Shift Left Indexed
X 31 38 218 V lvsr Load Vector for Shift Right Indexed
X 31 103 214 V lvx Load Vector Indexed
X 31 359 214 V lvxl Load Vector Indexed LRU
VX 4 1540 269 V mfvscr Move
From Vector Status and Control Register
VX 4 1604 269 V mtvscr Move To Vector Status and Control Register
X 31 135 216 V stvebx Store Vector Element Byte Indexed
X 31 167 216 V stvehx Store Vector Element Halfword Indexed
X 31 199 217 V stvewx Store Vector Element Word Indexed
X 31 231 214 V stvx Store Vector Indexed
X 31 487 217 V stvxl Store Vector Indexed LRU
VX 4 384 230 V vaddcuw Vector Add and Write Carry-Out Unsigned Word
VX 4 10 259 V vaddfp Vector Add Single-Precision
VX 4 768 230 V vaddsbs Vector Add Signed Byte Saturate
VX 4 832 230 V vaddshs Vector Add Signed Halfword Saturate
VX 4 896 230 V vaddsws Vector Add Signed Word Saturate
VX 4 0 231 V vaddubm Vector Add Unsigned Byte Modulo
VX 4 512 232 V vaddubs Vector Add Unsigned Byte Saturate
VX 4 64 231 V vadduhm Vector Add Unsigned Halfword Modulo
VX 4 576 232 V vadduhs Vector Add Unsigned Halfword Saturate
VX 4 128 231 V vadduwm Vector Add Unsigned Word Modulo
VX 4 640 232 V vadduws Vector Add Unsigned Word Saturate
VX 4 1028 254 V vand Vector Logical AND
VX 4 1092 254 V vandc Vector Logical AND with Complement
VX 4 1282 245 V vavgsb Vector Average Signed Byte
VX 4 1346 245 V vavgsh Vector Average Signed Halfword
VX 4 1410 245 V vavgsw Vector Average Signed Word
VX 4 1026 246 V vavgub Vector Average Unsigned Byte
VX 4 1090 246 V vavguh Vector Average Unsigned Halfword
VX 4 1154 246 V vavguw Vector Average Unsigned Word
VX 4 842 263 V vcfsx Vector Convert From Signed Fixed-Point Word
VX 4 778 263 V vcfux Vector Convert From Unsigned Fixed-Point Word
VC 4 966 265 V vcmpbfp[.] Vector Compare Bounds Single-Precision
VC 4 198 265 V vcmpeqfp[.] Vector Compare Equal To Single-Precision
VC 4 6 251 V vcmpequb[.] Vector Compare Equal To Unsigned Byte
VC 4 70 251 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword
VC 4 134 252 V vcmpequw[.] Vector Compare Equal To Unsigned Word
VC 4 454 266 V vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Precision
VC 4 710 266 V vcmpgtfp[.] Vector Compare Greater Than Single-Precision
VC 4 774 252 V vcmpgtsb[.] Vector Compare Greater Than Signed Byte
VC 4 838 252 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword
VC 4 902 252 V vcmpgtsw[.] Vector Compare Greater Than Signed Word
VC 4 518 253 V vcmpgtub[.] Vector Compare Greater Than Unsigned Byte
VC 4 582 253 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword
VC 4 646 253 V vcmpgtuw[.] Vector Compare Greater Than Unsigned Word
VX 4 970 262 V vctsxs Vector Convert To Signed Fixed-Point Word Saturate
VX 4 906 262 V vctuxs Vector Convert To Unsigned Fixed-Point Word Saturate
VX 4 394 267 V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point
VX 4 458 267 V vlogefp Vector Log Base 2 Estimate Floating-Point
VA 4 46 260 V vmaddfp Vector Multiply-Add Single-Precision
VX 4 1034 261 V vmaxfp Vector Maximum Single-Precision
VX 4 258 247 V vmaxsb Vector Maximum Signed Byte
VX 4 322 247 V vmaxsh Vector Maximum Signed Halfword
VX 4 386 247 V vmaxsw Vector Maximum Signed Word
VX 4 2 248 V vmaxub Vector Maximum Unsigned Byte
VX 4 66 248 V vmaxuh Vector Maximum Unsigned Halfword
VX 4 130 248 V vmaxuw Vector Maximum Unsigned Word
VA 4 32 238 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate
VA 4 33 238 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate
VX 4 1098 261 V vminfp Vector Minimum Single-Precision
VX 4 770 249 V vminsb Vector Minimum Signed Byte
VX 4 834 249 V vminsh Vector Minimum Signed Halfword
VX 4 898 249 V vminsw Vector Minimum Signed Word
VX 4 514 250 V vminub Vector Minimum Unsigned Byte
VX 4 578 250 V vminuh Vector Minimum Unsigned Halfword
VX 4 642 250 V vminuw Vector Minimum Unsigned Word
VA 4 34 239 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo
VX 4 12 224 V vmrghb Vector Merge High Byte
VX 4 76 224 V vmrghh Vector Merge
High Halfword
VX 4 140 224 V vmrghw Vector Merge High Word
VX 4 268 225 V vmrglb Vector Merge Low Byte
VX 4 332 225 V vmrglh Vector Merge Low Halfword
VX 4 396 225 V vmrglw Vector Merge Low Word
VA 4 37 240 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo
VA 4 40 240 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo
VA 4 41 241 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate
VA 4 36 239 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo
VA 4 38 241 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo
VA 4 39 242 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate
VX 4 776 236 V vmulesb Vector Multiply Even Signed Byte
VX 4 840 236 V vmulesh Vector Multiply Even Signed Halfword
VX 4 520 236 V vmuleub Vector Multiply Even Unsigned Byte
VX 4 584 236 V vmuleuh Vector Multiply Even Unsigned Halfword
VX 4 264 237 V vmulosb Vector Multiply Odd Signed Byte
VX 4 328 237 V vmulosh Vector Multiply Odd Signed Halfword
VX 4 8 237 V vmuloub Vector Multiply Odd Unsigned Byte
VX 4 72 237 V vmulouh Vector Multiply Odd Unsigned Halfword
VA 4 47 260 V vnmsubfp Vector Negative Multiply-Subtract Single-Precision
VX 4 1284 254 V vnor Vector Logical NOR
VX 4 1156 254 V vor Vector Logical OR
VA 4 43 227 V vperm Vector Permute
VX 4 782 219 V vpkpx Vector Pack Pixel
VX 4 398 220 V vpkshss Vector Pack Signed Halfword Signed Saturate
VX 4 270 220 V vpkshus Vector Pack Signed Halfword Unsigned Saturate
VX 4 462 220 V vpkswss Vector Pack Signed Word Signed Saturate
VX 4 334 220 V vpkswus Vector Pack Signed Word Unsigned Saturate
VX 4 14 221 V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo
VX 4 142 221 V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate
VX 4 78 221 V vpkuwum Vector Pack Unsigned Word Unsigned Modulo
VX 4 206 221 V vpkuwus Vector Pack Unsigned Word Unsigned Saturate
VX 4 266 268 V vrefp Vector Reciprocal Estimate Single-Precision
VX 4 714 264 V vrfim Vector Round to Single-Precision Integer toward -Infinity
VX 4 522 264 V vrfin Vector Round to Single-Precision Integer Nearest
VX 4 650 264 V vrfip Vector Round to Single-Precision Integer toward +Infinity
VX 4 586 264 V vrfiz Vector Round to Single-Precision Integer toward Zero
VX 4 4 255 V vrlb Vector Rotate Left Byte
VX 4 68 255 V vrlh Vector Rotate Left Halfword
VX 4 132 255 V vrlw Vector Rotate Left Word
VX 4 330 268 V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision
VA 4 42 227 V vsel Vector Select
VX 4 452 228 V vsl Vector Shift Left
VX 4 260 256 V vslb Vector Shift Left Byte
VA 4 44 228 V vsldoi Vector Shift Left Double by Octet Immediate
VX 4 324 256 V vslh Vector Shift Left Halfword
VX 4 1036 228 V vslo Vector Shift Left by Octet
VX 4 388 256 V vslw Vector Shift Left Word
VX 4 524 226 V vspltb Vector Splat Byte
VX 4 588 226 V vsplth Vector Splat Halfword
VX 4 780 226 V vspltisb Vector Splat Immediate Signed Byte
VX 4 844 226 V vspltish Vector Splat Immediate Signed Halfword
VX 4 908 226 V vspltisw Vector Splat Immediate Signed Word
VX 4 652 226 V vspltw Vector Splat Word
VX 4 708 229 V vsr Vector Shift Right
VX 4 772 258 V vsrab Vector Shift Right Algebraic Byte
VX 4 836 258 V vsrah Vector Shift Right Algebraic Halfword
VX 4 900 258 V vsraw Vector Shift Right Algebraic Word
VX 4 516 257 V vsrb Vector Shift Right Byte
VX 4 580 257 V vsrh Vector Shift Right Halfword
VX 4 1100 229 V vsro Vector Shift Right by Octet
VX 4 644 257 V vsrw Vector Shift Right Word
VX 4 1408 233 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word
VX 4 74 259 V vsubfp Vector Subtract Single-Precision
VX 4 1792 233 V vsubsbs Vector Subtract Signed Byte Saturate
VX 4 1856 233 V vsubshs Vector Subtract Signed Halfword Saturate
VX 4 1920 233 V vsubsws Vector Subtract Signed Word Saturate
VX 4 1024 234 V vsububm Vector Subtract Unsigned Byte Modulo
VX 4 1536 235 V vsububs Vector Subtract Unsigned Byte Saturate
VX 4 1088 234 V vsubuhm Vector Subtract Unsigned Halfword Modulo
VX 4 1600 234 V vsubuhs Vector Subtract Unsigned Halfword Saturate
VX 4 1152 234 V vsubuwm Vector Subtract Unsigned Word Modulo
VX 4 1664 235 V vsubuws Vector Subtract Unsigned Word Saturate
VX 4 1672 243 V vsum2sws Vector Sum across Half Signed Word Saturate
VX 4 1800 244 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate
VX 4 1608 244 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate
VX 4 1544 244 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate
VX 4 1928 243 V vsumsws Vector Sum across Signed Word Saturate
VX 4 846 222 V vupkhpx Vector Unpack High Pixel
VX 4 526 222 V vupkhsb Vector Unpack High Signed Byte
VX 4 590 222 V vupkhsh Vector Unpack High Signed Halfword
VX 4 974 223 V vupklpx Vector Unpack Low Pixel
VX 4 654 223 V vupklsb Vector Unpack Low Signed Byte
VX 4 718 223 V vupklsh Vector Unpack Low Signed Halfword
VX 4 1220 254 V vxor Vector Logical XOR
XX1 31 620 338 VSX lxsdux Load VSR Scalar Doubleword with Update Indexed
XX1 31 588 338 VSX lxsdx Load VSR Scalar Doubleword Indexed
XX1 31 876 338 VSX lxvd2ux Load VSR Vector Doubleword*2 with Update Indexed
XX1 31 844 338 VSX lxvd2x Load VSR Vector Doubleword*2 Indexed
XX1 31 332 339 VSX lxvdsx Load VSR Vector Doubleword & Splat Indexed
XX1 31 812 339 VSX lxvw4ux Load VSR Vector Word*4 with Update Indexed
XX1 31 780 339 VSX lxvw4x Load VSR Vector Word*4 Indexed
XX1 31 748 340 VSX stxsdux Store VSR Scalar Doubleword with Update Indexed
XX1 31 716 340 VSX stxsdx Store VSR Scalar Doubleword Indexed
XX1 31 1004 340 VSX stxvd2ux Store VSR Vector Doubleword*2 with Update Indexed
XX1 31 972 340 VSX stxvd2x Store VSR Vector Doubleword*2 Indexed
XX1 31 940 341 VSX stxvw4ux Store VSR Vector Word*4 with Update Indexed
XX1 31 908 341 VSX stxvw4x Store VSR Vector Word*4 Indexed
XX2 60 690 341 VSX
xsabsdp VSX Scalar Absolute Value Double-Precision
XX3 60 128 342 VSX xsadddp VSX Scalar Add Double-Precision
XX3 60 172 347 VSX xscmpodp VSX Scalar Compare Ordered Double-Precision
XX3 60 140 349 VSX xscmpudp VSX Scalar Compare Unordered Double-Precision
XX3 60 704 351 VSX xscpsgndp VSX Scalar Copy Sign Double-Precision
XX2 60 530 352 VSX xscvdpsp VSX Scalar Convert Double-Precision to Single-Precision
XX2 60 688 353 VSX xscvdpsxds VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate
XX2 60 176 355 VSX xscvdpsxws VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate
XX2 60 656 357 VSX xscvdpuxds VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate
XX2 60 144 359 VSX xscvdpuxws VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate
XX2 60 658 361 VSX xscvspdp VSX Scalar Convert Single-Precision to Double-Precision format
XX2 60 752 361 VSX xscvsxddp VSX Scalar Convert and round Signed Fixed-Point Doubleword to Double-Precision format
XX2 60 720 362 VSX xscvuxddp VSX Scalar Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format
XX3 60 224 363 VSX xsdivdp VSX Scalar Divide Double-Precision
XX3 60 132 365 VSX xsmaddadp VSX Scalar Multiply-Add Type-A Double-Precision
XX3 60 164 365 VSX xsmaddmdp VSX Scalar Multiply-Add Type-M Double-Precision
XX3 60 640 368 VSX xsmaxdp VSX Scalar Maximum Double-Precision
XX3 60 672 370 VSX xsmindp VSX Scalar Minimum Double-Precision
XX3 60 196 372 VSX xsmsubadp VSX Scalar Multiply-Subtract Type-A Double-Precision
XX3 60 228 372 VSX xsmsubmdp VSX Scalar Multiply-Subtract Type-M Double-Precision
XX3 60 192 375 VSX xsmuldp VSX Scalar Multiply Double-Precision
XX2 60 722 377 VSX xsnabsdp VSX Scalar Negative Absolute Value Double-Precision
XX2 60 754 377 VSX xsnegdp VSX Scalar Negate Double-Precision
XX3 60 644 378 VSX xsnmaddadp VSX Scalar Negative Multiply-Add Type-A Double-Precision
XX3 60 676 378 VSX xsnmaddmdp VSX Scalar Negative Multiply-Add Type-M Double-Precision
XX3 60 708 383 VSX xsnmsubadp VSX Scalar Negative Multiply-Subtract Type-A Double-Precision
XX3 60 740 383 VSX xsnmsubmdp VSX Scalar Negative Multiply-Subtract Type-M Double-Precision
XX2 60 146 386 VSX xsrdpi VSX Scalar Round to Double-Precision Integer
XX2 60 214 387 VSX xsrdpic VSX Scalar Round to Double-Precision Integer using Current rounding mode
XX2 60 242 388 VSX xsrdpim VSX Scalar Round to Double-Precision Integer toward -Infinity
XX2 60 210 388 VSX xsrdpip VSX Scalar Round to Double-Precision Integer toward +Infinity
XX2 60 178 389 VSX xsrdpiz VSX Scalar Round to Double-Precision Integer toward Zero
XX2 60 180 390 VSX xsredp VSX Scalar Reciprocal Estimate Double-Precision
XX2 60 148 391 VSX xsrsqrtedp VSX Scalar Reciprocal Square Root Estimate Double-Precision
XX2 60 150 392 VSX xssqrtdp VSX Scalar Square Root Double-Precision
XX3 60 160 393 VSX xssubdp VSX Scalar Subtract Double-Precision
XX3 60 244 395 VSX xstdivdp VSX Scalar Test for software Divide Double-Precision
XX2 60 212 396 VSX xstsqrtdp VSX Scalar Test for software Square Root Double-Precision
XX2 60 946 397 VSX xvabsdp VSX Vector Absolute Value Double-Precision
XX2 60 818 397 VSX xvabssp VSX Vector Absolute Value Single-Precision
XX3 60 384 398 VSX xvadddp VSX Vector Add Double-Precision
XX3 60 256 402 VSX xvaddsp VSX Vector Add Single-Precision
XX3 60 396 404 VSX xvcmpeqdp VSX Vector Compare Equal To Double-Precision
XX3 60 908 404 VSX xvcmpeqdp. VSX Vector Compare Equal To Double-Precision & Record
XX3 60 268 405 VSX xvcmpeqsp VSX Vector Compare Equal To Single-Precision
XX3 60 780 405 VSX xvcmpeqsp. VSX Vector Compare Equal To Single-Precision &
Record
XX3 60 460 406 VSX xvcmpgedp VSX Vector Compare Greater Than or Equal To Double-Precision
XX3 60 972 406 VSX xvcmpgedp. VSX Vector Compare Greater Than or Equal To Double-Precision & Record
XX3 60 332 407 VSX xvcmpgesp VSX Vector Compare Greater Than or Equal To Single-Precision
XX3 60 844 407 VSX xvcmpgesp. VSX Vector Compare Greater Than or Equal To Single-Precision & Record
XX3 60 428 408 VSX xvcmpgtdp VSX Vector Compare Greater Than Double-Precision
XX3 60 940 408 VSX xvcmpgtdp. VSX Vector Compare Greater Than Double-Precision & Record
XX3 60 300 409 VSX xvcmpgtsp VSX Vector Compare Greater Than Single-Precision
XX3 60 812 409 VSX xvcmpgtsp. VSX Vector Compare Greater Than Single-Precision & Record
XX3 60 960 410 VSX xvcpsgndp VSX Vector Copy Sign Double-Precision
XX3 60 832 410 VSX xvcpsgnsp VSX Vector Copy Sign Single-Precision
XX2 60 786 411 VSX xvcvdpsp VSX Vector round and Convert Double-Precision to Single-Precision format
XX2 60 944 412 VSX xvcvdpsxds VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword Saturate
XX2 60 432 414 VSX xvcvdpsxws VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Word Saturate
XX2 60 912 416 VSX xvcvdpuxds VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate
XX2 60 400 418 VSX xvcvdpuxws VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate
XX2 60 914 420 VSX xvcvspdp VSX Vector Convert Single-Precision to Double-Precision
XX2 60 816 421 VSX xvcvspsxds VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate
XX2 60 304 423 VSX xvcvspsxws VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate
XX2 60 784 425 VSX xvcvspuxds VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate
XX2 60 272 427 VSX xvcvspuxws VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Word Saturate
XX2 60 1008 429 VSX xvcvsxddp VSX Vector Convert and round Signed Fixed-Point Doubleword to Double-Precision format
XX2 60 880 429 VSX xvcvsxdsp VSX Vector Convert and round Signed Fixed-Point Doubleword to Single-Precision format
XX2 60 496 430 VSX xvcvsxwdp VSX Vector Convert Signed Fixed-Point Word to Double-Precision format
XX2 60 368 430 VSX xvcvsxwsp VSX Vector Convert and round Signed Fixed-Point Word to Single-Precision format
XX2 60 976 431 VSX xvcvuxddp VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format
XX2 60 848 431 VSX xvcvuxdsp VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Single-Precision format
XX2 60 464 432 VSX xvcvuxwdp VSX Vector Convert Unsigned Fixed-Point Word to Double-Precision format
XX2 60 336 432 VSX xvcvuxwsp VSX Vector Convert and round Unsigned Fixed-Point Word to Single-Precision format
XX3 60 480 433 VSX xvdivdp VSX Vector Divide
Double-Precision
XX3 60 352 435 VSX xvdivsp VSX Vector Divide Single-Precision
XX3 60 388 437 VSX xvmaddadp VSX Vector Multiply-Add Type-A Double-Precision
XX3 60 260 437 VSX xvmaddasp VSX Vector Multiply-Add Type-A Single-Precision
XX3 60 420 440 VSX xvmaddmdp VSX Vector Multiply-Add Type-M Double-Precision
XX3 60 292 440 VSX xvmaddmsp VSX Vector Multiply-Add Type-M Single-Precision
XX3 60 896 443 VSX xvmaxdp VSX Vector Maximum Double-Precision
XX3 60 768 445 VSX xvmaxsp VSX Vector Maximum Single-Precision
XX3 60 928 447 VSX xvmindp VSX Vector Minimum Double-Precision
XX3 60 800 449 VSX xvminsp VSX Vector Minimum Single-Precision
XX3 60 452 451 VSX xvmsubadp VSX Vector Multiply-Subtract Type-A Double-Precision
XX3 60 324 451 VSX xvmsubasp VSX Vector Multiply-Subtract Type-A Single-Precision
XX3 60 484 454 VSX xvmsubmdp VSX Vector Multiply-Subtract Type-M Double-Precision
XX3 60 356 454 VSX xvmsubmsp VSX Vector Multiply-Subtract Type-M Single-Precision
XX3 60 448 457 VSX xvmuldp VSX Vector Multiply Double-Precision
XX3 60 320 459 VSX xvmulsp VSX Vector Multiply Single-Precision
XX2 60 978 461 VSX xvnabsdp VSX Vector Negative Absolute Value Double-Precision
XX2 60 850 461 VSX xvnabssp VSX Vector Negative Absolute Value Single-Precision
XX2 60 1010 462 VSX xvnegdp VSX Vector Negate Double-Precision
XX2 60 882 462 VSX xvnegsp VSX Vector Negate Single-Precision
XX3 60 900 463 VSX xvnmaddadp VSX Vector Negative Multiply-Add Type-A Double-Precision
XX3 60 772 463 VSX xvnmaddasp VSX Vector Negative Multiply-Add Type-A Single-Precision
XX3 60 932 468 VSX xvnmaddmdp VSX Vector Negative Multiply-Add Type-M Double-Precision
XX3 60 804 468 VSX xvnmaddmsp VSX Vector Negative Multiply-Add Type-M Single-Precision
XX3 60 964 471 VSX xvnmsubadp VSX Vector Negative Multiply-Subtract Type-A Double-Precision
XX3 60 836 471 VSX xvnmsubasp VSX Vector Negative Multiply-Subtract Type-A Single-Precision
XX3 60 996 474 VSX xvnmsubmdp VSX Vector Negative Multiply-Subtract Type-M Double-Precision
XX3 60 868 474 VSX xvnmsubmsp VSX Vector Negative Multiply-Subtract Type-M Single-Precision
XX2 60 402 477 VSX xvrdpi VSX Vector Round to Double-Precision Integer
XX2 60 470 478 VSX xvrdpic VSX Vector Round to Double-Precision Integer using Current rounding mode
XX2 60 498 478 VSX xvrdpim VSX Vector Round to Double-Precision Integer toward -Infinity
XX2 60 466 479 VSX xvrdpip VSX Vector Round to Double-Precision Integer toward +Infinity
XX2 60 434 479 VSX xvrdpiz VSX Vector Round to Double-Precision Integer toward Zero
XX2 60 436 480 VSX xvredp VSX Vector Reciprocal Estimate Double-Precision
XX2 60 308 481 VSX xvresp VSX Vector Reciprocal Estimate Single-Precision
XX2 60 274 482 VSX xvrspi VSX Vector Round to Single-Precision Integer
XX2 60 342 482 VSX xvrspic VSX Vector Round to Single-Precision Integer using Current rounding mode
XX2 60 370 483 VSX xvrspim VSX Vector Round to Single-Precision Integer toward -Infinity
XX2 60 338 483 VSX xvrspip VSX Vector Round to Single-Precision Integer toward +Infinity
XX2 60 306 484 VSX xvrspiz VSX Vector Round to Single-Precision Integer toward Zero
XX2 60 404 485 VSX xvrsqrtedp VSX Vector Reciprocal Square Root Estimate Double-Precision
XX2 60 276 486 VSX xvrsqrtesp VSX Vector Reciprocal Square Root Estimate Single-Precision
XX2 60 406 487 VSX xvsqrtdp VSX Vector Square Root Double-Precision
XX2 60 278 488 VSX xvsqrtsp VSX Vector Square Root Single-Precision
XX3 60 416 489 VSX xvsubdp VSX Vector Subtract Double-Precision
XX3 60 288 491 VSX xvsubsp VSX Vector Subtract Single-Precision
XX3 60 500 493 VSX xvtdivdp VSX Vector Test for software Divide Double-Precision
XX3 60 372 494 VSX xvtdivsp VSX Vector Test for software Divide Single-Precision
XX2 60 468 495 VSX xvtsqrtdp VSX Vector Test for software Square Root Double-Precision
XX2 60 340 495 VSX xvtsqrtsp VSX Vector Test for software Square Root Single-Precision
XX3 60 520 496 VSX xxland VSX Logical AND
XX3 60 552 496 VSX xxlandc VSX Logical AND with Complement
XX3 60 648 497 VSX xxlnor VSX Logical NOR
XX3 60 584 497 VSX xxlor VSX Logical OR
XX3 60 616 498 VSX xxlxor VSX Logical XOR
XX3 60 72 499 VSX xxmrghw VSX Merge High Word
XX3 60 200 499 VSX xxmrglw VSX Merge Low Word
XX3 60 40 500 VSX xxpermdi VSX Permute Doubleword Immediate
XX4 60 24 500 VSX xxsel VSX Select
XX3 60 8 501 VSX xxsldwi VSX Shift Left Double by Word Immediate
XX3 60 328 501 VSX xxspltw VSX Splat Word
X 31 62 704 WT wait Wait

1 See the key to the mode dependency and privilege columns on page 1324 and the key to the category column in Section 1.3.5 of Book I.

Appendix H. Power ISA Instruction Set Sorted by Opcode

This appendix lists all the instructions in the Power ISA, in order by opcode.

Form  Opcode (Pri Ext)  Mode Dep.1  Priv1  Page  Cat1  Mnemonic  Instruction
D 2 77 64 tdi Trap Doubleword Immediate
D 3 76 B twi Trap Word Immediate
VX 4 0 231 V vaddubm Vector Add Unsigned Byte Modulo
VX 4 2 248 V vmaxub Vector Maximum Unsigned Byte
VX 4 4 255 V vrlb Vector Rotate Left Byte
VC 4 6 251 V vcmpequb[.] Vector Compare Equal To Unsigned Byte
X 4 8 599 LMA mulhhwu[.] Multiply High Halfword to Word Unsigned
VX 4 8 237 V vmuloub Vector Multiply Odd Unsigned Byte
VX 4 10 259 V vaddfp Vector Add Single-Precision
XO 4 12 596 LMA machhwu[o][.]
Multiply Accumulate High Halfword to Word Modulo Unsigned
VX 4 12 224 V vmrghb Vector Merge High Byte
VX 4 14 221 V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo
VA 4 32 238 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate
VA 4 33 238 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate
VA 4 34 239 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo
VA 4 36 239 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo
VA 4 37 240 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo
VA 4 38 241 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo
VA 4 39 242 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate
X 4 40 599 LMA mulhhw[.] Multiply High Halfword to Word Signed
VA 4 40 240 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo
VA 4 41 241 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate
VA 4 42 227 V vsel Vector Select
VA 4 43 227 V vperm Vector Permute
XO 4 44 595 LMA machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed
VA 4 44 228 V vsldoi Vector Shift Left Double by Octet Immediate
XO 4 46 601 LMA nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed
VA 4 46 260 V vmaddfp Vector Multiply-Add Single-Precision
VA 4 47 260 V vnmsubfp Vector Negative Multiply-Subtract Single-Precision
VX 4 64 231 V vadduhm Vector Add Unsigned Halfword Modulo
VX 4 66 248 V vmaxuh Vector Maximum Unsigned Halfword
VX 4 68 255 V vrlh Vector Rotate Left Halfword
VC 4 70 251 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword
VX 4 72 237 V vmulouh Vector Multiply Odd Unsigned Halfword
VX 4 74 259 V vsubfp Vector Subtract Single-Precision
XO 4 76 596 LMA machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned
VX 4 76 224 V vmrghh Vector Merge High Halfword
VX 4 78 221 V vpkuwum Vector Pack Unsigned Word Unsigned Modulo
EVS 4 79 549 SP evsel Vector Select
XO 4 108 595 LMA machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed
XO 4 110 601 LMA nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed
VX 4 128 231 V vadduwm Vector Add Unsigned Word Modulo
VX 4 130 248 V vmaxuw Vector Maximum Unsigned Word
VX 4 132 255 V vrlw Vector Rotate Left Word
VC 4 134 252 V vcmpequw[.] Vector Compare Equal To Unsigned Word
X 4 136 598 LMA mulchwu[.] Multiply Cross Halfword to Word Unsigned
XO 4 140 594 LMA macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned
VX 4 140 224 V vmrghw Vector Merge High Word
VX 4 142 221 V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate
X 4 168 598 LMA mulchw[.] Multiply Cross Halfword to Word Signed
XO 4 172 593 LMA macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed
XO 4 174 600 LMA nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed
VC 4 198 265 V vcmpeqfp[.] Vector Compare Equal To Single-Precision
XO 4 204 594 LMA macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned
VX 4 206 221 V vpkuwus Vector Pack Unsigned Word Unsigned Saturate
XO 4 236 593 LMA macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed
XO 4 238 600 LMA nmacchws[o][.]
Negative Multiply Accumulate Cross Halfword to Word Saturate Signed
VX 4 258 247 V vmaxsb Vector Maximum Signed Byte
VX 4 260 256 V vslb Vector Shift Left Byte
VX 4 264 237 V vmulosb Vector Multiply Odd Signed Byte
VX 4 266 268 V vrefp Vector Reciprocal Estimate Single-Precision
VX 4 268 225 V vmrglb Vector Merge Low Byte
VX 4 270 220 V vpkshus Vector Pack Signed Halfword Unsigned Saturate
VX 4 322 247 V vmaxsh Vector Maximum Signed Halfword
VX 4 324 256 V vslh Vector Shift Left Halfword
VX 4 328 237 V vmulosh Vector Multiply Odd Signed Halfword
VX 4 330 268 V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision
VX 4 332 225 V vmrglh Vector Merge Low Halfword
VX 4 334 220 V vpkswus Vector Pack Signed Word Unsigned Saturate
VX 4 384 230 V vaddcuw Vector Add and Write Carry-Out Unsigned Word
VX 4 386 247 V vmaxsw Vector Maximum Signed Word
VX 4 388 256 V vslw Vector Shift Left Word
X 4 392 599 LMA mullhwu[.] Multiply Low Halfword to Word Unsigned
VX 4 394 267 V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point
XO 4 396 598 LMA maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned
VX 4 396 225 V vmrglw Vector Merge Low Word
VX 4 398 220 V vpkshss Vector Pack Signed Halfword Signed Saturate
X 4 424 599 LMA mullhw[.] Multiply Low Halfword to Word Signed
XO 4 428 597 LMA maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed
XO 4 430 602 LMA nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed
VX 4 452 228 V vsl Vector Shift Left
VC 4 454 266 V vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Precision
VX 4 458 267 V vlogefp Vector Log Base 2 Estimate Floating-Point
XO 4 460 598 LMA maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned
VX 4 462 220 V vpkswss Vector Pack Signed Word Signed Saturate
XO 4 492 597 LMA maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed
XO 4 494 602 LMA nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed
EVX 4 512 511 SP evaddw Vector Add Word
VX 4 512 232 V vaddubs Vector Add Unsigned Byte Saturate
EVX 4 514 510 SP evaddiw Vector Add Immediate Word
VX 4 514 250 V vminub Vector Minimum Unsigned Byte
EVX 4 516 555 SP evsubfw Vector Subtract from Word
VX 4 516 257 V vsrb Vector Shift Right Byte
EVX 4 518 555 SP evsubifw Vector Subtract Immediate from Word
VC 4 518 253 V vcmpgtub[.] Vector Compare Greater Than Unsigned Byte
EVX 4 520 510 SP evabs Vector Absolute Value
VX 4 520 236 V vmuleub Vector Multiply Even Unsigned Byte
EVX 4 521 547 SP evneg Vector Negate
EVX 4 522 515 SP evextsb Vector Extend Sign Byte
VX 4 522 264 V vrfin Vector Round to Single-Precision Integer Nearest
EVX 4 523 515 SP evextsh Vector Extend Sign Halfword
EVX 4 524 549 SP evrndw Vector Round Word
VX 4 524 226 V vspltb Vector Splat Byte
EVX 4 525 514 SP evcntlzw Vector Count Leading Zeros Word
EVX 4 526 514 SP evcntlsw Vector Count Leading Signed Bits Word
VX 4 526 222 V vupkhsb Vector Unpack High Signed Byte
EVX 4 527 510 SP brinc Bit Reversed Increment
EVX 4 529 512 SP evand Vector AND
EVX 4 530 512 SP evandc Vector AND with Complement
EVX 4 534 555 SP evxor Vector XOR
EVX 4 535 548 SP evor Vector OR
EVX 4 536 547 SP evnor Vector NOR
EVX 4 537 515 SP eveqv Vector Equivalent
EVX 4 539 548 SP evorc Vector OR with Complement
EVX 4 542 547 SP evnand Vector NAND
EVX 4 544 551 SP evsrwu Vector Shift Right Word Unsigned
EVX 4 545 551 SP evsrws Vector Shift Right Word Signed
EVX 4 546 550 SP evsrwiu Vector Shift Right Word Immediate Unsigned
EVX 4 547 550 SP evsrwis Vector Shift Right Word Immediate Signed
EVX 4 548 550 SP evslw Vector Shift Left Word
EVX 4 550 550 SP evslwi Vector Shift Left Word Immediate
EVX 4 552 548 SP evrlw Vector Rotate Left Word
EVX 4 553 550 SP evsplati Vector Splat Immediate
EVX 4 554 549 SP evrlwi Vector Rotate Left Word
Immediate EVX 4 555 550 SP evsplatfi Vector Splat Fractional Immediate EVX 4 556 521 SP evmergehi Vector Merge High EVX 4 557 521 SP evmergelo Vector Merge Low EVX 4 558 522 SP evmergehilo Vector Merge High/Low EVX 4 559 522 SP evmergelohi Vector Merge Low/High EVX 4 560 513 SP evcmpgtu Vector Compare Greater Than Unsigned EVX 4 561 512 SP evcmpgts Vector Compare Greater Than Signed EVX 4 562 513 SP evcmpltu Vector Compare Less Than Unsigned EVX 4 563 513 SP evcmplts Vector Compare Less Than Signed EVX 4 564 512 SP evcmpeq Vector Compare Equal VX 4 576 232 V vadduhs Vector Add Unsigned Halfword Saturate VX 4 578 250 V vminuh Vector Minimum Unsigned Halfword VX 4 580 257 V vsrh Vector Shift Right Halfword VC 4 582 253 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword VX 4 584 236 V vmuleuh Vector Multiply Even Unsigned Halfword VX 4 586 264 V vrfiz Vector Round to Single-Precision Integer toward Zero VX 4 588 226 V vsplth Vector Splat Halfword VX 4 590 222 V vupkhsh Vector Unpack High Signed Halfword EVX 4 640 563 SP.FV evfsadd Vector Floating-Point Single-Precision Add VX 4 640 232 V vadduws Vector Add Unsigned Word Saturate Appendix H. Power ISA Instruction Set Sorted by Opcode 1283 Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 641 563 SP.FV evfssub Vector Floating-Point Single-Precision Subtract VX 4 642 250 V vminuw Vector Minimum Unsigned Word EVX 4 644 562 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value VX 4 644 257 V vsrw Vector Shift Right Word EVX 4 645 562 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Abso- lute Value EVX 4 646 562 SP.FV evfsneg Vector Floating-Point Single-Precision Negate VC 4 646 253 V vcmpgtuw[.] 
Vector Compare Greater Than Unsigned Word EVX 4 648 563 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply EVX 4 649 563 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide VX 4 650 264 V vrfip Vector Round to Single-Precision Integer toward +Infin- ity EVX 4 652 564 SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than VX 4 652 226 V vspltw Vector Splat Word EVX 4 653 564 SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than EVX 4 654 565 SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal VX 4 654 223 V vupklsb Vector Unpack Low Signed Byte EVX 4 656 567 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer EVX 4 657 567 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer EVX 4 658 567 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction EVX 4 659 567 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction EVX 4 660 568 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer EVX 4 661 568 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer EVX 4 662 569 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction EVX 4 663 569 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction EVX 4 664 568 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX 4 666 568 SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero EVX 4 668 565 SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than EVX 4 669 566 SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than EVX 4 670 566 SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal EVX 4 704 571 SP.FS efsadd Floating-Point Single-Precision Add EVX 4 705 571 SP.FS efssub Floating-Point 
Single-Precision Subtract EVX 4 708 570 SP.FS efsabs Floating-Point Single-Precision Absolute Value VX 4 708 229 V vsr Vector Shift Right EVX 4 709 570 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value EVX 4 710 570 SP.FS efsneg Floating-Point Single-Precision Negate VC 4 710 266 V vcmpgtfp[.] Vector Compare Greater Than Single-Precision EVX 4 712 571 SP.FS efsmul Floating-Point Single-Precision Multiply EVX 4 713 571 SP.FS efsdiv Floating-Point Single-Precision Divide VX 4 714 264 V vrfim Vector Round to Single-Precision Integer toward -Infin- ity EVX 4 716 572 SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than EVX 4 717 572 SP.FS efscmplt Floating-Point Single-Precision Compare Less Than EVX 4 718 573 SP.FS efscmpeq Floating-Point Single-Precision Compare Equal VX 4 718 223 V vupklsh Vector Unpack Low Signed Halfword 1284 Power ISATM Book Appendices Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 719 585 SP.FD efscfd Floating-Point Single-Precision Convert from Double- Precision EVX 4 720 575 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Integer EVX 4 721 575 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer EVX 4 722 575 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction EVX 4 723 575 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Fraction EVX 4 724 575 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer EVX 4 725 575 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Inte- ger EVX 4 726 576 SP.FS efsctuf Convert Floating-Point Single-Precision to Unsigned Fraction EVX 4 727 576 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Frac- tion EVX 4 728 576 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX 4 730 576 SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Inte- ger with Round 
toward Zero EVX 4 732 573 SP.FS efststgt Floating-Point Single-Precision Test Greater Than EVX 4 733 574 SP.FS efststlt Floating-Point Single-Precision Test Less Than EVX 4 734 574 SP.FS efststeq Floating-Point Single-Precision Test Equal EVX 4 736 578 SP.FD efdadd Floating-Point Double-Precision Add EVX 4 737 578 SP.FD efdsub Floating-Point Double-Precision Subtract EVX 4 738 582 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword EVX 4 739 582 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Integer Doubleword EVX 4 740 577 SP.FD efdabs Floating-Point Double-Precision Absolute Value EVX 4 741 577 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value EVX 4 742 577 SP.FD efdneg Floating-Point Double-Precision Negate EVX 4 744 578 SP.FD efdmul Floating-Point Double-Precision Multiply EVX 4 745 578 SP.FD efddiv Floating-Point Double-Precision Divide EVX 4 746 583 SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero EVX 4 747 583 SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Inte- ger Doubleword with Round toward Zero EVX 4 748 579 SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than EVX 4 749 579 SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than EVX 4 750 579 SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal EVX 4 751 584 SP.FD efdcfs Floating-Point Double-Precision Convert from Single- Precision EVX 4 752 581 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer EVX 4 753 581 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Integer EVX 4 754 582 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction EVX 4 755 582 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Fraction EVX 4 756 582 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Integer EVX 4 757 582 SP.FD efdctsi Convert 
Floating-Point Double-Precision to Signed Inte- ger Appendix H. Power ISA Instruction Set Sorted by Opcode 1285 Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 758 584 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Fraction EVX 4 759 584 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction EVX 4 760 584 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero EVX 4 762 584 SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Inte- ger with Round toward Zero EVX 4 764 579 SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than EVX 4 765 580 SP.FD efdtstlt Floating-Point Double-Precision Test Less Than EVX 4 766 580 SP.FD efdtsteq Floating-Point Double-Precision Test Equal EVX 4 768 516 SP evlddx Vector Load Double Word into Double Word Indexed VX 4 768 230 V vaddsbs Vector Add Signed Byte Saturate EVX 4 769 516 SP evldd Vector Load Double Word into Double Word EVX 4 770 517 SP evldwx Vector Load Double into Two Words Indexed VX 4 770 249 V vminsb Vector Minimum Signed Byte EVX 4 771 517 SP evldw Vector Load Double into Two Words EVX 4 772 516 SP evldhx Vector Load Double into Four Halfwords Indexed VX 4 772 258 V vsrab Vector Shift Right Algebraic Byte EVX 4 773 516 SP evldh Vector Load Double into Four Halfwords VC 4 774 252 V vcmpgtsb[.] 
Vector Compare Greater Than Signed Byte EVX 4 776 517 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed VX 4 776 236 V vmulesb Vector Multiply Even Signed Byte EVX 4 777 517 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat VX 4 778 263 V vcfux Vector Convert From Unsigned Fixed-Point Word EVX 4 780 518 SP evlhhousplatx Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed VX 4 780 226 V vspltisb Vector Splat Immediate Signed Byte EVX 4 781 518 SP evlhhousplat Vector Load Halfword into Halfword Odd Unsigned and Splat EVX 4 782 518 SP evlhhossplatx Vector Load Halfword into Halfword Odd Signed and Splat Indexed VX 4 782 219 V vpkpx Vector Pack Pixel EVX 4 783 518 SP evlhhossplat Vector Load Halfword into Halfword Odd Signed and Splat EVX 4 784 519 SP evlwhex Vector Load Word into Two Halfwords Even Indexed EVX 4 785 519 SP evlwhe Vector Load Word into Two Halfwords Even EVX 4 788 520 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) EVX 4 789 520 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) EVX 4 790 519 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) EVX 4 791 519 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension) EVX 4 792 521 SP evlwwsplatx Vector Load Word into Word and Splat Indexed EVX 4 793 521 SP evlwwsplat Vector Load Word into Word and Splat EVX 4 796 520 SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed EVX 4 797 520 SP evlwhsplat Vector Load Word into Two Halfwords and Splat EVX 4 800 551 SP evstddx Vector Store Double of Double Indexed EVX 4 801 551 SP evstdd Vector Store Double of Double EVX 4 802 552 SP evstdwx Vector Store Double of Two Words Indexed EVX 4 803 552 SP evstdw Vector Store Double of Two Words EVX 4 804 552 SP evstdhx Vector Store Double of Four Halfwords Indexed EVX 4 805 552 SP evstdh Vector Store Double of Four Halfwords EVX 4 816 
553 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed EVX 4 817 553 SP evstwhe Vector Store Word of Two Halfwords from Even 1286 Power ISATM Book Appendices Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 820 553 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed EVX 4 821 553 SP evstwho Vector Store Word of Two Halfwords from Odd EVX 4 824 553 SP evstwwex Vector Store Word of Word from Even Indexed EVX 4 825 553 SP evstwwe Vector Store Word of Word from Even EVX 4 828 554 SP evstwwox Vector Store Word of Word from Odd Indexed EVX 4 829 554 SP evstwwo Vector Store Word of Word from Odd VX 4 832 230 V vaddshs Vector Add Signed Halfword Saturate VX 4 834 249 V vminsh Vector Minimum Signed Halfword VX 4 836 258 V vsrah Vector Shift Right Algebraic Halfword VC 4 838 252 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword VX 4 840 236 V vmulesh Vector Multiply Even Signed Halfword VX 4 842 263 V vcfsx Vector Convert From Signed Fixed-Point Word VX 4 844 226 V vspltish Vector Splat Immediate Signed Halfword VX 4 846 222 V vupkhpx Vector Unpack High Pixel VX 4 896 230 V vaddsws Vector Add Signed Word Saturate VX 4 898 249 V vminsw Vector Minimum Signed Word VX 4 900 258 V vsraw Vector Shift Right Algebraic Word VC 4 902 252 V vcmpgtsw[.] Vector Compare Greater Than Signed Word VX 4 906 262 V vctuxs Vector Convert To Unsigned Fixed-Point Word Saturate VX 4 908 226 V vspltisw Vector Splat Immediate Signed Word VC 4 966 265 V vcmpbfp[.] 
Vector Compare Bounds Single-Precision VX 4 970 262 V vctsxs Vector Convert To Signed Fixed-Point Word Saturate VX 4 974 223 V vupklpx Vector Unpack Low Pixel VX 4 1024 234 V vsububm Vector Subtract Unsigned Byte Modulo VX 4 1026 246 V vavgub Vector Average Unsigned Byte EVX 4 1027 526 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional VX 4 1028 254 V vand Vector Logical AND EVX 4 1031 535 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional EVX 4 1032 529 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer EVX 4 1033 525 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Inte- ger VX 4 1034 261 V vmaxfp Vector Maximum Single-Precision EVX 4 1035 524 SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional EVX 4 1036 537 SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer VX 4 1036 228 V vslo Vector Shift Left by Octet EVX 4 1037 533 SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Inte- ger EVX 4 1039 532 SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional EVX 4 1059 526 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator EVX 4 1063 535 SP evmhossfa Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional to Accumulator EVX 4 1064 529 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator EVX 4 1065 525 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Inte- ger to Accumulator EVX 4 1067 524 SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional to Accumulator EVX 4 1068 537 SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator EVX 4 1069 533 SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Inte- ger to Accumulator EVX 4 1071 532 SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional to Accumulator Appendix H. 
Power ISA Instruction Set Sorted by Opcode 1287 Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext VX 4 1088 234 V vsubuhm Vector Subtract Unsigned Halfword Modulo VX 4 1090 246 V vavguh Vector Average Unsigned Halfword VX 4 1092 254 V vandc Vector Logical AND with Complement EVX 4 1095 540 SP evmwhssf Vector Multiply Word High Signed, Saturate, Fractional EVX 4 1096 542 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer VX 4 1098 261 V vminfp Vector Minimum Single-Precision EVX 4 1100 540 SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer VX 4 1100 229 V vsro Vector Shift Right by Octet EVX 4 1101 539 SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer EVX 4 1103 539 SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional EVX 4 1107 545 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional EVX 4 1112 546 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer EVX 4 1113 544 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer EVX 4 1115 543 SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional EVX 4 1127 540 SP evmwhssfa Vector Multiply Word High Signed, Saturate, Fractional to Accumulator EVX 4 1128 542 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator EVX 4 1132 540 SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator EVX 4 1133 539 SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator EVX 4 1135 539 SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator EVX 4 1139 545 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator EVX 4 1144 546 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator EVX 4 1145 544 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accu- mulator EVX 4 1147 543 SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator VX 4 1152 234 V vsubuwm Vector Subtract Unsigned Word Modulo 
VX 4 1154 246 V vavguw Vector Average Unsigned Word
VX 4 1156 254 V vor Vector Logical OR
EVX 4 1216 511 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX 4 1217 511 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word
EVX 4 1218 555 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX 4 1219 554 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX 4 1220 539 SP evmra Initialize Accumulator
VX 4 1220 254 V vxor Vector Logical XOR
EVX 4 1222 514 SP evdivws Vector Divide Word Signed
EVX 4 1223 515 SP evdivwu Vector Divide Word Unsigned
EVX 4 1224 511 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1225 510 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word
EVX 4 1226 555 SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1227 554 SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX 4 1280 530 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1281 528 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
VX 4 1282 245 V vavgsb Vector Average Signed Byte
EVX 4 1283 527 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1284 538 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
VX 4 1284 254 V vnor Vector Logical NOR
EVX 4 1285 537 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1287 536 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1288 529 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1289 525 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1291 524 SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1292 538 SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1293 534 SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1295 533 SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1320 523 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1321 523 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1323 522 SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1324 532 SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1325 531 SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1327 531 SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1344 543 SP evmwlusiaaw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1345 541 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
VX 4 1346 245 V vavgsh Vector Average Signed Halfword
EVX 4 1352 542 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1353 541 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX 4 1363 545 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX 4 1368 547 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX 4 1369 544 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX 4 1371 544 SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX 4 1408 530 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
VX 4 1408 233 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word
EVX 4 1409 528 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
VX 4 1410 245 V vavgsw Vector Average Signed Word
EVX 4 1411 527 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1412 538 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1413 537 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1415 536 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1416 529 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1417 525 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1419 524 SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1420 534 SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1421 533 SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1423 533 SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1448 523 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1449 523 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1451 522 SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1452 532 SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1453 531 SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1455 531 SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1472 543 SP evmwlusianw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words
EVX 4 1473 541 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words
EVX 4 1480 542 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX 4 1481 541 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words
EVX 4 1491 546 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX 4 1496 547 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1497 544 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX 4 1499 544 SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
VX 4 1536 235 V vsububs Vector Subtract Unsigned Byte Saturate
VX 4 1540 269 V mfvscr Move From Vector Status and Control Register
VX 4 1544 244 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate
VX 4 1600 234 V vsubuhs Vector Subtract Unsigned Halfword Saturate
VX 4 1604 269 V mtvscr Move To Vector Status and Control Register
VX 4 1608 244 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate
VX 4 1664 235 V vsubuws Vector Subtract Unsigned Word Saturate
VX 4 1672 243 V vsum2sws Vector Sum across Half Signed Word Saturate
VX 4 1792 233 V vsubsbs Vector Subtract Signed Byte Saturate
VX 4 1800 244 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate
VX 4 1856 233 V vsubshs Vector Subtract Signed Halfword Saturate
VX 4 1920 233 V vsubsws Vector Subtract Signed Word Saturate
VX 4 1928 243 V vsumsws Vector Sum across Signed Word Saturate
D 7 67 B mulli Multiply Low Immediate
D 8 SR 64 B subfic Subtract From Immediate Carrying
D 10 75 B cmpli Compare Logical Immediate
D 11 74 B cmpi Compare Immediate
D 12 SR 63 B addic Add Immediate Carrying
D 13 SR 63 B addic. Add Immediate Carrying and Record
D 14 62 B addi Add Immediate
D 15 62 B addis Add Immediate Shifted
B 16 CT 35 B bc[l][a] Branch Conditional
SC 17 39, 745, 908 B sc System Call
I 18 35 B b[l][a] Branch
XL 19 0 38 B mcrf Move Condition Register Field
XL 19 16 CT 36 B bclr[l] Branch Conditional to Link Register
XL 19 18 P 746 S rfid Return From Interrupt Doubleword
XL 19 33 38 B crnor Condition Register NOR
XL 19 38 P 910 E rfmci Return From Machine Check Interrupt
X 19 39 P 910 E.ED rfdi Return From Debug Interrupt
XL 19 50 P 909 E rfi Return From Interrupt
XL 19 51 P 909 E rfci Return From Critical Interrupt
XL 19 102 911 E.HV rfgi Return From Guest Interrupt
XL 19 129 38 B crandc Condition Register AND with Complement
XL 19 150 693 B isync Instruction Synchronize
XL 19 193 37 B crxor Condition Register XOR
XFX 19 198 1092 E.ED dnh Debugger Notify Halt
XL 19 225 37 B crnand Condition Register NAND
XL 19 257 37 B crand Condition Register AND
XL 19 274 H 746 S hrfid Hypervisor Return From Interrupt Doubleword
XL 19 289 38 B creqv Condition Register Equivalent
XL 19 402 H 748 S doze Doze
XL 19 417 38 B crorc Condition Register OR with Complement
XL 19 434 H 748 S nap Nap
XL 19 449 37 B cror Condition Register OR
XL 19 466 H 748 S sleep Sleep
XL 19 498 H 749 S rvwinkle Rip Van Winkle
XL 19 528 CT 36 B bcctr[l] Branch Conditional to Count Register
M 20 SR 89 B rlwimi[.] Rotate Left Word Immediate then Mask Insert
M 21 SR 87 B rlwinm[.] Rotate Left Word Immediate then AND with Mask
M 23 SR 88 B rlwnm[.] Rotate Left Word then AND with Mask
D 24 78 B ori OR Immediate
D 25 79 B oris OR Immediate Shifted
D 26 79 B xori XOR Immediate
D 27 79 B xoris XOR Immediate Shifted
D 28 SR 78 B andi. AND Immediate
D 29 SR 78 B andis. AND Immediate Shifted
MD 30 0 SR 90 64 rldicl[.] Rotate Left Doubleword Immediate then Clear Left
MD 30 1 SR 90 64 rldicr[.] Rotate Left Doubleword Immediate then Clear Right
MD 30 2 SR 91 64 rldic[.] Rotate Left Doubleword Immediate then Clear
MD 30 3 SR 92 64 rldimi[.] Rotate Left Doubleword Immediate then Mask Insert
MDS 30 8 SR 91 64 rldcl[.] Rotate Left Doubleword then Clear Left
MDS 30 9 SR 92 64 rldcr[.] Rotate Left Doubleword then Clear Right
X 31 0 74 B cmp Compare
X 31 4 76 B tw Trap Word
X 31 6 218 V lvsl Load Vector for Shift Left Indexed
X 31 7 216 V lvebx Load Vector Element Byte Indexed
XO 31 8 SR 64 B subfc[o][.] Subtract From Carrying
XO 31 9 SR 71 64 mulhdu[.] Multiply High Doubleword Unsigned
XO 31 10 SR 64 B addc[o][.] Add Carrying
XO 31 11 SR 67 B mulhwu[.] Multiply High Word Unsigned
A 31 15 77 B isel Integer Select
X 31 18 P 1003 E.HV tlbilx TLB Invalidate Local
XFX 31 19 102 B mfcr Move From Condition Register
XFX 31 19 103 B mfocrf Move From One Condition Register Field
X 31 20 694 B lwarx Load Word And Reserve Indexed
X 31 21 50 64 ldx Load Doubleword Indexed
X 31 22 680 E icbt Instruction Cache Block Touch
X 31 23 48 B lwzx Load Word and Zero Indexed
X 31 24 SR 93 B slw[.] Shift Left Word
X 31 26 SR 81 B cntlzw[.] Count Leading Zeros Word
X 31 27 SR 95 64 sld[.] Shift Left Doubleword
X 31 28 SR 80 B and[.] AND
X 31 29 P 928 E.PD;64 ldepx Load Doubleword by External Process ID Indexed
X 31 31 P 928 E.PD lwepx Load Word by External Process ID Indexed
X 31 32 75 B cmpl Compare Logical
X 31 38 218 V lvsr Load Vector for Shift Right Indexed
X 31 39 213 V lvehx Load Vector Element Halfword Indexed
XO 31 40 SR 63 B subf[o][.] Subtract From
X 31 52 694 B lbarx Load Byte and Reserve Indexed
X 31 53 50 64 ldux Load Doubleword with Update Indexed
X 31 54 691 B dcbst Data Cache Block Store
X 31 55 48 B lwzux Load Word and Zero with Update Indexed
X 31 58 SR 85 64 cntlzd[.] Count Leading Zeros Doubleword
X 31 60 SR 81 B andc[.] AND with Complement
X 31 62 704 WT wait Wait
X 31 63 P 931 E.PD dcbstep Data Cache Block Store by External PID
X 31 68 77 64 td Trap Doubleword
X 31 71 213 V lvewx Load Vector Element Word Indexed
XO 31 73 SR 71 64 mulhd[.] Multiply High Doubleword
XO 31 74 H 97 BCDA addg6s Add and Generate Sixes
XO 31 75 SR 67 B mulhw[.] Multiply High Word
X 31 78 591 LMV dlmzb[.] Determine Leftmost Zero Byte
X 31 83 P 767, 923 B mfmsr Move From Machine State Register
X 31 84 699 64 ldarx Load Doubleword And Reserve Indexed
X 31 86 691 B dcbf Data Cache Block Flush
X 31 87 46 B lbzx Load Byte and Zero Indexed
X 31 95 P 927 E.PD lbepx Load Byte by External Process ID Indexed
X 31 103 214 V lvx Load Vector Indexed
XO 31 104 SR 66 B neg[o][.] Negate
X 31 116 695 B lharx Load Halfword and Reserve Indexed
X 31 119 45 B lbzux Load Byte and Zero with Update Indexed
X 31 122 83 B popcntb Population Count Bytes
X 31 124 SR 81 B nor[.] NOR
X 31 127 P 932 E.PD dcbfep Data Cache Block Flush by External PID
X 31 131 P 924 E wrtee Write MSR External Enable
X 31 134 M 991 ECL dcbtstls Data Cache Block Touch for Store and Lock Set
X 31 135 216 V stvebx Store Vector Element Byte Indexed
XO 31 136 SR 65 B subfe[o][.] Subtract From Extended
XO 31 138 SR 65 B adde[o][.] Add Extended
XFX 31 144 102 B mtcrf Move To Condition Register Fields
XFX 31 144 103 B mtocrf Move To One Condition Register Field
X 31 146 P 923 E mtmsr Move To Machine State Register
X 31 146 P 765 S mtmsr Move To Machine State Register
X 31 149 54 64 stdx Store Doubleword Indexed
X 31 150 698 B stwcx. Store Word Conditional Indexed
X 31 151 53 B stwx Store Word Indexed
X 31 154 84 B prtyw Parity Word
X 31 157 P 930 E.PD;64 stdepx Store Doubleword by External Process ID Indexed
X 31 159 P 930 E.PD stwepx Store Word by External Process ID Indexed
X 31 163 P 925 E wrteei Write MSR External Enable Immediate
X 31 166 M 991 ECL dcbtls Data Cache Block Touch and Lock Set
X 31 167 216 V stvehx Store Vector Element Halfword Indexed
X 31 178 P 766 S mtmsrd Move To Machine State Register Doubleword
X 31 181 54 64 stdux Store Doubleword with Update Indexed
X 31 183 53 B stwux Store Word with Update Indexed
X 31 186 84 64 prtyd Parity Doubleword
X 31 199 217 V stvewx Store Vector Element Word Indexed
XO 31 200 SR 66 B subfze[o][.] Subtract From Zero Extended
XO 31 202 SR 66 B addze[o][.] Add to Zero Extended
X 31 206 P 1098 E.PC msgsnd Message Send
X 31 210 32 P 809 S mtsr Move To Segment Register
X 31 214 699 64 stdcx. Store Doubleword Conditional Indexed
X 31 215 51 B stbx Store Byte Indexed
X 31 223 P 929 E.PD stbepx Store Byte by External Process ID Indexed
X 31 230 M 993 ECL icblc Instruction Cache Block Lock Clear
X 31 231 214 V stvx Store Vector Indexed
XO 31 232 SR 65 B subfme[o][.] Subtract From Minus One Extended
XO 31 233 SR 71 64 mulld[o][.] Multiply Low Doubleword
XO 31 234 SR 65 B addme[o][.] Add to Minus One Extended
XO 31 235 SR 67 B mullw[o][.] Multiply Low Word
X 31 238 P 1098 E.PC msgclr Message Clear
X 31 242 32 P 809 S mtsrin Move To Segment Register Indirect
X 31 246 689 B dcbtst Data Cache Block Touch for Store
X 31 247 51 B stbux Store Byte with Update Indexed
X 31 252 86 64 bpermd Bit Permute Doubleword
X 31 255 P 933 E.PD dcbtstep Data Cache Block Touch for Store by External PID
X 31 259 P 923 E.DC mfdcrx Move From Device Control Register Indexed
X 31 263 P 937 E.PD lvepxl Load Vector by External Process ID Indexed LRU
XO 31 266 SR 63 B add[o][.] Add
XL 31 270 911 E.HV ehpriv Embedded Hypervisor Privilege
X 31 274 64 P 814 S tlbiel TLB Invalidate Entry Local
X 31 278 688 B dcbt Data Cache Block Touch
X 31 279 46 B lhzx Load Halfword and Zero Indexed
X 31 282 H 97 BCDA cdtbcd Convert Declets To Binary Coded Decimal
X 31 284 SR 81 B eqv[.] Equivalent
X 31 287 P 927 E.PD lhepx Load Halfword by External Process ID Indexed
X 31 291 104 E.DC mfdcrux Move From Device Control Register User-mode Indexed
X 31 295 P 937 E.PD lvepx Load Vector by External Process ID Indexed
X 31 306 64 H 811 S tlbie TLB Invalidate Entry
X 31 310 716 EC eciwx External Control In Word Indexed
X 31 311 46 B lhzux Load Halfword and Zero with Update Indexed
X 31 314 H 97 BCDA cbcdtd Convert Binary Coded Decimal to Declets
X 31 316 SR 80 B xor[.]
XOR X 31 319 P 931 E.PD dcbtep Data Cache Block Touch by External PID XFX 31 323 P 923 E.DC mfdcr Move From Device Control Register X 31 326 P 1106 E.CD dcread Data Cache Read [Alternative Encoding] XX1 31 332 339 VSX lxvdsx Load VSR Vector Doubleword & Splat Indexed XFX 31 334 O 1118 E.PM mfpmr Move From Performance Monitor Register XFX 31 339 O 101, B mfspr Move From Special Purpose Register 708 X 31 341 49 64 lwax Load Word Algebraic Indexed X 31 343 47 B lhax Load Halfword Algebraic Indexed X 31 359 214 V lvxl Load Vector Indexed LRU X 31 370 H 817 S tlbia TLB Invalidate All XFX 31 371 708 S mftb Move From Time Base X 31 373 49 64 lwaux Load Word Algebraic with Update Indexed X 31 375 47 B lhaux Load Halfword Algebraic with Update Indexed X 31 378 83 B popcntw Population Count Word X 31 387 P 922 E.DC mtdcrx Move To Device Control Register Indexed X 31 390 M 992 ECL dcblc Data Cache Block Lock Clear XO 31 393 SR 73 64 divdeu[o][.] Divide Doubleword Extended Unsigned XO 31 395 SR 69 B divweu[o][.] Divide Word Extended Unsigned X 31 402 P 805 S slbmte SLB Move To Entry Appendix H. Power ISA Instruction Set Sorted by Opcode 1293 Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 407 52 B sthx Store Halfword Indexed X 31 412 SR 81 B orc[.] OR with Complement XS 31 413 SR 96 64 sradi[.] Shift Right Algebraic Doubleword Immediate X 31 415 P 929 E.PD sthepx Store Halfword by External Process ID Indexed X 31 419 104 E.DC mtdcrux Move To Device Control Register User-mode Indexed XO 31 425 SR 73 64 divde[o][.] Divide Doubleword Extended XO 31 427 SR 69 B divwe[o][.] Divide Word Extended X 31 434 P 802 S slbie SLB Invalidate Entry X 31 438 716 EC ecowx External Control Out Word Indexed X 31 439 52 B sthux Store Halfword with Update Indexed X 31 444 SR 80 B or[.] OR XFX 31 451 P 922 E.DC mtdcr Move To Device Control Register X 31 454 P 1103 E.CI dci Data Cache Invalidate XO 31 457 SR 72 64 divdu[o][.] 
Divide Doubleword Unsigned XO 31 459 SR 68 B divwu[o][.] Divide Word Unsigned XFX 31 462 O 1118 E.PM mtpmr Move To Performance Monitor Register XFX 31 467 O 100 B mtspr Move To Special Purpose Register X 31 470 P 988 E dcbi Data Cache Block Invalidate X 31 476 SR 80 B nand[.] NAND X 31 483 714 DS dsn Decorated Storage Notify X 31 486 P 1106 E.CD dcread Data Cache Read X 31 486 M 992 ECL icbtls Instruction Cache Block Touch and Lock Set X 31 487 217 V stvxl Store Vector Indexed LRU XO 31 489 SR 72 64 divd[o][.] Divide Doubleword XO 31 491 SR 68 B divw[o][.] Divide Word X 31 498 P 804 S slbia SLB Invalidate All X 31 506 85 B popcntd Population Count Doubleword X 31 508 82 B cmpb Compare Bytes X 31 512 104 E mcrxr Move to Condition Register from XER X 31 515 712 DS lbdx Load Byte with Decoration Indexed X 31 532 56 64 ldbrx Load Doubleword Byte-Reverse Indexed X 31 533 59 MA lswx Load String Word Indexed X 31 534 55 B lwbrx Load Word Byte-Reverse Indexed X 31 535 128 FP lfsx Load Floating-Point Single Indexed X 31 536 SR 93 B srw[.] Shift Right Word X 31 539 SR 95 64 srd[.] 
Shift Right Doubleword X 31 547 712 DS lhdx Load Halfword with Decoration Indexed X 31 566 H 817, B tlbsync TLB Synchronize 1010 ,905 X 31 567 128 FP lfsux Load Floating-Point Single with Update Indexed X 31 579 712 DS lwdx Load Word with Decoration Indexed XX1 31 588 338 VSX lxsdx Load VSR Scalar Doubleword Indexed X 31 595 32 P 810 S mfsr Move From Segment Register X 31 597 59 MA lswi Load String Word Immediate X 31 598 701 B sync Synchronize X 31 599 125 FP lfdx Load Floating-Point Double Indexed X 31 607 P 935 E.PD lfdepx Load Floating-Point Double by External Process ID Indexed X 31 611 712 DS lddx Load Doubleword with Decoration Indexed XX1 31 620 338 VSX lxsdux Load VSR Scalar Doubleword with Update Indexed X 31 631 125 FP lfdux Load Floating-Point Double with Update Indexed X 31 643 713 DS stbdx Store Byte with Decoration Indexed X 31 659 32 P 810 S mfsrin Move From Segment Register Indirect X 31 660 56 64 stdbrx Store Doubleword Byte-Reverse Indexed X 31 661 60 MA stswx Store String Word Indexed X 31 662 55 B stwbrx Store Word Byte-Reverse Indexed X 31 663 128 FP stfsx Store Floating-Point Single Indexed X 31 675 713 DS sthdx Store Halfword with Decoration Indexed X 31 694 696 B stbcx. Store Byte Conditional Indexed X 31 695 128 FP stfsux Store Floating-Point Single with Update Indexed 1294 Power ISATM Book Appendices Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 707 713 DS stwdx Store Word with Decoration Indexed XX1 31 716 340 VSX stxsdx Store VSR Scalar Doubleword Indexed X 31 725 60 MA stswi Store String Word Immediate X 31 726 697 B sthcx. 
Store Halfword Conditional Indexed X 31 727 129 FP stfdx Store Floating-Point Double Indexed X 31 735 P 935 E.PD stfdepx Store Floating-Point Double by External Process ID Indexed X 31 739 713 DS stddx Store Doubleword with Decoration Indexed XX1 31 748 340 VSX stxsdux Store VSR Scalar Doubleword with Update Indexed X 31 758 687 E dcba Data Cache Block Allocate X 31 759 129 FP stfdux Store Floating-Point Double with Update Indexed X 31 775 P 938 E.PD stvepxl Store Vector by External Process ID Indexed LRU XX1 31 780 339 VSX lxvw4x Load VSR Vector Word*4 Indexed X 31 786 P 1001 E tlbivax TLB Invalidate Virtual Address Indexed ,903 X 31 789 H 757 S lwzcix Load Word and Zero Caching Inhibited Indexed X 31 790 55 B lhbrx Load Halfword Byte-Reverse Indexed X 31 791 131 FP.out lfdpx Load Floating-Point Double Pair Indexed X 31 792 SR 94 B sraw[.] Shift Right Algebraic Word X 31 794 SR 96 64 srad[.] Shift Right Algebraic Doubleword EVX 31 799 P 936 E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed X 31 803 712 DS lfddx Load Floating Doubleword with Decoration Indexed X 31 807 P 938 E.PD stvepx Store Vector by External Process ID Indexed XX1 31 812 339 VSX lxvw4ux Load VSR Vector Word*4 with Update Indexed X 31 821 H 757 S lhzcix Load Halfword and Zero Caching Inhibited Indexed X 31 824 SR 94 B srawi[.] Shift Right Algebraic Word Immediate XX1 31 844 338 VSX lxvd2x Load VSR Vector Doubleword*2 Indexed X 31 850 P 1007 E.TWC tlbsrx. 
TLB Search and Reserve X 31 851 P 806 S slbmfev SLB Move From Entry VSID X 31 853 H 757 S lbzcix Load Byte and Zero Caching Inhibited Indexed X 31 854 703 S eieio Enforce In-order Execution of I/O X 31 854 703 E mbar Memory Barrier X 31 855 126 FP lfiwax Load Floating-Point as Integer Word Algebraic Indexed XX1 31 876 338 VSX lxvd2ux Load VSR Vector Doubleword*2 with Update Indexed X 31 885 H 757 S ldcix Load Doubleword Caching Inhibited Indexed X 31 887 126 FP lfiwzx Load Floating-Point as Integer Word and Zero Indexed XX1 31 908 341 VSX stxvw4x Store VSR Vector Word*4 Indexed X 31 914 P 1005 E tlbsx TLB Search Indexed ,904 X 31 915 P 806 S slbmfee SLB Move From Entry ESID X 31 917 H 758 S stwcix Store Word Caching Inhibited Indexed X 31 918 55 B sthbrx Store Halfword Byte-Reverse Indexed X 31 919 131 FP.out stfdpx Store Floating-Point Double Pair Indexed X 31 922 SR 81 B extsh[.] Extend Sign Halfword EVX 31 927 P 936 E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed X 31 931 713 DS stfddx Store Floating Doubleword with Decoration Indexed XX1 31 940 341 VSX stxvw4ux Store VSR Vector Word*4 with Update Indexed X 31 946 P 1008 E tlbre TLB Read Entry ,904 X 31 949 H 758 S sthcix Store Halfword Caching Inhibited Indexed X 31 954 SR 81 B extsb[.] Extend Sign Byte X 31 966 P 1103 E.CI ici Instruction Cache Invalidate XX1 31 972 340 VSX stxvd2x Store VSR Vector Doubleword*2 Indexed X 31 978 P 1010 E tlbwe TLB Write Entry ,905 X 31 979 SR P 807 S slbfee. SLB Find Entry ESID X 31 981 H 758 S stbcix Store Byte Caching Inhibited Indexed X 31 982 680 B icbi Instruction Cache Block Invalidate X 31 983 130 FP stfiwx Store Floating-Point as Integer Word Indexed Appendix H. Power ISA Instruction Set Sorted by Opcode 1295 Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 986 SR 85 64 extsw[.] 
Extend Sign Word X 31 991 P 934 E.PD icbiep Instruction Cache Block Invalidate by External PID X 31 998 P 1107 E.CD icread Instruction Cache Read XX1 31 1004 340 VSX stxvd2ux Store VSR Vector Doubleword*2 with Update Indexed X 31 1013 H 758 S stdcix Store Doubleword Caching Inhibited Indexed X 31 1014 691 B dcbz Data Cache Block set to Zero X 31 1023 P 934 E.PD dcbzep Data Cache Block set to Zero by External PID D 32 48 B lwz Load Word and Zero D 33 48 B lwzu Load Word and Zero with Update D 34 45 B lbz Load Byte and Zero D 35 45 B lbzu Load Byte and Zero with Update D 36 53 B stw Store Word D 37 53 B stwu Store Word with Update D 38 51 B stb Store Byte D 39 51 B stbu Store Byte with Update D 40 46 B lhz Load Halfword and Zero D 41 46 B lhzu Load Halfword and Zero with Update D 42 47 B lha Load Halfword Algebraic D 43 47 B lhau Load Halfword Algebraic with Update D 44 52 B sth Store Halfword D 45 52 B sthu Store Halfword with Update D 46 57 B lmw Load Multiple Word D 47 57 B stmw Store Multiple Word D 48 128 FP lfs Load Floating-Point Single D 49 128 FP lfsu Load Floating-Point Single with Update D 50 125 FP lfd Load Floating-Point Double D 51 125 FP lfdu Load Floating-Point Double with Update D 52 128 FP stfs Store Floating-Point Single D 53 128 FP stfsu Store Floating-Point Single with Update D 54 129 FP stfd Store Floating-Point Double D 55 129 FP stfdu Store Floating-Point Double with Update DQ 56 P 759 LSQ lq Load Quadword DS 57 0 131 FP.out lfdp Load Floating-Point Double Pair DS 58 0 50 64 ld Load Doubleword DS 58 1 50 64 ldu Load Doubleword with Update DS 58 2 49 64 lwa Load Word Algebraic X 59 2 173 DFP dadd[.] DFP Add Z 59 3 184 DFP dqua[.] DFP Quantize A 59 18 134 FP[R] fdivs[.] Floating Divide Single A 59 20 133 FP[R] fsubs[.] Floating Subtract Single A 59 21 133 FP[R] fadds[.] Floating Add Single A 59 22 135 FP[R] fsqrts[.] Floating Square Root Single A 59 24 135 FP[R] fres[.] Floating Reciprocal Estimate Single A 59 25 134 FP[R] fmuls[.] 
Floating Multiply Single A 59 26 136 FP[R].in frsqrtes[.] Floating Reciprocal Square Root Estimate Single A 59 28 138 FP[R] fmsubs[.] Floating Multiply-Subtract Single A 59 29 138 FP[R] fmadds[.] Floating Multiply-Add Single A 59 30 139 FP[R] fnmsubs[.] Floating Negative Multiply-Subtract Single A 59 31 139 FP[R] fnmadds[.] Floating Negative Multiply-Add Single X 59 34 175 DFP dmul[.] DFP Multiply Z 59 35 186 DFP drrnd[.] DFP Reround Z23 59 66 200 DFP dscli[.] DFP Shift Significand Left Immediate Z23 59 67 183 DFP dquai[.] DFP Quantize Immediate Z 59 98 200 DFP dscri[.] DFP Shift Significand Right Immediate Z23 59 99 189 DFP drintx[.] DFP Round To FP Integer With Inexact X 59 130 179 DFP dcmpo DFP Compare Ordered X 59 162 181 DFP dtstex DFP Test Exponent Z23 59 194 180 DFP dtstdc DFP Test Data Class Z23 59 226 180 DFP dtstdg DFP Test Data Group Z23 59 227 191 DFP drintn[.] DFP Round To FP Integer Without Inexact X 59 258 193 DFP dctdp[.] DFP Convert To DFP Long 1296 Power ISATM Book Appendices Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 59 290 195 DFP dctfix[.] DFP Convert To Fixed X 59 322 197 DFP ddedpd[.] DFP Decode DPD To BCD X 59 354 198 DFP dxex[.] DFP Extract Biased Exponent X 59 514 173 DFP dsub[.] DFP Subtract X 59 546 176 DFP ddiv[.] DFP Divide X 59 642 178 DFP dcmpu DFP Compare Unordered X 59 674 182 DFP dtstsf DFP Test Significance X 59 770 194 DFP drsp[.] DFP Round To DFP Short X 59 802 195 DFP dcffix[.] DFP Convert From Fixed X 59 834 197 DFP denbcd[.] DFP Encode BCD To DPD X 59 846 145 FP fcfids[.] Floating Convert From Integer Doubleword Single X 59 866 198 DFP diex[.] DFP Insert Biased Exponent X 59 974 146 FP fcfidus[.] 
Floating Convert From Integer Doubleword Unsigned Single
XX3 60 8 501 VSX xxsldwi VSX Shift Left Double by Word Immediate
XX4 60 24 500 VSX xxsel VSX Select
XX3 60 40 500 VSX xxpermdi VSX Permute Doubleword Immediate
XX3 60 72 499 VSX xxmrghw VSX Merge High Word
XX3 60 128 342 VSX xsadddp VSX Scalar Add Double-Precision
XX3 60 132 365 VSX xsmaddadp VSX Scalar Multiply-Add Type-A Double-Precision
XX3 60 140 349 VSX xscmpudp VSX Scalar Compare Unordered Double-Precision
XX2 60 144 359 VSX xscvdpuxws VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate
XX2 60 146 386 VSX xsrdpi VSX Scalar Round to Double-Precision Integer
XX2 60 148 391 VSX xsrsqrtedp VSX Scalar Reciprocal Square Root Estimate Double-Precision
XX2 60 150 392 VSX xssqrtdp VSX Scalar Square Root Double-Precision
XX3 60 160 393 VSX xssubdp VSX Scalar Subtract Double-Precision
XX3 60 164 365 VSX xsmaddmdp VSX Scalar Multiply-Add Type-M Double-Precision
XX3 60 172 347 VSX xscmpodp VSX Scalar Compare Ordered Double-Precision
XX2 60 176 355 VSX xscvdpsxws VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate
XX2 60 178 389 VSX xsrdpiz VSX Scalar Round to Double-Precision Integer toward Zero
XX2 60 180 390 VSX xsredp VSX Scalar Reciprocal Estimate Double-Precision
XX3 60 192 375 VSX xsmuldp VSX Scalar Multiply Double-Precision
XX3 60 196 372 VSX xsmsubadp VSX Scalar Multiply-Subtract Type-A Double-Precision
XX3 60 200 499 VSX xxmrglw VSX Merge Low Word
XX2 60 210 388 VSX xsrdpip VSX Scalar Round to Double-Precision Integer toward +Infinity
XX2 60 212 396 VSX xstsqrtdp VSX Scalar Test for software Square Root Double-Precision
XX2 60 214 387 VSX xsrdpic VSX Scalar Round to Double-Precision Integer using Current rounding mode
XX3 60 224 363 VSX xsdivdp VSX Scalar Divide Double-Precision
XX3 60 228 372 VSX xsmsubmdp VSX Scalar Multiply-Subtract Type-M Double-Precision
XX2 60 242 388 VSX xsrdpim VSX Scalar Round to Double-Precision Integer toward -Infinity
XX3 60 244 395 VSX xstdivdp VSX Scalar Test for software Divide Double-Precision
XX3 60 256 402 VSX xvaddsp VSX Vector Add Single-Precision
XX3 60 260 437 VSX xvmaddasp VSX Vector Multiply-Add Type-A Single-Precision
XX3 60 268 405 VSX xvcmpeqsp VSX Vector Compare Equal To Single-Precision
XX2 60 272 427 VSX xvcvspuxws VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Word Saturate
XX2 60 274 482 VSX xvrspi VSX Vector Round to Single-Precision Integer
XX2 60 276 486 VSX xvrsqrtesp VSX Vector Reciprocal Square Root Estimate Single-Precision
XX2 60 278 488 VSX xvsqrtsp VSX Vector Square Root Single-Precision
XX3 60 288 491 VSX xvsubsp VSX Vector Subtract Single-Precision
XX3 60 292 440 VSX xvmaddmsp VSX Vector Multiply-Add Type-M Single-Precision
XX3 60 300 409 VSX xvcmpgtsp VSX Vector Compare Greater Than Single-Precision
XX2 60 304 423 VSX xvcvspsxws VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate
XX2 60 306 484 VSX xvrspiz VSX Vector Round to Single-Precision Integer toward Zero
XX2 60 308 481 VSX xvresp VSX Vector Reciprocal Estimate Single-Precision
XX3 60 320 459 VSX xvmulsp VSX Vector Multiply Single-Precision
XX3 60 324 451 VSX xvmsubasp VSX Vector Multiply-Subtract Type-A Single-Precision
XX3 60 328 501 VSX xxspltw VSX Splat Word
XX3 60 332 407 VSX xvcmpgesp VSX Vector Compare Greater Than or Equal To Single-Precision
XX2 60 336 432 VSX xvcvuxwsp VSX Vector Convert and round Unsigned Fixed-Point Word to Single-Precision format
XX2 60 338 483 VSX xvrspip VSX Vector Round to Single-Precision Integer toward +Infinity
XX2 60 340 495 VSX xvtsqrtsp VSX Vector Test for software Square Root Single-Precision
XX2 60 342 482 VSX xvrspic VSX Vector Round to Single-Precision Integer using Current rounding mode
XX3 60 352 435 VSX xvdivsp VSX Vector Divide Single-Precision
XX3 60 356 454 VSX xvmsubmsp VSX Vector Multiply-Subtract Type-M Single-Precision
XX2 60 368 430 VSX xvcvsxwsp VSX Vector Convert and round Signed Fixed-Point Word to Single-Precision format
XX2 60 370 483 VSX xvrspim VSX Vector Round to Single-Precision Integer toward -Infinity
XX3 60 372 494 VSX xvtdivsp VSX Vector Test for software Divide Single-Precision
XX3 60 384 398 VSX xvadddp VSX Vector Add Double-Precision
XX3 60 388 437 VSX xvmaddadp VSX Vector Multiply-Add Type-A Double-Precision
XX3 60 396 404 VSX xvcmpeqdp VSX Vector Compare Equal To Double-Precision
XX2 60 400 418 VSX xvcvdpuxws VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate
XX2 60 402 477 VSX xvrdpi VSX Vector Round to Double-Precision Integer
XX2 60 404 485 VSX xvrsqrtedp VSX Vector Reciprocal Square Root Estimate Double-Precision
XX2 60 406 487 VSX xvsqrtdp VSX Vector Square Root Double-Precision
XX3 60 416 489 VSX xvsubdp VSX Vector Subtract Double-Precision
XX3 60 420 440 VSX xvmaddmdp VSX Vector Multiply-Add Type-M Double-Precision
XX3 60 428 408 VSX xvcmpgtdp VSX Vector Compare Greater Than Double-Precision
XX2 60 432 414 VSX xvcvdpsxws VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Word Saturate
XX2 60 434 479 VSX xvrdpiz VSX Vector Round to Double-Precision Integer toward Zero
XX2 60 436 480 VSX xvredp VSX Vector Reciprocal Estimate Double-Precision
XX3 60 448 457 VSX xvmuldp VSX Vector Multiply Double-Precision
XX3 60 452 451 VSX xvmsubadp VSX Vector Multiply-Subtract Type-A Double-Precision
XX3 60 460 406 VSX xvcmpgedp VSX Vector Compare Greater Than or Equal To Double-Precision
XX2 60 464 432 VSX xvcvuxwdp VSX Vector Convert Unsigned Fixed-Point Word to Double-Precision format
XX2 60 466 479 VSX xvrdpip VSX Vector Round to Double-Precision Integer toward +Infinity
XX2 60 468 495 VSX xvtsqrtdp VSX Vector Test for software Square Root Double-Precision
XX2 60 470 478 VSX xvrdpic VSX Vector Round to Double-Precision Integer using Current rounding mode
XX3 60 480 433 VSX xvdivdp VSX Vector Divide Double-Precision
XX3 60 484 454 VSX xvmsubmdp VSX Vector Multiply-Subtract Type-M Double-Precision
XX2 60 496 430 VSX xvcvsxwdp VSX Vector Convert Signed Fixed-Point Word to Double-Precision format
XX2 60 498 478 VSX xvrdpim VSX Vector Round to Double-Precision Integer toward -Infinity
XX3 60 500 493 VSX xvtdivdp VSX Vector Test for software Divide Double-Precision
XX3 60 520 496 VSX xxland VSX Logical AND
XX2 60 530 352 VSX xscvdpsp VSX Scalar Convert Double-Precision to Single-Precision
XX3 60 552 496 VSX xxlandc VSX Logical AND with Complement
XX3 60 584 497 VSX xxlor VSX Logical OR
XX3 60 616 498 VSX xxlxor VSX Logical XOR
XX3 60 640 368 VSX xsmaxdp VSX Scalar Maximum Double-Precision
XX3 60 644 378 VSX xsnmaddadp VSX Scalar Negative Multiply-Add Type-A Double-Precision
XX3 60 648 497 VSX xxlnor VSX Logical NOR
XX2 60 656 357 VSX xscvdpuxds VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate
XX2 60 658 361 VSX xscvspdp VSX Scalar Convert Single-Precision to Double-Precision format
XX3 60 672 370 VSX xsmindp VSX Scalar Minimum Double-Precision
XX3 60 676 378 VSX xsnmaddmdp VSX Scalar Negative Multiply-Add Type-M Double-Precision
XX2 60 688 353 VSX xscvdpsxds VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate
XX2 60 690 341 VSX xsabsdp VSX Scalar Absolute Value Double-Precision
XX3 60 704 351 VSX xscpsgndp VSX Scalar Copy Sign Double-Precision
XX3 60 708 383 VSX xsnmsubadp VSX Scalar Negative Multiply-Subtract Type-A Double-Precision
XX2 60 720 362 VSX xscvuxddp VSX Scalar Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format
XX2 60 722 377 VSX xsnabsdp VSX Scalar Negative Absolute Value Double-Precision
XX3 60 740 383 VSX xsnmsubmdp VSX Scalar Negative Multiply-Subtract Type-M Double-Precision
XX2 60 752 361 VSX xscvsxddp VSX Scalar Convert and round Signed Fixed-Point Doubleword to Double-Precision format
XX2 60 754 377 VSX xsnegdp VSX Scalar Negate Double-Precision
XX3 60 768 445 VSX xvmaxsp VSX Vector Maximum Single-Precision
XX3 60 772 463 VSX xvnmaddasp VSX Vector Negative Multiply-Add Type-A Single-Precision
XX3 60 780 405 VSX xvcmpeqsp. VSX Vector Compare Equal To Single-Precision & Record
XX2 60 784 425 VSX xvcvspuxds VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate
XX2 60 786 411 VSX xvcvdpsp VSX Vector round and Convert Double-Precision to Single-Precision format
XX3 60 800 449 VSX xvminsp VSX Vector Minimum Single-Precision
XX3 60 804 468 VSX xvnmaddmsp VSX Vector Negative Multiply-Add Type-M Single-Precision
XX3 60 812 409 VSX xvcmpgtsp. VSX Vector Compare Greater Than Single-Precision & Record
XX2 60 816 421 VSX xvcvspsxds VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate
XX2 60 818 397 VSX xvabssp VSX Vector Absolute Value Single-Precision
XX3 60 832 410 VSX xvcpsgnsp VSX Vector Copy Sign Single-Precision
XX3 60 836 471 VSX xvnmsubasp VSX Vector Negative Multiply-Subtract Type-A Single-Precision
XX3 60 844 407 VSX xvcmpgesp. VSX Vector Compare Greater Than or Equal To Single-Precision & Record
XX2 60 848 431 VSX xvcvuxdsp VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Single-Precision format
XX2 60 850 461 VSX xvnabssp VSX Vector Negative Absolute Value Single-Precision
XX3 60 868 474 VSX xvnmsubmsp VSX Vector Negative Multiply-Subtract Type-M Single-Precision
XX2 60 880 429 VSX xvcvsxdsp VSX Vector Convert and round Signed Fixed-Point Doubleword to Single-Precision format
XX2 60 882 462 VSX xvnegsp VSX Vector Negate Single-Precision
XX3 60 896 443 VSX xvmaxdp VSX Vector Maximum Double-Precision
XX3 60 900 463 VSX xvnmaddadp VSX Vector Negative Multiply-Add Type-A Double-Precision
XX3 60 908 404 VSX xvcmpeqdp. VSX Vector Compare Equal To Double-Precision & Record
XX2 60 912 416 VSX xvcvdpuxds VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate
XX2 60 914 420 VSX xvcvspdp VSX Vector Convert Single-Precision to Double-Precision
XX3 60 928 447 VSX xvmindp VSX Vector Minimum Double-Precision
XX3 60 932 468 VSX xvnmaddmdp VSX Vector Negative Multiply-Add Type-M Double-Precision
XX3 60 940 408 VSX xvcmpgtdp. VSX Vector Compare Greater Than Double-Precision & Record
XX2 60 944 412 VSX xvcvdpsxds VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword Saturate
XX2 60 946 397 VSX xvabsdp VSX Vector Absolute Value Double-Precision
XX3 60 960 410 VSX xvcpsgndp VSX Vector Copy Sign Double-Precision
XX3 60 964 471 VSX xvnmsubadp VSX Vector Negative Multiply-Subtract Type-A Double-Precision
XX3 60 972 406 VSX xvcmpgedp. VSX Vector Compare Greater Than or Equal To Double-Precision & Record
XX2 60 976 431 VSX xvcvuxddp VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format
XX2 60 978 461 VSX xvnabsdp VSX Vector Negative Absolute Value Double-Precision
XX3 60 996 474 VSX xvnmsubmdp VSX Vector Negative Multiply-Subtract Type-M Double-Precision
XX2 60 1008 429 VSX xvcvsxddp VSX Vector Convert and round Signed Fixed-Point Doubleword to Double-Precision format
XX2 60 1010 462 VSX xvnegdp VSX Vector Negate Double-Precision
DS 61 - 131 FP.out stfdp Store Floating-Point Double Pair
DS 62 0 54 64 std Store Doubleword
DS 62 1 54 64 stdu Store Doubleword with Update
DS 62 2 P 759 LSQ stq Store Quadword
X 63 0 148 FP fcmpu Floating Compare Unordered
X 63 2 173 DFP daddq[.] DFP Add Quad
Z23 63 3 184 DFP dquaq[.] DFP Quantize Quad
X 63 8 132 FP[R] fcpsgn[.] Floating Copy Sign
X 63 12 140 FP[R] frsp[.] Floating Round to Single-Precision
X 63 14 142 FP[R] fctiw[.] Floating Convert To Integer Word
X 63 15 143 FP[R] fctiwz[.] Floating Convert To Integer Word with round toward Zero
A 63 18 134 FP[R] fdiv[.] Floating Divide
A 63 20 133 FP[R] fsub[.] Floating Subtract
A 63 21 133 FP[R] fadd[.] Floating Add
A 63 22 135 FP[R] fsqrt[.] Floating Square Root
A 63 23 149 FP[R] fsel[.] Floating Select
A 63 24 135 FP[R] fre[.] Floating Reciprocal Estimate
A 63 25 134 FP[R] fmul[.] Floating Multiply
A 63 26 136 FP[R].in frsqrte[.] Floating Reciprocal Square Root Estimate
A 63 28 138 FP[R] fmsub[.] Floating Multiply-Subtract
A 63 29 138 FP[R] fmadd[.] Floating Multiply-Add
A 63 30 139 FP[R] fnmsub[.] Floating Negative Multiply-Subtract
A 63 31 139 FP[R] fnmadd[.] Floating Negative Multiply-Add
X 63 32 148 FP fcmpo Floating Compare Ordered
X 63 34 175 DFP dmulq[.] DFP Multiply Quad
Z23 63 35 186 DFP drrndq[.] DFP Reround Quad
X 63 38 152 FP[R] mtfsb1[.] Move To FPSCR Bit 1
X 63 40 132 FP[R] fneg[.] Floating Negate
X 63 64 150 FP mcrfs Move to Condition Register from FPSCR
Z23 63 66 200 DFP dscliq[.] DFP Shift Significand Left Immediate Quad
Z23 63 67 183 DFP dquaiq[.] DFP Quantize Immediate Quad
X 63 70 152 FP[R] mtfsb0[.] Move To FPSCR Bit 0
X 63 72 132 FP[R] fmr[.] Floating Move Register
Z 63 98 200 DFP dscriq[.] DFP Shift Significand Right Immediate Quad
Z23 63 99 189 DFP drintxq[.] DFP Round To FP Integer With Inexact Quad
X 63 128 137 FP ftdiv Floating Test for software Divide
X 63 130 179 DFP dcmpoq DFP Compare Ordered Quad
X 63 134 151 FP[R] mtfsfi[.] Move To FPSCR Field Immediate
X 63 136 132 FP[R] fnabs[.] Floating Negative Absolute Value
X 63 142 143 FP fctiwu[.] Floating Convert To Integer Word Unsigned
X 63 143 144 FP fctiwuz[.] Floating Convert To Integer Word Unsigned with round toward Zero
X 63 160 137 FP ftsqrt Floating Test for software Square Root
X 63 162 181 DFP dtstexq DFP Test Exponent Quad
Z23 63 194 180 DFP dtstdcq DFP Test Data Class Quad
Z23 63 226 180 DFP dtstdgq DFP Test Data Group Quad
Z23 63 227 191 DFP drintnq[.] DFP Round To FP Integer Without Inexact Quad
X 63 258 193 DFP dctqpq[.] DFP Convert To DFP Extended
X 63 264 132 FP[R] fabs[.] Floating Absolute Value
X 63 290 195 DFP dctfixq[.] DFP Convert To Fixed Quad
X 63 322 197 DFP ddedpdq[.] DFP Decode DPD To BCD Quad
X 63 354 198 DFP dxexq[.] DFP Extract Biased Exponent Quad
X 63 392 147 FP[R].in frin[.] Floating Round to Integer Nearest
X 63 424 147 FP[R].in friz[.] Floating Round to Integer Toward Zero
X 63 456 147 FP[R].in frip[.] Floating Round to Integer Plus
X 63 488 147 FP[R].in frim[.] Floating Round to Integer Minus
X 63 514 173 DFP dsubq[.] DFP Subtract Quad
X 63 546 176 DFP ddivq[.] DFP Divide Quad
X 63 583 150 FP[R] mffs[.] Move From FPSCR
X 63 642 179 DFP dcmpuq DFP Compare Unordered Quad
X 63 674 182 DFP dtstsfq DFP Test Significance Quad
XFL 63 711 151 FP[R] mtfsf[.] Move To FPSCR Fields
X 63 770 194 DFP drdpq[.] DFP Round To DFP Long
X 63 802 195 DFP dcffixq[.]
DFP Convert From Fixed Quad
X 63 814 140 FP[R] fctid[.] Floating Convert To Integer Doubleword
X 63 815 141 FP[R] fctidz[.] Floating Convert To Integer Doubleword with round toward Zero
X 63 834 197 DFP denbcdq[.] DFP Encode BCD To DPD Quad
X 63 846 144 FP[R] fcfid[.] Floating Convert From Integer Doubleword
X 63 866 198 DFP diexq[.] DFP Insert Biased Exponent Quad
X 63 942 141 FP fctidu[.] Floating Convert To Integer Doubleword Unsigned
X 63 943 142 FP fctiduz[.] Floating Convert To Integer Doubleword Unsigned with round toward Zero
X 63 974 145 FP fcfidu[.] Floating Convert From Integer Doubleword Unsigned

¹ See the key to the mode dependency and privilege columns on page 1324 and the key to the category column in Section 1.3.5 of Book I.

Appendix I. Power ISA Instruction Set Sorted by Mnemonic

This appendix lists all the instructions in the Power ISA, in order by mnemonic.

Form | Opcode (Pri, Ext) | Mode Dep.¹ | Priv¹ | Page | Cat¹ | Mnemonic | Instruction
XO 31 266 SR 63 B add[o][.] Add
XO 31 10 SR 64 B addc[o][.] Add Carrying
XO 31 138 SR 65 B adde[o][.] Add Extended
XO 31 74 H 97 BCDA addg6s Add and Generate Sixes
D 14 62 B addi Add Immediate
D 12 SR 63 B addic Add Immediate Carrying
D 13 SR 63 B addic. Add Immediate Carrying and Record
D 15 62 B addis Add Immediate Shifted
XO 31 234 SR 65 B addme[o][.] Add to Minus One Extended
XO 31 202 SR 66 B addze[o][.] Add to Zero Extended
X 31 28 SR 80 B and[.] AND
X 31 60 SR 81 B andc[.] AND with Complement
D 28 SR 78 B andi. AND Immediate
D 29 SR 78 B andis.
AND Immediate Shifted
I 18 35 B b[l][a] Branch
B 16 CT 35 B bc[l][a] Branch Conditional
XL 19 528 CT 36 B bcctr[l] Branch Conditional to Count Register
XL 19 16 CT 36 B bclr[l] Branch Conditional to Link Register
X 31 252 86 64 bpermd Bit Permute Doubleword
EVX 4 527 510 SP brinc Bit Reversed Increment
X 31 314 H 97 BCDA cbcdtd Convert Binary Coded Decimal to Declets
X 31 282 H 97 BCDA cdtbcd Convert Declets To Binary Coded Decimal
X 31 0 74 B cmp Compare
X 31 508 82 B cmpb Compare Bytes
D 11 74 B cmpi Compare Immediate
X 31 32 75 B cmpl Compare Logical
D 10 75 B cmpli Compare Logical Immediate
X 31 58 SR 85 64 cntlzd[.] Count Leading Zeros Doubleword
X 31 26 SR 81 B cntlzw[.] Count Leading Zeros Word
XL 19 257 37 B crand Condition Register AND
XL 19 129 38 B crandc Condition Register AND with Complement
XL 19 289 38 B creqv Condition Register Equivalent
XL 19 225 37 B crnand Condition Register NAND
XL 19 33 38 B crnor Condition Register NOR
XL 19 449 37 B cror Condition Register OR
XL 19 417 38 B crorc Condition Register OR with Complement
XL 19 193 37 B crxor Condition Register XOR
X 59 2 173 DFP dadd[.] DFP Add
X 63 2 173 DFP daddq[.] DFP Add Quad
X 31 758 687 E dcba Data Cache Block Allocate
X 31 86 691 B dcbf Data Cache Block Flush
X 31 127 P 932 E.PD dcbfep Data Cache Block Flush by External PID
X 31 470 P 988 E dcbi Data Cache Block Invalidate
X 31 390 M 992 ECL dcblc Data Cache Block Lock Clear
X 31 54 691 B dcbst Data Cache Block Store
X 31 63 P 931 E.PD dcbstep Data Cache Block Store by External PID
X 31 278 688 B dcbt Data Cache Block Touch
X 31 319 P 931 E.PD dcbtep Data Cache Block Touch by External PID
X 31 166 M 991 ECL dcbtls Data Cache Block Touch and Lock Set
X 31 246 689 B dcbtst Data Cache Block Touch for Store
X 31 255 P 933 E.PD dcbtstep Data Cache Block Touch for Store by External PID
X 31 134 M 991 ECL dcbtstls Data Cache Block Touch for Store and Lock Set
X 31 1014 691 B dcbz Data Cache Block set to Zero
X 31 1023 P 934 E.PD dcbzep Data Cache Block set to Zero by External PID
X 59 802 195 DFP dcffix[.] DFP Convert From Fixed
X 63 802 195 DFP dcffixq[.] DFP Convert From Fixed Quad
X 31 454 P 1103 E.CI dci Data Cache Invalidate
X 59 130 179 DFP dcmpo DFP Compare Ordered
X 63 130 179 DFP dcmpoq DFP Compare Ordered Quad
X 59 642 178 DFP dcmpu DFP Compare Unordered
X 63 642 179 DFP dcmpuq DFP Compare Unordered Quad
X 31 326 P 1106 E.CD dcread Data Cache Read [Alternative Encoding]
X 31 486 P 1106 E.CD dcread Data Cache Read
X 59 258 193 DFP dctdp[.] DFP Convert To DFP Long
X 59 290 195 DFP dctfix[.] DFP Convert To Fixed
X 63 290 195 DFP dctfixq[.] DFP Convert To Fixed Quad
X 63 258 193 DFP dctqpq[.] DFP Convert To DFP Extended
X 59 322 197 DFP ddedpd[.] DFP Decode DPD To BCD
X 63 322 197 DFP ddedpdq[.] DFP Decode DPD To BCD Quad
X 59 546 176 DFP ddiv[.] DFP Divide
X 63 546 176 DFP ddivq[.] DFP Divide Quad
X 59 834 197 DFP denbcd[.] DFP Encode BCD To DPD
X 63 834 197 DFP denbcdq[.] DFP Encode BCD To DPD Quad
X 59 866 198 DFP diex[.] DFP Insert Biased Exponent
X 63 866 198 DFP diexq[.] DFP Insert Biased Exponent Quad
XO 31 489 SR 72 64 divd[o][.] Divide Doubleword
XO 31 425 SR 73 64 divde[o][.] Divide Doubleword Extended
XO 31 393 SR 73 64 divdeu[o][.] Divide Doubleword Extended Unsigned
XO 31 457 SR 72 64 divdu[o][.] Divide Doubleword Unsigned
XO 31 491 SR 68 B divw[o][.] Divide Word
XO 31 427 SR 69 B divwe[o][.] Divide Word Extended
XO 31 395 SR 69 B divweu[o][.] Divide Word Extended Unsigned
XO 31 459 SR 68 B divwu[o][.] Divide Word Unsigned
X 31 78 591 LMV dlmzb[.] Determine Leftmost Zero Byte
X 59 34 175 DFP dmul[.] DFP Multiply
X 63 34 175 DFP dmulq[.] DFP Multiply Quad
XFX 19 198 1092 E.ED dnh Debugger Notify Halt
XL 19 402 H 748 S doze Doze
Z 59 3 184 DFP dqua[.] DFP Quantize
Z23 59 67 183 DFP dquai[.] DFP Quantize Immediate
Z23 63 67 183 DFP dquaiq[.] DFP Quantize Immediate Quad
Z23 63 3 184 DFP dquaq[.] DFP Quantize Quad
X 63 770 194 DFP drdpq[.] DFP Round To DFP Long
Z23 59 227 191 DFP drintn[.] DFP Round To FP Integer Without Inexact
Z23 63 227 191 DFP drintnq[.] DFP Round To FP Integer Without Inexact Quad
Z23 59 99 189 DFP drintx[.] DFP Round To FP Integer With Inexact
Z23 63 99 189 DFP drintxq[.] DFP Round To FP Integer With Inexact Quad
Z 59 35 186 DFP drrnd[.] DFP Reround
Z23 63 35 186 DFP drrndq[.] DFP Reround Quad
X 59 770 194 DFP drsp[.] DFP Round To DFP Short
Z23 59 66 200 DFP dscli[.] DFP Shift Significand Left Immediate
Z23 63 66 200 DFP dscliq[.] DFP Shift Significand Left Immediate Quad
Z 59 98 200 DFP dscri[.] DFP Shift Significand Right Immediate
Z 63 98 200 DFP dscriq[.] DFP Shift Significand Right Immediate Quad
X 31 483 714 DS dsn Decorated Storage Notify
X 59 514 173 DFP dsub[.] DFP Subtract
X 63 514 173 DFP dsubq[.] DFP Subtract Quad
Z23 59 194 180 DFP dtstdc DFP Test Data Class
Z23 63 194 180 DFP dtstdcq DFP Test Data Class Quad
Z23 59 226 180 DFP dtstdg DFP Test Data Group
Z23 63 226 180 DFP dtstdgq DFP Test Data Group Quad
X 59 162 181 DFP dtstex DFP Test Exponent
X 63 162 181 DFP dtstexq DFP Test Exponent Quad
X 59 674 182 DFP dtstsf DFP Test Significance
X 63 674 182 DFP dtstsfq DFP Test Significance Quad
X 59 354 198 DFP dxex[.] DFP Extract Biased Exponent
X 63 354 198 DFP dxexq[.]
DFP Extract Biased Exponent Quad
X 31 310 716 EC eciwx External Control In Word Indexed
X 31 438 716 EC ecowx External Control Out Word Indexed
EVX 4 740 577 SP.FD efdabs Floating-Point Double-Precision Absolute Value
EVX 4 736 578 SP.FD efdadd Floating-Point Double-Precision Add
EVX 4 751 584 SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Precision
EVX 4 755 582 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Fraction
EVX 4 753 581 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Integer
EVX 4 739 582 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX 4 754 582 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction
EVX 4 752 581 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer
EVX 4 738 582 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword
EVX 4 750 579 SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal
EVX 4 748 579 SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than
EVX 4 749 579 SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than
EVX 4 759 584 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction
EVX 4 757 582 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer
EVX 4 747 583 SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero
EVX 4 762 584 SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero
EVX 4 758 584 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Fraction
EVX 4 756 582 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Integer
EVX 4 746 583 SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero
EVX 4 760 584 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero
EVX 4 745 578 SP.FD efddiv Floating-Point Double-Precision Divide
EVX 4 744 578 SP.FD efdmul Floating-Point Double-Precision Multiply
EVX 4 741 577 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value
EVX 4 742 577 SP.FD efdneg Floating-Point Double-Precision Negate
EVX 4 737 578 SP.FD efdsub Floating-Point Double-Precision Subtract
EVX 4 766 580 SP.FD efdtsteq Floating-Point Double-Precision Test Equal
EVX 4 764 579 SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than
EVX 4 765 580 SP.FD efdtstlt Floating-Point Double-Precision Test Less Than
EVX 4 708 570 SP.FS efsabs Floating-Point Single-Precision Absolute Value
EVX 4 704 571 SP.FS efsadd Floating-Point Single-Precision Add
EVX 4 719 585 SP.FD efscfd Floating-Point Single-Precision Convert from Double-Precision
EVX 4 723 575 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 721 575 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer
EVX 4 722 575 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 720 575 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 718 573 SP.FS efscmpeq Floating-Point Single-Precision Compare Equal
EVX 4 716 572 SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than
EVX 4 717 572 SP.FS efscmplt Floating-Point Single-Precision Compare Less Than
EVX 4 727 576 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 725 575 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer
EVX 4 730 576 SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 726 576 SP.FS efsctuf Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 724 575 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 728 576 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 713 571 SP.FS efsdiv Floating-Point Single-Precision Divide
EVX 4 712 571 SP.FS efsmul Floating-Point Single-Precision Multiply
EVX 4 709 570 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value
EVX 4 710 570 SP.FS efsneg Floating-Point Single-Precision Negate
EVX 4 705 571 SP.FS efssub Floating-Point Single-Precision Subtract
EVX 4 734 574 SP.FS efststeq Floating-Point Single-Precision Test Equal
EVX 4 732 573 SP.FS efststgt Floating-Point Single-Precision Test Greater Than
EVX 4 733 574 SP.FS efststlt Floating-Point Single-Precision Test Less Than
XL 31 270 911 E.HV ehpriv Embedded Hypervisor Privilege
X 31 854 703 S eieio Enforce In-order Execution of I/O
X 31 284 SR 81 B eqv[.] Equivalent
EVX 4 520 510 SP evabs Vector Absolute Value
EVX 4 514 510 SP evaddiw Vector Add Immediate Word
EVX 4 1225 510 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word
EVX 4 1217 511 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word
EVX 4 1224 511 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1216 511 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX 4 512 511 SP evaddw Vector Add Word
EVX 4 529 512 SP evand Vector AND
EVX 4 530 512 SP evandc Vector AND with Complement
EVX 4 564 512 SP evcmpeq Vector Compare Equal
EVX 4 561 512 SP evcmpgts Vector Compare Greater Than Signed
EVX 4 560 513 SP evcmpgtu Vector Compare Greater Than Unsigned
EVX 4 563 513 SP evcmplts Vector Compare Less Than Signed
EVX 4 562 513 SP evcmpltu Vector Compare Less Than Unsigned
EVX 4 526 514 SP evcntlsw Vector Count Leading Signed Bits Word
EVX 4 525 514 SP evcntlzw Vector Count Leading Zeros Word
EVX 4 1222 514 SP evdivws Vector Divide Word Signed
EVX 4 1223 515 SP evdivwu Vector Divide Word Unsigned
EVX 4 537 515 SP eveqv Vector Equivalent
EVX 4 522 515 SP evextsb Vector Extend Sign Byte
EVX 4 523 515 SP evextsh Vector Extend Sign Halfword
EVX 4 644 562 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value
EVX 4 640 563 SP.FV evfsadd Vector Floating-Point Single-Precision Add
EVX 4 659 567 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 657 567 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer
EVX 4 658 567 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 656 567 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 654 565 SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal
EVX 4 652 564 SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than
EVX 4 653 564 SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than
EVX 4 663 569 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 661 568 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer
EVX 4 666 568 SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 662 569 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 660 568 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 664 568 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 649 563 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide
EVX 4 648 563 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply
EVX 4 645 562 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value
EVX 4 646 562 SP.FV evfsneg Vector Floating-Point Single-Precision Negate
EVX 4 641 563 SP.FV evfssub Vector Floating-Point Single-Precision Subtract
EVX 4 670 566 SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal
EVX 4 668 565 SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than
EVX 4 669 566 SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than
EVX 4 769 516 SP evldd Vector Load Double Word into Double Word
EVX 31 799 P 936 E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed
EVX 4 768 516 SP evlddx Vector Load Double Word into Double Word Indexed
EVX 4 773 516 SP evldh Vector Load Double into Four Halfwords
EVX 4 772 516 SP evldhx Vector Load Double into Four Halfwords Indexed
EVX 4 771 517 SP evldw Vector Load Double into Two Words
EVX 4 770 517 SP evldwx Vector Load Double into Two Words Indexed
EVX 4 777 517 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat
EVX 4 776 517 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed
EVX 4 783 518 SP evlhhossplat Vector Load Halfword into Halfword Odd Signed and Splat
EVX 4 782 518 SP evlhhossplatx Vector Load Halfword into Halfword Odd Signed and Splat Indexed
EVX 4 781 518 SP evlhhousplat Vector Load Halfword into Halfword Odd Unsigned and Splat
EVX 4 780 518 SP evlhhousplatx Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed
EVX 4 785 519 SP evlwhe Vector Load Word into Two Halfwords Even
EVX 4 784 519 SP evlwhex Vector Load Word into Two Halfwords Even Indexed
EVX 4 791 519 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX 4 790 519 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX 4 789 520 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX 4 788 520 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX 4 797 520 SP evlwhsplat Vector Load Word into Two Halfwords and Splat
EVX 4 796 520 SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed
EVX 4 793 521 SP evlwwsplat Vector Load Word into Word and Splat
EVX 4 792 521 SP evlwwsplatx Vector Load Word into Word and Splat Indexed
EVX 4 556 521 SP evmergehi Vector Merge High
EVX 4 558 522 SP evmergehilo Vector Merge High/Low
EVX 4 557 521 SP evmergelo Vector Merge Low
EVX 4 559 522 SP evmergelohi Vector Merge Low/High
EVX 4 1323 522 SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1451 522 SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1321 523 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1449 523 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1320 523 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1448 523 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1035 524 SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX 4 1067 524 SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX 4 1291 524 SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and
Accumulate into Words
EVX 4 1419 524 SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1033 525 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer
EVX 4 1065 525 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX 4 1289 525 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1417 525 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1027 526 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
EVX 4 1059 526 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX 4 1283 527 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1411 527 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1281 528 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1409 528 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1032 529 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX 4 1064 529 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX 4 1288 529 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1416 529 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1280 530 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1408 530 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1327 531 SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1455 531 SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1325 531 SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1453 531 SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1324 532 SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1452 532 SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1039 532 SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX 4 1071 532 SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
EVX 4 1295 533 SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1423 533 SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1037 533 SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX 4 1069 533 SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX 4 1293 534 SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1421 533 SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1031 535 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional
EVX 4 1063 535 SP evmhossfa Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator
EVX 4 1287 536 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1415 536 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1285 537 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1413 537 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1036 537 SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
EVX 4 1068 537 SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX 4 1292 538 SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1420 534 SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1284 538 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1412 538 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1220 539 SP evmra Initialize Accumulator
EVX 4 1103 539 SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional
EVX 4 1135 539 SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX 4 1101 539 SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer
EVX 4 1133 539 SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator
EVX 4 1095 540 SP evmwhssf Vector Multiply Word High Signed, Saturate, Fractional
EVX 4 1127 540 SP evmwhssfa Vector Multiply Word High Signed, Saturate, Fractional to Accumulator
EVX 4 1100 540 SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer
EVX 4 1132 540 SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX 4 1353 541 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX 4 1481 541 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words
EVX 4 1345 541 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
EVX 4 1473 541 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words
EVX 4 1096 542 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer
EVX 4 1128 542 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX 4 1352 542 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1480 542 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX 4 1344 543 SP evmwlusiaaw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1472 543 SP evmwlusianw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words
EVX 4 1115 543 SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional
EVX 4 1147 543 SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator
EVX 4 1371 544 SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX 4 1499 544 SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1113 544 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer
EVX 4 1145 544 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX 4 1369 544 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX 4 1497 544 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX 4 1107 545 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional
EVX 4 1139 545 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX 4 1363 545 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX 4 1491 546 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX 4 1112 546 SP evmwumi Vector Multiply Word
Unsigned, Modulo, Integer
EVX 4 1144 546 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX 4 1368 547 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX 4 1496 547 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 542 547 SP evnand Vector NAND
EVX 4 521 547 SP evneg Vector Negate
EVX 4 536 547 SP evnor Vector NOR
EVX 4 535 548 SP evor Vector OR
EVX 4 539 548 SP evorc Vector OR with Complement
EVX 4 552 548 SP evrlw Vector Rotate Left Word
EVX 4 554 549 SP evrlwi Vector Rotate Left Word Immediate
EVX 4 524 549 SP evrndw Vector Round Word
EVS 4 79 549 SP evsel Vector Select
EVX 4 548 550 SP evslw Vector Shift Left Word
EVX 4 550 550 SP evslwi Vector Shift Left Word Immediate
EVX 4 555 550 SP evsplatfi Vector Splat Fractional Immediate
EVX 4 553 550 SP evsplati Vector Splat Immediate
EVX 4 547 550 SP evsrwis Vector Shift Right Word Immediate Signed
EVX 4 546 550 SP evsrwiu Vector Shift Right Word Immediate Unsigned
EVX 4 545 551 SP evsrws Vector Shift Right Word Signed
EVX 4 544 551 SP evsrwu Vector Shift Right Word Unsigned
EVX 4 801 551 SP evstdd Vector Store Double of Double
EVX 31 927 P 936 E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed
EVX 4 800 551 SP evstddx Vector Store Double of Double Indexed
EVX 4 805 552 SP evstdh Vector Store Double of Four Halfwords
EVX 4 804 552 SP evstdhx Vector Store Double of Four Halfwords Indexed
EVX 4 803 552 SP evstdw Vector Store Double of Two Words
EVX 4 802 552 SP evstdwx Vector Store Double of Two Words Indexed
EVX 4 817 553 SP evstwhe Vector Store Word of Two Halfwords from Even
EVX 4 816 553 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed
EVX 4 821 553 SP evstwho Vector Store Word of Two Halfwords from Odd
EVX 4 820 553 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed
EVX 4 825 553 SP evstwwe Vector Store Word of Word from Even
EVX 4 824 553 SP evstwwex Vector Store Word of Word from Even Indexed
EVX 4 829 554 SP evstwwo Vector Store Word of Word from Odd
EVX 4 828 554 SP evstwwox Vector Store Word of Word from Odd Indexed
EVX 4 1227 554 SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX 4 1219 554 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX 4 1226 555 SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1218 555 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX 4 516 555 SP evsubfw Vector Subtract from Word
EVX 4 518 555 SP evsubifw Vector Subtract Immediate from Word
EVX 4 534 555 SP evxor Vector XOR
X 31 954 SR 81 B extsb[.] Extend Sign Byte
X 31 922 SR 81 B extsh[.] Extend Sign Halfword
X 31 986 SR 85 64 extsw[.] Extend Sign Word
X 63 264 132 FP[R] fabs[.] Floating Absolute Value
A 63 21 133 FP[R] fadd[.] Floating Add
A 59 21 133 FP[R] fadds[.] Floating Add Single
X 63 846 144 FP[R] fcfid[.] Floating Convert From Integer Doubleword
X 59 846 145 FP fcfids[.] Floating Convert From Integer Doubleword Single
X 63 974 145 FP fcfidu[.] Floating Convert From Integer Doubleword Unsigned
X 59 974 146 FP fcfidus[.] Floating Convert From Integer Doubleword Unsigned Single
X 63 32 148 FP fcmpo Floating Compare Ordered
X 63 0 148 FP fcmpu Floating Compare Unordered
X 63 8 132 FP[R] fcpsgn[.] Floating Copy Sign
X 63 814 140 FP[R] fctid[.] Floating Convert To Integer Doubleword
X 63 942 141 FP fctidu[.] Floating Convert To Integer Doubleword Unsigned
X 63 943 142 FP fctiduz[.] Floating Convert To Integer Doubleword Unsigned with round toward Zero
X 63 815 141 FP[R] fctidz[.] Floating Convert To Integer Doubleword with round toward Zero
X 63 14 142 FP[R] fctiw[.] Floating Convert To Integer Word
X 63 142 143 FP fctiwu[.] Floating Convert To Integer Word Unsigned
X 63 143 144 FP fctiwuz[.] Floating Convert To Integer Word Unsigned with round toward Zero
X 63 15 143 FP[R] fctiwz[.] Floating Convert To Integer Word with round toward Zero
A 63 18 134 FP[R] fdiv[.] Floating Divide
A 59 18 134 FP[R] fdivs[.] Floating Divide Single
A 63 29 138 FP[R] fmadd[.] Floating Multiply-Add
A 59 29 138 FP[R] fmadds[.] Floating Multiply-Add Single
X 63 72 132 FP[R] fmr[.] Floating Move Register
A 63 28 138 FP[R] fmsub[.] Floating Multiply-Subtract
A 59 28 138 FP[R] fmsubs[.] Floating Multiply-Subtract Single
A 63 25 134 FP[R] fmul[.] Floating Multiply
A 59 25 134 FP[R] fmuls[.] Floating Multiply Single
X 63 136 132 FP[R] fnabs[.] Floating Negative Absolute Value
X 63 40 132 FP[R] fneg[.] Floating Negate
A 63 31 139 FP[R] fnmadd[.] Floating Negative Multiply-Add
A 59 31 139 FP[R] fnmadds[.] Floating Negative Multiply-Add Single
A 63 30 139 FP[R] fnmsub[.] Floating Negative Multiply-Subtract
A 59 30 139 FP[R] fnmsubs[.] Floating Negative Multiply-Subtract Single
A 63 24 135 FP[R] fre[.] Floating Reciprocal Estimate
A 59 24 135 FP[R] fres[.] Floating Reciprocal Estimate Single
X 63 488 147 FP[R].in frim[.] Floating Round to Integer Minus
X 63 392 147 FP[R].in frin[.] Floating Round to Integer Nearest
X 63 456 147 FP[R].in frip[.] Floating Round to Integer Plus
X 63 424 147 FP[R].in friz[.] Floating Round to Integer Toward Zero
X 63 12 140 FP[R] frsp[.] Floating Round to Single-Precision
A 63 26 136 FP[R].in frsqrte[.] Floating Reciprocal Square Root Estimate
A 59 26 136 FP[R].in frsqrtes[.] Floating Reciprocal Square Root Estimate Single
A 63 23 149 FP[R] fsel[.] Floating Select
A 63 22 135 FP[R] fsqrt[.] Floating Square Root
A 59 22 135 FP[R] fsqrts[.] Floating Square Root Single
A 63 20 133 FP[R] fsub[.] Floating Subtract
A 59 20 133 FP[R] fsubs[.]
Floating Subtract Single
X 63 128 137 FP ftdiv Floating Test for software Divide
X 63 160 137 FP ftsqrt Floating Test for software Square Root
XL 19 274 H 746 S hrfid Hypervisor Return From Interrupt Doubleword
X 31 982 680 B icbi Instruction Cache Block Invalidate
X 31 991 P 934 E.PD icbiep Instruction Cache Block Invalidate by External PID
X 31 230 M 993 ECL icblc Instruction Cache Block Lock Clear
X 31 22 680 E icbt Instruction Cache Block Touch
X 31 486 M 992 ECL icbtls Instruction Cache Block Touch and Lock Set
X 31 966 P 1103 E.CI ici Instruction Cache Invalidate
X 31 998 P 1107 E.CD icread Instruction Cache Read
A 31 15 77 B isel Integer Select
XL 19 150 693 B isync Instruction Synchronize
X 31 52 694 B lbarx Load Byte and Reserve Indexed
X 31 515 712 DS lbdx Load Byte with Decoration Indexed
X 31 95 P 927 E.PD lbepx Load Byte by External Process ID Indexed
D 34 45 B lbz Load Byte and Zero
X 31 853 H 757 S lbzcix Load Byte and Zero Caching Inhibited Indexed
D 35 45 B lbzu Load Byte and Zero with Update
X 31 119 45 B lbzux Load Byte and Zero with Update Indexed
X 31 87 46 B lbzx Load Byte and Zero Indexed
DS 58 0 50 64 ld Load Doubleword
X 31 84 699 64 ldarx Load Doubleword And Reserve Indexed
X 31 532 56 64 ldbrx Load Doubleword Byte-Reverse Indexed
X 31 885 H 757 S ldcix Load Doubleword Caching Inhibited Indexed
X 31 611 712 DS lddx Load Doubleword with Decoration Indexed
X 31 29 P 928 E.PD;64 ldepx Load Doubleword by External Process ID Indexed
DS 58 1 50 64 ldu Load Doubleword with Update
X 31 53 50 64 ldux Load Doubleword with Update Indexed
X 31 21 50 64 ldx Load Doubleword Indexed
D 50 125 FP lfd Load Floating-Point Double
X 31 803 712 DS lfddx Load Floating Doubleword with Decoration Indexed
X 31 607 P 935 E.PD lfdepx Load Floating-Point Double by External Process ID Indexed
DS 57 0 131 FP.out lfdp Load Floating-Point Double Pair
X 31 791 131 FP.out lfdpx Load Floating-Point Double Pair Indexed
D 51 125 FP lfdu Load Floating-Point Double with Update
X 31 631 125 FP lfdux Load Floating-Point Double with Update Indexed
X 31 599 125 FP lfdx Load Floating-Point Double Indexed
X 31 855 126 FP lfiwax Load Floating-Point as Integer Word Algebraic Indexed
X 31 887 126 FP lfiwzx Load Floating-Point as Integer Word and Zero Indexed
D 48 128 FP lfs Load Floating-Point Single
D 49 128 FP lfsu Load Floating-Point Single with Update
X 31 567 128 FP lfsux Load Floating-Point Single with Update Indexed
X 31 535 128 FP lfsx Load Floating-Point Single Indexed
D 42 47 B lha Load Halfword Algebraic
X 31 116 695 B lharx Load Halfword and Reserve Indexed
D 43 47 B lhau Load Halfword Algebraic with Update
X 31 375 47 B lhaux Load Halfword Algebraic with Update Indexed
X 31 343 47 B lhax Load Halfword Algebraic Indexed
X 31 790 55 B lhbrx Load Halfword Byte-Reverse Indexed
X 31 547 712 DS lhdx Load Halfword with Decoration Indexed
X 31 287 P 927 E.PD lhepx Load Halfword by External Process ID Indexed
D 40 46 B lhz Load Halfword and Zero
X 31 821 H 757 S lhzcix Load Halfword and Zero Caching Inhibited Indexed
D 41 46 B lhzu Load Halfword and Zero with Update
X 31 311 46 B lhzux Load Halfword and Zero with Update Indexed
X 31 279 46 B lhzx Load Halfword and Zero Indexed
D 46 57 B lmw Load Multiple Word
DQ 56 P 759 LSQ lq Load Quadword
X 31 597 59 MA lswi Load String Word Immediate
X 31 533 59 MA lswx Load String Word Indexed
X 31 7 216 V lvebx Load Vector Element Byte Indexed
X 31 39 213 V lvehx Load Vector Element Halfword Indexed
X 31 295 P 937 E.PD lvepx Load Vector by External Process ID Indexed
X 31 263 P 937 E.PD lvepxl Load Vector by External Process ID Indexed LRU
X 31 71 213 V lvewx Load Vector Element Word Indexed
X 31 6 218 V lvsl Load Vector for Shift Left Indexed
X 31 38 218 V lvsr Load Vector for Shift Right Indexed
X 31 103 214 V lvx Load Vector Indexed
X 31 359 214 V lvxl Load Vector Indexed LRU
DS 58 2 49 64 lwa Load Word Algebraic
X 31 20 694 B lwarx Load Word And Reserve Indexed
X 31 373 49 64 lwaux Load Word Algebraic with Update Indexed
X 31 341 49 64 lwax Load Word Algebraic Indexed
X 31 534 55 B lwbrx Load Word Byte-Reverse Indexed
X 31 579 712 DS lwdx Load Word with Decoration Indexed
X 31 31 P 928 E.PD lwepx Load Word by External Process ID Indexed
D 32 48 B lwz Load Word and Zero
X 31 789 H 757 S lwzcix Load Word and Zero Caching Inhibited Indexed
D 33 48 B lwzu Load Word and Zero with Update
X 31 55 48 B lwzux Load Word and Zero with Update Indexed
X 31 23 48 B lwzx Load Word and Zero Indexed
XX1 31 620 338 VSX lxsdux Load VSR Scalar Doubleword with Update Indexed
XX1 31 588 338 VSX lxsdx Load VSR Scalar Doubleword Indexed
XX1 31 876 338 VSX lxvd2ux Load VSR Vector Doubleword*2 with Update Indexed
XX1 31 844 338 VSX lxvd2x Load VSR Vector Doubleword*2 Indexed
XX1 31 332 339 VSX lxvdsx Load VSR Vector Doubleword & Splat Indexed
XX1 31 812 339 VSX lxvw4ux Load VSR Vector Word*4 with Update Indexed
XX1 31 780 339 VSX lxvw4x Load VSR Vector Word*4 Indexed
XO 4 172 593 LMA macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed
XO 4 236 593 LMA macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed
XO 4 204 594 LMA macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned
XO 4 140 594 LMA macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned
XO 4 44 595 LMA machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed
XO 4 108 595 LMA machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed
XO 4 76 596 LMA machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned
XO 4 12 596 LMA machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned
XO 4 428 597 LMA maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed
XO 4 492 597 LMA maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed
XO 4 460 598 LMA maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned
XO 4 396 598 LMA maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned
X 31 854 703 E mbar Memory Barrier
XL 19 0 38 B mcrf Move Condition Register Field
X 63 64 150 FP mcrfs Move to Condition Register from FPSCR
X 31 512 104 E mcrxr Move to Condition Register from XER
XFX 31 19 102 B mfcr Move From Condition Register
XFX 31 323 P 923 E.DC mfdcr Move From Device Control Register
X 31 291 104 E.DC mfdcrux Move From Device Control Register User-mode Indexed
X 31 259 P 923 E.DC mfdcrx Move From Device Control Register Indexed
X 63 583 150 FP[R] mffs[.] Move From FPSCR
X 31 83 P 767, 923 B mfmsr Move From Machine State Register
XFX 31 19 103 B mfocrf Move From One Condition Register Field
XFX 31 334 O 1118 E.PM mfpmr Move From Performance Monitor Register
XFX 31 339 O 101, 708 B mfspr Move From Special Purpose Register
X 31 595 32 P 810 S mfsr Move From Segment Register
X 31 659 32 P 810 S mfsrin Move From Segment Register Indirect
XFX 31 371 708 S.out mftb Move From Time Base
VX 4 1540 269 V mfvscr Move From Vector Status and Control Register
X 31 238 P 1098 E.PC msgclr Message Clear
X 31 206 P 1098 E.PC msgsnd Message Send
XFX 31 144 102 B mtcrf Move To Condition Register Fields
XFX 31 451 P 922 E.DC mtdcr Move To Device Control Register
X 31 419 104 E.DC mtdcrux Move To Device Control Register User-mode Indexed
X 31 387 P 922 E.DC mtdcrx Move To Device Control Register Indexed
X 63 70 152 FP[R] mtfsb0[.] Move To FPSCR Bit 0
X 63 38 152 FP[R] mtfsb1[.] Move To FPSCR Bit 1
XFL 63 711 151 FP[R] mtfsf[.] Move To FPSCR Fields
X 63 134 151 FP[R] mtfsfi[.]
Move To FPSCR Field Immediate X 31 146 P 923 E mtmsr Move To Machine State Register X 31 146 P 765 S mtmsr Move To Machine State Register X 31 178 P 766 S mtmsrd Move To Machine State Register Doubleword XFX 31 144 103 B mtocrf Move To One Condition Register Field XFX 31 462 O 1118 E.PM mtpmr Move To Performance Monitor Register XFX 31 467 O 100 B mtspr Move To Special Purpose Register X 31 210 32 P 809 S mtsr Move To Segment Register X 31 242 32 P 809 S mtsrin Move To Segment Register Indirect VX 4 1604 269 V mtvscr Move To Vector Status and Control Register X 4 168 598 LMA mulchw[.] Multiply Cross Halfword to Word Signed X 4 136 598 LMA mulchwu[.] Multiply Cross Halfword to Word Unsigned XO 31 73 SR 71 64 mulhd[.] Multiply High Doubleword XO 31 9 SR 71 64 mulhdu[.] Multiply High Doubleword Unsigned X 4 40 599 LMA mulhhw[.] Multiply High Halfword to Word Signed X 4 8 599 LMA mulhhwu[.] Multiply High Halfword to Word Unsigned XO 31 75 SR 67 B mulhw[.] Multiply High Word XO 31 11 SR 67 B mulhwu[.] Multiply High Word Unsigned XO 31 233 SR 71 64 mulld[o][.] Multiply Low Doubleword X 4 424 599 LMA mullhw[.] Multiply Low Halfword to Word Signed X 4 392 599 LMA mullhwu[.] Multiply Low Halfword to Word Unsigned D 7 67 B mulli Multiply Low Immediate XO 31 235 SR 67 B mullw[o][.] Multiply Low Word X 31 476 SR 80 B nand[.] NAND XL 19 434 H 748 S nap Nap XO 31 104 SR 66 B neg[o][.] Negate XO 4 174 600 LMA nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed XO 4 238 600 LMA nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed XO 4 46 601 LMA nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed XO 4 110 601 LMA nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed XO 4 430 602 LMA nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed XO 4 494 602 LMA nmaclhws[o][.] 
Negative Multiply Accumulate Low Halfword to Word Saturate Signed X 31 124 SR 81 B nor[.] NOR X 31 444 SR 80 B or[.] OR X 31 412 SR 81 B orc[.] OR with Complement D 24 78 B ori OR Immediate D 25 79 B oris OR Immediate Shifted X 31 122 83 B popcntb Population Count Bytes X 31 506 85 B popcntd Population Count Doubleword X 31 378 83 B popcntw Population Count Word X 31 186 84 64 prtyd Parity Doubleword X 31 154 84 B prtyw Parity Word XL 19 51 P 909 E rfci Return From Critical Interrupt X 19 39 P 910 E.ED rfdi Return From Debug Interrupt XL 19 102 911 E.HV rfgi Return From Guest Interrupt XL 19 50 P 909 E rfi Return From Interrupt XL 19 18 P 746 S rfid Return From Interrupt Doubleword XL 19 38 P 910 E rfmci Return From Machine Check Interrupt MDS 30 8 SR 91 64 rldcl[.] Rotate Left Doubleword then Clear Left MDS 30 9 SR 92 64 rldcr[.] Rotate Left Doubleword then Clear Right Appendix I. Power ISA Instruction Set Sorted by Mnemonic 1315 Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext MD 30 2 SR 91 64 rldic[.] Rotate Left Doubleword Immediate then Clear MD 30 0 SR 90 64 rldicl[.] Rotate Left Doubleword Immediate then Clear Left MD 30 1 SR 90 64 rldicr[.] Rotate Left Doubleword Immediate then Clear Right MD 30 3 SR 92 64 rldimi[.] Rotate Left Doubleword Immediate then Mask Insert M 20 SR 89 B rlwimi[.] Rotate Left Word Immediate then Mask Insert M 21 SR 87 B rlwinm[.] Rotate Left Word Immediate then AND with Mask M 23 SR 88 B rlwnm[.] Rotate Left Word then AND with Mask XL 19 498 H 749 S rvwinkle Rip Van Winkle SC 17 39, B sc System Call 745, 908 X 31 979 SR P 807 S slbfee. SLB Find Entry ESID X 31 498 P 804 S slbia SLB Invalidate All X 31 434 P 802 S slbie SLB Invalidate Entry X 31 915 P 806 S slbmfee SLB Move From Entry ESID X 31 851 P 806 S slbmfev SLB Move From Entry VSID X 31 402 P 805 S slbmte SLB Move To Entry X 31 27 SR 95 64 sld[.] Shift Left Doubleword XL 19 466 H 748 S sleep Sleep X 31 24 SR 93 B slw[.] 
Shift Left Word X 31 794 SR 96 64 srad[.] Shift Right Algebraic Doubleword XS 31 413 SR 96 64 sradi[.] Shift Right Algebraic Doubleword Immediate X 31 792 SR 94 B sraw[.] Shift Right Algebraic Word X 31 824 SR 94 B srawi[.] Shift Right Algebraic Word Immediate X 31 539 SR 95 64 srd[.] Shift Right Doubleword X 31 536 SR 93 B srw[.] Shift Right Word D 38 51 B stb Store Byte X 31 981 H 758 S stbcix Store Byte Caching Inhibited Indexed X 31 694 696 B stbcx. Store Byte Conditional Indexed X 31 643 713 DS stbdx Store Byte with Decoration Indexed X 31 223 P 929 E.PD stbepx Store Byte by External Process ID Indexed D 39 51 B stbu Store Byte with Update X 31 247 51 B stbux Store Byte with Update Indexed X 31 215 51 B stbx Store Byte Indexed DS 62 0 54 64 std Store Doubleword X 31 660 56 64 stdbrx Store Doubleword Byte-Reverse Indexed X 31 1013 H 758 S stdcix Store Doubleword Caching Inhibited Indexed X 31 214 699 64 stdcx. Store Doubleword Conditional Indexed X 31 739 713 DS stddx Store Doubleword with Decoration Indexed X 31 157 P 930 E.PD;64 stdepx Store Doubleword by External Process ID Indexed DS 62 1 54 64 stdu Store Doubleword with Update X 31 181 54 64 stdux Store Doubleword with Update Indexed X 31 149 54 64 stdx Store Doubleword Indexed D 54 129 FP stfd Store Floating-Point Double X 31 931 713 DS stfddx Store Floating Doubleword with Decoration Indexed X 31 735 P 935 E.PD stfdepx Store Floating-Point Double by External Process ID Indexed DS 61 - 131 FP.out stfdp Store Floating-Point Double Pair X 31 919 131 FP.out stfdpx Store Floating-Point Double Pair Indexed D 55 129 FP stfdu Store Floating-Point Double with Update X 31 759 129 FP stfdux Store Floating-Point Double with Update Indexed X 31 727 129 FP stfdx Store Floating-Point Double Indexed X 31 983 130 FP stfiwx Store Floating-Point as Integer Word Indexed D 52 128 FP stfs Store Floating-Point Single D 53 128 FP stfsu Store Floating-Point Single with Update X 31 695 128 FP stfsux Store Floating-Point Single 
with Update Indexed X 31 663 128 FP stfsx Store Floating-Point Single Indexed D 44 52 B sth Store Halfword X 31 918 55 B sthbrx Store Halfword Byte-Reverse Indexed X 31 949 H 758 S sthcix Store Halfword Caching Inhibited Indexed X 31 726 697 B sthcx. Store Halfword Conditional Indexed 1316 Power ISATM Book Appendices Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 675 713 DS sthdx Store Halfword with Decoration Indexed X 31 415 P 929 E.PD sthepx Store Halfword by External Process ID Indexed D 45 52 B sthu Store Halfword with Update X 31 439 52 B sthux Store Halfword with Update Indexed X 31 407 52 B sthx Store Halfword Indexed D 47 57 B stmw Store Multiple Word DS 62 2 P 759 LSQ stq Store Quadword X 31 725 60 MA stswi Store String Word Immediate X 31 661 60 MA stswx Store String Word Indexed X 31 135 216 V stvebx Store Vector Element Byte Indexed X 31 167 216 V stvehx Store Vector Element Halfword Indexed X 31 807 P 938 E.PD stvepx Store Vector by External Process ID Indexed X 31 775 P 938 E.PD stvepxl Store Vector by External Process ID Indexed LRU X 31 199 217 V stvewx Store Vector Element Word Indexed X 31 231 214 V stvx Store Vector Indexed X 31 487 217 V stvxl Store Vector Indexed LRU D 36 53 B stw Store Word X 31 662 55 B stwbrx Store Word Byte-Reverse Indexed X 31 917 H 758 S stwcix Store Word Caching Inhibited Indexed X 31 150 698 B stwcx. 
Store Word Conditional Indexed
X 31 707 713 DS stwdx Store Word with Decoration Indexed
X 31 159 P 930 E.PD stwepx Store Word by External Process ID Indexed
D 37 53 B stwu Store Word with Update
X 31 183 53 B stwux Store Word with Update Indexed
X 31 151 53 B stwx Store Word Indexed
XX1 31 748 340 VSX stxsdux Store VSR Scalar Doubleword with Update Indexed
XX1 31 716 340 VSX stxsdx Store VSR Scalar Doubleword Indexed
XX1 31 1004 340 VSX stxvd2ux Store VSR Vector Doubleword*2 with Update Indexed
XX1 31 972 340 VSX stxvd2x Store VSR Vector Doubleword*2 Indexed
XX1 31 940 341 VSX stxvw4ux Store VSR Vector Word*4 with Update Indexed
XX1 31 908 341 VSX stxvw4x Store VSR Vector Word*4 Indexed
XO 31 40 SR 63 B subf[o][.] Subtract From
XO 31 8 SR 64 B subfc[o][.] Subtract From Carrying
XO 31 136 SR 65 B subfe[o][.] Subtract From Extended
D 8 SR 64 B subfic Subtract From Immediate Carrying
XO 31 232 SR 65 B subfme[o][.] Subtract From Minus One Extended
XO 31 200 SR 66 B subfze[o][.] Subtract From Zero Extended
X 31 598 701 B sync Synchronize
X 31 68 77 64 td Trap Doubleword
D 2 77 64 tdi Trap Doubleword Immediate
X 31 370 H 817 S tlbia TLB Invalidate All
X 31 306 64 H 811 S tlbie TLB Invalidate Entry
X 31 274 64 P 814 S tlbiel TLB Invalidate Entry Local
X 31 18 P 1003 E.HV tlbilx TLB Invalidate Local
X 31 786 P 1001, 903 E tlbivax TLB Invalidate Virtual Address Indexed
X 31 946 P 1008, 904 E tlbre TLB Read Entry
X 31 850 P 1007 E.TWC tlbsrx. TLB Search and Reserve
X 31 914 P 1005, 904 E tlbsx TLB Search Indexed
X 31 566 H 817, 1010, 905 B tlbsync TLB Synchronize
X 31 978 P 1010, 905 E tlbwe TLB Write Entry
X 31 4 76 B tw Trap Word
D 3 76 B twi Trap Word Immediate
VX 4 384 230 V vaddcuw Vector Add and Write Carry-Out Unsigned Word
VX 4 10 259 V vaddfp Vector Add Single-Precision
VX 4 768 230 V vaddsbs Vector Add Signed Byte Saturate
VX 4 832 230 V vaddshs Vector Add Signed Halfword Saturate
VX 4 896 230 V vaddsws Vector Add Signed Word Saturate
VX 4 0 231 V vaddubm Vector Add Unsigned Byte Modulo
VX 4 512 232 V vaddubs Vector Add Unsigned Byte Saturate
VX 4 64 231 V vadduhm Vector Add Unsigned Halfword Modulo
VX 4 576 232 V vadduhs Vector Add Unsigned Halfword Saturate
VX 4 128 231 V vadduwm Vector Add Unsigned Word Modulo
VX 4 640 232 V vadduws Vector Add Unsigned Word Saturate
VX 4 1028 254 V vand Vector Logical AND
VX 4 1092 254 V vandc Vector Logical AND with Complement
VX 4 1282 245 V vavgsb Vector Average Signed Byte
VX 4 1346 245 V vavgsh Vector Average Signed Halfword
VX 4 1410 245 V vavgsw Vector Average Signed Word
VX 4 1026 246 V vavgub Vector Average Unsigned Byte
VX 4 1090 246 V vavguh Vector Average Unsigned Halfword
VX 4 1154 246 V vavguw Vector Average Unsigned Word
VX 4 842 263 V vcfsx Vector Convert From Signed Fixed-Point Word
VX 4 778 263 V vcfux Vector Convert From Unsigned Fixed-Point Word
VC 4 966 265 V vcmpbfp[.] Vector Compare Bounds Single-Precision
VC 4 198 265 V vcmpeqfp[.] Vector Compare Equal To Single-Precision
VC 4 6 251 V vcmpequb[.] Vector Compare Equal To Unsigned Byte
VC 4 70 251 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword
VC 4 134 252 V vcmpequw[.] Vector Compare Equal To Unsigned Word
VC 4 454 266 V vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Precision
VC 4 710 266 V vcmpgtfp[.] Vector Compare Greater Than Single-Precision
VC 4 774 252 V vcmpgtsb[.] Vector Compare Greater Than Signed Byte
VC 4 838 252 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword
VC 4 902 252 V vcmpgtsw[.] Vector Compare Greater Than Signed Word
VC 4 518 253 V vcmpgtub[.] Vector Compare Greater Than Unsigned Byte
VC 4 582 253 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword
VC 4 646 253 V vcmpgtuw[.] Vector Compare Greater Than Unsigned Word
VX 4 970 262 V vctsxs Vector Convert To Signed Fixed-Point Word Saturate
VX 4 906 262 V vctuxs Vector Convert To Unsigned Fixed-Point Word Saturate
VX 4 394 267 V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point
VX 4 458 267 V vlogefp Vector Log Base 2 Estimate Floating-Point
VA 4 46 260 V vmaddfp Vector Multiply-Add Single-Precision
VX 4 1034 261 V vmaxfp Vector Maximum Single-Precision
VX 4 258 247 V vmaxsb Vector Maximum Signed Byte
VX 4 322 247 V vmaxsh Vector Maximum Signed Halfword
VX 4 386 247 V vmaxsw Vector Maximum Signed Word
VX 4 2 248 V vmaxub Vector Maximum Unsigned Byte
VX 4 66 248 V vmaxuh Vector Maximum Unsigned Halfword
VX 4 130 248 V vmaxuw Vector Maximum Unsigned Word
VA 4 32 238 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate
VA 4 33 238 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate
VX 4 1098 261 V vminfp Vector Minimum Single-Precision
VX 4 770 249 V vminsb Vector Minimum Signed Byte
VX 4 834 249 V vminsh Vector Minimum Signed Halfword
VX 4 898 249 V vminsw Vector Minimum Signed Word
VX 4 514 250 V vminub Vector Minimum Unsigned Byte
VX 4 578 250 V vminuh Vector Minimum Unsigned Halfword
VX 4 642 250 V vminuw Vector Minimum Unsigned Word
VA 4 34 239 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo
VX 4 12 224 V vmrghb Vector Merge High Byte
VX 4 76 224 V vmrghh Vector Merge High Halfword
VX 4 140 224 V vmrghw Vector Merge High Word
VX 4 268 225 V vmrglb Vector Merge Low Byte
VX 4 332 225 V vmrglh Vector Merge Low Halfword
VX 4 396 225 V vmrglw Vector Merge Low Word
VA 4 37 240 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo
VA 4 40 240 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo
VA 4 41 241 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate
VA 4 36 239 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo
VA 4 38 241 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo
VA 4 39 242 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate
VX 4 776 236 V vmulesb Vector Multiply Even Signed Byte
VX 4 840 236 V vmulesh Vector Multiply Even Signed Halfword
VX 4 520 236 V vmuleub Vector Multiply Even Unsigned Byte
VX 4 584 236 V vmuleuh Vector Multiply Even Unsigned Halfword
VX 4 264 237 V vmulosb Vector Multiply Odd Signed Byte
VX 4 328 237 V vmulosh Vector Multiply Odd Signed Halfword
VX 4 8 237 V vmuloub Vector Multiply Odd Unsigned Byte
VX 4 72 237 V vmulouh Vector Multiply Odd Unsigned Halfword
VA 4 47 260 V vnmsubfp Vector Negative Multiply-Subtract Single-Precision
VX 4 1284 254 V vnor Vector Logical NOR
VX 4 1156 254 V vor Vector Logical OR
VA 4 43 227 V vperm Vector Permute
VX 4 782 219 V vpkpx Vector Pack Pixel
VX 4 398 220 V vpkshss Vector Pack Signed Halfword Signed Saturate
VX 4 270 220 V vpkshus Vector Pack Signed Halfword Unsigned Saturate
VX 4 462 220 V vpkswss Vector Pack Signed Word Signed Saturate
VX 4 334 220 V vpkswus Vector Pack Signed Word Unsigned Saturate
VX 4 14 221 V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo
VX 4 142 221 V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate
VX 4 78 221 V vpkuwum Vector Pack Unsigned Word Unsigned Modulo
VX 4 206 221 V vpkuwus Vector Pack Unsigned Word Unsigned Saturate
VX 4 266 268 V vrefp Vector Reciprocal Estimate Single-Precision
VX 4 714 264 V vrfim Vector Round to Single-Precision Integer toward -Infinity
VX 4 522 264 V vrfin Vector Round to Single-Precision Integer Nearest
VX 4 650 264 V vrfip Vector Round to Single-Precision Integer toward +Infinity
VX 4 586 264 V vrfiz Vector Round to Single-Precision Integer toward Zero
VX 4 4 255 V vrlb Vector Rotate Left Byte
VX 4 68 255 V vrlh Vector Rotate Left Halfword
VX 4 132 255 V vrlw Vector Rotate Left Word
VX 4 330 268 V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision
VA 4 42 227 V vsel Vector Select
VX 4 452 228 V vsl Vector Shift Left
VX 4 260 256 V vslb Vector Shift Left Byte
VA 4 44 228 V vsldoi Vector Shift Left Double by Octet Immediate
VX 4 324 256 V vslh Vector Shift Left Halfword
VX 4 1036 228 V vslo Vector Shift Left by Octet
VX 4 388 256 V vslw Vector Shift Left Word
VX 4 524 226 V vspltb Vector Splat Byte
VX 4 588 226 V vsplth Vector Splat Halfword
VX 4 780 226 V vspltisb Vector Splat Immediate Signed Byte
VX 4 844 226 V vspltish Vector Splat Immediate Signed Halfword
VX 4 908 226 V vspltisw Vector Splat Immediate Signed Word
VX 4 652 226 V vspltw Vector Splat Word
VX 4 708 229 V vsr Vector Shift Right
VX 4 772 258 V vsrab Vector Shift Right Algebraic Byte
VX 4 836 258 V vsrah Vector Shift Right Algebraic Halfword
VX 4 900 258 V vsraw Vector Shift Right Algebraic Word
VX 4 516 257 V vsrb Vector Shift Right Byte
VX 4 580 257 V vsrh Vector Shift Right Halfword
VX 4 1100 229 V vsro Vector Shift Right by Octet
VX 4 644 257 V vsrw Vector Shift Right Word
VX 4 1408 233 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word
VX 4 74 259 V vsubfp Vector Subtract Single-Precision
VX 4 1792 233 V vsubsbs Vector Subtract Signed Byte Saturate
VX 4 1856 233 V vsubshs Vector Subtract Signed Halfword Saturate
VX 4 1920 233 V vsubsws Vector Subtract Signed Word Saturate
VX 4 1024 234 V vsububm Vector Subtract Unsigned Byte Modulo
VX 4 1536 235 V vsububs Vector Subtract Unsigned Byte Saturate
VX 4 1088 234 V vsubuhm Vector Subtract Unsigned Halfword Modulo
VX 4 1600 234 V vsubuhs Vector Subtract Unsigned Halfword Saturate
VX 4 1152 234 V vsubuwm Vector Subtract Unsigned Word Modulo
VX 4 1664 235 V vsubuws Vector Subtract Unsigned Word Saturate
VX 4 1672 243 V vsum2sws Vector Sum across Half Signed Word Saturate
VX 4 1800 244 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate
VX 4 1608 244 V vsum4shs Vector Sum across Quarter Signed
Halfword Saturate VX 4 1544 244 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate VX 4 1928 243 V vsumsws Vector Sum across Signed Word Saturate VX 4 846 222 V vupkhpx Vector Unpack High Pixel VX 4 526 222 V vupkhsb Vector Unpack High Signed Byte VX 4 590 222 V vupkhsh Vector Unpack High Signed Halfword VX 4 974 223 V vupklpx Vector Unpack Low Pixel VX 4 654 223 V vupklsb Vector Unpack Low Signed Byte VX 4 718 223 V vupklsh Vector Unpack Low Signed Halfword VX 4 1220 254 V vxor Vector Logical XOR X 31 62 704 WT wait Wait X 31 131 P 924 E wrtee Write MSR External Enable X 31 163 P 925 E wrteei Write MSR External Enable Immediate X 31 316 SR 80 B xor[.] XOR D 26 79 B xori XOR Immediate D 27 79 B xoris XOR Immediate Shifted XX2 60 690 341 VSX xsabsdp VSX Scalar Absolute Value Double-Precision XX3 60 128 342 VSX xsadddp VSX Scalar Add Double-Precision XX3 60 172 347 VSX xscmpodp VSX Scalar Compare Ordered Double-Precision XX3 60 140 349 VSX xscmpudp VSX Scalar Compare Unordered Double-Precision XX3 60 704 351 VSX xscpsgndp VSX Scalar Copy Sign Double-Precision VSX Scalar Convert Double-Precision to Single- XX2 60 530 352 VSX xscvdpsp Precision VSX Scalar truncate Double-Precision to integer and XX2 60 688 353 VSX xscvdpsxds Convert to Signed Fixed-Point Doubleword format with Saturate VSX Scalar truncate Double-Precision to integer and XX2 60 176 355 VSX xscvdpsxws Convert to Signed Fixed-Point Word format with Saturate VSX Scalar truncate Double-Precision to integer and XX2 60 656 357 VSX xscvdpuxds Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Scalar truncate Double-Precision to integer and XX2 60 144 359 VSX xscvdpuxws Convert to Unsigned Fixed-Point Word format with Saturate VSX Scalar Convert Single-Precision to Double- XX2 60 658 361 VSX xscvspdp Precision format VSX Scalar Convert and round Signed Fixed-Point XX2 60 752 361 VSX xscvsxddp Doubleword to Double-Precision format VSX Scalar Convert and round Unsigned Fixed-Point XX2 60 
720 362 VSX xscvuxddp Doubleword to Double-Precision format XX3 60 224 363 VSX xsdivdp VSX Scalar Divide Double-Precision XX3 60 132 365 VSX xsmaddadp VSX Scalar Multiply-Add Type-A Double-Precision XX3 60 164 365 VSX xsmaddmdp VSX Scalar Multiply-Add Type-M Double-Precision XX3 60 640 368 VSX xsmaxdp VSX Scalar Maximum Double-Precision XX3 60 672 370 VSX xsmindp VSX Scalar Minimum Double-Precision XX3 60 196 372 VSX xsmsubadp VSX Scalar Multiply-Subtract Type-A Double-Precision 1320 Power ISATM Book Appendices Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext VSX Scalar Multiply-Subtract Type-M Double- XX3 60 228 372 VSX xsmsubmdp Precision XX3 60 192 375 VSX xsmuldp VSX Scalar Multiply Double-Precision XX2 60 722 377 VSX xsnabsdp VSX Scalar Negative Absolute Value Double-Precision XX2 60 754 377 VSX xsnegdp VSX Scalar Negate Double-Precision VSX Scalar Negative Multiply-Add Type-A Double- XX3 60 644 378 VSX xsnmaddadp Precision VSX Scalar Negative Multiply-Add Type-M Double- XX3 60 676 378 VSX xsnmaddmdp Precision VSX Scalar Negative Multiply-Subtract Type-A Double- XX3 60 708 383 VSX xsnmsubadp Precision VSX Scalar Negative Multiply-Subtract Type-M XX3 60 740 383 VSX xsnmsubmdp Double-Precision XX2 60 146 386 VSX xsrdpi VSX Scalar Round to Double-Precision Integer VSX Scalar Round to Double-Precision Integer using XX2 60 214 387 VSX xsrdpic Current rounding mode VSX Scalar Round to Double-Precision Integer toward XX2 60 242 388 VSX xsrdpim -Infinity VSX Scalar Round to Double-Precision Integer toward XX2 60 210 388 VSX xsrdpip +Infinity VSX Scalar Round to Double-Precision Integer toward XX2 60 178 389 VSX xsrdpiz Zero XX2 60 180 390 VSX xsredp VSX Scalar Reciprocal Estimate Double-Precision VSX Scalar Reciprocal Square Root Estimate Double- XX2 60 148 391 VSX xsrsqrtedp Precision XX2 60 150 392 VSX xssqrtdp VSX Scalar Square Root Double-Precision XX3 60 160 393 VSX xssubdp VSX Scalar Subtract Double-Precision XX3 60 244 395 VSX 
xstdivdp VSX Scalar Test for software Divide Double-Precision VSX Scalar Test for software Square Root Double- XX2 60 212 396 VSX xstsqrtdp Precision XX2 60 946 397 VSX xvabsdp VSX Vector Absolute Value Double-Precision XX2 60 818 397 VSX xvabssp VSX Vector Absolute Value Single-Precision XX3 60 384 398 VSX xvadddp VSX Vector Add Double-Precision XX3 60 256 402 VSX xvaddsp VSX Vector Add Single-Precision XX3 60 396 404 VSX xvcmpeqdp VSX Vector Compare Equal To Double-Precision VSX Vector Compare Equal To Double-Precision & XX3 60 908 404 VSX xvcmpeqdp. Record XX3 60 268 405 VSX xvcmpeqsp VSX Vector Compare Equal To Single-Precision VSX Vector Compare Equal To Single-Precision & XX3 60 780 405 VSX xvcmpeqsp. Record VSX Vector Compare Greater Than or Equal To XX3 60 460 406 VSX xvcmpgedp Double-Precision VSX Vector Compare Greater Than or Equal To XX3 60 972 406 VSX xvcmpgedp. Double-Precision & Record VSX Vector Compare Greater Than or Equal To XX3 60 332 407 VSX xvcmpgesp Single-Precision VSX Vector Compare Greater Than or Equal To XX3 60 844 407 VSX xvcmpgesp. Single-Precision & Record XX3 60 428 408 VSX xvcmpgtdp VSX Vector Compare Greater Than Double-Precision VSX Vector Compare Greater Than Double-Precision XX3 60 940 408 VSX xvcmpgtdp. & Record XX3 60 300 409 VSX xvcmpgtsp VSX Vector Compare Greater Than Single-Precision VSX Vector Compare Greater Than Single-Precision & XX3 60 812 409 VSX xvcmpgtsp. Record XX3 60 960 410 VSX xvcpsgndp VSX Vector Copy Sign Double-Precision XX3 60 832 410 VSX xvcpsgnsp VSX Vector Copy Sign Single-Precision VSX Vector round and Convert Double-Precision to XX2 60 786 411 VSX xvcvdpsp Single-Precision format VSX Vector truncate Double-Precision to integer and XX2 60 944 412 VSX xvcvdpsxds Convert to Signed Fixed-Point Doubleword Saturate Appendix I. 
Power ISA Instruction Set Sorted by Mnemonic 1321 Version 2.06 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext VSX Vector truncate Double-Precision to integer and XX2 60 432 414 VSX xvcvdpsxws Convert to Signed Fixed-Point Word Saturate VSX Vector truncate Double-Precision to integer and XX2 60 912 416 VSX xvcvdpuxds Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Vector truncate Double-Precision to integer and XX2 60 400 418 VSX xvcvdpuxws Convert to Unsigned Fixed-Point Word format with Saturate VSX Vector Convert Single-Precision to Double- XX2 60 914 420 VSX xvcvspdp Precision VSX Vector truncate Single-Precision to integer and XX2 60 816 421 VSX xvcvspsxds Convert to Signed Fixed-Point Doubleword format with Saturate VSX Vector truncate Single-Precision to integer and XX2 60 304 423 VSX xvcvspsxws Convert to Signed Fixed-Point Word format with Saturate VSX Vector truncate Single-Precision to integer and XX2 60 784 425 VSX xvcvspuxds Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Vector truncate Single-Precision to integer and XX2 60 272 427 VSX xvcvspuxws Convert to Unsigned Fixed-Point Word Saturate VSX Vector Convert and round Signed Fixed-Point XX2 60 1008 429 VSX xvcvsxddp Doubleword to Double-Precision format VSX Vector Convert and round Signed Fixed-Point XX2 60 880 429 VSX xvcvsxdsp Doubleword to Single-Precision format VSX Vector Convert Signed Fixed-Point Word to XX2 60 496 430 VSX xvcvsxwdp Double-Precision format VSX Vector Convert and round Signed Fixed-Point XX2 60 368 430 VSX xvcvsxwsp Word to Single-Precision format VSX Vector Convert and round Unsigned Fixed-Point XX2 60 976 431 VSX xvcvuxddp Doubleword to Double-Precision format VSX Vector Convert and round Unsigned Fixed-Point XX2 60 848 431 VSX xvcvuxdsp Doubleword to Single-Precision format VSX Vector Convert Unsigned Fixed-Point Word to XX2 60 464 432 VSX xvcvuxwdp Double-Precision format VSX Vector Convert and round Unsigned 
XX2   60   336                      432   VSX   xvcvuxwsp    Fixed-Point Word to Single-Precision format
XX3   60   480                      433   VSX   xvdivdp      VSX Vector Divide Double-Precision
XX3   60   352                      435   VSX   xvdivsp      VSX Vector Divide Single-Precision
XX3   60   388                      437   VSX   xvmaddadp    VSX Vector Multiply-Add Type-A Double-Precision
XX3   60   260                      437   VSX   xvmaddasp    VSX Vector Multiply-Add Type-A Single-Precision
XX3   60   420                      440   VSX   xvmaddmdp    VSX Vector Multiply-Add Type-M Double-Precision
XX3   60   292                      440   VSX   xvmaddmsp    VSX Vector Multiply-Add Type-M Single-Precision
XX3   60   896                      443   VSX   xvmaxdp      VSX Vector Maximum Double-Precision
XX3   60   768                      445   VSX   xvmaxsp      VSX Vector Maximum Single-Precision
XX3   60   928                      447   VSX   xvmindp      VSX Vector Minimum Double-Precision
XX3   60   800                      449   VSX   xvminsp      VSX Vector Minimum Single-Precision
XX3   60   452                      451   VSX   xvmsubadp    VSX Vector Multiply-Subtract Type-A Double-Precision
XX3   60   324                      451   VSX   xvmsubasp    VSX Vector Multiply-Subtract Type-A Single-Precision
XX3   60   484                      454   VSX   xvmsubmdp    VSX Vector Multiply-Subtract Type-M Double-Precision
XX3   60   356                      454   VSX   xvmsubmsp    VSX Vector Multiply-Subtract Type-M Single-Precision
XX3   60   448                      457   VSX   xvmuldp      VSX Vector Multiply Double-Precision
XX3   60   320                      459   VSX   xvmulsp      VSX Vector Multiply Single-Precision
XX2   60   978                      461   VSX   xvnabsdp     VSX Vector Negative Absolute Value Double-Precision
XX2   60   850                      461   VSX   xvnabssp     VSX Vector Negative Absolute Value Single-Precision
XX2   60   1010                     462   VSX   xvnegdp      VSX Vector Negate Double-Precision
XX2   60   882                      462   VSX   xvnegsp      VSX Vector Negate Single-Precision
XX3   60   900                      463   VSX   xvnmaddadp   VSX Vector Negative Multiply-Add Type-A Double-Precision

      Opcode
Form  Pri  Ext   Mode Dep.¹  Priv¹  Page  Cat¹  Mnemonic     Instruction
XX3   60   772                      463   VSX   xvnmaddasp   VSX Vector Negative Multiply-Add Type-A Single-Precision
XX3   60   932                      468   VSX   xvnmaddmdp   VSX Vector Negative Multiply-Add Type-M Double-Precision
XX3   60   804                      468   VSX   xvnmaddmsp   VSX Vector Negative Multiply-Add Type-M Single-Precision
XX3   60   964                      471   VSX   xvnmsubadp   VSX Vector Negative Multiply-Subtract Type-A Double-Precision
XX3   60   836                      471   VSX   xvnmsubasp   VSX Vector Negative Multiply-Subtract Type-A Single-Precision
XX3   60   996                      474   VSX   xvnmsubmdp   VSX Vector Negative Multiply-Subtract Type-M Double-Precision
XX3   60   868                      474   VSX   xvnmsubmsp   VSX Vector Negative Multiply-Subtract Type-M Single-Precision
XX2   60   402                      477   VSX   xvrdpi       VSX Vector Round to Double-Precision Integer
XX2   60   470                      478   VSX   xvrdpic      VSX Vector Round to Double-Precision Integer using Current rounding mode
XX2   60   498                      478   VSX   xvrdpim      VSX Vector Round to Double-Precision Integer toward -Infinity
XX2   60   466                      479   VSX   xvrdpip      VSX Vector Round to Double-Precision Integer toward +Infinity
XX2   60   434                      479   VSX   xvrdpiz      VSX Vector Round to Double-Precision Integer toward Zero
XX2   60   436                      480   VSX   xvredp       VSX Vector Reciprocal Estimate Double-Precision
XX2   60   308                      481   VSX   xvresp       VSX Vector Reciprocal Estimate Single-Precision
XX2   60   274                      482   VSX   xvrspi       VSX Vector Round to Single-Precision Integer
XX2   60   342                      482   VSX   xvrspic      VSX Vector Round to Single-Precision Integer using Current rounding mode
XX2   60   370                      483   VSX   xvrspim      VSX Vector Round to Single-Precision Integer toward -Infinity
XX2   60   338                      483   VSX   xvrspip      VSX Vector Round to Single-Precision Integer toward +Infinity
XX2   60   306                      484   VSX   xvrspiz      VSX Vector Round to Single-Precision Integer toward Zero
XX2   60   404                      485   VSX   xvrsqrtedp   VSX Vector Reciprocal Square Root Estimate Double-Precision
XX2   60   276                      486   VSX   xvrsqrtesp   VSX Vector Reciprocal Square Root Estimate Single-Precision
XX2   60   406                      487   VSX   xvsqrtdp     VSX Vector Square Root Double-Precision
XX2   60   278                      488   VSX   xvsqrtsp     VSX Vector Square Root Single-Precision
XX3   60   416                      489   VSX   xvsubdp      VSX Vector Subtract Double-Precision
XX3   60   288                      491   VSX   xvsubsp      VSX Vector Subtract Single-Precision
XX3   60   500                      493   VSX   xvtdivdp     VSX Vector Test for software Divide Double-Precision
XX3   60   372                      494   VSX   xvtdivsp     VSX Vector Test for software Divide Single-Precision
XX2   60   468                      495   VSX   xvtsqrtdp    VSX Vector Test for software Square Root Double-Precision
XX2   60   340                      495   VSX   xvtsqrtsp    VSX Vector Test for software Square Root Single-Precision
XX3   60   520                      496   VSX   xxland       VSX Logical AND
XX3   60   552                      496   VSX   xxlandc      VSX Logical AND with Complement
XX3   60   648                      497   VSX   xxlnor       VSX Logical NOR
XX3   60   584                      497   VSX   xxlor        VSX Logical OR
XX3   60   616                      498   VSX   xxlxor       VSX Logical XOR
XX3   60   72                       499   VSX   xxmrghw      VSX Merge High Word
XX3   60   200                      499   VSX   xxmrglw      VSX Merge Low Word
XX3   60   40                       500   VSX   xxpermdi     VSX Permute Doubleword Immediate
XX4   60   24                       500   VSX   xxsel        VSX Select
XX3   60   8                        501   VSX   xxsldwi      VSX Shift Left Double by Word Immediate
XX3   60   328                      501   VSX   xxspltw      VSX Splat Word

¹ See the key to the mode dependency and privilege columns on page 1324 and the key to the category column in Section 1.3.5 of Book I.

Mode Dependency and Privilege Abbreviations

Except as described below and in Section 1.10.3, "Effective Address Calculation", in Book I, all instructions are independent of whether the processor is in 32-bit or 64-bit mode.

Key to Mode Dependency Column

Mode Dep.  Description
CT         If the instruction tests the Count Register, it tests the low-order 32 bits in 32-bit mode and all 64 bits in 64-bit mode.
SR         The setting of status registers (such as XER and CR0) is mode-dependent.
32         The instruction can be executed only in 32-bit mode.
64         The instruction can be executed only in 64-bit mode.

Key to Privilege Column

Priv.  Description
P      Denotes a privileged instruction.
O      Denotes an instruction that is treated as privileged or nonprivileged (or hypervisor, for mtspr), depending on the SPR or PMR number.
H      Denotes an instruction that can be executed only in hypervisor state.
M      Denotes an instruction that is treated as privileged or nonprivileged, depending on the value of the UCLE bit in the MSR.
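The CT entry in the key to the mode dependency column can be illustrated with a small sketch. It models only the comparison that the key describes (which bits of the Count Register take part in the test), not a complete branch instruction. The function name `ctr_is_zero` and the `sixty_four_bit_mode` flag are hypothetical names chosen for this illustration and do not appear in the architecture.

```python
MASK32 = 0xFFFF_FFFF            # low-order 32 bits
MASK64 = 0xFFFF_FFFF_FFFF_FFFF  # all 64 bits


def ctr_is_zero(ctr: int, sixty_four_bit_mode: bool) -> bool:
    """Report whether a Count Register value tests as zero in the given mode.

    Assumption for illustration: `ctr` holds the full 64-bit register
    contents as a non-negative Python integer.
    """
    # In 32-bit mode only the low-order 32 bits are tested;
    # in 64-bit mode all 64 bits are tested.
    mask = MASK64 if sixty_four_bit_mode else MASK32
    return (ctr & mask) == 0


# A value whose low-order 32 bits are zero but whose high-order bits are not.
ctr = 0x0000_0001_0000_0000
print(ctr_is_zero(ctr, sixty_four_bit_mode=False))  # True
print(ctr_is_zero(ctr, sixty_four_bit_mode=True))   # False
```

The same register contents therefore give different test results in the two modes, which is why instructions that test the Count Register are marked CT in the mode dependency column.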