Power ISA™ Version 2.04

April 3, 2007

The following paragraph does not apply to the United Kingdom or any country or state where such provisions are inconsistent with local law.

The specifications in this manual are subject to change without notice. This manual is provided "AS IS". International Business Machines Corp. makes no warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose.

International Business Machines Corp. does not warrant that the contents of this publication or the accompanying source code examples, whether individually or as one or more groups, will meet your requirements or that the publication or the accompanying source code examples are error-free.

This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication.

Address comments to IBM Corporation, 11400 Burnett Road, Austin, Texas 78758-3493. IBM may use or distribute whatever information you supply in any way it believes appropriate without incurring any obligation to you.

The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:

IBM
PowerPC
RISC/System 6000
POWER
POWER2
POWER4
POWER4+
POWER5
IBM System/370
IBM System z

The POWER ARCHITECTURE and POWER.ORG word marks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.

AltiVec is a trademark of Freescale Semiconductor, Inc. used under license.

Notice to U.S. Government Users--Documentation Related to Restricted Rights--Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corporation.

© Copyright International Business Machines Corporation, 1994, 2007. All rights reserved.
Preface

The roots of the Power ISA (Instruction Set Architecture) extend back over a quarter of a century, to IBM Research. The POWER (Performance Optimization With Enhanced RISC) Architecture was introduced with the RISC System/6000 product family in early 1990. In 1991, Apple, IBM, and Motorola began the collaboration to evolve to the PowerPC Architecture, expanding the architecture's applicability. In 1997, Motorola and IBM began another collaboration, focused on optimizing PowerPC for embedded systems, which produced Book E.

In 2006, Freescale and IBM collaborated on the creation of the Power ISA Version 2.03, which represented the reunification of the architecture by combining Book E content with the more general purpose PowerPC Version 2.02. A significant benefit of the reunification is the establishment of a single, compatible, 64-bit programming model. The combining also extends explicit architectural endorsement and control to Auxiliary Processing Units (APUs), units of function that were originally developed as implementation- or product family-specific extensions in the context of the Book E allocated opcode space. With the resulting architectural superset comes a framework that clearly establishes requirements and identifies options.

To a very large extent, application program compatibility has been maintained throughout the history of the architecture, with the main exception being application exploitation of APUs. The framework identifies the base, pervasive, part of the architecture, and differentiates it from "categories" of optional function (see Section 1.3.5 of Book I). Because of the substantial differences in the supervisor (privileged) architecture that developed as Book E was optimized for embedded systems, the supervisor architectures for embedded and general purpose implementations are represented as mutually exclusive categories. Future versions of the architecture will seek to converge on a common solution where possible.

This document defines the Power ISA Version 2.04. It is comprised of five books and a set of appendices.

Book I, Power ISA User Instruction Set Architecture, covers the base instruction set and related facilities available to the application programmer. It includes five new chapters derived from APU function, including the vector extension also known as AltiVec.

Book II, Power ISA Virtual Environment Architecture, defines the storage model and related instructions and facilities available to the application programmer.

Book III-S, Power ISA Operating Environment Architecture, defines the supervisor instructions and related facilities used for general purpose implementations. It consists mainly of the contents of Book III from PowerPC Version 2.02, with the addition of significant new large page and big segment support.

Book III-E, Power ISA Operating Environment Architecture, defines the supervisor instructions and related facilities used for embedded implementations. It was derived from Book E and extended to include APU function.

Book VLE, Power ISA Variable Length Encoded Instructions Architecture, defines alternative instruction encodings and definitions intended to increase instruction density for very low end implementations. It was derived from an APU description developed by Freescale Semiconductor.

As used in this document, the term "Power ISA" refers to the instructions and facilities described in Books I, II, III-S, III-E, and VLE.

Usage of the phrase "Book III" refers to both Book III-S and Book III-E. An exception to this rule is when, at the beginning of a Section or Book, it is specified that usage of the phrase "Book III" implies only either "Book III-S" or "Book III-E".

Change bars have been included to indicate changes from Version 2.03.
Summary of Changes in Version 2.04

Version 2.04 of this document differs from the previous version primarily by containing the definitions of the following facilities:

New Server Page Protection States. An additional state of the page protection bits in the page table entry is defined which can be used to provide privileged programs read only access and problem state programs no access to a virtual page.

Server Virtualized Partition Memory. Several new features are added to enable virtualization of a partition's memory in order to support more partitions and additional concurrent maintenance procedures transparently to operating system code.

Server Virtual Page Class Key Protection. A KEY field in the page table entry and associated features are added for the Server environment to facilitate quick modification of access permission for multiple pages at once.

Server Time Base Facility - TBU40. Support is added for time base synchronization via this new time base facility, in which only the upper 40 bits of the time base are accessed.

Version Verification

See the Power ISA representative for your company.

Table of Contents

Preface
Summary of Changes in Version 2.04
Table of Contents
Figures

Book I: Power ISA User Instruction Set Architecture

Chapter 1. Introduction
1.1 Overview
1.2 Instruction Mnemonics and Operands
1.3 Document Conventions
1.3.1 Definitions
1.3.2 Notation
1.3.3 Reserved Fields and Reserved Values
1.3.4 Description of Instruction Operation
1.3.5 Categories
1.3.5.1 Phased-In/Phased-Out
1.3.5.2 Corequisite Category
1.3.5.3 Category Notation
1.3.6 Environments
1.4 Processor Overview
1.5 Computation modes
1.5.1 Modes [Category: Server]
1.5.2 Modes [Category: Embedded]
1.6 Instruction formats
1.6.1 I-FORM
1.6.2 B-FORM
1.6.3 SC-FORM
1.6.4 D-FORM
1.6.5 DS-FORM
1.6.6 DQ-FORM
1.6.7 X-FORM
1.6.8 XL-FORM
1.6.9 XFX-FORM
1.6.10 XFL-FORM
1.6.11 XS-FORM
1.6.12 XO-FORM
1.6.13 A-FORM
1.6.14 M-FORM
1.6.15 MD-FORM
1.6.16 MDS-FORM
1.6.17 VA-FORM
1.6.18 VC-FORM
1.6.19 VX-FORM
1.6.20 EVX-FORM
1.6.21 EVS-FORM
1.6.22 Instruction Fields
1.7 Classes of Instructions
1.7.1 Defined Instruction Class
1.7.2 Illegal Instruction Class
1.7.3 Reserved Instruction Class
1.8 Forms of Defined Instructions
1.8.1 Preferred Instruction Forms
1.8.2 Invalid Instruction Forms
1.9 Exceptions
1.10 Storage Addressing
1.10.1 Storage Operands
1.10.2 Instruction Fetches
1.10.3 Effective Address Calculation

Chapter 2. Branch Processor
2.1 Branch Processor Overview
2.2 Instruction Execution Order
2.3 Branch Processor Registers
2.3.1 Condition Register
2.3.2 Link Register
2.3.3 Count Register
2.4 Branch Instructions
2.5 Condition Register Instructions
2.5.1 Condition Register Logical Instructions
2.5.2 Condition Register Field Instruction
2.6 System Call Instruction

Chapter 3. Fixed-Point Processor
3.1 Fixed-Point Processor Overview
3.2 Fixed-Point Processor Registers
3.2.1 General Purpose Registers
3.2.2 Fixed-Point Exception Register
3.2.3 Program Priority Register [Category: Server]
3.2.4 Software Use SPRs [Category: Embedded]
3.2.5 Device Control Registers [Category: Embedded]
3.3 Fixed-Point Processor Instructions
3.3.1 Fixed-Point Storage Access Instructions
3.3.1.1 Storage Access Exceptions
3.3.2 Fixed-Point Load Instructions
3.3.2.1 64-bit Fixed-Point Load Instructions [Category: 64-Bit]
3.3.3 Fixed-Point Store Instructions
3.3.3.1 64-bit Fixed-Point Store Instructions [Category: 64-Bit]
3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions
3.3.5 Fixed-Point Load and Store Multiple Instructions
3.3.6 Fixed-Point Move Assist Instructions [Category: Move Assist]
3.3.7 Other Fixed-Point Instructions
3.3.8 Fixed-Point Arithmetic Instructions
3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit]
3.3.9 Fixed-Point Compare Instructions
3.3.10 Fixed-Point Trap Instructions
3.3.10.1 64-bit Fixed-Point Trap Instructions [Category: 64-Bit]
3.3.11 Fixed-Point Select [Category: Base.Phased-In]
3.3.12 Fixed-Point Logical Instructions
3.3.12.1 64-bit Fixed-Point Logical Instructions [Category: 64-Bit]
3.3.12.2 Phased-In Fixed-Point Logical Instructions [Category: Base.Phased-In]
3.3.13 Fixed-Point Rotate and Shift Instructions
3.3.13.1 Fixed-Point Rotate Instructions
3.3.13.1.1 64-bit Fixed-Point Rotate Instructions [Category: 64-Bit]
3.3.13.2 Fixed-Point Shift Instructions
3.3.13.2.1 64-bit Fixed-Point Shift Instructions [Category: 64-Bit]
3.3.14 Move To/From System Register Instructions
3.3.14.1 Move To/From System Registers [Category: Embedded]

Chapter 4. Floating-Point Processor [Category: Floating-Point]
4.1 Floating-Point Processor Overview
4.2 Floating-Point Processor Registers
4.2.1 Floating-Point Registers
4.2.2 Floating-Point Status and Control Register
4.3 Floating-Point Data
4.3.1 Data Format
4.3.2 Value Representation
4.3.3 Sign of Result
4.3.4 Normalization and Denormalization
4.3.5 Data Handling and Precision
4.3.5.1 Single-Precision Operands
4.3.5.2 Integer-Valued Operands
4.3.6 Rounding
4.4 Floating-Point Exceptions
4.4.1 Invalid Operation Exception
4.4.1.1 Definition
4.4.1.2 Action
4.4.2 Zero Divide Exception
4.4.2.1 Definition
4.4.2.2 Action
4.4.3 Overflow Exception
4.4.3.1 Definition
4.4.3.2 Action
4.4.4 Underflow Exception
4.4.4.1 Definition
4.4.4.2 Action
4.4.5 Inexact Exception
4.4.5.1 Definition
4.4.5.2 Action
4.5 Floating-Point Execution Models
4.5.1 Execution Model for IEEE Operations
4.5.2 Execution Model for Multiply-Add Type Instructions
4.6 Floating-Point Processor Instructions
4.6.1 Floating-Point Storage Access Instructions
4.6.1.1 Storage Access Exceptions
4.6.2 Floating-Point Load Instructions
4.6.3 Floating-Point Store Instructions
4.6.4 Floating-Point Move Instructions
4.6.5 Floating-Point Arithmetic Instructions
4.6.5.1 Floating-Point Elementary Arithmetic Instructions
4.6.5.2 Floating-Point Multiply-Add Instructions
4.6.6 Floating-Point Rounding and Conversion Instructions
4.6.6.1 Floating-Point Rounding Instruction
4.6.6.2 Floating-Point Convert To/From Integer Instructions
4.6.6.3 Floating Round to Integer Instructions [Category: Floating-Point.Phased-In]
4.6.7 Floating-Point Compare Instructions
4.6.8 Floating-Point Select Instruction
4.6.9 Floating-Point Status and Control Register Instructions

Chapter 5. Vector Processor [Category: Vector]
5.1 Vector Processor Overview
5.2 Chapter Conventions
5.2.1 Description of Instruction Operation
5.3 Vector Processor Registers
5.3.1 Vector Registers
5.3.2 Vector Status and Control Register
5.3.3 VR Save Register
5.4 Vector Storage Access Operations
5.4.1 Accessing Unaligned Storage Operands
5.5 Vector Integer Operations
5.5.1 Integer Saturation
5.6 Vector Floating-Point Operations
5.6.1 Floating-Point Overview
5.6.2 Floating-Point Exceptions
5.6.2.1 NaN Operand Exception
5.6.2.2 Invalid Operation Exception
5.6.2.3 Zero Divide Exception
5.6.2.4 Log of Zero Exception
5.6.2.5 Overflow Exception
5.6.2.6 Underflow Exception
5.7 Vector Storage Access Instructions
5.7.1 Storage Access Exceptions
5.7.2 Vector Load Instructions
5.7.3 Vector Store Instructions
5.7.4 Vector Alignment Support Instructions
5.8 Vector Permute and Formatting Instructions
5.8.1 Vector Pack and Unpack Instructions
5.8.2 Vector Merge Instructions
5.8.3 Vector Splat Instructions
5.8.4 Vector Permute Instruction
5.8.5 Vector Select Instruction
5.8.6 Vector Shift Instructions
5.9 Vector Integer Instructions
5.9.1 Vector Integer Arithmetic Instructions
5.9.1.1 Vector Integer Add Instructions
5.9.1.2 Vector Integer Subtract Instructions
5.9.1.3 Vector Integer Multiply Instructions
5.9.1.4 Vector Integer Multiply-Add/Sum Instructions
5.9.1.5 Vector Integer Sum-Across Instructions
5.9.1.6 Vector Integer Average Instructions
5.9.1.7 Vector Integer Maximum and Minimum Instructions
5.9.2 Vector Integer Compare Instructions
5.9.3 Vector Logical Instructions
5.9.4 Vector Integer Rotate and Shift Instructions
5.10 Vector Floating-Point Instruction Set
5.10.1 Vector Floating-Point Arithmetic Instructions
5.10.2 Vector Floating-Point Maximum and Minimum Instructions
5.10.3 Vector Floating-Point Rounding and Conversion Instructions
5.10.4 Vector Floating-Point Compare Instructions
5.10.5 Vector Floating-Point Estimate Instructions
5.11 Vector Status and Control Register Instructions

Chapter 6. Signal Processing Engine (SPE) [Category: Signal Processing Engine]
6.1 Overview
6.2 Nomenclature and Conventions
6.3 Programming Model
6.3.1 General Operation
6.3.2 GPR Registers
6.3.3 Accumulator Register
6.3.4 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
6.3.5 Data Formats
6.3.5.1 Integer Format
6.3.5.2 Fractional Format
6.3.6 Computational Operations
6.3.7 SPE Instructions
6.3.8 Saturation, Shift, and Bit Reverse Models
6.3.8.1 Saturation
6.3.8.2 Shift Left
6.3.8.3 Bit Reverse
6.3.9 SPE Instruction Set

Chapter 7. Embedded Floating-Point [Category: SPE.Embedded Float Scalar Double] [Category: SPE.Embedded Float Scalar Single] [Category: SPE.Embedded Float Vector]
7.1 Overview
7.2 Programming Model
7.2.1 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
7.2.2 Floating-Point Data Formats
7.2.3 Exception Conditions
7.2.3.1 Denormalized Values on Input
7.2.3.2 Embedded Floating-Point Overflow and Underflow
7.2.3.3 Embedded Floating-Point Invalid Operation/Input Errors
7.2.3.4 Embedded Floating-Point Round (Inexact)
7.2.3.5 Embedded Floating-Point Divide by Zero
7.2.3.6 Default Results
7.2.4 IEEE 754 Compliance
7.2.4.1 Sticky Bit Handling For Exception Conditions
7.3 Embedded Floating-Point Instructions
7.3.1 Load/Store Instructions
7.3.2 SPE.Embedded Float Vector Instructions [Category: SPE.Embedded Float Vector]
7.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single]
7.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double]
7.4 Embedded Floating-Point Results Summary

Chapter 8. Legacy Move Assist Instruction [Category: Legacy Move Assist]

Chapter 9. Legacy Integer Multiply-Accumulate Instructions [Category: Legacy Integer Multiply-Accumulate]

Appendix A. Suggested Floating-Point Models [Category: Floating-Point]
A.1 Floating-Point Round to Single-Precision Model
A.2 Floating-Point Convert to Integer Model
A.3 Floating-Point Convert from Integer Model
A.4 Floating-Point Round to Integer Model

Appendix B. Vector RTL Functions [Category: Vector]

Appendix C. Embedded Floating-Point RTL Functions [Category: SPE.Embedded Float Scalar Double] [Category: SPE.Embedded Float Scalar Single] [Category: SPE.Embedded Float Vector]
C.1 Common Functions
C.2 Convert from Single-Precision Embedded Floating-Point to Integer Word with Saturation
C.3 Convert from Double-Precision Embedded Floating-Point to Integer Word with Saturation
C.4 Convert from Double-Precision Embedded Floating-Point to Integer Doubleword with Saturation
C.5 Convert to Single-Precision Embedded Floating-Point from Integer Word
C.6 Convert to Double-Precision Embedded Floating-Point from Integer Word
C.7 Convert to Double-Precision Embedded Floating-Point from Integer Doubleword

Appendix D. Assembler Extended Mnemonics
D.1 Symbols
D.2 Branch Mnemonics
D.2.1 BO and BI Fields
D.2.2 Simple Branch Mnemonics
D.2.3 Branch Mnemonics Incorporating Conditions
D.2.4 Branch Prediction
D.3 Condition Register Logical Mnemonics
D.4 Subtract Mnemonics
D.4.1 Subtract Immediate
D.4.2 Subtract
D.5 Compare Mnemonics
D.5.1 Doubleword Comparisons
D.5.2 Word Comparisons
D.6 Trap Mnemonics
D.7 Rotate and Shift Mnemonics
D.7.1 Operations on Doublewords
D.7.2 Operations on Words
D.8 Move To/From Special Purpose Register Mnemonics
D.9 Miscellaneous Mnemonics

Appendix E. Programming Examples
E.1 Multiple-Precision Shifts
E.2 Floating-Point Conversions [Category: Floating-Point]
E.2.1 Conversion from Floating-Point Number to Floating-Point Integer
E.2.2 Conversion from Floating-Point Number to Signed Fixed-Point Integer Doubleword
E.2.3 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Doubleword
E.2.4 Conversion from Floating-Point Number to Signed Fixed-Point Integer Word
E.2.5 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Word
E.2.6 Conversion from Signed Fixed-Point Integer Doubleword to Floating-Point Number
E.2.7 Conversion from Unsigned Fixed-Point Integer Doubleword to Floating-Point Number
E.2.8 Conversion from Signed Fixed-Point Integer Word to Floating-Point Number
E.2.9 Conversion from Unsigned Fixed-Point Integer Word to Floating-Point Number
E.3 Floating-Point Selection [Category: Floating-Point]
E.3.1 Comparison to Zero
E.3.2 Minimum and Maximum
E.3.3 Simple if-then-else Constructions
E.3.4 Notes
E.4 Vector Unaligned Storage Operations [Category: Vector]
E.4.1 Loading a Unaligned Quadword Using Permute from Big-Endian Storage

Book II: Power ISA Virtual Environment Architecture

Chapter 1. Storage Model
1.1 Definitions
1.2 Introduction
1.3 Virtual Storage
1.4 Single-copy Atomicity
1.5 Cache Model
1.6 Storage Control Attributes
1.6.1 Write Through Required
1.6.2 Caching Inhibited
1.6.3 Memory Coherence Required [Category: Memory Coherence]
1.6.4 Guarded
1.6.5 Endianness [Category: Embedded.Little-Endian]
1.6.6 Variable Length Encoded (VLE) Instructions
1.7 Shared Storage
1.7.1 Storage Access Ordering
1.7.2 Storage Ordering of I/O Accesses
1.7.3 Atomic Update
1.7.3.1 Reservations
1.7.3.2 Forward Progress
1.8 Instruction Storage
1.8.1 Concurrent Modification and Execution of Instructions

Chapter 2. Effect of Operand Placement on Performance
2.1 Instruction Restart

Chapter 3. Storage Control Instructions
3.1 Parameters Useful to Application Programs
3.2 Cache Management Instructions
3.2.1 Instruction Cache Instructions
3.2.2 Data Cache Instructions
3.2.2.1 Obsolete Data Cache Instructions [Category: Vector.Phased-Out]
3.3 Synchronization Instructions
3.3.1 Instruction Synchronize Instruction
3.3.2 Load and Reserve and Store Conditional Instructions
3.3.2.1 64-Bit Load and Reserve and Store Conditional Instructions [Category: 64-Bit]
3.3.3 Memory Barrier Instructions
3.3.4 Wait Instruction

Chapter 4. Time Base
4.1 Time Base Overview
4.2 Time Base
4.2.1 Time Base Instructions
4.3 Alternate Time Base [Category: Alternate Time Base]

Chapter 5. External Control [Category: External Control]
5.1 External Access Instructions

Appendix A. Assembler Extended Mnemonics
A.1 Data Cache Block Flush Mnemonics
A.2 Synchronize Mnemonics

Appendix B. Programming Examples for Sharing Storage
B.1 Atomic Update Primitives
B.2 Lock Acquisition and Release, and Related Techniques
B.2.1 Lock Acquisition and Import Barriers
B.2.1.1 Acquire Lock and Import Shared Storage
B.2.1.2 Obtain Pointer and Import Shared Storage
B.2.2 Lock Release and Export Barriers
B.2.2.1 Export Shared Storage and Release Lock
B.2.2.2 Export Shared Storage and Release Lock using lwsync
B.2.3 Safe Fetch
B.3 List Insertion
B.4 Notes

Book III-S: Power ISA Operating Environment Architecture - Server Environment

Chapter 1. Introduction
1.1 Overview
1.2 Document Conventions
1.2.1 Definitions and Notation
1.2.2 Reserved Fields
1.3 General Systems Overview
1.4 Exceptions
1.5 Synchronization
1.5.1 Context Synchronization
1.5.2 Execution Synchronization

Chapter 2. Logical Partitioning (LPAR)
2.1 Overview
2.2 Logical Partitioning Control Register (LPCR)
2.3 Real Mode Offset Register (RMOR)
2.4 Hypervisor Real Mode Offset Register (HRMOR)
2.5 Logical Partition Identification Register (LPIDR)
2.6 Other Hypervisor Resources
2.7 Sharing Hypervisor Resources
2.8 Hypervisor Interrupt Little-Endian (HILE) Bit

Chapter 3. Branch Processor
3.1 Branch Processor Overview
3.2 Branch Processor Registers
3.2.1 Machine State Register
3.3 Branch Processor Instructions
3.3.1 System Linkage Instructions

Chapter 4. Fixed-Point Processor
4.1 Fixed-Point Processor Overview
4.2 Special Purpose Registers
4.3 Fixed-Point Processor Registers
4.3.1 Processor Version Register
4.3.2 Processor Identification Register
4.3.3 Control Register
4.3.4 Program Priority Register
4.3.5 Software-use SPRs
4.4 Fixed-Point Processor Instructions
4.4.1 Fixed-Point Storage Access Instructions [Category: Load/Store Quadword]
4.4.2 OR Instruction
4.4.3 Move To/From System Register Instructions

Chapter 5. Storage Control
5.1 Overview
5.2 Storage Exceptions
5.3 Instruction Fetch
5.3.1 Implicit Branch
5.3.2 Address Wrapping Combined with Changing MSR Bit SF
5.4 Data Access
5.5 Performing Operations Out-of-Order
5.6 Invalid Real Address
5.7 Storage Addressing
5.7.1 32-Bit Mode
5.7.2 Virtualized Partition Memory (VPM) Mode
5.7.3 Real And Virtual Real Addressing Modes
5.7.3.1 Hypervisor Offset Real Mode Address
5.7.3.2 Offset Real Mode Address
5.7.3.3 Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes
5.7.3.3.1 Hypervisor Real Mode Storage Control
5.7.3.4 Virtual Real Mode Addressing Mechanism
5.7.3.5 Storage Control Attributes for Implicit Storage Accesses
5.7.4 Address Ranges Having Defined Uses
5.7.5 Address Translation Overview
5.7.6 Virtual Address Generation
5.7.6.1 Segment Lookaside Buffer (SLB)
5.7.6.2 SLB Search
5.7.7 Virtual to Real Translation
5.7.7.1 Page Table
5.7.7.2 Storage Description Register 1
5.7.7.3 Page Table Search
5.7.8 Reference and Change Recording
5.7.9 Storage and Virtual Page Class Key Protection
5.7.9.1 Virtual Page Class Key Protection
5.7.9.2 Storage Protection, Address Translation Enabled
5.7.9.3 Storage Protection, Address Translation Disabled
5.8 Storage Control Attributes
5.8.1 Guarded Storage
5.8.2.1 Storage Control Bit Restrictions
5.8.2.2 Altering the Storage Control Bits
5.9 Storage Control Instructions
5.9.1 Cache Management Instructions
5.9.2 Synchronize Instruction
5.9.3 Lookaside Buffer Management
5.9.3.1 SLB Management Instructions
5.9.3.2 Bridge to SLB Architecture [Category: Server.Phased-Out]
5.9.3.2.1 Segment Register Manipulation Instructions
5.9.3.3 TLB Management Instructions
5.10 Page Table Update Synchronization Requirements
5.10.1 Page Table Updates
5.10.1.1 Adding a Page Table Entry
5.10.1.2 Modifying a Page Table Entry
5.10.1.3 Deleting a Page Table Entry

Chapter 6. Interrupts
6.1 Overview
6.2 Interrupt Registers
6.2.1 Machine Status Save/Restore Registers
6.2.2 Hypervisor Machine Status Save/Restore Registers
6.2.3 Data Address Register
6.2.4 Hypervisor Data Address Register
6.2.5 Data Storage Interrupt Status Register
6.2.6 Hypervisor Data Storage Interrupt Status Register
6.3 Interrupt Synchronization
6.4 Interrupt Classes
6.4.1 Precise Interrupt
6.4.2 Imprecise Interrupt
6.4.3 Interrupt Processing
6.4.4 Implicit alteration of HSRR0 and HSRR1
6.5 Interrupt Definitions
6.5.1 System Reset Interrupt
6.5.2 Machine Check Interrupt
6.5.3 Data Storage Interrupt
6.5.4 Data Segment Interrupt
6.5.5 Instruction Storage Interrupt
6.5.6 Instruction Segment Interrupt
6.5.7 External Interrupt
6.5.8 Alignment Interrupt
470 5.8.1.1 Out-of-Order Accesses to Guarded 6.5.9 Program Interrupt . . . . . . . . . . . . 471 Storage . . . . . . . . . . . . . . . . . . . . . . . . 440 6.5.10 Floating-Point Unavailable 5.8.2 Storage Control Bits. . . . . . . . . . 440 Interrupt . . . . . . . . . . . . . . . . . . . . . . . . 472 Table of Contents xi Version 2.04 6.5.11 Decrementer Interrupt . . . . . . . .472 Appendix B. Example Performance 6.5.12 Hypervisor Decrementer Monitor . . . . . . . . . . . . . . . . . . . . . 495 Interrupt . . . . . . . . . . . . . . . . . . . . . . . .473 B.1 PMM Bit of the Machine State Register 6.5.13 System Call Interrupt . . . . . . . .473 496 6.5.14 Trace Interrupt [Category: Trace]. . . B.2 Special Purpose Registers . . . . . . 496 473 B.2.1 Performance Monitor Counter Regis- 6.5.15 Hypervisor Data Storage Inter- ters . . . . . . . . . . . . . . . . . . . . . . . . . . . 497 rupt. . . . . . . . . . . . . . . . . . . . . . . . . . . .473 B.2.2 Monitor Mode Control Register 0 497 6.5.16 Hypervisor Instruction Storage B.2.3 Monitor Mode Control Register 1 500 Interrupt. . . . . . . . . . . . . . . . . . . . . . . .475 B.2.4 Monitor Mode Control Register A500 6.5.17 Hypervisor Data Segment Inter- B.2.5 Sampled Instruction Address Regis- rupt. . . . . . . . . . . . . . . . . . . . . . . . . . . .475 ter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501 6.5.18 Hypervisor Instruction Segment B.2.6 Sampled Data Address Register 501 Interrupt. . . . . . . . . . . . . . . . . . . . . . . .475 B.3 Performance Monitor 6.5.19 Performance Monitor Interrupt. . . . . . . . . . . . . . . . . . . . . . . . 502 Interrupt [Category: Server.Performance B.4 Interaction with the Trace Facility . 502 Monitor]. . . . . . . . . . . . . . . . . . . . . . . . .476 6.5.20 Vector Unavailable Interrupt [Cate- gory: Vector] . . . . . . . . . . . . . . . . . . . . .476 Appendix C. Example Trace 6.6 Partially Executed Extensions . . . . . . . . . . . . . . . . . . 503 Instructions . 
. . . . . . . . . . . . . . . . . . . . .477 6.7 Exception Ordering . . . . . . . . . . . .478 Appendix D. Interpretation of the 6.7.1 Unordered Exceptions. . . . . . . . .478 DSISR as Set by an Alignment 6.7.2 Ordered Exceptions . . . . . . . . . .478 6.8 Interrupt Priorities. . . . . . . . . . . . . .479 Interrupt. . . . . . . . . . . . . . . . . . . . . 505 Chapter 7. Timer Facilities. . . . . . 481 Book III-E: 7.1 Overview . . . . . . . . . . . . . . . . . . . .481 7.2 Time Base (TB) . . . . . . . . . . . . . . .481 Power ISA Operating Environment 7.2.1 Writing the Time Base. . . . . . . . .482 Architecture - Embedded 7.3 Decrementer . . . . . . . . . . . . . . . . .482 7.3.1 Writing and Reading the Decre- Environment . . . . . . . . . . . . . . . . . 507 menter . . . . . . . . . . . . . . . . . . . . . . . . .483 7.4 Hypervisor Decrementer . . . . . . . .483 Chapter 1. Introduction . . . . . . . . 509 7.5 Processor Utilization of Resources 1.1 Overview. . . . . . . . . . . . . . . . . . . . 509 Register (PURR). . . . . . . . . . . . . . . . . .483 1.2 32-Bit Implementations . . . . . . . . . 509 1.3 Document Conventions . . . . . . . . 509 Chapter 8. Debug Facilities . . . . . 485 1.3.1 Definitions and Notation . . . . . . 509 8.1 Overview . . . . . . . . . . . . . . . . . . . .485 1.3.2 Reserved Fields. . . . . . . . . . . . . 510 8.1.1 Data Address Breakpoint . . . . . .485 1.4 General Systems Overview . . . . . 510 1.5 Exceptions . . . . . . . . . . . . . . . . . . 510 Chapter 9. External Control 1.6 Synchronization . . . . . . . . . . . . . . 511 1.6.1 Context Synchronization . . . . . . 511 [Category: External Control] . . . . 487 1.6.2 Execution Synchronization . . . . 511 9.1 External Access Register . . . . . . . .487 9.2 External Access Instructions . . . . .487 Chapter 2. Branch Processor . . . 513 2.1 Branch Processor Overview . . . . . 513 Chapter 10. Synchronization 2.2 Branch Processor Registers . . . . . 
513 Requirements for Context Alterations 2.2.1 Machine State Register . . . . . . . 513 489 2.3 Branch Processor Instructions . . . 515 2.4 System Linkage Instructions. . . . . 515 Appendix A. Assembler Extended Mnemonics. . . . . . . . . . . . . . . . . . . 493 Chapter 3. Fixed-Point Processor 519 3.1 Fixed-Point Processor Overview. . 519 A.1 Move To/From Special Purpose Regis- 3.2 Special Purpose Registers . . . . . . 519 ter Mnemonics . . . . . . . . . . . . . . . . . . .493 xii Power ISATM Version 2.04 3.3 Fixed-Point Processor Registers . 519 4.9.2.1 Lock Setting and Clearing . . . . 555 3.3.1 Processor Version Register . . . . 519 4.9.2.2 Error Conditions . . . . . . . . . . . 555 3.3.2 Processor Identification Register 519 4.9.2.2.1 Overlocking . . . . . . . . . . . . . 555 3.3.3 Software-use SPRs . . . . . . . . . . 520 4.9.2.2.2 Unable-to-lock and Unable-to- 3.3.4 External Process ID Registers [Cate- unlock Conditions . . . . . . . . . . . . . . . . 556 gory: Embedded.External PID] . . . . . . 521 4.9.2.3 Cache Locking Instructions . . . 557 3.3.4.1 External Process ID Load Context 4.9.3 Synchronize Instruction . . . . . . . 559 (EPLC) Register . . . . . . . . . . . . . . . . . 521 4.9.4 Lookaside Buffer 3.3.4.2 External Process ID Store Context Management . . . . . . . . . . . . . . . . . . . . 559 (EPSC) Register . . . . . . . . . . . . . . . . . 522 4.9.4.1 TLB Management Instructions 560 3.4 Fixed-Point Processor Instructions 523 3.4.1 Move To/From System Register Chapter 5. Interrupts and Exceptions Instructions . . . . . . . . . . . . . . . . . . . . . 523 563 3.4.2 External Process ID Instructions 5.1 Overview . . . . . . . . . . . . . . . . . . . . 564 [Category: Embedded.External PID]. . 529 5.2 Interrupt Registers. . . . . . . . . . . . . 564 5.2.1 Save/Restore Register 0 . . . . . . 564 Chapter 4. Storage Control . . . . . 541 5.2.2 Save/Restore Register 1 . . . . . . 564 4.1 Storage Addressing . . . . . . . . . . . 
541 5.2.3 Critical Save/Restore Register 0 565 4.2 Storage Exceptions . . . . . . . . . . . 541 5.2.4 Critical Save/Restore Register 1 565 4.3 Instruction Fetch . . . . . . . . . . . . . 542 5.2.5 Debug Save/Restore Register 0 4.3.1 Implicit Branch . . . . . . . . . . . . . . 542 [Category: Embedded.Enhanced Debug] . 4.3.2 Address Wrapping Combined with 565 Changing MSR Bit CM . . . . . . . . . . . . 542 5.2.6 Debug Save/Restore Register 1 4.4 Data Access . . . . . . . . . . . . . . . . . 542 [Category: Embedded.Enhanced Debug] . 4.5 Performing Operations 565 Out-of-Order . . . . . . . . . . . . . . . . . . . . 542 5.2.7 Data Exception Address Register . . 4.6 Invalid Real Address . . . . . . . . . . . 543 566 4.7 Storage Control. . . . . . . . . . . . . . . 543 5.2.8 Interrupt Vector Prefix Register . 566 4.7.1 Storage Control Registers . . . . . 543 5.2.9 Exception Syndrome Register . . 567 4.7.1.1 Process ID Register . . . . . . . . 543 5.2.10 Interrupt Vector Offset Registers . . 4.7.1.2 Translation Lookaside Buffer . 543 568 4.7.2 Page Identification . . . . . . . . . . . 545 5.2.11 Machine Check Registers . . . . 568 4.7.3 Address Translation . . . . . . . . . . 548 5.2.11.1 Machine Check Save/Restore 4.7.4 Storage Access Control . . . . . . . 549 Register 0 . . . . . . . . . . . . . . . . . . . . . . 569 4.7.4.1 Execute Access . . . . . . . . . . . 549 5.2.11.2 Machine Check Save/Restore 4.7.4.2 Write Access. . . . . . . . . . . . . . 549 Register 1 . . . . . . . . . . . . . . . . . . . . . . 569 4.7.4.3 Read Access . . . . . . . . . . . . . 549 5.2.11.3 Machine Check Syndrome Regis- 4.7.4.4 Storage Access Control Applied to ter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569 Cache Management Instructions . . . . 549 5.2.12 External Proxy Register [Category: 4.7.4.5 Storage Access Control Applied to External Proxy] . . . . . . . . . . . . . . . . . . 569 String Instructions . . . . . . . . . . . . . . . . 550 5.3 Exceptions. . . . . . . . . . . . 
. . . . . . . 570 4.7.5 TLB Management . . . . . . . . . . . 550 5.4 Interrupt Classification. . . . . . . . . . 570 4.8 Storage Control Attributes . . . . . . 551 5.4.1 Asynchronous Interrupts . . . . . . 570 4.8.1 Guarded Storage . . . . . . . . . . . . 551 5.4.2 Synchronous Interrupts . . . . . . . 570 4.8.1.1 Out-of-Order Accesses to Guarded 5.4.2.1 Synchronous, Precise Interrupts . . Storage . . . . . . . . . . . . . . . . . . . . . . . . 552 571 4.8.2 User-Definable. . . . . . . . . . . . . . 552 5.4.2.2 Synchronous, Imprecise Interrupts 4.8.3 Storage Control Bits. . . . . . . . . . 552 571 4.8.3.1 Storage Control Bit Restrictions . . 5.4.3 Interrupt Classes . . . . . . . . . . . . 571 552 5.4.4 Machine Check Interrupts . . . . . 571 4.8.3.2 Altering the Storage Control Bits . 5.5 Interrupt Processing . . . . . . . . . . . 572 553 5.6 Interrupt Definitions . . . . . . . . . . . . 574 4.9 Storage Control Instructions . . . . . 554 5.6.1 Critical Input Interrupt. . . . . . . . . 576 4.9.1 Cache Management Instructions 554 5.6.2 Machine Check Interrupt . . . . . . 576 4.9.2 Cache Locking [Category: Embed- 5.6.3 Data Storage Interrupt . . . . . . . . 577 ded Cache Locking] . . . . . . . . . . . . . . 555 5.6.4 Instruction Storage Interrupt. . . . 578 Table of Contents xiii Version 2.04 5.6.5 External Input Interrupt . . . . . . . .578 5.9.1.5 Exception Priorities for Defined 5.6.6 Alignment Interrupt . . . . . . . . . . .579 Trap Instructions . . . . . . . . . . . . . . . . . 592 5.6.7 Program Interrupt . . . . . . . . . . . .580 5.9.1.6 Exception Priorities for Defined 5.6.8 Floating-Point Unavailable Interrupt . System Call Instruction . . . . . . . . . . . . 593 581 5.9.1.7 Exception Priorities for Defined 5.6.9 System Call Interrupt . . . . . . . . .581 Branch Instructions . . . . . . . . . . . . . . . 593 5.6.10 Auxiliary Processor Unavailable 5.9.1.8 Exception Priorities for Defined Interrupt . . . . . . . . . . . . . . . . . . . . . . . 
.581 Return From Interrupt Instructions . . . 593 5.6.11 Decrementer Interrupt . . . . . . . .582 5.9.1.9 Exception Priorities for Other 5.6.12 Fixed-Interval Timer Interrupt . .582 Defined Instructions . . . . . . . . . . . . . . 593 5.6.13 Watchdog Timer Interrupt . . . . .582 5.9.2 Exception Priorities for Reserved 5.6.14 Data TLB Error Interrupt . . . . . .583 Instructions . . . . . . . . . . . . . . . . . . . . . 593 5.6.15 Instruction TLB Error Interrupt .583 5.6.16 Debug Interrupt . . . . . . . . . . . . .584 Chapter 6. Reset and Initialization . . 5.6.17 SPE/Embedded Floating-Point/Vec- 595 tor Unavailable Interrupt 6.1 Background. . . . . . . . . . . . . . . . . . 595 [Categories: SPE.Embedded Float Scalar 6.2 Reset Mechanisms . . . . . . . . . . . . 595 Double, SPE.Embedded Float Vector, Vec- 6.3 Processor State After Reset . . . . . 595 tor] . . . . . . . . . . . . . . . . . . . . . . . . . . . .585 6.4 Software Initialization Requirements . 5.6.18 Embedded Floating-Point Data 596 Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Sin- Chapter 7. Timer Facilities . . . . . 597 gle, SPE.Embedded Float Vector] . . . .586 7.1 Overview. . . . . . . . . . . . . . . . . . . . 597 5.6.19 Embedded Floating-Point Round 7.2 Time Base (TB) . . . . . . . . . . . . . . 597 Interrupt 7.2.1 Writing the Time Base . . . . . . . . 598 [Categories: SPE.Embedded Float Scalar 7.3 Decrementer . . . . . . . . . . . . . . . . . 599 Double, SPE.Embedded Float Scalar Sin- 7.3.1 Writing and Reading the Decre- gle, SPE.Embedded Float Vector] . . . .586 menter . . . . . . . . . . . . . . . . . . . . . . . . . 599 5.6.20 Performance Monitor Interrupt [Cat- 7.3.2 Decrementer Events . . . . . . . . . 599 egory: Embedded.Performance Monitor] . . 7.4 Decrementer Auto-Reload Register . . 587 600 5.6.21 Processor Doorbell Interrupt [Cate- 7.5 Timer Control Register . . . . . . . . . 600 gory: Embedded.Processor Control] . .587 7.5.1 Timer Status Register . . 
. . . . . . 601 5.6.22 Processor Doorbell Critical Interrupt 7.6 Fixed-Interval Timer . . . . . . . . . . . 602 [Category: Embedded.Processor Control] . 7.7 Watchdog Timer . . . . . . . . . . . . . . 602 587 7.8 Freezing the Timer Facilities . . . . . 604 5.7 Partially Executed Instructions . . . .588 5.8 Interrupt Ordering and Masking . . .589 Chapter 8. Debug Facilities . . . . 605 5.8.1 Guidelines for System Software .590 8.1 Overview. . . . . . . . . . . . . . . . . . . . 605 5.8.2 Interrupt Order . . . . . . . . . . . . . .591 8.2 Internal Debug Mode . . . . . . . . . . 605 5.9 Exception Priorities . . . . . . . . . . . .591 8.3 External Debug Mode [Category: 5.9.1 Exception Priorities for Defined Embedded.Enhanced Debug] . . . . . . . 606 Instructions . . . . . . . . . . . . . . . . . . . . . .592 8.4 Debug Events . . . . . . . . . . . . . . . . 606 5.9.1.1 Exception Priorities for Defined 8.4.1 Instruction Address Compare Debug Floating-Point Load and Store Instructions Event . . . . . . . . . . . . . . . . . . . . . . . . . . 607 592 8.4.2 Data Address Compare Debug Event 5.9.1.2 Exception Priorities for Other 609 Defined Load and Store Instructions and 8.4.3 Trap Debug Event . . . . . . . . . . . 610 Defined Cache Management Instructions . 8.4.4 Branch Taken Debug Event . . . . 610 592 8.4.5 Instruction Complete Debug Event . 5.9.1.3 Exception Priorities for Other 611 Defined Floating-Point Instructions. . . .592 8.4.6 Interrupt Taken Debug Event . . . 611 5.9.1.4 Exception Priorities for Defined 8.4.6.1 Causes of Interrupt Taken Debug Privileged Instructions . . . . . . . . . . . . .592 Events . . . . . . . . . . . . . . . . . . . . . . . . . 611 xiv Power ISATM Version 2.04 8.4.6.2 Interrupt Taken Debug Event A.2.1.1 Data Cache Debug Tag Register Description . . . . . . . . . . . . . . . . . . . . . 611 High . . . . . . . . . . . . . . . . . . . . . . . . . . . 630 8.4.7 Return Debug Event . . . . . . . . . 
612 A.2.1.2 Data Cache Debug Tag Register 8.4.8 Unconditional Debug Event . . . . 612 Low . . . . . . . . . . . . . . . . . . . . . . . . . . . 630 8.4.9 Critical Interrupt Taken Debug Event A.2.1.3 Instruction Cache Debug Data [Category: Embedded.Enhanced Debug] . Register . . . . . . . . . . . . . . . . . . . . . . . . 631 612 A.2.1.4 Instruction Cache Debug Tag Reg- 8.4.10 Critical Interrupt Return Debug ister High . . . . . . . . . . . . . . . . . . . . . . . 631 Event [Category: Embedded.Enhanced A.2.1.5 Instruction Cache Debug Tag Reg- Debug]. . . . . . . . . . . . . . . . . . . . . . . . . 613 ister Low . . . . . . . . . . . . . . . . . . . . . . . 631 8.5 Debug Registers . . . . . . . . . . . . . . 613 A.2.2 Embedded Cache Debug Instruc- 8.5.1 Debug Control Registers . . . . . . 613 tions . . . . . . . . . . . . . . . . . . . . . . . . . . . 632 8.5.1.1 Debug Control Register 0 (DCBR0) 613 Appendix B. Assembler Extended 8.5.1.2 Debug Control Register 1 (DCBR1) Mnemonics . . . . . . . . . . . . . . . . . . . 635 614 B.1 Move To/From Special Purpose Regis- 8.5.1.3 Debug Control Register 2 (DCBR2) ter Mnemonics . . . . . . . . . . . . . . . . . . . 636 616 8.5.2 Debug Status Register. . . . . . . . 617 8.5.3 Instruction Address Compare Regis- Appendix C. Guidelines for 64-bit ters . . . . . . . . . . . . . . . . . . . . . . . . . . . 618 Implementations in 32-bit Mode and 8.5.4 Data Address Compare Registers . . 32-bit Implementations . . . . . . . . . 637 618 C.1 Hardware Guidelines . . . . . . . . . . 637 8.5.5 Data Value Compare Registers . 619 C.1.1 64-bit Specific Instructions. . . . . 637 8.6 Debugger Notify Halt Instruction C.1.2 Registers on 32-bit Implementations [Category: Embedded.Enhanced Debug] . 637 620 C.1.3 Addressing on 32-bit Implementa- tions . . . . . . . . . . . . . . . . . . . . . . . . . . . 637 Chapter 9. Processor Control C.1.4 TLB Fields on 32-bit Implementa- [Category: Embedded.Processor tions . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 637 Control] . . . . . . . . . . . . . . . . . . . . . 621 C.2 32-bit Software Guidelines . . . . . . 637 C.2.1 32-bit Instruction Selection . . . . 637 9.1 Overview. . . . . . . . . . . . . . . . . . . . 621 9.2 Programming Model . . . . . . . . . . . 621 9.2.1 Processor Message Handling and Appendix D. Type FSL Storage Filtering . . . . . . . . . . . . . . . . . . . . . . . . 621 Control 9.2.1.1 Doorbell Message Filtering . . . 622 [Category: Embedded.MMU Type 9.2.1.2 Doorbell Critical Message Filtering FSL] . . . . . . . . . . . . . . . . . . . . . . . . . 639 622 9.3 Processor Control Instructions . . . 623 D.1 Type FSL Storage Control Overview. . 639 D.2 Type FSL Storage Control Registers . Chapter 10. Synchronization 639 Requirements for Context Alterations D.2.1 Process ID Registers (PIDn) . . . 639 625 D.2.2 Translation Lookaside Buffer . . . 639 D.2.3 Address Space Identifiers . . . . . 640 Appendix A. Implementation- D.2.4 MMU Assist Registers . . . . . . . . 640 D.2.4.1 MAS0 Register . . . . . . . . . . . . 640 Dependent Instructions . . . . . . . . 629 D.2.4.2 MAS1 Register . . . . . . . . . . . . 641 A.1 Embedded Cache Initialization D.2.4.3 MAS2 Register . . . . . . . . . . . . 641 [Category: Embedded.Cache Initialization] D.2.4.4 MAS3 Register . . . . . . . . . . . . 642 629 D.2.4.5 MAS4 Register . . . . . . . . . . . . 642 A.2 Embedded Cache Debug Facility D.2.4.6 MAS6 Register . . . . . . . . . . . . 643 [Category: Embedded.Cache Debug] . 630 D.2.4.7 MAS7 Register . . . . . . . . . . . . 643 A.2.1 Embedded Cache Debug Registers D.2.5 MMU Configuration and Control 630 Registers . . . . . . . . . . . . . . . . . . . . . . . 645 Table of Contents xv Version 2.04 D.2.5.1 MMU Configuration Register Variable Length Encoding (VLE) Envi (MMUCFG) . . . . . . . . . . . . . . . . . . . . . .645 ronment . . . . . . . . . . . . . . . . . . . . . 661 D.2.5.2 TLB Configuration Registers (TLBnCFG) . . . . . . . . . . . . . . . . . . . . . 
.645 D.2.5.3 MMU Control and Status Register Chapter 1. Variable Length Encoding (MMUCSR0) . . . . . . . . . . . . . . . . . . . . .645 Introduction. . . . . . . . . . . . . . . . . . 663 D.3 Page Identification and Address Trans- 1.1 Overview. . . . . . . . . . . . . . . . . . . . 663 lation . . . . . . . . . . . . . . . . . . . . . . . . . . .646 1.2 Documentation Conventions. . . . . 664 D.4 TLB Management . . . . . . . . . . . . .646 1.2.1 Description of Instruction Operation D.4.1 Reading TLB Entries. . . . . . . . . .646 664 D.4.2 Writing TLB Entries. . . . . . . . . . .646 1.3 Instruction Mnemonics and Operands D.4.3 Invalidating TLB Entries . . . . . . .647 664 D.4.4 Searching TLB Entries . . . . . . . .647 1.4 VLE Instruction Formats . . . . . . . . 664 D.4.5 TLB Replacement Hardware Assist . 1.4.1 BD8-form (16-bit Branch Instruc- 647 tions) . . . . . . . . . . . . . . . . . . . . . . . . . . 664 D.5 32-bit and 64-bit Specific MMU Behav- 1.4.2 C-form (16-bit Control Instructions) . ior . . . . . . . . . . . . . . . . . . . . . . . . . . . . .648 664 D.6 Type FSL MMU Instructions. . . . . .649 1.4.3 IM5-form (16-bit register + immediate Instructions) . . . . . . . . . . . . . . . . . . . . 664 Appendix E. Example Performance 1.4.4 OIM5-form (16-bit register + offset Monitor [Category: immediate Instructions) . . . . . . . . . . . . 664 1.4.5 IM7-form (16-bit Load immediate Embedded.Performance Monitor] 653 Instructions) . . . . . . . . . . . . . . . . . . . . 664 E.1 Overview . . . . . . . . . . . . . . . . . . . .653 1.4.6 R-form (16-bit Monadic Instructions) E.2 Programming Model . . . . . . . . . . .653 665 E.2.1 Event Counting . . . . . . . . . . . . . .654 1.4.7 RR-form (16-bit Dyadic Instructions) E.2.2 Processor Context Configurability . . 665 654 1.4.8 SD4-form (16-bit Load/Store Instruc- E.2.3 Event Selection. . . . . . . . . . . . . .654 tions) . . . . . . . . . . . . . . . . . . . . . . . . . . 665 E.2.4 Thresholds . . . . . . . . . . . . . . . . 
.655 1.4.9 BD15-form . . . . . . . . . . . . . . . . . 665 E.2.5 Performance Monitor Exception .655 1.4.10 BD24-form . . . . . . . . . . . . . . . . 665 E.2.6 Performance Monitor Interrupt . .655 1.4.11 D8-form . . . . . . . . . . . . . . . . . . 665 E.3 Performance Monitor Registers . . .655 1.4.12 I16A-form . . . . . . . . . . . . . . . . . 665 E.3.1 Performance Monitor Global Control 1.4.13 I16L-form . . . . . . . . . . . . . . . . . 665 Register 0 . . . . . . . . . . . . . . . . . . . . . . .655 1.4.14 M-form . . . . . . . . . . . . . . . . . . . 665 E.3.2 Performance Monitor Local Control 1.4.15 SCI8-form . . . . . . . . . . . . . . . . 665 A Registers . . . . . . . . . . . . . . . . . . . . .656 1.4.16 LI20-form . . . . . . . . . . . . . . . . . 665 E.3.3 Performance Monitor Local Control 1.4.17 Instruction Fields . . . . . . . . . . . 665 B Registers . . . . . . . . . . . . . . . . . . . . .656 E.3.4 Performance Monitor Counter Regis- Chapter 2. VLE Storage Addressing ters . . . . . . . . . . . . . . . . . . . . . . . . . . . .657 E.4 Performance Monitor Instructions .658 669 E.5 Performance Monitor Software Usage 2.1 Data Storage Addressing Modes . 669 Notes . . . . . . . . . . . . . . . . . . . . . . . . . .659 2.2 Instruction Storage Addressing Modes E.5.1 Chaining Counters . . . . . . . . . . .659 670 E.5.2 Thresholding . . . . . . . . . . . . . . . .659 2.2.1 Misaligned, Mismatched, and Byte Ordering Instruction Storage Exceptions . Book VLE: 670 2.2.2 VLE Exception Syndrome Bits. . 670 Power ISA Operating Environment Chapter 3. VLE Compatibility with Architecture - Books I­III . . . . . . . . . . . . . . . . . . . 673 3.1 Overview. . . . . . . . . . . . . . . . . . . . 673 3.2 VLE Processor and Storage Control Extensions . . . . . . . . . . . . . . . . . . . . . 673 3.2.1 Instruction Extensions . . . . . . . . 673 xvi Power ISATM Version 2.04 3.2.2 MMU Extensions . . . . . . . . . . . . 673 7.6 External PID . . . . . . . . . . . . . . . . . 
713 3.3 VLE Limitations. . . . . . . . . . . . . . . 673 7.7 Embedded Performance Monitor . 714 7.8 Processor Control . . . . . . . . . . . . . 714 Chapter 4. Branch Operation Instructions . . . . . . . . . . . . . . . . . . 675 Appendix A. VLE Instruction Set 4.1 Branch Processor Registers . . . . . 675 Sorted by Mnemonic . . . . . . . . . . . 715 4.1.1 Condition Register (CR). . . . . . . 675 4.1.1.1 Condition Register Setting for Appendix B. VLE Instruction Set Compare Instructions . . . . . . . . . . . . . 676 Sorted by Opcode . . . . . . . . . . . . . 731 4.1.1.2 Condition Register Setting for the Bit Test Instruction. . . . . . . . . . . . . . . . 676 4.1.2 Link Register (LR) . . . . . . . . . . . 676 Appendices: 4.1.3 Count Register (CTR) . . . . . . . . 676 4.2 Branch Instructions . . . . . . . . . . . . 677 Power ISA Books I-III Appendices 747 4.3 System Linkage Instructions. . . . . 680 4.4 Condition Register Instructions. . . 683 Appendix A. Incompatibilities with Chapter 5. Fixed-Point Instructions . the POWER Architecture . . . . . . . . 749 A.1 New Instructions, Formerly Privileged 685 Instructions . . . . . . . . . . . . . . . . . . . . . 749 5.1 Fixed-Point Load Instructions . . . . 685 A.2 Newly Privileged 5.2 Fixed-Point Store Instructions. . . . 689 Instructions . . . . . . . . . . . . . . . . . . . . . 749 5.3 Fixed-Point Load and Store with Byte A.3 Reserved Fields in Reversal Instructions. . . . . . . . . . . . . . 692 Instructions . . . . . . . . . . . . . . . . . . . . . 749 5.4 Fixed-Point Load and Store Multiple A.4 Reserved Bits in Registers . . . . . . 749 Instructions . . . . . . . . . . . . . . . . . . . . . 692 A.5 Alignment Check. . . . . . . . . . . . . . 749 5.5 Fixed-Point Arithmetic Instructions 693 A.6 Condition Register . . . . . . . . . . . . 750 5.6 Fixed-Point Compare and Bit Test A.7 LK and Rc Bits . . . . . . . . . . . . . . . 750 Instructions . . . . . . . . . . . . . . . . . . . . . 697 A.8 BO Field . . . . . . . . . . . 
. . . . . . . . . 750 5.7 Fixed-Point Trap Instructions . . . . 701 A.9 BH Field . . . . . . . . . . . . . . . . . . . . 750 5.8 Fixed-Point Select Instruction . . . . 701 A.10 Branch Conditional to Count Register 5.9 Fixed-Point Logical, Bit, and Move 750 Instructions . . . . . . . . . . . . . . . . . . . . . 702 A.11 System Call. . . . . . . . . . . . . . . . . 750 5.10 Fixed-Point Rotate and Shift Instruc- A.12 Fixed-Point Exception tions . . . . . . . . . . . . . . . . . . . . . . . . . . 707 Register (XER) . . . . . . . . . . . . . . . . . . 751 5.11 Move To/From System Register A.13 Update Forms of Storage Access Instructions . . . . . . . . . . . . . . . . . . . . . 710 Instructions . . . . . . . . . . . . . . . . . . . . . 751 A.14 Multiple Register Loads . . . . . . . 751 Chapter 6. Storage Control A.15 Load/Store Multiple Instructions . 751 Instructions . . . . . . . . . . . . . . . . . . 711 A.16 Move Assist Instructions . . . . . . . 751 6.1 Storage Synchronization Instructions . A.17 Move To/From SPR. . . . . . . . . . . 751 711 A.18 Effects of Exceptions on FPSCR Bits 6.2 Cache Management Instructions . 712 FR and FI. . . . . . . . . . . . . . . . . . . . . . . 752 6.3 Cache Locking Instructions. . . . . . 712 A.19 Store Floating-Point Single Instruc- 6.4 TLB Management Instructions . . . 712 tions . . . . . . . . . . . . . . . . . . . . . . . . . . . 752 6.5 Instruction Alignment and Byte Order- A.20 Move From FPSCR. . . . . . . . . . . 752 ing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712 A.21 Zeroing Bytes in the Data Cache 752 A.22 Synchronization . . . . . . . . . . . . . 752 Chapter 7. Additional Categories A.23 Move To Machine State Register Instruction . . . . . . . . . . . . . . . . . . . . . . 752 Available in VLE . . . . . . . . . . . . . . 713 A.24 Direct-Store Segments . . . . . . . . 752 7.1 Move Assist . . . . . . . . . . . . . . . . . 713 A.25 Segment Register 7.2 Vector . . . . . . . . . . . . . . . . . . . . . . 
713 Manipulation Instructions. . . . . . . . . . . 752 7.3 Signal Processing Engine. . . . . . . 713 A.26 TLB Entry Invalidation. . . . . . . . . 753 7.4 Embedded Floating Point . . . . . . . 713 A.27 Alignment Interrupts . . . . . . . . . . 753 7.5 Legacy Move Assist . . . . . . . . . . . 713 A.28 Floating-Point Interrupts . . . . . . . 753 Table of Contents xvii Version 2.04 A.29 Timing Facilities . . . . . . . . . . . . . .753 A.29.1 Real-Time Clock . . . . . . . . . . . .753 A.29.2 Decrementer . . . . . . . . . . . . . . .753 A.30 Deleted Instructions . . . . . . . . . . .754 A.31 Discontinued Opcodes. . . . . . . . .754 A.32 POWER2 Compatibility . . . . . . . .755 A.32.1 Cross-Reference for Changed POWER2 Mnemonics. . . . . . . . . . . . . .755 A.32.2 Floating-Point Conversion to Inte- ger . . . . . . . . . . . . . . . . . . . . . . . . . . . .755 A.32.3 Floating-Point Interrupts . . . . . .755 A.32.4 Trace . . . . . . . . . . . . . . . . . . . . .755 A.33 Deleted Instructions . . . . . . . . . . .755 A.33.1 Discontinued Opcodes . . . . . . .756 Appendix B. Platform Support Requirements . . . . . . . . . . . . . . . . 757 Appendix C. Complete SPR List . 759 Appendix D. Illegal Instructions . 763 Appendix E. Reserved Instructions . 765 Appendix F. Opcode Maps. . . . . . 767 Appendix G. Power ISA Instruction Set Sorted by Category. . . . . . . . . 791 Appendix H. Power ISA Instruction Set Sorted by Opcode . . . . . . . . . . 807 Appendix I. Power ISA Instruction Set Sorted by Mnemonic . . . . . . . 823 Index. . . . . . . . . . . . . . . . . . . . . . . . . 841 Last Page - End of Document . . . . 851 xviii Power ISATM Version 2.04 Figures Preface ................................................. iii 35. Count Register . . . . . . . . . . . . . . . . . . . . . . . . . 27 36. BO field encodings . . . . . . . . . . . . . . . . . . . . . . 28 37. "at" bit encodings . . . . . . . . . . . . . . . . . . . . . . . 28 Table of Contents ................................. 
v 38. BH field encodings . . . . . . . . . . . . . . . . . . . . . . 28 39. General Purpose Registers . . . . . . . . . . . . . . . 38 Figures................................................ xix 40. Fixed-Point Exception Register . . . . . . . . . . . . 38 41. Program Priority Register. . . . . . . . . . . . . . . . . 39 42. Software-use SPRs . . . . . . . . . . . . . . . . . . . . . 39 Book I: 43. Priority levels for or Rx,Rx,Rx . . . . . . . . . . . . . 73 44. Floating-Point Registers. . . . . . . . . . . . . . . . . . 95 Power ISA User Instruction Set Architec- 45. Floating-Point Status and Control Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 ture ....................................................... 1 46. Floating-Point Result Flags . . . . . . . . . . . . . . . 97 47. Floating-point single format. . . . . . . . . . . . . . . 97 1. Category Listing . . . . . . . . . . . . . . . . . . . . . . . . . 9 48. Floating-point double format . . . . . . . . . . . . . . 98 2. Logical processing model . . . . . . . . . . . . . . . . . 11 49. IEEE floating-point fields . . . . . . . . . . . . . . . . . 98 3. Power ISA user register set. . . . . . . . . . . . . . . . 12 50. Approximation to real numbers . . . . . . . . . . . . 98 4. I instruction format. . . . . . . . . . . . . . . . . . . . . . . 13 51. Selection of Z1 and Z2 . . . . . . . . . . . . . . . . . . 102 5. B instruction format . . . . . . . . . . . . . . . . . . . . . . 13 52. IEEE 64-bit execution model . . . . . . . . . . . . . 108 6. SC instruction format. . . . . . . . . . . . . . . . . . . . . 14 53. Interpretation of G, R, and X bits . . . . . . . . . . 108 7. D instruction format . . . . . . . . . . . . . . . . . . . . . . 14 54. Location of the Guard, Round, and 8. DS instruction format. . . . . . . . . . . . . . . . . . . . . 14 Sticky bits in the IEEE execution model . . . 108 9. DQ instruction format . . . . . . . . . . . . . . . . . . . . 14 55. 
Multiply-add 64-bit execution model. . . . . . . . 109 10. X instruction format . . . . . . . . . . . . . . . . . . . . . 14 56. Location of the Guard, Round, and Sticky bits in the 11. XL instruction format . . . . . . . . . . . . . . . . . . . . 15 multiply-add execution model . . . . . . . . . . . 109 12. XFX instruction format. . . . . . . . . . . . . . . . . . . 15 57. Vector Register elements. . . . . . . . . . . . . . . . 135 13. XFL instruction format . . . . . . . . . . . . . . . . . . . 15 58. Vector Registers. . . . . . . . . . . . . . . . . . . . . . . 135 14. XS instruction format . . . . . . . . . . . . . . . . . . . . 15 59. Vector Status and Control Register . . . . . . . . 135 15. XO instruction format. . . . . . . . . . . . . . . . . . . . 15 60. VR Save Register. . . . . . . . . . . . . . . . . . . . . . 136 16. A instruction format . . . . . . . . . . . . . . . . . . . . . 15 61. Aligned quadword storage operand . . . . . . . . 137 17. M instruction format. . . . . . . . . . . . . . . . . . . . . 15 62. Vector Register contents for aligned quadword 18. MD instruction format . . . . . . . . . . . . . . . . . . . 15 Load or Store . . . . . . . . . . . . . . . . . . . . . . . 137 19. MDS instruction format . . . . . . . . . . . . . . . . . . 15 63. Unaligned quadword storage operand . . . . . . 137 20. VA instruction format . . . . . . . . . . . . . . . . . . . . 15 64. Vector Register contents . . . . . . . . . . . . . . . . 137 21. VC instruction format. . . . . . . . . . . . . . . . . . . . 15 65. Vector Register contents after Vector OR . . . 138 22. VX instruction format . . . . . . . . . . . . . . . . . . . . 16 66. GPR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 23. EVX instruction format. . . . . . . . . . . . . . . . . . . 16 67. Accumulator . . . . . . . . . . . . . . . . . . . . . . . . . . 202 24. EVS instruction format . . . . . . . . . . . . . . . . . . 16 68. 
Signal Processing and Embedded Floating-Point Sta- 25. Storage operands and byte ordering. . . . . . . . 21 tus and Control Register . . . . . . . . . . . . . . . . . 202 26. C structure `s', showing values of elements . . 21 69. Floating-Point Data Format . . . . . . . . . . . . . . 256 27. Big-Endian mapping of structure `s'. . . . . . . . . 21 28. Little-Endian mapping of structure `s' . . . . . . . 21 Book II: 29. Instructions and byte ordering . . . . . . . . . . . . . 22 30. Assembly language program `p' . . . . . . . . . . . 22 31. Big-Endian mapping of program `p' . . . . . . . . . 22 Power ISA Virtual Environment Architec- 32. Little-Endian mapping of program `p'. . . . . . . . 22 ture.................................................... 339 33. Condition Register. . . . . . . . . . . . . . . . . . . . . . 26 34. Link Register . . . . . . . . . . . . . . . . . . . . . . . . . . 27 1. Performance effects of storage operand placement Figures xix Version 2.04 355 40. Decrementer . . . . . . . . . . . . . . . . . . . . . . . . . 482 2. [Category: Server] Performance effects of storage 41. Hypervisor Decrementer . . . . . . . . . . . . . . . . 483 operand placement, Little-Endian . . . . . . . 356 42. Processor Utilization of Resources Register . 483 3. Time Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 43. Data Address Breakpoint Register. . . . . . . . . 485 4. Alternate Time Base . . . . . . . . . . . . . . . . . . . . 380 44. Data Address Breakpoint Register Extension 485 45. External Access Register . . . . . . . . . . . . . . . . 487 46. Performance Monitor SPR encodings for Book III-S: mtspr and mfspr . . . . . . . . . . . . . . . . . . . . . 497 47. Performance Monitor Counter registers . . . . . 497 Power ISA Operating Environment Archi- 48. Monitor Mode Control Register 0 . . . . . . . . . . 498 tecture - Server Environment ............ 391 49. Monitor Mode Control Register 1 . . . . . . . . . . 500 50. 
Monitor Mode Control Register A. . . . . . . . . . 500 51. Sampled Instruction Address Register. . . . . . 501 1. Logical Partitioning Control Register . . . . . . . . 397 52. Sampled Data Address Register . . . . . . . . . . 501 2. Real Mode Offset Register . . . . . . . . . . . . . . . 399 3. Hypervisor Real Mode Offset Register . . . . . . 399 4. Logical Partition Identification Register . . . . . . 399 Book III-E: 5. Machine State Register . . . . . . . . . . . . . . . . . . 401 6. Processor Version Register. . . . . . . . . . . . . . . 407 Power ISA Operating Environment Archi- 7. Processor Identification Register. . . . . . . . . . . 408 8. Control Register . . . . . . . . . . . . . . . . . . . . . . . 408 tecture - Embedded Environment..... 507 9. Program Priority Register . . . . . . . . . . . . . . . . 408 10. Software-use SPRs . . . . . . . . . . . . . . . . . . . . 409 1. Machine State Register . . . . . . . . . . . . . . . . . . 513 11. SPRs for use by hypervisor programs. . . . . . 409 2. Processor Version Register . . . . . . . . . . . . . . . 519 12. Priority levels for or Rx,Rx,Rx . . . . . . . . . . . . 411 3. Processor Identification Register . . . . . . . . . . . 520 13. SPR encodings . . . . . . . . . . . . . . . . . . . . . . . 412 4. Special Purpose Registers. . . . . . . . . . . . . . . . 520 14. SLBE for VRMA. . . . . . . . . . . . . . . . . . . . . . . 425 5. External Process ID Load Context Register. . . 521 15. Translation of 64-bit effective address to 6. External Process ID Store Context Register . . 522 78 bit virtual address . . . . . . . . . . . . . . . . . 427 7. Embedded SPR List. . . . . . . . . . . . . . . . . . . . . 524 16. SLB Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . 428 8. Virtual Address to TLB Entry Match Process . . 546 17. SLBLL||LP Encoding . . . . . . . . . . . . . . . . . . . . 428 9. Effective-to-Real Address Translation Flow . . . 547 18. Translation of 78-bit virtual address to 60-bit real 10. 
Access Control Process . . . . . . . . . . . . . . . . . 548 address . . . . . . . . . . . . . . . . . . . . . . . . . . . 430 11. Storage control bits . . . . . . . . . . . . . . . . . . . . 552 19. Page Table Entry. . . . . . . . . . . . . . . . . . . . . . 431 12. Exception Syndrome Register 20. Format of PTELP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 567 21. SDR1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433 13. Interrupt Vector Offset Register 22. Setting the Reference and Change bits . . . . 436 Assignments . . . . . . . . . . . . . . . . . . . . . . . . 568 23. Authority Mask Register (AMR). . . . . . . . . . . 437 14. External Proxy Register . . . . . . . . . . . . . . . . . 569 24. PP bit protection states, address 15. Interrupt and Exception Types . . . . . . . . . . . . 575 translation enabled. . . . . . . . . . . . . . . . . . . 439 16. Interrupt Hierarchy . . . . . . . . . . . . . . . . . . . . . 589 25. Protection states, address translation 17. Machine State Register Initial Values . . . . . . 595 disabled . . . . . . . . . . . . . . . . . . . . . . . . . . . 439 18. TLB Initial Values . . . . . . . . . . . . . . . . . . . . . . 596 26. Storage control bits . . . . . . . . . . . . . . . . . . . . 441 19. Time Base . . . . . . . . . . . . . . . . . . . . . . . . . . . 597 27. GPR contents for slbmte . . . . . . . . . . . . . . . . 445 20. Decrementer . . . . . . . . . . . . . . . . . . . . . . . . . 599 28. GPR contents for slbmfev . . . . . . . . . . . . . . . 446 21. Decrementer . . . . . . . . . . . . . . . . . . . . . . . . . 600 29. GPR contents for slbmfee . . . . . . . . . . . . . . . 446 22. . . . . . . . .Relationships of the Timer Facilities 600 30. GPR contents for mtsr, mtsrin, mfsr, and 23. Watchdog State Machine . . . . . . . . . . . . . . . . 603 mfsrin . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447 24. 
Watchdog Timer Controls . . . . . . . . . . . . . . . 603 31. Save/Restore Registers . . . . . . . . . . . . . . . . 460 25. Data Cache Debug Tag Register High. . . . . . 630 32. Hypervisor Save/Restore Registers . . . . . . . 460 26. Data Cache Debug Tag Register Low . . . . . . 630 33. Data Address Register . . . . . . . . . . . . . . . . . 460 27. Instruction Cache Debug Data Register. . . . . 631 34. Hypervisor Data Address Register . . . . . . . . 460 28. Instruction Cache Debug Tag Register High . 631 35. Data Storage Interrupt Status Register . . . . . 460 29. Instruction Cache Debug Tag Register Low . 631 36. Hypervisor Data Storage Interrupt Status Register 30. Process ID Register (PID0­PID2) . . . . . . . . . 639 461 31. MAS0 register . . . . . . . . . . . . . . . . . . . . . . . . 640 37. MSR setting due to interrupt . . . . . . . . . . . . . 466 32. MAS1 register . . . . . . . . . . . . . . . . . . . . . . . . 641 38. Effective address of interrupt vector by 33. MAS2 register . . . . . . . . . . . . . . . . . . . . . . . . 641 interrupt type . . . . . . . . . . . . . . . . . . . . . . . 466 34. MAS3 register . . . . . . . . . . . . . . . . . . . . . . . . 642 39. Time Base . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 35. MAS4 register . . . . . . . . . . . . . . . . . . . . . . . . 642 xx Power ISATM Version 2.04 36. MAS6 register . . . . . . . . . . . . . . . . . . . . . . . . 643 37. MAS7 register . . . . . . . . . . . . . . . . . . . . . . . . 643 38. MMU Configuration Register . . . . . . . . . . . . . 645 39. TLB Configuration Register . . . . . . . . . . . . . . 645 40. MMU Control and Status Register 0 . . . . . . . 646 41. Processor States and PMLCan Bit Settings . 654 42. [User] Performance Monitor Global Control Regis- ter 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655 43. [User] Performance Monitor Local Control A Regis- ters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656 44. 
[User] Performance Monitor Local Control B Regis- ter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656 45. [User] Performance Monitor Counter Registers . . 657 46. Embedded.Peformance Monitor PMRs. . . . . 658 Book VLE: Power ISA Operating Environment Archi- tecture - Variable Length Encoding (VLE) Environ ment.................................................. 661 1. BD8 instruction format. . . . . . . . . . . . . . . . . . . 664 2. C instruction format . . . . . . . . . . . . . . . . . . . . . 664 3. IM5 instruction format . . . . . . . . . . . . . . . . . . . 664 4. OIM5 instruction format . . . . . . . . . . . . . . . . . . 664 5. IM7 instruction format . . . . . . . . . . . . . . . . . . . 664 6. R instruction format . . . . . . . . . . . . . . . . . . . . . 665 7. RR instruction format. . . . . . . . . . . . . . . . . . . . 665 8. SD4 instruction format. . . . . . . . . . . . . . . . . . . 665 9. BD15 instruction format. . . . . . . . . . . . . . . . . . 665 10. BD24 instruction format. . . . . . . . . . . . . . . . . 665 11. D8 instruction format . . . . . . . . . . . . . . . . . . . 665 12. I16A instruction format . . . . . . . . . . . . . . . . . 665 13. I16L instruction format. . . . . . . . . . . . . . . . . . 665 14. M instruction format. . . . . . . . . . . . . . . . . . . . 665 15. SC18 instruction format. . . . . . . . . . . . . . . . . 665 16. LI20 instruction format. . . . . . . . . . . . . . . . . . 665 17. Condition Register. . . . . . . . . . . . . . . . . . . . . 675 18. BO32 field encodings . . . . . . . . . . . . . . . . . . 677 19. BO16 field encodings . . . . . . . . . . . . . . . . . . 677 Appendices: Power ISA Books I-III Appendices.... 747 20. Platform Support Requirements . . . . . . . . . . 758 Index ................................................. 841 Last Page - End of Document .......... 
851

Book I:
Power ISA User Instruction Set Architecture

Chapter 1. Introduction

1.1 Overview
1.2 Instruction Mnemonics and Operands
1.3 Document Conventions
    1.3.1 Definitions
    1.3.2 Notation
    1.3.3 Reserved Fields and Reserved Values
    1.3.4 Description of Instruction Operation
    1.3.5 Categories
        1.3.5.1 Phased-In/Phased-Out
        1.3.5.2 Corequisite Category
        1.3.5.3 Category Notation
    1.3.6 Environments
1.4 Processor Overview
1.5 Computation modes
    1.5.1 Modes [Category: Server]
    1.5.2 Modes [Category: Embedded]
1.6 Instruction formats
    1.6.1 I-FORM
    1.6.2 B-FORM
    1.6.3 SC-FORM
    1.6.4 D-FORM
    1.6.5 DS-FORM
    1.6.6 DQ-FORM
    1.6.7 X-FORM
    1.6.8 XL-FORM
    1.6.9 XFX-FORM
    1.6.10 XFL-FORM
    1.6.11 XS-FORM
    1.6.12 XO-FORM
    1.6.13 A-FORM
    1.6.14 M-FORM
    1.6.15 MD-FORM
    1.6.16 MDS-FORM
    1.6.17 VA-FORM
    1.6.18 VC-FORM
    1.6.19 VX-FORM
    1.6.20 EVX-FORM
    1.6.21 EVS-FORM
    1.6.22 Instruction Fields
1.7 Classes of Instructions
    1.7.1 Defined Instruction Class
    1.7.2 Illegal Instruction Class
    1.7.3 Reserved Instruction Class
1.8 Forms of Defined Instructions
    1.8.1 Preferred Instruction Forms
    1.8.2 Invalid Instruction Forms
1.9 Exceptions
1.10 Storage Addressing
    1.10.1 Storage Operands
    1.10.2 Instruction Fetches
    1.10.3 Effective Address Calculation

1.1 Overview

This chapter describes computation modes, document conventions, a processor overview, instruction formats, storage addressing, and instruction fetching.

1.2 Instruction Mnemonics and Operands

The description of each instruction includes the mnemonic and a formatted list of operands. Some examples are the following.

    stw     RS,D(RA)
    addis   RT,RA,SI

Power ISA-compliant Assemblers will support the mnemonics and operand lists exactly as shown. They should also provide certain extended mnemonics, such as the ones described in Appendix D of Book I.

1.3 Document Conventions

1.3.1 Definitions

The following definitions are used throughout this document.

• program
  A sequence of related instructions.

• application program
  A program that uses only the instructions and resources described in Books I and II.

• quadwords, doublewords, words, halfwords, and bytes
  128 bits, 64 bits, 32 bits, 16 bits, and 8 bits, respectively.

• positive
  Means greater than zero.

• negative
  Means less than zero.

• floating-point single format (or simply single format)
  Refers to the representation of a single-precision binary floating-point value in a register or storage.

• floating-point double format (or simply double format)
  Refers to the representation of a double-precision binary floating-point value in a register or storage.

• system library program
  A component of the system software that can be called by an application program using a Branch instruction.

• system service program
  A component of the system software that can be called by an application program using a System Call instruction.

• boundedly undefined
  The results of executing a given instruction are said to be boundedly undefined if they could have been achieved by executing an arbitrary finite sequence of instructions (none of which yields boundedly undefined results) in the state the processor was in before executing the given instruction. Boundedly undefined results may include the presentation of inconsistent state to the system error handler as described in Section 1.8.1 of Book II. Boundedly undefined results for a given instruction may vary between implementations, and between different executions on the same implementation.

• "must"
  If software violates a rule that is stated using the word "must" (e.g., "this field must be set to 0"), the results are boundedly undefined unless otherwise stated.

• sequential execution model
  The model of program execution described in Section 2.2, "Instruction Execution Order" on page 25.

• Auxiliary Processor
  An implementation-specific processing unit. Previous versions of the architecture use the term Auxiliary Processing Unit (APU) to describe this extension of the architecture. Architectural support for auxiliary processors is part of the Embedded category.
• system trap handler
  A component of the system software that receives control when the conditions specified in a Trap instruction are satisfied.

• system error handler
  A component of the system software that receives control when an error occurs. The system error handler includes a component for each of the various kinds of error. These error-specific components are referred to as the system alignment error handler, the system data storage error handler, etc.

• latency
  Refers to the interval from the time an instruction begins execution until it produces a result that is available for use by a subsequent instruction.

• unavailable
  Refers to a resource that cannot be used by the program. For example, storage is unavailable if access to it is denied. See Book III.

• undefined value
  May vary between implementations, and between different executions on the same implementation, and similarly for register contents, storage contents, etc., that are specified as being undefined.

1.3.2 Notation

The following notation is used throughout the Power ISA documents.

• All numbers are decimal unless specified in some special way.
  - 0bnnnn means a number expressed in binary format.
  - 0xnnnn means a number expressed in hexadecimal format.
  Underscores may be used between digits.

• RT, RA, R1, ... refer to General Purpose Registers.

• FRT, FRA, FR1, ... refer to Floating-Point Registers.

• VRT, VRA, VR1, ... refer to Vector Registers.

• (x) means the contents of register x, where x is the name of an instruction field. For example, (RA) means the contents of register RA, and (FRA) means the contents of register FRA, where RA and FRA are instruction fields. Names such as LR and CTR denote registers, not fields, so parentheses are not used with them. Parentheses are also omitted when register x is the register into which the result of an operation is placed.

• (RA|0) means the contents of register RA if the RA field has the value 1-31, or the value 0 if the RA field is 0.

• Bits in registers, instructions, fields, and bit strings are specified as follows. In the last three items (definition of X_p etc.), if X is a field that specifies a GPR, FPR, or VR (e.g., the RS field of an instruction), the definitions apply to the register, not to the field.
  - Bits in instructions, fields, and bit strings are numbered from left to right, starting with bit 0.
  - For all registers except the Vector category, bits in registers that are less than 64 bits start with bit number 64-L, where L is the register length; for the Vector category, bits in registers that are less than 128 bits start with bit number 128-L.
  - The leftmost bit of a sequence of bits is the most significant bit of the sequence.
  - X_p means bit p of register/instruction/field/bit_string X.
  - X_{p:q} means bits p through q of register/instruction/field/bit_string X.
  - X_{p q ...} means bits p, q, ... of register/instruction/field/bit_string X.

• ¬(RA) means the one's complement of the contents of register RA.

• A period (.) as the last character of an instruction mnemonic means that the instruction records status information in certain fields of the Condition Register as a side effect of execution.

• The symbol || is used to describe the concatenation of two values. For example, 010 || 111 is the same as 010111.

• xⁿ means x raised to the nth power.

• ⁿx means the replication of x, n times (i.e., x concatenated to itself n-1 times). (n)0 and (n)1 are special cases:
  - ⁿ0 means a field of n bits with each bit equal to 0. Thus ⁵0 is equivalent to 0b00000.
  - ⁿ1 means a field of n bits with each bit equal to 1. Thus ⁵1 is equivalent to 0b11111.

• Each bit and field in instructions, and in status and control registers (e.g., XER, FPSCR) and Special Purpose Registers, is either defined or reserved. Some defined fields contain reserved values. In such cases when this document refers to the specific field, it refers only to the defined values, unless otherwise specified.

• /, //, ///, ... denotes a reserved field, in a register, instruction, field, or bit string.

• ?, ??, ???, ... denotes an implementation-dependent field in a register, instruction, field or bit string.

1.3.3 Reserved Fields and Reserved Values

Reserved fields in instructions are ignored by the processor. This is a requirement in the Server environment and is being phased into the Embedded environment.

In some cases a defined field of an instruction has certain values that are reserved. This includes cases in which the field is shown in the instruction layout as containing a particular value; in such cases all other values of the field are reserved. In general, if an instruction is coded such that a defined field contains a reserved value the instruction form is invalid; see Section 1.8.2 on page 19. The only exception to the preceding rule is that it does not apply to portions of defined fields that are specified, in the instruction description, as being treated as reserved fields.

To maximize compatibility with future architecture extensions, software must ensure that reserved fields in instructions contain zero and that defined fields of instructions do not contain reserved values.

The handling of reserved bits in System Registers (e.g., XER, FPSCR) is implementation-dependent. Unless otherwise stated, software is permitted to write any value to such a bit. A subsequent reading of the bit returns 0 if the value last written to the bit was 0 and returns an undefined value (0 or 1) otherwise.

In some cases a defined field of a System Register has certain values that are reserved. Software must not set a defined field of a System Register to a reserved value.

References elsewhere in this document to a defined field (in an instruction or System Register) that has reserved values assume the field does not contain a reserved value, unless otherwise stated or obvious from context.

Assembler Note
  Assemblers should report uses of reserved values of defined fields of instructions as errors.

Programming Note
  It is the responsibility of software to preserve bits that are now reserved in System Registers, because they may be assigned a meaning in some future version of the architecture.

  In order to accomplish this preservation in implementation-independent fashion, software should do the following.

  • Initialize each such register supplying zeros for all reserved bits.
  • Alter (defined) bit(s) in the register by reading the register, altering only the desired bit(s), and then writing the new value back to the register.

  The XER and FPSCR are partial exceptions to this recommendation. Software can alter the status bits in these registers, preserving the reserved bits, by executing instructions that have the side effect of altering the status bits. Similarly, software can alter any defined bit in the FPSCR by executing a Floating-Point Status and Control Register instruction. Using such instructions is likely to yield better performance than using the method described in the second item above.

1.3.4 Description of Instruction Operation

Instruction descriptions (including related material such as the introduction to the section describing the instructions) mention that the instruction may cause a system error handler to be invoked, under certain conditions, if and only if the system error handler may treat the case as a programming error. (An instruction may cause a system error handler to be invoked under other conditions as well; see Chapter 6 of Book III-S and Chapter 5 of Book III-E).

A formal description is given of the operation of each instruction. In addition, the operation of most instructions is described by a semiformal language at the register transfer level (RTL). This RTL uses the notation given below, in addition to the notation described in Section 1.3.2. Some of this notation is also used in the formal descriptions of instructions. RTL notation not summarized here should be self-explanatory.

The RTL descriptions cover the normal execution of the instruction, except that "standard" setting of status registers, such as the Condition Register, is not shown. ("Non-standard" setting of these registers, such as the setting of the Condition Register by the Compare instructions, is shown.) The RTL descriptions do not cover cases in which the system error handler is invoked, or for which the results are boundedly undefined.

The RTL descriptions specify the architectural transformation performed by the execution of an instruction. They do not imply any particular implementation.

Notation          Meaning
←                 Assignment
←iea              Assignment of an instruction effective address. In 32-bit mode the
                  high-order 32 bits of the 64-bit target address are set to 0.
¬                 NOT logical operator
+                 Two's complement addition
-                 Two's complement subtraction, unary minus
×                 Multiplication
×si               Signed-integer multiplication
×ui               Unsigned-integer multiplication
/                 Division
÷                 Division, with result truncated to integer
√                 Square root
=, ≠              Equals, Not Equals relations
<, ≤, >, ≥        Signed comparison relations
<u, >u            Unsigned comparison relations
?                 Unordered comparison relation
&, |              AND, OR logical operators
⊕, ≡              Exclusive OR, Equivalence logical operators ((a≡b) = (a⊕¬b))
ABS(x)            Absolute value of x
CEIL(x)           Least integer ≥ x
DCR(x)            Device Control Register x
DOUBLE(x)         Result of converting x from floating-point single format to
                  floating-point double format, using the model shown on page 111
EXTS(x)           Result of extending x on the left with sign bits
FLOOR(x)          Greatest integer ≤ x
GPR(x)            General Purpose Register x
MASK(x, y)        Mask having 1s in positions x through y (wrapping if x > y) and 0s
                  elsewhere
MEM(x, y)         Contents of a sequence of y bytes of storage. The sequence depends
                  on the byte ordering used for storage access, as follows.
                  Big-Endian byte ordering: The sequence starts with the byte at
                  address x and ends with the byte at address x+y-1.
                  Little-Endian byte ordering: The sequence starts with the byte at
                  address x+y-1 and ends with the byte at address x.
ROTL64(x, y)      Result of rotating the 64-bit value x left y positions
ROTL32(x, y)      Result of rotating the 64-bit value x||x left y positions, where x
                  is 32 bits long
SINGLE(x)         Result of converting x from floating-point double format to
                  floating-point single format, using the model shown on page 114
SPR(x)            Special Purpose Register x
TRAP              Invoke the system trap handler
characterization  Reference to the setting of status bits, in a standard way that is
                  explained in the text
undefined         An undefined value.
CIA               Current Instruction Address, which is the 64-bit address of the
                  instruction being described by a sequence of RTL. Used by relative
                  branches to set the Next Instruction Address (NIA), and by Branch
                  instructions with LK=1 to set the Link Register. Does not
                  correspond to any architected register.
NIA               Next Instruction Address, which is the 64-bit address of the next
                  instruction to be executed. For a successful branch, the next
                  instruction address is the branch target address: in RTL, this is
                  indicated by assigning a value to NIA. For other instructions that
                  cause non-sequential instruction fetching (see Book III), the RTL
                  is similar. For instructions that do not branch, and do not
                  otherwise cause instruction fetching to be non-sequential, the
                  next instruction address is CIA+4 (VLE behavior is different; see
                  Book VLE). Does not correspond to any architected register.
if... then... else...
                  Conditional execution, indenting shows range; else is optional.
do                Do loop, indenting shows range. "To" and/or "by" clauses specify
                  incrementing an iteration variable, and a "while" clause gives
                  termination conditions.
leave             Leave innermost do loop, or do loop described in leave statement.
for               For loop, indenting shows range. Clause after "for" specifies the
                  entities for which to execute the body of the loop.

The precedence rules for RTL operators are summarized in Table 1. Operators higher in the table are applied before those lower in the table. Operators at the same level in the table associate from left to right, from right to left, or not at all, as shown. (For example, - associates from left to right, so a-b-c = (a-b)-c.) Parentheses are used to override the evaluation order implied by the table or to increase clarity; parenthesized expressions are evaluated before serving as operands.

Table 1: Operator precedence

Operators                                   Associativity
subscript, function evaluation              left to right
pre-superscript (replication),
  post-superscript (exponentiation)         right to left
unary -, ¬                                  right to left
×, ÷                                        left to right
+, -                                        left to right
||                                          left to right
=, ≠, <, ≤, >, ≥, <u, >u, ?                 left to right
&, ⊕, ≡                                     left to right
|                                           left to right
: (range)                                   none
←, ←iea                                     none

1.3.5 Categories

Each facility (including registers and fields therein) and instruction is in exactly one of the categories listed in Figure 1.

A category may be defined as a dependent category. These are categories that are supported only if the category they are dependent on is also supported. Dependent categories are identified by the "." in their category name, e.g., if an implementation supports the Floating-Point.Record category, then the Floating-Point category is also supported.

An implementation that supports a facility or instruction in a given category, except for the two categories described in Section 1.3.5.1, supports all facilities and instructions in that category.
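The dependent-category rule in Section 1.3.5 can be illustrated with a small sketch. This is not part of the architecture. The helper names `required_categories` and `is_consistent` are hypothetical. The sketch assumes that a dependent category is written as the name of the category it depends on, followed by "." and a suffix, as in the Floating-Point.Record example in the text.

```python
def required_categories(category):
    """Return the category plus every category it depends on.

    Assumption (illustration only): in a name such as "Parent.Child",
    each prefix that ends just before a "." names a category that must
    also be supported.
    """
    parts = category.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def is_consistent(supported):
    """True if every dependent category in `supported` has its parent(s) too."""
    return all(
        parent in supported
        for category in supported
        for parent in required_categories(category)
    )


# Floating-Point.Record is only meaningful together with Floating-Point.
print(required_categories("Floating-Point.Record"))
print(is_consistent({"Base", "Floating-Point", "Floating-Point.Record"}))
print(is_consistent({"Base", "Floating-Point.Record"}))
```

The check is purely name-based. It does not model the Phased-In and Phased-Out categories, for which an implementation may support only a subset of the category.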
Category                               Abvr.   Notes
Base                                   B       Required for all implementations
Server                                 S       Required for Server implementations
Embedded                               E       Required for Embedded implementations
Alternate Time Base                    ATB     An additional Time Base; see Book II
Cache Specification                    CS      Specify a specific cache for some instructions; see Book II
Embedded.Cache Debug                   E.CD    Provides direct access to cache data and directory content
Embedded.Cache Initialization          E.CI    Instructions that invalidate the entire cache
Embedded.Enhanced Debug                E.ED    Embedded Enhanced Debug facility; see Book III-E
Embedded.External PID                  E.PD    Embedded External PID facility; see Book III-E
Embedded.Little-Endian                 E.LE    Embedded Little-Endian page attribute; see Book III-E
Embedded.MMU Type FSL                  E.MF    Embedded MMU example Type FSL; see Book III-E
Embedded.Performance Monitor           E.PM    Embedded performance monitor example; see Book III-E
Embedded.Processor Control             E.PC    Processor control facility; see Book III-E
Embedded Cache Locking                 ECL     Embedded Cache Locking facility; see Book III-E
External Control                       EC      External Control facility; see Book II
External Proxy                         EXP     External Proxy facility; see Book III-E
Floating-Point                         FP      Floating-Point Facilities
Floating-Point.Record                  FP.R    Floating-Point instructions with Rc=1
Legacy Move Assist                     LMV     Determine Leftmost Zero Byte instruction
Legacy Integer Multiply-Accumulate¹    LMA     Legacy Integer Multiply-Accumulate instructions
Load/Store Quadword                    LSQ     Load/Store Quadword instructions; see Book III-S
Memory Coherence                       MMC     Requirement for Memory Coherence; see Book II
Move Assist                            MA      Move Assist instructions
Server.Performance Monitor             S.PM    Performance monitor example for Servers; see Book III-S
Signal Processing Engine¹,²            SP      Facility for signal processing
SPE.Embedded Float Scalar Double       SP.FD   GPR-based Floating-Point double-precision instruction set
SPE.Embedded Float Scalar Single       SP.FS   GPR-based Floating-Point single-precision instruction set
SPE.Embedded Float Vector              SP.FV   GPR-based Floating-Point Vector instruction set
Stream                                 STM     Stream variant of dcbt instruction; see Book II
Trace                                  TRC     Trace Facility; see Book III-S
Variable Length Encoding               VLE     Variable Length Encoding facility; see Book VLE
Vector¹                                V       Vector facilities
Vector.Little-Endian                   V.LE    Little-Endian support for Vector storage operations
Wait                                   WT      wait instruction; see Book II
64-Bit                                 64      Required for 64-bit implementations; not defined for 32-bit implementations

¹ Because of overlapping opcode usage, SPE is mutually exclusive with Vector and with Legacy Integer Multiply-Accumulate, and Legacy Integer Multiply-Accumulate is mutually exclusive with Vector.
² The SPE-dependent Floating-Point categories are collectively referred to as SPE.Embedded Float_* or SP.*.

Figure 1. Category Listing

An instruction in a category that is not supported by the implementation is treated as an illegal instruction or an unimplemented instruction on that implementation (see Section 1.7.2).

For an instruction that is supported by the implementation with field values that are defined by the architecture, the field values defined as part of a category that is not supported by the implementation are treated as reserved values on that implementation (see Section 1.3.3 and Section 1.8.2).

Bits in a register that are in a category that is not supported by the implementation are treated as reserved.

1.3.5.1 Phased-In/Phased-Out

There are two special dependent categories, Phased-In and Phased-Out, defined below. These categories have the exception that an implementation may support a subset of the instructions or facilities defined as being part of the category.

Phased-In
  These are facilities and instructions that, in some future version of the architecture, will be required as part of the category they are dependent on.

Phased-Out
  These are facilities and instructions that, in some future version of the architecture, will be dropped out of the architecture. System developers should develop a migration plan to eliminate use of them in new systems.

Programming Note
  Warning: Instructions and facilities being phased out of the architecture are likely to perform poorly on future implementations. New programs should not use them.

1.3.5.2 Corequisite Category

A corequisite category is an additional category that is associated with an instruction or facility, and must be implemented if the instruction or facility is implemented.

1.3.5.3 Category Notation

Instructions and facilities are considered part of the Base category unless otherwise marked. If a section is marked with a specific category tag, all material in that section and its subsections is considered part of the category, unless otherwise marked. Overview sections may contain discussion of instructions and facilities from various categories without being explicitly marked.

An example of a category tag is: [Category: Server].

An example of a dependent category is: [Category: Server.Phased-In]

The shorthand <E> and <S> may also be used for Category: Embedded and Server respectively.

1.3.6 Environments

All implementations support one of the two defined environments, Server or Embedded. Environments refer to common subsets of instructions that are shared across many implementations. The Server environment describes implementations that support Category: Base and Server. The Embedded environment describes implementations that support Category: Base and Embedded.

1.4 Processor Overview

The processor implements the instruction set, the storage model, and other facilities defined in this document.
There are four basic classes of instructions:
• branch instructions (Chapter 2)
• fixed-point instructions (Chapter 3), and other instructions that use the fixed-point registers (Chapters 6, 7, 8, and 9)
• floating-point instructions (Chapter 4)
• vector instructions (Chapter 5)

Fixed-point instructions operate on byte, halfword, word, and doubleword operands. Floating-point instructions operate on single-precision and double-precision floating-point operands. Vector instructions operate on vectors of scalar quantities and on scalar quantities where the scalar size is byte, halfword, word, and quadword.

The Power ISA uses instructions that are four bytes long and word-aligned (VLE has different instruction characteristics; see Book VLE). It provides for byte, halfword, word, and doubleword operand fetches and stores between storage and a set of 32 General Purpose Registers (GPRs). It provides for word and doubleword operand fetches and stores between storage and a set of 32 Floating-Point Registers (FPRs). It also provides for byte, halfword, word, and quadword operand fetches and stores between storage and a set of 32 Vector Registers (VRs).

[Figure 2. Logical processing model. Blocks: Branch Processing; Fixed-Point Instructions to Fixed-Pt Processing; Floating-Point Instructions (Category: Floating-Point) to Float-Pt Processing; Vector Instructions (Category: Vector) to Vector Processing; Data to/from Storage; Instructions from Storage; Storage.]

Signed integers are represented in two's complement form.

There are no computational instructions that modify storage; instructions that reference storage may reformat the data (e.g. load halfword algebraic). To use a storage operand in a computation and then modify the same or another storage location, the contents of the storage operand must be loaded into a register, modified, and then stored back to the target location.

Figure 2 is a logical representation of instruction processing.
Figure 3 shows the registers of the Power ISA User Instruction Set Architecture.

Register               Bits     Reference
CR                     32:63    "Condition Register" on page 26
LR                     0:63     "Link Register" on page 27
CTR                    0:63     "Count Register" on page 27
GPR 0 ... GPR 31       0:63     "General Purpose Registers" on page 38
XER                    0:63     "Fixed-Point Exception Register" on page 38

Category: Embedded:
SPRG4 ... SPRG7        0:63     "Software-use SPRs" on page 39

Category: Embedded and Vector:
VRSAVE                 32:63    "VR Save Register" on page 136

Category: Floating-Point:
FPR 0 ... FPR 31       0:63     "Floating-Point Registers" on page 95
FPSCR                  32:63    "Floating-Point Status and Control Register" on page 95

Category: Vector:
VR 0 ... VR 31         0:127    "Vector Registers" on page 135
VSCR                   96:127   "Vector Status and Control Register" on page 135

Category: SPE:
Accumulator            0:63     "Accumulator" on page 202
SPEFSCR                32:63    "Signal Processing and Embedded Floating-Point Status and Control Register" on page 202

Figure 3. Power ISA user register set

1.5 Computation modes

1.5.1 Modes [Category: Server]

Processors provide two execution modes, 64-bit mode and 32-bit mode. In both of these modes, instructions that set a 64-bit register affect all 64 bits. The computational mode controls how the effective address is interpreted, how status bits are set, how the Link Register is set by Branch instructions in which LK=1, and how the Count Register is tested by Branch Conditional instructions. Nearly all instructions are available in both modes (the only exceptions are a few instructions that are defined in Book III-S). In both modes, effective address computations use all 64 bits of the relevant registers (General Purpose Registers, Link Register, Count Register, etc.) and produce a 64-bit result. However, in 32-bit mode the high-order 32 bits of the computed effective address are ignored for the purpose of addressing storage; see Section 1.10.3 for additional details.

1.5.2 Modes [Category: Embedded]

Processors may provide 32-bit mode, or both 64-bit mode and 32-bit mode. The modes differ in the following ways.
• In 64-bit mode, the processor behaves as described for 64-bit mode in the Server environment; see Section 1.5.1.
• In 32-bit mode, instructions other than SP, SP.Embedded Float Scalar Double, and SP.Embedded Float Vector use only the lower 32 bits of a GPR and produce a 32-bit result. Results written to the GPRs write only the lower 32 bits and the upper 32 bits are undefined, except for SP.Embedded Float Scalar Single instructions, which leave the upper 32 bits unchanged. SP, SP.Embedded Float Scalar Double, and SP.Embedded Float Vector instructions use all 64 bits of a GPR and produce a 64-bit result regardless of the mode.
  Instructions that set condition bits do so based on the 32-bit result computed. Effective addresses and all SPRs operate on the lower 32 bits only unless otherwise stated. The instructions in the 64-Bit category are not necessarily available; if they are not available, attempting to execute such an instruction causes the system illegal instruction error handler to be invoked.

Floating-Point and Vector instructions operate on FPRs and VRs, respectively, independent of modes.

1.6 Instruction formats

All instructions are four bytes long and word-aligned (except for VLE instructions; see Book VLE). Thus, whenever instruction addresses are presented to the processor (as in Branch instructions) the low-order two bits are ignored. Similarly, whenever the processor develops an instruction address the low-order two bits are zero.

Bits 0:5 always specify the opcode (OPCD, below). Many instructions also have an extended opcode (XO, below). The remaining bits of the instruction contain one or more fields as shown below for the different instruction formats.

The format diagrams given below show horizontally all valid combinations of instruction fields. The diagrams include instruction fields that are used only by instructions defined in Book II or in Book III.

Split Field Notation

In some cases an instruction field occupies more than one contiguous sequence of bits, or occupies one contiguous sequence of bits that are used in permuted order. Such a field is called a split field. In the format diagrams given below and in the individual instruction layouts, the name of a split field is shown in small letters, once for each of the contiguous sequences. In the RTL description of an instruction having a split field, and in certain other places where individual bits of a split field are identified, the name of the field in small letters represents the concatenation of the sequences from left to right. In all other places, the name of the field is capitalized and represents the concatenation of the sequences in some order, which need not be left to right, as described for each affected instruction.

In each diagram below, the first row gives the starting bit position of each column.

1.6.1 I-FORM

0      6      30   31
OPCD   LI     AA   LK

Figure 4. I instruction format

1.6.2 B-FORM

0      6      11     16     30   31
OPCD   BO     BI     BD     AA   LK

Figure 5. B instruction format

1.6.3 SC-FORM

0      6      11     16     20     27     30   31
OPCD   ///    ///    //     LEV    //     1    /
OPCD   ///    ///    ///    ///    //     1    /

Figure 6. SC instruction format

1.6.4 D-FORM

0      6        11     16
OPCD   RT       RA     D
OPCD   RT       RA     SI
OPCD   RS       RA     D
OPCD   RS       RA     UI
OPCD   BF / L   RA     SI
OPCD   BF / L   RA     UI
OPCD   TO       RA     SI
OPCD   FRT      RA     D
OPCD   FRS      RA     D

Figure 7. D instruction format

1.6.5 DS-FORM

0      6      11     16     30
OPCD   RT     RA     DS     XO
OPCD   RS     RA     DS     XO

Figure 8. DS instruction format

1.6.6 DQ-FORM

0      6      11     16     28
OPCD   RT     RA     DQ     PT

Figure 9. DQ instruction format

1.6.7 X-FORM

0      6        11       16       21     31
OPCD   RT       RA       ///      XO     /
OPCD   RT       RA       RB       XO     /
OPCD   RT       RA       NB       XO     /
OPCD   RT       / SR     ///      XO     /
OPCD   RT       ///      RB       XO     /
OPCD   RT       ///      ///      XO     /
OPCD   RS       RA       RB       XO     Rc
OPCD   RS       RA       RB       XO     1
OPCD   RS       RA       RB       XO     /
OPCD   RS       RA       NB       XO     /
OPCD   RS       RA       SH       XO     Rc
OPCD   RS       RA       ///      XO     Rc
OPCD   RS       RA       ///      XO     /
OPCD   RS       / SR     ///      XO     /
OPCD   RS       ///      RB       XO     /
OPCD   RS       ///      ///      XO     /
OPCD   RS       /// L    ///      XO     /
OPCD   BF / L   RA       RB       XO     /
OPCD   BF //    FRA      FRB      XO     /
OPCD   BF //    BFA //   ///      XO     /
OPCD   BF //    ///      U /      XO     Rc
OPCD   BF //    ///      ///      XO     /
OPCD   / TH     RA       RB       XO     /
OPCD   / CT     ///      ///      XO     /
OPCD   / CT     RA       RB       XO     /
OPCD   /// L    RA       RB       XO     /
OPCD   /// L    ///      RB       XO     /
OPCD   /// L    ///      ///      XO     /
OPCD   TO       RA       RB       XO     /
OPCD   FRT      RA       RB       XO     /
OPCD   FRT      ///      FRB      XO     Rc
OPCD   FRT      ///      ///      XO     Rc
OPCD   FRS      RA       RB       XO     /
OPCD   BT       ///      ///      XO     Rc
OPCD   ///      RA       RB       XO     /
OPCD   ///      ///      RB       XO     /
OPCD   ///      ///      ///      XO     /
OPCD   ///      ///      E ///    XO     /
OPCD   ???      RA       RB       XO     ?
OPCD   ???      ???      ???      XO     /
OPCD   VRT      RA       RB       XO     /
OPCD   VRS      RA       RB       XO     /
OPCD   MO       ///      ///      XO     /

Figure 10. X instruction format

1.6.8 XL-FORM

0      6        11       16         21     31
OPCD   BT       BA       BB         XO     /
OPCD   BO       BI       /// BH     XO     LK
OPCD   BF //    BFA //   ///        XO     /
OPCD   ///      ///      ///        XO     /

Figure 11. XL instruction format

1.6.9 XFX-FORM

0      6      11            21     31
OPCD   RT     spr           XO     /
OPCD   RT     tbr           XO     /
OPCD   RT     0 /// ...     XO     /
OPCD   RT     1 FXM /       XO     /
OPCD   RT     dcr           XO     /
OPCD   RT     pmrn          XO     /
OPCD   DUI    DUIS          XO     /
OPCD   RS     0 FXM /       XO     /
OPCD   RS     1 FXM /       XO     /
OPCD   RS     spr           XO     /
OPCD   RS     dcr           XO     /
OPCD   RS     pmrn          XO     /

Figure 12. XFX instruction format

1.6.10 XFL-FORM

0      6    7      15   16     21     31
OPCD   /    FLM    /    FRB    XO     Rc

Figure 13. XFL instruction format

1.6.11 XS-FORM

0      6      11     16     21     30   31
OPCD   RS     RA     sh     XO     sh   Rc

Figure 14. XS instruction format

1.6.12 XO-FORM

0      6      11     16     21   22     31
OPCD   RT     RA     RB     OE   XO     Rc
OPCD   RT     RA     RB     /    XO     Rc
OPCD   RT     RA     ///    OE   XO     Rc

Figure 15. XO instruction format

1.6.13 A-FORM

0      6      11     16     21     26     31
OPCD   FRT    FRA    FRB    FRC    XO     Rc
OPCD   FRT    FRA    FRB    ///    XO     Rc
OPCD   FRT    FRA    ///    FRC    XO     Rc
OPCD   FRT    ///    FRB    ///    XO     Rc
OPCD   RT     RA     RB     BC     XO     /

Figure 16. A instruction format

1.6.14 M-FORM

0      6      11     16     21     26     31
OPCD   RS     RA     RB     MB     ME     Rc
OPCD   RS     RA     SH     MB     ME     Rc

Figure 17. M instruction format

1.6.15 MD-FORM

0      6      11     16     21     27     30   31
OPCD   RS     RA     sh     mb     XO     sh   Rc
OPCD   RS     RA     sh     me     XO     sh   Rc

Figure 18. MD instruction format

1.6.16 MDS-FORM

0      6      11     16     21     27     31
OPCD   RS     RA     RB     mb     XO     Rc
OPCD   RS     RA     RB     me     XO     Rc

Figure 19. MDS instruction format

1.6.17 VA-FORM

0      6      11     16     21        26
OPCD   VRT    VRA    VRB    VRC       XO
OPCD   VRT    VRA    VRB    / SHB     XO

Figure 20. VA instruction format

1.6.18 VC-FORM

0      6      11     16     21   22
OPCD   VRT    VRA    VRB    Rc   XO

Figure 21. VC instruction format

1.6.19 VX-FORM

0      6      11          16     21
OPCD   VRT    VRA         VRB    XO
OPCD   VRT    ///         VRB    XO
OPCD   VRT    UIM         VRB    XO
OPCD   VRT    / UIM       VRB    XO
OPCD   VRT    // UIM      VRB    XO
OPCD   VRT    /// UIM     VRB    XO
OPCD   VRT    SIM         ///    XO
OPCD   VRT    ///                XO
OPCD   ///                VRB    XO

Figure 22. VX instruction format

1.6.20 EVX-FORM

0      6        11     16     21
OPCD   RS       RA     RB     XO
OPCD   RS       RA     UI     XO
OPCD   RT       ///    RB     XO
OPCD   RT       RA     RB     XO
OPCD   RT       RA     ///    XO
OPCD   RT       UI     RB     XO
OPCD   BF //    RA     RB     XO
OPCD   RT       RA     UI     XO
OPCD   RT       SI     ///    XO

Figure 23. EVX instruction format

1.6.21 EVS-FORM

0      6      11     16     21     29
OPCD   RT     RA     RB     XO     BFA

Figure 24. EVS instruction format

1.6.22 Instruction Fields

AA (30)
    Absolute Address bit.
    0   The immediate field represents an address relative to the current instruction address. For I-form branches the effective address of the branch target is the sum of the LI field sign-extended to 64 bits and the address of the branch instruction. For B-form branches the effective address of the branch target is the sum of the BD field sign-extended to 64 bits and the address of the branch instruction.
    1   The immediate field represents an absolute address. For I-form branches the effective address of the branch target is the LI field sign-extended to 64 bits. For B-form branches the effective address of the branch target is the BD field sign-extended to 64 bits.

BA (11:15)
    Field used to specify a bit in the CR to be used as a source.

BB (16:20)
    Field used to specify a bit in the CR to be used as a source.

BC (21:25)
    Field used to specify a bit in the CR to be used as a source.

BD (16:29)
    Immediate field used to specify a 14-bit signed two's complement branch displacement which is concatenated on the right with 0b00 and sign-extended to 64 bits.

BF (6:8)
    Field used to specify one of the CR fields or one of the FPSCR fields to be used as a target.

BFA (11:13 or 29:31)
    Field used to specify one of the CR fields or one of the FPSCR fields to be used as a source.

BH (19:20)
    Field used to specify a hint in the Branch Conditional to Link Register and Branch Conditional to Count Register instructions. The encoding is described in Section 2.4, "Branch Instructions".

BI (11:15)
    Field used to specify a bit in the CR to be tested by a Branch Conditional instruction.

BO (6:10)
    Field used to specify options for the Branch Conditional instructions. The encoding is described in Section 2.4, "Branch Instructions".

BT (6:10)
    Field used to specify a bit in the CR or in the FPSCR to be used as a target.

CT (7:10)
    Field used in X-form instructions to specify a cache target (see Section 3.2.2 of Book II).

D (16:31)
    Immediate field used to specify a 16-bit signed two's complement integer which is sign-extended to 64 bits.

DCR (11:20)
    Field used by the Move To/From Device Control Register instructions (see Book III-E).

DS (16:29)
    Immediate field used to specify a 14-bit signed two's complement integer which is concatenated on the right with 0b00 and sign-extended to 64 bits.

DUI (6:10)
    Field used by the dnh instruction (see Book II).

DUIS (11:20)
    Field used by the dnh instruction (see Book II).

E (16)
    Field used by the Write MSR External Enable instruction (see Book III-E).

FLM (7:14)
    Field mask used to identify the FPSCR fields that are to be updated by the mtfsf instruction.

FRA (11:15)
    Field used to specify an FPR to be used as a source.

FRB (16:20)
    Field used to specify an FPR to be used as a source.

FRC (21:25)
    Field used to specify an FPR to be used as a source.

FRS (6:10)
    Field used to specify an FPR to be used as a source.

FRT (6:10)
    Field used to specify an FPR to be used as a target.

FXM (12:19)
    Field mask used to identify the CR fields that are to be written by the mtcrf and mtocrf instructions, or read by the mfocrf instruction.

L (10 or 15)
    Field used to specify whether a fixed-point Compare instruction is to compare 64-bit numbers or 32-bit numbers.
    Field used by the Data Cache Block Flush instruction (see Section 3.2.2 of Book II).
    Field used by the Move To Machine State Register and TLB Invalidate Entry instructions (see Book III).

L (9:10)
    Field used by the Synchronize instruction (see Section 3.3.1 of Book II).

LEV (20:26)
    Field used by the System Call instruction.

LI (6:29)
    Immediate field used to specify a 24-bit signed two's complement integer which is concatenated on the right with 0b00 and sign-extended to 64 bits.

LK (31)
    LINK bit.
    0   Do not set the Link Register.
    1   Set the Link Register. The address of the instruction following the Branch instruction is placed into the Link Register.

MB (21:25) and ME (26:30)
    Fields used in M-form instructions to specify a 64-bit mask consisting of 1-bits from bit MB+32 through bit ME+32 inclusive and 0-bits elsewhere, as described in Section 3.3.13, "Fixed-Point Rotate and Shift Instructions" on page 77.

MB (21:26)
    Field used in MD-form and MDS-form instructions to specify the first 1-bit of a 64-bit mask, as described in Section 3.3.13, "Fixed-Point Rotate and Shift Instructions" on page 77.

ME (21:26)
    Field used in MD-form and MDS-form instructions to specify the last 1-bit of a 64-bit mask, as described in Section 3.3.13, "Fixed-Point Rotate and Shift Instructions" on page 77.

MO (6:10)
    Field used in X-form instructions to specify a subset of storage accesses.

NB (16:20)
    Field used to specify the number of bytes to move in an immediate Move Assist instruction.

OPCD (0:5)
    Primary opcode field.

OE (21)
    Field used by XO-form instructions to enable setting OV and SO in the XER.

PMRN (11:20)
    Field used to specify a Performance Monitor Register for the mfpmr and mtpmr instructions.

RA (11:15)
    Field used to specify a GPR to be used as a source or as a target.

RB (16:20)
    Field used to specify a GPR to be used as a source.

Rc (21 or 31)
    RECORD bit.
    0   Do not alter the Condition Register.
    1   Set Condition Register Field 0, Field 1, or Field 6 as described in Section 2.3.1, "Condition Register" on page 26.

RS (6:10)
    Field used to specify a GPR to be used as a source.

RT (6:10)
    Field used to specify a GPR to be used as a target.

SH (16:20, or 16:20 and 30)
    Field used to specify a shift amount.

SHB (22:25)
    Field used to specify a shift amount in bytes.

SI (16:31)
    Immediate field used to specify a 16-bit signed integer.

SIM (11:15)
    Immediate field used to specify a 5-bit signed integer.

SPR (11:20)
    Field used to specify a Special Purpose Register for the mtspr and mfspr instructions.

SR (12:15)
    Field used by the Segment Register Manipulation instructions (see Book III-S).

TBR (11:20)
    Field used by the Move From Time Base instruction (see Section 4.2.1 of Book II).

TH (7:10)
    Field used by the data stream variant of the dcbt and dcbtst instructions (see Section 3.2.2 of Book II).

TO (6:10)
    Field used to specify the conditions on which to trap. The encoding is described in Section 3.3.10, "Fixed-Point Trap Instructions" on page 69.

U (16:19)
    Immediate field used as the data to be placed into a field in the FPSCR.

UI (11:15, 16:20, or 16:31)
    Immediate field used to specify an unsigned integer.

UIM (11:15, 12:15, 13:15, 14:15)
    Immediate field used to specify an unsigned integer.

VRA (11:15)
    Field used to specify a VR to be used as a source.

VRB (16:20)
    Field used to specify a VR to be used as a source.

VRC (21:25)
    Field used to specify a VR to be used as a source.

VRS (6:10)
    Field used to specify a VR to be used as a source.

VRT (6:10)
    Field used to specify a VR to be used as a target.

XO (21:28, 21:29, 21:30, 21:31, 22:30, 22:31, 26:30, 26:31, 27:29, 27:30, or 30:31)
    Extended opcode field.

1.7 Classes of Instructions

An instruction falls into exactly one of the following three classes:

    Defined
    Illegal
    Reserved

The class is determined by examining the opcode, and the extended opcode if any. If the opcode, or combination of opcode and extended opcode, is not that of a defined instruction or of a reserved instruction, the instruction is illegal.

1.7.1 Defined Instruction Class

This class of instructions contains all the instructions defined in this document.

A defined instruction can have preferred and/or invalid forms, as described in Section 1.8.1, "Preferred Instruction Forms" and Section 1.8.2, "Invalid Instruction Forms". Instructions that are part of a category that is not supported are treated as illegal instructions.

1.7.2 Illegal Instruction Class

This class of instructions contains the set of instructions described in Appendix D of Book Appendices. Illegal instructions are available for future extensions of the Power ISA; that is, some future version of the Power ISA may define any of these instructions to perform new functions.

Any attempt to execute an illegal instruction will cause the system illegal instruction error handler to be invoked and will have no other effect.

An instruction consisting entirely of binary 0s is guaranteed always to be an illegal instruction. This increases the probability that an attempt to execute data or uninitialized storage will result in the invocation of the system illegal instruction error handler.
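The field definitions in Section 1.6.22 can be exercised with a short C sketch. It is an illustration only, not part of the architecture: the helper names insn_field, sign_extend, iform_target, and bform_target are hypothetical, and the code assumes the instruction word has already been assembled into a uint32_t whose most significant bit is instruction bit 0. It extracts a field by its bit range and computes an I-form or B-form branch target from LI or BD (concatenated on the right with 0b00, sign-extended to 64 bits) and the AA bit.

```c
#include <assert.h>
#include <stdint.h>

/* Extract instruction bits first..last (inclusive). Bit 0 is taken to be
   the most significant bit of the 32-bit instruction word. */
static uint32_t insn_field(uint32_t insn, unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);
    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (insn >> (31 - last)) & mask;
}

/* Sign-extend the low `width` bits of `value` to 64 bits. */
static int64_t sign_extend(uint64_t value, unsigned width)
{
    assert(width >= 1 && width <= 64);
    uint64_t sign = 1ull << (width - 1);
    if (width < 64)
        value &= (1ull << width) - 1;
    return (int64_t)((value ^ sign) - sign);
}

/* I-form: LI (6:29) || 0b00, sign-extended; AA (30) selects absolute
   (AA=1) or relative to the branch's own address `cia` (AA=0). */
static uint64_t iform_target(uint32_t insn, uint64_t cia)
{
    uint64_t li = (uint64_t)insn_field(insn, 6, 29) << 2;
    uint64_t disp = (uint64_t)sign_extend(li, 26);
    return insn_field(insn, 30, 30) ? disp : cia + disp;
}

/* B-form: BD (16:29) || 0b00, sign-extended; AA as above. */
static uint64_t bform_target(uint32_t insn, uint64_t cia)
{
    uint64_t bd = (uint64_t)insn_field(insn, 16, 29) << 2;
    uint64_t disp = (uint64_t)sign_extend(bd, 16);
    return insn_field(insn, 30, 30) ? disp : cia + disp;
}
```

As a usage example under the same assumptions, the word 0x4BFFFFEC has OPCD = 18 and an LI field that encodes a displacement of -20, so iform_target(0x4BFFFFEC, 0x14) yields 0. The shift by two before sign extension mirrors the "concatenated on the right with 0b00" wording; sign-extending first and shifting afterwards would give the same value.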
1.7.3 Reserved Instruction Class

This class of instructions contains the set of instructions described in Appendix E of Book Appendices.

Reserved instructions are allocated to specific purposes that are outside the scope of the Power ISA.

Any attempt to execute a reserved instruction will:
• perform the actions described by the implementation if the instruction is implemented; or
• cause the system illegal instruction error handler to be invoked if the instruction is not implemented.

1.8 Forms of Defined Instructions

1.8.1 Preferred Instruction Forms

Some of the defined instructions have preferred forms. For such an instruction, the preferred form will execute in an efficient manner, but any other form may take significantly longer to execute than the preferred form.

Instructions having preferred forms are:
• the Condition Register Logical instructions
• the Load/Store Multiple instructions
• the Load/Store String instructions
• the Or Immediate instruction (preferred form of no-op)
• the Move To Condition Register Fields instruction

1.8.2 Invalid Instruction Forms

Some of the defined instructions can be coded in a form that is invalid. An instruction form is invalid if one or more fields of the instruction, excluding the opcode field(s), are coded incorrectly in a manner that can be deduced by examining only the instruction encoding.

In general, any attempt to execute an invalid form of an instruction will either cause the system illegal instruction error handler to be invoked or yield boundedly undefined results. Exceptions to this rule are stated in the instruction descriptions.

Some instruction forms are invalid because the instruction contains a reserved value in a defined field (see Section 1.3.3 on page 5); these invalid forms are not discussed further. All other invalid forms are identified in the instruction descriptions.

References to instructions elsewhere in this document assume the instruction form is not invalid, unless otherwise stated or obvious from context.

Assembler Note
    Assemblers should report uses of invalid instruction forms as errors.

1.9 Exceptions

There are two kinds of exception, those caused directly by the execution of an instruction and those caused by an asynchronous event. In either case, the exception may cause one of several components of the system software to be invoked.

The exceptions that can be caused directly by the execution of an instruction include the following:
• an attempt to execute an illegal instruction, or an attempt by an application program to execute a "privileged" instruction (see Book III) (system illegal instruction error handler or system privileged instruction error handler)
• the execution of a defined instruction using an invalid form (system illegal instruction error handler or system privileged instruction error handler)
• an attempt to execute an instruction that is not provided by the implementation (system illegal instruction error handler)
• an attempt to access a storage location that is unavailable (system instruction storage error handler or system data storage error handler)
• an attempt to access storage with an effective address alignment that is invalid for the instruction (system alignment error handler)
• the execution of a System Call instruction (system service program)
• the execution of a Trap instruction that traps (system trap handler)
• the execution of a floating-point instruction that causes a floating-point enabled exception to exist (system floating-point enabled exception error handler)
• the execution of an auxiliary processor instruction that causes an auxiliary processor enabled exception to exist (system auxiliary processor enabled exception error handler)

The exceptions that can be caused by an asynchronous event are described in Book III.

The invocation of the system error handler is precise, except that the invocation of the auxiliary processor enabled exception error handler may be imprecise, and if one of the imprecise modes for invoking the system floating-point enabled exception error handler is in effect (see page 103), then the invocation of the system floating-point enabled exception error handler may also be imprecise. When the system error handler is invoked imprecisely, the excepting instruction does not appear to complete before the next instruction starts (because one of the effects of the excepting instruction, namely the invocation of the system error handler, has not yet occurred).

Additional information about exception handling can be found in Book III.

1.10 Storage Addressing

A program references storage using the effective address computed by the processor when it executes a Storage Access or Branch instruction (or certain other instructions described in Book II and Book III), or when it fetches the next sequential instruction.

Bytes in storage are numbered consecutively starting with 0. Each number is the address of the corresponding byte.

The byte ordering (Big-Endian or Little-Endian) for a storage access is specified by the operating system.
In the Embedded environment this ordering is a page attribute (see Book II) and is specified independently for each virtual page, while in the Server environment it is a mode (see Book III-S) and applies to all storage.

1.10.1 Storage Operands

Storage operands may be bytes, halfwords, words, doublewords, or quadwords (see Book III), or, for the Load/Store Multiple and Move Assist instructions, a sequence of bytes or words. The address of a storage operand is the address of its first byte (i.e., of its lowest-numbered byte). Operand length is implicit for each instruction.

The operand of a single-register Storage Access instruction, or of a quadword Load or Store instruction, has a "natural" alignment boundary equal to the operand length. In other words, the "natural" address of an operand is an integral multiple of the operand length. A storage operand is said to be aligned if it is aligned at its natural boundary; otherwise it is said to be unaligned. See the following table.

Operand       Length      Addr60:63 if aligned
Byte          8 bits      xxxx
Halfword      2 bytes     xxx0
Word          4 bytes     xx00
Doubleword    8 bytes     x000
Quadword      16 bytes    0000

Note: An "x" in an address bit position indicates that the bit can be 0 or 1 independent of the contents of other bits in the address.

The concept of alignment is also applied more generally, to any datum in storage. For example, a 12-byte datum in storage is said to be word-aligned if its address is an integral multiple of 4.

Some instructions require their storage operands to have certain alignments. In addition, alignment may affect performance. For single-register Storage Access instructions, and for quadword Load and Store instructions, the best performance is obtained when storage operands are aligned. Additional effects of data placement on performance are described in Chapter 2 of Book II.

When a storage operand of length N bytes starting at effective address EA is copied between storage and a register that is R bytes long (i.e., the register contains bytes numbered from 0, most significant, through R-1, least significant), the bytes of the operand are placed into the register or into storage in a manner that depends on the byte ordering for the storage access as shown in Figure 25, unless otherwise specified in the instruction description.

Big-Endian Byte Ordering
    Load:   for i=0 to N-1:  RT_{(R-N)+i} ← MEM(EA+i,1)
    Store:  for i=0 to N-1:  MEM(EA+i,1) ← (RS)_{(R-N)+i}

Little-Endian Byte Ordering
    Load:   for i=0 to N-1:  RT_{(R-1)-i} ← MEM(EA+i,1)
    Store:  for i=0 to N-1:  MEM(EA+i,1) ← (RS)_{(R-1)-i}

Notes:
1. In this table, subscripts refer to bytes in a register rather than to bits as defined in Section 1.3.2.
2. This table does not apply to the lvebx, lvehx, lvewx, stvebx, stvehx, and stvewx instructions.

Figure 25. Storage operands and byte ordering

Figure 26 shows an example of a C language structure s containing an assortment of scalars and one character string. The value assumed to be in each structure element is shown in hex in the C comments; these values are used below to show how the bytes making up each structure element are mapped into storage. It is assumed that structure s is compiled for 32-bit mode or for a 32-bit implementation. (This affects the length of the pointer to c.)

C structure mapping rules permit the use of padding (skipped bytes) in order to align the scalars on desirable boundaries. Figures 27 and 28 show each scalar aligned at its natural boundary. This alignment introduces padding of four bytes between a and b, one byte between d and e, and two bytes between e and f. The same amount of padding is present for both Big-Endian and Little-Endian mappings.

The Big-Endian mapping of structure s is shown in Figure 27. Addresses are shown in hex at the left of each doubleword, and in small figures below each byte. The contents of each byte, as indicated in the C example in Figure 26, are shown in hex (as characters for the elements of the string).

The Little-Endian mapping of structure s is shown in Figure 28. Doublewords are shown laid out from right to left, which is the common way of showing storage maps for processors that implement only Little-Endian byte ordering.

    struct {
        int    a;      /* 0x1112_1314            word */
        double b;      /* 0x2122_2324_2526_2728  doubleword */
        char * c;      /* 0x3132_3334            word */
        char   d[7];   /* 'A', 'B', 'C', 'D', 'E', 'F', 'G'  array of bytes */
        short  e;      /* 0x5152                 halfword */
        int    f;      /* 0x6162_6364            word */
    } s;

Figure 26. C structure 's', showing values of elements

    Addr   +0    +1    +2    +3    +4    +5    +6    +7
    00     11    12    13    14    pad   pad   pad   pad
    08     21    22    23    24    25    26    27    28
    10     31    32    33    34    'A'   'B'   'C'   'D'
    18     'E'   'F'   'G'   pad   51    52    pad   pad
    20     61    62    63    64

Figure 27. Big-Endian mapping of structure 's'

    +7    +6    +5    +4    +3    +2    +1    +0    Addr
    pad   pad   pad   pad   11    12    13    14    00
    21    22    23    24    25    26    27    28    08
    'D'   'C'   'B'   'A'   31    32    33    34    10
    pad   pad   51    52    pad   'G'   'F'   'E'   18
                            61    62    63    64    20

Figure 28. Little-Endian mapping of structure 's'

1.10.2 Instruction Fetches

Instructions are always four bytes long and word-aligned (except for VLE instructions; see Book VLE).

When an instruction starting at effective address EA is fetched from storage, the relative order of the bytes within the instruction depends on the byte ordering for the storage access as shown in Figure 29.

Big-Endian Byte Ordering
    for i=0 to 3:  inst_{i} ← MEM(EA+i,1)

Little-Endian Byte Ordering
    for i=0 to 3:  inst_{3-i} ← MEM(EA+i,1)

Note: In this table, subscripts refer to bytes of the instruction rather than to bits as defined in Section 1.3.2.

Figure 29. Instructions and byte ordering

Figure 30 shows an example of a small assembly language program p.

    loop: cmplwi r5,0
          beq    done
          lwzux  r4,r5,r6
          add    r7,r7,r4
          subi   r5,r5,4
          b      loop
    done: stw    r7,total

Figure 30. Assembly language program 'p'

The Big-Endian mapping of program p is shown in Figure 31 (assuming the program starts at address 0).

    Addr   Bytes +0 to +3            Bytes +4 to +7
    00     loop: cmplwi r5,0         beq done
    08     lwzux r4,r5,r6            add r7,r7,r4
    10     subi r5,r5,4              b loop
    18     done: stw r7,total

Figure 31. Big-Endian mapping of program 'p'

The Little-Endian mapping of program p is shown in Figure 32.

    Bytes +7 to +4            Bytes +3 to +0            Addr
    beq done                  loop: cmplwi r5,0         00
    add r7,r7,r4              lwzux r4,r5,r6            08
    b loop                    subi r5,r5,4              10
                              done: stw r7,total        18

Figure 32. Little-Endian mapping of program 'p'

Programming Note

The terms Big-Endian and Little-Endian come from Part I, Chapter 4, of Jonathan Swift's Gulliver's Travels. Here is the complete passage, from the edition printed in 1734 by George Faulkner in Dublin.

    ... our Histories of six Thousand Moons make no Mention of any other Regions, than the two great Empires of Lilliput and Blefuscu. Which two mighty Powers have, as I was going to tell you, been engaged in a most obstinate War for six and thirty Moons past. It began upon the following Occasion. It is allowed on all Hands, that the primitive Way of breaking Eggs before we eat them, was upon the larger End: But his present Majesty's Grand-father, while he was a Boy, going to eat an Egg, and breaking it according to the ancient Practice, happened to cut one of his Fingers. Whereupon the Emperor his Father, published an Edict, commanding all his Subjects, upon great Penalties, to break the smaller End of their Eggs. The People so highly resented this Law, that our Histories tell us, there have been six Rebellions raised on that Account; wherein one Emperor lost his Life, and another his Crown. These civil Commotions were constantly fomented by the Monarchs of Blefuscu; and when they were quelled, the Exiles always fled for Refuge to that Empire. It is computed that eleven Thousand Persons have, at several Times, suffered Death, rather than submit to break their Eggs at the smaller End. Many hundred large Volumes have been published upon this Controversy: But the Books of the Big-Endians have been long forbidden, and the whole Party rendered incapable by Law of holding Employments. During the Course of these Troubles, the Emperors of Blefuscu did frequently expostulate by their Ambassadors, accusing us of making a Schism in Religion, by offending against a fundamental Doctrine of our great Prophet Lustrog, in the fifty-fourth Chapter of the Brundrecal, (which is their Alcoran.) This, however, is thought to be a mere Strain upon the text: For the Words are these; That all true Believers shall break their Eggs at the convenient End: and which is the convenient End, seems, in my humble Opinion, to be left to every Man's Conscience, or at least in the Power of the chief Magistrate to determine. Now the Big-Endian Exiles have found so much Credit in the Emperor of Blefuscu's Court; and so much private Assistance and Encouragement from their Party here at home, that a bloody War has been carried on between the two Empires for six and thirty Moons with various Success; during which Time we have lost Forty Capital Ships, and a much greater Number of smaller Vessels, together with thirty thousand of our best Seamen and Soldiers; and the Damage received by the Enemy is reckoned to be somewhat greater than ours. However, they have now equipped a numerous Fleet, and are just preparing to make a Descent upon us: and his Imperial Majesty, placing great Confidence in your Valour and Strength, hath commanded me to lay this Account of his Affairs before you.
But the Books of the Big-Endians have been long 1.10.3 Effective Address Calcula- In 64-bit mode, the entire 64-bit result comprises the 64-bit effective address. The effective address arith- tion metic wraps around from the maximum address, 264 - 1, to address 0, except that if the current instruc- An effective address is computed by the processor tion is at effective address 264 - 4 the effective address when executing a Storage Access or Branch instruction of the next sequential instruction is undefined. (or certain other instructions described in Book II, Book III, and Book VLE) when fetching the next sequential In 32-bit mode, the low-order 32 bits of the 64-bit result, instruction, or when invoking a system error handler. preceded by 32 0 bits, comprise the 64-bit effective The following provides an overview of this process. address for the purpose of addressing storage. When More detail is provided in the individual instruction an effective address is placed into a register by an descriptions. instruction or event, the value placed into the high-order 32 bits of the register differs between the Server envi- Effective address calculations, for both data and ronment and the Embedded environment. instruction accesses, use 64-bit two's complement 1 Server environment: addition. All 64 bits of each address component partici- - Load with Update and Store with Update pate in the calculation regardless of mode (32-bit or instructions set the high-order 32 bits of regis- 64-bit). In this computation one operand is an address ter RA to the high-order 32 bits of the 64-bit (which is by definition an unsigned number) and the result. second is a signed offset. Carries out of the most signif- - In all other cases (e.g., the Link Register when icant bit are ignored. set by Branch instructions having LK=1, Spe- cial Purpose Registers when set to an effec- Chapter 1. 
Introduction 23 Version 2.04 tive address by invocation of a system error this address component is the effective address of handler) the high-order 32 bits of the register the next instruction. are set to 0s except as described in the last 1 With B-form Branch instructions, the 14-bit BD field sentence of this paragraph. is concatenated on the right with 0b00 and 1 Embedded environment: sign-extended to form a 64-bit address compo- The high-order 32 bits of the register are set to an nent. If AA=0, this address component is added to undefined value. the address of the Branch instruction to form the As used to address storage, the effective address arith- effective address of the next instruction. If AA=1, metic appears to wrap around from the maximum this address component is the effective address of address, 232 - 1, to address 0, except that if the current the next instruction. instruction is at effective address 232 - 4 the effective address of the next sequential instruction is undefined. 1 With XL-form Branch instructions, bits 0:61 of the Link Register or the Count Register are concate- The 64-bit current instruction address is not affected by nated on the right with 0b00 to form the effective a change from 32-bit mode to 64-bit mode, but is address of the next instruction. affected by a change from 64-bit mode to 32-bit mode. In the latter case, the high-order 32 bits are set to 0. 1 With sequential instruction fetching, the value 4 is The same is true for the 64-bit next instruction address, added to the address of the current instruction to except as described in the last item of the list below. form the effective address of the next instruction, except that if the current instruction is at the maxi- RA is a field in the instruction which specifies an mum instruction effective address for the mode address component in the computation of an effective (264 - 4 in 64-bit mode, 232 - 4 in 32-bit mode) the address. 
A zero in the RA field indicates the absence effective address of the next sequential instruction of the corresponding address component. A value of is undefined. (There is one other exception to this zero is substituted for the absent component of the rule; this exception involves changing between effective address computation. This substitution is 32-bit mode and 64-bit mode and is described in shown in the instruction descriptions as (RA|0). Section 5.3.2 of Book III-S and Section 4.3.2 of Effective addresses are computed as follows. In the Book III-E.) descriptions below, it should be understood that "the If the size of the operand of a storage access instruc- contents of a GPR" refers to the entire 64-bit contents, tion is more than one byte, the effective address for independent of mode, but that in 32-bit mode only bits each byte after the first is computed by adding 1 to the 32:63 of the 64-bit result of the computation are used to effective address of the preceding byte. address storage. 1 With X-form instructions, in computing the effective address of a data element, the contents of the GPR designated by RB (or the value zero for lswi and stswi) are added to the contents of the GPR designated by RA or to zero if RA=0. 1 With D-form instructions, the 16-bit D field is sign-extended to form a 64-bit address compo- nent. In computing the effective address of a data element, this address component is added to the contents of the GPR designated by RA or to zero if RA=0. 1 With DS-form instructions, the 14-bit DS field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address compo- nent. In computing the effective address of a data element, this address component is added to the contents of the GPR designated by RA or to zero if RA=0. 1 With I-form Branch instructions, the 24-bit LI field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address compo- nent. 
If AA=0, this address component is added to the address of the Branch instruction to form the effective address of the next instruction. If AA=1, 24 Power ISATM -- Book I Version 2.04 Chapter 2. Branch Processor 2.1 Branch Processor Overview . . . . . . 25 2.5 Condition Register Instructions . . . . 33 2.2 Instruction Execution Order . . . . . . 25 2.5.1 Condition Register Logical Instruc- 2.3 Branch Processor Registers . . . . . . 26 tions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.3.1 Condition Register . . . . . . . . . . . . 26 2.5.2 Condition Register Field Instruction . 2.3.2 Link Register . . . . . . . . . . . . . . . . 27 34 2.3.3 Count Register. . . . . . . . . . . . . . . 27 2.6 System Call Instruction . . . . . . . . . 35 2.4 Branch Instructions . . . . . . . . . . . . . 27 2.1 Branch Processor Overview that causes the exception need not complete before the next instruction begins execution, with This chapter describes the registers and instructions respect to setting exception bits and (if the excep- that make up the Branch Processor facility. tion is enabled) invoking the system error handler. 1 A Store instruction modifies one or more bytes in an area of storage that contains instructions that 2.2 Instruction Execution Order will subsequently be executed. Before an instruc- tion in that area of storage is executed, software In general, instructions appear to execute sequentially, synchronization is required to ensure that the in the order in which they appear in storage. The instructions executed are consistent with the exceptions to this rule are listed below. results produced by the Store instruction. 1 Branch instructions for which the branch is taken cause execution to continue at the target address Programming Note specified by the Branch instruction. 
This software synchronization will generally be 1 Trap instructions for which the trap conditions are provided by system library programs (see satisfied, and System Call instructions, cause the Section 1.8 of Book II). Application programs appropriate system handler to be invoked. should call the appropriate system library pro- gram before attempting to execute modified 1 Exceptions can cause the system error handler to instructions. be invoked, as described in Section 1.9, "Excep- tions" on page 19. 1 Returning from a system service program, system trap handler, or system error handler causes exe- cution to continue at a specified address. The model of program execution in which the processor appears to execute one instruction at a time, complet- ing each instruction before beginning to execute the next instruction is called the "sequential execution model". In general, the processor obeys the sequential execution model. For the instructions and facilities defined in this Book, the only exceptions to this rule are the following. 1 A floating-point exception occurs when the proces- sor is running in one of the Imprecise floating-point exception modes (see Section 4.4). The instruction Chapter 2. Branch Processor 25 Version 2.04 2.3 Branch Processor Registers Bit Description 0 Negative (LT) The result is negative. 2.3.1 Condition Register 1 Positive (GT) The Condition Register (CR) is a 32-bit register which The result is positive. reflects the result of certain operations, and provides a 2 Zero (EQ) mechanism for testing (and branching). The result is zero. CR 3 Summary Overflow (SO) 32 63 This is a copy of the contents of XERSO at the completion of the instruction. Figure 33. Condition Register The stwcx. and stdcx. instructions (see Section 3.3.2, The bits in the Condition Register are grouped into "Load and Reserve and Store Conditional Instructions", eight 4-bit fields, named CR Field 0 (CR0), ..., CR Field in Book II) also set CR Field 0. 
7 (CR7), which are set in one of the following ways. For all floating-point instructions in which Rc=1, CR 1 Specified fields of the CR can be set by a move to Field 1 (bits 36:39 of the Condition Register) is set to the CR from a GPR (mtcrf, mtocrf). the Floating-Point exception status, copied from bits 0:3 1 A specified field of the CR can be set by a move to of the Floating-Point Status and Control Register. This the CR from another CR field (mcrf), from occurs regardless of whether any exceptions are XER32:35 (mcrxr), or from the FPSCR (mcrfs). enabled, and regardless of whether the writing of the 1 CR Field 0 can be set as the implicit result of a result is suppressed (see Section 4.4, "Floating-Point fixed-point instruction. Exceptions" on page 102). These bits are interpreted 1 CR Field 1 can be set as the implicit result of a as follows. floating-point instruction. 1 CR Field 6 can be set as the implicit result of a Bit Description vector instruction. 0 Floating-Point Exception Summary (FX) 1 A specified CR field can be set as the result of a This is a copy of the contents of FPSCRFX at Compare instruction. the completion of the instruction. Instructions are provided to perform logical operations 1 Floating-Point Enabled Exception Sum- on individual CR bits and to test individual CR bits. mary (FEX) This is a copy of the contents of FPSCRFEX at For all fixed-point instructions in which Rc=1, and for the completion of the instruction. addic., andi., and andis., the first three bits of CR Field 0 (bits 32:34 of the Condition Register) are set by 2 Floating-Point Invalid Operation Exception signed comparison of the result to zero, and the fourth Summary (VX) bit of CR Field 0 (bit 35 of the Condition Register) is This is a copy of the contents of FPSCRVX at copied from the SO field of the XER. "Result" here the completion of the instruction. 
refers to the entire 64-bit value placed into the target 3 Floating-Point Overflow Exception (OX) register in 64-bit mode, and to bits 32:63 of the 64-bit This is a copy of the contents of FPSCROX at value placed into the target register in 32-bit mode. the completion of the instruction. if (64-bit mode) For Compare instructions, a specified CR field is set to then M 1 0 reflect the result of the comparison. The bits of the else M 1 32 specified CR field are interpreted as follows. A com- if (target_register)M:63 < 0 then c 1 0b100 plete description of how the bits are set is given in the else if (target_register)M:63 > 0 then c 1 0b010 instruction descriptions in Section 3.3.9, "Fixed-Point else c 1 0b001 CR0 1 c || XERSO Compare Instructions" on page 67, Section 4.6.7, "Floating-Point Compare Instructions" on page 129, If any portion of the result is undefined, then the value and Section 6.3.9, "SPE Instruction Set" on page 208. placed into the first three bits of CR Field 0 is unde- fined. Bit Description The bits of CR Field 0 are interpreted as follows. 0 Less Than, Floating-Point Less Than (LT, FL) For fixed-point Compare instructions, (RA) < SI or (RB) (signed comparison) or (RA) The sequence of instruction execution can be changed SI or (RB) (signed comparison) or (RA) >u UI by the Branch instructions. Because all instructions are or (RB) (unsigned comparison). For floating- on word boundaries, bits 62 and 63 of the generated point Compare instructions, (FRA) > (FRB). branch target address are ignored by the processor in 2 Equal, Floating-Point Equal (EQ, FE) performing the branch. For fixed-point Compare instructions, (RA) = The Branch instructions compute the effective address SI, UI, or (RB). For floating-point Compare (EA) of the target in one of the following four ways, as instructions, (FRA) = (FRB). described in Section 1.10.3, "Effective Address Calcu- 3 Summary Overflow, Floating-Point Unor- lation" on page 23. dered (SO,FU) 1. 
Adding a displacement to the address of the For fixed-point Compare instructions, this is a Branch instruction (Branch or Branch Conditional copy of the contents of XERSO at the comple- with AA=0). tion of the instruction. For floating-point Com- pare instructions, one or both of (FRA) and 2. Specifying an absolute address (Branch or Branch (FRB) is a NaN. Conditional with AA=1). 3. Using the address contained in the Link Register 2.3.2 Link Register (Branch Conditional to Link Register). The Link Register (LR) is a 64-bit register. It can be 4. Using the address contained in the Count Register used to provide the branch target address for the (Branch Conditional to Count Register). Branch Conditional to Link Register instruction, and it In all four cases, in 32-bit mode the final step in the holds the return address after Branch instructions for address computation is setting the high-order 32 bits of which LK=1. the target address to 0. LR For the first two methods, the target addresses can be computed sufficiently ahead of the Branch instruction 0 63 that instructions can be prefetched along the target Figure 34. Link Register path. For the third and fourth methods, prefetching instructions along the target path is also possible pro- vided the Link Register or the Count Register is loaded 2.3.3 Count Register sufficiently ahead of the Branch instruction. The Count Register (CTR) is a 64-bit register. It can be Branching can be conditional or unconditional, and the used to hold a loop count that can be decremented dur- return address can optionally be provided. If the return ing execution of Branch instructions that contain an address is to be provided (LK=1), the effective address appropriately coded BO field. If the value in the Count of the instruction following the Branch instruction is Register is 0 before being decremented, it is -1 after- placed into the Link Register after the branch target ward. 
The Count Register can also be used to provide address has been computed; this is done regardless of the branch target address for the Branch Conditional to whether the branch is taken. Count Register instruction. For Branch Conditional instructions, the BO field speci- CTR fies the conditions under which the branch is taken, as shown in Figure 36. In the figure, M=0 in 64-bit mode 0 63 and M=32 in 32-bit mode. Figure 35. Count Register Chapter 2. Branch Processor 27 Version 2.04 provides a hint about the use of the instruction, as shown in Figure 38. BO Description BH Hint 0000z Decrement the CTR, then branch if the dec- 00 bclr[l]: The instruction is a subroutine remented CTRM:630 and CRBI=0 return 0001z Decrement the CTR, then branch if the dec- bcctr[l]: The instruction is not a subroutine remented CTRM:63=0 and CRBI=0 return; the target address is likely to 001at Branch if CRBI=0 be the same as the target address 0100z Decrement the CTR, then branch if the dec- used the preceding time the branch remented CTRM:630 and CRBI=1 was taken 0101z Decrement the CTR, then branch if the dec- 01 bclr[l]: The instruction is not a subroutine remented CTRM:63=0 and CRBI=1 return; the target address is likely to be the same as the target address 011at Branch if CRBI=1 used the preceding time the branch 1a00t Decrement the CTR, then branch if the dec- was taken remented CTRM:630 bcctr[l]: Reserved 1a01t Decrement the CTR, then branch if the dec- remented CTRM:63=0 10 Reserved 1z1zz Branch always 11 bclr[l] and bcctr[l]: The target address is not predictable Notes: 1. "z" denotes a bit that is ignored. Figure 38. BH field encodings 2. The "a" and "t" bits are used as described below. Programming Note Figure 36. 
BO field encodings The hint provided by the BH field is independent of The "a" and "t" bits of the BO field can be used by soft- the hint provided by the "at" bits (e.g., the BH field ware to provide a hint about whether the branch is likely provides no indication of whether the branch is to be taken or is likely not to be taken, as shown in likely to be taken). Figure 37. at Hint Extended mnemonics for branches 00 No hint is given Many extended mnemonics are provided so that 01 Reserved Branch Conditional instructions can be coded with por- tions of the BO and BI fields as part of the mnemonic 10 The branch is very likely not to be taken rather than as part of a numeric operand. Some of 11 The branch is very likely to be taken these are shown as examples with the Branch instruc- Figure 37. "at" bit encodings tions. See Appendix D for additional extended mne- monics. Programming Note Programming Note Many implementations have dynamic mechanisms for predicting whether a branch will be taken. The hints provided by the "at" bits and by the BH Because the dynamic prediction is likely to be very field do not affect the results of executing the accurate, and is likely to be overridden by any hint instruction. provided by the "at" bits, the "at" bits should be set The "z" bits should be set to 0, because they may to 0b00 unless the static prediction implied by be assigned a meaning in some future version of at=0b10 or at=0b11 is highly likely to be correct. the architecture. For Branch Conditional to Link Register and Branch Conditional to Count Register instructions, the BH field 28 Power ISATM -- Book I Version 2.04 Programming Note Many implementations have dynamic mechanisms for 1 Direct subroutine linkage: predicting the target addresses of bclr[l] and bcctr[l] Here A calls B and B returns to A. The two instructions. These mechanisms may cache return branches should be as follows. 
addresses (i.e., Link Register values set by Branch - A calls B: use a bl or bcl instruction (LK=1). instructions for which LK=1 and for which the branch - B returns to A: use a bclr instruction (LK=0) was taken) and recently used branch target addresses. (the return address is in, or can be restored to, To obtain the best performance across the widest range the Link Register). of implementations, the programmer should obey the 1 Indirect subroutine linkage: following rules. Here A calls Glue, Glue calls B, and B returns to A rather than to Glue. (Such a calling sequence is 1 Use Branch instructions for which LK=1 only as common in linkage code used when the subroutine subroutine calls (including function calls, etc.). that the programmer wants to call, here B, is in a 1 Pair each subroutine call (i.e., each Branch instruc- different module from the caller; the Binder inserts tion for which LK=1 and the branch is taken) with a "glue" code to mediate the branch.) The three bclr instruction that returns from the subroutine branches should be as follows. and has BH=0b00. 1 Do not use bclrl as a subroutine call. (Some - A calls Glue: use a bl or bcl instruction implementations access the return address cache (LK=1). at most once per instruction; such implementations - Glue calls B: place the address of B into the are likely to treat bclrl as a subroutine return, and Count Register, and use a bcctr instruction not as a subroutine call.) (LK=0). 1 For bclr[l] and bcctr[l], use the appropriate value - B returns to A: use a bclr instruction (LK=0) in the BH field. (the return address is in, or can be restored to, the Link Register). The following are examples of programming conven- tions that obey these rules. In the examples, BH is 1 Function call: assumed to contain 0b00 unless otherwise stated. In Here A calls a function, the identity of which may addition, the "at" bits are assumed to be coded appro- vary from one instance of the call to another, priately. 
instead of calling a specific program B. This case Let A, B, and Glue be specific programs. should be handled using the conventions of the preceding two bullets, depending on whether the 1 Loop counts: call is direct or indirect, with the following differ- Keep them in the Count Register, and use a bc ences. instruction (LK=0) to decrement the count and to branch back to the beginning of the loop if the dec- - If the call is direct, place the address of the remented count is nonzero. function into the Count Register, and use a bcctrl instruction (LK=1) instead of a bl or bcl 1 Computed goto's, case statements, etc.: instruction. Use the Count Register to hold the address to - For the bcctr[l] instruction that branches to branch to, and use a bcctr instruction (LK=0, and the function, use BH=0b11 if appropriate. BH=0b11 if appropriate) to branch to the selected address. Chapter 2. Branch Processor 29 Version 2.04 Compatibility Note The bits corresponding to the current "a" and "t" bits, and to the current "z" bits except in the "branch always" BO encoding, had different meanings in versions of the architecture that precede Version 2.00. 1 The bit corresponding to the "t" bit was called the "y" bit. The "y" bit indicated whether to use the architected default prediction (y=0) or to use the complement of the default prediction (y=1). The default prediction was defined as follows. - If the instruction is bc[l][a] with a negative value in the displacement field, the branch is taken. (This is the only case in which the prediction corresponding to the "y" bit differs from the prediction corresponding to the "t" bit.) - In all other cases (bc[l][a] with a nonnega- tive value in the displacement field, bclr[l], or bcctr[l]), the branch is not taken. 1 The BO encodings that test both the Count Register and the Condition Register had a "y" bit in place of the current "z" bit. The meaning of the "y" bit was as described in the preceding item. 1 The "a" bit was a "z" bit. 
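The pre-Version 2.00 "y" bit rule described in this note can be illustrated with a minimal Python sketch. The function names and the use of mnemonic strings are assumptions made for illustration only; they are not part of the architecture.

```python
# Hypothetical model of the pre-Version 2.00 static branch prediction.
# Mnemonics covered by the bc[l][a] notation used in the note above.
BC_FAMILY = {"bc", "bcl", "bca", "bcla"}

def default_prediction(mnemonic, displacement=0):
    """Architected default: predict taken only for bc[l][a] with a
    negative displacement; all other cases are predicted not taken."""
    return mnemonic in BC_FAMILY and displacement < 0

def y_bit_prediction(mnemonic, y, displacement=0):
    """y=0 uses the default prediction; y=1 uses its complement."""
    taken = default_prediction(mnemonic, displacement)
    return (not taken) if y else taken
```

For example, a backward bc with y=0 is predicted taken, while bclr with y=0 is predicted not taken. As the note goes on to say, these bits are only hints and never change the result of the program.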
Because these bits have always been defined either to be ignored or to be treated as hints, a given program will produce the same result on any implementation regardless of the values of the bits. Also, because even the "y" bit is ignored, in prac- tice, by most processors that comply with versions of the architecture that precede Version 2.00, the performance of a given program on those proces- sors will not be affected by the values of the bits. 30 Power ISATM -- Book I Version 2.04 Branch I-form Branch Conditional B-form b target_addr (AA=0 LK=0) bc BO,BI,target_addr (AA=0 LK=0) ba target_addr (AA=1 LK=0) bca BO,BI,target_addr (AA=1 LK=0) bl target_addr (AA=0 LK=1) bcl BO,BI,target_addr (AA=0 LK=1) bla target_addr (AA=1 LK=1) bcla BO,BI,target_addr (AA=1 LK=1) 18 LI AA LK 16 BO BI BD AA LK 0 6 30 31 0 6 11 16 30 31 if AA then NIA 1iea EXTS(LI || 0b00) if (64-bit mode) else NIA 1iea CIA + EXTS(LI || 0b00) then M 1 0 if LK then LR 1iea CIA + 4 else M 1 32 if ¬BO2 then CTR 1 CTR - 1 target_addr specifies the branch target address. ctr_ok 1 BO2 | ((CTRM:63 0) BO3) If AA=0 then the branch target address is the sum of cond_ok 1 BO0 | (CRBI+32 BO1) LI || 0b00 sign-extended and the address of this if ctr_ok & cond_ok then if AA then NIA 1iea EXTS(BD || 0b00) instruction, with the high-order 32 bits of the branch tar- else NIA 1iea CIA + EXTS(BD || 0b00) get address set to 0 in 32-bit mode. if LK then LR 1iea CIA + 4 If AA=1 then the branch target address is the value BI+32 specifies the Condition Register bit to be tested. LI || 0b00 sign-extended, with the high-order 32 bits of The BO field is used to resolve the branch as described the branch target address set to 0 in 32-bit mode. in Figure 36. target_addr specifies the branch target If LK=1 then the effective address of the instruction fol- address. lowing the Branch instruction is placed into the Link If AA=0 then the branch target address is the sum of Register. 
BD || 0b00 sign-extended and the address of this Special Registers Altered: instruction, with the high-order 32 bits of the branch tar- LR (if LK=1) get address set to 0 in 32-bit mode. If AA=1 then the branch target address is the value BD || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction fol- lowing the Branch instruction is placed into the Link Register. Special Registers Altered: CTR (if BO2=0) LR (if LK=1) Extended Mnemonics: Examples of extended mnemonics for Branch Condi- tional: Extended: Equivalent to: blt target bc 12,0,target bne cr2,target bc 4,10,target bdnz target bc 16,0,target Chapter 2. Branch Processor 31 Version 2.04 Branch Conditional to Link Register Branch Conditional to Count Register XL-form XL-form bclr BO,BI,BH (LK=0) bcctr BO,BI,BH (LK=0) bclrl BO,BI,BH (LK=1) bcctrl BO,BI,BH (LK=1) 19 BO BI /// BH 16 LK 19 BO BI /// BH 528 LK 0 6 11 16 19 21 31 0 6 11 16 19 21 31 if (64-bit mode) cond_ok 1 BO0 | (CRBI+32 BO1) then M 1 0 if cond_ok then NIA 1iea CTR0:61 || 0b00 else M 1 32 if LK then LR 1iea CIA + 4 if ¬BO2 then CTR 1 CTR - 1 ctr_ok 1 BO2 | ((CTRM:63 0) BO3 BI+32 specifies the Condition Register bit to be tested. cond_ok 1 BO0 | (CRBI+32 BO1) The BO field is used to resolve the branch as described if ctr_ok & cond_ok then NIA 1iea LR0:61 || 0b00 in Figure 36. The BH field is used as described in if LK then LR 1iea CIA + 4 Figure 38. The branch target address is CTR0:61 || 0b00, with the high-order 32 bits of the BI+32 specifies the Condition Register bit to be tested. branch target address set to 0 in 32-bit mode. The BO field is used to resolve the branch as described in Figure 36. The BH field is used as described in If LK=1 then the effective address of the instruction fol- Figure 38. 
The branch target address is LR0:61 || 0b00, lowing the Branch instruction is placed into the Link with the high-order 32 bits of the branch target address Register. set to 0 in 32-bit mode. If the "decrement and test CTR" option is specified If LK=1 then the effective address of the instruction fol- (BO2=0), the instruction form is invalid. lowing the Branch instruction is placed into the Link Special Registers Altered: Register. LR (if LK=1) Special Registers Altered: Extended Mnemonics: CTR (if BO2=0) LR (if LK=1) Examples of extended mnemonics for Branch Condi- tional to Count Register. Extended Mnemonics: Examples of extended mnemonics for Branch Condi- Extended: Equivalent to: tional to Link Register: bcctr 4,6 bcctr 4,6,0 bltctr bcctr 12,0,0 Extended: Equivalent to: bnectr cr2 bcctr 4,10,0 bclr 4,6 bclr 4,6,0 bltlr bclr 12,0,0 bnelr cr2 bclr 4,10,0 bdnzlr bclr 16,0,0 Programming Note bclr, bclrl, bcctr, and bcctrl each serve as both a basic and an extended mnemonic. The Assembler will recognize a bclr, bclrl, bcctr, or bcctrl mne- monic with three operands as the basic form, and a bclr, bclrl, bcctr, or bcctrl mnemonic with two operands as the extended form. In the extended form the BH operand is omitted and assumed to be 0b00. 32 Power ISATM -- Book I Version 2.04 2.5 Condition Register Instructions 2.5.1 Condition Register Logical Instructions The Condition Register Logical instructions have pre- Extended mnemonics for Condition ferred forms; see Section 1.8.1. In the preferred forms, Register logical operations the BT and BB fields satisfy the following rule. 1 The bit specified by BT is in the same Condition A set of extended mnemonics is provided that allow Register field as the bit specified by BB. additional Condition Register logical operations, beyond those provided by the basic Condition Register Logical instructions, to be coded easily. Some of these are shown as examples with the Condition Register Logical instructions. 
See Appendix D for additional extended mnemonics.

Condition Register AND XL-form

crand BT,BA,BB

| 19 | BT | BA | BB | 257 | / |
0 6 11 16 21 31

   CR[BT+32] ← CR[BA+32] & CR[BB+32]

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
   CR[BT+32]

Condition Register NAND XL-form

crnand BT,BA,BB

| 19 | BT | BA | BB | 225 | / |
0 6 11 16 21 31

   CR[BT+32] ← ¬(CR[BA+32] & CR[BB+32])

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
   CR[BT+32]

Condition Register OR XL-form

cror BT,BA,BB

| 19 | BT | BA | BB | 449 | / |
0 6 11 16 21 31

   CR[BT+32] ← CR[BA+32] | CR[BB+32]

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
   CR[BT+32]

Extended Mnemonics:
Example of extended mnemonics for Condition Register OR:

   Extended:          Equivalent to:
   crmove Bx,By       cror Bx,By,By

Condition Register XOR XL-form

crxor BT,BA,BB

| 19 | BT | BA | BB | 193 | / |
0 6 11 16 21 31

   CR[BT+32] ← CR[BA+32] ⊕ CR[BB+32]

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
   CR[BT+32]

Extended Mnemonics:
Example of extended mnemonics for Condition Register XOR:

   Extended:          Equivalent to:
   crclr Bx           crxor Bx,Bx,Bx
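The four operations defined above, and the two extended mnemonics built on them, can be modeled in a few lines. The following is only an illustrative sketch, not part of the architecture: it assumes the Condition Register is held as a 32-bit Python integer whose most significant bit is CR bit 32, and the helper names cr_bit and cr_set_bit are hypothetical.

```python
# Illustrative model only. The CR is held as a 32-bit integer; operand b
# (0..31) selects Condition Register bit b+32, counted from the left.

def cr_bit(cr, b):
    """Return the value (0 or 1) of Condition Register bit b+32."""
    return (cr >> (31 - b)) & 1

def cr_set_bit(cr, b, value):
    """Return a copy of cr with bit b+32 set to value."""
    mask = 1 << (31 - b)
    return (cr | mask) if value else (cr & ~mask)

def crand(cr, bt, ba, bb):
    return cr_set_bit(cr, bt, cr_bit(cr, ba) & cr_bit(cr, bb))

def crnand(cr, bt, ba, bb):
    return cr_set_bit(cr, bt, 1 - (cr_bit(cr, ba) & cr_bit(cr, bb)))

def cror(cr, bt, ba, bb):
    return cr_set_bit(cr, bt, cr_bit(cr, ba) | cr_bit(cr, bb))

def crxor(cr, bt, ba, bb):
    return cr_set_bit(cr, bt, cr_bit(cr, ba) ^ cr_bit(cr, bb))

# Extended mnemonics, expressed through their basic equivalents.
def crmove(cr, bx, by):
    return cror(cr, bx, by, by)      # crmove Bx,By  ==  cror Bx,By,By

def crclr(cr, bx):
    return crxor(cr, bx, bx, bx)     # crclr Bx  ==  crxor Bx,Bx,Bx
```

Because x | x = x and x ⊕ x = 0, crmove copies one bit to another and crclr always produces 0, whatever the previous value of the target bit.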
Condition Register NOR XL-form

crnor BT,BA,BB

| 19 | BT | BA | BB | 33 | / |
0 6 11 16 21 31

   CR[BT+32] ← ¬(CR[BA+32] | CR[BB+32])

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
   CR[BT+32]

Extended Mnemonics:
Example of extended mnemonics for Condition Register NOR:

   Extended:          Equivalent to:
   crnot Bx,By        crnor Bx,By,By

Condition Register Equivalent XL-form

creqv BT,BA,BB

| 19 | BT | BA | BB | 289 | / |
0 6 11 16 21 31

   CR[BT+32] ← CR[BA+32] ≡ CR[BB+32]

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
   CR[BT+32]

Extended Mnemonics:
Example of extended mnemonics for Condition Register Equivalent:

   Extended:          Equivalent to:
   crset Bx           creqv Bx,Bx,Bx

Condition Register AND with Complement XL-form

crandc BT,BA,BB

| 19 | BT | BA | BB | 129 | / |
0 6 11 16 21 31

   CR[BT+32] ← CR[BA+32] & ¬CR[BB+32]

The bit in the Condition Register specified by BA+32 is ANDed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Condition Register OR with Complement XL-form

crorc BT,BA,BB

| 19 | BT | BA | BB | 417 | / |
0 6 11 16 21 31

   CR[BT+32] ← CR[BA+32] | ¬CR[BB+32]

The bit in the Condition Register specified by BA+32 is ORed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.
Special Registers Altered (crandc and crorc):
   CR[BT+32]

2.5.2 Condition Register Field Instruction

Move Condition Register Field XL-form

mcrf BF,BFA

| 19 | BF | // | BFA | // | /// | 0 | / |
0 6 9 11 14 16 21 31

   CR[4×BF+32:4×BF+35] ← CR[4×BFA+32:4×BFA+35]

The contents of Condition Register field BFA are copied to Condition Register field BF.

Special Registers Altered:
   CR field BF

2.6 System Call Instruction

This instruction provides the means by which a program can call upon the system to perform a service.

System Call SC-form

sc LEV

| 17 | /// | /// | // | LEV | // | 1 | / |
0 6 11 16 20 27 30 31

This instruction calls the system to perform a service. A complete description of this instruction can be found in Book III.

The use of the LEV field is described in Book III. The LEV values greater than 1 are reserved, and bits 0:5 of the LEV field (instruction bits 20:25) are treated as a reserved field.

When control is returned to the program that executed the System Call instruction, the contents of the registers will depend on the register conventions used by the program providing the system service.

This instruction is context synchronizing (see Book III).

Special Registers Altered:
   Dependent on the system service

Programming Note
sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form the LEV operand is omitted and assumed to be 0.

In application programs the value of the LEV operand for sc should be 0.

Chapter 3. Fixed-Point Processor

3.1 Fixed-Point Processor Overview
3.2 Fixed-Point Processor Registers
3.2.1 General Purpose Registers
3.2.2 Fixed-Point Exception Register
3.2.3 Program Priority Register [Category: Server]
3.2.4 Software Use SPRs [Category: Embedded]
3.2.5 Device Control Registers [Category: Embedded]
3.3 Fixed-Point Processor Instructions
3.3.1 Fixed-Point Storage Access Instructions
3.3.1.1 Storage Access Exceptions
3.3.2 Fixed-Point Load Instructions
3.3.2.1 64-bit Fixed-Point Load Instructions [Category: 64-Bit]
3.3.3 Fixed-Point Store Instructions
3.3.3.1 64-bit Fixed-Point Store Instructions [Category: 64-Bit]
3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions
3.3.5 Fixed-Point Load and Store Multiple Instructions
3.3.6 Fixed-Point Move Assist Instructions [Category: Move Assist]
3.3.7 Other Fixed-Point Instructions
3.3.8 Fixed-Point Arithmetic Instructions
3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit]
3.3.9 Fixed-Point Compare Instructions
3.3.10 Fixed-Point Trap Instructions
3.3.10.1 64-bit Fixed-Point Trap Instructions [Category: 64-Bit]
3.3.11 Fixed-Point Select [Category: Base.Phased-In]
3.3.12 Fixed-Point Logical Instructions
3.3.12.1 64-bit Fixed-Point Logical Instructions [Category: 64-Bit]
3.3.12.2 Phased-In Fixed-Point Logical Instructions [Category: Base.Phased-In]
3.3.13 Fixed-Point Rotate and Shift Instructions
3.3.13.1 Fixed-Point Rotate Instructions
3.3.13.1.1 64-bit Fixed-Point Rotate Instructions [Category: 64-Bit]
3.3.13.2 Fixed-Point Shift Instructions
3.3.13.2.1 64-bit Fixed-Point Shift Instructions [Category: 64-Bit]
3.3.14 Move To/From System Register Instructions
3.3.14.1 Move To/From System Registers [Category: Embedded]

3.1 Fixed-Point Processor Overview

This chapter describes the registers and instructions that make up the Fixed-Point Processor facility.

3.2 Fixed-Point Processor Registers

3.2.1 General Purpose Registers

All manipulation of information is done in registers internal to the Fixed-Point Processor. The principal storage internal to the Fixed-Point Processor is a set of 32 General Purpose Registers (GPRs). See Figure 39.

   GPR 0
   GPR 1
   ...
   ...
   GPR 30
   GPR 31
   0                63

Figure 39. General Purpose Registers

Each GPR is a 64-bit register.

3.2.2 Fixed-Point Exception Register

The Fixed-Point Exception Register (XER) is a 64-bit register.

   XER
   0                63

Figure 40. Fixed-Point Exception Register

The bit definitions for the Fixed-Point Exception Register are shown below. Here M=0 in 64-bit mode and M=32 in 32-bit mode.

The bits are set based on the operation of an instruction considered as a whole, not on intermediate results (e.g., the Subtract From Carrying instruction, the result of which is specified as the sum of three values, sets bits in the Fixed-Point Exception Register based on the entire operation, not on an intermediate sum).

Bit(s)   Description

0:31     Reserved

32       Summary Overflow (SO)
         The Summary Overflow bit is set to 1 whenever an instruction (except mtspr) sets the Overflow bit. Once set, the SO bit remains set until it is cleared by an mtspr instruction (specifying the XER) or an mcrxr instruction. It is not altered by Compare instructions, nor by other instructions (except mtspr to the XER, and mcrxr) that cannot overflow. Executing an mtspr instruction to the XER, supplying the values 0 for SO and 1 for OV, causes SO to be set to 0 and OV to be set to 1.

33       Overflow (OV)
         The Overflow bit is set to indicate that an overflow has occurred during execution of an instruction.
         XO-form Add, Subtract From, and Negate instructions having OE=1 set it to 1 if the carry out of bit M is not equal to the carry out of bit M+1, and set it to 0 otherwise.
         XO-form Multiply Low and Divide instructions having OE=1 set it to 1 if the result cannot be represented in 64 bits (mulld, divd, divdu) or in 32 bits (mullw, divw, divwu), and set it to 0 otherwise. The OV bit is not altered by Compare instructions, nor by other instructions (except mtspr to the XER, and mcrxr) that cannot overflow.
         [Category: Legacy Integer Multiply-Accumulate]
         XO-form Legacy Integer Multiply-Accumulate instructions set OV when OE=1 to reflect overflow of the 32-bit result. For signed-integer accumulation, overflow occurs when the add produces a carry out of bit 32 that is not equal to the carry out of bit 33. For unsigned-integer accumulation, overflow occurs when the add produces a carry out of bit 32.

34       Carry (CA)
         The Carry bit is set as follows, during execution of certain instructions. Add Carrying, Subtract From Carrying, Add Extended, and Subtract From Extended types of instructions set it to 1 if there is a carry out of bit M, and set it to 0 otherwise. Shift Right Algebraic instructions set it to 1 if any 1-bits have been shifted out of a negative operand, and set it to 0 otherwise. The CA bit is not altered by Compare instructions, nor by other instructions (except Shift Right Algebraic, mtspr to the XER, and mcrxr) that cannot carry.

35:56    Reserved

57:63    This field specifies the number of bytes to be transferred by a Load String Indexed or Store String Indexed instruction.
         [Category: Legacy Move Assist]
         This field is used as a target by dlmzb to indicate the byte location of the leftmost zero byte found.

3.2.3 Program Priority Register [Category: Server]

The Program Priority Register (PPR) is a 64-bit register that controls the program's priority. The layout of the PPR is shown in Figure 41.

   | /// | PRI | /// | ??? |
   0     11    14    44    63

Bit(s)   Description

11:13    Program Priority (PRI)
         010   low
         011   medium low
         100   medium (normal)

44:63    implementation-specific (read-only; values written to this field by software are ignored)

All other fields are reserved.

Figure 41. Program Priority Register

Programming Note
By setting the PRI field, a programmer may be able to improve system throughput by causing system resources to be used more efficiently.

E.g., if a program is waiting on a lock (see Section B.2 of Book II), it could set low priority, with the result that more processor resources would be diverted to the program that holds the lock. This diversion of resources may enable the lock-holding program to complete the operation under the lock more quickly, and then relinquish the lock to the waiting program.

Programming Note
or Rx,Rx,Rx can be used to modify the PRI field; see Section 3.3.14.

Programming Note
When the system error handler is invoked, the PRI field may be set to an undefined value.

3.2.4 Software Use SPRs [Category: Embedded]

Software Use SPRs are 64-bit registers that have no defined functionality. SPRG4-7 can be read by application programs. Additional Software Use SPRs are defined in Book III.

   SPRG4
   SPRG5
   SPRG6
   SPRG7
   0                63

Figure 42. Software-use SPRs

The VRSAVE is a 32-bit register that also can be used as a software use SPR. VRSAVE is also defined as part of Category: Embedded and Vector (see Section 5.3.3).

Programming Note
USPRG0 was made a 32-bit register and renamed to VRSAVE; see Section 5.3.3.

3.2.5 Device Control Registers [Category: Embedded]

Device Control Registers (DCRs) are on-chip registers that exist architecturally outside the processor and thus are not actually part of the processor architecture. This specification simply defines the existence of a Device Control Register `address space' and the instructions to access them and does not define the Device Control Registers themselves.

Device Control Registers may control the use of on-chip peripherals, such as memory controllers (the definition of specific Device Control Registers is implementation-dependent).

The contents of user-mode-accessible Device Control Registers can be read using mfdcrux and written using mtdcrux.

3.3 Fixed-Point Processor Instructions

3.3.1 Fixed-Point Storage Access Instructions

The Storage Access instructions compute the effective address (EA) of the storage to be accessed as described in Section 1.10.3.

Programming Note
The la extended mnemonic permits computing an effective address as a Load or Store instruction would, but loads the address itself into a GPR rather than loading the value that is in storage at that address.

Programming Note
The DS field in DS-form Storage Access instructions is a word offset, not a byte offset like the D field in D-form Storage Access instructions. However, for programming convenience, Assemblers should support the specification of byte offsets for both forms of instruction.

3.3.1.1 Storage Access Exceptions

Storage accesses will cause the system data storage error handler to be invoked if the program is not allowed to modify the target storage (Store only), or if the program attempts to access storage that is unavailable.

3.3.2 Fixed-Point Load Instructions

The byte, halfword, word, or doubleword in storage addressed by EA is loaded into register RT.

Many of the Load instructions have an "update" form, in which register RA is updated with the effective address. For these forms, if RA≠0 and RA≠RT, the effective address is placed into register RA and the storage element (byte, halfword, word, or doubleword) addressed by EA is loaded into RT.
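The addressing rules from Sections 3.3.1 and 3.3.2 can be sketched as follows. This is a hedged illustration rather than a definitive implementation: it assumes 64-bit mode, models the GPRs as a Python list, and the function names (exts, ea_d_form, ea_ds_form, update_form_is_valid) are hypothetical, not architected.

```python
MASK64 = (1 << 64) - 1   # effective addresses are modeled as 64-bit values

def exts(value, bits):
    """Sign-extend a field that is `bits` wide to a Python integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def ea_d_form(gpr, ra, d):
    """EA for a D-form access: (RA|0) + EXTS(D), where D is a byte offset."""
    base = 0 if ra == 0 else gpr[ra]    # RA=0 means the value 0, not GPR 0
    return (base + exts(d, 16)) & MASK64

def ea_ds_form(gpr, ra, ds):
    """EA for a DS-form access: (RA|0) + EXTS(DS || 0b00).

    DS is a 14-bit word offset; appending 0b00 converts it to bytes.
    """
    base = 0 if ra == 0 else gpr[ra]
    return (base + exts(ds << 2, 16)) & MASK64

def update_form_is_valid(ra, rt):
    """Load-with-update forms require RA != 0 and RA != RT."""
    return ra != 0 and ra != rt
```

An assembler that accepts byte offsets for both forms, as the Programming Note recommends, would divide the byte offset by 4 before placing it in the DS field.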
Programming Note
In some implementations, the Load Algebraic and Load with Update instructions may have greater latency than other types of Load instructions. Moreover, Load with Update instructions may take longer to execute in some implementations than the corresponding pair of a non-update Load instruction and an Add instruction.

Load Byte and Zero D-form

lbz RT,D(RA)

| 34 | RT | RA | D |
0 6 11 16 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0)+ D. The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

Special Registers Altered:
   None

Load Byte and Zero Indexed X-form

lbzx RT,RA,RB

| 31 | RT | RA | RB | 87 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0)+ (RB). The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

Special Registers Altered:
   None

Load Byte and Zero with Update D-form

lbzu RT,D(RA)

| 35 | RT | RA | D |
0 6 11 16 31

   EA ← (RA) + EXTS(D)
   RT ← ⁵⁶0 || MEM(EA, 1)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ D. The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

Load Byte and Zero with Update Indexed X-form

lbzux RT,RA,RB

| 31 | RT | RA | RB | 119 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   RT ← ⁵⁶0 || MEM(EA, 1)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None
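As an illustration of the difference between the plain and update forms, lbz and lbzu can be modeled as below. This is a minimal sketch under stated assumptions: storage is a Python dict from byte address to byte value, the GPRs are a list of 32 integers, 64-bit mode is assumed, and raising ValueError for an invalid form is purely a modeling choice (the architecture does not define it that way).

```python
MASK64 = (1 << 64) - 1

def exts16(d):
    """Sign-extend the 16-bit D field."""
    return d - 0x10000 if d & 0x8000 else d

def lbz(gpr, mem, rt, ra, d):
    """Load Byte and Zero: RT gets 56 zero bits followed by MEM(EA, 1)."""
    base = 0 if ra == 0 else gpr[ra]
    ea = (base + exts16(d)) & MASK64
    gpr[rt] = mem[ea]           # a value 0..255, so RT[0:55] are 0

def lbzu(gpr, mem, rt, ra, d):
    """Load Byte and Zero with Update: as lbz, then RA gets EA."""
    if ra == 0 or ra == rt:
        raise ValueError("invalid instruction form")   # modeling choice
    ea = (gpr[ra] + exts16(d)) & MASK64
    gpr[rt] = mem[ea]
    gpr[ra] = ea

# Usage: walk two bytes starting at address 0x100.
mem = {0x100: 0xAB, 0x101: 0xCD}
gpr = [0] * 32
gpr[3] = 0xFF
lbzu(gpr, mem, 4, 3, 1)   # EA = 0x100; GPR 3 now holds 0x100
lbzu(gpr, mem, 5, 3, 1)   # EA = 0x101; GPR 3 now holds 0x101
```

The update form is convenient for stepping through a buffer, because the base register advances as a side effect of each load.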
Load Halfword and Zero D-form

lhz RT,D(RA)

| 40 | RT | RA | D |
0 6 11 16 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0)+ D. The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

Special Registers Altered:
   None

Load Halfword and Zero Indexed X-form

lhzx RT,RA,RB

| 31 | RT | RA | RB | 279 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0)+ (RB). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

Special Registers Altered:
   None

Load Halfword and Zero with Update D-form

lhzu RT,D(RA)

| 41 | RT | RA | D |
0 6 11 16 31

   EA ← (RA) + EXTS(D)
   RT ← ⁴⁸0 || MEM(EA, 2)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ D. The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

Load Halfword and Zero with Update Indexed X-form

lhzux RT,RA,RB

| 31 | RT | RA | RB | 311 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   RT ← ⁴⁸0 || MEM(EA, 2)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

Load Halfword Algebraic D-form

lha RT,D(RA)

| 42 | RT | RA | D |
0 6 11 16 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0)+ D. The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
   None

Load Halfword Algebraic Indexed X-form

lhax RT,RA,RB

| 31 | RT | RA | RB | 343 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0)+ (RB). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
   None

Load Halfword Algebraic with Update D-form

lhau RT,D(RA)

| 43 | RT | RA | D |
0 6 11 16 31

   EA ← (RA) + EXTS(D)
   RT ← EXTS(MEM(EA, 2))
   RA ← EA

Let the effective address (EA) be the sum (RA)+ D. The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

Load Halfword Algebraic with Update Indexed X-form

lhaux RT,RA,RB

| 31 | RT | RA | RB | 375 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   RT ← EXTS(MEM(EA, 2))
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

Load Word and Zero D-form

lwz RT,D(RA)

| 32 | RT | RA | D |
0 6 11 16 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0)+ D. The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

Load Word and Zero Indexed X-form

lwzx RT,RA,RB

| 31 | RT | RA | RB | 23 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0)+ (RB). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.
Special Registers Altered (lwz and lwzx):
   None

Load Word and Zero with Update D-form

lwzu RT,D(RA)

| 33 | RT | RA | D |
0 6 11 16 31

   EA ← (RA) + EXTS(D)
   RT ← ³²0 || MEM(EA, 4)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ D. The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

Load Word and Zero with Update Indexed X-form

lwzux RT,RA,RB

| 31 | RT | RA | RB | 55 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   RT ← ³²0 || MEM(EA, 4)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

3.3.2.1 64-bit Fixed-Point Load Instructions [Category: 64-Bit]

Load Word Algebraic DS-form

lwa RT,DS(RA)

| 58 | RT | RA | DS | 2 |
0 6 11 16 30 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(DS || 0b00)
   RT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+ (DS||0b00). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
   None

Load Word Algebraic Indexed X-form

lwax RT,RA,RB

| 31 | RT | RA | RB | 341 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   RT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+ (RB). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
   None

Load Word Algebraic with Update Indexed X-form

lwaux RT,RA,RB

| 31 | RT | RA | RB | 373 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   RT ← EXTS(MEM(EA, 4))
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are filled with a copy of bit 0 of the loaded word.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

Load Doubleword DS-form

ld RT,DS(RA)

| 58 | RT | RA | DS | 0 |
0 6 11 16 30 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(DS || 0b00)
   RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+ (DS||0b00). The doubleword in storage addressed by EA is loaded into RT.

Special Registers Altered:
   None

Load Doubleword Indexed X-form

ldx RT,RA,RB

| 31 | RT | RA | RB | 21 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+ (RB). The doubleword in storage addressed by EA is loaded into RT.

Special Registers Altered:
   None

Load Doubleword with Update DS-form

ldu RT,DS(RA)

| 58 | RT | RA | DS | 1 |
0 6 11 16 30 31

   EA ← (RA) + EXTS(DS || 0b00)
   RT ← MEM(EA, 8)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (DS||0b00). The doubleword in storage addressed by EA is loaded into RT.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

Load Doubleword with Update Indexed X-form

ldux RT,RA,RB

| 31 | RT | RA | RB | 53 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   RT ← MEM(EA, 8)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). The doubleword in storage addressed by EA is loaded into RT.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
   None

3.3.3 Fixed-Point Store Instructions

The contents of register RS are stored into the byte, halfword, word, or doubleword in storage addressed by EA.

Many of the Store instructions have an "update" form, in which register RA is updated with the effective address. For these forms, the following rules apply.

- If RA≠0, the effective address is placed into register RA.
- If RS=RA, the contents of register RS are copied to the target storage element and then EA is placed into RA (RS).

Store Byte D-form

stb RS,D(RA)

| 38 | RS | RA | D |
0 6 11 16 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   MEM(EA, 1) ← (RS)[56:63]

Let the effective address (EA) be the sum (RA|0)+ D. (RS)[56:63] are stored into the byte in storage addressed by EA.

Special Registers Altered:
   None

Store Byte Indexed X-form

stbx RS,RA,RB

| 31 | RS | RA | RB | 215 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   MEM(EA, 1) ← (RS)[56:63]

Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)[56:63] are stored into the byte in storage addressed by EA.

Special Registers Altered:
   None

Store Byte with Update D-form

stbu RS,D(RA)

| 39 | RS | RA | D |
0 6 11 16 31

   EA ← (RA) + EXTS(D)
   MEM(EA, 1) ← (RS)[56:63]
   RA ← EA

Let the effective address (EA) be the sum (RA)+ D. (RS)[56:63] are stored into the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Store Byte with Update Indexed X-form

stbux RS,RA,RB

| 31 | RS | RA | RB | 247 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   MEM(EA, 1) ← (RS)[56:63]
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). (RS)[56:63] are stored into the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Store Halfword D-form

sth RS,D(RA)

| 44 | RS | RA | D |
0 6 11 16 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   MEM(EA, 2) ← (RS)[48:63]

Let the effective address (EA) be the sum (RA|0)+ D. (RS)[48:63] are stored into the halfword in storage addressed by EA.

Store Halfword Indexed X-form

sthx RS,RA,RB

| 31 | RS | RA | RB | 407 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   MEM(EA, 2) ← (RS)[48:63]

Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)[48:63] are stored into the halfword in storage addressed by EA.
Special Registers Altered (sth and sthx):
   None

Store Halfword with Update D-form

sthu RS,D(RA)

| 45 | RS | RA | D |
0 6 11 16 31

   EA ← (RA) + EXTS(D)
   MEM(EA, 2) ← (RS)[48:63]
   RA ← EA

Let the effective address (EA) be the sum (RA)+ D. (RS)[48:63] are stored into the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Store Halfword with Update Indexed X-form

sthux RS,RA,RB

| 31 | RS | RA | RB | 439 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   MEM(EA, 2) ← (RS)[48:63]
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). (RS)[48:63] are stored into the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Store Word D-form

stw RS,D(RA)

| 36 | RS | RA | D |
0 6 11 16 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(D)
   MEM(EA, 4) ← (RS)[32:63]

Let the effective address (EA) be the sum (RA|0)+ D. (RS)[32:63] are stored into the word in storage addressed by EA.

Special Registers Altered:
   None

Store Word Indexed X-form

stwx RS,RA,RB

| 31 | RS | RA | RB | 151 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   MEM(EA, 4) ← (RS)[32:63]

Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)[32:63] are stored into the word in storage addressed by EA.

Special Registers Altered:
   None

Store Word with Update D-form

stwu RS,D(RA)

| 37 | RS | RA | D |
0 6 11 16 31

   EA ← (RA) + EXTS(D)
   MEM(EA, 4) ← (RS)[32:63]
   RA ← EA

Let the effective address (EA) be the sum (RA)+ D. (RS)[32:63] are stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Store Word with Update Indexed X-form

stwux RS,RA,RB

| 31 | RS | RA | RB | 183 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   MEM(EA, 4) ← (RS)[32:63]
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). (RS)[32:63] are stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

3.3.3.1 64-bit Fixed-Point Store Instructions [Category: 64-Bit]

Store Doubleword DS-form

std RS,DS(RA)

| 62 | RS | RA | DS | 0 |
0 6 11 16 30 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + EXTS(DS || 0b00)
   MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+ (DS||0b00). (RS) is stored into the doubleword in storage addressed by EA.

Special Registers Altered:
   None

Store Doubleword Indexed X-form

stdx RS,RA,RB

| 31 | RS | RA | RB | 149 | / |
0 6 11 16 21 31

   if RA = 0 then b ← 0
   else b ← (RA)
   EA ← b + (RB)
   MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+ (RB). (RS) is stored into the doubleword in storage addressed by EA.

Special Registers Altered:
   None

Store Doubleword with Update DS-form

stdu RS,DS(RA)

| 62 | RS | RA | DS | 1 |
0 6 11 16 30 31

   EA ← (RA) + EXTS(DS || 0b00)
   MEM(EA, 8) ← (RS)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (DS||0b00). (RS) is stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

Store Doubleword with Update Indexed X-form

stdux RS,RA,RB

| 31 | RS | RA | RB | 181 | / |
0 6 11 16 21 31

   EA ← (RA) + (RB)
   MEM(EA, 8) ← (RS)
   RA ← EA

Let the effective address (EA) be the sum (RA)+ (RB). (RS) is stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
   None

3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions

Programming Note
These instructions have the effect of loading and storing data in the opposite byte ordering from that which would be used by other Load and Store instructions.

Programming Note
In some implementations, the Load Byte-Reverse instructions may have greater latency than other Load instructions.
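The effect described in the first note above can be sketched by comparing how the same bytes in storage are interpreted by an ordinary load and by a byte-reversed load. The sketch assumes Big-Endian byte ordering for ordinary loads and stores; the function names are hypothetical and only the value placed in (or taken from) the low-order bits of the register is modeled.

```python
def lwz_value(data):
    """Value an ordinary word load produces from 4 bytes (Big-Endian assumed)."""
    return int.from_bytes(data, "big")

def lwbrx_value(data):
    """Value a byte-reversed word load produces from the same 4 bytes."""
    return int.from_bytes(data, "little")

def lhbrx_value(data):
    """Value a byte-reversed halfword load produces from 2 bytes."""
    return int.from_bytes(data, "little")

def stwbrx_bytes(rs):
    """Bytes a byte-reversed word store writes, in ascending address order."""
    return (rs & 0xFFFFFFFF).to_bytes(4, "little")

def sthbrx_bytes(rs):
    """Bytes a byte-reversed halfword store writes, in ascending address order."""
    return (rs & 0xFFFF).to_bytes(2, "little")
```

With the bytes 0x11 0x22 0x33 0x44 in ascending address order, the ordinary load yields 0x11223344 while the byte-reversed load yields 0x44332211; the high-order bits of the target register are zero in both cases.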
Load Halfword Byte-Reverse Indexed X-form

    lhbrx RT,RA,RB

    | 31    | RT    | RA    | RB    | 790   | / |
    0       6       11      16      21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    load_data ← MEM(EA, 2)
    RT ← ⁴⁸0 || load_data[8:15] || load_data[0:7]

Let the effective address (EA) be the sum (RA|0) + (RB). Bits 0:7 of the halfword in storage addressed by EA are loaded into RT[56:63]. Bits 8:15 of the halfword in storage addressed by EA are loaded into RT[48:55]. RT[0:47] are set to 0.

Special Registers Altered:
    None

Store Halfword Byte-Reverse Indexed X-form

    sthbrx RS,RA,RB

    | 31    | RS    | RA    | RB    | 918   | / |
    0       6       11      16      21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 2) ← (RS)[56:63] || (RS)[48:55]

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)[56:63] are stored into bits 0:7 of the halfword in storage addressed by EA. (RS)[48:55] are stored into bits 8:15 of the halfword in storage addressed by EA.

Special Registers Altered:
    None

Load Word Byte-Reverse Indexed X-form

    lwbrx RT,RA,RB

    | 31    | RT    | RA    | RB    | 534   | / |
    0       6       11      16      21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    load_data ← MEM(EA, 4)
    RT ← ³²0 || load_data[24:31] || load_data[16:23]
             || load_data[8:15] || load_data[0:7]

Let the effective address (EA) be the sum (RA|0) + (RB). Bits 0:7 of the word in storage addressed by EA are loaded into RT[56:63]. Bits 8:15 of the word in storage addressed by EA are loaded into RT[48:55]. Bits 16:23 of the word in storage addressed by EA are loaded into RT[40:47]. Bits 24:31 of the word in storage addressed by EA are loaded into RT[32:39]. RT[0:31] are set to 0.

Special Registers Altered:
    None

Store Word Byte-Reverse Indexed X-form

    stwbrx RS,RA,RB

    | 31    | RS    | RA    | RB    | 662   | / |
    0       6       11      16      21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (RS)[56:63] || (RS)[48:55] || (RS)[40:47] || (RS)[32:39]

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)[56:63] are stored into bits 0:7 of the word in storage addressed by EA. (RS)[48:55] are stored into bits 8:15 of the word in storage addressed by EA. (RS)[40:47] are stored into bits 16:23 of the word in storage addressed by EA. (RS)[32:39] are stored into bits 24:31 of the word in storage addressed by EA.

Special Registers Altered:
    None

3.3.5 Fixed-Point Load and Store Multiple Instructions

The Load/Store Multiple instructions have preferred forms; see Section 1.8.1, "Preferred Instruction Forms" on page 19. In the preferred forms, storage alignment satisfies the following rule.

  - The combination of the EA and RT (RS) is such that the low-order byte of GPR 31 is loaded (stored) from (into) the last byte of an aligned quadword in storage.

For the Server environment, the Load/Store Multiple instructions are not supported in Little-Endian mode. If they are executed in Little-Endian mode, the system alignment error handler is invoked.

Load Multiple Word D-form

    lmw RT,D(RA)

    | 46    | RT    | RA    | D     |
    0       6       11      16     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    r ← RT
    do while r ≤ 31
        GPR(r) ← ³²0 || MEM(EA, 4)
        r ← r + 1
        EA ← EA + 4

Let n = (32-RT). Let the effective address (EA) be the sum (RA|0) + D.

n consecutive words starting at EA are loaded into the low-order 32 bits of GPRs RT through 31. The high-order 32 bits of these GPRs are set to zero.

If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Multiple Word D-form

    stmw RS,D(RA)

    | 47    | RS    | RA    | D     |
    0       6       11      16     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    r ← RS
    do while r ≤ 31
        MEM(EA, 4) ← GPR(r)[32:63]
        r ← r + 1
        EA ← EA + 4

Let n = (32-RS). Let the effective address (EA) be the sum (RA|0) + D.

n consecutive words starting at EA are stored from the low-order 32 bits of GPRs RS through 31.

Special Registers Altered:
    None
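The register loops of Load Multiple Word and Store Multiple Word can be modeled with a short Python sketch. It is a simplified model under stated assumptions: Big-Endian storage held in a bytearray, the 32 GPRs held in a list of integers, D passed as an already sign-extended Python integer, and an invalid instruction form reported by raising ValueError (the architecture does not define it that way; this is only a convenience of the sketch). The function names mirror the mnemonics but are otherwise hypothetical.

```python
MASK64 = (1 << 64) - 1


def lmw(gpr: list, mem: bytearray, rt: int, ra: int, d: int) -> None:
    """Model of lmw RT,D(RA): load GPRs RT..31 from consecutive words."""
    if rt <= ra:
        # RA lies in the range RT..31 (this includes RA=0 when RT=0).
        raise ValueError("invalid instruction form: RA is in the loaded range")
    base = 0 if ra == 0 else gpr[ra]
    ea = (base + d) & MASK64
    for r in range(rt, 32):
        # Low-order 32 bits receive the word; high-order 32 bits become 0.
        gpr[r] = int.from_bytes(mem[ea:ea + 4], "big")
        ea += 4


def stmw(gpr: list, mem: bytearray, rs: int, ra: int, d: int) -> None:
    """Model of stmw RS,D(RA): store the low-order words of GPRs RS..31."""
    base = 0 if ra == 0 else gpr[ra]
    ea = (base + d) & MASK64
    for r in range(rs, 32):
        mem[ea:ea + 4] = (gpr[r] & 0xFFFF_FFFF).to_bytes(4, "big")
        ea += 4
```

With RT=29, for example, n = 32-29 = 3 words are transferred, into or from GPRs 29, 30, and 31.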
3.3.6 Fixed-Point Move Assist Instructions [Category: Move Assist]

The Move Assist instructions allow movement of data from storage to registers or from registers to storage without concern for alignment. These instructions can be used for a short move between arbitrary storage locations or to initiate a long move between unaligned storage fields.

The Load/Store String instructions have preferred forms; see Section 1.8.1, "Preferred Instruction Forms" on page 19. In the preferred forms, register usage satisfies the following rules.

  - RS = 4 or 5
  - RT = 4 or 5
  - last register loaded/stored ≤ 12

For some implementations, using GPR 4 for RS and RT may result in slightly faster execution than using GPR 5.

For the Server environment, the Move Assist instructions are not supported in Little-Endian mode. If they are executed in Little-Endian mode, the system alignment error handler may be invoked or the instructions may be treated as no-ops if the number of bytes specified by the instruction is 0.

Load String Word Immediate X-form

    lswi RT,RA,NB

    | 31    | RT    | RA    | NB    | 597   | / |
    0       6       11      16      21      31

    if RA = 0 then EA ← 0
    else           EA ← (RA)
    if NB = 0 then n ← 32
    else           n ← NB
    r ← RT - 1
    i ← 32
    do while n > 0
        if i = 32 then
            r ← r + 1 (mod 32)
            GPR(r) ← 0
        GPR(r)[i:i+7] ← MEM(EA, 1)
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n - 1

Let the effective address (EA) be (RA|0). Let n = NB if NB≠0, n = 32 if NB=0; n is the number of bytes to load. Let nr = CEIL(n/4); nr is the number of registers to receive data.

n consecutive bytes starting at EA are loaded into GPRs RT through RT+nr-1. Data are loaded into the low-order four bytes of each GPR; the high-order four bytes are set to 0.

Bytes are loaded left to right in each register. The sequence of registers wraps around to GPR 0 if required. If the low-order four bytes of register RT+nr-1 are only partially filled, the unfilled low-order byte(s) of that register are set to 0.

If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load String Word Indexed X-form

    lswx RT,RA,RB

    | 31    | RT    | RA    | RB    | 533   | / |
    0       6       11      16      21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    n ← XER[57:63]
    r ← RT - 1
    i ← 32
    RT ← undefined
    do while n > 0
        if i = 32 then
            r ← r + 1 (mod 32)
            GPR(r) ← 0
        GPR(r)[i:i+7] ← MEM(EA, 1)
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n - 1

Let the effective address (EA) be the sum (RA|0) + (RB). Let n = XER[57:63]; n is the number of bytes to load. Let nr = CEIL(n/4); nr is the number of registers to receive data.

If n>0, n consecutive bytes starting at EA are loaded into GPRs RT through RT+nr-1. Data are loaded into the low-order four bytes of each GPR; the high-order four bytes are set to 0.

Bytes are loaded left to right in each register. The sequence of registers wraps around to GPR 0 if required. If the low-order four bytes of register RT+nr-1 are only partially filled, the unfilled low-order byte(s) of that register are set to 0.

If n=0, the contents of register RT are undefined.

If RA or RB is in the range of registers to be loaded, including the case in which RA=0, the instruction is treated as if the instruction form were invalid. If RT=RA or RT=RB, the instruction form is invalid.

Special Registers Altered:
    None
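The byte-packing loop of Load String Word Immediate can be traced with a small Python model. This is a sketch under assumptions, not a definitive implementation: storage is a byte sequence, the GPRs are a list of 32 integers, bit 0 is the most significant bit of a 64-bit register (so bits i:i+7 correspond to a left shift of 56-i), and the check for invalid instruction forms is deliberately omitted. The function name lswi simply mirrors the mnemonic.

```python
def lswi(gpr: list, mem: bytes, rt: int, ra: int, nb: int) -> None:
    """Model of lswi RT,RA,NB following the register-transfer description."""
    ea = 0 if ra == 0 else gpr[ra]
    n = 32 if nb == 0 else nb      # NB=0 encodes a 32-byte transfer
    r = rt - 1
    i = 32                         # next free bit position in the low word
    while n > 0:
        if i == 32:
            r = (r + 1) % 32       # advance, wrapping from GPR 31 to GPR 0
            gpr[r] = 0             # clears high word and unfilled low bytes
        # Place the byte into bits i:i+7 (bit 0 is the most significant bit).
        gpr[r] |= mem[ea] << (56 - i)
        i += 8
        if i == 64:
            i = 32
        ea += 1
        n -= 1
```

For a 6-byte transfer with RT=31, nr = CEIL(6/4) = 2, so GPR 31 is filled completely and GPR 0 receives the remaining two bytes in its leftmost low-word positions, with the unfilled bytes left at 0.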
Store String Word Immediate X-form

    stswi RS,RA,NB

    | 31    | RS    | RA    | NB    | 725   | / |
    0       6       11      16      21      31

    if RA = 0 then EA ← 0
    else           EA ← (RA)
    if NB = 0 then n ← 32
    else           n ← NB
    r ← RS - 1
    i ← 32
    do while n > 0
        if i = 32 then r ← r + 1 (mod 32)
        MEM(EA, 1) ← GPR(r)[i:i+7]
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n - 1

Let the effective address (EA) be (RA|0). Let n = NB if NB≠0, n = 32 if NB=0; n is the number of bytes to store. Let nr = CEIL(n/4); nr is the number of registers to supply data.

n consecutive bytes starting at EA are stored from GPRs RS through RS+nr-1. Data are stored from the low-order four bytes of each GPR.

Bytes are stored left to right from each register. The sequence of registers wraps around to GPR 0 if required.

Special Registers Altered:
    None

Store String Word Indexed X-form

    stswx RS,RA,RB

    | 31    | RS    | RA    | RB    | 661   | / |
    0       6       11      16      21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    n ← XER[57:63]
    r ← RS - 1
    i ← 32
    do while n > 0
        if i = 32 then r ← r + 1 (mod 32)
        MEM(EA, 1) ← GPR(r)[i:i+7]
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n - 1

Let the effective address (EA) be the sum (RA|0) + (RB). Let n = XER[57:63]; n is the number of bytes to store. Let nr = CEIL(n/4); nr is the number of registers to supply data.

If n>0, n consecutive bytes starting at EA are stored from GPRs RS through RS+nr-1. Data are stored from the low-order four bytes of each GPR.

Bytes are stored left to right from each register. The sequence of registers wraps around to GPR 0 if required.

If n=0, no bytes are stored.

Special Registers Altered:
    None

3.3.7 Other Fixed-Point Instructions

The remainder of the fixed-point instructions use the contents of the General Purpose Registers (GPRs) as source operands, and place results into GPRs, into the Fixed-Point Exception Register (XER), and into Condition Register fields. In addition, the Trap instructions test the contents of a GPR or XER bit, invoking the system trap handler if the result of the specified test is true.

These instructions treat the source operands as signed integers unless the instruction is explicitly identified as performing an unsigned operation.

The X-form and XO-form instructions with Rc=1, and the D-form instructions addic., andi., and andis., set the first three bits of CR Field 0 to characterize the result placed into the target register. In 64-bit mode, these bits are set by signed comparison of the result to zero. In 32-bit mode, these bits are set by signed comparison of the low-order 32 bits of the result to zero.

Unless otherwise noted and when appropriate, when CR Field 0 and the XER are set they reflect the value placed into the target register.

Programming Note
    Instructions with the OE bit set or that set CA may execute slowly or may prevent the execution of subsequent instructions until the instruction has completed.

3.3.8 Fixed-Point Arithmetic Instructions

The XO-form Arithmetic instructions with Rc=1, and the D-form Arithmetic instruction addic., set the first three bits of CR Field 0 as described in Section 3.3.7, "Other Fixed-Point Instructions".

addic, addic., subfic, addc, subfc, adde, subfe, addme, subfme, addze, and subfze always set CA, to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32-bit mode. The XO-form Arithmetic instructions set SO and OV when OE=1 to reflect overflow of the result. Except for the Multiply Low and Divide instructions, the setting of these bits is mode-dependent, and reflects overflow of the 64-bit result in 64-bit mode and overflow of the low-order 32-bit result in 32-bit mode. For XO-form Multiply Low and Divide instructions, the setting of these bits is mode-independent, and reflects overflow of the 64-bit result for mulld, divd, and divdu, and overflow of the low-order 32-bit result for mullw, divw, and divwu.

Programming Note
    Notice that CR Field 0 may not reflect the "true" (infinitely precise) result if overflow occurs.

Extended mnemonics for addition and subtraction

Several extended mnemonics are provided that use the Add Immediate and Add Immediate Shifted instructions to load an immediate value or an address into a target register. Some of these are shown as examples with the two instructions.

The Power ISA supplies Subtract From instructions, which subtract the second operand from the third. A set of extended mnemonics is provided that use the more "normal" order, in which the third operand is subtracted from the second, with the third operand being either an immediate field or a register. Some of these are shown as examples with the appropriate Add and Subtract From instructions.

See Appendix D for additional extended mnemonics.

Add Immediate D-form

    addi RT,RA,SI

    | 14    | RT    | RA    | SI    |
    0       6       11      16     31

    if RA = 0 then RT ← EXTS(SI)
    else           RT ← (RA) + EXTS(SI)

The sum (RA|0) + SI is placed into register RT.

Special Registers Altered:
    None

Extended Mnemonics:

Examples of extended mnemonics for Add Immediate:

    Extended:            Equivalent to:
    li   Rx,value        addi Rx,0,value
    la   Rx,disp(Ry)     addi Rx,Ry,disp
    subi Rx,Ry,value     addi Rx,Ry,-value

Add Immediate Shifted D-form

    addis RT,RA,SI

    | 15    | RT    | RA    | SI    |
    0       6       11      16     31

    if RA = 0 then RT ← EXTS(SI || ¹⁶0)
    else           RT ← (RA) + EXTS(SI || ¹⁶0)

The sum (RA|0) + (SI || 0x0000) is placed into register RT.

Special Registers Altered:
    None

Extended Mnemonics:

Examples of extended mnemonics for Add Immediate Shifted:

    Extended:            Equivalent to:
    lis   Rx,value       addis Rx,0,value
    subis Rx,Ry,value    addis Rx,Ry,-value

Programming Note
    addi, addis, add, and subf are the preferred instructions for addition and subtraction, because they set few status bits.

    Notice that addi and addis use the value 0, not the contents of GPR 0, if RA=0.
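The RA=0 rule in the note above, and the way li and lis fall out of it, can be checked with a small Python model. This is a hedged sketch rather than a definitive implementation: registers are a list of 32 integers holding 64-bit values, SI is given as a 16-bit field value or a small signed integer, and the helper names exts16, addi, and addis are hypothetical.

```python
MASK64 = (1 << 64) - 1


def exts16(si: int) -> int:
    """Sign-extend a 16-bit field to a Python integer (models EXTS(SI))."""
    si &= 0xFFFF
    return si - 0x1_0000 if si & 0x8000 else si


def addi(gpr: list, rt: int, ra: int, si: int) -> None:
    """Model of addi RT,RA,SI: uses the value 0, not GPR 0, when RA=0."""
    base = 0 if ra == 0 else gpr[ra]
    gpr[rt] = (base + exts16(si)) & MASK64


def addis(gpr: list, rt: int, ra: int, si: int) -> None:
    """Model of addis RT,RA,SI: SI is shifted left 16 bits, then sign-extended."""
    base = 0 if ra == 0 else gpr[ra]
    gpr[rt] = (base + (exts16(si) << 16)) & MASK64
```

In this model, li Rx,value corresponds to addi(gpr, x, 0, value) and lis Rx,value to addis(gpr, x, 0, value); both ignore whatever GPR 0 currently holds.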
Add XO-form

    add     RT,RA,RB    (OE=0 Rc=0)
    add.    RT,RA,RB    (OE=0 Rc=1)
    addo    RT,RA,RB    (OE=1 Rc=0)
    addo.   RT,RA,RB    (OE=1 Rc=1)

    | 31    | RT    | RA    | RB    | OE | 266   | Rc |
    0       6       11      16      21   22      31

    RT ← (RA) + (RB)

The sum (RA) + (RB) is placed into register RT.

Special Registers Altered:
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Subtract From XO-form

    subf    RT,RA,RB    (OE=0 Rc=0)
    subf.   RT,RA,RB    (OE=0 Rc=1)
    subfo   RT,RA,RB    (OE=1 Rc=0)
    subfo.  RT,RA,RB    (OE=1 Rc=1)

    | 31    | RT    | RA    | RB    | OE | 40    | Rc |
    0       6       11      16      21   22      31

    RT ← ¬(RA) + (RB) + 1

The sum ¬(RA) + (RB) + 1 is placed into register RT.

Special Registers Altered:
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Extended Mnemonics:

Example of extended mnemonics for Subtract From:

    Extended:            Equivalent to:
    sub Rx,Ry,Rz         subf Rx,Rz,Ry

Add Immediate Carrying D-form

    addic RT,RA,SI

    | 12    | RT    | RA    | SI    |
    0       6       11      16     31

    RT ← (RA) + EXTS(SI)

The sum (RA) + SI is placed into register RT.

Special Registers Altered:
    CA

Extended Mnemonics:

Example of extended mnemonics for Add Immediate Carrying:

    Extended:            Equivalent to:
    subic Rx,Ry,value    addic Rx,Ry,-value

Add Immediate Carrying and Record D-form

    addic. RT,RA,SI

    | 13    | RT    | RA    | SI    |
    0       6       11      16     31

    RT ← (RA) + EXTS(SI)

The sum (RA) + SI is placed into register RT.

Special Registers Altered:
    CR0 CA

Extended Mnemonics:

Example of extended mnemonics for Add Immediate Carrying and Record:

    Extended:            Equivalent to:
    subic. Rx,Ry,value   addic. Rx,Ry,-value

Subtract From Immediate Carrying D-form

    subfic RT,RA,SI

    | 8     | RT    | RA    | SI    |
    0       6       11      16     31

    RT ← ¬(RA) + EXTS(SI) + 1

The sum ¬(RA) + SI + 1 is placed into register RT.

Special Registers Altered:
    CA

Add Carrying XO-form

    addc    RT,RA,RB    (OE=0 Rc=0)
    addc.   RT,RA,RB    (OE=0 Rc=1)
    addco   RT,RA,RB    (OE=1 Rc=0)
    addco.  RT,RA,RB    (OE=1 Rc=1)

    | 31    | RT    | RA    | RB    | OE | 10    | Rc |
    0       6       11      16      21   22      31

    RT ← (RA) + (RB)

The sum (RA) + (RB) is placed into register RT.

Special Registers Altered:
    CA
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Subtract From Carrying XO-form

    subfc   RT,RA,RB    (OE=0 Rc=0)
    subfc.  RT,RA,RB    (OE=0 Rc=1)
    subfco  RT,RA,RB    (OE=1 Rc=0)
    subfco. RT,RA,RB    (OE=1 Rc=1)

    | 31    | RT    | RA    | RB    | OE | 8     | Rc |
    0       6       11      16      21   22      31

    RT ← ¬(RA) + (RB) + 1

The sum ¬(RA) + (RB) + 1 is placed into register RT.

Special Registers Altered:
    CA
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Extended Mnemonics:

Example of extended mnemonics for Subtract From Carrying:

    Extended:            Equivalent to:
    subc Rx,Ry,Rz        subfc Rx,Rz,Ry

Add Extended XO-form

    adde    RT,RA,RB    (OE=0 Rc=0)
    adde.   RT,RA,RB    (OE=0 Rc=1)
    addeo   RT,RA,RB    (OE=1 Rc=0)
    addeo.  RT,RA,RB    (OE=1 Rc=1)

    | 31    | RT    | RA    | RB    | OE | 138   | Rc |
    0       6       11      16      21   22      31

    RT ← (RA) + (RB) + CA

The sum (RA) + (RB) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Subtract From Extended XO-form

    subfe   RT,RA,RB    (OE=0 Rc=0)
    subfe.  RT,RA,RB    (OE=0 Rc=1)
    subfeo  RT,RA,RB    (OE=1 Rc=0)
    subfeo. RT,RA,RB    (OE=1 Rc=1)

    | 31    | RT    | RA    | RB    | OE | 136   | Rc |
    0       6       11      16      21   22      31

    RT ← ¬(RA) + (RB) + CA

The sum ¬(RA) + (RB) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Add to Minus One Extended XO-form

    addme   RT,RA    (OE=0 Rc=0)
    addme.  RT,RA    (OE=0 Rc=1)
    addmeo  RT,RA    (OE=1 Rc=0)
    addmeo. RT,RA    (OE=1 Rc=1)

    | 31    | RT    | RA    | ///   | OE | 234   | Rc |
    0       6       11      16      21   22      31

    RT ← (RA) + CA - 1

The sum (RA) + CA + ⁶⁴1 is placed into register RT.

Special Registers Altered:
    CA
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Subtract From Minus One Extended XO-form

    subfme   RT,RA    (OE=0 Rc=0)
    subfme.  RT,RA    (OE=0 Rc=1)
    subfmeo  RT,RA    (OE=1 Rc=0)
    subfmeo. RT,RA    (OE=1 Rc=1)

    | 31    | RT    | RA    | ///   | OE | 232   | Rc |
    0       6       11      16      21   22      31

    RT ← ¬(RA) + CA - 1

The sum ¬(RA) + CA + ⁶⁴1 is placed into register RT.

Special Registers Altered:
    CA
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Add to Zero Extended XO-form

    addze   RT,RA    (OE=0 Rc=0)
    addze.  RT,RA    (OE=0 Rc=1)
    addzeo  RT,RA    (OE=1 Rc=0)
    addzeo. RT,RA    (OE=1 Rc=1)

    | 31    | RT    | RA    | ///   | OE | 202   | Rc |
    0       6       11      16      21   22      31

    RT ← (RA) + CA

The sum (RA) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Subtract From Zero Extended XO-form

    subfze   RT,RA    (OE=0 Rc=0)
    subfze.  RT,RA    (OE=0 Rc=1)
    subfzeo  RT,RA    (OE=1 Rc=0)
    subfzeo. RT,RA    (OE=1 Rc=1)

    | 31    | RT    | RA    | ///   | OE | 200   | Rc |
    0       6       11      16      21   22      31

    RT ← ¬(RA) + CA

The sum ¬(RA) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Programming Note
    The setting of CA by the Add and Subtract From instructions, including the Extended versions thereof, is mode-dependent. If a sequence of these instructions is used to perform extended-precision addition or subtraction, the same mode should be used throughout the sequence.

Negate XO-form

    neg     RT,RA    (OE=0 Rc=0)
    neg.    RT,RA    (OE=0 Rc=1)
    nego    RT,RA    (OE=1 Rc=0)
    nego.   RT,RA    (OE=1 Rc=1)

    | 31    | RT    | RA    | ///   | OE | 104   | Rc |
    0       6       11      16      21   22      31

    RT ← ¬(RA) + 1

The sum ¬(RA) + 1 is placed into register RT.

If the processor is in 64-bit mode and register RA contains the most negative 64-bit number (0x8000_0000_0000_0000), the result is the most negative number and, if OE=1, OV is set to 1. Similarly, if the processor is in 32-bit mode and (RA)[32:63] contain the most negative 32-bit number (0x8000_0000), the low-order 32 bits of the result contain the most negative 32-bit number and, if OE=1, OV is set to 1.

Special Registers Altered:
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Multiply Low Immediate D-form

    mulli RT,RA,SI

    | 7     | RT    | RA    | SI    |
    0       6       11      16     31

    prod[0:127] ← (RA) × EXTS(SI)
    RT ← prod[64:127]

The 64-bit first operand is (RA). The 64-bit second operand is the sign-extended value of the SI field. The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    None

Multiply High Word XO-form

    mulhw   RT,RA,RB    (Rc=0)
    mulhw.  RT,RA,RB    (Rc=1)

    | 31    | RT    | RA    | RB    | /  | 75    | Rc |
    0       6       11      16      21   22      31

    prod[0:63] ← (RA)[32:63] × (RB)[32:63]
    RT[32:63] ← prod[0:31]
    RT[0:31] ← undefined

The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT[32:63]. The contents of RT[0:31] are undefined.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode)    (if Rc=1)

Multiply Low Word XO-form

    mullw   RT,RA,RB    (OE=0 Rc=0)
    mullw.  RT,RA,RB    (OE=0 Rc=1)
    mullwo  RT,RA,RB    (OE=1 Rc=0)
    mullwo. RT,RA,RB    (OE=1 Rc=1)

    | 31    | RT    | RA    | RB    | OE | 235   | Rc |
    0       6       11      16      21   22      31

    RT ← (RA)[32:63] × (RB)[32:63]

The 32-bit operands are the low-order 32 bits of RA and of RB. The 64-bit product of the operands is placed into register RT.

If OE=1 then OV is set to 1 if the product cannot be represented in 32 bits.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    CR0        (if Rc=1)
    SO OV      (if OE=1)

Multiply High Word Unsigned XO-form

    mulhwu  RT,RA,RB    (Rc=0)
    mulhwu. RT,RA,RB    (Rc=1)

    | 31    | RT    | RA    | RB    | /  | 11    | Rc |
    0       6       11      16      21   22      31

    prod[0:63] ← (RA)[32:63] × (RB)[32:63]
    RT[32:63] ← prod[0:31]
    RT[0:31] ← undefined

The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT[32:63]. The contents of RT[0:31] are undefined.

Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode)    (if Rc=1)

Programming Note
    For mulli and mullw, the low-order 32 bits of the product are the correct 32-bit product for 32-bit mode.

    For mulli and mulld, the low-order 64 bits of the product are independent of whether the operands are regarded as signed or unsigned 64-bit integers.
For mulli and mullw, the low-order 32 bits of the product are independent of whether the operands are regarded as signed or unsigned 32-bit integers. Chapter 3. Fixed-Point Processor 63 Version 2.04 Divide Word XO-form Divide Word Unsigned XO-form divw RT,RA,RB (OE=0 Rc=0) divwu RT,RA,RB (OE=0 Rc=0) divw. RT,RA,RB (OE=0 Rc=1) divwu. RT,RA,RB (OE=0 Rc=1) divwo RT,RA,RB (OE=1 Rc=0) divwuo RT,RA,RB (OE=1 Rc=0) divwo. RT,RA,RB (OE=1 Rc=1) divwuo. RT,RA,RB (OE=1 Rc=1) 31 RT RA RB OE 491 Rc 31 RT RA RB OE 459 Rc 0 6 11 16 21 22 31 0 6 11 16 21 22 31 dividend0:63 1 EXTS((RA)32:63) dividend0:63 1 320 || (RA)32:63 divisor0:63 1 EXTS((RB)32:63) divisor0:63 1 320 || (RB)32:63 RT32:63 1 dividend ÷ divisor RT32:63 1 dividend ÷ divisor RT0:31 1 undefined RT0:31 1 undefined The 64-bit dividend is the sign-extended value of The 64-bit dividend is the zero-extended value of (RA)32:63. The 64-bit divisor is the sign-extended value (RA)32:63. The 64-bit divisor is the zero-extended value of (RB)32:63. The 64-bit quotient is formed. The of (RB)32:63. The 64-bit quotient is formed. The low-order 32 bits of the 64-bit quotient are placed into low-order 32 bits of the 64-bit quotient are placed into RT32:63. The contents of RT0:31 are undefined. The RT32:63. The contents of RT0:31 are undefined. The remainder is not supplied as a result. remainder is not supplied as a result. Both operands and the quotient are interpreted as Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed inte- unsigned integers, except that if Rc=1 the first three ger that satisfies bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned inte- dividend = (quotient × divisor) + r ger that satisfies where 0 r < |divisor| if the dividend is nonnegative, dividend = (quotient × divisor) + r and -|divisor| < r 0 if the dividend is negative. where 0 r < divisor. 
If an attempt is made to perform any of the divisions If an attempt is made to perform the division 0x8000_0000 ÷ -1 ÷ 0 ÷ 0 then the contents of register RT are undefined as are then the contents of register RT are undefined as are (if (if Rc=1) the contents of the LT, GT, and EQ bits of CR Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1. Field 0. In this case, if OE=1 then OV is set to 1. Special Registers Altered: Special Registers Altered: CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1) CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1) SO OV (if OE=1) SO OV (if OE=1) Programming Note Programming Note The 32-bit signed remainder of dividing (RA)32:63 The 32-bit unsigned remainder of dividing (RA)32:63 by (RB)32:63 can be computed as follows, except in by (RB)32:63 can be computed as follows. the case that (RA)32:63 = -231 and (RB)32:63 = -1. divwu RT,RA,RB # RT = quotient divw RT,RA,RB # RT = quotient mullw RT,RT,RB # RT = quotient×divisor mullw RT,RT,RB # RT = quotient×divisor subf RT,RT,RA # RT = remainder subf RT,RT,RA # RT = remainder 64 Power ISATM -- Book I Version 2.04 3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit] Multiply Low Doubleword XO-form Multiply High Doubleword XO-form mulld RT,RA,RB (OE=0 Rc=0) mulhd RT,RA,RB (Rc=0) mulld. RT,RA,RB (OE=0 Rc=1) mulhd. RT,RA,RB (Rc=1) mulldo RT,RA,RB (OE=1 Rc=0) mulldo. RT,RA,RB (OE=1 Rc=1) 31 RT RA RB / 73 Rc 0 6 11 16 21 22 31 31 RT RA RB OE 233 Rc 0 6 11 16 21 22 31 prod0:127 1 (RA) × (RB) RT 1 prod0:63 prod0:127 1 (RA) × (RB) The 64-bit operands are (RA) and (RB). The RT 1 prod64:127 high-order 64 bits of the 128-bit product of the oper- The 64-bit operands are (RA) and (RB). The low-order ands are placed into register RT. 64 bits of the 128-bit product of the operands are Both operands and the product are interpreted as placed into register RT. signed integers. 
If OE=1 then OV is set to 1 if the product cannot be rep- Special Registers Altered: resented in 64 bits. CR0 (if Rc=1) Both operands and the product are interpreted as signed integers. Special Registers Altered: CR0 (if Rc=1) SO OV (if OE=1) Programming Note The XO-form Multiply instructions may execute faster on some implementations if RB contains the operand having the smaller absolute value. Multiply High Doubleword Unsigned XO-form mulhdu RT,RA,RB (Rc=0) mulhdu. RT,RA,RB (Rc=1) 31 RT RA RB / 9 Rc 0 6 11 16 21 22 31 prod0:127 1 (RA) × (RB) RT 1 prod0:63 The 64-bit operands are (RA) and (RB). The high-order 64 bits of the 128-bit product of the oper- ands are placed into register RT. Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. Special Registers Altered: CR0 (if Rc=1) Chapter 3. Fixed-Point Processor 65 Version 2.04 Divide Doubleword XO-form Divide Doubleword Unsigned XO-form divd RT,RA,RB (OE=0 Rc=0) divdu RT,RA,RB (OE=0 Rc=0) divd. RT,RA,RB (OE=0 Rc=1) divdu. RT,RA,RB (OE=0 Rc=1) divdo RT,RA,RB (OE=1 Rc=0) divduo RT,RA,RB (OE=1 Rc=0) divdo. RT,RA,RB (OE=1 Rc=1) divduo. RT,RA,RB (OE=1 Rc=1) 31 RT RA RB OE 489 Rc 31 RT RA RB OE 457 Rc 0 6 11 16 21 22 31 0 6 11 16 21 22 31 dividend0:63 1 (RA) dividend0:63 1 (RA) divisor0:63 1 (RB) divisor0:63 1 (RB) RT 1 dividend ÷ divisor RT 1 dividend ÷ divisor The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit quotient of the dividend and divisor is placed The 64-bit quotient of the dividend and divisor is placed into register RT. The remainder is not supplied as a into register RT. The remainder is not supplied as a result. result. Both operands and the quotient are interpreted as Both operands and the quotient are interpreted as signed integers. 
The quotient is the unique signed inte- unsigned integers, except that if Rc=1 the first three ger that satisfies bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned inte- dividend = (quotient × divisor) + r ger that satisfies where 0 r < |divisor| if the dividend is nonnegative, dividend = (quotient × divisor) + r and -|divisor| < r 0 if the dividend is negative. where 0 r < divisor. If an attempt is made to perform any of the divisions If an attempt is made to perform the division 0x8000_0000_0000_0000 ÷ -1 ÷ 0 ÷ 0 then the contents of register RT are undefined as are (if then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1. Field 0. In this case, if OE=1 then OV is set to 1. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) SO OV (if OE=1) SO OV (if OE=1) Programming Note Programming Note The 64-bit signed remainder of dividing (RA) by The 64-bit unsigned remainder of dividing (RA) by (RB) can be computed as follows, except in the (RB) can be computed as follows. case that (RA) = -263 and (RB) = -1. divdu RT,RA,RB # RT = quotient divd RT,RA,RB # RT = quotient mulld RT,RT,RB # RT = quotient×divisor mulld RT,RT,RB # RT = quotient×divisor subf RT,RT,RA # RT = remainder subf RT,RT,RA # RT = remainder 66 Power ISATM -- Book I Version 2.04 3.3.9 Fixed-Point Compare Instructions The fixed-point Compare instructions compare the con- two to 0. XERSO is copied to bit 3 of the designated CR tents of register RA with (1) the sign-extended value of field. the SI field, (2) the zero-extended value of the UI field, The CR field is set as follows or (3) the contents of register RB. The comparison is signed for cmpi and cmp, and unsigned for cmpli and . cmpl. 
Bit Name Description 0 LT (RA) < SI or (RB) (signed comparison) The L field controls whether the operands are treated (RA) SI or (RB) (signed comparison) L Operand length (RA) >u UI or (RB) (unsigned comparison) 0 32-bit operands 2 EQ (RA) = SI, UI, or (RB) 1 64-bit operands 3 SO Summary Overflow from the XER L=1 is part of Category: 64-Bit. Extended mnemonics for compares When the operands are treated as 32-bit signed quanti- A set of extended mnemonics is provided so that com- ties, bit 32 of the register (RA or RB) is the sign bit. pares can be coded with the operand length as part of The Compare instructions set one bit in the leftmost the mnemonic rather than as a numeric operand. Some three bits of the designated CR field to 1, and the other of these are shown as examples with the Compare instructions. See Appendix D for additional extended mnemonics. Compare Immediate D-form Compare X-form cmpi BF,L,RA,SI cmp BF,L,RA,RB 11 BF / L RA SI 31 BF / L RA RB 0 / 0 6 9 10 11 16 31 0 6 9 10 11 16 21 31 if L = 0 then a 1 EXTS((RA)32:63) if L = 0 then a 1 EXTS((RA)32:63) else a 1 (RA) b 1 EXTS((RB)32:63) if a < EXTS(SI) then c 1 0b100 else a 1 (RA) else if a > EXTS(SI) then c 1 0b010 b 1 (RB) else c 1 0b001 if a < b then c 1 0b100 CR4×BF+32:4×BF+35 1 c || XERSO else if a > b then c 1 0b010 else c 1 0b001 The contents of register RA ((RA)32:63 sign-extended to CR4×BF+32:4×BF+35 1 c || XERSO 64 bits if L=0) are compared with the sign-extended value of the SI field, treating the operands as signed The contents of register RA ((RA)32:63 if L=0) are com- integers. The result of the comparison is placed into CR pared with the contents of register RB ((RB)32:63 if field BF. L=0), treating the operands as signed integers. The result of the comparison is placed into CR field BF. 
Special Registers Altered: CR field BF Special Registers Altered: CR field BF Extended Mnemonics: Extended Mnemonics: Examples of extended mnemonics for Compare Imme- diate: Examples of extended mnemonics for Compare: Extended: Equivalent to: Extended: Equivalent to: cmpdi Rx,value cmpi 0,1,Rx,value cmpd Rx,Ry cmp 0,1,Rx,Ry cmpwi cr3,Rx,value cmpi 3,0,Rx,value cmpw cr3,Rx,Ry cmp 3,0,Rx,Ry Chapter 3. Fixed-Point Processor 67 Version 2.04 Compare Logical Immediate D-form Compare Logical X-form cmpli BF,L,RA,UI cmpl BF,L,RA,RB 10 BF / L RA UI 31 BF / L RA RB 32 / 0 6 9 10 11 16 31 0 6 9 10 11 16 21 31 if L = 0 then a 1 320 || (RA)32:63 if L = 0 then a 1 320 || (RA)32:63 else a 1 (RA) b 1 320 || (RB)32:63 if a u (480 || UI) then c 1 0b010 b 1 (RB) else c 1 0b001 if a u b then c 1 0b010 else c 1 0b001 The contents of register RA ((RA)32:63 zero-extended to CR4×BF+32:4×BF+35 1 c || XERSO 64 bits if L=0) are compared with 480 || UI, treating the operands as unsigned integers. The result of the com- The contents of register RA ((RA)32:63 if L=0) are com- parison is placed into CR field BF. pared with the contents of register RB ((RB)32:63 if L=0), treating the operands as unsigned integers. The Special Registers Altered: result of the comparison is placed into CR field BF. CR field BF Special Registers Altered: Extended Mnemonics: CR field BF Examples of extended mnemonics for Compare Logical Extended Mnemonics: Immediate: Examples of extended mnemonics for Compare Logi- Extended: Equivalent to: cal: cmpldi Rx,value cmpli 0,1,Rx,value cmplwi cr3,Rx,value cmpli 3,0,Rx,value Extended: Equivalent to: cmpld Rx,Ry cmpl 0,1,Rx,Ry cmplw cr3,Rx,Ry cmpl 3,0,Rx,Ry 68 Power ISATM -- Book I Version 2.04 3.3.10 Fixed-Point Trap Instructions The Trap instructions are provided to test for a specified TO Bit ANDed with Condition set of conditions. 
If any of the conditions tested by a 0 Less Than, using signed comparison Trap instruction are met, the system trap handler is 1 Greater Than, using signed comparison invoked. If none of the tested conditions are met, 2 Equal instruction execution continues normally. 3 Less Than, using unsigned comparison 4 Greater Than, using unsigned comparison The contents of register RA are compared with either the sign-extended value of the SI field or the contents of register RB, depending on the Trap instruction. For Extended mnemonics for traps tdi and td, the entire contents of RA (and RB) partici- A set of extended mnemonics is provided so that traps pate in the comparison; for twi and tw, only the con- can be coded with the condition as part of the mne- tents of the low-order 32 bits of RA (and RB) participate monic rather than as a numeric operand. Some of in the comparison. these are shown as examples with the Trap instruc- This comparison results in five conditions which are tions. See Appendix D for additional extended mne- ANDed with TO. If the result is not 0 the system trap monics. handler is invoked. These conditions are as follows. Trap Word Immediate D-form Trap Word X-form twi TO,RA,SI tw TO,RA,RB 3 TO RA SI 31 TO RA RB 4 / 0 6 11 16 31 0 6 11 16 21 31 a 1 EXTS((RA)32:63) a 1 EXTS((RA)32:63) if (a < EXTS(SI)) & TO0 then TRAP b 1 EXTS((RB)32:63) if (a > EXTS(SI)) & TO1 then TRAP if (a < b) & TO0 then TRAP if (a = EXTS(SI)) & TO2 then TRAP if (a > b) & TO1 then TRAP if (a u EXTS(SI)) & TO4 then TRAP if (a u b) & TO4 then TRAP The contents of RA32:63 are compared with the sign-extended value of the SI field. If any bit in the TO The contents of RA32:63 are compared with the con- field is set to 1 and its corresponding condition is met tents of RB32:63. If any bit in the TO field is set to 1 and by the result of the comparison, the system trap han- its corresponding condition is met by the result of the dler is invoked. 
comparison, the system trap handler is invoked. If the trap conditions are met, this instruction is context If the trap conditions are met, this instruction is context synchronizing (see Book III). synchronizing (see Book III). Special Registers Altered: Special Registers Altered: None None Extended Mnemonics: Extended Mnemonics: Examples of extended mnemonics for Trap Word Examples of extended mnemonics for Trap Word: Immediate: Extended: Equivalent to: Extended: Equivalent to: tweq Rx,Ry tw 4,Rx,Ry twgti Rx,value twi 8,Rx,value twlge Rx,Ry tw 5,Rx,Ry twllei Rx,value twi 6,Rx,value trap tw 31,0,0 Chapter 3. Fixed-Point Processor 69 Version 2.04 3.3.10.1 64-bit Fixed-Point Trap Instructions [Category: 64-Bit] Trap Doubleword Immediate D-form Trap Doubleword X-form tdi TO,RA,SI td TO,RA,RB 2 TO RA SI 31 TO RA RB 68 / 0 6 11 16 31 0 6 11 16 21 31 a 1 (RA) a 1 (RA) if (a < EXTS(SI)) & TO0 then TRAP b 1 (RB) if (a > EXTS(SI)) & TO1 then TRAP if (a < b) & TO0 then TRAP if (a = EXTS(SI)) & TO2 then TRAP if (a > b) & TO1 then TRAP if (a u EXTS(SI)) & TO4 then TRAP if (a u b) & TO4 then TRAP The contents of register RA are compared with the sign-extended value of the SI field. If any bit in the TO The contents of register RA are compared with the con- field is set to 1 and its corresponding condition is met tents of register RB. If any bit in the TO field is set to 1 by the result of the comparison, the system trap han- and its corresponding condition is met by the result of dler is invoked. the comparison, the system trap handler is invoked. If the trap conditions are met, this instruction is context If the trap conditions are met, this instruction is context synchronizing (see Book III). synchronizing (see Book III). 
Special Registers Altered (tdi and td):
    None

Extended Mnemonics:

Examples of extended mnemonics for Trap Doubleword Immediate:

    Extended:            Equivalent to:
    tdlti Rx,value       tdi 16,Rx,value
    tdnei Rx,value       tdi 24,Rx,value

Examples of extended mnemonics for Trap Doubleword:

    Extended:            Equivalent to:
    tdge  Rx,Ry          td 12,Rx,Ry
    tdlnl Rx,Ry          td 5,Rx,Ry

3.3.11 Fixed-Point Select [Category: Base.Phased-In]

Integer Select A-form

    isel RT,RA,RB,BC

    Encoding: 31 | RT | RA | RB | BC | 15 | / (fields start at bits 0, 6, 11, 16, 21, 26, 31)

    if RA=0 then a ← 0 else a ← (RA)
    if CR_{BC+32}=1 then RT ← a
    else RT ← (RB)

If the contents of bit BC+32 of the Condition Register are equal to 1, then the contents of register RA (or 0) are placed into register RT. Otherwise, the contents of register RB are placed into register RT.

Special Registers Altered:
    None

3.3.12 Fixed-Point Logical Instructions

The Logical instructions perform bit-parallel operations on 64-bit operands.

The X-form Logical instructions with Rc=1, and the D-form Logical instructions andi. and andis., set the first three bits of CR Field 0 as described in Section 3.3.7, "Other Fixed-Point Instructions" on page 57. The Logical instructions do not change the SO, OV, and CA bits in the XER.

Extended mnemonics for logical operations

An extended mnemonic is provided that generates the preferred form of "no-op" (an instruction that does nothing). This is shown as an example with the OR Immediate instruction.

Extended mnemonics are provided that use the OR and NOR instructions to copy the contents of one register to another, with and without complementing. These are shown as examples with the two instructions.

See Appendix D, "Assembler Extended Mnemonics" on page 317 for additional extended mnemonics.

AND Immediate D-form

    andi. RA,RS,UI

    Encoding: 28 | RS | RA | UI (fields start at bits 0, 6, 11, 16; the instruction ends at bit 31)

    RA ← (RS) & (⁴⁸0 || UI)

The contents of register RS are ANDed with ⁴⁸0 || UI and the result is placed into register RA.

Special Registers Altered:
    CR0

AND Immediate Shifted D-form

    andis. RA,RS,UI

    Encoding: 29 | RS | RA | UI (fields start at bits 0, 6, 11, 16; the instruction ends at bit 31)

    RA ← (RS) & (³²0 || UI || ¹⁶0)

The contents of register RS are ANDed with ³²0 || UI || ¹⁶0 and the result is placed into register RA.

Special Registers Altered:
    CR0

OR Immediate D-form

    ori RA,RS,UI

    Encoding: 24 | RS | RA | UI (fields start at bits 0, 6, 11, 16; the instruction ends at bit 31)

    RA ← (RS) | (⁴⁸0 || UI)

The contents of register RS are ORed with ⁴⁸0 || UI and the result is placed into register RA.

The preferred "no-op" (an instruction that does nothing) is:

    ori 0,0,0

Special Registers Altered:
    None

Extended Mnemonics:

Example of extended mnemonics for OR Immediate:

    Extended:            Equivalent to:
    nop                  ori 0,0,0

OR Immediate Shifted D-form

    oris RA,RS,UI

    Encoding: 25 | RS | RA | UI (fields start at bits 0, 6, 11, 16; the instruction ends at bit 31)

    RA ← (RS) | (³²0 || UI || ¹⁶0)

The contents of register RS are ORed with ³²0 || UI || ¹⁶0 and the result is placed into register RA.

Special Registers Altered:
    None

XOR Immediate D-form

    xori RA,RS,UI

    Encoding: 26 | RS | RA | UI (fields start at bits 0, 6, 11, 16; the instruction ends at bit 31)

    RA ← (RS) XOR (⁴⁸0 || UI)

The contents of register RS are XORed with ⁴⁸0 || UI and the result is placed into register RA.

Special Registers Altered:
    None

XOR Immediate Shifted D-form

    xoris RA,RS,UI

    Encoding: 27 | RS | RA | UI (fields start at bits 0, 6, 11, 16; the instruction ends at bit 31)

    RA ← (RS) XOR (³²0 || UI || ¹⁶0)

The contents of register RS are XORed with ³²0 || UI || ¹⁶0 and the result is placed into register RA.

Special Registers Altered:
    None

AND X-form

    and  RA,RS,RB (Rc=0)
    and. RA,RS,RB (Rc=1)

    Encoding: 31 | RS | RA | RB | 28 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    RA ← (RS) & (RB)

The contents of register RS are ANDed with the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

OR X-form

    or  RA,RS,RB (Rc=0)
    or. RA,RS,RB (Rc=1)

    Encoding: 31 | RS | RA | RB | 444 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    RA ← (RS) | (RB)

The contents of register RS are ORed with the contents of register RB and the result is placed into register RA.

For implementations that support the PPR (see Section 3.2.3), or Rx,Rx,Rx can be used to set PPR_PRI as shown in Figure 43. or. Rx,Rx,Rx does not set PPR_PRI.

    Rx    PPR_PRI    Priority
    1     010        low
    6     011        medium low
    2     100        medium (normal)

    Figure 43. Priority levels for or Rx,Rx,Rx

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for OR:

    Extended:            Equivalent to:
    mr Rx,Ry             or Rx,Ry,Ry

Programming Note
Warning: Other forms of or Rx,Rx,Rx that are not described in Figure 43 may also cause program priority to change. Use of these forms should be avoided except when software explicitly intends to alter program priority. If a no-op is needed, the preferred no-op (ori 0,0,0) should be used.

XOR X-form

    xor  RA,RS,RB (Rc=0)
    xor. RA,RS,RB (Rc=1)

    Encoding: 31 | RS | RA | RB | 316 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    RA ← (RS) ⊕ (RB)

The contents of register RS are XORed with the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

NAND X-form

    nand  RA,RS,RB (Rc=0)
    nand. RA,RS,RB (Rc=1)

    Encoding: 31 | RS | RA | RB | 476 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    RA ← ¬((RS) & (RB))

The contents of register RS are ANDed with the contents of register RB and the complemented result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Programming Note
nand or nor with RS=RB can be used to obtain the one's complement.

NOR X-form

    nor  RA,RS,RB (Rc=0)
    nor. RA,RS,RB (Rc=1)

    Encoding: 31 | RS | RA | RB | 124 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    RA ← ¬((RS) | (RB))

The contents of register RS are ORed with the contents of register RB and the complemented result is placed into register RA.

Equivalent X-form

    eqv  RA,RS,RB (Rc=0)
    eqv. RA,RS,RB (Rc=1)

    Encoding: 31 | RS | RA | RB | 284 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    RA ← (RS) ≡ (RB)

The contents of register RS are XORed with the contents of register RB and the complemented result is placed into register RA.
Special Registers Altered (nor, nor., eqv, and eqv.):
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for NOR:

    Extended:            Equivalent to:
    not Rx,Ry            nor Rx,Ry,Ry

AND with Complement X-form

    andc  RA,RS,RB (Rc=0)
    andc. RA,RS,RB (Rc=1)

    Encoding: 31 | RS | RA | RB | 60 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    RA ← (RS) & ¬(RB)

The contents of register RS are ANDed with the complement of the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

OR with Complement X-form

    orc  RA,RS,RB (Rc=0)
    orc. RA,RS,RB (Rc=1)

    Encoding: 31 | RS | RA | RB | 412 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    RA ← (RS) | ¬(RB)

The contents of register RS are ORed with the complement of the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extend Sign Byte X-form

    extsb  RA,RS (Rc=0)
    extsb. RA,RS (Rc=1)

    Encoding: 31 | RS | RA | /// | 954 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    s ← (RS)₅₆
    RA₅₆:₆₃ ← (RS)₅₆:₆₃
    RA₀:₅₅ ← ⁵⁶s

(RS)₅₆:₆₃ are placed into RA₅₆:₆₃. Bit 56 of register RS is placed into RA₀:₅₅.

Special Registers Altered:
    CR0 (if Rc=1)

Extend Sign Halfword X-form

    extsh  RA,RS (Rc=0)
    extsh. RA,RS (Rc=1)

    Encoding: 31 | RS | RA | /// | 922 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    s ← (RS)₄₈
    RA₄₈:₆₃ ← (RS)₄₈:₆₃
    RA₀:₄₇ ← ⁴⁸s

(RS)₄₈:₆₃ are placed into RA₄₈:₆₃. Bit 48 of register RS is placed into RA₀:₄₇.

Special Registers Altered:
    CR0 (if Rc=1)

Count Leading Zeros Word X-form

    cntlzw  RA,RS (Rc=0)
    cntlzw. RA,RS (Rc=1)

    Encoding: 31 | RS | RA | /// | 26 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    n ← 32
    do while n < 64
       if (RS)ₙ = 1 then leave
       n ← n + 1
    RA ← n - 32

A count of the number of consecutive zero bits starting at bit 32 of register RS is placed into register RA. This number ranges from 0 to 32, inclusive.

If Rc=1, CR Field 0 is set to reflect the result.

Special Registers Altered:
    CR0 (if Rc=1)

Programming Note
For both Count Leading Zeros instructions, if Rc=1 then LT is set to 0 in CR Field 0.
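As a reading aid, the sign-extension and leading-zero pseudocode above can be mirrored in Python. This is a hedged sketch rather than a reference model: the function names `exts` and `cntlzw` are chosen here for illustration, registers are modeled as plain 64-bit integers, and bit n of a register (bit 0 being the most significant) is read as `(value >> (63 - n)) & 1`.

```python
MASK64 = (1 << 64) - 1

def exts(value, from_bits):
    """Sign-extend the low-order from_bits bits of value to 64 bits.

    from_bits=8 models extsb and from_bits=16 models extsh.
    """
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value >> (from_bits - 1):          # sign bit of the narrow field
        value |= MASK64 ^ low_mask        # replicate it into the upper bits
    return value

def cntlzw(rs):
    """Count consecutive zero bits of rs starting at bit 32 (result 0..32)."""
    n = 32
    while n < 64:
        if (rs >> (63 - n)) & 1:          # bit n, numbering bit 0 as the MSB
            break
        n += 1
    return n - 32
```

Note that `cntlzw` ignores the high-order 32 bits of its operand, matching the loop bounds in the pseudocode.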
3.3.12.1 64-bit Fixed-Point Logical Instructions [Category: 64-Bit]

Extend Sign Word X-form

    extsw  RA,RS (Rc=0)
    extsw. RA,RS (Rc=1)

    Encoding: 31 | RS | RA | /// | 986 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    s ← (RS)₃₂
    RA₃₂:₆₃ ← (RS)₃₂:₆₃
    RA₀:₃₁ ← ³²s

(RS)₃₂:₆₃ are placed into RA₃₂:₆₃. Bit 32 of register RS is placed into RA₀:₃₁.

Special Registers Altered:
    CR0 (if Rc=1)

Count Leading Zeros Doubleword X-form

    cntlzd  RA,RS (Rc=0)
    cntlzd. RA,RS (Rc=1)

    Encoding: 31 | RS | RA | /// | 58 | Rc (fields start at bits 0, 6, 11, 16, 21, 31)

    n ← 0
    do while n < 64
       if (RS)ₙ = 1 then leave
       n ← n + 1
    RA ← n

A count of the number of consecutive zero bits starting at bit 0 of register RS is placed into register RA. This number ranges from 0 to 64, inclusive.

If Rc=1, CR Field 0 is set to reflect the result.

Special Registers Altered:
    CR0 (if Rc=1)

3.3.12.2 Phased-In Fixed-Point Logical Instructions [Category: Base.Phased-In]

Population Count Bytes X-form

    popcntb RA,RS

    Encoding: 31 | RS | RA | /// | 122 | / (fields start at bits 0, 6, 11, 16, 21, 31)

    do i = 0 to 7
       n ← 0
       do j = 0 to 7
          if (RS)_{(i×8)+j} = 1 then
             n ← n+1
       RA_{(i×8):(i×8)+7} ← n

A count of the number of one bits in each byte of register RS is placed into the corresponding byte of register RA. This number ranges from 0 to 8, inclusive.

Special Registers Altered:
    None

Programming Note
The total number of one bits in register RS can be computed as follows. In this example it is assumed that register RB contains the value 0x0101_0101_0101_0101.

    popcntb RA,RS
    mulld   RT,RA,RB
    srdi    RT,RT,56    # RT = population count

3.3.13 Fixed-Point Rotate and Shift Instructions

The Fixed-Point Processor performs rotation operations on data from a GPR and returns the result, or a portion of the result, to a GPR.

The rotation operations rotate a 64-bit quantity left by a specified number of bit positions. Bits that exit from position 0 enter at position 63.

Two types of rotation operation are supported.

For the first type, denoted rotate₆₄ or ROTL₆₄, the value rotated is the given 64-bit value. The rotate₆₄ operation is used to rotate a given 64-bit quantity.

For the second type, denoted rotate₃₂ or ROTL₃₂, the value rotated consists of two copies of bits 32:63 of the given 64-bit value, one copy in bits 0:31 and the other in bits 32:63. The rotate₃₂ operation is used to rotate a given 32-bit quantity.

The Rotate and Shift instructions employ a mask generator. The mask is 64 bits long, and consists of 1-bits from a start bit, mstart, through and including a stop bit, mstop, and 0-bits elsewhere. The values of mstart and mstop range from 0 to 63. If mstart > mstop, the 1-bits wrap around from position 63 to position 0. Thus the mask is formed as follows:

    if mstart ≤ mstop then
       mask_{mstart:mstop} = ones
       mask_{all other bits} = zeros
    else
       mask_{mstart:63} = ones
       mask_{0:mstop} = ones
       mask_{all other bits} = zeros

There is no way to specify an all-zero mask.

For instructions that use the rotate₃₂ operation, the mask start and stop positions are always in the low-order 32 bits of the mask.

The use of the mask is described in following sections.

The Rotate and Shift instructions with Rc=1 set the first three bits of CR field 0 as described in Section 3.3.7, "Other Fixed-Point Instructions" on page 57. Rotate and Shift instructions do not change the OV and SO bits. Rotate and Shift instructions, except algebraic right shifts, do not change the CA bit.

Extended mnemonics for rotates and shifts

The Rotate and Shift instructions, while powerful, can be complicated to code (they have up to five operands). A set of extended mnemonics is provided that allow simpler coding of often-used functions such as clearing the leftmost or rightmost bits of a register, left justifying or right justifying an arbitrary field, and performing simple rotates and shifts. Some of these are shown as examples with the Rotate instructions. See Appendix D, "Assembler Extended Mnemonics" on page 317 for additional extended mnemonics.

3.3.13.1 Fixed-Point Rotate Instructions

These instructions rotate the contents of a register. The result of the rotation is

• inserted into the target register under control of a mask (if a mask bit is 1 the associated bit of the rotated data is placed into the target register, and if the mask bit is 0 the associated bit in the target register remains unchanged); or
• ANDed with a mask before being placed into the target register.

The Rotate Left instructions allow right-rotation of the contents of a register to be performed (in concept) by a left-rotation of 64-n, where n is the number of bits by which to rotate right. They allow right-rotation of the contents of the low-order 32 bits of a register to be performed (in concept) by a left-rotation of 32-n, where n is the number of bits by which to rotate right.

Rotate Left Word Immediate then AND with Mask M-form

    rlwinm  RA,RS,SH,MB,ME (Rc=0)
    rlwinm. RA,RS,SH,MB,ME (Rc=1)

    Encoding: 21 | RS | RA | SH | MB | ME | Rc (fields start at bits 0, 6, 11, 16, 21, 26, 31)

    n ← SH
    r ← ROTL₃₂((RS)₃₂:₆₃, n)
    m ← MASK(MB+32, ME+32)
    RA ← r & m

The contents of register RS are rotated₃₂ left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Word Immediate then AND with Mask:

    Extended:            Equivalent to:
    extlwi Rx,Ry,n,b     rlwinm Rx,Ry,b,0,n-1
    srwi   Rx,Ry,n       rlwinm Rx,Ry,32-n,n,31
    clrrwi Rx,Ry,n       rlwinm Rx,Ry,0,0,31-n

Programming Note
Let RS_L represent the low-order 32 bits of register RS, with the bits numbered from 0 through 31.

rlwinm can be used to extract an n-bit field that starts at bit position b in RS_L, right-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting SH=b+n, MB=32-n, and ME=31. It can be used to extract an n-bit field that starts at bit position b in RS_L, left-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting SH=b, MB=0, and ME=n-1. It can be used to rotate the contents of the low-order 32 bits of a register left (right) by n bits, by setting SH=n (32-n), MB=0, and ME=31. It can be used to shift the contents of the low-order 32 bits of a register right by n bits, by setting SH=32-n, MB=n, and ME=31. It can be used to clear the high-order b bits of the low-order 32 bits of the contents of a register and then shift the result left by n bits, by setting SH=n, MB=b-n, and ME=31-n. It can be used to clear the low-order n bits of the low-order 32 bits of a register, by setting SH=0, MB=0, and ME=31-n.

For all the uses given above, the high-order 32 bits of register RA are cleared.

Extended mnemonics are provided for all of these uses; see Appendix D, "Assembler Extended Mnemonics" on page 317.

Rotate Left Word then AND with Mask M-form

    rlwnm  RA,RS,RB,MB,ME (Rc=0)
    rlwnm. RA,RS,RB,MB,ME (Rc=1)

    Encoding: 23 | RS | RA | RB | MB | ME | Rc (fields start at bits 0, 6, 11, 16, 21, 26, 31)

    n ← (RB)₅₉:₆₃
    r ← ROTL₃₂((RS)₃₂:₆₃, n)
    m ← MASK(MB+32, ME+32)
    RA ← r & m

The contents of register RS are rotated₃₂ left the number of bits specified by (RB)₅₉:₆₃. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Word then AND with Mask:

    Extended:            Equivalent to:
    rotlw Rx,Ry,Rz       rlwnm Rx,Ry,Rz,0,31

Programming Note
Let RS_L represent the low-order 32 bits of register RS, with the bits numbered from 0 through 31.

rlwnm can be used to extract an n-bit field that starts at variable bit position b in RS_L, right-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting RB₅₉:₆₃=b+n, MB=32-n, and ME=31. It can be used to extract an n-bit field that starts at variable bit position b in RS_L, left-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting RB₅₉:₆₃=b, MB=0, and ME=n-1. It can be used to rotate the contents of the low-order 32 bits of a register left (right) by variable n bits, by setting RB₅₉:₆₃=n (32-n), MB=0, and ME=31.

For all the uses given above, the high-order 32 bits of register RA are cleared.

Extended mnemonics are provided for some of these uses; see Appendix D, "Assembler Extended Mnemonics" on page 317.

Rotate Left Word Immediate then Mask Insert M-form

    rlwimi  RA,RS,SH,MB,ME (Rc=0)
    rlwimi. RA,RS,SH,MB,ME (Rc=1)

    Encoding: 20 | RS | RA | SH | MB | ME | Rc (fields start at bits 0, 6, 11, 16, 21, 26, 31)

    n ← SH
    r ← ROTL₃₂((RS)₃₂:₆₃, n)
    m ← MASK(MB+32, ME+32)
    RA ← r&m | (RA)&¬m

The contents of register RS are rotated₃₂ left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are inserted into register RA under control of the generated mask.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Word Immediate then Mask Insert:

    Extended:            Equivalent to:
    inslwi Rx,Ry,n,b     rlwimi Rx,Ry,32-b,b,b+n-1

Programming Note
Let RA_L represent the low-order 32 bits of register RA, with the bits numbered from 0 through 31.

rlwimi can be used to insert an n-bit field that is left-justified in the low-order 32 bits of register RS, into RA_L starting at bit position b, by setting SH=32-b, MB=b, and ME=(b+n)-1. It can be used to insert an n-bit field that is right-justified in the low-order 32 bits of register RS, into RA_L starting at bit position b, by setting SH=32-(b+n), MB=b, and ME=(b+n)-1.

Extended mnemonics are provided for both of these uses; see Appendix D, "Assembler Extended Mnemonics" on page 317.

3.3.13.1.1 64-bit Fixed-Point Rotate Instructions [Category: 64-Bit]

Rotate Left Doubleword Immediate then Clear Left MD-form

    rldicl  RA,RS,SH,MB (Rc=0)
    rldicl. RA,RS,SH,MB (Rc=1)

    Encoding: 30 | RS | RA | sh | mb | 0 | sh | Rc (fields start at bits 0, 6, 11, 16, 21, 27, 30, 31)

    n ← sh₅ || sh₀:₄
    r ← ROTL₆₄((RS), n)
    b ← mb₅ || mb₀:₄
    m ← MASK(b, 63)
    RA ← r & m

The contents of register RS are rotated₆₄ left SH bits. A mask is generated having 1-bits from bit MB through bit 63 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Left:

    Extended:            Equivalent to:
    extrdi Rx,Ry,n,b     rldicl Rx,Ry,b+n,64-n
    srdi   Rx,Ry,n       rldicl Rx,Ry,64-n,n
    clrldi Rx,Ry,n       rldicl Rx,Ry,0,n

Programming Note
rldicl can be used to extract an n-bit field that starts at bit position b in register RS, right-justified into register RA (clearing the remaining 64-n bits of RA), by setting SH=b+n and MB=64-n. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64-n) and MB=0. It can be used to shift the contents of a register right by n bits, by setting SH=64-n and MB=n. It can be used to clear the high-order n bits of a register, by setting SH=0 and MB=n.

Extended mnemonics are provided for all of these uses; see Appendix D, "Assembler Extended Mnemonics" on page 317.

Rotate Left Doubleword Immediate then Clear Right MD-form

    rldicr  RA,RS,SH,ME (Rc=0)
    rldicr. RA,RS,SH,ME (Rc=1)

    Encoding: 30 | RS | RA | sh | me | 1 | sh | Rc (fields start at bits 0, 6, 11, 16, 21, 27, 30, 31)

    n ← sh₅ || sh₀:₄
    r ← ROTL₆₄((RS), n)
    e ← me₅ || me₀:₄
    m ← MASK(0, e)
    RA ← r & m

The contents of register RS are rotated₆₄ left SH bits. A mask is generated having 1-bits from bit 0 through bit ME and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Right:

    Extended:            Equivalent to:
    extldi Rx,Ry,n,b     rldicr Rx,Ry,b,n-1
    sldi   Rx,Ry,n       rldicr Rx,Ry,n,63-n
    clrrdi Rx,Ry,n       rldicr Rx,Ry,0,63-n

Programming Note
rldicr can be used to extract an n-bit field that starts at bit position b in register RS, left-justified into register RA (clearing the remaining 64-n bits of RA), by setting SH=b and ME=n-1. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64-n) and ME=63. It can be used to shift the contents of a register left by n bits, by setting SH=n and ME=63-n. It can be used to clear the low-order n bits of a register, by setting SH=0 and ME=63-n.

Extended mnemonics are provided for all of these uses (some devolve to rldicl); see Appendix D, "Assembler Extended Mnemonics" on page 317.
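The mask generator and the rotate-then-clear operations used by rldicl and rldicr can be modeled compactly in Python, which makes it easy to check the operand settings given in the programming notes. The sketch below is illustrative only: the helper names `mask`, `rotl64`, `rldicl`, and `rldicr` are hypothetical Python functions operating on integers, not an assembler interface, and the shift amounts are assumed to be in the range 0 to 63.

```python
MASK64 = (1 << 64) - 1

def mask(mstart, mstop):
    """64-bit mask with 1-bits from mstart through mstop (bit 0 is the MSB).

    If mstart > mstop the 1-bits wrap around from bit 63 to bit 0.
    """
    ones_from_start = MASK64 >> mstart                 # bits mstart..63
    ones_to_stop = MASK64 ^ (MASK64 >> (mstop + 1))    # bits 0..mstop
    if mstart <= mstop:
        return ones_from_start & ones_to_stop
    return ones_from_start | ones_to_stop

def rotl64(value, n):
    """Rotate a 64-bit value left by n bits (0 <= n <= 63)."""
    value &= MASK64
    return ((value << n) | (value >> (64 - n))) & MASK64

def rldicl(rs, sh, mb):
    """Rotate left by sh, then keep only bits mb..63 (clear left)."""
    return rotl64(rs, sh) & mask(mb, 63)

def rldicr(rs, sh, me):
    """Rotate left by sh, then keep only bits 0..me (clear right)."""
    return rotl64(rs, sh) & mask(0, me)

x = 0x0123456789ABCDEF
# extrdi with n=8, b=8: extract the byte at bits 8:15, right-justified.
field = rldicl(x, 8 + 8, 64 - 8)
# srdi by 4 and sldi by 8, expressed through the rotate instructions.
shifted_right = rldicl(x, 64 - 4, 4)
shifted_left = rldicr(x, 8, 63 - 8)
```

With the value above, `field` is the second most significant byte of `x`, and the two shift results agree with ordinary logical shifts truncated to 64 bits.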
80 Power ISATM -- Book I Version 2.04 Rotate Left Doubleword Immediate then Rotate Left Doubleword then Clear Left Clear MD-form MDS-form rldic RA,RS,SH,MB (Rc=0) rldcl RA,RS,RB,MB (Rc=0) rldic. RA,RS,SH,MB (Rc=1) rldcl. RA,RS,RB,MB (Rc=1) 30 RS RA sh mb 2 sh Rc 30 RS RA RB mb 8 Rc 0 6 11 16 21 27 30 31 0 6 11 16 21 27 31 n 1 sh5 || sh0:4 n 1 (RB)58:63 r 1 ROTL64((RS), n) r 1 ROTL64((RS), n) b 1 mb5 || mb0:4 b 1 mb5 || mb0:4 m 1 MASK(b, ¬n) m 1 MASK(b, 63) RA 1 r & m RA 1 r & m The contents of register RS are rotated64 left SH bits. The contents of register RS are rotated64 left the num- A mask is generated having 1-bits from bit MB through ber of bits specified by (RB)58:63. A mask is generated bit 63-SH and 0-bits elsewhere. The rotated data are having 1-bits from bit MB through bit 63 and 0-bits else- ANDed with the generated mask and the result is where. The rotated data are ANDed with the generated placed into register RA. mask and the result is placed into register RA. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) Extended Mnemonics: Extended Mnemonics: Example of extended mnemonics for Rotate Left Dou- Example of extended mnemonics for Rotate Left Dou- bleword Immediate then Clear: bleword then Clear Left: Extended: Equivalent to: Extended: Equivalent to: clrlsldi Rx,Ry,b,n rldic Rx,Ry,n,b-n rotld Rx,Ry,Rz rldcl Rx,Ry,Rz,0 Programming Note Programming Note rldic can be used to clear the high-order b bits of rldcl can be used to extract an n-bit field that starts the contents of a register and then shift the result at variable bit position b in register RS, right-justi- left by n bits, by setting SH=n and MB=b-n. It can fied into register RA (clearing the remaining 64-n be used to clear the high-order n bits of a register, bits of RA), by setting RB58:63=b+n and MB=64-n. by setting SH=0 and MB=n. 
It can be used to rotate the contents of a register left (right) by variable n bits, by setting RB58:63=n Extended mnemonics are provided for both of (64-n) and MB=0. these uses (the second devolves to rldicl); see Appendix D, "Assembler Extended Mnemonics" on Extended mnemonics are provided for some of page 317. these uses; see Appendix D, "Assembler Extended Mnemonics" on page 317. Chapter 3. Fixed-Point Processor 81 Version 2.04 Rotate Left Doubleword then Clear Right Rotate Left Doubleword Immediate then MDS-form Mask Insert MD-form rldcr RA,RS,RB,ME (Rc=0) rldimi RA,RS,SH,MB (Rc=0) rldcr. RA,RS,RB,ME (Rc=1) rldimi. RA,RS,SH,MB (Rc=1) 30 RS RA RB me 9 Rc 30 RS RA sh mb 3 sh Rc 0 6 11 16 21 27 31 0 6 11 16 21 27 30 31 n 1 (RB)58:63 n 1 sh5 || sh0:4 r 1 ROTL64((RS), n) r 1 ROTL64((RS), n) e 1 me5 || me0:4 b 1 mb5 || mb0:4 m 1 MASK(0, e) m 1 MASK(b, ¬n) RA 1 r & m RA 1 r&m | (RA)&¬m The contents of register RS are rotated64 left the num- The contents of register RS are rotated64 left SH bits. ber of bits specified by (RB)58:63. A mask is generated A mask is generated having 1-bits from bit MB through having 1-bits from bit 0 through bit ME and 0-bits else- bit 63-SH and 0-bits elsewhere. The rotated data are where. The rotated data are ANDed with the generated inserted into register RA under control of the generated mask and the result is placed into register RA. mask. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) Extended Mnemonics: Programming Note rldcr can be used to extract an n-bit field that starts Example of extended mnemonics for Rotate Left Dou- at variable bit position b in register RS, left-justified bleword Immediate then Mask Insert: into register RA (clearing the remaining 64-n bits of RA), by setting RB58:63=b and ME=n-1. It can Extended: Equivalent to: be used to rotate the contents of a register left insrdi Rx,Ry,n,b rldimi Rx,Ry,64-(b+n),b (right) by variable n bits, by setting RB58:63=n (64-n) and ME=63. 
Programming Note rldimi can be used to insert an n-bit field that is Extended mnemonics are provided for some of right-justified in register RS, into register RA start- these uses (some devolve to rldcl); see ing at bit position b, by setting SH=64-(b+n) and Appendix D, "Assembler Extended Mnemonics" on MB=b. page 317. An extended mnemonic is provided for this use; see Appendix D, "Assembler Extended Mnemon- ics" on page 317. 82 Power ISATM -- Book I Version 2.04 3.3.13.2 Fixed-Point Shift Instructions The instructions in this section perform left and right Programming Note shifts. Any Shift Right Algebraic instruction, followed by addze, can be used to divide quickly by 2n. The Extended mnemonics for shifts setting of the CA bit by the Shift Right Algebraic Immediate-form logical (unsigned) shift operations are instructions is independent of mode. obtained by specifying appropriate masks and shift val- ues for certain Rotate instructions. A set of extended Programming Note mnemonics is provided to make coding of such shifts simpler and easier to understand. Some of these are Multiple-precision shifts can be programmed as shown as examples with the Rotate instructions. See shown in Section E.1, "Multiple-Precision Shifts" on Appendix D, "Assembler Extended Mnemonics" on page 331. page 317 for additional extended mnemonics. Shift Left Word X-form Shift Right Word X-form slw RA,RS,RB (Rc=0) srw RA,RS,RB (Rc=0) slw. RA,RS,RB (Rc=1) srw. RA,RS,RB (Rc=1) 31 RS RA RB 24 Rc 31 RS RA RB 536 Rc 0 6 11 16 21 31 0 6 11 16 21 31 n 1 (RB)59:63 n 1 (RB)59:63 r 1 ROTL32((RS)32:63, n) r 1 ROTL32((RS)32:63, 64-n) if (RB)58 = 0 then if (RB)58 = 0 then m 1 MASK(32, 63-n) m 1 MASK(n+32, 63) else m 1 640 else m 1 640 RA 1 r & m RA 1 r & m The contents of the low-order 32 bits of register RS are The contents of the low-order 32 bits of register RS are shifted left the number of bits specified by (RB)58:63. shifted right the number of bits specified by (RB)58:63. 
Bits shifted out of position 32 are lost. Zeros are sup- Bits shifted out of position 63 are lost. Zeros are sup- plied to the vacated positions on the right. The 32-bit plied to the vacated positions on the left. The 32-bit result is placed into RA32:63. RA0:31 are set to zero. result is placed into RA32:63. RA0:31 are set to zero. Shift amounts from 32 to 63 give a zero result. Shift amounts from 32 to 63 give a zero result. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) Chapter 3. Fixed-Point Processor 83 Version 2.04 Shift Right Algebraic Word Immediate Shift Right Algebraic Word X-form X-form sraw RA,RS,RB (Rc=0) srawi RA,RS,SH (Rc=0) sraw. RA,RS,RB (Rc=1) srawi. RA,RS,SH (Rc=1) 31 RS RA RB 792 Rc 31 RS RA SH 824 Rc 0 6 11 16 21 31 0 6 11 16 21 31 n 1 (RB)59:63 n 1 SH r 1 ROTL32((RS)32:63, 64-n) r 1 ROTL32((RS)32:63, 64-n) if (RB)58 = 0 then m 1 MASK(n+32, 63) m 1 MASK(n+32, 63) s 1 (RS)32 else m 1 640 RA 1 r&m | (64s)&¬m s 1 (RS)32 CA 1 s & ((r&¬m)32:630) RA 1 r&m | (64s)&¬m CA 1 s & ((r&¬m)32:630) The contents of the low-order 32 bits of register RS are shifted right SH bits. Bits shifted out of position 63 are The contents of the low-order 32 bits of register RS are lost. Bit 32 of RS is replicated to fill the vacated posi- shifted right the number of bits specified by (RB)58:63. tions on the left. The 32-bit result is placed into Bits shifted out of position 63 are lost. Bit 32 of RS is RA32:63. Bit 32 of RS is replicated to fill RA0:31. CA is replicated to fill the vacated positions on the left. The set to 1 if the low-order 32 bits of (RS) contain a nega- 32-bit result is placed into RA32:63. Bit 32 of RS is repli- tive number and any 1-bits are shifted out of position cated to fill RA0:31. CA is set to 1 if the low-order 32 63; otherwise CA is set to 0. 
A shift amount of zero bits of (RS) contain a negative number and any 1-bits causes RA to receive EXTS((RS)32:63), and CA to be are shifted out of position 63; otherwise CA is set to 0. set to 0. A shift amount of zero causes RA to receive EXTS((RS)32:63), and CA to be set to 0. Shift amounts Special Registers Altered: from 32 to 63 give a result of 64 sign bits, and cause CA CA to receive the sign bit of (RS)32:63. CR0 (if Rc=1) Special Registers Altered: CA CR0 (if Rc=1) 84 Power ISATM -- Book I Version 2.04 3.3.13.2.1 64-bit Fixed-Point Shift Instructions [Category: 64-Bit] Shift Left Doubleword X-form Shift Right Doubleword X-form sld RA,RS,RB (Rc=0) srd RA,RS,RB (Rc=0) sld. RA,RS,RB (Rc=1) srd. RA,RS,RB (Rc=1) 31 RS RA RB 27 Rc 31 RS RA RB 539 Rc 0 6 11 16 21 31 0 6 11 16 21 31 n 1 (RB)58:63 n 1 (RB)58:63 r 1 ROTL64((RS), n) r 1 ROTL64((RS), 64-n) if (RB)57 = 0 then if (RB)57 = 0 then m 1 MASK(0, 63-n) m 1 MASK(n, 63) else m 1 640 else m 1 640 RA 1 r & m RA 1 r & m The contents of register RS are shifted left the number The contents of register RS are shifted right the num- of bits specified by (RB)57:63. Bits shifted out of posi- ber of bits specified by (RB)57:63. Bits shifted out of tion 0 are lost. Zeros are supplied to the vacated posi- position 63 are lost. Zeros are supplied to the vacated tions on the right. The result is placed into register RA. positions on the left. The result is placed into register Shift amounts from 64 to 127 give a zero result. RA. Shift amounts from 64 to 127 give a zero result. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) Shift Right Algebraic Doubleword Shift Right Algebraic Doubleword X-form Immediate XS-form srad RA,RS,RB (Rc=0) sradi RA,RS,SH (Rc=0) srad. RA,RS,RB (Rc=1) sradi. 
RA,RS,SH (Rc=1) 31 RS RA RB 794 Rc 31 RS RA sh 413 sh Rc 0 6 11 16 21 31 0 6 11 16 21 30 31 n 1 (RB)58:63 n 1 sh5 || sh0:4 r 1 ROTL64((RS), 64-n) r 1 ROTL64((RS), 64-n) if (RB)57 = 0 then m 1 MASK(n, 63) m 1 MASK(n, 63) s 1 (RS)0 else m 1 640 RA 1 r&m | (64s)&¬m s 1 (RS)0 CA 1 s & ((r&¬m)0) RA 1 r&m | (64s)&¬m CA 1 s & ((r&¬m)0) The contents of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 0 of RS is The contents of register RS are shifted right the num- replicated to fill the vacated positions on the left. The ber of bits specified by (RB)57:63. Bits shifted out of result is placed into register RA. CA is set to 1 if (RS) is position 63 are lost. Bit 0 of RS is replicated to fill the negative and any 1-bits are shifted out of position 63; vacated positions on the left. The result is placed into otherwise CA is set to 0. A shift amount of zero causes register RA. CA is set to 1 if (RS) is negative and any RA to be set equal to (RS), and CA to be set to 0. 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to be set equal Special Registers Altered: to (RS), and CA to be set to 0. Shift amounts from 64 CA to 127 give a result of 64 sign bits in RA, and cause CA CR0 (if Rc=1) to receive the sign bit of (RS). Special Registers Altered: CA CR0 (if Rc=1) Chapter 3. Fixed-Point Processor 85 Version 2.04 3.3.14 Move To/From System Register Instructions The Move To Condition Register Fields instruction has SPR name as part of the mnemonic rather than as a a preferred form; see Section 1.8.1, "Preferred Instruc- numeric operand. An extended mnemonic is provided tion Forms" on page 19. In the preferred form, the FXM for the mtcrf instruction for compatibility with old soft- field satisfies the following rule. ware (written for a version of the architecture that pre- 1 Exactly one bit of the FXM field is set to 1. cedes Version 2.00) that uses it to set the entire Condition Register. 
Some of these extended mnemon- Extended mnemonics ics are shown as examples with the relevant instruc- tions. See Appendix D, "Assembler Extended Extended mnemonics are provided for the mtspr and Mnemonics" on page 317 for additional extended mne- mfspr instructions so that they can be coded with the monics. 86 Power ISATM -- Book I Version 2.04 Move To Special Purpose Register Compiler and Assembler Note XFX-form For the mtspr and mfspr instructions, the SPR mtspr SPR,RS number coded in assembler language does not appear directly as a 10-bit binary number in the 31 RS spr 467 / instruction. The number coded is split into two 5-bit 0 6 11 21 31 halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16:20 of the instruction and the low-order 5 bits in bits 11:15. n 1 spr5:9 || spr0:4 if length(SPR(n)) = 64 then SPR(n) 1 (RS) else SPR(n) 1 (RS)32:63 The SPR field denotes a Special Purpose Register, encoded as shown in the table below. The contents of register RS are placed into the designated Special Pur- pose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR. SPR1 Register decimal spr5:9 spr0:4 Name 1 00000 00001 XER 8 00000 01000 LR 9 00000 01001 CTR 256 01000 00000 VRSAVE2 512 10000 00000 SPEFSCR3 896 11100 00000 PPR4 1 Note that the order of the two 5-bit halves of the SPR number is reversed. 2 Category: Embedded and Vector ( see Programming Note in Section 3.2.4). 3 Category: SPE. 4 Category: Server. If the SPR field contains any value other than one of the values shown above then one of the following occurs. 1 The system illegal instruction error handler is invoked. 1 The system privileged instruction error handler is invoked. 1 The results are boundedly undefined. A complete description of this instruction can be found in Book III. 
Special Registers Altered:
    See above

Extended Mnemonics:

Examples of extended mnemonics for Move To Special Purpose Register:

    Extended:     Equivalent to:
    mtxer Rx      mtspr 1,Rx
    mtlr Rx       mtspr 8,Rx
    mtctr Rx      mtspr 9,Rx

Move From Special Purpose Register XFX-form

mfspr RT,SPR

    Field:  31 | RT | spr | 339 | /
    Bit:     0 |  6 |  11 |  21 | 31

    n ← spr_5:9 || spr_0:4
    if length(SPR(n)) = 64 then
        RT ← SPR(n)
    else
        RT ← ³²0 || SPR(n)

The SPR field denotes a Special Purpose Register, encoded as shown in the table below. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero.

    decimal   SPR (1)             Register
              spr_5:9   spr_0:4   Name
    1         00000     00001     XER
    8         00000     01000     LR
    9         00000     01001     CTR
    256       01000     00000     VRSAVE (2)
    260       01000     00100     SPRG4 (3)
    261       01000     00101     SPRG5 (3)
    262       01000     00110     SPRG6 (3)
    263       01000     00111     SPRG7 (3)
    268       01000     01100     TB (4)
    269       01000     01101     TBU (4)
    512       10000     00000     SPEFSCR (5)
    526       10000     01110     ATB (4,6)
    527       10000     01111     ATBU (4,6)
    896       11100     00000     PPR (7)

    (1) Note that the order of the two 5-bit halves of the SPR number is reversed.
    (2) Category: Embedded and Vector (see Programming Note in Section 3.2.4).
    (3) Category: Embedded.
    (4) See Chapter 4 of Book II.
    (5) Category: SPE.
    (6) Category: Alternate Time Base.
    (7) Category: Server.

If the SPR field contains any value other than one of the values shown above then one of the following occurs.

• The system illegal instruction error handler is invoked.
• The system privileged instruction error handler is invoked.
• The results are boundedly undefined.

A complete description of this instruction can be found in Book III.

Special Registers Altered:
    None

Extended Mnemonics:

Examples of extended mnemonics for Move From Special Purpose Register:

    Extended:     Equivalent to:
    mfxer Rx      mfspr Rx,1
    mflr Rx       mfspr Rx,8
    mfctr Rx      mfspr Rx,9

Note

See the Notes that appear with mtspr.

Move To Condition Register Fields XFX-form

mtcrf FXM,RS

    Field:  31 | RS | 0  | FXM | /  | 144 | /
    Bit:     0 |  6 | 11 |  12 | 20 |  21 | 31

    mask ← ⁴(FXM_0) || ⁴(FXM_1) || ... ⁴(FXM_7)
    CR ← ((RS)_32:63 & mask) | (CR & ¬mask)

The contents of bits 32:63 of register RS are placed into the Condition Register under control of the field mask specified by FXM. The field mask identifies the 4-bit fields affected. Let i be an integer in the range 0-7. If FXM_i=1 then CR field i (CR bits 4×i+32:4×i+35) is set to the contents of the corresponding field of the low-order 32 bits of RS.

Special Registers Altered:
    CR fields selected by mask

Extended Mnemonics:

Example of extended mnemonics for Move To Condition Register Fields:

    Extended:     Equivalent to:
    mtcr Rx       mtcrf 0xFF,Rx

Programming Note

In the preferred form of this instruction (mtocrf), only one Condition Register field is updated.

Move From Condition Register XFX-form

mfcr RT

    Field:  31 | RT | 0  | /// | 19 | /
    Bit:     0 |  6 | 11 |  12 | 21 | 31

    RT ← ³²0 || CR

The contents of the Condition Register are placed into RT_32:63. RT_0:31 are set to 0.

Special Registers Altered:
    None

Move To One Condition Register Field XFX-form

mtocrf FXM,RS    [Category: Phased-In]

    Field:  31 | RS | 1  | FXM | /  | 144 | /
    Bit:     0 |  6 | 11 |  12 | 20 |  21 | 31

    count ← 0
    do i = 0 to 7
        if FXM_i = 1 then
            n ← i
            count ← count + 1
    if count = 1 then
        CR_{4×n+32:4×n+35} ← (RS)_{4×n+32:4×n+35}
    else
        CR ← undefined

If exactly one bit of the FXM field is set to 1, let n be the position of that bit in the field (0 ≤ n ≤ 7). The contents of bits 4×n+32:4×n+35 of register RS are placed into CR field n (CR bits 4×n+32:4×n+35). Otherwise, the contents of the Condition Register are undefined.

Special Registers Altered:
    CR field selected by FXM

Move From One Condition Register Field XFX-form

mfocrf RT,FXM    [Category: Phased-In]

    Field:  31 | RT | 1  | FXM | /  | 19 | /
    Bit:     0 |  6 | 11 |  12 | 20 | 21 | 31

    RT ← undefined
    count ← 0
    do i = 0 to 7
        if FXM_i = 1 then
            n ← i
            count ← count + 1
    if count = 1 then
        RT_{4×n+32:4×n+35} ← CR_{4×n+32:4×n+35}

If exactly one bit of the FXM field is set to 1, let n be the position of that bit in the field (0 ≤ n ≤ 7). The contents of CR field n (CR bits 4×n+32:4×n+35) are placed into bits 4×n+32:4×n+35 of register RT and the contents of the remaining bits of register RT are undefined. Otherwise, the contents of register RT are undefined.

Special Registers Altered:
    None

Programming Note

These forms of the mtcrf and mfcr instructions are intended to replace the old forms of the instructions (the forms shown in page 89), which will eventually be phased out of the architecture. The new forms are backward compatible with most processors that comply with versions of the architecture that precede Version 2.00. On those processors, the new forms are treated as the old forms.

However, on some processors that comply with versions of the architecture that precede Version 2.00 the new forms may be treated as follows:

    mtocrf: may cause the system illegal instruction error handler to be invoked
    mfocrf: may place an undefined value into register RT

3.3.14.1 Move To/From System Registers [Category: Embedded]

Move to Condition Register from XER X-form

mcrxr BF

    Field:  31 | BF | // | /// | /// | 512 | /
    Bit:     0 |  6 |  9 |  11 |  16 |  21 | 31

    CR_{4×BF+32:4×BF+35} ← XER_32:35
    XER_32:35 ← 0b0000

The contents of XER_32:35 are copied to Condition Register field BF. XER_32:35 are set to zero.

Special Registers Altered:
    CR field BF
    XER_32:35

Move From APID Indirect X-form

mfapidi RT,RA

    Field:  31 | RT | RA | /// | 275 | /
    Bit:     0 |  6 | 11 |  16 |  21 | 31

    RT ← implementation-dependent value based on (RA)

The contents of RA are provided to any auxiliary processors that may be present. A value, that is implementation-dependent, is placed in RT.

Special Registers Altered:
    None

Programming Note

This instruction is provided as a mechanism for software to query the presence and configuration of one or more auxiliary processors.
See the implementation for details on the behavior of this instruction.

Move To Device Control Register User-mode Indexed X-form

mtdcrux RS,RA

    Field:  31 | RS | RA | /// | 419 | /
    Bit:     0 |  6 | 11 |  16 |  21 | 31

    DCRN ← (RA)
    DCR(DCRN) ← RS

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of bits 32:63 of RS are placed into the Device Control Register.

See "Move To Device Control Register Indexed X-form" on page 526 in Book III for more information on this instruction.

Special Registers Altered:
    Implementation-dependent

Move From Device Control Register User-mode Indexed X-form

mfdcrux RT,RA

    Field:  31 | RT | RA | /// | 291 | /
    Bit:     0 |  6 | 11 |  16 |  21 | 31

    DCRN ← (RA)
    RT ← DCR(DCRN)

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of the designated Device Control Register are placed into RT. For 32-bit Device Control Registers, the contents of bits 32:63 of the designated Device Control Register are placed into RT.

See "Move From Device Control Register Indexed X-form" on page 527 in Book III for more information on this instruction.

Special Registers Altered:
    Implementation-dependent

Chapter 4. Floating-Point Processor [Category: Floating-Point]

    4.1 Floating-Point Processor Overview  93
    4.2 Floating-Point Processor Registers  94
        4.2.1 Floating-Point Registers  94
        4.2.2 Floating-Point Status and Control Register  95
    4.3 Floating-Point Data  97
        4.3.1 Data Format  97
        4.3.2 Value Representation  98
        4.3.3 Sign of Result  99
        4.3.4 Normalization and Denormalization  100
        4.3.5 Data Handling and Precision  100
            4.3.5.1 Single-Precision Operands  100
            4.3.5.2 Integer-Valued Operands  101
        4.3.6 Rounding  101
    4.4 Floating-Point Exceptions  102
        4.4.1 Invalid Operation Exception  104
            4.4.1.1 Definition  104
            4.4.1.2 Action  104
        4.4.2 Zero Divide Exception  105
            4.4.2.1 Definition  105
            4.4.2.2 Action  105
        4.4.3 Overflow Exception  105
            4.4.3.1 Definition  105
            4.4.3.2 Action  105
        4.4.4 Underflow Exception  106
            4.4.4.1 Definition  106
            4.4.4.2 Action  106
        4.4.5 Inexact Exception  107
            4.4.5.1 Definition  107
            4.4.5.2 Action  107
    4.5 Floating-Point Execution Models  107
        4.5.1 Execution Model for IEEE Operations  107
        4.5.2 Execution Model for Multiply-Add Type Instructions  109
    4.6 Floating-Point Processor Instructions  110
        4.6.1 Floating-Point Storage Access Instructions  111
            4.6.1.1 Storage Access Exceptions  111
        4.6.2 Floating-Point Load Instructions  111
        4.6.3 Floating-Point Store Instructions  114
        4.6.4 Floating-Point Move Instructions  118
        4.6.5 Floating-Point Arithmetic Instructions  119
            4.6.5.1 Floating-Point Elementary Arithmetic Instructions  119
            4.6.5.2 Floating-Point Multiply-Add Instructions  123
        4.6.6 Floating-Point Rounding and Conversion Instructions  125
            4.6.6.1 Floating-Point Rounding Instruction  125
            4.6.6.2 Floating-Point Convert To/From Integer Instructions  125
            4.6.6.3 Floating Round to Integer Instructions [Category: Floating-Point.Phased-In]  127
        4.6.7 Floating-Point Compare Instructions  129
        4.6.8 Floating-Point Select Instruction  130
        4.6.9 Floating-Point Status and Control Register Instructions  130

4.1 Floating-Point Processor Overview

This chapter describes the registers and instructions that make up the Floating-Point Processor facility.

The processor (augmented by appropriate software support, where required) implements a floating-point system compliant with the ANSI/IEEE Standard 754-1985, "IEEE Standard for Binary Floating-Point Arithmetic" (hereafter referred to as "the IEEE standard"). That standard defines certain required "operations" (addition, subtraction, etc.). Herein, the term "floating-point operation" is used to refer to one of these required operations and to additional operations defined (e.g., those performed by Multiply-Add or Reciprocal Estimate instructions). A Non-IEEE mode is also provided. This mode, which may produce results not in strict compliance with the IEEE standard, allows shorter latency.

Instructions are provided to perform arithmetic, rounding, conversion, comparison, and other operations in floating-point registers; to move floating-point data between storage and these registers; and to manipulate the Floating-Point Status and Control Register explicitly.

These instructions are divided into two categories.

• computational instructions

  The computational instructions are those that perform addition, subtraction, multiplication, division, extracting the square root, rounding, conversion, comparison, and combinations of these operations. These instructions provide the floating-point operations. They place status information into the Floating-Point Status and Control Register. They are the instructions described in Sections 4.6.5 through 4.6.7.

• non-computational instructions

  The non-computational instructions are those that perform loads and stores, move the contents of a floating-point register to another floating-point register possibly altering the sign, manipulate the Floating-Point Status and Control Register explicitly, and select the value from one of two floating-point registers based on the value in a third floating-point register. The operations performed by these instructions are not considered floating-point operations. With the exception of the instructions that manipulate the Floating-Point Status and Control Register explicitly, they do not alter the Floating-Point Status and Control Register. They are the instructions described in Sections 4.6.2 through 4.6.4, and 4.6.9.

A floating-point number consists of a signed exponent and a signed significand. The quantity expressed by this number is the product of the significand and the number 2^exponent. Encodings are provided in the data format to represent finite numeric values, ±Infinity, and values that are "Not a Number" (NaN). Operations involving infinities produce results obeying traditional mathematical conventions. NaNs have no mathematical interpretation. Their encoding permits a variable diagnostic information field. They may be used to indicate such things as uninitialized variables and can be produced by certain invalid operations.

There is one class of exceptional events that occur during instruction execution that is unique to the Floating-Point Processor: the Floating-Point Exception. Floating-point exceptions are signaled with bits set in the Floating-Point Status and Control Register (FPSCR). They can cause the system floating-point enabled exception error handler to be invoked, precisely or imprecisely, if the proper control bits are set.

Floating-Point Exceptions

The following floating-point exceptions are detected by the processor:

• Invalid Operation Exception (VX)
      SNaN (VXSNAN)
      Infinity-Infinity (VXISI)
      Infinity÷Infinity (VXIDI)
      Zero÷Zero (VXZDZ)
      Infinity×Zero (VXIMZ)
      Invalid Compare (VXVC)
      Software-Defined Condition (VXSOFT)
      Invalid Square Root (VXSQRT)
      Invalid Integer Convert (VXCVI)
• Zero Divide Exception (ZX)
• Overflow Exception (OX)
• Underflow Exception (UX)
• Inexact Exception (XX)

Each floating-point exception, and each category of Invalid Operation Exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a corresponding enable bit in the FPSCR. See Section 4.2.2, "Floating-Point Status and Control Register" on page 95 for a description of these exception and enable bits, and Section 4.4, "Floating-Point Exceptions" on page 102 for a detailed discussion of floating-point exceptions, including the effects of the enable bits.

4.2 Floating-Point Processor Registers

4.2.1 Floating-Point Registers

Implementations of this architecture provide 32 floating-point registers (FPRs). The floating-point instruction formats provide 5-bit fields for specifying the FPRs to be used in the execution of the instruction. The FPRs are numbered 0-31. See Figure 44 on page 95.

Each FPR contains 64 bits that support the floating-point double format. Every instruction that interprets the contents of an FPR as a floating-point value uses the floating-point double format for this interpretation.

The computational instructions, and the Move and Select instructions, operate on data located in FPRs and, with the exception of the Compare instructions, place the result value into an FPR and optionally (when Rc=1) place status information into the Condition Register. Instruction forms with Rc=1 are part of Category: Floating-Point.Record.
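Section 4.2.1 states that each FPR holds 64 bits interpreted in the floating-point double format. As a minimal sketch of what that means, the Python below models an FPR as a 64-bit integer image and splits it into sign, biased exponent, and fraction. The helper names are hypothetical, and the field widths (1, 11, and 52 bits) are those of the double format described in Section 4.3.1.

```python
import struct


def fpr_image(value):
    """Return the 64-bit integer image of a float in IEEE double format."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def split_double(image):
    """Split a 64-bit double-format image into (S, EXP, FRACTION).

    With the document's bit numbering, S is bit 0 (the most significant
    bit), EXP is bits 1:11, and FRACTION is bits 12:63.
    """
    sign = (image >> 63) & 0x1
    biased_exponent = (image >> 52) & 0x7FF
    fraction = image & ((1 << 52) - 1)
    return sign, biased_exponent, fraction


if __name__ == "__main__":
    for v in (1.0, -2.0, 0.1):
        print(v, hex(fpr_image(v)), split_double(fpr_image(v)))
```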
Load Double and Store Double instructions are provided that transfer 64 bits of data between storage and the FPRs with no conversion. Load Single instructions are provided to transfer and convert floating-point values in floating-point single format from storage to the same value in floating-point double format in the FPRs. Store Single instructions are provided to transfer and convert floating-point values in floating-point double format from the FPRs to the same value in floating-point single format in storage.

Instructions are provided that manipulate the Floating-Point Status and Control Register and the Condition Register explicitly. Some of these instructions copy data from an FPR to the Floating-Point Status and Control Register or vice versa.

The computational instructions and the Select instruction accept values from the FPRs in double format. For single-precision arithmetic instructions, all input values must be representable in single format; if they are not, the result placed into the target FPR, and the setting of status bits in the FPSCR and in the Condition Register (if Rc=1), are undefined.

Figure 44. Floating-Point Registers (FPR 0 through FPR 31, each with bits 0:63)

4.2.2 Floating-Point Status and Control Register

The Floating-Point Status and Control Register (FPSCR) controls the handling of floating-point exceptions and records status resulting from the floating-point operations. Bits 32:55 are status bits. Bits 56:63 are control bits.

The exception bits in the FPSCR (bits 35:44, 53:55) are sticky; that is, once set to 1 they remain set to 1 until they are set to 0 by an mcrfs, mtfsfi, mtfsf, or mtfsb0 instruction. The exception summary bits in the FPSCR (FX, FEX, and VX, which are bits 32:34) are not considered to be "exception bits", and only FX is sticky.

FEX and VX are simply the ORs of other FPSCR bits. Therefore these two bits are not listed among the FPSCR bits affected by the various instructions.

Figure 45. Floating-Point Status and Control Register (FPSCR, bits 32:63)

The bit definitions for the FPSCR are as follows.

Bit(s)  Description

32      Floating-Point Exception Summary (FX)
        Every floating-point instruction, except mtfsfi and mtfsf, implicitly sets FPSCR_FX to 1 if that instruction causes any of the floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter FPSCR_FX explicitly.

        Programming Note
        FPSCR_FX is defined not to be altered implicitly by mtfsfi and mtfsf because permitting these instructions to alter FPSCR_FX implicitly could cause a paradox. An example is an mtfsfi or mtfsf instruction that supplies 0 for FPSCR_FX and 1 for FPSCR_OX, and is executed when FPSCR_OX=0. See also the Programming Notes with the definition of these two instructions.

33      Floating-Point Enabled Exception Summary (FEX)
        This bit is the OR of all the floating-point exception bits masked by their respective enable bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR_FEX explicitly.

34      Floating-Point Invalid Operation Exception Summary (VX)
        This bit is the OR of all the Invalid Operation exception bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR_VX explicitly.

35      Floating-Point Overflow Exception (OX)
        See Section 4.4.3, "Overflow Exception" on page 105.

36      Floating-Point Underflow Exception (UX)
        See Section 4.4.4, "Underflow Exception" on page 106.

37      Floating-Point Zero Divide Exception (ZX)
        See Section 4.4.2, "Zero Divide Exception" on page 105.

38      Floating-Point Inexact Exception (XX)
        See Section 4.4.5, "Inexact Exception" on page 107.

        FPSCR_XX is a sticky version of FPSCR_FI (see below). Thus the following rules completely describe how FPSCR_XX is set by a given instruction.

        • If the instruction affects FPSCR_FI, the new value of FPSCR_XX is obtained by ORing the old value of FPSCR_XX with the new value of FPSCR_FI.
        • If the instruction does not affect FPSCR_FI, the value of FPSCR_XX is unchanged.

39      Floating-Point Invalid Operation Exception (SNaN) (VXSNAN)
        See Section 4.4.1, "Invalid Operation Exception" on page 104.

40      Floating-Point Invalid Operation Exception (∞ - ∞) (VXISI)
        See Section 4.4.1.

41      Floating-Point Invalid Operation Exception (∞ ÷ ∞) (VXIDI)
        See Section 4.4.1.

42      Floating-Point Invalid Operation Exception (0 ÷ 0) (VXZDZ)
        See Section 4.4.1.

43      Floating-Point Invalid Operation Exception (∞ × 0) (VXIMZ)
        See Section 4.4.1.

44      Floating-Point Invalid Operation Exception (Invalid Compare) (VXVC)
        See Section 4.4.1.

45      Floating-Point Fraction Rounded (FR)
        The last Arithmetic or Rounding and Conversion instruction incremented the fraction during rounding. See Section 4.3.6, "Rounding" on page 101. This bit is not sticky.

46      Floating-Point Fraction Inexact (FI)
        The last Arithmetic or Rounding and Conversion instruction either produced an inexact result during rounding or caused a disabled Overflow Exception. See Section 4.3.6. This bit is not sticky.

        See the definition of FPSCR_XX, above, regarding the relationship between FPSCR_FI and FPSCR_XX.

47:51   Floating-Point Result Flags (FPRF)
        Arithmetic, rounding, and Convert From Integer instructions set this field based on the result placed into the target register and on the target precision, except that if any portion of the result is undefined then the value placed into FPRF is undefined. Floating-point Compare instructions set this field based on the relative values of the operands being compared. For Convert To Integer instructions, the value placed into FPRF is undefined. Additional details are given below.

        Programming Note
        A single-precision operation that produces a denormalized result sets FPRF to indicate a denormalized number. When possible, single-precision denormalized numbers are represented in normalized double format in the target register.

47      Floating-Point Result Class Descriptor (C)
        Arithmetic, rounding, and Convert From Integer instructions may set this bit with the FPCC bits, to indicate the class of the result as shown in Figure 46 on page 97.

48:51   Floating-Point Condition Code (FPCC)
        Floating-point Compare instructions set one of the FPCC bits to 1 and the other three FPCC bits to 0. Arithmetic, rounding, and Convert From Integer instructions may set the FPCC bits with the C bit, to indicate the class of the result as shown in Figure 46 on page 97. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero.

        48  Floating-Point Less Than or Negative (FL or <)
        49  Floating-Point Greater Than or Positive (FG or >)
        50  Floating-Point Equal or Zero (FE or =)
        51  Floating-Point Unordered or NaN (FU or ?)

52      Reserved

53      Floating-Point Invalid Operation Exception (Software-Defined Condition) (VXSOFT)
        This bit can be altered only by mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1. See Section 4.4.1.

        Programming Note
        FPSCR_VXSOFT can be used by software to indicate the occurrence of an arbitrary, software-defined, condition that is to be treated as an Invalid Operation Exception. For example, the bit could be set by a program that computes a base 10 logarithm if the supplied input is negative.

54      Floating-Point Invalid Operation Exception (Invalid Square Root) (VXSQRT)
        See Section 4.4.1.

55      Floating-Point Invalid Operation Exception (Invalid Integer Convert) (VXCVI)
        See Section 4.4.1.
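The relationships among the status bits defined so far (VX and FEX as ORs of other bits, and XX as a sticky copy of FI) can be sketched in Python. This is an illustrative model under stated assumptions, not a definition of processor behavior: the helper names are hypothetical, the FPSCR is held as a 32-bit integer image using this document's numbering (bit 32 most significant, bit 63 least), and the enable bits VE, OE, UE, ZE, and XE are taken to be bits 56:60 as in the control-bit definitions of this section.

```python
def bit(n):
    """Mask for FPSCR bit n (32 <= n <= 63) in a 32-bit integer image."""
    return 1 << (63 - n)


def is_set(fpscr, n):
    return bool(fpscr & bit(n))


FEX, VX = 33, 34
OX, UX, ZX, XX = 35, 36, 37, 38
FI = 46
# VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI
VX_CAUSES = (39, 40, 41, 42, 43, 44, 53, 54, 55)
VE, OE, UE, ZE, XE = 56, 57, 58, 59, 60  # enable bits (control bits)


def with_summaries(fpscr):
    """Return fpscr with VX and FEX recomputed as ORs of other bits."""
    vx = any(is_set(fpscr, n) for n in VX_CAUSES)
    fex = (vx and is_set(fpscr, VE)) or any(
        is_set(fpscr, exc) and is_set(fpscr, enable)
        for exc, enable in ((OX, OE), (UX, UE), (ZX, ZE), (XX, XE))
    )
    fpscr &= ~(bit(VX) | bit(FEX)) & 0xFFFFFFFF
    if vx:
        fpscr |= bit(VX)
    if fex:
        fpscr |= bit(FEX)
    return fpscr


def update_xx(fpscr, new_fi):
    """Apply the FI/XX rule for an instruction that affects FI."""
    fpscr &= ~bit(FI) & 0xFFFFFFFF
    if new_fi:
        fpscr |= bit(FI) | bit(XX)  # XX = old XX OR new FI
    return fpscr


if __name__ == "__main__":
    # VXZDZ recorded while Invalid Operation exceptions are enabled.
    print(hex(with_summaries(bit(42) | bit(VE))))
```

Recomputing VX and FEX from the other bits, rather than storing them independently, mirrors the statement that these two summary bits cannot be altered explicitly.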
56      Floating-Point Invalid Operation Exception Enable (VE)
        See Section 4.4.1.

57      Floating-Point Overflow Exception Enable (OE)
        See Section 4.4.3, "Overflow Exception" on page 105.

58      Floating-Point Underflow Exception Enable (UE)
        See Section 4.4.4, "Underflow Exception" on page 106.

59      Floating-Point Zero Divide Exception Enable (ZE)
        See Section 4.4.2, "Zero Divide Exception" on page 105.

60      Floating-Point Inexact Exception Enable (XE)
        See Section 4.4.5, "Inexact Exception" on page 107.

61      Floating-Point Non-IEEE Mode (NI)
        Floating-point non-IEEE mode is optional. If floating-point non-IEEE mode is not implemented, this bit is treated as reserved, and the remainder of the definition of this bit does not apply.

        If floating-point non-IEEE mode is implemented, this bit has the following meaning.

        0   The processor is not in floating-point non-IEEE mode (i.e., all floating-point operations conform to the IEEE standard).
        1   The processor is in floating-point non-IEEE mode.

        When the processor is in floating-point non-IEEE mode, the remaining FPSCR bits may have meanings different from those given in this document, and floating-point operations need not conform to the IEEE standard. The effects of executing a given floating-point instruction with FPSCR_NI=1, and any additional requirements for using non-IEEE mode, are implementation-dependent. The results of executing a given instruction in non-IEEE mode may vary between implementations, and between different executions on the same implementation.

        Programming Note
        When the processor is in floating-point non-IEEE mode, the results of floating-point operations may be approximate, and performance for these operations may be better, more predictable, or less data-dependent than when the processor is not in non-IEEE mode. For example, in non-IEEE mode an implementation may return 0 instead of a denormalized number, and may return a large number instead of an infinity.

62:63   Floating-Point Rounding Control (RN)
        See Section 4.3.6, "Rounding" on page 101.

        00  Round to Nearest
        01  Round toward Zero
        10  Round toward +Infinity
        11  Round toward -Infinity

    Result Flags      Result Value Class
    C  <  >  =  ?
    1  0  0  0  1     Quiet NaN
    0  1  0  0  1     - Infinity
    0  1  0  0  0     - Normalized Number
    1  1  0  0  0     - Denormalized Number
    1  0  0  1  0     - Zero
    0  0  0  1  0     + Zero
    1  0  1  0  0     + Denormalized Number
    0  0  1  0  0     + Normalized Number
    0  0  1  0  1     + Infinity

Figure 46. Floating-Point Result Flags

4.3 Floating-Point Data

4.3.1 Data Format

This architecture defines the representation of a floating-point value in two different binary fixed-length formats. The format may be a 32-bit single format for a single-precision value or a 64-bit double format for a double-precision value. The single format may be used for data in storage. The double format may be used for data in storage and for data in floating-point registers.

The lengths of the exponent and the fraction fields differ between these two formats. The structure of the single and double formats is shown below.

    S  | EXP   | FRACTION
    32 | 33:40 | 41:63

Figure 47. Floating-point single format

    S  | EXP   | FRACTION
    0  | 1:11  | 12:63

Figure 48. Floating-point double format

Values in floating-point format are composed of three fields:

    S          sign bit
    EXP        exponent+bias
    FRACTION   fraction

Representation of numeric values in the floating-point formats consists of a sign bit (S), a biased exponent (EXP), and the fraction portion (FRACTION) of the significand. The significand consists of a leading implied bit concatenated on the right with the FRACTION. This leading implied bit is 1 for normalized numbers and 0 for denormalized numbers and is located in the unit bit position (i.e., the first bit to the left of the binary point). Values representable within the two floating-point formats can be specified by the parameters listed in Figure 49.

                        Format
                        Single    Double
    Exponent Bias       +127      +1023
    Maximum Exponent    +127      +1023
    Minimum Exponent    -126      -1022
    Widths (bits)
      Format            32        64
      Sign              1         1
      Exponent          8         11
      Fraction          23        52
      Significand       24        53

Figure 49. IEEE floating-point fields

The architecture requires that the FPRs of the Floating-Point Processor support the floating-point double format only.

4.3.2 Value Representation

This architecture defines numeric and non-numeric values representable within each of the two supported formats. The numeric values are approximations to the real numbers and include the normalized numbers, denormalized numbers, and zero values. The non-numeric values representable are the infinities and the Not a Numbers (NaNs). The infinities are adjoined to the real numbers, but are not numbers themselves, and the standard rules of arithmetic do not hold when they are used in an operation. They are related to the real numbers by order alone. It is possible however to define restricted operations among numbers and infinities as defined below. The relative location on the real number line for each of the defined entities is shown in Figure 50.

    -INF   -NOR   -DEN   -0   +0   +DEN   +NOR   +INF

Figure 50. Approximation to real numbers

The NaNs are not related to the numeric values or infinities by order or value but are encodings used to convey diagnostic information such as the representation of uninitialized variables.

The following is a description of the different floating-point values defined in the architecture:

Binary floating-point numbers
Machine representable values used as approximations to real numbers. Three categories of numbers are supported: normalized numbers, denormalized numbers, and zero values.

Normalized numbers (±NOR)
These are values that have a biased exponent value in the range:

    1 to 254 in single format
    1 to 2046 in double format

They are values in which the implied unit bit is 1. Normalized numbers are interpreted as follows:

    NOR = (-1)^s × 2^E × (1.fraction)

where s is the sign, E is the unbiased exponent, and 1.fraction is the significand, which is composed of a leading unit bit (implied bit) and a fraction part.

The ranges covered by the magnitude (M) of a normalized floating-point number are approximately equal to:

    Single Format:   1.2×10^-38 ≤ M ≤ 3.4×10^38
    Double Format:   2.2×10^-308 ≤ M ≤ 1.8×10^308

Zero values (±0)
These are values that have a biased exponent value of zero and a fraction value of zero. Zeros can have a positive or negative sign. The sign of zero is ignored by comparison operations (i.e., comparison regards +0 as equal to -0).

Denormalized numbers (±DEN)
These are values that have a biased exponent value of zero and a nonzero fraction value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They are values in which the implied unit bit is 0. Denormalized numbers are interpreted as follows:

    DEN = (-1)^s × 2^Emin × (0.fraction)

where Emin is the minimum representable exponent value (-126 for single-precision, -1022 for double-precision).

Infinities (±∞)
These are values that have the maximum biased exponent value:

    255 in single format
    2047 in double format

and a zero fraction value. They are used to approximate values greater in magnitude than the maximum normalized value.

Infinity arithmetic is defined as the limiting case of real arithmetic, with restricted operations defined among numbers and infinities. Infinities and the real numbers can be related by ordering in the affine sense:

    -∞ < every finite number < +∞

Arithmetic on infinities is always exact and does not signal any exception, except when an exception occurs due to the invalid operations as described in Section 4.4.1, "Invalid Operation Exception" on page 104.

For comparison operations, +Infinity compares equal to +Infinity and -Infinity compares equal to -Infinity.

Not a Numbers (NaNs)
These are values that have the maximum biased exponent value and a nonzero fraction value. The sign bit is ignored (i.e., NaNs are neither positive nor negative). If the high-order bit of the fraction field is 0 then the NaN is a Signaling NaN; otherwise it is a Quiet NaN.

Signaling NaNs are used to signal exceptions when they appear as operands of computational instructions.

Quiet NaNs are used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when Invalid Operation Exception is disabled (FPSCR_VE=0). Quiet NaNs propagate through all floating-point operations except ordered comparison, Floating Round to Single-Precision, and conversion to integer. Quiet NaNs do not signal exceptions, except for ordered comparison and conversion to integer operations. Specific encodings in QNaNs can thus be preserved through a sequence of floating-point operations, and used to convey diagnostic information to help identify results from invalid operations.

When a QNaN is the result of a floating-point operation because one of the operands is a NaN or because a QNaN was generated due to a disabled Invalid Operation Exception, then the following rule is applied to determine the NaN with the high-order fraction bit set to 1 that is to be stored as the result.

    if (FRA) is a NaN
        then FRT ← (FRA)
    else if (FRB) is a NaN
        then if instruction is frsp
            then FRT ← (FRB)_0:34 || ²⁹0
            else FRT ← (FRB)
    else if (FRC) is a NaN
        then FRT ← (FRC)
    else if generated QNaN
        then FRT ← generated QNaN

If the operand specified by FRA is a NaN, then that NaN is stored as the result. Otherwise, if the operand specified by FRB is a NaN (if the instruction specifies an FRB operand), then that NaN is stored as the result, with the low-order 29 bits of the result set to 0 if the instruction is frsp. Otherwise, if the operand specified by FRC is a NaN (if the instruction specifies an FRC operand), then that NaN is stored as the result. Otherwise, if a QNaN was generated due to a disabled Invalid Operation Exception, then that QNaN is stored as the result. If a QNaN is to be generated as a result, then the QNaN generated has a sign bit of 0, an exponent field of all 1s, and a high-order fraction bit of 1 with all other fraction bits 0. Any instruction that generates a QNaN as the result of a disabled Invalid Operation Exception generates this QNaN (i.e., 0x7FF8_0000_0000_0000).

A double-precision NaN is considered to be representable in single format if and only if the low-order 29 bits of the double-precision NaN's fraction are zero.

4.3.3 Sign of Result

The following rules govern the sign of the result of an arithmetic, rounding, or conversion operation, when the operation does not yield an exception. They apply even when the operands or results are zeros or infinities.

• The sign of the result of an add operation is the sign of the operand having the larger absolute value. If both operands have the same sign, the sign of the result of an add operation is the same as the sign of the operands. The sign of the result of the subtract operation x-y is the same as the sign of the result of the add operation x+(-y).

  When the sum of two operands with opposite sign, or the difference of two operands with the same sign, is exactly zero, the sign of the result is positive in all rounding modes except Round toward -Infinity, in which mode the sign is negative.

• The sign of the result of a multiply or divide operation is the Exclusive OR of the signs of the operands.

• The sign of the result of a Square Root or Reciprocal Square Root Estimate operation is always positive, except that the square root of -0 is -0 and the reciprocal square root of -0 is -Infinity.

• The sign of the result of a Round to Single-Precision, or Convert From Integer, or Round to Integer operation is the sign of the operand being converted.

For the Multiply-Add instructions, the rules given above are applied first to the multiply operation and then to the add or subtract operation (one of the inputs to the add or subtract operation is the result of the multiply operation).

4.3.4 Normalization and Denormalization

The intermediate result of an arithmetic or frsp instruction may require normalization and/or denormalization as described below. Normalization and denormalization do not affect the sign of the result.

When an arithmetic or rounding instruction produces an intermediate result which carries out of the significand, or in which the significand is nonzero but has a leading zero bit, it is not a normalized number and must be normalized before it is stored. For the carry-out case, the significand is shifted right one bit, with a one

to access a true single-precision representation in storage, and a fixed-point integer representation in GPRs.

4.3.5.1 Single-Precision Operands

For single format data, a format conversion from single to double is performed when loading from storage into an FPR and a format conversion from double to single is performed when storing from an FPR to storage. No floating-point exceptions are caused by these instructions. An instruction is provided to explicitly convert a double format operand in an FPR to single-precision. Floating-point single-precision is enabled with four types of instruction.

1. Load Floating-Point Single

   This form of instruction accesses a single-precision operand in single format in storage, converts it to double format, and loads it into an FPR.
No shifted into the leading significand bit, and the exponent floating-point exceptions are caused by these is incremented by one. For the leading-zero case, the instructions. significand is shifted left while decrementing its expo- nent by one for each bit shifted, until the leading signifi- 2. Round to Floating-Point Single-Precision cand bit becomes one. The Guard bit and the Round bit The Floating Round to Single-Precision instruction (see Section 4.5.1, "Execution Model for IEEE Opera- rounds a double-precision operand to single-preci- tions" on page 107) participate in the shift with zeros sion, checking the exponent for single-precision shifted into the Round bit. The exponent is regarded as range and handling any exceptions according to if its range were unlimited. respective enable bits, and places that operand After normalization, or if normalization was not into an FPR in double format. For results produced required, the intermediate result may have a nonzero by single-precision arithmetic instructions, sin- significand and an exponent value that is less than the gle-precision loads, and other instances of the minimum value that can be represented in the format Floating Round to Single-Precision instruction, this specified for the result. In this case, the intermediate operation does not alter the value. result is said to be "Tiny" and the stored result is deter- 3. Single-Precision Arithmetic Instructions mined by the rules described in Section 4.4.4, "Under- flow Exception". These rules may require This form of instruction takes operands from the denormalization. FPRs in double format, performs the operation as if it produced an intermediate result having infinite A number is denormalized by shifting its significand precision and unbounded exponent range, and right while incrementing its exponent by 1 for each bit then coerces this intermediate result to fit in single shifted, until the exponent is equal to the format's mini- format. 
Status bits, in the FPSCR and optionally in mum value. If any significant bits are lost in this shifting the Condition Register, are set to reflect the sin- process then "Loss of Accuracy" has occurred (See gle-precision result. The result is then converted to Section 4.4.4, "Underflow Exception" on page 106) and double format and placed into an FPR. The result Underflow Exception is signaled. lies in the range supported by the single format. All input values must be representable in single for- 4.3.5 Data Handling and Precision mat; if they are not, the result placed into the target FPR, and the setting of status bits in the FPSCR Most of the Floating-Point Processor Architecture, and in the Condition Register (if Rc=1), are unde- including all computational, Move, and Select instruc- fined. tions, use the floating-point double format to represent data in the FPRs. Single-precision and integer-valued 4. Store Floating-Point Single operands may be manipulated using double-precision This form of instruction converts a double-preci- operations. Instructions are provided to coerce these sion operand to single format and stores that oper- values from a double format operand. Instructions are and into storage. No floating-point exceptions are also provided for manipulations which do not require caused by these instructions. (The value being double-precision. In addition, instructions are provided stored is effectively assumed to be the result of an instruction of one of the preceding three types.) 100 Power ISATM -- Book I Version 2.04 When the result of a Load Floating-Point Single, Float- The Floating Convert To Integer instructions con- ing Round to Single-Precision, or single-precision arith- vert a double-precision operand to a 32-bit or metic instruction is stored in an FPR, the low-order 29 64-bit signed fixed-point integer format. Variants FRACTION bits are zero. are provided both to perform rounding based on the value of FPSCRRN and to round toward zero. 
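The value classes and the NaN selection rule of Section 4.3.2 can be illustrated with a short Python sketch operating on 64-bit double-format bit patterns. This is only an illustration under stated assumptions: the helper names (bits_of, classify, select_nan_result, nan_fits_single) are hypothetical and not part of the architecture, and select_nan_result is assumed to be called only when the result is already known to be a QNaN (an operand is a NaN, or a QNaN was generated by a disabled Invalid Operation Exception).

```python
import struct

EXP_MASK = 0x7FF0_0000_0000_0000      # 11-bit biased exponent field
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF     # 52-bit fraction field
QUIET_BIT = 0x0008_0000_0000_0000     # high-order fraction bit
DEFAULT_QNAN = 0x7FF8_0000_0000_0000  # the generated QNaN named in the text
LOW29 = (1 << 29) - 1                 # low-order 29 fraction bits


def bits_of(x):
    """Return the 64-bit double-format pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def classify(bits):
    """Classify a double-format bit pattern; the sign bit is ignored."""
    exp = (bits & EXP_MASK) >> 52
    frac = bits & FRAC_MASK
    if exp == 0:
        return "zero" if frac == 0 else "denormalized"
    if exp == 2047:
        if frac == 0:
            return "infinity"
        return "QNaN" if frac & QUIET_BIT else "SNaN"
    return "normalized"


def is_nan(bits):
    return classify(bits) in ("QNaN", "SNaN")


def nan_fits_single(bits):
    """A double-precision NaN is representable in single format
    iff the low-order 29 bits of its fraction are zero."""
    return is_nan(bits) and (bits & LOW29) == 0


def select_nan_result(fra, frb=None, frc=None, is_frsp=False):
    """Pick the NaN to store, in FRA, FRB, FRC priority order.
    Operands the instruction does not specify are passed as None."""
    if fra is not None and is_nan(fra):
        result = fra
    elif frb is not None and is_nan(frb):
        # frsp clears the low-order 29 bits of the result
        result = frb & ~LOW29 if is_frsp else frb
    elif frc is not None and is_nan(frc):
        result = frc
    else:
        result = DEFAULT_QNAN  # assumption: a QNaN was generated
    return result | QUIET_BIT  # stored NaN has the high-order fraction bit set
```

For example, select_nan_result(0x7FF0_0000_0000_0001, bits_of(1.0)) selects the SNaN in FRA and returns it with the high-order fraction bit set, so the payload bits survive in the stored QNaN.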
Programming Note These instructions may cause Invalid Operation The Floating Round to Single-Precision instruction (VXSNaN, VXCVI) and Inexact exceptions. The is provided to allow value conversion from dou- Floating Convert From Integer instruction converts ble-precision to single-precision with appropriate a 64-bit signed fixed-point integer to a double-pre- exception checking and rounding. This instruction cision floating-point integer. Because of the limita- should be used to convert double-precision float- tions of the source format, only an Inexact ing-point values (produced by double-precision exception may be generated. load and arithmetic instructions and by fcfid) to sin- gle-precision values prior to storing them into single 4.3.6 Rounding format storage elements or using them as oper- ands for single-precision arithmetic instructions. The material in this section applies to operations that Values produced by single-precision load and arith- have numeric operands (i.e., operands that are not metic instructions are already single-precision val- infinities or NaNs). Rounding the intermediate result of ues and can be stored directly into single format such an operation may cause an Overflow Exception, storage elements, or used directly as operands for an Underflow Exception, or an Inexact Exception. The single-precision arithmetic instructions, without pre- remainder of this section assumes that the operation ceding the store, or the arithmetic instruction, by a causes no exceptions and that the result is numeric. Floating Round to Single-Precision instruction. See Section 4.3.2, "Value Representation" and Section 4.4, "Floating-Point Exceptions" for the cases Programming Note not covered here. A single-precision value can be used in double-pre- The Arithmetic and Rounding and Conversion instruc- cision arithmetic operations. The reverse is true tions round their intermediate results. 
With the excep- only if the double-precision value is representable tion of the Estimate instructions, these instructions in single format. produce an intermediate result that can be regarded as having infinite precision and unbounded exponent Some implementations may execute single-preci- range. All but two groups of these instructions normal- sion arithmetic instructions faster than double-pre- ize or denormalize the intermediate result prior to cision arithmetic instructions. Therefore, if rounding and then place the final result into the target double-precision accuracy is not required, sin- FPR in double format. The Floating Round to Integer gle-precision data and instructions should be used. and Floating Convert To Integer instructions with biased exponents ranging from 1022 through 1074 are prepared for rounding by repetitively shifting the signifi- 4.3.5.2 Integer-Valued Operands cand right one position and incrementing the biased Instructions are provided to round floating-point oper- exponent until it reaches a value of 1075. (Intermediate ands to integer values in floating-point format. To facili- results with biased exponents 1075 or larger are tate exchange of data between the floating-point and already integers, and with biased exponents 1021 or fixed-point processors, instructions are provided to con- less round to zero.) After rounding, the final result for vert between floating-point double format and Floating Round to Integer is normalized and put in dou- fixed-point integer format in an FPR. Computation on ble format, and for Floating Convert To Integer is con- integer-valued operands may be performed using arith- verted to a signed fixed-point integer. metic instructions of the required precision. (The results FPSCR bits FR and FI generally indicate the results of may not be integer values.) The two groups of instruc- rounding. 
Each of the instructions which rounds its tions provided specifically to support integer-valued intermediate result sets these bits. If the fraction is operands are described below. incremented during rounding then FR is set to 1, other- 1. Floating Round to Integer wise FR is set to 0. If the result is inexact then FI is set to 1, otherwise FI is set to zero. The Round to Integer The Floating Round to Integer instructions round a instructions are exceptions to this rule, setting FR and double-precision operand to an integer value in FI to 0. The Estimate instructions set FR and FI to floating-point double format. These instructions undefined values. The remaining floating-point instruc- may cause Invalid Operation (VXSNAN) excep- tions do not alter FR and FI. tions. See Sections 4.3.6 and 4.5.1 for more infor- mation about rounding. Four user-selectable rounding modes are provided through the Floating-Point Rounding Control field in the 2. Floating Convert To/From Integer Chapter 4. Floating-Point Processor [Category: Floating-Point] 101 Version 2.04 FPSCR. See Section 4.2.2, "Floating-Point Status and Control Register". These are encoded as follows. 4.4 Floating-Point Exceptions This architecture defines the following floating-point exceptions: RN Rounding Mode 1 Invalid Operation Exception 00 Round to Nearest SNaN 01 Round toward Zero Infinity-Infinity 10 Round toward +Infinity Infinity÷Infinity 11 Round toward -Infinity Zero÷Zero Let Z be the intermediate arithmetic result or the oper- Infinity×Zero and of a convert operation. If Z can be represented Invalid Compare exactly in the target format, then the result in all round- Software-Defined Condition ing modes is Z as represented in the target format. If Z Invalid Square Root cannot be represented exactly in the target format, let Invalid Integer Convert Z1 and Z2 bound Z as the next larger and next smaller 1 Zero Divide Exception numbers representable in the target format. 
Then Z1 or 1 Overflow Exception Z2 can be used to approximate the result in the target 1 Underflow Exception format. 1 Inexact Exception Figure 51 shows the relation of Z, Z1, and Z2 in this These exceptions, other than Invalid Operation Excep- case. The following rules specify the rounding in the tion due to Software-Defined Condition, may occur dur- four modes. "LSB" means "least significant bit". ing execution of computational instructions. An Invalid Operation Exception due to Software-Defined Condi- tion occurs when a Move To FPSCR instruction sets By Incrementing LSB of Z FPSCRVXSOFT to 1. Infinitely Precise Value By Truncating after LSB Each floating-point exception, and each category of Invalid Operation Exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a Z2 Z1 0 Z2 Z1 corresponding enable bit in the FPSCR. The exception Z Z bit indicates occurrence of the corresponding excep- Negative values Positive values tion. If an exception occurs, the corresponding enable bit governs the result produced by the instruction and, Figure 51. Selection of Z1 and Z2 in conjunction with the FE0 and FE1 bits (see page 103), whether and how the system floating-point Round to Nearest enabled exception error handler is invoked. (In general, Choose the value that is closer to Z (Z1 or Z2). the enabling specified by the enable bit is of invoking In case of a tie, choose the one that is even the system error handler, not of permitting the excep- (least significant bit 0). tion to occur. The occurrence of an exception depends Round toward Zero only on the instruction and its inputs, not on the setting Choose the smaller in magnitude (Z1 or Z2). of any control bits. The only deviation from this general rule is that the occurrence of an Underflow Exception Round toward +Infinity may depend on the setting of the enable bit.) Choose Z1. 
A single instruction, other than mtfsfi or mtfsf, may set Round toward -Infinity more than one exception bit only in the following cases: Choose Z2. 1 Inexact Exception may be set with Overflow See Section 4.5.1, "Execution Model for IEEE Opera- Exception. tions" on page 107 for a detailed explanation of round- 1 Inexact Exception may be set with Underflow ing. Exception. 1 Invalid Operation Exception (SNaN) is set with Invalid Operation Exception (×0) for Multiply-Add instructions for which the values being multiplied are infinity and zero and the value being added is an SNaN. 1 Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Compare) for Compare Ordered instructions. 1 Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Integer Convert) for Convert To Integer instructions. 102 Power ISATM -- Book I Version 2.04 When an exception occurs the writing of a result to the ing-point exception occurs. The system floating-point target register may be suppressed or a result may be enabled exception error handler is also invoked if a delivered, depending on the exception. Move To FPSCR instruction causes an exception bit and the corresponding enable bit both to be 1; the The writing of a result to the target register is sup- Move To FPSCR instruction is considered to cause the pressed for the following kinds of exception, so that enabled exception. there is no possibility that one of the operands is lost: The FE0 and FE1 bits control whether and how the sys- 1 Enabled Invalid Operation tem floating-point enabled exception error handler is 1 Enabled Zero Divide invoked if an enabled floating-point exception occurs. For the remaining kinds of exception, a result is gener- The location of these bits and the requirements for ated and written to the destination specified by the altering them are described in Book III. (The system instruction causing the exception. 
The result may be a floating-point enabled exception error handler is never different value for the enabled and disabled conditions invoked because of a disabled floating-point exception.) for some of these exceptions. The kinds of exception The effects of the four possible settings of these bits that deliver a result are the following: are as follows. 1 Disabled Invalid Operation 1 Disabled Zero Divide FE0 FE1 Description 1 Disabled Overflow 0 0 Ignore Exceptions Mode 1 Disabled Underflow Floating-point exceptions do not cause 1 Disabled Inexact the system floating-point enabled excep- 1 Enabled Overflow tion error handler to be invoked. 1 Enabled Underflow 0 1 Imprecise Nonrecoverable Mode 1 Enabled Inexact The system floating-point enabled excep- Subsequent sections define each of the floating-point tion error handler is invoked at some point exceptions and specify the action that is taken when at or beyond the instruction that caused they are detected. the enabled exception. It may not be pos- sible to identify the excepting instruction The IEEE standard specifies the handling of excep- or the data that caused the exception. tional conditions in terms of "traps" and "trap handlers". Results produced by the excepting In this architecture, an FPSCR exception enable bit of 1 instruction may have been used by or may causes generation of the result value specified in the have affected subsequent instructions IEEE standard for the "trap enabled" case; the expecta- that are executed before the error handler tion is that the exception will be detected by software, is invoked. which will revise the result. 
An FPSCR exception 1 0 Imprecise Recoverable Mode enable bit of 0 causes generation of the "default result" The system floating-point enabled excep- value specified for the "trap disabled" (or "no trap tion error handler is invoked at some point occurs" or "trap is not implemented") case; the expecta- at or beyond the instruction that caused tion is that the exception will not be detected by soft- the enabled exception. Sufficient informa- ware, which will simply use the default result. The result tion is provided to the error handler that it to be delivered in each case for each exception is can identify the excepting instruction and described in the sections below. the operands, and correct the result. No The IEEE default behavior when an exception occurs is results produced by the excepting instruc- to generate a default value and not to notify software. In tion have been used by or have affected this architecture, if the IEEE default behavior when an subsequent instructions that are executed exception occurs is desired for all exceptions, all before the error handler is invoked. FPSCR exception enable bits should be set to 0 and 1 1 Precise Mode Ignore Exceptions Mode (see below) should be used. The system floating-point enabled excep- In this case the system floating-point enabled exception tion error handler is invoked precisely at error handler is not invoked, even if floating-point the instruction that caused the enabled exceptions occur: software can inspect the FPSCR exception. exception bits if necessary, to determine whether exceptions have occurred. In all cases, the question of whether a floating-point result is stored, and what value is stored, is governed In this architecture, if software is to be notified that a by the FPSCR exception enable bits, as described in given kind of exception has occurred, the correspond- subsequent sections, and is not affected by the value of ing FPSCR exception enable bit must be set to 1 and a the FE0 and FE1 bits. 
mode other than Ignore Exceptions Mode must be used. In this case the system floating-point enabled In all cases in which the system floating-point enabled exception error handler is invoked if an enabled float- exception error handler is invoked, all instructions Chapter 4. Floating-Point Processor [Category: Floating-Point] 103 Version 2.04 before the instruction at which the system floating-point 4.4.1 Invalid Operation Exception enabled exception error handler is invoked have com- pleted, and no instruction after the instruction at which the system floating-point enabled exception error han- 4.4.1.1 Definition dler is invoked has begun execution. The instruction at An Invalid Operation Exception occurs when an oper- which the system floating-point enabled exception error and is invalid for the specified operation. The invalid handler is invoked has completed if it is the excepting operations are: instruction and there is only one such instruction. Oth- 1 Any floating-point operation on a Signaling NaN erwise it has not begun execution (or may have been (SNaN) partially executed in some cases, as described in Book 1 For add or subtract operations, magnitude subtrac- III). tion of infinities ( - ) 1 Division of infinity by infinity ( ÷ ) Programming Note 1 Division of zero by zero (0 ÷ 0) In any of the three non-Precise modes, a Float- 1 Multiplication of infinity by zero ( × 0) ing-Point Status and Control Register instruction 1 Ordered comparison involving a NaN (Invalid Com- can be used to force any exceptions, due to pare) instructions initiated before the Floating-Point Sta- 1 Square root or reciprocal square root of a negative tus and Control Register instruction, to be recorded (and nonzero) number (Invalid Square Root) in the FPSCR. (This forcing is superfluous for Pre- 1 Integer convert involving a number too large in cise Mode.) 
magnitude to be represented in the target format, In either of the Imprecise modes, a Floating-Point or involving an infinity or a NaN (Invalid Integer Status and Control Register instruction can be used Convert) to force any invocations of the system floating-point An Invalid Operation Exception also occurs when an enabled exception error handler, due to instructions mtfsfi, mtfsf, or mtfsb1 instruction is executed that initiated before the Floating-Point Status and Con- sets FPSCRVXSOFT to 1 (Software-Defined Condition). trol Register instruction, to occur. (This forcing has no effect in Ignore Exceptions Mode, and is super- fluous for Precise Mode.) 4.4.1.2 Action The last sentence of the paragraph preceding this The action to be taken depends on the setting of the Programming Note can apply only in the Imprecise Invalid Operation Exception Enable bit of the FPSCR. modes, or if the mode has just been changed from When Invalid Operation Exception is enabled Ignore Exceptions Mode to some other mode. (It (FPSCRVE=1) and an Invalid Operation Exception always applies in the latter case.) occurs, the following actions are taken: In order to obtain the best performance across the wid- 1. One or two Invalid Operation Exceptions are set est range of implementations, the programmer should FPSCRVXSNAN (if SNaN) obey the following guidelines. FPSCRVXISI (if - ) FPSCRVXIDI (if ÷ ) 1 If the IEEE default results are acceptable to the FPSCRVXZDZ (if 0 ÷ 0) application, Ignore Exceptions Mode should be FPSCRVXIMZ (if × 0) used with all FPSCR exception enable bits set to 0. FPSCRVXVC (if invalid comp) 1 If the IEEE default results are not acceptable to the FPSCRVXSOFT (if sfw-def cond) application, Imprecise Nonrecoverable Mode FPSCRVXSQRT (if invalid sqrt) should be used, or Imprecise Recoverable Mode if FPSCRVXCVI (if invalid int cvrt) recoverability is needed, with FPSCR exception 2. 
If the operation is an arithmetic, Floating Round to enable bits set to 1 for those exceptions for which Single-Precision, Floating Round to Integer, or the system floating-point enabled exception error convert to integer operation, handler is to be invoked. the target FPR is unchanged 1 Ignore Exceptions Mode should not, in general, be FPSCRFR FI are set to zero used when any FPSCR exception enable bits are FPSCRFPRF is unchanged set to 1. 3. If the operation is a compare, 1 Precise Mode may degrade performance in some FPSCRFR FI C are unchanged implementations, perhaps substantially, and there- FPSCRFPCC is set to reflect unordered fore should be used only for debugging and other 4. If an mtfsfi, mtfsf, or mtfsb1 instruction is exe- specialized applications. cuted that sets FPSCRVXSOFT to 1, The FPSCR is set as specified in the instruc- tion description. 104 Power ISATM -- Book I Version 2.04 When Invalid Operation Exception is disabled 4.4.2.2 Action (FPSCRVE=0) and an Invalid Operation Exception occurs, the following actions are taken: The action to be taken depends on the setting of the Zero Divide Exception Enable bit of the FPSCR. 1. One or two Invalid Operation Exceptions are set FPSCRVXSNAN (if SNaN) When Zero Divide Exception is enabled (FPSCRZE=1) FPSCRVXISI (if - ) and a Zero Divide Exception occurs, the following FPSCRVXIDI (if ÷ ) actions are taken: FPSCRVXZDZ (if 0 ÷ 0) 1. Zero Divide Exception is set FPSCRVXIMZ (if × 0) FPSCRZX 1 1 FPSCRVXVC (if invalid comp) 2. The target FPR is unchanged FPSCRVXSOFT (if sfw-def cond) 3. FPSCRFR FI are set to zero FPSCRVXSQRT (if invalid sqrt) 4. FPSCRFPRF is unchanged FPSCRVXCVI (if invalid int cvrt) 2. If the operation is an arithmetic or Floating Round When Zero Divide Exception is disabled (FPSCRZE=0) to Single-Precision operation, and a Zero Divide Exception occurs, the following the target FPR is set to a Quiet NaN actions are taken: FPSCRFR FI are set to zero 1. 
Zero Divide Exception is set FPSCRFPRF is set to indicate the class of the FPSCRZX 1 1 result (Quiet NaN) 2. The target FPR is set to ± Infinity, where the sign is 3. If the operation is a convert to 64-bit integer opera- determined by the XOR of the signs of the oper- tion, ands the target FPR is set as follows: 3. FPSCRFR FI are set to zero FRT is set to the most positive 64-bit integer 4. FPSCRFPRF is set to indicate the class and sign of if the operand in FRB is a positive number the result (± Infinity) or + , and to the most negative 64-bit inte- ger if the operand in FRB is a negative num- ber, - , or NaN 4.4.3 Overflow Exception FPSCRFR FI are set to zero FPSCRFPRF is undefined 4. If the operation is a convert to 32-bit integer opera- 4.4.3.1 Definition tion, An Overflow Exception occurs when the magnitude of the target FPR is set as follows: what would have been the rounded result if the expo- FRT0:31 1 undefined nent range were unbounded exceeds that of the largest FRT32:63 are set to the most positive 32-bit finite number of the specified result precision. integer if the operand in FRB is a positive number or +infinity, and to the most nega- tive 32-bit integer if the operand in FRB is a 4.4.3.2 Action negative number, -infinity, or NaN The action to be taken depends on the setting of the FPSCRFR FI are set to zero Overflow Exception Enable bit of the FPSCR. FPSCRFPRF is undefined 5. If the operation is a compare, When Overflow Exception is enabled (FPSCROE=1) FPSCRFR FI C are unchanged and an Overflow Exception occurs, the following FPSCRFPCC is set to reflect unordered actions are taken: 1. Overflow Exception is set FPSCROX 1 1 6. If an mtfsfi, mtfsf, or mtfsb1 instruction is exe- 2. For double-precision arithmetic instructions, the cuted that sets FPSCRVXSOFT to 1, exponent of the normalized intermediate result is The FPSCR is set as specified in the instruc- adjusted by subtracting 1536 tion description. 3. 
For single-precision arithmetic instructions and the Floating Round to Single-Precision instruction, the exponent of the normalized intermediate result is adjusted by subtracting 192
4. The adjusted rounded result is placed into the target FPR
5. FPSCR[FPRF] is set to indicate the class and sign of the result (± Normal Number)

When Overflow Exception is disabled (FPSCR[OE]=0) and an Overflow Exception occurs, the following actions are taken:

1. Overflow Exception is set
       FPSCR[OX] ← 1
2. Inexact Exception is set
       FPSCR[XX] ← 1
3. The result is determined by the rounding mode (FPSCR[RN]) and the sign of the intermediate result as follows:
   - Round to Nearest
     Store ± Infinity, where the sign is the sign of the intermediate result
   - Round toward Zero
     Store the format's largest finite number with the sign of the intermediate result
   - Round toward + Infinity
     For negative overflow, store the format's most negative finite number; for positive overflow, store +Infinity
   - Round toward - Infinity
     For negative overflow, store -Infinity; for positive overflow, store the format's largest finite number
4. The result is placed into the target FPR
5. FPSCR[FR] is undefined
6. FPSCR[FI] is set to 1
7. FPSCR[FPRF] is set to indicate the class and sign of the result (± Infinity or ± Normal Number)

4.4.2 Zero Divide Exception

4.4.2.1 Definition

A Zero Divide Exception occurs when a Divide instruction is executed with a zero divisor value and a finite nonzero dividend value. It also occurs when a Reciprocal Estimate instruction (fre[s] or frsqrte[s]) is executed with an operand value of zero.

4.4.4 Underflow Exception

4.4.4.1 Definition

Underflow Exception is defined separately for the enabled and disabled states:

• Enabled:
  Underflow occurs when the intermediate result is "Tiny".
• Disabled:
  Underflow occurs when the intermediate result is "Tiny" and there is "Loss of Accuracy".

A "Tiny" result is detected before rounding, when a nonzero intermediate result computed as though both the precision and the exponent range were unbounded would be less in magnitude than the smallest normalized number.

If the intermediate result is "Tiny" and Underflow Exception is disabled (FPSCR[UE]=0) then the intermediate result is denormalized (see Section 4.3.4, "Normalization and Denormalization" on page 100) and rounded (see Section 4.3.6, "Rounding" on page 101) before being placed into the target FPR.

"Loss of Accuracy" is detected when the delivered result value differs from what would have been computed were both the precision and the exponent range unbounded.

4.4.4.2 Action

The action to be taken depends on the setting of the Underflow Exception Enable bit of the FPSCR.

When Underflow Exception is enabled (FPSCR[UE]=1) and an Underflow Exception occurs, the following actions are taken:

1. Underflow Exception is set
       FPSCR[UX] ← 1
2. For double-precision arithmetic instructions, the exponent of the normalized intermediate result is adjusted by adding 1536
3. For single-precision arithmetic instructions and the Floating Round to Single-Precision instruction, the exponent of the normalized intermediate result is adjusted by adding 192
4. The adjusted rounded result is placed into the target FPR
5. FPSCR[FPRF] is set to indicate the class and sign of the result (± Normalized Number)

Programming Note
    The FR and FI bits are provided to allow the system floating-point enabled exception error handler, when invoked because of an Underflow Exception, to simulate a "trap disabled" environment. That is, the FR and FI bits allow the system floating-point enabled exception error handler to unround the result, thus allowing the result to be denormalized.

When Underflow Exception is disabled (FPSCR[UE]=0) and an Underflow Exception occurs, the following actions are taken:

1. Underflow Exception is set
       FPSCR[UX] ← 1
2. The rounded result is placed into the target FPR
3. FPSCR[FPRF] is set to indicate the class and sign of the result (± Normalized Number, ± Denormalized Number, or ± Zero)

4.4.5 Inexact Exception

4.4.5.1 Definition

An Inexact Exception occurs when one of two conditions occurs during rounding:

1. The rounded result differs from the intermediate result assuming both the precision and the exponent range of the intermediate result to be unbounded. In this case the result is said to be inexact. (If the rounding causes an enabled Overflow Exception or an enabled Underflow Exception, an Inexact Exception also occurs only if the significands of the rounded result and the intermediate result differ.)
2. The rounded result overflows and Overflow Exception is disabled.

4.4.5.2 Action

The action to be taken does not depend on the setting of the Inexact Exception Enable bit of the FPSCR.

When an Inexact Exception occurs, the following actions are taken:

1. Inexact Exception is set
       FPSCR[XX] ← 1
2. The rounded or overflowed result is placed into the target FPR
3. FPSCR[FPRF] is set to indicate the class and sign of the result

Programming Note
    In some implementations, enabling Inexact Exceptions may degrade performance more than does enabling other types of floating-point exception.

4.5 Floating-Point Execution Models

All implementations of this architecture must provide the equivalent of the following execution models to ensure that identical results are obtained.

Special rules are provided in the definition of the computational instructions for the infinities, denormalized numbers and NaNs. The material in the remainder of this section applies to instructions that have numeric operands and a numeric result (i.e., operands and result that are not infinities or NaNs), and that cause no exceptions. See Section 4.3.2 and Section 4.4 for the cases not covered here.

Although the double format specifies an 11-bit exponent, exponent arithmetic makes use of two additional bits to avoid potential transient overflow conditions. One extra bit is required when denormalized double-precision numbers are prenormalized. The second bit is required to permit the computation of the adjusted exponent value in the following cases when the corresponding exception enable bit is 1:

• Underflow during multiplication using a denormalized operand.
• Overflow during division using a denormalized divisor.

The IEEE standard includes 32-bit and 64-bit arithmetic. The standard requires that single-precision arithmetic be provided for single-precision operands. The standard permits double-precision floating-point operations to have either (or both) single-precision or double-precision operands, but states that single-precision floating-point operations should not accept double-precision operands. The Power ISA follows these guidelines; double-precision arithmetic instructions can have operands of either or both precisions, while single-precision arithmetic instructions require all operands to be single-precision. Double-precision arithmetic instructions and fcfid produce double-precision values, while single-precision arithmetic instructions produce single-precision values.

For arithmetic instructions, conversions from double-precision to single-precision must be done explicitly by software, while conversions from single-precision to double-precision are done implicitly.

4.5.1 Execution Model for IEEE Operations

The following description uses 64-bit arithmetic as an example. 32-bit arithmetic is similar except that the FRACTION is a 23-bit field, and the single-precision Guard, Round, and Sticky bits (described in this section) are logically adjacent to the 23-bit FRACTION field.
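As an illustration of the two Inexact Exception conditions in Section 4.4.5.1, the following Python sketch decides whether a double-precision addition would set the Inexact Exception bit. It is not part of the architecture: the function name add_raises_inexact is hypothetical, the host is assumed to implement IEEE double arithmetic in Round to Nearest mode, both operands are assumed finite, and Overflow Exception is assumed disabled.

```python
import math
import sys
from fractions import Fraction


def add_raises_inexact(a: float, b: float) -> bool:
    """Return True if a + b would be reported as inexact (hypothetical helper).

    Assumes finite operands, host Round to Nearest, Overflow Exception disabled.
    """
    # Intermediate result with unbounded precision and exponent range.
    exact = Fraction(a) + Fraction(b)
    # Result as delivered by the host after rounding to double format.
    rounded = a + b
    if math.isinf(rounded):
        # Condition 2: the rounded result overflows and overflow is disabled.
        return True
    # Condition 1: the rounded result differs from the intermediate result.
    return Fraction(rounded) != exact
```

For example, adding 2^-60 to 1.0 loses the small addend during rounding and is inexact, whereas adding 2^-52 to 1.0 is exactly representable and is not.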
IEEE-conforming significand arithmetic is considered to be performed with a floating-point accumulator having the following format, where bits 0:55 comprise the significand of the intermediate result.

    S   C   L   FRACTION          G   R   X
            0   1                 53  54  55

Figure 52. IEEE 64-bit execution model

The S bit is the sign bit.

The C bit is the carry bit, which captures the carry out of the significand.

The L bit is the leading unit bit of the significand, which receives the implicit bit from the operand.

The FRACTION is a 52-bit field that accepts the fraction of the operand.

The Guard (G), Round (R), and Sticky (X) bits are extensions to the low-order bits of the accumulator. The G and R bits are required for postnormalization of the result. The G, R, and X bits are required during rounding to determine if the intermediate result is equally near the two nearest representable values. The X bit serves as an extension to the G and R bits by representing the logical OR of all bits that may appear to the low-order side of the R bit, due either to shifting the accumulator right or to other generation of low-order result bits. The G and R bits participate in the left shifts with zeros being shifted into the R bit. Figure 53 shows the significance of the G, R, and X bits with respect to the intermediate result (IR), the representable number next lower in magnitude (NL), and the representable number next higher in magnitude (NH).

    G R X            Interpretation
    000              IR is exact
    001, 010, 011    IR closer to NL
    100              IR midway between NL and NH
    101, 110, 111    IR closer to NH

Figure 53. Interpretation of G, R, and X bits

Figure 54 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers relative to the accumulator illustrated in Figure 52.

    Format    Guard    Round    Sticky
    Double    G bit    R bit    X bit
    Single    24       25       OR of 26:52, G, R, X

Figure 54. Location of the Guard, Round, and Sticky bits in the IEEE execution model

The significand of the intermediate result is prepared for rounding by shifting its contents right, if required, until the least significant bit to be retained is in the low-order bit position of the fraction. Four user-selectable rounding modes are provided through FPSCR[RN] as described in Section 4.3.6, "Rounding" on page 101. Using Z1 and Z2 as defined on page 101, the rules for rounding in each mode are as follows.

• Round to Nearest

  Guard bit = 0
      The result is truncated. (Result exact (GRX=000) or closest to next lower value in magnitude (GRX=001, 010, or 011))

  Guard bit = 1
      Depends on Round and Sticky bits:

      Case a
          If the Round or Sticky bit is 1 (inclusive), the result is incremented. (Result closest to next higher value in magnitude (GRX=101, 110, or 111))

      Case b
          If the Round and Sticky bits are 0 (result midway between closest representable values), then if the low-order bit of the result is 1 the result is incremented. Otherwise (the low-order bit of the result is 0) the result is truncated (this is the case of a tie rounded to even).

• Round toward Zero
  Choose the smaller in magnitude of Z1 or Z2. If the Guard, Round, or Sticky bit is nonzero, the result is inexact.

• Round toward + Infinity
  Choose Z1.

• Round toward - Infinity
  Choose Z2.

If rounding results in a carry into C, the significand is shifted right one position and the exponent is incremented by one. This yields an inexact result, and possibly also exponent overflow. If any of the Guard, Round, or Sticky bits is nonzero, then the result is also inexact. Fraction bits are stored to the target FPR. For Floating Round to Integer, Floating Round to Single-Precision, and single-precision arithmetic instructions, low-order zeros must be appended as appropriate to fill out the double-precision fraction.

4.5.2 Execution Model for Multiply-Add Type Instructions

The Power ISA provides a special form of instruction that performs up to three operations in one instruction (a multiplication, an addition, and a negation). With this added capability comes the special ability to produce a more exact intermediate result as input to the rounder. 32-bit arithmetic is similar except that the FRACTION field is smaller.

Multiply-add significand arithmetic is considered to be performed with a floating-point accumulator having the following format, where bits 0:106 comprise the significand of the intermediate result.

    S   C   L   FRACTION          X'
    0   1   2   3                 106

Figure 55. Multiply-add 64-bit execution model

The first part of the operation is a multiplication. The multiplication has two 53-bit significands as inputs, which are assumed to be prenormalized, and produces a result conforming to the above model. If there is a carry out of the significand (into the C bit), then the significand is shifted right one position, shifting the L bit (leading unit bit) into the most significant bit of the FRACTION and shifting the C bit (carry out) into the L bit. All 106 bits (L bit, the FRACTION) of the product take part in the add operation. If the exponents of the two inputs to the adder are not equal, the significand of the operand with the smaller exponent is aligned (shifted) to the right by an amount that is added to that exponent to make it equal to the other input's exponent. Zeros are shifted into the left of the significand as it is aligned and bits shifted out of bit 105 of the significand are ORed into the X' bit. The add operation also produces a result conforming to the above model with the X' bit taking part in the add operation.

The result of the addition is then normalized, with all bits of the addition result, except the X' bit, participating in the shift. The normalized result serves as the intermediate result that is input to the rounder.

For rounding, the conceptual Guard, Round, and Sticky bits are defined in terms of accumulator bits. Figure 56 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers in the multiply-add execution model.

    Format    Guard    Round    Sticky
    Double    53       54       OR of 55:105, X'
    Single    24       25       OR of 26:105, X'

Figure 56. Location of the Guard, Round, and Sticky bits in the multiply-add execution model

The rules for rounding the intermediate result are the same as those given in Section 4.5.1.

If the instruction is Floating Negative Multiply-Add or Floating Negative Multiply-Subtract, the final result is negated.

4.6 Floating-Point Processor Instructions

For each instruction in this section that defines the use of an Rc bit, the behavior defined for the instruction corresponding to Rc=1 is considered part of the Floating-Point.Record category.

4.6.1 Floating-Point Storage Access Instructions

The Storage Access instructions compute the effective address (EA) of the storage to be accessed as described in Section 1.10.3, "Effective Address Calculation" on page 23.

Programming Note
    The la extended mnemonic permits computing an effective address as a Load or Store instruction would, but loads the address itself into a GPR rather than loading the value that is in storage at that address. This extended mnemonic is described in Section D.9, "Miscellaneous Mnemonics" on page 327.

4.6.1.1 Storage Access Exceptions

Storage accesses will cause the system data storage error handler to be invoked if the program is not allowed to modify the target storage (Store only), or if the program attempts to access storage that is unavailable.

4.6.2 Floating-Point Load Instructions

There are two basic forms of load instruction: single-precision and double-precision. Because the FPRs support only floating-point double format, single-precision Load Floating-Point instructions convert single-precision data to double format prior to loading the operand into the target FPR. The conversion and loading steps are as follows.

Let WORD[0:31] be the floating-point single-precision operand accessed from storage.

Normalized Operand
    if WORD[1:8] > 0 and WORD[1:8] < 255 then
        FRT[0:1] ← WORD[0:1]
        FRT[2] ← ¬WORD[1]
        FRT[3] ← ¬WORD[1]
        FRT[4] ← ¬WORD[1]
        FRT[5:63] ← WORD[2:31] || ²⁹0

Denormalized Operand
    if WORD[1:8] = 0 and WORD[9:31] ≠ 0 then
        sign ← WORD[0]
        exp ← -126
        frac[0:52] ← 0b0 || WORD[9:31] || ²⁹0
        normalize the operand
            do while frac[0] = 0
                frac[0:52] ← frac[1:52] || 0b0
                exp ← exp - 1
        FRT[0] ← sign
        FRT[1:11] ← exp + 1023
        FRT[12:63] ← frac[1:52]

Zero / Infinity / NaN
    if WORD[1:8] = 255 or WORD[1:31] = 0 then
        FRT[0:1] ← WORD[0:1]
        FRT[2] ← WORD[1]
        FRT[3] ← WORD[1]
        FRT[4] ← WORD[1]
        FRT[5:63] ← WORD[2:31] || ²⁹0

For double-precision Load Floating-Point instructions no conversion is required, as the data from storage are copied directly into the FPR.

Many of the Load Floating-Point instructions have an "update" form, in which register RA is updated with the effective address. For these forms, if RA≠0, the effective address is placed into register RA and the storage element (word or doubleword) addressed by EA is loaded into FRT.

Note: Recall that RA and RB denote General Purpose Registers, while FRT denotes a Floating-Point Register.
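The single-to-double conversion steps given for the Load Floating-Point instructions in Section 4.6.2 can be sketched in Python on raw bit patterns. This is an illustrative model only, not part of the architecture: the function name double_from_single_word is hypothetical, and the architecture's bit 0 (most significant) is mapped to the highest-order bit of a Python integer.

```python
import struct


def double_from_single_word(word: int) -> int:
    """Return the 64-bit FPR image for a 32-bit single-format word (sketch)."""
    sign = (word >> 31) & 1            # WORD[0]
    exp = (word >> 23) & 0xFF          # WORD[1:8]
    frac = word & 0x7FFFFF             # WORD[9:31]
    top2 = (word >> 30) & 0b11         # WORD[0:1]
    w1 = (word >> 30) & 1              # WORD[1]
    low30 = word & 0x3FFFFFFF          # WORD[2:31]

    if 0 < exp < 255:
        # Normalized Operand: FRT[2:4] receive the complement of WORD[1].
        fill = w1 ^ 1
    elif exp == 0 and frac != 0:
        # Denormalized Operand: normalize the 53-bit frac[0:52].
        e = -126
        f = frac << 29                 # 0b0 || WORD[9:31] || 29 zero bits
        while (f >> 52) & 1 == 0:
            f = (f << 1) & ((1 << 53) - 1)
            e -= 1
        return (sign << 63) | ((e + 1023) << 52) | (f & ((1 << 52) - 1))
    else:
        # Zero / Infinity / NaN: FRT[2:4] receive WORD[1] unchanged.
        fill = w1

    # FRT[0:1] || FRT[2] || FRT[3] || FRT[4] || (WORD[2:31] || 29 zero bits)
    return (top2 << 62) | (fill << 61) | (fill << 60) | (fill << 59) | (low30 << 29)
```

For instance, the single-format word 0x3F800000 (1.0) maps to 0x3FF0000000000000, and the smallest single-format denormalized number 0x00000001 maps to a normalized double with biased exponent 874.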
Load Floating-Point Single D-form

lfs     FRT,D(RA)

    48      FRT     RA      D
    0       6       11      16              31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    FRT ← DOUBLE(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+D.

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see Section 4.6.2 on page 111) and placed into register FRT.

Special Registers Altered:
    None

Load Floating-Point Single Indexed X-form

lfsx    FRT,RA,RB

    31      FRT     RA      RB      535     /
    0       6       11      16      21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    FRT ← DOUBLE(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+(RB).

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see Section 4.6.2 on page 111) and placed into register FRT.

Special Registers Altered:
    None

Load Floating-Point Single with Update D-form

lfsu    FRT,D(RA)

    49      FRT     RA      D
    0       6       11      16              31

    EA ← (RA) + EXTS(D)
    FRT ← DOUBLE(MEM(EA, 4))
    RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see Section 4.6.2 on page 111) and placed into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load Floating-Point Single with Update Indexed X-form

lfsux   FRT,RA,RB

    31      FRT     RA      RB      567     /
    0       6       11      16      21      31

    EA ← (RA) + (RB)
    FRT ← DOUBLE(MEM(EA, 4))
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see Section 4.6.2 on page 111) and placed into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load Floating-Point Double D-form

lfd     FRT,D(RA)

    50      FRT     RA      D
    0       6       11      16              31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    FRT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+D.

The doubleword in storage addressed by EA is placed into register FRT.

Special Registers Altered:
    None

Load Floating-Point Double Indexed X-form

lfdx    FRT,RA,RB

    31      FRT     RA      RB      599     /
    0       6       11      16      21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    FRT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(RB).

The doubleword in storage addressed by EA is placed into register FRT.

Special Registers Altered:
    None

Load Floating-Point Double with Update D-form

lfdu    FRT,D(RA)

    51      FRT     RA      D
    0       6       11      16              31

    EA ← (RA) + EXTS(D)
    FRT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The doubleword in storage addressed by EA is placed into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load Floating-Point Double with Update Indexed X-form

lfdux   FRT,RA,RB

    31      FRT     RA      RB      631     /
    0       6       11      16      21      31

    EA ← (RA) + (RB)
    FRT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The doubleword in storage addressed by EA is placed into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

4.6.3 Floating-Point Store Instructions

There are three basic forms of store instruction: single-precision, double-precision, and integer. The integer form is provided by the Store Floating-Point as Integer Word instruction, described on page 117. Because the FPRs support only floating-point double format for floating-point data, single-precision Store Floating-Point instructions convert double-precision data to single format prior to storing the operand into storage. The conversion steps are as follows.

Let WORD[0:31] be the word in storage written to.

No Denormalization Required (includes Zero / Infinity / NaN)
    if FRS[1:11] > 896 or FRS[1:63] = 0 then
        WORD[0:1] ← FRS[0:1]
        WORD[2:31] ← FRS[5:34]

Denormalization Required
    if 874 ≤ FRS[1:11] ≤ 896 then
        sign ← FRS[0]
        exp ← FRS[1:11] - 1023
        frac[0:52] ← 0b1 || FRS[12:63]
        denormalize operand
            do while exp < -126
                frac[0:52] ← 0b0 || frac[0:51]
                exp ← exp + 1
        WORD[0] ← sign
        WORD[1:8] ← 0x00
        WORD[9:31] ← frac[1:23]
    else WORD ← undefined

Notice that if the value to be stored by a single-precision Store Floating-Point instruction is larger in magnitude than the maximum number representable in single format, the first case above (No Denormalization Required) applies. The result stored in WORD is then a well-defined value, but is not numerically equal to the value in the source register (i.e., the result of a single-precision Load Floating-Point from WORD will not compare equal to the contents of the original source register).

For double-precision Store Floating-Point instructions and for the Store Floating-Point as Integer Word instruction no conversion is required, as the data from the FPR are copied directly into storage.

Many of the Store Floating-Point instructions have an "update" form, in which register RA is updated with the effective address. For these forms, if RA≠0, the effective address is placed into register RA.

Note: Recall that RA and RB denote General Purpose Registers, while FRS denotes a Floating-Point Register.

Store Floating-Point Single D-form

stfs    FRS,D(RA)

    52      FRS     RA      D
    0       6       11      16              31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 4) ← SINGLE((FRS))

Let the effective address (EA) be the sum (RA|0)+D.

The contents of register FRS are converted to single format (see Section 4.6.3 on page 114) and stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Floating-Point Single Indexed X-form

stfsx   FRS,RA,RB

    31      FRS     RA      RB      663     /
    0       6       11      16      21      31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← SINGLE((FRS))

Let the effective address (EA) be the sum (RA|0)+(RB).

The contents of register FRS are converted to single format (see Section 4.6.3 on page 114) and stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Floating-Point Single with Update D-form

stfsu   FRS,D(RA)

    53      FRS     RA      D
    0       6       11      16              31

    EA ← (RA) + EXTS(D)
    MEM(EA, 4) ← SINGLE((FRS))
    RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The contents of register FRS are converted to single format (see Section 4.6.3 on page 114) and stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Floating-Point Single with Update Indexed X-form

stfsux  FRS,RA,RB

    31      FRS     RA      RB      695     /
    0       6       11      16      21      31

    EA ← (RA) + (RB)
    MEM(EA, 4) ← SINGLE((FRS))
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The contents of register FRS are converted to single format (see Section 4.6.3 on page 114) and stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None
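The double-to-single conversion used by the single-precision Store Floating-Point instructions (the SINGLE function described in Section 4.6.3) can be sketched in Python on raw bit patterns. This is an illustrative model only: the function name single_word_from_double is hypothetical, the architecture's bit 0 is mapped to the highest-order bit of a Python integer, and the architecturally undefined case is represented by returning None.

```python
import struct


def single_word_from_double(frs: int):
    """Return the 32-bit word stored for a 64-bit FPR image, or None if undefined."""
    exp_field = (frs >> 52) & 0x7FF                       # FRS[1:11]

    if exp_field > 896 or (frs & 0x7FFFFFFFFFFFFFFF) == 0:
        # No Denormalization Required (includes Zero / Infinity / NaN).
        top2 = (frs >> 62) & 0b11                         # FRS[0:1]
        mid30 = (frs >> 29) & 0x3FFFFFFF                  # FRS[5:34]
        return (top2 << 30) | mid30

    if 874 <= exp_field <= 896:
        # Denormalization Required.
        sign = (frs >> 63) & 1                            # FRS[0]
        exp = exp_field - 1023
        frac = (1 << 52) | (frs & ((1 << 52) - 1))        # 0b1 || FRS[12:63]
        while exp < -126:
            frac >>= 1                                    # 0b0 || frac[0:51]
            exp += 1
        # WORD[1:8] is zero; WORD[9:31] receives frac[1:23].
        return (sign << 31) | ((frac >> 29) & 0x7FFFFF)

    return None                                           # WORD is undefined
```

Note that the sketch truncates rather than rounds, exactly as the steps above do; values that are not representable in single format are expected to have been rounded beforehand (for example by frsp).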
Floating-Point Processor [Category: Floating-Point] 115 Version 2.04 Store Floating-Point Double D-form Store Floating-Point Double Indexed X-form stfd FRS,D(RA) stfdx FRS,RA,RB 54 FRS RA D 0 6 11 16 31 31 FRS RA RB 727 / 0 6 11 16 21 31 if RA = 0 then b 1 0 else b 1 (RA) if RA = 0 then b 1 0 EA 1 b + EXTS(D) else b 1 (RA) MEM(EA, 8) 1 (FRS) EA 1 b + (RB) MEM(EA, 8) 1 (FRS) Let the effective address (EA) be the sum (RA|0)+D. Let the effective address (EA) be the sum (RA|0)+(RB). The contents of register FRS are stored into the dou- bleword in storage addressed by EA. The contents of register FRS are stored into the dou- bleword in storage addressed by EA. Special Registers Altered: None Special Registers Altered: None Store Floating-Point Double with Update Store Floating-Point Double with Update D-form Indexed X-form stfdu FRS,D(RA) stfdux FRS,RA,RB 55 FRS RA D 31 FRS RA RB 759 / 0 6 11 16 31 0 6 11 16 21 31 EA 1 (RA) + EXTS(D) EA 1 (RA) + (RB) MEM(EA, 8) 1 (FRS) MEM(EA, 8) 1 (FRS) RA 1 EA RA 1 EA Let the effective address (EA) be the sum (RA)+D. Let the effective address (EA) be the sum (RA)+(RB). The contents of register FRS are stored into the dou- The contents of register FRS are stored into the dou- bleword in storage addressed by EA. bleword in storage addressed by EA. EA is placed into register RA. EA is placed into register RA. If RA=0, the instruction form is invalid. If RA=0, the instruction form is invalid. Special Registers Altered: Special Registers Altered: None None 116 Power ISATM -- Book I Version 2.04 Store Floating-Point as Integer Word Indexed X-form stfiwx FRS,RA,RB 31 FRS RA RB 983 / 0 6 11 16 21 31 if RA = 0 then b 1 0 else b 1 (RA) EA 1 b + (RB) MEM(EA, 4) 1 (FRS)32:63 Let the effective address (EA) be the sum (RA|0)+(RB). The contents of the low-order 32 bits of register FRS are stored, without conversion, into the word in storage addressed by EA. 
If the contents of register FRS were produced, either directly or indirectly, by a Load Floating-Point Single instruction, a single-precision Arithmetic instruction, or frsp, then the value stored is undefined. (The contents of register FRS are produced directly by such an instruction if FRS is the target register for the instruc- tion. The contents of register FRS are produced indi- rectly by such an instruction if FRS is the final target register of a sequence of one or more Floating-Point Move instructions, with the input to the sequence hav- ing been produced directly by such an instruction.) Special Registers Altered: None Chapter 4. Floating-Point Processor [Category: Floating-Point] 117 Version 2.04 4.6.4 Floating-Point Move Instructions These instructions copy data from one floating-point (e.g., the sign bit of a NaN may be altered by fneg, register to another, altering the sign bit (bit 0) as fabs, and fnabs). These instructions do not alter the described below for fneg, fabs, and fnabs. These FPSCR. instructions treat NaNs just like any other kind of value Floating Move Register X-form Floating Negate X-form fmr FRT,FRB (Rc=0) fneg FRT,FRB (Rc=0) fmr. FRT,FRB (Rc=1) fneg. FRT,FRB (Rc=1) 63 FRT /// FRB 72 Rc 63 FRT /// FRB 40 Rc 0 6 11 16 21 31 0 6 11 16 21 31 The contents of register FRB are placed into register The contents of register FRB with bit 0 inverted are FRT. placed into register FRT. Special Registers Altered: Special Registers Altered: CR1 (if Rc=1) CR1 (if Rc=1) Floating Absolute Value X-form Floating Negative Absolute Value X-form fabs FRT,FRB (Rc=0) fnabs FRT,FRB (Rc=0) fabs. FRT,FRB (Rc=1) fnabs. FRT,FRB (Rc=1) 63 FRT /// FRB 264 Rc 63 FRT /// FRB 136 Rc 0 6 11 16 21 31 0 6 11 16 21 31 The contents of register FRB with bit 0 set to zero are The contents of register FRB with bit 0 set to one are placed into register FRT. placed into register FRT. 
Special Registers Altered: Special Registers Altered: CR1 (if Rc=1) CR1 (if Rc=1) 118 Power ISATM -- Book I Version 2.04 4.6.5 Floating-Point Arithmetic Instructions 4.6.5.1 Floating-Point Elementary Arithmetic Instructions Floating Add [Single] A-form Floating Subtract [Single] A-form fadd FRT,FRA,FRB (Rc=0) fsub FRT,FRA,FRB (Rc=0) fadd. FRT,FRA,FRB (Rc=1) fsub. FRT,FRA,FRB (Rc=1) 63 FRT FRA FRB /// 21 Rc 63 FRT FRA FRB /// 20 Rc 0 6 11 16 21 26 31 0 6 11 16 21 26 31 fadds FRT,FRA,FRB (Rc=0) fsubs FRT,FRA,FRB (Rc=0) fadds. FRT,FRA,FRB (Rc=1) fsubs. FRT,FRA,FRB (Rc=1) 59 FRT FRA FRB /// 21 Rc 59 FRT FRA FRB /// 20 Rc 0 6 11 16 21 26 31 0 6 11 16 21 26 31 The floating-point operand in register FRA is added to The floating-point operand in register FRB is subtracted the floating-point operand in register FRB. from the floating-point operand in register FRA. If the most significant bit of the resultant significand is If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed Rounding Control field RN of the FPSCR and placed into register FRT. into register FRT. Floating-point addition is based on exponent compari- The execution of the Floating Subtract instruction is son and addition of the two significands. The expo- identical to that of Floating Add, except that the con- nents of the two operands are compared, and the tents of FRB participate in the operation with the sign significand accompanying the smaller exponent is bit (bit 0) inverted. shifted right, with its exponent increased by one for FPSCRFPRF is set to the class and sign of the result, each bit shifted, until the two exponents are equal. 
The except for Invalid Operation Exceptions when two significands are then added or subtracted as FPSCRVE=1. appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand Special Registers Altered: as well as all three guard bits (G, R, and X) enter into FPRF FR FI the computation. FX OX UX XX VXSNAN VXISI If a carry occurs, the sum's significand is shifted right CR1 (if Rc=1) one bit position and the exponent is increased by one. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI CR1 (if Rc=1) Chapter 4. Floating-Point Processor [Category: Floating-Point] 119 Version 2.04 Floating Multiply [Single] A-form Floating Divide [Single] A-form fmul FRT,FRA,FRC (Rc=0) fdiv FRT,FRA,FRB (Rc=0) fmul. FRT,FRA,FRC (Rc=1) fdiv. FRT,FRA,FRB (Rc=1) 63 FRT FRA /// FRC 25 Rc 63 FRT FRA FRB /// 18 Rc 0 6 11 16 21 26 31 0 6 11 16 21 26 31 fmuls FRT,FRA,FRC (Rc=0) fdivs FRT,FRA,FRB (Rc=0) fmuls. FRT,FRA,FRC (Rc=1) fdivs. FRT,FRA,FRB (Rc=1) 59 FRT FRA /// FRC 25 Rc 59 FRT FRA FRB /// 18 Rc 0 6 11 16 21 26 31 0 6 11 16 21 26 31 The floating-point operand in register FRA is multiplied The floating-point operand in register FRA is divided by by the floating-point operand in register FRC. the floating-point operand in register FRB. The remain- der is not supplied as a result. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to If the most significant bit of the resultant significand is the target precision under control of the Floating-Point not 1, the result is normalized. The result is rounded to Rounding Control field RN of the FPSCR and placed the target precision under control of the Floating-Point into register FRT. Rounding Control field RN of the FPSCR and placed into register FRT. 
Floating-point multiplication is based on exponent addi- tion and multiplication of the significands. Floating-point division is based on exponent subtrac- tion and division of the significands. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRFPRF is set to the class and sign of the result, FPSCRVE=1. except for Invalid Operation Exceptions when FPSCRVE=1 and Zero Divide Exceptions when Special Registers Altered: FPSCRZE=1. FPRF FR FI FX OX UX XX Special Registers Altered: VXSNAN VXIMZ FPRF FR FI CR1 (if Rc=1) FX OX UX ZX XX VXSNAN VXIDI VXZDZ CR1 (if Rc=1) 120 Power ISATM -- Book I Version 2.04 Floating Square Root [Single] A-form Floating Reciprocal Estimate [Single] A-form fsqrt FRT,FRB (Rc=0) fsqrt. FRT,FRB (Rc=1) fre FRT,FRB (Rc=0) [Category:Floating-Point.Phased-In] 63 FRT /// FRB /// 22 Rc 0 6 11 16 21 26 31 fre. FRT,FRB (Rc=1) [Category: Floating-Point.Record.Phased-In] fsqrts FRT,FRB (Rc=0) fsqrts. FRT,FRB (Rc=1) 63 FRT /// FRB /// 24 Rc 0 6 11 16 21 26 31 59 FRT /// FRB /// 22 Rc 0 6 11 16 21 26 31 fres FRT,FRB (Rc=0) fres. FRT,FRB (Rc=1) The square root of the floating-point operand in register FRB is placed into register FRT. 59 FRT /// FRB /// 24 Rc 0 6 11 16 21 26 31 If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to A estimate of the reciprocal of the floating-point oper- the target precision under control of the Floating-Point and in register FRB is placed into register FRT. The Rounding Control field RN of the FPSCR and placed estimate placed into register FRT is correct to a preci- into register FRT. sion of one part in 256 of the reciprocal of (FRB), i.e., Operation with various special values of the operand is estimate ­ 1 / x 1- ABS(--------------------------------------) --------- - summarized below. 1/x 256 Operand Result Exception where x is the initial value in FRB. 
- QNaN1 VXSQRT Operation with various special values of the operand is <0 QNaN1 VXSQRT summarized below. -0 -0 None + + None Operand Result Exception SNaN QNaN1 VXSNAN - -0 None QNaN QNaN None -0 -1 ZX 1 No result if FPSCRVE = 1 +0 +1 ZX + +0 None FPSCRFPRF is set to the class and sign of the result, SNaN QNaN2 VXSNAN except for Invalid Operation Exceptions when QNaN QNaN None FPSCRVE=1. 1 No result if FPSCRZE = 1. 2 Special Registers Altered: No result if FPSCRVE = 1. FPRF FR FI FPSCRFPRF is set to the class and sign of the result, FX XX except for Invalid Operation Exceptions when VXSNAN VXSQRT FPSCRVE=1 and Zero Divide Exceptions when CR1 (if Rc=1) FPSCRZE=1. The results of executing this instruction may vary between implementations, and between different exe- cutions on the same implementation. Special Registers Altered: FPRF FR (undefined) FI (undefined) FX OX UX ZX XX (undefined) VXSNAN CR1 (if Rc=1) Chapter 4. Floating-Point Processor [Category: Floating-Point] 121 Version 2.04 Floating Reciprocal Square Root Estimate [Single] A-form frsqrte FRT,FRB (Rc=0) [Category:Floating-Point.Phased-In] frsqrte. FRT,FRB (Rc=1) [Category:Floating-Point.Record.Phased-In] 63 FRT /// FRB /// 26 Rc 0 6 11 16 21 26 31 frsqrtes FRT,FRB (Rc=0) frsqrtes. FRT,FRB (Rc=1) 59 FRT /// FRB /// 26 Rc 0 6 11 16 21 26 31 A estimate of the reciprocal of the square root of the floating-point operand in register FRB is placed into register FRT. The estimate placed into register FRT is correct to a precision of one part in 32 of the reciprocal of the square root of (FRB), i.e., ABS(estimate ­ 1 / ( x )) ----- 1- ----------------------------------------------- - 1 / ( x) 32 where x is the initial value in FRB. Operation with various special values of the operand is summarized below. Operand Result Exception - QNaN2 VXSQRT <0 QNaN2 VXSQRT -0 -1 ZX +0 +1 ZX + +0 None SNaN QNaN2 VXSNAN QNaN QNaN None 1 No result if FPSCR ZE = 1. 2 No result if FPSCR VE = 1. 
FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE=1 and Zero Divide Exceptions when FPSCR_ZE=1.

The results of executing this instruction may vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FPRF FR (undefined) FI (undefined)
    FX ZX XX (undefined)
    VXSNAN VXSQRT
    CR1                    (if Rc=1)

4.6.5.2 Floating-Point Multiply-Add Instructions

These instructions combine a multiply and an add operation without an intermediate rounding operation. The fraction part of the intermediate product is 106 bits wide (L bit, FRACTION), and all 106 bits take part in the add/subtract portion of the instruction.

Status bits are set as follows.

  • Overflow, Underflow, and Inexact Exception bits, the FR and FI bits, and the FPRF field are set based on the final result of the operation, and not on the result of the multiplication.
  • Invalid Operation Exception bits are set as if the multiplication and the addition were performed using two separate instructions (fmul[s], followed by fadd[s] or fsub[s]). That is, multiplication of infinity by 0 or of anything by an SNaN, and/or addition of an SNaN, cause the corresponding exception bits to be set.

Floating Multiply-Add [Single] A-form

    fmadd     FRT,FRA,FRC,FRB    (Rc=0)
    fmadd.    FRT,FRA,FRC,FRB    (Rc=1)

    63   FRT  FRA  FRB  FRC  29   Rc
    0    6    11   16   21   26   31

    fmadds    FRT,FRA,FRC,FRB    (Rc=0)
    fmadds.   FRT,FRA,FRC,FRB    (Rc=1)

    59   FRT  FRA  FRB  FRC  29   Rc
    0    6    11   16   21   26   31

The operation

    FRT ← [(FRA)×(FRC)] + (FRB)

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is added to this intermediate result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1                    (if Rc=1)

Floating Multiply-Subtract [Single] A-form

    fmsub     FRT,FRA,FRC,FRB    (Rc=0)
    fmsub.    FRT,FRA,FRC,FRB    (Rc=1)

    63   FRT  FRA  FRB  FRC  28   Rc
    0    6    11   16   21   26   31

    fmsubs    FRT,FRA,FRC,FRB    (Rc=0)
    fmsubs.   FRT,FRA,FRC,FRB    (Rc=1)

    59   FRT  FRA  FRB  FRC  28   Rc
    0    6    11   16   21   26   31

The operation

    FRT ← [(FRA)×(FRC)] - (FRB)

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is subtracted from this intermediate result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1                    (if Rc=1)

Floating Negative Multiply-Add [Single] A-form

    fnmadd    FRT,FRA,FRC,FRB    (Rc=0)
    fnmadd.   FRT,FRA,FRC,FRB    (Rc=1)

    63   FRT  FRA  FRB  FRC  31   Rc
    0    6    11   16   21   26   31

    fnmadds   FRT,FRA,FRC,FRB    (Rc=0)
    fnmadds.  FRT,FRA,FRC,FRB    (Rc=1)

    59   FRT  FRA  FRB  FRC  31   Rc
    0    6    11   16   21   26   31

The operation

    FRT ← - ( [(FRA)×(FRC)] + (FRB) )

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is added to this intermediate result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR, then negated and placed into register FRT.

This instruction produces the same result as would be obtained by using the Floating Multiply-Add instruction and then negating the result, with the following exceptions.

  • QNaNs propagate with no effect on their "sign" bit.
  • QNaNs that are generated as the result of a disabled Invalid Operation Exception have a "sign" bit of 0.
  • SNaNs that are converted to QNaNs as the result of a disabled Invalid Operation Exception retain the "sign" bit of the SNaN.

Floating Negative Multiply-Subtract [Single] A-form

    fnmsub    FRT,FRA,FRC,FRB    (Rc=0)
    fnmsub.   FRT,FRA,FRC,FRB    (Rc=1)

    63   FRT  FRA  FRB  FRC  30   Rc
    0    6    11   16   21   26   31

    fnmsubs   FRT,FRA,FRC,FRB    (Rc=0)
    fnmsubs.  FRT,FRA,FRC,FRB    (Rc=1)

    59   FRT  FRA  FRB  FRC  30   Rc
    0    6    11   16   21   26   31

The operation

    FRT ← - ( [(FRA)×(FRC)] - (FRB) )

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is subtracted from this intermediate result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR, then negated and placed into register FRT.

This instruction produces the same result as would be obtained by using the Floating Multiply-Subtract instruction and then negating the result, with the following exceptions.

  • QNaNs propagate with no effect on their "sign" bit.
  • QNaNs that are generated as the result of a disabled Invalid Operation Exception have a "sign" bit of 0.
  • SNaNs that are converted to QNaNs as the result of a disabled Invalid Operation Exception retain the "sign" bit of the SNaN.
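The effect of omitting the intermediate rounding, as described in Section 4.6.5.2, can be sketched in Python by computing the product and sum exactly and rounding once at the end. This is a minimal sketch under stated assumptions: the helper names merely mirror the mnemonics, operands are finite doubles, only round-to-nearest is modelled, and neither the FPSCR status bits nor the NaN sign rules listed above are reproduced.

```python
from fractions import Fraction

def fmadd(a: float, c: float, b: float) -> float:
    """(a * c) + b with a single rounding at the end."""
    exact = Fraction(a) * Fraction(c) + Fraction(b)
    return float(exact)  # one correctly rounded conversion to double

def fmsub(a: float, c: float, b: float) -> float:
    """(a * c) - b with a single rounding at the end."""
    exact = Fraction(a) * Fraction(c) - Fraction(b)
    return float(exact)

def fnmadd(a: float, c: float, b: float) -> float:
    """-((a * c) + b); rounded first, then negated."""
    return -fmadd(a, c, b)

def fnmsub(a: float, c: float, b: float) -> float:
    """-((a * c) - b); rounded first, then negated."""
    return -fmsub(a, c, b)

a = 1.0 + 2.0**-30
c = 1.0 - 2.0**-30
# The exact product is 1 - 2**-60, which a separate multiply rounds to 1.0.
print(a * c + -1.0)        # two roundings
print(fmadd(a, c, -1.0))   # one rounding keeps the low-order bits
```

Exact rational arithmetic stands in for the 106-bit intermediate fraction here; it is a convenience of the sketch, not a description of how hardware holds the intermediate product.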
For fnmadd[s] and fnmsub[s] alike, FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1                    (if Rc=1)

4.6.6 Floating-Point Rounding and Conversion Instructions

Programming Note
    Examples of uses of these instructions to perform various conversions can be found in Section E.2, "Floating-Point Conversions [Category: Floating-Point]" on page 334.

4.6.6.1 Floating-Point Rounding Instruction

Floating Round to Single-Precision X-form

    frsp      FRT,FRB    (Rc=0)
    frsp.     FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  12   Rc
    0    6    11   16   21   31

The floating-point operand in register FRB is rounded to single-precision, using the rounding mode specified by FPSCR_RN, and placed into register FRT.

The rounding is described fully in Section A.1, "Floating-Point Round to Single-Precision Model" on page 299.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN
    CR1                    (if Rc=1)

4.6.6.2 Floating-Point Convert To/From Integer Instructions

Floating Convert To Integer Doubleword X-form

    fctid     FRT,FRB    (Rc=0)
    fctid.    FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  814  Rc
    0    6    11   16   21   31

The floating-point operand in register FRB is converted to a 64-bit signed fixed-point integer, using the rounding mode specified by FPSCR_RN, and placed into register FRT.

If the operand in FRB is greater than 2^63 - 1, then FRT is set to 0x7FFF_FFFF_FFFF_FFFF. If the operand in FRB is less than -2^63, then FRT is set to 0x8000_0000_0000_0000.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 303.

Except for enabled Invalid Operation Exceptions, FPSCR_FPRF is undefined. FPSCR_FR is set if the result is incremented when rounded. FPSCR_FI is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1                    (if Rc=1)

Floating Convert To Integer Doubleword with round toward Zero X-form

    fctidz    FRT,FRB    (Rc=0)
    fctidz.   FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  815  Rc
    0    6    11   16   21   31

The floating-point operand in register FRB is converted to a 64-bit signed fixed-point integer, using the rounding mode Round toward Zero, and placed into register FRT.

If the operand in FRB is greater than 2^63 - 1, then FRT is set to 0x7FFF_FFFF_FFFF_FFFF. If the operand in FRB is less than -2^63, then FRT is set to 0x8000_0000_0000_0000.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 303.

Except for enabled Invalid Operation Exceptions, FPSCR_FPRF is undefined. FPSCR_FR is set if the result is incremented when rounded. FPSCR_FI is set if the result is inexact.

Floating Convert To Integer Word X-form

    fctiw     FRT,FRB    (Rc=0)
    fctiw.    FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  14   Rc
    0    6    11   16   21   31

The floating-point operand in register FRB is converted to a 32-bit signed fixed-point integer, using the rounding mode specified by FPSCR_RN, and placed into FRT_32:63. The contents of FRT_0:31 are undefined.

If the operand in FRB is greater than 2^31 - 1, then bits 32:63 of FRT are set to 0x7FFF_FFFF. If the operand in FRB is less than -2^31, then bits 32:63 of FRT are set to 0x8000_0000.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 303.

Except for enabled Invalid Operation Exceptions, FPSCR_FPRF is undefined. FPSCR_FR is set if the result is incremented when rounded. FPSCR_FI is set if the result is inexact.
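The saturation behaviour of the convert-to-integer instructions can be sketched as follows for the round-toward-zero case. This is an illustrative sketch only: `convert_toward_zero` is a hypothetical helper, `bits=64` corresponds to the doubleword form and `bits=32` to the word form, and NaN operands, the FPSCR bits and the other rounding modes are left to Section A.2.

```python
import math

def convert_toward_zero(x: float, bits: int = 64) -> int:
    """Return the two's-complement bit pattern of x truncated to `bits` bits.

    Operands above 2**(bits-1) - 1 saturate to the largest positive
    value; operands below -2**(bits-1) saturate to the most negative.
    """
    if math.isnan(x):
        # NaN handling is defined by the full conversion model, not here.
        raise ValueError("NaN operands are outside the scope of this sketch")
    max_int = (1 << (bits - 1)) - 1
    min_int = -(1 << (bits - 1))
    if x > max_int:
        value = max_int
    elif x < min_int:
        value = min_int
    else:
        value = math.trunc(x)  # round toward zero
    return value & ((1 << bits) - 1)

print(hex(convert_toward_zero(2.0**63)))       # saturates high
print(hex(convert_toward_zero(-2.0**64)))      # saturates low
print(hex(convert_toward_zero(-1.5, bits=32)))
```

Returning the raw bit pattern rather than a signed value mirrors how the result appears in the target register.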
Special Registers Altered (fctidz and fctiw):
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1                    (if Rc=1)

Floating Convert To Integer Word with round toward Zero X-form

    fctiwz    FRT,FRB    (Rc=0)
    fctiwz.   FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  15   Rc
    0    6    11   16   21   31

The floating-point operand in register FRB is converted to a 32-bit signed fixed-point integer, using the rounding mode Round toward Zero, and placed into FRT_32:63. The contents of FRT_0:31 are undefined.

If the operand in FRB is greater than 2^31 - 1, then bits 32:63 of FRT are set to 0x7FFF_FFFF. If the operand in FRB is less than -2^31, then bits 32:63 of FRT are set to 0x8000_0000.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model".

Except for enabled Invalid Operation Exceptions, FPSCR_FPRF is undefined. FPSCR_FR is set if the result is incremented when rounded. FPSCR_FI is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1                    (if Rc=1)

Floating Convert From Integer Doubleword X-form

    fcfid     FRT,FRB    (Rc=0)
    fcfid.    FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  846  Rc
    0    6    11   16   21   31

The 64-bit signed fixed-point operand in register FRB is converted to an infinitely precise floating-point integer. The result of the conversion is rounded to double-precision, using the rounding mode specified by FPSCR_RN, and placed into register FRT.

The conversion is described fully in Section A.3, "Floating-Point Convert from Integer Model".

FPSCR_FPRF is set to the class and sign of the result. FPSCR_FR is set if the result is incremented when rounded. FPSCR_FI is set if the result is inexact.

Special Registers Altered:
    FPRF FR FI
    FX XX
    CR1                    (if Rc=1)

4.6.6.3 Floating Round to Integer Instructions [Category: Floating-Point.Phased-In]

The Floating Round to Integer instructions provide direct support for rounding functions found in high level languages. For example, frin, friz, frip, and frim implement C++ round(), trunc(), ceil(), and floor(), respectively. Note that frin does not implement the IEEE Round to Nearest function, which is often further described as "ties to even." The rounding performed by these instructions is described fully in Section A.4, "Floating-Point Round to Integer Model" on page 307.

Programming Note
    These instructions set FPSCR_FR FI to 0b00 regardless of whether the result is inexact or rounded because there is a desire to preserve the value of FPSCR_XX. Furthermore, it is believed that most programs do not need to know whether these rounding operations produce inexact or rounded results. If it is necessary to determine whether the result is inexact or rounded, software must compare the result with the original source operand.

Floating Round to Integer Nearest X-form

    frin      FRT,FRB    (Rc=0)
    frin.     FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  392  Rc
    0    6    11   16   21   31

The floating-point operand in register FRB is rounded to an integral value as follows, with the result placed into register FRT. If the sign of the operand is positive, (FRB) + 0.5 is truncated to an integral value, otherwise (FRB) - 0.5 is truncated to an integral value.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE = 1.

Floating Round to Integer Plus X-form

    frip      FRT,FRB    (Rc=0)
    frip.     FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  456  Rc
    0    6    11   16   21   31

The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward +infinity, and the result is placed into register FRT.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE = 1.
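The frin rule (add or subtract 0.5 according to the sign, then truncate) differs from IEEE ties-to-even, and a small Python sketch makes the difference visible. The function below borrows the mnemonic as its name purely for illustration; it assumes the addition of 0.5 is carried out exactly (modelled here with rational arithmetic), and it does not model NaN handling or any FPSCR updates.

```python
import math
from fractions import Fraction

def frin(x: float) -> float:
    """Round to an integral value, halfway cases away from zero."""
    if not math.isfinite(x):
        return x  # infinities pass through; NaN rules are not modelled
    half = Fraction(1, 2)
    if math.copysign(1.0, x) > 0:
        shifted = Fraction(x) + half   # exact, no intermediate rounding
    else:
        shifted = Fraction(x) - half
    # Truncate toward zero, then restore the operand's sign (keeps -0.0).
    return math.copysign(float(math.trunc(shifted)), x)

print(frin(2.5), frin(-2.5))   # halfway cases move away from zero
print(round(2.5))              # Python's round() uses ties-to-even
```

The exact addition matters for operands just below one half: adding 0.5 in double precision would round the sum up to 1.0 before truncation.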
Special Registers Altered (frin and frip):
    FPRF FR (set to 0) FI (set to 0)
    FX
    VXSNAN
    CR1                    (if Rc = 1)

Floating Round to Integer Toward Zero X-form

    friz      FRT,FRB    (Rc=0)
    friz.     FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  424  Rc
    0    6    11   16   21   31

The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward zero, and the result is placed into register FRT.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE = 1.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX
    VXSNAN
    CR1                    (if Rc = 1)

Floating Round to Integer Minus X-form

    frim      FRT,FRB    (Rc=0)
    frim.     FRT,FRB    (Rc=1)

    63   FRT  ///  FRB  488  Rc
    0    6    11   16   21   31

The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward -infinity, and the result is placed into register FRT.

FPSCR_FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR_VE = 1.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX
    VXSNAN
    CR1                    (if Rc = 1)

4.6.7 Floating-Point Compare Instructions

The floating-point Compare instructions compare the contents of two floating-point registers. Comparison ignores the sign of zero (i.e., regards +0 as equal to -0). The comparison can be ordered or unordered.

The comparison sets one bit in the designated CR field to 1 and the other three to 0. The FPCC is set in the same way.

The CR field and the FPCC are set as follows.

    Bit   Name   Description
    0     FL     (FRA) < (FRB)
    1     FG     (FRA) > (FRB)
    2     FE     (FRA) = (FRB)
    3     FU     (FRA) ? (FRB) (unordered)
Floating Compare Unordered X-form

    fcmpu     BF,FRA,FRB

    63   BF   //   FRA  FRB  0    /
    0    6    9    11   16   21   31

    if (FRA) is a NaN or
       (FRB) is a NaN then c ← 0b0001
    else if (FRA) < (FRB) then c ← 0b1000
    else if (FRA) > (FRB) then c ← 0b0100
    else c ← 0b0010
    FPCC ← c
    CR_4×BF:4×BF+3 ← c
    if (FRA) is an SNaN or
       (FRB) is an SNaN then
          VXSNAN ← 1

The floating-point operand in register FRA is compared to the floating-point operand in register FRB. The result of the compare is placed into CR field BF and the FPCC.

If either of the operands is a NaN, either quiet or signaling, then CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, then VXSNAN is set.

Special Registers Altered:
    CR field BF
    FPCC
    FX
    VXSNAN

Floating Compare Ordered X-form

    fcmpo     BF,FRA,FRB

    63   BF   //   FRA  FRB  32   /
    0    6    9    11   16   21   31

    if (FRA) is a NaN or
       (FRB) is a NaN then c ← 0b0001
    else if (FRA) < (FRB) then c ← 0b1000
    else if (FRA) > (FRB) then c ← 0b0100
    else c ← 0b0010
    FPCC ← c
    CR_4×BF:4×BF+3 ← c
    if (FRA) is an SNaN or
       (FRB) is an SNaN then
          VXSNAN ← 1
          if VE = 0 then VXVC ← 1
    else if (FRA) is a QNaN or
            (FRB) is a QNaN then VXVC ← 1

The floating-point operand in register FRA is compared to the floating-point operand in register FRB. The result of the compare is placed into CR field BF and the FPCC.

If either of the operands is a NaN, either quiet or signaling, then CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, then VXSNAN is set and, if Invalid Operation is disabled (VE=0), VXVC is set. If neither operand is a Signaling NaN but at least one operand is a Quiet NaN, then VXVC is set.

Special Registers Altered:
    CR field BF
    FPCC
    FX
    VXSNAN VXVC
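The four-bit condition code produced by both compare instructions can be sketched in Python as below. The function name `fp_compare_code` is hypothetical; the sketch covers only the value of c written to the CR field and the FPCC, and it does not distinguish signaling from quiet NaNs or model VXSNAN and VXVC.

```python
import math

def fp_compare_code(fra: float, frb: float) -> int:
    """Return c as a 4-bit value in the order FL, FG, FE, FU."""
    if math.isnan(fra) or math.isnan(frb):
        return 0b0001  # FU: unordered
    if fra < frb:
        return 0b1000  # FL
    if fra > frb:
        return 0b0100  # FG
    return 0b0010      # FE: equal, including +0 compared with -0

print(format(fp_compare_code(1.0, 2.0), "04b"))
print(format(fp_compare_code(-0.0, 0.0), "04b"))
print(format(fp_compare_code(float("nan"), 1.0), "04b"))
```

Exactly one bit is set for every pair of operands, which matches the statement that the comparison sets one bit to 1 and the other three to 0.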
4.6.8 Floating-Point Select Instruction

Floating Select A-form

    fsel      FRT,FRA,FRC,FRB    (Rc=0)
    fsel.     FRT,FRA,FRC,FRB    (Rc=1)

    63   FRT  FRA  FRB  FRC  23   Rc
    0    6    11   16   21   26   31

    if (FRA) ≥ 0.0 then FRT ← (FRC)
    else FRT ← (FRB)

The floating-point operand in register FRA is compared to the value zero. If the operand is greater than or equal to zero, register FRT is set to the contents of register FRC. If the operand is less than zero or is a NaN, register FRT is set to the contents of register FRB. The comparison ignores the sign of zero (i.e., regards +0 as equal to -0).

Special Registers Altered:
    CR1                    (if Rc=1)

Programming Note
    Examples of uses of this instruction can be found in Sections E.2, "Floating-Point Conversions [Category: Floating-Point]" on page 334 and E.3, "Floating-Point Selection [Category: Floating-Point]" on page 336.

    Warning: Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section E.3.4, "Notes" on page 336.

4.6.9 Floating-Point Status and Control Register Instructions

Every Floating-Point Status and Control Register instruction synchronizes the effects of all floating-point instructions executed by a given processor. Executing a Floating-Point Status and Control Register instruction ensures that all floating-point instructions previously initiated by the given processor have completed before the Floating-Point Status and Control Register instruction is initiated, and that no subsequent floating-point instructions are initiated by the given processor until the Floating-Point Status and Control Register instruction has completed. In particular:

  • All exceptions that will be caused by the previously initiated instructions are recorded in the FPSCR before the Floating-Point Status and Control Register instruction is initiated.
  • All invocations of the system floating-point enabled exception error handler that will be caused by the previously initiated instructions have occurred before the Floating-Point Status and Control Register instruction is initiated.
  • No subsequent floating-point instruction that depends on or alters the settings of any FPSCR bits is initiated until the Floating-Point Status and Control Register instruction has completed.

(Floating-point Storage Access instructions are not affected.)

Move From FPSCR X-form

    mffs      FRT    (Rc=0)
    mffs.     FRT    (Rc=1)

    63   FRT  ///  ///  583  Rc
    0    6    11   16   21   31

The contents of the FPSCR are placed into FRT_32:63. The contents of FRT_0:31 are undefined.

Special Registers Altered:
    CR1                    (if Rc=1)

Move to Condition Register from FPSCR X-form

    mcrfs     BF,BFA

    63   BF   //   BFA  //   ///  64   /
    0    6    9    11   14   16   21   31

The contents of FPSCR field BFA are copied to Condition Register field BF. All exception bits copied are set to 0 in the FPSCR. If the FX bit is copied, it is set to 0 in the FPSCR.

Special Registers Altered:
    CR field BF
    FX OX                          (if BFA=0)
    UX ZX XX VXSNAN                (if BFA=1)
    VXISI VXIDI VXZDZ VXIMZ        (if BFA=2)
    VXVC                           (if BFA=3)
    VXSOFT VXSQRT VXCVI            (if BFA=5)

Move To FPSCR Field Immediate X-form

    mtfsfi    BF,U    (Rc=0)
    mtfsfi.   BF,U    (Rc=1)

    63   BF   //   ///  U    /    134  Rc
    0    6    9    11   16   20   21   31

The value of the U field is placed into FPSCR field BF.

FPSCR_FX is altered only if BF=0.

Special Registers Altered:
    FPSCR field BF
    CR1                    (if Rc=1)

Programming Note
    When FPSCR_32:35 is specified, bits 32 (FX) and 35 (OX) are set to the values of U_0 and U_3 (i.e., even if this instruction causes OX to change from 0 to 1, FX is set from U_0 and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 33 and 34 (FEX and VX) are set according to the usual rule, given on page 95, and not from U_1:2.

Move To FPSCR Fields XFL-form

    mtfsf     FLM,FRB    (Rc=0)
    mtfsf.    FLM,FRB    (Rc=1)

    63   /    FLM  /    FRB  711  Rc
    0    6    7    15   16   21   31

The contents of bits 32:63 of register FRB are placed into the FPSCR under control of the field mask specified by FLM. The field mask identifies the 4-bit fields affected. Let i be an integer in the range 0-7. If FLM_i=1 then FPSCR field i (FPSCR bits 4×i+32:4×i+35) is set to the contents of the corresponding field of the low-order 32 bits of register FRB.

FPSCR_FX is altered only if FLM_0 = 1.

Special Registers Altered:
    FPSCR fields selected by mask
    CR1                    (if Rc=1)

Programming Note
    Updating fewer than all eight fields of the FPSCR may have substantially poorer performance on some implementations than updating all the fields.

Programming Note
    When FPSCR_32:35 is specified, bits 32 (FX) and 35 (OX) are set to the values of (FRB)_32 and (FRB)_35 (i.e., even if this instruction causes OX to change from 0 to 1, FX is set from (FRB)_32 and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 33 and 34 (FEX and VX) are set according to the usual rule, given on page 95, and not from (FRB)_33:34.

Move To FPSCR Bit 0 X-form

    mtfsb0    BT    (Rc=0)
    mtfsb0.   BT    (Rc=1)

    63   BT   ///  ///  70   Rc
    0    6    11   16   21   31

Bit BT+32 of the FPSCR is set to 0.

Special Registers Altered:
    FPSCR bit BT+32
    CR1                    (if Rc=1)

Programming Note
    Bits 33 and 34 (FEX and VX) cannot be explicitly reset.

Move To FPSCR Bit 1 X-form

    mtfsb1    BT    (Rc=0)
    mtfsb1.   BT    (Rc=1)

    63   BT   ///  ///  38   Rc
    0    6    11   16   21   31

Bit BT+32 of the FPSCR is set to 1.

Special Registers Altered:
    FPSCR bits BT+32 and FX
    CR1                    (if Rc=1)

Programming Note
    Bits 33 and 34 (FEX and VX) cannot be explicitly set.

Chapter 5. Vector Processor [Category: Vector]

5.1 Vector Processor Overview
5.2 Chapter Conventions
5.2.1 Description of Instruction Operation
5.3 Vector Processor Registers
5.3.1 Vector Registers
5.3.2 Vector Status and Control Register
5.3.3 VR Save Register
5.4 Vector Storage Access Operations
5.4.1 Accessing Unaligned Storage Operands
5.5 Vector Integer Operations
5.5.1 Integer Saturation
5.6 Vector Floating-Point Operations
5.6.1 Floating-Point Overview
5.6.2 Floating-Point Exceptions
5.6.2.1 NaN Operand Exception
5.6.2.2 Invalid Operation Exception
5.6.2.3 Zero Divide Exception
5.6.2.4 Log of Zero Exception
5.6.2.5 Overflow Exception
5.6.2.6 Underflow Exception
5.7 Vector Storage Access Instructions
5.7.1 Storage Access Exceptions
5.7.2 Vector Load Instructions
5.7.3 Vector Store Instructions
5.7.4 Vector Alignment Support Instructions
5.8 Vector Permute and Formatting Instructions
5.8.1 Vector Pack and Unpack Instructions
5.8.2 Vector Merge Instructions
5.8.3 Vector Splat Instructions
5.8.4 Vector Permute Instruction
5.8.5 Vector Select Instruction
5.8.6 Vector Shift Instructions
5.9 Vector Integer Instructions
5.9.1 Vector Integer Arithmetic Instructions
5.9.1.1 Vector Integer Add Instructions
5.9.1.2 Vector Integer Subtract Instructions
5.9.1.3 Vector Integer Multiply Instructions
5.9.1.4 Vector Integer Multiply-Add/Sum Instructions
5.9.1.5 Vector Integer Sum-Across Instructions
5.9.1.6 Vector Integer Average Instructions
5.9.1.7 Vector Integer Maximum and Minimum Instructions
5.9.2 Vector Integer Compare Instructions
5.9.3 Vector Logical Instructions
5.9.4 Vector Integer Rotate and Shift Instructions
5.10 Vector Floating-Point Instruction Set
5.10.1 Vector Floating-Point Arithmetic Instructions
5.10.2 Vector Floating-Point Maximum and Minimum Instructions
5.10.3 Vector Floating-Point Rounding and Conversion Instructions
5.10.4 Vector Floating-Point Compare Instructions
5.10.5 Vector Floating-Point Estimate Instructions
5.11 Vector Status and Control Register Instructions

5.1 Vector Processor Overview

This chapter describes the registers and instructions that make up the Vector Processor facility.

5.2 Chapter Conventions

5.2.1 Description of Instruction Operation

The following notation, in addition to that described in Section 1.3.2, is used in this chapter. Additional RTL functions are described in Appendix B.

Notation    Meaning

x?y:z       if the value of x is true, then the value of y, otherwise the value z.
+int        Integer addition.
+fp         Floating-point addition.
-fp         Floating-point subtraction.
×sui        Multiplication of a signed-integer (first operand) by an unsigned-integer (second operand).
×fp         Floating-point multiplication.
=int        Integer equals relation.
=fp         Floating-point equals relation.
<ui, ≤ui, >ui, ≥ui
            Unsigned-integer comparison relations.
<si, ≤si, >si, ≥si
            Signed-integer comparison relations.
<fp, ≤fp, >fp, ≥fp
            Floating-point comparison relations.
LENGTH( x ) Length of x, in bits. If x is the word "element", LENGTH( x ) is the length, in bits, of the element implied by the instruction mnemonic.
x << y      Result of shifting x left by y bits, filling vacated bits with zeros.
                b ← LENGTH(x)
                result ← (y < b) ? (x_y:b-1 || ʸ0) : ᵇ0
x >>ui y    Result of shifting x right by y bits, filling vacated bits with zeros.
                b ← LENGTH(x)
                result ← (y < b) ? (ʸ0 || x_0:(b-y)-1) : ᵇ0
Clamp(x, y, z)
            x is interpreted as a signed integer. If the value of x is less than y, then the value y is returned, else if the value of x is greater than z, the value z is returned, else the value x is returned.
                if (x < y) then
                    result ← y
                    VSCR_SAT ← 1
                else if (x > z) then
                    result ← z
                    VSCR_SAT ← 1
                else result ← x
RoundToSPIntCeil(x)
            The value x if x is a single-precision floating-point integer; otherwise the smallest single-precision floating-point integer that is greater than x.
RoundToSPIntFloor(x)
            The value x if x is a single-precision floating-point integer; otherwise the largest single-precision floating-point integer that is less than x.
RoundToSPIntNear(x)
            The value x if x is a single-precision floating-point integer; otherwise the single-precision floating-point integer that is nearest in value to x (in case of a tie, the even single-precision floating-point integer is used).
RoundToSPIntTrunc(x)
            The value x if x is a single-precision floating-point integer; otherwise the largest single-precision floating-point integer that is less than x if x>0, or the smallest single-precision floating-point integer that is greater than x if x<0.
RoundToNearSP(x)
            The single-precision floating-point number that is nearest in value to the infinitely-precise floating-point intermediate result x (in case of a tie, the single-precision floating-point value with the least-significant bit equal to 0 is used).
ReciprocalEstimateSP(x)
            A single-precision floating-point estimate of the reciprocal of the single-precision floating-point number x.
ReciprocalSquareRootEstimateSP(x)
            A single-precision floating-point estimate
x >> y      Result of shifting x right by y bits, filling vacated bits with copies of bit 0 (sign bit) of x.
                b ← LENGTH(x)
                result ← (y>ui ( shb || 0b000 )

The contents of VRA are shifted right by the number of bytes specified in (VRB)_121:124.
  - Bytes shifted out of byte 15 are lost.
  - Zeros are supplied to the vacated bytes on the left.

The result is placed into VRT.

Special Registers Altered:
    None

    do i=0 to 127 by 8
        t ← t & ((VRB)_i+5:i+7 = sh)
    if t=1 then VRT ← (VRA) >>ui sh
    else VRT ← undefined

The contents of VRA are shifted right by the number of bits specified in (VRB)_125:127.
  - Bits shifted out of bit 127 are lost.
  - Zeros are supplied to the vacated bits on the left.

The result is placed into VRT, except if, for any byte element in register VRB, the low-order 3 bits are not equal to the shift amount, then VRT is undefined.

Special Registers Altered:
    None

Programming Note
    A double-register shift by a dynamically specified number of bits (0-127) can be performed in six instructions. The following example shifts Vw || Vx left by the number of bits specified in Vy and places the high-order 128 bits of the result into Vz.

        vslo     Vt1,Vw,Vy     #shift high-order reg left
        vsl      Vt1,Vt1,Vy
        vsububm  Vt3,V0,Vy     #adjust shift count ((V0)=0)
        vsro     Vt2,Vx,Vt3    #shift low-order reg right
        vsr      Vt2,Vt2,Vt3
        vor      Vz,Vt1,Vt2    #merge to get final result
5.9 Vector Integer Instructions

5.9.1 Vector Integer Arithmetic Instructions

5.9.1.1 Vector Integer Add Instructions

Vector Add and Write Carry-out Unsigned Word VX-form

    vaddcuw   VRT,VRA,VRB

    4    VRT  VRA  VRB  384
    0    6    11   16   21   31

    do i=0 to 127 by 32
        aop ← EXTZ((VRA)_i:i+31)
        bop ← EXTZ((VRB)_i:i+31)
        VRT_i:i+31 ← Chop( ( aop +int bop ) >>ui 32, 1)

For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. The carry out of the 32-bit sum is zero-extended to 32 bits and placed into word element i of VRT.

Special Registers Altered:
    None

Vector Add Signed Byte Saturate VX-form

    vaddsbs   VRT,VRA,VRB

    4    VRT  VRA  VRB  768
    0    6    11   16   21   31

    do i=0 to 127 by 8
        aop ← EXTS((VRA)_i:i+7)
        bop ← EXTS((VRB)_i:i+7)
        VRT_i:i+7 ← Clamp( aop +int bop, -128, 127 )_24:31

For each vector element i from 0 to 15, do the following.
    Signed-integer byte element i in VRA is added to signed-integer byte element i in VRB.
      - If the sum is greater than 127 the result saturates to 127.
      - If the sum is less than -128 the result saturates to -128.
    The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
    SAT

Vector Add Signed Halfword Saturate VX-form

    vaddshs   VRT,VRA,VRB

    4    VRT  VRA  VRB  832
    0    6    11   16   21   31

    do i=0 to 127 by 16
        aop ← EXTS((VRA)_i:i+15)
        bop ← EXTS((VRB)_i:i+15)
        VRT_i:i+15 ← Clamp(aop +int bop, -2^15, 2^15-1)_16:31

For each vector element i from 0 to 7, do the following.
    Signed-integer halfword element i in VRA is added to signed-integer halfword element i in VRB.
      - If the sum is greater than 2^15-1 the result saturates to 2^15-1.
      - If the sum is less than -2^15 the result saturates to -2^15.
    The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
    SAT

Vector Add Signed Word Saturate VX-form

    vaddsws   VRT,VRA,VRB

    4    VRT  VRA  VRB  896
    0    6    11   16   21   31

    do i=0 to 127 by 32
        aop ← EXTS((VRA)_i:i+31)
        bop ← EXTS((VRB)_i:i+31)
        VRT_i:i+31 ← Clamp(aop +int bop, -2^31, 2^31-1)

For each vector element i from 0 to 3, do the following.
    Signed-integer word element i in VRA is added to signed-integer word element i in VRB.
      - If the sum is greater than 2^31-1 the result saturates to 2^31-1.
      - If the sum is less than -2^31 the result saturates to -2^31.
    The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
    SAT

Vector Add Unsigned Byte Modulo VX-form

    vaddubm   VRT,VRA,VRB

    4    VRT  VRA  VRB  0
    0    6    11   16   21   31

    do i=0 to 127 by 8
        aop ← EXTZ((VRA)_i:i+7)
        bop ← EXTZ((VRB)_i:i+7)
        VRT_i:i+7 ← Chop( aop +int bop, 8 )

For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB.
    The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
    None

Programming Note
    vaddubm can be used for unsigned or signed-integers.

Vector Add Unsigned Halfword Modulo VX-form

    vadduhm   VRT,VRA,VRB

    4    VRT  VRA  VRB  64
    0    6    11   16   21   31

    do i=0 to 127 by 16
        aop ← EXTZ((VRA)_i:i+15)
        bop ← EXTZ((VRB)_i:i+15)
        VRT_i:i+15 ← Chop( aop +int bop, 16 )

For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB.
    The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
    None

Programming Note
    vadduhm can be used for unsigned or signed-integers.

Vector Add Unsigned Word Modulo VX-form

    vadduwm   VRT,VRA,VRB

    4    VRT  VRA  VRB  128
    0    6    11   16   21   31

    do i=0 to 127 by 32
        aop ← EXTZ((VRA)_i:i+31)
        bop ← EXTZ((VRB)_i:i+31)
        temp ← aop +int bop
        VRT_i:i+31 ← Chop( aop +int bop, 32 )

For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB.
    The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
    None

Programming Note
    vadduwm can be used for unsigned or signed-integers.

Vector Add Unsigned Byte Saturate VX-form

    vaddubs   VRT,VRA,VRB

    4    VRT  VRA  VRB  512
    0    6    11   16   21   31

    do i=0 to 127 by 8
        aop ← EXTZ((VRA)_i:i+7)
        bop ← EXTZ((VRB)_i:i+7)
        VRT_i:i+7 ← Clamp( aop +int bop, 0, 255 )_24:31

For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB.
      - If the sum is greater than 255 the result saturates to 255.
    The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
    SAT

Vector Add Unsigned Halfword Saturate VX-form

    vadduhs   VRT,VRA,VRB

    4    VRT  VRA  VRB  576
    0    6    11   16   21   31

    do i=0 to 127 by 16
        aop ← EXTZ((VRA)_i:i+15)
        bop ← EXTZ((VRB)_i:i+15)
        VRT_i:i+15 ← Clamp(aop +int bop, 0, 2^16-1)_16:31

For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB.
      - If the sum is greater than 2^16-1 the result saturates to 2^16-1.
    The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
    SAT

Vector Add Unsigned Word Saturate VX-form

    vadduws   VRT,VRA,VRB

    4    VRT  VRA  VRB  640
    0    6    11   16   21   31

    do i=0 to 127 by 32
        aop ← EXTZ((VRA)_i:i+31)
        bop ← EXTZ((VRB)_i:i+31)
        VRT_i:i+31 ← Clamp(aop +int bop, 0, 2^32-1)

For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB.
      - If the sum is greater than 2^32-1 the result saturates to 2^32-1.
    The low-order 32 bits of the result are placed into word element i of VRT.
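The saturating add instructions all follow the same pattern: widen the elements, add, then clamp to the element's range and note whether clamping occurred. A minimal Python sketch of the byte forms follows. It assumes each vector is given as a list of already-interpreted integers (0 to 255 for the unsigned form, -128 to 127 for the signed form), and it returns a flag indicating whether this operation would set SAT rather than modelling the VSCR itself; the helper names are illustrative only.

```python
def clamp(x: int, lo: int, hi: int) -> tuple[int, bool]:
    """Model of Clamp(x, y, z): return (result, saturated)."""
    if x < lo:
        return lo, True
    if x > hi:
        return hi, True
    return x, False

def saturating_add(vra: list[int], vrb: list[int], lo: int, hi: int):
    """Element-wise add with saturation; returns (elements, sat_flag)."""
    result, sat = [], False
    for a, b in zip(vra, vrb):
        r, clipped = clamp(a + b, lo, hi)
        result.append(r)
        sat = sat or clipped
    return result, sat

def vaddubs(vra, vrb):
    return saturating_add(vra, vrb, 0, 255)      # unsigned bytes

def vaddsbs(vra, vrb):
    return saturating_add(vra, vrb, -128, 127)   # signed bytes

print(vaddubs([250] + [0] * 15, [10] + [1] * 15))
print(vaddsbs([-100, 100] + [0] * 14, [-100, 100] + [0] * 14))
```

The halfword and word forms differ only in the element count and the bounds passed to the clamp.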
Special Registers Altered:
   SAT

5.9.1.2 Vector Integer Subtract Instructions

Vector Subtract and Write Carry-Out Unsigned Word VX-form

vsubcuw VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1408
   0   6     11    16    21 31

do i=0 to 127 by 32
   aop ← (VRA)i:i+31
   bop ← (VRB)i:i+31
   temp ← (EXTZ(aop) +int EXTZ(¬bop) +int 1) >> 32
   VRTi:i+31 ← temp & 0x0000_0001

For each vector element i from 0 to 3, do the following.
   Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA. The complement of the borrow out of bit 0 of the 32-bit difference is zero-extended to 32 bits and placed into word element i of VRT.

Special Registers Altered:
   None

Vector Subtract Signed Byte Saturate VX-form

vsubsbs VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1792
   0   6     11    16    21 31

do i=0 to 127 by 8
   aop ← EXTS((VRA)i:i+7)
   bop ← EXTS((VRB)i:i+7)
   VRTi:i+7 ← Clamp(aop +int ¬bop +int 1, -128, 127)24:31

For each vector element i from 0 to 15, do the following.
   Signed-integer byte element i in VRB is subtracted from signed-integer byte element i in VRA.
   - If the intermediate result is greater than 127 the result saturates to 127.
   - If the intermediate result is less than -128 the result saturates to -128.

   The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
   SAT

Vector Subtract Signed Halfword Saturate VX-form

vsubshs VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1856
   0   6     11    16    21 31

do i=0 to 127 by 16
   aop ← EXTS((VRA)i:i+15)
   bop ← EXTS((VRB)i:i+15)
   VRTi:i+15 ← Clamp(aop +int ¬bop +int 1, -2¹⁵, 2¹⁵-1)16:31

For each vector element i from 0 to 7, do the following.
   Signed-integer halfword element i in VRB is subtracted from signed-integer halfword element i in VRA.
   - If the intermediate result is greater than 2¹⁵-1 the result saturates to 2¹⁵-1.
   - If the intermediate result is less than -2¹⁵ the result saturates to -2¹⁵.

   The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
   SAT

Vector Subtract Signed Word Saturate VX-form

vsubsws VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1920
   0   6     11    16    21 31

do i=0 to 127 by 32
   aop ← EXTS((VRA)i:i+31)
   bop ← EXTS((VRB)i:i+31)
   VRTi:i+31 ← Clamp(aop +int ¬bop +int 1, -2³¹, 2³¹-1)

For each vector element i from 0 to 3, do the following.
   Signed-integer word element i in VRB is subtracted from signed-integer word element i in VRA.
   - If the intermediate result is greater than 2³¹-1 the result saturates to 2³¹-1.
   - If the intermediate result is less than -2³¹ the result saturates to -2³¹.

   The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
   SAT

Vector Subtract Unsigned Byte Modulo VX-form

vsububm VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1024
   0   6     11    16    21 31

do i=0 to 127 by 8
   aop ← EXTZ((VRA)i:i+7)
   bop ← EXTZ((VRB)i:i+7)
   VRTi:i+7 ← Chop( aop +int ¬bop +int 1, 8 )

For each vector element i from 0 to 15, do the following.
   Unsigned-integer byte element i in VRB is subtracted from unsigned-integer byte element i in VRA. The low-order 8 bits of the result are placed into byte element i of VRT.

Vector Subtract Unsigned Halfword Modulo VX-form

vsubuhm VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1088
   0   6     11    16    21 31

do i=0 to 127 by 16
   aop ← EXTZ((VRA)i:i+15)
   bop ← EXTZ((VRB)i:i+15)
   VRTi:i+15 ← Chop( aop +int ¬bop +int 1, 16 )

For each vector element i from 0 to 7, do the following.
   Unsigned-integer halfword element i in VRB is subtracted from unsigned-integer halfword element i in VRA. The low-order 16 bits of the result are placed into halfword element i of VRT.
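The subtract instructions above are written in terms of aop +int ¬bop +int 1, that is, subtraction by adding the one's complement plus one. A minimal Python sketch of that identity follows, covering the carry-out form and the modulo form. The helper names and the fixed-width masking are assumptions made for illustration; they are not defined by the architecture.

```python
MASK32 = 0xFFFF_FFFF


def sub_carry_out_word(a, b):
    """Carry out of a + ~b + 1 for 32-bit unsigned a and b (vsubcuw-style).

    The result is 1 when no borrow occurs (a >= b) and 0 otherwise.
    """
    not_b = ~b & MASK32                    # ¬bop, confined to 32 bits
    return ((a + not_b + 1) >> 32) & 1     # bit just above the 32-bit sum


def sub_modulo(a, b, width):
    """Chop(a + ~b + 1, width): subtraction modulo 2**width."""
    mask = (1 << width) - 1
    return (a + (~b & mask) + 1) & mask


print(sub_carry_out_word(5, 3))   # 1
print(sub_carry_out_word(3, 5))   # 0
print(sub_modulo(3, 5, 8))        # 254
```

The carry-out is the complement of the borrow, which is why equal operands also produce 1.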
Special Registers Altered (vsububm and vsubuhm):
   None

Vector Subtract Unsigned Word Modulo VX-form

vsubuwm VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1152
   0   6     11    16    21 31

do i=0 to 127 by 32
   aop ← EXTZ((VRA)i:i+31)
   bop ← EXTZ((VRB)i:i+31)
   VRTi:i+31 ← Chop( aop +int ¬bop +int 1, 32 )

For each vector element i from 0 to 3, do the following.
   Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA. The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
   None

Vector Subtract Unsigned Byte Saturate VX-form

vsububs VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1536
   0   6     11    16    21 31

do i=0 to 127 by 8
   aop ← EXTZ((VRA)i:i+7)
   bop ← EXTZ((VRB)i:i+7)
   VRTi:i+7 ← Clamp(aop +int ¬bop +int 1, 0, 255)24:31

For each vector element i from 0 to 15, do the following.
   Unsigned-integer byte element i in VRB is subtracted from unsigned-integer byte element i in VRA. If the intermediate result is less than 0 the result saturates to 0. The low-order 8 bits of the result are placed into byte element i of VRT.

Special Registers Altered:
   SAT

Vector Subtract Unsigned Halfword Saturate VX-form

vsubuhs VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1600
   0   6     11    16    21 31

do i=0 to 127 by 16
   aop ← EXTZ((VRA)i:i+15)
   bop ← EXTZ((VRB)i:i+15)
   VRTi:i+15 ← Clamp(aop +int ¬bop +int 1, 0, 2¹⁶-1)16:31

For each vector element i from 0 to 7, do the following.
   Unsigned-integer halfword element i in VRB is subtracted from unsigned-integer halfword element i in VRA. If the intermediate result is less than 0 the result saturates to 0. The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered:
   SAT

Vector Subtract Unsigned Word Saturate VX-form

vsubuws VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1664
   0   6     11    16    21 31

do i=0 to 127 by 32
   aop ← EXTZ((VRA)i:i+31)
   bop ← EXTZ((VRB)i:i+31)
   VRTi:i+31 ← Clamp(aop +int ¬bop +int 1, 0, 2³²-1)

For each vector element i from 0 to 3, do the following.
   Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA.
   - If the intermediate result is less than 0 the result saturates to 0.

   The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
   SAT

5.9.1.3 Vector Integer Multiply Instructions

Vector Multiply Even Signed Byte VX-form

vmulesb VRT,VRA,VRB

   4 | VRT | VRA | VRB | 776
   0   6     11    16    21 31

do i=0 to 127 by 16
   prod ← EXTS((VRA)i:i+7) ×si EXTS((VRB)i:i+7)
   VRTi:i+15 ← Chop( prod, 16 )

For each vector element i from 0 to 7, do the following.
   Signed-integer byte element i×2 in VRA is multiplied by signed-integer byte element i×2 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.

Special Registers Altered:
   None

Vector Multiply Even Signed Halfword VX-form

vmulesh VRT,VRA,VRB

   4 | VRT | VRA | VRB | 840
   0   6     11    16    21 31

do i=0 to 127 by 32
   prod ← EXTS((VRA)i:i+15) ×si EXTS((VRB)i:i+15)
   VRTi:i+31 ← Chop( prod, 32 )

For each vector element i from 0 to 3, do the following.
   Signed-integer halfword element i×2 in VRA is multiplied by signed-integer halfword element i×2 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.

Special Registers Altered:
   None

Vector Multiply Even Unsigned Byte VX-form

vmuleub VRT,VRA,VRB

   4 | VRT | VRA | VRB | 520
   0   6     11    16    21 31

do i=0 to 127 by 16
   prod ← EXTZ((VRA)i:i+7) ×ui EXTZ((VRB)i:i+7)
   VRTi:i+15 ← Chop(prod, 16)

For each vector element i from 0 to 7, do the following.
   Unsigned-integer byte element i×2 in VRA is multiplied by unsigned-integer byte element i×2 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.

Special Registers Altered:
   None

Vector Multiply Even Unsigned Halfword VX-form

vmuleuh VRT,VRA,VRB

   4 | VRT | VRA | VRB | 584
   0   6     11    16    21 31

do i=0 to 127 by 32
   prod ← EXTZ((VRA)i:i+15) ×ui EXTZ((VRB)i:i+15)
   VRTi:i+31 ← Chop(prod, 32)

For each vector element i from 0 to 3, do the following.
   Unsigned-integer halfword element i×2 in VRA is multiplied by unsigned-integer halfword element i×2 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.

Special Registers Altered:
   None

Vector Multiply Odd Signed Byte VX-form

vmulosb VRT,VRA,VRB

   4 | VRT | VRA | VRB | 264
   0   6     11    16    21 31

do i=0 to 127 by 16
   prod ← EXTS((VRA)i+8:i+15) ×si EXTS((VRB)i+8:i+15)
   VRTi:i+15 ← Chop( prod, 16 )

For each vector element i from 0 to 7, do the following.
   Signed-integer byte element i×2+1 in VRA is multiplied by signed-integer byte element i×2+1 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.

Special Registers Altered:
   None

Vector Multiply Odd Signed Halfword VX-form

vmulosh VRT,VRA,VRB

   4 | VRT | VRA | VRB | 328
   0   6     11    16    21 31

do i=0 to 127 by 32
   prod ← EXTS((VRA)i+16:i+31) ×si EXTS((VRB)i+16:i+31)
   VRTi:i+31 ← Chop( prod, 32 )

For each vector element i from 0 to 3, do the following.
   Signed-integer halfword element i×2+1 in VRA is multiplied by signed-integer halfword element i×2+1 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.

Special Registers Altered:
   None

Vector Multiply Odd Unsigned Byte VX-form

vmuloub VRT,VRA,VRB

   4 | VRT | VRA | VRB | 8
   0   6     11    16    21 31

do i=0 to 127 by 16
   prod ← EXTZ((VRA)i+8:i+15) ×ui EXTZ((VRB)i+8:i+15)
   VRTi:i+15 ← Chop( prod, 16 )

For each vector element i from 0 to 7, do the following.
   Unsigned-integer byte element i×2+1 in VRA is multiplied by unsigned-integer byte element i×2+1 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.

Special Registers Altered:
   None

Vector Multiply Odd Unsigned Halfword VX-form

vmulouh VRT,VRA,VRB

   4 | VRT | VRA | VRB | 72
   0   6     11    16    21 31

do i=0 to 127 by 32
   prod ← EXTZ((VRA)i+16:i+31) ×ui EXTZ((VRB)i+16:i+31)
   VRTi:i+31 ← Chop( prod, 32 )

For each vector element i from 0 to 3, do the following.
   Unsigned-integer halfword element i×2+1 in VRA is multiplied by unsigned-integer halfword element i×2+1 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.

Special Registers Altered:
   None

5.9.1.4 Vector Integer Multiply-Add/Sum Instructions

Vector Multiply-High-Add Signed Halfword Saturate VA-form

vmhaddshs VRT,VRA,VRB,VRC

   4 | VRT | VRA | VRB | VRC | 32
   0   6     11    16    21    26 31

do i=0 to 127 by 16
   prod ← EXTS((VRA)i:i+15) ×si EXTS((VRB)i:i+15)
   sum ← (prod >>si 15) +int EXTS((VRC)i:i+15)
   VRTi:i+15 ← Clamp(sum, -2¹⁵, 2¹⁵-1)16:31

For each vector element i from 0 to 7, do the following.
   Signed-integer halfword element i in VRA is multiplied by signed-integer halfword element i in VRB, producing a 32-bit signed-integer product. Bits 0:16 of the product are added to signed-integer halfword element i in VRC.
   - If the intermediate result is greater than 2¹⁵-1 the result saturates to 2¹⁵-1.
   - If the intermediate result is less than -2¹⁵ the result saturates to -2¹⁵.

   The low-order 16 bits of the result are placed into halfword element i of VRT.

Vector Multiply-High-Round-Add Signed Halfword Saturate VA-form

vmhraddshs VRT,VRA,VRB,VRC

   4 | VRT | VRA | VRB | VRC | 33
   0   6     11    16    21    26 31

do i=0 to 127 by 16
   prod ← EXTS((VRA)i:i+15) ×si EXTS((VRB)i:i+15)
   sum ← ((prod +int 0x0000_4000) >>si 15) +int EXTS((VRC)i:i+15)
   VRTi:i+15 ← Clamp(sum, -2¹⁵, 2¹⁵-1)16:31

For each vector element i from 0 to 7, do the following.
   Signed-integer halfword element i in VRA is multiplied by signed-integer halfword element i in VRB, producing a 32-bit signed-integer product. The value 0x0000_4000 is added to the product, producing a 32-bit signed-integer sum. Bits 0:16 of the sum are added to signed-integer halfword element i in VRC.
   - If the intermediate result is greater than 2¹⁵-1 the result saturates to 2¹⁵-1.
   - If the intermediate result is less than -2¹⁵ the result saturates to -2¹⁵.
   The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered (vmhaddshs and vmhraddshs):
   SAT

Vector Multiply-Low-Add Unsigned Halfword Modulo VA-form

vmladduhm VRT,VRA,VRB,VRC

   4 | VRT | VRA | VRB | VRC | 34
   0   6     11    16    21    26 31

do i=0 to 127 by 16
   prod ← EXTZ((VRA)i:i+15) ×ui EXTZ((VRB)i:i+15)
   sum ← Chop( prod, 16 ) +int (VRC)i:i+15
   VRTi:i+15 ← Chop( sum, 16 )

For each vector element i from 0 to 7, do the following.
   Unsigned-integer halfword element i in VRA is multiplied by unsigned-integer halfword element i in VRB, producing a 32-bit unsigned-integer product. The low-order 16 bits of the product are added to unsigned-integer halfword element i in VRC.

   The low-order 16 bits of the sum are placed into halfword element i of VRT.

Special Registers Altered:
   None

Programming Note
   vmladduhm can be used for unsigned or signed integers.

Vector Multiply-Sum Unsigned Byte Modulo VA-form

vmsumubm VRT,VRA,VRB,VRC

   4 | VRT | VRA | VRB | VRC | 36
   0   6     11    16    21    26 31

do i=0 to 127 by 32
   temp ← EXTZ((VRC)i:i+31)
   do j=0 to 31 by 8
      prod ← EXTZ((VRA)i+j:i+j+7) ×ui EXTZ((VRB)i+j:i+j+7)
      temp ← temp +int prod
   VRTi:i+31 ← Chop( temp, 32 )

For each word element in VRT the following operations are performed, in the order shown.
   - Each of the four unsigned-integer byte elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer byte element in VRB, producing an unsigned-integer halfword product.
   - The sum of these four unsigned-integer halfword products is added to the unsigned-integer word element in VRC.
   - The unsigned-integer word result is placed into the corresponding word element of VRT.

Special Registers Altered:
   None
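The two modulo multiply-add forms described above can be modeled for a single element as follows. This is a sketch under stated assumptions: the function names are hypothetical, inputs are plain Python integers already confined to their unsigned element range, and only one halfword (vmladduhm) or one word (vmsumubm) of the 128-bit register is shown.

```python
def mladd_halfword_modulo(a, b, c):
    """One vmladduhm-style element: low 16 bits of (low 16 bits of a*b) + c."""
    prod = a * b                              # up to 32 bits wide
    return ((prod & 0xFFFF) + c) & 0xFFFF     # Chop(Chop(prod, 16) + c, 16)


def msum_unsigned_bytes_modulo(a_bytes, b_bytes, c_word):
    """One vmsumubm-style word: four byte products added to c_word, modulo 2**32."""
    assert len(a_bytes) == len(b_bytes) == 4
    temp = c_word
    for x, y in zip(a_bytes, b_bytes):
        temp += x * y                         # each product fits in a halfword
    return temp & 0xFFFF_FFFF                 # Chop(temp, 32)


print(mladd_halfword_modulo(0xFFFF, 0xFFFF, 1))                   # 2
print(msum_unsigned_bytes_modulo([1, 2, 3, 4], [5, 6, 7, 8], 100))  # 170
```

The first call also illustrates the Programming Note: reading 0xFFFF as -1, the result (-1) × (-1) + 1 = 2 is the same bit pattern, so the modulo form serves signed and unsigned operands alike.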
Vector Multiply-Sum Mixed Byte Modulo VA-form

vmsummbm VRT,VRA,VRB,VRC

   4 | VRT | VRA | VRB | VRC | 37
   0   6     11    16    21    26 31

do i=0 to 127 by 32
   temp ← (VRC)i:i+31
   do j=0 to 31 by 8
      prod0:15 ← (VRA)i+j:i+j+7 ×sui (VRB)i+j:i+j+7
      temp ← temp +int EXTS(prod)
   VRTi:i+31 ← temp

For each word element in VRT the following operations are performed, in the order shown.
   - Each of the four signed-integer byte elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer byte element in VRB, producing a signed-integer product.
   - The sum of these four signed-integer halfword products is added to the signed-integer word element in VRC.
   - The signed-integer result is placed into the corresponding word element of VRT.

Vector Multiply-Sum Signed Halfword Modulo VA-form

vmsumshm VRT,VRA,VRB,VRC

   4 | VRT | VRA | VRB | VRC | 40
   0   6     11    16    21    26 31

do i=0 to 127 by 32
   temp ← (VRC)i:i+31
   do j=0 to 31 by 16
      prod0:31 ← (VRA)i+j:i+j+15 ×si (VRB)i+j:i+j+15
      temp ← temp +int prod
   VRTi:i+31 ← temp

For each word element in VRT the following operations are performed, in the order shown.
   - Each of the two signed-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding signed-integer halfword element in VRB, producing a signed-integer product.
   - The sum of these two signed-integer word products is added to the signed-integer word element in VRC.
   - The signed-integer word result is placed into the corresponding word element of VRT.
Special Registers Altered (vmsummbm and vmsumshm):
   None

Vector Multiply-Sum Signed Halfword Saturate VA-form

vmsumshs VRT,VRA,VRB,VRC

   4 | VRT | VRA | VRB | VRC | 41
   0   6     11    16    21    26 31

do i=0 to 127 by 32
   temp ← EXTS((VRC)i:i+31)
   do j=0 to 31 by 16
      prod ← EXTS((VRA)i+j:i+j+15) ×si EXTS((VRB)i+j:i+j+15)
      temp ← temp +int prod
   VRTi:i+31 ← Clamp(temp, -2³¹, 2³¹-1)

For each word element in VRT the following operations are performed, in the order shown.
   - Each of the two signed-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding signed-integer halfword element in VRB, producing a signed-integer product.
   - The sum of these two signed-integer word products is added to the signed-integer word element in VRC.
   - If the intermediate result is greater than 2³¹-1 the result saturates to 2³¹-1 and if it is less than -2³¹ it saturates to -2³¹.
   - The result is placed into the corresponding word element of VRT.

Special Registers Altered:
   SAT

Vector Multiply-Sum Unsigned Halfword Modulo VA-form

vmsumuhm VRT,VRA,VRB,VRC

   4 | VRT | VRA | VRB | VRC | 38
   0   6     11    16    21    26 31

do i=0 to 127 by 32
   temp ← EXTZ((VRC)i:i+31)
   do j=0 to 31 by 16
      prod ← EXTZ((VRA)i+j:i+j+15) ×ui EXTZ((VRB)i+j:i+j+15)
      temp ← temp +int prod
   VRTi:i+31 ← Chop( temp, 32 )

For each word element in VRT the following operations are performed, in the order shown.
   - Each of the two unsigned-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer halfword element in VRB, producing an unsigned-integer word product.
   - The sum of these two unsigned-integer word products is added to the unsigned-integer word element in VRC.
   - The unsigned-integer result is placed into the corresponding word element of VRT.

Special Registers Altered:
   None
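The saturating signed multiply-sum described above differs from its modulo counterpart only in the final step. The Python sketch below models one word element; the helper names, the use of raw bit patterns as inputs, and the returned flag (a stand-in for the SAT bit) are assumptions made for illustration rather than anything defined by the architecture.

```python
def to_signed(value, width):
    """Interpret a width-bit pattern as a two's-complement integer (EXTS)."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def msum_signed_halfwords(a_half, b_half, c_word, saturate):
    """One word of a vmsumshs-style (saturate=True) or vmsumshm-style sum.

    a_half and b_half each hold two 16-bit patterns; c_word is a 32-bit pattern.
    Returns (32-bit result pattern, sat flag).
    """
    temp = to_signed(c_word, 32)
    for x, y in zip(a_half, b_half):
        temp += to_signed(x, 16) * to_signed(y, 16)
    if saturate:
        clamped = max(-(1 << 31), min((1 << 31) - 1, temp))   # Clamp(temp, -2**31, 2**31-1)
        return clamped & 0xFFFF_FFFF, clamped != temp
    return temp & 0xFFFF_FFFF, False                          # keep the low-order 32 bits


worst = [0x8000, 0x8000]   # two halfwords holding -32768
print(msum_signed_halfwords(worst, worst, 0, saturate=True))   # (2147483647, True)
print(msum_signed_halfwords(worst, worst, 0, saturate=False))  # (2147483648, False)
```

The chosen operands give two products of 2³⁰ each, so the sum of 2³¹ is the smallest value that overflows the signed word range.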
Vector Multiply-Sum Unsigned Halfword Saturate VA-form

vmsumuhs VRT,VRA,VRB,VRC

   4 | VRT | VRA | VRB | VRC | 39
   0   6     11    16    21    26 31

do i=0 to 127 by 32
   temp ← EXTZ((VRC)i:i+31)
   do j=0 to 31 by 16
      prod ← EXTZ((VRA)i+j:i+j+15) ×ui EXTZ((VRB)i+j:i+j+15)
      temp ← temp +int prod
   VRTi:i+31 ← Clamp(temp, 0, 2³²-1)

For each word element in VRT the following operations are performed, in the order shown.
   - Each of the two unsigned-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer halfword element in VRB, producing an unsigned-integer product.
   - The sum of these two unsigned-integer word products is added to the unsigned-integer word element in VRC.
   - If the intermediate result is greater than 2³²-1 the result saturates to 2³²-1.
   - The result is placed into the corresponding word element of VRT.

Special Registers Altered:
   SAT

5.9.1.5 Vector Integer Sum-Across Instructions

Vector Sum across Signed Word Saturate VX-form

vsumsws VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1928
   0   6     11    16    21 31

temp ← EXTS((VRB)96:127)
do i=0 to 127 by 32
   temp ← temp +int EXTS((VRA)i:i+31)
VRT0:31 ← 0x0000_0000
VRT32:63 ← 0x0000_0000
VRT64:95 ← 0x0000_0000
VRT96:127 ← Clamp(temp, -2³¹, 2³¹-1)

The sum of the four signed-integer word elements in VRA is added to signed-integer word element 3 of VRB.
   - If the intermediate result is greater than 2³¹-1 the result saturates to 2³¹-1.
   - If the intermediate result is less than -2³¹ the result saturates to -2³¹.

The low-order 32 bits of the result are placed into word element 3 of VRT.

Word elements 0 to 2 of VRT are set to 0.

Special Registers Altered:
   SAT

Vector Sum across Half Signed Word Saturate VX-form

vsum2sws VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1672
   0   6     11    16    21 31

do i=0 to 127 by 64
   temp ← EXTS((VRB)i+32:i+63)
   do j=0 to 63 by 32
      temp ← temp +int EXTS((VRA)i+j:i+j+31)
   VRTi:i+63 ← 0x0000_0000 || Clamp(temp, -2³¹, 2³¹-1)

Word elements 0 and 2 of VRT are set to 0.

The sum of the signed-integer word elements 0 and 1 in VRA is added to the signed-integer word element in bits 32:63 of VRB.
   - If the intermediate result is greater than 2³¹-1 the result saturates to 2³¹-1.
   - If the intermediate result is less than -2³¹ the result saturates to -2³¹.

The low-order 32 bits of the result are placed into word element 1 of VRT.

The sum of signed-integer word elements 2 and 3 in VRA is added to the signed-integer word element in bits 96:127 of VRB.
   - If the intermediate result is greater than 2³¹-1 the result saturates to 2³¹-1.
   - If the intermediate result is less than -2³¹ the result saturates to -2³¹.

The low-order 32 bits of the result are placed into word element 3 of VRT.

Special Registers Altered:
   SAT

Vector Sum across Quarter Signed Byte Saturate VX-form

vsum4sbs VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1800
   0   6     11    16    21 31

do i=0 to 127 by 32
   temp ← EXTS((VRB)i:i+31)
   do j=0 to 31 by 8
      temp ← temp +int EXTS((VRA)i+j:i+j+7)
   VRTi:i+31 ← Clamp(temp, -2³¹, 2³¹-1)

For each vector element i from 0 to 3, do the following.
   The sum of the four signed-integer byte elements contained in word element i of VRA is added to signed-integer word element i in VRB.
   - If the intermediate result is greater than 2³¹-1 the result saturates to 2³¹-1.
   - If the intermediate result is less than -2³¹ the result saturates to -2³¹.

   The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
   SAT

Vector Sum across Quarter Signed Halfword Saturate VX-form

vsum4shs VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1608
   0   6     11    16    21 31

do i=0 to 127 by 32
   temp ← EXTS((VRB)i:i+31)
   do j=0 to 31 by 16
      temp ← temp +int EXTS((VRA)i+j:i+j+15)
   VRTi:i+31 ← Clamp(temp, -2³¹, 2³¹-1)

For each vector element i from 0 to 3, do the following.
   The sum of the two signed-integer halfword elements contained in word element i of VRA is added to signed-integer word element i in VRB.
   - If the intermediate result is greater than 2³¹-1 the result saturates to 2³¹-1.
   - If the intermediate result is less than -2³¹ the result saturates to -2³¹.

   The low-order 32 bits of the result are placed into the corresponding word element of VRT.

Special Registers Altered:
   SAT

Vector Sum across Quarter Unsigned Byte Saturate VX-form

vsum4ubs VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1544
   0   6     11    16    21 31

do i=0 to 127 by 32
   temp ← EXTZ((VRB)i:i+31)
   do j=0 to 31 by 8
      temp ← temp +int EXTZ((VRA)i+j:i+j+7)
   VRTi:i+31 ← Clamp( temp, 0, 2³²-1 )

For each vector element i from 0 to 3, do the following.
   The sum of the four unsigned-integer byte elements contained in word element i of VRA is added to unsigned-integer word element i in VRB.
   - If the intermediate result is greater than 2³²-1 it saturates to 2³²-1.

   The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
   SAT

5.9.1.6 Vector Integer Average Instructions

Vector Average Signed Byte VX-form

vavgsb VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1282
   0   6     11    16    21 31

do i=0 to 127 by 8
   aop ← EXTS((VRA)i:i+7)
   bop ← EXTS((VRB)i:i+7)
   VRTi:i+7 ← Chop(( aop +int bop +int 1 ) >> 1, 8)

For each vector element i from 0 to 15, do the following.
   Signed-integer byte element i in VRA is added to signed-integer byte element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

   The low-order 8 bits of the result are placed into byte element i of VRT.

Vector Average Signed Halfword VX-form

vavgsh VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1346
   0   6     11    16    21 31

do i=0 to 127 by 16
   aop ← EXTS((VRA)i:i+15)
   bop ← EXTS((VRB)i:i+15)
   VRTi:i+15 ← Chop(( aop +int bop +int 1 ) >> 1, 16)

For each vector element i from 0 to 7, do the following.
   Signed-integer halfword element i in VRA is added to signed-integer halfword element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.
   The low-order 16 bits of the result are placed into halfword element i of VRT.

Special Registers Altered (vavgsb and vavgsh):
   None

Vector Average Signed Word VX-form

vavgsw VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1410
   0   6     11    16    21 31

do i=0 to 127 by 32
   aop ← EXTS((VRA)i:i+31)
   bop ← EXTS((VRB)i:i+31)
   VRTi:i+31 ← Chop(( aop +int bop +int 1 ) >> 1, 32)

For each vector element i from 0 to 3, do the following.
   Signed-integer word element i in VRA is added to signed-integer word element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

   The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
   None

Vector Average Unsigned Byte VX-form

vavgub VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1026
   0   6     11    16    21 31

do i=0 to 127 by 8
   aop ← EXTZ((VRA)i:i+7)
   bop ← EXTZ((VRB)i:i+7)
   VRTi:i+7 ← Chop((aop +int bop +int 1) >>ui 1, 8)

For each vector element i from 0 to 15, do the following.
   Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

   The low-order 8 bits of the result are placed into byte element i of VRT.

Vector Average Unsigned Halfword VX-form

vavguh VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1090
   0   6     11    16    21 31

do i=0 to 127 by 16
   aop ← EXTZ((VRA)i:i+15)
   bop ← EXTZ((VRB)i:i+15)
   VRTi:i+15 ← Chop((aop +int bop +int 1) >>ui 1, 16)

For each vector element i from 0 to 7, do the following.
   Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

   The low-order 16 bits of the result are placed into halfword element i of VRT.
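The average instructions add 1 before shifting, so a sum that is odd rounds upward rather than being truncated. A minimal Python sketch of one element follows. The function names are hypothetical, the operands are ordinary Python integers in the element's range, and the signed form assumes that the right shift in the RTL is an arithmetic (sign-preserving) shift.

```python
def avg_unsigned(a, b, width):
    """(a + b + 1) >> 1 on unsigned width-bit operands, kept to width bits."""
    return ((a + b + 1) >> 1) & ((1 << width) - 1)


def avg_signed(a, b):
    """(a + b + 1) >> 1 on signed operands.

    Python's >> on a negative int is an arithmetic shift, which is the
    behavior assumed here for the signed instruction forms.
    """
    return (a + b + 1) >> 1


print(avg_unsigned(1, 2, 8))      # 2
print(avg_unsigned(255, 255, 8))  # 255
print(avg_signed(-3, -4))         # -3
```

Because the sum is formed in a wider intermediate before the shift, the largest operands (255 and 255 for bytes) average to themselves instead of overflowing.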
Special Registers Altered (vavgub and vavguh):
   None

Vector Average Unsigned Word VX-form

vavguw VRT,VRA,VRB

   4 | VRT | VRA | VRB | 1154
   0   6     11    16    21 31

do i=0 to 127 by 32
   aop ← EXTZ((VRA)i:i+31)
   bop ← EXTZ((VRB)i:i+31)
   VRTi:i+31 ← Chop((aop +int bop +int 1) >>ui 1, 32)

For each vector element i from 0 to 3, do the following.
   Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.

   The low-order 32 bits of the result are placed into word element i of VRT.

Special Registers Altered:
   None

5.9.1.7 Vector Integer Maximum and Minimum Instructions

Vector Maximum Signed Byte VX-form

vmaxsb VRT,VRA,VRB

   4 | VRT | VRA | VRB | 258
   0   6     11    16    21 31

do i=0 to 127 by 8
   aop ← EXTS((VRA)i:i+7)
   bop ← EXTS((VRB)i:i+7)
   VRTi:i+7 ← ( aop >si bop ) ? (VRA)i:i+7 : (VRB)i:i+7

For each vector element i from 0 to 15, do the following.
   Signed-integer byte element i in VRA is compared to signed-integer byte element i in VRB. The larger of the two values is placed into byte element i of VRT.

Special Registers Altered:
   None

Vector Maximum Signed Halfword VX-form

vmaxsh VRT,VRA,VRB

   4 | VRT | VRA | VRB | 322
   0   6     11    16    21 31

do i=0 to 127 by 16
   aop ← EXTS((VRA)i:i+15)
   bop ← EXTS((VRB)i:i+15)
   VRTi:i+15 ← ( aop >si bop ) ? (VRA)i:i+15 : (VRB)i:i+15

For each vector element i from 0 to 7, do the following.
   Signed-integer halfword element i in VRA is compared to signed-integer halfword element i in VRB. The larger of the two values is placed into halfword element i of VRT.

Special Registers Altered:
   None

Vector Maximum Signed Word VX-form

vmaxsw VRT,VRA,VRB

   4 | VRT | VRA | VRB | 386
   0   6     11    16    21 31

do i=0 to 127 by 32
   aop ← EXTS((VRA)i:i+31)
   bop ← EXTS((VRB)i:i+31)
   VRTi:i+31 ← ( aop >si bop ) ? (VRA)i:i+31 : (VRB)i:i+31

For each vector element i from 0 to 3, do the following.
   Signed-integer word element i in VRA is compared to signed-integer word element i in VRB.
   The larger of the two values is placed into word element i of VRT.

Special Registers Altered:
   None

Vector Maximum Unsigned Byte VX-form

vmaxub VRT,VRA,VRB

   4 | VRT | VRA | VRB | 2
   0   6     11    16    21 31

do i=0 to 127 by 8
   aop ← EXTZ((VRA)i:i+7)
   bop ← EXTZ((VRB)i:i+7)
   VRTi:i+7 ← (aop >ui bop) ? (VRA)i:i+7 : (VRB)i:i+7

For each vector element i from 0 to 15, do the following.
   Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. The larger of the two values is placed into byte element i of VRT.

Special Registers Altered:
   None

Vector Maximum Unsigned Halfword VX-form

vmaxuh VRT,VRA,VRB

   4 | VRT | VRA | VRB | 66
   0   6     11    16    21 31

do i=0 to 127 by 16
   aop ← EXTZ((VRA)i:i+15)
   bop ← EXTZ((VRB)i:i+15)
   VRTi:i+15 ← (aop >ui bop) ? (VRA)i:i+15 : (VRB)i:i+15

For each vector element i from 0 to 7, do the following.
   Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element i in VRB. The larger of the two values is placed into halfword element i of VRT.

Special Registers Altered:
   None

Vector Maximum Unsigned Word VX-form

vmaxuw VRT,VRA,VRB

   4 | VRT | VRA | VRB | 130
   0   6     11    16    21 31

do i=0 to 127 by 32
   aop ← EXTZ((VRA)i:i+31)
   bop ← EXTZ((VRB)i:i+31)
   VRTi:i+31 ← (aop >ui bop) ? (VRA)i:i+31 : (VRB)i:i+31

For each vector element i from 0 to 3, do the following.
   Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. The larger of the two values is placed into word element i of VRT.

Special Registers Altered:
   None

Vector Minimum Signed Halfword VX-form

vminsh VRT,VRA,VRB

   4 | VRT | VRA | VRB | 834
   0   6     11    16    21 31

do i=0 to 127 by 16
   aop ← EXTS((VRA)i:i+15)

Vector Minimum Signed Byte VX-form

vminsb VRT,VRA,VRB

   4 | VRT | VRA | VRB | 770
   0   6     11    16    21 31

do i=0 to 127 by 8
   aop ← EXTS((VRA)i:i+7)
   bop ← EXTS((VRB)i:i+7)
   VRTi:i+7 ← (aop si (VRB)i:i+7) ?
      ⁸1 : ⁸0
if Rc=1 then do
   t ← (VRT=¹²⁸1)
   f ← (VRT=¹²⁸0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 15, do the following.
   Signed-integer byte element i in VRA is compared to signed-integer byte element i in VRB. Byte element i in VRT is set to all 1s if signed-integer byte element i in VRA is greater than signed-integer byte element i in VRB, and is set to all 0s otherwise.

Special Registers Altered:
   CR6 (if Rc=1)

if Rc=1 then do
   t ← (VRT=¹²⁸1)
   f ← (VRT=¹²⁸0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
   Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. Word element i in VRT is set to all 1s if unsigned-integer word element i in VRA is equal to unsigned-integer word element i in VRB, and is set to all 0s otherwise.

Special Registers Altered:
   CR6 (if Rc=1)

Vector Compare Greater Than Signed Halfword VC-form

vcmpgtsh VRT,VRA,VRB (Rc=0)
vcmpgtsh. VRT,VRA,VRB (Rc=1)

   4 | VRT | VRA | VRB | Rc | 838
   0   6     11    16    21   22 31

do i=0 to 127 by 16
   VRTi:i+15 ← ((VRA)i:i+15 >si (VRB)i:i+15) ? ¹⁶1 : ¹⁶0
if Rc=1 then do
   t ← (VRT=¹²⁸1)
   f ← (VRT=¹²⁸0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 7, do the following.
   Signed-integer halfword element i in VRA is compared to signed-integer halfword element i in VRB. Halfword element i in VRT is set to all 1s if signed-integer halfword element i in VRA is greater than signed-integer halfword element i in VRB, and is set to all 0s otherwise.

Special Registers Altered:
   CR6 (if Rc=1)

Vector Compare Greater Than Signed Word VC-form

vcmpgtsw VRT,VRA,VRB (Rc=0)
vcmpgtsw. VRT,VRA,VRB (Rc=1)

   4 | VRT | VRA | VRB | Rc | 902
   0   6     11    16    21   22 31

do i=0 to 127 by 32
   VRTi:i+31 ← ((VRA)i:i+31 >si (VRB)i:i+31) ? ³²1 : ³²0
if Rc=1 then do
   t ← (VRT=¹²⁸1)
   f ← (VRT=¹²⁸0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
   Signed-integer word element i in VRA is compared to signed-integer word element i in VRB. Word element i in VRT is set to all 1s if signed-integer word element i in VRA is greater than signed-integer word element i in VRB, and is set to all 0s otherwise.

Special Registers Altered:
   CR6 (if Rc=1)

Vector Compare Greater Than Unsigned Byte VC-form

vcmpgtub VRT,VRA,VRB (Rc=0)
vcmpgtub. VRT,VRA,VRB (Rc=1)

   4 | VRT | VRA | VRB | Rc | 518
   0   6     11    16    21   22 31

do i=0 to 127 by 8
   VRTi:i+7 ← ((VRA)i:i+7 >ui (VRB)i:i+7) ? ⁸1 : ⁸0
if Rc=1 then do
   t ← (VRT=¹²⁸1)
   f ← (VRT=¹²⁸0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 15, do the following.
   Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. Byte element i in VRT is set to all 1s if unsigned-integer byte element i in VRA is greater than unsigned-integer byte element i in VRB, and is set to all 0s otherwise.

Special Registers Altered:
   CR6 (if Rc=1)

Vector Compare Greater Than Unsigned Halfword VC-form

vcmpgtuh VRT,VRA,VRB (Rc=0)
vcmpgtuh. VRT,VRA,VRB (Rc=1)

   4 | VRT | VRA | VRB | Rc | 582
   0   6     11    16    21   22 31

do i=0 to 127 by 16
   VRTi:i+15 ← ((VRA)i:i+15 >ui (VRB)i:i+15) ? ¹⁶1 : ¹⁶0
if Rc=1 then do
   t ← (VRT=¹²⁸1)
   f ← (VRT=¹²⁸0)
   CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 7, do the following.
   Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element i in VRB. Halfword element i in VRT is set to all 1s if unsigned-integer halfword element i in VRA is greater than unsigned-integer halfword element i in VRB, and is set to all 0s otherwise.

Special Registers Altered:
   CR6 (if Rc=1)

Vector Compare Greater Than Unsigned Word VC-form

vcmpgtuw VRT,VRA,VRB (Rc=0)
vcmpgtuw.
VRT,VRA,VRB (Rc=1) 4 VRT VRA VRB Rc 646 0 6 11 16 21 22 31 do i=0 to 127 by 32 VRTi:i+31 1 ((VRA)i:i+31 >ui (VRB)i:i+31) ? 321 : 320 if Rc=1 then do t 1 (VRT=1281) f 1 (VRT=1280) CR6 1 t || 0b0 || f || 0b0 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is com- pared to unsigned-integer word element i in VRB. Word element i in VRT is set to all 1s if unsigned-integer word element i in VRA is greater than to unsigned-integer word element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1) Chapter 5. Vector Processor [Category: Vector] 183 Version 2.04 5.9.3 Vector Logical Instructions Extended mnemonics for vector logi- Vector Logical AND with Complement cal operations VX-form Extended mnemonics are provided that use the Vector vandc VRT,VRA,VRB OR and Vector NOR instructions to copy the contents of one Vector Register to another, with and without 4 VRT VRA VRB 1092 complementing. These are shown as examples with the 0 6 11 16 21 31 two instructions. Vector Move Register VRT 1 (VRA) & ¬(VRB) Several vector instructions can be coded in a way The contents of VRA are ANDed with the complement such that they simply copy the contents of one of the contents of VRB and the result is placed into Vector Register to another. An extended mne- VRT. monic is provided to convey the idea that no com- Special Registers Altered: putation is being performed but merely data None movement (from one register to another). The following instruction copies the contents of Vector Logical NOR VX-form register Vy to register Vx. vnor VRT,VRA,VRB vmr Vx,Vy (equivalent to: vor Vx,Vy,Vy) 4 VRT VRA VRB 1284 Vector Complement Register 0 6 11 16 21 31 The Vector NOR instruction can be coded in a way such that it complements the contents of one Vec- VRT 1 ¬( (VRA) | (VRB) ) tor Register and places the result into another Vec- The contents of VRA are ORed with the contents of tor Register. 
An extended mnemonic is provided VRB and the complemented result is placed into VRT. that allows this operation to be coded easily. Special Registers Altered: The following instruction complements the con- None tents of register Vy and places the result into regis- ter Vx. Vector Logical OR VX-form vnot Vx,Vy (equivalent to: vnor Vx,Vy,Vy) vor VRT,VRA,VRB Vector Logical AND VX-form 4 VRT VRA VRB 1156 vand VRT,VRA,VRB 0 6 11 16 21 31 4 VRT VRA VRB 1028 VRT 1 (VRA) | (VRB) 0 6 11 16 21 31 The contents of VRA are ORed with the contents of VRT 1 (VRA) & (VRB) VRB and the result is placed into VRT. The contents of VRA are ANDed with the contents of Special Registers Altered: VRB and the result is placed into VRT. None Special Registers Altered: Vector Logical XOR VX-form None vxor VRT,VRA,VRB 4 VRT VRA VRB 1220 0 6 11 16 21 31 VRT 1 (VRA) (VRB) The contents of VRA are XORed with the contents of VRB and the result is placed into VRT. Special Registers Altered: None 184 Power ISATM -- Book I Version 2.04 5.9.4 Vector Integer Rotate and Shift Instructions Vector Rotate Left Byte VX-form Vector Rotate Left Halfword VX-form vrlb VRT,VRA,VRB vrlh VRT,VRA,VRB 4 VRT VRA VRB 4 4 VRT VRA VRB 68 0 6 11 16 21 31 0 6 11 16 21 31 do i=0 to 127 by 8 do i=0 to 127 by 16 sh 1 (VRB)i+5:i+7 sh 1 (VRB)i+12:i+15 VRTi:i+7 1 (VRA)i:i+7 <<< sh VRTi:i+15 1 (VRA)i:i+15 <<< sh For each vector element i from 0 to 15, do the following. For each vector element i from 0 to 7, do the following. Byte element i in VRA is rotated left by the number Halfword element i in VRA is rotated left by the of bits specified in the low-order 3 bits of the corre- number of bits specified in the low-order 4 bits of sponding byte element i in VRB. the corresponding halfword element i in VRB. The result is placed into byte element i in VRT. The result is placed into halfword element i in VRT. 
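
The element-wise rotate described above can be modeled in a few lines of Python. This is a minimal sketch and not part of the architecture: the helper names rotl and vrl are hypothetical, and a Vector Register is represented as a plain list of unsigned element values (16 bytes, 8 halfwords, or 4 words).

```python
def rotl(value, sh, width):
    """Rotate an unsigned `width`-bit value left by `sh` bits."""
    mask = (1 << width) - 1
    value &= mask
    sh %= width
    return ((value << sh) | (value >> (width - sh))) & mask


def vrl(vra, vrb, width):
    """Model of vrlb/vrlh/vrlw for element width 8, 16, or 32.

    Only the low-order log2(width) bits of each VRB element are used
    as the rotate count, as stated in the instruction descriptions.
    """
    count_mask = width - 1  # 0b111, 0b1111, or 0b11111
    return [rotl(a, b & count_mask, width) for a, b in zip(vra, vrb)]


# Byte rotate: 0x81 rotated left by 1 wraps the top bit around.
print(hex(vrl([0x81], [1], 8)[0]))
# A count of 9 uses only its low-order 3 bits, so it also rotates by 1.
print(hex(vrl([0x81], [9], 8)[0]))
```

Because the count is masked before use, a VRB element equal to the element width behaves as a rotate by zero in this model.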
Special Registers Altered:
    None

Vector Rotate Left Word VX-form

vrlw   VRT,VRA,VRB

    4      VRT    VRA    VRB    132
    0      6      11     16     21     31

    do i=0 to 127 by 32
       sh ← (VRB)i+27:i+31
       VRTi:i+31 ← (VRA)i:i+31 <<< sh

For each vector element i from 0 to 3, do the following.
    Word element i in VRA is rotated left by the number of bits specified in the low-order 5 bits of the corresponding word element i in VRB. The result is placed into word element i in VRT.

Special Registers Altered:
    None

Vector Shift Left Byte VX-form

vslb   VRT,VRA,VRB

    4      VRT    VRA    VRB    260
    0      6      11     16     21     31

    do i=0 to 127 by 8
       sh ← (VRB)i+5:i+7
       VRTi:i+7 ← (VRA)i:i+7 << sh

For each vector element i from 0 to 15, do the following.
    Byte element i in VRA is shifted left by the number of bits specified in the low-order 3 bits of byte element i in VRB.
    - Bits shifted out of bit 0 are lost.
    - Zeros are supplied to the vacated bits on the right.
    The result is placed into byte element i of VRT.

Special Registers Altered:
    None

Vector Shift Left Halfword VX-form

vslh   VRT,VRA,VRB

    4      VRT    VRA    VRB    324
    0      6      11     16     21     31

    do i=0 to 127 by 16
       sh ← (VRB)i+12:i+15
       VRTi:i+15 ← (VRA)i:i+15 << sh

For each vector element i from 0 to 7, do the following.
    Halfword element i in VRA is shifted left by the number of bits specified in the low-order 4 bits of halfword element i in VRB.
    - Bits shifted out of bit 0 are lost.
    - Zeros are supplied to the vacated bits on the right.
    The result is placed into halfword element i of VRT.

Special Registers Altered:
    None

Vector Shift Left Word VX-form

vslw   VRT,VRA,VRB

    4      VRT    VRA    VRB    388
    0      6      11     16     21     31

    do i=0 to 127 by 32
       sh ← (VRB)i+27:i+31
       VRTi:i+31 ← (VRA)i:i+31 << sh

For each vector element i from 0 to 3, do the following.
    Word element i in VRA is shifted left by the number of bits specified in the low-order 5 bits of word element i in VRB.
    - Bits shifted out of bit 0 are lost.
    - Zeros are supplied to the vacated bits on the right.
    The result is placed into word element i of VRT.

Special Registers Altered:
    None

Vector Shift Right Byte VX-form

vsrb   VRT,VRA,VRB

    4      VRT    VRA    VRB    516
    0      6      11     16     21     31

    do i=0 to 127 by 8
       sh ← (VRB)i+5:i+7
       VRTi:i+7 ← (VRA)i:i+7 >>ui sh

For each vector element i from 0 to 15, do the following.
    Byte element i in VRA is shifted right by the number of bits specified in the low-order 3 bits of byte element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into byte element i of VRT.

Special Registers Altered:
    None

Vector Shift Right Halfword VX-form

vsrh   VRT,VRA,VRB

    4      VRT    VRA    VRB    580
    0      6      11     16     21     31

    do i=0 to 127 by 16
       sh ← (VRB)i+12:i+15
       VRTi:i+15 ← (VRA)i:i+15 >>ui sh

For each vector element i from 0 to 7, do the following.
    Halfword element i in VRA is shifted right by the number of bits specified in the low-order 4 bits of halfword element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into halfword element i of VRT.

Special Registers Altered:
    None

Vector Shift Right Word VX-form

vsrw   VRT,VRA,VRB

    4      VRT    VRA    VRB    644
    0      6      11     16     21     31

    do i=0 to 127 by 32
       sh ← (VRB)i+27:i+31
       VRTi:i+31 ← (VRA)i:i+31 >>ui sh

For each vector element i from 0 to 3, do the following.
    Word element i in VRA is shifted right by the number of bits specified in the low-order 5 bits of word element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into word element i of VRT.

Special Registers Altered:
    None

Vector Shift Right Algebraic Byte VX-form

vsrab  VRT,VRA,VRB

    4      VRT    VRA    VRB    772
    0      6      11     16     21     31

    do i=0 to 127 by 8
       sh ← (VRB)i+5:i+7
       VRTi:i+7 ← (VRA)i:i+7 >>si sh

For each vector element i from 0 to 15, do the following.
    Byte element i in VRA is shifted right by the number of bits specified in the low-order 3 bits of the corresponding byte element i in VRB. Bits shifted out of bit 7 of the byte element are lost. Bit 0 of the byte element is replicated to fill the vacated bits on the left. The result is placed into byte element i of VRT.

Special Registers Altered:
    None

Vector Shift Right Algebraic Halfword VX-form

vsrah  VRT,VRA,VRB

    4      VRT    VRA    VRB    836
    0      6      11     16     21     31

    do i=0 to 127 by 16
       sh ← (VRB)i+12:i+15
       VRTi:i+15 ← (VRA)i:i+15 >>si sh

For each vector element i from 0 to 7, do the following.
    Halfword element i in VRA is shifted right by the number of bits specified in the low-order 4 bits of the corresponding halfword element i in VRB. Bits shifted out of bit 15 of the halfword are lost. Bit 0 of the halfword is replicated to fill the vacated bits on the left. The result is placed into halfword element i of VRT.

Special Registers Altered:
    None

Vector Shift Right Algebraic Word VX-form

vsraw  VRT,VRA,VRB

    4      VRT    VRA    VRB    900
    0      6      11     16     21     31

    do i=0 to 127 by 32
       sh ← (VRB)i+27:i+31
       VRTi:i+31 ← (VRA)i:i+31 >>si sh

For each vector element i from 0 to 3, do the following.
    Word element i in VRA is shifted right by the number of bits specified in the low-order 5 bits of the corresponding word element i in VRB. Bits shifted out of bit 31 of the word are lost. Bit 0 of the word is replicated to fill the vacated bits on the left. The result is placed into word element i of VRT.
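
The difference between the logical and algebraic right shifts described above is easiest to see side by side. The following Python model is a sketch under stated assumptions, not a definitive implementation: the function names vsl, vsr, and vsra are hypothetical stand-ins for the byte, halfword, and word forms, and each Vector Register is modeled as a list of unsigned element values.

```python
def _count(b, width):
    # Only the low-order log2(width) bits of the VRB element are used.
    return b & (width - 1)


def vsl(vra, vrb, width):
    """Shift left: bits leaving bit 0 are lost, zeros enter on the right."""
    mask = (1 << width) - 1
    return [(a << _count(b, width)) & mask for a, b in zip(vra, vrb)]


def vsr(vra, vrb, width):
    """Logical shift right: zeros enter on the left."""
    mask = (1 << width) - 1
    return [(a & mask) >> _count(b, width) for a, b in zip(vra, vrb)]


def vsra(vra, vrb, width):
    """Algebraic shift right: bit 0 (the sign bit) is replicated."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    out = []
    for a, b in zip(vra, vrb):
        a &= mask
        signed = a - (1 << width) if a & sign else a
        # Python's >> on a negative int is an arithmetic shift.
        out.append((signed >> _count(b, width)) & mask)
    return out


print(hex(vsr([0x81], [1], 8)[0]))   # zero fills from the left
print(hex(vsra([0x81], [1], 8)[0]))  # sign bit is replicated
```

In this model a negative element keeps its sign under vsra, while vsr always produces a non-negative result for any nonzero shift count.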
Special Registers Altered:
    None

5.10 Vector Floating-Point Instruction Set

5.10.1 Vector Floating-Point Arithmetic Instructions

Vector Add Single-Precision VX-form

vaddfp   VRT,VRA,VRB

    4      VRT    VRA    VRB    10
    0      6      11     16     21     31

    do i=0 to 127 by 32
       VRTi:i+31 ← RoundToNearSP((VRA)i:i+31 +fp (VRB)i:i+31)

For each vector element i from 0 to 3, do the following.
    Single-precision floating-point element i in VRA is added to single-precision floating-point element i in VRB. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT.

Special Registers Altered:
    None

Vector Subtract Single-Precision VX-form

vsubfp   VRT,VRA,VRB

    4      VRT    VRA    VRB    74
    0      6      11     16     21     31

    do i=0 to 127 by 32
       VRTi:i+31 ← RoundToNearSP((VRA)i:i+31 -fp (VRB)i:i+31)

For each vector element i from 0 to 3, do the following.
    Single-precision floating-point element i in VRB is subtracted from single-precision floating-point element i in VRA. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT.

Special Registers Altered:
    None

Vector Multiply-Add Single-Precision VA-form

vmaddfp   VRT,VRA,VRC,VRB

    4      VRT    VRA    VRB    VRC    46
    0      6      11     16     21     26     31

    do i=0 to 127 by 32
       prod ← (VRA)i:i+31 ×fp (VRC)i:i+31
       VRTi:i+31 ← RoundToNearSP( prod +fp (VRB)i:i+31 )

For each vector element i from 0 to 3, do the following.
    Single-precision floating-point element i in VRA is multiplied by single-precision floating-point element i in VRC. Single-precision floating-point element i in VRB is added to the infinitely-precise product. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT.

Special Registers Altered:
    None

Programming Note
    To use a multiply-add to perform an IEEE or Java compliant multiply, the addend must be -0.0. This is necessary to ensure that the sign of a zero result will be correct when the product is -0.0 (+0.0 + -0.0 → +0.0, and -0.0 + -0.0 → -0.0). When the sign of a resulting 0.0 is not important, then +0.0 can be used as an addend which may, in some cases, avoid the need for a second register to hold a -0.0 in addition to the integer 0/floating-point +0.0 that may already be available.

Vector Negative Multiply-Subtract Single-Precision VA-form

vnmsubfp   VRT,VRA,VRC,VRB

    4      VRT    VRA    VRB    VRC    47
    0      6      11     16     21     26     31

    do i=0 to 127 by 32
       prod0:inf ← (VRA)i:i+31 ×fp (VRC)i:i+31
       VRTi:i+31 ← -RoundToNearSP(prod0:inf -fp (VRB)i:i+31)

For each vector element i from 0 to 3, do the following.
    Single-precision floating-point element i in VRA is multiplied by single-precision floating-point element i in VRC. Single-precision floating-point element i in VRB is subtracted from the infinitely-precise product. The intermediate result is rounded to the nearest single-precision floating-point number, then negated and placed into word element i of VRT.

Special Registers Altered:
    None

5.10.2 Vector Floating-Point Maximum and Minimum Instructions

Vector Maximum Single-Precision VX-form

vmaxfp   VRT,VRA,VRB

    4      VRT    VRA    VRB    1034
    0      6      11     16     21     31

    do i=0 to 127 by 32
       VRTi:i+31 ← ( (VRA)i:i+31 >fp (VRB)i:i+31 )

Vector Minimum Single-Precision VX-form

vminfp   VRT,VRA,VRB

    4      VRT    VRA    VRB    1098
    0      6      11     16     21     31

    do i=0 to 127 by 32
       VRTi:i+31 ← ( (VRA)i:i+31

[...]

       fp (VRB)i:i+31) ? ³²1 : ³²0
    if Rc=1 then do
       t ← ( VRT=¹²⁸1 )
       f ← ( VRT=¹²⁸0 )
       CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
    Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. Word element i in VRT is set to all 1s if single-precision floating-point element i in VRA is greater than or equal to single-precision floating-point element i in VRB, and is set to all 0s otherwise.

    If the source element i in VRA or the source element i in VRB is a NaN, word element i in VRT is set to all 0s, indicating "not greater than or equal to". If the source element i in VRA and the source element i in VRB are both infinity with the same sign, word element i in VRT is set to all 1s, indicating "greater than or equal to".

Special Registers Altered:
    CR6 (if Rc=1)

    if Rc=1 then do
       t ← ( VRT=¹²⁸1 )
       f ← ( VRT=¹²⁸0 )
       CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
    Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. Word element i in VRT is set to all 1s if single-precision floating-point element i in VRA is greater than single-precision floating-point element i in VRB, and is set to all 0s otherwise.

    If the source element i in VRA or the source element i in VRB is a NaN, word element i in VRT is set to all 0s, indicating "not greater than". If the source element i in VRA and the source element i in VRB are both infinity with the same sign, word element i in VRT is set to all 0s, indicating "not greater than".

Special Registers Altered:
    CR6 (if Rc=1)

5.10.5 Vector Floating-Point Estimate Instructions

Vector 2 Raised to the Exponent Estimate Floating-Point VX-form

vexptefp   VRT,VRB

    4      VRT    ///    VRB    394
    0      6      11     16     21     31

    do i=0 to 127 by 32
       VRTi:i+31 ← Power2EstimateSP( (VRB)i:i+31 )

For each vector element i from 0 to 3, do the following.
    The single-precision floating-point estimate of 2 raised to the power of single-precision floating-point element i in VRB is placed into word element i of VRT.

Let x be any single-precision floating-point input value. Unless x < -146 or the single-precision floating-point result of computing 2 raised to the power x would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16. The most significant 12 bits of the estimate's significand are monotonic. An integral input value returns an integral value when the result is representable.

The result for various special cases of the source value is given below.

    Value        Result
    -Infinity    +0
    -0           +1
    +0           +1
    +Infinity    +Infinity
    NaN          QNaN

Special Registers Altered:
    None

Vector Log Base 2 Estimate Floating-Point VX-form

vlogefp   VRT,VRB

    4      VRT    ///    VRB    458
    0      6      11     16     21     31

    do i=0 to 127 by 32
       VRTi:i+31 ← LogBase2EstimateSP((VRB)i:i+31)

For each vector element i from 0 to 3, do the following.
    The single-precision floating-point estimate of the base 2 logarithm of single-precision floating-point element i in VRB is placed into the corresponding word element of VRT.

Let x be any single-precision floating-point input value. Unless | x-1 | is less than or equal to 0.125 or the single-precision floating-point result of computing the base 2 logarithm of x would be an infinity or a QNaN, the estimate has an absolute error in precision (absolute value of the difference between the estimate and the infinitely precise value) no greater than 2⁻⁵. Under the same conditions, the estimate has a relative error in precision no greater than one part in 8.

The most significant 12 bits of the estimate's significand are monotonic. The estimate is exact if x=2ʸ, where y is an integer between -149 and +127 inclusive. Otherwise the value placed into the element of register VRT may vary between implementations, and between different executions on the same implementation.

The result for various special cases of the source value is given below.

    Value        Result
    -Infinity    QNaN
    <0           QNaN
    -0           -Infinity
    +0           -Infinity
    +Infinity    +Infinity
    NaN          QNaN

Special Registers Altered:
    None
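
The special-case results listed for the two estimate instructions can be checked against a simple software reference. The sketch below is an illustration only: the function names are hypothetical, it works in Python's double precision rather than single precision, and for ordinary inputs it returns the mathematically exact value, whereas the instructions only guarantee the error bounds stated above.

```python
import math


def exp2_estimate_reference(x):
    """Reference for one element of vexptefp (special cases per the table)."""
    if math.isnan(x):
        return math.nan          # NaN -> QNaN
    if x == -math.inf:
        return 0.0               # -Infinity -> +0
    if x == math.inf:
        return math.inf          # +Infinity -> +Infinity
    try:
        return 2.0 ** x          # covers -0 and +0, both giving +1
    except OverflowError:
        return math.inf


def log2_estimate_reference(x):
    """Reference for one element of vlogefp (special cases per the table)."""
    if math.isnan(x):
        return math.nan          # NaN -> QNaN
    if x == 0.0:
        return -math.inf         # -0 and +0 -> -Infinity
    if x < 0.0:
        return math.nan          # <0 and -Infinity -> QNaN
    if x == math.inf:
        return math.inf          # +Infinity -> +Infinity
    return math.log2(x)


for value in (-math.inf, -0.0, 0.0, math.inf):
    print(value, exp2_estimate_reference(value), log2_estimate_reference(value))
```

A reference of this kind is mainly useful in tests: compare the hardware estimate against it, allowing the relative or absolute error that the instruction description permits.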
Vector Reciprocal Estimate Single-Precision VX-form

vrefp   VRT,VRB

    4      VRT    ///    VRB    266
    0      6      11     16     21     31

    do i=0 to 127 by 32
       VRTi:i+31 ← ReciprocalEstimateSP( (VRB)i:i+31 )

For each vector element i from 0 to 3, do the following.
    The single-precision floating-point estimate of the reciprocal of single-precision floating-point element i in VRB is placed into word element i of VRT.

Unless the single-precision floating-point result of computing the reciprocal of a value would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 4096.

Note that results may vary between implementations, and between different executions on the same implementation.

The result for various special cases of the source value is given below.

    Value        Result
    -Infinity    -0
    -0           -Infinity
    +0           +Infinity
    +Infinity    +0
    NaN          QNaN

Special Registers Altered:
    None

Vector Reciprocal Square Root Estimate Single-Precision VX-form

vrsqrtefp   VRT,VRB

    4      VRT    ///    VRB    330
    0      6      11     16     21     31

    do i=0 to 127 by 32
       VRTi:i+31 ← ReciprocalSquareRootEstimateSP( (VRB)i:i+31 )

For each vector element i from 0 to 3, do the following.
    The single-precision floating-point estimate of the reciprocal of the square root of single-precision floating-point element i in VRB is placed into word element i of VRT.

Let x be any single-precision floating-point value. Unless the single-precision floating-point result of computing the reciprocal of the square root of x would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 4096.

Note that results may vary between implementations, and between different executions on the same implementation.

The result for various special cases of the source value is given below.

    Value        Result
    -Infinity    QNaN
    <0           QNaN
    -0           -Infinity
    +0           +Infinity
    +Infinity    +0
    NaN          QNaN

Special Registers Altered:
    None

5.11 Vector Status and Control Register Instructions

Move To Vector Status and Control Register VX-form

mtvscr   VRB

    4      ///    VRB    1604
    0      6      16     21     31

    VSCR ← (VRB)96:127

The contents of word element 3 of VRB are placed into the VSCR.

Special Registers Altered:
    None

Move From Vector Status and Control Register VX-form

mfvscr   VRT

    4      VRT    ///    1540
    0      6      11     21     31

    VRT ← ⁹⁶0 || (VSCR)

The contents of the VSCR are placed into word element 3 of VRT.

The remaining word elements in VRT are set to 0.

Special Registers Altered:
    None

Chapter 6. Signal Processing Engine (SPE) [Category: Signal Processing Engine]

6.1 Overview
6.2 Nomenclature and Conventions
6.3 Programming Model
6.3.1 General Operation
6.3.2 GPR Registers
6.3.3 Accumulator Register
6.3.4 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
6.3.5 Data Formats
6.3.5.1 Integer Format
6.3.5.2 Fractional Format
6.3.6 Computational Operations
6.3.7 SPE Instructions
6.3.8 Saturation, Shift, and Bit Reverse Models
6.3.8.1 Saturation
6.3.8.2 Shift Left
6.3.8.3 Bit Reverse
6.3.9 SPE Instruction Set

6.1 Overview

The Signal Processing Engine (SPE) accelerates signal processing applications normally suited to DSP operation. This is accomplished using short vectors (two element) within 64-bit GPRs and using single instruction multiple data (SIMD) operations to perform the requisite computations. SPE also architects an Accumulator register to allow for back to back operations without loop unrolling.

6.2 Nomenclature and Conventions

Several conventions regarding nomenclature are used for SPE:

- The Signal Processing Engine category is abbreviated as SPE.
- Bits 0 to 31 of a 64-bit register are referenced as upper word, even word or high word element of the register. Bits 32:63 are referred to as lower word, odd word or low word element of the register. Each half is an element of a 64-bit GPR.
- Bits 0 to 15 and bits 32 to 47 are referenced as even halfwords. Bits 16 to 31 and bits 48 to 63 are referenced as odd halfwords.
- Mnemonics for SPE instructions generally begin with the letters `ev' (embedded vector).

The RTL conventions described below are used in addition to those described in Section 1.3. Additional RTL functions are described in Appendix C.

Notation   Meaning

×sf        Signed fractional multiplication. Result of multiplying 2 signed fractional quantities having bit length n taking the least significant 2n-1 bits of the sign extended product and concatenating a 0 to the least significant bit forming a signed fractional result of 2n bits. Two 16-bit signed fractional quantities, a and b are multiplied, as shown below:

               ea0:31 = EXTS(a)
               eb0:31 = EXTS(b)
               prod0:63 = ea × eb
               eprod0:63 = EXTS(prod32:63)
               result0:31 = eprod33:63 || 0b0

×gsf       Guarded signed fractional multiplication. Result of multiplying 2 signed fractional quantities having bit length 16 taking the least significant 31 bits of the sign extended product and concatenating a 0 to the least significant bit forming a guarded signed fractional result of 64 bits. Since guarded signed fractional multiplication produces a 64-bit result, fractional input quantities of -1 and -1 can produce +1 in the intermediate product. Two 16-bit fractional quantities, a and b are multiplied, as shown below:

               ea0:31 = EXTS(a)
               eb0:31 = EXTS(b)
               prod0:63 = ea × eb
               eprod0:63 = EXTS(prod32:63)
               result0:63 = eprod1:63 || 0b0

<<         Logical shift left. x << y shifts value x left by y bits, leaving zeros in the vacated bits.

>>         Logical shift right. x >> y shifts value x right by y bits, leaving zeros in the vacated bits.

6.3 Programming Model

6.3.1 General Operation

SPE instructions generally take elements from one source register and operate on them with the corresponding elements of a second source register (and/or the accumulator) to produce results. Results are placed in the destination register and/or the accumulator. Instructions that are vector in nature (i.e. produce results of more than one element) provide results for each element that are independent of the computation of the other elements. These instructions can also be used to perform scalar DSP operations by ignoring the results of the upper 32-bit half of the register file.

There are no record forms of SPE instructions. As a result, the meaning of bits in the CR is different than for other categories. SPE Compare instructions specify a CR field, two source registers, and the type of compare: greater than, less than, or equal. Two bits of the CR field are written with the result of the vector compare, one for each element. The remaining two bits reflect the ANDing and ORing of the vector compare results.

6.3.2 GPR Registers

The SPE requires a GPR register file with thirty-two 64-bit registers. For 32-bit implementations, instructions that normally operate on a 32-bit register file access and change only the least significant 32 bits of the GPRs, leaving the most significant 32 bits unchanged. For 64-bit implementations, operation of these instructions is unchanged, i.e. those instructions continue to operate on the 64-bit registers as they would if the SPE was not implemented. Most SPE instructions view the 64-bit register as being composed of a vector of two elements, each of which is 32 bits wide (some instructions read or write 16-bit elements). The most significant 32 bits are called the upper word, high word or even word. The least significant 32 bits are called the lower word, low word or odd word. Unless otherwise specified, SPE instructions write all 64 bits of the destination register.

    GPR Upper Word        GPR Lower Word
    0                     32                   63

Figure 66. GPR

6.3.3 Accumulator Register

A partially visible accumulator register (ACC) is provided for some SPE instructions. The accumulator is a 64-bit register that holds the results of the Multiply Accumulate (MAC) forms of SPE Fixed-Point instructions. The accumulator allows the back-to-back execution of dependent MAC instructions, something that is found in the inner loops of DSP code such as FIR and FFT filters. The accumulator is partially visible to the programmer in the sense that its results do not have to be explicitly read to use them. Instead they are always copied into a 64-bit destination GPR which is specified as part of the instruction. Based upon the type of instruction, the accumulator can hold either a single 64-bit value or a vector of two 32-bit elements.

    ACC Upper Word        ACC Lower Word
    0                     32                   63

Figure 67. Accumulator

6.3.4 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)

Status and control for SPE uses the SPEFSCR register. This register is also used by the SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, and SPE.Embedded Float Vector categories. Status and control bits are shared with these categories. The SPEFSCR register is implemented as special purpose register (SPR) number 512 and is read and written by the mfspr and mtspr instructions. SPE instructions affect both the high element (bits 32:33) and low element status flags (bits 48:49) of the SPEFSCR.

    SPEFSCR
    32                    63

Figure 68. Signal Processing and Embedded Floating-Point Status and Control Register

The SPEFSCR bits are defined as shown below.

Bit    Description

32     Summary Integer Overflow High (SOVH)
       SOVH is set to 1 when an SPE instruction sets OVH. This is a sticky bit.

33     Integer Overflow High (OVH)
       OVH is set to 1 to indicate that an overflow has occurred in the upper element during execution of an SPE instruction. The bit is set to 1 if a result of an operation performed by the instruction cannot be represented in the number of bits into which the result is to be placed, and is set to 0 otherwise. The OVH bit is not altered by Modulo instructions, nor by other instructions that cannot overflow.

34     Embedded Floating-Point Guard Bit High (FGH) [Category: SP.FV]
       FGH is supplied for use by the Embedded Floating-Point Round interrupt handler. FGH is an extension of the low-order bits of the fractional result produced from an SPE.Embedded Float Vector instruction on the high word. FGH is zeroed if an overflow, underflow, or invalid input error is detected on the high element of an SPE.Embedded Float Vector instruction.
       Execution of an SPE.Embedded Float Scalar instruction leaves FGH undefined.

35     Embedded Floating-Point Inexact Bit High (FXH) [Category: SP.FV]
       FXH is supplied for use by the Embedded Floating-Point Round interrupt handler. FXH is an extension of the low-order bits of the fractional result produced from an SPE.Embedded Float Vector instruction on the high word. FXH represents the logical `or' of all the bits shifted right from the Guard bit when the fractional result is normalized. FXH is zeroed if an overflow, underflow, or invalid input error is detected on the high element of an SPE.Embedded Float Vector instruction.
       Execution of an SPE.Embedded Float Scalar instruction leaves FXH undefined.

36     Embedded Floating-Point Invalid Operation/Input Error High (FINVH) [Category: SP.FV]
       The FINVH bit is set to 1 if any high word operand of an SPE.Embedded Float Vector instruction is infinity, NaN, or a denormalized value, or if the instruction is a divide and the dividend and divisor are both 0, or if a conversion to integer or fractional value overflows.
       Execution of an SPE.Embedded Float Scalar instruction leaves FINVH undefined.

37     Embedded Floating-Point Divide By Zero High (FDBZH) [Category: SP.FV]
       The FDBZH bit is set to 1 when an SPE.Embedded Vector Floating-Point Divide instruction is executed with a divisor of 0 in the high word operand, and the dividend is a finite nonzero number.
       Execution of an SPE.Embedded Float Scalar instruction leaves FDBZH undefined.

38     Embedded Floating-Point Underflow High (FUNFH) [Category: SP.FV]
       The FUNFH bit is set to 1 when the execution of an SPE.Embedded Float Vector instruction results in an underflow on the high word operation.
       Execution of an SPE.Embedded Float Scalar instruction leaves FUNFH undefined.

39     Embedded Floating-Point Overflow High (FOVFH) [Category: SP.FV]
       The FOVFH bit is set to 1 when the execution of an SPE.Embedded Float Vector instruction results in an overflow on the high word operation.
       Execution of an SPE.Embedded Float Scalar instruction leaves FOVFH undefined.

40:41  Reserved

42     Embedded Floating-Point Inexact Sticky Flag (FINXS) [Categories: SP.FV, SP.FD, SP.FS]
       The FINXS bit is set to 1 whenever the execution of an Embedded Floating-Point instruction delivers an inexact result for either the low or high element and no Embedded Floating-Point Data interrupt is taken for either element, or if an Embedded Floating-Point instruction results in overflow (FOVF=1 or FOVFH=1), but Embedded Floating-Point Overflow exceptions are disabled (FOVFE=0), or if an Embedded Floating-Point instruction results in underflow (FUNF=1 or FUNFH=1), but Embedded Floating-Point Underflow exceptions are disabled (FUNFE=0), and no Embedded Floating-Point Data interrupt occurs. This is a sticky bit.

43     Embedded Floating-Point Invalid Operation/Input Sticky Flag (FINVS) [Categories: SP.FV, SP.FD, SP.FS]
       The FINVS bit is defined to be the sticky result of any Embedded Floating-Point instruction that causes FINVH or FINV to be set to 1. That is, FINVS ← FINVS | FINV | FINVH. This is a sticky bit.

44     Embedded Floating-Point Divide By Zero Sticky Flag (FDBZS) [Categories: SP.FV, SP.FD, SP.FS]
       The FDBZS bit is set to 1 when an Embedded Floating-Point Divide instruction sets FDBZH or FDBZ to 1. That is, FDBZS ← FDBZS | FDBZ | FDBZH. This is a sticky bit.

45     Embedded Floating-Point Underflow Sticky Flag (FUNFS) [Categories: SP.FV, SP.FD, SP.FS]
       The FUNFS bit is defined to be the sticky result of any Embedded Floating-Point instruction that causes FUNFH or FUNF to be set to 1. That is, FUNFS ← FUNFS | FUNF | FUNFH. This is a sticky bit.

46     Embedded Floating-Point Overflow Sticky Flag (FOVFS) [Categories: SP.FV, SP.FD, SP.FS]
       The FOVFS bit is defined to be the sticky result of any Embedded Floating-Point instruction that causes FOVFH or FOVF to be set to 1. That is, FOVFS ← FOVFS | FOVF | FOVFH. This is a sticky bit.

47     Reserved

48     Summary Integer Overflow (SOV)
       SOV is set to 1 when an SPE instruction sets OV to 1. This is a sticky bit.

49     Integer Overflow (OV)
       OV is set to 1 to indicate that an overflow has occurred in the lower element during execution of an SPE instruction. The bit is set to 1 if a result of an operation performed by the instruction cannot be represented in the number of bits into which the result is to be placed, and is set to 0 otherwise. The OV bit is not altered by Modulo instructions, nor by other instructions that cannot overflow.

50     Embedded Floating-Point Guard Bit (Low/scalar) (FG) [Categories: SP.FV, SP.FD, SP.FS]
       FG is supplied for use by the Embedded Floating-Point Round interrupt handler. FG is an extension of the low-order bits of the fractional result produced from an Embedded Floating-Point instruction on the low word. FG is zeroed if an overflow, underflow, or invalid input error is detected on the low element of an Embedded Floating-Point instruction.

51     Embedded Floating-Point Inexact Bit (Low/scalar) (FX) [Categories: SP.FV, SP.FD, SP.FS]
       FX is supplied for use by the Embedded Floating-Point Round interrupt handler. FX is an extension of the low-order bits of the fractional result produced from an Embedded Floating-Point instruction on the low word. FX represents the logical `or' of all the bits shifted right from the Guard bit when the fractional result is normalized. FX is zeroed if an overflow, underflow, or invalid input error is detected on the low element of an Embedded Floating-Point instruction.

52     Embedded Floating-Point Invalid Operation/Input Error (Low/scalar) (FINV) [Categories: SP.FV, SP.FD, SP.FS]
       The FINV bit is set to 1 if any low word operand of an Embedded Floating-Point instruction is infinity, NaN, or a denormalized value, or if the operation is a divide and the dividend and divisor are both 0, or if a conversion to integer or fractional value overflows.

53     Embedded Floating-Point Divide By Zero (Low/scalar) (FDBZ) [Categories: SP.FV, SP.FD, SP.FS]
       The FDBZ bit is set to 1 when an Embedded Floating-Point Divide instruction is executed with a divisor of 0 in the low word operand, and the dividend is a finite nonzero number.

54     Embedded Floating-Point Underflow (Low/scalar) (FUNF) [Categories: SP.FV, SP.FD, SP.FS]
       The FUNF bit is set to 1 when the execution of an Embedded Floating-Point instruction results in an underflow on the low word operation.

55     Embedded Floating-Point Overflow (Low/scalar) (FOVF) [Categories: SP.FV, SP.FD, SP.FS]
       The FOVF bit is set to 1 when the execution of an Embedded Floating-Point instruction results in an overflow on the low word operation.

56     Reserved

57     Embedded Floating-Point Round (Inexact) Exception Enable (FINXE) [Categories: SP.FV, SP.FD, SP.FS]
       0   Exception disabled
       1   Exception enabled
       The Embedded Floating-Point Round interrupt is taken if the exception is enabled and if FG | FGH | FX | FXH (signifying an inexact result) is set to 1 as a result of an Embedded Floating-Point instruction.
       If an Embedded Floating-Point instruction results in overflow or underflow and the corresponding Embedded Floating-Point Underflow or Embedded Floating-Point Overflow exception is disabled then the Embedded Floating-Point Round interrupt is taken.

58     Embedded Floating-Point Invalid Operation/Input Error Exception Enable (FINVE) [Categories: SP.FV, SP.FD, SP.FS]
       0   Exception disabled
       1   Exception enabled
       If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FINV or FINVH bit is set to 1 by an Embedded Floating-Point instruction.

59     Embedded Floating-Point Divide By Zero Exception Enable (FDBZE) [Categories: SP.FV, SP.FD, SP.FS]
       0   Exception disabled
       1   Exception enabled
       If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FDBZ or FDBZH bit is set to 1 by an Embedded Floating-Point instruction.

60     Embedded Floating-Point Underflow Exception Enable (FUNFE) [Categories: SP.FV, SP.FD, SP.FS]
       0   Exception disabled
       1   Exception enabled
       If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FUNF or FUNFH bit is set to 1 by an Embedded Floating-Point instruction.

61     Embedded Floating-Point Overflow Exception Enable (FOVFE) [Categories: SP.FV, SP.FD, SP.FS]
       0   Exception disabled
       1   Exception enabled
       If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FOVF or FOVFH bit is set to 1 by an Embedded Floating-Point instruction.

62:63  Embedded Floating-Point Rounding Mode Control (FRMC) [Categories: SP.FV, SP.FD, SP.FS]
       00  Round to Nearest
       01  Round toward Zero
       10  Round toward +Infinity
       11  Round toward -Infinity

       Programming Note
           Rounding modes 0b10 (+Infinity) and 0b11 (-Infinity) may not be supported by some implementations. If an implementation does not support these, Embedded Floating-Point Round interrupts are generated for every Embedded Floating-Point instruction for which rounding is required when +Infinity or -Infinity modes are set and software is required to produce the correctly rounded result.

produce values larger than 2ⁿ-1 or smaller than 0 may set OV or OVH in the SPEFSCR.

Signed integers consist of 16, 32, or 64-bit binary values in two's complement form. The largest representable value is 2ⁿ⁻¹-1 where n represents the number of bits in the value. The smallest representable value is -2ⁿ⁻¹. Computations that produce values larger than 2ⁿ⁻¹-1 or smaller than -2ⁿ⁻¹ may set OV or OVH in the SPEFSCR.

6.3.5.2 Fractional Format

Fractional data format is conventionally used for DSP fractional arithmetic. Fractional data is useful for representing data converted from analog devices.

Unsigned fractions consist of 16, 32, or 64-bit binary fractional values that range from 0 to less than 1. Unsigned fractions place the radix point immediately to the left of the most significant bit. The most significant bit of the value represents the value 2⁻¹, the next most significant bit represents the value 2⁻² and so on. The largest representable value is 1-2⁻ⁿ where n represents the number of bits in the value. The smallest representable value is 0. Computations that produce values larger than 1-2⁻ⁿ or smaller than 0 may set OV or OVH in the SPEFSCR. The SPE category does not define unsigned fractional forms of instructions to manipulate unsigned fractional data since the unsigned integer forms of the instructions produce the same results as would the unsigned fractional forms.

Guarded unsigned fractions are 64-bit binary fractional values. Guarded unsigned fractions place the radix point immediately to the left of bit 32. The largest representable value is 2³²-2⁻³². The smallest representable value is 0. Guarded unsigned fractional computations are always modulo and do not set OV or OVH in the SPEFSCR.

Signed fractions consist of 16, 32, or 64-bit binary fractional values in two's-complement form that range from -1 to less than 1. Signed fractions place the radix point immediately to the right of the most significant bit. The largest representable value is 1-2⁻⁽ⁿ⁻¹⁾ where n represents the number of bits in the value. The smallest representable value is -1.
Computations that produce values larger than 1-2-(n-1)or smaller than -1 may set OV or OVH in the SPEFSCR. Multiplication of two 6.3.5 Data Formats signed fractional values causes the result to be shifted left one bit to remove the resultant redundant sign bit in The SPE provides two different data formats, integer the product. In this case, a 0 bit is concatenated as the and fractional. Both data formats can be treated as least significant bit of the shifted result. signed or unsigned quantities. Guarded signed fractions are 64-bit binary fractional values. Guarded signed fractions place the decimal 6.3.5.1 Integer Format point immediately to the left of bit 33. The largest repre- sentable value is 232-2-31. The smallest representable Unsigned integers consist of 16, 32, or 64-bit binary value is -232-1+2-31. Guarded signed fractional compu- integer values. The largest representable value is 2n-1 tations are always modulo and do not set OV or OVH in where n represents the number of bits in the value. The the SPEFSCR. smallest representable value is 0. Computations that Chapter 6. Signal Processing Engine (SPE) 205 Version 2.04 6.3.6 Computational Operations 1 Multiply and Accumulate instructions. These instructions perform multiply operations, optionally The SPE category supports several different computa- add the result to the accumulator, and place the tional capabilities. Both modulo and saturation results result into the destination register and optionally can be performed. Modulo results produce truncation of into the accumulator. These instructions are com- the overflow bits in a calculation, therefore overflow posed of different multiply forms, data formats and does not occur and no saturation is performed. For data accumulate options. The mnemonics for instructions for which overflow occurs, saturation pro- these instructions indicate their various character- vides a maximum or minimum representable value (for istics. These are shown in Table 2. 
the data type) in the case of overflow. Instructions are 1 Load and Store instructions. These instructions provided for a wide range of computational capability. provide load and store capabilities for moving data The operation types can be divided into 4 basic catego- to and from memory. A variety of forms are pro- ries: vided that position data for efficient computation. 1 Compare and miscellaneous instructions. These 1 Simple Vector instructions. These instructions use instructions perform miscellaneous functions such the corresponding low and high word elements of as field manipulation, bit reversed incrementing, the operands to produce a vector result that is and vector compares. placed in the destination register, the accumulator, or both. Table 2: Mnemonic Extensions for Multiply Accumulate Instructions Extension Meaning Comments Multiply Form he halfword even 16 X 16 32 heg halfword even guarded 16 X 16 32, 64-bit final accumulate result ho halfword odd 16 X 16 32 hog halfword odd guarded 16 X 16 32, 64-bit final accumulate result w word 32 X 32 64 wh word high 32 X 32 32 (high-order 32 bits of product) wl word low 32 X 32 32 (low-order 32 bits of product) Data Format smf signed modulo fractional modulo, no saturation or overflow smi signed modulo integer modulo, no saturation or overflow ssf signed saturate fractional saturation on product and accumulate ssi signed saturate integer saturation on product and accumulate umi unsigned modulo integer modulo, no saturation or overflow usi unsigned saturate integer saturation on product and accumulate Accumulate Option a place in accumulator result accumulator aa add to accumulator accumulator + result accumulator aaw add to accumulator as word elements accumulator0:31 + result0:31 accumulator0:31 accumulator32:63 + result32:63 accumulator32:63 an add negated to accumulator accumulator - result accumulator anw add negated to accumulator as word accumulator0:31 - result0:31 accumulator0:31 elements 
accumulator32:63 - result32:63 accumulator32:63 206 Power ISATM -- Book I Version 2.04 6.3.7 SPE Instructions 6.3.8 Saturation, Shift, and Bit Reverse Models For saturation, left shifts, and bit reversal, the pseudo RTL is provided here to more accurately describe those functions that are referenced in the instruction pseudo RTL. 6.3.8.1 Saturation SATURATE(ov, carry, sat_ovn, sat_ov, val) if ov then if carry then return sat_ovn else return sat_ov else return val 6.3.8.2 Shift Left SL(value, cnt) if cnt > 31 then return 0 else return (value << cnt) 6.3.8.3 Bit Reverse BITREVERSE(value) result 1 0 mask 1 1 shift 1 31 cnt 1 32 while cnt > 0 then do t 1 value & mask if shift >= 0 then result 1 (t << shift) | result else result 1 (t >> -shift) | result cnt 1 cnt - 1 shift 1 shift - 2 mask 1 mask << 1 return result Chapter 6. Signal Processing Engine (SPE) 207 Version 2.04 6.3.9 SPE Instruction Set Bit Reversed Increment EVX-form Vector Absolute Value EVX-form brinc RT,RA,RB evabs RT,RA 4 RT RA RB 527 4 RT RA /// 520 0 6 11 16 21 31 0 6 11 16 21 31 n 1 implementation-dependent number of mask bits RT0:31 1 ABS((RA)0:31) mask 1 (RB)64-n:63 RT32:63 1 ABS((RA)32:63) a 1 (RA)64-n:63 d 1 BITREVERSE(1 + BITREVERSE(a | (¬ mask))) The absolute value of each element of RA is placed in RT 1 (RA)0:63-n || (d & mask) the corresponding elements of RT. An absolute value of 0x8000_0000 (most negative number) returns brinc computes a bit-reverse index based on the con- 0x8000_0000. tents of RA and a mask specified in RB. The new index is written to RT. Special Registers Altered: None The number of bits in the mask is implementa- tion-dependent but may not exceed 32. Special Registers Altered: None Vector Add Immediate Word EVX-form Programming Note evaddiw RT,RB,UI brinc provides a way for software to access FFT 4 RT UI RB 514 data in a bit-reversed manner. RA contains the 0 6 11 16 21 31 index into a buffer that contains data on which FFT is to be performed. 
RB contains a mask that allows RT0:31 1 (RB)0:31 + EXTZ(UI) the index to be updated with bit-reversed address- RT32:63 1 (RB)32:63 + EXTZ(UI) ing. Typically this instruction precedes a load with index instruction; for example, UI is zero-extended and added to both the high and low elements of RB and the results are placed in RT. Note brinc r2, r3, r4 that the same value is added to both elements of the lhax r8, r5, r2 register. RB contains a bit-mask that is based on the num- Special Registers Altered: ber of points in an FFT. To access a buffer contain- None ing n byte sized data that is to be accessed with bit-reversed addressing, the mask has log2n 1s in the least significant bit positions and 0s in the remaining most significant bit positions. If, how- Vector Add Signed, Modulo, Integer to ever, the data size is a multiple of a halfword or a Accumulator Word EVX-form word, the mask is constructed so that the 1s are shifted left by log2 (size of the data) and 0s are evaddsmiaaw RT,RA placed in the least significant bit positions. 4 RT RA /// 1225 0 6 11 16 21 31 Programming Note Architecture Note This instruction only modifies the lower 32 bits of RT0:31 1 (ACC)0:31 + (RA)0:31 the destination register in 32-bit implementations. RT32:63 1 (ACC)32:63 + (RA)32:63 For 64-bit implementations in 32-bit mode, the con- ACC0:63 1 (RT)0:63 tents of the upper 32-bits of the destination register are undefined. Each word element in RA is added to the correspond- ing element in the accumulator and the results are placed in RT and into the accumulator. Programming Note Special Registers Altered: Execution of brinc does not cause SPE Unavail- ACC able exceptions regardless of MSRSPV. 
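
The SATURATE and BITREVERSE models of 6.3.8 and the brinc RTL can be exercised with a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: registers are modeled as plain 64-bit integers, the mask width `n` is a hypothetical parameter (the architecture leaves it implementation-dependent), and the complement of the mask is assumed to be taken over the 32-bit width on which BITREVERSE operates. The helper `add_sat_s32` is likewise hypothetical; it follows the overflow test used by the saturating word adds such as evaddssiaaw.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def saturate(ov, carry, sat_ovn, sat_ov, val):
    """SATURATE model (6.3.8.1): pick a saturation value when ov is set."""
    if ov:
        return sat_ovn if carry else sat_ov
    return val


def bitreverse(value):
    """BITREVERSE model (6.3.8.3): mirror the low 32 bits of value."""
    result = 0
    for i in range(32):
        if value & (1 << i):
            result |= 1 << (31 - i)
    return result


def brinc(ra, rb, n=16):
    """brinc RTL on 64-bit values; n is an assumed mask width (n <= 32)."""
    low = (1 << n) - 1
    mask = rb & low
    a = ra & low
    # Assumption: the complement of the mask is 32 bits wide, so the carry
    # from the "+ 1" ripples through the unmasked positions.
    inner = bitreverse((a | (~mask & MASK32)) & MASK32)
    d = bitreverse((1 + inner) & MASK32)
    # Keep the upper 64-n bits of RA, replace the low n bits.
    return (ra & ~low & MASK64) | (d & mask)


def exts32(word):
    """Interpret a 32-bit pattern as a signed integer."""
    return word - (1 << 32) if word & 0x8000_0000 else word


def add_sat_s32(acc, ra):
    """Signed saturating 32-bit add; returns (result, overflow flag)."""
    temp = (exts32(acc) + exts32(ra)) & MASK64
    t31 = (temp >> 32) & 1  # bit 31 in the big-endian numbering of temp0:63
    t32 = (temp >> 31) & 1  # bit 32
    ov = t31 ^ t32
    return saturate(ov, t31, 0x8000_0000, 0x7FFF_FFFF, temp & MASK32), ov


# Bit-reversed walk over an 8-entry byte buffer (mask has log2(8) = 3 ones).
index, order = 0, []
for _ in range(8):
    index = brinc(index, 0b111)
    order.append(index)
```

Starting from index 0 with a 3-bit mask, repeated calls visit the buffer in bit-reversed order and wrap back to 0, which is the access pattern the Programming Note describes for FFT data.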
Vector Add Signed, Saturate, Integer to Accumulator Word EVX-form

evaddssiaaw RT,RA

    4    RT   RA   ///  1217
    0    6    11   16   21   31

    temp0:63 ← EXTS((ACC)0:31) + EXTS((RA)0:31)
    ovh ← temp31 ⊕ temp32
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:63 ← EXTS((ACC)32:63) + EXTS((RA)32:63)
    ovl ← temp31 ⊕ temp32
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCR_OVH ← ovh
    SPEFSCR_OV ← ovl
    SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
    SPEFSCR_SOV ← SPEFSCR_SOV | ovl

Each signed-integer word element in RA is sign-extended and added to the corresponding sign-extended element in the accumulator saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Add Unsigned, Saturate, Integer to Accumulator Word EVX-form

evaddusiaaw RT,RA

    4    RT   RA   ///  1216
    0    6    11   16   21   31

    temp0:63 ← EXTZ((ACC)0:31) + EXTZ((RA)0:31)
    ovh ← temp31
    RT0:31 ← SATURATE(ovh, temp31, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
    temp0:63 ← EXTZ((ACC)32:63) + EXTZ((RA)32:63)
    ovl ← temp31
    RT32:63 ← SATURATE(ovl, temp31, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCR_OVH ← ovh
    SPEFSCR_OV ← ovl
    SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
    SPEFSCR_SOV ← SPEFSCR_SOV | ovl

Each unsigned-integer word element in RA is zero-extended and added to the corresponding zero-extended element in the accumulator saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Add Unsigned, Modulo, Integer to Accumulator Word EVX-form

evaddumiaaw RT,RA

    4    RT   RA   ///  1224
    0    6    11   16   21   31

    RT0:31 ← (ACC)0:31 + (RA)0:31
    RT32:63 ← (ACC)32:63 + (RA)32:63
    ACC0:63 ← (RT)0:63

Each unsigned-integer word element in RA is added to the corresponding element in the accumulator and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC

Vector Add Word EVX-form

evaddw RT,RA,RB

    4    RT   RA   RB   512
    0    6    11   16   21   31

    RT0:31 ← (RA)0:31 + (RB)0:31
    RT32:63 ← (RA)32:63 + (RB)32:63

The corresponding elements of RA and RB are added and the results are placed in RT. The sum is a modulo sum.

Special Registers Altered:
    None

Vector AND EVX-form

evand RT,RA,RB

    4    RT   RA   RB   529
    0    6    11   16   21   31

    RT0:31 ← (RA)0:31 & (RB)0:31
    RT32:63 ← (RA)32:63 & (RB)32:63

The corresponding elements of RA and RB are ANDed bitwise and the results are placed in the corresponding element of RT.

Special Registers Altered:
    None

Vector AND with Complement EVX-form

evandc RT,RA,RB

    4    RT   RA   RB   530
    0    6    11   16   21   31

    RT0:31 ← (RA)0:31 & (¬(RB)0:31)
    RT32:63 ← (RA)32:63 & (¬(RB)32:63)

The word elements of RA are ANDed bitwise with the complement of the corresponding elements of RB. The results are placed in the corresponding element of RT.

Special Registers Altered:
    None

Vector Compare Equal EVX-form

evcmpeq BF,RA,RB

    4    BF   //   RA   RB   564
    0    6    9    11   16   21   31

    ah ← (RA)0:31
    al ← (RA)32:63
    bh ← (RB)0:31
    bl ← (RB)32:63
    if (ah = bh) then ch ← 1
    else ch ← 0
    if (al = bl) then cl ← 1
    else cl ← 0
    CR4×BF+32:4×BF+35 ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is equal to the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is equal to the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Special Registers Altered:
    CR field BF

Vector Compare Greater Than Signed EVX-form

evcmpgts BF,RA,RB

    4    BF   //   RA   RB   561
    0    6    9    11   16   21   31

    ah ← (RA)0:31
    al ← (RA)32:63
    bh ← (RB)0:31
    bl ← (RB)32:63
    if (ah > bh) then ch ← 1
    else ch ← 0
    if (al > bl) then cl ← 1
    else cl ← 0
    CR4×BF+32:4×BF+35 ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is greater than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is greater than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Special Registers Altered:
    CR field BF

Vector Compare Greater Than Unsigned EVX-form

evcmpgtu BF,RA,RB

    4    BF   //   RA   RB   560
    0    6    9    11   16   21   31

    ah ← (RA)0:31
    al ← (RA)32:63
    bh ← (RB)0:31
    bl ← (RB)32:63
    if (ah >u bh) then ch ← 1
    else ch ← 0
    if (al >u bl) then cl ← 1
    else cl ← 0
    CR4×BF+32:4×BF+35 ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is greater than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is greater than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Vector Compare Less Than Signed EVX-form

evcmplts BF,RA,RB

    4    BF   //   RA   RB   563
    0    6    9    11   16   21   31

    ah ← (RA)0:31
    al ← (RA)32:63
    bh ← (RB)0:31
    bl ← (RB)32:63
    if (ah < bh) then ch ← 1
    else ch ← 0
    if (al < bl) then cl ← 1
    else cl ← 0
    CR4×BF+32:4×BF+35 ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is less than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is less than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.
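
The four-bit CR field produced by the vector compares (ch, cl, their OR, their AND) can be illustrated with a short Python sketch. The function name `ev_compare` and the choice to return the field as a 4-bit integer with ch as the most significant bit are assumptions made for this example only.

```python
import operator

MASK32 = 0xFFFF_FFFF


def _to_signed(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return word - (1 << 32) if word & 0x8000_0000 else word


def ev_compare(ra, rb, relation, signed):
    """Model of the evcmp* RTL: returns ch || cl || (ch | cl) || (ch & cl)."""
    ah, al = (ra >> 32) & MASK32, ra & MASK32
    bh, bl = (rb >> 32) & MASK32, rb & MASK32
    if signed:
        ah, al, bh, bl = (_to_signed(w) for w in (ah, al, bh, bl))
    ch = int(relation(ah, bh))
    cl = int(relation(al, bl))
    return (ch << 3) | (cl << 2) | ((ch | cl) << 1) | (ch & cl)


ra = 0xFFFF_FFFF_0000_0001  # high element is -1 signed, 2**32 - 1 unsigned
rb = 0x0000_0000_0000_0002

gts = ev_compare(ra, rb, operator.gt, signed=True)    # evcmpgts
gtu = ev_compare(ra, rb, operator.gt, signed=False)   # evcmpgtu
lts = ev_compare(ra, rb, operator.lt, signed=True)    # evcmplts
eq = ev_compare(ra, ra, operator.eq, signed=False)    # evcmpeq
```

The same operands give different fields for evcmpgts and evcmpgtu because the high element 0xFFFF_FFFF is negative when read as signed but the largest value when read as unsigned.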
Special Registers Altered: Special Registers Altered: CR field BF CR field BF Vector Compare Less Than Unsigned EVX-form evcmpltu BF,RA,RB 4 BF // RA RB 562 0 6 9 11 16 21 31 ah 1 (RA)0:31 al 1 (RA)32:63 bh 1 (RB)0:31 bl 1 (RB)32:63 if (ah = 0) & (dvh = 0)) then RT32:63 1 n RT0:31 1 0x7FFFFFFF ovh 1 1 The leading sign bits in each element of RA are else if (ddh = 0x8000_0000)&(dvh = 0xFFFF_FFFF) counted, and the respective count is placed into each then element of RT. RT0:31 1 0x7FFFFFFF ovh 1 1 Special Registers Altered: if ((ddl < 0) & (dvl = 0)) then None RT32:63 1 0x8000_0000 ovl 1 1 Programming Note else if ((ddl >= 0) & (dvl = 0)) then evcntlzw is used for unsigned operands; evcntlsw RT32:63 1 0x7FFFFFFF ovl 1 1 is used for signed operands. else if (ddl = 0x8000_0000)&(dvl = 0xFFFF_FFFF) then RT32:63 1 0x7FFFFFFF ovl 1 1 SPEFSCROVH 1 ovh SPEFSCROV 1 ovl Vector Count Leading Zeros Word SPEFSCRSOVH 1 SPEFSCRSOVH | ovh EVX-form SPEFSCRSOV 1 SPEFSCRSOV | ovl The two dividends are the two elements of the contents evcntlzw RT,RA of RA. The two divisors are the two elements of the contents of RB. The resulting two 32-bit quotients on 4 RT RA /// 525 0 6 11 16 21 31 each element are placed into RT. The remainders are not supplied. The operands and quotients are inter- preted as signed integers. n 1 0 do while n < 32 Special Registers Altered: if (RA)n = 1 then leave OV OVH SOV SOVH n 1 n + 1 RT0:31 1 n Programming Note n 1 0 do while n < 32 Note that any overflow indication is always set as a if (RA)n+32 = 1 then leave side effect of this instruction. No form is defined n 1 n + 1 that disables the setting of the overflow bits. In RT32:63 1 n case of overflow, a saturated value is delivered into The leading zero bits in each element of RA are the destination register. counted, and the respective count is placed into each element of RT. 
Special Registers Altered:
    None

Vector Divide Word Unsigned EVX-form

evdivwu RT,RA,RB

    4    RT   RA   RB   1223
    0    6    11   16   21   31

    ddh ← (RA)0:31
    ddl ← (RA)32:63
    dvh ← (RB)0:31
    dvl ← (RB)32:63
    RT0:31 ← ddh ÷ dvh
    RT32:63 ← ddl ÷ dvl
    ovh ← 0
    ovl ← 0
    if (dvh = 0) then
        RT0:31 ← 0xFFFFFFFF
        ovh ← 1
    if (dvl = 0) then
        RT32:63 ← 0xFFFFFFFF
        ovl ← 1
    SPEFSCR_OVH ← ovh
    SPEFSCR_OV ← ovl
    SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
    SPEFSCR_SOV ← SPEFSCR_SOV | ovl

The two dividends are the two elements of the contents of RA. The two divisors are the two elements of the contents of RB. Two 32-bit quotients are formed as a result of the division on each of the high and low elements and the quotients are placed into RT. Remainders are not supplied. Operands and quotients are interpreted as unsigned integers.

Special Registers Altered:
    OV OVH SOV SOVH

Programming Note
Note that any overflow indication is always set as a side effect of this instruction. No form is defined that disables the setting of the overflow bits. In case of overflow, a saturated value is delivered into the destination register.

Vector Equivalent EVX-form

eveqv RT,RA,RB

    4    RT   RA   RB   537
    0    6    11   16   21   31

    RT0:31 ← (RA)0:31 ≡ (RB)0:31
    RT32:63 ← (RA)32:63 ≡ (RB)32:63

The corresponding elements of RA and RB are XORed bitwise, and the complemented results are placed in RT.

Special Registers Altered:
    None

Vector Extend Sign Byte EVX-form

evextsb RT,RA

    4    RT   RA   ///  522
    0    6    11   16   21   31

    RT0:31 ← EXTS((RA)24:31)
    RT32:63 ← EXTS((RA)56:63)

The signs of the low-order byte in each of the elements in RA are extended, and the results are placed in RT.

Special Registers Altered:
    None

Vector Extend Sign Halfword EVX-form

evextsh RT,RA

    4    RT   RA   ///  523
    0    6    11   16   21   31

    RT0:31 ← EXTS((RA)16:31)
    RT32:63 ← EXTS((RA)48:63)

The signs of the odd halfwords in each of the elements in RA are extended, and the results are placed in RT.

Special Registers Altered:
    None
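
As a rough illustration of the evdivwu, evextsb, and evextsh RTL, the following Python sketch models a register as a 64-bit integer whose upper 32 bits are the high element. The function names mirror the mnemonics for readability; the tuple return of the overflow flags is an assumption of this example, and the sticky SOV/SOVH updates are left to the caller.

```python
MASK32 = 0xFFFF_FFFF


def exts(value, bits):
    """Sign-extend the low `bits` bits of value to a 32-bit pattern."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign) - sign) & MASK32


def evdivwu(ra, rb):
    """Unsigned divide of both elements; returns (rt, ovh, ovl)."""
    words = []
    flags = []
    for shift in (32, 0):
        dd = (ra >> shift) & MASK32
        dv = (rb >> shift) & MASK32
        if dv == 0:
            # Divide by zero delivers the saturated value and sets overflow.
            words.append(0xFFFF_FFFF)
            flags.append(1)
        else:
            words.append(dd // dv)
            flags.append(0)
    return (words[0] << 32) | words[1], flags[0], flags[1]


def evextsb(ra):
    """Sign-extend the low-order byte of each word element."""
    return (exts(ra >> 32, 8) << 32) | exts(ra, 8)


def evextsh(ra):
    """Sign-extend the odd (low-order) halfword of each word element."""
    return (exts(ra >> 32, 16) << 32) | exts(ra, 16)


quotients = evdivwu(0x0000_000A_0000_0007, 0x0000_0003_0000_0000)
```

Here the high element divides normally (10 ÷ 3 = 3) while the low element has a zero divisor, so it saturates to 0xFFFF_FFFF and reports overflow only for the low element.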
Vector Load Double Word into Double Word EVX-form

evldd RT,D(RA)

    4    RT   RA   UI   769
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×8)
    RT ← MEM(EA, 8)

D in the instruction mnemonic is UI × 8. The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double Word into Double Word Indexed EVX-form

evlddx RT,RA,RB

    4    RT   RA   RB   768
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA, 8)

The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Four Halfwords EVX-form

evldh RT,D(RA)

    4    RT   RA   UI   773
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×8)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← MEM(EA+2, 2)
    RT32:47 ← MEM(EA+4, 2)
    RT48:63 ← MEM(EA+6, 2)

D in the instruction mnemonic is UI × 8. The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Four Halfwords Indexed EVX-form

evldhx RT,RA,RB

    4    RT   RA   RB   772
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← MEM(EA+2, 2)
    RT32:47 ← MEM(EA+4, 2)
    RT48:63 ← MEM(EA+6, 2)

The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Two Words EVX-form

evldw RT,D(RA)

    4    RT   RA   UI   771
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×8)
    RT0:31 ← MEM(EA, 4)
    RT32:63 ← MEM(EA+4, 4)

D in the instruction mnemonic is UI × 8. The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Two Words Indexed EVX-form

evldwx RT,RA,RB

    4    RT   RA   RB   770
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← MEM(EA, 4)
    RT32:63 ← MEM(EA+4, 4)

The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfwords Even and Splat EVX-form

evlhhesplat RT,D(RA)

    4    RT   RA   UI   777
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×2)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← 0x0000
    RT32:47 ← MEM(EA, 2)
    RT48:63 ← 0x0000

D in the instruction mnemonic is UI × 2. The halfword addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Halfword into Halfwords Even and Splat Indexed EVX-form

evlhhesplatx RT,RA,RB

    4    RT   RA   RB   776
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← 0x0000
    RT32:47 ← MEM(EA, 2)
    RT48:63 ← 0x0000

The halfword addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Signed and Splat EVX-form

evlhhossplat RT,D(RA)

    4    RT   RA   UI   783
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×2)
    RT0:31 ← EXTS(MEM(EA, 2))
    RT32:63 ← EXTS(MEM(EA, 2))

D in the instruction mnemonic is UI × 2. The halfword addressed by EA is loaded from memory and placed, sign-extended, in the odd halfwords of each element of RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Signed and Splat Indexed EVX-form

evlhhossplatx RT,RA,RB

    4    RT   RA   RB   782
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← EXTS(MEM(EA, 2))
    RT32:63 ← EXTS(MEM(EA, 2))

The halfword addressed by EA is loaded from memory and placed, sign-extended, in the odd halfwords of each element of RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Unsigned and Splat EVX-form

evlhhousplat RT,D(RA)

    4    RT   RA   UI   781
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×2)
    RT0:31 ← EXTZ(MEM(EA, 2))
    RT32:63 ← EXTZ(MEM(EA, 2))

D in the instruction mnemonic is UI × 2. The halfword addressed by EA is loaded from memory and placed, zero-extended, in the odd halfwords of each element of RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed EVX-form

evlhhousplatx RT,RA,RB

    4    RT   RA   RB   780
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← EXTZ(MEM(EA, 2))
    RT32:63 ← EXTZ(MEM(EA, 2))

The halfword addressed by EA is loaded from memory and placed, zero-extended, in the odd halfwords of each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Even EVX-form

evlwhe RT,D(RA)

    4    RT   RA   UI   785
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← 0x0000
    RT32:47 ← MEM(EA+2, 2)
    RT48:63 ← 0x0000

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Even Indexed EVX-form

evlwhex RT,RA,RB

    4    RT   RA   RB   784
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← 0x0000
    RT32:47 ← MEM(EA+2, 2)
    RT48:63 ← 0x0000

The word addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Signed (with sign extension) EVX-form

evlwhos RT,D(RA)

    4    RT   RA   UI   791
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:31 ← EXTS(MEM(EA, 2))
    RT32:63 ← EXTS(MEM(EA+2, 2))

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed, sign-extended, in the odd halfwords of each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) EVX-form

evlwhosx RT,RA,RB

    4    RT   RA   RB   790
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← EXTS(MEM(EA, 2))
    RT32:63 ← EXTS(MEM(EA+2, 2))

The word addressed by EA is loaded from memory and placed, sign-extended, in the odd halfwords of each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) EVX-form

evlwhou RT,D(RA)

    4    RT   RA   UI   789
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:31 ← EXTZ(MEM(EA, 2))
    RT32:63 ← EXTZ(MEM(EA+2, 2))

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed, zero-extended, in the odd halfwords of each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) EVX-form

evlwhoux RT,RA,RB

    4    RT   RA   RB   788
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← EXTZ(MEM(EA, 2))
    RT32:63 ← EXTZ(MEM(EA+2, 2))

The word addressed by EA is loaded from memory and placed, zero-extended, in the odd halfwords of each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords and Splat EVX-form

evlwhsplat RT,D(RA)

    4    RT   RA   UI   797
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← MEM(EA, 2)
    RT32:47 ← MEM(EA+2, 2)
    RT48:63 ← MEM(EA+2, 2)

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in both the even and odd halfwords in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords and Splat Indexed EVX-form

evlwhsplatx RT,RA,RB

    4    RT   RA   RB   796
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← MEM(EA, 2)
    RT32:47 ← MEM(EA+2, 2)
    RT48:63 ← MEM(EA+2, 2)

The word addressed by EA is loaded from memory and placed in both the even and odd halfwords in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Word and Splat EVX-form

evlwwsplat RT,D(RA)

    4    RT   RA   UI   793
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:31 ← MEM(EA, 4)
    RT32:63 ← MEM(EA, 4)

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in both elements of RT.

Special Registers Altered:
    None

Vector Load Word into Word and Splat Indexed EVX-form

evlwwsplatx RT,RA,RB

    4    RT   RA   RB   792
    0    6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← MEM(EA, 4)
    RT32:63 ← MEM(EA, 4)

The word addressed by EA is loaded from memory and placed in both elements of RT.

Special Registers Altered:
    None

Vector Merge High EVX-form

evmergehi RT,RA,RB

    4    RT   RA   RB   556
    0    6    11   16   21   31

    RT0:31 ← (RA)0:31
    RT32:63 ← (RB)0:31

The high-order elements of RA and RB are merged and placed in RT.

Special Registers Altered:
    None

Programming Note
A vector splat high can be performed by specifying the same register in RA and RB.

Vector Merge Low EVX-form

evmergelo RT,RA,RB

    4    RT   RA   RB   557
    0    6    11   16   21   31

    RT0:31 ← (RA)32:63
    RT32:63 ← (RB)32:63

The low-order elements of RA and RB are merged and placed in RT.

Special Registers Altered:
    None

Programming Note
A vector splat low can be performed by specifying the same register in RA and RB.

Vector Merge High/Low EVX-form

evmergehilo RT,RA,RB

    4    RT   RA   RB   558
    0    6    11   16   21   31

    RT0:31 ← (RA)0:31
    RT32:63 ← (RB)32:63

The high-order element of RA and the low-order element of RB are merged and placed in RT.

Special Registers Altered:
    None

Programming Note
With appropriate specification of RA and RB, evmergehi, evmergelo, evmergehilo, and evmergelohi provide a full 32-bit permute of two source operands.

Vector Merge Low/High EVX-form

evmergelohi RT,RA,RB

    4    RT   RA   RB   559
    0    6    11   16   21   31

    RT0:31 ← (RA)32:63
    RT32:63 ← (RB)0:31

The low-order element of RA and the high-order element of RB are merged and placed in RT.

Special Registers Altered:
    None

Programming Note
A vector swap can be performed by specifying the same register in RA and RB.

Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate EVX-form

evmhegsmfaa RT,RA,RB

    4    RT   RA   RB   1323
    0    6    11   16   21   31

    temp0:63 ← (RA)32:47 ×gsf (RB)32:47
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication, producing a sign-extended 64-bit fractional product with the radix point between bits 32 and 33. The product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC

Note
If the two input operands are both -1.0, the intermediate product is represented as +1.0.

Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX-form

evmhegsmfan RT,RA,RB

    4    RT   RA   RB   1451
    0    6    11   16   21   31

    temp0:63 ← (RA)32:47 ×gsf (RB)32:47
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication, producing a sign-extended 64-bit fractional product with the radix point between bits 32 and 33. The product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC

Note
If the two input operands are both -1.0, the intermediate product is represented as +1.0.

Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate EVX-form

evmhegsmiaa RT,RA,RB

    4    RT   RA   RB   1321
    0    6    11   16   21   31

    temp0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:63 ← EXTS(temp0:31)
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended and added to the contents of the 64-bit accumulator, and the resulting sum is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX-form

evmhegsmian RT,RA,RB

    4    RT   RA   RB   1449
    0    6    11   16   21   31

    temp0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:63 ← EXTS(temp0:31)
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended and subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate EVX-form

evmhegumiaa RT,RA,RB

    4    RT   RA   RB   1320
    0    6    11   16   21   31

    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:63 ← EXTZ(temp0:31)
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended and added to the contents of the 64-bit accumulator. The resulting sum is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX-form

evmhegumian RT,RA,RB

    4    RT   RA   RB   1448
    0    6    11   16   21   31

    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:63 ← EXTZ(temp0:31)
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended and subtracted from the contents of the 64-bit accumulator. The result is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional EVX-form

evmhesmf RT,RA,RB

    4    RT   RA   RB   1035
    0    6    11   16   21   31

    RT0:31 ← (RA)0:15 ×sf (RB)0:15
    RT32:63 ← (RA)32:47 ×sf (RB)32:47

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied then placed into the corresponding words of RT.

Special Registers Altered:
    None

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator EVX-form

evmhesmfa RT,RA,RB

    4    RT   RA   RB   1067
    0    6    11   16   21   31

    RT0:31 ← (RA)0:15 ×sf (RB)0:15
    RT32:63 ← (RA)32:47 ×sf (RB)32:47
    ACC0:63 ← (RT)0:63

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied then placed into the corresponding words of RT and into the accumulator.
Special Registers Altered: ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words EVX-form
evmhesmfaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1291 (21:31)

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are added to the contents of the accumulator words to form intermediate sums, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words EVX-form
evmhesmfanw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1419 (21:31)

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32-bit intermediate products are subtracted from the contents of the accumulator words to form intermediate differences, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Integer EVX-form
evmhesmi RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1033 (21:31)

    RT0:31 ← (RA)0:15 ×si (RB)0:15
    RT32:63 ← (RA)32:47 ×si (RB)32:47

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered: None

Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator EVX-form
evmhesmia RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1065 (21:31)

    RT0:31 ← (RA)0:15 ×si (RB)0:15
    RT32:63 ← (RA)32:47 ×si (RB)32:47
    ACC0:63 ← (RT)0:63

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words EVX-form
evmhesmiaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1289 (21:31)

    temp0:31 ← (RA)0:15 ×si (RB)0:15
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)32:47 ×si (RB)32:47
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is added to the contents of the accumulator words to form intermediate sums, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words EVX-form
evmhesmianw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1417 (21:31)

    temp0:31 ← (RA)0:15 ×si (RB)0:15
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)32:47 ×si (RB)32:47
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is subtracted from the contents of the accumulator words to form intermediate differences, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered: ACC
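As an illustration of the modulo accumulate-into-words forms (evmhesmiaaw and evmhesmianw), the following Python sketch models a 64-bit register as a plain integer, with bit 0 as the most significant bit. The function and helper names are hypothetical and are not part of the architecture; the sketch assumes that "modulo" means each 32-bit result simply wraps.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def even_halfwords(reg64):
    """Return the even halfwords: bits 0:15 and bits 32:47 (bit 0 is the MSB)."""
    return (reg64 >> 48) & MASK16, (reg64 >> 16) & MASK16


def evmhesmiaaw(acc, ra, rb, negate=False):
    """Model evmhesmiaaw (negate=False) or evmhesmianw (negate=True).

    Returns the 64-bit value written to both RT and the accumulator.
    """
    acc_words = ((acc >> 32) & MASK32, acc & MASK32)
    words = []
    for acc_word, a, b in zip(acc_words, even_halfwords(ra), even_halfwords(rb)):
        product = to_signed16(a) * to_signed16(b)
        result = acc_word - product if negate else acc_word + product
        words.append(result & MASK32)  # modulo arithmetic: wrap to 32 bits
    return (words[0] << 32) | words[1]
```

Calling evmhesmiaaw repeatedly with the previous return value as acc models a two-lane multiply-accumulate loop. No overflow is reported, which matches the fact that these forms alter only ACC.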
Vector Multiply Halfwords, Even, Signed, Saturate, Fractional EVX-form
evmhessf RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1027 (21:31)

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
        RT0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        RT0:31 ← temp0:31
        movh ← 0
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
        RT32:63 ← 0x7FFF_FFFF
        movl ← 1
    else
        RT32:63 ← temp0:31
        movl ← 0
    SPEFSCROVH ← movh
    SPEFSCROV ← movl
    SPEFSCRSOVH ← SPEFSCRSOVH | movh
    SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered: OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator EVX-form
evmhessfa RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1059 (21:31)

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
        RT0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        RT0:31 ← temp0:31
        movh ← 0
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
        RT32:63 ← 0x7FFF_FFFF
        movl ← 1
    else
        RT32:63 ← temp0:31
        movl ← 0
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← movh
    SPEFSCROV ← movl
    SPEFSCRSOVH ← SPEFSCRSOVH | movh
    SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.
Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words EVX-form
evmhessfaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1283 (21:31)

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        movh ← 0
    temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movl ← 1
    else
        movl ← 0
    temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh | movh
    SPEFSCROV ← ovl | movl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
    SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied, producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words EVX-form
evmhessfanw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1411 (21:31)

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        movh ← 0
    temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movl ← 1
    else
        movl ← 0
    temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh | movh
    SPEFSCROV ← ovl | movl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
    SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied, producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words EVX-form
evmhessiaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1281 (21:31)

    temp0:31 ← (RA)0:15 ×si (RB)0:15
    temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied, producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words EVX-form
evmhessianw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1409 (21:31)

    temp0:31 ← (RA)0:15 ×si (RB)0:15
    temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied, producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer EVX-form
evmheumi RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1032 (21:31)

    RT0:31 ← (RA)0:15 ×ui (RB)0:15
    RT32:63 ← (RA)32:47 ×ui (RB)32:47

The corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered: None

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator EVX-form
evmheumia RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1064 (21:31)

    RT0:31 ← (RA)0:15 ×ui (RB)0:15
    RT32:63 ← (RA)32:47 ×ui (RB)32:47
    ACC0:63 ← (RT)0:63

The corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words EVX-form
evmheumiaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1288 (21:31)

    temp0:31 ← (RA)0:15 ×ui (RB)0:15
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is added to the contents of the corresponding accumulator words and the sums are placed into the corresponding RT and accumulator words.

Special Registers Altered: ACC

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX-form
evmheumianw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1416 (21:31)

    temp0:31 ← (RA)0:15 ×ui (RB)0:15
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is subtracted from the contents of the corresponding accumulator words. The differences are placed into the corresponding RT and accumulator words.

Special Registers Altered: ACC
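The signed saturating accumulate forms in this group (evmhessiaaw and evmhessianw) can be sketched in Python as below. This is a rough model, not a definitive implementation. It assumes, as the instruction names indicate, that the "aaw" form adds and the "anw" form subtracts in both words, and that SATURATE(ov, carry, sat_ovn, sat_ov, val) returns val when ov is 0, sat_ovn when ov and carry are both 1, and sat_ov otherwise. All Python names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def saturate(ov, carry, sat_ovn, sat_ov, val):
    """Assumed behavior of the SATURATE pseudocode helper."""
    if ov:
        return sat_ovn if carry else sat_ov
    return val


def sat_acc_word(acc_word, a_half, b_half, negate):
    """One 32-bit lane: returns (result word, overflow flag)."""
    product = sext(a_half, 16) * sext(b_half, 16)
    acc_val = sext(acc_word, 32)
    temp = (acc_val - product if negate else acc_val + product) & MASK64
    bit31 = (temp >> 32) & 1  # temp31: sign of the full-width result
    bit32 = (temp >> 31) & 1  # temp32: sign bit of the low 32 bits
    ov = bit31 ^ bit32
    return saturate(ov, bit31, 0x80000000, 0x7FFFFFFF, temp & MASK32), ov


def evmhessiaaw(acc, ra, rb, negate=False):
    """Model evmhessiaaw (negate=False) or evmhessianw (negate=True).

    Returns (value written to RT and ACC, ovh, ovl).
    """
    hi, ovh = sat_acc_word((acc >> 32) & MASK32,
                           (ra >> 48) & MASK16, (rb >> 48) & MASK16, negate)
    lo, ovl = sat_acc_word(acc & MASK32,
                           (ra >> 16) & MASK16, (rb >> 16) & MASK16, negate)
    return (hi << 32) | lo, ovh, ovl
```

The returned ovh and ovl correspond to the values written to the OVH and OV bits; a caller modeling the sticky SOVH and SOV bits would OR them into its own running flags.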
Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words EVX-form
evmheusiaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1280 (21:31)

    temp0:31 ← (RA)0:15 ×ui (RB)0:15
    temp0:63 ← EXTZ((ACC)0:31) + EXTZ(temp0:31)
    ovh ← temp31
    RT0:31 ← SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:63 ← EXTZ((ACC)32:63) + EXTZ(temp0:31)
    ovl ← temp31
    RT32:63 ← SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied, producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX-form
evmheusianw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1408 (21:31)

    temp0:31 ← (RA)0:15 ×ui (RB)0:15
    temp0:63 ← EXTZ((ACC)0:31) - EXTZ(temp0:31)
    ovh ← temp31
    RT0:31 ← SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63)
    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:63 ← EXTZ((ACC)32:63) - EXTZ(temp0:31)
    ovl ← temp31
    RT32:63 ← SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied, producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate EVX-form
evmhogsmfaa RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1327 (21:31)

    temp0:63 ← (RA)48:63 ×gsf (RB)48:63
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication, producing a sign-extended 64-bit fractional product with the decimal between bits 32 and 33. The product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Note
    If the two input operands are both -1.0, the intermediate product is represented as +1.0.

Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX-form
evmhogsmfan RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1455 (21:31)

    temp0:63 ← (RA)48:63 ×gsf (RB)48:63
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication, producing a sign-extended 64-bit fractional product with the decimal between bits 32 and 33. The product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Note
    If the two input operands are both -1.0, the intermediate product is represented as +1.0.
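The guarded fractional forms (evmhogsmfaa and evmhogsmfan) can be illustrated with a short Python sketch. It assumes that ×gsf is the signed 16×16 product shifted left by one bit and sign-extended to 64 bits, which is consistent with the Note that -1.0 × -1.0 is represented as +1.0. The function names are hypothetical.

```python
MASK16 = 0xFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def gsf_product(a_half, b_half):
    """Assumed model of the guarded signed fractional multiply (x gsf).

    The result is a 64-bit two's-complement value with 31 fraction bits,
    so +1.0 is representable as 0x0000_0000_8000_0000.
    """
    return ((to_signed16(a_half) * to_signed16(b_half)) << 1) & MASK64


def evmhogsmfaa(acc, ra, rb, negate=False):
    """Model evmhogsmfaa (negate=False) or evmhogsmfan (negate=True).

    Only the low odd halfwords (bits 48:63) of ra and rb take part.
    Returns the 64-bit value written to both RT and the accumulator.
    """
    temp = gsf_product(ra & MASK16, rb & MASK16)
    return (acc - temp if negate else acc + temp) & MASK64
```

Because the accumulation is 64 bits wide, the upper 32 bits act as guard bits: many products can be summed before the result wraps, which is why these forms alter only ACC and report no overflow.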
Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate EVX-form
evmhogsmiaa RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1325 (21:31)

    temp0:31 ← (RA)48:63 ×si (RB)48:63
    temp0:63 ← EXTS(temp0:31)
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended to 64 bits then added to the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX-form
evmhogsmian RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1453 (21:31)

    temp0:31 ← (RA)48:63 ×si (RB)48:63
    temp0:63 ← EXTS(temp0:31)
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended to 64 bits then subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate EVX-form
evmhogumiaa RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1324 (21:31)

    temp0:31 ← (RA)48:63 ×ui (RB)48:63
    temp0:63 ← EXTZ(temp0:31)
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended to 64 bits then added to the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX-form
evmhogumian RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1452 (21:31)

    temp0:31 ← (RA)48:63 ×ui (RB)48:63
    temp0:63 ← EXTZ(temp0:31)
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended to 64 bits then subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional EVX-form
evmhosmf RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1039 (21:31)

    RT0:31 ← (RA)16:31 ×sf (RB)16:31
    RT32:63 ← (RA)48:63 ×sf (RB)48:63

The corresponding odd-numbered, halfword signed fractional elements in RA and RB are multiplied. Each product is placed into the corresponding words of RT.

Special Registers Altered: None

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator EVX-form
evmhosmfa RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1071 (21:31)

    RT0:31 ← (RA)16:31 ×sf (RB)16:31
    RT32:63 ← (RA)48:63 ×sf (RB)48:63
    ACC0:63 ← (RT)0:63

The corresponding odd-numbered, halfword signed fractional elements in RA and RB are multiplied. Each product is placed into the corresponding words of RT and into the accumulator.
Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words EVX-form
evmhosmfaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1295 (21:31)

    temp0:31 ← (RA)16:31 ×sf (RB)16:31
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)48:63 ×sf (RB)48:63
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are added to the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words EVX-form
evmhosmfanw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1423 (21:31)

    temp0:31 ← (RA)16:31 ×sf (RB)16:31
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)48:63 ×sf (RB)48:63
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are subtracted from the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer EVX-form
evmhosmi RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1037 (21:31)

    RT0:31 ← (RA)16:31 ×si (RB)16:31
    RT32:63 ← (RA)48:63 ×si (RB)48:63

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered: None

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator EVX-form
evmhosmia RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1069 (21:31)

    RT0:31 ← (RA)16:31 ×si (RB)16:31
    RT32:63 ← (RA)48:63 ×si (RB)48:63
    ACC0:63 ← (RT)0:63

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words EVX-form
evmhosmiaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1293 (21:31)

    temp0:31 ← (RA)16:31 ×si (RB)16:31
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)48:63 ×si (RB)48:63
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is added to the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words EVX-form
evmhosmianw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1421 (21:31)

    temp0:31 ← (RA)16:31 ×si (RB)16:31
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)48:63 ×si (RB)48:63
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is subtracted from the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional EVX-form
evmhossf RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1031 (21:31)

    temp0:31 ← (RA)16:31 ×sf (RB)16:31
    if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
        RT0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        RT0:31 ← temp0:31
        movh ← 0
    temp0:31 ← (RA)48:63 ×sf (RB)48:63
    if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
        RT32:63 ← 0x7FFF_FFFF
        movl ← 1
    else
        RT32:63 ← temp0:31
        movl ← 0
    SPEFSCROVH ← movh
    SPEFSCROV ← movl
    SPEFSCRSOVH ← SPEFSCRSOVH | movh
    SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered: OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator EVX-form
evmhossfa RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1063 (21:31)

    temp0:31 ← (RA)16:31 ×sf (RB)16:31
    if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
        RT0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        RT0:31 ← temp0:31
        movh ← 0
    temp0:31 ← (RA)48:63 ×sf (RB)48:63
    if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
        RT32:63 ← 0x7FFF_FFFF
        movl ← 1
    else
        RT32:63 ← temp0:31
        movl ← 0
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← movh
    SPEFSCROV ← movl
    SPEFSCRSOVH ← SPEFSCRSOVH | movh
    SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered: ACC OV OVH SOV SOVH
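The saturating fractional multiply (evmhossf) can be sketched in Python as follows. The sketch assumes that ×sf yields the signed 16×16 product shifted left by one bit and kept to 32 bits, so that -1.0 × -1.0 (0x8000 × 0x8000) is the only input pair that cannot be represented and must saturate. The function names are hypothetical, and the returned movh and movl stand for the values written to the OVH and OV bits.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def sf_multiply_saturate(a_half, b_half):
    """One lane: return (32-bit fractional product, saturation flag)."""
    if a_half == 0x8000 and b_half == 0x8000:
        return 0x7FFFFFFF, 1  # -1.0 * -1.0 saturates to just below +1.0
    product = (to_signed16(a_half) * to_signed16(b_half)) << 1
    return product & MASK32, 0


def evmhossf(ra, rb):
    """Model evmhossf: returns (RT, movh, movl).

    The odd halfwords are bits 16:31 and 48:63, with bit 0 as the MSB.
    """
    hi, movh = sf_multiply_saturate((ra >> 32) & MASK16, (rb >> 32) & MASK16)
    lo, movl = sf_multiply_saturate(ra & MASK16, rb & MASK16)
    return (hi << 32) | lo, movh, movl
```

A model of evmhossfa would additionally copy the returned RT value into the accumulator.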
Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words EVX-form
evmhossfaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1287 (21:31)

    temp0:31 ← (RA)16:31 ×sf (RB)16:31
    if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        movh ← 0
    temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)48:63 ×sf (RB)48:63
    if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movl ← 1
    else
        movl ← 0
    temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh | movh
    SPEFSCROV ← ovl | movl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
    SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied, producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words EVX-form
evmhossfanw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1415 (21:31)

    temp0:31 ← (RA)16:31 ×sf (RB)16:31
    if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        movh ← 0
    temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)48:63 ×sf (RB)48:63
    if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movl ← 1
    else
        movl ← 0
    temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh | movh
    SPEFSCROV ← ovl | movl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
    SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied, producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words EVX-form
evmhossiaaw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1285 (21:31)

    temp0:31 ← (RA)16:31 ×si (RB)16:31
    temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)48:63 ×si (RB)48:63
    temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied, producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words EVX-form
evmhossianw RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1413 (21:31)

    temp0:31 ← (RA)16:31 ×si (RB)16:31
    temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)48:63 ×si (RB)48:63
    temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied, producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer EVX-form
evmhoumi RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1036 (21:31)

    RT0:31 ← (RA)16:31 ×ui (RB)16:31
    RT32:63 ← (RA)48:63 ×ui (RB)48:63

The corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered: None

Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator EVX-form
evmhoumia RT,RA,RB
Encoding: 4 (0:5) | RT (6:10) | RA (11:15) | RB (16:20) | 1068 (21:31)

    RT0:31 ← (RA)16:31 ×ui (RB)16:31
    RT32:63 ← (RA)48:63 ×ui (RB)48:63
    ACC0:63 ← (RT)0:63

The corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into RT and into the accumulator.

Special Registers Altered: ACC
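As a small illustration of how the odd halfwords are selected, the following Python sketch models evmhoumi and evmhoumia on 64-bit integers, with bit 0 as the most significant bit. The function names and the convention of returning the new accumulator value alongside RT are assumptions made for this example only.

```python
MASK16 = 0xFFFF


def odd_halfwords(reg64):
    """Return the odd halfwords: bits 16:31 and bits 48:63."""
    return (reg64 >> 32) & MASK16, reg64 & MASK16


def evmhoumi(ra, rb):
    """Model evmhoumi: two unsigned 16x16 products packed into RT."""
    a_hi, a_lo = odd_halfwords(ra)
    b_hi, b_lo = odd_halfwords(rb)
    # Each unsigned 16x16 product is at most 0xFFFE0001, so it fits in 32 bits.
    return ((a_hi * b_hi) << 32) | (a_lo * b_lo)


def evmhoumia(ra, rb):
    """Model evmhoumia: same result, also written to the accumulator.

    Returns (RT, new ACC).
    """
    rt = evmhoumi(ra, rb)
    return rt, rt
```

The even halfwords of both operands (bits 0:15 and 32:47) are ignored, so any values may be present there.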
Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words EVX-form

evmhoumiaaw RT,RA,RB

4 RT RA RB 1292
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
RT0:31 ← (ACC)0:31 + temp0:31
temp0:31 ← (RA)48:63 ×ui (RB)48:63
RT32:63 ← (ACC)32:63 + temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is added to the contents of the corresponding accumulator word. The sums are placed into the corresponding RT and accumulator words.

Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX-form

evmhoumianw RT,RA,RB

4 RT RA RB 1420
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
RT0:31 ← (ACC)0:31 - temp0:31
temp0:31 ← (RA)48:63 ×ui (RB)48:63
RT32:63 ← (ACC)32:63 - temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is subtracted from the contents of the corresponding accumulator word. The results are placed into the corresponding RT and accumulator words.
Special Registers Altered (evmhoumiaaw and evmhoumianw):
ACC

Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words EVX-form

evmhousiaaw RT,RA,RB

4 RT RA RB 1284
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
temp0:63 ← EXTZ((ACC)0:31) + EXTZ(temp0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×ui (RB)48:63
temp0:63 ← EXTZ((ACC)32:63) + EXTZ(temp0:31)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX-form

evmhousianw RT,RA,RB

4 RT RA RB 1412
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
temp0:63 ← EXTZ((ACC)0:31) - EXTZ(temp0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63)
temp0:31 ← (RA)48:63 ×ui (RB)48:63
temp0:63 ← EXTZ((ACC)32:63) - EXTZ(temp0:31)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.
Special Registers Altered (evmhousiaaw and evmhousianw):
ACC OV OVH SOV SOVH

Initialize Accumulator EVX-form

evmra RT,RA

4 RT RA /// 1220
0 6 11 16 21 31

ACC0:63 ← (RA)0:63
RT0:63 ← (RA)0:63

The contents of RA are placed into the accumulator and RT. This is the method for initializing the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word High Signed, Modulo, Fractional EVX-form

evmwhsmf RT,RA,RB

4 RT RA RB 1103
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT32:63 ← temp0:31

The corresponding word signed fractional elements in RA and RB are multiplied and bits 0:31 of the two products are placed into the two corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Word High Signed, Modulo, Fractional to Accumulator EVX-form

evmwhsmfa RT,RA,RB

4 RT RA RB 1135
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT32:63 ← temp0:31
ACC0:63 ← (RT)0:63

The corresponding word signed fractional elements in RA and RB are multiplied and bits 0:31 of the two products are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word High Signed, Modulo, Integer EVX-form

evmwhsmi RT,RA,RB

4 RT RA RB 1101
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← temp0:31

The corresponding word signed-integer elements in RA and RB are multiplied. Bits 0:31 of the two 64-bit products are placed into the two corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Word High Signed, Modulo, Integer to Accumulator EVX-form

evmwhsmia RT,RA,RB

4 RT RA RB 1133
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← temp0:31
ACC0:63 ← (RT)0:63

The corresponding word signed-integer elements in RA and RB are multiplied. Bits 0:31 of the two 64-bit products are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word High Signed, Saturate, Fractional EVX-form

evmwhssf RT,RA,RB

4 RT RA RB 1095
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
if ((RA)0:31 = 0x8000_0000) & ((RB)0:31 = 0x8000_0000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding word signed fractional elements in RA and RB are multiplied. Bits 0:31 of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Vector Multiply Word High Signed, Saturate, Fractional to Accumulator EVX-form

evmwhssfa RT,RA,RB

4 RT RA RB 1127
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
if ((RA)0:31 = 0x8000_0000) & ((RB)0:31 = 0x8000_0000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
ACC0:63 ← (RT)0:63
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding word signed fractional elements in RA and RB are multiplied. Bits 0:31 of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.
Special Registers Altered:
evmwhssf: OV OVH SOV SOVH
evmwhssfa: ACC OV OVH SOV SOVH

Vector Multiply Word High Unsigned, Modulo, Integer EVX-form

evmwhumi RT,RA,RB

4 RT RA RB 1100
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp0:31

The corresponding word unsigned-integer elements in RA and RB are multiplied. Bits 0:31 of the two products are placed into the two corresponding words of RT.

Special Registers Altered:
None

Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator EVX-form

evmwhumia RT,RA,RB

4 RT RA RB 1132
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp0:31
ACC0:63 ← (RT)0:63

The corresponding word unsigned-integer elements in RA and RB are multiplied. Bits 0:31 of the two products are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words EVX-form

evmwlsmiaaw RT,RA,RB

4 RT RA RB 1353
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← (ACC)0:31 + temp32:63
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← (ACC)32:63 + temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word signed-integer elements in RA and RB are multiplied. The least significant 32 bits of each intermediate product are added to the contents of the corresponding accumulator words, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words EVX-form

evmwlsmianw RT,RA,RB

4 RT RA RB 1481
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← (ACC)0:31 - temp32:63
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← (ACC)32:63 - temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word elements in RA and RB are multiplied. The least significant 32 bits of each intermediate product are subtracted from the contents of the corresponding accumulator words and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words EVX-form

evmwlssiaaw RT,RA,RB

4 RT RA RB 1345
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
temp0:63 ← EXTS((ACC)0:31) + EXTS(temp32:63)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:63 ← (RA)32:63 ×si (RB)32:63
temp0:63 ← EXTS((ACC)32:63) + EXTS(temp32:63)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding word signed-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words EVX-form

evmwlssianw RT,RA,RB

4 RT RA RB 1473
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
temp0:63 ← EXTS((ACC)0:31) - EXTS(temp32:63)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:63 ← (RA)32:63 ×si (RB)32:63
temp0:63 ← EXTS((ACC)32:63) - EXTS(temp32:63)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding word signed-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
ACC OV OVH SOV SOVH

Vector Multiply Word Low Unsigned, Modulo, Integer EVX-form

evmwlumi RT,RA,RB

4 RT RA RB 1096
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp32:63

The corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are placed into the two corresponding words of RT.

Special Registers Altered:
None

Programming Note
The least significant 32 bits of the product are independent of whether the word elements in RA and RB are treated as signed or unsigned 32-bit integers. Note that evmwlumi can be used for signed or unsigned integers.

Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator EVX-form

evmwlumia RT,RA,RB

4 RT RA RB 1128
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp32:63
ACC0:63 ← (RT)0:63

The corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered:
ACC

Programming Note
The least significant 32 bits of the product are independent of whether the word elements in RA and RB are treated as signed or unsigned 32-bit integers. Note that evmwlumia can be used for signed or unsigned integers.
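The Programming Note's claim, that the low 32 bits of a word product do not depend on signedness, can be checked with a short Python model. This is a sketch only: registers are modelled as 64-bit Python integers with bits 0:31 as the upper word, and to_signed32 and low_word_signed_product are hypothetical helpers introduced for the comparison.

```python
MASK32 = 0xFFFF_FFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement word (hypothetical helper)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def evmwlumi(ra, rb):
    """Model of evmwlumi: per word, keep the low 32 bits of the unsigned product."""
    hi = (((ra >> 32) & MASK32) * ((rb >> 32) & MASK32)) & MASK32
    lo = ((ra & MASK32) * (rb & MASK32)) & MASK32
    return (hi << 32) | lo

def low_word_signed_product(a, b):
    """Low 32 bits of the product when both words are treated as signed."""
    return (to_signed32(a) * to_signed32(b)) & MASK32
```

For example, with the upper words 0xFFFF_FFFE (which is -2 when read as signed) and 3, both interpretations produce the low word 0xFFFF_FFFA, that is -6 modulo 2^32.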
Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words EVX-form

evmwlumiaaw RT,RA,RB

4 RT RA RB 1352
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← (ACC)0:31 + temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← (ACC)32:63 + temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are added to the contents of the corresponding accumulator word and the result is placed in RT and the accumulator.

Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words EVX-form

evmwlumianw RT,RA,RB

4 RT RA RB 1480
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← (ACC)0:31 - temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← (ACC)32:63 - temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are subtracted from the contents of the corresponding accumulator word and the result is placed in RT and the accumulator.
Special Registers Altered (evmwlumiaaw and evmwlumianw):
ACC

Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words EVX-form

evmwlusiaaw RT,RA,RB

4 RT RA RB 1344
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
temp0:63 ← EXTZ((ACC)0:31) + EXTZ(temp32:63)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
temp0:63 ← (RA)32:63 ×ui (RB)32:63
temp0:63 ← EXTZ((ACC)32:63) + EXTZ(temp32:63)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding word unsigned-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words EVX-form

evmwlusianw RT,RA,RB

4 RT RA RB 1472
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
temp0:63 ← EXTZ((ACC)0:31) - EXTZ(temp32:63)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63)
temp0:63 ← (RA)32:63 ×ui (RB)32:63
temp0:63 ← EXTZ((ACC)32:63) - EXTZ(temp32:63)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding word unsigned-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.
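The unsigned saturating behaviour of evmwlusiaaw and evmwlusianw can be sketched in Python as follows. The model assumes registers are 64-bit Python integers with bits 0:31 as the upper word; the helper words is hypothetical, and the overflow test on temp31 is expressed as a range check (carry out of 32 bits for the add form, borrow for the subtract form).

```python
MASK32 = 0xFFFF_FFFF

def words(x):
    """Split a 64-bit value into (upper word, lower word); hypothetical helper."""
    return (x >> 32) & MASK32, x & MASK32

def evmwlusiaaw(acc, ra, rb):
    """Return (rt, ovh, ovl) for the saturating add form."""
    rt, flags = 0, []
    for a, x, y in zip(words(acc), words(ra), words(rb)):
        total = a + ((x * y) & MASK32)      # low 32 bits of the product
        ov = 1 if total > MASK32 else 0     # carry out of bit 32
        flags.append(ov)
        rt = (rt << 32) | (MASK32 if ov else total)
    return rt, flags[0], flags[1]

def evmwlusianw(acc, ra, rb):
    """Return (rt, ovh, ovl) for the saturating subtract form."""
    rt, flags = 0, []
    for a, x, y in zip(words(acc), words(ra), words(rb)):
        diff = a - ((x * y) & MASK32)
        ov = 1 if diff < 0 else 0           # borrow: result would be negative
        flags.append(ov)
        rt = (rt << 32) | (0 if ov else diff)
    return rt, flags[0], flags[1]
```

The add form clamps at 0xFFFF_FFFF and the subtract form clamps at 0, matching the SATURATE constants in the two pseudo-code listings.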
Special Registers Altered (evmwlusiaaw and evmwlusianw):
ACC OV OVH SOV SOVH

Vector Multiply Word Signed, Modulo, Fractional EVX-form

evmwsmf RT,RA,RB

4 RT RA RB 1115
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×sf (RB)32:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The product is placed in RT.

Special Registers Altered:
None

Vector Multiply Word Signed, Modulo, Fractional to Accumulator EVX-form

evmwsmfa RT,RA,RB

4 RT RA RB 1147
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×sf (RB)32:63
ACC0:63 ← (RT)0:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The product is placed in RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word Signed, Modulo, Fractional and Accumulate EVX-form

evmwsmfaa RT,RA,RB

4 RT RA RB 1371
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative EVX-form

evmwsmfan RT,RA,RB

4 RT RA RB 1499
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the accumulator and the result is placed in RT and the accumulator.
Special Registers Altered (evmwsmfaa and evmwsmfan):
ACC

Vector Multiply Word Signed, Modulo, Integer EVX-form

evmwsmi RT,RA,RB

4 RT RA RB 1113
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×si (RB)32:63

The low word signed-integer elements in RA and RB are multiplied. The product is placed in RT.

Special Registers Altered:
None

Vector Multiply Word Signed, Modulo, Integer to Accumulator EVX-form

evmwsmia RT,RA,RB

4 RT RA RB 1145
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×si (RB)32:63
ACC0:63 ← (RT)0:63

The low word signed-integer elements in RA and RB are multiplied. The product is placed in RT and the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word Signed, Modulo, Integer and Accumulate EVX-form

evmwsmiaa RT,RA,RB

4 RT RA RB 1369
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×si (RB)32:63
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The low word signed-integer elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative EVX-form

evmwsmian RT,RA,RB

4 RT RA RB 1497
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×si (RB)32:63
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The low word signed-integer elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.
Special Registers Altered (evmwsmiaa and evmwsmian):
ACC

Vector Multiply Word Signed, Saturate, Fractional EVX-form

evmwssf RT,RA,RB

4 RT RA RB 1107
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    RT0:63 ← temp0:63
    mov ← 0
SPEFSCROVH ← 0
SPEFSCROV ← mov
SPEFSCRSOV ← SPEFSCRSOV | mov

The low word signed fractional elements in RA and RB are multiplied. The 64-bit product is placed in RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered:
OV OVH SOV

Vector Multiply Word Signed, Saturate, Fractional to Accumulator EVX-form

evmwssfa RT,RA,RB

4 RT RA RB 1139
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    RT0:63 ← temp0:63
    mov ← 0
ACC0:63 ← (RT)0:63
SPEFSCROVH ← 0
SPEFSCROV ← mov
SPEFSCRSOV ← SPEFSCRSOV | mov

The low word signed fractional elements in RA and RB are multiplied. The 64-bit product is placed in RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered:
ACC OV OVH SOV
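Why -1.0 × -1.0 needs special handling in evmwssf can be seen from a small Python model. This sketch assumes that the ×sf operator yields the signed integer product shifted left by one bit (so that two 32-bit fractions give a 64-bit fraction), which is how signed fractional multiplication is commonly defined; the helpers s32 and frac64_to_float are hypothetical and exist only for illustration.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def s32(x):
    """Interpret the low 32 bits of x as a signed word (hypothetical helper)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def evmwssf(ra, rb):
    """Return (rt, mov) for the low-word saturating fractional multiply."""
    a, b = ra & MASK32, rb & MASK32
    if a == 0x8000_0000 and b == 0x8000_0000:
        # -1.0 * -1.0 = +1.0 is not representable, so saturate.
        return 0x7FFF_FFFF_FFFF_FFFF, 1
    return ((s32(a) * s32(b)) << 1) & MASK64, 0

def frac64_to_float(x):
    """Read a 64-bit signed fraction as a Python float (approximate for display)."""
    signed = x - (1 << 64) if x >> 63 else x
    return signed / (1 << 63)
```

Without the special case, the shifted product of 0x8000_0000 by itself would be 0x8000_0000_0000_0000, which reads back as -1.0 rather than +1.0.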
Vector Multiply Word Signed, Saturate, Fractional and Accumulate EVX-form

evmwssfaa RT,RA,RB

4 RT RA RB 1363
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    temp0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    mov ← 0
temp0:64 ← EXTS((ACC)0:63) + EXTS(temp0:63)
ov ← (temp0 ⊕ temp1)
RT0:63 ← temp1:64
ACC0:63 ← (RT)0:63
SPEFSCROVH ← 0
SPEFSCROV ← ov | mov
SPEFSCRSOV ← SPEFSCRSOV | ov | mov

The low word signed fractional elements in RA and RB are multiplied producing a 64-bit product. If both inputs are -1.0, the product saturates to the largest positive signed fraction. The 64-bit product is then added to the accumulator and the result is placed in RT and the accumulator.

Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative EVX-form

evmwssfan RT,RA,RB

4 RT RA RB 1491
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    temp0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    mov ← 0
temp0:64 ← EXTS((ACC)0:63) - EXTS(temp0:63)
ov ← (temp0 ⊕ temp1)
RT0:63 ← temp1:64
ACC0:63 ← (RT)0:63
SPEFSCROVH ← 0
SPEFSCROV ← ov | mov
SPEFSCRSOV ← SPEFSCRSOV | ov | mov

The low word signed fractional elements in RA and RB are multiplied producing a 64-bit product. If both inputs are -1.0, the product saturates to the largest positive signed fraction. The 64-bit product is then subtracted from the accumulator and the result is placed in RT and the accumulator.
Special Registers Altered (evmwssfaa and evmwssfan):
ACC OV OVH SOV

Vector Multiply Word Unsigned, Modulo, Integer EVX-form

evmwumi RT,RA,RB

4 RT RA RB 1112
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×ui (RB)32:63

The low word unsigned-integer elements in RA and RB are multiplied to form a 64-bit product that is placed in RT.

Special Registers Altered:
None

Vector Multiply Word Unsigned, Modulo, Integer to Accumulator EVX-form

evmwumia RT,RA,RB

4 RT RA RB 1144
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×ui (RB)32:63
ACC0:63 ← (RT)0:63

The low word unsigned-integer elements in RA and RB are multiplied to form a 64-bit product that is placed in RT and into the accumulator.

Special Registers Altered:
ACC

Vector Multiply Word Unsigned, Modulo, Integer and Accumulate EVX-form

evmwumiaa RT,RA,RB

4 RT RA RB 1368
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The low word unsigned-integer elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator, and the resulting value is placed into the accumulator and in RT.

Special Registers Altered:
ACC

Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative EVX-form

evmwumian RT,RA,RB

4 RT RA RB 1496
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The low word unsigned-integer elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the 64-bit accumulator, and the resulting value is placed into the accumulator and in RT.

Special Registers Altered:
ACC

Vector NAND EVX-form

evnand RT,RA,RB

4 RT RA RB 542
0 6 11 16 21 31

RT0:31 ← ¬((RA)0:31 & (RB)0:31)
RT32:63 ← ¬((RA)32:63 & (RB)32:63)

Each element of RA and RB is bitwise NANDed. The result is placed in the corresponding element of RT.

Special Registers Altered:
None

Vector Negate EVX-form

evneg RT,RA

4 RT RA /// 521
0 6 11 16 21 31

RT0:31 ← NEG((RA)0:31)
RT32:63 ← NEG((RA)32:63)

The negative of each element of RA is placed in RT. The negative of 0x8000_0000 (most negative number) returns 0x8000_0000.

Special Registers Altered:
None

Vector NOR EVX-form

evnor RT,RA,RB

4 RT RA RB 536
0 6 11 16 21 31

RT0:31 ← ¬((RA)0:31 | (RB)0:31)
RT32:63 ← ¬((RA)32:63 | (RB)32:63)

Each element of RA and RB is bitwise NORed. The result is placed in the corresponding element of RT.

Special Registers Altered:
None

Extended Mnemonics:
Extended mnemonics are provided for the Vector NOR instruction to produce a vector bitwise complement operation.

Extended:        Equivalent to:
evnot RT,RA      evnor RT,RA,RA

Vector OR EVX-form

evor RT,RA,RB

4 RT RA RB 535
0 6 11 16 21 31

RT0:31 ← (RA)0:31 | (RB)0:31
RT32:63 ← (RA)32:63 | (RB)32:63

Each element of RA and RB is bitwise ORed. The result is placed in the corresponding element of RT.

Special Registers Altered:
None

Extended Mnemonics:
Extended mnemonics are provided for the Vector OR instruction to provide a 64-bit vector move instruction.

Extended:        Equivalent to:
evmr RT,RA       evor RT,RA,RA

Vector OR with Complement EVX-form

evorc RT,RA,RB

4 RT RA RB 539
0 6 11 16 21 31

RT0:31 ← (RA)0:31 | (¬(RB)0:31)
RT32:63 ← (RA)32:63 | (¬(RB)32:63)

Each element of RA is bitwise ORed with the complement of RB. The result is placed in the corresponding element of RT.

Special Registers Altered:
None

Vector Rotate Left Word EVX-form

evrlw RT,RA,RB

4 RT RA RB 552
0 6 11 16 21 31

nh ← (RB)27:31
nl ← (RB)59:63
RT0:31 ← ROTL((RA)0:31, nh)
RT32:63 ← ROTL((RA)32:63, nl)

Each of the high and low elements of RA is rotated left by an amount specified in RB. The result is placed in RT. Rotate values for each element of RA are found in bit positions RB27:31 and RB59:63.
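The per-element rotate of evrlw and the evnot extended mnemonic can be illustrated with a short Python model. It is a sketch only: registers are modelled as 64-bit Python integers with bits 0:31 as the upper word, so (RB)27:31 and (RB)59:63 are the low five bits of each word of RB; rotl32 is a hypothetical helper.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n (only the low 5 bits of n are used)."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def evrlw(ra, rb):
    """Rotate each word of ra left by the amount in the matching word of rb."""
    nh = (rb >> 32) & 0x1F   # (RB)27:31
    nl = rb & 0x1F           # (RB)59:63
    hi = rotl32((ra >> 32) & MASK32, nh)
    lo = rotl32(ra & MASK32, nl)
    return (hi << 32) | lo

def evnor(ra, rb):
    """Bitwise NOR of both 32-bit elements at once."""
    return ~(ra | rb) & MASK64

def evnot(ra):
    """Extended mnemonic: evnot RT,RA is evnor RT,RA,RA."""
    return evnor(ra, ra)
```

Because the two rotate amounts come from different words of RB, the two elements of RA can be rotated by different distances in one instruction.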
Special Registers Altered (evrlw):
None

Vector Rotate Left Word Immediate EVX-form

evrlwi RT,RA,UI

4 RT RA UI 554
0 6 11 16 21 31

n ← UI
RT0:31 ← ROTL((RA)0:31, n)
RT32:63 ← ROTL((RA)32:63, n)

Both the high and low elements of RA are rotated left by an amount specified by UI.

Special Registers Altered:
None

Vector Round Word EVX-form

evrndw RT,RA

4 RT RA /// 524
0 6 11 16 21 31

RT0:31 ← ((RA)0:31 + 0x00008000) & 0xFFFF0000
RT32:63 ← ((RA)32:63 + 0x00008000) & 0xFFFF0000

The 32-bit elements of RA are rounded into 16 bits. The result is placed in RT. The resulting 16 bits are placed in the most significant 16 bits of each element of RT, zeroing out the low-order 16 bits of each element.

Special Registers Altered:
None

Vector Select EVS-form

evsel RT,RA,RB,BFA

4 RT RA RB 79 BFA
0 6 11 16 21 29 31

ch ← CR[BFA×4]
cl ← CR[BFA×4+1]
if (ch = 1) then RT0:31 ← (RA)0:31
else RT0:31 ← (RB)0:31
if (cl = 1) then RT32:63 ← (RA)32:63
else RT32:63 ← (RB)32:63

If the most significant bit in the BFA field of CR is set to 1, the high-order element of RA is placed in the high-order element of RT; otherwise, the high-order element of RB is placed into the high-order element of RT. If the next most significant bit in the BFA field of CR is set to 1, the low-order element of RA is placed in the low-order element of RT; otherwise, the low-order element of RB is placed into the low-order element of RT.

Special Registers Altered:
None

Vector Shift Left Word EVX-form

evslw RT,RA,RB

4 RT RA RB 548
0 6 11 16 21 31

nh ← (RB)26:31
nl ← (RB)58:63
RT0:31 ← SL((RA)0:31, nh)
RT32:63 ← SL((RA)32:63, nl)

Each of the high and low elements of RA is shifted left by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63.

Shift amounts from 32 to 63 give a zero result.

Special Registers Altered:
None

Vector Shift Left Word Immediate EVX-form

evslwi RT,RA,UI

4 RT RA UI 550
0 6 11 16 21 31

n ← UI
RT0:31 ← SL((RA)0:31, n)
RT32:63 ← SL((RA)32:63, n)

Both high and low elements of RA are shifted left by the 5-bit UI value and the results are placed in RT.

Special Registers Altered:
None

Vector Splat Immediate EVX-form

evsplati RT,SI

4 RT SI /// 553
0 6 11 16 21 31

RT0:31 ← EXTS(SI)
RT32:63 ← EXTS(SI)

The value specified by SI is sign extended and placed in both elements of RT.

Special Registers Altered:
None

Vector Splat Fractional Immediate EVX-form

evsplatfi RT,SI

4 RT SI /// 555
0 6 11 16 21 31

RT0:31 ← SI || ²⁷0
RT32:63 ← SI || ²⁷0

The value specified by SI is padded with trailing zeros and placed in both elements of RT. The SI ends up in bit positions RT0:4 and RT32:36.

Special Registers Altered:
None

Vector Shift Right Word Immediate Signed EVX-form

evsrwis RT,RA,UI

4 RT RA UI 547
0 6 11 16 21 31

n ← UI
RT0:31 ← EXTS((RA)0:31-n)
RT32:63 ← EXTS((RA)32:63-n)

Both high and low elements of RA are shifted right by the 5-bit UI value. Bits in the most significant positions vacated by the shift are filled with a copy of the sign bit.

Special Registers Altered:
None

Vector Shift Right Word Immediate Unsigned EVX-form

evsrwiu RT,RA,UI

4 RT RA UI 546
0 6 11 16 21 31

n ← UI
RT0:31 ← EXTZ((RA)0:31-n)
RT32:63 ← EXTZ((RA)32:63-n)

Both high and low elements of RA are shifted right by the 5-bit UI value; zeros are shifted into the most significant position.

Special Registers Altered:
None

Vector Shift Right Word Signed EVX-form

evsrws RT,RA,RB

4 RT RA RB 545
0 6 11 16 21 31

nh ← (RB)26:31
nl ← (RB)58:63
RT0:31 ← EXTS((RA)0:31-nh)
RT32:63 ← EXTS((RA)32:63-nl)

Both the high and low elements of RA are shifted right by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63. The sign bits are shifted into the most significant position.

Shift amounts from 32 to 63 give a result of 32 sign bits.

Special Registers Altered:
None

Vector Shift Right Word Unsigned EVX-form

evsrwu RT,RA,RB

4 RT RA RB 544
0 6 11 16 21 31

nh ← (RB)26:31
nl ← (RB)58:63
RT0:31 ← EXTZ((RA)0:31-nh)
RT32:63 ← EXTZ((RA)32:63-nl)

Both the high and low elements of RA are shifted right by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63. Zeros are shifted into the most significant position.

Shift amounts from 32 to 63 give a zero result.

Special Registers Altered:
None

Vector Store Double of Double EVX-form

evstdd RS,D(RA)

4 RS RA UI 801
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×8)
MEM(EA,8) ← (RS)0:63

D in the instruction mnemonic is UI × 8. The contents of RS are stored as a doubleword in storage addressed by EA.

Special Registers Altered:
None

Vector Store Double of Double Indexed EVX-form

evstddx RS,RA,RB

4 RS RA RB 800
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,8) ← (RS)0:63

The contents of RS are stored as a doubleword in storage addressed by EA.

Special Registers Altered:
None

Vector Store Double of Four Halfwords EVX-form

evstdh RS,D(RA)

4 RS RA UI 805
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×8)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)16:31
MEM(EA+4,2) ← (RS)32:47
MEM(EA+6,2) ← (RS)48:63

D in the instruction mnemonic is UI × 8. The contents of RS are stored as four halfwords in storage addressed by EA.

Vector Store Double of Four Halfwords Indexed EVX-form

evstdhx RS,RA,RB

4 RS RA RB 804
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)16:31
MEM(EA+4,2) ← (RS)32:47
MEM(EA+6,2) ← (RS)48:63

The contents of RS are stored as four halfwords in storage addressed by EA.
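The effective-address calculation and the halfword-by-halfword store of evstdh can be sketched in Python. The model rests on several assumptions that are not stated in the text above: storage is a bytearray, each halfword is written in big-endian byte order, the register file is a dict keyed by register number, and address wrap-around is ignored. The function names are illustrative.

```python
def effective_address(gpr, ra_field, ui, scale):
    """EA = b + UI*scale, where b is 0 when the RA field is 0."""
    base = 0 if ra_field == 0 else gpr[ra_field]
    return base + ui * scale

def evstdh(mem, gpr, rs_field, ra_field, ui):
    """Store the four halfwords of RS at EA, EA+2, EA+4, EA+6; return EA."""
    ea = effective_address(gpr, ra_field, ui, 8)
    rs = gpr[rs_field]
    for i in range(4):
        # Halfword i occupies bits 16*i : 16*i+15 (bit 0 is most significant).
        half = (rs >> (48 - 16 * i)) & 0xFFFF
        mem[ea + 2 * i: ea + 2 * i + 2] = half.to_bytes(2, "big")
    return ea
```

With register 4 holding 8 and UI = 1, the displacement D is 8 and the store lands at address 16. The same effective_address helper with scale 4 would model the word-sized store forms, where D is UI × 4.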
Special Registers Altered (evstdh and evstdhx):
None

Vector Store Double of Two Words EVX-form

evstdw RS,D(RA)

4 RS RA UI 803
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×8)
MEM(EA,4) ← (RS)0:31
MEM(EA+4,4) ← (RS)32:63

D in the instruction mnemonic is UI × 8. The contents of RS are stored as two words in storage addressed by EA.

Special Registers Altered:
None

Vector Store Double of Two Words Indexed EVX-form

evstdwx RS,RA,RB

4 RS RA RB 802
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,4) ← (RS)0:31
MEM(EA+4,4) ← (RS)32:63

The contents of RS are stored as two words in storage addressed by EA.

Special Registers Altered:
None

Vector Store Word of Two Halfwords from Even EVX-form

evstwhe RS,D(RA)

4 RS RA UI 817
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)32:47

D in the instruction mnemonic is UI × 4. The even halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered:
None

Vector Store Word of Two Halfwords from Even Indexed EVX-form

evstwhex RS,RA,RB

4 RS RA RB 816
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)32:47

The even halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered:
None

Vector Store Word of Two Halfwords from Odd EVX-form

evstwho RS,D(RA)

4 RS RA UI 821
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,2) ← (RS)16:31
MEM(EA+2,2) ← (RS)48:63

D in the instruction mnemonic is UI × 4. The odd halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered:
None

Vector Store Word of Two Halfwords from Odd Indexed EVX-form

evstwhox RS,RA,RB

4 RS RA RB 820
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,2) ← (RS)16:31
MEM(EA+2,2) ← (RS)48:63

The odd halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered:
None

Vector Store Word of Word from Even EVX-form

evstwwe RS,D(RA)

4 RS RA UI 825
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,4) ← (RS)0:31

D in the instruction mnemonic is UI × 4. The even word of RS is stored in storage addressed by EA.

Special Registers Altered:
None

Vector Store Word of Word from Even Indexed EVX-form

evstwwex RS,RA,RB

4 RS RA RB 824
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,4) ← (RS)0:31

The even word of RS is stored in storage addressed by EA.

Special Registers Altered:
None

Vector Store Word of Word from Odd EVX-form

evstwwo RS,D(RA)

4 RS RA UI 829
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,4) ← (RS)32:63

D in the instruction mnemonic is UI × 4. The odd word of RS is stored in storage addressed by EA.

Vector Store Word of Word from Odd Indexed EVX-form

evstwwox RS,RA,RB

4 RS RA RB 828
0 6 11 16 21 31

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,4) ← (RS)32:63

The odd word of RS is stored in storage addressed by EA.
Special Registers Altered: Special Registers Altered: None None Vector Subtract Signed, Modulo, Integer Vector Subtract Signed, Saturate, Integer to Accumulator Word EVX-form to Accumulator Word EVX-form evsubfsmiaaw RT,RA evsubfssiaaw RT,RA 4 RT RA /// 1227 4 RT RA /// 1219 0 6 11 16 21 31 0 6 11 16 21 31 RT0:31 1 (ACC)0:31 - (RA)0:31 temp0:63 1 EXTS((ACC)0:31) - EXTS((RA)0:31) RT32:63 1 (ACC)32:63 - (RA)32:63 ovh 1 temp31 temp32 ACC0:63 1 (RT)0:63 RT0:31 1 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) Each word element in RA is subtracted from the corre- temp0:63 1 EXTS((ACC)32:63) - EXTS((RA)32:63) sponding element in the accumulator and the difference ovl 1 temp31 temp32 is placed into the corresponding RT word and into the RT32:63 1 SATURATE(ovl, temp31, 0x8000_0000, accumulator. 0x7FFF_FFFF, temp32:63) ACC0:63 1 (RT)0:63 Special Registers Altered: SPEFSCROVH 1 ovh ACC SPEFSCROV 1 ovl SPEFSCRSOVH 1 SPEFSCRSOVH | ovh SPEFSCRSOV 1 SPEFSCRSOV | ovl Each signed-integer word element in RA is sign-extended and subtracted from the corresponding sign-extended element in the accumulator saturating if overflow occurs, and the results are placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH 252 Power ISATM -- Book I Version 2.04 Vector Subtract Unsigned, Modulo, Vector Subtract from Word EVX-form Integer to Accumulator Word EVX-form evsubfw RT,RA,RB evsubfumiaaw RT,RA 4 RT RA RB 516 4 RT RA /// 1226 0 6 11 16 21 31 0 6 11 16 21 31 RT0:31 1 (RB)0:31 - (RA)0:31 RT0:31 1 (ACC)0:31 - (RA)0:31 RT32:63 1 (RB)32:63 - (RA)32:63 RT32:63 1 (ACC)32:63 - (RA)32:63 Each signed-integer element of RA is subtracted from ACC0:63 1 (RT)0:63 the corresponding element of RB and the results are Each unsigned-integer word element in RA is sub- placed in RT. tracted from the corresponding element in the accumu- Special Registers Altered: lator and the results are placed in RT and into the None accumulator. 
Special Registers Altered:
    ACC

Vector Subtract Unsigned, Saturate, Integer to Accumulator Word EVX-form

evsubfusiaaw RT,RA

    4     RT    RA    ///   1218
    0     6     11    16    21   31

    temp_0:63 ← EXTZ((ACC)_0:31) - EXTZ((RA)_0:31)
    ovh ← temp_31
    RT_0:31 ← SATURATE(ovh, temp_31, 0x0000_0000, 0x0000_0000, temp_32:63)
    temp_0:63 ← EXTZ((ACC)_32:63) - EXTZ((RA)_32:63)
    ovl ← temp_31
    RT_32:63 ← SATURATE(ovl, temp_31, 0x0000_0000, 0x0000_0000, temp_32:63)
    ACC_0:63 ← (RT)_0:63
    SPEFSCR_OVH ← ovh
    SPEFSCR_OV ← ovl
    SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
    SPEFSCR_SOV ← SPEFSCR_SOV | ovl

Each unsigned-integer word element in RA is zero-extended and subtracted from the corresponding zero-extended element in the accumulator, saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Subtract Immediate from Word EVX-form

evsubifw RT,UI,RB

    4     RT    UI    RB    518
    0     6     11    16    21   31

    RT_0:31 ← (RB)_0:31 - EXTZ(UI)
    RT_32:63 ← (RB)_32:63 - EXTZ(UI)

UI is zero-extended and subtracted from both the high and low elements of RB. Note that the same value is subtracted from both elements of the register.

Special Registers Altered:
    None

Vector XOR EVX-form

evxor RT,RA,RB

    4     RT    RA    RB    534
    0     6     11    16    21   31

    RT_0:31 ← (RA)_0:31 ⊕ (RB)_0:31
    RT_32:63 ← (RA)_32:63 ⊕ (RB)_32:63

Each element of RA and RB is exclusive-ORed. The results are placed in RT.

Special Registers Altered:
    None

Chapter 7. Embedded Floating-Point
[Category: SPE.Embedded Float Scalar Double]
[Category: SPE.Embedded Float Scalar Single]
[Category: SPE.Embedded Float Vector]

7.1 Overview
7.2 Programming Model
7.2.1 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
7.2.2 Floating-Point Data Formats
7.2.3 Exception Conditions
7.2.3.1 Denormalized Values on Input
7.2.3.2 Embedded Floating-Point Overflow and Underflow
7.2.3.3 Embedded Floating-Point Invalid Operation/Input Errors
7.2.3.4 Embedded Floating-Point Round (Inexact)
7.2.3.5 Embedded Floating-Point Divide by Zero
7.2.3.6 Default Results
7.2.4 IEEE 754 Compliance
7.2.4.1 Sticky Bit Handling For Exception Conditions
7.3 Embedded Floating-Point Instructions
7.3.1 Load/Store Instructions
7.3.2 SPE.Embedded Float Vector Instructions [Category: SPE.Embedded Float Vector]
7.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single]
7.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double]
7.4 Embedded Floating-Point Results Summary

7.1 Overview

Single-precision floating-point is handled by the SPE.Embedded Float Vector and SPE.Embedded Float Scalar Single categories; double-precision floating-point is handled by the SPE.Embedded Float Scalar Double category.

The Embedded Floating-Point categories require the implementation of the Signal Processing Engine (SPE) category and consist of three distinct categories:
• Embedded vector single-precision floating-point (SPE.Embedded Float Vector [SP.FV])
• Embedded scalar single-precision floating-point (SPE.Embedded Float Scalar Single [SP.FS])
• Embedded scalar double-precision floating-point (SPE.Embedded Float Scalar Double [SP.FD])

Although each of these may be implemented independently, they are defined in a single chapter because it is likely that they may be implemented together.

References to Embedded Floating-Point categories, Embedded Floating-Point instructions, or Embedded Floating-Point operations apply to all 3 categories.

7.2 Programming Model

Embedded floating-point operations are performed in the GPRs of the processor.

The SPE.Embedded Float Vector and SPE.Embedded Float Scalar Double categories require a GPR register file with thirty-two 64-bit registers as required by the Signal Processing Engine category.

The SPE.Embedded Float Scalar Single category requires a GPR register file with thirty-two 32-bit registers. When implemented with a 64-bit register file on a 32-bit implementation, instructions in this category only use and modify bits 32:63 of the GPR. In this case, bits 0:31 of the GPR are left unchanged by the operation. For 64-bit implementations, bits 0:31 are unchanged after the operation.

Instructions in the SPE.Embedded Float Scalar Double category operate on the entire 64 bits of the GPRs.

Instructions in the SPE.Embedded Float Vector category operate on the entire 64 bits of the GPRs as well, but contain two 32-bit data items that are operated on independently of each other in a SIMD fashion. The format of both data items is the same as the format of a data item in the SPE.Embedded Float Scalar Single category. The data item contained in bits 0:31 is called the `high word'. The data item contained in bits 32:63 is called the `low word'.

There are no record forms of Embedded Floating-Point instructions. Embedded Floating-Point Compare instructions treat NaNs, Infinity, and Denorm as normalized numbers for the comparison calculation when default results are provided.

7.2.1 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)

Status and control for the Embedded Floating-Point categories use the SPEFSCR. This register is defined by the Signal Processing Engine category in Section 6.3.4. Status and control bits are shared for Embedded Floating-Point and SPE operations. Instructions in the SPE.Embedded Float Vector category affect both the high element (bits 34:39) and low element floating-point status flags (bits 50:55). Instructions in the SPE.Embedded Float Scalar Double and SPE.Embedded Float Scalar Single categories affect only the low element floating-point status flags and leave the high element floating-point status flags undefined.

7.2.2 Floating-Point Data Formats

Single-precision floating-point data elements are 32 bits wide with 1 sign bit (s), 8 bits of biased exponent (e) and 23 bits of fraction (f). Double-precision floating-point data elements are 64 bits wide with 1 sign bit (s), 11 bits of biased exponent (e) and 52 bits of fraction (f).

In the IEEE 754 specification, floating-point values are represented in a format consisting of three explicit fields (sign field, biased exponent field, and fraction field) and an implicit hidden bit.

    Single-precision (bits 0:31, or 32:63):
        bit 0       s
        bits 1:8    exp
        bits 9:31   fraction   (preceded by the implicit hidden bit)

    Double-precision (bits 0:63):
        bit 0       s
        bits 1:11   exp
        bits 12:63  fraction   (preceded by the implicit hidden bit)

    s - sign bit; 0 = positive; 1 = negative
    exp - biased exponent field
    fraction - fractional portion of number

Figure 69. Floating-Point Data Format

For single-precision normalized numbers, the biased exponent value e lies in the range of 1 to 254 corresponding to an actual exponent value E in the range -126 to +127. For double-precision normalized numbers, the biased exponent value e lies in the range of 1 to 2046 corresponding to an actual exponent value E in the range -1022 to +1023. With the hidden bit implied to be `1' (for normalized numbers), the value of the number is interpreted as follows:

    (-1)^s × 2^E × (1.fraction)

where E is the unbiased exponent and 1.fraction is the mantissa (or significand) consisting of a leading `1' (the hidden bit) and a fractional part (fraction field). For the single-precision format, the maximum positive normalized number (pmax) is represented by the encoding 0x7F7FFFFF which is approximately 3.4E+38 (2^128), and the minimum positive normalized value (pmin) is represented by the encoding 0x00800000 which is approximately 1.2E-38 (2^-126). For the double-precision format, the maximum positive normalized number (pmax) is represented by the encoding 0x7FEFFFFF_FFFFFFFF which is approximately 1.8E+308 (2^1024), and the minimum positive normalized value (pmin) is represented by the encoding 0x00100000_00000000 which is approximately 2.2E-308 (2^-1022).

Two specific values of the biased exponent are reserved (0 and 255 for single-precision; 0 and 2047 for double-precision) for encoding special values of +0, -0, +infinity, -infinity, and NaNs.

Zeros of both positive and negative sign are represented by a biased exponent value e of 0 and a fraction f which is 0.

Infinities of both positive and negative sign are represented by a maximum exponent field value (255 for single-precision, 2047 for double-precision) and a fraction which is 0.

Denormalized numbers of both positive and negative sign are represented by a biased exponent value e of 0 and a fraction f, which is nonzero.
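The single-precision field layout and the special encodings described in this section can be illustrated with a short sketch. This is only an illustration in Python, not part of the architecture; the function names `decode_single`, `classify_single`, and `normalized_value` are hypothetical, and the bias of 127 is inferred from the stated mapping of e = 1..254 to E = -126..+127.

```python
def decode_single(bits):
    """Split a 32-bit single-precision encoding into (s, e, f)."""
    s = (bits >> 31) & 0x1        # bit 0: sign
    e = (bits >> 23) & 0xFF       # bits 1:8: biased exponent
    f = bits & 0x7FFFFF           # bits 9:31: fraction
    return s, e, f


def classify_single(bits):
    """Name the kind of value an encoding represents."""
    _, e, f = decode_single(bits)
    if e == 0:
        return "zero" if f == 0 else "denormalized"
    if e == 255:
        return "infinity" if f == 0 else "NaN"
    return "normalized"


def normalized_value(bits):
    """Evaluate (-1)^s x 2^E x (1.fraction) for a normalized encoding."""
    s, e, f = decode_single(bits)
    if not 1 <= e <= 254:
        raise ValueError("encoding is not a normalized number")
    E = e - 127                   # assumed bias, see the note above
    return (-1.0) ** s * 2.0 ** E * (1 + f / 2 ** 23)


if __name__ == "__main__":
    for pattern in (0x7F7FFFFF, 0x00800000, 0x7F800000, 0x7FC00000, 0x00000001):
        print(hex(pattern), classify_single(pattern))
```

The last pattern in the loop, 0x00000001, has e = 0 and a nonzero fraction, so it is classified as a denormalized number.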
For these numbers, the hidden bit is defined by the IEEE 754 standard to be 0. This number type is not directly supported in hardware. Instead, either a software interrupt handler is invoked, or a default value is defined.

Not-a-Numbers (NaNs) are represented by a maximum exponent field value (255 for single-precision, 2047 for double-precision) and a fraction f which is nonzero.

7.2.3 Exception Conditions

7.2.3.1 Denormalized Values on Input

Any denormalized value used as an operand may be truncated by the implementation to a properly signed zero value.

7.2.3.2 Embedded Floating-Point Overflow and Underflow

Defining pmax to be the most positive normalized value (farthest from zero), pmin the smallest positive normalized value (closest to zero), nmax the most negative normalized value (farthest from zero) and nmin the smallest normalized negative value (closest to zero), an overflow is said to have occurred if the numerically correct result (r) of an instruction is such that r>pmax or r<nmax. [...]

Programming Note
On some implementations, operations that result in overflow or underflow are likely to take significantly longer than operations that do not. For example, these operations may cause a system error handler to be invoked; on such implementations, the system error handler updates the overflow bits appropriately.

7.2.3.3 Embedded Floating-Point Invalid Operation/Input Errors

Embedded Floating-Point Invalid Operation/Input errors occur when an operand to an operation contains an invalid input value. If any of the input values are Infinity, Denorm, or NaN, or for an Embedded Floating-Point Divide instruction both operands are +/-0, SPEFSCR_FINV,FINVH are set to 1 appropriately, and SPEFSCR_FGH,FXH,FG,FX are set to 0 appropriately. If SPEFSCR_FINVE=1, an Embedded Floating-Point Data interrupt is taken and the destination register is not updated.

7.2.3.4 Embedded Floating-Point Round (Inexact)

If any result element of an Embedded Floating-Point instruction is inexact, or overflows but Embedded Floating-Point [...]

[...]

    if (ah > bh) then ch ← 1
    else ch ← 0
    if (al > bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA_0:31 is greater than RB_0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA_32:63 is greater than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of `e' and `f' directly.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX
    CR field BF

[...]

    if (ah < bh) then ch ← 1
    else ch ← 0
    if (al < bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA_0:31 is less than RB_0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA_32:63 is less than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of `e' and `f' directly.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX
    CR field BF

Vector Floating-Point Single-Precision Compare Equal EVX-form

evfscmpeq BF,RA,RB

    4     BF    //    RA    RB    654
    0     6     9     11    16    21   31

    ah ← (RA)_0:31
    al ← (RA)_32:63
    bh ← (RB)_0:31
    bl ← (RB)_32:63
    if (ah = bh) then ch ← 1
    else ch ← 0
    if (al = bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA_0:31 is equal to RB_0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA_32:63 is equal to RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of `e' and `f' directly.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX
    CR field BF

Vector Floating-Point Single-Precision Test Greater Than EVX-form

evfststgt BF,RA,RB

    4     BF    //    RA    RB    668
    0     6     9     11    16    21   31

    ah ← (RA)_0:31
    al ← (RA)_32:63
    bh ← (RB)_0:31
    bl ← (RB)_32:63
    if (ah > bh) then ch ← 1
    else ch ← 0
    if (al > bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA_0:31 is greater than RB_0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA_32:63 is greater than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of `e' and `f' directly.

No exceptions are taken during the execution of evfststgt.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of evfststgt is likely to be faster than the execution of evfscmpgt; however, if strict IEEE 754 compliance is required, the program should use evfscmpgt.

Vector Floating-Point Single-Precision Test Less Than EVX-form

evfststlt BF,RA,RB

    4     BF    //    RA    RB    669
    0     6     9     11    16    21   31

    ah ← (RA)_0:31
    al ← (RA)_32:63
    bh ← (RB)_0:31
    bl ← (RB)_32:63
    if (ah < bh) then ch ← 1
    else ch ← 0
    if (al < bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared with the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA_0:31 is less than RB_0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA_32:63 is less than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of `e' and `f' directly.

No exceptions are taken during the execution of evfststlt.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of evfststlt is likely to be faster than the execution of evfscmplt; however, if strict IEEE 754 compliance is required, the program should use evfscmplt.

Vector Floating-Point Single-Precision Test Equal EVX-form

evfststeq BF,RA,RB

    4     BF    //    RA    RB    670
    0     6     9     11    16    21   31

    ah ← (RA)_0:31
    al ← (RA)_32:63
    bh ← (RB)_0:31
    bl ← (RB)_32:63
    if (ah = bh) then ch ← 1
    else ch ← 0
    if (al = bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA_0:31 is equal to RB_0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA_32:63 is equal to RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of `e' and `f' directly.

No exceptions are taken during the execution of evfststeq.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of evfststeq is likely to be faster than the execution of evfscmpeq; however, if strict IEEE 754 compliance is required, the program should use evfscmpeq.
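The way the vector compare and test instructions assemble the 4-bit CR field (ch || cl || (ch | cl) || (ch & cl)) can be sketched as follows. This is a minimal Python illustration, not a definition of the instructions: the helper names `pack_pair` and `vector_compare_cr` are hypothetical, status flags are not modeled, and the sketch assumes both operands hold only zeros or normalized numbers, so that ordinary floating-point comparison applies.

```python
import operator
import struct


def pack_pair(high, low):
    """Build a 64-bit register image from two single-precision values."""
    h, l = struct.unpack(">II", struct.pack(">ff", high, low))
    return (h << 32) | l


def _f32(bits):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def vector_compare_cr(ra, rb, op):
    """Return the 4-bit CR field value; CR bit 0 is the most significant bit."""
    ah, al = _f32((ra >> 32) & 0xFFFFFFFF), _f32(ra & 0xFFFFFFFF)
    bh, bl = _f32((rb >> 32) & 0xFFFFFFFF), _f32(rb & 0xFFFFFFFF)
    ch = 1 if op(ah, bh) else 0   # result for the high elements (bits 0:31)
    cl = 1 if op(al, bl) else 0   # result for the low elements (bits 32:63)
    return (ch << 3) | (cl << 2) | ((ch | cl) << 1) | (ch & cl)


if __name__ == "__main__":
    ra = pack_pair(1.5, -2.0)
    rb = pack_pair(1.0, -2.0)
    print(format(vector_compare_cr(ra, rb, operator.gt), "04b"))
    print(format(vector_compare_cr(ra, rb, operator.eq), "04b"))
```

Bits 2 and 3 of the field let a single conditional branch test whether either element, or both elements, satisfied the comparison.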
Vector Convert Floating-Point Single-Precision from Signed Integer EVX-form

evfscfsi RT,RB

    4     RT    ///   RB    657
    0     6     11    16    21   31

    RT_0:31 ← CnvtI32ToFP32((RB)_0:31, S, HI, I)
    RT_32:63 ← CnvtI32ToFP32((RB)_32:63, S, LO, I)

Each signed integer element of register RB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered:
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision from Unsigned Integer EVX-form

evfscfui RT,RB

    4     RT    ///   RB    656
    0     6     11    16    21   31

    RT_0:31 ← CnvtI32ToFP32((RB)_0:31, U, HI, I)
    RT_32:63 ← CnvtI32ToFP32((RB)_32:63, U, LO, I)

Each unsigned integer element of register RB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered:
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision from Signed Fraction EVX-form

evfscfsf RT,RB

    4     RT    ///   RB    659
    0     6     11    16    21   31

    RT_0:31 ← CnvtI32ToFP32((RB)_0:31, S, HI, F)
    RT_32:63 ← CnvtI32ToFP32((RB)_32:63, S, LO, F)

Each signed fractional element of register RB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered:
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision from Unsigned Fraction EVX-form

evfscfuf RT,RB

    4     RT    ///   RB    658
    0     6     11    16    21   31

    RT_0:31 ← CnvtI32ToFP32((RB)_0:31, U, HI, F)
    RT_32:63 ← CnvtI32ToFP32((RB)_32:63, U, LO, F)

Each unsigned fractional element of register RB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered:
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Signed Integer EVX-form

evfsctsi RT,RB

    4     RT    ///   RB    661
    0     6     11    16    21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, S, HI, RND, I)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, S, LO, RND, I)

Each single-precision floating-point element in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero EVX-form

evfsctsiz RT,RB

    4     RT    ///   RB    666
    0     6     11    16    21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, S, HI, ZER, I)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, S, LO, ZER, I)

Each single-precision floating-point element in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Unsigned Integer EVX-form

evfsctui RT,RB

    4     RT    ///   RB    660
    0     6     11    16    21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, U, HI, RND, I)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, U, LO, RND, I)

Each single-precision floating-point element in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX-form

evfsctuiz RT,RB

    4     RT    ///   RB    664
    0     6     11    16    21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, U, HI, ZER, I)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, U, LO, ZER, I)

Each single-precision floating-point element in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS
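The saturating behavior of the floating-point to integer conversions above can be sketched in a few lines. This is a rough Python illustration under stated assumptions, not the definition of CnvtFP32ToI32Sat: the function name `fp_to_i32_sat` is hypothetical, the "current rounding mode" is assumed to be round to nearest even, infinities are assumed to saturate like any other out-of-range value, and the SPEFSCR status bits are not modeled.

```python
import math


def fp_to_i32_sat(x, signed=True, toward_zero=False):
    """Convert x to a 32-bit integer, saturating on overflow; NaN gives 0."""
    if math.isnan(x):
        return 0                               # NaNs convert as though zero
    lo, hi = (-2 ** 31, 2 ** 31 - 1) if signed else (0, 2 ** 32 - 1)
    if math.isinf(x):
        return hi if x > 0 else lo             # assumption: saturate
    # round() on a float rounds halfway cases to even; trunc() rounds toward zero
    n = math.trunc(x) if toward_zero else round(x)
    return max(lo, min(hi, n))


if __name__ == "__main__":
    print(fp_to_i32_sat(2.5), fp_to_i32_sat(3.5))
    print(fp_to_i32_sat(-1.7, toward_zero=True))
    print(fp_to_i32_sat(3.0e9), fp_to_i32_sat(-5.0, signed=False))
```

The `toward_zero` flag corresponds to the ZER variants (evfsctsiz, evfsctuiz); the `signed` flag selects between the S and U forms.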
Vector Convert Floating-Point Single-Precision to Signed Fraction EVX-form

evfsctsf RT,RB

    4     RT    ///   RB    663
    0     6     11    16    21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, S, HI, RND, F)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, S, LO, RND, F)

Each single-precision floating-point element in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit signed fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Unsigned Fraction EVX-form

evfsctuf RT,RB

    4     RT    ///   RB    662
    0     6     11    16    21   31

    RT_0:31 ← CnvtFP32ToI32Sat((RB)_0:31, U, HI, RND, F)
    RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, U, LO, RND, F)

Each single-precision floating-point element in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVH FINVS
    FGH FXH FG FX FINXS

7.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single]

Floating-Point Single-Precision Absolute Value EVX-form

efsabs RT,RA

    4     RT    RA    ///   708
    0     6     11    16    21   31

    RT_32:63 ← 0b0 || (RA)_33:63

The sign bit of the low element of register RA is set to 0 and the result is placed into the low element of register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Single-Precision Negative Absolute Value EVX-form

efsnabs RT,RA

    4     RT    RA    ///   709
    0     6     11    16    21   31

    RT_32:63 ← 0b1 || (RA)_33:63

The sign bit of the low element of register RA is set to 1 and the result is placed into the low element of register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Single-Precision Negate EVX-form

efsneg RT,RA

    4     RT    RA    ///   710
    0     6     11    16    21   31

    RT_32:63 ← ¬(RA)_32 || (RA)_33:63

The sign bit of the low element of register RA is complemented and the result is placed into the low element of register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Single-Precision Add EVX-form

efsadd RT,RA,RB

    4     RT    RA    RB    704
    0     6     11    16    21   31

    RT_32:63 ← (RA)_32:63 +_sp (RB)_32:63

The low element of register RA is added to the low element of register RB and the result is stored in the low element of register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Single-Precision Subtract EVX-form

efssub RT,RA,RB

    4     RT    RA    RB    705
    0     6     11    16    21   31

    RT_32:63 ← (RA)_32:63 -_sp (RB)_32:63

The low element of register RB is subtracted from the low element of register RA and the result is stored in the low element of register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Single-Precision Multiply EVX-form

efsmul RT,RA,RB

    4     RT    RA    RB    712
    0     6     11    16    21   31

    RT_32:63 ← (RA)_32:63 ×_sp (RB)_32:63

The low element of register RA is multiplied by the low element of register RB and the result is stored in the low element of register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Single-Precision Divide EVX-form

efsdiv RT,RA,RB

    4     RT    RA    RB    713
    0     6     11    16    21   31

    RT_32:63 ← (RA)_32:63 ÷_sp (RB)_32:63

The low element of register RA is divided by the low element of register RB and the result is stored in the low element of register RT.

Special Registers Altered:
    FINV FINVS
    FG FX FINXS
    FDBZ FDBZS
    FOVF FOVFS
    FUNF FUNFS

Floating-Point Single-Precision Compare Greater Than EVX-form

efscmpgt BF,RA,RB

    4     BF    //    RA    RB    716
    0     6     9     11    16    21   31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al > bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. The results of the comparisons are placed into CR field BF. If RA_32:63 is greater than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an Input Error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of `e' and `f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Single-Precision Compare Less Than EVX-form

efscmplt BF,RA,RB

    4     BF    //    RA    RB    717
    0     6     9     11    16    21   31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al < bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA_32:63 is less than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an Input Error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of `e' and `f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF
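The sign-manipulation instructions of this category (efsabs, efsnabs, efsneg) change only bit 32 of the source, which is why they never raise an exception regardless of the operand. The sketch below models them in Python on 64-bit register images. It is an illustration only: the Python function names mirror the mnemonics for readability, the `rt_old` parameter is a hypothetical way of showing that bits 0:31 of RT are left unchanged, and bit 32 in the architecture's numbering corresponds to mask 0x8000_0000 of the low word.

```python
HIGH_MASK = 0xFFFF_FFFF_0000_0000   # bits 0:31 of a 64-bit GPR
LOW_MASK = 0x0000_0000_FFFF_FFFF    # bits 32:63
SIGN = 0x8000_0000                  # bit 32: sign of the low element


def _merge_low(rt_old, low):
    """Keep bits 0:31 of the old RT and replace bits 32:63."""
    return (rt_old & HIGH_MASK) | (low & LOW_MASK)


def efsabs(rt_old, ra):
    return _merge_low(rt_old, (ra & LOW_MASK) & ~SIGN)   # sign bit forced to 0


def efsnabs(rt_old, ra):
    return _merge_low(rt_old, (ra & LOW_MASK) | SIGN)    # sign bit forced to 1


def efsneg(rt_old, ra):
    return _merge_low(rt_old, (ra & LOW_MASK) ^ SIGN)    # sign bit complemented


if __name__ == "__main__":
    ra = 0xDEADBEEF_BF800000        # low element holds -1.0
    rt = 0x11111111_00000000
    print(hex(efsabs(rt, ra)), hex(efsnabs(rt, ra)), hex(efsneg(rt, ra)))
```

Because the operations are pure bit manipulations, a NaN or infinity in the low element passes through with only its sign bit altered.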
Floating-Point Single-Precision Compare Equal EVX-form

efscmpeq BF,RA,RB

    4     BF    //    RA    RB    718
    0     6     9     11    16    21   31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al = bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA_32:63 is equal to RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an Input Error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of `e' and `f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Single-Precision Test Greater Than EVX-form

efststgt BF,RA,RB

    4     BF    //    RA    RB    732
    0     6     9     11    16    21   31

    al ← (RA)_32:63
    bl ← (RB)_32:63
    if (al > bl) then cl ← 1
    else cl ← 0
    CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA_32:63 is greater than RB_32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of `e' and `f' directly.

No exceptions are generated during the execution of efststgt.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of efststgt is likely to be faster than the execution of efscmpgt; however, if strict IEEE 754 compliance is required, the program should use efscmpgt.
Floating-Point Single-Precision Test Less Than EVX-form

efststlt BF,RA,RB

    4     BF    //    RA    RB    733
    0     6     9     11    16    21    31

    al ← (RA)32:63
    bl ← (RB)32:63
    if (al < bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA32:63 is less than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of `e' and `f' directly.

No exceptions are generated during the execution of efststlt.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of efststlt is likely to be faster than the execution of efscmplt; however, if strict IEEE 754 compliance is required, the program should use efscmplt.

Floating-Point Single-Precision Test Equal EVX-form

efststeq BF,RA,RB

    4     BF    //    RA    RB    734
    0     6     9     11    16    21    31

    al ← (RA)32:63
    bl ← (RB)32:63
    if (al = bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

The low element of register RA is compared against the low element of register RB. If RA32:63 is equal to RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of `e' and `f' directly.

No exceptions are generated during the execution of efststeq.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of efststeq is likely to be faster than the execution of efscmpeq; however, if strict IEEE 754 compliance is required, the program should use efscmpeq.
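The test instructions (efststgt, efststlt, efststeq) compare their operands after treating NaNs, Infinities, and Denorms as normalized numbers. The sketch below illustrates one reading of that rule and is not a definitive model. It assumes that "using their values of `e' and `f' directly" means every nonzero pattern is valued as (-1)^s × 1.f × 2^(e-127), including e = 0 and e = 255, and that both zeros compare equal. The function names are hypothetical, and flag updates and CR placement are not modelled.

```python
from fractions import Fraction


def value_as_normalized(bits: int) -> Fraction:
    """Value of a 32-bit pattern when e and f are used directly.

    Assumption: every nonzero pattern, including NaN, Infinity and
    Denorm encodings, is read as (-1)^s * 1.f * 2^(e - 127).
    """
    bits &= 0xFFFFFFFF
    sign = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0 and f == 0:
        return Fraction(0)            # +0 and -0 are the same value
    magnitude = (1 + Fraction(f, 1 << 23)) * Fraction(2) ** (e - 127)
    return -magnitude if sign else magnitude


def tst_gt(ra_low: int, rb_low: int) -> int:
    """cl for a greater-than test on the low elements."""
    return int(value_as_normalized(ra_low) > value_as_normalized(rb_low))


def tst_lt(ra_low: int, rb_low: int) -> int:
    """cl for a less-than test on the low elements."""
    return int(value_as_normalized(ra_low) < value_as_normalized(rb_low))


def tst_eq(ra_low: int, rb_low: int) -> int:
    """cl for an equality test on the low elements."""
    return int(value_as_normalized(ra_low) == value_as_normalized(rb_low))
```

Under this reading a NaN pattern such as 0x7FC00000 tests greater than pmax (0x7F7FFFFF), which is why the Programming Notes direct programs that need strict IEEE 754 behavior to the compare forms.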
Convert Floating-Point Single-Precision from Signed Integer EVX-form

efscfsi RT,RB

    4     RT    ///   RB    721
    0     6     11    16    21    31

    RT32:63 ← CnvtI32ToFP32((RB)32:63, S, LO, I)

The signed integer low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT.

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Single-Precision from Unsigned Integer EVX-form

efscfui RT,RB

    4     RT    ///   RB    720
    0     6     11    16    21    31

    RT32:63 ← CnvtI32ToFP32((RB)32:63, U, LO, I)

The unsigned integer low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT.

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Single-Precision from Signed Fraction EVX-form

efscfsf RT,RB

    4     RT    ///   RB    723
    0     6     11    16    21    31

    RT32:63 ← CnvtI32ToFP32((RB)32:63, S, LO, F)

The signed fractional low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT.

Convert Floating-Point Single-Precision from Unsigned Fraction EVX-form

efscfuf RT,RB

    4     RT    ///   RB    722
    0     6     11    16    21    31

    RT32:63 ← CnvtI32ToFP32((RB)32:63, U, LO, F)

The unsigned fractional low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT.
Special Registers Altered (efscfsf and efscfuf):
    FINXS FG FX

Convert Floating-Point Single-Precision to Signed Integer EVX-form

efsctsi RT,RB

    4     RT    ///   RB    725
    0     6     11    16    21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, S, LO, RND, I)

The single-precision floating-point low element in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Unsigned Integer EVX-form

efsctui RT,RB

    4     RT    ///   RB    724
    0     6     11    16    21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, U, LO, RND, I)

The single-precision floating-point low element in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero EVX-form

efsctsiz RT,RB

    4     RT    ///   RB    730
    0     6     11    16    21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, S, LO, ZER, I)

The single-precision floating-point low element in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX-form

efsctuiz RT,RB

    4     RT    ///   RB    728
    0     6     11    16    21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, U, LO, ZER, I)

The single-precision floating-point low element in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.
Special Registers Altered (efsctsiz and efsctuiz):
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Signed Fraction EVX-form

efsctsf RT,RB

    4     RT    ///   RB    727
    0     6     11    16    21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, S, LO, RND, F)

The single-precision floating-point low element in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Unsigned Fraction EVX-form

efsctuf RT,RB

    4     RT    ///   RB    726
    0     6     11    16    21    31

    RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, U, LO, RND, F)

The single-precision floating-point low element in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit unsigned fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

7.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double]

Floating-Point Double-Precision Absolute Value EVX-form

efdabs RT,RA

    4     RT    RA    ///   740
    0     6     11    16    21    31

    RT0:63 ← 0b0 || (RA)1:63

The sign bit of register RA is set to 0 and the result is placed in register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Floating-Point Double-Precision Negative Absolute Value EVX-form

efdnabs RT,RA

    4     RT    RA    ///   741
    0     6     11    16    21    31

    RT0:63 ← 0b1 || (RA)1:63

The sign bit of register RA is set to 1 and the result is placed in register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.
Special Registers Altered (efdabs and efdnabs):
    None

Floating-Point Double-Precision Negate EVX-form

efdneg RT,RA

    4     RT    RA    ///   742
    0     6     11    16    21    31

    RT0:63 ← ¬(RA)0 || (RA)1:63

The sign bit of register RA is complemented and the result is placed in register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Double-Precision Add EVX-form

efdadd RT,RA,RB

    4     RT    RA    RB    736
    0     6     11    16    21    31

    RT0:63 ← (RA)0:63 +dp (RB)0:63

RA is added to RB and the result is stored in register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Double-Precision Subtract EVX-form

efdsub RT,RA,RB

    4     RT    RA    RB    737
    0     6     11    16    21    31

    RT0:63 ← (RA)0:63 -dp (RB)0:63

RB is subtracted from RA and the result is stored in register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Double-Precision Multiply EVX-form

efdmul RT,RA,RB

    4     RT    RA    RB    744
    0     6     11    16    21    31

    RT0:63 ← (RA)0:63 ×dp (RB)0:63

RA is multiplied by RB and the result is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Double-Precision Divide EVX-form

efddiv RT,RA,RB

    4     RT    RA    RB    745
    0     6     11    16    21    31

    RT0:63 ← (RA)0:63 ÷dp (RB)0:63

RA is divided by RB and the result is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FG FX FINXS
    FDBZ FDBZS
    FOVF FOVFS
    FUNF FUNFS
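The sign-manipulation instructions earlier in this section (efdabs, efdnabs, efdneg) touch only bit 0 of the 64-bit register and never examine the remaining bits, which is why they take no exceptions for any operand. A minimal sketch of the three RTL statements follows, assuming the register is modelled as an unsigned 64-bit Python integer with architectural bit 0 as the most significant bit. The function names mirror the mnemonics but are illustrative only.

```python
import struct

MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63            # architectural bit 0 of a 64-bit register


def efdabs(ra: int) -> int:
    """RT <- 0b0 || (RA)1:63: clear the sign bit."""
    return ra & ~SIGN64 & MASK64


def efdnabs(ra: int) -> int:
    """RT <- 0b1 || (RA)1:63: set the sign bit."""
    return (ra | SIGN64) & MASK64


def efdneg(ra: int) -> int:
    """RT <- not(RA)0 || (RA)1:63: complement the sign bit."""
    return (ra ^ SIGN64) & MASK64


def double_bits(x: float) -> int:
    """Helper for the examples: IEEE 754 binary64 image of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]
```

Because only the sign bit changes, a NaN encoding keeps its exponent and fraction bits: efdneg(0x7FF8_0000_0000_0001) yields 0xFFF8_0000_0000_0001.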
Floating-Point Double-Precision Compare Greater Than EVX-form

efdcmpgt BF,RA,RB

    4     BF    //    RA    RB    748
    0     6     9     11    16    21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al > bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is greater than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of `e' and `f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Double-Precision Compare Less Than EVX-form

efdcmplt BF,RA,RB

    4     BF    //    RA    RB    749
    0     6     9     11    16    21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al < bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is less than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of `e' and `f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Double-Precision Compare Equal EVX-form

efdcmpeq BF,RA,RB

    4     BF    //    RA    RB    750
    0     6     9     11    16    21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al = bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is equal to RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of `e' and `f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Double-Precision Test Greater Than EVX-form

efdtstgt BF,RA,RB

    4     BF    //    RA    RB    764
    0     6     9     11    16    21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al > bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is greater than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of `e' and `f' directly.

No exceptions are generated during the execution of efdtstgt.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of efdtstgt is likely to be faster than the execution of efdcmpgt; however, if strict IEEE 754 compliance is required, the program should use efdcmpgt.

Floating-Point Double-Precision Test Less Than EVX-form

efdtstlt BF,RA,RB

    4     BF    //    RA    RB    765
    0     6     9     11    16    21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al < bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is less than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of `e' and `f' directly.

No exceptions are generated during the execution of efdtstlt.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of efdtstlt is likely to be faster than the execution of efdcmplt; however, if strict IEEE 754 compliance is required, the program should use efdcmplt.

Floating-Point Double-Precision Test Equal EVX-form

efdtsteq BF,RA,RB

    4     BF    //    RA    RB    766
    0     6     9     11    16    21    31

    al ← (RA)0:63
    bl ← (RB)0:63
    if (al = bl) then cl ← 1
    else cl ← 0
    CR4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is equal to RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of `e' and `f' directly.

No exceptions are generated during the execution of efdtsteq.

Special Registers Altered:
    CR field BF

Programming Note
In an implementation, the execution of efdtsteq is likely to be faster than the execution of efdcmpeq; however, if strict IEEE 754 compliance is required, the program should use efdcmpeq.

Convert Floating-Point Double-Precision from Signed Integer EVX-form

efdcfsi RT,RB

    4     RT    ///   RB    753
    0     6     11    16    21    31

    RT0:63 ← CnvtI32ToFP64((RB)32:63, S, I)

The signed integer low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision from Unsigned Integer EVX-form

efdcfui RT,RB

    4     RT    ///   RB    752
    0     6     11    16    21    31

    RT0:63 ← CnvtI32ToFP64((RB)32:63, U, I)

The unsigned integer low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None
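efdcfsi and efdcfui list no altered special registers. One way to see why is that every 32-bit integer fits in the 53-bit significand of a double-precision value, so the conversion is always exact and nothing is rounded. The sketch below illustrates this with Python floats, which are assumed here to be IEEE 754 binary64 (true on common platforms, but an assumption of this example). The function names mirror the mnemonics for readability and are not an official model.

```python
def efdcfsi(rb: int) -> float:
    """Convert the signed 32-bit low element of a 64-bit register image."""
    low = rb & 0xFFFFFFFF                 # (RB)32:63
    signed = low - (1 << 32) if low & 0x80000000 else low
    return float(signed)                  # exact: |signed| < 2**53


def efdcfui(rb: int) -> float:
    """Convert the unsigned 32-bit low element of a 64-bit register image."""
    return float(rb & 0xFFFFFFFF)         # exact: value < 2**53
```

The same low word gives different results under the two interpretations: 0xFFFF_FFFF converts to -1.0 with efdcfsi and to 4294967295.0 with efdcfui. The high word of RB is ignored in both cases.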
Convert Floating-Point Double-Precision from Signed Integer Doubleword EVX-form

efdcfsid RT,RB

    4     RT    ///   RB    739
    0     6     11    16    21    31

    RT0:63 ← CnvtI64ToFP64((RB)0:63, S)

The signed integer doubleword in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Double-Precision from Unsigned Integer Doubleword EVX-form

efdcfuid RT,RB

    4     RT    ///   RB    738
    0     6     11    16    21    31

    RT0:63 ← CnvtI64ToFP64((RB)0:63, U)

The unsigned integer doubleword in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Double-Precision from Signed Fraction EVX-form

efdcfsf RT,RB

    4     RT    ///   RB    755
    0     6     11    16    21    31

    RT0:63 ← CnvtI32ToFP64((RB)32:63, S, F)

The signed fractional low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision from Unsigned Fraction EVX-form

efdcfuf RT,RB

    4     RT    ///   RB    754
    0     6     11    16    21    31

    RT0:63 ← CnvtI32ToFP64((RB)32:63, U, F)

The unsigned fractional low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision to Signed Integer EVX-form

efdctsi RT,RB

    4     RT    ///   RB    757
    0     6     11    16    21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, S, RND, I)

The double-precision floating-point value in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Unsigned Integer EVX-form

efdctui RT,RB

    4     RT    ///   RB    756
    0     6     11    16    21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, U, RND, I)

The double-precision floating-point value in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero EVX-form

efdctsidz RT,RB

    4     RT    ///   RB    747
    0     6     11    16    21    31

    RT0:63 ← CnvtFP64ToI64Sat((RB)0:63, S, ZER)

The double-precision floating-point value in register RB is converted to a signed integer doubleword using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 64-bit integer. NaNs are converted as though they were zero.

Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero EVX-form

efdctuidz RT,RB

    4     RT    ///   RB    746
    0     6     11    16    21    31

    RT0:63 ← CnvtFP64ToI64Sat((RB)0:63, U, ZER)

The double-precision floating-point value in register RB is converted to an unsigned integer doubleword using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 64-bit integer. NaNs are converted as though they were zero.
Corequisite Categories (efdctsidz and efdctuidz):
    64-Bit

Special Registers Altered (efdctsidz and efdctuidz):
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero EVX-form

efdctsiz RT,RB

    4     RT    ///   RB    762
    0     6     11    16    21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, S, ZER, I)

The double-precision floating-point value in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero EVX-form

efdctuiz RT,RB

    4     RT    ///   RB    760
    0     6     11    16    21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, U, ZER, I)

The double-precision floating-point value in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Signed Fraction EVX-form

efdctsf RT,RB

    4     RT    ///   RB    759
    0     6     11    16    21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, S, RND, F)

The double-precision floating-point value in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Unsigned Fraction EVX-form

efdctuf RT,RB

    4     RT    ///   RB    758
    0     6     11    16    21    31

    RT32:63 ← CnvtFP64ToI32Sat((RB)0:63, U, RND, F)

The double-precision floating-point value in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit unsigned fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Floating-Point Double-Precision Convert from Single-Precision EVX-form

efdcfs RT,RB

    4     RT    ///   RB    751
    0     6     11    16    21    31

    FP32format f;
    FP64format result;
    f ← (RB)32:63
    if (f_exp = 0) & (f_frac = 0) then
        result ← f_sign || ⁶³0
    else if Isa32NaNorInfinity(f) | Isa32Denorm(f) then
        SPEFSCR_FINV ← 1
        result ← f_sign || 0b11111111110 || ⁵²1
    else if Isa32Denorm(f) then
        SPEFSCR_FINV ← 1
        result ← f_sign || ⁶³0
    else
        result_sign ← f_sign
        result_exp ← f_exp - 127 + 1023
        result_frac ← f_frac || ²⁹0
    RT0:63 ← result

The single-precision floating-point value in the low element of register RB is converted to a double-precision floating-point value and the result is placed in register RT.

Corequisite Categories:
    SPE.Embedded Float Scalar Single or
    SPE.Embedded Float Vector

Special Registers Altered:
    FINV FINVS
    FG FX

Floating-Point Single-Precision Convert from Double-Precision EVX-form

efscfd RT,RB

    4     RT    ///   RB    719
    0     6     11    16    21    31

    FP64format f;
    FP32format result;
    f ← (RB)0:63
    if (f_exp = 0) & (f_frac = 0) then
        result ← f_sign || ³¹0
    else if Isa64NaNorInfinity(f) then
        SPEFSCR_FINV ← 1
        result ← f_sign || 0b11111110 || ²³1
    else if Isa64Denorm(f) then
        SPEFSCR_FINV ← 1
        result ← f_sign || ³¹0
    else
        unbias ← f_exp - 1023
        if unbias > 127 then
            result ← f_sign || 0b11111110 || ²³1
            SPEFSCR_FOVF ← 1
        else if unbias < -126 then
            result ← f_sign || 0b00000001 || ²³0
            SPEFSCR_FUNF ← 1
        else
            result_sign ← f_sign
            result_exp ← unbias + 127
            result_frac ← f_frac[0:22]
            guard ← f_frac[23]
            sticky ← (f_frac[24:51] ≠ 0)
            result ← Round32(result, LO, guard, sticky)
            SPEFSCR_FG ← guard
            SPEFSCR_FX ← sticky
            if guard | sticky then
                SPEFSCR_FINXS ← 1
    RT32:63 ← result

The double-precision floating-point value in register RB is converted to a single-precision floating-point
value using the current rounding mode and the result is placed into the low element of register RT.

Corequisite Categories:
    SPE.Embedded Float Scalar Scalar

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

7.4 Embedded Floating-Point Results Summary

The following tables summarize the results of various types of Embedded Floating-Point operations on various combinations of input operands. Flag settings are performed on appropriate element flags. For all the tables the following annotation and general rules apply:

• * denotes that this status flag is set based on the results of the calculation.
• _Calc_ denotes that the result is updated with the results of the computation.
• max denotes the maximum normalized number with the sign set to the computation [sign(operand A) XOR sign(operand B)].
• amax denotes the maximum normalized number with the sign set to the sign of Operand A.
• bmax denotes the maximum normalized number with the sign set to the sign of Operand B.
• pmax denotes the maximum normalized positive number. The encoding for single-precision is: 0x7F7FFFFF. The encoding for double-precision is: 0x7FEFFFFF_FFFFFFFF.
• nmax denotes the maximum normalized negative number. The encoding for single-precision is: 0xFF7FFFFF. The encoding for double-precision is: 0xFFEFFFFF_FFFFFFFF.
• pmin denotes the minimum normalized positive number. The encoding for single-precision is: 0x00800000. The encoding for double-precision is: 0x00100000_00000000.
• nmin denotes the minimum normalized negative number. The encoding for single-precision is: 0x80800000. The encoding for double-precision is: 0x80100000_00000000.
• Calculations that overflow or underflow saturate. Overflow for operations that have a floating-point result force the result to max. Underflow for operations that have a floating-point result force the result to zero. Overflow for operations that have a signed integer result force the result to 0x7FFFFFFF (positive) or 0x80000000 (negative). Overflow for operations that have an unsigned integer result force the result to 0xFFFFFFFF (positive) or 0x00000000 (negative).
• 1 (superscript) denotes that the sign of the result is positive when the sign of Operand A and the sign of Operand B are different, for all rounding modes except round to -infinity, where the sign of the result is then negative.
• 2 (superscript) denotes that the sign of the result is positive when the sign of Operand A and the sign of Operand B are the same, for all rounding modes except round to -infinity, where the sign of the result is then negative.
• 3 (superscript) denotes that the sign for any multiply or divide is always the result of the operation [sign(Operand A) XOR sign(Operand B)].
• 4 (superscript) denotes that if an overflow is detected, the result may be saturated.

Table 3: Embedded Floating-Point Results Summary--Add, Sub, Mul, Div

Operation  Operand A  Operand B  Result        FINV  FOVF  FUNF  FDBZ  FINX

Add
Add        ∞          ∞          amax          1     0     0     0     0
Add        ∞          NaN        amax          1     0     0     0     0
Add        ∞          denorm     amax          1     0     0     0     0
Add        ∞          zero       amax          1     0     0     0     0
Add        ∞          Norm       amax          1     0     0     0     0
Add        NaN        ∞          amax          1     0     0     0     0
Add        NaN        NaN        amax          1     0     0     0     0
Add        NaN        denorm     amax          1     0     0     0     0
Add        NaN        zero       amax          1     0     0     0     0
Add        NaN        norm       amax          1     0     0     0     0
Add        denorm     ∞          bmax          1     0     0     0     0
Add        denorm     NaN        bmax          1     0     0     0     0
Add        denorm     denorm     zero¹         1     0     0     0     0
Add        denorm     zero       zero¹         1     0     0     0     0
Add        denorm     norm       operand_b⁴    1     0     0     0     0
Add        zero       ∞          bmax          1     0     0     0     0
Add        zero       NaN        bmax          1     0     0     0     0
Add        zero       denorm     zero¹         1     0     0     0     0
Add        zero       zero       zero¹         0     0     0     0     0
Add        zero       norm       operand_b⁴    0     0     0     0     0
Add        norm       ∞          bmax          1     0     0     0     0
Add        norm       NaN        bmax          1     0     0     0     0
Add        norm       denorm     operand_a⁴    1     0     0     0     0
Add        norm       zero       operand_a⁴    0     0     0     0     0
Add        norm       norm       _Calc_        0     *     *     0     *

Subtract
Sub        ∞          ∞          amax          1     0     0     0     0
Sub        ∞          NaN        amax          1     0     0     0     0
Sub        ∞          denorm     amax          1     0     0     0     0
Sub        ∞          zero       amax          1     0     0     0     0
Sub        ∞          Norm       amax          1     0     0     0     0
Sub        NaN        ∞          amax          1     0     0     0     0
Sub        NaN        NaN        amax          1     0     0     0     0
Sub        NaN        denorm     amax          1     0     0     0     0
Sub        NaN        zero       amax          1     0     0     0     0
Sub        NaN        norm       amax          1     0     0     0     0
Sub        denorm     ∞          -bmax         1     0     0     0     0
Sub        denorm     NaN        -bmax         1     0     0     0     0
Sub        denorm     denorm     zero²         1     0     0     0     0
Sub        denorm     zero       zero²         1     0     0     0     0
Sub        denorm     norm       -operand_b⁴   1     0     0     0     0
Sub        zero       ∞          -bmax         1     0     0     0     0
Sub        zero       NaN        -bmax         1     0     0     0     0
Sub        zero       denorm     zero²         1     0     0     0     0
Sub        zero       zero       zero²         0     0     0     0     0
Sub        zero       norm       -operand_b⁴   0     0     0     0     0
Sub        norm       ∞          -bmax         1     0     0     0     0
Sub        norm       NaN        -bmax         1     0     0     0     0
Sub        norm       denorm     operand_a⁴    1     0     0     0     0
Sub        norm       zero       operand_a⁴    0     0     0     0     0
Sub        norm       norm       _Calc_        0     *     *     0     *

Multiply³
Mul        ∞          ∞          max           1     0     0     0     0
Mul        ∞          NaN        max           1     0     0     0     0
Mul        ∞          denorm     zero          1     0     0     0     0
Mul        ∞          zero       zero          1     0     0     0     0
Mul        ∞          Norm       max           1     0     0     0     0
Mul        NaN        ∞          max           1     0     0     0     0
Mul        NaN        NaN        max           1     0     0     0     0
Mul        NaN        denorm     zero          1     0     0     0     0
Mul        NaN        zero       zero          1     0     0     0     0
Mul        NaN        norm       max           1     0     0     0     0
Table 3: Embedded Floating-Point Results Summary--Add, Sub, Mul, Div (Continued)

Operation  Operand A  Operand B  Result        FINV  FOVF  FUNF  FDBZ  FINX

Mul        denorm     ∞          zero          1     0     0     0     0
Mul        denorm     NaN        zero          1     0     0     0     0
Mul        denorm     denorm     zero          1     0     0     0     0
Mul        denorm     zero       zero          1     0     0     0     0
Mul        denorm     norm       zero          1     0     0     0     0
Mul        zero       ∞          zero          1     0     0     0     0
Mul        zero       NaN        zero          1     0     0     0     0
Mul        zero       denorm     zero          1     0     0     0     0
Mul        zero       zero       zero          0     0     0     0     0
Mul        zero       norm       zero          0     0     0     0     0
Mul        norm       ∞          max           1     0     0     0     0
Mul        norm       NaN        max           1     0     0     0     0
Mul        norm       denorm     zero          1     0     0     0     0
Mul        norm       zero       zero          0     0     0     0     0
Mul        norm       norm       _Calc_        0     *     *     0     *

Divide³
Div        ∞          ∞          zero          1     0     0     0     0
Div        ∞          NaN        zero          1     0     0     0     0
Div        ∞          denorm     max           1     0     0     0     0
Div        ∞          zero       max           1     0     0     0     0
Div        ∞          Norm       max           1     0     0     0     0
Div        NaN        ∞          zero          1     0     0     0     0
Div        NaN        NaN        zero          1     0     0     0     0
Div        NaN        denorm     max           1     0     0     0     0
Div        NaN        zero       max           1     0     0     0     0
Div        NaN        norm       max           1     0     0     0     0
Div        denorm     ∞          zero          1     0     0     0     0
Div        denorm     NaN        zero          1     0     0     0     0
Div        denorm     denorm     max           1     0     0     0     0
Div        denorm     zero       max           1     0     0     0     0
Div        denorm     norm       zero          1     0     0     0     0
Div        zero       ∞          zero          1     0     0     0     0
Div        zero       NaN        zero          1     0     0     0     0
Div        zero       denorm     max           1     0     0     0     0
Div        zero       zero       max           1     0     0     0     0
Div        zero       norm       zero          0     0     0     0     0
Div        norm       ∞          zero          1     0     0     0     0
Div        norm       NaN        zero          1     0     0     0     0
Div        norm       denorm     max           1     0     0     0     0
Div        norm       zero       max           0     0     0     1     0
Div        norm       norm       _Calc_        0     *     *     0     *

Table 4: Embedded Floating-Point Results Summary--Single Convert from Double

Operand B  efscfd result  FINV  FOVF  FUNF  FDBZ  FINX

+∞         pmax           1     0     0     0     0
-∞         nmax           1     0     0     0     0
+NaN       pmax           1     0     0     0     0
-NaN       nmax           1     0     0     0     0
+denorm    +zero          1     0     0     0     0
-denorm    -zero          1     0     0     0     0
+zero      +zero          0     0     0     0     0
-zero      -zero          0     0     0     0     0
norm       _Calc_         0     *     *     0     *

Table 5: Embedded Floating-Point Results Summary--Double Convert from Single

Operand B  efdcfs result  FINV  FOVF  FUNF  FDBZ  FINX

+∞         pmax           1     0     0     0     0
-∞         nmax           1     0     0     0     0
+NaN       pmax           1     0     0     0     0
-NaN       nmax           1     0     0     0     0
+denorm    +zero          1     0     0     0     0
-denorm    -zero          1     0     0     0     0
+zero      +zero          0     0     0     0     0
-zero      -zero          0     0     0     0     0
norm       _Calc_         0     0     0     0     0

Table 6: Embedded Floating-Point Results Summary--Convert to Unsigned

Operand B  Integer Result ctui[d][z]  Fractional Result ctuf  FINV  FOVF  FUNF  FDBZ  FINX

+∞         0xFFFF_FFFF                0x7FFF_FFFF             1     0     0     0     0
           0xFFFF_FFFF_FFFF_FFFF
-∞         0                          0                       1     0     0     0     0
+NaN       0                          0                       1     0     0     0     0
-NaN       0                          0                       1     0     0     0     0
denorm     0                          0                       1     0     0     0     0
zero       0                          0                       0     0     0     0     0
+norm      _Calc_                     _Calc_                  *     0     0     0     *
-norm      _Calc_                     _Calc_                  *     0     0     0     *

Table 7: Embedded Floating-Point Results Summary--Convert to Signed

Operand B  Integer Result ctsi[d][z]  Fractional Result ctsf  FINV  FOVF  FUNF  FDBZ  FINX

+∞         0x7FFF_FFFF                0x7FFF_FFFF             1     0     0     0     0
           0x7FFF_FFFF_FFFF_FFFF
-∞         0x8000_0000                0x8000_0000             1     0     0     0     0
           0x8000_0000_0000_0000
+NaN       0                          0                       1     0     0     0     0
-NaN       0                          0                       1     0     0     0     0
denorm     0                          0                       1     0     0     0     0
zero       0                          0                       0     0     0     0     0
+norm      _Calc_                     _Calc_                  *     0     0     0     *
-norm      _Calc_                     _Calc_                  *     0     0     0     *

Table 8: Embedded Floating-Point Results Summary--Convert from Unsigned

Operand B  Integer Source cfui  Fractional Source cfuf  FINV  FOVF  FUNF  FDBZ  FINX

zero       zero                 zero                    0     0     0     0     0
norm       _Calc_               _Calc_                  0     0     0     0     *

Table 9: Embedded Floating-Point Results Summary--Convert from Signed

Operand B  Integer Source cfsi  Fractional Source cfsf  FINV  FOVF  FUNF  FDBZ  FINX

zero       zero                 zero                    0     0     0     0     0
norm       _Calc_               _Calc_                  0     0     0     0     *

Table 10: Embedded Floating-Point Results Summary--*abs, *nabs, *neg

Operand A  *abs               *nabs              *neg               FINV  FOVF  FUNF  FDBZ  FINX

+∞         pmax | +∞          nmax | -∞          -amax | -∞         1     0     0     0     0
-∞         pmax | +∞          nmax | -∞          -amax | +∞         1     0     0     0     0
+NaN       pmax | NaN         nmax | -NaN        -amax | -NaN       1     0     0     0     0
-NaN       pmax | NaN         nmax | -NaN        -amax | +NaN       1     0     0     0     0
+denorm    +zero | +denorm    -zero | -denorm    -zero | -denorm    1     0     0     0     0
-denorm    +zero | +denorm    -zero | -denorm    +zero | +denorm    1     0     0     0     0
+zero      +zero              -zero              -zero              0     0     0     0     0
-zero      +zero              -zero              +zero              0     0     0     0     0
+norm      +norm              -norm              -norm              0     0     0     0     0
-norm      +norm              -norm              +norm              0     0     0     0     0

Chapter 8.
Legacy Move Assist Instruction [Category: Legacy Move Assist]

Determine Leftmost Zero Byte X-form

dlmzb  RA,RS,RB (Rc=0)
dlmzb. RA,RS,RB (Rc=1)

    31    RS    RA    RB    78    Rc
    0     6     11    16    21    31

    d0:63 ← (RS)32:63 || (RB)32:63
    i ← 0
    x ← 0
    y ← 0
    do while (x<8) & (y=0)
        x ← x + 1
        if d[i+32:i+39] = 0 then y ← 1
        else i ← i + 8
    RA ← x
    XER57:63 ← x
    if Rc = 1 then do
        CR35 ← SO
        if y = 1 then do
            if x<5 then CR32:34 ← 0b010
            else CR32:34 ← 0b100
        else CR32:34 ← 0b001

The contents of bits 32:63 of register RS and the contents of bits 32:63 of register RB are concatenated to form an 8-byte operand. The operand is searched for the leftmost byte in which each bit is 0 (i.e., a null byte).

Bytes in the operand are numbered from left to right starting with 1. If a null byte is found, its byte number is placed into bits 57:63 of the XER and into register RA. Otherwise, the value 0b000_1000 is placed into both bits 57:63 of the XER and register RA.

If Rc is equal to 1, SO is copied into bit 35 of the CR and bits 32:34 of the CR are updated as follows:

• If no null byte is found, bits 32:34 of the CR are set to 0b001.
• If the leftmost null byte is in the first 4 bytes (i.e., from register RS), bits 32:34 of the CR are set to 0b010.
• If the leftmost null byte is in the last 4 bytes (i.e., from register RB), bits 32:34 of the CR are set to 0b100.

Special Registers Altered:
    XER57:63
    CR0 (if Rc=1)

Chapter 9. Legacy Integer Multiply-Accumulate Instructions [Category: Legacy Integer Multiply-Accumulate]

The Legacy Integer Multiply-Accumulate instructions with Rc=1 set the first three bits of CR Field 0 based on the 32-bit result, as described in Section 3.3.7, "Other Fixed-Point Instructions".

Programming Note
Notice that CR Field 0 may not reflect the "true" (infinitely precise) result if overflow occurs.
The XO-form Legacy Integer Multiply-Accumulate instructions set SO and OV when OE=1 to reflect overflow of the 32-bit result.

Multiply Accumulate Cross Halfword to Word Modulo Signed XO-form

    macchw      RT,RA,RB    (OE=0 Rc=0)
    macchw.     RT,RA,RB    (OE=0 Rc=1)
    macchwo     RT,RA,RB    (OE=1 Rc=0)
    macchwo.    RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    172   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×si (RB)_32:47
    temp_0:32 ← prod_0:31 + (RT)_32:63
    RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate Cross Halfword to Word Saturate Signed XO-form

    macchws     RT,RA,RB    (OE=0 Rc=0)
    macchws.    RT,RA,RB    (OE=0 Rc=1)
    macchwso    RT,RA,RB    (OE=1 Rc=0)
    macchwso.   RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    236   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×si (RB)_32:47
    temp_0:32 ← prod_0:31 + (RT)_32:63
    if temp < -2³¹ then RT_32:63 ← 0x8000_0000
    else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
    else RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

If the sum is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the sum is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)
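The difference between the Modulo and Saturate forms described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not a reference implementation: registers are passed as 64-bit integers, only bits 32:63 of RT are returned (bits 0:31 are undefined), and the returned overflow flag is a hypothetical stand-in for the condition that would be recorded in SO and OV when OE=1. The names to_signed and macchw are chosen for this example only.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def macchw(rt: int, ra: int, rb: int, saturate: bool = False) -> tuple:
    """Model macchw (saturate=False) or macchws (saturate=True).

    Returns (new contents of RT bits 32:63, overflow flag).
    """
    a = to_signed(ra, 16)               # (RA) bits 48:63: low halfword
    b = to_signed(rb >> 16, 16)         # (RB) bits 32:47: high halfword of the low word
    temp = a * b + to_signed(rt, 32)    # exact sum, representable in 33 bits
    overflow = not (-2**31 <= temp <= 2**31 - 1)
    if saturate:
        temp = max(-2**31, min(2**31 - 1, temp))   # clamp to the signed 32-bit range
    return temp & MASK32, overflow                  # modulo form keeps the low 32 bits


# 0x7FFF_FFFF + (1 * 1) overflows the signed 32-bit range.
wrapped, ov_wrapped = macchw(0x7FFF_FFFF, 0x0001, 0x0001_0000)
clamped, ov_clamped = macchw(0x7FFF_FFFF, 0x0001, 0x0001_0000, saturate=True)
```

With these inputs the modulo form wraps to 0x8000_0000 while the saturate form holds at 0x7FFF_FFFF; both report overflow.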
Multiply Accumulate Cross Halfword to Word Modulo Unsigned XO-form

    macchwu     RT,RA,RB    (OE=0 Rc=0)
    macchwu.    RT,RA,RB    (OE=0 Rc=1)
    macchwuo    RT,RA,RB    (OE=1 Rc=0)
    macchwuo.   RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    140   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×ui (RB)_32:47
    temp_0:32 ← prod_0:31 + (RT)_32:63
    RT ← temp_1:32

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate Cross Halfword to Word Saturate Unsigned XO-form

    macchwsu    RT,RA,RB    (OE=0 Rc=0)
    macchwsu.   RT,RA,RB    (OE=0 Rc=1)
    macchwsuo   RT,RA,RB    (OE=1 Rc=0)
    macchwsuo.  RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    204   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×ui (RB)_32:47
    temp_0:32 ← prod_0:31 + (RT)_32:63
    if temp > 2³²-1 then RT ← 0xFFFF_FFFF
    else RT ← temp_1:32

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

If the sum is greater than 2³²-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate High Halfword to Word Modulo Signed XO-form

    machhw      RT,RA,RB    (OE=0 Rc=0)
    machhw.     RT,RA,RB    (OE=0 Rc=1)
    machhwo     RT,RA,RB    (OE=1 Rc=0)
    machhwo.    RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    44    Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_32:47 ×si (RB)_32:47
    temp_0:32 ← prod_0:31 + (RT)_32:63
    RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate High Halfword to Word Saturate Signed XO-form

    machhws     RT,RA,RB    (OE=0 Rc=0)
    machhws.    RT,RA,RB    (OE=0 Rc=1)
    machhwso    RT,RA,RB    (OE=1 Rc=0)
    machhwso.   RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    108   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_32:47 ×si (RB)_32:47
    temp_0:32 ← prod_0:31 + (RT)_32:63
    if temp < -2³¹ then RT_32:63 ← 0x8000_0000
    else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
    else RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

If the sum is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the sum is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate High Halfword to Word Modulo Unsigned XO-form

    machhwu     RT,RA,RB    (OE=0 Rc=0)
    machhwu.    RT,RA,RB    (OE=0 Rc=1)
    machhwuo    RT,RA,RB    (OE=1 Rc=0)
    machhwuo.   RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    12    Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_32:47 ×ui (RB)_32:47
    temp_0:32 ← prod_0:31 + (RT)_32:63
    RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate High Halfword to Word Saturate Unsigned XO-form

    machhwsu    RT,RA,RB    (OE=0 Rc=0)
    machhwsu.   RT,RA,RB    (OE=0 Rc=1)
    machhwsuo   RT,RA,RB    (OE=1 Rc=0)
    machhwsuo.  RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    76    Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_32:47 ×ui (RB)_32:47
    temp_0:32 ← prod_0:31 + (RT)_32:63
    if temp > 2³²-1 then RT ← 0xFFFF_FFFF
    else RT ← temp_1:32

The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

If the sum is greater than 2³²-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate Low Halfword to Word Modulo Signed XO-form

    maclhw      RT,RA,RB    (OE=0 Rc=0)
    maclhw.     RT,RA,RB    (OE=0 Rc=1)
    maclhwo     RT,RA,RB    (OE=1 Rc=0)
    maclhwo.    RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    428   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×si (RB)_48:63
    temp_0:32 ← prod_0:31 + (RT)_32:63
    RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate Low Halfword to Word Saturate Signed XO-form

    maclhws     RT,RA,RB    (OE=0 Rc=0)
    maclhws.    RT,RA,RB    (OE=0 Rc=1)
    maclhwso    RT,RA,RB    (OE=1 Rc=0)
    maclhwso.   RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    492   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×si (RB)_48:63
    temp_0:32 ← prod_0:31 + (RT)_32:63
    if temp < -2³¹ then RT_32:63 ← 0x8000_0000
    else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
    else RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

If the sum is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the sum is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate Low Halfword to Word Modulo Unsigned XO-form

    maclhwu     RT,RA,RB    (OE=0 Rc=0)
    maclhwu.    RT,RA,RB    (OE=0 Rc=1)
    maclhwuo    RT,RA,RB    (OE=1 Rc=0)
    maclhwuo.   RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    396   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×ui (RB)_48:63
    temp_0:32 ← prod_0:31 + (RT)_32:63
    RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Accumulate Low Halfword to Word Saturate Unsigned XO-form

    maclhwsu    RT,RA,RB    (OE=0 Rc=0)
    maclhwsu.   RT,RA,RB    (OE=0 Rc=1)
    maclhwsuo   RT,RA,RB    (OE=1 Rc=0)
    maclhwsuo.  RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    460   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×ui (RB)_48:63
    temp_0:32 ← prod_0:31 + (RT)_32:63
    if temp > 2³²-1 then RT ← 0xFFFF_FFFF
    else RT ← temp_1:32

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

If the sum is greater than 2³²-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Multiply Cross Halfword to Word Signed X-form

    mulchw      RT,RA,RB    (Rc=0)
    mulchw.     RT,RA,RB    (Rc=1)

    4     RT    RA    RB    168   Rc
    0     6     11    16    21    31

    RT_32:63 ← (RA)_48:63 ×si (RB)_32:47
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB and the signed-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0      (if Rc=1)

Multiply Cross Halfword to Word Unsigned X-form

    mulchwu     RT,RA,RB    (Rc=0)
    mulchwu.    RT,RA,RB    (Rc=1)

    4     RT    RA    RB    136   Rc
    0     6     11    16    21    31

    RT_32:63 ← (RA)_48:63 ×ui (RB)_32:47
    RT_0:31 ← undefined

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB and the unsigned-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0      (if Rc=1)

Multiply High Halfword to Word Signed X-form

    mulhhw      RT,RA,RB    (Rc=0)
    mulhhw.     RT,RA,RB    (Rc=1)

    4     RT    RA    RB    40    Rc
    0     6     11    16    21    31

    RT_32:63 ← (RA)_32:47 ×si (RB)_32:47
    RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB and the signed-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0      (if Rc=1)

Multiply High Halfword to Word Unsigned X-form

    mulhhwu     RT,RA,RB    (Rc=0)
    mulhhwu.    RT,RA,RB    (Rc=1)

    4     RT    RA    RB    8     Rc
    0     6     11    16    21    31

    RT_32:63 ← (RA)_32:47 ×ui (RB)_32:47
    RT_0:31 ← undefined

The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB and the unsigned-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0      (if Rc=1)

Multiply Low Halfword to Word Signed X-form

    mullhw      RT,RA,RB    (Rc=0)
    mullhw.     RT,RA,RB    (Rc=1)

    4     RT    RA    RB    424   Rc
    0     6     11    16    21    31

    RT_32:63 ← (RA)_48:63 ×si (RB)_48:63
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB and the signed-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0      (if Rc=1)

Multiply Low Halfword to Word Unsigned X-form

    mullhwu     RT,RA,RB    (Rc=0)
    mullhwu.    RT,RA,RB    (Rc=1)

    4     RT    RA    RB    392   Rc
    0     6     11    16    21    31

    RT_32:63 ← (RA)_48:63 ×ui (RB)_48:63
    RT_0:31 ← undefined

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB and the unsigned-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0      (if Rc=1)

Negative Multiply Accumulate Cross Halfword to Word Modulo Signed XO-form

    nmacchw     RT,RA,RB    (OE=0 Rc=0)
    nmacchw.    RT,RA,RB    (OE=0 Rc=1)
    nmacchwo    RT,RA,RB    (OE=1 Rc=0)
    nmacchwo.   RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    174   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×si (RB)_32:47
    temp_0:32 ← (RT)_32:63 -si prod_0:31
    RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the difference are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Negative Multiply Accumulate Cross Halfword to Word Saturate Signed XO-form

    nmacchws    RT,RA,RB    (OE=0 Rc=0)
    nmacchws.   RT,RA,RB    (OE=0 Rc=1)
    nmacchwso   RT,RA,RB    (OE=1 Rc=0)
    nmacchwso.  RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    238   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×si (RB)_32:47
    temp_0:32 ← (RT)_32:63 -si prod_0:31
    if temp < -2³¹ then RT_32:63 ← 0x8000_0000
    else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
    else RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

If the difference is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the difference is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the difference is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Negative Multiply Accumulate High Halfword to Word Modulo Signed XO-form

    nmachhw     RT,RA,RB    (OE=0 Rc=0)
    nmachhw.    RT,RA,RB    (OE=0 Rc=1)
    nmachhwo    RT,RA,RB    (OE=1 Rc=0)
    nmachhwo.   RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    46    Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_32:47 ×si (RB)_32:47
    temp_0:32 ← (RT)_32:63 -si prod_0:31
    RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the difference are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Negative Multiply Accumulate High Halfword to Word Saturate Signed XO-form

    nmachhws    RT,RA,RB    (OE=0 Rc=0)
    nmachhws.   RT,RA,RB    (OE=0 Rc=1)
    nmachhwso   RT,RA,RB    (OE=1 Rc=0)
    nmachhwso.  RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    110   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_32:47 ×si (RB)_32:47
    temp_0:32 ← (RT)_32:63 -si prod_0:31
    if temp < -2³¹ then RT_32:63 ← 0x8000_0000
    else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
    else RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

If the difference is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the difference is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the difference is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Negative Multiply Accumulate Low Halfword to Word Modulo Signed XO-form

    nmaclhw     RT,RA,RB    (OE=0 Rc=0)
    nmaclhw.    RT,RA,RB    (OE=0 Rc=1)
    nmaclhwo    RT,RA,RB    (OE=1 Rc=0)
    nmaclhwo.   RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    430   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×si (RB)_48:63
    temp_0:32 ← (RT)_32:63 -si prod_0:31
    RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the difference are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Negative Multiply Accumulate Low Halfword to Word Saturate Signed XO-form

    nmaclhws    RT,RA,RB    (OE=0 Rc=0)
    nmaclhws.   RT,RA,RB    (OE=0 Rc=1)
    nmaclhwso   RT,RA,RB    (OE=1 Rc=0)
    nmaclhwso.  RT,RA,RB    (OE=1 Rc=1)

    4     RT    RA    RB    OE    494   Rc
    0     6     11    16    21    22    31

    prod_0:31 ← (RA)_48:63 ×si (RB)_48:63
    temp_0:32 ← (RT)_32:63 -si prod_0:31
    if temp < -2³¹ then RT_32:63 ← 0x8000_0000
    else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
    else RT_32:63 ← temp_1:32
    RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

If the difference is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the difference is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the difference is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV    (if OE=1)
    CR0      (if Rc=1)

Appendix A. Suggested Floating-Point Models [Category: Floating-Point]

A.1 Floating-Point Round to Single-Precision Model

The following describes algorithmically the operation of the Floating Round to Single-Precision instruction.

    If (FRB)_1:11 < 897 and (FRB)_1:63 > 0 then
        Do
            If FPSCR_UE = 0 then goto Disabled Exponent Underflow
            If FPSCR_UE = 1 then goto Enabled Exponent Underflow
        End

    If (FRB)_1:11 > 1150 and (FRB)_1:11 < 2047 then
        Do
            If FPSCR_OE = 0 then goto Disabled Exponent Overflow
            If FPSCR_OE = 1 then goto Enabled Exponent Overflow
        End

    If (FRB)_1:11 > 896 and (FRB)_1:11 < 1151 then goto Normal Operand

    If (FRB)_1:63 = 0 then goto Zero Operand

    If (FRB)_1:11 = 2047 then
        Do
            If (FRB)_12:63 = 0 then goto Infinity Operand
            If (FRB)_12 = 1 then goto QNaN Operand
            If (FRB)_12 = 0 and (FRB)_13:63 > 0 then goto SNaN Operand
        End

Disabled Exponent Underflow:

    sign ← (FRB)_0
    If (FRB)_1:11 = 0 then
        Do
            exp ← -1022
            frac_0:52 ← 0b0 || (FRB)_12:63
        End
    If (FRB)_1:11 > 0 then
        Do
            exp ← (FRB)_1:11 - 1023
            frac_0:52 ← 0b1 || (FRB)_12:63
        End
    Denormalize operand:
        G || R || X ← 0b000
        Do while exp < -126
            exp ← exp + 1
            frac_0:52 || G || R || X ← 0b0 || frac_0:52 || G || (R | X)
        End
    FPSCR_UX ← (frac_24:52 || G || R || X) > 0
    Round Single(sign,exp,frac_0:52,G,R,X)
    FPSCR_XX ← FPSCR_XX | FPSCR_FI
    If frac_0:52 = 0 then
        Do
            FRT_0 ← sign
            FRT_1:63 ← 0
            If sign = 0 then FPSCR_FPRF ← "+ zero"
            If sign = 1 then FPSCR_FPRF ← "- zero"
        End
    If frac_0:52 > 0 then
        Do
            If frac_0 = 1 then
                Do
                    If sign = 0 then FPSCR_FPRF ← "+ normal number"
                    If sign = 1 then FPSCR_FPRF ← "- normal number"
                End
            If frac_0 = 0 then
                Do
                    If sign = 0 then FPSCR_FPRF ← "+ denormalized number"
                    If sign = 1 then FPSCR_FPRF ← "- denormalized number"
                End
            Normalize operand:
                Do while frac_0 = 0
                    exp ← exp - 1
                    frac_0:52 ← frac_1:52 || 0b0
                End
            FRT_0 ← sign
            FRT_1:11 ← exp + 1023
            FRT_12:63 ← frac_1:52
        End
    Done

Enabled Exponent Underflow:

    FPSCR_UX ← 1
    sign ← (FRB)_0
    If (FRB)_1:11 = 0 then
        Do
            exp ← -1022
            frac_0:52 ← 0b0 || (FRB)_12:63
        End
    If (FRB)_1:11 > 0 then
        Do
            exp ← (FRB)_1:11 - 1023
            frac_0:52 ← 0b1 || (FRB)_12:63
        End
    Normalize operand:
        Do while frac_0 = 0
            exp ← exp - 1
            frac_0:52 ← frac_1:52 || 0b0
        End
    Round Single(sign,exp,frac_0:52,0,0,0)
    FPSCR_XX ← FPSCR_XX | FPSCR_FI
    exp ← exp + 192
    FRT_0 ← sign
    FRT_1:11 ← exp + 1023
    FRT_12:63 ← frac_1:52
    If sign = 0 then FPSCR_FPRF ← "+ normal number"
    If sign = 1 then FPSCR_FPRF ← "- normal number"
    Done

Disabled Exponent Overflow:

    FPSCR_OX ← 1
    If FPSCR_RN = 0b00 then /* Round to Nearest */
        Do
            If (FRB)_0 = 0 then FRT ← 0x7FF0_0000_0000_0000
            If (FRB)_0 = 1 then FRT ← 0xFFF0_0000_0000_0000
            If (FRB)_0 = 0 then FPSCR_FPRF ← "+ infinity"
            If (FRB)_0 = 1 then FPSCR_FPRF ← "- infinity"
        End
    If FPSCR_RN = 0b01 then /* Round toward Zero */
        Do
            If (FRB)_0 = 0 then FRT ← 0x47EF_FFFF_E000_0000
            If (FRB)_0 = 1 then FRT ← 0xC7EF_FFFF_E000_0000
            If (FRB)_0 = 0 then FPSCR_FPRF ← "+ normal number"
            If (FRB)_0 = 1 then FPSCR_FPRF ← "- normal number"
        End
    If FPSCR_RN = 0b10 then /* Round toward +Infinity */
        Do
            If (FRB)_0 = 0 then FRT ← 0x7FF0_0000_0000_0000
            If (FRB)_0 = 1 then FRT ← 0xC7EF_FFFF_E000_0000
            If (FRB)_0 = 0 then FPSCR_FPRF ← "+ infinity"
            If (FRB)_0 = 1 then FPSCR_FPRF ← "- normal number"
        End
    If FPSCR_RN = 0b11 then /* Round toward -Infinity */
        Do
            If (FRB)_0 = 0 then FRT ← 0x47EF_FFFF_E000_0000
            If (FRB)_0 = 1 then FRT ← 0xFFF0_0000_0000_0000
            If (FRB)_0 = 0 then FPSCR_FPRF ← "+ normal number"
            If (FRB)_0 = 1 then FPSCR_FPRF ← "- infinity"
        End
    FPSCR_FR ← undefined
    FPSCR_FI ← 1
    FPSCR_XX ← 1
    Done

Enabled Exponent Overflow:

    sign ← (FRB)_0
    exp ← (FRB)_1:11 - 1023
    frac_0:52 ← 0b1 || (FRB)_12:63
    Round Single(sign,exp,frac_0:52,0,0,0)
    FPSCR_XX ← FPSCR_XX | FPSCR_FI

Enabled Overflow:

    FPSCR_OX ← 1
    exp ← exp - 192
    FRT_0 ← sign
    FRT_1:11 ← exp + 1023
    FRT_12:63 ← frac_1:52
    If sign = 0 then FPSCR_FPRF ← "+ normal number"
    If sign = 1 then FPSCR_FPRF ← "- normal number"
    Done

Zero Operand:

    FRT ← (FRB)
    If (FRB)_0 = 0 then FPSCR_FPRF ← "+ zero"
    If (FRB)_0 = 1 then FPSCR_FPRF ← "- zero"
    FPSCR_FR FI ← 0b00
    Done

Infinity Operand:

    FRT ← (FRB)
    If (FRB)_0 = 0 then FPSCR_FPRF ← "+ infinity"
    If (FRB)_0 = 1 then FPSCR_FPRF ← "- infinity"
    FPSCR_FR FI ← 0b00
    Done

QNaN Operand:

    FRT ← (FRB)_0:34 || ²⁹0
    FPSCR_FPRF ← "QNaN"
    FPSCR_FR FI ← 0b00
    Done

SNaN Operand:

    FPSCR_VXSNAN ← 1
    If FPSCR_VE = 0 then
        Do
            FRT_0:11 ← (FRB)_0:11
            FRT_12 ← 1
            FRT_13:63 ← (FRB)_13:34 || ²⁹0
            FPSCR_FPRF ← "QNaN"
        End
    FPSCR_FR FI ← 0b00
    Done

Normal Operand:

    sign ← (FRB)_0
    exp ← (FRB)_1:11 - 1023
    frac_0:52 ← 0b1 || (FRB)_12:63
    Round Single(sign,exp,frac_0:52,0,0,0)
    FPSCR_XX ← FPSCR_XX | FPSCR_FI
    If exp > 127 and FPSCR_OE = 0 then go to Disabled Exponent Overflow
    If exp > 127 and FPSCR_OE = 1 then go to Enabled Overflow
    FRT_0 ← sign
    FRT_1:11 ← exp + 1023
    FRT_12:63 ← frac_1:52
    If sign = 0 then FPSCR_FPRF ← "+ normal number"
    If sign = 1 then FPSCR_FPRF ← "- normal number"
    Done

Round Single(sign,exp,frac_0:52,G,R,X):

    inc ← 0
    lsb ← frac_23
    gbit ← frac_24
    rbit ← frac_25
    xbit ← (frac_26:52 || G || R || X) ≠ 0
    If FPSCR_RN = 0b00 then /* Round to Nearest */
        Do /* comparisons ignore u bits */
            If sign || lsb || gbit || rbit || xbit = 0bu11uu then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0bu011u then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0bu01u1 then inc ← 1
        End
    If FPSCR_RN = 0b10 then /* Round toward +Infinity */
        Do /* comparisons ignore u bits */
            If sign || lsb || gbit || rbit || xbit = 0b0u1uu then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0b0uu1u then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
        End
    If FPSCR_RN = 0b11 then /* Round toward -Infinity */
        Do /* comparisons ignore u bits */
            If sign || lsb || gbit || rbit || xbit = 0b1u1uu then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0b1uu1u then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
        End
    frac_0:23 ← frac_0:23 + inc
    If carry_out = 1 then
        Do
            frac_0:23 ← 0b1 || frac_0:22
            exp ← exp + 1
        End
    frac_24:52 ← ²⁹0
    FPSCR_FR ← inc
    FPSCR_FI ← gbit | rbit | xbit
    Return

A.2 Floating-Point Convert to Integer Model

The following describes algorithmically the operation of the Floating Convert To Integer instructions.

    If Floating Convert To Integer Word then
        Do
            round_mode ← FPSCR_RN
            tgt_precision ← "32-bit integer"
        End
    If Floating Convert To Integer Word with round toward Zero then
        Do
            round_mode ← 0b01
            tgt_precision ← "32-bit integer"
        End
    If Floating Convert To Integer Doubleword then
        Do
            round_mode ← FPSCR_RN
            tgt_precision ← "64-bit integer"
        End
    If Floating Convert To Integer Doubleword with round toward Zero then
        Do
            round_mode ← 0b01
            tgt_precision ← "64-bit integer"
        End

    sign ← (FRB)_0
    If (FRB)_1:11 = 2047 and (FRB)_12:63 = 0 then goto Infinity Operand
    If (FRB)_1:11 = 2047 and (FRB)_12 = 0 then goto SNaN Operand
    If (FRB)_1:11 = 2047 and (FRB)_12 = 1 then goto QNaN Operand
    If (FRB)_1:11 > 1086 then goto Large Operand

    If (FRB)_1:11 > 0 then exp ← (FRB)_1:11 - 1023 /* exp - bias */
    If (FRB)_1:11 = 0 then exp ← -1022
    If (FRB)_1:11 > 0 then frac_0:64 ← 0b01 || (FRB)_12:63 || ¹¹0 /* normal; need leading 0 for later complement */
    If (FRB)_1:11 = 0 then frac_0:64 ← 0b00 || (FRB)_12:63 || ¹¹0 /* denormal */

    gbit || rbit || xbit ← 0b000
    Do i=1,63-exp /* do the loop 0 times if exp = 63 */
        frac_0:64 || gbit || rbit || xbit ← 0b0 || frac_0:64 || gbit || (rbit | xbit)
    End

    Round Integer(sign,frac_0:64,gbit,rbit,xbit,round_mode)

    If sign = 1 then frac_0:64 ← ¬frac_0:64 + 1 /* needed leading 0 for -2⁶⁴ < (FRB) < -2⁶³ */

    If tgt_precision = "32-bit integer" and frac_0:64 > 2³¹-1 then goto Large Operand
    If tgt_precision = "64-bit integer" and frac_0:64 > 2⁶³-1 then goto Large Operand
    If tgt_precision = "32-bit integer" and frac_0:64 < -2³¹ then goto Large Operand
    If tgt_precision = "64-bit integer" and frac_0:64 < -2⁶³ then goto Large Operand

    FPSCR_XX ← FPSCR_XX | FPSCR_FI
    If tgt_precision = "32-bit integer" then FRT ← 0xuuuu_uuuu || frac_33:64 /* u is undefined hex digit */
    If tgt_precision = "64-bit integer" then FRT ← frac_1:64
    FPSCR_FPRF ← undefined
    Done

Round Integer(sign,frac_0:64,gbit,rbit,xbit,round_mode):

    inc ← 0
    If round_mode = 0b00 then /* Round to Nearest */
        Do /* comparisons ignore u bits */
            If sign || frac_64 || gbit || rbit || xbit = 0bu11uu then inc ← 1
            If sign || frac_64 || gbit || rbit || xbit = 0bu011u then inc ← 1
            If sign || frac_64 || gbit || rbit || xbit = 0bu01u1 then inc ← 1
        End
    If round_mode = 0b10 then /* Round toward +Infinity */
        Do /* comparisons ignore u bits */
            If sign || frac_64 || gbit || rbit || xbit = 0b0u1uu then inc ← 1
            If sign || frac_64 || gbit || rbit || xbit = 0b0uu1u then inc ← 1
            If sign || frac_64 || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
        End
    If round_mode = 0b11 then /* Round toward -Infinity */
        Do /* comparisons ignore u bits */
            If sign || frac_64 || gbit || rbit || xbit = 0b1u1uu then inc ← 1
            If sign || frac_64 || gbit || rbit || xbit = 0b1uu1u then inc ← 1
            If sign || frac_64 || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
        End
    frac_0:64 ← frac_0:64 + inc
    FPSCR_FR ← inc
    FPSCR_FI ← gbit | rbit | xbit
    Return

Infinity Operand:

    FPSCR_FR FI VXCVI ← 0b001
    If FPSCR_VE = 0 then
        Do
            If tgt_precision = "32-bit integer" then
                Do
                    If sign = 0 then FRT ← 0xuuuu_uuuu_7FFF_FFFF /* u is undefined hex digit */
                    If sign = 1 then FRT ← 0xuuuu_uuuu_8000_0000 /* u is undefined hex digit */
                End
            Else
                Do
                    If sign = 0 then FRT ← 0x7FFF_FFFF_FFFF_FFFF
                    If sign = 1 then FRT ← 0x8000_0000_0000_0000
                End
            FPSCR_FPRF ← undefined
        End
    Done

SNaN Operand:

    FPSCR_FR FI VXSNAN VXCVI ← 0b0011
    If FPSCR_VE = 0 then
        Do
            If tgt_precision = "32-bit integer" then FRT ← 0xuuuu_uuuu_8000_0000 /* u is undefined hex digit */
            If tgt_precision = "64-bit integer" then FRT ← 0x8000_0000_0000_0000
            FPSCR_FPRF ← undefined
        End
    Done

QNaN Operand:

    FPSCR_FR FI VXCVI ← 0b001
    If FPSCR_VE = 0 then
        Do
            If tgt_precision = "32-bit integer" then FRT ← 0xuuuu_uuuu_8000_0000 /* u is undefined hex digit */
            If tgt_precision = "64-bit integer" then FRT ← 0x8000_0000_0000_0000
            FPSCR_FPRF ← undefined
        End
    Done

Large Operand:

    FPSCR_FR FI VXCVI ← 0b001
    If FPSCR_VE = 0 then
        Do
            If tgt_precision = "32-bit integer" then
                Do
                    If sign = 0 then FRT ← 0xuuuu_uuuu_7FFF_FFFF /* u is undefined hex digit */
                    If sign = 1 then FRT ← 0xuuuu_uuuu_8000_0000 /* u is undefined hex digit */
                End
            Else
                Do
                    If sign = 0 then FRT ← 0x7FFF_FFFF_FFFF_FFFF
                    If sign = 1 then FRT ← 0x8000_0000_0000_0000
                End
            FPSCR_FPRF ← undefined
        End
    Done

A.3 Floating-Point Convert from Integer Model

The following describes algorithmically the operation of the Floating Convert From Integer Doubleword instruction.
    sign ← (FRB)_0
    exp ← 63
    frac_0:63 ← (FRB)

    If frac_0:63 = 0 then go to Zero Operand
    If sign = 1 then frac_0:63 ← ¬frac_0:63 + 1

    Do while frac_0 = 0 /* do the loop 0 times if (FRB) = maximum negative integer */
        frac_0:63 ← frac_1:63 || 0b0
        exp ← exp - 1
    End

    Round Float(sign,exp,frac_0:63,FPSCR_RN)

    If sign = 0 then FPSCR_FPRF ← "+normal number"
    If sign = 1 then FPSCR_FPRF ← "-normal number"
    FRT_0 ← sign
    FRT_1:11 ← exp + 1023 /* exp + bias */
    FRT_12:63 ← frac_1:52
    Done

Zero Operand:

    FPSCR_FR FI ← 0b00
    FPSCR_FPRF ← "+ zero"
    FRT ← 0x0000_0000_0000_0000
    Done

Round Float(sign,exp,frac_0:63,round_mode):

    inc ← 0
    lsb ← frac_52
    gbit ← frac_53
    rbit ← frac_54
    xbit ← frac_55:63 > 0
    If round_mode = 0b00 then /* Round to Nearest */
        Do /* comparisons ignore u bits */
            If sign || lsb || gbit || rbit || xbit = 0bu11uu then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0bu011u then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0bu01u1 then inc ← 1
        End
    If round_mode = 0b10 then /* Round toward +Infinity */
        Do /* comparisons ignore u bits */
            If sign || lsb || gbit || rbit || xbit = 0b0u1uu then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0b0uu1u then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
        End
    If round_mode = 0b11 then /* Round toward -Infinity */
        Do /* comparisons ignore u bits */
            If sign || lsb || gbit || rbit || xbit = 0b1u1uu then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0b1uu1u then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
        End
    frac_0:52 ← frac_0:52 + inc
    If carry_out = 1 then exp ← exp + 1
    FPSCR_FR ← inc
    FPSCR_FI ← gbit | rbit | xbit
    FPSCR_XX ← FPSCR_XX | FPSCR_FI
    Return

A.4 Floating-Point Round to Integer Model

The following describes algorithmically the operation of the Floating Round To Integer instructions.
    If (FRB)_1:11 = 2047 and (FRB)_12:63 = 0, then goto Infinity Operand
    If (FRB)_1:11 = 2047 and (FRB)_12 = 0, then goto SNaN Operand
    If (FRB)_1:11 = 2047 and (FRB)_12 = 1, then goto QNaN Operand
    If (FRB)_1:63 = 0 then goto Zero Operand
    If (FRB)_1:11 < 1023 then goto Small Operand /* exp < 0; |value| < 1 */
    If (FRB)_1:11 > 1074 then goto Large Operand /* exp > 51; integral value */

    sign ← (FRB)_0
    exp ← (FRB)_1:11 - 1023 /* exp - bias */
    frac_0:52 ← 0b1 || (FRB)_12:63
    gbit || rbit || xbit ← 0b000

    Do i = 1, 52 - exp
        frac_0:52 || gbit || rbit || xbit ← 0b0 || frac_0:52 || gbit || (rbit | xbit)
    End

    Round Integer (sign, frac_0:52, gbit, rbit, xbit)

    Do i = 2, 52 - exp
        frac_0:52 ← frac_1:52 || 0b0
    End

    If frac_0 = 1, then exp ← exp + 1
    Else frac_0:52 ← frac_1:52 || 0b0

    FRT_0 ← sign
    FRT_1:11 ← exp + 1023
    FRT_12:63 ← frac_1:52

    If (FRT)_0 = 0 then FPSCR_FPRF ← "+ normal number"
    Else FPSCR_FPRF ← "- normal number"
    FPSCR_FR FI ← 0b00
    Done

Round Integer(sign, frac_0:52, gbit, rbit, xbit):

    inc ← 0
    If inst = Floating Round to Integer Nearest then /* ties away from zero */
        Do /* comparisons ignore u bits */
            If sign || frac_52 || gbit || rbit || xbit = 0buu1uu then inc ← 1
        End
    If inst = Floating Round to Integer Plus then
        Do /* comparisons ignore u bits */
            If sign || frac_52 || gbit || rbit || xbit = 0b0u1uu then inc ← 1
            If sign || frac_52 || gbit || rbit || xbit = 0b0uu1u then inc ← 1
            If sign || frac_52 || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
        End
    If inst = Floating Round to Integer Minus then
        Do /* comparisons ignore u bits */
            If sign || frac_52 || gbit || rbit || xbit = 0b1u1uu then inc ← 1
            If sign || frac_52 || gbit || rbit || xbit = 0b1uu1u then inc ← 1
            If sign || frac_52 || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
        End
    frac_0:52 ← frac_0:52 + inc
    Return
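The main path of the Round to Integer model and its Round Integer routine above can be exercised with the following Python sketch. It is a simplified illustration under stated assumptions: it handles only operands whose biased exponent lies in 1023..1074 (the Small, Large, Zero, Infinity and NaN cases are not modeled), it does not model any FPSCR updates, and it rebuilds the result with an integer-to-float conversion instead of the model's renormalizing shifts. The function name round_to_integer and the mode strings are hypothetical.

```python
import struct


def round_to_integer(value: float, mode: str) -> float:
    """Round a double to an integral value.

    mode is one of "nearest" (ties away from zero), "zero", "plus", "minus".
    """
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63                      # (FRB) bit 0
    biased = (bits >> 52) & 0x7FF          # (FRB) bits 1:11
    if not 1023 <= biased <= 1074:
        raise ValueError("this sketch covers only the main path of the model")
    exp = biased - 1023
    frac = (1 << 52) | (bits & ((1 << 52) - 1))   # 0b1 || (FRB) bits 12:63

    # Shift right 52 - exp times, collecting guard, round and sticky bits.
    gbit = rbit = xbit = 0
    for _ in range(52 - exp):
        xbit = rbit | xbit
        rbit = gbit
        gbit = frac & 1
        frac >>= 1

    inexact = gbit | rbit | xbit
    inc = 0
    if mode == "nearest":                  # pattern 0buu1uu: guard bit set
        inc = gbit
    elif mode == "plus":                   # positive operand with any lost bit
        inc = int(sign == 0 and inexact == 1)
    elif mode == "minus":                  # negative operand with any lost bit
        inc = int(sign == 1 and inexact == 1)
    frac += inc

    magnitude = float(frac)                # frac <= 2**52, so this is exact
    return -magnitude if sign else magnitude


half_up = round_to_integer(2.5, "nearest")
half_down = round_to_integer(-2.5, "nearest")
```

Because the Nearest form increments whenever the guard bit is set, both 2.5 and -2.5 move away from zero, which differs from the round-half-to-even behavior of the FPSCR_RN = 0b00 mode used elsewhere in this appendix.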
Suggested Floating-Point Models [Category: Floating-Point] 307 Version 2.04 Infinity Operand: FRT 1 (FRB) If (FRB)0 = 0 then FPSCRFPRF 1 "+ infinity" If (FRB)0 = 1 then FPSCRFPRF 1 "- infinity" FPSCRFR FI 1 0b00 Done SNaN Operand: FPSCRVXSNAN 1 1 If FPSCRVE = 0 then Do FRT 1 (FRB) FRT12 1 1 FPSCRFPRF 1 "QNaN" End FPSCRFR FI 1 0b00 Done QNaN Operand: FRT 1 (FRB) FPSCRFPRF 1 "QNaN" FPSCRFR FI 1 0b00 Done Zero Operand: If (FRB)0 = 0 then Do FRT 1 0x0000_0000_0000_0000 FPSCRFPRF 1 "+ zero" End Else Do FRT 1 0x8000_0000_0000_0000 FPSCRFPRF 1 "- zero" End FPSCRFR FI 1 0b00 Done Small Operand: If inst = Floating Round to Integer Nearest and (FRB)1:11 < 1022 then goto Zero Operand If inst = Floating Round to Integer Toward Zero then goto Zero Operand If inst = Floating Round to Integer Plus and (FRB)0 = 1 then goto Zero Operand If inst = Floating Round to Integer Minus and (FRB)0 = 0 then goto Zero Operand If (FRB)0 = 0 then Do FRT 1 0x3FF0_0000_0000_0000 /* value = 1.0 */ FPSCRFPRF 1 "+ normal number" End Else Do FRT 1 0xBFF0_0000_0000_0000 /* value = -1.0 */ FPSCRFPRF 1 "- normal number" End FPSCRFR FI 1 0b00 Done Large Operand: FRT 1 (FRB) If FRT0 = 0 then FPSCRFPRF 1 "+ normal number" Else FPSCRFPRF 1 "- normal number" FPSCRFR FI 1 0b00 Done 308 Power ISATM -- Book I Version 2.04 Appendix B. Vector RTL Functions [Category: Vector] ConvertSPtoSXWsaturate( X, Y ) sign = X0 exp0:7 = X1:8 frac0:30 = X9:31 || 0b0000_0000 if((exp==255)&(frac!=0)) then return(0x0000_0000) // NaN operand if((exp==255)&(frac==0)) then do // infinity operand VSCRSAT = 1 return( (sign==1) ? 0x8000_0000 : 0x7FFF_FFFF ) if((exp+Y-127)>30) then do // large operand VSCRSAT = 1 return( (sign==1) ? 0x8000_0000 : 0x7FFF_FFFF ) if((exp+Y-127)<0) then return(0x0000_0000) // -1.0 < value < 1.0 (value rounds to 0) significand0:31 = 0b1 || frac do i=1 to 31-(exp+Y-127) significand = significand >>ui 1 return( (sign==0) ? 
significand : (¬significand + 1) ) ConvertSPtoUXWsaturate( X, Y ) sign = X0 exp0:7 = X1:8 frac0:30 = X9:31 || 0b0000_0000 if((exp==255)&&(frac!=0)) then return(0x0000_0000) // NaN operand if((exp==255)&&(frac==0)) then do // infinity operand VSCRSAT = 1 return( (sign==1) ? 0x0000_0000 : 0xFFFF_FFFF ) if((exp+Y-127)>31) then do // large operand VSCRSAT = 1 return( (sign==1) ? 0x0000_0000 : 0xFFFF_FFFF ) if((exp+Y-127)<0) then return(0x0000_0000) // -1.0 < value < 1.0 // value rounds to 0 if( sign==1 ) then do // negative operand VSCRSAT = 1 return(0x0000_0000) significand0:31 = 0b1 || frac do i=1 to 31-(exp+Y-127) significand = significand >>ui 1 return( significand ) ConvertSXWtoSP( X ) sign = X0 exp0:7 = 32 + 127 frac0:32 = X0 || X0:31 if( frac==0 ) return( 0x0000_0000 ) // Zero operand if( sign==1 ) then frac = ¬frac + 1 do while( frac0==0 ) frac = frac << 1 exp = exp - 1 lsb = frac23 gbit = frac24 xbit = frac25:32!=0 inc = ( lsb && gbit ) | ( gbit && xbit ) frac0:23 = frac0:23 + inc if( carry_out==1 ) exp = exp + 1 return( sign || exp || frac1:23 ) Appendix B. Vector RTL Functions [Category: Vector] 309 Version 2.04 ConvertUXWtoSP( X ) exp0:7 = 31 + 127 frac0:31 = X0:31 if( frac==0 ) return( 0x0000_0000 ) // Zero Operand do while( frac0==0 ) frac = frac << 1 exp = exp - 1 lsb = frac23 gbit = frac24 xbit = frac25:31!=0 inc = ( lsb && gbit ) | ( gbit && xbit ) frac0:23 = frac0:23 + inc if( carry_out==1 ) exp = exp + 1 return( 0b0 || exp || frac1:23 ) 310 Power ISATM -- Book I Version 2.04 Appendix C. 
Embedded Floating-Point RTL Functions [Category: SPE.Embedded Float Scalar Double] [Category: SPE.Embedded Float Scalar Single] [Category: SPE.Embedded Float Vector]

C.1 Common Functions

// Check if 32-bit fp value is a NaN or Infinity
Isa32NaNorInfinity(fp)
    return (fp_exp = 255)

Isa32NaN(fp)
    return ((fp_exp = 255) & (fp_frac ≠ 0))

// Check if 32-bit fp value is denormalized
Isa32Denorm(fp)
    return ((fp_exp = 0) & (fp_frac ≠ 0))

// Check if 64-bit fp value is a NaN or Infinity
Isa64NaNorInfinity(fp)
    return (fp_exp = 2047)

Isa64NaN(fp)
    return ((fp_exp = 2047) & (fp_frac ≠ 0))

// Check if 64-bit fp value is denormalized
Isa64Denorm(fp)
    return ((fp_exp = 0) & (fp_frac ≠ 0))

// Signal an error in the SPEFSCR
SignalFPError(upper_lower, bits)
    if (upper_lower = HI) then
        bits ← bits << 15
    SPEFSCR ← SPEFSCR | bits
    bits ← (FG | FX)
    if (upper_lower = HI) then
        bits ← bits << 15
    SPEFSCR ← SPEFSCR & ¬bits

// Round a 32-bit fp result
Round32(fp, guard, sticky)
    FP32format fp;
    if (SPEFSCR_FINXE = 0) then
        if (SPEFSCR_FRMC = 0b00) then   // nearest
            if (guard) then
                if (sticky | fp_frac[22]) then
                    v_0:23 ← fp_frac + 1
                    if v_0 then
                        if (fp_exp >= 254) then
                            // overflow
                            fp ← fp_sign || 0b11111110 || ²³1
                        else
                            fp_exp ← fp_exp + 1
                            fp_frac ← v_1:23
                    else
                        fp_frac ← v_1:23
        else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
            // infinity modes
            // implementation dependent
    return fp

// Round a 64-bit fp result
Round64(fp, guard, sticky)
    FP64format fp;
    if (SPEFSCR_FINXE = 0) then
        if (SPEFSCR_FRMC = 0b00) then   // nearest
            if (guard) then
                if (sticky | fp_frac[51]) then
                    v_0:52 ← fp_frac + 1
                    if v_0 then
                        if (fp_exp >= 2046) then
                            // overflow
                            fp ← fp_sign || 0b11111111110 || ⁵²1
                        else
                            fp_exp ← fp_exp + 1
                            fp_frac ← v_1:52
                    else
                        fp_frac ← v_1:52
        else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
            // infinity modes
            // implementation dependent
    return fp

C.2 Convert from Single-Precision Embedded Floating-Point to Integer Word with Saturation

// Convert 32-bit Floating-Point to 32-bit integer
// or fractional
// signed = S (signed) or U (unsigned)
// upper_lower = HI (high word) or LO (low word)
// round = RND (round) or ZER (truncate)
// fractional = F (fractional) or I (integer)

CnvtFP32ToI32Sat(fp, signed, upper_lower, round, fractional)
    FP32format fp;
    if (Isa32NaNorInfinity(fp)) then
        SignalFPError(upper_lower, FINV)
        if (Isa32NaN(fp)) then
            return 0x00000000   // all NaNs
        if (signed = S) then
            if (fp_sign = 1) then
                return 0x80000000
            else
                return 0x7fffffff
        else
            if (fp_sign = 1) then
                return 0x00000000
            else
                return 0xffffffff
    if (Isa32Denorm(fp)) then
        SignalFPError(upper_lower, FINV)
        return 0x00000000   // regardless of sign
    if ((signed = U) & (fp_sign = 1)) then
        SignalFPError(upper_lower, FOVF)   // overflow
        return 0x00000000
    if ((fp_exp = 0) & (fp_frac = 0)) then
        return 0x00000000   // all zero values
    if (fractional = I) then   // convert to integer
        max_exp ← 158
        shift ← 158 - fp_exp
        if (signed = S) then
            if ((fp_exp ≠ 158) | (fp_frac ≠ 0) | (fp_sign ≠ 1)) then
                max_exp ← max_exp - 1
    else   // fractional conversion
        max_exp ← 126
        shift ← 126 - fp_exp
        if (signed = S) then
            shift ← shift + 1
    if (fp_exp > max_exp) then
        SignalFPError(upper_lower, FOVF)   // overflow
        if (signed = S) then
            if (fp_sign = 1) then
                return 0x80000000
            else
                return 0x7fffffff
        else
            return 0xffffffff
    result ← 0b1 || fp_frac || 0b00000000   // add U bit
    guard ← 0
    sticky ← 0
    for (n ← 0; n < shift; n ← n + 1) do
        sticky ← sticky | guard
        guard ← result & 0x00000001
        result ← result >> 1
    // Report sticky and guard bits
    if (upper_lower = HI) then
        SPEFSCR_FGH ← guard
        SPEFSCR_FXH ← sticky
    else
        SPEFSCR_FG ← guard
        SPEFSCR_FX ← sticky
    if (guard | sticky) then
        SPEFSCR_FINXS ← 1
    // Round the integer result
    if ((round = RND) & (SPEFSCR_FINXE = 0)) then
        if (SPEFSCR_FRMC = 0b00) then   // nearest
            if (guard) then
                if (sticky | (result & 0x00000001)) then
                    result ← result + 1
        else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
            // infinity modes
            // implementation dependent
    if (signed = S) then
        if (fp_sign = 1) then
            result ← ¬result + 1
    return result

C.3 Convert from Double-Precision Embedded Floating-Point to Integer Word with Saturation

// Convert 64-bit Floating-Point to 32-bit integer
// or fractional
// signed = S (signed) or U (unsigned)
// round = RND (round) or ZER (truncate)
// fractional = F (fractional) or I (integer)

CnvtFP64ToI32Sat(fp, signed, round, fractional)
    FP64format fp;
    if (Isa64NaNorInfinity(fp)) then
        SignalFPError(LO, FINV)
        if (Isa64NaN(fp)) then
            return 0x00000000   // all NaNs
        if (signed = S) then
            if (fp_sign = 1) then
                return 0x80000000
            else
                return 0x7fffffff
        else
            if (fp_sign = 1) then
                return 0x00000000
            else
                return 0xffffffff
    if (Isa64Denorm(fp)) then
        SignalFPError(LO, FINV)
        return 0x00000000   // regardless of sign
    if ((signed = U) & (fp_sign = 1)) then
        SignalFPError(LO, FOVF)   // overflow
        return 0x00000000
    if ((fp_exp = 0) & (fp_frac = 0)) then
        return 0x00000000   // all zero values
    if (fractional = I) then   // convert to integer
        max_exp ← 1054
        shift ← 1054 - fp_exp
        if (signed = S) then
            if ((fp_exp ≠ 1054) | (fp_frac ≠ 0) | (fp_sign ≠ 1)) then
                max_exp ← max_exp - 1
    else   // fractional conversion
        max_exp ← 1022
        shift ← 1022 - fp_exp
        if (signed = S) then
            shift ← shift + 1
    if (fp_exp > max_exp) then
        SignalFPError(LO, FOVF)   // overflow
        if (signed = S) then
            if (fp_sign = 1) then
                return 0x80000000
            else
                return 0x7fffffff
        else
            return 0xffffffff
    result ← 0b1 || fp_frac[0:30]   // add U to frac
    guard ← fp_frac[31]
    sticky ← (fp_frac[32:63] ≠ 0)
    for (n ← 0; n < shift; n ← n + 1) do
        sticky ← sticky | guard
        guard ← result & 0x00000001
        result ← result >> 1
    // Report sticky and guard bits
    SPEFSCR_FG ← guard
    SPEFSCR_FX ← sticky
    if (guard | sticky) then
        SPEFSCR_FINXS ← 1
    // Round the result
    if ((round = RND) & (SPEFSCR_FINXE = 0)) then
        if (SPEFSCR_FRMC = 0b00) then   // nearest
            if (guard) then
                if (sticky | (result & 0x00000001)) then
                    result ← result + 1
        else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
            // infinity modes
            // implementation dependent
    if (signed = S) then
        if (fp_sign = 1) then
            result ← ¬result + 1
    return result

C.4 Convert from Double-Precision Embedded Floating-Point to Integer Doubleword with Saturation

// Convert 64-bit Floating-Point to 64-bit integer
// signed = S (signed) or U (unsigned)
// round = RND (round) or ZER (truncate)

CnvtFP64ToI64Sat(fp, signed, round)
    FP64format fp;
    if (Isa64NaNorInfinity(fp)) then
        SignalFPError(LO, FINV)
        if (Isa64NaN(fp)) then
            return 0x00000000_00000000   // all NaNs
        if (signed = S) then
            if (fp_sign = 1) then
                return 0x80000000_00000000
            else
                return 0x7fffffff_ffffffff
        else
            if (fp_sign = 1) then
                return 0x00000000_00000000
            else
                return 0xffffffff_ffffffff
    if (Isa64Denorm(fp)) then
        SignalFPError(LO, FINV)
        return 0x00000000_00000000
    if ((signed = U) & (fp_sign = 1)) then
        SignalFPError(LO, FOVF)   // overflow
        return 0x00000000_00000000
    if ((fp_exp = 0) & (fp_frac = 0)) then
        return 0x00000000_00000000   // all zero values
    max_exp ← 1086
    shift ← 1086 - fp_exp
    if (signed = S) then
        if ((fp_exp ≠ 1086) | (fp_frac ≠ 0) | (fp_sign ≠ 1)) then
            max_exp ← max_exp - 1
    if (fp_exp > max_exp) then
        SignalFPError(LO, FOVF)   // overflow
        if (signed = S) then
            if (fp_sign = 1) then
                return 0x80000000_00000000
            else
                return 0x7fffffff_ffffffff
        else
            return 0xffffffff_ffffffff
    result ← 0b1 || fp_frac || 0b00000000000   // add U bit
    guard ← 0
    sticky ← 0
    for (n ← 0; n < shift; n ← n + 1) do
        sticky ← sticky | guard
        guard ← result & 0x00000000_00000001
        result ← result >> 1
    // Report sticky and guard bits
    SPEFSCR_FG ← guard
    SPEFSCR_FX ← sticky
    if (guard | sticky) then
        SPEFSCR_FINXS ← 1
    // Round the result
    if ((round = RND) & (SPEFSCR_FINXE = 0)) then
        if (SPEFSCR_FRMC = 0b00) then   // nearest
            if (guard) then
                if (sticky | (result & 0x00000000_00000001)) then
                    result ← result + 1
        else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
            // infinity modes
            // implementation dependent
    if (signed = S) then
        if (fp_sign = 1) then
            result ← ¬result + 1
    return result

C.5 Convert to Single-Precision Embedded Floating-Point from Integer Word

// Convert from 32-bit integer or fractional to
// 32-bit Floating-Point
// signed = S (signed) or U (unsigned)
// round = RND (round) or ZER (truncate)
// fractional = F (fractional) or I (integer)

CnvtI32ToFP32(v, signed, upper_lower, fractional)
    FP32format result;
    result_sign ← 0
    if (v = 0) then
        result ← 0
        if (upper_lower = HI) then
            SPEFSCR_FGH ← 0
            SPEFSCR_FXH ← 0
        else
            SPEFSCR_FG ← 0
            SPEFSCR_FX ← 0
    else
        if (signed = S) then
            if (v_0 = 1) then
                v ← ¬v + 1
                result_sign ← 1
        if (fractional = F) then   // frac bit align
            maxexp ← 127
            if (signed = U) then
                maxexp ← maxexp - 1
        else
            maxexp ← 158   // integer bit alignment
        sc ← 0
        while (v_0 = 0)
            v ← v << 1
            sc ← sc + 1
        v_0 ← 0   // clear U bit
        result_exp ← maxexp - sc
        guard ← v_24
        sticky ← (v_25:31 ≠ 0)
        // Report sticky and guard bits
        if (upper_lower = HI) then
            SPEFSCR_FGH ← guard
            SPEFSCR_FXH ← sticky
        else
            SPEFSCR_FG ← guard
            SPEFSCR_FX ← sticky
        if (guard | sticky) then
            SPEFSCR_FINXS ← 1
        // Round the result
        result_frac ← v_1:23
        result ← Round32(result, guard, sticky)
    return result

C.6 Convert to Double-Precision Embedded Floating-Point from Integer Word

// Convert from integer or fractional to 64 bit
// Floating-Point
// signed = S (signed) or U (unsigned)
// fractional = F (fractional) or I (integer)

CnvtI32ToFP64(v, signed, fractional)
    FP64format result;
    result_sign ← 0
    if (v = 0) then
        result ← 0
        SPEFSCR_FG ← 0
        SPEFSCR_FX ← 0
    else
        if (signed = S) then
            if (v_0 = 1) then
                v ← ¬v + 1
                result_sign ← 1
        if (fractional = F) then   // frac bit align
            maxexp ← 1023
            if (signed = U) then
                maxexp ← maxexp - 1
        else
            maxexp ← 1054   // integer bit align
        sc ← 0
        while (v_0 = 0)
            v ← v << 1
            sc ← sc + 1
        v_0 ← 0   // clear U bit
        result_exp ← maxexp - sc
        // Report sticky and guard bits
        SPEFSCR_FG ← 0
        SPEFSCR_FX ← 0
        result_frac ← v_1:31 || ²¹0
    return result

C.7 Convert to Double-Precision Embedded Floating-Point from Integer Doubleword

// Convert from 64-bit integer to 64-bit
// floating-point
// signed = S (signed) or U (unsigned)

CnvtI64ToFP64(v, signed)
    FP64format result;
    result_sign ← 0
    if (v = 0) then
        result ← 0
        SPEFSCR_FG ← 0
        SPEFSCR_FX ← 0
    else
        if (signed = S) then
            if (v_0 = 1) then
                v ← ¬v + 1
                result_sign ← 1
        maxexp ← 1054
        sc ← 0
        while (v_0 = 0)
            v ← v << 1
            sc ← sc + 1
        v_0 ← 0   // clear U bit
        result_exp ← maxexp - sc
        guard ← v_53
        sticky ← (v_54:63 ≠ 0)
        // Report sticky and guard bits
        SPEFSCR_FG ← guard
        SPEFSCR_FX ← sticky
        if (guard | sticky) then
            SPEFSCR_FINXS ← 1
        // Round the result
        result_frac ← v_1:52
        result ← Round64(result, guard, sticky)
    return result

Appendix D. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided that defines simple shorthand for the most frequently used forms of Branch Conditional, Compare, Trap, Rotate and Shift, and certain other instructions.

Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

D.1 Symbols

The following symbols are defined for use in instructions (basic or extended mnemonics) that specify a Condition Register field or a Condition Register bit. The first five (lt, ..., un) identify a bit number within a CR field. The remainder (cr0, ..., cr7) identify a CR field. An expression in which a CR field symbol is multiplied by 4 and then added to a bit-number-within-CR-field symbol and 32 can be used to identify a CR bit.
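As a quick check of the expression just described, the following is a minimal Python sketch, not part of the architecture. The function names `bi_operand` and `cr_bit_number` are hypothetical. It assumes, consistent with the examples in this appendix (for instance `4×cr5+eq` coded as 22, and operand 27 designating CR bit 59), that the value written in the instruction operand is 4×field + bit and that the CR bit it designates is that value plus 32.

```python
# Symbol values as defined in Section D.1.
CR_BIT = {"lt": 0, "gt": 1, "eq": 2, "so": 3, "un": 3}
CR_FIELD = {f"cr{i}": i for i in range(8)}


def bi_operand(field: str, bit: str) -> int:
    """Value of the expression 4*crN + bit, as written in an instruction operand."""
    return 4 * CR_FIELD[field] + CR_BIT[bit]


def cr_bit_number(field: str, bit: str) -> int:
    """CR bit (numbered 32..63) designated by that operand value."""
    return bi_operand(field, bit) + 32


if __name__ == "__main__":
    print(bi_operand("cr5", "eq"), cr_bit_number("cr5", "eq"))
```

Under these assumptions, `bi_operand("cr5", "eq")` reproduces the operand 22 used in the `bdnzt 4×cr5+eq,target` example.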
Symbol   Value   Meaning
lt       0       Less than
gt       1       Greater than
eq       2       Equal
so       3       Summary overflow
un       3       Unordered (after floating-point comparison)
cr0      0       CR Field 0
cr1      1       CR Field 1
cr2      2       CR Field 2
cr3      3       CR Field 3
cr4      4       CR Field 4
cr5      5       CR Field 5
cr6      6       CR Field 6
cr7      7       CR Field 7

The extended mnemonics in Sections D.2.2 and D.3 require identification of a CR bit: if one of the CR field symbols is used, it must be multiplied by 4 and added to a bit-number-within-CR-field (value in the range 0-3, explicit or symbolic) and 32. The extended mnemonics in Sections D.2.3 and D.5 require identification of a CR field: if one of the CR field symbols is used, it must not be multiplied by 4 or added to 32. (For the extended mnemonics in Section D.2.3, the bit number within the CR field is part of the extended mnemonic. The programmer identifies the CR field, and the Assembler does the multiplication and addition required to produce a CR bit number for the BI field of the underlying basic mnemonic.)

D.2 Branch Mnemonics

The mnemonics discussed in this section are variations of the Branch Conditional instructions.

Note: bclr, bclrl, bcctr, and bcctrl each serve as both a basic and an extended mnemonic. The Assembler will recognize a bclr, bclrl, bcctr, or bcctrl mnemonic with three operands as the basic form, and a bclr, bclrl, bcctr, or bcctrl mnemonic with two operands as the extended form. In the extended form the BH operand is omitted and assumed to be 0b00. Similarly, for all the extended mnemonics described in Sections D.2.2 - D.2.4 that devolve to any of these four basic mnemonics the BH operand can either be coded or omitted. If it is omitted it is assumed to be 0b00.

D.2.1 BO and BI Fields

The 5-bit BO and BI fields control whether the branch is taken. Providing an extended mnemonic for every possible combination of these fields would be neither useful nor practical. The mnemonics described in Sections D.2.2 - D.2.4 include the most useful cases. Other cases can be coded using a basic Branch Conditional mnemonic (bc[l][a], bclr[l], bcctr[l]) with the appropriate operands.

D.2.2 Simple Branch Mnemonics

Instructions using one of the mnemonics in Table 11 that tests a Condition Register bit specify the corresponding bit as the first operand. The symbols defined in Section D.1 can be used in this operand.

Notice that there are no extended mnemonics for relative and absolute unconditional branches. For these the basic mnemonics b, ba, bl, and bla should be used.

Table 11: Simple branch mnemonics

                                                  LR not Set                              LR Set
Branch Semantics                                  bc        bca       bclr      bcctr     bcl       bcla      bclrl     bcctrl
                                                  Relative  Absolute  To LR     To CTR    Relative  Absolute  To LR     To CTR
Branch unconditionally                            -         -         blr       bctr      -         -         blrl      bctrl
Branch if CR_BI=1                                 bt        bta       btlr      btctr     btl       btla      btlrl     btctrl
Branch if CR_BI=0                                 bf        bfa       bflr      bfctr     bfl       bfla      bflrl     bfctrl
Decrement CTR, branch if CTR nonzero              bdnz      bdnza     bdnzlr    -         bdnzl     bdnzla    bdnzlrl   -
Decrement CTR, branch if CTR nonzero and CR_BI=1  bdnzt     bdnzta    bdnztlr   -         bdnztl    bdnztla   bdnztlrl  -
Decrement CTR, branch if CTR nonzero and CR_BI=0  bdnzf     bdnzfa    bdnzflr   -         bdnzfl    bdnzfla   bdnzflrl  -
Decrement CTR, branch if CTR zero                 bdz       bdza      bdzlr     -         bdzl      bdzla     bdzlrl    -
Decrement CTR, branch if CTR zero and CR_BI=1     bdzt      bdzta     bdztlr    -         bdztl     bdztla    bdztlrl   -
Decrement CTR, branch if CTR zero and CR_BI=0     bdzf      bdzfa     bdzflr    -         bdzfl     bdzfla    bdzflrl   -

Examples

1. Decrement CTR and branch if it is still nonzero (closure of a loop controlled by a count loaded into CTR).

   bdnz target               (equivalent to: bc 16,0,target)

2. Same as (1) but branch only if CTR is nonzero and condition in CR0 is "equal".

   bdnzt eq,target           (equivalent to: bc 8,2,target)

3. Same as (2), but "equal" condition is in CR5.

   bdnzt 4×cr5+eq,target     (equivalent to: bc 8,22,target)

4. Branch if bit 59 of CR is 0.

   bf 27,target              (equivalent to: bc 4,27,target)

5. Same as (4), but set the Link Register. This is a form of conditional "call".

   bfl 27,target             (equivalent to: bcl 4,27,target)

D.2.3 Branch Mnemonics Incorporating Conditions

In the mnemonics defined in Table 12, the test of a bit in a Condition Register field is encoded in the mnemonic.

Instructions using the mnemonics in Table 12 specify the CR field as an optional first operand. One of the CR field symbols defined in Section D.1 can be used for this operand. If the CR field being tested is CR Field 0, this operand need not be specified unless the resulting basic mnemonic is bclr[l] or bcctr[l] and the BH operand is specified.

A standard set of codes has been adopted for the most common combinations of branch conditions.

Code   Meaning
lt     Less than
le     Less than or equal
eq     Equal
ge     Greater than or equal
gt     Greater than
nl     Not less than
ne     Not equal
ng     Not greater than
so     Summary overflow
ns     Not summary overflow
un     Unordered (after floating-point comparison)
nu     Not unordered (after floating-point comparison)

These codes are reflected in the mnemonics shown in Table 12.
Table 12: Branch mnemonics incorporating conditions

                                  LR not Set                              LR Set
Branch Semantics                  bc        bca       bclr      bcctr     bcl       bcla      bclrl     bcctrl
                                  Relative  Absolute  To LR     To CTR    Relative  Absolute  To LR     To CTR
Branch if less than               blt       blta      bltlr     bltctr    bltl      bltla     bltlrl    bltctrl
Branch if less than or equal      ble       blea      blelr     blectr    blel      blela     blelrl    blectrl
Branch if equal                   beq       beqa      beqlr     beqctr    beql      beqla     beqlrl    beqctrl
Branch if greater than or equal   bge       bgea      bgelr     bgectr    bgel      bgela     bgelrl    bgectrl
Branch if greater than            bgt       bgta      bgtlr     bgtctr    bgtl      bgtla     bgtlrl    bgtctrl
Branch if not less than           bnl       bnla      bnllr     bnlctr    bnll      bnlla     bnllrl    bnlctrl
Branch if not equal               bne       bnea      bnelr     bnectr    bnel      bnela     bnelrl    bnectrl
Branch if not greater than        bng       bnga      bnglr     bngctr    bngl      bngla     bnglrl    bngctrl
Branch if summary overflow        bso       bsoa      bsolr     bsoctr    bsol      bsola     bsolrl    bsoctrl
Branch if not summary overflow    bns       bnsa      bnslr     bnsctr    bnsl      bnsla     bnslrl    bnsctrl
Branch if unordered               bun       buna      bunlr     bunctr    bunl      bunla     bunlrl    bunctrl
Branch if not unordered           bnu       bnua      bnulr     bnuctr    bnul      bnula     bnulrl    bnuctrl

Examples

1. Branch if CR0 reflects condition "not equal".

   bne target          (equivalent to: bc 4,2,target)

2. Same as (1), but condition is in CR3.

   bne cr3,target      (equivalent to: bc 4,14,target)

3. Branch to an absolute target if CR4 specifies "greater than", setting the Link Register. This is a form of conditional "call".

   bgtla cr4,target    (equivalent to: bcla 12,17,target)

4. Same as (3), but target address is in the Count Register.

   bgtctrl cr4         (equivalent to: bcctrl 12,17,0)

D.2.4 Branch Prediction

Software can use the "at" bits of Branch Conditional instructions to provide a hint to the processor about the behavior of the branch. If, for a given such instruction, the branch is almost always taken or almost always not taken, a suffix can be added to the mnemonic indicating the value to be used for the "at" bits.
+   Predict branch to be taken (at=0b11)
-   Predict branch not to be taken (at=0b10)

Such a suffix can be added to any Branch Conditional mnemonic, either basic or extended, that tests either the Count Register or a CR bit (but not both). Assemblers should use 0b00 as the default value for the "at" bits, indicating that software has offered no prediction.

Examples

1. Branch if CR0 reflects condition "less than", specifying that the branch should be predicted to be taken.

   blt+ target

2. Same as (1), but target address is in the Link Register and the branch should be predicted not to be taken.

   bltlr-

D.3 Condition Register Logical Mnemonics

The Condition Register Logical instructions can be used to set (to 1), clear (to 0), copy, or invert a given Condition Register bit. Extended mnemonics are provided that allow these operations to be coded easily.

Table 13: Condition Register logical mnemonics

Operation                   Extended Mnemonic   Equivalent to
Condition Register set      crset bx            creqv bx,bx,bx
Condition Register clear    crclr bx            crxor bx,bx,bx
Condition Register move     crmove bx,by        cror bx,by,by
Condition Register not      crnot bx,by         crnor bx,by,by

The symbols defined in Section D.1 can be used to identify the Condition Register bits.

Examples

1. Set CR bit 57.

   crset 25                   (equivalent to: creqv 25,25,25)

2. Clear the SO bit of CR0.

   crclr so                   (equivalent to: crxor 3,3,3)

3. Same as (2), but SO bit to be cleared is in CR3.

   crclr 4×cr3+so             (equivalent to: crxor 15,15,15)

4. Invert the EQ bit.

   crnot eq,eq                (equivalent to: crnor 2,2,2)

5. Same as (4), but EQ bit to be inverted is in CR4, and the result is to be placed into the EQ bit of CR5.

   crnot 4×cr5+eq,4×cr4+eq    (equivalent to: crnor 22,18,18)

D.4 Subtract Mnemonics

D.4.1 Subtract Immediate

Although there is no "Subtract Immediate" instruction, its effect can be achieved by using an Add Immediate instruction with the immediate operand negated. Extended mnemonics are provided that include this negation, making the intent of the computation clearer.

   subi Rx,Ry,value      (equivalent to: addi Rx,Ry,-value)
   subis Rx,Ry,value     (equivalent to: addis Rx,Ry,-value)
   subic Rx,Ry,value     (equivalent to: addic Rx,Ry,-value)
   subic. Rx,Ry,value    (equivalent to: addic. Rx,Ry,-value)

D.4.2 Subtract

The Subtract From instructions subtract the second operand (RA) from the third (RB). Extended mnemonics are provided that use the more "normal" order, in which the third operand is subtracted from the second. Both these mnemonics can be coded with a final "o" and/or "." to cause the OE and/or Rc bit to be set in the underlying instruction.

   sub Rx,Ry,Rz      (equivalent to: subf Rx,Rz,Ry)
   subc Rx,Ry,Rz     (equivalent to: subfc Rx,Rz,Ry)

D.5 Compare Mnemonics

The L field in the fixed-point Compare instructions controls whether the operands are treated as 64-bit quantities or as 32-bit quantities. Extended mnemonics are provided that represent the L value in the mnemonic rather than requiring it to be coded as a numeric operand.

The BF field can be omitted if the result of the comparison is to be placed into CR Field 0. Otherwise the target CR field must be specified as the first operand. One of the CR field symbols defined in Section D.1 can be used for this operand.

Note: The basic Compare mnemonics of Power ISA are the same as those of POWER, but the POWER instructions have three operands while the Power ISA instructions have four. The Assembler will recognize a basic Compare mnemonic with three operands as the POWER form, and will generate the instruction with L=0. (Thus the Assembler must require that the BF field, which normally can be omitted when CR Field 0 is the target, be specified explicitly if L is.)
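The two rules above (the L value is implied by the mnemonic, and an omitted BF means CR Field 0) can be sketched as a small text rewriter. This is an illustrative Python sketch only, not how any particular assembler is implemented. The function name `expand_compare` and the dictionary names are hypothetical. The mnemonic-to-L mapping follows Tables 14 and 15.

```python
# Extended compare mnemonic -> (basic mnemonic, L value), per Tables 14 and 15.
COMPARE = {
    "cmpdi": ("cmpi", 1), "cmpd": ("cmp", 1),
    "cmpldi": ("cmpli", 1), "cmpld": ("cmpl", 1),
    "cmpwi": ("cmpi", 0), "cmpw": ("cmp", 0),
    "cmplwi": ("cmpli", 0), "cmplw": ("cmpl", 0),
}
CR_FIELD = {f"cr{i}": i for i in range(8)}


def expand_compare(line: str) -> str:
    """Rewrite an extended compare mnemonic into its basic four-operand form."""
    mnemonic, operands = line.split(None, 1)
    ops = [op.strip() for op in operands.split(",")]
    basic, l_value = COMPARE[mnemonic]
    if len(ops) == 2:
        # BF omitted: the result is placed into CR Field 0.
        ops.insert(0, "0")
    # A CR field symbol is used as-is (not multiplied by 4) for BF.
    bf = CR_FIELD.get(ops[0], ops[0])
    return f"{basic} {bf},{l_value},{ops[1]},{ops[2]}"


if __name__ == "__main__":
    print(expand_compare("cmpwi cr4,Rx,100"))
```

The sketch does not validate operand ranges or register names; it only shows where the L value and the default BF come from.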
D.5.1 Doubleword Comparisons

Table 14: Doubleword compare mnemonics

Operation                               Extended Mnemonic     Equivalent to
Compare doubleword immediate            cmpdi bf,ra,si        cmpi bf,1,ra,si
Compare doubleword                      cmpd bf,ra,rb         cmp bf,1,ra,rb
Compare logical doubleword immediate    cmpldi bf,ra,ui       cmpli bf,1,ra,ui
Compare logical doubleword              cmpld bf,ra,rb        cmpl bf,1,ra,rb

Examples

1. Compare register Rx and immediate value 100 as unsigned 64-bit integers and place result into CR0.

   cmpldi Rx,100        (equivalent to: cmpli 0,1,Rx,100)

2. Same as (1), but place result into CR4.

   cmpldi cr4,Rx,100    (equivalent to: cmpli 4,1,Rx,100)

3. Compare registers Rx and Ry as signed 64-bit integers and place result into CR0.

   cmpd Rx,Ry           (equivalent to: cmp 0,1,Rx,Ry)

D.5.2 Word Comparisons

Table 15: Word compare mnemonics

Operation                       Extended Mnemonic     Equivalent to
Compare word immediate          cmpwi bf,ra,si        cmpi bf,0,ra,si
Compare word                    cmpw bf,ra,rb         cmp bf,0,ra,rb
Compare logical word immediate  cmplwi bf,ra,ui       cmpli bf,0,ra,ui
Compare logical word            cmplw bf,ra,rb        cmpl bf,0,ra,rb

Examples

1. Compare bits 32:63 of register Rx and immediate value 100 as signed 32-bit integers and place result into CR0.

   cmpwi Rx,100        (equivalent to: cmpi 0,0,Rx,100)

2. Same as (1), but place result into CR4.

   cmpwi cr4,Rx,100    (equivalent to: cmpi 4,0,Rx,100)

3. Compare bits 32:63 of registers Rx and Ry as unsigned 32-bit integers and place result into CR0.

   cmplw Rx,Ry         (equivalent to: cmpl 0,0,Rx,Ry)

D.6 Trap Mnemonics

The mnemonics defined in Table 16 are variations of the Trap instructions, with the most useful values of TO represented in the mnemonic rather than specified as a numeric operand.

A standard set of codes has been adopted for the most common combinations of trap conditions.
Code    Meaning                           TO encoding   <   >   =   <u  >u
lt      Less than                         16            1   0   0   0   0
le      Less than or equal                20            1   0   1   0   0
eq      Equal                             4             0   0   1   0   0
ge      Greater than or equal             12            0   1   1   0   0
gt      Greater than                      8             0   1   0   0   0
nl      Not less than                     12            0   1   1   0   0
ne      Not equal                         24            1   1   0   0   0
ng      Not greater than                  20            1   0   1   0   0
llt     Logically less than               2             0   0   0   1   0
lle     Logically less than or equal      6             0   0   1   1   0
lge     Logically greater than or equal   5             0   0   1   0   1
lgt     Logically greater than            1             0   0   0   0   1
lnl     Logically not less than           5             0   0   1   0   1
lng     Logically not greater than        6             0   0   1   1   0
u       Unconditionally with parameters   31            1   1   1   1   1
(none)  Unconditional                     31            1   1   1   1   1

These codes are reflected in the mnemonics shown in Table 16.

Table 16: Trap mnemonics

                                          64-bit Comparison       32-bit Comparison
Trap Semantics                            tdi         td          twi         tw
                                          Immediate   Register    Immediate   Register
Trap unconditionally                      -           -           -           trap
Trap unconditionally with parameters      tdui        tdu         twui        twu
Trap if less than                         tdlti       tdlt        twlti       twlt
Trap if less than or equal                tdlei       tdle        twlei       twle
Trap if equal                             tdeqi       tdeq        tweqi       tweq
Trap if greater than or equal             tdgei       tdge        twgei       twge
Trap if greater than                      tdgti       tdgt        twgti       twgt
Trap if not less than                     tdnli       tdnl        twnli       twnl
Trap if not equal                         tdnei       tdne        twnei       twne
Trap if not greater than                  tdngi       tdng        twngi       twng
Trap if logically less than               tdllti      tdllt       twllti      twllt
Trap if logically less than or equal      tdllei      tdlle       twllei      twlle
Trap if logically greater than or equal   tdlgei      tdlge       twlgei      twlge
Trap if logically greater than            tdlgti      tdlgt       twlgti      twlgt
Trap if logically not less than           tdlnli      tdlnl       twlnli      twlnl
Trap if logically not greater than        tdlngi      tdlng       twlngi      twlng

Examples

1. Trap if register Rx is not 0.

   tdnei Rx,0         (equivalent to: tdi 24,Rx,0)

2. Same as (1), but comparison is to register Ry.

   tdne Rx,Ry         (equivalent to: td 24,Rx,Ry)

3. Trap if bits 32:63 of register Rx, considered as a 32-bit quantity, are logically greater than 0x7FF.

   twlgti Rx,0x7FF    (equivalent to: twi 1,Rx,0x7FF)

4. Trap unconditionally.

   trap               (equivalent to: tw 31,0,0)

5. Trap unconditionally with immediate parameters Rx and Ry.

   tdu Rx,Ry          (equivalent to: td 31,Rx,Ry)

D.7 Rotate and Shift Mnemonics

The Rotate and Shift instructions provide powerful and general ways to manipulate register contents, but can be difficult to understand. Extended mnemonics are provided that allow some of the simpler operations to be coded easily.

Mnemonics are provided for the following types of operation.

Extract
    Select a field of n bits starting at bit position b in the source register; left or right justify this field in the target register; clear all other bits of the target register to 0.

Insert
    Select a left-justified or right-justified field of n bits in the source register; insert this field starting at bit position b of the target register; leave other bits of the target register unchanged. (No extended mnemonic is provided for insertion of a left-justified field when operating on doublewords, because such an insertion requires more than one instruction.)

Rotate
    Rotate the contents of a register right or left n bits without masking.

Shift
    Shift the contents of a register right or left n bits, clearing vacated bits to 0 (logical shift).

Clear
    Clear the leftmost or rightmost n bits of a register to 0.

Clear left and shift left
    Clear the leftmost b bits of a register, then shift the register left by n bits. This operation can be used to scale a (known nonnegative) array index by the width of an element.

D.7.1 Operations on Doublewords

All these mnemonics can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.
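To make the operation types listed in Section D.7 concrete, the following is a minimal Python model of three of the underlying doubleword rotate instructions, with a few extended mnemonics built on top of them using the equivalences in Table 17. It is a sketch under stated assumptions, not a reference implementation: registers are modeled as unsigned 64-bit integers, bit 0 is the most significant bit, the Rc bit is ignored, and the helper names `rotl64` and `mask` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, n: int) -> int:
    """Rotate a 64-bit value left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def mask(mb: int, me: int) -> int:
    """Ones in bits mb..me (bit 0 = most significant); assumes mb <= me."""
    return (MASK64 >> mb) & (MASK64 << (63 - me)) & MASK64


def rldicl(rs: int, sh: int, mb: int) -> int:
    """Rotate left, then keep bits mb..63."""
    return rotl64(rs, sh) & mask(mb, 63)


def rldicr(rs: int, sh: int, me: int) -> int:
    """Rotate left, then keep bits 0..me."""
    return rotl64(rs, sh) & mask(0, me)


def rldic(rs: int, sh: int, mb: int) -> int:
    """Rotate left, then keep bits mb..63-sh."""
    return rotl64(rs, sh) & mask(mb, 63 - sh)


# Extended mnemonics, written with the equivalences given in Table 17.
def sldi(rs, n):        return rldicr(rs, n, 63 - n)
def srdi(rs, n):        return rldicl(rs, 64 - n, n)
def extrdi(rs, n, b):   return rldicl(rs, b + n, 64 - n)
def clrldi(rs, n):      return rldicl(rs, 0, n)
def clrlsldi(rs, b, n): return rldic(rs, n, b - n)


if __name__ == "__main__":
    print(hex(sldi(0x0123456789ABCDEF, 8)))
```

For example, `clrlsldi(index, 32, 3)` in this model clears the upper 32 bits of `index` and multiplies the remainder by 8, which is the array-index scaling use mentioned above.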
Table 17: Doubleword rotate and shift mnemonics

Operation                             Extended Mnemonic                   Equivalent to
Extract and left justify immediate    extldi ra,rs,n,b (n > 0)            rldicr ra,rs,b,n-1
Extract and right justify immediate   extrdi ra,rs,n,b (n > 0)            rldicl ra,rs,b+n,64-n
Insert from right immediate           insrdi ra,rs,n,b (n > 0)            rldimi ra,rs,64-(b+n),b
Rotate left immediate                 rotldi ra,rs,n                      rldicl ra,rs,n,0
Rotate right immediate                rotrdi ra,rs,n                      rldicl ra,rs,64-n,0
Rotate left                           rotld ra,rs,rb                      rldcl ra,rs,rb,0
Shift left immediate                  sldi ra,rs,n (n < 64)               rldicr ra,rs,n,63-n
Shift right immediate                 srdi ra,rs,n (n < 64)               rldicl ra,rs,64-n,n
Clear left immediate                  clrldi ra,rs,n (n < 64)             rldicl ra,rs,0,n
Clear right immediate                 clrrdi ra,rs,n (n < 64)             rldicr ra,rs,0,63-n
Clear left and shift left immediate   clrlsldi ra,rs,b,n (n <= b < 64)    rldic ra,rs,n,b-n

Examples

1. Extract the sign bit (bit 0) of register Ry and place the result right-justified into register Rx.

   extrdi Rx,Ry,1,0    (equivalent to: rldicl Rx,Ry,1,63)

2. Insert the bit extracted in (1) into the sign bit (bit 0) of register Rz.

   insrdi Rz,Rx,1,0    (equivalent to: rldimi Rz,Rx,63,0)

3. Shift the contents of register Rx left 8 bits.

   sldi Rx,Rx,8        (equivalent to: rldicr Rx,Rx,8,55)

4. Clear the high-order 32 bits of register Ry and place the result into register Rx.

   clrldi Rx,Ry,32     (equivalent to: rldicl Rx,Ry,0,32)

D.7.2 Operations on Words

All these mnemonics can be coded with a final "." to cause the Rc bit to be set in the underlying instruction. The operations as described above apply to the low-order 32 bits of the registers, as if the registers were 32-bit registers. The Insert operations either preserve the high-order 32 bits of the target register or place rotated data there; the other operations clear these bits.
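The treatment of the high-order 32 bits described above can be illustrated with a small Python model of rlwinm and rlwimi on a 64-bit register. This is a simplified sketch, not the architected definition: it only covers masks with mb <= me (wrap-around masks, which are the case in which rotated data can reach the high-order bits, are not modeled), word bits are numbered 0..31 from the left of the low-order word, and the helper names `rotl32` and `mask32` are hypothetical. The extended forms use the equivalences in Table 18.

```python
MASK32 = 0xFFFF_FFFF


def rotl32(x: int, n: int) -> int:
    """Rotate the low-order 32 bits of x left by n bits."""
    x &= MASK32
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def mask32(mb: int, me: int) -> int:
    """Ones in word bits mb..me (0 = leftmost bit of the low word); mb <= me only."""
    assert 0 <= mb <= me <= 31, "wrap-around masks are not modeled in this sketch"
    return (MASK32 >> mb) & (MASK32 << (31 - me)) & MASK32


def rlwinm(rs: int, sh: int, mb: int, me: int) -> int:
    """Rotate and AND with mask: the high-order 32 bits of the result are 0."""
    return rotl32(rs, sh) & mask32(mb, me)


def rlwimi(ra: int, rs: int, sh: int, mb: int, me: int) -> int:
    """Rotate and insert under mask: bits of ra outside the mask are preserved."""
    m = mask32(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m)


# Extended mnemonics, written with the equivalences given in Table 18.
def slwi(rs, n):          return rlwinm(rs, n, 0, 31 - n)
def srwi(rs, n):          return rlwinm(rs, 32 - n, n, 31)
def clrlwi(rs, n):        return rlwinm(rs, 0, n, 31)
def insrwi(ra, rs, n, b): return rlwimi(ra, rs, 32 - (b + n), b, (b + n) - 1)


if __name__ == "__main__":
    print(hex(insrwi(0xAAAAAAAA_00000000, 1, 1, 0)))
```

In this model the shift and clear forms always return a value below 2^32, while `insrwi` leaves the upper half of its target untouched.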
Table 18: Word rotate and shift mnemonics

Operation                              Extended Mnemonic                    Equivalent to
Extract and left justify immediate     extlwi ra,rs,n,b (n > 0)             rlwinm ra,rs,b,0,n-1
Extract and right justify immediate    extrwi ra,rs,n,b (n > 0)             rlwinm ra,rs,b+n,32-n,31
Insert from left immediate             inslwi ra,rs,n,b (n > 0)             rlwimi ra,rs,32-b,b,(b+n)-1
Insert from right immediate            insrwi ra,rs,n,b (n > 0)             rlwimi ra,rs,32-(b+n),b,(b+n)-1
Rotate left immediate                  rotlwi ra,rs,n                       rlwinm ra,rs,n,0,31
Rotate right immediate                 rotrwi ra,rs,n                       rlwinm ra,rs,32-n,0,31
Rotate left                            rotlw ra,rs,rb                       rlwnm ra,rs,rb,0,31
Shift left immediate                   slwi ra,rs,n (n < 32)                rlwinm ra,rs,n,0,31-n
Shift right immediate                  srwi ra,rs,n (n < 32)                rlwinm ra,rs,32-n,n,31
Clear left immediate                   clrlwi ra,rs,n (n < 32)              rlwinm ra,rs,0,n,31
Clear right immediate                  clrrwi ra,rs,n (n < 32)              rlwinm ra,rs,0,0,31-n
Clear left and shift left immediate    clrlslwi ra,rs,b,n (n <= b < 32)     rlwinm ra,rs,n,b-n,31-n

Examples

1. Extract the sign bit (bit 32) of register Ry and place the result right-justified into register Rx.

    extrwi Rx,Ry,1,0    (equivalent to: rlwinm Rx,Ry,1,31,31)

2. Insert the bit extracted in (1) into the sign bit (bit 32) of register Rz.

    insrwi Rz,Rx,1,0    (equivalent to: rlwimi Rz,Rx,31,0,0)

3. Shift the contents of register Rx left 8 bits, clearing the high-order 32 bits.

    slwi Rx,Rx,8        (equivalent to: rlwinm Rx,Rx,8,0,23)

4. Clear the high-order 16 bits of the low-order 32 bits of register Ry and place the result into register Rx, clearing the high-order 32 bits of register Rx.

    clrlwi Rx,Ry,16     (equivalent to: rlwinm Rx,Ry,0,16,31)

D.8 Move To/From Special Purpose Register Mnemonics

The mtspr and mfspr instructions specify a Special Purpose Register (SPR) as a numeric operand. Extended mnemonics are provided that represent the SPR in the mnemonic rather than requiring it to be coded as an operand.
Table 19: Extended mnemonics for moving to/from an SPR

                                        Move To SPR                   Move From SPR
Special Purpose Register                Extended    Equivalent to     Extended    Equivalent to
Fixed-Point Exception Register (XER)    mtxer Rx    mtspr 1,Rx        mfxer Rx    mfspr Rx,1
Link Register (LR)                      mtlr Rx     mtspr 8,Rx        mflr Rx     mfspr Rx,8
Count Register (CTR)                    mtctr Rx    mtspr 9,Rx        mfctr Rx    mfspr Rx,9
PPR                                     mtppr Rx    mtspr 896,Rx      mfppr Rx    mfspr Rx,896

Examples

1. Copy the contents of register Rx to the XER.

    mtxer Rx            (equivalent to: mtspr 1,Rx)

2. Copy the contents of the LR to register Rx.

    mflr Rx             (equivalent to: mfspr Rx,8)

3. Copy the contents of register Rx to the CTR.

    mtctr Rx            (equivalent to: mtspr 9,Rx)

D.9 Miscellaneous Mnemonics

No-op

Many Power ISA instructions can be coded in a way such that, effectively, no operation is performed. An extended mnemonic is provided for the preferred form of no-op. If an implementation performs any type of run-time optimization related to no-ops, the preferred form is the no-op that will trigger this.

    nop                 (equivalent to: ori 0,0,0)

Load Immediate

The addi and addis instructions can be used to load an immediate value into a register. Extended mnemonics are provided to convey the idea that no addition is being performed but merely data movement (from the immediate field of the instruction to a register).

Load a 16-bit signed immediate value into register Rx.

    li Rx,value         (equivalent to: addi Rx,0,value)

Load a 16-bit signed immediate value, shifted left by 16 bits, into register Rx.

    lis Rx,value        (equivalent to: addis Rx,0,value)

Load Address

This mnemonic permits computing the value of a base-displacement operand, using the addi instruction which normally requires separate register and immediate operands.

    la Rx,D(Ry)         (equivalent to: addi Rx,Ry,D)

The la mnemonic is useful for obtaining the address of a variable specified by name, allowing the Assembler to supply the base register number and compute the displacement.
If the variable v is located at offset Dv bytes from the address in register Rv, and the Assembler has been told to use register Rv as a base for references to the data structure containing v, then the following line causes the address of v to be loaded into register Rx.

    la Rx,v             (equivalent to: addi Rx,Rv,Dv)

Move Register

Several Power ISA instructions can be coded in a way such that they simply copy the contents of one register to another. An extended mnemonic is provided to convey the idea that no computation is being performed but merely data movement (from one register to another).

The following instruction copies the contents of register Ry to register Rx. This mnemonic can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.

    mr Rx,Ry            (equivalent to: or Rx,Ry,Ry)

Complement Register

Several Power ISA instructions can be coded in a way such that they complement the contents of one register and place the result into another register. An extended mnemonic is provided that allows this operation to be coded easily.

The following instruction complements the contents of register Ry and places the result into register Rx. This mnemonic can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.

    not Rx,Ry           (equivalent to: nor Rx,Ry,Ry)

Move To/From Condition Register

This mnemonic permits copying the contents of the low-order 32 bits of a GPR to the Condition Register, using the same style as the mfcr instruction.

    mtcr Rx             (equivalent to: mtcrf 0xFF,Rx)

The following instructions may generate either the (old) mtcrf or mfcr instructions or the (new) mtocrf or mfocrf instruction, respectively, depending on the target machine type assembler parameter.

    mtcrf FXM,Rx
    mfcr Rx

All three extended mnemonics in this subsection are being phased out.
In future assemblers the form "mtcr Rx" may not exist, and the mtcrf and mfcr mnemonics may generate the old form instructions (with bit 11 = 0) regardless of the target machine type assembler parameter, or may cease to exist.

Appendix E. Programming Examples

E.1 Multiple-Precision Shifts

This section gives examples of how multiple-precision shifts can be programmed.

A multiple-precision shift is defined to be a shift of an N-doubleword quantity (64-bit mode) or an N-word quantity (32-bit mode), where N>1. The quantity to be shifted is contained in N registers. The shift amount is specified either by an immediate value in the instruction, or by a value in a register.

The examples shown below distinguish between the cases N=2 and N>2. If N=2, the shift amount may be in the range 0 through 127 (64-bit mode) or 0 through 63 (32-bit mode), which are the maximum ranges supported by the Shift instructions used. However if N>2, the shift amount must be in the range 0 through 63 (64-bit mode) or 0 through 31 (32-bit mode), in order for the examples to yield the desired result. The specific instance shown for N>2 is N=3; extending those code sequences to larger N is straightforward, as is reducing them to the case N=2 when the more stringent restriction on shift amount is met. For shifts with immediate shift amounts only the case N=3 is shown, because the more stringent restriction on shift amount is always met.

In the examples it is assumed that GPRs 2 and 3 (and 4) contain the quantity to be shifted, and that the result is to be placed into the same registers, except for the immediate left shifts in 64-bit mode for which the result is placed into GPRs 3, 4, and 5. In all cases, for both input and result, the lowest-numbered register contains the highest-order part of the data and the highest-numbered register contains the lowest-order part. For non-immediate shifts, the shift amount is assumed to be in GPR 6. For immediate shifts, the shift amount is assumed to be greater than 0. GPRs 0 and 31 are used as scratch registers.

For N>2, the number of instructions required is 2N-1 (immediate shifts) or 3N-1 (non-immediate shifts).

In each listing below, the left column is the sequence for 64-bit mode [Category: 64-Bit] and the right column is the sequence for 32-bit mode.

Shift Left Immediate, N = 3
    64-bit mode (shift amount < 64)     32-bit mode (shift amount < 32)
    rldicr  r5,r4,sh,63-sh              rlwinm  r2,r2,sh,0,31-sh
    rldimi  r4,r3,0,sh                  rlwimi  r2,r3,sh,32-sh,31
    rldicl  r4,r4,sh,0                  rlwinm  r3,r3,sh,0,31-sh
    rldimi  r3,r2,0,sh                  rlwimi  r3,r4,sh,32-sh,31
    rldicl  r3,r3,sh,0                  rlwinm  r4,r4,sh,0,31-sh

Shift Left, N = 2
    64-bit mode (shift amount < 128)    32-bit mode (shift amount < 64)
    subfic  r31,r6,64                   subfic  r31,r6,32
    sld     r2,r2,r6                    slw     r2,r2,r6
    srd     r0,r3,r31                   srw     r0,r3,r31
    or      r2,r2,r0                    or      r2,r2,r0
    addi    r31,r6,-64                  addi    r31,r6,-32
    sld     r0,r3,r31                   slw     r0,r3,r31
    or      r2,r2,r0                    or      r2,r2,r0
    sld     r3,r3,r6                    slw     r3,r3,r6

Shift Left, N = 3
    64-bit mode (shift amount < 64)     32-bit mode (shift amount < 32)
    subfic  r31,r6,64                   subfic  r31,r6,32
    sld     r2,r2,r6                    slw     r2,r2,r6
    srd     r0,r3,r31                   srw     r0,r3,r31
    or      r2,r2,r0                    or      r2,r2,r0
    sld     r3,r3,r6                    slw     r3,r3,r6
    srd     r0,r4,r31                   srw     r0,r4,r31
    or      r3,r3,r0                    or      r3,r3,r0
    sld     r4,r4,r6                    slw     r4,r4,r6

Shift Right Immediate, N = 3
    64-bit mode (shift amount < 64)     32-bit mode (shift amount < 32)
    rldimi  r4,r3,0,64-sh               rlwinm  r4,r4,32-sh,sh,31
    rldicl  r4,r4,64-sh,0               rlwimi  r4,r3,32-sh,0,sh-1
    rldimi  r3,r2,0,64-sh               rlwinm  r3,r3,32-sh,sh,31
    rldicl  r3,r3,64-sh,0               rlwimi  r3,r2,32-sh,0,sh-1
    rldicl  r2,r2,64-sh,sh              rlwinm  r2,r2,32-sh,sh,31

Shift Right, N = 2
    64-bit mode (shift amount < 128)    32-bit mode (shift amount < 64)
    subfic  r31,r6,64                   subfic  r31,r6,32
    srd     r3,r3,r6                    srw     r3,r3,r6
    sld     r0,r2,r31                   slw     r0,r2,r31
    or      r3,r3,r0                    or      r3,r3,r0
    addi    r31,r6,-64                  addi    r31,r6,-32
    srd     r0,r2,r31                   srw     r0,r2,r31
    or      r3,r3,r0                    or      r3,r3,r0
    srd     r2,r2,r6                    srw     r2,r2,r6

Shift Right, N = 3
    64-bit mode (shift amount < 64)     32-bit mode (shift amount < 32)
    subfic  r31,r6,64                   subfic  r31,r6,32
    srd     r4,r4,r6                    srw     r4,r4,r6
    sld     r0,r3,r31                   slw     r0,r3,r31
    or      r4,r4,r0                    or      r4,r4,r0
    srd     r3,r3,r6                    srw     r3,r3,r6
    sld     r0,r2,r31                   slw     r0,r2,r31
    or      r3,r3,r0                    or      r3,r3,r0
    srd     r2,r2,r6                    srw     r2,r2,r6

Shift Right Algebraic Immediate, N = 3
    64-bit mode (shift amount < 64)     32-bit mode (shift amount < 32)
    rldimi  r4,r3,0,64-sh               rlwinm  r4,r4,32-sh,sh,31
    rldicl  r4,r4,64-sh,0               rlwimi  r4,r3,32-sh,0,sh-1
    rldimi  r3,r2,0,64-sh               rlwinm  r3,r3,32-sh,sh,31
    rldicl  r3,r3,64-sh,0               rlwimi  r3,r2,32-sh,0,sh-1
    sradi   r2,r2,sh                    srawi   r2,r2,sh

Shift Right Algebraic, N = 2
    64-bit mode (shift amount < 128)    32-bit mode (shift amount < 64)
    subfic  r31,r6,64                   subfic  r31,r6,32
    srd     r3,r3,r6                    srw     r3,r3,r6
    sld     r0,r2,r31                   slw     r0,r2,r31
    or      r3,r3,r0                    or      r3,r3,r0
    addic.  r31,r6,-64                  addic.  r31,r6,-32
    srad    r0,r2,r31                   sraw    r0,r2,r31
    ble     $+8                         ble     $+8
    ori     r3,r0,0                     ori     r3,r0,0
    srad    r2,r2,r6                    sraw    r2,r2,r6

Shift Right Algebraic, N = 3
    64-bit mode (shift amount < 64)     32-bit mode (shift amount < 32)
    subfic  r31,r6,64                   subfic  r31,r6,32
    srd     r4,r4,r6                    srw     r4,r4,r6
    sld     r0,r3,r31                   slw     r0,r3,r31
    or      r4,r4,r0                    or      r4,r4,r0
    srd     r3,r3,r6                    srw     r3,r3,r6
    sld     r0,r2,r31                   slw     r0,r2,r31
    or      r3,r3,r0                    or      r3,r3,r0
    srad    r2,r2,r6                    sraw    r2,r2,r6

E.2 Floating-Point Conversions [Category: Floating-Point]

This section gives examples of how the Floating-Point Conversion instructions can be used to perform various conversions.

Warning: Some of the examples use the fsel instruction. Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section E.3.4, "Notes" on page 336.
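The reason for the warning can be seen in a small C model of fsel. This is a minimal sketch, assuming that fsel FRT,FRA,FRC,FRB selects FRC when FRA is greater than or equal to zero (with -0.0 treated as equal to zero) and selects FRB otherwise, so that a NaN in FRA always selects FRB. It models only the selected value, not any FPSCR effects, and the function name clamp_below is hypothetical, chosen here to mirror the "use 0 if < 0" step that appears in some of the conversion sequences.

```c
#include <math.h>

/* Model of the value produced by fsel FRT,FRA,FRC,FRB.
   A NaN in fra fails the comparison, so frb is selected. */
static double fsel(double fra, double frc, double frb) {
    return fra >= 0.0 ? frc : frb;
}

/* The "use 0 if < 0" step: fsel f2,f1,f1,f0 with f0 holding 0.0.
   Hypothetical helper name, for illustration only. */
static double clamp_below(double f1) {
    return fsel(f1, f1, 0.0);
}
```

With this model, clamp_below(-3.5) gives 0.0 and clamp_below(2.0) gives 2.0, as intended, but clamp_below(NAN) also gives 0.0: the NaN is silently replaced rather than propagated or signalled, which is one way the fsel sequences can differ from a compare-and-branch version.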
E.2.1 Conversion from Floating-Point Number to Floating-Point Integer

The full convert to floating-point integer function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1 and the result is returned in FPR 3.

    mtfsb0   23              #clear VXCVI
    fctid[z] f3,f1           #convert to fx int
    fcfid    f3,f3           #convert back again
    mcrfs    7,5             #VXCVI to CR
    bf       31,$+8          #skip if VXCVI was 0
    fmr      f3,f1           #input was fp int

E.2.2 Conversion from Floating-Point Number to Signed Fixed-Point Integer Doubleword

The full convert to signed fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fctid[z] f2,f1           #convert to dword int
    stfd     f2,disp(r1)     #store float
    ld       r3,disp(r1)     #load dword

E.2.3 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Doubleword

The full convert to unsigned fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the value 0 is in FPR 0, the value 2^64-2048 is in FPR 3, the value 2^63 is in FPR 4 and GPR 4, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fsel     f2,f1,f1,f0     #use 0 if < 0
    fsub     f5,f3,f1        #use max if > max
    fsel     f2,f5,f2,f3
    fsub     f5,f2,f4        #subtract 2^63
    fcmpu    cr2,f2,f4       #use diff if >= 2^63
    fsel     f2,f5,f5,f2
    fctid[z] f2,f2           #convert to fx int
    stfd     f2,disp(r1)     #store float
    ld       r3,disp(r1)     #load dword
    blt      cr2,$+8         #add 2^63 if input
    add      r3,r3,r4        # was >= 2^63

E.2.4 Conversion from Floating-Point Number to Signed Fixed-Point Integer Word

The full convert to signed fixed-point integer word function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fctiw[z] f2,f1           #convert to fx int
    stfd     f2,disp(r1)     #store float
    lwa      r3,disp+4(r1)   #load word algebraic

E.2.5 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Word

The full convert to unsigned fixed-point integer word function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the value 0 is in FPR 0, the value 2^32-1 is in FPR 3, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fsel     f2,f1,f1,f0     #use 0 if < 0
    fsub     f4,f3,f1        #use max if > max
    fsel     f2,f4,f2,f3
    fctid[z] f2,f2           #convert to fx int
    stfd     f2,disp(r1)     #store float
    lwz      r3,disp+4(r1)   #load word and zero

E.2.6 Conversion from Signed Fixed-Point Integer Doubleword to Floating-Point Number

The full convert from signed fixed-point integer doubleword function, using the rounding mode specified by FPSCR[RN], can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the result is returned in FPR 1, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    std      r3,disp(r1)     #store dword
    lfd      f1,disp(r1)     #load float
    fcfid    f1,f1           #convert to fp int

E.2.7 Conversion from Unsigned Fixed-Point Integer Doubleword to Floating-Point Number

The full convert from unsigned fixed-point integer doubleword function, using the rounding mode specified by FPSCR[RN], can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the value 2^32 is in FPR 4, the result is returned in FPR 1, and two doublewords at displacement "disp" from the address in GPR 1 can be used as scratch space.

    rldicl   r2,r3,32,32     #isolate high half
    rldicl   r0,r3,0,32      #isolate low half
    std      r2,disp(r1)     #store dword both
    std      r0,disp+8(r1)
    lfd      f2,disp(r1)     #load float both
    lfd      f1,disp+8(r1)
    fcfid    f2,f2           #convert each half to
    fcfid    f1,f1           # fp int (exact result)
    fmadd    f1,f4,f2,f1     #(2^32)×high + low

An alternative, shorter, sequence can be used if rounding according to FPSCR[RN] is desired and FPSCR[RN] specifies Round toward +Infinity or Round toward -Infinity, or if it is acceptable for the rounded answer to be either of the two representable floating-point integers nearest to the given fixed-point integer. In this case the full convert from unsigned fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the value 2^64 is in FPR 2.

    std      r3,disp(r1)     #store dword
    lfd      f1,disp(r1)     #load float
    fcfid    f1,f1           #convert to fp int
    fadd     f4,f1,f2        #add 2^64
    fsel     f1,f1,f1,f4     # if r3 < 0

E.2.8 Conversion from Signed Fixed-Point Integer Word to Floating-Point Number

The full convert from signed fixed-point integer word function can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the result is returned in FPR 1, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space. (The result is exact.)

    extsw    r3,r3           #extend sign
    std      r3,disp(r1)     #store dword
    lfd      f1,disp(r1)     #load float
    fcfid    f1,f1           #convert to fp int

E.2.9 Conversion from Unsigned Fixed-Point Integer Word to Floating-Point Number

The full convert from unsigned fixed-point integer word function can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the result is returned in FPR 1, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space. (The result is exact.)

    rldicl   r0,r3,0,32      #zero-extend
    std      r0,disp(r1)     #store dword
    lfd      f1,disp(r1)     #load float
    fcfid    f1,f1           #convert to fp int

E.3 Floating-Point Selection [Category: Floating-Point]

This section gives examples of how the Floating Select instruction can be used to implement floating-point minimum and maximum functions, and certain simple forms of if-then-else constructions, without branching.

The examples show program fragments in an imaginary, C-like, high-level programming language, and the corresponding program fragment using fsel and other Power ISA instructions. In the examples, a, b, x, y, and z are floating-point variables, which are assumed to be in FPRs fa, fb, fx, fy, and fz. FPR fs is assumed to be available for scratch space.

Additional examples can be found in Section E.2, "Floating-Point Conversions [Category: Floating-Point]" on page 334.

Warning: Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section E.3.4.

E.3.1 Comparison to Zero

High-level language:           Power ISA:              Notes
if a ≥ 0.0 then x ← y          fsel fx,fa,fy,fz        (1)
else x ← z

if a > 0.0 then x ← y          fneg fs,fa              (1,2)
else x ← z                     fsel fx,fs,fz,fy

if a = 0.0 then x ← y          fsel fx,fa,fy,fz        (1)
else x ← z                     fneg fs,fa
                               fsel fx,fs,fx,fz

E.3.2 Minimum and Maximum

High-level language:           Power ISA:              Notes
x ← min(a,b)                   fsub fs,fa,fb           (3,4,5)
                               fsel fx,fs,fb,fa

x ← max(a,b)                   fsub fs,fa,fb           (3,4,5)
                               fsel fx,fs,fa,fb

E.3.3 Simple if-then-else Constructions

High-level language:           Power ISA:              Notes
if a ≥ b then x ← y            fsub fs,fa,fb           (4,5)
else x ← z                     fsel fx,fs,fy,fz

if a > b then x ← y            fsub fs,fb,fa           (3,4,5)
else x ← z                     fsel fx,fs,fz,fy

if a = b then x ← y            fsub fs,fa,fb           (4,5)
else x ← z                     fsel fx,fs,fy,fz
                               fneg fs,fs
                               fsel fx,fs,fx,fz

E.3.4 Notes

The following Notes apply to the preceding examples and to the corresponding cases using the other three arithmetic relations (<, ≤, and ≠). They should also be considered when any other use of fsel is contemplated.

In these Notes, the "optimized program" is the Power ISA program shown, and the "unoptimized program" (not shown) is the corresponding Power ISA program that uses fcmpu and Branch Conditional instructions instead of fsel.

1. The unoptimized program affects the VXSNAN bit of the FPSCR, and therefore may cause the system error handler to be invoked if the corresponding exception is enabled, while the optimized program does not affect this bit. This property of the optimized program is incompatible with the IEEE standard.

2. The optimized program gives the incorrect result if a is a NaN.

3. The optimized program gives the incorrect result if a and/or b is a NaN (except that it may give the correct result in some cases for the minimum and maximum functions, depending on how those functions are defined to operate on NaNs).

4. The optimized program gives the incorrect result if a and b are infinities of the same sign. (Here it is assumed that Invalid Operation Exceptions are disabled, in which case the result of the subtraction is a NaN. The analysis is more complicated if Invalid Operation Exceptions are enabled, because in that case the target register of the subtraction is unchanged.)

5. The optimized program affects the OX, UX, XX, and VXISI bits of the FPSCR, and therefore may cause the system error handler to be invoked if the corresponding exceptions are enabled, while the unoptimized program does not affect these bits. This property of the optimized program is incompatible with the IEEE standard.

E.4 Vector Unaligned Storage Operations [Category: Vector]

E.4.1 Loading an Unaligned Quadword Using Permute from Big-Endian Storage

The following sequence of instructions copies the unaligned quadword storage operand into VRT.

    # Assumptions:
    # Rb != 0 and contents of Rb = 0xB
    lvx   Vhi,0,Rb           # load MSQ
    lvsl  Vp,0,Rb            # set permute control vector
    addi  Rb,Rb,16           # address of LSQ
    lvx   Vlo,0,Rb           # load LSQ
    vperm Vt,Vhi,Vlo,Vp      # align the data

Book II: Power ISA Virtual Environment Architecture

Chapter 1. Storage Model

1.1 Definitions
1.2 Introduction
1.3 Virtual Storage
1.4 Single-copy Atomicity
1.5 Cache Model
1.6 Storage Control Attributes
1.6.1 Write Through Required
1.6.2 Caching Inhibited
1.6.3 Memory Coherence Required [Category: Memory Coherence]
1.6.4 Guarded
1.6.5 Endianness [Category: Embedded.Little-Endian]
1.6.6 Variable Length Encoded (VLE) Instructions
1.7 Shared Storage
1.7.1 Storage Access Ordering
1.7.2 Storage Ordering of I/O Accesses
1.7.3 Atomic Update
1.7.3.1 Reservations
1.7.3.2 Forward Progress
1.8 Instruction Storage
1.8.1 Concurrent Modification and Execution of Instructions

1.1 Definitions

The following definitions, in addition to those specified in Book I, are used in this Book. In these definitions, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

processor
    A hardware component that executes the instructions specified in a program.

system
    A combination of processors, storage, and associated mechanisms that is capable of executing programs. Sometimes the reference to system includes services provided by the operating system.

main storage
    The level of storage hierarchy in which all storage state is visible to all processors and mechanisms in the system.

instruction storage
    The view of storage as seen by the mechanism that fetches instructions.

data storage
    The view of storage as seen by a Load or Store instruction.

program order
    The execution of instructions in the order required by the sequential execution model. (See the section entitled "Instruction Execution Order" in Book I. A dcbz instruction that modifies storage which contains instructions has the same effect with respect to the sequential execution model as a Store instruction as described there.)

storage location
    A contiguous sequence of one or more bytes in storage. When used in association with a specific instruction or the instruction fetching mechanism, the length of the sequence of one or more bytes is typically implied by the operation. In other uses, it may refer more abstractly to a group of bytes which share common storage attributes.

storage access
    An access to a storage location. There are three (mutually exclusive) kinds of storage access.

    - data access
      An access to the storage location specified by a Load or Store instruction, or, if the access is performed "out-of-order" (see Book III), an access to a storage location as if it were the storage location specified by a Load or Store instruction.

    - instruction fetch
      An access for the purpose of fetching an instruction.

    - implicit access
      An access by the processor for the purpose of address translation or reference and change recording (see Book III-S).

caused by, associated with

    - caused by
      A storage access is said to be caused by an instruction if the instruction is a Load or Store and the access (data access) is to the storage location specified by the instruction.

    - associated with
      A storage access is said to be associated with an instruction if the access is for the purpose of fetching the instruction (instruction fetch), or is a data access caused by the instruction, or is an implicit access that occurs as a side effect of fetching or executing the instruction.

prefetched instructions
    Instructions for which a copy of the instruction has been fetched from instruction storage, but the instruction has not yet been executed.

uniprocessor
    A system that contains one processor.

multiprocessor
    A system that contains two or more processors.

shared storage multiprocessor
    A multiprocessor that contains some common storage, which all the processors in the system can access.

performed
    A load or instruction fetch by a processor or mechanism (P1) is performed with respect to any processor or mechanism (P2) when the value to be returned by the load or instruction fetch can no longer be changed by a store by P2. A store by P1 is performed with respect to P2 when a load by P2 from the location accessed by the store will return the value stored (or a value stored subsequently). An instruction cache block invalidation by P1 is performed with respect to P2 when an instruction fetch by P2 will not be satisfied from the copy of the block that existed in its instruction cache when the instruction causing the invalidation was executed, and similarly for a data cache block invalidation.

    The preceding definitions apply regardless of whether P1 and P2 are the same entity.

page (virtual page)
    2^n contiguous bytes of storage aligned such that the effective address of the first byte in the page is an integral multiple of the page size for which protection and control attributes are independently specifiable and for which reference and change status are independently recorded.

block
    The aligned unit of storage operated on by the Cache Management instructions. The size of an instruction cache block may differ from the size of a data cache block, and both sizes may vary between implementations. The maximum block size is equal to the minimum page size.

aligned storage access
    A load or store is aligned if the address of the target storage location is a multiple of the size of the transfer effected by the instruction.

1.2 Introduction

The Power ISA User Instruction Set Architecture, discussed in Book I, defines storage as a linear array of bytes indexed from 0 to a maximum of 2^64-1. Each byte is identified by its index, called its address, and each byte contains a value. This information is sufficient to allow the programming of applications that require no special features of any particular system environment. The Power ISA Virtual Environment Architecture, described herein, expands this simple storage model to include caches, virtual storage, and shared storage multiprocessors. The Power ISA Virtual Environment Architecture, in conjunction with services based on the Power ISA Operating Environment Architecture (see Book III) and provided by the operating system, permits explicit control of this expanded storage model. A simple model for sequential execution allows at most one storage access to be performed at a time and requires that all storage accesses appear to be performed in program order. In contrast to this simple model, the Power ISA specifies a relaxed model of storage consistency. In a multiprocessor system that allows multiple copies of a storage location, aggressive implementations of the architecture can permit intervals of time during which different copies of a storage location have different values. This chapter describes features of the Power ISA that enable programmers to write correct programs for this storage model.

1.3 Virtual Storage

The Power ISA system implements a virtual storage model for applications. This means that a combination of hardware and software can present a storage model that allows applications to exist within a "virtual" address space larger than either the effective address space or the real address space.

Each program can access 2^64 bytes of "effective address" (EA) space, subject to limitations imposed by the operating system. In a typical Power ISA system, each program's EA space is a subset of a larger "virtual address" (VA) space managed by the operating system.

Each effective address is translated to a real address (i.e., to an address of a byte in real storage or on an I/O device) before being used to access storage. The hardware accomplishes this, using the address translation mechanism described in Book III. The operating system manages the real (physical) storage resources of the system, by setting up the tables and other information used by the hardware address translation mechanism.

In general, real storage may not be large enough to map all the virtual pages used by the currently active applications. With support provided by hardware, the operating system can attempt to use the available real pages to map a sufficient set of virtual pages of the applications. If a sufficient set is maintained, "paging" activity is minimized. If not, performance degradation is likely.

The operating system can support restricted access to virtual pages (including read/write, read only, and no access; see Book III), based on system standards (e.g., program code might be read only) and application requests.

1.4 Single-copy Atomicity

An access is single-copy atomic, or simply atomic, if it is always performed in its entirety with no visible fragmentation. Atomic accesses are thus serialized: each happens in its entirety in some order, even when that order is not specified in the program or enforced between processors.

Vector storage accesses are not guaranteed to be atomic. The following other types of single-register accesses are always atomic:

- byte accesses (all bytes are aligned on byte boundaries)
- halfword accesses aligned on halfword boundaries
- word accesses aligned on word boundaries
- doubleword accesses aligned on doubleword boundaries (64-bit implementations only; see Section 1.2 of Book III-E)

No other accesses are guaranteed to be atomic. For example, the access caused by the following instructions is not guaranteed to be atomic.

- any Load or Store instruction for which the operand is unaligned
- lmw, stmw, lswi, lswx, stswi, stswx
- any Cache Management instruction

An access that is not atomic is performed as a set of smaller disjoint atomic accesses. The number and alignment of these accesses are implementation-dependent, as is the relative order in which they are performed.

The results for several combinations of loads and stores to the same or overlapping locations are described below.

1. When two processors execute atomic stores to locations that do not overlap, and no other stores are performed to those locations, the contents of those locations are the same as if the two stores were performed by a single processor.

2. When two processors execute atomic stores to the same storage location, and no other store is performed to that location, the contents of that location are the result stored by one of the processors.

3. When two processors execute stores that have the same target location and are not guaranteed to be atomic, and no other store is performed to that location, the result is some combination of the bytes stored by both processors.

4. When two processors execute stores to overlapping locations, and no other store is performed to those locations, the result is some combination of the bytes stored by the processors to the overlapping bytes. The portions of the locations that do not overlap contain the bytes stored by the processor storing to the location.

5. When a processor executes an atomic store to a location, a second processor executes an atomic load from that location, and no other store is performed to that location, the value returned by the load is the contents of the location before the store or the contents of the location after the store.

6. When a load and a store with the same target location can be executed simultaneously, and no other store is performed to that location, the value returned by the load is some combination of the contents of the location before the store and the contents of the location after the store.

1.5 Cache Model

A cache model in which there is one cache for instructions and another cache for data is called a "Harvard-style" cache. This is the model assumed by the Power ISA, e.g., in the descriptions of the Cache Management instructions in Section 3.2. Alternative cache models may be implemented (e.g., a "combined cache" model, in which a single cache is used for both instructions and data, or a model in which there are several levels of caches), but they support the programming model implied by a Harvard-style cache.

The processor is not required to maintain copies of storage locations in the instruction cache consistent with modifications to those storage locations (e.g., modifications caused by Store instructions).

A location in the data cache is considered to be modified in that cache if the location has been modified (e.g., by a Store instruction) and the modified data have not been written to main storage.

Cache Management instructions are provided so that programs can manage the caches when needed. For example, program management of the caches is needed when a program generates or modifies code that will be executed (i.e., when the program modifies data in storage and then attempts to execute the modified data as instructions). The Cache Management instructions are also useful in optimizing the use of memory bandwidth in such applications as graphics and numerically intensive computing. The functions performed by these instructions depend on the storage control attributes associated with the specified storage location (see Section 1.6, "Storage Control Attributes").

The Cache Management instructions allow the program to do the following.

- invalidate the copy of storage in an instruction cache block (icbi)
- provide a hint that an instruction will probably soon be accessed from a specified instruction cache block (icbt)

described below. The storage control attributes are the following.

- Write Through Required
- Caching Inhibited
- Memory Coherence Required
- Guarded
- Endianness

These attributes have meaning only when an effective address is translated by the processor performing the storage access.

Additional storage control attributes may be defined for some implementations. See Section 4.8 of Book III-E for additional information.

Programming Note
    The Write Through Required and Caching Inhibited attributes are mutually exclusive because, as described below, the Write Through Required attribute permits the storage location to be in the data cache while the Caching Inhibited attribute does not.

    Storage that is Write Through Required or Caching Inhibited is not intended to be used for general-purpose programming. For example, the lwarx, ldarx, stwcx., and stdcx. instructions may cause the system data storage error handler to be invoked if they specify a location in storage having either of these attributes.
1 provide a hint that the program will probably soon access a specified data cache block (dcbt, dcbtst) In the remainder of this section, "Load instruction" 1 allocate a data cache block and set the con- includes the Cache Management and other instructions tents of that block to zeros, but perform no opera- that are stated in the instruction descriptions to be tion if no write access is allowed to the data cache "treated as a Load", and similarly for "Store instruction". block (dcba) 1 set the contents of a data cache block to zeros (dcbz) 1.6.1 Write Through Required 1 copy the contents of a modified data cache block A store to a Write Through Required storage location is to main storage (dcbst) performed in main storage. A Store instruction that 1 copy the contents of a modified data cache block specifies a location in Write Through Required storage to main storage and make the copy of the block in may cause additional locations in main storage to be the data cache invalid (dcbf or dcbfl) accessed. If a copy of the block containing the speci- fied location is retained in the data cache, the store is 1.6 Storage Control Attributes also performed in the data cache. The store does not cause the block to be considered to be modified in the Some operating systems may provide a means to allow data cache. programs to specify the storage control attributes In general, accesses caused by separate Store instruc- described in this section. Because the support pro- tions that specify locations in Write Through Required vided for these attributes by the operating system may storage may be combined into one access. Such com- vary between systems, the details of the specific sys- bining does not occur if the Store instructions are sepa- tem being used must be known before these attributes rated by a sync, eieio, or mbar instruction. can be used. Storage control attributes are associated with units of 1.6.2 Caching Inhibited storage that are multiples of the page size. 
Each stor- age access is performed according to the storage con- An access to a Caching Inhibited storage location is trol attributes of the specified storage location, as performed in main storage. A Load instruction that specifies a location in Caching Inhibited storage may 344 Power ISATM -- Book II Version 2.04 cause additional locations in main storage to be the storage has the Memory Coherence Required accessed unless the specified location is also Guarded. attribute for all processors that access it. An instruction fetch from Caching Inhibited storage may cause additional words in main storage to be accessed. Programming Note No copy of the accessed locations is placed into the Operating systems that allow programs to request caches. that storage not be Memory Coherence Required In general, non-overlapping accesses caused by sepa- should provide services to assist in managing rate Load instructions that specify locations in Caching memory coherence for such storage, including all Inhibited storage may be combined into one access, as system-dependent aspects thereof. may non-overlapping accesses caused by separate In most systems the default is that all storage is Store instructions that specify locations in Caching Memory Coherence Required. For some applica- Inhibited storage. Such combining does not occur if the tions in some systems, software management of Load or Store instructions are separated by a sync or coherence may yield better performance. In such mbar instruction, or by an eieio instruction if cases, a program can request that a given unit of the storage is also Guarded. storage not be Memory Coherence Required, and can manage the coherence of that storage by using 1.6.3 Memory Coherence the sync instruction, the Cache Management instructions, and services provided by the operat- Required [Category: Memory ing system. Coherence] An access to a Memory Coherence Required storage 1.6.4 Guarded location is performed coherently, as follows. 
A data access to a Guarded storage location is per- Memory coherence refers to the ordering of stores to a formed only if either (a) the access is caused by an single location. Atomic stores to a given location are instruction that is known to be required by the sequen- coherent if they are serialized in some order, and no tial execution model, or (b) the access is a load and the processor or mechanism is able to observe any subset storage location is already in a cache. If the storage is of those stores as occurring in a conflicting order. This also Caching Inhibited, only the storage location speci- serialization order is an abstract sequence of values; fied by the instruction is accessed; otherwise any stor- the physical storage location need not assume each of age location in the cache block containing the specified the values written to it. For example, a processor may storage location may be accessed. update a location several times before the value is writ- ten to physical storage. The result of a store operation For the Server environment, instructions are not is not available to every processor or mechanism at the fetched from virtual storage that is Guarded. If the same instant, and it may be that a processor or mecha- instruction addressed by the current instruction nism observes only some of the values that are written address is in such storage, the system instruction stor- to a location. However, when a location is accessed age error handler may be invoked (see Section 6.5.5 of atomically and coherently by all processors and mech- Book III-S). anisms, the sequence of values loaded from the loca- tion by any processor or mechanism during any interval of time forms a subsequence of the sequence of values that the location logically held during that interval. That is, a processor or mechanism can never load a "newer" value first and then, later, load an "older" value. Memory coherence is managed in blocks called coher- ence blocks. 
Their size is implementation-dependent, but is larger than a word and is usually the size of a cache block. For storage that is not Memory Coherence Required, software must explicitly manage memory coherence to the extent required by program correctness. The oper- ations required to do this may be system-dependent. Because the Memory Coherence Required attribute for a given storage location is of little use unless all proces- sors that access the location do so coherently, in state- ments about Memory Coherence Required storage elsewhere in this document it is generally assumed that Chapter 1. Storage Model 345 Version 2.04 Programming Note In some implementations, instructions may be exe- cuted before they are known to be required by the sequential execution model. Because the results of instructions executed in this manner are dis- carded if it is later determined that those instruc- tions would not have been executed in the sequential execution model, this behavior does not affect most programs. This behavior does affect programs that access storage locations that are not "well-behaved" (e.g., a storage location that represents a control register on an I/O device that, when accessed, causes the device to perform an operation). To avoid unin- tended results, programs that access such storage locations should request that the storage be Guarded, and should prevent such storage loca- tions from being in a cache (e.g., by requesting that the storage also be Caching Inhibited). 1.6.5 Endianness [Category: Embedded.Little-Endian] The Endianness storage control attribute specifies the byte ordering (Big-Endian or Little-Endian) that is used when the storage location is accessed; see Section 1.10 of Book I. 1.6.6 Variable Length Encoded (VLE) Instructions VLE storage is used to store VLE instructions. Instruc- tions fetched from VLE storage are processed as VLE instructions. VLE storage must also be Big-Endian. 
Instructions fetched from VLE storage that is Little- Endian cause a Byte-ordering exception, and the sys- tem instruction storage error handler will be invoked. The VLE attribute has no effect on data accesses. See Chapter 1 of Book VLE. 346 Power ISATM -- Book II Version 2.04 1.7 Shared Storage accesses pairwise, as follows. Let A be a set of storage accesses that includes all storage This architecture supports the sharing of storage accesses associated with instructions preceding between programs, between different instances of the the barrier-creating instruction, and let B be a set same program, and between processors and other of storage accesses that includes all storage mechanisms. It also supports access to a storage loca- accesses associated with instructions following the tion by one or more programs using different effective barrier-creating instruction. For each applicable addresses. All these cases are considered storage pair ai,bj of storage accesses such that ai is in A sharing. Storage is shared in blocks that are an inte- and bj is in B, the memory barrier ensures that ai gral number of pages. will be performed with respect to any processor or mechanism, to the extent required by the associ- When the same storage location has different effective ated Memory Coherence Required attributes, addresses, the addresses are said to be aliases. Each before bj is performed with respect to that proces- application can be granted separate access privileges sor or mechanism. to aliased pages. The ordering done by a memory barrier is said to be "cumulative" if it also orders storage accesses 1.7.1 Storage Access Ordering that are performed by processors and mechanisms other than P1, as follows. The storage model for the ordering of storage accesses is weakly consistent. 
This model provides an opportu- - A includes all applicable storage accesses by nity for improved performance over a model that has any such processor or mechanism that have stronger consistency rules, but places the responsibility been performed with respect to P1 before the on the program to ensure that ordering or synchroniza- memory barrier is created. tion instructions are properly placed when storage is - B includes all applicable storage accesses by shared by two or more programs. any such processor or mechanism that are The order in which the processor performs storage performed after a Load instruction executed accesses, the order in which those accesses are per- by that processor or mechanism has returned formed with respect to another processor or mecha- the value stored by a store that is in B. nism, and the order in which those accesses are No ordering should be assumed among the storage performed in main storage may all be different. Several accesses caused by a single instruction (i.e, by an means of enforcing an ordering of storage accesses instruction for which the access is not atomic), and no are provided to allow programs to share storage with means are provided for controlling that order. other programs, or with mechanisms such as I/O devices. These means are listed below. The phrase "to the extent required by the associated Memory Coherence Required attributes" refers to the Memory Coherence Required attribute, if any, associated with each access. 1 If two Store instructions specify storage locations that are both Caching Inhibited and Guarded, the corresponding storage accesses are performed in program order with respect to any processor or mechanism. 
1 If a Load instruction depends on the value returned by a preceding Load instruction (because the value is used to compute the effective address specified by the second Load), the corresponding storage accesses are performed in program order with respect to any processor or mechanism to the extent required by the associated Memory Coher- ence Required attributes. This applies even if the dependency has no effect on program logic (e.g., the value returned by the first Load is ANDed with zero and then added to the effective address spec- ified by the second Load). 1 When a processor (P1) executes a Synchronize, eieio, or mbar instruction a memory bar- rier is created, which orders applicable storage Chapter 1. Storage Model 347 Version 2.04 Programming Note Because stores cannot be performed "out-of-order" not order the Store Conditional's store with respect (see Book III), if a Store instruction depends on the to storage accesses caused by instructions that value returned by a preceding Load instruction follow the Branch. (because the value returned by the Load is used to 1 Because processors may predict branch target compute either the effective address specified by the addresses and branch condition resolution, control Store or the value to be stored), the corresponding stor- dependencies (e.g., branches) do not order stor- age accesses are performed in program order. The age accesses except as described above. For same applies if whether the Store instruction is exe- example, when a subroutine returns to its caller the cuted depends on a conditional Branch instruction that return address may be predicted, with the result in turn depends on the value returned by a preceding that loads caused by instructions at or after the Load instruction. return address may be performed before the load Because an isync instruction prevents the execution of that obtains the return address is performed. 
instructions following the isync until instructions pre- Because processors may implement nonarchitected ceding the isync have completed, if an isync follows a duplicates of architected resources (e.g., GPRs, CR conditional Branch instruction that depends on the fields, and the Link Register), resource dependencies value returned by a preceding Load instruction, the (e.g., specification of the same target register for two load on which the Branch depends is performed before Load instructions) do not order storage accesses. any loads caused by instructions following the isync. This applies even if the effects of the "dependency" are Examples of correct uses of dependencies, sync, independent of the value loaded (e.g., the value is lwsync, eieio, and mbar to order storage compared to itself and the Branch tests the EQ bit in accesses can be found in Appendix B. "Programming the selected CR field), and even if the branch target is Examples for Sharing Storage" on page 385. the sequentially next instruction. Because the storage model is weakly consistent, the With the exception of the cases described above and sequential execution model as applied to instructions earlier in this section, data dependencies and control that cause storage accesses guarantees only that dependencies do not order storage accesses. Exam- those accesses appear to be performed in program ples include the following. order with respect to the processor executing the instructions. For example, an instruction may com- 1 If a Load instruction specifies the same storage plete, and subsequent instructions may be executed, location as a preceding Store instruction and the before storage accesses caused by the first instruction location is in storage that is not Caching Inhibited, have been performed. 
However, for a sequence of the load may be satisfied from a "store queue" (a atomic accesses to the same storage location, if the buffer into which the processor places stored val- location is in storage that is Memory Coherence ues before presenting them to the storage sub- Required the definition of coherence guarantees that system), and not be visible to other processors and the accesses are performed in program order with mechanisms. A consequence is that if a subse- respect to any processor or mechanism that accesses quent Store depends on the value returned by the the location coherently, and similarly if the location is in Load, the two stores need not be performed in pro- storage that is Caching Inhibited. gram order with respect to other processors and mechanisms. Because accesses to storage that is Caching Inhibited 1 Because a Store Conditional instruction may com- are performed in main storage, memory barriers and plete before its store has been performed, a condi- dependencies on Load instructions order such tional Branch instruction that depends on the CR0 accesses with respect to any processor or mechanism value set by a Store Conditional instruction does even if the storage is not Memory Coherence Required. 348 Power ISATM -- Book II Version 2.04 the doubleword forms ldarx and stdcx. is the same Programming Note except for obvious substitutions. The first example below illustrates cumulative ordering of storage accesses preceding a memory The lwarx instruction is a load from a word-aligned barrier, and the second illustrates cumulative order- location that has two side effects. Both of these side ing of storage accesses following a memory barrier. effects occur at the same time that the load is per- Assume that locations X, Y, and Z initially contain formed. the value 0. 1. A reservation for a subsequent stwcx. instruction is created. Example 1: 2. 
The memory coherence mechanism is notified that Processor A: a reservation exists for the storage location speci- stores the value 1 to location X fied by the lwarx. Processor B: The stwcx. instruction is a store to a word-aligned loca- loads from location X obtaining the value tion that is conditioned on the existence of the reserva- 1, executes a sync instruction, then tion created by the lwarx and on whether the same stores the value 2 to location Y storage location is specified by both instructions. To Processor C: emulate an atomic operation with these instructions, it loads from location Y obtaining the value is necessary that both the lwarx and the stwcx. spec- 2, executes a sync instruction, then loads ify the same storage location. from location X A stwcx. performs a store to the target storage location Example 2: only if the storage location specified by the lwarx that established the reservation has not been stored into by Processor A: another processor or mechanism since the reservation stores the value 1 to location X, executes was created. If the storage locations specified by the a sync instruction, then stores the value 2 two instructions differ, the store is not necessarily per- to location Y formed. Processor B: A stwcx. that performs its store is said to "succeed". loops loading from location Y until the value 2 is obtained, then stores the value Examples of the use of lwarx and stwcx. are given in 3 to location Z Appendix B. "Programming Examples for Sharing Stor- age" on page 385. Processor C: loads from location Z obtaining the value A successful stwcx. to a given location may complete 3, executes a sync instruction, then loads before its store has been performed with respect to from location X other processors and mechanisms. As a result, a sub- sequent load or lwarx from the given location by In both cases, cumulative ordering dictates that the another processor may return a "stale" value. 
However, value loaded from location X by processor C is 1. a subsequent lwarx from the given location by the other processor followed by a successful stwcx. by that processor is guaranteed to have returned the value 1.7.2 Storage Ordering of I/O stored by the first processor's stwcx. (in the absence of other stores to the given location). Accesses A "coherence domain" consists of all processors and all Programming Note interfaces to main storage. Memory reads and writes The store caused by a successful stwcx. is initiated by mechanisms outside the coherence domain ordered, by a dependence on the reservation, with are performed within the coherence domain in the respect to the load caused by the lwarx that estab- order in which they enter the coherence domain and lished the reservation, such that the two storage are performed as coherent accesses. accesses are performed in program order with respect to any processor or mechanism. 1.7.3 Atomic Update The Load And Reserve and Store Conditional instruc- 1.7.3.1 Reservations tions together permit atomic update of a shared storage The ability to emulate an atomic operation using lwarx location. There are word and doubleword forms of and stwcx. is based on the conditional behavior of each of these instructions. Described here is the oper- stwcx., the reservation created by lwarx, and the ation of the word forms lwarx and stwcx.; operation of clearing of that reservation if the target location is mod- Chapter 1. Storage Model 349 Version 2.04 ified by another processor or mechanism before the Programming Note stwcx. performs its store. One use of lwarx and stwcx. is to emulate a "Com- A reservation is held on an aligned unit of real storage pare and Swap" primitive like that provided by the called a reservation granule. 
The size of the reservation IBM System/370 Compare and Swap instruction; granule is 2n bytes, where n is implementation-depen- see Section B.1, "Atomic Update Primitives" on dent but is always at least 4 (thus the minimum reserva- page 385. A System/370-style Compare and Swap tion granule size is a quadword). The reservation checks only that the old and current values of the granule associated with effective address EA contains word being tested are equal, with the result that the real address to which EA maps. ("real_addr(EA)" in programs that use such a Compare and Swap to the RTL for the Load And Reserve and Store Condi- control a shared resource can err if the word has tional instructions stands for "real address to which EA been modified and the old value subsequently maps".) restored. The combination of lwarx and stwcx. A processor has at most one reservation at any time. A improves on such a Compare and Swap, because reservation is established by executing a lwarx or ldarx the reservation reliably binds the lwarx and stwcx. instruction, and is lost (or may be lost, in the case of the together. The reservation is always lost if the word third, fifth, sixth and seventh item) if any of the following is modified by another processor or mechanism occur. between the lwarx and stwcx., so the stwcx. never succeeds unless the word has not been 1. The processor holding the reservation executes stored into (by another processor or mechanism) another lwarx or ldarx: this clears the first reserva- since the lwarx. tion and establishes a new one. 2. The processor holding the reservation executes Programming Note any stwcx. or stdcx., regardless of whether the In general, programming conventions must ensure specified address matches the address specified that lwarx and stwcx. specify addresses that by the lwarx or ldarx that established the reserva- match; a stwcx. should be paired with a specific tion. lwarx to the same storage location. Situations in 3. 
The processor holding the reservation executes a which a stwcx. may erroneously be issued after dcbf or dcbfl to the reservation granule: some lwarx other than that with which it is intended whether the reservation is lost is undefined. to be paired must be scrupulously avoided. For example, there must not be a context switch in 4. Some other processor executes a Store or dcbz to which the processor holds a reservation in behalf of the same reservation granule. the old context, and the new context resumes after 5. Some other processor executes a dcbtst, dcbst, a lwarx and before the paired stwcx.. The stwcx. dcbf (but not dcbfl) to the same reservation in the new context might succeed, which is not granule: whether the reservation is lost is unde- what was intended by the programmer. Such a situ- fined. ation must be prevented by executing a stwcx. or stdcx. that specifies a dummy writable aligned 6. Some other processor executes a dcba to the location as part of the context switch; see same reservation granule: the reservation is lost if Section 6.4.3 of Book III-S and Section 5.5 of Book the instruction causes the target block to be newly III-E. established in a data cache or to be modified; oth- erwise whether the reservation is lost is undefined. 7. Any processor modifies a Reference or Change bit (see Book III-S) in the same reservation granule: whether the reservation is lost is undefined. 8. Some mechanism other than a processor modifies a storage location in the same reservation granule. For the Server environment, interrupts (see Book III-S) do not clear reservations (however, system software invoked by interrupts may clear reservations); for the Embedded environment, interrupts do not necessarily clear reservations (see Book III-E). 350 Power ISATM -- Book II Version 2.04 specify the possible causes of reservation loss in Case Programming Note 3. 
Because the reservation is lost if another processor stores anywhere in the reservation granule, lock words (or doublewords) should be allocated such that few such stores occur, other than perhaps to the lock word itself. (Stores by other processors to the lock word result from contention for the lock, and are an expected consequence of using locks to control access to shared storage; stores to other locations in the reservation granule can cause needless reservation loss.) Such allocation can most easily be accomplished by allocating an entire reservation granule for the lock and wasting all but one word. Because reservation granule size is implementation-dependent, portable code must do such allocation dynamically.

Similar considerations apply to other data that are shared directly using lwarx and stwcx. (e.g., pointers in certain linked lists; see Section B.3, "List Insertion" on page 389).

1.7.3.2 Forward Progress

Forward progress in loops that use lwarx and stwcx. is achieved by a cooperative effort among hardware, system software, and application software.

The architecture guarantees that when a processor executes a lwarx to obtain a reservation for location X and then a stwcx. to store a value to location X, either

1. the stwcx. succeeds and the value is written to location X, or
2. the stwcx. fails because some other processor or mechanism modified location X, or
3. the stwcx. fails because the processor's reservation was lost for some other reason.

In Cases 1 and 2, the system as a whole makes progress in the sense that some processor successfully modifies location X. Case 3 covers reservation loss required for correct operation of the rest of the system. This includes cancellation caused by some other processor writing elsewhere in the reservation granule for X, as well as cancellation caused by the operating system in managing certain limited resources such as real storage. It may also include implementation-dependent causes of reservation loss.

An implementation may make a forward progress guarantee, defining the conditions under which the system as a whole makes progress. Such a guarantee must specify the possible causes of reservation loss in Case 3. While the architecture alone cannot provide such a guarantee, the characteristics listed in Cases 1 and 2 are necessary conditions for any forward progress guarantee. An implementation and operating system can build on them to provide such a guarantee.

Programming Note
The architecture does not include a "fairness guarantee". In competing for a reservation, two processors can indefinitely lock out a third.

1.8 Instruction Storage

The instruction execution properties and requirements described in this section, including its subsections, apply only to instruction execution that is required by the sequential execution model.

In this section, including its subsections, it is assumed that all instructions for which execution is attempted are in storage that is not Caching Inhibited and (unless instruction address translation is disabled; see Book III) is not Guarded, and from which instruction fetching does not cause the system error handler to be invoked (e.g., from which instruction fetching is not prohibited by the "address translation mechanism" or the "storage protection mechanism"; see Book III).

Programming Note
The results of attempting to execute instructions from storage that does not satisfy this assumption are described in Section 1.6.2 and Section 1.6.4 of this Book and in Book III.

For each instance of executing an instruction from location X, the instruction may be fetched multiple times.

The instruction cache is not necessarily kept consistent with the data cache or with main storage. It is the responsibility of software to ensure that instruction storage is consistent with data storage when such consistency is required for program correctness.

After one or more bytes of a storage location have been modified and before an instruction located in that storage location is executed, software must execute the appropriate sequence of instructions to make instruction storage consistent with data storage. Otherwise the result of attempting to execute the instruction is boundedly undefined except as described in Section 1.8.1, "Concurrent Modification and Execution of Instructions" on page 353.

Programming Note
Following are examples of how to make instruction storage consistent with data storage. Because the optimal instruction sequence to make instruction storage consistent with data storage may vary between systems, many operating systems will provide a system service to perform this function.

Case 1: The given program does not modify instructions executed by another program nor does another program modify the instructions executed by the given program.

Assume that location X previously contained the instruction A0; the program modified one or more bytes of that location such that, in data storage, the location contains the instruction A1; and location X is wholly contained in a single cache block. The following instruction sequence will make instruction storage consistent with data storage such that if the isync was in location X-4, the instruction A1 in location X would be executed immediately after the isync.

    dcbst X        #copy the block to main storage
    sync           #order copy before invalidation
    icbi  X        #invalidate copy in instr cache
    isync          #discard prefetched instructions

Case 2: One or more programs execute the instructions that are concurrently being modified by another program.

Assume program A has modified the instruction at location X and other programs are waiting for program A to signal that the new instruction is ready to execute. The following instruction sequence will make instruction storage consistent with data storage and then set a flag to indicate to the waiting programs that the new instruction can be executed.

    li    r0,1     #put a 1 value in r0
    dcbst X        #copy the block to main storage
    sync           #order copy before invalidation
    icbi  X        #invalidate copy in instr cache
    sync           #order invalidation before store
                   # to flag
    stw   r0,flag  #set flag indicating instruction
                   # storage is now consistent

The following instruction sequence, executed by the waiting programs, will prevent them from executing the instruction at location X until location X in instruction storage is consistent with data storage, and then will cause any prefetched instructions to be discarded.

    lwz   r0,flag  #loop until flag = 1 (when 1 is
    cmpwi r0,1     # loaded, location X in inst'n
    bne   $-8      # storage is consistent with
                   # location X in data storage)
    isync          #discard any prefetched inst'ns

In the preceding instruction sequence any context synchronizing instruction (e.g., rfid) can be used instead of isync. (For Case 1 only isync can be used.)

For both cases, if two or more instructions in separate data cache blocks have been modified, the dcbst instruction in the examples must be replaced by a sequence of dcbst instructions such that each block containing the modified instructions is copied back to main storage. Similarly, for icbi the sequence must invalidate each instruction cache block containing a location of an instruction that was modified. The sync instruction that appears above between "dcbst X" and "icbi X" would be placed between the sequence of dcbst instructions and the sequence of icbi instructions.
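The multi-block generalization of Case 1 can be sketched in C as follows. This is only an illustration, not a sequence taken from the architecture: the helper names (`block_base`, `blocks_spanned`, `make_icache_consistent`) are hypothetical, the cache block sizes are assumed to be supplied by the caller (for example, from an operating system service), and the inline assembly is emitted only when the compiler targets a Power processor; on other targets the helper is a no-op so that the block arithmetic can still be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__powerpc64__)
#define DCBST(p) __asm__ __volatile__("dcbst 0,%0" : : "r"(p) : "memory")
#define ICBI(p)  __asm__ __volatile__("icbi 0,%0" : : "r"(p) : "memory")
#define SYNC()   __asm__ __volatile__("sync" : : : "memory")
#define ISYNC()  __asm__ __volatile__("isync" : : : "memory")
#else   /* assumption: on other targets the cache operations are stubbed out */
#define DCBST(p) ((void)(p))
#define ICBI(p)  ((void)(p))
#define SYNC()   ((void)0)
#define ISYNC()  ((void)0)
#endif

/* Round addr down to the start of its cache block (block is a power of two). */
static uintptr_t block_base(uintptr_t addr, size_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    return addr & ~((uintptr_t)block - 1);
}

/* Number of cache blocks touched by the byte range [addr, addr + len). */
static size_t blocks_spanned(uintptr_t addr, size_t len, size_t block)
{
    if (len == 0)
        return 0;
    return (size_t)((block_base(addr + len - 1, block) -
                     block_base(addr, block)) / block) + 1;
}

/* Hypothetical helper: Case 1 applied to every block of a modified range. */
static void make_icache_consistent(const void *start, size_t len,
                                   size_t dblock, size_t iblock)
{
    uintptr_t a = (uintptr_t)start;
    uintptr_t p;
    size_t i, n;

    /* One dcbst per data cache block containing modified instructions. */
    n = blocks_spanned(a, len, dblock);
    for (i = 0, p = block_base(a, dblock); i < n; i++, p += dblock)
        DCBST(p);
    SYNC();                     /* order the copies before the invalidations */

    /* One icbi per instruction cache block containing a modified location. */
    n = blocks_spanned(a, len, iblock);
    for (i = 0, p = block_base(a, iblock); i < n; i++, p += iblock)
        ICBI(p);
    ISYNC();                    /* discard prefetched instructions */
}
```

In practice a compiler or operating system facility is usually preferable to hand-written sequences; GCC and Clang, for instance, provide `__builtin___clear_cache(begin, end)` for this purpose.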
1.8.1 Concurrent Modification and Execution of Instructions

The phrase "concurrent modification and execution of instructions" (CMODX) refers to the case in which a processor fetches and executes an instruction from instruction storage which is not consistent with data storage or which becomes inconsistent with data storage prior to the completion of its processing. This section describes the only case in which executing this instruction under these conditions produces defined results.

In the remainder of this section the following terminology is used.

• Location X is an arbitrary word-aligned storage location.
• X₀ is the value of the contents of location X for which software has made the location X in instruction storage consistent with data storage.
• X₁, X₂, ..., Xₙ are the sequence of the first n values occupying location X after X₀.
• Xₙ is the first value of X subsequent to X₀ for which software has again made instruction storage consistent with data storage.
• The "patch class" of instructions consists of the I-form Branch instruction (b[l][a]) and the preferred no-op instruction (ori 0,0,0).

If the instruction from location X is executed after the copy of location X in instruction storage is made consistent for the value X₀ and before it is made consistent for the value Xₙ, the results of executing the instruction are defined if and only if the following conditions are satisfied.

1. The stores that place the values X₁, ..., Xₙ into location X are atomic stores that modify all four bytes of location X.
2. Each Xᵢ, 0 ≤ i ≤ n, is a patch class instruction.
3. Location X is in storage that is Memory Coherence Required.

If these conditions are satisfied, the result of each execution of an instruction from location X will be the execution of some Xᵢ, 0 ≤ i ≤ n. The value of the ordinate i associated with each value executed may be different and the sequence of ordinates i associated with a sequence of values executed is not constrained (e.g., a valid sequence of executions of the instruction at location X could be the sequence Xᵢ, Xᵢ₊₂, then Xᵢ₋₁). If these conditions are not satisfied, the results of each such execution of an instruction from location X are boundedly undefined, and may include causing inconsistent information to be presented to the system error handler.

Programming Note
An example of how failure to satisfy the requirements given above can cause inconsistent information to be presented to the system error handler is as follows. If the value X₀ (an illegal instruction) is executed, causing the system illegal instruction handler to be invoked, and before the error handler can load X₀ into a register, X₀ is replaced with X₁, an Add Immediate instruction, it will appear that a legal instruction caused an illegal instruction exception.

Programming Note
It is possible to apply a patch or to instrument a given program without the need to suspend or halt the program. This can be accomplished by modifying the example shown in the Programming Note at the end of Section 1.8 where one program is creating instructions to be executed by one or more other programs.

In place of the Store to a flag to indicate to the other programs that the code is ready to be executed, the program that is applying the patch would replace a patch class instruction in the original program with a Branch instruction that would cause any program executing the Branch to branch to the newly created code. The first instruction in the newly created code must be an isync, which will cause any prefetched instructions to be discarded, ensuring that the execution is consistent with the newly created code. The instruction storage location containing the isync instruction in the patch area must be consistent with data storage with respect to the processor that will execute the patched code before the Store which stores the new Branch instruction is performed.

Programming Note
It is believed that all processors that comply with versions of the architecture that precede Version 2.01 support concurrent modification and execution of instructions as described in this section if the requirements given above are satisfied, and that most such processors yield boundedly undefined results if the requirements given above are not satisfied. However, in general such support has not been verified by processor testing. Also, one such processor is known to yield undefined results in certain cases if the requirements given above are not satisfied.

Chapter 2. Effect of Operand Placement on Performance

    2.1 Instruction Restart

The placement (location and alignment) of operands in storage affects relative performance of storage accesses, and may affect it significantly. The best performance is guaranteed if storage operands are aligned. In order to obtain the best performance across the widest range of implementations, the programmer should assume the performance model described in Figure 1 with respect to the placement of storage operands for the Embedded environment. For the Server environment, Figure 1 applies for Big-Endian byte ordering, and Figure 2 applies for Little-Endian byte ordering. Performance of storage accesses varies depending on the following:

1. Operand Size
2. Operand Alignment
3. Crossing no boundary
4. Crossing a cache block boundary
5. Crossing a virtual page boundary

The Move Assist instructions have no alignment requirements.

    Operand                    Boundary Crossing
    Size         Byte Align.   None       Cache Block   Virtual Page²
    Integer
    8 Byte       8             optimal    -             -
                 4             good       good          good
                 <4            good       good          good
    4 Byte       4             optimal    -             -
                 <4            good       good          good
    2 Byte       2             optimal    -             -
                 <2            good       good          good
    1 Byte       1             optimal    -             -
    lmw, stmw    4             good       good          good
                 <4            poor       poor          poor
    string                     good       good          good
    Float
    8 Byte       8             optimal    -             -
                 4             good       good          poor
                 <4            poor       poor          poor
    4 Byte       4             optimal    -             -
                 <4            poor       poor          poor
    Vector
    any          any           optimal³   -             -

    1 If an instruction causes an access that is not atomic and any portion of the operand is in storage that is Write Through Required or Caching Inhibited, performance is likely to be poor.
    2 If the storage operand spans two virtual pages that have different storage control attributes or, in the Server environment, spans two segments, performance is likely to be poor.
    3 The storage operands for Vector instructions are all assumed to be aligned (see Section 5.4 of Book I).

Figure 1. Performance effects of storage operand placement

    Operand                    Boundary Crossing
    Size         Byte Align.   None       Cache Block   Virtual Page²
    Integer
    8 Byte       8             optimal    -             -
                 4             poor       poor          poor
                 <4            poor       poor          poor
    4 Byte       4             optimal    -             -
                 <4            poor       poor          poor
    2 Byte       2             optimal    -             -
                 <2            poor       poor          poor
    1 Byte       1             optimal    -             -
    Float
    8 Byte       8             optimal    -             -
                 4             poor       poor          poor
                 <4            poor       poor          poor
    4 Byte       4             optimal    -             -
                 <4            poor       poor          poor
    Vector
    any          any           optimal³   -             -

    1 If an instruction causes an access that is not atomic and any portion of the operand is in storage that is Write Through Required or Caching Inhibited, performance is likely to be poor.
    2 If the storage operand spans two virtual pages that have different storage control attributes or, in the Server environment, spans two segments, performance is likely to be poor.
    3 The storage operands for Vector instructions are all assumed to be aligned (see Section 5.4 of Book I).

Figure 2. [Category: Server] Performance effects of storage operand placement, Little-Endian

2.1 Instruction Restart

In this section, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

The following instructions are never restarted after having accessed any portion of the storage operand (unless the instruction causes a "Data Address Breakpoint match", for which the corresponding rules are given in Book III).

1. A Store instruction that causes an atomic access and, for the Embedded environment, accesses storage that is Guarded
2. A Load instruction that causes an atomic access to storage that is Guarded and, for the Server environment, is also Caching Inhibited

Any other Load or Store instruction may be partially executed and then aborted after having accessed a portion of the storage operand, and then re-executed (i.e., restarted, by the processor or the operating system). If an instruction is partially executed, the contents of registers are preserved to the extent that the correct result will be produced when the instruction is re-executed. Additional restrictions on the partial execution of instructions are described in Section 6.6 of Book III-S and Section 5.7 of Book III-E.

Programming Note
In order to ensure that the contents of registers are preserved to the extent that a partially executed instruction can be re-executed correctly, the registers that are preserved must satisfy the following conditions. For any given instruction, zero or more of the conditions applies.

• For a fixed-point Load instruction that is not a multiple or string form, or for an eciwx instruction, if RT=RA or RT=RB then the contents of register RT are not altered.
• For an update form Load or Store instruction, the contents of register RA are not altered.

Programming Note
There are many events that might cause a Load or Store instruction to be restarted. For example, a hardware error may cause execution of the instruction to be aborted after part of the access has been performed, and the recovery operation could then cause the aborted instruction to be re-executed.

When an instruction is aborted after being partially executed, the contents of the instruction pointer indicate that the instruction has not been executed, however, the contents of some registers may have been altered and some bytes within the storage operand may have been accessed. The following are examples of an instruction being partially executed and altering the program state even though it appears that the instruction has not been executed.

1. Load Multiple, Load String: Some registers in the range of registers to be loaded may have been altered.
2. Any Store instruction, dcbz: Some bytes of the storage operand may have been altered.
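The placement properties that Figure 1 and Figure 2 key on (operand alignment and the kind of boundary crossed) can be computed from an effective address as in the following sketch. The type and function names are hypothetical, the cache block and virtual page sizes are assumed to be powers of two supplied by the caller, and a page crossing is reported in preference to the cache block crossing that it necessarily implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum crossing { CROSS_NONE, CROSS_CACHE_BLOCK, CROSS_VIRTUAL_PAGE };

struct placement {
    int aligned;             /* nonzero if ea is a multiple of the operand size */
    enum crossing crossing;  /* most significant boundary the operand crosses */
};

/* Classify an operand of `size` bytes at effective address `ea`. */
static struct placement classify_operand(uint64_t ea, size_t size,
                                         size_t block, size_t page)
{
    struct placement p;
    uint64_t last;

    assert(size != 0);
    assert(block != 0 && (block & (block - 1)) == 0);
    assert(page != 0 && (page & (page - 1)) == 0);

    last = ea + size - 1;                    /* address of the final byte */
    p.aligned = (ea % size) == 0;

    if ((ea / page) != (last / page))
        p.crossing = CROSS_VIRTUAL_PAGE;     /* first and last byte on different pages */
    else if ((ea / block) != (last / block))
        p.crossing = CROSS_CACHE_BLOCK;      /* same page, different cache blocks */
    else
        p.crossing = CROSS_NONE;
    return p;
}
```

For an operand whose size is a power of two no larger than the cache block, an aligned placement always yields `CROSS_NONE`, which corresponds to the "optimal" rows whose boundary-crossing columns are marked "-".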
Chapter 3. Storage Control Instructions

    3.1 Parameters Useful to Application Programs
    3.2 Cache Management Instructions
        3.2.1 Instruction Cache Instructions
        3.2.2 Data Cache Instructions
            3.2.2.1 Obsolete Data Cache Instructions [Category: Vector.Phased-Out]
    3.3 Synchronization Instructions
        3.3.1 Instruction Synchronize Instruction
        3.3.2 Load and Reserve and Store Conditional Instructions
            3.3.2.1 64-Bit Load and Reserve and Store Conditional Instructions [Category: 64-Bit]
        3.3.3 Memory Barrier Instructions
        3.3.4 Wait Instruction

3.1 Parameters Useful to Application Programs

It is suggested that the operating system provide a service that allows an application program to obtain the following information.

1. The virtual page sizes
2. Coherence block size
3. Granule sizes for reservations
4. An indication of the cache model implemented (e.g., Harvard-style cache, combined cache)
5. Instruction cache size
6. Data cache size
7. Instruction cache block size
8. Data cache block size
9. Instruction cache associativity
10. Data cache associativity
11. Number of stream IDs supported for the stream variant of dcbt
12. Factors for converting the Time Base to seconds

If the caches are combined, the same value should be given for an instruction cache attribute and the corresponding data cache attribute.

3.2 Cache Management Instructions

The Cache Management instructions obey the sequential execution model except as described in Section 3.2.1.

In the instruction descriptions the statements "this instruction is treated as a Load" and "this instruction is treated as a Store" mean that the instruction is treated as a Load (Store) from (to) the addressed byte with respect to address translation, the definition of program order on page 341, storage protection, reference and change recording, and the storage access ordering described in Section 1.7.1 and is treated as a Read (Write) from (to) the addressed byte with respect to debug events unless otherwise specified. (See Book III-E.)

Some Cache Management instructions contain a CT field that is used to specify a cache level within a cache hierarchy or a portion of a cache structure to which the instruction is to be applied. The correspondence between the CT value specified and the cache level is shown below.

    CT Field Value    Cache Level
    0                 Primary Cache
    2                 Secondary Cache

CT values not shown above may be used to specify implementation-dependent cache levels or implementation-dependent portions of a cache structure.

3.2.1 Instruction Cache Instructions

Instruction Cache Block Invalidate X-form

    icbi RA,RB

    31  | ///  | RA   | RB    | 982   | /
    0:5 | 6:10 | 11:15| 16:20 | 21:30 | 31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any processors, the block is invalidated in those instruction caches.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the instruction cache of this processor, the block is invalidated in that instruction cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

This instruction is treated as a Load (see Section 3.2), except that reference and change recording need not be done.

Special Registers Altered:
    None

Programming Note
Because the instruction is treated as a Load, the effective address is translated using translation resources that are used for data accesses, even though the block being invalidated was copied into the instruction cache based on translation resources used for instruction fetches (see Book III).

Programming Note
The invalidation of the specified block need not have been performed with respect to the processor executing the icbi instruction until a subsequent isync instruction has been executed by that processor. No other instruction or event has the corresponding effect.

Instruction Cache Block Touch X-form

    icbt CT,RA,RB    [Category: Embedded]

    31  | / | CT   | RA   | RB    | 22    | /
    0:5 | 6 | 7:10 | 11:15| 16:20 | 21:30 | 31

Let the effective address (EA) be the sum (RA|0)+(RB).

If CT=0, this instruction provides a hint that the program will probably soon execute code from the addressed location.

If CT≠0, the operation performed by this instruction is implementation-dependent, except that the instruction is treated as a no-op for values of CT that are not implemented.

The hint is ignored if the block is Caching Inhibited.

This instruction is treated as a Load (see Section 3.2), except that the system instruction storage error handler is not invoked.

Special Registers Altered:
    None

3.2.2 Data Cache Instructions

Data Cache Block Allocate X-form

    dcba RA,RB    [Category: Embedded]

    31  | ///  | RA   | RB    | 758   | /
    0:5 | 6:10 | 11:15| 16:20 | 21:30 | 31

Let the effective address (EA) be the sum (RA|0)+(RB).

This instruction provides a hint that the program will probably soon store into a portion of the block and the contents of the rest of the block are not meaningful to the program. The contents of the block are undefined when the instruction completes. The hint is ignored if the block is Caching Inhibited.

This instruction is treated as a Store (see Section 3.2) except that the instruction is treated as a no-op if execution of the instruction would cause the system data storage error handler to be invoked.

Special Registers Altered:
    None

Data Cache Block Touch X-form

    dcbt RA,RB,TH    [Category: Server]
    dcbt TH,RA,RB    [Category: Embedded]

    31  | / | TH   | RA   | RB    | 278   | /
    0:5 | 6 | 7:10 | 11:15| 16:20 | 21:30 | 31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbt instruction provides a hint that describes a block or data stream, or indicates the expected use thereof. A hint that the program will probably soon load from a given storage location is ignored if the location is Caching Inhibited or, for the Server environment, Guarded.

The only operation that is "caused by" the dcbt instruction is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be "caused by" or "associated with" the dcbt instruction (e.g., dcbt is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction.

The dcbt instruction may complete before the operation it causes has been performed.

The nature of the hint depends, in part, on the value of the TH field, as specified below. If TH≠0b1010 this instruction is treated as a Load (see Section 3.2), except that the system data storage error handler is not invoked, and reference and change recording need not be done.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Touch instruction so that it can be coded with the TH value as the last operand for all categories.

    Extended:            Equivalent to:
    dcbtct RA,RB,TH      dcbt for TH values of 0b0000 - 0b0111; other TH values are invalid.
    dcbtds RA,RB,TH      dcbt for TH values of 0b0000 or 0b1000 - 0b1010; other TH values are invalid.

Programming Note
New programs should avoid using the dcbt and dcbtst mnemonics; one of the extended mnemonics should be used exclusively.

If the dcbt mnemonic is used with only two operands, the TH operand is assumed to be 0b0000.

TH Field

For all TH field values which are not listed below, the hint provided by the instruction is undefined.

TH=0b0000

If TH=0b0000, the dcbt instruction provides a hint that the program will probably soon load from the block containing the byte addressed by EA.

TH=0b0000 - 0b0111 [Category: Cache Specification]

In addition to the hint specified above for the TH field value of 0b0000, an additional hint is provided indicating that placement of the block in the cache specified by the TH field might also improve performance. The correspondence between each value of the TH field and the cache to be specified is the same as the correspondence between each value of the CT field and the cache to be specified as defined in Section 3.2. The hints corresponding to values of the TH field not supported by the implementation are undefined.

TH=0b1000 - 0b1111 [Category: Stream]

The dcbt instruction provides a hint regarding a sequence of contiguous data cache blocks, or indicates the expected use thereof. Such a sequence is called a "data stream", and a dcbt instruction in which TH is set to one of these values is said to be a "data stream variant" of dcbt. In the remainder of this section, "data stream" may be abbreviated to "stream".

When, and how often, effective addresses for a data stream are translated is implementation-dependent.

The address and length of such data streams are specified in terms of aligned 128-byte units of storage; in the remainder of this instruction description, "aligned 128-byte unit of storage" is abbreviated to "unit".

Each such data stream is associated, by software, with a stream ID, which is a resource that the processor uses to distinguish the data stream from other such data streams. The number of stream IDs is an implementation-dependent value in the range 1:16. Stream IDs are numbered sequentially starting from 0.

The encodings of the TH field and of the corresponding EA values are as follows. In the EA layout diagrams, fields shown as "/"s are reserved. These fields, and reserved values of defined EA fields, are treated in the same manner as the corresponding cases for instruction fields (see the section entitled "Reserved Fields and Reserved Values" in Book I), except that a reserved value in a defined EA field does not make the instruction form invalid. If a defined EA field contains a reserved value, the hint provided by the instruction is undefined.

TH     Description

1000   The dcbt instruction provides a hint that describes certain attributes of a data stream, and may indicate that the program will probably soon load from the stream.

       The EA is interpreted as follows.

           EATRUNC | D  | UG | /  | ID
           0:56    | 57 | 58 | 59 | 60:63

       Bit(s)   Description
       0:56     EATRUNC
                High-order 57 bits of effective address of first unit of data stream (i.e., the effective address of the first unit of the stream is EATRUNC || ⁷0, that is, EATRUNC followed by seven 0 bits)
       57       Direction (D)
                0  Subsequent units are the sequentially following units.
                1  Subsequent units are the sequentially preceding units.
       58       Unlimited/GO (UG)
                0  No information is provided by the UG field.
                1  The number of units in the data stream is unlimited, the program's need for each block of the stream is not likely to be transient, and the program will probably soon load from the stream.
       59       Reserved
       60:63    Stream ID (ID)
                Stream ID to use for this data stream

1010   The dcbt instruction provides a hint that describes certain attributes of a data stream, or indicates that the program will probably soon load from data streams that have been described using dcbt instructions in which TH₀=1 or will probably no longer load from such data streams.

       The EA is interpreted as follows. If GO=1 and S≠0b00 the hint provided by the instruction is undefined; the remainder of this instruction description assumes that this combination is not used.

           ///  | GO | S     | ///   | UNITCNT | T  | U  | /  | ID
           0:31 | 32 | 33:34 | 35:46 | 47:56   | 57 | 58 | 59 | 60:63

       Bit(s)   Description
       0:31     Reserved
       32       GO
                0  No information is provided by the GO field.
                1  The program will probably soon load from all nascent data streams that have been completely described, and will probably no longer load from all other nascent data streams. All other fields of the EA are ignored. ("Nascent" and "completely described" are defined below.)
       33:34    Stop (S)
                00  No information is provided by the S field.
                01  Reserved
                10  The program will probably no longer load from the data stream (if any) associated with the specified stream ID. (All other fields of the EA except the ID field are ignored.)
                11  The program will probably no longer load from the data streams associated with all stream IDs. (All other fields of the EA are ignored.)
       35:46    Reserved
       47:56    UNITCNT
                Number of units in data stream
       57       Transient (T)
                If T=1, the program's need for each block of the data stream is likely to be transient (i.e., the time interval during which the program accesses the block is likely to be short).
       58       Unlimited (U)
                If U=1, the number of units in the data stream is unlimited (and the UNITCNT field is ignored).
       59       Reserved
       60:63    Stream ID (ID)
                Stream ID to use for this data stream (GO=0 and S=0b00), or stream ID associated with the data stream from which the program will probably no longer load (S=0b10)

If the specified stream ID value is greater than m-1, where m is the number of stream IDs provided by the implementation, and either (a) TH=0b1000 or (b) TH=0b1010 and GO=0 and S≠0b11, no hint is provided by the instruction.

The following terminology is used to describe the state of a data stream. Except as described in the paragraph after the next paragraph, the state of a data stream at a given time is determined by the most recently provided hint for the stream.

• A data stream for which only descriptive hints have been provided (by dcbt instructions with TH=0b1000 and UG=0 or with TH=0b1010 and GO=0 and S=0b00) is said to be "nascent". A nascent data stream for which both kinds of descriptive hint have been provided (by both of the dcbt usages listed in the preceding sentence) is considered to be "completely described".
• A data stream for which a hint has been provided (by a dcbt instruction with TH=0b1000 and UG=1 or with TH=0b1010 and GO=1) that the program will probably soon load from it is said to be "active".
• A data stream that is either nascent or active is considered to "exist".
• A data stream for which a hint has been provided (e.g., by a dcbt instruction with TH=0b1010 and S≠0b00) that the program will probably no longer load from it is considered no longer to exist.

The hint provided by a dcbt instruction with TH=0b1000 and UG=1 implicitly includes a hint that the program will probably no longer load from the data stream (if any) previously associated with the specified stream ID. The hint provided by a dcbt instruction with TH=0b1000 and UG=0 or with TH=0b1010 and GO=0 and S=0b00 implicitly includes a hint that the program will probably no longer load from the active data stream (if any) previously associated with the specified stream ID.

Interrupts (see Book III) cause all existing data streams to cease to exist. In addition, depending on the implementation, certain conditions and events may cause an existing data stream to cease to exist.

Programming Note
To obtain the best performance across the widest range of implementations that support the variants of dcbt in which TH₀=1, the programmer should assume the following model when using those variants.

• The processor's response to a hint that the program will probably soon load from a given data stream is to take actions that reduce the latency of loads from the first few blocks of the stream. (Such actions may include prefetching the blocks into levels of the storage hierarchy that are "near" the processor.) Thereafter, as the program loads from each successive block of the stream, the processor takes latency-reducing actions for additional blocks of the stream, pacing these actions with the program's loads (i.e., taking the actions for only a limited number of blocks ahead of the block that the program is currently loading from).

  The processor's response to a hint that the program will probably no longer load from a given data stream, or to the cessation of existence of a data stream, is to stop taking latency-reducing actions for the stream.
• A data stream having finite length ceases to exist when the latency-reducing actions have been taken for all blocks of the stream.
• If the program ceases to need a given data stream before having loaded from all blocks of the stream (always the case for streams having unlimited length), performance may be improved if the program then provides a hint that it will no longer load from the stream (e.g., by executing the appropriate dcbt instruction with TH=0b1010 and S≠0b00).
• At each level of the storage hierarchy that is "near" the processor, blocks of a data stream that is specified as transient are most likely to be replaced. As a result, it may be desirable to stagger addresses of streams (choose addresses that map to different cache congruence classes) to reduce the likelihood that a unit of a transient stream will be replaced prior to being accessed by the program.
• On some implementations, data streams that are not specified by software may be detected by the processor. Such data streams are called "hardware-detected data streams". On some such implementations, data stream resources (resources that are used primarily to support data streams) are shared between software-specified data streams and hardware-detected data streams. On these latter implementations, the programming model includes the following.
  - Software-specified data streams take precedence over hardware-detected data streams in use of data stream resources.
  - The processor's response to a hint that the program will probably no longer load from a given data stream, or to the cessation of existence of a data stream, includes releasing the associated data stream resources, so that they can be used by hardware-detected data streams.

Programming Note
This Programming Note describes several aspects of using dcbt instructions in which TH₀=1.

• A non-transient data stream having unlimited length can be completely specified, including providing the hint that the program will probably soon load from it, using one dcbt instruction. The corresponding specification for a data stream having other attributes requires three dcbt instructions. However, one dcbt instruction with TH=0b1010 and GO=1 can apply to a set of the data streams described in the preceding sentence, so the corresponding specification for n such data streams requires 2×n+1 dcbt instructions. (There is no need to execute a dcbt instruction with TH=0b1010 and S=0b10 for a given stream ID before using the stream ID for a new data stream; the implicit portion of the hint provided by dcbt instructions that describe data streams suffices.)
• If it is desired that the hint provided by a given dcbt instruction be provided in program order with respect to the hint provided by another dcbt instruction, the two dcbt instructions must be separated by an eieio (or sync) instruction. For example, if a dcbt instruction with TH=0b1010 and GO=1 is intended to indicate that the program will probably soon load from nascent data streams described (completely) by preceding dcbt instructions, and is intended not to indicate that the program will probably soon load from nascent data streams described (completely) by following dcbt instructions, an eieio (or sync) instruction must separate the dcbt instruction with GO=1 from the preceding dcbt instructions, and another eieio (or sync) instruction must separate that dcbt instruction from the following dcbt instructions.
• In practice, the second eieio (or sync) described above can sometimes be omitted. For example, if the program consists of an outer loop that contains the dcbt instructions and an inner loop that contains the Load instructions that load from the data streams, the characteristics of the inner loop and of the implementation's branch prediction mechanisms may make it highly unlikely that hints corresponding to a given iteration of the outer loop will be provided out of program order with respect to hints corresponding to the previous iteration of the outer loop. (Also, any providing of hints out of program order affects only performance, not program correctness.)
• To mitigate the effects of interrupts on data streams, it may be desirable to specify a given "logical" data stream as a sequence of shorter, component data streams. Similar considerations apply to conditions and events that, depending on the implementation, may cause an existing data stream to cease to exist.
• If it is desired to specify data streams without regard to the number of stream IDs provided by the implementation, stream IDs should be assigned to data streams in order of decreasing stream importance (stream ID 0 to the most important stream, stream ID 1 to the next most important stream, etc.). This order ensures that the hints for the most important data streams will be provided.
the preceding dcbt instructions, and another 364 Power ISATM -- Book II Version 2.04 Data Cache Block Touch for Store X-form Extended Mnemonic: An extended mnemonic is provided for the Data Cache dcbtst RA,RB [Category: Server] Block Touch for Store instruction so that it can be coded dcbtst TH,RA,RB [Category: Embedded] with the TH value as the last operand for all categories. . 31 / TH RA RB 246 / 0 6 7 11 16 21 31 Extended: Equivalent to: dcbtstct RA,RB,TH dcbt for TH values of 0b0000 or Let the effective address (EA) be the sum (RA|0)+(RB). 0b0000 - 0b0111; other TH values are invalid The dcbtst instruction provides a hint that the program for this extended mnemonic. will probably soon store to the block containing the byte addressed by EA. If the Cache Specification category is supported, the nature of the hint depends on the Programming Note value of the TH field, as specified below. If the Cache See the Programming Notes for the dcbt instruc- Specification category is not supported, the TH field is tion. treated as a reserved field. The hint is ignored if the block is in a storage location Programming Note that is Caching Inhibited or, for the Server environment, The processor's response to the hint provided by Guarded. dcbt or dcbtst is to take actions that reduce the The only operation that is "caused by" the dcbtst latency of subsequent loads or stores that access instruction is the providing of the hint. The actions (if the specified block. (Such actions may include any) taken by the processor in response to the hint are prefetching the block into levels of the storage hier- not considered to be "caused by" or "associated with" archy that are "near" the processor.) the dcbtst instruction (e.g., dcbtst is considered not to Processors that comply with versions of the archi- cause any data accesses). 
No means are provided by tecture that precede Version 2.01 do not necessar- which software can synchronize these actions with the ily ignore the hint provided by dcbt and dcbtst if execution of the instruction stream. For example, these the specified block is in storage that is Guarded actions are not ordered by memory barriers. and not Caching Inhibited. The dcbtst instruction may complete before the opera- tion it causes has been performed. This instruction is treated as a Load (see Section 3.2), except that the system data storage error handler is not invoked, and reference and change recording need not be done. TH Field [Category: Cache Specification] For all TH field values which the are not listed below, the hint provided by the instruction is undefined. TH=0b0000 - 0b0111 [Category: Cache Specifica- tion] In addition to the hint provided if the Cache Specifica- tion category is not supported, a hint is provided indi- cating that placement of the block in the cache specified by the TH field might also improve perfor- mance. The correspondence between each value of the TH field and the cache to be specified is the same as the correspondence between each value of the CT field and the cache to be specified as defined in Sec- tion 3.2. The hints corresponding to values of the TH field not supported by the implementation are unde- fined. Special Registers Altered: None Chapter 3. 
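The dcbtst field layout (primary opcode 31 in bits 0:5, TH in bits 7:10, RA in bits 11:15, RB in bits 16:20, extended opcode 246 in bits 21:30) can be turned into a 32-bit instruction word mechanically. The following is a minimal sketch in Python, not an assembler: the function name `encode_dcbtst` is hypothetical, and the reserved bits 6 and 31 are assumed to be written as 0.

```python
def encode_dcbtst(th: int, ra: int, rb: int) -> int:
    """Assemble a dcbtst instruction word from its X-form fields.

    Bit 0 is the most significant bit of the word, so a field whose
    last bit is at position p is shifted left by (31 - p).
    Reserved bits 6 and 31 are left as 0 (an assumption of this sketch).
    """
    if not 0 <= th <= 0b1111:
        raise ValueError("TH is a 4-bit field (bits 7:10)")
    if not (0 <= ra <= 31 and 0 <= rb <= 31):
        raise ValueError("RA and RB are 5-bit register numbers")
    word = 31 << 26      # primary opcode, bits 0:5
    word |= th << 21     # TH, bits 7:10
    word |= ra << 16     # RA, bits 11:15
    word |= rb << 11     # RB, bits 16:20
    word |= 246 << 1     # extended opcode, bits 21:30
    return word


# dcbtst with TH=0, RA=3, RB=4
print(hex(encode_dcbtst(0, 3, 4)))
```

The same shift rule applies to the other X-form layouts in this chapter; only the field positions and the extended opcode change.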
Data Cache Block set to Zero X-form

    dcbz RA,RB

    31    ///   RA    RB    1014  /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    n ← block size (bytes)
    m ← log2(n)
    ea ← EA_{0:63-m} || ^m 0
    MEM(ea, n) ← ^n 0x00

Let the effective address (EA) be the sum (RA|0)+(RB). All bytes in the block containing the byte addressed by EA are set to zero.

This instruction is treated as a Store (see Section 3.2).

Special Registers Altered:
    None

Programming Note

dcbz does not cause the block to exist in the data cache if the block is in storage that is Caching Inhibited.

For storage that is neither Write Through Required nor Caching Inhibited, dcbz provides an efficient means of setting blocks of storage to zero. It can be used to initialize large areas of such storage, in a manner that is likely to consume less memory bandwidth than an equivalent sequence of Store instructions.

For storage that is either Write Through Required or Caching Inhibited, dcbz is likely to take significantly longer to execute than an equivalent sequence of Store instructions. For example, on some implementations dcbz for such storage may cause the system alignment error handler to be invoked; on such implementations the system alignment error handler sets the specified block to zero using Store instructions.

See Section 5.9.1 of Book III-S and Section 4.9.1 of Book III-E for additional information about dcbz.

Data Cache Block Store X-form

    dcbst RA,RB

    31    ///   RA    RB    54    /
    0     6     11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any processor and any locations in the block are considered to be modified there, those locations are written to main storage, additional locations in the block may be written to main storage, and the block ceases to be considered to be modified in that data cache.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this processor and any locations in the block are considered to be modified there, those locations are written to main storage, additional locations in the block may be written to main storage, and the block ceases to be considered to be modified in that data cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

This instruction is treated as a Load (see Section 3.2), except that reference and change recording need not be done, and it is treated as a Write with respect to debug events.

Special Registers Altered:
    None

Data Cache Block Flush X-form

    dcbf RA,RB,L

    31    ///   L     RA    RB    86    /
    0     6     10    11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

L=0

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any processor and any locations in the block are considered to be modified there, those locations are written to main storage and additional locations in the block may be written to main storage. The block is invalidated in the data caches of all processors.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this processor and any locations in the block are considered to be modified there, those locations are written to main storage and additional locations in the block may be written to main storage. The block is invalidated in the data cache of this processor.

L=1 ("dcbf local") [Category: Server Phased-In]

The L=1 form of the dcbf instruction permits a program to limit the scope of the "flush" operation to the data cache of a single processor. If the block containing the byte addressed by EA is in the data cache of this processor and any locations in the block are considered to be modified there, those locations are written to main storage and additional locations in the block may be written to main storage. The block is invalidated in the data cache of this processor.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited. If L=1, the function of this instruction is also independent of whether the block containing the byte addressed by EA is in storage that is Memory Coherence Required.

This instruction is treated as a Load (see Section 3.2), except that reference and change recording need not be done, and it is treated as a Write with respect to debug events.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Flush instruction so that it can be coded with the L value as part of the mnemonic rather than as a numeric operand. These are shown as examples with the instruction. See Appendix A. "Assembler Extended Mnemonics" on page 383. The extended mnemonics are shown below.

    Extended:        Equivalent to:
    dcbf  RA,RB      dcbf RA,RB,0
    dcbfl RA,RB      dcbf RA,RB,1

Except in the dcbf instruction description in this section, references to "dcbf" in Books I-III imply L=0 unless otherwise stated or obvious from context; "dcbfl" is used for L=1.

Programming Note

dcbf serves as both a basic and an extended mnemonic. The Assembler will recognize a dcbf mnemonic with three operands as the basic form, and a dcbf mnemonic with two operands as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Programming Note [Category: Server]

dcbf with L=1 can be used to cause a block that will not be reused soon to be removed from the processor's data cache, and thereby potentially to cause that data cache to be used more efficiently.

Programming Note [Category: Server]

The functions provided by dcbf with L=1 are identical to those that would be provided if L were 0 and the specified block were in storage that is not Memory Coherence Required.

3.2.2.1 Obsolete Data Cache Instructions [Category: Vector.Phased-Out]

The Data Stream Touch (dst), Data Stream Touch for Store (dstst), and Data Stream Stop (dss) instructions (primary opcode 31, extended opcodes 342, 374, and 822 respectively), which were proposed for addition to the Power ISA and were implemented by some processors, may be treated as no-ops (rather than as illegal instructions).

The treatment of these instructions (no-op or illegal instruction) is independent of whether other Vector instructions are available (i.e., is independent of the contents of MSR_VEC (see Book III-S) or MSR_SPV (see Book III-E)).

Programming Note

These instructions merely provided hints, and thus were permitted to be treated as no-ops even on processors that implemented them.

The treatment of these instructions is independent of whether other Vector instructions are available because, on processors that implemented the instructions, the instructions were available even when other Vector instructions were not.

The extended mnemonics for these instructions were dstt, dststt, and dssall.

3.3 Synchronization Instructions

The synchronization instructions are used to ensure that certain instructions have completed before other instructions are initiated, or to control storage access ordering, or to support debug operations.
3.3.1 Instruction Synchronize Instruction

Instruction Synchronize XL-form

    isync

    19    ///   ///   ///   150   /
    0     6     11    16    21    31

Executing an isync instruction ensures that all instructions preceding the isync instruction have completed before the isync instruction completes, and that no subsequent instructions are initiated until after the isync instruction completes. It also ensures that all instruction cache block invalidations caused by icbi instructions preceding the isync instruction have been performed with respect to the processor executing the isync instruction, and then causes any prefetched instructions to be discarded.

Except as described in the preceding sentence, the isync instruction may complete before storage accesses associated with instructions preceding the isync instruction have been performed.

This instruction is context synchronizing (see Book III).

Special Registers Altered:
    None

3.3.2 Load and Reserve and Store Conditional Instructions

The Load And Reserve and Store Conditional instructions can be used to construct a sequence of instructions that appears to perform an atomic update operation on an aligned storage location. See Section 1.7.3, "Atomic Update" for additional information about these instructions.

The Load And Reserve and Store Conditional instructions are fixed-point Storage Access instructions; see Section 3.3.1, "Fixed-Point Storage Access Instructions", in Book I.

The storage location specified by the Load And Reserve and Store Conditional instructions must be in storage that is Memory Coherence Required if the location may be modified by other processors or mechanisms. If the specified location is in storage that is Write Through Required or Caching Inhibited, the system data storage error handler or the system alignment error handler is invoked for the Server environment and may be invoked for the Embedded environment.

Programming Note

The Memory Coherence Required attribute on other processors and mechanisms ensures that their stores to the reservation granule will cause the reservation created by the Load And Reserve instruction to be lost.

Programming Note

Because the Load And Reserve and Store Conditional instructions have implementation dependencies (e.g., the granularity at which reservations are managed), they must be used with care. The operating system should provide system library programs that use these instructions to implement the high-level synchronization functions (Test and Set, Compare and Swap, locking, etc.; see Appendix B) that are needed by application programs. Application programs should use these library programs, rather than use the Load And Reserve and Store Conditional instructions directly.

Load Word And Reserve Indexed X-form

    lwarx RT,RA,RB

    31    RT    RA    RB    20    /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RESERVE ← 1
    RESERVE_ADDR ← real_addr(EA)
    RT ← ^32 0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are set to 0.

This instruction creates a reservation for use by a Store Word Conditional instruction. An address computed from the EA as described in Section 1.7.3.1 is associated with the reservation, and replaces any address previously associated with the reservation.

EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    None

Store Word Conditional Indexed X-form

    stwcx. RS,RA,RB

    31    RS    RA    RB    150   1
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    if RESERVE then
        if RESERVE_ADDR = real_addr(EA) then
            MEM(EA, 4) ← (RS)_{32:63}
            CR0 ← 0b00 || 0b1 || XER_SO
        else
            u1 ← undefined 1-bit value
            if u1 then
                MEM(EA, 4) ← (RS)_{32:63}
            u2 ← undefined 1-bit value
            CR0 ← 0b00 || u2 || XER_SO
        RESERVE ← 0
    else
        CR0 ← 0b00 || 0b0 || XER_SO

Let the effective address (EA) be the sum (RA|0)+(RB).

If a reservation exists and the storage location specified by the stwcx. is the same as the location specified by the Load And Reserve instruction that established the reservation, (RS)_{32:63} are stored into the word in storage addressed by EA and the reservation is cleared.

If a reservation exists but the storage location specified by the stwcx. is not the same as the location specified by the Load And Reserve instruction that established the reservation, the reservation is cleared, and it is undefined whether (RS)_{32:63} are stored into the word in storage addressed by EA.

If a reservation does not exist, the instruction completes without altering storage.

CR Field 0 is set as follows. n is a 1-bit value that indicates whether the store was performed, except that if a reservation exists but the storage location specified by the stwcx. is not the same as the location specified by the Load And Reserve instruction that established the reservation the value of n is undefined.

    CR0_{LT GT EQ SO} = 0b00 || n || XER_SO

EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    CR0

3.3.2.1 64-Bit Load and Reserve and Store Conditional Instructions [Category: 64-Bit]

Load Doubleword And Reserve Indexed X-form

    ldarx RT,RA,RB

    31    RT    RA    RB    84    /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RESERVE ← 1
    RESERVE_ADDR ← real_addr(EA)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

This instruction creates a reservation for use by a Store Doubleword Conditional instruction. An address computed from the EA as described in Section 1.7.3.1 is associated with the reservation, and replaces any address previously associated with the reservation.

EA must be a multiple of 8. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    None

Store Doubleword Conditional Indexed X-form

    stdcx. RS,RA,RB

    31    RS    RA    RB    214   1
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    if RESERVE then
        if RESERVE_ADDR = real_addr(EA) then
            MEM(EA, 8) ← (RS)
            CR0 ← 0b00 || 0b1 || XER_SO
        else
            u1 ← undefined 1-bit value
            if u1 then
                MEM(EA, 8) ← (RS)
            u2 ← undefined 1-bit value
            CR0 ← 0b00 || u2 || XER_SO
        RESERVE ← 0
    else
        CR0 ← 0b00 || 0b0 || XER_SO

Let the effective address (EA) be the sum (RA|0)+(RB).

If a reservation exists and the storage location specified by the stdcx. is the same as the location specified by the Load And Reserve instruction that established the reservation, (RS) is stored into the doubleword in storage addressed by EA and the reservation is cleared.

If a reservation exists but the storage location specified by the stdcx. is not the same as the location specified by the Load And Reserve instruction that established the reservation, the reservation is cleared, and it is undefined whether (RS) is stored into the doubleword in storage addressed by EA.

If a reservation does not exist, the instruction completes without altering storage.

CR Field 0 is set as follows. n is a 1-bit value that indicates whether the store was performed, except that if a reservation exists but the storage location specified by the stdcx. is not the same as the location specified by the Load And Reserve instruction that established the reservation the value of n is undefined.

    CR0_{LT GT EQ SO} = 0b00 || n || XER_SO

EA must be a multiple of 8. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    CR0
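The lwarx and stwcx. RTL above can be exercised with a small software model. The sketch below is a toy, single-processor illustration in Python, not a description of any implementation. Its assumptions are its own: effective addresses stand in for real addresses, the reservation granule is a single word, XER_SO is taken as 0, a misaligned EA raises an exception, and in the "reservation exists but the address differs" case the model picks one of the permitted outcomes (no store, n = 0). The names `ReservationModel`, `foreign_store`, and `atomic_add` are hypothetical.

```python
class ReservationModel:
    """Toy model of the RESERVE / RESERVE_ADDR state used by lwarx and stwcx."""

    def __init__(self):
        self.mem = {}              # word-aligned address -> 32-bit value
        self.reserve = False       # RESERVE
        self.reserve_addr = None   # RESERVE_ADDR

    def lwarx(self, ea):
        if ea % 4:
            raise ValueError("EA must be a multiple of 4")
        self.reserve = True
        self.reserve_addr = ea
        return self.mem.get(ea, 0) & 0xFFFFFFFF

    def stwcx(self, rs, ea):
        """Return n, the CR0 EQ bit: 1 if the store was performed."""
        if ea % 4:
            raise ValueError("EA must be a multiple of 4")
        if not self.reserve:
            return 0                       # no reservation: storage unchanged
        matched = self.reserve_addr == ea
        self.reserve = False               # the reservation is always cleared
        if matched:
            self.mem[ea] = rs & 0xFFFFFFFF
            return 1
        return 0                           # model's choice for the undefined case

    def foreign_store(self, ea, value):
        """A store by another processor; it causes a matching reservation to be lost."""
        self.mem[ea] = value & 0xFFFFFFFF
        if self.reserve and self.reserve_addr == ea:
            self.reserve = False


def atomic_add(cpu, ea, delta, interfere=None):
    """Retry lwarx / stwcx. until the conditional store succeeds."""
    attempts = 0
    while True:
        attempts += 1
        old = cpu.lwarx(ea)
        if interfere is not None and attempts == 1:
            interfere()                    # simulate another processor's store
        if cpu.stwcx(old + delta, ea):
            return old, attempts


cpu = ReservationModel()
cpu.mem[0x100] = 10
old, attempts = atomic_add(cpu, 0x100, 5,
                           interfere=lambda: cpu.foreign_store(0x100, 20))
print(old, attempts, cpu.mem[0x100])
```

In the usage shown, the first stwcx. fails because the simulated foreign store caused the reservation to be lost, so the loop reloads the new value and succeeds on the second attempt. This is the retry pattern that the system library routines mentioned in the Programming Note would typically wrap.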
3.3.3 Memory Barrier Instructions

The Memory Barrier instructions can be used to control the order in which storage accesses are performed. Additional information about these instructions and about related aspects of storage management can be found in Book III.

Extended mnemonics for Synchronize

Extended mnemonics are provided for the Synchronize instruction so that it can be supported by assemblers that recognize only the msync mnemonic and so that it can be coded with the L value as part of the mnemonic rather than as a numeric operand. These are shown as examples with the instruction. See Appendix A. "Assembler Extended Mnemonics" on page 383.

Synchronize X-form

    sync L

    31    ///   L     ///   ///   598   /
    0     6     9     11    16    21    31

The sync instruction creates a memory barrier (see Section 1.7.1). The set of storage accesses that is ordered by the memory barrier depends on the value of the L field.

L=0 ("heavyweight sync")

The memory barrier provides an ordering function for the storage accesses associated with all instructions that are executed by the processor executing the sync instruction. The applicable pairs are all pairs a_i,b_j in which b_j is a data access, except that if a_i is the storage access caused by an icbi instruction then b_j may be performed with respect to the processor executing the sync instruction before a_i is performed with respect to that processor.

L=1 ("lightweight sync")

The memory barrier provides an ordering function for the storage accesses caused by Load, Store, and dcbz instructions that are executed by the processor executing the sync instruction and for which the specified storage location is in storage that is Memory Coherence Required and is neither Write Through Required nor Caching Inhibited. The applicable pairs are all pairs a_i,b_j of such accesses except those in which a_i is an access caused by a Store or dcbz instruction and b_j is an access caused by a Load instruction.

L=2

The set of storage accesses that is ordered by the memory barrier is described in Section 5.9.2 of Book III-S and Section 4.9.3 of Book III-E, as are additional properties of the sync instruction with L=2.

The ordering done by the memory barrier is cumulative.

If L=0 (or L=2), the sync instruction has the following additional properties.

• Executing the sync instruction ensures that all instructions preceding the sync instruction have completed before the sync instruction completes, and that no subsequent instructions are initiated until after the sync instruction completes.

• The sync instruction is execution synchronizing (see Book III). However, address translation and reference and change recording (see Book III) associated with subsequent instructions may be performed before the sync instruction completes.

• The memory barrier provides the additional ordering function such that if a given instruction that is the result of a store in set B is executed, all applicable storage accesses in set A have been performed with respect to the processor executing the instruction to the extent required by the associated memory coherence properties. The single exception is that any storage access in set A that is caused by an icbi instruction executed by the processor executing the sync instruction (P1) may not have been performed with respect to P1 (see the description of the icbi instruction on page 359).

  The cumulative properties of the barrier apply to the execution of the given instruction as they would to a load that returned a value that was the result of a store in set B.

• The sync instruction provides an ordering function for the operations caused by dcbt instructions with TH_0=1.

The value L=3 is reserved.

The sync instruction may complete before storage accesses associated with instructions preceding the sync instruction have been performed. The sync instruction may complete before operations caused by dcbt instructions with TH_0=1 preceding the sync instruction have been performed.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics for Synchronize:

    Extended:    Equivalent to:
    sync         sync 0
    msync        sync 0
    lwsync       sync 1
    ptesync      sync 2

Except in the sync instruction description in this section, references to "sync" in Books I-III imply L=0 unless otherwise stated or obvious from context; the appropriate extended mnemonics are used when other L values are intended.

Programming Note

Section 1.8 contains a detailed description of how to modify instructions such that a well-defined result is obtained.

Programming Note

sync serves as both a basic and an extended mnemonic. The Assembler will recognize a sync mnemonic with one operand as the basic form, and a sync mnemonic with no operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Programming Note

The sync instruction can be used to ensure that all stores into a data structure, caused by Store instructions executed in a "critical section" of a program, will be performed with respect to another processor before the store that releases the lock is performed with respect to that processor; see Section B.2, "Lock Acquisition and Release, and Related Techniques" on page 387.

The memory barrier created by a sync instruction with L=0 or L=1 does not order implicit storage accesses. The memory barrier created by a sync instruction with any L value does not order instruction fetches.

(The memory barrier created by a sync instruction with L=0, or L=2 (see Book III), appears to order instruction fetches for instructions preceding the sync instruction with respect to data accesses caused by instructions following the sync instruction. However, this ordering is a consequence of the first "additional property" of sync with L=0, not a property of the memory barrier.)

In order to obtain the best performance across the widest range of implementations, the programmer should use the sync instruction with L=1, or the eieio or mbar instruction, if any of these is sufficient for his needs; otherwise he should use sync with L=0. sync with L=2 should not be used by application programs.

Programming Note

The functions provided by sync with L=1 are a strict subset of those provided by sync with L=0. (The functions provided by sync with L=2 are a strict superset of those provided by sync with L=0; see Book III.)

Enforce In-order Execution of I/O X-form

    eieio        [Category: Server]

    31    ///   ///   ///   854   /
    0     6     11    16    21    31

The eieio instruction creates a memory barrier (see Section 1.7.1, "Storage Access Ordering"), which provides an ordering function for the storage accesses caused by Load, Store, dcbz, eciwx, and ecowx instructions executed by the processor executing the eieio instruction. These storage accesses are divided into the two sets listed below. The storage access caused by an eciwx instruction is ordered as a load, and the storage access caused by a dcbz or ecowx instruction is ordered as a store.

1. Loads and stores to storage that is both Caching Inhibited and Guarded, and stores to main storage caused by stores to storage that is Write Through Required.

   The applicable pairs are all pairs a_i,b_j of such accesses.

2. Stores to storage that is Memory Coherence Required and is neither Write Through Required nor Caching Inhibited.

   The applicable pairs are all pairs a_i,b_j of such accesses.

The operations caused by dcbt instructions with TH_0=1 are ordered by eieio as a third set of operations, and the operations caused by tlbie and tlbsync instructions (see Book III-S) are ordered by eieio as a fourth set of operations.

Each of the four sets of storage accesses or operations is ordered independently of the other three sets. The ordering done by eieio's memory barrier for the second set is cumulative; the ordering done by eieio's memory barrier for the other three sets is not cumulative.

The eieio instruction may complete before storage accesses or operations associated with instructions preceding the eieio instruction have been performed.

Special Registers Altered:
    None

Memory Barrier X-form

    mbar MO      [Category: Embedded]

    31    MO    ///   ///   854   /
    0     6     11    16    21    31

When MO=0, the mbar instruction creates a cumulative memory barrier (see Section 1.7.1, "Storage Access Ordering"), which provides an ordering function for the storage accesses executed by the processor executing the mbar instruction.

When MO≠0, an implementation may support the mbar instruction ordering a particular subset of storage accesses. An implementation may also support multiple, non-zero values of MO that each specify a different subset of storage accesses that are ordered by the mbar instruction. Which subsets of storage accesses are ordered, and which values of MO specify these subsets, is implementation-dependent.

The mbar instruction may complete before storage accesses associated with instructions preceding the mbar instruction have been performed. The mbar instruction may complete before operations caused by dcbt instructions having TH_0=1 preceding the mbar instruction have been performed.

Special Registers Altered:
    None

Programming Note

The eieio and mbar instructions are intended for use in doing memory-mapped I/O, and in preventing load/store combining operations in main storage (see Section 1.6, "Storage Control Attributes" on page 344).

Because stores to storage that is both Caching Inhibited and Guarded are performed in program order (see Section 1.7.1, "Storage Access Ordering" on page 347), eieio or mbar is needed for such storage only when loads must be ordered with respect to stores or with respect to other loads, or when load/store combining operations must be prevented.

For the eieio instruction, for accesses in set 1, a_i and b_j need not be the same kind of access or be to storage having the same storage control attributes. For example, a_i can be a load to Caching Inhibited, Guarded storage, and b_j a store to Write Through Required storage.

If stronger ordering is desired than that provided by eieio or mbar, the sync instruction must be used, with the appropriate value in the L field.

Programming Note

The functions provided by eieio and mbar are a strict subset of those provided by sync with L=0. The functions provided by eieio for its second set are a strict subset of those provided by sync with L=1.

Since eieio and mbar share the same opcode, software designed for both server and embedded environments must assume that only the eieio functionality applies since the functions provided by eieio are a subset of those provided by mbar.

3.3.4 Wait Instruction

Wait X-form

    wait         [Category: Wait]

    31    ///   ///   ///   62    /
    0     6     11    16    21    31

The wait instruction provides an ordering function for the effects of all instructions executed by the processor executing the wait instruction. Executing a wait instruction ensures that all instructions have completed before the wait instruction completes, and that no subsequent instructions are initiated until an interrupt occurs.
The wait instruction also causes any prefetched instructions to be discarded and suspends instruction fetching until an interrupt occurs.

Once the wait instruction has completed, the NIA will point to the next sequential instruction.

Special Registers Altered:
    None

Programming Note
The wait instruction can be used in verification test cases to signal the end of a test case. The encoding for the instruction is the same in both Big-Endian and Little-Endian modes.

Programming Note
The wait instruction may be useful as the primary instruction of an "idle process" or the completion of processing for a cooperative thread.

Note that wait updates the NIA so that an interrupt that awakens a wait instruction will return to the instruction after the wait.

Chapter 4. Time Base

4.1 Time Base Overview . . . 377
4.2 Time Base . . . 377
4.2.1 Time Base Instructions . . . 378
4.3 Alternate Time Base [Category: Alternate Time Base] . . . 380

4.1 Time Base Overview

The time base facilities include a Time Base and an Alternate Time Base; the latter belongs to category Alternate Time Base. The Alternate Time Base is analogous to the Time Base except that it may count at a different frequency and is not writable.

4.2 Time Base

The Time Base (TB) is a 64-bit register (see Figure 3) containing a 64-bit unsigned integer that is incremented periodically. Each increment adds 1 to the low-order bit (bit 63). The frequency at which the integer is updated is implementation-dependent.

    |      TBU      |      TBL      |
    0               32             63

    Field   Description
    TBU     Upper 32 bits of Time Base
    TBL     Lower 32 bits of Time Base

Figure 3. Time Base

The Time Base increments until its value becomes 0xFFFF_FFFF_FFFF_FFFF (2^64 - 1). At the next increment, its value becomes 0x0000_0000_0000_0000. There is no explicit indication (such as an interrupt; see Book III) that this has occurred.

The period of the Time Base depends on the driving frequency. As an order of magnitude example, suppose that the CPU clock is 1 GHz and that the Time Base is driven by this frequency divided by 32. Then the period of the Time Base would be

$$T_{TB} = \frac{2^{64} \times 32}{1\ \text{GHz}} = 5.90 \times 10^{11}\ \text{seconds}$$

which is approximately 18,700 years.

The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock, in a Power ISA system. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following.

• The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base changes, and a means to determine what the current update frequency is.

• The update frequency of the Time Base is under the control of the system software.

Programming Note
If the operating system initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries.

Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 2^64 - 1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values.

Successive readings of the Time Base may return identical values.
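The post-processing mentioned in the Programming Note of Section 4.2 can be sketched as follows. This is a minimal model, not part of the architecture: the function name `tb_to_time` and the trace format (a sorted list of `(tb_at_change, ticks_per_sec)` records, the first of which marks time zero) are assumptions made for illustration, and Time Base wrap-around is ignored.

```python
from typing import List, Tuple

NS_PER_SEC = 1_000_000_000


def tb_to_time(tb: int, changes: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Convert a Time Base reading to (seconds, nanoseconds) since the first record.

    `changes` is a hypothetical trace: one (tb_at_change, ticks_per_sec) record
    for each update-frequency change, sorted by tb_at_change.
    """
    total_ns = 0
    for i, (start, hz) in enumerate(changes):
        if tb < start:
            break  # this frequency took effect after the reading was taken
        # The segment ends at the next frequency change or at `tb`, whichever is first.
        end = changes[i + 1][0] if i + 1 < len(changes) else tb
        end = min(end, tb)
        # Integer arithmetic; a fraction of a nanosecond per segment is dropped.
        total_ns += (end - start) * NS_PER_SEC // hz
    return divmod(total_ns, NS_PER_SEC)


# Time Base driven at 1 GHz / 32, then slowed to half that rate after 2 seconds.
trace = [(0, 31_250_000), (62_500_000, 15_625_000)]
print(tb_to_time(78_125_000, trace))
```

Each segment is converted at the frequency that was in force while it was counted, which is why a trace entry at every frequency change is sufficient to recover elapsed time.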
4.2.1 Time Base Instructions

Move From Time Base XFX-form

mftb RT,TBR
[Category: Server.Phased-Out]

    | 31 | RT | tbr | 371 | / |
    0    6    11    21    31

This instruction behaves as if it were an mfspr instruction; see the mfspr instruction description in Section 3.3.14 of Book I.

Special Registers Altered:
    None.

Extended Mnemonics:

Extended mnemonics for Move From Time Base:

    Extended:       Equivalent to:
    mftb  Rx        mftb  Rx,268
                    mfspr Rx,268
    mftbu Rx        mftb  Rx,269
                    mfspr Rx,269

Programming Note
New programs should use mfspr instead of mftb to access the Time Base.

Programming Note
mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB).

Programming Note
The mfspr instruction can be used to read the Time Base on all processors that comply with Version 2.01 of the architecture or with any subsequent version.

It is believed that the mfspr instruction can be used to read the Time Base on most processors that comply with versions of the architecture that precede Version 2.01. Processors for which mfspr cannot be used to read the Time Base include the following.

    - 601
    - POWER3

(601 implements neither the Time Base nor mftb, but depends on software using mftb to read the Time Base, so that the attempt causes an Illegal Instruction type Program interrupt and thereby permits the operating system to emulate the Time Base.)

Programming Note
Since the update frequency of the Time Base is implementation-dependent, the algorithm for converting the current value in the Time Base to time of day is also implementation-dependent.

As an example, assume that the Time Base is incremented at a constant rate of once for every 32 cycles of a 1 GHz CPU instruction clock. What is wanted is the pair of 32-bit values comprising a POSIX standard clock:[1] the number of whole seconds that have passed since 00:00:00 January 1, 1970, UTC, and the remaining fraction of a second expressed as a number of nanoseconds.

Assume that:

• The value 0 in the Time Base represents the start time of the POSIX clock (if this is not true, a simple 64-bit subtraction will make it so).

• The integer constant ticks_per_sec contains the value

  $$\frac{1\ \text{GHz}}{32} = 31{,}250{,}000$$

  which is the number of times the Time Base is updated each second.

• The integer constant ns_adj contains the value

  $$\frac{1{,}000{,}000{,}000}{31{,}250{,}000} = 32$$

  which is the number of nanoseconds per tick of the Time Base.

When the processor is in 64-bit mode, the POSIX clock can be computed with an instruction sequence such as this:

    mfspr Ry,268            # Ry = Time Base
    lwz   Rx,ticks_per_sec
    divd  Rz,Ry,Rx          # Rz = whole seconds
    stw   Rz,posix_sec
    mulld Rz,Rz,Rx          # Rz = quotient × divisor
    sub   Rz,Ry,Rz          # Rz = excess ticks
    lwz   Rx,ns_adj
    mulld Rz,Rz,Rx          # Rz = excess nanoseconds
    stw   Rz,posix_ns

For the Embedded environment when the processor is in 32-bit mode, it is not possible to read the Time Base using a single instruction. Instead, two instructions must be used, one of which reads TBL and the other of which reads TBU. Because of the possibility of a carry from TBL to TBU occurring between the two reads, a sequence such as the following must be used to read the Time Base.

    loop:
        mfspr Rx,TBU        # load from TBU
        mfspr Ry,TB         # load from TB
        mfspr Rz,TBU        # load from TBU
        cmp   cr0,0,Rx,Rz   # check if 'old'='new'
        bne   loop          # branch if carry occurred

Non-constant update frequency

In a system in which the update frequency of the Time Base may change over time, it is not possible to convert an isolated Time Base value into time of day. Instead, a Time Base value has meaning only with respect to the current update frequency and the time of day that the update frequency was last changed. Each time the update frequency changes, either the system software is notified of the change via an interrupt (see Book III), or the change was instigated by the system software itself. At each such change, the system software must compute the current time of day using the old update frequency, compute a new value of ticks_per_sec for the new frequency, and save the time of day, Time Base value, and tick rate. Subsequent calls to compute Time of Day use the current Time Base Value and the saved value.

[1] Described in POSIX Draft Standard P1003.4/D12, Draft Standard for Information Technology -- Portable Operating System Interface (POSIX) -- Part 1: System Application Program Interface (API) - Amendment 1: Real-time Extension [C Language]. Institute of Electrical and Electronics Engineers, Inc., Feb. 1992.

4.3 Alternate Time Base [Category: Alternate Time Base]

The Alternate Time Base (ATB) is a 64-bit register (see Figure 4) containing a 64-bit unsigned integer that is incremented periodically. The frequency at which the integer is updated is implementation-dependent.

    |     ATBU      |     ATBL      |
    0               32             63

Figure 4. Alternate Time Base

The ATBL register is an aliased name for the ATB.

The Alternate Time Base increments until its value becomes 0xFFFF_FFFF_FFFF_FFFF (2^64 - 1). At the next increment, its value becomes 0x0000_0000_0000_0000. There is no explicit indication (such as an interrupt; see Book III) that this has occurred.

The Alternate Time Base is accessible in both user and supervisor mode. The counter can be read by executing a mfspr instruction specifying the ATB (or ATBL) register, but cannot be written. A second SPR register, ATBU, is defined that accesses only the upper 32 bits of the counter.
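A separately readable upper half matters when a 64-bit counter has to be read 32 bits at a time. The retry sequence given in Section 4.2.1 for TBU and TBL can be modeled as below. This is a sketch under stated assumptions: `Counter64`, its methods, and `read_counter` are hypothetical names, the counter is made to advance on every read purely to force a carry, and whether the same retry is required for ATBU and ATBL is an assumption rather than something this section states.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


class Counter64:
    """Toy 64-bit counter that advances by `step` ticks on every register read."""

    def __init__(self, value: int, step: int = 1) -> None:
        self.value = value & MASK64
        self.step = step

    def _sample(self) -> int:
        current = self.value
        self.value = (self.value + self.step) & MASK64
        return current

    def read_upper(self) -> int:
        return self._sample() >> 32          # models mfspr from the upper half

    def read_lower(self) -> int:
        return self._sample() & 0xFFFF_FFFF  # models mfspr from the lower half


def read_counter(counter: Counter64) -> int:
    """Read upper, lower, upper again; retry if the upper half changed (carry)."""
    while True:
        upper = counter.read_upper()
        lower = counter.read_lower()
        if counter.read_upper() == upper:
            return (upper << 32) | lower


# Start one tick before the lower half carries into the upper half.
c = Counter64(0x0000_0001_FFFF_FFFF)
print(hex(read_counter(c)))
```

Without the second read of the upper half, the first pass in this example would combine an upper half of 1 with a lower half of 0, a value the counter never held.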
Thus the upper 32 bits of the counter may be read into a register by reading the ATBU register.

The effect of entering a power-savings mode or of processor frequency changes on counting in the Alternate Time Base is implementation-dependent.

Chapter 5. External Control [Category: External Control]

The External Control category of facilities and instructions permits a program to communicate with a special-purpose device. Two instructions are provided, both of which must be implemented if the facility is provided.

• External Control In Word Indexed (eciwx), which does the following:
  - Computes an effective address (EA) like most X-form instructions
  - Validates the EA as would be done for a load from that address
  - Translates the EA to a real address
  - Transmits the real address to the device
  - Accepts a word of data from the device and places it into a General Purpose Register

• External Control Out Word Indexed (ecowx), which does the following:
  - Computes an effective address (EA) like most X-form instructions
  - Validates the EA as would be done for a store to that address
  - Translates the EA to a real address
  - Transmits the real address and a word of data from a General Purpose Register to the device

Permission to execute these instructions and identification of the target device are controlled by two fields, called the E bit and the RID field respectively. If an attempt is made to execute either of these instructions when E=0, the system data storage error handler is invoked. The location of these fields is described in Book III.

The storage access caused by eciwx and ecowx is performed as though the specified storage location is Caching Inhibited and Guarded, and is neither Write Through Required nor Memory Coherence Required.

Interpretation of the real address transmitted by eciwx and ecowx and of the 32-bit value transmitted by ecowx is up to the target device, and is not specified by the Power ISA. See the System Architecture documentation for a given Power ISA system for details on how the External Control facility can be used with devices on that system.

Example
An example of a device designed to be used with the External Control facility might be a graphics adapter. The ecowx instruction might be used to send the device the translated real address of a buffer containing graphics data, and the word transmitted from the General Purpose Register might be control information that tells the adapter what operation to perform on the data in the buffer. The eciwx instruction might be used to load status information from the adapter.

A device designed to be used with the External Control facility may also recognize events that indicate that the address translation being used by the processor has changed. In this case the operating system need not "pin" the area of storage identified by an eciwx or ecowx instruction (i.e., need not protect it from being paged out).

5.1 External Access Instructions

In the instruction descriptions the statements "this instruction is treated as a Load" and "this instruction is treated as a Store" have the same meanings as for the Cache Management instructions; see Section 3.2.

External Control In Word Indexed X-form

eciwx RT,RA,RB

    | 31 | RT | RA | RB | 310 | / |
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    raddr ← address translation of EA
    send load word request for raddr to device identified by RID
    RT ← ³²0 || word from device

Let the effective address (EA) be the sum (RA|0)+(RB).

A load word request for the real address corresponding to EA is sent to the device identified by RID, bypassing the cache. The word returned by the device is placed into RT_{32:63}. RT_{0:31} are set to 0.

The E bit must be 1. If it is not, the data storage error handler is invoked.

EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

This instruction is treated as a Load.

See Book III-S for additional information about this instruction.

Special Registers Altered:
    None

Programming Note
The eieio or mbar instruction can be used to ensure that the storage accesses caused by eciwx and ecowx are performed in program order with respect to other Caching Inhibited and Guarded storage accesses.

External Control Out Word Indexed X-form

ecowx RS,RA,RB

    | 31 | RS | RA | RB | 438 | / |
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    raddr ← address translation of EA
    send store word request for raddr to device identified by RID
    send (RS)_{32:63} to device

Let the effective address (EA) be the sum (RA|0)+(RB).

A store word request for the real address corresponding to EA and the contents of RS_{32:63} are sent to the device identified by RID, bypassing the cache.

The E bit must be 1. If it is not, the data storage error handler is invoked.

EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

This instruction is treated as a Store, except that its storage access is not performed in program order with respect to accesses to other Caching Inhibited and Guarded storage locations unless software explicitly imposes that order.

See Book III-S for additional information about this instruction.

Special Registers Altered:
    None

Appendix A. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided for certain instructions. This appendix defines extended mnemonics and symbols related to instructions defined in Book II.

Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

A.1 Data Cache Block Flush Mnemonics

The L field in the Data Cache Block Flush instruction controls the scope of the flush function performed by the instruction. Extended mnemonics are provided that represent the L value in the mnemonic rather than requiring it to be coded as a numeric operand.

Note: dcbf serves as both a basic and an extended mnemonic. The Assembler will recognize a dcbf mnemonic with three operands as the basic form, and a dcbf mnemonic with two operands as the extended form. In the extended form the L operand is omitted and assumed to be 0.

    dcbf  RA,RB     (equivalent to: dcbf RA,RB,0)
    dcbfl RA,RB     (equivalent to: dcbf RA,RB,1)

A.2 Synchronize Mnemonics

The L field in the Synchronize instruction controls the scope of the synchronization function performed by the instruction. Extended mnemonics are provided that represent the L value in the mnemonic rather than requiring it to be coded as a numeric operand. Two extended mnemonics are provided for the L=0 value in order to support assemblers that do not recognize the sync mnemonic.

Note: sync serves as both a basic and an extended mnemonic. The Assembler will recognize a sync mnemonic with one operand as the basic form, and a sync mnemonic with no operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

    sync            (equivalent to: sync 0)
    msync           (equivalent to: sync 0)
    lwsync          (equivalent to: sync 1)
    ptesync         (equivalent to: sync 2)

Appendix B. Programming Examples for Sharing Storage

This appendix gives examples of how dependencies and the Synchronization instructions can be used to control storage access ordering when storage is shared between programs.

Many of the examples use extended mnemonics (e.g., bne, bne-, cmpw) that are defined in Appendix D of Book I.

Many of the examples use the Load And Reserve and Store Conditional instructions, in a sequence that begins with a Load And Reserve instruction and ends with a Store Conditional instruction (specifying the same storage location as the Load And Reserve) followed by a Branch Conditional instruction that tests whether the Store Conditional instruction succeeded.

In these examples it is assumed that contention for the shared resource is low; the conditional branches are optimized for this case by using "+" and "-" suffixes appropriately.

The examples deal with words; they can be used for doublewords by changing all word-specific mnemonics to the corresponding doubleword-specific mnemonics (e.g., lwarx to ldarx, cmpw to cmpd).

In this appendix it is assumed that all shared storage locations are in storage that is Memory Coherence Required, and that the storage locations specified by Load And Reserve and Store Conditional instructions are in storage that is neither Write Through Required nor Caching Inhibited.

B.1 Atomic Update Primitives

This section gives examples of how the Load And Reserve and Store Conditional instructions can be used to emulate atomic read/modify/write operations.

An atomic read/modify/write operation reads a storage location and writes its next value, which may be a function of its current value, all as a single atomic operation. The examples shown provide the effect of an atomic read/modify/write operation, but use several instructions rather than a single atomic instruction.
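The retry pattern that these primitives share can be illustrated with a small software model. The sketch below is not the architecture's definition of a reservation: `ReservedMemory`, its method names, and the `interfere` hook are hypothetical, the model tracks a single reservation by exact address rather than per processor and per reservation granule, and an ordinary store stands in for a store by another processor.

```python
class ReservedMemory:
    """Toy single-reservation model of Load And Reserve / Store Conditional."""

    def __init__(self):
        self.words = {}
        self.reservation = None  # address currently reserved, if any

    def lwarx(self, addr):
        """Load a word and establish a reservation on its address."""
        self.reservation = addr
        return self.words.get(addr, 0)

    def store(self, addr, value):
        """Ordinary store (models another processor); clears a matching reservation."""
        if self.reservation == addr:
            self.reservation = None
        self.words[addr] = value

    def stwcx(self, addr, value):
        """Store only if the reservation still holds; report success like CR0[EQ]."""
        ok = self.reservation == addr
        self.reservation = None
        if ok:
            self.words[addr] = value
        return ok


def fetch_and_add(mem, addr, inc, interfere=None):
    """Return (old value, number of attempts); retries while the reservation is lost."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.lwarx(addr)                        # load and reserve
        if interfere is not None and attempts == 1:
            interfere(mem)                           # simulated competing store
        if mem.stwcx(addr, (old + inc) & 0xFFFF_FFFF):
            return old, attempts                     # store succeeded
        # reservation lost: loop, as a bne- back to the load would


mem = ReservedMemory()
mem.words[0x100] = 5
print(fetch_and_add(mem, 0x100, 3))
print(fetch_and_add(mem, 0x100, 3, interfere=lambda m: m.store(0x100, 20)))
```

In the second call the competing store clears the reservation, so the first conditional store fails and the loop repeats with the newly stored value; the update is never based on a stale read.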
Fetch and No-op

The "Fetch and No-op" primitive atomically loads the current value in a word in storage.

In this example it is assumed that the address of the word to be loaded is in GPR 3 and the data loaded are returned in GPR 4.

    loop:
        lwarx  r4,0,r3      #load and reserve
        stwcx. r4,0,r3      #store old value if
                            # still reserved
        bne-   loop         #loop if lost reservation

Note:

1. The stwcx., if it succeeds, stores to the target location the same value that was loaded by the preceding lwarx. While the store is redundant with respect to the value in the location, its success ensures that the value loaded by the lwarx is still the current value at the time the stwcx. is executed.

Fetch and Store

The "Fetch and Store" primitive atomically loads and replaces a word in storage.

In this example it is assumed that the address of the word to be loaded and replaced is in GPR 3, the new value is in GPR 4, and the old value is returned in GPR 5.

    loop:
        lwarx  r5,0,r3      #load and reserve
        stwcx. r4,0,r3      #store new value if
                            # still reserved
        bne-   loop         #loop if lost reservation

Fetch and Add

The "Fetch and Add" primitive atomically increments a word in storage.

In this example it is assumed that the address of the word to be incremented is in GPR 3, the increment is in GPR 4, and the old value is returned in GPR 5.

    loop:
        lwarx  r5,0,r3      #load and reserve
        add    r0,r4,r5     #increment word
        stwcx. r0,0,r3      #store new value if still res'ved
        bne-   loop         #loop if lost reservation

Fetch and AND

The "Fetch and AND" primitive atomically ANDs a value into a word in storage.

In this example it is assumed that the address of the word to be ANDed is in GPR 3, the value to AND into it is in GPR 4, and the old value is returned in GPR 5.

    loop:
        lwarx  r5,0,r3      #load and reserve
        and    r0,r4,r5     #AND word
        stwcx. r0,0,r3      #store new value if still res'ved
        bne-   loop         #loop if lost reservation

Note:

1. The sequence given above can be changed to perform another Boolean operation atomically on a word in storage, simply by changing the and instruction to the desired Boolean instruction (or, xor, etc.).

Test and Set

This version of the "Test and Set" primitive atomically loads a word from storage, sets the word in storage to a nonzero value if the value loaded is zero, and sets the EQ bit of CR Field 0 to indicate whether the value loaded is zero.

In this example it is assumed that the address of the word to be tested is in GPR 3, the new value (nonzero) is in GPR 4, and the old value is returned in GPR 5.

    loop:
        lwarx  r5,0,r3      #load and reserve
        cmpwi  r5,0         #done if word not equal to 0
        bne-   exit
        stwcx. r4,0,r3      #try to store non-0
        bne-   loop         #loop if lost reservation
    exit: ...

Compare and Swap

The "Compare and Swap" primitive atomically compares a value in a register with a word in storage, if they are equal stores the value from a second register into the word in storage, if they are unequal loads the word from storage into the first register, and sets the EQ bit of CR Field 0 to indicate the result of the comparison.

In this example it is assumed that the address of the word to be tested is in GPR 3, the comparand is in GPR 4 and the old value is returned there, and the new value is in GPR 5.

    loop:
        lwarx  r6,0,r3      #load and reserve
        cmpw   r4,r6        #1st 2 operands equal?
        bne-   exit         #skip if not
        stwcx. r5,0,r3      #store new value if still res'ved
        bne-   loop         #loop if lost reservation
    exit:
        mr     r4,r6        #return value from storage

Notes:

1. The semantics given for "Compare and Swap" above are based on those of the IBM System/370 Compare and Swap instruction. Other architectures may define a Compare and Swap instruction differently.

2. "Compare and Swap" is shown primarily for pedagogical reasons. It is useful on machines that lack the better synchronization facilities provided by lwarx and stwcx.. A major weakness of a System/370-style Compare and Swap instruction is that, although the instruction itself is atomic, it checks only that the old and current values of the word being tested are equal, with the result that programs that use such a Compare and Swap to control a shared resource can err if the word has been modified and the old value subsequently restored. The sequence shown above has the same weakness.

3. In some applications the second bne- instruction and/or the mr instruction can be omitted. The bne- is needed only if the application requires that if the EQ bit of CR Field 0 on exit indicates "not equal" then (r4) and (r6) are in fact not equal. The mr is needed only if the application requires that if the comparands are not equal then the word from storage is loaded into the register with which it was compared (rather than into a third register). If either or both of these instructions is omitted, the resulting Compare and Swap does not obey System/370 semantics.

B.2 Lock Acquisition and Release, and Related Techniques

This section gives examples of how dependencies and the Synchronization instructions can be used to implement locks, import and export barriers, and similar constructs.

B.2.1 Lock Acquisition and Import Barriers

An "import barrier" is an instruction or sequence of instructions that prevents storage accesses caused by instructions following the barrier from being performed before storage accesses that acquire a lock have been performed. An import barrier can be used to ensure that a shared data structure protected by a lock is not accessed until the lock has been acquired. A sync instruction can be used as an import barrier, but the approaches shown below will generally yield better performance because they order only the relevant storage accesses.

B.2.1.1 Acquire Lock and Import Shared Storage

If lwarx and stwcx. instructions are used to obtain the lock, an import barrier can be constructed by placing an isync instruction immediately following the loop containing the lwarx and stwcx.. The following example uses the "Compare and Swap" primitive to acquire the lock.

In this example it is assumed that the address of the lock is in GPR 3, the value indicating that the lock is free is in GPR 4, the value to which the lock should be set is in GPR 5, the old value of the lock is returned in GPR 6, and the address of the shared data structure is in GPR 9.

    loop:
        lwarx  r6,0,r3      #load lock and reserve
        cmpw   r4,r6        #skip ahead if
        bne-   wait         # lock not free
        stwcx. r5,0,r3      #try to set lock
        bne-   loop         #loop if lost reservation
        isync               #import barrier
        lwz    r7,data1(r9) #load shared data
        .
        .
    wait: ...               #wait for lock to free

The second bne- does not complete until CR0 has been set by the stwcx.. The stwcx. does not set CR0 until it has completed (successfully or unsuccessfully). The lock is acquired when the stwcx. completes successfully. Together, the second bne- and the subsequent isync create an import barrier that prevents the load from "data1" from being performed until the branch has been resolved not to be taken.

If the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, an lwsync instruction can be used instead of the isync instruction. If lwsync is used, the load from "data1" may be performed before the stwcx.. But if the stwcx. fails, the second branch is taken and the lwarx is re-executed. If the stwcx. succeeds, the value returned by the load from "data1" is valid even if the load is performed before the stwcx., because the lwsync ensures that the load is performed after the instance of the lwarx that created the reservation used by the successful stwcx..

B.2.1.2 Obtain Pointer and Import Shared Storage

If lwarx and stwcx. instructions are used to obtain a pointer into a shared data structure, an import barrier is not needed if all the accesses to the shared data structure depend on the value obtained for the pointer. The following example uses the "Fetch and Add" primitive to obtain and increment the pointer.

In this example it is assumed that the address of the pointer is in GPR 3, the value to be added to the pointer is in GPR 4, and the old value of the pointer is returned in GPR 5.

    loop:
        lwarx  r5,0,r3      #load pointer and reserve
        add    r0,r4,r5     #increment the pointer
        stwcx. r0,0,r3      #try to store new value
        bne-   loop         #loop if lost reservation
        lwz    r7,data1(r5) #load shared data

The load from "data1" cannot be performed until the pointer value has been loaded into GPR 5 by the lwarx. The load from "data1" may be performed before the stwcx.. But if the stwcx. fails, the branch is taken and the value returned by the load from "data1" is discarded. If the stwcx. succeeds, the value returned by the load from "data1" is valid even if the load is performed before the stwcx., because the load uses the pointer value returned by the instance of the lwarx that created the reservation used by the successful stwcx..

An isync instruction could be placed between the bne- and the subsequent lwz, but no isync is needed if all accesses to the shared data structure depend on the value returned by the lwarx.

B.2.2 Lock Release and Export Barriers

An "export barrier" is an instruction or sequence of instructions that prevents the store that releases a lock from being performed before stores caused by instructions preceding the barrier have been performed. An export barrier can be used to ensure that all stores to a shared data structure protected by a lock will be performed with respect to any other processor before the store that releases the lock is performed with respect to that processor.

B.2.2.1 Export Shared Storage and Release Lock

A sync instruction can be used as an export barrier independent of the storage control attributes (e.g., presence or absence of the Caching Inhibited attribute) of the storage containing the shared data structure. Because the lock must be in storage that is neither Write Through Required nor Caching Inhibited, if the shared data structure is in storage that is Write Through Required or Caching Inhibited a sync instruction must be used as the export barrier.

In this example it is assumed that the shared data structure is in storage that is Caching Inhibited, the address of the lock is in GPR 3, the value indicating that the lock is free is in GPR 4, and the address of the shared data structure is in GPR 9.

        stw    r7,data1(r9) #store shared data (last)
        sync                #export barrier
        stw    r4,lock(r3)  #release lock

The sync ensures that the store that releases the lock will not be performed with respect to any other processor until all stores caused by instructions preceding the sync have been performed with respect to that processor.

B.2.2.2 Export Shared Storage and Release Lock using lwsync

If the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, an lwsync instruction can be used as the export barrier. Using lwsync rather than sync will yield better performance in most systems.

In this example it is assumed that the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, the address of the lock is in GPR 3, the value indicating that the lock is free is in GPR 4, and the address of the shared data structure is in GPR 9.

        stw    r7,data1(r9) #store shared data (last)
        lwsync              #export barrier
        stw    r4,lock(r3)  #release lock

The lwsync ensures that the store that releases the lock will not be performed with respect to any other processor until all stores caused by instructions preceding the lwsync have been performed with respect to that processor.

B.2.3 Safe Fetch

If a load must be performed before a subsequent store (e.g., the store that releases a lock protecting a shared data structure), a technique similar to the following can be used.

In this example it is assumed that the address of the storage operand to be loaded is in GPR 3, the contents of the storage operand are returned in GPR 4, and the address of the storage operand to be stored is in GPR 5.

        lwz    r4,0(r3)     #load shared data
        cmpw   r4,r4        #set CR0 to "equal"
        bne-   $-8          #branch never taken
        stw    r7,0(r5)     #store other shared data

An alternative is to use a technique similar to that described in Section B.2.1.2, by causing the stw to depend on the value returned by the lwz and omitting the cmpw and bne-. The dependency could be created by ANDing the value returned by the lwz with zero and then adding the result to the value to be stored by the stw. If both storage operands are in storage that is neither Write Through Required nor Caching Inhibited, another alternative is to replace the cmpw and bne- with an lwsync instruction.

B.3 List Insertion

This section shows how the lwarx and stwcx. instructions can be used to implement simple insertion into a singly linked list. (Complicated list insertion, in which multiple values must be changed atomically, or in which the correct order of insertion depends on the contents of the elements, cannot be implemented in the manner shown below and requires a more complicated strategy such as using locks.)

The "next element pointer" from the list element after which the new element is to be inserted, here called the "parent element", is stored into the new element, so that the new element points to the next element in the list; this store is performed unconditionally. Then the address of the new element is conditionally stored into the parent element, thereby adding the new element to the list.

In this example it is assumed that the address of the parent element is in GPR 3, the address of the new element is in GPR 4, and the next element pointer is at offset 0 from the start of the element. It is also assumed that the next element pointer of each list element is in a reservation granule separate from that of the next element pointer of all other list elements.

    loop:
        lwarx  r2,0,r3      #get next pointer
        stw    r2,0(r4)     #store in new element
        lwsync or sync      #order stw before stwcx
        stwcx. r4,0,r3      #add new element to list
        bne-   loop         #loop if stwcx. failed

In the preceding example, if two list elements have next element pointers in the same reservation granule then, in a multiprocessor, "livelock" can occur. (Livelock is a state in which processors interact in a way such that no processor makes forward progress.)

If it is not possible to allocate list elements such that each element's next element pointer is in a different reservation granule, then livelock can be avoided by using the following, more complicated, sequence.

        lwz    r2,0(r3)     #get next pointer
    loop1:
        mr     r5,r2        #keep a copy
        stw    r2,0(r4)     #store in new element
        sync                #order stw before stwcx.
                            # and before lwarx
    loop2:
        lwarx  r2,0,r3      #get it again
        cmpw   r2,r5        #loop if changed (someone
        bne-   loop1        # else progressed)
        stwcx. r4,0,r3      #add new element to list
        bne-   loop2        #loop if failed

In the preceding example, livelock is avoided by the fact that each processor re-executes the stw only if some other processor has made forward progress.

B.4 Notes

1. To increase the likelihood that forward progress is made, it is important that looping on lwarx/stwcx. pairs be minimized. For example, in the "Test and Set" sequence shown in Section B.1, this is achieved by testing the old value before attempting the store; were the order reversed, more stwcx. instructions might be executed, and reservations might more often be lost between the lwarx and the stwcx.

2. The manner in which lwarx and stwcx. are communicated to other processors and mechanisms, and between levels of the storage hierarchy within a given processor, is implementation-dependent. In some implementations performance may be improved by minimizing looping on a lwarx instruction that fails to return a desired value. For example, in the "Test and Set" sequence shown in Section B.1, if the programmer wishes to stay in the loop until the word loaded is zero, the programmer could change the "bne- exit" to "bne- loop". However, in some implementations better performance may be obtained by using an ordinary Load instruction to do the initial checking of the value, as follows.

    loop:
        lwz    r5,0(r3)     #load the word
        cmpwi  r5,0         #loop back if word
        bne-   loop         # not equal to 0
        lwarx  r5,0,r3      #try again, reserving
        cmpwi  r5,0         # (likely to succeed)
        bne-   loop
        stwcx. r4,0,r3      #try to store non-0
        bne-   loop         #loop if lost reserv'n

3. In a multiprocessor, livelock is possible if there is a Store instruction (or any other instruction that can clear another processor's reservation; see Section 1.7.3.1) between the lwarx and the stwcx. of a lwarx/stwcx. loop and any byte of the storage location specified by the Store is in the reservation granule. For example, the first code sequence shown in Section B.3 can cause livelock if two list elements have next element pointers in the same reservation granule.
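The simple list insertion of Section B.3 can be modeled in software to show why the conditional store keeps the list consistent when another insertion intervenes. This is a sketch under stated assumptions, not an implementation of the architecture: `Storage`, `insert_after`, `walk`, and the `interfere` hook are hypothetical names, the reservation is tracked by exact address rather than by reservation granule, and the memory barrier in the assembler sequence has no counterpart because the model executes sequentially.

```python
class Storage:
    """Toy word-addressed storage with a single reservation."""

    def __init__(self):
        self.words = {}
        self.reservation = None

    def load(self, addr):
        return self.words.get(addr, 0)

    def store(self, addr, value):
        # A store to the reserved address clears the reservation.
        if self.reservation == addr:
            self.reservation = None
        self.words[addr] = value

    def lwarx(self, addr):
        self.reservation = addr
        return self.load(addr)

    def stwcx(self, addr, value):
        ok = self.reservation == addr
        self.reservation = None
        if ok:
            self.words[addr] = value
        return ok


def insert_after(mem, parent, new, interfere=None):
    """Insert element `new` after `parent`; the next pointer is at offset 0."""
    tries = 0
    while True:
        tries += 1
        nxt = mem.lwarx(parent)      # get next pointer
        mem.store(new, nxt)          # store in new element (unconditional)
        if interfere is not None and tries == 1:
            interfere(mem)           # simulated insertion by another processor
        if mem.stwcx(parent, new):   # add new element to list
            return tries


def walk(mem, head):
    """Follow next pointers from `head` until a null (0) pointer."""
    out, addr = [], mem.load(head)
    while addr != 0:
        out.append(addr)
        addr = mem.load(addr)
    return out


def other_insert(m):
    # Another processor links element 0x40 in after the same parent.
    m.store(0x40, m.load(0x10))
    m.store(0x10, 0x40)


mem = Storage()
mem.words[0x10] = 0x20   # parent element at 0x10 points to element 0x20
print(insert_after(mem, 0x10, 0x30, interfere=other_insert), walk(mem, 0x10))
```

The first attempt fails because the competing store to the parent's next pointer clears the reservation; the retry reloads the pointer, so the new element ends up linked ahead of the competitor's element rather than overwriting it.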
Book III-S: Power ISA Operating Environment Architecture - Server Environment

Chapter 1. Introduction

    1.1 Overview . . . . . . . . . . . . . . . 393
    1.2 Document Conventions . . . . . . . . . 393
    1.2.1 Definitions and Notation . . . . . . 393
    1.2.2 Reserved Fields . . . . . . . . . . . 394
    1.3 General Systems Overview . . . . . . . 394
    1.4 Exceptions . . . . . . . . . . . . . . 394
    1.5 Synchronization . . . . . . . . . . . . 395
    1.5.1 Context Synchronization . . . . . . . 395
    1.5.2 Execution Synchronization . . . . . . 395

1.1 Overview

Chapter 1 of Book I describes computation modes, document conventions, a general systems overview, instruction formats, and storage addressing. This chapter augments that description as necessary for the Power ISA Operating Environment Architecture.

1.2 Document Conventions

The notation and terminology used in Book I apply to this Book also, with the following substitutions.

■ For "system alignment error handler" substitute "Alignment interrupt".
■ For "system data storage error handler" substitute "Data Storage interrupt", "Hypervisor Data Storage interrupt", "Data Segment interrupt", or "Hypervisor Data Segment interrupt," as appropriate.
■ For "system error handler" substitute "interrupt".
■ For "system floating-point enabled exception error handler" substitute "Floating-Point Enabled Exception type Program interrupt".
■ For "system illegal instruction error handler" substitute "Illegal Instruction type Program interrupt"
■ For "system instruction storage error handler" substitute "Instruction Storage interrupt", "Hypervisor Instruction Storage interrupt", "Instruction Segment interrupt", or "Hypervisor Instruction Segment interrupt", as appropriate.
■ For "system privileged instruction error handler" substitute "Privileged Instruction type Program interrupt".
■ For "system service program" substitute "System Call interrupt".
■ For "system trap handler" substitute "Trap type Program interrupt".

1.2.1 Definitions and Notation

The definitions and notation given in Book I are augmented by the following.

■ real page
  A unit of real storage that is aligned at a boundary that is a multiple of its size. The real page size is 4KB.

■ context of a program
  The processor state (e.g., privilege and relocation) in which the program executes. The context is controlled by the contents of certain System Registers, such as the MSR and SDR1, of certain lookaside buffers, such as the SLB and TLB, and of the Page Table.

■ exception
  An error, unusual condition, or external signal, that may set a status bit and may or may not cause an interrupt, depending upon whether the corresponding interrupt is enabled.

■ interrupt
  The act of changing the machine state in response to an exception, as described in Chapter 6. "Interrupts" on page 459.

■ trap interrupt
  An interrupt that results from execution of a Trap instruction.

■ Additional exceptions to the rule that the processor obeys the sequential execution model, beyond those described in the section entitled "Instruction Fetching" in Book I, are the following.

  - A System Reset or Machine Check interrupt may occur. The determination of whether an instruction is required by the sequential execution model is not affected by the potential occurrence of a System Reset or Machine Check interrupt. (The determination is affected by the potential occurrence of any other kind of interrupt.)
  - A context-altering instruction is executed (Chapter 10. "Synchronization Requirements for Context Alterations" on page 489). The context alteration need not take effect until the required subsequent synchronizing operation has occurred.
  - A Reference and Change bit is updated by the processor. The update need not be performed with respect to that processor until the required subsequent synchronizing operation has occurred.

■ "must"
  If hypervisor software violates a rule that is stated using the word "must" (e.g., "this field must be set to 0"), and the rule pertains to the contents of a hypervisor resource, to executing an instruction that can be executed only in hypervisor state, or to accessing storage in real addressing mode, the results are undefined, and may include altering resources belonging to other partitions, causing the system to "hang", etc.

■ hardware
  Any combination of hard-wired implementation, emulation assist, or interrupt for software assistance. In the last case, the interrupt may be to an architected location or to an implementation-dependent location. Any use of emulation assists or interrupts to implement the architecture is implementation-dependent.

■ privileged state and supervisor mode
  Used interchangeably to refer to a processor state in which privileged facilities are available.

■ problem state and user mode
  Used interchangeably to refer to a processor state in which privileged facilities are not available.

■ /, //, ///, ... denotes a field that is reserved in an instruction, in a register, or in an architected storage table.

■ ?, ??, ???, ... denotes a field that is implementation-dependent in an instruction, in a register, or in an architected storage table.

1.2.2 Reserved Fields

Book I's description of the handling of reserved bits in System Registers, and of reserved values of defined fields of System Registers, applies also to the SLB. Book I's description of the handling of reserved values of defined fields of System Registers applies also to architected storage tables (e.g., the Page Table).

Some fields of certain architected storage tables may be written to automatically by the processor, e.g., Reference and Change bits in the Page Table. When the processor writes to such a table, the following rules are obeyed.

■ Unless otherwise stated, no defined field other than the one(s) the processor is specifically updating are modified.
■ Contents of reserved fields are either preserved by the processor or written as zero.

Programming Note
  Software should set reserved fields in the SLB and in architected storage tables to zero, because these fields may be assigned a meaning in some future version of the architecture.

1.3 General Systems Overview

The processor or processor unit contains the sequencing and processing controls for instruction fetch, instruction execution, and interrupt action. Most implementations also contain data and instruction caches. Instructions that the processing unit can execute fall into the following classes:

■ instructions executed in the Branch Processor
■ instructions executed in the Fixed-Point Processor
■ instructions executed in the Floating-Point Processor
■ instructions executed in the Vector Processor

Almost all instructions executed in the Branch Processor, Fixed-Point Processor, Floating-Point Processor, and Vector Processor are nonprivileged and are described in Book I. Book II may describe additional nonprivileged instructions (e.g., Book II describes some nonprivileged instructions for cache management). Instructions related to the privileged state of the processor, control of processor resources, control of the storage hierarchy, and all other privileged instructions are described here or are implementation-dependent.

1.4 Exceptions

The following augments the exceptions defined in Book I that can be caused directly by the execution of an instruction:

■ the execution of a floating-point instruction when MSR[FP]=0 (Floating-Point Unavailable interrupt)
■ an attempt to modify a hypervisor resource when the processor is in privileged but non-hypervisor state (see Chapter 2), or an attempt to execute a hypervisor-only instruction (e.g., tlbie) when the processor is in privileged but non-hypervisor state
■ the execution of a traced instruction (Trace interrupt)
■ the execution of a Vector instruction when the vector processor is unavailable (Vector Unavailable interrupt)

1.5 Synchronization

The synchronization described in this section refers to the state of the processor that is performing the synchronization.

1.5.1 Context Synchronization

An instruction or event is context synchronizing if it satisfies the requirements listed below. Such instructions and events are collectively called context synchronizing operations. The context synchronizing operations are the isync instruction, the System Linkage instructions, the mtmsr[d] instructions with L=0, and most interrupts (see Section 6.4).

1. The operation causes instruction dispatching (the issuance of instructions by the instruction fetching mechanism to any instruction execution mechanism) to be halted.

2. The operation is not initiated or, in the case of isync, does not complete, until all instructions that precede the operation have completed to a point at which they have reported all exceptions they will cause.

3. The operation ensures that the instructions that precede the operation will complete execution in the context (privilege, relocation, storage protection, etc.) in which they were initiated, except that the operation has no effect on the context in which the associated Reference and Change bit updates are performed.

4. If the operation directly causes an interrupt (e.g., sc directly causes a System Call interrupt) or is an interrupt, the operation is not initiated until no exception exists having higher priority than the exception associated with the interrupt (see Section 6.8).

5. The operation ensures that the instructions that follow the operation will be fetched and executed in the context established by the operation. (This requirement dictates that any prefetched instructions be discarded and that any effects and side effects of executing them out-of-order also be discarded, except as described in Section 5.5, "Performing Operations Out-of-Order".)

Programming Note
  A context synchronizing operation is necessarily execution synchronizing; see Section 1.5.2.

  Unlike the Synchronize instruction, a context synchronizing operation does not affect the order in which storage accesses are performed.

  Item 2 permits a choice only for isync (and sync and ptesync; see Section 1.5.2) because all other execution synchronizing operations also alter context.

1.5.2 Execution Synchronization

An instruction is execution synchronizing if it satisfies items 2 and 3 of the definition of context synchronization (see Section 1.5.1). sync and ptesync are treated like isync with respect to item 2 (i.e., the conditions described in item 2 apply to the completion of sync and ptesync). The execution synchronizing instructions are sync, ptesync, the mtmsr[d] instructions with L=1, and all context synchronizing instructions.

Programming Note
  All context synchronizing instructions are execution synchronizing.

  Unlike a context synchronizing operation, an execution synchronizing instruction does not ensure that the instructions following that instruction will execute in the context established by that instruction. This new context becomes effective sometime after the execution synchronizing instruction completes and before or at a subsequent context synchronizing operation.

Chapter 2. Logical Partitioning (LPAR)

    2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . 397
    2.2 Logical Partitioning Control Register (LPCR) . . . . . . . 397
    2.3 Real Mode Offset Register (RMOR) . . . . . . . . . . . . . 399
    2.4 Hypervisor Real Mode Offset Register (HRMOR) . . . . . . . 399
    2.5 Logical Partition Identification Register (LPIDR) . . . . . 399
    2.6 Other Hypervisor Resources . . . . . . . . . . . . . . . . 399
    2.7 Sharing Hypervisor Resources . . . . . . . . . . . . . . . 400
    2.8 Hypervisor Interrupt Little-Endian (HILE) Bit . . . . . . . 400

2.1 Overview

The Logical Partitioning (LPAR) facility permits processors and portions of real storage to be assigned to logical collections called partitions, such that a program executing on a processor in one partition cannot interfere with any program executing on a processor in a different partition. This isolation can be provided for both problem state and privileged state programs, by using a layer of trusted software, called a hypervisor program (or simply a "hypervisor"), and the resources provided by this facility to manage system resources. (A hypervisor is a program that runs in hypervisor state; see below.)

The number of partitions supported is implementation-dependent.

A processor is assigned to one partition at any given time. A processor can be assigned to any given partition without consideration of the physical configuration of the system (e.g., shared registers, caches, organization of the storage hierarchy), except that processors that share certain hypervisor resources may need to be assigned to the same partition; see Section 2.6. The registers and facilities used to control Logical Partitioning are listed below and described in the following subsections.

Except in the following subsections, references to the "operating system" in this document include the hypervisor unless otherwise stated or obvious from context.

2.2 Logical Partitioning Control Register (LPCR)

The layout of the Logical Partitioning Control Register (LPCR) is shown in Figure 1 below.

    | VC | /  | VRMASD | // | RMLS | ILE | // | LPES | RMI | HDICE |
    0    3    12       17   34     38    39   60     62    63

Figure 1. Logical Partitioning Control Register

The contents of the LPCR control a number of aspects of the operation of the processor with respect to a logical partition. Below are shown the bit definitions for the LPCR.

Bit    Description

0:2    Virtualization Control (VC)
       Controls the virtualization of partition memory. This field contains two subfields, VPM and ISL.

       0:1  Virtualized Partition Memory (VPM)
            This field controls whether VPM mode is enabled as specified below. (See Section 5.7.3.4, "Virtual Real Mode Addressing Mechanism" and Section 5.7.2, "Virtualized Partition Memory (VPM) Mode" for additional information on VPM mode.)

            Bit  Description
            0    This bit controls whether VPM mode is enabled when address translation is disabled
                 0 - VPM mode disabled
                 1 - VPM mode enabled
            1    This bit controls whether VPM mode is enabled when address translation is enabled
                 0 - VPM mode disabled
                 1 - VPM mode enabled

       2    Ignore SLB Large Page Specification (ISL)
            Controls whether ISL mode is enabled as specified below.
            0 - ISL mode disabled
            1 - ISL mode enabled
            When ISL mode is enabled and address translation is enabled and the processor is not in hypervisor state, address translation is performed as if the contents of SLB[L||LP] were 0b000. When address translation is disabled, the setting of the ISL bit has no effect. ISL mode has no effect on SLB, TLB, and ERAT entry invalidations caused by slbie, slbia, tlbia, tlbie, and tlbiel.

3:11   Reserved

12:16  Virtual Real Mode Area Segment Descriptor (VRMASD)
       When address translation is disabled and VPM[0]=1, the contents of this field specify the L and LP fields of the segment descriptor that apply for storage references to the virtualized real mode area (VRMA). See Section 5.7.3.4, "Virtual Real Mode Addressing Mechanism" for additional information. The definitions and allowed values of the L and LP fields are the same as for the corresponding fields in the segment descriptor. (See Section 5.7.7.) If VPM[0]=0 or address translation is enabled, the setting of the VRMASD has no effect.

       Bit  Description
       0    Virtual Page Size Selector Bit 0 (L)
       1:2  Reserved
       3:4  Virtual Page Size Selector Bits 1:2 (LP)

17:33  Reserved

34:37  Real Mode Limit Selector (RMLS)
       The RMLS field specifies the largest effective address that can be used by partition software when address translation is disabled. The valid RMLS values are implementation-dependent, and each value corresponds to a maximum effective address of 2^m, where m has a minimum value of 12 and a maximum value equal to the number of bits in the real address size supported by the implementation.

38     Interrupt Little-Endian (ILE)
       The contents of the ILE bit are copied into MSR[LE] by interrupts that set MSR[HV] to 0 (see Section 6.5), to establish the Endian mode for the interrupt handler.

39:59  Reserved

60:61  Logical Partitioning Environment Selector (LPES)
       Three of the four LPES values are supported. The 0b10 value is reserved.

       60   LPES0
            Controls whether External interrupts set MSR[HV] to 1 or leave it unchanged.
       61   LPES1
            Controls how storage is accessed when address translation is disabled, and whether a subset of interrupts set MSR[HV] to 1.

       Programming Note
         LPES1=0 provides an environment in which only the hypervisor can run with address translation disabled and in which all interrupts invoke the hypervisor. This value (along with MSR[HV]=1) can also be used in a system that is not partitioned, to permit the operating system to access all system resources.

62     Real Mode Caching Inhibited Bit (RMI)
       The RMI bit affects the manner in which storage accesses are performed in hypervisor state when address translation is disabled (see Section 5.7.3.3 on page 424).

       Programming Note
         Because in real addressing mode all storage is not Caching Inhibited (unless the Real Mode Caching Inhibited bit is 1), software should not map a Caching Inhibited virtual page to storage that is treated as non-Guarded in real addressing mode. Doing so could permit storage locations in the virtual page to be copied into the cache, which could lead to violations of the requirement given in Section 5.8.2.2 on page 441 for changing the value of the I bit. See also Section 5.7.3.3.1 on page 424.

63     Hypervisor Decrementer Interrupt Conditionally Enable (HDICE)
       0    Hypervisor Decrementer interrupts are disabled.
       1    Hypervisor Decrementer interrupts are enabled if permitted by MSR[EE], MSR[HV], and MSR[PR]; see Section 6.5.12 on page 473.

See Section 5.7.3 on page 422 (including subsections) and Section 5.7.9 on page 437 for a description of how storage accesses are affected by the setting of LPES1, RMLS, and RMI. See Section 6.5 on page 466 for a description of how the setting of LPES[0:1] affects the processing of interrupts.

2.3 Real Mode Offset Register (RMOR)

The layout of the Real Mode Offset Register (RMOR) is shown in Figure 2 below.

    | // | RMO                 |
    0    4                   63

    Bits  Name  Description
    4:63  RMO   Real Mode Offset

Figure 2. Real Mode Offset Register

All other fields are reserved.

The supported RMO values are the non-negative multiples of 2^s, where 2^s is the smallest implementation-dependent limit value representable by the contents of the Real Mode Limit Selector field of the LPCR.

The contents of the RMOR affect how some storage accesses are performed as described in Section 5.7.3 on page 422 and Section 5.7.4 on page 426.

2.4 Hypervisor Real Mode Offset Register (HRMOR)

The layout of the Hypervisor Real Mode Offset Register (HRMOR) is shown in Figure 3 below.

    | // | HRMO                |
    0    4                   63

    Bits  Name  Description
    4:63  HRMO  Real Mode Offset

Figure 3. Hypervisor Real Mode Offset Register

All other fields are reserved.

The supported HRMO values are the non-negative multiples of 2^r, where r is an implementation-dependent value and 12 ≤ r ≤ 26.

The contents of the HRMOR affect how some storage accesses are performed as described in Section 5.7.3 on page 422 and Section 5.7.4 on page 426.

2.5 Logical Partition Identification Register (LPIDR)

The layout of the Logical Partition Identification Register (LPIDR) is shown in Figure 4 below.

    | LPID                |
    32                  63

    Bits   Name  Description
    32:63  LPID  Logical Partition Identifier

Figure 4. Logical Partition Identification Register

The contents of the LPIDR identify the partition to which the processor is assigned, affecting operations necessary to manage the coherency of some translation lookaside buffers (see Section 5.10.1 on page 454 and Chapter 10 on page 489).

The supported LPID values consist of all non-negative values that are less than an implementation-dependent power of 2, 2^q, where 2^q ≥ (the maximum number of processors in a system) × 2.

Programming Note
  On some implementations, software must prevent the execution of a tlbie instruction on any processor for which the contents of the LPIDR is the same as on the processor on which the LPIDR is being modified or is the same as the new value being written to the LPIDR. This restriction can be met with less effort if one partition identity is used only on processors on which no tlbie instruction is ever executed. This partition can be thought of as the transfer partition used exclusively to move a processor from one partition to another.

2.6 Other Hypervisor Resources

In addition to the resources described above, the following resources are hypervisor resources, accessible to software only when the processor is in hypervisor state.

■ All implementation-specific resources, including implementation-specific registers (e.g., "HID" registers), that control hardware functions or affect the results of instruction execution. Examples include resources that disable caches, disable hardware error detection, set breakpoints, control power management, or significantly affect performance.
■ ME bit of the MSR
■ DABR, DABRX, EAR (if implemented), HDAR, HDSISR, Hypervisor Decrementer, PIR, PURR, SDR1, and Time Base. (Note: Although the Time Base and the PURR can be altered only by a hypervisor program, the Time Base can be read by all programs and the PURR can be read when the processor is in privileged state.)

The contents of a hypervisor resource can be modified by the execution of an instruction (e.g., mtspr) only in hypervisor state (MSR[HV PR] = 0b10). Whether an attempt to modify the contents of a given hypervisor resource, other than MSR[ME], in privileged but non-hypervisor state (MSR[HV PR] = 0b00) is ignored (i.e., treated as a no-op) or causes a Privileged Instruction type Program interrupt is implementation-dependent. An attempt to modify MSR[ME] in privileged but non-hypervisor state is ignored (i.e., the bit is not changed).

The tlbie, tlbiel, tlbia, and tlbsync instructions can be executed only in hypervisor state; see the descriptions of these instructions on pages 450 and 453.

Programming Note
  Because the SPRs listed above are privileged for writing, an attempt to modify the contents of any of these SPRs in problem state (MSR[PR]=1) using mtspr causes a Privileged Instruction type Program exception, and similarly for MSR[ME].

2.7 Sharing Hypervisor Resources

Some hypervisor resources may be shared among processors. Programs that modify these resources must be aware of this sharing, and must allow for the fact that changes to these resources may affect more than one processor. The following resources may be shared among processors.

■ RMOR (see Section 2.3.)
■ HRMOR (see Section 2.4.)
■ LPIDR (see Section 2.5.)
■ PVR (see Section 4.3.1.)
■ SDR1 (see Section 5.7.7.2.)
■ Time Base (see Section 7.2.)
■ Hypervisor Decrementer (see Section 7.4.)
■ certain implementation-specific registers

The set of resources that are shared is implementation-dependent.

Processors that share any of the resources listed above, with the exception of the PIR and the HRMOR, must be in the same partition.

For each field of the LPCR except the RMI field and the HDICE field, software must ensure that the contents of the field are identical among all processors that are in the same partition and are in a state such that the contents of the field could have side effects. (E.g., software must ensure that the contents of LPCR[LPES] are identical among all processors that are in the same partition and are not in hypervisor state.) For the HDICE field, software must ensure that the contents of the field are identical among all processors that share the Hypervisor Decrementer and are in a state such that the contents of the field could have side effects. (There are no identity requirements for the RMI field.)

2.8 Hypervisor Interrupt Little-Endian (HILE) Bit

The Hypervisor Interrupt Little-Endian (HILE) bit is a bit in an implementation-dependent register or similar mechanism. The contents of the HILE bit are copied into MSR[LE] by interrupts that set MSR[HV] to 1 (see Section 6.5), to establish the Endian mode for the interrupt handler. The HILE bit is set, by an implementation-dependent method, during system initialization, and cannot be modified after system initialization.

The contents of the HILE bit must be the same for all processors under the control of a given instance of the hypervisor; otherwise all results are undefined.

Chapter 3. Branch Processor

    3.1 Branch Processor Overview . . . . . . 401
    3.2 Branch Processor Registers . . . . . . 401
    3.2.1 Machine State Register . . . . . . . 401
    3.3 Branch Processor Instructions . . . . 404
    3.3.1 System Linkage Instructions . . . . 404

3.1 Branch Processor Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Branch Processor that are not covered in Book I.

3.2 Branch Processor Registers

3.2.1 Machine State Register

The Machine State Register (MSR) is a 64-bit register. This register defines the state of the processor. On interrupt, the MSR bits are altered in accordance with Figure 37 on page 466. The MSR can also be modified by the mtmsr[d], rfid, and hrfid instructions. It can be read by the mfmsr instruction.

    | MSR                 |
    0                   63

Figure 5. Machine State Register

Below are shown the bit definitions for the Machine State Register.

Bit    Description

0      Sixty-Four-Bit Mode (SF)
       0    The processor is in 32-bit mode.
       1    The processor is in 64-bit mode.

1:2    Reserved

3      Hypervisor State (HV)
       0    The processor is not in hypervisor state.
       1    If MSR[PR]=0 the processor is in hypervisor state; otherwise the processor is not in hypervisor state.

       Programming Note
         The privilege state of the processor is determined by MSR[HV] and MSR[PR], as follows.

         HV  PR
         0   0   privileged
         0   1   problem
         1   0   privileged and hypervisor
         1   1   problem

         MSR[HV] can be set to 1 only by the System Call instruction and some interrupts. It can be set to 0 only by rfid and hrfid.

4:37   Reserved

38     Vector Available (VEC) [Category: Vector]
       0    The processor cannot execute any vector instructions, including vector loads, stores, and moves.
       1    The processor can execute vector instructions.

39:46  Reserved

47     Reserved

48     External Interrupt Enable (EE)
       0    External and Decrementer interrupts are disabled.
       1    External and Decrementer interrupts are enabled.
       This bit also affects whether Hypervisor Decrementer interrupts are enabled; see Section 6.5.12 on page 473.

49     Problem State (PR)
       0    The processor is in privileged state.
       1    The processor is in problem state.

       Programming Note
         Any instruction that sets MSR[PR] to 1 also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

50     Floating-Point Available (FP) [Category: Floating-Point]
       0    The processor cannot execute any floating-point instructions, including floating-point loads, stores, and moves.
       1    The processor can execute floating-point instructions.

51     Machine Check Interrupt Enable (ME)
       0    Machine Check interrupts are disabled.
       1    Machine Check interrupts are enabled.
       This bit is a hypervisor resource; see Chapter 2., "Logical Partitioning (LPAR)", on page 397.

       Programming Note
         The only instructions that can alter MSR[ME] are rfid and hrfid.

52     Floating-Point Exception Mode 0 (FE0) [Category: Floating-Point]
       See below.

53     Single-Step Trace Enable (SE) [Category: Trace]
       0    The processor executes instructions normally.
       1    The processor generates a Single-Step type Trace interrupt after successfully completing the execution of the next instruction, unless that instruction is hrfid or rfid, which are never traced. Successful completion means that the instruction caused no other interrupt.

54     Branch Trace Enable (BE) [Category: Trace]
       0    The processor executes branch instructions normally.
       1    The processor generates a Branch type Trace interrupt after completing the execution of a branch instruction, whether or not the branch is taken.
       Branch tracing need not be supported on all implementations that support the Trace category. If the function is not implemented, this bit is treated as reserved.

55     Floating-Point Exception Mode 1 (FE1) [Category: Floating-Point]
       See below.

56:57  Reserved

58     Instruction Relocate (IR)
       0    Instruction address translation is disabled.
       1    Instruction address translation is enabled.

       Programming Note
         See the Programming Note in the definition of MSR[PR].

59     Data Relocate (DR)
       0    Data address translation is disabled. Effective Address Overflow (EAO) (see Book I) does not occur.
       1    Data address translation is enabled. EAO causes a Data Storage interrupt.

       Programming Note
         See the Programming Note in the definition of MSR[PR].

60     Reserved

61     Performance Monitor Mark (PMM) [Category: Server.Performance Monitor]
       See Appendix B of Book III-S.

62     Recoverable Interrupt (RI)
       0    Interrupt is not recoverable.
       1    Interrupt is recoverable.
       Additional information about the use of this bit is given in Sections 6.4.3, "Interrupt Processing" on page 463, 6.5.1, "System Reset Interrupt" on page 466, and 6.5.2, "Machine Check Interrupt" on page 467.

63     Little-Endian Mode (LE)
       0    The processor is in Big-Endian mode.
       1    The processor is in Little-Endian mode.

       Programming Note
         The only instructions that can alter MSR[LE] are rfid and hrfid.

The Floating-Point Exception Mode bits FE0 and FE1 are interpreted as shown below. For further details see Book I.

    FE0  FE1  Mode
    0    0    Ignore Exceptions
    0    1    Imprecise Nonrecoverable
    1    0    Imprecise Recoverable
    1    1    Precise

3.3 Branch Processor Instructions

3.3.1 System Linkage Instructions

These instructions provide the means by which a program can call upon the system to perform a service, and by which the system can return from performing a service or from processing an interrupt.

The System Call instruction is described in Book I, but only at the level required by an application programmer. A complete description of this instruction appears below.

System Call SC-form

sc    LEV

    | 17 | /// | /// | // | LEV | // | 1  | /  |
    0    6     11    16   20    27   30   31

    SRR0 ←iea CIA + 4
    SRR1[33:36 42:47] ← 0
    SRR1[0:32 37:41 48:63] ← MSR[0:32 37:41 48:63]
    MSR ← new_value (see below)
    NIA ← 0x0000_0000_0000_0C00

The effective address of the instruction following the System Call instruction is placed into SRR0. Bits 0:32, 37:41, and 48:63 of the MSR are placed into the corresponding bits of SRR1, and bits 33:36 and 42:47 of SRR1 are set to zero.

Then a System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 6.5, "Interrupt Definitions" on page 466. The setting of the MSR is affected by the contents of the LEV field. LEV values greater than 1 are reserved. Bits 0:5 of the LEV field (instruction bits 20:25) are treated as a reserved field.

The interrupt causes the next instruction to be fetched from effective address 0x0000_0000_0000_0C00.

This instruction is context synchronizing.

Special Registers Altered:
    SRR0 SRR1 MSR

Programming Note
  If LEV=1 the hypervisor is invoked. If LPES1=1, executing this instruction with LEV=1 is the only way that executing an instruction can cause hypervisor state to be entered.

  Because this instruction is not privileged, it is possible for application software to invoke the hypervisor. However, such invocation should be considered a programming error.

Programming Note
  sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form the LEV operand is omitted and assumed to be 0.
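The SRR0 and SRR1 updates performed by sc can be sketched as plain mask arithmetic. The C fragment below is an illustrative model only, not an implementation of the interrupt mechanism: the helper names `ibm_mask`, `sc_srr1`, and `sc_srr0` are hypothetical, and the new MSR value (which depends on LEV and Section 6.5) is deliberately not modeled. It assumes the architecture's bit numbering, in which bit 0 is the most significant bit of a 64-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits first..last of a 64-bit register, using the
   architecture's numbering (bit 0 = most significant bit). */
static uint64_t ibm_mask(unsigned first, unsigned last)
{
    assert(first <= last && last <= 63);
    unsigned width = last - first + 1;
    uint64_t ones = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    return ones << (63 - last);
}

/* SRR1 as set by sc: MSR bits 0:32, 37:41, and 48:63 are copied,
   and bits 33:36 and 42:47 are set to zero. */
static uint64_t sc_srr1(uint64_t msr)
{
    uint64_t zeroed = ibm_mask(33, 36) | ibm_mask(42, 47);
    return msr & ~zeroed;
}

/* SRR0 as set by sc: the address of the instruction after the sc
   (CIA + 4). Effective-address truncation in 32-bit mode is not
   modeled here. */
static uint64_t sc_srr0(uint64_t cia)
{
    return cia + 4;
}
```

For instance, calling `sc_srr1` on an all-ones MSR value leaves every bit set except the ten bits in ranges 33:36 and 42:47.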
Return From Interrupt Doubleword XL-form

rfid

19 | /// | /// | /// | 18 | /
0    6     11    16    21   31

MSR[51] ← (MSR[3] & SRR1[51]) | ((¬MSR[3]) & MSR[51])
MSR[3] ← MSR[3] & SRR1[3]
MSR[48] ← SRR1[48] | SRR1[49]
MSR[58] ← SRR1[58] | SRR1[49]
MSR[59] ← SRR1[59] | SRR1[49]
MSR[0:2 4:32 37:41 49:50 52:57 60:63] ← SRR1[0:2 4:32 37:41 49:50 52:57 60:63]
NIA ←iea SRR0[0:61] || 0b00

If MSR[3]=1 then bits 3 and 51 of SRR1 are placed into the corresponding bits of the MSR. The result of ORing bits 48 and 49 of SRR1 is placed into MSR[48]. The result of ORing bits 58 and 49 of SRR1 is placed into MSR[58]. The result of ORing bits 59 and 49 of SRR1 is placed into MSR[59]. Bits 0:2, 4:32, 37:41, 49:50, 52:57, and 60:63 of SRR1 are placed into the corresponding bits of the MSR.

If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR0[0:61] || 0b00 (when SF=1 in the new MSR value) or ³²0 || SRR0[32:61] || 0b00 (when SF=0 in the new MSR value). If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or HSRR0 by the interrupt processing mechanism (see Section 6.4.3) is the address of the instruction that would have been executed next had the interrupt not occurred.

This instruction is privileged and context synchronizing.

Special Registers Altered: MSR

Programming Note
If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

Hypervisor Return From Interrupt Doubleword XL-form

hrfid

19 | /// | /// | /// | 274 | /
0    6     11    16    21    31

MSR[48] ← HSRR1[48] | HSRR1[49]
MSR[58] ← HSRR1[58] | HSRR1[49]
MSR[59] ← HSRR1[59] | HSRR1[49]
MSR[0:32 37:41 49:57 60:63] ← HSRR1[0:32 37:41 49:57 60:63]
NIA ←iea HSRR0[0:61] || 0b00

The result of ORing bits 48 and 49 of HSRR1 is placed into MSR[48]. The result of ORing bits 58 and 49 of HSRR1 is placed into MSR[58]. The result of ORing bits 59 and 49 of HSRR1 is placed into MSR[59]. Bits 0:32, 37:41, 49:57, and 60:63 of HSRR1 are placed into the corresponding bits of the MSR.

If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address HSRR0[0:61] || 0b00 (when SF=1 in the new MSR value) or ³²0 || HSRR0[32:61] || 0b00 (when SF=0 in the new MSR value). If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or HSRR0 by the interrupt processing mechanism (see Section 6.4.3) is the address of the instruction that would have been executed next had the interrupt not occurred.

This instruction is privileged and context synchronizing, and can be executed only in hypervisor state. If it is executed in privileged but non-hypervisor state either a Privileged Instruction type Program interrupt occurs or the results are boundedly undefined.

Special Registers Altered: MSR

Programming Note
If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

Chapter 4. Fixed-Point Processor

4.1 Fixed-Point Processor Overview
4.2 Special Purpose Registers
4.3 Fixed-Point Processor Registers
4.3.1 Processor Version Register
4.3.2 Processor Identification Register
4.3.3 Control Register
4.3.4 Program Priority Register
4.3.5 Software-use SPRs
4.4 Fixed-Point Processor Instructions
4.4.1 Fixed-Point Storage Access Instructions [Category: Load/Store Quadword]
4.4.2 OR Instruction
4.4.3 Move To/From System Register Instructions

4.1 Fixed-Point Processor Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Fixed-Point Processor that are not covered in Book I.

4.2 Special Purpose Registers

Special Purpose Registers (SPRs) are read and written using the mfspr (page 414) and mtspr (page 413) instructions. Most SPRs are defined in other chapters of this book; see the index to locate those definitions.

4.3 Fixed-Point Processor Registers

4.3.1 Processor Version Register

The Processor Version Register (PVR) is a 32-bit read-only register that contains a value identifying the version and revision level of the processor. The contents of the PVR can be copied to a GPR by the mfspr instruction. Read access to the PVR is privileged; write access is not provided.

Version | Revision
32        48       63

Figure 6. Processor Version Register

The PVR distinguishes between processors that differ in attributes that may affect software. It contains two fields.

Version   A 16-bit number that identifies the version of the processor. Different version numbers indicate major differences between processors, such as which categories are supported.

Revision  A 16-bit number that distinguishes between implementations of the version. Different revision numbers indicate minor differences between processors having the same version number, such as clock rate and Engineering Change level.

Version numbers are assigned by the Power ISA process. Revision numbers are assigned by an implementation-defined process.

4.3.2 Processor Identification Register

The Processor Identification Register (PIR) is a 32-bit register that contains a value that can be used to distinguish the processor from other processors in the system. The contents of the PIR can be copied to a GPR by the mfspr instruction. Read access to the PIR is privileged; write access, if provided, is implementation-dependent.

PROCID
32      63

Bits  Name    Description
0:31  PROCID  Processor ID

Figure 7. Processor Identification Register

The means by which the PIR is initialized are implementation-dependent.

The PIR is a hypervisor resource; see Chapter 2.

4.3.3 Control Register

The Control Register (CTRL) is a 32-bit register that controls an external I/O pin. This signal may be used for the following:

• driving the RUN Light on a system operator panel
• External interrupt routing
• Performance Monitor Counter incrementing (see Appendix B)

/// | RUN
32    63

Bit  Name  Description
63   RUN   Run state bit

All other fields are implementation-dependent.

Figure 8. Control Register

CTRL[RUN] can be used by the operating system to indicate when the processor is doing useful work.

The contents of the CTRL can be written by the mtspr instruction and read by the mfspr instruction. Write access to the CTRL is privileged. Reads can be performed in privileged or problem state.

4.3.4 Program Priority Register

The Program Priority Register (PPR) is a 64-bit register that controls the program's priority. The layout of the PPR is shown in Figure 9. A subset of the PRI values may be set by problem state programs (see Section 3.2.3 of Book I).

/// | PRI | /// | imp-specific
0     11    14    44           63

Bit(s)  Description
11:13   Program Priority (PRI)
        001 very low
        010 low
        011 medium low
        100 medium (normal)
        101 medium high
        110 high
        111 very high
44:63   Implementation-specific

Figure 9. Program Priority Register

4.3.5 Software-use SPRs

Software-use SPRs are 64-bit registers provided for use by software.

SPRG0
SPRG1
SPRG2
SPRG3
0      63

Figure 10. Software-use SPRs

SPRG0, SPRG1, and SPRG2 are privileged registers.
SPRG3 is a privileged register except that the contents may be copied to a GPR in Problem state when accessed using the mfspr instruction.

Programming Note
Neither the contents of the SPRGs, nor accessing them using mtspr or mfspr, has a side effect on the operation of the processor. One or more of the registers is likely to be needed by non-hypervisor interrupt handler programs (e.g., as scratch registers and/or pointers to per processor save areas).

Operating systems must ensure that no sensitive data are left in SPRG3 when a problem state program is dispatched, and operating systems for secure systems must ensure that SPRG3 cannot be used to implement a "covert channel" between problem state programs. These requirements can be satisfied by clearing SPRG3 before passing control to a program that will run in problem state.

HSPRG0 and HSPRG1 are 64-bit registers provided for use by hypervisor programs.

HSPRG0
HSPRG1
0       63

Figure 11. SPRs for use by hypervisor programs

Programming Note
Neither the contents of the HSPRGs, nor accessing them using mtspr or mfspr, has a side effect on the operation of the processor. One or more of the registers is likely to be needed by hypervisor interrupt handler programs (e.g., as scratch registers and/or pointers to per processor save areas).

4.4 Fixed-Point Processor Instructions

4.4.1 Fixed-Point Storage Access Instructions [Category: Load/Store Quadword]

Load Quadword DQ-form

lq RT,DQ(RA)

56 | RT | RA | DQ | //
0    6    11   16   28  31

if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(DQ || 0b0000)
RT ← MEM(EA, 8)
GPR(RT+1) ← MEM(EA+8, 8)

Let the effective address (EA) be the sum (RA|0)+(DQ||0b0000). The quadword in storage addressed by EA is loaded into registers RT and RT+1, in increasing order of storage address and register number.

EA must be a multiple of 16. If it is not, an Alignment interrupt occurs.

If RT is odd or RT=RA, the instruction form is invalid. If RT=RA, an attempt to execute this instruction causes an Illegal Instruction type Program interrupt. (The RT=RA case includes the case of RT=RA=0.)

This instruction is not supported in Little-Endian mode. Execution of this instruction in Little-Endian mode causes either an Alignment interrupt or the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered: None

Store Quadword DS-form

stq RS,DS(RA)

62 | RS | RA | DS | 2
0    6    11   16   30  31

if RA = 0 then b ← 0
else b ← (RA)
EA ← b + EXTS(DS || 0b00)
MEM(EA, 8) ← RS
MEM(EA+8, 8) ← GPR(RS+1)

Let the effective address (EA) be the sum (RA|0)+(DS||0b00). (RS) and (RS+1) are stored into the quadword in storage addressed by EA, in increasing order of storage address and register number.

EA must be a multiple of 16. If it is not, an Alignment interrupt occurs.

If RS is odd, the instruction form is invalid.

This instruction is not supported in Little-Endian mode. Execution of this instruction in Little-Endian mode causes either an Alignment interrupt or the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered: None

4.4.2 OR Instruction

or Rx,Rx,Rx can be used to set PPR[PRI] (see Section 4.3.4) as shown in Figure 12. PPR[PRI] remains unchanged if the privilege state of the processor executing the instruction is lower than the privilege indicated in the figure. (The encodings available to application programs are also shown in Book I.)

Rx   PPR[PRI]  Priority         Privileged
31   001       very low         yes
1    010       low              no
6    011       medium low       no
2    100       medium (normal)  no
5    101       medium high      yes
3    110       high             yes
7    111       very high        hypv

Figure 12. Priority levels for or Rx,Rx,Rx

4.4.3 Move To/From System Register Instructions

The Move To Special Purpose Register and Move From Special Purpose Register instructions are described in Book I, but only at the level available to an application programmer. For example, no mention is made there of registers that can be accessed only in privileged state. The descriptions of these instructions given below extend the descriptions given in Book I, but do not list Special Purpose Registers that are implementation-dependent. In the descriptions of these instructions given below, the "defined" SPR numbers are the SPR numbers shown in the figure for the instruction and the implementation-specific SPR numbers that are implemented, and similarly for "defined" registers.

Extended mnemonics

Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand. See Appendix A, "Assembler Extended Mnemonics" on page 493.

         SPR¹                Register   Privileged      Length
decimal  spr5:9  spr0:4      Name       mtspr   mfspr   (bits)  Cat²
1        00000   00001       XER        no      no      64      B
8        00000   01000       LR         no      no      64      B
9        00000   01001       CTR        no      no      64      B
18       00000   10010       DSISR      yes     yes     32      S
19       00000   10011       DAR        yes     yes     64      S
22       00000   10110       DEC        yes     yes     32      B
25       00000   11001       SDR1       hypv³   yes     64      S
26       00000   11010       SRR0       yes     yes     64      B
27       00000   11011       SRR1       yes     yes     64      B
29       00000   11101       AMR        yes     yes     64      S
136      00100   01000       CTRL       -       no      32      S
152      00100   11000       CTRL       yes     -       32      S
256      01000   00000       VRSAVE     no      no      32      V
259      01000   00011       SPRG3      -       no      64      B
268      01000   01100       TB         -       no      64      B
269      01000   01101       TBU        -       no      32      B
272-275  01000   100xx       SPRG[0-3]  yes     yes     64      B
282      01000   11010       EAR        hypv³   yes     32      EC
284      01000   11100       TBL        hypv³   -       32      B
285      01000   11101       TBU        hypv³   -       32      B
286      01000   11110       TBU40      hypv    -       64      S
287      01000   11111       PVR        -       yes     32      B
304      01001   10000       HSPRG0     hypv³   hypv³   64      S
305      01001   10001       HSPRG1     hypv³   hypv³   64      S
306      01001   10010       HDSISR     hypv³   hypv³   32      B
307      01001   10011       HDAR       hypv³   hypv³   64      B
309      01001   10101       PURR       hypv³   yes     64      S
310      01001   10110       HDEC       hypv³   yes     32      S
312      01001   11000       RMOR       hypv³   hypv³   64      S
313      01001   11001       HRMOR      hypv³   hypv³   64      S
314      01001   11010       HSRR0      hypv³   hypv³   64      S
315      01001   11011       HSRR1      hypv³   hypv³   64      S
318      01001   11110       LPCR       hypv³   hypv³   64      S
319      01001   11111       LPIDR      hypv³   hypv³   32      S
768-783  11000   0xxxx       perf_mon   -       no      64      S.PM
784-799  11000   1xxxx       perf_mon   yes     yes     64      S.PM
896      11100   00000       PPR        no      no      64      S
1013     11111   10101       DABR       hypv³   yes     64      S
1015     11111   10111       DABRX      hypv³   yes     64      S
1023     11111   11111       PIR        -       yes     32      S

- This register is not defined for this instruction.
¹ Note that the order of the two 5-bit halves of the SPR number is reversed.
² See Section 1.3.5 of Book I.
³ This register is a hypervisor resource, and can be modified by this instruction only in hypervisor state (see Chapter 2).

All SPR numbers that are not shown above and are not implementation-specific are reserved.

Figure 13. SPR encodings

Move To Special Purpose Register XFX-form

mtspr SPR,RS

31 | RS | spr | 467 | /
0    6    11    21    31

n ← spr[5:9] || spr[0:4]
if length(SPR(n)) = 64 then
    SPR(n) ← (RS)
else
    SPR(n) ← (RS)[32:63]

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 13. The contents of register RS are placed into the designated Special Purpose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR.

For this instruction, SPRs TBL and TBU are treated as separate 32-bit registers; setting one leaves the other unaltered.

spr[0]=1 if and only if writing the register is privileged. Execution of this instruction specifying a defined and privileged register when MSR[PR]=1 causes a Privileged Instruction type Program interrupt. Execution of this instruction specifying a hypervisor resource when MSR[HV PR] = 0b00 either has no effect or causes a Privileged Instruction type Program interrupt (see Chapter 2, "Logical Partitioning (LPAR)", on page 397).
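The reversed-halves SPR encoding and the spr[0] privilege rule described above can be sketched in Python. This is an illustrative model, not part of the architecture: the function names are hypothetical, and the privilege check assumes the SPR number is one of the defined numbers in Figure 13.

```python
def spr_field(n):
    """10-bit spr instruction field for SPR number n. The field holds the
    low-order five bits of n first (spr[0:4]), then the high-order five
    bits (spr[5:9]), so that n = spr[5:9] || spr[0:4]."""
    if not 0 <= n < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    return ((n & 0x1F) << 5) | (n >> 5)

def spr_is_privileged(n):
    """spr[0] is the leftmost bit of the field, i.e. bit value 16 of n."""
    return (spr_field(n) >> 9) & 1 == 1

def encode_mtspr(n, rs):
    """Instruction word for mtspr n,RS (primary opcode 31, XO 467)."""
    return (31 << 26) | (rs << 21) | (spr_field(n) << 11) | (467 << 1)

def encode_mfspr(rt, n):
    """Instruction word for mfspr RT,n (primary opcode 31, XO 339)."""
    return (31 << 26) | (rt << 21) | (spr_field(n) << 11) | (339 << 1)
```

Under these assumptions, `spr_is_privileged(26)` (SRR0) is true and `spr_is_privileged(1)` (XER) is false, matching the mtspr column of Figure 13. Because the swap of the two halves is its own inverse, `spr_field(spr_field(n)) == n`.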
Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.

• if spr[0]=0: boundedly undefined results
• if spr[0]=1:
  - if MSR[PR]=1: Privileged Instruction type Program interrupt
  - if MSR[PR]=0 and MSR[HV]=0: boundedly undefined results
  - if MSR[PR]=0 and MSR[HV]=1: undefined results

If the SPR number is set to a value that is shown in Figure 13 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered: See Figure 13

Programming Note
For a discussion of software synchronization requirements when altering certain Special Purpose Registers, see Chapter 10. "Synchronization Requirements for Context Alterations" on page 489.

Move From Special Purpose Register XFX-form

mfspr RT,SPR

31 | RT | spr | 339 | /
0    6    11    21    31

n ← spr[5:9] || spr[0:4]
if length(SPR(n)) = 64 then
    RT ← SPR(n)
else
    RT ← ³²0 || SPR(n)

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 13. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero.

spr[0]=1 if and only if reading the register is privileged. Execution of this instruction specifying a defined and privileged register when MSR[PR]=1 causes a Privileged Instruction type Program interrupt.

Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.

• if spr[0]=0: boundedly undefined results
• if spr[0]=1:
  - if MSR[PR]=1: Privileged Instruction type Program interrupt
  - if MSR[PR]=0: boundedly undefined results

If the SPR field contains a value that is shown in Figure 13 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered: None

Note
See the Notes that appear with mtspr.

Move To Machine State Register X-form

mtmsr RS,L

31 | RS | /// | L  | /// | 146 | /
0    6    11    15   16    21    31

if L = 0 then
    MSR[48] ← (RS)[48] | (RS)[49]
    MSR[58] ← (RS)[58] | (RS)[49]
    MSR[59] ← (RS)[59] | (RS)[49]
    MSR[32:47 49:50 52:57 60:62] ← (RS)[32:47 49:50 52:57 60:62]
else
    MSR[48 62] ← (RS)[48 62]

The MSR is set based on the contents of register RS and of the L field.

L=0:
The result of ORing bits 48 and 49 of register RS is placed into MSR[48]. The result of ORing bits 58 and 49 of register RS is placed into MSR[58]. The result of ORing bits 59 and 49 of register RS is placed into MSR[59]. Bits 32:47, 49:50, 52:57, and 60:62 of register RS are placed into the corresponding bits of the MSR.

L=1:
Bits 48 and 62 of register RS are placed into the corresponding bits of the MSR. The remaining bits of the MSR are unchanged.

This instruction is privileged.

If L=0 this instruction is context synchronizing. If L=1 this instruction is execution synchronizing; in addition, the alterations of the EE and RI bits take effect as soon as the instruction completes.

Special Registers Altered: MSR

Except in the mtmsr instruction description in this section, references to "mtmsr" in this document imply either L value unless otherwise stated or obvious from context (e.g., a reference to an mtmsr instruction that modifies an MSR bit other than the EE or RI bit implies L=0).

Programming Note
If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

This instruction does not alter MSR[ME] or MSR[LE]. (This instruction does not alter MSR[HV] because it does not alter any of the high-order 32 bits of the MSR.)

If the only MSR bits to be altered are MSR[EE RI], to obtain the best performance L=1 should be used.

Programming Note
If MSR[EE]=0 and an External or Decrementer exception is pending, executing an mtmsr instruction that sets MSR[EE] to 1 will cause the External or Decrementer interrupt to occur before the next instruction is executed, if no higher priority exception exists (see Section 6.8, "Interrupt Priorities" on page 479). Similarly, if a Hypervisor Decrementer interrupt is pending, execution of the instruction by the hypervisor causes a Hypervisor Decrementer interrupt to occur if HDICE=1.

For a discussion of software synchronization requirements when altering certain MSR bits, see Chapter 10.

Programming Note
mtmsr serves as both a basic and an extended mnemonic. The Assembler will recognize an mtmsr mnemonic with two operands as the basic form, and an mtmsr mnemonic with one operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Programming Note
There is no need for an analogous version of the mfmsr instruction, because the existing instruction copies the entire contents of the MSR to the selected GPR.

Move To Machine State Register Doubleword X-form

mtmsrd RS,L

31 | RS | /// | L  | /// | 178 | /
0    6    11    15   16    21    31

if L = 0 then
    MSR[48] ← (RS)[48] | (RS)[49]
    MSR[58] ← (RS)[58] | (RS)[49]
    MSR[59] ← (RS)[59] | (RS)[49]
    MSR[0:2 4:47 49:50 52:57 60:62] ← (RS)[0:2 4:47 49:50 52:57 60:62]
else
    MSR[48 62] ← (RS)[48 62]

The MSR is set based on the contents of register RS and of the L field.

L=0:
The result of ORing bits 48 and 49 of register RS is placed into MSR[48]. The result of ORing bits 58 and 49 of register RS is placed into MSR[58]. The result of ORing bits 59 and 49 of register RS is placed into MSR[59]. Bits 0:2, 4:47, 49:50, 52:57, and 60:62 of register RS are placed into the corresponding bits of the MSR.

L=1:
Bits 48 and 62 of register RS are placed into the corresponding bits of the MSR. The remaining bits of the MSR are unchanged.

This instruction is privileged.

If L=0 this instruction is context synchronizing. If L=1 this instruction is execution synchronizing; in addition, the alterations of the EE and RI bits take effect as soon as the instruction completes.

Special Registers Altered: MSR

Except in the mtmsrd instruction description in this section, references to "mtmsrd" in this document imply either L value unless otherwise stated or obvious from context (e.g., a reference to an mtmsrd instruction that modifies an MSR bit other than the EE or RI bit implies L=0).

Programming Note
If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

This instruction does not alter MSR[LE], MSR[ME] or MSR[HV].

If the only MSR bits to be altered are MSR[EE RI], to obtain the best performance L=1 should be used.

Programming Note
If MSR[EE]=0 and an External or Decrementer exception is pending, executing an mtmsrd instruction that sets MSR[EE] to 1 will cause the External or Decrementer interrupt to occur before the next instruction is executed, if no higher priority exception exists (see Section 6.8, "Interrupt Priorities" on page 479). Similarly, if a Hypervisor Decrementer interrupt is pending, execution of the instruction by the hypervisor causes a Hypervisor Decrementer interrupt to occur if HDICE=1.

For a discussion of software synchronization requirements when altering certain MSR bits, see Chapter 10.

Programming Note
mtmsrd serves as both a basic and an extended mnemonic. The Assembler will recognize an mtmsrd mnemonic with two operands as the basic form, and an mtmsrd mnemonic with one operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Move From Machine State Register X-form

mfmsr RT

31 | RT | /// | /// | 83 | /
0    6    11    16    21   31

RT ← MSR

The contents of the MSR are placed into register RT.

This instruction is privileged.

Special Registers Altered: None

Chapter 5. Storage Control

5.1 Overview
5.2 Storage Exceptions
5.3 Instruction Fetch
5.3.1 Implicit Branch
5.3.2 Address Wrapping Combined with Changing MSR Bit SF
5.4 Data Access
5.5 Performing Operations Out-of-Order
5.6 Invalid Real Address
5.7 Storage Addressing
5.7.1 32-Bit Mode
5.7.2 Virtualized Partition Memory (VPM) Mode
5.7.3 Real And Virtual Real Addressing Modes
5.7.3.1 Hypervisor Offset Real Mode Address
5.7.3.2 Offset Real Mode Address
5.7.3.3 Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes
5.7.3.3.1 Hypervisor Real Mode Storage Control
5.7.3.4 Virtual Real Mode Addressing Mechanism
5.7.3.5 Storage Control Attributes for Implicit Storage Accesses
5.7.4 Address Ranges Having Defined Uses
5.7.5 Address Translation Overview
5.7.6 Virtual Address Generation
5.7.6.1 Segment Lookaside Buffer (SLB)
5.7.6.2 SLB Search
5.7.7 Virtual to Real Translation
5.7.7.1 Page Table
5.7.7.2 Storage Description Register 1
5.7.7.3 Page Table Search
5.7.8 Reference and Change Recording
5.7.9 Storage and Virtual Page Class Key Protection
5.7.9.1 Virtual Page Class Key Protection
5.7.9.2 Storage Protection, Address Translation Enabled
5.7.9.3 Storage Protection, Address Translation Disabled
5.8 Storage Control Attributes
5.8.1 Guarded Storage
5.8.1.1 Out-of-Order Accesses to Guarded Storage
5.8.2 Storage Control Bits
5.8.2.1 Storage Control Bit Restrictions
5.8.2.2 Altering the Storage Control Bits
5.9 Storage Control Instructions
5.9.1 Cache Management Instructions
5.9.2 Synchronize Instruction
5.9.3 Lookaside Buffer Management
5.9.3.1 SLB Management Instructions
5.9.3.2 Bridge to SLB Architecture [Category:Server.Phased-Out]
5.9.3.2.1 Segment Register Manipulation Instructions
5.9.3.3 TLB Management Instructions
5.10 Page Table Update Synchronization Requirements
5.10.1 Page Table Updates
5.10.1.1 Adding a Page Table Entry
5.10.1.2 Modifying a Page Table Entry
5.10.1.3 Deleting a Page Table Entry
433 5.1 Overview address computed by the processor when it executes a Load, Store, Branch, or Cache Management instruc- A program references storage using the effective tion, or when it fetches the next sequential instruction. Chapter 5. Storage Control 419 Version 2.04 The effective address is translated to a real address 5.3.2 Address Wrapping Com- according to procedures described in Section 5.7.3, in Section 5.7.5 and in the following sections. The real bined with Changing MSR Bit SF address is what is presented to the storage subsystem. If the current instruction is at effective address 232 - 4 For a complete discussion of storage addressing and and is an mtmsrd instruction that changes the contents effective address calculation, see Section 1.10 of Book of MSRSF, the effective address of the next sequential I. instruction is undefined. Programming Note 5.2 Storage Exceptions In the case described in the preceding paragraph, if an interrupt occurs before the next sequential A storage exception results when the sequential execu- instruction is executed, the contents of SRR0, or tion model requires that a storage access be performed HSRR0, as appropriate to the interrupt, are unde- but the access is not permitted (e.g., is not permitted by fined. the storage protection mechanism), the access cannot be performed because the effective address cannot be translated to a real address, or the access matches some tracking mechanism criteria (e.g., Data Address 5.4 Data Access Breakpoint). Data accesses are controlled by MSRDR. In certain cases a storage exception may result in the "restart" of (re-execution of at least part of) a Load or MSRDR=0 Store instruction. See Section 2.1 of Book II, and Sec- The effective address of the data is interpreted as tion 6.6 in this Book. described in Section 5.7.3. 
MSRDR=1 5.3 Instruction Fetch The effective address of the data is translated by the Address Translation mechanism described in Instructions are fetched under control of MSRIR. Section 5.7.5. MSRIR=0 The effective address of the instruction is inter- preted as described in Section 5.7.3. 5.5 Performing Operations MSRIR=1 Out-of-Order The effective address of the instruction is trans- An operation is said to be performed "in-order" if, at the lated by the Address Translation mechanism time that it is performed, it is known to be required by described beginning in Section 5.7.5. the sequential execution model. An operation is said to be performed "out-of-order" if, at the time that it is per- formed, it is not known to be required by the sequential 5.3.1 Implicit Branch execution model. Explicitly altering certain MSR bits (using mtmsr[d]), or Operations are performed out-of-order by the proces- explicitly altering SLB entries, Page Table Entries, or sor on the expectation that the results will be needed by certain System Registers (including the HRMOR, and an instruction that will be required by the sequential possibly other implementation-dependent registers), execution model. Whether the results are really needed may have the side effect of changing the addresses, is contingent on everything that might divert the control effective or real, from which the current instruction flow away from the instruction, such as Branch, Trap, stream is being fetched. This side effect is called an System Call, and Return From Interrupt instructions, implicit branch. For example, an mtmsrd instruction and interrupts, and on everything that might change the that changes the value of MSRSF may change the context in which the instruction is executed. effective addresses from which the current instruction stream is being fetched. 
The MSR bits and System Typically, the processor performs operations out-of- Registers (excluding implementation-dependent regis- order when it has resources that would otherwise be ters) for which alteration can cause an implicit branch idle, so the operation incurs little or no cost. If subse- are indicated as such in Chapter 10. "Synchronization quent events such as branches or interrupts indicate Requirements for Context Alterations" on page 489. that the operation would not have been performed in Implicit branches are not supported by the Power ISA. the sequential execution model, the processor aban- If an implicit branch occurs, the results are boundedly dons any results of the operation (except as described undefined. below). 420 Power ISATM -- Book III-S Version 2.04 In the remainder of this section, including its subsec- Programming Note tions, "Load instruction" includes the Cache Manage- ment and other instructions that are stated in the In configurations supporting multiple partitions, instruction descriptions to be "treated as a Load", and hypervisor software must ensure that a storage similarly for "Store instruction". access by a program in one partition will not cause a Checkstop or other system-wide event that could A data access that is performed out-of-order may corre- affect the integrity of other partitions (see Chapter spond to an arbitrary Load or Store instruction (e.g., a 2). For example, such an event could occur if a real Load or Store instruction that is not in the instruction address placed in a Page Table Entry or made stream being executed). Similarly, an instruction fetch accessible to a partition using the Offset Real that is performed out-of-order may be for an arbitrary Mode Address mechanism (see Section 5.7.3.3) instruction (e.g., the aligned word at an arbitrary loca- does not exist. tion in instruction storage). 
Most operations can be performed out-of-order, as long as the machine appears to follow the sequential execution model. Certain out-of-order operations are restricted, as follows.

- Stores
  Stores are not performed out-of-order (even if the Store instructions that caused them were executed out-of-order).
- Accessing Guarded Storage
  The restrictions for this case are given in Section 5.8.1.1.

The only permitted side effects of performing an operation out-of-order are the following.

- A Machine Check or Checkstop that could be caused by in-order execution may occur out-of-order, except as described in Section 5.7.3.3.1 for the Real Mode Storage Control facility.
- On implementations which support Reference and Change bits, these bits may be set as described in Section 5.7.8.
- Non-Guarded storage locations that could be fetched into a cache by in-order fetching or execution of an arbitrary instruction may be fetched out-of-order into that cache.

5.6 Invalid Real Address

A storage access (including an access that is performed out-of-order; see Section 5.5) may cause a Machine Check if the accessed storage location contains an uncorrectable error or does not exist. In the case that the accessed storage location does not exist, the Checkstop state may be entered. See Section 6.5.2 on page 467.

5.7 Storage Addressing

Storage Control Overview

- Real address space size is 2^m bytes, m ≤ 60; see Note 1.
- Real page size is 2^12 bytes (4 KB).
- Effective address space size is 2^64 bytes.
- An effective address is translated to a virtual address via the Segment Lookaside Buffer (SLB).
  - Virtual address space size is 2^n bytes, 65 ≤ n ≤ 78; see Note 2.
  - Segment size is 2^s bytes, s=28 or 40.
  - 2^(n-40) ≤ number of virtual segments ≤ 2^(n-28); see Note 2.
  - Virtual page size is 2^p bytes, where 12 ≤ p, and 2^p is no larger than either the size of the biggest segment or the real address space; a size of 4 KB, 64 KB, and an implementation-dependent number of other sizes are supported; see Note 3.
  - Segments contain pages of a single size or a mixture of 4 KB and 64 KB pages
- A virtual address is translated to a real address via the Page Table.

Notes:

1. The value of m is implementation-dependent (subject to the maximum given above). When used to address storage, the high-order 60-m bits of the "60-bit" real address must be zeros.
2. The value of n is implementation-dependent (subject to the range given above). In references to 78-bit virtual addresses elsewhere in this Book, the high-order 78-n bits of the "78-bit" virtual address are assumed to be zeros.
3. The supported values of p for the larger virtual page sizes are implementation-dependent (subject to the limitations given above).

5.7.1 32-Bit Mode

The computation of the 64-bit effective address is independent of whether the processor is in 32-bit mode or 64-bit mode. In 32-bit mode (MSR_SF=0), the high-order 32 bits of the 64-bit effective address are treated as zeros for the purpose of addressing storage. This applies to both data accesses and instruction fetches. It applies independent of whether address translation is enabled or disabled. This truncation of the effective address is the only respect in which storage accesses in 32-bit mode differ from those in 64-bit mode.

Programming Note
  Treating the high-order 32 bits of the effective address as zeros effectively truncates the 64-bit effective address to a 32-bit effective address such as would have been generated on a 32-bit implementation of the Power ISA. Thus, for example, the ESID in 32-bit mode is the high-order four bits of this truncated effective address; the ESID thus lies in the range 0-15. When address translation is enabled, these four bits would select a Segment Register on a 32-bit implementation of the Power ISA. The SLB entries that translate these 16 ESIDs can be used to emulate these Segment Registers.

5.7.2 Virtualized Partition Memory (VPM) Mode

VPM mode enables the hypervisor to reassign all or part of a partition's memory transparently so that the reassignment is not visible to the partition. When this is done, the partition's memory is said to be "virtualized." The VPM field in the LPCR enables VPM mode separately when address translation is enabled and when translation is disabled.

If the processor is not in hypervisor state, and either address translation is enabled and VPM_1=1, or address translation is disabled and VPM_0=1, conditions that would have caused a Data Storage or an Instruction Storage interrupt if the affected memory were not virtualized instead cause a Hypervisor Data Storage or a Hypervisor Instruction Storage interrupt respectively. Because the Hypervisor Data Storage and Hypervisor Instruction Storage interrupts always put the processor in hypervisor state, they permit the hypervisor to handle the condition if appropriate (e.g., to restore the contents of a page that was reassigned), and to reflect it to the operating system's Data Storage or Instruction Storage interrupt handler otherwise.

When address translation is enabled, VPM mode has no effect on address translation. When address translation is disabled, addressing is controlled as specified in Section 5.7.3.

5.7.3 Real And Virtual Real Addressing Modes

When a storage access is an instruction fetch performed when instruction address translation is disabled, or if the access is a data access and data address translation is disabled, it is said to be performed in "real addressing mode" if VPM_0=0 and the processor is not in hypervisor state. If the processor is in hypervisor state, the access is said to be performed in "hypervisor real addressing mode" regardless of the value of VPM_0. If the processor is not in hypervisor state and VPM_0=1, the access is said to be performed in "virtual real addressing mode." Storage accesses in real, hypervisor real, and virtual real addressing modes are performed in a manner that depends on the contents of MSR_HV, LPES, VPM, VRMASD, HRMOR, RMLS, and RMOR (see Chapter 2), and bit 0 of the effective address (EA_0) as described below. Bit 1 of the effective address is ignored.

MSR_HV=1
- If EA_0=0, the Hypervisor Offset Real Mode Address mechanism, described in Section 5.7.3.1, controls the access.
- If EA_0=1, bits 4:63 of the effective address are used as the real address for the access.

MSR_HV=0
- If LPES_1=0, the access causes a storage exception as described in Section 5.7.9.3.
- If LPES_1=1 and VPM_0=0, the Offset Real Mode Address mechanism, described in Section 5.7.3.2, controls the access.
- If LPES_1=1 and VPM_0=1, the Virtual Real Mode Addressing mechanism, described in Section 5.7.3.4, controls the access.

5.7.3.1 Hypervisor Offset Real Mode Address

If MSR_HV = 1 and EA_0 = 0, the access is controlled by the contents of the Hypervisor Real Mode Offset Register, as follows.

Hypervisor Real Mode Offset Register (HRMOR)
  Bits 4:63 of the effective address for the access are ORed with the 60-bit offset represented by the contents of the HRMOR, and the 60-bit result is used as the real address for the access. The supported offset values are all values of the form i×2^r, where 0 ≤ i < 2^j, and j and r are implementation-dependent values having the properties that 12 ≤ r ≤ 26 (i.e., the minimum offset granularity is 4 KB and the maximum offset granularity is 64 MB) and j+r = m, where the real address size supported by the implementation is m bits.

Programming Note
  EA_4:63-r should equal ^(60-r)0 (that is, 60-r 0-bits). If this condition is satisfied, ORing the effective address with the offset produces a result that is equivalent to adding the effective address and the offset.
  If m<60, EA_4:63-m and HRMOR_0:59-m must be zeros.

Software must ensure that altering the HRMOR does not cause an implicit branch.

5.7.3.2 Offset Real Mode Address

If VPM_0=0, MSR_HV=0, and LPES_1=1, the access is controlled by the contents of the Real Mode Limit Selector and Real Mode Offset Register, as specified below, and the set of storage locations accessible by code is referred to as the Real Mode Area (RMA).

Real Mode Limit Selector (RMLS)
  If bits 4:63 of effective address for the access are greater than or equal to the value (limit) represented by the contents of the RMLR, the access causes a storage exception (see Section 5.7.9.3). In this comparison, if m<60, bits 4:63-m of the effective address may be ignored (i.e., treated as if they were zeros), where the real address size supported by the implementation is m bits. The supported limit values are of the form 2^j, where 12 ≤ j ≤ 60. Subject to the preceding sentence, the number and values of the limits supported are implementation-dependent.

Real Mode Offset Register (RMOR)
  If the access is permitted by the RMLR, bits 4:63 of the effective address for the access are ORed with the 60-bit offset represented by the contents of the RMOR, and the low-order m bits of the 60-bit result are used as the real address for the access. The supported offset values are all values of the form i×2^s, where 0 ≤ i < 2^k, and k and s are implementation-dependent values having the properties that 2^s is the minimum limit value supported by the implementation (i.e., the minimum value representable by the contents of the RMLR) and k+s = m.

Programming Note
  The offset specified by the RMOR should be a nonzero multiple of the limit specified by the RMLS. If these registers are set thus, ORing the effective address with the offset produces a result that is equivalent to adding the effective address and the offset. (The offset must not be zero, because real page 0 contains the fixed interrupt vectors and real pages 1 and 2 may be used for implementation-specific purposes; see Section 5.7.4, "Address Ranges Having Defined Uses" on page 426.)
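The selection rules of Section 5.7.3 and the address formation of Sections 5.7.3.1 and 5.7.3.2 can be illustrated with a minimal Python sketch. It is not part of the architecture text. The names RealModeState, real_mode_address, and StorageException are hypothetical, the register contents are assumed to be already decoded into a 60-bit offset and a byte limit, the permitted ignoring of EA bits 4:63-m in the limit comparison is not modelled, and virtual real addressing mode (Section 5.7.3.4) is deliberately left out.

```python
from dataclasses import dataclass

MASK60 = (1 << 60) - 1  # selects EA bits 4:63 (the low-order 60 bits)


class StorageException(Exception):
    """Hypothetical stand-in for the storage exception of Section 5.7.9.3."""


@dataclass
class RealModeState:
    """Hypothetical container for the controls named in Section 5.7.3."""
    msr_hv: int            # MSR_HV
    lpes1: int = 1         # LPES_1
    vpm0: int = 0          # VPM_0
    hrmor: int = 0         # 60-bit offset represented by the HRMOR
    rmor: int = 0          # 60-bit offset represented by the RMOR
    limit: int = 1 << 12   # limit in bytes represented by the RMLS
    m: int = 60            # real address size in bits


def real_mode_address(ea: int, st: RealModeState) -> int:
    """Form the real address for an access made with translation disabled."""
    ea0 = (ea >> 63) & 1     # EA bit 0 is the most significant bit
    ea_4_63 = ea & MASK60    # EA bits 4:63
    if st.msr_hv == 1:
        if ea0 == 0:
            # Hypervisor Offset Real Mode Address: OR with the HRMOR offset.
            return ea_4_63 | st.hrmor
        # EA_0=1: bits 4:63 are used directly as the real address.
        return ea_4_63
    if st.lpes1 == 0:
        raise StorageException("LPES_1=0 with MSR_HV=0")
    if st.vpm0 == 1:
        raise NotImplementedError("virtual real addressing mode is not modelled")
    # Offset Real Mode Address: limit check, then OR with the RMOR offset.
    if ea_4_63 >= st.limit:
        raise StorageException("effective address is outside the RMA")
    return (ea_4_63 | st.rmor) & ((1 << st.m) - 1)
```

With an RMOR offset of 0x4000_0000 and a 256 MB limit, an effective address of 0x2000 yields 0x4000_2000. Because the offset is a nonzero multiple of the limit, the OR gives the same result as an addition, as the Programming Note for the RMOR describes.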
5.7.3.3 Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes

Storage accesses in hypervisor real addressing mode are performed as though all of storage had the following storage control attributes, except as modified by the Real Mode Storage Control facility (see Section 5.7.3.3.1). (The storage control attributes are defined in Book II.)

- not Write Through Required
- not Caching Inhibited, for instruction fetches
- not Caching Inhibited, for data accesses if the Real Mode Caching Inhibited bit is set to 0; Caching Inhibited, for data accesses if the Real Mode Caching Inhibited bit is set to 1
- Memory Coherence Required, for data accesses
- Guarded

Storage accesses in real addressing mode are performed as though all of storage had the following storage control attributes. (Such accesses use the Offset Real Mode Address mechanism.)

- not Write Through Required
- not Caching Inhibited
- Memory Coherence Required, for data accesses
- not Guarded

Additionally, storage accesses in real or hypervisor real addressing modes are performed as though all storage was not No-execute.

Software must ensure that any data storage location that is accessed with the Real Mode Caching Inhibited bit set to 1 is not in the caches.

Software must ensure that the Real Mode Caching Inhibited bit contains 0 whenever data address translation is enabled and whenever the processor is not in hypervisor state.

Programming Note
  Because storage accesses in real addressing mode and hypervisor real addressing mode do not use the SLB or the Page Table, accesses in these modes bypass all checking and recording of information contained therein (e.g., storage protection checks that use information contained therein are not performed, and reference and change information is not recorded).
  The Real Mode Caching Inhibited bit can be used to permit a control register on an I/O device to be accessed without permitting the corresponding storage location to be copied into the caches. The bit should normally contain 0. Software would set the bit to 1 just before accessing the control register, access the control register as needed, and then set the bit back to 0.

5.7.3.3.1 Hypervisor Real Mode Storage Control

The Hypervisor Real Mode Storage Control facility provides a means of specifying portions of real storage that are treated as non-Guarded in hypervisor real addressing mode (MSR_HV PR=0b10, and MSR_IR=0 or MSR_DR=0, as appropriate for the type of access). The remaining portions are treated as Guarded in hypervisor real addressing mode. The means is a hypervisor resource (see Chapter 2), and may also be system-specific.

If the Real Mode Caching Inhibited (RMI) bit is set to 1, it is undefined whether a given data access to a storage location that is treated as non-Guarded in hypervisor real addressing mode is treated as Caching Inhibited or as not Caching Inhibited. If the access is treated as Caching Inhibited and is performed out-of-order, the access cannot cause a Machine Check or Checkstop to occur out-of-order due to violation of the requirements given in Section 5.8.2.2 for changing the value of the effective I bit. (Recall that software must ensure that RMI = 0 when the processor is not in hypervisor real addressing mode; see Section 5.7.3.3.)

The facility does not apply to implicit accesses to the Page Table by the processor in performing address translation or in recording reference and change information. These accesses are performed as described in Section 5.7.3.3.

Programming Note
  The preceding capability can be used to improve the performance of hypervisor software that runs in hypervisor real addressing mode, by causing accesses to instructions and data that occupy well-behaved storage to be treated as non-Guarded. See also the second paragraph of the Programming Note in Section 5.7.3.3.
  If RMI=1, the statement in Section 5.5, that non-Guarded storage locations may be fetched out-of-order into a cache only if they could be fetched into that cache by in-order execution does not preclude the out-of-order fetching into the data cache of storage locations that are treated as non-Guarded in hypervisor real addressing mode, because the effective RMI value that could be used for an in-order data access to such a storage location is undefined and hence could be 0.

5.7.3.4 Virtual Real Mode Addressing Mechanism

If VPM_0=1, MSR_HV=0, LPES_1=1, and MSR_DR=0 or MSR_IR=0 as appropriate for the type of access, the access is said to be made in virtual real addressing mode and is controlled by the mechanism specified below. The set of storage locations accessible by code is referred to as the Virtualized Real Mode Area (VRMA).

In virtual real addressing mode, address translation, storage protection, and reference and change recording are handled as follows.

- Address translation and storage protection are handled as if address translation were enabled, except that translation of effective addresses to virtual addresses use the SLBE values in Figure 14 instead of the entry in the SLB corresponding to the ESID, bits 0:3 of the effective address are ignored (i.e. treated as if they were 0s), bits 4:63-m of the effective address may be ignored (where the real address size supported by the implementation is m bits), and the Virtual Page Class Key protection mechanism does not apply.

  Programming Note
    The Virtual Page Class Key protection mechanism does not apply because the authority mask that an OS has set for application programs executing with address translation enabled may not be the same as the authority mask required by the OS when address translation is disabled, such as when first entering an interrupt handler.

- Reference and change recording are handled as if address translation were enabled.

  Field   Value
  ESID    ^36 0 (36 0-bits)
  V       1
  B       0b01 - 1 TB
  VSID    0x0_01FF_FFFF
  Ks      0
  Kp      undefined
  N       0
  L       VRMASD_L
  C       0
  LP      VRMASD_LP

Figure 14. SLBE for VRMA

If the effective address is not less than 1 TB, a Hypervisor Data Segment or Hypervisor Instruction Segment interrupt may occur.

Programming Note
  The C bit in Figure 14 is set to 0 because the implementation-dependent lookaside information associated with the VRMA is expected to be long-lived. See Section 5.9.3.1.

Programming Note
  The 1 TB VSID 0x0_01FF_FFFF should not be used by the operating system for purposes other than mapping the VRMA when address translation is enabled.

Programming Note
  Software should specify PTE_B = 0b01 for all Page Table Entries that map the VRMA in order to be consistent with the values in Figure 14.

Programming Note
  All accesses to the RMA are considered not Guarded. The G bit of the associated Page Table Entry determines whether an access to the VRMA is Guarded. Therefore, if an instruction is fetched from the VRMA, a Hypervisor Instruction Storage interrupt will result if G=1 in the associated Page Table Entry.

5.7.3.5 Storage Control Attributes for Implicit Storage Accesses

Implicit accesses to the Page Table by the processor in performing address translation and in recording reference and change information are performed as though the storage occupied by the Page Table had the following storage control attributes.

- not Write Through Required
- not Caching Inhibited
- Memory Coherence Required
- not Guarded

The definition of "performed" given in Book II applies also to these implicit accesses; accesses for performing address translation are considered to be loads in this respect, and accesses for recording reference and change information are considered to be stores. These implicit accesses are ordered by the ptesync instruction as described in Section 5.9.2.

5.7.4 Address Ranges Having Defined Uses

The address ranges described below have uses that are defined by the architecture.

- Fixed interrupt vectors
  Except for the first 256 bytes, which are reserved for software use, the real page beginning at real address 0x0000_0000_0000_0000 is either used for interrupt vectors or reserved for future interrupt vectors.
- Implementation-specific use
  The two contiguous real pages beginning at real address 0x0000_0000_0000_1000 are reserved for implementation-specific purposes.
- Offset Real Mode interrupt vectors
  The real pages beginning at the real address specified by the HRMOR and RMOR are used similarly to the page for the fixed interrupt vectors.
- Page Table
  A contiguous sequence of real pages beginning at the real address specified by SDR1 contains the Page Table.
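The fixed ranges listed in Section 5.7.4 can be summarized with a small Python sketch. It is illustrative only: the function name defined_use and the returned strings are hypothetical. It covers only the ranges at fixed real addresses, because the Offset Real Mode interrupt vectors and the Page Table depend on the contents of the HRMOR, RMOR, and SDR1.

```python
PAGE_SIZE = 1 << 12  # real page size, 4 KB


def defined_use(real_address: int) -> str:
    """Classify a real address against the fixed ranges of Section 5.7.4."""
    if real_address < 0x100:
        # First 256 bytes of real page 0.
        return "reserved for software use"
    if real_address < PAGE_SIZE:
        # Remainder of real page 0.
        return "fixed interrupt vectors (in use or reserved for future vectors)"
    if real_address < 3 * PAGE_SIZE:
        # The two contiguous real pages beginning at 0x1000.
        return "implementation-specific use"
    return "no fixed defined use"
```

For example, real address 0x2FFF is the last byte of the second implementation-specific page, and 0x3000 is the first byte outside the fixed ranges.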
5.7.5 Address Translation Overview

The effective address (EA) is the address generated by the processor for an instruction fetch or for a data access. If address translation is enabled, this address is passed to the Address Translation mechanism, which attempts to convert the address to a real address which is then used to access storage.

The first step in address translation is to convert the effective address to a virtual address (VA), as described in Section 5.7.6. The second step, conversion of the virtual address to a real address (RA), is described in Section 5.7.7.

If the effective address cannot be translated, a storage exception (see Section 5.2) occurs.

The following figure gives an overview of the address translation process.

[Figure: Effective Address -> Lookup in SLB -> Virtual Address -> Lookup in Page Table -> Real Address]

Address translation overview

5.7.6 Virtual Address Generation

Conversion of a 64-bit effective address to a virtual address is done by searching the Segment Lookaside Buffer (SLB) as shown in Figure 15.

[Figure 15 shows the 64-bit effective address divided into ESID (64-s bits, bits 0:63-s), Page (s-p bits, bits 64-s:63-p), and Byte (p bits, bits 64-p:63). The ESID is looked up in the Segment Lookaside Buffer (entries SLBE0 through SLBEn, each holding ESID, V, B, VSID, Ks, Kp, N, L, C, and LP). VSID_0:77-s (78-s bits) from the matching entry is concatenated with the Page and Byte fields to form the 78-bit virtual address; the VSID and Page fields together form the Virtual Page Number (VPN).]

Figure 15. Translation of 64-bit effective address to 78 bit virtual address

5.7.6.1 Segment Lookaside Buffer (SLB)

The Segment Lookaside Buffer (SLB) specifies the mapping between Effective Segment IDs (ESIDs) and Virtual Segment IDs (VSIDs). The number of SLB entries is implementation-dependent, except that all implementations provide at least 32 entries.

The contents of the SLB are managed by software, using the instructions described in Section 5.9.3.1. See Chapter 10. "Synchronization Requirements for Context Alterations" on page 489 for the rules that software must follow when updating the SLB.

SLB Entry

Each SLB entry (SLBE, sometimes referred to as a "segment descriptor") maps one ESID to one VSID. Figure 16 shows the layout of an SLB entry.

  ESID | V  | B  | VSID | Ks Kp N L C | /  | LP
  0      36   37   39     89            94   95 96

  Bit(s)  Name  Description
  0:35    ESID  Effective Segment ID
  36      V     Entry valid (V=1) or invalid (V=0)
  37:38   B     Segment Size Selector
                  0b00 - 256 MB (s=28)
                  0b01 - 1 TB (s=40)
                  0b10 - reserved
                  0b11 - reserved
  39:88   VSID  Virtual Segment ID
  89      Ks    Supervisor (privileged) state storage key (see Section 5.7.9.2)
  90      Kp    Problem state storage key (see Section 5.7.9.2)
  91      N     No-execute segment if N=1
  92      L     Virtual page size selector bit 0
  93      C     Class
  95:96   LP    Virtual page size selector bits 1:2

  All other fields are reserved. B_0 (SLBE_37) is treated as a reserved field.

Figure 16. SLB Entry

Instructions cannot be executed from a No-execute (N=1) segment.

The L and LP bits specify the page size or sizes that the segment may contain as shown in Figure 17. A Mixed Page Size (MPS) segment is a segment that may contain 4 KB pages, 64 KB pages, or a mixture of both. A Uniform Page Size (UPS) segment is a segment that must contain pages of only a single size.

  SLBE_L||LP             Segment Type  Virtual Page Size(s)
  0b000                  MPS           4 KB, 64 KB if PTE_L LP specifies 64 KB page in MPS segment, or both sizes
  0b101                  UPS           64 KB if PTE_L LP specifies 64 KB page in UPS segment
  additional values (1)  UPS           2^p bytes, where p > 12 and may differ among SLB_L||LP values

  (1) The "additional values" of SLB_L||LP are implementation-dependent, as are the corresponding virtual page sizes.

Figure 17. SLB_L||LP Encoding

For each SLB entry, software must ensure the following requirements are satisfied.

- L||LP contains a value supported by the implementation.
- The page size selected by the L and LP fields does not exceed the segment size selected by the B field.
- If s=40, the following bits of the SLB entry contain 0s.
  - ESID_24:35
  - VSID_38:49
  The bits in the above two items are ignored by the processor.

The Class field is used in conjunction with the slbie instruction (see Section 5.9.3.1).

Software must ensure that the SLB contains at most one entry that translates a given effective address, and that if the SLB contains an entry that translates a given effective address, then any previously existing translation of that effective address has been invalidated. An attempt to create an SLB entry that violates this requirement may cause a Machine Check.

Programming Note
  It is permissible for software to replace the contents of a valid SLB entry without invalidating the translation specified by that entry provided the specified restrictions are followed. See Chapter 10 Note 11.

5.7.6.2 SLB Search

When the hardware searches the SLB, all entries are tested for a match with the EA. For a match to exist, the following conditions must be satisfied for indicated fields in the SLBE.

- V=1
- ESID_0:63-s = EA_0:63-s, where the value of s is specified by the B field in the SLBE being tested

If no match is found, the search fails. If one match is found, the search succeeds. If more than one match is found, one of the matching entries is used as if it were the only matching entry, or a Machine Check occurs.

If the SLB search succeeds, the virtual address (VA) is formed from the EA and the matching SLB entry fields as follows.

  VA = VSID_0:77-s || EA_64-s:63

The Virtual Page Number (VPN) is bits 0:77-p of the virtual address. If the value of the virtual page size selector field in the matching SLBE is 0b000, then the value of p is the value specified in the PTE used to translate the virtual address (see Section 5.7.7.1); otherwise the value of p is the value specified in the virtual page size selector field in the matching SLBE. If SLBE_N = 1, the N (No-execute) value used for the storage access is 1.
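The SLB search and the formation of the virtual address can be sketched in Python as follows. This is an illustrative model, not a description of any implementation. The names Slbe, SegmentFault, and slb_translate are hypothetical, the SLB is modelled as a plain list, entries with a reserved B value are skipped as an assumption of the sketch, and when several entries match the sketch simply uses the first one (the architecture also allows a Machine Check in that case).

```python
from dataclasses import dataclass
from typing import List

# Segment size exponent s selected by the B field (0b10 and 0b11 are reserved).
SEGMENT_SHIFT = {0b00: 28, 0b01: 40}


class SegmentFault(Exception):
    """Hypothetical stand-in for an Instruction or Data Segment exception."""


@dataclass
class Slbe:
    esid: int  # 36-bit ESID field (SLBE bits 0:35)
    v: int     # valid bit
    b: int     # segment size selector
    vsid: int  # 50-bit VSID field (SLBE bits 39:88)


def slb_translate(ea: int, slb: List[Slbe]) -> int:
    """Return the 78-bit virtual address for a 64-bit effective address."""
    for entry in slb:
        s = SEGMENT_SHIFT.get(entry.b)
        if entry.v != 1 or s is None:
            continue
        # ESID_0:63-s is the high-order 64-s bits of the 36-bit ESID field;
        # EA_0:63-s is the high-order 64-s bits of the effective address.
        if (entry.esid >> (s - 28)) == (ea >> s):
            # VSID_0:77-s is the high-order 78-s bits of the 50-bit VSID field.
            vsid_hi = entry.vsid >> (s - 28)
            # VA = VSID_0:77-s || EA_64-s:63
            return (vsid_hi << s) | (ea & ((1 << s) - 1))
    raise SegmentFault(hex(ea))
```

For a 256 MB segment the whole ESID and VSID fields take part, whereas for a 1 TB segment the low-order 12 bits of each field are ignored, matching the requirement that ESID_24:35 and VSID_38:49 contain 0s when s=40.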
If the SLB search fails, a segment fault occurs. This is an Instruction Segment exception or a Data Segment exception, depending on whether the effective address is for an instruction fetch or for a data access.

5.7.7 Virtual to Real Translation

Conversion of a 78-bit virtual address to a real address is done by searching the Page Table as shown in Figure 18.

[Figure 18 shows the 78-bit virtual address divided into the Virtual Page Number (VPN, 78-p bits) and Byte (p bits). The VPN is passed through the hash function (see Section 5.7.7.3). Bits 0:27 of the hash are ANDed with a mask decoded from HTABSIZE and ORed with the low-order 28 bits of HTABORG; together with the high-order 14 bits of HTABORG, bits 28:38 of the hash, and seven 0-bits, this forms the 60-bit real address of a Page Table Entry Group (PTEG). Each PTEG is 128 bytes and holds PTE0 through PTE7 of 16 bytes each. (ARPN||LP)_0:59-p (60-p bits) from the matching PTE is concatenated with the Byte field (p bits) to form the 60-bit real address.]

Figure 18. Translation of 78-bit virtual address to 60-bit real address

5.7.7.1 Page Table

The Hashed Page Table (HTAB) is a variable-sized data structure that specifies the mapping between Virtual Page Numbers and real page numbers, where the real page number of a real page is bits 0:50 of the address of the first byte in the real page. The HTAB's size must be a multiple of 4 KB, its starting address must be a multiple of its size, and it must be located in storage having the storage control attributes that are used for implicit accesses to it (see Section 5.7.3.3).

The HTAB contains Page Table Entry Groups (PTEGs). A PTEG contains 8 Page Table Entries (PTEs) of 16 bytes each; each PTEG is thus 128 bytes long. PTEGs are entry points for searches of the Page Table.

See Section 5.10 for the rules that software must follow when updating the Page Table.

Programming Note
  The Page Table must be treated as a hypervisor resource (see Chapter 2), and therefore must be placed in real storage to which only the hypervisor has write access. Moreover, the contents of the Page Table must be such that non-hypervisor software cannot modify storage that contains hypervisor programs or data.

Page Table Entry

Each Page Table Entry (PTE) maps one VPN to one RPN. Figure 19 shows the layout of a PTE. This layout is independent of the Endian mode of the processor.

  Dword 0:  B | AVPN | SW | L | H | V
            0   2      57   61  62  63

  Dword 1:  pp | / | key | ARPN | LP | key | R | C | WIMG | N | pp
            0    1   2     4      44   52    55  56  57     61  62 63

  Dword  Bit(s)  Name  Description
  0      0:1     B     Segment Size
                         0b00 - 256 MB
                         0b01 - 1 TB
                         0b10 - reserved
                         0b11 - reserved
         2:56    AVPN  Abbreviated Virtual Page Number
         57:60   SW    Available for software use
         61      L     Virtual page size
                         0b0 - 4 KB
                         0b1 - greater than 4 KB (large page)
         62      H     Hash function identifier
         63      V     Entry valid (V=1) or invalid (V=0)
  1      0       pp    Page Protection bit 0
         2:3     key   KEY bits 0:1
         4:43    ARPN  Abbreviated Real Page Number
         44:51   LP    Large page size selector
         52:54   key   KEY bits 2:4
         55      R     Reference bit
         56      C     Change bit
         57:60   WIMG  Storage control bits
         61      N     No-execute page if N=1
         62:63   pp    Page protection bits 1:2

  All other fields are reserved.

Figure 19. Page Table Entry

If p ≤ 23, the Abbreviated Virtual Page Number (AVPN) field contains bits 0:54 of the VPN. Otherwise bits 0:77-p of the AVPN field contain bits 0:77-p of the VPN, and bits 78-p:54 of the AVPN field must be zeros and are ignored by the processor.

Programming Note
  If p ≤ 23, the AVPN field omits the low-order 23-p bits of the VPN. These bits are not needed in the PTE, because the low-order 11 bits of the VPN are always used in selecting the PTEGs to be searched (see Section 5.7.7.3).

On implementations that support a virtual address size of only n bits, n<78, bits 0:77-n of the AVPN field must be zeros.

A virtual page is mapped to a sequence of 2^(p-12) contiguous real pages such that the low-order p-12 bits of the real page number of the first real page in the sequence are 0s.

If PTE_L=0, the virtual page size is 4 KB, and ARPN concatenated with LP (ARPN||LP) contains the page number of the real page that maps the virtual page described by the entry.

If PTE_L=1, the virtual page size is specified by PTE_LP. In this case, the contents of PTE_LP have the format shown in Figure 20. Bits labelled "r" are bits of the real page number. The page size specified by the non-r bits of PTE_LP is implementation-dependent.

  rrrr_rrr0
  rrrr_rr01
  rrrr_r011
  rrrr_0111
  rrr0_1111
  rr01_1111
  r011_1111
  0111_1111

Figure 20. Format of PTE_LP

There are at least 2 formats of PTE_LP that specify a 64 KB page.

Programming Note
  The processor often has implementation-dependent lookaside buffers (e.g. TLBs and ERATs) used to cache translations of recently used storage addresses. Mapping virtual storage to large pages may increase the effectiveness of such lookaside buffers, improving performance, because it is possible for such buffers to translate a larger range of addresses, reducing the frequency that the Page Table must be searched to translate an address.

Instructions cannot be executed from a No-execute (N=1) page.

Page Table Size

The number of entries in the Page Table directly affects performance because it influences the hit ratio in the Page Table and thus the rate of page faults. If the table is too small, it is possible that not all the virtual pages that actually have real pages assigned can be mapped via the Page Table. This can happen if too many hash collisions occur and there are more than 16 entries for the same primary/secondary pair of PTEGs. While this situation cannot be guaranteed not to occur for any size Page Table, making the Page Table larger than the minimum size (see Section 5.7.7.2) will reduce the frequency of occurrence of such collisions.
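The Page Table Entry layout of Figure 19 can be illustrated by a short Python sketch that extracts each field from the two doublewords. The helper names field and decode_pte are hypothetical, and the PTE is assumed to be supplied as two 64-bit integers. Bit numbering follows the architecture's convention in which bit 0 is the most significant bit.

```python
def field(dword: int, first: int, last: int) -> int:
    """Extract bits first:last of a 64-bit doubleword (bit 0 is the MSB)."""
    width = last - first + 1
    return (dword >> (63 - last)) & ((1 << width) - 1)


def decode_pte(dword0: int, dword1: int) -> dict:
    """Split a 16-byte PTE into the fields named in Figure 19."""
    return {
        "B": field(dword0, 0, 1),
        "AVPN": field(dword0, 2, 56),
        "SW": field(dword0, 57, 60),
        "L": field(dword0, 61, 61),
        "H": field(dword0, 62, 62),
        "V": field(dword0, 63, 63),
        # pp bit 0 is dword 1 bit 0; pp bits 1:2 are dword 1 bits 62:63.
        "pp": (field(dword1, 0, 0) << 2) | field(dword1, 62, 63),
        # KEY bits 0:1 are dword 1 bits 2:3; KEY bits 2:4 are bits 52:54.
        "key": (field(dword1, 2, 3) << 3) | field(dword1, 52, 54),
        "ARPN": field(dword1, 4, 43),
        "LP": field(dword1, 44, 51),
        "R": field(dword1, 55, 55),
        "C": field(dword1, 56, 56),
        "WIMG": field(dword1, 57, 60),
        "N": field(dword1, 61, 61),
    }
```

The split pp and key fields are reassembled into single values here purely for convenience; the architecture defines them only as separate bit ranges.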
Of the PTE_LP formats that specify a 64 KB page, one specifies a 64 KB page contained in an MPS segment, and another specifies a 64 KB page contained in a Uniform segment.

If L=1, the page size selected by the LP field must not exceed the segment size selected by the B field. Forms of PTE_LP not supported by a given processor are treated as reserved values for that processor.

The concatenation of the ARPN field and bits labeled "r" in the LP field contain the high-order bits of the real page number of the real page that maps the first 4 KB of the virtual page described by the entry.

The low-order p-12 bits of the real page number contained in the ARPN and LP fields must be 0s and are ignored by the processor.

Programming Note
  The page size specified by a given PTE_LP format is at least 2^(12+(8-c)), where c is the number of r bits in the format.

Programming Note
  If large pages are not used, it is recommended that the number of PTEGs in the Page Table be at least half the number of real pages to be accessed. For example, if the amount of real storage to be accessed is 2^31 bytes (2 GB), then we have 2^(31-12) = 2^19 real pages. The minimum recommended Page Table size would be 2^18 PTEGs, or 2^25 bytes (32 MB).

5.7.7.2 Storage Description Register 1

The Storage Description Register 1 (SDR1) register is shown in Figure 21.

  // | HTABORG | /// | HTABSIZE
  0    4         46    59     63

  Bits   Name      Description
  4:45   HTABORG   Real address of Page Table
  59:63  HTABSIZE  Encoded size of Page Table

  All other fields are reserved.

Figure 21. SDR1

SDR1 is a hypervisor resource; see Chapter 2.

The HTABORG field in SDR1 contains the high-order 42 bits of the 60-bit real address of the Page Table. The Page Table is thus constrained to lie on a 2^18 byte (256 KB) boundary at a minimum. At least 11 bits from the hash function (see Figure 18) are used to index into the Page Table. The minimum size Page Table is 256 KB (2^11 PTEGs of 128 bytes each).

The Page Table can be any size 2^n bytes where 18 ≤ n ≤ 46. As the table size is increased, more bits are used from the hash to index into the table and the value in HTABORG must have more of its low-order bits equal to 0.

The HTABSIZE field in SDR1 contains an integer giving the number of bits (in addition to the minimum of 11 bits) from the hash that are used in the Page Table index. This number must not exceed 28. HTABSIZE is used to generate a mask of the form 0b00...011...1, which is a string of 28 - HTABSIZE 0-bits followed by a string of HTABSIZE 1-bits. The 1-bits determine which additional bits (beyond the minimum of 11) from the hash are used in the index (see Figure 18). The number of low-order 0 bits in HTABORG must be greater than or equal to the value in HTABSIZE.

On implementations that support a real address size of only m bits, m<60, bits 0:59-m of the HTABORG field are treated as reserved bits, and software must set them to zeros.

Programming Note
  Let n equal the virtual address size (in bits) supported by the implementation. If n<67, software should set the HTABSIZE field to a value that does not exceed n-39. Because the high-order 78-n bits of the VSID are assumed to be zeros, the hash value used in the Page Table search will have the high-order 67-n bits either all 0s (primary hash; see Section 5.7.7.3) or all 1s (secondary hash). If HTABSIZE > n-39, some of these hash value bits will be used to index into the Page Table, with the result that certain PTEGs will not be searched.

Example:

Suppose that the Page Table is 16,384 (2^14) 128-byte PTEGs, for a total size of 2^21 bytes (2 MB). A 14-bit index is required. Eleven bits are provided from the hash to start with, so 3 additional bits from the hash must be selected. Thus the value in HTABSIZE must be 3 and the value in HTABORG must have its low-order 3 bits (bits 43:45 of SDR1) equal to 0. This means that the Page Table must begin on a 2^(3+11+7) = 2^21 = 2 MB boundary.

5.7.7.3 Page Table Search

When the hardware searches the Page Table, the accesses are performed as described in Section 5.7.3.3.

An outline of the HTAB search process is shown in Figure 18. Up to two hash functions are used to locate a PTE that may translate the given virtual address.

A 39-bit hash value is computed from the VPN. The value of s is the value specified in the SLBE that was used to generate the virtual address; the value of p used when computing the hash function is 12 if SLBE_L||LP = 0b000, otherwise the value of p is the value specified in the SLBE. In the expressions below, ^n0 denotes a string of n 0-bits.

1. Primary Hash:

   If s=28, the hash value is computed by Exclusive ORing VPN_11:49 with (^(11+p)0 || VPN_50:77-p)

   If s=40, the hash value is computed by Exclusive ORing the following three quantities: (VPN_24:37 || ^25 0), (0 || VPN_0:37), and (^(p-1)0 || VPN_38:77-p)

   The 60-bit real address of a PTEG is formed by concatenating the following values:
   - Bits 4:17 of SDR1 (the high-order 14 bits of HTABORG).
   - Bits 0:27 of the 39-bit hash value ANDed with the mask generated from bits 59:63 of SDR1 (HTABSIZE) and then ORed with bits 18:45 of SDR1 (the low-order 28 bits of HTABORG).
   - Bits 28:38 of the 39-bit hash value.
   - Seven 0-bits.

   This operation identifies a particular PTEG, called the "primary PTEG", whose eight PTEs will be tested.

2. Secondary Hash:

   If s=28, the hash value is computed by taking the ones complement of the Exclusive OR of VPN_11:49 with (^(11+p)0 || VPN_50:77-p)

   If s=40, the hash value is computed by taking the ones complement of the Exclusive OR of the following three quantities: (VPN_24:37 || ^25 0), (0 || VPN_0:37), and (^(p-1)0 || VPN_38:77-p)

   The 60-bit real address of a PTEG is formed by concatenating the following values:
   - Bits 4:17 of SDR1 (the high-order 14 bits of HTABORG).
   - Bits 0:27 of the 39-bit hash value ANDed with the mask generated from bits 59:63 of SDR1 (HTABSIZE) and then ORed with bits 18:45 of SDR1 (the low-order 28 bits of HTABORG).
   - Bits 28:38 of the 39-bit hash value.
   - Seven 0-bits.

   This operation identifies the "secondary PTEG".

Programming Note
  For segments that may contain a mixture of 4 KB and 64 KB pages (i.e. SLBE_L||LP = 0b000), the value of p used when searching the Page Table to identify the PTEGs is specified to be 12. Since the segment may contain pages of size 4 KB and 64 KB, the processor searches for PTEs specifying pages of either size, and the real address is formed using a value of p specified by the matching PTE.

If the Page Table search fails, a page fault occurs. This is an Instruction Storage exception or a Data Storage exception, depending on whether the effective address is for an instruction fetch or for a data access. The N value used for the storage access is the N bit from the SLB entry that was used to translate the effective address.

Programming Note
  To obtain the best performance, Page Table Entries should be allocated beginning with the first empty entry in the primary PTEG, or with the first empty entry in the secondary PTEG if the primary PTEG is full.

Translation Lookaside Buffer

Conceptually, the Page Table is searched by the address relocation hardware to translate every reference.

3. As many as 16 PTEs in the two identified PTEGs are tested to determine if any translate the given virtual address. Let q = minimum(54, 77-p). For a match to exist, the following conditions must be satisfied, where SLBE is the SLBE used to form the virtual address.
   - PTE_H=0 for the primary PTEG, 1 for the sec-
For performance reasons, the hardware usually ondary PTEG keeps a Translation Lookaside Buffer (TLB) that holds 1 PTEV=1 PTEs that have recently been used. The TLB is 1 PTEB=SLBEB searched prior to searching the Page Table. As a con- 1 PTEAVPN[0:q]=VA0:q sequence, when software makes changes to the Page 1 if PTEL=0 then SLBEL||LP =0b000 Table it must perform the appropriate TLB invalidate else PTELP specifies a page size operations to maintain the consistency of the TLB with specified by SLBEL||LP the Page Table (see Section 5.10). If no match is found, the search fails. If one match Programming Notes is found, the search succeeds. If more than one 1. Page Table Entries may or may not be cached match is found, one of the matching entries is used in a TLB. as if it were the only matching entry, or a Machine Check occurs. 2. It is possible that the hardware implements more than one TLB, such as one for data and If the Page Table search succeeds, the real address one for instructions. In this case the size and (RA) is formed by concatenating bits 0:59-p of shape of the TLBs may differ, as may the val- (ARPN||LP) from the matching PTE with bits 64-p:63 of ues contained therein. the effective address (the byte offset), where the p value is the value specified by PTEL LP. 3. Use the tlbie or tlbia instruction to ensure that the TLB no longer contains a mapping for a RA=(ARPN || LP)0:59-p || EA64-p:63 particular virtual page. The N (No-execute) value used for the storage access is the result of ORing the N bit from the matching PTE with the N bit from the SLB entry that was used to translate the effective address. 
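
The hash and address arithmetic of Section 5.7.7.3 can be sketched in Python as follows. This is an illustrative model only, not part of the architecture: the function names (field, primary_hash, secondary_hash, pteg_address, real_address) are hypothetical, values are plain integers numbered with bit 0 as the most significant bit, and real_address assumes that ARPN||LP is passed as a single 48-bit quantity.

```python
HASH_BITS = 39  # width of the hash value described in Section 5.7.7.3


def field(value, width, first, last):
    """Return bits first:last of a width-bit value (bit 0 is the MSB)."""
    return (value >> (width - 1 - last)) & ((1 << (last - first + 1)) - 1)


def primary_hash(vpn, s, p):
    """39-bit primary hash of a (78-p)-bit VPN in a segment of 2**s bytes."""
    w = 78 - p
    if s == 28:
        # VPN[11:49] XOR (11+p zero bits || VPN[50:77-p])
        return field(vpn, w, 11, 49) ^ field(vpn, w, 50, 77 - p)
    if s == 40:
        # (VPN[24:37] || 25 zero bits) XOR (0 || VPN[0:37])
        # XOR (p-1 zero bits || VPN[38:77-p])
        return ((field(vpn, w, 24, 37) << 25)
                ^ field(vpn, w, 0, 37)
                ^ field(vpn, w, 38, 77 - p))
    raise ValueError("s must be 28 or 40")


def secondary_hash(vpn, s, p):
    """Ones complement of the primary hash, kept to 39 bits."""
    return primary_hash(vpn, s, p) ^ ((1 << HASH_BITS) - 1)


def pteg_address(htaborg, htabsize, hash_value):
    """60-bit real address of the PTEG selected by a 39-bit hash value.

    htaborg is the 42-bit SDR1[4:45] field; htabsize is SDR1[59:63].
    """
    if not 0 <= htabsize <= 28:
        raise ValueError("HTABSIZE must not exceed 28")
    mask = (1 << htabsize) - 1                # 28-HTABSIZE 0s, then 1s
    high14 = htaborg >> 28                    # SDR1[4:17]
    low28 = htaborg & ((1 << 28) - 1)         # SDR1[18:45]
    middle = ((hash_value >> 11) & mask) | low28   # hash bits 0:27, masked
    low11 = hash_value & 0x7FF                # hash bits 28:38
    return (high14 << 46) | (middle << 18) | (low11 << 7)  # seven 0-bits


def real_address(arpn_lp, ea, p):
    """RA = (ARPN || LP)[0:59-p] || EA[64-p:63], arpn_lp assumed 48 bits."""
    return ((arpn_lp >> (p - 12)) << p) | (ea & ((1 << p) - 1))
```

With the 2 MB Page Table of the SDR1 example (HTABSIZE = 3) placed, as an arbitrary choice, at real address 0xA00000, HTABORG is 0xA00000 >> 18 and pteg_address returns addresses from 0xA00000 (hash of all 0s) through 0xBFFF80 (hash of all 1s), i.e. every PTEG stays inside the table.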
5.7.8 Reference and Change Recording

If address translation is enabled, Reference (R) and Change (C) bits are maintained in the Page Table Entry that is used to translate the virtual address. If the storage operand of a Load or Store instruction crosses a virtual page boundary, the accesses to the components of the operand in each page are treated as separate and independent accesses to each of the pages for the purpose of setting the Reference and Change bits.

Reference and Change bits are set by the processor as described below. Setting the bits need not be atomic with respect to performing the access that caused the bits to be updated. An attempt to access storage may cause one or more of the bits to be set (as described below) even if the access is not performed. The bits are updated in the Page Table Entry if the new value would otherwise be different from the old, as determined by examining either the Page Table Entry or any corresponding lookaside information (e.g., TLB) maintained by the processor.

Reference Bit

The Reference bit is set to 1 if the corresponding access (load, store, or instruction fetch) is required by the sequential execution model and is performed. Otherwise the Reference bit may be set to 1 if the corresponding access is attempted, either in-order or out-of-order, even if the attempt causes an exception.

Change Bit

The Change bit is set to 1 if a Store instruction is executed and the store is performed. Otherwise the Change bit may be set to 1 if a Store instruction is executed and the store is permitted by the storage protection mechanism and, if the Store instruction is executed out-of-order, the instruction would be required by the sequential execution model in the absence of the following kinds of interrupts:
- system-caused interrupts (see Section 6.4 on page 462)
- Floating-Point Enabled Exception type Program interrupts when the processor is in an Imprecise mode

Programming Note
  Even though the execution of a Store instruction causes the Change bit to be set to 1, the store might not be performed or might be only partially performed in cases such as the following.
  - A Store Conditional instruction (stwcx. or stdcx.) is executed, but no store is performed.
  - A Store String Word Indexed instruction (stswx) is executed, but the length is zero.
  - The Store instruction causes a Data Storage exception (for which setting the Change bit is not prohibited).
  - The Store instruction causes an Alignment exception.
  - The Page Table Entry that translates the virtual address of the storage operand is altered such that the new contents of the Page Table Entry preclude performing the store (e.g., the PTE is made invalid, or the PP bits are changed). For example, when executing a Store instruction, the processor may search the Page Table for the purpose of setting the Change bit and then re-execute the instruction. When re-executing the instruction, the processor may search the Page Table a second time. If the Page Table Entry has meanwhile been altered, by a program executing on another processor, the second search may obtain the new contents, which may preclude the store.
  - A system-caused interrupt occurs before the store has been performed.

Figure 22 on page 436 summarizes the rules for setting the Reference and Change bits. The table applies to each atomic storage reference. It should be read from the top down; the first line matching a given situation applies. For example, if stwcx. fails due to both a storage protection violation and the lack of a reservation, the Change bit is not altered.

In the figure, the "Load-type" instructions are the Load instructions described in Books I and II, eciwx, and the Cache Management instructions that are treated as Loads. The "Store-type" instructions are the Store instructions described in Books I and II, ecowx, and the Cache Management instructions that are treated as Stores. The "ordinary" Load and Store instructions are those described in Books I and II. "set" means "set to 1".

When the processor updates the Reference and Change bits in the Page Table Entry, the accesses are performed as described in Section 5.7.3.3, "Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes" on page 424. The accesses may be performed using operations equivalent to a store to a byte, halfword, word, or doubleword, and are not necessarily performed as an atomic read/modify/write of the affected bytes.

These Reference and Change bit updates are not necessarily immediately visible to software. Executing a sync instruction ensures that all Reference and Change bit updates associated with address translations that were performed, by the processor executing the sync instruction, before the sync instruction is executed will be performed with respect to that processor before the sync instruction's memory barrier is created. There are additional requirements for synchronizing Reference and Change bit updates in multiprocessor systems; see Section 5.10, "Page Table Update Synchronization Requirements" on page 454.

Programming Note
  Because the sync instruction is execution synchronizing, the set of Reference and Change bit updates that are performed with respect to the processor executing the sync instruction before the memory barrier is created includes all Reference and Change bit updates associated with instructions preceding the sync instruction.

If software refers to a Page Table Entry when MSR[DR]=1, the Reference and Change bits in the associated Page Table Entry are set as for ordinary loads and stores. See Section 5.10 for the rules software must follow when updating Reference and Change bits.
  Status of Access                                              R        C
  ------------------------------------------------------------  -------  ----------
  Storage protection violation                                  Acc(1)   No
  Out-of-order I-fetch or Load-type insn                        Acc      No
  Out-of-order Store-type insn
    Would be required by the sequential execution model
    in the absence of system-caused or imprecise
    interrupts(3)                                               Acc      Acc(1)(2)
    All other cases                                             Acc      No
  In-order Load-type or Store-type insn, access not performed
    Load-type insn                                              Acc      No
    Store-type insn                                             Acc      Acc(2)
  Other in-order access
    I-fetch                                                     Yes      No
    Ordinary Load, eciwx                                        Yes      No
    Other ordinary Store, ecowx, dcbz                           Yes      Yes
    icbi, dcbt, dcbtst, dcbst, dcbf[l]                          Acc      No

  "Acc" means that it is acceptable to set the bit.
  (1) It is preferable not to set the bit.
  (2) If C is set, R is also set unless it is already set.
  (3) For Floating-Point Enabled Exception type Program interrupts, "imprecise" refers to the exception mode controlled by MSR[FE0 FE1].

Figure 22. Setting the Reference and Change bits

5.7.9 Storage and Virtual Page Class Key Protection

The storage and virtual page class key protection mechanism provides a means for selectively granting instruction fetch access, granting read access, granting read/write access, and prohibiting access to areas of storage based on a number of control criteria.

The operation of the protection mechanism depends on one or more of the following conditions.
- the state of MSR bits HV, IR, DR, PR
- the value of the key bits in the associated SLB entry
- the values of the page protection and key bits in the associated PTE
- the contents of the Authority Mask Register

When translation is enabled for an access, the access is permitted if and only if the access is permitted by the virtual page class key protection (see Section 5.7.9.1) and the storage protection mechanism (see Section 5.7.9.2). If an instruction fetch is not permitted, an Instruction Storage exception is generated. If a data access is not permitted, a Data Storage exception is generated. (See Section 5.2)

Unless otherwise indicated, references to "storage protection mechanism" or "protection mechanism" throughout the Books refer to both the Storage Protection mechanism and the Virtual Page Class Key Protection mechanism.

When address translation is enabled, a protection domain is a range of unmapped effective addresses, a virtual page, or a segment. When address translation is disabled and LPES1=1 there are two protection domains: the set of effective addresses that are less than the value specified by the RMLS, and all other effective addresses. When address translation is disabled and LPES1=0 the entire effective address space comprises a single protection domain. A protection boundary is a boundary between protection domains.

5.7.9.1 Virtual Page Class Key Protection

The Virtual Page Class Key protection mechanism provides the means to assign virtual pages to one of 32 classes, and to modify access permissions for each class quickly by modifying the Authority Mask Register (AMR) shown in Figure 23. The access permissions associated with the Virtual Page Class Key protection mechanism apply only to load and store operations when address translation is enabled. The Virtual Page Class Key protection mechanism has no effect on instruction fetches.

  | Key0 | Key1 | Key2 | ... | Key29 | Key30 | Key31 |
  0      2      4      6     58      60      62

Figure 23. Authority Mask Register (AMR)

The contents of the AMR are as follows.

  Bit        Description
  0:1        Access mask for class number 0
  2:3        Access mask for class number 1
  ...
  2n:2n+1    Access mask for class number n
  ...
  62:63      Access mask for class number 31

The access mask for each class defines the access permissions used in conjunction with load and store operations corresponding to page table entries containing a KEY field value equal to the class number. The access permissions associated with each class are defined as follows, where AMR[2n] and AMR[2n+1] refer to the first and second bits of the access mask corresponding to class number n.
- An access caused by a Store instruction is permitted if AMR[2n]=0b0; otherwise the access is not permitted.
- An access caused by a Load instruction is permitted if AMR[2n+1]=0b0; otherwise the access is not permitted.

Programming Note
  If translation is disabled for a given access, the access is not affected by the Virtual Page Class Key protection mechanism even if the access is made in virtual real addressing mode.

Programming Note
  The Virtual Page Class Key protection mechanism replaces the Data Address Compare mechanism that was defined in versions of the architecture that precede Version 2.04 (e.g., the two facilities use some of the same processor resources, as described below). However, the Virtual Page Class Key protection mechanism can be used to emulate the Data Address Compare mechanism. Moreover, programs that use the Data Address Compare mechanism can be modified in a manner such that they will work correctly both on processors that comply with versions of the architecture that precede Version 2.04 (and hence implement the Data Address Compare mechanism) and on processors that comply with Version 2.04 of the architecture or with any subsequent version (and hence instead implement the Virtual Page Class Key protection mechanism). The technique takes advantage of the facts that the AMR has the same SPR number as the Data Address Compare mechanism's ACCR (Address Compare Control Register), that KEY[4] occupies the same bit in the PTE as the Data Address Compare mechanism's AC (Address Compare) bit, and that the definition of ACCR[62:63] is very similar to the definition of each even-odd pair of AMR bits. The technique is as follows, where PTE1 refers to doubleword 1 of the PTE.
  - Set bits 2:3 and 62:63 of SPR 29 (which is either the ACCR or the AMR) to x, where x is the desired 2-bit value for controlling Data Address Compare matches, and set bits 0:1 to 0s.
  - Set PTE1[54] (which is either the AC bit or KEY[4]) to the same value that the AC bit would be set to, and set PTE1[2:3] (which are either RPN bits, that correspond to a real address size larger than the size implemented by any processor that implements the Data Address Compare mechanism, or KEY[0:1]) and PTE1[52:53] (which are either reserved bits or KEY[2:3]) to 0s.
  - Use PTE[KEY] values 0 and 1 only for purposes of emulating the Data Address Compare mechanism, except that PTE[KEY] value 0 may also be used for any virtual pages for which it is desired that the Virtual Page Class Key mechanism permit all accesses. Do not use PTE[KEY]=31.
  - When a Data Storage interrupt occurs, if DSISR[42]=1 then ignore the interrupt for Cache Management instructions other than dcbz. (These instructions can cause a virtual page class key protection violation but cannot cause a Data Address Compare match.) Otherwise treat the interrupt as if a Data Address Compare match had occurred. (Note: Cases for which it is undefined whether a Data Address Compare match occurs do not necessarily cause a virtual page class key protection violation.)

5.7.9.2 Storage Protection, Address Translation Enabled

When address translation is enabled, the protection mechanism is controlled both by virtual page class key protection (see Section 5.7.9.1) and the following.
- MSR[PR], which distinguishes between supervisor (privileged) state and problem state
- Ks and Kp, the supervisor (privileged) state and problem state storage key bits in the SLB entry used to translate the effective address
- PP, page protection bits 0:2 in the Page Table Entry used to translate the effective address
- For instruction fetches only:
  - the N (No-execute) value used for the access (see Sections 5.7.6.1 and 5.7.7.3)
  - PTE[G], the G (Guarded) bit in the Page Table Entry used to translate the effective address

Using the above values, the following rules are applied.

1. For an instruction fetch, the access is not permitted if the N value is 1 or if PTE[G]=1.

2. For any access except an instruction fetch that is not permitted by rule 1, a "Key" value is computed using the following formula:

   Key ← (Kp & MSR[PR]) | (Ks & ¬MSR[PR])

   Using the computed Key, Figure 24 is applied. An instruction fetch is permitted for any entry in the figure except "no access". A load is permitted for any entry except "no access". A store is permitted only for entries with "read/write".

  Key   PP    Access Authority
  0     000   read/write
  0     001   read/write
  0     010   read/write
  0     011   read only
  0     110   read only
  1     000   no access
  1     001   read only
  1     010   read/write
  1     011   read only
  1     110   no access

  All PP encodings not shown above are reserved. The results of using reserved PP encodings are boundedly undefined.

Figure 24. PP bit protection states, address translation enabled

5.7.9.3 Storage Protection, Address Translation Disabled

When address translation is disabled, the protection mechanism is controlled by the following (see Chapter 2 and Section 5.7.3, "Real And Virtual Real Addressing Modes").
- LPES1, which distinguishes between the two modes of accessing storage using the LPAR facility
- MSR[HV], which distinguishes between hypervisor state and other privilege states
- RMLS, which specifies the real mode limit value

Using the above values, Figure 25 is applied. The access is permitted for any entry in the figure except "no access".

Programming Note
  The comparison described in note 1 in Figure 25 ignores bits 0:3 of the effective address and may ignore bits 4:63-m; see Section 5.7.3.
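
The permission checks of Sections 5.7.9.1 through 5.7.9.3 (the AMR access masks, the Key formula with Figure 24, and Figure 25) can be sketched in Python as follows. This is a minimal model under stated assumptions, not a definitive implementation: all function and parameter names are hypothetical, rmls_limit stands for the address limit already decoded from the RMLS, reserved PP encodings (whose results are boundedly undefined) are simply rejected, and the optional ignoring of effective address bits 4:63-m is not modeled.

```python
# Figure 24: (Key, PP) -> access authority. Encodings not listed are reserved.
PP_TABLE = {
    (0, 0b000): "read/write", (0, 0b001): "read/write",
    (0, 0b010): "read/write", (0, 0b011): "read only",
    (0, 0b110): "read only",
    (1, 0b000): "no access",  (1, 0b001): "read only",
    (1, 0b010): "read/write", (1, 0b011): "read only",
    (1, 0b110): "no access",
}


def amr_permits(amr, key, is_store):
    """Virtual page class key check for a load or store (Section 5.7.9.1).

    amr is the 64-bit AMR (bit 0 is the MSB); key is the class number n.
    """
    bit = 2 * key if is_store else 2 * key + 1   # AMR[2n] or AMR[2n+1]
    return ((amr >> (63 - bit)) & 1) == 0


def protection_key(ks, kp, msr_pr):
    """Key <- (Kp & MSR[PR]) | (Ks & not MSR[PR]), all single bits."""
    return (kp & msr_pr) | (ks & (msr_pr ^ 1))


def translated_access_permitted(kind, ks, kp, msr_pr, pp,
                                n=0, g=0, amr=0, key=0):
    """Section 5.7.9.2 rules; kind is 'fetch', 'load' or 'store'."""
    if kind == "fetch":
        if n == 1 or g == 1:
            return False                       # rule 1
    elif not amr_permits(amr, key, kind == "store"):
        return False                           # class key protection
    authority = PP_TABLE.get((protection_key(ks, kp, msr_pr), pp))
    if authority is None:
        raise ValueError("reserved PP encoding")   # modeling choice only
    if kind == "store":
        return authority == "read/write"
    return authority != "no access"


def untranslated_access_permitted(lpes1, hv, vpm0, ea, rmls_limit):
    """Figure 25, including its note 1 (bits 0:3 of the EA are ignored)."""
    if hv == 1:
        return True
    if lpes1 == 0:
        return False
    if vpm0 == 1:
        return True
    return (ea & ((1 << 60) - 1)) < rmls_limit
```

For example, with Ks=0, Kp=1 and MSR[PR]=1 the computed Key is 1, so a page with PP=0b011 can be loaded from but not stored to, while PP=0b010 permits both.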
  LPES1   HV   Access Authority
  0       0    no access
  0       1    read/write
  1       0    read/write or no access(1)
  1       1    read/write

  (1) If VPM0=1, the access authority is read/write. If VPM0=0 and the effective address for the access is less than the value specified by the RMLS, the access authority is read/write; otherwise the access is not permitted.

Figure 25. Protection states, address translation disabled

5.8 Storage Control Attributes

This section describes aspects of the storage control attributes that are relevant only to privileged software programmers. The rest of the description of storage control attributes may be found in Section 1.6 of Book II and subsections.

5.8.1 Guarded Storage

Storage is said to be "well-behaved" if the corresponding real storage exists and is not defective, and if the effects of a single access to it are indistinguishable from the effects of multiple identical accesses to it. Data and instructions can be fetched out-of-order from well-behaved storage without causing undesired side effects.

Storage is said to be Guarded if any of the following conditions is satisfied.
- MSR bit IR or DR is 1 for instruction fetches or data accesses respectively, and the G bit is 1 in the relevant Page Table Entry.
- MSR bit IR or DR is 0 for instruction fetches or data accesses respectively, MSR[HV]=1, and the storage is outside the range(s) specified by the Real Mode Storage Control facility (see Section 5.7.3.3.1).

In general, storage that is not well-behaved should be Guarded. Because such storage may represent a control register on an I/O device or may include locations that do not exist, an out-of-order access to such storage may cause an I/O device to perform unintended operations or may result in a Machine Check.

The following rules apply to in-order execution of Load and Store instructions for which the first byte of the storage operand is in storage that is both Caching Inhibited and Guarded.
- Load or Store instruction that causes an atomic access
  If any portion of the storage operand has been accessed and an External, Decrementer, Hypervisor Decrementer, or Imprecise mode Floating-Point Enabled exception is pending, the instruction completes before the interrupt occurs.
- Load or Store instruction that causes an Alignment exception, or that causes a Data Storage exception for reasons other than Data Address Breakpoint match.
  The portion of the storage operand that is in Caching Inhibited and Guarded storage is not accessed.
  (The corresponding rules for instructions that cause a Data Address Breakpoint match are given in Section 8.1.1.)

5.8.1.1 Out-of-Order Accesses to Guarded Storage

In general, Guarded storage is not accessed out-of-order. The only exceptions to this rule are the following.

Load Instruction

If a copy of any byte of the storage operand is in a cache then that byte may be accessed in the cache or in main storage.

Instruction Fetch

If MSR[HV IR]=0b10 then an instruction may be fetched if any of the following conditions are met.
1. The instruction is in a cache. In this case it may be fetched from the cache or from main storage.
2. The instruction is in a real page from which an instruction has previously been fetched, except that if that previous fetch was based on condition 1 then the previously fetched instruction must have been in the instruction cache.
3. The instruction is in the same real page as an instruction that is required by the sequential execution model, or is in the real page immediately following such a page.

Programming Note
  Software should ensure that only well-behaved storage is copied into a cache, either by accessing as Caching Inhibited (and Guarded) all storage that may not be well-behaved, or by accessing such storage as not Caching Inhibited (but Guarded) and referring only to cache blocks that are well-behaved.

  If a real page contains instructions that will be executed when MSR[IR]=0 and MSR[HV]=1, software should ensure that this real page and the next real page contain only well-behaved storage (or that the Real Mode Storage Control facility specifies that this real page is not Guarded).

5.8.2 Storage Control Bits

When address translation is enabled, each storage access is performed under the control of the Page Table Entry used to translate the effective address. Each Page Table Entry contains storage control bits that specify the presence or absence of the corresponding storage control for all accesses translated by the entry as shown in Figure 26.

  Bit     Storage Control Attribute
  W(1)    0 - not Write Through Required
          1 - Write Through Required
  I       0 - not Caching Inhibited
          1 - Caching Inhibited
  M(2)    0 - not Memory Coherence Required
          1 - Memory Coherence Required
  G       0 - not Guarded
          1 - Guarded

  (1) Support for the 1 value of the W bit is optional. Implementations that do not support the 1 value treat the bit as reserved and assume its value to be 0.
  (2) [Category: Memory Coherence] Support for the 0 value of the M bit is optional. Implementations that do not support the 0 value assume the value of the bit to be 1, and may either preserve the value of the bit or write it as 1.

Figure 26. Storage control bits

When address translation is enabled, instructions are not fetched from storage for which the G bit in the Page Table Entry is set to 1; see Section 5.7.9.

When address translation is disabled, the storage control attributes are implicit; see Section 5.7.3.3.

In Sections 5.8.2.1 and 5.8.2.2, "access" includes accesses that are performed out-of-order, and references to W, I, M, and G bits include the values of those bits that are implied when address translation is disabled.

Programming Note
  In a uniprocessor system in which only the processor has caches, correct coherent execution does not require the processor to access storage as Memory Coherence Required, and accessing storage as not Memory Coherence Required may give better performance.

5.8.2.1 Storage Control Bit Restrictions

All combinations of W, I, M, and G values are permitted except those for which both W and I are 1.

Programming Note
  If an application program requests both the Write Through Required and the Caching Inhibited attributes for a given storage location, the operating system should set the I bit to 1 and the W bit to 0.

At any given time, the value of the I bit must be the same for all accesses to a given real page.

At any given time, the value of the W bit must be the same for all accesses to a given real page.

5.8.2.2 Altering the Storage Control Bits

When changing the value of the I bit for a given real page from 0 to 1, software must set the I bit to 1 and then flush all copies of locations in the page from the caches using dcbf[l] and icbi before permitting any other accesses to the page.

When changing the value of the W bit for a given real page from 0 to 1, software must ensure that no processor modifies any location in the page until after all copies of locations in the page that are considered to be modified in the data caches have been copied to main storage using dcbst or dcbf[l].

Programming Note
  It is recommended that dcbf be used, rather than dcbfl, when changing the value of the I or W bit from 0 to 1. (dcbfl would have to be executed on all processors for which the contents of the data cache may be inconsistent with the new value of the bit, whereas, if the M bit for the page is 1, dcbf need be executed on only one processor in the system.)

When changing the value of the M bit for a given real page, software must ensure that all data caches are consistent with main storage. The actions required to do this are system-dependent.

Programming Note
  For example, when changing the M bit in some directory-based systems, software may be required to execute dcbf[l] on each processor to flush all storage locations accessed with the old M value before permitting the locations to be accessed with the new M value.

Additional requirements for changing the storage control bits in the Page Table are given in Section 5.10.

5.9 Storage Control Instructions

5.9.1 Cache Management Instructions

This section describes aspects of cache management that are relevant only to privileged software programmers.

For a dcbz instruction that causes the target block to be newly established in the data cache without being fetched from main storage, the processor need not verify that the associated real address is valid. The existence of a data cache block that is associated with an invalid real address (see Section 5.6) can cause a delayed Machine Check interrupt or a delayed Checkstop.

Each implementation provides an efficient means by which software can ensure that all blocks that are considered to be modified in the data cache have been copied to main storage before the processor enters any power conserving mode in which data cache contents are not maintained.

5.9.2 Synchronize Instruction

The Synchronize instruction is described in Section 3.3.3 of Book II, but only at the level required by an application programmer (sync with L=0 or L=1). This section describes properties of the instruction that are relevant only to operating system and hypervisor software programmers. This variant of the Synchronize instruction is designated the Page Table Entry sync and is specified by the extended mnemonic ptesync (equivalent to sync with L=2).

The ptesync instruction has all of the properties of sync with L=0 and also the following additional properties.
- The memory barrier created by the ptesync instruction provides an ordering function for the storage accesses associated with all instructions that are executed by the processor executing the ptesync instruction and, as elements of set A, for all Reference and Change bit updates associated with additional address translations that were performed, by the processor executing the ptesync instruction, before the ptesync instruction is executed. The applicable pairs are all pairs a_i,b_j in which b_j is a data access and a_i is not an instruction fetch.
- The ptesync instruction causes all Reference and Change bit updates associated with address translations that were performed, by the processor executing the ptesync instruction, before the ptesync instruction is executed, to be performed with respect to that processor before the ptesync instruction's memory barrier is created.
- The ptesync instruction provides an ordering function for all stores to the Page Table caused by Store instructions preceding the ptesync instruction with respect to searches of the Page Table that are performed, by the processor executing the ptesync instruction, after the ptesync instruction completes. Executing a ptesync instruction ensures that all such stores will be performed, with respect to the processor executing the ptesync instruction, before any implicit accesses to the affected Page Table Entries, by such Page Table searches, are performed with respect to that processor.
- In conjunction with the tlbie and tlbsync instructions, the ptesync instruction provides an ordering function for TLB invalidations and related storage accesses on other processors as described in the tlbsync instruction description on page 453.

Programming Note
  For instructions following a ptesync instruction, the memory barrier need not order implicit storage accesses for purposes of address translation and reference and change recording.

  The functions performed by the ptesync instruction may take a significant amount of time to complete, so this form of the instruction should be used only if the functions listed above are needed. Otherwise sync with L=0 should be used (or sync with L=1, or eieio, if appropriate).

  Section 5.10, "Page Table Update Synchronization Requirements" on page 454 gives examples of uses of ptesync.

5.9.3 Lookaside Buffer Management

All implementations have a Segment Lookaside Buffer (SLB). For performance reasons, most implementations also have implementation-specific lookaside information that is used in address translation. This lookaside information may be: a Translation Lookaside Buffer (TLB) which is a cache of recently used Page Table Entries (PTEs); a cache of recently used translations of effective addresses to real addresses; etc.; or any combination of these. Lookaside information, including the SLB, is managed using the instructions described in the subsections of this section.

Lookaside information derived from PTEs is not necessarily kept consistent with the Page Table. When software alters the contents of a PTE, in general it must also invalidate all corresponding implementation-specific lookaside information; exceptions to this rule are described in Section 5.10.1.2.

The effects of the slbie, slbia, and TLB Management instructions on address translations, as specified in Sections 5.9.3.1 and 5.9.3.3 for the SLB and TLB respectively, apply to all implementation-specific lookaside information that is used in address translation. Unless otherwise stated or obvious from context, references to SLB entry invalidation and TLB entry invalidation elsewhere in the Books apply also to all implementation-specific lookaside information that is derived from SLB entries and PTEs respectively.

The tlbia instruction is optional. However, all implementations provide a means by which software can invalidate all implementation-specific lookaside information that is derived from PTEs.

Implementation-specific lookaside information that contains translations of effective addresses to real addresses may include "translations" that apply in real addressing mode. Because such "translations" are affected by the contents of the LPCR, RMOR, and HRMOR, when software alters the contents of these registers it must also invalidate the corresponding implementation-specific lookaside information.

All implementations that have such lookaside information provide a means by which software can invalidate all such lookaside information.

For simplicity, elsewhere in the Books it is assumed that the TLB exists.

Programming Note
  Because the instructions used to manage implementation-specific lookaside information that is derived from PTEs may be changed in a future version of the architecture, it is recommended that software "encapsulate" uses of the TLB Management instructions into subroutines.

Programming Note
  The function of all the instructions described in Sections 5.9.3.1 - 5.9.3.3 is independent of whether address translation is enabled or disabled.

5.9.3.1 SLB Management Instructions

Programming Note
  Accesses to a given SLB entry caused by the instructions described in this section obey the sequential execution model with respect to the contents of the entry and with respect to data dependencies on those contents. That is, if an instruction sequence contains two or more of these instructions, when the sequence has completed, the final state of the SLB entry and of General Purpose Registers is as if the instructions had been executed in program order.

  However, software synchronization is required in order to ensure that any alterations of the entry take effect correctly with respect to address translation; see Chapter 10.

SLB Invalidate Entry X-form

slbie RB

  | 31 | /// | /// | RB | 434 | / |
  0    6     11    16   21    31

  ea[0:35] ← (RB)[0:35]
  if, for SLB entry that translates
     or most recently translated ea,
     entry_class = (RB)[36] and
     entry_seg_size = size specified in (RB)[37:38]
  then for SLB entry (if any) that translates ea
     SLBE[V] ← 0
     all other fields of SLBE ← undefined
  else
     s ← log_base_2(entry_seg_size)
     esid ← (RB)[0:63-s]
     translation of esid ← undefined

Let the Effective Address (EA) be any EA for which EA[0:35] = (RB)[0:35]. Let the class be (RB)[36]. Let the segment size be equal to the segment size specified in (RB)[37:38]; the allowed values of (RB)[37:38], and the correspondence between the values and the segment size, are the same as for the B field in the SLBE (see Figure 16 on page 428).

The class value and segment size must be the same as the class value and segment size in the SLB entry that translates the EA, or the values that were in the SLB entry that most recently translated the EA if the translation is no longer in the SLB; if these values are not the same, the results of translating effective addresses that would have been translated by that SLB entry are undefined, and the next paragraph need not apply.
If the SLB contains only a single entry that translates the EA, then that is the only SLB entry that is invali- For a discussion of software synchronization dated. If the SLB contains more than one such entry, requirements when invalidating SLB and TLB then zero or more such entries are invalidated, and entries, see Chapter 10. similarly for any implementation-specific lookaside Chapter 5. Storage Control 443 Version 2.04 information used in address translation; additionally, a SLB Invalidate All X-form machine check may occur. slbia SLB entries are invalidated by setting the V bit in the entry to 0, and the remaining fields of the entry are set 31 /// /// /// 498 / to undefined values. 0 6 11 16 21 31 The processor ignores the contents of RB listed below and software must set them to 0s. for each SLB entry except SLB entry 0 - (RB)37 SLBEV 1 0 - (RB)39:63 all other fields of SLBE 1 undefined - If s = 40, (RB)24:35 For all SLB entries except SLB entry 0, the V bit in the entry is set to 0, making the entry invalid, and the If this instruction is executed in 32-bit mode, (RB)0:31 remaining fields of the entry are set to undefined val- must be zeros. ues. SLB entry 0 is not altered. This instruction is privileged. This instruction is privileged. Special Registers Altered: Special Registers Altered: None None Programming Note Programming Note slbie does not affect SLBs on other processors. slbia does not affect SLBs on other processors. 
Programming Note Programming Note The reason the class value specified by slbie must If slbia is executed when instruction address trans- be the same as the Class value that is or was in the lation is enabled, software can ensure that attempt- relevant SLB entry is that the processor may use ing to fetch the instruction following the slbia does these values to optimize invalidation of implemen- not cause an Instruction Segment interrupt by plac- tation-specific lookaside information used in ing the slbia and the subsequent instruction in the address translation. If the value specified by slbie effective segment mapped by SLB entry 0. (The differs from the value that is or was in the relevant preceding assumes that no other interrupts occur SLB entry, these optimizations may produce incor- between executing the slbia and executing the rect results. (An example of implementation-spe- subsequent instruction.) cific address translation lookaside information is the set of recently used translations of effective addresses to real addresses that some processors maintain in an Effective to Real Address Translation (ERAT) lookaside buffer.) The recommended use of the Class field is to use the 0 value to indicate that the SLB entry contains a translation that is expected to be long-lived and the 1 value to indicate the SLB entry contains a transla- tion that is expected to be short lived. If this is done and the processor invalidates certain implementa- tion-specific lookaside information based only on the specified class value, an slbie instruction that invalidates a short-lived translation will preserve such lookaside information for long-lived transla- tions. The Move To Segment Register instructions (see Section 5.9.3.2.1) create SLB entries in which the Class value is 0. Programming Note The B value in register RB may be needed for inval- idating ERAT entries corresponding to the transla- tion being invalidated. 
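As an informal illustration of the slbie operand layout described in Section 5.9.3.1, the following C sketch assembles an RB value from an effective address, a class bit, and a B (segment size) value. It is only a sketch under stated assumptions: the function name slbie_rb, the macro PPC_SHIFT, and the enum names are hypothetical, and the B encodings used (0b00 for 256 MB, 0b01 for 1 TB) are assumed to match Figure 16, which is not reproduced in this section. Bit numbering follows the convention used in the text, in which bit 0 is the most significant bit of the 64-bit register. The 32-bit mode restriction on (RB)0:31 is not checked.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift that places a field whose least significant bit is
 * architecture bit `lsb` (bit 0 = most significant of 64 bits). */
#define PPC_SHIFT(lsb) (63 - (lsb))

/* Assumed B encodings; Figure 16 is not shown in this section. */
enum { SEG_B_256MB = 0, SEG_B_1TB = 1 };

/* Hypothetical helper: build the RB operand for slbie.
 *   (RB)0:63-s = ESID of the effective address (s = 28 or 40)
 *   (RB)36     = class
 *   (RB)37:38  = B (segment size)
 * Every other bit is left 0, which the text requires of software. */
uint64_t slbie_rb(uint64_t ea, unsigned class_bit, unsigned b)
{
    assert(b == SEG_B_256MB || b == SEG_B_1TB);

    unsigned s = (b == SEG_B_1TB) ? 40 : 28;         /* log2(segment size) */
    uint64_t esid_mask = ~((UINT64_C(1) << s) - 1);  /* keeps bits 0:63-s  */

    uint64_t rb = ea & esid_mask;
    rb |= (uint64_t)(class_bit & 1u) << PPC_SHIFT(36);
    rb |= (uint64_t)b << PPC_SHIFT(38);              /* field 37:38, LSB at bit 38 */
    return rb;
}
```

For a 1 TB segment the mask clears (RB)24:35 as well, matching the rule that those bits are ignored by the processor when s = 40. The caller is still responsible for passing the class and segment size that are, or most recently were, in the SLB entry being invalidated.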
SLB Move To Entry X-form

slbmte RS,RB

    31     RS     ///    RB     402    /
    0      6      11     16     21     31

The SLB entry specified by bits 52:63 of register RB is loaded from register RS and from the remainder of register RB. The contents of these registers are interpreted as shown in Figure 27.

    RS:  B    VSID   Ks Kp N L C   0    LP   0s
         0    2      52            57   58   60  63

    RB:  ESID   V    0s    index
         0      36   37    52     63

    RS0:1     B
    RS2:51    VSID
    RS52      Ks
    RS53      Kp
    RS54      N
    RS55      L
    RS56      C
    RS57      must be 0b0
    RS58:59   LP
    RS60:63   must be 0b0000
    RB0:35    ESID
    RB36      V
    RB37:51   must be 0b000 || 0x000
    RB52:63   index, which selects the SLB entry

Figure 27. GPR contents for slbmte

On implementations that support a virtual address size of only n bits, n<78, (RS)0:77-n must be zeros.

(RS)57 and (RS)60:63 must be ignored by the processor.

High-order bits of (RB)52:63 that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros.

If this instruction is executed in 32-bit mode, (RB)0:31 must be zeros (i.e., the ESID must be in the range 0-15).

This instruction cannot be used to invalidate an SLB entry.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note

  The reason slbmte cannot be used to invalidate an SLB entry is that it does not necessarily affect implementation-specific address translation lookaside information. slbie (or slbia) must be used for this purpose.

SLB Move From Entry VSID X-form

slbmfev RT,RB

    31     RT     ///    RB     851    /
    0      6      11     16     21     31

If the SLB entry specified by bits 52:63 of register RB is valid (V=1), the contents of the B, VSID, Ks, Kp, N, L, C, and LP fields of the entry are placed into register RT. The contents of these registers are interpreted as shown in Figure 28.

    RT:  B    VSID   Ks Kp N L C   0    LP   0s
         0    2      52            57   58   60  63

    RB:  0s    index
         0     52     63

    RT0:1     B
    RT2:51    VSID
    RT52      Ks
    RT53      Kp
    RT54      N
    RT55      L
    RT56      C
    RT57      set to 0b0
    RT58:59   LP
    RT60:63   set to 0b0000
    RB0:51    must be 0x0_0000_0000_0000
    RB52:63   index, which selects the SLB entry

Figure 28. GPR contents for slbmfev

On implementations that support a virtual address size of only n bits, n<78, RT0:77-n are set to zeros.

If the SLB entry specified by bits 52:63 of register RB is invalid (V=0), the contents of register RT are set to 0.

High-order bits of (RB)52:63 that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros.

This instruction is privileged.

Special Registers Altered:
    None

SLB Move From Entry ESID X-form

slbmfee RT,RB

    31     RT     ///    RB     915    /
    0      6      11     16     21     31

If the SLB entry specified by bits 52:63 of register RB is valid (V=1), the contents of the ESID and V fields of the entry are placed into register RT. The contents of these registers are interpreted as shown in Figure 29.

    RT:  ESID   V    0s
         0      36   37    63

    RB:  0s    index
         0     52     63

    RT0:35    ESID
    RT36      V
    RT37:63   set to 0b000 || 0x00_0000
    RB0:51    must be 0x0_0000_0000_0000
    RB52:63   index, which selects the SLB entry

Figure 29. GPR contents for slbmfee

If the SLB entry specified by bits 52:63 of register RB is invalid (V=0), the contents of register RT are set to 0.

High-order bits of (RB)52:63 that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros.

This instruction is privileged.

Special Registers Altered:
    None

5.9.3.2 Bridge to SLB Architecture [Category:Server.Phased-Out]

The facility described in this section can be used to ease the transition to the current Power ISA software-managed Segment Lookaside Buffer (SLB) architecture, from the Segment Register architecture provided by 32-bit PowerPC implementations. A complete description of the Segment Register architecture may be found in "Segmented Address Translation, 32-Bit Implementations," Section 4.5, Book III of Version 1.10 of the PowerPC architecture, referenced in the introduction to this architecture.

The facility permits the operating system to continue to use the 32-bit PowerPC implementation's Segment Register Manipulation instructions.

5.9.3.2.1 Segment Register Manipulation Instructions

The instructions described in this section -- mtsr, mtsrin, mfsr, and mfsrin -- allow software to associate effective segments 0 through 15 with any of virtual segments 0 through 2^27-1. SLB entries 0:15 serve as virtual Segment Registers, with SLB entry i used to emulate Segment Register i. The mtsr and mtsrin instructions move 32 bits from a selected GPR to a selected SLB entry. The mfsr and mfsrin instructions move 32 bits from a selected SLB entry to a selected GPR.

The contents of the GPRs used by the instructions described in this section are shown in Figure 30. Fields shown as zeros must be zero for the Move To Segment Register instructions. Fields shown as hyphens are ignored. Fields shown as periods are ignored by the Move To Segment Register instructions and set to zero by the Move From Segment Register instructions. Fields shown as colons are ignored by the Move To Segment Register instructions and set to undefined values by the Move From Segment Register instructions.

    RS/RT:  :::    .     KsKpN   0     VSID23:49
            0      32    33      36    37         63

    RB:     ---    ESID   ---
            0      32     36     63

Figure 30. GPR contents for mtsr, mtsrin, mfsr, and mfsrin

Programming Note

  The "Segment Register" format used by the instructions described in this section corresponds to the low-order 32 bits of RS and RT shown in the figure. This format is essentially the same as that for the Segment Registers of 32-bit PowerPC implementations. The only differences are the following.

  • Bit 36 corresponds to a reserved bit in Segment Registers. Software must supply 0 for the bit because it corresponds to the L bit in SLB entries, and large pages are not supported for SLB entries created by the Move To Segment Register instructions.

  • VSID bits 23:25 correspond to reserved bits in Segment Registers. Software can use these extra VSID bits to create VSIDs that are larger than those supported by the Segment Register Manipulation instructions of 32-bit PowerPC implementations.

  Bit 32 of RS and RT corresponds to the T (direct-store) bit of early 32-bit PowerPC implementations. No corresponding bit exists in SLB entries.

Programming Note

  The Programming Note in the introduction to Section 5.9.3.1 applies also to the Segment Register Manipulation instructions described in this section, and to any combination of the instructions described in the two sections, except as specified below for mfsr and mfsrin.

  The requirement that the SLB contain at most one entry that translates a given effective address (see Section 5.7.6.1) applies to SLB entries created by mtsr and mtsrin. This requirement is satisfied naturally if only mtsr and mtsrin are used to create SLB entries for a given ESID, because for these instructions the association between SLB entries and ESID values is fixed (SLB entry i is used for ESID i). However, care must be taken if slbmte is also used to create SLB entries for the ESID, because for slbmte the association between SLB entries and ESID values is specified by software.

Move To Segment Register X-form

mtsr SR,RS

    31     RS     /      SR     ///    210    /
    0      6      11     12     16     21     31

The SLB entry specified by SR is loaded from register RS, as follows.

    SLBE Bit(s)   Set to              SLB Field(s)
    0:31          0x0000_0000         ESID0:31
    32:35         SR                  ESID32:35
    36            0b1                 V
    37:38         0b00                B
    39:61         0b000||0x0_0000     VSID0:22
    62:88         (RS)37:63           VSID23:49
    89:91         (RS)33:35           KsKpN
    92            (RS)36              L ((RS)36 must be 0b0)
    93            0b0                 C
    94            0b0                 reserved
    95:96         0b00                LP

MSR_SF must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered:
    None

Move To Segment Register Indirect X-form

mtsrin RS,RB

    31     RS     ///    RB     242    /
    0      6      11     16     21     31

The SLB entry specified by (RB)32:35 is loaded from register RS, as follows.

    SLBE Bit(s)   Set to              SLB Field(s)
    0:31          0x0000_0000         ESID0:31
    32:35         (RB)32:35           ESID32:35
    36            0b1                 V
    37:38         0b00                B
    39:61         0b000||0x0_0000     VSID0:22
    62:88         (RS)37:63           VSID23:49
    89:91         (RS)33:35           KsKpN
    92            (RS)36              L ((RS)36 must be 0b0)
    93            0b0                 C
    94            0b0                 reserved
    95:96         0b00                LP

MSR_SF must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered:
    None

Move From Segment Register X-form

mfsr RT,SR

    31     RT     /      SR     ///    595    /
    0      6      11     12     16     21     31

The contents of the low-order 27 bits of the VSID field and the contents of the Ks, Kp, N, and L fields of the SLB entry specified by SR are placed into register RT as follows.

    SLBE Bit(s)   Copied to   SLB Field(s)
    62:88         RT37:63     VSID23:49
    89:91         RT33:35     KsKpN
    92            RT36        L (SLBE_L must be 0b0)

RT32 is set to 0. The contents of RT0:31 are undefined.

MSR_SF must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction must be used only to read an SLB entry that was, or could have been, created by mtsr or mtsrin and has not subsequently been invalidated (i.e., an SLB entry in which ESID<16, V=1, VSID<2^27, L=0, and C=0). If the SLB entry is invalid (V=0), RT33:63 are set to 0. Otherwise the contents of register RT are undefined.

This instruction is privileged.

Special Registers Altered:
    None

Move From Segment Register Indirect X-form

mfsrin RT,RB

    31     RT     ///    RB     659    /
    0      6      11     16     21     31

The contents of the low-order 27 bits of the VSID field and the contents of the Ks, Kp, N, and L fields of the SLB entry specified by (RB)32:35 are placed into register RT as follows.

    SLBE Bit(s)   Copied to   SLB Field(s)
    62:88         RT37:63     VSID23:49
    89:91         RT33:35     KsKpN
    92            RT36        L (SLBE_L must be 0b0)

RT32 is set to 0. The contents of RT0:31 are undefined.

MSR_SF must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction must be used only to read an SLB entry that was, or could have been, created by mtsr or mtsrin and has not subsequently been invalidated (i.e., an SLB entry in which ESID<16, V=1, VSID<2^27, L=0, and C=0). If the SLB entry is invalid (V=0), RT33:63 are set to 0. Otherwise the contents of register RT are undefined.

This instruction is privileged.

Special Registers Altered:
    None

5.9.3.3 TLB Management Instructions

TLB Invalidate Entry X-form

tlbie RB,L
[Category: Server]

    31     ///    L      ///    RB     306    /
    0      6      10     11     16     21     31

    if L = 0
    then
        p = 12
        if (RB)56=0
            then pg_size ← 4 KB
            else pg_size ← 64 KB
    else
        pg_size ← page size specified in (RB)44:51
        p ← log_base_2(pg_size)
    sg_size ← segment size specified in (RB)54:55
    for each processor in the partition
        for each TLB entry
            if (entry_VA14:77-p = (RB)0:63-p) &
               (entry_sg_size = sg_size) &
               (entry_pg_size = pg_size)
            then TLB entry ← invalid

The operation performed by this instruction is based upon the contents of RB and the L field. The contents of RB are shown below, where L is the L field in the instruction.

    L=0:  VPN    0s    B     AP    0s
          0      52    54    56    57   63

    L=1:  VPN    LP    0s    B     0s
          0      44    52    54    56   63

If the L field of the instruction contains 0, RB56 (AP - Admixed Page size field) must be set to 0 if the page size specified by the PTE that was used to create the TLB entry to be invalidated is 4 KB and must be set to 1 if the page size specified by the PTE that was used to create the TLB entry to be invalidated is 64 KB. The VPN field in register RB must contain bits 14:65 of the virtual address translated by the TLB entry to be invalidated.

If the L field in the instruction contains 1, the following rules apply, where c is the number of "r" bits in the LP field of the PTE that was used to create the TLB entry to be invalidated.

- The page size is specified in the LP field in register RB, where the relationship between (RB)_LP and the page size is the same as the relationship between PTE_LP and the page size (see Figure 6). Specifically, (RB)44+c:51 must be equal to the contents of bits c:7 of the LP field of the PTE that was used to create the TLB entry to be invalidated.
- (RB)0:43+c must contain bits 14:77-p of the virtual address translated by the TLB entry to be invalidated, followed by p+c-20 zeros which must be ignored by the processor.

Let the segment size be equal to the segment size specified in RB54:55 (B field). The contents of RB54:55 must be the same as the contents of PTE_B used to create the TLB entry to be invalidated.

RB52:53, RB56 (when the L field of the instruction is 1), and RB57:63 must be set to zeros and must be ignored by the processor.

All TLB entries that have all of the following properties are made invalid on all processors that are in the same partition as the processor executing the tlbie instruction.

• The entry translates a virtual address for which VA14:77-p is equal to (RB)0:63-p.
• The segment size of the entry is the same as the segment size specified in (RB)54:55.
• Either of the following is true:
  - The L field in the instruction is 0, and either the page size of the entry is 4 KB and (RB)56=0, or the page size of the entry is 64 KB and (RB)56=1.
  - The L field of the instruction is 1, and the page size of the entry matches the page size specified in (RB)44:51.

Additional TLB entries may also be made invalid on any processor that is in the same partition as the processor executing the tlbie instruction.

MSR_SF must be 1 when this instruction is executed; otherwise the results are undefined.

The operation performed by this instruction is ordered by the eieio (or sync or ptesync) instruction with respect to a subsequent tlbsync instruction executed by the processor executing the tlbie instruction. The operations caused by tlbie and tlbsync are ordered by eieio as a fourth set of operations, which is independent of the other three sets that eieio orders.

This instruction is privileged, and can be executed only in hypervisor state. If it is executed in privileged but non-hypervisor state either a Privileged Instruction type Program interrupt occurs or the results are boundedly undefined.

See Section 5.10, "Page Table Update Synchronization Requirements" for a description of other requirements associated with the use of this instruction.

Special Registers Altered:
    None

Programming Note

  For tlbie[l] instructions in which L=0, the AP value in RB is provided to make it easier for the processor to locate address translations, in lookaside buffers, corresponding to the address translation being invalidated.

TLB Invalidate Entry Local X-form

tlbiel RB,L
[Category: Server]

    31     ///    L      ///    RB     274    /
    0      6      10     11     16     21     31

    if L = 0
    then
        p = 12
        if (RB)56=0
            then pg_size ← 4 KB
            else pg_size ← 64 KB
    else
        pg_size ← page size specified in (RB)44:51
        p ← log_base_2(pg_size)
    sg_size ← segment size specified in (RB)54:55
    for each TLB entry
        if (entry_VA14:77-p = (RB)0:63-p) &
           (entry_sg_size = sg_size) &
           (entry_pg_size = pg_size)
        then TLB entry ← invalid

The operation performed by this instruction is based upon the contents of RB and the L field. The contents of RB are shown below, where L is the L field in the instruction.

    L=0:  VPN    0s    B     AP    0s
          0      52    54    56    57   63

    L=1:  VPN    LP    0s    B     0s
          0      44    52    54    56   63

If the L field of the instruction contains 0, RB56 (AP - Admixed Page size field) must be set to 0 if the page size specified by the PTE that was used to create the TLB entry to be invalidated is 4 KB and must be set to 1 if the page size specified by the PTE that was used to create the TLB entry to be invalidated is 64 KB. The VPN field in register RB must contain bits 14:65 of the virtual address translated by the TLB entry to be invalidated.

If the L field in the instruction contains 1, the following rules apply, where c is the number of "r" bits in the LP field of the PTE that was used to create the TLB entry to be invalidated.

- The page size is specified in the LP field in register RB, where the relationship between (RB)_LP and the page size is the same as the relationship between PTE_LP and the page size (see Figure 6). Specifically, (RB)44+c:51 must be equal to the contents of bits c:7 of the LP field of the PTE that was used to create the TLB entry to be invalidated.
- (RB)0:43+c must contain bits 14:77-p of the virtual address translated by the TLB entry to be invalidated, followed by p+c-20 zeros which must be ignored by the processor.

Let the segment size be equal to the segment size specified in RB54:55 (B field). The contents of RB54:55 must be the same as the contents of PTE_B used to create the TLB entry to be invalidated.

RB52:53, RB56 (when the L field of the instruction is 1), and RB57:63 must be set to 0s and must be ignored by the processor.

All TLB entries that have all of the following properties are made invalid on the processor executing the tlbiel instruction.

• The entry translates a virtual address for which VA14:77-p is equal to (RB)0:63-p.
• The segment size of the entry is the same as the segment size specified in (RB)54:55.
• Either of the following is true:
  - The L field in the instruction is 0, and either the page size of the entry is 4 KB and (RB)56=0, or the page size of the entry is 64 KB and (RB)56=1.
  - The L field of the instruction is 1, and the page size of the entry matches the page size specified in (RB)44:51.

Only TLB entries on the processor executing the tlbiel instruction are affected.

MSR_SF must be 1 when this instruction is executed; otherwise the results are undefined.

This instruction is privileged, and can be executed only in hypervisor state. If it is executed in privileged but non-hypervisor state either a Privileged Instruction type Program interrupt occurs or the results are boundedly undefined.

See Section 5.10, "Page Table Update Synchronization Requirements" on page 454 for a description of other requirements associated with the use of this instruction.

Special Registers Altered:
    None

Programming Note

  The primary use of this instruction by hypervisor state code is to invalidate TLB entries prior to reassigning a processor to a new logical partition.

  tlbiel may be executed on a given processor even if the sequence tlbie - eieio - tlbsync - ptesync is concurrently being executed on another processor.

  See also the Programming Note with the description of the tlbie instruction.

TLB Invalidate All X-form

tlbia

    31     ///    ///    ///    370    /
    0      6      11     16     21     31

    all TLB entries ← invalid

All TLB entries are made invalid on the processor executing the tlbia instruction.

This instruction is privileged, and can be executed only in hypervisor state. If it is executed in privileged but non-hypervisor state either a Privileged Instruction type Program interrupt occurs or the results are boundedly undefined.

This instruction is optional, and need not be implemented.

Special Registers Altered:
    None

Programming Note

  tlbia does not affect TLBs on other processors.

TLB Synchronize X-form

tlbsync

    31     ///    ///    ///    566    /
    0      6      11     16     21     31

The tlbsync instruction provides an ordering function for the effects of all tlbie instructions executed by the processor executing the tlbsync instruction, with respect to the memory barrier created by a subsequent ptesync instruction executed by the same processor. Executing a tlbsync instruction ensures that all of the following will occur.

• All TLB invalidations caused by tlbie instructions preceding the tlbsync instruction will have completed on any other processor before any data accesses caused by instructions following the ptesync instruction are performed with respect to that processor.

• All storage accesses by other processors for which the address was translated using the translations being invalidated, and all Reference and Change bit updates associated with address translations that were performed by other processors using the translations being invalidated, will have been performed with respect to the processor executing the ptesync instruction, to the extent required by the associated Memory Coherence Required attributes, before the ptesync instruction's memory barrier is created.

The operation performed by this instruction is ordered by the eieio (or sync or ptesync) instruction with respect to preceding tlbie instructions executed by the processor executing the tlbsync instruction. The operations caused by tlbie and tlbsync are ordered by eieio as a fourth set of operations, which is independent of the other three sets that eieio orders.

The tlbsync instruction may complete before operations caused by tlbie instructions preceding the tlbsync instruction have been performed.

This instruction is privileged and can be executed only in hypervisor state. If it is executed in privileged but non-hypervisor state either a Privileged Instruction type Program interrupt occurs or the results are boundedly undefined.
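The ordering role of tlbsync described above (preceding tlbie instructions, then an ordering instruction such as eieio, then tlbsync, then a subsequent ptesync) can be modelled informally as a small state machine. The C sketch below is only an illustration for a multiprocessor setting: the enum op values and the function tlbie_sequence_ok are hypothetical names, and the decision to reject a tlbie that appears between tlbsync and ptesync is a conservative assumption of this sketch, not a statement of the architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classes, used by this sketch only. */
enum op { OP_OTHER, OP_TLBIE, OP_EIEIO, OP_SYNC, OP_TLBSYNC, OP_PTESYNC };

/* Returns 1 if every tlbie in seq[0..n-1] is later followed, in order, by
 * an ordering instruction (eieio, sync, or ptesync), a tlbsync, and a
 * ptesync, and if every tlbsync is followed by a ptesync. Returns 0
 * otherwise. Unrelated instructions (OP_OTHER) may be interleaved. */
int tlbie_sequence_ok(const enum op *seq, size_t n)
{
    enum { IDLE, NEED_BARRIER, NEED_TLBSYNC, NEED_PTESYNC } st = IDLE;

    assert(seq != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        enum op o = seq[i];
        switch (st) {
        case IDLE:
            if (o == OP_TLBIE)        st = NEED_BARRIER;
            else if (o == OP_TLBSYNC) st = NEED_PTESYNC;
            break;
        case NEED_BARRIER:
            if (o == OP_EIEIO || o == OP_SYNC || o == OP_PTESYNC)
                st = NEED_TLBSYNC;
            else if (o == OP_TLBSYNC)
                return 0;             /* tlbsync not ordered after the tlbie */
            break;
        case NEED_TLBSYNC:
            if (o == OP_TLBIE)        st = NEED_BARRIER; /* needs a new barrier */
            else if (o == OP_TLBSYNC) st = NEED_PTESYNC;
            break;
        case NEED_PTESYNC:
            if (o == OP_TLBIE)        return 0; /* conservative: see lead-in */
            else if (o == OP_PTESYNC) st = IDLE;
            break;
        }
    }
    return st == IDLE;                /* nothing left pending */
}
```

A checker of this kind could be run over a recorded instruction trace in a test harness; it says nothing about what other processors are doing at the same time, which the architecture also constrains.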
See Section 5.10 for a description of other requirements associated with the use of this instruction.

Special Registers Altered:
    None

Programming Note

  tlbsync should not be used to synchronize the completion of tlbiel.

5.10 Page Table Update Synchronization Requirements

This section describes rules that software must follow when updating the Page Table, and includes suggested sequences of operations for some representative cases.

In the sequences of operations shown in the following subsections, any alteration of a Page Table Entry (PTE) that corresponds to a single line in the sequence is assumed to be done using a Store instruction for which the access is atomic. Appropriate modifications must be made to these sequences if this assumption is not satisfied (e.g., if a store doubleword operation is done using two Store Word instructions).

Stores are not performed out-of-order, as described in Section 5.5, "Performing Operations Out-of-Order" on page 420. Moreover, address translations associated with instructions preceding the corresponding Store instructions are not performed again after the stores have been performed. (These address translations must have been performed before the store was determined to be required by the sequential execution model, because they might have caused an exception.) As a result, an update to a PTE need not be preceded by a context synchronizing operation.

All of the sequences require a context synchronizing operation after the sequence if the new contents of the PTE are to be used for address translations associated with subsequent instructions.

As noted in the description of the Synchronize instruction in Section 3.3.3 of Book II, address translation associated with instructions which occur in program order subsequent to the Synchronize (and this includes the ptesync variant) may actually be performed prior to the completion of the Synchronize. To ensure that these instructions and data which may have been speculatively fetched are discarded, a context synchronizing operation is required.

Programming Note

  In many cases this context synchronization will occur naturally; for example, if the sequence is executed within an interrupt handler the rfid or hrfid instruction that returns from the interrupt handler may provide the required context synchronization.

Page Table Entries must not be changed in a manner that causes an implicit branch.

5.10.1 Page Table Updates

TLBs are non-coherent caches of the HTAB. TLB entries must be invalidated explicitly with one of the TLB Invalidate instructions.

Unsynchronized lookups in the HTAB continue even while it is being modified. Any processor, including a processor on which software is modifying the HTAB, may look in the HTAB at any time in an attempt to translate a virtual address. When modifying a PTE, software must ensure that the PTE's Valid bit is 0 if the PTE is inconsistent (e.g., if the RPN field is not correct for the current AVPN field).

Updates of Reference and Change bits by the processor are not synchronized with the accesses that cause the updates. When modifying doubleword 1 of a PTE, software must take care to avoid overwriting a processor update of these bits and to avoid having the value written by a Store instruction overwritten by a processor update.

Before permitting one or more tlbie instructions to be executed on a given processor in a given partition software must ensure that no other processor will execute a "conflicting instruction" until after the following sequence of instructions has been executed on the given processor.

    the tlbie instruction(s)
    eieio
    tlbsync
    ptesync

The "conflicting instructions" in this case are the following.

• a tlbie or tlbsync instruction, if executed on another processor in the given partition
• an mtspr instruction that modifies the LPIDR, if the modification has either of the following properties.
  - The old LPID value (i.e., the contents of the LPIDR just before the mtspr instruction is executed) is the value that identifies the given partition
  - The new LPID value (i.e., the value specified by the mtspr instruction) is the value that identifies the given partition

Other instructions (excluding mtspr instructions that modify the LPIDR as described above, and excluding tlbie instructions except as shown) may be interleaved with the instruction sequence shown above, but the instructions in the sequence must appear in the order shown. On uniprocessor systems, the eieio and tlbsync instructions can be omitted. Other instructions may be interleaved with this sequence of instructions, but these instructions must appear in the order shown.

Programming Note

  The eieio instruction prevents the reordering of tlbie instructions previously executed by the processor with respect to the subsequent tlbsync instruction. The tlbsync instruction and the subsequent ptesync instruction together ensure that all storage accesses for which the address was translated using the translations being invalidated, and all Reference and Change bit updates associated with address translations that were performed using the translations being invalidated, will be performed with respect to any processor or mechanism, to the extent required by the associated Memory Coherence Required attributes, before any data accesses caused by instructions following the ptesync instruction are performed with respect to that processor or mechanism.

The requirements specified above for tlbie instructions apply also to tlbsync instructions, except that the "sequence of instructions" consists solely of the tlbsync instruction(s) followed by a ptesync instruction.

Before permitting an mtspr instruction that modifies the LPIDR to be executed on a given processor, software must ensure that no other processor will execute a "conflicting instruction" until after the mtspr instruction followed by a context synchronizing instruction have been executed on the given processor (a context synchronizing event can be used instead of the context synchronizing instruction; see Chapter 10).

The "conflicting instructions" in this case are the following.

• a tlbie or tlbsync instruction, if executed on a processor in either of the following partitions
  - the partition identified by the old LPID value
  - the partition identified by the new LPID value

Programming Note

  The restrictions specified above regarding modifying the LPIDR apply even on uniprocessor systems, and even if the new LPID value is equal to the old LPID value.

Similarly, when a tlbsync instruction has been executed by a processor in a given partition, a ptesync instruction must be executed by that processor before a tlbie or tlbsync instruction is executed by another processor in that partition.

The sequences of operations shown in the following subsections assume a multiprocessor environment. In a uniprocessor environment the tlbsync must be omitted, and the eieio that separates the tlbie from the tlbsync can be omitted. In a multiprocessor environment, when tlbiel is used instead of tlbie in a Page Table update, the synchronization requirements are the same as when tlbie is used in a uniprocessor environment.

Programming Note

  For all of the sequences shown in the following subsections, if it is necessary to communicate completion of the sequence to software running on another processor, the ptesync instruction at the end of the sequence should be followed by a Store instruction that stores a chosen value to some chosen storage location X. The memory barrier created by the ptesync instruction ensures that if a Load instruction executed by another processor returns the chosen value from location X, the sequence's stores to the Page Table have been performed with respect to that other processor. The Load instruction that returns the chosen value should be followed by a context synchronizing instruction in order to ensure that all instructions following the context synchronizing instruction will be fetched and executed using the values stored by the sequence (or values stored subsequently). (These instructions may have been fetched or executed out-of-order using the old contents of the PTE.)

  This Note assumes that the Page Table and location X are in storage that is Memory Coherence Required.

5.10.1.1 Adding a Page Table Entry

This is the simplest Page Table case. The Valid bit of the old entry is assumed to be 0. The following sequence can be used to create a PTE, maintain a consistent state, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes.

    PTE_ARPN,LP,AC,R,C,WIMG,N,PP ← new values
    eieio    /* order 1st update before 2nd */
    PTE_B,AVPN,SW,L,H,V ← new values (V=1)
    ptesync  /* order updates before next
                Page Table search and before
                next data access. */

5.10.1.2 Modifying a Page Table Entry Resetting the Reference Bit If the only change being made to a valid entry is to set General Case the Reference bit to 0, a simpler sequence suffices If a valid entry is to be modified and the translation because the Reference bit need not be maintained instantiated by the entry being modified is to be invali- exactly.
dated, the following sequence can be used to modify the PTE, maintain a consistent state, ensure that the oldR 1 PTER /* get old R */ if oldR = 1 then translation instantiated by the old entry is no longer PTER 1 0 /* store byte (R=0, other bits available, and ensure that a subsequent reference to unchanged) */ the virtual address translated by the new entry will use tlbie(B,VA14:77-p,L,LP,AP) /* invalidate entry */ the correct real address and associated attributes. (The eieio /* order tlbie before tlbsync */ sequence is equivalent to deleting the PTE and then tlbsync /* order tlbie before ptesync */ adding a new one; see Sections 5.10.1.1 and 5.10.1.3.) ptesync /* order tlbie, tlbsync, and update before next Page Table search PTEV 1 0 /* (other fields don't matter)*/ and before next data access */ ptesync /* order update before tlbie and before next Page Table search */ Modifying the SW field tlbie(old_B,old_VA14:77-p,old_L,old_LP,old_AP) /*invalidate old translation*/ If the only change being made to a valid entry is to eieio /* order tlbie before tlbsync */ modify the SW field, the following sequence suffices, tlbsync /* order tlbie before ptesync */ because the SW field is not used by the processor and ptesync /* order tlbie, tlbsync and 1st doubleword 0 of the PTE is not modified by the proces- update before 2nd update */ sor. PTEARPN,LP,AC,R,C,WIMG,N,PP 1 new values eieio /* order 2nd update before 3rd */ loop: ldarx r1 1 PTE_dwd_0 /* load dwd 0 of PTE */ PTEB,AVPN,SW,L,H,V 1 new values (V=1) r157:60 1 new SW value /* replace SW, in r1 */ ptesync /* order 2nd and 3rd updates before stdcx. PTE_dwd_0 12r1 /* store dwd 0 of PTE next Page Table search and if still reserved (new SW value, other before next data access */ fields unchanged) */ bne- loop /* loop if lost reservation */ A lwarx/stwcx. pair (specifying the low-order word of doubleword 0 of the PTE) can be used instead of the ldarx /stdcx. pair shown above. 
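The register update in the SW-field sequence (r1 bits 57:60 replaced, all other bits kept) can be sketched in Python as a minimal model of the arithmetic only. The sketch assumes the bit numbering used in this Book, in which bit 0 is the most significant bit of the doubleword. The helper names ibm_field_mask and replace_sw are hypothetical, introduced here for illustration; the reservation, the conditional store, and the retry loop are not modeled.

```python
MASK64 = (1 << 64) - 1  # a PTE doubleword is 64 bits wide


def ibm_field_mask(first: int, last: int, width: int = 64) -> int:
    """Mask for bits first:last, where bit 0 is the most significant bit."""
    if not 0 <= first <= last < width:
        raise ValueError("bit range out of bounds")
    nbits = last - first + 1
    shift = width - 1 - last  # distance of the field from the low-order end
    return ((1 << nbits) - 1) << shift


def replace_sw(dwd0: int, new_sw: int) -> int:
    """Return doubleword 0 with bits 57:60 (SW) replaced; other bits unchanged.

    Hypothetical helper: models only the step performed on r1 between the
    ldarx and the stdcx. in the sequence.
    """
    if not 0 <= new_sw <= 0xF:
        raise ValueError("SW is a 4-bit field")
    mask = ibm_field_mask(57, 60)
    shift = 63 - 60  # low-order end of the SW field
    return ((dwd0 & ~mask) | (new_sw << shift)) & MASK64
```

Under this numbering the SW mask is 0x78, so the three low-order bits of the doubleword and everything above bit 57 pass through unchanged, which is the property the sequence relies on when it stores the whole doubleword back.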
Modifying the Virtual Address

If the virtual address translated by a valid PTE is to be modified and the new virtual address hashes to the same two PTEGs as does the old virtual address, the following sequence can be used to modify the PTE, maintain a consistent state, ensure that the translation instantiated by the old entry is no longer available, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes.

    PTE_AVPN,SW,L,H,V ← new values (V=1)
    ptesync                        /* order update before tlbie and
                                      before next Page Table search */
    tlbie(old_B,old_VA_14:77-p,old_L,old_LP,old_AP)
                                   /* invalidate old translation */
    eieio                          /* order tlbie before tlbsync */
    tlbsync                        /* order tlbie before ptesync */
    ptesync                        /* order tlbie, tlbsync, and update
                                      before next data access */

5.10.1.3 Deleting a Page Table Entry

The following sequence can be used to ensure that the translation instantiated by an existing entry is no longer available.

    PTE_V ← 0                      /* (other fields don't matter) */
    ptesync                        /* order update before tlbie and
                                      before next Page Table search */
    tlbie(old_B,old_VA_14:77-p,old_L,old_LP,old_AP)
                                   /* invalidate old translation */
    eieio                          /* order tlbie before tlbsync */
    tlbsync                        /* order tlbie before ptesync */
    ptesync                        /* order tlbie, tlbsync, and update
                                      before next data access */

Chapter 6. Interrupts

6.1 Overview
6.2 Interrupt Registers
  6.2.1 Machine Status Save/Restore Registers
  6.2.2 Hypervisor Machine Status Save/Restore Registers
  6.2.3 Data Address Register
  6.2.4 Hypervisor Data Address Register
  6.2.5 Data Storage Interrupt Status Register
  6.2.6 Hypervisor Data Storage Interrupt Status Register
6.3 Interrupt Synchronization
6.4 Interrupt Classes
  6.4.1 Precise Interrupt
  6.4.2 Imprecise Interrupt
  6.4.3 Interrupt Processing
  6.4.4 Implicit alteration of HSRR0 and HSRR1
6.5 Interrupt Definitions
  6.5.1 System Reset Interrupt
  6.5.2 Machine Check Interrupt
  6.5.3 Data Storage Interrupt
  6.5.4 Data Segment Interrupt
  6.5.5 Instruction Storage Interrupt
  6.5.6 Instruction Segment Interrupt
  6.5.7 External Interrupt
  6.5.8 Alignment Interrupt
  6.5.9 Program Interrupt
  6.5.10 Floating-Point Unavailable Interrupt
  6.5.11 Decrementer Interrupt
  6.5.12 Hypervisor Decrementer Interrupt
  6.5.13 System Call Interrupt
  6.5.14 Trace Interrupt [Category: Trace]
  6.5.15 Hypervisor Data Storage Interrupt
  6.5.16 Hypervisor Instruction Storage Interrupt
  6.5.17 Hypervisor Data Segment Interrupt
  6.5.18 Hypervisor Instruction Segment Interrupt
  6.5.19 Performance Monitor Interrupt [Category: Server.Performance Monitor]
  6.5.20 Vector Unavailable Interrupt [Category: Vector]
6.6 Partially Executed Instructions
6.7 Exception Ordering
  6.7.1 Unordered Exceptions
  6.7.2 Ordered Exceptions
6.8 Interrupt Priorities

6.1 Overview

The Power ISA provides an interrupt mechanism to allow the processor to change state as a result of external signals, errors, or unusual conditions arising in the execution of instructions.

System Reset and Machine Check interrupts are not ordered. All other interrupts are ordered such that only one interrupt is reported, and when it is processed (taken) no program state is lost. Since Save/Restore Registers SRR0 and SRR1 are serially reusable resources used by most interrupts, program state may be lost when an unordered interrupt is taken.

6.2 Interrupt Registers

6.2.1 Machine Status Save/Restore Registers

When various interrupts occur, the state of the machine is saved in the Machine Status Save/Restore registers (SRR0 and SRR1). Section 6.5 describes which registers are altered by each interrupt.

    Register   Bits
    SRR0       0:61 (bits 62:63 shown as //)
    SRR1       0:63

Figure 31. Save/Restore Registers

SRR1 bits may be treated as reserved in a given implementation if they correspond to MSR bits that are reserved or are treated as reserved in that implementation or, for SRR1 bits in the range 33:36 and 42:47, they are specified as being set either to 0 or to an undefined value for all interrupts that set SRR1 (including implementation-dependent setting, e.g. by the Machine Check interrupt or by implementation-specific interrupts).

6.2.2 Hypervisor Machine Status Save/Restore Registers

When various interrupts occur, the state of the machine is saved in the Hypervisor Machine Status Save/Restore registers (HSRR0 and HSRR1). Section 6.5 describes which registers are altered by each interrupt.

    Register   Bits
    HSRR0      0:61 (bits 62:63 shown as //)
    HSRR1      0:63

Figure 32. Hypervisor Save/Restore Registers

HSRR1 bits may be treated as reserved in a given implementation if they correspond to MSR bits that are reserved or are treated as reserved in that implementation or, for HSRR1 bits in the range 33:36 and 42:47, they are specified as being set either to 0 or to an undefined value for all interrupts that set HSRR1 (including implementation-dependent setting, e.g. by implementation-specific interrupts).

The HSRR0 and HSRR1 are hypervisor resources; see Chapter 2.

Programming Note
Execution of some instructions, and fetching instructions when MSR_IR=1, may have the side effect of modifying HSRR0 and HSRR1; see Section 6.4.4.

6.2.3 Data Address Register

The Data Address Register (DAR) is a 64-bit register that is set by the Machine Check, Data Storage, Data Segment, and Alignment interrupts; see Sections 6.5.2, 6.5.3, 6.5.4, and 6.5.8. In general, when one of these interrupts occurs the DAR is set to an effective address associated with the storage access that caused the interrupt, with the high-order 32 bits of the DAR set to 0 if the interrupt occurs in 32-bit mode.

    Register   Bits
    DAR        0:63

Figure 33. Data Address Register

6.2.4 Hypervisor Data Address Register

The Hypervisor Data Address Register (HDAR) is a 64-bit register that is set by the Hypervisor Data Storage and Hypervisor Data Segment interrupts; see Section 6.5.15 and Section 6.5.17. In general, when one of these interrupts occurs the HDAR is set to an effective address associated with the storage access that caused the interrupt, with the high-order 32 bits of the HDAR set to 0 if the interrupt occurs in 32-bit mode.

    Register   Bits
    HDAR       0:63

Figure 34. Hypervisor Data Address Register

6.2.5 Data Storage Interrupt Status Register

The Data Storage Interrupt Status Register (DSISR) is a 32-bit register that is set by the Machine Check, Data Storage, Data Segment, and Alignment interrupts; see Sections 6.5.2, 6.5.3, 6.5.4, and 6.5.8. In general, when one of these interrupts occurs the DSISR is set to indicate the cause of the interrupt.

    Register   Bits
    DSISR      32:63

Figure 35. Data Storage Interrupt Status Register

DSISR bits may be treated as reserved in a given implementation if they are specified as being set either to 0 or to an undefined value for all interrupts that set the DSISR (including implementation-dependent setting, e.g. by the Machine Check interrupt or by implementation-specific interrupts).

6.2.6 Hypervisor Data Storage Interrupt Status Register

The Hypervisor Data Storage Interrupt Status Register (HDSISR) is a 32-bit register that is set by the Hypervisor Data Storage interrupt. In general, when one of these interrupts occurs the HDSISR is set to indicate the cause of the interrupt.

    Register   Bits
    HDSISR     32:63

Figure 36. Hypervisor Data Storage Interrupt Status Register

6.3 Interrupt Synchronization

When an interrupt occurs, SRR0 or HSRR0 is set to point to an instruction such that all preceding instructions have completed execution, no subsequent instruction has begun execution, and the instruction addressed by SRR0 or HSRR0 may or may not have completed execution, depending on the interrupt type.

With the exception of System Reset and Machine Check interrupts, all interrupts are context synchronizing as defined in Section 1.5.1. System Reset and Machine Check interrupts are context synchronizing if they are recoverable (i.e., if bit 62 of SRR1 is set to 1 by the interrupt). If a System Reset or Machine Check interrupt is not recoverable (i.e., if bit 62 of SRR1 is set to 0 by the interrupt), it acts like a context synchronizing operation with respect to subsequent instructions. That is, a non-recoverable System Reset or Machine Check interrupt need not satisfy items 1 through 3 of Section 1.5.1, but does satisfy items 4 and 5.

6.4 Interrupt Classes

Interrupts are classified by whether they are directly caused by the execution of an instruction or are caused by some other system exception. Those that are "system-caused" are:

• System Reset
• Machine Check
• External
• Decrementer
• Hypervisor Decrementer

External, Decrementer, and Hypervisor Decrementer interrupts are maskable interrupts. Therefore, software may delay the generation of these interrupts. System Reset and Machine Check interrupts are not maskable.

"Instruction-caused" interrupts are further divided into two classes, precise and imprecise.

6.4.1 Precise Interrupt

Except for the Imprecise Mode Floating-Point Enabled Exception type Program interrupt, all instruction-caused interrupts are precise. When the fetching or execution of an instruction causes a precise interrupt, the following conditions exist at the interrupt point.

1. SRR0 addresses either the instruction causing the exception or the immediately following instruction. Which instruction is addressed can be determined from the interrupt type and status bits.
2. An interrupt is generated such that all instructions preceding the instruction causing the exception appear to have completed with respect to the executing processor.
3. The instruction causing the exception may appear not to have begun execution (except for causing the exception), may have been partially executed, or may have completed, depending on the interrupt type.
4. Architecturally, no subsequent instruction has begun execution.

6.4.2 Imprecise Interrupt

This architecture defines one imprecise interrupt, the Imprecise Mode Floating-Point Enabled Exception type Program interrupt.

When an Imprecise Mode Floating-Point Enabled Exception type Program interrupt occurs, the following conditions exist at the interrupt point.

1. SRR0 addresses either the instruction causing the exception or some instruction following that instruction; see Section 6.5.9, "Program Interrupt".
2. An interrupt is generated such that all instructions preceding the instruction addressed by SRR0 appear to have completed with respect to the executing processor.
3. The instruction addressed by SRR0 may appear not to have begun execution (except, in some cases, for causing the interrupt to occur), may have been partially executed, or may have completed; see Section 6.5.9.
4. No instruction following the instruction addressed by SRR0 appears to have begun execution.

All Floating-Point Enabled Exception type Program interrupts are maskable using the MSR bits FE0 and FE1. Although these interrupts are maskable, they differ significantly from the other maskable interrupts in that the masking of these interrupts is usually controlled by the application program, whereas the masking of all other maskable interrupts is controlled by either the operating system or the hypervisor.

6.4.3 Interrupt Processing

Associated with each kind of interrupt is an interrupt vector, which contains the initial sequence of instructions that is executed when the corresponding interrupt occurs.

Interrupt processing consists of saving a small part of the processor's state in certain registers, identifying the cause of the interrupt in other registers, and continuing execution at the corresponding interrupt vector location.

When an exception exists that will cause an interrupt to be generated and it has been determined that the interrupt will occur, the following actions are performed. The handling of Machine Check interrupts (see Section 6.5.2) differs from the description given below in several respects.

1. SRR0 or HSRR0 is loaded with an instruction address that depends on the type of interrupt; see the specific interrupt description for details.
2. Bits 33:36 and 42:47 of SRR1 or HSRR1 are loaded with information specific to the interrupt type.
3. Bits 0:32, 37:41, and 48:63 of SRR1 or HSRR1 are loaded with a copy of the corresponding bits of the MSR.
4. The MSR is set as shown in Figure 37.
   In particular, MSR bits IR and DR are set to 0, disabling relocation, and MSR bit SF is set to 1, selecting 64-bit mode. The new values take effect beginning with the first instruction executed following the interrupt.
5. Instruction fetch and execution resumes, using the new MSR value, at the effective address specific to the interrupt type. These effective addresses are shown in Figure 38.

Interrupts do not clear reservations obtained with lwarx or ldarx.

Programming Note
In general, when an interrupt occurs, the following instructions should be executed by the operating system before dispatching a "new" program.

• stwcx. or stdcx., to clear the reservation if one is outstanding, to ensure that a lwarx or ldarx in the interrupted program is not paired with a stwcx. or stdcx. in the "new" program.
• sync, to ensure that all storage accesses caused by the interrupted program will be performed with respect to another processor before the program is resumed on that other processor.
• isync or rfid, to ensure that the instructions in the "new" program execute in the "new" context.

Programming Note
For instruction-caused interrupts, in some cases it may be desirable for the operating system to emulate the instruction that caused the interrupt, while in other cases it may be desirable for the operating system not to emulate the instruction. The following list, while not complete, illustrates criteria by which decisions regarding emulation should be made. The list applies to general execution environments; it does not necessarily apply to special environments such as program debugging, processor bring-up, etc.

In general, the instruction should be emulated if:

- The interrupt is caused by a condition for which the instruction description (including related material such as the introduction to the section describing the instruction) implies that the instruction works correctly. Example: Alignment interrupt caused by lmw for which the storage operand is not aligned, or by dcbz for which the storage operand is in storage that is Write Through Required or Caching Inhibited.
- The instruction is an illegal instruction that should appear, to the program executing it, as if it were supported by the implementation. Example: Illegal Instruction type Program interrupt caused by an instruction that has been phased out of the architecture but is still used by some programs that the operating system supports, or by an instruction that is in a category that the implementation does not support but is used by some programs that the operating system supports.

In general, the instruction should not be emulated if:

- The purpose of the instruction is to cause an interrupt. Example: System Call interrupt caused by sc.
- The interrupt is caused by a condition that is stated, in the instruction description, potentially to cause the interrupt. Example: Alignment interrupt caused by lwarx for which the storage operand is not aligned.
- The program is attempting to perform a function that it should not be permitted to perform. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should not be permitted to access. (If the function is one that the program should be permitted to perform, the conditions that caused the interrupt should be corrected and the program re-dispatched such that the instruction will be re-executed. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should be permitted to access but for which there currently is no PTE that satisfies the Page Table search.)

Programming Note
If a program modifies an instruction that it or another program will subsequently execute and the execution of the instruction causes an interrupt, the state of storage and the content of some processor registers may appear to be inconsistent to the interrupt handler program. For example, this could be the result of one program executing an instruction that causes an Illegal Instruction type Program interrupt just before another instance of the same program stores an Add Immediate instruction in that storage location. To the interrupt handler code, it would appear that a processor generated the Program interrupt as the result of executing a valid instruction.

Programming Note
In order to handle Machine Check and System Reset interrupts correctly, the operating system should manage MSR_RI as follows.

• In the Machine Check and System Reset interrupt handlers, interpret SRR1 bit 62 (where MSR_RI is placed) as:
  - 0: interrupt is not recoverable
  - 1: interrupt is recoverable
• In each interrupt handler, when enough state has been saved that a Machine Check or System Reset interrupt can be recovered from, set MSR_RI to 1.
• In each interrupt handler, do the following (in order) just before returning.
  1. Set MSR_RI to 0.
  2. Set SRR0 and SRR1 to the values to be used by rfid. The new value of SRR1 should have bit 62 set to 1 (which will happen naturally if SRR1 is restored to the value saved there by the interrupt, because the interrupt handler will not be executing this sequence unless the interrupt is recoverable).
  3. Execute rfid.

For interrupts that set the SRRs other than Machine Check or System Reset, MSR_RI can be managed similarly when these interrupts occur within interrupt handlers for other interrupts that set the SRRs.

This Note does not apply to interrupts that set the HSRRs because these interrupts put the processor into hypervisor state, and either do not occur or can be prevented from occurring within interrupt handlers for other interrupts that set the HSRRs.

6.4.4 Implicit alteration of HSRR0 and HSRR1

Executing some of the more complex instructions may have the side effect of altering the contents of HSRR0 and HSRR1. The instructions listed below are guaranteed not to have this side effect. Any omission of instruction suffixes is significant; e.g., add is listed but add. is excluded.

1. Branch instructions
   b[l][a], bc[l][a], bclr[l], bcctr[l]
2. Fixed-Point Load and Store Instructions
   lbz, lbzx, lhz, lhzx, lwz, lwzx, ld<64>, ldx<64>, stb, stbx, sth, sthx, stw, stwx, std<64>, stdx<64>
   Execution of these instructions is guaranteed not to have the side effect of altering HSRR0 and HSRR1 only if the storage operand is aligned and MSR_DR=0.
3. Arithmetic instructions
   addi, addis, add, subf, neg
4. Compare instructions
   cmpi, cmp, cmpli, cmpl
5. Logical and Extend Sign instructions
   ori, oris, xori, xoris, and, or, xor, nand, nor, eqv, andc, orc, extsb, extsh, extsw
6. Rotate and Shift instructions
   rldicl<64>, rldicr<64>, rldic<64>, rlwinm, rldcl<64>, rldcr<64>, rlwnm, rldimi<64>, rlwimi, sld<64>, slw, srd<64>, srw
7. Other instructions
   isync
   rfid, hrfid
   mtspr, mfspr, mtmsrd, mfmsr

Programming Note
Instructions excluded from the list include the following.

• instructions that set or use XER_CA
• instructions that set XER_OV or XER_SO
• andi., andis., and fixed-point instructions with Rc=1 (Fixed-point instructions with Rc=1 can be replaced by the corresponding instruction with Rc=0 followed by a Compare instruction.)
• all floating-point instructions
• mftb

These instructions, and the other excluded instructions, may be implemented with the assistance of implementation-specific interrupts that modify HSRR0 and HSRR1. The included instructions are guaranteed not to be implemented thus. (The included instructions are sufficiently simple as to be unlikely to need such assistance. Moreover, they are likely to be needed in interrupt handlers before HSRR0 and HSRR1 have been saved or after HSRR0 and HSRR1 have been restored.)

Similarly, fetching instructions may have the side effect of altering the contents of HSRR0 and HSRR1 unless MSR_IR=0.

6.5 Interrupt Definitions

Figure 37 shows all the types of interrupts and the values assigned to the MSR for each. Figure 38 shows the effective address of the interrupt vector for each interrupt type. (Section 5.7.4 summarizes all architecturally defined uses of effective addresses, including those implied by Figure 38.)

    Interrupt Type            MSR Bit
                              IR  DR  FE0 FE1 EE  RI  ME  HV
    System Reset              0   0   0   0   0   0   -   1
    Machine Check             0   0   0   0   0   0   0   1
    Data Storage              0   0   0   0   0   0   -   m
    Data Segment              0   0   0   0   0   0   -   m
    Instruction Storage       0   0   0   0   0   0   -   m
    Instruction Segment       0   0   0   0   0   0   -   m
    External                  0   0   0   0   0   0   -   e
    Alignment                 0   0   0   0   0   0   -   m
    Program                   0   0   0   0   0   0   -   m
    FP Unavailable            0   0   0   0   0   0   -   m
    Decrementer               0   0   0   0   0   0   -   m
    Hypervisor Decrem'er      0   0   0   0   0   -   -   1
    System Call               0   0   0   0   0   0   -   s
    Trace                     0   0   0   0   0   0   -   m
    Hypervisor Data Stg.      0   0   0   0   0   -   -   1
    Hypervisor Instr. Stg.    0   0   0   0   0   -   -   1
    Hypervisor Instr. Seg.    0   0   0   0   0   -   -   1
    Hypervisor Data Seg.      0   0   0   0   0   -   -   1
    Performance Monitor       0   0   0   0   0   0   -   m
    Vector Unavailable (1)    0   0   0   0   0   0   -   m

    0  bit is set to 0
    1  bit is set to 1
    -  bit is not altered
    m  if LPES_1=0, set to 1; otherwise not altered
    e  if LPES_0=0, set to 1; otherwise not altered
    s  if LEV=1 or LPES/LPES_1=0, set to 1; otherwise not altered

    Settings for Other Bits
    Bits BE, FP, PMM, PR, SE, and VEC (1) are set to 0.
    If the interrupt results in HV being equal to 1, the LE bit is copied from the HILE bit; otherwise the LE bit is copied from the LPCR_ILE bit.
    The SF bit is set to 1.
    Reserved bits are set as if written as 0.

    (1) Category: Vector

Figure 37. MSR setting due to interrupt

    Effective Address (1)   Interrupt Type
    00..0000_0100           System Reset
    00..0000_0200           Machine Check
    00..0000_0300           Data Storage
    00..0000_0380           Data Segment
    00..0000_0400           Instruction Storage
    00..0000_0480           Instruction Segment
    00..0000_0500           External
    00..0000_0600           Alignment
    00..0000_0700           Program
    00..0000_0800           Floating-Point Unavailable
    00..0000_0900           Decrementer
    00..0000_0980           Hypervisor Decrementer
    00..0000_0A00           Reserved
    00..0000_0B00           Reserved
    00..0000_0C00           System Call
    00..0000_0D00           Trace
    00..0000_0E00           Hypervisor Data Storage
    00..0000_0E10           Hypervisor Instruction Storage
    00..0000_0E20           Hypervisor Data Segment
    00..0000_0E30           Hypervisor Instruction Segment
    00..0000_0E40           Reserved
    . . .                   ...
    00..0000_0EFF           Reserved
    00..0000_0F00           Performance Monitor
    00..0000_0F10           Reserved
    00..0000_0F20           Vector Unavailable (3)
    00..0000_0F30           Reserved
    . . .                   ...
    00..0000_0FFF           Reserved

    (1) The values in the Effective Address column are interpreted as follows.
        • 00...0000_nnnn means 0x0000_0000_0000_nnnn
    (2) Effective addresses 0x0000_0000_0000_0000 through 0x0000_0000_0000_00FF are used by software and will not be assigned as interrupt vectors.
    (3) Category: Vector.

Figure 38. Effective address of interrupt vector by interrupt type

Programming Note
When address translation is disabled, use of any of the effective addresses that are shown as reserved in Figure 38 risks incompatibility with future implementations.

6.5.1 System Reset Interrupt

If a System Reset exception causes an interrupt that is not context synchronizing or causes the loss of a Machine Check exception or an External exception, or if the state of the processor has been corrupted, the interrupt is not recoverable.

The following registers are set:

    SRR0     Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
    SRR1
      33:36  Set to 0.
      42:44  Set to an implementation-dependent value.
      45:47  Set to 0.
      62     Loaded from bit 62 of the MSR if the processor is in a recoverable state; otherwise set to 0.
      Others Loaded from the MSR.
    MSR      See Figure 37.

Execution resumes at effective address 0x0000_0000_0000_0100.

Each implementation provides an implementation-dependent means for software to distinguish power-on Reset from other types of System Reset.

6.5.2 Machine Check Interrupt

The causes of Machine Check interrupts are implementation-dependent. For example, a Machine Check interrupt may be caused by a reference to a storage location that contains an uncorrectable error or does not exist (see Section 5.6), or by an error in the storage subsystem.

Machine Check interrupts are enabled when MSR_ME=1. If MSR_ME=0 and a Machine Check occurs, the processor enters the Checkstop state. The Checkstop state may also be entered if an access is attempted to a storage location that does not exist (see Section 5.6).

Disabled Machine Check (Checkstop State)

When a processor is in Checkstop state, instruction processing is suspended and generally cannot be restarted without resetting the processor. Some implementations may preserve some or all of the internal state of the processor when entering Checkstop state, so that the state can be analyzed as an aid in problem determination.

Enabled Machine Check

If a Machine Check exception causes an interrupt that is not context synchronizing or causes the loss of an External exception, or if the state of the processor has been corrupted, the interrupt is not recoverable.

In some systems, the operating system may attempt to identify and log the cause of the Machine Check.

The following registers are set:

    SRR0     Set on a "best effort" basis to the effective address of some instruction that was executing or was about to be executed when the Machine Check exception occurred. The details are implementation-dependent.
    SRR1
      62     Loaded from bit 62 of the MSR if the processor is in a recoverable state; otherwise set to 0.
      Others Set to an implementation-dependent value.
    MSR      See Figure 37.
    DSISR    Set to an implementation-dependent value.
    DAR      Set to an implementation-dependent value.

Execution resumes at effective address 0x0000_0000_0000_0200.

Programming Note
If a Machine Check interrupt is caused by an error in the storage subsystem, the storage subsystem may return incorrect data, which may be placed into registers. This corruption of register contents may occur even if the interrupt is recoverable.

6.5.3 Data Storage Interrupt

A Data Storage interrupt occurs when no higher priority exception exists, the value of the expression

    (MSR_HV PR = 0b10) | (¬VPM_0 & ¬MSR_DR) | (¬VPM_1 & MSR_DR)

is 1, and a data access cannot be performed for any of the following reasons.

• Data address translation is enabled (MSR_DR=1) and the virtual address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction cannot be translated to a real address.
• The effective address specified by a lq, stq, lwarx, ldarx, stwcx., or stdcx. instruction refers to storage that is Write Through Required or Caching Inhibited.
• The access violates storage protection.
• A Data Address Breakpoint match occurs.
• Execution of an eciwx or ecowx instruction is disallowed because EAR_E=0.

If a stwcx. or stdcx. would not perform its store in the absence of a Data Storage interrupt, and either (a) the specified effective address refers to storage that is Write Through Required or Caching Inhibited, or (b) a non-conditional Store to the specified effective address would cause a Data Storage interrupt, it is implementation-dependent whether a Data Storage interrupt occurs.

If the contents of the XER specifies a length of zero bytes for a Move Assist instruction, a Data Storage interrupt does not occur for reasons of address translation, or storage protection. If such an instruction causes a Data Storage interrupt for other reasons, the setting of the DSISR and DAR reflects only these other reasons listed in the preceding sentence. (E.g., if such an instruction causes a storage protection violation and a Data Address Breakpoint match, the DSISR and DAR are set as if the storage protection violation did not occur.)

The following registers are set:

    SRR0     Set to the effective address of the instruction that caused the interrupt.
    SRR1
      33:36  Set to 0.
      42:47  Set to 0.
      Others Loaded from the MSR.
    MSR      See Figure 37.
    DSISR
      32     Set to 0.
      33     Set to 1 if MSR_DR=1 and the translation for an attempted access is not found in the primary PTEG or in the secondary PTEG; otherwise set to 0.
      34:35  Set to 0.

word for which access was attempted in the page that caused the exception.

• a Data Storage exception occurs for reasons other than a Data Address Breakpoint match or, for eciwx and ecowx, EAR_E=0
  - a byte in the block that caused the exception, for a Cache Management instruction
  - a byte in the first aligned doubleword for which access was attempted in the page that caused the exception, for a Load, Store, eciwx, or ecowx instruction ("first" refers to address order; see Section 6.7)
• undefined, for a Data Address Breakpoint match, or if eciwx or ecowx is executed when EAR_E=0

For the cases in which the DAR is specified above to be set to a defined value, if the interrupt occurs in 32-bit mode the high-order 32 bits of the DAR are set to 0.

If multiple Data Storage exceptions occur for a given effective address, any one or more of the bits corresponding to these exceptions may be set to 1 in the DSISR.
36 Set to 1 if the access is not permitted by Figure 24 or 25, as appropriate; otherwise Execution resumes at effective address set to 0. 0x0000_0000_0000_0300. 37 Set to 1 if the access is due to a lq, stq, lwarx, ldarx, stwcx., or stdcx. instruction that addresses storage that is Write Through Required or Caching Inhibited; otherwise set to 0. 6.5.4 Data Segment Interrupt 38 Set to 1 for a Store, dcbz, or ecowx A Data Segment interrupt occurs when no higher prior- instruction; otherwise set to 0. ity exception exists and a data access cannot be per- 39:40 Set to 0. formed because data address translation is enabled 41 Set to 1 if a Data Address Breakpoint and the effective address of any byte of the storage match occurs; otherwise set to 0. location specified by a Load, Store, icbi, dcbz, dcbst, 42 Set to 1 if the access is not permitted by dcbf[l] eciwx, or ecowx instruction cannot be trans- virtual page class key protection; otherwise lated to a virtual address. set to 0. 43 Set to 1 if execution of an eciwx or ecowx If a stwcx. or stdcx. would not perform its store in the instruction is attempted when EARE=0; oth- absence of a Data Segment interrupt, and a non-condi- erwise set to 0. tional Store to the specified effective address would 44:63 Set to 0. cause a Data Segment interrupt, it is implementation- dependent whether a Data Segment interrupt occurs. DAR Set to the effective address of a storage element as described in the following list. If a Move Assist instruction has a length of zero (in the The list should be read from the top down; XER), a Data Segment interrupt does not occur, the DAR is set as described by the first item regardless of the effective address. that corresponds to an exception that is The following registers are set: reported in the DSISR. For example, if a Load instruction causes a storage protec- SRR0 Set to the effective address of the instruc- tion violation and a Data Address Break- tion that caused the interrupt. 
point match (and both are reported in the SRR1 DSISR), the DAR is set to the effective 33:36 Set to 0. address of a byte in the first aligned double- 468 Power ISATM -- Book III-S Version 2.04 42:47 Set to 0. 35 Set to 1 if the access is to No-execute or Others Loaded from the MSR. Guarded storage; otherwise set to 0. 36 Set to 1 if the access is not permitted by MSR See Figure 37. Figure 24, or 25, as appropriate; otherwise DSISR Set to an undefined value. set to 0. DAR Set to the effective address of a storage Programming Note element as described in the following list. Storage protection violations for the 1 a byte in the block that caused the Data Storage Interrupt are reported in Data Segment interrupt, for a Cache DSISR36 and DSISR42, whereas stor- Management instruction age protection violations for the Instruc- 1 a byte in the first aligned doubleword tion Storage Interrupt are reported in for which access was attempted in the SRR135 and SRR136. segment that caused the Data Seg- ment interrupt, for a Load, Store, eciwx, or ecowx instruction ("first" 42:47 Set to 0. refers to address order; see Section 6.7) Others Loaded from the MSR. If the interrupt occurs in 32-bit mode, the MSR See Figure 37. high-order 32 bits of the DAR are set to 0. If multiple Instruction Storage exceptions occur due to Execution resumes at effective address attempting to fetch a single instruction, any one or more 0x0000_0000_0000_0380. of the bits corresponding to these exceptions may be set to 1 in SRR1. Programming Note A Data Segment interrupt occurs if MSRDR=1 and the translation of the effective address of any byte Execution resumes at effective address of the specified storage location is not found in the 0x0000_0000_0000_0400. SLB (or in any implementation-specific address translation lookaside information). 
6.5.6 Instruction Segment Interrupt 6.5.5 Instruction Storage Interrupt An Instruction Segment interrupt occurs when no An Instruction Storage interrupt occurs when no higher higher priority exception exists and the next instruction priority exception exists, the value of the expression to be executed cannot be fetched because instruction address translation is enabled and the effective (MSRHV PR = 0b10)|(¬VPM0 & ¬MSRIR) address cannot be translated to a virtual address. | (¬VPM1 & MSRIR) The following registers are set: is 1, and the next instruction to be executed cannot be SRR0 Set to the effective address of the instruction fetched for any of the following reasons. that the processor would have attempted to execute next if no interrupt conditions were 1 Instruction address translation is enabled and the present (if the interrupt occurs on attempting virtual address cannot be translated to a real to fetch a branch target, SRR0 is set to the address. branch target address). 1 The fetch access violates storage protection. SRR1 The following registers are set: 33:36 Set to 0. SRR0 Set to the effective address of the instruction 42:47 Set to 0. that the processor would have attempted to Others Loaded from the MSR. execute next if no interrupt conditions were MSR See Figure 37 on page 466. present (if the interrupt occurs on attempting to fetch a branch target, SRR0 is set to the Execution resumes at effective address branch target address). 0x0000_0000_0000_0480. SRR1 33 Set to 1 if MSRIR=1 and the translation for an attempted access is not found in the pri- mary PTEG or in the secondary PTEG; oth- erwise set to 0. 34 Set to 0. Chapter 6. Interrupts 469 Version 2.04 If a stwcx. or stdcx. 
would not perform its store in the Programming Note absence of an Alignment interrupt and the specified An Instruction Segment interrupt occurs if effective address refers to storage that is Write Through MSRIR=1 and the translation of the effective Required or Caching Inhibited, it is implementation- address of the next instruction to be executed is not dependent whether an Alignment interrupt occurs. found in the SLB (or in any implementation-specific address translation lookaside information). Setting the DSISR and DAR as described below is optional for implementations on which Alignment inter- rupts occur rarely, if ever, for cases that the Alignment 6.5.7 External Interrupt interrupt handler emulates. For such implementations, if the DSISR and DAR are not set as described below An External interrupt occurs when no higher priority they are set to undefined values. exception exists, an External exception exists, and The following registers are set: MSREE=1. The occurrence of the interrupt does not cause the exception to cease to exist. SRR0 Set to the effective address of the instruction that caused the interrupt. The following registers are set: SRR1 SRR0 Set to the effective address of the instruction 33:36 Set to 0. that the processor would have attempted to 42:47 Set to 0. execute next if no interrupt conditions were Others Loaded from the MSR. present. MSR See Figure 37. SRR1 33:36 Set to 0. DSISR 42:47 Set to 0. 32:43 Set to 0. Others Loaded from the MSR. 44:45 Set to bits 30:31 of the instruction if DS- form. Set to 0b00 if D-, or X-form. MSR See Figure 37. 46 Set to 0. Execution resumes at effective address 47:48 Set to bits 29:30 of the instruction if X-form. 0x0000_0000_0000_0500. Set to 0b00 if D- or DS-form. 49 Set to bit 25 of the instruction if X-form. Set to bit 5 of the instruction if D- or DS-form. 6.5.8 Alignment Interrupt 50:53 Set to bits 21:24 of the instruction if X-form. 
Set to bits 1:4 of the instruction if D- or DS- An Alignment interrupt occurs when no higher priority form. exception exists and a data access cannot be per- 54:58 Set to bits 6:10 of the instruction (RT/RS/ formed for any of the following reasons. FRT/FRS), except undefined for dcbz. 1 The operand of a floating-point Load or Store is not 59:63 Set to bits 11:15 of the instruction (RA) for word-aligned, or crosses a virtual page boundary. update form instructions; set to either bits 11:15 of the instruction or to any register 1 The operand of lq, stq, lmw, stmw, lwarx, ldarx, number not in the range of registers to be stwcx., stdcx., eciwx, or ecowx is not aligned. loaded for a valid form lmw, a valid form 1 The operand of a single-register Load or Store is lswi, or a valid form lswx for which neither not aligned and the processor is in Little-Endian RA nor RB is in the range of registers to be mode. loaded; otherwise undefined. 1 The instruction is lq, stq, lmw, stmw, lswi, lswx, DAR Set to the effective address computed by stswi, or stswx, and the operand is in storage that the instruction, except that if the interrupt is Write Through Required or Caching Inhibited, or occurs in 32-bit mode the high-order 32 bits the processor is in Little-Endian mode. of the DAR are set to 0. 1 The operand of a Load or Store crosses a segment For an X-form Load or Store, it is acceptable for the boundary, or crosses a boundary between virtual processor to set the DSISR to the same value that pages that have different storage control attributes. would have resulted if the corresponding D- or DS-form 1 The operand of a Load or Store is not aligned and instruction had caused the interrupt. Similarly, for a D- is in storage that is Write Through Required or or DS-form Load or Store, it is acceptable for the pro- Caching Inhibited. cessor to set the DSISR to the value that would have resulted for the corresponding X-form instruction. 
For 1 The operand of dcbz, lwarx, ldarx, stwcx., or example, an unaligned lwax (that crosses a protection stdcx. is in storage that is Write Through Required boundary) would normally, following the description or Caching Inhibited. above, cause the DSISR to be set to binary: 470 Power ISATM -- Book III-S Version 2.04 000000000000 00 0 01 0 0101 ttttt ????? An Illegal Instruction type Program interrupt may be generated when execution is attempted of any where "ttttt" denotes the RT field, and "?????" denotes of the following kinds of instruction. an undefined 5-bit value. However, it is acceptable if it causes the DSISR to be set as for lwa, which is 1 an instruction that is in invalid form 000000000000 10 0 00 0 1101 ttttt ????? 1 an lswx instruction for which RA or RB is in the range of registers to be loaded If there is no corresponding alternative form instruction 1 an mtspr or mfspr instruction with an SPR (e.g., for lwaux), the value described above is set in the field that does not contain one of the defined DSISR. values The instruction pairs that may use the same DSISR Privileged Instruction value are. The following applies if the instruction is executed lhz/lhzx lhzu/lhzux lha/lhax lhau/lhaux when MSRPR = 1. lwz/lwzx lwzu/lwzux lwa/lwax A Privileged Instruction type Program interrupt ld/ldx ldu/ldux is generated when execution is attempted of a lsth/sthx sthu/sthux stw/stwx stwu/stwux privileged instruction, or of an mtspr or mfspr std/stdx stdu/stdux instruction with an SPR field that contains one lfs/lfsx lfsu/lfsux lfd/lfdx lfdu/lfdux of the defined values having spr0=1. It may be stfs/stfsx stfsu/stfsux stfd/stfdx stfdu/stfdux generated when execution is attempted of an mtspr or mfspr instruction with an SPR field Execution resumes at effective address that does not contain one of the defined val- 0x0000_0000_0000_0600. ues but has spr0=1. 
Programming Note The following applies if the instruction is executed The architecture does not support the use of an when MSRHV PR = 0b00. unaligned effective address by lwarx, ldarx, A Privileged Instruction type Program interrupt stwcx., stdcx., eciwx, and ecowx. If an Align- may be generated when execution is ment interrupt occurs because one of these instruc- attempted of an mtspr instruction with an tions specifies an unaligned effective address, the SPR field that designates a hypervisor Alignment interrupt handler must not attempt to resource, or when execution of a tlbie, tlbiel, simulate the instruction, but instead should treat tlbia, or tlbsync instruction is attempted. the instruction as a programming error. Programming Note These are the only cases in which a Privi- 6.5.9 Program Interrupt leged Instruction type Program interrupt can be generated when MSRPR=0. They A Program interrupt occurs when no higher priority can be distinguished from other causes of exception exists and one of the following exceptions Privileged Instruction type Program inter- arises during execution of an instruction: rupts by examining SRR149 (the bit in Floating-Point Enabled Exception which MSRPR was saved by the interrupt). A Floating-Point Enabled Exception type Program interrupt is generated when the value of the Trap expression A Trap type Program interrupt is generated when any of the conditions specified in a Trap instruction (MSRFE0 | MSRFE1) & FPSCRFEX is met. is 1. FPSCRFEX is set to 1 by the execution of a The following registers are set: floating-point instruction that causes an enabled exception, including the case of a Move To FPSCR SRR0 For all Program interrupts except a Floating- instruction that causes an exception bit and the Point Enabled Exception type Program inter- corresponding enable bit both to be 1. rupt, set to the effective address of the instruc- tion that caused the corresponding exception. 
Illegal Instruction For a Floating-Point Enabled Exception type An Illegal Instruction type Program interrupt is gen- Program interrupt, set as described in the fol- erated when execution is attempted of an illegal lowing list. instruction, or of a reserved instruction or an - If MSRFE0 FE1 = 0b00, FPSCRFEX = 1, instruction that is not provided by the implementa- and an instruction is executed that tion. changes MSRFE0 FE1 to a nonzero value, Chapter 6. Interrupts 471 Version 2.04 set to the effective address of the instruc- Programming Note tion that the processor would have attempted to execute next if no interrupt SRR147 can be set to 1 only if the conditions were present. exception is a Floating-Point Enabled Exception and either MSRFE0 FE1 = Programming Note 0b01 or 0b10 or MSRFE0 FE1 has just been changed from 0b00 to a nonzero Recall that all instructions that can alter value. (SRR147 is always set to 1 in MSRFE0 FE1 are context synchroniz- the last case.) ing, and therefore are not initiated until all preceding instructions have reported all exceptions they will cause. Others Loaded from the MSR. Only one of bits 43:46 can be set to 1. - If MSRFE0 FE = 0b11, set to the effective address of the instruction that caused the MSR See Figure 37 on page 466. Floating-Point Enabled Exception. Execution resumes at effective address - If MSRFE0 FE = 0b01 or 0b10, set to the 0x0000_0000_0000_0700. effective address of the first instruction that caused a Floating-Point Enabled Exception since the most recent time 6.5.10 Floating-Point Unavailable FPSCRFEX was changed from 1 to 0 or of some subsequent instruction. Interrupt A Floating-Point Unavailable interrupt occurs when no Programming Note higher priority exception exists, an attempt is made to If SRR0 is set to the effective address execute a floating-point instruction (including floating- of a subsequent instruction, that point loads, stores, and moves), and MSRFP=0. 
instruction will not be beyond the first such instruction at which synchroniza- The following registers are set: tion of floating-point instructions SRR0 Set to the effective address of the instruc- occurs. (Recall that such synchroniza- tion that caused the interrupt. tion is caused by Floating-Point Status SRR1 and Control Register instructions, as 33:36 Set to 0. well as by execution synchronizing 42:47 Set to 0. instructions and events.) Others Loaded from the MSR. SRR1 MSR See Figure 37 on page 466. 33:36 Set to 0. Execution resumes at effective address 42 Set to 0. 0x0000_0000_0000_0800. 43 Set to 1 for a Floating-Point Enabled Exception type Program interrupt; other- wise set to 0. 6.5.11 Decrementer Interrupt 44 Set to 1 for an Illegal Instruction type Pro- gram interrupt; otherwise set to 0. A Decrementer interrupt occurs when no higher priority 45 Set to 1 for a Privileged Instruction type exception exists, a Decrementer exception exists, and Program interrupt; otherwise set to 0. MSREE=1. 46 Set to 1 for a Trap type Program interrupt; The following registers are set: otherwise set to 0. 47 Set to 0 if SRR0 contains the address of SRR0 Set to the effective address of the instruc- the instruction causing the exception and tion that the processor would have there is only one such instruction; other- attempted to execute next if no interrupt wise set to 1. conditions were present. SRR1 33:36 Set to 0. 42:47 Set to 0. Others Loaded from the MSR. MSR See Figure 37 on page 466. Execution resumes at effective address 0x0000_0000_0000_0900. 
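
As an illustration of the SRR1 settings given for the Program interrupt in Section 6.5.9, the following Python sketch decodes the cause of a Program interrupt from a saved SRR1 value. It is not part of the architecture. The helper names `ibm_bit` and `decode_program_srr1` are hypothetical, and the sketch assumes the register numbering used throughout this Book, in which bit 0 is the most significant bit of a 64-bit register. Treating anything other than exactly one of bits 43:46 being set as an error is a choice made for this sketch.

```python
# Hypothetical helpers for illustration only; not defined by the architecture.

def ibm_bit(n: int) -> int:
    """Mask for bit n of a 64-bit register, where bit 0 is the most significant bit."""
    if not 0 <= n <= 63:
        raise ValueError("bit number must be in 0..63")
    return 1 << (63 - n)

# SRR1 bits 43:46 identify the type of Program interrupt (Section 6.5.9).
PROGRAM_CAUSES = {
    43: "floating-point enabled exception",
    44: "illegal instruction",
    45: "privileged instruction",
    46: "trap",
}

def decode_program_srr1(srr1: int) -> dict:
    """Summarize the Program interrupt information saved in SRR1."""
    causes = [name for bit, name in PROGRAM_CAUSES.items() if srr1 & ibm_bit(bit)]
    if len(causes) != 1:
        # Sketch assumption: a Program interrupt reports exactly one cause.
        raise ValueError("expected exactly one of SRR1 bits 43:46 to be set")
    return {
        "cause": causes[0],
        # Bit 47 is 0 when SRR0 holds the address of the single causing instruction.
        "srr0_is_causing_instruction": not (srr1 & ibm_bit(47)),
        # Bit 49 is where MSR[PR] was saved by the interrupt.
        "problem_state": bool(srr1 & ibm_bit(49)),
    }
```

For example, `decode_program_srr1(ibm_bit(46) | ibm_bit(49))` describes a Trap type Program interrupt taken from problem state, with SRR0 pointing at the trapping instruction.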
6.5.12 Hypervisor Decrementer Interrupt

A Hypervisor Decrementer interrupt occurs when no higher priority exception exists, a Hypervisor Decrementer exception exists, and the value of the following expression is 1.

    (MSR[EE] | ¬(MSR[HV]) | MSR[PR]) & HDICE

The following registers are set:

HSRR0   Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
HSRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 37.

Execution resumes at effective address 0x0000_0000_0000_0980.

Programming Note
    Because the value of MSR[EE] is always 1 when the processor is in problem state, the simpler expression

        (MSR[EE] | ¬(MSR[HV])) & HDICE

    is equivalent to the expression given above.

6.5.13 System Call Interrupt

A System Call interrupt occurs when a System Call instruction is executed.

The following registers are set:

SRR0    Set to the effective address of the instruction following the System Call instruction.
SRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 37.

Execution resumes at effective address 0x0000_0000_0000_0C00.

Programming Note
    An attempt to execute an sc instruction with LEV=1 in problem state should be treated as a programming error.

6.5.14 Trace Interrupt [Category: Trace]

A Trace interrupt occurs when no higher priority exception exists and either MSR[SE]=1 and any instruction except rfid or hrfid is successfully completed, or MSR[BE]=1 and a Branch instruction is completed. Successful completion means that the instruction caused no other interrupt. Thus a Trace interrupt never occurs for a System Call instruction, or for a Trap instruction that traps. The instruction that causes a Trace interrupt is called the "traced instruction".

When a Trace interrupt occurs, the following registers are set:

SRR0    Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
SRR1
  33:36 and 42:47
          Set to an implementation-dependent value.
  Others  Loaded from the MSR.
MSR     See Figure 37.

Execution resumes at effective address 0x0000_0000_0000_0D00.

Extensions to the Trace facility are described in Appendix C.

Programming Note
    The following instructions are not traced.

    - rfid
    - hrfid
    - sc, and Trap instructions that trap
    - other instructions that cause interrupts (other than Trace interrupts)
    - the first instructions of any interrupt handler
    - instructions that are emulated by software

    In general, interrupt handlers can achieve the effect of tracing these instructions.

6.5.15 Hypervisor Data Storage Interrupt

A Hypervisor Data Storage interrupt occurs when the processor is not in hypervisor state, no higher priority exception exists, the value of the expression

    (VPM0 & ¬MSR[DR]) | (VPM1 & MSR[DR])

is 1, and a data access cannot be performed for any of the following reasons.

- Data address translation is enabled (MSR[DR]=1) and the virtual address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction cannot be translated to a real address.
- Data address translation is disabled (MSR[DR]=0), LPES1=1, and the virtual address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction cannot be translated to a real address by means of the virtual real addressing mechanism.
- The effective address specified by a lwarx, ldarx, stwcx., or stdcx. instruction refers to storage that is Write Through Required or Caching Inhibited.
- The access violates storage protection.
- A Data Address Compare match or a Data Address Breakpoint match occurs.
- Execution of an eciwx or ecowx instruction is disallowed because EAR[E]=0.

If a stwcx. or stdcx. would not perform its store in the absence of a Hypervisor Data Storage interrupt, and either (a) the specified effective address refers to storage that is Write Through Required or Caching Inhibited, or (b) a non-conditional Store to the specified effective address would cause a Hypervisor Data Storage interrupt, it is implementation-dependent whether a Hypervisor Data Storage interrupt occurs.

If the contents of the XER specify a length of zero bytes for a Move Assist instruction, a Hypervisor Data Storage interrupt does not occur for reasons of address translation or storage protection. If such an instruction causes a Hypervisor Data Storage interrupt for other reasons, the setting of the HDSISR and HDAR reflects only those other reasons, not the reasons listed in the preceding sentence. (E.g., if such an instruction causes a storage protection violation and a Data Address Breakpoint match, the HDSISR and HDAR are set as if the storage protection violation did not occur.)

The following registers are set:

HSRR0   Set to the effective address of the instruction that caused the interrupt.
HSRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 37.
HDSISR
  32      Set to 0.
  33      Set to 1 if the value of the expression
              (MSR[DR]) | ((¬MSR[DR] & VPM0) & LPES1)
          is 1 and the translation for an attempted access is not found in the primary PTEG or in the secondary PTEG; otherwise set to 0.
  34:35   Set to 0.
  36      Set to 1 if the access is not permitted by the storage protection mechanism; otherwise set to 0.
  37      Set to 1 if the access is due to a lq, stq, lwarx, ldarx, stwcx., or stdcx. instruction that addresses storage that is Write Through Required or Caching Inhibited; otherwise set to 0.
  38      Set to 1 for a Store, dcbz, or ecowx instruction; otherwise set to 0.
  39:40   Set to 0.
  41      Set to 1 if a Data Address Compare match or a Data Address Breakpoint match occurs; otherwise set to 0.
  42      Set to 0.
  43      Set to 1 if execution of an eciwx or ecowx instruction is attempted when EAR[E]=0; otherwise set to 0.
  44:63   Set to 0.
HDAR    Set to the effective address of a storage element as described in the following list. The list should be read from the top down; the HDAR is set as described by the first item that corresponds to an exception that is reported in the HDSISR. For example, if a Load instruction causes a storage protection violation and a Data Address Breakpoint match (and both are reported in the HDSISR), the HDAR is set to the effective address of a byte in the first aligned doubleword for which access was attempted in the page that caused the exception.
        - a Hypervisor Data Storage exception occurs for reasons other than a Data Address Breakpoint match or, for eciwx and ecowx, EAR[E]=0
            - a byte in the block that caused the exception, for a Cache Management instruction
            - a byte in the first aligned doubleword for which access was attempted in the page that caused the exception, for a Load, Store, eciwx, or ecowx instruction ("first" refers to address order; see Section 6.7)
        - undefined, for a Data Address Breakpoint match, or if eciwx or ecowx is executed when EAR[E]=0
        For the cases in which the HDAR is specified above to be set to a defined value, if the interrupt occurs in 32-bit mode the high-order 32 bits of the HDAR are set to 0.

If multiple Hypervisor Data Storage exceptions occur for a given effective address, any one or more of the bits corresponding to these exceptions may be set to 1 in the HDSISR.

Execution resumes at effective address 0x0000_0000_0000_0E00.

6.5.16 Hypervisor Instruction Storage Interrupt

A Hypervisor Instruction Storage interrupt occurs when the processor is not in hypervisor state, no higher priority exception exists, the value of the expression

    (VPM0 & ¬MSR[IR]) | (VPM1 & MSR[IR])

is 1, and the next instruction to be executed cannot be fetched for any of the following reasons.

- Instruction address translation is enabled (MSR[IR]=1) and the virtual address cannot be translated to a real address.
- Instruction address translation is disabled (MSR[IR]=0), LPES1=1, and the virtual address cannot be translated to a real address by means of the virtual real addressing mechanism.
- The fetch access violates storage protection.

The following registers are set:

HSRR0   Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present (if the interrupt occurs on attempting to fetch a branch target, HSRR0 is set to the branch target address).
HSRR1
  33      Set to 1 if the value of the expression
              (MSR[IR]) | ((¬MSR[IR] & VPM0) & LPES1)
          is 1 and the translation for an attempted access is not found in the primary PTEG or in the secondary PTEG; otherwise set to 0.
  34      Set to 0.
  35      Set to 1 if the access is to No-execute or Guarded storage; otherwise set to 0.
  36      Set to 1 if the access is not permitted by Figure 24; otherwise set to 0.

          Programming Note
              Storage protection violations for the Hypervisor Data Storage Interrupt are reported in HDSISR[36], whereas storage protection violations for the Hypervisor Instruction Storage Interrupt are reported in HSRR1[35] and HSRR1[36].

  42:46   Set to 0.
  47      Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 37.

If multiple Instruction Storage exceptions occur due to attempting to fetch a single instruction, any one or more of the bits corresponding to these exceptions may be set to 1 in HSRR1.

Execution resumes at effective address 0x0000_0000_0000_0E10.

6.5.17 Hypervisor Data Segment Interrupt

A Hypervisor Data Segment interrupt may occur when the processor is not in hypervisor state, data address translation is disabled (MSR[DR]=0), VPM0=1, LPES1=1, no higher priority exception exists, and the effective address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction is beyond the 1 TB VRMA.

If a stwcx. or stdcx. would not perform its store in the absence of a Hypervisor Data Segment interrupt, and a non-conditional Store to the specified effective address would cause a Hypervisor Data Segment interrupt, it is implementation-dependent whether a Hypervisor Data Segment interrupt occurs.

If a Move Assist instruction has a length of zero (in the XER), a Hypervisor Data Segment interrupt does not occur, regardless of the effective address.

The following registers are set:

HSRR0   Set to the effective address of the instruction that caused the interrupt.
HSRR1
  33:36   Set to 0.
  42:47   Set to 0.
  Others  Loaded from the MSR.
MSR     See Figure 37.
HDSISR  Set to an undefined value.
HDAR    Set to the effective address of a storage element as described in the following list.
        - a byte in the block that caused the Hypervisor Data Segment interrupt, for a Cache Management instruction
        - a byte in the first aligned doubleword for which access was attempted in the segment that caused the Hypervisor Data Segment interrupt, for a Load, Store, eciwx, or ecowx instruction ("first" refers to address order; see Section 6.7)

Execution resumes at effective address 0x0000_0000_0000_0E20.

6.5.18 Hypervisor Instruction Segment Interrupt

A Hypervisor Instruction Segment interrupt may occur when the processor is not in hypervisor state, instruction address translation is disabled (MSR[IR]=0), VPM0=1, LPES1=1, no higher priority exception exists, and the effective address of any byte of the instruction is beyond the 1 TB VRMA.
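
The conditions stated above for a Hypervisor Instruction Segment interrupt can be read as a single predicate. The Python sketch below is illustrative only and is not part of the architecture. The function name, its parameters, and the constant `VRMA_LIMIT` are hypothetical. The sketch assumes that hypervisor state corresponds to MSR[HV PR] = 0b10, that the 1 TB VRMA covers effective addresses 0 through 2^40 - 1, and that instructions are 4 bytes long. Because the text says the interrupt "may" occur, a result of True means only that the stated conditions are met.

```python
# Hypothetical names for illustration only; not defined by the architecture.

VRMA_LIMIT = 1 << 40  # assumed: the 1 TB VRMA spans effective addresses 0 .. 2**40 - 1

def may_take_hypervisor_instruction_segment(
    hv: int,
    pr: int,
    ir: int,
    vpm0: int,
    lpes1: int,
    ea: int,
    length: int = 4,
    higher_priority_exception: bool = False,
) -> bool:
    """Return True if the stated conditions for the interrupt are all met."""
    # Assumption: hypervisor state is MSR[HV PR] = 0b10.
    in_hypervisor_state = (hv, pr) == (1, 0)
    # "Any byte of the instruction" is beyond the VRMA if its last byte is.
    beyond_vrma = ea + length - 1 >= VRMA_LIMIT
    return (
        not in_hypervisor_state
        and ir == 0
        and vpm0 == 1
        and lpes1 == 1
        and not higher_priority_exception
        and beyond_vrma
    )
```

Checking the last byte rather than the first means an instruction that straddles the end of the assumed VRMA range also satisfies the condition.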
The following registers are set:

HSRR0   Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present (if the interrupt occurs on attempting to fetch a branch target, HSRR0 is set to the branch target address).

HSRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.

MSR     See Figure 37 on page 466.

Execution resumes at effective address 0x0000_0000_0000_03E0.

6.5.19 Performance Monitor Interrupt [Category: Server.Performance Monitor]

The Performance Monitor interrupt is part of the Performance Monitor facility; see Appendix C. If the Performance Monitor facility is not implemented or does not use this interrupt, the corresponding interrupt vector (see Figure 38 on page 466) is treated as reserved.

6.5.20 Vector Unavailable Interrupt [Category: Vector]

A Vector Unavailable interrupt occurs when no higher priority exception exists, an attempt is made to execute a Vector instruction (including Vector loads, stores, and moves), and MSR_VEC=0.

The following registers are set:

SRR0    Set to the effective address of the instruction that caused the interrupt.

SRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.

MSR     See Figure 37 on page 466.

Execution resumes at effective address 0x0000_0000_0000_0F20.

6.6 Partially Executed Instructions

If a Data Storage, Data Segment, Alignment, system-caused, or imprecise exception occurs while a Load or Store instruction is executing, the instruction may be aborted. In such cases the instruction is not completed, but may have been partially executed in the following respects.

• Some of the bytes of the storage operand may have been accessed, except that if access to a given byte of the storage operand would violate storage protection, that byte is neither copied to a register by a Load instruction nor modified by a Store instruction. Also, the rules for storage accesses given in Section 5.8.1, "Guarded Storage" and in Section 2.1 of Book II are obeyed.
• Some registers may have been altered as described in the Book II section cited above.
• Reference and Change bits may have been updated as described in Section 5.7.8.
• For a stwcx. or stdcx. instruction that is executed in-order, CR0 may have been set to an undefined value and the reservation may have been cleared.
• For an lq instruction that is executed in-order, the TGCC may have been set to an undefined value.

The architecture does not support continuation of an aborted instruction but intends that the aborted instruction be re-executed if appropriate.

    Programming Note

    An exception may result in the partial execution of a Load or Store instruction. For example, if the Page Table Entry that translates the address of the storage operand is altered, by a program running on another processor, such that the new contents of the Page Table Entry preclude performing the access, the alteration could cause the Load or Store instruction to be aborted after having been partially executed.

    As stated in the Book II section cited above, if an instruction is partially executed the contents of registers are preserved to the extent that the instruction can be re-executed correctly. The consequent preservation is described in the following list. For any given instruction, zero or one item in the list applies.

    • For a fixed-point Load instruction that is not a multiple or string form, or for an eciwx instruction, if RT=RA or RT=RB then the contents of register RT are not altered.
    • For an lq instruction, if RT+1 = RA then the contents of register RT+1 are not altered.
    • For an update form Load or Store instruction, the contents of register RA are not altered.
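As an informal illustration of the register preservation list in the Programming Note above, the three items can be written as a small lookup. This is only a sketch under stated assumptions: the LoadStore record, its field names, and its kind labels are hypothetical and are not part of the architecture, and the order in which the items are tested is an arbitrary choice, since at most one item applies to any given instruction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoadStore:
    """Hypothetical, simplified view of a Load/Store instruction's fields."""
    kind: str                         # assumed labels: "fixed_load", "eciwx", "lq", "other"
    rt: Optional[int] = None          # target (or source) GPR number
    ra: Optional[int] = None
    rb: Optional[int] = None
    update_form: bool = False
    multiple_or_string: bool = False


def preserved_gpr(insn: LoadStore) -> Optional[int]:
    """Return the GPR that is not altered by a partial execution, or None."""
    # lq: if RT+1 = RA, the contents of register RT+1 are not altered.
    if insn.kind == "lq":
        if insn.rt is not None and insn.rt + 1 == insn.ra:
            return insn.rt + 1
        return None
    # Update form Load or Store: the contents of register RA are not altered.
    if insn.update_form:
        return insn.ra
    # Fixed-point Load (not multiple or string) or eciwx:
    # if RT=RA or RT=RB, the contents of register RT are not altered.
    if insn.kind in ("fixed_load", "eciwx") and not insn.multiple_or_string:
        if insn.rt is not None and insn.rt in (insn.ra, insn.rb):
            return insn.rt
    return None
```

A result of None corresponds to the "zero items apply" case: the list then gives no additional guarantee beyond the general rule that the instruction can be re-executed correctly.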
6.7 Exception Ordering

Since multiple exceptions can exist at the same time and the architecture does not provide for reporting more than one interrupt at a time, the generation of more than one interrupt is prohibited. Some exceptions, such as the External exception, persist and can be deferred. However, other exceptions would be lost if they were not recognized and handled when they occur. For example, if an External interrupt was generated when a Data Storage exception existed, the Data Storage exception would be lost. If the Data Storage exception was caused by a Store Multiple instruction for which the storage operand crosses a virtual page boundary and the exception was a result of attempting to access the second virtual page, the store could have modified locations in the first virtual page even though it appeared that the Store Multiple instruction was never executed.

For the above reasons, all exceptions are prioritized with respect to other exceptions that may exist at the same instant to prevent the loss of any exception that is not persistent. Some exceptions cannot exist at the same instant as some others.

Data Storage, Hypervisor Data Storage, Data Segment, Hypervisor Data Segment, and Alignment exceptions occur as if the storage operand were accessed one byte at a time in order of increasing effective address (with the obvious caveat if the operand includes both the maximum effective address and effective address 0).

6.7.1 Unordered Exceptions

The exceptions listed here are unordered, meaning that they may occur at any time regardless of the state of the interrupt processing mechanism. These exceptions are recognized and processed when presented.

1. System Reset
2. Machine Check

6.7.2 Ordered Exceptions

The exceptions listed here are ordered with respect to the state of the interrupt processing mechanism. In the following list, the hypervisor forms of the Data Storage, Instruction Storage, Data Segment, and Instruction Segment exceptions can be substituted for the non-hypervisor forms since the hypervisor forms cannot be caused by the same instruction and have the same ordering.

System-Caused or Imprecise

1. Program
   - Imprecise Mode Floating-Point Enabled Exception
2. External and [Hypervisor] Decrementer

Instruction-Caused and Precise

1. [Hypervisor] Instruction Segment
2. [Hypervisor] Instruction Storage
3. Program
   - Illegal Instruction
   - Privileged Instruction
4. Function-Dependent
   4.a Fixed-Point and Branch
       1a Program
          - Trap
       1b System Call
       1c [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
       2  Trace
   4.b Floating-Point
       1  FP Unavailable
       2a Program
          - Precise Mode Floating-Point Enabled Exception
       2b [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
       3  Trace
   4.c Vector
       1  Vector Unavailable
       2a [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
       3  Trace

For implementations that execute multiple instructions in parallel using pipeline or superscalar techniques, or combinations of these, it can be difficult to understand the ordering of exceptions. To understand this ordering it is useful to consider a model in which each instruction is fetched, then decoded, then executed, all before the next instruction is fetched. In this model, the exceptions a single instruction would generate are in the order shown in the list of instruction-caused exceptions. Exceptions with different numbers have different ordering. Exceptions with the same numbering but different lettering are mutually exclusive and cannot be caused by the same instruction. The External, Decrementer, and Hypervisor Decrementer interrupts have equal ordering. Similarly, where Data Storage, Data Segment, and Alignment exceptions are listed in the same item they have equal ordering.

Even on processors that are capable of executing several instructions simultaneously, or out of order, instruction-caused interrupts (precise and imprecise) occur in program order.

6.8 Interrupt Priorities

This section describes the relationship of nonmaskable, maskable, precise, and imprecise interrupts. In the following descriptions, the interrupt mechanism waiting for all possible exceptions to be reported includes only exceptions caused by previously initiated instructions (e.g., it does not include waiting for the Decrementer to step through zero). The exceptions are listed in order of highest to lowest priority. In the following list, the hypervisor forms of the Data Storage, Instruction Storage, Data Segment, and Instruction Segment exceptions can be substituted for the non-hypervisor forms since the hypervisor forms cannot occur simultaneously and have the same priority.

1. System Reset

   System Reset exception has the highest priority of all exceptions. If this exception exists, the interrupt mechanism ignores all other exceptions and generates a System Reset interrupt.

   Once the System Reset interrupt is generated, no nonmaskable interrupts are generated due to exceptions caused by instructions issued prior to the generation of this interrupt.

2. Machine Check

   Machine Check exception is the second highest priority exception. If this exception exists and a System Reset exception does not exist, the interrupt mechanism ignores all other exceptions and generates a Machine Check interrupt.

   Once the Machine Check interrupt is generated, no nonmaskable interrupts are generated due to exceptions caused by instructions issued prior to the generation of this interrupt.

3. Instruction-Dependent

   This exception is the third highest priority exception. When this exception is created, the interrupt mechanism waits for all possible Imprecise exceptions to be reported. It then generates the appropriate ordered interrupt if no higher priority exception exists when the interrupt is to be generated. Within this category a particular instruction may present more than a single exception. When this occurs, those exceptions are ordered in priority as indicated in the following lists. Where [Hypervisor] Data Storage, [Hypervisor] Data Segment, and Alignment exceptions are listed in the same item they have equal priority (i.e., the processor may generate any one of the three interrupts for which an exception exists).

   A. Fixed-Point Loads and Stores
      a. These exceptions are mutually exclusive and have the same priority:
         • Program - Illegal Instruction
         • Program - Privileged Instruction
      b. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
      c. Trace

   B. Floating-Point Loads and Stores
      a. Program - Illegal Instruction
      b. Floating-Point Unavailable
      c. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
      d. Trace

   C. Vector Loads and Stores
      a. Program - Illegal Instruction
      b. Vector Unavailable
      c. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
      d. Trace

   D. Other Floating-Point Instructions
      a. Floating-Point Unavailable
      b. Program - Precise Mode Floating-Point Enabled Exception
      c. Trace

   E. Other Vector Instructions
      a. Vector Unavailable
      b. Trace

   F. rfid, hrfid and mtmsr[d]
      a. Program - Privileged Instruction
      b. Program - Floating-Point Enabled Exception
      c. Trace, for mtmsr[d] only

   G. Other Instructions
      a. These exceptions are mutually exclusive and have the same priority:
         • Program - Trap
         • System Call
         • Program - Privileged Instruction
         • Program - Illegal Instruction
      b. Trace

   H. [Hypervisor] Instruction Storage and [Hypervisor] Instruction Segment

      These exceptions have the lowest priority in this category. They are recognized only when all instructions prior to the instruction causing one of these exceptions appear to have completed and that instruction is the next instruction to be executed. The two exceptions are mutually exclusive.

      The priority of these exceptions is specified for completeness and to ensure that they are not given more favorable treatment. It is acceptable for an implementation to treat these exceptions as though they had a lower priority.

4. Program - Imprecise Mode Floating-Point Enabled Exception

   This exception is the fourth highest priority exception. When this exception is created, the interrupt mechanism waits for all other possible exceptions to be reported. It then generates this interrupt if no higher priority exception exists when the interrupt is to be generated.

5. External and [Hypervisor] Decrementer

   These exceptions are the lowest priority exceptions. All have equal priority (i.e., the processor may generate any one of these interrupts for which an exception exists). When one of these exceptions is created, the interrupt processing mechanism waits for all other possible exceptions to be reported. It then generates the corresponding interrupt if no higher priority exception exists when the interrupt is to be generated.

   If a Hypervisor Decrementer exception exists and each attempt to execute an instruction when the Hypervisor Decrementer interrupt is enabled causes an exception (see the Programming Note below), the Hypervisor Decrementer interrupt is not delayed indefinitely.

    Programming Note

    An incorrect or malicious operating system could corrupt the first instruction in the interrupt vector location for an instruction-caused interrupt such that the attempt to execute the instruction causes the same exception that caused the interrupt (a looping interrupt; e.g., illegal instruction and Program interrupt).
    Similarly, the first instruction of the interrupt vector for one instruction-caused interrupt could cause a different instruction-caused interrupt, and the first instruction of the interrupt vector for the second instruction-caused interrupt could cause the first instruction-caused interrupt (e.g., Program interrupt and Floating-Point Unavailable interrupt). Similarly, if the Real Mode Area is virtualized and there is no PTE for the page containing the interrupt vectors, every attempt to execute the first instruction of the OS's Instruction Storage interrupt handler would cause a Hypervisor Instruction Storage interrupt; if the Hypervisor Instruction Storage interrupt handler returns to the OS's Instruction Storage interrupt handler without the relevant PTE having been created, another Hypervisor Instruction Storage interrupt would occur immediately. The looping caused by these and similar cases is terminated by the occurrence of a System Reset or Hypervisor Decrementer interrupt.

Chapter 7. Timer Facilities

    7.1 Overview
    7.2 Time Base (TB)
        7.2.1 Writing the Time Base
    7.3 Decrementer
        7.3.1 Writing and Reading the Decrementer
    7.4 Hypervisor Decrementer
    7.5 Processor Utilization of Resources Register (PURR)

7.1 Overview

The Time Base, Decrementer, Hypervisor Decrementer, and the Processor Utilization of Resources Register provide timing functions for the system. The remainder of this section describes these registers and related facilities.

7.2 Time Base (TB)

The Time Base (TB) is a 64-bit register (see Figure 39) containing a 64-bit unsigned integer that is incremented periodically. Each increment adds 1 to the low-order bit (bit 63). The frequency at which the integer is updated is implementation-dependent.

    TBU40 (bits 0:39) | /// (bits 40:63)
    TBU (bits 0:31)   | TBL (bits 32:63)

    Field   Description
    TBU40   Upper 40 bits of Time Base
    TBU     Upper 32 bits of Time Base
    TBL     Lower 32 bits of Time Base

    Figure 39. Time Base

The Time Base is a hypervisor resource; see Chapter 2.

The SPRs TBU40, TBU, and TBL provide access to the fields of the Time Base shown in Figure 39. When an mtspr instruction is executed specifying one of these SPRs, the associated field of the Time Base is altered and the remaining bits of the Time Base are not affected.

The Time Base increments until its value becomes 0xFFFF_FFFF_FFFF_FFFF (2^64 - 1). At the next increment, its value becomes 0x0000_0000_0000_0000. There is no interrupt or other indication when this occurs.

The period of the Time Base depends on the driving frequency. As an order of magnitude example, suppose that the CPU clock is 1 GHz and that the Time Base is driven by this frequency divided by 32. Then the period of the Time Base would be

$$T_{TB} = \frac{2^{64} \times 32}{1\ \text{GHz}} = 5.90 \times 10^{11}\ \text{seconds}$$

which is approximately 18,700 years.

The Time Base is implemented such that:

1. Loading a GPR from the Time Base has no effect on the accuracy of the Time Base.
2. Copying the contents of a GPR to the Time Base replaces the contents of the Time Base with the contents of the GPR.

The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock in a Power ISA system. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following.

• The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base changes, and a means to determine what the current update frequency is.
• The update frequency of the Time Base is under the control of the system software.

Implementations must provide a means for either preventing the Time Base from incrementing or preventing it from being read in problem state (MSR_PR=1). If the means is under software control, it must be privileged and, in implementations of the Server environment, must be accessible only in hypervisor state (MSR_HV||PR = 0b10). There must be a method for getting all processors' Time Bases to start incrementing with values that are identical or almost identical in all processors.

    Programming Note

    If software initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries.

    Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 2^64 - 1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values.

    Successive readings of the Time Base may return identical values.

    See the description of the Time Base in Chapter 4 of Book II for ways to compute time of day in POSIX format from the Time Base.

7.2.1 Writing the Time Base

Writing the Time Base is privileged, and can be done only in hypervisor state. Reading the Time Base is not privileged; it is discussed in Chapter 4 of Book II.

It is not possible to write the entire 64-bit Time Base using a single instruction. The mttbl and mttbu extended mnemonics write the lower and upper halves of the Time Base (TBL and TBU), respectively, preserving the other half. These are extended mnemonics for the mtspr instruction; see Appendix A, "Assembler Extended Mnemonics" on page 493.

The Time Base can be written by a sequence such as:

    lwz     Rx,upper       # load 64-bit value for
    lwz     Ry,lower       # TB into Rx and Ry
    li      Rz,0
    mttbl   Rz             # set TBL to 0
    mttbu   Rx             # set TBU
    mttbl   Ry             # set TBL

Provided that no interrupts occur while the last three instructions are being executed, loading 0 into TBL prevents the possibility of a carry from TBL to TBU while the Time Base is being initialized.

The preferred method of changing the Time Base utilizes the TBU40 facility. The following code sequence demonstrates the process. Assume the upper 40 bits of Rx contain the desired value of the upper 40 bits of the Time Base.

    mftb    Ry             # read 64-bit Time Base value
    clrldi  Ry,Ry,40       # lower 24 bits of old TB
    mttbu40 Rx             # write upper 40 bits of TB
    mftb    Rz             # read TB value again
    clrldi  Rz,Rz,40       # lower 24 bits of new TB
    cmpld   Rz,Ry          # compare new and old lower 24 bits
    bge     done           # no carry out of low 24 bits
    addis   Rx,Rx,0x0100   # increment upper 40 bits
    mttbu40 Rx             # update to adjust for carry
done:

    Programming Note

    The instructions for writing the Time Base are mode-independent. Thus code written to set the Time Base will work correctly in either 64-bit or 32-bit mode.

7.3 Decrementer

The Decrementer (DEC) is a 32-bit decrementing counter that provides a mechanism for causing a Decrementer interrupt after a programmable delay. The contents of the Decrementer are treated as a signed integer.

    DEC (bits 32:63)

    Figure 40. Decrementer

The Decrementer is driven by the same frequency as the Time Base. The period of the Decrementer will depend on the driving frequency, but if the same values are used as given above for the Time Base (see Section 7.2), and if the Time Base update frequency is constant, the period would be

$$T_{DEC} = \frac{2^{32} \times 32}{1\ \text{GHz}} = 137\ \text{seconds.}$$

The Decrementer counts down.

When the contents of DEC_32 change from 0 to 1, a Decrementer exception will come into existence within a reasonable period of time. When the contents of DEC_32 change from 1 to 0, an existing Decrementer exception will cease to exist within a reasonable period of time, but not later than the completion of the next context synchronizing instruction or event.

The preceding paragraph applies regardless of whether the change in the contents of DEC_32 is the result of decrementation of the Decrementer by the processor or of modification of the Decrementer caused by execution of an mtspr instruction.

The operation of the Decrementer satisfies the following constraints.

1. The operation of the Time Base and the Decrementer is coherent, i.e., the counters are driven by the same fundamental time base.
2. Loading a GPR from the Decrementer has no effect on the accuracy of the Time Base.
3. Copying the contents of a GPR to the Decrementer replaces the contents of the Decrementer with the contents of the GPR.

    Programming Note

    In systems that change the Time Base update frequency for purposes such as power management, the Decrementer input frequency will also change. Software must be aware of this in order to set interval timers.

7.3.1 Writing and Reading the Decrementer

The contents of the Decrementer can be read or written using the mfspr and mtspr instructions, both of which are privileged when they refer to the Decrementer. Using an extended mnemonic (see Appendix A, "Assembler Extended Mnemonics" on page 493), the Decrementer can be written from GPR Rx using:

    mtdec   Rx

The Decrementer can be read into GPR Rx using:

    mfdec   Rx

Copying the Decrementer to a GPR has no effect on the Decrementer contents or on the interrupt mechanism.

7.4 Hypervisor Decrementer

The Hypervisor Decrementer (HDEC) is a 32-bit decrementing counter that provides a mechanism for causing a Hypervisor Decrementer interrupt after a programmable delay. The contents of the Hypervisor Decrementer are treated as a signed integer.

    HDEC (bits 32:63)

    Figure 41. Hypervisor Decrementer

The Hypervisor Decrementer is a hypervisor resource; see Chapter 2.

The Hypervisor Decrementer is driven by the same frequency as the Time Base. The period of the Hypervisor Decrementer will depend on the driving frequency, but if the same values are used as given above for the Time Base (see Section 7.2), and if the Time Base update frequency is constant, the period would be

$$T_{DEC} = \frac{2^{32} \times 32}{1\ \text{GHz}} = 137\ \text{seconds.}$$

When the contents of HDEC_32 change from 0 to 1, a Hypervisor Decrementer exception will come into existence within a reasonable period of time. When the contents of HDEC_32 change from 1 to 0, an existing Hypervisor Decrementer exception will cease to exist within a reasonable period of time, but not later than the completion of the next context synchronizing instruction or event.

The preceding paragraph applies regardless of whether the change in the contents of HDEC_32 is the result of decrementation of the Hypervisor Decrementer by the processor or of modification of the Hypervisor Decrementer caused by execution of an mtspr instruction.

The operation of the Hypervisor Decrementer satisfies the following constraints.

1. The operation of the Time Base and the Hypervisor Decrementer is coherent, i.e., the counters are driven by the same fundamental time base.
2. Loading a GPR from the Hypervisor Decrementer has no effect on the accuracy of the Hypervisor Decrementer.
3. Copying the contents of a GPR to the Hypervisor Decrementer replaces the contents of the Hypervisor Decrementer with the contents of the GPR.

    Programming Note

    In systems that change the Time Base update frequency for purposes such as power management, the Hypervisor Decrementer update frequency will also change. Software must be aware of this in order to set interval timers.

7.5 Processor Utilization of Resources Register (PURR)

The Processor Utilization of Resources Register (PURR) is a 64-bit counter, the contents of which provide an estimate of the resources used by the processor. The contents of the PURR are treated as a 64-bit unsigned integer.

    PURR (bits 0:63)

    Figure 42. Processor Utilization of Resources Register

The PURR is a hypervisor resource; see Chapter 2.

The contents of the PURR increase monotonically, unless altered by software, until the sum of the contents plus the amount by which it is to be increased exceeds 0xFFFF_FFFF_FFFF_FFFF (2^64 - 1), at which point the contents are replaced by that sum modulo 2^64. There is no interrupt or other indication when this occurs.

The rate at which the value represented by the contents of the PURR increases is an estimate of the portion of resources used by the processor with respect to other processors that share those resources monitored by the PURR.

Let the difference between the value represented by the contents of the Time Base at times T_a and T_b be T_ab. Let the difference between the value represented by the contents of the PURR at times T_a and T_b be the value P_ab.
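As a rough illustration of the two differences just defined, the sketch below forms the Time Base difference and the PURR difference from two pairs of samples and divides one by the other to obtain the usage estimate. It is not a definitive method: the function names and sample values are hypothetical, the register values are passed in as plain integers rather than read with mfspr, and the subtraction is taken modulo 2^64 on the assumption that each counter wrapped at most once between the samples and was not altered by software.

```python
MASK64 = (1 << 64) - 1  # both the Time Base and the PURR are 64-bit unsigned counters


def delta64(later: int, earlier: int) -> int:
    """Difference between two readings of a 64-bit counter, modulo 2^64."""
    return (later - earlier) & MASK64


def purr_usage_estimate(tb_a: int, tb_b: int, purr_a: int, purr_b: int) -> float:
    """Return the PURR difference divided by the Time Base difference."""
    t_ab = delta64(tb_b, tb_a)      # Time Base difference between the two samples
    p_ab = delta64(purr_b, purr_a)  # PURR difference between the same two samples
    if t_ab == 0:
        # Successive Time Base readings may be identical; no estimate is possible.
        raise ValueError("Time Base did not advance between the two samples")
    return p_ab / t_ab
```

Taking the differences modulo 2^64 keeps the estimate meaningful across a wrap, which matters because neither register gives any indication when it wraps.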
The ratio P_ab/T_ab is an estimate of the percentage of shared resources used by the processor during the interval T_ab. For the set {S} of processors that share the resources monitored by the PURR, the sum of the usage estimates for all the processors in the set is 1.0.

The definition of the set of processors S, the shared resources corresponding to the set S, and specifics of the algorithm for incrementing the PURR are implementation-specific.

The PURR is implemented such that:

1. Loading a GPR from the PURR has no effect on the accuracy of the PURR.
2. Copying the contents of a GPR to the PURR replaces the contents of the PURR with the contents of the GPR.

    Programming Note

    Estimates computed as described above may be useful for purposes of resource use accounting, program dispatching, etc.

    Because the rate at which the PURR accumulates resource usage estimates is dependent on the frequency at which the Time Base is incremented, the interpretation of the contents of the PURR must be adjusted if the frequency at which the Time Base is incremented is altered.

Chapter 8. Debug Facilities

    8.1 Overview
        8.1.1 Data Address Breakpoint

8.1 Overview

Processors provide debug facilities to enable hardware and software debug functions, such as instruction and data breakpoints and program single stepping. The debug facilities consist of a data address breakpoint register (DABR), a data address breakpoint register extension (DABRX) (see Section 8.1.1) and an associated interrupt (see Section 6.5.3).

The mfspr and mtspr instructions (see Section 4.4.3) provide access to the registers of the debug facilities.

In addition to the facilities described here, implementations will typically include debug facilities, modes, and access mechanisms which are implementation-specific. For example, implementations will typically provide access to the debug facilities via a dedicated interface such as the IEEE 1149.1 Test Access Port (JTAG).

8.1.1 Data Address Breakpoint

The Data Address Breakpoint mechanism provides a means of detecting load and store accesses to a designated doubleword. The address comparison is done on an effective address (EA).

The Data Address Breakpoint mechanism is controlled by the Data Address Breakpoint Register (DABR), shown in Figure 43, and the Data Address Breakpoint Register Extension (DABRX), shown in Figure 44.

    DAB (bits 0:60) | BT (bit 61) | DW (bit 62) | DR (bit 63)

    Bit(s)  Name  Description
    0:60    DAB   Data Address Breakpoint
    61      BT    Breakpoint Translation
    62      DW    Data Write
    63      DR    Data Read

    Figure 43. Data Address Breakpoint Register

    /// (bits 0:59) | BTI (bit 60) | PRIVM (bits 61:63)

    Bit(s)  Name   Description
    60      BTI    Breakpoint Translation Ignore
    61:63   PRIVM  Privilege Mask
                   61  HYP  Hypervisor state
                   62  PNH  Privileged but Non-Hypervisor state
                   63  PRO  Problem state

    All other fields are reserved.

    Figure 44. Data Address Breakpoint Register Extension

The DABR and DABRX are hypervisor resources; see Section 2.6 on page 399.

The supported PRIVM values are 0b000, 0b001, 0b010, 0b011, 0b100, and 0b111. If the PRIVM field does not contain one of the supported values, then whether a match occurs for a given storage access is undefined. Elsewhere in this section it is assumed that the PRIVM field contains one of the supported values.

    Programming Note

    PRIVM value 0b000 causes matches not to occur regardless of the contents of other DABR and DABRX fields. PRIVM values 0b101 and 0b110 are not supported because a storage location that is shared between the hypervisor and non-hypervisor software is unlikely to be accessed using the same EA by both the hypervisor and the non-hypervisor software. (PRIVM value 0b111 is supported primarily for reasons of software compatibility, as described in a subsequent Programming Note.)

    Programming Note

    Before setting a breakpoint requested by the operating system, the hypervisor must verify that the requested contents of the DABR and DABRX cannot cause the hypervisor to receive a Data Storage interrupt that it is not prepared to handle, or that it intrinsically cannot handle (e.g., the EA is in the range of EAs at which the hypervisor's Data Storage interrupt handler saves registers, DABR_BT || DABRX_BTI ≠ 0b10, DABR_DW = 1, and DABRX_HYP = 1).

    Programming Note

    Processors that comply with versions of the architecture that precede Version 2.02 do not provide the DABRX. Forward compatibility for software that was written for such processors (and uses the Data Address Breakpoint facility) can be obtained by setting DABRX_60:63 to 0b0111.

A Data Address Breakpoint match occurs for a Load or Store instruction if, for any byte accessed, all of the following conditions are satisfied.

• EA_0:60 = DABR_DAB
• (MSR_DR = DABR_BT) | DABRX_BTI
• the processor is in
  - hypervisor state and DABRX_HYP = 1 or
  - privileged but non-hypervisor state and DABRX_PNH = 1 or
  - problem state and DABRX_PRO = 1
• the instruction is a Store and DABR_DW = 1, or the instruction is a Load and DABR_DR = 1.

In 32-bit mode the high-order 32 bits of the EA are treated as zeros for the purpose of detecting a match.

If the above conditions are satisfied, a match also occurs for eciwx and ecowx. For the purpose of determining whether a match occurs, eciwx is treated as a Load, and ecowx is treated as a Store.

If the above conditions are satisfied, it is undefined whether a match occurs in the following cases.

• The instruction is Store Conditional but the store is not performed.
• The instruction is a Load/Store String of zero length.
• The instruction is dcbz. (For the purpose of determining whether a match occurs, dcbz is treated as a Store.)

The Cache Management instructions other than dcbz never cause a match.

A Data Address Breakpoint match causes a Data Storage exception (see Section 6.5.3, "Data Storage Interrupt" on page 467). If a match occurs, some or all of the bytes of the storage operand may have been accessed; however, if a Store or ecowx instruction causes the match, the storage operand is not modified if the instruction is one of the following:

• any Store instruction that causes an atomic access
• ecowx

    Programming Note

    The Data Address Breakpoint mechanism does not apply to instruction fetches.

Chapter 9. External Control [Category: External Control]

    9.1 External Access Register
    9.2 External Access Instructions

The External Control facility permits a program to communicate with a special-purpose device. The facility consists of a Special Purpose Register, called EAR, and two instructions, called External Control In Word Indexed (eciwx) and External Control Out Word Indexed (ecowx).

This facility must provide a means of synchronizing the devices with the processor to prevent the use of an address by the device when the translation that produced that address is being invalidated.

9.1 External Access Register

This 32-bit Special Purpose Register controls access to the External Control facility and, for external control operations that are permitted, identifies the target device.

    E (bit 32) | /// (bits 33:57) | RID (bits 58:63)

    Bit(s)  Name  Description
    32      E     Enable bit
    58:63   RID   Resource ID

    All other fields are reserved.

    Figure 45. External Access Register

The EAR is a hypervisor resource; see Chapter 2.

The high-order bits of the RID field that correspond to bits of the Resource ID beyond the width of the Resource ID supported by the implementation are treated as reserved bits.

    Programming Note

    The hypervisor can use the EAR to control which programs are allowed to execute External Access instructions, when they are allowed to do so, and which devices they are allowed to communicate with using these instructions.

9.2 External Access Instructions

The External Access instructions, External Control In Word Indexed (eciwx) and External Control Out Word Indexed (ecowx), are described in Book II. Additional information about them is given below.

If an attempt is made to execute either of these instructions when EAR_E=0, a Data Storage interrupt occurs with bit 43 of the DSISR set to 1.

The instructions are supported whenever MSR_DR=1. If either instruction is executed when MSR_DR=0 (real addressing mode), the results are boundedly undefined.

Chapter 10. Synchronization Requirements for Context Alterations

Changing the contents of certain System Registers, the contents of SLB entries, or the contents of other system resources that control the context in which a program executes can have the side effect of altering the context in which data addresses and instruction addresses are interpreted, and in which instructions are executed and data accesses are performed. For example, changing MSR_IR from 0 to 1 has the side effect of enabling translation of instruction addresses. These side effects need not occur in program order, and therefore may require explicit synchronization by software. (Program order is defined in Book II.)

An instruction that alters the context in which data addresses or instruction addresses are interpreted, or in which instructions are executed or data accesses are performed, is called a context-altering instruction. This chapter covers all the context-altering instructions. The software synchronization required for them is shown in Table 1 (for data access) and Table 2 (for instruction fetch and execution).

The notation "CSI" in the tables means any context synchronizing instruction (e.g., sc, isync, or rfid). A context synchronizing interrupt (i.e., any interrupt except non-recoverable System Reset or non-recoverable Machine Check) can be used instead of a context synchronizing instruction. If it is, phrases like "the synchronizing instruction", below, should be interpreted as meaning the instruction at which the interrupt occurs. If no software synchronization is required before (after) a context-altering instruction, "the synchronizing instruction before (after) the context-altering instruction" should be interpreted as meaning the context-altering instruction itself.

The synchronizing instruction before the context-altering instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alteration. The synchronizing instruction after the context-altering instruction ensures that all instructions after that synchronizing instruction are fetched and executed in the context established by the alteration. Instructions after the first synchronizing instruction, up to and including the second synchronizing instruction, may be fetched or executed in either context.

If a sequence of instructions contains context-altering instructions and contains no instructions that are affected by any of the context alterations, no software synchronization is required within the sequence.

    Programming Note

    Sometimes advantage can be taken of the fact that certain events, such as interrupts, and certain instructions that occur naturally in the program, such as the rfid that returns from an interrupt handler, provide the required synchronization.

No software synchronization is required before or after a context-altering instruction that is also context synchronizing or when altering the MSR in most cases (see the tables). No software synchronization is required before most of the other alterations shown in Table 2, because all instructions preceding the context-altering instruction are fetched and decoded before the context-altering instruction is executed (the processor must determine whether any of these preceding instructions are context synchronizing).

Unless otherwise stated, the material in this chapter assumes a uniprocessor environment.

    Instruction or Event   Required Before   Required After    Notes
    --------------------   ---------------   ---------------   -----
    interrupt              none              none
    rfid                   none              none
    hrfid                  none              none
    sc                     none              none
    Trap                   none              none
    mtmsrd (SF)            none              none
    mtmsr[d] (PR)          none              none
    mtmsr[d] (DR)          none              none
    mtsr[in]               CSI               CSI
    mtspr (SDR1)           ptesync           CSI               3,4
    mtspr (AMR)            CSI               CSI
    mtspr (EAR)            CSI               CSI
    mtspr (RMOR)           CSI               CSI               13
    mtspr (HRMOR)          CSI               CSI               13
    mtspr (LPCR)           CSI               CSI               13
    mtspr (DABR)           --                --                2
    mtspr (DABRX)          --                --                2
    slbie                  CSI               CSI
    slbia                  CSI               CSI
    slbmte                 CSI               CSI               11
    tlbie                  CSI               CSI               5,7
    tlbiel                 CSI               ptesync           5
    tlbia                  CSI               CSI               5
    Store(PTE)             none              {ptesync, CSI}    6,7

    Table 1: Synchronization requirements for data access

    Instruction or Event   Required Before   Required After    Notes
    --------------------   ---------------   ---------------   -----
    interrupt              none              none
    rfid                   none              none
    hrfid                  none              none
    sc                     none              none
    Trap                   none              none
    mtmsrd (SF)            none              none              8
    mtmsr[d] (EE)          none              none              1
    mtmsr[d] (PR)          none              none              9
    mtmsr[d] (FP)          none              none
    mtmsr[d] (FE0,FE1)     none              none
    mtmsr[d] (SE, BE)      none              none
    mtmsr[d] (IR)          none              none              9
    mtmsr[d] (RI)          none              none
    mtsr[in]               none              CSI               9
    mtspr (DEC)            none              none              10
    mtspr (SDR1)           ptesync           CSI               3,4
    mtspr (CTRL)           none              none
    mtspr (HDEC)           none              none              10
    mtspr (RMOR)           none              CSI               13
    mtspr (HRMOR)          none              CSI               9,13
    mtspr (LPCR)           none              CSI               13
    mtspr (LPIDR)          CSI               CSI               7,12
    slbie                  none              CSI
    slbia                  none              CSI
    slbmte                 none              CSI               9,11
    tlbie                  none              CSI               5,7
    tlbiel                 none              CSI               5
    tlbia                  none              CSI               5
    Store(PTE)             none              {ptesync, CSI}    6,7

    Table 2:
Synchronization requirements for instruction fetch and/or execution

Notes:

1. The effect of changing the EE bit is immediate, even if the mtmsr[d] instruction is not context synchronizing (i.e., even if L=1).

    * If an mtmsr[d] instruction sets the EE bit to 0, neither an External interrupt nor a Decrementer interrupt occurs after the mtmsr[d] is executed.
    * If an mtmsr[d] instruction changes the EE bit from 0 to 1 when an External, Decrementer, or higher priority exception exists, the corresponding interrupt occurs immediately after the mtmsr[d] is executed, and before the next instruction is executed in the program that set EE to 1.
    * If a hypervisor executes the mtmsr[d] instruction that sets the EE bit to 0, a Hypervisor Decrementer interrupt does not occur after mtmsr[d] is executed as long as the processor remains in hypervisor state.
    * If the hypervisor executes an mtmsr[d] instruction that changes the EE bit from 0 to 1 when a Hypervisor Decrementer or higher priority exception exists, the corresponding interrupt occurs immediately after the mtmsr[d] instruction is executed, and before the next instruction is executed, provided HDICE is 1.

2. Synchronization requirements for this instruction are implementation-dependent.

3. SDR1 must not be altered when MSR[DR]=1 or MSR[IR]=1; if it is, the results are undefined.

4. A ptesync instruction is required before the mtspr instruction because (a) SDR1 identifies the Page Table and thereby the location of Reference and Change bits, and (b) on some implementations, use of SDR1 to update Reference and Change bits may be independent of translating the virtual address. (For example, an implementation might identify the PTE in which to update the Reference and Change bits in terms of its offset in the Page Table, instead of its real address, and then add the Page Table address from SDR1 to the offset to determine the real address at which to update the bits.) To ensure that Reference and Change bits are updated in the correct Page Table, SDR1 must not be altered until all Reference and Change bit updates associated with address translations that were performed, by the processor executing the mtspr instruction, before the mtspr instruction is executed have been performed with respect to that processor. A ptesync instruction guarantees this synchronization of Reference and Change bit updates, while neither a context synchronizing operation nor the instruction fetching mechanism does so.

5. For data accesses, the context synchronizing instruction before the tlbie, tlbiel, or tlbia instruction ensures that all preceding instructions that access data storage have completed to a point at which they have reported all exceptions they will cause.

    The context synchronizing instruction after the tlbie, tlbiel, or tlbia instruction ensures that storage accesses associated with instructions following the context synchronizing instruction will not use the TLB entry(s) being invalidated.

    (If it is necessary to order storage accesses associated with preceding instructions, or Reference and Change bit updates associated with preceding address translations, with respect to subsequent data accesses, a ptesync instruction must also be used, either before or after the tlbie, tlbiel, or tlbia instruction. These effects of the ptesync instruction are described in the last paragraph of Note 6.)

6. The notation "{ptesync,CSI}" denotes an instruction sequence. Other instructions may be interleaved with this sequence, but these instructions must appear in the order shown.

    No software synchronization is required before the Store instruction because (a) stores are not performed out-of-order and (b) address translations associated with instructions preceding the Store instruction are not performed again after the store has been performed (see Section 5.5). These properties ensure that all address translations associated with instructions preceding the Store instruction will be performed using the old contents of the PTE.

    The ptesync instruction after the Store instruction ensures that all searches of the Page Table that are performed after the ptesync instruction completes will use the value stored (or a value stored subsequently). The context synchronizing instruction after the ptesync instruction ensures that any address translations associated with instructions following the context synchronizing instruction that were performed using the old contents of the PTE will be discarded, with the result that these address translations will be performed again and, if there is no corresponding entry in any implementation-specific address translation lookaside information, will use the value stored (or a value stored subsequently).

    The ptesync instruction also ensures that all storage accesses associated with instructions preceding the ptesync instruction, and all Reference and Change bit updates associated with additional address translations that were performed, by the processor executing the ptesync instruction, before the ptesync instruction is executed, will be performed with respect to any processor or mechanism, to the extent required by the associated Memory Coherence Required attributes, before any data accesses caused by instructions following the ptesync instruction are performed with respect to that processor or mechanism.

7. There are additional software synchronization requirements for this instruction in multiprocessor environments (e.g., it may be necessary to invalidate one or more TLB entries on all processors in the multiprocessor system and to be able to determine that the invalidations have completed and that all side effects of the invalidations have taken effect).

    Section 5.10 gives examples of using tlbie, Store, and related instructions to maintain the Page Table, in both multiprocessor and uniprocessor environments.

    Programming Note
        In a multiprocessor system, if software locking is used to help ensure that the requirements described in Section 5.10 are satisfied, the lwsync instruction near the end of the lock acquisition sequence (see Section B.2.1.1 of Book II) may naturally provide the context synchronization that is required before the alteration.

8. The alteration must not cause an implicit branch in effective address space. Thus, when changing MSR[SF] from 1 to 0, the mtmsrd instruction must have an effective address that is less than 2^32 - 4. Furthermore, when changing MSR[SF] from 0 to 1, the mtmsrd instruction must not be at effective address 2^32 - 4 (see Section 5.3.2 on page 420).

9. The alteration must not cause an implicit branch in real address space. Thus the real address of the context-altering instruction and of each subsequent instruction, up to and including the next context synchronizing instruction, must be independent of whether the alteration has taken effect.

10. The elapsed time between the contents of the Decrementer or Hypervisor Decrementer becoming negative and the signaling of the corresponding exception is not defined.

11. If an slbmte instruction alters the mapping, or associated attributes, of a currently mapped ESID, the slbmte must be preceded by an slbie (or slbia) instruction that invalidates the existing translation. This applies even if the corresponding entry is no longer in the SLB (the translation may still be in implementation-specific address translation lookaside information). No software synchronization is needed between the slbie and the slbmte, regardless of whether the index of the SLB entry (if any) containing the current translation is the same as the SLB index specified by the slbmte.

    No slbie (or slbia) is needed if the slbmte instruction replaces a valid SLB entry with a mapping of a different ESID (e.g., to satisfy an SLB miss). However, the slbie is needed later if and when the translation that was contained in the replaced SLB entry is to be invalidated.

12. The context synchronizing instruction before the mtspr instruction ensures that the LPIDR is not altered out-of-order. (Out-of-order alteration of the LPIDR could permit the requirements described in Section 5.10.1 to be violated. For the same reason, such a context synchronizing instruction may be needed even if the new LPID value is equal to the old LPID value.)

    See also Chapter 2. "Logical Partitioning (LPAR)" on page 397 regarding moving a processor from one partition to another.

13. When the RMOR or HRMOR is modified, or the VC, VRMASD, RMLS, LPES1, or RMI fields of the LPCR are modified, software must invalidate all implementation-specific lookaside information used in address translation that depends on values stored in these registers. All implementations provide a means by which software can do this.


Appendix A. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided for certain instructions. This appendix defines extended mnemonics and symbols related to instructions defined in Book III.

Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

A.1 Move To/From Special Purpose Register Mnemonics

This section defines extended mnemonics for the mtspr and mfspr instructions, including the Special Purpose Registers (SPRs) defined in Book I and certain privileged SPRs, and for the Move From Time Base instruction defined in Book II.

The mtspr and mfspr instructions specify an SPR as a numeric operand; extended mnemonics are provided that represent the SPR in the mnemonic rather than requiring it to be coded as an operand. Similar extended mnemonics are provided for the Move From Time Base instruction, which specifies the portion of the Time Base as a numeric operand.

Note: mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB).

Programming Note
    The extended mnemonics in Table 3 for SPRs associated with the Performance Monitor facility are based on the definitions in Appendix B.

    Other versions of Performance Monitor facilities used different sets of SPR numbers (all 32-bit PowerPC processors used a different set, and some early Power ISA processors used yet a different set).
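The mapping from extended to basic mnemonics described in A.1 can be illustrated with a small sketch. It is not an assembler: the function name `expand` and the register names such as `r3` are hypothetical, and only a handful of the SPR numbers listed in Table 3 are included. For one-operand mftb it produces the basic mftb form with TBR 268, as the Note above describes; it does not model the mfspr substitution that the Table 3 footnote recommends for newer architecture versions.

```python
# Illustrative only: a few extended mnemonics from Table 3 and their SPR numbers.
MOVE_TO_SPR = {"mtxer": 1, "mtlr": 8, "mtctr": 9, "mtdec": 22, "mtsrr0": 26, "mtsrr1": 27}
MOVE_FROM_SPR = {"mfxer": 1, "mflr": 8, "mfctr": 9, "mfdec": 22,
                 "mfsrr0": 26, "mfsrr1": 27, "mfpvr": 287}


def expand(line: str) -> str:
    """Rewrite one extended mnemonic as its basic form (hypothetical helper)."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []

    if mnemonic in MOVE_TO_SPR and len(operands) == 1:
        return f"mtspr {MOVE_TO_SPR[mnemonic]},{operands[0]}"
    if mnemonic in MOVE_FROM_SPR and len(operands) == 1:
        return f"mfspr {operands[0]},{MOVE_FROM_SPR[mnemonic]}"
    if mnemonic == "mtsprg" and len(operands) == 2:    # mtsprg n,Rx
        return f"mtspr {272 + int(operands[0])},{operands[1]}"
    if mnemonic == "mfsprg" and len(operands) == 2:    # mfsprg Rx,n
        return f"mfspr {operands[0]},{272 + int(operands[1])}"
    if mnemonic == "mftb" and len(operands) == 1:      # extended form: TBR assumed to be 268
        return f"mftb {operands[0]},268"
    return line.strip()                                # basic form or unknown: leave unchanged


if __name__ == "__main__":
    for source in ("mtlr r3", "mfsprg r4,2", "mftb r5", "mftb r5,269"):
        print(f"{source:14} -> {expand(source)}")
```

The operand count is what distinguishes the two uses of mftb here: one operand selects the extended form, two operands are passed through as the basic form.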
Table 3: Extended mnemonics for moving to/from an SPR

                                           Move To SPR                       Move From SPR (1)
    Special Purpose Register               Extended       Equivalent to      Extended       Equivalent to
    Fixed-Point Exception Register         mtxer Rx       mtspr 1,Rx         mfxer Rx       mfspr Rx,1
    Link Register                          mtlr Rx        mtspr 8,Rx         mflr Rx        mfspr Rx,8
    Count Register                         mtctr Rx       mtspr 9,Rx         mfctr Rx       mfspr Rx,9
    Data Storage Interrupt Status Register mtdsisr Rx     mtspr 18,Rx        mfdsisr Rx     mfspr Rx,18
    Data Address Register                  mtdar Rx       mtspr 19,Rx        mfdar Rx       mfspr Rx,19
    Decrementer                            mtdec Rx       mtspr 22,Rx        mfdec Rx       mfspr Rx,22
    Storage Description Register 1         mtsdr1 Rx      mtspr 25,Rx        mfsdr1 Rx      mfspr Rx,25
    Save/Restore Register 0                mtsrr0 Rx      mtspr 26,Rx        mfsrr0 Rx      mfspr Rx,26
    Save/Restore Register 1                mtsrr1 Rx      mtspr 27,Rx        mfsrr1 Rx      mfspr Rx,27
    AMR                                    mtamr Rx       mtspr 29,Rx        mfamr Rx       mfspr Rx,29
    CTRL                                   mtctrl Rx      mtspr 152,Rx       mfctrl Rx      mfspr Rx,136
    Special Purpose Registers G0 through G3 mtsprg n,Rx   mtspr 272+n,Rx     mfsprg Rx,n    mfspr Rx,272+n
    Time Base [Lower]                      mttbl Rx       mtspr 284,Rx       mftb Rx        mftb Rx,268 (1)
                                                                                            mfspr Rx,268
    Time Base Upper                        mttbu Rx       mtspr 285,Rx       mftbu Rx       mftb Rx,269 (1)
                                                                                            mfspr Rx,269
    Time Base Upper 40                     mttbu40 Rx     mtspr 286,Rx       -              -
    Processor Version Register             -              -                  mfpvr Rx       mfspr Rx,287
    MMCRA                                  mtmmcra Rx     mtspr 786,Rx       mfmmcra Rx     mfspr Rx,770
    PMC1                                   mtpmc1 Rx      mtspr 787,Rx       mfpmc1 Rx      mfspr Rx,771
    PMC2                                   mtpmc2 Rx      mtspr 788,Rx       mfpmc2 Rx      mfspr Rx,772
    PMC3                                   mtpmc3 Rx      mtspr 789,Rx       mfpmc3 Rx      mfspr Rx,773
    PMC4                                   mtpmc4 Rx      mtspr 790,Rx       mfpmc4 Rx      mfspr Rx,774
    PMC5                                   mtpmc5 Rx      mtspr 791,Rx       mfpmc5 Rx      mfspr Rx,775
    PMC6                                   mtpmc6 Rx      mtspr 792,Rx       mfpmc6 Rx      mfspr Rx,776
    MMCR0                                  mtmmcr0 Rx     mtspr 795,Rx       mfmmcr0 Rx     mfspr Rx,779
    MMCR1                                  mtmmcr1 Rx     mtspr 798,Rx       mfmmcr1 Rx     mfspr Rx,782
    PPR                                    mtppr Rx       mtspr 896,Rx       mfppr Rx       mfspr Rx,896
    Processor Identification Register      -              -                  mfpir Rx       mfspr Rx,1023

(1) The mftb instruction is Category: Server.Phased-Out.
Assemblers targeting version 2.03 or later of the architecture should generate an mfspr instruction for the mftb and mftbu extended mnemonics; see the corresponding Assembler Note in the mftb instruction description (see Section 4.2.1 of Book II).


Appendix B. Example Performance Monitor

Note
    This Appendix describes an example implementation of a Performance Monitor. A subset of these requirements are being considered for inclusion in the Architecture as part of Category: Server.Performance Monitor.

A Performance Monitor facility provides a means of collecting information about program and system performance.

The resources (e.g., SPR numbers) that a Performance Monitor facility may use are identified elsewhere in this Book. All other aspects of any Performance Monitor facility are implementation-dependent.

This appendix provides an example of a Performance Monitor facility. It is only an example; implementations may provide all, some, or none of the features described here, or may provide features that are similar to those described here but differ in detail.

Programming Note
    Because the features provided by a Performance Monitor facility are implementation-dependent, operating systems should provide services that support the useful performance monitoring functions in a generic fashion. Application programs should use these services, and should not depend on the features provided by a particular implementation.

The example Performance Monitor facility consists of the following features (described in detail in subsequent sections).

* one MSR bit
    - PMM (Performance Monitor Mark), which can be used to select one or more programs for monitoring
* SPRs
    - PMC1 - PMC6 (Performance Monitor Counter registers 1 - 6), which count events
    - MMCR0, MMCR1, and MMCRA (Monitor Mode Control Registers 0, 1, and A), which control the Performance Monitor facility
    - SIAR and SDAR (Sampled Instruction Address Register and Sampled Data Address Register), which contain the address of the "sampled instruction" and of the "sampled data"
* the Performance Monitor interrupt, which can be caused by monitored conditions and events

The minimal subset of the features that makes the resulting Performance Monitor useful to software consists of MSR[PMM], PMC1, PMC2, PMC3, PMC4, MMCR0, MMCR1, and MMCRA and certain bits and fields of these three Monitor Mode Control Registers, and the Performance Monitor Interrupt. These features are identified as the "basic" features below. The remaining features (the remaining SPRs, and the remaining bits and fields in the three Monitor Mode Control Registers) are considered "extensions".

The events that can be counted in the PMCs as well as the code that identifies each event are implementation-dependent. The events and codes may vary between PMCs, as well as between implementations. For the programmable PMCs, the event to be counted is selected by specifying the appropriate code in the MMCR "Selector" field for the PMC. Some events may include operations that are performed out-of-order.

Many aspects of the operation of the Performance Monitor are summarized by the following hierarchy, which is described starting at the lowest level.

* A "counter negative condition" exists when the value in a PMC is negative (i.e., when bit 0 of the PMC is 1). A "Time Base transition event" occurs when a selected bit of the Time Base changes from 0 to 1 (the bit is selected by an MMCR field). The term "condition or event" is used as an abbreviation for "counter negative condition or Time Base transition event". A condition or event can be caused implicitly by the processor (e.g., incrementing a PMC) or explicitly by software (mtspr).

* A condition or event is enabled if the corresponding "Enable" bit in an MMCR is 1. The occurrence of an enabled condition or event can have side effects within the Performance Monitor, such as causing the PMCs to cease counting.

* An enabled condition or event causes a Performance Monitor alert if Performance Monitor alerts are enabled by the corresponding "Enable" bit in an MMCR. A single Performance Monitor alert may reflect multiple enabled conditions and events.

* A Performance Monitor alert causes a Performance Monitor exception.

    The exception effects of the Performance Monitor are said to be consistent with the contents of MMCR0[PMAO] if one of the following statements is true. (MMCR0[PMAO] reflects the occurrence of Performance Monitor alerts; see the definition of that bit in Section B.2.2.)

    - MMCR0[PMAO]=0 and a Performance Monitor exception does not exist.
    - MMCR0[PMAO]=1 and a Performance Monitor exception exists.

    A context synchronizing instruction or event that occurs when MMCR0[PMAO]=0 ensures that the exception effects of the Performance Monitor are consistent with the contents of MMCR0[PMAO].

    Even without software synchronization, when the contents of MMCR0[PMAO] change, the exception effects of the Performance Monitor become consistent with the new contents of MMCR0[PMAO] sufficiently soon that the Performance Monitor facility is useful to software for its intended purposes.

* A Performance Monitor exception causes a Performance Monitor interrupt when MSR[EE]=1.

Programming Note
    The Performance Monitor can be effectively disabled (i.e., put into a state in which Performance Monitor SPRs are not altered and Performance Monitor interrupts do not occur) by setting MMCR0 to 0x0000_0000_8000_0000.

B.1 PMM Bit of the Machine State Register

The Performance Monitor uses MSR bit PMM, which is defined as follows.

    Bit   Description
    61    Performance Monitor Mark (PMM)
          This bit is a basic feature.
          This bit contains the Performance Monitor "mark" (0 or 1).

Programming Note
    Software can use this bit as a process-specific marker which, in conjunction with MMCR0[FCM0 FCM1] (see Section B.2.2), permits events to be counted on a process-specific basis. (The bit is saved by interrupts and restored by rfid.)

    Common uses of the PMM bit include the following.

    * Count events for a few selected processes. This use requires the following bit settings.
        - MSR[PMM]=1 for the selected processes, MSR[PMM]=0 for all other processes
        - MMCR0[FCM0]=1
        - MMCR0[FCM1]=0
    * Count events for all but a few selected processes. This use requires the following bit settings.
        - MSR[PMM]=1 for the selected processes, MSR[PMM]=0 for all other processes
        - MMCR0[FCM0]=0
        - MMCR0[FCM1]=1

    Notice that for both of these uses a mark value of 1 identifies the "few" processes and a mark value of 0 identifies the remaining "many" processes. Because the PMM bit is set to 0 when an interrupt occurs (see Figure 37 on page 466), interrupt handlers are treated as one of the "many". If it is desired to treat interrupt handlers as one of the "few", the mark value convention just described would be reversed.

B.2 Special Purpose Registers

The Performance Monitor SPRs count events, control the operation of the Performance Monitor, and provide associated information.

The Performance Monitor SPRs can be read and written using the mfspr and mtspr instructions (see Section 4.4.3, "Move To/From System Register Instructions" on page 411). The Performance Monitor SPR numbers are shown in Figure 46. Writing any of the Performance Monitor SPRs is privileged. Reading any of the Performance Monitor SPRs is not privileged (however, the privileged SPR numbers used to write the SPRs can also be used to read them; see the figure).

The elapsed time between the execution of an instruction and the time at which events due to that instruction have been reflected in Performance Monitor SPRs is not defined. No means are provided by which software can ensure that all events due to preceding instructions have been reflected in Performance Monitor SPRs. Similarly, if the events being monitored may be caused by operations that are performed out-of-order, no means are provided by which software can prevent such events due to subsequent instructions from being reflected in Performance Monitor SPRs. Thus the contents obtained by reading a Performance Monitor SPR may not be precise: it may fail to reflect some events due to instructions that precede the mfspr and may reflect some events due to instructions that follow the mfspr. This lack of precision applies regardless of whether the state of the processor is such that the SPR is subject to change by the processor at the time the mfspr is executed. Similarly, if an mtspr instruction is executed that changes the contents of the Time Base, the change is not guaranteed to have taken effect with respect to causing Time Base transition events until after a subsequent context synchronizing instruction has been executed.

If an mtspr instruction is executed that changes the value of a Performance Monitor SPR other than SIAR or SDAR, the change is not guaranteed to have taken effect until after a subsequent context synchronizing instruction has been executed (see Chapter 10. "Synchronization Requirements for Context Alterations" on page 489).

Programming Note
    Depending on the events being monitored, the contents of Performance Monitor SPRs may be affected by aspects of the runtime environment (e.g., cache contents) that are not directly attributable to the programs being monitored.

                  SPR (1,2)          Register
    decimal       spr5:9   spr0:4    Name       Privileged
    770,786       11000    n0010     MMCRA      no,yes
    771,787       11000    n0011     PMC1       no,yes
    772,788       11000    n0100     PMC2       no,yes
    773,789       11000    n0101     PMC3       no,yes
    774,790       11000    n0110     PMC4       no,yes
    775,791       11000    n0111     PMC5       no,yes
    776,792       11000    n1000     PMC6       no,yes
    779,795       11000    n1011     MMCR0      no,yes
    780,796       11000    n1100     SIAR       no,yes
    781,797       11000    n1101     SDAR       no,yes
    782,798       11000    n1110     MMCR1      no,yes

    (1) Note that the order of the two 5-bit halves of the SPR number is reversed.
    (2) For mtspr, n must be 1. For mfspr, reading the SPR is privileged if and only if n=1.

Figure 46. Performance Monitor SPR encodings for mtspr and mfspr

B.2.1 Performance Monitor Counter Registers

The six Performance Monitor Counter registers, PMC1 through PMC6, are 32-bit registers that count events.

    +----------------------------------+
    |               PMC1               |
    |               PMC2               |
    |               PMC3               |
    |               PMC4               |
    |               PMC5               |
    |               PMC6               |
    +----------------------------------+
     32                              63

Figure 47. Performance Monitor Counter registers

PMC1, PMC2, PMC3, and PMC4 are basic features.

PMC5 and PMC6 are not programmable. PMC5 counts instructions completed and PMC6 counts cycles.

Normally each PMC is incremented each processor cycle by the number of times the corresponding event occurred in that cycle. Other modes of incrementing may also be provided (e.g., see the description of MMCR1 bits PMC1HIST and PMCjHIST).

"PMCj" is used as an abbreviation for "PMCi, i > 1".

Programming Note
    PMC5 and PMC6 are defined to facilitate calculating basic performance metrics such as cycles per instruction (CPI).

Programming Note
    Software can use a PMC to "pace" the collection of Performance Monitor data. For example, if it is desired to collect event counts every n cycles, software can specify that a particular PMC count cycles and set that PMC to 0x8000_0000 - n. The events of interest would be counted in other PMCs. The counter negative condition that will occur after n cycles can, with the appropriate setting of MMCR bits, cause counter values to become frozen, cause a Performance Monitor interrupt to occur, etc.
B.2.2 Monitor Mode Control Register 0

Monitor Mode Control Register 0 (MMCR0) is a 64-bit register. This register, along with MMCR1 and MMCRA, controls the operation of the Performance Monitor.

    +----------------------------------+
    |              MMCR0               |
    +----------------------------------+
     0                               63

Figure 48. Monitor Mode Control Register 0

MMCR0 is a basic feature. Within MMCR0, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below.

Some bits of MMCR0 are altered by the processor when various events occur, as described below.

The bit definitions of MMCR0 are as follows. MMCR0 bits that are not implemented are treated as reserved.

Bit(s)  Description

0:31    Reserved

32      Freeze Counters (FC)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented.
        The processor sets this bit to 1 when an enabled condition or event occurs and MMCR0[FCECE]=1.

33      Freeze Counters in Privileged State (FCS)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[HV PR]=0b00.

34      Freeze Counters in Problem State (FCP)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[PR]=1.

35      Freeze Counters while Mark = 1 (FCM1)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[PMM]=1.

36      Freeze Counters while Mark = 0 (FCM0)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[PMM]=0.

37      Performance Monitor Alert Enable (PMAE)
        This bit is a basic feature.
        0  Performance Monitor alerts are disabled.
        1  Performance Monitor alerts are enabled until a Performance Monitor alert occurs, at which time:
           * MMCR0[PMAE] is set to 0
           * MMCR0[PMAO] is set to 1

        Programming Note
            Software can set this bit and MMCR0[PMAO] to 0 to prevent Performance Monitor interrupts.

            Software can set this bit to 1 and then poll the bit to determine whether an enabled condition or event has occurred. This is especially useful for software that runs with MSR[EE]=0.

            In earlier versions of the architecture that lacked the concept of Performance Monitor alerts, this bit was called Performance Monitor Exception Enable (PMXE).

38      Freeze Counters on Enabled Condition or Event (FCECE)
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are incremented (if permitted by other MMCR bits) until an enabled condition or event occurs when MMCR0[TRIGGER]=0, at which time:
           * MMCR0[FC] is set to 1
        If the enabled condition or event occurs when MMCR0[TRIGGER]=1, the FCECE bit is treated as if it were 0.

39:40   Time Base Selector (TBSEL)
        This field selects the Time Base bit that can cause a Time Base transition event (the event occurs when the selected bit changes from 0 to 1).
        00  Time Base bit 63 is selected.
        01  Time Base bit 55 is selected.
        10  Time Base bit 51 is selected.
        11  Time Base bit 47 is selected.

        Programming Note
            Time Base transition events can be used to collect information about processor activity, as revealed by event counts in PMCs and by addresses in SIAR and SDAR, at periodic intervals.

            In multiprocessor systems in which the Time Base registers are synchronized among the processors, Time Base transition events can be used to correlate the Performance Monitor data obtained by the several processors. For this use, software must specify the same TBSEL value for all the processors in the system.

            Because the frequency of the Time Base is implementation-dependent, software should invoke a system service program to obtain the frequency before choosing a value for TBSEL.

41      Time Base Event Enable (TBEE)
        0  Time Base transition events are disabled.
        1  Time Base transition events are enabled.

42:47   Reserved

48      PMC1 Condition Enable (PMC1CE)
        This bit controls whether counter negative conditions due to a negative value in PMC1 are enabled.
        0  Counter negative conditions for PMC1 are disabled.
        1  Counter negative conditions for PMC1 are enabled.

49      PMCj Condition Enable (PMCjCE)
        This bit controls whether counter negative conditions due to a negative value in any PMCj (i.e., in any PMC except PMC1) are enabled.
        0  Counter negative conditions for all PMCjs are disabled.
        1  Counter negative conditions for all PMCjs are enabled.

50      Trigger (TRIGGER)
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  PMC1 is incremented (if permitted by other MMCR bits). The PMCjs are not incremented until PMC1 is negative or an enabled condition or event occurs, at which time:
           * the PMCjs resume incrementing (if permitted by other MMCR bits)
           * MMCR0[TRIGGER] is set to 0
        See the description of the FCECE bit, above, regarding the interaction between TRIGGER and FCECE.

        Programming Note
            Uses of TRIGGER include the following.

            * Resume counting in the PMCjs when PMC1 becomes negative, without causing a Performance Monitor interrupt. Then freeze all PMCs (and optionally cause a Performance Monitor interrupt) when a PMCj becomes negative. The PMCjs then reflect the events that occurred between the time PMC1 became negative and the time a PMCj becomes negative. This use requires the following MMCR0 bit settings.
                - TRIGGER=1
                - PMC1CE=0
                - PMCjCE=1
                - TBEE=0
                - FCECE=1
                - PMAE=1 (if a Performance Monitor interrupt is desired)
            * Resume counting in the PMCjs when PMC1 becomes negative, and cause a Performance Monitor interrupt without freezing any PMCs. The PMCjs then reflect the events that occurred between the time PMC1 became negative and the time the interrupt handler reads them. This use requires the following MMCR0 bit settings.
                - TRIGGER=1
                - PMC1CE=1
                - TBEE=0
                - FCECE=0
                - PMAE=1

51:52   Setting is implementation-dependent.

53:55   Reserved

56      Performance Monitor Alert Occurred (PMAO)
        This bit is a basic feature.
        0  A Performance Monitor alert has not occurred since the last time software set this bit to 0.
        1  A Performance Monitor alert has occurred since the last time software set this bit to 0.
        This bit is set to 1 by the processor when a Performance Monitor alert occurs. This bit can be set to 0 only by the mtspr instruction.

        Programming Note
            Software can set this bit to 1 to simulate the occurrence of a Performance Monitor alert.

            Software should set this bit to 0 after handling the Performance Monitor alert.

57      Setting is implementation-dependent.

58      Freeze Counters 1-4 (FC1-4)
        0  PMC1 - PMC4 are incremented (if permitted by other MMCR bits).
        1  PMC1 - PMC4 are not incremented.

59      Freeze Counters 5-6 (FC5-6)
        0  PMC5 - PMC6 are incremented (if permitted by other MMCR bits).
        1  PMC5 - PMC6 are not incremented.

60:61   Reserved

62      Freeze Counters in Wait State (FCWAIT)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if CTRL[31]=0. Software is expected to set CTRL[31]=0 when it is in a "wait state", i.e., when there is no process ready to run.
        Only Branch Unit type of events do not increment if CTRL[31]=0. Other units continue to count.

63      Freeze Counters in Hypervisor State (FCH)
        This bit is a basic feature.
        0  The PMCs are incremented (if permitted by other MMCR bits).
        1  The PMCs are not incremented if MSR[HV PR]=0b10.

B.2.3 Monitor Mode Control Register 1

Monitor Mode Control Register 1 (MMCR1) is a 64-bit register. This register, along with MMCR0 and MMCRA, controls the operation of the Performance Monitor.

    +----------------------------------+
    |              MMCR1               |
    +----------------------------------+
     0                               63

Figure 49. Monitor Mode Control Register 1

MMCR1 is a basic feature. Within MMCR1, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below.

Some bits of MMCR1 are altered by the processor when various events occur, as described below.

The bit definitions of MMCR1 are as follows. MMCR1 bits that are not implemented are treated as reserved.

Bit(s)  Description

0:31    Implementation-Dependent Use
        These bits have implementation-dependent uses (e.g., extended event selection).

32:39   PMC1 Selector (PMC1SEL)
40:47   PMC2 Selector (PMC2SEL)
48:55   PMC3 Selector (PMC3SEL)
56:63   PMC4 Selector (PMC4SEL)
        Each of these fields contains a code that identifies the event to be counted by PMCs 1 through 4 respectively.
        PMC Selectors are basic features.

        Compatibility Note
            In versions of the architecture that precede Version 2.02 the PMC Selector Fields were six bits long, and were split between MMCR0 and MMCR1. PMC1-8 were all programmable.

            If more programmable PMCs are implemented in the future, additional MMCRs may be defined to cover the additional selectors.

B.2.4 Monitor Mode Control Register A

Monitor Mode Control Register A (MMCRA) is a 64-bit register. This register, along with MMCR0 and MMCR1, controls the operation of the Performance Monitor.

    +----------------------------------+
    |              MMCRA               |
    +----------------------------------+
     0                               63

Figure 50. Monitor Mode Control Register A

MMCRA is a basic feature. Within MMCRA, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below.

Some bits of MMCRA are altered by the processor when various events occur, as described below.

The bit definitions of MMCRA are as follows. MMCRA bits that are not implemented are treated as reserved.

Bit(s)  Description

0:31    Reserved

32      Contents of SIAR and SDAR Are Related (CSSR)
        Set to 1 by the processor if the contents of SIAR and SDAR are associated with the same instruction; otherwise set to 0.

that the Performance Monitor alert occurred. This instruction is called the "sampled instruction".

The contents of SIAR may be altered by the processor if and only if MMCR0[PMAE]=1. Thus after the Perfor-
mance Monitor alert occurs, the contents of SIAR are 33:34 Setting is implementation-dependent. not altered by the processor until software sets MMCR0PMAE to 1. After software sets MMCR0PMAE to 35 Sampled MSRHV (SAMPHV) 1, the contents of SIAR are undefined until the next Value of MSRHV when the Performance Moni- Performance Monitor alert occurs. tor Alert occurred. See Section B.4 regarding the effects of the Trace facil- 36 Sampled MSRPR (SAMPPR) ity on SIAR. Value of MSRPR when the Performance Moni- Programming Note tor Alert occurred. If the Performance Monitor alert causes a Perfor- 37:47 Setting is implementation-dependent. mance Monitor interrupt, the value of MSRHV PR 48:53 Threshold (THRESHOLD) that was in effect when the sampled instruction was being executed is reported in MMCRA. This field contains a "threshold value", which is a value such that only events that exceed the value are counted. The events to which a B.2.6 Sampled Data Address Reg- threshold value can apply are implementation- dependent, as are the dimension of the ister threshold (e.g., duration in cycles) and the The Sampled Data Address Register (SDAR) is a 64-bit granularity with which the threshold value is register. It contains the address of the "sampled data" interpreted. when a Performance Monitor alert occurs. Programming Note SDAR By varying the threshold value, software 0 63 can obtain a profile of the characteristics of the events subject to the threshold. For Figure 52. Sampled Data Address Register example, if PMC1 counts the number of When a Performance Monitor alert occurs, SDAR is set cache misses for which the duration to the effective address of the storage operand of an exceeds the threshold value, then soft- instruction that was being executed, possibly out-of- ware can obtain the distribution of cache order, at or around the time that the Performance Moni- miss durations for a given program by tor alert occurred. 
This storage operand is called the monitoring the program repeatedly using "sampled data". The sampled data may be, but need a different threshold value each time. not be, the storage operand (if any) of the sampled instruction (see Section B.2.5). 54:59 Reserved for implementation-specific use. 60:62 Reserved The contents of SDAR may be altered by the processor if and only if MMCR0PMAE=1. Thus after the Perfor- 63 Setting is implementation-dependent. mance Monitor alert occurs, the contents of SDAR are not altered by the processor until software sets MMCR0PMAE to 1. After software sets MMCR0PMAE to B.2.5 Sampled Instruction 1, the contents of SDAR are undefined until the next Address Register Performance Monitor alert occurs. The Sampled Instruction Address Register (SIAR) is a See Section B.4 regarding the effects of the Trace facil- 64-bit register. It contains the address of the "sampled ity on SDAR. instruction" when a Performance Monitor alert occurs. Programming Note SIAR If the Performance Monitor alert causes a Perfor- 0 63 mance Monitor interrupt, MMCRA indicates whether the sampled data is the storage operand of Figure 51. Sampled Instruction Address Register the sampled instruction. When a Performance Monitor alert occurs, SIAR is set to the effective address of an instruction that was being executed, possibly out-of-order, at or around the time Appendix B. Example Performance Monitor 501 Version 2.04 B.3 Performance Monitor B.4 Interaction with the Trace Interrupt Facility The Performance Monitor interrupt is a system caused If the Trace facility includes setting SIAR and SDAR interrupt (Section 6.4). It is masked by MSREE in the (see Appendix C, "Example Trace Extensions" on same manner that External and Decrementer interrupts page 503), and tracing is active (MSRSE=1 or are. MSRBE=1), the contents of SIAR and SDAR as used by the Performance Monitor facility are undefined and may The Performance Monitor interrupt is a basic feature. 
change even when MMCR0PMAE=0. A Performance Monitor interrupt occurs when no higher priority exception exists, a Performance Monitor excep- Programming Note tion exists, and MSREE=1. A potential combined use of the Trace and Perfor- If multiple Performance Monitor exceptions occur mance Monitor facilities is to trace the control flow before the first causes a Performance Monitor interrupt, of a program and simultaneously count events for the interrupt reflects the most recent Performance Mon- that program. itor exception and the preceding Performance Monitor exceptions are lost. The following registers are set: SRR0 Set to the effective address of the instruc- tion that the processor would have attempted to execute next if no interrupt conditions were present. SRR1 33:36 and 42:47 Implementation-specific. Others Loaded from the MSR. MSR See Figure 37 on page 466. SIAR Set to the effective address of the "sampled instruction" (see Section B.2.5). SDAR Set to the effective address of the "sampled data" (see Section B.2.6). Execution resumes at effective address 0x0000_0000_0000_0F00. In general, statements about External and Decre- menter interrupts elsewhere in this Book apply also to the Performance Monitor interrupt; for example, if a Performance Monitor exception exists when an mtm- srd[d] instruction is executed that changes MSREE from 0 to 1, the Performance Monitor interrupt will occur before the next instruction is executed (if no higher pri- ority exception exists). The priority of the Performance Monitor exception is equal to that of the External, Decrementer, and Hyper- visor Decrementer exceptions (i.e., the processor may generate any one of the four interrupts for which an exception exists) (see Section 6.7.2, "Ordered Excep- tions" on page 478 and Section 6.8, "Interrupt Priori- ties" on page 479). 502 Power ISATM -- Book III-S Version 2.04 Appendix C. 
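The MMCR0 bit positions and the two TRIGGER use cases described earlier in this appendix can be turned into register values with a few lines of arithmetic. The following is a minimal sketch, not part of the architecture: every function and constant name is hypothetical, MMCR0 is assumed to be a 64-bit register whose bit 63 is the least significant bit, FCECE is taken to be bit 38 (the only bit between PMAE and TBSEL), and the Time Base is assumed to be a 64-bit counter that advances by one per tick.

```python
# Hypothetical helper names; only the bit numbers and settings come from the text.

def ppc_bit(n, width=64):
    """Mask for bit n in Power ISA numbering (bit 0 is the most significant bit)."""
    return 1 << (width - 1 - n)

MMCR0_PMAE = ppc_bit(37)
MMCR0_FCECE = ppc_bit(38)      # assumed position: the bit between PMAE and TBSEL
MMCR0_TBEE = ppc_bit(41)
MMCR0_PMC1CE = ppc_bit(48)
MMCR0_PMCJCE = ppc_bit(49)
MMCR0_TRIGGER = ppc_bit(50)
TBSEL_SHIFT = 63 - 40          # TBSEL occupies bits 39:40

# TBSEL encoding -> selected Time Base bit, as listed for bits 39:40.
TBSEL_TO_TB_BIT = {0b00: 63, 0b01: 55, 0b10: 51, 0b11: 47}

def tbsel_of(mmcr0):
    """Extract the 2-bit TBSEL field from an MMCR0 value."""
    return (mmcr0 >> TBSEL_SHIFT) & 0b11

def ticks_between_events(tbsel):
    """Time Base ticks between successive 0-to-1 transitions of the selected bit."""
    return 1 << (64 - TBSEL_TO_TB_BIT[tbsel])

def trigger_window_settings(want_interrupt):
    """First use case: TRIGGER=1, PMC1CE=0, PMCjCE=1, TBEE=0, FCECE=1, PMAE optional."""
    value = MMCR0_TRIGGER | MMCR0_PMCJCE | MMCR0_FCECE
    if want_interrupt:
        value |= MMCR0_PMAE
    return value

def trigger_sample_settings():
    """Second use case: TRIGGER=1, PMC1CE=1, TBEE=0, FCECE=0, PMAE=1."""
    return MMCR0_TRIGGER | MMCR0_PMC1CE | MMCR0_PMAE

if __name__ == "__main__":
    print(hex(trigger_window_settings(want_interrupt=True)))
    print(hex(trigger_sample_settings()))
    for sel in sorted(TBSEL_TO_TB_BIT):
        print(sel, TBSEL_TO_TB_BIT[sel], ticks_between_events(sel))
```

These values cover only the bits named in the two use cases; a real configuration would also have to set the freeze bits and the PMC selectors, and would obtain the Time Base frequency from a system service before choosing TBSEL.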
Appendix C. Example Trace Extensions

        Note
        This Appendix describes an example implementation of Trace Extensions. A subset of these requirements is being considered for inclusion in the Architecture as part of Category: Trace.

This appendix provides an example of extensions that may be added to the Trace facility described in Section 6.5.14, "Trace Interrupt [Category: Trace]" on page 473. It is only an example; implementations may provide all, some, or none of the features described here, or may provide features that are similar to those described here but differ in detail.

The extensions consist of the following features (described in detail below).
• use of MSR[SE BE]=0b11 to specify new causes of Trace interrupts
• specification of how certain SRR1 bits are set when a Trace interrupt occurs
• setting of SIAR and SDAR (see Appendix B, "Example Performance Monitor" on page 495) when a Trace interrupt occurs

MSR[SE BE] = 0b11

If MSR[SE BE]=0b11, the processor generates a Trace exception under the conditions described in Section 6.5.14 for MSR[SE BE]=0b01, and also after successfully completing the execution of any instruction that would cause at least one of SRR1 bits 33:36, 42, and 44:46 to be set to 1 (see below) if the instruction were executed when MSR[SE BE]=0b10.

This overrides the implicit statement in Section 6.5.14 that the effects of MSR[SE BE]=0b11 are the same as those of MSR[SE BE]=0b10.

SRR1

When a Trace interrupt occurs, the SRR1 bits that are not loaded from the MSR are set as follows instead of as described in Section 6.5.14.

33      Set to 1 if the traced instruction is icbi; otherwise set to 0.
34      Set to 1 if the traced instruction is dcbt, dcbtst, dcbz, dcbst, dcbf[l]; otherwise set to 0.
35      Set to 1 if the traced instruction is a Load instruction or eciwx; may be set to 1 if the traced instruction is icbi, dcbt, dcbtst, dcbst, dcbf[l]; otherwise set to 0.
36      Set to 1 if the traced instruction is a Store instruction, dcbz, or ecowx; otherwise set to 0.
42      Set to 1 if the traced instruction is lswx or stswx; otherwise set to 0.
43      Implementation-dependent.
44      Set to 1 if the traced instruction is a Branch instruction and the branch is taken; otherwise set to 0.
45      Set to 1 if the traced instruction is eciwx or ecowx; otherwise set to 0.
46      Set to 1 if the traced instruction is lwarx, ldarx, stwcx., or stdcx.; otherwise set to 0.
47      Implementation-dependent.

SIAR and SDAR

If the Performance Monitor facility is implemented and includes SIAR and SDAR (see Appendix B), the following additional registers are set when a Trace interrupt occurs:

SIAR    Set to the effective address of the traced instruction.
SDAR    Set to the effective address of the storage operand (if any) of the traced instruction; otherwise undefined.

If the state of the Performance Monitor is such that the Performance Monitor may be altering these registers (i.e., if MMCR0[PMAE]=1), the contents of SIAR and SDAR as used by the Trace facility are undefined and may change even when no Trace interrupt occurs.

Appendix D. Interpretation of the DSISR as Set by an Alignment Interrupt

For most causes of Alignment interrupt, the interrupt handler will emulate the interrupting instruction. To do this, it needs the following characteristics of the interrupting instruction:

        Load or store
        Length (halfword, word, doubleword)
        String, multiple, or elementary
        Fixed-point or floating-point
        Update or non-update
        Byte reverse or not
        Is it dcbz?

The Power ISA optionally provides this information by setting bits in the DSISR that identify the interrupting instruction type. It is not necessary for the interrupt handler to load the interrupting instruction from storage. The mapping is unique except for a few exceptions that are discussed below. The near-uniqueness depends on the fact that many instructions, such as the fixed- and floating-point arithmetic instructions and the one-byte loads and stores, cannot cause an Alignment interrupt.

See Section 6.5.8 for a description of how the opcode and extended opcode are mapped to a DSISR value for an X-, D-, or DS-form instruction that causes an Alignment interrupt.

The table that follows shows the inverse mapping: how the DSISR bits identify the interrupting instruction. The following notes are cited in the table.

1. The instructions lwz and lwarx give the same DSISR bits (all zero). But if lwarx causes an Alignment interrupt, it should not be emulated. It is adequate for the Alignment interrupt handler simply to treat the instruction as if it were lwz. The emulator must use the address in the DAR, rather than compute it from RA/RB/D, because lwz and lwarx have different instruction formats.
   If opcode 0 ("Illegal or Reserved") can cause an Alignment interrupt, it will be indistinguishable to the interrupt handler from lwarx and lwz.

2. These are distinguished by DSISR bits 44:45, which are not shown in the table.

The interrupt handler has no need to distinguish between an X-form instruction and the corresponding D- or DS-form instruction if one exists, and vice versa. Therefore two such instructions may yield the same DSISR value (all 32 bits). For example, stw and stwx may both yield either the DSISR value shown in the following table for stw, or that shown for stwx.
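The opcode patterns in this appendix's table suggest how the 7-bit value in DSISR bits 47:53 relates to the instruction encoding. The sketch below is inferred from those patterns only; the authoritative definition is in Section 6.5.8. The function names and the small lookup dictionary are hypothetical, the DSISR is assumed to be numbered so that bit 63 is the least significant bit, and the numeric opcodes in the demo (for example 36 for stw and 151 for stwx) are the usual Power ISA encodings rather than values given in this appendix.

```python
# Hypothetical helpers; the mapping is read off the table's opcode patterns.

def dsisr_bits_47_53(dsisr):
    """Extract DSISR bits 47:53, assuming bit 63 is the least significant bit."""
    return (dsisr >> (63 - 53)) & 0x7F

def field_from_x_form(xo):
    """Table index for a 10-bit X-form extended opcode written 'abcd e xxx fg'."""
    first_four = (xo >> 6) & 0xF   # 'abcd' -> DSISR bits 50:53
    fifth = (xo >> 5) & 0x1        # 'e'    -> DSISR bit 49
    last_two = xo & 0x3            # 'fg'   -> DSISR bits 47:48
    return (last_two << 5) | (fifth << 4) | first_four

def field_from_d_form(opcode):
    """Table index for a 6-bit D/DS-form primary opcode written 'x abcd e'."""
    middle_four = (opcode >> 1) & 0xF  # 'abcd' -> DSISR bits 50:53
    last = opcode & 0x1                # 'e'    -> DSISR bit 49
    return (last << 4) | middle_four   # DSISR bits 47:48 are 00

# A few rows copied from the table, keyed by the 7-bit value.
ROWS = {
    0b0000010: "stw",
    0b0010000: "lwzu",
    0b1100000: "lwzx",
    0b1100010: "stwx",
    0b1011111: "dcbz",
}

if __name__ == "__main__":
    for name, index in (("stw", field_from_d_form(36)),
                        ("stwx", field_from_x_form(151)),
                        ("dcbz", field_from_x_form(1014))):
        print(name, format(index, "07b"), ROWS[index])
```

Because stw and stwx may yield either row, a handler written along these lines would treat both rows as the same operation and take the address from the DAR rather than recomputing it from the instruction fields.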
If DSISR    then it is either   or D/DS-form   so the
47:53 is:   X-form opcode:      opcode:        instruction is:

00 0 0000   00000xxx00          x00000         lwarx, lwz, reserved (1)
00 0 0001   00010xxx00          x00010         ldarx
00 0 0010   00100xxx00          x00100         stw
00 0 0011   00110xxx00          x00110         -
00 0 0100   01000xxx00          x01000         lhz
00 0 0101   01010xxx00          x01010         lha
00 0 0110   01100xxx00          x01100         sth
00 0 0111   01110xxx00          x01110         lmw
00 0 1000   10000xxx00          x10000         lfs
00 0 1001   10010xxx00          x10010         lfd
00 0 1010   10100xxx00          x10100         stfs
00 0 1011   10110xxx00          x10110         stfd
00 0 1100   11000xxx00          x11000         -
00 0 1101   11010xxx00          x11010         ld, ldu, lwa (2)
00 0 1110   11100xxx00          x11100         -
00 0 1111   11110xxx00          x11110         std, stdu (2)
00 1 0000   00001xxx00          x00001         lwzu
00 1 0001   00011xxx00          x00011         -
00 1 0010   00101xxx00          x00101         stwu
00 1 0011   00111xxx00          x00111         -
00 1 0100   01001xxx00          x01001         lhzu
00 1 0101   01011xxx00          x01011         lhau
00 1 0110   01101xxx00          x01101         sthu
00 1 0111   01111xxx00          x01111         stmw
00 1 1000   10001xxx00          x10001         lfsu
00 1 1001   10011xxx00          x10011         lfdu
00 1 1010   10101xxx00          x10101         stfsu
00 1 1011   10111xxx00          x10111         stfdu
00 1 1100   11001xxx00          x11001         -
00 1 1101   11011xxx00          x11011         -
00 1 1110   11101xxx00          x11101         -
00 1 1111   11111xxx00          x11111         -
01 0 0000   00000xxx01                         ldx
01 0 0001   00010xxx01                         -
01 0 0010   00100xxx01                         stdx
01 0 0011   00110xxx01                         -
01 0 0100   01000xxx01                         -
01 0 0101   01010xxx01                         lwax
01 0 0110   01100xxx01                         -
01 0 0111   01110xxx01                         -
01 0 1000   10000xxx01                         lswx
01 0 1001   10010xxx01                         lswi
01 0 1010   10100xxx01                         stswx
01 0 1011   10110xxx01                         stswi
01 0 1100   11000xxx01                         -
01 0 1101   11010xxx01                         -
01 0 1110   11100xxx01                         -
01 0 1111   11110xxx01                         -
01 1 0000   00001xxx01                         ldux
01 1 0001   00011xxx01                         -
01 1 0010   00101xxx01                         stdux
01 1 0011   00111xxx01                         -
01 1 0100   01001xxx01                         -
01 1 0101   01011xxx01                         lwaux
01 1 0110   01101xxx01                         -
01 1 0111   01111xxx01                         -
01 1 1000   10001xxx01                         -
01 1 1001   10011xxx01                         -
01 1 1010   10101xxx01                         -
01 1 1011   10111xxx01                         -
01 1 1100   11001xxx01                         -
01 1 1101   11011xxx01                         -
01 1 1110   11101xxx01                         -
01 1 1111   11111xxx01                         -
10 0 0000   00000xxx10                         -
10 0 0001   00010xxx10                         -
10 0 0010   00100xxx10                         stwcx.
10 0 0011   00110xxx10                         stdcx.
10 0 0100   01000xxx10                         -
10 0 0101   01010xxx10                         -
10 0 0110   01100xxx10                         -
10 0 0111   01110xxx10                         -
10 0 1000   10000xxx10                         lwbrx
10 0 1001   10010xxx10                         -
10 0 1010   10100xxx10                         stwbrx
10 0 1011   10110xxx10                         -
10 0 1100   11000xxx10                         lhbrx
10 0 1101   11010xxx10                         -
10 0 1110   11100xxx10                         sthbrx
10 0 1111   11110xxx10                         -
10 1 0000   00001xxx10                         -
10 1 0001   00011xxx10                         -
10 1 0010   00101xxx10                         -
10 1 0011   00111xxx10                         -
10 1 0100   01001xxx10                         eciwx
10 1 0101   01011xxx10                         -
10 1 0110   01101xxx10                         ecowx
10 1 0111   01111xxx10                         -
10 1 1000   10001xxx10                         -
10 1 1001   10011xxx10                         -
10 1 1010   10101xxx10                         -
10 1 1011   10111xxx10                         -
10 1 1100   11001xxx10                         -
10 1 1101   11011xxx10                         -
10 1 1110   11101xxx10                         -
10 1 1111   11111xxx10                         dcbz
11 0 0000   00000xxx11                         lwzx
11 0 0001   00010xxx11                         -
11 0 0010   00100xxx11                         stwx
11 0 0011   00110xxx11                         -
11 0 0100   01000xxx11                         lhzx
11 0 0101   01010xxx11                         lhax
11 0 0110   01100xxx11                         sthx
11 0 0111   01110xxx11                         -
11 0 1000   10000xxx11                         lfsx
11 0 1001   10010xxx11                         lfdx
11 0 1010   10100xxx11                         stfsx
11 0 1011   10110xxx11                         stfdx
11 0 1100   11000xxx11                         -
11 0 1101   11010xxx11                         -
11 0 1110   11100xxx11                         -
11 0 1111   11110xxx11                         stfiwx
11 1 0000   00001xxx11                         lwzux
11 1 0001   00011xxx11                         -
11 1 0010   00101xxx11                         stwux
11 1 0011   00111xxx11                         -
11 1 0100   01001xxx11                         lhzux
11 1 0101   01011xxx11                         lhaux
11 1 0110   01101xxx11                         sthux
11 1 0111   01111xxx11                         -
11 1 1000   10001xxx11                         lfsux
11 1 1001   10011xxx11                         lfdux
11 1 1010   10101xxx11                         stfsux
11 1 1011   10111xxx11                         stfdux
11 1 1100   11001xxx11                         -
11 1 1101   11011xxx11                         -
11 1 1110   11101xxx11                         -
11 1 1111   11111xxx11                         -

Book III-E: Power ISA Operating Environment Architecture - Embedded Environment

Chapter 1. Introduction
        1.1 Overview
        1.2 32-Bit Implementations
        1.3 Document Conventions
        1.3.1 Definitions and Notation
        1.3.2 Reserved Fields
        1.4 General Systems Overview
        1.5 Exceptions
        1.6 Synchronization
        1.6.1 Context Synchronization
        1.6.2 Execution Synchronization

1.1 Overview

Chapter 1 of Book I describes computation modes, document conventions, a general systems overview, instruction formats, and storage addressing. This chapter augments that description as necessary for the Power ISA Operating Environment Architecture.

1.2 32-Bit Implementations

Though the specifications in this document assume a 64-bit implementation, 32-bit implementations are permitted as described in Appendix C, "Guidelines for 64-bit Implementations in 32-bit Mode and 32-bit Implementations" on page 637.

1.3 Document Conventions

The notation and terminology used in Book I apply to this Book also, with the following substitutions.
• For "system alignment error handler" substitute "Alignment interrupt".
• For "system auxiliary processor enabled exception error handler" substitute "Auxiliary Processor Enabled Exception type Program interrupt".
• For "system data storage error handler" substitute "Data Storage interrupt" or "Data TLB Error interrupt", as appropriate.
• For "system error handler" substitute "interrupt".
• For "system floating-point enabled exception error handler" substitute "Floating-Point Enabled Exception type Program interrupt".
• For "system illegal instruction error handler" substitute "Illegal Instruction exception type Program interrupt" or "Unimplemented Operation exception type Program interrupt", as appropriate.
• For "system instruction storage error handler" substitute "Instruction Storage interrupt" or "Instruction TLB Error interrupt", as appropriate.
• For "system privileged instruction error handler" substitute "Privileged Instruction exception type Program interrupt".
• For "system service program" substitute "System Call interrupt".
• For "system trap handler" substitute "Trap type Program interrupt".

1.3.1 Definitions and Notation

The definitions and notation given in Book I are augmented by the following.

• real page
  A unit of real storage that is aligned at a boundary that is a multiple of its size. The real page size may range from 1KB to 1TB.

• context of a program
  The processor state (e.g., privilege and relocation) in which the program executes. The context is controlled by the contents of certain System Registers, such as the MSR, of certain lookaside buffers, such as the TLB, and of other resources.

• exception
  An error, unusual condition, or external signal, that may set a status bit and may or may not cause an interrupt, depending upon whether the corresponding interrupt is enabled.

• interrupt
  The act of changing the machine state in response to an exception, as described in Chapter 5. "Interrupts and Exceptions" on page 563.

• trap interrupt
  An interrupt that results from execution of a Trap instruction.

• Additional exceptions to the rule that the processor obeys the sequential execution model, beyond those described in the section entitled "Instruction Fetching" in Book I, are the following.
  - A System Reset or Machine Check interrupt may occur. The determination of whether an instruction is required by the sequential execution model is not affected by the potential occurrence of a System Reset or Machine Check interrupt. (The determination is affected by the potential occurrence of any other kind of interrupt.)
  - A context-altering instruction is executed (Chapter 10. "Synchronization Requirements for Context Alterations" on page 625). The context alteration need not take effect until the required subsequent synchronizing operation has occurred.

• hardware
  Any combination of hard-wired implementation, emulation assist, or interrupt for software assistance. In the last case, the interrupt may be to an architected location or to an implementation-dependent location. Any use of emulation assists or interrupts to implement the architecture is implementation-dependent.

• /, //, ///, ... denotes a field that is reserved in an instruction, in a register, or in an architected storage table.

• ?, ??, ???, ... denotes a field that is implementation-dependent in an instruction, in a register, or in an architected storage table.

1.3.2 Reserved Fields

Some fields of certain architected registers may be written to automatically by the processor, e.g., Reserved bits in System Registers. When the processor writes to such a register, the following rules are obeyed.
• Unless otherwise stated, no defined field other than the one(s) the processor is specifically updating is modified.
• Contents of reserved fields are either preserved by the processor or written as zero.

The reader should be aware that reading and writing of some of these registers (e.g., the MSR) can occur as a side effect of processing an interrupt and of returning from an interrupt, as well as when requested explicitly by the appropriate instruction (e.g., mtmsr instruction).

1.4 General Systems Overview

The processor or processor unit contains the sequencing and processing controls for instruction fetch, instruction execution, and interrupt action. Most implementations also contain data and instruction caches. Instructions that the processing unit can execute fall into the following classes:
• instructions executed in the Branch Processor
• instructions executed in the Fixed-Point Processor
• instructions executed in the Floating-Point Processor
• instructions executed in the Vector Processor
• instructions executed in an Auxiliary Processor
• other instructions executed by the processor

Almost all instructions executed in the Branch Processor, Fixed-Point Processor, Floating-Point Processor, and Vector Processor are nonprivileged and are described in Book I. Book II may describe additional nonprivileged instructions (e.g., Book II describes some nonprivileged instructions for cache management). Instructions executed in an Auxiliary Processor are implementation-dependent. Instructions related to the supervisor mode, control of processor resources, control of the storage hierarchy, and all other privileged instructions are described here or are implementation-dependent.

1.5 Exceptions

The following augments the exceptions defined in Book I that can be caused directly by the execution of an instruction:
• the execution of a floating-point instruction when MSR[FP]=0 (Floating-Point Unavailable interrupt)
• the execution of an instruction that causes a debug event (Debug interrupt)
• the execution of an auxiliary processor instruction when the auxiliary processor instruction is unavailable (Auxiliary Processor Unavailable interrupt)
• the execution of a Vector, SPE, or Embedded Floating-Point instruction when MSR[SPV]=0 (SPE/Embedded Floating-Point/Vector Unavailable interrupt)

1.6 Synchronization

The synchronization described in this section refers to the state of the processor that is performing the synchronization.

1.6.1 Context Synchronization

An instruction or event is context synchronizing if it satisfies the requirements listed below. Such instructions and events are collectively called context synchronizing operations. The context synchronizing operations include the isync instruction, the System Linkage instructions, the mtmsr instruction, and most interrupts (see Section 5.1).

1. The operation causes instruction dispatching (the issuance of instructions by the instruction fetching mechanism to any instruction execution mechanism) to be halted.

2. The operation is not initiated or, in the case of dnh [Category: Embedded.Enhanced Debug], isync and wait [Category: Wait], does not complete, until all instructions that precede the operation have completed to a point at which they have reported all exceptions they will cause.

3. The operation ensures that the instructions that precede the operation will complete execution in the context (privilege, relocation, storage protection, etc.) in which they were initiated.

4. If the operation directly causes an interrupt (e.g., sc directly causes a System Call interrupt) or is an interrupt, the operation is not initiated until no exception exists having higher priority than the exception associated with the interrupt (see Section 5.9, "Exception Priorities" on page 591).

5. The operation ensures that the instructions that follow the operation will be fetched and executed in the context established by the operation. (This requirement dictates that any prefetched instructions be discarded and that any effects and side effects of executing them out-of-order also be discarded, except as described in Section 4.5, "Performing Operations Out-of-Order".)

        Programming Note
        A context synchronizing operation is necessarily execution synchronizing; see Section 1.6.2.
        Unlike the Synchronize instruction, a context synchronizing operation does not affect the order in which storage accesses are performed.
        Item 2 permits a choice only for isync (and sync; see Section 1.6.2) because all other execution synchronizing operations also alter context.

1.6.2 Execution Synchronization

An instruction is execution synchronizing if it satisfies items 2 and 3 of the definition of context synchronization (see Section 1.6.1). sync is treated like isync with respect to item 2. The execution synchronizing instructions are sync, mtmsr and all context synchronizing instructions.

        Programming Note
        All context synchronizing instructions are execution synchronizing.
        Unlike a context synchronizing operation, an execution synchronizing instruction does not ensure that the instructions following that instruction will execute in the context established by that instruction. This new context becomes effective sometime after the execution synchronizing instruction completes and before or at a subsequent context synchronizing operation.

Chapter 2. Branch Processor

        2.1 Branch Processor Overview
        2.2 Branch Processor Registers
        2.2.1 Machine State Register
        2.3 Branch Processor Instructions
        2.4 System Linkage Instructions

2.1 Branch Processor Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Branch Processor that are not covered in Book I.

2.2 Branch Processor Registers

2.2.1 Machine State Register

The Machine State Register (MSR) is a 32-bit register. MSR bits are numbered 32 (most-significant bit) to 63 (least-significant bit). This register defines the state of the processor. The MSR can be modified by the mtmsr, rfi, rfci, rfdi [Category: Embedded.Enhanced Debug], rfmci, wrtee and wrteei instructions and by interrupts. It can be read by the mfmsr instruction.

        MSR
        32                                                    63
Figure 1. Machine State Register

Below are shown the bit definitions for the Machine State Register.

Bit     Description

32      Computation Mode (CM)
        0  The processor runs in 32-bit mode.
        1  The processor runs in 64-bit mode.

33      Interrupt Computation Mode (ICM)
        On interrupt this bit is copied to MSR[CM], selecting 32-bit or 64-bit mode for interrupt handling.
        0  MSR[CM] is set to 0 (32-bit mode) when an interrupt occurs.
        1  MSR[CM] is set to 1 (64-bit mode) when an interrupt occurs.

34:36   Implementation-dependent

37      User Cache Locking Enable (UCLE)
        [Category: Embedded Cache Locking.User Mode]
        0  Cache Locking instructions are privileged.
        1  Cache Locking instructions can be executed in user mode (MSR[PR]=1).
        If category Embedded Cache Locking.User Mode is not supported, this bit is treated as reserved.

38      SP/Embedded Floating-Point/Vector Available (SPV)
        [Category: Signal Processing]:
        0  The processor cannot execute any SP instructions except for the brinc instruction.
        1  The processor can execute all SP instructions.
        [Category: Vector]:
        0  The processor cannot execute any Vector instruction.
        1  The processor can execute Vector instructions.

39:44   Reserved

45      Wait State Enable (WE)
        0  The processor is not in wait state and continues processing.
        1  The processor enters the wait state by ceasing to execute instructions and entering low power mode. The details of how the wait state is entered and exited, and how the processor behaves while in the wait state, are implementation-dependent.

46      Critical Enable (CE)
        0  Critical Input, Watchdog Timer, and Processor Doorbell Critical interrupts are disabled.
        1  Critical Input, Watchdog Timer, and Processor Doorbell Critical interrupts are enabled.

47      Reserved

48      External Enable (EE)
        0  External Input, Decrementer, Fixed-Interval Timer, Processor Doorbell, and Embedded Performance Monitor [Category: E.PM] interrupts are disabled.
        1  External Input, Decrementer, Fixed-Interval Timer, Processor Doorbell, and Embedded Performance Monitor [Category: E.PM] interrupts are enabled.

49      Problem State (PR)
        0  The processor is in supervisor mode, can execute any instruction, and can access any resource (e.g. GPRs, SPRs, MSR, etc.).
        1  The processor is in user mode, cannot execute any privileged instruction, and cannot access any privileged resource.
        MSR[PR] also affects storage access control, as described in Section 6.2.4.

50      Floating-Point Available (FP)
        [Category: Floating-Point]
        0  The processor cannot execute any floating-point instructions, including floating-point loads, stores and moves.
        1  The processor can execute floating-point instructions.

51      Machine Check Enable (ME)
        0  Machine Check interrupts are disabled.
        1  Machine Check interrupts are enabled.

52      Floating-Point Exception Mode 0 (FE0)
        [Category: Floating-Point]
        (See below)

53      Implementation-dependent

54      Debug Interrupt Enable (DE)
        0  Debug interrupts are disabled.
        1  Debug interrupts are enabled if DBCR0[IDM]=1.

55      Floating-Point Exception Mode 1 (FE1)
        [Category: Floating-Point]
        (See below)

56      Reserved

57      Reserved

58      Instruction Address Space (IS)
        0  The processor directs all instruction fetches to address space 0 (TS=0 in the relevant TLB entry).
        1  The processor directs all instruction fetches to address space 1 (TS=1 in the relevant TLB entry).

59      Data Address Space (DS)
        0  The processor directs all data storage accesses to address space 0 (TS=0 in the relevant TLB entry).
        1  The processor directs all data storage accesses to address space 1 (TS=1 in the relevant TLB entry).

60      Implementation-dependent

61      Performance Monitor Mark (PMM)
        [Category: Embedded.Performance Monitor]
        0  Disable statistics gathering on marked processes.
        1  Enable statistics gathering on marked processes.
        See Appendix E for additional information.

62      Reserved

63      Reserved

The Floating-Point Exception Mode bits FE0 and FE1 are interpreted as shown below. For further details see Book I.

        FE0   FE1   Mode
        0     0     Ignore Exceptions
        0     1     Imprecise Nonrecoverable
        1     0     Imprecise Recoverable
        1     1     Precise

See Section 6.3, "Processor State After Reset" on page 595 for the initial state of the MSR.

        Programming Note
        A Machine State Register bit that is reserved may be altered by rfi/rfci/rfmci/rfdi [Category: Embedded.Enhanced Debug].

2.3 Branch Processor Instructions

2.4 System Linkage Instructions

These instructions provide the means by which a program can call upon the system to perform a service, and by which the system can return from performing a service or from processing an interrupt.

The System Call instruction is described in Book I, but only at the level required by an application programmer. A complete description of this instruction appears below.

Return From Interrupt   XL-form

rfi

        | 19 | /// | /// | /// | 50 | / |
        0    6     11    16    21   31

        MSR ← SRR1
        NIA ←iea SRR0[0:61] || 0b00

The rfi instruction is used to return from a base class interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context.

The contents of SRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR0[0:61] || 0b00. (Note: VLE behavior may be different; see Book VLE.)

System Call   SC-form

sc

        | 17 | /// | /// | /// | /// | // | 1 | / |
        0    6     11    16    20    27   30  31

        SRR0 ←iea CIA + 4
        SRR1 ← MSR
        NIA ← IVPR[0:47] || IVOR8[48:59] || 0b0000
        MSR ← new_value (see below)

The effective address of the instruction following the System Call instruction is placed into SRR0. The contents of the MSR are copied into SRR1.

Then a System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 5.6 on page 574.

The interrupt causes the next instruction to be fetched
If the new MSR value enables one or from effective address more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in IVPR0:47||IVOR848:59||0b0000. this case the value placed into the applicable save/ restore register 0 by the interrupt processing mecha- This instruction is context synchronizing. nism (see Section 5.6 on page 574) is the address of Special Registers Altered: the instruction that would have been executed next had SRR0 SRR1 MSR the interrupt not occurred (i.e. the address in SRR0 at the time of the execution of the rfi). This instruction is privileged and context synchronizing. Special Registers Altered: MSR Chapter 2. Branch Processor 515 Version 2.04 Return From Critical Interrupt XL-form Return From Debug Interrupt X-form rfci rfdi [Category: Embedded.Enhanced Debug] 19 /// /// /// 51 / 0 6 11 16 21 31 19 /// /// /// 39 / 0 6 11 16 21 31 MSR 1 CSRR1 NIA 1iea CSRR00:61 || 0b00 MSR 1 DSRR1 NIA 1iea DSRR00:61 || 0b00 The rfci instruction is used to return from a critical class interrupt, or as a means of establishing a new The rfdi instruction is used to return from a Debug context and synchronizing on that new context simulta- interrupt, or as a means of establishing a new context neously. and synchronizing on that new context simultaneously. The contents of CSRR1 are placed into the MSR. If the The contents of DSRR1 are placed into the MSR. If the new MSR value does not enable any pending excep- new MSR value does not enable any pending excep- tions, then the next instruction is fetched, under control tions, then the next instruction is fetched, under control of the new MSR value, from the address of the new MSR value, from the address CSRR00:61||0b00. (Note: VLE behavior may be differ- DSRR00:61||0b00. (Note: VLE behavior may be differ- ent; see Book VLE.) If the new MSR value enables one ent; see Book VLE.) 
If the new MSR value enables one or more pending exceptions, the interrupt associated or more pending exceptions, the interrupt associated with the highest priority pending exception is gener- with the highest priority pending exception is gener- ated; in this case the value placed into SRR0 or ated; in this case the value placed into SRR0, CSRR0, CSRR0 by the interrupt processing mechanism (see or DSRR0 by the interrupt processing mechanism is Section 5.6 on page 574) is the address of the instruc- the address of the instruction that would have been tion that would have been executed next had the inter- executed next had the interrupt not occurred (i.e. the rupt not occurred (i.e. the address in CSRR0 at the address in DSRR0 at the time of the execution of the time of the execution of the rfci). rfdi). This instruction is privileged and context synchronizing. This instruction is privileged and context synchronizing. Special Registers Altered: Special Registers Altered: MSR MSR 516 Power ISATM -- Book III-E Version 2.04 Return From Machine Check Interrupt XL-form rfmci 19 /// /// /// 38 / 0 6 11 16 21 31 MSR 1 MCSRR1 NIA 1iea MCSRR00:61 || 0b00 The rfmci instruction is used to return from a Machine Check class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously. The contents of MCSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address MCSRR00:61||0b00. (Note: VLE behavior may be differ- ent; see Book VLE.) 
If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is gener- ated; in this case the value placed into SRR0, CSRR0, MCSRR0, or DSRR0 [Category: Embedded.Enhanced Debug] by the interrupt processing mechanism (see Section 5.6 on page 574) is the address of the instruc- tion that would have been executed next had the inter- rupt not occurred (i.e. the address in MCSRR0 at the time of the execution of the rfmci). This instruction is privileged and context synchronizing. Special Registers Altered: MSR Chapter 2. Branch Processor 517 Version 2.04 518 Power ISATM -- Book III-E Version 2.04 Chapter 3. Fixed-Point Processor 3.1 Fixed-Point Processor Overview. . 519 3.3.4.2 External Process ID Store Context 3.2 Special Purpose Registers . . . . . . 519 (EPSC) Register . . . . . . . . . . . . . . . . . 522 3.3 Fixed-Point Processor Registers . 519 3.4 Fixed-Point Processor Instructions 523 3.3.1 Processor Version Register . . . . 519 3.4.1 Move To/From System Register 3.3.2 Processor Identification Register 519 Instructions . . . . . . . . . . . . . . . . . . . . . 523 3.3.3 Software-use SPRs . . . . . . . . . . 520 3.4.2 External Process ID Instructions 3.3.4 External Process ID Registers [Cate- [Category: Embedded.External PID] . . 529 gory: Embedded.External PID] . . . . . . 521 3.3.4.1 External Process ID Load Context (EPLC) Register . . . . . . . . . . . . . . . . . 521 3.1 Fixed-Point Processor Over- The PVR distinguishes between processors that differ in attributes that may affect software. It contains two view fields. Version A 16-bit number that identifies the version This chapter describes the details concerning the regis- of the processor. Different version numbers ters and the privileged instructions implemented in the indicate major differences between proces- Fixed-Point Processor that are not covered in Book I. sors, such as which optional facilities and instructions are supported. 
3.2 Special Purpose Registers Revision A 16-bit number that distinguishes between implementations of the version. Different Special Purpose Registers (SPRs) are read and written revision numbers indicate minor differences using the mfspr (page 526) and mtspr (page 524) between processors having the same ver- instructions. Most SPRs are defined in other chapters sion number, such as clock rate and Engi- of this book; see the index to locate those definitions. neering Change level. Version numbers are assigned by the Power ISA Archi- 3.3 Fixed-Point Processor Reg- tecture process. Revision numbers are assigned by an implementation-defined process. isters 3.3.2 Processor Identification 3.3.1 Processor Version Register Register The Processor Version Register (PVR) is a 32-bit read- The Processor Identification Register (PIR) is a 32-bit only register that contains a value identifying the ver- register that contains a value that can be used to distin- sion and revision level of the processor. The contents guish the processor from other processors in the sys- of the PVR can be copied to a GPR by the mfspr tem. The contents of the PIR can be copied to a GPR instruction. Read access to the PVR is privileged; write by the mfspr instruction. Read access to the PIR is access is not provided. Version Revision 32 48 63 Figure 2. Processor Version Register Chapter 3. Fixed-Point Processor 519 Version 2.04 privileged; write access, if provided, is implementation- The contents of SPRGi can be read using mfspr and dependent. written into SPRGi using mtspr. PROCID 32 63 Bits Name Description 32:63 PROCID Processor ID Figure 3. Processor Identification Register The means by which the PIR is initialized are imple- mentation-dependent. 3.3.3 Software-use SPRs Software-use SPRs are 64-bit registers provided for use by software. SPRG0 SPRG1 SPRG2 SPRG3 SPRG4 SPRG5 SPRG6 SPRG7 SPRG8 SPRG9 [Category: Embedded.Enhanced Debug] 0 63 Figure 4. 
Special Purpose Registers Programming Note USPRG0 was made a 32-bit register and renamed to VRSAVE; see Book I, Section 5.3.3. SPRG0 through SPRG2 These 64-bit registers can be accessed only in supervisor mode. SPRG3 This 64-bit register can be read in supervisor mode and can be written only in supervisor mode. It is implementation-dependent whether or not this reg- ister can be read in user mode. SPRG4 through SPRG7 These 64-bit registers can be written only in super- visor mode. These registers can be read in super- visor and user modes. SPRG8 through SPRG9 These 64-bit registers can be accessed only in supervisor mode. 520 Power ISATM -- Book III-E Version 2.04 3.3.4 External Process ID Regis- a Data Storage interrupt occurs, and the ESREPID bit is set to 1. If the operation was a Store, the ESRST bit is ters [Category: Embedded.Exter- also set to 1. nal PID] The External Process ID Registers provide capabilities 3.3.4.1 External Process ID Load Con- for loading and storing General Purpose Registers and text (EPLC) Register performing cache management operations using a sup- The EPLC register contains fields to provide the con- plied context other than the context normally used by text for External Process ID load instructions. the programming model. Two SPRs describe the context for loading and storing EPLC using external contexts. The External Process ID Load 32 63 Context (EPLC) Register provides the context for Exter- Figure 5. External Process ID Load Context nal Process ID Load instructions, and the External Pro- Register cess ID Store Context (EPSC) Register provides the context for External Process ID Store instructions. Each These bits are interpreted as follows: of these registers contains a PR (privilege) bit, an AS (address space) bit, and a Process ID. 
Changes to the Bit Definition EPLC or the EPSC Register require that a context syn- 0 External Load Context PR Bit (EPR) chronizing operation be performed prior to using any Used in place of MSRPR by the storage External Process ID instructions that use these regis- access control mechanism when an External ters. Process ID Load instruction is executed. External Process ID instructions that use the context 0 Supervisor mode provided by the EPLC register include lbepx, lhepx, 1 User mode lwepx, ldepx, dcbtep, dcbtstep, dcbfep, dcbstep, icbiep, lfdepx, evlddepx, lvepx, and lvepxl and those 1 External Load Context AS Bit (EAS) that use the context provided by the EPSC register Used in place of MSRDS for translation when include stbepx, sthepx, stwepx, stdepx, dcbzep, an External Process ID Load instruction is stfdepx, evstddepx, stvepx, and stvepxl. Instruction executed. definitions appear in Section 3.4.2. 0 Address space 0 System software configures the EPLC register to reflect 1 Address space 1 the Process ID, AS, and PR state from the context that 2:17 Reserved it wishes to perform loads from and configures the EPSC register to reflect the Process ID, AS, and PR 18:31 External Load Context Process ID Value state from the context it wishes to perform stores to. (EPID) Software then issues External Process ID instructions Used in place of all Process ID register values to manipulate data as required. for translation when an external Process ID Load instruction is executed. When the processor executes an External Process ID Load instruction, it uses the context information in the EPLC Register instead of the normal context with respect to address translation and storage access con- trol. EPLCEPR is used in place of MSRPR, EPLCEAS is used in place of MSRDS, and EPLCEPID is used in place of any Process ID registers implemented by the processor. 
Similarly, when the processor executes an External Process ID Store instruction, it uses the con- text information in the EPSC Register instead of the normal context with respect to address translation and storage access control. EPSCEPR is used in place of MSRPR, EPSCEAS is used in place of MSRDS, and EPSCEPID is used in place of all Process ID registers implemented by the processor. Translation occurs using the new substituted values. If the TLB lookup is successful, the storage access control mechanism grants or denies the access using context information from EPLCEPR or EPSCEPR for loads and stores respectively. If access is not granted, Chapter 3. Fixed-Point Processor 521 Version 2.04 3.3.4.2 External Process ID Store Con- text (EPSC) Register The EPSC register contains fields to provide the con- text for External Process ID Store instructions. The field encoding is the same as the EPLC Register. EPSC 32 63 Figure 6. External Process ID Store Context Register These bits are interpreted as follows: Bits Definition 0 External Store Context PR Bit (EPR) Used in place of MSRPR by the storage access control mechanism when an External Process ID Store instruction is executed. 0 Supervisor mode 1 User mode 1 External Store Context AS Bit (EAS) Used in place of MSRDS for translation when an External Process ID Store instruction is executed. 0 Address space 0 1 Address space 1 2:17 Reserved 18:31 External Store Context Process ID Value (EPID) Used in place of all Process ID register values for translation when an external PID Store instruction is executed. 
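The EPLC and EPSC field layout described above can be exercised with a small sketch. It assumes that the bit numbers in the tables (0, 1, 2:17, 18:31) count from the most-significant bit of the 32-bit register; the helper names pack_ep_context and unpack_ep_context are hypothetical and are not part of the architecture.

```python
# Illustrative only: field positions follow the EPLC/EPSC bit tables,
# assuming bit 0 is the most-significant bit of a 32-bit register.
EPR_SHIFT = 31        # bit 0      -> external PR bit (0 supervisor, 1 user)
EAS_SHIFT = 30        # bit 1      -> external AS bit (address space 0 or 1)
EPID_MASK = 0x3FFF    # bits 18:31 -> 14-bit external Process ID


def pack_ep_context(epr, eas, epid):
    """Build a 32-bit EPLC/EPSC image; reserved bits 2:17 are left as 0."""
    if epr not in (0, 1) or eas not in (0, 1):
        raise ValueError("EPR and EAS are single bits")
    if not 0 <= epid <= EPID_MASK:
        raise ValueError("EPID must fit in 14 bits")
    return (epr << EPR_SHIFT) | (eas << EAS_SHIFT) | epid


def unpack_ep_context(value):
    """Split a 32-bit EPLC/EPSC image into (EPR, EAS, EPID)."""
    return ((value >> EPR_SHIFT) & 1,
            (value >> EAS_SHIFT) & 1,
            value & EPID_MASK)
```

A value built this way would be written to EPLC or EPSC with mtspr; as stated in Section 3.3.4, a context synchronizing operation is then required before any External Process ID instruction uses the new context.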
3.4 Fixed-Point Processor Instructions

3.4.1 Move To/From System Register Instructions

The Move To Special Purpose Register and Move From Special Purpose Register instructions are described in Book I, but only at the level available to an application programmer. For example, no mention is made there of registers that can be accessed only in supervisor mode. The descriptions of these instructions given below extend the descriptions given in Book I, but do not list Special Purpose Registers that are implementation-dependent. In the descriptions of these instructions given below, the "defined" SPR numbers are the SPR numbers shown in Figure 7 and the implementation-specific SPR numbers that are implemented, and similarly for "defined" registers.

Extended mnemonics

Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand; see Appendix B.
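As a rough illustration of what an extended mnemonic stands for, the sketch below rewrites a few of them into the base mtspr/mfspr form. The mnemonic spellings (for example mtsrr0 and mflr) are assumptions based on common assembler usage, and the function name expand is hypothetical; Appendix B remains the reference. The SPR numbers are those listed in Figure 7.

```python
# Illustrative only: expands a handful of extended mnemonics into the base
# mtspr/mfspr form. Mnemonic spellings are assumed, not taken from Appendix B.
SPR_BY_NAME = {"xer": 1, "lr": 8, "ctr": 9, "srr0": 26, "srr1": 27}


def expand(mnemonic, reg):
    """Rewrite e.g. ('mtsrr0', 'r3') as 'mtspr 26,r3'."""
    prefix, name = mnemonic[:2], mnemonic[2:]
    if prefix not in ("mt", "mf") or name not in SPR_BY_NAME:
        raise ValueError(f"unknown extended mnemonic: {mnemonic}")
    n = SPR_BY_NAME[name]
    # Operand order follows the base forms: mtspr SPR,RS and mfspr RT,SPR.
    return f"mtspr {n},{reg}" if prefix == "mt" else f"mfspr {reg},{n}"
```

The operand order differs between the two directions because the base instructions are written mtspr SPR,RS and mfspr RT,SPR.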
SPR Numbers

decimal    SPR[1]             Register       Privileged        Length   Cat[2]
           spr5:9   spr0:4    Name           mtspr    mfspr    (bits)

1          00000    00001     XER            no       no       64       B
8          00000    01000     LR             no       no       64       B
9          00000    01001     CTR            no       no       64       B
22         00000    10110     DEC            yes      yes      32       B
26         00000    11010     SRR0           yes      yes      64       B
27         00000    11011     SRR1           yes      yes      64       B
48         00001    10000     PID            yes      yes      32       E
54         00001    10110     DECAR          yes      yes      32       E
58         00001    11010     CSRR0          yes      yes      64       E
59         00001    11011     CSRR1          yes      yes      32       E
61         00001    11101     DEAR           yes      yes      64       E
62         00001    11110     ESR            yes      yes      32       E
63         00001    11111     IVPR           yes      yes      64       E
256        01000    00000     VRSAVE         no       no       32       E,V
259        01000    00011     SPRG3          -        no       64       B
260-263    01000    001xx     SPRG[4-7]      -        no       64       E
268        01000    01100     TB             -        no       64       B
269        01000    01101     TBU            -        no       32[5]    B
272-275    01000    100xx     SPRG[0-3]      yes      yes      64       B
276-279    01000    101xx     SPRG[4-7]      yes      yes      64       E
282        01000    11010     EAR            yes      yes      32       EC
284        01000    11100     TBL            yes      -        32       B
285        01000    11101     TBU            yes      -        32       B
286        01000    11110     PIR            -        yes      32       E
287        01000    11111     PVR            -        yes      32       B
304        01001    10000     DBSR           yes[3]   yes      32       E
308        01001    10100     DBCR0          yes      yes      32       E
309        01001    10101     DBCR1          yes      yes      32       E
310        01001    10110     DBCR2          yes      yes      32       E
312        01001    11000     IAC1           yes      yes      64       E
313        01001    11001     IAC2           yes      yes      64       E
314        01001    11010     IAC3           yes      yes      64       E
315        01001    11011     IAC4           yes      yes      64       E
316        01001    11100     DAC1           yes      yes      64       E
317        01001    11101     DAC2           yes      yes      64       E
318        01001    11110     DVC1           yes      yes      64       E
319        01001    11111     DVC2           yes      yes      64       E
336        01010    10000     TSR            yes[3]   yes      32       E
340        01010    10100     TCR            yes      yes      32       E
400-415    01100    1xxxx     IVOR[0-15]     yes      yes      32       E
512        10000    00000     SPEFSCR        no       no       32       SPE
526        10000    01110     ATB/ATBL       -        no       64       ATB
527        10000    01111     ATBU           -        no       32       ATB
528        10000    10000     IVOR32         yes      yes      32       SPE
529        10000    10001     IVOR33         yes      yes      32       SPE
530        10000    10010     IVOR34         yes      yes      32       SPE
531        10000    10011     IVOR35         yes      yes      32       E.PM
532        10000    10100     IVOR36         yes      yes      32       E.PC
533        10000    10101     IVOR37         yes      yes      32       E.PC
570        10001    11010     MCSRR0         yes      yes      64       E
571        10001    11011     MCSRR1         yes      yes      32       E
572        10001    11100     MCSR           yes      yes      64       E
574        10001    11110     DSRR0          yes      yes      64       E.ED
575        10001    11111     DSRR1          yes      yes      32       E.ED
604        10010    11100     SPRG8          yes      yes      64       E
605        10010    11101     SPRG9          yes      yes      64       E.ED
624        10011    10000     MAS0           yes      yes      32       E.MF
625        10011    10001     MAS1           yes      yes      32       E.MF
626        10011    10010     MAS2           yes      yes      64       E.MF
627        10011    10011     MAS3           yes      yes      32       E.MF
628        10011    10100     MAS4           yes      yes      32       E.MF
630        10011    10110     MAS6           yes      yes      32       E.MF
633        10011    11001     PID1           yes      yes      32       E.MF
634        10011    11010     PID2           yes      yes      32       E.MF
688-691    10101    100xx     TLB[0-3]CFG    yes      yes      32       E.MF
702        10101    11110     EPR            -        yes      32       EXP
924        11100    11100     DCBTRL         -[4]     yes      32       E.CD
925        11100    11101     DCBTRH         -[4]     yes      32       E.CD
926        11100    11110     ICBTRL         -[5]     yes      32       E.CD
927        11100    11111     ICDBTRH        -[5]     yes      32       E.CD
944        11101    10000     MAS7           yes      yes      32       E.MF
947        11101    10011     EPLC           yes      yes      32       E.PD
948        11101    10100     EPSC           yes      yes      32       E.PD
979        11110    10011     ICBDR          -[5]     yes      32       E.CD
1012       11111    10100     MMUCSR0        yes      yes      32       E.MF
1015       11111    10111     MMUCFG         yes      yes      32       E.MF

-    This register is not defined for this instruction.
[1]  Note that the order of the two 5-bit halves of the SPR number is reversed.
[2]  See Section 1.3.5 of Book I.
[3]  This register cannot be directly written to. Instead, bits in the register corresponding to 1 bits in (RS) can be cleared using mtspr SPR,RS.
[4]  The register can be written by the dcread instruction.
[5]  The register can be written by the icread instruction.

All SPR numbers that are not shown above and are not implementation-specific are reserved.

Figure 7. Embedded SPR List

Move To Special Purpose Register XFX-form

mtspr SPR,RS

    | 31 | RS | spr | 467 | /  |
    | 0  | 6  | 11  | 21  | 31 |

    n ← spr[5:9] || spr[0:4]
    if length(SPR(n)) = 64 then
        SPR(n) ← (RS)
    else
        SPR(n) ← (RS)[32:63]

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 7. The contents of register RS are placed into the designated Special Purpose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR.

For this instruction, SPRs TBL and TBU are treated as separate 32-bit registers; setting one leaves the other unaltered.

spr[0]=1 if and only if writing the register is privileged. Execution of this instruction specifying a defined and privileged register when MSR[PR]=1 causes a Privileged Instruction type Program interrupt.

Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.

- if spr[0]=0: boundedly undefined results
- if spr[0]=1:
    - if MSR[PR]=1: Privileged Instruction type Program interrupt
    - if MSR[PR]=0: boundedly undefined results

If the SPR number is set to a value that is shown in Figure 7 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered:
    See Figure 7

Compiler and Assembler Note
For the mtspr and mfspr instructions, the SPR number coded in assembler language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16:20 of the instruction and the low-order 5 bits in bits 11:15.
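The half-swapping described in the Compiler and Assembler Note can be sketched as follows. This is a minimal illustration that assumes only the field layout and extended opcodes given in this section (primary opcode 31, extended opcode 467 for mtspr and 339 for mfspr, bit 31 zero); the function names are hypothetical.

```python
# Illustrative only: builds 32-bit mtspr/mfspr instruction words from the
# field layout in this section. Instruction bit 0 is the most-significant bit.
PRIMARY_OPCODE = 31
XO_MTSPR = 467
XO_MFSPR = 339


def spr_field(n):
    """Swap the two 5-bit halves of SPR number n to form the 10-bit spr field."""
    if not 0 <= n < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    high, low = n >> 5, n & 0x1F
    return (low << 5) | high      # low half in bits 11:15, high half in 16:20


def _encode(xo, reg, n):
    if not 0 <= reg < 32:
        raise ValueError("GPR number must be 0..31")
    return (PRIMARY_OPCODE << 26) | (reg << 21) | (spr_field(n) << 11) | (xo << 1)


def encode_mtspr(n, rs):
    """Instruction word for mtspr n,RS."""
    return _encode(XO_MTSPR, rs, n)


def encode_mfspr(rt, n):
    """Instruction word for mfspr RT,n."""
    return _encode(XO_MFSPR, rt, n)


def access_is_privileged(n):
    """For a defined SPR, spr[0] (the top bit of the low-order half of n) marks privilege."""
    return bool((n >> 4) & 1)
```

For example, encode_mfspr(3, 287) (a read of the PVR into r3) gives 0x7C7F42A6, and encode_mtspr(26, 3) (a write of SRR0 from r3) gives 0x7C7A03A6. The privilege check applies only to defined SPR numbers and to the access direction that Figure 7 defines for the register.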
Programming Note
For a discussion of software synchronization requirements when altering certain Special Purpose Registers, see Chapter 10. "Synchronization Requirements for Context Alterations" on page 625.

Move From Special Purpose Register XFX-form

mfspr RT,SPR

    | 31 | RT | spr | 339 | /  |
    | 0  | 6  | 11  | 21  | 31 |

    n ← spr[5:9] || spr[0:4]
    if length(SPR(n)) = 64 then
        RT ← SPR(n)
    else
        RT ← ³²0 || SPR(n)

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 7. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero.

spr[0]=1 if and only if reading the register is privileged. Execution of this instruction specifying a defined and privileged register when MSR[PR]=1 causes a Privileged Instruction type Program interrupt.

Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.

- if spr[0]=0: boundedly undefined results
- if spr[0]=1:
    - if MSR[PR]=1: Privileged Instruction type Program interrupt
    - if MSR[PR]=0: boundedly undefined results

If the SPR field contains a value that is shown in Figure 7 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered:
    None

Note
See the Notes that appear with mtspr.

Move To Device Control Register XFX-form

mtdcr DCRN,RS

    | 31 | RS | dcr | 451 | /  |
    | 0  | 6  | 11  | 21  | 31 |

    DCRN ← dcr[0:4] || dcr[5:9]
    DCR(DCRN) ← (RS)

Let DCRN denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of register RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of bits 32:63 of (RS) are placed into the Device Control Register.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move To Device Control Register Indexed X-form

mtdcrx RA,RS

    | 31 | RS | RA | /// | 387 | /  |
    | 0  | 6  | 11 | 16  | 21  | 31 |

    DCRN ← (RA)
    DCR(DCRN) ← (RS)

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of register RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of RS[32:63] are placed into the Device Control Register.

The specification of Device Control Registers using mtdcrx, mtdcrux (see Book I), and mtdcr is implementation-dependent. For example, mtdcr 105,r2 and mtdcrux r1,r2 (where register r1 contains the value 105) may not produce identical results on an implementation.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move From Device Control Register XFX-form

mfdcr RT,DCRN

    | 31 | RT | dcr | 323 | /  |
    | 0  | 6  | 11  | 21  | 31 |

    DCRN ← dcr[0:4] || dcr[5:9]
    RT ← DCR(DCRN)

Let DCRN denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of the designated Device Control Register are placed into register RT. For 32-bit Device Control Registers, the contents of the Device Control Register are placed into bits 32:63 of RT. Bits 0:31 of RT are set to 0.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move From Device Control Register Indexed X-form

mfdcrx RT,RA

    | 31 | RT | RA | /// | 259 | /  |
    | 0  | 6  | 11 | 16  | 21  | 31 |

    DCRN ← (RA)
    RT ← DCR(DCRN)

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of the designated Device Control Register are placed into register RT. For 32-bit Device Control Registers, the contents of bits 32:63 of the designated Device Control Register are placed into RT. Bits 0:31 of RT are set to 0.

The specification of Device Control Registers using mfdcrx and mfdcrux (see Book I) compared to the specification of Device Control Registers using mfdcr is implementation-dependent. For example, mfdcr r2,105 and mfdcrx r2,r1 (where register r1 contains the value 105) may not produce identical results on an implementation or between implementations. Also, accessing privileged Device Control Registers with mfdcrux when the processor is in supervisor mode is implementation-dependent.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move To Machine State Register X-form

mtmsr RS

    | 31 | RS | /// | /// | 146 | /  |
    | 0  | 6  | 11  | 16  | 21  | 31 |

    newmsr ← (RS)[32:63]
    if MSR[CM] = 0 & newmsr[CM] = 1 then NIA[0:31] ← 0
    MSR ← newmsr

The contents of register RS[32:63] are placed into the MSR. If the processor is changing from 32-bit mode to 64-bit mode, the next instruction is fetched from ³²0 || NIA[32:63].

This instruction is privileged and execution synchronizing.

In addition, alterations to the EE or CE bits are effective as soon as the instruction completes. Thus if MSR[EE]=0 and an External interrupt is pending, executing an mtmsr that sets MSR[EE] to 1 will cause the External interrupt to be taken before the next instruction is executed, if no higher priority exception exists. Likewise, if MSR[CE]=0 and a Critical Input interrupt is pending, executing an mtmsr that sets MSR[CE] to 1 will cause the Critical Input interrupt to be taken before the next instruction is executed if no higher priority exception exists. (See Section 5.6 on page 574.)

Special Registers Altered:
    MSR

Programming Note
For a discussion of software synchronization requirements when altering certain MSR bits please refer to Chapter 10.

Move From Machine State Register X-form

mfmsr RT

    | 31 | RT | /// | /// | 83 | /  |
    | 0  | 6  | 11  | 16  | 21 | 31 |

    RT ← ³²0 || MSR

The contents of the MSR are placed into bits 32:63 of register RT and bits 0:31 of RT are set to 0.

This instruction is privileged.

Special Registers Altered:
    None

Write MSR External Enable X-form

wrtee RS

    | 31 | RS | /// | /// | 131 | /  |
    | 0  | 6  | 11  | 16  | 21  | 31 |

    MSR[EE] ← (RS)[48]

The content of (RS)[48] is placed into MSR[EE].

Alteration of the MSR[EE] bit is effective as soon as the instruction completes. Thus if MSR[EE]=0 and an External interrupt is pending, executing a wrtee instruction that sets MSR[EE] to 1 will cause the External interrupt to occur before the next instruction is executed, if no higher priority exception exists (Section 5.9, "Exception Priorities" on page 591).

This instruction is privileged.

Special Registers Altered:
    MSR

Write MSR External Enable Immediate X-form

wrteei E

    | 31 | /// | /// | E  | /// | 163 | /  |
    | 0  | 6   | 11  | 16 | 17  | 21  | 31 |

    MSR[EE] ← E

The value specified in the E field is placed into MSR[EE].

Alteration of the MSR[EE] bit is effective as soon as the instruction completes. Thus if MSR[EE]=0 and an External interrupt is pending, executing a wrteei instruction that sets MSR[EE] to 1 will cause the External interrupt to occur before the next instruction is executed, if no higher priority exception exists (Section 5.9, "Exception Priorities" on page 591).

This instruction is privileged.

Special Registers Altered:
    MSR

Programming Note
wrtee and wrteei are used to provide atomic update of MSR[EE].
Typical usage is:

    mfmsr  Rn       #save EE in (Rn)_48
    wrteei 0        #turn off EE
      :
      :             #code with EE disabled
    wrtee  Rn       #restore EE without altering
                    #other MSR bits that might
                    #have changed

3.4.2 External Process ID Instructions [Category: Embedded.External PID]

External Process ID instructions provide capabilities for loading and storing General Purpose Registers and performing cache management operations using a supplied context other than the context normally used by translation.

The EPLC and EPSC registers provide external contexts for performing loads and stores. The EPLC and the EPSC registers are described in Section 3.3.4.

If an Alignment interrupt, Data Storage interrupt, or Data TLB Error interrupt occurs while attempting to execute an External Process ID instruction, ESR_EPID is set to 1 indicating that the instruction causing the interrupt was an External Process ID instruction; any other applicable ESR bits are also set.

Load Byte by External Process ID Indexed X-form

lbepx RT,RA,RB

    31    RT    RA    RB    95    /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁵⁶0 || MEM(EA,1)

Let the effective address (EA) be the sum (RA|0)+(RB). The byte in storage addressed by EA is loaded into RT_56:63. RT_0:55 are set to 0.

For lbepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an lbzx instruction except for using the EPLC register to provide the translation context.

Load Halfword by External Process ID Indexed X-form

lhepx RT,RA,RB

    31    RT    RA    RB    287   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁴⁸0 || MEM(EA,2)

Let the effective address (EA) be the sum (RA|0)+(RB). The halfword in storage addressed by EA is loaded into RT_48:63. RT_0:47 are set to 0.

For lhepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an lhzx instruction except for using the EPLC register to provide the translation context.

Load Word by External Process ID Indexed X-form

lwepx RT,RA,RB

    31    RT    RA    RB    31    /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ³²0 || MEM(EA,4)

Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT_32:63. RT_0:31 are set to 0.

For lwepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an lwzx instruction except for using the EPLC register to provide the translation context.

Load Doubleword by External Process ID Indexed X-form

ldepx RT,RA,RB

    31    RT    RA    RB    29    /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA,8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

For ldepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

This instruction is privileged.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an ldx instruction except for using the EPLC register to provide the translation context.

Store Byte by External Process ID Indexed X-form

stbepx RS,RA,RB

    31    RS    RA    RB    223   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,1) ← (RS)_56:63

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)_56:63 are stored into the byte in storage addressed by EA.

For stbepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPSC_EPR is used in place of MSR_PR
    - EPSC_EAS is used in place of MSR_DS
    - EPSC_EPID is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to a stbx instruction except for using the EPSC register to provide the translation context.

Store Halfword by External Process ID Indexed X-form

sthepx RS,RA,RB

    31    RS    RA    RB    415   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,2) ← (RS)_48:63

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)_48:63 are stored into the halfword in storage addressed by EA.

For sthepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPSC_EPR is used in place of MSR_PR
    - EPSC_EAS is used in place of MSR_DS
    - EPSC_EPID is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to a sthx instruction except for using the EPSC register to provide the translation context.

Store Word by External Process ID Indexed X-form

stwepx RS,RA,RB

    31    RS    RA    RB    159   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,4) ← (RS)_32:63

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)_32:63 are stored into the word in storage addressed by EA.

For stwepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPSC_EPR is used in place of MSR_PR
    - EPSC_EAS is used in place of MSR_DS
    - EPSC_EPID is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to a stwx instruction except for using the EPSC register to provide the translation context.

Store Doubleword by External Process ID Indexed X-form

stdepx RS,RA,RB

    31    RS    RA    RB    157   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

For stdepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPSC_EPR is used in place of MSR_PR
    - EPSC_EAS is used in place of MSR_DS
    - EPSC_EPID is used in place of all Process ID registers.

This instruction is privileged.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to a stdx instruction except for using the EPSC register to provide the translation context.

Data Cache Block Store by External PID X-form

dcbstep RA,RB

    31    ///   RA    RB    63    /
    0     6     11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of any processor, and any locations in the block are considered to be modified there, then those locations are written to main storage. Additional locations in the block may be written to main storage. The block ceases to be considered modified in that data cache.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this processor, and any locations in the block are considered to be modified there, those locations are written to main storage. Additional locations in the block may be written to main storage, and the block ceases to be considered modified in that data cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

The instruction is treated as a Load with respect to translation and memory protection, and is treated as a Write with respect to debug events.

This instruction is privileged.

For dcbstep, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to a dcbst instruction except for using the EPLC register to provide the translation context.

Data Cache Block Touch by External PID X-form

dcbtep TH,RA,RB

    31    /     TH    RA    RB    319   /
    0     6     7     11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtep instruction provides a hint that describes a block or data stream, or indicates the expected use thereof. A hint that the program will probably soon load from a given storage location is ignored if the location is Caching Inhibited or Guarded.

The only operation that is "caused" by the dcbtep instruction is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be "caused by" or "associated with" the dcbtep instruction (e.g., dcbtep is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction.

The dcbtep instruction may complete before the operation it causes has been performed.

The nature of the hint depends, in part, on the value of the TH field, as specified in the dcbt instruction in Section 3.2.2 of Book II.

The instruction is treated as a Load, except that no interrupt occurs if a protection violation occurs.

The instruction is privileged.

The normal address translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Touch by External PID instruction so that it can be coded with the TH value as the last operand for all categories.

    Extended:               Equivalent to:
    dcbtctep RA,RB,TH       dcbtep for TH values of 0b0000 - 0b0111;
                            other TH values are invalid.
    dcbtdsep RA,RB,TH       dcbtep for TH values of 0b0000 or
                            0b1000 - 0b1010; other TH values are invalid.

Programming Note
This instruction behaves identically to a dcbt instruction except for using the EPLC register to provide the translation context.

Data Cache Block Flush by External PID X-form

dcbfep RA,RB

    31    ///   RA    RB    127   /
    0     6     11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of any processor, and any locations in the block are considered to be modified there, then those locations are written to main storage. Additional locations in the block may also be written to main storage. The block is invalidated in the data cache of all processors.
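As a rough illustration of the Memory Coherence Required case just described, the flush can be pictured with a toy model. This is only a sketch under stated assumptions: each processor's data cache is a plain dictionary keyed by block address, the function name and the "data"/"modified" keys are hypothetical, and neither the External PID translation step nor the optional write-back of additional locations is modeled.

```python
def dcbf_coherent(caches, main_storage, block):
    """Toy flush of `block` for storage that is Memory Coherence Required.

    caches       -- list of dicts, one per processor: block -> line
                    (hypothetical representation, not architected)
    main_storage -- dict: block -> data
    """
    for cache in caches:                          # every processor's data cache
        line = cache.pop(block, None)             # invalidate the block there
        if line is not None and line["modified"]:
            main_storage[block] = line["data"]    # write modified data back


# Usage: one cache holds a modified copy, another a clean copy.
caches = [
    {0x100: {"data": b"new", "modified": True}},
    {0x100: {"data": b"old", "modified": False}},
]
main_storage = {0x100: b"old"}
dcbf_coherent(caches, main_storage, 0x100)
```

After the call, main storage holds the modified data and no cache in the list still contains the block.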
If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of this processor, and any locations in the block are considered to be modified there, then those locations are written to main storage. Additional locations in the block may also be written to main storage. The block is invalidated in the data cache of this processor.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

The instruction is treated as a Load with respect to translation and memory protection, and is treated as a Write with respect to debug events.

This instruction is privileged.

The normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to a dcbf instruction except for using the EPLC register to provide the translation context.

Data Cache Block Touch for Store by External PID X-form

dcbtstep TH,RA,RB

    31    /     TH    RA    RB    255   /
    0     6     7     11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtstep instruction provides a hint that the program will probably soon store to the block containing the byte addressed by EA.

If the Cache Specification category is supported, the nature of the hint depends on the value of the TH field, as specified in Section 3.2.2 of Book II. If the Cache Specification category is not supported, the TH field is treated as a reserved field.

If the block is in a storage location that is Caching Inhibited or Guarded, then the hint is ignored.

The only operation that is "caused" by the dcbtstep instruction is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be "caused by" or "associated with" the dcbtstep instruction (e.g., dcbtstep is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction.

The dcbtstep instruction may complete before the operation it causes has been performed.

The instruction is treated as a Load, except that no interrupt occurs if a protection violation occurs.

The instruction is privileged.

The normal address translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Touch for Store by External PID instruction so that it can be coded with the TH value as the last operand for all categories.

    Extended:               Equivalent to:
    dcbtstctep RA,RB,TH     dcbtstep for TH values of 0b0000 - 0b0111;
                            other TH values are invalid.

Programming Note
This instruction behaves identically to a dcbtst instruction except for using the EPLC register to provide the translation context.

Instruction Cache Block Invalidate by External PID X-form

icbiep RA,RB

    31    ///   RA    RB    991   /
    0     6     11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any processor, the block is invalidated in those instruction caches.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of this processor, the block is invalidated in that instruction cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

The instruction is treated as a Load.

This instruction is privileged.

For icbiep, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an icbi instruction except for using the EPLC register to provide the translation context.

Data Cache Block set to Zero by External PID X-form

dcbzep RA,RB

    31    ///   RA    RB    1023  /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    n ← block size (bytes)
    m ← log₂(n)
    ea ← EA_0:63-m || ᵐ0
    MEM(ea, n) ← ⁿ0x00

Let the effective address (EA) be the sum (RA|0)+(RB).

All bytes in the block containing the byte addressed by EA are set to zero.

This instruction is treated as a Store.

This instruction is privileged.

The normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPSC_EPR is used in place of MSR_PR
    - EPSC_EAS is used in place of MSR_DS
    - EPSC_EPID is used in place of all Process ID registers.

Special Registers Altered:
    None

Programming Note
See the Programming Notes for the dcbz instruction.

Programming Note
This instruction behaves identically to a dcbz instruction except for using the EPSC register to provide the translation context.

Load Floating-Point Double by External Process ID Indexed X-form

lfdepx FRT,RA,RB

    31    FRT   RA    RB    607   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    FRT ← MEM(EA,8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into FRT.

For lfdepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

This instruction is privileged.

An attempt to execute lfdepx while MSR_FP=0 will cause a Floating-Point Unavailable interrupt.

Store Floating-Point Double by External Process ID Indexed X-form

stfdepx FRS,RA,RB

    31    FRS   RA    RB    735   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,8) ← (FRS)

Let the effective address (EA) be the sum (RA|0)+(RB). (FRS) is stored into the doubleword in storage addressed by EA.

For stfdepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPSC_EPR is used in place of MSR_PR
    - EPSC_EAS is used in place of MSR_DS
    - EPSC_EPID is used in place of all Process ID registers.

This instruction is privileged.

An attempt to execute stfdepx while MSR_FP=0 will cause a Floating-Point Unavailable interrupt.
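The effective-address calculation and the context substitution used by the External Process ID forms above can be sketched as follows. This is a minimal model, not the architecture's definition: the names Context, effective_address, and translation_context are hypothetical, the register fields are passed as plain integers, 64-bit address arithmetic is assumed, and the TLB lookup that would consume the resulting context is not modeled.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass
class Context:
    pr: int    # privilege state used for access control (MSR_PR or EPxC_EPR)
    as_: int   # address space used for translation (MSR_DS or EPxC_EAS)
    pid: int   # process ID used for the translation match


def effective_address(gpr, ra, rb):
    """EA = (RA|0) + (RB): RA = 0 selects the value 0, not GPR 0."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & MASK64   # assumption: 64-bit wraparound


def translation_context(msr_pr, msr_ds, pid, epxc=None):
    """Pick the context under which a data access is translated.

    epxc is None for an ordinary load or store. For an External PID
    form it holds the EPR, EAS and EPID fields of EPLC (load-type
    forms) or EPSC (store-type forms).
    """
    if epxc is None:
        return Context(msr_pr, msr_ds, pid)
    return Context(epxc["EPR"], epxc["EAS"], epxc["EPID"])


# Usage: supervisor code (MSR_PR = 0) loading through a user context.
gpr = [0] * 32
gpr[0], gpr[3], gpr[4] = 0x1000, 0x20, 0xFFFF_FFFF_FFFF_FFF0
eplc = {"EPR": 1, "EAS": 0, "EPID": 42}
ea = effective_address(gpr, 0, 3)            # RA = 0, so GPR 0 is ignored
ctx = translation_context(0, 1, 7, eplc)     # context taken from EPLC
```

The substitution affects only translation and access control; the instruction itself remains privileged, so the check on the real MSR_PR still applies before any of this happens.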
The following apply to both lfdepx and stfdepx.

Corequisite Categories:
    Floating-Point

Special Registers Altered:
    None

Programming Note
lfdepx behaves identically to an lfdx instruction except for using the EPLC register to provide the translation context. stfdepx behaves identically to a stfdx instruction except for using the EPSC register to provide the translation context.

Vector Load Doubleword into Doubleword by External Process ID Indexed EVX-form

evlddepx RT,RA,RB

    31    RT    RA    RB    285
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA,8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

For evlddepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

This instruction is privileged.

An attempt to execute evlddepx while MSR_SPV=0 will cause an SPE Unavailable interrupt.

Corequisite Categories:
    Signal Processing Engine

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an evlddx instruction except for using the EPLC register to provide the translation context.

Vector Store Doubleword into Doubleword by External Process ID Indexed EVX-form

evstddepx RS,RA,RB

    31    RS    RA    RB    413
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

For evstddepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPSC_EPR is used in place of MSR_PR
    - EPSC_EAS is used in place of MSR_DS
    - EPSC_EPID is used in place of all Process ID registers.

This instruction is privileged.

An attempt to execute evstddepx while MSR_SPV=0 will cause an SPE Unavailable interrupt.

Corequisite Categories:
    Signal Processing Engine

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an evstddx instruction except for using the EPSC register to provide the translation context.

Load Vector by External Process ID Indexed X-form

lvepx VRT,RA,RB

    31    VRT   RA    RB    295   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    VRT ← MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16)

Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT.

For lvepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

This instruction is privileged.

An attempt to execute lvepx while MSR_SPV=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to an lvx instruction except for using the EPLC register to provide the translation context.

Load Vector by External Process ID Indexed LRU X-form

lvepxl VRT,RA,RB

    31    VRT   RA    RB    263   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    VRT ← MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16)
    mark_as_not_likely_to_be_needed_again_anytime_soon(EA)

Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT.

lvepxl provides a hint that the quadword in storage addressed by EA will probably not be needed again by the program in the near future.

For lvepxl, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPLC_EPR is used in place of MSR_PR
    - EPLC_EAS is used in place of MSR_DS
    - EPLC_EPID is used in place of all Process ID registers.

This instruction is privileged.

An attempt to execute lvepxl while MSR_SPV=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
See the Programming Notes for the lvxl instruction in Section 5.7.2 of Book I.

Programming Note
This instruction behaves identically to an lvxl instruction except for using the EPLC register to provide the translation context.

Store Vector by External Process ID Indexed X-form

stvepx VRS,RA,RB

    31    VRS   RA    RB    807   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) ← (VRS)

Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0.

For stvepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPSC_EPR is used in place of MSR_PR
    - EPSC_EAS is used in place of MSR_DS
    - EPSC_EPID is used in place of all Process ID registers.

This instruction is privileged.

An attempt to execute stvepx while MSR_SPV=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
This instruction behaves identically to a stvx instruction except for using the EPSC register to provide the translation context.

Store Vector by External Process ID Indexed LRU X-form

stvepxl VRS,RA,RB

    31    VRS   RA    RB    775   /
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) ← (VRS)
    mark_as_not_likely_to_be_needed_again_anytime_soon(EA)

Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0.

The stvepxl instruction provides a hint that the quadword addressed by EA will probably not be needed again by the program in the near future.

For stvepxl, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    - EPSC_EPR is used in place of MSR_PR
    - EPSC_EAS is used in place of MSR_DS
    - EPSC_EPID is used in place of all Process ID registers.

This instruction is privileged.

An attempt to execute stvepxl while MSR_SPV=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
See the Programming Notes for the lvxl instruction in Section 5.7.2 of Book I.

Programming Note
This instruction behaves identically to a stvxl instruction except for using the EPSC register to provide the translation context.

Chapter 4. Storage Control

    4.1 Storage Addressing . . . 541
    4.2 Storage Exceptions . . . 541
    4.3 Instruction Fetch . . . 542
    4.3.1 Implicit Branch . . . 542
    4.3.2 Address Wrapping Combined with Changing MSR Bit CM . . . 542
    4.4 Data Access . . . 542
    4.5 Performing Operations Out-of-Order . . . 542
    4.6 Invalid Real Address . . . 543
    4.7 Storage Control . . . 543
    4.7.1 Storage Control Registers . . . 543
    4.7.1.1 Process ID Register . . . 543
    4.7.1.2 Translation Lookaside Buffer . . . 543
    4.7.2 Page Identification . . . 545
    4.7.3 Address Translation . . . 548
    4.7.4 Storage Access Control . . . 549
    4.7.4.1 Execute Access . . . 549
    4.7.4.2 Write Access . . . 549
    4.7.4.3 Read Access . . . 549
    4.7.4.4 Storage Access Control Applied to Cache Management Instructions . . . 549
    4.7.4.5 Storage Access Control Applied to String Instructions . . . 550
    4.7.5 TLB Management . . . 550
    4.8 Storage Control Attributes . . . 551
    4.8.1 Guarded Storage . . . 551
    4.8.1.1 Out-of-Order Accesses to Guarded Storage . . . 552
    4.8.2 User-Definable . . . 552
    4.8.3 Storage Control Bits . . . 552
    4.8.3.1 Storage Control Bit Restrictions . . . 552
    4.8.3.2 Altering the Storage Control Bits . . . 553
    4.9 Storage Control Instructions . . . 554
    4.9.1 Cache Management Instructions . . . 554
    4.9.2 Cache Locking [Category: Embedded Cache Locking] . . . 555
    4.9.2.1 Lock Setting and Clearing . . . 555
    4.9.2.2 Error Conditions . . . 555
    4.9.2.2.1 Overlocking . . . 555
    4.9.2.2.2 Unable-to-lock and Unable-to-unlock Conditions . . . 556
    4.9.2.3 Cache Locking Instructions . . . 557
    4.9.3 Synchronize Instruction . . . 559
    4.9.4 Lookaside Buffer Management . . . 559
    4.9.4.1 TLB Management Instructions . . . 560

4.1 Storage Addressing

A program references storage using the effective address computed by the processor when it executes a Load, Store, Branch, or Cache Management instruction, or when it fetches the next sequential instruction. The effective address is translated to a real address according to procedures described in Section 4.7.2 and in Section 4.7.3. The real address that results from the respective translations is used to access main storage.

For a complete discussion of storage addressing and effective address calculation, see Section 1.10 of Book I.

4.2 Storage Exceptions

A storage exception results when the sequential execution model requires that a storage access be performed but the access is not permitted (e.g., is not permitted by the storage protection mechanism), the access cannot be performed because the effective address cannot be translated to a real address, or the access matches some tracking mechanism criteria (e.g., Data Address Breakpoint).

In certain cases a storage exception may result in the "restart" of (re-execution of at least part of) a Load or Store instruction. See Section 2.1 of Book II and Section 5.7 on page 588 in this Book.

4.3 Instruction Fetch

The effective address for an instruction fetch is processed under control of MSR_IS. The Address Translation mechanism is described beginning in Section 4.7.2.

4.3.1 Implicit Branch

Explicitly altering certain MSR bits (using mtmsr), or explicitly altering TLB entries, certain System Registers and possibly other implementation-dependent registers, may have the side effect of changing the addresses, effective or real, from which the current instruction stream is being fetched. This side effect is called an implicit branch. For example, an mtmsr instruction that changes the value of MSR_CM may change the real address from which the current instruction stream is being fetched. The MSR bits and System Registers (excluding implementation-dependent registers) for which alteration can cause an implicit branch are indicated as such in Chapter 10. "Synchronization Requirements for Context Alterations" on page 625. Implicit branches are not supported by the Power ISA. If an implicit branch occurs, the results are boundedly undefined.

4.3.2 Address Wrapping Combined with Changing MSR Bit CM

If the current instruction is at effective address 2³² - 4 and is an mtmsr instruction that changes the contents of MSR_CM, the effective address of the next sequential instruction is undefined.

Programming Note
In the case described in the preceding paragraph, if an interrupt occurs before the next sequential instruction is executed, the contents of SRR0, CSRR0, or MCSRR0, as appropriate to the interrupt, are undefined.

4.4 Data Access

The effective address for a data access is processed under control of MSR_DS. The Address Translation mechanism is described beginning in Section 4.7.2.

Storage control attributes may also affect instruction fetch.

4.5 Performing Operations Out-of-Order

An operation is said to be performed "in-order" if, at the time that it is performed, it is known to be required by the sequential execution model. An operation is said to be performed "out-of-order" if, at the time that it is performed, it is not known to be required by the sequential execution model.

Operations are performed out-of-order by the processor on the expectation that the results will be needed by an instruction that will be required by the sequential execution model. Whether the results are really needed is contingent on everything that might divert the control flow away from the instruction, such as Branch, Trap, System Call, and Return From Interrupt instructions, and interrupts, and on everything that might change the context in which the instruction is executed.

Typically, the processor performs operations out-of-order when it has resources that would otherwise be idle, so the operation incurs little or no cost. If subsequent events such as branches or interrupts indicate that the operation would not have been performed in the sequential execution model, the processor abandons any results of the operation (except as described below).

In the remainder of this section, including its subsections, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

A data access that is performed out-of-order may correspond to an arbitrary Load or Store instruction (e.g., a Load or Store instruction that is not in the instruction stream being executed). Similarly, an instruction fetch that is performed out-of-order may be for an arbitrary instruction (e.g., the aligned word at an arbitrary location in instruction storage).

Most operations can be performed out-of-order, as long as the machine appears to follow the sequential execution model. Certain out-of-order operations are restricted, as follows.

    - Stores
      Stores are not performed out-of-order (even if the Store instructions that caused them were executed out-of-order).
    - Accessing Guarded Storage
      The restrictions for this case are given in Section 4.8.1.1.

The only permitted side effects of performing an operation out-of-order are the following.

    - A Machine Check that could be caused by in-order execution may occur out-of-order.
    - Non-Guarded storage locations that could be fetched into a cache by in-order fetching or execution of an arbitrary instruction may be fetched out-of-order into that cache.

4.6 Invalid Real Address

A storage access (including an access that is performed out-of-order; see Section 4.5) may cause a Machine Check if the accessed storage location contains an uncorrectable error or does not exist. See Section 5.6.2 on page 576.

4.7 Storage Control

This section describes the address translation facility, access control, and storage control attributes.

Demand-paged virtual memory is supported, as well as a variety of other management schemes that depend on precise control of effective-to-real address translation and flexible memory protection. Translation misses and protection faults cause precise exceptions. Sufficient information is available to correct the fault and restart the faulting instruction.

Some implementations may support more than one Process ID Register. See the User's Manual for the implementation.

4.7.1.2 Translation Lookaside Buffer

The Translation Lookaside Buffer (TLB) is the hardware resource that controls translation, protection, and storage control attributes. The organization of the TLB (e.g. unified versus separate instruction and data, hierarchies, associativity, number of entries, etc.) is implementation-dependent. Thus, the software for updating the TLB is also implementation-dependent. For the purposes of this discussion, a unified TLB organization is assumed. The differences for an implementation with separate instruction and data TLBs are for the most part obvious (e.g. separate instructions or separate index ranges for reading, writing, searching, and invalidating each TLB). For details on how to synchronize TLB updates with instruction execution see Chapter 10.

Maintenance of TLB entries is under software control.

The effective address space is divided into pages.
The System software determines TLB entry replacement page represents the granularity of effective address strategy and the format and use of any page state infor- translation, access control, and storage control mation. The TLB entry contains all the information attributes. Up to sixteen page sizes (1KB, 4KB, 16KB, required to identify the page, to specify the translation, 64KB, 256KB, 1MB, 4MB, 16MB, 64MB, 256MB, 1GB, to specify access controls, and to specify the storage 4GB, 16GB, 64GB, 256GB, 1TB) may be simulta- control attributes. The format of the TLB entry is imple- neously supported. In order for an effective to real mentation-dependent. translation to exist, a valid entry for the page containing the effective address must be in the Translation Looka- While the TLB is managed by software, an implementa- side Buffer (TLB). Addresses for which no TLB entry tion may include partial or full hardware assist for TLB exists cause TLB Miss exceptions. management (e.g. support of the Server environment's virtual memory architecture). However, such implemen- tations should be able to disable such support with 4.7.1 Storage Control Registers implementation-dependent software or hardware con- figuration mechanisms. In addition to the registers described below, the Machine State Register provides the IS and DS bits, A TLB entry is written by copying information from a that specify which of the two address spaces the GPR or other implementation-dependent source, using respective instruction or data storage accesses are a series of tlbwe instructions (see page 562). A TLB directed towards. MSRPR bit is also used by the stor- entry is read by copying information to a GPR or other age access control mechanism. implementation-dependent target, using a series of tlbre instructions (see page 560). Software can also 4.7.1.1 Process ID Register search for specific TLB entries using the tlbsx instruc- tion (see page 561). 
Writing, reading and searching the The Process ID Register (PID) is a 32-bit register. Pro- TLB is implementation-dependent. cess ID Register bits are numbered 32 (most-signifi- Each TLB entry describes a page that is eligible for cant bit) to 63 (least-significant bit). The Process ID translation and access controls. Fields in the TLB entry Register provides a value that is used to construct a vir- fall into four categories: tual address for accessing storage. 1 Page identification fields (information required to The Process ID Register can be read using mfspr and identify the page to the hardware translation mech- can be written using mtspr. An implementation may opt anism). to implement only the least-significant n bits of the Pro- 1 Address translation fields cess ID Register, where 0 n 32, and n must be the 1 Access control fields same as the number of implemented bits in the TID 1 Storage attribute fields field of the TLB entry. The most-significant 32­n bits of the Process ID Register are treated as reserved. While the fields in the TLB entry are required, no partic- ular TLB entry format is formally specified. The tlbre and tlbwe instructions provide the ability to read or Chapter 4. Storage Control 543 Version 2.04 write portions of individual entries. Below are shown the field definitions for the TLB entry. Translation Field Name Description Page Identification Fields RPN Real Page Number (up to 54 bits) Bits 0:n­1 of the RPN field are used to replace Name Description bits 0:n­1 of the effective address to produce EPN Effective Page Number (up to 54 bits) the real address for the storage access Bits 0:n­1 of the EPN field are compared to (where n=64­log2(page size in bytes) and bits 0:n­1 of the effective address (EA) of page size is specified by the SIZE field of the the storage access (where n=64­ TLB entry). Software must set unused low- log2(page size in bytes) and page size is order RPN bits (i.e. bits n:53) to 0. 
See Sec- specified by the SIZE field of the TLB entry). tion 4.7.3. See Table 1. Note: Bits X:Y of the RPN field may be imple- Note: Bits X:Y of the EPN field may be imple- mented, where X 0 and 53 Y. The num- mented, where X=0 or X=32, and Y 153. ber of bits implemented for EPN are not The number of bits implemented for EPN required to be the same number of bits as are not required to be the same number of are implemented for RPN. bits as are implemented for RPN. TS Translation Address Space This bit indicates the address space this TLB entry is associated with. For instruction stor- Storage Control Bits (see Section 4.8.3 on page 552) age accesses, MSRIS must match the value of TS in the TLB entry for that TLB entry to Name Description provide the translation. Likewise, for data W Write-Through Required See Section 1.6.1 storage accesses, MSRDS must match the of Book II. value of TS in the TLB entry. For tlbsx and I Caching Inhibited See Section 1.6.2 of tlbivax instructions, an implementation- Book II. dependent source provides the address M Memory Coherence Required See space specification that must match the Section 1.6.3 of Book II. value of TS. G Guarded See Section 1.6.4 of Book II and SIZE Page Size Section 4.8.1. The SIZE field specifies the size of the page E Endian Mode See Section 1.10.1 of Book I associated with the TLB entry as 4SIZEKB, and Section 1.6.5 of Book II. where 0 SIZE 15. Implementations may U0:U3 User-Definable Storage Control implement any one or more of these page Attributes See Section 4.8.2. sizes. See Table 1. Specifies implementation-dependent and sys- TID Translation ID (implementation-dependent tem-dependent storage control attributes for size) the page associated with the TLB entry. Field used to identify a shared page (TID=0) or VLE Variable Length Encoding [Category: VLE] the owner's process ID of a private page See Section 4.8.3 and Chapter 1 of Book (TID0). See Section 4.7.2. VLE. 
V Valid This bit indicates that this TLB entry is valid and may be used for translation. The Valid bit for a given entry can be set or cleared Access Control Fields with a tlbwe instruction; alternatively, the Name Description Valid bit for an entry may be cleared by a tlbivax instruction. UX User State Execute Enable See Section 4.7.4.1. 0 Instruction fetch and execution is not permit- ted from this page while MSRPR=1 and will cause an Execute Access Control exception type Instruction Storage interrupt. 1 Instruction fetch and execution is permitted from this page while MSRPR=1. 544 Power ISATM -- Book III-E Version 2.04 SX Supervisor State Execute Enable See Sec- 4.7.2 Page Identification tion 4.7.4.1. 0 Instruction fetch and execution is not permit- Instruction effective addresses are generated for ted from this page while MSRPR=0 and will sequential instruction fetches and for addresses that cause an Execute Access Control exception correspond to a change in program flow (branches, type Instruction Storage interrupt. interrupts). Data effective addresses are generated by 1 Instruction fetch and execution is permitted Load, Store, and Cache Management instructions. TLB from this page while MSRPR=1. Management instructions generate effective addresses UW User State Write Enable See Section to determine the presence of or to invalidate a specific 4.7.4.2. TLB entry associated with that address. 0 Store operations, including dcba dcbz, and The Valid (V) bit, Effective Page Number (EPN) field, dcbzep are not permitted to this page when Translation Space Identifier (TS) bit, Page Size (SIZE) MSRPR=1 and will cause a Write Access field, and Translation ID (TID) field of a particular TLB Control exception. Except as noted in entry identify the page associated with that TLB entry. 
Table 3 on page 550, a Write Access Control Except as noted, all comparisons must succeed to vali- exception will cause a Data Storage inter- date this entry for subsequent translation and access rupt. control processing. Failure to locate a matching TLB 1 Store operations, including dcba, dcbz, and entry based on this criteria for instruction fetches will dcbzep are permitted to this page when result in an Instruction TLB Miss exception type Instruc- MSRPR=1. tion TLB Error interrupt. Failure to locate a matching SW Supervisor State Write Enable See Section TLB entry based on this criteria for data storage 4.7.4.2. accesses will result in a Data TLB Miss exception which 0 Store operations, including dcba, dcbi, may result in a Data TLB Error interrupt. Figure 8 on dcbz, and dcbzep are not permitted to this page 546 illustrates the criteria for a virtual address to page when MSRPR=0. Store operations, match a specific TLB entry. including dcbi, dcbz, and dcbzep, will cause a Write Access Control exception. There are two address spaces, one typically associated Except as noted in Table 3 on page 550, a with interrupt-related storage accesses and one typi- Write Access Control exception will cause a cally associated with non-interrupt-related storage Data Storage interrupt. accesses. There are two bits in the Machine State Reg- 1 Store operations, including dcba, dcbi, ister, the Instruction Address Space bit (IS) and the dcbz, and dcbzep, are permitted to this Data Address Space bit (DS), that control which page when MSRPR=0. address space instruction and data storage accesses, UR User State Read Enable See Section respectively, are performed in, and a bit in the TLB 4.7.4.3. entry (TS) that specifies which address space that TLB 0 Load operations (including load-class Cache entry is associated with. 
Management instructions) are not permitted Load, Store, Cache Management, Branch, tlbsx, and from this page when MSRPR=1 and will tlbivax instructions and next-sequential-instruction cause a Read Access Control exception. fetches produce a 64-bit effective address. The virtual Except as noted in Table 3 on page 550, a address space is extended from this 64-bit effective Read Access Control exception will cause a address space by prepending a one-bit address space Data Storage interrupt. identifier and a process identifier. For instruction 1 Load operations (including load-class Cache fetches, the address space identifier is provided by Management instructions) are permitted MSRIS and the process identifier is provided by the from this page when MSRPR=1. contents of the Process ID Register. For data storage SR Supervisor State Read Enable See Section accesses, the address space identifier is provided by 4.7.4.3. the MSRDS and the process identifier is provided by the 0 Load operations (including load-class Cache contents of the Process ID Register. For tlbsx, and Management instructions) are not permitted tlbivax instructions, the address space identifier and from this page when MSRPR=0 and will the process identifier are provided by implementation- cause a Read Access Control exception. dependent sources. Except as noted in Table 3 on page 550, a Read Access Control exception will cause a This virtual address is used to locate the associated Data Storage interrupt. entry in the TLB. The address space identifier, the pro- 1 Load operations (including load-class Cache cess identifier, and the effective address of the storage Management instructions) are permitted access are compared to the Translation Address Space from this page when MSRPR=0. bit (TS), the Translation ID field (TID), and the value in the Effective Page Number field (EPN), respectively, of each TLB entry. Chapter 4. 
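
The comparison against the TS, TID, and EPN fields, and the use of the RPN field described in the field definitions above, can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not an implementation defined by the architecture: the `TlbEntry` class, its layout, and the function names are hypothetical (the architecture deliberately leaves the TLB entry format implementation-dependent), EPN and RPN are held as full 54-bit integers, and all 32 PID/TID bits are assumed to be implemented. Bit 0 is the most-significant bit, as in the rest of this document.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    """Hypothetical software model of a TLB entry (format is an assumption)."""
    v: int     # Valid bit
    ts: int    # Translation Address Space bit
    tid: int   # Translation ID; 0 marks a shared page
    size: int  # SIZE field, 0..15; page size is 4**SIZE KB
    epn: int   # Effective Page Number as a 54-bit value (EPN bits 0:53)
    rpn: int   # Real Page Number as a 54-bit value (RPN bits 0:53)


def page_bytes(size: int) -> int:
    """Page size in bytes for a SIZE field value: 4**SIZE KB."""
    return (4 ** size) * 1024


def page_number_bits(size: int) -> int:
    """n = 64 - log2(page size in bytes), which equals 54 - 2*SIZE."""
    return 54 - 2 * size


def matches(entry: TlbEntry, as_bit: int, pid: int, ea: int) -> bool:
    """Apply the V, TS, TID, and EPN comparisons to one entry."""
    n = page_number_bits(entry.size)
    ea_page = ea >> (64 - n)            # EA bits 0:n-1
    epn_page = entry.epn >> (54 - n)    # EPN bits 0:n-1
    return (entry.v == 1
            and entry.ts == as_bit
            and (entry.tid == 0 or entry.tid == pid)
            and epn_page == ea_page)


def translate(entry: TlbEntry, ea: int) -> int:
    """RA = RPN[0:n-1] || EA[n:63] for an entry that already matched."""
    n = page_number_bits(entry.size)
    offset_bits = 64 - n
    rpn_page = entry.rpn >> (54 - n)           # RPN bits 0:n-1
    offset = ea & ((1 << offset_bits) - 1)     # EA bits n:63
    return (rpn_page << offset_bits) | offset


# A private 4KB page (SIZE=1) owned by process 7 in address space 0.
entry = TlbEntry(v=1, ts=0, tid=7, size=1,
                 epn=0x12345 << 2, rpn=0xABCDE << 2)
if matches(entry, as_bit=0, pid=7, ea=0x12345678):
    print(hex(translate(entry, 0x12345678)))
```

For tlbsx and tlbivax, the `as_bit` and `pid` arguments would come from implementation-dependent sources rather than from the MSR and the Process ID Register.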
Storage Control 545 Version 2.04 The virtual address of a storage access matches a TLB match a specific virtual address exists, assuming a set- entry if, for every TLB entry i in the congruence class associative or fully-associative organization, doing so is specified by EA: a programming error and the results are undefined. 1 the value of the address specifier for the storage access (MSRIS for instruction fetches, MSRDS for Table 1: Page Size and Effective Address to EPN data storage accesses, and implementation- Comparison dependent source for tlbsx and tlbivax) is equal to Page Size EA to EPN Comparison SIZE the value of the TS bit of the TLB entry, and (4SIZEKB) (bits 0:53­2¥SIZE) =0b0000 1KB EPN0:53 =? EA0:53 1 either the value of the process identifier (Process =0b0001 4KB EPN0:51 =? EA0:51 ID Register for instruction and data storage =0b0010 16KB EPN0:49 =? EA0:49 accesses, and implementation-dependent source =0b0011 64KB EPN0:47 =? EA0:47 for tlbsx and tlbivax) is equal to the value in the =0b0100 256KB EPN0:45 =? EA0:45 TID field of the TLB entry, or the value of the TID =0b0101 1MB EPN0:43 =? EA0:43 field of the TLB entry is equal to 0, and =0b0110 4MB EPN0:41 =? EA0:41 1 the contents of bits 0:n­1 of the effective address =0b0111 16MB EPN0:39 =? EA0:39 of the storage or TLB access are equal to the value =0b1000 64MB EPN0:37 =? EA0:37 of bits 0:n-1 of the EPN field of the TLB entry =0b1001 256MB EPN0:35 =? EA0:35 (where n=64-log2(page size in bytes) and =0b1010 1GB EPN0:33 =? EA0:33 page size is specified by the value of the SIZE field =0b1011 4GB EPN0:31 =? EA0:31 of the TLB entry). See Table 1. =0b1100 16GB EPN0:29 =? EA0:29 =0b1101 64GB EPN0:27 =? EA0:27 A TLB Miss exception occurs if there is no valid entry in =0b1110 256GB EPN0:25 =? EA0:25 the TLB for the page specified by the virtual address =0b1111 1TB EPN0:23 =? EA0:23 (Instruction or Data TLB Error interrupt). 
Although the possibility to place multiple entries into the TLB that TLBentry[i][V] TLB entry i matches effective address TLBentry[i][TS] =? AS Process IDn:63 private page =? Legend: TLBentry[i][TID]n:63 =0? shared page AS EA {MSRIS for instruction fetches, or MSRDS for data storage accesses, or implementation-dependent for tlbsx & tlbivax effective address of storage access contents of Process ID Register for TLBentry[i][EPN]0:N-1 EA0:N-1 =? Process ID N-1 { instruction fetches and data storage accesses, or implementation-dependent for tlbsx & tlbivax 63 ­ log2(page size) n 64 ­ # of implemented PID/TID bits Figure 8. Virtual Address to TLB Entry Match Process 546 Power ISATM -- Book III-E Version 2.04 MSRDS for data storage accesses MSRIS for instruction fetch 64-bit Effective Address AS PID Effective Page Address Offset 0 n­1 n 63 Virtual Address TLB multiple-entry RPN0:53 Real Page Number Offset 0 n­1 n 63 NOTE: n = 64­log2(page size) 64-bit Real Address Figure 9. Effective-to-Real Address Translation Flow Chapter 4. Storage Control 547 Version 2.04 4.7.3 Address Translation The Real Page Number field (RPN) of the matching TLB entry provides the translation for the effective A program references memory by using the effective address of the storage access. Based on the setting of address computed by the processor when it executes a the SIZE field of the matching TLB entry, the RPN field Load, Store, Cache Management, or Branch instruc- replaces the corresponding most-significant N bits of tion, and when it fetches the next instruction. The effec- the effective address (where N = 64 ­ log2(page size)), tive address is translated to a real address according to as shown in Table 2, to produce the 64-bit real address the procedures described in this section. The storage that is to be presented to main storage to perform the subsystem uses the real address for the access. All storage access. 
storage access effective addresses are translated to real addresses using the TLB mechanism. See Figure 9. Table 2: Effective Address to Real Address Page RPN Bits If the virtual address of the storage access matches a Size Required TLB entry in accordance with the selection criteria SIZE Real Address (4SIZE to be Equal specified in Section 4.7.2, the value of the Real Page KB) to 0 Number field (RPN) of the selected TLB entry provides the real page number portion of the real address. Let =0b0000 1KB none RPN0:53 || EA54:63 n=64­log2(page size in bytes) where page size is spec- =0b0001 4KB RPN52:53=0 RPN0:51 || EA52:63 ified by the SIZE field of the TLB entry. Bits n:63 of the =0b0010 16KB RPN50:53=0 RPN0:49 || EA50:63 effective address are appended to bits 0:n­1 of the 54- =0b0011 64KB RPN48:53=0 RPN0:47 || EA48:63 bit RPN field of the selected TLB entry to produce the =0b0100 256KB RPN46:53=0 RPN0:45 || EA46:63 64-bit real address (i.e. RA = RPN0:n­1 || EAn:63). The =0b0101 1MB RPN44:53=0 RPN0:43 || EA44:63 page size is determined by the value of the SIZE field =0b0110 4MB RPN42:53=0 RPN0:41 || EA42:63 of the selected TLB entry. See Table 2. =0b0111 16MB RPN40:53=0 RPN0:39 || EA40:63 =0b1000 64MB RPN38:53=0 RPN0:37 || EA38:63 The rest of the selected TLB entry provides the access =0b1001 256MB RPN36:53=0 RPN0:35 || EA36:63 control bits (UX, SX, UW, SW, UR, SR), and storage =0b1010 1GB RPN34:53=0 RPN0:33 || EA34:63 control attributes (U0, U1, U2, U3, W, I, M, G, E) for the =0b1011 4GB RPN32:53=0 RPN0:31 || EA32:63 storage access. The access control bits and storage =0b1100 16GB RPN30:53=0 RPN0:29 || EA30:63 attribute bits specify whether or not the access is =0b1101 64GB RPN28:53=0 RPN0:27 || EA28:63 allowed and how the access is to be performed. See =0b1110 256GB RPN26:53=0 RPN0:25 || EA26:63 Sections 4.7.4 and 4.7.5. 
=0b1111 1TB RPN24:53=0 RPN0:23 || EA24:63 TLB match (see Figure 8) access granted MSRPR instruction fetch TLBentry[UX] TLBentry[SX] load-class data storage access TLBentry[UR] TLBentry[SR] store-class data storage access TLBentry[UW] TLBentry[SW] Figure 10. Access Control Process 548 Power ISATM -- Book III-E Version 2.04 4.7.4 Storage Access Control Store operations (including Store-class Cache Man- agement instructions) are permitted to a page in stor- After a matching TLB entry has been identified, an age while in user state (MSRPR=1) if the UW access access control mechanism selectively grants shared control bit for that page is equal to 1. If the UW access access, grants execute access, grants read access, control bit is equal to 0, then execution of the Store grants write access, and prohibits access to areas of instruction is suppressed and a Write Access Control storage based on a number of criteria. Figure 10 illus- exception type Data Storage interrupt is taken. trates the access control process and is described in Store operations (including Store-class Cache Man- detail in Sections 4.7.4.1 through 4.7.4.5. agement instructions) are permitted to a page in stor- An Execute, Read, or Write Access Control exception age while in supervisor state (MSRPR=0) if the SW occurs if the appropriate TLB entry is found but the access control bit for that page is equal to 1. If the SW access is not allowed by the access control mechanism access control bit is equal to 0, then execution of the (Instruction or Data Storage interrupt). See Section 5.6 Store instruction is suppressed and a Write Access for additional information about these and other inter- Control exception type Data Storage interrupt is taken. rupt types. In certain cases, Execute, Read, and Write Access Control exceptions may result in the restart of 4.7.4.3 Read Access (re-execution of at least part of) a Load or Store instruc- tion. 
The UR and SR bits of the TLB entry control read access to the page (see Table 3). Some implementation may provide additional access control capabilities beyond that described here. Load operations (including Load-class Cache Manage- ment instructions) are permitted from a page in storage while in user state (MSRPR=1) if the UR access control 4.7.4.1 Execute Access bit for that page is equal to 1. If the UR access control The UX and SX bits of the TLB entry control execute bit is equal to 0, then execution of the Load instruction access to the page (see Table 3). is suppressed and a Read Access Control exception type Data Storage interrupt is taken. Instructions may be fetched and executed from a page in storage while in user state (MSRPR=1) if the UX Load operations (including Load-class Cache Manage- access control bit for that page is equal to 1. If the UX ment instructions) are permitted from a page in storage access control bit is equal to 0, then instructions from while in supervisor state (MSRPR=0) if the SR access that page will not be fetched, and will not be placed into control bit for that page is equal to 1. If the SR access any cache as the result of a fetch request to that page control bit is equal to 0, then execution of the Load while in user state. instruction is suppressed and a Read Access Control exception type Data Storage interrupt is taken. Instructions may be fetched and executed from a page in storage while in supervisor state (MSRPR=0) if the SX access control bit for that page is equal to 1. If the 4.7.4.4 Storage Access Control Applied SX access control bit is equal to 0, then instructions to Cache Management Instructions from that page will not be fetched, and will not be placed into any cache as the result of a fetch request to dcbi, dcbz, and dcbzep instructions are treated as that page while in supervisor state. Stores since they can change data (or cause loss of data by invalidating a dirty line). 
As such, they both can Instructions from no-execute storage may be in the cause Write Access Control exception type Data Stor- instruction cache if they were fetched into that cache age interrupts. If an implementation first flushes a line when their effective addresses were mapped to exe- before invalidating it during a dcbi, the dcbi is treated cute permitted storage. Software need not flush a page as a a Load since the data is not modified. from the instruction cache before marking it no-exe- cute. dcba instructions are treated as Stores since they can change data. As such, they can cause Write Access Furthermore, if the sequential execution model calls for Control exceptions. However, such exceptions will not the execution of an instruction from a page that is not result in a Data Storage interrupt. enabled for execution (i.e. UX=0 when MSRPR=1 or SX=0 when MSRPR=0), an Execute Access Control icbi and icbiep instructions are treated as Loads with exception type Instruction Storage interrupt is taken. respect to protection. As such, they can cause Read Access Control exception type Data Storage interrupts. 4.7.4.2 Write Access dcbt, dcbtep, dcbtst, dcbtstep, and icbt instructions are treated as Loads with respect to protection. As The UW and SW bits of the TLB entry control write such, they can cause Read Access Control exceptions. access to the page (seeTable 3 ). However, such exceptions will not result in a Data Stor- age interrupt. Chapter 4. Storage Control 549 Version 2.04 dcbf, dcbfep, dcbst, and dcbstep instructions are 4.7.5 TLB Management treated as Loads with respect to protection. Flushing or storing a line from the cache is not considered a Store No format for the Page Tables or the Page Table Entries since the store has already been done to update the is implied. Software has significant flexibility in imple- cache and the dcbf, dcbfep, dcbst, or dcbstep menting a custom replacement strategy. 
For example, instruction is only updating the copy in main storage. As software may choose to lock TLB entries that corre- a Load, they can cause Read Access Control exception spond to frequently used storage, so that those entries type Data Storage interrupts. are never cast out of the TLB and TLB Miss exceptions to those pages never occur. At a minimum, software Table 3: Storage Access Control Applied to Cache must maintain an entry or entries for the Instruction and Instructions Data TLB Error interrupt handlers. Read Protection Write Protection TLB management is performed in software with some Instruction Violation Violation hardware assist. This hardware assist consists of a dcba No Yes2 minimum of: dcbf Yes No 1 Automatic recording of the effective address caus- dcbfep Yes No ing a TLB Miss exception. For Instruction TLB Miss exceptions, the address is saved in the Save/ dcbi Yes3 Yes3 Restore Register 0. For Data TLB Miss exceptions, dcblc Yes No the address is saved in the Data Exception dcbst Yes No Address Register. dcbstep Yes No 1 Instructions for reading, writing, searching, invali- dcbt Yes 1 No dating, and synchronizing the TLB (see Section 4.9.4.1). dcbtep Yes1 No dcbtls Yes No dcbtst Yes1 No dcbtstep Yes1 No dcbtstls Yes4 Yes4 dcbz No Yes dcbzep No Yes dci No No icbi Yes No icbiep Yes No icblc Yes5 No icbt Yes1 No icbtls Yes5 No ici No No 1. dcbt, dcbtep, dcbtst, dcbtstep, and icbt may cause a Read Access Control exception but does not result in a Data Storage interrupt. 2. dcba may cause a Write Access Control exception but does not result in a Data Storage interrupt. 3. dcbi may cause a Read or Write Access Control Exception based on whether the data is flushed prior to invalidation. 4. It is implementation-dependent whether dcbtstls is treated as a Load or a Store. 5. icbtls and icblc require execute or read access. 
4.7.4.5 Storage Access Control Applied to String Instructions

When the string length is zero, neither lswx nor stswx can cause Data Storage interrupts.

Programming Note

    This Note suggests one example for managing reference and change recording.

    When performing physical page management, it is useful to know whether a given physical page has been referenced or altered. Note that this may be more involved than whether a given TLB entry has been used to reference or alter memory, since multiple TLB entries may translate to the same physical page. If it is necessary to replace the contents of some physical page with other contents, a page which has been referenced (accessed for any purpose) is more likely to be maintained than a page which has never been referenced. If the contents of a given physical page are to be replaced, then the contents of that page must be written to the backing store before replacement, if anything in that page has been changed. Software must maintain records to control this process.

    Similarly, when performing TLB management, it is useful to know whether a given TLB entry has been referenced. When making a decision about which entry to cast out of the TLB, an entry which has been referenced is more likely to be maintained in the TLB than an entry which has never been referenced.

    Execute, Read and Write Access Control exceptions may be used to allow software to maintain reference information for a TLB entry and for its associated physical page. The entry is built, with its UX, SX, UR, SR, UW, and SW bits off, and the index and effective page number of the entry retained by software. The first attempt of application code to use the page will cause an Access Control exception (because the entry is marked "No Execute", "No Read", and "No Write"). The Instruction or Data Storage interrupt handler records the reference to the TLB entry and to the associated physical page in a software table, and then turns on the appropriate access control bit. An initial read from the page could be handled by only turning on the appropriate UR or SR access control bits, leaving the page "read-only". Subsequent execute, read, or write accesses to the page via this TLB entry will proceed normally.

    In a demand-paged environment, when the contents of a physical page are to be replaced, if any storage in that physical page has been altered, then the backing storage must be updated. The information that a physical page is dirty is typically recorded in a "Change" bit for that page.

    Write Access Control exceptions may be used to allow software to maintain change information for a physical page. For the example just given for reference recording, the first write access to the page via the TLB entry will create a Write Access Control exception type Data Storage interrupt. The Data Storage interrupt handler records the change status to the physical page in a software table, and then turns on the appropriate UW and SW bits. All subsequent accesses to the page via this TLB entry will proceed normally.

4.8 Storage Control Attributes

This section describes aspects of the storage control attributes that are relevant only to privileged software programmers. The rest of the description of storage control attributes may be found in Section 1.6 of Book II and subsections.

4.8.1 Guarded Storage

Storage is said to be "well-behaved" if the corresponding real storage exists and is not defective, and if the effects of a single access to it are indistinguishable from the effects of multiple identical accesses to it. Data and instructions can be fetched out-of-order from well-behaved storage without causing undesired side effects.

Storage is said to be Guarded if the G bit is 1 in the TLB entry that translates the effective address.

In general, storage that is not well-behaved should be Guarded. Because such storage may represent a control register on an I/O device or may include locations that do not exist, an out-of-order access to such storage may cause an I/O device to perform unintended operations or may result in a Machine Check.

Instruction fetching is not affected by the G bit. Software must set guarded pages to no execute (i.e. UX=0 and SX=0) to prevent instruction fetching from guarded storage.

The following rules apply to in-order execution of Load and Store instructions for which the first byte of the storage operand is in storage that is both Caching Inhibited and Guarded.

- Load or Store instruction that causes an atomic access
  If any portion of the storage operand has been accessed, the instruction completes before the interrupt occurs if any of the following exceptions is pending.
  - External, Decrementer, Critical Input, Machine Check, Fixed-Interval Timer, Watchdog Timer, Debug, or Imprecise mode Floating-Point or Auxiliary Processor Enabled
- Load or Store instruction that causes an Alignment exception, a Data TLB Error exception, or that causes a Data Storage exception.
  The portion of the storage operand that is in Caching Inhibited and Guarded storage is not accessed.

4.8.1.1 Out-of-Order Accesses to Guarded Storage

In general, Guarded storage is not accessed out-of-order. The only exceptions to this rule are the following.

Load Instruction
If a copy of any byte of the storage operand is in a cache then that byte may be accessed in the cache or in main storage.

4.8.2 User-Definable

User-definable storage control attributes control user-definable and implementation-dependent behavior of the storage system. These bits are both implementation-dependent and system-dependent in their effect. They may be used in any combination and also in combination with the other storage attribute bits.

4.8.3 Storage Control Bits

Storage control attributes are specified on a per-page basis. These attributes are specified in storage control bits in the TLB entries. The interpretation of their values is given in Figure 11.

    Bit         Storage Control Attribute
    W [1]       0 - not Write Through Required
                1 - Write Through Required
    I           0 - not Caching Inhibited
                1 - Caching Inhibited
    M [2]       0 - not Memory Coherence Required
                1 - Memory Coherence Required
    G           0 - not Guarded
                1 - Guarded
    E [3]       0 - Big-Endian
                1 - Little-Endian
    U0-U3 [4]   User-Definable
    VLE [5]     0 - non Variable Length Encoding (VLE)
                1 - VLE

    [1] Support for the 1 value of the W bit is optional. Implementations that do not support the 1 value treat the bit as reserved and assume its value to be 0.
    [2] Support of the 1 value is optional for implementations that do not support multiprocessing, implementations that do not support this storage attribute assume the value of the bit to be 0, and setting M=1 in a TLB entry will have no effect.
    [3] [Category: Embedded.Little-Endian]
    [4] Support for these attributes is optional.
    [5] [Category: VLE]

Figure 11. Storage control bits

In Section 4.8.3.1 and 4.8.3.2, "access" includes accesses that are performed out-of-order.

Programming Note

    In a uniprocessor system in which only the processor has caches, correct coherent execution does not require the processor to access storage as Memory Coherence Required, and accessing storage as not Memory Coherence Required may give better performance.

4.8.3.1 Storage Control Bit Restrictions

All combinations of W, I, M, G, and E values are permitted except those for which both W and I are 1.

Programming Note

    If an application program requests both the Write Through Required and the Caching Inhibited attributes for a given storage location, the operating system should set the I bit to 1 and the W bit to 0.

At any given time, the value of the I bit must be the same for all accesses to a given real page.
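The restriction on W and I, and the Programming Note's advice for resolving a request for both, can be sketched in a few lines of Python. This is an illustrative model only: the class name StorageControlBits and the functions is_permitted and resolve_request are hypothetical names introduced here, not part of the architecture.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StorageControlBits:
    """W, I, M, G and E bits of a TLB entry (hypothetical container, each field 0 or 1)."""
    w: int = 0
    i: int = 0
    m: int = 0
    g: int = 0
    e: int = 0


def is_permitted(bits):
    """Return True unless both W and I are 1, the only combination that is not permitted."""
    for value in (bits.w, bits.i, bits.m, bits.g, bits.e):
        if value not in (0, 1):
            raise ValueError("each storage control bit must be 0 or 1")
    return not (bits.w == 1 and bits.i == 1)


def resolve_request(bits):
    """If both W and I are requested, keep I=1 and force W=0, as the Programming Note advises."""
    if bits.w == 1 and bits.i == 1:
        return replace(bits, w=0)
    return bits
```

An operating system could apply resolve_request to an application's requested attributes before building the TLB entry, then use is_permitted as a final check.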
Accesses to the same storage location using two effective addresses for which the W bit differs meet the memory coherence requirements described in Section 1.6.3 of Book II if the accesses are performed by a single processor. If the accesses are performed by two or more processors, coherence is enforced by the hardware only if the W bit is the same for all the accesses.

At any given time, data accesses to a given real page may use both Endian modes. When changing the Endian mode of a given real page for instruction fetching, care must be taken to prevent accesses while the change is made and to flush the instruction cache(s) after the change has been completed.

4.8.3.2 Altering the Storage Control Bits

When changing the value of the I bit for a given real page from 0 to 1, software must set the I bit to 1 and then flush all copies of locations in the page from the caches using dcbf, dcbfep, or dcbi, and icbi or icbiep before permitting any other accesses to the page.

When changing the value of the W bit for a given real page from 0 to 1, software must ensure that no processor modifies any location in the page until after all copies of locations in the page that are considered to be modified in the data caches have been copied to main storage using dcbst, dcbstep, dcbf, dcbfep, or dcbi.

When changing the value of the M bit for a given real page, software must ensure that all data caches are consistent with main storage. The actions required to do this are system-dependent.

Programming Note

    For example, when changing the M bit in some directory-based systems, software may be required to execute dcbf or dcbfep on each processor to flush all storage locations accessed with the old M value before permitting the locations to be accessed with the new M value.
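The ordering required when changing the I bit of a page from 0 to 1 can be written out as a step list. The Python sketch below only generates that ordered list; it does not execute any instruction. The function name caching_inhibit_plan, the 4096-byte page size and the 64-byte cache block size are assumptions made for illustration (both sizes are implementation-dependent), and any synchronization instructions an implementation may need between steps are omitted.

```python
def caching_inhibit_plan(page_base, page_size=4096, block_size=64):
    """Return the ordered (operation, address) steps for changing I from 0 to 1 on one page.

    The sizes are illustrative assumptions; real values are implementation-dependent.
    """
    if page_size % block_size != 0:
        raise ValueError("page size must be a multiple of the cache block size")
    if page_base % page_size != 0:
        raise ValueError("page base must be page-aligned")

    # Step 1: set the I bit first, so no new cacheable copies are created.
    steps = [("set I=1 in TLB entry", page_base)]

    # Step 2: flush every block of the page from the data and instruction caches.
    for addr in range(page_base, page_base + page_size, block_size):
        steps.append(("dcbf", addr))  # flush any data cache copy
        steps.append(("icbi", addr))  # invalidate any instruction cache copy

    # Step 3: only now may other accesses to the page be permitted.
    steps.append(("permit other accesses", page_base))
    return steps
```

For a 4096-byte page and 64-byte blocks the plan has 130 steps: one TLB update, 64 dcbf and icbi pairs, and the final release of the page.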
4.9 Storage Control Instructions

4.9.1 Cache Management Instructions

This section describes aspects of cache management that are relevant only to privileged software programmers.

For a dcbz or dcba instruction that causes the target block to be newly established in the data cache without being fetched from main storage, the processor need not verify that the associated real address is valid. The existence of a data cache block that is associated with an invalid real address (see Section 4.6) can cause a delayed Machine Check interrupt or a delayed Checkstop.

Each implementation provides an efficient means by which software can ensure that all blocks that are considered to be modified in the data cache have been copied to main storage before the processor enters any power conserving mode in which data cache contents are not maintained.

Data Cache Block Invalidate X-form

dcbi RA,RB

    | 31 | /// | RA | RB | 470 | / |
      0    6     11   16   21    31

    if RA=0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    InvalidateDataCacheBlock( EA )

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any processors, then the block is invalidated in those data caches. On some implementations, before the block is invalidated, if any locations in the block are considered to be modified in any such data cache, those locations are written to main storage and additional locations in the block may be written to main storage.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of this processor, then the block is invalidated in that data cache. On some implementations, before the block is invalidated, if any locations in the block are considered to be modified in that data cache, those locations are written to main storage and additional locations in the block may be written to main storage.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

This instruction is treated as a Store (see Section 4.7.4.4) on implementations that invalidate a block without first writing to main storage all locations in the block that are considered to be modified in the data cache, except that the invalidation is not ordered by mbar. On other implementations this instruction is treated as a Load (see the section cited above).

If a processor holds a reservation and some other processor executes a dcbi to the same reservation granule, whether the reservation is lost is undefined.

dcbi may cause a cache locking exception, the details of which are implementation-dependent.

This instruction is privileged.

Special Registers Altered:
    None

4.9.2 Cache Locking [Category: Embedded Cache Locking]

The Embedded Cache Locking category defines instructions and methods for locking cache blocks for frequently used instructions and data. Cache locking allows software to instruct the cache to keep latency sensitive data readily available for fast access. This is accomplished by marking individual cache blocks as locked.

A locked block differs from a normal block in the cache in the following way:

- blocks that are locked in the cache do not participate in the normal replacement policy when a block must be replaced.

4.9.2.1 Lock Setting and Clearing

Blocks are locked into the cache by software using Cache Locking instructions. The following instructions are provided to lock data items into the data and instruction cache:

- dcbtls - Data cache block touch and lock set.
- dcbtstls - Data cache block touch for store and lock set.
- icbtls - Instruction cache block touch and lock set.

The RA and RB operands in these instructions are used to identify the block to be locked. The CT field indicates which cache in the cache hierarchy should be targeted. (See Section 3.2 of Book II.)

These instructions are similar in nature to the dcbt, dcbtst, and icbt instructions, but are not hints and thus locking instructions do not execute speculatively and may cause additional exceptions. For unified caches, both the instruction lock set and the data lock set target the same cache.

Similarly, blocks are unlocked from the cache by software using Lock Clear instructions. The following instructions are provided to unlock instructions and data in their respective caches:

- dcblc - Data cache block lock clear.
- icblc - Instruction cache block lock clear.

The RA and RB operands in these instructions are used to identify the block to be unlocked. The CT field indicates which cache in the cache hierarchy should be targeted.

Additionally, an implementation-dependent method can be provided for software to clear all the locks in the cache.

An implementation is not required to unlock blocks that contain data that has been invalidated unless it is explicitly unlocked with a dcblc or icblc instruction; if the implementation does not unlock the block upon invalidation, the block remains locked even though it contains invalid data. If the implementation does not clear locks when the associated block is invalidated, the method of locking is said to be persistent; otherwise it is not persistent. An implementation may choose to implement locks as persistent or not persistent; however, the preferred method is persistent.

It is implementation-dependent if cache blocks are implicitly unlocked in the following ways:

- A locked block is invalidated as the result of a dcbi, dcbf, dcbfep, icbi, or icbiep instruction.
- A locked block is evicted because of an overlocking condition.
- A snoop hit on a locked block that requires the block to be invalidated. This can occur because the data the block contains has been modified external to the processor, or another processor has explicitly invalidated the block.
- The entire cache containing the locked block is invalidated.

4.9.2.2 Error Conditions

Setting locks in the cache can fail for a variety of reasons. A Lock Set instruction addressing a byte in storage that is not allowed to be accessed by the storage access control mechanism (see Section 4.7.4) will cause a Data Storage interrupt (DSI). Addresses referenced by Cache Locking instructions are always translated as data references; therefore, icbtls instructions that fail to translate or are not allowed by the storage access control mechanism cause Data TLB Error interrupts and Data Storage interrupts, respectively. Additionally, cache locking and clearing operations can fail due to non-privileged access. The methods for determining other failure conditions, such as unable-to-lock or overlocking (see below), are implementation-dependent.

When a Cache Locking instruction is executed in user mode and MSR[UCLE] is 0, a Data Storage interrupt occurs and one of the following ESR bits is set to 1.

    Bit   Description
    42    DLK0
          0  Default setting.
          1  A dcbtls, dcbtstls, or dcblc instruction was executed in user mode.
    43    DLK1
          0  Default setting.
          1  An icbtls or icblc instruction was executed in user mode.

4.9.2.2.1 Overlocking

If no exceptions occur for the execution of a dcbtls, dcbtstls, or icbtls instruction, an attempt is made to lock the specified block into the cache. If all of the available cache blocks into which the specified block may be loaded are already locked, an overlocking condition occurs. The overlocking condition may be reported in an implementation-dependent manner.

If an overlocking condition occurs, it is implementation-dependent whether the specified block is not locked into the cache or if another locked block is evicted and the specified block is locked.

The selection of which block is replaced in an overlocking situation is implementation-dependent. The overlocking condition is still said to exist, and is reflected in any implementation-dependent overlocking status.

An attempt to lock a block that is already present and valid in the cache will not cause an overlocking condition.

If a cache block is to be loaded because of an instruction other than a Cache Management or Cache Locking instruction and all available blocks into which the block can be loaded are locked, the instruction executes and completes, but no cache blocks are unlocked and the block is not loaded into the cache.

Programming Note

    Since caches may be shared among processors, an overlocking condition may occur when loading a block even though a given processor has not locked all the available cache blocks. Similarly, blocks may be unlocked as a result of invalidations by other processors.

4.9.2.2.2 Unable-to-lock and Unable-to-unlock Conditions

If no exceptions occur and no overlocking condition exists, an attempt to set or unlock a lock may fail if any of the following are true:

- The target address is marked Caching Inhibited, or the storage attributes of the address use a coherency protocol that does not support locking.
- The target cache is disabled or not present.
- The CT field of the instructions contains a value not supported by the implementation.
- Any other implementation-specific error conditions are detected.
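The overlocking and unable-to-lock rules above can be made concrete with a toy model of a single cache set. The Python sketch below is not an implementation of any real cache: the class LockableSet and its methods are hypothetical, and where the architecture leaves behavior implementation-dependent the model simply picks one option (on overlock it does not lock the new block, and it evicts the first unlocked resident block when it needs room).

```python
class LockableSet:
    """Toy model of one cache set with a fixed number of ways (illustrative only)."""

    def __init__(self, ways):
        self.ways = ways
        self.resident = []          # block addresses currently held in the set
        self.locked = set()         # subset of resident blocks that are locked
        self.overlock_seen = False  # stands in for implementation-dependent status

    def _make_room(self):
        """Free a way if needed, never evicting a locked block. Return False if impossible."""
        if len(self.resident) < self.ways:
            return True
        for candidate in self.resident:
            if candidate not in self.locked:
                self.resident.remove(candidate)
                return True
        return False

    def lock(self, block, caching_inhibited=False, cache_enabled=True):
        """Model a lock set instruction once no exception has occurred."""
        if caching_inhibited or not cache_enabled:
            return "unable-to-lock"   # the lock set is a no-op in this case
        if block in self.resident:
            self.locked.add(block)    # already present: never an overlock
            return "locked"
        if not self._make_room():
            self.overlock_seen = True
            return "overlock"         # this model chooses not to lock the block
        self.resident.append(block)
        self.locked.add(block)
        return "locked"

    def unlock(self, block):
        """Model a lock clear instruction; unlocking a block that is not locked does nothing."""
        if block not in self.locked:
            return "no-op"
        self.locked.discard(block)
        return "unlocked"

    def fill(self, block):
        """Model a fill caused by an ordinary load or store. Return True if the block is cached."""
        if block in self.resident:
            return True
        if not self._make_room():
            return False              # all ways locked: the access completes uncached
        self.resident.append(block)
        return True
```

With a two-way set, locking two blocks and then attempting a third reports an overlock, while re-locking a block that is already present succeeds without one.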
If an unable-to-lock or unable-to-unlock condition occurs, the lock set or unlock instruction is treated as a no-op and the condition may be reported in an implementation-dependent manner.

4.9.2.3 Cache Locking Instructions

Data Cache Block Touch and Lock Set X-form

dcbtls CT,RA,RB

    | 31 | / | CT | RA | RB | 166 | / |
      0    6   7    11   16   21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtls instruction provides a hint that the program will probably soon load from the block containing the byte addressed by EA, and that the block containing the byte addressed by EA is to be loaded and locked into the cache specified by the CT field. (See Section 3.2 of Book II.) If the CT field is set to a value not supported by the implementation, no operation is performed.

If the block already exists in the cache, the block is locked without accessing storage. If the block is in a storage location that is Caching Inhibited, then no cache operation is performed. An unable-to-lock condition may occur (see Section 4.9.2.2.2), or an overlocking condition may occur (see Section 4.9.2.2.1).

The dcbtls instruction may complete before the operation it causes has been performed.

The instruction is treated as a Load.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered:
    None

Data Cache Block Touch for Store and Lock Set X-form

dcbtstls CT,RA,RB

    | 31 | / | CT | RA | RB | 134 | / |
      0    6   7    11   16   21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtstls instruction provides a hint that the program will probably soon store to the block containing the byte addressed by EA, and that the block containing the byte addressed by EA is to be loaded and locked into the cache specified by the CT field. (See Section 3.2 of Book II.) If the CT field is set to a value not supported by the implementation, no operation is performed.

If the block already exists in the cache, the block is locked without accessing storage. If the block is in a storage location that is Caching Inhibited, then no cache operation is performed. An unable-to-lock condition may occur (see Section 4.9.2.2.2), or an overlocking condition may occur (see Section 4.9.2.2.1).

The dcbtstls instruction may complete before the operation it causes has been performed.

It is implementation-dependent whether the instruction is treated as a Load or a Store.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered:
    None

Instruction Cache Block Touch and Lock Set X-form

icbtls CT,RA,RB

    | 31 | / | CT | RA | RB | 486 | / |
      0    6   7    11   16   21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The icbtls instruction causes the block containing the byte addressed by EA to be loaded and locked into the instruction cache specified by CT, and provides a hint that the program will probably soon execute code from the block. See Section 3.2 of Book II for a definition of the CT field.

If the block already exists in the cache, the block is locked without refetching from memory. If the block is in storage that is Caching Inhibited, no cache operation is performed.

This instruction is treated as a Load (see Section 3.2), except that the system instruction storage error handler is not invoked.

An unable-to-lock condition may occur (see Section 4.9.2.2.2), or an overlocking condition may occur (see Section 4.9.2.2.1).

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered:
    None

Instruction Cache Block Lock Clear X-form

icblc CT,RA,RB

    | 31 | / | CT | RA | RB | 230 | / |
      0    6   7    11   16   21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The block containing the byte addressed by EA in the instruction cache specified by the CT field is unlocked.

The instruction is treated as a Load.

An unable-to-unlock condition may occur (see Section 4.9.2.2.2). If the block containing the byte addressed by EA is not locked in the specified cache, no cache operation is performed.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered:
    None

Data Cache Block Lock Clear X-form

dcblc CT,RA,RB

    | 31 | / | CT | RA | RB | 390 | / |
      0    6   7    11   16   21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The block containing the byte addressed by EA in the data cache specified by the CT field is unlocked.

The instruction is treated as a Load.

An unable-to-unlock condition may occur (see Section 4.9.2.2.2). If the block containing the byte addressed by EA is not locked in the specified cache, no cache operation is performed.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered:
    None

Programming Note

    The dcblc and icblc instructions are used to remove locks previously set by the corresponding lock set instructions.

4.9.3 Synchronize Instruction

The Synchronize instruction is described in Section 3.3.3 of Book II, but only at the level required by an application programmer. This section describes properties of the instruction that are relevant only to operating system programmers.

In conjunction with the tlbie and tlbsync instructions, the sync instruction provides an ordering function for TLB invalidations and related storage accesses on other processors as described in the tlbsync instruction description (see Section 4.9.4.1).

4.9.4 Lookaside Buffer Management

All implementations include a TLB as the architected repository of translation, protection, and attribute information for storage.

Each implementation that has a TLB or similar lookaside buffer provides a means by which software can invalidate the lookaside entry that translates a given effective address.

Programming Note

    The invalidate all entries function is not required because each TLB entry can be addressed directly without regard to the contents of the entry.

In addition, implementations provide a means by which software can do the following.

- Read a specified TLB entry
- Identify the TLB entry (if any) associated with a specified effective address
- Write a specified TLB entry

Programming Note

    Because the presence, absence, and exact semantics of the TLB Management instructions are implementation-dependent, it is recommended that system software "encapsulate" uses of these instructions into subroutines to minimize the impact of moving from one implementation to another.

4.9.4.1 TLB Management Instructions

The tlbivax instruction is used to invalidate TLB entries. Additional instructions are used to read, write, and search TLB entries, and to provide an ordering function for the effects of tlbivax.

TLB Invalidate Virtual Address Indexed X-form

tlbivax (implementation dependent)

    | 31 | ??? | ??? | ??? | 786 | / |
      0    6     11    16    21    31

Bits 6:20 of the instruction encoding are implementation-dependent, and may be used to specify the TLB entry or entries to be invalidated. (E.g. they may specify virtual or effective addresses.)

If a single tlbivax instruction can invalidate more entries than those corresponding to a single VA, a means must be provided to prevent specific TLB entries from being invalidated.

If the Translation Lookaside Buffer (TLB) contains an entry specified, the entry or entries are made invalid (i.e. removed from the TLB). This instruction causes the target TLB entry to be invalidated in all processors.

If the instruction specifies a TLB entry that does not exist, the results are undefined.

Execution of this instruction may cause other implementation-dependent effects.

The operation performed by this instruction is ordered by the mbar (or sync) instruction with respect to a subsequent tlbsync instruction executed by the processor executing the tlbivax instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations which is independent of the other sets that mbar orders.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note

    The effects of the invalidation may not be visible until the completion of a context synchronizing operation (see Section 1.6.1).

Programming Note

    Care must be taken not to invalidate any TLB entry that contains the mapping for any interrupt vector.

TLB Read Entry X-form

tlbre (implementation dependent)

    | 31 | ??? | ??? | ??? | 946 | / |
      0    6     11    16    21    31

Bits 6:20 of the instruction encoding are implementation-dependent, and may be used to specify the source TLB entry, the source portion of the source TLB entry, and the target resource that the result is placed into.

The implementation-dependent-specified TLB entry is read, and the implementation-dependent-specified portion of the TLB entry is extracted and placed into an implementation-dependent target resource.

If the instruction specifies a TLB entry that does not exist, the results are undefined.

Execution of this instruction may cause other implementation-dependent effects.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent

TLB Search Indexed X-form

tlbsx RA,RB, (implementation dependent)

    | 31 | ??? | RA | RB | 914 | ? |
      0    6     11   16   21    31

    if RA=0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    AS ← implementation-dependent value
    ProcessID ← implementation-dependent value
    VA ← AS || ProcessID || EA
    If there is a TLB entry for which TLBentry[VA]=VA
        then result ← implementation-dependent value
        else result ← undefined
    target resource(???) ← result

Let the effective address (EA) be the sum (RA|0)+(RB).

Let address space (AS) be defined as implementation-dependent (e.g. could be MSR[DS] or a bit from an implementation-dependent SPR).

Let the ProcessID be defined as implementation-dependent (e.g. could be from the PID register or from an implementation-dependent SPR).

Let the virtual address (VA) be the value AS || ProcessID || EA. See Figure 9.

Bits 6:10 of the instruction encoding are implementation-dependent, and may be used to specify the target resource that the result of the instruction is placed into.

If the Translation Lookaside Buffer (TLB) contains an entry corresponding to VA, an implementation-dependent value is placed into an implementation-dependent-specified target. Otherwise the contents of the implementation-dependent-specified target are left undefined.

Bit 31 of the instruction encoding is implementation-dependent. For example, bit 31 may be interpreted as an "Rc" bit, used to enable recording the success or failure of the search operation.

This instruction is privileged.

Special Registers Altered:
    None

TLB Synchronize X-form

tlbsync

    | 31 | /// | /// | /// | 566 | / |
      0    6     11    16    21    31

The tlbsync instruction provides an ordering function for the effects of all tlbivax instructions executed by the processor executing the tlbsync instruction, with respect to the memory barrier created by a subsequent sync instruction executed by the same processor. Executing a tlbsync instruction ensures that all of the following will occur.

- All storage accesses by other processors for which the address was translated using the translations being invalidated will have been performed with respect to the processor executing the sync instruction, to the extent required by the associated Memory Coherence Required attributes, before the sync instruction's memory barrier is created.

The operation performed by this instruction is ordered by the mbar or msync instruction with respect to preceding tlbivax instructions executed by the processor executing the tlbsync instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations, which is independent of the other sets that mbar orders.

The tlbsync instruction may complete before operations caused by tlbivax instructions preceding the tlbsync instruction have been performed.

This instruction is privileged.

Special Registers Altered:
    None

TLB Write Entry X-form

tlbwe (implementation dependent)

    | 31 | ??? | ??? | ??? | 978 | / |
      0    6     11    16    21    31

Bits 6:20 of the instruction encoding are implementation-dependent, and may be used to specify the target TLB entry, the target portion of the target TLB entry, and the source of the value that is to be written into the TLB.

The contents of the implementation-dependent-specified source are written into the implementation-dependent-specified portion of the implementation-dependent-specified TLB entry.

If the instruction specifies a TLB entry that does not exist, the results are undefined.
Execution of this instruction may cause other implementation-dependent effects.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent

Programming Note

    The effects of the update may not be visible until the completion of a context synchronizing operation (see Section 1.6.1).

Programming Note

    Care must be taken not to invalidate any TLB entry that contains the mapping for any interrupt vector.

Chapter 5. Interrupts and Exceptions

5.1 Overview
5.2 Interrupt Registers
    5.2.1 Save/Restore Register 0
    5.2.2 Save/Restore Register 1
    5.2.3 Critical Save/Restore Register 0
    5.2.4 Critical Save/Restore Register 1
    5.2.5 Debug Save/Restore Register 0 [Category: Embedded.Enhanced Debug]
    5.2.6 Debug Save/Restore Register 1 [Category: Embedded.Enhanced Debug]
    5.2.7 Data Exception Address Register
    5.2.8 Interrupt Vector Prefix Register
    5.2.9 Exception Syndrome Register
    5.2.10 Interrupt Vector Offset Registers
    5.2.11 Machine Check Registers
        5.2.11.1 Machine Check Save/Restore Register 0
        5.2.11.2 Machine Check Save/Restore Register 1
        5.2.11.3 Machine Check Syndrome Register
    5.2.12 External Proxy Register [Category: External Proxy]
5.3 Exceptions
5.4 Interrupt Classification
    5.4.1 Asynchronous Interrupts
    5.4.2 Synchronous Interrupts
        5.4.2.1 Synchronous, Precise Interrupts
        5.4.2.2 Synchronous, Imprecise Interrupts
    5.4.3 Interrupt Classes
    5.4.4 Machine Check Interrupts
5.5 Interrupt Processing
5.6 Interrupt Definitions
    5.6.1 Critical Input Interrupt
    5.6.2 Machine Check Interrupt
    5.6.3 Data Storage Interrupt
    5.6.4 Instruction Storage Interrupt
    5.6.5 External Input Interrupt
    5.6.6 Alignment Interrupt
    5.6.7 Program Interrupt
    5.6.8 Floating-Point Unavailable Interrupt
    5.6.9 System Call Interrupt
    5.6.10 Auxiliary Processor Unavailable Interrupt
    5.6.11 Decrementer Interrupt
    5.6.12 Fixed-Interval Timer Interrupt
    5.6.13 Watchdog Timer Interrupt
    5.6.14 Data TLB Error Interrupt
    5.6.15 Instruction TLB Error Interrupt
    5.6.16 Debug Interrupt
    5.6.17 SPE/Embedded Floating-Point/Vector Unavailable Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, Vector]
    5.6.18 Embedded Floating-Point Data Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]
    5.6.19 Embedded Floating-Point Round Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]
    5.6.20 Performance Monitor Interrupt [Category: Embedded.Performance Monitor]
    5.6.21 Processor Doorbell Interrupt [Category: Embedded.Processor Control]
    5.6.22 Processor Doorbell Critical Interrupt [Category: Embedded.Processor Control]
5.7 Partially Executed Instructions
5.8 Interrupt Ordering and Masking
    5.8.1 Guidelines for System Software
    5.8.2 Interrupt Order
5.9 Exception Priorities
    5.9.1 Exception Priorities for Defined Instructions
        5.9.1.1 Exception Priorities for Defined Floating-Point Load and Store Instructions
        5.9.1.2 Exception Priorities for Other Defined Load and Store Instructions and Defined Cache Management Instructions
        5.9.1.3 Exception Priorities for Other Defined Floating-Point Instructions
        5.9.1.4 Exception Priorities for Defined Privileged Instructions
        5.9.1.5 Exception Priorities for Defined Trap Instructions
        5.9.1.6 Exception Priorities for Defined System Call Instruction
        5.9.1.7 Exception Priorities for Defined Branch Instructions
        5.9.1.8 Exception Priorities for Defined Return From Interrupt Instructions
        5.9.1.9 Exception Priorities for Other Defined Instructions
    5.9.2 Exception Priorities for Reserved Instructions

5.1 Overview

An interrupt is the action in which the processor saves its old context (MSR and next instruction address) and begins execution at a pre-determined interrupt-handler address, with a modified MSR. Exceptions are the events that will, if enabled, cause the processor to take an interrupt.

Exceptions are generated by signals from internal and external peripherals, instructions, the internal timer facility, debug events, or error conditions.

Interrupts are divided into 4 classes, as described in Section 5.4.3, such that only one interrupt of each class is reported, and when it is processed no program state is lost. Since Save/Restore register pairs SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1 [Category: E.ED], and MCSRR0/MCSRR1 are serially reusable resources used by base, critical, debug [Category: E.ED], and Machine Check interrupts, respectively, program state may be lost when an unordered interrupt is taken. (See Section 5.8.)

All interrupts, except Machine Check, are context synchronizing as defined in Section 1.6.1. A Machine Check interrupt acts like a context synchronizing operation with respect to subsequent instructions; that is, a Machine Check interrupt need not satisfy items 2-3 of Section 1.6.1 but does satisfy items 1, 4, and 5.

5.2 Interrupt Registers

5.2.1 Save/Restore Register 0

In general, SRR0 contains the address of the instruction that caused the non-critical interrupt, or the address of the instruction to return to after a non-critical interrupt is serviced.

The contents of SRR0 when an interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSR[CM]) and the computation mode entered for execution of the interrupt (specified by MSR[ICM]). The contents of SRR0 upon interrupt can be described as follows (assuming Addr is the address to be put into SRR0):

    if (MSR[CM] = 0) & (MSR[ICM] = 0) then SRR0 ← (32 undefined bits) || Addr[32:63]
    if (MSR[CM] = 0) & (MSR[ICM] = 1) then SRR0 ← (32 zero bits) || Addr[32:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 1) then SRR0 ← Addr[0:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 0) then SRR0 ← undefined

The contents of SRR0 can be read into register RT using mfspr RT,SRR0. The contents of register RS can be written into the SRR0 using mtspr SRR0,RS.

5.2.2 Save/Restore Register 1

Save/Restore Register 1 (SRR1) is a 32-bit register. SRR1 bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The register is used to save machine state on non-critical interrupts, and to restore machine state when an rfi is executed. When a non-critical interrupt is taken, the contents of the MSR are placed into SRR1. When rfi is executed, the contents of SRR1 are placed into the MSR.

Bits of SRR1 that correspond to reserved bits in the MSR are also reserved.
Save/Restore Register 0 (SRR0) is a 64-bit register. Programming Note SRR0 bits are numbered 0 (most-significant bit) to 63 A MSR bit that is reserved may be inadvertently (least-significant bit). The register is used to save modified by rfi/rfci/rfmci. machine state on non-critical interrupts, and to restore machine state when an rfi is executed. On a non-criti- The contents of SRR1 can be read into register RT cal interrupt, SRR0 is set to the current or next instruc- using mfspr RT,SRR1. The contents of register RS can tion address. When rfi is executed, instruction be written into the SRR1 using mtspr SRR1,RS. execution continues at the address in SRR0. 564 Power ISATM -- Book III-E Version 2.04 5.2.3 Critical Save/Restore Regis- can be written into the CSRR1 using mtspr CSRR1,RS. ter 0 Critical Save/Restore Register 0 (CSRR0) is a 64-bit 5.2.5 Debug Save/Restore Regis- register. CSRR0 bits are numbered 0 (most-significant bit) to 63 (least-significant bit). The register is used to ter 0 [Category: Embed- save machine state on critical interrupts, and to restore ded.Enhanced Debug] machine state when an rfci is executed. When a critical interrupt is taken, the CSRR0 is set to the current or Debug Save/Restore Register 0 (DSRR0) is a 64-bit next instruction address. When rfci is executed, register used to save machine state on Debug inter- instruction execution continues at the address in rupts, and to restore machine state when an rfdi is exe- CSRR0. cuted. When a Debug interrupt is taken, the DSRR0 is set to the current or next instruction address. When rfdi In general, CSRR0 contains the address of the instruc- is executed, instruction execution continues at the tion that caused the critical interrupt, or the address of address in DSRR0. the instruction to return to after a critical interrupt is ser- viced. 
In general, DSRR0 contains the address of an instruc- tion that was executing or just finished execution when The contents of CSRR0 when a critical interrupt is the Debug exception occurred. taken are mode dependent, reflecting the computation mode currently in use (specified by MSRCM) and the The contents of DSRR0 when a Debug interrupt is computation mode entered for execution of the critical taken are mode dependent, reflecting the computation interrupt (specified by MSRICM). The contents of mode currently in use (specified by MSRCM) and the CSRR0 upon critical interrupt can be described as fol- computation mode entered for execution of the Debug lows (assuming Addr is the address to be put into interrupt (specified by MSRICM). The contents of CSRR0): DSRR0 upon Debug interrupt can be described as fol- lows (assuming Addr is the address to be put into if (MSRCM = 0) & (MSRICM = 0) DSRR0): then CSRR0 32undefined || Addr32:63 if (MSRCM = 0) & (MSRICM = 0) then DSRR0 1 32undefined || if (MSRCM = 0) & (MSRICM = 1) Addr32:63 then CSRR0 320 || Addr32:63 if (MSRCM = 0) & (MSRICM = 1) then DSRR0 1 320 || Addr32:63 if (MSRCM = 1) & (MSRICM = 1) then CSRR0 Addr0:63 if (MSRCM = 1) & (MSRICM = 1) then DSRR0 1 Addr0:63 if (MSRCM = 1) & (MSRICM = 0) then CSRR0 undefined if (MSRCM = 1) & (MSRICM = 0) then DSRR0 1 undefined The contents of CSRR0 can be read into register RT The contents of DSRR0 can be read into register RT using mfspr RT,CSRR0. The contents of register RS using mfspr RT,DSRR0. The contents of register RS can be written into CSRR0 using mtspr CSRR0,RS. can be written into DSRR0 using mtspr DSRR0,RS. 5.2.4 Critical Save/Restore Regis- 5.2.6 Debug Save/Restore Regis- ter 1 ter 1 [Category: Embed- Critical Save/Restore Register 1 (CSRR1) is a 32-bit ded.Enhanced Debug] register. CSRR1 bits are numbered 32 (most-significant Debug Save/Restore Register 1 (DSRR1) is a 32-bit bit) to 63 (least-significant bit). 
The register is used to register used to save machine state on Debug inter- save machine state on critical interrupts, and to restore rupts, and to restore machine state when an rfdi is exe- machine state when an rfci is executed. When a critical cuted. When a Debug interrupt is taken, the contents of interrupt is taken, the contents of the MSR are placed the Machine State Register are placed into DSRR1. into CSRR1. When rfci is executed, the contents of When rfdi is executed, the contents of DSRR1 are CSRR1 are placed into the MSR. placed into the Machine State Register. Bits of CSRR1 that correspond to reserved bits in the Bits of DSRR1 that correspond to reserved bits in the MSR are also reserved. Machine State Register are also reserved. Programming Note The contents of DSRR1 can be read into bits 32:63 of A MSR bit that is reserved may be inadvertently register RT using mfspr RT,DSRR1, setting bits 0:31 of modified by rfi/rfci/rfmci. RT to zero. The contents of bits 32:63 of register RS can be written into the DSSR1 using mtspr DSRR1,RS. The contents of CSRR1 can be read into bits 32:63 of register RT using mfspr RT,CSRR1, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS Chapter 5. Interrupts and Exceptions 565 Version 2.04 5.2.7 Data Exception Address Register The Data Exception Address Register (DEAR) is a 64- bit register. DEAR bits are numbered 0 (most-signifi- cant bit) to 63 (least-significant bit). The DEAR contains the address that was referenced by a Load, Store or Cache Management instruction that caused an Align- ment, Data TLB Miss, or Data Storage interrupt. The contents of the DEAR when an interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSRCM) and the compu- tation mode entered for execution of the critical inter- rupt (specified by MSRICM). 
The contents of the DEAR upon interrupt can be described as follows (assuming Addr is the address to be put into DEAR):

    if (MSR_CM = 0) & (MSR_ICM = 0) then DEAR ← ³²undefined || Addr_32:63
    if (MSR_CM = 0) & (MSR_ICM = 1) then DEAR ← ³²0 || Addr_32:63
    if (MSR_CM = 1) & (MSR_ICM = 1) then DEAR ← Addr_0:63
    if (MSR_CM = 1) & (MSR_ICM = 0) then DEAR ← undefined

The contents of DEAR can be read into register RT using mfspr RT,DEAR. The contents of register RS can be written into the DEAR using mtspr DEAR,RS.

5.2.8 Interrupt Vector Prefix Register

The Interrupt Vector Prefix Register (IVPR) is a 64-bit register. Interrupt Vector Prefix Register bits are numbered 0 (most-significant bit) to 63 (least-significant bit). Bits 48:63 are reserved. Bits 0:47 of the Interrupt Vector Prefix Register provide the high-order 48 bits of the address of the exception processing routines. The 16-bit exception vector offsets (provided in Section 5.2.10) are concatenated to the right of bits 0:47 of the Interrupt Vector Prefix Register to form the 64-bit address of the exception processing routine.

The contents of the Interrupt Vector Prefix Register can be read into register RT using mfspr RT,IVPR. The contents of register RS can be written into the Interrupt Vector Prefix Register using mtspr IVPR,RS.

5.2.9 Exception Syndrome Register

The Exception Syndrome Register (ESR) is a 32-bit register. ESR bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The ESR provides a syndrome to differentiate between the different kinds of exceptions that can generate the same interrupt type. Upon the generation of one of these types of interrupts, the bit or bits corresponding to the specific exception that generated the interrupt are set, and all other ESR bits are cleared. Other interrupt types do not affect the contents of the ESR. The ESR does not need to be cleared by software. Figure 12 shows the bit definitions for the ESR.
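The register conventions of Sections 5.2.7 through 5.2.9 can be modeled in a few lines of C. The following is a minimal sketch, not an architected interface: every type and function name in it (ppc_bit, esr_test, saved_addr, saved_address, vector_address) is hypothetical. It assumes that register values have already been obtained with mfspr, that "undefined" bits are simply reported through a mask rather than given any particular value, and that the low-order four bits of a vector address are zero because bits 60:63 of an Interrupt Vector Offset Register are reserved. The ESR bit numbers are taken from Figure 12.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for architecture bit n of a 64-bit value.
 * The architecture numbers bit 0 as most significant and bit 63 as least. */
static inline uint64_t ppc_bit(unsigned n)
{
    assert(n <= 63);
    return UINT64_C(1) << (63 - n);
}

/* A few ESR bit numbers from Figure 12 (the ESR occupies bits 32:63). */
enum { ESR_PIL = 36, ESR_PPR = 37, ESR_PTR = 38, ESR_FP = 39,
       ESR_ST = 40, ESR_BO = 46 };

/* Returns nonzero if the given ESR bit is set in a 32-bit ESR image. */
static inline int esr_test(uint32_t esr, unsigned bit)
{
    assert(bit >= 32 && bit <= 63);
    return (esr & (uint32_t)ppc_bit(bit)) != 0;
}

/* Model of the address placed in SRR0, CSRR0, DSRR0, MCSRR0, or DEAR. */
typedef struct {
    uint64_t value;    /* bits that are undefined are reported as 0 here */
    uint64_t defined;  /* 1 bits mark the positions that are defined */
} saved_addr;

/* cm and icm are the values of the CM and ICM bits of the MSR. */
static inline saved_addr saved_address(uint64_t addr, int cm, int icm)
{
    const uint64_t low32 = UINT64_C(0xFFFFFFFF);
    saved_addr r;
    if (cm == 0 && icm == 0) {          /* upper 32 bits undefined */
        r.value = addr & low32;
        r.defined = low32;
    } else if (cm == 0 && icm == 1) {   /* upper 32 bits zero */
        r.value = addr & low32;
        r.defined = ~UINT64_C(0);
    } else if (cm == 1 && icm == 1) {   /* full 64-bit address */
        r.value = addr;
        r.defined = ~UINT64_C(0);
    } else {                            /* cm = 1, icm = 0: undefined */
        r.value = 0;
        r.defined = 0;
    }
    return r;
}

/* IVPR bits 0:47, then IVOR bits 48:59, then four zero bits. */
static inline uint64_t vector_address(uint64_t ivpr, uint32_t ivor)
{
    return (ivpr & UINT64_C(0xFFFFFFFFFFFF0000)) | (ivor & UINT32_C(0xFFF0));
}
```

Reporting a separate mask of defined bits keeps the model from suggesting that an undefined field reads as zero; a handler written for 32-bit mode would consult only the positions the mask marks as defined.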
Bit(s)  Name   Meaning                                   Associated Interrupt Type
32:35          Implementation-dependent                  (Implementation-dependent)
36      PIL    Illegal Instruction exception             Program
37      PPR    Privileged Instruction exception          Program
38      PTR    Trap exception                            Program
39      FP     Floating-point operation                  Alignment
                                                         Data Storage
                                                         Data TLB
                                                         Program
40      ST     Store operation                           Alignment
                                                         Data Storage
                                                         Data TLB Error
41             Reserved
42      DLK0   (Implementation-dependent)                (Implementation-dependent)
43      DLK1   (Implementation-dependent)                (Implementation-dependent)
44      AP     Auxiliary Processor operation             Alignment
                                                         Data Storage
                                                         Data TLB
                                                         Program
45      PUO    Unimplemented Operation exception         Program
46      BO     Byte Ordering exception                   Data Storage
                                                         Instruction Storage
47      PIE    Imprecise exception                       Program
48:55          Reserved
56      SPV    Signal Processing operation               Alignment
               [Category: Signal Processing Engine]      Data Storage
               Vector operation [Category: Vector]       Data TLB
                                                         Embedded Floating-point Data
                                                         Embedded Floating-point Round
                                                         SPE/Embedded Floating-point/Vector Unavailable
57      EPID   External Process ID operation             Alignment
               [Category: Embedded.External Process ID]  Data Storage
                                                         Data TLB
58      VLEMI  VLE operation [Category: VLE]             Alignment
                                                         Data Storage
                                                         Data TLB
                                                         SPE/Embedded Floating-point/Vector Unavailable
                                                         Embedded Floating-point Data
                                                         Embedded Floating-point Round
                                                         Instruction Storage
                                                         Program
                                                         System Call
59:61          Implementation-dependent                  (Implementation-dependent)
62      MIF    Misaligned Instruction [Category: VLE]    Instruction TLB
                                                         Instruction Storage

Figure 12. Exception Syndrome Register Definitions

Programming Note
    The information provided by the ESR is not complete. System software may also need to identify the type of instruction that caused the interrupt, examine the TLB entry accessed by a data or instruction storage access, as well as examine the ESR to fully determine what exception or exceptions caused the interrupt. For example, a Data Storage interrupt may be caused by both a Protection Violation exception as well as a Byte Ordering exception. System software would have to look beyond ESR_BO, such as the state of MSR_PR in SRR1 and the page protection bits in the TLB entry accessed by the storage access, to determine whether or not a Protection Violation also occurred.

The contents of the ESR can be read into bits 32:63 of register RT using mfspr RT,ESR, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS can be written into the ESR using mtspr ESR,RS.

5.2.10 Interrupt Vector Offset Registers

The Interrupt Vector Offset Registers (IVORs) are 32-bit registers. Interrupt Vector Offset Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). Bits 32:47 and bits 60:63 are reserved. An Interrupt Vector Offset Register provides the quadword index from the base address provided by the IVPR (see Section 5.2.8) for its respective interrupt. Interrupt Vector Offset Registers 0 through 15 and 32 through 37 are provided for the defined interrupts. SPR numbers corresponding to Interrupt Vector Offset Registers 16 through 31 are reserved. SPR numbers corresponding to Interrupt Vector Offset Registers 38 through 63 are allocated for implementation-dependent use. Figure 13 provides the assignments of specific Interrupt Vector Offset Registers to specific interrupts.

IVORi            Interrupt
IVOR0            Critical Input
IVOR1            Machine Check
IVOR2            Data Storage
IVOR3            Instruction Storage
IVOR4            External Input
IVOR5            Alignment
IVOR6            Program
IVOR7            Floating-Point Unavailable
IVOR8            System Call
IVOR9            Auxiliary Processor Unavailable
IVOR10           Decrementer
IVOR11           Fixed-Interval Timer Interrupt
IVOR12           Watchdog Timer Interrupt
IVOR13           Data TLB Error
IVOR14           Instruction TLB Error
IVOR15           Debug
IVOR16 : IVOR31  Reserved
[Category: Signal Processing Engine]
[Category: Vector]
IVOR32           SPE/Embedded Floating-Point/Vector Unavailable Interrupt
[Category: SP.Embedded Float_*]
(IVORs 33 & 34 are required if any SP.Float_ dependent category is supported.)
IVOR33           Embedded Floating-Point Data Interrupt
IVOR34           Embedded Floating-Point Round Interrupt
[Category: Embedded Performance Monitor]
IVOR35           Embedded Performance Monitor Interrupt
[Category: Embedded.Processor Control]
IVOR36           Processor Doorbell Interrupt
IVOR37           Processor Doorbell Critical Interrupt
IVOR38 : IVOR63  Implementation-dependent

Figure 13. Interrupt Vector Offset Register Assignments

Bits 48:59 of the contents of IVORi can be read into bits 48:59 of register RT using mfspr RT,IVORi, setting bits 0:47 and bits 60:63 of GPR(RT) to zero. Bits 48:59 of the contents of register RS can be written into bits 48:59 of IVORi using mtspr IVORi,RS.

5.2.11 Machine Check Registers

A set of Special Purpose Registers is provided to support Machine Check interrupts.

5.2.11.1 Machine Check Save/Restore Register 0

Machine Check Save/Restore Register 0 (MCSRR0) is used to save machine state on Machine Check interrupts, and to restore machine state when an rfmci is executed. When a Machine Check interrupt is taken, MCSRR0 is set to the current or next instruction address. When rfmci is executed, instruction execution continues at the address in MCSRR0.

In general, MCSRR0 contains the address of an instruction that was executing or about to be executed when the Machine Check exception occurred.

The contents of MCSRR0 when a Machine Check interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSR_CM) and the computation mode entered for execution of the Machine Check interrupt (specified by MSR_ICM). The contents of MCSRR0 upon Machine Check interrupt can be described as follows (assuming Addr is the address to be put into MCSRR0):

    if (MSR_CM = 0) & (MSR_ICM = 0) then MCSRR0 ← ³²undefined || Addr_32:63
    if (MSR_CM = 0) & (MSR_ICM = 1) then MCSRR0 ← ³²0 || Addr_32:63
    if (MSR_CM = 1) & (MSR_ICM = 1) then MCSRR0 ← Addr_0:63
    if (MSR_CM = 1) & (MSR_ICM = 0) then MCSRR0 ← undefined

The contents of MCSRR0 can be read into register RT using mfspr RT,MCSRR0. The contents of register RS can be written into MCSRR0 using mtspr MCSRR0,RS.

5.2.11.2 Machine Check Save/Restore Register 1

Machine Check Save/Restore Register 1 (MCSRR1) is used to save machine state on Machine Check interrupts, and to restore machine state when an rfmci is executed. When a Machine Check interrupt is taken, the contents of the MSR are placed into MCSRR1. When rfmci is executed, the contents of MCSRR1 are placed into the MSR.

Bits of MCSRR1 that correspond to reserved bits in the MSR are also reserved.

Programming Note
    An MSR bit that is reserved may be inadvertently modified by rfi/rfci/rfmci.

The contents of MCSRR1 can be read into register RT using mfspr RT,MCSRR1. The contents of register RS can be written into MCSRR1 using mtspr MCSRR1,RS.

5.2.11.3 Machine Check Syndrome Register

The Machine Check Syndrome Register (MCSR) is a 64-bit register that is used to record the cause of the Machine Check interrupt. The specific definition of the contents of this register is implementation-dependent (see the User Manual of the implementation).

The contents of MCSR can be read into register RT using mfspr RT,MCSR. The contents of register RS can be written into the MCSR using mtspr MCSR,RS.

5.2.12 External Proxy Register [Category: External Proxy]

The External Proxy Register (EPR) contains implementation-dependent information related to an External Input interrupt when an External Input interrupt occurs. The EPR is only considered valid from the time that the External Input interrupt occurs until MSR_EE is set to 1 as the result of a mtmsr or a return from interrupt instruction.

The format of the EPR is shown below.

    +----------------------------------+
    |               EPR                |
    +----------------------------------+
     32                              63

Figure 14. External Proxy Register

When the External Input interrupt is taken, the contents of the EPR provide information related to the External Input interrupt.

Programming Note
    The EPR is provided for faster interrupt processing as well as situations where an interrupt must be taken, but software must delay the resultant processing for later.

    The EPR contains the vector from the interrupt controller. The process of receiving the interrupt into the EPR acknowledges the interrupt to the interrupt controller. The method for enabling or disabling the acknowledgment of the interrupt by placing the interrupt-related information in the EPR is implementation-dependent. If this acknowledgment is disabled, then the EPR is set to 0 when the External Input interrupt occurs.

5.3 Exceptions

There are two kinds of exceptions, those caused directly by the execution of an instruction and those caused by an asynchronous event. In either case, the exception may cause one of several types of interrupts to be invoked.

Examples of exceptions that can be caused directly by the execution of an instruction include but are not limited to the following:

• an attempt to execute a reserved-illegal instruction (Illegal Instruction exception type Program interrupt)
• an attempt by an application program to execute a `privileged' instruction (Privileged Instruction exception type Program interrupt)
• an attempt by an application program to access a `privileged' Special Purpose Register (Privileged Instruction exception type Program interrupt)
• an attempt by an application program to access a Special Purpose Register that does not exist (Unimplemented Operation Instruction exception type Program interrupt)
• an attempt by a system program to access a Special Purpose Register that does not exist (boundedly undefined results)
• the execution of a defined instruction using an invalid form (Illegal Instruction exception type Program interrupt, Unimplemented Operation exception type Program interrupt, or Privileged Instruction exception type Program interrupt)
• an attempt to access a storage location that is either unavailable (Instruction TLB Error interrupt or Data TLB Error interrupt) or not permitted (Instruction Storage interrupt or Data Storage interrupt)
• an attempt to access storage with an effective address alignment not supported by the implementation (Alignment interrupt)
• the execution of a System Call instruction (System Call interrupt)
• the execution of a Trap instruction whose trap condition is met (Trap type Program interrupt)
• the execution of a floating-point instruction when floating-point instructions are unavailable (Floating-point Unavailable interrupt)
• the execution of a floating-point instruction that causes a floating-point enabled exception to exist (Enabled exception type Program interrupt)
• the execution of a defined instruction that is not implemented by the implementation (Illegal Instruction exception or Unimplemented Operation exception type of Program interrupt)
• the execution of an instruction that is not implemented by the implementation (Illegal Instruction exception or Unimplemented Operation exception type of Program interrupt)
• the execution of an auxiliary processor instruction when the auxiliary processor instruction is unavailable (Auxiliary Processor Unavailable interrupt)
• the execution of an instruction that causes an auxiliary processor enabled exception (Enabled exception type Program interrupt)

The invocation of an interrupt is precise, except that if one of the imprecise modes for invoking the Floating-point Enabled Exception type Program interrupt is in effect then the invocation of the Floating-point Enabled Exception type Program interrupt may be imprecise. When the interrupt is invoked imprecisely, the excepting instruction does not appear to complete before the next instruction starts (because one of the effects of the excepting instruction, namely the invocation of the interrupt, has not yet occurred).

5.4 Interrupt Classification

All interrupts, except for Machine Check, can be classified as either Asynchronous or Synchronous. Independent from this classification, all interrupts, including Machine Check, can be classified into one of the following classes:

• Base
• Critical
• Machine Check
• Debug [Category: Embedded.Enhanced Debug]

5.4.1 Asynchronous Interrupts

Asynchronous interrupts are caused by events that are independent of instruction execution. For asynchronous interrupts, the address reported to the exception handling routine is the address of the instruction that would have executed next, had the asynchronous interrupt not occurred.

5.4.2 Synchronous Interrupts

Synchronous interrupts are those that are caused directly by the execution (or attempted execution) of instructions, and are further divided into two classes, precise and imprecise.

Synchronous, precise interrupts are those that precisely indicate the address of the instruction causing the exception that generated the interrupt; or, for certain synchronous, precise interrupt types, the address of the immediately following instruction.

Synchronous, imprecise interrupts are those that may indicate the address of the instruction causing the exception that generated the interrupt, or some instruction after the instruction causing the exception.

5.4.2.1 Synchronous, Precise Interrupts

When the execution or attempted execution of an instruction causes a synchronous, precise interrupt, the following conditions exist at the interrupt point.

• SRR0, CSRR0, or DSRR0 [Category: Embedded.Enhanced Debug] addresses either the instruction causing the exception or the instruction immediately following the instruction causing the exception. Which instruction is addressed can be determined from the interrupt type and status bits.
• An interrupt is generated such that all instructions preceding the instruction causing the exception appear to have completed with respect to the executing processor. However, some storage accesses associated with these preceding instructions may not have been performed with respect to other processors and mechanisms.
• The instruction causing the exception may appear not to have begun execution (except for causing the exception), may have been partially executed, or may have completed, depending on the interrupt type. See Section 5.7 on page 588.
• Architecturally, no subsequent instruction has executed beyond the instruction causing the exception.

5.4.2.2 Synchronous, Imprecise Interrupts

When the execution or attempted execution of an instruction causes an imprecise interrupt, the following conditions exist at the interrupt point.

• SRR0 or CSRR0 addresses either the instruction causing the exception or some instruction following the instruction causing the exception that generated the interrupt.
• An interrupt is generated such that all instructions preceding the instruction addressed by SRR0 or CSRR0 appear to have completed with respect to the executing processor.
• If the imprecise interrupt is forced by the context synchronizing mechanism, due to an instruction that causes another exception that generates an interrupt (e.g., Alignment, Data Storage), then SRR0 addresses the interrupt-forcing instruction, and the interrupt-forcing instruction may have been partially executed (see Section 5.7 on page 588).
• If the imprecise interrupt is forced by the execution synchronizing mechanism, due to executing an execution synchronizing instruction other than msync or isync, then SRR0 or CSRR0 addresses the interrupt-forcing instruction, and the interrupt-forcing instruction appears not to have begun execution (except for its forcing the imprecise interrupt). If the imprecise interrupt is forced by an msync or isync instruction, then SRR0 or CSRR0 may address either the msync or isync instruction, or the following instruction.
• If the imprecise interrupt is not forced by either the context synchronizing mechanism or the execution synchronizing mechanism, then the instruction addressed by SRR0 or CSRR0 may have been partially executed (see Section 5.7 on page 588).
• No instruction following the instruction addressed by SRR0 or CSRR0 has executed.

5.4.3 Interrupt Classes

Interrupts can also be classified as base, critical, Machine Check, and Debug [Category: Embedded.Enhanced Debug].

Interrupt classes other than the base class may demand immediate attention even if another class of interrupt is currently being processed and software has not yet had the opportunity to save the state of the machine (i.e. return address and captured state of the MSR). For this reason, the interrupts are organized into a hierarchy (see Section 5.8). To enable taking a critical, Machine Check, or Debug [Category: Embedded.Enhanced Debug] interrupt immediately after a base class interrupt occurs (i.e. before software has saved the state of the machine), these interrupts use the Save/Restore Register pair CSRR0/CSRR1, MCSRR0/MCSRR1, or DSRR0/DSRR1 [Category: Embedded.Enhanced Debug], and base class interrupts use Save/Restore Register pair SRR0/SRR1.

5.4.4 Machine Check Interrupts

Machine Check interrupts are a special case. They are typically caused by some kind of hardware or storage subsystem failure, or by an attempt to access an invalid address. A Machine Check may be caused indirectly by the execution of an instruction, but not be recognized and/or reported until long after the processor has executed past the instruction that caused the Machine Check. As such, Machine Check interrupts cannot properly be thought of as synchronous or asynchronous, nor as precise or imprecise. The following general rules apply to Machine Check interrupts:

1. No instruction after the one whose address is reported to the Machine Check interrupt handler in MCSRR0 has begun execution.
2. The instruction whose address is reported to the Machine Check interrupt handler in MCSRR0, and all prior instructions, may or may not have completed successfully. All those instructions that are ever going to complete appear to have done so already, and have done so within the context existing prior to the Machine Check interrupt. No further interrupt (other than possible additional Machine Check interrupts) will occur as a result of those instructions.

5.5 Interrupt Processing

Associated with each kind of interrupt is an interrupt vector, that is, the address of the initial instruction that is executed when the corresponding interrupt occurs.

Interrupt processing consists of saving a small part of the processor's state in certain registers, identifying the cause of the interrupt in another register, and continuing execution at the corresponding interrupt vector location. When an exception exists that will cause an interrupt to be generated and it has been determined that the interrupt can be taken, the following actions are performed, in order:

1. SRR0, DSRR0 [Category: Embedded.Enhanced Debug], MCSRR0, or CSRR0 is loaded with an instruction address that depends on the interrupt; see the specific interrupt description for details.
2. The ESR is loaded with information specific to the exception. Note that many interrupts can only be caused by a single kind of exception event, and thus neither need nor use an ESR setting to indicate what the cause of the interrupt was.
3. SRR1, DSRR1 [Category: Embedded.Enhanced Debug], MCSRR1, or CSRR1 is loaded with a copy of the contents of the MSR.
4. The MSR is updated as described below. The new values take effect beginning with the first instruction following the interrupt. MSR bits of particular interest are the following.
   • MSR_WE,EE,PR,FP,FE0,FE1,IS,DS are set to 0 by all interrupts.
   • MSR_ME is set to 0 by Machine Check interrupts and left unchanged by all other interrupts.
   • MSR_CE is set to 0 by critical class interrupts, Debug interrupts, and Machine Check interrupts, and is left unchanged by all other interrupts.
   • MSR_DE is set to 0 by critical class interrupts unless Category E.ED is supported, by Debug interrupts, and by Machine Check interrupts, and is left unchanged by all other interrupts.
   • MSR_CM is set to MSR_ICM.
   • Other supported MSR bits are left unchanged by all interrupts.
   See Section 2.2.1 for more detail on the definition of the MSR.
5. Instruction fetching and execution resumes, using the new MSR value, at a location specific to the interrupt. The location is

       IVPR_0:47 || IVORi_48:59 || 0b0000

   where IVPR is the Interrupt Vector Prefix Register and IVORi is the Interrupt Vector Offset Register for that interrupt (see Figure 13 on page 568). The contents of the Interrupt Vector Prefix Register and Interrupt Vector Offset Registers are indeterminate upon power-on reset, and must be initialized by system software using the mtspr instruction.

Interrupts may not clear reservations obtained with Load and Reserve instructions. The operating system should do so at appropriate points, such as at process switch.

At the end of an interrupt handling routine, execution of an rfi, rfdi [Category: Embedded.Enhanced Debug], rfmci, or rfci causes the MSR to be restored from the contents of SRR1, DSRR1 [Category: Embedded.Enhanced Debug], MCSRR1, or CSRR1, and instruction execution to resume at the address contained in SRR0, DSRR0 [Category: Embedded.Enhanced Debug], MCSRR0, or CSRR0, respectively.

Programming Note
    In general, at process switch, due to possible process interlocks and possible data availability requirements, the operating system needs to consider executing the following.

    • stwcx. or stdcx., to clear the reservation if one is outstanding, to ensure that a lwarx or ldarx in the "old" process is not paired with a stwcx. or stdcx. in the "new" process.
    • msync, to ensure that all storage operations of an interrupted process are complete with respect to other processors before that process begins executing on another processor.
    • isync, rfi, rfdi [Category: Embedded.Enhanced Debug], rfmci, or rfci to ensure that the instructions in the "new" process execute in the "new" context.

Programming Note
    For instruction-caused interrupts, in some cases it may be desirable for the operating system to emulate the instruction that caused the interrupt, while in other cases it may be desirable for the operating system not to emulate the instruction. The following list, while not complete, illustrates criteria by which decisions regarding emulation should be made. The list applies to general execution environments; it does not necessarily apply to special environments such as program debugging, processor bring-up, etc.

    In general, the instruction should be emulated if:

    - The interrupt is caused by a condition for which the instruction description (including related material such as the introduction to the section describing the instruction) implies that the instruction works correctly. Example: Alignment interrupt caused by lmw for which the storage operand is not aligned, or by dcbz or dcbzep for which the storage operand is in storage that is Write Through Required or Caching Inhibited.
    - The instruction is an illegal instruction that should appear, to the program executing it, as if it were supported by the implementation. Example: Illegal Instruction type Program interrupt caused by an instruction that has been phased out of the architecture but is still used by some programs that the operating system supports, or by an instruction that is in a category that the implementation does not support but is used by some programs that the operating system supports.

    In general, the instruction should not be emulated if:

    - The purpose of the instruction is to cause an interrupt. Example: System Call interrupt caused by sc.
    - The interrupt is caused by a condition that is stated, in the instruction description, potentially to cause the interrupt. Example: Alignment interrupt caused by lwarx for which the storage operand is not aligned.
    - The program is attempting to perform a function that it should not be permitted to perform. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should not be permitted to access. (If the function is one that the program should be permitted to perform, the conditions that caused the interrupt should be corrected and the program re-dispatched such that the instruction will be re-executed. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should be permitted to access but for which there currently is no TLB entry.)

5.6 Interrupt Definitions

Table 15 provides a summary of each interrupt type, the various exception types that may cause that interrupt type, the classification of the interrupt, which ESR bits can be set, if any, which MSR bits can mask the interrupt type and which Interrupt Vector Offset Register is used to specify that interrupt type's vector address.

Column key: A = Asynchronous; SP = Synchronous, Precise; SI = Synchronous, Imprecise; C = Critical; ESR (See Note 5); MSR = MSR Mask Bit(s); DBCR0/TCR = DBCR0/TCR Mask Bit; Cat = Category (Section 1.3.5 of Book I); Notes (see page 575).

IVOR   Interrupt       Exception                        A  SP SI C  ESR                       MSR  DBCR0/TCR  Cat   Notes  Page
IVOR0  Critical Input  Critical Input                   x        x                            CE              E     1      576
IVOR1  Machine Check   Machine Check                                                          ME              E     2,4    576
IVOR2  Data Storage    Access                              x        [ST],[FP,AP,SPV],                         E     9      577
                                                                    [VLEMI],[EPID]
                       Load and Reserve or Store           x        [ST],[VLEMI]                              E     9
                       Conditional to `write-thru
                       required' storage (W=1)
                       Cache Locking                       x        {DLK0,DLK1},[ST],                         E     8
                                                                    [VLEMI]
                       Byte Ordering                       x        BO,[ST],[FP,AP,SPV],                      E
                                                                    [VLEMI],[EPID]
IVOR3  Inst Storage    Access                              x                                                  E            578
                       Byte Ordering                       x        BO,[VLEMI]                                E
                       Mismatched Instruction              x        BO,VLEMI                  EE              E, 1
                       Storage (See Book VLE.)
                       VLE Misaligned Instruction          x        MIF                       EE              E, 1
                       Storage (See Book VLE.)
VLE IVOR4 External Input External Input x EE E 1 578 IVOR5 Alignment Alignment x [ST],[FP,AP,SPV] E 579 [EPID],[VLEMI] IVOR6 Program Illegal x PIL, [VLEMI] E 580 Privileged x PPR,[AP], E [VLEMI] Trap x PTR,[VLEMI] E FP Enabled x x FP, [PIE] FE0, E 6,7 FE1 AP Enabled x x AP E Unimplemented Op x PUO, [VLEMI] E 7 [FP,AP,SPV] IVOR7 FP Unavailable FP Unavailable x E 581 IVOR8 System Call System Call x [VLEMI] E 581 IVOR9 AP Unavailable AP Unavailable x E 581 IVOR10 Decrementer x EE DIE E 582 IVOR11 FIT x EE FIE E 582 IVOR12 Watchdog x x CE WIE E 10 582 IVOR13 Data TLB Error Data TLB Miss x [ST],[FP,AP,SPV] E 583 [VLEMI],[EPID] IVOR14 Inst TLB Error Inst TLB Miss x [MIF] E 583 574 Power ISATM -- Book III-E Version 2.04 Synchronous, Imprecise (Section 1.3.5 of Book I) Synchronous, Precise DBCR0/TCR Mask Bit Notes (see page 575) MSR Mask Bit(s) Asynchronous Category Critical ESR Page IVOR Interrupt Exception (See Note 5) IVOR15 Debug Trap x x DE IDM E 10 584 Inst Addr Compare x x DE IDM E 10 Data Addr Compare x x DE IDM E 10 Instruction Complete x x DE IDM E 3,10 Branch Taken x x DE IDM E 3,10 Return From Interrupt x x DE IDM E 10 Interrupt Taken x x DE IDM E 10 Uncond Debug Event x x DE IDM E.ED 10 Critical Interrupt Taken x DE IDM E.ED Critical Interrupt Return x DE IDM E.ED IVOR32 SPE/Embedded SPE Unavailable x SPV, [VLEMI] SPE 585 Floating-Point/Vector Unavailable Vector Unavailable SPV V IVOR33 Embedded Floating- Embedded Floating-Point x SPV, [VLEMI] SP.F* 586 Point Data Data IVOR34 Embedded Floating- Embedded Floating-Point x SPV, [VLEMI] SP.F* 586 Point Round Round IVOR35 Embedded Perfor- Embedded Performance x E.PM mance Monitor Monitor IVOR36 Processor Doorbell Processor Doorbell x EE E.PC IVOR37 Processor Critical Processor Critical Doorbell x x CE E.PC Doorbell Figure 15. Interrupt and Exception Types Figure 15 Notes 4. Machine Check status information is commonly provided as part of the system implementation, but 1. 
Although it is not specified, it is common for sys- is implementation-dependent. tem implementations to provide, as part of the interrupt controller, independent mask and status 5. In general, when an interrupt causes a particular bits for the various sources of Critical Input and ESR bit or bits to be set (or cleared) as indicated in External Input interrupts. the table, it also causes all other ESR bits to be cleared. There may be special rules regarding the 2. Machine Check interrupts are a special case and handling of implementation-specific ESR bits. are not classified as asynchronous nor synchro- nous. See Section 5.4.4 on page 571. Legend: 3. The Instruction Complete and Branch Taken debug [xxx] means ESRxxx could be set events are only defined for MSRDE=1 when in [xxx,yyy] means either ESRxxx or ESRyyy Internal Debug Mode (DBCR0IDM=1). In other may be set, but never both words, when in Internal Debug Mode with MSRDE=0, then Instruction Complete and Branch (xxx,yyy) means either ESRxxx or ESRyyy Taken debug events cannot occur, and no DBSR will be set, but never both status bits are set and no subsequent imprecise {xxx,yyy} means either ESRxxx or ESRyyy will Debug interrupt will occur (see Section 8.4 on be set, or possibly both page 606). xxx means ESRxxx is set Chapter 5. Interrupts and Exceptions 575 Version 2.04 6. The precision of the Floating-point Enabled Excep- All other defined MSR bits set to 0. tion type Program interrupt is controlled by the Instruction execution resumes at address IVPR0:47 || MSRFE0,FE1 bits. When MSRFE0,FE1=0b01 or IVOR048:59||0b0000. 0b10, the interrupt may be imprecise. When such a Program interrupt is taken, if the address saved in SRR0 is not the address of the instruction that Programming Note caused the exception (i.e. the instruction that Software is responsible for taking any action(s) that caused FPSCRFEX to be set to 1), ESRPIE is set to are required by the implementation in order to clear 1. 
When MSRFE0,FE1=0b11, the interrupt is pre- any Critical Input exception status prior to re- cise. When MSRFE0,FE1=0b00, the interrupt is enabling MSRCE in order to avoid another, redun- masked, and the interrupt will subsequently occur dant Critical Input interrupt. imprecisely if and when Floating-point Enabled Exception type Program interrupts are enabled by setting either or both of MSRFE0,FE1, and will also 5.6.2 Machine Check Interrupt cause ESRPIE to be set to 1. See Section 5.6.7. Also, exception status on the exact cause is avail- A Machine Check interrupt occurs when no higher pri- able in the Floating-Point Status and Control Reg- ority exception exists (see Section 5.9 on page 591), a ister (see Section 4.2.2 and Section 4.4 of Book I). Machine Check exception is presented to the interrupt mechanism, and MSRME=1. The specific cause or The precision of the Auxiliary Processor Enabled causes of Machine Check exceptions are implementa- Exception type Program interrupt is implementa- tion-dependent, as are the details of the actions taken tion-dependent. on a Machine Check interrupt. 7. Auxiliary Processor exception status is commonly If the Machine Check Extension is implemented, provided as part of the implementation. MCSRR0, MCSRR1, and MCSR are set, otherwise 8. Cache locking and cache locking exceptions are CSRR0, CSRR1, and ESR are set. The registers are implementation-dependent. updated as follows: 9. Software must examine the instruction and the CSRR0/MCSRR0 subject TLB entry to determine the exact cause of Set to an instruction address. As closely as the interrupt. possible, set to the effective address of an instruction that was executing or about to 10. If the Embedded.Enhanced Debug category is be executed when the Machine Check enabled, this interrupt is not a critical interrupt. exception occurred. DSRR0 and DSRR1 are used instead of CSRR0 and CSRR1. CSRR1/MCSRR1 Set to the contents of the MSR at the time of the interrupt. 
5.6.1 Critical Input Interrupt MSR A Critical Input interrupt occurs when no higher priority CM MSRCM is set to MSRICM. exception exists (see Section 5.9 on page 591), a Criti- DE Unchanged if category E.ED is supported; cal Input exception is presented to the interrupt mecha- otherwise set to 0. nism, and MSRCE=1. While the specific definition of a All other defined MSR bits set to 0. Critical Input exception is implementation-dependent, it would typically be caused by the activation of an asyn- ESR/MCSR chronous signal that is part of the system. Also, imple- Implementation-dependent. mentations may provide an alternative means (in Instruction execution resumes at address IVPR0:47 || addition to MSRCE) for masking the Critical Input inter- IVOR148:59||0b0000. rupt. CSRR0, CSRR1, and MSR are updated as follows: Programming Note CSRR0 Set to the effective address of the next If a Machine Check interrupt is caused by an error instruction to be executed. in the storage subsystem, the storage subsystem may return incorrect data, that may be placed into CSRR1 Set to the contents of the MSR at the time registers and/or on-chip caches. of the interrupt. MSR CM MSRCM is set to MSRICM. ME, ICM Unchanged. DE Unchanged if category E.ED is supported; otherwise set to 0 576 Power ISATM -- Book III-E Version 2.04 Cache Locking exception Programming Note On implementations on which a Machine Check A Cache Locking exception may occur when the locked interrupt can be caused by referring to an invalid state of one or more cache lines has the potential to be real address, executing a dcbz, dcbzep, or dcba altered. This exception is implementation-dependent. instruction can cause a delayed Machine Check Storage Synchronization exception interrupt by establishing in the data cache a block that is associated with an invalid real address. See A Storage Synchronization exception will occur when Section 3.2 of Book II. 
A Machine Check interrupt an attempt is made to execute a Load and Reserve or can eventually occur if and when a subsequent Store Conditional instruction from or to a location that is attempt is made to write that block to main storage, Write Through Required or Caching Inhibited (if the for example as the result of executing an instruc- interrupt does not occur then the instruction executes tion that causes a cache miss for which the block is correctly: see Section 3.3.2 of Book II). the target for replacement or as the result of exe- cuting a dcbst, dcbstep, dcbf, or dcbfep instruc- If a stwcx. or stdcx. would not perform its store in the tion. absence of a Data Storage interrupt, and either (a) the specified effective address refers to storage that is Write Through Required or Caching Inhibited, or (b) a non-conditional Store to the specified effective address 5.6.3 Data Storage Interrupt would cause a Data Storage interrupt, it is implementa- A Data Storage interrupt may occur when no higher pri- tion-dependent whether a Data Storage interrupt ority exception exists (see Section 5.9 on page 591) occurs. and a Data Storage exception is presented to the inter- Instructions lswx or stswx with a length of zero, icbt, rupt mechanism. A Data Storage exception is caused dcbt, dcbtep, dcbtst, dcbtstep, or dcba cannot cause when any of the following exceptions arises during exe- a Data Storage interrupt, regardless of the effective cution of an instruction: address. Read Access Control exception Programming Note A Read Access Control exception is caused when one The icbi, icbiep, and icbt instructions are treated of the following conditions exist. as Loads from the addressed byte with respect to 1 While in user mode (MSRPR=1), a Load or `load- address translation and protection. 
These Instruc- class' Cache Management instruction attempts to tion Cache Management instructions use MSRDS, access a location in storage that is not user mode not MSRIS, to determine translation for their oper- read enabled (i.e. page access control bit UR=0). ands. Instruction Storage exceptions and Instruc- 1 While in supervisor mode (MSRPR=0), a Load or tion TLB Miss exceptions are associated with the `load-class' Cache Management instruction `fetching' of instructions not with the `execution' of attempts to access a location in storage that is not instructions. Data Storage exceptions and Data supervisor mode read enabled (i.e. page access TLB Miss exceptions are associated with the `exe- control bit SR=0). cution' of Instruction Cache Management instruc- tions. Write Access Control exception When a Data Storage interrupt occurs, the processor A Write Access Control exception is caused when one suppresses the execution of the instruction causing the of the following conditions exist. Data Storage exception. 1 While in user mode (MSRPR=1), a Store or `store- SRR0, SRR1, MSR, DEAR, and ESR are updated as class' Cache Management instruction attempts to follows: access a location in storage that is not user mode write enabled (i.e. page access control bit UW=0). SRR0 Set to the effective address of the instruc- 1 While in supervisor mode (MSRPR=0), a Store or tion causing the Data Storage interrupt. `store-class' Cache Management instruction SRR1 Set to the contents of the MSR at the time attempts to access a location in storage that is not of the interrupt. supervisor mode write enabled (i.e. page access control bit SW=0). MSR Byte Ordering exception CM MSRCM is set to MSRICM. CE, ME, A Byte Ordering exception may occur when the imple- DE, ICM Unchanged. mentation cannot perform the data storage access in All other defined MSR bits set to 0. the byte order specified by the Endian storage attribute of the page being accessed. Chapter 5. 
DEAR   Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Data Storage exception.
ESR
   FP        Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0.
   ST        Set to 1 if the instruction causing the interrupt is a Store or 'store-class' Cache Management instruction; otherwise set to 0.
   DLK_0:1   Set to an implementation-dependent value due to a Cache Locking exception causing the interrupt.
   AP        Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0.
   BO        Set to 1 if the instruction caused a Byte Ordering exception; otherwise set to 0.
   SPV       Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation; otherwise set to 0.
   VLEMI     Set to 1 if the instruction causing the interrupt resides in VLE storage.
   EPID      Set to 1 if the instruction causing the interrupt is an External Process ID instruction; otherwise set to 0.
   All other defined ESR bits are set to 0.

Programming Note
Read and Write Access Control and Byte Ordering exceptions are not mutually exclusive. Even if ESR_BO is set, system software must also examine the TLB entry accessed by the data storage access to determine whether or not a Read Access Control or Write Access Control exception may have also occurred.

Instruction execution resumes at address IVPR_0:47 || IVOR2_48:59 || 0b0000.

5.6.4 Instruction Storage Interrupt

An Instruction Storage interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591) and an Instruction Storage exception is presented to the interrupt mechanism. An Instruction Storage exception is caused when any of the following exceptions arises during execution of an instruction:

Execute Access Control exception

An Execute Access Control exception is caused when one of the following conditions exists.
- While in user mode (MSR_PR=1), an instruction fetch attempts to access a location in storage that is not user mode execute enabled (i.e. page access control bit UX=0).
- While in supervisor mode (MSR_PR=0), an instruction fetch attempts to access a location in storage that is not supervisor mode execute enabled (i.e. page access control bit SX=0).

Byte Ordering exception

A Byte Ordering exception may occur when the implementation cannot perform the instruction fetch in the byte order specified by the Endian storage attribute of the page being accessed.

When an Instruction Storage interrupt occurs, the processor suppresses the execution of the instruction causing the Instruction Storage exception.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0   Set to the effective address of the instruction causing the Instruction Storage interrupt.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM                MSR_CM is set to MSR_ICM.
   CE, ME, DE, ICM   Unchanged.
   All other defined MSR bits set to 0.
ESR
   BO      Set to 1 if the instruction fetch caused a Byte Ordering exception; otherwise set to 0.
   VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
   All other defined ESR bits are set to 0.

Programming Note
Execute Access Control and Byte Ordering exceptions are not mutually exclusive. Even if ESR_BO is set, system software must also examine the TLB entry accessed by the instruction fetch to determine whether or not an Execute Access Control exception may have also occurred.

Instruction execution resumes at address IVPR_0:47 || IVOR3_48:59 || 0b0000.

5.6.5 External Input Interrupt

An External Input interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591), an External Input exception is presented to the interrupt mechanism, and MSR_EE=1. While the specific definition of an External Input exception is implementation-dependent, it would typically be caused by the activation of an asynchronous signal that is part of the processing system. Also, implementations may provide an alternative means (in addition to MSR_EE) for masking the External Input interrupt.

SRR0, SRR1, and MSR are updated as follows:

SRR0   Set to the effective address of the next instruction to be executed.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM                MSR_CM is set to MSR_ICM.
   CE, ME, DE, ICM   Unchanged.
   All other defined MSR bits set to 0.

Instruction execution resumes at address IVPR_0:47 || IVOR4_48:59 || 0b0000.

Programming Note
Software is responsible for taking whatever action(s) are required by the implementation in order to clear any External Input exception status prior to re-enabling MSR_EE in order to avoid another, redundant External Input interrupt.

5.6.6 Alignment Interrupt

An Alignment interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591) and an Alignment exception is presented to the interrupt mechanism. An Alignment exception may be caused when the implementation cannot perform a data storage access for one of the following reasons:
- The operand of a Load or Store is not aligned.
- The instruction is a Move Assist, Load Multiple or Store Multiple.
- The operand of dcbz or dcbzep is in storage that is Write Through Required or Caching Inhibited, or one of these instructions is executed in an implementation that has either no data cache or a Write Through data cache, or the line addressed by the instruction cannot be established in the cache because the cache is disabled or locked.
- The operand of a Store, except Store Conditional, is in storage that is Write Through Required.

For lmw and stmw with an operand that is not word-aligned, and for Load and Reserve and Store Conditional instructions with an operand that is not aligned, an implementation may yield boundedly undefined results instead of causing an Alignment interrupt. A Store Conditional to Write Through Required storage may either cause a Data Storage interrupt, cause an Alignment interrupt, or correctly execute the instruction. For all other cases listed above, an implementation may execute the instruction correctly instead of causing an Alignment interrupt. (For dcbz or dcbzep, 'correct' execution means setting each byte of the block in main storage to 0x00.)

Programming Note
The architecture does not support the use of an unaligned effective address by Load and Reserve and Store Conditional instructions. If an Alignment interrupt occurs because one of these instructions specifies an unaligned effective address, the Alignment interrupt handler must not attempt to emulate the instruction, but instead should treat the instruction as a programming error.

When an Alignment interrupt occurs, the processor suppresses the execution of the instruction causing the Alignment exception.

SRR0, SRR1, MSR, DEAR, and ESR are updated as follows:

SRR0   Set to the effective address of the instruction causing the Alignment interrupt.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM                MSR_CM is set to MSR_ICM.
   CE, ME, DE, ICM   Unchanged.
   All other defined MSR bits set to 0.
DEAR   Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Alignment exception.
ESR
   FP      Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0.
   ST      Set to 1 if the instruction causing the interrupt is a Store; otherwise set to 0.
   AP      Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0.
   SPV     Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation; otherwise set to 0.
   VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
   EPID    Set to 1 if the instruction causing the interrupt is an External Process ID instruction; otherwise set to 0.
   All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR_0:47 || IVOR5_48:59 || 0b0000.

5.6.7 Program Interrupt

A Program interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591), a Program exception is presented to the interrupt mechanism, and, for Floating-point Enabled exception, MSR_FE0,FE1 are non-zero. A Program exception is caused when any of the following exceptions arises during execution of an instruction:

Floating-point Enabled exception

A Floating-point Enabled exception is caused when FPSCR_FEX is set to 1 by the execution of a floating-point instruction that causes an enabled exception, including the case of a Move To FPSCR instruction that causes an exception bit and the corresponding enable bit both to be 1. Note that in this context, the term 'enabled exception' refers to the enabling provided by control bits in the Floating-Point Status and Control Register. See Section 4.2.2 of Book I.

Auxiliary Processor Enabled exception

The cause of an Auxiliary Processor Enabled exception is implementation-dependent.

Illegal Instruction exception

An Illegal Instruction exception does occur when execution is attempted of any of the following kinds of instructions.
- a reserved-illegal instruction
- when MSR_PR=1 (user mode), an mtspr or mfspr that specifies an SPRN value with SPRN_5=0 (user-mode accessible) that represents an unimplemented Special Purpose Register

An Illegal Instruction exception may occur when execution is attempted of any of the following kinds of instructions. If the exception does not occur, the alternative is shown in parentheses.
- an instruction that is in invalid form (boundedly undefined results)
- an lswx instruction for which register RA or register RB is in the range of registers to be loaded (boundedly undefined results)
- a reserved-no-op instruction (no-operation performed is preferred)
- a defined instruction that is not implemented by the implementation (Unimplemented Operation exception)

Privileged Instruction exception

A Privileged Instruction exception occurs when MSR_PR=1 and execution is attempted of any of the following kinds of instructions.
- a privileged instruction
- an mtspr or mfspr instruction that specifies an SPRN value with SPRN_5=1

Trap exception

A Trap exception occurs when any of the conditions specified in a Trap instruction are met and the exception is not also enabled as a Debug interrupt. If enabled as a Debug interrupt (i.e. DBCR0_TRAP=1, DBCR0_IDM=1, and MSR_DE=1), then a Debug interrupt will be taken instead of the Program interrupt.

Unimplemented Operation exception

An Unimplemented Operation exception may occur when execution is attempted of a defined instruction that is not implemented by the implementation. Otherwise an Illegal Instruction exception occurs.

An Unimplemented Operation exception may also occur when the processor is in 32-bit mode and execution is attempted of an instruction that is part of the 64-Bit category. Otherwise the instruction executes normally.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0   For all Program interrupts except an Enabled exception when in one of the imprecise modes (see Section 2.2.1 on page 513) or when a disabled exception is subsequently enabled, set to the effective address of the instruction that caused the Program interrupt.

       For an imprecise Enabled exception, set to the effective address of the excepting instruction or to the effective address of some subsequent instruction. If it points to a subsequent instruction, that instruction has not been executed, and ESR_PIE is set to 1. If a subsequent instruction is an msync or isync, SRR0 will point at the msync or isync instruction, or at the following instruction.

       If FPSCR_FEX=1 but both MSR_FE0=0 and MSR_FE1=0, an Enabled exception type Program interrupt will occur imprecisely prior to or at the next synchronizing event if these MSR bits are altered by any instruction that can set the MSR so that the expression

          (MSR_FE0 | MSR_FE1) & FPSCR_FEX

       is 1. When this occurs, SRR0 is loaded with the address of the instruction that would have executed next, not with the address of the instruction that modified the MSR causing the interrupt, and ESR_PIE is set to 1.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM                MSR_CM is set to MSR_ICM.
   CE, ME, DE, ICM   Unchanged.
   All other defined MSR bits set to 0.
ESR
   PIL     Set to 1 if an Illegal Instruction exception type Program interrupt; otherwise set to 0.
   PPR     Set to 1 if a Privileged Instruction exception type Program interrupt; otherwise set to 0.
   PTR     Set to 1 if a Trap exception type Program interrupt; otherwise set to 0.
   PUO     Set to 1 if an Unimplemented Operation exception type Program interrupt; otherwise set to 0.
   FP      Set to 1 if the instruction causing the interrupt is a floating-point instruction; otherwise set to 0.
   PIE     Set to 1 if a Floating-point Enabled exception type Program interrupt, and the address saved in SRR0 is not the address of the instruction causing the exception (i.e. the instruction that caused FPSCR_FEX to be set); otherwise set to 0.
   AP      Set to 1 if the instruction causing the interrupt is an Auxiliary Processor instruction; otherwise set to 0.
   SPV     Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation; otherwise set to 0.
   VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
   All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR_0:47 || IVOR6_48:59 || 0b0000.

5.6.8 Floating-Point Unavailable Interrupt

A Floating-Point Unavailable interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591), an attempt is made to execute a floating-point instruction (i.e. any instruction listed in Section 4.6 of Book I), and MSR_FP=0.

When a Floating-Point Unavailable interrupt occurs, the processor suppresses the execution of the instruction causing the Floating-Point Unavailable interrupt.

SRR0, SRR1, and MSR are updated as follows:

SRR0   Set to the effective address of the instruction that caused the interrupt.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM                MSR_CM is set to MSR_ICM.
   CE, ME, DE, ICM   Unchanged.
   All other defined MSR bits set to 0.

Instruction execution resumes at address IVPR_0:47 || IVOR7_48:59 || 0b0000.

5.6.9 System Call Interrupt

A System Call interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591) and a System Call (sc) instruction is executed.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0   Set to the effective address of the instruction after the sc instruction.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM                MSR_CM is set to MSR_ICM.
   CE, ME, DE, ICM   Unchanged.
   All other defined MSR bits set to 0.
ESR
   VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.

Instruction execution resumes at address IVPR_0:47 || IVOR8_48:59 || 0b0000.

5.6.10 Auxiliary Processor Unavailable Interrupt

An Auxiliary Processor Unavailable interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591), an attempt is made to execute an Auxiliary Processor instruction (including Auxiliary Processor loads, stores, and moves), the target Auxiliary Processor is present on the implementation, and the Auxiliary Processor is configured as unavailable. Details of the Auxiliary Processor, its instruction set, and its configuration are implementation-dependent. See the User's Manual for the implementation.

When an Auxiliary Processor Unavailable interrupt occurs, the processor suppresses the execution of the instruction causing the Auxiliary Processor Unavailable interrupt.

Registers SRR0, SRR1, and MSR are updated as follows:

SRR0   Set to the effective address of the instruction that caused the interrupt.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM                MSR_CM is set to MSR_ICM.
   CE, ME, DE, ICM   Unchanged.
   All other defined MSR bits set to 0.

Instruction execution resumes at address IVPR_0:47 || IVOR9_48:59 || 0b0000.

5.6.11 Decrementer Interrupt

A Decrementer interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591), a Decrementer exception exists (TSR_DIS=1), and the interrupt is enabled (TCR_DIE=1 and MSR_EE=1). See Section 7.3 on page 599.

Programming Note
MSR_EE also enables the External Input and Fixed-Interval Timer interrupts.

SRR0, SRR1, MSR, and TSR are updated as follows:

SRR0   Set to the effective address of the next instruction to be executed.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM                MSR_CM is set to MSR_ICM.
   CE, ME, DE, ICM   Unchanged.
   All other defined MSR bits set to 0.
TSR    (See Section 7.5.1 on page 601.)
   DIS   Set to 1.

Instruction execution resumes at address IVPR_0:47 || IVOR10_48:59 || 0b0000.

Programming Note
Software is responsible for clearing the Decrementer exception status prior to re-enabling the MSR_EE bit in order to avoid another redundant Decrementer interrupt. To clear the Decrementer exception, the interrupt handling routine must clear TSR_DIS. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the TSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

5.6.12 Fixed-Interval Timer Interrupt

A Fixed-Interval Timer interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591), a Fixed-Interval Timer exception exists (TSR_FIS=1), and the interrupt is enabled (TCR_FIE=1 and MSR_EE=1). See Section 7.6 on page 602.

Programming Note
MSR_EE also enables the External Input and Decrementer interrupts.

SRR0, SRR1, MSR, and TSR are updated as follows:

SRR0   Set to the effective address of the next instruction to be executed.
SRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM                MSR_CM is set to MSR_ICM.
   CE, ME, DE, ICM   Unchanged.
   All other defined MSR bits set to 0.
TSR    (See Section 7.5.1 on page 601.)
   FIS   Set to 1.

Instruction execution resumes at address IVPR_0:47 || IVOR11_48:59 || 0b0000.

Programming Note
Software is responsible for clearing the Fixed-Interval Timer exception status prior to re-enabling the MSR_EE bit in order to avoid another redundant Fixed-Interval Timer interrupt. To clear the Fixed-Interval Timer exception, the interrupt handling routine must clear TSR_FIS. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the TSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

5.6.13 Watchdog Timer Interrupt

A Watchdog Timer interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591), a Watchdog Timer exception exists (TSR_WIS=1), and the interrupt is enabled (i.e. TCR_WIE=1 and MSR_CE=1). See Section 7.7 on page 602.

Programming Note
MSR_CE also enables the Critical Input interrupt.

CSRR0, CSRR1, MSR, and TSR are updated as follows:

CSRR0   Set to the effective address of the next instruction to be executed.
CSRR1   Set to the contents of the MSR at the time of the interrupt.
MSR
   CM            MSR_CM is set to MSR_ICM.
   ME, ICM, DE   Unchanged.
   All other defined MSR bits set to 0.
TSR     (See Section 7.5.1 on page 601.)
   WIS   Set to 1.

Instruction execution resumes at address IVPR_0:47 || IVOR12_48:59 || 0b0000.

Management instruction, and within the page whose access caused the Data TLB Error exception.
ESR
   ST      Set to 1 if the instruction causing the interrupt is a Store, dcbi, dcbz, or dcbzep instruction; otherwise set to 0.
   FP      Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0.
   AP      Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0.
   SPV     Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation; otherwise set to 0.
   VLEMI   Set to 1 if the instruction causing the inter-

Programming Note
Software is responsible for clearing the Watchdog Timer exception status prior to re-enabling the MSR_CE bit in order to avoid another redundant Watchdog Timer interrupt. To clear the Watchdog Timer exception, the interrupt handling routine must clear TSR_WIS. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be cleared and 0 in all other bit positions.
The write-data to the TSR is not direct data, rupt resides in VLE storage. but a mask. A 1 causes the bit to be cleared, and a EPID Set to 1 if the instruction causing the inter- 0 has no effect. rupt is an External Process ID instruction; otherwise set to 0. All other defined ESR bits are set to 0. 5.6.14 Data TLB Error Interrupt Instruction execution resumes at address IVPR0:47 || A Data TLB Error interrupt occurs when no higher prior- IVOR1348:59||0b0000. ity exception exists (see Section 5.9 on page 591) and any of the following Data TLB Error exceptions is pre- sented to the interrupt mechanism. 5.6.15 Instruction TLB Error Inter- TLB Miss exception rupt Caused when the virtual address associated with a An Instruction TLB Error interrupt occurs when no data storage access does not match any valid entry in higher priority exception exists (see Section 5.9 on the TLB as specified in Section 4.7.2 on page 545. page 591) and any of the following Instruction TLB Error exceptions is presented to the interrupt mecha- If a stwcx. or stdcx. would not perform its store in the nism. absence of a Data Storage interrupt, and a non-condi- tional Store to the specified effective address would TLB Miss exception cause a Data Storage interrupt, it is implementation- dependent whether a Data Storage interrupt occurs. Caused when the virtual address associated with an instruction fetch does not match any valid entry in the When a Data TLB Error interrupt occurs, the processor TLB as specified in Section 4.7.2 on page 545. suppresses the execution of the instruction causing the Data TLB Error interrupt. When an Instruction TLB Error interrupt occurs, the processor suppresses the execution of the instruction SRR0, SRR1, MSR, DEAR and ESR are updated as causing the Instruction TLB Miss exception. 
follows: SRR0, SRR1, and MSR are updated as follows: SRR0 Set to the effective address of the instruc- tion causing the Data TLB Error interrupt SRR0 Set to the effective address of the instruc- tion causing the Instruction TLB Error inter- SRR1 Set to the contents of the MSR at the time rupt. of the interrupt. SRR1 Set to the contents of the MSR at the time MSR of the interrupt. CM MSRCM is set to MSRICM. CE, ME, DE, ICM Unchanged. MSR CM MSRCM is set to MSRICM. All other defined MSR bits set to 0. CE, ME, DEAR Set to the effective address of a byte that is DE, ICM Unchanged. both within the range of the bytes being All other defined MSR bits set to 0. accessed by the Storage Access or Cache Chapter 5. Interrupts and Exceptions 583 Version 2.04 Instruction execution resumes at address IVPR0:47 || 5.6.16 Debug Interrupt IVOR1448:59||0b0000. A Debug interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591), a Debug exception exists in the DBSR, and Debug inter- rupts are enabled (DBCR0IDM=1 and MSRDE=1). A Debug exception occurs when a Debug Event causes a corresponding bit in the DBSR to be set. See Section 8.5. If the Embedded.Enhanced Debug category is not sup- ported or is supported and is not enabled, CSRR0, CSRR1, MSR, and DBSR are updated as follows. If the Embedded.Enhanced Debug category is supported and is enabled, DSRR0 and DSRR1 are updated as specified below and CSRR0 and CSRR1 are not changed. The means by which the Embed- ded.Enhanced Debug category is enabled is implemen- tation-dependent. CSRR0 or DSRR0 [Category: Embedded.Enhanced Debug] For Debug exceptions that occur while Debug interrupts are enabled (DBCR0IDM=1 and MSRDE=1), CSRR0 is set as follows: 1 For Instruction Address Compare (IAC1, IAC2, IAC3, IAC4), Data Address Compare (DAC1R, DAC1W, DAC2R, DAC2W), Data Value Com- pare (DVC1, DVC2), Trap (TRAP), or Branch Taken (BRT) debug excep- tions, set to the address of the instruc- tion causing the Debug interrupt. 
  • For Instruction Complete (ICMP) debug exceptions, set to the address of the instruction that would have executed after the one that caused the Debug interrupt.
  • For Unconditional Debug Event (UDE) debug exceptions, set to the address of the instruction that would have executed next if the Debug interrupt had not occurred.
  • For Interrupt Taken (IRPT) debug exceptions, set to the interrupt vector value of the interrupt that caused the Interrupt Taken debug event.
  • For Return From Interrupt (RET) debug exceptions, set to the address of the rfi instruction that caused the Debug interrupt.
  • For Critical Interrupt Taken (CRPT) debug exceptions, DSRR0 is set to the address of the first instruction of the critical interrupt handler. CSRR0 is unaffected.
  • For Critical Interrupt Return (CRET) debug exceptions, DSRR0 is set to the address of the rfci instruction that caused the Debug interrupt. See Section 8.4.10, "Critical Interrupt Return Debug Event [Category: Embedded.Enhanced Debug]".

  For Debug exceptions that occur while Debug interrupts are disabled (DBCR0[IDM]=0 or MSR[DE]=0), a Debug interrupt will occur at the next synchronizing event if DBCR0[IDM] and MSR[DE] are modified such that they are both 1 and if the Debug exception Status is still set in the DBSR. When this occurs, CSRR0 or DSRR0 [Category: Embedded.Enhanced Debug] is set to the address of the instruction that would have executed next, not with the address of the instruction that modified the Debug Control Register 0 or MSR and thus caused the interrupt.

CSRR1 or DSRR1 [Category: Embedded.Enhanced Debug]
  Set to the contents of the MSR at the time of the interrupt.
MSR
  CM      MSR[CM] is set to MSR[ICM].
  ME, ICM   Unchanged.
  All other supported MSR bits set to 0.
DBSR    Set to indicate type of Debug Event (see Section 8.5.2).

Instruction execution resumes at address IVPR[0:47] || IVOR15[48:59] || 0b0000.

5.6.17 SPE/Embedded Floating-Point/Vector Unavailable Interrupt
[Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, Vector]

The SPE/Embedded Floating-Point/Vector Unavailable interrupt occurs when no higher priority exception exists, and an attempt is made to execute an SPE, SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, or Vector instruction and MSR[SPV] = 0.

When an Embedded Floating-Point Unavailable interrupt occurs, the processor suppresses the execution of the instruction causing the exception.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0    Set to the effective address of the instruction causing the Embedded Floating-Point Unavailable interrupt.
SRR1    Set to the contents of the MSR at the time of the interrupt.
MSR
  CM      MSR[CM] is set to MSR[ICM].
  VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
  CE, ME, DE, ICM   Unchanged.
  All other defined MSR bits set to 0.
ESR
  SPV     Set to 1.
  VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
  All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR32[48:59] || 0b0000.

Programming Note
This interrupt is also used by the Signal Processing Engine in the same manner. It should be used by software to determine if the application is using the upper 32 bits of the GPRs in a 32-bit implementation and thus be required to save and restore them on context switch.
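
Programming Note
Two rules recur in the interrupt definitions above: the vector address is formed as IVPR[0:47] || IVORn[48:59] || 0b0000, and TSR status bits are cleared by writing a mask in which a 1 clears the bit and a 0 has no effect. The C sketch below models both rules in plain software so they can be checked on any host. It is illustrative only: the function names are hypothetical, and the TSR mask values are an assumption (they follow the commonly used Book E layout with WIS, DIS and FIS in bits 33, 36 and 37 of the 64-bit numbering); the authoritative layout is the one given in Section 7.5.1. A real handler would perform the TSR write with mtspr rather than the model function shown here.

```c
#include <stdint.h>

/* ASSUMED mask values for illustration; see Section 7.5.1 for the
 * authoritative TSR bit assignments. */
#define TSR_WIS 0x40000000u /* Watchdog Timer interrupt status       */
#define TSR_DIS 0x08000000u /* Decrementer interrupt status          */
#define TSR_FIS 0x04000000u /* Fixed-Interval Timer interrupt status */

/* Hypothetical helper: IVPR[0:47] || IVORn[48:59] || 0b0000.
 * Bit 0 is the most significant bit of the 64-bit register, so
 * bits 0:47 are the upper 48 bits and bits 48:59 are the 12 bits
 * just above the low-order 4 bits. */
static inline uint64_t interrupt_vector_address(uint64_t ivpr, uint64_t ivor)
{
    return (ivpr & 0xFFFFFFFFFFFF0000ull) | (ivor & 0x000000000000FFF0ull);
}

/* Hypothetical software model of a write to TSR: the write data is a
 * mask, not direct data. Each 1 bit clears the corresponding status
 * bit; each 0 bit leaves the status bit unchanged. */
static inline uint32_t tsr_after_write(uint32_t tsr, uint32_t write_data)
{
    return tsr & ~write_data;
}
```

Under these assumptions, a Decrementer interrupt handler would write TSR_DIS (and nothing else) to TSR before setting MSR[EE] again, which leaves a pending TSR[FIS] or TSR[WIS] intact for its own handler.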
5.6.18 Embedded Floating-Point Data Interrupt
[Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]

The Embedded Floating-Point Data interrupt occurs when no higher priority exception exists (see Section 5.9) and an Embedded Floating-Point Data exception is presented to the interrupt mechanism. The Embedded Floating-Point Data exception causing the interrupt is indicated in the SPEFSCR; these exceptions include Embedded Floating-Point Invalid Operation/Input Error (FINV, FINVH), Embedded Floating-Point Divide By Zero (FDBZ, FDBZH), Embedded Floating-Point Overflow (FOVF, FOVFH), and Embedded Floating-Point Underflow (FUNF, FUNFH).

When an Embedded Floating-Point Data interrupt occurs, the processor suppresses the execution of the instruction causing the exception.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0    Set to the effective address of the instruction causing the Embedded Floating-Point Data interrupt.
SRR1    Set to the contents of the MSR at the time of the interrupt.
MSR
  CM      MSR[CM] is set to MSR[ICM].
  VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
  CE, ME, DE, ICM   Unchanged.
  All other defined MSR bits set to 0.
ESR
  SPV     Set to 1.
  All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR33[48:59] || 0b0000.

5.6.19 Embedded Floating-Point Round Interrupt
[Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]

The Embedded Floating-Point Round interrupt occurs when no higher priority exception exists (see Section 5.9 on page 591), SPEFSCR[FINXE] is set to 1, and any of the following occurs:

  - the unrounded result of an Embedded Floating-Point operation is not exact
  - an overflow occurs and overflow exceptions are disabled (FOVF or FOVFH is set to 1 and FOVFE is set to 0)
  - an underflow occurs and underflow exceptions are disabled (FUNF is set to 1 and FUNFE is set to 0).

The value of SPEFSCR[FINXS] is 1, indicating that one of the above exceptions has occurred, and additional information about the exception is found in SPEFSCR[FGH, FG, FXH, FX].

When an Embedded Floating-Point Round interrupt occurs, the processor completes the execution of the instruction causing the exception and writes the result to the destination register prior to taking the interrupt.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0    Set to the effective address of the instruction following the instruction causing the Embedded Floating-Point Round interrupt.
SRR1    Set to the contents of the MSR at the time of the interrupt.
MSR
  CM      MSR[CM] is set to MSR[ICM].
  CE, ME, DE, ICM   Unchanged.
  All other defined MSR bits set to 0.
ESR
  SPV     Set to 1.
  VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
  All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR34[48:59] || 0b0000.

Programming Note
If an implementation does not support ±Infinity rounding modes and the rounding mode is set to be +Infinity or -Infinity, an Embedded Floating-Point Round interrupt occurs after every Embedded Floating-Point instruction for which rounding might occur regardless of the value of FINXE, provided no higher priority exception exists.

When an Embedded Floating-Point Round interrupt occurs, the unrounded (truncated) result of an inexact high or low element is placed in the target register. If only a single element is inexact, the other exact element is updated with the correctly rounded result, and the FG and FX bits corresponding to the other exact element will both be 0.

The bits FG (FGH) and FX (FXH) are provided so that an interrupt handler can round the result as it desires. FG (FGH) is the value of the bit immediately to the right of the least significant bit of the destination format mantissa from the infinitely precise intermediate calculation before rounding. FX (FXH) is the value of the 'or' of all the bits to the right of the FG (FGH) of the destination format mantissa from the infinitely precise intermediate calculation before rounding.

5.6.20 Performance Monitor Interrupt [Category: Embedded.Performance Monitor]

The Performance Monitor interrupt is part of the optional Performance Monitor facility; see Appendix E.

5.6.21 Processor Doorbell Interrupt [Category: Embedded.Processor Control]

A Processor Doorbell Interrupt occurs when no higher priority exception exists, a Processor Doorbell exception is present, and MSR[EE]=1. Processor Doorbell exceptions are generated when DBELL messages (see Chapter 9) are received and accepted by the processor.

When a Processor Doorbell Interrupt occurs, SRR0 is set to the address of the next instruction to be executed and SRR1 is set to the contents of the MSR at the time of the interrupt.

Instruction execution resumes at address IVPR[0:47] || IVOR36[48:59] || 0b0000.

5.6.22 Processor Doorbell Critical Interrupt [Category: Embedded.Processor Control]

A Processor Doorbell Critical Interrupt occurs when no higher priority exception exists, a Processor Doorbell Critical exception is present, and MSR[CE]=1. Processor Doorbell Critical exceptions are generated when DBELL_CRIT messages (see Chapter 9) are received and accepted by the processor.

When a Processor Doorbell Critical Interrupt occurs, CSRR0 is set to the address of the next instruction to be executed and CSRR1 is set to the contents of the MSR at the time of the interrupt.

Instruction execution resumes at address IVPR[0:47] || IVOR37[48:59] || 0b0000.

5.7 Partially Executed Instructions

In general, the architecture permits load and store instructions to be partially executed, interrupted, and then to be restarted from the beginning upon return from the interrupt. Unaligned Load and Store instructions, or Load Multiple, Store Multiple, Load String, and Store String instructions may be broken up into multiple, smaller accesses, and these accesses may be performed in any order. In order to guarantee that a particular load or store instruction will complete without being interrupted and restarted, software must mark the storage being referred to as Guarded, and must use an elementary (non-string or non-multiple) load or store that is aligned on an operand-sized boundary.

In order to guarantee that Load and Store instructions can, in general, be restarted and completed correctly without software intervention, the following rules apply when an instruction is partially executed and then interrupted:

• For an elementary Load, no part of the target register RT or FRT will have been altered.
• For 'with update' forms of Load or Store, the update register, register RA, will not have been altered.

On the other hand, the following effects are permissible when certain instructions are partially executed and then restarted:

• For any Store, some of the bytes at the target storage location may have been altered (if write access to that page in which bytes were altered is permitted by the access control mechanism). In addition, for Store Conditional instructions, CR0 has been set to an undefined value, and it is undefined whether the reservation has been cleared.
• For any Load, some of the bytes at the addressed storage location may have been accessed (if read access to that page in which bytes were accessed is permitted by the access control mechanism).
• For Load Multiple or Load String, some of the registers in the range to be loaded may have been altered. Including the addressing registers (RA, and possibly RB) in the range to be loaded is a programming error, and thus the rules for partial execution do not protect against overwriting of these registers.

In no case will access control be violated.

As previously stated, the only load or store instructions that are guaranteed to not be interrupted after being partially executed are elementary, aligned, guarded loads and stores. All others may be interrupted after being partially executed. The following list identifies the specific instruction types for which interruption after partial execution may occur, as well as the specific interrupt types that could cause the interruption:

1. Any Load or Store (except elementary, aligned, guarded):
     Any asynchronous interrupt
     Machine Check
     Program (Imprecise Mode Floating-Point Enabled)
     Program (Imprecise Mode Auxiliary Processor Enabled)
2. Unaligned elementary Load or Store, or any multiple or string:
     All of the above listed under item 1, plus the following:
     Data Storage (if the access crosses a protection boundary)
     Debug (Data Address Compare, Data Value Compare)
3. mtcrf may also be partially executed due to the occurrence of any of the interrupts listed under item 1 at the time the mtcrf was executing.
   • All instructions prior to the mtcrf have completed execution. (Some storage accesses generated by these preceding instructions may not have completed.)
   • No subsequent instruction has begun execution.
   • The mtcrf instruction (the address of which was saved in SRR0/CSRR0/MCSRR0/DSRR0 [Category: Embedded.Enhanced Debug] at the occurrence of the interrupt), may appear not to have begun or may have partially executed.

5.8 Interrupt Ordering and Masking

It is possible for multiple exceptions to exist simultaneously, each of which could cause the generation of an interrupt. Furthermore, for interrupt classes other than the Machine Check interrupt and critical interrupts, the architecture does not provide for reporting more than one interrupt of the same class (unless the Embedded.Enhanced Debug category is supported). Therefore, the architecture defines that interrupts are ordered with respect to each other, and provides a masking mechanism for certain persistent interrupt types.

When an interrupt is masked (disabled), and an event causes an exception that would normally generate the interrupt, the exception persists as a status bit in a register (which register depends upon the exception type). However, no interrupt is generated. Later, if the interrupt is enabled (unmasked), and the exception status has not been cleared by software, the interrupt due to the original exception event will then finally be generated.

All asynchronous interrupts can be masked. In addition, certain synchronous interrupts can be masked. An example of such an interrupt is the Floating-Point Enabled exception type Program interrupt. The execution of a floating-point instruction that causes the FPSCR[FEX] bit to be set to 1 is considered an exception event, regardless of the setting of MSR[FE0,FE1]. If MSR[FE0,FE1] are both 0, then the Floating-Point Enabled exception type of Program interrupt is masked, but the exception persists in the FPSCR[FEX] bit. Later, if the MSR[FE0,FE1] bits are enabled, the interrupt will finally be generated.

The architecture enables implementations to avoid situations in which an interrupt would cause the state information (saved in Save/Restore Registers) from a previous interrupt to be overwritten and lost. In order to do this, the architecture defines interrupt classes in a hierarchical manner. At each interrupt class, hardware automatically disables any further interrupts associated with the interrupt class by masking the interrupt enable in the MSR when the interrupt is taken. In addition, each interrupt class masks the interrupt enable in the MSR for each lower class in the hierarchy. The hierarchy of interrupt classes is as follows from highest to lowest:

Interrupt Class    MSR Enables Cleared    Save/Restore Registers
Machine Check      ME, DE, CE, EE         MCSRR0/1
Debug (1)          DE, CE, EE             DSRR0/1
Critical           CE, EE                 CSRR0/1
Base               EE                     SRR0/1

(1) The Debug interrupt class is Category: E.ED.
Note: MSR[DE] may be cleared when a critical interrupt occurs if Category: E.ED is not supported.

Figure 16. Interrupt Hierarchy

If the Embedded.Enhanced Debug category is not supported (or is supported and is not enabled), then the Debug interrupt becomes a Critical class interrupt and all critical class interrupts will clear DE, CE, and EE in the MSR.

Base Class interrupts that occur as a result of precise exceptions are not masked by the EE bit in the MSR, and any such exception that occurs prior to software saving the state of SRR0/1 in a base class exception handler will create a situation that could result in the loss of state information.

This first step of the hardware clearing the MSR enable bits lower in the hierarchy shown in Figure 16 prevents any subsequent asynchronous interrupts from overwriting the Save/Restore Registers (SRR0/SRR1, CSRR0/CSRR1, MCSRR0/MCSRR1, or DSRR0/DSRR1 [Category: Embedded.Enhanced Debug]), prior to software being able to save their contents. Hardware also automatically clears, on any interrupt, MSR[WE,PR,FP,FE0,FE1,IS,DS]. The clearing of these bits assists in the avoidance of subsequent interrupts of certain other types. However, guaranteeing that interrupt classes lower in the hierarchy do not occur and thus do not overwrite the Save/Restore Registers (SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1 [Category: Embedded.Enhanced Debug], or MCSRR0/MCSRR1) also requires the cooperation of system software. Specifically, system software must avoid the execution of instructions that could cause (or enable) a subsequent interrupt, if the contents of the Save/Restore Registers (SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1 [Category: Embedded.Enhanced Debug], or MCSRR0/MCSRR1) have not yet been saved.

5.8.1 Guidelines for System Software

The following list identifies the actions that system software must avoid, prior to having saved the Save/Restore Registers' contents:

• Re-enabling an interrupt class that is at the same or a lower level in the interrupt hierarchy. This includes the following actions:
  - Re-enabling of MSR[EE]
  - Re-enabling of MSR[CE,EE] in critical class interrupt handlers, and if the Embedded.Enhanced Debug category is not supported, re-enabling of MSR[DE].
  - Category: Embedded.Enhanced Debug: Re-enabling of MSR[CE,EE,DE] in Debug class interrupt handlers
  - Re-enabling of MSR[EE,CE,DE,ME] in Machine Check interrupt handlers.
• Branching (or sequential execution) to addresses not mapped by the TLB, or mapped without UX=1 or SX=1 permission.
  This prevents Instruction Storage and Instruction TLB Error interrupts.
• Load, Store or Cache Management instructions to addresses not mapped by the TLB or not having the required access permissions.
  This prevents Data Storage and Data TLB Error interrupts.
• Execution of System Call (sc) or Trap (tw, twi, td, tdi) instructions
  This prevents System Call and Trap exception type Program interrupts.
• Execution of any floating-point instruction
  This prevents Floating-Point Unavailable interrupts. Note that this interrupt would occur upon the execution of any floating-point instruction, due to the automatic clearing of MSR[FP]. However, even if software were to re-enable MSR[FP], floating-point instructions must still be avoided in order to prevent Program interrupts due to various possible Program interrupt exceptions (Floating-Point Enabled, Unimplemented Operation).
• Re-enabling of MSR[PR]
  This prevents Privileged Instruction exception type Program interrupts. Alternatively, software could re-enable MSR[PR], but avoid the execution of any privileged instructions.
• Execution of any Auxiliary Processor instruction
  This prevents Auxiliary Processor Unavailable interrupts, and Auxiliary Processor Enabled type and Unimplemented Operation type Program interrupts.
• Execution of any Illegal instructions
  This prevents Illegal Instruction exception type Program interrupts.
• Execution of any instruction that could cause an Alignment interrupt
  This prevents Alignment interrupts. Included in this category are any string or multiple instructions, and any unaligned elementary load or store instructions. See Section 5.6.6 on page 579 for a complete list of instructions that may cause Alignment interrupts.

It is not necessary for hardware or software to avoid interrupts higher in the interrupt hierarchy (see Figure 16) from within interrupt handlers (and hence, for example, hardware does not automatically clear MSR[CE,ME,DE] upon a base class interrupt), since interrupts at each level of the hierarchy use different pairs of Save/Restore Registers to save the instruction address and MSR (i.e. SRR0/SRR1 for base class interrupts, and MCSRR0/MCSRR1, DSRR0/DSRR1 [Category: Embedded.Enhanced Debug], or CSRR0/CSRR1 for non-base class interrupts). The converse, however, is not true. That is, hardware and software must cooperate in the avoidance of interrupts lower in the hierarchy from occurring within interrupt handlers, even though these interrupts use different Save/Restore Register pairs. This is because the interrupt higher in the hierarchy may have occurred from within an interrupt handler for an interrupt lower in the hierarchy prior to the interrupt handler having saved the Save/Restore Registers. Therefore, within an interrupt handler, Save/Restore Registers for all interrupts lower in the hierarchy may contain data that is necessary to the system software.

5.8.2 Interrupt Order

The following is a prioritized listing of the various enabled interrupts for which exceptions might exist simultaneously:

1. Synchronous (Non-Debug) Interrupts:
     Data Storage
     Instruction Storage
     Alignment
     Program
     Floating-Point Unit Unavailable
     Auxiliary Processor Unavailable
     Embedded Floating-Point Unavailable [Category: SP.Embedded Float_*]
     SPE/Embedded Floating-Point/Vector Unavailable
     Embedded Floating-Point Data [Category: SP.Embedded Float_*]
     Embedded Floating-Point Round [Category: SP.Embedded Float_*]
     System Call
     Data TLB Error
     Instruction TLB Error
   Only one of the above types of synchronous interrupts may have an existing exception generating it at any given time. This is guaranteed by the exception priority mechanism (see Section 5.9 on page 591) and the requirements of the Sequential Execution Model.
2. Machine Check
3. Debug
4. Critical Input
5. Watchdog Timer
6. Processor Doorbell Critical
7. External Input
8. Fixed-Interval Timer
9. Decrementer
10. Processor Doorbell
11. Embedded Performance Monitor

Even though, as indicated above, the base, synchronous exception types listed under item 1 are generated with higher priority than the non-base interrupt classes listed in items 2-5, the fact is that these base class interrupts will immediately be followed by the highest priority existing interrupt in items 2-5, without executing any instructions at the base class interrupt handler. This is because the base interrupt classes do not automatically disable the MSR mask bits for the interrupts listed in 2-5. In all other cases, a particular interrupt class from the above list will automatically disable any subsequent interrupts of the same class, as well as all other interrupt classes that are listed below it in the priority order.

5.9 Exception Priorities

All synchronous (precise and imprecise) interrupts are reported in program order, as required by the Sequential Execution Model. The one exception to this rule is the case of multiple synchronous imprecise interrupts. Upon a synchronizing event, all previously executed instructions are required to report any synchronous imprecise interrupt-generating exceptions, and the interrupt will then be generated with all of those exception types reported cumulatively, in both the ESR, and any status registers associated with the particular exception type (e.g. the Floating-Point Status and Control Register).

For any single instruction attempting to cause multiple exceptions for which the corresponding synchronous interrupt types are enabled, this section defines the priority order by which the instruction will be permitted to cause a single enabled exception, thus generating a particular synchronous interrupt. Note that it is this exception priority mechanism, along with the requirement that synchronous interrupts be generated in program order, that guarantees that at any given time, there exists for consideration only one of the synchronous interrupt types listed in item 1 of Section 5.8.2 on page 591. The exception priority mechanism also prevents certain debug exceptions from existing in combination with certain other synchronous interrupt-generating exceptions.

Because unaligned Load and Store instructions, or Load Multiple, Store Multiple, Load String, and Store String instructions may be broken up into multiple, smaller accesses, and these accesses may be performed in any order, the exception priority mechanism applies to each of the multiple storage accesses in the order they are performed by the implementation.

This section does not define the permitted setting of multiple exceptions for which the corresponding interrupt types are disabled. The generation of exceptions for which the corresponding interrupt types are disabled will have no effect on the generation of other exceptions for which the corresponding interrupt types are enabled. Conversely, if a particular exception for which the corresponding interrupt type is enabled is shown in the following sections to be of a higher priority than another exception, it will prevent the setting of that other exception, independent of whether that other exception's corresponding interrupt type is enabled or disabled.

Except as specifically noted, only one of the exception types listed for a given instruction type will be permitted to be generated at any given time. The priority of the exception types is listed in the following sections ranging from highest to lowest, within each instruction type.

Programming Note
Some exception types may even be mutually exclusive of each other and could otherwise be considered the same priority. In these cases, the exceptions are listed in the order suggested by the sequential execution model.

5.9.1 Exception Priorities for Defined Instructions

5.9.1.1 Exception Priorities for Defined Floating-Point Load and Store Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any defined Floating-Point Load and Store instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Floating-Point Unavailable
6. Program (Unimplemented Operation)
7. Data TLB Error
8. Data Storage (all types)
9. Alignment
10. Debug (Data Address Compare, Data Value Compare)
11. Debug (Instruction Complete)

If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Data Address Compare) or Debug (Data Value Compare), and is not causing any of the exceptions listed in items 2-9, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.

5.9.1.2 Exception Priorities for Other Defined Load and Store Instructions and Defined Cache Management Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any other defined Load or Store instruction, or defined Cache Management instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Program (Privileged Instruction) (dcbi only)
6. Program (Unimplemented Operation)
7. Data TLB Error
8. Data Storage (all types)
9. Alignment
10. Debug (Data Address Compare, Data Value Compare)
11. Debug (Instruction Complete)

If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Data Address Compare) or Debug (Data Value Compare), and is not causing any of the exceptions listed in items 2-9, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.

5.9.1.3 Exception Priorities for Other Defined Floating-Point Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any defined floating-point instruction other than a load or store.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Floating-Point Unavailable
6. Program (Unimplemented Operation)
7. Program (Floating-point Enabled)
8. Debug (Instruction Complete)

5.9.1.4 Exception Priorities for Defined Privileged Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any defined privileged instruction, except dcbi, rfi, and rfci instructions.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Program (Privileged Instruction)
6. Program (Unimplemented Operation)
7. Debug (Instruction Complete)

For mtmsr, mtspr (DBCR0, DBCR1, DBCR2), mtspr (TCR), and mtspr (TSR), if they are not causing Debug (Instruction Address Compare) nor Program (Privileged Instruction) exceptions, it is possible that they are simultaneously enabling (via mask bits) multiple existing exceptions (and at the same time possibly causing a Debug (Instruction Complete) exception). When this occurs, the interrupts will be handled in the order defined by Section 5.8.2 on page 591.

5.9.1.5 Exception Priorities for Defined Trap Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of a defined Trap instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types) 4. Program (Illegal Instruction) 4. Program (Illegal Instruction) 5. Program (Privileged Instruction) 5. Program (Unimplemented Operation) 6. Program (Unimplemented Operation) 6. Debug (Trap) 7. Debug (Return From Interrupt) 7. Program (Trap) 8. Debug (Instruction Complete) 8.
Debug (Instruction Complete) If the rfi or rfci, rfmci, or rfdi [Category: Embed- If the instruction is causing both a Debug (Instruction ded.Enhanced Debug] instruction is causing both a Address Compare) and a Debug (Trap), and is not Debug (Instruction Address Compare) and a Debug causing any of the exceptions listed in items 2-5, it is (Return From Interrupt), and is not causing any of the permissible for both exceptions to be generated and exceptions listed in items 2-5, it is permissible for both recorded in the DBSR. A single Debug interrupt will exceptions to be generated and recorded in the DBSR. result. A single Debug interrupt will result. 5.9.1.6 Exception Priorities for Defined 5.9.1.9 Exception Priorities for Other System Call Instruction Defined Instructions The following prioritized list of exceptions may occur as The following prioritized list of exceptions may occur as a result of the attempted execution of a defined System a result of the attempted execution of all other instruc- Call instruction. tions not listed above. 1. Debug (Instruction Address Compare) 1. Debug (Instruction Address Compare) 2. Instruction TLB Error 2. Instruction TLB Error 3. Instruction Storage Interrupt (all types) 3. Instruction Storage Interrupt (all types) 4. Program (Illegal Instruction) 4. Program (Illegal Instruction) 5. Program (Unimplemented Operation) 5. Program (Unimplemented Operation) 6. System Call 6. Debug (Instruction Complete) 7. Debug (Instruction Complete) 5.9.2 Exception Priorities for 5.9.1.7 Exception Priorities for Defined Branch Instructions Reserved Instructions The following prioritized list of exceptions may occur as The following prioritized list of exceptions may occur as a result of the attempted execution of any reserved a result of the attempted execution of any defined instruction. branch instruction. 1. Debug (Instruction Address Compare) 1. Debug (Instruction Address Compare) 2. Instruction TLB Error 2. Instruction TLB Error 3. 
Instruction Storage Interrupt (all types) 3. Instruction Storage Interrupt (all types) 4. Program (Illegal Instruction) 4. Program (Illegal Instruction) 5. Program (Unimplemented Operation) 6. Debug (Branch Taken) 7. Debug (Instruction Complete) If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Branch Taken), and is not causing any of the exceptions listed in items 2-5, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result. 5.9.1.8 Exception Priorities for Defined Return From Interrupt Instructions The following prioritized list of exceptions may occur as a result of the attempted execution of an rfi, rfci, rfmci, rfdi [Category:Embedded.Enhanced Debug] instruc- tion. 1. Debug (Instruction Address Compare) 2. Instruction TLB Error 3. Instruction Storage Interrupt (all types) Chapter 5. Interrupts and Exceptions 593 Version 2.04 594 Power ISATM -- Book III-E Version 2.04 Chapter 6. Reset and Initialization 6.1 Background. . . . . . . . . . . . . . . . . . 595 6.4 Software Initialization Requirements . . 6.2 Reset Mechanisms . . . . . . . . . . . . 595 596 6.3 Processor State After Reset . . . . . 595 6.1 Background The Machine State Register and Processor Version Register and a TLB entry are updated as follows: This chapter describes the requirements for processor reset. This includes both the means of causing reset, Machine State Register and the specific initialization that is required to be per- formed automatically by the processor hardware. This Bit Setting Comments chapter also provides an overview of the operations CM 0 Computation Mode (set to 32-bit that should be performed by initialization software, in mode) order to fully initialize the processor. ICM 0 Interrupt Computation Mode (set In general, the specific actions taken by a processor to 32-bit) upon reset are implementation-dependent. 
Also, it is UCLE 0 User Cache Locking Enable the responsibility of system initialization software to ini- SPV 0 SPE/Embedded Floating-Point/ tialize the majority of processor and system resources Vector Unavailable after reset. Implementations are required to provide a minimum processor initialization such that this system WE 0 Wait State disabled software may be fetched and executed, thereby accom- CE 0 Critical Input interrupts disabled plishing the rest of system initialization. DE 0 Debug interrupts disabled EE 0 External Input interrupts disabled PR 0 Supervisor mode 6.2 Reset Mechanisms FP 0 FP unavailable This specification defines two processor mechanisms ME 0 Machine Check interrupts disabled for internally invoking a reset operation using either the FE0 0 FP exception type Program inter- Watchdog Timer (see Section 7.7 on page 602) or the rupts disabled Debug facilities using DBCR0RST (see Section 8.5.1.1 FE1 0 FP exception type Program inter- on page 613). In addition, implementations will typically rupts disabled provide additional means for invoking a reset operation, via an external mechanism such as a signal pin which IS 0 Instruction Address Space 0 when activated will cause the processor to reset. DS 0 Data Address Space 0 PMM 0 Performance Monitor Mark 6.3 Processor State After Reset Figure 17. Machine State Register Initial Values The initial processor state is controlled by the register contents after reset. In general, the contents of most Processor Version Register registers are undefined after reset. Implementation-Dependent. (This register is read-only, The processor hardware is only guaranteed to initialize and contains a value which identifies the specific imple- those registers (or specific bits in registers) which must mentation) be initialized in order for software to be able to reliably perform the rest of system initialization. Chapter 6. 
TLB entry

A TLB entry (which entry is implementation-dependent) is initialized in an implementation-dependent manner that maps the last 4KB page in the implemented effective storage address space, with the following field settings:

   Field   Setting   Comments
   EPN     see below Represents the last 4K page in effective address space
   RPN     see below Represents the last 4K page in physical address space
   TS      0         translation address space 0
   SIZE    0b0001    4KB page size
   W       ?         implementation-dependent value
   I       ?         implementation-dependent value
   M       ?         implementation-dependent value
   G       ?         implementation-dependent value
   E       ?         implementation-dependent value
   U0      ?         implementation-dependent value
   U1      ?         implementation-dependent value
   U2      ?         implementation-dependent value
   U3      ?         implementation-dependent value
   TID     ?         implementation-dependent value, but page must be accessible
   UX      ?         implementation-dependent value
   UR      ?         implementation-dependent value
   UW      ?         implementation-dependent value
   SX      1         page is execute accessible in supervisor mode
   SR      1         page is read accessible in supervisor mode
   SW      1         page is write accessible in supervisor mode
   VLE     ?         implementation-dependent value
   ACM     ?         implementation-dependent value

   Figure 18. TLB Initial Values

The initial settings of EPN and RPN are dependent upon the number of bits implemented in the EPN and RPN fields and the minimum page size supported by the implementation. For example, an implementation that allows 1KB pages and 32 bits of effective address would implement a 22 bit EPN and set the initial value of the boot entry to 2^22-4 (0x3FFFFC), while an implementation that supports only 4K pages as the smallest size and 32 bits of effective address would implement a 20 bit EPN and set the initial value of the boot entry to 2^20-1 (0xFFFFF).

Instruction execution begins at the last word address of the page mapped by the boot TLB entry. Note that this address is different from the PowerPC Architecture System Reset interrupt vector.

An implementation may provide additional methods for initializing the TLB entry used for initial boot by providing an implementation-dependent RPN, or initializing other TLB entries.

6.4 Software Initialization Requirements

When reset occurs, the processor is initialized to a minimum configuration to start executing initialization code. Initialization code is necessary to complete the processor and system configuration. The initialization code described in this section is the minimum recommended for configuring the processor to run application code.

Initialization code should configure the following processor resources:

- Invalidate the instruction cache and data cache (implementation-dependent).
- Initialize system memory as required by the operating system or application code.
- Initialize the Interrupt Vector Prefix Register and Interrupt Vector Offset Register.
- Initialize other processor registers as needed by the system.
- Initialize off-chip system facilities.
- Dispatch the operating system or application code.

Chapter 7. Timer Facilities

7.1 Overview . . . . . . . . . . . . . . . . . . . 597
7.2 Time Base (TB) . . . . . . . . . . . . . . . . 597
7.2.1 Writing the Time Base . . . . . . . . . . . . 598
7.3 Decrementer . . . . . . . . . . . . . . . . . . 599
7.3.1 Writing and Reading the Decrementer . . . . . 599
7.3.2 Decrementer Events . . . . . . . . . . . . . 599
7.4 Decrementer Auto-Reload Register . . . . . . . 600
7.5 Timer Control Register . . . . . . . . . . . . 600
7.5.1 Timer Status Register . . . . . . . . . . . . 601
7.6 Fixed-Interval Timer . . . . . . . . . . . . . 602
7.7 Watchdog Timer . . . . . . . . . . . . . . . . 602
7.8 Freezing the Timer Facilities . . . . . . . . . 604

7.1 Overview

The Time Base, Decrementer, Fixed-Interval Timer, and Watchdog Timer provide timing functions for the system. The remainder of this section describes these registers and related facilities.

7.2 Time Base (TB)

The Time Base (TB) is a 64-bit register (see Figure 19) containing a 64-bit unsigned integer that is incremented periodically. Each increment adds 1 to the low-order bit (bit 63). The frequency at which the integer is updated is implementation-dependent.

   TBU (bits 0:31)   TBL (bits 32:63)

   Field   Description
   TBU     Upper 32 bits of Time Base
   TBL     Lower 32 bits of Time Base

   Figure 19. Time Base

The Time Base increments until its value becomes 0xFFFF_FFFF_FFFF_FFFF (2^64-1). At the next increment, its value becomes 0x0000_0000_0000_0000. There is no interrupt or other indication when this occurs.

The period of the Time Base depends on the driving frequency. As an order of magnitude example, suppose that the CPU clock is 1 GHz and that the Time Base is driven by this frequency divided by 32. Then the period of the Time Base would be

$$T_{TB} = \frac{2^{64} \times 32}{1\ \text{GHz}} = 5.90 \times 10^{11}\ \text{seconds}$$

which is approximately 18,700 years.

The Time Base is implemented such that:

1. Loading a GPR from the Time Base has no effect on the accuracy of the Time Base.
2. Copying the contents of a GPR to the Time Base replaces the contents of the Time Base with the contents of the GPR.

The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock in a Power ISA system. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following.

- The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base changes, and a means to determine what the current update frequency is.
- The update frequency of the Time Base is under the control of the system software.

Implementations must provide a means for either preventing the Time Base from incrementing or preventing it from being read in user mode (MSR[PR]=1). If the means is under software control, it must be privileged. There must be a method for getting all processors' Time Bases to start incrementing with values that are identical or almost identical in all processors.

Programming Note

If software initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries.

Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 2^64-1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values.

Successive readings of the Time Base may return identical values.

See the description of the Time Base in Book II, for ways to compute time of day in POSIX format from the Time Base.

7.2.1 Writing the Time Base

Writing the Time Base is privileged. Reading the Time Base is not privileged; it is discussed in Book II.

It is not possible to write the entire 64-bit Time Base using a single instruction. The mttbl and mttbu extended mnemonics write the lower and upper halves of the Time Base (TBL and TBU), respectively, preserving the other half. These are extended mnemonics for the mtspr instruction; see Appendix B, "Assembler Extended Mnemonics" on page 635.

The Time Base can be written by a sequence such as:

   lwz    Rx,upper   # load 64-bit value for
   lwz    Ry,lower   # TB into Rx and Ry
   li     Rz,0
   mttbl  Rz         # set TBL to 0
   mttbu  Rx         # set TBU
   mttbl  Ry         # set TBL

Provided that no interrupts occur while the last three instructions are being executed, loading 0 into TBL prevents the possibility of a carry from TBL to TBU while the Time Base is being initialized.
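The reason for zeroing TBL first can be illustrated with a small software model. The following is a minimal sketch in C, not Power ISA code: the `timebase_model` type and the `model_mttbl`, `model_mttbu`, `tb_tick`, `tb_write_naive`, and `tb_write_safe` names are hypothetical and exist only for this illustration. It assumes that a few Time Base increments may elapse between the individual writes, and that far fewer than 2^32 increments elapse during the whole sequence.

```c
#include <stdint.h>

/* Hypothetical software model of the 64-bit Time Base (illustration only). */
typedef struct {
    uint64_t tb; /* TBU in the upper 32 bits, TBL in the lower 32 bits */
} timebase_model;

/* Advance the modeled Time Base by n increments (carries from TBL into TBU). */
static void tb_tick(timebase_model *m, uint64_t n)
{
    m->tb += n;
}

/* Model of mttbl: replace the lower half, preserve the upper half. */
static void model_mttbl(timebase_model *m, uint32_t value)
{
    m->tb = (m->tb & 0xFFFFFFFF00000000ULL) | (uint64_t)value;
}

/* Model of mttbu: replace the upper half, preserve the lower half. */
static void model_mttbu(timebase_model *m, uint32_t value)
{
    m->tb = (m->tb & 0x00000000FFFFFFFFULL) | ((uint64_t)value << 32);
}

/* Naive order: write TBU, then TBL, with `gap` increments in between.
 * If the old TBL value wraps during the gap, the carry corrupts the
 * freshly written TBU. */
uint64_t tb_write_naive(timebase_model *m, uint32_t upper, uint32_t lower,
                        uint64_t gap)
{
    model_mttbu(m, upper);
    tb_tick(m, gap);
    model_mttbl(m, lower);
    return m->tb;
}

/* Order used by the sequence in the text: zero TBL, write TBU, write TBL.
 * After TBL is zeroed it cannot wrap again within the (short) sequence,
 * so no carry reaches TBU. */
uint64_t tb_write_safe(timebase_model *m, uint32_t upper, uint32_t lower,
                       uint64_t gap)
{
    model_mttbl(m, 0);
    tb_tick(m, gap);
    model_mttbu(m, upper);
    tb_tick(m, gap);
    model_mttbl(m, lower);
    return m->tb;
}
```

In this model, starting from a Time Base of 0x0000_0000_FFFF_FFFF and writing upper=5, lower=0x10 with one increment between writes, the naive order ends at 0x0000_0006_0000_0010 (TBU is off by one), while the zero-first order ends at the intended 0x0000_0005_0000_0010.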
Programming Note

The instructions for writing the Time Base are mode-independent. Thus code written to set the Time Base will work correctly in either 64-bit or 32-bit mode.

7.3 Decrementer

The Decrementer (DEC) is a 32-bit decrementing counter that provides a mechanism for causing a Decrementer interrupt after a programmable delay. The contents of the Decrementer are treated as a signed integer.

   DEC (bits 32:63)

   Figure 20. Decrementer

The Decrementer is driven by the same frequency as the Time Base. The period of the Decrementer will depend on the driving frequency, but if the same values are used as given above for the Time Base (see Section 7.2), and if the Time Base update frequency is constant, the period would be

$$T_{DEC} = \frac{2^{32} \times 32}{1\ \text{GHz}} = 137\ \text{seconds.}$$

The Decrementer counts down.

The operation of the Decrementer satisfies the following constraints.

1. The operation of the Time Base and the Decrementer is coherent, i.e., the counters are driven by the same fundamental time base.
2. Loading a GPR from the Decrementer has no effect on the accuracy of the Time Base.
3. Copying the contents of a GPR to the Decrementer replaces the contents of the Decrementer with the contents of the GPR.

Programming Note

In systems that change the Time Base update frequency for purposes such as power management, the Decrementer input frequency will also change. Software must be aware of this in order to set interval timers.

7.3.1 Writing and Reading the Decrementer

The contents of the Decrementer can be read or written using the mfspr and mtspr instructions, both of which are privileged when they refer to the Decrementer. Using an extended mnemonic (see Appendix B, "Assembler Extended Mnemonics" on page 635), the Decrementer can be written from GPR Rx using:

   mtdec  Rx

The Decrementer can be read into GPR Rx using:

   mfdec  Rx

Copying the Decrementer to a GPR has no effect on the Decrementer contents or on the interrupt mechanism.

7.3.2 Decrementer Events

A Decrementer event occurs when a decrement occurs on a Decrementer value of 0x0000_0001.

Upon the occurrence of a Decrementer event, the Decrementer may be reloaded from a 32-bit Decrementer Auto-Reload Register (DECAR). See Section 7.4. Upon the occurrence of a Decrementer event, the Decrementer has the following basic modes of operation.

Decrement to one and stop on zero

   If TCR[ARE]=0, TSR[DIS] is set to 1, the value 0x0000_0000 is then placed into the DEC, and the Decrementer stops decrementing.

   If enabled by TCR[DIE]=1 and MSR[EE]=1, a Decrementer interrupt is taken. See Section 5.6.11, "Decrementer Interrupt" on page 582 for details of register behavior caused by the Decrementer interrupt.

Decrement to one and auto-reload

   If TCR[ARE]=1, TSR[DIS] is set to 1, the contents of the Decrementer Auto-Reload Register are then placed into the DEC, and the Decrementer continues decrementing from the reloaded value.

   If enabled by TCR[DIE]=1 and MSR[EE]=1, a Decrementer interrupt is taken. See Section 5.6.11, "Decrementer Interrupt" on page 582 for details of register behavior caused by the Decrementer interrupt.

Forcing the Decrementer to 0 using the mtspr instruction will not cause a Decrementer exception; however, decrementing which was in progress at the instant of the mtspr may cause the exception. To eliminate the Decrementer as a source of exceptions, set TCR[DIE] to 0 (clear the Decrementer Interrupt Enable bit).

If it is desired to eliminate all Decrementer activity, the procedure is as follows:

1. Write 0 to TCR[DIE]. This will prevent Decrementer activity from causing exceptions.
2. Write 0 to TCR[ARE] to disable the Decrementer auto-reload.
3. Write 0 to Decrementer. This will halt Decrementer decrementing. While this action will not cause a Decrementer exception to be set in TSR[DIS], a near simultaneous decrement may have done so.
4. Write 1 to TSR[DIS]. This action will clear TSR[DIS] to 0 (see Section 7.5.1 on page 601). This will clear any Decrementer exception which may be pending. Because the Decrementer is frozen at zero, no further Decrementer events are possible.

If the auto-reload feature is disabled (TCR[ARE]=0), then once the Decrementer decrements to zero, it will stay there until software reloads it using the mtspr instruction.

On reset, TCR[ARE] is set to 0. This disables the auto-reload feature.

7.4 Decrementer Auto-Reload Register

The Decrementer Auto-Reload Register is a 32-bit register as shown below.

   DECAR (bits 32:63)

   Figure 21. Decrementer

Bits of the Decrementer Auto-Reload Register are numbered 32 (most-significant bit) to 63 (least-significant bit). The Decrementer Auto-Reload Register is provided to support the auto-reload feature of the Decrementer. See Section 7.3.2.

The contents of the Decrementer Auto-Reload Register cannot be read. The contents of bits 32:63 of register RS can be written to the Decrementer Auto-Reload Register using the mtspr instruction.

7.5 Timer Control Register

The Timer Control Register (TCR) is a 32-bit register. Timer Control Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The Timer Control Register controls Decrementer (see Section 7.3), Fixed-Interval Timer (see Section 7.6), and Watchdog Timer (see Section 7.7) options.

The relationship of the Timer facilities to the TCR and TB is shown in the figure below.

   [Figure 22. Relationships of the Timer Facilities. The Timer Clock drives the Time Base (incrementer; TBU bits 0:31, TBL bits 0:31) and the Decrementer (DEC, bits 0:31). Watchdog Timer events are based on one of 4 Time Base bits selected by TCR[WP] (the 4 Time Base bits that can be selected by TCR[WP] are implementation-dependent). Fixed-Interval Timer events are based on one of 4 Time Base bits selected by TCR[FP] (the 4 Time Base bits that can be selected by TCR[FP] are implementation-dependent). A Decrementer event is signaled by the 0/1 detect on the DEC, which can be auto-reloaded from DECAR.]

The contents of the Timer Control Register can be read using the mfspr instruction. The contents of bits 32:63 of register RS can be written to the Timer Control Register using the mtspr instruction.

The contents of the TCR are defined below:

Bit(s)  Description

32:33   Watchdog Timer Period (WP) (see Section 7.7 on page 602)
        Specifies one of 4 bit locations of the Time Base used to signal a Watchdog Timer exception on a transition from 0 to 1. The 4 Time Base bits that can be specified to serve as the Watchdog Timer period are implementation-dependent.

34:35   Watchdog Timer Reset Control (WRC) (see Section 7.7 on page 602)
        00      No Watchdog Timer reset will occur.
                TCR[WRC] resets to 0b00. This field may be set by software, but cannot be cleared by software (except by a software-induced reset).
        01-11   Force processor to be reset on second time-out of Watchdog Timer. The exact function of any of these settings is implementation-dependent.
        The Watchdog Timer Reset Control field is cleared to zero by processor reset. These bits are set only by software. Once a 1 has been written to one of these bits, that bit remains a 1 until a reset occurs. This is to prevent errant code from disabling the Watchdog reset function.

36      Watchdog Timer Interrupt Enable (WIE) (see Section 7.7 on page 602)
        0   Disable Watchdog Timer interrupt
        1   Enable Watchdog Timer interrupt

37      Decrementer Interrupt Enable (DIE) (see Section 7.3 on page 599)
        0   Disable Decrementer interrupt
        1   Enable Decrementer interrupt

38:39   Fixed-Interval Timer Period (FP) (see Section 7.6 on page 602)
        Specifies one of 4 bit locations of the Time Base used to signal a Fixed-Interval Timer exception on a transition from 0 to 1. The 4 Time Base bits that can be specified to serve as the Fixed-Interval Timer period are implementation-dependent.

40      Fixed-Interval Timer Interrupt Enable (FIE) (see Section 7.6 on page 602)
        0   Disable Fixed-Interval Timer interrupt
        1   Enable Fixed-Interval Timer interrupt

41      Auto-Reload Enable (ARE)
        0   Disable auto-reload of the Decrementer
            Decrementer exception is presented (i.e. TSR[DIS] is set to 1) when the Decrementer is decremented from a value of 0x0000_0001. The next value placed in the Decrementer is the value 0x0000_0000. The Decrementer then stops decrementing. If MSR[EE]=1, TCR[DIE]=1, and TSR[DIS]=1, a Decrementer interrupt is taken. Software must reset TSR[DIS].
        1   Enable auto-reload of the Decrementer
            Decrementer exception is presented (i.e. TSR[DIS] is set to 1) when the Decrementer is decremented from a value of 0x0000_0001. The contents of the Decrementer Auto-Reload Register are placed in the Decrementer. The Decrementer resumes decrementing. If MSR[EE]=1, TCR[DIE]=1, and TSR[DIS]=1, a Decrementer interrupt is taken. Software must reset TSR[DIS].

42      Implementation-dependent

43:63   Reserved

7.5.1 Timer Status Register

The Timer Status Register (TSR) is a 32-bit register. Timer Status Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The Timer Status Register contains status on timer events and the most recent Watchdog Timer-initiated processor reset.

The Timer Status Register is set via hardware, and read and cleared via software. The contents of the Timer Status Register can be read using the mfspr instruction. Bits in the Timer Status Register can be cleared using the mtspr instruction. Clearing is done by writing bits 32:63 of a General Purpose Register to the Timer Status Register with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the Timer Status Register is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

The contents of the TSR are defined below:

Bit(s)  Description

32      Enable Next Watchdog Timer (ENW) (see Section 7.7 on page 602)
        0   Action on next Watchdog Timer time-out is to set TSR[ENW]
        1   Action on next Watchdog Timer time-out is governed by TSR[WIS]

33      Watchdog Timer Interrupt Status (WIS) (see Section 7.7 on page 602)
        0   A Watchdog Timer event has not occurred.
        1   A Watchdog Timer event has occurred. When MSR[CE]=1 and TCR[WIE]=1, a Watchdog Timer interrupt is taken.

34:35   Watchdog Timer Reset Status (WRS) (see Section 7.7 on page 602)
        These two bits are set to one of three values when a reset is caused by the Watchdog Timer. These bits are undefined at power-up.
        00   No Watchdog Timer reset has occurred.
        01   Implementation-dependent reset information.
        10   Implementation-dependent reset information.
        11   Implementation-dependent reset information.

36      Decrementer Interrupt Status (DIS) (see Section 7.3.2 on page 599)
        0   A Decrementer event has not occurred.
        1   A Decrementer event has occurred. When MSR[EE]=1 and TCR[DIE]=1, a Decrementer interrupt is taken.

37      Fixed-Interval Timer Interrupt Status (FIS) (see Section 7.6 on page 602)
        0   A Fixed-Interval Timer event has not occurred.
        1   A Fixed-Interval Timer event has occurred. When MSR[EE]=1 and TCR[FIE]=1, a Fixed-Interval Timer interrupt is taken.

38:63   Reserved

7.6 Fixed-Interval Timer

The Fixed-Interval Timer (FIT) is a mechanism for providing timer interrupts with a repeatable period, to facilitate system maintenance. It is similar in function to an auto-reload Decrementer, except that there are fewer selections of interrupt period available. The Fixed-Interval Timer exception occurs on 0 to 1 transitions of a selected bit from the Time Base (see Section 7.5).

The Fixed-Interval Timer exception is logged by TSR[FIS]. A Fixed-Interval Timer interrupt will occur if TCR[FIE] and MSR[EE] are enabled. See Section 5.6.12 on page 582 for details of register behavior caused by the Fixed-Interval Timer interrupt.

Note that a Fixed-Interval Timer exception will also occur if the selected Time Base bit transitions from 0 to 1 due to an mtspr instruction that writes a 1 to the bit when its previous value was 0.

7.7 Watchdog Timer

The Watchdog Timer is a facility intended to aid system recovery from faulty software or hardware. Watchdog time-outs occur on 0 to 1 transitions of selected bits from the Time Base (Section 7.5).

When a Watchdog Timer time-out occurs while Watchdog Timer Interrupt Status is clear (TSR[WIS] = 0) and the next Watchdog Time-out is enabled (TSR[ENW] = 1), a Watchdog Timer exception is generated and logged by setting TSR[WIS] to 1. This is referred to as a Watchdog Timer First Time Out. A Watchdog Timer interrupt will occur if enabled by TCR[WIE] and MSR[CE]. See Section 5.6.13 on page 582 for details of register behavior caused by the Watchdog Timer interrupt. The purpose of the Watchdog Timer First time-out is to give an indication that there may be a problem and give the system a chance to perform corrective action or capture a failure before a reset occurs from the Watchdog Timer Second time-out as explained further below.

Note that a Watchdog Timer exception will also occur if the selected Time Base bit transitions from 0 to 1 due to an mtspr instruction that writes a 1 to the bit when its previous value was 0.

When a Watchdog Timer time-out occurs while TSR[WIS] = 1 and TSR[ENW] = 1, a processor reset occurs if it is enabled by a non-zero value of the Watchdog Reset Control field in the Timer Control Register (TCR[WRC]). This is referred to as a Watchdog Timer Second Time Out. The assumption is that TSR[WIS] was not cleared because the processor was unable to execute the Watchdog Timer interrupt handler, leaving reset as the only available means to restart the system. Note that once TCR[WRC] has been set to a non-zero value, it cannot be reset by software; this feature prevents errant software from disabling the Watchdog Timer reset capability.

A more complete view of Watchdog Timer behavior is afforded by Figure 23 and Figure 24, which describe the Watchdog Timer state machine and Watchdog Timer controls. The numbers in parentheses in the figure refer to the discussion of modes of operation which follow the table.

   [Figure 23. Watchdog State Machine. The diagram shows four states, identified by TSR[ENW,WIS] = 0b00, 0b10, 0b11, and 0b01, with the following labeled transitions:
   - From 0b00, time-out: no exception recorded in TSR[WIS]. Set TSR[ENW] so next time-out will cause exception (to 0b10).
   - From 0b10, time-out: WDT exception recorded in TSR[WIS]. WDT interrupt will occur if enabled by TCR[WIE] and MSR[CE] (to 0b11).
   - From 0b11, time-out: if TCR[WRC] ≠ 0b00 then RESET, including TSR[WRS] ← TCR[WRC] and TCR[WRC] ← 0b00.
   - From 0b01, time-out: set TSR[ENW] so next time-out will cause reset (to 0b11).
   - Software arcs labeled (1) Watchdog Interrupt Handler, (2) SW Loop, (2) Watchdog Interrupt Handler, and (3) SW Loop, corresponding to the modes of operation described after Figure 24.]

   Enable Next WDT   WDT Status   Action when timer interval expires
   (TSR[ENW])        (TSR[WIS])
   0                 0            Set Enable Next Watchdog Timer (TSR[ENW]=1).
   0                 1            Set Enable Next Watchdog Timer (TSR[ENW]=1).
   1                 0            Set Watchdog Timer interrupt status bit (TSR[WIS]=1). If Watchdog Timer interrupt is enabled (TCR[WIE]=1 and MSR[CE]=1), then interrupt.
   1                 1            Cause Watchdog Timer reset action specified by TCR[WRC]. Reset will copy pre-reset TCR[WRC] into TSR[WRS], then clear TCR[WRC].

   Figure 24. Watchdog Timer Controls

The controls described in the above table imply three different modes of operation that a programmer might select for the Watchdog Timer. Each of these modes assumes that TCR[WRC] has been set to allow processor reset by the Watchdog facility:

1. Always take the Watchdog Timer interrupt when pending, and never attempt to prevent its occurrence. In this mode, the Watchdog Timer interrupt caused by a first time-out is used to clear TSR[WIS] so a second time-out never occurs. TSR[ENW] is not cleared, thereby allowing the next time-out to cause another interrupt.

2. Always take the Watchdog Timer interrupt when pending, but avoid it when possible. In this mode a recurring code loop of reliable duration (or perhaps a periodic interrupt handler such as the Fixed-Interval Timer interrupt handler) is used to repeatedly clear TSR[ENW] such that a first time-out exception is avoided, and thus no Watchdog Timer interrupt occurs. Once TSR[ENW] has been cleared, software has between one and two full Watchdog periods before a Watchdog exception will be posted in TSR[WIS]. If this occurs before the software is able to clear TSR[ENW] again, a Watchdog Timer interrupt will occur. In this case, the Watchdog Timer interrupt handler will then clear both TSR[ENW] and TSR[WIS], in order to (hopefully) avoid the next Watchdog Timer interrupt.

3. Never take the Watchdog Timer interrupt. In this mode, Watchdog Timer interrupts are disabled (via TCR[WIE]=0), and the system depends upon a recurring code loop of reliable duration (or perhaps a periodic interrupt handler such as the Fixed-Interval Timer interrupt handler) to repeatedly clear TSR[WIS] such that a second time-out is avoided, and thus no reset occurs. TSR[ENW] is not cleared, thereby allowing the next time-out to set TSR[WIS] again. The recurring code loop must have a period which is less than one Watchdog Timer period in order to guarantee that a Watchdog Timer reset will not occur.

7.8 Freezing the Timer Facilities

The debug mechanism provides a means of temporarily freezing the timers upon a debug event. Specifically, the Time Base and Decrementer can be frozen and prevented from incrementing/decrementing, respectively, whenever a debug event is set in the Debug Status Register. Note that this also freezes the FIT and Watchdog timer. This allows a debugger to simulate the appearance of `real time', even though the application has been temporarily `halted' to service the debug event. See the description of bit 63 of the Debug Control Register 0 (Freeze Timers on Debug Event or DBCR0[FT]) in Section 8.5.1.1 on page 613.

Chapter 8. Debug Facilities

8.1 Overview. . . . . . . .
8.2 Internal Debug Mode
8.3 External Debug Mode [Category: Embedded.Enhanced Debug]
8.4 Debug Events
8.4.1 Instruction Address Compare Debug Event
8.4.2 Data Address Compare Debug Event
8.4.3 Trap Debug Event
8.4.4 Branch Taken Debug Event
8.4.5 Instruction Complete Debug Event
8.4.6 Interrupt Taken Debug Event
8.4.6.1 Causes of Interrupt Taken Debug Events
8.4.6.2 Interrupt Taken Debug Event Description
8.4.7 Return Debug Event
8.4.8 Unconditional Debug Event
8.4.9 Critical Interrupt Taken Debug Event [Category: Embedded.Enhanced Debug]
8.4.10 Critical Interrupt Return Debug Event [Category: Embedded.Enhanced Debug]
8.5 Debug Registers
8.5.1 Debug Control Registers
8.5.1.1 Debug Control Register 0 (DBCR0)
8.5.1.2 Debug Control Register 1 (DBCR1)
8.5.1.3 Debug Control Register 2 (DBCR2)
8.5.2 Debug Status Register
8.5.3 Instruction Address Compare Registers
8.5.4 Data Address Compare Registers
8.5.5 Data Value Compare Registers
8.6 Debugger Notify Halt Instruction [Category: Embedded.Enhanced Debug]

8.1 Overview

Processors provide debug facilities to enable hardware and software debug functions, such as instruction and data breakpoints and program single stepping. The debug facilities consist of a set of Debug Control Registers (DBCR0, DBCR1, and DBCR2) (see Section 8.5.1 on page 613), a set of Address and Data Value Compare Registers (IAC1, IAC2, IAC3, IAC4, DAC1, DAC2, DVC1, and DVC2) (see Section 8.5.3, Section 8.5.4, and Section 8.5.5), a Debug Status Register (DBSR) (see Section 8.5.2) for enabling and recording various kinds of debug events, and a special Debug interrupt type built into the interrupt mechanism (see Section 5.6.16). The debug facilities also provide a mechanism for software-controlled processor reset, and for controlling the operation of the timers in a debug environment.

The mfspr and mtspr instructions (see Section 3.4.1) provide access to the registers of the debug facilities.

In addition to the facilities described here, implementations will typically include debug facilities, modes, and access mechanisms which are implementation-specific. For example, implementations will typically provide access to the debug facilities via a dedicated interface such as the IEEE 1149.1 Test Access Port (JTAG).

8.2 Internal Debug Mode

Debug events include such things as instruction and data breakpoints. These debug events cause status bits to be set in the Debug Status Register. The existence of a set bit in the Debug Status Register is considered a Debug exception. Debug exceptions, if enabled, will cause Debug interrupts.

There are two different mechanisms that control whether Debug interrupts are enabled. The first is the MSRDE bit, and this bit must be set to 1 to enable Debug interrupts. The second mechanism is an enable bit in the Debug Control Register 0 (DBCR0). This bit is the Internal Debug Mode bit (DBCR0IDM), and it must also be set to 1 to enable Debug interrupts.

When DBCR0IDM=1, the processor is in Internal Debug Mode. In this mode, debug events will (if also enabled by MSRDE) cause Debug interrupts. Software at the Debug interrupt vector location will thus be given control upon the occurrence of a debug event, and can access (via the normal instructions) all architected processor resources. In this fashion, debug monitor software can control the processor and gather status, and interact with debugging hardware connected to the processor.

When the processor is not in Internal Debug Mode (DBCR0IDM=0), debug events may still occur and be recorded in the Debug Status Register. These exceptions may be monitored via software by reading the Debug Status Register (using mfspr), or may eventually cause a Debug interrupt if later enabled by setting DBCR0IDM=1 (and MSRDE=1). Processor behavior when debug events occur while DBCR0IDM=0 is implementation-dependent.

8.3 External Debug Mode [Category: Embedded.Enhanced Debug]

The External Debug Mode is a mode in which facilities external to the processor can access processor resources and control execution. These facilities are defined as the external debug facilities and are not defined here; however, some instructions and registers share internal and external debug roles and are briefly described as necessary.

A dnh instruction is provided to stop instruction fetching and execution and allow the processor to be managed by an external debug facility. After the dnh instruction is executed, instructions are not fetched, interrupts are not taken, and the processor does not execute instructions.

8.4 Debug Events

Debug events are used to cause Debug exceptions to be recorded in the Debug Status Register (see Section 8.5.2). In order for a debug event to be enabled to set a Debug Status Register bit and thereby cause a Debug exception, the specific event type must be enabled by a corresponding bit or bits in the Debug Control Register DBCR0 (see Section 8.5.1.1), DBCR1 (see Section 8.5.1.2), or DBCR2 (see Section 8.5.1.3), in most cases; the Unconditional Debug Event (UDE) is an exception to this rule. Once a Debug Status Register bit is set, if Debug interrupts are enabled by MSRDE, a Debug interrupt will be generated.

Certain debug events are not allowed to occur when MSRDE=0. In such situations, no Debug exception occurs and thus no Debug Status Register bit is set. Other debug events may cause Debug exceptions and set Debug Status Register bits regardless of the state of MSRDE. The associated Debug interrupts that result from such Debug exceptions will be delayed until MSRDE=1, provided the exceptions have not been cleared from the Debug Status Register in the meantime.

Any time that a Debug Status Register bit is allowed to be set while MSRDE=0, a special Debug Status Register bit, Imprecise Debug Event (DBSRIDE), will also be set. DBSRIDE indicates that the associated Debug exception bit in the Debug Status Register was set while Debug interrupts were disabled via the MSRDE bit. Debug interrupt handler software can use this bit to determine whether the address recorded in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] should be interpreted as the address associated with the instruction causing the Debug exception, or simply the address of the instruction after the one which set the MSRDE bit, thereby enabling the delayed Debug interrupt.

Debug interrupts are ordered with respect to other interrupt types (see Section 7.8 on page 179). Debug exceptions are prioritized with respect to other exceptions (see Section 7.9 on page 183).

There are eight types of debug events defined:

1. Instruction Address Compare debug events
2. Data Address Compare debug events
3. Trap debug events
4. Branch Taken debug events
5. Instruction Complete debug events
6. Interrupt Taken debug events
7. Return debug events
8. Unconditional debug events

Programming Note

There are two classes of debug exception types:

Type 1: exception before instruction
Type 2: exception after instruction

Almost all debug exceptions fall into the first type. That is, they all take the interrupt upon encountering an instruction having the exception without updating any architectural state (other than DBSR, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug], CSRR1/DSRR1 [Category: Embedded.Enhanced Debug], MSR) for that instruction.

The CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] for this type of exception points to the instruction that encountered the exception. This includes IAC, DAC, branch taken, etc.

The only exception which falls into the second type is the instruction complete debug exception. This exception is taken upon completing and updating one instruction and then pointing CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] to the next instruction to execute.

To make forward progress for any Type 1 debug exception one does the following:

1. Software sets up Type 1 exceptions (e.g. branch taken debug exceptions) and then returns to normal program operation.
2. Hardware takes Debug interrupt upon the first branch taken Debug exception, pointing to the branch with CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].
3. Software, in the debug handler, sees the branch taken exception type, does whatever logging/analysis it wants to, then clears all debug event enables in the DBCR except for the instruction complete debug event enable.
4. Software does an rfci or rfdi [Category: Embedded.Enhanced Debug].
5. Hardware would execute and complete one instruction (the branch taken in this case), and then take a Debug interrupt with CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] pointing to the target of the branch.
6. Software would see the instruction complete interrupt type. It clears the instruction complete event enable, then enables the branch taken interrupt event again.
7. Software does an rfci or rfdi [Category: Embedded.Enhanced Debug].
8. Hardware resumes on the target of the taken branch and continues until another taken branch, in which case we end up at step 2 again.

This, at first, seems like a double tax (i.e. 2 debug interrupts for every instance of a Type 1 exception), but there does not seem to be any other clean way to make forward progress on Type 1 debug exceptions. The only other way to avoid the double tax is to have the debug handler routine actually emulate the instruction pointed to for the Type 1 exceptions, determine the next instruction that would have been executed by the interrupted program flow, load the CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] with that address, and do an rfci/rfdi [Category: Embedded.Enhanced Debug]; this is probably not faster.

8.4.1 Instruction Address Compare Debug Event

One or more Instruction Address Compare debug events (IAC1, IAC2, IAC3 or IAC4) occur if they are enabled and execution is attempted of an instruction at an address that meets the criteria specified in the DBCR0, DBCR1, IAC1, IAC2, IAC3, and IAC4 Registers.

Instruction Address Compare User/Supervisor Mode

DBCR1IAC1US specifies whether IAC1 debug events can occur in user mode or supervisor mode, or both.

DBCR1IAC2US specifies whether IAC2 debug events can occur in user mode or supervisor mode, or both.

DBCR1IAC3US specifies whether IAC3 debug events can occur in user mode or supervisor mode, or both.

DBCR1IAC4US specifies whether IAC4 debug events can occur in user mode or supervisor mode, or both.

Effective/Real Address Mode

DBCR1IAC1ER specifies whether effective addresses, real addresses, effective addresses and MSRIS=0, or effective addresses and MSRIS=1 are used in determining an address match on IAC1 debug events.

DBCR1IAC2ER specifies whether effective addresses, real addresses, effective addresses and MSRIS=0, or
effective addresses and MSRIS=1 are used in determining an address match on IAC2 debug events.

DBCR1IAC3ER specifies whether effective addresses, real addresses, effective addresses and MSRIS=0, or effective addresses and MSRIS=1 are used in determining an address match on IAC3 debug events.

DBCR1IAC4ER specifies whether effective addresses, real addresses, effective addresses and MSRIS=0, or effective addresses and MSRIS=1 are used in determining an address match on IAC4 debug events.

Instruction Address Compare Mode

DBCR1IAC12M specifies whether all or some of the bits of the address of the instruction fetch must match the contents of the IAC1 or IAC2, whether the address must be inside a specific range specified by the IAC1 and IAC2 or outside a specific range specified by the IAC1 and IAC2 for an IAC1 or IAC2 debug event to occur.

DBCR1IAC34M specifies whether all or some of the bits of the address of the instruction fetch must match the contents of the IAC3 Register or IAC4 Register, whether the address must be inside a specific range specified by the IAC3 Register and IAC4 Register or outside a specific range specified by the IAC3 Register and IAC4 Register for an IAC3 or IAC4 debug event to occur.

There are four instruction address compare modes.

- Exact address compare mode
  If the address of the instruction fetch is equal to the value in the enabled IAC Register, an instruction address match occurs.
  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Address bit match mode
  For IAC1 and IAC2 debug events, if the address of the instruction fetch access, ANDed with the contents of the IAC2, is equal to the contents of the IAC1, also ANDed with the contents of the IAC2, an instruction address match occurs.
  For IAC3 and IAC4 debug events, if the address of the instruction fetch, ANDed with the contents of the IAC4, is equal to the contents of the IAC3, also ANDed with the contents of the IAC4, an instruction address match occurs.
  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Inclusive address range compare mode
  For IAC1 and IAC2 debug events, if the 64-bit address of the instruction fetch is greater than or equal to the contents of the IAC1 and less than the contents of the IAC2, an instruction address match occurs.
  For IAC3 and IAC4 debug events, if the 64-bit address of the instruction fetch is greater than or equal to the contents of the IAC3 and less than the contents of the IAC4, an instruction address match occurs.
  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Exclusive address range compare mode
  For IAC1 and IAC2 debug events, if the 64-bit address of the instruction fetch is less than the contents of the IAC1 or greater than or equal to the contents of the IAC2, an instruction address match occurs.
  For IAC3 and IAC4 debug events, if the 64-bit address of the instruction fetch is less than the contents of the IAC3 or greater than or equal to the contents of the IAC4, an instruction address match occurs.
  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

See the detailed description of DBCR0 (see Section 8.5.1.1, "Debug Control Register 0 (DBCR0)" on page 613) and DBCR1 (see Section 8.5.1.2, "Debug Control Register 1 (DBCR1)" on page 614) and the modes for detecting IAC1, IAC2, IAC3 and IAC4 debug events. Instruction Address Compare debug events can occur regardless of the setting of MSRDE or DBCR0IDM.

When an Instruction Address Compare debug event occurs, the corresponding DBSRIAC1, DBSRIAC2, DBSRIAC3, or DBSRIAC4 bit or bits are set to record the debug exception. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE=1 (i.e. Debug interrupts are enabled) at the time of the Instruction Address Compare debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt). The execution of the instruction causing the exception will be suppressed, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting instruction.

If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Instruction Address Compare debug exception, a Debug interrupt will not occur, and the instruction will complete execution (provided the instruction is not causing some other exception which will generate an enabled interrupt).

Later, if the debug exception has not been reset by clearing DBSRIAC1, DBSRIAC2, DBSRIAC3, and DBSRIAC4, and MSRDE is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSRDE to 1. Software in the Debug interrupt handler can observe DBSRIDE to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.2 Data Address Compare Debug Event

One or more Data Address Compare debug events (DAC1R, DAC1W, DAC2R, DAC2W) occur if they are enabled, execution is attempted of a data storage access instruction, and the type, address, and possibly even the data value of the data storage access meet the criteria specified in the Debug Control Register 0, Debug Control Register 2, and the DAC1, DAC2, DVC1, and DVC2 Registers.

Data Address Compare Read/Write Enable

DBCR0DAC1 specifies whether DAC1R debug events can occur on read-type data storage accesses and whether DAC1W debug events can occur on write-type data storage accesses.

DBCR0DAC2 specifies whether DAC2R debug events can occur on read-type data storage accesses and whether DAC2W debug events can occur on write-type data storage accesses.

Indexed-string instructions (lswx, stswx) for which the XER field specifies zero bytes as the length of the string are treated as no-ops, and are not allowed to cause Data Address Compare debug events.

All Load instructions are considered reads with respect to debug events, while all Store instructions are considered writes with respect to debug events. In addition, the Cache Management instructions, and certain special cases, are handled as follows.

- dcbt, dcbtls, dcbtep, dcbtst, dcbtstls, dcbtstep, icbt, icbtls, icbtep, icbi, icblc, dcblc, and icbiep are all considered reads with respect to debug events. Note that dcbt, dcbtep, dcbtst, dcbtstep, icbt, and icbtep are treated as no-operations when they report Data Storage or Data TLB Miss exceptions, instead of being allowed to cause interrupts. However, these instructions are allowed to cause Debug interrupts, even when they would otherwise have been no-op'ed due to a Data Storage or Data TLB Miss exception.

- dcbz, dcbzep, dcbi, dcbf, dcbfep, dcba, dcbst, and dcbstep are all considered writes with respect to debug events. Note that dcbf, dcbfep, dcbst, and dcbstep are considered reads with respect to Data Storage exceptions, since they do not actually change the data at a given address. However, since the execution of these instructions may result in write activity on the processor's data bus, they are treated as writes with respect to debug events.

Data Address Compare User/Supervisor Mode

DBCR2DAC1US specifies whether DAC1R and DAC1W debug events can occur in user mode or supervisor mode, or both.

DBCR2DAC2US specifies whether DAC2R and DAC2W debug events can occur in user mode or supervisor mode, or both.

Effective/Real Address Mode

DBCR2DAC1ER specifies whether effective addresses, real addresses, effective addresses and MSRDS=0, or effective addresses and MSRDS=1 are used in determining an address match on DAC1R and DAC1W debug events.

DBCR2DAC2ER specifies whether effective addresses, real addresses, effective addresses and MSRDS=0, or effective addresses and MSRDS=1 are used in determining an address match on DAC2R and DAC2W debug events.

Data Address Compare Mode

DBCR2DAC12M specifies whether all or some of the bits of the address of the data storage access must match the contents of the DAC1 or DAC2, whether the address must be inside a specific range specified by the DAC1 and DAC2 or outside a specific range specified by the DAC1 and DAC2 for a DAC1R, DAC1W, DAC2R or DAC2W debug event to occur.

There are four data address compare modes.

- Exact address compare mode
  If the 64-bit address of the data storage access is equal to the value in the enabled Data Address Compare Register, a data address match occurs.
  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Address bit match mode
  If the address of the data storage access, ANDed with the contents of the DAC2, is equal to the contents of the DAC1, also ANDed with the contents of the DAC2, a data
  address match occurs.
  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Inclusive address range compare mode
  If the 64-bit address of the data storage access is greater than or equal to the contents of the DAC1 and less than the contents of the DAC2, a data address match occurs.
  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Exclusive address range compare mode
  If the 64-bit address of the data storage access is less than the contents of the DAC1 or greater than or equal to the contents of the DAC2, a data address match occurs.
  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

Data Value Compare Mode

DBCR2DVC1M and DBCR2DVC1BE specify whether and how the data value being accessed by the storage access must match the contents of the DVC1 for a DAC1R or DAC1W debug event to occur.

DBCR2DVC2M and DBCR2DVC2BE specify whether and how the data value being accessed by the storage access must match the contents of the DVC2 for a DAC2R or DAC2W debug event to occur.

See the description of DBCR0 (see Section 8.5.1.1) and DBCR2 (see Section 8.5.1.3) and the modes for detecting Data Address Compare debug events. Data Address Compare debug events can occur regardless of the setting of MSRDE or DBCR0IDM.

When a Data Address Compare debug event occurs, the corresponding DBSRDAC1R, DBSRDAC1W, DBSRDAC2R, or DBSRDAC2W bit or bits are set to 1 to record the debug exception. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE=1 (i.e. Debug interrupts are enabled) at the time of the Data Address Compare debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), the execution of the instruction causing the exception will be suppressed, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting instruction. Depending on the type of instruction and/or the alignment of the data access, the instruction causing the exception may have been partially executed (see Section 5.7).

If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Data Address Compare debug exception, a Debug interrupt will not occur, and the instruction will complete execution (provided the instruction is not causing some other exception which will generate an enabled interrupt). Also, DBSRIDE is set to indicate that the debug exception occurred while Debug interrupts were disabled by MSRDE=0.

Later, if the debug exception has not been reset by clearing DBSRDAC1R, DBSRDAC1W, DBSRDAC2R, DBSRDAC2W, and MSRDE is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSRDE to 1. Software in the Debug interrupt handler can observe DBSRIDE to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.3 Trap Debug Event

A Trap debug event (TRAP) occurs if DBCR0TRAP=1 (i.e. Trap debug events are enabled) and a Trap instruction (tw, twi, td, tdi) is executed and the conditions specified by the instruction for the trap are met. The event can occur regardless of the setting of MSRDE or DBCR0IDM.

When a Trap debug event occurs, DBSRTR is set to 1 to record the debug exception. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE=1 (i.e. Debug interrupts are enabled) at the time of the Trap debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting instruction.

If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Trap debug exception, a Debug interrupt will not occur, and a Trap exception type Program interrupt will occur instead if the trap condition is met.

Later, if the debug exception has not been reset by clearing DBSRTR, and MSRDE is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSRDE to 1. Software in the Debug interrupt handler can observe DBSRIDE to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.4 Branch Taken Debug Event

A Branch Taken debug event (BRT) occurs if DBCR0BRT=1 (i.e. Branch Taken debug events are enabled), execution is attempted of a branch instruction whose direction will be taken (that is, either an unconditional branch, or a conditional branch whose branch condition is met), and MSRDE=1.

Branch Taken debug events are not recognized if MSRDE=0 at the time of the execution of the branch instruction, and thus DBSRIDE cannot be set by a Branch Taken debug event. This is because branch instructions occur very frequently. Allowing these common events to be recorded as exceptions in the DBSR while Debug interrupts are disabled via MSRDE would result in an inordinate number of imprecise Debug interrupts.

When a Branch Taken debug event occurs, the DBSRBRT bit is set to 1 to record the debug exception and a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt). The execution of the instruction causing the exception will be suppressed, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting instruction.

8.4.5 Instruction Complete Debug Event

An Instruction Complete debug event (ICMP) occurs if DBCR0ICMP=1 (i.e. Instruction Complete debug events are enabled), execution of any instruction is completed, and MSRDE=1. Note that if execution of an instruction is suppressed due to the instruction causing some other exception which is enabled to generate an interrupt, then the attempted execution of that instruction does not cause an Instruction Complete debug event. The sc instruction does not fall into the type of an instruction whose execution is suppressed, since the instruction actually completes execution and then generates a System Call interrupt. In this case, the Instruction Complete debug exception will also be set.

Instruction Complete debug events are not recognized if MSRDE=0 at the time of the execution of the instruction, and thus DBSRIDE cannot be set by an ICMP debug event. This is because allowing the common event of Instruction Completion to be recorded as an exception in the DBSR while Debug interrupts are disabled via MSRDE would mean that the Debug interrupt handler software would receive an inordinate number of imprecise Debug interrupts every time Debug interrupts were re-enabled via MSRDE.

When an Instruction Complete debug event occurs, DBSRICMP is set to 1 to record the debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the instruction after the one causing the Instruction Complete debug exception.

8.4.6 Interrupt Taken Debug Event

8.4.6.1 Causes of Interrupt Taken Debug Events

Only base class interrupts can cause an Interrupt Taken debug event. If the Embedded.Enhanced Debug category is not supported or is supported and not enabled, all other interrupts automatically clear MSRDE, and thus would always prevent the associated Debug interrupt from occurring precisely. If the Embedded.Enhanced Debug category is supported and enabled, then critical class interrupts do not automatically clear MSRDE, but they cause Critical Interrupt Taken debug events instead of Interrupt Taken debug events.

Also, if the Embedded.Enhanced Debug category is not supported or is supported and not enabled, Debug interrupts themselves are critical class interrupts, and thus any Debug interrupt (for any other debug event) would always end up setting the additional exception of DBSRIRPT upon entry to the Debug interrupt handler. At this point, the Debug interrupt handler would be unable to determine whether or not the Interrupt Taken debug event was related to the original debug event.

8.4.6.2 Interrupt Taken Debug Event Description

An Interrupt Taken debug event (IRPT) occurs if DBCR0IRPT=1 (i.e. Interrupt Taken debug events are enabled) and a base class interrupt occurs. Interrupt Taken debug events can occur regardless of the setting of MSRDE.

When an Interrupt Taken debug event occurs, DBSRIRPT is set to 1 to record the debug exception. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE=1 (i.e. Debug interrupts are enabled) at the time of the Interrupt Taken debug event, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and Critical Save/Restore Register 0/Debug Save/Restore Register 0 [Category: Embedded.Enhanced Debug] will be set to the address of the interrupt vector which caused the Interrupt Taken debug event. No instructions at the base interrupt handler will have been executed.

If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Interrupt Taken debug event, a Debug interrupt will not occur, and the handler for the interrupt which caused the Interrupt Taken debug event will be allowed to execute.

Later, if the debug exception has not been reset by clearing DBSRIRPT, and MSRDE is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSRDE to 1. Software in the Debug interrupt handler can observe the DBSRIDE bit to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.7 Return Debug Event

A Return debug event (RET) occurs if DBCR0RET=1 and an attempt is made to execute an rfi. Return debug events can occur regardless of the setting of MSRDE.

When a Return debug event occurs, DBSRRET is set to 1 to record the debug exception. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE=1 at the time of the Return debug event, a Debug interrupt will occur immediately, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the rfi.

If MSRDE=0 at the time of the Return debug event, a Debug interrupt will not occur.

Later, if the Debug exception has not been reset by clearing DBSRRET, and MSRDE is set to 1, a delayed imprecise Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSRDE to 1. An imprecise Debug interrupt can be caused by executing an rfi when DBCR0RET=1 and MSRDE=0, and the execution of that rfi happens to cause MSRDE to be set to 1. Software in the Debug interrupt handler can observe the DBSRIDE bit to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.8 Unconditional Debug Event

An Unconditional debug event (UDE) occurs when the Unconditional Debug Event (UDE) signal is activated by the debug mechanism. The exact definition of the UDE signal and how it is activated is implementation-dependent. The Unconditional debug event is the only debug event which does not have a corresponding enable bit for the event in DBCR0 (hence the name of the event). The Unconditional debug event can occur regardless of the setting of MSRDE.

When an Unconditional debug event occurs, the DBSRUDE bit is set to 1 to record the Debug exception. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE=1 (i.e. Debug interrupts are enabled) at the time of the Unconditional Debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the instruction which would have executed next had the interrupt not occurred.

If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Unconditional Debug exception, a Debug interrupt will not occur.

Later, if the Unconditional Debug exception has not been reset by clearing DBSRUDE, and MSRDE is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSRDE to 1. Software in the Debug interrupt handler can observe DBSRIDE to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.9 Critical Interrupt Taken Debug Event [Category: Embedded.Enhanced Debug]

A Critical Interrupt Taken debug event (CIRPT) occurs if DBCR0CIRPT=1 (i.e. Critical Interrupt Taken debug events are enabled) and a critical interrupt occurs. A critical interrupt is any interrupt that saves state in CSRR0 and CSRR1 when the interrupt is taken. Critical Interrupt Taken debug events can occur regardless of the setting of MSRDE.

When a Critical Interrupt Taken debug event occurs, DBSRCIRPT is set to 1 to record the debug event. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event.

If MSRDE=1 (i.e. Debug interrupts are enabled) at the time of the Critical Interrupt Taken debug event, a Debug interrupt will occur immediately (provided there is no higher priority exception which is enabled to cause an interrupt), and DSRR0 will be set to the address of the first instruction of the critical interrupt handler. No instructions at the critical interrupt handler will have been executed.

If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Critical Interrupt Taken debug event, a Debug interrupt will not occur, and the handler for the critical interrupt which caused the debug event will be allowed to execute normally. Later, if the debug exception has not been reset by clearing DBSRCIRPT and MSRDE is set to 1, a delayed Debug interrupt will occur. In this case DSRR0 will contain the address of the instruction after the one that set MSRDE=1. Software in the Debug interrupt handler can observe DBSRIDE to determine how to interpret the value in DSRR0.
Debug interrupts are enabled) at the time of the Unconditional Debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and CSRR0/DSRR0 [Category: Embed- 612 Power ISATM -- Book III-E Version 2.04 8.4.10 Critical Interrupt Return 8.5.1.1 Debug Control Register 0 Debug Event [Category: Embed- (DCBR0) ded.Enhanced Debug] The contents of the DCBR0 can be read into bits 32:63 of register RT using mfspr RT,DBCR0, setting bits 0:31 A Critical Interrupt Return debug event (CRET) occurs of RT to 0. The contents of bits 32:63 of register RS can if DBCR0CRET = 1 (i.e. Critical Interrupt Return debug be written to the DCBR0 using mtspr DBCR0,RS. The events are enabled) and an attempt is made to execute bit definitions for DCBR0 are shown below. an rfci instruction. Critical Interrupt Return debug events can occur regardless of the setting of MSRDE. Bit(s) Description When a Critical Interrupt Return debug event occurs, 32 External Debug Mode (EDM) [Category: DBSRCRET is set to 1 to record the debug event. If Embedded.Enhanced Debug] MSRDE=0, DBSRIDE is also set to 1 to record the The EDM bit is a read-only bit that reflects imprecise debug event. whether the processor is controlled by an external debug facility. When EDM is set, If MSRDE = 1 (i.e. Debug Interrupts are enabled) at the internal debug mode is suppressed and the time of the Critical Interrupt Return debug event, a taking of debug interrupts does not occur. Debug Interrupt will occur immediately (provided there is no higher priority exception which is enabled to 0 The processor is not in external debug cause an interrupt), and DSRR0 will be set to the mode. address of the rfci instruction. 1 The processor is in external debug mode. If MSRDE = 0 (i.e. Debug Interrupts are disabled) at the 33 Internal Debug Mode (IDM) time of the Critical Interrupt Return debug event, a 0 Debug interrupts are disabled. 
Debug Interrupt will not occur. Later, if the debug 1 If MSRDE=1, then the occurrence of a exception has not been reset by clearing DBSRCRET debug event or the recording of an earlier and MSRDE is set to 1, a delayed Debug Interrupt will debug event in the Debug Status Register occur. In this case DSRR0 will contain the address of when MSRDE=0 or DBCR0IDM=0 will the instruction after the one that set MSRDE = 1. An cause a Debug interrupt. imprecise Debug Interrupt can be caused by executing an rfci when DBCR0CRET = 1 and MSRDE = 0, and the 34:35 Reset (RST) execution of the rfci happens to cause MSRDE to be 00 No action set to 1. Software in the Debug Interrupt handler can 01 Implementation-specific observe DBSRIDE to determine how to interpret the 10 Implementation-specific value in DSRR0. 11 Implementation-specific Warning: Writing 0b01, 0b10, or 0b11 to 8.5 Debug Registers these bits may cause a processor reset to occur. This section describes debug-related registers that are 36 Instruction Completion Debug Event accessible to software running on the processor. These (ICMP) registers are intended for use by special debug tools and debug software, and not by general application or 0 ICMP debug events are disabled operating system code. 1 ICMP debug events are enabled Note: Instruction Completion will not cause an ICMP debug event if MSRDE=0. 8.5.1 Debug Control Registers Debug Control Register 0 (DBCR0), Debug Control Register 1 (DBCR1), and Debug Control Register 2 37 Branch Taken Debug Event Enable (BRT) (DBCR2) are each 32-bit registers. Bits of DBCR0, 0 BRT debug events are disabled DBCR1, and DBCR2 are numbered 32 (most-signifi- 1 BRT debug events are enabled cant bit) to 63 (least-significant bit). DBCR0, DBCR1, and DBCR2 are used to enable debug events, reset the Note: Taken branches will not cause a BRT processor, control timer operation during debug events, debug event if MSRDE=0. and set the debug mode of the processor. 
38 Interrupt Taken Debug Event Enable (IRPT) 0 IRPT debug events are disabled 1 IRPT debug events are enabled Note: Critical interrupts will not cause an IRPT Debug event even if MSRDE=0. If the Chapter 8. Debug Facilities 613 Version 2.04 Embedded.Enhanced Debug category is sup- Debug] ported, see Section 8.4.9. A Critical Interrupt Taken Debug Event occurs when DBCR0CIRPT = 1 and a critical interrupt 39 Trap Debug Event Enable (TRAP) (any interrupt that uses the critical class, i.e. 0 TRAP debug events cannot occur uses CSRR0 and CSRR1) occurs. 1 TRAP debug events can occur 0 Critical interrupt taken debug events are 40 Instruction Address Compare 1 Debug disabled. Event Enable (IAC1) 1 Critical interrupt taken debug events are 0 IAC1 debug events cannot occur enabled. 1 IAC1 debug events can occur 58 Critical Interrupt Return Debug Event 41 Instruction Address Compare 2 Debug (CRET) [Category: Embedded.Enhanced Event Enable (IAC2) Debug] A Critical Interrupt Return Debug Event 0 IAC2 debug events cannot occur occurs when DBCR0CRET= 1 and a return 1 IAC2 debug events can occur from critical interrupt (an rfci instruction is 42 Instruction Address Compare 3 Debug executed) occurs. Event Enable (IAC3) 0 Critical interrupt return debug events are 0 IAC3 debug events cannot occur disabled. 1 IAC3 debug events can occur 1 Critical interrupt return debug events are enabled. 
43 Instruction Address Compare 4 Debug Event Enable (IAC4) 59:62 Implementation-dependent 0 IAC4 debug events cannot occur 63 Freeze Timers on Debug Event (FT) 1 IAC4 debug events can occur 0 Enable clocking of timers 44:45 Data Address Compare 1 Debug Event 1 Disable clocking of timers if any DBSR bit Enable (DAC1) is set (except MRR) 00 DAC1 debug events cannot occur 01 DAC1 debug events can occur only if a 8.5.1.2 Debug Control Register 1 store-type data storage access (DCBR1) 10 DAC1 debug events can occur only if a The contents of the DCBR1 can be read into bits 32:63 load-type data storage access a register RT using mfspr RT,DBCR1, setting bits 0:31 11 DAC1 debug events can occur on any of RT to 0. The contents of bits 32:63 of register RS can data storage access be written to the DBCR1 using mtspr DBCR1,RS. The 46:47 Data Address Compare 2 Debug Event bit definitions for DCBR1 are shown below. Enable (DAC2) Bit(s) Description 00 DAC2 debug events cannot occur 01 DAC2 debug events can occur only if a 32:33 Instruction Address Compare 1 User/ store-type data storage access Supervisor Mode(IAC1US) 10 DAC2 debug events can occur only if a 00 IAC1 debug events can occur load-type data storage access 01 Reserved 11 DAC2 debug events can occur on any 10 IAC1 debug events can occur only if data storage access MSRPR=0 11 IAC1 debug events can occur only if MSRPR=1 48 Return Debug Event Enable (RET) 34:35 Instruction Address Compare 1 Effective/ 0 RET debug events cannot occur Real Mode (IAC1ER) 1 RET debug events can occur 00 IAC1 debug events are based on effective Note: Return From Critical Interrupt will not addresses cause an RET debug event if MSRDE=0. 
If the 01 IAC1 debug events are based on real Embedded.Enhanced Debug category is sup- addresses ported, see Section 8.4.10 10 IAC1 debug events are based on effective addresses and can occur only if MSRIS=0 49:56 Reserved 11 IAC1 debug events are based on effective 57 Critical Interrupt Taken Debug Event addresses and can occur only if MSRIS=1 (CIRPT) [Category: Embedded.Enhanced 614 Power ISATM -- Book III-E Version 2.04 36:37 Instruction Address Compare 2 User/ If IAC1USAC2US or IAC1ERIAC2ER, Supervisor Mode (IAC2US) results are boundedly undefined. 00 IAC2 debug events can occur 42:47 Reserved 01 Reserved 48:49 Instruction Address Compare 3 User/ 10 IAC2 debug events can occur only if Supervisor Mode (IAC3US) MSRPR=0 11 IAC2 debug events can occur only if 00 IAC3 debug events can occur MSRPR=1 01 Reserved 10 IAC3 debug events can occur only if 38:39 Instruction Address Compare 2 Effective/ MSRPR=0 Real Mode (IAC2ER) 11 IAC3 debug events can occur only if 00 IAC2 debug events are based on effective MSRPR=1 addresses 50:51 Instruction Address Compare 3 Effective/ 01 IAC2 debug events are based on real Real Mode (IAC3ER) addresses 10 IAC2 debug events are based on effective 00 IAC3 debug events are based on effective addresses and can occur only if MSRIS=0 addresses 11 IAC2 debug events are based on effective 01 IAC3 debug events are based on real addresses and can occur only if MSRIS=1 addresses 10 IAC3 debug events are based on effective 40:41 Instruction Address Compare 1/2 Mode addresses and can occur only if MSRIS=0 (IAC12M) 11 IAC3 debug events are based on effective 00 Exact address compare addresses and can occur only if MSRIS=1 IAC1 debug events can occur only if the 52:53 Instruction Address Compare 4 User/ address of the instruction fetch is equal to Supervisor Mode (IAC4US) the value specified in IAC1. 
00 IAC4 debug events can occur IAC2 debug events can occur only if the 01 Reserved address of the instruction fetch is equal to 10 IAC4 debug events can occur only if the value specified in IAC2. MSRPR=0 11 IAC4 debug events can occur only if MSRPR=1 01 Address bit match 54:55 Instruction Address Compare 4 Effective/ IAC1 and IAC2 debug events can occur Real Mode (IAC4ER) only if the address of the instruction fetch, ANDed with the contents of IAC2 are equal 00 IAC4 debug events are based on effective to the contents of IAC1, also ANDed with addresses the contents of IAC2. 01 IAC4 debug events are based on real addresses If IAC1USIAC2US or IAC1ERIAC2ER, 10 IAC4 debug events are based on effective results are boundedly undefined. addresses and can occur only if MSRIS=0 11 IAC4 debug events are based on effective 10 Inclusive address range compare addresses and can occur only if MSRIS=1 IAC1 and IAC2 debug events can occur 56:57 Instruction Address Compare 3/4 Mode only if the address of the instruction fetch is (IAC34M) greater than or equal to the value specified 00 Exact address compare in IAC1 and less than the value specified in IAC2. IAC3 debug events can occur only if the address of the instruction fetch is equal to If IAC1USIAC2US or IAC1ERIAC2ER, the value specified in IAC3. results are boundedly undefined. IAC4 debug events can occur only if the address of the instruction fetch is equal to 11 Exclusive address range compare the value specified in IAC4. IAC1 and IAC2 debug events can occur only if the address of the instruction fetch is 01 Address bit match less than the value specified in IAC1 or is greater than or equal to the value specified IAC3 and IAC4 debug events can occur in IAC2. only if the address of the data storage access, ANDed with the contents of IAC4 Chapter 8. Debug Facilities 615 Version 2.04 are equal to the contents of IAC3, also 11 DAC1 debug events are based on effec- ANDed with the contents of IAC4. 
tive addresses and can occur only if MSRDS=1 If IAC3USIAC4US or IAC3ERIAC4ER, results are boundedly undefined. 36:37 Data Address Compare 2 User/Supervisor Mode (DAC2US) 10 Inclusive address range compare 00 DAC2 debug events can occur 01 Reserved IAC3 and IAC4 debug events can occur 10 DAC2 debug events can occur only if only if the address of the instruction fetch is MSRPR=0 greater than or equal to the value specified 11 DAC2 debug events can occur only if in IAC3 and less than the value specified in MSRPR=1 IAC4. 38:39 Data Address Compare 2 Effective/Real If IAC3USIAC4US or IAC3ERIAC4ER, Mode (DAC2ER) results are boundedly undefined. 00 DAC2 debug events are based on effec- 11 Exclusive address range compare tive addresses 01 DAC2 debug events are based on real IAC3 and IAC4 debug events can occur addresses only if the address of the instruction fetch is 10 DAC2 debug events are based on effec- less than the value specified in IAC3 or is tive addresses and can occur only if greater than or equal to the value specified MSRDS=0 in IAC4. 11 DAC2 debug events are based on effec- If IAC3USIAC4US or IAC3ERIAC4ER, tive addresses and can occur only if results are boundedly undefined. MSRDS=1 58:63 Reserved 40:41 Data Address Compare 1/2 Mode (DAC12M) 8.5.1.3 Debug Control Register 2 00 Exact address compare (DCBR2) DAC1 debug events can occur only if the address of the data storage access is equal The contents of the DCBR2 can be copied into bits to the value specified in DAC1. 32:63 register RT using mfspr RT,DBCR2, setting bits 0:31 of register RT to 0. The contents of bits 32:63 of a DAC2 debug events can occur only if the register RS can be written to the DCBR2 using address of the data storage access is equal mtspr DBCR2,RS. The bit definitions for DCBR2 are to the value specified in DAC2. shown below. 
01 Address bit match Bit(s) Description DAC1 and DAC2 debug events can occur 32:33 Data Address Compare 1 User/Supervisor only if the address of the data storage Mode (DAC1US) access, ANDed with the contents of DAC2 00 DAC1 debug events can occur are equal to the contents of DAC1, also 01 Reserved ANDed with the contents of DAC2. 10 DAC1 debug events can occur only if If DAC1USDAC2US or MSRPR=0 DAC1ERDAC2ER, results are boundedly 11 DAC1 debug events can occur only if undefined. MSRPR=1 34:35 Data Address Compare 1 Effective/Real Mode (DAC1ER) 00 DAC1 debug events are based on effec- 10 Inclusive address range compare tive addresses 01 DAC1 debug events are based on real DAC1 and DAC2 debug events can occur addresses only if the address of the data storage 10 DAC1 debug events are based on effec- access is greater than or equal to the value tive addresses and can occur only if specified in DAC1 and less than the value MSRDS=0 specified in DAC2. If DAC1US DAC2US or DAC1ER DAC2ER, results are boundedly undefined. 616 Power ISATM -- Book III-E Version 2.04 Specifies which bytes in the aligned data 11 Exclusive address range compare value being read or written by the storage access are compared to the corresponding DAC1 and DAC2 debug events can occur bytes in DVC2 only if the address of the data storage access is less than the value specified in DAC1 or is greater than or equal to the 8.5.2 Debug Status Register value specified in DAC2. The Debug Status Register (DBSR) is a 32-bit register If DAC1US DAC2US or DAC1ER and contains status on debug events and the most DAC2ER, results are boundedly undefined. recent processor reset. 42:43 Reserved The DBSR is set via hardware, and read and cleared 44:45 Data Value Compare 1 Mode (DVC1M) via software. The contents of the DBSR can be read into bits 32:63 of a register RT using the mfspr instruc- 00 DAC1 debug events can occur tion, setting bits 0:31 of RT to zero. 
Bits in the DBSR 01 DAC1 debug events can occur only when can be cleared using the mtspr instruction. Clearing is all bytes specified in DBCR2DVC1BE in the done by writing bits 32:63 of a register to the DBSR data value of the data storage access with a 1 in any bit position that is to be cleared and 0 in match their corresponding bytes in DVC1 all other bit positions. The write-data to the DBSR is not 10 DAC1 debug events can occur only when direct data, but a mask. A 1 causes the bit to be at least one of the bytes specified in cleared, and a 0 has no effect. DBCR2DVC1BE in the data value of the data storage access matches its corre- The bit definitions for the DBSR are shown below: sponding byte in DVC1 11 DAC1 debug events can occur only when Bit(s) Description all bytes specified in DBCR2DVC1BE within 32 Imprecise Debug Event (IDE) at least one of the halfwords of the data value of the data storage access matches Set to 1 if MSRDE=0 and a debug event their corresponding bytes in DVC1 causes its respective Debug Status Register bit to be set to 1. 46:47 Data Value Compare 2 Mode (DVC2M) 33 Unconditional Debug Event (UDE) 00 DAC2 debug events can occur 01 DAC2 debug events can occur only when Set to 1 if an Unconditional debug event all bytes specified in DBCR2DVC2BE in the occurred. See Section 8.4.8. data value of the data storage access 34:35 Most Recent Reset (MRR) match their corresponding bytes in DVC2 10 DAC2 debug events can occur only when Set to one of three values when a reset at least one of the bytes specified in occurs. These two bits are undefined at DBCR2DVC2BE in the data value of the power-up. 
data storage access matches its corre- sponding byte in DVC2 00 No reset occurred since these bits last 11 DAC2 debug events can occur only when cleared by software all bytes specified in DBCR2DVC2BE within 01 Implementation-dependent reset informa- at least one of the halfwords of the data tion value of the data storage access matches 10 Implementation-dependent reset informa- their corresponding bytes in DVC2 tion 11 Implementation-dependent reset informa- tion 48:55 Data Value Compare 1 Byte Enables (DVC1BE) 36 Instruction Complete Debug Event (ICMP) Specifies which bytes in the aligned data Set to 1 if an Instruction Completion debug value being read or written by the storage event occurred and DBCR0ICMP=1. See access are compared to the corresponding Section 8.4.5. bytes in DVC1. 37 Branch Taken Debug Event (BRT) 56:63 Data Value Compare 2 Byte Enables (DVC2BE) Set to 1 if a Branch Taken debug event occurred and DBCR0BRT=1. See Section 8.4.4. Chapter 8. Debug Facilities 617 Version 2.04 38 Interrupt Taken Debug Event (IRPT) 53:56 Implementation-dependent Set to 1 if an Interrupt Taken debug event 57 Critical Interrupt Taken Debug Event occurred and DBCR0IRPT=1. See (CIRPT) [Category: Embedded.Enhanced Section 8.4.6. Debug] A Critical Interrupt Taken Debug Event occurs 39 Trap Instruction Debug Event (TRAP) when DBCR0CIRPT=1 and a critical interrupt Set to 1 if a Trap Instruction debug event (any interrupt that uses the critical class, i.e. occurred and DBCR0TRAP=1. See uses CSRR0 and CSRR1) occurs. Section 8.4.3. 0 Critical interrupt taken debug events are 40 Instruction Address Compare 1 Debug disabled. Event (IAC1) 1 Critical interrupt taken debug events are enabled. Set to 1 if an IAC1 debug event occurred and DBCR0IAC1=1. See Section 8.4.1. 
58 Critical Interrupt Return Debug Event (CRET) [Category: Embedded.Enhanced 41 Instruction Address Compare 2 Debug Debug] Event (IAC2) A Critical Interrupt Return Debug Event Set to 1 if an IAC2 debug event occurred and occurs when DBCR0CRET=1 and a return DBCR0IAC2=1. See Section 8.4.1. from critical interrupt (an rfci instruction is executed) occurs. 42 Instruction Address Compare 3 Debug Event (IAC3) 0 Critical interrupt return debug events are disabled. Set to 1 if an IAC3 debug event occurred and 1 Critical interrupt return debug events are DBCR0IAC3=1. See Section 8.4.1. enabled. 43 Instruction Address Compare 4 Debug 59:63 Implementation-dependent Event (IAC4) Set to 1 if an IAC4 debug event occurred and DBCR0IAC4=1. See Section 8.4.1. 8.5.3 Instruction Address Com- 44 Data Address Compare 1 Read Debug pare Registers Event (DAC1R) The Instruction Address Compare Register 1, 2, 3, and Set to 1 if a read-type DAC1 debug event 4 (IAC1, IAC2, IAC3, and IAC4 respectively) are each occurred and DBCR0DAC1=0b10 or 64-bits, with bit 63 being reserved. DBCR0DAC1=0b11. See Section 8.4.2. A debug event may be enabled to occur upon an 45 Data Address Compare 1 Write Debug attempt to execute an instruction from an address Event (DAC1W) specified in either IAC1, IAC2, IAC3, or IAC4, inside or outside a range specified by IAC1 and IAC2 or, inside Set to 1 if a write-type DAC1 debug event or outside a range specified by IAC3 and IAC4, or to occurred and DBCR0DAC1=0b01 or blocks of addresses specified by the combination of the DBCR0DAC1=0b11. See Section 8.4.2. IAC1 and IAC2, or to blocks of addresses specified by 46 Data Address Compare 2 Read Debug the combination of the IAC3 and IAC4. 
Since all instruc- Event (DAC2R) tion addresses are required to be word-aligned, the two low-order bits of the Instruction Address Compare Reg- Set to 1 if a read-type DAC2 debug event isters are reserved and do not participate in the com- occurred and DBCR0DAC2=0b10 or parison to the instruction address (see Section 8.4.1 on DBCR0DAC2=0b11. See Section 8.4.2. page 607). The contents of the Instruction Address Compare i 47 Data Address Compare 2 Write Debug Register (where i={1,2,3, or 4}) can be read into regis- Event (DAC2W) ter RT using mfspr RT,IACi. The contents of register RS can be written to the Instruction Address Compare i Set to 1 if a write-type DAC2 debug event Register using mtspr IACi,RS. occurred and DBCR0DAC2=0b01 or DBCR0DAC2=0b11. See Section 8.4.2. 48 Return Debug Event (RET) 8.5.4 Data Address Compare Reg- Set to 1 if a Return debug event occurred and isters DBCR0RET=1. See Section 8.4.2. The Data Address Compare Register 1 and 2 (DAC1 49:52 Reserved and DAC2 respectively) are each 64-bits. 618 Power ISATM -- Book III-E Version 2.04 A debug event may be enabled to occur upon loads, stores, or cache operations to an address specified in either the DAC1 or DAC2, inside or outside a range specified by the DAC1 and DAC2, or to blocks of addresses specified by the combination of the DAC1 and DAC1 (see Section 8.4.2). The contents of the Data Address Compare i Register (where i={1 or 2}) can be read into register RT using mfspr RT,DACi. The contents of register RS can be written to the Data Address Compare i Register using mtspr DACi,RS. The contents of the DAC1 or DAC2 are compared to the address generated by a data storage access instruction. 8.5.5 Data Value Compare Regis- ters The Data Value Compare Register 1 and 2 (DVC1 and DVC2 respectively) are each 64-bits. A DAC1R, DAC1W, DAC2R, or DAC2W debug event may be enabled to occur upon loads or stores of a spe- cific data value specified in either or both of the DVC1 and DVC2. 
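
Whether a data storage access is a candidate for a DAC1 or DAC2 debug event in the first place depends on the address comparison selected by the DAC12M field of DBCR2 (Section 8.5.1.3). The following C fragment is a minimal sketch of those four comparison modes, not a description of any implementation. The function name, the enum, and the result flags are hypothetical names introduced for illustration. The sketch deliberately ignores the user/supervisor and effective/real qualifiers, the DBCR0 enables, the width of the access, and the data value comparison, and it reports only which events the comparison leaves possible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result flags: events the address comparison leaves possible. */
#define DAC1_CANDIDATE 0x1u
#define DAC2_CANDIDATE 0x2u

/* DAC12M encodings, following the DBCR2 bit definitions. */
enum dac12m_mode {
    DAC12M_EXACT     = 0, /* 0b00: exact address compare              */
    DAC12M_BIT_MATCH = 1, /* 0b01: address bit match, DAC2 is a mask  */
    DAC12M_INCLUSIVE = 2, /* 0b10: DAC1 <= addr < DAC2                */
    DAC12M_EXCLUSIVE = 3  /* 0b11: addr < DAC1 or addr >= DAC2        */
};

/* Illustrative helper (assumed name): address-only part of the DAC check. */
static unsigned dac12m_candidates(unsigned mode, uint64_t addr,
                                  uint64_t dac1, uint64_t dac2)
{
    const unsigned both = DAC1_CANDIDATE | DAC2_CANDIDATE;

    assert(mode <= 3u); /* DAC12M is a two-bit field */

    switch (mode) {
    case DAC12M_EXACT:
        /* Each register is compared on its own. */
        return (addr == dac1 ? DAC1_CANDIDATE : 0u) |
               (addr == dac2 ? DAC2_CANDIDATE : 0u);
    case DAC12M_BIT_MATCH:
        /* Only the address bits selected by DAC2 take part. */
        return ((addr & dac2) == (dac1 & dac2)) ? both : 0u;
    case DAC12M_INCLUSIVE:
        return (addr >= dac1 && addr < dac2) ? both : 0u;
    default: /* DAC12M_EXCLUSIVE */
        return (addr < dac1 || addr >= dac2) ? both : 0u;
    }
}
```

Note that both range modes treat DAC2 as an exclusive upper bound: an access at exactly the DAC2 address falls outside the inclusive range and inside the exclusive one. The IAC12M and IAC34M fields of DBCR1 define the same four comparisons for instruction fetch addresses.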
DBCR2[DVC1M] and DBCR2[DVC1BE] control how the contents of the DVC1 are compared with the value, and DBCR2[DVC2M] and DBCR2[DVC2BE] control how the contents of the DVC2 are compared with the value (see Section 8.4.2 and Section 8.5.1.3).

The contents of the Data Value Compare i Register (where i={1 or 2}) can be read into register RT using mfspr RT,DVCi. The contents of register RS can be written to the Data Value Compare i Register using mtspr DVCi,RS.

8.6 Debugger Notify Halt Instruction [Category: Embedded.Enhanced Debug]

The dnh instruction provides the means for the transfer of information between the processor and an implementation-dependent external debug facility. dnh also causes the processor to stop fetching and executing instructions.

Debugger Notify Halt XFX-form

dnh DUI,DUIS

    Field:      19    DUI   DUIS   198   /
    First bit:  0     6     11     21    31

    if enabled by implementation-dependent means then
        implementation-dependent register ← dui
        halt processor
    else
        illegal instruction exception

Execution of the dnh instruction causes the processor to stop fetching instructions and taking interrupts if execution of the instruction has been enabled. The contents of the DUI field are sent to the external debug facility to identify the reason for the halt.

If execution of the dnh instruction has not been previously enabled, executing the dnh instruction produces an Illegal Instruction exception. The means by which execution of the dnh instruction is enabled is implementation-dependent.

The current state of the processor debug facility, whether the processor is in IDM or EDM mode, has no effect on the execution of the dnh instruction.

The instruction is context synchronizing.

Programming Note

The DUIS field in the instruction may be used to pass information to an external debug facility.
After the dnh instruction has executed, the instruction itself can be read back by the Illegal Instruction Interrupt handler or the external debug facility if the contents of the DUIS field are of interest. If the processor entered the Illegal Instruction Interrupt handler, software can use SRR0 to obtain the address of the dnh instruction which caused the handler to be invoked. If the dnh instruction has been executed and the processor has stopped fetching instructions, the external debug facility can issue a mfspr NIA to obtain the address of the dnh instruction that was executed.

Special Registers Altered: None

Chapter 9. Processor Control [Category: Embedded.Processor Control]

9.1 Overview
9.2 Programming Model
9.2.1 Processor Message Handling and Filtering
9.2.1.1 Doorbell Message Filtering
9.2.1.2 Doorbell Critical Message Filtering
9.3 Processor Control Instructions

9.1 Overview

The Processor Control facility provides a mechanism for processors within a coherence domain to send messages to all devices in the coherence domain. The facility provides a mechanism for sending interrupts to processors that does not depend on the interrupt controller, and allows message filtering by the processors that receive the message.

The Processor Control facility is also useful for sending messages to a device that provides specialized services such as secure boot operations controlled by a security device.

The Processor Control facility defines how processors send messages and what actions processors take on the receipt of a message. The actions taken by devices other than processors are not defined.

9.2 Programming Model

Processors initiate a message by executing the msgsnd instruction and specifying a message type and message payload in a general purpose register. Sending a message causes the message to be sent to all the devices, including the sending processor, in the coherence domain in a reliable manner.

Each device receives all messages that are sent. The actions that a device takes are dependent on the message type and payload. There are no restrictions on what messages a processor can send.

To provide inter-processor interrupt capability, two message types are defined, Processor Doorbell and Processor Doorbell Critical. A Processor Doorbell [Critical] message causes an interrupt to occur on processors when the message is received and the processor determines through examination of the payload that the message should be accepted. The examination of the payload for this purpose is termed filtering. The acceptance of a Processor Doorbell [Critical] message causes an exception to be generated on the accepting processor.

Processors accept and filter messages defined in Section 9.2.1. Processors may also accept other implementation-dependent defined messages.

9.2.1 Processor Message Handling and Filtering

Processors filter, accept, and handle message types defined as follows. The message type is specified in the message and is determined by the contents of register RB[32:36] used as the operand in the msgsnd instruction. The message type is interpreted as follows:

Value   Description

0       Doorbell Interrupt (DBELL)
        A Processor Doorbell exception is generated on the processor when the processor has filtered the message based on the payload and has determined that it should accept the message. A Processor Doorbell Interrupt occurs when no higher priority exception exists, a Processor Doorbell exception exists, and MSR[EE]=1.

1       Doorbell Critical Interrupt (DBELL_CRIT)
        A Processor Doorbell Critical exception is generated on the processor when the processor has filtered the message based on the payload and has determined that it should accept the message. A Processor Doorbell Critical Interrupt occurs when no higher priority exception exists, a Processor Doorbell Critical exception exists, and MSR[CE]=1.

Message types other than these and their associated actions are implementation-dependent.

9.2.1.1 Doorbell Message Filtering

A processor receiving a DBELL message type will filter the message and either ignore the message or accept the message and generate a Processor Doorbell exception based on the payload and the state of the processor at the time the message is received.

The payload is specified in the message and is determined by the contents of register RB[37:63] used as the operand in the msgsnd instruction. The payload bits are defined below.

Bit     Description

37      Broadcast (BRDCAST)
        The message is accepted by all processors regardless of the value of the PIR register and the value of PIRTAG.
        0  If the value of PIR and PIRTAG are equal a Processor Doorbell exception is generated.
        1  A Processor Doorbell exception is generated regardless of the value of PIRTAG and PIR.

38:41   Reserved

50:63   PIR Tag (PIRTAG)
        The contents of this field are compared with bits 50:63 of the PIR register.

If a DBELL message is received by a processor and either payload[BRDCAST]=1 or PIR[50:63]=payload[PIRTAG] then a Processor Doorbell exception is generated. The exception condition remains until a Processor Doorbell Interrupt is taken, or a msgclr instruction is executed on the receiving processor with a message type of DBELL. A change to any of the filtering criteria (i.e. changing the PIR register) will not clear a pending Processor Doorbell exception.

DBELL messages are not cumulative. That is, if a DBELL message is accepted and the interrupt is pended because MSR[EE]=0, further DBELL messages that would be accepted are ignored until the Processor Doorbell exception is cleared by taking the interrupt or cleared by executing a msgclr with a message type of DBELL on the receiving processor.

The temporal relationship between when a DBELL message is sent and when it is received in a given processor is not defined.

9.2.1.2 Doorbell Critical Message Filtering

A processor receiving a DBELL_CRIT message type will filter the message and either ignore the message or accept the message and generate a Processor Doorbell Critical exception based on the payload and the state of the processor at the time the message is received.

The payload is specified in the message and is determined by the contents of register RB[37:63] used as the operand in the msgsnd instruction. The payload bits are defined below.

Bit     Description

37      Broadcast (BRDCAST)
        The message is accepted by all processors regardless of the value of the PIR register and the value of PIRTAG.
        0  If the value of PIR and PIRTAG are equal a Processor Doorbell Critical exception is generated.
        1  A Processor Doorbell Critical exception is generated regardless of the value of PIRTAG and PIR.

38:41   Reserved

50:63   PIR Tag (PIRTAG)
        The contents of this field are compared with bits 50:63 of the PIR register.

If a DBELL_CRIT message is received by a processor and either payload[BRDCAST]=1 or PIR[50:63]=payload[PIRTAG] then a Processor Doorbell Critical exception is generated. The exception condition remains until a Processor Doorbell Critical Interrupt is taken, or a msgclr instruction is executed on the receiving processor with a message type of DBELL_CRIT. A change to any of the filtering criteria (i.e. changing the PIR register) will not clear a pending Processor Doorbell Critical exception.

DBELL_CRIT messages are not cumulative. That is, if a DBELL_CRIT message is accepted and the interrupt is pended because MSR[CE]=0, further DBELL_CRIT messages that would be accepted are ignored until the Processor Doorbell Critical exception is cleared by taking the interrupt or cleared by executing a msgclr with a message type of DBELL_CRIT on the receiving processor.

The temporal relationship between when a DBELL_CRIT message is sent and when it is received in a given processor is not defined.

9.3 Processor Control Instructions

msgsnd and msgclr instructions are provided for sending and clearing messages to processors and other devices in the coherence domain. These instructions are privileged.

In the instruction descriptions the statement "this instruction is treated as a Store" means that the instruction is treated as a Store with respect to the storage access ordering mechanism caused by memory barriers in Section 1.7.1 of Book II.

Message Clear X-form

msgclr RB

    Field:      31    ///   ///   RB    238   /
    First bit:  0     6     11    16    21    31

    msgtype ← GPR(RB)[32:36]
    clear_received_message(msgtype)

msgclr clears a message of msgtype previously accepted by the processor executing the msgclr. msgtype is defined by the contents of RB[32:36]. A message is said to be cleared when a pending exception generated by an accepted message has not yet taken its associated interrupt.

If a pending exception exists for msgtype that exception is cleared at the completion of the msgclr instruction.

Message Send X-form

msgsnd RB

    Field:      31    ///   ///   RB    206   /
    First bit:  0     6     11    16    21    31

    msgtype ← GPR(RB)[32:36]
    payload ← GPR(RB)[37:63]
    send_msg_to_coherence_domain(msgtype, payload)

msgsnd sends a message to all devices in the coherence domain. The message contains a type and a payload. The message type (msgtype) is defined by the contents of RB[32:36] and the message payload is defined by the contents of RB[37:63]. Message delivery is reliable and guaranteed. Each device may perform specific actions based on the message type and payload or may ignore messages. Consult the implementation
user's manual for specific actions taken based on mes- For processors, the types of messages that can be sage type and payload. cleared are defined in Section 9.2.1. For processors, actions taken on receipt of a message This instruction is privileged. are defined in Section 9.2.1. Special Registers Altered: For storage access ordering, msgsnd is treated as a None Store with respect to memory barriers. This instruction is privileged. Programming Note Execution of a msgclr instruction that clears a Special Registers Altered: pending exception when the associated interrupt is None masked because the interrupt enable (MSREE or MSRCE) is not set to 1 will always clear the pending exception (and thus the interrupt will not occur) if a subsequent instruction causes MSREE or MSRCE to be set to 1. Chapter 9. Processor Control [Category: Embedded.Processor Control] 623 Version 2.04 624 Power ISATM -- Book III-E Version 2.04 Chapter 10. Synchronization Requirements for Context Alterations Changing the contents of certain System Registers, the If a sequence of instructions contains context-altering contents of TLB entries, or the contents of other system instructions and contains no instructions that are resources that control the context in which a program affected by any of the context alterations, no software executes can have the side effect of altering the context synchronization is required within the sequence. in which data addresses and instruction addresses are interpreted, and in which instructions are executed and Programming Note data accesses are performed. For example, changing Sometimes advantage can be taken of the fact that certain bits in the MSR has the side effect of changing certain events, such as interrupts, and certain how instruction addresses are calculated. 
These side instructions that occur naturally in the program, effects need not occur in program order, and therefore such as an rfi, rfci, rfmci, or rfdi [Cate- may require explicit synchronization by software. (Pro- gory:Embeddd.Enhanced Debug] that returns from gram order is defined in Book II.) an interrupt handler, provide the required synchro- An instruction that alters the context in which data nization. addresses or instruction addresses are interpreted, or in which instructions are executed or data accesses are No software synchronization is required before or after performed, is called a context-altering instruction. This a context-altering instruction that is also context syn- chapter covers all the context-altering instructions. The chronizing (e.g., rfi, etc.) or when altering the MSR in software synchronization required for them is shown in most cases (see the tables). No software synchroniza- Table 5 (for data access) and Table 4 (for instruction tion is required before most of the other alterations fetch and execution). shown in Table 4, because all instructions preceding the context-altering instruction are fetched and The notation "CSI" in the tables means any context syn- decoded before the context-altering instruction is exe- chronizing instruction (e.g., sc, isync, rfi, rfci, rfmci, or cuted (the processor must determine whether any of rfdi [Category: Embedded. Enhanced Debug]). A con- these preceding instructions are context synchroniz- text synchronizing interrupt (i.e., any interrupt except ing). non-recoverable System Reset or non-recoverable Machine Check) can be used instead of a context syn- Unless otherwise stated, the material in this chapter chronizing instruction. If it is, phrases like "the synchro- assumes a uniprocessor environment. nizing instruction", below, should be interpreted as meaning the instruction at which the interrupt occurs. 
If no software synchronization is required before (after) a context-altering instruction, "the synchronizing instruc- tion before (after) the context-altering instruction" should be interpreted as meaning the context-altering instruction itself. The synchronizing instruction before the context-alter- ing instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alter- ation. The synchronizing instruction after the context- altering instruction ensures that all instructions after that synchronizing instruction are fetched and executed in the context established by the alteration. Instructions after the first synchronizing instruction, up to and including the second synchronizing instruction, may be fetched or executed in either context. Chapter 10. Synchronization Requirements for Context Alterations 625 Version 2.04 Instruction or Required Required Notes Instruction or Required Required Notes Event Before After Event Before After interrupt none none interrupt none none rfi none none rfi none none rfci none none rfci none none rfmci none none rfmci none none rfdi[Category:E.ED] none none rfdi[Category:E.ED] none none sc none none sc none none mtmsr (CM) none none mtmsr (CM) none CSI mtmsr (ICM) none CSI mtmsr (ICM) none none mtmsr (UCLE) none none mtmsr (PR) none CSI mtmsr (SPV) none none mtmsr (ME) none CSI 3 mtmsr (WE) -- -- 4 mtmsr (DS) none CSI mtmsr (CE) none none 5 mtspr (PID) CSI CSI mtmsr (EE) none none 5 mtspr (DBSR) -- -- 6 mtmsr (PR) none CSI mtspr --- --- 6 mtmsr (FP) none CSI (DBCR0,DBCR2) mtmsr (DE) none CSI mtspr -- -- 6 mtmsr (ME) none CSI 3 (DAC1,DAC2, mtmsr (FE0) none CSI DVC1,DVC2) mtmsr (FE1) none CSI tlbivax CSI CSI, or CSI 1,7 and sync mtmsr (IS) none CSI 2 tlbwe CSI CSI, or CSI 1,7 mtspr (DEC) none none 8 and sync mtspr (PID) none CSI 2 mtspr (IVPR) none none Table 5: Synchronization requirements for data access mtspr (DBSR) -- -- 6 mtspr -- -- 6 
Notes: (DBCR0,DBCR1) 1. There are additional software synchronization mtspr -- -- 6 requirements for this instruction in multiprocessor (IAC1,IAC2,IAC3, environments (e.g., it may be necessary to invali- IAC4) date one or more TLB entries on all processors in mtspr (IVORi) none none the multiprocessor system and to be able to deter- mtspr (TSR) none none 8 mine that the invalidations have completed and mtspr (TCR) none none 8 that all side effects of the invalidations have taken tlbivax none CSI, or 1,7 effect); it is also necessary to execute a tlbsync CSI and sync instruction. tlbwe none CSI, or 1,7 2. The alteration must not cause an implicit branch in CSI and sync real address space. Thus the real address of the wrtee none none 5 context-altering instruction and of each subse- wrteei none none 5 quent instruction, up to and including the next con- text synchronizing instruction, must be Table 4: Synchronization requirements for instruction independent of whether the alteration has taken fetch and/or execution effect. 3. A context synchronizing instruction is required after altering MSRME to ensure that the alteration takes effect for subsequent Machine Check inter- rupts, which may not be recoverable and therefore may not be context synchronizing. 4. Synchronization requirements for changing the Wait State Enable are implementation-dependent,. 5. The effect of changing MSREE or MSRCE is imme- diate. 626 Power ISATM -- Book III-E Version 2.04 If an mtmsr, wrtee, or wrteei instruction sets Programming Note MSREE to `0', an External Input, DEC or FIT inter- rupt does not occur after the instruction is exe- The following sequence illustrates why it is cuted. 
necessary, for data accesses, to ensure that all storage accesses due to instructions before If an mtmsr, wrtee, or wrteei instruction changes the tlbwe or tlbivax have completed to a point MSREE from `0' to `1' when an External Input, Dec- at which they have reported all exceptions they rementer, Fixed-Interval Timer, or higher priority will cause. Assume that valid TLB entries exist enabled exception exists, the corresponding inter- for the target storage location when the rupt occurs immediately after the mtmsr, wrtee, or sequence starts. wrteei is executed, and before the next instruction 1 A program issues a load or store to a is executed in the program that set MSREE to `1'. page. If an mtmsr instruction sets MSRCE to `0', a Criti- 1 The same program executes a tlbwe or cal Input or Watchdog Timer interrupt does not tlbivax that invalidates the corresponding occur after the instruction is executed. TLB entry. 1 The Load or Store instruction finally exe- If an mtmsr instruction changes MSRCE from `0' to cutes, and gets a TLB Miss exception. `1' when a Critical Input, Watchdog Timer or higher 1 The TLB Miss exception is semantically priority enabled exception exists, the correspond- incorrect. In order to prevent it, a context ing interrupt occurs immediately after the mtmsr is synchronizing instruction must be exe- executed, and before the next instruction is exe- cuted between steps 1 and 2. cuted in the program that set MSRCE to `1'. 6. Synchronization requirements for changing any of 8. The elapsed time between the Decrementer reach- the Debug Facility Registers are implementation- ing zero, or the transition of the selected Time dependent. Base bit for the Fixed-Interval Timer or the Watch- dog Timer, and the signalling of the Decrementer, 7. For data accesses, the context synchronizing Fixed-Interval Timer or the Watchdog Timer excep- instruction before the tlbwe or tlbivax instruction tion is not defined. 
ensures that all storage accesses due to preceding instructions have completed to a point at which they have reported all exceptions they will cause. The context synchronizing instruction after the tlbwe or tlbivax ensures that subsequent storage accesses (data and instruction) will use the updated value in the TLB entry(s) being affected. It does not ensure that all storage accesses previ- ously translated by the TLB entry(s) being updated have completed with respect to storage; if these completions must be ensured, the tlbwe or tlbivax must be followed by an sync instruction as well as by a context synchronizing instruction. Chapter 10. Synchronization Requirements for Context Alterations 627 Version 2.04 628 Power ISATM -- Book III-E Version 2.04 Appendix A. Implementation-Dependent Instructions This appendix documents architectural resources that may exercise reasonable flexibility in implementing are allocated for specific implementation-sensitive func- these functions, but that flexibility should be limited to tions which have scope-limited utility. Implementations that allowed in this appendix. A.1 Embedded Cache Initialization [Category: Embedded.Cache Ini- tialization] Data Cache Invalidate X-form Instruction Cache Invalidate X-form dci CT ici CT 31 / CT /// /// 454 / 31 / CT /// /// 966 / 0 6 7 11 16 21 31 0 6 7 11 16 21 31 If CT is not supported by the implementation, this If CT is not supported by the implementation, this instruction designates the primary data cache as the instruction designates the primary instruction cache as target data cache. the target instruction cache. If CT is supported by the implementation, let CT desig- If CT is supported by the implementation, let CT desig- nate either the primary data cache or another level of nate either the primary instruction cache or another the data cache hierarchy, as specified in Book II Sec- level of the instruction cache hierarchy, as specified in tion 3.2, as the target data cache. 
Book II Section 3.2, as the target instruction cache. The contents of the target data cache of the processor The contents of the target instruction cache of the pro- executing the dci instruction are invalidated. cessor executing the ici instruction are invalidated. Software must place a sync instruction before the dci Software must place a sync instruction before the ici to to guarantee all previous data storage accesses com- guarantee all previous instruction storage accesses plete before the dci is performed. complete before the ici is performed. Software must place a sync instruction after the dci to Software must place an isync instruction after the ici to guarantee that the dci completes before any subse- invalidate any instructions that may have already been quent data storage accesses are performed. fetched from the previous contents of the instruction cache after the isync. This instruction is privileged. This instruction is privileged. Special Registers Altered: None Special Registers Altered: None Extended Mnemonics: Extended Mnemonics: Extended mnemonic for Data Cache Invalidate Extended mnemonic for Instruction Cache Invalidate Extended: Equivalent to: dccci dci 0 Extended: Equivalent to: iccci ici 0 Appendix A. Implementation-Dependent Instructions 629 Version 2.04 A.2 Embedded Cache Debug Facility [Category: Embedded.Cache Debug] A.2.1 Embedded Cache Debug Registers A.2.1.1 Data Cache Debug Tag Register A.2.1.2 Data Cache Debug Tag Register High Low The Data Cache Debug Tag Register High (DCDBTRH) The Data Cache Debug Tag Register Low (DCDBTRL) is a 32-bit Special Purpose Register (SPRN=0x39D). is a 32-bit Special Purpose Register (SPRN=0x39C). Data Cache Debug Tag Register High is read using Data Cache Debug Tag Register Low is read using mfspr and is set by dcread. mfspr and is set by dcread. DCDBTRH DCDBTRL 32 63 32 63 Figure 25. Data Cache Debug Tag Register High Figure 26. 
Data Cache Debug Tag Register Low Programming Note Programming Note An example implementation of DCDBTRH could An example implementation of DCDBTRL could have the following content and format. have the following content and format. Bit(s) Description Bit(s) Description 32:55 Tag Real Address (TRA) 32:44 Reserved (TRA) Bits 0:23 of the lower 32 bits of the 36-bit 45 U bit parity (UPAR) real address associated with this cache block 46:47 Tag parity (TPAR) 56 Valid (V) 48:51 Data parity (DPAR) The valid indicator for the cache block (1 52:55 Modified (dirty) parity (MPAR) indicates valid) 56:59 Dirty Indicators (D) 57:59 Reserved The "dirty" (modified) indicators for each 60:63 Tag Extended Real Address (TERA) of the four doublewords in the cache block Upper 4 bits of the 36-bit real address 60 U0 Storage Attribute (U0) associated with this cache block The U0 storage attribute for the page Implementations may support different content and associated with this cache block format based on their cache implementation. 61 U1 Storage Attribute (U1) The U1 storage attribute for the page associated with this cache block 62 U2 Storage Attribute (U2) The U2 storage attribute for the page associated with this cache block 63 U3 Storage Attribute (U3) The U3 storage attribute for the page associated with this cache block Implementations may support different content and format based on their cache implementation. 630 Power ISATM -- Book III-E Version 2.04 A.2.1.3 Instruction Cache Debug Data A.2.1.5 Instruction Cache Debug Tag Register Register Low The Instruction Cache Debug Data Register (ICDBDR) The Instruction Cache Debug Tag Register Low (ICDB- is a read-only 32-bit Special Purpose Register TRL) is a 32-bit Special Purpose Register (SPRN=0x3D3). Instruction Cache Debug Data Regis- (SPRN=0x39E). Instruction Cache Debug Tag Register ter can be read using mfspr and is set by icread. Low is read using mfspr and is set by icread. ICDBDR ICDBTRL 32 63 32 63 Figure 27. 
Instruction Cache Debug Data Register Figure 29. Instruction Cache Debug Tag Register Low A.2.1.4 Instruction Cache Debug Tag Programming Note Register High An example implementation of ICDBTRL could The Instruction Cache Debug Tag Register High (ICDB- have the following content and format. TRH) is a 32-bit Special Purpose Register (SPRN=0x39F). Instruction Cache Debug Tag Register Bit(s) Description High is read using mfspr and is set by icread. 32:53 Reserved ICDBTRH 54 Translation Space (TS) 32 63 The address space portion of the virtual address associated with this cache block. Figure 28. Instruction Cache Debug Tag Register High 55 Translation ID Disable (TD) TID Disable field for the memory page Programming Note associated with this cache block An example implementation of ICDBTRH could 56:63 Translation ID (TID) have the following content and format. TID field portion of the virtual address associated with this cache block Bit(s) Description Other implementations may support different con- 32:55 Tag Effective Address (TEA) tent and format based on their cache implementa- Bits 0:23 of the 32-bit effective address tion. associated with this cache block 56 Valid (V) The valid indicator for the cache block (1 indicates valid) 57:58 Tag parity (TPAR) 59 Instruction Data parity (DPAR) 60:63 Reserved Implementations may support different content and format based on their cache implementation. Appendix A. 
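As a hedged illustration of how a debug tool might interpret the example DCDBTRH layout given in the Programming Note of A.2.1.1, the following C sketch extracts the TRA, V, and TERA fields and reassembles the 36-bit real address of the cache block. The helper names (ppc_field, decode_dcdbtrh_example) and the struct are hypothetical, and the field positions apply only to that example layout; an actual implementation may define different content and format.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits first:last using Power ISA numbering, where bit 0 is the
 * most significant bit of a 64-bit value and bit 63 the least significant.
 * A 32-bit SPR occupies bits 32:63. (Hypothetical helper.) */
static uint64_t ppc_field(uint64_t reg, unsigned first, unsigned last)
{
    assert(first <= last && last <= 63);
    unsigned width = last - first + 1;
    uint64_t mask = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    return (reg >> (63 - last)) & mask;
}

/* Decoded view of the *example* DCDBTRH format only. */
struct dtag_high {
    uint64_t real_addr; /* 36-bit real address of the block, low 8 bits zero */
    int      valid;     /* V bit: 1 indicates a valid block */
};

static struct dtag_high decode_dcdbtrh_example(uint32_t dcdbtrh)
{
    uint64_t tra  = ppc_field(dcdbtrh, 32, 55); /* bits 0:23 of the lower 32 address bits */
    uint64_t tera = ppc_field(dcdbtrh, 60, 63); /* upper 4 bits of the 36-bit address */
    struct dtag_high t;
    t.valid = (int)ppc_field(dcdbtrh, 56, 56);
    /* TRA holds the top 24 bits of the low word, so shift it up by 8;
     * TERA sits above the low 32 bits. */
    t.real_addr = (tera << 32) | (tra << 8);
    return t;
}
```

For instance, under these assumptions a tag value of 0xABCDEF8F would decode to a valid block whose real address begins at 0xFABCDEF00.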
A.2.2 Embedded Cache Debug Instructions

Data Cache Read X-form

dcread RT,RA,RB

    | 31 | RT | RA | RB | 486 | / |
    0    6    11   16   21    31

[Alternative Encoding]

    | 31 | RT | RA | RB | 326 | / |
    0    6    11   16   21    31

    if RA = 0 then b <- 0
    else           b <- (RA)
    EA <- b + (RB)
    C <- log2(cache size)
    B <- log2(cache block size)
    IDX <- EA[64-C:63-B]
    WD <- EA[64-B:61]
    RT[0:31] <- undefined
    RT[32:63] <- (data cache data)[IDX][WD×32:WD×32+31]
    DCDBTRH <- (data cache tag high)[IDX]
    DCDBTRL <- (data cache tag low)[IDX]

Let the effective address (EA) be the sum of the contents of register RA, or 0 if RA is equal to 0, and the contents of register RB.

Let C = log2(cache size in bytes).

Let B = log2(cache block size in bytes).

EA[64-C:63-B] selects one of the 2^(C-B) data cache blocks.

EA[64-B:61] selects one of the data words in the selected data cache block.

The selected word in the selected data cache block is placed into register RT.

The contents of the data cache directory entry associated with the selected data cache block are placed into DCDBTRH and DCDBTRL (see Figure 25 and Figure 26).

dcread requires software to guarantee execution synchronization before subsequent mfspr instructions can read the results of the dcread instruction into GPRs. In order to guarantee that the mfspr instructions obtain the results of the dcread instruction, a sequence such as the following must be used:

    msync                    # ensure that all previous cache operations have completed
    dcread regT,regA,regB    # read cache information
    isync                    # ensure dcread completes before attempting to read results
    mfspr  regD,dcdbtrh      # move high portion of tag into GPR D
    mfspr  regE,dcdbtrl      # move low portion of tag into GPR E

This instruction is privileged.

Special Registers Altered:
    DCDBTRH DCDBTRL

Programming Note
    dcread can be used by a debug tool to determine the contents of the data cache, without knowing the specific addresses of the blocks which are currently contained within the cache.

Programming Note
    Execution of dcread before the data cache has completed all cache operations associated with previously executed instructions (such as block fills and block flushes) is undefined.

Instruction Cache Read X-form

icread RA,RB

    | 31 | /// | RA | RB | 998 | / |
    0    6     11   16   21    31

    if RA = 0 then b <- 0
    else           b <- (RA)
    EA <- b + (RB)
    C <- log2(cache size)
    B <- log2(cache block size)
    IDX <- EA[64-C:63-B]
    WD <- EA[64-B:61]
    ICDBDR <- (instruction cache data)[IDX][WD×32:WD×32+31]
    ICDBTRH <- (instruction cache tag high)[IDX]
    ICDBTRL <- (instruction cache tag low)[IDX]

Let the effective address (EA) be the sum of the contents of register RA, or 0 if RA is equal to 0, and the contents of register RB.

Let C = log2(cache size in bytes).

Let B = log2(cache block size in bytes).

EA[64-C:63-B] selects one of the 2^(C-B) instruction cache blocks.

EA[64-B:61] selects one of the data words in the selected instruction cache block.

The selected word in the selected instruction cache block is placed into ICDBDR.

The contents of the instruction cache directory entry associated with the selected cache block are placed into ICDBTRH and ICDBTRL (see Figure 28 and Figure 29).

Programming Note
    icread can be used by a debug tool to determine the contents of the instruction cache, without knowing the specific addresses of the blocks which are currently contained within the cache.

icread requires software to guarantee execution synchronization before subsequent mfspr instructions can read the results of the icread instruction into GPRs.
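The block and word selection that the dcread and icread RTL performs (the IDX and WD values) can be sketched in C. This is a minimal model, assuming a 64-bit effective address and Power ISA bit numbering (bit 0 most significant), so that EA[64-C:63-B] is a field of C-B bits starting B bits above the least significant bit, and EA[64-B:61] is a field of B-2 bits starting 2 bits above it. The type and function names are illustrative only and are not part of the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Result of the selection step: which cache block, and which 32-bit word
 * within that block. (Hypothetical type for this sketch.) */
struct cache_sel {
    uint64_t idx; /* IDX = EA[64-C:63-B] */
    uint64_t wd;  /* WD  = EA[64-B:61]   */
};

/* EA = (RA|0) + (RB): register number 0 in the RA position contributes 0. */
static uint64_t cache_read_ea(const uint64_t gpr[32], unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);
    uint64_t b = (ra == 0) ? 0 : gpr[ra];
    return b + gpr[rb];
}

/* c = log2(cache size in bytes), b = log2(cache block size in bytes). */
static struct cache_sel cache_read_select(uint64_t ea, unsigned c, unsigned b)
{
    assert(b >= 2 && c >= b && c < 64);
    struct cache_sel s;
    s.idx = (ea >> b) & ((UINT64_C(1) << (c - b)) - 1); /* one of 2^(C-B) blocks */
    s.wd  = (ea >> 2) & ((UINT64_C(1) << (b - 2)) - 1); /* word within the block */
    return s;
}
```

As a worked case under these assumptions, a 32 KB cache with 32-byte blocks gives C=15 and B=5; an effective address of 0x1234 then selects block 145 and word 5, since 0x1234 lies 20 bytes into the block starting at 145×32.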
In order to guarantee that the mfspr instructions obtain the results of the icread instruction, a sequence such as the following must be used:

    icread    regA,regB    # read cache information
    isync                  # ensure icread completes before attempting to read results
    mficdbdr  regC         # move instruction information into GPR C
    mficdbtrh regD         # move high portion of tag into GPR D
    mficdbtrl regE         # move low portion of tag into GPR E

This instruction is privileged.

Special Registers Altered:
    ICDBDR ICDBTRH ICDBTRL

Appendix B. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided for certain instructions. This appendix defines extended mnemonics and symbols related to instructions defined in Book III.

Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

B.1 Move To/From Special Purpose Register Mnemonics

This section defines extended mnemonics for the mtspr and mfspr instructions, including the Special Purpose Registers (SPRs) defined in Book I and certain privileged SPRs, and for the Move From Time Base instruction defined in Book II.

The mtspr and mfspr instructions specify an SPR as a numeric operand; extended mnemonics are provided that represent the SPR in the mnemonic rather than requiring it to be coded as an operand. Similar extended mnemonics are provided for the Move From Time Base instruction, which specifies the portion of the Time Base as a numeric operand.

Note: mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB).

Table 6: Extended mnemonics for moving to/from an SPR

                                      Move To SPR                      Move From SPR
Special Purpose Register              Extended      Equivalent to      Extended      Equivalent to
Fixed-Point Exception Register        mtxer Rx      mtspr 1,Rx         mfxer Rx      mfspr Rx,1
Link Register                         mtlr Rx       mtspr 8,Rx         mflr Rx       mfspr Rx,8
Count Register                        mtctr Rx      mtspr 9,Rx         mfctr Rx      mfspr Rx,9
Decrementer                           mtdec Rx      mtspr 22,Rx        mfdec Rx      mfspr Rx,22
Save/Restore Register 0               mtsrr0 Rx     mtspr 26,Rx        mfsrr0 Rx     mfspr Rx,26
Save/Restore Register 1               mtsrr1 Rx     mtspr 27,Rx        mfsrr1 Rx     mfspr Rx,27
Special Purpose Registers G0 to G3    mtsprg n,Rx   mtspr 272+n,Rx     mfsprg Rx,n   mfspr Rx,272+n
Time Base [Lower]                     mttbl Rx      mtspr 284,Rx       mftb Rx       mfspr Rx,268
Time Base Upper                       mttbu Rx      mtspr 285,Rx       mftbu Rx      mfspr Rx,269
Processor Version Register            -             -                  mfpvr Rx      mfspr Rx,287

Appendix C. Guidelines for 64-bit Implementations in 32-bit Mode and 32-bit Implementations

C.1 Hardware Guidelines

C.1.1 64-bit Specific Instructions

The instructions in the Category: 64-Bit are considered restricted only to 64-bit processing. A 32-bit implementation need not implement the group; likewise, the 32-bit applications will not utilize any of these instructions. All other instructions shall either be supported directly by the implementation, or sufficient infrastructure will be provided to enable software emulation of the instructions. A 64-bit implementation that is executing in 32-bit mode may choose to take an Unimplemented Instruction Exception when these 64-bit specific instructions are executed.

C.1.2 Registers on 32-bit Implementations

The Power ISA provides 32-bit and 64-bit registers. All 32-bit registers shall be supported as defined in the specification except the MSR. The MSR shall be supported as defined in the specification except that bits 32:33 (CM and ICM) are treated as reserved bits. Only bits 32:63 of the 64-bit registers are required to be implemented in hardware in a 32-bit implementation except for the 64-bit FPRs. Such 64-bit registers include the LR, the CTR, the XER, the 32 GPRs, SRR0 and CSRR0.

Likewise, other than floating-point instructions, all instructions which are defined to return a 64-bit result shall return only bits 32:63 of the result on a 32-bit implementation.

C.1.3 Addressing on 32-bit Implementations

Only bits 32:63 of the 64-bit instruction and data storage effective addresses need to be calculated and presented to main storage. Given that the only branch and data storage access instructions that are not included in Section C.1.1 are defined to prepend 32 0s to bits 32:63 of the effective address computation, a 32-bit implementation can simply bypass the prepending of the 32 0s when implementing these instructions. For Branch to Link Register and Branch to Count Register instructions, given the LR and CTR are implemented only as 32-bit registers, only concatenating 2 0s to the right of bits 32:61 of these registers is necessary to form the 32-bit branch target address.

For next sequential instruction address computation, the behavior is the same as for 64-bit implementations in 32-bit mode.

C.1.4 TLB Fields on 32-bit Implementations

32-bit implementations should support bits 32:53 of the Effective Page Number (EPN) field in the TLB. This size provides support for a 32-bit effective address, which Power ISA ABIs may have come to expect to be available. 32-bit implementations may support greater than 32-bit real addresses by supporting more than bits 32:53 of the Real Page Number (RPN) field in the TLB.

C.2 32-bit Software Guidelines

C.2.1 32-bit Instruction Selection

Any software that uses any of the instructions listed in Category: 64-Bit shall be considered 64-bit software, and correct execution cannot be guaranteed on 32-bit implementations. Generally speaking, 32-bit software should avoid using any instruction or instructions that depend on any particular setting of bits 0:31 of any 64-bit application-accessible system register, including General Purpose Registers, for producing the correct 32-bit results. Context switching may or may not preserve the upper 32 bits of application-accessible 64-bit system registers and insertion of arbitrary settings of those upper 32 bits at arbitrary times during the execution of the 32-bit application must not affect the final result.

Appendix D. Type FSL Storage Control [Category: Embedded.MMU Type FSL]

D.1 Type FSL Storage Control Overview

The Embedded category provides two different memory management and TLB programming models from which an implementation may choose. Both models use the same definition of the general contents of a Translation Lookaside Buffer (TLB) entry, but differ on what methods and resources are used to manipulate the TLB itself. The programming model presented here is called Type FSL and it defines functions and structures that are visible to software. These are divided into the following areas:

- The TLB itself. The TLB consists of one or more structures called TLB arrays each of which may have differing characteristics.
- The address translation mechanism.
- Methods and effects of changing and manipulating TLB arrays.
- Configuration information available to the operating system that describes the structure and form of the TLB arrays and translation mechanism.

The TLB structure and the methods of performing translations are called the Memory Management Unit (MMU).

The programming model for reading and writing TLBs is software managed. Hardware page table formats are not defined and software is free to choose any form in which to hold information about address translation. Address translation is accomplished through a set of TLB arrays, PID registers, and address space identifiers from the MSR, all of which are software managed. TLB entries are used to translate both instruction and data memory references providing a unified memory management model.

D.2 Type FSL Storage Control Registers

D.2.1 Process ID Registers (PIDn)

Process ID Registers are used by system software to specify which TLB entries are used by the processor to accomplish address translation for loads, stores, and instruction fetches. Section 4.7.1.1 defines the PID register. The PID register is synonymous with PID0. In addition to PID0, 2 additional PID registers, PID1 and PID2, are defined. An implementation may choose to provide any number of PIDs up to a maximum of 3. The number of PIDs implemented is indicated by the value of MMUCFG[NPIDS] and the number of bits implemented in each PID register is indicated by the value of MMUCFG[PIDSIZE]. PID values are used to construct virtual addresses for accessing memory.

    | PIDn |
    32    63

Figure 30. Process ID Register (PID0-PID2)

Bit    Description
32:49  Reserved
50:63  Process ID
       Identifies the process

Programming Note
    The suggested software convention for PID usage is to use PID0 to denote private mappings for a process and to use other PIDs to handle mappings that may be common to multiple processes. This method allows for processes sharing address space to also share TLB entries if the shared address space is mapped at the same virtual address in each process.

D.2.2 Translation Lookaside Buffer

The MMU contains up to four TLB arrays. TLB arrays are on-chip storage areas for holding TLB entries. A TLB entry contains effective to real address mappings for loads, stores, and instruction fetches. A TLB array contains zero or more TLB entries. Each of the TLB entries has specific fields that can be accessed using the corresponding fields in the MMU Assist Registers (see Section D.2.4). Each TLB array that is implemented has a configuration register (TLBnCFG) associated with it describing the size and attributes of the TLB entries in that array (see Section D.2.5.2).

A TLB entry contains the fields described in Section 4.7.1.2 as well as these additional fields:

Field  Description
IPROT  Invalidation protection. This entry is protected from all TLB invalidation mechanisms except the explicit writing of a 0 to the V bit.
ACM    The Alternate Coherency Mode (ACM) attribute allows an implementation to employ more than a single coherency method. This allows for a processor to participate in multiple coherency protocols. If the M attribute (Memory Coherence Required) is not set for a page (M=0), the page has no coherency associated with it and the ACM attribute is ignored. If the M attribute is set to 1 for a page (M=1), the ACM attribute is used to determine the coherence domain (or protocol) used. The values for ACM are implementation-dependent.

D.2.3 Address Space Identifiers

The address space identifier is called the AS bit. Thus there are two possible address spaces, 0 and 1. The value of the AS bit (see Section 4.7.2, Figure 8) is determined by the type of translation performed and from the contents of the MSR when an address is translated. If the type of translation performed is an instruction fetch, the value of the AS bit is taken from the contents of MSR[IS]. If the type of translation performed is a load, store, or other data translation, including target addresses of software initiated instruction fetch hints and locks, the value of the AS bit is taken from the contents of MSR[DS].

Programming Note
    While system software is free to use address space bits as it sees fit, it should be noted that on interrupt, the MSR[IS] and MSR[DS] bits are set to 0. This encourages software to use address space 0 for system software and address space 1 for user software.

instructions. Execution of a tlbre instruction causes the TLB entry specified by MAS0[TLBSEL], MAS0[ESEL], and MAS2[EPN] to be copied to the MAS registers. Conversely, execution of a tlbwe instruction causes the TLB entry specified by MAS0[TLBSEL], MAS0[ESEL], and MAS2[EPN] to be written with contents of the MAS registers. MAS registers may also be updated by hardware on the occurrence of an Instruction or Data TLB Error interrupt or as the result of a tlbsx instruction.

All MAS registers are privileged. All MAS registers with the exception of MAS7 must be implemented. MAS7 is not required to be implemented if the processor supports 32 bits or less of real address.

Processors are only required to implement the necessary bits of any multi-bit field in a MAS register such that only the resources supplied by the processor are represented. Any non-implemented bits in a field should have no effect when writing and should always read as zero. For example, a processor that implements only 2 TLB arrays will likely only implement the lower-order bit of the MAS0[TLBSEL] field.

D.2.4.1 MAS0 Register

The MAS0 register contains fields for identifying and selecting a TLB entry.

    | MAS0 |
    32    63

Figure 31. MAS0 register

These bits are interpreted as follows:

Bit    Description
32:33  Reserved
34:35  TLB Select (TLBSEL)
       Selects TLB for access.
       00  TLB0
       01  TLB1
       10  TLB2
       11  TLB3
36:47  Entry Select (ESEL)
       Identifies an entry in the selected array to be used for tlbwe and tlbre. Valid values for ESEL are from 0 to TLBnCFG[ASSOC] - 1. That is, ESEL selects the entry in the TLB array from the set of entries which can be used for translating addresses with the EPN specified by MAS2[EPN]. For fully-associative TLB arrays, ESEL ranges from 0 to TLBnCFG[NENTRY] - 1. ESEL is also updated on TLB error exceptions (misses), and tlbsx hit and miss cases.
D.2.4 MMU Assist Registers 48:51 Reserved The MMU Assist Registers (MAS) are used to transfer 52:63 Next Victim (NV) data to and from the TLB arrays. MAS registers can be NV is a hint to software to identify the next vic- read and written by software using mfspr and mtspr tim to be targeted for a TLB miss replacement 640 Power ISATM -- Book III-E Version 2.04 operation for those TLBs that support the NV 56:63 Reserved field. If the TLB selected by MAS0TLBSEL does not support the NV field, then this field is undefined. The computation of this field is D.2.4.3 MAS2 Register implementation-dependent. NV is updated on The MAS2 register is a 64-bit register in 64-bit mode TLB error exceptions (misses), tlbsx hit and and a 32-bit register in 32-bit mode. The register con- miss cases as shown in Table 7, and on exe- tains fields for specifying the effective page address cution of tlbre if the TLB array being accessed and the storage attributes for a TLB entry. supports the NV field. When NV is updated by a supported TLB array, the NV field will always MAS2 present a value that can be used in the 0 63 MAS0ESEL field. Figure 33. MAS2 register D.2.4.2 MAS1 Register These bits are interpreted as follows: The MAS1 register contains fields for selecting a TLB Bit Description entry during translation. 0:51 Effective Page Number (EPN) Depending on page size, only the bits associ- MAS1 ated with a page boundary are valid. Bits that 32 63 represent offsets within a page are ignored and should be zero. EPN0:31 are accessible Figure 32. MAS1 register only in 64-bit implementations as the upper 32 These bits are interpreted as follows: bits of the effective address of the page. Bit Definition 52:55 Reserved 32 TLB Valid Bit (V) 56:57 Alternate Coherency Mode (ACM) The ACM attribute allows an implementation 0 This TLB entry is invalid. to employ more than a single coherency 1 This TLB entry is valid. method. 
This allows for a processor to partici- 33 Invalidate Protect (IPROT) pate in multiple coherency protocols. If the M Indicates this TLB entry is protected from attribute (Memory Coherence Required) is not invalidate operations due to execution of set for a page (M=0), the page has no coher- tlbivax, tlbivax invalidations from another ency associated with it and the ACM attribute processor, or invalidate all operations. IPROT is ignored. If the M attribute is set to 1 for a is only implemented for TLB entries in TLB page (M=1), the ACM attribute is used to arrays where TLBnCFGIPROT is indicated. determine the coherence domain (or protocol) used. The values for ACM are implementa- 0 Entry is not protected from invalidation tion-dependent. 1 Entry is protected from invalidation. 34:47 Translation Identity (TID) Programming Note During translation, TID is compared with the Some previous implementations may current process IDs (PIDs) to select a TLB have a storage bit in the bit 57 position entry. A TID value of 0 defines an entry as glo- labeled as X0. bal and matches with all process IDs. 48:50 Reserved 58 VLE Mode (VLE) [Category: VLE] 51 Translation Space (TS) Identifies pages which contain instructions to During translation, TS is compared with AS be decoded as VLE instructions (see Chapter (the IS or DS fields of the MSR depending on 1 of Book VLE). Setting the VLE attribute to 1 the type of access) to select a TLB entry. and setting the E attribute to 1 is considered a 52:55 Translation Size (TSIZE) programming error and an attempt to fetch TSIZE defines the page size of the TLB entry. instructions from a page so marked produces For TLB arrays that contain fixed-size TLB an Instruction Storage Interrupt Byte Ordering entries, this field is ignored. For variable page Exception and sets ESRBO. size TLB arrays, the page size is 0 Instructions fetched from the page are 4TSIZE Kbytes. 
TSIZE must be between decoded and executed as non-VLE TLBnCFGMINSIZE and TLBnCFGMAXSIZE. instructions. Encodings for page size are defined in Section 4.7.1.2. Appendix D. Type FSL Storage Control [Category: Embedded.MMU Type 641 Version 2.04 1 Instructions fetched from the page are 52:53 Reserved decoded and executed as VLE instruc- 54:57 User Bits (U0:U3) tions. These bits are associated with a TLB entry and can be used by system software. For Programming Note example, these bits may be used to hold infor- Some previous implementations may mation useful to a page scanning algorithm or have a storage bit in this position labeled be used to mark more abstract page as X1. Software should not use the pres- attributes. ence of this bit (the ability to set to 1 and read a 1) to determine if the implementa- 58:63 Permission Bits (UX, SX, UW, SW, UR, SR). tion supports the VLE. User and supervisor execute, write, and read permission bits. The effect of the Permission 59 Write Through (W) Bits are defined in Section 4.7.1.2. 0 This page is not Write-Through Required storage. D.2.4.5 MAS4 Register 1 This page is Write-Through Required stor- The MAS4 register contains fields for specifying default age. information to be pre-loaded on certain MMU related 60 Caching Inhibited (I) exceptions. See Section D.4.5 for more information. 0 This page is not Caching Inhibited stor- MAS4 age. 32 63 1 This page is Caching Inhibited storage 61 Memory Coherence Required (M) Figure 35. MAS4 register 0 This page is not Memory Coherence The MAS4 fields are described below. Required storage. Bit Description 1 This page is Memory Coherence Required storage. 32:33 Reserved 62 Guarded (G) 34:35 TLBSEL Default Value (TLBSELD) Specifies the default value loaded in 0 This page is not Guarded storage. MAS0TLBSEL on a TLB miss exception. 1 This page is Guarded storage. 
36:43 Reserved 63 Endianness (E) 44:47 TID Default Selection Value (TIDSELD) 0 The page is accessed in Big-Endian byte Specifies which of the current PID registers order. should be used to load the MAS1TID field on a 1 The page is accessed in Little-Endian byte TLB miss exception. order. D.2.4.4 MAS3 Register The PID registers are addressed as follows: 0000 = PID0 (PID) The MAS3 register contains fields for specifying the 0001 = PID1 real page address, user defined attributes, and the per- 0010 = PID2 mission attributes for a TLB entry. A value that references a non-implemented MAS3 PID register causes a value of 0 to be placed 32 63 in MAS1TID. Figure 34. MAS3 register 48:51 Reserved 52:55 Default TSIZE Value (TSIZED) These bits are interpreted as follows: Specifies the default value loaded into Bit Description MAS1TSIZE on a TLB miss exception. 32:51 Real Page Number (bits 32:51) (RPNL or 56:57 Default ACM Value (ACMD) RPN32:51) Specifies the default value loaded into Depending on page size, only the bits associ- MAS2ACM on a TLB miss exception. ated with a page boundary are valid. Bits that 58 Default VLE Value (VLED) represent offsets within a page are ignored Specifies the default value loaded into and should be zero. RPN0:31 are accessed MAS2VLE on a TLB miss exception. through MAS7. 642 Power ISATM -- Book III-E Version 2.04 59 Default W Value (WD) 32:63 Real Page Number (bits 0:31) (RPNU or Specifies the default value loaded into MAS2W RPN0:31) on a TLB miss exception. RPN32:51 are accessed through MAS3. 60 Default I Value (ID) Specifies the default value loaded into MAS2I on a TLB miss exception. 61 Default M Value (MD) Specifies the default value loaded into MAS2M on a TLB miss exception. 62 Default G Value (GD) Specifies the default value loaded into MAS2G on a TLB miss exception. 63 Default E Value (ED) Specifies the default value loaded into MAS2E on a TLB miss exception. 
D.2.4.6 MAS6 Register The MAS6 register contains fields for specifying PID and AS values to be used when searching TLB entries with the tlbsx instruction. MAS6 32 63 Figure 36. MAS6 register These bits are interpreted as follows: Bit Description 32:33 Reserved 34:47 Search PID0 (SPID0) Specifies the value of PID0 used when searching the TLB during execution of tlbsx. This field is valid for only the number of bits implemented for PID registers. 48:62 Reserved 63 Address Space Value for Searches (SAS) Specifies the value of AS used when search- ing the TLB during execution of tlbsx. D.2.4.7 MAS7 Register The MAS7 register contains the high order address bits of the RPN for implementations that support more than 32 bits of physical address. Implementations that do not support more than 32 bits of physical addressing are not required to implement MAS7. MAS7 32 63 Figure 37. MAS7 register These bits are interpreted as follows: Bit Description Appendix D. Type FSL Storage Control [Category: Embedded.MMU Type 643 Version 2.04 Table 7: MAS Register Update Summary Value Loaded on Event MAS Field Updated Data or Instruction tlbsx hit tlbsx miss tlbre TLB Error Interrupt MAS0TLBSEL MAS4TLBSELD TLB array that hit MAS4TLBSELD -- MAS0ESEL if TLB array Number of entry that hit if TLB array -- [MAS4TLBSELD] sup- [MAS4TLBSELD] sup- ports next victim then ports next victim then hardware hint, hardware hint, else undefined else undefined MAS0NV if TLB array if TLB array if TLB array if TLB array [MAS4TLBSELD] sup- [MAS4TLBSELD] sup- [MAS4TLBSELD] sup- [MAS4TLBSELD] sup- ports next victim then ports next victim then ports next victim then ports next victim then next hardware hint, hardware hint, next hardware hint, hardware hint, else undefined else undefined else undefined else undefined MAS1V 1 1 0 TLBV MAS1IPROT 0 TLBIPROT 0 TLBIPROT MAS1TID if PID[MAS4TIDSELD] TLBTID MAS6SPID0 TLBTID implemented then PID[MAS4TIDSELD] else 0 MAS1TS MSRIS or MSRDS TLBTS MAS6SAS TLBTS MAS1TSIZE 
MAS4TSIZED TLBSIZE MAS4TSIZED TLBSIZE MAS2EPN EA0:511 TLBEPN undefined TLBEPN MAS2ACM MAS4ACMD TLBACM MAS4ACMD TLBACM MAS2VLE MAS4VLED TLBVLE MAS4VLED TLBVLE MAS2W MAS4WD TLBW MAS4WD TLBW MAS2I MAS4ID TLBI MAS4ID TLBI MAS2M MAS4MD TLBM MAS4MD TLBM MAS2G MAS4GD TLBG MAS4GD TLBG MAS2E MAS4ED TLBE MAS4ED TLBE MAS3RPN 0 TLBRPN 0 TLBRPN (bits 32:51) (bits 32:51) MAS3U0 U1 U2 U3 0 TLBU0 U1 U2 U3 0 TLBU0 U1 U2 U3 MAS3UX SX UW 0 TLBUX SX UW SW UR SR 0 TLBUX SX UW SW UR SR SW UR SR MAS4 -- -- -- -- MAS6SPID0 PID0 -- -- -- MAS6SAS MSRIS or MSRDS -- -- -- MAS7RPN 0 TLBRPN 0 TLBRPN (bits 0:31) (bits 0:31) 1. If MSRCM=0 (32-bit mode) at the time of the exception, EPN0:31 are set to 0. 644 Power ISATM -- Book III-E Version 2.04 D.2.5 MMU Configuration and D.2.5.2 TLB Configuration Registers Control Registers (TLBnCFG) The TLBnCFG read-only registers provide information about each specific TLB that is implemented. There is D.2.5.1 MMU Configuration Register one TLBnCFG register implemented for each TLB array (MMUCFG) that is implemented. TLB0CFG corresponds to TLB0, The read-only MMUCFG register is described as fol- TLB1CFG corresponds to TLB1, etc. lows. TLBnCFG provides configuration information for the corresponding TLB array. MMUCFG 32 63 TLBnCFG Figure 38. MMU Configuration Register 32 63 These bits are interpreted as follows: Figure 39. TLB Configuration Register Bit Description These bits are interpreted as follows: 32:39 Reserved Bit Description 40:46 Real Address Size (RASIZE) 32:39 Associativity (ASSOC) Number of bits in a real address supported by Total number of entries in a TLB array which the implementation. can be used for translating addresses with a given EPN. This number is referred to as the 47:48 Reserved associativity level of the TLB array. A value 49:52 Number of PID Registers (NPIDS) equal to NENTRY or 0 indicates the array is Indicates the number of PID registers pro- fully-associative. vided by the processor. 
40:43 Minimum Page Size (MINSIZE) 53:57 PID Register Size (PIDSIZE) Minimum page size of TLB array. Page size The value of PIDSIZE is one less than the encoding is defined in Section 4.7.1.2. number of bits implemented for each of the 44:47 Maximum Page Size (MAXSIZE) PID registers implemented by the processor. Maximum page size of TLB array. Page size The processor implements only the least sig- encoding is defined in Section 4.7.1.2. nificant PIDSIZE+1 bits in the PID registers. The maximum number of PID register bits that 48 Invalidate Protection (IPROT) may be implemented is 14. Invalidate protect capability of TLB array. 58:59 Reserved 0 Indicates invalidate protection capability not supported. 60:61 Number of TLBs (NTLBS) 1 Indicates invalidate protection capability The value of NTLBS is one less than the num- supported. ber of software-accessible TLB structures that are implemented by the processor. NTLBS is 49 Page Size Availability (AVAIL) set to one less than the number of TLB struc- Page size availability of TLB array. tures so that its value matches the maximum 0 Fixed selectable page size from MINSIZE value of MAS0TLBSEL. to MAXSIZE (all TLB entries are the same 00 1 TLB size). 01 2 TLBs 1 Variable page size from MINSIZE to MAX- 10 3 TLBs SIZE (each TLB entry can be sized sepa- 11 4 TLBs rately). 62:63 MMU Architecture Version Number (MAVN) 50:51 Reserved Indicates the version number of the architec- 52:63 Number of Entries (NENTRY) ture of the MMU implemented by the proces- Number of entries in TLB array. sor. 00 Version 1.0 D.2.5.3 MMU Control and Status Regis- 01 Reserved 10 Reserved ter (MMUCSR0) 11 Reserved The MMUCSR0 register is used for general control of the MMU including invalidation of the TLB arrays and page sizes for programmable fixed size arrays. For TLB Appendix D. 
arrays that have programmable fixed sizes, the TLBn_PS fields allow software to specify the page size.

[Figure 40. MMU Control and Status Register 0: register MMUCSR0, bits 32:63]

These bits are interpreted as follows:

Bit      Description
32:40    Reserved
41:56    TLBn Array Page Size
         A 4-bit field specifies the page size for TLBn array. Page size encoding is defined in Section 4.7.1.2. For each TLB array n, the field is implemented only if TLBnCFG[AVAIL]=0 and TLBnCFG[MINSIZE] ≠ TLBnCFG[MAXSIZE]. If the value of TLBn_PS is not between TLBnCFG[MINSIZE] and TLBnCFG[MAXSIZE] the page size is set to TLBnCFG[MINSIZE].
         41:44  TLB3 Array Page Size (TLB3_PS)
                Page size of the TLB3 array.
         45:48  TLB2 Array Page Size (TLB2_PS)
                Page size of the TLB2 array.
         49:52  TLB1 Array Page Size (TLB1_PS)
                Page size of the TLB1 array.
         53:56  TLB0 Array Page Size (TLB0_PS)
                Page size of the TLB0 array.
57:62    TLBn Invalidate All
         TLB invalidate all bit for the TLBn array.
         0  If this bit reads as a 1, an invalidate all operation for the TLBn array is in progress. Hardware will set this bit to 0 when the invalidate all operation is completed. Writing a 0 to this bit during an invalidate all operation is ignored.
         1  TLBn invalidation operation. Hardware initiates a TLBn invalidate all operation. When this operation is complete, this bit is cleared. Writing a 1 during an invalidate all operation produces an undefined result. If the TLB array supports IPROT, entries that have IPROT set will not be invalidated.
         57     TLB2 Invalidate All (TLB2_FI)
                TLB invalidate all bit for the TLB2 array.
         58     TLB3 Invalidate All (TLB3_FI)
                TLB invalidate all bit for the TLB3 array.
         59:60  Reserved
         61     TLB0 Invalidate All (TLB0_FI)
                TLB invalidate all bit for the TLB0 array.
         62     TLB1 Invalidate All (TLB1_FI)
                TLB invalidate all bit for the TLB1 array.
63       Reserved

Programming Note
Changing the fixed page size of an entire array must be done with great care. If any entries in the array are valid, changing the page size may cause those entries to overlap, creating a serious programming error. It is suggested that the entire TLB array be invalidated and any entries with IPROT have their V bits set to zero before changing page size.

D.3 Page Identification and Address Translation

Page Identification occurs as described in Section 4.7.2 except the matching TLB entry may be identified using more than one PID register. Accesses that would result in multiple matching entries are not allowed and are considered a serious programming error by system software and the results of such a translation are undefined. A PID register containing a 0 value (or the same value as another PID register) will form a non-unique match and is permissible.

Once a match occurs the matching TLB entry is used for access control, storage attributes, and effective to real address translation.

D.4 TLB Management

D.4.1 Reading TLB Entries

TLB entries can be read by executing tlbre instructions. At the time of tlbre execution, the MAS registers are used to index a specific TLB entry and upon completion of the tlbre instruction, the MAS registers will contain the contents of the indexed TLB entry.

Specifying invalid values for MAS0[TLBSEL] and MAS0[ESEL] produces undefined results.

D.4.2 Writing TLB Entries

TLB entries can be written by executing tlbwe instructions. At the time of tlbwe execution, the MAS registers are used to index a specific TLB entry and contain the contents to be written to the indexed TLB entry. Upon completion of the tlbwe instruction, the contents of the MAS registers corresponding to TLB entry fields will be written to the indexed TLB entry.

Specifying invalid values for MAS0[TLBSEL] and MAS0[ESEL] produces undefined results.
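As an illustration of how the MAS0 and MAS1 field layouts of Section D.2.4 turn into register values that software would load before a tlbwe, the following Python sketch packs fields using the architecture's bit numbering, in which bit 63 is the least significant bit of the register. The helper names (place, extract, make_mas0, make_mas1, page_size_kbytes) are hypothetical and are not defined by the architecture. The sketch models bit placement only; it does not execute mtspr or tlbwe.

```python
def place(value, first, last):
    """Return `value` positioned in bits first:last of a MAS register.

    Bits are numbered as in the register figures: 32 is the most
    significant bit of the 32-bit register and 63 the least significant.
    """
    nbits = last - first + 1
    if not 0 <= value < (1 << nbits):
        raise ValueError("value does not fit in bits %d:%d" % (first, last))
    return value << (63 - last)


def extract(reg, first, last):
    """Return the field held in bits first:last of register value `reg`."""
    nbits = last - first + 1
    return (reg >> (63 - last)) & ((1 << nbits) - 1)


def make_mas0(tlbsel, esel, nv=0):
    """Pack MAS0: TLBSEL in bits 34:35, ESEL in 36:47, NV in 52:63."""
    return place(tlbsel, 34, 35) | place(esel, 36, 47) | place(nv, 52, 63)


def make_mas1(valid, iprot, tid, ts, tsize):
    """Pack MAS1: V (32), IPROT (33), TID (34:47), TS (51), TSIZE (52:55)."""
    return (place(valid, 32, 32) | place(iprot, 33, 33)
            | place(tid, 34, 47) | place(ts, 51, 51)
            | place(tsize, 52, 55))


def page_size_kbytes(tsize):
    """Page size of a variable-size entry: 4^TSIZE Kbytes."""
    return 4 ** tsize


# Example: entry 5 of TLB1, valid and invalidation-protected,
# TID 3, address space 0, TSIZE 2 (16 Kbytes).
mas0 = make_mas0(tlbsel=1, esel=5)
mas1 = make_mas1(valid=1, iprot=1, tid=3, ts=0, tsize=2)
```

A real handler would also have to respect TLBnCFG[MINSIZE] and TLBnCFG[MAXSIZE] for TSIZE and the ESEL range given by TLBnCFG[ASSOC]; those checks are left out of the sketch.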
D.4.3 Invalidating TLB Entries

TLB entries may be invalidated by three different methods. The TLB entry can be invalidated as the result of a tlbwe instruction that sets the MAS1[V] bit in the entry to 0. TLB entries may also be invalidated as a result of a tlbivax instruction or from an invalidation resulting from a tlbivax on another processor. Lastly, TLB entries may be invalidated as a result of an invalidate all operation specified through appropriate settings in the MMUCSR0.

In both multiprocessor and uniprocessor systems, invalidations can occur on a wider set of TLB entries than intended. That is, a virtual address presented for invalidation may cause not only the intended TLB targeted for invalidation to be invalidated, but may also invalidate other TLB entries depending on the implementation. This is because parts of the translation mechanism may not be fully specified to the hardware at invalidate time. This is especially true in SMP systems, where the invalidation address must be supplied to all processors in the system, and there may be other limitations imposed by the hardware implementation. This phenomenon is known as generous invalidates. The architecture assures that the intended TLB will be invalidated, but does not guarantee that it will be the only one. A TLB entry invalidated by writing the V bit of the TLB entry to 0 by use of a tlbwe instruction is guaranteed to invalidate only the addressed TLB entry. Invalidates occurring from tlbivax instructions or from tlbivax instructions on another processor may cause generous invalidates.

The architecture provides a method to protect against generous invalidations. This is important since there are certain virtual memory regions that must be properly mapped to make forward progress. To prevent this, the architecture specifies an IPROT bit for TLB entries. If the IPROT bit is set to 1 in a given TLB entry, that entry is protected from invalidations resulting from tlbivax instructions, or from invalidate all operations. TLB entries with the IPROT field set may only be invalidated by explicitly writing the TLB entry and specifying a 0 for the V (MAS1[V]) field.

Programming Note
The most obvious issue with generous invalidations is the code memory region that serves as the exception handler for MMU faults. If this region does not have a valid mapping, an MMU exception cannot be handled because the first address of the exception handler will result in another MMU exception.

Programming Note
Not all TLB arrays in a given implementation will implement the IPROT attribute. It is likely that implementations that are suitable for demand page environments will implement it for only a single array, while not implementing it for other TLB arrays.

Programming Note
Operating systems need to use great care when using protected (IPROT) TLB entries, particularly in SMP systems. An SMP system that contains TLB entries on other processors will require a cross processor interrupt or some other synchronization mechanism to assure that each processor performs the required invalidation by writing its own TLB entries.

Programming Note
To ensure a TLB entry that is not protected by IPROT is invalidated if software does not know which TLB array the entry is in, software should issue a tlbivax instruction targeting each TLB in the implementation with the EA to be invalidated.

Programming Note
The preferred method of invalidating entire TLB arrays is invalidation using MMUCSR0.

Programming Note
Invalidations using MMUCSR0 only affect the TLB array on the processor that performs the invalidation. To perform invalidations in a multiprocessor system on all processors in a coherence domain, software should use tlbivax.

D.4.4 Searching TLB Entries

Software may search the MMU by using the tlbsx instruction. The tlbsx instruction uses PID values and an AS value from the MAS registers instead of the PID registers and the MSR. This allows software to search address spaces that differ from the current address space defined by the PID registers. This is useful for TLB fault handling.

D.4.5 TLB Replacement Hardware Assist

The architecture provides mechanisms to assist software in creating and updating TLB entries when MMU related exceptions occur. This is called TLB Replacement Hardware Assist. Hardware will update the MAS registers on the occurrence of a Data TLB Error Interrupt or Instruction TLB Error interrupt.

When a Data or Instruction TLB Error interrupt (miss) occurs, MAS0, MAS1, and MAS2 are automatically updated using the defaults specified in MAS4 as well as the AS and EPN values corresponding to the access that caused the exception. MAS6 is updated to set MAS6[SPID0] to the value of PID0 and MAS6[SAS] to the value of MSR[DS] or MSR[IS] depending on the type of access that caused the error. In addition, if MAS4[TLBSELD] identifies a TLB array that supports NV (Next Victim), MAS0[ESEL] is loaded with a value that hardware believes represents the best TLB entry to victimize to create a new TLB entry and MAS0[NV] is updated with the TLB entry index of what hardware believes to be the next victim. Thus MAS0[ESEL] identifies the current TLB entry to be replaced, and MAS0[NV] points to the next victim. When software writes the TLB entry, the MAS0[NV] field is written to the TLB array. The algorithm used by the hardware to determine which TLB entry should be targeted for replacement is implementation-dependent.

The automatic update of the MAS registers sets up all the necessary fields for creating a new TLB entry with the exception of RPN, the U0-U3 attribute bits, and the permission bits. With the exception of the upper 32 bits of RPN and the page attributes (should software desire to specify changes from the default attributes), all the remaining fields are located in MAS3, requiring only the single MAS register manipulation by software before writing the TLB entry.

For Instruction Storage interrupt (ISI) and Data Storage interrupt (DSI) related exceptions, the MAS registers are not updated. Software must explicitly search the TLB to find the appropriate entry.

The update of MAS registers through TLB Replacement Hardware Assist is summarized in Table 7.

D.5 32-bit and 64-bit Specific MMU Behavior

MMU behavior is largely unaffected by whether the processor is in 32-bit computation mode (MSR[CM]=0) or 64-bit computation mode (MSR[CM]=1). The only differences occur in the EPN field of the TLB entry and the EPN field of MAS2. The differences are summarized here.

- Executing a tlbwe instruction in 32-bit mode will set bits 0:31 of the TLB EPN field to 0, regardless of the value of bits 0:31 of the EPN field in MAS2.
- Updates to MAS registers via TLB Replacement Hardware Assist (see Section D.4.5) update bits 0:51 of the EPN field regardless of the computation mode of the processor at the time of the exception or the interrupt computation mode in which the interrupt is taken. If the instruction causing the exception was executing in 32-bit mode, then bits 0:31 of the EPN field in MAS2 will be set to 0.
- Executing a tlbre instruction in 32-bit mode will set bits 0:31 of the MAS2 EPN field to an undefined value.

Programming Note
This allows a 32-bit OS to operate seamlessly on a 64-bit implementation and a 64-bit OS to easily support 32-bit applications.

D.6 Type FSL MMU Instructions

The instructions described in this section replace the instructions described in Section 4.9.4.1, "TLB Management Instructions".

TLB Invalidate Virtual Address Indexed X-form

tlbivax RA,RB

    Field:      31   ///   RA   RB   786   /
    First bit:  0    6     11   16   21    31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    for each processor
      for TLB array = EA[59:60]
        for each TLB entry
          m ← ¬((1 << (2×(entry[SIZE]-1))) - 1)
          if ((EA[0:51] & m) = (entry[EPN] & m)) | EA[61]
          then if entry[IPROT] = 0 then entry[V] ← 0

Let the effective address (EA) be the sum (RA|0) + (RB). The EA is interpreted as shown below.

    EA[0:51]    EA[0:51]
    EA[52:58]   Reserved
    EA[59:60]   TLB array selector
                00  TLB0
                01  TLB1
                10  TLB2
                11  TLB3
    EA[61]      TLB Invalidate All
    EA[62:63]   Reserved

If EA[61]=0, then if the TLB array targeted by EA[59:60] contains an entry identified by EA[0:51], that entry is made invalid unless the TLB entry is protected by the IPROT attribute. A TLB entry is identified if, for m = ¬((1 << (2×(TLB_entry[SIZE]-1))) - 1), EA[0:51] & m is equal to TLB_entry[EPN] & m. The AS bit does not participate in the comparison.

If EA[61]=1, then all entries not protected by the IPROT attribute in the TLB array targeted by EA[59:60] are made invalid.

This instruction causes the target TLB entry to be invalidated in all processors.

The operation performed by this instruction is ordered by the mbar (or sync) instruction with respect to a subsequent tlbsync instruction executed by the processor executing the tlbivax instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations which is independent of the other sets that mbar orders.

The effects of the invalidation are not guaranteed to be visible to the programming model until the completion of a context synchronizing operation.

Invalidations may occur for other TLB entries in the designated array, but in no case will any TLB entries with the IPROT attribute set be made invalid.

In some implementations, if RA does not equal 0, it may produce an Illegal Instruction exception.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
The use of EA[61] to invalidate TLB arrays may be phased out in future versions of the architecture. The preferred method of invalidating TLB arrays is invalidation using MMUCSR0.
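The tlbivax effective address layout and match rule can be sketched in Python as follows. This is a minimal model under stated assumptions, not an implementation of the instruction: the function names (tlbivax_ea, match_mask, entry_invalidated) are hypothetical, a TLB entry is represented only by its EPN, SIZE, and IPROT values, the entry is assumed to sit in the array selected by EA[59:60], and the generous invalidates that the architecture permits are not modeled.

```python
EPN_MASK = (1 << 52) - 1  # EA[0:51] and the entry EPN are 52 bits wide


def tlbivax_ea(epn, tlb_array, invalidate_all=False):
    """Build a 64-bit EA for tlbivax.

    EA[0:51] holds the page number, EA[59:60] selects the TLB array,
    and EA[61] requests an invalidate all. Bit 63 is the least
    significant bit, so EA[59:60] is shifted left by 3 and EA[61] by 2.
    """
    if not 0 <= epn <= EPN_MASK:
        raise ValueError("epn must fit in 52 bits")
    if not 0 <= tlb_array <= 3:
        raise ValueError("tlb_array must be 0..3")
    return (epn << 12) | (tlb_array << 3) | (int(invalidate_all) << 2)


def match_mask(size):
    """m = not((1 << (2 * (SIZE - 1))) - 1), truncated to 52 bits."""
    return ~((1 << (2 * (size - 1))) - 1) & EPN_MASK


def entry_invalidated(ea, entry_epn, entry_size, entry_iprot):
    """Return True if the RTL above would clear V for this entry."""
    ea_epn = (ea >> 12) & EPN_MASK
    invalidate_all = (ea >> 2) & 1
    m = match_mask(entry_size)
    hit = ((ea_epn & m) == (entry_epn & m)) or invalidate_all == 1
    return hit and entry_iprot == 0


# A 16 Kbyte entry (SIZE = 2) ignores the two low-order EPN bits,
# so EPN 0x12345 matches an entry whose EPN is 0x12344.
ea = tlbivax_ea(0x12345, tlb_array=1)
```

With SIZE = 1 (4 Kbytes) the mask keeps all 52 bits, so only an exact EPN match invalidates the entry; each step up in SIZE drops two more low-order bits from the comparison.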
TLB Search Indexed X-form

tlbsx RA,RB

    31     ///    RA     RB     914    /
    0      6      11     16     21     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    pid ← MAS6_SPID0
    as ← MAS6_SAS
    va ← as || pid || EA
    if Valid_matching_entry_exists(va) then
      entry ← matching entry found
      array ← TLB array number where TLB entry found
      index ← index into TLB array of TLB entry found
      if TLB array supports Next Victim then
        hint ← hardware hint for Next Victim
      else
        hint ← undefined
      rpn ← entry_RPN
      MAS0_TLBSEL ← array
      MAS0_ESEL ← index
      MAS0_NV ← hint
      MAS1_V ← 1
      MAS1_IPROT TID TS TSIZE ← entry_IPROT TID TS SIZE
      MAS2_EPN VLE W I M G E ACM ← entry_EPN VLE W I M G E ACM
      MAS3_RPNL ← rpn_32:51
      MAS3_U0:U3 UX SX UW SW UR SR ← entry_U0:U3 UX SX UW SW UR SR
      MAS7_RPNU ← rpn_0:31
    else
      MAS0_TLBSEL ← MAS4_TLBSELD
      MAS0_ESEL ← hint
      MAS0_NV ← hint
      MAS1_V IPROT ← 0
      MAS1_TID TS ← MAS6_SPID0 SAS
      MAS1_TSIZE ← MAS4_TSIZED
      MAS2_VLE W I M G E ACM ← MAS4_VLED WD ID MD GD ED ACMD
      MAS2_EPN ← undefined
      MAS3_RPNL ← 0
      MAS3_U0:U3 UX SX UW SW UR SR ← 0
      MAS7_RPNU ← 0

Let the effective address (EA) be the sum (RA|0) + (RB).

If any valid TLB array contains an entry corresponding to the virtual address formed by MAS6_SAS SPID0 and EA, that entry as well as the index and array are read into the MAS registers. If no valid matching translation exists, MAS1_V is set to 0 and the MAS registers are loaded with defaults to facilitate a TLB replacement.

If the TLB array supports MAS0_NV, an implementation defined value, hint, specifying the index for the next entry to be replaced is loaded into MAS0_NV regardless of whether a match occurs; otherwise MAS0_NV is set to an undefined value. It is also loaded into MAS0_ESEL if no match occurs.

In some implementations, if RA does not equal 0, it may produce an Illegal Instruction exception.

This instruction is privileged.

Special Registers Altered:
    MAS0 MAS1 MAS2 MAS3 MAS7

TLB Read Entry X-form

tlbre

    31     ///    ///    ///    946    /
    0      6      11     16     21     31

    entry ← SelectTLB(MAS0_TLBSEL, MAS0_ESEL, MAS2_EPN)
    rpn ← entry_RPN
    if TLB array supports Next Victim then
      MAS0_NV ← hint
    else
      MAS0_NV ← undefined
    MAS1_V IPROT TID TS TSIZE ← entry_V IPROT TID TS SIZE
    MAS2_EPN VLE W I M G E ACM ← entry_EPN VLE W I M G E ACM
    MAS3_RPNL ← rpn_32:51
    MAS3_U0:U3 UX SX UW SW UR SR ← entry_U0:U3 UX SX UW SW UR SR
    MAS7_RPNU ← rpn_0:31

The contents of the TLB entry specified by MAS0_TLBSEL, MAS0_ESEL, and MAS2_EPN are read and placed into the MAS registers.

If the TLB array supports MAS0_NV, then an implementation defined value, hint, specifying the index for the next entry to be replaced is loaded into MAS0_NV; otherwise MAS0_NV is set to an undefined value.

If the specified entry does not exist, the results are undefined.

This instruction is privileged.

Special Registers Altered:
    MAS0 MAS1 MAS2 MAS3 MAS7

TLB Synchronize X-form

tlbsync

    31     ///    ///    ///    566    /
    0      6      11     16     21     31

The tlbsync instruction provides an ordering function for the effects of all tlbivax instructions executed by the processor executing the tlbsync instruction, with respect to the memory barrier created by a subsequent sync (msync) instruction executed by the same processor. Executing a tlbsync instruction ensures that all of the following will occur.

• All TLB invalidations caused by tlbivax instructions preceding the tlbsync instruction will have completed on any other processor before any storage accesses associated with data accesses caused by instructions following the sync (msync) instruction are performed with respect to that processor.

• All storage accesses by other processors for which the address was translated using the translations being invalidated will have been performed with respect to the processor executing the sync (msync) instruction, to the extent required by the associated Memory Coherence Required attributes, before the sync (msync) instruction's memory barrier is created.

The operation performed by this instruction is ordered by the mbar or sync (msync) instruction with respect to preceding tlbivax instructions executed by the processor executing the tlbsync instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations, which is independent of the other sets that mbar orders.

The tlbsync instruction may complete before operations caused by tlbivax instructions preceding the tlbsync instruction have been performed.

This instruction is privileged.

Special Registers Altered:
    None

TLB Write Entry X-form

tlbwe

    31     ///    ///    ///    978    /
    0      6      11     16     21     31

    entry ← SelectTLB(MAS0_TLBSEL, MAS0_ESEL, MAS2_EPN)
    rpn ← MAS7_RPNU || MAS3_RPNL
    hint ← MAS0_NV
    entry_V IPROT TID TS SIZE ← MAS1_V IPROT TID TS TSIZE
    entry_EPN VLE W I M G E ACM ← MAS2_EPN VLE W I M G E ACM
    entry_U0:U3 UX SX UW SW UR SR ← MAS3_U0:U3 UX SX UW SW UR SR
    entry_RPN ← rpn

The contents of the MAS registers are written to the TLB entry specified by MAS0_TLBSEL, MAS0_ESEL, and MAS2_EPN.

MAS0_NV provides a suggestion to hardware of where the next hardware hint for replacement should be given when the next Data or Instruction TLB Error Interrupt, tlbsx, or tlbre instruction occurs.

If the specified entry does not exist, the results are undefined.

A context synchronizing instruction is required after a tlbwe instruction to ensure any subsequent instructions that will use the updated TLB values execute in the new context.

This instruction is privileged.

Special Registers Altered:
    None

Appendix E.
Example Performance Monitor [Category: Embedded.Performance Monitor]

E.1 Overview

This appendix describes an example of a Performance Monitor facility. It defines an architecture suitable for performance monitoring facilities in the Embedded environment. The architecture itself presents only programming model visible features in conjunction with architecturally defined behavioral features. Much of the selection of events is by necessity implementation-dependent and is not described as part of the architecture; however, this document provides guidelines for some features of a performance monitor implementation that should be followed by all implementations.

The example Performance Monitor facility provides the ability to monitor and count predefined events such as processor clocks, misses in the instruction cache or data cache, types of instructions decoded, or mispredicted branches. The count of such events can be used to trigger the Performance Monitor exception. While most of the specific events are not architected, the mechanism of controlling data collection is.

The example Performance Monitor facility can be used to do the following:

• Improve system performance by monitoring software execution and then recoding algorithms for more efficiency. For example, memory hierarchy behavior can be monitored and analyzed to optimize task scheduling or data distribution algorithms.

• Characterize processors in environments not easily characterized by benchmarking.

• Help system developers bring up and debug their systems.

E.2 Programming Model

The example Performance Monitor facility defines a set of Performance Monitor Registers (PMRs) that are used to collect performance data and to control its collection, and an interrupt to allow intervention by software. The PMRs provide various controls and access to collected data. They are categorized as follows:

• Counter registers. These registers are used for data collection. Occurrences of selected events are counted here. These registers are named PMC0..15. User and supervisor level access to these registers is through different PMR numbers allowing different access rights.

• Global controls. This register controls global settings of the Performance Monitor facility and affects all counters. This register is named PMGC0. User and supervisor level access to this register is through different PMR numbers allowing different access rights. In addition, a bit in the MSR (MSR_PMM) is defined to enable/disable counting.

• Local controls. These registers control settings that apply only to a particular counter. These registers are named PMLCa0..15 and PMLCb0..15. User and supervisor level access to these registers is through different PMR numbers allowing different access rights. Each set of local control registers (PMLCan and PMLCbn) contains controls that apply to the associated same numbered counter register (e.g. PMLCa0 and PMLCb0 contain controls for PMC0 while PMLCa1 and PMLCb1 contain controls for PMC1).

Assembler Note
The counter registers, global controls, and local controls have alias names which cause the assembler to use different PMR numbers. The names PMC0...15, PMGC0, PMLCa0...15, and PMLCb0...15 cause the assembler to use the supervisor level PMR number, and the names UPMC0...15, UPMGC0, UPMLCa0...15, and UPMLCb0...15 cause the assembler to use the user-level PMR number.

A given implementation may implement fewer counter registers (and their associated control registers) than are architected. Architected counter and counter control registers that are not implemented behave the same as unarchitected Performance Monitor Registers. PMRs are described in Section E.3.

Software uses the global and local controls to select which events are counted in the counter registers, when such events should be counted, and what action
should be taken when a counter overflows. Software can use the collected information to determine performance attributes of a given segment of code, a process, or the entire software system. PMRs can be read by software using the mfpmr instruction and PMRs can be written by using the mtpmr instruction. Both instructions are described in Section E.4.

Since counters are defined as 32-bit registers, it is possible for the counting of some events to overflow. A Performance Monitor interrupt is provided that can be programmed to occur in the event of a counter overflow. The Performance Monitor interrupt is described in detail in Section E.2.5 and Section E.2.6.

E.2.1 Event Counting

Event counting can be configured in several different ways. This section describes configurability and specific unconditional counting modes.

E.2.2 Processor Context Configurability

Counting can be enabled if conditions in the processor state match a software-specified condition. Because a software task scheduler may switch a processor's execution among multiple processes and because statistics on only a particular process may be of interest, a facility is provided to mark a process. The Performance Monitor mark bit, MSR_PMM, is used for this purpose. System software may set this bit to 1 when a marked process is running. This enables statistics to be gathered only during the execution of the marked process. The states of MSR_PR and MSR_PMM together define a state that the processor (supervisor or user) and the process (marked or unmarked) may be in at any time. If this state matches an individual state specified by the PMLCan_FCS, PMLCan_FCU, PMLCan_FCM1 and PMLCan_FCM0 fields in PMLCan (the state for which monitoring is enabled), counting is enabled for PMCn.

Each event, on an implementation basis, may count regardless of the value of MSR_PMM. The counting behavior of each event should be documented in the User's Manual.

The processor states and the settings of the PMLCan_FCS, PMLCan_FCU, PMLCan_FCM1 and PMLCan_FCM0 fields in PMLCan necessary to enable monitoring of each processor state are shown in Figure 41.

    Processor State              FCS  FCU  FCM1  FCM0
    Marked                        0    0    0     1
    Not marked                    0    0    1     0
    Supervisor                    0    1    0     0
    User                          1    0    0     0
    Marked and supervisor         0    1    0     1
    Marked and user               1    0    0     1
    Not marked and supervisor     0    1    1     0
    Not marked and user           1    0    1     0
    All                           0    0    0     0
    None                          X    X    1     1
    None                          1    1    X     X

Figure 41. Processor States and PMLCan Bit Settings

Two unconditional counting modes may be specified:

• Counting is unconditionally enabled regardless of the states of MSR_PMM and MSR_PR. This can be accomplished by setting PMLCan_FCS, PMLCan_FCU, PMLCan_FCM1, and PMLCan_FCM0 to 0 for each counter control.

• Counting is unconditionally disabled regardless of the states of MSR_PMM and MSR_PR. This can be accomplished by setting PMGC0_FAC to 1 or by setting PMLCan_FC to 1 for each counter control. Alternatively, this can be accomplished by setting PMLCan_FCM1 to 1 and PMLCan_FCM0 to 1 for each counter control or by setting PMLCan_FCS to 1 and PMLCan_FCU to 1 for each counter control.

Programming Note
Events may be counted in a fuzzy manner. That is, events may not be counted precisely due to the nature of an implementation. Users of the Performance Monitor facility should be aware that an event may be counted even if it was precisely filtered, though it should not have been. In general such discrepancies are statistically unimportant and users should not assume that counts are explicitly accurate.

E.2.3 Event Selection

Events to count are determined by placing an implementation defined event value into the PMLCa0..15_EVENT field. Which events may be programmed into which counter is implementation specific and should be defined in the User's Manual. In general, most events may be programmed into any of the implementation available counters.
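The processor-state matching of Section E.2.2 (Figure 41) can be expressed as a simple predicate over the freeze bits, whose individual meanings are given in Sections E.3.1 and E.3.2. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and it ignores implementation-specific behavior such as events that count regardless of MSR_PMM or events that are counted in a fuzzy manner.

```python
def counting_enabled(msr_pr, msr_pmm, fc=0, fcs=0, fcu=0, fcm1=0, fcm0=0, fac=0):
    """Return True if PMCn may count in the given processor state.

    msr_pr / msr_pmm model MSR_PR and MSR_PMM; fc, fcs, fcu, fcm1 and fcm0
    model the PMLCan freeze bits; fac models PMGC0_FAC.
    """
    if fac or fc:                 # all counters frozen, or this counter frozen
        return False
    if fcs and msr_pr == 0:       # frozen in supervisor state
        return False
    if fcu and msr_pr == 1:       # frozen in user state
        return False
    if fcm1 and msr_pmm == 1:     # frozen while the mark bit is set
        return False
    if fcm0 and msr_pmm == 0:     # frozen while the mark bit is cleared
        return False
    return True
```

Under this model, the "Marked and user" row of Figure 41 (FCS=1, FCM0=1) counts only when MSR_PR=1 and MSR_PMM=1, and either "None" row disables counting in every state.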
Programming a counter with an event that is not supported for that counter gives boundedly undefined results.

Programming Note
Event names and event numbers will differ greatly across implementations and software should not expect that events and event names will be consistent.

E.2.4 Thresholds

Thresholds are values that must be exceeded for an event to be counted. Threshold values are programmed in the PMLCb0..15_THRESHOLD field. The events which may be thresholded and the units of each event that may be thresholded are implementation-dependent. Programming a threshold value for an event that is not defined to use a threshold gives boundedly undefined results.

E.2.5 Performance Monitor Exception

A Performance Monitor exception occurs when counter overflow detection is enabled and a counter overflows. More specifically, for each counter register n, if PMGC0_PMIE=1 and PMLCan_CE=1 and PMCn_OV=1 and MSR_EE=1, a Performance Monitor exception is said to exist. The Performance Monitor exception condition will cause a Performance Monitor interrupt if the exception is the highest priority exception.

The Performance Monitor exception is level sensitive and the exception condition may cease to exist if any of the required conditions fail to be met. Thus it is possible for a counter to overflow and continue counting events until PMCn_OV becomes 0 without taking a Performance Monitor interrupt if MSR_EE = 0 during the overflow condition. To avoid this, software should program the counters to freeze if an overflow condition is detected (see Section E.3.4).

E.2.6 Performance Monitor Interrupt

A Performance Monitor interrupt occurs when a Performance Monitor exception exists and no higher priority exception exists. When a Performance Monitor interrupt occurs, SRR0 and SRR1 record the current state of the NIA and the MSR, the MSR is set to handle the interrupt, and instruction execution resumes at IVPR_0:47 || IVOR35_48:59 || 0b0000.

The Performance Monitor interrupt is precise and asynchronous.

Programming Note
When taking a Performance Monitor interrupt software should clear the overflow condition by reading the counter register and setting the counter register to a non-overflow value since the normal return from the interrupt will set MSR_EE back to 1.

E.3 Performance Monitor Registers

E.3.1 Performance Monitor Global Control Register 0

The Performance Monitor Global Control Register 0 (PMGC0) controls all Performance Monitor counters.

    PMGC0
    32                                   63

Figure 42. [User] Performance Monitor Global Control Register 0

These bits are interpreted as follows:

    Bit   Description

    32    Freeze All Counters (FAC)
          The FAC bit is sticky; that is, once set to 1 it remains set to 1 until it is set to 0 by an mtpmr instruction.
          0  The PMCs can be incremented (if enabled by other Performance Monitor control fields).
          1  The PMCs can not be incremented.

    33    Performance Monitor Interrupt Enable (PMIE)
          0  Performance Monitor interrupts are disabled.
          1  Performance Monitor interrupts are enabled and occur when an enabled condition or event occurs. Enabled conditions and events are described in Section E.2.5.

    34    Freeze Counters on Enabled Condition or Event (FCECE)
          Enabled conditions and events are described in Section E.2.5.
          0  The PMCs can be incremented (if enabled by other Performance Monitor control fields).
          1  The PMCs can be incremented (if enabled by other Performance Monitor control fields) only until an enabled condition or event occurs. When an enabled condition or event occurs, PMGC0_FAC is set to 1. It is the user's responsibility to set PMGC0_FAC to 0.
    35:63 Reserved

The UPMGC0 register is an alias to the PMGC0 register for user mode read only access.

E.3.2 Performance Monitor Local Control A Registers

The Performance Monitor Local Control A Registers 0 through 15 (PMLCa0..15) function as event selectors and give local control for the corresponding numbered Performance Monitor counters. PMLCa works with the corresponding numbered PMLCb register.

    PMLCa0..15
    32                                   63

Figure 43. [User] Performance Monitor Local Control A Registers

PMLCa is set to 0 at reset. These bits are interpreted as follows:

    Bit   Description

    32    Freeze Counter (FC)
          0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
          1  The PMC can not be incremented.

    33    Freeze Counter in Supervisor State (FCS)
          0  The PMC is incremented (if enabled by other Performance Monitor control fields).
          1  The PMC can not be incremented if MSR_PR is 0.

    34    Freeze Counter in User State (FCU)
          0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
          1  The PMC can not be incremented if MSR_PR is 1.

    35    Freeze Counter while Mark is Set (FCM1)
          0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
          1  The PMC can not be incremented if MSR_PMM is 1.

    36    Freeze Counter while Mark is Cleared (FCM0)
          0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
          1  The PMC can not be incremented if MSR_PMM is 0.

    37    Condition Enable (CE)
          0  Overflow conditions for PMCn cannot occur (PMCn cannot cause interrupts, cannot freeze counters).
          1  Overflow conditions occur when the most-significant-bit of PMCn is equal to 1.
          It is recommended that CE be set to 0 when counter PMCn is selected for chaining; see Section E.5.1.

    38:40 Reserved

    41:47 Event Selector (EVENT)
          Up to 128 events selectable; see Section E.2.3.

    48:53 Setting is implementation-dependent.

    54:63 Reserved

The UPMLCa0..15 registers are aliases to the PMLCa0..15 registers for user mode read only access.

E.3.3 Performance Monitor Local Control B Registers

The Performance Monitor Local Control B Registers 0 through 15 (PMLCb0..15) specify a threshold value and a multiple to apply to a threshold event selected for the corresponding Performance Monitor counter. Threshold capability is implementation counter dependent. Not all events or all counters of an implementation are guaranteed to support thresholds. PMLCb works with the corresponding numbered PMLCa register.

    PMLCb0..15
    32                                   63

Figure 44. [User] Performance Monitor Local Control B Register

PMLCb is set to 0 at reset. These bits are interpreted as follows:

    Bit   Description

    32:52 Reserved

    53:55 Threshold Multiple (THRESHMUL)
          000  Threshold field is multiplied by 1 (THRESHOLD × 1)
          001  Threshold field is multiplied by 2 (THRESHOLD × 2)
          010  Threshold field is multiplied by 4 (THRESHOLD × 4)
          011  Threshold field is multiplied by 8 (THRESHOLD × 8)
          100  Threshold field is multiplied by 16 (THRESHOLD × 16)
          101  Threshold field is multiplied by 32 (THRESHOLD × 32)
          110  Threshold field is multiplied by 64 (THRESHOLD × 64)
          111  Threshold field is multiplied by 128 (THRESHOLD × 128)

    56:57 Reserved

    58:63 Threshold (THRESHOLD)
          Only events that exceed the value THRESHOLD multiplied as described by THRESHMUL are counted. Events to which a threshold value applies are implementation-dependent as are the unit (for example duration in cycles) and the granularity with which the threshold value is interpreted.

Programming Note
By varying the threshold value, software can obtain a profile of the event characteristics subject to thresholding. For example, if PMC1 is configured to count cache misses that last longer than the threshold value, software can measure the distribution of cache miss durations for a given program by monitoring the program repeatedly using a different threshold value each time.

The UPMLCb0..15 registers are aliases to the PMLCb0..15 registers for user mode read only access.

E.3.4 Performance Monitor Counter Registers

The Performance Monitor Counter Registers (PMC0..15) are 32-bit counters that can be programmed to generate interrupt signals when they overflow. Each counter is enabled to count up to 128 events.

    PMC0..15
    32                                   63

Figure 45. [User] Performance Monitor Counter Registers

PMCs are set to 0 at reset. These bits are interpreted as follows:

    Bit   Description

    32    Overflow (OV)
          0  Counter has not reached an overflow state.
          1  Counter has reached an overflow state.

    33:63 Counter Value (CV)
          Indicates the number of occurrences of the specified event.

The minimum value for a counter is 0 (0x0000_0000) and the maximum value is 4,294,967,295 (0xFFFF_FFFF). A counter can increment up to the maximum value and then wraps to the minimum value. A counter enters the overflow state when the high-order bit is set to 1, which normally occurs only when the counter increments from a value below 2,147,483,648 (0x8000_0000) to a value greater than or equal to 2,147,483,648 (0x8000_0000).

Several different actions may occur when an overflow state is reached, depending on the configuration:

• If PMLCan_CE is 0, no special actions occur on overflow: the counter continues incrementing, and no exception is signaled.

• If PMLCan_CE and PMGC0_FCECE are 1, all counters are frozen when PMCn overflows.

• If PMLCan_CE, PMGC0_PMIE, and MSR_EE are 1, an exception is signalled when PMCn reaches overflow. Note that the interrupts are masked by setting MSR_EE to 0. An overflow condition may be present while MSR_EE is zero, but the interrupt is not taken until MSR_EE is set to 1.

If an overflow condition occurs while MSR_EE is 0 (the exception is masked), the exception is still signalled once MSR_EE is set to 1 if the overflow condition is still present and the configuration has not been changed in the meantime to disable the exception; however, if MSR_EE remains 0 until after the counter leaves the overflow state (MSB becomes 0), or if MSR_EE remains 0 until after PMLCan_CE or PMGC0_PMIE are set to 0, the exception does not occur.

Programming Note
Loading a PMC with an overflowed value can cause an immediate exception. For example, if PMLCan_CE, PMGC0_PMIE, and MSR_EE are all 1, and an mtpmr loads an overflowed value into a PMCn that previously held a non-overflowed value, then an interrupt will be generated before any event counting has occurred.

The following sequence is generally recommended for setting the counter values and configurations.

1. Set PMGC0_FAC to 1 to freeze the counters.
2. Perform a series of mtpmr operations to initialize counter values and configure the control registers.
3. Release the counters by setting PMGC0_FAC to 0 with a final mtpmr.

E.4 Performance Monitor Instructions

Move From Performance Monitor Register XFX-form

mfpmr RT,PMRN

    31     RT     pmrn          334    /
    0      6      11            21     31

    n ← pmrn_5:9 || pmrn_0:4
    if length(PMR(n)) = 64 then
      RT ← PMR(n)
    else
      RT ← ³²0 || PMR(n)_32:63

Let PMRN denote a Performance Monitor Register number and PMR the set of Performance Monitor Registers.

The contents of the designated Performance Monitor Register are placed into register RT.

The list of defined Performance Monitor Registers and their privilege class is provided in Figure 46.

Execution of this instruction specifying a defined and privileged Performance Monitor Register when MSR_PR=1 will result in a Privileged Instruction exception.

Execution of this instruction specifying an undefined Performance Monitor Register will either result in an Illegal Instruction exception or will produce an undefined value for register RT.

Special Registers Altered:
    None

Move To Performance Monitor Register XFX-form

mtpmr PMRN,RS

    31     RS     pmrn          462    /
    0      6      11            21     31

    n ← pmrn_5:9 || pmrn_0:4
    if length(PMR(n)) = 64 then
      PMR(n) ← (RS)
    else
      PMR(n) ← (RS)_32:63

Let PMRN denote a Performance Monitor Register number and PMR the set of Performance Monitor Registers.

The contents of the register RS are placed into the designated Performance Monitor Register.

The list of defined Performance Monitor Registers and their privilege class is provided in Figure 46.

Execution of this instruction specifying a defined and privileged Performance Monitor Register when MSR_PR=1 will result in a Privileged Instruction exception.

Execution of this instruction specifying an undefined Performance Monitor Register will either result in an Illegal Instruction exception or will perform no operation.

Special Registers Altered:
    None

                PMR(1)                                  Privileged
    decimal     pmrn_5:9  pmrn_0:4   Register Name     mtpmr   mfpmr   Cat
    0-15        00000     0xxxx      PMC0..15          -       no      E.PM
    16-31       00000     1xxxx      PMC0..15          yes     yes     E.PM
    128-143     00100     0xxxx      PMLCA0..15        -       no      E.PM
    144-159     00100     1xxxx      PMLCA0..15        yes     yes     E.PM
    256-271     01000     0xxxx      PMLCB0..15        -       no      E.PM
    272-287     01000     1xxxx      PMLCB0..15        yes     yes     E.PM
    384         01100     00000      PMGC0             -       no      E.PM
    400         01100     10000      PMGC0             yes     yes     E.PM

    -    This register is not defined for this instruction.
    (1)  Note that the order of the two 5-bit halves of the PMR number is reversed.

Figure 46. Embedded.Performance Monitor PMRs

E.5 Performance Monitor Software Usage Notes

E.5.1 Chaining Counters

An implementation may contain events that are used to "chain" counters together to provide a larger range of event counts. This is accomplished by programming the desired event into one counter and programming another counter with an event that occurs when the first counter transitions from 1 to 0 in the most significant bit.

The counter chaining feature can be used to decrease the processing pollution caused by Performance Monitor interrupts (things like cache contamination and pipeline effects) by allowing a higher event count than is possible with a single counter.
Chaining two counters together effectively adds 32 bits to a counter register where the first counter's carry-out event acts like a carry-out feeding the second counter. By defining the event of interest to be another PMC's overflow generation, the chained counter increments each time the first counter rolls over to zero. Multiple counters may be chained together.

Because the entire chained value cannot be read in a single instruction, an overflow may occur between counter reads, producing an inaccurate value. A sequence like the following is necessary to read the complete chained value when it spans multiple counters and the counters are not frozen. The example shown is for a two-counter case.

    loop:
        mfpmr  Rx,pmctr1      #load from upper counter
        mfpmr  Ry,pmctr0      #load from lower counter
        mfpmr  Rz,pmctr1      #load from upper counter
        cmp    cr0,0,Rz,Rx    #see if `old' = `new'
        bc     4,2,loop       #loop if carry occurred between reads

The comparison and loop are necessary to ensure that a consistent set of values has been obtained. The above sequence is not necessary if the counters are frozen.

E.5.2 Thresholding

Threshold event measurement enables the counting of duration and usage events. Assume an example event, dLFB load miss cycles, requires a threshold value. A dLFB load miss cycles event is counted only when the number of cycles spent recovering from the miss is greater than the threshold. If the event is counted on two counters and each counter has an individual threshold, one execution of a performance monitor program can sample two different threshold values. Measuring code performance with multiple concurrent thresholds expedites code profiling significantly.
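The read-compare-retry sequence of Section E.5.1 can also be written as a higher-level routine. The following Python sketch only models the logic of the assembly loop: read_pmr is a hypothetical callable standing in for mfpmr, and the upper and lower arguments are whatever identifiers that callable expects for the two chained counters.

```python
def read_chained(read_pmr, upper, lower):
    """Return a consistent 64-bit value from two chained 32-bit counters.

    read_pmr(counter) is a hypothetical stand-in for an mfpmr access.
    """
    while True:
        old_hi = read_pmr(upper)   # mfpmr Rx,pmctr1
        lo = read_pmr(lower)       # mfpmr Ry,pmctr0
        new_hi = read_pmr(upper)   # mfpmr Rz,pmctr1
        if new_hi == old_hi:       # no carry occurred between the reads
            return (old_hi << 32) | lo
        # The upper counter changed, so lo may belong to either value: retry.
```

As in the assembly sequence, the retry is needed only while the counters are running; if the counters are frozen, a single pass returns a consistent value.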
Book VLE: Power ISA Operating Environment Architecture - Variable Length Encoding (VLE) Environment

Chapter 1. Variable Length Encoding Introduction

    1.1 Overview
    1.2 Documentation Conventions
        1.2.1 Description of Instruction Operation
    1.3 Instruction Mnemonics and Operands
    1.4 VLE Instruction Formats
        1.4.1 BD8-form (16-bit Branch Instructions)
        1.4.2 C-form (16-bit Control Instructions)
        1.4.3 IM5-form (16-bit register + immediate Instructions)
        1.4.4 OIM5-form (16-bit register + offset immediate Instructions)
        1.4.5 IM7-form (16-bit Load immediate Instructions)
        1.4.6 R-form (16-bit Monadic Instructions)
        1.4.7 RR-form (16-bit Dyadic Instructions)
        1.4.8 SD4-form (16-bit Load/Store Instructions)
        1.4.9 BD15-form
        1.4.10 BD24-form
        1.4.11 D8-form
        1.4.12 I16A-form
        1.4.13 I16L-form
        1.4.14 M-form
        1.4.15 SCI8-form
        1.4.16 LI20-form
        1.4.17 Instruction Fields

This chapter describes computation modes, document conventions, a processor overview, instruction formats, storage addressing, and instruction addressing.

1.1 Overview

Variable Length Encoding (VLE) is a code density optimized re-encoding of much of the instruction set defined by Books I, II, and III-E using both 16-bit and 32-bit instruction formats.

VLE offers more efficient binary representations of applications for the embedded processor spaces where code density plays a major role in affecting overall system cost, and to a somewhat lesser extent, performance.

VLE is a supplement to the instruction set defined by Book I-III and code pages using VLE encoding or non-VLE encoding can be intermingled in a system providing focus on both high performance and code density where most needed.

VLE provides alternative encodings to instructions defined in Books I-III to enable reduced code footprint. This set of alternative encodings is selected on a page basis. A single storage attribute bit selects between standard instruction encodings and VLE instructions for that page of memory.

Instruction encodings in pages marked as VLE are either 16 or 32 bits long, and are aligned on 16-bit boundaries. Because of this, all instruction pages marked as VLE are required to use Big-Endian byte ordering.

The programming model uses the same register set with both instruction set encodings, although some registers are not accessible by VLE instructions using the 16-bit formats and not all condition register (CR) fields are used by Conditional Branch instructions or instructions that access the condition register executing from a VLE instruction page. In addition, immediate fields and displacements differ in size and use, due to the more restrictive encodings imposed by VLE instruction formats.

VLE additional instruction fields are described in Section 1.4.17, "Instruction Fields".

Other than the requirement of Big-Endian byte ordering for instruction pages and the additional storage attribute to identify whether the instruction page corresponds to a VLE section of code, VLE complies with the memory model, register model, timer facilities, debug facilities, and interrupt/exception model defined in Book I-III and therefore executes in the same environment as non-VLE instructions.

1.2 Documentation Conventions

Book VLE adheres to the documentation conventions defined in Section 1.3 of Book I. Note however that this book defines instructions that apply to the User Instruction Set Architecture, the Virtual Environment Architecture, and the Operating Environment Architecture.

1.2.1 Description of Instruction Operation

The RTL (register transfer language) descriptions in Book VLE conform to the conventions described in Section 1.3.4 of Book I.

1.3 Instruction Mnemonics and Operands

The description of each instruction includes the mnemonic and a formatted list of operands. VLE instruction semantics are either identical or similar to those of other instructions in the architecture. Where the semantics, side-effects, and binary encodings are identical, the standard mnemonics and formats are used. Such unchanged instructions are listed and appropriately referenced, but the instruction definitions are not replicated in this book. Where the semantics are similar but the binary encodings differ, the standard mnemonic is typically preceded with an e_ to denote a VLE instruction. To distinguish between similar instructions available in both 16- and 32-bit forms under VLE and standard instructions, VLE instructions encoded with 16 bits have an se_ prefix. The following are examples:

    stwx   RS,RA,RB      // standard Book I instruction
    e_stw  RS,D(RA)      // 32-bit VLE instruction
    se_stw RZ,SD4(RX)    // 16-bit VLE instruction

1.4 VLE Instruction Formats

All VLE instructions to be executed are either two or four bytes long and are halfword-aligned in storage. Thus, whenever instruction addresses are presented to the processor (as in Branch instructions), the low-order bit is treated as 0. Similarly, whenever the processor generates an instruction address, the low-order bit is zero.

The format diagrams given below show horizontally all valid combinations of instruction fields. Only those formats that are unique to VLE-defined instructions are included here. Instruction forms that are available in VLE or non-VLE mode are described in Section 1.6 of Book I and are not repeated here.

In some cases an instruction field must contain a particular value. If a field that must contain a particular value does not contain that value, the instruction form is invalid and the results are as described for invalid instruction forms in Book I.

VLE instructions use split field notation as defined in Section 1.6 of Book I.

In each diagram below, the first row lists the bit positions marked in the format diagram and each following row lists the fields of one valid combination, from left to right.

1.4.1 BD8-form (16-bit Branch Instructions)

    Bit positions: 0 5 6 7 8 15
    OPCD  BO16  BI16  BD8
    OPCD  XO  LK  BD8

Figure 1. BD8 instruction format

1.4.2 C-form (16-bit Control Instructions)

    Bit positions: 0 15
    OPCD
    OPCD  LK

Figure 2. C instruction format

1.4.3 IM5-form (16-bit register + immediate Instructions)

    Bit positions: 0 6 7 12 15
    OPCD  XO  UI5  RX

Figure 3. IM5 instruction format

1.4.4 OIM5-form (16-bit register + offset immediate Instructions)

    Bit positions: 0 6 7 12 15
    OPCD  XO  OIM5  RX
    OPCD  RC  OIM5  RX

Figure 4. OIM5 instruction format

1.4.5 IM7-form (16-bit Load immediate Instructions)

    Bit positions: 0 5 12 15
    OPCD  UI7  RX

Figure 5. IM7 instruction format

1.4.6 R-form (16-bit Monadic Instructions)

    Bit positions: 0 6 12 15
    OPCD  XO  RX

Figure 6. R instruction format

1.4.7 RR-form (16-bit Dyadic Instructions)

    Bit positions: 0 6 7 8 12 15
    OPCD  XO  RY  RX
    OPCD  XO  RC  RY  RX
    OPCD  XO  ARY  RX
    OPCD  XO  RY  ARX

Figure 7. RR instruction format

1.4.8 SD4-form (16-bit Load/Store Instructions)

    Bit positions: 0 4 8 12 15
    OPCD  SD4  RZ  RX

Figure 8. SD4 instruction format

1.4.9 BD15-form

    Bit positions: 0 10 12 16 31
    OPCD  BO32  BI32  BD15  LK

Figure 9. BD15 instruction format

1.4.10 BD24-form

    Bit positions: 0 6 7 31
    OPCD  0  BD24  LK

Figure 10. BD24 instruction format

1.4.11 D8-form

    Bit positions: 0 6 11 16 24 31
    OPCD  RT  RA  XO  D8
    OPCD  RS  RA  XO  D8

Figure 11. D8 instruction format

1.4.12 I16A-form

    Bit positions: 0 6 11 16 21 31
    OPCD  si  RA  XO  si
    OPCD  ui  RA  XO  ui

Figure 12. I16A instruction format

1.4.13 I16L-form

    Bit positions: 0 6 11 16 21 31
    OPCD  RT  ui  XO  ui

Figure 13. I16L instruction format

1.4.14 M-form

    Bit positions: 0 6 11 16 21 26 31
    OPCD  RS  RA  SH  MB  ME  XO
    OPCD  RS  RA  SH  MB  ME  XO

Figure 14. M instruction format

1.4.15 SCI8-form

    Bit positions: 0 6 9 11 16 20 21 22 24 31
    OPCD  RT  RA  XO  Rc  F  SCL  UI8
    OPCD  RT  RA  XO  F  SCL  UI8
    OPCD  RS  RA  XO  Rc  F  SCL  UI8
    OPCD  RS  RA  XO  F  SCL  UI8
    OPCD  000  BF32  RA  XO  F  SCL  UI8
    OPCD  001  BF32  RA  XO  F  SCL  UI8
    OPCD  XO  RA  XO  F  SCL  UI8

Figure 15. SCI8 instruction format

1.4.16 LI20-form

    Bit positions: 0 6 11 16 17 21 31
    OPCD  RT  li20  XO  li20  li20

Figure 16. LI20 instruction format

1.4.17 Instruction Fields

VLE uses instruction fields defined in Section 1.6.22 of Book I as well as VLE-defined instruction fields defined below.

ARX (12:15)
    Field used to specify an "alternate" General Purpose Register in the range R8:R23 to be used as a destination.

ARY (8:11)
    Field used to specify an "alternate" General Purpose Register in the range R8:R23 to be used as a source.

BD8 (8:15), BD15 (16:30), BD24 (7:30)
    Immediate field specifying a signed two's complement branch displacement which is concatenated on the right with 0b0 and sign-extended to 64 bits.
    BD15. (Used by 32-bit branch conditional

    1  Set the Link Register. The sum of the value 2 or 4 and the address of the Branch instruction is placed into the Link Register.

OIM5 (7:11)
    Offset Immediate field used to specify a 5-bit unsigned fixed-point value in the range [1:32] encoded as [0:31]. Thus the binary encoding of 0b00000 represents an immediate value of 1, 0b00001 represents an immediate value of 2, and so on.

OPCD (0:3, 0:4, 0:5, 0:9, 0:14, 0:15)
    Primary opcode field.
class instructions) A 15-bit signed displace- ment that is sign-extended and shifted left one Rc (6, 7, 20, 31) bit (concatenated with 0b0) and then added to RECORD bit. the current instruction address to form the 0 Do not alter the Condition Register. branch target address. 1 Set Condition Register Field 0. BD24. (Used by 32-bit branch class instruc- RX (12:15) tions) A 24-bit signed displacement that is Field used to specify a General Purpose Reg- sign-extended and shifted left one bit (concat- ister in the ranges R0:R7 or R24:R31 to be enated with 0b0) and then added to the cur- used as a source or as a destination. R0 is rent instruction address to form the branch encoded as 0b0000, R1 as 0b0001, etc. R24 target address. is encoded as 0b1000, R25 as 0b1001, etc. BD8. (Used by 16-bit branch and branch con- RY (8:11) ditional class instructions) An 8-bit signed dis- Field used to specify a General Purpose Reg- placement that is sign-extended and shifted ister in the ranges R0:R7 or R24:R31 to be left one bit (concatenated with 0b0) and then used as a source. R0 is encoded as 0b0000, added to the current instruction address to R1 as 0b0001, etc. R24 is encoded as form the branch target address. 0b1000, R25 as 0b1001, etc. BI16 (6:7), BI32 (12:15) RZ (8:11) Field used to specify one of the Condition Field used to specify a General Purpose Reg- Register fields to be used as a condition of a ister in the ranges R0:R7 or R24:R31 to be Branch Conditional instruction. used as a source or as a destination for load/ BO16 (5), BO32 (10:11) store data. R0 is encoded as 0b0000, R1 as 0b0001, etc. R24 is encoded as 0b1000, R25 Field used to specify whether to branch if the as 0b1001, etc. condition is true, false, or to decrement the Count Register and branch if the Count Regis- SCL (22:23) ter is not zero in a Branch Conditional instruc- Field used to specify a scale amount in Imme- tion. diate instructions using the SCI8-form. 
Scaling involves left shifting by 0, 8, 16, or 24 bits. BF32 (9:10) Field used to specify one of the Condition SD4 (4:7) Register fields to be used as a target of a Used by 16-bit load and store class instruc- compare instruction. tions. The SD4 field is a 4-bit unsigned imme- diate value zero-extended to 64 bits, shifted D8 (24:31) left according to the size of the operation, and The D8 field is a 8-bit signed displacement then added to the base register to form a 64- which is sign-extended to 64 bits. bit EA. For byte operations, no shift is per- F (21) Fill value used to fill the remaining 56 bits of a formed. For half-word operations, the immedi- scaled-immediate 8 value. ate is shifted left one bit (concatenated with LI20 (17:20 || 11:15 || 21:31) 0b0). For word operations, the immediate is A 20-bit signed immediate value which is sign- shifted left two bits (concatenated with extended to 64 bits for the e_li instruction. 0b00).SI (6:10 || 21:31, 11:15 || 21:31) A 16-bit signed immediate value sign- LK (7, 16, 31) extended to 64 bits and used as one operand LINK bit. of the instruction. 0 Do not set the Link Register. 666 Power ISATM -- Book VLE Version 2.04 UI (6:10 || 21:31, 11:15 || 21:31) A 16-bit unsigned immediate value zero- extended to 64 bits or padded with 16 zeros and used as one operand of the instruction. The instruction encoding differs between the I16A and I16L instruction formats as shown in Section 1.4.12 and Section 1.4.13. UI5 (7:11) Immediate field used to specify a 5-bit unsigned fixed-point value. UI7 (5:11) Immediate field used to specify a 7-bit unsigned fixed-point value. UI8 (24:31) Immediate field used to specify an 8-bit unsigned fixed-point value. XO (6, 6:7, 6:10, 6:11, 16, 16:19,16:23) Extended opcode field. Assembler Note For scaled immediate instructions using the SCI8- form, the instruction assembly syntax requires a single immediate value, sci8, that the assembler will synthesize into the appropriate F, SCL, and UI8 fields. 
If the F, SCL, and UI8 fields cannot be formed correctly from the given sci8 value, the assembler will flag the assembly instruction as an error.

Chapter 2. VLE Storage Addressing

    2.1 Data Storage Addressing Modes
    2.2 Instruction Storage Addressing Modes
    2.2.1 Misaligned, Mismatched, and Byte Ordering Instruction Storage Exceptions
    2.2.2 VLE Exception Syndrome Bits

A program references memory using the effective address (EA) computed by the processor when it executes a Storage Access or Branch instruction (or certain other instructions described in Book II and Book III-E), or when it fetches the next sequential instruction.

2.1 Data Storage Addressing Modes

Table 1 lists data storage addressing modes supported by the VLE category.

Table 1: Data Storage Addressing Modes

Mode                          Form      Description
----------------------------  --------  ------------------------------------------
Base+16-bit displacement      D-form    The 16-bit D field is sign-extended and
(32-bit instruction format)             added to the contents of the GPR
                                        designated by RA, or to zero if RA = 0,
                                        to produce the EA.

Base+8-bit displacement       D8-form   The 8-bit D8 field is sign-extended and
(32-bit instruction format)             added to the contents of the GPR
                                        designated by RA, or to zero if RA = 0,
                                        to produce the EA.

Base+scaled 4-bit             SD4-form  The 4-bit SD4 field is zero-extended,
displacement                            scaled (shifted left) according to the
(16-bit instruction format)             size of the operand, and added to the
                                        contents of the GPR designated by RX to
                                        produce the EA. (Note that RX = 0 is not
                                        a special case.)

Base+Index                    X-form    The GPR contents designated by RB are
(32-bit instruction format)             added to the GPR contents designated by
                                        RA, or to zero if RA = 0, to produce the
                                        EA.

2.2 Instruction Storage Addressing Modes

Table 2 lists instruction storage addressing modes supported by the VLE category.
Table 2: Instruction Storage Addressing Modes

Mode                              Description
--------------------------------  ------------------------------------------------
Taken BD24-form Branch            The 24-bit BD24 field is concatenated on the
instructions                      right with 0b0, sign-extended, and then added
(32-bit instruction format)       to the address of the branch instruction.

Taken BD15-form Branch            The 15-bit BD15 field is concatenated on the
instructions                      right with 0b0, sign-extended, and then added
(32-bit instruction format)       to the address of the branch instruction to
                                  form the EA of the next instruction.

Taken BD8-form Branch             The 8-bit BD8 field is concatenated on the
instructions                      right with 0b0, sign-extended, and then added
(16-bit instruction format)       to the address of the branch instruction to
                                  form the EA of the next instruction.

Sequential instruction fetching   The value 4 [2] is added to the address of the
(or non-taken branch              current 32-bit [16-bit] instruction to form the
instructions)                     EA of the next instruction. If the address of
                                  the current instruction is
                                  0xFFFF_FFFF_FFFF_FFFC [0xFFFF_FFFF_FFFF_FFFE]
                                  in 64-bit mode or 0xFFFF_FFFC [0xFFFF_FFFE] in
                                  32-bit mode, the address of the next sequential
                                  instruction is undefined.

Any Branch instruction with       The value 4 is added to the address of the
LK = 1                            current branch instruction and the result is
(32-bit instruction format)       placed into the LR. If the address of the
                                  current instruction is 0xFFFF_FFFF_FFFF_FFFC in
                                  64-bit mode or 0xFFFF_FFFC in 32-bit mode, the
                                  result placed into the LR is undefined.

Branch se_bl, se_blrl, se_bctrl   The value 2 is added to the address of the
instructions                      current branch instruction and the result is
(16-bit instruction format)       placed into the LR. If the address of the
                                  current instruction is 0xFFFF_FFFF_FFFF_FFFE in
                                  64-bit mode or 0xFFFF_FFFE in 32-bit mode, the
                                  result placed into the LR is undefined.
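The address arithmetic in Table 2 can be modelled in a few lines of Python. This is an illustrative sketch only and is not part of the architecture. The function names (sign_extend, branch_target, next_sequential) are hypothetical, and returning None to stand for an "undefined" result is an assumption made for this example.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def sign_extend(value, bits):
    """Return the low `bits` bits of `value` read as a two's complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def branch_target(cia, bd, bd_bits, mode64=True):
    """EA of a taken branch: CIA + EXTS(BD || 0b0).

    bd_bits is 8, 15, or 24 for the BD8, BD15, and BD24 fields.
    """
    # Appending 0b0 on the right and sign-extending equals doubling the
    # signed displacement.
    displacement = sign_extend(bd, bd_bits) << 1
    ea = (cia + displacement) & MASK64
    # In 32-bit mode the high-order 32 bits of the target are set to 0.
    return ea if mode64 else ea & MASK32


def next_sequential(cia, length, mode64=True):
    """EA of the instruction following CIA; length is 2 (16-bit) or 4 (32-bit).

    The same value is what a branch with LK = 1 places into the LR.
    Assumption for this sketch: any wrap past the top of the address space
    is reported as None, standing in for the "undefined" cases of Table 2.
    """
    limit = MASK64 if mode64 else MASK32
    if cia + length > limit:
        return None
    return cia + length
```

For example, branch_target(0x1000, 0xFF, 8) models a 16-bit branch at address 0x1000 whose BD8 field is 0xFF (a displacement of -2 bytes) and yields 0xFFE, while next_sequential(0xFFFF_FFFC, 4, mode64=False) returns None because Table 2 leaves that case undefined.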
2.2.1 Misaligned, Mismatched, and Byte Ordering Instruction Storage Exceptions

A Misaligned Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that is not 32-bit aligned and the VLE storage attribute is not set for the page that corresponds to the effective address of the instruction. The attempted execution can be the result of a Branch instruction which has bit 62 of the target address set to 1 or the result of an rfi, se_rfi, rfci, se_rfci, rfdi, se_rfdi, rfmci, or se_rfmci instruction which has bit 62 set in SRR0, SRR0, CSRR0, CSRR0, DSRR0, DSRR0, MCSRR0, or MCSRR0 respectively. If a Misaligned Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur, setting SRR0 to the misaligned address for which execution was attempted.

A Mismatched Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that crosses a page boundary for which the first page has the VLE storage attribute set to 1 and the second page has the VLE storage attribute bit set to 0. If a Mismatched Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur, setting SRR0 to the misaligned address for which execution was attempted.

A Byte Ordering Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that has the VLE storage attribute set to 1 and the E (Endian) storage attribute set to 1 for the page that corresponds to the effective address of the instruction. If a Byte Ordering Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur, setting SRR0 to the address for which execution was attempted.

2.2.2 VLE Exception Syndrome Bits

Two bits in the Exception Syndrome Register (ESR) (see Section 5.2.9 of Book III-E) are provided to facilitate VLE exception handling, VLEMI and MIF.

ESR[VLEMI] is set when an exception and subsequent interrupt is caused by the execution or attempted execution of an instruction that resides in memory with the VLE storage attribute set.

ESR[MIF] is set when an Instruction Storage Interrupt is caused by a Misaligned Instruction Storage Exception or when an Instruction TLB Error Interrupt was caused by a TLB miss on the second half of a misaligned 32-bit instruction.

ESR[BO] is set when an Instruction Storage Interrupt is caused by a Mismatched Instruction Storage Exception or a Byte Ordering Instruction Storage Exception.

Programming Note

When an Instruction TLB Error Interrupt occurs as the result of an Instruction TLB miss on the second half of a 32-bit VLE instruction that is aligned to only 16 bits, SRR0 will point to the first half of the instruction and ESR[MIF] will be set to 1. Any other status posted as a result of the TLB miss (such as MAS register updates described in TYPE-FSL Memory Management) will reflect the page corresponding to the second half of the instruction which caused the Instruction TLB miss.

Chapter 3. VLE Compatibility with Books I-III

    3.1 Overview
    3.2 VLE Processor and Storage Control Extensions
    3.2.1 Instruction Extensions
    3.2.2 MMU Extensions
    3.3 VLE Limitations

This chapter addresses the relationship between VLE and Books I-III.

3.1 Overview

Category VLE uses the same semantics as Books I-III. Due to the limited instruction encoding formats, VLE instructions typically support reduced immediate fields and displacements, and not all operations defined by Books I-III are encoded in category VLE. The basic philosophy is to capture all useful operations, with most frequent operations given priority. Immediate fields and displacements are provided to cover the majority of ranges encountered in embedded control code. Instructions are encoded in either a 16- or 32-bit format, and these may be freely intermixed.

VLE instructions cannot access floating-point registers (FPRs). VLE instructions use GPRs and SPRs with the following limitations:

- VLE instructions using the 16-bit formats are limited to addressing GPR0-GPR7 and GPR24-GPR31 in most instructions. Move instructions are provided to transfer register contents between these registers and GPR8-GPR23.
- VLE compare and bit test instructions using the 16-bit formats implicitly set their results in CR0.

VLE instruction encodings are generally different than instructions defined by Books I-III, except that most instructions falling within primary opcode 31 are encoded identically and have identical semantics unless they affect or access a resource not supported by category VLE.

3.2 VLE Processor and Storage Control Extensions

This section describes additional functionality to support category VLE.

3.2.1 Instruction Extensions

This section describes extensions to support VLE operations. Because instructions may reside on a half-word boundary, bit 62 is not masked by instructions that read an instruction address from a register, such as the LR, CTR, or a save/restore register 0, that holds an instruction address.

The instruction set defined by Books I-III is modified to support halfword instruction addressing, as follows:

- Return From Interrupt instructions, such as rfi, rfci, rfdi, and rfmci, no longer mask bit 62 of the respective save/restore register 0. The destination address is SRR0[0:62] || 0b0, CSRR0[0:62] || 0b0, DSRR0[0:62] || 0b0, or MCSRR0[0:62] || 0b0 respectively.
- bclr, bclrl, bcctr, and bcctrl no longer mask bit 62 of the LR or CTR. The destination address is LR[0:62] || 0b0 or CTR[0:62] || 0b0.

3.2.2 MMU Extensions

VLE operation is indicated by the VLE storage attribute. When the VLE storage attribute for a page is set to 1, instruction fetches from that page are decoded and processed as VLE instructions. See Section 4.8.3 of Book III-E.

When instructions are executing from a page that has the VLE storage attribute set to 1, the processor is said to be in VLE mode.

3.3 VLE Limitations

VLE instruction fetches are valid only when performed in a Big-Endian mode. Attempting to fetch an instruction in a Little-Endian mode from a page with the VLE storage attribute set causes an Instruction Storage Byte-ordering exception.

Support for concurrent modification and execution of VLE instructions is implementation-dependent.

Chapter 4. Branch Operation Instructions

    4.1 Branch Processor Registers
    4.1.1 Condition Register (CR)
    4.1.1.1 Condition Register Setting for Compare Instructions
    4.1.1.2 Condition Register Setting for the Bit Test Instruction
    4.1.2 Link Register (LR)
    4.1.3 Count Register (CTR)
    4.2 Branch Instructions
    4.3 System Linkage Instructions
    4.4 Condition Register Instructions

This section defines Branch instructions that can be executed when a processor is in VLE mode and the registers that support them.

4.1 Branch Processor Registers

The registers that support branch operations are:

- Section 4.1.1, "Condition Register (CR)"
- Section 4.1.2, "Link Register (LR)"
- Section 4.1.3, "Count Register (CTR)"

4.1.1 Condition Register (CR)

The Condition Register (CR) is a 32-bit register which reflects the result of certain operations, and provides a mechanism for testing (and branching). The CR is more fully defined in Book I.

Category VLE uses the entire CR, but some comparison operations and all Branch instructions are limited to using CR0-CR3. The full Book I condition register field and logical operations are provided however.

    CR (bits 32 to 63)

Figure 17. Condition Register

The bits in the Condition Register are grouped into eight 4-bit fields, CR Field 0 (CR0) ... CR Field 7 (CR7), which are set by VLE defined instructions in one of the following ways.

- Specified fields of the condition register can be set by a move to the CR from a GPR (mtcrf, mtocrf).
- A specified CR field can be set by a move to the CR from another CR field (e_mcrf) or from XER[32:35] (mcrxr).
- CR field 0 can be set as the implicit result of a fixed-point instruction.
- A specified CR field can be set as the result of a fixed-point compare instruction.
- CR field 0 can be set as the result of a fixed-point bit test instruction.

Other instructions from implemented categories may also set bits in the CR in the same manner that they would when not in VLE mode.

Instructions are provided to perform logical operations on individual CR bits and to test individual CR bits.

For all fixed-point instructions in which the Rc bit is defined and set, and for e_add2i., e_and2i., and e_and2is., the first three bits of CR field 0 (CR[32:34]) are set by signed comparison of the result to zero, and the fourth bit of CR field 0 (CR[35]) is copied from the final state of XER[SO]. "Result" here refers to the entire 64-bit value placed into the target register in 64-bit mode, and to bits 32:63 of the value placed into the target register in 32-bit mode.

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if (target_register)[M:63] < 0 then c ← 0b100
    else if (target_register)[M:63] > 0 then c ← 0b010
    else c ← 0b001
    CR0 ← c || XER[SO]

If any portion of the result is undefined, the value placed into the first three bits of CR field 0 is undefined.

The bits of CR field 0 are interpreted as shown below.

CR Bit  Description
32      Negative (LT)
        The result is negative.
33      Positive (GT)
        The result is positive.
34      Zero (EQ)
        The result is 0.
35      Summary overflow (SO)
        This is a copy of the contents of XER[SO] at the completion of the instruction.

4.1.1.1 Condition Register Setting for Compare Instructions

For compare instructions, a CR field specified by the BF operand for the e_cmph, e_cmphl, e_cmpi, and e_cmpli instructions, or CR0 for the se_cmpl, e_cmp16i, e_cmph16i, e_cmphl16i, e_cmpl16i, se_cmp, se_cmph, se_cmphl, se_cmpi, and se_cmpli instructions, is set to reflect the result of the comparison. The CR field bits are interpreted as shown below. A complete description of how the bits are set is given in the instruction descriptions and Section 5.6, "Fixed-Point Compare and Bit Test Instructions".

Condition register bit settings for compare instructions are interpreted as follows. (Note: the e_cmpi and e_cmpli instructions have a BF32 field instead of a BF field; for these instructions, BF32 should be substituted for BF in the list below.)

CR Bit     Description
4×BF + 32  Less Than (LT)
           For signed fixed-point compare, (RA) or (RX) < sci8, SI, (RB), or (RY).
           For unsigned fixed-point compare, (RA) or (RX) <u sci8, UI, UI5, (RB), or (RY).
4×BF + 33  Greater Than (GT)
           For signed fixed-point compare, (RA) or (RX) > sci8, SI, (RB), or (RY).
           For unsigned fixed-point compare, (RA) or (RX) >u sci8, UI, UI5, (RB), or (RY).
4×BF + 34  Equal (EQ)
           For fixed-point compare, (RA) or (RX) = sci8, UI, UI5, SI, (RB), or (RY).
4×BF + 35  Summary Overflow (SO)
           For fixed-point compare, this is a copy of the contents of XER[SO] at the completion of the instruction.

4.1.1.2 Condition Register Setting for the Bit Test Instruction

The Bit Test Immediate instruction, se_btsti, also sets CR field 0. See the instruction description and also Section 5.6, "Fixed-Point Compare and Bit Test Instructions".

4.1.2 Link Register (LR)

VLE instructions use the Link Register (LR) as defined in Book I, although category VLE defines a subset of all variants of Book I conditional branches involving the LR.

4.1.3 Count Register (CTR)

VLE instructions use the Count Register (CTR) as defined in Book I, although category VLE defines a subset of the variants of Book I conditional branches involving the CTR.

4.2 Branch Instructions

The sequence of instruction execution can be changed by the branch instructions. Because VLE instructions must be aligned on half-word boundaries, the low-order bit of the generated branch target address is forced to 0 by the processor in performing the branch.

The branch instructions compute the EA of the target in one of the following ways, as described in Section 2.2, "Instruction Storage Addressing Modes":

1. Adding a displacement to the address of the branch instruction.
2. Using the address contained in the LR (Branch to Link Register [and Link]).
3. Using the address contained in the CTR (Branch to Count Register [and Link]).

Branching can be conditional or unconditional, and the return address can optionally be provided. If the return address is to be provided (LK = 1), the EA of the instruction following the branch instruction is placed into the LR after the branch target address has been computed; this is done regardless of whether the branch is taken.

In branch conditional instructions, the BI32 or BI16 instruction field specifies the CR bit to be tested. For 32-bit instructions using BI32, CR[32:47] (corresponding to bits in CR0:CR3) may be specified. For 16-bit instructions using BI16, only CR[32:35] (bits within CR0) may be specified.

In branch conditional instructions, the BO32 or BO16 field specifies the conditions under which the branch is taken and how the branch is affected by or affects the CR and CTR. Note that the BO32 and BO16 fields are encoded differently from Book I's BO field.

If the BO32 field specifies that the CTR is to be decremented, in 64-bit mode CTR[0:63] are decremented, and in 32-bit mode CTR[32:63] are decremented. If BO16 or BO32 specifies a condition that must be TRUE or FALSE, that condition is obtained from the contents of CR[BI32+32] or CR[BI16+32]. (Note that CR bits are numbered 32:63. BI32 or BI16 refers to the condition register bit field in the branch instruction encoding. For example, specifying BI32 = 2 refers to CR[34].)

For Figure 18 let M = 0 in 64-bit mode and M = 32 in 32-bit mode.

Encodings for the BO32 field for VLE are shown in Figure 18.

BO32  Description
00    Branch if the condition is false.
01    Branch if the condition is true.
10    Decrement CTR[M:63], then branch if the decremented CTR[M:63] ≠ 0.
11    Decrement CTR[M:63], then branch if the decremented CTR[M:63] = 0.

Figure 18. BO32 field encodings

Encodings for the BO16 field for VLE are shown in Figure 19.

BO16  Description
0     Branch if the condition is false.
1     Branch if the condition is true.

Figure 19. BO16 field encodings

Branch [and Link] BD24-form

    e_b   target_addr    (LK=0)
    e_bl  target_addr    (LK=1)

    Bit positions: 0 6 7 31
    30 | 0 | BD24 | LK

    NIA ←iea CIA + EXTS(BD24 || 0b0)
    if LK then LR ←iea CIA + 4

target_addr specifies the branch target address.

The branch target address is the sum of BD24 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

Branch Conditional [and Link] BD15-form

    e_bc   BO32,BI32,target_addr    (LK=0)
    e_bcl  BO32,BI32,target_addr    (LK=1)

    Bit positions: 0 6 10 12 16 31
    30 | 8 | BO32 | BI32 | BD15 | LK

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if BO32[0] then CTR[M:63] ← CTR[M:63] - 1
    ctr_ok ← ¬BO32[0] | ((CTR[M:63] ≠ 0) ⊕ BO32[1])
    cond_ok ← BO32[0] | (CR[BI32+32] ≡ BO32[1])
    if ctr_ok & cond_ok then
        NIA ←iea (CIA + EXTS(BD15 || 0b0))
    else
        NIA ←iea CIA + 4
    if LK then LR ←iea CIA + 4

The BI32 field specifies the Condition Register bit to be tested. The BO32 field is used to resolve the branch as described in Figure 18. target_addr specifies the branch target address.

The branch target address is the sum of BD15 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    CTR (if BO32[0]=1)
    LR (if LK=1)

Branch [and Link] BD8-form

    se_b   target_addr    (LK=0)
    se_bl  target_addr    (LK=1)

    Bit positions: 0 6 7 8 15
    58 | 0 | LK | BD8

    NIA ←iea CIA + EXTS(BD8 || 0b0)
    if LK then LR ←iea CIA + 2

target_addr specifies the branch target address.

The branch target address is the sum of BD8 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

Branch Conditional Short Form BD8-form

    se_bc  BO16,BI16,target_addr

    Bit positions: 0 5 6 8 15
    28 | BO16 | BI16 | BD8

    cond_ok ← (CR[BI16+32] ≡ BO16)
    if cond_ok then
        NIA ←iea CIA + EXTS(BD8 || 0b0)
    else
        NIA ←iea CIA + 2

The BI16 field specifies the Condition Register bit to be tested. The BO16 field is used to resolve the branch as described in Figure 19. target_addr specifies the branch target address.

The branch target address is the sum of BD8 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

Special Registers Altered:
    None

Branch to Link Register [and Link] C-form

    se_blr   (LK=0)
    se_blrl  (LK=1)

    Bit positions: 0 15
    02 | LK

    NIA ←iea LR[0:62] || 0b0
    if LK then LR ←iea CIA + 2

The branch target address is LR[0:62] || 0b0 with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

Branch to Count Register [and Link] C-form

    se_bctr   (LK=0)
    se_bctrl  (LK=1)

    Bit positions: 0 15
    03 | LK

    NIA ←iea CTR[0:62] || 0b0
    if LK then LR ←iea CIA + 2

The branch target address is CTR[0:62] || 0b0 with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

4.3 System Linkage Instructions

The System Linkage instructions enable the program to call upon the system to perform a service and provide a means by which the system can return from performing a service or from processing an interrupt. System Linkage instructions defined by the VLE category are identical in semantics to System Linkage instructions defined in Book I and Book III-E with the exception of the LEV field, but are encoded differently.

se_sc provides the same functionality as the Book I (and Book III-E) instruction sc without the LEV field. se_rfi, se_rfci, se_rfdi, and se_rfmci provide the same functionality as the Book III-E instructions rfi, rfci, rfdi, and rfmci respectively.

System Call C-form

    se_sc

    Bit positions: 0 15
    02

    SRR1 ←iea MSR
    SRR0 ← CIA+2
    NIA ←iea IVPR[0:47] || IVOR8[48:59] || 0b0000
    MSR ← new_value (see below)

The effective address of the instruction following the System Call instruction is placed into SRR0. The contents of the MSR are copied into SRR1.

Then a System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 5.6 of Book III-E.

The interrupt causes the next instruction to be fetched from effective address IVPR[0:47] || IVOR8[48:59] || 0b0000.

This instruction is context synchronizing.

Special Registers Altered:
    SRR0 SRR1 MSR

Illegal C-form

    se_illegal

    Bit positions: 0 15
    0

se_illegal is used to request an Illegal Instruction exception.

The behavior is the same as if an illegal instruction was executed.

This instruction is context synchronizing.

Special Registers Altered:
    SRR0 SRR1 MSR ESR

Return From Critical Interrupt C-form

    se_rfci

    Bit positions: 0 15
    09

    MSR ← CSRR1
    NIA ←iea CSRR0[0:62] || 0b0

The se_rfci instruction is used to return from a critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of CSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address CSRR0[0:62] || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 5 of Book III-E) are the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in CSRR0 at the time of the execution of the se_rfci).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Machine Check Interrupt C-form

    se_rfmci

    Bit positions: 0 15
    11

    MSR ← MCSRR1
    NIA ←iea MCSRR0[0:62] || 0b0

The se_rfmci instruction is used to return from a machine check class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of MCSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address MCSRR0[0:62] || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 5 of Book III-E) are the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in MCSRR0 at the time of the execution of the se_rfmci).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Interrupt C-form

    se_rfi

    Bit positions: 0 15
    08

    MSR ← SRR1
    NIA ←iea SRR0[0:62] || 0b0

The se_rfi instruction is used to return from a non-critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of SRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched under control of the new MSR value from the address SRR0[0:62] || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 5 of Book III-E) are the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in SRR0 at the time of the execution of the se_rfi).

This instruction is privileged and context synchronizing.

Return From Debug Interrupt C-form

    se_rfdi

    Bit positions: 0 15
    10

    MSR ← DSRR1
    NIA ←iea DSRR0[0:62] || 0b0

The se_rfdi instruction is used to return from a debug class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of DSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address DSRR0[0:62] || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the save/restore registers by the interrupt processing mechanism (see Chapter 5 of Book III-E) is the address of the instruction that would have been executed next had the interrupt not occurred (that is, the address in DSRR0 at the time of the execution of se_rfdi).
This instruction is privileged and context synchronizing. Special Registers Altered: Special Registers Altered: MSR MSR Corequisite Categories: Embedded.Enhanced Debug 682 Power ISATM -- Book VLE Version 2.04 4.4 Condition Register Instructions Condition Register instructions are provided to transfer does remap the CR-logical and mcrf instruction func- values to and from various portions of the CR. Cate- tionality into primary opcode 31. These instructions gory VLE does not introduce any additional functional- operate identically to the Book I instructions, but are ity beyond that defined in Book I for CR operations, but encoded differently. Condition Register AND XL-form Condition Register AND with Complement XL-form e_crand BT,BA,BB e_crandc BT,BA,BB 31 BT BA BB 257 / 0 6 11 16 21 31 31 BT BA BB 129 / 0 6 11 16 21 31 CRBT+32 1 CRBA+32 & CRBB+32 CRBT+32 1 CRBA+32 & ¬CRBB+32 The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the The bit in the Condition Register specified by BA+32 is Condition Register specified by BT+32. ANDed with the one's complement of the bit in the Con- Special Registers Altered: dition Register specified by BB+32, and the result is CRBT+32 placed into the bit in the Condition Register specified by BT+32. 
Special Registers Altered: CR[BT+32]

Condition Register Equivalent  XL-form

e_creqv BT,BA,BB

Fields:         31 | BT | BA | BB | 289 | /
Bit positions:  0 6 11 16 21 31

    CR[BT+32] ← CR[BA+32] ≡ CR[BB+32]

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered: CR[BT+32]

Condition Register NAND  XL-form

e_crnand BT,BA,BB

Fields:         31 | BT | BA | BB | 225 | /
Bit positions:  0 6 11 16 21 31

    CR[BT+32] ← ¬(CR[BA+32] & CR[BB+32])

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered: CR[BT+32]

Condition Register NOR  XL-form

e_crnor BT,BA,BB

Fields:         31 | BT | BA | BB | 33 | /
Bit positions:  0 6 11 16 21 31

    CR[BT+32] ← ¬(CR[BA+32] | CR[BB+32])

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered: CR[BT+32]

Condition Register OR  XL-form

e_cror BT,BA,BB

Fields:         31 | BT | BA | BB | 449 | /
Bit positions:  0 6 11 16 21 31

    CR[BT+32] ← CR[BA+32] | CR[BB+32]

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered: CR[BT+32]

Condition Register OR with Complement  XL-form

e_crorc BT,BA,BB

Fields:         31 | BT | BA | BB | 417 | /
Bit positions:  0 6 11 16 21 31

    CR[BT+32] ← CR[BA+32] | ¬CR[BB+32]

The bit in the Condition Register specified by BA+32 is ORed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered: CR[BT+32]

Condition Register XOR  XL-form

e_crxor BT,BA,BB

Fields:         31 | BT | BA | BB | 193 | /
Bit positions:  0 6 11 16 21 31

    CR[BT+32] ← CR[BA+32] ⊕ CR[BB+32]

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered: CR[BT+32]

Move CR Field  XL-form

e_mcrf BF,BFA

Fields:         31 | BF | // | BFA | ///// | 16 | /
Bit positions:  0 6 9 11 16 21 31

    CR[4×BF+32:4×BF+35] ← CR[4×BFA+32:4×BFA+35]

The contents of Condition Register field BFA are copied to Condition Register field BF.

Special Registers Altered: CR field BF

Chapter 5. Fixed-Point Instructions

5.1 Fixed-Point Load Instructions
5.2 Fixed-Point Store Instructions
5.3 Fixed-Point Load and Store with Byte Reversal Instructions
5.4 Fixed-Point Load and Store Multiple Instructions
5.5 Fixed-Point Arithmetic Instructions
5.6 Fixed-Point Compare and Bit Test Instructions
5.7 Fixed-Point Trap Instructions
5.8 Fixed-Point Select Instruction
5.9 Fixed-Point Logical, Bit, and Move Instructions
5.10 Fixed-Point Rotate and Shift Instructions
5.11 Move To/From System Register Instructions

This section lists the fixed-point instructions supported by category VLE.

5.1 Fixed-Point Load Instructions

The fixed-point Load instructions compute the effective address (EA) of the memory to be accessed as described in Section 2.1, "Data Storage Addressing Modes".

The byte, halfword, word, or doubleword in storage addressed by EA is loaded into RT or RZ.

Category VLE supports both Big- and Little-Endian byte ordering for data accesses.

Some fixed-point load instructions have an update form in which RA is updated with the EA. For these forms, if RA≠0 and RA≠RT, the EA is placed into RA and the memory element (byte, halfword, word, or doubleword) addressed by EA is loaded into RT. If RA=0 or RA=RT, the instruction form is invalid. This is the same behavior as specified for load with update instructions in Book I.

The fixed-point Load instructions from Book I, lbzx, lbzux, lhzx, lhzux, lwzx, and lwzux are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I. See Section 3.3.2 of Book I for the instruction definitions.

The fixed-point Load instructions from Book I, lwax, lwaux, ldx, and ldux are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I. See Section 3.3.2 of Book I for the instruction definitions.

Load Byte and Zero Short Form  SD4-form

se_lbz RZ,SD4(RX)

Fields:         08 | SD4 | RZ | RX
Bit positions:  0 4 8 12 15

    EA ← (RX) + (⁶⁰0 || SD4)
    RZ ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RX) + SD4. The byte in storage addressed by EA is loaded into RZ[56:63]. RZ[0:55] are set to 0.

Load Byte and Zero  D-form

e_lbz RT,D(RA)

Fields:         12 | RT | RA | D
Bit positions:  0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0) + D. The byte in storage addressed by EA is loaded into RT[56:63].
RT[0:55] are set to 0.

Special Registers Altered (e_lbz, se_lbz): None

Load Byte and Zero with Update  D8-form

e_lbzu RT,D8(RA)

Fields:         06 | RT | RA | 0 | D8
Bit positions:  0 6 11 16 24 31

    EA ← (RA) + EXTS(D8)
    RT ← ⁵⁶0 || MEM(EA, 1)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered: None

Load Halfword Algebraic  D-form

e_lha RT,D(RA)

Fields:         14 | RT | RA | D
Bit positions:  0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered: None

Load Halfword and Zero  D-form

e_lhz RT,D(RA)

Fields:         22 | RT | RA | D
Bit positions:  0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

Special Registers Altered: None

Load Halfword and Zero Short Form  SD4-form

se_lhz RZ,SD4(RX)

Fields:         10 | SD4 | RZ | RX
Bit positions:  0 4 8 12 15

    EA ← (RX) + (⁵⁹0 || SD4 || 0)
    RZ ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RX) + (SD4 || 0). The halfword in storage addressed by EA is loaded into RZ[48:63]. RZ[0:47] are set to 0.

Special Registers Altered: None

Load Halfword Algebraic with Update  D8-form

e_lhau RT,D8(RA)

Fields:         06 | RT | RA | 03 | D8
Bit positions:  0 6 11 16 24 31

    EA ← (RA) + EXTS(D8)
    RT ← EXTS(MEM(EA, 2))
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered: None

Load Halfword and Zero with Update  D8-form

e_lhzu RT,D8(RA)

Fields:         06 | RT | RA | 01 | D8
Bit positions:  0 6 11 16 24 31

    EA ← (RA) + EXTS(D8)
    RT ← ⁴⁸0 || MEM(EA, 2)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

EA is placed into RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered: None

Load Word and Zero  D-form

e_lwz RT,D(RA)

Fields:         20 | RT | RA | D
Bit positions:  0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0) + D. The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

Special Registers Altered: None

Load Word and Zero Short Form  SD4-form

se_lwz RZ,SD4(RX)

Fields:         12 | SD4 | RZ | RX
Bit positions:  0 4 8 12 15

    EA ← (RX) + (⁵⁸0 || SD4 || ²0)
    RZ ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RX) + (SD4 || 00). The word in storage addressed by EA is loaded into RZ[32:63]. RZ[0:31] are set to 0.

Special Registers Altered: None

Load Word and Zero with Update  D8-form

e_lwzu RT,D8(RA)

Fields:         06 | RT | RA | 02 | D8
Bit positions:  0 6 11 16 24 31

    EA ← (RA) + EXTS(D8)
    RT ← ³²0 || MEM(EA, 4)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered: None

5.2 Fixed-Point Store Instructions

The fixed-point Store instructions compute the EA of the memory to be accessed as described in Section 2.1, "Data Storage Addressing Modes".

The contents of register RS or RZ are stored into the byte, halfword, word, or doubleword in storage addressed by EA.

Category VLE supports both Big- and Little-Endian byte ordering for data accesses.

Some fixed-point store instructions have an update form, in which register RA is updated with the effective address. For these forms, the following rules (from Book I) apply.

  • If RA≠0, the effective address is placed into register RA.
  • If RS=RA, the contents of register RS are copied to the target memory element and then EA is placed into register RA (RS).

The fixed-point Store instructions from Book I, stbx, stbux, sthx, sthux, stwx, and stwux are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.3 of Book I for the instruction definitions.

The fixed-point Store instructions from Book I, stdx and stdux are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.3 of Book I for the instruction definitions.

Store Byte  D-form

e_stb RS,D(RA)

Fields:         13 | RS | RA | D
Bit positions:  0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 1) ← (RS)[56:63]

Let the effective address (EA) be the sum (RA|0) + D. (RS)[56:63] are stored in the byte in storage addressed by EA.

Special Registers Altered: None

Store Byte Short Form  SD4-form

se_stb RZ,SD4(RX)

Fields:         09 | SD4 | RZ | RX
Bit positions:  0 4 8 12 15

    EA ← (RX) + (⁶⁰0 || SD4)
    MEM(EA, 1) ← (RZ)[56:63]

Let the effective address (EA) be the sum (RX) + SD4. (RZ)[56:63] are stored in the byte in storage addressed by EA.

Special Registers Altered: None

Store Byte with Update  D8-form

e_stbu RS,D8(RA)

Fields:         06 | RS | RA | 04 | D8
Bit positions:  0 6 11 16 24 31

    EA ← (RA) + EXTS(D8)
    MEM(EA, 1) ← (RS)[56:63]
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. (RS)[56:63] are stored in the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered: None

Store Halfword  D-form

e_sth RS,D(RA)

Fields:         23 | RS | RA | D
Bit positions:  0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 2) ← (RS)[48:63]

Let the effective address (EA) be the sum (RA|0) + D. (RS)[48:63] are stored in the halfword in storage addressed by EA.

Special Registers Altered: None

Store Halfword Short Form  SD4-form

se_sth RZ,SD4(RX)

Fields:         11 | SD4 | RZ | RX
Bit positions:  0 4 8 12 15

    EA ← (RX) + (⁵⁹0 || SD4 || 0)
    MEM(EA, 2) ← (RZ)[48:63]

Let the effective address (EA) be the sum (RX) + (SD4 || 0). (RZ)[48:63] are stored in the halfword in storage addressed by EA.

Special Registers Altered: None

Store Halfword with Update  D8-form

e_sthu RS,D8(RA)

Fields:         06 | RS | RA | 05 | D8
Bit positions:  0 6 11 16 24 31

    EA ← (RA) + EXTS(D8)
    MEM(EA, 2) ← (RS)[48:63]
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. (RS)[48:63] are stored in the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered: None

Store Word  D-form

e_stw RS,D(RA)

Fields:         21 | RS | RA | D
Bit positions:  0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 4) ← (RS)[32:63]

Let the effective address (EA) be the sum (RA|0) + D. (RS)[32:63] are stored in the word in storage addressed by EA.

Special Registers Altered: None

Store Word Short Form  SD4-form

se_stw RZ,SD4(RX)

Fields:         13 | SD4 | RZ | RX
Bit positions:  0 4 8 12 15

    EA ← (RX) + (⁵⁸0 || SD4 || ²0)
    MEM(EA, 4) ← (RZ)[32:63]

Let the effective address (EA) be the sum (RX) + (SD4 || 00). (RZ)[32:63] are stored in the word in storage addressed by EA.

Special Registers Altered: None

Store Word with Update  D8-form

e_stwu RS,D8(RA)

Fields:         06 | RS | RA | 06 | D8
Bit positions:  0 6 11 16 24 31

    EA ← (RA) + EXTS(D8)
    MEM(EA, 4) ← (RS)[32:63]
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. (RS)[32:63] are stored in the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered: None
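The effective-address rules used by the load and store forms in Sections 5.1 and 5.2 can be summarized in a short Python sketch. It is an illustrative model only, not part of the architecture definition: the function names (`exts`, `ea_d_form`, `ea_sd4_form`, `ea_d8_update`), the `gpr` list standing in for the register file, and the decision to model 64-bit mode only are assumptions made for this example. The SD4 displacement is treated as zero-extended and scaled by the access size, following the se_lbz, se_lhz, and se_lwz definitions, and the register numbers passed in are assumed to be already-resolved GPR indices.

```python
MASK64 = (1 << 64) - 1  # effective addresses are modeled as 64-bit values


def exts(value, bits):
    """Sign-extend a `bits`-wide field (hypothetical helper for EXTS)."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def ea_d_form(gpr, ra, d):
    """D-form (e_lbz, e_stw, ...): EA = (RA|0) + EXTS(D), D is 16 bits."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + exts(d, 16)) & MASK64


def ea_sd4_form(gpr, rx, sd4, size):
    """SD4-form: EA = (RX) + zero-extended SD4 scaled by size (1, 2 or 4).

    Scaling by 2 or 4 matches SD4 || 0 and SD4 || 00 in the RTL.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    return (gpr[rx] + (sd4 & 0xF) * size) & MASK64


def ea_d8_update(gpr, ra, d8, rt=None):
    """D8 update form: EA = (RA) + EXTS(D8), then RA = EA.

    RA=0 is an invalid form; for loads (rt given), RA=RT is also invalid.
    """
    if ra == 0 or (rt is not None and ra == rt):
        raise ValueError("invalid instruction form")
    ea = (gpr[ra] + exts(d8, 8)) & MASK64
    gpr[ra] = ea
    return ea


gpr = [0] * 32
gpr[3] = 0x1000
print(hex(ea_d_form(gpr, 3, 0xFFFC)))         # D = -4: prints 0xffc
print(hex(ea_sd4_form(gpr, 3, 5, 4)))         # word access, SD4 = 5: prints 0x1014
print(hex(ea_d8_update(gpr, 3, 0xF8, rt=4)))  # D8 = -8: prints 0xff8, GPR 3 updated
```

Raising `ValueError` for the invalid forms is a modeling choice for the sketch; the architecture only states that such instruction forms are invalid.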
5.3 Fixed-Point Load and Store with Byte Reversal Instructions

The fixed-point Load with Byte Reversal and Store with Byte Reversal instructions from Book I, lhbrx, lwbrx, sthbrx, and stwbrx are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I. See Section 3.3.4 of Book I for the instruction definitions.

5.4 Fixed-Point Load and Store Multiple Instructions

The Load/Store Multiple instructions have preferred forms; see Section 1.8.1 of Book I. In the preferred forms storage alignment satisfies the following rule.

  • The combination of the EA and RT (RS) is such that the low-order byte of GPR 31 is loaded (stored) from (into) the last byte of an aligned quadword in storage.

Load Multiple Word  D8-form

e_lmw RT,D8(RA)

Fields:         06 | RT | RA | 08 | D8
Bit positions:  0 6 11 16 24 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D8)
    r ← RT
    do while r ≤ 31
        GPR(r) ← ³²0 || MEM(EA,4)
        r ← r + 1
        EA ← EA + 4

Let n = (32-RT). Let the effective address (EA) be the sum (RA|0) + D8.

n consecutive words starting at EA are loaded into the low-order 32 bits of GPRs RT through 31. The high-order 32 bits of these GPRs are set to zero.

If RA is in the range of registers to be loaded, including the case in which RA = 0, the instruction form is invalid.

Special Registers Altered: None

Store Multiple Word  D8-form

e_stmw RS,D8(RA)

Fields:         06 | RS | RA | 9 | D8
Bit positions:  0 6 11 16 24 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D8)
    r ← RS
    do while r ≤ 31
        MEM(EA,4) ← GPR(r)[32:63]
        r ← r + 1
        EA ← EA + 4

Let n = (32-RS). Let the effective address (EA) be the sum (RA|0) + D8.

n consecutive words starting at EA are stored from the low-order 32 bits of GPRs RS through 31.

Special Registers Altered: None

5.5 Fixed-Point Arithmetic Instructions

The fixed-point Arithmetic instructions use the contents of the GPRs as source operands, and place results into GPRs, into status bits in the XER and into CR0.

The fixed-point Arithmetic instructions treat source operands as signed integers unless the instruction is explicitly identified as performing an unsigned operation.

The e_add2i. instruction and other Arithmetic instructions with Rc=1 set the first three bits of CR0 to characterize the result placed into the target register. In 64-bit mode, these bits are set by signed comparison of the result to 0. In 32-bit mode, these bits are set by signed comparison of the low-order 32 bits of the result to zero.

e_addic[.] and e_subfic[.] always set CA to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32-bit mode.

The fixed-point Arithmetic instructions from Book I, add[.], addo[.], addc[.], addco[.], adde[.], addeo[.], addme[.], addmeo[.], addze[.], addzeo[.], divw[.], divwo[.], divwu[.], divwuo[.], mulhw[.], mulhwu[.], mullw[.], mullwo[.], neg[.], nego[.], subf[.], subfo[.], subfe[.], subfeo[.], subfme[.], subfmeo[.], subfze[.], subfzeo[.], subfc[.], and subfco[.] are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.8 of Book I for the instruction definitions.

The fixed-point Arithmetic instructions from Book I, mulld[.], mulldo[.], mulhd[.], mulhdu[.], divd[.], divdo[.], divdu[.], and divduo[.] are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.8 of Book I for the instruction definitions.

Add Short Form  RR-form

se_add RX,RY

Fields:         01 | 0 | RY | RX
Bit positions:  0 6 8 12 15

    RX ← (RX) + (RY)

The sum (RX) + (RY) is placed into register RX.

Special Registers Altered: None

Add Immediate  D-form

e_add16i RT,RA,SI

Fields:         07 | RT | RA | SI
Bit positions:  0 6 11 16 31

    RT ← (RA) + EXTS(SI)

The sum (RA) + SI is placed into register RT.

Special Registers Altered: None

Add (2 operand) Immediate and Record  I16A-form

e_add2i. RA,si

Fields:         28 | si | RA | 17 | si
Bit positions:  0 6 11 16 21 31

    RA ← (RA) + EXTS(si)

The sum (RA) + si is placed into register RA.

Special Registers Altered: CR0

Add (2 operand) Immediate Shifted  I16A-form

e_add2is RA,si

Fields:         28 | si | RA | 18 | si
Bit positions:  0 6 11 16 21 31

    RA ← (RA) + EXTS(si || ¹⁶0)

The sum (RA) + (si || 0x0000) is placed into register RA.

Special Registers Altered: None

Add Scaled Immediate  SCI8-form

e_addi RT,RA,sci8     (Rc=0)
e_addi. RT,RA,sci8    (Rc=1)

Fields:         06 | RT | RA | 8 | Rc | F | SCL | UI8
Bit positions:  0 6 11 16 20 21 22 24 31

    sci8 ← ⁵⁶⁻ˢᶜᴸˣ⁸F || UI8 || ˢᶜᴸˣ⁸F
    RT ← (RA) + sci8

The sum (RA) + sci8 is placed into register RT.

Special Registers Altered: CR0 (if Rc=1)

Add Immediate Short Form  OIM5-form

se_addi RX,oimm

Fields:         08 | 0 | OIM5 | RX
Bit positions:  0 6 7 12 15

    oimm ← (⁵⁹0 || OIM5) + 1
    RX ← (RX) + oimm

The sum (RX) + oimm is placed into RX. The value of oimm must be in the range of 1 to 32.

Special Registers Altered: None

Add Scaled Immediate Carrying  SCI8-form

e_addic RT,RA,sci8     (Rc=0)
e_addic. RT,RA,sci8    (Rc=1)

Fields:         06 | RT | RA | 9 | Rc | F | SCL | UI8
Bit positions:  0 6 11 16 20 21 22 24 31

    sci8 ← ⁵⁶⁻ˢᶜᴸˣ⁸F || UI8 || ˢᶜᴸˣ⁸F
    RT ← (RA) + sci8

The sum (RA) + sci8 is placed into register RT.

Special Registers Altered: CR0 (if Rc=1), CA

Subtract  RR-form

se_sub RX,RY

Fields:         01 | 2 | RY | RX
Bit positions:  0 6 8 12 15

    RX ← (RX) + ¬(RY) + 1

The sum (RX) + ¬(RY) + 1 is placed into register RX.

Special Registers Altered: None

Subtract From Short Form  RR-form

se_subf RX,RY

Fields:         01 | 3 | RY | RX
Bit positions:  0 6 8 12 15

    RX ← ¬(RX) + (RY) + 1

The sum ¬(RX) + (RY) + 1 is placed into register RX.

Special Registers Altered: None

Subtract From Scaled Immediate Carrying  SCI8-form

e_subfic RT,RA,sci8     (Rc=0)
e_subfic. RT,RA,sci8    (Rc=1)

Fields:         06 | RT | RA | 11 | Rc | F | SCL | UI8
Bit positions:  0 6 11 16 20 21 22 24 31

    sci8 ← ⁵⁶⁻ˢᶜᴸˣ⁸F || UI8 || ˢᶜᴸˣ⁸F
    RT ← ¬(RA) + sci8 + 1

The sum ¬(RA) + sci8 + 1 is placed into register RT.

Special Registers Altered: CR0 (if Rc=1), CA

Subtract Immediate  OIM5-form

se_subi RX,oimm     (Rc=0)
se_subi. RX,oimm    (Rc=1)

Fields:         09 | Rc | OIM5 | RX
Bit positions:  0 6 7 12 15

    oimm ← (⁵⁹0 || OIM5) + 1
    RX ← (RX) + ¬oimm + 1

The sum (RX) + ¬oimm + 1 is placed into register RX. The value of oimm must be in the range 1 to 32.

Special Registers Altered: CR0 (if Rc=1)

Multiply Low Scaled Immediate  SCI8-form

e_mulli RT,RA,sci8

Fields:         06 | RT | RA | 20 | F | SCL | UI8
Bit positions:  0 6 11 16 21 22 24 31

    sci8 ← ⁵⁶⁻ˢᶜᴸˣ⁸F || UI8 || ˢᶜᴸˣ⁸F
    prod[0:127] ← (RA) × sci8
    RT ← prod[64:127]

The 64-bit first operand is (RA). The 64-bit second operand is the sci8 operand. The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.

Special Registers Altered: None

Multiply (2 operand) Low Immediate  I16A-form

e_mull2i RA,si

Fields:         28 | si | RA | 20 | si
Bit positions:  0 6 11 16 21 31

    prod[0:127] ← (RA) × EXTS(si)
    RA ← prod[64:127]

The 64-bit first operand is (RA). The 64-bit second operand is the sign-extended value of the si operand. The low-order 64 bits of the 128-bit product of the operands are placed into register RA.

Both operands and the product are interpreted as signed integers.

Special Registers Altered: None

Multiply Low Word Short Form  RR-form

se_mullw RX,RY

Fields:         01 | 1 | RY | RX
Bit positions:  0 6 8 12 15

    RX ← (RX)[32:63] × (RY)[32:63]

The 32-bit operands are the low-order 32 bits of RX and of RY. The 64-bit product of the operands is placed into register RX.

Both operands and the product are interpreted as signed integers.

Negate Short Form  R-form

se_neg RX

Fields:         0 | 03 | RX
Bit positions:  0 6 12 15

    RX ← ¬(RX) + 1

The sum ¬(RX) + 1 is placed into register RX.

If the processor is in 64-bit mode and register RX contains the most negative 64-bit number (0x8000_0000_0000_0000), the result is the most negative 64-bit number.
Similarly, if the processor is in 32- bit mode and register RX contains the most negative Special Registers Altered: 32-bit number (0x8000_0000), the result is the most None negative 32-bit number. Special Registers Altered: None 696 Power ISATM -- Book VLE Version 2.04 5.6 Fixed-Point Compare and Bit Test Instructions The fixed-point Compare instructions compare the con- The fixed-point Bit Test instruction tests the bit specified tents of register RA or register RX with one of the fol- by the UI5 instruction field and sets the CR0 field as fol- lowing: lows. 1 The value of the scaled immediate field sci8 . formed from the F, UI8, and SCL fields as: Bit Name Description sci8 1 56-SCL×8F || UI8 ||SCL×8F 0 LT Always set to 0 1 The zero-extended value of the UI field 1 GT RXui5 = 1 1 The zero-extended value of the UI5 field 2 EQ RXui5 = 0 1 The sign-extended value of the SI field 3 SO Summary overflow from the XER 1 The contents of register RB or register RY. The following comparisons are signed: e_cmph, The fixed-point Compare instructions from Book I, cmp e_cmpi, e_cmp16i, e_cmph16i, se_cmp, se_cmph, and cmpl are available while executing in VLE mode. and se_cmpi. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see The following comparisons are unsigned: e_cmphl, Section 3.3.9 of Book I for the instruction definitions. e_cmpli, e_cmphl16i, e_cmpl16i, se_cmpli, se_cmpl, and se_cmphl. Bit Test Immediate IM5-form Compare Immediate Word I16A-form se_btsti RX,UI5 e_cmp16i RA,si 25 1 UI5 RX 28 si RA 19 si 0 6 7 12 15 0 6 11 16 21 31 a 1 UI5 b 1 EXTS(si) b 1 a+320 || 1 || 31-a0 if (RA)32:63 < b32:63 then c 1 0b100 c 1 (RX) & b if (RA)32:63 > b32:63 then c 1 0b010 if c = 0 then d 1 0b001 else d 1 0b010 if (RA)32:63 = b32:63 then c 1 0b001 CR0 1 d || XERSO CR0 1 c || XERSO Bit UI5+32 of register RX is tested for equality to '0' and The low-order 32 bits of register RA are compared with the result is recorded in CR0. 
EQ is set if the tested bit si, treating operands as signed integers. The result of is 0, LT is cleared, and GT is set to the inverse value of the comparison is placed into CR0. EQ. Special Registers Altered: Special Registers Altered: CR0 CR0 Chapter 5. Fixed-Point Instructions 697 Version 2.04 Compare Scaled Immediate Word Compare Word RR-form SCI8-form se_cmp RX,RY e_cmpi BF32,RA,sci8 3 0 RY RX 06 000 BF32 RA 21 F SCL UI8 0 6 8 12 15 0 6 9 11 16 21 22 24 31 if (RX)32:63 < (RY)32:63 then c 1 0b100 sci8 1 56-SCL×8F || UI8 ||SCL×8F if (RX)32:63 > (RY)32:63 then c 1 0b010 if (RA)32:63 < sci832:63 then c 1 0b100 if (RX)32:63 = (RY)32:63 then c 1 0b001 if (RA)32:63 > sci832:63 then c 1 0b010 CR0 1 c || XERSO if (RA)32:63 = sci832:63 then c 1 0b001 The low-order 32 bits of register RX are compared with CR4×BF32+32:4×BF32+35 1 c || XERSO the low-order 32 bits of register RY, treating operands The low-order 32 bits of register RA are compared with as signed integers. The result of the comparison is sci8, treating operands as signed integers. The result of placed into CR0. the comparison is placed into CR field BF32. Special Registers Altered: Special Registers Altered: CR0 CR field BF32 Compare Immediate Word Short Form Compare Logical Immediate Word IM5-form I16A-form se_cmpi RX,UI5 e_cmpl16i RA,ui 10 1 UI5 RX 28 ui RA 21 ui 0 6 7 12 15 0 6 11 16 21 31 b 1 590 || UI5 b 1 480 || ui if (RX)32:63 < b32:63 then c 1 0b100 if (RA)32:63 b32:63 then c 1 0b010 if (RA)32:63 >u b32:63 then c 1 0b010 if (RX)32:63 = b32:63 then c 1 0b001 if (RA)32:63 = b32:63 then c 1 0b001 CR0 1 c || XERSO CR0 1 c || XERSO The low-order 32 bits of register RX are compared with The low-order 32 bits of register RA are compared with UI5, treating operands as signed integers. The result of ui, treating operands as unsigned integers. The result the comparison is placed into CR0. of the comparison is placed into CR0. 
Special Registers Altered: Special Registers Altered: CR0 CR0 698 Power ISATM -- Book VLE Version 2.04 Compare Logical Scaled Immediate Word Compare Logical Word RR-form SCI8-form se_cmpl RX,RY e_cmpli BF32,RA,sci8 3 1 RY RX 06 01 BF32 RA 21 F SCL UI8 0 6 8 12 15 0 6 9 11 16 21 22 24 31 if (RX)32:63 u (RY)32:63 then c 1 0b010 if (RA)32:63 u sci832:63 then c 1 0b010 CR0 1 c || XERSO if (RA)32:63 = sci832:63 then c 1 0b001 The low-order 32 bits of register RX are compared with CR4×BF32+32:4×BF32+35 1 c || XERSO the low-order 32 bits of register RY, treating operands The low-order 32 bits of register RA are compared with as unsigned integers. The result of the comparison is sci8, treating operands as unsigned integers. The placed into CR0. result of the comparison is placed into CR field BF32. Special Registers Altered: Special Registers Altered: CR0 CR field BF32 Compare Logical Immediate Word Compare Halfword X-form OIM5-form e_cmph BF,RA,RB se_cmpli RX,oimm 31 BF // RA RB 14 / 08 1 OIM5 RX 0 6 9 11 16 21 31 0 6 7 12 15 a 1 EXTS((RA)48:63) oimm 1 590 || (OIM5 + 1) b 1 EXTS((RB)48:63) if (RX)32:63 u oimm32:63 then c 1 0b010 if a > b then c 1 0b010 if (RX)32:63 = oimm32:63 then c 1 0b001 if a = b then c 1 0b001 CR0 1 c || XERSO CR4×BF+32:4×BF+35 1 c || XERSO The low-order 32 bits of register RX are compared with The low-order 16 bits of register RA are compared with oimm, treating operands as unsigned integers. The the low-order 16 bits of register RB, treating operands result of the comparison is placed into CR0. The value as signed integers. The result of the comparison is of oimm must be in the range of 1 to 32. placed into CR field BF. Special Registers Altered: Special Registers Altered: CR0 CR field BF Chapter 5. 
Compare Halfword Short Form RR-form

se_cmph RX,RY

Encoding (field, bits): 3 (0:5), 2 (6:7), RY (8:11), RX (12:15)

    a ← EXTS((RX)_{48:63})
    b ← EXTS((RY)_{48:63})
    if a < b then c ← 0b100
    if a > b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RX are compared with the low-order 16 bits of register RY, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

Compare Halfword Immediate I16A-form

e_cmph16i RA,si

Encoding (field, bits): 28 (0:5), si (6:10), RA (11:15), 22 (16:20), si (21:31)

    a ← EXTS((RA)_{48:63})
    b ← EXTS(si)
    if a < b then c ← 0b100
    if a > b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RA are compared with si, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

Compare Halfword Logical X-form

e_cmphl BF,RA,RB

Encoding (field, bits): 31 (0:5), BF (6:8), // (9:10), RA (11:15), RB (16:20), 46 (21:30), / (31)

    a ← EXTZ((RA)_{48:63})
    b ← EXTZ((RB)_{48:63})
    if a <^{u} b then c ← 0b100
    if a >^{u} b then c ← 0b010
    if a = b then c ← 0b001
    CR_{4×BF+32:4×BF+35} ← c || XER_{SO}

The low-order 16 bits of register RA are compared with the low-order 16 bits of register RB, treating operands as unsigned integers. The result of the comparison is placed into CR field BF.

Special Registers Altered: CR field BF

Compare Halfword Logical Short Form RR-form

se_cmphl RX,RY

Encoding (field, bits): 3 (0:5), 3 (6:7), RY (8:11), RX (12:15)

    a ← (RX)_{48:63}
    b ← (RY)_{48:63}
    if a <^{u} b then c ← 0b100
    if a >^{u} b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RX are compared with the low-order 16 bits of register RY, treating operands as unsigned integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

Compare Halfword Logical Immediate I16A-form

e_cmphl16i RA,ui

Encoding (field, bits): 28 (0:5), ui (6:10), RA (11:15), 23 (16:20), ui (21:31)

    a ← ^{48}0 || (RA)_{48:63}
    b ← ^{48}0 || ui
    if a <^{u} b then c ← 0b100
    if a >^{u} b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RA are compared with the ui field, treating operands as unsigned integers. The result of the comparison is placed into CR0.

Special Registers Altered: CR0

5.7 Fixed-Point Trap Instructions

The fixed-point Trap instruction from Book I, tw, is available while executing in VLE mode. The mnemonic, decoding, and semantics for this instruction are identical to those in Book I; see Section 3.3.10 of Book I for the instruction definition.

The fixed-point Trap instruction from Book I, td, is available while executing in VLE mode on 64-bit implementations. The mnemonic, decoding, and semantics for the td instruction are identical to those in Book I; see Section 3.3.10 of Book I for the instruction definition.

5.8 Fixed-Point Select Instruction

The fixed-point Select instruction provides a means to select one of two registers and place the result in a destination register under the control of a predicate value supplied by a CR bit.

The fixed-point Select instruction from Book I, isel, is available while executing in VLE mode. The mnemonic, decoding, and semantics for this instruction are identical to those in Book I; see Book I for the instruction definition.

5.9 Fixed-Point Logical, Bit, and Move Instructions

The Logical instructions perform bit-parallel operations on 64-bit operands. The Bit instructions manipulate a bit, or create a bit mask, in a register. The Move instructions move a register or an immediate value into a register.

The X-form Logical instructions with Rc=1, the SCI8-form Logical instructions with Rc=1, the RR-form Logical instructions with Rc=1, the e_and2i. instruction, and the e_and2is. instruction set the first three bits of CR field 0 in the same way as the arithmetic instructions described in Section 5.5, "Fixed-Point Arithmetic Instructions". (Also see Section 4.1.1.) The Logical instructions do not change the SO, OV, and CA bits in the XER.

The fixed-point Logical instructions from Book I, and[.], or[.], xor[.], nand[.], nor[.], eqv[.], andc[.], orc[.], extsb[.], extsh[.], cntlzw[.], and popcntb, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.12 of Book I for the instruction definitions.

The fixed-point Logical instructions from Book I, extsw[.] and cntlzd[.], are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.12 of Book I for the instruction definitions.

AND (2 operand) Immediate I16L-form

e_and2i. RT,ui

Encoding (field, bits): 28 (0:5), RT (6:10), ui (11:15), 25 (16:20), ui (21:31)

    RT ← (RT) & (^{48}0 || ui)

The contents of register RT are ANDed with ^{48}0 || ui and the result is placed into register RT.

Special Registers Altered: CR0

AND (2 operand) Immediate Shifted I16L-form

e_and2is. RT,ui

Encoding (field, bits): 28 (0:5), RT (6:10), ui (11:15), 29 (16:20), ui (21:31)

    RT ← (RT) & (^{32}0 || ui || ^{16}0)

The contents of register RT are ANDed with ^{32}0 || ui || ^{16}0 and the result is placed into register RT.

Special Registers Altered: CR0

AND Immediate Short Form IM5-form

se_andi RX,UI5

Encoding (field, bits): 11 (0:5), 1 (6), UI5 (7:11), RX (12:15)

    RX ← (RX) & (^{59}0 || UI5)

The contents of register RX are ANDed with ^{59}0 || UI5 and the result is placed into register RX.

Special Registers Altered: None

AND Scaled Immediate SCI8-form

e_andi RA,RS,sci8 (Rc=0)
e_andi. RA,RS,sci8 (Rc=1)

Encoding (field, bits): 06 (0:5), RS (6:10), RA (11:15), 12 (16:19), Rc (20), F (21), SCL (22:23), UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RA ← (RS) & sci8

The contents of register RS are ANDed with sci8 and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

OR (2 operand) Immediate I16L-form

e_or2i RT,ui

Encoding (field, bits): 28 (0:5), RT (6:10), ui (11:15), 24 (16:20), ui (21:31)

    RT ← (RT) | (^{48}0 || ui)

The contents of register RT are ORed with ^{48}0 || ui and the result is placed into register RT.

Special Registers Altered: None

OR (2 operand) Immediate Shifted I16L-form

e_or2is RT,ui

Encoding (field, bits): 28 (0:5), RT (6:10), ui (11:15), 26 (16:20), ui (21:31)

    RT ← (RT) | (^{32}0 || ui || ^{16}0)

The contents of register RT are ORed with ^{32}0 || ui || ^{16}0 and the result is placed into register RT.

Special Registers Altered: None

OR Scaled Immediate SCI8-form

e_ori RA,RS,sci8 (Rc=0)
e_ori. RA,RS,sci8 (Rc=1)

Encoding (field, bits): 06 (0:5), RS (6:10), RA (11:15), 13 (16:19), Rc (20), F (21), SCL (22:23), UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RA ← (RS) | sci8

The contents of register RS are ORed with sci8 and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

XOR Scaled Immediate SCI8-form

e_xori RA,RS,sci8 (Rc=0)
e_xori. RA,RS,sci8 (Rc=1)

Encoding (field, bits): 06 (0:5), RS (6:10), RA (11:15), 14 (16:19), Rc (20), F (21), SCL (22:23), UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RA ← (RS) ⊕ sci8

The contents of register RS are XORed with sci8 and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

AND Short Form RR-form

se_and RX,RY (Rc=0)
se_and. RX,RY (Rc=1)

Encoding (field, bits): 17 (0:5), 1 (6), Rc (7), RY (8:11), RX (12:15)

    RX ← (RX) & (RY)

The contents of register RX are ANDed with the contents of register RY and the result is placed into register RX.

Special Registers Altered: CR0 (if Rc=1)

AND with Complement Short Form RR-form

se_andc RX,RY

Encoding (field, bits): 17 (0:5), 1 (6:7), RY (8:11), RX (12:15)

    RX ← (RX) & ¬(RY)

The contents of register RX are ANDed with the complement of the contents of register RY and the result is placed into register RX.

Special Registers Altered: None
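The immediate operands used by the SCI8-form and I16L-form logical instructions can be illustrated with a short sketch. The Python below is a hedged illustration only, assuming registers are modeled as 64-bit unsigned integers; the function names simply mirror the mnemonics, and the setting of CR0 for the Rc=1 forms is not modeled. The `sci8` helper follows the definition sci8 = (56 - SCL×8 copies of F) || UI8 || (SCL×8 copies of F).

```python
MASK64 = (1 << 64) - 1

def sci8(f, scl, ui8):
    """Expand the F, SCL and UI8 fields into the 64-bit sci8 operand."""
    shift = 8 * (scl & 0x3)                  # UI8 lands in byte lane SCL (0 = lowest)
    fill = MASK64 if f & 1 else 0            # every other bit is a copy of F
    return (fill & ~(0xFF << shift)) | ((ui8 & 0xFF) << shift)

def e_andi(rs, f, scl, ui8):
    """RA = (RS) AND sci8."""
    return rs & sci8(f, scl, ui8)

def e_ori(rs, f, scl, ui8):
    """RA = (RS) OR sci8."""
    return (rs | sci8(f, scl, ui8)) & MASK64

def e_xori(rs, f, scl, ui8):
    """RA = (RS) XOR sci8."""
    return (rs ^ sci8(f, scl, ui8)) & MASK64

def e_and2is(rt, ui):
    """RT = (RT) AND (32 zeros || ui || 16 zeros)."""
    return rt & ((ui & 0xFFFF) << 16)

def e_or2is(rt, ui):
    """RT = (RT) OR (32 zeros || ui || 16 zeros)."""
    return (rt | ((ui & 0xFFFF) << 16)) & MASK64
```

With F=1 the fill bits are all ones, so `sci8(1, 3, 0x7F)` yields 0xFFFFFFFF7FFFFFFF; with F=0 the same fields yield 0x7F000000. This is what lets a single 8-bit field express masks that clear or set one byte while leaving the rest of the register untouched.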
OR Short Form RR-form

se_or RX,RY

Encoding (field, bits): 17 (0:5), 0 (6:7), RY (8:11), RX (12:15)

    RX ← (RX) | (RY)

The contents of register RX are ORed with the contents of register RY and the result is placed into register RX.

Special Registers Altered: None

NOT Short Form R-form

se_not RX

Encoding (field, bits): 0 (0:5), 02 (6:11), RX (12:15)

    RX ← ¬(RX)

The contents of RX are complemented and placed into register RX.

Special Registers Altered: None

Bit Clear Immediate IM5-form

se_bclri RX,UI5

Encoding (field, bits): 24 (0:5), 0 (6), UI5 (7:11), RX (12:15)

    a ← UI5
    RX ← (RX) & (^{a+32}1 || 0 || ^{31-a}1)

Bit UI5+32 of register RX is set to 0.

Special Registers Altered: None

Bit Generate Immediate IM5-form

se_bgeni RX,UI5

Encoding (field, bits): 24 (0:5), 1 (6), UI5 (7:11), RX (12:15)

    a ← UI5
    RX ← (^{a+32}0 || 1 || ^{31-a}0)

Bit UI5+32 of register RX is set to 1. All other bits in register RX are set to 0.

Special Registers Altered: None

Bit Mask Generate Immediate IM5-form

se_bmaski RX,UI5

Encoding (field, bits): 11 (0:5), 0 (6), UI5 (7:11), RX (12:15)

    a ← UI5
    if a = 0 then RX ← ^{64}1
    else RX ← ^{64-a}0 || ^{a}1

If UI5 is not zero, the low-order UI5 bits are set to 1 in register RX and all other bits in register RX are set to 0. If UI5 is 0, all bits in register RX are set to 1.

Special Registers Altered: None

Bit Set Immediate IM5-form

se_bseti RX,UI5

Encoding (field, bits): 25 (0:5), 0 (6), UI5 (7:11), RX (12:15)

    a ← UI5
    RX ← (RX) | (^{a+32}0 || 1 || ^{31-a}0)

Bit UI5+32 of register RX is set to 1.

Special Registers Altered: None

Extend Sign Byte Short Form R-form

se_extsb RX

Encoding (field, bits): 0 (0:5), 13 (6:11), RX (12:15)

    s ← (RX)_{56}
    RX ← ^{56}s || (RX)_{56:63}

(RX)_{56:63} are placed into RX_{56:63}. Bit 56 of register RX is placed into RX_{0:55}.

Special Registers Altered: None

Extend Sign Halfword Short Form R-form

se_extsh RX

Encoding (field, bits): 0 (0:5), 15 (6:11), RX (12:15)

    s ← (RX)_{48}
    RX ← ^{48}s || (RX)_{48:63}

(RX)_{48:63} are placed into RX_{48:63}. Bit 48 of register RX is placed into RX_{0:47}.

Special Registers Altered: None

Extend Zero Byte R-form

se_extzb RX

Encoding (field, bits): 0 (0:5), 12 (6:11), RX (12:15)

    RX ← ^{56}0 || (RX)_{56:63}

(RX)_{56:63} are placed into RX_{56:63}. RX_{0:55} are set to 0.

Special Registers Altered: None

Extend Zero Halfword R-form

se_extzh RX

Encoding (field, bits): 0 (0:5), 14 (6:11), RX (12:15)

    RX ← ^{48}0 || (RX)_{48:63}

(RX)_{48:63} are placed into RX_{48:63}. RX_{0:47} are set to 0.

Special Registers Altered: None

Load Immediate LI20-form

e_li RT,LI20

Encoding (field, bits): 28 (0:5), RT (6:10), li20_{4:8} (11:15), 0 (16), li20_{0:3} (17:20), li20_{9:19} (21:31)

    RT ← EXTS(li20_{0:3} || li20_{4:8} || li20_{9:19})

The sign-extended LI20 field is placed into RT.

Special Registers Altered: None

Load Immediate Short Form IM7-form

se_li RX,UI7

Encoding (field, bits): 09 (0:4), UI7 (5:11), RX (12:15)

    RX ← ^{57}0 || UI7

The zero-extended UI7 field is placed into RX.

Special Registers Altered: None

Load Immediate Shifted I16L-form

e_lis RT,ui

Encoding (field, bits): 28 (0:5), RT (6:10), ui (11:15), 28 (16:20), ui (21:31)

    RT ← ^{32}0 || ui || ^{16}0

The zero-extended value of ui shifted left 16 bits is placed into RT.

Special Registers Altered: None

Move from Alternate Register RR-form

se_mfar RX,ARY

Encoding (field, bits): 0 (0:5), 3 (6:7), ARY (8:11), RX (12:15)

    r ← ARY+8
    RX ← GPR(r)

The contents of register ARY+8 are placed into RX. ARY specifies a register in the range R8:R23.

Special Registers Altered: None

Move Register RR-form

se_mr RX,RY

Encoding (field, bits): 0 (0:5), 1 (6:7), RY (8:11), RX (12:15)

    RX ← (RY)

The contents of register RY are placed into RX.

Special Registers Altered: None

Move to Alternate Register RR-form

se_mtar ARX,RY

Encoding (field, bits): 0 (0:5), 2 (6:7), RY (8:11), ARX (12:15)

    r ← ARX+8
    GPR(r) ← (RY)

The contents of register RY are placed into register ARX+8. ARX specifies a register in the range R8:R23.

Special Registers Altered: None

5.10 Fixed-Point Rotate and Shift Instructions

The fixed-point Shift instructions from Book I, slw[.], srw[.], srawi[.], and sraw[.], are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.13.2 of Book I for the instruction definitions.

The fixed-point Shift instructions from Book I, sld[.], srd[.], sradi[.], and srad[.], are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.13.2 of Book I for the instruction definitions.

Rotate Left Word X-form

e_rlw RA,RS,RB (Rc=0)
e_rlw. RA,RS,RB (Rc=1)

Encoding (field, bits): 31 (0:5), RS (6:10), RA (11:15), RB (16:20), 280 (21:30), Rc (31)

    n ← (RB)_{59:63}
    RA ← ROTL32((RS)_{32:63}, n)

The contents of register RS are rotated_{32} left the number of bits specified by (RB)_{59:63} and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

Rotate Left Word Immediate X-form

e_rlwi RA,RS,SH (Rc=0)
e_rlwi. RA,RS,SH (Rc=1)

Encoding (field, bits): 31 (0:5), RS (6:10), RA (11:15), SH (16:20), 312 (21:30), Rc (31)

    n ← SH
    RA ← ROTL32((RS)_{32:63}, n)

The contents of register RS are rotated_{32} left SH bits and the result is placed into register RA.

Special Registers Altered: CR0 (if Rc=1)

Rotate Left Word Immediate then Mask Insert M-form

e_rlwimi RA,RS,SH,MB,ME

Encoding (field, bits): 29 (0:5), RS (6:10), RA (11:15), SH (16:20), MB (21:25), ME (26:30), 0 (31)

    n ← SH
    r ← ROTL32((RS)_{32:63}, n)
    m ← MASK(MB+32, ME+32)
    RA ← r&m | (RA)&¬m

The contents of register RS are rotated_{32} left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data is inserted into register RA under control of the generated mask.

Special Registers Altered: None

Rotate Left Word Immediate then AND with Mask M-form

e_rlwinm RA,RS,SH,MB,ME

Encoding (field, bits): 29 (0:5), RS (6:10), RA (11:15), SH (16:20), MB (21:25), ME (26:30), 1 (31)

    n ← SH
    r ← ROTL32((RS)_{32:63}, n)
    m ← MASK(MB+32, ME+32)
    RA ← r & m

The contents of register RS are rotated_{32} left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data is ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered: None

Shift Left Word Immediate X-form

e_slwi RA,RS,SH (Rc=0)
e_slwi. RA,RS,SH (Rc=1)

Encoding (field, bits): 31 (0:5), RS (6:10), RA (11:15), SH (16:20), 56 (21:30), Rc (31)

    n ← SH
    r ← ROTL32((RS)_{32:63}, n)
    m ← MASK(32, 63-n)
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted left SH bits. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RA_{32:63}. RA_{0:31} are set to 0.

Special Registers Altered: CR0 (if Rc=1)

Shift Left Word Immediate Short Form IM5-form

se_slwi RX,UI5

Encoding (field, bits): 27 (0:5), 0 (6), UI5 (7:11), RX (12:15)

    n ← UI5
    r ← ROTL32((RX)_{32:63}, n)
    m ← MASK(32, 63-n)
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted left UI5 bits. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RX_{32:63}. RX_{0:31} are set to 0.

Special Registers Altered: None

Shift Left Word RR-form

se_slw RX,RY

Encoding (field, bits): 16 (0:5), 2 (6:7), RY (8:11), RX (12:15)

    n ← (RY)_{58:63}
    r ← ROTL32((RX)_{32:63}, n)
    if (RY)_{58} = 0 then m ← MASK(32, 63-n)
    else m ← ^{64}0
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted left the number of bits specified by (RY)_{58:63}. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RX_{32:63}. RX_{0:31} are set to 0. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered: None

Shift Right Algebraic Word Immediate IM5-form

se_srawi RX,UI5

Encoding (field, bits): 26 (0:5), 1 (6), UI5 (7:11), RX (12:15)

    n ← UI5
    r ← ROTL32((RX)_{32:63}, 64-n)
    m ← MASK(n+32, 63)
    s ← (RX)_{32}
    RX ← r&m | (^{64}s)&¬m
    CA ← s & ((r&¬m)_{32:63} ≠ 0)

The contents of the low-order 32 bits of register RX are shifted right UI5 bits. Bits shifted out of position 63 are lost, and bit 32 of RX is replicated to fill the vacated positions on the left. Bit 32 of RX is replicated to fill RX_{0:31} and the 32-bit result is placed into RX_{32:63}. CA is set to 1 if the low-order 32 bits of register RX contain a negative value and any 1-bits are shifted out of bit position 63; otherwise CA is set to 0.
A shift amount of zero causes RX to receive EXTS((RX)_{32:63}), and CA to be set to 0.

Special Registers Altered: CA

Shift Right Algebraic Word RR-form

se_sraw RX,RY

Encoding (field, bits): 16 (0:5), 1 (6:7), RY (8:11), RX (12:15)

    n ← (RY)_{59:63}
    r ← ROTL32((RX)_{32:63}, 64-n)
    if (RY)_{58} = 0 then m ← MASK(n+32, 63)
    else m ← ^{64}0
    s ← (RX)_{32}
    RX ← r&m | (^{64}s)&¬m
    CA ← s & ((r&¬m)_{32:63} ≠ 0)

The contents of the low-order 32 bits of register RX are shifted right the number of bits specified by (RY)_{58:63}. Bits shifted out of position 63 are lost, and bit 32 of RX is replicated to fill the vacated positions on the left. Bit 32 of RX is replicated to fill RX_{0:31} and the 32-bit result is placed into RX_{32:63}. CA is set to 1 if the low-order 32 bits of register RX contain a negative value and any 1-bits are shifted out of bit position 63; otherwise CA is set to 0. A shift amount of zero causes RX to receive EXTS((RX)_{32:63}), and CA to be set to 0. Shift amounts from 32 to 63 give a result of 64 sign bits, and cause CA to receive the sign bit of (RX)_{32:63}.

Special Registers Altered: CA

Shift Right Word Immediate X-form

e_srwi RA,RS,SH (Rc=0)
e_srwi. RA,RS,SH (Rc=1)

Encoding (field, bits): 31 (0:5), RS (6:10), RA (11:15), SH (16:20), 568 (21:30), Rc (31)

    n ← SH
    r ← ROTL32((RS)_{32:63}, 64-n)
    m ← MASK(n+32, 63)
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RA_{32:63}. RA_{0:31} are set to 0.

Special Registers Altered: CR0 (if Rc=1)

Shift Right Word Immediate Short Form IM5-form

se_srwi RX,UI5

Encoding (field, bits): 26 (0:5), 0 (6), UI5 (7:11), RX (12:15)

    n ← UI5
    r ← ROTL32((RX)_{32:63}, 64-n)
    m ← MASK(n+32, 63)
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted right UI5 bits. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RX_{32:63}. RX_{0:31} are set to 0.

Special Registers Altered: None

Shift Right Word RR-form

se_srw RX,RY

Encoding (field, bits): 16 (0:5), 0 (6:7), RY (8:11), RX (12:15)

    n ← (RY)_{59:63}
    r ← ROTL32((RX)_{32:63}, 64-n)
    if (RY)_{58} = 0 then m ← MASK(n+32, 63)
    else m ← ^{64}0
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted right the number of bits specified by (RY)_{58:63}. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RX_{32:63}. RX_{0:31} are set to 0. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered: None

5.11 Move To/From System Register Instructions

The VLE category provides 16-bit forms of instructions to move to/from the LR and CTR.

The fixed-point Move To/From System Register instructions from Book I, mfspr, mtcrf, mfcr, mtocrf, mfocrf, mcrxr, mtdcrux, mfdcrux, mfapidi, and mtspr, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.14 of Book I for the instruction definitions.

The fixed-point Move To/From System Register instructions from Book III-E, mfspr, mtspr, mfdcr, mtdcr, mtmsr, mfmsr, wrtee, and wrteei, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book III-E; see Section 3.4.1 of Book III-E for the instruction definitions.

Move From Count Register R-form

se_mfctr RX

Encoding (field, bits): 0 (0:5), 10 (6:11), RX (12:15)

    RX ← CTR

The CTR contents are placed into register RX.

Special Registers Altered: None

Move From Link Register R-form

se_mflr RX

Encoding (field, bits): 0 (0:5), 8 (6:11), RX (12:15)

    RX ← LR

The LR contents are placed into register RX.

Special Registers Altered: None

Move To Count Register R-form

se_mtctr RX

Encoding (field, bits): 0 (0:5), 11 (6:11), RX (12:15)

    CTR ← (RX)

The contents of register RX are placed into the CTR.

Special Registers Altered: CTR

Move To Link Register R-form

se_mtlr RX

Encoding (field, bits): 0 (0:5), 9 (6:11), RX (12:15)

    LR ← (RX)

The contents of register RX are placed into the LR.

Special Registers Altered: LR

Chapter 6. Storage Control Instructions

6.1 Storage Synchronization Instructions
6.2 Cache Management Instructions
6.3 Cache Locking Instructions
6.4 TLB Management Instructions
6.5 Instruction Alignment and Byte Ordering

6.1 Storage Synchronization Instructions

The memory synchronization instructions implemented by category VLE are identical in semantics to those defined in Book II and Book III-E. The se_isync instruction is defined by category VLE, but has the same semantics as isync.

The Load and Reserve and Store Conditional instructions from Book II, lwarx and stwcx., are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Section 3.3.2 of Book II for the instruction definitions.

The Load and Reserve and Store Conditional instructions from Book II, ldarx and stdcx., are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Section 3.3.2 of Book II for the instruction definitions.

The Memory Barrier instructions from Book II, sync (msync) and mbar, are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Section 3.3.3 of Book II for the instruction definitions.

The wait instruction from Book II is available while executing in VLE mode if the category Wait is implemented. The mnemonics, decoding, and semantics for wait are identical to those in Book II; see Section 3.3 of Book II for the instruction definition.

Instruction Synchronize C-form

se_isync

Encoding (field, bits): 01 (0:15)

Executing an se_isync instruction ensures that all instructions preceding the se_isync instruction have completed before the se_isync instruction completes, and that no subsequent instructions are initiated until after the se_isync instruction completes. It also ensures that all instruction cache block invalidations caused by icbi instructions preceding the se_isync instruction have been performed with respect to the processor executing the se_isync instruction, and then causes any prefetched instructions to be discarded.

Except as described in the preceding sentence, the se_isync instruction may complete before storage accesses associated with instructions preceding the se_isync instruction have been performed. This instruction is context synchronizing.

The se_isync instruction has identical semantics to the Book II isync instruction, but has a different encoding.

Special Registers Altered: None

6.2 Cache Management Instructions

Cache management instructions implemented by category VLE are identical to those defined in Book II and Book III-E.

The Cache Management instructions from Book II, dcba, dcbf, dcbst, dcbt, dcbtst, dcbz, icbi, and icbt, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book II; see Section 3.2 of Book II for the instruction definitions.

The Cache Management instruction from Book III-E, dcbi, is available while executing in VLE mode. The mnemonic, decoding, and semantics for this instruction are identical to those in Book III-E; see Section 4.9.1 of Book III-E for the instruction definition.

6.3 Cache Locking Instructions

Cache locking instructions implemented by category VLE are identical to those defined in Book III-E. If the Cache Locking instructions are implemented in category VLE, the category Embedded Cache Locking must also be implemented.

The Cache Locking instructions from Book III-E, dcbtls, dcbtstls, dcblc, icbtls, and icblc, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book III-E; see Section 4.9.2 of Book III-E for the instruction definitions.

6.4 TLB Management Instructions

The TLB management instructions implemented by category VLE are identical to those defined in Book III-E.

The TLB Management instructions from Book III-E, tlbre, tlbwe, tlbivax, tlbsync, and tlbsx, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book III-E; see Section 4.9.4.1 of Book III-E for the instruction definitions.

Instructions and resources from category Embedded.MMU Type FSL are available if the appropriate category is implemented.

6.5 Instruction Alignment and Byte Ordering

Only Big-Endian instruction memory is supported when executing from a page of VLE instructions. Attempting to fetch VLE instructions from a page marked as Little-Endian generates an instruction storage interrupt byte-ordering exception.

Chapter 7. Additional Categories Available in VLE

7.1 Move Assist
7.2 Vector
7.3 Signal Processing Engine
7.4 Embedded Floating Point
7.5 Legacy Move Assist
7.6 External PID
7.7 Embedded Performance Monitor
7.8 Processor Control

Instructions and resources from categories other than Base and Embedded are available in VLE.
These include categories for which all the instructions in the category use primary opcode 4 or primary opcode 31.

7.1 Move Assist

Move Assist instructions implemented by category VLE are identical to those defined in Book I. If the Move Assist instructions are implemented in category VLE, category Move Assist must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.6 of Book I for the instruction definitions.

7.2 Vector

Vector instructions implemented by category VLE are identical to those defined in Book I. If the Vector instructions are implemented in category VLE, category Vector must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 5 of Book I for the instruction definitions.

7.3 Signal Processing Engine

Signal Processing Engine instructions implemented by category VLE are identical to those defined in Book I. If the Signal Processing Engine instructions are implemented in category VLE, category Signal Processing Engine must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 6 of Book I for the instruction definitions.

7.4 Embedded Floating Point

Embedded Floating Point instructions implemented by category VLE are identical to those defined in Book I. If the Embedded Floating Point instructions are implemented in category VLE, the appropriate category SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, or SPE.Embedded Float Vector must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 7 of Book I for the instruction definitions.

7.5 Legacy Move Assist

Legacy Move Assist instructions implemented by category VLE are identical to those defined in Book I. If the Legacy Move Assist instructions are implemented in category VLE, category Legacy Move Assist must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 8 of Book I for the instruction definitions.

7.6 External PID

External Process ID instructions implemented by category VLE are identical to those defined in Book III-E. If the External Process ID instructions are implemented in category VLE, category Embedded.External PID must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Section 3.3.4 of Book III-E for the instruction definitions.

7.7 Embedded Performance Monitor

Embedded Performance Monitor instructions implemented by category VLE are identical to those defined in Book III-E. If the Embedded Performance Monitor instructions are implemented in category VLE, category Embedded.Performance Monitor must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Appendix E of Book III-E for the instruction definitions.

7.8 Processor Control

Processor Control instructions implemented by category VLE are identical to those defined in Book III-E. If the Processor Control instructions are implemented in category VLE, category Embedded.Processor Control must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Chapter 9 of Book III-E for the instruction definitions.

Appendix A. VLE Instruction Set Sorted by Mnemonic

This appendix lists all the instructions available in VLE mode in the Power ISA, in order by mnemonic. Opcodes that are not defined below are treated as illegal by category VLE.
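When reading the hexadecimal opcodes in this appendix, it can help to pull individual fields out of a 32-bit encoding. The sketch below is an illustration under stated assumptions rather than part of the architecture: it assumes the 32-bit encodings number their bits 0 (most significant) through 31, with the primary opcode in bits 0:5 as the encoding diagrams show, and the helper names `field` and `primary_opcode` are hypothetical.

```python
def field(word, first, last):
    """Extract bits first:last of a 32-bit instruction word.

    Bit 0 is the most significant bit, matching the encoding diagrams.
    """
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)

def primary_opcode(word):
    """Return the primary opcode, which occupies bits 0:5."""
    return field(word, 0, 5)

def uses_opcode_4_or_31(word):
    """True if the encoding uses primary opcode 4 or primary opcode 31."""
    return primary_opcode(word) in (4, 31)
```

Applied to opcodes from the listing, `primary_opcode(0x7C000214)` (add) gives 31 and `primary_opcode(0x1000020F)` (brinc) gives 4, while `field(0x7000C800, 16, 20)` recovers the value 25 shown in bits 16:20 of the e_and2i. encoding and `field(0x7C000230, 21, 30)` recovers the extended opcode 280 of e_rlw.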
Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 XO 7C000214 B add[o][.] Add XO 7C000014 B addc[o][.] Add Carrying XO 7C000114 SR B adde[o][.] Add Extended XO 7C0001D4 SR B addme[o][.] Add to Minus One Extended XO 7C000194 SR B addze[o][.] Add to Zero Extended X 7C000038 SR B and[.] AND X 7C000078 SR B andc[.] AND with Complement EVX 1000020F SP brinc Bit Reverse Increment X 7C000000 B cmp Compare X 7C000040 B cmpl Compare Logical X 7C000074 SR 64 cntlzd[.] Count Leading Zeros Doubleword X 7C000034 SR B cntlzw[.] Count Leading Zeros Word X 7C0005EC E dcba Data Cache Block Allocate X 7C0000AC B dcbf Data Cache Block Flush X 7C0000FE P E.PD dcbfep Data Cache Block Flush by External Process ID X 7C0003AC P E dcbi Data Cache Block Invalidate X 7C00030C M ECL dcblc Data Cache Block Lock Clear X 7C00006C B dcbst Data Cache Block Store X 7C00022C B dcbt Data Cache Block Touch X 7C00027E P E.PD dcbtep Data Cache Block Touch by External Process ID X 7C00014C M ECL dcbtls Data Cache Block Touch and Lock Set X 7C0001EC B dcbtst Data Cache Block Touch for Store X 7C0001FE P E.PD dcbtstep Data Cache Block Touch for Store by External Process ID X 7C00010C M ECL dcbtstls Data Cache Block Touch for Store and Lock Set X 7C0007EC B dcbz Data Cache Block set to Zero X 7C0007FE P E.PD dcbzep Data Cache Block set to Zero by External Process ID X 7C00038C P E.CI dci Data Cache Invalidate X 7C00028C P E.CD dcread Data Cache Read X 7C0003CC P E.CD dcread Data Cache Read XO 7C0003D2 SR 64 divd[o][.] Divide Doubleword XO 7C000392 SR 64 divdu[o][.] Divide Doubleword Unsigned XO 7C0003D6 SR B divw[o][.] Divide Word XO 7C000396 SR B divwu[o][.] Divide Word Unsigned D 1C000000 VLE e_add16i Add Immediate I16A 70008800 SR VLE e_add2i. Add (2 operand) Immediate and Record I16A 70009000 VLE e_add2is Add (2 operand) Immediate Shifted SCI8 18008000 SR VLE e_addi[.] Add Scaled Immediate SCI8 18009000 SR VLE e_addic[.] Add Scaled Immediate Carrying I16L 7000C800 SR VLE e_and2i. 
AND (2 operand) Immediate I16L 7000E800 SR VLE e_and2is. AND (2 operand) Immediate Shifted SCI8 1800C000 SR VLE e_andi[.] AND Scaled Immediate BD24 78000000 VLE e_b[l] Branch [and Link] BD15 7A000000 CT VLE e_bc[l] Branch Conditional [and Link] IA16 70009800 VLE e_cmp16i Compare Immediate Word IA16 7000B000 VLE e_cmph16i Compare Halfword Immediate Appendix A. VLE Instruction Set Sorted by Mnemonic 715 Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 X 7C00001C VLE e_cmph Compare Halfword IA16 7000B800 VLE e_cmphl16i Compare Halfword Logical Immediate X 7C00005C VLE e_cmphl Compare Halfword Logical SCI8 1800A800 VLE e_cmpi Compare Scaled Immediate Word I16A 7000A800 VLE e_cmpl16i Compare Logical Immediate Word SCI8 1880A800 VLE e_cmpli Compare Logical Scaled Immediate Word XL 7C000202 VLE e_crand Condition Register AND XL 7C000102 VLE e_crandc Condition Register AND with Complement XL 7C000242 VLE e_creqv Condition Register Equivalent XL 7C0001C2 VLE e_crnand Condition Register NAND XL 7C000042 VLE e_crnor Condition Register NOR XL 7C000382 VLE e_cror Condition Register OR XL 7C000342 VLE e_crorc Condition Register OR with Complement XL 7C000182 VLE e_crxor Condition Register XOR D 30000000 VLE e_lbz Load Byte and Zero D8 18000000 VLE e_lbzu Load Byte and Zero with Update D 38000000 VLE e_lha Load Halfword Algebraic D8 18000300 VLE e_lhau Load Halfword Algebraic with Update D 58000000 VLE e_lhz Load Halfword and Zero D8 18000100 VLE e_lhzu Load Halfword and Zero with Update LI20 70000000 VLE e_li Load Immediate I16L 7000E000 VLE e_lis Load Immediate Shifted D8 18000800 VLE e_lmw Load Multiple Word D 50000000 VLE e_lwz Load Word and Zero D8 18000200 VLE e_lwzu Load Word and Zero with Update XL 7C000020 VLE e_mcrf Move CR Field I16A 7000A000 VLE e_mull2i Multiply (2 operand) Low Immediate SCI8 1800A000 VLE e_mulli Multiply Low Scaled Immediate I16L 7000C000 VLE e_or2i OR (2operand) Immediate I16L 7000D000 VLE e_or2is OR (2 operand) 
Immediate Shifted SCI8 1800D000 SR VLE e_ori[.] OR Scaled Immediate X 7C000230 SR VLE e_rlw[.] Rotate Left Word X 7C000270 SR VLE e_rlwi[.] Rotate Left Word Immediate M 74000000 VLE e_rlwimi Rotate Left Word Immediate then Mask Insert M 74000001 VLE e_rlwinm Rotate Left Word Immediate then AND with Mask X 7C000070 SR VLE e_slwi[.] Shift Left Word Immediate X 7C000470 SR VLE e_srwi[.] Shift Right Word Immediate D 34000000 VLE e_stb Store Byte D8 18000400 VLE e_stbu Store Byte with Update D 5C000000 VLE e_sth Store Halfword D8 18000500 VLE e_sthu Store Halfword with Update D8 18000900 VLE e_stmw Store Multiple Word D 54000000 VLE e_stw Store Word D8 18000600 VLE e_stwu Store word with Update SCI8 1800B000 SR VLE e_subfic[.] Subtract From Scaled Immediate Carrying SCI8 1800E000 SR VLE e_xori[.] XOR Scaled Immediate EVX 100002E4 SP.FD efdabs Floating-Point Double-Precision Absolute Value EVX 100002E0 SP.FD efdadd Floating-Point Double-Precision Add EVX 100002EF SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Preci- sion EVX 100002F3 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Frac- tion EVX 100002F1 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Inte- ger EVX 100002E3 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Inte- ger Doubleword EVX 100002F2 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction EVX 100002F0 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer 716 Power ISATM -- Book VLE Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 100002E2 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword EVX 100002EE SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal EVX 100002EC SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than EVX 100002ED SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than EVX 100002F7 SP.FD efdctsf Convert 
                                             Floating-Point Double-Precision to Signed Fraction
EVX   100002F5                SP.FD  efdctsi  Convert Floating-Point Double-Precision to Signed Integer
EVX   100002EB                SP.FD  efdctsidz  Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round Towards Zero
EVX   100002FA                SP.FD  efdctsiz  Convert Floating-Point Double-Precision to Signed Integer with Round Towards Zero
EVX   100002F6                SP.FD  efdctuf  Convert Floating-Point Double-Precision to Unsigned Fraction
EVX   100002F4                SP.FD  efdctui  Convert Floating-Point Double-Precision to Unsigned Integer
EVX   100002EA                SP.FD  efdctuidz  Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round Towards Zero
EVX   100002F8                SP.FD  efdctuiz  Convert Floating-Point Double-Precision to Unsigned Integer with Round Towards Zero
EVX   100002E9                SP.FD  efddiv  Floating-Point Double-Precision Divide
EVX   100002E8                SP.FD  efdmul  Floating-Point Double-Precision Multiply
EVX   100002E5                SP.FD  efdnabs  Floating-Point Double-Precision Negative Absolute Value
EVX   100002E6                SP.FD  efdneg  Floating-Point Double-Precision Negate
EVX   100002E1                SP.FD  efdsub  Floating-Point Double-Precision Subtract
EVX   100002FE                SP.FD  efdtsteq  Floating-Point Double-Precision Test Equal
EVX   100002FC                SP.FD  efdtstgt  Floating-Point Double-Precision Test Greater Than
EVX   100002FD                SP.FD  efdtstlt  Floating-Point Double-Precision Test Less Than
EVX   100002E4                SP.FS  efsabs  Floating-Point Single-Precision Absolute Value
EVX   100002E0                SP.FS  efsadd  Floating-Point Single-Precision Add
EVX   100002CF                SP.FD  efscfd  Floating-Point Single-Precision Convert from Double-Precision
EVX   100002F3                SP.FS  efscfsf  Convert Floating-Point Single-Precision from Signed Fraction
EVX   100002F1                SP.FS  efscfsi  Convert Floating-Point Single-Precision from Signed Integer
EVX   100002E3                SP.FS  efscfsid  Convert Floating-Point Single-Precision from Signed Integer Doubleword
EVX   100002F2                SP.FS  efscfuf  Convert Floating-Point Single-Precision from Unsigned Fraction
EVX   100002F0                SP.FS  efscfui  Convert Floating-Point
                                             Single-Precision from Unsigned Integer
EVX   100002E2                SP.FS  efscfuid  Convert Floating-Point Single-Precision from Unsigned Integer Doubleword
EVX   100002EE                SP.FS  efscmpeq  Floating-Point Single-Precision Compare Equal
EVX   100002EC                SP.FS  efscmpgt  Floating-Point Single-Precision Compare Greater Than
EVX   100002ED                SP.FS  efscmplt  Floating-Point Single-Precision Compare Less Than
EVX   100002F7                SP.FS  efsctsf  Convert Floating-Point Single-Precision to Signed Fraction
EVX   100002F5                SP.FS  efsctsi  Convert Floating-Point Single-Precision to Signed Integer
EVX   100002EB                SP.FS  efsctsidz  Convert Floating-Point Single-Precision to Signed Integer Doubleword with Round Towards Zero
EVX   100002FA                SP.FS  efsctsiz  Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero
EVX   100002F6                SP.FS  efsctuf  Convert Floating-Point Single-Precision to Unsigned Fraction
EVX   100002F4                SP.FS  efsctui  Convert Floating-Point Single-Precision to Unsigned Integer
EVX   100002EA                SP.FS  efsctuidz  Convert Floating-Point Single-Precision to Unsigned Integer Doubleword with Round Towards Zero
EVX   100002F8                SP.FS  efsctuiz  Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero
EVX   100002E9                SP.FS  efsdiv  Floating-Point Single-Precision Divide
EVX   100002E8                SP.FS  efsmul  Floating-Point Single-Precision Multiply
EVX   100002E5                SP.FS  efsnabs  Floating-Point Single-Precision Negative Absolute Value
EVX   100002E6                SP.FS  efsneg  Floating-Point Single-Precision Negate
EVX   100002E1                SP.FS  efssub  Floating-Point Single-Precision Subtract
EVX   100002FE                SP.FS  efststeq  Floating-Point Single-Precision Test Equal
EVX   100002FC                SP.FS  efststgt  Floating-Point Single-Precision Test Greater Than
EVX   100002FD                SP.FS  efststlt  Floating-Point Single-Precision Test Less Than
X     7C000238  SR            B      eqv[.]
                                             Equivalent
EVX   10000208                SP     evabs  Vector Absolute Value
EVX   10000202                SP     evaddiw  Vector Add Immediate Word
EVX   100004C9                SP     evaddsmiaaw  Vector Add Signed, Modulo, Integer to Accumulator Word
EVX   100004C1                SP     evaddssiaaw  Vector Add Signed, Saturate, Integer to Accumulator Word
EVX   100004C8                SP     evaddumiaaw  Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX   100004C0                SP     evaddusiaaw  Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX   10000200                SP     evaddw  Vector Add Word
EVX   10000211                SP     evand  Vector AND
EVX   10000212                SP     evandc  Vector AND with Complement
EVX   10000234                SP     evcmpeq  Vector Compare Equal
EVX   10000231                SP     evcmpgts  Vector Compare Greater Than Signed
EVX   10000230                SP     evcmpgtu  Vector Compare Greater Than Unsigned
EVX   10000233                SP     evcmplts  Vector Compare Less Than Signed
EVX   10000232                SP     evcmpltu  Vector Compare Less Than Unsigned
EVX   1000020E                SP     evcntlsw  Vector Count Leading Sign Bits Word
EVX   1000020D                SP     evcntlzw  Vector Count Leading Zeros Bits Word
EVX   100004C6                SP     evdivws  Vector Divide Word Signed
EVX   100004C7                SP     evdivwu  Vector Divide Word Unsigned
EVX   10000219                SP     eveqv  Vector Equivalent
EVX   1000020A                SP     evextsb  Vector Extend Sign Byte
EVX   1000020B                SP     evextsh  Vector Extend Sign Halfword
EVX   10000284                SP.FV  evfsabs  Vector Floating-Point Single-Precision Absolute Value
EVX   10000280                SP.FV  evfsadd  Vector Floating-Point Single-Precision Add
EVX   10000293                SP.FV  evfscfsf  Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX   10000291                SP.FV  evfscfsi  Vector Convert Floating-Point Single-Precision from Signed Integer
EVX   10000292                SP.FV  evfscfuf  Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX   10000290                SP.FV  evfscfui  Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX   1000028E                SP.FV  evfscmpeq  Vector Floating-Point Single-Precision Compare Equal
EVX   1000028C                SP.FV  evfscmpgt  Vector Floating-Point Single-Precision Compare Greater Than
EVX   1000028D                SP.FV  evfscmplt  Vector Floating-Point Single-Precision Compare
                                             Less Than
EVX   10000297                SP.FV  evfsctsf  Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX   10000295                SP.FV  evfsctsi  Vector Convert Floating-Point Single-Precision to Signed Integer
EVX   1000029A                SP.FV  evfsctsiz  Vector Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero
EVX   10000296                SP.FV  evfsctuf  Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX   10000294                SP.FV  evfsctui  Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX   10000298                SP.FV  evfsctuiz  Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero
EVX   10000289                SP.FV  evfsdiv  Vector Floating-Point Single-Precision Divide
EVX   10000288                SP.FV  evfsmul  Vector Floating-Point Single-Precision Multiply
EVX   10000285                SP.FV  evfsnabs  Vector Floating-Point Single-Precision Negative Absolute Value
EVX   10000286                SP.FV  evfsneg  Vector Floating-Point Single-Precision Negate
EVX   10000281                SP.FV  evfssub  Vector Floating-Point Single-Precision Subtract
EVX   1000029E                SP.FV  evfststeq  Vector Floating-Point Single-Precision Test Equal
EVX   1000029C                SP.FV  evfststgt  Vector Floating-Point Single-Precision Test Greater Than
EVX   1000029D                SP.FV  evfststlt  Vector Floating-Point Single-Precision Test Less Than
EVX   10000301                SP     evldd  Vector Load Doubleword into Doubleword
EVX   7C00011D         P      E.PD   evlddepx  Vector Load Doubleword into Doubleword by External Process ID Indexed
EVX   10000300                SP     evlddx  Vector Load Doubleword into Doubleword Indexed
EVX   10000305                SP     evldh  Vector Load Doubleword into 4 Halfwords
EVX   10000304                SP     evldhx  Vector Load Doubleword into 4 Halfwords Indexed
EVX   10000303                SP     evldw  Vector Load Doubleword into 2 Words
EVX   10000302                SP     evldwx  Vector Load Doubleword into 2 Words Indexed
EVX   10000309                SP     evlhhesplat  Vector Load Halfword into Halfwords Even and Splat
EVX   10000308                SP     evlhhesplatx  Vector Load Halfword into
                                             Halfwords Even and Splat Indexed
EVX   1000030F                SP     evlhhossplat  Vector Load Halfword into Halfwords Odd and Splat
EVX   1000030E                SP     evlhhossplatx  Vector Load Halfword into Halfwords Odd Signed and Splat Indexed
EVX   1000030D                SP     evlhhousplat  Vector Load Halfword into Halfwords Odd Unsigned and Splat
EVX   1000030C                SP     evlhhousplatx  Vector Load Halfword into Halfwords Odd Unsigned and Splat Indexed
EVX   10000311                SP     evlwhe  Vector Load Word into Two Halfwords Even
EVX   10000310                SP     evlwhex  Vector Load Word into Two Halfwords Even Indexed
EVX   10000317                SP     evlwhos  Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX   10000316                SP     evlwhosx  Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX   10000315                SP     evlwhou  Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX   10000314                SP     evlwhoux  Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX   1000031D                SP     evlwhsplat  Vector Load Word into Two Halfwords and Splat
EVX   1000031C                SP     evlwhsplatx  Vector Load Word into Two Halfwords and Splat Indexed
EVX   10000319                SP     evlwwsplat  Vector Load Word into Word and Splat
EVX   10000318                SP     evlwwsplatx  Vector Load Word into Word and Splat Indexed
EVX   1000022C                SP     evmergehi  Vector Merge High
EVX   1000022E                SP     evmergehilo  Vector Merge High/Low
EVX   1000022D                SP     evmergelo  Vector Merge Low
EVX   1000022F                SP     evmergelohi  Vector Merge Low/High
EVX   1000052B                SP     evmhegsmfaa  Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX   100005AB                SP     evmhegsmfan  Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX   10000529                SP     evmhegsmiaa  Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX   100005A9                SP     evmhegsmian  Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX   10000528                SP     evmhegumiaa  Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX   100005A8                SP     evmhegumian  Vector
                                             Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX   1000040B                SP     evmhesmf  Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX   1000042B                SP     evmhesmfa  Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulate
EVX   1000050B                SP     evmhesmfaaw  Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX   1000058B                SP     evmhesmfanw  Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX   10000409                SP     evmhesmi  Vector Multiply Halfwords, Even, Signed, Modulo, Integer
EVX   10000429                SP     evmhesmia  Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX   10000509                SP     evmhesmiaaw  Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX   10000589                SP     evmhesmianw  Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX   10000403                SP     evmhessf  Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
EVX   10000423                SP     evmhessfa  Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX   10000503                SP     evmhessfaaw  Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX   10000583                SP     evmhessfanw  Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX   10000501                SP     evmhessiaaw  Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
EVX   10000581                SP     evmhessianw  Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
EVX   10000408                SP     evmheumi  Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX   10000428                SP     evmheumia  Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX   10000508                SP     evmheumiaaw  Vector Multiply Halfwords, Even, Unsigned, Modulo,
                                             Integer and Accumulate into Words
EVX   10000588                SP     evmheumianw  Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX   10000500                SP     evmheusiaaw  Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX   10000580                SP     evmheusianw  Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX   1000052F                SP     evmhogsmfaa  Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX   100005AF                SP     evmhogsmfan  Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX   1000052D                SP     evmhogsmiaa  Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX   100005AD                SP     evmhogsmian  Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX   1000052C                SP     evmhogumiaa  Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX   100005AC                SP     evmhogumian  Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX   1000040F                SP     evmhosmf  Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX   1000042F                SP     evmhosmfa  Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
EVX   1000050F                SP     evmhosmfaaw  Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX   1000058F                SP     evmhosmfanw  Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX   1000040D                SP     evmhosmi  Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX   1000042D                SP     evmhosmia  Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX   1000050D                SP     evmhosmiaaw  Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX   1000058D                SP     evmhosmianw  Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX   10000407                SP     evmhossf  Vector Multiply Halfwords, Odd, Signed, Fractional
EVX   10000427                SP
                                     evmhossfa  Vector Multiply Halfwords, Odd, Signed, Fractional to Accumulator
EVX   10000507                SP     evmhossfaaw  Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX   10000587                SP     evmhossfanw  Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX   10000505                SP     evmhossiaaw  Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX   10000585                SP     evmhossianw  Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX   1000040C                SP     evmhoumi  Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
EVX   1000042C                SP     evmhoumia  Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX   1000050C                SP     evmhoumiaaw  Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX   1000058C                SP     evmhoumianw  Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX   10000504                SP     evmhousiaaw  Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
EVX   10000584                SP     evmhousianw  Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX   100004C4                SP     evmra  Initialize Accumulator
EVX   1000044F                SP     evmwhsmf  Vector Multiply Word High Signed, Modulo, Fractional
EVX   1000046F                SP     evmwhsmfa  Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX   1000054F                SP     evmwhsmfaaw  Vector Multiply Word High Signed, Modulo, Fractional and Accumulate into Words
EVX   100005CF                SP     evmwhsmfanw  Vector Multiply Word High Signed, Modulo, Fractional and Accumulate Negative into Words
EVX   1000044D                SP     evmwhsmi  Vector Multiply Word High Signed, Modulo, Integer
EVX   1000046D                SP     evmwhsmia  Vector Multiply Word High Signed, Modulo, Integer to Accumulator
EVX   1000054D                SP     evmwhsmiaaw  Vector Multiply Word High Signed, Modulo,
                                             Integer and Accumulate into Words
EVX   100005CD                SP     evmwhsmianw  Vector Multiply Word High Signed, Modulo, Integer and Accumulate Negative into Words
EVX   10000447                SP     evmwhssf  Vector Multiply Word High Signed, Fractional
EVX   10000467                SP     evmwhssfa  Vector Multiply Word High Signed, Fractional to Accumulator
EVX   10000547                SP     evmwhssfaaw  Vector Multiply Word High Signed, Fractional and Accumulate into Words
EVX   100005C7                SP     evmwhssfanw  Vector Multiply Word High Signed, Fractional and Accumulate Negative into Words
EVX   100005C5                SP     evmwhssianw  Vector Multiply Word High Signed, Integer and Accumulate Negative into Words
EVX   1000044C                SP     evmwhumi  Vector Multiply Word High Unsigned, Modulo, Integer
EVX   1000046C                SP     evmwhumia  Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX   1000054C                SP     evmwhumiaaw  Vector Multiply Word High Unsigned, Modulo, Integer and Accumulate into Words
EVX   100005CC                SP     evmwhumianw  Vector Multiply Word High Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX   10000544                SP     evmwhusiaaw  Vector Multiply Word High Unsigned, Integer and Accumulate into Words
EVX   100005C4                SP     evmwhusianw  Vector Multiply Word High Unsigned, Integer and Accumulate Negative into Words
EVX   10000549                SP     evmwlsmiaaw  Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX   100005C9                SP     evmwlsmianw  Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative into Words
EVX   10000541                SP     evmwlssiaaw  Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
EVX   100005C1                SP     evmwlssianw  Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative into Words
EVX   10000448                SP     evmwlumi  Vector Multiply Word Low Unsigned, Modulo, Integer
EVX   10000468                SP     evmwlumia  Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX   10000548                SP     evmwlumiaaw  Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX   100005C8                SP     evmwlumianw  Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX   10000540                SP     evmwlusiaaw  Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX   100005C0                SP     evmwlusianw  Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX   1000045B                SP     evmwsmf  Vector Multiply Word Signed, Modulo, Fractional
EVX   1000047B                SP     evmwsmfa  Vector Multiply Word Signed, Modulo, Fractional to Accumulator
EVX   1000055B                SP     evmwsmfaa  Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX   100005DB                SP     evmwsmfan  Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
EVX   10000459                SP     evmwsmi  Vector Multiply Word Signed, Modulo, Integer
EVX   10000479                SP     evmwsmia  Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX   10000559                SP     evmwsmiaa  Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX   100005D9                SP     evmwsmian  Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX   10000453                SP     evmwssf  Vector Multiply Word Signed, Saturate, Fractional
EVX   10000473                SP     evmwssfa  Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX   10000553                SP     evmwssfaa  Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX   100005D3                SP     evmwssfan  Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX   10000458                SP     evmwumi  Vector Multiply Word Unsigned, Modulo, Integer
EVX   10000478                SP     evmwumia  Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX   10000558                SP     evmwumiaa  Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX   100005D8                SP     evmwumian  Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX   1000021E                SP     evnand  Vector NAND
EVX   10000209                SP     evneg  Vector Negate
EVX   10000218                SP     evnor  Vector NOR
EVX   10000217                SP     evor  Vector OR
EVX   1000021B                SP     evorc  Vector OR with Complement
EVX   10000228                SP     evrlw  Vector Rotate Left Word
EVX   1000022A                SP     evrlwi  Vector Rotate Left Word Immediate
EVX   1000020C                SP     evrndw  Vector Round Word
EVSE  10000278                SP     evsel  Vector Select
EVX   10000224                SP     evslw  Vector Shift Left Word
EVX   10000226                SP     evslwi  Vector Shift Left Word Immediate
EVX   1000022B                SP     evsplatfi  Vector Splat Fractional Immediate
EVX   10000229                SP     evsplati  Vector Splat Immediate
EVX   10000223                SP     evsrwis  Vector Shift Right Word Immediate Signed
EVX   10000222                SP     evsrwiu  Vector Shift Right Word Immediate Unsigned
EVX   10000221                SP     evsrws  Vector Shift Right Word Signed
EVX   10000220                SP     evsrwu  Vector Shift Right Word Unsigned
EVX   10000321                SP     evstdd  Vector Store Doubleword of Doubleword
EVX   7C00019D         P      E.PD   evstddepx  Vector Store Doubleword into Doubleword by External Process ID Indexed
EVX   10000320                SP     evstddx  Vector Store Doubleword of Doubleword Indexed
EVX   10000325                SP     evstdh  Vector Store Doubleword of Four Halfwords
EVX   10000324                SP     evstdhx  Vector Store Doubleword of Four Halfwords Indexed
EVX   10000323                SP     evstdw  Vector Store Doubleword of Two Words
EVX   10000322                SP     evstdwx  Vector Store Doubleword of Two Words Indexed
EVX   10000331                SP     evstwhe  Vector Store Word of Two Halfwords from Even
EVX   10000330                SP     evstwhex  Vector Store Word of Two Halfwords from Even Indexed
EVX   10000335                SP     evstwho  Vector Store Word of Two
                                             Halfwords from Odd
EVX   10000334                SP     evstwhox  Vector Store Word of Two Halfwords from Odd Indexed
EVX   10000339                SP     evstwwe  Vector Store Word of Word from Even
EVX   10000338                SP     evstwwex  Vector Store Word of Word from Even Indexed
EVX   1000033D                SP     evstwwo  Vector Store Word of Word from Odd
EVX   1000033C                SP     evstwwox  Vector Store Word of Word from Odd Indexed
EVX   100004CB                SP     evsubfsmiaaw  Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX   100004C3                SP     evsubfssiaaw  Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX   100004CA                SP     evsubfumiaaw  Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX   100004C2                SP     evsubfusiaaw  Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX   10000204                SP     evsubfw  Vector Subtract from Word
EVX   10000206                SP     evsubifw  Vector Subtract Immediate from Word
EVX   10000216                SP     evxor  Vector XOR
X     7C000774  SR            B      extsb[.]  Extend Sign Byte
X     7C000734  SR            B      extsh[.]  Extend Sign Halfword
X     7C0007B4  SR            64     extsw[.]  Extend Sign Word
X     7C0007AC                B      icbi  Instruction Cache Block Invalidate
X     7C0007BE         P      E.PD   icbiep  Instruction Cache Block Invalidate by External Process ID
X     7C0001CC         M      ECL    icblc  Instruction Cache Block Lock Clear
X     7C00002C                E      icbt  Instruction Cache Block Touch
X     7C0003CC         M      ECL    icbtls  Instruction Cache Block Touch and Lock Set
X     7C00078C         P      E.CI   ici  Instruction Cache Invalidate
X     7C0007CC         P      E.CD   icread  Instruction Cache Read
A     7C00001E                B.in   isel  Integer Select
X     7C0000BE         P      E.PD   lbepx  Load Byte by External Process ID Indexed
X     7C0000EE                B      lbzux  Load Byte and Zero with Update Indexed
X     7C0000AE                B      lbzx  Load Byte and Zero Indexed
X     7C0000A8                64     ldarx  Load Doubleword and Reserve Indexed
X     7C00003A         P      E.PD   ldepx  Load Doubleword by External Process ID Indexed
X     7C00006A                64     ldux  Load Doubleword with Update Indexed
X     7C00002A                64     ldx  Load Doubleword Indexed
X     7C0004BE         P      E.PD   lfdepx  Load Floating-Point Double by External Process ID Indexed
X     7C0002EE                B      lhaux  Load Halfword Algebraic with Update Indexed
X     7C0002AE                B      lhax  Load
                                             Halfword Algebraic Indexed
X     7C00062C                B      lhbrx  Load Halfword Byte-Reversed Indexed
X     7C00023E         P      E.PD   lhepx  Load Halfword by External Process ID Indexed
X     7C00026E                B      lhzux  Load Halfword and Zero with Update Indexed
X     7C00022E                B      lhzx  Load Halfword and Zero Indexed
X     7C0004AA                MA     lswi  Load String Word Immediate
X     7C00042A                MA     lswx  Load String Word Indexed
X     7C00000E                V      lvebx  Load Vector Element Byte Indexed
X     7C00004E                V      lvehx  Load Vector Element Halfword Indexed
X     7C00024E         P      E.PD   lvepx  Load Vector by External Process ID Indexed
X     7C00020E         P      E.PD   lvepxl  Load Vector by External Process ID Indexed LRU
X     7C00008E                V      lvewx  Load Vector Element Word Indexed
X     7C00000C                V      lvsl  Load Vector for Shift Left Indexed
X     7C00004C                V      lvsr  Load Vector for Shift Right Indexed
X     7C0000CE                V      lvx[l]  Load Vector Indexed [Last]
X     7C000028                B      lwarx  Load Word and Reserve Indexed
X     7C0002EA                64     lwaux  Load Word Algebraic with Update Indexed
X     7C0002AA                64     lwax  Load Word Algebraic Indexed
X     7C00042C                B      lwbrx  Load Word Byte-Reversed Indexed
X     7C00003E         P      E.PD   lwepx  Load Word by External Process ID Indexed
X     7C00006E                B      lwzux  Load Word and Zero with Update Indexed
X     7C00002E                B      lwzx  Load Word and Zero Indexed
X     10000158  SR            LIM    macchw[o][.]  Multiply Accumulate Cross Halfword to Word Modulo Signed
X     100001D8  SR            LIM    macchws[o][.]  Multiply Accumulate Cross Halfword to Word Saturate Signed
X     10000198  SR            LIM    macchwsu[o][.]  Multiply Accumulate Cross Halfword to Word Saturate Unsigned
X     10000118  SR            LIM    macchwu[o][.]  Multiply Accumulate Cross Halfword to Word Modulo Unsigned
X     10000058  SR            LIM    machhw[o][.]  Multiply Accumulate High Halfword to Word Modulo Signed
X     100000D8  SR            LIM    machhws[o][.]  Multiply Accumulate High Halfword to Word Saturate Signed
X     10000098  SR            LIM    machhwsu[o][.]  Multiply Accumulate High Halfword to Word Saturate Unsigned
X     10000018  SR            LIM    machhwu[o][.]
                                             Multiply Accumulate High Halfword to Word Modulo Unsigned
X     10000358  SR            LIM    maclhw[o][.]  Multiply Accumulate Low Halfword to Word Modulo Signed
X     100003D8  SR            LIM    maclhws[o][.]  Multiply Accumulate Low Halfword to Word Saturate Signed
X     10000398  SR            LIM    maclhwsu[o][.]  Multiply Accumulate Low Halfword to Word Saturate Unsigned
X     10000318  SR            LIM    maclhwu[o][.]  Multiply Accumulate Low Halfword to Word Modulo Unsigned
XFX   7C0006AC                E      mbar  Memory Barrier
X     7C000400                B      mcrxr  Move To Condition Register From XER
XFX   7C000026                B      mfcr  Move From Condition Register
XFX   7C000286         P      E      mfdcr  Move From Device Control Register
XFX   7C000246         P      E      mfdcrux  Move From Device Control Register User-mode Indexed
XFX   7C000206         P      E      mfdcrx  Move From Device Control Register Indexed
X     7C0000A6         P      B      mfmsr  Move From Machine State Register
XFX   7C100026                B      mfocrf  Move From One Condition Register Field
XFX   7C00029C         O      E.PM   mfpmr  Move From Performance Monitor Register
XFX   7C0002A6         O      B      mfspr  Move From Special Purpose Register
VX    10000604                V      mfvscr  Move from Vector Status and Control Register
X     7C0001DC         P      E.PC   msgclr  Message Clear
X     7C00019C         P      E.PC   msgsnd  Message Send
XFX   7C000120                B      mtcrf  Move To Condition Register Fields
XFX   7C000386         P      E      mtdcr  Move To Device Control Register
X     7C000346                E      mtdcrux  Move To Device Control Register User-mode Indexed
X     7C000306         P      E      mtdcrx  Move To Device Control Register Indexed
X     7C000124         P      E      mtmsr  Move To Machine State Register
XFX   7C100120                B      mtocrf  Move To One Condition Register Field
XFX   7C00039C         O      E.PM   mtpmr  Move To Performance Monitor Register
XFX   7C0003A6         O      B      mtspr  Move To Special Purpose Register
VX    10000644                V      mtvscr  Move to Vector Status and Control Register
X     10000150  SR            LIM    mulchw[o][.]  Multiply Cross Halfword to Word Signed
X     10000110  SR            LIM    mulchwu[o][.]  Multiply Cross Halfword to Word Unsigned
XO    7C000092  SR            64     mulhd[.]  Multiply High Doubleword
XO    7C000012  SR            64     mulhdu[.]  Multiply High Doubleword Unsigned
X     10000050  SR            LIM    mulhhw[o][.]
                                             Multiply High Halfword to Word Signed
X     10000010  SR            LIM    mulhhwu[o][.]  Multiply High Halfword to Word Unsigned
XO    7C000096  SR            B      mulhw[.]  Multiply High Word
XO    7C000016  SR            B      mulhwu[.]  Multiply High Word Unsigned
XO    7C0001D2  SR            64     mulld[o][.]  Multiply Low Doubleword
XO    7C0001D6  SR            B      mullw[o][.]  Multiply Low Word
X     7C0003B8  SR            B      nand[.]  NAND
X     7C0000D0  SR            B      neg[o][.]  Negate
X     1000015C  SR            LIM    nmacchw[o][.]  Negative Multiply Accumulate Cross Halfword to Word Modulo Signed
X     100001DC  SR            LIM    nmacchws[o][.]  Negative Multiply Accumulate Cross Halfword to Word Saturate Signed
X     1000005C  SR            LIM    nmachhw[o][.]  Negative Multiply Accumulate High Halfword to Word Modulo Signed
X     100000DC  SR            LIM    nmachhws[o][.]  Negative Multiply Accumulate High Halfword to Word Saturate Signed
X     1000035C  SR            LIM    nmaclhw[o][.]  Negative Multiply Accumulate Low Halfword to Word Modulo Signed
X     100003DC  SR            LIM    nmaclhws[o][.]  Negative Multiply Accumulate Low Halfword to Word Saturate Signed
X     7C0000F8  SR            B      nor[.]  NOR
X     7C000378  SR            B      or[.]  OR
X     7C000338  SR            B      orc[.]  OR with Complement
X     7C0000F4                B      popcntb  Population Count Bytes
RR    0400----                VLE    se_add  Add Short Form
OIM5  2000----                VLE    se_addi  Add Immediate Short Form
RR    4600----  SR            VLE    se_and[.]
                                             AND Short Form
RR    4500----                VLE    se_andc  AND with Complement Short Form
IM5   2E00----                VLE    se_andi  AND Immediate Short Form
BD8   E800----                VLE    se_b[l]  Branch [and Link]
BD8   E000----                VLE    se_bc  Branch Conditional Short Form
IM5   6000----                VLE    se_bclri  Bit Clear Immediate
C     0006----                VLE    se_bctr  Branch To Count Register [and Link]
IM5   6200----                VLE    se_bgeni  Bit Generate Immediate
C     0004----                VLE    se_blr  Branch To Link Register [and Link]
IM5   2C00----                VLE    se_bmaski  Bit Mask Generate Immediate
IM5   6400----                VLE    se_bseti  Bit Set Immediate
IM5   6600----                VLE    se_btsti  Bit Test Immediate
RR    0C00----                VLE    se_cmp  Compare Word
RR    0E00----                VLE    se_cmph  Compare Halfword Short Form
RR    0F00----                VLE    se_cmphl  Compare Halfword Logical Short Form
IM5   2A00----                VLE    se_cmpi  Compare Immediate Word Short Form
RR    0D00----                VLE    se_cmpl  Compare Logical Word
OIM5  2200----                VLE    se_cmpli  Compare Logical Immediate Word
R     00D0----                VLE    se_extsb  Extend Sign Byte Short Form
R     00F0----                VLE    se_extsh  Extend Sign Halfword Short Form
R     00C0----                VLE    se_extzb  Extend Zero Byte
R     00E0----                VLE    se_extzh  Extend Zero Halfword
C     0000----                VLE    se_illegal  Illegal
C     0001----                VLE    se_isync  Instruction Synchronize
SD4   8000----                VLE    se_lbz  Load Byte and Zero Short Form
SD4   A000----                VLE    se_lhz  Load Halfword and Zero Short Form
IM7   4800----                VLE    se_li  Load Immediate Short Form
SD4   C000----                VLE    se_lwz  Load Word and Zero Short Form
RR    0300----                VLE    se_mfar  Move from Alternate Register
R     00A0----                VLE    se_mfctr  Move From Count Register
R     0080----                VLE    se_mflr  Move From Link Register
RR    0100----                VLE    se_mr  Move Register
RR    0200----                VLE    se_mtar  Move To Alternate Register
R     00B0----                VLE    se_mtctr  Move To Count Register
R     0090----                VLE    se_mtlr  Move To Link Register
RR    0500----                VLE    se_mullw  Multiply Low Word Short Form
R     0030----                VLE    se_neg  Negate Short Form
R     0020----                VLE    se_not  NOT Short Form
RR    4400----                VLE    se_or  OR Short Form
C     0009----         P      VLE    se_rfci  Return From Critical Interrupt
C     000A----         P      VLE    se_rfdi  Return From Debug Interrupt
C     0008----         P      VLE    se_rfi  Return from Interrupt
C     000B----         P      VLE    se_rfmci  Return From Machine Check Interrupt
C     0002----                VLE    se_sc  System Call
RR    4200----                VLE    se_slw  Shift Left Word
IM5   6C00----                VLE    se_slwi  Shift Left Word Immediate Short Form
RR    4100----  SR            VLE    se_sraw  Shift Right Algebraic Word
IM5   6A00----  SR            VLE    se_srawi  Shift Right Algebraic Immediate
RR    4000----                VLE    se_srw  Shift Right Word
IM5   6800----                VLE    se_srwi  Shift Right Word Immediate Short Form
SD4   9000----                VLE    se_stb  Store Byte Short Form
SD4   B000----                VLE    se_sth  Store Halfword Short Form
SD4   D000----                VLE    se_stw  Store Word Short Form
RR    0600----                VLE    se_sub  Subtract
RR    0700----                VLE    se_subf  Subtract From Short Form
OIM5  2400----  SR            VLE    se_subi[.]  Subtract Immediate
X     7C000036  SR            64     sld[.]  Shift Left Doubleword
X     7C000030  SR            B      slw[.]  Shift Left Word
X     7C000634  SR            64     srad[.]  Shift Right Algebraic Doubleword
X     7C000674  SR            64     sradi[.]  Shift Right Algebraic Doubleword Immediate
X     7C000630  SR            B      sraw[.]  Shift Right Algebraic Word
X     7C000670  SR            B      srawi[.]  Shift Right Algebraic Word Immediate
X     7C000436  SR            64     srd[.]  Shift Right Doubleword
X     7C000430  SR            B      srw[.]  Shift Right Word
X     7C0001BE         P      E.PD   stbepx  Store Byte by External Process ID Indexed
X     7C0001EE                B      stbux  Store Byte with Update Indexed
X     7C0001AE                B      stbx  Store Byte Indexed
X     7C0001AD                64     stdcx.
Store Doubleword Conditional Indexed X 7C00013A P E.PD stdepx Store Doubleword by External Process ID Indexed X 7C00016A 64 stdux Store Doubleword with Update Indexed X 7C00012A 64 stdx Store Doubleword Indexed X 7C0005BE P E.PD stfdepx Store Floating-Point Double by External Process ID Indexed X 7C00072C B sthbrx Store Halfword Byte-Reversed Indexed X 7C00033E P E.PD sthepx Store Halfword by External Process ID Indexed X 7C00036E B sthux Store Halfword with Update Indexed X 7C00032E B sthx Store Halfword Indexed X 7C0005AA MA stswi Store String Word Immediate X 7C00052A MA stswx Store String Word Indexed VX 7C00010E V stvebx Store Vector Element Byte Indexed VX 7C00014E V stvehx Store Vector Element Halfword Indexed X 7C00064E P E.PD stvepx Store Vector by External Process ID Indexed X 7C00060E P E.PD stvepxl Store Vector by External Process ID Indexed LRU VX 7C00018E V stvewx Store Vector Element Word Indexed VX 7C0001CE V stvx[l] Store Vector Indexed [Last] X 7C00052C B stwbrx Store Word Byte-Reversed Indexed X 7C00012D B stwcx. Store Word Conditional Indexed X 7C00013E P E.PD stwepx Store Word by External Process ID Indexed X 7C00016E B stwux Store Word with Update Indexed X 7C00012E B stwx Store Word Indexed XO 7C000050 SR B subf[o][.] Subtract From XO 7C000010 SR B subfc[o][.] Subtract From Carrying XO 7C000110 SR B subfe[o][.] Subtract From Extended XO 7C0001D0 SR B subfme[o][.] Subtract From Minus One Extended XO 7C000190 SR B subfze[o][.] 
Subtract From Zero Extended X 7C0004AC B sync Synchronize X 7C000088 64 td Trap Doubleword X 7C000624 P E tlbivax TLB Invalidate Virtual Address Indexed X 7C000764 P E tlbre TLB Read Entry X 7C000724 P E tlbsx TLB Search Indexed X 7C00046C P E tlbsync TLB Synchronize X 7C0007A4 P E tlbwe TLB Write Entry 726 Power ISATM -- Book VLE Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 X 7C000008 B tw Trap Word VX 10000180 V vaddcuw Vector Add Carryout Unsigned Word VX 1000000A V vaddfp Vector Add Floating-Point VX 10000300 V vaddsbs Vector Add Signed Byte Saturate VX 10000340 V vaddshs Vector Add Signed Halfword Saturate VX 10000380 V vaddsws Vector Add Signed Word Saturate VX 10000000 V vaddubm Vector Add Unsigned Byte Modulo VX 10000200 V vaddubs Vector Add Unsigned Byte Saturate VX 10000040 V vadduhm Vector Add Unsigned Halfword Modulo VX 10000240 V vadduhs Vector Add Unsigned Halfword Saturate VX 10000080 V vadduwm Vector Add Unsigned Word Modulo VX 10000280 V vadduws Vector Add Unsigned Word Saturate VX 10000404 V vand Vector AND VX 10000444 V vandc Vector AND with Complement VX 10000502 V vavgsb Vector Average Signed Byte VX 10000542 V vavgsh Vector Average Signed Halfword VX 10000582 V vavgsw Vector Average Signed Word VX 10000402 V vavgub Vector Average Unsigned Byte VX 10000442 V vavguh Vector Average Unsigned Halfword VX 10000482 V vavguw Vector Average Unsigned Word VX 100003CA V vcfpsxws Vector Convert from Single-Precision to Signed Fixed-Point Word Saturate VX 1000038A V vcfpuxws Vector Convert from Single-Precision to Unsigned Fixed- Point Word Saturate VX 100003C6 V vcmpbfp[.] Vector Compare Bounds Single-Precision VC 100000C6 V vcmpeqfp[.] Vector Compare Equal To Single-Precision VC 10000006 V vcmpequb[.] Vector Compare Equal To Unsigned Byte VC 10000046 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword VC 10000086 V vcmpequw[.] Vector Compare Equal To Unsigned Word VC 100001C6 V vcmpgefp[.] 
Vector Compare Greater Than or Equal To Single-Precision VC 100002C6 V vcmpgtfp[.] Vector Compare Greater Than Single-Precision VC 10000306 V vcmpgtsb[.] Vector Compare Greater Than Signed Byte VC 10000346 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword VC 10000386 V vcmpgtsw[.] Vector Compare Greater Than Signed Word VC 10000206 V vcmpgtub[.] Vector Compare Greater Than Unsigned Byte VC 10000246 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword VC 10000286 V vcmpgtuw[.] Vector Compare Greater Than Unsigned Word VX 1000034A V vcsxwfp Vector Convert from Signed Fixed-Point Word to Single- Precision VX 1000030A V vcuxwfp Vector Convert from Unsigned Fixed-Point Word to Single- Precision VX 1000018A V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point VX 100001CA V vlogefp Vector Log Base 2 Estimate Floating-Point VA 1000002E V vmaddfp Vector Multiply-Add Single-Precision VX 1000040A V vmaxfp Vector Maximum Single-Precision VX 10000102 V vmaxsb Vector Maximum Signed Byte VX 10000142 V vmaxsh Vector Maximum Signed Halfword VX 10000182 V vmaxsw Vector Maximum Signed Word VX 10000002 V vmaxub Vector Maximum Unsigned Byte VX 10000042 V vmaxuh Vector Maximum Unsigned Halfword VX 10000082 V vmaxuw Vector Maximum Unsigned Word VA 10000020 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate VA 10000021 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate VX 1000044A V vminfp Vector Minimum Single-Precision VX 10000302 V vminsb Vector Minimum Signed Byte VX 10000342 V vminsh Vector Minimum Signed Halfword VX 10000382 V vminsw Vector Minimum Signed Word VX 10000202 V vminub Vector Minimum Unsigned Byte VX 10000242 V vminuh Vector Minimum Unsigned Halfword VX 10000282 V vminuw Vector Minimum Unsigned Word VA 10000022 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo Appendix A. 
VLE Instruction Set Sorted by Mnemonic 727 Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VX 1000000C V vmrghb Vector Merge High Byte VX 1000004C V vmrghh Vector Merge High Halfword VX 1000008C V vmrghw Vector Merge High Word VX 1000010C V vmrglb Vector Merge Low Byte VX 1000014C V vmrglh Vector Merge Low Halfword VX 1000018C V vmrglw Vector Merge Low Word VA 10000025 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo VA 10000028 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo VA 10000029 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate VA 10000024 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo VA 10000026 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo VA 10000027 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate VX 10000308 V vmulesb Vector Multiply Even Signed Byte VX 10000348 V vmulesh Vector Multiply Even Signed Halfword VX 10000208 V vmuleub Vector Multiply Even Unsigned Byte VX 10000248 V vmuleuh Vector Multiply Even Unsigned Halfword VX 10000108 V vmulosb Vector Multiply Odd Signed Byte VX 10000148 V vmulosh Vector Multiply Odd Signed Halfword VX 10000008 V vmuloub Vector Multiply Odd Unsigned Byte VX 10000048 V vmulouh Vector Multiply Odd Unsigned Halfword VA 1000002F V vnmsubfp Vector Negative Multiply-Subtract Single-Precision VX 10000504 V vnor Vector NOR VX 10000484 V vor Vector OR VA 1000002B V vperm Vector Permute VX 1000030E V vpkpx Vector Pack Pixel VX 1000018E V vpkshss Vector Pack Signed Halfword Signed Saturate VX 1000010E V vpkshus Vector Pack Signed Halfword Unsigned Saturate VX 100001CE V vpkswss Vector Pack Signed Word Signed Saturate VX 1000014E V vpkswus Vector Pack Signed Word Unsigned Saturate VX 1000000E V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo VX 1000008E V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate VX 1000004E V vpkuwum Vector Pack Unsigned Word Unsigned Modulo VX 100000CE V vpkuwus Vector Pack Unsigned Word Unsigned Saturate VX 1000010A 
V vrefp Vector Reciprocal Estimate Single-Precision VX 100002CA V vrfim Vector Round to Single-Precision Integer toward -Infinity VX 1000020A V vrfin Vector Round to Single-Precision Integer Nearest VX 1000028A V vrfip Vector Round to Single-Precision Integer toward +Infinity VX 1000024A V vrfiz Vector Round to Single-Precision Integer toward Zero VX 10000004 V vrlb Vector Rotate Left Byte VX 10000044 V vrlh Vector Rotate Left Halfword VX 10000084 V vrlw Vector Rotate Left Word VX 1000014A V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision VA 1000002A V vsel Vector Select VX 100001C4 V vsl Vector Shift Left VX 10000104 V vslb Vector Shift Left Byte VA 1000002C V vsldoi Vector Shift Left Double by Octet Immediate VX 10000144 V vslh Vector Shift Left Halfword VX 1000040C V vslo Vector Shift Left by Octet VX 10000184 V vslw Vector Shift Left Word VX 1000020C V vspltb Vector Splat Byte VX 1000024C V vsplth Vector Splat Halfword VX 1000030C V vspltisb Vector Splat Immediate Signed Byte VX 1000034C V vspltish Vector Splat Immediate Signed Halfword VX 1000038C V vspltisw Vector Splat Immediate Signed Word VX 1000028C V vspltw Vector Splat Word VX 100002C4 V vsr Vector Shift Right VX 10000304 V vsrab Vector Shift Right Algebraic Word VX 10000344 V vsrah Vector Shift Right Algebraic Word VX 10000384 V vsraw Vector Shift Right Algebraic Word VX 10000204 V vsrb Vector Shift Right Byte VX 10000244 V vsrh Vector Shift Right Halfword 728 Power ISATM -- Book VLE Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VX 1000044C V vsro Vector Shift Right by Octet VX 10000284 V vsrw Vector Shift Right Word VX 10000580 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word VX 1000004A V vsubfp Vector Subtract Single-Precision VX 10000700 V vsubsbs Vector Subtract Signed Byte Saturate VX 10000740 V vsubshs Vector Subtract Signed Halfword Saturate VX 10000780 V vsubsws Vector Subtract Signed Word Saturate VX 10000400 V vsububm Vector 
Subtract Unsigned Byte Modulo VX 10000600 V vsububs Vector Subtract Unsigned Byte Saturate VX 10000440 V vsubuhm Vector Subtract Unsigned Byte Modulo VX 10000640 V vsubuhs Vector Subtract Unsigned Halfword Saturate VX 10000480 V vsubuwm Vector Subtract Unsigned Word Modulo VX 10000680 V vsubuws Vector Subtract Unsigned Word Saturate VX 10000688 V vsum2sws Vector Sum across Half Signed Word Saturate VX 10000708 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate VX 10000648 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate VX 10000608 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate VX 10000788 V vsumsws Vector Sum across Signed Word Saturate VX 1000034E V vupkhpx Vector Unpack High Pixel VX 1000020E V vupkhsb Vector Unpack High Signed Byte VX 1000024E V vupkhsh Vector Unpack High Signed Halfword VX 100003CE V vupklpx Vector Unpack Low Pixel VX 1000028E V vupklsb Vector Unpack Low Signed Byte VX 100002CE V vupklsh Vector Unpack Low Signed Halfword VX 100004C4 V vxor Vector XOR X 7C00007C WT wait Wait X 7C000106 P E wrtee Write MSR External Enable X 7C000146 P E wrteei Write MSR External Enable Immediate D 7C000278 SR B xor[.] XOR 1 See the key to the mode dependency and privilege columns on page 839 and the key to the category column in Section 1.3.5 of Book I. 2 For 16-bit instructions, the "Opcode" column represents the 16-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits; dashes are used following the opcode to indicate the form is a 16-bit instruction. For 32-bit instructions, the "Opcode" column represents the 32-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits. Appendix A. VLE Instruction Set Sorted by Mnemonic 729 Version 2.04 730 Power ISATM -- Book VLE Version 2.04 Appendix B. 
VLE Instruction Set Sorted by Opcode This appendix lists all the instructions available in VLE mode in the Power ISA , in order by opcode. Opcodes that are not defined below are treated as illegal by category VLE. Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 C 0000---- VLE se_illegal Illegal C 0001---- VLE se_isync Instruction Synchronize C 0002---- VLE se_sc System Call C 0004---- VLE se_blr Branch To Link Register [and Link] C 0006---- VLE se_bctr Branch To Count Register [and Link] C 0008---- P VLE se_rfi Return from Interrupt C 0009---- P VLE se_rfci Return From Critical Interrupt C 000A---- P VLE se_rfdi Return From Debug Interrupt C 000B---- P VLE se_rfmci Return From Machine Check Interrupt R 0020---- VLE se_not NOT Short Form R 0030---- VLE se_neg Negate Short Form R 0080---- VLE se_mflr Move From Link Register R 0090---- VLE se_mtlr Move To Link Register R 00A0---- VLE se_mfctr Move From Count Register R 00B0---- VLE se_mtctr Move To Count Register R 00C0---- VLE se_extzb Extend Zero Byte R 00D0---- VLE se_extsb Extend Sign Byte Short Form R 00E0---- VLE se_extzh Extend Zero Halfword R 00F0---- VLE se_extsh Extend Sign Halfword Short Form RR 0100---- VLE se_mr Move Register RR 0200---- VLE se_mtar Move To Alternate Register RR 0300---- VLE se_mfar Move from Alternate Register RR 0400---- VLE se_add Add Short Form RR 0500---- VLE se_mullw Multiply Low Word Short Form RR 0600---- VLE se_sub Subtract RR 0700---- VLE se_subf Subtract From Short Form RR 0C00---- VLE se_cmp Compare Word RR 0D00---- VLE se_cmpl Compare Logical Word RR 0E00---- VLE se_cmph Compare Halfword Short Form RR 0F00---- VLE se_cmphl Compare Halfword Logical Short Form VX 10000000 V vaddubm Vector Add Unsigned Byte Modulo VX 10000002 V vmaxub Vector Maximum Unsigned Byte VX 10000004 V vrlb Vector Rotate Left Byte VC 10000006 V vcmpequb[.] 
Vector Compare Equal To Unsigned Byte VX 10000008 V vmuloub Vector Multiply Odd Unsigned Byte VX 1000000A V vaddfp Vector Add Floating-Point VX 1000000C V vmrghb Vector Merge High Byte VX 1000000E V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo X 10000010 SR LIM mulhhwu[o][.] Multiply High Halfword to Word Unsigned X 10000018 SR LIM machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned VA 10000020 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate VA 10000021 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate VA 10000022 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo VA 10000024 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo Appendix B. VLE Instruction Set Sorted by Opcode 731 Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VA 10000025 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo VA 10000026 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo VA 10000027 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate VA 10000028 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo VA 10000029 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate VA 1000002A V vsel Vector Select VA 1000002B V vperm Vector Permute VA 1000002C V vsldoi Vector Shift Left Double by Octet Immediate VA 1000002E V vmaddfp Vector Multiply-Add Single-Precision VA 1000002F V vnmsubfp Vector Negative Multiply-Subtract Single-Precision VX 10000040 V vadduhm Vector Add Unsigned Halfword Modulo VX 10000042 V vmaxuh Vector Maximum Unsigned Halfword VX 10000044 V vrlh Vector Rotate Left Halfword VC 10000046 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword VX 10000048 V vmulouh Vector Multiply Odd Unsigned Halfword VX 1000004A V vsubfp Vector Subtract Single-Precision VX 1000004C V vmrghh Vector Merge High Halfword VX 1000004E V vpkuwum Vector Pack Unsigned Word Unsigned Modulo X 10000050 SR LIM mulhhw[o][.] Multiply High Halfword to Word Signed X 10000058 SR LIM machhw[o][.] 
Multiply Accumulate High Halfword to Word Modulo Signed X 1000005C SR LIM nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Mod- ulo Signed VX 10000080 V vadduwm Vector Add Unsigned Word Modulo VX 10000082 V vmaxuw Vector Maximum Unsigned Word VX 10000084 V vrlw Vector Rotate Left Word VC 10000086 V vcmpequw[.] Vector Compare Equal To Unsigned Word VX 1000008C V vmrghw Vector Merge High Word VX 1000008E V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate X 10000098 SR LIM machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned VC 100000C6 V vcmpeqfp[.] Vector Compare Equal To Single-Precision VX 100000CE V vpkuwus Vector Pack Unsigned Word Unsigned Saturate X 100000D8 SR LIM machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed X 100000DC SR LIM nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Satu- rate Signed VX 10000102 V vmaxsb Vector Maximum Signed Byte VX 10000104 V vslb Vector Shift Left Byte VX 10000108 V vmulosb Vector Multiply Odd Signed Byte VX 1000010A V vrefp Vector Reciprocal Estimate Single-Precision VX 1000010C V vmrglb Vector Merge Low Byte VX 1000010E V vpkshus Vector Pack Signed Halfword Unsigned Saturate X 10000110 SR LIM mulchwu[o][.] Multiply Cross Halfword to Word Unsigned X 10000118 SR LIM macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned VX 10000142 V vmaxsh Vector Maximum Signed Halfword VX 10000144 V vslh Vector Shift Left Halfword VX 10000148 V vmulosh Vector Multiply Odd Signed Halfword VX 1000014A V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision VX 1000014C V vmrglh Vector Merge Low Halfword VX 1000014E V vpkswus Vector Pack Signed Word Unsigned Saturate X 10000150 SR LIM mulchw[o][.] Multiply Cross Halfword to Word Signed X 10000158 SR LIM macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed X 1000015C SR LIM nmacchw[o][.] 
Negative Multiply Accumulate Cross Halfword to Word Mod- ulo Signed VX 10000180 V vaddcuw Vector Add Carryout Unsigned Word VX 10000182 V vmaxsw Vector Maximum Signed Word VX 10000184 V vslw Vector Shift Left Word VX 1000018A V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point VX 1000018C V vmrglw Vector Merge Low Word 732 Power ISATM -- Book VLE Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VX 1000018E V vpkshss Vector Pack Signed Halfword Signed Saturate X 10000198 SR LIM macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned VX 100001C4 V vsl Vector Shift Left VC 100001C6 V vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Precision VX 100001CA V vlogefp Vector Log Base 2 Estimate Floating-Point VX 100001CE V vpkswss Vector Pack Signed Word Signed Saturate X 100001D8 SR LIM macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed X 100001DC SR LIM nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Sat- urate Signed EVX 10000200 SP evaddw Vector Add Word VX 10000200 V vaddubs Vector Add Unsigned Byte Saturate EVX 10000202 SP evaddiw Vector Add Immediate Word VX 10000202 V vminub Vector Minimum Unsigned Byte EVX 10000204 SP evsubfw Vector Subtract from Word VX 10000204 V vsrb Vector Shift Right Byte EVX 10000206 SP evsubifw Vector Subtract Immediate from Word VC 10000206 V vcmpgtub[.] 
Vector Compare Greater Than Unsigned Byte EVX 10000208 SP evabs Vector Absolute Value VX 10000208 V vmuleub Vector Multiply Even Unsigned Byte EVX 10000209 SP evneg Vector Negate EVX 1000020A SP evextsb Vector Extend Sign Byte VX 1000020A V vrfin Vector Round to Single-Precision Integer Nearest EVX 1000020B SP evextsh Vector Extend Sign Halfword EVX 1000020C SP evrndw Vector Round Word VX 1000020C V vspltb Vector Splat Byte EVX 1000020D SP evcntlzw Vector Count Leading Zeros Bits Word EVX 1000020E SP evcntlsw Vector Count Leading Sign Bits Word VX 1000020E V vupkhsb Vector Unpack High Signed Byte EVX 1000020F SP brinc Bit Reverse Increment EVX 10000211 SP evand Vector AND EVX 10000212 SP evandc Vector AND with Complement EVX 10000216 SP evxor Vector XOR EVX 10000217 SP evor Vector OR EVX 10000218 SP evnor Vector NOR EVX 10000219 SP eveqv Vector Equivalent EVX 1000021B SP evorc Vector OR with Complement EVX 1000021E SP evnand Vector NAND EVX 10000220 SP evsrwu Vector Shift Right Word Unsigned EVX 10000221 SP evsrws Vector Shift Right Word Signed EVX 10000222 SP evsrwiu Vector Shift Right Word Immediate Unsigned EVX 10000223 SP evsrwis Vector Shift Right Word Immediate Signed EVX 10000224 SP evslw Vector Shift Left Word EVX 10000226 SP evslwi Vector Shift Left Word Immediate EVX 10000228 SP evrlw Vector Rotate Left Word EVX 10000229 SP evsplati Vector Splat Immediate EVX 1000022A SP evrlwi Vector Rotate Left Word Immediate EVX 1000022B SP evsplatfi Vector Splat Fractional Immediate EVX 1000022C SP evmergehi Vector Merge High EVX 1000022D SP evmergelo Vector Merge Low EVX 1000022E SP evmergehilo Vector Merge High/Low EVX 1000022F SP evmergelohi Vector Merge Low/High EVX 10000230 SP evcmpgtu Vector Compare Greater Than Unsigned EVX 10000231 SP evcmpgts Vector Compare Greater Than Signed EVX 10000232 SP evcmpltu Vector Compare Less Than Unsigned EVX 10000233 SP evcmplts Vector Compare Less Than Signed EVX 10000234 SP evcmpeq Vector Compare Equal VX 10000240 V vadduhs 
Vector Add Unsigned Halfword Saturate VX 10000242 V vminuh Vector Minimum Unsigned Halfword VX 10000244 V vsrh Vector Shift Right Halfword Appendix B. VLE Instruction Set Sorted by Opcode 733 Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VC 10000246 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword VX 10000248 V vmuleuh Vector Multiply Even Unsigned Halfword VX 1000024A V vrfiz Vector Round to Single-Precision Integer toward Zero VX 1000024C V vsplth Vector Splat Halfword VX 1000024E V vupkhsh Vector Unpack High Signed Halfword EVSE 10000278 SP evsel Vector Select L EVX 10000280 SP.FV evfsadd Vector Floating-Point Single-Precision Add VX 10000280 V vadduws Vector Add Unsigned Word Saturate EVX 10000281 SP.FV evfssub Vector Floating-Point Single-Precision Subtract VX 10000282 V vminuw Vector Minimum Unsigned Word EVX 10000284 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value VX 10000284 V vsrw Vector Shift Right Word EVX 10000285 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value EVX 10000286 SP.FV evfsneg Vector Floating-Point Single-Precision Negate VC 10000286 V vcmpgtuw[.] 
Vector Compare Greater Than Unsigned Word EVX 10000288 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply EVX 10000289 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide VX 1000028A V vrfip Vector Round to Single-Precision Integer toward +Infinity EVX 1000028C SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than VX 1000028C V vspltw Vector Splat Word EVX 1000028D SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than EVX 1000028E SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal VX 1000028E V vupklsb Vector Unpack Low Signed Byte EVX 10000290 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer EVX 10000291 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer EVX 10000292 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction EVX 10000293 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction EVX 10000294 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer EVX 10000295 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer EVX 10000296 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction EVX 10000297 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction EVX 10000298 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero EVX 1000029A SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero EVX 1000029C SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than EVX 1000029D SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than EVX 1000029E SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal VX 100002C4 V vsr Vector Shift Right VC 100002C6 V vcmpgtfp[.] 
Vector Compare Greater Than Single-Precision VX 100002CA V vrfim Vector Round to Single-Precision Integer toward -Infinity VX 100002CE V vupklsh Vector Unpack Low Signed Halfword EVX 100002CF SP.FD efscfd Floating-Point Single-Precision Convert from Double-Preci- sion EVX 100002E0 SP.FD efdadd Floating-Point Double-Precision Add EVX 100002E0 SP.FS efsadd Floating-Point Single-Precision Add EVX 100002E1 SP.FD efdsub Floating-Point Double-Precision Subtract EVX 100002E1 SP.FS efssub Floating-Point Single-Precision Subtract 734 Power ISATM -- Book VLE Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 100002E2 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword EVX 100002E2 SP.FS efscfuid Convert Floating-Point Single-Precision from Unsigned Inte- ger Doubleword EVX 100002E3 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Inte- ger Doubleword EVX 100002E3 SP.FS efscfsid Convert Floating-Point Single-Precision from Signed Integer Doubleword EVX 100002E4 SP.FD efdabs Floating-Point Double-Precision Absolute Value EVX 100002E4 SP.FS efsabs Floating-Point Single-Precision Absolute Value EVX 100002E5 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value EVX 100002E5 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value EVX 100002E6 SP.FD efdneg Floating-Point Double-Precision Negate EVX 100002E6 SP.FS efsneg Floating-Point Single-Precision Negate EVX 100002E8 SP.FD efdmul Floating-Point Double-Precision Multiply EVX 100002E8 SP.FS efsmul Floating-Point Single-Precision Multiply EVX 100002E9 SP.FD efddiv Floating-Point Double-Precision Divide EVX 100002E9 SP.FS efsdiv Floating-Point Single-Precision Divide EVX 100002EA SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Inte- ger Doubleword with Round Towards Zero EVX 100002EA SP.FS efsctuidz Convert Floating-Point Single-Precision to Unsigned Integer Doubleword with Round Towards Zero 
EVX 100002EB SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round Towards Zero EVX 100002EB SP.FS efsctsidz Convert Floating-Point Single-Precision to Signed Integer Doubleword with Round Towards Zero EVX 100002EC SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than EVX 100002EC SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than EVX 100002ED SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than EVX 100002ED SP.FS efscmplt Floating-Point Single-Precision Compare Less Than EVX 100002EE SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal EVX 100002EE SP.FS efscmpeq Floating-Point Single-Precision Compare Equal EVX 100002EF SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Preci- sion EVX 100002F0 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer EVX 100002F0 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Inte- ger EVX 100002F1 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Inte- ger EVX 100002F1 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer EVX 100002F2 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction EVX 100002F2 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction EVX 100002F3 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Frac- tion EVX 100002F3 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Frac- tion EVX 100002F4 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Inte- ger EVX 100002F4 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer EVX 100002F5 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer EVX 100002F5 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer EVX 100002F6 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Frac- tion EVX 100002F6 SP.FS efsctuf Convert 
Floating-Point Single-Precision to Unsigned Frac- tion EVX 100002F7 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction EVX 100002F7 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Fraction Appendix B. VLE Instruction Set Sorted by Opcode 735 Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 100002F8 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Inte- ger with Round Towards Zero EVX 100002F8 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero EVX 100002FA SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round Towards Zero EVX 100002FA SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero EVX 100002FC SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than EVX 100002FC SP.FS efststgt Floating-Point Single-Precision Test Greater Than EVX 100002FD SP.FD efdtstlt Floating-Point Double-Precision Test Less Than EVX 100002FD SP.FS efststlt Floating-Point Single-Precision Test Less Than EVX 100002FE SP.FD efdtsteq Floating-Point Double-Precision Test Equal EVX 100002FE SP.FS efststeq Floating-Point Single-Precision Test Equal EVX 10000300 SP evlddx Vector Load Doubleword into Doubleword Indexed VX 10000300 V vaddsbs Vector Add Signed Byte Saturate EVX 10000301 SP evldd Vector Load Doubleword into Doubleword EVX 10000302 SP evldwx Vector Load Doubleword into 2 Words Indexed VX 10000302 V vminsb Vector Minimum Signed Byte EVX 10000303 SP evldw Vector Load Doubleword into 2 Words EVX 10000304 SP evldhx Vector Load Doubleword into 4 Halfwords Indexed VX 10000304 V vsrab Vector Shift Right Algebraic Word EVX 10000305 SP evldh Vector Load Doubleword into 4 Halfwords VC 10000306 V vcmpgtsb[.] 
Vector Compare Greater Than Signed Byte EVX 10000308 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed VX 10000308 V vmulesb Vector Multiply Even Signed Byte EVX 10000309 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat VX 1000030A V vcuxwfp Vector Convert from Unsigned Fixed-Point Word to Single- Precision EVX 1000030C SP evlhhousplatx Vector Load Halfword into Halfwords Odd Unsigned and Splat Indexed VX 1000030C V vspltisb Vector Splat Immediate Signed Byte EVX 1000030D SP evlhhousplat Vector Load Halfword into Halfwords Odd Unsigned and Splat EVX 1000030E SP evlhhossplatx Vector Load Halfword into Halfwords Odd Signed and Splat Indexed VX 1000030E V vpkpx Vector Pack Pixel EVX 1000030F SP evlhhossplat Vector Load Halfword into Halfwords Odd and Splat EVX 10000310 SP evlwhex Vector Load Word into Two Halfwords Even Indexed EVX 10000311 SP evlwhe Vector Load Word into Two Halfwords Even EVX 10000314 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) EVX 10000315 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero- extended) EVX 10000316 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) EVX 10000317 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension) EVX 10000318 SP evlwwsplatx Vector Load Word into Word and Splat Indexed X 10000318 SR LIM maclhwu[o][.] 
Multiply Accumulate Low Halfword to Word Modulo Unsigned EVX 10000319 SP evlwwsplat Vector Load Word into Word and Splat EVX 1000031C SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed EVX 1000031D SP evlwhsplat Vector Load Word into Two Halfwords and Splat EVX 10000320 SP evstddx Vector Store Doubleword of Doubleword Indexed EVX 10000321 SP evstdd Vector Store Doubleword of Doubleword EVX 10000322 SP evstdwx Vector Store Doubleword of Two Words Indexed EVX 10000323 SP evstdw Vector Store Doubleword of Two Words EVX 10000324 SP evstdhx Vector Store Doubleword of Four Halfwords Indexed 736 Power ISATM -- Book VLE Version 2.04 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 10000325 SP evstdh Vector Store Doubleword of Four Halfwords EVX 10000330 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed EVX 10000331 SP evstwhe Vector Store Word of Two Halfwords from Even EVX 10000334 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed EVX 10000335 SP evstwho Vector Store Word of Two Halfwords from Odd EVX 10000338 SP evstwwex Vector Store Word of Word from Even Indexed EVX 10000339 SP evstwwe Vector Store Word of Word from Even EVX 1000033C SP evstwwox Vector Store Word of Word from Odd Indexed EVX 1000033D SP evstwwo Vector Store Word of Word from Odd VX 10000340 V vaddshs Vector Add Signed Halfword Saturate VX 10000342 V vminsh Vector Minimum Signed Halfword VX 10000344 V vsrah Vector Shift Right Algebraic Word VC 10000346 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword VX 10000348 V vmulesh Vector Multiply Even Signed Halfword VX 1000034A V vcsxwfp Vector Convert from Signed Fixed-Point Word to Single- Precision VX 1000034C V vspltish Vector Splat Immediate Signed Halfword VX 1000034E V vupkhpx Vector Unpack High Pixel X 10000358 SR LIM maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed X 1000035C SR LIM nmaclhw[o][.] 
Negative Multiply Accumulate Low Halfword to Word Modulo Signed
VX 10000380 V vaddsws Vector Add Signed Word Saturate
VX 10000382 V vminsw Vector Minimum Signed Word
VX 10000384 V vsraw Vector Shift Right Algebraic Word
VC 10000386 V vcmpgtsw[.] Vector Compare Greater Than Signed Word
VX 1000038A V vcfpuxws Vector Convert from Single-Precision to Unsigned Fixed-Point Word Saturate
VX 1000038C V vspltisw Vector Splat Immediate Signed Word
X 10000398 SR LIM maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned
VC 100003C6 V vcmpbfp[.] Vector Compare Bounds Single-Precision
VX 100003CA V vcfpsxws Vector Convert from Single-Precision to Signed Fixed-Point Word Saturate
VX 100003CE V vupklpx Vector Unpack Low Pixel
X 100003D8 SR LIM maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed
X 100003DC SR LIM nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed
VX 10000400 V vsububm Vector Subtract Unsigned Byte Modulo
VX 10000402 V vavgub Vector Average Unsigned Byte
EVX 10000403 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
VX 10000404 V vand Vector AND
EVX 10000407 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Fractional
EVX 10000408 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX 10000409 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer
VX 1000040A V vmaxfp Vector Maximum Single-Precision
EVX 1000040B SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX 1000040C SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
VX 1000040C V vslo Vector Shift Left by Octet
EVX 1000040D SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX 1000040F SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX 10000423 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX 10000427 SP evmhossfa Vector Multiply Halfwords, Odd,
Signed, Fractional to Accumulator
EVX 10000428 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX 10000429 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX 1000042B SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX 1000042C SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX 1000042D SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX 1000042F SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
VX 10000440 V vsubuhm Vector Subtract Unsigned Halfword Modulo
VX 10000442 V vavguh Vector Average Unsigned Halfword
VX 10000444 V vandc Vector AND with Complement
EVX 10000447 SP evmwhssf Vector Multiply Word High Signed, Fractional
EVX 10000448 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer
VX 1000044A V vminfp Vector Minimum Single-Precision
EVX 1000044C SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer
VX 1000044C V vsro Vector Shift Right by Octet
EVX 1000044D SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer
EVX 1000044F SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional
EVX 10000453 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional
EVX 10000458 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer
EVX 10000459 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer
EVX 1000045B SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional
EVX 10000467 SP evmwhssfa Vector Multiply Word High Signed, Fractional to Accumulator
EVX 10000468 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX 1000046C SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX 1000046D SP evmwhsmia Vector
Multiply Word High Signed, Modulo, Integer to Accumulator
EVX 1000046F SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX 10000473 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX 10000478 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX 10000479 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX 1000047B SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator
VX 10000480 V vsubuwm Vector Subtract Unsigned Word Modulo
VX 10000482 V vavguw Vector Average Unsigned Word
VX 10000484 V vor Vector OR
EVX 100004C0 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX 100004C1 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word
EVX 100004C2 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX 100004C3 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX 100004C4 SP evmra Initialize Accumulator
VX 100004C4 V vxor Vector XOR
EVX 100004C6 SP evdivws Vector Divide Word Signed
EVX 100004C7 SP evdivwu Vector Divide Word Unsigned
EVX 100004C8 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX 100004C9 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word
EVX 100004CA SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX 100004CB SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX 10000500 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate Integer and Accumulate into Words
EVX 10000501 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
VX 10000502 V vavgsb Vector Average Signed Byte
EVX 10000503 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate,
Fractional and Accumulate into Words
EVX 10000504 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
VX 10000504 V vnor Vector NOR
EVX 10000505 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX 10000507 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX 10000508 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX 10000509 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX 1000050B SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX 1000050C SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX 1000050D SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX 1000050F SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX 10000528 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 10000529 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX 1000052B SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 1000052C SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 1000052D SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX 1000052F SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 10000540 SP evmwlusiaaw Vector Multiply Word Low Unsigned Saturate, Integer and Accumulate into Words
EVX 10000541 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
VX 10000542 V vavgsh Vector Average
Signed Halfword
EVX 10000544 SP evmwhusiaaw Vector Multiply Word High Unsigned, Integer and Accumulate into Words
EVX 10000547 SP evmwhssfaaw Vector Multiply Word High Signed, Fractional and Accumulate into Words
EVX 10000548 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX 10000549 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX 1000054C SP evmwhumiaaw Vector Multiply Word High Unsigned, Modulo, Integer and Accumulate into Words
EVX 1000054D SP evmwhsmiaaw Vector Multiply Word High Signed, Modulo, Integer and Accumulate into Words
EVX 1000054F SP evmwhsmfaaw Vector Multiply Word High Signed, Modulo, Fractional and Accumulate into Words
EVX 10000553 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX 10000558 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX 10000559 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX 1000055B SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX 10000580 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate Integer and Accumulate Negative into Words
VX 10000580 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word
EVX 10000581 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
VX 10000582 V vavgsw Vector Average Signed Word
EVX 10000583 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 10000584 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 10000585 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 10000587 SP
evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 10000588 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 10000589 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 1000058B SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 1000058C SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 1000058D SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 1000058F SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 100005A8 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 100005A9 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 100005AB SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 100005AC SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 100005AD SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 100005AF SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 100005C0 SP evmwlusianw Vector Multiply Word Low Unsigned Saturate, Integer and Accumulate Negative into Words
EVX 100005C1 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative into Words
EVX 100005C4 SP evmwhusianw Vector Multiply Word High Unsigned, Integer and Accumulate Negative into Words
EVX 100005C5 SP evmwhssianw Vector Multiply Word High Signed, Integer and Accumulate Negative into Words
EVX 100005C7 SP evmwhssfanw Vector Multiply Word High Signed, Fractional and Accumulate Negative into Words
EVX 100005C8 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 100005C9 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative into Words
EVX 100005CC SP evmwhumianw Vector Multiply Word High Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 100005CD SP evmwhsmianw Vector Multiply Word High Signed, Modulo, Integer and Accumulate Negative into Words
EVX 100005CF SP evmwhsmfanw Vector Multiply Word High Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 100005D3 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX 100005D8 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX 100005D9 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX 100005DB SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
VX 10000600 V vsububs Vector Subtract Unsigned Byte Saturate
VX 10000604 V mfvscr Move from Vector Status and Control Register
VX 10000608 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate
VX 10000640 V vsubuhs Vector Subtract Unsigned Halfword Saturate
VX 10000644 V mtvscr Move to Vector Status and Control Register
VX 10000648 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate
VX 10000680 V vsubuws Vector Subtract Unsigned Word Saturate
VX 10000688 V vsum2sws Vector Sum across Half Signed Word Saturate
VX 10000700 V vsubsbs Vector Subtract Signed Byte Saturate
VX 10000708 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate
VX 10000740 V vsubshs Vector Subtract Signed Halfword Saturate
VX 10000780 V vsubsws Vector Subtract Signed Word Saturate
VX 10000788 V vsumsws
Vector Sum across Signed Word Saturate
D8 18000000 VLE e_lbzu Load Byte and Zero with Update
D8 18000100 VLE e_lhzu Load Halfword and Zero with Update
D8 18000200 VLE e_lwzu Load Word and Zero with Update
D8 18000300 VLE e_lhau Load Halfword Algebraic with Update
D8 18000400 VLE e_stbu Store Byte with Update
D8 18000500 VLE e_sthu Store Halfword with Update
D8 18000600 VLE e_stwu Store Word with Update
D8 18000800 VLE e_lmw Load Multiple Word
D8 18000900 VLE e_stmw Store Multiple Word
SCI8 18008000 SR VLE e_addi[.] Add Scaled Immediate
SCI8 18009000 SR VLE e_addic[.] Add Scaled Immediate Carrying
SCI8 1800A000 VLE e_mulli Multiply Low Scaled Immediate
SCI8 1800A800 VLE e_cmpi Compare Scaled Immediate Word
SCI8 1800B000 SR VLE e_subfic[.] Subtract From Scaled Immediate Carrying
SCI8 1800C000 SR VLE e_andi[.] AND Scaled Immediate
SCI8 1800D000 SR VLE e_ori[.] OR Scaled Immediate
SCI8 1800E000 SR VLE e_xori[.] XOR Scaled Immediate
SCI8 1880A800 VLE e_cmpli Compare Logical Scaled Immediate Word
D 1C000000 VLE e_add16i Add Immediate
OIM5 2000---- VLE se_addi Add Immediate Short Form
OIM5 2200---- VLE se_cmpli Compare Logical Immediate Word
OIM5 2400---- SR VLE se_subi[.] Subtract Immediate
IM5 2A00---- VLE se_cmpi Compare Immediate Word Short Form
IM5 2C00---- VLE se_bmaski Bit Mask Generate Immediate
IM5 2E00---- VLE se_andi AND Immediate Short Form
D 30000000 VLE e_lbz Load Byte and Zero
D 34000000 VLE e_stb Store Byte
D 38000000 VLE e_lha Load Halfword Algebraic
RR 4000---- VLE se_srw Shift Right Word
RR 4100---- SR VLE se_sraw Shift Right Algebraic Word
RR 4200---- VLE se_slw Shift Left Word
RR 4400---- VLE se_or OR Short Form
RR 4500---- VLE se_andc AND with Complement Short Form
RR 4600---- SR VLE se_and[.] AND Short Form
IM7 4800---- VLE se_li Load Immediate Short Form
D 50000000 VLE e_lwz Load Word and Zero
D 54000000 VLE e_stw Store Word
D 58000000 VLE e_lhz Load Halfword and Zero
D 5C000000 VLE e_sth Store Halfword
IM5 6000---- VLE se_bclri Bit Clear Immediate
IM5 6200---- VLE se_bgeni Bit Generate Immediate
IM5 6400---- VLE se_bseti Bit Set Immediate
IM5 6600---- VLE se_btsti Bit Test Immediate
IM5 6800---- VLE se_srwi Shift Right Word Immediate Short Form
IM5 6A00---- SR VLE se_srawi Shift Right Algebraic Immediate
IM5 6C00---- VLE se_slwi Shift Left Word Immediate Short Form
LI20 70000000 VLE e_li Load Immediate
I16A 70008800 SR VLE e_add2i. Add (2 operand) Immediate and Record
I16A 70009000 VLE e_add2is Add (2 operand) Immediate Shifted
IA16 70009800 VLE e_cmp16i Compare Immediate Word
I16A 7000A000 VLE e_mull2i Multiply (2 operand) Low Immediate
I16A 7000A800 VLE e_cmpl16i Compare Logical Immediate Word
IA16 7000B000 VLE e_cmph16i Compare Halfword Immediate
IA16 7000B800 VLE e_cmphl16i Compare Halfword Logical Immediate
I16L 7000C000 VLE e_or2i OR (2 operand) Immediate
I16L 7000C800 SR VLE e_and2i. AND (2 operand) Immediate
I16L 7000D000 VLE e_or2is OR (2 operand) Immediate Shifted
I16L 7000E000 VLE e_lis Load Immediate Shifted
I16L 7000E800 SR VLE e_and2is. AND (2 operand) Immediate Shifted
M 74000000 VLE e_rlwimi Rotate Left Word Immediate then Mask Insert
M 74000001 VLE e_rlwinm Rotate Left Word Immediate then AND with Mask
BD24 78000000 VLE e_b[l] Branch [and Link]
BD15 7A000000 CT VLE e_bc[l] Branch Conditional [and Link]
X 7C000000 B cmp Compare
X 7C000008 B tw Trap Word
X 7C00000C V lvsl Load Vector for Shift Left Indexed
X 7C00000E V lvebx Load Vector Element Byte Indexed
XO 7C000010 SR B subfc[o][.] Subtract From Carrying
XO 7C000012 SR 64 mulhdu[.] Multiply High Doubleword Unsigned
XO 7C000014 B addc[o][.] Add Carrying
XO 7C000016 SR B mulhwu[.]
Multiply High Word Unsigned
X 7C00001C VLE e_cmph Compare Halfword
A 7C00001E B.in isel Integer Select
XL 7C000020 VLE e_mcrf Move CR Field
XFX 7C000026 B mfcr Move From Condition Register
X 7C000028 B lwarx Load Word and Reserve Indexed
X 7C00002A 64 ldx Load Doubleword Indexed
X 7C00002C E icbt Instruction Cache Block Touch
X 7C00002E B lwzx Load Word and Zero Indexed
X 7C000030 SR B slw[.] Shift Left Word
X 7C000034 SR B cntlzw[.] Count Leading Zeros Word
X 7C000036 SR 64 sld[.] Shift Left Doubleword
X 7C000038 SR B and[.] AND
X 7C00003A P E.PD ldepx Load Doubleword by External Process ID Indexed
X 7C00003E P E.PD lwepx Load Word by External Process ID Indexed
X 7C000040 B cmpl Compare Logical
XL 7C000042 VLE e_crnor Condition Register NOR
X 7C00004C V lvsr Load Vector for Shift Right Indexed
X 7C00004E V lvehx Load Vector Element Halfword Indexed
XO 7C000050 SR B subf[o][.] Subtract From
X 7C00005C VLE e_cmphl Compare Halfword Logical
X 7C00006A 64 ldux Load Doubleword with Update Indexed
X 7C00006C B dcbst Data Cache Block Store
X 7C00006E B lwzux Load Word and Zero with Update Indexed
X 7C000070 SR VLE e_slwi[.] Shift Left Word Immediate
X 7C000074 SR 64 cntlzd[.] Count Leading Zeros Doubleword
X 7C000078 SR B andc[.] AND with Complement
X 7C00007C WT wait Wait
X 7C000088 64 td Trap Doubleword
X 7C00008E V lvewx Load Vector Element Word Indexed
XO 7C000092 SR 64 mulhd[.] Multiply High Doubleword
XO 7C000096 SR B mulhw[.] Multiply High Word
X 7C0000A6 P B mfmsr Move From Machine State Register
X 7C0000A8 64 ldarx Load Doubleword and Reserve Indexed
X 7C0000AC B dcbf Data Cache Block Flush
X 7C0000AE B lbzx Load Byte and Zero Indexed
X 7C0000BE P E.PD lbepx Load Byte by External Process ID Indexed
X 7C0000CE V lvx[l] Load Vector Indexed [Last]
X 7C0000D0 SR B neg[o][.]
Negate
X 7C0000EE B lbzux Load Byte and Zero with Update Indexed
X 7C0000F4 B popcntb Population Count Bytes
X 7C0000F8 SR B nor[.] NOR
X 7C0000FE P E.PD dcbfep Data Cache Block Flush by External Process ID
XL 7C000102 VLE e_crandc Condition Register AND with Complement
X 7C000106 P E wrtee Write MSR External Enable
X 7C00010C M ECL dcbtstls Data Cache Block Touch for Store and Lock Set
VX 7C00010E V stvebx Store Vector Element Byte Indexed
XO 7C000110 SR B subfe[o][.] Subtract From Extended
XO 7C000114 SR B adde[o][.] Add Extended
EVX 7C00011D P E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed
XFX 7C000120 B mtcrf Move To Condition Register Fields
X 7C000124 P E mtmsr Move To Machine State Register
X 7C00012A 64 stdx Store Doubleword Indexed
X 7C00012D B stwcx. Store Word Conditional Indexed
X 7C00012E B stwx Store Word Indexed
X 7C00013A P E.PD stdepx Store Doubleword by External Process ID Indexed
X 7C00013E P E.PD stwepx Store Word by External Process ID Indexed
X 7C000146 P E wrteei Write MSR External Enable Immediate
X 7C00014C M ECL dcbtls Data Cache Block Touch and Lock Set
VX 7C00014E V stvehx Store Vector Element Halfword Indexed
X 7C00016A 64 stdux Store Doubleword with Update Indexed
X 7C00016E B stwux Store Word with Update Indexed
XL 7C000182 VLE e_crxor Condition Register XOR
VX 7C00018E V stvewx Store Vector Element Word Indexed
XO 7C000190 SR B subfze[o][.] Subtract From Zero Extended
XO 7C000194 SR B addze[o][.] Add to Zero Extended
X 7C00019C P E.PC msgsnd Message Send
EVX 7C00019D P E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed
X 7C0001AD 64 stdcx.
Store Doubleword Conditional Indexed
X 7C0001AE B stbx Store Byte Indexed
X 7C0001BE P E.PD stbepx Store Byte by External Process ID Indexed
XL 7C0001C2 VLE e_crnand Condition Register NAND
X 7C0001CC M ECL icblc Instruction Cache Block Lock Clear
VX 7C0001CE V stvx[l] Store Vector Indexed [Last]
XO 7C0001D0 SR B subfme[o][.] Subtract From Minus One Extended
XO 7C0001D2 SR 64 mulld[o][.] Multiply Low Doubleword
XO 7C0001D4 SR B addme[o][.] Add to Minus One Extended
XO 7C0001D6 SR B mullw[o][.] Multiply Low Word
X 7C0001DC P E.PC msgclr Message Clear
X 7C0001EC B dcbtst Data Cache Block Touch for Store
X 7C0001EE B stbux Store Byte with Update Indexed
X 7C0001FE P E.PD dcbtstep Data Cache Block Touch for Store by External Process ID
XL 7C000202 VLE e_crand Condition Register AND
XFX 7C000206 P E mfdcrx Move From Device Control Register Indexed
X 7C00020E P E.PD lvepxl Load Vector by External Process ID Indexed LRU
XO 7C000214 B add[o][.] Add
X 7C00022C B dcbt Data Cache Block Touch
X 7C00022E B lhzx Load Halfword and Zero Indexed
X 7C000230 SR VLE e_rlw[.] Rotate Left Word
X 7C000238 SR B eqv[.] Equivalent
X 7C00023E P E.PD lhepx Load Halfword by External Process ID Indexed
XL 7C000242 VLE e_creqv Condition Register Equivalent
XFX 7C000246 P E mfdcrux Move From Device Control Register User-mode Indexed
X 7C00024E P E.PD lvepx Load Vector by External Process ID Indexed
X 7C00026E B lhzux Load Halfword and Zero with Update Indexed
X 7C000270 SR VLE e_rlwi[.] Rotate Left Word Immediate
X 7C000278 SR B xor[.]
XOR
X 7C00027E P E.PD dcbtep Data Cache Block Touch by External Process ID
XFX 7C000286 P E mfdcr Move From Device Control Register
X 7C00028C P E.CD dcread Data Cache Read
XFX 7C00029C O E.PM mfpmr Move From Performance Monitor Register
XFX 7C0002A6 O B mfspr Move From Special Purpose Register
X 7C0002AA 64 lwax Load Word Algebraic Indexed
X 7C0002AE B lhax Load Halfword Algebraic Indexed
X 7C0002EA 64 lwaux Load Word Algebraic with Update Indexed
X 7C0002EE B lhaux Load Halfword Algebraic with Update Indexed
X 7C000306 P E mtdcrx Move To Device Control Register Indexed
X 7C00030C M ECL dcblc Data Cache Block Lock Clear
X 7C00032E B sthx Store Halfword Indexed
X 7C000338 SR B orc[.] OR with Complement
X 7C00033E P E.PD sthepx Store Halfword by External Process ID Indexed
XL 7C000342 VLE e_crorc Condition Register OR with Complement
X 7C000346 E mtdcrux Move To Device Control Register User-mode Indexed
X 7C00036E B sthux Store Halfword with Update Indexed
X 7C000378 SR B or[.] OR
XL 7C000382 VLE e_cror Condition Register OR
XFX 7C000386 P E mtdcr Move To Device Control Register
X 7C00038C P E.CI dci Data Cache Invalidate
XO 7C000392 SR 64 divdu[o][.] Divide Doubleword Unsigned
XO 7C000396 SR B divwu[o][.] Divide Word Unsigned
XFX 7C00039C O E.PM mtpmr Move To Performance Monitor Register
XFX 7C0003A6 O B mtspr Move To Special Purpose Register
X 7C0003AC P E dcbi Data Cache Block Invalidate
X 7C0003B8 SR B nand[.] NAND
X 7C0003CC M ECL icbtls Instruction Cache Block Touch and Lock Set
X 7C0003CC P E.CD dcread Data Cache Read
XO 7C0003D2 SR 64 divd[o][.] Divide Doubleword
XO 7C0003D6 SR B divw[o][.] Divide Word
X 7C000400 B mcrxr Move To Condition Register From XER
X 7C00042A MA lswx Load String Word Indexed
X 7C00042C B lwbrx Load Word Byte-Reversed Indexed
X 7C000430 SR B srw[.] Shift Right Word
X 7C000436 SR 64 srd[.] Shift Right Doubleword
X 7C00046C P E tlbsync TLB Synchronize
X 7C000470 SR VLE e_srwi[.]
Shift Right Word Immediate
X 7C0004AA MA lswi Load String Word Immediate
X 7C0004AC B sync Synchronize
X 7C0004BE P E.PD lfdepx Load Floating-Point Double by External Process ID Indexed
X 7C00052A MA stswx Store String Word Indexed
X 7C00052C B stwbrx Store Word Byte-Reversed Indexed
X 7C0005AA MA stswi Store String Word Immediate
X 7C0005BE P E.PD stfdepx Store Floating-Point Double by External Process ID Indexed
X 7C0005EC E dcba Data Cache Block Allocate
X 7C00060E P E.PD stvepxl Store Vector by External Process ID Indexed LRU
X 7C000624 P E tlbivax TLB Invalidate Virtual Address Indexed
X 7C00062C B lhbrx Load Halfword Byte-Reversed Indexed
X 7C000630 SR B sraw[.] Shift Right Algebraic Word
X 7C000634 SR 64 srad[.] Shift Right Algebraic Doubleword
X 7C00064E P E.PD stvepx Store Vector by External Process ID Indexed
X 7C000670 SR B srawi[.] Shift Right Algebraic Word Immediate
X 7C000674 SR 64 sradi[.] Shift Right Algebraic Doubleword Immediate
XFX 7C0006AC E mbar Memory Barrier
X 7C000724 P E tlbsx TLB Search Indexed
X 7C00072C B sthbrx Store Halfword Byte-Reversed Indexed
X 7C000734 SR B extsh[.] Extend Sign Halfword
X 7C000764 P E tlbre TLB Read Entry
X 7C000774 SR B extsb[.] Extend Sign Byte
X 7C00078C P E.CI ici Instruction Cache Invalidate
X 7C0007A4 P E tlbwe TLB Write Entry
X 7C0007AC B icbi Instruction Cache Block Invalidate
X 7C0007B4 SR 64 extsw[.]
Extend Sign Word
X 7C0007BE P E.PD icbiep Instruction Cache Block Invalidate by External Process ID
X 7C0007CC P E.CD icread Instruction Cache Read
X 7C0007EC B dcbz Data Cache Block set to Zero
X 7C0007FE P E.PD dcbzep Data Cache Block set to Zero by External Process ID
XFX 7C100026 B mfocrf Move From One Condition Register Field
XFX 7C100120 B mtocrf Move To One Condition Register Field
SD4 8000---- VLE se_lbz Load Byte and Zero Short Form
SD4 9000---- VLE se_stb Store Byte Short Form
SD4 A000---- VLE se_lhz Load Halfword and Zero Short Form
SD4 B000---- VLE se_sth Store Halfword Short Form
SD4 C000---- VLE se_lwz Load Word and Zero Short Form
SD4 D000---- VLE se_stw Store Word Short Form
BD8 E000---- VLE se_bc Branch Conditional Short Form
BD8 E800---- VLE se_b[l] Branch [and Link]

1 See the key to the mode dependency and privilege column below and the key to the category column in Section 1.3.5 of Book I.

2 For 16-bit instructions, the "Opcode" column represents the 16-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits; dashes are used following the opcode to indicate the form is a 16-bit instruction. For 32-bit instructions, the "Opcode" column represents the 32-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits.

Mode Dependency and Privilege Abbreviations

Except as described below and in Section 1.10.3, "Effective Address Calculation", in Book I, all instructions are independent of whether the processor is in 32-bit or 64-bit mode.

Mode Dep.  Description
CT         If the instruction tests the Count Register, it tests the low-order 32 bits in 32-bit mode and all 64 bits in 64-bit mode.
SR         The setting of status registers (such as XER and CR0) is mode-dependent.
32         The instruction must be executed only in 32-bit mode.
64         The instruction must be executed only in 64-bit mode.

Key to Privilege Column

Priv.  Description
P      Denotes a privileged instruction.
O      Denotes an instruction that is treated as privileged or nonprivileged (or hypervisor, for mtspr), depending on the SPR number.
M      Denotes an instruction that is treated as privileged or nonprivileged, depending on the value of the UCLE bit of the MSR.
H      Denotes an instruction that can be executed only in hypervisor state.

Appendices: Power ISA Books I-III Appendices

Appendix A. Incompatibilities with the POWER Architecture

This appendix identifies the known incompatibilities that must be managed in the migration from the POWER Architecture to the Power ISA. Some of the incompatibilities can, at least in principle, be detected by the processor, which could trap and let software simulate the POWER operation. Others cannot be detected by the processor even in principle.

In general, the incompatibilities identified here are those that affect a POWER application program; incompatibilities for instructions that can be used only by POWER system programs are not necessarily discussed.

A.1 New Instructions, Formerly Privileged Instructions

Instructions new to Power ISA typically use opcode values (including extended opcode) that are illegal in POWER. A few instructions that are privileged in POWER (e.g., dclz, called dcbz in Power ISA) have been made nonprivileged in Power ISA. Any POWER program that executes one of these now-valid or now-nonprivileged instructions, expecting to cause the system illegal instruction error handler or the system privileged instruction error handler to be invoked, will not execute correctly on Power ISA.

A.2 Newly Privileged Instructions

The following instructions are nonprivileged in POWER but privileged in Power ISA.

mfmsr
mfsr

A.3 Reserved Fields in Instructions

These fields are shown with "/"s in the instruction layouts. In both POWER and Power ISA these fields are ignored by the processor. The Power ISA states that these fields must contain zero. The POWER Architecture lacks such a statement, but it is expected that essentially all POWER programs contain zero in these fields.

In several cases the Power ISA assumes that reserved fields in POWER instructions indeed contain zero. The cases include the following.

• bclr[l] and bcctr[l] assume that bits 19:20 in the POWER instructions contain zero.
• cmpi, cmp, cmpli, and cmpl assume that bit 10 in the POWER instructions contains zero.
• mtspr and mfspr assume that bits 16:20 in the POWER instructions contain zero.
• mtcrf and mfcr assume that bit 11 in the POWER instructions contains zero.
• Synchronize assumes that bits 9:10 in the POWER instruction (dcs) contain zero. (This assumption provides compatibility for application programs, but not necessarily for operating system programs; see Section A.22.)
• mtmsr assumes that bit 15 in the POWER instruction contains zero.

A.4 Reserved Bits in Registers

Both POWER and Power ISA permit software to write any value to these bits. However, in POWER reading such a bit always returns 0, while in Power ISA reading it may return either 0 or the value that was last written to it.

A.5 Alignment Check

The POWER MSR AL bit (bit 24) is no longer supported; the corresponding Power ISA MSR bit, bit 56, is reserved. The low-order bits of the EA are always used. (Notice that the value 0 -- the normal value for a reserved bit -- means "ignore the low-order EA bits" in POWER, and the value 1 means "use the low-order EA bits".) POWER-compatible operating system code will probably write the value 1 to this bit.

A.6 Condition Register

The following instructions specify a field in the CR explicitly (via the BF field) and also, in POWER, use bit 31 as the Record bit. In Power ISA, bit 31 is a reserved field for these instructions and is ignored by the processor. In POWER, if bit 31 contains 1 the instructions execute normally (i.e., as if the bit contained 0) except as follows:

cmp    CR0 is undefined if Rc=1 and BF≠0
cmpl   CR0 is undefined if Rc=1 and BF≠0
mcrxr  CR0 is undefined if Rc=1 and BF≠0
fcmpu  CR1 is undefined if Rc=1
fcmpo  CR1 is undefined if Rc=1
mcrfs  CR1 is undefined if Rc=1 and BF≠1

A.7 LK and Rc Bits

For the instructions listed below, if bit 31 (LK or Rc bit in POWER) contains 1, in POWER the instruction executes as if the bit contained 0 except as follows: if LK=1, the Link Register is set (to an undefined value, except for svc); if Rc=1, Condition Register Field 0 or 1 is set to an undefined value. In Power ISA, bit 31 is a reserved field for these instructions and is ignored by the processor.

Power ISA instructions for which bit 31 is the LK bit in POWER:

sc (svc in POWER)
the Condition Register Logical instructions
mcrf
isync (ics in POWER)

Power ISA instructions for which bit 31 is the Rc bit in POWER:

fixed-point X-form Load and Store instructions
fixed-point X-form Compare instructions
the X-form Trap instruction
mtspr, mfspr, mtcrf, mcrxr, mfcr, mtocrf, mfocrf
floating-point X-form Load and Store instructions
floating-point Compare instructions
mcrfs
dcbz (dclz in POWER)

A.8 BO Field

POWER shows certain bits in the BO field -- used by Branch Conditional instructions -- as "x". Although the POWER Architecture does not say how these bits are to be interpreted, they are in fact ignored by the processor.

Power ISA shows these bits as "z", "a", or "t". The "z" bits are ignored, as in POWER. However, the "a" and "t" bits can be used by software to provide a hint about how the branch is likely to behave. If a POWER program has the "wrong" value for these bits, the program will produce the same results as on POWER but performance may be affected.

A.9 BH Field

Bits 19:20 of the Branch Conditional to Link Register and Branch Conditional to Count Register instructions are reserved in POWER but are defined as a branch hint (BH) field in Power ISA. Because these bits are hints, they may affect performance but do not affect the results of executing the instruction.

A.10 Branch Conditional to Count Register

For the case in which the Count Register is decremented and tested (i.e., the case in which BO₂=0), POWER specifies only that the branch target address is undefined, with the implication that the Count Register, and the Link Register if LK=1, are updated in the normal way. Power ISA specifies that this instruction form is invalid.

A.11 System Call

There are several respects in which Power ISA is incompatible with POWER for System Call instructions -- which in POWER are called Supervisor Call instructions.

• POWER provides a version of the Supervisor Call instruction (bit 30 = 0) that allows instruction fetching to continue at any one of 128 locations. It is used for "fast SVCs". Power ISA provides no such version: if bit 30 of the instruction is 0 the instruction form is invalid.
• POWER provides a version of the Supervisor Call instruction (bits 30:31 = 0b11) that resumes instruction fetching at one location and sets the Link Register to the address of the next instruction. Power ISA provides no such version: bit 31 is a reserved field.
• For POWER, information from the MSR is saved in the Count Register. For Power ISA this information is saved in SRR1.
• In POWER bits 16:19 and 27:29 of the instruction comprise defined instruction fields or a portion thereof, while in Power ISA these bits comprise reserved fields.
• In POWER bits 20:26 of the instruction comprise a portion of the SV field, while in Power ISA these bits comprise the LEV field.
• POWER saves the low-order 16 bits of the instruction in the Count Register. Power ISA does not save them.
• The settings of MSR bits by the associated interrupt differ between POWER and Power ISA; see POWER Processor Architecture and Book III.

A.12 Fixed-Point Exception Register (XER)

Bits 48:55 of the XER are reserved in Power ISA, while in POWER the corresponding bits (16:23) are defined and contain the comparison byte for the lscbx instruction (which Power ISA lacks).

A.13 Update Forms of Storage Access Instructions

Power ISA requires that RA not be equal to either RT (fixed-point Load only) or 0. If the restriction is violated the instruction form is invalid. POWER permits these cases, and simply avoids saving the EA.

A.14 Multiple Register Loads

Power ISA requires that RA, and RB if present in the instruction format, not be in the range of registers to be loaded, while POWER permits this and does not alter RA or RB in this case. (The Power ISA restriction applies even if RA=0, although there is no obvious benefit to the restriction in this case since RA is not used to compute the effective address if RA=0.) If the Power ISA restriction is violated, either the system illegal instruction error handler is invoked or the results are boundedly undefined. The instructions affected are:

    lmw (lm in POWER)
    lswi (lsi in POWER)
    lswx (lsx in POWER)

For example, an lmw instruction that loads all 32 registers is valid in POWER but is an invalid form in Power ISA.

A.15 Load/Store Multiple Instructions

There are two respects in which Power ISA is incompatible with POWER for Load Multiple and Store Multiple instructions.

• If the EA is not word-aligned, in Power ISA either an Alignment exception occurs or the addressed bytes are loaded, while in POWER an Alignment interrupt occurs if MSR[AL]=1 (the low-order two bits of the EA are ignored if MSR[AL]=0).
• In Power ISA the instruction may be interrupted by a system-caused interrupt, while in POWER the instruction cannot be thus interrupted.

A.16 Move Assist Instructions

There are several respects in which Power ISA is incompatible with POWER for Move Assist instructions.

• In Power ISA an lswx instruction with zero length leaves the contents of RT undefined (if RT≠RA and RT≠RB) or is an invalid instruction form (if RT=RA or RT=RB), while in POWER the corresponding instruction (lsx) is a no-op in these cases.
• In Power ISA an lswx instruction with zero length may alter the Reference bit, and a stswx instruction with zero length may alter the Reference and Change bits, while in POWER the corresponding instructions (lsx and stsx) do not alter the Reference and Change bits in this case.
• In Power ISA a Move Assist instruction may be interrupted by a system-caused interrupt, while in POWER the instruction cannot be thus interrupted.

A.17 Move To/From SPR

There are several respects in which Power ISA is incompatible with POWER for Move To/From Special Purpose Register instructions.

• The SPR field is ten bits long in Power ISA, but only five in POWER (see also Section A.3, "Reserved Fields in Instructions").
• mfspr can be used to read the Decrementer in problem state in POWER, but only in privileged state in Power ISA.
• If the SPR value specified in the instruction is not one of the defined values, POWER behaves as follows.
  - If the instruction is executed in problem state and SPR[0]=1, a Privileged Instruction type Program interrupt occurs. No architected registers are altered except those set by the interrupt.
  - Otherwise no architected registers are altered.
  In this same case, Power ISA behaves as follows.
  - If the instruction is executed in problem state and spr[0]=1, either an Illegal Instruction type Program interrupt or a Privileged Instruction type Program interrupt occurs. No architected registers are altered except those set by the interrupt.
  - Otherwise either an Illegal Instruction type Program interrupt occurs (in which case no architected registers are altered except those set by the interrupt) or the results are boundedly undefined (or possibly undefined, for mtspr; see Book III).

A.18 Effects of Exceptions on FPSCR Bits FR and FI

For the following cases, POWER does not specify how FR and FI are set, while Power ISA preserves them for Invalid Operation Exception caused by a Compare instruction, sets FI to 1 and FR to an undefined value for disabled Overflow Exception, and clears them otherwise.

• Invalid Operation Exception (enabled or disabled)
• Zero Divide Exception (enabled or disabled)
• Disabled Overflow Exception

A.19 Store Floating-Point Single Instructions

There are several respects in which Power ISA is incompatible with POWER for Store Floating-Point Single instructions.

• POWER uses FPSCR[UE] to help determine whether denormalization should be done, while Power ISA does not. Using FPSCR[UE] is in fact incorrect: if FPSCR[UE]=1 and a denormalized single-precision number is copied from one storage location to another by means of lfs followed by stfs, the two "copies" may not be the same.
• For an operand having an exponent that is less than 874 (unbiased exponent less than -149), POWER stores a zero (if FPSCR[UE]=0) while Power ISA stores an undefined value.

A.20 Move From FPSCR

POWER defines the high-order 32 bits of the result of mffs to be 0xFFFF_FFFF, while Power ISA specifies that they are undefined.

A.21 Zeroing Bytes in the Data Cache

The dclz instruction of POWER and the dcbz instruction of Power ISA have the same opcode. However, the functions differ in the following respects.

• dclz clears a line while dcbz clears a block.
• dclz saves the EA in RA (if RA≠0) while dcbz does not.
• dclz is privileged while dcbz is not.

A.22 Synchronization

The Synchronize instruction (called dcs in POWER) and the isync instruction (called ics in POWER) cause more pervasive synchronization in Power ISA than in POWER. However, unlike dcs, Synchronize does not wait until data cache block writes caused by preceding instructions have been performed in main storage. Also, Synchronize has an L field while dcs does not, and some uses of the instruction by the operating system require L=2. (The L field corresponds to reserved bits in dcs and hence is expected to be zero in POWER programs; see Section A.3.)

A.23 Move To Machine State Register Instruction

The mtmsr instruction has an L field in Power ISA but not in POWER. The function of the variant of mtmsr with L=1 differs from the function of the instruction in the POWER architecture in the following ways.

• In Power ISA, this variant of mtmsr modifies only the EE and RI bits of the MSR, while in POWER mtmsr modifies all bits of the MSR.
• This variant of mtmsr is execution synchronizing in Power ISA but is context synchronizing in POWER. (The POWER architecture lacks Power ISA's distinction between execution synchronization and context synchronization. The statement in the POWER architecture specification that mtmsr is "synchronizing" is equivalent to stating that the instruction is context synchronizing.)

Also, mtmsr is optional in Power ISA but required in POWER.

A.24 Direct-Store Segments

POWER's direct-store segments are not supported in Power ISA.

A.25 Segment Register Manipulation Instructions

The definitions of the four Segment Register Manipulation instructions mtsr, mtsrin, mfsr, and mfsrin differ in two respects between POWER and Power ISA. Instructions similar to mtsrin and mfsrin are called mtsri and mfsri in POWER.
privilege:  mfsr and mfsri are problem state instructions in POWER, while mfsr and mfsrin are privileged in Power ISA.
function:   the "indirect" instructions (mtsri and mfsri) in POWER use an RA register in computing the Segment Register number, and the computed EA is stored into RA (if RA≠0 and RA≠RT), while in Power ISA mtsrin and mfsrin have no RA field and the EA is not stored.

mtsr, mtsrin (mtsri), and mfsr have the same opcodes in Power ISA as in POWER. mfsri (POWER) and mfsrin (Power ISA) have different opcodes.

Also, the Segment Register Manipulation instructions are required in POWER whereas they are optional in Power ISA.

A.26 TLB Entry Invalidation

The tlbi instruction of POWER and the tlbie instruction of Power ISA have the same opcode. However, the functions differ in the following respects.

• tlbi computes the EA as (RA|0) + (RB), while tlbie lacks an RA field and computes the EA and related information as (RB).
• tlbi saves the EA in RA (if RA≠0), while tlbie lacks an RA field and does not save the EA.
• For tlbi the high-order 36 bits of RB are used in computing the EA, while for tlbie these bits contain additional information that is not directly related to the EA.
• tlbie has an L field, while tlbi does not.

Also, tlbi is required in POWER whereas tlbie is optional in Power ISA.

A.27 Alignment Interrupts

Placing information about the interrupting instruction into the DSISR and the DAR when an Alignment interrupt occurs is optional in Power ISA but required in POWER.

A.28 Floating-Point Interrupts

POWER uses MSR bit 20 to control the generation of interrupts for floating-point enabled exceptions, and Power ISA uses the corresponding MSR bit, bit 52, for the same purpose. However, in Power ISA this bit is part of a two-bit value that controls the occurrence, precision, and recoverability of the interrupt, while in POWER this bit is used independently to control the occurrence of the interrupt (in POWER all floating-point interrupts are precise).

A.29 Timing Facilities

A.29.1 Real-Time Clock

The POWER Real-Time Clock is not supported in Power ISA. Instead, Power ISA provides a Time Base. Both the RTC and the TB are 64-bit Special Purpose Registers, but they differ in the following respects.

• The RTC counts seconds and nanoseconds, while the TB counts "ticks". The ticking rate of the TB is implementation-dependent.
• The RTC increments discontinuously: 1 is added to RTCU when the value in RTCL passes 999_999_999. The TB increments continuously: 1 is added to TBU when the value in TBL passes 0xFFFF_FFFF.
• The RTC is written and read by the mtspr and mfspr instructions, using SPR numbers that denote the RTCU and RTCL. The TB is written and read by the same instructions using different SPR numbers.
• The SPR numbers that denote POWER's RTCL and RTCU are invalid in Power ISA.
• The RTC is guaranteed to increment at least once in the time required to execute ten Add Immediate instructions. No analogous guarantee is made for the TB.
• Not all bits of RTCL need be implemented, while all bits of the TB must be implemented.

A.29.2 Decrementer

The Power ISA Decrementer differs from the POWER Decrementer in the following respects.

• The Power ISA DEC decrements at the same rate that the TB increments, while the POWER DEC decrements every nanosecond (which is the same rate that the RTC increments).
• Not all bits of the POWER DEC need be implemented, while all bits of the Power ISA DEC must be implemented.
• The interrupt caused by the DEC has its own interrupt vector location in Power ISA, but is considered an External interrupt in POWER.

A.30 Deleted Instructions

The following instructions are part of the POWER Architecture but have been dropped from the Power ISA.

    abs        Absolute
    clcs       Cache Line Compute Size
    clf        Cache Line Flush
    cli (*)    Cache Line Invalidate
    dclst      Data Cache Line Store
    div        Divide
    divs       Divide Short
    doz        Difference Or Zero
    dozi       Difference Or Zero Immediate
    lscbx      Load String And Compare Byte Indexed
    maskg      Mask Generate
    maskir     Mask Insert From Register
    mfsri      Move From Segment Register Indirect
    mul        Multiply
    nabs       Negative Absolute
    rac (*)    Real Address Compute
    rfi (*)    Return From Interrupt
    rfsvc      Return From SVC
    rlmi       Rotate Left Then Mask Insert
    rrib       Rotate Right And Insert Bit
    sle        Shift Left Extended
    sleq       Shift Left Extended With MQ
    sliq       Shift Left Immediate With MQ
    slliq      Shift Left Long Immediate With MQ
    sllq       Shift Left Long With MQ
    slq        Shift Left With MQ
    sraiq      Shift Right Algebraic Immediate With MQ
    sraq       Shift Right Algebraic With MQ
    sre        Shift Right Extended
    srea       Shift Right Extended Algebraic
    sreq       Shift Right Extended With MQ
    sriq       Shift Right Immediate With MQ
    srliq      Shift Right Long Immediate With MQ
    srlq       Shift Right Long With MQ
    srq        Shift Right With MQ

(*) This instruction is privileged.

Note: Many of these instructions use the MQ register. The MQ is not defined in the Power ISA.

A.31 Discontinued Opcodes

The opcodes listed below are defined in the POWER Architecture but have been dropped from the Power ISA. The list contains the POWER mnemonic (MNEM), the primary opcode (PRI), and the extended opcode (XOP) if appropriate. The corresponding instructions are reserved in Power ISA.

    MNEM       PRI   XOP
    abs        31    360
    clcs       31    531
    clf        31    118
    cli (*)    31    502
    dclst      31    630
    div        31    331
    divs       31    363
    doz        31    264
    dozi       09    -
    lscbx      31    277
    maskg      31    29
    maskir     31    541
    mfsri      31    627
    mul        31    107
    nabs       31    488
    rac (*)    31    818
    rfi (*)    19    50
    rfsvc      19    82
    rlmi       22    -
    rrib       31    537
    sle        31    153
    sleq       31    217
    sliq       31    184
    slliq      31    248
    sllq       31    216
    slq        31    152
    sraiq      31    952
    sraq       31    920
    sre        31    665
    srea       31    921
    sreq       31    729
    sriq       31    696
    srliq      31    760
    srlq       31    728
    srq        31    664

(*) This instruction is privileged.

Assembler Note
It might be helpful to current software writers for the Assembler to flag the discontinued POWER instructions.

A.32 POWER2 Compatibility

The POWER2 instruction set is a superset of the POWER instruction set. Some of the instructions added for POWER2 are included in the Power ISA. Those that have been renamed in the Power ISA are listed in this section, as are the new POWER2 instructions that are not included in the Power ISA.

Other incompatibilities are also listed.

A.32.1 Cross-Reference for Changed POWER2 Mnemonics

The following table lists the new POWER2 instruction mnemonics that have been changed in the Power ISA User Instruction Set Architecture, sorted by POWER2 mnemonic.

To determine the Power ISA mnemonic for one of these POWER2 mnemonics, find the POWER2 mnemonic in the second column of the table: the remainder of the line gives the Power ISA mnemonic and the page on which the instruction is described, as well as the instruction names.

POWER2 mnemonics that have not changed are not listed.

    Page   POWER2                                                               Power ISA
           Mnemonic    Instruction                                              Mnemonic     Instruction
    126    fcir[.]     Floating Convert Double to Integer with Round            fctiw[.]     Floating Convert To Integer Word
    127    fcirz[.]    Floating Convert Double to Integer with Round to Zero    fctiwz[.]    Floating Convert To Integer Word with round toward Zero

A.32.2 Floating-Point Conversion to Integer

The fcir and fcirz instructions of POWER2 have the same opcodes as do the fctiw and fctiwz instructions, respectively, of Power ISA. However, the functions differ in the following respects.

• fcir and fcirz set the high-order 32 bits of the target FPR to 0xFFFF_FFFF, while fctiw and fctiwz set them to an undefined value.
• Except for enabled Invalid Operation Exceptions, fcir and fcirz set the FPRF field of the FPSCR based on the result, while fctiw and fctiwz set it to an undefined value.
• fcir and fcirz do not affect the VXSNAN bit of the FPSCR, while fctiw and fctiwz do.
• fcir and fcirz set FPSCR[XX] to 1 for certain cases of "Large Operands" (i.e., operands that are too large to be represented as a 32-bit signed fixed-point integer), while fctiw and fctiwz do not alter it for any case of "Large Operand". (The IEEE standard requires not altering it for "Large Operands".)

A.32.3 Floating-Point Interrupts

POWER2 uses MSR bits 20 and 23 to control the generation of interrupts for floating-point enabled exceptions, and Power ISA uses the corresponding MSR bits, bits 52 and 55, for the same purpose. However, in Power ISA these bits comprise a two-bit value that controls the occurrence, precision, and recoverability of the interrupt, while in POWER2 these bits are used independently to control the occurrence (bit 20) and the precision (bit 23) of the interrupt. Moreover, in Power ISA all floating-point interrupts are considered Program interrupts, while in POWER2 imprecise floating-point interrupts have their own interrupt vector location.

A.32.4 Trace

The Trace interrupt vector location differs between the two architectures, and there are many other differences.

A.33 Deleted Instructions

The following instructions are new in POWER2 implementations of the POWER Architecture but have been dropped from the Power ISA.

    lfq        Load Floating-Point Quad
    lfqu       Load Floating-Point Quad with Update
    lfqux      Load Floating-Point Quad with Update Indexed
    lfqx       Load Floating-Point Quad Indexed
    stfq       Store Floating-Point Quad
    stfqu      Store Floating-Point Quad with Update
    stfqux     Store Floating-Point Quad with Update Indexed
    stfqx      Store Floating-Point Quad Indexed

A.33.1 Discontinued Opcodes

The opcodes listed below are new in POWER2 implementations of the POWER Architecture but have been dropped from the Power ISA. The list contains the POWER2 mnemonic (MNEM), the primary opcode (PRI), and the extended opcode (XOP) if appropriate. The corresponding instructions are either illegal or reserved in Power ISA; see Appendix D.

    MNEM       PRI   XOP
    lfq        56    -
    lfqu       57    -
    lfqux      31    823
    lfqx       31    791
    stfq       60    -
    stfqu      61    -
    stfqux     31    951
    stfqx      31    919

Appendix B. Platform Support Requirements

As described in Chapter 1 of Book I, the architecture is structured as a collection of categories. Each category is comprised of facilities and/or instructions that together provide a unit of functionality. The Server and Embedded categories are referred to as "special" because all implementations must support at least one of these categories. Each special category, when taken together with the Base category, is referred to as an "environment", and provides the minimum functionality required to develop operating systems and applications.

Every processor implementation supports at least one of the environments, and may also support a set of categories chosen based on the target market for the implementation. To facilitate the development of operating systems and applications for a well-defined purpose or customer set, usually embodied in a unique hardware platform, this appendix documents the association between a platform and the set of categories it requires.

Adding a new platform may permit cost-performance optimization by clearly identifying a unique set of categories. However, this has the potential to fragment the application base. As a result, new platforms will be added only when the optimization benefit clearly outweighs the loss due to fragmentation.

The platform support requirements are documented in Figure 20. An "x" in a column indicates that the category is required. A "+" in a column indicates that the requirement is being phased in.
    Category                               Server Platform   Embedded Platform
    Base                                   x                 x
    Server                                 x
    Embedded                                                 x
    Alternate Time Base
    Cache Specification
    Embedded.Cache Debug
    Embedded.Cache Initialization
    Embedded.Enhanced Debug
    Embedded.External PID
    Embedded.Little-Endian
    Embedded.MMU Type FSL *
    Embedded.Performance Monitor
    Embedded.Processor Control
    Embedded Cache Locking
    External Control
    External Proxy
    Floating-Point                         x
    Floating-Point.Record                  x
    Legacy Move Assist
    Legacy Integer Multiply-Accumulate
    Load/Store Quadword
    Memory Coherence                       x
    Move Assist                            x
    Server.Performance Monitor             x
    Signal Processing Engine
    SPE.Embedded Float Scalar Double
    SPE.Embedded Float Scalar Single
    SPE.Embedded Float Vector
    Stream                                 x
    Trace                                  x
    Variable Length Encoding
    Vector                                 +
    Vector.Little-Endian                   +(1)
    Wait
    64-Bit                                 x

    (1) If the Vector category is supported, Vector.Little-Endian is required on the Server platform.

Figure 20. Platform Support Requirements

Appendix C. Complete SPR List

This appendix lists all the Special Purpose Registers in the Power ISA, ordered by SPR number.
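A note to the SPR table in this appendix points out that the order of the two 5-bit halves of the SPR number is reversed in the instruction encoding. The following C fragment is a minimal sketch of that swap, not part of the architecture text. The helper names spr_field and encode_mfspr are hypothetical, and the mfspr layout assumed here (primary opcode 31, extended opcode 339, RT in bits 6:10, spr in bits 11:20) is taken from the instruction description in Book I rather than from this appendix.

```c
#include <stdint.h>

/* Swap the two 5-bit halves of a 10-bit SPR number.  The result is the
 * value of the spr field as it appears in the instruction: the low-order
 * five bits of the SPR number come first, the high-order five bits second.
 * Applying the function twice returns the original SPR number. */
static uint32_t spr_field(uint32_t sprn)
{
    return ((sprn & 0x1Fu) << 5) | ((sprn >> 5) & 0x1Fu);
}

/* Assemble "mfspr RT,SPR" under the layout assumed in the lead-in:
 * primary opcode 31 in bits 0:5, RT in bits 6:10, the swapped SPR number
 * in bits 11:20, extended opcode 339 in bits 21:30, bit 31 zero.
 * Bit 0 is the most significant bit of the 32-bit instruction word. */
static uint32_t encode_mfspr(uint32_t rt, uint32_t sprn)
{
    return (31u << 26)
         | ((rt & 0x1Fu) << 21)
         | (spr_field(sprn) << 11)
         | (339u << 1);
}
```

With these definitions, spr_field(8) yields 256 for the LR, and encode_mfspr(0, 8) yields 0x7C0802A6.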
              SPR(1)            Register      Privileged          Length
    decimal   spr5:9   spr0:4   Name          mtspr     mfspr     (bits)   Cat(2)
    1         00000    00001    XER           no        no        64       B
    8         00000    01000    LR            no        no        64       B
    9         00000    01001    CTR           no        no        64       B
    18        00000    10010    DSISR         yes       yes       32       S
    19        00000    10011    DAR           yes       yes       64       S
    22        00000    10110    DEC           yes       yes       32       B
    25        00000    11001    SDR1          hypv(3)   yes       64       S
    26        00000    11010    SRR0          yes       yes       64       B
    27        00000    11011    SRR1          yes       yes       64       B
    29        00000    11101    AMR           yes       yes       64       S
    48        00001    10000    PID           yes       yes       32       E
    54        00001    10110    DECAR         yes       yes       32       E
    58        00001    11010    CSRR0         yes       yes       64       E
    59        00001    11011    CSRR1         yes       yes       32       E
    61        00001    11101    DEAR          yes       yes       64       E
    62        00001    11110    ESR           yes       yes       32       E
    63        00001    11111    IVPR          yes       yes       64       E
    136       00100    01000    CTRL          -         no        32       S
    152       00100    11000    CTRL          yes       -         32       S
    256       01000    00000    VRSAVE        no        no        32       V
    259       01000    00011    SPRG3         -         no        64       B
    260-263   01000    001xx    SPRG[4-7]     -         no        64       E
    268       01000    01100    TB            -         no        64       B
    269       01000    01101    TBU           -         no        32       B
    272-275   01000    100xx    SPRG[0-3]     yes       yes       64       B
    276-279   01000    101xx    SPRG[4-7]     yes       yes       64       E
    282       01000    11010    EAR           hypv(3)   yes       32       EC
    284       01000    11100    TBL           hypv(4)   -         32       B
    285       01000    11101    TBU           hypv(4)   -         32       B
    286       01000    11110    TBU40         hypv      -         64       S
    286       01000    11110    PIR           -         yes       32       E
    287       01000    11111    PVR           -         yes       32       B
    304       01001    10000    HSPRG0        hypv(3)   hypv(3)   64       S
    304       01001    10000    DBSR          yes(5)    yes       32       E
    305       01001    10001    HSPRG1        hypv(3)   hypv(3)   64       S
    306       01001    10010    HDSISR        hypv(3)   hypv(3)   32       B
    307       01001    10011    HDAR          hypv(3)   hypv(3)   64       B
    308       01001    10100    DBCR0         yes       yes       32       E
    309       01001    10101    PURR          hypv(3)   yes       64       S
    309       01001    10101    DBCR1         yes       yes       32       E
    310       01001    10110    HDEC          hypv(3)   yes       32       S
    310       01001    10110    DBCR2         yes       yes       32       E
    312       01001    11000    RMOR          hypv(3)   hypv(3)   64       S
    312       01001    11000    IAC1          yes       yes       64       E
    313       01001    11001    HRMOR         hypv(3)   hypv(3)   64       S
    313       01001    11001    IAC2          yes       yes       64       E
    314       01001    11010    HSRR0         hypv(3)   hypv(3)   64       S
    314       01001    11010    IAC3          yes       yes       64       E
    315       01001    11011    HSRR1         hypv(3)   hypv(3)   64       S
    315       01001    11011    IAC4          yes       yes       64       E
    316       01001    11100    DAC1          yes       yes       64       E
    317       01001    11101    DAC2          yes       yes       64       E
    318       01001    11110    LPCR          hypv(3)   hypv(3)   64       S
    318       01001    11110    DVC1          yes       yes       64       E
    319       01001    11111    LPIDR         hypv(3)   hypv(3)   32       S
    319       01001    11111    DVC2          yes       yes       64       E
    336       01010    10000    TSR           yes(5)    yes       32       E
    340       01010    10100    TCR           yes       yes       32       E
    400-415   01100    1xxxx    IVOR[0-15]    yes       yes       32       E
    512       10000    00000    SPEFSCR       no        no        32       SP
    526       10000    01110    ATB/ATBL      -         no        64       ATB
    527       10000    01111    ATBU          -         no        32       ATB
    528       10000    10000    IVOR32        yes       yes       32       SP
    529       10000    10001    IVOR33        yes       yes       32       SP
    530       10000    10010    IVOR34        yes       yes       32       SP
    531       10000    10011    IVOR35        yes       yes       32       E.PM
    532       10000    10100    IVOR36        yes       yes       32       E.PC
    533       10000    10101    IVOR37        yes       yes       32       E.PC
    570       10001    11010    MCSRR0        yes       yes       64       E
    571       10001    11011    MCSRR1        yes       yes       32       E
    572       10001    11100    MCSR          yes       yes       64       E
    574       10001    11110    DSRR0         yes       yes       64       E.ED
    575       10001    11111    DSRR1         yes       yes       32       E.ED
    604       10010    11100    SPRG8         yes       yes       64       XSR
    605       10010    11101    SPRG9         yes       yes       64       XSR
    624       10011    10000    MAS0          yes       yes       32       E.MF
    625       10011    10001    MAS1          yes       yes       32       E.MF
    626       10011    10010    MAS2          yes       yes       64       E.MF
    627       10011    10011    MAS3          yes       yes       32       E.MF
    628       10011    10100    MAS4          yes       yes       32       E.MF
    630       10011    10110    MAS6          yes       yes       32       E.MF
    633       10011    11001    PID1          yes       yes       32       E.MF
    634       10011    11010    PID2          yes       yes       32       E.MF
    688-691   10101    100xx    TLB[0-3]CFG   yes       yes       32       E.MF
    702       10101    11110    EPR           -         yes       32       EXP
    768-783   11000    0xxxx    perf_mon      -         no        64       S.PM
    784-799   11000    1xxxx    perf_mon      yes       yes       64       S.PM
    896       11100    00000    PPR           no        no        64       S
    924       11100    11100    DCBTRL        -(6)      yes       32       E.CD
    925       11100    11101    DCBTRH        -(6)      yes       32       E.CD
    926       11100    11110    ICBTRL        -(7)      yes       32       E.CD
    927       11100    11111    ICDBTRH       -(7)      yes       32       E.CD
    944       11101    10000    MAS7          yes       yes       32       E.MF
    947       11101    10011    EPLC          yes       yes       32       E.PD
    948       11101    10100    EPSC          yes       yes       32       E.PD
    979       11110    10011    ICBDR         -(7)      yes       32       E.CD
    1012      11111    10100    MMUCSR0       yes       yes       32       E.MF
    1013      11111    10101    DABR          hypv(3)   yes       64       S
    1015      11111    10111    DABRX         hypv(3)   yes       64       S
    1015      11111    10111    MMUCFG        yes       yes       32       E.MF
    1023      11111    11111    PIR           -         yes       32       S

    -   This register is not defined for this instruction.
    (1) Note that the order of the two 5-bit halves of the SPR number is reversed.
    (2) See Section 1.3.5 of Book I.
    (3) This register is a hypervisor resource, and can be modified by this instruction only in hypervisor state (see Chapter 2 of Book III-S).
    (4) This register is a hypervisor resource, and can be modified by this instruction only in hypervisor state (see Chapter 2 of Book III-S). This register is privileged.
    (5) This register cannot be directly written to. Instead, bits in the register corresponding to 1 bits in (RS) can be cleared using mtspr SPR,RS.
    (6) The register can be written by the dcread instruction.
    (7) The register can be written by the icread instruction.

    All SPR numbers that are not shown above and are not implementation-specific are reserved.

Appendix D. Illegal Instructions

With the exception of the instruction consisting entirely of binary 0s, the instructions in this class are available for future extensions of the Power ISA; that is, some future version of the Power ISA may define any of these instructions to perform new functions.

The following primary opcodes are illegal.

    1, 5, 6, 57, 60, 61

The following primary opcodes have unused extended opcodes. Their unused extended opcodes can be determined from the opcode maps in Appendix F of Book Appendices. All unused extended opcodes are illegal.
4, 19, 30, 31, 56, 58, 59, 62, 63

An instruction consisting entirely of binary 0s is illegal, and is guaranteed to be illegal in all future versions of this architecture.

Appendix E. Reserved Instructions

The instructions in this class are allocated to specific purposes that are outside the scope of the Power ISA.

The following types of instruction are included in this class.

1. The instruction having primary opcode 0, except the instruction consisting entirely of binary 0s (which is an illegal instruction; see Section 1.7.2, "Illegal Instruction Class" on page 18) and the extended opcode shown below.

   256  Service Processor "Attention"

2. Instructions for the POWER Architecture that have not been included in the Power ISA. These are listed in Section A.31, "Discontinued Opcodes" and Section A.33.1, "Discontinued Opcodes".

3. Implementation-specific instructions used to conform to the Power ISA specification.

4. Any other implementation-dependent instructions that are not defined in the Power ISA.

Appendix F. Opcode Maps

This appendix contains tables showing the opcodes and extended opcodes.

For the primary opcode table (Table 3 on page 768), each cell is in the following format.

   Opcode in Decimal      Opcode in Hexadecimal
   Instruction Mnemonic
   Category               Instruction Format

The category abbreviations are shown in Section 1.3.5 of Book I.

The extended opcode tables show the extended opcode in decimal, the instruction mnemonic, the category, and the instruction format. These tables appear in order of primary opcode within three groups. The first group consists of the primary opcodes that have small extended opcode fields (2-4 bits), namely 30, 58, and 62. The second group consists of primary opcodes that have 11-bit extended opcode fields. The third group consists of primary opcodes that have 10-bit extended opcode fields. The tables for the second and third groups are rotated.

In the extended opcode tables several special markings are used.

- A prime (') following an instruction mnemonic denotes an additional cell, after the lowest-numbered one, used by the instruction. For example, subfc occupies cells 8 and 520 of primary opcode 31, with the former corresponding to OE=0 and the latter to OE=1. Similarly, sradi occupies cells 826 and 827, with the former corresponding to sh5=0 and the latter to sh5=1 (the 9-bit extended opcode 413, shown on page 85, excludes the sh5 bit).

- Two vertical bars (||) are used instead of primed mnemonics when an instruction occupies an entire column of a table. The instruction mnemonic is repeated in the last cell of the column.

- For primary opcode 31, an asterisk (*) in a cell that would otherwise be empty means that the cell is reserved because it is "overlaid" by a fixed-point or Storage Access instruction having only a primary opcode, by an instruction having an extended opcode in primary opcode 30, 58, or 62, or by a potential instruction in any of the categories just mentioned. The overlaying instruction, if any, is also shown. A cell thus reserved should not be assigned to an instruction having primary opcode 31. (The overlaying is a consequence of opcode decoding for fixed-point instructions: the primary opcode, and the extended opcode if any, are mapped internally to a 10-bit "compressed opcode" for ease of subsequent decoding.)

- Parentheses around the opcode or extended opcode mean that the instruction was defined in earlier versions of the Power ISA but is no longer defined in the Power ISA.

- Curly brackets around the opcode or extended opcode mean that the instruction will be defined in future versions of the Power ISA.

- long is used as filler for mnemonics that are longer than a table cell.

An empty cell, a cell containing only an asterisk, or a cell in which the opcode or extended opcode is parenthesized, corresponds to an illegal instruction.

The instruction consisting entirely of binary 0s causes the system illegal instruction error handler to be invoked for all members of the POWER family, and this is likely to remain true in future models (it is guaranteed in the Power ISA). An instruction having primary opcode 0 but not consisting entirely of binary 0s is reserved except for the following extended opcode (instruction bits 21:30).

   256  Service Processor "Attention" (Power ISA only)

Table 3: Primary opcodes

Dec  Hex  Mnemonic  Cat.        Form   Instruction
0    00                                Illegal, Reserved (see primary opcode 0 extensions on page 767)
1    01                                (empty)
2    02   tdi       64          D      Trap Doubleword Immediate
3    03   twi       B           D      Trap Word Immediate
4    04             V, LMA, SP         Vector, LMA, SP (see Table 7 and Table 8)
5    05                                (empty)
6    06                                (empty)
7    07   mulli     B           D      Multiply Low Immediate
8    08   subfic    B           D      Subtract From Immediate Carrying
9    09                                (empty)
10   0A   cmpli     B           D      Compare Logical Immediate
11   0B   cmpi      B           D      Compare Immediate
12   0C   addic     B           D      Add Immediate Carrying
13   0D   addic.    B           D      Add Immediate Carrying and Record
14   0E   addi      B           D      Add Immediate
15   0F   addis     B           D      Add Immediate Shifted
16   10   bc        B           B      Branch Conditional
17   11   sc        B           SC     System Call
18   12   b         B           I      Branch
19   13                         XL     CR ops, etc. (see Table 9 on page 779)
20   14   rlwimi    B           M      Rotate Left Word Imm. then Mask Insert
21   15   rlwinm    B           M      Rotate Left Word Imm. then AND with Mask
22   16                                (empty)
23   17   rlwnm     B           M      Rotate Left Word then AND with Mask
24   18   ori       B           D      OR Immediate
25   19   oris      B           D      OR Immediate Shifted
26   1A   xori      B           D      XOR Immediate
27   1B   xoris     B           D      XOR Immediate Shifted
28   1C   andi.     B           D      AND Immediate
29   1D   andis.    B           D      AND Immediate Shifted
30   1E                         MD[S]  FX Dwd Rot (see Table 4 on page 769)
31   1F                                FX Extended Ops (see Table 10 on page 781)
32   20   lwz       B           D      Load Word and Zero
33   21   lwzu      B           D      Load Word and Zero with Update
34   22   lbz       B           D      Load Byte and Zero
35   23   lbzu      B           D      Load Byte and Zero with Update
36   24   stw       B           D      Store Word
37   25   stwu      B           D      Store Word with Update
38   26   stb       B           D      Store Byte
39   27   stbu      B           D      Store Byte with Update
40   28   lhz       B           D      Load Half and Zero
41   29   lhzu      B           D      Load Half and Zero with Update
42   2A   lha       B           D      Load Half Algebraic
43   2B   lhau      B           D      Load Half Algebraic with Update
44   2C   sth       B           D      Store Half
45   2D   sthu      B           D      Store Half with Update
46   2E   lmw       B           D      Load Multiple Word
47   2F   stmw      B           D      Store Multiple Word
48   30   lfs       FP          D      Load Floating-Point Single
49   31   lfsu      FP          D      Load Floating-Point Single with Update
50   32   lfd       FP          D      Load Floating-Point Double
51   33   lfdu      FP          D      Load Floating-Point Double with Update
52   34   stfs      FP          D      Store Floating-Point Single
53   35   stfsu     FP          D      Store Floating-Point Single with Update
54   36   stfd      FP          D      Store Floating-Point Double
55   37   stfdu     FP          D      Store Floating-Point Double with Update
56   38   lq        LSQ         DQ     Load Quadword
57   39                                (empty)
58   3A                         DS     FX DS-form Loads (see Table 5 on page 769)
59   3B                         A      FP Single Extended Ops (see Table 15 on page 785)
60   3C                                (empty)
61   3D                                (empty)
62   3E                         DS     FX DS-form Stores (see Table 6 on page 769)
63   3F                                FP Double Extended Ops (see Table 16 on page 787)

Table 4: Extended opcodes for primary opcode 30 (instruction bits 27:30)

Rows are labeled by the high-order 2 bits and columns by the low-order 2 bits of the field. Each cell gives extended opcode, mnemonic, category (where shown), and form.

      00               01            10               11
00    0 rldicl 64 MD   1 rldicl' MD  2 rldicr 64 MD   3 rldicr' MD
01    4 rldic 64 MD    5 rldic' MD   6 rldimi 64 MD   7 rldimi' MD
10    8 rldcl 64 MDS   9 rldcr 64 MDS
11

Table 5: Extended opcodes for primary opcode 58 (instruction bits 30:31)

      0             1
0     0 ld 64 DS    1 ldu 64 DS
1     2 lwa 64 DS

Table 6: Extended opcodes for primary opcode 62 (instruction bits 30:31)

      0             1
0     0 std 64 DS   1 stdu 64 DS
1     2 stq LSQ DS
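The field positions quoted in the table captions (instruction bits 0:5 for the primary opcode, and bits 21:30, 21:31, 27:30, or 30:31 for the extended opcode) and the primed-cell examples from the introduction to this appendix can be illustrated with a small sketch. It is not part of the architecture: the function names are hypothetical, and it assumes the Power ISA numbering in which bit 0 is the most significant bit of the 32-bit instruction word.

```python
"""Illustrative sketch only; every name here is hypothetical, not an architected interface."""


def field(word, first, last):
    """Return instruction bits first:last of a 32-bit instruction word.

    Assumption: Power ISA bit numbering, where bit 0 is the most
    significant bit and bit 31 the least significant.
    """
    if not (0 <= first <= last <= 31):
        raise ValueError("bit range must satisfy 0 <= first <= last <= 31")
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def primary_opcode(word):
    """Primary opcode: instruction bits 0:5 (the index into Table 3)."""
    return field(word, 0, 5)


def classify_primary_opcode_0(word):
    """Classify a primary opcode 0 instruction as described in Appendices D to F."""
    if primary_opcode(word) != 0:
        raise ValueError("not a primary opcode 0 instruction")
    if word == 0:
        return "illegal"      # the all-zeros instruction
    if field(word, 21, 30) == 256:
        return "attention"    # Service Processor "Attention"
    return "reserved"


def xo_form_cells(xo9):
    """Cells of a 10-bit opcode table used by an XO-form instruction.

    Bit 21 is OE and bits 22:30 hold the 9-bit extended opcode, so the
    cell for OE=1 (the primed mnemonic) is 512 higher than the cell for OE=0.
    """
    return (xo9, (1 << 9) | xo9)


def xs_form_cells(xo9):
    """Cells used by an XS-form instruction such as sradi.

    Bits 21:29 hold the 9-bit extended opcode and bit 30 is sh5, so the
    two cells are adjacent: sh5=0 first, then sh5=1.
    """
    return (xo9 << 1, (xo9 << 1) | 1)
```

With these helpers, `xo_form_cells(8)` gives the two subfc cells (8 and 520) and `xs_form_cells(413)` gives the two sradi cells (826 and 827) mentioned in the text. For the instruction word 0x7C642A14 (an add), `primary_opcode` returns 31 and `field(word, 21, 30)` returns 266, which is the add cell of the primary opcode 31 table.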
Table 7: (Left) Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31)

In Tables 7 and 8, rows are labeled by the high-order 5 bits of the extended opcode field and columns by the low-order 6 bits. Each entry gives extended opcode (decimal), mnemonic, category, and form. Rows that are not listed contain no assigned cells.

Columns 000000-001111:

00000: 0 vaddubm V VX; 2 vmaxub V VX; 4 vrlb V VX; 6 vcmpequb V VC; 8 vmuloub V VX; 10 vaddfp V VX; 12 vmrghb V VX; 14 vpkuhum V VX
00001: 64 vadduhm V VX; 66 vmaxuh V VX; 68 vrlh V VX; 70 vcmpequh V VC; 72 vmulouh V VX; 74 vsubfp V VX; 76 vmrghh V VX; 78 vpkuwum V VX
00010: 128 vadduwm V VX; 130 vmaxuw V VX; 132 vrlw V VX; 134 vcmpequw V VC; 140 vmrghw V VX; 142 vpkuhus V VX
00011: 198 vcmpeqfp V VC; 206 vpkuwus V VX
00100: 258 vmaxsb V VX; 260 vslb V VX; 264 vmulosb V VX; 266 vrefp V VX; 268 vmrglb V VX; 270 vpkshus V VX
00101: 322 vmaxsh V VX; 324 vslh V VX; 328 vmulosh V VX; 330 vrsqrtefp V VX; 332 vmrglh V VX; 334 vpkswus V VX
00110: 384 vaddcuw V VX; 386 vmaxsw V VX; 388 vslw V VX; 394 vexptefp V VX; 396 vmrglw V VX; 398 vpkshss V VX
00111: 452 vsl V VX; 454 vcmpgefp V VC; 458 vlogefp V VX; 462 vpkswss V VX
01000: 512 vaddubs V VX; 514 vminub V VX; 516 vsrb V VX; 518 vcmpgtub V VC; 520 vmuleub V VX; 522 vrfin V VX; 524 vspltb V VX; 526 vupkhsb V VX
01001: 576 vadduhs V VX; 578 vminuh V VX; 580 vsrh V VX; 582 vcmpgtuh V VC; 584 vmuleuh V VX; 586 vrfiz V VX; 588 vsplth V VX; 590 vupkhsh V VX
01010: 640 vadduws V VX; 642 vminuw V VX; 644 vsrw V VX; 646 vcmpgtuw V VC; 650 vrfip V VX; 652 vspltw V VX; 654 vupklsb V VX
01011: 708 vsr V VX; 710 vcmpgtfp V VC; 714 vrfim V VX; 718 vupklsh V VX
01100: 768 vaddsbs V VX; 770 vminsb V VX; 772 vsrab V VX; 774 vcmpgtsb V VC; 776 vmulesb V VX; 778 vcuxwfp V VX; 780 vspltisb V VX; 782 vpkpx V VX
01101: 832 vaddshs V VX; 834 vminsh V VX; 836 vsrah V VX; 838 vcmpgtsh V VC; 840 vmulesh V VX; 842 vcsxwfp V VX; 844 vspltish V VX; 846 vupkhpx V VX
01110: 896 vaddsws V VX; 898 vminsw V VX; 900 vsraw V VX; 902 vcmpgtsw V VC; 906 vcfpuxws V VX; 908 vspltisw V VX
01111: 966 vcmpbfp V VC; 970 vcfpsxws V VX; 974 vupklpx V VX
10000: 1024 vsububm V VX; 1026 vavgub V VX; 1028 vand V VX; 1030 vcmpequb. V VC; 1034 vmaxfp V VX; 1036 vslo V VX
10001: 1088 vsubuhm V VX; 1090 vavguh V VX; 1092 vandc V VX; 1094 vcmpequh. V VC; 1098 vminfp V VX; 1100 vsro V VX
10010: 1152 vsubuwm V VX; 1154 vavguw V VX; 1156 vor V VX; 1158 vcmpequw. V VC
10011: 1220 vxor V VX; 1222 vcmpeqfp. V VC
10100: 1282 vavgsb V VX; 1284 vnor V VX
10101: 1346 vavgsh V VX
10110: 1408 vsubcuw V VX; 1410 vavgsw V VX
10111: 1478 vcmpgefp. V VC
11000: 1536 vsububs V VX; 1540 mfvscr V VX; 1542 vcmpgtub. V VC; 1544 vsum4ubs V VX
11001: 1600 vsubuhs V VX; 1604 mtvscr V VX; 1606 vcmpgtuh. V VC; 1608 vsum4shs V VX
11010: 1664 vsubuws V VX; 1670 vcmpgtuw. V VC; 1672 vsum2sws V VX
11011: 1734 vcmpgtfp. V VC
11100: 1792 vsubsbs V VX; 1798 vcmpgtsb. V VC; 1800 vsum4sbs V VX
11101: 1856 vsubshs V VX; 1862 vcmpgtsh. V VC
11110: 1920 vsubsws V VX; 1926 vcmpgtsw. V VC; 1928 vsumsws V VX
11111: 1990 vcmpbfp. V VC

Table 7 (Left-Center) Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31)

Columns 010000-011111:

00000: 16 mulhhwu LMA XO; 17 mulhhwu. LMA XO; 24 machhwu LMA XO; 25 long LMA XO
00001: 80 mulhhw LMA XO; 81 mulhhw. LMA XO; 88 machhw LMA XO; 89 machhw. LMA XO; 92 nmachhw LMA XO; 93 long LMA XO
00010: 152 long LMA XO; 153 long LMA XO
00011: 216 machhws LMA XO; 217 long LMA XO; 220 long LMA XO; 221 long LMA XO
00100: 272 mulchwu LMA X; 273 mulchwu. LMA X; 280 macchwu LMA XO; 281 long LMA XO
00101: 336 mulchw LMA X; 337 mulchw. LMA X; 344 macchw LMA XO; 345 macchw. LMA XO; 348 nmacchw LMA XO; 349 long LMA XO
00110: 408 long LMA XO; 409 long LMA XO
00111: 472 macchws LMA XO; 473 long LMA XO; 476 long LMA XO; 477 long LMA XO
01100: 784 mullhwu LMA X; 785 mullhwu. LMA X; 792 maclhwu LMA XO; 793 maclhwu. LMA XO
01101: 848 mullhw LMA X; 849 mullhw. LMA X; 856 maclhw LMA XO; 857 maclhw. LMA XO; 860 nmaclhw LMA XO; 861 nmaclhw. LMA XO
01110: 920 long LMA XO; 921 long LMA XO
01111: 984 maclhws LMA XO; 985 maclhws. LMA XO; 988 long LMA XO; 989 long LMA XO
10000: 1040 long LMA XO; 1041 long LMA XO; 1048 long LMA XO; 1049 long LMA XO
10001: 1104 mullhwo. LMA XO; 1105 mullhwo. LMA XO; 1112 machhwo LMA XO; 1113 long LMA XO; 1116 long LMA XO; 1117 long LMA XO
10010: 1176 long LMA XO; 1177 long LMA XO
10011: 1240 long LMA XO; 1241 long LMA XO; 1244 long LMA XO; 1245 long LMA XO
10100: 1304 long LMA XO; 1305 long LMA XO
10101: 1368 macchwo LMA XO; 1369 long LMA XO; 1372 long LMA XO; 1373 long LMA XO
10110: 1432 long LMA XO; 1433 long LMA XO
10111: 1496 long LMA XO; 1497 long LMA XO; 1500 long LMA XO; 1501 long LMA XO
11100: 1816 long LMA XO; 1817 long LMA XO
11101: 1880 maclhwo LMA XO; 1881 maclhwo. LMA XO; 1884 long LMA XO; 1885 long LMA XO
11110: 1944 long LMA XO; 1945 long LMA XO
11111: 2008 long LMA XO; 2009 long LMA XO; 2012 long LMA XO; 2013 long LMA XO

Table 7 (Right-Center) Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31)

Columns 100000-101111. Each instruction listed here occupies its entire column (rows 00000 through 11111, marked || in the original layout); the opcode shown is that of the cell in row 00000.

32 vmhaddshs V VA; 33 vmhraddshs V VA; 34 vmladduhm V VA; 36 vmsumubm V VA; 37 vmsummbm V VA; 38 vmsumuhm V VA; 39 vmsumuhs V VA; 40 vmsumshm V VA; 41 vmsumshs V VA; 42 vsel V VA; 43 vperm V VA; 44 vsldoi V VA; 46 vmaddfp V VA; 47 vnmsubfp V VA

Table 7 (Right) Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31)

Columns 110000-111111: no cells are assigned.

Table 8: (Left) Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31)

Columns 000000-001111:

01000: 512 evaddw SP EVX; 514 evaddiw SP EVX; 516 evsubfw SP EVX; 518 evsubifw SP EVX; 520 evabs SP EVX; 521 evneg SP EVX; 522 evextsb SP EVX; 523 evextsh SP EVX; 524 evrndw SP EVX; 525 evcntlzw SP EVX; 526 evcntlsw SP EVX; 527 brinc SP EVX
01010: 640 evfsadd sp.fv EVX; 641 evfssub sp.fv EVX; 644 evfsabs sp.fv EVX; 645 evfsnabs sp.fv EVX; 646 evfsneg sp.fv EVX; 648 evfsmul sp.fv EVX; 649 evfsdiv sp.fv EVX; 652 long sp.fv EVX; 653 evfscmplt sp.fv EVX; 654 long sp.fv EVX
01011: 704 efsadd sp.fs EVX; 705 efssub sp.fs EVX; 708 efsabs sp.fs EVX; 709 efsnabs sp.fs EVX; 710 efsneg sp.fs EVX; 712 efsmul sp.fs EVX; 713 efsdiv sp.fs EVX; 716 efscmpgt sp.fs EVX; 717 efscmplt sp.fs EVX; 718 efscmpeq sp.fs EVX; 719 efscfd sp.fd EVX
01100: 768 evlddx SP EVX; 769 evldd SP EVX; 770 evldwx SP EVX; 771 evldw SP EVX; 772 evldhx SP EVX; 773 evldh SP EVX; 776 long SP EVX; 777 long SP EVX; 780 long SP EVX; 781 long SP EVX; 782 long SP EVX; 783 long SP EVX
10000: 1027 evmhessf SP EVX; 1031 evmhossf SP EVX; 1032 long SP EVX; 1033 long SP EVX; 1035 long SP EVX; 1036 long SP EVX; 1037 long SP EVX; 1039 long SP EVX
10001: 1095 long SP EVX; 1096 long SP EVX; 1100 long SP EVX; 1101 long SP EVX; 1103 long SP EVX
10011: 1216 long SP EVX; 1217 long SP EVX; 1218 long SP EVX; 1219 long SP EVX; 1220 evmra SP EVX; 1222 evdivws SP EVX; 1223 evdivwu SP EVX; 1224 long SP EVX; 1225 long SP EVX; 1226 long SP EVX; 1227 long SP EVX
10100: 1280 long SP EVX; 1281 long SP EVX; 1283 long SP EVX; 1285 long SP EVX; 1287 long SP EVX; 1288 long SP EVX; 1289 long SP EVX; 1291 long SP EVX; 1292 long SP EVX; 1293 long SP EVX; 1295 long SP EVX
10101: 1344 long SP EVX; 1345 long SP EVX; 1352 long SP EVX; 1353 long SP EVX
10110: 1408 long SP EVX; 1409 long SP EVX; 1411 long SP EVX; 1412 long SP EVX; 1413 long SP EVX; 1415 long SP EVX; 1416 long SP EVX; 1417 long SP EVX; 1419 long SP EVX; 1420 long SP EVX; 1421 long SP EVX; 1423 long SP EVX
10111: 1472 long SP EVX; 1473 long SP EVX; 1480 long SP EVX; 1481 long SP EVX

Table 8 (Left-Center) Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31)

Columns 010000-011111:

01000: 529 evand SP EVX; 530 evandc SP EVX; 534 evxor SP EVX; 535 evor SP EVX; 536 evnor SP EVX; 537 eveqv SP EVX; 539 evorc SP EVX; 542 evnand SP EVX
01010: 656 evfscfui sp.fv EVX; 657 evfscfsi sp.fv EVX; 658 evfscfuf sp.fv EVX; 659 evfscfsf sp.fv EVX; 660 evfsctui sp.fv EVX; 661 evfsctsi sp.fv EVX; 662 evfsctuf sp.fv EVX; 663 evfsctsf sp.fv EVX; 664 evfsctuiz sp.fv EVX; 666 evfsctsiz sp.fv EVX; 668 evfststgt sp.fv EVX; 669 evfststlt sp.fv EVX; 670 evfststeq sp.fv EVX
01011: 720 efscfui sp.fs EVX; 721 efscfsi sp.fs EVX; 722 efscfuf sp.fs EVX; 723 efscfsf sp.fs EVX; 724 efsctui sp.fs EVX; 725 efsctsi sp.fs EVX; 726 efsctuf sp.fs EVX; 727 efsctsf sp.fs EVX; 728 efsctuiz sp.fs EVX; 730 efsctsiz sp.fs EVX; 732 efststgt sp.fs EVX; 733 efststlt sp.fs EVX; 734 efststeq sp.fs EVX
01100: 784 evlwhex SP EVX; 785 evlwhe SP EVX; 788 evlwhoux SP EVX; 789 evlwhou SP EVX; 790 evlwhosx SP EVX; 791 evlwhos SP EVX; 792 long SP EVX; 793 long SP EVX; 796 long SP EVX; 797 long SP EVX
10001: 1112 long SP EVX; 1113 long SP EVX; 1115 long SP EVX
10101: 1363 long SP EVX; 1368 long SP EVX; 1369 long SP EVX; 1371 long SP EVX
10111: 1491 long SP EVX; 1496 long SP EVX; 1497 long SP EVX; 1499 long SP EVX

Table 8 (Right-Center) Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31)

Columns 100000-101111:

01000: 544 evsrwu SP EVX; 545 evsrws SP EVX; 546 evsrwiu SP EVX; 547 evsrwis SP EVX; 548 evslw SP EVX; 550 evslwi SP EVX; 552 evrlw SP EVX; 553 evsplati SP EVX; 554 evrlwi SP EVX; 555 evsplatfi SP EVX; 556 long SP EVX; 557 long SP EVX; 558 long SP EVX; 559 long SP EVX
01011: 736 efdadd sp.fd EVX; 737 efdsub sp.fd EVX; 738 efdcfuid sp.fd EVX; 739 efdcfsid sp.fd EVX; 740 efdabs sp.fd EVX; 741 efdnabs sp.fd EVX; 742 efdneg sp.fd EVX; 744 efdmul sp.fd EVX; 745 efddiv sp.fd EVX; 746 efdctuidz sp.fd EVX; 747 efdctsidz sp.fd EVX; 748 efdcmpgt sp.fd EVX; 749 efdcmplt sp.fd EVX; 750 efdcmpeq sp.fd EVX; 751 efdcfs sp.fd EVX
01100: 800 evstddx SP EVX; 801 evstdd SP EVX; 802 evstdwx SP EVX; 803 evstdw SP EVX; 804 evstdhx SP EVX; 805 evstdh SP EVX
10000: 1059 long SP EVX; 1063 long SP EVX; 1064 long SP EVX; 1065 long SP EVX; 1067 long SP EVX; 1068 long SP EVX; 1069 long SP EVX; 1071 long SP EVX
10001: 1127 long SP EVX; 1128 long SP EVX; 1132 long SP EVX; 1133 long SP EVX; 1135 long SP EVX
10100: 1320 long SP EVX; 1321 long SP EVX; 1323 long SP EVX; 1324 long SP EVX; 1325 long SP EVX; 1327 long SP EVX
10110: 1448 long SP EVX; 1449 long SP EVX; 1451 long SP EVX; 1452 long SP EVX; 1453 long SP EVX; 1455 long SP EVX

Table 8 (Right) Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31)

Columns 110000-111111:

01000: 560 evcmpgtu SP EVX; 561 evcmpgts SP EVX; 562 evcmpltu SP EVX; 563 evcmplts SP EVX; 564 evcmpeq SP EVX
01001: 632 evsel SP EVS; 633 evsel' SP EVS; 634 evsel' SP EVS; 635 evsel' SP EVS; 636 evsel' SP EVS; 637 evsel' SP EVS; 638 evsel' SP EVS; 639 evsel' SP EVS
01011: 752 efdcfui sp.fd EVX; 753 efdcfsi sp.fd EVX; 754 efdcfuf sp.fd EVX; 755 efdcfsf sp.fd EVX; 756 efdctui sp.fd EVX; 757 efdctsi sp.fd EVX; 758 efdctuf sp.fd EVX; 759 efdctsf sp.fd EVX; 760 efdctuiz sp.fd EVX; 762 efdctsiz sp.fd EVX; 764 efdtstgt sp.fd EVX; 765 efdtstlt sp.fd EVX; 766 efdtsteq sp.fd EVX
01100: 816 evstwhex SP EVX; 817 evstwhe SP EVX; 820 evstwhox SP EVX; 821 evstwho SP EVX; 824 evstwwex SP EVX; 825 evstwwe SP EVX; 828 evstwwox SP EVX; 829 evstwwo SP EVX
10001: 1139 long SP EVX; 1145 long SP EVX; 1147 long SP EVX

Table 9: (Left) Extended opcodes for primary opcode 19 (instruction bits 21:30)

Rows are labeled by the high-order 5 bits of the extended opcode field and columns by the low-order 5 bits. Each entry gives extended opcode (decimal), mnemonic, category, and form. Rows that are not listed contain no assigned cells.

Columns 00000-01111:

00000: 0 mcrf B XL
00001: 33 crnor B XL; 38 rfmci E XL; 39 rfdi E.ED X
00100: 129 crandc B XL
00110: 193 crxor B XL; 198 dnh E.ED XFX
00111: 225 crnand B XL
01000: 257 crand B XL
01001: 289 creqv B XL
01101: 417 crorc B XL
01110: 449 cror B XL

Table 9.
(Right) Extended opcodes for primary opcode 19 (instruction bits 21:30)

Columns 10000-11111. Each entry gives extended opcode (decimal), mnemonic, category (where shown), and form. Rows that are not listed contain no assigned cells.

00000: 16 bclr B XL; 18 rfid S XL
00001: 50 rfi E XL; 51 rfci E XL
00010: (82) rfsvc XL
00100: 150 isync B XL
01000: 274 hrfid S XL
10000: 528 bcctr B XL

Table 10: (Left) Extended opcodes for primary opcode 31 (instruction bits 21:30)

Rows are labeled by the high-order 5 bits of the extended opcode field and columns by the low-order 5 bits. Each entry gives extended opcode (decimal), mnemonic, category (where shown), and form. Rows that are not listed contain no assigned cells.

Columns 00000-01111. Column 01111 (cell 15 in row 00000) is occupied in every row (marked || in the original layout); see Table 14.

00000: 0 cmp B X; 4 tw B X; 6 lvsl V X; 7 lvebx V X; 8 subfc B XO; 9 mulhdu 64 XO; 10 addc B XO; 11 mulhwu B XO; 14 Res'd VLE; 15 See Table 14
00001: 32 cmpl B X; 33 Res'd VLE; 38 lvsr V X; 39 lvehx V X; 40 subf B XO; 46 Res'd VLE
00010: 68 td 64 X; 71 lvewx V X; 73 mulhd 64 XO; 75 mulhw B XO; 78 dlmzb LMA X
00011: 103 lvx V X; 104 neg B XO
00100: 129 Res'd VLE; 131 wrtee E X; 134 dcbtstls ECL X; 135 stvebx V X; 136 subfe B XO; 138 adde B XO
00101: 163 wrteei E X; 166 dcbtls ECL X; 167 stvehx V X
00110: 193 Res'd VLE; 199 stvewx V X; 200 subfze B XO; 202 addze B XO; 206 msgsnd E.PC X
00111: 225 Res'd VLE; 230 icblc ECL X; 231 stvx V X; 232 subfme B XO; 233 mulld 64 XO; 234 addme B XO; 235 mullw B XO; 238 msgclr E.PC X
01000: 257 Res'd VLE; 259 mfdcrx E X; 262 Res'd AP; 263 lvepxl E.PD X; 266 add B XO
01001: 289 Res'd VLE; 291 mfdcrux E X; 295 lvepx E.PD X
01010: 323 mfdcr E X; 326 dcread E.CD X; 334 mfpmr E.PM X
01011: {359} lvxl V X
01100: 387 mtdcrx E X; 390 dcblc ECL X
01101: 417 Res'd VLE; 419 mtdcrux E X
01110: 449 Res'd VLE; 451 mtdcr E X; 454 dci E.CI X; 457 divdu 64 XO; 459 divwu B XO; 462 mtpmr E.PM X
01111: 486 Res'd AP; {487} stvxl V X; 489 divd 64 XO; 491 divw B XO
10000: 512 mcrxr B X; {519} lvlx V X; 520 subfc' B XO; 521 mulhdu' 64 XO; 522 addc' B XO; 523 mulhwu' B XO
10001: {551} lvrx V X; 552 subf' B XO
10010: 585 mulhd' 64 XO; 587 mulhw' B XO
10011: 616 neg' B XO
10100: {647} stvlx V X; 648 subfe' B XO; 650 adde' B XO
10101: {679} stvrx V X
10110: 712 subfze' B XO; 714 addze' B XO
10111: 744 subfme' B XO; 745 mulld' 64 XO; 746 addme' B XO; 747 mullw' B XO
11000: 775 stvepxl E.PD X; 778 add' B XO
11001: 807 stvepx E.PD X
11100: 903 stvlxl V X
11101: 935 stvrxl V X
11110: 966 ici E.CI X; 969 divdu' 64 XO; 971 divwu' B XO
11111: 998 icread E.CD X; 1001 divd' 64 XO; 1003 divw' B XO; 1007 See Table 14

Table 10. (Right) Extended opcodes for primary opcode 31 (instruction bits 21:30)

Columns 10000-11111:

00000: 16 Res'd VLE; 19 mfcr B XFX; 20 lwarx B X; 21 ldx 64 X; 22 icbt E X; 23 lwzx B X; 24 slw B X; 26 cntlzw B X; 27 sld 64 X; 28 and B X; 29 ldepx E.PD X; 30 rldicl* 64 MD; 31 lwepx E.PD X
00001: 53 ldux 64 X; 54 dcbst B X; 55 lwzux B X; 56 Res'd VLE; 58 cntlzd 64 X; 60 andc B X; 62 See Table 11
00010: (82) mtsrd X; 83 mfmsr B X; 84 ldarx 64 X; 86 dcbf B X; 87 lbzx B X; 94 rldicr* 64 MD; 95 lbepx E.PD X
00011: (114) mtsrdin X; 118 clf X; 119 lbzux B X; 122 popcntb B X; 124 nor B X; 126 rldicr* 64 MD; 127 dcbfep E.PD X
00100: 144 mtcrf B XFX; 146 mtmsr B X; 149 stdx 64 X; 150 stwcx. B X; 151 stwx B X; 157 stdepx E.PD X; 158 rldic* 64 MD; 159 See Table 13
00101: 178 mtmsrd S X; 181 stdux 64 X; 183 stwux B X; 190 rldic* 64 MD; 191 rlwinm* B M
00110: 210 mtsr S X; 214 stdcx. 64 X; 215 stbx B X; 222 rldimi* 64 MD; 223 stbepx E.PD X
00111: 242 mtsrin S X; 246 dcbtst B X; 247 stbux B X; 254 rldimi* 64 MD; 255 See Table 13
01000: 274 tlbiel S X; 275 mfapidi E X; 278 dcbt B X; 279 lhzx B X; 280 Res'd VLE; 284 eqv B X; 285 evlddepx E.PD EVX; 286 rldcl* 64 MDS; 287 See Table 13
01001: 306 tlbie S X; 308 Res'd; 310 eciwx EC X; 311 lhzux B X; 312 Res'd VLE; 316 xor B X; 318 rldcr* 64 MDS; 319 See Table 13
01010: 339 mfspr B XFX; 341 lwax 64 X; 342 Res'd AP; 343 lhax B X; 350 *; 351 xori* B D
01011: 370 tlbia S X; 371 mftb S XFX; 373 lwaux 64 X; 374 Res'd AP; 375 lhaux B X; 382 *; 383 xoris* B D
01100: 402 slbmte S X; 407 sthx B X; 412 orc B X; 413 evstddepx E.PD EVX; 414 *; 415 See Table 13
01101: 434 slbie S X; 438 ecowx EC X; 439 sthux B X; 444 or B X; 446 *; 447 andis.* B D
01110: 467 mtspr B XFX; 469 *; 470 dcbi E X; 471 lmw* All D; 476 nand B X; 478 *
01111: 498 slbia S X; 501 *; 503 stmw* All D; 510 *
10000: 532 Res'd; 533 lswx B MA; 534 lwbrx B X; 535 lfsx FP X; 536 srw B X; 539 srd 64 X
10001: 566 tlbsync S X; 567 lfsux FP X; 568 Res'd VLE
10010: 595 mfsr S X; 597 lswi B MA; 598 sync B X; 599 lfdx FP X; 607 lfdepx E.PD X
10011: 631 lfdux FP X
10100: 659 mfsrin S X; 660 Res'd; 661 stswx B MA; 662 stwbrx B X; 663 stfsx FP X
10101: 695 stfsux FP X
10110: 725 stswi B MA; 727 stfdx FP X; 735 stfdepx E.PD X
10111: 758 dcba E X; 759 stfdux FP X
11000: 786 tlbivax E X; 790 lhbrx B X; 792 sraw B X; 794 srad 64 X
11001: 818 rac X; 822 Res'd; 823 Res'd; 824 srawi B X; 826 sradi 64 XS; 827 sradi' 64 XS
11010: 851 slbmfev S X; 854 See Table 12
11100: 914 tlbsx E X; 915 slbmfee S X; 918 sthbrx B X; 922 extsh B X
11101: 946 tlbre E X; 951 Res'd AP; 954 extsb B X
11110: 978 tlbwe E X; 982 icbi B X; 983 stfiwx FP X; 986 extsw 64 X; 991 icbiep E.PD X
11111: 1010 Res'd; 1014 dcbz B X; 1023 dcbzep E.PD X

Tables 11 through 14 list the cells of Table 10 that hold more than one entry. Each entry gives extended opcode (decimal), mnemonic, category, and form.

Table 11: Opcode: 31, Extended Opcode: 62

00001: 62 rldicl* 64 MD; 62 wait WT X

Table 12: Opcode: 31, Extended Opcode: 854

Column 10110:

11010: 854 eieio S X; 854 mbar E X

Table 13: Opcode: 31, Extended Opcode: 159

Column 11111:

00100: 159 rlwimi* B M; 159 stwepx E.PD X
00101: 191 rlwinm* B M
00110: 223 stbepx E.PD X
00111: 255 rlwnm* B M; 255 dcbstep E.PD X
01000: 287 ori* B D; 287 lhepx E.PD X
01001: 319 oris* B D; 319 dcbtep E.PD X
01010: 351 xori* B D
01011: 383 xoris* B D
01100: 415 andi.* B D; 415 sthepx E.PD X

Table 14: Opcode: 31, Extended Opcode: 15

Column 01111. isel (category B.in, form A) occupies the entire column, rows 00000 through 11111, starting at cell 15. The overlaid cells in that column are:

00001: 47 *
00010: 79 tdi* 64 D
00011: 111 twi* B D
00100: 143 *
00101: 175 *
00110: 207 *
00111: 239 mulli* B D
01000: 271 subfic* B D
01010: 335 cmpli* B D
01011: 367 cmpi* B D
01100: 399 addic* B D
01101: 431 addic.* B D
01110: 463 addi* B D
01111: 495 addis* B D

Table 15: (Left) Extended opcodes for primary opcode 59 (instruction bits 21:30)

Columns 00000-01111: no cells are assigned.

Table 15.
(Right) Extended opcodes for primary opcode 59 (instruction bits 21:30)

Columns 10000-11111. Each instruction listed here occupies its entire column (rows 00000 through 11111, marked || in the original layout); the opcode shown is that of the cell in row 00000. Each entry gives extended opcode (decimal), mnemonic, category, and form.

18 fdivs FP A; 20 fsubs FP A; 21 fadds FP A; 22 fsqrts FP A; 24 fres FP A; 25 fmuls FP A; 26 frsqrtes FP A; 28 fmsubs FP A; 29 fmadds FP A; 30 fnmsubs FP A; 31 fnmadds FP A

Table 16: (Left) Extended opcodes for primary opcode 63 (instruction bits 21:30)

Rows are labeled by the high-order 5 bits of the extended opcode field and columns by the low-order 5 bits. Rows that are not listed contain no assigned cells.

Columns 00000-01111:

00000: 0 fcmpu FP X; 12 frsp FP X; 14 fctiw FP X; 15 fctiwz FP X
00001: 32 fcmpo FP X; 38 mtfsb1 FP X; 40 fneg FP X
00010: 64 mcrfs FP X; 70 mtfsb0 FP X; 72 fmr FP X
00100: 134 mtfsfi FP X; 136 fnabs FP X
01000: 264 fabs FP X
01100: 392 frin FP X
01101: 424 friz FP X
01110: 456 frip FP X
01111: 488 frim FP X
10010: 583 mffs FP X
10110: 711 mtfsf FP XFL
11001: 814 fctid FP X; 815 fctidz FP X
11010: 846 fcfid FP X

Table 16. (Right) Extended opcodes for primary opcode 63 (instruction bits 21:30)

Columns 10000-11111. Each instruction listed here occupies its entire column (rows 00000 through 11111); the opcode shown is that of the cell in row 00000.

18 fdiv FP A; 20 fsub FP A; 21 fadd FP A; 22 fsqrt FP A; 23 fsel FP A; 24 fre FP A; 25 fmul FP A; 26 frsqrte FP A; 28 fmsub FP A; 29 fmadd FP A; 30 fnmsub FP A; 31 fnmadd FP A

Appendix G. Power ISA Instruction Set Sorted by Category

This appendix lists all the instructions in the Power ISA, grouped by category, and in order by mnemonic within category.

Form  Opcode      Mode    Priv1  Page  Cat1  Mnemonic      Instruction
      Pri   Ext   Dep.1
X     31    58    SR             76    64    cntlzd[.]     Count Leading Zeros Doubleword
XO    31    489   SR             66    64    divd[o][.]    Divide Doubleword
XO    31    457   SR             66    64    divdu[o][.]   Divide Doubleword Unsigned
X     31    986   SR             76    64    extsw[.]      Extend Sign Word
DS    58    0                    46    64    ld            Load Doubleword
X     31    84                   371   64    ldarx         Load Doubleword And Reserve Indexed
DS    58    1                    46    64    ldu           Load Doubleword with Update
X     31    53                   46    64    ldux          Load Doubleword with Update Indexed
X     31    21                   46    64    ldx           Load Doubleword Indexed
DS    58    2                    45    64    lwa           Load Word Algebraic
X     31    373                  45    64    lwaux         Load Word Algebraic with Update Indexed
X     31    341                  45    64    lwax          Load Word Algebraic Indexed
XO    31    73    SR             65    64    mulhd[.]      Multiply High Doubleword
XO    31    9     SR             65    64    mulhdu[.]     Multiply High Doubleword Unsigned
XO    31    233   SR             65    64    mulld[o][.]   Multiply Low Doubleword
MDS   30    8     SR             81    64    rldcl[.]      Rotate Left Doubleword then Clear Left
MDS   30    9     SR             82    64    rldcr[.]      Rotate Left Doubleword then Clear Right
MD    30    2     SR             81    64    rldic[.]      Rotate Left Doubleword Immediate then Clear
MD    30    0     SR             79    64    rldicl[.]     Rotate Left Doubleword Immediate then Clear Left
MD    30    1     SR             80    64    rldicr[.]     Rotate Left Doubleword Immediate then Clear Right
MD    30    3     SR             82    64    rldimi[.]     Rotate Left Doubleword Immediate then Mask Insert
X     31    27    SR             85    64    sld[.]        Shift Left Doubleword
X     31    794   SR             85    64    srad[.]       Shift Right Algebraic Doubleword
XS    31    413   SR             85    64    sradi[.]
                                                                     Shift Right Algebraic Doubleword Immediate
X     31  539   SR          85             64        srd[.]          Shift Right Doubleword
DS    62  0                 50             64        std             Store Doubleword
X     31  214               371            64        stdcx.          Store Doubleword Conditional Indexed
DS    62  1                 50             64        stdu            Store Doubleword with Update
X     31  181               50             64        stdux           Store Doubleword with Update Indexed
X     31  149               50             64        stdx            Store Doubleword Indexed
X     31  68                70             64        td              Trap Doubleword
D     2                     70             64        tdi             Trap Doubleword Immediate
XO    31  266   SR          59             B         add[o][.]       Add
XO    31  10    SR          60             B         addc[o][.]      Add Carrying
XO    31  138   SR          61             B         adde[o][.]      Add Extended
D     14                    58             B         addi            Add Immediate
D     12        SR          59             B         addic           Add Immediate Carrying
D     13        SR          59             B         addic.          Add Immediate Carrying and Record
D     15                    58             B         addis           Add Immediate Shifted
XO    31  234   SR          61             B         addme[o][.]     Add to Minus One Extended
XO    31  202   SR          62             B         addze[o][.]     Add to Zero Extended
X     31  28    SR          73             B         and[.]          AND
X     31  60    SR          74             B         andc[.]         AND with Complement
D     28        SR          71             B         andi.           AND Immediate
D     29        SR          71             B         andis.          AND Immediate Shifted
I     18                    31             B         b[l][a]         Branch
B     16        CT          31             B         bc[l][a]        Branch Conditional
XL    19  528   CT          32             B         bcctr[l]        Branch Conditional to Count Register
XL    19  16    CT          32             B         bclr[l]         Branch Conditional to Link Register
X     31  0                 67             B         cmp             Compare
D     11                    67             B         cmpi            Compare Immediate
X     31  32                68             B         cmpl            Compare Logical
D     10                    68             B         cmpli           Compare Logical Immediate
X     31  26    SR          74             B         cntlzw[.]       Count Leading Zeros Word
XL    19  257               33             B         crand           Condition Register AND
XL    19  129               34             B         crandc          Condition Register AND with Complement
XL    19  289               34             B         creqv           Condition Register Equivalent
XL    19  225               33             B         crnand          Condition Register NAND
XL    19  33                34             B         crnor           Condition Register NOR
XL    19  449               33             B         cror            Condition Register OR
XL    19  417               34             B         crorc           Condition Register OR with Complement
XL    19  193               33             B         crxor           Condition Register XOR
X     31  86                367            B         dcbf            Data Cache Block Flush
X     31  54                366            B         dcbst           Data Cache Block Store
X     31  278               360            B         dcbt            Data Cache Block Touch
X     31  246               365            B         dcbtst          Data Cache Block Touch for Store
X     31  1014              366            B         dcbz            Data Cache Block set to Zero
XO    31  491   SR          64             B         divw[o][.]      Divide Word
XO    31  459   SR          64             B         divwu[o][.]     Divide Word Unsigned
X     31  284   SR          74             B         eqv[.]          Equivalent
X     31  954   SR          74             B         extsb[.]        Extend Sign Byte
X     31  922   SR          74             B         extsh[.]        Extend Sign Halfword
X     31  982               359            B         icbi            Instruction Cache Block Invalidate
XL    19  150               369            B         isync           Instruction Synchronize
D     34                    41             B         lbz             Load Byte and Zero
D     35                    41             B         lbzu            Load Byte and Zero with Update
X     31  119               41             B         lbzux           Load Byte and Zero with Update Indexed
X     31  87                42             B         lbzx            Load Byte and Zero Indexed
D     42                    43             B         lha             Load Halfword Algebraic
D     43                    43             B         lhau            Load Halfword Algebraic with Update
X     31  375               43             B         lhaux           Load Halfword Algebraic with Update Indexed
X     31  343               43             B         lhax            Load Halfword Algebraic Indexed
X     31  790               51             B         lhbrx           Load Halfword Byte-Reverse Indexed
D     40                    42             B         lhz             Load Halfword and Zero
D     41                    42             B         lhzu            Load Halfword and Zero with Update
X     31  311               42             B         lhzux           Load Halfword and Zero with Update Indexed
X     31  279               42             B         lhzx            Load Halfword and Zero Indexed
D     46                    52             B         lmw             Load Multiple Word
X     31  20                370            B         lwarx           Load Word And Reserve Indexed
X     31  534               51             B         lwbrx           Load Word Byte-Reverse Indexed
D     32                    44             B         lwz             Load Word and Zero
D     33                    44             B         lwzu            Load Word and Zero with Update
X     31  55                44             B         lwzux           Load Word and Zero with Update Indexed
X     31  23                44             B         lwzx            Load Word and Zero Indexed
XL    19  0                 34             B         mcrf            Move Condition Register Field
X     31  512               91             B         mcrxr           Move to Condition Register from XER
XFX   31  19                89             B         mfcr            Move From Condition Register
X     31  83          P     417, 527       B         mfmsr           Move From Machine State Register
XFX   31  339         O     88, 378        B         mfspr           Move From Special Purpose Register
XFX   31  144               89             B         mtcrf           Move To Condition Register Fields
XFX   31  467         O     87             B         mtspr           Move To Special Purpose Register
XO    31  75    SR          63             B         mulhw[.]        Multiply High Word
XO    31  11    SR          63             B         mulhwu[.]       Multiply High Word Unsigned
D     7                     63             B         mulli           Multiply Low Immediate
XO    31  235   SR          63             B         mullw[o][.]     Multiply Low Word
X     31  476   SR          73             B         nand[.]         NAND
XO    31  104   SR          62             B         neg[o][.]       Negate
X     31  124   SR          74             B         nor[.]          NOR
X     31  444   SR          73             B         or[.]           OR
X     31  412   SR          74             B         orc[.]          OR with Complement
D     24                    71             B         ori             OR Immediate
D     25                    72             B         oris            OR Immediate Shifted
M     20        SR          79             B         rlwimi[.]       Rotate Left Word Immediate then Mask Insert
M     21        SR          77             B         rlwinm[.]       Rotate Left Word Immediate then AND with Mask
M     23        SR          78             B         rlwnm[.]        Rotate Left Word then AND with Mask
SC    17                    35, 404, 515   B         sc              System Call
X     31  24    SR          83             B         slw[.]          Shift Left Word
X     31  792   SR          84             B         sraw[.]         Shift Right Algebraic Word
X     31  824   SR          84             B         srawi[.]        Shift Right Algebraic Word Immediate
X     31  536   SR          83             B         srw[.]          Shift Right Word
D     38                    47             B         stb             Store Byte
D     39                    47             B         stbu            Store Byte with Update
X     31  247               47             B         stbux           Store Byte with Update Indexed
X     31  215               47             B         stbx            Store Byte Indexed
D     44                    48             B         sth             Store Halfword
X     31  918               51             B         sthbrx          Store Halfword Byte-Reverse Indexed
D     45                    48             B         sthu            Store Halfword with Update
X     31  439               48             B         sthux           Store Halfword with Update Indexed
X     31  407               48             B         sthx            Store Halfword Indexed
D     47                    53             B         stmw            Store Multiple Word
D     36                    49             B         stw             Store Word
X     31  662               51             B         stwbrx          Store Word Byte-Reverse Indexed
X     31  150               370            B         stwcx.          Store Word Conditional Indexed
D     37                    49             B         stwu            Store Word with Update
X     31  183               49             B         stwux           Store Word with Update Indexed
X     31  151               49             B         stwx            Store Word Indexed
XO    31  40    SR          59             B         subf[o][.]      Subtract From
XO    31  8     SR          60             B         subfc[o][.]     Subtract From Carrying
XO    31  136   SR          61             B         subfe[o][.]     Subtract From Extended
D     8         SR          60             B         subfic          Subtract From Immediate Carrying
XO    31  232   SR          61             B         subfme[o][.]    Subtract From Minus One Extended
XO    31  200   SR          62             B         subfze[o][.]    Subtract From Zero Extended
X     31  598               372            B         sync            Synchronize
X     31  566         H     453, 561, 651  B         tlbsync         TLB Synchronize
X     31  4                 69             B         tw              Trap Word
D     3                     69             B         twi             Trap Word Immediate
X     31  316   SR          73             B         xor[.]          XOR
D     26                    72             B         xori            XOR Immediate
D     27                    72             B         xoris           XOR Immediate Shifted
A     31  15                70             B.in      isel            Integer Select
XFX   31  19                90             B.in      mfocrf          Move From One Condition Register Field
XFX   31  144               90             B.in      mtocrf          Move To One Condition Register Field
X     31  122               76             B.in      popcntb         Population Count Bytes
X     31  758               360            E         dcba            Data Cache Block Allocate
X     31  470         P     554            E         dcbi            Data Cache Block Invalidate
X     31  22                359            E         icbt            Instruction Cache Block Touch
X     31  854               374            E         mbar            Memory Barrier
X     31  275               91             E         mfapidi         Move From APID Indirect
XFX   31  323         S     527            E         mfdcr           Move From Device Control Register
X     31  291               91             E         mfdcrux         Move From Device Control Register User-mode Indexed
X     31  259         P     527            E         mfdcrx          Move From Device Control Register Indexed
XFX   31  451         P     526            E         mtdcr           Move To Device Control Register
X     31  419               91             E         mtdcrux         Move To Device Control Register User-mode Indexed
X     31  387         P     526            E         mtdcrx          Move To Device Control Register Indexed
X     31  146         P     527            E         mtmsr           Move To Machine State Register
XL    19  51          P     516            E         rfci            Return From Critical Interrupt
XL    19  50          P     515            E         rfi             Return From Interrupt
XL    19  38          P     516            E         rfmci           Return From Machine Check Interrupt
X     31  786         P     560, 649       E         tlbivax         TLB Invalidate Virtual Address Indexed
X     31  946         P     560, 650       E         tlbre           TLB Read Entry
X     31  914         P     561, 650       E         tlbsx           TLB Search Indexed
X     31  978         P     562, 651       E         tlbwe           TLB Write Entry
X     31  131         S     528            E         wrtee           Write MSR External Enable
X     31  163         S     528            E         wrteei          Write MSR External Enable Immediate
X     31  326               632            E.CD      dcread          Data Cache Read [Alternative Encoding]
X     31  486               632            E.CD      dcread          Data Cache Read
X     31  998               633            E.CD      icread          Instruction Cache Read
X     31  454               629            E.CI      dci             Data Cache Invalidate
X     31  966               629            E.CI      ici             Instruction Cache Invalidate
XFX   19  198               620            E.ED      dnh             Debugger Notify Halt
X     19  39                516            E.ED      rfdi            Return From Debug Interrupt
X     31  238               623            E.PC      msgclr          Message Clear
X     31  206               623            E.PC      msgsnd          Message Send
X     31  127               534            E.PD      dcbfep          Data Cache Block Flush by External PID
X     31  63                533            E.PD      dcbstep         Data Cache Block Store by External PID
X     31  319               533            E.PD      dcbtep          Data Cache Block Touch by External PID
X     31  255               535            E.PD      dcbtstep        Data Cache Block Touch for Store by External PID
X     31  1023              536            E.PD      dcbzep          Data Cache Block set to Zero by External PID
EVX   31  285               538            E.PD      evlddepx        Vector Load Doubleword into Doubleword by External Process ID Indexed
EVX   31  413               538            E.PD      evstddepx       Vector Store Doubleword into Doubleword by External Process ID Indexed
X     31  991               536            E.PD      icbiep          Instruction Cache Block Invalidate by External PID
X     31  95                529            E.PD      lbepx           Load Byte by External Process ID Indexed
X     31  29                530            E.PD      ldepx           Load Doubleword by External Process ID Indexed
X     31  607               537            E.PD      lfdepx          Load Floating-Point Double by External Process ID Indexed
X     31  287               529            E.PD      lhepx           Load Halfword by External Process ID Indexed
X     31  295               539            E.PD      lvepx           Load Vector by External Process ID Indexed
X     31  263               539            E.PD      lvepxl          Load Vector by External Process ID Indexed LRU
X     31  31                530            E.PD      lwepx           Load Word by External Process ID Indexed
X     31  223               531            E.PD      stbepx          Store Byte by External Process ID Indexed
X     31  157               532            E.PD      stdepx          Store Doubleword by External Process ID Indexed
X     31  735               537            E.PD      stfdepx         Store Floating-Point Double by External Process ID Indexed
X     31  415               531            E.PD      sthepx          Store Halfword by External Process ID Indexed
X     31  807               540            E.PD      stvepx          Store Vector by External Process ID Indexed
X     31  775               540            E.PD      stvepxl         Store Vector by External Process ID Indexed LRU
X     31  159               532            E.PD      stwepx          Store Word by External Process ID Indexed
XFX   31  334               658            E.PM      mfpmr           Move From Performance Monitor Register
XFX   31  462               658            E.PM      mtpmr           Move To Performance Monitor Register
X     31  310               382            EC        eciwx           External Control In Word Indexed
X     31  438               382            EC        ecowx           External Control Out Word Indexed
X     31  390               558            ECL       dcblc           Data Cache Block Lock Clear
X     31  166               557            ECL       dcbtls          Data Cache Block Touch and Lock Set
X     31  134               557            ECL       dcbtstls        Data Cache Block Touch for Store and Lock Set
X     31  230               559            ECL       icblc           Instruction Cache Block Lock Clear
X     31  486               558            ECL       icbtls          Instruction Cache Block Touch and Lock Set
X     63  32                129            FP        fcmpo           Floating Compare Ordered
X     63  0                 129            FP        fcmpu           Floating Compare Unordered
D     50                    113            FP        lfd             Load Floating-Point Double
D     51                    113            FP        lfdu            Load Floating-Point Double with Update
X     31  631               113            FP        lfdux           Load Floating-Point Double with Update Indexed
X     31  599               113            FP        lfdx            Load Floating-Point Double Indexed
D     48                    115            FP        lfs             Load Floating-Point Single
D     49                    115            FP        lfsu            Load Floating-Point Single with Update
X     31  567               115            FP        lfsux           Load Floating-Point Single with Update Indexed
X     31  535               115            FP        lfsx            Load Floating-Point Single Indexed
X     63  64                131            FP        mcrfs           Move to Condition Register from FPSCR
D     54                    116            FP        stfd            Store Floating-Point Double
D     55                    116            FP        stfdu           Store Floating-Point Double with Update
X     31  759               116            FP        stfdux          Store Floating-Point Double with Update Indexed
X     31  727               116            FP        stfdx           Store Floating-Point Double Indexed
X     31  983               117            FP        stfiwx          Store Floating-Point as Integer Word Indexed
D     52                    115            FP        stfs            Store Floating-Point Single
D     53                    115            FP        stfsu           Store Floating-Point Single with Update
X     31  695               115            FP        stfsux          Store Floating-Point Single with Update Indexed
X     31  663               115            FP        stfsx           Store Floating-Point Single Indexed
X     63  264               118            FP[R]     fabs[.]         Floating Absolute Value
A     63  21                119            FP[R]     fadd[.]         Floating Add
A     59  21                119            FP[R]     fadds[.]        Floating Add Single
X     63  846               127            FP[R]     fcfid[.]        Floating Convert From Integer Doubleword
X     63  814               125            FP[R]     fctid[.]        Floating Convert To Integer Doubleword
X     63  815               126            FP[R]     fctidz[.]       Floating Convert To Integer Doubleword with round toward Zero
X     63  14                126            FP[R]     fctiw[.]        Floating Convert To Integer Word
X     63  15                127            FP[R]     fctiwz[.]       Floating Convert To Integer Word with round toward Zero
A     63  18                120            FP[R]     fdiv[.]         Floating Divide
A     59  18                120            FP[R]     fdivs[.]        Floating Divide Single
A     63  29                123            FP[R]     fmadd[.]        Floating Multiply-Add
A     59  29                123            FP[R]     fmadds[.]       Floating Multiply-Add Single
X     63  72                118            FP[R]     fmr[.]          Floating Move Register
A     63  28                123            FP[R]     fmsub[.]        Floating Multiply-Subtract
A     59  28                123            FP[R]     fmsubs[.]       Floating Multiply-Subtract Single
A     63  25                120            FP[R]     fmul[.]         Floating Multiply
A     59  25                120            FP[R]     fmuls[.]        Floating Multiply Single
X     63  136               118            FP[R]     fnabs[.]        Floating Negative Absolute Value
X     63  40                118            FP[R]     fneg[.]         Floating Negate
A     63  31                124            FP[R]     fnmadd[.]       Floating Negative Multiply-Add
A     59  31                124            FP[R]     fnmadds[.]
                                                                     Floating Negative Multiply-Add Single
A     63  30                124            FP[R]     fnmsub[.]       Floating Negative Multiply-Subtract
A     59  30                124            FP[R]     fnmsubs[.]      Floating Negative Multiply-Subtract Single
A     63  24                121            FP[R]     fre[.]          Floating Reciprocal Estimate
A     59  24                121            FP[R]     fres[.]         Floating Reciprocal Estimate Single
X     63  488               128            FP[R]     frim[.]         Floating Round to Integer Minus
A     63  23                130            FP[R]     fsel[.]         Floating Select
A     63  22                121            FP[R]     fsqrt[.]        Floating Square Root
A     59  22                121            FP[R]     fsqrts[.]       Floating Square Root Single
A     63  20                119            FP[R]     fsub[.]         Floating Subtract
A     59  20                119            FP[R]     fsubs[.]        Floating Subtract Single
X     63  583               131            FP[R]     mffs[.]         Move From FPSCR
X     63  70                132            FP[R]     mtfsb0[.]       Move To FPSCR Bit 0
X     63  38                132            FP[R]     mtfsb1[.]       Move To FPSCR Bit 1
XFL   63  711               131            FP[R]     mtfsf[.]        Move To FPSCR Fields
X     63  134               131            FP[R]     mtfsfi[.]       Move To FPSCR Field Immediate
X     63  392               128            FP[R].in  frin[.]         Floating Round to Integer Nearest
X     63  456               128            FP[R].in  frip[.]         Floating Round to Integer Plus
X     63  424               128            FP[R].in  friz[.]         Floating Round to Integer Toward Zero
X     63  12                125            FP[R].in  frsp[.]         Floating Round to Single-Precision
A     63  26                122            FP[R].in  frsqrte[.]      Floating Reciprocal Square Root Estimate
A     59  26                122            FP[R].in  frsqrtes[.]     Floating Reciprocal Square Root Estimate Single
XO    4   172               289            LMA       macchw[o][.]    Multiply Accumulate Cross Halfword to Word Modulo Signed
XO    4   236               289            LMA       macchws[o][.]   Multiply Accumulate Cross Halfword to Word Saturate Signed
XO    4   204               290            LMA       macchwsu[o][.]  Multiply Accumulate Cross Halfword to Word Saturate Unsigned
XO    4   140               290            LMA       macchwu[o][.]   Multiply Accumulate Cross Halfword to Word Modulo Unsigned
XO    4   44                291            LMA       machhw[o][.]    Multiply Accumulate High Halfword to Word Modulo Signed
XO    4   108               291            LMA       machhws[o][.]   Multiply Accumulate High Halfword to Word Saturate Signed
XO    4   76                292            LMA       machhwsu[o][.]  Multiply Accumulate High Halfword to Word Saturate Unsigned
XO    4   12                292            LMA       machhwu[o][.]   Multiply Accumulate High Halfword to Word Modulo Unsigned
XO    4   428               293            LMA       maclhw[o][.]    Multiply Accumulate Low Halfword to Word Modulo Signed
XO    4   492               293            LMA       maclhws[o][.]   Multiply Accumulate Low Halfword to Word Saturate Signed
XO    4   460               294            LMA       maclhwsu[o][.]  Multiply Accumulate Low Halfword to Word Saturate Unsigned
XO    4   396               294            LMA       maclhwu[o][.]   Multiply Accumulate Low Halfword to Word Modulo Unsigned
X     4   168               294            LMA       mulchw[.]       Multiply Cross Halfword to Word Signed
X     4   136               294            LMA       mulchwu[.]      Multiply Cross Halfword to Word Unsigned
X     4   40                295            LMA       mulhhw[.]       Multiply High Halfword to Word Signed
X     4   8                 295            LMA       mulhhwu[.]      Multiply High Halfword to Word Unsigned
X     4   424               295            LMA       mullhw[.]       Multiply Low Halfword to Word Signed
X     4   392               295            LMA       mullhwu[.]      Multiply Low Halfword to Word Unsigned
XO    4   174               296            LMA       nmacchw[o][.]   Negative Multiply Accumulate Cross Halfword to Word Modulo Signed
XO    4   238               296            LMA       nmacchws[o][.]  Negative Multiply Accumulate Cross Halfword to Word Saturate Signed
XO    4   46                297            LMA       nmachhw[o][.]   Negative Multiply Accumulate High Halfword to Word Modulo Signed
XO    4   110               297            LMA       nmachhws[o][.]  Negative Multiply Accumulate High Halfword to Word Saturate Signed
XO    4   430               298            LMA       nmaclhw[o][.]   Negative Multiply Accumulate Low Halfword to Word Modulo Signed
XO    4   494               298            LMA       nmaclhws[o][.]  Negative Multiply Accumulate Low Halfword to Word Saturate Signed
X     31  78                287            LMV       dlmzb[.]        Determine Leftmost Zero Byte
DQ    56              P     410            LSQ       lq              Load Quadword
DS    62  2           P     410            LSQ       stq             Store Quadword
X     31  597               55             MA        lswi            Load String Word Immediate
X     31  533               55             MA        lswx            Load String Word Indexed
X     31  725               56             MA        stswi           Store String Word Immediate
X     31  661               56             MA        stswx           Store String Word Indexed
X     31  854               374            S         eieio           Enforce In-order Execution of I/O
XL    19  274         H     405            S         hrfid           Hypervisor Return From Interrupt Doubleword
X     31  595   32    P     449            S         mfsr            Move From Segment Register
X     31  659   32    P     449            S         mfsrin          Move From Segment Register Indirect
XFX   31  371               378            S         mftb            Move From Time Base
X     31  146         P     415            S         mtmsr           Move To Machine State Register
X     31  178         P     416            S         mtmsrd          Move To Machine State Register Doubleword
X     31  210   32    P     448            S         mtsr            Move To Segment Register
X     31  242   32    P     448            S         mtsrin          Move To Segment Register Indirect
XL    19  18          P     405            S         rfid            Return From Interrupt Doubleword
X     31  498         P     444            S         slbia           SLB Invalidate All
X     31  434         P     443            S         slbie           SLB Invalidate Entry
X     31  915         P     446            S         slbmfee         SLB Move From Entry ESID
X     31  851         P     446            S         slbmfev         SLB Move From Entry VSID
X     31  402         P     445            S         slbmte          SLB Move To Entry
X     31  370         P     453            S         tlbia           TLB Invalidate All
X     31  306   64    H     450            S         tlbie           TLB Invalidate Entry
X     31  274   64    H     452            S         tlbiel          TLB Invalidate Entry Local
EVX   4   527               208            SP        brinc           Bit Reversed Increment
EVX   4   520               208            SP        evabs           Vector Absolute Value
EVX   4   514               208            SP        evaddiw         Vector Add Immediate Word
EVX   4   1225              208            SP        evaddsmiaaw     Vector Add Signed, Modulo, Integer to Accumulator Word
EVX   4   1217              209            SP        evaddssiaaw     Vector Add Signed, Saturate, Integer to Accumulator Word
EVX   4   1224              209            SP        evaddumiaaw     Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX   4   1216              209            SP        evaddusiaaw     Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX   4   512               209            SP        evaddw          Vector Add Word
EVX   4   529               210            SP        evand           Vector AND
EVX   4   530               210            SP        evandc          Vector AND with Complement
EVX   4   564               210            SP        evcmpeq         Vector Compare Equal
EVX   4   561               210            SP        evcmpgts        Vector Compare Greater Than Signed
EVX   4   560               211            SP        evcmpgtu        Vector Compare Greater Than Unsigned
EVX   4   563               211            SP        evcmplts        Vector Compare Less Than Signed
EVX   4   562               211            SP        evcmpltu        Vector Compare Less Than Unsigned
EVX   4   526               212            SP        evcntlsw        Vector Count Leading Signed Bits Word
EVX   4   525               212            SP        evcntlzw        Vector Count Leading Zeros Word
EVX   4   1222              212            SP        evdivws         Vector Divide Word Signed
EVX   4   1223              213            SP        evdivwu         Vector Divide Word Unsigned
EVX   4   537               213            SP        eveqv           Vector Equivalent
EVX   4   522               213            SP        evextsb         Vector Extend Sign Byte
EVX   4   523               213            SP        evextsh         Vector Extend Sign Halfword
EVX   4   769               214            SP        evldd           Vector Load Double Word into Double Word
EVX   4   768               214            SP        evlddx          Vector Load Double Word into Double Word Indexed
EVX   4   773               214            SP        evldh           Vector Load Double into Four Halfwords
EVX   4   772               214            SP        evldhx          Vector Load Double into Four Halfwords Indexed
EVX   4   771               215            SP        evldw           Vector Load Double into Two Words
EVX   4   770               215            SP        evldwx          Vector Load Double into Two Words Indexed
EVX   4   777               215            SP        evlhhesplat     Vector Load Halfword into Halfwords Even and Splat
EVX   4   776               215            SP        evlhhesplatx    Vector Load Halfword into Halfwords Even and Splat Indexed
EVX   4   783               216            SP        evlhhossplat    Vector Load Halfword into Halfword Odd Signed and Splat
EVX   4   782               216            SP        evlhhossplatx   Vector Load Halfword into Halfword Odd Signed and Splat Indexed
EVX   4   781               216            SP        evlhhousplat    Vector Load Halfword into Halfword Odd Unsigned and Splat
EVX   4   780               216            SP        evlhhousplatx   Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed
EVX   4   785               217            SP        evlwhe          Vector Load Word into Two Halfwords Even
EVX   4   784               217            SP        evlwhex         Vector Load Word into Two Halfwords Even Indexed
EVX   4   791               217            SP        evlwhos         Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX   4   790               217            SP        evlwhosx        Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX   4   789               218            SP        evlwhou         Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX   4   788               218            SP        evlwhoux        Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX   4   797               218            SP        evlwhsplat      Vector Load Word into Two Halfwords and Splat
EVX   4   796               218            SP        evlwhsplatx     Vector Load Word into Two Halfwords and Splat Indexed
EVX   4   793               219            SP        evlwwsplat      Vector Load Word into Word and Splat
EVX   4   792               219            SP        evlwwsplatx     Vector Load Word into Word and Splat Indexed
EVX   4   556               219            SP        evmergehi       Vector Merge High
EVX   4   558               220            SP        evmergehilo     Vector Merge High/Low
EVX   4   557               219            SP        evmergelo       Vector Merge Low
EVX   4   559               220            SP        evmergelohi     Vector Merge Low/High
EVX   4   1323              220            SP        evmhegsmfaa     Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX   4   1451              220            SP        evmhegsmfan     Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX   4   1321              221            SP        evmhegsmiaa     Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX   4   1449              221            SP        evmhegsmian     Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX   4   1320              221            SP        evmhegumiaa     Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX   4   1448              221            SP        evmhegumian     Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX   4   1035              222            SP        evmhesmf        Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX   4   1067              222            SP        evmhesmfa       Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX   4   1291              222            SP        evmhesmfaaw     Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX   4   1419              222            SP        evmhesmfanw     Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX   4   1033              223            SP        evmhesmi        Vector Multiply Halfwords, Even, Signed, Modulo, Integer
EVX   4   1065              223            SP        evmhesmia       Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX   4   1289              223            SP        evmhesmiaaw     Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX   4   1417              223            SP        evmhesmianw     Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX   4   1027              224            SP        evmhessf        Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
EVX   4   1059              224            SP        evmhessfa       Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX   4   1283              225            SP        evmhessfaaw     Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX   4   1411              225            SP        evmhessfanw     Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX   4   1281              226            SP        evmhessiaaw     Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
EVX   4   1409              226            SP        evmhessianw     Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
EVX   4   1032              227            SP        evmheumi        Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX   4   1064              227            SP        evmheumia       Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX   4   1288              227            SP        evmheumiaaw     Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX   4   1416              227            SP        evmheumianw     Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX   4   1280              228            SP        evmheusiaaw     Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX   4   1408              228            SP        evmheusianw     Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX   4   1327              229            SP        evmhogsmfaa     Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX   4   1455              229            SP        evmhogsmfan     Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX   4   1325              229            SP        evmhogsmiaa     Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX   4   1453              229            SP        evmhogsmian     Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX   4   1324              230            SP        evmhogumiaa     Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX   4   1452              230            SP        evmhogumian     Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX   4   1039              230            SP        evmhosmf        Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX   4   1071              230            SP        evmhosmfa       Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
EVX   4   1295              231            SP        evmhosmfaaw     Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX   4   1423              231            SP        evmhosmfanw     Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX   4   1037              231            SP        evmhosmi        Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX   4   1069              231            SP        evmhosmia       Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX   4   1293              232            SP        evmhosmiaaw     Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX   4   1421              231            SP        evmhosmianw     Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX   4   1031              233            SP        evmhossf        Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional
EVX   4   1063              233            SP        evmhossfa       Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator
EVX   4   1287              234            SP        evmhossfaaw     Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX   4   1415              234            SP        evmhossfanw     Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX   4   1285              235            SP        evmhossiaaw     Vector
                                                                     Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX   4   1413              235            SP        evmhossianw     Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX   4   1036              235            SP        evmhoumi        Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
EVX   4   1068              235            SP        evmhoumia       Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX   4   1292              236            SP        evmhoumiaaw     Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX   4   1420              232            SP        evmhoumianw     Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX   4   1284              236            SP        evmhousiaaw     Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
EVX   4   1412              236            SP        evmhousianw     Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX   4   1220              237            SP        evmra           Initialize Accumulator
EVX   4   1103              237            SP        evmwhsmf        Vector Multiply Word High Signed, Modulo, Fractional
EVX   4   1135              237            SP        evmwhsmfa       Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX   4   1101              237            SP        evmwhsmi        Vector Multiply Word High Signed, Modulo, Integer
EVX   4   1133              237            SP        evmwhsmia       Vector Multiply Word High Signed, Modulo, Integer to Accumulator
EVX   4   1095              238            SP        evmwhssf        Vector Multiply Word High Signed, Saturate, Fractional
EVX   4   1127              238            SP        evmwhssfa       Vector Multiply Word High Signed, Saturate, Fractional to Accumulator
EVX   4   1100              238            SP        evmwhumi        Vector Multiply Word High Unsigned, Modulo, Integer
EVX   4   1132              238            SP        evmwhumia       Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX   4   1353              239            SP        evmwlsmiaaw     Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX   4   1481              239            SP        evmwlsmianw     Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words
EVX   4   1345              239            SP        evmwlssiaaw     Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
EVX   4   1473              239            SP        evmwlssianw     Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words
EVX   4   1096              240            SP        evmwlumi        Vector Multiply Word Low Unsigned, Modulo, Integer
EVX   4   1128              240            SP        evmwlumia       Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX   4   1352              240            SP        evmwlumiaaw     Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX   4   1480              240            SP        evmwlumianw     Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX   4   1344              241            SP        evmwlusiaaw     Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX   4   1472              241            SP        evmwlusianw     Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words
EVX   4   1115              241            SP        evmwsmf         Vector Multiply Word Signed, Modulo, Fractional
EVX   4   1147              241            SP        evmwsmfa        Vector Multiply Word Signed, Modulo, Fractional to Accumulator
EVX   4   1371              242            SP        evmwsmfaa       Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX   4   1499              242            SP        evmwsmfan       Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
EVX   4   1113              242            SP        evmwsmi         Vector Multiply Word Signed, Modulo, Integer
EVX   4   1145              242            SP        evmwsmia        Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX   4   1369              242            SP        evmwsmiaa       Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX   4   1497              242            SP        evmwsmian       Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX   4   1107              243            SP        evmwssf         Vector Multiply Word Signed, Saturate, Fractional
EVX   4   1139              243            SP        evmwssfa        Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX   4   1363              243            SP        evmwssfaa       Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX   4   1491              244            SP        evmwssfan       Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX   4   1112              244            SP        evmwumi         Vector Multiply Word Unsigned, Modulo, Integer
EVX   4   1144              244            SP        evmwumia        Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX   4   1368              245            SP        evmwumiaa       Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX   4   1496              245            SP        evmwumian       Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX   4   542               245            SP        evnand          Vector NAND
EVX   4   521               245            SP        evneg           Vector Negate
EVX   4   536               245            SP        evnor           Vector NOR
EVX   4   535               246            SP        evor            Vector OR
EVX   4   539               246            SP        evorc           Vector OR with Complement
EVX   4   552               246            SP        evrlw           Vector Rotate Left Word
EVX   4   554               247            SP        evrlwi          Vector Rotate Left Word Immediate
EVX   4   524               247            SP        evrndw          Vector Round Word
EVS   4   79                247            SP        evsel           Vector Select
EVX   4   548               248            SP        evslw           Vector Shift Left Word
EVX   4   550               248            SP        evslwi          Vector Shift Left Word Immediate
EVX   4   555               248            SP        evsplatfi       Vector Splat Fractional Immediate
EVX   4   553               248            SP        evsplati        Vector Splat Immediate
EVX   4   547               248            SP        evsrwis         Vector Shift Right Word Immediate Signed
EVX   4   546               248            SP        evsrwiu         Vector Shift Right Word Immediate Unsigned
EVX   4   545               249            SP        evsrws          Vector Shift Right Word Signed
EVX   4   544               249            SP        evsrwu          Vector Shift Right Word Unsigned
EVX   4   801               249            SP        evstdd          Vector Store Double of Double
EVX   4   800               249            SP        evstddx         Vector Store Double of Double Indexed
EVX   4   805               250            SP        evstdh          Vector Store Double of Four Halfwords
EVX   4   804               250            SP        evstdhx         Vector Store Double of Four Halfwords Indexed
EVX   4   803               250            SP        evstdw          Vector Store Double of Two Words
EVX   4   802               250            SP        evstdwx         Vector Store Double of Two Words Indexed
EVX   4   817               251            SP        evstwhe         Vector Store Word of Two Halfwords from Even
EVX   4   816               251            SP        evstwhex        Vector Store Word of Two Halfwords from Even Indexed
EVX   4   821               251            SP        evstwho         Vector Store Word of Two Halfwords from Odd
EVX   4   820               251            SP        evstwhox        Vector Store Word of Two Halfwords from Odd Indexed
EVX   4   825               251            SP        evstwwe         Vector Store Word of Word from Even
EVX   4   824               251            SP        evstwwex        Vector Store Word of Word from Even Indexed
EVX   4   829               252            SP        evstwwo         Vector Store Word of Word from Odd
EVX   4   828               252            SP        evstwwox        Vector Store Word of Word from Odd Indexed
EVX   4   1227              252            SP        evsubfsmiaaw    Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX   4   1219              252            SP        evsubfssiaaw    Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX   4   1226              253            SP        evsubfumiaaw    Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX   4   1218              253            SP        evsubfusiaaw    Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX   4   516               253            SP        evsubfw         Vector Subtract from Word
EVX   4   518               253            SP        evsubifw        Vector Subtract Immediate from Word
EVX   4   534               253            SP        evxor           Vector XOR
EVX   4   740               274            SP.FD     efdabs          Floating-Point Double-Precision Absolute Value
EVX   4   736               275            SP.FD     efdadd          Floating-Point Double-Precision Add
EVX   4   751               280            SP.FD     efdcfs          Floating-Point Double-Precision Convert from Single-Precision
EVX   4   755               278            SP.FD     efdcfsf         Convert Floating-Point Double-Precision from Signed Fraction
EVX   4   753               277            SP.FD     efdcfsi         Convert Floating-Point Double-Precision from Signed Integer
EVX   4   739               278            SP.FD     efdcfsid        Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX   4   754               278            SP.FD     efdcfuf         Convert Floating-Point Double-Precision from Unsigned Fraction
EVX   4   752               277            SP.FD     efdcfui         Convert Floating-Point Double-Precision from Unsigned Integer
Power ISA Instruction Set Sorted by Category 801 Version 2.04 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 738 278 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword EVX 4 750 276 SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal EVX 4 748 276 SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than EVX 4 749 276 SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than EVX 4 759 280 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction EVX 4 757 278 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Inte- ger EVX 4 747 279 SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Inte- ger Doubleword with Round toward Zero EVX 4 762 280 SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Inte- ger with Round toward Zero EVX 4 758 280 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Fraction EVX 4 756 278 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Integer EVX 4 746 279 SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero EVX 4 760 280 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero EVX 4 745 275 SP.FD efddiv Floating-Point Double-Precision Divide EVX 4 744 275 SP.FD efdmul Floating-Point Double-Precision Multiply EVX 4 741 274 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value EVX 4 742 274 SP.FD efdneg Floating-Point Double-Precision Negate EVX 4 737 275 SP.FD efdsub Floating-Point Double-Precision Subtract EVX 4 766 277 SP.FD efdtsteq Floating-Point Double-Precision Test Equal EVX 4 764 276 SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than EVX 4 765 277 SP.FD efdtstlt Floating-Point Double-Precision Test Less Than EVX 4 719 281 SP.FD efscfd Floating-Point Single-Precision Convert from Double- Precision EVX 4 708 267 SP.FS efsabs 
Floating-Point Single-Precision Absolute Value EVX 4 704 268 SP.FS efsadd Floating-Point Single-Precision Add EVX 4 723 272 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Fraction EVX 4 721 272 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer EVX 4 722 272 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction EVX 4 720 272 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Integer EVX 4 718 270 SP.FS efscmpeq Floating-Point Single-Precision Compare Equal EVX 4 716 269 SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than EVX 4 717 269 SP.FS efscmplt Floating-Point Single-Precision Compare Less Than EVX 4 727 273 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Frac- tion EVX 4 725 272 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Inte- ger EVX 4 730 273 SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Inte- ger with Round toward Zero EVX 4 726 273 SP.FS efsctuf Convert Floating-Point Single-Precision to Unsigned Fraction EVX 4 724 272 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer EVX 4 728 273 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX 4 713 268 SP.FS efsdiv Floating-Point Single-Precision Divide EVX 4 712 268 SP.FS efsmul Floating-Point Single-Precision Multiply 802 Power ISATM -- Book Appendices Version 2.04 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 709 267 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value EVX 4 710 267 SP.FS efsneg Floating-Point Single-Precision Negate EVX 4 705 268 SP.FS efssub Floating-Point Single-Precision Subtract EVX 4 734 271 SP.FS efststeq Floating-Point Single-Precision Test Equal EVX 4 732 270 SP.FS efststgt Floating-Point Single-Precision Test Greater Than EVX 4 733 271 SP.FS efststlt Floating-Point Single-Precision Test Less Than EVX 4 644 259 SP.FV 
evfsabs Vector Floating-Point Single-Precision Absolute Value EVX 4 640 260 SP.FV evfsadd Vector Floating-Point Single-Precision Add EVX 4 659 264 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction EVX 4 657 264 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer EVX 4 658 264 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction EVX 4 656 264 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer EVX 4 654 262 SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal EVX 4 652 261 SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than EVX 4 653 261 SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than EVX 4 663 266 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction EVX 4 661 265 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer EVX 4 666 265 SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero EVX 4 662 266 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction EVX 4 660 265 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer EVX 4 664 265 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX 4 649 260 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide EVX 4 648 260 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply EVX 4 645 259 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Abso- lute Value EVX 4 646 259 SP.FV evfsneg Vector Floating-Point Single-Precision Negate EVX 4 641 260 SP.FV evfssub Vector Floating-Point Single-Precision Subtract EVX 4 670 263 SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal EVX 4 668 262 SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than EVX 4 669 263 SP.FV evfststlt 
                                                        Vector Floating-Point Single-Precision Test Less Than
X     31    7              146  V      lvebx            Load Vector Element Byte Indexed
X     31   39              143  V      lvehx            Load Vector Element Halfword Indexed
X     31   71              143  V      lvewx            Load Vector Element Word Indexed
X     31    6              148  V      lvsl             Load Vector for Shift Left Indexed
X     31   38              148  V      lvsr             Load Vector for Shift Right Indexed
X     31  103              144  V      lvx              Load Vector Indexed
X     31  359              144  V      lvxl             Load Vector Indexed Last
VX     4 1540              199  V      mfvscr           Move From Vector Status and Control Register
VX     4 1604              199  V      mtvscr           Move To Vector Status and Control Register
X     31  135              146  V      stvebx           Store Vector Element Byte Indexed
X     31  167              146  V      stvehx           Store Vector Element Halfword Indexed
X     31  199              147  V      stvewx           Store Vector Element Word Indexed
X     31  231              144  V      stvx             Store Vector Indexed
X     31  487              147  V      stvxl            Store Vector Indexed Last
VX     4  384              160  V      vaddcuw          Vector Add and Write Carry-Out Unsigned Word
VX     4   10              189  V      vaddfp           Vector Add Single-Precision
VX     4  768              160  V      vaddsbs          Vector Add Signed Byte Saturate
VX     4  832              160  V      vaddshs          Vector Add Signed Halfword Saturate
VX     4  896              160  V      vaddsws          Vector Add Signed Word Saturate
VX     4    0              161  V      vaddubm          Vector Add Unsigned Byte Modulo
VX     4  512              162  V      vaddubs          Vector Add Unsigned Byte Saturate
VX     4   64              161  V      vadduhm          Vector Add Unsigned Halfword Modulo
VX     4  576              162  V      vadduhs          Vector Add Unsigned Halfword Saturate
VX     4  128              161  V      vadduwm          Vector Add Unsigned Word Modulo
VX     4  640              162  V      vadduws          Vector Add Unsigned Word Saturate
VX     4 1028              184  V      vand             Vector Logical AND
VX     4 1092              184  V      vandc            Vector Logical AND with Complement
VX     4 1282              175  V      vavgsb           Vector Average Signed Byte
VX     4 1346              175  V      vavgsh           Vector Average Signed Halfword
VX     4 1410              175  V      vavgsw           Vector Average Signed Word
VX     4 1026              176  V      vavgub           Vector Average Unsigned Byte
VX     4 1090              176  V      vavguh           Vector Average Unsigned Halfword
VX     4 1154              176  V      vavguw           Vector Average Unsigned Word
VX     4  842              193  V      vcfsx            Vector Convert From Signed Fixed-Point Word
VX     4  778              193  V      vcfux            Vector Convert From Unsigned Fixed-Point Word
VC     4  966              195  V      vcmpbfp[.]       Vector Compare Bounds Single-Precision
VC     4  198              195  V      vcmpeqfp[.]      Vector Compare Equal To Single-Precision
VC     4    6              181  V      vcmpequb[.]      Vector Compare Equal To Unsigned Byte
VC     4   70              181  V      vcmpequh[.]      Vector Compare Equal To Unsigned Halfword
VC     4  134              182  V      vcmpequw[.]      Vector Compare Equal To Unsigned Word
VC     4  454              196  V      vcmpgefp[.]      Vector Compare Greater Than or Equal To Single-Precision
VC     4  710              196  V      vcmpgtfp[.]      Vector Compare Greater Than Single-Precision
VC     4  774              182  V      vcmpgtsb[.]      Vector Compare Greater Than Signed Byte
VC     4  838              182  V      vcmpgtsh[.]      Vector Compare Greater Than Signed Halfword
VC     4  902              182  V      vcmpgtsw[.]      Vector Compare Greater Than Signed Word
VC     4  518              183  V      vcmpgtub[.]      Vector Compare Greater Than Unsigned Byte
VC     4  582              183  V      vcmpgtuh[.]      Vector Compare Greater Than Unsigned Halfword
VC     4  646              183  V      vcmpgtuw[.]      Vector Compare Greater Than Unsigned Word
VX     4  970              192  V      vctsxs           Vector Convert To Signed Fixed-Point Word Saturate
VX     4  906              192  V      vctuxs           Vector Convert To Unsigned Fixed-Point Word Saturate
VX     4  394              197  V      vexptefp         Vector 2 Raised to the Exponent Estimate Floating-Point
VX     4  458              197  V      vlogefp          Vector Log Base 2 Estimate Floating-Point
VA     4   46              190  V      vmaddfp          Vector Multiply-Add Single-Precision
VX     4 1034              191  V      vmaxfp           Vector Maximum Single-Precision
VX     4  258              177  V      vmaxsb           Vector Maximum Signed Byte
VX     4  322              177  V      vmaxsh           Vector Maximum Signed Halfword
VX     4  386              177  V      vmaxsw           Vector Maximum Signed Word
VX     4    2              178  V      vmaxub           Vector Maximum Unsigned Byte
VX     4   66              178  V      vmaxuh           Vector Maximum Unsigned Halfword
VX     4  130              178  V      vmaxuw           Vector Maximum Unsigned Word
VA     4   32              168  V      vmhaddshs        Vector Multiply-High-Add Signed Halfword Saturate
VA     4   33              168  V      vmhraddshs       Vector Multiply-High-Round-Add Signed Halfword Saturate
VX     4 1098              191  V      vminfp           Vector Minimum Single-Precision
VX     4  770              179  V      vminsb           Vector Minimum Signed Byte
VX     4  834              179  V      vminsh           Vector Minimum Signed Halfword
VX     4  898              179  V      vminsw           Vector Minimum Signed Word
VX     4  514              180  V      vminub           Vector Minimum Unsigned Byte
VX     4  578              180  V      vminuh           Vector Minimum Unsigned Halfword
VX     4  642              180  V      vminuw           Vector Minimum Unsigned Word
VA     4   34              169  V      vmladduhm        Vector Multiply-Low-Add Unsigned Halfword Modulo
VX     4   12              154  V      vmrghb           Vector Merge High Byte
VX     4   76              154  V      vmrghh           Vector Merge High Halfword
VX     4  140              154  V      vmrghw           Vector Merge High Word
VX     4  268              155  V      vmrglb           Vector Merge Low Byte
VX     4  332              155  V      vmrglh           Vector Merge Low Halfword
VX     4  396              155  V      vmrglw           Vector Merge Low Word
VA     4   37              170  V      vmsummbm         Vector Multiply-Sum Mixed Byte Modulo
VA     4   40              170  V      vmsumshm         Vector Multiply-Sum Signed Halfword Modulo
VA     4   41              171  V      vmsumshs         Vector Multiply-Sum Signed Halfword Saturate
VA     4   36              169  V      vmsumubm         Vector Multiply-Sum Unsigned Byte Modulo
VA     4   38              171  V      vmsumuhm         Vector Multiply-Sum Unsigned Halfword Modulo
VA     4   39              172  V      vmsumuhs         Vector Multiply-Sum Unsigned Halfword Saturate
VX     4  776              166  V      vmulesb          Vector Multiply Even Signed Byte
VX     4  840              166  V      vmulesh          Vector Multiply Even Signed Halfword
VX     4  520              166  V      vmuleub          Vector Multiply Even Unsigned Byte
VX     4  584              166  V      vmuleuh          Vector Multiply Even Unsigned Halfword
VX     4  264              167  V      vmulosb          Vector Multiply Odd Signed Byte
VX     4  328              167  V      vmulosh          Vector Multiply Odd Signed Halfword
VX     4    8              167  V      vmuloub          Vector Multiply Odd Unsigned Byte
VX     4   72              167  V      vmulouh          Vector Multiply Odd Unsigned Halfword
VA     4   47              190  V      vnmsubfp         Vector Negative Multiply-Subtract Single-Precision
VX     4 1284              184  V      vnor             Vector Logical NOR
VX     4 1156              184  V      vor              Vector Logical OR
VA     4   43              157  V      vperm            Vector Permute
VX     4  782              149  V      vpkpx            Vector Pack Pixel
VX     4  398              150  V      vpkshss          Vector Pack Signed Halfword Signed Saturate
VX     4  270              150  V      vpkshus          Vector Pack Signed Halfword Unsigned Saturate
VX     4  462              150  V      vpkswss          Vector Pack Signed Word Signed Saturate
VX     4  334              150  V      vpkswus          Vector Pack Signed Word Unsigned Saturate
VX     4   14              151  V      vpkuhum          Vector Pack Unsigned Halfword Unsigned Modulo
VX     4  142              151  V      vpkuhus          Vector Pack Unsigned Halfword Unsigned Saturate
VX     4   78              151  V      vpkuwum          Vector Pack Unsigned Word Unsigned Modulo
VX     4  206              151  V      vpkuwus          Vector Pack Unsigned Word Unsigned Saturate
VX     4  266              198  V      vrefp            Vector Reciprocal Estimate Single-Precision
VX     4  714              194  V      vrfim            Vector Round to Single-Precision Integer toward -Infinity
VX     4  522              194  V      vrfin            Vector Round to Single-Precision Integer Nearest
VX     4  650              194  V      vrfip            Vector Round to Single-Precision Integer toward +Infinity
VX     4  586              194  V      vrfiz            Vector Round to Single-Precision Integer toward Zero
VX     4    4              185  V      vrlb             Vector Rotate Left Byte
VX     4   68              185  V      vrlh             Vector Rotate Left Halfword
VX     4  132              185  V      vrlw             Vector Rotate Left Word
VX     4  330              198  V      vrsqrtefp        Vector Reciprocal Square Root Estimate Single-Precision
VA     4   42              157  V      vsel             Vector Select
VX     4  452              158  V      vsl              Vector Shift Left
VX     4  260              186  V      vslb             Vector Shift Left Byte
VA     4   44              158  V      vsldoi           Vector Shift Left Double by Octet Immediate
VX     4  324              186  V      vslh             Vector Shift Left Halfword
VX     4 1036              158  V      vslo             Vector Shift Left by Octet
VX     4  388              186  V      vslw             Vector Shift Left Word
VX     4  524              156  V      vspltb           Vector Splat Byte
VX     4  588              156  V      vsplth           Vector Splat Halfword
VX     4  780              156  V      vspltisb         Vector Splat Immediate Signed Byte
VX     4  844              156  V      vspltish         Vector Splat Immediate Signed Halfword
VX     4  908              156  V      vspltisw         Vector Splat Immediate Signed Word
VX     4  652              156  V      vspltw           Vector Splat Word
VX     4  708              159  V      vsr              Vector Shift Right
VX     4  772              188  V      vsrab            Vector Shift Right Algebraic Byte
VX     4  836              188  V      vsrah            Vector Shift Right Algebraic Halfword
VX     4  900              188  V      vsraw            Vector Shift Right Algebraic Word
VX     4  516              187  V      vsrb             Vector Shift Right Byte
VX     4  580              187  V      vsrh             Vector Shift Right Halfword
VX     4 1100              159  V      vsro             Vector Shift Right by Octet
VX     4  644              187  V      vsrw             Vector Shift Right Word
VX     4 1408              163  V      vsubcuw          Vector Subtract and Write Carry-Out Unsigned Word
VX     4   74              189  V      vsubfp           Vector Subtract Single-Precision
VX     4 1792              163  V      vsubsbs          Vector Subtract Signed Byte Saturate
VX     4 1856              163  V      vsubshs          Vector Subtract Signed Halfword Saturate
VX     4 1920              163  V      vsubsws          Vector Subtract Signed Word Saturate
VX     4 1024              164  V      vsububm          Vector Subtract Unsigned Byte Modulo
VX     4 1536              165  V      vsububs          Vector Subtract Unsigned Byte Saturate
VX     4 1088              164  V      vsubuhm          Vector Subtract Unsigned Halfword Modulo
VX     4 1600              164  V      vsubuhs          Vector Subtract Unsigned Halfword Saturate
VX     4 1152              164  V      vsubuwm          Vector Subtract Unsigned Word Modulo
VX     4 1664              165  V      vsubuws          Vector Subtract Unsigned Word Saturate
VX     4 1672              173  V      vsum2sws         Vector Sum across Half Signed Word Saturate
VX     4 1800              174  V      vsum4sbs         Vector Sum across Quarter Signed Byte Saturate
VX     4 1608              174  V      vsum4shs         Vector Sum across Quarter Signed Halfword Saturate
VX     4 1544              174  V      vsum4ubs         Vector Sum across Quarter Unsigned Byte Saturate
VX     4 1928              173  V      vsumsws          Vector Sum across Signed Word Saturate
VX     4  846              152  V      vupkhpx          Vector Unpack High Pixel
VX     4  526              152  V      vupkhsb          Vector Unpack High Signed Byte
VX     4  590              152  V      vupkhsh          Vector Unpack High Signed Halfword
VX     4  974              153  V      vupklpx          Vector Unpack Low Pixel
VX     4  654              153  V      vupklsb          Vector Unpack Low Signed Byte
VX     4  718              153  V      vupklsh          Vector Unpack Low Signed Halfword
VX     4 1220              184  V      vxor             Vector Logical XOR
X     31   62              375  WT     wait             Wait

1 See the key to the mode dependency and privilege columns on page 839 and the key to the category column in Section 1.3.5 of Book I.

Appendix H. Power ISA Instruction Set Sorted by Opcode

This appendix lists all the instructions in the Power ISA, in order by opcode.
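
The Pri and Ext columns of the table can be recovered from a 32-bit instruction word once its form is known. The following is a minimal sketch, not part of the architecture text: the function names and the EXT_FIELD table are hypothetical, the field positions are assumptions taken from the instruction-format figures in Book I (bit 0 is the most significant bit), and only a few forms are covered. The sample words are hand-assembled with all register operands set to zero.

```python
# Extended-opcode field layout per instruction form, as
# (width in bits, number of low-order bits to the right of the field).
# Assumption of this sketch: positions follow the Book I format figures.
EXT_FIELD = {
    "X":   (10, 1),  # bits 21:30; bit 31 is Rc or reserved
    "XO":  (9, 1),   # bits 22:30; bit 31 is Rc
    "VX":  (11, 0),  # bits 21:31
    "VA":  (6, 0),   # bits 26:31
    "EVX": (11, 0),  # bits 21:31
}


def primary_opcode(word):
    """Return the Pri column value: instruction bits 0:5 (the top six bits)."""
    return (word >> 26) & 0x3F


def extended_opcode(word, form):
    """Return the Ext column value for an instruction of the given form."""
    width, low_bits = EXT_FIELD[form]
    return (word >> low_bits) & ((1 << width) - 1)


def table_key(word, form):
    """Return the (Pri, Ext) pair under which the instruction is listed."""
    return primary_opcode(word), extended_opcode(word, form)


if __name__ == "__main__":
    # wait (X form), vxor and vperm with all-zero operands (VX and VA forms).
    for word, form in [(0x7C00007C, "X"), (0x100004C4, "VX"), (0x1000002B, "VA")]:
        print(hex(word), form, table_key(word, form))
```

Because the width and position of the extended-opcode field differ between forms, the form has to be known before the Ext value can be extracted.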
      Opcode  Mode
Form  Pri Ext Dep.1 Priv1 Page  Cat1   Mnemonic         Instruction
D      2                    70  64     tdi              Trap Doubleword Immediate
D      3                    69  B      twi              Trap Word Immediate
VX     4    0              161  V      vaddubm          Vector Add Unsigned Byte Modulo
VX     4    2              178  V      vmaxub           Vector Maximum Unsigned Byte
VX     4    4              185  V      vrlb             Vector Rotate Left Byte
VC     4    6              181  V      vcmpequb[.]      Vector Compare Equal To Unsigned Byte
X      4    8              295  LMA    mulhhwu[.]       Multiply High Halfword to Word Unsigned
VX     4    8              167  V      vmuloub          Vector Multiply Odd Unsigned Byte
VX     4   10              189  V      vaddfp           Vector Add Single-Precision
XO     4   12              292  LMA    machhwu[o][.]    Multiply Accumulate High Halfword to Word Modulo Unsigned
VX     4   12              154  V      vmrghb           Vector Merge High Byte
VX     4   14              151  V      vpkuhum          Vector Pack Unsigned Halfword Unsigned Modulo
VA     4   32              168  V      vmhaddshs        Vector Multiply-High-Add Signed Halfword Saturate
VA     4   33              168  V      vmhraddshs       Vector Multiply-High-Round-Add Signed Halfword Saturate
VA     4   34              169  V      vmladduhm        Vector Multiply-Low-Add Unsigned Halfword Modulo
VA     4   36              169  V      vmsumubm         Vector Multiply-Sum Unsigned Byte Modulo
VA     4   37              170  V      vmsummbm         Vector Multiply-Sum Mixed Byte Modulo
VA     4   38              171  V      vmsumuhm         Vector Multiply-Sum Unsigned Halfword Modulo
VA     4   39              172  V      vmsumuhs         Vector Multiply-Sum Unsigned Halfword Saturate
X      4   40              295  LMA    mulhhw[.]        Multiply High Halfword to Word Signed
VA     4   40              170  V      vmsumshm         Vector Multiply-Sum Signed Halfword Modulo
VA     4   41              171  V      vmsumshs         Vector Multiply-Sum Signed Halfword Saturate
VA     4   42              157  V      vsel             Vector Select
VA     4   43              157  V      vperm            Vector Permute
XO     4   44              291  LMA    machhw[o][.]     Multiply Accumulate High Halfword to Word Modulo Signed
VA     4   44              158  V      vsldoi           Vector Shift Left Double by Octet Immediate
XO     4   46              297  LMA    nmachhw[o][.]    Negative Multiply Accumulate High Halfword to Word Modulo Signed
VA     4   46              190  V      vmaddfp          Vector Multiply-Add Single-Precision
VA     4   47              190  V      vnmsubfp         Vector Negative Multiply-Subtract Single-Precision
VX     4   64              161  V      vadduhm          Vector Add Unsigned Halfword Modulo
VX     4   66              178  V      vmaxuh           Vector Maximum Unsigned Halfword
VX     4   68              185  V      vrlh             Vector Rotate Left Halfword
VC     4   70              181  V      vcmpequh[.]      Vector Compare Equal To Unsigned Halfword
VX     4   72              167  V      vmulouh          Vector Multiply Odd Unsigned Halfword
VX     4   74              189  V      vsubfp           Vector Subtract Single-Precision
XO     4   76              292  LMA    machhwsu[o][.]   Multiply Accumulate High Halfword to Word Saturate Unsigned
VX     4   76              154  V      vmrghh           Vector Merge High Halfword
VX     4   78              151  V      vpkuwum          Vector Pack Unsigned Word Unsigned Modulo
EVS    4   79              247  SP     evsel            Vector Select
XO     4  108              291  LMA    machhws[o][.]    Multiply Accumulate High Halfword to Word Saturate Signed
XO     4  110              297  LMA    nmachhws[o][.]   Negative Multiply Accumulate High Halfword to Word Saturate Signed
VX     4  128              161  V      vadduwm          Vector Add Unsigned Word Modulo
VX     4  130              178  V      vmaxuw           Vector Maximum Unsigned Word
VX     4  132              185  V      vrlw             Vector Rotate Left Word
VC     4  134              182  V      vcmpequw[.]      Vector Compare Equal To Unsigned Word
X      4  136              294  LMA    mulchwu[.]       Multiply Cross Halfword to Word Unsigned
XO     4  140              290  LMA    macchwu[o][.]    Multiply Accumulate Cross Halfword to Word Modulo Unsigned
VX     4  140              154  V      vmrghw           Vector Merge High Word
VX     4  142              151  V      vpkuhus          Vector Pack Unsigned Halfword Unsigned Saturate
X      4  168              294  LMA    mulchw[.]        Multiply Cross Halfword to Word Signed
XO     4  172              289  LMA    macchw[o][.]     Multiply Accumulate Cross Halfword to Word Modulo Signed
XO     4  174              296  LMA    nmacchw[o][.]    Negative Multiply Accumulate Cross Halfword to Word Modulo Signed
VC     4  198              195  V      vcmpeqfp[.]      Vector Compare Equal To Single-Precision
XO     4  204              290  LMA    macchwsu[o][.]   Multiply Accumulate Cross Halfword to Word Saturate Unsigned
VX     4  206              151  V      vpkuwus          Vector Pack Unsigned Word Unsigned Saturate
XO     4  236              289  LMA    macchws[o][.]    Multiply Accumulate Cross Halfword to Word Saturate Signed
XO     4  238              296  LMA    nmacchws[o][.]   Negative Multiply Accumulate Cross Halfword to Word Saturate Signed
VX     4  258              177  V      vmaxsb           Vector Maximum Signed Byte
VX     4  260              186  V      vslb             Vector Shift Left Byte
VX     4  264              167  V      vmulosb          Vector Multiply Odd Signed Byte
VX     4  266              198  V      vrefp            Vector Reciprocal Estimate Single-Precision
VX     4  268              155  V      vmrglb           Vector Merge Low Byte
VX     4  270              150  V      vpkshus          Vector Pack Signed Halfword Unsigned Saturate
VX     4  322              177  V      vmaxsh           Vector Maximum Signed Halfword
VX     4  324              186  V      vslh             Vector Shift Left Halfword
VX     4  328              167  V      vmulosh          Vector Multiply Odd Signed Halfword
VX     4  330              198  V      vrsqrtefp        Vector Reciprocal Square Root Estimate Single-Precision
VX     4  332              155  V      vmrglh           Vector Merge Low Halfword
VX     4  334              150  V      vpkswus          Vector Pack Signed Word Unsigned Saturate
VX     4  384              160  V      vaddcuw          Vector Add and Write Carry-Out Unsigned Word
VX     4  386              177  V      vmaxsw           Vector Maximum Signed Word
VX     4  388              186  V      vslw             Vector Shift Left Word
X      4  392              295  LMA    mullhwu[.]       Multiply Low Halfword to Word Unsigned
VX     4  394              197  V      vexptefp         Vector 2 Raised to the Exponent Estimate Floating-Point
XO     4  396              294  LMA    maclhwu[o][.]    Multiply Accumulate Low Halfword to Word Modulo Unsigned
VX     4  396              155  V      vmrglw           Vector Merge Low Word
VX     4  398              150  V      vpkshss          Vector Pack Signed Halfword Signed Saturate
X      4  424              295  LMA    mullhw[.]        Multiply Low Halfword to Word Signed
XO     4  428              293  LMA    maclhw[o][.]     Multiply Accumulate Low Halfword to Word Modulo Signed
XO     4  430              298  LMA    nmaclhw[o][.]    Negative Multiply Accumulate Low Halfword to Word Modulo Signed
VX     4  452              158  V      vsl              Vector Shift Left
VC     4  454              196  V      vcmpgefp[.]      Vector Compare Greater Than or Equal To Single-Precision
VX     4  458              197  V      vlogefp          Vector Log Base 2 Estimate Floating-Point
XO     4  460              294  LMA    maclhwsu[o][.]   Multiply Accumulate Low Halfword to Word Saturate Unsigned
VX     4  462              150  V      vpkswss          Vector Pack Signed Word Signed Saturate
XO     4  492              293  LMA    maclhws[o][.]    Multiply Accumulate Low Halfword to Word Saturate Signed
XO     4  494              298  LMA    nmaclhws[o][.]   Negative Multiply Accumulate Low Halfword to Word Saturate Signed
EVX    4  512              209  SP     evaddw           Vector Add Word
VX     4  512              162  V      vaddubs          Vector Add Unsigned Byte Saturate
EVX    4  514              208  SP     evaddiw          Vector Add Immediate Word
VX     4  514              180  V      vminub           Vector Minimum Unsigned Byte
EVX    4  516              253  SP     evsubfw          Vector Subtract from Word
VX     4  516              187  V      vsrb             Vector Shift Right Byte
EVX    4  518              253  SP     evsubifw         Vector Subtract Immediate from Word
VC     4  518              183  V      vcmpgtub[.]      Vector Compare Greater Than Unsigned Byte
EVX    4  520              208  SP     evabs            Vector Absolute Value
VX     4  520              166  V      vmuleub          Vector Multiply Even Unsigned Byte
EVX    4  521              245  SP     evneg            Vector Negate
EVX    4  522              213  SP     evextsb          Vector Extend Sign Byte
VX     4  522              194  V      vrfin            Vector Round to Single-Precision Integer Nearest
EVX    4  523              213  SP     evextsh          Vector Extend Sign Halfword
EVX    4  524              247  SP     evrndw           Vector Round Word
VX     4  524              156  V      vspltb           Vector Splat Byte
EVX    4  525              212  SP     evcntlzw         Vector Count Leading Zeros Word
EVX    4  526              212  SP     evcntlsw         Vector Count Leading Signed Bits Word
VX     4  526              152  V      vupkhsb          Vector Unpack High Signed Byte
EVX    4  527              208  SP     brinc            Bit Reversed Increment
EVX    4  529              210  SP     evand            Vector AND
EVX    4  530              210  SP     evandc           Vector AND with Complement
EVX    4  534              253  SP     evxor            Vector XOR
EVX    4  535              246  SP     evor             Vector OR
EVX    4  536              245  SP     evnor            Vector NOR
EVX    4  537              213  SP     eveqv            Vector Equivalent
EVX    4  539              246  SP     evorc            Vector OR with Complement
EVX    4  542              245  SP     evnand           Vector NAND
EVX    4  544              249  SP     evsrwu           Vector Shift Right Word Unsigned
EVX    4  545              249  SP     evsrws           Vector Shift Right Word Signed
EVX    4  546              248  SP     evsrwiu          Vector Shift Right Word Immediate Unsigned
EVX    4  547              248  SP     evsrwis          Vector Shift Right
                                                        Word Immediate Signed
EVX    4  548              248  SP     evslw            Vector Shift Left Word
EVX    4  550              248  SP     evslwi           Vector Shift Left Word Immediate
EVX    4  552              246  SP     evrlw            Vector Rotate Left Word
EVX    4  553              248  SP     evsplati         Vector Splat Immediate
EVX    4  554              247  SP     evrlwi           Vector Rotate Left Word Immediate
EVX    4  555              248  SP     evsplatfi        Vector Splat Fractional Immediate
EVX    4  556              219  SP     evmergehi        Vector Merge High
EVX    4  557              219  SP     evmergelo        Vector Merge Low
EVX    4  558              220  SP     evmergehilo      Vector Merge High/Low
EVX    4  559              220  SP     evmergelohi      Vector Merge Low/High
EVX    4  560              211  SP     evcmpgtu         Vector Compare Greater Than Unsigned
EVX    4  561              210  SP     evcmpgts         Vector Compare Greater Than Signed
EVX    4  562              211  SP     evcmpltu         Vector Compare Less Than Unsigned
EVX    4  563              211  SP     evcmplts         Vector Compare Less Than Signed
EVX    4  564              210  SP     evcmpeq          Vector Compare Equal
VX     4  576              162  V      vadduhs          Vector Add Unsigned Halfword Saturate
VX     4  578              180  V      vminuh           Vector Minimum Unsigned Halfword
VX     4  580              187  V      vsrh             Vector Shift Right Halfword
VC     4  582              183  V      vcmpgtuh[.]      Vector Compare Greater Than Unsigned Halfword
VX     4  584              166  V      vmuleuh          Vector Multiply Even Unsigned Halfword
VX     4  586              194  V      vrfiz            Vector Round to Single-Precision Integer toward Zero
VX     4  588              156  V      vsplth           Vector Splat Halfword
VX     4  590              152  V      vupkhsh          Vector Unpack High Signed Halfword
EVX    4  640              260  SP.FV  evfsadd          Vector Floating-Point Single-Precision Add
VX     4  640              162  V      vadduws          Vector Add Unsigned Word Saturate
EVX    4  641              260  SP.FV  evfssub          Vector Floating-Point Single-Precision Subtract
VX     4  642              180  V      vminuw           Vector Minimum Unsigned Word
EVX    4  644              259  SP.FV  evfsabs          Vector Floating-Point Single-Precision Absolute Value
VX     4  644              187  V      vsrw             Vector Shift Right Word
EVX    4  645              259  SP.FV  evfsnabs         Vector Floating-Point Single-Precision Negative Absolute Value
EVX    4  646              259  SP.FV  evfsneg          Vector Floating-Point Single-Precision Negate
VC     4  646              183  V      vcmpgtuw[.]      Vector Compare Greater Than Unsigned Word
EVX    4  648              260  SP.FV  evfsmul          Vector Floating-Point Single-Precision Multiply
EVX    4  649              260  SP.FV  evfsdiv          Vector Floating-Point Single-Precision Divide
VX     4  650              194  V      vrfip            Vector Round to Single-Precision Integer toward +Infinity
EVX    4  652              261  SP.FV  evfscmpgt        Vector Floating-Point Single-Precision Compare Greater Than
VX     4  652              156  V      vspltw           Vector Splat Word
EVX    4  653              261  SP.FV  evfscmplt        Vector Floating-Point Single-Precision Compare Less Than
EVX    4  654              262  SP.FV  evfscmpeq        Vector Floating-Point Single-Precision Compare Equal
VX     4  654              153  V      vupklsb          Vector Unpack Low Signed Byte
EVX    4  656              264  SP.FV  evfscfui         Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX    4  657              264  SP.FV  evfscfsi         Vector Convert Floating-Point Single-Precision from Signed Integer
EVX    4  658              264  SP.FV  evfscfuf         Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX    4  659              264  SP.FV  evfscfsf         Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX    4  660              265  SP.FV  evfsctui         Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX    4  661              265  SP.FV  evfsctsi         Vector Convert Floating-Point Single-Precision to Signed Integer
EVX    4  662              266  SP.FV  evfsctuf         Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX    4  663              266  SP.FV  evfsctsf         Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX    4  664              265  SP.FV  evfsctuiz        Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX    4  666              265  SP.FV  evfsctsiz        Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX    4  668              262  SP.FV  evfststgt        Vector Floating-Point Single-Precision Test Greater Than
EVX    4  669              263  SP.FV  evfststlt        Vector Floating-Point Single-Precision Test Less Than
EVX    4  670              263  SP.FV  evfststeq        Vector Floating-Point Single-Precision Test Equal
EVX    4  704              268  SP.FS  efsadd           Floating-Point Single-Precision Add
EVX    4  705              268  SP.FS  efssub           Floating-Point Single-Precision Subtract
EVX    4  708              267  SP.FS  efsabs           Floating-Point Single-Precision Absolute Value
VX     4  708              159  V      vsr              Vector Shift Right
EVX    4  709              267  SP.FS  efsnabs          Floating-Point Single-Precision Negative Absolute Value
EVX    4  710              267  SP.FS  efsneg           Floating-Point Single-Precision Negate
VC     4  710              196  V      vcmpgtfp[.]      Vector Compare Greater Than Single-Precision
EVX    4  712              268  SP.FS  efsmul           Floating-Point Single-Precision Multiply
EVX    4  713              268  SP.FS  efsdiv           Floating-Point Single-Precision Divide
VX     4  714              194  V      vrfim            Vector Round to Single-Precision Integer toward -Infinity
EVX    4  716              269  SP.FS  efscmpgt         Floating-Point Single-Precision Compare Greater Than
EVX    4  717              269  SP.FS  efscmplt         Floating-Point Single-Precision Compare Less Than
EVX    4  718              270  SP.FS  efscmpeq         Floating-Point Single-Precision Compare Equal
VX     4  718              153  V      vupklsh          Vector Unpack Low Signed Halfword
EVX    4  719              281  SP.FD  efscfd           Floating-Point Single-Precision Convert from Double-Precision
EVX    4  720              272  SP.FS  efscfui          Convert Floating-Point Single-Precision from Unsigned Integer
EVX    4  721              272  SP.FS  efscfsi          Convert Floating-Point Single-Precision from Signed Integer
EVX    4  722              272  SP.FS  efscfuf          Convert Floating-Point Single-Precision from Unsigned Fraction
EVX    4  723              272  SP.FS  efscfsf          Convert Floating-Point Single-Precision from Signed Fraction
EVX    4  724              272  SP.FS  efsctui          Convert Floating-Point Single-Precision to Unsigned Integer
EVX    4  725              272  SP.FS  efsctsi          Convert Floating-Point Single-Precision to Signed Integer
EVX    4  726              273  SP.FS  efsctuf          Convert Floating-Point Single-Precision to Unsigned Fraction
EVX    4  727              273  SP.FS  efsctsf          Convert Floating-Point Single-Precision to Signed Fraction
EVX    4  728              273  SP.FS  efsctuiz         Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX    4  730              273  SP.FS  efsctsiz         Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX    4  732              270  SP.FS  efststgt         Floating-Point Single-Precision Test Greater Than
EVX    4  733              271  SP.FS  efststlt         Floating-Point Single-Precision Test Less Than
EVX    4  734              271  SP.FS  efststeq         Floating-Point Single-Precision Test Equal
EVX    4  736              275  SP.FD  efdadd           Floating-Point Double-Precision Add
EVX    4  737              275  SP.FD  efdsub           Floating-Point Double-Precision Subtract
EVX    4  738              278  SP.FD  efdcfuid         Convert Floating-Point Double-Precision from Unsigned Integer Doubleword
EVX    4  739              278  SP.FD  efdcfsid         Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX    4  740              274  SP.FD  efdabs           Floating-Point Double-Precision Absolute Value
EVX    4  741              274  SP.FD  efdnabs          Floating-Point Double-Precision Negative Absolute Value
EVX    4  742              274  SP.FD  efdneg           Floating-Point Double-Precision Negate
EVX    4  744              275  SP.FD  efdmul           Floating-Point Double-Precision Multiply
EVX    4  745              275  SP.FD  efddiv           Floating-Point Double-Precision Divide
EVX    4  746              279  SP.FD  efdctuidz        Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero
EVX    4  747              279  SP.FD  efdctsidz        Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero
EVX    4  748              276  SP.FD  efdcmpgt         Floating-Point Double-Precision Compare Greater Than
EVX    4  749              276  SP.FD  efdcmplt         Floating-Point Double-Precision Compare Less Than
EVX    4  750              276  SP.FD  efdcmpeq         Floating-Point Double-Precision Compare Equal
EVX    4  751              280  SP.FD  efdcfs           Floating-Point Double-Precision Convert from Single-Precision
EVX    4  752              277  SP.FD  efdcfui          Convert Floating-Point Double-Precision from Unsigned Integer
EVX    4  753              277  SP.FD  efdcfsi          Convert Floating-Point Double-Precision from Signed Integer
EVX    4  754              278  SP.FD  efdcfuf          Convert Floating-Point Double-Precision from Unsigned Fraction
EVX    4  755              278  SP.FD  efdcfsf          Convert Floating-Point Double-Precision from Signed Fraction
EVX    4  756              278  SP.FD  efdctui          Convert Floating-Point Double-Precision to Unsigned Integer
EVX    4  757              278  SP.FD  efdctsi          Convert Floating-Point Double-Precision to Signed Integer
EVX    4  758              280  SP.FD  efdctuf          Convert Floating-Point Double-Precision to Unsigned Fraction
EVX    4  759              280  SP.FD  efdctsf          Convert Floating-Point Double-Precision to Signed Fraction
EVX    4  760              280  SP.FD  efdctuiz         Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero
EVX    4  762              280  SP.FD  efdctsiz         Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero
EVX    4  764              276  SP.FD  efdtstgt         Floating-Point Double-Precision Test Greater Than
EVX    4  765              277  SP.FD  efdtstlt         Floating-Point Double-Precision Test Less Than
EVX    4  766              277  SP.FD  efdtsteq         Floating-Point Double-Precision Test Equal
EVX    4  768              214  SP     evlddx           Vector Load Double Word into Double Word Indexed
VX     4  768              160  V      vaddsbs          Vector Add Signed Byte Saturate
EVX    4  769              214  SP     evldd            Vector Load Double Word into Double Word
EVX    4  770              215  SP     evldwx           Vector Load Double into Two Words Indexed
VX     4  770              179  V      vminsb           Vector Minimum Signed Byte
EVX    4  771              215  SP     evldw            Vector Load Double into Two Words
EVX    4  772              214  SP     evldhx           Vector Load Double into Four Halfwords Indexed
VX     4  772              188  V      vsrab            Vector Shift Right Algebraic Byte
EVX    4  773              214  SP     evldh            Vector Load Double into Four Halfwords
VC 4 774 182 V vcmpgtsb[.] Vector Compare Greater Than Signed Byte
EVX 4 776 215 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed
VX 4 776 166 V vmulesb Vector Multiply Even Signed Byte
EVX 4 777 215 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat
VX 4 778 193 V vcfux Vector Convert From Unsigned Fixed-Point Word
EVX 4 780 216 SP evlhhousplatx Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed
VX 4 780 156 V vspltisb Vector Splat Immediate Signed Byte
EVX 4 781 216 SP evlhhousplat Vector Load Halfword into Halfword Odd Unsigned and Splat
EVX 4 782 216 SP evlhhossplatx Vector Load Halfword into Halfword Odd Signed and Splat Indexed
VX 4 782 149 V vpkpx Vector Pack Pixel
EVX 4 783 216 SP evlhhossplat Vector Load Halfword into Halfword Odd Signed and Splat
EVX 4 784 217 SP evlwhex Vector Load Word into Two Halfwords Even Indexed
EVX 4 785 217 SP evlwhe Vector Load Word into Two Halfwords Even
EVX 4 788 218 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX 4 789 218 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX 4 790 217 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX 4 791 217 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX 4 792 219 SP evlwwsplatx Vector Load Word into Word and Splat Indexed
EVX 4 793 219 SP evlwwsplat Vector Load Word into Word and Splat
EVX 4 796 218 SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed
EVX 4 797 218 SP evlwhsplat Vector Load Word into Two Halfwords and Splat
EVX 4 800 249 SP evstddx Vector Store Double of Double Indexed
EVX 4 801 249 SP evstdd Vector Store Double of Double
EVX 4 802 250 SP evstdwx Vector Store Double of Two Words Indexed
EVX 4 803 250 SP evstdw Vector Store Double of Two Words
EVX 4 804 250 SP evstdhx Vector Store Double of Four Halfwords Indexed
EVX 4 805 250 SP evstdh Vector Store Double of Four Halfwords
EVX 4 816 251 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed
EVX 4 817 251 SP evstwhe Vector Store Word of Two Halfwords from Even
EVX 4 820 251 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed
EVX 4 821 251 SP evstwho Vector Store Word of Two Halfwords from Odd
EVX 4 824 251 SP evstwwex Vector Store Word of Word from Even Indexed
EVX 4 825 251 SP evstwwe Vector Store Word of Word from Even
EVX 4 828 252 SP evstwwox Vector Store Word of Word from Odd Indexed
EVX 4 829 252 SP evstwwo Vector Store Word of Word from Odd
VX 4 832 160 V vaddshs Vector Add Signed Halfword Saturate
VX 4 834 179 V vminsh Vector Minimum Signed Halfword
VX 4 836 188 V vsrah Vector Shift Right Algebraic Halfword
VC 4 838 182 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword
VX 4 840 166 V vmulesh Vector Multiply Even Signed Halfword
VX 4 842 193 V vcfsx Vector Convert From Signed Fixed-Point Word
VX 4 844 156 V vspltish Vector Splat Immediate Signed Halfword
VX 4 846 152 V vupkhpx Vector Unpack High Pixel
VX 4 896 160 V vaddsws Vector Add Signed Word Saturate
VX 4 898 179 V vminsw Vector Minimum Signed Word
VX 4 900 188 V vsraw Vector Shift Right Algebraic Word
VC 4 902 182 V vcmpgtsw[.] Vector Compare Greater Than Signed Word
VX 4 906 192 V vctuxs Vector Convert To Unsigned Fixed-Point Word Saturate
VX 4 908 156 V vspltisw Vector Splat Immediate Signed Word
VC 4 966 195 V vcmpbfp[.] Vector Compare Bounds Single-Precision
VX 4 970 192 V vctsxs Vector Convert To Signed Fixed-Point Word Saturate
VX 4 974 153 V vupklpx Vector Unpack Low Pixel
VX 4 1024 164 V vsububm Vector Subtract Unsigned Byte Modulo
VX 4 1026 176 V vavgub Vector Average Unsigned Byte
EVX 4 1027 224 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
VX 4 1028 184 V vand Vector Logical AND
EVX 4 1031 233 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional
EVX 4 1032 227 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX 4 1033 223 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer
VX 4 1034 191 V vmaxfp Vector Maximum Single-Precision
EVX 4 1035 222 SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX 4 1036 235 SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
VX 4 1036 158 V vslo Vector Shift Left by Octet
EVX 4 1037 231 SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX 4 1039 230 SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX 4 1059 224 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX 4 1063 233 SP evmhossfa Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator
EVX 4 1064 227 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX 4 1065 223 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX 4 1067 222 SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX 4 1068 235 SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX 4 1069 231 SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX 4 1071 230 SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
VX 4 1088 164 V vsubuhm Vector Subtract Unsigned Halfword Modulo
VX 4 1090 176 V vavguh Vector Average Unsigned Halfword
VX 4 1092 184 V vandc Vector Logical AND with Complement
EVX 4 1095 238 SP evmwhssf Vector Multiply Word High Signed, Saturate, Fractional
EVX 4 1096 240 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer
VX 4 1098 191 V vminfp Vector Minimum Single-Precision
EVX 4 1100 238 SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer
VX 4 1100 159 V vsro Vector Shift Right by Octet
EVX 4 1101 237 SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer
EVX 4 1103 237 SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional
EVX 4 1107 243 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional
EVX 4 1112 244 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer
EVX 4 1113 242 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer
EVX 4 1115 241 SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional
EVX 4 1127 238 SP evmwhssfa Vector Multiply Word High Signed, Saturate, Fractional to Accumulator
EVX 4 1128 240 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX 4 1132 238 SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX 4 1133 237 SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator
EVX 4 1135 237 SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX 4 1139 243 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX 4 1144 244 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX 4 1145 242 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX 4 1147 241 SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator
VX 4 1152 164 V vsubuwm Vector Subtract Unsigned Word Modulo
VX 4 1154 176 V vavguw Vector Average Unsigned Word
VX 4 1156 184 V vor Vector Logical OR
EVX 4 1216 209 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX 4 1217 209 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word
EVX 4 1218 253 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX 4 1219 252 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX 4 1220 237 SP evmra Initialize Accumulator
VX 4 1220 184 V vxor Vector Logical XOR
EVX 4 1222 212 SP evdivws Vector Divide Word Signed
EVX 4 1223 213 SP evdivwu Vector Divide Word Unsigned
EVX 4 1224 209 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1225 208 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word
EVX 4 1226 253 SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1227 252 SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX 4 1280 228 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1281 226 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
VX 4 1282 175 V vavgsb Vector Average Signed Byte
EVX 4 1283 225 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1284 236 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
VX 4 1284 184 V vnor Vector Logical NOR
EVX 4 1285 235 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1287 234 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1288 227 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1289 223 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1291 222 SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1292 236 SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1293 232 SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1295 231 SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1320 221 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1321 221 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1323 220 SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1324 230 SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1325 229 SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1327 229 SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1344 241 SP evmwlusiaaw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1345 239 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
VX 4 1346 175 V vavgsh Vector Average Signed Halfword
EVX 4 1352 240 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1353 239 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX 4 1363 243 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX 4 1368 245 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX 4 1369 242 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX 4 1371 242 SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX 4 1408 228 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
VX 4 1408 163 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word
EVX 4 1409 226 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
VX 4 1410 175 V vavgsw Vector Average Signed Word
EVX 4 1411 225 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1412 236 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1413 235 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1415 234 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1416 227 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1417 223 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1419 222 SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1420 232 SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1421 231 SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1423 231 SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1448 221 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1449 221 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1451 220 SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1452 230 SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1453 229 SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1455 229 SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1472 241 SP evmwlusianw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words
EVX 4 1473 239 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words
EVX 4 1480 240 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX 4 1481 239 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words
EVX 4 1491 244 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX 4 1496 245 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1497 242 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX 4 1499 242 SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
VX 4 1536 165 V vsububs Vector Subtract Unsigned Byte Saturate
VX 4 1540 199 V mfvscr Move From Vector Status and Control Register
VX 4 1544 174 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate
VX 4 1600 164 V vsubuhs Vector Subtract Unsigned Halfword Saturate
VX 4 1604 199 V mtvscr Move To Vector Status and Control Register
VX 4 1608 174 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate
VX 4 1664 165 V vsubuws Vector Subtract Unsigned Word Saturate
VX 4 1672 173 V vsum2sws Vector Sum across Half Signed Word Saturate
VX 4 1792 163 V vsubsbs Vector Subtract Signed Byte Saturate
VX 4 1800 174 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate
VX 4 1856 163 V vsubshs Vector Subtract Signed Halfword Saturate
VX 4 1920 163 V vsubsws Vector Subtract Signed Word Saturate
VX 4 1928 173 V vsumsws Vector Sum across Signed Word Saturate
D 7 63 B mulli Multiply Low Immediate
D 8 SR 60 B subfic Subtract From Immediate Carrying
D 10 68 B cmpli Compare Logical Immediate
D 11 67 B cmpi Compare Immediate
D 12 SR 59 B addic Add Immediate Carrying
D 13 SR 59 B addic. Add Immediate Carrying and Record
D 14 58 B addi Add Immediate
D 15 58 B addis Add Immediate Shifted
B 16 CT 31 B bc[l][a] Branch Conditional
SC 17 35, 404, 515 B sc System Call
I 18 31 B b[l][a] Branch
XL 19 0 34 B mcrf Move Condition Register Field
XL 19 16 CT 32 B bclr[l] Branch Conditional to Link Register
XL 19 18 P 405 S rfid Return From Interrupt Doubleword
XL 19 33 34 B crnor Condition Register NOR
XL 19 38 P 516 E rfmci Return From Machine Check Interrupt
X 19 39 516 E.ED rfdi Return From Debug Interrupt
XL 19 50 P 515 E rfi Return From Interrupt
XL 19 51 P 516 E rfci Return From Critical Interrupt
XL 19 129 34 B crandc Condition Register AND with Complement
XL 19 150 369 B isync Instruction Synchronize
XL 19 193 33 B crxor Condition Register XOR
XFX 19 198 620 E.ED dnh Debugger Notify Halt
XL 19 225 33 B crnand Condition Register NAND
XL 19 257 33 B crand Condition Register AND
XL 19 274 H 405 S hrfid Hypervisor Return From Interrupt Doubleword
XL 19 289 34 B creqv Condition Register Equivalent
XL 19 417 34 B crorc Condition Register OR with Complement
XL 19 449 33 B cror Condition Register OR
XL 19 528 CT 32 B bcctr[l] Branch Conditional to Count Register
M 20 SR 79 B rlwimi[.] Rotate Left Word Immediate then Mask Insert
M 21 SR 77 B rlwinm[.] Rotate Left Word Immediate then AND with Mask
M 23 SR 78 B rlwnm[.] Rotate Left Word then AND with Mask
D 24 71 B ori OR Immediate
D 25 72 B oris OR Immediate Shifted
D 26 72 B xori XOR Immediate
D 27 72 B xoris XOR Immediate Shifted
D 28 SR 71 B andi. AND Immediate
D 29 SR 71 B andis. AND Immediate Shifted
MD 30 0 SR 79 64 rldicl[.] Rotate Left Doubleword Immediate then Clear Left
MD 30 1 SR 80 64 rldicr[.] Rotate Left Doubleword Immediate then Clear Right
MD 30 2 SR 81 64 rldic[.] Rotate Left Doubleword Immediate then Clear
MD 30 3 SR 82 64 rldimi[.] Rotate Left Doubleword Immediate then Mask Insert
MDS 30 8 SR 81 64 rldcl[.] Rotate Left Doubleword then Clear Left
MDS 30 9 SR 82 64 rldcr[.] Rotate Left Doubleword then Clear Right
X 31 0 67 B cmp Compare
X 31 4 69 B tw Trap Word
X 31 6 148 V lvsl Load Vector for Shift Left Indexed
X 31 7 146 V lvebx Load Vector Element Byte Indexed
XO 31 8 SR 60 B subfc[o][.] Subtract From Carrying
XO 31 9 SR 65 64 mulhdu[.] Multiply High Doubleword Unsigned
XO 31 10 SR 60 B addc[o][.] Add Carrying
XO 31 11 SR 63 B mulhwu[.] Multiply High Word Unsigned
A 31 15 70 B.in isel Integer Select
XFX 31 19 89 B mfcr Move From Condition Register
XFX 31 19 90 B.in mfocrf Move From One Condition Register Field
X 31 20 370 B lwarx Load Word And Reserve Indexed
X 31 21 46 64 ldx Load Doubleword Indexed
X 31 22 359 E icbt Instruction Cache Block Touch
X 31 23 44 B lwzx Load Word and Zero Indexed
X 31 24 SR 83 B slw[.] Shift Left Word
X 31 26 SR 74 B cntlzw[.] Count Leading Zeros Word
X 31 27 SR 85 64 sld[.] Shift Left Doubleword
X 31 28 SR 73 B and[.] AND
X 31 29 530 E.PD ldepx Load Doubleword by External Process ID Indexed
X 31 31 530 E.PD lwepx Load Word by External Process ID Indexed
X 31 32 68 B cmpl Compare Logical
X 31 38 148 V lvsr Load Vector for Shift Right Indexed
X 31 39 143 V lvehx Load Vector Element Halfword Indexed
XO 31 40 SR 59 B subf[o][.] Subtract From
X 31 53 46 64 ldux Load Doubleword with Update Indexed
X 31 54 366 B dcbst Data Cache Block Store
X 31 55 44 B lwzux Load Word and Zero with Update Indexed
X 31 58 SR 76 64 cntlzd[.] Count Leading Zeros Doubleword
X 31 60 SR 74 B andc[.] AND with Complement
X 31 62 375 WT wait Wait
X 31 63 533 E.PD dcbstep Data Cache Block Store by External PID
X 31 68 70 64 td Trap Doubleword
X 31 71 143 V lvewx Load Vector Element Word Indexed
XO 31 73 SR 65 64 mulhd[.] Multiply High Doubleword
XO 31 75 SR 63 B mulhw[.] Multiply High Word
X 31 78 287 LMV dlmzb[.] Determine Leftmost Zero Byte
X 31 83 P 417, 527 B mfmsr Move From Machine State Register
X 31 84 371 64 ldarx Load Doubleword And Reserve Indexed
X 31 86 367 B dcbf Data Cache Block Flush
X 31 87 42 B lbzx Load Byte and Zero Indexed
X 31 95 529 E.PD lbepx Load Byte by External Process ID Indexed
X 31 103 144 V lvx Load Vector Indexed
XO 31 104 SR 62 B neg[o][.] Negate
X 31 119 41 B lbzux Load Byte and Zero with Update Indexed
X 31 122 76 B.in popcntb Population Count Bytes
X 31 124 SR 74 B nor[.] NOR
X 31 127 534 E.PD dcbfep Data Cache Block Flush by External PID
X 31 131 S 528 E wrtee Write MSR External Enable
X 31 134 557 ECL dcbtstls Data Cache Block Touch for Store and Lock Set
X 31 135 146 V stvebx Store Vector Element Byte Indexed
XO 31 136 SR 61 B subfe[o][.] Subtract From Extended
XO 31 138 SR 61 B adde[o][.] Add Extended
XFX 31 144 89 B mtcrf Move To Condition Register Fields
XFX 31 144 90 B.in mtocrf Move To One Condition Register Field
X 31 146 P 527 E mtmsr Move To Machine State Register
X 31 146 P 415 S mtmsr Move To Machine State Register
X 31 149 50 64 stdx Store Doubleword Indexed
X 31 150 370 B stwcx. Store Word Conditional Indexed
X 31 151 49 B stwx Store Word Indexed
X 31 157 532 E.PD stdepx Store Doubleword by External Process ID Indexed
X 31 159 532 E.PD stwepx Store Word by External Process ID Indexed
X 31 163 S 528 E wrteei Write MSR External Enable Immediate
X 31 166 557 ECL dcbtls Data Cache Block Touch and Lock Set
X 31 167 146 V stvehx Store Vector Element Halfword Indexed
X 31 178 P 416 S mtmsrd Move To Machine State Register Doubleword
X 31 181 50 64 stdux Store Doubleword with Update Indexed
X 31 183 49 B stwux Store Word with Update Indexed
X 31 199 147 V stvewx Store Vector Element Word Indexed
XO 31 200 SR 62 B subfze[o][.] Subtract From Zero Extended
XO 31 202 SR 62 B addze[o][.] Add to Zero Extended
X 31 206 623 E.PC msgsnd Message Send
X 31 210 32 P 448 S mtsr Move To Segment Register
X 31 214 371 64 stdcx. Store Doubleword Conditional Indexed
X 31 215 47 B stbx Store Byte Indexed
X 31 223 531 E.PD stbepx Store Byte by External Process ID Indexed
X 31 230 559 ECL icblc Instruction Cache Block Lock Clear
X 31 231 144 V stvx Store Vector Indexed
XO 31 232 SR 61 B subfme[o][.] Subtract From Minus One Extended
XO 31 233 SR 65 64 mulld[o][.] Multiply Low Doubleword
XO 31 234 SR 61 B addme[o][.] Add to Minus One Extended
XO 31 235 SR 63 B mullw[o][.] Multiply Low Word
X 31 238 623 E.PC msgclr Message Clear
X 31 242 32 P 448 S mtsrin Move To Segment Register Indirect
X 31 246 365 B dcbtst Data Cache Block Touch for Store
X 31 247 47 B stbux Store Byte with Update Indexed
X 31 255 535 E.PD dcbtstep Data Cache Block Touch for Store by External PID
X 31 259 P 527 E mfdcrx Move From Device Control Register Indexed
X 31 263 539 E.PD lvepxl Load Vector by External Process ID Indexed LRU
XO 31 266 SR 59 B add[o][.] Add
X 31 274 64 H 452 S tlbiel TLB Invalidate Entry Local
X 31 275 91 E mfapidi Move From APID Indirect
X 31 278 360 B dcbt Data Cache Block Touch
X 31 279 42 B lhzx Load Halfword and Zero Indexed
X 31 284 SR 74 B eqv[.] Equivalent
EVX 31 285 538 E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed
X 31 287 529 E.PD lhepx Load Halfword by External Process ID Indexed
X 31 291 91 E mfdcrux Move From Device Control Register User-mode Indexed
X 31 295 539 E.PD lvepx Load Vector by External Process ID Indexed
X 31 306 64 H 450 S tlbie TLB Invalidate Entry
X 31 310 382 EC eciwx External Control In Word Indexed
X 31 311 42 B lhzux Load Halfword and Zero with Update Indexed
X 31 316 SR 73 B xor[.] XOR
X 31 319 533 E.PD dcbtep Data Cache Block Touch by External PID
XFX 31 323 S 527 E mfdcr Move From Device Control Register
X 31 326 632 E.CD dcread Data Cache Read [Alternative Encoding]
XFX 31 334 658 E.PM mfpmr Move From Performance Monitor Register
XFX 31 339 O 88, 378 B mfspr Move From Special Purpose Register
X 31 341 45 64 lwax Load Word Algebraic Indexed
X 31 343 43 B lhax Load Halfword Algebraic Indexed
X 31 359 144 V lvxl Load Vector Indexed Last
X 31 370 P 453 S tlbia TLB Invalidate All
XFX 31 371 378 S mftb Move From Time Base
X 31 373 45 64 lwaux Load Word Algebraic with Update Indexed
X 31 375 43 B lhaux Load Halfword Algebraic with Update Indexed
X 31 387 P 526 E mtdcrx Move To Device Control Register Indexed
X 31 390 558 ECL dcblc Data Cache Block Lock Clear
X 31 402 P 445 S slbmte SLB Move To Entry
X 31 407 48 B sthx Store Halfword Indexed
X 31 412 SR 74 B orc[.] OR with Complement
XS 31 413 SR 85 64 sradi[.] Shift Right Algebraic Doubleword Immediate
EVX 31 413 538 E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed
X 31 415 531 E.PD sthepx Store Halfword by External Process ID Indexed
X 31 419 91 E mtdcrux Move To Device Control Register User-mode Indexed
X 31 434 P 443 S slbie SLB Invalidate Entry
X 31 438 382 EC ecowx External Control Out Word Indexed
X 31 439 48 B sthux Store Halfword with Update Indexed
X 31 444 SR 73 B or[.] OR
XFX 31 451 P 526 E mtdcr Move To Device Control Register
X 31 454 629 E.CI dci Data Cache Invalidate
XO 31 457 SR 66 64 divdu[o][.] Divide Doubleword Unsigned
XO 31 459 SR 64 B divwu[o][.] Divide Word Unsigned
XFX 31 462 658 E.PM mtpmr Move To Performance Monitor Register
XFX 31 467 O 87 B mtspr Move To Special Purpose Register
X 31 470 P 554 E dcbi Data Cache Block Invalidate
X 31 476 SR 73 B nand[.] NAND
X 31 486 632 E.CD dcread Data Cache Read
X 31 486 558 ECL icbtls Instruction Cache Block Touch and Lock Set
X 31 487 147 V stvxl Store Vector Indexed Last
XO 31 489 SR 66 64 divd[o][.] Divide Doubleword
XO 31 491 SR 64 B divw[o][.] Divide Word
X 31 498 P 444 S slbia SLB Invalidate All
X 31 512 91 B mcrxr Move to Condition Register from XER
X 31 533 55 MA lswx Load String Word Indexed
X 31 534 51 B lwbrx Load Word Byte-Reverse Indexed
X 31 535 115 FP lfsx Load Floating-Point Single Indexed
X 31 536 SR 83 B srw[.] Shift Right Word
X 31 539 SR 85 64 srd[.] Shift Right Doubleword
X 31 566 H 453, 561, 651 B tlbsync TLB Synchronize
X 31 567 115 FP lfsux Load Floating-Point Single with Update Indexed
X 31 595 32 P 449 S mfsr Move From Segment Register
X 31 597 55 MA lswi Load String Word Immediate
X 31 598 372 B sync Synchronize
X 31 599 113 FP lfdx Load Floating-Point Double Indexed
X 31 607 537 E.PD lfdepx Load Floating-Point Double by External Process ID Indexed
X 31 631 113 FP lfdux Load Floating-Point Double with Update Indexed
X 31 659 32 P 449 S mfsrin Move From Segment Register Indirect
X 31 661 56 MA stswx Store String Word Indexed
X 31 662 51 B stwbrx Store Word Byte-Reverse Indexed
X 31 663 115 FP stfsx Store Floating-Point Single Indexed
X 31 695 115 FP stfsux Store Floating-Point Single with Update Indexed
X 31 725 56 MA stswi Store String Word Immediate
X 31 727 116 FP stfdx Store Floating-Point Double Indexed
X 31 735 537 E.PD stfdepx Store Floating-Point Double by External Process ID Indexed
X 31 758 360 E dcba Data Cache Block Allocate
X 31 759 116 FP stfdux Store Floating-Point Double with Update Indexed
X 31 775 540 E.PD stvepxl Store Vector by External Process ID Indexed LRU
X 31 786 P 560, 649 E tlbivax TLB Invalidate Virtual Address Indexed
X 31 790 51 B lhbrx Load Halfword Byte-Reverse Indexed
X 31 792 SR 84 B sraw[.] Shift Right Algebraic Word
X 31 794 SR 85 64 srad[.] Shift Right Algebraic Doubleword
X 31 807 540 E.PD stvepx Store Vector by External Process ID Indexed
X 31 824 SR 84 B srawi[.] Shift Right Algebraic Word Immediate
X 31 851 P 446 S slbmfev SLB Move From Entry VSID
X 31 854 374 E mbar Memory Barrier
X 31 854 374 S eieio Enforce In-order Execution of I/O
X 31 914 P 561, 650 E tlbsx TLB Search Indexed
X 31 915 P 446 S slbmfee SLB Move From Entry ESID
X 31 918 51 B sthbrx Store Halfword Byte-Reverse Indexed
X 31 922 SR 74 B extsh[.] Extend Sign Halfword
X 31 946 P 560, 650 E tlbre TLB Read Entry
X 31 954 SR 74 B extsb[.] Extend Sign Byte
X 31 966 629 E.CI ici Instruction Cache Invalidate
X 31 978 P 562, 651 E tlbwe TLB Write Entry
X 31 982 359 B icbi Instruction Cache Block Invalidate
X 31 983 117 FP stfiwx Store Floating-Point as Integer Word Indexed
X 31 986 SR 76 64 extsw[.] Extend Sign Word
X 31 991 536 E.PD icbiep Instruction Cache Block Invalidate by External PID
X 31 998 633 E.CD icread Instruction Cache Read
X 31 1014 366 B dcbz Data Cache Block set to Zero
X 31 1023 536 E.PD dcbzep Data Cache Block set to Zero by External PID
D 32 44 B lwz Load Word and Zero
D 33 44 B lwzu Load Word and Zero with Update
D 34 41 B lbz Load Byte and Zero
D 35 41 B lbzu Load Byte and Zero with Update
D 36 49 B stw Store Word
D 37 49 B stwu Store Word with Update
D 38 47 B stb Store Byte
D 39 47 B stbu Store Byte with Update
D 40 42 B lhz Load Halfword and Zero
D 41 42 B lhzu Load Halfword and Zero with Update
D 42 43 B lha Load Halfword Algebraic
D 43 43 B lhau Load Halfword Algebraic with Update
D 44 48 B sth Store Halfword
D 45 48 B sthu Store Halfword with Update
D 46 52 B lmw Load Multiple Word
D 47 53 B stmw Store Multiple Word
D 48 115 FP lfs Load Floating-Point Single
D 49 115 FP lfsu Load Floating-Point Single with Update
D 50 113 FP lfd Load Floating-Point Double
D 51 113 FP lfdu Load Floating-Point Double with Update
D 52 115 FP stfs Store Floating-Point Single
D 53 115 FP stfsu Store Floating-Point Single with Update
D 54 116 FP stfd Store Floating-Point Double
D 55 116 FP stfdu Store Floating-Point Double with Update
DQ 56 P 410 LSQ lq Load Quadword
DS 58 0 46 64 ld Load Doubleword
DS 58 1 46 64 ldu Load Doubleword with Update
DS 58 2 45 64 lwa Load Word Algebraic
A 59 18 120 FP[R] fdivs[.] Floating Divide Single
A 59 20 119 FP[R] fsubs[.] Floating Subtract Single
A 59 21 119 FP[R] fadds[.] Floating Add Single
A 59 22 121 FP[R] fsqrts[.] Floating Square Root Single
A 59 24 121 FP[R] fres[.] Floating Reciprocal Estimate Single
A 59 25 120 FP[R] fmuls[.] Floating Multiply Single
A 59 26 122 FP[R].in frsqrtes[.] Floating Reciprocal Square Root Estimate Single
A 59 28 123 FP[R] fmsubs[.] Floating Multiply-Subtract Single
A 59 29 123 FP[R] fmadds[.] Floating Multiply-Add Single
A 59 30 124 FP[R] fnmsubs[.] Floating Negative Multiply-Subtract Single
A 59 31 124 FP[R] fnmadds[.] Floating Negative Multiply-Add Single
DS 62 0 50 64 std Store Doubleword
DS 62 1 50 64 stdu Store Doubleword with Update
DS 62 2 P 410 LSQ stq Store Quadword
X 63 0 129 FP fcmpu Floating Compare Unordered
X 63 12 125 FP[R].in frsp[.] Floating Round to Single-Precision
X 63 14 126 FP[R] fctiw[.] Floating Convert To Integer Word
X 63 15 127 FP[R] fctiwz[.] Floating Convert To Integer Word with round toward Zero
A 63 18 120 FP[R] fdiv[.] Floating Divide
A 63 20 119 FP[R] fsub[.] Floating Subtract
A 63 21 119 FP[R] fadd[.] Floating Add
A 63 22 121 FP[R] fsqrt[.] Floating Square Root
A 63 23 130 FP[R] fsel[.] Floating Select
A 63 24 121 FP[R] fre[.] Floating Reciprocal Estimate
A 63 25 120 FP[R] fmul[.] Floating Multiply
A 63 26 122 FP[R].in frsqrte[.] Floating Reciprocal Square Root Estimate
A 63 28 123 FP[R] fmsub[.] Floating Multiply-Subtract
A 63 29 123 FP[R] fmadd[.] Floating Multiply-Add
A 63 30 124 FP[R] fnmsub[.] Floating Negative Multiply-Subtract
A 63 31 124 FP[R] fnmadd[.] Floating Negative Multiply-Add
X 63 32 129 FP fcmpo Floating Compare Ordered
X 63 38 132 FP[R] mtfsb1[.] Move To FPSCR Bit 1
X 63 40 118 FP[R] fneg[.] Floating Negate
X 63 64 131 FP mcrfs Move to Condition Register from FPSCR
X 63 70 132 FP[R] mtfsb0[.] Move To FPSCR Bit 0
X 63 72 118 FP[R] fmr[.] Floating Move Register
X 63 134 131 FP[R] mtfsfi[.] Move To FPSCR Field Immediate
X 63 136 118 FP[R] fnabs[.] Floating Negative Absolute Value
X 63 264 118 FP[R] fabs[.] Floating Absolute Value
X 63 392 128 FP[R].in frin[.] Floating Round to Integer Nearest
X 63 424 128 FP[R].in friz[.] Floating Round to Integer Toward Zero
X 63 456 128 FP[R].in frip[.] Floating Round to Integer Plus
X 63 488 128 FP[R] frim[.] Floating Round to Integer Minus
X 63 583 131 FP[R] mffs[.] Move From FPSCR
XFL 63 711 131 FP[R] mtfsf[.] Move To FPSCR Fields
X 63 814 125 FP[R] fctid[.] Floating Convert To Integer Doubleword
X 63 815 126 FP[R] fctidz[.] Floating Convert To Integer Doubleword with round toward Zero
X 63 846 127 FP[R] fcfid[.] Floating Convert From Integer Doubleword

1 See the key to the mode dependency and privilege columns on page 839 and the key to the category column in Section 1.3.5 of Book I.

Appendix I. Power ISA Instruction Set Sorted by Mnemonic

This appendix lists all the instructions in the Power ISA, in order by mnemonic.

Form  Opcode Pri  Opcode Ext  Mode Dep.1  Priv1  Page  Cat1  Mnemonic  Instruction
XO 31 266 SR 59 B add[o][.] Add
XO 31 10 SR 60 B addc[o][.] Add Carrying
XO 31 138 SR 61 B adde[o][.] Add Extended
D 14 58 B addi Add Immediate
D 12 SR 59 B addic Add Immediate Carrying
D 13 SR 59 B addic. Add Immediate Carrying and Record
D 15 58 B addis Add Immediate Shifted
XO 31 234 SR 61 B addme[o][.] Add to Minus One Extended
XO 31 202 SR 62 B addze[o][.]
Add to Zero Extended X 31 28 SR 73 B and[.] AND X 31 60 SR 74 B andc[.] AND with Complement D 28 SR 71 B andi. AND Immediate D 29 SR 71 B andis. AND Immediate Shifted I 18 31 B b[l][a] Branch B 16 CT 31 B bc[l][a] Branch Conditional XL 19 528 CT 32 B bcctr[l] Branch Conditional to Count Register XL 19 16 CT 32 B bclr[l] Branch Conditional to Link Register EVX 4 527 208 SP brinc Bit Reversed Increment X 31 0 67 B cmp Compare D 11 67 B cmpi Compare Immediate X 31 32 68 B cmpl Compare Logical D 10 68 B cmpli Compare Logical Immediate X 31 58 SR 76 64 cntlzd[.] Count Leading Zeros Doubleword X 31 26 SR 74 B cntlzw[.] Count Leading Zeros Word XL 19 257 33 B crand Condition Register AND XL 19 129 34 B crandc Condition Register AND with Complement XL 19 289 34 B creqv Condition Register Equivalent XL 19 225 33 B crnand Condition Register NAND XL 19 33 34 B crnor Condition Register NOR XL 19 449 33 B cror Condition Register OR XL 19 417 34 B crorc Condition Register OR with Complement XL 19 193 33 B crxor Condition Register XOR X 31 758 360 E dcba Data Cache Block Allocate X 31 86 367 B dcbf Data Cache Block Flush X 31 127 534 E.PD dcbfep Data Cache Block Flush by External PID X 31 470 P 554 E dcbi Data Cache Block Invalidate X 31 390 558 ECL dcblc Data Cache Block Lock Clear X 31 54 366 B dcbst Data Cache Block Store X 31 63 533 E.PD dcbstep Data Cache Block Store by External PID X 31 278 360 B dcbt Data Cache Block Touch X 31 319 533 E.PD dcbtep Data Cache Block Touch by External PID X 31 166 557 ECL dcbtls Data Cache Block Touch and Lock Set X 31 246 365 B dcbtst Data Cache Block Touch for Store X 31 255 535 E.PD dcbtstep Data Cache Block Touch for Store by External PID Appendix I. 
X 31 134 557 ECL dcbtstls Data Cache Block Touch for Store and Lock Set
X 31 1014 366 B dcbz Data Cache Block set to Zero
X 31 1023 536 E.PD dcbzep Data Cache Block set to Zero by External PID
X 31 454 629 E.CI dci Data Cache Invalidate
X 31 326 632 E.CD dcread Data Cache Read [Alternative Encoding]
X 31 486 632 E.CD dcread Data Cache Read
XO 31 489 SR 66 64 divd[o][.] Divide Doubleword
XO 31 457 SR 66 64 divdu[o][.] Divide Doubleword Unsigned
XO 31 491 SR 64 B divw[o][.] Divide Word
XO 31 459 SR 64 B divwu[o][.] Divide Word Unsigned
X 31 78 287 LMV dlmzb[.] Determine Leftmost Zero Byte
XFX 19 198 620 E.ED dnh Debugger Notify Halt
X 31 310 382 EC eciwx External Control In Word Indexed
X 31 438 382 EC ecowx External Control Out Word Indexed
EVX 4 740 274 SP.FD efdabs Floating-Point Double-Precision Absolute Value
EVX 4 736 275 SP.FD efdadd Floating-Point Double-Precision Add
EVX 4 751 280 SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Precision
EVX 4 755 278 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Fraction
EVX 4 753 277 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Integer
EVX 4 739 278 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX 4 754 278 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction
EVX 4 752 277 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer
EVX 4 738 278 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword
EVX 4 750 276 SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal
EVX 4 748 276 SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than
EVX 4 749 276 SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than
EVX 4 759 280 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction
EVX 4 757 278 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer
EVX 4 747 279 SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero
EVX 4 762 280 SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero
EVX 4 758 280 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Fraction
EVX 4 756 278 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Integer
EVX 4 746 279 SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero
EVX 4 760 280 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero
EVX 4 745 275 SP.FD efddiv Floating-Point Double-Precision Divide
EVX 4 744 275 SP.FD efdmul Floating-Point Double-Precision Multiply
EVX 4 741 274 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value
EVX 4 742 274 SP.FD efdneg Floating-Point Double-Precision Negate
EVX 4 737 275 SP.FD efdsub Floating-Point Double-Precision Subtract
EVX 4 766 277 SP.FD efdtsteq Floating-Point Double-Precision Test Equal
EVX 4 764 276 SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than
EVX 4 765 277 SP.FD efdtstlt Floating-Point Double-Precision Test Less Than
EVX 4 708 267 SP.FS efsabs Floating-Point Single-Precision Absolute Value
EVX 4 704 268 SP.FS efsadd Floating-Point Single-Precision Add
EVX 4 719 281 SP.FD efscfd Floating-Point Single-Precision Convert from Double-Precision
EVX 4 723 272 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 721 272 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer
EVX 4 722 272 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 720 272 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 718 270 SP.FS efscmpeq Floating-Point Single-Precision Compare Equal
EVX 4 716 269 SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than
EVX 4 717 269 SP.FS efscmplt Floating-Point Single-Precision Compare Less Than
EVX 4 727 273 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 725 272 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer
EVX 4 730 273 SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 726 273 SP.FS efsctuf Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 724 272 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 728 273 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 713 268 SP.FS efsdiv Floating-Point Single-Precision Divide
EVX 4 712 268 SP.FS efsmul Floating-Point Single-Precision Multiply
EVX 4 709 267 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value
EVX 4 710 267 SP.FS efsneg Floating-Point Single-Precision Negate
EVX 4 705 268 SP.FS efssub Floating-Point Single-Precision Subtract
EVX 4 734 271 SP.FS efststeq Floating-Point Single-Precision Test Equal
EVX 4 732 270 SP.FS efststgt Floating-Point Single-Precision Test Greater Than
EVX 4 733 271 SP.FS efststlt Floating-Point Single-Precision Test Less Than
X 31 854 374 S eieio Enforce In-order Execution of I/O
X 31 284 SR 74 B eqv[.] Equivalent
EVX 4 520 208 SP evabs Vector Absolute Value
EVX 4 514 208 SP evaddiw Vector Add Immediate Word
EVX 4 1225 208 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word
EVX 4 1217 209 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word
EVX 4 1224 209 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1216 209 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX 4 512 209 SP evaddw Vector Add Word
EVX 4 529 210 SP evand Vector AND
EVX 4 530 210 SP evandc Vector AND with Complement
EVX 4 564 210 SP evcmpeq Vector Compare Equal
EVX 4 561 210 SP evcmpgts Vector Compare Greater Than Signed
EVX 4 560 211 SP evcmpgtu Vector Compare Greater Than Unsigned
EVX 4 563 211 SP evcmplts Vector Compare Less Than Signed
EVX 4 562 211 SP evcmpltu Vector Compare Less Than Unsigned
EVX 4 526 212 SP evcntlsw Vector Count Leading Signed Bits Word
EVX 4 525 212 SP evcntlzw Vector Count Leading Zeros Word
EVX 4 1222 212 SP evdivws Vector Divide Word Signed
EVX 4 1223 213 SP evdivwu Vector Divide Word Unsigned
EVX 4 537 213 SP eveqv Vector Equivalent
EVX 4 522 213 SP evextsb Vector Extend Sign Byte
EVX 4 523 213 SP evextsh Vector Extend Sign Halfword
EVX 4 644 259 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value
EVX 4 640 260 SP.FV evfsadd Vector Floating-Point Single-Precision Add
EVX 4 659 264 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 657 264 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer
EVX 4 658 264 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 656 264 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 654 262 SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal
EVX 4 652 261 SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than
EVX 4 653 261 SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than
EVX 4 663 266 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 661 265 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer
EVX 4 666 265 SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 662 266 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 660 265 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 664 265 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 649 260 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide
EVX 4 648 260 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply
EVX 4 645 259 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value
EVX 4 646 259 SP.FV evfsneg Vector Floating-Point Single-Precision Negate
EVX 4 641 260 SP.FV evfssub Vector Floating-Point Single-Precision Subtract
EVX 4 670 263 SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal
EVX 4 668 262 SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than
EVX 4 669 263 SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than
EVX 4 769 214 SP evldd Vector Load Double Word into Double Word
EVX 31 285 538 E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed
EVX 4 768 214 SP evlddx Vector Load Double Word into Double Word Indexed
EVX 4 773 214 SP evldh Vector Load Double into Four Halfwords
EVX 4 772 214 SP evldhx Vector Load Double into Four Halfwords Indexed
EVX 4 771 215 SP evldw Vector Load Double into Two Words
EVX 4 770 215 SP evldwx Vector Load Double into Two Words Indexed
EVX 4 777 215 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat
EVX 4 776 215 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed
EVX 4 783 216 SP evlhhossplat Vector Load Halfword into Halfword Odd Signed and Splat
EVX 4 782 216 SP evlhhossplatx Vector Load Halfword into Halfword Odd Signed and Splat Indexed
EVX 4 781 216 SP evlhhousplat Vector Load Halfword into Halfword Odd Unsigned and Splat
EVX 4 780 216 SP evlhhousplatx Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed
EVX 4 785 217 SP evlwhe Vector Load Word into Two Halfwords Even
EVX 4 784 217 SP evlwhex Vector Load Word into Two Halfwords Even Indexed
EVX 4 791 217 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX 4 790 217 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX 4 789 218 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX 4 788 218 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX 4 797 218 SP evlwhsplat Vector Load Word into Two Halfwords and Splat
EVX 4 796 218 SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed
EVX 4 793 219 SP evlwwsplat Vector Load Word into Word and Splat
EVX 4 792 219 SP evlwwsplatx Vector Load Word into Word and Splat Indexed
EVX 4 556 219 SP evmergehi Vector Merge High
EVX 4 558 220 SP evmergehilo Vector Merge High/Low
EVX 4 557 219 SP evmergelo Vector Merge Low
EVX 4 559 220 SP evmergelohi Vector Merge Low/High
EVX 4 1323 220 SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1451 220 SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1321 221 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1449 221 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1320 221 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1448 221 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1035 222 SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX 4 1067 222 SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX 4 1291 222 SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1419 222 SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1033 223 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer
EVX 4 1065 223 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX 4 1289 223 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1417 223 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1027 224 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
EVX 4 1059 224 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX 4 1283 225 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1411 225 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1281 226 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1409 226 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1032 227 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX 4 1064 227 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX 4 1288 227 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1416 227 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1280 228 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1408 228 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1327 229 SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1455 229 SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1325 229 SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1453 229 SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1324 230 SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1452 230 SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1039 230 SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX 4 1071 230 SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
EVX 4 1295 231 SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1423 231 SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1037 231 SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX 4 1069 231 SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX 4 1293 232 SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1421 231 SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1031 233 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional
EVX 4 1063 233 SP evmhossfa Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator
EVX 4 1287 234 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1415 234 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1285 235 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1413 235 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1036 235 SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
EVX 4 1068 235 SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX 4 1292 236 SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1420 232 SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1284 236 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1412 236 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1220 237 SP evmra Initialize Accumulator
EVX 4 1103 237 SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional
EVX 4 1135 237 SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX 4 1101 237 SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer
EVX 4 1133 237 SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator
EVX 4 1095 238 SP evmwhssf Vector Multiply Word High Signed, Saturate, Fractional
EVX 4 1127 238 SP evmwhssfa Vector Multiply Word High Signed, Saturate, Fractional to Accumulator
EVX 4 1100 238 SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer
EVX 4 1132 238 SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX 4 1353 239 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX 4 1481 239 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words
EVX 4 1345 239 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
EVX 4 1473 239 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words
EVX 4 1096 240 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer
EVX 4 1128 240 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX 4 1352 240 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1480 240 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX 4 1344 241 SP evmwlusiaaw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1472 241 SP evmwlusianw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words
EVX 4 1115 241 SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional
EVX 4 1147 241 SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator
EVX 4 1371 242 SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX 4 1499 242 SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1113 242 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer
EVX 4 1145 242 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX 4 1369 242 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX 4 1497 242 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX 4 1107 243 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional
EVX 4 1139 243 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX 4 1363 243 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX 4 1491 244 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX 4 1112 244 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer
EVX 4 1144 244 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX 4 1368 245 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX 4 1496 245 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 542 245 SP evnand Vector NAND
EVX 4 521 245 SP evneg Vector Negate
EVX 4 536 245 SP evnor Vector NOR
EVX 4 535 246 SP evor Vector OR
EVX 4 539 246 SP evorc Vector OR with Complement
EVX 4 552 246 SP evrlw Vector Rotate Left Word
EVX 4 554 247 SP evrlwi Vector Rotate Left Word Immediate
EVX 4 524 247 SP evrndw Vector Round Word
EVS 4 79 247 SP evsel Vector Select
EVX 4 548 248 SP evslw Vector Shift Left Word
EVX 4 550 248 SP evslwi Vector Shift Left Word Immediate
EVX 4 555 248 SP evsplatfi Vector Splat Fractional Immediate
EVX 4 553 248 SP evsplati Vector Splat Immediate
EVX 4 547 248 SP evsrwis Vector Shift Right Word Immediate Signed
EVX 4 546 248 SP evsrwiu Vector Shift Right Word Immediate Unsigned
EVX 4 545 249 SP evsrws Vector Shift Right Word Signed
EVX 4 544 249 SP evsrwu Vector Shift Right Word Unsigned
EVX 4 801 249 SP evstdd Vector Store Double of Double
EVX 31 413 538 E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed
EVX 4 800 249 SP evstddx Vector Store Double of Double Indexed
EVX 4 805 250 SP evstdh Vector Store Double of Four Halfwords
EVX 4 804 250 SP evstdhx Vector Store Double of Four Halfwords Indexed
EVX 4 803 250 SP evstdw Vector Store Double of Two Words
EVX 4 802 250 SP evstdwx Vector Store Double of Two Words Indexed
EVX 4 817 251 SP evstwhe Vector Store Word of Two Halfwords from Even
EVX 4 816 251 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed
EVX 4 821 251 SP evstwho Vector Store Word of Two Halfwords from Odd
EVX 4 820 251 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed
EVX 4 825 251 SP evstwwe Vector Store Word of Word from Even
EVX 4 824 251 SP evstwwex Vector Store Word of Word from Even Indexed
EVX 4 829 252 SP evstwwo Vector Store Word of Word from Odd
EVX 4 828 252 SP evstwwox Vector Store Word of Word from Odd Indexed
EVX 4 1227 252 SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX 4 1219 252 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX 4 1226 253 SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1218 253 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX 4 516 253 SP evsubfw Vector Subtract from Word
EVX 4 518 253 SP evsubifw Vector Subtract Immediate from Word
EVX 4 534 253 SP evxor Vector XOR
X 31 954 SR 74 B extsb[.] Extend Sign Byte
X 31 922 SR 74 B extsh[.] Extend Sign Halfword
X 31 986 SR 76 64 extsw[.] Extend Sign Word
X 63 264 118 FP[R] fabs[.] Floating Absolute Value
A 63 21 119 FP[R] fadd[.] Floating Add
A 59 21 119 FP[R] fadds[.] Floating Add Single
X 63 846 127 FP[R] fcfid[.] Floating Convert From Integer Doubleword
X 63 32 129 FP fcmpo Floating Compare Ordered
X 63 0 129 FP fcmpu Floating Compare Unordered
X 63 814 125 FP[R] fctid[.] Floating Convert To Integer Doubleword
X 63 815 126 FP[R] fctidz[.] Floating Convert To Integer Doubleword with round toward Zero
X 63 14 126 FP[R] fctiw[.] Floating Convert To Integer Word
X 63 15 127 FP[R] fctiwz[.] Floating Convert To Integer Word with round toward Zero
A 63 18 120 FP[R] fdiv[.] Floating Divide
A 59 18 120 FP[R] fdivs[.] Floating Divide Single
A 63 29 123 FP[R] fmadd[.] Floating Multiply-Add
A 59 29 123 FP[R] fmadds[.] Floating Multiply-Add Single
X 63 72 118 FP[R] fmr[.] Floating Move Register
A 63 28 123 FP[R] fmsub[.] Floating Multiply-Subtract
A 59 28 123 FP[R] fmsubs[.] Floating Multiply-Subtract Single
A 63 25 120 FP[R] fmul[.] Floating Multiply
A 59 25 120 FP[R] fmuls[.] Floating Multiply Single
X 63 136 118 FP[R] fnabs[.] Floating Negative Absolute Value
X 63 40 118 FP[R] fneg[.] Floating Negate
A 63 31 124 FP[R] fnmadd[.] Floating Negative Multiply-Add
A 59 31 124 FP[R] fnmadds[.] Floating Negative Multiply-Add Single
A 63 30 124 FP[R] fnmsub[.] Floating Negative Multiply-Subtract
A 59 30 124 FP[R] fnmsubs[.] Floating Negative Multiply-Subtract Single
A 63 24 121 FP[R] fre[.] Floating Reciprocal Estimate
A 59 24 121 FP[R] fres[.] Floating Reciprocal Estimate Single
X 63 488 128 FP[R] frim[.] Floating Round to Integer Minus
X 63 392 128 FP[R].in frin[.] Floating Round to Integer Nearest
X 63 456 128 FP[R].in frip[.] Floating Round to Integer Plus
X 63 424 128 FP[R].in friz[.] Floating Round to Integer Toward Zero
X 63 12 125 FP[R].in frsp[.] Floating Round to Single-Precision
A 63 26 122 FP[R].in frsqrte[.] Floating Reciprocal Square Root Estimate
A 59 26 122 FP[R].in frsqrtes[.] Floating Reciprocal Square Root Estimate Single
A 63 23 130 FP[R] fsel[.] Floating Select
A 63 22 121 FP[R] fsqrt[.] Floating Square Root
A 59 22 121 FP[R] fsqrts[.] Floating Square Root Single
A 63 20 119 FP[R] fsub[.] Floating Subtract
A 59 20 119 FP[R] fsubs[.] Floating Subtract Single
XL 19 274 H 405 S hrfid Hypervisor Return From Interrupt Doubleword
X 31 982 359 B icbi Instruction Cache Block Invalidate
X 31 991 536 E.PD icbiep Instruction Cache Block Invalidate by External PID
X 31 230 559 ECL icblc Instruction Cache Block Lock Clear
X 31 22 359 E icbt Instruction Cache Block Touch
X 31 486 558 ECL icbtls Instruction Cache Block Touch and Lock Set
X 31 966 629 E.CI ici Instruction Cache Invalidate
X 31 998 633 E.CD icread Instruction Cache Read
A 31 15 70 B.in isel Integer Select
XL 19 150 369 B isync Instruction Synchronize
X 31 95 529 E.PD lbepx Load Byte by External Process ID Indexed
D 34 41 B lbz Load Byte and Zero
D 35 41 B lbzu Load Byte and Zero with Update
X 31 119 41 B lbzux Load Byte and Zero with Update Indexed
X 31 87 42 B lbzx Load Byte and Zero Indexed
DS 58 0 46 64 ld Load Doubleword
X 31 84 371 64 ldarx Load Doubleword And Reserve Indexed
X 31 29 530 E.PD ldepx Load Doubleword by External Process ID Indexed
DS 58 1 46 64 ldu Load Doubleword with Update
X 31 53 46 64 ldux Load Doubleword with Update Indexed
X 31 21 46 64 ldx Load Doubleword Indexed
D 50 113 FP lfd Load Floating-Point Double
X 31 607 537 E.PD lfdepx Load Floating-Point Double by External Process ID Indexed
D 51 113 FP lfdu Load Floating-Point Double with Update
X 31 631 113 FP lfdux Load Floating-Point Double with Update Indexed
X 31 599 113 FP lfdx Load Floating-Point Double Indexed
     Opcode   Mode
Form Pri Ext  Dep.¹ Priv¹ Page          Cat¹     Mnemonic        Instruction
D    48                   115           FP       lfs             Load Floating-Point Single
D    49                   115           FP       lfsu            Load Floating-Point Single with Update
X    31  567              115           FP       lfsux           Load Floating-Point Single with Update Indexed
X    31  535              115           FP       lfsx            Load Floating-Point Single Indexed
D    42                   43            B        lha             Load Halfword Algebraic
D    43                   43            B        lhau            Load Halfword Algebraic with Update
X    31  375              43            B        lhaux           Load Halfword Algebraic with Update Indexed
X    31  343              43            B        lhax            Load Halfword Algebraic Indexed
X    31  790              51            B        lhbrx           Load Halfword Byte-Reverse Indexed
X    31  287              529           E.PD     lhepx           Load Halfword by External Process ID Indexed
D    40                   42            B        lhz             Load Halfword and Zero
D    41                   42            B        lhzu            Load Halfword and Zero with Update
X    31  311              42            B        lhzux           Load Halfword and Zero with Update Indexed
X    31  279              42            B        lhzx            Load Halfword and Zero Indexed
D    46                   52            B        lmw             Load Multiple Word
DQ   56             P     410           LSQ      lq              Load Quadword
X    31  597              55            MA       lswi            Load String Word Immediate
X    31  533              55            MA       lswx            Load String Word Indexed
X    31  7                146           V        lvebx           Load Vector Element Byte Indexed
X    31  39               143           V        lvehx           Load Vector Element Halfword Indexed
X    31  295              539           E.PD     lvepx           Load Vector by External Process ID Indexed
X    31  263              539           E.PD     lvepxl          Load Vector by External Process ID Indexed LRU
X    31  71               143           V        lvewx           Load Vector Element Word Indexed
X    31  6                148           V        lvsl            Load Vector for Shift Left Indexed
X    31  38               148           V        lvsr            Load Vector for Shift Right Indexed
X    31  103              144           V        lvx             Load Vector Indexed
X    31  359              144           V        lvxl            Load Vector Indexed Last
DS   58  2                45            64       lwa             Load Word Algebraic
X    31  20               370           B        lwarx           Load Word And Reserve Indexed
X    31  373              45            64       lwaux           Load Word Algebraic with Update Indexed
X    31  341              45            64       lwax            Load Word Algebraic Indexed
X    31  534              51            B        lwbrx           Load Word Byte-Reverse Indexed
X    31  31               530           E.PD     lwepx           Load Word by External Process ID Indexed
D    32                   44            B        lwz             Load Word and Zero
D    33                   44            B        lwzu            Load Word and Zero with Update
X    31  55               44            B        lwzux           Load Word and Zero with Update Indexed
X    31  23               44            B        lwzx            Load Word and Zero Indexed
XO   4   172              289           LMA      macchw[o][.]    Multiply Accumulate Cross Halfword to Word Modulo Signed
XO   4   236              289           LMA      macchws[o][.]   Multiply Accumulate Cross Halfword to Word Saturate Signed
XO   4   204              290           LMA      macchwsu[o][.]  Multiply Accumulate Cross Halfword to Word Saturate Unsigned
XO   4   140              290           LMA      macchwu[o][.]   Multiply Accumulate Cross Halfword to Word Modulo Unsigned
XO   4   44               291           LMA      machhw[o][.]    Multiply Accumulate High Halfword to Word Modulo Signed
XO   4   108              291           LMA      machhws[o][.]   Multiply Accumulate High Halfword to Word Saturate Signed
XO   4   76               292           LMA      machhwsu[o][.]  Multiply Accumulate High Halfword to Word Saturate Unsigned
XO   4   12               292           LMA      machhwu[o][.]   Multiply Accumulate High Halfword to Word Modulo Unsigned
XO   4   428              293           LMA      maclhw[o][.]    Multiply Accumulate Low Halfword to Word Modulo Signed
XO   4   492              293           LMA      maclhws[o][.]   Multiply Accumulate Low Halfword to Word Saturate Signed
XO   4   460              294           LMA      maclhwsu[o][.]  Multiply Accumulate Low Halfword to Word Saturate Unsigned
XO   4   396              294           LMA      maclhwu[o][.]   Multiply Accumulate Low Halfword to Word Modulo Unsigned
X    31  854              374           E        mbar            Memory Barrier
XL   19  0                34            B        mcrf            Move Condition Register Field
X    63  64               131           FP       mcrfs           Move to Condition Register from FPSCR
X    31  512              91            B        mcrxr           Move to Condition Register from XER
X    31  275              91            E        mfapidi         Move From APID Indirect
XFX  31  19               89            B        mfcr            Move From Condition Register
XFX  31  323        S     527           E        mfdcr           Move From Device Control Register
X    31  291              91            E        mfdcrux         Move From Device Control Register User-mode Indexed
X    31  259        P     527           E        mfdcrx          Move From Device Control Register Indexed
X    63  583              131           FP[R]    mffs[.]         Move From FPSCR
X    31  83         P     417, 527      B        mfmsr           Move From Machine State Register
XFX  31  19               90            B.in     mfocrf          Move From One Condition Register Field
XFX  31  334              658           E.PM     mfpmr           Move From Performance Monitor Register
XFX  31  339        O     88, 378       B        mfspr           Move From Special Purpose Register
X    31  595  32    P     449           S        mfsr            Move From Segment Register
X    31  659  32    P     449           S        mfsrin          Move From Segment Register Indirect
XFX  31  371              378           S        mftb            Move From Time Base
VX   4   1540             199           V        mfvscr          Move From Vector Status and Control Register
X    31  238              623           E.PC     msgclr          Message Clear
X    31  206              623           E.PC     msgsnd          Message Send
XFX  31  144              89            B        mtcrf           Move To Condition Register Fields
XFX  31  451        P     526           E        mtdcr           Move To Device Control Register
X    31  419              91            E        mtdcrux         Move To Device Control Register User-mode Indexed
X    31  387        P     526           E        mtdcrx          Move To Device Control Register Indexed
X    63  70               132           FP[R]    mtfsb0[.]       Move To FPSCR Bit 0
X    63  38               132           FP[R]    mtfsb1[.]       Move To FPSCR Bit 1
XFL  63  711              131           FP[R]    mtfsf[.]        Move To FPSCR Fields
X    63  134              131           FP[R]    mtfsfi[.]       Move To FPSCR Field Immediate
X    31  146        P     527           E        mtmsr           Move To Machine State Register
X    31  146        P     415           S        mtmsr           Move To Machine State Register
X    31  178        P     416           S        mtmsrd          Move To Machine State Register Doubleword
XFX  31  144              90            B.in     mtocrf          Move To One Condition Register Field
XFX  31  462              658           E.PM     mtpmr           Move To Performance Monitor Register
XFX  31  467        O     87            B        mtspr           Move To Special Purpose Register
X    31  210  32    P     448           S        mtsr            Move To Segment Register
X    31  242  32    P     448           S        mtsrin          Move To Segment Register Indirect
VX   4   1604             199           V        mtvscr          Move To Vector Status and Control Register
X    4   168              294           LMA      mulchw[.]       Multiply Cross Halfword to Word Signed
X    4   136              294           LMA      mulchwu[.]      Multiply Cross Halfword to Word Unsigned
XO   31  73   SR          65            64       mulhd[.]        Multiply High Doubleword
XO   31  9    SR          65            64       mulhdu[.]       Multiply High Doubleword Unsigned
X    4   40               295           LMA      mulhhw[.]       Multiply High Halfword to Word Signed
X    4   8                295           LMA      mulhhwu[.]      Multiply High Halfword to Word Unsigned
XO   31  75   SR          63            B        mulhw[.]        Multiply High Word
XO   31  11   SR          63            B        mulhwu[.]       Multiply High Word Unsigned
XO   31  233  SR          65            64       mulld[o][.]     Multiply Low Doubleword
X    4   424              295           LMA      mullhw[.]       Multiply Low Halfword to Word Signed
X    4   392              295           LMA      mullhwu[.]      Multiply Low Halfword to Word Unsigned
D    7                    63            B        mulli           Multiply Low Immediate
XO   31  235  SR          63            B        mullw[o][.]     Multiply Low Word
X    31  476  SR          73            B        nand[.]         NAND
XO   31  104  SR          62            B        neg[o][.]       Negate
XO   4   174              296           LMA      nmacchw[o][.]   Negative Multiply Accumulate Cross Halfword to Word Modulo Signed
XO   4   238              296           LMA      nmacchws[o][.]  Negative Multiply Accumulate Cross Halfword to Word Saturate Signed
XO   4   46               297           LMA      nmachhw[o][.]   Negative Multiply Accumulate High Halfword to Word Modulo Signed
XO   4   110              297           LMA      nmachhws[o][.]  Negative Multiply Accumulate High Halfword to Word Saturate Signed
XO   4   430              298           LMA      nmaclhw[o][.]   Negative Multiply Accumulate Low Halfword to Word Modulo Signed
XO   4   494              298           LMA      nmaclhws[o][.]  Negative Multiply Accumulate Low Halfword to Word Saturate Signed
X    31  124  SR          74            B        nor[.]          NOR
X    31  444  SR          73            B        or[.]           OR
X    31  412  SR          74            B        orc[.]          OR with Complement
D    24                   71            B        ori             OR Immediate
D    25                   72            B        oris            OR Immediate Shifted
X    31  122              76            B.in     popcntb         Population Count Bytes
XL   19  51         P     516           E        rfci            Return From Critical Interrupt
X    19  39               516           E.ED     rfdi            Return From Debug Interrupt
XL   19  50         P     515           E        rfi             Return From Interrupt
XL   19  18         P     405           S        rfid            Return From Interrupt Doubleword
XL   19  38         P     516           E        rfmci           Return From Machine Check Interrupt
MDS  30  8    SR          81            64       rldcl[.]        Rotate Left Doubleword then Clear Left
MDS  30  9    SR          82            64       rldcr[.]        Rotate Left Doubleword then Clear Right
MD   30  2    SR          81            64       rldic[.]        Rotate Left Doubleword Immediate then Clear
MD   30  0    SR          79            64       rldicl[.]       Rotate Left Doubleword Immediate then Clear Left
MD   30  1    SR          80            64       rldicr[.]       Rotate Left Doubleword Immediate then Clear Right
MD   30  3    SR          82            64       rldimi[.]       Rotate Left Doubleword Immediate then Mask Insert
M    20       SR          79            B        rlwimi[.]       Rotate Left Word Immediate then Mask Insert
M    21       SR          77            B        rlwinm[.]       Rotate Left Word Immediate then AND with Mask
M    23       SR          78            B        rlwnm[.]        Rotate Left Word then AND with Mask
SC   17                   35, 404, 515  B        sc              System Call
X    31  498        P     444           S        slbia           SLB Invalidate All
X    31  434        P     443           S        slbie           SLB Invalidate Entry
X    31  915        P     446           S        slbmfee         SLB Move From Entry ESID
X    31  851        P     446           S        slbmfev         SLB Move From Entry VSID
X    31  402        P     445           S        slbmte          SLB Move To Entry
X    31  27   SR          85            64       sld[.]          Shift Left Doubleword
X    31  24   SR          83            B        slw[.]          Shift Left Word
X    31  794  SR          85            64       srad[.]         Shift Right Algebraic Doubleword
XS   31  413  SR          85            64       sradi[.]        Shift Right Algebraic Doubleword Immediate
X    31  792  SR          84            B        sraw[.]         Shift Right Algebraic Word
X    31  824  SR          84            B        srawi[.]        Shift Right Algebraic Word Immediate
X    31  539  SR          85            64       srd[.]          Shift Right Doubleword
X    31  536  SR          83            B        srw[.]          Shift Right Word
D    38                   47            B        stb             Store Byte
X    31  223              531           E.PD     stbepx          Store Byte by External Process ID Indexed
D    39                   47            B        stbu            Store Byte with Update
X    31  247              47            B        stbux           Store Byte with Update Indexed
X    31  215              47            B        stbx            Store Byte Indexed
DS   62  0                50            64       std             Store Doubleword
X    31  214              371           64       stdcx.          Store Doubleword Conditional Indexed
X    31  157              532           E.PD     stdepx          Store Doubleword by External Process ID Indexed
DS   62  1                50            64       stdu            Store Doubleword with Update
X    31  181              50            64       stdux           Store Doubleword with Update Indexed
X    31  149              50            64       stdx            Store Doubleword Indexed
D    54                   116           FP       stfd            Store Floating-Point Double
X    31  735              537           E.PD     stfdepx         Store Floating-Point Double by External Process ID Indexed
D    55                   116           FP       stfdu           Store Floating-Point Double with Update
X    31  759              116           FP       stfdux          Store Floating-Point Double with Update Indexed
X    31  727              116           FP       stfdx           Store Floating-Point Double Indexed
X    31  983              117           FP       stfiwx          Store Floating-Point as Integer Word Indexed
D    52                   115           FP       stfs            Store Floating-Point Single
D    53                   115           FP       stfsu           Store Floating-Point Single with Update
X    31  695              115           FP       stfsux          Store Floating-Point Single with Update Indexed
X    31  663              115           FP       stfsx           Store Floating-Point Single Indexed
D    44                   48            B        sth             Store Halfword
X    31  918              51            B        sthbrx          Store Halfword Byte-Reverse Indexed
X    31  415              531           E.PD     sthepx          Store Halfword by External Process ID Indexed
D    45                   48            B        sthu            Store Halfword with Update
X    31  439              48            B        sthux           Store Halfword with Update Indexed
X    31  407              48            B        sthx            Store Halfword Indexed
D    47                   53            B        stmw            Store Multiple Word
DS   62  2          P     410           LSQ      stq             Store Quadword
X    31  725              56            MA       stswi           Store String Word Immediate
X    31  661              56            MA       stswx           Store String Word Indexed
X    31  135              146           V        stvebx          Store Vector Element Byte Indexed
X    31  167              146           V        stvehx          Store Vector Element Halfword Indexed
X    31  807              540           E.PD     stvepx          Store Vector by External Process ID Indexed
X    31  775              540           E.PD     stvepxl         Store Vector by External Process ID Indexed LRU
X    31  199              147           V        stvewx          Store Vector Element Word Indexed
X    31  231              144           V        stvx            Store Vector Indexed
X    31  487              147           V        stvxl           Store Vector Indexed Last
D    36                   49            B        stw             Store Word
X    31  662              51            B        stwbrx          Store Word Byte-Reverse Indexed
X    31  150              370           B        stwcx.          Store Word Conditional Indexed
X    31  159              532           E.PD     stwepx          Store Word by External Process ID Indexed
D    37                   49            B        stwu            Store Word with Update
X    31  183              49            B        stwux           Store Word with Update Indexed
X    31  151              49            B        stwx            Store Word Indexed
XO   31  40   SR          59            B        subf[o][.]      Subtract From
XO   31  8    SR          60            B        subfc[o][.]     Subtract From Carrying
XO   31  136  SR          61            B        subfe[o][.]     Subtract From Extended
D    8        SR          60            B        subfic          Subtract From Immediate Carrying
XO   31  232  SR          61            B        subfme[o][.]    Subtract From Minus One Extended
XO   31  200  SR          62            B        subfze[o][.]    Subtract From Zero Extended
X    31  598              372           B        sync            Synchronize
X    31  68               70            64       td              Trap Doubleword
D    2                    70            64       tdi             Trap Doubleword Immediate
X    31  370        P     453           S        tlbia           TLB Invalidate All
X    31  306  64    H     450           S        tlbie           TLB Invalidate Entry
X    31  274  64    H     452           S        tlbiel          TLB Invalidate Entry Local
X    31  786        P     560, 649      E        tlbivax         TLB Invalidate Virtual Address Indexed
X    31  946        P     560, 650      E        tlbre           TLB Read Entry
X    31  914        P     561, 650      E        tlbsx           TLB Search Indexed
X    31  566        H     453, 561, 651 B        tlbsync         TLB Synchronize
X    31  978        P     562, 651      E        tlbwe           TLB Write Entry
X    31  4                69            B        tw              Trap Word
D    3                    69            B        twi             Trap Word Immediate
VX   4   384              160           V        vaddcuw         Vector Add and Write Carry-Out Unsigned Word
VX   4   10               189           V        vaddfp          Vector Add Single-Precision
VX   4   768              160           V        vaddsbs         Vector Add Signed Byte Saturate
VX   4   832              160           V        vaddshs         Vector Add Signed Halfword Saturate
VX   4   896              160           V        vaddsws         Vector Add Signed Word Saturate
VX   4   0                161           V        vaddubm         Vector Add Unsigned Byte Modulo
VX   4   512              162           V        vaddubs         Vector Add Unsigned Byte Saturate
VX   4   64               161           V        vadduhm         Vector Add Unsigned Halfword Modulo
VX   4   576              162           V        vadduhs         Vector Add Unsigned Halfword Saturate
VX   4   128              161           V        vadduwm         Vector Add Unsigned Word Modulo
VX   4   640              162           V        vadduws         Vector Add Unsigned Word Saturate
VX   4   1028             184           V        vand            Vector Logical AND
VX   4   1092             184           V        vandc           Vector Logical AND with Complement
VX   4   1282             175           V        vavgsb          Vector Average Signed Byte
VX   4   1346             175           V        vavgsh          Vector Average Signed Halfword
VX   4   1410             175           V        vavgsw          Vector Average Signed Word
VX   4   1026             176           V        vavgub          Vector Average Unsigned Byte
VX   4   1090             176           V        vavguh          Vector Average Unsigned Halfword
VX   4   1154             176           V        vavguw          Vector Average Unsigned Word
VX   4   842              193           V        vcfsx           Vector Convert From Signed Fixed-Point Word
VX   4   778              193           V        vcfux           Vector Convert From Unsigned Fixed-Point Word
VC   4   966              195           V        vcmpbfp[.]      Vector Compare Bounds Single-Precision
VC   4   198              195           V        vcmpeqfp[.]     Vector Compare Equal To Single-Precision
VC   4   6                181           V        vcmpequb[.]     Vector Compare Equal To Unsigned Byte
VC   4   70               181           V        vcmpequh[.]     Vector Compare Equal To Unsigned Halfword
VC   4   134              182           V        vcmpequw[.]     Vector Compare Equal To Unsigned Word
VC   4   454              196           V        vcmpgefp[.]     Vector Compare Greater Than or Equal To Single-Precision
VC   4   710              196           V        vcmpgtfp[.]     Vector Compare Greater Than Single-Precision
VC   4   774              182           V        vcmpgtsb[.]     Vector Compare Greater Than Signed Byte
VC   4   838              182           V        vcmpgtsh[.]     Vector Compare Greater Than Signed Halfword
VC   4   902              182           V        vcmpgtsw[.]     Vector Compare Greater Than Signed Word
VC   4   518              183           V        vcmpgtub[.]     Vector Compare Greater Than Unsigned Byte
VC   4   582              183           V        vcmpgtuh[.]     Vector Compare Greater Than Unsigned Halfword
VC   4   646              183           V        vcmpgtuw[.]     Vector Compare Greater Than Unsigned Word
VX   4   970              192           V        vctsxs          Vector Convert To Signed Fixed-Point Word Saturate
VX   4   906              192           V        vctuxs          Vector Convert To Unsigned Fixed-Point Word Saturate
VX   4   394              197           V        vexptefp        Vector 2 Raised to the Exponent Estimate Floating-Point
VX   4   458              197           V        vlogefp         Vector Log Base 2 Estimate Floating-Point
VA   4   46               190           V        vmaddfp         Vector Multiply-Add Single-Precision
VX   4   1034             191           V        vmaxfp          Vector Maximum Single-Precision
VX   4   258              177           V        vmaxsb          Vector Maximum Signed Byte
VX   4   322              177           V        vmaxsh          Vector Maximum Signed Halfword
VX   4   386              177           V        vmaxsw          Vector Maximum Signed Word
VX   4   2                178           V        vmaxub          Vector Maximum Unsigned Byte
VX   4   66               178           V        vmaxuh          Vector Maximum Unsigned Halfword
VX   4   130              178           V        vmaxuw          Vector Maximum Unsigned Word
VA   4   32               168           V        vmhaddshs       Vector Multiply-High-Add Signed Halfword Saturate
VA   4   33               168           V        vmhraddshs      Vector Multiply-High-Round-Add Signed Halfword Saturate
VX   4   1098             191           V        vminfp          Vector Minimum Single-Precision
VX   4   770              179           V        vminsb          Vector Minimum Signed Byte
VX   4   834              179           V        vminsh          Vector Minimum Signed Halfword
VX   4   898              179           V        vminsw          Vector Minimum Signed Word
VX   4   514              180           V        vminub          Vector Minimum Unsigned Byte
VX   4   578              180           V        vminuh          Vector Minimum Unsigned Halfword
VX   4   642              180           V        vminuw          Vector Minimum Unsigned Word
VA   4   34               169           V        vmladduhm       Vector Multiply-Low-Add Unsigned Halfword Modulo
VX   4   12               154           V        vmrghb          Vector Merge High Byte
VX   4   76               154           V        vmrghh          Vector Merge High Halfword
VX   4   140              154           V        vmrghw          Vector Merge High Word
VX   4   268              155           V        vmrglb          Vector Merge Low Byte
VX   4   332              155           V        vmrglh          Vector Merge Low Halfword
VX   4   396              155           V        vmrglw          Vector Merge Low Word
VA   4   37               170           V        vmsummbm        Vector Multiply-Sum Mixed Byte Modulo
VA   4   40               170           V        vmsumshm        Vector Multiply-Sum Signed Halfword Modulo
VA   4   41               171           V        vmsumshs        Vector Multiply-Sum Signed Halfword Saturate
VA   4   36               169           V        vmsumubm        Vector Multiply-Sum Unsigned Byte Modulo
VA   4   38               171           V        vmsumuhm        Vector Multiply-Sum Unsigned Halfword Modulo
VA   4   39               172           V        vmsumuhs        Vector Multiply-Sum Unsigned Halfword Saturate
VX   4   776              166           V        vmulesb         Vector Multiply Even Signed Byte
VX   4   840              166           V        vmulesh         Vector Multiply Even Signed Halfword
VX   4   520              166           V        vmuleub         Vector Multiply Even Unsigned Byte
VX   4   584              166           V        vmuleuh         Vector Multiply Even Unsigned Halfword
VX   4   264              167           V        vmulosb         Vector Multiply Odd Signed Byte
VX   4   328              167           V        vmulosh         Vector Multiply Odd Signed Halfword
VX   4   8                167           V        vmuloub         Vector Multiply Odd Unsigned Byte
VX   4   72               167           V        vmulouh         Vector Multiply Odd Unsigned Halfword
VA   4   47               190           V        vnmsubfp        Vector Negative Multiply-Subtract Single-Precision
VX   4   1284             184           V        vnor            Vector Logical NOR
VX   4   1156             184           V        vor             Vector Logical OR
VA   4   43               157           V        vperm           Vector Permute
VX   4   782              149           V        vpkpx           Vector Pack Pixel
VX   4   398              150           V        vpkshss         Vector Pack Signed Halfword Signed Saturate
VX   4   270              150           V        vpkshus         Vector Pack Signed Halfword Unsigned Saturate
VX   4   462              150           V        vpkswss         Vector Pack Signed Word Signed Saturate
VX   4   334              150           V        vpkswus         Vector Pack Signed Word Unsigned Saturate
VX   4   14               151           V        vpkuhum         Vector Pack Unsigned Halfword Unsigned Modulo
VX   4   142              151           V        vpkuhus         Vector Pack Unsigned Halfword Unsigned Saturate
VX   4   78               151           V        vpkuwum         Vector Pack Unsigned Word Unsigned Modulo
VX   4   206              151           V        vpkuwus         Vector Pack Unsigned Word Unsigned Saturate
VX   4   266              198           V        vrefp           Vector Reciprocal Estimate Single-Precision
VX   4   714              194           V        vrfim           Vector Round to Single-Precision Integer toward -Infinity
VX   4   522              194           V        vrfin           Vector Round to Single-Precision Integer Nearest
VX   4   650              194           V        vrfip           Vector Round to Single-Precision Integer toward +Infinity
VX   4   586              194           V        vrfiz           Vector Round to Single-Precision Integer toward Zero
VX   4   4                185           V        vrlb            Vector Rotate Left Byte
VX   4   68               185           V        vrlh            Vector Rotate Left Halfword
VX   4   132              185           V        vrlw            Vector Rotate Left Word
VX   4   330              198           V        vrsqrtefp       Vector Reciprocal Square Root Estimate Single-Precision
VA   4   42               157           V        vsel            Vector Select
VX   4   452              158           V        vsl             Vector Shift Left
VX   4   260              186           V        vslb            Vector Shift Left Byte
VA   4   44               158           V        vsldoi          Vector Shift Left Double by Octet Immediate
VX   4   324              186           V        vslh            Vector Shift Left Halfword
VX   4   1036             158           V        vslo            Vector Shift Left by Octet
VX   4   388              186           V        vslw            Vector Shift Left Word
VX   4   524              156           V        vspltb          Vector Splat Byte
VX   4   588              156           V        vsplth          Vector Splat Halfword
VX   4   780              156           V        vspltisb        Vector Splat Immediate Signed Byte
VX   4   844              156           V        vspltish        Vector Splat Immediate Signed Halfword
VX   4   908              156           V        vspltisw        Vector Splat Immediate Signed Word
VX   4   652              156           V        vspltw          Vector Splat Word
VX   4   708              159           V        vsr             Vector Shift Right
VX   4   772              188           V        vsrab           Vector Shift Right Algebraic Byte
VX   4   836              188           V        vsrah           Vector Shift Right Algebraic Halfword
VX   4   900              188           V        vsraw           Vector Shift Right Algebraic Word
VX   4   516              187           V        vsrb            Vector Shift Right Byte
VX   4   580              187           V        vsrh            Vector Shift Right Halfword
VX   4   1100             159           V        vsro            Vector Shift Right by Octet
VX   4   644              187           V        vsrw            Vector Shift Right Word
VX   4   1408             163           V        vsubcuw         Vector Subtract and Write Carry-Out Unsigned Word
VX   4   74               189           V        vsubfp          Vector Subtract Single-Precision
VX   4   1792             163           V        vsubsbs         Vector Subtract Signed Byte Saturate
VX   4   1856             163           V        vsubshs         Vector Subtract Signed Halfword Saturate
VX   4   1920             163           V        vsubsws         Vector Subtract Signed Word Saturate
VX   4   1024             164           V        vsububm         Vector Subtract Unsigned Byte Modulo
VX   4   1536             165           V        vsububs         Vector Subtract Unsigned Byte Saturate
VX   4   1088             164           V        vsubuhm         Vector Subtract Unsigned Halfword Modulo
VX   4   1600             164           V        vsubuhs         Vector Subtract Unsigned Halfword Saturate
VX   4   1152             164           V        vsubuwm         Vector Subtract Unsigned Word Modulo
VX   4   1664             165           V        vsubuws         Vector Subtract Unsigned Word Saturate
VX   4   1672             173           V        vsum2sws        Vector Sum across Half Signed Word Saturate
VX   4   1800             174           V        vsum4sbs        Vector Sum across Quarter Signed Byte Saturate
VX   4   1608             174           V        vsum4shs        Vector Sum across Quarter Signed Halfword Saturate
VX   4   1544             174           V        vsum4ubs        Vector Sum across Quarter Unsigned Byte Saturate
VX   4   1928             173           V        vsumsws         Vector Sum across Signed Word Saturate
VX   4   846              152           V        vupkhpx         Vector Unpack High Pixel
VX   4   526              152           V        vupkhsb         Vector Unpack High Signed Byte
VX   4   590              152           V        vupkhsh         Vector Unpack High Signed Halfword
VX   4   974              153           V        vupklpx         Vector Unpack Low Pixel
VX   4   654              153           V        vupklsb         Vector Unpack Low Signed Byte
VX   4   718              153           V        vupklsh         Vector Unpack Low Signed Halfword
VX   4   1220             184           V        vxor            Vector Logical XOR
X    31  62               375           WT       wait            Wait
X    31  131        S     528           E        wrtee           Write MSR External Enable
X    31  163        S     528           E        wrteei          Write MSR External Enable Immediate
X    31  316  SR          73            B        xor[.]          XOR
D    26                   72            B        xori            XOR Immediate
D    27                   72            B        xoris           XOR Immediate Shifted

¹ See the key to the mode dependency and privilege columns on page 839 and the key to the category column in Section 1.3.5 of Book I.

Mode Dependency and Privilege Abbreviations

Except as described below and in Section 1.10.3, "Effective Address Calculation", in Book I, all instructions are independent of whether the processor is in 32-bit or 64-bit mode.

Key to Mode Dependency Column

Mode Dep.  Description
CT         If the instruction tests the Count Register, it tests the low-order 32 bits in 32-bit mode and all 64 bits in 64-bit mode.
SR         The setting of status registers (such as XER and CR0) is mode-dependent.
32         The instruction can be executed only in 32-bit mode.
64         The instruction can be executed only in 64-bit mode.

Key to Privilege Column

Priv.      Description
P          Denotes a privileged instruction.
O          Denotes an instruction that is treated as privileged or nonprivileged (or hypervisor, for mtspr), depending on the SPR number.
H          Denotes an instruction that can be executed only in hypervisor state.

Index

A BB field 16 BC field 16 a bit 28 BD field 16 A-form 15 BD instruction field 666 AA field 16 BE address 20 See Machine State Register effective 23 BF field 16 effective address 419, 541 BF instruction field 666 real 420, 542 BFA field 16 address compare 420, 467, 473 BFA instruction field 666 address translation 435, 546 BH field 16 EA to VA 422 BI field 16 esid to vsid 422 block 342 overview 427 BO field 16, 28 PTE boundedly undefined 4 page table entry 431, 435 Branch Trace 473 Reference bit 435 Bridge 447 RPN Segment Registers 447 real page number 430 SR 447 VA to RA 430 brinc 208 VPN BT field 16 virtual page number 430 bytes 4 32-bit mode 422 address wrap 420, 542 addresses C accessed by processor 426 C 96 implicit accesses 426 CA 38 interrupt vectors 426 cache management instructions 358 with defined uses 426 cache model 343 addressing mode cache parameters 357 D-mode 669 Caching Inhibited 344 aliasing 347 Change bit 435 alignment CIA 7 effect on performance 355, 485, 605 consistency 347 Alignment interrupt 470, 505, 579 context assembler language definition 393, 509 extended mnemonics 317, 493, 635 synchronization 395, 511 mnemonics 317, 493, 635 Control Register 408 symbols 317, 493, 635 Count Register 412, 523, 676, 759 atomic operation 349 CR 26 atomicity 343 Critical Input interrupt 576 single-copy 343 Critical Save/Restore Register 1 565 Auxiliary Processor
Unavailable interrupt 581 CSRR1 565 CTR 27, 676 B CTRL See Control Register B-form 13 Current Instruction Address 404, 515 BA field 16 BA instruction field 665, 666 Index 841 Version 2.04 D ecowx instruction 381, 382, 467, 470, 471, 473, 487 EE D field 16 See Machine State Register D instruction field 666 effective address 23, 419, 427, 541 D-form 14 size 422 D-mode addressing mode 669 translation 427 DABR interrupt 485 eieio instruction 347, 374, 454 DABR(X) emulation assist 394, 510 See Data Breakpoint Register (Extension) Endianness 346 DAR EQ 26, 27 See Data Address Register ESR 567 data access 420, 542 evabs 208 Data Address Breakpoint Register (Extension) 400, evaddiw 208 412, 485, 490, 761 evaddsmiaaw 208 data address compare 467, 473 evaddssiaaw 209 Data Address Register 412, 460, 468, 469, 470, 474, evlwhex 217 475, 759 exception 564 data cache instructions 360 alignment exception 579 Data Exception Address Register 566 critical input exception 576 data exception address register 566 data storage exception 577 Data Segment interrupt 468, 475 external input exception 578 data storage 341 illegal instruction exception 580 Data Storage interrupt 467, 473, 577 instruction storage exception 578 Data Storage Interrupt Status Register 412, 460, 468, instruction TLB miss exception 583 470, 471, 474, 505, 759 machine check exception 576 Alignment interrupt 505 privileged instruction exception 580 Data TLB Error interrupt 583 program exception 580 dcba instruction 360, 554 system call exception 581 dcbf instruction 367 trap exception 580 dcbst instruction 351, 366, 467, 473 exception priorities 591 dcbt instruction 360, 533, 557 system call instruction 593 dcbtls 558 trap instructions 592 dcbtst instruction 365, 535, 557 Exception Syndrome Register 567 dcbz instruction 366, 442, 467, 470, 473, 505, 536, exception syndrome register 567 554 exception vector prefix register 566 DEAR 566 Exceptions 563 Debug Interrupt 584 exceptions DEC address compare 420, 467, 473 See 
Decrementer definition 393, 509 Decrementer 412, 482, 523, 599, 759 page fault 420, 434, 467, 473, 541 Decrementer Interrupt 582 protection 420, 541 Decrementer interrupt 415, 416, 472 segment fault 420 defined instructions 18 storage 420, 541 denormalization 100 execution synchronization 395, 511 denormalized number 98 extended mnemonics 383 double-precision 100 External Access Register 412, 467, 473, 487, 490, doublewords 4 523, 759 DQ-form 14 External Control 381 DR External Control instructions See Machine State Register eciwx 382 DS field 16 ecowx 382 DS-form 14 External Input interrupt 578 DSISR External interrupt 415, 416, 470 See Data Storage Interrupt Status Register F E FE 27, 96 E (Enable bit) 487 FEX 95 EA 23 FE0 eciwx instruction 381, 382, 467, 470, 471, 473, 487 See Machine State Register 842 Power ISATM Version 2.04 FE1 VXSOFT 96 See Machine State Register VXSQRT 96 FG 27, 96 VXVC 96 FI 96 VXZDZ 96 Fixed-Interval Timer interrupt 582 XE 97 Fixed-Point Exception Register 412, 523, 759 XX 95 FL 26, 96 ZE 97 FLM field 17 ZX 95 floating-point FR 96 denormalization 100 FRA field 17 double-precision 100 FRB field 17 exceptions 94, 102 FRC field 17 inexact 107 FRS field 17 invalid operation 104 FRT field 17 overflow 105 FU 27, 96 underflow 106 FX 95 zero divide 105 FXM field 17 execution models 107 FXM instruction field 666 normalization 100 number denormalized 98 G infinity 99 GPR 38 normalized 98 GT 26, 27 not a number 99 Guarded 345 zero 98 rounding 101 sign 99 H single-precision 100 Floating-Point Unavailable interrupt 472, 476, 581 halfwords 4 forward progress 351 hardware FP definition 394, 510 See Machine State Register hardware description language 7 FPCC 96 hashed page table 431 FPR 94 size 432 FPRF 96 HDEC FPSCR 95 See Hypervisor Decrementer C 96 HDICE FE 96 See Logical Partitioning Control Register FEX 95 hrfid instruction 401, 479 FG 96 HRMOR FI 96 See Hypervisor Real Mode Offset Register FL 96 HSPRGn FPCC 96 See software-use SPRs FPRF 96 HTAB FR 
96 See hashed page table FU 96 HTABORG 433 FX 95 HTABSIZE 433 NI 97 HV OE 97 See Machine State Register OX 95 hypervisor 397 RN 97 page table 431 UE 97 Hypervisor Decrementer 412, 483, 490, 759 UX 95 Hypervisor Decrementer interrupt 473 VE 97 Hypervisor Machine Status Save Restore Register VX 95 See HSRR0, HSRR1 VXCVI 97 Hypervisor Machine Status Save Restore Register VXIDI 96 0 460 VXIMZ 96 Hypervisor Real Mode Offset Register 39, 399, 408, VXISI 96 490 VXSNAN 96 Index 843 Version 2.04 I RS 18 RT 18 I-form 13 SH 18 icbi instruction 351, 359, 467, 473 SI 18 icbt instruction 359 SPR 17, 18 ILE SR 18 See Logical Partitioning Control Register TBR 18 illegal instructions 18 TH 18 implicit branch 420, 542 TO 18 imprecise interrupt 462, 571 U 18 in-order operations 420, 542 UI 18 inexact 107 XO 18 infinity 99 formats 13­?? instruction 467, 473 A-form 15 field B-form 13 BA 665, 666 D-form 14 BD 666 DQ-form 14 BF 666 DS-form 14 BFA 666 I-form 13 D 666 M-form 15 FXM 666 MD-form 15 L 666 MDS-form 15 LK 666 SC-form 14 Rc 666 VA-form 15 SH 666 VX-form 16 SI 666 X-form 14 UI 667 XFL-form 15 WS 667 XFX-form 15 fields 16­18 XL-form 15 AA 16 XO-form 15 BA 16 XS-form 15 BB 16 interrupt control 680 BC 16 mtmsr 527 BD 16 partially executed 588 BF 16 rfci 681 BFA 16 sc 680 BH 16 instruction cache instructions 359 BI 16 instruction fetch 420, 542 BO 16 effective address 420, 542 BT 16 implicit branch 420, 542 D 16 Instruction Fields 665 DS 16 instruction restart 356 FLM 17 Instruction Segment interrupt 469, 475 FRA 17 instruction storage 341 FRB 17 Instruction Storage interrupt 469, 578 FRC 17 Instruction TLB Error Interrupt 583 FRS 17 instruction-caused interrupt 462 FRT 17 Instructions FXM 17 brinc 208 L 17 dcbtls 558 LEV 17 evabs 208 LI 17 evaddiw 208 LK 17 evaddsmiaaw 208 MB 17 evaddssiaaw 209 ME 17 evlwhex 217 NB 17 instructions OE 17 classes 18 RA 17 dcba 360, 554 RB 17 dcbf 367 Rc 17 dcbst 351, 366, 467, 473 844 Power ISATM Version 2.04 dcbt 360, 533, 557 tlbsync 453, 454, 561 
dcbtst 365, 535, 557 wrtee 528 dcbz 366, 442, 470, 505, 536, 554 wrteei 528 defined 18 interrupt 564 forms 19 Alignment 470, 505 eciwx 381, 382, 467, 470, 471, 473, 487 alignment interrupt 579 ecowx 381, 382, 467, 470, 471, 473, 487 DABR 485 eieio 347, 374, 454 Data Segment 468, 475 hrfid 401, 479 Data Storage 467, 473 icbi 351, 359, 467, 473 data storage interrupt 577 icbt 359 Decrementer 415, 416, 472 illegal 18 definition 393, 510 invalid forms 19 External 415, 416, 470 isync 351, 369, 463 external input interrupt 578 ldarx 349, 371, 463, 467, 470, 471, 473 Floating-Point Unavailable 472, 476 lmw 470 Hypervisor Decrementer 473 lookaside buffer 442 imprecise 462, 571 lq 410, 470 instruction lwa 471 partially executed 588 lwarx 349, 370, 463, 467, 470, 471, 473, 505 Instruction Segment 469, 475 lwaux 471 Instruction Storage 469, 578 lwsync 372 instruction storage interrupt 578 lwz 505 instruction TLB miss interrupt 583 mbar 374 instruction-caused 462 mfmsr 401, 417, 527 Machine Check 467 mfspr 414, 526 machine check interrupt 576 mfsr 449 masking 589 mfsrin 449 guidelines for system software 591 mftb 378 new MSR 466 mtmsr 401, 415, 479 ordering 589, 591 mtmsrd 401, 416, 479 guidelines for system software 591 address wrap 420, 542 overview 459 mtspr 413, 524 Performance Monitor 476 mtsr 448 precise 462, 571 mtsrin 448 priorities 479 optional processing 463 See optional instructions Program 471 preferred forms 19 program interrupt 580 ptesync 372, 395, 454 illegal instruction exception 580 reserved 19 privileged instruction exception 580 rfci 516 trap exception 580 rfid 351, 401, 405, 465, 479 recoverable 465 rfmci 517 synchronization 462 sc 404, 473, 515 System Call 473 slbia 444 system call interrupt 581 slbie 443 System Reset 466 slbmfee 446 system-caused 462 slbmfev 446 Trace 473 slbmte 445 type stdcx. 
349, 371, 463, 467, 470, 471, 473 Alignment 579 stmw 470 Auxiliary Processor Unavailable 581 storage control 357, 442, 554 Critical Input 576 stq 410, 470 Data Storage 577 stw 505 Data TLB Error 583 stwcx. 349, 370, 463, 467, 470, 471, 473 Debug 584 stwx 505 Decrementer 582 sync 351, 372, 395, 435, 463 External Input 578 tlbia 434, 453 Fixed-Interval Timer 582 tlbie 434, 450, 453, 455, 561 Floating-Point Unavailable 581 tlbiel 452 Instruction TLB Error 583 Index 845 Version 2.04 Machine Check 576 Logical Partitioning Control Register 397, 412, 443, Program interrupt 580 490, 760 System Call 581 HDICE Hypervisor Decrementer Interrupt Condition- Watchdog Timer 582 ally Enable 399, 400, 415, 416, 473, 491 vector 463, 466 ILE Interrupt Little-Endian 398, 466 interrupt and exception handling registers LPES Logical Partitioning Environment DEAR 566 Selector 398, 400, 404, 423, 424, 437, 439, 466, ESR 567 492 ivpr 566 RMI Real Mode Caching Inhibited Bit 398, 400, interrupt classes 424, 492 asynchronous 570 RMLS Real Mode Offset Selector 398, 492 critical,non-critical 571 VC 492 machine check 571 VRMASD 492 synchronous 570 lookaside buffer 442 interrupt control instructions 680 LPAR (see Logical Partitioning) 397 mtmsr 527 LPCR rfci 681 See Logical Partitioning Control Register sc 680 LPES interrupt processing 572 See Logical Partitioning Control Register interrupt vector 572 LPIDR interrupt vector 572 See Logical Partition Identification Register Interrupt Vector Offset Register 36 524, 760 lq instruction 410, 470 Interrupt Vector Offset Register 37 524, 760 LR 27, 676 Interrupt Vector Offset Registers 568 LT 26 Interrupt Vector Prefix Register 566 lwa instruction 471 Interrupts 563 lwarx instruction 349, 370, 463, 467, 470, 471, 473, invalid instruction forms 19 505 invalid operation 104 lwaux instruction 471 IR lwsync instruction 372 See Machine State Register lwz instruction 505 isync instruction 351, 369, 463 IVORs 568 IVPR 566 M ivpr 566 M-form 15 Machine 513 K 
Machine Check 571 Machine Check interrupt 467, 576 K bits 437 Machine State Register 401, 404, 415, 416, 417, 463, 465, 466, 513, 527 key, storage 437 BE Branch Trace Enable 402 DR Data Relocate 402 L EE External Interrupt Enable 401, 415, 416 dcbf 467, 473 FE0 FP Exception Mode 402 instructions FE1 FP Exception Mode 402 dcbf 467, 473 FP FP Available 402 L field 17 HV Hypervisor State 401 L instruction field 666 IR Instruction Relocate 402 language used for instruction operation description 7 LE Little-Endian Mode 402 ldarx instruction 349, 371, 463, 467, 470, 471, 473 ME Machine Check Enable 402 LE PMM Performance Monitor Mark 402, 496 See Machine State Register PR Problem State 401 LEV field 17 RI Recoverable Interrupt 402, 415, 416 LI field 17 SE Single-Step Trace Enable 402 Link Register 412, 523, 676, 759 SF Sixty Four Bit mode 401, 420, 542 LK field 17 VEC Vector Available 401 LK instruction field 666 Machine Status Save Restore Register lmw instruction 470 See SRR0, SRR1 Logical Partition Identification Register 399 Machine Status Save Restore Register 0 459, 463, 465 Logical Partitioning 397 Machine Status Save Restore Register 1 463, 465, 472 P main storage 341 MB field 17 page 342 mbar instruction 374 size 422 MD-form 15 page fault 420, 434, 467, 473, 541 MDS-form 15 page table ME See also hashed page table See Machine State Register search 433 ME field 17 update 454 memory barrier 347 page table entry 431, 435 Memory Coherence Required 345 Change bit 435 mfmsr instruction 401, 417, 527 PP bits 437 mfspr instruction 414, 526 Reference bit 435 mfsr instruction 449 update 454, 455 mfsrin instruction 449 partially executed instructions 588 mftb instruction 378 partition 397 Mnemonics 664 Performance Monitor interrupt 476 mnemonics performed 342 extended 317, 493, 635 PID 543 mode change 420, 542 PMM move to machine state register 527 See Machine State Register MSR PP bits 437 See Machine State Register PR mtmsr 527 See Machine
State Register mtmsr instruction 401, 415, 479 precise interrupt 462, 571 mtmsrd instruction 401, 416, 479 preferred instruction forms 19 mtspr instruction 413, 524 priority of interrupts 479 mtsr instruction 448 Process ID Register 543 mtsrin instruction 448 Processor Utilization of Resources Register 400, 412, 483, 759 Processor Version Register 407, 519 N Program interrupt 471, 580 NB field 17 program order 341 Next Instruction Address 404, 405, 515, 516, 517 Program Priority Register 39, 408, 412, 760 NI 97 protection boundary 437, 470 NIA 7 protection domain 437 no-op 71 PTE 433 normalization 100 See also page table entry normalized number 98 PTEG 433 not a number 99 ptesync instruction 372, 395, 454 PURR See Processor Utilization of Resources Register O PVR See Processor Version Register OE 97 OE field 17 opcode 0 505 Q optional instructions 442 slbia 444 quadwords 4 slbie 443 tlbia 453 R tlbie 450 tlbiel 452 RA field 17 tlbsync 453 RB field 17 out-of-order operations 420, 542 RC bits 435 OV 38 Rc field 17 overflow 105 Rc instruction field 666 OX 95 real address 427 Real Mode Offset Register 399, 490 real page definition 393, 509 IVOR37 real page number 431 Interrupt Vector Offset Register 37 524, 760 recoverable interrupt 465 Link Register 27 reference and change recording 435 LPCR Reference bit 435 Logical Partitioning Control Register 397, 412, 443, 490, 760 register CSRR1 565 LPIDR CTR 676 Logical Partition Identification Register 399 DEAR 566 LR ESR 567 Link Register 412, 523, 759 IVORs 568 MSR IVPR 566 Machine State Register 401, 404, 415, 416, 417, 463, 465, 466, 513, 527 ivpr 566 LR 676 PPR PID 543 Program Priority Register 39, 408, 412, 760 SRR0 564 PURR SRR1 564 Processor Utilization of Resources Register 400, 412, 483, 759 register transfer level language 7 Registers PVR implementation-specific Processor Version Register 407, 519 MMCR1 656 RMOR supervisor-level Real Mode Offset Register 399, 490 MMCR1 656 SDR1 registers
Storage Description Register 1 412, 433, 759 Condition Register 26 Storage Description Register 1 490 Count Register 27 SPRGn CTR software-use SPRs 412, 523, 759 Count Register 412, 523, 759 SPRs CTRL Special Purpose Registers 412 Control Register 408 SRR0 DABR(X) Machine Status Save Restore Register 0 459, 463, 465 Data Address Breakpoint Register (Extension) 400, 412, 485, 490, 761 SRR1 DAR Machine Status Save Restore Register 1 463, 465, 472 Data Address Register 412, 460, 468, 469, 470, 474, 475, 759 TB DEC Time Base 481, 597 Decrementer 412, 482, 523, 599, 759 TBL DSISR Time Base Lower 412, 481, 523, 597, 759 Data Storage Interrupt Status Register 412, 460, 468, 470, 471, 474, 505, 759 TBU Time Base Upper 412, 481, 523, 597, 759 EAR Time Base 377 External Access Register 412, 467, 473, 487, 490, 523, 759 XER Fixed-Point Exception Register 402, 412, 475, 523, 759 Fixed-Point Exception Register 38 Floating-Point Registers 94 relocation Floating-Point Status and Control Register 95 data 420, 542 General Purpose Registers 38 reserved field 5, 394 HDEC reserved instructions 19 Hypervisor Decrementer 412, 483, 490, 759 return from critical interrupt 681 HRMOR rfci 681 Hypervisor Real Mode Offset Register 39, 399, 408, 490 rfci instruction 516 rfid instruction 351, 401, 405, 465, 479 HSPRGn rfmci instruction 517 software-use SPRs 409 RI HSRR0 See Machine State Register Hypervisor Machine Status Save Restore Register 0 460 RID (Resource ID) 487 RMI IVOR36 See Logical Partitioning Control Register Interrupt Vector Offset Register 36 524, 760 RMLS See Logical Partitioning Control Register access order 347 RMOR accessed by processor 426 See Real Mode Offset Register atomic operation 349 RN 97 attributes rounding 101 Endianness 346 RS field 18 implicit accesses 426 RT field 18 instruction restart 356 RTL 7 interrupt vectors 426 N 433 No-execute 433 S order 347 Save/Restore Register 0 564 ordering 347, 372, 374 Save/Restore Register 1 564
protection sc 680 translation disabled 439 sc instruction 404, 473, 515 reservation 349 SC-form 14 shared 347 SDR1 with defined uses 426 See Storage Description Register 1 storage access 341 SE definitions See Machine State Register program order 341 segment floating-point 111 size 422 storage access ordering 385 type 422 storage address 20 Segment Lookaside Buffer storage control See SLB instructions 442, 554 Segment Registers 447 storage control attributes 344 Segment Table storage control instructions 357 bridge 447 Storage Description Register 1 412, 433, 490, 759 sequential execution model 25 storage key 437 definition 393, 510 storage location 341 SF storage operations See Machine State Register in-order 420, 542 SH field 18 out-of-order 420, 542 SH instruction field 666 speculative 420, 542 SI field 18 storage protection 437 SI instruction field 666 string instruction 550 sign 99 TLB management 550 single-copy atomicity 343 stq instruction 410, 470 single-precision 100 string instruction 550 Single-Step Trace 473 stw instruction 505 SLB 427, 442 stwcx. instruction 349, 370, 463, 467, 470, 471, 473 entry 428 stwx instruction 505 slbia instruction 444 symbols 317, 493, 635 slbie instruction 443 sync instruction 351, 372, 395, 435, 463 slbmfee instruction 446 synchronization 395, 454, 511 slbmfev instruction 446 context 395, 511 slbmte instruction 445 execution 395, 511 SO 26, 27, 38 interrupts 462 software-use SPRs 412, 523, 759 Synchronize 347 Special Purpose Registers 412 Synchronous 570 speculative operations 420, 542 system call 680 split field notation 13 system call instruction 593 SPR field 17, 18 System Call interrupt 473, 581 SR 447 System Reset interrupt 466 SR field 18 system-caused interrupt 462 SRR0 564 SRR1 564 T stdcx. 
instruction 349, 371, 463, 467, 470, 471, 473 stmw instruction 470 t bit 28 storage table update 454 TB 377 words 4 TBL 377 Write Through Required 344 TBR field 18 wrtee instruction 528 TH field 18 wrteei instruction 528 Time Base 377, 481, 597 WS instruction field 667 Time Base Lower 412, 481, 523, 597, 759 Time Base Upper 412, 481, 523, 597, 759 TLB 434, 442, 543 X TLB management 550 X-form 14 tlbia instruction 434, 453 XE 97 tlbie instruction 434, 450, 453, 455, 561 XER 38, 402, 475 tlbiel instruction 452 XFL-form 15 tlbsync instruction 453, 454, 561 XFX-form 15 TO field 18 XL-form 15 Trace interrupt 473 XO field 18 Translation Lookaside Buffer 543 XO-form 15 translation lookaside buffer 434 XS-form 15 trap instructions 592 XX 95 trap interrupt definition 393, 510 Z U z bit 28 ZE 97 U field 18 zero 98 UE 97 zero divide 105 UI field 18 ZX 95 UI instruction field 667 UMMCR1 (user monitor mode control register 1) 656 undefined 7 Numerics boundedly 4 underflow 106 32-bit mode 422 UX 95 V VA-form 15 VE 97 VEC See Machine State Register virtual address 427, 430 generation 427 size 422 virtual page number 431 virtual storage 342 VX 95 VX-form 16 VXCVI 97 VXIDI 96 VXIMZ 96 VXISI 96 VXSNAN 96 VXSOFT 96 VXSQRT 96 VXVC 96 VXZDZ 96 W Watchdog Timer interrupt 582