## Permedia4® Programmer's Guide - Volume II

## **DRAFT ONLY**

## PROPRIETARY AND CONFIDENTIAL INFORMATION

# 3Dlabs®

**Issue 4**

#### **Proprietary Notice**

The material in this document is the intellectual property of **3D***labs*. It is provided solely for information. You may not reproduce this document in whole or in part by any means. While every care has been taken in the preparation of this document, **3D***labs* accepts no liability for any consequences of its use. Our products are under continual improvement and we reserve the right to change their specification without notice. **3D***labs* may not produce printed versions of each issue of this document. The latest version will be available from the **3D***labs* web site.

- **3D***labs* products and technology are protected by a number of worldwide patents. Unlicensed use of any information contained herein may infringe one or more of these patents and may violate the appropriate patent laws and conventions.
- **3D***labs* is the worldwide trading name of **3D***labs* Inc. Ltd.
- **3D***labs*, Permedia4 and Permedia are registered trademarks of **3D***labs* Inc. Ltd. Microsoft, Windows and Direct3D are either registered trademarks or trademarks of Microsoft Corp. in the United States and/or other countries. OpenGL is a registered trademark of Silicon Graphics, Inc. All other trademarks are acknowledged and recognized.

© Copyright 3Dlabs Inc. Ltd. 1999. All rights reserved worldwide.

Email: info@3dlabs.com

Web: http://www.3dlabs.com

**3D**labs Ltd., Meadlake Place, Thorpe Lea Road, Egham, Surrey, TW20 8HE, United Kingdom. Tel: +44 (0) 1784 470555 Fax: +44 (0) 1784 470699

**3D**labs K.K, Shiroyama JT Mori Bldg 16F, 40301 Toranomon, Minato-ku, Tokyo, 105, Japan. Tel: +81-3-5403-4653 Fax: +91-3-5403-4646

**3D**labs Inc.
480 Potrero Avenue, Sunnyvale, CA 94086, United States. Tel: (408) 530-4700 Fax: (408) 530-4701

### **Change History**

| Document | Issue | Date | Change |
|----------|-------|-----------------|--------|
| 160.3.1 | 1 | 1 June 99 | First DRAFT Issue. |
| 160.3.1 | 2 | 6 August 99 | Virtual units, editorial changes |
| 160.3.1 | 3 | 15 September 99 | Structural changes to balance content with volume I, new data and corrections in most chapters. |
| 160.3.1 | 4 | 18 June 2001 | V1 OGL extensions to texture comp.; fixed Initialization example values for stencil position and width, GID; removed index entries and traces of FBReadMode, deleted Windowbase references, corrected GID test control (no longer in Window register), corrected Stencil source data field example values. |

**Contents**

- Proprietary Notice
- Change History
- Contents
- 1 GRAPHICS PROGRAMMING
  - 1.1 The Graphics HyperPipeline
    - 1.1.1 Router
    - 1.1.2 Initialization
    - 1.1.3 Dominant and Subordinate Sides of a Triangle
    - 1.1.4 Register Set Up for Depth Testing
    - 1.1.5 Subpixel Correction
  - 1.2 Pipeline Overviews
    - 1.2.1 A day in the life of a 3D triangle
    - 1.2.2 A day in the life of a 2D primitive
- 2 RASTERIZER AND 2D SETUP
  - 2.1.1 Antialiasing
  - 2.1.3 Stippling during Rasterizing
  - 2.1.4 Points
  - 2.1.7 Span Operations
  - Pixel Sizes
  - Rasterizer Unit Registers
- 3 SCISSOR, STIPPLE AND COLOR DDA UNITS
  - 3.1 Scissor Unit
    - 3.1.1 User Scissor Test
    - 3.1.2 Screen Scissor Tests
    - 3.1.3 Scissor Registers
    - 3.1.4 Span Operations and the Scissor Unit
    - 3.1.5 Scissor Example
  - 3.2 Stipple Unit
    - 3.2.1 Area Stippling
    - 3.2.2 Line Stippling
    - 3.2.3 Span Operations and Stippling
    - 3.2.4 Registers
    - 3.2.5 Examples
    - 3.2.6 Line Stipple Example
    - 3.2.7 Area Stipple Pattern Example
  - 3.3 Color DDA Unit
    - 3.3.1 RGBA and Color-Index(CI) Modes
    - 3.3.2 Gouraud Shading
    - 3.3.3 Flat Shading Example
    - 3.3.4 Gouraud Shaded Trapezoid Example
    - 3.3.5 Gouraud Shaded Line Example
- 4 LOCALBUFFER READ/WRITE
  - 4.1.1 Mode Registers
  - 4.2 Window register
  - 4.3 Pixel Ownership (GID) Test Unit
    - 4.3.1 Pixel Ownership Test
  - 4.4 Stencil Test
    - 4.4.1 Registers
    - 4.4.2 Stencil Example
  - 4.5 Depth Test
    - 4.5.1 Registers
    - 4.5.2 Depth Example
- 5 TEXTURE MAPPING
  - 5.1.1 Compatibility with Earlier Chipsets
  - 5.2 Texture Co-ordinate Generation
    - 5.2.1 Calculate texture coordinates
    - 5.2.2 Level of Detail calculation
    - 5.2.3 Texture Read
    - 5.2.4 Filter Modes
    - 5.2.5 Texel Formatting
    - 5.2.6 Lookup Table (LUT)
    - 5.2.7 Texture Filtering and Alpha Mapping
    - 5.2.8 Texture Color Compositing
    - 5.2.9 Implementation
- 6 FOG, ANTIALIAS AND ALPHA TEST
  - 6.1 Fog Unit
    - 6.1.1 Fog Index Calculation
    - 6.1.2 Fog Table
    - 6.1.3 Fog Application
    - 6.1.4 FogMode register
    - 6.1.5 Fog Example
  - 6.2 Antialiasing
    - 6.2.1 Antialias Application
    - 6.2.2 Polygon Antialiasing
    - 6.2.3 Registers
    - 6.2.4 Antialias Example
  - 6.3 Alpha Test Unit
    - 6.3.1 Alpha Test
    - 6.3.2 Registers
    - 6.3.3 Alpha Test Example
- 7 FRAMEBUFFER READ/WRITE
  - 7.1.1 Standard Framebuffer Read Operation
  - 7.1.2 Framebuffer Read Span Operations
  - 7.1.3 Merge-copy Span Operations
- 8 ALPHA BLENDING
  - 8.1 Introduction
    - 8.1.1 Alpha Blend Functions
    - 8.1.2 Alpha Blend Registers
  - 8.2 Source Blending Functions
    - 8.2.1 OpenGL Alpha Blending
  - 8.3 Destination Blending Functions
    - 8.3.1 OpenGL Destination Blending
    - 8.3.2 QuickDraw 3D Alpha Blending
    - 8.3.3 Image Formatting
    - 8.3.4 Registers
    - 8.3.5 Chroma Testing
    - 8.3.6 Alpha Blend Example
- 9 COLOR FORMAT AND LOGICAL OPS
  - 9.1 Color and Alpha Formats
    - 9.1.1 Color Dithering
    - 9.1.2 Registers
    - 9.1.3 Dither Example
    - 9.1.4 3:3:2 Color Format Example
    - 9.1.5 8:8:8:8 Color Format Example
    - 9.1.6 Color Index Format Example
  - 9.2 Logical Op Unit
    - 9.2.1 High Speed Flat Shaded Rendering
    - 9.2.2 Logical Operations
    - 9.2.3 Registers
    - 9.2.4 XOR Example
    - 9.2.5 Logical Op and Software Writemask Example
- 10 FRAMEBUFFER WRITEMASKS
  - 10.1.1 Software Writemasks
  - 10.1.2 Hardware Writemasks
  - 10.1.3 Registers
  - 10.1.4 Software Writemask Example
  - 10.1.5 Hardware Writemask Example
- 11 HOST OUT
  - 11.1 Filtering
    - 11.1.1 Filter Mode Example
    - 11.1.2 Statistic Operations
    - 11.1.3 Synchronization
    - 11.1.4 Registers
    - 11.1.5 Picking Example
    - 11.1.6 Sync Interrupt Example
- 12 INITIALIZATION
  - 12.1 Initializing Permedia4
    - 12.1.1 Reset and initialisation
  - 12.2 System Initialization
    - 12.2.1 PCI bus
    - 12.2.2 Memory Configuration
    - 12.2.3 Internal Video Timing Registers
    - 12.2.4 Framebuffer Depth
    - 12.2.5 Screen Width
    - 12.2.6 Screen Clipping Region
    - 12.2.7 Localbuffer and Framebuffer Configuration
    - 12.2.8 Host Out Unit
    - 12.2.9 Disabling Specialized Modes
  - 12.3 Window Initialization
    - 12.3.1 Color Format
    - 12.3.2 Setting the Window Address and Origin
    - 12.3.3 Writemasks
    - 12.3.4 Enabling Writing
  - 12.4 Application Initialization
- 13 PERFORMANCE TIPS
  - 13.1 Block Writes
  - 13.2 Fast double buffering in a window
  - 13.3 Disable FB Reads per pixel if not required
  - 13.4 Improving PCI bus bandwidth for Programmed I/O and DMA
  - 13.5 PCI burst transfers under Programmed I/O
  - 13.6 Using PCI Disconnect Under Programmed I/O
  - 13.7 Using Bus Mastership (DMA)
  - 13.8 Disabling units not in use
  - 13.9 Clearing the localbuffer & framebuffer
  - 13.10 Use of the Framebuffer (or Localbuffer) Bypass
  - 13.11 Loading Registers in Unit Order
  - 13.12 Avoiding Unnecessary Register Updates
  - 13.13 Hardware and Software Context Dumps
  - 13.14 Use the Memory Scratchpad Registers
  - 13.15 Miscellaneous Tips
- 14 APPENDICES
  - 14.1 Pseudocode Definitions
  - 14.2 Interpolation Calculation
    - 14.2.1 Color Gradient Interpolation
    - 14.2.2 Register Set Up for Color Interpolation
    - 14.2.3 Calculating Depth Gradient Values
  - 14.3 Accurate Rendering
  - 14.4 Glossary
- 15 INDEXES
  - 15.1 Volume I Index
  - 15.2 Volume II Index

### **1 Graphics Programming**

Permedia4 provides a rich variety of operations for 2D and 3D graphics supported by its pipeline architecture. In this chapter, §1.1 shows the basic units in the HyperPipeline. The following chapters describe a typical rendering process for two typical basic graphic primitives (the Gouraud shaded triangle and a 2D rectangle) and each of the units in detail.

#### 1.1 The Graphics HyperPipeline

This section describes each of the units in the graphics HyperPipeline. Figure 1-1 shows a schematic of the pipeline. In this diagram, the localbuffer contains the pixel ownership values (known as Graphic IDs), the Depth (Z) and Stencil buffer. The framebuffer contains the Red, Green, Blue and Alpha bitplanes.

The units in the HyperPipeline are:

- Rasterizer scan converts the given primitive into a series of fragments for processing by the rest of the pipeline.
- Scissor Test clips out fragments that lie outside the bounds of a user defined scissor rectangle and also performs screen clipping to stop illegal accesses outside the screen memory.
- Stipple Test masks out certain fragments according to a specified pattern. Line and area stipples are available.
- GID (Pixel Ownership) is concerned with ensuring that the location in the framebuffer for the current fragment is owned by the current visual. Comparison occurs between the given fragment and the Graphic ID value in the localbuffer, at the corresponding location, to determine whether the fragment should be discarded.
- Stencil Test conditionally discards a fragment based on the outcome of a test between the given fragment and the value in the stencil buffer at the corresponding location.
The stencil buffer is updated dependent on the result of the stencil test and the depth test.
- Depth Test conditionally discards a fragment based on the outcome of a test between the depth value for the given fragment and the value in the depth buffer at the corresponding location. The result of the depth test can be used to control the updating of the stencil buffer.
- Color DDA is responsible for generating the color information (RGBA or Color Index(CI)) associated with a fragment.
- Texture is concerned with mapping a portion of a specified image (texture) onto a fragment. The process involves interpolating to determine the texel coordinates including perspective division, reading the texels, filtering to calculate the texture color, and application which applies the texture color to the fragment color.
- Fog blends a fog color with a fragment's color according to a given fog factor. Fogging is used for depth cueing images and to simulate atmospheric fogging.
- Antialias Application combines the incoming fragment's alpha value with its coverage value when antialiasing is enabled.
- Alpha Test conditionally discards a fragment based on the outcome of a comparison between the fragment's alpha value and a reference alpha value.
- Alpha Blending combines the incoming fragment's color with the color in the framebuffer at the corresponding location.
- Color Formatting converts the fragment's color into the format in which the color information is stored in the framebuffer. This may optionally involve dithering.
- Logical Op/Framebuffer Mask performs Logical Operations between the fragment and destination, and optionally applies a writemask.
- Host Out optionally gathers statistics for picking and extent checking, and returns data to the host for image uploads.

Figure 1-1 HyperPipeline

The HyperPipeline structure of Permedia4 is very efficient at processing fragments.
For example, texture mapping calculations are not performed on fragments that are clipped off by scissor testing. This approach saves substantial computational effort. To obtain the best results when programming for pipelined processing, however, you need to be aware of what all the pipeline stages are doing at any time. For example, many operations require a read and/or a write to the localbuffer and framebuffer. Because these are at different points in the pipeline, the programmer must explicitly enable data reads from and writes to the framebuffer; simply setting a logical operation to XOR and enabling logical operations will not have the desired effect.

#### 1.1.1 Router

As discussed in Volume I, the register address space can be seen conceptually as either a message passing system or a flat address map. This allows some significant adaptive performance enhancements.

One important performance feature of the pipeline is the Router. This is essentially a switch which allows the order of some of the units to be swapped, by setting or clearing the *Sequence* bit of the **RouterMode** register.

Textured primitives are typically more processor-intensive than non-textured primitives. When the *Sequence* bit is set, fragments are tested against the GID (Pixel Ownership), Stencil and Depth(Z) before the texture value is calculated. If the fragment fails any of these tests nothing is drawn, so texture value calculations can be skipped, leading to higher performance.

OpenGL defines the order of operations on a fragment as texture, alpha test, stencil then depth(Z), which is the sequence used when the *Sequence* bit in the **RouterMode** register is cleared. However, if the alpha test is disabled (or cannot reject fragments) then OpenGL compatible semantics are maintained even if the operation order is changed to the more efficient stencil, depth(Z), texture, and alpha test.
The order can be dynamically reconfigured at any time, without any need to synchronize, simply by writing to the *Sequence* bit.

#### 1.1.2 Initialization

Permedia4 requires many of its registers to be initialized in a particular way, regardless of what is to be drawn; for instance, the screen size and appropriate clipping must be set up. Normally this only needs to be done once, and for clarity this example assumes that all initialization has already been done. More details may be found later in this volume. Other states (e.g. enabling Gouraud shading and depth buffering) change occasionally, though rarely on a per-primitive basis.

#### 1.1.3 Dominant and Subordinate Sides of a Triangle

The dominant side of a triangle is that with the greatest range of Y values. The choice of dominant side is optional when the triangle is either flat bottomed or flat topped. Permedia4 always draws triangles from the dominant edge towards the subordinate edges. This simplifies the calculation of set up parameters as will be seen below.

Figure 1-2 Dominant and Subordinate Sides of a Triangle

#### 1.1.4 Register Set Up for Depth Testing

Internally Permedia4 uses fixed point arithmetic. The formats for each register are described in the *Reference Guide*. Each depth value must be converted into a 2's complement 16.32 bit fixed point number and then loaded into the appropriate pair of 32 bit registers (**DZdxL**, **DZdxU**, **DZdyDomL**, **DZdyDomU**). The 'Upper' or 'U' registers store the integer portion, whilst the 'Lower' or 'L' registers store the 16 bit LSB, left justified and zero filled.
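
As a minimal sketch of the U/L split described above, the following C helper converts a depth value or gradient into two 32-bit register words. It assumes 16 fractional bits, with the integer portion in the upper word and the fraction left justified in the lower word. The names `DepthWords` and `depth_to_words` are hypothetical, and the exact field widths should be checked against the *Reference Guide*.

```c
#include <stdint.h>

/* Hypothetical container for the two 32-bit words of one depth register pair. */
typedef struct {
    uint32_t upper; /* integer portion, 2's complement ('U' register)          */
    uint32_t lower; /* 16 fraction bits in bits 31..16, bits 15..0 zero ('L')  */
} DepthWords;

/* Assumption: 16 fractional bits. The value is scaled by 2^16, truncated
 * towards zero, and then split at the binary point. */
DepthWords depth_to_words(double value)
{
    int64_t fixed = (int64_t)(value * 65536.0);
    uint64_t bits = (uint64_t)fixed;             /* well-defined 2's complement wrap */
    DepthWords w;
    w.upper = (uint32_t)(bits >> 16);            /* integer part, sign included */
    w.lower = (uint32_t)(bits & 0xFFFFu) << 16;  /* left justify, zero fill     */
    return w;
}
```

Under these assumptions a value of 1.5 gives `upper = 0x00000001` and `lower = 0x80000000`.
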
For the example triangle, Permedia4 would need its registers set up as follows:

```
// Load the depth start and delta values
// to draw a triangle
ZstartU (Z1_MS)
ZstartL (Z1_LS)
dZdyDomU (dZdy13_MS)
dZdyDomL (dZdy13_LS)
dZdxU (dZdx_MS)
dZdxL (dZdx_LS)
```

#### 1.1.4.1 RasterizerMode

The Permedia4 rasterizer has a number of mode bits which take effect until cleared and therefore tend to affect many primitives. These primarily involve bit mask operations described below. For details refer to the *Reference Guide*, **RasterizerMode** register. In the case of the Gouraud shaded triangle the default value for these modes is suitable.

#### 1.1.5 Subpixel Correction

Permedia4 supports subpixel correction of interpolated values when rendering aliased trapezoids (smooth shaded, textured, fogged or depth buffered). Subpixel correction ensures that all interpolated parameters associated with a fragment (color, depth, fog, texture) are correctly sampled at the fragment's center. This correction is required to ensure consistent shading of objects made from many primitives. Subpixel correction is not applied to antialiased primitives.

Control of subpixel correction is in the **Render** command register described below, and can be selected in bit settings for individual primitives (**DrawLine**, **DrawTriangle**, **DrawPoint**). A full code example is given in the Appendices.

#### 1.2 Pipeline Overviews

Before we review each unit in detail it is worth looking in general terms at how a graphic primitive passes through the pipeline, what messages are generated and what happens in each unit. Some simplifications have been made in the description to avoid detail which would otherwise complicate what is in fact a very simple process. The descriptions concentrate on what happens as a fragment flows down the message stream.
It is important to remember that at any instant in time there are many fragments flowing down the message stream, and the further they get the more processing has occurred.

#### 1.2.1 A day in the life of a 3D triangle

This section previews the render process for a typical 3D graphics primitive, the Gouraud shaded, depth buffered, dithered triangle. For this example assume that the triangle is to be drawn into a window which has its colormap set for RGB as opposed to color index operation. This means that all three color components (red, green and blue) must be handled. Also, assume the coordinate origin is bottom left of the window and drawing will be from top to bottom. Permedia4 can draw from top to bottom or bottom to top. For clarity the equations are shown in full in the appendices, though in practice there are many common terms and factors which need only be computed once, and normally the OGL driver performs all the necessary interpolations.

Consider a triangle with vertices v1, v2 and v3, where each vertex comprises X, Y and Z coordinates, shown below. Each vertex has a different color made up of red, green and blue (R, G and B) components. The alpha component is omitted for this example.

Figure 1-3 Example Triangle

The diagram makes a distinction between top and bottom halves because Permedia4 is designed to rasterize (a) screen aligned trapezoids, and (b) flat-topped or -bottomed triangles, as shown below:

Figure 1-4 Screen aligned trapezoid and flat topped triangle

#### **1.2.1.1 Delta Unit**

The drawing process starts by generating and loading vertex data in the Delta Unit:

1. The application generates the triangle vertex information and makes the necessary OpenGL calls to draw it.
2. The OpenGL server/library gets the vertex information, transforms, clips and lights it. The vertex coordinates and color values are written into the vertex stores (in the Delta Unit) and the **DrawTriangle** command is issued.
3. The Delta Unit calculates the initial values and derivatives for the values to interpolate (X<sub>left</sub>, X<sub>right</sub>, red, green, blue and depth) for unit change in dx and dxdy<sub>left</sub>. All these values are in fixed point integer and have unique message tags. Some of the values (the depth derivatives) have more than 32 bits to cope with the dynamic range and resolution, so are sent in two halves. Finally, once the derivatives, start and end values have been sent, the 'render triangle' message begins the rendering process.
4. The derivative, start and end parameter messages are received and filter down the message stream to the appropriate units. The depth parameters and derivatives go to the Depth Unit; the RGB parameters and derivative to the Color DDA Unit; the edge values and derivatives to the Rasteriser Unit.

#### 1.2.1.2 Rasterizer

The 'render triangle' message is received by the rasteriser unit and all subsequent messages (from the host) are blocked until the triangle has been rasterised (but not necessarily written to the framebuffer). A 'prepare to render' command is passed on so any other units can prepare themselves.

The Rasteriser Unit walks the left and right edges of the triangle and fills in the spans between. As the walk progresses, messages are sent to indicate the direction of the next step: StepX or StepYDomEdge.

#### 1.2.1.3 Rasterizer 'Edge walking' - Calculating the Slope for each Side

Permedia4 draws filled shapes such as triangles as a series of spans with one span per scanline. Therefore it needs to know the start and end X coordinate of each span. These are determined by 'edge walking'. This process involves adding one delta value to the previous span's start X coordinate and another delta value to the previous span's end X coordinate to determine the X coordinates of the new span. These delta values are in effect the slopes of the triangle sides.
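
The edge-walking step described above can be sketched in C as follows. This is an illustrative software model, not the hardware's exact sampling rule: the names `Span` and `edge_walk` are hypothetical, X values are held in 16.16 fixed point, coordinates are assumed non-negative, and the dominant edge is assumed to be the left edge.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16_16; /* 16.16 fixed point, as used for StartXDom and dXDom */

typedef struct {
    int y;       /* scanline */
    int x_start; /* integer X on the dominant edge */
    int x_end;   /* integer X on the subordinate edge */
} Span;

/* Walks `count` scanlines, adding one delta per edge per scanline.
 * `dy` is +1 or -1 depending on the drawing direction.
 * Returns the number of spans written to `out`. */
int edge_walk(fix16_16 start_x_dom, fix16_16 dx_dom,
              fix16_16 start_x_sub, fix16_16 dx_sub,
              int start_y, int dy, int count, Span *out)
{
    assert(out != 0);
    assert(count >= 0);
    assert(dy == 1 || dy == -1);

    fix16_16 x_dom = start_x_dom;
    fix16_16 x_sub = start_x_sub;
    int y = start_y;

    for (int i = 0; i < count; i++) {
        out[i].y = y;
        out[i].x_start = x_dom / 65536; /* drop the fraction (non-negative X assumed) */
        out[i].x_end = x_sub / 65536;
        x_dom += dx_dom;                /* step along the dominant edge */
        x_sub += dx_sub;                /* step along the subordinate edge */
        y += dy;
    }
    return count;
}
```

Starting both edges at the same X, as required for a triangle that is not flat topped, gives a first span of zero width that widens by the difference of the two slopes on each scanline.
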
To draw from left to right and top to bottom, the slopes of the three sides are calculated as:

$$dX_{23} = \frac{X_3 - X_2}{Y_3 - Y_2}$$

$$dX_{13} = \frac{X_3 - X_1}{Y_3 - Y_1}$$

$$dX_{12} = \frac{X_2 - X_1}{Y_2 - Y_1}$$

This triangle will be drawn in two parts: top down to the 'knee' (i.e. vertex 2), and then from there to the bottom. The dominant side is the left side, so for the top half:

$$dXDom = dX_{13}$$

$$dXSub = dX_{12}$$

The start X,Y, the number of scanlines, and the deltas (above) give Permedia4 enough information to edge walk the top half of the triangle. However, to indicate that this is not a flat topped triangle (Permedia4 is designed to rasterize screen aligned trapezoids and flat topped triangles), the same start position in terms of X must be given twice, as StartXDom and StartXSub.

To edge walk the lower half of the triangle, selected additional information is required. The slope of the dominant edge remains unchanged, but the subordinate edge slope needs to be set to:

$$dXSub = dX_{23}$$

Also the number of scanlines to be covered from Y2 to Y3 needs to be given. Finally, to avoid any rounding errors accumulated in edge walking to X2 (which can lead to pixel errors), StartXSub must be set to X2.

The data field holds the current (x, y) coordinate. One message is sent per pixel within the triangle boundary. These messages, or fragments, are divided into two groups, active and passive. Fragments always start off in the active group but may be changed to the passive group if the pixel fails one of the tests (e.g. depth) on its path down the message stream. The two groups are distinguished by a single bit in the message tag. The fragments (in either form) are always passed throughout the length of the message stream and are used by all the DDA units to keep their interpolation values in step. Any other messages pertaining to fragments always precede the fragment in the message stream. The messages hold X, Y, color and coverage data for each fragment <sup>1</sup>.
The data field expands between units to accommodate additional data when necessary.

<sup>1</sup>The coverage field is only used for antialiasing. For aliased primitives the coverage field holds a dErr value used for subpixel correction.

#### 1.2.1.4 Rasterizing the Triangle

We are almost ready to draw the triangle. Setting up the registers as described here and sending the **Render** command draws the top half of the example triangle first.

To draw the example triangle, all the bit fields within the **Render** command should be set to 0 except the PrimitiveType which should be set to trapezoid and the *SubPixelCorrection Enable* bit which should be set to TRUE.

```
// Draw triangle with knee
// Set deltas
StartXDom (X1<<16)                     // Converted to 16.16 fixed point
dXDom (((X3 - X1)<<16)/(Y3 - Y1))
StartXSub (X1<<16)
dXSub (((X2 - X1)<<16)/(Y2 - Y1))
StartY (Y1<<16)
dY (-1<<16)
Count (Y1 - Y2)

// Set the render command mode
render.PrimitiveType = PERMEDIA4_TRAPEZOID_PRIMITIVE
render.SubPixelCorrectionEnable = TRUE

// Draw the top half of the triangle
Render(render)
```

After the **Render** command has been issued the registers in Permedia4 can immediately be altered to draw the lower half of the triangle. Only two registers need be loaded and the **ContinueNewSub** command sent. Once Permedia4 has received **ContinueNewSub** it starts drawing the sub-triangle.

```
// Set-up the delta and start for the new edge
StartXSub (X2<<16)
dXSub (((X3 - X2)<<16)/(Y3 - Y2))

// Draw sub-triangle
ContinueNewSub (Y2 - Y3)               // Draw lower half
```

#### 1.2.1.5 Scissor and Stipple

This unit does 4 tests on the fragment (as embodied by the active step message).
The screen scissor test takes the coordinates associated with the fragment, converts them to be screen relative (if necessary) and compares them against the screen boundaries. (The other three tests - user scissor, line stipple and area stipple - are disabled in this example.) If the enabled tests pass then the active fragment is forwarded to the next unit, otherwise it is changed into a passive step and then forwarded.

#### 1.2.1.6 Router

In this example the Router is set up so the Depth test occurs before the texture operations (i.e. **RouterMode** Sequence bit = 1).

#### 1.2.1.7 Local Buffer Read

In general terms the Local Buffer Read Unit reads the Graphic ID, Stencil and Depth information from the Local Buffer and passes it to the next unit. This includes performing the GID test on fragments, checking cache and local buffer for data, and data formatting. See volume I - Localbuffer - for more information.

#### 1.2.1.8 Stencil and Depth

When an active fragment is received the internal stencil and depth values are compared with the fragment's as specified in the **StencilMode** and **DepthMode** registers. If the enabled tests pass then the new local buffer data is written back to the fragment, which is forwarded to the next unit. If any of the enabled tests fail then the equivalent passive step message is forwarded instead (a local buffer write may still be done). The Depth DDA is stepped to update the local depth value.

#### 1.2.1.9 Local Buffer Write

The Local Buffer Write Unit calculates the address, formats the GID, Stencil and Depth data and (if writes are enabled) passes the formatted data and address to the Memory Controller. The memory is much wider than the pixel data so any writes are first done into a write combine buffer which is flushed to memory as required. See volume I - Localbuffer - for more information. The fragment is forwarded to the next unit.
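
The active/passive handling described for the Stencil and Depth stage can be modeled in a few lines of C. This is a simplified sketch under stated assumptions: the `Fragment`, `DepthDDA` and `depth_stage` names are hypothetical, the comparison is fixed to 'less than' (the real comparison is selected through **DepthMode**), and the stencil test and local buffer formatting are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fragment message: position plus the active/passive flag. */
typedef struct {
    int x, y;
    bool active;
} Fragment;

/* Hypothetical depth interpolator: current value and per-step increment. */
typedef struct {
    int32_t z;
    int32_t dz;
} DepthDDA;

/* Processes one fragment against the stored depth at its location.
 * Returns true when the stored depth was updated. */
bool depth_stage(DepthDDA *dda, int32_t *stored_z, Fragment *frag)
{
    bool updated = false;

    if (frag->active) {
        if (dda->z < *stored_z) {   /* assumed 'less than' depth test */
            *stored_z = dda->z;     /* pass: new depth is written back */
            updated = true;
        } else {
            frag->active = false;   /* fail: fragment becomes passive */
        }
    }

    /* The DDA is stepped for active and passive fragments alike,
     * so the interpolated depth stays in step with the rasterizer. */
    dda->z += dda->dz;
    return updated;
}
```

The point of the model is the last step: a passive fragment is never dropped from the stream, because every DDA unit further down still needs it to advance its own interpolation.
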
#### 1.2.1.10 Color DDA

The Color DDA unit responds to an active fragment by updating the fragment's color field and sending this to the next unit. The color field holds the *current* RGBA value from the DDA. After the step message is sent the DDA is incremented in the correct direction, ready for the next pixel.

#### 1.2.1.11 Texturing, Fog and Alpha Tests

In this example, Texturing, Fog and Alpha Tests are disabled so the fragments are forwarded unchanged.

#### 1.2.1.12 Framebuffer Read

In general terms Framebuffer Read reads the color information from the framebuffer and passes it on to the next unit. It is functionally similar to the Localbuffer but handles color data rather than GID, depth and stencil data, write-combined operations and Patch2 and Patch32\_2 formats. See volume I - Framebuffer - for more information.

#### 1.2.1.13 Alpha Blend

The formatting of the Framebuffer data is deferred until the Alpha Blend Unit as it is the only unit which needs to match buffer formats with the internal formats. In this example no alpha blending or logical ops are taking place so reads are disabled and fragments pass through unaltered.

#### 1.2.1.14 Dither

The Dither Unit uses the least significant bits of the (X, Y) coordinate information from the step message to dither the color field. Part of the dithering process is to convert from the internal color format into the format of the framebuffer. The new color is inserted back into the color field and the fragment forwarded.

#### 1.2.1.15 Logical Ops

In this example Logical Ops are disabled so the fragments pass through.

#### 1.2.1.16 Framebuffer Write

The Framebuffer Write Unit calculates the address and (if writes are enabled) passes the formatted data and address to the Memory Controller.
The memory is much wider than the pixel data so any writes are first done into a write combine buffer, and only when this needs to be flushed is the Memory Controller given the write command. See volume I - Framebuffer - for more information.

#### 1.2.1.17 Host Out

The Host Out Unit deals with host synchronisation and statistics. In this example it simply consumes any fragments which reach this point in the message stream.

#### 1.2.2 A day in the life of a 2D primitive

Permedia4 introduces an alternative method for rendering which is particularly suited to 2D operations. These are pure 2D without any 3D functionality such as depth or stencil testing or alpha blending. 2D drawing works on spans of pixels, where a span is always 64 pixels sequentially along a scanline. The core now works on 64 pixels in parallel (in addition to processing multiple spans along the length of the message stream) and a pixel can be 8, 16 or 32 bits in size. Spans can be read, written, copied, uploaded or downloaded. A span can have a constant color or a variable color per pixel in the span.

The primitive we are going to look at is a fill with a constant color through a bit mask held in the texture memory. The zero bits in the bit mask do not cause the corresponding pixels in the framebuffer to be written to. The fill shape is a rectangle for simplicity, but could be any shape (with suitable decomposition into primitives Permedia4 understands). As usual, we refer to "units" along the message stream but these are virtual rather than necessarily physical entities.

#### 1.2.2.1 Initialization

The application generates the rectangle information and makes the necessary Windows API calls to draw it.

#### 1.2.2.2 2D Set Up Unit

The NT device driver gets the rectangle information and uses the Render2D command to set up and rasterise the rectangle (done in the new 2D Set Up Unit). Other state data and information is also set up, as discussed below.
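
The constant-color fill through a bit mask can be pictured with a small C model of one 64-pixel span. This is a sketch only: `fill_span_masked` is a hypothetical name, pixels are taken to be 32 bits, and mapping mask bit 0 to the first pixel of the span is an assumption made for illustration.

```c
#include <stdint.h>

#define SPAN_PIXELS 64 /* a span is always 64 pixels along a scanline */

/* Writes `color` to every pixel whose mask bit is 1 and leaves pixels
 * with a 0 mask bit untouched. Returns the number of pixels written.
 * Assumption: bit i of `mask` corresponds to span[i]. */
int fill_span_masked(uint32_t span[SPAN_PIXELS], uint64_t mask, uint32_t color)
{
    int written = 0;
    for (int i = 0; i < SPAN_PIXELS; i++) {
        if ((mask >> i) & 1u) {
            span[i] = color;
            written++;
        }
    }
    return written;
}
```

The hardware processes the 64 pixels in parallel; the loop here only models the result, not the timing.
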
#### 1.2.2.3 Rasterizer

The 'render trapezoid' message is received by the Rasterizer Unit and all subsequent messages (from the host) are blocked until the trapezoid has been rasterised (but not necessarily written to the framebuffer). The **Render** message has the *FastFillEnable* bit set. A 'prepare to render' message is also passed on internally so any other units can prepare themselves.

The Rasterizer Unit walks the left and right edges of the trapezoid (a rectangle in this case) and fills in the spans between the left and right hand edges. As the walk progresses messages are sent to indicate the direction of the next step. These internal SpanStep commands control the subsequent processing of the span fragment.

#### 1.2.2.4 Scissor and Stipple Unit

This unit does three tests on the span (as embodied by the SpanStep message). The screen scissor test takes the coordinates associated with the SpanStep message, converts them to be screen relative (if necessary), compares the pixel mask against the screen boundaries and clears the bits for pixels which lie outside the screen boundary. The pixel mask is potentially further reduced by the user scissor test (applied in a similar way). The area stipple test is disabled for this example but, if it were enabled, would potentially remove further pixels after suitable alignment. The modified SpanStep message is forwarded to the next unit.

#### 1.2.2.5 Color DDA

The Color DDA unit does not respond to the SpanStep messages so they just pass through.

#### 1.2.2.6 Texture Coordinate and Index

The Texture Coordinate Unit responds to the SpanStep message and appends the u, v coordinates of the texel where the bit mask data for this span is held. The S and T DDAs are set up to step through the bit mask pattern. The SpanStep is forwarded on to the next unit.
The Texture Index Unit converts the uv coordinate in the SpanStep message into an ij coordinate of the texel where the bit mask data for this span is held. The SpanStep is forwarded on to the next unit. #### 1.2.2.7 Texturing, Fog and Alpha The Texture Read Unit converts the ij coordinate into a physical address where the texel data is held. The texel data is read (maybe sourced from the secondary cache) and zero extended up to 64 bits if the bit mask was held as 8, 16, or 32 bits. After being optionally inverted or mirrored it is ANDed with the pixel mask field in the SpanStep message and forwarded to the next unit. The remaining texture units, Fog and Alpha Tests Units do not respond to the SpanStep messages so they just pass through. #### 1.2.2.8 Localbuffer Read, Stencil/Depth and Localbuffer Write The LB Read, Stencil/Depth, LB Write Units do not respond to the SpanStep messages (in this example) so they just pass through. #### 1.2.2.9 Framebuffer Read In general terms the Framebuffer Read Unit reads the color information from the framebuffer and forwards it to the next unit. More specifically for spans it calculates the linear address in the framebuffer of the required data. This is done using the (X, Y) position recorded internally and locally stored information on the 'screen width' and window base address. The span is decomposed into a series of memory aligned reads. In this example no logical ops are taking place so reads are disabled and hence no read address is sent to the Memory Controller. The span tags just pass through. #### 1.2.2.10 Alpha Blend and Dither The Alpha Blend and Dither Units do not respond and the span data simply passes through. #### 1.2.2.11 Logical Ops The Logical Ops are disabled so the Span data passes through. #### 1.2.2.12 Framebuffer Write The Framebuffer Write Unit calculates the address, aligns the pixel mask to the memory block write boundaries and passes these to the Memory Controller. 
The pixel data previously set up in the **FBColor** Register can be written, ideally using the block fill capability of the SGRAM. The Span data is passed on to the next unit. #### 1.2.2.13 Host Out The Host Out Unit is concerned with synchronisation with the host - for this example it simply consumes any messages which reach this point in the message stream. 2 ### **Rasterizer and 2D Setup** The rasterizer decomposes a primitive into a series of fragments for processing by the rest of the HyperPipeline. Permedia4 can directly rasterize: - aliased screen aligned trapezoids - aliased single pixel wide lines - aliased single pixel points - antialiased screen aligned trapezoids - antialiased circular points All other primitives are treated as one or more of the above, for example an antialiased line is drawn as a series of antialiased trapezoids. 2D Operations can be largely implemented using the **Render2D** and **Render2Dglyph** registers. These, together with the **GlyphData** and **GlyphPosition** registers constitute a functional subunit of the Rasterizer and are discussed below. #### 2.1 Description The rasterizer unit scan converts the given primitive into a list of pixel coordinates which meet the rasterisation rules of OpenGL, X and NT. In addition to generating the coordinates, the order in which pixels are visited is also defined (by the **Render** command) so the local DDA units in the Texture, Color, Fog and Depth units can incrementally keep in step. When a primitive is antialiased the percentage coverage of the primitive within the scan converted pixels is calculated for later use in the alpha blend unit. The same method of antialiasing is used for all primitives. The primitive is scan converted to a higher resolution (e.g. 4x4 sub samples per Render command) and the number of sub pixel sample points covered is counted. The ratio of covered sample points to total number of sample points gives the coverage weighting by which to adjust the color. 
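The coverage weighting described above is a simple ratio, which the following minimal C sketch illustrates. The function names and the 8-bit color component are assumptions made for this illustration only; on Permedia4 the count is produced by the rasterizer and the color adjustment happens in later units, not in host code.

```c
#include <stdint.h>

/* Hypothetical helper: coverage weighting in the range 0.0 to 1.0 for a
 * pixel sampled on a samples_per_axis x samples_per_axis grid
 * (4 for 4x4 sub samples, 8 for 8x8 sub samples). */
static float coverage_weight(unsigned covered_samples, unsigned samples_per_axis)
{
    unsigned total = samples_per_axis * samples_per_axis;
    if (total == 0)
        return 0.0f;
    if (covered_samples > total)
        covered_samples = total;      /* clamp bad input */
    return (float)covered_samples / (float)total;
}

/* Hypothetical helper: scale an 8-bit color component by the same ratio,
 * using integer arithmetic and rounding to nearest. */
static uint8_t scale_by_coverage(uint8_t component, unsigned covered_samples,
                                 unsigned samples_per_axis)
{
    unsigned total = samples_per_axis * samples_per_axis;
    if (total == 0)
        return 0;
    if (covered_samples > total)
        covered_samples = total;
    return (uint8_t)((component * covered_samples + total / 2) / total);
}
```

For example, 8 covered sample points out of 16 (4x4 sub sampling) give a weighting of 0.5.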
The rasterisation process steps along the Y axis and, for each scanline, calculates the two points where the edges of the primitive intersect it. For normal rasterisation the pixels between these two intersection points are filled in. During antialiasing a step of 1/4 of a scanline (for example) is used, so four pairs of intersections are calculated per scanline. The coverage that each of the four sub scanlines contributes to a pixel (on this scanline) is calculated and summed.

The coordinates passed to the rasterizer can be window relative or screen relative. The rasterizer treats both the same. Conversion to memory addresses does not happen until they reach the Local Buffer and Framebuffer Units. The Rasterizer is not concerned whether the origin is the bottom left or top left and again it is the Local Buffer and Framebuffer Units which take this into account when calculating the memory address. If the direction of scan conversion is important then the parameters must match up with the origin definition to give the desired effect.

*Note:* Long term mode information is held in the **RasterizerMode** command and short term mode information (which only applies to the primitive being rasterised) is passed with the **Render** command.

#### 2.1.1 Trapezoids

Permedia4's basic area primitive is the screen aligned trapezoid, discussed in the previous chapter. This is characterized by having top and bottom edges parallel to the X axis. The side edges may be vertical (a rectangle), but in general will be diagonal. The top or bottom edges can degenerate into points, in which case we are left with either flat topped or flat bottomed triangles. Any polygon can be decomposed into screen aligned trapezoids or triangles. Usually, polygons are decomposed into triangles because the interpolation of values over non-triangular polygons is ill defined. The rasterizer does handle flat topped and flat bottomed 'bow tie' polygons, which are a special case of screen aligned trapezoids.
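As an illustration of how a screen aligned trapezoid is walked, the following C sketch steps two edge DDAs one scanline at a time and records the span of pixels between them. It is a software model under stated assumptions, not the hardware algorithm: the 16.16 fixed point format, the `Span` type, the function names, the non-negative coordinates and the half-open fill rule (right hand pixel excluded) are all choices made for this example.

```c
#include <stdint.h>

#define FIX_ONE 65536               /* assumed 16.16 fixed point format */

/* Hypothetical output record: pixels x_begin .. x_end-1 on scanline y. */
typedef struct { int y, x_begin, x_end; } Span;

/* Round a non-negative 16.16 value up to the next integer pixel. */
static int fix_ceil(int32_t v)
{
    return (int)((v + FIX_ONE - 1) / FIX_ONE);
}

/* Walk 'count' scanlines of a trapezoid. x_dom/x_sub are the edge start
 * values, dx_dom/dx_sub the change in X per scanline, dy is +1 or -1.
 * 'out' must have room for 'count' entries. Returns the number of spans. */
static int walk_trapezoid(int32_t x_dom, int32_t dx_dom,
                          int32_t x_sub, int32_t dx_sub,
                          int y, int dy, int count, Span *out)
{
    int n = 0;
    for (int i = 0; i < count; i++) {
        int32_t left  = x_dom < x_sub ? x_dom : x_sub;
        int32_t right = x_dom < x_sub ? x_sub : x_dom;

        out[n].y = y;
        out[n].x_begin = fix_ceil(left);
        out[n].x_end = fix_ceil(right);   /* empty when begin == end */
        n++;

        x_dom += dx_dom;                  /* step both edge DDAs */
        x_sub += dx_sub;
        y += dy;
    }
    return n;
}
```

A rectangle is simply the case where both deltas are zero; a flat topped or flat bottomed triangle is the case where both edges start (or finish) at the same X value.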
X's definition of a polygon is more complex than OpenGL's. It can be concave and self intersecting. In the non-convex case the best thing for X to do is to decompose the polygon into a series of spans and render them as 1 pixel high rectangles. For any convex polygons X can decompose them into screen aligned trapezoids as a further optimisation over just using spans. X does not support antialiased polygons.

Adjacent triangles or polygons which share an edge or vertex must be drawn so that pixels which make up the edge or vertex are drawn once only. This may be achieved by omitting the pixels down the left or the right sides and the pixels along the top or lower sides. Permedia4 follows the convention of omitting the pixels down the right hand edge. Control of whether pixels along the top or lower sides are omitted depends on the start Y value and the number of scanlines to be covered. In the example, if StartY = Y1 and the number of scanlines is set to Y1toY2, the lower edge of the top half of the triangle will be excluded. This excluded edge is drawn as the top of the lower half of the triangle.

To minimize delta calculations, triangles may be scan converted from left to right or from right to left. The direction depends on the dominant edge, that is *the edge which has the maximum range of Y values*. Rendering always proceeds from the dominant edge towards the relevant subordinate edge. In the example in Figure 2-1, the edge with the greatest Y range (dominant) is on the right so rendering will be from right to left.

Figure 2-1 Rasterizing a triangle

The sequence of actions required to render a triangle (with a 'knee') is:

- Load the edge parameters and derivatives for the dominant edge and the first subordinate edge in the first triangle.
- Send the **Render** command. This starts the scan conversion of the first triangle, working from the dominant edge.
This means that for triangles where the knee is on the left we are scanning right to left, and vice versa for triangles where the knee is on the right.
- Load the edge parameters and derivatives for the remaining subordinate edge in the second triangle.
- Send the **ContinueNewSub** command. This starts the scan conversion of the second triangle.

| Render Data Field | | | | | |
|-------------------|---|---------------------|---|--------------------------|---|
| AreaStippleEnable | 1 | LineStippleEnable | 0 | PrimitiveType | 1 |
| FastFillEnable | 0 | FastFillIncrement | X | UsePointTable | 0 |
| AntialiaseEnable | 0 | AntialiasingQuality | X | ResetLineStipple | X |
| SyncOnBitMask | 0 | SyncOnHostData | 0 | TextureEnable | 1 |
| FogEnable | 1 | CoverageEnable | 0 | SubPixelCorrectionEnable | 1 |

$$\begin{aligned}
& \operatorname{StartXDom}\,(X_1) \\
& \operatorname{dXDom}\,((X_3 - X_1)/(Y_3 - Y_1)) \\
& \operatorname{StartXSub}\,(X_1) \\
& \operatorname{dXSub}\,((X_2 - X_1)/(Y_2 - Y_1)) \\
& \operatorname{StartY}\,(Y_1) \\
& \operatorname{dY}\,(-1.0) \\
& \operatorname{Count}\,(Y_1 - Y_2) \\
& \operatorname{Render} \\
& \operatorname{StartXSub}\,(X_2) \\
& \operatorname{dXSub}\,((X_3 - X_2)/(Y_3 - Y_2)) \\
& \operatorname{ContinueNewSub}\,(Y_2 - Y_3) \quad \text{// Bottom half}
\end{aligned}$$

*Note:* If both edges need to be reloaded to continue on with the bottom half of the polygon then issue ContinueNewSub (0) and then ContinueNewDom (count). The ContinueNewSub (0) will just update the DDA with the new start value but not draw any scanlines. Alternatively, if the accuracy of the DDA end values is good enough and can be used as the start values for the next trapezoid then the delta values can be updated and the Continue message used.

The sub pixel correction is only needed if color, depth, fog or texture interpolation is being used.

After the **Render** command has been sent the registers can be updated immediately to draw the second half of the triangle. Only two registers need to be loaded to do this, followed by the **ContinueNewSub** command.
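The register values in the worked example above can be derived from the three vertices with a few lines of host code. The following C sketch evaluates exactly the expressions shown in the register sequence. The `Vertex` and `TriangleSetup` types and the function name are hypothetical, the vertices are assumed to be ordered so that Y1 > Y2 > Y3 (origin bottom left, dY = -1.0), and the conversion to the chip's fixed point register format and the register writes themselves are not shown.

```c
/* Hypothetical types for this illustration only. */
typedef struct { double x, y; } Vertex;

typedef struct {
    double start_x_dom, dx_dom;      /* dominant edge, vertex 1 to 3      */
    double start_x_sub, dx_sub;      /* first subordinate edge, 1 to 2    */
    double start_y, dy;
    double count_top;                /* scanlines before the knee         */
    double start_x_sub2, dx_sub2;    /* second subordinate edge, 2 to 3   */
    double count_bottom;             /* data for ContinueNewSub           */
} TriangleSetup;

/* Fill in 'out' from vertices v1 (top), v2 (knee), v3 (bottom).
 * Returns 0 on success, -1 if the vertices are not strictly ordered
 * v1.y > v2.y > v3.y (which would also cause a divide by zero). */
static int triangle_setup(Vertex v1, Vertex v2, Vertex v3, TriangleSetup *out)
{
    if (!(v1.y > v2.y && v2.y > v3.y))
        return -1;

    out->start_x_dom  = v1.x;
    out->dx_dom       = (v3.x - v1.x) / (v3.y - v1.y);
    out->start_x_sub  = v1.x;
    out->dx_sub       = (v2.x - v1.x) / (v2.y - v1.y);
    out->start_y      = v1.y;
    out->dy           = -1.0;
    out->count_top    = v1.y - v2.y;

    out->start_x_sub2 = v2.x;
    out->dx_sub2      = (v3.x - v2.x) / (v3.y - v2.y);
    out->count_bottom = v2.y - v3.y;
    return 0;
}
```

A flat topped or flat bottomed triangle has no knee, so it needs only the first group of values and a single **Render** command; the strict ordering check above deliberately rejects that case to keep the sketch short.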
When the first triangle has been drawn and the **ContinueNewSub** command issued, Permedia4 starts drawing the sub-triangle and the **ContinueNewSub** command register is loaded with the remaining number of scanlines to be rendered.

#### 2.1.2 Antialiasing

Permedia4 uses a subpixel point sampling algorithm to antialias primitives. Permedia4 can directly rasterize antialiased trapezoids and points. Other primitives are composed from these base primitives.

The rasterizer associates a coverage value with each fragment produced when antialiasing. This value represents the percentage coverage of the pixel by the fragment. Permedia4 supports two levels of antialiasing quality:

- normal, which represents 4x4 pixel subsampling
- high, which represents 8x8 pixel subsampling

Selection between these two is made by the *AntialiasingQuality* bit in the **Render** command register.

Use the **FlushSpan** command to terminate rendering an antialiased primitive. This is necessary because of the way Permedia4 maintains antialiasing continuity. When rendering a primitive which does not complete on a scanline boundary, Permedia4 retains antialiasing information about the last sub-scanline(s) it has processed but does not generate fragments for them unless a **FlushSpan** command is received.

The commands **ContinueNewSub**, **ContinueNewDom** or **Continue** can then be used to maintain continuity between adjacent trapezoids, which allows complex antialiased primitives to be built up from simple trapezoids or points.

Figure 2-2 Antialiased Line

The procedure to render the line is as follows:

```
// Set-up the blend and coverage application units
// as appropriate - not shown

// In this example only the edge deltas are shown
// loaded into registers for clarity. In reality
// start X and Y values are required. This example
// uses 4x4 antialiasing.

// Render Trapezoid A
dY(1<<14)
dXDom(dXDom1<<14)
dXSub(dXSub1<<14)
Count(count1<<2)
render.PrimitiveType = PERMEDIA4_TRAPEZOID
render.AntialiasEnable = PERMEDIA4_TRUE
render.AntialiasQuality = PERMEDIA4_MIN_ANTIALIAS
render.CoverageEnable = PERMEDIA4_TRUE
Render(render)

// Render Trapezoid B
dXSub(dXSub2<<14)
ContinueNewSub(count2<<2)

// Render Trapezoid C
dXDom(dXDom2<<14)
ContinueNewDom(count3<<2)

// Now we have finished the primitive flush out
// the last scanline
FlushSpan()
```

*Note:* When rendering antialiased primitives, any count values should be given in subscanlines. For example if the quality is 4x4 then the count will be 4 times the number of scanlines completely covered by the primitive plus the number of subscanlines contained in the remaining partially covered scanlines. Also, if using 4x4 quality then any delta value must be divided by 4. If using 8x8 quality then the multiply/divide factor is 8.

When rendering, *AntialiasEnable* must be set in the **AntialiasMode** register to scale the fragment's color by the coverage value. An appropriate blending function should also be enabled. See the Antialias Application and Alpha Blend sections for more details.

*Note:* When rendering antialiased bow-ties the coverage value on the cross-over scanline may be incorrect.

#### 2.1.2.1 Antialiased Polygons

Antialiased polygons (or more precisely, screen aligned trapezoids) are scan converted by walking the trapezoid's edges at a higher resolution (x4, say). The coverage for a specific pixel is calculated by summing the coverage each of the sub scanlines contributes. More specific details are given in the implementation section.

Care needs to be taken when trapezoids (from the same polygon) meet part way through a scan line. The span of pixels cannot be generated until the second trapezoid is available as it will contribute to the coverage in this scanline.
If, on the last trapezoid, the scan line is only part covered then a 'flush' command is needed to generate the coverage for these pixels as there is no follow-on trapezoid. #### 2.1.3 Stippling during Rasterizing Normally, stipple processing is accommodated in the Stipple Unit. This covers all stipple requirements for OpenGL (e.g. aliased lines, polygons) and most other platforms, e.g. X. Details are given in the Stipple Unit section. The Rasterizer does provide additional stipple functionality, for example stippling requirements for X which cannot be met by the Stipple Unit: - Arbitrary stipple on lines. - Arbitrary stipple on polygons, especially rectangles. The bit mask unit in the rasterizer (normally used for characters) can give an arbitrary stipple to any primitive. The stipple pattern required is loaded into the **BitMaskPattern** register 32 bits at a time, in the order in which the pixels in the primitive are generated. The state of each bit in the bit mask determines if an active pixel is generated or a passive one. One bit in the stipple sequence is required for each pixel in the primitive. This stippling method is independent of the Stipple Unit and can replace its function or be used as a second level of stippling. #### **2.1.3.1** Stipple Lines **(X)** The standard OpenGL method of stippling lines can be used in X for the more restricted case where the mark/space ratio of the stipple is the same. X allows an arbitrary stipple pattern to be defined using the Bitmap facility. Here the host provides a number of bit mask words where each bit corresponds to one pixel in the line. The state of this bit determines whether the associated pixel is generated or skipped. #### **2.1.4** Points Points are the easiest of all primitives to scan convert but there are a number of special cases. The main questions are whether the point is antialiased or not, and its size. 
All the DDA related parameters are held constant over a point (a point may cover many pixels), and between points in a Begin/End set. Before any point rasterisation is done the host must have set up the Texture, Color, Fog and Depth units so they maintain a constant value and don't increment between pixels in a point.

In OpenGL no stipple operations are defined for points so stippling must be disabled. This can be done by changing the stipple mode (see Stipple Unit) or by setting the stipple operation in the **Render** (or **PrepareToRender**) command to 'none'. This latter method is much easier for the software to use.

#### 2.1.4.1 Aliased Points (OpenGL)

Permedia4 supports a single pixel aliased point primitive. For points larger than one pixel, trapezoids should be used. The fields in the **Render** command register are described in detail later; in this case the *PrimitiveType* field in the **Render** command should be set to PERMEDIA4\_POINT\_PRIMITIVE. The pseudocode portion to render an aliased unity sized point is:

#### 2.1.4.2 Worked example - one pixel points

A series of one pixel points $P(X_1, Y_1)$, $P(X_2, Y_2)$ ... $P(X_n, Y_n)$ is required. The **Render** command is set up as shown:

| Render Data Field | | | | | |
|-------------------|---|---------------------|---|--------------------------|---|
| AreaStippleEnable | 0 | LineStippleEnable | 0 | PrimitiveType | 2 |
| FastFillEnable | 0 | FastFillIncrement | X | UsePointTable | X |
| AntialiaseEnable | X | AntialiasingQuality | X | ResetLineStipple | X |
| SyncOnBitMask | X | SyncOnHostData | X | TextureEnable | 1 |
| FogEnable | 1 | CoverageEnable | 0 | SubPixelCorrectionEnable | 0 |

$$\begin{aligned}
& \operatorname{StartXDom}\,(X_1) \\
& \operatorname{StartY}\,(Y_1) \\
& \operatorname{Render} \\
& \operatorname{StartXDom}\,(X_2) \\
& \operatorname{StartY}\,(Y_2) \\
& \operatorname{Render} \\
& \quad\vdots \\
& \operatorname{StartXDom}\,(X_n) \\
& \operatorname{StartY}\,(Y_n) \\
& \operatorname{Render}
\end{aligned}$$

#### 2.1.4.3 Aliased Points (X)

X only has single pixel sized points so these are rendered by just sending any of the Active walk commands, with the (X, Y) position encoded in the data field, for each point to render.

#### 2.1.4.4 Antialiased Points (OpenGL)

Permedia4 can render small antialiased points. Antialiased points are treated as circles, with the coverage of the boundary fragments ranging from 0% to 100%. Permedia4 supports:

- point diameter of 0.5 to 16.0 in steps of 0.25 for 4x4 antialiasing
- point diameter of 0.25 to 8.0 in steps of 0.125 for 8x8 antialiasing

To scan convert an antialiased point as a circle, Permedia4 traverses the boundary in sub scanline steps to calculate the coverage value. For this, the sub scanline intersections are calculated incrementally using a small table. The table holds the change in X for a step in Y. Symmetry is used so the table only holds the delta values for one quadrant.

Figure 2-3 Antialiased Point

The pattern of table accesses, additions and subtractions is shown in Figure 2-3 for an odd diameter point. On the diagram the symbol +/-= Table[n] by an arrow indicates that the contents of the table at address n are added/subtracted to move along the arrow. **StartXDom**, **StartXSub** and **StartY** are set to the top or bottom of the circle and dY is set to the subscanline step. In this example the point table will have three entries. Note that in the case of an even diameter, the last of the required entries in the table is set to zero.

The *Permedia4 Reference Guide* gives full details of how the point table is laid out. Note that the table is configurable and point shapes other than circles can be rendered. Also, if the **StartXDom** and **StartXSub** values are not coincident then horizontal thick lines with rounded ends can be rendered.

The point looks like this and we will render from bottom to top.
The origin is assumed to be bottom left and we are using 4x4 antialiasing quality. The point's diameter is 3 pixels, or 12 sub scanlines. The point table is assumed to be set up already.

| Render Data Field | | | | | |
|-------------------|---|---------------------|---|--------------------------|---|
| AreaStippleEnable | 0 | LineStippleEnable | 0 | PrimitiveType | 1 |
| FastFillEnable | 0 | FastFillIncrement | X | UsePointTable | 1 |
| AntialiaseEnable | 1 | AntialiasingQuality | 0 | ResetLineStipple | X |
| SyncOnBitMask | 0 | SyncOnHostData | X | TextureEnable | 1 |
| FogEnable | 1 | CoverageEnable | 1 | SubPixelCorrectionEnable | 0 |

$$\begin{aligned}
& \operatorname{StartXDom}\,(X) \\
& \operatorname{StartXSub}\,(X) \\
& \operatorname{StartY}\,(Y) \\
& \operatorname{dY}\,(1.0/4.0) \\
& \operatorname{Count}\,(12) \\
& \operatorname{Render} \\
& \operatorname{FlushSpan}\,()
\end{aligned}$$

#### 2.1.5 Lines

There are two accepted ways of drawing lines: using a DDA, or Bresenham's algorithm. Bresenham's algorithm has an advantage over the DDA in that no divide is necessary. This has some benefits, particularly for 2D. For OpenGL we use the DDA method because the cost of the divide is acceptable and the divide is needed anyway to calculate the gradient of any color or depth change.

Lines are specified by their end points (accurate to 4 bits of sub pixel position) and rate of change in X and Y per step along the major axis of the line.

#### 2.1.5.1 Aliased Lines (OpenGL and X)

Single pixel wide aliased lines are drawn using a DDA algorithm so all it needs by way of input data is StartX, StartY, dX, dY and length. The algorithm just calculates:

```
while (length--) {
    X = X + dx
    Y = Y + dy
    plot ((int)X, (int)Y)
}
```

The variables X, Y, dx and dy are all fixed point numbers. The conversion to memory address using the X, Y coordinate is done in the memory read units.

#### 2.1.5.2 Worked example - Aliased PolyLine (OpenGL or simple stipple X)

A two segment polyline from $(X_1, Y_1)$ to $(X_2, Y_2)$ to $(X_3, Y_3)$ is required.
Both segments are X major, so:

$$\operatorname{abs}(X_{n+1} - X_n) > \operatorname{abs}(Y_{n+1} - Y_n)$$

*Note:* For individual line segments or the first line segment in a polyline the line stipple is reset (as shown).

| Render Data Field | | | | | |
|-------------------|---|---------------------|---|--------------------------|---|
| AreaStippleEnable | 0 | LineStippleEnable | 1 | PrimitiveType | 0 |
| FastFillEnable | 0 | FastFillIncrement | X | UsePointTable | 0 |
| AntialiaseEnable | 0 | AntialiasingQuality | X | ResetLineStipple | 1 |
| SyncOnBitMask | 0 | SyncOnHostData | 0 | TextureEnable | 1 |
| FogEnable | 1 | CoverageEnable | 0 | SubPixelCorrectionEnable | 0 |

$$\begin{aligned}
& \operatorname{StartXDom}\,(X_1) \\
& \operatorname{dXDom}\,(\pm 1.0) \\
& \operatorname{StartY}\,(Y_1) \\
& \operatorname{dY}\,((Y_2 - Y_1)/(X_2 - X_1)) \\
& \operatorname{Count}\,(\operatorname{abs}(X_2 - X_1)) \\
& \operatorname{Render} \\
& \operatorname{dXDom}\,(\pm 1.0) \\
& \operatorname{dY}\,((Y_3 - Y_2)/(X_3 - X_2)) \\
& \operatorname{ContinueNewLine}\,(\operatorname{abs}(X_3 - X_2))
\end{aligned}$$

*Note:* The use of ContinueNewLine is not recommended for OpenGL because the DDA units will start with a slight error as compared with the value they would have been loaded with for the second and subsequent segments. The fractional bits of the DDA can be forced to zero or half on the ContinueNewLine action.

#### 2.1.5.3 Aliased Wide Lines (OpenGL)

There is no direct support for wide lines. The OpenGL server has two options:

1. Wide lines can be drawn by repeating a single pixel wide line, but offset by one pixel in X for X major lines or one pixel in Y for Y major lines. Any values interpolated along the line (e.g. color) will need to be re-initialised at the start of each individual line. This is easily done with the **Render** command.
2. Wide lines can be converted to parallelograms (the ends of a wide line are parallel to the edge of the screen in OpenGL) and then rendered as polygons.

As you might expect neither method is the best in all cases.
For vertical or near vertical lines method 2 will cause fewer page breaks in memory so should be faster. However if there is any stippling then method 1 is likely to be much faster. Method 1 is the simpler method and is the preferred implementation.

A single wide line from $(X_1, Y_1)$ to $(X_2, Y_2)$ is required. The line is 3 pixels wide. The line is X major so $\operatorname{abs}(X_2 - X_1) > \operatorname{abs}(Y_2 - Y_1)$.

| Render Data Field | | | | | |
|-------------------|---|---------------------|---|--------------------------|---|
| AreaStippleEnable | 0 | LineStippleEnable | 1 | PrimitiveType | 0 |
| FastFillEnable | 0 | FastFillIncrement | X | UsePointTable | 0 |
| AntialiaseEnable | 0 | AntialiasingQuality | X | ResetLineStipple | 1 |
| SyncOnBitMask | 0 | SyncOnHostData | X | TextureEnable | 1 |
| FogEnable | 1 | CoverageEnable | 0 | SubPixelCorrectionEnable | 0 |

$$\begin{aligned}
& \operatorname{StartXDom}\,(X_1 - 1) \\
& \operatorname{dXDom}\,(\pm 1.0) \\
& \operatorname{StartY}\,(Y_1) \\
& \operatorname{dY}\,((Y_2 - Y_1)/(X_2 - X_1)) \\
& \operatorname{Count}\,(\operatorname{abs}(X_2 - X_1)) \\
& \operatorname{Render} \\
& \operatorname{StartXDom}\,(X_1) \\
& \operatorname{Render} \\
& \operatorname{StartXDom}\,(X_1 + 1) \\
& \operatorname{Render}
\end{aligned}$$

#### 2.1.5.4 Aliased Wide Lines (X)

Individual wide lines in X have square ends and multiple connected wide lines have a range of joint styles. X needs to convert the wide lines either to polygons, or to a series of spans, to achieve the desired effect.

#### 2.1.5.5 Antialiased Lines (OpenGL)

Antialiased lines of any width are drawn as antialiased polygons (see below). If stipple is enabled then the line is drawn as a series of polygons to match up with the stipple parameters.

#### 2.1.6 Polygons

The only polygons the rasteriser handles are screen aligned trapezoids. These are characterised by having the top and bottom edges parallel to the X axis. The side edges may be vertical, but in general will be diagonal. The top or bottom edges can degenerate into points in which case we are left with flat topped or flat bottomed triangles.
Any polygon can be decomposed into this shape, however the sample OpenGL server always decomposes polygons<sup>2</sup> into triangles because the interpolation of values over non-triangular polygons is ill defined.

The rasteriser does handle vertical 'bow tie' polygons.

As part of the rasterisation process a number of parameters (color, depth, fog and texture) are calculated for each fragment generated. These are calculated in the DDA units downstream under the guidance of the rasteriser step messages.

The ideal way to calculate these values is to take the fragment's XY coordinate and substitute it into the plane equation for each parameter in turn. This technique gives the best result, however it is computationally expensive so it is normal to use an incremental method such as a DDA to approximate to it.

The DDA method introduces some errors of its own:

- An incremental error due to the finite precision of the delta values. To overcome this source of errors enough fractional bits are used so that the error cannot propagate into the bit range of the DDA from which the parameter value is extracted.
- The start value for a parameter, P, can be nearly dPdx (one step in the X direction) out because the value calculated as a result of a Y step (shown as a circle in the following diagram) corresponds to the value of the sample on the edge and not at the center of the first fragment to be drawn. It is necessary to correct for this error to eliminate bright edge artefacts and achieve high quality rendering. This correction is needed for every scanline. A similar correction is needed at the start of the primitive because the parameter value at the start vertex is unlikely to lie on the horizontal center of a pixel so needs adjustment in Y. This correction is handled by software.

<sup>2</sup> Excluding the special case of screen aligned rectangles.
If dErr is the distance the edge is away from the pixel's center (must be < 1) and dPdx is the change in P for unit change in x then the correct value at the first sample point is:

The distance dErr is sent internally by the rasteriser in PrepareToRender and Step messages. The multiplication is done in the DDA units whenever these messages are received, but the DDA units only update Px on the **SubPixelCorrection** register if the LS bit of the data field is set. The correction dErr is sent as a 7 bit 2's complement 1.6 fixed point format. The dErr value sent in the messages is the dErr needed for the next scanline (or the first one in the case of a PrepareToRender).

#### 2.1.6.1 Sub Pixel Correction not Supported for Antialiased Primitives

Sub pixel correction must be enabled by the *SubPixelCorrectionEnable* bit in the **Render** command if it is required. See the *Permedia4 Reference Guide* for more information.

Antialiasing presents a much more complex problem to solve in that the sample point for the parameters must be inside the boundary of the fragment, but this may not be the center of the pixel anymore. Near horizontal edges can give rise to a dErr value which approaches the width of the screen (or window). Two methods can be used to overcome this:

- The sample point can be moved to be within the boundary by 'micro nudging' the DDAs in X and Y.
- The parameter being interpolated can be integrated over the interior sub pixel sample points and then divided by the number of interior points (this is the method in the OpenGL spec).

In both these cases the changes to the DDA units are too extensive given the other problems antialiasing presents (the coverage calculation doesn't take into account sub pixel visibility and doesn't work well with a depth buffer). No sub pixel corrections are done for antialiased primitives.

#### 2.1.6.2 Antialiased Triangle

The triangle looks like this and is rendered from top to bottom. The origin is assumed to be bottom left.
Antialias quality is 4x4:

| Render Data Field | | | | | |
|-------------------|---|---------------------|---|--------------------------|---|
| AreaStippleEnable | 1 | LineStippleEnable | 0 | PrimitiveType | 1 |
| FastFillEnable | 0 | FastFillIncrement | X | UsePointTable | 0 |
| AntialiaseEnable | 1 | AntialiasingQuality | 0 | ResetLineStipple | X |
| SyncOnBitMask | 0 | SyncOnHostData | 0 | TextureEnable | 1 |
| FogEnable | 1 | CoverageEnable | 1 | SubPixelCorrectionEnable | 1 |

$$\begin{aligned}
& \operatorname{StartXDom}\,(X_1) \\
& \operatorname{dXDom}\,((X_3 - X_1)/(4 \cdot (Y_3 - Y_1))) \\
& \operatorname{StartXSub}\,(X_1) \\
& \operatorname{dXSub}\,((X_2 - X_1)/(4 \cdot (Y_2 - Y_1))) \\
& \operatorname{StartY}\,(Y_1) \\
& \operatorname{dY}\,(-1.0/4.0) \\
& \operatorname{Count}\,((Y_1 - Y_2) \cdot 4) \\
& \operatorname{Render} \\
& \operatorname{StartXSub}\,(X_2) \\
& \operatorname{dXSub}\,((X_3 - X_2)/(4 \cdot (Y_3 - Y_2))) \\
& \operatorname{ContinueNewSub}\,((Y_2 - Y_3) \cdot 4) \quad \text{// Bottom half} \\
& \operatorname{FlushSpan}\,()
\end{aligned}$$

*Note:* The DDA units need to have their sample point biased from the center of the pixel to the lower edge of the pixel so the DDA units can be tracked properly with the walk messages. This can be done by calculating the start values for integer Y values rather than at Y+0.5 as would normally be done.

The sub pixel correction is only needed if color, depth, fog or texture interpolation is being used.

#### 2.1.7 Span Operations

Many 2D rendering operations can be implemented more efficiently using span operations, enabled with the *FastFillEnable* bit in **Render** and **Render2D**. For both 2D textures and rasterizer bit mask operations the improvement can be from about 40 Mpixels/s to 400 Mpixels/s.

Permedia4's span filling implementation can be used for image upload, image download, filling with constant color, filling with a pattern, characters (i.e. bit masks), copies, and copies with logical ops. Any trapezoid can be used and the scanning direction can be left-to-right or right-to-left.
Benefits of span fill for 2D operations include:

- Better utilization of SGRAM block fill (where memory devices permit) for solid, stippled and patterned fills and character bitmaps.
- The span mechanism is independent of pixel size and makes maximum use of framebuffer bandwidth for 8, 16 and 32 bit pixels.
- Multiple pixels are processed in parallel.
- No alignment restrictions: any span operation may be performed to any pixel alignment for all pixel sizes.
- Page break overheads are spread over many more read/write operations during a BitBlt operation, so the performance of BitBlts is much closer to peak memory bandwidth.
- Both window- and screen-relative operations are supported.
- Scissor clipping can be used in conjunction with span operations.

If any reads are enabled, span operations are converted into a series of normal memory reads. The memory data returned is aligned and sent on in 64 bit words for further processing.

*Note:* This is different from earlier chips, where the memory interface was responsible for decoding the span mask and returning the appropriate aligned data.

Span reads are only supported when the pixel data is laid out in the Linear or Patch64 formats. 32\_2 and patch\_2 formats do not support spans (but packed support is available for non-span rasterization; see *Packed8Pixels* and *Packed16Pixels* in the *Permedia4 Reference Guide*).

If source and destination reads are enabled then the source data is read first and stored in the scratch pad RAM. Then the destination data is read, packed into 64 bit words and sent on. After each destination word is sent the corresponding source word is read from the scratch pad RAM and sent on.

The destination buffers are read in increasing numerical order. Page breaks are kept to a minimum by reading all the data from a buffer for a span before moving on to the next buffer (for the same span mask).
The span operation does have some restrictions:

- Stencil and Depth tests are not available. These units just ignore the commands associated with fast span fills.
- Gouraud shading, alpha tests, alpha blend and dither operations are not available.
- If GIDs are being used for window clipping then spans cannot be used at full speed as they normally ignore GID information and write to all pixels in the span. However, the result still writes 4 pixels per cycle.

When span operation is enabled the rasterizer divides the pixels between the left- and right-hand edges of the polygon or rectangle into a succession of spans, each 64 pixels wide. Each span is described by a 64 bit wide span mask and each pixel in the span has a corresponding bit in the span mask. If a bit in the span mask is set, then the corresponding pixel will be read and/or written. The least significant bit in the span mask (bit 0) corresponds to the leftmost pixel on the screen for the span.

The span mask does not have any fixed alignment with the pixels stored in the framebuffer, i.e. the first pixel in the span may correspond to any pixel in the framebuffer. Any masking or shifting to align the span data being read or written to the 64 bit framebuffer architecture is performed automatically.

Span filling may be performed left-to-right or right-to-left, but the pixels within an individual span are always read and/or written left to right. Hence if bitmask or image download data is provided, the data within individual spans must be ordered left to right. Normally, if any data is provided, span filling should be left-to-right. The use of spans for image handling is shown later (Bitmaps, Spans and Images).

Spans operate in both the LB and FB functional groups. In the Localbuffer the data written is constant for the span and is held in the **LBClearDataU** and **LBClearDataL** registers which together provide 40 bits of data. This is replicated automatically to the four pixels in a memory word.
For Packed16 mode, where there are 8 pixels in a memory word, software must replicate the 16 bits of clear data into the 32 bit **LBClearDataL** register. The **LBClearData** registers hold the depth, stencil and GID data in the format it is in the local buffer, i.e. no formatting is done on the clear data before it is written. The byte enables (in **LBWriteMode**) can be used to protect bytes from being updated. If, however, the field to clear is not byte aligned or is not a whole number of bytes wide (e.g. a 3 bit stencil field), then clearing this field while leaving the others intact can only be done via a read-modify-write operation, which runs at one quarter the speed.

#### 2.1.7.1 Mode changes in Span Operations

Permedia4 supports major mode changes during native display list operations. This is described in greater detail in the *Framebuffer* chapter. However, to ensure that the effect of mode changes during display lists can be software controlled, new registers (**FBDestReadEnables**, **FBSourceReadEnables**) are set up to provide monitoring and readback for software-specified modes, e.g. AlphaBlend or LogicalOps. The Boolean equation for a span read in buffer n is:

    destRead = (mode.ReadEnable & mode.Enable[n] & ~mode.UseReadEnables) |
               (mode.ReadEnable & mode.Enable[n] & mode.UseReadEnables &
                (E4 & R4 | E5 & R5 | E6 & R6 | E7 & R7))

where "mode" is shorthand for **FBDestReadMode** and E\* and R\* are taken from **FBDestReadEnables**. The logical operations versions of the registers (**FBDestReadEnablesAnd** and **FBDestReadEnablesOr**) can be used to change individual bits.

#### 2.1.7.2 Alpha Filtering

One use of the mode monitoring feature is an alpha filtering enhancement. In many cases when doing alpha blending the blend mode is set such that if the fragment's alpha is a specific value (typically 0 or 255) then the framebuffer color (from a destination read) is effectively ignored as it doesn't contribute to the final alpha blended color.
In this case there is no point in reading the destination pixel value and we can save memory bandwidth by avoiding it. Alpha filtering is enabled by the *AlphaFiltering* bit in **FBDestReadMode** and the reference alpha value to compare against can be found in **FBDestReadEnables**.

#### 2.1.7.3 Span Mask Processing

Span fills are enabled by setting the *FastFillEnable* bit in the **Render** command. When the *SpanOperation* bit is clear, writes use the constant color previously loaded into the **FBBlockColor** register. When this bit is set, write data is variable and is either provided by the host (i.e. *SyncOnHostData* is set) or is read from the framebuffer. All other trapezoid parameters are the same. The span mask can also be used to grow the extent region or perform picking as part of HostOut statistics gathering.

The span mask undergoes several processing steps before it is used by the Framebuffer Unit to determine which pixels to read and/or write:

- The Rasterizer generates the mask using the left and right hand edge information. Note that the edges may be vertical or sloped.
- If *SyncOnBitMask* is enabled in the **Render** command, then the span mask is ANDed with the bit mask data provided by the host. If no bit mask data is present the Rasterizer waits for it to arrive before proceeding.
- The bit mask data may be optionally inverted, byte swapped, word swapped or mirrored (in any combination) before the ANDing is performed. The inversion can be used to enable drawing of the background bits. The byte and word swapping allows bit mask data from different endian hosts to be accommodated. The mirror operation swaps bits 0 and 31, bits 1 and 30, etc., which changes the leftmost pixel in a span from being controlled by the least significant bit to the most significant bit in the bit mask.
- If Screen Scissor testing is enabled, then pixels falling outside the left and right edges of the screen scissor region have their corresponding bits in the span mask cleared.
- If the User Scissor test is enabled, then pixels falling outside the left and right edges of the user scissor region have their corresponding bits in the span mask cleared.
- If Area Stippling is enabled, then the stipple mask is extracted from the area stipple table for the appropriate scan line and expanded, if necessary, to 32 bits by replication. The normal offset, select and mirror controls in X and in Y may be used as for non-span rendering. The stipple mask is ANDed with the span mask.
- If Texture Mapping is enabled, then a texel is read from the texture logical or physical address under the control of the **HostTextureAddress**, **TextureOperation**, **LogicalTexturePage**, **TextureReadMode** and the S, T and Q DDA parameters. If the texel is to be used as a bit mask, then any specified texel formatting is performed and the final 64 bit texel value is optionally inverted, byte swapped and mirrored before being ANDed with the span mask.
- The span mask is now used to read/write the framestore pixel data.

#### **2.1.7.4 Block Write**

The *FastFillIncrement* and *BlockWidth* parameters in **Render** and **FBWriteMode** are no longer required or supported. For more information on block write see Volume I, section 4.2.4, Pixels and Spans.

#### 2.1.8 Pixel Sizes

The local buffer holds up to four fields of information: Depth, Stencil, GID and fast clear planes (FCP). FCPs are not implemented in Permedia4 and the bitfields are reserved for historical reasons. Permedia4 takes note of pixel depth as Permedia2 did, but also allows pixel sizing on a unit-by-unit basis, which can be desirable for texturing.
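Returning to the span mask processing steps listed in 2.1.7.3, the following C fragment is a minimal host-side model of how the mask narrows from stage to stage. It is a sketch under stated assumptions, not a description of the hardware: the function names are hypothetical, the right-hand edge is treated as exclusive, the host bit mask is taken as a single 64 bit value, one scissor rectangle stands in for both the screen and user scissor, and the byte swap, mirror and texture stages are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define SPAN_WIDTH 64

/* Mask with a bit set for every pixel x in [left, right) that falls inside
 * the span starting at span_x. Bit 0 is the leftmost pixel of the span.
 * The exclusive right edge is an assumption of this sketch. */
uint64_t edge_mask(int span_x, int left, int right)
{
    uint64_t mask = 0;
    for (int bit = 0; bit < SPAN_WIDTH; ++bit) {
        int x = span_x + bit;
        if (x >= left && x < right)
            mask |= (uint64_t)1 << bit;
    }
    return mask;
}

/* Hypothetical model of the later stages, in the order listed in the text:
 * host bit mask (optionally inverted), scissor, then area stipple. */
uint64_t process_span_mask(int span_x, int left, int right,
                           uint64_t host_bits, int invert_host,
                           int scissor_left, int scissor_right,
                           uint64_t stipple_bits)
{
    uint64_t mask = edge_mask(span_x, left, right);
    assert(left <= right && scissor_left <= scissor_right);
    if (invert_host)
        host_bits = ~host_bits;          /* draw the background bits */
    mask &= host_bits;                   /* SyncOnBitMask data */
    mask &= edge_mask(span_x, scissor_left, scissor_right);
    mask &= stipple_bits;                /* area stipple */
    return mask;
}
```

Because every stage only clears bits, the order of the AND operations does not change the final mask; the order shown simply follows the list.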
When using span operations it is important to maximize the number of pixels per 32 or 64 bits processed. The Rasterizer unit **PixelSize** register can have the following values on either a global or unit-tailored basis:

- Depth: 15, 16, 24 and 31.
- Stencil: 0, 1, 2, 3, 4, 5, 6, 7 and 8.
- GID: 0, 1, 2, 3 and 4.

The depth plane always starts at bit 0. The Stencil and GID fields can start on any bit position from 16 to 39 inclusive. It is the user's responsibility to ensure that they don't overlap or reference bits outside the pixel width. Selecting a depth width of 15 bits forces the stencil and GID fields to be set from bit 15 of the pixel and ignores the normal stencil and GID settings.

If the specified width of a field is less than its internal width then the field is zero extended at the Least Significant end to its internal width.

Since **PixelSize** is a core register it can be modified at any time without affecting in-progress rendering. It is not necessary to synchronize with the chip before changing pixel depth. Pixel size is also definable in the **DMARectangleRead** and **DMARectangleWrite** registers.

#### 2.1.8.1 Sub Pixel Precision

The rasterizer has 16 bits of fraction precision and the screen width used is typically less than 2<sup>16</sup> wide, so a number of bits (called subpixel precision bits) are available. Consider a screen width of 4096 pixels. This figure gives a subpixel precision of 4 bits (4096 = 2<sup>12</sup>, leaving 16 - 12 = 4 bits). The extra bits are required for a number of reasons:

- antialiasing (where vertex start positions can be supplied to subpixel precision)
- when using an accumulation buffer (where scans are rendered multiple times with jittered input vertices)
- for correct interpolation of parameters to give high quality shading as described below

## 2.1.9 Bitmaps, Spans and Images

The Permedia4 is not software-compatible with earlier Permedia2 or GLINT MX chips.
Specific changes affecting bitmaps, spans and images include separate control of source and destination FB and LB reads using new registers, automatic span read alignment, pattern RAM data held in localbuffer, and texture units that now generate source offsets but not addresses.

#### 2.1.9.1 Bitmaps

A Bitmap primitive is a trapezoid or line of ones and zeros which controls which fragments are generated by the rasterizer. The bitmap operates on any fragments produced by the rasterizer, including spans and characters. Bitmaps may be implemented as Rasterizer bitmasks or 2D Textures, with or without span fill enabled. Span Fills are described in the next section. Span fills are generally an order of magnitude faster but do not normally support LB test functions (Depth, GID, Stencil) or Alpha Test, Logical Ops, Texturing or Dither. (But see Volume I, Section 1.1.6, GID Field, for GID testing of LB spans.)

Bitmaps are controlled using the **BitMaskPattern** register and parameters enabled in the **RasterizerMode** command: *ByteSwapBitMask*, *MirrorBitMask*, *InvertBitMask*, *BitMaskPacking* and *BitMaskOffset*. In addition to its raw data, each bitmap is characterised by its origin coordinates (bottom left or top left), width and height.

When *SyncOnBitMask* is enabled in the **Render** command, only fragments where the corresponding Bitmap bit is set are submitted for drawing. The normal use for this is in drawing characters, although the mechanism is available for all primitives. Unless otherwise formatted, Bitmap data is by default packed contiguously into 32 bit words so that rows are packed adjacent to each other. Bits in the mask word are by default used from the least significant end towards the most significant end and are applied to pixels in the order they are generated in. The relationship between bits in the mask and the scanning order is shown in Figure 2-4.
The rasterizer scans through the bits in each word of the Bitmap data and increments the X,Y coordinates to trace out the rectangle of the given width and height. By default, any set bits (1) in the Bitmap cause a fragment to be generated, any reset bits (0) cause the fragment to be rejected.

Figure 2-4 Relationship between Bitmask and Scanning Directions

The selection of bits from the **BitMaskPattern** register can be mirrored, that is, the pattern is traversed from MSB to LSB rather than LSB to MSB. Permedia4 allows the pattern to be byte swapped on download. This is useful for downloading Windows/NT bitmaps in their native format. Also, the sense of the test can be reversed such that a set bit causes a fragment to be rejected and vice versa. This control is found in the **RasterizerMode** register, described in § 2.2.

When one Bitmap word has been exhausted but there are still pixels remaining in the rectangle, rasterization is suspended until the next write to the **BitMaskPattern** register. Any unused bits in the last Bitmap word are discarded.

For example a 5 pixel wide, 8 pixel high bitmap requires a register set up as follows:

```
// Set the rasterizer mode to the default
RasterizerMode (0)

// Set up the start values and the deltas.
// Note that the X and Y coordinates are converted
// to 16.16 format
StartXDom (X<<16)
dXDom (0)
StartXSub ((X + 5) << 16)  // Right hand edge pixels
                           // get missed off.
StartY (Y<<16)
dY (1<<16)
Count (8)

// At least the following bits require setting for
// the Render command.
render.PrimitiveType = PERMEDIA4_TRAPEZOID_PRIMITIVE
render.SyncOnBitMask = PERMEDIA4_TRUE

// Issue render command. First fragment will be
// generated on receipt of the BitMaskPattern
Render (render)

// 5 x 8 pixel bitmap requires 40 bits, and so 2
// 32 bit words.
BitMaskPattern (patternWord0)
BitMaskPattern (patternWord1)
```

Rendering starts as soon as the first patternWord is loaded into the **BitMaskPattern** register.
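The default packing described above (rows packed contiguously, bits consumed from the least significant end) can be sketched on the host side as follows. This is an illustrative sketch only: `pack_bitmap`, its one-byte-per-pixel input layout and the output buffer are hypothetical names chosen for the example, and the input is assumed to be in the order the rasterizer generates fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs a width x height bitmap, supplied as one byte
 * per pixel (non-zero = set) in scan order, into 32 bit mask words.
 * Rows are packed contiguously and bits are filled from bit 0 upwards.
 * Returns the number of words written, or 0 if out_words is too small. */
size_t pack_bitmap(const uint8_t *pixels, size_t width, size_t height,
                   uint32_t *out, size_t out_words)
{
    size_t total_bits = width * height;
    size_t words = (total_bits + 31) / 32;   /* round up to whole words */
    assert(pixels != NULL && out != NULL);
    if (words > out_words)
        return 0;
    for (size_t w = 0; w < words; ++w)
        out[w] = 0;                           /* unused tail bits stay 0 */
    for (size_t i = 0; i < total_bits; ++i) {
        if (pixels[i])
            out[i / 32] |= (uint32_t)1 << (i % 32);
    }
    return words;
}
```

For the 5 pixel wide, 8 pixel high bitmap above this yields two words, which would then be written in order to **BitMaskPattern**; the upper 24 bits of the second word are unused.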
Permedia4 provides the ability to start a scanline at an arbitrary offset into the first bitmask that is downloaded for each scanline, and to discard unused bits at the end of a scanline. This lets the host download data directly from a host bitmap without having to shift and pack the bits. This functionality is controlled by the *BitMaskPacking* and the five *BitMaskOffset* bits in the **RasterizerMode** register.

#### 2.1.9.2 Bitmaps with Spans

The fastest way to render downloaded bitmap data is to use a span operation (described in §2.1.7, Span Operations, above). The rasterizer is set up as normal and the *FastFillEnable* bit in the **Render** command is enabled. The *SpanOperation* bit determines whether the span writes use constant color data or variable color data. All other trapezoid parameters are the same.

A span is always 64 pixels long and any combination of pixels within the span can be read and/or written. Pixels with a width of 8 or 16 bits are processed 8 or 4 pixels at a time respectively and all read and write alignment is handled in hardware. The span mechanism can be used for image upload, image download, filling with constant color, filling with a pattern, characters (i.e. bit masks), copies and copies with logical ops. Any trapezoid can be used and the scanning direction can be left-to-right or right-to-left<sup>3</sup>.

If the span is being written with a constant color value<sup>4</sup> and the SGRAM supports block fills (where a number of pixels can be written simultaneously) then span filling automatically uses this mode of operation to give a very much faster filling rate. The Memory Controller takes care of mapping this logical configuration on to the actual SGRAM configuration, where the SGRAM chips may have fewer pixels in a block, and the framebuffer may be interleaved and/or hold packed pixels. When the bitmap data is downloaded it is ANDed with the span mask generated by the rasterizer. The resulting mask is passed through the core to be used as the block fill mask.
Thus a single memory access can be used to process up to 32 pixels.

<sup>3</sup> The pixels within a span are always read and/or written in a left to right order, so if the host is providing any bitmask or image download data then it needs to take this into account. The simplest thing is for the host to always scan left to right when supplying data.

<sup>4</sup> This is not strictly true as the framebuffer may be in packed pixel format, so adjacent pixels within a 32/64 bit word could have different colors.

Since the downloaded bitmask data will be ANDed with masks generated by the Rasterizer without any re-alignment being performed, the host software must ensure that the masks match up. This can be achieved in either of two ways:

1. The host software can align the bits that it downloads to match the alignment of the Rasterizer.
2. Use the User Scissor (generally faster and recommended).

Note: this is a general algorithm. In the special case where the data to be downloaded is already aligned to 32 bits on both the left and right edges the scissor need not be used.

For example, suppose we want to download data to fill a rectangle with left edge at 10 and right edge at 200. Assume that the host bitmap data is to be loaded from an offset of 35 within the bitmap. Our goal is to match the bit at offset 35 with the pixel at offset 10. Since we want to avoid shifting the data and incurring a host processing overhead, we download the host bitmap data from the previous 32-bit boundary. This means that we must set Permedia4 up to discard the first 3 bits of data. We achieve this by rasterizing a rectangle whose left edge is 3 pixels less than that required; in this case we would rasterize the left edge to start at pixel 7. This aligns the source bitmap data with the mask data produced by the rasterizer.
But, in order to protect the 3 pixels that we would otherwise overwrite, we use the scissor clip and set its bounds to be those of the original rectangle. When using a span operation like this the rasterizer waits for new bitmask data to be downloaded at the start of each scanline, so we do not have to perform the alignment operation on the right hand edge. The following gives the outline for this algorithm:

```
leftalign = bitmapxleft & 31
width = Xright - Xleft + leftalign

StartXDom ((Xleft - leftalign)<<16)
dXDom (0)
StartXSub (Xright<<16)
StartY (Y<<16)
dY (1<<16)
Count (height)

// protect the edge pixels with the scissor
minXY.X = Xleft
minXY.Y = Y
maxXY.X = Xright
maxXY.Y = Y + height
ScissorMinXY (minXY)    // Load the registers
ScissorMaxXY (maxXY)

// Enable the unit
scissorMode.UserScissorEnable = PERMEDIA4_ENABLE
scissorMode.ScreenScissorEnable = PERMEDIA4_ENABLE

// At least the following bits require setting for
// the Render command.
render.PrimitiveType = PERMEDIA4_TRAPEZOID_PRIMITIVE
render.SyncOnBitMask = PERMEDIA4_TRUE
render.FastFillEnable = PERMEDIA4_TRUE

// Issue render command. First fragment will be
// generated on receipt of the BitMaskPattern.
Render (render)

// download the bits from the source bitmap 32 bits
// at a time aligning the bitmap pointer at the
// start of each scanline.
// BitmapBase and pulBitmap are byte addresses and
// bitmapwidth is the scanline pitch in bytes.
BitmapBase += bitmapyorg * bitmapwidth
bitmapxleft &= ~31
for (h = 0; h < height; ++h) {
    pulBitmap = BitmapBase + bitmapxleft/8
    for (c = 0; c < width; c += 32) {
        BitMaskPattern (*(ULONG *)pulBitmap)
        pulBitmap += sizeof(ULONG)
    }
    BitmapBase += bitmapwidth
}
```

## 2.1.9.3 Glyphs

A byte stream of glyph data (packed four to a word) can be downloaded and automatically chopped up and padded to the necessary width for the texture units to use as a bitmap. For example a glyph with a width between 17 and 24 pixels will be sent down as a stream of bytes and each triplet of bytes will be padded with zero and sent to be written into memory.
If the input words have their bytes labelled:

- First word: DCBA (A is the least significant byte)
- Second word: HGFE

then the output words sent on to the rasterizer are:

- First word: 0CBA
- Second word: 0FED

#### 2.1.9.4 Image Copy/Upload/Download

Permedia4 supports three "pixel rectangle" operations: Copy, Upload and Download. Image operations involve rectangular regions with pixel coordinates rather than the usual 3D coordinates. The image regions can be moved among host memory and any Permedia buffer(s).

#### 2.1.9.5 Copy

Image Copy moves raw blocks of data around buffers. To zoom or re-format data, external software must upload the data, process it and return it. To copy a rectangular area the rasterizer would be configured to render the destination rectangle, thus generating fragments for the area to be copied.

Note: When the source and destination rectangles overlap, the host must choose the scanning direction so that the overlapping area is not overwritten before it has been moved. This may be done by swapping the values written to **StartXDom** and **StartXSub**, or by changing the sign of **dY** and setting **StartY** to be the opposite side of the rectangle.

Localbuffer copy operations are tested for pixel ownership (GID). Note that this implies two reads of the localbuffer, one to collect the source data, and one to get the destination GID for the pixel ownership test.

## 2.1.9.6 Upload/Download

The host places a pixel image in a window-relative rectangle, in any buffer (depth, stencil or color) using the Rasterizer. The host could control the process directly, but the rasterizer also manages clipping, fragment processing and window coordinate tracking.
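For the overlapping case noted under Copy (2.1.9.5), the choice of scan direction follows the same rule as an overlapping memory move. The fragment below is a minimal sketch of that decision, assuming only that the source and destination rectangles have the same size and are identified by corresponding corners; `CopyDirection` and `choose_copy_direction` are hypothetical names, not part of any Permedia4 interface.

```c
/* Scan directions for a rectangle copy, as signs of the per-step increment. */
typedef struct {
    int x_step;   /* +1: left-to-right, -1: right-to-left */
    int y_step;   /* +1: increasing Y,  -1: decreasing Y  */
} CopyDirection;

/* Hypothetical helper: picks directions so that an overlapping source area
 * is read before the copy overwrites it. */
CopyDirection choose_copy_direction(int src_x, int src_y, int dst_x, int dst_y)
{
    CopyDirection dir = { +1, +1 };
    if (dst_y > src_y)
        dir.y_step = -1;   /* destination at higher Y: start at the high-Y end */
    else if (dst_y == src_y && dst_x > src_x)
        dir.x_step = -1;   /* same scanlines, shifted right: scan right-to-left */
    return dir;
}
```

A negative `y_step` would correspond to changing the sign of **dY** and starting **StartY** at the opposite side of the rectangle; a negative `x_step` would correspond to swapping the values written to **StartXDom** and **StartXSub**.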
During download, for example, the rasterizer scans the image so the host does not need to provide X,Y coordinates, waits for a depth, stencil or color command from the host, then processes the next pixel. In other words, the process is synchronous with host processing. To maintain synchronization enable the *SyncOnHostData* bit in the **Render** command.

The image download rectangle looks like this and the origin is assumed to be bottom left. The host provides the data in top to bottom, left to right order. Color data will be provided. There are *n* pixels in the rectangle.

| Render Data Field | | | | | |
|---|---|---|---|---|---|
| AreaStippleEnable | 0 | LineStippleEnable | 0 | PrimitiveType | 1 |
| FastFillEnable | 0 | FastFillIncrement | X | UsePointTable | 0 |
| AntialiaseEnable | 0 | AntialiasingQuality | X | ResetLineStipple | X |
| SyncOnBitMask | | SyncOnHostData | 1 | TextureEnable | |
| FogEnable | | CoverageEnable | | SubPixelCorrectionEnable | 0 |

In OpenGL the AreaStippleEnable would always be 0, but in X it may be enabled or disabled.

    StartXDom (X1)
    dXDom (0)
    StartXSub (X2)
    dXSub (0)
    StartY (Y2)
    dY (-1.0)
    Count (Y2 - Y1 + 1)    // Height of image
    Render
    Color (P0)    // Pixel 0
    Color (P1)
    Color (P2)
    ...
    Color (Pn)

Note: the rasterizer overscans the rectangle because the right hand edge is not plotted and the downloaded image doesn't include these pixels.

Any functions which can generate fragment values, the color DDA for example, should generally be disabled for any copy, upload or download operations.

Warning: During image upload, all the returned fragments must be read from the Host Out FIFO, otherwise the Permedia4 pipeline will stall.
In addition it is strongly recommended that any units which can discard fragments (for instance the following tests: bitmask, alpha, user scissor, screen scissor, stipple, pixel ownership, depth, stencil) are disabled; otherwise a shortfall in pixels returned may occur, also leading to deadlock.

Bit mask processing can be used in conjunction with image operations to allow arbitrary stipples, for example. Use the **BitMaskPattern** command to load the bit mask. Unlike conventional bit mask functionality, during image loading the **BitMaskPattern** command must be interleaved accurately with the image data to ensure that the new mask is available as soon as the old mask is consumed. Pixels arriving without mask bits are considered passive until the new mask arrives.

If the host fails to supply a required color, depth or stencil tag the chip waits until one arrives, or (to avoid unnecessary hangs) terminates the image operation when any tag other than color, depth, stencil, FBData or BitMaskPattern is received.

During image uploads the host can read back a window-relative rectangle from any buffer. The buffer read must be set up using the **FBSourceReadAddress**, Offset and Operations registers.

#### 2.1.9.7 Image Copy/Upload/Download with Spans

2D image operations to and from the framebuffer can be optimized by using a span operation. The benefits are greatest at lower pixel depths since packed pixel data is transferred through the core.

#### Copy

Using span operations when copying pixel data within the framebuffer is straightforward. Simply set the *FastFillEnable* and *SpanOperation* bits in the **Render** command.

*Note:* This works both with and without logical op processing.

#### Download

Download facilities ("Write Pixels") allow the host to transfer image data to local memory. The rasterizer supports this function by scan converting the rectangle (so the host doesn't need to generate X, Y coordinates).
The rasterizer is constrained by the *SyncOnHostData* bit in the **Render** command to wait for Depth, Stencil or Color data from the host (in the **Depth**, **Stencil** or **Color** registers) before moving on to the next pixel. In other words it runs synchronously to the host for the duration of this primitive.

The bit mask mode can also be enabled during this function so arbitrary stippling can be done on the image being downloaded (useful in X). The bit mask register is loaded whenever the **BitMaskPattern** register is received. This is slightly different<sup>5</sup> to the way it works when the rasterizer is not in Image download mode. The **BitMaskPattern** data must be interleaved correctly with the image data to ensure the new mask is available immediately after the last bit in the current mask has been used. If this sequence is not correct then all subsequent fragments until the new mask is received will be passive.

There is the potential for the host to send too few **Color** (**Depth**, **Stencil** or **FBData**) messages for the size of primitive it has defined. Rather than have Permedia4 hang because it is waiting for messages which will never arrive, any message other than **Color**, **Depth**, **Stencil**, **FBData** or **BitMaskPattern** stops primitive generation.

<sup>5</sup> This change is necessary to prevent a deadlock situation arising if too many **Color** messages (for example) are sent before the next **BitMask** message is due.

The *SyncOnHostData* functionality is in fact available for any primitive, although it is usually used in conjunction with downloads.

Image downloads are also supported by DMA; see **DMARectangleRead** in the *Permedia4 Reference Guide*.

#### Upload

Image upload ("ReadPixels") provides the host with a method of reading back a window-relative rectangular region of any of the buffers (depth, stencil, color).
The rasterizer supports this function by scan converting the rectangle and sending the active walk messages. The Local Buffer Read Unit or the Framebuffer Read Unit will have already been set up to do the read and generate the appropriate **LBDepth**, **LBStencil** or **FBColor** message, which will be collected by the Host Out Unit and passed back to the host.

Upload can also be run via DMA using the **DMARectangleWrite** command. The image data may be a sub image of a larger image and have any natural alignment or pixel size. Information regarding the rectangle transfer is held in registers loaded from the input FIFO or a DMA buffer.

*Note: failure to supply an EOF may have unpredictable results.*

The pixel data written to host memory is always packed; however, when read from the Host Out FIFO it can be in packed or unpacked format (packed when Reset). It can also, optionally, be aligned on 64 byte boundaries. The minimum number of PCI writes are used to align and pack the image data.

Permedia4 is set up to rasterize the source area for the pixel data (depth, stencil, color, etc.) enabled in the **Render** command. This is done before the Rectangular DMA is started.

### 2.2 Rasterizer Mode

The **RasterizerMode** register sets long-term modes, particularly these:

- MirrorBitMask: This is a single bit flag which specifies the direction in which bits are checked in the **BitMaskPattern** register. If the bit is reset, the direction is from least significant to most significant (bit 0 to bit 31); if the bit is set, it is from most significant to least significant (bit 31 to bit 0). A *ByteSwapBitMask* value of 3 is very useful in conjunction with the MirrorBitMask bit for handling Microsoft Windows bitmaps since this causes a complete byte swap of the downloaded data.
- InvertBitMask: This is a single bit which controls the sense of the accept/reject test when using a Bitmask.
If the bit is reset then when the BitMask bit is set the fragment is accepted and when it is reset the fragment is rejected. When the bit is set the sense of the test is reversed.
- BitMaskPacking: This is a single bit which controls the packing of bits which are downloaded as part of a SyncOnBitMask operation. If this bit is reset then any spare bits at the end of a scanline are used to start the next scanline. If this bit is set then extra bits at the end of a scanline are discarded. This is not available for use with span fills.
- BitMaskOffset: This is a 5 bit field which specifies the first bit to be used in the first bitmask word of every scanline downloaded as part of a SyncOnBitMask operation. This is not available for use with span fills.
- Fraction Adjust: These 2 bits control the action taken by the rasterizer on receiving a ContinueNewLine command. As Permedia4 uses a DDA algorithm to render lines, an error accumulates in the DDA value. Permedia4 provides for greater control of the error by:
  1. leaving the DDA running, which means errors will be propagated along a line, or
  2. setting the fraction bits to either zero, a half or almost a half (0x7FFF).
- Bias Coordinates: This is a 2 bit field with the following actions:
  - 0: Add 0 to the coordinates (effectively do nothing)
  - 1: Add exactly one half to the coordinates
  - 2: Add nearly one half (0x7FFF) to the coordinates
- Host Data Byte Swap Mode: The data downloaded by the host when using SyncOnHostData can have its bytes re-ordered. If the downloaded data has a byte ordering of ABCD then this 2 bit field specifies re-ordering as follows:
  - 0: ABCD (no swap)
  - 1: BADC (swap within halfwords)
  - 2: CDAB (halfword swap)
  - 3: DCBA (full byte swap)
- Y Limits Clipping: When set, this bit enables Y Limits clipping. When reset, Y Limits clipping is disabled. This is described in the next section.
- Multi Rasterizer: If set this bit causes the rasterizer to work in multi-Rasterizer mode.
If reset the rasterizer works in single Rasterizer mode. #### 2.2.1.1 Y Limits Clipping The rasterizer normally rasterizes all pixels on every scanline, generating a fragment per pixel. If large numbers of scanlines are subsequently clipped out by, for example, one of the scissor units, then a lot of time can be wasted. The **Ylimits** register has been added to provide a way of quickly eliminating whole scanlines for a given primitive. This is effectively a Y scissor clip in the Rasterizer. If Y limits testing has been enabled in the **RaserizerMode** register, and if a scanline being rasterized falls outside the Y limits bounds, then the rasterizer will move directly onto the next scanline without rasterizing in X. Y Limits clipping is automatically disabled when SyncOnHostData or SyncOnBitMask is used. # 2.2.2 Rasterizer Unit Registers Real coordinates with fractional parts are provided to the rasterizer in 2's complement 16 bit integer, 16 bit fraction format, as illustrated below for a typical register in this unit: | Name Type ContinueNewDom Rasterizer | | zer | Offset<br>0x8048 | | | | |-------------------------------------|-----------|-------|------------------|--------------------------------------|-------------------------|--| | | | Comma | nand | | | | | Bits | Name | Read | Write | Reset | Description | | | 015 | Scanlines | ✓ | ✓ | X | 16 bit unsigned integer | | | 1631 | Reserved | 0 | 0 | x Reserved for future use, mask to 0 | | | Table 1.1 Typical register description – ContinueNewDom #### 2.2.2.1 Command registers The following table lists the command registers which control the rasterizer unit. The control registers are shown separately below. 
| Register Name | Data Field | Description |
|-------------------|----------------|-------------|
| Render | Bitfield | Starts the rasterization process. |
| ContinueNewDom | 16 bit integer | Allows the rasterization to continue with a new dominant edge. The dominant edge DDA in the rasterizer is reloaded with the new parameters. The subordinate edge is carried on from the previous trapezoid. This allows any convex polygon to be broken down into a collection of trapezoids, with continuity maintained across boundaries. Note: other DDAs are not reloaded with new start values until the next Render command. Thus it is not possible to use this command, for example, to Gouraud shade a triangle from left to right which has a knee on the left hand side. To avoid this, 3D rendering should always start from the side without the knee. The data field holds the number of scanlines (or sub scanlines) to fill. This count is not loaded into the *Count* register. |
| ContinueNewSub | 16 bit integer | Allows the rasterization to continue with a new subordinate edge. The subordinate DDA is reloaded with the new parameters. The dominant edge is carried on from the previous trapezoid. This is useful when scan converting triangles with a 'knee' (i.e. two subordinate edges). The data field holds the number of scanlines (or sub scanlines) to fill. This count is not loaded into the *Count* register. |
| Continue | 16 bit integer | Allows the rasterization to continue after new delta value(s) have been loaded, but does not cause either of the trapezoid's edge DDAs to be reloaded. The data field holds the number of scanlines (or sub scanlines) to fill. This count is not loaded into the *Count* register. |
| ContinueNewLine | 16 bit integer | Allows rasterization to continue for the next segment in a polyline. The XY position is carried on from the previous line, but the fraction bits in the DDAs can be kept, set to zero, set to half, or set to nearly one half, under control of the RasterizerMode register. The data field holds the number of pixels or subpixels in a line. This count is not loaded into the *Count* register. The use of ContinueNewLine is not recommended in OpenGL because, for the second and subsequent segments, the DDA units will start with a slight error compared with the value they would otherwise have been loaded with. |
| FlushSpan | Not used | Used when antialiasing to force the last span out when not all sub spans may be defined. |
| PixelSize | 0 = 32 bits<br>1 = 16 bits<br>2 = 8 bits | Configures the Rasterizer (and other core units) with the size of pixel to process when spans are used. It also informs the Framebuffer Interface Unit, but in this case all reads and writes are affected and not just spans. This replaces the pixel size field in the PCI FBModeSel register and works the same way for single pixel reads and writes (i.e. the framebuffer can be set to 32 bit pixels even though it is displaying 8 bit pixels, to process 4 pixels at a time). |
| WaitForCompletion | Not used | Suspends the core until all outstanding reads and writes in both the localbuffer and framebuffer memory units have completed. This is intended to prevent a new primitive from starting to be rasterized before the previous primitive is completely finished. It would be used, for example, to separate texture downloads from the surrounding primitives. The same functionality can be achieved using the Sync register and waiting for it in the Host Out FIFO; however, this method doesn't involve the host and can be inserted into a DMA buffer. |

**Table 2.1 Command Register Descriptions**

| Register Name | Data Field | Description |
|--------------------------------------------------|--------------------------|-------------|
| RasterizerMode | See below | Defines the long term mode of operation of the rasterizer. |
| StartXDom | Fixed point 16.16 format | Initial X value for the dominant edge in trapezoid filling, or initial X value in line drawing. |
| dXDom | Fixed point 16.16 format | Value added when moving from one scanline (or sub scanline) to the next for the dominant edge in trapezoid filling. Also holds the change in X when plotting lines, so for Y major lines this will be some fraction (dx/dy); otherwise it is normally ±1.0, depending on the required scanning direction. |
| StartXSub | Fixed point 16.16 format | Initial X value for the subordinate edge. |
| dXSub | Fixed point 16.16 format | Value added when moving from one scanline (or sub scanline) to the next for the subordinate edge in trapezoid filling. |
| StartY | Fixed point 16.16 format | Initial scanline (or sub scanline) in trapezoid filling, or initial Y position for line drawing. |
| dY | Fixed point 16.16 format | Value added to Y to move from one scanline to the next. For X major lines this will be some fraction (dy/dx); otherwise it is normally ±1.0, depending on the required scanning direction. |
| Count | 16 bit integer | Number of pixels in a line. Number of scanlines in a trapezoid. Number of sub scanlines in an antialiased trapezoid. Diameter of a point in sub scanlines. |
| BitMaskPattern | 32 bits defined earlier | Value used to control the BitMask stipple operation (if enabled). |
| PointTable0, PointTable1, PointTable2, PointTable3 | Packed dx point data | Antialias point data table. There are 4 words in the table and the register tag is decoded to select a word. |
| ScanLineOwnership | See Multi-Rasterizer chapter | Defines which scanlines are owned when in multi-rasterizer mode. |
| Ylimits | Ymax: 2's complement 16 bit value in the upper word. Ymin: 2's complement 16 bit value in the lower word. | Defines the Y extents the rasterizer should fill between. A scanline is filled if its Y value satisfies Ymin ≤ Y < Ymax. |

**Table 2.2 Rasterizer Registers**

### 2.2.3 Render Command

For efficiency, the Render command register has a number of bit fields that can be set or cleared per render operation and which qualify other state information. These bits are:

- AreaStippleEnable
- LineStippleEnable
- ResetLineStipple
- TextureEnable
- FogEnable
- CoverageEnable
- SubpixelCorrection

This feature allows units to be enabled or disabled in one step as part of a specific render operation. For example, to clear a window to a background color when stippling and fog have already been enabled for 3D operations it is not necessary to clear the enable bits in **FogMode**, **AreaStippleMode** and **LineStippleMode** individually.
They can be left enabled but overridden for the window clear operation simply by adjusting the **Render** command bitfield settings, shown below:

## Render

| Name | Type | Offset | Format |
|--------|--------|--------|----------|
| Render | Global | 0x80 | Bitfield |

Command

| Bits | Name | Read | Write | Reset | Description |
|---------|---------------------------|------|-------|-------|-------------|
| 0 | AreaStippleEnable | × | ✓ | X | This bit, when set, enables area stippling of the fragments produced during rasterisation in the Stipple Unit. Note that area stipple in the Stipple Unit must be enabled as well for stippling to occur. When this bit is reset no area stippling occurs irrespective of the setting of the area stipple enable bit in the Stipple Unit. This bit is useful to temporarily force no area stippling for this primitive. |
| 1 | LineStippleEnable | × | ✓ | X | This bit, when set, enables line stippling of the fragments produced during rasterisation in the Stipple Unit. Note that line stipple in the Stipple Unit must be enabled as well for stippling to occur. When this bit is reset no line stippling occurs irrespective of the setting of the line stipple enable bit in the Stipple Unit. This bit is useful to temporarily force no line stippling for this primitive. |
| 2 | ResetLineStipple | × | ✓ | X | This bit, when set, causes the line stipple counters in the Stipple Unit to be reset to zero, and would typically be used for the first segment in a polyline. This action is also qualified by the LineStippleEnable bit and by the stipple enable bits in the Stipple Unit. When this bit is reset the stipple counters carry on from where they left off (if line stippling is enabled). |
| 3 | FastFillEnable | × | ✓ | X | This bit, when set, causes the span fill mechanisms to be used for the rasterisation process. The type of span filling is specified in the SpanOperation field. When this bit is reset the normal rasterisation process occurs. |
| 4, 5 | Unused | 0 | 0 | X | |
| 6, 7 | PrimitiveType | × | ✓ | X | This two bit field selects the primitive type to rasterise. The primitives are: 0 = Line, 1 = Trapezoid, 2 = Point. |
| 8 | AntialiasEnable | × | ✓ | X | This bit, when set, causes the generation of sub scanline data and the coverage value to be calculated for each fragment. The number of sub pixel samples to use is controlled by the AntialiasingQuality bit. When this bit is reset normal rasterisation occurs. |
| 9 | AntialiasingQuality | × | ✓ | X | This bit, when set, sets the sub pixel resolution to be 8x8. When this bit is reset the sub pixel resolution is 4x4. |
| 10 | UsePointTable | × | ✓ | X | When this bit and the AntialiasEnable bit are set, the dx values used to move from one scanline to the next are derived from the Point Table. |
| 11 | SyncOnBitMask | × | ✓ | X | This bit, when set, causes a number of actions. The least significant bit or most significant bit (depending on the MirrorBitMask bit) in the BitMask register is extracted and optionally inverted (controlled by the InvertBitMask bit). If this bit is 0 then the fragment is skipped. After every fragment the BitMask register is rotated by one bit. If all the bits in the BitMask register have been used then rasterisation is suspended until a new BitMaskPattern tag is received. If any other tag is received while the rasterisation is suspended then the rasterisation is aborted. The message which caused the abort is then processed as normal. Note that the behaviour is slightly different when the SyncOnHostData bit is set, to prevent a deadlock from occurring. In this case the rasterisation doesn't suspend when all the bits have been used, and if new BitMaskPattern tags are not received in a timely manner then the subsequent fragments will just reuse the bit mask. |
| 12 | SyncOnHostData | × | ✓ | X | When this bit is set a fragment is produced only when one of the following tags has been received from the host: Depth, Stencil, Color, FBData or FBSourceData. If SyncOnBitMask is reset and any tag other than one of these five is received, the rasterisation is aborted. If SyncOnBitMask is set and any tag other than one of these five or BitMaskPattern is received, the rasterisation is aborted. The tag which caused the abort is then processed as normal for that register type. The BitMaskPattern register doesn't cause any fragments to be generated, but just updates the BitMask register. |
| 13 | TextureEnable | × | ✓ | X | This bit, when set, enables texturing of the fragments produced during rasterisation. Note that the Texture Units must be suitably enabled as well for any texturing to occur. When this bit is reset no texturing occurs irrespective of the setting of the Texture Unit controls. This bit is useful to temporarily force no texturing for this primitive. |
| 14 | FogEnable | × | ✓ | X | This bit, when set, enables fogging of the fragments produced during rasterisation. Note that the Fog Unit must be suitably enabled as well for any fogging to occur. When this bit is reset no fogging occurs irrespective of the setting of the Fog Unit controls. This bit is useful to temporarily force no fogging for this primitive. |
| 15 | CoverageEnable | × | ✓ | X | This bit, when set, enables the coverage value produced as part of the antialiasing to weight the alpha value in the alpha test unit. Note that this unit must be suitably enabled as well. When this bit is reset no coverage application occurs irrespective of the setting of the AntialiasMode. |
| 16 | SubPixelCorrectionEnable | × | ✓ | X | This bit, when set, enables the sub pixel correction of the color, depth, fog and texture values at the start of a scanline. When this bit is reset no correction is done at the start of a scanline. Sub pixel corrections are only applied to aliased trapezoids. |
| 17 | Reserved | 0 | 0 | X | |
| 18 | SpanOperation | × | ✓ | X | This bit, when clear, indicates the writes are to use the constant color found in the previous FBBlockColor register. When this bit is set, write data is variable and is either provided by the host (i.e. SyncOnHostData is set) or is read from the framebuffer. |
| 19 | Unused | 0 | 0 | X | |
| 20...26 | Reserved | × | ✓ | X | |
| 27 | FBSourceReadEnable | × | ✓ | X | This bit, when set, enables source buffer reads to be done in the Framebuffer Read Unit. Note that the Framebuffer Read Unit must be suitably enabled as well for the source read to occur. When this bit is reset no source reads occur irrespective of the setting of the Framebuffer Read Unit controls. |
| 28...31 | Unused | 0 | 0 | X | |

# RasterizerMode

| Name | Type | Offset | Format |
|-------------------|------------|--------|----------|
| RasterizerMode | Rasterizer | 0x80A0 | Bitfield |
| RasterizerModeAnd | Rasterizer | 0xABA0 | Bitfield |
| RasterizerModeOr | Rasterizer | 0xABA8 | Bitfield |

Control register

| Bits | Name | Read<sup>6</sup> | Write | Reset | Description |
|---------|----------------------|------|-------|-------|-------------|
| 0 | MirrorBitMask | ✓ | ✓ | X | When set, the bit mask bits are consumed from the most significant end towards the least significant end. When reset, the bit mask bits are consumed from the least significant end towards the most significant end. |
| 1 | InvertBitMask | ✓ | ✓ | X | When this bit is set the bit mask is inverted before being tested. |
| 2, 3 | FractionAdjust | ✓ | ✓ | X | These bits control the action of a ContinueNewLine command and specify how the fraction bits in the Y and XDom DDAs are adjusted. 0: No adjustment is done. 1: Set the fraction bits to zero. 2: Set the fraction bits to half. 3: Set the fraction bits to nearly half, i.e. 0x7FFF. |
| 4, 5 | BiasCoordinates | ✓ | ✓ | X | These bits control how much is added onto the StartXDom, StartXSub and StartY values when they are loaded into the DDA units. The original registers are not affected. 0: Zero is added. 1: Half is added. 2: Nearly half, i.e. 0x7FFF, is added. |
| 6 | | ✓ | ✓ | X | Reserved |
| 7, 8 | BitMaskByteSwapMode | ✓ | ✓ | X | These bits control the byte swapping of the BitMask data before it is used. If the bytes are labelled ABCD on input then they are swapped as follows: 0: ABCD (i.e. no swap), 1: BADC, 2: CDAB, 3: DCBA. |
| 9 | BitMaskPacking | ✓ | ✓ | X | This bit controls whether the BitMask data is packed or whether new BitMask data is required on every scanline. 0: BitMask data is packed. 1: BitMask data is provided for each scanline. |
| 10...14 | BitMaskOffset | ✓ | ✓ | X | These bits hold the bit position in the BitMask data from which the first bit is taken for the bit mask test, for the first BitMask data on a new scanline. Subsequent BitMask data starts from bit 0 until the next scanline. Successive bits are taken from increasing bit positions until the bit mask is consumed (i.e. bit 31 is reached). The least significant bit is bit zero. |
| 15, 16 | HostDataByteSwapMode | ✓ | ✓ | X | These bits control the byte swapping of the host data before it is used. If the bytes are labelled ABCD on input then they are swapped as follows: 0: ABCD (i.e. no swap), 1: BADC, 2: CDAB, 3: DCBA. |
| 17 | MultiRasterizer | ✓ | ✓ | X | This bit selects whether the rasterizer is to work in single rasterizer mode or in multi-rasterizer mode. In multi-rasterizer mode it only processes the scanlines allocated to it. 0: Single rasterizer mode. 1: Multi-rasterizer mode. |
| 18 | YLimitsEnable | ✓ | ✓ | X | This bit, when set, enables the Y limits testing to be done between the minimum and maximum Y values given by the YLimits register. |
| 19 | Reserved | ✓ | ✓ | X | |
| 20...22 | StripeHeight | ✓ | ✓ | X | This field specifies the number of scanlines in a stripe. The options are: 0 = 1, 1 = 2, 2 = 4, 3 = 8, 4 = 16. |
| 23 | WordPacking | ✓ | ✓ | X | This bit controls how the two host words sent during a span operation are packed into the 64 bit internal span data. 0 = first word in bits 0...31, second word in bits 32...63. 1 = first word in bits 32...63, second word in bits 0...31. |
| 24 | OpaqueSpans | ✓ | ✓ | X | This bit, when set, allows the color of each pixel in the span to be either foreground or background as set by the supplied bit masks. If this bit is 0 then any supplied bit masks are ANDed with the pixel mask to delete pixels from the span. This bit should be set to 0 for performance reasons when foreground/background processing is not required. |
| 25 | Reserved | 0 | 0 | X | |
| 26 | D3DRules | ✓ | ✓ | X | This bit, if set, uses D3D rules for subpixel correction calculations; otherwise OpenGL rules are used. |

<sup>6</sup> Logic Op register readback is via the main register only.

Notes: Defines the long term mode of operation of the rasterizer. The logic operator equivalents behave the same way, but the new mode is ANDed or ORed with the former mode before replacing it.

# 2.3 2D Setup

This unit performs a number of functions to improve the throughput of 2D rendering. There are two new registers, **Render2D** and **Render2DGlyph**, which allow:

- Rectangle setup using only two messages
- Glyph rendering from texture memory in two messages
- Scanline by scanline handling of glyph data (downloading, chopping and padding), compatibly with bitmap textures
- Conversion of packed pixel downloads from 4- to 8-bit format
- Automatic expansion of Run Length Encoded (RLE) data downloads

The Render2D command incidentally flushes the write combine buffers to ensure memory is updated (and therefore visible to bypass or video reads) after the rectangle is rendered.

# 2.3.1 Glyph rendering

Once the position is established (**GlyphPosition**), subsequent glyphs can be rendered by writing the address of the texture bitmap containing the glyph to the **TextureBaseAddr(0)** register followed by the **Render2DGlyph** command. The glyph position is updated automatically from the *Width* bitfield. Because glyphs are rendered as a span, the direction is always increasing X and Y.

# **3 Scissor, Stipple and Color DDA Units**

### 3.1 Scissor Unit

Two scissor tests are provided in Permedia4: the User Scissor test and the Screen Scissor test. The user scissor checks each fragment or span against a user supplied scissor region; the screen scissor converts the fragment to screen-relative coordinates and checks that the fragment or span lies within the screen. The scissor unit operates both on active fragments and spans.
In span processing the pixel mask bits corresponding to a failed fragment are reset.

#### 3.1.1 User Scissor Test

The user scissor test checks each fragment as follows:

$XMin \le X < XMax$

$YMin \le Y < YMax$

where X and Y are the coordinates of the fragment, and XMin, XMax, YMin and YMax define the user supplied scissor region. If a fragment fails the test it is discarded. The test may be screen- or window-relative.

#### 3.1.2 Screen Scissor Tests

This test ensures that a fragment lies within the screen boundaries. For each fragment the XY origin stored in the **WindowOrigin** register is added to the fragment coordinates and this is tested against the screen boundaries stored in the **ScreenSize** register. Since the X and Y coordinates are held as 2's complement numbers, the window origin can be moved off the edges of the screen. Note that the **WindowOrigin** register only affects the origin for clipping; it does not affect the base address for rendering. The *Windows Initialization* chapter gives further details on how to set the base address of a window for rendering.

The Screen Scissor test is:

$0 \le (X + WX) < SW$

$0 \le (Y + WY) < SH$

Where:

- X = Fragment X coordinate
- Y = Fragment Y coordinate
- WX = Window origin X coordinate
- WY = Window origin Y coordinate
- SW = Screen Width
- SH = Screen Height

The diagram below shows a simple case of a screen with a single window which has a user defined scissor region. The shaded area shows the region where fragments pass the user and screen scissor tests and so can progress in the pipeline. Fragments outside this region are culled from the pipeline.

Figure 3-1 Screen Scissor and User Scissor Tests

This test may reject fragments if some part of a window has been moved off the screen. It will not reject fragments if part of a window is simply overlapped by another window (GID testing can be used to detect this).
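The two inequalities above can be modelled in a few lines of C. This is only an illustrative software sketch of the tests, not a description of the hardware: the `ScissorState` structure, its field names and the function names are assumptions made for the example, and the user scissor is assumed here to be applied to window-relative coordinates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the scissor state; the field names are
 * illustrative and are not taken from the register definitions. */
typedef struct {
    int16_t xmin, ymin, xmax, ymax; /* user scissor region     */
    int16_t wx, wy;                 /* window origin           */
    int16_t sw, sh;                 /* screen width and height */
} ScissorState;

/* User scissor: XMin <= X < XMax and YMin <= Y < YMax. */
static bool user_scissor_pass(const ScissorState *s, int x, int y)
{
    return x >= s->xmin && x < s->xmax && y >= s->ymin && y < s->ymax;
}

/* Screen scissor: 0 <= X + WX < SW and 0 <= Y + WY < SH. */
static bool screen_scissor_pass(const ScissorState *s, int x, int y)
{
    int sx = x + s->wx;
    int sy = y + s->wy;
    return sx >= 0 && sx < s->sw && sy >= 0 && sy < s->sh;
}

/* A fragment survives only if it passes every enabled test. */
static bool fragment_pass(const ScissorState *s, int x, int y,
                          bool user_enable, bool screen_enable)
{
    if (user_enable && !user_scissor_pass(s, x, y))
        return false;
    if (screen_enable && !screen_scissor_pass(s, x, y))
        return false;
    return true;
}
```

With a window origin of (-50, 0), for instance, a fragment at X = 10 maps to screen X = -40 and fails the screen scissor even though it may lie inside the user scissor region.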
# 3.1.3 Scissor Registers

The unit is controlled by the **ScissorMode** register:

| Name | Type | Offset | Format |
|----------------|---------|--------|---------------------|
| ScissorMode | Scissor | 0x8180 | Bitfield |
| ScissorModeAnd | Scissor | 0xABB0 | Bitfield Logic Mask |
| ScissorModeOr | Scissor | 0xABB8 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read<sup>7</sup> | Write | Reset | Description |
|--------|---------------------|------|-------|-------|-------------------------------------|
| 0 | UserScissorEnable | ✓ | ✓ | X | Enables the user scissor clipping |
| 1 | ScreenScissorEnable | ✓ | ✓ | X | Enables the screen scissor clipping |
| 2...31 | Unused | 0 | 0 | X | |

Figure 3-2 ScissorMode Register

<sup>7</sup> Logic Op register readback is via the main register only.

The screen scissor test would normally be left enabled by default. The most common exception is during image upload.

The user scissor region is specified by two registers, **ScissorMinXY** and **ScissorMaxXY**. The X values are stored in the least significant 16 bits of the register, the Y values in the most significant 16 bits of the register.

The **WindowOrigin** register has the X coordinate of the origin stored in the least significant 16 bits of the register, and the Y coordinate in the most significant 16 bits of the register. As each fragment is generated by the rasterization unit this origin is added to the coordinates of the fragment to generate its screen coordinates.

The **ScreenSize** register specifies the screen width and height, with the width in the least significant 16 bits and the height in the most significant 16 bits.

# 3.1.4 Span Operations and the Scissor Unit

If a span mask is presented to the scissor unit, the pixel mask (and potentially the color mask) is modified to zero out bits corresponding to pixels which lie outside the scissor region.
This is true for both the user scissor and the screen scissor. The screen scissor first converts the span's coordinates to screen-relative.

# 3.1.5 Scissor Example

To enable the user and screen scissor tests for a user scissor region of $10 \le X < 500$, $100 \le Y < 200$, with a screen size of $1280 \times 1024$ and the window origin at (100,100):

```
// Set the screen size
screenSize.Width = 1280
screenSize.Height = 1024
ScreenSize(screenSize)
// Set the window origin
windowOrigin.X = 100
windowOrigin.Y = 100
WindowOrigin(windowOrigin)
// Set the user scissor region (X in the low 16 bits, Y in the high 16 bits)
ScissorMinXY((100 << 16) | 10)
ScissorMaxXY((200 << 16) | 500)
// Enable the user and screen scissor tests
scissorMode.UserScissorEnable = 1
scissorMode.ScreenScissorEnable = 1
ScissorMode(scissorMode)
// Render primitives
```

# 3.2 Stipple Unit

Stippling is a process which checks each fragment against a bit in a defined pattern. The fragment can either be rejected or accepted depending on the result of the stipple test. If it is rejected, then it undergoes no further processing; otherwise it proceeds down the pipeline. Permedia4 supports line and area stippling.

# 3.2.1 Area Stippling

The address of the stipple pattern row to use in the test is calculated as follows:

- Add the Y offset to the bottom five bits of the Y coordinate of the span. If the corresponding mirror bits are set then invert the Y address.
- Extract the bottom m bits of the resulting Y value, where m is determined by the Y Sel fields. The extracted Y address is zero extended to 5 bits where necessary and is now called Y'.
- Add the *YTableOffset* to Y' to move the test to the required sub stipple pattern row.

The Y' value selects the row in the stipple RAM (row zero is at **AreaStipplePattern[0]**) and this is the first value of the AreaStippleMask, which is processed by each of the following stages and passed on to the next:

- The mask is rotated right by the *XTableOffset* amount to select the sub stipple pattern to replicate, mirror, etc.
- The least significant 2, 4, 8, 16 or 32 bits are extracted from the AreaStippleMask and replicated to fill all 32 bits of the mask. The Xsel field determines the number of bits to replicate (0 = 2 bits to replicate, etc.).
- Next the AreaStippleMask is mirrored if the *MirrorX* bit is set. The mirroring is done by swapping bits (0, 63), (1, 62), (2, 61), etc.
- The area span mask is inverted under control of the *InvertStipplePattern* bit.
- The area span mask is rotated right by (Xoffset + X) bits.

The area stipple pattern is always 32x32 and is window relative. However, the *XTableOffset* and *YTableOffset* fields in **AreaStippleMode** allow the 32x32 bit table to hold several smaller area stipple patterns.

The least significant 5 bits of the fragment's (X,Y) coordinates index into the controlling bit of the 2D stipple pattern. If the selected bit in the pattern is set, then the fragment passes the test; otherwise it is rejected as described above. The mask is defined in the **AreaStipplePattern** registers.

Area stippling is enabled and controlled using the **AreaStippleMode** register and must be qualified by the *AreaStippleEnable* bit in the **Render** command register. This allows stippling to be temporarily disabled when bitmaps or OGL pixel rectangles are being rendered.

The address selection can be controlled independently in the X and Y directions. In addition the bit pattern can be inverted or mirrored using *InvertStipplePattern* or *MirrorX*. Inverting the bit pattern has the effect of changing the sense of the accept/reject test. If the mirror bit is set the most significant bit of the pattern is towards the left of the window; the default is the converse.

In some situations window relative stippling is required but coordinates are only available screen relative. To allow window relative stippling, an offset can be added to the coordinates before indexing the stipple table. X and Y offsets can be controlled independently.

# 3.2.2 Line Stippling

Line stippling applies normally to aliased lines. Antialiased lines can be stippled by applying the stipple pattern to the rectangles which constitute the antialiased line.
In this test, fragments are conditionally rejected on the outcome of testing a linear stipple mask. If the selected bit is zero then the test fails, otherwise it passes. The line stipple pattern is 16 bits in length and is scaled by a repeat factor, r, in the range 1 to 512. The stipple mask bit, b, which controls the acceptance or rejection of a fragment is determined using:

$b = \lfloor s / r \rfloor \bmod 16$

where s is the stipple counter, which is incremented for every fragment (normally along the line). This counter may be reset at the start of a polyline, but between segments it continues as if there were no break.

The stipple pattern can optionally be mirrored, that is, the bit pattern is traversed from most significant to least significant bit rather than the default, from least significant to most significant.

The **UpdateLineStippleCounters** register controls initialization of the line stipple counters, which can be reset or loaded from a previously saved value. The counters are reset by writing 0 to bit 0 of **UpdateLineStippleCounters** (earlier chips required resetting all 32 bits in the register). The **SaveLineStippleCounters** register is used to save the current line stipple counters. The combination of **UpdateLineStippleCounters** and **SaveLineStippleCounters** is useful for implementing stippling of wide polylines.

Line stippling is enabled using the **LineStippleMode** register and must be qualified by the *LineStippleEnable* bit in the **Render** command register.

# 3.2.3 Span Operations and Stippling

If the Area Stipple unit is enabled it modifies span masks generated by the rasterizer. (Line stipple has no effect on the span mask.) The mask can be rotated or inverted before being ANDed with the pixel mask for transparent spans, or with the color mask for spans when the *OpaqueSpan* bit is set in the **AreaStippleMode** register.
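The line stipple bit selection of section 3.2.2 can be sketched in C as follows. This is a minimal illustration, not the hardware implementation: the function name and parameters are hypothetical, `repeat` is the actual repeat factor r (not the register-encoded value), and mirroring is assumed to mean that bit 15 is used first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the fragment with stipple
 * counter value s passes the line stipple test.
 * pattern : 16 bit stipple mask
 * repeat  : repeat factor r, 1 to 512
 * mirror  : traverse the pattern from MSB to LSB (assumed as 15 - b) */
bool line_stipple_pass(uint16_t pattern, uint32_t repeat, uint32_t s, bool mirror)
{
    assert(repeat >= 1u && repeat <= 512u);

    uint32_t b = (s / repeat) % 16u;   /* b = floor(s / r) mod 16 */
    if (mirror)
        b = 15u - b;                   /* assumption: mirrored order */

    /* A zero bit fails the test, a one bit passes. */
    return ((pattern >> b) & 1u) != 0u;
}
```

With a pattern of 0xAAAA and a repeat factor of 1, fragments with even counter values are rejected and those with odd values are accepted; a repeat factor of 2 doubles the length of each run.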
# 3.2.4 Registers

The **LineStippleMode** register controls line stipple:

| Name | Type | Offset | Format |
|--------------------|---------|--------|---------------------|
| LineStippleMode | Stipple | 0x81A8 | Bitfield |
| LineStippleModeAnd | Stipple | 0xABC0 | Bitfield Logic Mask |
| LineStippleModeOr | Stipple | 0xABC8 | Bitfield Logic Mask |

Control register

| Bits | Name | Read | Write | Reset | Description |
|---------|---------------|------|-------|-------|-------------|
| 0 | StippleEnable | ✓ | ✓ | X | This field, when set, enables the stippling of lines. The LineStippleEnable bit in the Render command must also be set. |
| 1...9 | RepeatFactor | ✓ | ✓ | X | This field holds the positive repeat factor for stippled lines. The repeat factor stored here is one less than the desired repeat factor. |
| 10...25 | StippleMask | ✓ | ✓ | X | This field holds the stipple pattern. |
| 26 | Mirror | ✓ | ✓ | X | This field, when set, will mirror the StippleMask before it is used. |
| 27...31 | Unused | 0 | 0 | X | |

Figure 3-3 LineStippleMode Register

The least significant bit of the **UpdateLineStippleCounters** register controls loading the line stipple counters. If set, the line stipple counters are loaded with the previously saved values. If reset, the counters are cleared to zero. The counters can also be reset by means of the *ResetLineStipple* bit in the **Render** command.
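As an illustration of the **LineStippleMode** layout in Figure 3-3, the sketch below packs the fields into a 32 bit register value. The function name is hypothetical and the bit positions are taken from the figure as read here (StippleEnable in bit 0); check them against the Permedia4 Reference Guide before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: builds a LineStippleMode register value.
 * repeat is the desired repeat factor (1 to 512); the register
 * stores one less than this value. */
uint32_t pack_line_stipple_mode(bool enable, uint32_t repeat,
                                uint16_t mask, bool mirror)
{
    assert(repeat >= 1u && repeat <= 512u);

    uint32_t value = 0u;
    value |= enable ? 1u : 0u;                /* bit 0: StippleEnable       */
    value |= ((repeat - 1u) & 0x1FFu) << 1;   /* bits 1...9: RepeatFactor   */
    value |= (uint32_t)mask << 10;            /* bits 10...25: StippleMask  */
    value |= (mirror ? 1u : 0u) << 26;        /* bit 26: Mirror             */
    return value;
}
```

Passing the desired repeat factor rather than the encoded one keeps the "one less than" rule in a single place.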
The **AreaStippleMode** register controls area stipple operation:

| Name | Type | Offset | Format |
|--------------------|---------|--------|---------------------|
| AreaStippleMode | Stipple | 0x81A0 | Bitfield |
| AreaStippleModeAnd | Stipple | 0xABD0 | Bitfield Logic Mask |
| AreaStippleModeOr | Stipple | 0xABD8 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read <sup>8</sup> | Write | Reset | Description |
|---------|------------------------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | X | This field, when set, enables area stippling. The AreaStippleEnable bit in <i>Render</i> must also be set for this to have an effect. |
| 1...3 | X address select | ✓ | ✓ | X | 0 = 1 bit, 1 = 2 bit, 2 = 3 bit, 3 = 4 bit, 4 = 5 bit |
| 4...6 | Y address select | ✓ | ✓ | X | 0 = 1 bit, 1 = 2 bit, 2 = 3 bit, 3 = 4 bit, 4 = 5 bit |
| 7...11 | X Offset | ✓ | ✓ | X | This field holds the offset to add to the X value before it is used to index into the stipple bit. This allows a window relative stipple pattern to be selected when the coordinates are given in screen relative format. |
| 12...16 | Y Offset | ✓ | ✓ | X | This field holds the offset to add to the Y value before it is used to index into the area stipple pattern table. This allows a window relative stipple pattern to be selected when the coordinates are given in screen relative format. |
| 17 | Invert Stipple Pattern | ✓ | ✓ | X | 0 = No Invert, 1 = Invert |
| 18 | Mirror X | ✓ | ✓ | X | 0 = No Mirror, 1 = Mirror |
| 19 | Mirror Y | ✓ | ✓ | X | 0 = No Mirror, 1 = Mirror |
| 20 | OpaqueSpan | ✓ | ✓ | X | This bit, when set, allows the area stipple pattern to modify the color mask, otherwise the pixel mask is modified. |
| 21...25 | XTableOffset | ✓ | ✓ | X | This field allows a sub area stipple pattern to be extracted from the area stipple table, i.e. the area stipple table is treated as a cache of smaller stipple patterns. |
| 26...30 | YTableOffset | ✓ | ✓ | X | This field allows a sub area stipple pattern to be extracted from the area stipple table, i.e. the area stipple table is treated as a cache of smaller stipple patterns. |
| 31 | Unused | 0 | 0 | X | |

Figure 3-4 AreaStippleMode Register

<sup>8</sup> Logic Op register readback is via the main register only

The enable bits in the **LineStippleMode** and **AreaStippleMode** registers are qualified by the *LineStippleEnable* and *AreaStippleEnable* bits in the **Render** command register.

The **SaveLineStippleCounters** register (which has no data field) saves the line stipple counters internally.

The area stipple is set up in the **AreaStipplePattern[n]** registers, where n represents an integer between 0 and 31.

The **LoadLineStippleCounters** register is shown in the Permedia4 Reference Guide:

| Name | Type | Offset | Format |
|-------------------------|----------------|--------|----------|
| LoadLineStippleCounters | Global Command | 0x81B0 | Bitfield |

| Bits | Name | Read | Write | Reset | Description |
|---------|----------------------|------|-------|-------|-------------|
| 0...3 | LiveBitCounter | × | ✓ | X | |
| 4...12 | LiveRepeatCounter | × | ✓ | X | |
| 13...16 | SegmentBitCounter | × | ✓ | X | |
| 17...25 | SegmentRepeatCounter | × | ✓ | X | |
| 26...31 | Unused | 0 | 0 | X | |

Figure 3-5 LoadLineStippleCounters register

## 3.2.5 Examples

A repeating area stipple pattern of 2x2 pixels producing a 50% grey area:

```
AreaStippleMode(areaStippleMode)
// When the Render command is sent the AreaStippleEnable
// bit should be set in addition to the area stipple
// test being enabled:
// render.AreaStippleEnable = PERMEDIA4_TRUE
```

# 3.2.6 Line Stipple Example

A line stipple which rejects alternate fragments:

```
// Set counters to zero
UpdateLineStippleCounters(0x0)
// Set the stipple mode
lineStippleMode.UnitEnable = PERMEDIA4_ENABLE
lineStippleMode.RepeatFactor = 0 // Repeat factor 1
lineStippleMode.StippleMask = 0xAAAA
LineStippleMode(lineStippleMode)
// When issuing a Render command the
// LineStippleEnable bit should be set in addition
// to the line stipple test being enabled:
// render.LineStippleEnable = PERMEDIA4_TRUE
```

# 3.2.7 Area Stipple Pattern Example

Another repeating area stipple pattern of 2x2 pixels producing a 50% grey area:

```
AreaStipplePattern0 (0xAAAAAAAA)
AreaStipplePattern1 (0x55555555)
AreaStipplePattern2 (0xAAAAAAAA)
AreaStipplePattern3 (0x55555555)
AreaStipplePattern4 (0xAAAAAAAA)
AreaStipplePattern5 (0x55555555)
AreaStipplePattern6 (0xAAAAAAAA)
AreaStipplePattern7 (0x55555555)
// ... the remaining rows alternate in the same way, up to:
AreaStipplePattern31 (0x55555555)
// Set-up mode register
areaStippleMode.UnitEnable = PERMEDIA4_ENABLE
areaStippleMode.Xselect = 0
areaStippleMode.Yselect = 0
areaStippleMode.Xoffset = 0
areaStippleMode.Yoffset = 0
areaStippleMode.Invert = 0
areaStippleMode.MirrorY = 0
areaStippleMode.MirrorX = 0
// Load mode register
AreaStippleMode(areaStippleMode)
// When issuing a Render command, the
// AreaStippleEnable bit should be set to enabled:
// render.AreaStippleEnable = PERMEDIA4_TRUE
```

## 3.3 Color DDA Unit

The color DDA unit is used to associate a color with a fragment produced by the rasterizer. This unit should be enabled for rendering operations and disabled for pixel rectangle operations (i.e. copies, uploads and downloads).
Color DDA functionality is controlled by the **ColorDDAMode** register:

# ColorDDAMode ColorDDAModeAnd ColorDDAModeOr

| Name | Type | Offset | Format |
|-----------------|-------|--------|---------------------|
| ColorDDAMode | Color | 0x87E0 | Bitfield |
| ColorDDAModeAnd | Color | 0xABE0 | Bitfield Logic Mask |
| ColorDDAModeOr | Color | 0xABE8 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read <sup>9</sup> | Write | Reset | Description |
|--------|---------|------|-------|-------|-------------|
| 1 | Enable | ✓ | ✓ | X | This bit, when set, causes the current color to be generated. |
| 2 | Shading | ✓ | ✓ | X | Selects the shading mode. The two options are: 0 = Flat: the color is taken from the Constant Color register. 1 = Gouraud: the color is taken from the DDAs. |
| 3...31 | Unused | 0 | 0 | X | |

Notes: The ColorDDAMode register controls the operation of the Color DDA unit using the Enable and Shading bits. The logic operator equivalents behave the same way but the new mode is AND'd or OR'd with the former mode before replacing it.

<sup>9</sup> Logic Op register readback is via the main register only

# 3.3.1 RGBA and Color-Index (CI) Modes

Two color modes are supported by Permedia4, RGBA and color index (CI). Permedia4's internal color representation is RGBA with 8 bits per component. A typical register layout is **ConstantColor**:

# **Constant Color**

| Name | Type | Offset | Format |
|---------------|-------|--------|----------|
| ConstantColor | Delta | …E8 | Bitfield |

| Bits | Name | Read | Write | Reset | Description |
|---------|-------|------|-------|-------|-------------|
| 0...7 | Red | ✓ | ✓ | X | |
| 8...15 | Green | ✓ | ✓ | X | |
| 16...23 | Blue | ✓ | ✓ | X | |
| 24...31 | Alpha | ✓ | ✓ | X | |

Notes: This register holds the constant color in packed format.
This is a legacy register maintained for backwards compatibility which has been superseded by the *ConstantColorDDA* register. The *ConstantColorDDA* register, as well as loading the constant color register, also loads the DDA start register from the corresponding color byte and sets the dx and dyDom gradients to zero. This allows a constant color to be set up irrespective of the shading mode.

This format is the same for all the different framebuffer configurations supported. If the number of bits in the framebuffer for a color component is less than 8 then the color value is left shifted into the most significant bits of that component's field. The unused least significant bits should be set to zero.

In CI mode the color index is placed in the lower byte of the 32 bit register (i.e., the red component). If fewer than 8 bits are used the index is left justified to be in the most significant end of the red component. The unused least significant bits should be set to zero. For further information on color modes see chapter 9 - Color Format and Logical Ops.

## 3.3.2 Gouraud Shading

Shading may be flat or Gouraud. For flat shading, the color value is taken from the **ConstantColor** register, not from the DDA. In Gouraud shading mode, the color DDA unit performs linear interpolation given a set of start and increment values. Interpolated values are clamped to avoid overflow or underflow. For details of the color interpolation calculation see Appendix 13-2 - Calculating Depth Gradient Values.

Figure 3-6 Color Interpolation

Color interpolates from the dominant edge of the trapezoid to the subordinate edges. This means that two increment values are required per color component, one to move along the dominant edge and one to move across the span to the subordinate edge. This is illustrated in Figure 3-6, where C represents a color component (red, green, blue, alpha or color index). The control registers are shown in Table 3.3, below.
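The two-increment interpolation described above can be sketched for a single component as follows. This is an illustration only: it uses floating point for readability where the hardware uses fixed point DDAs, the function names are hypothetical, and the clamp range of 0 to 255 is an assumption based on the 8 bit internal component width.

```c
#include <assert.h>

/* Clamp to the 8 bit component range (range is an assumption). */
double clamp_component(double c)
{
    if (c < 0.0)
        return 0.0;
    if (c > 255.0)
        return 255.0;
    return c;
}

/* Hypothetical helper: value of component C after dy steps along the
 * dominant edge and dx steps across the span.
 * c_start   : start value (e.g. RStart)
 * dc_dy_dom : increment per step along the dominant edge (e.g. dRdyDom)
 * dc_dx     : increment per step across the span (e.g. dRdx) */
double interpolate_component(double c_start, double dc_dy_dom, double dc_dx,
                             int dy, int dx)
{
    assert(dy >= 0);
    return clamp_component(c_start + dy * dc_dy_dom + dx * dc_dx);
}
```

For a line, only the dominant edge increment is needed, which corresponds to calling the helper with `dx` equal to zero.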
For Gouraud shaded lines, each line is treated as the dominant edge of a trapezoid so no **dCdx** increment is required.

To allow accurate interpolation, the increment values are specified in a 24 bit fixed point format. The format is 2's complement with 9 bits of integer and 15 bits of fraction. A typical register layout is shown below:

| Name | Type | Offset | Format |
|---------|-----------|--------|-------------|
| dAdyDom | Color DDA | 0x87D8 | Fixed point |

Control register

| Bits | Name | Read | Write | Reset | Description |
|---------|----------|------|-------|-------|-------------|
| 0...14 | Fraction | ✓ | ✓ | X | 2's complement 9.15 fixed point fraction |
| 15...23 | Integer | ✓ | ✓ | X | 2's complement 9.15 fixed point integer |
| 24...31 | Unused | 0 | 0 | X | |

Figure 3-7 Fixed Point Color Format

Note that if you are rendering to multiple buffers and have initialized the start and increment values in the color DDA unit, then any subsequent **Render** command will reload the start values.

If subpixel correction has been enabled for a primitive, then any correction required will be applied to the color components.
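The 9.15 fixed point format of Figure 3-7 can be illustrated with a pair of conversion routines. This is a minimal sketch; the function names are hypothetical, and rounding to nearest is a choice made here rather than something the format requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encodes a value as 24 bit 2's complement 9.15
 * fixed point (bits 0...14 fraction, bits 15...23 integer). */
uint32_t to_fixed_9_15(double value)
{
    assert(value >= -256.0 && value < 256.0);

    /* Scale by 2^15 and round to nearest. */
    double scaled_d = value * 32768.0 + (value >= 0.0 ? 0.5 : -0.5);
    int32_t scaled = (int32_t)scaled_d;
    if (scaled > 0x7FFFFF)
        scaled = 0x7FFFFF;              /* rounding must not overflow */

    return (uint32_t)scaled & 0x00FFFFFFu;  /* bits 24...31 unused */
}

/* Hypothetical helper: decodes a 9.15 register value back to a double. */
double from_fixed_9_15(uint32_t reg)
{
    int32_t raw = (int32_t)(reg & 0x00FFFFFFu);
    if (raw & 0x00800000)
        raw -= 0x01000000;              /* sign extend from bit 23 */
    return raw / 32768.0;
}
```

For example, an increment of 1.0 per step encodes as 0x008000 and an increment of -1.0 as 0xFF8000.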
The registers to set up Gouraud shading in the color DDA unit are:

| Register | Data Field | Description |
|----------|-------------------------|--------------------------------------------|
| RStart | Fixed point 9.15 format | Red start value |
| dRdx | Fixed point 9.15 format | Red derivative per unit X |
| dRdyDom | Fixed point 9.15 format | Red derivative per unit Y, dominant edge |
| GStart | Fixed point 9.15 format | Green start value |
| dGdx | Fixed point 9.15 format | Green derivative per unit X |
| dGdyDom | Fixed point 9.15 format | Green derivative per unit Y, dominant edge |
| BStart | Fixed point 9.15 format | Blue start value |
| dBdx | Fixed point 9.15 format | Blue derivative per unit X |
| dBdyDom | Fixed point 9.15 format | Blue derivative per unit Y, dominant edge |
| AStart | Fixed point 9.15 format | Alpha start value |
| dAdx | Fixed point 9.15 format | Alpha derivative per unit X |
| dAdyDom | Fixed point 9.15 format | Alpha derivative per unit Y, dominant edge |

Table 3.3 Color Interpolation Registers

### 3.3.3 Flat Shading Example

A flat shaded primitive:

```
// Set DDA to flat shade mode
colorDDAMode.UnitEnable = PERMEDIA4_ENABLE
colorDDAMode.Shade = PERMEDIA4_FLAT_SHADE_MODE
ColorDDAMode(colorDDAMode)
ConstantColor(0xFFFFFFFF) // Load the flat color
```

## 3.3.4 Gouraud Shaded Trapezoid Example

dBdyDom()

## 3.3.5 Gouraud Shaded Line Example

```
// Set DDA for Gouraud shaded mode
colorDDAMode.UnitEnable = PERMEDIA4_ENABLE
colorDDAMode.Shade = PERMEDIA4_GOURAUD_SHADE_MODE
ColorDDAMode(colorDDAMode)
// For lines we need only start values and
// dominant edge deltas
RStart()  // Set-up the red component start value
dRdyDom() // Set-up the red component increment
GStart()  // Set-up the green component start value
dGdyDom() // Set-up the green component increment
BStart()  // Set-up the blue component start value
dBdyDom() // Set-up the blue component increment
```

# 4 **Localbuffer Read/Write**

The localbuffer holds the Graphic ID, Stencil and Depth data associated with a fragment. The localbuffer address calculation uses the LocalBuffer mode, address and offset registers to set base addresses and screen-relative offsets, as well as positioning the Depth, Stencil and GID planes. For details see "Localbuffer and Framebuffer Configuration" in *Initialization* section 12.2.7 below. The origin can be set in the relevant BufferMode register(s) to top left or bottom left using the *Origin* field.

Note: Enabling Patch addressing in the Layout field of the buffer mode registers introduces additional complexity into the address calculation which is beyond the scope of this manual. Localbuffer bypass accesses are not recommended when Patch mode addressing is enabled.

The localbuffer read format is controlled by the **LBDestReadFormat** register's definition of the positions of the Depth, Stencil and GID planes. Selecting a depth width of 15 bits forces the stencil and GID fields to be set from bit 15 of the pixel and ignores the normal stencil and GID settings. The natural internal widths of the fields are depth (31), stencil (8) and GID (4). If the specified width of a field is less than its internal width then the field is zero extended to its internal width.

| Field | Width | Position |
|---------|----------------|----------------------------------------------------------|
| Depth | 16, 24, 31, 15 | Bit 0 to bit 3 |
| Stencil | 0 - 8 | Starts at 16 to 39 (entered as 0 - 23) |
| GID | 0 - 4 | Starts at 16 to 39 (entered as 0 - 23) following Stencil |

**Table 4.4 Localbuffer Configurations**

The enables for these are in the **GIDMode**, **StencilMode** and **DepthMode** registers. These tell Permedia4 which areas of the localbuffer are required for various operations.
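The field layout in Table 4.4 can be illustrated by a routine that extracts the depth, stencil and GID fields from a localbuffer pixel. This is a sketch under stated assumptions: the type and function names are hypothetical, the pixel is held in a 64 bit integer for convenience, the depth field is taken to start at bit 0, and positions are given in the "entered as 0 - 23" form (so position 0 means bit 16).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a localbuffer format. */
typedef struct {
    unsigned depth_width;      /* 15, 16, 24 or 31 */
    unsigned stencil_width;    /* 0...8 */
    unsigned stencil_position; /* 0...23, meaning bits 16...39 */
    unsigned gid_width;        /* 0...4 */
    unsigned gid_position;     /* 0...23, meaning bits 16...39 */
} lb_format;

typedef struct {
    uint32_t depth;
    uint32_t stencil;
    uint32_t gid;
} lb_fields;

/* Mask with the lowest 'width' bits set. */
static uint64_t low_bits(unsigned width)
{
    return (width == 0u) ? 0u : ((UINT64_C(1) << width) - 1u);
}

/* Extracts the fields; narrower fields are zero extended. */
lb_fields unpack_lb_pixel(uint64_t pixel, lb_format f)
{
    assert(f.stencil_width <= 8u && f.gid_width <= 4u);
    assert(f.stencil_position <= 23u && f.gid_position <= 23u);

    lb_fields out;
    out.depth = (uint32_t)(pixel & low_bits(f.depth_width));
    if (f.depth_width == 15u) {
        /* 15 bit depth: one bit stencil and GID both come from bit 15. */
        out.stencil = (uint32_t)((pixel >> 15) & 1u);
        out.gid = out.stencil;
    } else {
        out.stencil = (uint32_t)((pixel >> (16u + f.stencil_position))
                                 & low_bits(f.stencil_width));
        out.gid = (uint32_t)((pixel >> (16u + f.gid_position))
                             & low_bits(f.gid_width));
    }
    return out;
}
```

With a 24 bit depth field, an 8 bit stencil at position 8 and a 4 bit GID at position 16, the fields occupy bits 0...23, 24...31 and 32...35 respectively and do not overlap.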
The operations are specified by the **LBWriteMode** Operation field in bits 29-31:

| Bits | Name | Read | Write | Reset | Description |
|---------|-----------|------|-------|-------|-------------|
| 29...31 | Operation | ✓ | ✓ | X | This field defines where the data is to be taken from to do the write and what is to happen to it afterwards. This is only of interest during an upload or download operation. The options are: 0 = No operation, 1 = Download depth, 2 = Download stencil, 3 = Upload depth, 4 = Upload stencil |

Table 4.5 Localbuffer Read/Write Modes

Note that the **LBReadFormat** and **LBWriteFormat** registers should not be written to while there are pending reads to the localbuffer. To avoid this, a write to these registers should normally be preceded by a **WaitForCompletion** command.

## 4.1.1 Mode Registers

The **LBDestReadMode** register is as shown below:

# LBDestReadMode LBDestReadModeAnd LBDestReadModeOr

| Name | Type | Offset | Format |
|-------------------|-------------|--------|---------------------|
| LBDestReadMode | Localbuffer | 0xB500 | Bitfield |
| LBDestReadModeAnd | Localbuffer | 0xB580 | Bitfield Logic Mask |
| LBDestReadModeOr | Localbuffer | 0xB588 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read <sup>10</sup> | Write | Reset | Description |
|---------|----------------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | X | This bit, when set, causes fragments or spans to read from the destination buffer. |
| 1 | Reserved | × | × | X | |
| 2...4 | StripePitch | ✓ | ✓ | X | This field specifies the number of scanlines between the first scanline in a stripe and the first scanline in the next stripe. (It would normally be set to the number of RXs * StripeHeight.) The options are: 0 = 1, 1 = 2, 2 = 4, 3 = 8, 4 = 16, 5 = 32, 6 = 64, 7 = 128. This field will normally be set to zero for Permedia4. |
| 5...7 | StripeHeight | ✓ | ✓ | X | This field specifies the number of scanlines in a stripe. The options are: 0 = 1, 1 = 2, 2 = 4, 3 = 8, 4 = 16. This field will normally be set to zero for Permedia4. |
| 8 | Layout | ✓ | ✓ | X | This field selects the layout of the pixel data in memory for the destination buffer. The options are: 0 = Linear, 1 = Patch64 |
| 9 | Origin | ✓ | ✓ | X | This field selects where the window origin is for the destination buffer. The options are: 0 = Top Left, 1 = Bottom Left |
| 10 | UseReadEnables | ✓ | ✓ | X | When this bit is set the enables in the LBDestReadEnables register are used to determine if a destination read is required. The Enable bit must also be set for a read to occur. |
| 11 | Packed16 | ✓ | ✓ | X | When this bit is set the pixel size is 16 bits so a single memory word can hold 8 depth values. |
| 12...23 | Width | ✓ | ✓ | X | This field holds the width of the destination buffer. Its range is 0...4095. |

<sup>10</sup> Logic Op register readback is via the main register only

Notes: Defines the localbuffer destination read operation. The destination address calculations are controlled by the LBDestReadMode register and the address is a function of the X, Y, LBDestReadBufferAddr, LBDestReadBufferOffset, width and Packed16 parameters. The logic operator equivalents behave the same way but the new mode is AND'd or OR'd with the former mode before replacing it.
Figure 4-1 LBDestReadMode Register

## LBWriteMode LBWriteModeAnd LBWriteModeOr

| Name | Type | Offset | Format |
|----------------|-------------|--------|----------|
| LBWriteMode | Localbuffer | 0x88C0 | Bitfield |
| LBWriteModeAnd | Localbuffer | 0xAC80 | Bitfield |
| LBWriteModeOr | Localbuffer | 0xAC88 | Bitfield |

Control register

| Bits | Name | Read <sup>11</sup> | Write | Reset | Description |
|---------|--------------|------|-------|-------|-------------|
| 0 | WriteEnable | ✓ | ✓ | X | This bit, when set, causes fragments or spans to be written to the destination buffer. Note each byte must also be enabled in the ByteEnables field. |
| 1...2 | Reserved | 0 | 0 | X | |
| 3...5 | StripePitch | ✓ | ✓ | X | This field specifies the number of scanlines between the first scanline in a stripe and the first scanline in the next stripe. It would normally be set to the number of RXs * StripeHeight. The options are: 0 = 1, 1 = 2, 2 = 4, 3 = 8, 4 = 16, 5 = 32, 6 = 64, 7 = 128. This field will normally be set to zero for Permedia4. |
| 6...8 | StripeHeight | ✓ | ✓ | X | This field specifies the number of scanlines in a stripe. The options are: 0 = 1, 1 = 2, 2 = 4, 3 = 8, 4 = 16. This field will normally be set to zero for Permedia4. |
| 9 | Layout | ✓ | ✓ | X | This field selects the layout of the pixel data in memory for the destination buffer. The options are: 0 = Linear, 1 = Patch64 |
| 10 | Origin | ✓ | ✓ | X | This field selects where the window origin is for the destination buffer. The options are: 0 = Top Left, 1 = Bottom Left |
| 11 | Packed16 | ✓ | ✓ | X | When this bit is set the pixel size is 16 bits so a single memory word can hold 8 depth values. |
| 12...23 | Width | ✓ | ✓ | X | This field holds the width of the destination buffer. Its range is 0...4095. |
| 24...28 | ByteEnables | ✓ | ✓ | X | This field holds the byte enables for each byte in the pixel. A byte enable bit must be set for the corresponding byte to be written. Ideally the depth, stencil, etc. fields are byte aligned and integral bytes in length so these can be used to disable modifying a field, otherwise read-modify-write operations will need to be done. |
| 29...31 | Operation | ✓ | ✓ | X | This field defines where the data is to be taken from to do the write and what is to happen to it afterwards. This is only of interest during an upload or download operation. The options are: 0 = No operation, 1 = Download depth, 2 = Download stencil, 3 = Upload depth, 4 = Upload stencil |

<sup>11</sup> Logic Op register readback is via the main register only

Notes: The write requests have two forms:

- Single pixel. This is the normal mode for 3D operation but is only used for exotic 2D operations. The calculated address is always a pixel address and this is shifted to take into account the width of a pixel (16 or 32 bits) in calculating the memory address and byte enables. The pixel data (Z, stencil and GID) are formatted and shifted into the correct byte lanes for the memory.
- Pixel spans. Spans are useful for clearing down the local buffer but do not use any block fill capabilities of the memory (these are only available through the FB Write Unit), although 4 or 8 pixels will be cleared down per cycle.
- N.B. Write operation is not compatible with GLINT MX for programming purposes.

The logic operator equivalents behave the same way but the new mode is AND'd or OR'd with the former mode before replacing it.

Figure 4-2 LBWriteMode Register

In **LBWriteMode** the LSB enables writes to the destination buffer. Other bits control byte enables and upload/download characteristics.

The localbuffer format must be specified for both reads and writes using the **LBReadFormat** and **LBWriteFormat** registers. Normally these registers are set to identical values. It may be useful to set them to different values when, say, copying between two windows using different depth widths. In all cases care should be taken to ensure that the field widths and positions are such that the fields do not overlap.

#### **LBWriteFormat**

| Name | Type | Offset | Format |
|---------------|-------------|--------|----------|
| LBWriteFormat | Localbuffer | 0x88C8 | Bitfield |

Control register

| Bits | Name | Read | Write | Reset | Description |
|---------|-----------------|------|-------|-------|-------------|
| 0...1 | DepthWidth | ✓ | ✓ | X | This field specifies the width of the depth field. The depth field always starts at bit position 0. The width options are: 0 = 16 bits, 1 = 24 bits, 2 = 31 bits, 3 = 15 bits. When the depth width is 15 the GID and Stencil fields are ignored and a one bit GID and Stencil are taken from bit 15. Only one of the GID or Stencil operations should be enabled, to select the desired field type. |
| 2...5 | StencilWidth | ✓ | ✓ | X | This field specifies the width of the stencil field. The legal range of values is 0...8. The stencil field always starts at the bit position given in the next field. |
| 6...10 | StencilPosition | ✓ | ✓ | X | This field holds the position of the least significant bit of the stencil field. The legal range of values is 0...23, representing bit positions 16...39 respectively. |
| 11...19 | Reserved | 0 | 0 | X | |
| 20...22 | GIDWidth | ✓ | ✓ | X | This field specifies the width of the Graphics ID field. The legal range of values is 0...4. The GID field always starts at the bit position given in the <i>GIDPosition</i> field. |
| 23...27 | GIDPosition | ✓ | ✓ | X | This field holds the position of the least significant bit of the Graphics ID field. The legal range of values is 0...23, representing bit positions 16...39 respectively. |
| 28...31 | Reserved | 0 | 0 | X | |

Notes: This register defines the position and width of the depth, stencil and GID (Graphics ID) fields in the data read back from the local buffer.

Figure 4-3 LBWriteFormat Register Layout

## 4.2 Window register

A number of localbuffer operations, particularly Stencil, are conditioned by the **Window** register.

- The *ForceLBUpdate* bit is used to allow all the fields in the localbuffer to be updated simultaneously. *ForceLBUpdate* overrides all stencil and depth testing. This is useful during initialization and copy operations.
- When the *LBUpdateSource* bit is set the source of the stencil and depth data is determined by the **StencilMode** and **DepthMode** registers respectively.
- The *OverrideWriteFiltering* control bit, when set, causes the testing of LBData = LBWriteData to always fail. This is mainly used when the GID field needs to be changed. It also allows the LBReadFormat to be different to the LBWriteFormat so the write data as seen by the memory is really different to the data that was read.
- *LBUpdateSource* is used in conjunction with the *ForceLBUpdate* bit to select whether the source data comes from the localbuffer or from values held in local registers (Depth, Window, Stencil).
- The combination of *LBUpdateSource* being set to LBSourceData and the *ForceLBUpdate* bit being enabled is particularly useful when copying a window from one location on the screen to another.
- The combination of *LBUpdateSource* being set to Registers and the *ForceLBUpdate* bit being enabled is particularly useful for initializing the contents of the various localbuffer fields in a window.
- Normally Permedia4 detects the case where the data to be written to the localbuffer is the same as the data read from the localbuffer, and avoids performing the write. Setting the *OverrideWriteFiltering* bit prevents these writes from being filtered out. This is of value when the localbuffer read format is different from the localbuffer write format, since the comparison is done on the internal data format.

## 4.3 Pixel Ownership (GID) Test Unit

Any fragment generated by the rasterizer may undergo a pixel ownership test. This test establishes the current fragment's write permission to the localbuffer and framebuffer.

## 4.3.1 Pixel Ownership Test

The ownership of a pixel is established by testing the GID of the current window against the GID of a fragment's destination in the GID buffer. If the test passes, then a write can take place, otherwise the write is discarded. The sense of the test can be set to one of: always pass, always fail, pass if equal, or pass if not equal. Pass if equal is the normal mode. In Permedia4 the GID planes, if present, are 4 bits deep, allowing 16 possible Graphic IDs. If GIDMode is disabled fragments pass through undisturbed.
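The four senses of the pixel ownership test can be sketched as below. This is illustrative only: the enumeration and function names are hypothetical, and the numeric values follow the *CompareMode* encoding given in the **GIDMode** register description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the GIDMode CompareMode options. */
enum gid_compare_mode {
    GID_ALWAYS_PASS    = 0,
    GID_NEVER_PASS     = 1,
    GID_PASS_EQUAL     = 2,  /* the normal mode */
    GID_PASS_NOT_EQUAL = 3
};

/* Returns true when a fragment may write to its destination pixel.
 * buffer_gid    : GID read from the localbuffer (4 bits)
 * compare_value : GID of the current window (4 bits) */
bool gid_test(enum gid_compare_mode mode, uint8_t buffer_gid,
              uint8_t compare_value)
{
    assert(buffer_gid < 16u && compare_value < 16u);

    switch (mode) {
    case GID_ALWAYS_PASS:    return true;
    case GID_NEVER_PASS:     return false;
    case GID_PASS_EQUAL:     return buffer_gid == compare_value;
    case GID_PASS_NOT_EQUAL: return buffer_gid != compare_value;
    }
    return false;  /* unreachable for valid modes */
}
```

In the normal "pass if equal" mode a window with GID 3 can write only to pixels whose GID plane value is also 3.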
Pixel ownership is controlled by the relevant LB Format and **GIDMode** registers: ## GIDMode GIDModeAnd GIDModeOr | Name | Type | Offset | Format | |-------------|-------------|---------|---------------------| | GIDMode | Localbuffer | 0xB538 | Bitfield | | GIDMode And | Localbuffer | 0x B5B0 | Bitfield Logic Mask | | GIDMode Or | Localbuffer | 0x B5B8 | Bitfield Logic Mask | Control registers | Bits | Name | Read<br>12 | Write | Reset | Description | |------|--------------------|------------|-------|-------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0 | Fragment<br>Enable | • | • | X | This bit, when set, causes GID testing to occur on fragments. If the test fails then the fragment is discarded | | 1 | Span Enable | ~ | ~ | X | This bit, when set, allows the span pixel mask to be modified by GID testing each pixel. The mask is modified to disable those pixels which fail the test. | | 25 | Compare Value | • | ~ | x | This field holds the 4 bit GID value to compare against. Unused bits (where the GID width in the local buffer format is less than 4 bits) should be set to zero. | | 67 | Compare Mode | ~ | ~ | X | This field holds the comparison modes available for use during GID testing. The options are: 0 = Always pass 1 = Never pass (i.e. always fail) 2 = Pass when local buffer gid == CompareValue 3 = Pass when local buffer gid!= CompareValue | | 89 | Replace Mode | ~ | • | X | This field specifies the replacement mode. This is independent of the FragmentEnable bit (except when the replacement depends on the outcome of the GID test). The options are: 0 = Always replace 1 = Never replace 2 = Replace on GID test pass. 
3 = Replace on GID test fail |
| 10...13 | Replace Value | ✓ | ✓ | x | This field holds the 4 bit GID value to replace the value read from the local buffer, if the replace mode is satisfied. |
| 14...31 | Reserved | 0 | 0 | x | Reserved |

Figure 4-4 GIDMode Register

The *CompareMode* field will generally be set to 'Pass if Equal' for GID testing, with the current GID in the appropriate field.

<sup>12</sup> Logic Op register readback is via the main register only

#### 4.4 Stencil Test

The stencil test conditionally rejects fragments based on the outcome of a comparison between the value in the stencil buffer and a reference value. The stencil buffer is updated according to the current stencil update mode, which depends on the result of the stencil test and the depth test. This test only occurs if all the preceding tests (bitmask, scissor, stipple, alpha, pixel ownership) have passed.

The stencil test is controlled by the stencil function and the stencil operation. The stencil function controls the test between the reference stencil value and the value held in the stencil buffer. If the test is LESS and the result is true then the fragment value is less than the source value. The stencil operation controls the updating of the stencil buffer, and is dependent on the result of the stencil and depth tests.

The table below shows the stencil functions available:

| Mode | Comparison Function |
|------|---------------------|
| 0 | Never |
| 1 | Less |
| 2 | Equal |
| 3 | Less or Equal |
| 4 | Greater |
| 5 | Not Equal |
| 6 | Greater or Equal |
| 7 | Always |

**Table 4.6 Stencil Functions**

If the stencil test is enabled then the stencil buffer will be updated depending on the outcome of both the stencil and the depth tests (if the depth test is disabled the depth result is set to pass).
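The comparison functions of Table 4.6 and the update rule just described can be sketched in C as a software model. All names here are hypothetical. The operation codes are taken from Table 4.8, the selection of dppass, dpfail or sfail follows Table 4.7, and masking the Replace and Invert results to the stencil width is an assumption made for the sketch rather than documented hardware behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Comparison function codes, as listed in Table 4.6. */
enum compare_fn {
    CMP_NEVER = 0, CMP_LESS = 1, CMP_EQUAL = 2, CMP_LEQUAL = 3,
    CMP_GREATER = 4, CMP_NOTEQUAL = 5, CMP_GEQUAL = 6, CMP_ALWAYS = 7
};

/* Update operation codes, as listed in Table 4.8. */
enum stencil_op {
    OP_KEEP = 0, OP_ZERO = 1, OP_REPLACE = 2,
    OP_INCREMENT = 3, OP_DECREMENT = 4, OP_INVERT = 5
};

/* Unsigned test "reference <fn> source"; source is the buffer value. */
static bool stencil_compare(enum compare_fn fn, uint8_t reference, uint8_t source)
{
    switch (fn) {
    case CMP_NEVER:    return false;
    case CMP_LESS:     return reference <  source;
    case CMP_EQUAL:    return reference == source;
    case CMP_LEQUAL:   return reference <= source;
    case CMP_GREATER:  return reference >  source;
    case CMP_NOTEQUAL: return reference != source;
    case CMP_GEQUAL:   return reference >= source;
    case CMP_ALWAYS:   return true;
    }
    return false;
}

/* Pick dppass, dpfail or sfail from the two test results. */
static enum stencil_op select_op(bool stencil_pass, bool depth_pass,
                                 enum stencil_op dppass,
                                 enum stencil_op dpfail,
                                 enum stencil_op sfail)
{
    if (!stencil_pass)
        return sfail;
    return depth_pass ? dppass : dpfail;
}

/* Apply an update operation to a stencil value that is width_bits wide. */
static uint8_t stencil_update(enum stencil_op op, uint8_t source,
                              uint8_t reference, unsigned width_bits)
{
    assert(width_bits >= 1 && width_bits <= 8);
    const uint8_t max = (uint8_t)((1u << width_bits) - 1u);
    switch (op) {
    case OP_KEEP:      return source;
    case OP_ZERO:      return 0;
    case OP_REPLACE:   return (uint8_t)(reference & max);   /* masking assumed */
    case OP_INCREMENT: return (uint8_t)(source < max ? source + 1 : max);
    case OP_DECREMENT: return (uint8_t)(source > 0 ? source - 1 : 0);
    case OP_INVERT:    return (uint8_t)(~source & max);     /* masking assumed */
    }
    return source;
}
```

A disabled depth test is modelled by passing `depth_pass = true`, which matches the statement that the depth result is then set to pass.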
Refer to the tables below and the definition of the **StencilMode** register in section 4.4.1 to fully understand their relationship.

| Depth Test | Stencil Test: Pass | Stencil Test: Fail |
|------------|--------------------|--------------------|
| Pass | dppass | sfail |
| Fail | dpfail | sfail |

**Table 4.7 Possible Update Operations for Stencil Planes**

The entries dppass, dpfail and sfail are set to one of the update operations below. Source stencil is the value in the stencil buffer:

| Update Method | Mode | Stencil Value |
|---------------|------|---------------|
| Keep | 0 | Source stencil |
| Zero | 1 | 0 |
| Replace | 2 | Reference stencil |
| Increment | 3 | Clamp (Source stencil + 1) to 2<sup>stencil width</sup> - 1 |
| Decrement | 4 | Clamp (Source stencil - 1) to 0 |
| Invert | 5 | ~Source stencil |

Table 4.8 Stencil Operations

In addition a comparison bit mask is supplied in the **StencilData** register. This is used to establish which bits of the source and reference value are used in the stencil function test. It should normally be set to exclude the top four bits when the stencil width has been set to 4 bits in the **StencilMode** register.

The source stencil value can be from a number of places as controlled by bits 13...14 (*StencilSource*) in the **StencilMode** register:

| Stencil Source | Mode | Use |
|----------------|------|-----|
| Test logic | 0 | This is the normal mode. |
| Stencil register | 1 | This is used, for instance, in the OpenGL draw pixels function where the host supplies the stencil values in the **Stencil** register.
It is also used when a constant stencil value is needed, for example, when clearing the stencil buffer when fast clear planes are not available. |
| Source stencil value read from the localbuffer | 2 | This is used, for instance, in the OpenGL copy pixels function when the stencil planes in the destination are **not** to be updated. The stencil data comes from the localbuffer. |
| LBSourceData (stencil value read from the localbuffer) | 3 | This is used, for instance, in the OpenGL copy pixels function when the stencil planes are to be copied to the destination. |

Table 4.9 Stencil Sources

See *The OpenGL Reference Manual* and *The OpenGL Programming Guide* from Addison-Wesley for more details of stencil operations and examples of their use.

# 4.4.1 Registers

The stencil test is controlled by the **StencilMode** register:

## StencilMode StencilModeAnd StencilModeOr

| Name | Type | Offset | Format |
|----------------|---------|--------|---------------------|
| StencilMode | Stencil | 0x8988 | Bitfield |
| StencilModeAnd | Stencil | 0xAC60 | Bitfield Logic Mask |
| StencilModeOr | Stencil | 0xAC68 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read | Write | Reset | Description |
|------|------|------|-------|-------|-------------|
| 0 | Unit enable | ✓ | ✓ | x | 0 = Disable<br>1 = Enable |
| 1...3 | Update method | ✓ | ✓ | x | If Depth test passes and Stencil test passes (see Table 4.8) |
| 4...6 | Update method | ✓ | ✓ | x | If Depth test fails and Stencil test passes (see Table 4.8) |
| 7...9 | Update method | ✓ | ✓ | x | If Stencil test fails (see Table 4.8) |
| 10...12 | Mode 0-7 | ✓ | ✓ | x | Unsigned comparison function (see Table 4.6) |
| 13...14 | Stencil source | ✓ | ✓ | x | 0 = Test Logic<br>1 = Stencil Register<br>2 = LBData<br>3 = LBSourceData |
| 15...16 | Stencil widths | ✓ | ✓ | x | 0 = 4 bits<br>1 = 8 bits<br>2 = 1 bit |
| 17...31 | Unused | 0 | 0 | x | |

Figure 4-5 StencilMode Register

The **StencilData** register holds the other data associated with the test.

# StencilData StencilDataAnd StencilDataOr

| Name | Type | Offset | Format |
|----------------|---------|--------|---------------------|
| StencilData | Stencil | 0x8990 | Bitfield |
| StencilDataAnd | Stencil | 0xB3E0 | Bitfield Logic Mask |
| StencilDataOr | Stencil | 0xB3E8 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read | Write | Reset | Description |
|------|------|------|-------|-------|-------------|
| 0...7 | Stencil value | ✓ | ✓ | x | 8 bit stencil test value |
| 8...15 | Compare mask | ✓ | ✓ | x | Determines which bits are significant in the test |
| 16...23 | Writemask | ✓ | ✓ | x | Determines which bits in localbuffer are updated |
| 24...31 | Reserved | 0 | 0 | x | |

Figure 4-6 StencilData Register

The stencil writemask is used to control which stencil planes are updated as a result of the test.

The **Stencil** register holds an externally sourced stencil value. It is a 32 bit register of which only the least significant 8 bits are used. The unused most significant bits should be set to zero.

The stencil unit must be enabled to update the stencil buffer. If it is disabled then the stencil buffer will only be updated if *ForceLBUpdate* is set in the **Window** register.

### 4.4.2 Stencil Example

This example sets the stencil unit to use a supplied reference value (0x80) and to test fragments to be LESS than this value. It also sets the stencil planes update function to Increment if the stencil test passes and the depth test passes (or is not enabled); otherwise the update function is set to Keep.
```
// Set the localbuffer read and write modes

// Set the stencil modes
stencilMode.UnitEnable = PERMEDIA4_ENABLE
stencilMode.DPPass = PERMEDIA4_STENCIL_METHOD_INCREMENT
stencilMode.DPFail = PERMEDIA4_STENCIL_METHOD_KEEP
stencilMode.SFail = PERMEDIA4_STENCIL_METHOD_KEEP
stencilMode.CompareFunction = PERMEDIA4_STENCIL_COMPARE_LESS
stencilMode.StencilSource = PERMEDIA4_SOURCE_TEST_LOGIC
stencilMode.Width = as appropriate
StencilMode(stencilMode)

// Set the reference stencil value and set the
// compare and writemasks to 0xFF
stencilData.ReferenceStencil = 0x80
stencilData.CompareMask = 0xFF
stencilData.StencilWriteMask = as appropriate for width of Stencil buffer
stencilData.FCStencil = don't care
StencilData(stencilData)

// Enable the depth test here if required; if not enabled the
// result of the depth test is set to pass.
```

# 4.5 Depth Test

The depth (Z) test, if enabled, compares a fragment's depth against the corresponding depth in the depth buffer. The result of the depth test can affect the stencil buffer update if stencil testing is enabled. This test is only performed if all the preceding tests (bitmask, scissor, stipple, alpha, pixel ownership, stencil) have passed. The comparison tests available are:

| Mode | Comparison Function |
|------|-----------------------|
| 0 | Never |
| 1 | Less |
| 2 | Equal |
| 3 | Less Than or Equal |
| 4 | Greater |
| 5 | Not Equal |
| 6 | Greater Than or Equal |
| 7 | Always |

Table 4.10 Depth Comparison Modes.

The test compares the fragment's depth against a source depth value. If the compare function is LESS and the result is true then the fragment value is less than the source value. The source value can be obtained from a number of places as controlled by a field in the **DepthMode** register.
| Source | Use |
|--------|-----|
| DDA (see below) | This is used for normal depth (Z) buffered 3D rendering. |
| **Depth** register | This is used, for instance, in the OpenGL draw pixels function where the host supplies the depth values through the **Depth** register. Alternatively this is used when a constant depth value is needed, for example, when clearing the depth buffer or for 2D rendering where the depth is held constant. |
| LBSourceData | Source depth value from the localbuffer. This is used, for instance, in the OpenGL copy pixels function when the depth planes are to be copied to the destination. |
| Source Depth | This is used by X during a window copy operation where all the fields in the pixel are moved. It is also used in the OpenGL CopyPixels function when the depth planes in the destination are not updated. The depth data will come either from the LBData message or the FCDepth register, depending on the state of the Fast Clear modes in operation. |

Table 4.11 Depth Sources

When using the depth DDA for normal depth buffered rendering operations the depth values required are similar to those required for the color values in the color DDA unit:

- Zstart = Start Z value.
- dZdYDom = Increment along dominant edge.
- dZdX = Increment along the scan line.

The dZdX value is not required for Z-buffered lines.
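The way the three DDA parameters combine can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the hardware datapath: the type and function names are hypothetical, and depth is held as a signed fixed-point value with 16 fractional bits in a 64-bit integer purely for convenience.

```c
#include <assert.h>
#include <stdint.h>

#define Z_FRAC_BITS 16          /* assumed number of fractional bits  */
typedef int64_t zfixed;         /* signed fixed-point depth value     */

struct z_dda {
    zfixed z_start;    /* Zstart:  depth at the start of the dominant edge */
    zfixed dz_dx;      /* dZdX:    increment per pixel along the scan line */
    zfixed dz_dy_dom;  /* dZdYDom: increment per step along dominant edge  */
};

/* Fill out[0..count-1] with the depth values for one span that starts
 * `row` steps down the dominant edge, using repeated addition in the
 * manner of a DDA. */
static void z_span(const struct z_dda *dda, int row, int count, zfixed *out)
{
    assert(row >= 0 && count >= 0);
    zfixed z = dda->z_start;
    for (int y = 0; y < row; y++)
        z += dda->dz_dy_dom;        /* walk down the dominant edge */
    for (int x = 0; x < count; x++) {
        out[x] = z;
        z += dda->dz_dx;            /* walk along the scan line */
    }
}
```

For a line only the edge walk is needed, which is consistent with dZdX not being required in that case.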
Figure 4-7 Depth Interpolation

The number format for the increment values is 2's complement fixed point: 32 bits integer and 16 bits fraction. All the start, derivative and internal data is in this format. This is mapped into the Upper and Lower registers (U and L) as shown below:

Figure 4-8 Depth Derivative Format.

The depth unit must be enabled to update the depth buffer. If it is disabled then the depth buffer will only be updated if *ForceLBUpdate* is set in the **Window** register.

## 4.5.1 Registers

Operation of the Depth unit is controlled by the **DepthMode** register:

# DepthMode DepthModeAnd DepthModeOr

| Name | Type | Offset | Format |
|--------------|-------|--------|---------------------|
| DepthMode | Depth | 0x89A0 | Bitfield |
| DepthModeAnd | Depth | 0xAC70 | Bitfield Logic Mask |
| DepthModeOr | Depth | 0xAC78 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read<sup>13</sup> | Write | Reset | Description |
|------|------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | x | This bit, when set, enables the depth test and the replacement depth value to depend on the outcome of the test. Otherwise the test always passes and the depth data in the local buffer is not changed. |
| 1 | WriteMask | ✓ | ✓ | x | This bit, when set, enables the depth value in the local buffer to be updated when doing a read-modify-write operation. The byte enables (LB Write) can also be used when the Z value is 16 or 24 bits in size. |
| 2...3 | NewDepthSource | ✓ | ✓ | x | The depth value to write to the local buffer can come from several places. The options are:<br>0 = DDA<br>1 = Source depth (i.e. read from Local Buffer)<br>2 = Depth register<br>3 = LBSourceData register. Only generated when source and destination reads are enabled. |
| 4...6 | CompareFunction | ✓ | ✓ | x | This field selects the compare function to use. The options are:<br>0 = Never<br>1 = Less<br>2 = Equals<br>3 = Less Equals<br>4 = Greater<br>5 = Not Equal<br>6 = Greater Equal<br>7 = Always |
| 7...8 | Width | ✓ | ✓ | x | This field holds the width in bits of the depth field in local buffer. The options are:<br>0 = 16 bits wide<br>1 = 24 bits wide<br>2 = 31 bits wide<br>3 = 15 bits wide |
| 9 | Normalise | ✓ | ✓ | x | This bit, when set, will use all 50 bits of the DDA for Z interpolation, even for 24 or less bits of depth. The Width field must be set up to restrict the number of bits used in the comparison operation. When this bit is clear the depth test is compatible with GLINT MX. This bit should be 0 if NonLinearZ is set. |
| 10 | NonLinearZ | ✓ | ✓ | x | This bit, when set, enables the 32 bit DDA Z value to be encoded in 15, 16 or 24 bits using a non linear pseudo floating point representation. The non linear format is controlled by the following two fields. |

<sup>13</sup> Logic Op register readback is via the main register only

| Bits | Name | Read | Write | Reset | Description |
|------|------|------|-------|-------|-------------|
| 11...12 | Exponent Scale | ✓ | ✓ | x | This field defines how much the exponent should be scaled by.
The options are:<br>0 = scale by 1<br>1 = scale by 2<br>2 = scale by 4<br>3 = scale by 8 |
| 13...14 | Exponent Width | ✓ | ✓ | x | This field defines the number of bits in the depth word to use as exponent bits. The options are:<br>0 = 1 bit wide exponent field<br>1 = 2 bits wide<br>2 = 3 bits wide<br>3 = 4 bits wide |
| 15...31 | Unused | 0 | 0 | x | |

Notes: The register defines Depth operation. It controls the comparison of a fragment's depth value and updating of the depth buffer. (If the compare function is LESS and result = TRUE then the fragment value is less than the source value.) The logic operator equivalents behave the same way but the new mode is AND'd or OR'd with the former mode before replacing it.

Figure 4-9 DepthMode Register.

The single bit writemask is used to control updating all the bits in the depth buffer.

Depth values can come from the **Depth** register or Source or Destination Framebuffer reads, or the DDA. The **Depth** register holds an externally sourced 32 bit depth value. If the depth buffer holds less than 32 bits then the user supplied depth value is right justified to the least significant end of the register. The unused most significant bits should be set to zero.

The DDA and other registers are shown below (note the increment values are split into two registers):

| Register | Description |
|----------|-------------|
| ZStartU, ZStartL | Depth start value |
| dZdxU, dZdxL | Depth derivative per unit X |
| dZdyDomU, dZdyDomL | Depth derivative per unit Y, dominant edge, or along a line |

Table 4.12 Depth Interpolation Registers.

## 4.5.2 Depth Example

Rendering a Gouraud shaded depth buffered trapezoid.
```
// Set the localbuffer read and write modes

// Set the depth mode
depthMode.UnitEnable = PERMEDIA4_ENABLE
depthMode.WriteMask = 1
depthMode.NewDepthSource = PERMEDIA4_NEW_DEPTH_SOURCE_DDA
depthMode.CompareMode = PERMEDIA4_DEPTH_COMPARE_MODE_LESS
DepthMode(depthMode)

// Load the depth start values and deltas for
// dominant edge and the body of the trapezoid
ZStartU()      // Load upper and lower start values
ZStartL()
dZdxU()        // Load upper and lower dZdX deltas
dZdxL()
dZdyDomU()     // Load upper and lower dominant edge deltas
dZdyDomL()

// Enable unit in Gouraud shading mode
colorDDAMode.UnitEnable = PERMEDIA4_ENABLE
colorDDAMode.Shade = PERMEDIA4_GOURAUD_SHADE_MODE
ColorDDAMode(colorDDAMode)

// Load the color start values and deltas for
// dominant edge and the body of the trapezoid
Rstart()       // Set-up the red component start value
dRdX()         // Set-up the red component increments
dRdYDom()
Gstart()       // Set-up the green component start value
dGdX()         // Set-up the green component increments
dGdYDom()
Bstart()       // Set-up the blue component start value
dBdX()         // Set-up the blue component increments
dBdYDom()

// Render primitive
```

# 5 Texture Mapping

Texture Mapping memory management was introduced in Volume I, section 4.5 - Texture Mapping. The following pages describe the process from the graphics programming point of view. For a discussion of the theory and practice of texture mapping, see the *OpenGL Specification* and the *OpenGL Programming Guide*.

For each fragment within a primitive, texture mapping involves the following steps:

1. calculate the perspectively correct texture coordinates for each fragment
2. calculate the level of detail for mipmapping
3. convert texture coordinates into memory indices
4. load texels into primary cache
5. format cache data into texels for filtering
6. check color values and optionally replace a range with alpha values to indicate transparency
7. filter texels from cache based on color components
8.
composite the color and texel values with constant color values to produce a final texture value.

These fall into several different phases of operation:

1. Coordinate interpolation and perspective correction
2. Memory indexing
3. Cache loading
4. Alpha and texture filtering and border color
5. Texel compositing
6. Color value calculation and application including lighting effects and application modes

# 5.1.1 Compatibility with Earlier Chipsets

- Color interpolation is largely unchanged although **TextureAddressMode** is now named **TextureCoordMode**, and **TextureLODBiasS** and **TextureLODBiasT** need to be set to 0 to be compatible with the GLINT MX.
- Level of Detail calculations now use **TextureFilterMode** instead of **TextureReadMode**. Supported texels must be 8, 16 or 32 bpp; 1, 2 and 4 bpp texels are not supported.
- **TextureReadMode** is not backward compatible with the MX chipset.
- LUT control registers have been consistently renamed (LUT[0...15], LUTAddress, LUTIndex, LUTData, LUTTransfer, LUTMode).
- The **TextureColorMode** register has been renamed **TextureApplicationMode** and the Color and Alpha data are managed separately during compositing and application.
- **TextureFilterMode** *enable* must be set (=1) when texture mapping is enabled. The enable bit works in conjunction with the *TextureEnable* bit in the **Render** command.
- GID is no longer controlled by the

#### 5.2 Texture Co-ordinate Generation

To generate the texture addresses, DDAs are used to interpolate the texture coordinates over a trapezoid or line primitive. There are two general modes of operation: 2D and 3D.
In 3D mode, the task divides into the following steps:

- interpolate the texture coordinates (S, T, Q) using the DDA units
- perspective correction of the coordinates by calculating S/Q and T/Q
- level of detail calculation
- wrap the corrected coordinates (s, t) using mirror, repeat or clamp operations to map the coordinates into the range 0.0 to 1.0 (u, v)
- pass the resulting coordinates (u, v) to the texture read unit.

For the 2D mode, the perspective correction stage is omitted, the wrap operation is always a repeat operation and no level of detail calculation is performed.

#### 5.2.1 Calculate texture coordinates

Coordinate interpolation can be either 2D or 3D (set in the **TextureIndexMode** register):

For 2D operations the step or span messages trigger interpolation of the S and T coordinates (Q, S1, T1 and Q1 are not used). This is used for tiled fills, characters and icons, arbitrary large stipple patterns, color index dithering etc.

For 3D operations, **TextureCoordMode** interpolates two sets of texture coordinates (S, T and Q, and S1, T1 and Q1) and corrects them for perspective and range before they are used for cache loading. The coordinates can be used for (a) determining the Level of Detail for MIP mapping, or (b) calculating a 3D texture coordinate.

When used for LOD, the S, T and Q values are applied as a set of linked coordinates to the current fragment, while S1, T1 and Q1 are automatically offset in **dY** to track the coordinates in the adjacent fragment.

When used for 3D texturing, the Delta unit allocates S, T, Q and R as a set of linked coordinates to S, T, Q and S1. T1 is ignored and Q1 is a copy of Q.

The S, T and Q parameters are interpolated in DDA units in the same way as other interpolants: the 9 control registers **SStart**, **dSdx**, **dSdyDom**, **TStart**, **dTdx**, **dTdyDom**, **QStart**, **dQdx** and **dQdyDom** hold the start, **dX** and **dYDom** parameters for S, T and Q.
The values of S, T and Q at each vertex are used to calculate the gradient values in much the same way as the color gradients when Gouraud shading. The fixed point format of these registers can be defined as you wish but must be internally consistent, so that the divide operation yields consistent internal results. One method of ensuring that the full range of accuracy available in the DDAs is used but not exceeded (the DDAs clamp if the range is exceeded) is to normalize the S, T, Q values before calculating the gradient values. For example, for a triangle primitive this involves finding the maximum absolute value of the 9 values defined at the vertices (S, T and Q at each of the three vertices), and scaling the other 8 values appropriately.

#### **5.2.1.1** Perspective Correction

At each pixel there is a division operation to achieve perspective correction of the texture coordinates and derive the s, t coordinates used to index the texture map through the equations:

$$s = \frac{S}{Q}, \qquad t = \frac{T}{Q}$$

After the division, the s, t coordinates are wrapped to lie in the range 0.0 to 1.0 inclusive (and therefore within the range of the defined texture map). The wrapped coordinates are denoted as u, v. These are used to index the raw texel data in memory.

Note: In the unusual case where perspective correction must be disabled, refer to the **DeltaControlMode** register's *ForceQto1* bit enable.

#### 5.2.2 Level of Detail calculation

The Level of Detail (LOD) calculation estimates the approximate area a fragment projects onto the texture map. The LOD value is then used to select:

- Between the minification and magnification filter modes provided in the **TextureReadMode** register.
- The one or two texture maps to use when mipmapping.
- The between-maps interpolation factor if the mipmapping requires two maps.

The LOD calculation requires the dSdy, dTdy and dQdy values to proceed. These are not supplied by the onboard Delta unit or Gamma accelerator so must be provided by the Texture unit.
The *EnableDY* bit in the **TextureCoordMode** register selects the data source for the calculation. If the *EnableDY* bit is *not* set the **dSdy**, **dTdy** and **dQdy** values can be provided externally by writing into the corresponding registers.

The LOD calculation itself is enabled by the *EnableLOD* bit in the **TextureCoordMode** register. When this bit is clear a constant LOD from the **LOD** register is used (when it is required by **TextureReadMode**). The format is unsigned 4.8 fixed point and can be interpreted as follows: the integer part selects the higher resolution map of the pair to use, with 0 using the map at the address given by the **TextureBaseAddr[0]** register; the fraction gives the between-map interpolation coefficient measured from the higher resolution map selected.

**Lod0** is the LOD value calculated as described above. This always relates to texture 0. **Lod1** is a user-supplied value relating to texture 1. Both LOD values can be clamped using **LODRange0** and **LODRange1** respectively.

LOD values can be further clamped or constrained by setting the width and height values in **TextureCoordMode**, biased in **TextureCoordinateMode** and biased and clamped in **TextureRead**. These constraints allow large textures to be loaded at a low resolution and gradually, by continuous clamping, raised to their final resolution without "popping" artefacts.

#### 5.2.2.1 Texture Coordinate Wrapping Modes

Three wrapping modes are available (Clamp, Repeat and Mirror), and s and t can be wrapped individually. The selected mode is held in the *WrapS* and *WrapT* fields in the **TextureCoordMode** register, and in the *WrapU* and *WrapV* fields in the **TextureIndexMode** register. The wrapping modes are listed in the register descriptions in the *Reference Guide*.
*Note:* These wrap modes differ from the cylindrical Direct3D texture wrap described in Volume I, which is implemented as part of the DeltaFormat unit (section 3.2.4).

| Wrapping Mode | Description |
|---------------|-------------|
| Clamp | This tests the coordinate against 1.0 and if the coordinate is larger sets the coordinate to 1.0. Similarly if the coordinate is less than 0.0 it is set to 0.0. This causes texels outside of the texture map to be set to the edge values. |
| Repeat | The integer part of the coordinate is discarded to leave just the fractional part. The Repeat mode creates a saw-tooth transfer function which, as the name suggests, causes the texture pattern to be repeated (i.e. tiled) over the polygon. Abutting edges are from opposite sides of the texture map so unless care is taken a discontinuity may be seen. |
| Mirror | This is similar to **Repeat**, but when the integer part is odd the value (1.0 - fraction) is used instead of just the fraction. This creates a triangle transfer function, which has the advantage that abutting edges always match. |

Table 5.1 Texture Wrapping - Repeat and Clamp modes are as defined by OpenGL.
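The perspective division of section 5.2.1.1 followed by the three wrapping modes in the table above can be sketched in floating point as follows. The names are hypothetical and the enum values are not the register encodings. Treating the "integer part" of a negative coordinate as its floor is an assumption of the sketch, as is the use of `double` in place of the chip's fixed point arithmetic.

```c
#include <assert.h>

/* Hypothetical identifiers; NOT the WrapS/WrapT field encodings. */
enum wrap_mode { WRAP_CLAMP, WRAP_REPEAT, WRAP_MIRROR };

/* Map a perspective-corrected coordinate s into the range 0.0 to 1.0. */
static double wrap_coord(enum wrap_mode mode, double s)
{
    assert(s > -1.0e15 && s < 1.0e15);   /* keeps the integer cast in range */
    long long whole = (long long)s;      /* truncates toward zero           */
    if ((double)whole > s)
        whole--;                         /* floor for negative values (assumed) */
    double frac = s - (double)whole;     /* fractional part, 0.0 <= frac < 1.0  */

    switch (mode) {
    case WRAP_CLAMP:
        return s < 0.0 ? 0.0 : (s > 1.0 ? 1.0 : s);
    case WRAP_REPEAT:
        return frac;                                 /* saw-tooth  */
    case WRAP_MIRROR:
        return (whole % 2 != 0) ? 1.0 - frac : frac; /* triangle   */
    }
    return s;
}

/* Perspective-correct one interpolated coordinate (S or T) and wrap it. */
static double texture_coord(double S, double Q, enum wrap_mode mode)
{
    assert(Q != 0.0);
    return wrap_coord(mode, S / Q);
}
```

For example, with S = 5.0 and Q = 2.0 the corrected coordinate is 2.5, which Repeat maps to 0.5 and Clamp maps to 1.0.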
#### 5.2.2.2 Texture Address Registers

The following registers set up the texture interpolation deltas:

| Register | Description |
|----------|----------------------------------------|
| SStart | S start value |
| dSdx | S derivative per unit X |
| dSdyDom | S derivative per unit Y, dominant edge |
| TStart | T start value |
| dTdx | T derivative per unit X |
| dTdyDom | T derivative per unit Y, dominant edge |
| QStart | Q start value |
| dQdx | Q derivative per unit X |
| dQdyDom | Q derivative per unit Y, dominant edge |
| dSdy | S derivative per unit Y |
| dTdy | T derivative per unit Y |
| dQdy | Q derivative per unit Y |

**Table 5.2 Texture Interpolation Registers**

#### 5.2.2.3 Mipmapping

A mipmap is an ordered set of arrays representing the same image. Each array has half the linear resolution of the preceding one. This technique allows minification filtering to occur with a constant time overhead irrespective of the size of the projected area. The first filter name for mipmapping in the *MinFilter* field specifies the filtering to be done on a level, and the second filter name specifies the filtering to be done between levels.

Mipmapping is enabled by setting the *MipMapEnable* bit (bit 20) in the **TextureIndexMode** register. Other mipmap parameters are also controlled by **TextureIndexMode**, including the magnification and minification filter types.

#### 5.2.3 Texture Read

The texture read phase fetches and formats texel data. This involves taking the u, v coordinates generated by the texture address unit, and possibly the LOD value, and calculating the physical address in the localbuffer where the texture is stored. The texture information (texels) is read and forwarded for Texture Filtering. The interpolation coefficients (if any are needed) are derived from the u, v coordinates (and possibly the LOD value) and passed on as well. The texture cache management process is described in Volume I, Section 4-6 - Primary Cache.
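As an illustration of how an LOD value contributes to the interpolation coefficients, the unsigned 4.8 fixed point format described in section 5.2.2 can be unpacked as shown below. The structure and function names are hypothetical, and the blend arithmetic (including its truncating division) is only one plausible way to apply the fraction, not a statement of what the filter hardware does.

```c
#include <assert.h>
#include <stdint.h>

struct lod_parts {
    unsigned level;     /* integer part: higher resolution map of the pair */
    unsigned fraction;  /* 8 bit fraction: weight toward the next map      */
};

/* Split an unsigned 4.8 fixed point LOD value into its two fields. */
static struct lod_parts lod_decode(uint16_t lod)
{
    assert(lod < (1u << 12));                  /* 4.8 occupies 12 bits */
    struct lod_parts parts = { (unsigned)(lod >> 8), (unsigned)(lod & 0xFFu) };
    return parts;
}

/* Blend one color component between the two selected maps. The weight is
 * measured from the higher resolution map, so fraction 0 returns hi_res. */
static unsigned lod_blend(unsigned hi_res, unsigned lo_res, unsigned fraction)
{
    assert(fraction < 256u);
    return (hi_res * (256u - fraction) + lo_res * fraction) / 256u;
}
```

An LOD of 0x280 therefore selects map level 2 as the higher resolution map and blends half way toward level 3.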
The Texture Read operation is controlled by **TextureReadMode0** and **TextureReadMode1**, which have the same layout. However, most modes cannot be enabled in both caches at the same time. The supported combinations are:

- One nearest or linear filtered texture using both halves of the cache to achieve higher cache hit rates on larger texture maps or polygons.
- Any two independent nearest or linear filtered textures, one per half of the cache.
- One automatically (or per pixel) mip mapped texture (always texture 0) using both halves of the cache to store alternate levels of the mip map.
- One 3D texture map using both halves of the cache to store alternate slices of the 3D volume.
- Two independent mip mapped textures where the minification filters only use texels from one level at a time (i.e. the filters are NearestMipNearest or LinearMipNearest). Each texture uses half the cache.

There are no interlocks to prevent the user selecting a non-supported combination and in this case the mode settings in **TextureReadMode0** take priority.

# TextureReadMode0 TextureReadMode0And TextureReadMode0Or

| Name | Type | Offset | Format |
|---------------------|---------|--------|---------------------|
| TextureReadMode0 | Texture | 0xB400 | Bitfield |
| TextureReadMode0And | Texture | 0xAC30 | Bitfield Logic Mask |
| TextureReadMode0Or | Texture | 0xAC38 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read<sup>14</sup> | Write | Reset | Description |
|------|------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | x | When set causes any texels needed by the fragment to be read. This is also qualified by the TextureEnable bit in the Render command. |
| 1...4 | Width | ✓ | ✓ | x | This field holds the width of the map as a power of two. The legal range of values for this field is 0 (map width = 1) to 11 (map width = 2048). This is only used when Texture3D is enabled and then is only used for cache management purposes and *not* for address calculations. |
| 5...8 | Height | ✓ | ✓ | x | This field holds the height of the map as a power of two.
The legal range of values for this field is 0 (map height = 1) to 11 (map height = 2048). This is only used when Texture3D is enabled and then is only used for cache management purposes and *not* for address calculations. |
| 9...10 | TexelSize | ✓ | ✓ | x | This field holds the size of the texels in the texture map. The options are:<br>0 = 8 bits<br>1 = 16 bits<br>2 = 32 bits<br>3 = 64 bits (Only valid for spans) |
| 11 | Texture3D | ✓ | ✓ | x | This bit, when set, enables 3D texture index generation. The CombineCaches mode bit should not be set when 3D textures are being used. |
| 12 | CombineCaches | ✓ | ✓ | x | This bit, when set, causes the two banks of the Primary Cache to be joined together, thereby increasing the size of a single texture map which can be efficiently handled. |
| 13...16 | MapBaseLevel | ✓ | ✓ | x | This field defines which TextureBaseAddr register should be used to hold the address for map level 0 when mip mapping or the texture map when not mip mapping. Successive map levels are at increasing TextureBaseAddr registers up to (and including) the MapMaxLevel (next field). 3D textures always use TextureBaseAddr0. |
| 17...20 | MapMaxLevel | ✓ | ✓ | x | This field defines the maximum TextureBaseAddr register this texture should use when mip mapping. Any attempt to use beyond this level will clamp to this level. |

<sup>14</sup> Logic Op register readback is via the main register only
| |------|----------------|----------|----------|---|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 21 | LogicalTexture | • | ~ | X | This bit, when set, defines this texture or all mip map levels, if mip mapping, to be logically mapped so undergo logical to physical translation of the texture addresses. | | 22 | Origin | • | <b>V</b> | X | This field selects where the origin is for a texture map with a Linear or Patch64 layout. The options are: 0 = Top Left. 1 = Bottom Left A Patch32_2 or Patch2 texture map is always bottom left origin. | | 2324 | TextureType | <b>'</b> | <b>/</b> | X | This field defines any special processing needed on the texel data before it can be used. The options are: 0 = Normal. | | 2527 | ByteSwap | ~ | ~ | x | This field defines the byte swapping, if any, to be done on texel data when it is used as a bitmap. This is automatically done when spans are used. Bit 27, when set, causes adjacent bytes to be swapped, bit 26 adjacent 16 bit words to be swapped and bit 27 adjacent 32 bit words to be swapped. In combination this byte swap the input (ABCDEFGH) as follows: 0 ABCDEFGH 1 BADCFEHG 2 CDABGHEF 3 ABCDEFGH 4 EFGHABCD 5 FEHGBADC 6 GHEFCDAB 7 HGFEDCBA | | 28 | Mirror | ~ | ~ | X | This bit, when set will mirror any bitmap data. This only works for spans. | | 29 | Invert | ~ | ~ | X | This bit, when set will invert any bitmap data. This only works for spans. 
| | 30 | OpaqueSpan | • | ~ | x | This bit, when set, uses the span color mask instead of the pixel mask to define foreground and background colors using the <b>FBBlockColor</b> and <b>FBBlockColorBack</b> registers. | | 31 | Reserved | 0 | 0 | X | | Notes: The unit is controlled by the TextureReadMode0 and TextureReadMode1 registers for texture 0 and texture 1 respectively. Not all combinations of modes across both registers are supported and where there is a clash the modes in TextureReadMode0 take priority. For per pixel mip mapping the TextureRead0 and TextureReadMode1 register should be set up the same as should the TextureMapWidth0 and TextureMapWidth1 registers. N.B. The layout and use of the *TextureReadMode* register is not compatible with GLINT MX: 1, 2, and 4 bit textures are no longer supported. The logic operator equivalents behave the same way but the new mode is AND'd or OR'd with the former mode before replacing it. #### Figure 5-1 TextureReadMode Register #### 5.2.4 Filter Modes All the filter modes of OpenGL are supported, that is: | Minification | Magnification | |----------------------|---------------| | Nearest | Nearest | | Linear | Linear | | NearestMipMapNearest | | | NearestMipMapLinear | | | LinearMipMapNearest | | | LinearMipMapLinear | | "Minification" is the name given to the filtering situation where multiple texels map to a single fragment, while magnification is the name given to the filtering situation where only a portion of a single texel maps to a single fragment. "Nearest" is the simplest form of filtering where the nearest texel to the texture coordinate location is selected. "Linear" is a more sophisticated filtering algorithm which is dependent on the type of primitive. For lines (which are 1D), it involves linear interpolation between the two nearest texels. For polygons and points which are considered to have finite area, linear is in fact bilinear interpolation which interpolates between the nearest 4 texels. 
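The two forms of the Linear filter described above can be sketched in a few lines of Python. This is an illustration of the conventional weighting only, not a description of the hardware arithmetic: the function names and the fractional weights `fu` and `fv` (the position of the sample between neighbouring texels, in the range 0 to 1) are assumptions introduced for the example.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for a weight t in [0, 1]."""
    return a + (b - a) * t


def linear_1d(t0, t1, fu):
    """Linear filter for lines: blend the two nearest texels."""
    return lerp(t0, t1, fu)


def bilinear(t00, t10, t01, t11, fu, fv):
    """Linear filter for polygons and points: blend the nearest 4 texels.

    t00, t10, t01 and t11 are the texels at (i, j), (i+1, j), (i, j+1)
    and (i+1, j+1); fu and fv are hypothetical fractional offsets.
    """
    row0 = lerp(t00, t10, fu)   # blend along the row j
    row1 = lerp(t01, t11, fu)   # blend along the row j+1
    return lerp(row0, row1, fv) # blend between the two rows


# Sampling exactly halfway between a dark and a bright column of texels.
print(bilinear(0, 100, 0, 100, 0.5, 0.5))  # prints 50.0
```

With `fu` and `fv` both zero the result collapses to the texel at (i, j), which is one way to see Nearest as the degenerate case of Linear.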
#### 5.2.4.1 Texture Patching

In Permedia4 the data part of the primary cache is managed by the **TextureFilterMode** register, while the tag part is managed by the **TextureReadMode** register. The Filter functionality includes data formatting and alpha mapping.

#### 5.2.4.2 Primary Cache

The primary cache holds the texel data in 8, 16 or 32 bits per texel format. The cache is divided up into 8 banks and there is a fixed relationship between a texel's position in the texture map and which bank of cache it must be stored in. The 8 banks are assigned depending on the type of texture mapping being done:

- Single bilinear: The texture map is stored in both banks of the cache. This is achieved by connecting the output of the second bank's register files to the corresponding register files in bank 0. This is controlled by the *CombineCaches* bit in **TextureFilterMode**. This allows the full size of the cache to be used on a single texture, so a larger texture map can be handled before scanline coherency starts to break down, with the consequential loss of performance.
- Dual bilinear: Texels from texture map 0 are stored in banks 0...3 and texels from texture map 1 are stored in banks 4...7.
- Mip mapping: Even mip maps are stored in banks 0...3, odd mip maps are stored in banks 4...7.
- 3D texture maps: Texels with an even k coordinate (i.e. the third coordinate) are in banks 0...3 and texels with an odd k coordinate are in banks 4...7.

Here T0...T3 represent the cache banks and the numbers in brackets are the coordinates of the texel in the map.

Storing the texture map in memory with one row following the next can give poor access times when scanning along a column, due to the page breaks. This does not apply if the texture map is smaller than the page size. When the texture map is significantly larger than the page size, access time can be made less dependent on scanning direction by patching the texture map. This ensures that a 2D region of the map is stored in one page. All the texels within a word are always sequential along a row and a patch is 16x16 words, hence the patch size in texels varies from 16x16 (for 32 bit texels) to 512x16 (for 1 bit texels). If patched texture maps are required then the patching can be done automatically during texture download<sup>15</sup>, or must be done by the host if the localbuffer bypass is used. Note that some wastage of the memory space will occur if the texture map dimensions are not an integer multiple of the patch size.

<sup>15</sup> See Volume I, Section 4.5.1.2 - Patch Layout Rules

Figure 5-2 Texture Patch Example

- Map Width: The patch mode is only useful when the width of the map exceeds 16 words.
- Map Height: The patch mode works best when the height of the map is greater than 16 texels. For maps which are less than this in height a portion of the patch will not be used, so the texel data will be spread out in memory. Consider a 1K word x 4 texture map. This will occupy a quarter of the patch memory, so 16K words need to be set aside for 4K of texels. Moving between rows will occur without page breaks, whereas in the non patch case it would incur a page break. It is possible to interleave 4 such maps, so getting the benefit of fewer page breaks without the cost of the additional memory.
- Filter and MapType: The filter (Nearest or Linear) and map type (1D or 2D) determine how many addresses are generated. A texel on the map has the integer coordinates i, j and these are calculated from u, v and the width and height values. These integer coordinates are guaranteed to lie on the texture map (excluding the border texels, if present), so for the nearest filter mode the texel is just read and used. For the linear filter mode and 2D MapType the four texels (i, j), (i+1, j), (i, j+1) and (i+1, j+1) are read, with obvious reductions for the 1D MapType. The coordinates (i+1) and/or (j+1) may not lie on the texture map.
If the texture map has a border (specified in the *Border* field) then the appropriate texel from the texture map is read, otherwise the texel is taken from the **BorderColor** register. The texel color stored in this register is in 8:8:8:8 format.

Texture maps are preferably stored in memory as a 2x2 patch such that the texels in the patch are in the same memory word. When texture maps are not in this format (i.e. the memory layout is Linear or Patch64) the Texture Read Unit passes the texel data on in the patched format.

The following diagram shows the layout of texels assumed by this unit when loading up the cache. This exactly matches the layout in memory when one of the 2x2 patch modes is used.

#### 32 bits per texel

#### 16 bits per texel

| Bits | 127...112 | 111...96 | 95...80 | 79...64 | 63...48 | 47...32 | 31...16 | 15...0 |
|------|-----------|----------|---------|---------|---------|---------|---------|--------|
| Texel | (3, 1) | (2, 1) | (3, 0) | (2, 0) | (1, 1) | (0, 1) | (1, 0) | (0, 0) |
| Cache bank | T3<sub>16...31</sub> | T2<sub>16...31</sub> | T1<sub>16...31</sub> | T0<sub>16...31</sub> | T3<sub>0...15</sub> | T2<sub>0...15</sub> | T1<sub>0...15</sub> | T0<sub>0...15</sub> |

#### 8 bits per texel

## 5.2.5 Texel Formatting

Texel formatting is controlled by the **TextureFilterMode** register:

# **TextureFilterMode TextureFilterModeAnd TextureFilterModeOr**

| Name                 | Type        | Offset | Format              |
|----------------------|-------------|--------|---------------------|
| TextureFilterMode    | Alpha Blend | 0x84E0 | Bitfield            |
| TextureFilterModeAnd | Alpha Blend | 0xAD50 | Bitfield Logic Mask |
| TextureFilterModeOr  | Alpha Blend | 0xAD58 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read<sup>16</sup> | Write | Reset | Description |
|------|------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | x | When set causes the output to be calculated as defined by the fields in this register, otherwise the texel0 and texel1 values are set to zero. The TextureEnable bit in the *Render* command must also be set to enable this unit. |
| 1...4 | Format0 | ✓ | ✓ | x | This field selects the format of the texel data T0...T3. The options are:<br>0 = A4L4<br>1 = L8<br>2 = I8<br>3 = A8<br>4 = 332<br>5 = A8I8<br>6 = 5551<br>7 = 565<br>8 = 4444<br>9 = 888<br>10 = 8888 or YUV |
| 5 | ColorOrder0 | ✓ | ✓ | x | This bit selects the color component order of the texel data T0...T3. The two options are:<br>0 = ABGR<br>1 = ARGB |
| 6 | AlphaMapEnable0 | ✓ | ✓ | x | This bit, when set, enables the alpha value of texels T0...T3 to be forced to zero based on testing the color values. |
| 7 | AlphaMapSense0 | ✓ | ✓ | x | This bit selects if the alpha value for texels T0...T3 should be set to zero when the colors are in range or out of range. The options are:<br>0 = Out of range<br>1 = In range |
| 8 | CombineCaches | ✓ | ✓ | x | This bit, when set, combines both banks of the cache so they are used for texture 0. This is an optimisation and allows larger textures to be handled before scanline coherency starts to break down. |
| 9...12 | Format1 | ✓ | ✓ | x | This field selects the format of the texel data T4...T7. The options are:<br>0 = A4L4<br>1 = L8<br>2 = I8<br>3 = A8<br>4 = 332<br>5 = A8I8<br>6 = 5551<br>7 = 565<br>8 = 4444<br>9 = 888<br>10 = 8888 or YUV |
| 13 | ColorOrder1 | ✓ | ✓ | x | This bit selects the color component order of the texel data T4...T7. The two options are:<br>0 = ABGR<br>1 = ARGB |
| 14 | AlphaMapEnable1 | ✓ | ✓ | x | This bit, when set, enables the alpha value of texels T4...T7 to be forced to zero based on testing the color values. |
| 15 | AlphaMapSense1 | ✓ | ✓ | x | This bit selects if the alpha value for texels T4...T7 should be set to zero when the colors are in range or out of range. The options are:<br>0 = Out of range<br>1 = In range |
| 16 | AlphaMapFiltering | ✓ | ✓ | x | This bit, when set, will allow the alpha mapped texels (AlphaMapEnable must be set) to cause the fragment to be discarded depending on the comparison of the number of texels to be alpha mapped with the following three limit fields. |
| 17...19 | AlphaMapFilterLimit0 | ✓ | ✓ | x | This field holds the number of alpha mapped texels in the group T0...T3 which must be exceeded for the fragment to be discarded. |
| 20...22 | AlphaMapFilterLimit1 | ✓ | ✓ | x | This field holds the number of alpha mapped texels in the group T4...T7 which must be exceeded for the fragment to be discarded. |
| 23...26 | AlphaMapFilterLimit01 | ✓ | ✓ | x | This field holds the number of alpha mapped texels in the group T0...T7 which must be exceeded for the fragment to be discarded. |
| 27 | MultiTexture | ✓ | ✓ | x | This bit, when set, prevents the Alpha Map Filtering logic from testing the I4 interpolant and maybe disregarding the alpha map result of T0...T3 or T4...T7. This bit should be set for multi texture operation when alpha map filtering is required. It should be clear otherwise. |
| 28 | ForceAlphaToOne0 | ✓ | ✓ | x | This bit, when set, will force the alpha channel of T0...T3 to be set to 1.0 (255) regardless of the color format or the presence of a real alpha channel. |
| 29 | ForceAlphaToOne1 | ✓ | ✓ | x | This bit, when set, will force the alpha channel of T4...T7 to be set to 1.0 (255) regardless of the color format or the presence of a real alpha channel. |
| 30 | Shift0 | | | | This bit, when set, causes the conversion of T0...T3 for color components less than 8 bits wide to be done by a shift operation, otherwise a scale operation is used. The shift operation is useful where the exact color (after dithering) is to be preserved for flat shaded areas, such as in a stretch blit. |
| 31 | Shift1 | | | | This bit, when set, causes the conversion of T4...T7 for color components less than 8 bits wide to be done by a shift operation, otherwise a scale operation is used. The shift operation is useful where the exact color (after dithering) is to be preserved for flat shaded areas, such as in a stretch blit. |

<sup>16</sup> Logic Op register readback is via the main register only

Notes: The logic operator equivalents behave the same way but the new mode is AND'd or OR'd with the former mode before replacing it.

For most texel formats the data in the cache is held in the raw memory format. The two exceptions to this are 8 bit indexed textures and YUV422 format textures. In both these cases the original texel data is converted into 32 bit ABGR format before being loaded into the cache.

The first task is to extract the byte or short - this is given by the bottom two bits of the address for this cache channel.

The second task is to isolate the individual color components from the texel data. The following table shows the different color modes supported. In the R, G, B and A columns the nomenclature n@m means this component is n bits wide and starts at bit position m in the data. The least significant bit position is 0. The number 255 indicates this component is hardwired to this value.

Two color ordering formats are supported, namely ABGR and ARGB, with the rightmost letter representing the color in the least significant part of the word. This is controlled by the ColorOrder bit in the TextureFilterMode message, and is easily implemented by just swapping the R and B components after conversion into the internal format. The only exception to this is the 3:3:2 format, where the actual bit fields extracted need to be modified as well because the R and B components have differing widths.
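The n@m nomenclature maps directly onto a shift and a mask. The sketch below applies it to the 565 format, using the field positions listed for that format in the table of texture color modes; the helper names `extract` and `unpack_565` are hypothetical and exist only for this illustration. The returned components are the raw 5 or 6 bit values, before any expansion to 8 bits.

```python
# Field positions for the 565 format as (width, start bit) pairs, i.e. the
# n@m entries from the texture color modes table, for both color orders.
FIELDS_565 = {
    "ABGR": {"R": (5, 0), "G": (6, 5), "B": (5, 11)},
    "ARGB": {"R": (5, 11), "G": (6, 5), "B": (5, 0)},
}


def extract(texel, width, start):
    """Return the component that is `width` bits wide starting at bit `start`."""
    return (texel >> start) & ((1 << width) - 1)


def unpack_565(texel, color_order="ABGR"):
    """Split a 16 bit 565 texel into raw (R, G, B, A) components.

    The 565 format has no alpha field, so A is the hardwired value 255.
    """
    fields = FIELDS_565[color_order]
    r = extract(texel, *fields["R"])
    g = extract(texel, *fields["G"])
    b = extract(texel, *fields["B"])
    return r, g, b, 255


# The same texel read with each color order: only R and B change places.
print(unpack_565(0x001F, "ABGR"))  # prints (31, 0, 0, 255)
print(unpack_565(0x001F, "ARGB"))  # prints (0, 0, 31, 255)
```

Because R and B have the same width in this format, switching color order is equivalent to swapping the two extracted values, which is the shortcut the text describes; the 3:3:2 format needs its own field positions instead.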
| Format | Color Order | Name        | Width | R    | G   | B    | A    |
|--------|-------------|-------------|-------|------|-----|------|------|
| 0      | ABGR        | A4L4        | 8     | 4@0  | 4@0 | 4@0  | 4@4  |
| 1      | ABGR        | L8          | 8     | 8@0  | 8@0 | 8@0  | 255  |
| 2      | ABGR        | I8          | 8     | 8@0  | 8@0 | 8@0  | 8@0  |
| 3      | ABGR        | A8          | 8     | 255  | 255 | 255  | 8@0  |
| 4      | ABGR        | 332         | 8     | 3@0  | 3@3 | 2@6  | 255  |
| 5      | ABGR        | A8I8        | 16    | 8@0  | 8@0 | 8@0  | 8@8  |
| 6      | ABGR        | 5551        | 16    | 5@0  | 5@5 | 5@10 | 1@15 |
| 7      | ABGR        | 565         | 16    | 5@0  | 6@5 | 5@11 | 255  |
| 8      | ABGR        | 4444        | 16    | 4@0  | 4@4 | 4@8  | 4@12 |
| 9      | ABGR        | 888         | 32    | 8@0  | 8@8 | 8@16 | 255  |
| 10     | ABGR        | 8888 or YUV | 32    | 8@0  | 8@8 | 8@16 | 8@24 |
| 0      | ARGB        | A4L4        | 8     | 4@0  | 4@0 | 4@0  | 4@4  |
| 1      | ARGB        | L8          | 8     | 8@0  | 8@0 | 8@0  | 255  |
| 2      | ARGB        | I8          | 8     | 8@0  | 8@0 | 8@0  | 8@0  |
| 3      | ARGB        | A8          | 8     | 255  | 255 | 255  | 8@0  |
| 4      | ARGB        | 332         | 8     | 3@5  | 3@2 | 2@0  | 255  |
| 5      | ARGB        | A8I8        | 16    | 8@0  | 8@0 | 8@0  | 8@8  |
| 6      | ARGB        | 5551        | 16    | 5@10 | 5@5 | 5@0  | 1@15 |
| 7      | ARGB        | 565         | 16    | 5@11 | 6@5 | 5@0  | 255  |
| 8      | ARGB        | 4444        | 16    | 4@8  | 4@4 | 4@0  | 4@12 |
| 9      | ARGB        | 888         | 32    | 8@16 | 8@8 | 8@0  | 255  |
| 10     | ARGB        | 8888 or YUV | 32    | 8@16 | 8@8 | 8@0  | 8@24 |

**Table 4.2.5 - Texture Color Modes**

The alpha channel can be forced to 1.0 to override the alpha value when the alpha channel in the texel data is to be ignored (this is independent of the color conversion mode - see next paragraph).

When an extracted component is less than 8 bits wide it is made up to 8 bits by scaling or shifting. Scaling is preferred for normal 3D usage; however, when the texture maps are being used for 2D operations (such as stretch blits) the shift method is preferred as it will maintain the same color during bilinear filtering over regions of constant color.

Scaling is done by replicating the extracted component from the most significant end towards the least significant end of the byte. For example, if a three bit component has bits B2, B1 and B0 then the 8 bit value would be made up as follows:

| Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 of output byte |
|-------|-------|-------|-------|-------|-------|-------|----------------------|
| B<sub>2</sub> | B<sub>1</sub> | B<sub>0</sub> | B<sub>2</sub> | B<sub>1</sub> | B<sub>0</sub> | B<sub>2</sub> | B<sub>1</sub> |

## 5.2.6 Lookup Table (LUT)

The LUT functionality includes:

- Translating color data on a color-by-color basis (e.g. for un-Gamma correcting)
- Mapping CI data to 32-bit RGBA
- Conversion of span pixel data from 8bpp to 8, 16 or 32 bpp, or RGB conversion from 32bpp to 32bpp
- Sourcing pattern fill data
- Applying motion compensation to video streams
- Mapping 8 bit CI texel data to the 32bpp RGBA texel data needed for Texture Filter functionality.

#### 5.2.6.1 Loading the Texel LUT

The LUT is 256 entries deep by 32 bits wide. The bottom 16 locations are directly accessed by the **LUT[0...15]** registers, and can be read back directly. The remaining entries cannot be addressed directly and are accessed through the mechanisms described below.

The LUT can be loaded via auto incrementing register writes or from the local buffer. The ability to load the entire LUT from the local buffer by writing to two registers greatly reduces the burden on the host of managing the LUT. The LUT data can be written into the local buffer initially either via the bypass or (better) using the normal texture download mechanism.

#### 5.2.6.2 Loading the LUT via auto incrementing registers

The start index in the LUT is written to the **LUTTransfer** register. The bottom 8 bits of the data give the index. Every subsequent write to the **LUTData** register loads the LUT with the data and increments the index. Reading back the **LUTIndex** register will return the incremented index value.

#### 5.2.6.3 Loading the LUT from the local buffer.
The local buffer address where the LUT is held is in the **LUTAddress** register. The start index and number of words to fill in the LUT are given in the **LUTTransfer** register, with the index in bits 0...7 and the count in bits 8...16. The write to the **LUTTransfer** register starts the transfer. A count of zero loads zero words into the LUT, so this effectively disables the loading operation. The transfer wraps around in the LUT if necessary.

The **LUTAddress** and **LUTTransfer** registers are not changed by the transfer and both can be read back. The restoration of these registers after a context switch automatically restores the LUT to its previous contents. This assumes that the LUT hasn't been loaded piecemeal or via one of the other mechanisms and that the LUT data in the local buffer is still valid. If these conditions do not hold then the LUT will have to be restored manually.

The LUT data is only held in the bottom 32 bits of the local buffer memory and the red component is in the least significant byte.

#### 5.2.6.4 Reading the LUT.

To read the LUT, first read the **LUTIndex** register. As well as returning the current LUT index (as noted above), this has the side effect of setting an *Index* counter to zero. The *Index* counter is only used during readback. Each subsequent read from the **LUTData** register returns the LUT data at the *Index* and increments the *Index* counter. The *Index* counter wraps from 255 to 0.

## 5.2.7 Texture Filtering and Alpha Mapping

The required texture filter mode is set up in the **TextureReadMode** register as already outlined. Texture filtering must be enabled separately via the **TextureFilterMode** register. This register has the following fields:

| Name           | Width | Function                                                            |
|----------------|-------|---------------------------------------------------------------------|
| Enable         | 1     | Enables texture filtering to occur when set.                        |
| AlphaMapEnable | 1     | Enables Alpha map processing to occur when set.                     |
| AlphaMapSense  | 1     | When clear the alpha map sense is Include, otherwise it is Exclude. |

Table 5.13 Texture Filtering

Alpha Map processing provides a mechanism where the color of the input texels is tested against a range of colors and the alpha value of the texel is set based on the outcome of the test. This subsequently allows an Alpha Test to be done without relying on the presence of an alpha channel in the texture map. Direct3D and QuickDraw 3D both have the notion of a transparent color in the texture map for doing cut-outs, so the alpha map operation allows the Alpha Test to be used for this purpose.

The alpha map test is given by:

$$C_l \le T \le C_u$$

where $C_l$ is the lower chroma value held in the **TextureChromaLower** register, $C_u$ is the upper chroma value held in the **TextureChromaUpper** register and $T$ is the input texel value. Each component is tested separately, and a component can be excluded from the test by setting the lower and upper values to 0 and 255 respectively. The **TextureChromaLower** and **TextureChromaUpper** registers hold the color bytes with the red component in the lower byte, then the green byte and finally the blue byte.

The alpha map test is only enabled when the **TextureFilterMode** enable bit is set and the *AlphaMapEnable* bit in **TextureFilterMode** is set. The sense of the alpha map test (when enabled) is controlled by the *AlphaMapSense* bit and the effect of this is tabulated below:

| AlphaMap Test Enabled | Test Result | AlphaMapSense | Action                 |
|-----------------------|-------------|---------------|------------------------|
| N                     | X           | X             | Alpha value unchanged. |
| Y                     | False       | Include       | Alpha set to 0x00.     |
| Y                     | True        | Include       | Alpha set to 0xFF.     |
| Y                     | False       | Exclude       | Alpha set to 0xFF.     |
| Y                     | True        | Exclude       | Alpha set to 0x00.     |

Table 5.14 AlphaMapTest Enabled

#### **5.2.8** Texture Color Compositing

During compositing, the Color, Texel0 and Texel1 values are combined with constant color value(s) held in registers to produce a combined Texture value for the texel, which is passed on to the Application phase.

The whole unit operation is enabled and disabled by the **TextureCompositeMode** register. It has the following format:

| Bit No. | Name   | Description |
|---------|--------|-------------|
| 0       | Enable | When set causes the compositing operation to be calculated and to replace the texture0 value sent to the next unit, otherwise the texture value remains unchanged. This enable is also qualified by the TextureEnable bit in the PrepareToRender message. |

The compositing is controlled by five registers:

| Register                   | Channels | Stage |
|----------------------------|----------|-------|
| TextureCompositeColorMode0 | RGB      | 0     |
| TextureCompositeColorMode1 | RGB      | 1     |
| TextureCompositeAlphaMode0 | A        | 0     |
| TextureCompositeAlphaMode1 | A        | 1     |

These registers all have the same format:

| Bit No. | Name | Description |
|---------|------|-------------|
| 0 | Enable | When set causes the output to be calculated as defined by the fields in this register, otherwise the texel0 data is passed through for stage 0 and Output data is passed through for stage 1. |
| 1...4 | Arg1 | This field selects the source value for Arg1. The options are:<br>0 = Output.C of the previous stage or height if the first stage<br>1 = Output.A of the previous stage or height if the first stage<br>2 = Color.C<br>3 = Color.A<br>4 = TextureCompositeFactor*n*.C<br>5 = TextureCompositeFactor*n*.A<br>6 = Texel0.C<br>7 = Texel0.A<br>8 = Texel1.C<br>9 = Texel1.A<br>10 = Sum of the color components of the previous stage or 0 if the first stage.<br>where *n* is the same as the message suffix and C is RGB or A depending on the channel. height is defined as clamp (Texel0.A - Texel1.A + 128) |
| 5 | InvertArg1 | This bit, if set, will invert the selected Arg1 value before it is used. |
| 6...9 | Arg2 | This field selects the source value for Arg2. The options are:<br>0 = Output.C of the previous stage or height if the first stage<br>1 = Output.A of the previous stage or height if the first stage<br>2 = Color.C<br>3 = Color.A<br>4 = TextureCompositeFactor*n*.C<br>5 = TextureCompositeFactor*n*.A<br>6 = Texel0.C<br>7 = Texel0.A<br>8 = Texel1.C<br>9 = Texel1.A<br>10 = Sum of the color components of the previous stage or 0 if the first stage.<br>where *n* is the same as the message suffix and C is RGB or A depending on the channel. height is defined as clamp (Texel0.A - Texel1.A + 128) |
| 10 | InvertArg2 | This bit, if set, will invert the selected Arg2 value before it is used. |
| 11...13 | I | This field selects what is used as the interpolation factor when the Operation field is set to Lerp, for example. The options are:<br>0 = Output.A of the previous stage or 0 if the first stage<br>1 = Color.A<br>2 = TextureCompositeFactor*n*.A<br>3 = Texel0.A<br>4 = Texel1.A<br>5 = Texel0.C<br>6 = Texel1.C<br>where *n* is the same as the message suffix and C is RGB or A depending on the channel. |
| 14 | InvertI | This bit, if set, will invert the selected I value before it is used. |
| 15 | A | This bit selects which Arg (after any inversion) is to be used as A in the Operation. The options are:<br>0 = Arg1<br>1 = Arg2 |
| 16 | B | This bit selects which Arg (after any inversion) is to be used as B in the Operation. The options are:<br>0 = Arg1<br>1 = Arg2 |
| 17...20 | Operation | This field defines how the three inputs (A, B and I) are combined. Note the inputs can be optionally inverted before being combined. The 8 bit inputs are unsigned 0.8 fixed point format, but 255 is treated as if it were 1.0 for the calculations. The possible operations are:<br>0 = Pass (A)<br>1 = Add (A + B)<br>2 = AddSigned (A + B - 128)<br>3 = Subtract (A - B)<br>4 = Modulate (A * B)<br>5 = Lerp (A * (1.0 - I) + B * I)<br>6 = ModulateColorAddAlpha (A * B + I)<br>7 = ModulateAlphaAddColor (A * I + B)<br>8 = AddSmoothSaturate (A + B - A * B)<br>9 = ModulateSigned (A * B, but A and B are biased 8 bit numbers) |
| 21...22 | Scale | This field selects the scale factor to apply to the final result before it is clamped. The options are:<br>0 = 0.5<br>1 = 1<br>2 = 2<br>3 = 4 |

### 5.2.8.1 Texture Application

The Application phase applies the texel values calculated in the previous phases of texturing to the incoming pixel color (generated in the color DDA unit). The function used to combine these two colors is defined in the **TextureApplicationMode** register and includes various types of blend, decal, replacement and modulation for the different APIs. The available options are split into three types - OpenGL, QuickDraw 3D and Direct3D.

The OpenGL options are one of:

- Decal
- Blend
- Modulate
- Replace

The QuickDraw 3D options are any combination of:

- Decal
- Modulate
- Highlight

The D3D options are:

- Copy
- Add
- Modulate
- Blend

#### 5.2.8.2 OpenGL Application Modes

The fragment's color is calculated based on the following equations:

[Equation table: one entry each for the Modulate, Decal and Blend types, and one Replace entry per base format (Alpha, Luminance, LuminanceAlpha, Intensity, RGB, RGBA).]

...where R is the final color after texture has been applied, C is the fragment color (in a Color field), T is the texel value (in the texel field) and K is a constant color stored in a register locally (loaded by the **TextureEnvColor** register).
The equations are executed on the four color components in parallel and the suffixes show how the different component values are combined.

The setting of the **TextureApplicationMode** register fields to implement these OpenGL equations is as follows. Enable is 1, and KsEnable and KdEnable are both 0 for all entries. Some obvious abbreviations have been used to keep the table width down.

| Type | Color A | Color B | Color I | Color InvI | Color Operation | Alpha A | Alpha B | Alpha I | Alpha InvI | Alpha Operation |
|------|---------|---------|---------|------------|-----------------|---------|---------|---------|------------|-----------------|
| Modulate | C.C | T.C | | | Modulate | C.A | T.A | | | Modulate |
| Decal | C.C | T.C | T.A | N | Lerp | C.A | | | N | PassA |
| Blend | C.C | K.C | T.C | N | Lerp | C.A | T.A | | | Modulate |
| Replace (Alpha) | C.C | | | | PassA | | T.A | | | PassB |
| Replace (Luminance) | | T.C | | | PassB | C.A | | | | PassA |
| Replace (LuminanceAlpha) | | T.C | | | PassB | | T.A | | | PassB |
| Replace (Intensity) | | T.C | | | PassB | | T.A | | | PassB |
| Replace (RGB) | | T.C | | | PassB | C.A | | | | PassA |
| Replace (RGBA) | | T.C | | | PassB | | T.A | | | PassB |

So, for example, the **TextureApplicationMode** fields for OGL Decal would be set as follows (see the **Value** column):

| Bits | Name | Read<sup>17</sup> | Write | Value | Description |
|------|------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | 1 | When set causes the output to be calculated as defined by the fields in this register, otherwise the fragment's data is passed through. |
| 1...2 | ColorA | ✓ | ✓ | 0 | This field selects the source value for A. The options are:<br>0 = Color.C<br>1 = Color.A<br>2 = K.C (TextureEnvColor)<br>3 = K.A (TextureEnvColor) |
| 3...4 | ColorB | ✓ | ✓ | 0 | This field selects the source value for B. The options are:<br>0 = Texel.C<br>1 = Texel.A<br>2 = K.C (TextureEnvColor)<br>3 = K.A (TextureEnvColor) |
| 5...6 | ColorI | ✓ | ✓ | 3 | This field selects the source value for I. The options are:<br>0 = Color.A<br>1 = K.A (TextureEnvColor)<br>2 = Texel.C<br>3 = Texel.A |
| 7 | ColorInvertI | ✓ | ✓ | 0 | This bit, if set, will invert the selected I value before it is used. |
| 8...10 | ColorOperation | ✓ | ✓ | 4 | The possible operations are:<br>0 = PassA (A)<br>1 = PassB (B)<br>2 = Add (A + B)<br>3 = Modulate (A * B)<br>4 = Lerp (A * (1.0 - I) + B * I)<br>5 = ModulateColorAddAlpha (A * B + I)<br>6 = ModulateAlphaAddColor (A * I + B)<br>7 = ModulateBIAddA (B * I + A) |
| 11...12 | AlphaA | ✓ | ✓ | 1 | This field selects the source value for A. The options are:<br>0 = Color.C (effectively Color.A)<br>1 = Color.A<br>2 = K.C (TextureEnvColor) (effectively K.A)<br>3 = K.A (TextureEnvColor) |
| 13...14 | AlphaB | ✓ | ✓ | x | This field selects the source value for B. The options are:<br>0 = Texel.C (effectively T.A)<br>1 = Texel.A<br>2 = K.C (TextureEnvColor) (effectively K.A)<br>3 = K.A (TextureEnvColor) |
| 15...16 | AlphaI | ✓ | ✓ | x | This field selects the source value for I. The options are:<br>0 = Color.A<br>1 = K.A (TextureEnvColor)<br>2 = Texel.C (effectively T.A)<br>3 = Texel.A |
| 17 | AlphaInvertI | ✓ | ✓ | 0 | This bit, if set, will invert the selected I value before it is used. |
| 18...20 | AlphaOperation | ✓ | ✓ | 0 | This field defines how the three inputs (A, B and I) are combined. The possible operations are:<br>0 = PassA (A)<br>1 = PassB (B)<br>2 = Add (A + B)<br>3 = Modulate (A * B)<br>4 = Lerp (A * (1.0 - I) + B * I)<br>5 = ModulateABAddI (A * B + I)<br>6 = ModulateAIAddB (A * I + B)<br>7 = ModulateBIAddA (B * I + A) |
| 21 | KdEnable | ✓ | ✓ | 0 | When set this bit causes the RGB results of the texture application to be multiplied by the Kd DDA values. It also enables the Kd DDAs to be updated. |
| 22 | KsEnable | ✓ | ✓ | 0 | When set this bit causes the RGB results of the texture application (or Kd processing) to be added to the Ks DDA values. It also enables the Ks DDAs to be updated. |
| 23 | MotionCompEnable | ✓ | ✓ | x | |

<sup>17</sup> Logic Op register readback is via the main register only

#### 5.2.8.3 Apple Texture Application

The fragment's color is calculated based on the following equations (any combination of these operations is allowed and they are done in the order given):

| Type | Equation |
|------|----------|
| Decal | If enabled:<br>$R_{rgb} = T_a T_{rgb} + (1 - T_a) C_{rgb}$<br>$R_a = C_a$<br>else:<br>$R_{rgb} = T_{rgb}$<br>$R_a = T_a C_a$ |
| Modulate | |
| Highlight | |

...where T is the texel color, C is the fragment color (in a Color message), Kd is the diffuse RGB components from the Kd DDA unit, and Ks is the specular RGB components from the Ks DDA unit.

The equations are executed on the four color components in parallel and the suffixes show how the different component values are combined. The final value R is forwarded in the Color field of the active step to the next unit.

The setting of the **TextureApplicationMode** fields to implement these Apple equations is as follows.
Enable is 1, KsEnable is set if Modulate is required, KdEnable is set if highlight is required. Some obvious abbreviations have been used to keep the table width down. | | Colo | r fields | | Alpha fields | | | | | | | |-------------------|------|----------|-----|--------------|-------|-----|-----|---|------|-----------| | Туре | A | В | I | | | A | В | ı | Invl | Operation | | | | | | | n | | | | | | | Decal enabled | C.C | T.C | T.A | N | Lerp | C.A | | | | PassA | | Modulate disabled | | T.C | | | PassB | C.A | T.A | | | Modulate | So for example, the **TextureApplicationMode** fields for Apple Quickdraw Decal with highlighting but no modulation would be as follows (see the **Value** column): | Bits | Name | Read | Write | Value | Description | |------|------|------|-------|-------|-------------| | | | 18 | | | | $<sup>^{18}</sup>$ Logic Op register readback is via the main register only | 0 | Enable | ~ | ~ | 1 | When set causes the output to be calculated as defined<br>by the fields in this register, otherwise the fragment's<br>data is passed through. | | |------|--------------------|---|----------|---|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--| | 12 | ColorA | ~ | ~ | 0 | This field selects the source value for A. The options are: | | | | | | | | 0 = Color.C<br>1 = Color.A<br>2 = K.C (TextureEnvColor)<br>3 = K.A (TextureEnvColor) | | | 34 | ColorB | ~ | ~ | 0 | This field selects the source value for B. The options are: | | | | | | | | 0 = Texel.C<br>1 = Texel.A<br>2 = K.C (TextureEnvColor)<br>3 = K.A (TextureEnvColor) | | | 56 | ColorI | • | ~ | 3 | This field selects the source value for I. 
The options are: 0 = Color.A 1 = K.A (TextureEnvColor) | | | | | | | | 2 = Texel.C<br>3 = Texel.A | | | 7 | ColorInvertI | ~ | ~ | 0 | This bit, if set, will invert the selected I value before it is used. | | | 810 | Color<br>Operation | V | <b>V</b> | 4 | The possible operations are: 0 = PassA (A) 1 = PassB (B) 2 = Add (A + B) 3 = Modulate (A * B) 4 = Lerp (A * (1.0 - I) + B * I) 5 = ModulateColorAddAlpha (A * B + I) 6 = ModulateAlphaAddColor (A * I + B) 7 = ModulateBIAddA (B * I + A) | | | 1112 | AlphaA | | | 1 | This field selects the source value for A. The options are: 0 = Color.C (effectively Color.A) 1 = Color.A 2 = K.C (TextureEnvColor) (effectively K.A) 3 = K.A (TextureEnvColor) | | | 1314 | AlphaB | • | ~ | 1 | This field selects the source value for B. The options are: 0 = Texel.C (effectively T.A) 1 = Texel.A 2 = K.C (TextureEnvColor) (effectively K.A) 3 = K.A (TextureEnvColor) | | | 1516 | AlphaI | • | • | X | This field selects the source value for I. The options are: 0 = Color.A 1 = K.A (TextureEnvColor) 2 = Texel.C (effectively T.A) 3 = Texel.A | | | 17 | Alpha InvertI | ~ | ~ | X | This bit, if set, will invert the selected I value before it is used. | |------|-----------------------|---|---|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 1820 | Alpha<br>Operation | ~ | ~ | 0 | This field defines how the three inputs (A, B and I) are combined. 
The possible operations are: $0 = PassA (A)$ $1 = PassB (B)$ $2 = Add (A + B)$ $3 = Modulate (A * B)$ $4 = Lerp (A * (1.0 - I) + B * I)$ $5 = ModulateABAddI (A * B + I)$ $6 = ModulateAIAddB (A * I + B)$ $7 = ModulateBIAddA (B * I + A)$ | | 21 | KdEnable | ~ | ~ | 1 | When set this bit causes the RGB results of the texture application to be multiplied by the Kd DDA values. It also enables the Kd DDA sto be updated. | | 22 | KsEnable | ~ | ~ | 0 | When set this bit causes the RGB results of the texture application (or Kd processing) to be added with the Ks DDA values. It also enables the Ks DDAs to be updated. | | 23 | Motion Comp<br>Enable | ~ | ~ | X | | ### 5.2.8.4 Direct 3D Texture Application (TBlend) The D3D texture color ops are as follows: Enable is 1, KsEnable is 0, KdEnable is set if specular highlight is required. | | Color fields | | | | | | |-----------------------|--------------|-----|-----|------|----------------|--| | Туре | Α | В | ı | Invi | Operation | | | Disable | C.C | | | | PassA | | | Сору | | T.C | | | PassB | | | CopyAlpha | | T.A | | | PassB | | | Add | C.C | T.C | | | Add | | | AddAlpha | C.C | T.A | | | Add | | | Modulate | C.C | T.C | | | Modulate | | | ModulateAlpha | C.C | T.A | | | Modulate | | | BlendFactorAlpha | C.C | T.C | K.A | 5 | Lerp | | | BlendTextureAlpha | C.C | T.C | T.A | ? | Lerp | | | BlendDiffuseAlpha | C.C | T.C | C.A | ? | Lerp | | | ModulateColorAddAlpha | C.C | T.C | TA | ? | ModulateABAddI | | The D3D texture alpha ops are as follows. 
Enable is 1: | | Color fields | | | | | |----------|--------------|-----|---|------|-----------| | Туре | Α | В | ı | Invi | Operation | | Disable | C.A | | | | PassA | | Сору | | T.A | | | PassB | | Add | C.A | T.A | | | Add | | Modulate | C.A | T.A | | | Modulate | So for example, the **TextureApplicationMode** fields for D3D Modulate with Specular highlights would be set as follows (see the **Value** column): | Bits | Name | Read<br>19 | Write | Value | Description | |------|--------------------|------------|-------|-------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0 | Enable | ~ | • | 1 | When set causes the output to be calculated as defined<br>by the fields in this register, otherwise the fragment's<br>data is passed through. | | 12 | ColorA | V | V | 0 | This field selects the source value for A. The options are: 0 = Color.C 1 = Color.A 2 = K.C (TextureEnvColor) 3 = K.A (TextureEnvColor) | | 34 | ColorB | V | ~ | 0 | This field selects the source value for B. The options are: 0 = Texel.C 1 = Texel.A 2 = K.C (TextureEnvColor) 3 = K.A (TextureEnvColor) | | 56 | ColorI | V | ~ | X | This field selects the source value for I. The options are: $0 = \text{Color.A}$ $1 = \text{K.A (TextureEnvColor)}$ $2 = \text{Texel.C}$ $3 = \text{Texel.A}$ | | 7 | ColorInvertI | ~ | ~ | X | This bit, if set, will invert the selected I value before it is used. | | 810 | Color<br>Operation | ~ | • | 3 | The possible operations are: 0 = PassA (A) 1 = PassB (B) 2 = Add (A + B) 3 = Modulate (A * B) 4 = Lerp (A * (1.0 - I) + B * I) 5 = ModulateColorAddAlpha (A * B + I) 6 = ModulateAlphaAddColor (A * I + B) 7 = ModulateBIAddA (B * I + A) | | 1112 | AlphaA | ~ | V | 1 | This field selects the source value for A. 
The options are: 0 = Color.C (effectively Color.A) 1 = Color.A 2 = K.C (TextureEnvColor) (effectively K.A) 3 = K.A (TextureEnvColor) | $<sup>^{19}</sup>$ Logic Op register readback is via the main register only | 1314 | AlphaB | V | <b>'</b> | 1 | This field selects the source value for B. The options | |------|---------------|----------|----------|----------|--------------------------------------------------------------| | 131 | 111911111 | <b>"</b> | | 1 | are: | | | | | | | 0 = Texel.C (effectively T.A) | | | | | | | 1 = Texel.A | | | | | | | 2 = K.C (TextureEnvColor) (effectively | | | | | | | K.A) | | | | | | | 3 = K.A (TextureEnvColor) | | 1516 | AlphaI | V | <b>'</b> | X | This field selects the source value for I. The options | | | | | | | are: | | | | | | | 0 = Color.A | | | | | | | 1 = K.A (TextureEnvColor) | | | | | | | 2 = Texel.C (effectively T.A) | | | | | | | 3 = Texel.A | | 17 | Alpha InvertI | ~ | ~ | X | This bit, if set, will invert the selected I value before it | | | | | | | is used. | | 1820 | Alpha | / | <b>/</b> | 3 | This field defines how the three inputs (A, B and I) are | | | Operation | | | | combined. The possible operations are: | | | | | | | 0 = PassA (A) | | | | | | | 1 = PassB (B) | | | | | | | 2 = Add (A + B) | | | | | | | 3 = Modulate (A * B) | | | | | | | 4 = Lerp (A * (1.0 - I) + B * I) | | | | | | | 5 = ModulateABAddI (A * B + I) | | | | | | | 6 = ModulateAIAddB (A * I + B) | | | | | | | 7 = ModulateBIAddA (B * I + A) | | 21 | KdEnable | ~ | ~ | 1 | When set this bit causes the RGB results of the texture | | | | | | | application to be multiplied by the Kd DDA values. It | | | | | | <u> </u> | also enables the Kd DDA sto be updated. | | 22 | KsEnable | <b>/</b> | <b>'</b> | 0 | When set this bit causes the RGB results of the texture | | | | | | | application (or Kd processing) to be added with the | | | | | | | Ks DDA values. It also enables the Ks DDAs to be | | | | | | | updated. 
| | 23 | Motion Comp | / | <b>/</b> | X | | | | Enable | | | | |

# 5.2.9 Implementation

Texture processing has two enables, both of which must be set for the **Color** register to be modified. The first is loaded via the **TextureApplicationMode** register and remains in effect until changed by a new **TextureApplicationMode** message. The second is the *TextureEnable* bit in the **Render** register, which is effective only until the next **Render** message is received. This second enable is used to disable texturing temporarily when a primitive must not be textured.

#### 5.2.9.1 The Ks and Kd DDAs

The Ks and Kd DDA units interpolate the specular and diffuse RGB values. Sub-pixel corrections can be applied to correct for an initial start error on a span. The output of the DDA units is applied to the texture calculations outlined earlier when the corresponding Apple texture modes are enabled. The DDAs can be loaded in two ways:

- The original Ks and Kd registers (e.g. **KsStart**), when written, load the corresponding R, G and B registers together. This gives some backward compatibility.
- The new **KsRStart**, **dKsRdx** and **dKsRdyDom** registers load the start, dx and dyDom registers for the Ks Red DDA unit. The Ks Green and Blue components and the Kd RGB components are handled similarly. This allows future setup chips to program these registers directly.

The format is 2's complement 2.22 fixed point with an effective range clamped to ±1.999. There is a small underflow/overflow guard band; if it is exceeded the value wraps around and produces an abrupt color change artefact. (This should not happen if the setup is correct and sub-pixel correction is applied at the start of each span.) The values of Ks and Kd at each vertex are used to calculate the gradient values in much the same way as the color gradients are calculated for Gouraud shading.
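As an illustration of the 2.22 format, the following C sketch converts a floating-point Ks or Kd value to a 24-bit register encoding and derives a per-step delta between two vertex values. The helper names (`k_to_fixed_2_22`, `k_delta_2_22`) are hypothetical and not part of any 3Dlabs library; truncation toward zero and a symmetric negative limit are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */
#define K_FRAC_BITS 22
#define K_MAX_FIXED ((int32_t)0x7FFFFF)    /* largest positive 24-bit value, about +1.999 */
#define K_MIN_FIXED (-(int32_t)0x7FFFFF)   /* assumed symmetric lower limit, about -1.999 */

/* Convert a Ks/Kd value to 24-bit 2's complement 2.22 fixed point.
 * Out-of-range inputs are clamped so the value cannot wrap around. */
static uint32_t k_to_fixed_2_22(double k)
{
    assert(k == k);  /* reject NaN */
    double scaled = k * (double)(1 << K_FRAC_BITS);
    int32_t fixed;
    if (scaled >= (double)K_MAX_FIXED) {
        fixed = K_MAX_FIXED;
    } else if (scaled <= (double)K_MIN_FIXED) {
        fixed = K_MIN_FIXED;
    } else {
        fixed = (int32_t)scaled;  /* truncates toward zero (assumption) */
    }
    /* Keep only the low 24 bits, which hold the 2's complement encoding. */
    return (uint32_t)fixed & 0xFFFFFFu;
}

/* Per-step delta between two values over a number of steps,
 * e.g. pixels along a span. */
static uint32_t k_delta_2_22(double k_start, double k_end, int steps)
{
    assert(steps > 0);
    return k_to_fixed_2_22((k_end - k_start) / (double)steps);
}
```

For example, `k_to_fixed_2_22(1.0)` yields `0x400000`, and a ramp from 0.0 to 1.0 over four steps gives a delta of `0x100000`.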
The parameters to control the two DDA units are loaded into the red, green and blue values (there is no alpha value) and are held as 1.8 unsigned fixed point numbers so values greater than 1.0 can be represented. #### 5.2.9.2 Texture Color Registers The application of texture is qualified by the *TextureEnable* bit in the **Render** command register. The following registers (together with the **TextureApplicationMode** register) control the application of textures. | Register | Data Field | Description | |-----------------|---------------------------|--------------------------------------------------------| | TextureEnvColor | 32 bit RGBA format, R | | | | in least significant byte | | | KsStart | 24 bit 2's comp fix pt | Ks start value, loads up the R, G and B DDA start | | | | registers. | | DKsdx | 24 bit 2's comp fix pt | Ks derivative unit X, loads up the R, G and B DDA dx | | | | registers. | | DKsdyDom | 24 bit 2's comp fix pt | Ks derivative unit Y, dominant edge, loads up the R, G | | | | and B DDA dyDom registers. | | KdStart | 24 bit 2's comp fix pt | Kd start value, loads up the R, G and B DDA start | | | | registers. | | DKddx | 24 bit 2's comp fix pt | Kd derivative unit X, loads up the R, G and B DDA dx | | | | registers. | | DKddyDom | 24 bit 2's comp fix pt | Kd derivative unit Y, dominant edge, loads up the R, G | | | | and B DDA dyDom registers. 
| | KsRStart | 24 bit 2's comp fix pt | Ks Red start value | | DKsRdx | 24 bit 2's comp fix pt | Ks Red derivative unit X | | DKsRdyDom | 24 bit 2's comp fix pt | Ks Red derivative unit Y, dominant edge | | KsGStart | 24 bit 2's comp fix pt | Ks Green start value | | dKsGdx | 24 bit 2's comp fix pt | Ks Green derivative unit X | | dKsGdyDom | 24 bit 2's comp fix pt | Ks Green derivative unit Y, dominant edge | | KsBStart | 24 bit 2's comp fix pt | Ks Blue start value | | dKsBdx | 24 bit 2's comp fix pt | Ks Blue derivative unit X | | dKsBdyDom | 24 bit 2's comp fix pt | Ks Blue derivative unit Y, dominant edge | | KdRStart | 24 bit 2's comp fix pt | Kd Red start value | | DKdRdx | 24 bit 2's comp fix pt | Kd Red derivative unit X | |-----------|------------------------|-------------------------------------------| | DKdRdyDom | 24 bit 2's comp fix pt | Kd Red derivative unit Y, dominant edge | | KdGStart | 24 bit 2's comp fix pt | Kd Green start value | | DKdGdx | 24 bit 2's comp fix pt | Kd Green derivative unit X | | DKdGdyDom | 24 bit 2's comp fix pt | Kd Green derivative unit Y, dominant edge | | KdBStart | 24 bit 2's comp fix pt | Kd Blue start value | | DKdBdx | 24 bit 2's comp fix pt | Kd Blue derivative unit X | | DKdBdyDom | 24 bit 2's comp fix pt | Kd Blue derivative unit Y, dominant edge | Table 5.15 Texture Color Registers 6 # Fog, Antialias and Alpha Test ### 6.1 Fog Unit The fog unit is used to blend the incoming fragment's color or Z (generated by the color DDA unit, and potentially modified by the texture unit) with a predefined fog color. Fogging can be used to simulate atmospheric fogging, and also to depth cue images. Fog application has two stages: - 1. derive the fog index for a fragment; - 2. apply the fogging effect. The fog index is a value which is interpolated over the primitive using a DDA in the same way color and depth are interpolated. The fogging effect is applied to each fragment using one of the equations described below. 
Note: Although fog values are linearly interpolated over a primitive, they can be calculated on the host using either a linear fog function (typically for simple fog effects and depth cueing) or a more complex function, e.g. an exponential function to model atmospheric attenuation.

### 6.1.1 Fog Index Calculation

The fog index can be derived from specified fog values in **FStart**, **dFdX** and **dFdYDom**, or from the Depth DDA values. This option is selected with the *UseZ* bit in the **FogMode** register.

The fog DDA is used to interpolate the fog index (f) across a primitive. The mechanics are similar to those of the other DDA units, as the diagram below illustrates:

Figure 6-1 Fog Interpolation Over A Triangle

where:

- **dFdX** = Fog gradient in the X direction.
- **dFdYDom** = Fog gradient along the dominant edge of a primitive.

*Note:* For fogged lines the **dFdX** delta is not required.

The fog interpolation values (e.g. **FStart**) are specified as 32-bit fixed point numbers; the format is 2's complement with 10 bits integer and 22 bits fraction. However, the derived fog index is an 8-bit fixed point number (0 bits integer, 8 bits fraction).

The DDA only covers a relatively narrow range (+511 to -512) compared to the range of depths, so the software needs to be careful when setting up the DDA. There are four cases:

- If all the vertices are in the near range then the DDA should be set up to output 1.0 with a delta of 0.
- If all the vertices are in the far range then the DDA should be set up to output 0.0 with a delta of 0.
- If all the vertices are within the DDA's range then the DDA's parameters are set up as normal.
- If one or more of the vertices are out of the DDA's range, they must be clamped before the DDA's parameters are set up. (This will only occur on very large polygons which extend from near the eye point into the far distance.)
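The four cases above can be sketched in C as follows. This is a minimal illustration, not driver code: all names are hypothetical, and it assumes that "near range" means a fog value of 1.0 or more at every vertex and "far range" means 0.0 or less at every vertex, which the text does not define explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FOG_FRAC_BITS 22
#define FOG_DDA_MAX   511.0     /* upper end of the DDA range quoted above */
#define FOG_DDA_MIN   (-512.0)  /* lower end of the DDA range quoted above */

typedef enum { FOG_ALL_NEAR, FOG_ALL_FAR, FOG_IN_RANGE, FOG_CLAMPED } fog_case;

/* Clamp to the DDA range, then convert to 32-bit 2's complement 10.22. */
static int32_t fog_to_fixed_10_22(double f)
{
    if (f > FOG_DDA_MAX) f = FOG_DDA_MAX;
    if (f < FOG_DDA_MIN) f = FOG_DDA_MIN;
    return (int32_t)(f * (double)(1 << FOG_FRAC_BITS));
}

/* Decide which of the four set-up cases applies to a primitive,
 * given the fog value at each of its n vertices. */
static fog_case fog_classify(const double *f, size_t n)
{
    int all_near = 1, all_far = 1, clamped = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; ++i) {
        if (f[i] < 1.0) all_near = 0;   /* assumed meaning of "near range" */
        if (f[i] > 0.0) all_far = 0;    /* assumed meaning of "far range" */
        if (f[i] > FOG_DDA_MAX || f[i] < FOG_DDA_MIN) clamped = 1;
    }
    if (all_near) return FOG_ALL_NEAR;
    if (all_far)  return FOG_ALL_FAR;
    return clamped ? FOG_CLAMPED : FOG_IN_RANGE;
}

/* Start value to load for a given case and start-vertex fog value. */
static int32_t fog_dda_start(fog_case c, double f)
{
    switch (c) {
    case FOG_ALL_NEAR: return fog_to_fixed_10_22(1.0);  /* constant 1.0 */
    case FOG_ALL_FAR:  return 0;                        /* constant 0.0 */
    default:           return fog_to_fixed_10_22(f);    /* clamped if needed */
    }
}

/* Delta to load: zero in the two constant cases, otherwise the gradient. */
static int32_t fog_dda_delta(fog_case c, double gradient)
{
    if (c == FOG_ALL_NEAR || c == FOG_ALL_FAR) return 0;
    return fog_to_fixed_10_22(gradient);
}
```

In the clamped case the gradients would be computed from the clamped vertex values, which is what shifts the effective position and width of the fog band.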
The result of clamping the input values to the DDA will be to change the effective position and width of the fog band (i.e. the middle range), but this is unlikely to be noticeable. If it is noticeable then tessellating the polygon will solve the problem.

#### 6.1.1.1 Z-controlled Fog

The fog value (direct or mapped via the table) can be derived from the interpolated Z value. If the *UseZ* bit is set in **FogMode** then the fog DDA is loaded with the Z DDA parameters and tracks the Z value over the primitive. The 2's complement 32-bit Z value from the DDA output is mapped to the 8-bit fog index as follows:

- Clamp Z from the DDA so it is greater than or equal to 0.
- Add in the ZFogBias and clamp again to be greater than or equal to 0.
- Shift right by the *ZShift* amount.
- Clamp against 255 so the result is less than or equal to 255. This is the fog index.

The bias sets a Z value below which no blending occurs. The shift value selects the range (as a power of 2) beyond which the fog color is used (because the fog index is set to 255).

### 6.1.2 Fog Table

Initially, the fog values populate a span register and an increment register tracks progress along the dominant edge. Both f-controlled and Z-controlled fog produce 8-bit index values which can either be applied directly to the interpolation or be mapped through a table, for use in producing more complex (non-linear) fogs with host intervention. The Fog Table is selected using the *Table* bit in the **FogMode** register.

The fog table is organised as 256 x 8, so the 8-bit input fog index is mapped to an 8-bit output fog index. The fog table is held in the **FogTable(0)** to **FogTable(63)** registers and each register loads 4 entries at a time. In **FogTable(0)**, byte 0 loads the mapping for fog index 0, byte 1 for fog index 1, etc.
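The Z-to-fog-index steps and the fog table register packing described above can be sketched in C as follows. This is only an illustration: the function names are hypothetical, a 64-bit intermediate is used so that adding the bias cannot overflow, and placing byte 0 in the least significant byte of each **FogTable** register is an assumption rather than something stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit 2's complement Z value to an 8-bit fog index:
 * clamp to >= 0, add bias, clamp to >= 0, shift right, clamp to <= 255. */
static uint8_t z_to_fog_index(int32_t z, int32_t z_fog_bias, unsigned z_shift)
{
    assert(z_shift < 32);              /* ZShift is a 5-bit field */
    int64_t v = (z < 0) ? 0 : (int64_t)z;
    v += z_fog_bias;
    if (v < 0) v = 0;
    v >>= z_shift;                     /* v is non-negative here */
    if (v > 255) v = 255;
    return (uint8_t)v;
}

/* Pack a 256-entry, 8-bit fog table into 64 register words,
 * four consecutive entries per word. */
static void pack_fog_table(const uint8_t table[256], uint32_t regs[64])
{
    for (int r = 0; r < 64; ++r) {
        regs[r] = (uint32_t)table[4 * r]                /* byte 0 (LSB, assumed) */
                | ((uint32_t)table[4 * r + 1] << 8)
                | ((uint32_t)table[4 * r + 2] << 16)
                | ((uint32_t)table[4 * r + 3] << 24);
    }
}
```

With an identity table (entry i maps to i), the first word is `0x03020100` and the last is `0xFFFEFDFC`.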
### 6.1.3 Fog Application

Once the fog indices are calculated they are used to interpolate between the fog color and the current color. The controlling equations depend on whether the colors are represented in RGBA or CI mode. The mode selection is made with the *ColorMode* bit in **FogMode**.

#### 6.1.3.1 RGBA Fogging Equation

Fogging is applied differently depending on the color mode. For RGBA mode the fogging equation is:

$V = FI \cdot C + (1 - FI) \cdot FC$

where:

- V = outgoing color
- FC = fog color
- C = incoming fragment color
- FI = fog index

The equation is applied to the color components red, green and blue; alpha is not modified.

The diagram below shows how the fogging would typically affect fragments. Initially no fogging occurs, $f \ge 1.0$, then a region of linear combination of the fragment color and fog color occurs, $0.0 < f < 1.0$, followed by a region of constant fog color, $f \le 0.0$.

#### 6.1.3.2 CI Fogging Equation

In CI mode the equation is:

Note: The CI value is held only in the red channel for later use, but applying the same equation to all color channels keeps the control simpler. Clamping is needed as the result can overflow the 8-bit color component range.

### 6.1.4 FogMode Register

The **FogMode** register is used to enable and disable fogging (qualified by the fog application bit in the **Render** command register).
# FogMode FogModeAnd FogModeOr | Name | Type | Offset | Format | |------------|-------------------|--------|---------------------| | FogMode | Fog | 0x8690 | Bitfield | | FogModeAnd | Fog | 0xAC10 | Bitfield Logic Mask | | FogModeOr | Fog | 0xAC18 | Bitfield Logic Mask | | | Control registers | | C | | Bits | Name | Read | Write | Reset | Description | |------|-----------|------|-------|-------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0 | Enable | ~ | • | x | This bit, when set, and qualified by the FogEnable bit in the <i>Render</i> command causes the current fragment color to be modified by the fog coefficient and background color. | | 1 | ColorMode | ~ | ~ | X | This bit selects the color mode. The two options are: 0 = RGB. The RGB fog equation is used. 1 = CI. The Color Index fog equation is used. | | 2 | Table | ~ | • | X | This bit, when set, causes the Fog Index to be mapped via the FogTable before it controls the blending between the fragment's color and the fog color, otherwise the DDA value is used directly. | | 3 | UseZ | ~ | ~ | X | This bit, when set, causes the DDA to be loaded with the Z DDA values instead of the Fog DDA values. It also adjusts the clamping of the DDA output. | | 48 | ZShift | ~ | V | x | This field specifies the amount the (z from DDA + zBias) is right shifted by before it is clamped against 255 and the bottom 8 bits used as the fog index. This should also take into account the number of depth bits there are. | | 9 | InvertFI | • | ~ | x | This bit, when set, inverts the fog index before it is used to interpolates between the fragment's color and the fog color. This is usually 0 when fog values are used and 1 for Z values. 
Fog values are set up so they decrease with increasing depth and obviously Z values increase with increasing depth. | | 1031 | Unused | 0 | 0 | X | |

Figure 6-2 FogMode Register

In addition to the *ColorMode*, *Table* and *UseZ* bits, **FogMode** allows inversion of the fog index before interpolation using *InvertFI*.

### 6.1.5 Fog Example

A Gouraud shaded, fogged RGBA trapezoid, with the fog color set to white:

```
// Enable the color DDA unit in Gouraud shading
// mode
colorDDAMode.UnitEnable = Permedia4_ENABLE
colorDDAMode.Shade = Permedia4_GOURAUD_SHADE_MODE
ColorDDAMode(colorDDAMode)

// Enable the Fog unit
fogMode.FogEnable = Permedia4_TRUE
fogMode.ColorMode = Permedia4_RGBA_MODE
FogMode(fogMode)

// Set the fog color to white
FogColor(0xFFFFFFFF)

// Load the color start values and deltas for
// dominant edge and the body of the trapezoid
Rstart()    // Set-up the red component start value
dRdX()      // Set-up the red component increments
dRdYDom()
Gstart()    // Set-up the green component start value
dGdX()      // Set-up the green component increments
dGdYDom()
Bstart()    // Set-up the blue component start value
dBdX()      // Set-up the blue component increments
dBdYDom()

// Load the start value and delta for dominant edge
// and the body of the trapezoid
// Note that the fog deltas are calculated in the
// same way as the color deltas
FStart()    // Set-up the fog component start value
dFdX()      // Set-up the fog component increments
dFdYDom()

// When issuing a Render command the FogEnable bit
// should be set in addition to the fog unit being
// enabled:
// render.FogEnable = PERMEDIA4_TRUE
```

### 6.2 Antialiasing

Antialias application controls the way the coverage value generated by the rasterizer combines with the color generated in the color DDA units. The application depends on the color mode: RGBA or Color Index (CI).
### 6.2.1 Antialias Application

When antialiasing is enabled by setting the **AntialiasMode** *Enable* bit and the **Render** register's *CoverageEnable* bit, the fragment's alpha (and optionally its color) is weighted by the percentage area of the pixel covered by the fragment. The coverage weighting is determined by the rasterizer and varies from 0 to 100% "saturation". If antialiasing is not enabled the fragment is forwarded unchanged for alpha testing.

The mode (RGBA or CI) is set using the *ColorMode* bit in the **AntialiasMode** register.

In RGBA mode the alpha value is multiplied by the coverage value calculated in the rasterizer (its range is 0% to 100%). The RGB values remain unchanged unless the *ScaleColor* bit is also set. Color scaling is not required by OGL and may reduce performance.

In CI mode the coverage value is placed in the lower 4 bits of the color field. The Color Look Up Table is assumed to be set up such that each color has 16 intensities associated with it, one per coverage entry.

### 6.2.2 Polygon Antialiasing

A number of issues should be considered when using Permedia4 to render antialiased polygons.

Depth buffering cannot be used with Permedia4 antialiasing, because the order in which the fragments are combined is critical in producing the correct final color. Polygons must therefore be depth sorted and rendered front to back, using the alpha blend modes SourceAlphaSaturate for the source blend function and One for the destination blend function. In this way the alpha component of a fragment represents the percentage pixel coverage, and the blend function accumulates coverage until the value in the alpha buffer equals one, at which point no further contributions can be made to a pixel.

Although this technique works well in many cases, it is an approximation. Consider the case below which shows three polygons of equal depth which intersect a single pixel. In this case there would ideally be a contribution from each of the polygons.
However, if the rendering order is polygon A followed by polygon B, each of which contributes approximately 50% pixel coverage, then polygon C will make no contribution to the pixel as the alpha value is saturated (50%+50%=100%). Figure 6-3 Polygon Antialiasing When antialiasing general scenes with no restrictions on rendering order, the accumulation buffer is the preferred choice. This is indirectly supported on Permedia4 via image uploading and downloading, with the accumulation buffer residing on the host. When antialiasing, interpolated parameters which are sampled within a fragment (color, fog and texture), sometimes are not representative of a continuous sampling of a surface so care should be taken when rendering smooth shaded antialiased primitives. This problem does not occur in aliased rendering, as the sample point is consistently at the center of a pixel. See The OpenGL Programming Guide for more details of antialiasing. ### 6.2.3 Registers The **AntialiasMode** register provides the enables described earlier. | | sMode<br>sModeAnd<br>sModeOr | <b>Type</b> Alpha ' Alpha ' Alpha ' Control | Test | Offs<br>0x 88<br>0x A<br>0x A | BF0 Bitfield Logic Mask | | | |------|------------------------------|---------------------------------------------|-------|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--| | Bits | Name | Read | Write | Reset | Description | | | | 0 | Enable | ~ | ~ | X | When set causes the fragment's alpha value to be scaled under control of the remaining bits in this register and the coverage value. When this bit is clear the fragment's alpha value is not changed. 
0 = Disable 1 = Enable | | |
| 1 | Color Mode | ✓ | ✓ | X | This bit defines the color format of the fragment's colors: 0 = RGBA 1 = CI |
| 2 | Scale Color | ✓ | ✓ | X | This bit, when set, allows the coverage value to scale the RGB components as well as the alpha component. When this bit is reset only the alpha component is scaled. This allows antialiasing of pre-multiplied images used in compositing. |
| | Unused | 0 | 0 | X | |

Figure 6-4 AntialiasMode Register

For the coverage application to take place the enable in the **AntialiasMode** register must be qualified by the *CoverageEnable* bit in the **Render** command register.

### 6.2.4 Antialias Example

Enable antialiasing for an RGBA primitive:

```
// Set AA application for RGBA primitive
antialiasMode.AntialiasEnable = PERMEDIA4_TRUE
antialiasMode.ColorMode = Permedia4_RGBA_MODE   // ColorMode 0 = RGBA
AntialiasMode(antialiasMode)

// Set the blend mode to an appropriate value if
// blending is required. Not shown.

// When issuing a Render command the CoverageEnable
// bit should be set in addition to the antialias
// unit being enabled:
// render.CoverageEnable = PERMEDIA4_TRUE
```

### 6.3 Alpha Test Unit

The alpha test compares a fragment's alpha value with a reference value. Alpha testing is not available in color index (CI) mode.
### 6.3.1 Alpha Test

The alpha test conditionally rejects a fragment based on the comparison between a reference alpha value and the one associated with the fragment. The available tests are:

| Mode | Comparison Function | Mode | Comparison Function |
|------|---------------------|------|-----------------------|
| 0 | Never | 4 | Greater |
| 1 | Less | 5 | Not Equal |
| 2 | Equal | 6 | Greater Than or Equal |
| 3 | Less Than or Equal | 7 | Always |

**Table 6.16 Alpha Test Comparison Tests**

The sense of the test is such that if the comparison mode is set to Less and the reference value is set to 0x80, then fragments with alpha values between 0x0 and 0x7F will pass the test and fragments with alpha values between 0x80 and 0xFF will fail the test and be rejected.

### 6.3.2 Registers

The **AlphaTestMode** register controls the alpha test:

| Name | Type | Offset | Format |
|------------------|-------------------|--------|---------------------|
| AlphaTestMode | AlphaBlend | 0x8800 | Bitfield |
| AlphaTestModeAnd | AlphaBlend | 0xABF0 | Bitfield Logic Mask |
| AlphaTestModeOr | AlphaBlend | 0xABF8 | Bitfield Logic Mask |
| | Control registers | | |

| Bits | Name | Read | Write | Reset | Description |
|--------|-----------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | X | When set causes the fragment's alpha value to be tested under control of the remaining bits in this register. If the alpha test fails then the fragment is discarded. When this bit is clear the fragment always passes the alpha test. 0 = Disable, 1 = Enable |
| 1..3 | Compare | ✓ | ✓ | X | This field defines the unsigned comparison function to use. The options are: 0 = Never, 1 = Less, 2 = Equal, 3 = Less Equal, 4 = Greater, 5 = Not Equal, 6 = Greater Equal, 7 = Always. The comparison order is as follows: Result = fragment.Alpha CompareFunction reference.Alpha |
| 4..11 | Reference | ✓ | ✓ | X | This field holds the 8 bit reference alpha value used in the comparison. |
| 12..31 | Unused | 0 | 0 | X | |

Figure 6-5 AlphaTestMode Register

### 6.3.3 Alpha Test Example

Set the alpha test mode to be LESS and the reference value to be 0x80:

```
// Enable unit and set modes
alphaMode.UnitEnable = Permedia4_ENABLE
alphaMode.Compare = Permedia4_ALPHA_COMPARE_MODE_LESS
alphaMode.Reference = 0x80
AlphaMode(alphaMode)    // Load register

// Render primitives
```

# 7 Framebuffer Read/Write

Before rendering can take place Permedia4 must be configured to perform the correct framebuffer read and write operations. Framebuffer read and write modes affect the operation of alpha blending, logic ops, writemasks, image upload/download operations and the updating of pixels in the framebuffer.

The framebuffer read and write units are set up in different ways depending on whether Span Operations are being used. Normally, span operations are used for 2D rendering in order to maximize memory bandwidth. Span operations allow multiple pixels to be read and processed in parallel. The following sections discuss the use of the framebuffer read and write units for both standard operation and span operations.

### 7.1.1 Standard Framebuffer Read Operation

The **FBSourceReadMode** and **FBDestReadMode** registers allow Permedia4 to be configured to make 0, 1 or 2 reads of the framebuffer. The following are the most common modes of access to the framebuffer:

- Rendering operations with no logical operations, software writemasking or alpha blending. In this case no read of the framebuffer is required and framebuffer writes should be enabled.
- Rendering operations which use logical ops, software writemasks or alpha blending.
In these cases the destination pixel must be read from the framebuffer and framebuffer writes must be enabled. (Here set-up varies depending on what functionality is required. If alpha blending, logic ops or software writemasks are used the framebuffer is read twice, i.e. both the source and the destination. When alpha blending and logic ops are not needed, and hardware writemasks are used (or when the software writemask allows updating of all bits in a pixel), only one read is required.)
- Image upload. This requires reading of the destination framebuffer pixels to be enabled and framebuffer writes to be disabled.
- Image download. This case requires no framebuffer reads (as long as software writemasking, alpha blending and logic ops are disabled) but writes must be enabled.

The data read from the framebuffer may be tagged either FBDefault (data which may be written back into the framebuffer or used in some manner to modify the fragment color) or FBColor (data which will be uploaded to the host).

Table 7.17 Framebuffer Read/Write Modes summarizes the framebuffer read/write control for common rendering operations:

| Read Source | Read Destination | Writes | Read Data Type | Rendering Operation |
|-------------|------------------|----------|----------------|---------------------|
| Disabled | Disabled | Enabled | - | Rendering with no logical operations, software writemasks or blending. |
| Disabled | Disabled | Enabled | - | Image download. |
| Disabled | Enabled | Disabled | FBColor | Image upload. |
| Enabled | Disabled | Enabled | FBDefault | Image copy with hardware writemasks and no alpha blending or logical operations. |
| Disabled | Enabled | Enabled | FBDefault | Rendering using logical operations, software writemasks or blending. |
| Enabled | Enabled | Enabled | FBDefault | Image copy with software writemasks, alpha blending or logic ops. |

Table 7.17 Framebuffer Read/Write Modes

### 7.1.2 Framebuffer Read Span Operations

As well as performing standard, single pixel at a time, read operations, the framebuffer read unit can be used to process span operations. The simplest type of operation is where a span mask is presented to the read unit and the ReadSource bit is enabled. This will cause the unit to read a complete span of pixels from the framebuffer in a 64-bit packed format. The data is always read as a set of 64 bit words. This allows maximum use of both memory and core bandwidth since multiple pixels are being processed.

Since a span mask may not necessarily have all its bits set to 1 (i.e. only a subset of pixels in the span need to be processed), it would be wasteful of memory bandwidth to always read the complete span. For example, at the right hand edge of a rectangle which is being copied, we want the read unit to read up to the rightmost pixel but not beyond.

Whether a 64 bit word is read depends on the corresponding bit values in the span mask. Since each bit in the mask represents a pixel, either 1, 2 or 4 bits will represent a 32 bit word for the depths 32, 16 and 8 bits respectively. If the group of bits representing a 32 bit word is non-zero then the corresponding 32 bits will be read from the framebuffer. Thus:

- at 32 bits per pixel, a single bit in the span mask corresponds to 32 bits in the framebuffer and 32 bit words will be read only at those locations where the corresponding bit in the span mask is a 1.
- at 16 bits per pixel, 2 bits in the span mask represent 32 bits in the framebuffer. A 32 bit word will be read only at those locations where the corresponding 2 span bits form a non-zero value.
- at 8 bits per pixel, a 32 bit word will be read only at those locations where the corresponding 4 span bits form a non-zero value.
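The rule in the list above can be sketched as a small Python model. It is illustrative only: the function name, the default span width of 32 pixels and the convention that bit 0 of the mask is the first pixel of the span are assumptions made for the example, not statements about the hardware.

```python
def words_to_read(span_mask, bits_per_pixel, span_pixels=32):
    """Return the indices of the 32 bit words selected by a span mask.

    Each mask bit stands for one pixel, so 1, 2 or 4 mask bits cover one
    32 bit word at 32, 16 or 8 bits per pixel respectively. A word is
    read only if its group of mask bits is non-zero.
    """
    if bits_per_pixel not in (8, 16, 32):
        raise ValueError("bits_per_pixel must be 8, 16 or 32")
    pixels_per_word = 32 // bits_per_pixel
    group_mask = (1 << pixels_per_word) - 1
    selected = []
    for word in range(span_pixels // pixels_per_word):
        group = (span_mask >> (word * pixels_per_word)) & group_mask
        if group != 0:
            selected.append(word)
    return selected


# The same mask (pixels 0 and 2 set) selects different words at each depth.
print(words_to_read(0b0101, 32))  # [0, 2]
print(words_to_read(0b0101, 16))  # [0, 1]
print(words_to_read(0b0101, 8))   # [0]
```

The length of the returned list is the number of 32 bit words involved, which is the quantity that matters when the span data is supplied by the host.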
The number of 32 bit words read from the framebuffer is thus a function of the span mask and the number of bits per pixel, though this is not normally of interest to the programmer. However, the number of 32 bit words becomes important for span operations where the data is downloaded from the host. For example, an image download operation using a span operation only requires those 32 bit words which contain required pixel data to be downloaded. Some examples of this are given later.

### 7.1.3 Merge-copy Span Operations

To understand the way in which the read unit works we will examine the way in which a span operation with a logic op works. In particular we consider the case where both ReadSource and ReadDestination bits are set in the **FBDestReadMode** and **FBSourceReadMode** registers. For example, this would be the case when copying data within the framebuffer with an xor logic op.

To perform this operation, the framebuffer read unit must read both a source span of data and a destination span of data. These spans must then be merged so that the data presented to the logic op unit consists of source and destination pairs. Since the logic op unit can combine up to 32 bits at a time, the data can be presented in the form of packed 32 bit words (at 8 bits per pixel this means that the logic op unit can work on 4 pixels at a time).

It would be wasteful of memory bandwidth to read 32 bits from the source followed by 32 bits from the destination, as this would result in too many memory page breaks. So the read unit reads a complete source span and stores it internally as Pattern RAM in the local buffer. Then the destination span is read. As the destination span is read, it is merged with the saved source span data so that the data which the logic op unit sees comprises corresponding sections of source and destination data. The logic op unit can then combine this data and present a series of 32 bit results to the framebuffer write unit.
The Pattern RAM is so named because it can be used for pattern filling operations and was a distinct area of memory in previous Permedia chipsets.

# 8 Alpha Blending

In this chapter we discuss alpha blending. The alpha blend unit performs opacity calculations on the color and alpha components of pixel fragments according to functions defined in the color mode and alpha mode alpha blend registers:

- Source Blending Functions
- Destination Blending Functions
- Color Component Alpha Blending
- Alpha Component Alpha Blending
- Context Switching
- Registers
- Readback

# 8.1 Introduction

The alpha value is an opacity gradient, with the value of 0 representing complete transparency and a value of 1 representing complete opacity. Both source and destination pixels have associated blending functions that perform calculations to set opacity values before the two pixel values are blended.

### 8.1.1 Alpha Blend Functions

Alpha blending functions are performed on both color components and alpha components. The alpha blend unit performs the following functions:

- Calculates opacity on incoming (source) pixel information
- Calculates opacity on existing framebuffer (destination) pixel information
- Blends the source and destination pixel information into a new pixel value

There are 3 source inputs for both RGB and Alpha: Arg0, Arg1 and Arg2. Arg2 is always the interpolator. The opmodes behave as follows:

| Opmode | Result |
|----------------------|-------------------------------------|
| GL\_REPLACE | Arg0 |
| GL\_MODULATE | Arg0 \* Arg1 |
| GL\_ADD | Arg0 + Arg1 |
| GL\_ADD\_SIGNED\_EXT | Arg0 + Arg1 - 128 |
| GL\_INTERPOLATE\_EXT | Arg0 \* Arg2 + Arg1 \* (1 - Arg2) |

Each source can come from one of:

| Source | Meaning |
|--------------------------|---------|
| GL\_PRIMARY\_COLOR\_EXT | color of incoming fragment |
| GL\_TEXTURE | texel of corresponding stage |
| GL\_CONSTANT\_EXT | texture environment blend color |
| GL\_PREVIOUS\_EXT | result of combine from previous unit (always incoming fragment if stage 0) |

In addition the RGB channels can specify the alpha component (i.e. replicate the alpha into RGB).
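The opmode arithmetic listed above can be illustrated with a short Python sketch. It assumes 8-bit channel values in which 255 represents 1.0, truncating division and a final clamp to 0..255; these fixed-point details, and the function names, are assumptions made for the example and are not taken from the hardware description.

```python
def clamp8(value):
    """Clamp an integer to the 8-bit channel range 0..255."""
    return max(0, min(255, value))


def combine(opmode, arg0, arg1=0, arg2=0):
    """Apply one combine opmode to 8-bit channel values.

    arg2 is only used by INTERPOLATE, where it weights arg0 against arg1.
    """
    if opmode == "REPLACE":
        result = arg0
    elif opmode == "MODULATE":
        result = arg0 * arg1 // 255
    elif opmode == "ADD":
        result = arg0 + arg1
    elif opmode == "ADD_SIGNED":
        result = arg0 + arg1 - 128
    elif opmode == "INTERPOLATE":
        result = (arg0 * arg2 + arg1 * (255 - arg2)) // 255
    else:
        raise ValueError("unknown opmode: " + opmode)
    return clamp8(result)


print(combine("MODULATE", 255, 128))        # 128
print(combine("ADD", 200, 100))             # 255 (clamped)
print(combine("ADD_SIGNED", 128, 128))      # 128
print(combine("INTERPOLATE", 255, 0, 255))  # 255 (all of Arg0)
```

ADD\_SIGNED treats 128 as the zero point, so adding a value below 128 darkens the result and a value above 128 lightens it.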
The Blend unit also has an effect on compositing and border textures.

### 8.1.2 Alpha Blend Registers

The alpha blend registers comprise the following segments:

- Alpha Blend Color Operations
- Alpha Blend Alpha Operations
- Alpha Source Color Assignments
- Alpha Destination Color Assignments
- Chroma Test Operations
- 2D Configuration Operations
- Context Operations

Blending occurs in color mode and alpha mode alpha blend registers, called **AlphaBlendColorMode** and **AlphaBlendAlphaMode**, respectively. The **AlphaBlendColorMode** register assigns blend functions to color components R, G and B, and the **AlphaBlendAlphaMode** register assigns a blend function to the alpha component, A.

# 8.2 Source Blending Functions

Source blending function components are defined in the source blend bits of the **AlphaBlendColorMode** and **AlphaBlendAlphaMode** registers. The functions correspond to OpenGL source blending parameters.

### 8.2.1 OpenGL Alpha Blending

The alpha blend unit combines the fragment's color value with the color value stored in the framebuffer, using the blend equation:

$$C_o = C_s S + C_d D$$

where $C_o$ is the output color, $C_s$ is the source color (calculated internally) and $C_d$ is the destination color read from the framebuffer.
The source blending function, S, and the destination blending function, D, are defined in the following tables:

| Mode | Value | R | G | B | A |
|------|-----------------------------|--------------------|--------------------|--------------------|--------------------|
| 0 | Zero | 0 | 0 | 0 | 0 |
| 1 | One | 1 | 1 | 1 | 1 |
| 2 | Destination Color | R<sub>d</sub> | G<sub>d</sub> | B<sub>d</sub> | A<sub>d</sub> |
| 3 | One Minus Destination Color | 1 - R<sub>d</sub> | 1 - G<sub>d</sub> | 1 - B<sub>d</sub> | 1 - A<sub>d</sub> |
| 4 | Source Alpha | A<sub>s</sub> | A<sub>s</sub> | A<sub>s</sub> | A<sub>s</sub> |
| 5 | One Minus Source Alpha* | 1 - A<sub>s</sub> | 1 - A<sub>s</sub> | 1 - A<sub>s</sub> | 1 - A<sub>s</sub> |
| 6 | Destination Alpha | A<sub>d</sub> | A<sub>d</sub> | A<sub>d</sub> | A<sub>d</sub> |
| 7 | One Minus Destination Alpha | 1 - A<sub>d</sub> | 1 - A<sub>d</sub> | 1 - A<sub>d</sub> | 1 - A<sub>d</sub> |
| 8 | Source Alpha Saturate | min(A<sub>s</sub>, 1 - A<sub>d</sub>) | min(A<sub>s</sub>, 1 - A<sub>d</sub>) | min(A<sub>s</sub>, 1 - A<sub>d</sub>) | 1 |

**Table 8.18 Source Blending Functions**

The terms in the equations are in the form Cxy, where x denotes source component (s) or destination component (d), and y denotes color component r, g, b, or a, for Red, Green, Blue, or Alpha, respectively.

Note: Values are defined as floating point numbers. All source color component values are in the range 0 to 1.0 inclusive as defined in, e.g., OGL texture environment color parameters (GL\_TEXTURE\_ENV\_COLOR).
| Mode | Value | R | G | B | A |
|------|-----------------------------|--------------------|--------------------|--------------------|--------------------|
| 0 | Zero | 0 | 0 | 0 | 0 |
| 1 | One | 1 | 1 | 1 | 1 |
| 2 | Source Color | R<sub>s</sub> | G<sub>s</sub> | B<sub>s</sub> | A<sub>s</sub> |
| 3 | One Minus Source Color | 1 - R<sub>s</sub> | 1 - G<sub>s</sub> | 1 - B<sub>s</sub> | 1 - A<sub>s</sub> |
| 4 | Source Alpha | A<sub>s</sub> | A<sub>s</sub> | A<sub>s</sub> | A<sub>s</sub> |
| 5 | One Minus Source Alpha | 1 - A<sub>s</sub> | 1 - A<sub>s</sub> | 1 - A<sub>s</sub> | 1 - A<sub>s</sub> |
| 6 | Destination Alpha | A<sub>d</sub> | A<sub>d</sub> | A<sub>d</sub> | A<sub>d</sub> |
| 7 | One Minus Destination Alpha | 1 - A<sub>d</sub> | 1 - A<sub>d</sub> | 1 - A<sub>d</sub> | 1 - A<sub>d</sub> |

**Table 8.19 Destination Blending Functions**

# 8.3 Destination Blending Functions

Destination blending function components are defined in the *DestBlend* bits of the **AlphaBlendColorMode** and **AlphaBlendAlphaMode** registers. If the blend operations require any destination color components then the framebuffer read mode must be set appropriately.

### 8.3.1 OpenGL Destination Blending

The destination blending corresponds to OpenGL destination blending parameters. In some situations blending is desired when no retained alpha buffer is present. In this case the alpha value which is considered to be read from the framebuffer is set to 1.0. The *NoAlphaBuffer* bit in the **AlphaBlendAlphaMode** register controls this.

The terms in the blend equations are in the form Cxy, where x denotes source component (s) or destination component (d), and y denotes color component r, g, b, or a, for Red, Green, Blue, or Alpha, respectively.

<sup>\*</sup> One Minus Value is sometimes referred to as Inverse Value.

Blend values are defined as floating point numbers. All source color component values should be clamped in the range 0 to 1.0 inclusive.
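The blend equation and the two factor tables can be put together in a short Python sketch. It is a floating point model for illustration only: the function names are hypothetical, colors are (R, G, B, A) tuples in the range 0 to 1.0, and the final clamp to 1.0 is an assumption about how out-of-range results are handled.

```python
def blend_factor(mode, src, dst, is_source):
    """Return the per-component factor S (is_source=True) or D (False)."""
    a_s, a_d = src[3], dst[3]
    other = dst if is_source else src  # modes 2 and 3 use the other color
    if mode == 0:
        return (0.0, 0.0, 0.0, 0.0)             # Zero
    if mode == 1:
        return (1.0, 1.0, 1.0, 1.0)             # One
    if mode == 2:
        return tuple(other)                     # Destination/Source Color
    if mode == 3:
        return tuple(1.0 - c for c in other)    # One Minus that color
    if mode == 4:
        return (a_s,) * 4                       # Source Alpha
    if mode == 5:
        return (1.0 - a_s,) * 4                 # One Minus Source Alpha
    if mode == 6:
        return (a_d,) * 4                       # Destination Alpha
    if mode == 7:
        return (1.0 - a_d,) * 4                 # One Minus Destination Alpha
    if mode == 8 and is_source:
        f = min(a_s, 1.0 - a_d)                 # Source Alpha Saturate
        return (f, f, f, 1.0)
    raise ValueError("unsupported blend mode")


def blend(src, dst, source_mode, dest_mode):
    """Evaluate Co = Cs*S + Cd*D per component, clamped to 1.0."""
    s = blend_factor(source_mode, src, dst, True)
    d = blend_factor(dest_mode, src, dst, False)
    return tuple(min(1.0, cs * fs + cd * fd)
                 for cs, fs, cd, fd in zip(src, s, dst, d))


# Conventional transparency: Source Alpha / One Minus Source Alpha.
print(blend((1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0), 4, 5))
# (0.5, 0.0, 0.5, 0.75)
```

With source mode 8 and destination mode 1, a fragment arriving at a pixel whose stored alpha is already 1.0 gets a color factor of 0 and so leaves the color unchanged, which is the saturation behaviour relied on for front to back polygon antialiasing.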
In addition to glBlendFunc, Permedia4 supports the OGL texture functions described in GL\_TEXTURE\_ENV\_MODE during texture compositing and application. Support for GL\_EXT\_texture\_env\_combine is enabled by calling TexEnv with GL\_TEXTURE\_ENV\_MODE set to GL\_COMBINE\_EXT. This allows the user to explicitly set up the fragment operations for the RGB and Alpha channels - in particular, GL\_ALPHA, GL\_LUMINANCE, GL\_LUMINANCE\_ALPHA, GL\_RGB and GL\_RGBA. This allows full texture function implementation in both Texture0 and Texture1. The equations for each case are as described in *The OpenGL Reference Manual* and *The OpenGL Programming Guide* from Addison-Wesley.

#### 8.3.1.1 Embossed bump-mapping

Special ENV\_MODE support is available in the Permedia4 when this extension is used for embossed bump-mapping. Normally GL\_PREVIOUS\_EXT maps onto GL\_PRIMARY\_COLOR\_EXT for stage0. However when the *EnableBumpHeightAsSource* flag is true, GL\_PREVIOUS\_EXT uses the difference between texture stage1 and stage0 alpha channels for the source input for stage0:

clamp(tex0.alpha - tex1.alpha + 128)

This difference, when replicated into the RGB channels, can be used to modulate the other input to make it lighter or darker. The alpha channel is the same in each stage, but is read offset in the second stage relative to the first stage.

### 8.3.2 QuickDraw 3D Alpha Blending

When the *AlphaType* bit in the **AlphaBlendAlphaMode** register is set then QuickDraw 3D style alpha blend equations are followed.
The OpenGL equations above are used for the RGB components, but the alpha channel is treated differently and uses a single combined source and destination blend function as follows:

$$C_a = 1 - (1 - C_{sa}) * (1 - C_{da})$$

The source and destination blend functions should be set as follows:

| Name | Source Blend | Destination Blend |
|----------------|--------------|---------------------|
| Pre-multiplied | ONE | ONE_MINUS_SRC_ALPHA |
| Interpolated | SRC_ALPHA | ONE_MINUS_SRC_ALPHA |

**Table 8.20 Source Blending Functions**

The alpha calculation is the same for both modes.

### 8.3.3 Image Formatting

The alpha blend and color formatting units can be used to format image data into any of the supported Permedia4 framebuffer formats, though conversion from CI to RGB modes or vice versa is not supported.

Consider the case where the framebuffer is in RGBA 4:4:4:4 mode, and an area of the screen is to be uploaded and stored in an 8 bit RGB 3:3:2 format. The sequence of operations is:

- Set the rasterizer as appropriate (described in section 2.1.9.4)
- Enable framebuffer reads
- Disable framebuffer writes and set the *UpLoadData* bit in the **FBWriteMode** register
- Enable the alpha blend unit with a blend function which passes the destination value and ignores the source value (source blend Zero, destination blend One) and set the color mode to RGBA 4:4:4:4
- Set the color formatting unit to format the color of incoming fragments to an 8 bit RGB 3:3:2 framebuffer format.

The upload now proceeds as normal. The same technique can be used to download data which is in any supported framebuffer format. In this case the rasterizer is set to sync with **FBData** rather than Color, framebuffer writes are enabled, and the *UpLoadData* bit is cleared.
### 8.3.4 Registers

The unit is controlled by the **AlphaBlendAlphaMode** and **AlphaBlendColorMode** registers:

| Name | Type | Offset | Format |
|------------------------|-------------------|--------|---------------------|
| AlphaBlendAlphaMode | Alpha Blend | 0xAFA8 | Bitfield |
| AlphaBlendAlphaModeAnd | Alpha Blend | 0xAD30 | Bitfield Logic Mask |
| AlphaBlendAlphaModeOr | Alpha Blend | 0xAD38 | Bitfield Logic Mask |
| | Control registers | | |

| Bits | Name | Read<sup>20</sup> | Write | Reset | Description |
|--------|-----------------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | X | When set causes the fragment's alpha to be alpha blended under control of the remaining bits in this register. When clear the fragment alpha remains unchanged (but may later be affected by the chroma test). |
| 1..4 | SourceBlend | ✓ | ✓ | X | This field defines the source blend function to use. See the table below for the possible options. |
| 5..7 | DestBlend | ✓ | ✓ | X | This field defines the destination blend function to use. See the earlier table for the possible options. |
| 8 | SourceTimesTwo | ✓ | ✓ | X | This bit, when set, causes the source blend result to be multiplied by two before it is combined with the dest blend result. When this bit is clear no multiply occurs. |
| 9 | DestTimesTwo | ✓ | ✓ | X | This bit, when set, causes the dest blend result to be multiplied by two before it is combined with the source blend result. When this bit is clear no multiply occurs. |
| 10 | Invert Source | ✓ | ✓ | X | This bit, when set, causes the incoming source data to be inverted before any blend operation takes place. |
| 11 | Invert Dest | ✓ | ✓ | X | This bit, when set, causes the incoming dest data to be inverted before any blend operation takes place. |
| 12 | NoAlphaBuffer | ✓ | ✓ | X | When this bit is set the source alpha value is always set to 1.0. This is typically used when no retained alpha buffer is present but also overrides any retained alpha value if one is present. Color formats with no alpha field defined automatically have their alpha value set to 1.0 regardless of the state of this bit. |
| 13 | Alpha Type | ✓ | ✓ | X | This bit selects which set of equations are to be used for the alpha channel. 0 = OpenGL, 1 = Apple |
| 14 | Alpha Conversion | ✓ | ✓ | X | This bit selects how alpha components less than 8 bits wide are converted to 8 bit wide values prior to the alpha blend calculations. The options are: 0 = Scale, 1 = Shift |
| 15 | Constant Source | ✓ | ✓ | X | This bit, when set, forces the Source color to come from the AlphaSourceColor register (in 8888 format) instead of the framebuffer. 0 = Use framebuffer alpha, 1 = Use AlphaSourceColor register alpha value |
| 16 | Constant Dest | ✓ | ✓ | X | This bit, when set, forces the destination color to come from the AlphaDestColor register (in 8888 format) instead of the fragment's color. 0 = Use fragment's alpha, 1 = Use AlphaDestColor register alpha value |
| 17..19 | Operation | ✓ | ✓ | X | This field selects how the source and destination blend results are to be combined. The options are: 0 = Add |

<sup>20</sup> Logic Op register readback is via the main register.

Notes

The Alpha Conversion bit selects the conversion method for alpha values read from the framebuffer:
- The Scale method linearly scales the alpha values to fill the full range of an 8 bit value. This method is preferable when, for example, downloading an image with fewer bits per pixel into a deeper (i.e. more bits per pixel) framebuffer.
- The Shift method just left shifts by the appropriate amount to make the component 8 bits wide. This method is preferable when blending into a dithered framebuffer as it preserves the framebuffer alpha when fragment alpha does not contribute to it.

Alpha is controlled separately from color to allow, for example, the situation in antialiasing where it represents coverage - this must be linearly scaled to preserve the 100% covered state.

The logic operator equivalents behave the same way but the new mode is AND'd or OR'd with the former mode before replacing it.

The table below shows the different color modes supported. In the R, G, B and A columns the nomenclature n@m means this component is n bits wide and starts at bit position m in the framebuffer. The least significant bit position is 0 and a dash in a column indicates that this component does not exist for this mode. In the case of the RGB formats where no Alpha is shown, the alpha field is set to 255. In this case the *NoAlphaBuffer* bit in the **AlphaBlendAlphaMode** register should be set, which causes the alpha component to be set to 255.

Two color ordering formats are supported, namely ABGR and ARGB, with the rightmost letter representing the color in the least significant part of the word. This is controlled by the *ColorOrder* bit in the **AlphaBlendColorMode** register, and is easily implemented by just swapping the R and B components after conversion into the internal format. The only exceptions to this are the 3:3:2 formats, where the actual bit fields extracted from the framebuffer data need to be modified as well because the R and B components are of differing widths. CI processing is not affected by this swap and the result is always on the internal R channel.
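The n@m nomenclature and the difference between the Scale and Shift conversions can be illustrated with a small Python sketch. The function names are hypothetical, and the rounding used for the Scale method (round to nearest) is an assumption; the text states only that the scaling is linear and fills the full 8 bit range.

```python
def extract_field(pixel, width, position):
    """Extract an n@m field: 'width' bits starting at bit 'position'."""
    return (pixel >> position) & ((1 << width) - 1)


def to_8bit_shift(value, width):
    """Shift method: left shift so the component becomes 8 bits wide."""
    return value << (8 - width)


def to_8bit_scale(value, width):
    """Scale method: map 0..(2^width - 1) linearly onto 0..255."""
    max_in = (1 << width) - 1
    return (value * 255 + max_in // 2) // max_in


# A 4:4:4:4 pixel in BGR order: R = 4@0, G = 4@4, B = 4@8, A = 4@12.
pixel = 0xF842
alpha = extract_field(pixel, 4, 12)
print(alpha)                    # 15
print(to_8bit_scale(alpha, 4))  # 255
print(to_8bit_shift(alpha, 4))  # 240
```

A fully covered 4 bit alpha of 15 becomes 255 under Scale but only 240 under Shift, which is why coverage values need the Scale method to keep the 100% covered state.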
The format to use is held in the **AlphaBlendColorMode** register. Note that in OpenGL alpha blending is not defined for CI mode. When converting a Color Index value to the internal format any unused bits are set to zero.

Figure 8-1 AlphaBlendAlphaMode Register

The *ColorConversion* bit selects the conversion method for RGB values read from the framebuffer. The Scale method linearly scales the color values to fill the full range of an 8 bit value. This method is preferable when, for example, downloading an image with fewer bits per pixel into a deeper (i.e. more bits per pixel) framebuffer.

The Shift method just left shifts by the appropriate amount to make the component 8 bits wide. This method is preferable when blending into a dithered framebuffer as it preserves the framebuffer color when fragment color does not contribute to it. The Scale method would otherwise cause the 'fraction' bits to be non-zero, which may result in a different color when the pixel is dithered again. This shows up as a faint outline of the underlying polygon when, for example, an alpha blended texture is used with zero value to provide cut-outs.

The *AlphaConversion* bit selects the conversion method for the Alpha values in a similar way. It is controlled separately to allow, for example, the situation in antialiasing where it represents coverage - this must be linearly scaled to preserve the 100% covered state.

The alpha blend can be augmented by a chroma test, discussed next.

### 8.3.5 Chroma Testing

Chroma test involves testing a fragment's color against a range of colors. The fragment can then be rejected based on the outcome. The framebuffer source color, framebuffer destination color and the fragment's color before or after alpha blending can all be used for the test. The source and destination keying are needed by DirectX for its chroma key blts. Rejecting a fragment based on its color can be used to prevent writes where the destination color does not change.
For example a fogged fragment which has the same color as the background fog color does not need to be written if the screen was cleared to the fog color.

The chroma test is given by:

$$C_l \le T \le C_u$$

where $C_l$ is the lower chroma value held in the **ChromaLower** register, $C_u$ is the upper chroma value held in the **ChromaUpper** register and $T$ is the selected color to test against. Each component is tested separately, and a component can be excluded from the test by setting the lower and upper values to 0 and 255 respectively.

In the **ChromaLower** and **ChromaUpper** registers the red byte is the least significant byte, followed by the green byte and then the blue byte. If the framebuffer format for a color component is less than 8 bits then the unused bits in the upper and lower register for this component are set to zero.

The chroma test is enabled when the *Enable* bit in the **ChromaTestMode** register is set. The source color to test is given by the *Source* field. The sense of the chroma test is controlled by the *Sense* bit, with the effect shown in the table below:

| Chroma Test Enabled | Test Result | ChromaSense | Action |
|---------------------|-------------|-------------|--------------------------------------|
| N | X | X | The framebuffer is updated as normal |
| Y | False | Include | The framebuffer is not updated |
| Y | True | Include | The framebuffer is updated as normal |
| Y | False | Exclude | The framebuffer is updated as normal |
| Y | True | Exclude | The framebuffer is not updated |

The format of the **ChromaTestMode** register is:

| Name | Type | Offset | Format |
|-------------------|-------------------|--------|---------------------|
| ChromaTestMode | Alpha Blend | 0x8F18 | Bitfield |
| ChromaTestModeAnd | Alpha Blend | 0xACC0 | Bitfield Logic Mask |
| ChromaTestModeOr | Alpha Blend | 0xACC8 | Bitfield Logic Mask |
| | Control registers | | |

| Bits | Name | Read<sup>21</sup> | Write | Reset | Description |
|-------|------------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | X | When set enables chroma testing under control of the remaining bits in this register. When clear no chroma test is done. |
| 1..2 | Source | ✓ | ✓ | X | This field selects which color (after any suitable conversion) is to be used for the chroma test. The values are: 0 = FBSourceData, 1 = FBData, 2 = Input Color (from fragment), 3 = Output Color (after any alpha blending) |
| 3..4 | PassAction | ✓ | ✓ | X | This field defines what action is to be taken if the chroma test passes (and is enabled). The options are: 0 = Pass, 1 = Reject, 2 = Substitute ChromaPassColor, 3 = Substitute ChromaFailColor |
| 5..6 | FailAction | ✓ | ✓ | X | This field defines what action is to be taken if the chroma test fails (and is enabled). The options are: 0 = Pass, 1 = Reject, 2 = Substitute ChromaPassColor, 3 = Substitute ChromaFailColor |
| 7..31 | Unused | 0 | 0 | X | |

Notes: Used to test the fragment's color against a range of colors after alpha blending. The chroma test is enabled by the enable bit (0) in the register. Note: incompatible with MX programming.

The logic operator equivalents behave the same way but the new mode is AND'd or OR'd with the former mode before replacing it.

The color format and order are needed because the destination color is read from the framebuffer and must be converted into the internal Permedia4 representation. They should therefore be set as appropriate for the framebuffer.
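The chroma test and the PassAction/FailAction fields can be modelled with a short Python sketch. It is illustrative only: it assumes the test is an inclusive range check on each of the red, green and blue components, uses the byte order described for the **ChromaLower** and **ChromaUpper** registers (red in the least significant byte), and all function and constant names are hypothetical.

```python
# Values of the PassAction and FailAction fields.
PASS, REJECT, SUBSTITUTE_PASS_COLOR, SUBSTITUTE_FAIL_COLOR = range(4)


def unpack_chroma(register):
    """Split a chroma register into (red, green, blue) bytes."""
    return (register & 0xFF, (register >> 8) & 0xFF, (register >> 16) & 0xFF)


def chroma_in_range(color, chroma_lower, chroma_upper):
    """True if every component lies within its lower..upper bounds."""
    lower = unpack_chroma(chroma_lower)
    upper = unpack_chroma(chroma_upper)
    return all(lo <= c <= hi for c, lo, hi in zip(color, lower, upper))


def chroma_action(enable, color, chroma_lower, chroma_upper,
                  pass_action, fail_action):
    """Return the action selected for a fragment with this (r, g, b)."""
    if not enable:
        return PASS  # no chroma test is done
    if chroma_in_range(color, chroma_lower, chroma_upper):
        return pass_action
    return fail_action


# Key out pure green: reject fragments whose color is exactly (0, 255, 0).
key = 0x00FF00
print(chroma_action(True, (0, 255, 0), key, key, REJECT, PASS))   # 1
print(chroma_action(True, (10, 255, 0), key, key, REJECT, PASS))  # 0
```

Setting a component's lower and upper bounds to 0 and 255 makes that component always fall within range, which removes it from the test.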
<sup>21</sup> Logic Op register readback is via the main register only.

| Color Order | Format | Name | R | G | B | A |
|-------------|--------|--------------|------|------|------|------|
| BGR | 0 | 8:8:8:8 | 8@0 | 8@8 | 8@16 | 8@24 |
| BGR | 1 | 5:5:5:5 | 5@0 | 5@5 | 5@10 | 5@15 |
| BGR | 2 | 4:4:4:4 | 4@0 | 4@4 | 4@8 | 4@12 |
| BGR | 3 | 4:4:4:4Front | 4@0 | 4@8 | 4@16 | 4@24 |
| BGR | 4 | 4:4:4:4Back | 4@4 | 4@12 | 4@20 | 4@28 |
| BGR | 5 | 3:3:2Front | 3@0 | 3@3 | 2@6 | 255 |
| BGR | 6 | 3:3:2Back | 3@8 | 3@11 | 2@14 | 255 |
| BGR | 7 | 1:2:1Front | 1@0 | 2@1 | 1@3 | 255 |
| BGR | 8 | 1:2:1Back | 1@4 | 2@5 | 1@7 | 255 |
| BGR | 13 | 5:5:5Back | 5@16 | 5@21 | 5@26 | 255 |
| RGB | 0 | 8:8:8:8 | 8@16 | 8@8 | 8@0 | 8@24 |
| RGB | 1 | 5:5:5:5 | 5@10 | 5@5 | 5@0 | 5@15 |
| RGB | 2 | 4:4:4:4 | 4@8 | 4@4 | 4@0 | 4@12 |
| RGB | 3 | 4:4:4:4Front | 4@16 | 4@8 | 4@0 | 4@24 |
| RGB | 4 | 4:4:4:4Back | 4@20 | 4@12 | 4@4 | 4@28 |
| RGB | 5 | 3:3:2Front | 3@5 | 3@2 | 2@0 | 255 |
| RGB | 6 | 3:3:2Back | 3@13 | 3@10 | 2@8 | 255 |
| RGB | 7 | 1:2:1Front | 1@3 | 2@1 | 1@0 | 255 |
| RGB | 8 | 1:2:1Back | 1@7 | 2@5 | 1@4 | 255 |
| RGB | 13 | 5:5:5Back | 5@26 | 5@21 | 5@16 | 255 |
| CI | 14 | CI8 | 8@0 | 0 | 0 | 0 |
| CI | 15 | CI4 | 4@0 | 0 | 0 | 0 |

Table 8.21 Permedia4 Color Modes

The framebuffer may be configured to be RGBA or Color Index (CI). The R, G, B and A columns show the internal color channels and the width of each color component. n@m means that n bits starting at bit position m are read and scaled to fit the 8 bit internal color channel format. The least significant bit position is zero. A numerical value (0 or 255) indicates the value substituted when the corresponding channel does not exist in the framebuffer. For the Front and Back modes the value to be blended is read only from the low bits or high bits respectively. This is to assist with color space double buffering.

### 8.3.6 Alpha Blend Example

This example sets the blend mode to allow antialiasing of polygons, i.e. source blend function = Source Alpha Saturate, destination blend function = One. These blend functions are suitable for polygon antialiasing when polygons are drawn in front to back order, and the depth test is disabled.

```
// Enable framebuffer reads to allow blend operation
// - Not Shown -

// Set the alpha mode.
// - Not Shown -

alphaBlendColorMode.Enable = PERMEDIA4_ENABLE
alphaBlendColorMode.SourceBlend = PERMEDIA4_BLEND_SRC_ALPHA_SATURATE
alphaBlendColorMode.DestinationBlend = PERMEDIA4_BLEND_ONE
alphaBlendColorMode.ColorFormat = as appropriate
AlphaBlendColorMode(alphaBlendColorMode)    // Load register

// Enable antialias application and disable
// depth testing
// - Not Shown -

// Render polygons sorted front to back with
// Coverage Enable bit set in the Render command
```

# 9 Color Format and Logical Ops

The color format unit converts from Permedia4's internal color representation to a format suitable to be written into the framebuffer. This process may optionally include dithering of the color values for framebuffers with less than 8 bits width per color component. If the unit is disabled then the color is not modified in any way.

# 9.1 Color and Alpha Formats

Permedia4 separates the Alpha and Color format information into two new registers (**AlphaBlendColorMode** and **AlphaBlendAlphaMode**). The AlphaBlendMode register is not supported.

The color format is held in the **AlphaBlendColorMode** register. Note that in OpenGL alpha blending is not defined for CI mode. Raw framebuffer formats from local memory are only converted to 8-bit formats in the AlphaBlend registers.

Alpha is controlled separately from color to allow, for example, the situation in antialiasing where it represents coverage - this must be linearly scaled to preserve the 100% covered state.

The table below shows the different color modes supported. In the R, G, B and A columns the nomenclature n@m means the component is n bits wide and starts at bit position m in the framebuffer.
The least significant bit position is 0 and a dash in a column indicates that this component does not exist for this mode. In the case of RGB formats where no Alpha is shown, the alpha field should be set to 255. Use the *NoAlphaBuffer* bit in the **AlphaBlendAlphaMode** register to do this.

Permedia4 supports two color-ordering formats: ABGR and ARGB. The rightmost letter represents the color in the least significant part of the word. This is controlled by the *ColorOrder* bit in the **AlphaBlendColorMode** register (and elsewhere), and is easily implemented by just swapping the R and B components after conversion into the internal format. The only exceptions are the 3:3:2 formats, where the actual bit fields extracted from the framebuffer data need to be modified as well because the R and B components have different widths. CI processing is not affected by this swap and the result is always on the internal R channel. When converting a Color Index value to the internal format any unused bits are set to zero.

| Format | Color Order | Name | R | G | B | A |
|--------|-------------|---------|------|-----|------|------|
| 0 | BGR | 8:8:8:8 | 8@0 | 8@8 | 8@16 | 8@24 |
| 1 | BGR | 4:4:4:4 | 4@0 | 4@4 | 4@8 | 4@12 |
| 2 | BGR | 5:5:5:1 | 5@0 | 5@5 | 5@10 | 1@15 |
| 3 | BGR | 5:6:5 | 5@0 | 6@5 | 5@11 | - |
| 4 | BGR | 3:3:2 | 3@0 | 3@3 | 2@6 | - |
| 0 | RGB | 8:8:8:8 | 8@16 | 8@8 | 8@0 | 8@24 |
| 1 | RGB | 4:4:4:4 | 4@8 | 4@4 | 4@0 | 4@12 |
| 2 | RGB | 5:5:5:1 | 5@10 | 5@5 | 5@0 | 1@15 |
| 3 | RGB | 5:6:5 | 5@11 | 6@5 | 5@0 | - |
| 4 | RGB | 3:3:2 | 3@5 | 3@2 | 2@0 | - |
| 15 | X | CI8 | 8@0 | 0 | 0 | 0 |

The *AlphaConversion* bit in the **AlphaBlendAlphaMode** register selects the conversion method for alpha values read from the framebuffer. When the conversion bit is set, the corresponding component is left shifted by (8 - n) bits and zero filled, where n is the width of the component.

Note: For some formats the components have different widths, hence different values of n.

- The Scale method linearly scales the alpha values to fill the full range of an 8 bit value. This method is preferable when, for example, downloading an image with fewer bits per pixel into a deeper (i.e. more bits per pixel) framebuffer.
- The Shift method left shifts by the appropriate amount to make the component 8 bits wide. This method is preferable when blending into a dithered framebuffer as it preserves the framebuffer alpha when fragment alpha does not contribute to it.

For example if a three bit component has bits B2, B1 and B0 then the 8 bit value would be made up as follows:

If the alpha component doesn't exist in the format, or *NoAlphaBuffer* is set, then the alpha value is not affected by the setting of the *AlphaConversion* bit and is always set to 255 (in the 8 bit domain) or 256 (in the 9 bit domain).

The **AlphaBlendColorMode** register controls color channel blending.
It has the following format:

| Name | Type | Offset | Format |
|------------------------|-------------|--------|---------------------|
| AlphaBlendColorMode | Alpha Blend | 0xAFA0 | Bitfield |
| AlphaBlendColorModeAnd | Alpha Blend | 0xACB0 | Bitfield Logic Mask |
| AlphaBlendColorModeOr | Alpha Blend | 0xACB8 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read<sup>22</sup> | Write | Reset | Description |
|------|------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | | When set causes the fragment's color to be alpha blended under control of the remaining bits in this register. When clear the fragment color remains unchanged (but may later be affected by the chroma test). |
| 1..4 | SourceBlend | ✓ | ✓ | X | This field defines the source blend function to use. See the table in the *AlphaBlendColorMode* register for the possible options. |
| 5..7 | DestBlend | ✓ | ✓ | X | This field defines the destination blend function to use. See the table in the *AlphaBlendColorMode* register for the possible options. |
| 8 | SourceTimesTwo | ✓ | ✓ | X | This bit, when set, causes the source blend result to be multiplied by two before it is combined with the dest blend result. When this bit is clear no multiply occurs. |
| 9 | DestTimesTwo | ✓ | | X | This bit, when set, causes the dest blend result to be multiplied by two before it is combined with the source blend result. When this bit is clear no multiply occurs. |
| 10 | InvertSource | ✓ | ✓ | X | This bit, when set, causes the incoming source data to be inverted before any blend operation takes place. |
| 11 | InvertDest | ✓ | ✓ | X | This bit, when set, causes the incoming dest data to be inverted before any blend operation takes place. |
| 12..15 | ColorFormat | ✓ | ✓ | X | This field defines framebuffer color formats. See the table in the *AlphaBlendColorMode* register for the possible options. |
| 16 | ColorOrder | ✓ | ✓ | X | This bit selects the color order in the framebuffer:<br>0 = BGR<br>1 = RGB |
| 17 | ColorConversion | ✓ | ✓ | X | This bit selects how color components less than 8 bits wide are converted to 8 bit wide values prior to the alpha blend calculations. The options are:<br>0 = Scale<br>1 = Shift |
| 18 | ConstantSource | ✓ | ✓ | X | This bit, when set, forces the Source color to come from the *AlphaSourceColor* register (in 8888 format) instead of the framebuffer.<br>0 = Use framebuffer<br>1 = Use *AlphaSourceColor* register |
| 19 | ConstantDest | ✓ | ✓ | X | This bit, when set, forces the destination color to come from the *AlphaDestColor* register (in 8888 format) instead of the fragment's color.<br>0 = Use fragment's color<br>1 = Use *AlphaDestColor* register |
| 20..23 | Operation | ✓ | ✓ | X | This field selects how the source and destination blend results are to be combined. The options are:<br>0 = Add<br>1 = Subtract (i.e. S - D)<br>2 = Subtract reversed (i.e. D - S)<br>3 = Minimum<br>4 = Maximum |
| 24 | SwapSD | ✓ | ✓ | X | This bit, when set, causes the source and destination pixel values to be swapped. The main use for this is to allow a downloaded color value to be in a format other than 8888 and use this unit to do color conversion. |

<sup>22</sup> Logic Op register readback is via the main register

The *ColorConversion* bit selects the conversion method for RGB values read from the framebuffer, similarly to the *AlphaConversion* bit for alpha values:

- The Scale method linearly scales the color values to fill the full range of an 8 bit value. This method is preferable when, for example, downloading an image with fewer bits per pixel into a deeper (i.e. more bits per pixel) framebuffer.
- The Shift method left shifts by the appropriate amount to make the component 8 bits wide. This method is preferable when blending into a dithered framebuffer as it preserves the framebuffer color when fragment color does not contribute to it.<sup>23</sup>

#### 9.1.1 Color Dithering

Permedia4 uses an ordered dither algorithm to implement color dithering. The following table shows the exact type of dithering used when dither is enabled. The type of dithering depends on the width of individual color components:

| Component Width | Type of Dithering |
|-----------------|--------------------|
| 8 | No Dithering |
| 5 | 2x2 Ordered Dither |
| 4 | 4x4 Ordered Dither |
| 3 | 4x4 Ordered Dither |
| 2 | 4x4 Ordered Dither |
| 1 | 4x4 Ordered Dither |

Table 9.22 Dither Methods

Permedia4's ordered dither matrices are shown below:

| 0 | 8 | 2 | 10 |
|----|----|----|----|
| 12 | 4 | 14 | 6 |
| 3 | 11 | 1 | 9 |
| 15 | 7 | 13 | 5 |

| 0 | 2 |
|---|---|
| 3 | 1 |

Table 9.23 Ordered Dither Matrices, 4x4 and 2x2.

If the color formatting unit is disabled, the RGBA color components are not modified. If the unit is enabled but dithering is disabled, the components are instead truncated or rounded under the control of the *RoundingMode* field in the **DitherMode** register when they are placed in the framebuffer. This assumes that the framebuffer width is less than 8 bits per component. In CI mode the value is rounded to the nearest integer. In both cases the result is clamped to a maximum value to prevent overflow.

In some situations only screen coordinates are available, but window-relative dithering is required.
This can be implemented by adding an optional offset to the coordinates before indexing the dither tables. The offset is a two bit number which is supplied for each coordinate, X and Y. The *XOffset* and *YOffset* fields in the **DitherMode** register control this operation; if window relative coordinates are used they should be set to zero. For more information on offset calculation see section 4.2.10.1 - Address Calculation, in Volume I.

Alpha channel dithering is qualified by the *AlphaDither* control bit. When cleared, the alpha channel is processed in the same way as the color channels, as dictated by the *DitherEnable* bit. When the *AlphaDither* bit is set, however, the alpha channel is not dithered but is processed according to the state of the *RoundingMode* field.

The ability to disable dithering on the alpha channel is useful when using the alpha buffer to hold coverage information during antialiasing. In this situation dithering adds noise to the coverage value, which would create artifacts where a pixel which should be fully covered is reported as not fully covered.

See *The OpenGL Reference Manual* and *The OpenGL Programming Guide* from Addison-Wesley for more details on dithering.

<sup>23</sup> The scale method would otherwise cause the 'fraction' bits to be non zero, which could result in a different color when dithered again. This shows up as a faint outline of the underlying polygon when, for example, an alpha blended texture is used with zero value to provide cut-outs.
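The table lookup with the two bit X and Y offsets can be sketched in C as follows. This is a minimal illustration, not the hardware implementation: the function names are hypothetical, the choice of Y as the row index is an assumption, and `dither_component` uses a generic ordered-dither threshold because this section does not state exactly how Permedia4 combines the matrix value with the color component. Only the 4x4 matrix from Table 9.23 is modeled (5 bit components use the 2x2 matrix instead).

```c
#include <assert.h>
#include <stdint.h>

/* 4x4 ordered dither matrix as listed in Table 9.23. */
static const uint8_t kDither4x4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Look up the dither value for a fragment. x_offset and y_offset model the
 * two bit XOffset/YOffset fields of DitherMode. Assumption: the low two bits
 * of each sum index the table, with Y selecting the row. */
uint8_t dither_value(unsigned x, unsigned y, unsigned x_offset, unsigned y_offset)
{
    assert(x_offset < 4u && y_offset < 4u);
    return kDither4x4[(y + y_offset) & 3u][(x + x_offset) & 3u];
}

/* Reduce an 8 bit component to `bits` bits (1..7) using a generic ordered
 * dither: the matrix entry is scaled to the range of the discarded bits,
 * added to the value, and the result is clamped to prevent overflow.
 * Assumption: this is a textbook formulation, not the documented hardware rule. */
uint8_t dither_component(uint8_t value, unsigned bits, uint8_t d)
{
    assert(bits >= 1u && bits <= 7u && d < 16u);
    unsigned shift = 8u - bits;
    unsigned max_out = (1u << bits) - 1u;
    unsigned threshold = ((unsigned)d << shift) >> 4; /* 0..15 scaled to discarded range */
    unsigned out = ((unsigned)value + threshold) >> shift;
    return (uint8_t)(out > max_out ? max_out : out);
}
```

With window-relative coordinates both offsets are zero; with screen coordinates the offsets shift the pattern so that it stays aligned to the window origin.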
#### 9.1.2 Registers

Dither operations are controlled by the **DitherMode** register:

**DitherMode, DitherModeAnd, DitherModeOr**

| Name | Type | Offset | Format |
|---------------|--------|--------|---------------------|
| DitherMode | Global | 0x8818 | Bitfield |
| DitherModeAnd | Global | 0xACD0 | Bitfield Logic Mask |
| DitherModeOr | Global | 0xACD8 | Bitfield Logic Mask |

Control Register

| Bits | Name | Read | Write | Reset | Description |
|------|------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | X | When set causes the fragment's color values to be dithered or rounded under control of the remaining bits in this register. If this bit is clear then the fragment's color is passed unchanged. |
| 1 | DitherEnable | ✓ | ✓ | X | When this bit is set any RGB format color is dithered, otherwise it is rounded to the destination size under control of the RoundingMode field. See the table below for the dither matrix and how it is combined with the color components. Color Index formats are always rounded. |
| 2..5 | ColorFormat | ✓ | ✓ | X | The color format, which in turn is coded from the size and position of the red, green, blue and (if present) the alpha components. |
| 6..7 | XOffset | ✓ | ✓ | X | This offset is added to the fragment's x coordinate to derive the x address in the dither table. This allows window-relative dithering using screen coordinates. |
| 8..9 | YOffset | ✓ | ✓ | X | This offset is added to the fragment's y coordinate to derive the y address in the dither table. This allows window-relative dithering using screen coordinates. |
| 10 | ColorOrder | ✓ | ✓ | X | Holds the color order. The options are:<br>0 = BGR<br>1 = RGB |
| 11..13 | Reserved | 0 | 0 | X | |
| 14 | AlphaDither | ✓ | ✓ | X | This bit allows the alpha channel to be rounded even when the color channels are dithered. This helps when antialiasing.<br>0 = Alpha value is dithered (if DitherEnable is set)<br>1 = Alpha value is always rounded |
| 15..16 | RoundingMode | ✓ | ✓ | X | 0 = Truncate<br>1 = Round Up<br>2 = Round Down |
| 17..31 | Unused | 0 | 0 | X | |

Figure 9-1 DitherMode Register

## 9.1.3 Dither Example

To set the framebuffer format to RGB 3:3:2 and enable dithering:

```
// 332 Dithering
ditherMode.UnitEnable = PERMEDIA4_TRUE
ditherMode.DitherEnable = PERMEDIA4_TRUE
ditherMode.ColorMode = PERMEDIA4_COLOR_FORMAT_RGB_332
DitherMode(ditherMode) // Load register
```

## 9.1.4 3:3:2 Color Format Example

To set the framebuffer format to RGB 3:3:2 and disable dithering:

```
// 332 No Dither
ditherMode.UnitEnable = PERMEDIA4_TRUE
ditherMode.DitherEnable = PERMEDIA4_FALSE
ditherMode.ColorMode = PERMEDIA4_COLOR_FORMAT_RGB_332
DitherMode(ditherMode) // Load register
```

## 9.1.5 8:8:8:8 Color Format Example

To set the framebuffer to RGBA 8:8:8:8 with no dithering:

```
// 8888 No dither (enabling dithering would have no effect
// as 8 bit components are not dithered)
ditherMode.UnitEnable = PERMEDIA4_TRUE
ditherMode.DitherEnable = PERMEDIA4_FALSE
ditherMode.ColorMode = PERMEDIA4_COLOR_FORMAT_RGBA_8888
DitherMode(ditherMode) // Load register
```

The same can be achieved by disabling the color formatting unit, as 8 bit components are not dithered:

```
// 8888 No dither
ditherMode.UnitEnable = PERMEDIA4_FALSE
DitherMode(ditherMode) // Load register
```

## 9.1.6 Color Index Format Example

To set the framebuffer to 4 bit Color Index and enable dithering:

```
// 4 bit CI with dithering
ditherMode.UnitEnable = PERMEDIA4_TRUE
ditherMode.DitherEnable = PERMEDIA4_TRUE
ditherMode.ColorMode = PERMEDIA4_COLOR_FORMAT_CI_4
DitherMode(ditherMode) // Load register
```

## 9.2 Logical Op Unit

The logical op unit performs two functions: logic ops between the fragment color (source color) and a value from the framebuffer (destination color) and, optionally, control of a special Permedia4 mode which allows high performance flat shaded rendering.

## 9.2.1 High Speed Flat Shaded Rendering

This mode is still supported on the Permedia4 and is detailed below for completeness, but it offers no advantage over span processing. The technique uses a color value from the **FBWriteData** register instead of the fragment color. It is retained for backwards compatibility only. To use the mode the following constraints must be satisfied:

- Flat shaded aliased primitive
- No dithering or logical ops required
- No stencil, depth or GID testing required
- No alpha blending

If all the conditions are met then load the **FBWriteData** register with the required framebuffer color data and set the *UseConstantFBWriteData* bit in the **LogicalOpMode** register. All unused units should be disabled.

This mode is most useful for 2D applications or for clearing the framebuffer when the memory does not support block writes.

Note that the **FBWriteData** register should be considered volatile when context switching.
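The constraints listed above amount to a simple predicate over the current render state. The following C sketch shows one way a driver might express that check before setting *UseConstantFBWriteData*. The `RenderState` structure and the function name are hypothetical and exist only for illustration; they are not part of any Permedia4 interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the state a driver might track per primitive. */
struct RenderState {
    bool flat_shaded;   /* primitive is flat shaded */
    bool antialiased;   /* primitive is antialiased (must be false) */
    bool dithering;     /* dithering required */
    bool logical_ops;   /* logical ops required */
    bool stencil_test;  /* stencil testing required */
    bool depth_test;    /* depth testing required */
    bool gid_test;      /* GID testing required */
    bool alpha_blend;   /* alpha blending required */
};

/* Returns true only when every constraint from section 9.2.1 is met. */
bool can_use_constant_fb_write_data(const struct RenderState *s)
{
    assert(s != NULL);
    return s->flat_shaded && !s->antialiased &&
           !s->dithering && !s->logical_ops &&
           !s->stencil_test && !s->depth_test && !s->gid_test &&
           !s->alpha_blend;
}
```

Since the mode offers no advantage over span processing, such a check is only of interest when maintaining code written for earlier devices.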
## 9.2.2 Logical Operations

The logical operations supported by Permedia4 are:

| Mode | Name | Operation | Mode | Name | Operation |
|------|--------------|-----------|------|-------------|-------------|
| 0 | Clear | 0 | 8 | Nor | ~(S \| D) |
| 1 | And | S & D | 9 | Equivalent | ~(S ^ D) |
| 2 | And Reverse | S & ~D | 10 | Invert | ~D |
| 3 | Copy | S | 11 | Or Reverse | S \| ~D |
| 4 | And Inverted | ~S & D | 12 | Copy Invert | ~S |
| 5 | Noop | D | 13 | Or Invert | ~S \| D |
| 6 | Xor | S ^ D | 14 | Nand | ~(S & D) |
| 7 | Or | S \| D | 15 | Set | 1 |

Where: S = Source (fragment) color, D = Destination (framebuffer) color

**Table 9.24 Logical Operations**

For correct operation of this unit in a mode which takes the destination color, Permedia4 must be configured to allow reads from the framebuffer using the **FBDestReadMode** register. See section §7 for more details.

Permedia4 makes no distinction between RGBA and CI modes when performing logical operations.

## 9.2.3 Registers

The operation of the unit is controlled by the **LogicalOpMode** register:

**LogicalOpMode, LogicalOpModeAnd, LogicalOpModeOr**

| Name | Type | Offset | Format |
|------------------|-----------|--------|---------------------|
| LogicalOpMode | Logic Ops | 0x8828 | Bitfield |
| LogicalOpModeAnd | Logic Ops | 0xAEC0 | Bitfield Logic Mask |
| LogicalOpModeOr | Logic Ops | 0xAEC8 | Bitfield Logic Mask |

Control registers

| Bits | Name | Read | Write | Reset | Description |
|------|------|------|-------|-------|-------------|
| 0 | Enable | ✓ | ✓ | X | When set causes the fragment's color to be logical op'ed under control of the remaining bits in this register. When clear the fragment color remains unchanged (but may later be affected by write masking). |
| 1..4 | LogicOp | ✓ | ✓ | X | This field defines the logical op function to use. The options are:<br>0 = Clear (0)<br>1 = And (S & D)<br>2 = AndReverse (S & ~D)<br>3 = Copy (S)<br>4 = AndInvert (~S & D)<br>5 = Noop (D)<br>6 = Xor (S ^ D)<br>7 = Or (S \| D)<br>8 = Nor (~(S \| D))<br>9 = Equiv (~(S ^ D))<br>10 = Invert (~D)<br>11 = OrReverse (S \| ~D)<br>12 = CopyInvert (~S)<br>13 = OrInvert (~S \| D)<br>14 = Nand (~(S & D))<br>15 = Set (1)<br>where: S is Color or FBSourceData, D is FBData |
| 5 | UseConstantFBWriteData | ✓ | ✓ | X | There is no longer any performance advantage to using this bit but it is retained for backwards compatibility. |
| 6 | BackgroundEnable | ✓ | ✓ | X | This bit, when set, enables a different logical operation to be done for background pixels. If this bit is clear then the same logical operation is applied to foreground and background pixels. Setting this bit when the Enable field is zero ... A background pixel is a pixel whose corresponding bit in the color mask is zero. |
| 7..10 | BackgroundLogicalOp | ✓ | ✓ | X | This field specifies the logical operation to apply to background pixels, if this has been enabled by the BackgroundEnable field. The options are the same as for the LogicalOp field. |
| 11 | UseConstantSource | ✓ | ✓ | X | This field, when set, causes the source color to be taken from the ForegroundColor register instead of from the fragment ... is in the raw framebuffer format ... should have their color replicated ... bits. |
| 12..31 | Unused | 0 | 0 | X | |

## 9.2.4 XOR Example

To set the logical operation to XOR:

```
// Set framebuffer to allow reads
// Not shown

logicalOpMode.UnitEnable = PERMEDIA4_ENABLE
logicalOpMode.LogicalOp = PERMEDIA4_LOGICOP_XOR
LogicalOpMode(logicalOpMode) // Load register
```

## 9.2.5 Logical Op and Software Writemask Example

To set the logical operation to COPY, enable the software writemask, and mask the green component in an 8 bit framebuffer configured in 3:3:2 RGB mode:

```
// Set framebuffer to allow reads
// Not shown

ditherMode.UnitEnable = PERMEDIA4_ENABLE
ditherMode.DitherEnable = PERMEDIA4_ENABLE
ditherMode.ColorMode = PERMEDIA4_COLOR_FORMAT_RGB_332
DitherMode(ditherMode) // Load register

logicalOpMode.UnitEnable = PERMEDIA4_ENABLE
logicalOpMode.LogicalOp = PERMEDIA4_LOGICOP_COPY
LogicalOpMode(logicalOpMode) // Load register

// Green (bits 2..4 in 3:3:2 RGB) is the only field
// whose mask bits are zero
FBSoftwareWriteMask(0xFFFFFFE3)
```

**10**

# **Framebuffer Writemasks**

Two types of framebuffer writemasking are supported by Permedia4: Software and Hardware. Software writemasking requires a read from the framebuffer to combine the fragment color with the framebuffer color before checking the bits in the mask to see which planes are writeable. Hardware writemasking is implemented using SDRAM/SGRAM writemasks and no framebuffer read is required. Refer to section 12.3, Windows Initialisation, for further information on Writemasks and Write initialisation.

#### 10.1.1 Software Writemasks

Software writemasking is controlled by the **FBSoftwareWriteMask** register.
The data field has one bit per framebuffer bit which, when set, allows the corresponding framebuffer bit to be updated. When reset it disables writing to that bit. Software writemasking is applied to all fragments and is not controlled by an enable/disable bit. However it may effectively be disabled by setting the mask to all 1's.

Note that the *ReadDestination* bit must be enabled in the **FBDestReadMode** register when using software writemasks in which some of the bits are zero. See the Framebuffer Read/Write section for details of how to enable/disable framebuffer reads.

#### 10.1.2 Hardware Writemasks

Hardware writemasks, if present, are controlled using the **FBHardwareWriteMask** register. If the framebuffer supports hardware writemasks, and they are to be used, then software writemasking should be disabled (by setting all the bits in the **FBSoftwareWriteMask** register). This results in fewer framebuffer reads when no logical operations or alpha blending is needed.

If the framebuffer is used in 8 bit packed mode, then an 8 bit hardware writemask must be replicated to all 4 bytes of the **FBHardwareWriteMask** register. If the framebuffer is in 16 bit packed mode then the 16 bit hardware writemask must be replicated to both halves of the **FBHardwareWriteMask** register.

See the Permedia4 Reference Guide for more details of framebuffer hardware writemasks.

## 10.1.3 Registers

Both **FBHardwareWriteMask** and **FBSoftwareWriteMask** are 32 bit registers in which each bit represents a bit in the framebuffer.
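The effect of a software writemask, and the replication required for hardware writemasks in packed modes, can be illustrated with a short C sketch. The function names are hypothetical; the first function models the per-bit behaviour described in section 10.1.1 (a set mask bit takes the fragment bit, a clear mask bit keeps the framebuffer bit), and the second builds the 32 bit value to load into **FBHardwareWriteMask** for 8 and 16 bit packed framebuffers.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a software writemask: bits set in `mask` are updated from the
 * fragment, bits clear in `mask` keep the value read from the framebuffer. */
uint32_t apply_software_writemask(uint32_t fragment, uint32_t framebuffer, uint32_t mask)
{
    return (fragment & mask) | (framebuffer & ~mask);
}

/* Replicate a per-pixel writemask across the 32 bit register:
 * 8 bit packed mode  -> copied to all 4 bytes,
 * 16 bit packed mode -> copied to both halves,
 * 32 bit mode        -> used as is. */
uint32_t replicate_hw_writemask(uint32_t mask, unsigned bits_per_pixel)
{
    assert(bits_per_pixel == 8u || bits_per_pixel == 16u || bits_per_pixel == 32u);
    switch (bits_per_pixel) {
    case 8:  return (mask & 0xFFu) * 0x01010101u;
    case 16: return (mask & 0xFFFFu) * 0x00010001u;
    default: return mask;
    }
}
```

A mask of all 1's makes `apply_software_writemask` return the fragment unchanged, which is why no framebuffer read is needed in that case.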
## 10.1.4 Software Writemask Example

Using software writemasks:

```
// Enable framebuffer reads (not shown)

// Set the writemask
FBSoftwareWriteMask(0x0F0F0F0F)
```

See §9.2.5 for another example.

## 10.1.5 Hardware Writemask Example

Using hardware writemasks when neither logic ops nor alpha blending are enabled:

```
// Disable framebuffer reads (not shown)

// Set the writemasks
FBSoftwareWriteMask(0xFFFFFFFF) // 'Disable'
FBHardwareWriteMask(0xF0F0F0F0) // Actual writemask
```

**11**

# **Host Out**

The Host Out Unit controls which data is available at the output FIFO, gathers statistics about the rendering operations (picking and extent testing), and handles the synchronization of Permedia4 via the **Sync** register.

## 11.1 Filtering

Filtering controls the data available at the output FIFO. There are a number of categories:

- Depth, stencil and color: these are data values associated with a fragment which has been read from the localbuffer or framebuffer, or generated using the UpLoadData flag in the Framebuffer Write Unit.
- Synchronization: a single register, **Sync**, which is used to synchronize Permedia4 and flush the graphics pipeline.
- Statistics: the registers associated with extent and picking.

The filtering is controlled by the **FilterMode** register, which is split into 2 bit fields for each category. The 2 bit field selects whether the register tag and/or register data are passed to the output FIFO. The format of the **FilterMode** register is shown in the table below.

| Register Category | Tag Control Bit | Data Control Bit | Description |
|---------------------|----|----|-------------|
| Diagnostic Use Only | 0 | 1 | |
| Diagnostic Use Only | 2 | 3 | |
| Depth | 4 | 5 | This is the data from image upload of the Depth (Z) buffer. |
| Stencil | 6 | 7 | This is the data from image upload of the Stencil buffer. |
| Color | 8 | 9 | This is the data from image upload of the Framebuffer (FBColor). |
| Synchronization | 10 | 11 | |
| Statistics | 12 | 13 | This is the data generated following a command to read back the results of the statistic measurements: PickResult, MaxHitRegion, MinHitRegion. |
| Diagnostic Use Only | 14 | 15 | |

Table 11.25 Filter Modes

## 11.1.1 Filter Mode Example

```
// Set up Filter mode to only permit read back of
// synchronization tag and data
FilterMode(0x0C00) // Set bits 10 & 11
```

## 11.1.2 Statistic Operations

There are two statistic collection modes of operation: picking and extent checking. Picking is normally used to select drawn objects or regions of the screen. Typically, extent checking is used to determine the bounds within which drawing has occurred so that a smaller area of the framebuffer can subsequently be cleared.

Spans are handled by Permedia4 in a fully consistent way for picking and extent checking.

Statistic collection is controlled using the **StatisticMode** register.

#### 11.1.2.1 Picking

In picking mode, the active and/or passive fragments and spans have their associated XY coordinates compared against the coordinates specified in the **MinRegion** and **MaxRegion** registers. If the result is true, then the PickResult flag is set, otherwise it holds its previous state. The compare function can be either Inside or Outside.

Before picking can start, the **ResetPickResult** register must be loaded to clear the PickResult flag.

The **MinRegion** and **MaxRegion** registers are loaded to select the region of interest for picking. A coordinate is inside the region if:

$$X_{min} \le X < X_{max}$$

$$Y_{min} \le Y < Y_{max}$$

where X and Y are from the fragment and the min/max values are from the **MinRegion** and **MaxRegion** registers. This comparison is identical to the one used in the scissor tests.
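The region comparison and the sticky behaviour of the PickResult flag can be modeled in a few lines of C. This is a sketch for illustration only: the `Region` structure and function names are hypothetical, and `compare_outside` stands in for the Inside/Outside compare function selected in **StatisticMode**.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side copy of the MinRegion/MaxRegion values. */
struct Region {
    int x_min, y_min;  /* from MinRegion (inclusive) */
    int x_max, y_max;  /* from MaxRegion (exclusive) */
};

/* Half-open test: Xmin <= X < Xmax and Ymin <= Y < Ymax. */
bool region_contains(const struct Region *r, int x, int y)
{
    assert(r != NULL);
    return r->x_min <= x && x < r->x_max &&
           r->y_min <= y && y < r->y_max;
}

/* The flag is set when the comparison is true and otherwise holds its
 * previous state; it is only cleared by loading ResetPickResult. */
bool update_pick_result(bool pick_result, const struct Region *r,
                        int x, int y, bool compare_outside)
{
    bool inside = region_contains(r, x, y);
    bool hit = compare_outside ? !inside : inside;
    return pick_result || hit;
}
```

Because the maximum bound is exclusive, a fragment at exactly X<sub>max</sub> or Y<sub>max</sub> does not count as inside the region.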
The following stages are required for picking:

1) Load the **ResetPickResult**, **MinRegion** and **MaxRegion** registers.
2) Set up the **FilterMode** to allow statistic commands out of Permedia4.
3) Draw the primitives.
4) Send a **PickResult** command.
5) Poll the output FIFO while waiting for the PickResult to have passed through the pipeline.

#### 11.1.2.2 Extent Checking

In extent mode, active and/or passive fragments have their associated XY coordinates compared to the **MinRegion** and **MaxRegion** registers. If a coordinate is found to be outside the defined rectangular region, the appropriate register is updated with the new coordinate(s) to extend the region. The Inside/Outside bit has no effect in this mode. Block fills are included in the extent checking if the **StatisticMode** register is set to include spans.

The **MinRegion** and **MaxRegion** registers are loaded to select the maximum value (MinRegion) and minimum value (MaxRegion) for extent checking. A coordinate is inside the region if:

$$X_{min} \le X < X_{max}$$

$$Y_{min} \le Y < Y_{max}$$

where X and Y are from the fragment and the min/max values are from the **MinRegion** and **MaxRegion** registers. This comparison is identical to the one used in the scissor tests.

Once all the necessary primitives have been rendered the results can be found using the **MinHitRegion** and **MaxHitRegion** commands, which cause the contents of the **MinRegion** and **MaxRegion** registers respectively to be written into the output FIFO (under control of the **FilterMode** register).

## 11.1.3 Synchronization

The **Sync** register is filtered and written to the output FIFO in a similar fashion to the other registers. If an interrupt is required then the most significant bit of the **Sync** command register must be set, and the filtering must be set up to write something into the FIFO. If nothing is written to the FIFO (because of the FilterMode) then no interrupt is generated.
The actual interrupt is not generated until the **Sync** data or tag has passed through the FIFO and is on its output, which allows low level resynchronization between the core and PCI clock domains. The FIFO has an extra bit in width to accommodate the interrupt signal. When both the data and tag are written into the FIFO, only the first entry in the FIFO will cause the interrupt (assuming an interrupt was requested). The remaining bits in the data field are free and can be used by the host to identify the reason for the Sync.

## 11.1.4 Registers

Filtering is controlled by the **FilterMode** register:

# FilterMode FilterModeAnd FilterModeOr

| Name          | Type   | Offset | Format              |
|---------------|--------|--------|---------------------|
| FilterMode    | Output | 0x8C00 | Bitfield            |
| FilterModeAnd | Output | 0xAD00 | Bitfield Logic Mask |
| FilterModeOr  | Output | 0xAD08 | Bitfield Logic Mask |

Control registers

| Bits    | Name                | Read<sup>24</sup> | Write | Reset | Description |
|---------|---------------------|------|-------|-------|-------------|
| 0...3   | Reserved            | ✓ | ✓ | X | Reserved for diagnostic use - set to 0. |
| 4       | LBDepthTag          | ✓ | ✓ | X | When set allows the *LBDepth* tag to be written into the output FIFO. |
| 5       | LBDepthData         | ✓ | ✓ | X | When set allows the data upload from the Depth buffer to be written into the output FIFO. |
| 6       | StencilTag          | ✓ | ✓ | X | When set allows the *LBStencil* tag to be written into the output FIFO. |
| 7       | StencilData         | ✓ | ✓ | X | When set allows the data upload from the Stencil buffer to be written into the output FIFO. |
| 8       | FBColorTag          | ✓ | ✓ |   | When set allows the *FBColor* tag to be written into the output FIFO. |
| 9       | FBColorData         | ✓ | ✓ |   | When set allows the data upload from the framebuffer to be written into the output FIFO. |
| 10      | SyncTag             | ✓ | ✓ | X | When set allows the *Sync* tag to be written into the output FIFO. |
| 11      | SyncData            | ✓ | ✓ | X | When set allows the *Sync* data to be written into the output FIFO. |
| 12      | StatisticsTag       | ✓ | ✓ | X | When set allows the *PickResult*, *MaxHitRegion* and *MinHitRegion* tags to be written into the output FIFO. |
| 13      | StatisticsData      | ✓ | ✓ | X | When set allows the *PickResult*, *MaxHitRegion* and *MinHitRegion* data to be written into the output FIFO. |
| 14      | RemainderTag        | ✓ | ✓ | X | When set allows any tags not covered by the categories in this table to be written into the output FIFO. |
| 15      | RemainderData       | ✓ | ✓ | X | When set allows any data not covered by the categories in this table to be written into the output FIFO. |
| 16...17 | ByteSwap            | ✓ | ✓ |   | This field controls the byte swapping of the data field when it is written into the output FIFO. The options are:<br>0 = ABCD (i.e. no swap)<br>1 = BADC<br>2 = CDAB<br>3 = DCBA |
| 18      | ContextTag          | ✓ | ✓ | X | When set allows the *ContextData* and *EndOfFeedback* tags to be written into the output FIFO. |
| 19      | ContextData         | ✓ | ✓ | X | When set allows the *ContextData* and *EndOfFeedback* data to be written into the output FIFO. |
| 20      | RunLengthEncodeData | ✓ | ✓ | X | When set, run length encoded data is written into the host out FIFO. |
| 21...31 | Unused              | 0 | 0 | X | |

<sup>24</sup> Logic Op register readback is via the main register only.

Notes: This register can only be updated if the *Security* register is set to 0.

#### Figure 11-1 FilterMode Register

Statistic collection is controlled by the **StatisticMode** register:

# StatisticMode StatisticModeAnd StatisticModeOr

| Name             | Type   | Offset | Format              |
|------------------|--------|--------|---------------------|
| StatisticMode    | Output | 0x8C08 | Bitfield            |
| StatisticModeAnd | Output | 0xAD10 | Bitfield Logic Mask |
| StatisticModeOr  | Output | 0xAD18 | Bitfield Logic Mask |

Command

| Bits   | Name            | Read | Write | Reset | Description |
|--------|-----------------|------|-------|-------|-------------|
| 0      | Enable          | ✓ | ✓ | X | When set allows the collection of statistics information. |
| 1      | StatsType       | ✓ | ✓ | X | Selects the type of statistics to gather. The options are:<br>0 = Picking<br>1 = Extent |
| 2      | ActiveSteps     | ✓ | ✓ | X | When set includes active fragments in the statistics gathering, otherwise they are excluded. |
| 3      | PassiveSteps    | ✓ | ✓ | X | When set includes culled fragments in the statistics gathering, otherwise they are excluded. |
| 4      | CompareFunction | ✓ | ✓ | X | Selects the type of compare function to use. The options are:<br>0 = Inside region<br>1 = Outside region |
| 5      | Spans           | ✓ | ✓ | X | When set includes spans in the statistics gathering, otherwise they are excluded. |
| 6...31 | Unused          | 0 | 0 | X | |

#### Figure 11-2 StatisticMode Register

The **MinRegion** and **MaxRegion** registers are used to load the picking/extent regions, and the **MinHitRegion** and **MaxHitRegion** commands are used to read the registers back. Each holds two 16 bit 2's complement numbers, with X in the least significant end of the word.

**PickResult** is used to read the results of picking; the pick flag is placed in the least significant bit of the 32 bit register.
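The region word layout and the pick flag described above can be sketched in C as follows. This is a minimal illustration, assuming only what the text states (two 16 bit 2's complement values, X in the low half and Y in the high half, pick flag in bit 0). The helper names `pack_region`, `region_x`, `region_y` and `pick_flag` are hypothetical and are not taken from the 3Dlabs driver headers.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of any 3Dlabs header.
 * Assumed layout (from the text): X in bits 0..15, Y in bits 16..31,
 * both 16 bit 2's complement. */
static uint32_t pack_region(int16_t x, int16_t y)
{
    return (uint32_t)(uint16_t)x | ((uint32_t)(uint16_t)y << 16);
}

/* Sign-extend a 16 bit field without relying on
 * implementation-defined narrowing conversions. */
static int16_t sign16(uint32_t field)
{
    int32_t v = (int32_t)(field & 0xFFFFu);
    return (int16_t)((v ^ 0x8000) - 0x8000);
}

static int16_t region_x(uint32_t word)
{
    return sign16(word);
}

static int16_t region_y(uint32_t word)
{
    return sign16(word >> 16);
}

/* The pick flag is the least significant bit of the PickResult data. */
static int pick_flag(uint32_t pick_result)
{
    return (int)(pick_result & 1u);
}
```

With these helpers, `pack_region(0x100, 0x100)` gives `0x01000100`, the same value as the expression `0x100 | 0x100 << 16` used to load **MaxRegion** in the Picking Example of section 11.1.5.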
**ResetPickResult** is used to clear the picking flag; the data field is not used.

The **Sync** register is 32 bits wide. The most significant bit is set to indicate that an interrupt is to be generated; bits 0-30 are available for the user.

## 11.1.5 Picking Example

Set the statistic mode to picking and detect any active fragments in the region $0x0 \le x < 0x100$, $0x0 \le y < 0x100$. Render some primitives then read back the results.

```
// Set filter mode as above
FilterMode(0x0C00)      // Set bits 10 & 11

// Set statistic mode
StatisticMode(0x5)      // Enable, picking, active fragments, inside region
MinRegion(0)
MaxRegion(0x100 | 0x100 << 16)

// Clear the picking flag
ResetPickResult(0x0)    // Data not used
```

```
// Now render primitives....
...
Render(render)          // All units set as appropriate

// All rendering finished.
// Set the filter mode to allow read back of Syncs
// and statistic information (tag and data)
FilterMode(0x3C00)      // Set bits 10 to 13

// Write to the PickResult register
PickResult(0x0)         // Data not used

// Now read the PickResult from the output FIFO (not shown)
```

## 11.1.6 Sync Interrupt Example

Generate a synchronization interrupt and encode some user defined data (0x34) in the lower 31 bits of the Sync register.

```
// Set up Filter mode to only permit read back of
// synchronization tag and data
FilterMode(0x0C00)      // Set bits 10 & 11

// Write to the Sync register with the top bit
// (bit 31) set and user data encoded into the
// lower bits (0-30)
sync = (0x1 << 31) | (0x34 & 0x7FFFFFFF)
Sync(sync)

// Now wait for the sync interrupt. (Not shown.)
```

# 12 **Initialization**

## 12.1 Initializing Permedia4

This section illustrates how to initialize **Permedia4** following reset, prior to carrying out rendering operations. Initialization falls broadly into three areas, though in different systems precise responsibilities can vary:

- System initialization covers the PCI bus, memory set-up and video output. This information is typically initialized only once, following reset.
- Window initialization covers the base address of the current rendering window and its color format. This must be initialized at reset and needs to be updated each time Permedia4 starts drawing to a new window.
- Application initialization covers state that is typically dynamic, for example enabling and disabling depth testing. Again this state must be set at reset, but is likely to be updated relatively frequently.

To make use of the full functionality of Permedia4 consult the relevant sections of Chapter 1 - Graphics Programming. Examples are given which make use of the pseudocode conventions given in Appendix B.

Note: In general the graphics registers are not hardware initialized to specific values at reset. In the examples below it is assumed that the data structures used to load these registers are initialized to zero. Thus bit fields which are not set explicitly default to zero.

#### 12.1.1 Reset and Initialization

The units and FIFOs can be reset under software control or by a hardware reset signal, usually as a result of power-on.

During reset all the inter-unit FIFOs, the FIFOs between the core and the memory controller, and the host interface are emptied. Some of the units (Local Buffer Read and Framebuffer Read) also have internal FIFOs and these are cleared as well. All the state machines in each unit are forced into their idle state. Together with the FIFOs being empty, this guarantees a safe start when the first message is received.

Note: A reset does not, in general, change the contents of any state information which can be read back. After a power-on reset all these registers must be initialized by software to place them in a well defined state before any rendering is done. Units are not automatically disabled on a reset.

## 12.2 System Initialization

#### 12.2.1 PCI bus

There is a set of PCI related registers which can be interrogated for information about the chip, for example its revision and device ID.
Some of these PCI related registers need to be set up at reset, for instance to configure the base addresses of the different memory regions of the chip. However, the subject of PCI bus initialization is beyond the scope of this document. For more details refer to the Reset chapter of the *Permedia4 Reference Guide*, and the *PCI Local Bus Specification Rev2.1*.

## 12.2.2 Memory Configuration

There are no memory hardware configuration pins. Memory parameters are set through a group of registers in Region 0. These parameters are described in detail in the *Permedia4 Reference Guide*, chapter 9 (Memory Systems), including register bitfields and sample configurations. The primary registers are **LocalMemCaps** and **LocalMemControl**. **LocalMemCaps** is shown below.

## LocalMemCaps

| Name         | Type           | Offset | Format   |
|--------------|----------------|--------|----------|
| LocalMemCaps | Memory Control | 0x1018 | Bitfield |

Command register

| Bits    | Name              | Read | Write | Reset | Description |
|---------|-------------------|------|-------|-------|-------------|
| 0...3   | ColumnAddress     | ✓ | ✓ | 0   | Address bits to use for column address. |
| 4...7   | RowAddress        | ✓ | ✓ | 0   | Address bits to use for row address. |
| 8...11  | BankAddress       | ✓ | ✓ | 0   | Address bits to use for bank address. |
| 12...15 | ChipSelect        | ✓ | ✓ | 0   | Address bits to use for chip select. |
| 16...19 | PageSize          | ✓ | ✓ | 0   | Page size (units = full width of memory)<br>0 = 32 units, 1 = 64 units, etc. |
| 20...23 | RegionSize        | ✓ | ✓ | 0xF | Region size (units = full width of memory)<br>0 = 32 units, 1 = 64 units, etc. |
| 24      | NoPrechargeOpt    | ✓ | ✓ | 0   | 0 = off, 1 = on |
| 25      | SpecialModeOpt    | ✓ | ✓ | 0   | 0 = off, 1 = on |
| 26      | TwoColorBlockFill | ✓ | ✓ | 0   | 0 = off, 1 = on |
| 27      | CombineBanks      | ✓ | ✓ | 0   | 0 = off, 1 = on |
| 28      | NoWriteMask       | ✓ | ✓ | 0x1 | 0 = off, 1 = on |
| 29      | NoBlockFill       | ✓ | ✓ | 0x1 | 0 = off, 1 = on |
| 30      | HalfWidth         | ✓ | ✓ | 0x1 | 0 = off, 1 = on |
| 31      | NoLookAhead       | ✓ | ✓ | 0x1 | 0 = off, 1 = on |

Notes:

- 1. The ColumnAddress, RowAddress, BankAddress, and ChipSelect fields select the bits of the absolute physical address that are to be used to define the corresponding parameters. Each value follows on from the previous one, so the ChipSelect value starts at ColumnAddress + RowAddress + BankAddress and continues for ChipSelect bits.
- 2. The PageSize field defines the size of the page, and the RegionSize field defines the size of the region of memory that each of the four page detectors should be assigned to (so that it is set to one quarter of the memory size).

## 12.2.3 Internal Video Timing Registers

Video Timing initialization is described in Volume I, chapter 5 (Video System).

## 12.2.4 Framebuffer Depth

The size of each pixel to be written into the framebuffer is set up using the **PixelSize** register. The two bit pixel size encoding field sets the pixel size to be used for merging the pixel data into the memory. It is normally set to the same value for all functions, but for generating texture maps it may be advantageous to use a different write pixel size. The pixel size is taken from bits 0...1 when bit 31 is 0, or from the subsequent bits, for local functionality, when bit 31 is 1.
The two bit pixel size is encoded as follows:

- 0 = 32 bpp
- 1 = 16 bpp
- 2 = 8 bpp

During readback bits 0...17 and 31 return values as loaded and bits 18...30 return zero.

#### 12.2.5 Screen Width

The visible screen width depends on the framebuffer configuration, screen clipping dimensions and RAMDAC setup. Framebuffer configuration is described in Volume I, section 3.5.1 (Framebuffer Dimensions and Depth).

## 12.2.6 Screen Clipping Region

Permedia4 supports a screen scissor clip which should be set at system initialization, and a user scissor clip which should initially be disabled. Assuming that the relevant framebuffer registers<sup>25</sup> are set appropriately (see the *P4 Programmer's Guide* Volume I, chapter 4, "Buffer and Cache Management"), setting the screen clip prevents writing outside framebuffer memory. The following example would be appropriate for a resolution of 1024 by 768 pixels:

```
screenSize.X = 1024
screenSize.Y = 768
ScreenSize(screenSize)

scissorMode.ScreenScissorEnable = Permedia4_ENABLE
scissorMode.UserScissorEnable = Permedia4_DISABLE
ScissorMode(scissorMode)
```

## 12.2.7 Localbuffer and Framebuffer Configuration

Permedia4 supports a range of localbuffer configurations. During initialization, fields in the **LBWriteFormat**, **LBWriteBufferWidth** and **LBReadFormat** registers should be set to appropriate values which reflect the depth of memory on the board design, and the initial manner in which it is to be used.

N.B. The width of the Local and Frame buffers is needed to convert x,y coordinates into a physical address (= Y \* FBWriteBufferWidth[buffer] + X). The frame buffer height is not needed for this calculation.
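The address calculation in the note above can be sketched as follows. This is an illustration only: the function names and the `bytes_per_pixel` parameter are hypothetical, and the sketch assumes a linear (unpatched) buffer whose width is expressed in pixels. It is not taken from the 3Dlabs driver sources.

```c
#include <stdint.h>

/* Pixel offset of (x, y) from the start of a buffer: Y * width + X.
 * width_pixels stands in for the buffer width register value
 * (e.g. FBWriteBufferWidth[buffer]); assumed here to be in pixels.
 * Note that the buffer height does not appear anywhere. */
static uint32_t pixel_offset(uint32_t x, uint32_t y, uint32_t width_pixels)
{
    return y * width_pixels + x;
}

/* Byte address of a pixel, assuming a linear (unpatched) layout
 * that starts at byte address `base`. */
static uint32_t pixel_byte_address(uint32_t base, uint32_t x, uint32_t y,
                                   uint32_t width_pixels,
                                   uint32_t bytes_per_pixel)
{
    return base + pixel_offset(x, y, width_pixels) * bytes_per_pixel;
}
```

For a buffer 1024 pixels wide at 16 bpp, pixel (10, 2) is at pixel offset 2 \* 1024 + 10 = 2058, which is 4116 bytes from the buffer base.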
For example, if the hardware is designed to support a 32 bit localbuffer, and initially this is to be divided into a 24 bit depth buffer, 4 stencil planes and 4 GID planes, then the registers must be set as follows (where "[mode]" = either destination or source):

```
lb[mode]ReadFormat.DepthWidth      = 1    // 24 bit depth buffer
lb[mode]ReadFormat.StencilPosition = 8    // Stencil @ 24
lb[mode]ReadFormat.StencilWidth    = 4    // 4 bit stencil
lb[mode]ReadFormat.GIDWidth        = 4
lb[mode]ReadFormat.GIDPosition     = 12   // GID @ 28

LB[MODE]ReadFormat(lb[mode]ReadFormat)
```

```
lbWriteFormat.DepthWidth      = 1    // 24 bit depth buffer
lbWriteFormat.StencilPosition = 8    // Stencil @ 24
lbWriteFormat.StencilWidth    = 4    // 4 bit stencil
lbWriteFormat.GIDWidth        = 4
lbWriteFormat.GIDPosition     = 12   // GID @ 28

LBWriteFormat(lbWriteFormat)
```

Note that within the limits of the memory depth that is physically available, it is possible to dynamically change the allocation of the bits, for instance on a per window basis.

Set the framebuffer and localbuffer source and/or destination read units to their default data sources:

```
fbSourceReadMode.DataType = Permedia4_FBSourceDATA
FBSourceReadMode(fbSourceReadMode)

lbSourceReadMode.DataType = Permedia4_LBSourceDEFAULT
LBSourceReadMode(lbSourceReadMode)
```

<sup>25</sup> Framebuffer and Localbuffer memory is defined using source and destination read and write base addresses, offsets and widths for various formats and layouts. ScreenSize will then be a subset of the memory allocated to the buffers.

#### 12.2.8 Host Out Unit

Under some circumstances it is necessary for the host to synchronize with Permedia4. This is controlled using the **Sync** command, which causes data to be written to the host out FIFO once all processing has completed. The host out FIFO should normally be initialized to pass these pieces of data (they can be filtered out).
The host out unit should normally be set to filter out all other output data, otherwise the host software must regularly poll the output FIFO to keep it drained and prevent it freezing the pipeline. For example, setting only the *SyncTag* and *SyncData* bits of **FilterMode** (bits 10 and 11, value 0x0C00, as in the Sync Interrupt Example of section 11.1.6) passes Sync output and filters out everything else.

## 12.2.9 Disabling Specialized Modes

The Graphic ID should normally be initially disabled using the **GIDMode** *FragmentEnable* bit. Refer to chapter 1 - Graphics Programming - for more details.

## 12.3 Window Initialization

Permedia4 supports the concept of a window origin and makes it relatively simple to implement systems which allow different color formats to coexist in different windows.

#### 12.3.1 Color Format

The Color formatting unit and the Alpha blend unit should be initialized to an appropriate color format at reset. The units support a variety of different formats - see the *Permedia4 Reference Guide*, **AlphaBlendColor** register *ColorFormat* bitfield and related tables. For example, to render in 3:3:2, 8 bit color format, the following would be needed:

```
ditherMode.ColorFormat = Permedia4_COLOR_FORMAT_RGB_332_FRONT
DitherMode(ditherMode)

alphaBlendColorMode.ColorFormat = Permedia4_COLOR_FORMAT_RGB_332_FRONT
AlphaBlendColorMode(alphaBlendColorMode)
```

To enable dithering use the following:

```
ditherMode.Xoffset = 0
ditherMode.Yoffset = 0
ditherMode.DitherEnable = Permedia4_ENABLE
DitherMode(ditherMode)
```

Note: The color formatting unit is normally always enabled even if dithering itself is not. This is because the unit handles color formatting as well as the dithering operation.

## 12.3.2 Setting the Window Address and Origin

Permedia4 supports the concept of a current window origin. The origin of the window can be specified either as being in the Top Left or Bottom Left corner and (for Framebuffer functions) one of four destination buffers.
This allows the user to pick the most appropriate coordinate system to use; for OpenGL it would typically be bottom left, whereas for an X Window System implementation it would be top left. Thus for OpenGL set:

```
fbDestReadMode.Origin[1] = Permedia4_BOTTOM_LEFT_WINDOW_ORIGIN
FBDestReadMode(fbDestReadMode)

lbDestReadMode.Origin[1] = Permedia4_BOTTOM_LEFT_WINDOW_ORIGIN
LBDestReadMode(lbDestReadMode)
```

The window dimensions for clipping are set in the scissor unit. The **ScissorMinXY** register holds the minimum XY scissor coordinate, i.e. the rectangle corner closest to the screen origin. This information is usually provided by the window system. It needs updating if the window moves. As an example, if the window extends from (200, 600) to (480, 960) (using a bottom left coordinate system), the clipping coordinates are specified as follows:

```
ScissorMinXY = 200,600
ScissorMaxXY = 480,960
```

To set the buffer origin using the **BufferAddress** and **BufferOffset** registers see the *P4 Programmer's Guide* Volume I, "Buffer and Cache Management". Unpatched addresses can be held using only the BufferAddress register(s). Patched address offsets must be held in the BufferOffset registers to convert the absolute memory address into a screen-relative address which can be used for patching.

#### 12.3.3 Writemasks

Normally both the hardware (if present) and the software writemasks are initially set to make all bitplanes writeable:

```
FBSoftwareWriteMask(Permedia4_ALL_WRITEMASKS_SET)
FBHardwareWriteMask(Permedia4_ALL_WRITEMASKS_SET)
```

See Chapter 10, Framebuffer Writemasks, for more information.

## 12.3.4 Enabling Writing

Which buffers are enabled at any given time is window specific and should be considered for performance reasons. Performance will be improved if unnecessary reads from, and writes to, buffers are disabled. For example, if the current rendering does not use depth, stencil, or pixel ownership testing, then reading and writing to the localbuffer may be disabled.
The following example initializes the buffers to allow Z buffering and alpha blending:

    fbWriteMode.WriteEnable = Permedia4_ENABLE
    FBWriteMode(fbWriteMode)

    lbWriteMode.WriteEnable = Permedia4_ENABLE
    LBWriteMode(lbWriteMode)

    lbSourceReadMode.Enable = Permedia4_DISABLE
    lbDestReadMode.Enable   = Permedia4_ENABLE
    LBSourceReadMode(lbSourceReadMode)
    LBDestReadMode(lbDestReadMode)

    fbSourceReadMode.ReadEnable = Permedia4_DISABLE
    fbDestReadMode.ReadEnable   = Permedia4_ENABLE
    FBSourceReadMode(fbSourceReadMode)
    FBDestReadMode(fbDestReadMode)

Note: To use software writemasking, the **FBDestReadMode** register's *ReadEnable* field needs to be set if the writemask is set to other than all 1's.

## 12.4 Application Initialization

While an application is running it may dynamically use features of Permedia4 such as depth buffering, alpha blending, logical operations, etc. Initially, however, it is recommended that the respective units be disabled to ensure they are in a known state:

    areaStippleMode.Enable = Permedia4_DISABLE
    AreaStippleMode(areaStippleMode)

    lineStippleMode.StippleEnable = Permedia4_DISABLE
    LineStippleMode(lineStippleMode)

    // Set to skip texture since stencil and depth disabled
    routerMode.Sequence = Permedia4_SET
    RouterMode(routerMode)

    stencilMode.UnitEnable = Permedia4_DISABLE
    StencilMode(stencilMode)

    depthMode.Enable = Permedia4_DISABLE
    DepthMode(depthMode)

    colorDDAMode.Enable = Permedia4_DISABLE
    ColorDDAMode(colorDDAMode)

    textureCoordMode.Enable = Permedia4_DISABLE
    TextureCoordMode(textureCoordMode)

    textureIndexMode.Enable = Permedia4_DISABLE
    TextureIndexMode(textureIndexMode)

    textureReadMode.Enable = Permedia4_DISABLE
    TextureReadMode(textureReadMode)

    textureCompositeColorMode.Enable = Permedia4_DISABLE
    TextureCompositeColorMode(textureCompositeColorMode)

    fogMode.Enable = Permedia4_DISABLE
    FogMode(fogMode)

    antialiasMode.Enable = Permedia4_DISABLE
    AntialiasMode(antialiasMode)

    alphaTestMode.Enable = Permedia4_DISABLE
    AlphaTestMode(alphaTestMode)

    alphaBlendAlphaMode.Enable = Permedia4_DISABLE
    AlphaBlendAlphaMode(alphaBlendAlphaMode)

    alphaBlendColorMode.Enable = Permedia4_DISABLE
    AlphaBlendColorMode(alphaBlendColorMode)

    logicalOpMode.Enable = Permedia4_DISABLE
    LogicalOpMode(logicalOpMode)

# 13 **Performance Tips**

The following is a list of software programming tips and techniques which can be applied to maximize Permedia4 performance. Many of these are debug aids and the importance of effective debug techniques cannot be overemphasised:

*As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.*

\- Maurice Wilkes discovers debugging, 1949

The list is intended to be suggestive only and refers back to the *Permedia4 Reference Guide* and earlier chapters of the *Programmer's Guide*.

- Using Block Writes, e.g. for clears
- Fast double buffering in a window
- Disabling FB reads-per-pixel if they are not required
- Incrementing addresses when writing to the FIFO to enable PCI burst transfers
- Using PCI Disconnect under PIO
- Using bus mastership (i.e. DMA)
- Improving DMA bus bandwidth utilization using the indexed FIFO modes
- Disabling units that are not in use (e.g. Framebuffer reads)
- Use of the extent register to minimize the area in the localbuffer and framebuffer that needs to be cleared
- Use of the Permedia4 graphics pipeline in preference to the framebuffer (and/or localbuffer) bypass when possible
- Loading registers in unit order (i.e. Rasterizer first, Host Out last)
- Avoiding unnecessary register updates
- Miscellaneous debug and generic graphics tips

## 13.1 Block Writes

Permedia4 boards are equipped with either SGRAM, which supports block writes, or SDRAM, which does not.
Block writes allow up to 32 pixels at a time to be filled with a constant color by a single framebuffer write access. This can lead to roughly a 32-fold increase in the speed of, for instance, clearing a large area of the framebuffer. While this technique is most useful when clearing the framebuffer, it can be used to fill any trapezoid. See Volume I, section 4.3.3 - Block Writes.

## 13.2 Fast double buffering in a window

Double buffering is a technique used to achieve visually smooth animation by rendering a scene to an offscreen buffer before quickly displaying it. Permedia4 board designs can readily support a variety of double buffering mechanisms depending on the memory configuration and LUT-DAC used, including:

- BLT
- Full Screen

Note: The best results can often be achieved by combining double buffering techniques.

## 13.3 Disable FB Reads per pixel if not required

The *AlphaFiltering* bit in **FBDestReadMode** can reduce unnecessary FB reads. When it is set, the fragment's alpha value is compared with the AlphaReference value (held in **FBReadEnables**) and, if they are equal, no read is done. This saves memory bandwidth when the destination color doesn't contribute to the fragment's color during blending.

## 13.4 Improving PCI bus bandwidth for Programmed I/O and DMA

Writing data values into the memory mapped registers is appropriate for primitives which require few set-up parameters, such as 2D lines. For more complex primitives such as Gouraud shaded triangles, where a significant number of registers must be loaded for each primitive, it may be more efficient to write directly to the FIFO input. The advantage of this mechanism is that it is then possible to use DMA burst transfers. The disadvantage is that both the address of the register and the data value to be loaded must be written, apparently doubling the amount of data to be loaded.
However, to improve DMA bus bandwidth utilization, the registers have been grouped into blocks which frequently all need to be updated together, and an indexed addressing mode is supported which allows a single "address" to be loaded, followed by the data for a whole set of registers. An additional mode is supported which allows a large number of data values to be loaded to the same register. This is useful for image downloads.

It may also be possible to reduce DMA overhead by re-using DMA buffers and vertex buffers. The **HostInID** register can be used to mark any point in the command stream so that the use of index and vertex buffers can be monitored. This register is loaded with an ID field which, like the DMA address register, can be read at any time to check the progress of the command stream.

## 13.5 PCI burst transfers under Programmed I/O

PCI bus burst transfers typically allow up to four times the bandwidth of individual transfers. However, burst transfers are only initiated on the PCI bus when successive addresses are being written to (i.e. the byte address is incremented by 4). To facilitate the use of burst transfers when using programmed I/O to load the Permedia4 FIFOs, Permedia4 multiply maps the FIFO input register throughout the range:

0x00002000 to 0x00002FFF in region 0

Thus when data is being loaded into the FIFO, a software loop should be written which starts by writing the first data item at the lower extreme of this address range, and works towards the upper.

## 13.6 Using PCI Disconnect Under Programmed I/O

The PCI bus protocol incorporates a feature known as PCI Disconnect which is supported by Permedia4. Once Permedia4 is in this mode, if the host processor attempts to write to the full FIFO then, instead of the write being lost, Permedia4 asserts PCI Disconnect, which forces the host processor to retry the write cycle until it succeeds.
This feature allows faster download of data to Permedia4 since the host need not poll the **InFIFOSpace** register. However it should be used carefully because the bus is then effectively hogged by the host processor until Permedia4 frees up an entry in its FIFO.

## 13.7 Using Bus Mastership (DMA)

Most Permedia4 boards support PCI bus mastership, allowing the on-board DMA of Permedia4 to be used to copy data from host memory into the Permedia4 FIFO. Bus mastership mode is asserted in the **CFGCommand** register using bit 2, *BusMasterEnable*.

The use of PCI bus mastership has a number of benefits:

- PCI bus bandwidth utilization is generally much improved. Permedia4 has been measured achieving transfer rates of up to 30-40 MBytes/sec with a fast host slave (P90 Neptune chipset).
- PCI bus bandwidth is further improved because the driver software no longer needs to poll the FIFO flags to find how many entries are empty before loading it.
- Overall system performance may benefit through increased parallelism between Permedia4 and the host, as the host can often perform useful work preparing the next DMA buffer once it has initiated one DMA transfer.

## 13.8 Disabling units not in use

As a general rule any units within Permedia4 which are not actively in use for the current rendering should be disabled. Each unit has a bit in a control register for this purpose. This will maximize pixel throughput in the graphics core.

In particular it is important to check that unnecessary reads of the localbuffer are not taking place. For instance it is perfectly possible to set up the localbuffer read unit such that Permedia4 reads per-pixel information (such as Z, stencil and GID data) which is then discarded. The effect will be the same visually, but the cost in performance of making the memory accesses will be very high.

Similar comments apply for the framebuffer read unit which again should only be enabled to read pixel data when it is essential.
Note: Permedia4 boards typically support hardware writemasks and these should be used in preference to the software writemasks.

## 13.9 Clearing the localbuffer & framebuffer

Permedia4 can be instructed, using the *StatsType* field of the **StatisticMode** register, to maintain a record of the minimum bounding box (**MinRegion** and **MaxRegion** registers) that has been rendered to in a given period. This can be used to limit the area that must be cleared down using span fill. For further details see chapter 11, Host Out, on Extent Checking.

## 13.10 Use of the Framebuffer (or Localbuffer) Bypass

Whenever possible rendering should be done through the Permedia4 graphics pipeline. This is because reading and writing the framebuffer (or localbuffer) using the bypass is relatively slow. In some cases performance may even be improved if a small area of the framebuffer (and/or localbuffer) is uploaded through the graphics pipeline into a bitmap, rendered to, and then downloaded again through the graphics pipeline.

## 13.11 Loading Registers in Unit Order

To maximize performance, the control registers for the next primitive should be loaded into the Permedia4 FIFO in unit order. Thus the registers associated with the Rasterizer unit should be loaded first, then the Scissor unit, Stipple unit, Color DDA, and so on until the last unit to be loaded is the Host Out unit (if necessary). Then finally the relevant command register should be loaded. For the order of the units refer to chapter 1, Figure 1-1.

## 13.12 Avoiding Unnecessary Register Updates

Permedia4 control registers retain their value between one primitive and the next. Thus it is not necessary to reload registers that are unchanged between primitives. For example, the **dY** register is usually set to either +1 or -1 (except when antialiasing). In addition, calculations of register values can often be shared across primitives, for instance between edges in adjacent polygons in meshes.
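One way to avoid redundant loads is to keep a software shadow copy of each control register and skip the write when the value is unchanged. The sketch below is a minimal illustration under stated assumptions: `ShadowRegs`, `reg_write`, `load_if_changed` and the register indices are hypothetical names, and the real write path (FIFO or DMA buffer) is replaced by a counter so that the example is self-contained. Because the registers are not hardware initialized at reset, each shadow entry is treated as invalid until its first load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 4   /* hypothetical: a handful of tracked registers */

typedef struct {
    uint32_t value[NUM_REGS];  /* last value sent to the chip        */
    bool     valid[NUM_REGS];  /* false until the first load         */
    unsigned writes;           /* number of loads actually issued    */
} ShadowRegs;

/* Stand-in for the real register load (FIFO write or DMA buffer
 * entry). Here it only counts how many loads were issued. */
static void reg_write(ShadowRegs *s, unsigned reg, uint32_t v)
{
    (void)reg;
    (void)v;
    s->writes++;
}

/* Load a control register only if the value differs from the shadow
 * copy. Returns true if a load was issued. */
static bool load_if_changed(ShadowRegs *s, unsigned reg, uint32_t v)
{
    assert(reg < NUM_REGS);
    if (s->valid[reg] && s->value[reg] == v)
        return false;          /* unchanged: skip the write */
    reg_write(s, reg, v);
    s->value[reg] = v;
    s->valid[reg] = true;
    return true;
}
```

This kind of cache is only appropriate for control (mode and parameter) registers. Command registers such as Render trigger work each time they are written and must always be sent.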
## 13.13 Hardware and Software Context Dumps

Permedia4 supports **ContextDump** and **ContextRestore** commands and a **StatisticMode** register, with enables for extent checking and picking set in the **FilterMode** register. These allow the selection of active and passive fragments by screen area and other parameters at specific points in the render process, and state switching to halt and resume graphic processing while examining the collected data.

The decision to use hardware context management may depend on the software regime being supported. In the D3D environment it may be more effective to save all the context state in software copies and, when a context is switched to, simply set up the chip again. This avoids the need to wait for a Context Dump before switching away from a context and takes advantage of D3D's capabilities. However the hardware-assisted route is generally preferred by OGL developers.

## 13.14 Use the Memory Scratchpad Registers

By keeping track of which primitives have finished rendering it is often possible to avoid waiting for chip syncs. When applications do procedural texturing they need to change the texture every frame. Normally host access to a texture that has been rendered with requires a chip-sync. Using scratchpad memory to keep track of primitives which have finished rendering allows the driver to confirm that the last primitive to use that texture has indeed completed, so that the application can access the texture immediately with no sync. As long as applications only want to change the texture some time after they rendered with it (the best time is just before rendering the new version) then chip-syncs can be almost entirely avoided.

The same approach can be used when the application is changing render target and doing a render-to-texture or blit-to-texture. Similarly, when the driver is texture swapping, it can tell which textures it can and can't touch using this tracking information.
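The bookkeeping described in section 13.14 can be sketched as follows. This is an illustration only: the structure and function names are hypothetical, and the way the "last completed primitive" ID reaches the host (for example, a scratchpad location updated from the command stream) is assumed rather than taken from the Permedia4 documentation. The driver tags each primitive with an increasing 32 bit ID, remembers the last ID that used each texture, and compares that against the completed ID.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t last_use_id;   /* ID of the last primitive that used the texture */
} TextureTrack;

/* True if primitive `id` has completed, given the most recent completed
 * ID read back from the scratchpad. The unsigned difference is compared
 * against half the counter range so that the test still works after the
 * 32 bit counter wraps. */
static bool primitive_done(uint32_t completed_id, uint32_t id)
{
    return (uint32_t)(completed_id - id) < 0x80000000u;
}

/* Record that primitive `id` renders with this texture. */
static void texture_used(TextureTrack *t, uint32_t id)
{
    t->last_use_id = id;
}

/* The host may touch the texture without a chip-sync once the last
 * primitive that used it has completed. */
static bool texture_safe_to_touch(const TextureTrack *t, uint32_t completed_id)
{
    return primitive_done(completed_id, t->last_use_id);
}
```

If `texture_safe_to_touch` returns false the driver still has to wait (or sync) before modifying the texture; the saving comes from the common case where the last use completed long ago.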
## 13.15 Miscellaneous Tips

The following miscellaneous tips are not Permedia4 specific but are well worth using.

- Avoid polling for Vblank whenever possible. If you have to poll, consider whether your application is taking just longer than an integer number of Vblank intervals to draw a frame; slightly simplifying the frame to bring it just under an integer multiple can dramatically improve performance.
- Another way of looking at the same problem: if you remove your SwapBuffers() calls, does your application render many more frames per second? If so, you might be spending a lot of time waiting for buffer swaps, and you should tune your application so that it draws just enough to fit in one less frame time.
- When using DMA it may be best to flush the DMA buffer to the chip after entering a large primitive in the buffer (e.g. screen clear), so that the chip is doing useful work while further primitives are being prepared on the host.
- Minimize the use of the Sync command.
- Does making your window smaller cause things to speed up? If so, you're probably fill-limited (bottlenecked by filling the pixels in the window). Speed things up by reducing the depth complexity of your scene or by using simpler drawing operations wherever possible (e.g. avoiding depth-buffering for the background or ground plane).
- Does making your window smaller have no effect on the time it takes to draw a frame? If so, you're probably geometry-limited (bottlenecked by transformations, clipping, or lighting) or host-limited.
- Measure the time it takes your application to draw a frame. Now comment out all the drawing calls, and measure again. If most of the elapsed time per frame is spent doing things other than drawing, your application is probably host-limited rather than geometry-limited.
- If you're geometry-limited, you can speed things up by using simpler models with fewer vertices, by reducing the amount of clipped geometry, by using fewer light sources, etc.
- If you're host-limited, you should use profiling tools to find out where your application is spending its time and then tune those areas.

# 14 Appendices

#### 14.1 Pseudocode Definitions

In many areas of the document we use fragments of pseudocode to describe register loading. These are based on a C interface to Permedia4 in which each 32 bit register is represented as a C structure, potentially split into a series of bit fields. Where an example sets only a subset of the bit fields in a register, it is assumed either that a software copy of the register is being modified, or that the current contents of the register have first been read back. This style has been chosen for clarity; there are often more efficient strategies.

The constant definitions and register bit field definitions are based upon those used in the 3Dlabs driver software. Sources including header files are available under source license agreement.

Loading of a Permedia4 register is expressed as:

```
register-name(value)
```

When writing directly to the register file (i.e. to a FIFO) this would be implemented by writing "value" to the mapped-in address of the register called "register-name".

Fragmentary examples are not in strict C syntax. A typical example is:

```
// Sample code to rasterize a 10x10 rectangle at the
// framebuffer origin.
StartXDom(0)            // Start dominant edge
StartXSub(10 << 16)     // Start of subordinate edge (x = 10 in 16.16 format)
dXDom(0x0)
dXSub(0x0)
Count(0xA)
YStart(0)
dY(1 << 16)

// Set-up to render an aliased trapezoid.
render.AreaStippleEnable = Permedia4_DISABLE
render.LineStippleEnable = Permedia4_DISABLE
render.PrimitiveType = Permedia4_TRAPEZOID
render.FastFillEnable = Permedia4_DISABLE
render.FastFillIncrement = don't care
render.UsePointTable = Permedia4_FALSE
render.AntialiasEnable = Permedia4_DISABLE
render.AntialiasingQuality = don't care
render.ResetLineStipple = Permedia4_FALSE
render.SyncOnBitMask = Permedia4_FALSE
render.SyncOnHostData = Permedia4_FALSE
Render(render)          // Render the rectangle
```

Code is shown in roman face; comments are C++ style, with '//' indicating that the rest of the line is a comment. Any statement which ends in parentheses is a register update; other statements will generally be variable assignments. A variable, say render, is of a type associated with the register being modified. This will usually be clear from the context, and the variable will not usually be declared explicitly. All the type definitions are in the header files.

The values assigned to a register will be either a variable as described above, a macro (e.g. Permedia4_TRUE) as found in the headers, or an immediate constant in C style format (e.g. 0x45).

In registers with several fields, some of which are not relevant to a particular example, the field can be ignored completely or set to *don't care*. In some registers, values for fields which need to be set are not readily available. These are typically set as appropriate.

For some fragments we simply give a list of register updates, e.g.:

```
// Sample code to rasterize a rectangle
StartXDom()     // Start dominant edge
StartXSub()     // Start of subordinate
dXDom()
dXSub()
Count()
YStart()
dY()
// Set-up to render an aliased trapezoid.
Render()        // Render the rectangle
```

This technique is used to give a feel for the registers involved in a particular operation where a detailed treatment is not warranted.
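As an illustration of how the `register-name(value)` convention might map onto real C, the sketch below writes each value to a slot in a simulated register file. It is an assumption-laden example rather than the 3Dlabs interface: the register offsets, `reg_file`, `LOAD_REG` and `load_rectangle_edges()` are hypothetical, and a real driver would take the offsets from the licensed header files and write to the mapped-in chip address instead of an array.

```c
#include <stdint.h>

/* Hypothetical register offsets (in 32-bit words); the real values come
   from the 3Dlabs header files, which are not reproduced here. */
enum { StartXDom = 0, StartXSub, dXDom, dXSub, Count, NUM_REGS };

/* Simulated register file standing in for the mapped-in chip registers. */
static volatile uint32_t reg_file[NUM_REGS];

/* register-name(value): write value to the address of the named register. */
#define LOAD_REG(name, value) \
    do { reg_file[(name)] = (uint32_t)(value); } while (0)

/* Edge set-up for a rectangle 10 pixels wide and 10 scanlines high,
   with X coordinates in 16.16 fixed point. */
static void load_rectangle_edges(void)
{
    LOAD_REG(StartXDom, 0);
    LOAD_REG(StartXSub, 10 << 16);
    LOAD_REG(dXDom, 0);
    LOAD_REG(dXSub, 0);
    LOAD_REG(Count, 0xA);
}
```

The `volatile` qualifier matters once `reg_file` is replaced by a pointer to device memory, because it stops the compiler from merging or reordering the writes.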
To take the address of a register, the name is used. Thus this example stores the address of the StartXDom register in the buffer pointed to by the variable buf and increments the pointer:

```
*buf++ = StartXDom
```

To test the value of a register, the register name is dereferenced using the C '*' operator, as in this example which tests for the completion of a DMA operation:

`while (*DMACount != 0);`

#### 14.2 Interpolation Calculation

#### 14.2.1 Color Gradient Interpolation

To draw from left to right, top to bottom, the color gradients (or deltas) required are:

$$\mathrm{dRdy}_{13} = \frac{R_3 - R_1}{Y_3 - Y_1}, \qquad \mathrm{dGdy}_{13} = \frac{G_3 - G_1}{Y_3 - Y_1}, \qquad \mathrm{dBdy}_{13} = \frac{B_3 - B_1}{Y_3 - Y_1}$$

And from the plane equation:

$$\mathrm{dRdx} = \frac{(R_1 - R_3)(Y_2 - Y_3) - (R_2 - R_3)(Y_1 - Y_3)}{c}$$

$$\mathrm{dGdx} = \frac{(G_1 - G_3)(Y_2 - Y_3) - (G_2 - G_3)(Y_1 - Y_3)}{c}$$

$$\mathrm{dBdx} = \frac{(B_1 - B_3)(Y_2 - Y_3) - (B_2 - B_3)(Y_1 - Y_3)}{c}$$

where, to be independent of the order the vertices are provided:

$$c = \left| (X_1 - X_3)(Y_2 - Y_3) - (X_2 - X_3)(Y_1 - Y_3) \right|$$

These values allow the color of each fragment in the triangle to be determined by linear interpolation. For example, to calculate the red component color value of a fragment at (Xn, Ym):

- add dRdy13, for each scanline between Y1 and Ym, to R1, then
- add dRdx for each fragment along scanline Ym from the left edge to Xn.

The example chosen has the 'knee', i.e. vertex 2, on the right hand side, and drawing is from left to right. If the knee were on the left side (or drawing was from right to left), then the Y deltas for both the subordinate sides would be needed to interpolate the start values for each color component (and the depth value) on each scanline. For this reason Permedia4 always draws triangles starting from the dominant edge and towards the subordinate edges. For the example triangle, this means left to right.

#### 14.2.2 Register Set Up for Color Interpolation

For the example triangle, Permedia4 registers must be set as follows for color interpolation. Note that color values are in 24 bit, fixed point 2's complement 9.15 format.
Load the color start and delta values to draw a triangle:

| Register load | Comment |
|-------------------------------|------------------------------|
| RStart (R<sub>1</sub>) | |
| GStart (G<sub>1</sub>) | |
| BStart (B<sub>1</sub>) | |
| dRdyDom (dRdy<sub>13</sub>) | To walk up the dominant edge |
| dGdyDom (dGdy<sub>13</sub>) | |
| dBdyDom (dBdy<sub>13</sub>) | |
| dRdx (dRdx) | To walk along the scanline |
| dGdx (dGdx) | |
| dBdx (dBdx) | |

#### 14.2.3 Calculating Depth Gradient Values

To draw from left to right and top to bottom, the depth gradients (or deltas) required for interpolation are:

$$\mathrm{dZdy}_{13} = \frac{Z_3 - Z_1}{Y_3 - Y_1}$$

And from the plane equation:

$$\mathrm{dZdx} = \frac{(Z_1 - Z_3)(Y_2 - Y_3) - (Z_2 - Z_3)(Y_1 - Y_3)}{c}$$

where, as before:

$$c = \left| (X_1 - X_3)(Y_2 - Y_3) - (X_2 - X_3)(Y_1 - Y_3) \right|$$

The divisor, shown here as c, is the same as for color gradient values. The two deltas, dZdy13 and dZdx, allow the Z value of each fragment in the triangle to be determined by linear interpolation, as was described for the color interpolation above.

The value returned by the CFGDeviceId register is 0006h in bits 31-16.

### 14.3 Accurate Rendering

This appendix describes how to calculate the various parameters needed to define a Gouraud shaded triangle. This topic is covered in section 1.1.2; however, in the interest of simplicity some of the finer details were glossed over there. The quality of the rasterization and shading suffers where these fine details are not included, giving rise to 'stitch marks' and 'bright edge' artifacts.

The main simplification made earlier relates to the fact that vertices are not, in general, coincident with pixel centers, so sub pixel corrections are necessary. The initial values being interpolated (RGB for example) need to be adjusted to account for this.
Permedia4 does the necessary X corrections when moving from scan line to scan line when the SubPixelCorrection bit is set, but the initial Y correction must be done in software.

The vertices are sorted into Y order and the dominant edge is AC. Scan conversion will start at vertex A and proceed upwards. The origin is bottom left. The usual parameters to interpolate (denoted P in the diagram) across the triangle would include color (R, G, B and alpha), depth (Z), fog (F), and texture (S, T, Q, Ks and Kd).

The source code to set up Permedia4 to achieve the best quality rendering will only calculate the parameters for RGBA and Z to keep the size of the code down.

```
#include <stdio.h>
#include <float.h>

// A simple macro which just prints out the register name and value.
// Replace this with some code to write to Permedia4.
#define LD_Permedia4_REG(name, value) \
    printf ("%s = %08lx\n", #name, (unsigned long) (value))

// This software is part of the application note which describes
// how Permedia4 is set up to get the best quality rendering. Particular
// care is taken to avoid cracks, stitch marks and bright edge artifacts
// from occurring. The OpenGL rasterization rules are used.
// The software has not been written with maximum performance in mind,
// but as a clear, well documented example covering the nuances
// which are easily overlooked.
//
// NOTE: the bit-level conversion functions below assume that long is
// 32 bits wide and that float is an IEEE single precision number.

// Simple vertex structure used to interface parameters to the RenderTriangle
// function.
typedef struct
{
    float x, y, z;      // in device coords
    float r, g, b, a;   // in the range 0.0 to 1.0
} Vertex;

// Prototypes.
long IntToFixedPoint16 (long i);
long FloatToColor (float f);
long FloatToCoordinate (float f);
void FloatToDepth (float f, long *zi, long *zf);
void RenderTriangle (Vertex *v0, Vertex *v1, Vertex *v2);

// Defines some simple functions to convert from floating point numbers
// to various fixed point formats. These can be inlined if necessary.
long IntToFixedPoint16 (long i)
{
    return i << 16;
}

// These functions perform the conversion from floating point numbers
// to the various fixed point format numbers required in Permedia4. They
// are implemented as simple operations on the binary representation
// of IEEE single precision floating point number so the floating
// point rounding mode doesn't need to be set up first and in many
// cases they are faster than using the built in conversion functions,
// especially when the range checking and clamping is taken into account.

// Format of IEEE single-precision (32-bit) real number.
#define F_BIAS          127
#define F_SIGN_BIT      31
#define F_EXPONENT_BITS 23
#define F_FRACTION_BITS 0

// Convert 32-bit floating-point value to 9.15 fixed-point value used
// for the color parameters. The input range is assumed to be 0.0
// to 1.0. The algorithm is:
// If exponent < -15 then return (0x00000000), otherwise
// if exponent < 8 then return (-1**(s) * 1.f * 2**(e - 127)), otherwise
// return ((s == 1) ? 0xff800000 : 0x007fffff).
long FloatToColor (float fi)
{
    long f = *((long *) &fi);
    long sign;
    unsigned char exponent;

    sign = (f >> F_SIGN_BIT);
    exponent = (unsigned char)(f >> F_EXPONENT_BITS);
    if (exponent < (F_BIAS-15))
        return (0);
    if (exponent < (F_BIAS+8))
    {
        f = ((unsigned long)((f | 0x00800000) << 8) >>
             ((F_BIAS+16) - exponent));
        if (sign < 0)
            f = -f;
        return (f);
    }
    return (0x007fffff ^ sign);
}

// Convert 32-bit floating-point value to 16.16 fixed-point value used
// for the rasterizer parameters.
// If exponent < -16 then return (0x00000000), otherwise
// if exponent < 15 then return (-1**(s) * 1.f * 2**(e - 127)), otherwise
// return ((s == 1) ? 0x80000000 : 0x7fffffff).
long FloatToCoordinate (float fi)
{
    long f = *((long *) &fi);
    long sign;
    unsigned char exponent;
    long res;

    sign = f >> F_SIGN_BIT;
    exponent = (unsigned char) (f >> F_EXPONENT_BITS);
    if (exponent < (F_BIAS-16))
        return (0);
    if (exponent < (F_BIAS+15))
    {
        res = ((unsigned long)((f | 0x00800000) << 8) >>
               ((F_BIAS+15) - exponent));
        if (sign < 0)
            res = -res;
        return (res);
    }
    return (0x7fffffff ^ sign);
}

// Convert 32-bit floating-point value to 24.16 fixed-point value as
// used by the Z values. Note that this assumes a 24 bit Z buffer.
// The integer part is returned in *zi and the fraction in the upper
// 16 bits of *zf.
// If exponent < -16 then return zero, otherwise
// if exponent < 23 then return (-1**(s) * 1.f * 2**(e - 127)), otherwise
// return the largest magnitude value with the same sign.
void FloatToDepth (float fi, long *zi, long *zf)
{
    long f = *((long *) &fi);
    long sign;
    unsigned char exponent;
    long resh;
    unsigned long resl;

    sign = (f >> F_SIGN_BIT);
    exponent = (unsigned char)(f >> F_EXPONENT_BITS);
    if (exponent < (F_BIAS-16))
    {
        *zi = 0;
        *zf = 0;
        return;
    }
    if (exponent < (F_BIAS+23))
    {
        f = ((f | 0x00800000) << 8);
        if (exponent < (F_BIAS+0))
        {
            resh = 0;
            resl = ((unsigned long) f >> ((F_BIAS-1) - exponent));
        }
        else
        {
            unsigned char shift;

            shift = ((F_BIAS+31) - exponent);   // 8 <= shift < 32
            resh = ((unsigned long) f >> shift);
            resl = (f << (31 - shift));         // shifts >= 32 undefined
                                                // so we must shift twice
            resl <<= 1;
        }
        if (sign < 0)
        {
            unsigned long old_resl;

            resl = ~resl;
            resh = ~resh;
            old_resl = resl;
            resl += 0x00010000;
            // overflow
            if (resl < old_resl)
                ++resh;
        }
    }
    else
    {
        resh = (0x007fffff ^ sign);
        resl = (0xffff0000 ^ sign);
    }
    resl &= 0xffff0000;
    *zi = resh;
    *zf = resl;
}

#define SAME 0
#define REVERSED ~SAME
#define ORDER(v0, v1, v2, order) \
    {a = v0; b = v1; c = v2; windingOrder = order; }

void RenderTriangle (Vertex *v0, Vertex *v1, Vertex *v2)
{
    float dxAB, dyAB, dxBC, dyBC, dxAC, dyAC;   // Diff in x,y for each edge.
    float drAC, dgAC, dbAC, daAC, dzAC;  // Diff in rgbaz for dominant edge.
    float drBC, dgBC, dbBC, daBC, dzBC;  // Diff in rgbaz for the BC edge.
    float dxdyAC, dxdyAB, dxdyBC;        // Edge gradients for unit
                                         // step in y.
    float drdxdy, dgdxdy, dbdxdy;        // Gradients along the dominant
    float dadxdy, dzdxdy;                // edge for unit step in y.
    float drdx, dgdx, dbdx, dadx, dzdx;  // Gradients for unit step in x.
    float r0, g0, b0, a0, z0;            // Start values
    float area, oneOverArea, t1, t2;
    float oneOverdyAC;
    Vertex *a, *b, *c;                   // Sorted vertices.
    long xDomFixed, xSubFixed;
    float dyErr, yBottom, yTop;
    long iyBottom, iyTop;
    int windingOrder;                    // Not used.
    long zi, zf;
    long temp;

    // Sort vertices into ascending Y order. *a points to the vertex with the
    // lowest y value. Compare winding order of the pre and post sorted
    // vertices and set winding order flag as appropriate (this is only
    // needed if culling based on the winding order is to be done).
    if (v0->y < v1->y)
    {
        if (v1->y < v2->y)
            ORDER (v0, v1, v2, SAME)
        else if (v0->y < v2->y)
            ORDER (v0, v2, v1, REVERSED)
        else
            ORDER (v2, v0, v1, SAME)
    }
    else
    {
        if (v1->y < v2->y)
        {
            if (v0->y < v2->y)
                ORDER (v1, v0, v2, REVERSED)
            else
                ORDER (v1, v2, v0, SAME)
        }
        else
            ORDER (v2, v1, v0, REVERSED)
    }

    // Compute signed area of the triangle.
    // Form vectors for two edges of the triangle.
    dxAC = a->x - c->x;
    dxBC = b->x - c->x;
    dyAC = a->y - c->y;
    dyBC = b->y - c->y;

    // Form the cross product of the two edges.
    area = dxAC * dyBC - dxBC * dyAC;
    if (area == 0.0)
        // Reject zero area triangles.
        return;

    // A negative area just means the order of the vertices, after sorting,
    // was clockwise. Note this may be different from original input order.
    if (area < 0.0)
        // Make positive.
        area = -area;

    // The dx/dy value (change in x for unit change in y) are needed for
    // each edge so the rasterizer can compute the new left and right hand
    // x coordinates as it steps from one scan line to the next. Horizontal
    // or near horizontal edges will have very large gradients but these will
    // be handled later. Values for AC and BC have already been calculated so
    // just do the remaining edge.
    dxAB = a->x - b->x;
    dyAB = a->y - b->y;

    // The dominant edge is always AC (i.e. the edge with the maximum
    // Y extent). Compute the change in rgbaz along this edge for unit
    // change in y.
    oneOverdyAC = 1.0 / dyAC;

    // Differences along edge AC
    drAC = a->r - c->r;
    dgAC = a->g - c->g;
    dbAC = a->b - c->b;
    daAC = a->a - c->a;
    dzAC = a->z - c->z;

    // Gradient along edge AC for each parameter.
    drdxdy = drAC * oneOverdyAC;
    dgdxdy = dgAC * oneOverdyAC;
    dbdxdy = dbAC * oneOverdyAC;
    dadxdy = daAC * oneOverdyAC;
    dzdxdy = dzAC * oneOverdyAC;
    dxdyAC = dxAC * oneOverdyAC;

    // Difference along edge BC
    drBC = b->r - c->r;
    dgBC = b->g - c->g;
    dbBC = b->b - c->b;
    daBC = b->a - c->a;
    dzBC = b->z - c->z;

    // Compute the change in rgbaz when taking unit steps in x.
    oneOverArea = 1.0 / area;
    t1 = dyAC * oneOverArea;
    t2 = dyBC * oneOverArea;
    drdx = drAC * t2 - drBC * t1;
    dgdx = dgAC * t2 - dgBC * t1;
    dbdx = dbAC * t2 - dbBC * t1;
    dadx = daAC * t2 - daBC * t1;
    dzdx = dzAC * t2 - dzBC * t1;

    // A general triangle will need to be split into two trapezoids for
    // rendering. Either of these trapezoids may have a zero height in
    // which case the triangle has a flat top or bottom. The rasterizer
    // and DDAs are still set up, however the count may be zero.

    // Fill lower trapezoid.
    yBottom = a->y;
    yTop = b->y;

    // The y coordinates are converted to integer values, taking into
    // account the OpenGL rules which determine which pixels fall within
    // the boundary.
    temp = FloatToCoordinate (yBottom);  // float to 16.16 fixed point
    temp += 0x00007fff;                  // add in nearly a half
    iyBottom = temp >> 16;               // extract integer part
    temp = FloatToCoordinate (yTop);     // float to 16.16 fixed point
    temp += 0x00007fff;                  // add in nearly a half
    iyTop = temp >> 16;                  // extract integer part
    dyErr = iyBottom + 0.5 - yBottom;

    // Check for the case when AB is a true horizontal edge to prevent a
    // divide by zero.
    if (dyAB == 0.0)
        dyAB = FLT_MIN;                  // set to a very small number.
    dxdyAB = dxAB / dyAB;

    // Move the rgbaz values at vertex a along the edge AC in proportion
    // to how far the vertex a is from the pixel center in the y direction
    // to do the sub pixel adjustment in Y. Permedia4 does the sub pixel
    // adjustment in X automatically, if enabled.
    r0 = a->r + dyErr * drdxdy;
    g0 = a->g + dyErr * dgdxdy;
    b0 = a->b + dyErr * dbdxdy;
    a0 = a->a + dyErr * dadxdy;
    z0 = a->z + dyErr * dzdxdy;

    // Similarly for the start values for the left and right hand edges.
    xDomFixed = FloatToCoordinate (a->x + dyErr * dxdyAC);
    xSubFixed = FloatToCoordinate (a->x + dyErr * dxdyAB);

    // Load up Permedia4 with the parameters.
    // Rasterizer. Note that the RasterizerMode is set to add
    // Permedia4_START_BIAS_ALMOST_HALF to the XDom, XSub and
    // Y Start values to conform to the OpenGL rasterization rules.
    LD_Permedia4_REG(StartXDom, xDomFixed);
    LD_Permedia4_REG(dXDom, FloatToCoordinate (dxdyAC));
    LD_Permedia4_REG(StartXSub, xSubFixed);
    LD_Permedia4_REG(dXSub, FloatToCoordinate (dxdyAB));
    LD_Permedia4_REG(StartY, IntToFixedPoint16 (iyBottom));
    LD_Permedia4_REG(dY, IntToFixedPoint16 (1));
    LD_Permedia4_REG(Count, (iyTop - iyBottom));

    // Color DDA.
    LD_Permedia4_REG(RStart, FloatToColor (r0));
    LD_Permedia4_REG(dRdx, FloatToColor (drdx));
    LD_Permedia4_REG(dRdyDom, FloatToColor (drdxdy));
    LD_Permedia4_REG(GStart, FloatToColor (g0));
    LD_Permedia4_REG(dGdx, FloatToColor (dgdx));
    LD_Permedia4_REG(dGdyDom, FloatToColor (dgdxdy));
    LD_Permedia4_REG(BStart, FloatToColor (b0));
    LD_Permedia4_REG(dBdx, FloatToColor (dbdx));
    LD_Permedia4_REG(dBdyDom, FloatToColor (dbdxdy));
    LD_Permedia4_REG(AStart, FloatToColor (a0));
    LD_Permedia4_REG(dAdx, FloatToColor (dadx));
    LD_Permedia4_REG(dAdyDom, FloatToColor (dadxdy));

    // Depth DDA.
    FloatToDepth (z0, &zi, &zf);
    LD_Permedia4_REG(ZStartU, zi);
    LD_Permedia4_REG(ZStartL, zf);
    FloatToDepth (dzdx, &zi, &zf);
    LD_Permedia4_REG(dZdxU, zi);
    LD_Permedia4_REG(dZdxL, zf);
    FloatToDepth (dzdxdy, &zi, &zf);
    LD_Permedia4_REG(dZdyDomU, zi);
    LD_Permedia4_REG(dZdyDomL, zf);

    // Render the trapezoid ...
    LD_Permedia4_REG(Render, 0x00014041);

    // Fill upper trapezoid.
    yBottom = b->y;
    yTop = c->y;

    // The y coordinates are converted to integer values, taking into
    // account the OpenGL rules which determine which pixels fall within
    // the boundary.
    temp = FloatToCoordinate (yBottom);  // float to 16.16 fixed point
    temp += 0x00007fff;                  // add in nearly a half
    iyBottom = temp >> 16;               // extract integer part
    temp = FloatToCoordinate (yTop);     // float to 16.16 fixed point
    temp += 0x00007fff;                  // add in nearly a half
    iyTop = temp >> 16;                  // extract integer part

    // Find the dyErr value for vertex B so that the start value for x can be
    // corrected.
    dyErr = iyBottom + 0.5 - yBottom;

    // Check for the case when BC is a true horizontal edge to prevent a
    // divide by zero.
    if (dyBC == 0.0)
        dyBC = FLT_MIN;                  // set to a very small number.
    dxdyBC = (dxBC / dyBC);

    // Set up the rasterizer for the upper trapezoid. All other DDA units
    // can carry on with their parameters as they are walking up the same
    // edge.
    xSubFixed = FloatToCoordinate (b->x + dyErr * dxdyBC);
    LD_Permedia4_REG(StartXSub, xSubFixed);
    LD_Permedia4_REG(dXSub, FloatToCoordinate (dxdyBC));
    LD_Permedia4_REG(ContinueNewSub, (iyTop - iyBottom));
}
```

### 14.4 Glossary

**accumulation buffer** A color buffer of higher resolution than the displayed buffer (typically 16 bits per component for an 8 bit per component display). Typically used to sum the result of rendering several frames from slightly different viewpoints to achieve motion blur effects or eliminate aliasing effects.
**active fragment** A fragment which passes all the various culling tests, such as scissor, depth (Z), alpha, etc., and is written to/combined with the corresponding pixel in the framebuffer. See also "fragment" and "passive fragment".

**aliasing** A phenomenon resulting from a rendering style which ignores the fact that a pixel may not be wholly covered by a primitive, leading to jagged edges on primitives.

**alpha buffer** A memory buffer containing the fourth component of a pixel's color in addition to Red, Green and Blue. This component is not displayed, but may be used for instance to control color blending and antialiasing.

**alpha test** A test used to cull selected fragments from being drawn, based on a comparison of a fixed value with the alpha value of the fragment.

**antialiasing** A rendering style which weights the color of a pixel by the fraction of its area that is covered by primitives, leading to reduction or elimination of jagged edges.

**bitblt** Bit aligned block transfer. Copy of a rectangular array of pixels in a bitmap from one location to another.

**block write** A feature provided in some SGRAM devices which allows multiple pixels to be set to a given value by a single write. See also fast fill, which is an alternative name for the same feature.

**command register** A register which when loaded triggers activity in Permedia4. For instance the **Render** command register when loaded will cause Permedia4 to start rendering the specified primitive with the parameters currently set up in the control registers.

**context** The state information associated with a particular task. Typically in a system more than one task will be using Permedia4 to render primitives. Software on the host must save away the current contents of the Permedia4 control registers when suspending one task to allow another to run, and must restore the state when that task is next scheduled to run.

**control register** A register which contains state that dictates how Permedia4 will execute a command.

**culling** The process of eliminating a fragment, object face, or primitive, so that it is not drawn.

**DDA** Digital Differential Analyzer. An algorithm for determining the pixels to draw along a line or polygon edge. Also used to interpolate linearly varying values such as color and depth.

**depth (Z) buffer** A memory buffer containing the depth component of a pixel. Used, for example, to eliminate hidden surfaces.

**depth-cueing** A technique which determines the color of a pixel based on its depth. Used, for instance, to fade far away objects into the background. See also fogging.

**dithering** A rendering style which increases the perceived range of displayed colors at the cost of spatial resolution. The technique is similar to the use of stippled patterns of black and white pixels to achieve shades of grey on a black and white display.

**double-buffering** A technique for achieving smooth animation, by rendering only to an undisplayed back buffer, and then swapping the back buffer to the front once drawing is complete.

**fast fill** A feature provided in SGRAM devices which allows multiple pixels to be set to a given value by a single write. See also block write, which is an alternative name for the same feature.

**fogging** A technique which determines the color of a pixel based on its depth. Used, for instance, to fade far away objects into the background. See also depth-cueing.

**Fast Clear Planes** Used to allow higher animation rates by enabling localbuffer pixel data, such as depth (Z), to be cleared down. Not required or supported in Permedia4.

**fragment** A fragment is an object generated as a result of the rasterization of a primitive. It corresponds to and contains all the components of a single pixel. If a fragment passes all the various culling tests, such as scissor, depth (Z), alpha, etc., it will be written to/combined with the corresponding pixel in the framebuffer.

**framebuffer** An area of memory containing the displayable color buffers (front, back, left, right, overlay, underlay), their (optional) associated alpha components, and any associated (optional) window control information. This memory is typically separate from the localbuffer.

**Graphic ID (GID)** A component of a pixel containing information used for per pixel clipping.

**host** The processor which controls Permedia4.

**localbuffer** An area of memory which may be used to store the following non-displayable pixel information: depth (Z), stencil, Graphic ID.

**passive fragment** A fragment which fails one or more of the various culling tests, such as scissor, depth (Z), alpha, etc., and is not written to/combined with the corresponding pixel in the framebuffer. See also "fragment" and "active fragment".

**pixel** Picture element. A pixel comprises the bits in all the buffers (whether stored in the localbuffer or framebuffer), corresponding to a particular location in the framebuffer.

**primitive** A geometric object to be rendered. The Permedia4 primitives are points, lines, trapezoids (including triangles as a subset), and bitmaps.

**rasterization** The act of converting a point, line, polygon, or bitmap, in device coordinates, into fragments.

**rendering** Conversion of primitives in object coordinates into an image.

**scissor test** A means of culling fragments which lie outside the defined scissor rectangle. The scissor rectangle is defined in device coordinates.

**stencil buffer** A buffer used to store information about a pixel which controls how subsequent stenciled fragments at the same location may be combined with its current value. Typically used to mask complex two-dimensional shapes.

**stipple** A one or two dimensional binary pattern which is used to cull fragments from being drawn.

**task** A process, or thread on the host, which uses the Permedia4 coprocessor. Typically tasks assume that they have sole use of Permedia4 and rely on a device driver to save and restore their Permedia4 context when they are swapped out.

**texel** Texture element. An element of an image stored in texture memory which represents the color of the texture to be applied (fully or in part) to a corresponding fragment.

**texture** An image used to modify the color of fragments during processing. Often used, for instance, to achieve high realism in a scene with relatively few primitives.

**texture mapping** The process of applying a two dimensional image to a primitive, for instance to apply a wood grain effect to a table.

**window control buffer** A buffer containing control bits used by display hardware to select between multiple hardware LUTs or display buffers (such as overlay and underlay) on a per pixel basis. Usually a given value in the buffer corresponds to a single window on the screen.

**writemask** A bit pattern used to enable or inhibit the writing of the corresponding bits of a fragment's color into the framebuffer.
Calculating the Slope, 1-6 Software Writemask Example, 10-1 Software Writemasks, 10-1 Span Mask Processing, 2-18 Span Operations, 2-15 Span Operations and Bitmaps, 2-22 Span Operations and Image Copy/Upload/Download, 2-27 Span Operations and Stippling, 3-5 Span Operations and the Scissor Unit, 3-3 SStart, 5-2, 5-5 Standard Framebuffer Read Operation, 7-1 StartXDom, 1-7, 2-9, 2-25, 2-32, 14-2 StartXSub, 1-7, 2-9, 2-25, 2-32 StartY, 2-2, 2-9, 2-25, 2-32 Statistic Operations, 11-2 StatisticMode, 11-2, 11-4, 11-5 Stencil, 4-6, 4-11 stencil buffer, 20 Stencil Example, 4-11 Stencil Test, 1-1 Stencil Test, 4-8 **StencilData**, 4-8, 4-10 StencilMode, 4-8, 4-9, 4-10 stipple, 20 Stipple, 3-3 Stipple Test, 1-1 Sub Pixel Precision and Correction, 2-19 Subordinate, 1-4 Subpixel Correction, 1-4 Sync, 11-3, 11-6, 12-5 Sync Interrupt Example, 11-6 Synchronization, 11-3 System Initialization, 12-2 TexelLUT, 5-15, 5-16 TexelLUTAddress, 5-16 TexelLUTData, 5-16 TexelLUTIndex, 5-16 TexelLUTTransfer, 5-16 texture, 20 Texture, 1-1, 5-1, 5-2 Texture Filtering, 5-16 texture mapping, 1-3, 5-1, 20 TextureAddressMode, 2-18, 5-3 TextureBaseAddr, 5-3 TextureChromaLower, 5-17 TextureChromaUpper, 5-17 TextureColor Generation, 5-19 TextureEnvColor, 5-28 *TextureFilterMode*, 5-2, 5-16, **5-17** **TextureReadMode**, **2-18**, 5-3, **5-4**, **5-5**, **5-8**, 5-16 Trapezoids, 2-2 TStart, 5-2, 5-5 **UpdateLineStippleCounters**, 3-5, 3-6 UseConstantFBWriteData, 9-7 User Scissor Test, 3-1 Video Timing, 12-3 WaitForCompletion, 2-31, 4-2 Window, 4-6, 4-11, 4-13 Window Address Setting, 12-6 window control. 
20 Window Initialization, 12-6 WindowOrigin, 3-1, 3-3 Write Masks, 10-1 writemask, 20 Writemasks, 12-7 XOR Example, 9-9 Y Limits Clipping, 2-30 ZStartL, 4-15 ZStartU, 4-15 #### 15.2 Volume II Index Alpha Blend, 1-3, 6-7, 8-2, 8-4 Alpha Blend Example, 8-11 Alpha Blend Unit, 8-1 ContinueNewLine, 2-31 Alpha Blending, 1-3, 8-2, 8-4 ContinueNewSub, 1-8, 2-4, 2-31 alpha buffer, 6-7, 8-3, **9-5**, 18 control register, 18 Count, 2-32 Alpha test, 6-9 dAdx, 3-12 Alpha Test, 1-3, 6-9 Alpha Test, 6-9 dAdyDom, 3-12 **AlphaBlendMode**, **8-3**, **8-4**, 8-5, 8-8 dBdx, 3-12 AlphaTestMode, 6-10 dBdyDom, 3-12, 3-13 Antialias Application, 1-2, 6-7 DDA, 3-12, 3-13 delta, 2-2, 2-31, 14-3, 14-4 Antialias Example, 6-9 antialiasing, 18 Depth, 1-4, 4-6, 4-15 Antialiasing, 2-4, 6-7, 6-8 depth (Z) buffer, 19 **AntialiasMode**, **2-6**, 6-8, 6-9 Depth Example, 4-15 Application Initialization, 12-8 Depth Gradient, 3-10, 14-4 area stippling, 3-5 Depth Test, 1-1 Area Stippling, 2-18, 3-3 Depth Test, 4-11 AreaStippleMode, 3-4, 3-6, 3-7 depth-cueing, 19 AreaStipplePattern, 3-7 DepthMode, 4-12, 4-13, 4-15 AStart, 3-12 dFdx, 6-2 Bitmaps, 2-20 dFdyDom, 6-2 BitMaskPattern, 2-20, 2-21, 2-29, 2-32 dGdx, 3-12, 5-5, 5-29 block write, 18 dGdyDom, 3-12, 3-13, 5-5, 5-29 BorderColor, 5-11 Disabling Specialized Modes, 12-6 BStart, 3-12, 3-13, 5-5, 5-28, 5-29 Disabling units not in use, 13-3 chroma, 8-9 Dither Example, 9-6 **ChromaLower**, **5-17**, 8-9 dithering, 19 ChromaTestMode, 8-9 Dithering, 9-4 **ChromaUpper**, **5-17**, 8-9 DitherMode, 9-4, 9-5, 9-6 CI Fogging Equation, 6-4 dKdBdx, 5-29 Color DDA, 1-1, 3-9 dKdBdyDom, 5-29 Color Format, 12-6 dKddx, 5-28 Color Format Example dKddyDom, 5-28 3:3:2, 9-6 dKdGdx, 5-29 8:8:8:8, 9-6 dKdGdvDom, 5-29 Color Format Unit, 9-1 dKdRdx, 5-29 Color Formatting, 1-3 dKdRdyDom, 5-29 Color Index Format Example, 9-7 dKsBdx, 5-28 Color Interpolation, 14-3 dKsBdyDom, 5-28 ColorDDAMode, 3-12, 3-13 dKsdx, 5-28 command register, 18 dKsdvDom, 5-28 ConstantColor, 3-12 dKsGdx, 5-28 
context, 18 dKsGdyDom, 5-28 Continue, 2-30, 2-31 dKsRdx. 5-28 ContinueNewDom, 2-4, 2-30 dKsRdyDom, 5-28 **DMA** FogMode, 6-4, 6-5 Using the Bus Mastership, 13-3 framebuffer, 19 Dominant, 1-4 Framebuffer, 12-4 dQdx, 5-2, 5-5 Bypass, 13-4 dQdy, 5-2, 5-3, 5-5 Framebuffer, 7-1 Framebuffer Depth, 12-3 dQdyDom, 5-5 dRdx, 3-12, 14-3 Framebuffer Read Span Operations, 7-2 dRdyDom, 3-12, 3-13 Gouraud shading, 3-13 dSdx, 5-2, **5-5** Gouraud Shading, 3-10 Gouraud Shading examples, 3-12 dSdy, 5-2, 5-3, 5-5 dSdyDom, 5-5 Graphic ID, 19 dTdx, 5-2, 5-5 Graphics HyperPipeline, 1-1 Graphics Programming, 1-1 dTdy, 5-2, 5-3, 5-5 GStart, 3-12, 3-13, 5-5, 5-28, 5-29 dXDom, 2-32 dXSub, 2-32 Hardware Writemask Example, 10-2 dY, 2-9, 2-25, 2-32, 13-4 Hardware Writemasks, 10-1 dZdxL, 4-15 High Speed Flat Shaded Rendering, 9-7 dZdxU, 4-15 Host, 11-1, 12-5 dZdyDomL, 4-15 Host Out, 1-3 dZdyDomU, 4-15 HyperPipeline, 1-1 Image Copy/Upload/Download, 2-25 Enabling Writing, 12-7 Image Formatting, 8-4 Examples, 3-7 Improving PCI bus bandwidth for Programmed extent checking, 11-2 Extent Checking, 11-2 I/O and DMA, 13-2 Fast double buffering in a window, 13-2 Initialization, 1-3, 12-1 fast fill, 19 Initializing GLINT, 12-1 FBColor, 7-1, 11-1 Interpolation FBData, 8-5 Calculating Colorvalues, 14-3 FBDestReadMode, 7-2, 9-8, 12-7 KdBStart. 
5-29 FBHardwareWriteMask, 10-1 KdGStart, 5-29 FBSoftwareWriteMask, 10-1 KdRStart, 5-28 FBSourceReadMode, 7-1 KdStart, 5-28 FBWriteData, 9-7 KsBStart, 5-28 FBWriteMode, 8-5 KsGStart, 5-28 Filter Mode Example, 11-1 KsRStart, 5-28 Filtering, 11-1 KsStart, 5-28 FilterMode, 11-1, 11-2, 11-3, 11-4 LBDestReadMode, 4-2 flat shaded, 3-12 LBReadFormat, 4-2, 4-4, 12-4 Flat Shading example, 3-12 LBWriteFormat, 4-2, 4-4, 4-5, 12-4 FlushSpan, 2-4, 2-31 LBWriteMode, 4-4 Fog, 1-1, **6-1** Level of Detail calculation, 5-3 Fog Example, 6-5 Line Stippling, 3-4 Fog Index Calculation - The Fog DDA, 6-1 LineStippleMode, 3-5, 3-7 Loading registers in unit order, 13-4 fogging, 19 | LoadLineStippleCounters, 3-7 | Rasterizer Mode, 1-4, 2-29 | | | |-----------------------------------------------------|-------------------------------------------------------------|--|--| | localbuffer, 19 | Rasterizer Unit Registers, 2-30 | | | | Localbuffer, 12-4 | RasterizerMode, 2-21, 2-29, 2-31, 2-32 | | | | Bypass, 13-4 | Register Updates | | | | LOD, 5-3, <b>5-5</b> | Avoiding Unnecessary, 13-4 | | | | Logical Op, 9-7 | <b>Render</b> , <b>1-5</b> , 2-28, 2-30 | | | | Logical Op and Software Writemask Example, 9- | ResetPickResult, 11-2, 11-5 | | | | 10 | RGBA and Color-Index(CI) Modes, 3-10 | | | | Logical Operations, 9-8 | RGBA Fogging Equation, 6-3 | | | | LogicalOpMode, 9-7, 9-8 | Router, 1-3 | | | | MaxHitRegion, 11-1, 11-3, <b>11-5</b> | RouterMode, 1-3 | | | | <b>MaxRegion</b> , <b>11-2</b> , 11-3, <b>11-5</b> | <b>RStart</b> , <b>3-11</b> , 3-12, 3-13, <b>5-5</b> , 5-28 | | | | Memory Configuration, 12-2 | SaveLineStippleCounters, 3-5, 3-7 | | | | Merge-copy Span Operations, 7-2 | SaveStippleLineCounters, 3-5 | | | | MinHitRegion, 11-1, 11-3, <b>11-5</b> | ScanLineOwnership, 2-32 | | | | MinRegion, 11-2, 11-3, 11-5 | Scissor, 3-1 | | | | Miscellaneous Generic Graphics Tips, 13-5 | Scissor Example, 3-3 | | | | OpenGL Application Modes, 5-20 | scissor test, 11-2, 11-3 | | | | origin | Scissor Test, 
1-1 | | | | window, 12-6 | ScissorMaxXY, 3-3 | | | | Origin | ScissorMinXY, 3-3 | | | | Setting, 12-6 | ScissorMode, 3-2 | | | | patch, 5-9 | Screen Clipping Region, 12-4 | | | | Patch, 4-1 | Screen Scissor Tests, 3-1 | | | | PCI burst transfers under Programmed I/O, 13-2 | Screen Width, 12-4 | | | | PCI bus, 12-2 | ScreenSize, 3-1, 3-3 | | | | PCI Disconnect Under Programmed I/O, 13-3 | SGRAM Block Writes, 13-1 | | | | Performance Tips, 13-1 | Sides | | | | Perspective Correction, 5-3 | Calculating the Slope, 1-6 | | | | picking, 11-2 | Software Writemask Example, 10-1 | | | | Picking Example, 11-5 | Software Writemasks, 10-1 | | | | PickResult, 11-1, <b>11-2</b> , <b>11-5</b> | Span Mask Processing, 2-18 | | | | Pixel Ownership, 1-1 | Span Operations, 2-15 | | | | Pixel Ownership Test, 4-6 | Span Operations and Bitmaps, 2-22 | | | | Pixel Sizes, 2-19 | Span Operations and Image | | | | <b>PixelSize</b> , <b>2-19</b> , <b>2-31</b> , 12-3 | Copy/Upload/Download, 2-27 | | | | PointTable, 2-32 | Span Operations and Stippling, 3-5 | | | | PointTable0, 2-32 | Span Operations and the Scissor Unit, 3-3 | | | | primitive, 20 | SStart, 5-2, <b>5-5</b> | | | | pseudocode, 14-1 | Standard Framebuffer Read Operation, 7-1 | | | | QStart, 5-2, <b>5-5</b> | StartXDom, 1-7, 2-9, 2-25, 2-32, 14-2 | | | | Rapid clear of the localbuffer & framebuffer, 13-3 | StartXSub, 1-7, <b>2-9</b> , 2-25, <b>2-32</b> | | | | Rasterization, 1-8 | StartY, 2-2, <b>2-9</b> , 2-25, <b>2-32</b> | | | | Rasterizer, 1-1, 2-1 | Statistic Operations, 11-2 | | | StatisticMode, 11-2, 11-4, 11-5 Stencil, 4-6, 4-11 stencil buffer, 20 Stencil Example, 4-11 Stencil Test, 1-1 Stencil Test, 4-8 StencilData, 4-8, 4-10 StencilMode, 4-8, 4-9, 4-10 stipple, 20 Stipple, 3-3 Stipple Test, 1-1 Sub Pixel Precision and Correction, 2-19 Subordinate, 1-4 Subpixel Correction, 1-4 Sync, 11-3, 11-6, 12-5 Sync Interrupt Example, 11-6 Synchronization, 11-3 System Initialization, 12-2 TexelLUT, 5-15, 5-16 TexelLUTAddress, 5-16 
TexelLUTData, 5-16 TexelLUTIndex, 5-16 TexelLUTTransfer, 5-16 texture, 20 Texture, 1-1, 5-1, 5-2 Texture Filtering, 5-16 texture mapping, 1-3, 5-1, 20 TextureAddressMode, 2-18, 5-3 TextureBaseAddr, 5-3 TextureChromaLower, 5-17 TextureChromaUpper, 5-17 TextureColor Generation, 5-19 TextureEnvColor, 5-28 *TextureFilterMode*, 5-2, 5-16, **5-17** TextureReadMode, 2-18, 5-3, 5-4, 5-5, 5-8, 5-16 Trapezoids, 2-2 TStart, 5-2, 5-5 UpdateLineStippleCounters, 3-5, 3-6 UseConstantFBWriteData, 9-7 User Scissor Test, 3-1 Video Timing, 12-3 WaitForCompletion, 2-31, 4-2 Window, 4-6, 4-11, 4-13 Window Address Setting, 12-6 window control, 20 Window Initialization, 12-6 WindowOrigin, 3-1, 3-3 Write Masks, 10-1 writemask, 20 Writemasks, 12-7 XOR Example, 9-9 Y Limits Clipping, 2-30 ZStartL, 4-15 ZStartU, 4-15