# **Entry-Level Solutions**

| | 000 | 020 | 030 |
|--------------------------|--------------|------------|------------|
| Resources | IU | IU | IU, MMU |
| I Cache (bytes) | - | 256 | 256 |
| D Cache (bytes) | - | - | 256 |
| Thru-Hole Package | - | 100 PPGA | 128 PPGA |
| SMT Package | 68 QFP | 100 PQFP | 132 CQFP |
| Available f (MHz) | 8 to 16 | 16, 25 | 25, 40 |
| Typical Power (f max) | ~0.2 W | ~0.6 W | ~1 W |
| Process Technology | UDR1 | IDR/UDR1 | IDR |
| Production Status | MC Now | MC Now | MC Now |
| Static Core Availability | Now | Now | 1Q95 |
| Performance (MIPS) | 2.7 | 9.8 | 14.3 |

### **68000 Architecture**

[Figure: 68000 register set: data registers D0-D7, address registers A0-A7, PC, A7', and the status register (SSR and CCR)]

- 32-Bit Instruction Set Architecture
- Robust & General Purpose
- Broadly Extensible & Scalable
- Compact, Dense Code
- Low Memory Bandwidth Needs
- Efficient High-Level Language Support
- More than 25 Major Products, well over 100 Variations
- 1 to 100 MIPS In Production Now
- Entry Pricing From < \$3
- More than 100 Million Units Installed World-Wide
- More than 3 Million Units / Month Growth
- Industry-Wide Support Infrastructure

# **68040 1st Generation Family**

| | 040 | LC040 | EC040 |
|--------------------------|--------------|------------|------------|
| Resources | IU, FPU, MMU | IU, MMU | IU |
| I Cache (bytes) | 4k | 4k | 4k |
| D Cache (bytes) | 4k | 4k | 4k |
| Thru-Hole Package | 179 PGA | 179 PGA | 179 PGA |
| SMT Package | 184 CQFP | 184 CQFP | 184 CQFP |
| Available f (MHz) | 25, 33, 40 | 25, 33, 40 | 25, 33, 40 |
| Typical Power (f max) | ~8 W | ~6 W | ~5 W |
| Process Technology | UDR1 | UDR1 | UDR1 |
| Production Status | XC Now | XC Now | XC Now |
| Performance (MIPS) | 43.8 | 43.8 | 43.8 |

## 68040 Block Diagram

### **040 Features**

- Single cycle instruction execution
- Optimized for Branches Taken (taken/not\_taken = 2/3 cycles respectively)
- New instruction: MOVE16 (ideal for block moves)
- 2 clock synchronous bus cycles
- Dual 4K byte, 4-way set associative instruction and data caches
- Full Internal Harvard architecture
- 2-1-1-1 read and write bursting
- Copy-back Data Cache increases performance & reduces bus bandwidth
- Separate 64-entry, 4-way set associative, instruction and data MMUs
- Multimaster/Multiprocessor support via bus snooping
- User code software compatible with M680x0 family
- Full 040s compatible with M68881/2 floating point coprocessors using 040FPSP
  - Publicly available on AESOP BBS

# **040 Immunity to Wait States**

### 040 2nd Generation

| | 68040V |
|--------------------------|---------------------------------|
| Resources | IU, MMU (Equivalent to 68LC040) |
| I Cache (bytes) | 4k |
| D Cache (bytes) | 4k |
| Thru-Hole Package | 183 PGA |
| SMT Package | 184 CQFP |
| Available f (MHz) | 25, 33, 40 |
| Typical Power (f max) | ~1.5 W |
| Process Technology | UDR2 |
| Production Status | General Sampling |
| Performance (MIPS) | 44 @ 40MHz |

### 68040V Overview

#### **▼Features**

- Fully 68LC040 compatible
- Low power: typically < 1.5 W
- Dual 4Kbyte caches
- Dual 64 entry ATCs
- 0 to 40MHz operation
  - Single Clock Input
- Bus compatible with 060 providing easy migration
- New Signals
  - LFO: Low frequency operation
  - LOC: Loss of BCLK
  - SCD: System clock disable

#### **▼Power Management**

- New LPSTOP instruction
- 3.3 V static design
  - 75% Pwr Reduction
  - I/O to 3V or 5V logic

#### **▼Performance**

- 44 Dhrystone MIPS @ 40MHz

#### **▼Technology**

- 0.5µm TLM CMOS

### **68040V Features**

- 3.3-Volt Supply
  - Over 75 Percent Reduction in Power from 5-Volt Device
  - Input and Output Buffers Interface Directly to Mixed-Voltage Designs
- Consistent 040 Programming Model
  - Data Types, Instruction Set, Cache, and MMUs Same as LC/EC040
- Variable Operating Frequency 0 to 40 MHz
  - Single Clock Input (2x clock generated internally)
- Bus Compatible with the 060 Providing Easy Migration Path

## Signals new to 68040V

- Low Frequency Operation (LFO): Allows Bus Clock Frequency to Instantaneously Change to an Operating Frequency Between 0-16 MHz
- Loss of Clock (LOC): Detects Loss of Bus Clock or Phase Lock Loop Error
- System Clock Disable (SCD\*): Indicates Bus Clock Input can be Disabled

# MC68060 Family

| | 060 | LC060 | EC060 |
|--------------------------|--------------|------------|------------|
| Resources | IU, FPU, MMU | IU, MMU | IU |
| I Cache (bytes) | 8k | 8k | 8k |
| D Cache (bytes) | 8k | 8k | 8k |
| Thru-Hole Package | 223 PGA | 223 PGA | 223 PGA |
| SMT Package | 240 CQFP | 240 CQFP | 240 CQFP |
| Available Speeds (MHz) | 50, 66 | 50, 66 | 50, 66 |
| Typical Power (f max) | 3.5 W | 3.0 W | 3.0 W |
| Process Technology | UDR2 | UDR2 | UDR2 |
| Production Status | XC Now | XC Now | XC Now |
| Performance (MIPS) | 100 | 100 | 100 |

### 68060 Overview

#### **▼Features**

- RISC-hybrid superscalar 680x0 compatible microprocessor
- Pipelined extended precision FPU
- Dual 8Kbyte caches
- Dual 64 entry ATCs
- 256 entry branch cache
- 68040 compatible bus

#### **▼Power Management**

- LPSTOP instruction
- Dynamic internal clocking
- 3.3 V static design
  - Clock can be slowed/stopped
  - I/O to 3V or 5V logic

#### **▼Performance**

- 100 MIPS @ 66MHz
- 65 SpecInts @ 66MHz

#### **▼Technology**

- 0.5µm TLM CMOS

## **Embedded System Challenges**

- Code usually shipped in ROM / EPROM / FLASH
  - Every Bit costs \$
- Memory Bandwidth is limited
  - Additional Memory Speed is Expensive
  - Additional Memory Width is Expensive (traces, components, granularity, etc.)
- High Performance is Key but must be tied to cost goals
- System Power Consumption Critical
- Need migration path up to higher performance
- Need migration path down to lower costs

## **RISC Hybrid Architecture**

## **060 Superscalar Pipeline**

- 2 instructions per clock, plus branch
- 4-stage prefetch pipeline
- Dual, 4-stage integer execute pipelines
  - Pipes operate completely synchronously
  - Strict in-order execution within & between the two execute pipes
  - Reads and writes always obey program order
  - 50-60% of instructions executed as pairs
  - Separate address generation and execute engines in each pipe
  - Some instructions dynamically relocated to address generation stage
- Complete exception restart operation
- The branch cache indexes branch PC addresses with targets
  - 256 entry, 4-way set associative array mapped using virtual addresses
  - State bits track branch history
  - Allows zero-clock branches for correct predictions of taken branches
  - Different architectural trade-off than on 040

### **060 Cache Architecture**

- Separate 8KB instruction and 8KB data cache
  - Each can be frozen to prevent allocation over time-critical code or data
  - Can freeze 1/2 cache allowing rest to operate as 4K 2-way set associative
- Operand data cache is 4-way banked to allow simultaneous read and write access each clock
  - Supports 1-clk aligned double-precision operand accesses from the FPU
- Snoop invalidate is supported
- On-chip 4-deep write buffer for performance

### **060 MMU Architecture**

- Separate instruction & data TLB in a 64-entry, 4-way set associative MMU
  - Each TLB can be separately frozen to prevent allocation over key PTEs
  - Each can be 1/2 frozen to allow other 32 entries to be dynamically updated
- Default TTR capability provided for translation disabled and TTR miss
- Compatible with existing 040 page tables
- Dedicated hardware tablewalker
  - Hardware tablewalker doesn't access the data cache
- MMUSR functionality replaced with new PLPA instruction
  - Provided to load a physical address
  - Translates directly with TLB / TTR hit
  - Tablewalks for miss, but doesn't update the TLB
- Imprecise write mode for write buffer use in writethrough and non-cacheable pages to maximize bus utilization

### 060 Bus: 040 Compatible

- 223-pin PGA supersocket and 240-pin CQFP
- 5 Volt I/O Compatible
  - 16mA Output Drive Capability
  - Allows new high-performance designs and retrofit of existing designs.
- Write data can be held for 1 clock after TA for compatibility with existing ASICs.
- Single external clock
  - No internal PLL or VCO
  - Relaxed duty cycle requirements
- CLKEN allows external system logic to control which CLK edges are used
  - Allows operation at 1/2 and 1/4 speed with existing 040 ASICs
- New signals designed for high-speed operation with new system designs
  - TRA signal for separate retry encoding
  - CLA allows simplified DRAM operation with bursts
  - BTT signal used for full-speed bus handovers (BB still available too)
  - BS[3-0] internal decode of A0:1 & SIZ0:1
- Internal ignore state counters on transfer termination ease high-speed design

## **Power Management**

- Clocking to Functional Units is dynamically controlled.
  - The cache arrays only draw power when an access is made.
  - Execute units power down when not needed.
- External Clock can be reduced or stopped to optimize power vs performance.
- The LPSTOP instruction has been implemented:
  - Privileged instruction
  - Same opcode and behavior as on 040V.
  - LPSTOP disconnects most of the chip from the CLK pin for lowest power.
- 3.3 Volt design with TTL compatible I/O

## **Key Differences from 040**

- Improved bus interface for high performance system design
- Stack-frame differences
  - These require some OS changes for fault handling/recovery
- There is no MSP (SR does implement the M-bit)
- Trace on change of flow is not supported in hardware
- PMMU differences
  - Hardware tablewalker doesn't access the data cache
  - OS control of the MMU has been changed: there is no PTEST or MMUSR
  - Default TTR allows blanket cacheability, etc.
- Some low-use instructions relegated to software emulation
- On-chip branch cache
  - Virtual address mapping: must be cleared on context switches
- New PCR Register (Reads Processor ID, Writes FPU & SuperScalar Enable)
- Cache snooping: only snoop invalidate supported
- LPSTOP instruction added for power-down capability

## **Key Differences from 040**

Some complex/low-use instructions are unimplemented in hardware, with emulation trap support for full binary compatibility:

| Mnemonic | Description |
|------------------|---------------------------------|
| CAS | Compare and Swap |
| CAS2 | Compare and Swap Dual |
| CHK2 | Check Register Against Bound |
| CMP2 | Compare Register Against Bound |
| DIVS.L | Signed Divide |
| DIVU.L | Unsigned Divide |
| FDBcc | FP Decrement and Branch |
| FINT | FP Integer Part |
| FINTRZ | FP Integer Part, Round-to-Zero |
| FMOVEM | FP Move Multiple Data Registers |
| FScc | FP Set According to Condition |
| FTRAPcc | FP Trap on Condition |
| LPSTOP | Low Power Stop |
| MOVEC | Move Control Registers |
| MOVEP | Move Peripheral |
| MULS.L | Signed Long Multiply |
| MULU.L | Unsigned Long Multiply |
| PTEST | Test a Logical Address |
| PLPA | Load Physical Address |
| FSAVE / FRESTORE | FP state transfer |

Entries from the table's third column ("Emulation"), whose row alignment is not recoverable: Emulation; Emulation for 64/32; Emulation for 64/32; Emulation; Implemented in hardware; Implemented in hardware.

## **68060 Software Package**

- Production release publicly available on AESOP
- All code is re-entrant and relocatable
  - Temporary working space on the stack
- Distributed as Pseudo-assembly code
- Five modules available
  - Integer exception handler
  - Integer instruction library (Can be compiled into applications to prevent exception overhead)
  - Full FP exception handler
  - Partial FP exception handler
  - FP library (Can be compiled into applications to prevent exception overhead)
- Key Difference from 040FPSP is stack frames; algorithms are the same
- The object code is < 64K bytes for the kernel release
- Handles user and supervisor exception cases
- RIS methodology used to validate the implementation on the 060 Model.
  - 250K clocks simulated using VCX
  - 4B clocks emulated on PiE HW
- PTEST emulation source code available separately

## **Byte Benchmarks**

### 060 Relative to 040

#### Notes:

- Run on a Motorola IDP board with DRAM.
- Diab 3.2b compiler.
- 040 running at 25MHz.
- 060 running at 50MHz on a 25MHz bus.

## **68060 Integer Performance**

Note: assumes 68060 running at 2x 68040 BCLK
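
The performance and power figures quoted in the family tables can be put on a common footing by dividing MIPS by clock rate and by typical power. The sketch below is only an illustration of that arithmetic, not part of the original deck. The `Part` class and `report` helper are hypothetical names, and the 40 MHz reference clock for the 68040 is an assumption (the deck states the clock explicitly only for the 68040V and the 68060).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One row of figures copied from the family tables in this deck."""
    name: str
    mips: float   # quoted performance
    mhz: float    # clock at which that figure is taken to apply
    watts: float  # typical power at f max

    @property
    def mips_per_mhz(self) -> float:
        return self.mips / self.mhz

    @property
    def mips_per_watt(self) -> float:
        return self.mips / self.watts


PARTS = [
    # Assumption: the 68040's 43.8 MIPS is quoted at its top speed (40 MHz).
    Part("68040", 43.8, 40.0, 8.0),
    # Stated in the deck: 44 MIPS @ 40 MHz, ~1.5 W.
    Part("68040V", 44.0, 40.0, 1.5),
    # Stated in the deck: 100 MIPS @ 66 MHz, 3.5 W.
    Part("68060", 100.0, 66.0, 3.5),
]


def report(parts):
    """Return one formatted comparison line per part."""
    return [
        f"{p.name:<7} {p.mips_per_mhz:5.2f} MIPS/MHz {p.mips_per_watt:6.1f} MIPS/W"
        for p in parts
    ]


if __name__ == "__main__":
    for line in report(PARTS):
        print(line)
```

Under these assumptions the 68060 works out to about 1.5 MIPS/MHz against about 1.1 for the 68040, and the two 3.3 V parts land near 29 MIPS/W against roughly 5.5 for the 5 V 68040.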