Sunday, July 21, 2019
Pentium Memory Management Unit Computer Science Essay
The main aim of this research paper is to analyze the Pentium memory management unit (MMU). It discusses the key features associated with a memory management unit, such as segmentation, paging and their protection, the cache associated with the MMU in the form of the translation lookaside buffer (TLB), and how microprocessor performance can be optimized once those features are implemented. Some problems related to the Pentium memory management unit, and their respective solutions, are also covered, as is current and future research in the field of memory management. The main challenge is to become familiar with the Pentium memory management unit and to analyze the crucial factors related to it.
Introduction
A memory management unit (MMU), also termed a paged memory management unit (PMMU), is a hardware component responsible for handling the memory accesses requested by the CPU. The main functions of an MMU can be categorized as follows [1]: translation of virtual addresses to physical addresses, also known as virtual memory management (VMM); memory protection; cache control; bus arbitration; and bank switching.
The memory system of the Pentium microprocessor is 4G bytes in size, just as in the 80386DX and 80486 microprocessors. The Pentium uses a 64-bit data bus to address memory organized in eight banks, each containing 512M bytes of data. Like most microprocessors, the Pentium supports the virtual memory concept with the help of its memory management unit. Virtual memory is used to manage the resource of physical memory. It gives an application the illusion of a very large amount of memory, typically much larger than what is actually available, and it supports the execution of processes that are only partially resident in memory. Only the most recently used portions of a process's address space actually occupy physical memory; the rest of the address space is stored on disk until needed.
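The idea that only the most recently used pages stay resident while the rest wait on disk can be illustrated with a toy model. This is a hedged sketch, not the Pentium's mechanism: the class name `DemandPagedMemory` is hypothetical, it assumes 4 KB pages, and it uses exact least-recently-used (LRU) eviction, whereas real operating systems usually approximate LRU with reference bits (as described in section 3.1.4). In a real system the eviction and disk reads are done by the operating system, not by the MMU.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumption: 4 KB pages, the Pentium's standard page size


class DemandPagedMemory:
    """Toy virtual memory (hypothetical model): every virtual page lives on
    'disk', but only num_frames pages can be resident in RAM at once."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # virtual page number -> frame, least recent first
        self.page_faults = 0

    def access(self, vaddr):
        """Return the physical address for vaddr, faulting the page in if needed."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.resident:
            self.resident.move_to_end(vpn)  # hit: mark as most recently used
        else:
            self.page_faults += 1  # miss: the OS would read the page from disk here
            if len(self.resident) < self.num_frames:
                frame = len(self.resident)  # a free frame is still available
            else:
                _, frame = self.resident.popitem(last=False)  # evict the LRU page
            self.resident[vpn] = frame
        return self.resident[vpn] * PAGE_SIZE + offset


# A process touching five distinct pages runs in only three frames.
mem = DemandPagedMemory(num_frames=3)
for page in [0, 1, 2, 0, 3, 0, 4, 1]:
    mem.access(page * PAGE_SIZE + 5)
print(mem.page_faults, list(mem.resident))  # 6 [0, 4, 1]
```

The process addresses five pages but never needs more than three frames; each reuse of a resident page is a hit, and only first touches and reloads of evicted pages cost a (simulated) disk access.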
The Intel Pentium microprocessor supports both segmentation and segmentation with paging. Another important feature supported by Pentium processors is memory protection. This mechanism limits access to certain segments or pages based on privilege levels, and thus protects critical data kept at the most privileged level from attacks by less privileged code. Intel's Pentium processor also provides caches, translation lookaside buffers (TLBs), and a store buffer for temporary on-chip (and external) storage of instructions and data. Another major issue resolved by the MMU is memory fragmentation: because of fragmentation, the largest contiguous block of free memory can be much smaller than the total free memory. With virtual memory, a contiguous range of virtual addresses can be mapped to several non-contiguous blocks of physical memory [1].
This research paper revolves around the different functions associated with the memory management unit of Pentium processors, including virtual memory management, memory protection and cache control. The Pentium's memory management unit has both problems and benefits, which are covered in detail later. These features help solve major performance issues and have given a boost to the microprocessor world.
History
In some early microprocessor designs, memory management was performed by a separate integrated circuit, such as the VLSI VI475, the Motorola 68851 used with the Motorola 68020 CPU in the Macintosh II, or the Z8015 used with the Zilog Z8000 family of processors. Later microprocessors such as the Motorola 68030 and the Zilog Z280 placed the MMU together with the CPU on the same integrated circuit, as did the Intel 80286 and later x86 microprocessors. In Intel's x86 line, on-chip memory management first appeared with the 80286 microprocessor, released in 1982.
For the first time, the 80286 offered on-chip memory management, which made it suitable for multitasking operations. On many machines, cache access time limits the clock rate, so it affects more than just the average memory access time. Fitting the cache on chip was therefore very important for achieving fast access times, and on-chip memory management paved the way.
The major functions associated with memory management are segmentation and paging. The segmentation unit first appeared on the 8086 processor, where its only purpose was to serve as a gateway to the 1 MB physical address space. To allow old applications to be ported easily to the new environment, Intel decided to keep the segmentation unit alive in protected mode. Protected mode does not use fixed-size memory blocks; instead, the size and location of each segment are set in an associated data structure called a segment descriptor. All memory references are made relative to the base address of their segment, which makes relocating program modules fairly easy and spares the operating system from performing code fix-ups when it loads applications into memory [2].
With paging enabled, the processor adds an extra level of indirection to the memory translation process. Instead of serving as a physical address, an application-generated address is used by the processor to index one of its look-up tables, and the corresponding table entry contains the actual physical address that is sent to the processor's address bus. Through paging, operating systems can create a distinct address space for each running application, simplifying memory access and preventing potential conflicts. Virtual memory allows applications to allocate more memory than is physically available by keeping memory pages partially in RAM and partially on disk.
When a program tries to access an on-disk page, an exception is generated and the operating system reloads the page so that the faulting application can resume execution [2].
The Pentium 4 was Intel's final endeavor in the realm of single-core CPUs. It had an on-die level-one (L1) data cache of 8 to 16 KB, while decoded instructions (micro-operations about to be executed by the CPU) were held in a separate trace cache [3]. By today's standards, the Pentium 4's L1 cache is very small. Less cache memory means the CPU must make more calls to RAM, and these calls reduce performance because the latency of transferring data from RAM is much higher than from the on-die cache. Although often overlooked, the cache size of any CPU is very important in predicting its performance. While the Pentium 4's level-one cache was very limited by today's standards, at the time of its release it was more than adequate for the majority of computer applications [4].
Probably the Pentium Pro's most noticeable addition was its on-package L2 cache, which ranged from 256 KB at introduction to 1 MB in 1997. Intel placed the L2 die(s) separately in the package, which still allowed the cache to run at the same clock speed as the CPU core. Additionally, unlike most motherboard-based cache schemes, which shared the main system bus with the CPU, the Pentium Pro's cache had its own back-side bus. Because of this, the CPU could read main memory and cache concurrently, greatly reducing a traditional bottleneck. The cache was also non-blocking, meaning that the processor could issue more than one cache request at a time (up to 4), reducing cache-miss penalties.
These properties combined to produce an L2 cache that was much faster than the motherboard-based caches of older processors. This cache alone gave the CPU an advantage in input/output performance over older x86 CPUs. In multiprocessor configurations, the Pentium Pro's integrated cache greatly improved performance compared with architectures in which all CPUs shared a central cache [4]. However, this far faster L2 cache came with some complications. The processor and the cache were on separate dies in the same package, closely connected by a full-speed bus. The two or three dies had to be bonded together early in the production process, before testing was possible, so a single tiny flaw in either die made it necessary to discard the entire assembly [5].
Technical Aspects of Pentium's Memory Management Unit
Virtual Memory Management in Pentium
The memory management unit of the Pentium is upward compatible with those of the 80386 and 80486 microprocessors. The linear address space of the Pentium is 4G bytes, that is, addresses from 0 to 2^32 - 1 (FFFFFFFFH). The MMU aims to translate a virtual address to a physical address in less than a single clock cycle on a hit, and to minimize the cache fetch time on a miss. The CPU generates a logical address, which the segmentation unit turns into a linear address; the paging unit then turns the linear address into a physical address in main memory. Hence, the segmentation and paging units are both parts of the MMU.
Figure 3.1 Logical to Physical Address Translation in Pentium
The Pentium can run in either real or protected mode. Real mode does not support multitasking because there is no protection to stop one process from interfering with another, whereas in protected mode each process runs in its own code segment. Segments have different privilege levels, which prevent a lower-privilege process (such as an application) from accessing a higher-privilege one (such as the operating system).
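The contrast between real-mode and protected-mode address formation, and the selector-to-descriptor steps described under Segmentation, can be sketched as a simplified model. All function and class names here are hypothetical, descriptor tables are plain Python lists (entry 0 of the GDT is the null descriptor, modeled as `None`), and only the limit check is performed; the type, privilege and expand-down checks that the processor also applies are deliberately omitted.

```python
from dataclasses import dataclass

REAL_MODE_SPACE = 1 << 20  # the 8086 had 20 address lines: a 1 MB space


def real_mode_address(segment, offset):
    """8086-style real mode: physical = segment * 16 + offset.
    The sum wraps at 1 MB, as it did on the original 8086."""
    return ((segment << 4) + offset) % REAL_MODE_SPACE


@dataclass
class SegmentDescriptor:
    base: int   # 32-bit linear base address of the segment
    limit: int  # largest valid offset in the segment (already scaled by G)
    dpl: int    # descriptor privilege level: 0 (most) to 3 (least privileged)


def decode_descriptor(raw):
    """Decode an 8-byte segment descriptor supplied as a 64-bit integer."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)            # 20-bit limit
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)  # 32-bit base
    access = (raw >> 40) & 0xFF  # access byte: P, DPL, S and type fields
    if (raw >> 55) & 1:          # G (granularity) bit: limit counts 4 KB units
        limit = (limit << 12) | 0xFFF
    return SegmentDescriptor(base=base, limit=limit, dpl=(access >> 5) & 0b11)


class GeneralProtectionFault(Exception):
    """Stands in for the processor's general-protection exception (#GP)."""


def protected_mode_linear(selector, offset, gdt, ldt):
    """Translate SELECTOR:OFFSET to a linear address (limit check only)."""
    index = selector >> 3                      # bits 15..3: descriptor table index
    table = ldt if selector & 0b100 else gdt   # bit 2 (TI): 0 = GDT, 1 = LDT
    descriptor = table[index]
    if descriptor is None or offset > descriptor.limit:
        raise GeneralProtectionFault(hex(selector))
    return (descriptor.base + offset) & 0xFFFFFFFF  # base + offset, 32 bits


gdt = [None, decode_descriptor(0x00CF9A000000FFFF)]  # null + flat 4 GB code segment
print(hex(real_mode_address(0x1234, 0x0010)))        # 0x12350
print(hex(protected_mode_linear(0x08, 0x1234, gdt, [])))  # 0x1234
```

The design point the sketch makes concrete: in real mode the segment value is itself part of the address arithmetic, while in protected mode the selector is only an index, and the base and limit come from a descriptor the operating system controls.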
When running in protected mode, the Pentium supports both segmentation and segmentation with paging.
Segmentation: Pentium
Segmentation divides programs into logical blocks and places them in different memory areas. This makes it possible to regulate access to critical sections of an application and helps identify bugs during development. It lets the system define the exact location and size of each segment in memory and assign a specific privilege level to a segment, protecting its contents from unauthorized access [6].
Segment registers are now called segment selectors, because they do not map directly to a physical address but point to an entry in a descriptor table. The Pentium CPU has six 16-bit segment registers that hold these selectors. A logical address consists of a 16-bit segment selector and a 32-bit offset. The figure below shows a multi-segment model, which uses the full capabilities of the segmentation mechanism to provide hardware-enforced protection of code, data structures, programs and tasks. This model is supported by the IA-32 architecture; each program is given its own table of segment descriptors and its own segments.
Figure 3.1.1.1 Multi-Segment Model
When the processor needs to translate a memory location SEGMENT:OFFSET to its corresponding physical address A (with paging disabled, the linear address is the physical address), it takes the following steps [7]:
Step 1: Find the start of the descriptor table (GDTR register). As the figure below shows, CPU selectors provide an index (pointer) to segment descriptors stored in RAM in memory structures called descriptor tables. The descriptor supplies the segment's base address, which is then combined with the offset to locate a specific linear address.
Figure 3.1.1.2 Selector to descriptor, and then to the linear address, in the Pentium MMU
Step 2: Find the segment's entry in the table; this is the segment descriptor corresponding to the segment. There are two types of descriptor tables: the Global Descriptor Table and the Local Descriptor Table.
Global Descriptor Table: contains segment definitions that apply to all programs, such as the operating system's code segments, which the OS creates before the CPU switches to protected mode.
Local Descriptor Table: these tables are unique to an application.
In this step, the entry in the segment table is found and the segment descriptor corresponding to the segment is chosen [7].
Figure 3.1.1.3 Global and Local Descriptor Table
The Pentium descriptor holds a 32-bit base address, which allows segments to begin at any location in its 4G bytes of memory. The figure below shows the format of a Pentium descriptor [7]:
Figure 3.1.1.4 Pentium Descriptor Format
Step 3: Find the base physical address B of the segment.
Step 4: Compute A = B + OFFSET [7].
Paging Unit
Paging translates linear addresses to physical addresses. The linear address space is divided into fixed-length pages, and the physical address space is similarly divided into frames of the same fixed length. Within their respective address spaces, pages and frames are numbered sequentially. Pages that have no frames assigned to them are stored on disk. When the CPU needs to run code on an unassigned page, it generates a page fault exception, upon which the operating system assigns a currently unused frame to that page and copies the code for that page from disk into the newly assigned RAM frame [9].
The Pentium MMU uses a two-level page table to translate a virtual address to a physical address. The page directory contains 1024 32-bit page directory entries (PDEs), each of which points to one of up to 1024 level-2 page tables. Each page table contains 1024 32-bit page table entries (PTEs), each of which points to a page in physical memory or on disk. The page directory base register (PDBR, control register CR3) points to the beginning of the page directory.
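The two-level walk just described, together with the 4M-byte page shortcut and the TLB discussed later in this essay, can be sketched as a short model. This is a hypothetical illustration, not Intel's implementation: physical memory is a Python dict from entry address to 32-bit entry value, only the present (P, bit 0) and page-size (PS, bit 7) bits are modeled, the class and constant names are made up for this example, and the accessed, dirty, read/write and user/supervisor bits are ignored.

```python
PRESENT = 0x001       # bit 0 of a PDE/PTE: the entry is valid
PAGE_SIZE_4M = 0x080  # bit 7 of a PDE (PS): maps a 4 MB page (needs CR4.PSE)


class PageFault(Exception):
    """Stands in for the processor's page-fault exception (#PF)."""


class TwoLevelPager:
    """Hypothetical model of Pentium-style two-level translation with a TLB."""

    def __init__(self, memory, pdbr):
        self.memory = memory  # physical address of a 32-bit entry -> entry value
        self.pdbr = pdbr      # physical base of the page directory (CR3)
        self.tlb = {}         # VPN -> physical frame base, for 4 KB pages only

    def translate(self, linear):
        vpn, vpo = linear >> 12, linear & 0xFFF
        if vpn in self.tlb:                    # TLB hit: no table walk needed
            return self.tlb[vpn] | vpo
        vpn1 = (linear >> 22) & 0x3FF          # bits 31..22: directory index
        vpn2 = (linear >> 12) & 0x3FF          # bits 21..12: page table index
        pde = self.memory.get(self.pdbr + 4 * vpn1, 0)
        if not pde & PRESENT:
            raise PageFault(hex(linear))
        if pde & PAGE_SIZE_4M:                 # 4 MB page: no page table level
            return (pde & 0xFFC00000) | (linear & 0x3FFFFF)
        pte = self.memory.get((pde & 0xFFFFF000) + 4 * vpn2, 0)
        if not pte & PRESENT:
            raise PageFault(hex(linear))
        frame = pte & 0xFFFFF000               # PPN, kept in the upper 20 bits
        self.tlb[vpn] = frame                  # cache the translation
        return frame | vpo                     # PPN concatenated with VPO


memory = {
    0x1000 + 4 * 0: 0x00002000 | PRESENT,                 # PDE 0 -> page table at 0x2000
    0x2000 + 4 * 5: 0x0ABCD000 | PRESENT,                 # PTE 5 -> frame 0x0ABCD000
    0x1000 + 4 * 2: 0x01C00000 | PRESENT | PAGE_SIZE_4M,  # PDE 2 -> 4 MB page at 28 MB
}
pager = TwoLevelPager(memory, pdbr=0x1000)
print(hex(pager.translate(0x00005678)))  # 0xabcd678
print(hex(pager.translate(0x00812345)))  # 0x1c12345
del memory[0x2000 + 4 * 5]               # unmap the page in the page table...
print(hex(pager.translate(0x00005FFF)))  # ...but the stale TLB entry still hits: 0xabcdfff
```

The last two lines show why a TLB needs explicit management: removing a PTE does not by itself stop a cached translation from being used, so on x86 the operating system must invalidate the entry, for example with the INVLPG instruction or by reloading CR3.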
Figure 3.1.2.1 Pentium multi-level page table [8]
For 4 KB pages, the Pentium uses a two-level paging scheme that divides the 32-bit linear address into three fields: bits 31-22 index the page directory, bits 21-12 index a page table, and bits 11-0 give the offset within the page.
Figure 3.1.2.2 Division of 32 bit linear address
The figure below shows the complete address translation process in the Pentium, from the CPU's virtual address to main memory's physical address.
Figure 3.1.2.3 Summary of Pentium address translation [8]
The size of the page tables is dynamic and can become large in a system that contains a large memory. With the Pentium's 4M-byte paging feature, a page directory entry can map a 4 MB page directly, so a system using only 4 MB pages needs just a single page directory and no page tables. Basically, paging lets the operating system create a virtual (faked) address space by swapping code between disk and RAM; this is known as virtual memory support [9].
The paging mechanism in the Pentium works with 4K-byte memory pages or, with a new extension available on the Pentium, with 4M-byte memory pages. For 4K-byte pages, the 20-bit virtual page number (VPN) is partitioned into two 10-bit chunks, VPN1 and VPN2, while the low 12 bits form the virtual page offset (VPO). VPN1 indexes a PDE in the page directory pointed to by the PDBR. The address in that PDE points to the base of a page table, which is indexed by VPN2. The physical page number (PPN) in the PTE indexed by VPN2 is concatenated with the VPO to form the physical address [8].
Figure 3.1.2.4 Pentium Page table Translation [8]
Segmentation with Paging: Pentium
The Pentium supports both pure segmentation and segmentation with paging. To select a segment, a program loads a selector for that segment into one of the six segment registers; for example, the CS register holds the selector for the code segment and the DS register holds the selector for the data segment. A selector specifies whether the segment's descriptor table is local to the process (LDT) or global to the machine (GDT). The selector used in the Pentium has the following format: a 13-bit index into the descriptor table (bits 15-3), a table indicator bit (bit 2: 0 = GDT, 1 = LDT), and a 2-bit requested privilege level (bits 1-0).
Figure 3.1.3.1 Selector Format
The steps required to achieve this methodology are as follows:
Step 1: Use the selector to convert the 32-bit virtual offset address to a 32-bit linear address.
Step 2: Convert the 32-bit linear address to a physical address using a two-stage page table.
Figure 3.1.3.2 Mapping of a linear address onto a physical address [9]
The figure below shows the complete process of segmentation combined with paging, one of the important functions of the Pentium's memory management unit [9].
Figure 3.1.3.3 Segmentation with paging
Some processors (the Motorola 68030 and later, and the Intel 80386, 80486 and Pentium) allow segmentation and paging to be used alone or in combination, so OS designers have the choices given in the table below [9].
Segmentation | Paging | Characteristics and examples
No | No | Small (embedded) systems; low overhead, high performance
No | Yes | Linear address space; BSD UNIX, Windows NT
Yes | No | Better-controlled protection and sharing; the segment table can be kept on chip, giving predictable access times (Intel 8086)
Yes | Yes | Controlled protection and sharing, better memory management; UNIX System V, OS/2
Figure 3.1.3.4 Usage of segmentation and paging in different processors
The Intel 80386, 80486 and Pentium support the following memory management scheme, which is used in IBM OS/2. The diagram is shown below:
Figure 3.1.3.5 Intel's Memory Management scheme implemented in IBM OS/2
3.1.4 Optimizing Address Translation in Pentium processors
The main goal of memory management for address translation is to complete every translation in less than a single clock cycle on a hit and to minimize cache fetch time on a miss. On a page fault, the page must be fetched from disk, which takes millions of clock cycles and is handled by OS code. The following methods are used:
1. Smart replacement algorithms: to reduce the page fault rate, the preferred replacement algorithm is least-recently used (LRU). Because exact LRU is expensive, it is approximated with a reference bit in each page table entry, which is set to 1 when the page is accessed and periodically cleared to 0 by the OS. A page whose reference bit is 0 has not been used recently [10].
2. Fast translation using a translation lookaside buffer: address translation would appear to require an extra memory reference, one access for the page table entry and another for the actual memory access. However, accesses to page tables have good locality, so the CPU uses a fast cache of PTEs called a translation lookaside buffer (TLB). Typical TLB parameters are 16-512 PTEs, 0.5-1 cycle for a hit, 10-100 cycles for a miss, and a 0.01%-1% miss rate [11].
Parameter | Value
Page size | 4 KB - 64 KB
Hit time | 50-100 CPU clock cycles
Miss penalty | 10^6 - 10^7 clock cycles
Miss penalty: access time | 0.8 x 10^6 - 0.8 x 10^7 clock cycles
Miss penalty: transfer time | 0.2 x 10^6 - 0.2 x 10^7 clock cycles
Miss rate | 0.00001% - 0.001%
Virtual address space size | GB - 16 x 10^18 bytes
Figure 3.1.4.1 TLB rates
A TLB miss (handled in hardware or software) leads to one of two cases. Either the page is in memory but its translation is missing from the TLB, in which case a new TLB entry must be created; or the page is not in memory, in which case control is transferred to the operating system to deal with a page fault, which is signaled by an exception (interrupt) using the EPC and Cause registers (in MIPS terminology). Page faults are handled in two ways, depending on their cause:
Instruction page fault: the OS (1) stores the state of the process, (2) looks up the page table to find the disk address of the referenced page, (3) chooses a physical page to replace, (4) starts a read from disk for the referenced page, (5) executes another process until the read completes, and (6) restarts the instruction that caused the fault [12].
Data access page fault: this occurs in the middle of an instruction. MIPS instructions are restartable, so the instruction is prevented from completing and restarted from the beginning; more complex machines must be able to interrupt instructions partway by saving the state of the CPU.
3. Avoiding address translation during indexing: this method reduces hit time. The CPU issues virtual addresses that must be mapped to physical addresses; a cache indexed by virtual addresses is called a virtual cache, as opposed to a physical cache.
A virtual cache reduces hit time because no translation from a virtual to a physical address is needed on hits. Address translation can also be done in parallel with cache access, so miss penalties are reduced as well. However, the virtual cache technique has some difficulties; in particular, process switches require cache purging. In virtual caches, different processes use the same virtual addresses even though these map to different physical addresses, so when a process is swapped out, the cache must be purged of all entries to make sure that the new process gets the correct data [13]. Different solutions to this problem are:
PID tags: increase the width of the cache address tags to include a process ID, instead of purging the cache. The current process's PID is specified by a register; if the PID does not match, the access is not a hit even if the address matches.
Anti-aliasing hardware: a hardware solution called anti-aliasing guarantees every cache block a unique physical address, so that all virtual addresses aliasing the same physical address map to the same location in the cache.
Page coloring: this software technique forces aliases to share some address bits, so that the virtual and physical addresses match over those bits.
Using the page offset: an alternative that gets the best of both virtual and physical caches. If the page offset is used to index the cache, the virtual address translation can overlap with the time required to read the tags, because the page offset is unaffected by address translation. However, this restriction limits the cache size: a direct-mapped cache can be no larger than the page size.
Pipelined cache access: another way to improve the cache is to divide cache access into stages, which leads to the following hit latencies: Pentium, 1 clock cycle per hit; Pentium II and III, 2 clock cycles per hit; Pentium 4, 4 clock cycles per hit. This allows a faster clock while still producing one cache hit per clock.
The drawback is a higher branch penalty and a higher load delay [13].
Trace caches: a trace cache is a specialized instruction cache containing instruction traces, that is, sequences of instructions that are likely to be executed. It is found on the Pentium 4 (NetBurst microarchitecture), where it is used instead of a conventional instruction cache. Its cache blocks contain micro-operations rather than raw memory, and they follow branches and continue at the branch target, thus incorporating branch prediction; a cache hit therefore requires a correct branch prediction. The major advantage is that it keeps instructions available to supply the pipeline by avoiding cache misses that result from branches; the disadvantages are that the cache may hold the same instruction several times and that its control is more complex [13].
System Memory Management Mode
The system memory management mode (SMM) is on the same level as protected mode, real mode and virtual mode, but it is provided to function as a manager. SMM is not intended to be used as an application or system-level feature; it is intended for high-level system functions such as power management and security, which most Pentiums use during operation but which are controlled by the operating system. SMM is entered through a new external hardware interrupt applied to the SMI# pin of the Pentium. When the SMM interrupt is activated, the processor begins executing system-level software in an area of memory called the system management RAM (SMMRAM), where it also saves the SMM state dump record. The SMI# interrupt disables all other interrupts that are normally handled by user applications and the operating system. A return from the SMM interrupt is accomplished with a new instruction called RSM, which returns to the interrupted program at the point of interruption.
SMM allows the Pentium to treat the memory system as a flat 4G-byte system, instead of only being able to address the first 1M of memory. SMM software initially executes from memory location 38000H. SMM also stores the state of the Pentium in what is called a dump record, which is stored at memory locations 3FFA8H through 3FFFFH. The dump record allows a Pentium-based system to enter a sleep mode and reactivate at the point of program interruption; this requires that the SMMRAM be powered during the sleep period. The halt auto restart and I/O trap restart fields are used when SMM is exited with the RSM instruction; these data allow RSM to return to the halt state or to the interrupted I/O instruction. If neither a halt nor an I/O operation is in effect upon entering SMM, the RSM instruction reloads the state of the machine from the state dump and returns to the point of interruption [14].
Memory protection in Pentium
In protected mode, the Intel 64 and IA-32 architectures provide a protection mechanism that operates at both the segment level and the page level. This mechanism can limit access to certain segments or pages based on privilege levels. The Pentium 4 supports four protection levels, with level 0 the most privileged and level 3 the least. Segment and page protection can be used during development to localize and detect design problems and bugs, and it can also be built into end products to add robustness to operating systems, utility software and application software. The mechanism performs protection checks before a memory cycle is started, such as limit checks, type checks, privilege level checks and restriction of addressable domains. The figure shows how these privilege levels are interpreted as rings of protection.
Here, the center (reserved for the most privileged code, data and stacks) is used for the segments containing critical software, usually the kernel of an operating system, while the outer rings are used for less critical software. At each instant, a running program is at a certain level, indicated by a 2-bit field in its PSW (Program Status Word), and each segment also belongs to a certain level.
Figure 3.3.1 Protection on Pentium II
Memory protection is also implemented by associating a protection bit with each frame; a valid-invalid bit is attached to each entry in the page table. Valid indicates that the associated page is in the process's logical address space and is thus a legal page; invalid indicates that the page is not in the process's logical address space. As long as a program restricts itself to using segments at its own level, everything works fine. Attempts to access data at a less privileged (numerically higher) level are permitted, but attempts to access data at a more privileged (numerically lower) level are illegal and cause traps.
3.4 Cache in Pentium Processors
One of the most common techniques for improving performance in computer systems (both hardware and software) is to cache frequently accessed information. Caching lowers the average cost of accessing the information, providing greater performance for the overall system. This applies to processor design as well, and in the Intel Pentium 4 processor architecture caching is a critical component of system performance. The Pentium 4 processor architecture includes multiple types and levels of caching:
Level 3 Cache: this type of cache is available only on some versions of the Pentium 4 processor (notably the Pentium 4 Xeon processors). It provides a large on-processor tertiary storage area that the processor uses to keep information nearby, so the contents of the Level 3 cache are faster to access than main memory.
Level 2 Cache: this type of cache is available in all versions of the Pentium 4 processor. It is normally smaller than the Level 3 cache and is used for caching both data and code used by the processor.
Level 1 Cache: this type of cache is used only for caching data. It is smaller than the Level 2 cache and is generally used for the information the processor accesses most frequently.
Trace Cache: this type of cache is used only for caching decoded instructions. The processor has already broken the normal processor instructions down into micro-operations, and it is these micro-ops that the Pentium 4 caches in the trace cache.
Translation Lookaside Buffer (TLB): this type of cache stores virtual-to-physical memory translation information. It is an associative cache and consists of an instruction TLB and a data TLB.
Store Buffer: this type of cache takes arbitrary write operations and buffers them so that they can be written back to memory without blocking the current processor operations. This decreases contention between the processor and other parts of the system that are accessing main memory. There are 24 entries in the Pentium 4.
Write Combining Buffer: this is similar to the store buffer, except that it is specifically optimized for burst write operations to a memory region, so multiple write operations can be combined into a single write-back operation. There are 6 entries in the Pentium 4.
The disadvantage of caching is handling the situation in which the original copy is modified, making the cached information incorrect (or stale). A significant amount of the work done within the processor ensures the consistency of the caches, both for physical memory and for the TLBs. In the Pentium 4, physical memory caching remains coherent because the processor uses the MESI protocol. MESI defines the state of each unique cached piece of memory, called a cache line. In the Pentium 4, a cache line is 64 bytes.
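Because coherence state is tracked per 64-byte line, it helps to see how an address maps onto lines and cache sets. The sketch below is illustrative only: it assumes an 8 KB, 4-way set-associative L1 data cache with 64-byte lines and 4 KB pages (the cache size and associativity are assumptions, not figures taken from this essay), and the function names are hypothetical. It also shows the check behind the "using the page offset" technique from section 3.1.4: when the set index and byte offset together fit inside the 12-bit page offset, the set can be selected from untranslated address bits while the TLB lookup proceeds in parallel.

```python
LINE_SIZE = 64          # Pentium 4 cache line size, as stated in the text
CACHE_SIZE = 8 * 1024   # assumption: an 8 KB L1 data cache
WAYS = 4                # assumption: 4-way set associative
PAGE_OFFSET_BITS = 12   # 4 KB pages

NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)  # 32 sets
OFFSET_BITS = (LINE_SIZE - 1).bit_length()   # 6 bits select a byte in the line
INDEX_BITS = (NUM_SETS - 1).bit_length()     # 5 bits select the set


def split_address(addr):
    """Split an address into (tag, set index, byte offset) for this cache."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def same_line(a, b):
    """True if two addresses fall in the same 64-byte line (one MESI state)."""
    return a // LINE_SIZE == b // LINE_SIZE


# Index + offset bits inside the page offset: set selection needs no translation.
INDEX_WITHIN_PAGE_OFFSET = OFFSET_BITS + INDEX_BITS <= PAGE_OFFSET_BITS

print(split_address(0x12345))                                # (36, 13, 5)
print(same_line(0x1000, 0x103F), same_line(0x103F, 0x1040))  # True False
print(INDEX_WITHIN_PAGE_OFFSET)                              # True
```

One practical consequence: two variables that share a 64-byte line also share a single MESI state, so writes by one processor invalidate the other's copy even if the two never touch the same variable.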
With the MESI protocol, each cache line is in one of four states:
Modified: the cache line is owned by this processor, and modifications to it are held within the processor cache. No other part of the system may access the main memory for that cache line, as this would obtain stale information.
Exclusive: the cache line is owned by this processor. No other part of the system may access the main memory for that cache line.
Shared: the cache line is owned by this processor, and other parts of the system may also acquire shared access to it and read it. None of the shared owners may modify the cache line.
Invalid: the cache line is in an indeterminate state for this processor. Other parts of the system may own this cache line, or possibly no other part of the system owns it. This processor may not access the memory, and it is not cached [15].
Current Problems and Solutions Associated with Them
When you run multiple programs (especially MS-DOS-based programs) on a Windows-based computer that has insufficient system memory (RAM) and an Intel Pentium Pro or Pentium II processor, information in memory may become unavailable or damaged, leading to unpredictable results. For example, copy and compare operations may not work consistently. This behavior is an indirect result of certain performance optimizations in the Intel Pentium Pro and Pentium II processors. These optimizations affect how the Windows 95 Virtual Machine Manager (VMM) performs certain memory operations, such as determining which sections of memory are not in use and can be safely freed. As a result, the Virtual Machine Manager may free the wrong pages in memory, leading to the symptoms described earlier. This problem no longer occurs in Windows 98. To resolve it, install the current version of Windows [16].
There is a little problem with sharing in