Linux HugePages

* What you may already know:

- Larger memory page size (usually 2 MB, depending on architecture)
- Processes share page table entries, so there is less kernel page table management overhead and fewer TLB misses
- Always resident memory
- On Oracle 11g, can be enabled only if AMM is disabled (memory_target set to 0)
- Metalink notes 361323.1 (HugePages on Linux: What It Is... and What It Is Not...), 744769.1, 748637.1
- Similar to Solaris ISM (intimate shared memory) in almost every way

Now what you may not know (and may not care about either):

* HugePages memory only shows as resident memory on Red Hat 4, not 5

(Actually, it most likely depends on the Linux kernel version, not the Red Hat version.) On a RHEL 4 server, when HugePages is used, `top' or `ps' shows that an Oracle process's resident memory is only slightly smaller than its virtual memory. But on RHEL 5, resident memory is very much smaller. This, however, does not change the fact that HugePages memory is guaranteed to be locked in RAM. David Gibson, a HugePages developer, says in private email "hugepages are always resident, never swapped. This [RHEL 5 showing non-resident HugePages] must be something changed in the wider MM code".

* vm.nr_hugepages, memlock, and SGA sizing

It's probably best to set memlock (in /etc/security/limits.conf, in KB) slightly larger than vm.nr_hugepages (in /etc/sysctl.conf) multiplied by Hugepagesize, which in turn should be slightly larger than the total SGA of all instances on the box. (Don't forget the ASM instance, if any.) The memlock setting in limits.conf alone won't actually set aside memory; it's just a number that caps how much memory can be locked. But vm.nr_hugepages actually allocates memory. If, after starting an instance, you find that HugePages is not used, lower the SGA a lot and try again. Then gradually add SGA back.

* Checking usage

`cat /proc/meminfo'. Focus on HugePages* and PageTables. Also, run `strace -f -e trace=process,ipc sqlplus / as sysdba' and issue startup.
Look for SHM_HUGETLB in the third argument to shmget(). Linux shmat() has no such flag, so tracing the listener and following it down to the cloned server process won't work.[note1] Also, Linux `pmap' has no -s option, as on Solaris, to check the page size of the individual mappings inside a process's memory space.

memlock affects the shell's `ulimit -l' setting. Make sure your shell has the desired setting before starting the DB instance.

You can check how the numbers HugePages_Free and HugePages_Rsvd change while you start up or shut down an instance that uses HugePages (adjust the grep pattern as needed):

    while true; do
      # Print the first three HugePages_* values on one line
      for i in $(grep ^Huge /proc/meminfo | head -3 | awk '{print $2}'); do
        echo -n "$i "
      done
      echo        # end the line before the next sample
      sleep 5
    done

The output is like the following (numbers are HugePages_Total, HugePages_Free, HugePages_Rsvd):

    512 225 192
    512 225 192
    512 225 192
    512 512 0    <- Instance down. All HugePages freed. (This is the last moment of database shutdown.)
    512 512 0
    512 371 338  <- Startup. 338 pages free but reserved (i.e. 371-338=33 pages "real" free), 512-371=141 pages used
    512 329 296  <- 512-329=183 pages used, up by 183-141=42, reserved pages down by 42, "real" free unchanged
    512 227 194  <- 512-227=285 pages used, up by 285-183=102, reserved down by 102 too, "real" free unchanged

It indicates that when the instance is started, HugePages memory pages are immediately reserved. This is a fast process because there's no write to the pages (remember, reserved is just a special type of free; see http://linux-mm.org/DynamicHugetlbPool). Then, as the pages are written to, they're taken off the reserved list and counted as used. This server has 33 "real" free pages wasted. With more diligence, I could have avoided assigning them to HugePages.

Note that older versions of the HugePages code don't show reserved pages. On Red Hat Linux, the change is between RHEL 4 and 5.

* 11g AMM

11g Automatic Memory Management brings PGA under automatic management.
But PGA can never be allocated from HugePages memory.[note2] I would set memory_target to 0 to disable AMM and configure HugePages as usual. HugePages is a far more appealing feature than AMM. If I have to sacrifice one of the two, I sacrifice AMM. SGA and PGA are used so differently that they should be managed separately anyway. To name one issue with AMM: *every* server process needs hundreds if not thousands of file descriptors to open *all* the files under /dev/shm, most likely 4 MB each (the SGA granule size, _ksmg_granule_size). See http://download.oracle.com/docs/cd/B28359_01/install.111/b32002/pre_install.htm#sthref71

* Further reading

http://linux-mm.org/HugePages

_____________
[note1] On Solaris, you can run `truss -f -p ' and connect to the database through Oracle Net. The trace will show e.g.
    shmat(1979711503, 0x40280000000, 040000) = 0x40280000000
where 040000 is SHM_SHARE_MMU according to /usr/include/sys/shm.h.
[note2] For now at least. See Kevin Closson's blog for more: http://kevinclosson.wordpress.com/2007/08/23/oracle11g-automatic-memory-management-and-linux-hugepages-support/
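
* Appendix: page accounting sketch

The arithmetic used to annotate the sample output under "Checking usage" (used = Total - Free, "real" free = Free - Rsvd) can be wrapped in a small shell helper. This is only a minimal sketch: the function names hugepage_summary and hugepage_summary_from are made up for illustration, and it assumes the HugePages_Total, HugePages_Free and HugePages_Rsvd labels as they appear in /proc/meminfo on kernels that report reserved pages. On older kernels without HugePages_Rsvd, reserved is simply treated as 0.

```shell
#!/bin/sh
# hugepage_summary TOTAL FREE RSVD   (hypothetical helper name)
# Prints used, reserved and "real" free page counts.
hugepage_summary() {
    total=$1
    free=$2
    rsvd=$3
    used=$((total - free))        # pages that have actually been written to
    real_free=$((free - rsvd))    # free pages that nobody has reserved
    echo "used=$used reserved=$rsvd real_free=$real_free"
}

# hugepage_summary_from [FILE]   (hypothetical helper name)
# Reads the three counters from a file in /proc/meminfo format.
# A missing counter (e.g. HugePages_Rsvd on old kernels) counts as 0.
hugepage_summary_from() {
    file=${1:-/proc/meminfo}
    set -- $(awk '/^HugePages_Total:/ {t=$2}
                  /^HugePages_Free:/  {f=$2}
                  /^HugePages_Rsvd:/  {r=$2}
                  END {print t+0, f+0, r+0}' "$file")
    hugepage_summary "$1" "$2" "$3"
}
```

For example, `hugepage_summary 512 371 338' reproduces the figures from the startup line of the sample output (141 used, 33 "real" free), and `hugepage_summary_from' with no argument reads the live /proc/meminfo.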