From: old_systems_guy@yahoo.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: Large page support in UNIX.
Date: 11 Jan 2004 21:00:14 -0800
Message-ID: <ce9d692b.0401112100.7520627a@posting.google.com>

robertwessel2@yahoo.com (Robert Wessel) wrote in message news:<bea2590e.0401101853.23f55dcc@posting.google.com>...
> Maynard Handley <name99@redheron.com> wrote in message news:<name99-746C7C.15492810012004@netnews.comcast.net>...
> > How do the UNIXs on CPUs with large page support (ie pretty much any
> > high-end CPU these days) deal with this?
> > To me the obvious thing is to provide flags to calls like vmalloc() and
> > mmap() that say "please create this vm-region large page mapped", but is
> > that what is done, or do they try to go the (easier for the user, but a
> > whole lot messier for the implementer) route of tracking page faults and
> > deciding, based on them, to either coalesce pages, or, perhaps easier
> > and more effective, simply switch an entire vm-region to large pages?
>
>
> HP-UX on PA-RISC (I'd guess IPF too, although I don't know for sure)
> decides to use large pages dynamically as it's handling TLB faults.
> In the case of PA-RISC, page sizes are 4KB, 16KB, 64KB..., so small
> groups of pages are easy to coalesce.  Contrast the 4KB and 2/4MB
> pages on x86 where you'd have to gather 512 or 1024 small pages to
> populate a superpage.
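
For concreteness on the first alternative in the question above (an explicit
flag on mmap()): Linux eventually grew exactly that kind of interface,
MAP_HUGETLB, which postdates this thread and is not what HP-UX or IRIX did,
but it shows the shape of the "flag on mmap()" approach.  A minimal sketch,
assuming a Linux system where the administrator has already reserved huge
pages:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 2UL * 1024 * 1024;   /* one 2MB x86 superpage */

    /* Explicitly ask for a large-page-backed mapping.  MAP_HUGETLB is
       Linux-specific and shown only as an illustration; it fails unless
       huge pages have been reserved (e.g. /proc/sys/vm/nr_hugepages). */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }

    p[0] = 42;   /* one TLB entry now covers the whole 2MB region,
                    instead of 512 entries for 4K pages */

    munmap(p, len);
    return 0;
}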

Around 1989, I cajoled the MIPS chip folks to incorporate variable-sized
pages into the R4000 design, which first shipped in systems in 1Q1992.
These do the 4K, 16K, 64K, etc progression.
I think it was around 1995 or 1996 when the feature first started to
actually be used (maybe some IRIX person will post), and it took even
longer to tune.

IRIX does:
1) Automatic coalescing.
2) Boot-time system tuning of page sizes.
3) Run-time hinting of various kinds, as I recall (see the sketch below).

There are interesting interactions with ccNUMA page migration.
Google search: IRIX large page
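
As a rough modern analogue of 3) -- not the IRIX interface itself -- Linux's
madvise(MADV_HUGEPAGE) lets a program mark a region as a candidate for
coalescing into large pages while leaving the actual coalescing to the
kernel.  A minimal sketch, assuming a kernel with transparent huge pages
available:

#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

#define REGION_SIZE (64UL * 1024 * 1024)   /* 64MB working region */

int main(void)
{
    void *buf;

    /* Align the region to 2MB so the kernel can back it with 2MB pages. */
    if (posix_memalign(&buf, 2 * 1024 * 1024, REGION_SIZE) != 0)
        return 1;

    /* The hint: "please coalesce this region into large pages if you can".
       MADV_HUGEPAGE is Linux transparent-huge-page machinery, shown only
       as an analogue of run-time hinting, not as the IRIX mechanism. */
    madvise(buf, REGION_SIZE, MADV_HUGEPAGE);

    /* ... large, TLB-hostile working set gets used here ... */

    free(buf);
    return 0;
}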

Large pages are absolutely crucial when systems get faced with
individual applications that use 100s of GBs of memory.

There are several useful lessons in large page support.

1) The original rationale was simple:
   a) Main memory would continue to grow at Moore's Law rates.
   b) MMUs with 4K pages (or 8K or 16K) typically didn't map enough
      memory already (in larger applications), but it was clear that
      MMUs weren't going to be allowed to grow at Moore's Law rates.
      [The tradeoffs just didn't make it worth the space.]
      Hence, in the absence of large page support, MMUs would inevitably
      map smaller fractions of memory, a Bad Thing.
      While TLB-miss overhead was minimal for smaller apps, it was
      actually noticeable in some with distressful reference patterns.
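
   To put rough numbers on b): with 4K pages, a 64-entry TLB (an assumed
   figure, purely for illustration) covers only 256KB of memory no matter
   how large main memory grows; let the same 64 entries map 16MB pages and
   coverage jumps to 1GB.  A toy calculation:

   #include <stdio.h>

   int main(void)
   {
       /* Assumed 64-entry TLB, purely for illustration. */
       const unsigned long entries = 64;
       const unsigned long page_kb[] = { 4, 16, 64, 256, 1024, 4096, 16384 };
       const int n = sizeof(page_kb) / sizeof(page_kb[0]);
       int i;

       for (i = 0; i < n; i++) {
           unsigned long reach_kb = entries * page_kb[i];
           printf("%5luKB pages: %lu-entry TLB maps %8luKB\n",
                  page_kb[i], entries, reach_kb);
       }
       return 0;
   }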

2) But it took a long time to get from chip design to getting this actually
   supported and shipped in an operating system, because in fact, it took
   serious reworking of various kernel data structures, and a lot of tuning
   work.  [This is the unfortunate thing for good OS people: if they do
   a bad job, the results (flakey systems) are instantly apparent,
   whereas when they do a great job, everything just works the same, only
   faster, or more consistently, or more reliably ... and people often
   don't notice.]

3) If you have a great new hardware feature, but it affects something
pervasive, especially in somebody else's OS, expect to wait a while.


From: old_systems_guy@yahoo.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: Large page support in UNIX.
Date: 12 Jan 2004 23:26:31 -0800
Message-ID: <ce9d692b.0401122326.3443a522@posting.google.com>

Seongbae Park <SeongbaeDOTPark@sun.com> wrote in message news:<bttmlp$4bp$1@news1nwk.SFbay.Sun.COM>...
> In article <ce9d692b.0401112100.7520627a@posting.google.com>, John Mashey wrote:
> ...
> >    work.  [This is the unfortunate thing for good OS people: if they do
> >    a bad job, the results (flakey systems) are instantly apparent,
> >    whereas when they do a great job, everything just works the same, only
> >    faster, or more consistently, or more reliably ... and people often
> >    don't notice.]
>
> Now replace "OS people" with processor people or compiler people,
> and repeat :)
>
> Seongbae
>
> PS. Sorry, couldn't resist.

Yes, but without any implication that any of these is any easier or
less valuable than any other, I would observe that the problems are
different amongst the three areas [having done some of all three at one
time or another.]

Processor
1) When you do the first one of a family, you get a relatively blank slate.
Life is fun.  Then, from then on, for a long time, you either have to be
upward-compatible at user level, and mostly at the kernel level ...
or else have very good reasons to justify incompatibilities.  Some of this is
less fun, especially if you are trying to do high performance implementations
of architectures that don't lend themselves to that, and to accommodate
*^$^@#% programmers who were allowed to do in-line instruction
modification.

2) But, when you've got the Nth chip into production, (most of) you
are *done*, and you can get working on N+1.  You can forget about N-1,
N-2, etc.   If someone finds a bug in N-1, you will probably not rush around
trying to replace them.  You will probably not try to make a cool new CPU
chip that goes into a socket of a machine shipped 5 years ago.

OS
1) When chip N ships, the OS has to be ready to support it in all the
relevant system configurations, of which there may be many.
People hate having too many OS versions around, so you may well have to
support systems using chips N-1, N-2, N-3, maybe 5-6 years back.

2) Hence, the normal thing faced by OS people is that the complexity of
installed base that must be supported *accumulates* over time.
Most OS people would *love* to be able to move an OS just to the
newest machines and forget about the older ones, the way their
processor colleagues can :-)

3) When you move a new OS to an old machine, quite often there are
new features that can make the old machines slow down.  People don't
like that.

4) External things can happen that *demand* synchronized OS releases to
a wide variety of hardware platforms.  Ex: Y2K, or external standards
changes.

Compilers
1) Are somewhat like OS issues.  In particular, if there are too many
different user-visible ISA variants, it's a pain.  On the other hand,
compilers usually don't have to deal with as many hardware-dependent issues,
or as many configuration-combination multipliers as do OS folks.

2) Tuning tends to be a little less sensitive, i.e., the usual syndrome
is that a new optimization goes in as -Ox, and it usually works, but
occasionally doesn't.  That's generally easier to deal with than
ugly timing-dependent OS problems that lose memory, or crash a system.
