From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.std.c
Subject: Re: sizeof (an array type)
Date: 28 Apr 1999 16:56:17 GMT

In article <WvWqw3Nm$tJ3EwcW@romana.davros.org>, "Clive D.W. Feather"
<clive@on-the-train.demon.co.uk> writes:

|> >Obviously the size of an array "T a[n];" must be at least n*sizeof(T), but
|> >can there be padding _at the end_ of an array?

This reminds me: while it is clearly too late for a new feature,
can anyone on the Committee say if there's been any discussion of the
following sort of thing, and if not, people might start thinking of it
for the next time.

Here is the problem:  many assemblers support a .origin or equivalent
directive that can be used to align a data item/structure onto
power-of-2 boundaries larger than any data item actually found in C;
of special relevance are cache-line sizes (32, 64, 128 typical),
or page sizes.  Codes exist that already work fairly hard to control
the alignment of structures, arrays, COMMON blocks, and sometimes
heavily-contended locks (which are good things to have in separate
cache lines).

But the reasons for doing this are inevitably going to get much more
important (later).

Perhaps something like (don't care much about the syntax)
	#define LINESIZE 64
	....
	aligned(LINESIZE) struct x { ...} xx;

should, for sure, align xx on a multiple of LINESIZE ... and probably pad
its size to a multiple of LINESIZE so that the elements of an array of x's
all start on the same alignment.  I.e., this is similar to the effect currently
obtained by declaring as the first data item in x, a data item with the
most restrictive alignment requirements (typically 8 bytes, or maybe 16
bytes if an implementation supports a 128-bit aligned long double).
I.e., this is the same sort of mechanism, all that's happened is that the
existing alignments of 1, 2, 4 (8, 16) get augmented by 32, 64, 128, etc.
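
For readers with a later compiler: C11 eventually standardized an alignment
specifier, and the following is a minimal sketch of how the idea above can be
written with it.  The value 64 for LINESIZE and the members of struct x are
assumptions chosen for illustration, not anything from a particular machine.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define LINESIZE 64  /* assumed cache-line size; pick per target */

/* Aligning one member raises the alignment of the whole struct, and the
   compiler then pads sizeof(struct x) up to a multiple of that alignment. */
struct x {
    alignas(LINESIZE) int count;  /* hypothetical members */
    double value;
};

struct x xx;        /* starts on a LINESIZE boundary */
struct x table[4];  /* so does every element of the array */

static_assert(alignof(struct x) == LINESIZE, "struct x is line-aligned");
static_assert(sizeof(struct x) % LINESIZE == 0, "size is padded to whole lines");
```

The two static_asserts check at compile time the properties asked for above:
the object starts on a LINESIZE boundary, and the size is padded so that array
elements do too.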

As to why one cares, it is sad but true that the widening gap between
CPU speed and DRAM speed is raising the relative cost of cache misses,
and we're headed towards having to think more about data placement.
In 1985, you could do 1-2 instructions in the time of a memory access.
In 2001, this is likely to be approaching 1000 (peak) instructions
in the time of a cache miss (i.e., imagine 1ns CPU cycle, 4-6 issues/cycle,
and 150ns realistic memory time = 600-900 instructions.)
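
The arithmetic behind that estimate can be sketched as a small function.  The
name insns_per_miss is hypothetical, and the inputs are the illustrative
figures quoted above, not measurements.

```c
#include <assert.h>

/* Peak number of instructions that could have issued during one cache miss:
   (miss time / cycle time) cycles, times instructions issued per cycle. */
double insns_per_miss(double cycle_ns, double issue_width, double miss_ns)
{
    assert(cycle_ns > 0.0);
    return (miss_ns / cycle_ns) * issue_width;
}
```

With a 1ns cycle and a 150ns miss, 4 issues/cycle gives 600 and 6 issues/cycle
gives 900, which is the range quoted above.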

It may be very useful to make sure that a struct doesn't cross more
line-size boundaries than it needs to.
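
To make "crosses more boundaries than it needs to" concrete, here is a rough
sketch that counts the lines an object touches.  The function name and its
parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines touched by an object of `size` bytes that starts at
   byte address `addr`, when lines are `line` bytes long. */
size_t lines_spanned(uintptr_t addr, size_t size, size_t line)
{
    assert(line > 0 && size > 0);
    uintptr_t first = addr / line;               /* line holding the first byte */
    uintptr_t last  = (addr + size - 1) / line;  /* line holding the last byte  */
    return (size_t)(last - first) + 1;
}
```

For example, with 64-byte lines a 48-byte struct at address 32 touches 2 lines,
while the same struct at address 64 touches only 1.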

On the one hand, it is disturbing to think that an implementation thing
like line-size might start getting embedded into more user code.
On the other hand, this is not a fundamentally new mechanism.
On the third hand ("the gripping hand"), C's ability to provide
efficient code is going to be under stress from the CPU-DRAM gap,
so we need to start thinking about it.


--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-933-4392
USPS:   Silicon Graphics/Cray Research 40U-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389


From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.std.c
Subject: Re: ALIGN_MAX and max_align_t again
Date: 4 Jun 1999 22:19:04 GMT

In article <375830B2.61DBDAEC@technologist.com>, David R Tribble
<dtribble@technologist.com> writes:

|> Something tells me that there's a contradiction in saying that a
|> pointer to a double can contain an address that's a multiple of 4

1) Some CPUs allow anything to be aligned anywhere.

2) Some CPUs require strict alignment, at least of the usual
integer and floating-point data, i.e., an 8-byte object must be
aligned on 8-byte boundaries, else there is a trap considered an error.

3) Some CPUs prefer strict alignment, but allow arbitrary alignment (with
varying performance penalties), i.e., if you access 8 bytes, unaligned,
but it's within a cache line, it is reasonably fast, and if it crosses
cache lines, it's a non-fatal interrupt that the OS is expected to fix.

4) Some CPUs require alignment of multi-byte items, but not quite
as strict, i.e., 8-byte objects can be on 4-byte boundaries,
or 4-byte items can be on 2-byte boundaries.  This usually happens when
the first version of a CPU has a bus-width somewhere that is half of
the item size, and so allowing the relaxed alignment may save space,
and costs nothing (at that point) ... although usually, later CPUs come
along that have wider busses, and then the earlier decision to allow the
unstrict alignment causes serious cursing.
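
Since code cannot generally know which of the four cases it will run on, one
common defensive idiom is sketched below: test alignment explicitly, and go
through memcpy when an address might not be suitably aligned.  The helper
names are hypothetical, and the pointer-to-integer conversion is
implementation-defined, although it behaves as expected on the usual
flat-address machines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if p is a multiple of `align`, which must be a power of two. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Read a double from an address of any alignment.  memcpy makes no
   alignment assumption, so this is well-defined in all four cases. */
double load_double_any(const void *p)
{
    double d;
    memcpy(&d, p, sizeof d);
    return d;
}
```

On a case-2 machine, dereferencing a double pointer that is only 4-byte
aligned would trap; load_double_any avoids forming such an access at all, at
whatever cost the compiler's memcpy expansion has on that target.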

Suggestion: regardless of what the standards *allow*, it has historically
been a good idea when designing a new C environment to enforce the strictest
alignment, even if it is not required by the early CPUs, since busses tend to
get wider, and unalignment tends to hurt them.

[This is actually related to a more general problem, which is that C says
a lot about alignment of existing data objects, but doesn't seem to cleanly
address the related issues of:
	a) Alignment/padding on page-size boundaries.
	b) Alignment/padding on cache-line boundaries (or on a boundary
	guaranteed to be as large as the largest cache line to be
	encountered in a given CPU family.  I.e., 64 or 128 bytes
	would do OK here).
Some people care about a), some people care about b); over time, more people
will likely care about b), as the ratio of CPU instructions / cache miss
increases - we're soon headed towards CPUs that can execute ~ 1K instructions
in the time of 1 cache-miss, and minor changes in data layout can make
a serious difference.]
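
For dynamically allocated data, both a) and b) can be approximated with the
aligned_alloc function that C11 later added.  The following is only a sketch:
round_up and alloc_aligned are hypothetical helpers, and figures such as 128
(cache line) or 4096 (page) are assumptions that depend on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round n up to the next multiple of align, which must be a power of two. */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Allocate at least n bytes starting on an `align` boundary.  C11 asks that
   the size passed to aligned_alloc be a multiple of the alignment, hence the
   rounding, which also pads the block to whole lines or pages.
   Returns NULL on failure; release the block with free(). */
void *alloc_aligned(size_t n, size_t align)
{
    return aligned_alloc(align, round_up(n, align));
}
```

A call such as alloc_aligned(100, 128) would return a 128-byte block that
starts on a 128-byte boundary, so nothing else shares its cache line.
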
--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-933-4392
USPS:   Silicon Graphics/Cray Research 40U-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389
