Date: 	Sat, 19 May 2001 12:35:58 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device
Newsgroups: fa.linux.kernel

On Sat, 19 May 2001, Pavel Machek wrote:
> 
> Well, if we did something like modify(int fd, char *how), you could do
> 
> modify(0, "nonblock,9600") 

What you're really proposing is to make ioctl's be ASCII strings.

Which is not necessarily a bad idea, and I think plan9 did something
similar (or rather, if I remember correctly, plan9 has control streams
that were ASCII. Or am I confused?).

> I thought about how to do networking without sockets, and it seems to
> me like this kind of modify syscall is needed, because network sockets
> connect to *two* different places (one local address and one
> remote). Sockets are really nasty :-(.

One of the horrors of ioctl's is indeed that they are not very
well-defined, and as such cannot be transported over a network without
knowing more about them. Structuring them some way would already be very
useful. The _IOC() macros do this partially, of course, but because it is
a voluntary thing it is not actually followed all that well in general,
and most ioctl names are just random numbers that don't tell the structure
of the arguments or return values.
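
A minimal sketch of the structure the _IOC() macros encode, for readers who
have not looked at it. The field widths below mirror the common
asm-generic layout (8-bit number, 8-bit type, 14-bit size, 2-bit
direction), which is an assumption: some architectures use different
widths. The `SK_*` macros and `sk_*` helpers are illustrative names, not
the kernel's own.

```c
#include <stdio.h>

/* Assumed field layout, mirroring the common asm-generic one. */
#define SK_NRBITS    8
#define SK_TYPEBITS  8
#define SK_SIZEBITS  14
#define SK_DIRBITS   2

#define SK_NRSHIFT   0
#define SK_TYPESHIFT (SK_NRSHIFT + SK_NRBITS)
#define SK_SIZESHIFT (SK_TYPESHIFT + SK_TYPEBITS)
#define SK_DIRSHIFT  (SK_SIZESHIFT + SK_SIZEBITS)

#define SK_MASK(bits) ((1u << (bits)) - 1u)

/* Direction values, seen from user space. */
#define SK_NONE  0u
#define SK_WRITE 1u
#define SK_READ  2u

struct ioc_fields {
    unsigned dir;   /* SK_NONE, SK_WRITE, SK_READ, or both */
    unsigned type;  /* subsystem "magic" character */
    unsigned nr;    /* command number within the type */
    unsigned size;  /* size of the argument structure in bytes */
};

/* Pack the four fields into one command number. */
static unsigned sk_encode(unsigned dir, unsigned type, unsigned nr, unsigned size)
{
    return (dir << SK_DIRSHIFT) | (size << SK_SIZESHIFT) |
           (type << SK_TYPESHIFT) | (nr << SK_NRSHIFT);
}

/* Split a command number back into its fields. */
static struct ioc_fields sk_decode(unsigned cmd)
{
    struct ioc_fields f;
    f.dir  = (cmd >> SK_DIRSHIFT)  & SK_MASK(SK_DIRBITS);
    f.type = (cmd >> SK_TYPESHIFT) & SK_MASK(SK_TYPEBITS);
    f.nr   = (cmd >> SK_NRSHIFT)   & SK_MASK(SK_NRBITS);
    f.size = (cmd >> SK_SIZESHIFT) & SK_MASK(SK_SIZEBITS);
    return f;
}

/* Print the decoded fields of a command number. */
static void sk_print(unsigned cmd)
{
    struct ioc_fields f = sk_decode(cmd);
    printf("cmd 0x%08x: dir=%u type='%c' nr=%u size=%u\n",
           cmd, f.dir, (int)f.type, f.nr, f.size);
}
```

Decoding an old-style number such as 0x5401 (TCGETS on x86 Linux) gives
type 'T' and number 1 but a size and direction of zero, which is the
"voluntary" part: nothing in the number describes the argument.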

And a "stream of bytes" is in a very real sense the simplest structure,
and is the unix way (and the plan9 way is to avoid binary streams, and use
ASCII text instead when possible, which probably also makes sense).

However, you can't really use a string. It would really have to be two
memory regions: incoming and outgoing, with an ASCII representation being
the _preferred_ method for stuff that isn't obviously structured or
performance-critical.
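
One way to picture the "two memory regions" idea is the user-space sketch
below. Everything in it is hypothetical: `ctl_modify`, `struct port_state`
and the option names are made up for illustration and correspond to no
real syscall. The request arrives as an ASCII region with an explicit
length, and the reply goes into a separate outgoing region.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device state touched by the control request. */
struct port_state {
    int  nonblock;
    long baud;
};

/*
 * Apply comma-separated ASCII options from the incoming region and write
 * the resulting state, also as ASCII, into the outgoing region.
 * Returns 0 on success, -1 on a malformed request or too-small buffer.
 * The state is only updated if the whole request is valid.
 */
static int ctl_modify(struct port_state *st,
                      const char *in, size_t inlen,
                      char *out, size_t outlen)
{
    char buf[128];
    struct port_state next = *st;
    char *tok;
    int n;

    if (inlen >= sizeof buf)
        return -1;
    memcpy(buf, in, inlen);
    buf[inlen] = '\0';

    for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (strcmp(tok, "nonblock") == 0) {
            next.nonblock = 1;
        } else if (strcmp(tok, "block") == 0) {
            next.nonblock = 0;
        } else {
            /* Anything else must be a positive decimal baud rate. */
            char *end;
            long v = strtol(tok, &end, 10);
            if (end == tok || *end != '\0' || v <= 0)
                return -1;
            next.baud = v;
        }
    }

    n = snprintf(out, outlen, "nonblock=%d,baud=%ld", next.nonblock, next.baud);
    if (n < 0 || (size_t)n >= outlen)
        return -1;

    *st = next;
    return 0;
}
```

Because both regions carry explicit lengths and no embedded pointers, a
layer that knows nothing about serial ports could still copy, log, or
forward the request.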

Let's not take this too far, though.

		Linus



Date: 	Sat, 19 May 2001 20:26:20 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFD w/info-PATCH] device arguments from lookup, partion code
Newsgroups: fa.linux.kernel

On Sat, 19 May 2001, Richard Gooch wrote:
>
> Matthew Wilcox writes:
> > On Sat, May 19, 2001 at 10:22:55PM -0400, Richard Gooch wrote:
> > > The transaction(2) syscall can be just as easily abused as ioctl(2) in
> > > this respect.
> > 
> > But read() and write() cannot.
> 
> Sure they can. I can pass a pointer to a structure to either of them.

You're missing the point.

It's ok to do "read()/write()" on structures. In fact, people do that all
the time (and then they complain about the file not being portable ;)

The problem with ioctl is that not only are people passing ioctl's
pointers to structures, but:
 - they're not telling how big the structure is
 - the structure can have pointers to other places
 - sometimes it modifies the structure passed in

None of which are "network-nice". Basically, ioctl() is historically used
as a "pass any crap into driver xxxx, and the driver - and ONLY the driver
- will know what to do with it".

And when _only_ a driver knows what the arguments mean, upper layers can't
encapsulate them. Upper layers cannot make a packet of the argument and
send it over the network to another machine. Upper layers cannot do
sanity-checking on things like "is this argument a valid pointer". Which
means, for example, that not only can you not send the ioctl arguments
anywhere, but ioctl's have also historically been a hot-bed of bugs.

Example traditional ioctl bugs: use kernel pointers to access the argument
(because it just happens to work on x86, never mind the fact that if the
argument is bad you'll get a kernel oops and/or a serious security error).
Other example: different drivers/filesystems implementing the same ioctl,
but disagreeing on what the argument means (is it a pointer to an integer
argument, or the integer itself?).

Now, the advantage of using read()/write() is (a) that it's unambiguous
where the argument comes from and how big it is and (b) because of that
the _psychology_ is different. You don't get into this "pass random crap
around, let the kernel modify user data structures directly" mentality.

And psychology is important.

		Linus



Date: 	Sun, 20 May 2001 12:02:35 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFD w/info-PATCH] device arguments from lookup, partion code 
Newsgroups: fa.linux.kernel

On Sun, 20 May 2001, David Woodhouse wrote:
> 
> If that had been done right the first time, you wouldn't have had to either.
> For that matter, it's often the case that if the ioctl had been done right
> the first time, nobody would have had to fix it up for any architecture.

The problem with ioctl's is, let me repeat, not technology.

It's people.

ioctl's are a way to do ugly things. That's what they have ALWAYS been.
And because of that, people don't care about following the rules - if
ioctl's followed the rules, they wouldn't _be_ ioctls in the first place,
but instead have a good interface (say, read()/write()).

Basically, ioctl's will _never_ be done right, because of the way people
think about them. They are a back door. They are by design typeless and
without rules. They are, in fact, the Microsoft of UNIX.

The only way to fix ioctl's is to force people to think about them in
another way. Because if you don't, there is always going to be another
driver writer who adds his own ioctl because it's the easy way to do
whatever he wants without giving it a second of _design_ thought.

Now, a good way to force the issue may be to just remove the "ioctl"
function pointer from the file operations structure altogether. We don't
have to force people to use "read/write" - we can just make it clear that
ioctl's _have_ to be wrapped, and that the only ioctl's that are
acceptable are the ones that are well-designed enough to be wrappable. So
we'd have a "linux/fs/ioctl.c" that would do all the wrapping, and would
also be able to do all the stuff that is currently done by pretty much
every single architecture out there (ie emulation of ioctl's for different
native modes).

It would probably not be that horrible. Many ioctl's are probably not all
that much used by any real programs any more. The most common ones by far
are the tty ones - and the truly generic ones like "FIONREAD" that it
actually would make sense to generalize more.

Catching stuff like EJECT at a higher layer and turning THOSE kinds of
things into real block device operations would clean up drivers and make
them more uniform.

Would fs/ioctl.c be an ugly mess of some special cases? Yes. But would
that make the ugliness explicit and possibly easier to try to manage and
fix? Very probably. And it would mean that driver writers could not just
say "fuck design, I'm going to do this my own really ugly way". 

			Linus



Date: 	Sun, 20 May 2001 12:10:59 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFD w/info-PATCH] device arguments from lookup, partion code
Newsgroups: fa.linux.kernel

On Sun, 20 May 2001, Russell King wrote:
>
> On Sun, May 20, 2001 at 11:46:33AM -0700, Linus Torvalds wrote:
> > Nobody will expect the above to work, and everybody will agree that the
> > above is a BUG if the read() call will actually follow the pointer.
> 
> I didn't say anything about read().  I said write().  Obviously it
> wouldn't work for read()!

No, but the point is, everybody _would_ consider it a bug if a
low-level driver "write()" did anything but touch the explicit buffer.

Code like that would not pass through anybody's yuck-o-meter. People would
point fingers and say "That is not a legal write() function". Anybody who
tried to make write() follow pointers would be laughed at as a stupid git.

Anybody who makes "ioctl()" do the same is just following years of
standard practice, and the yuck-o-meter doesn't even register.

THAT is the importance of psychology.

Technology is meaningless. What matters is how people _think_ of it.

		Linus



Date: 	Sun, 20 May 2001 12:27:40 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFD w/info-PATCH] device arguments from lookup, partion code 
Newsgroups: fa.linux.kernel

On Sun, 20 May 2001, Alexander Viro wrote:
> 
> Pheeew... Could you spell "about megabyte of stuff in ioctl.c"?

I agree. But it would certainly force people to think about this. And it
may turn out that a lot of it can be streamlined, and not that much ends
up being used very much.

It would also allow a single place of catching the generic ones, and as
such be a place to try to make things like the network ioctl's more
regular: setting things like network device duplex with _real_ interfaces
instead of hiding it in ioctl routines.

Also, note that many ioctl's actually do have fairly regular meaning, and
that it _is_ possible to catch a number of them with those regular
things:

	switch (_IOC_TYPE(number)) {
	case 'x':
		xfs_ioctl(..);

and actually try to enforce the things that Documentation/ioctl-number.txt
tries to document. And make the clashes _explicit_ and thus make people
have more incentive to really try to fix it.
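
A rough user-space sketch of what "making the clashes explicit" could look
like: a single table keyed by the type character, where a second claim on
the same character is refused rather than silently shadowed. The names
(`register_ioc_type`, `dispatch_ioc`, the error codes) are hypothetical,
and the type is assumed to sit in bits 8..15 of the command number, as in
the common _IOC() layout.

```c
#include <stddef.h>

/* Hypothetical per-subsystem handler: gets the command number and argument. */
typedef int (*ioc_handler)(unsigned nr, void *arg);

enum {
    IOC_OK       =  0,
    IOC_EINVAL   = -1,  /* bad registration */
    IOC_ECLASH   = -2,  /* type character already claimed */
    IOC_ENOTTY   = -3   /* nobody handles this type */
};

/* One slot per possible type character. */
static ioc_handler type_table[256];

/* Claim a type character; a second claim is reported, not silently accepted. */
static int register_ioc_type(unsigned char type, ioc_handler h)
{
    if (h == NULL)
        return IOC_EINVAL;
    if (type_table[type] != NULL)
        return IOC_ECLASH;
    type_table[type] = h;
    return IOC_OK;
}

/* Central dispatch: pick the handler from the type field of the command. */
static int dispatch_ioc(unsigned cmd, void *arg)
{
    unsigned char type = (unsigned char)((cmd >> 8) & 0xffu);
    unsigned nr = cmd & 0xffu;
    ioc_handler h = type_table[type];

    if (h == NULL)
        return IOC_ENOTTY;
    return h(nr, arg);
}

/* Demo handler for type 'x': just echoes the command number. */
static int demo_x_handler(unsigned nr, void *arg)
{
    (void)arg;
    return (int)nr;
}
```

With this shape, two drivers that both pick 'x' collide at registration
time, in one visible place, instead of each quietly interpreting the same
numbers its own way.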

> How about moratorium on new ioctls in the meanwhile? Whatever we do in
> fs/ioctl.c, it _will_ take time.

Ehh.. Telling people "don't do that" simply doesn't work. Not if they can
do it easily anyway. Things really don't get fixed unless people have a
certain pain-level to induce it to get fixed.

		Linus



Date: 	Sun, 20 May 2001 12:34:02 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFD w/info-PATCH] device arguments from lookup, partion code
Newsgroups: fa.linux.kernel

On Sun, 20 May 2001, Alexander Viro wrote:
> 
> On Sun, 20 May 2001, Matthew Wilcox wrote:
> 
> > On Sun, May 20, 2001 at 03:11:53PM -0400, Alexander Viro wrote:
> > > Pheeew... Could you spell "about megabyte of stuff in ioctl.c"?
> > 
> > No.
> > 
> > $ ls -l arch/*/kernel/ioctl32*.c
> > -rw-r--r--    1 willy    willy       22479 Jan 24 16:59 arch/mips64/kernel/ioctl32.c
> > -rw-r--r--    1 willy    willy      109475 May 18 16:39 arch/parisc/kernel/ioctl32.c
> > -rw-r--r--    1 willy    willy      117605 Feb  1 20:35 arch/sparc64/kernel/ioctl32.c
> > 
> > only about 100k.
> 
> You are missing all x86-only drivers.

Now, the point is that it _is_ doable, and by doing it in one standard
place (instead of letting each architecture fight it on its own) we'd
expose the problem better, and maybe get rid of some of those
architecture-specific ones.

For example, right now the fact that part of the work _has_ been done by
things like Sparc64 has not actually had any advantages: the sparc64 work
has not allowed people to say "let's try to merge this work", because it
has not been globally relevant, and a sparc64-only file has not been a
single point of contact that could be used to clean up things.

In contrast, a generic file has the possibility of creating new VFS or
device-level interfaces. You can catch block device ioctl's and turn them
into proper block device requests - and send them down the right request
queue. Suddenly a block device driver doesn't just get READ/WRITE
requests, it gets EJECT/SERIALIZE requests too. Without having to add
magic ioctl's that are specific to just one device driver. 

So by having a common point of access, you can actually encourage _fixing_
some of the problems. Historically, sparc64 etc have not been able to do
that - they can only try to convert different ioctl's into another format
and then resubmit them.

		Linus



Date: 	Sun, 20 May 2001 12:38:17 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFD w/info-PATCH] device arguments from lookup, partion code 
Newsgroups: fa.linux.kernel

Davem, check the last thing, please.

On Sun, 20 May 2001, Alexander Viro wrote:
> 
> On Sun, 20 May 2001, Linus Torvalds wrote:
> 
> > > How about moratorium on new ioctls in the meanwhile? Whatever we do in
> > > fs/ioctl.c, it _will_ take time.
> > 
> > Ehh.. Telling people "don't do that" simply doesn't work. Not if they can
> > do it easily anyway. Things really don't get fixed unless people have a
> > certain pain-level to induce it to get fixed.
> 
> Umm... How about the following:  you hit delete on patches that introduce
> new ioctls, I help to provide required level of pain.  Deal?

It still doesn't work.

That only makes people complain about my fascist tendencies. See the
thread about device numbers, where Alan just says "ok, I'll do it without
Linus then". 

The whole point of open source is that I don't have that kind of power. I
can only guide, but the most powerful guide is by guiding the _design_,
not micro-managing.

> BTW, -pre4 got new bunch of ioctls. On procfs, no less.

I know. David has zero taste. 

Davem, why didn't you just make new entries in /proc/bus/pci and let
people do "mmap(/proc/bus/pci/xxxx/mem)" instead of having idiotic ioctl's
to set "this is a IO handle" and "this is a MEM handle"? This particular
braindamage is not too late to fix..

		Linus



Date: 	Sun, 20 May 2001 20:12:04 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFD w/info-PATCH] device arguments from lookup, partion code
Newsgroups: fa.linux.kernel

On Mon, 21 May 2001, Ingo Molnar wrote:
> 
> On Sun, 20 May 2001, Alexander Viro wrote:
> 
> > Linus, as much as I'd like to agree with you, you are hopeless
> > optimist. 90% of drivers contain code written by stupid gits.
> 
> 90% of drivers contain code written by people who do driver development in
> their spare time, with limited resources, most of the time serving as a
> learning exercise. And they do this freely and for fun. Accusing them of
> being 'stupid gits' is just mischaracterising the situation.

I would disagree with both of you.

The problem is not whether people do it with limited resources or time, or
whether they are stupid or not.

The problem is that if you expect to get nice code, you have to have nice
interfaces and infrastructure. And ioctl's aren't it.

The reason we _can_ write beautiful filesystems these days is that the VFS
layer _supports_ it. In fact, the VFS layer has tons of infrastructure and
structure that makes it _hard_ to write bad filesystem code (which is not
to say that we don't have ugly code there - but much of it is due to
historically not having had quite the same level of infrastructure).

If we had nice infrastructure to make ioctl's more palatable, we could
probably make do even with the current binary-number interfaces, simply
because people would use the infrastructure without ever even _seeing_ how
lacking the user-level accesses are.

But that absolutely _requires_ that the driver writers should never see
the silly "pass a random number and a random argument type" kind of
interface with no structure or infrastructure in place.

Because right now even _good_ programmers make a mess of the fact that
they get passed a bad interface.

Think of it this way: the user interface to opening a file is
"open()" with pathnames and magic flags. But a filesystem never even
_sees_ that interface, it sees a very nicely structured setup where all
the argument parsing and locking has already been done for it, and the
magic flags don't even exist any more as far as the low-level FS is
concerned. Which is why filesystems _can_ be clean.

In contrast, ioctl's are passed through directly, with no help to make
them clean. 

		Linus



Date: 	Mon, 21 May 2001 15:10:32 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFD w/info-PATCH] device arguments from lookup, partion code
Newsgroups: fa.linux.kernel

On Mon, 21 May 2001, Alan Cox wrote:
>
> > Sure. But we have to do two syscalls only if ioctl has both in- and out-
> > arguments that way. Moreover, we are talking about non-trivial in- arguments.
> > How many of these are in hotspots?
>
> There is also a second question. How do you ensure the read is for the right
> data when you are sharing a file handle with another thread..
>
> ioctl() has the nice property that an in/out ioctl is implicitly synchronized

I don't think we can generically replace ioctl's with read-write, and we
shouldn't bend over backwards even _trying_.

The important thing would be to give them more structure, and as far as
I'm personally concerned I don't even care if the user-level access method
ends up being the same old thing. After all, we have magic numbers
everywhere: even a system call uses magic numbers for the syscall entry
numbering. The thing that makes system call numbers nice is that the
number gets turned into a more structured thing with proper type checking
and well-defined semantics very very early on indeed.

It shouldn't be impossible to do the same thing to ioctl numbers. Nastier,
yes. No question about it. But we don't necessarily have to redesign the
whole approach - we only want to re-design the internal kernel interfaces.

That, in turn, might be as simple as changing the ioctl incoming arguments
of <cmd,arg> into a structure like <type,cmd,inbuf,inlen,outbuf,outlen>.
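
As a sketch only, the <type,cmd,inbuf,inlen,outbuf,outlen> idea might look
like this in plain C. The names (`struct ioc_request`, `struct ioc_op`,
`ioc_submit`, the demo command) are invented for illustration and are not
kernel interfaces. The point is that a generic layer can check buffer
sizes and presence before any driver code runs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structured request: explicit in and out regions with lengths. */
struct ioc_request {
    unsigned char type;
    unsigned char cmd;
    const void   *inbuf;
    size_t        inlen;
    void         *outbuf;
    size_t        outlen;
};

/* What an operation declares up front, so the generic layer can check it. */
struct ioc_op {
    unsigned char type;
    unsigned char cmd;
    size_t        inlen;
    size_t        outlen;
    int         (*fn)(const struct ioc_request *rq);
};

/* Demo operation: report a placeholder "bytes pending" count as an int. */
static int demo_get_pending(const struct ioc_request *rq)
{
    int pending = 42;  /* placeholder value for the sketch */
    memcpy(rq->outbuf, &pending, sizeof pending);
    return 0;
}

static const struct ioc_op ioc_ops[] = {
    { 'd', 1, 0, sizeof(int), demo_get_pending },
};

/*
 * Generic entry point: find the operation, validate sizes and pointers,
 * and only then call the handler.
 * Returns the handler's result, -1 on a malformed request, -2 if unknown.
 */
static int ioc_submit(const struct ioc_request *rq)
{
    size_t i;

    for (i = 0; i < sizeof ioc_ops / sizeof ioc_ops[0]; i++) {
        const struct ioc_op *op = &ioc_ops[i];

        if (op->type != rq->type || op->cmd != rq->cmd)
            continue;
        if (rq->inlen != op->inlen || rq->outlen != op->outlen)
            return -1;
        if ((op->inlen != 0 && rq->inbuf == NULL) ||
            (op->outlen != 0 && rq->outbuf == NULL))
            return -1;
        return op->fn(rq);
    }
    return -2;
}
```

The handler never has to guess how large its argument is or whether it is
a pointer at all; that checking happens once, in the common code.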

		Linus


Newsgroups: fa.linux.kernel
From: Alexander Viro <viro@math.psu.edu>
Subject: Re: [Evms-devel] Re: [PATCH] EVMS core 3/4: evms_ioctl.h
Original-Message-ID: <Pine.GSO.4.21.0210071340580.29030-100000@weyl.math.psu.edu>
Date: Mon, 7 Oct 2002 17:51:17 GMT
Message-ID: <fa.mgr5riv.k4gqbi@ifi.uio.no>

On Mon, 7 Oct 2002, Christoph Hellwig wrote:

> I don't think that basing kernel internal interfaces on ioctl is
> a smart idea.  Just add another function pointer to your operations
> vector for every operation you want supported on volumes.

s/every/& generic/.  Other than that, seconded.  BTW, one of the pending
changes is taking the last more or less generic ioctl (HDIO_GETGEO) into
a separate method...

->ioctl() is for driver-specific crud; stuff that won't be used by
any generic application.  "Make a cuckoo jump out of drive and sing
'1000 bottles of beer'" is a valid ioctl.  "Get drive size" isn't.



Newsgroups: fa.linux.kernel
From: viro@parcelfarce.linux.theplanet.co.uk
Subject: Re: f_ops flag to speed up compatible ioctls in linux kernel
Original-Message-ID: <20040901073218.GQ16297@parcelfarce.linux.theplanet.co.uk>
Date: Wed, 1 Sep 2004 07:40:26 GMT
Message-ID: <fa.nadrbqg.1vh4ob0@ifi.uio.no>

On Wed, Sep 01, 2004 at 10:22:45AM +0300, Michael S. Tsirkin wrote:
> Hello!
> Currently, on the x86_64 architecture, it's quite tricky to make
> a char device ioctl work for x86 executables.
> In particular,
>    1. there is a requirement that ioctl number is unique -
>       which is hard to guarantee especially for out of kernel modules

Too bad.

>    2. there's a huge performance overhead for each compat call - there's
>       a hash lookup in a global hash inside a lock_kernel -
>       and I think compat performance *is* important.
>
> Further, adding a command to the ioctl suddenly requires changing
> two places - registration code and ioctl itself.

So don't add them.  Adding a new ioctl is *NOT* a step to be taken lightly -
we used to be far too accepting in that area and as somebody who'd waded
through the resulting dungpiles over the last months I can tell you that
the result is utterly revolting.

Excuse me, but I have zero sympathy for people who complain about obstacles
to dumping more into the same piles - it should be hard.


Newsgroups: fa.linux.kernel
From: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Subject: Re: Problem with ioctl command TCGETS
Original-Message-ID: <20041128003901.GS26051@parcelfarce.linux.theplanet.co.uk>
Date: Sun, 28 Nov 2004 19:43:16 GMT
Message-ID: <fa.n9c5cas.1oh2or0@ifi.uio.no>

On Sun, Nov 28, 2004 at 02:22:51AM +0200, Ozan Eren Bilgen wrote:
> 1. Is it nice to break _IO macros?

There is nothing nice about ioctls.

> 2. If it has a historical reason, should I stop trusting the
> information that I decoded using _IO* macros?

You should.

> 3. Is there a list of such amazing commands?

There isn't.


Newsgroups: fa.linux.kernel
From: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Subject: Re: Problem with ioctl command TCGETS
Original-Message-ID: <20041128121800.GZ26051@parcelfarce.linux.theplanet.co.uk>
Date: Sun, 28 Nov 2004 19:39:30 GMT
Message-ID: <fa.ndc7bqr.1sh4ob1@ifi.uio.no>

On Sun, Nov 28, 2004 at 12:22:03PM +0100, Miklos Szeredi wrote:
> > The set-get is supposed to be used for queries, too? The size of value is
> > only used for the get case to describe the buffer length in that case?
> > because otherwise the set-get case may require a short value in and a large
> > answer structure out.
>
> You misunderstand the motivation.  This is to get/set small compact
> parameters, not huge structures or big data.  Think get/setsockopt().

Think read(2)/write(2).  We already have several barfbags too many,
and that includes both ioctl() and setsockopt().  We are stuck with
them for compatibility reasons, but why the hell would we need yet
another one?


Newsgroups: fa.linux.kernel
From: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Subject: Re: Problem with ioctl command TCGETS
Original-Message-ID: <20041128131159.GC26051@parcelfarce.linux.theplanet.co.uk>
Date: Sun, 28 Nov 2004 19:38:45 GMT
Message-ID: <fa.n4crbqk.1ihkob8@ifi.uio.no>

On Sun, Nov 28, 2004 at 02:07:04PM +0100, Tomas Carnecky wrote:
> >Think read(2)/write(2).  We already have several barfbags too many,
> >and that includes both ioctl() and setsockopt().  We are stuck with
> >them for compatibility reasons, but why the hell would we need yet
> >another one?
>
> And what's the option? So without ioctl, how would you replace this:
> ioctl(cdrom_fd, CDROMEJECT, 0)?

Which part of "we are stuck with them" is not clear enough?  If you insist
on using the same descriptor for data and for out-of-band mess - no, you
can't get anything saner.  If you do not, you can; it's that simple...


Newsgroups: fa.linux.kernel
From: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Subject: Re: Problem with ioctl command TCGETS
Original-Message-ID: <20041128140552.GD26051@parcelfarce.linux.theplanet.co.uk>
Date: Sun, 28 Nov 2004 19:38:00 GMT
Message-ID: <fa.n4sfbio.1h1co3c@ifi.uio.no>

On Sun, Nov 28, 2004 at 02:20:19PM +0100, Tomas Carnecky wrote:
> But then you'd have to open another file :(

Correct, but not necessary on sysfs.

> And what about something like:
> cdrom_fd = open("/dev/cdrom", O_RDWR)
> cdrom_param_fd = get_param_fd(cdrom_fd) /* a new syscall */
> Now read/write to this param fd.
> And two new entries in the struct file_operations:
> write_param([same args as write])
> read_param([same args as read])

That assumes that there is any sort of uniform semantics for these
operations.  There isn't.  Moreover, you are insisting on pushing
all of them into the same channel; not a good idea since the set
of things done with ioctls tends to consist of several unrelated
classes, often coming from a bunch of unrelated subsystems.

There is no mechanical replacement for ioctl(); the nature of its
problems is that we have a random mix of unrelated operations bumped
into one pile.

Take a look at e.g. networking ioctls.  Most of them openly ignore the
descriptor used to issue an ioctl - more often than not the first thing
they do is to peek into the passed data structure and go looking for
the real object we are going to operate upon; e.g. find an interface by
name.  Of course it's bogus; any sane modification of that API would
have the object selected by the opened file we are passing to it.

And no, we have no chance in hell to rewrite all userland code that
uses these suckers, so we are stuck with them for all foreseeable future.
UCB folks had no taste, film at 11...

For more or less common (read: implemented by more than a couple of drivers)
ioctls we have to keep them anyway; for the stuff where we really stand
a chance of doing some kind of changes (including the new operations) we
can bloody well do splitup by files that would match the nature of operations.
Which leaves the "get me secondary channel by fd" kind of operations
without any uses.


Newsgroups: fa.linux.kernel
From: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Subject: Re: Problem with ioctl command TCGETS
Original-Message-ID: <20041128130319.GB26051@parcelfarce.linux.theplanet.co.uk>
Date: Sun, 28 Nov 2004 19:38:47 GMT
Message-ID: <fa.n1srbim.1g1ko3a@ifi.uio.no>

On Sun, Nov 28, 2004 at 01:52:41PM +0100, Miklos Szeredi wrote:
>
> > > > Think read(2)/write(2).  We already have several barfbags too many,
> > > > and that includes both ioctl() and setsockopt().  We are stuck with
> > > > them for compatibility reasons, but why the hell would we need yet
> > > > another one?
> > >
> > > You can't replace either ioctl() or setsockopt() with read/write can
> > > you?  Both of them set out-of-band information on file descriptors.
> >
> > Out-of-band == should be on a separate channel...
>
> Tell me how?  E.g. how would you set/get sound stream parameters if
> not with ioctl()?

Have several related files.


Newsgroups: fa.linux.kernel
From: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Subject: Re: Problem with ioctl command TCGETS
Original-Message-ID: <20041128152756.GL26051@parcelfarce.linux.theplanet.co.uk>
Date: Sun, 28 Nov 2004 19:37:09 GMT
Message-ID: <fa.n8spc2q.1l16oje@ifi.uio.no>

On Sun, Nov 28, 2004 at 03:30:19PM +0100, Tomas Carnecky wrote:
> You mean.. like nvidia?
> /dev/nvidiactl
> /dev/nvidia0
> /dev/nvidia1
> /dev/nvidia2
> and do read/write on /dev/nvidiactl (instead on ioctl)?

Really depends on situation - in some cases that's the obvious clean
variant, in some you might want something more specific.  Usually
it helps to ask "what object am I working with?" and see if it gives
a reasonable picture.  Note, BTW, that your example (eject) actually
demonstrates what kind of ugliness can be created by piling everything
together - the logic around "it's currently used, do not eject and
return -EBUSY" is broken and unfixable in all cdrom drivers.  Broken
exactly because we need to open device itself to issue eject request.
Think what happens if we get
	fd = open("/dev/cdrom", 0);
	if (fork()) {
		read a lot from that sucker
	} else {
		sleep for a while
		ioctl(fd, CDROMEJECT, 0);
	}
From the driver point of view, we have only one opener.  There's no way
to tell how many processes might have file descriptors that point to
what we'd opened back then.  So we either need to keep track of all
changes in descriptor tables and provide exclusion between that and
ioctls (have fun) or admit that driver might be hit with eject in the
middle of IO, all logic along the lines of "it's opened by somebody,
no eject for us" notwithstanding.
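
The "only one opener" point can be observed from user space. The sketch
below is an illustration, not driver code, and `child_change_visible_to_parent`
is a made-up helper name. It uses a pipe instead of a CD-ROM: after
fork(), parent and child hold descriptors for the same open file, so a
change the child makes through its copy (here the O_NONBLOCK status flag)
shows up in the parent's copy as well.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a status-flag change made by the child is visible through
 * the parent's descriptor, 0 if it is not, and -1 on any system error.
 */
static int child_change_visible_to_parent(void)
{
    int p[2];
    int status;
    int result = -1;
    pid_t pid;

    if (pipe(p) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: set O_NONBLOCK on the inherited read end, then exit. */
        int fl = fcntl(p[0], F_GETFL);
        _exit(fl < 0 || fcntl(p[0], F_SETFL, fl | O_NONBLOCK) < 0);
    }

    /* Parent: wait for the child, then look at its own descriptor. */
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        int fl = fcntl(p[0], F_GETFL);
        if (fl >= 0)
            result = (fl & O_NONBLOCK) != 0;
    }

    close(p[0]);
    close(p[1]);
    return result;
}
```

The same sharing is why the driver in the eject example cannot tell, from
a single open, how many processes are able to issue requests on it.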

And then there are horrors like cciss special-casing the open of 1st disk
on a controller (even if there's none) so that we could talk to controller
itself (in particular, tell it to go look for disks that might be attached
to it now).  It gets very ugly; same for RAID array creation, same for
loop device setup and races around it, etc.
