From: Theodore Tso <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: What still uses the block layer?
Date: Mon, 15 Oct 2007 01:46:35 UTC
Message-ID: <fa.NS0QnAKdOzp2XXKE9rI90QuPGms@ifi.uio.no>

On Sun, Oct 14, 2007 at 06:45:44PM -0500, Rob Landley wrote:
> I admit a certain amount of personal annoyance that once the SCSI
> layer consumes a category of device (USB, SATA, PATA), they can
> often _only_ be used by going through the SCSI midlayer.  (This
> strikes me as analogous to TCP/IP claiming ethernet and PPP devices
> so thoroughly that you can no longer address them as eth1 or
> /dev/ttyS0.)

That's because modern USB, ATAPI (what was once known as IDE), and SATA
are really *all* using the SCSI command protocols at the low level, just
as Ethernet and PPP interfaces really are fundamentally the same
thing.  You can rail against it, but that's the mark of someone who
refuses to accept reality.

> This has the annoying effect of bundling together different types of
> devices and making device enumeration unnecessarily difficult: my
> laptop only has one SATA hard drive and can't gain another without a
> soldering iron, but that drive could move from /dev/sda to /dev/sdb
> if I reboot the system with a USB key plugged in.  This seems like a
> regrettable loss of orthogonality to me.  I remember back when
> /dev/usb0 and /dev/hda were separate devices that showed up in /dev,
> but these days "it's SCSI" seems to trump "it's USB", "it's ATA", or
> "it's SATA".  (Even though none of those are actually SCSI hardware,
> they just send a similar packet protocol across the wire.)

You're showing your ignorance here.  In fact, in the past few years
ATA and SCSI have been converging significantly, with the ATAPI
specification essentially incorporating the SCSI protocol by
reference and by value --- to the point that SAS was developed by
the SCSI Trade Association, and SAS is effectively a superset of SATA,
so that with care, you can actually mix SAS and SATA drives
in the same enclosure (SAS and SATA are physically compatible at
the connector level).

More to the point, with SATA, hot plugging has been designed in, so
probing order is not going to be well defined, just as with USB
devices.  And there are already relatively common situations where the
same disk can show up via multiple different interfaces.

For example, if you have a modern Thinkpad with a secondary SATA hard
drive in an Ultrabay, and you plug it into the Ultrabay in your T60,
it will show up as a SATA drive.  However, if you plug it into the
Advanced dock, it shows up as a USB device.  And not only can you
encapsulate a SCSI command stream over USB; with iSCSI you can do so
over IP as well.  In any case, regardless of how the physical SATA
drive is attached to the system, you want it to show up as the same
device and be mounted in the same location.

That's why identifying filesystems by UUIDs or labels is so critical.
This is not a new concept; we've had the capability to do this for
over a decade, and I always knew it would be necessary for us to do
this sooner or later --- which is why I added the UUID support to ext2
back in 1996.
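
A minimal sketch of what lookup by UUID means in practice, assuming a
udev-managed system that maintains symlinks under /dev/disk/by-uuid.
The function name `resolve_by_uuid` and the UUID in the usage line are
hypothetical, chosen only for illustration:

```python
import os


def resolve_by_uuid(uuid, by_uuid_dir="/dev/disk/by-uuid"):
    """Return the device node that a filesystem UUID currently maps to.

    Returns None if no symlink for that UUID exists.
    """
    link = os.path.join(by_uuid_dir, uuid)
    if not os.path.islink(link):
        return None
    # The symlink target is whatever name the kernel handed out this boot
    # (for example /dev/sda1 or /dev/sdb1); the UUID itself stays the same.
    return os.path.realpath(link)


if __name__ == "__main__":
    # Hypothetical UUID, for illustration only.
    print(resolve_by_uuid("00000000-0000-0000-0000-000000000000"))
```

The point of the indirection is that the caller never names /dev/sda or
/dev/sdb directly, so a change in probing order does not matter.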

> The fact that udev can theoretically unwind this hairball is not an
> excuse for conflating different categories of devices in the first
> place.

See the Thinkpad Ultrabay drive example above.  You address hosts by
IP address; it doesn't matter whether you access them via a PPP
interface, or a wireless interface, or an Ethernet interface.
Similarly, a disk could in theory be accessible over USB, SATA, or
iSCSI, and the Thinkpad example is only one such case where the same
filesystem might be accessible over multiple interfaces.  And with
multipath Fibre Channel SANs (and I hate to break it to you, but FC
also uses SCSI protocols) storage is very much looking more and more
like networking.

						- Ted


From: Theodore Tso <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: What still uses the block layer?
Date: Mon, 15 Oct 2007 13:22:22 UTC
Message-ID: <fa.ayP5jOfbDjibiZ/uer3+d/34avw@ifi.uio.no>

On Mon, Oct 15, 2007 at 03:04:00AM -0500, Rob Landley wrote:
> Ok, I'll bite.  If it's all "real" scsi, why does ioctl(SG_EMULATED_HOST)
> exist?

SG_EMULATED_HOST was added before Linux 2.4, at least six or seven
years ago.  Back then the migration of ATA devices through the various
versions of the ATAPI specification and then into SATA was very early
in its evolution, and back then, yes, there were people who considered
anything that didn't use the honking huge parallel SCSI cables not
"real" SCSI.  Over time, that distinction at both the physical
connector level and logical level has declined to the point of almost
non-existence.  It's not quite at the point where SAS exists only to
justify massive price differences between commodity and "data-center
grade" disks to the benefit of hard drive manufacturers, but it's
darned close.  (There are differences such as voltage levels so that
the max cable lengths for SAS are larger, etc., but those could
have been optional additions to the SATA spec, and SAS drives are
allegedly manufactured to be more robust --- although some
recent papers published at FAST have raised some interesting questions
about how true those marketing claims really are in practice.)

> > just
> > as Ethernet and PPP interfaces really are fundamentally the same
> > thing.
>
> They're the same thing?
>
> Do you mean that on a system with both, going:
>   ifconfig eth1 66.92.53.140
>   ifconfig ppp 192.168.0.42
>
> Would be functionally equivalent to:
>   ifconfig eth1 192.168.0.42
>   ifconfig ppp 66.92.53.140

No, of course not.  But we don't have separate IP stacks for Ethernet
and PPP devices.  And how we connect to a host via ssh doesn't depend
on whether we reach it via Ethernet or PPP.  And I would argue that
how we address a filesystem should likewise not depend on the path to
the hard drive.

> By the way, ethernet cards contain a unique MAC address.  Hard
> drives do not seem to, or if they do it's not being consistently
> exposed in a way I can find.

You can pull a Model and Serial number via hdparm -i, but it's not as
easy to manipulate as a fixed-length MAC address.  That's why people
tend to use filesystem UUID's.
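
As a rough sketch of pulling those two fields out of `hdparm -i` text:
this assumes the identification line has the
`Model=..., FwRev=..., SerialNo=...` layout, and the sample text below
is made up rather than captured from a real drive.  The helper name
`model_and_serial` is hypothetical.

```python
import re

# Assumed layout of the identification line printed by `hdparm -i`.
_IDENT_RE = re.compile(
    r"Model=(?P<model>[^,]+),\s*FwRev=(?P<fwrev>[^,]+),\s*SerialNo=(?P<serial>.+)"
)


def model_and_serial(hdparm_output):
    """Extract (model, serial) from `hdparm -i` text, or None if absent."""
    for line in hdparm_output.splitlines():
        match = _IDENT_RE.search(line)
        if match:
            return match.group("model").strip(), match.group("serial").strip()
    return None


# Made-up sample text; a real drive reports its own values.
SAMPLE = """
/dev/sda:

 Model=EXAMPLE DISK 500, FwRev=1.00, SerialNo=ABC12345
"""

if __name__ == "__main__":
    print(model_and_serial(SAMPLE))
```

Both fields are free-form, variable-length strings chosen by the
manufacturer, which is what makes them clumsier to handle than a
six-byte MAC address.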

> > More to the point, with SATA, hot plugging has been designed in, so
> > probing order is not going to be well defined,
>
> The spec may define the capability to hotplug, but your average
> laptop does not offer the capability to hotplug anything into its
> SATA controllers.  The hard drive is screwed in (due to the
> portability part of laptopness), all the controllers wired onto the
> motherboard are accounted for, none are exposed externally.  What
> _is_ exposed externally is USB, and if you want to add an extra hard
> drive you can buy a cheap USB one at Fry's.

That may be true for laptops today, but Linux doesn't run just on
laptops.  You can easily get home servers with hot-swap SATA bays.  My
home fileserver, which is a white box I purchased on my own nickel,
NOT IBM big iron, has 3TB of raw storage and cost less than $10,000
a year ago.  Today, that amount of home storage with hot-swap SATA
drives and a battery-backed hardware RAID controller could probably be
purchased for about half that price.

And even for laptops, if you need the performance, you can get Cardbus
cards at Fry's that will allow you to connect eSATA drives to your
laptop.

So even if you ignore "big data center" interconnects like FC, this
problem exists even for commodity grade SATA devices.

I agree that at the moment we have an issue: if the root device name
isn't guaranteed, it forces people to use initrds, and the quality and
debuggability of initrds across distros is highly variable and not
standardized.  In practice, though, /dev/sda is actually pretty
stable on laptops, especially if you end up compiling ehci and uhci
support as modules (which is a good idea from a power savings point of
view anyway).  The reason why Ubuntu and other distributions are using
UUID-based labels is not just because of the root device, but also for
all of the other disks that might be mounted on the system, including
some that might be using USB devices that don't have stable /dev
names.
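
UUID- and label-based mounting of this kind is usually expressed in
/etc/fstab, where the first field may be `UUID=...` or `LABEL=...`
instead of a device path.  A small sketch of classifying such entries
follows; the function name `parse_fstab_line` is hypothetical, and it
deliberately ignores quoting and the remaining fstab fields:

```python
def parse_fstab_line(line):
    """Return (kind, value, mount_point) for one fstab entry, or None.

    kind is "uuid", "label", or "path".
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank line or comment
    fields = stripped.split()
    if len(fields) < 2:
        return None  # malformed entry
    spec, mount_point = fields[0], fields[1]
    for prefix, kind in (("UUID=", "uuid"), ("LABEL=", "label")):
        if spec.startswith(prefix):
            return kind, spec[len(prefix):], mount_point
    # Anything else is treated as a plain device path such as /dev/sda1,
    # which is the form that breaks when probing order changes.
    return "path", spec, mount_point
```

Only the "path" entries depend on the kernel's naming; the "uuid" and
"label" entries keep working when a USB key shifts the drive from
/dev/sda to /dev/sdb.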

> It's necessary for IBM big iron to do this.  It's generally not
> necessary for laptops or embedded systems to do this if they
> distinguish between _types_ of devices, which is something they
> until recently did for the types of devices I was interested in, and
> something they _stopped_ doing when everything got merged into the
> scsi layer, and I consider this a regression.

As another example, it's easy to see a home media server running Linux
which doesn't have any expansion bays for additional hard drives --- so
the only way a user could expand their storage is by using one or more
permanently connected USB disks.  So we do need to solve the device
enumeration problem in the general case; it's not just the case of IBM
"big iron" as you seem to think.

> No, distinguishing between types of devices is not a perfect
> solution to device enumeration, but it was sufficient for all my use
> cases for many years, and would still be if the kernel still did it,
> and I'm not alone here.

News flash!  The kernel wasn't built just for you, and over time, more
and more people will have multiple disk drives of the same type, so we
will need to solve the device naming problem sooner or later.  Why not
solve it sooner, especially given that a number of the companies (not
just IBM) that are funding the organization that is paying *your*
salary are interested in solving the general case?

Furthermore, I've already pointed out a number of situations where the
home user might have multiple USB devices on their system today, and
that number is probably going to go up over time, not down.  Have you
seen how cheap 500GB USB disks are at Costco?  And for a typical
unsophisticated user, plugging in another 500GB USB disk when they need
more storage is a lot easier than cracking open the computer case and
futzing with screws and disk cables and power connectors.

						- Ted


From: Theodore Tso <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: What still uses the block layer?
Date: Mon, 15 Oct 2007 13:37:12 UTC
Message-ID: <fa.J3nPZA08Oxnfk779elTWapc4O+s@ifi.uio.no>

On Mon, Oct 15, 2007 at 02:29:45PM +0100, Alan Cox wrote:
> > You can pull a Model and Serial number via hdparm -i, but it's not as
> > easy to manipulate as a fixed-length MAC address.  That's why people
> > tend to use filesystem UUID's.
>
> ATA8 at the moment looks set to add a true "MAC" or "WWN" type identifier
> to each device.. Right now model/serial is not always unique.

True, but most manufacturers try to make the serial number unique for
their own reasons (like warranty service), and you can have
manufacturing errors with MAC assignment just as easily as you can
with serial numbers.

I still remember when SGI shipped MIT 20 SGI Indy pizza boxes that all
had the same MAC address (that we knew about --- we found out
because all 20 were installed on the same subnet).  That was a mildly
entertaining bug to track down....  especially since IIRC, Irix at the
time didn't print warning messages when someone else with a different
IP address responded to your MAC address.

						- Ted
