From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [alsa-devel] HG -> GIT migration
Date: Wed, 21 May 2008 17:44:22 UTC
Message-ID: <fa.YSW3trnhtE9L5B4Wjz8ewJWyVoU@ifi.uio.no>

On Wed, 21 May 2008, Takashi Iwai wrote:
>
> Well, what I meant is about the fixes to the subsystem (say, ALSA) by
> people in the outside.  Not every ALSA-bugfix patch goes into the
> upstream from ALSA tree.  You, Andrew and others pick individually
> ALSA-fix patches.  They will be missing in the ALSA subsystem tree.

Well, that's actually fairly rare, but when it happens, either:

 - if you didn't get the fix (ie you're just seeing random patches go
   in that happen to touch alsa), why should you then merge the WHOLE TREE
   with all my experimental stuff anyway? You can largely ignore it,
   knowing it's fixed, and when you ask me to pull, we'll have a good
   end result.

 - if you got the same fix as a patch, just apply it to your tree (ie just
   ignore what happens upstream). This happens all the time - people
   duplicate patches simply because two people apply it.

But the real issue here is that my tree sometimes gets ten THOUSAND
commits during the merge window. Do you really want to pull those
thousands of commits into your tree just for one or two possible ALSA
fixes?

In _my_ tree, at least the people involved with asking me to pull end up
also having (a) people test it and (b) aware that it's in my tree, so they
work on trying to fix it. But if ALSA just merges at random times, neither
of those two things is true. Nobody will know about or test some random
state that ALSA merged into its own tree.

Ask yourself (and ignore the ALSA parts - think of some totally
*different* development area) which you think is better

 - developing in one area based on a stable base, with the people who do
   development in that area knowing about that area.

 - or develop on top of a churning sea of thousands of changes to other
   sub-areas that you don't know anything about?

In other words, the reason I ask people to not do lots of merges is more
than just "it looks confusing". It's literally a matter of "it's bad
development practice". It causes problems. The confusing history is
actually *real* - it's not just a "visual artifact" of looking at the
result in gitk. The confusing history is a real phenomenon, and implies
that people are doing development not based on some tested base.

> And, what if that you need a fix for the fix that isn't in ALSA
> tree...?  IMO, either a rebase or a merge is better than
> cherry-picks.

First off, I don't see why you even need cherry-picks in the first place.
I think your argument is bogus, and you're making it because you want to
get the end result, not because the argument is valid on its own.

Here, let's see what I committed to the sound subsystem since 2.6.24
(ignoring merges):

	git log --no-merges v2.6.24.. --committer=torvalds sound/

and look over that list. Remember: this is not some short timeframe, this
is over TWO whole merge windows, ie this is way more commits than we would
normally _ever_ get out of sync over.

Realistically, which of those commits aren't either (a) already from you,
sent to me just as a way to get a quick fix into my tree without merging
the whole thing, or (b) stuff that can just be in my tree and doesn't
have to be in the ALSA tree until the next release?

Honestly, now: does *any* of those commits look like "we should merge all
the other changes just because we need that commit _now_ in ALSA"?

I really doubt it.

So I'd seriously suggest submaintainers merge *AT*MOST* once a week, and
preferably much much less often than that. There simply isn't any real
reason to do it more often. Because it can cause problems.

That's why my suggested rule is:

 - merge with mainline at major releases

   This is "safe". Yes, releases still have bugs, but on the other hand,
   they have much fewer problems than random git trees of the day, so they
   are a lot safer targets to merge.

 - merge with mainline if you know there are real conflicts that need to
   be resolved.

   This isn't "safe", but it's about trying to resolve conflicts early, so
   at some point the downside of merging with a "random point" is smaller
   than the downside of delaying the merge!

but perhaps the most important rule is that things should never be
*really* black-and-white, and in the end the really fundamental rule
should be:

 - Use your own judicious good sense, and merge at other points as
   necessary, but just keep in mind that a merge is a big change.

Yes, merging with git may be technically really really trivial and take
all of two seconds of your time, but:

 (a) you *do* potentially get thousands of new commits that aren't
     actually related to your work and that you probably don't know
     well.
 (b) others, when they look at your history, will have a harder time
     following it.

so while I can give you a few guidelines, in the end those guidelines are
just _examples_ of when merges can make sense. You need to understand what
the impact of a merge is - and that while git makes merging technically
pretty damn trivial most of the time, a merge should still be a big deal,
and something you think about.

So the kinds of merges I *really* dislike are the ones that are basically
"let's do a regular merge every day to keep up-to-date". That's fine if
you don't do any development at all and "git pull" is just basically a
"track the current development kernel for testing", but if it involves a
merge, it means that there is something wrong in your development model.

> But, my question is about the divergence between the development and
> for-linus branches: how to apply patches that exist only in for-linus
> tree back.

How often does it happen? And how big/important are those? I really think
it's probably a "maybe once or twice a release cycle".

And then, the actual answer can be different depending on the details. For
example, there are really three things you can do:

 - ignore it. Is it a cleanup patch (like the sparse patches) or just
   fairly trivial stuff that doesn't matter in real life ("remove
   duplicated unlikely()" patch or the /proc fixups)?

   This is often the right thing to do. You _will_ merge eventually
   anyway, we know that. I'd expect merges to happen at least once in the
   development cycle, maybe twice.

   Yes, the patch may touch the sound system, but do you really _care_
   about it happening right now, or can you just wait until the next merge
   you do?

 - cherry-pick it. Is it a small, simple patch that you want, but that
   isn't really worth pulling in all the other stuff that you simply don't
   know?

   This isn't wrong. It shouldn't be *common*, but it's not wrong to have
   the same patch in two different branches. It makes sense if it is
   something you really want, but it's still not important or complex
   enough to actually merge everything else!

 - and finally: merge. It really can be the RightThing(tm). Is it a
   biggish infrastructure change? Is it a series of several related and
   dependent commits?

   In other words: is it something big enough that you'd rather merge
   everything else too (which at least has gotten tested together)? If so,
   merging is absolutely the right thing to do!

So merging on its own is not "wrong or evil" at all. Merging is a very
good operation to do, but *mindless* merging is bad. That's really all
that I'm really trying to argue against.

If you thought it through, and decided that yes, you really want to merge,
then you should merge. I just think a lot of people merge without even
thinking about all the other things it involves, just because git made it
*so* easy to do.

			Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [alsa-devel] HG -> GIT migration
Date: Wed, 21 May 2008 18:12:37 UTC
Message-ID: <fa.pPIumk93FkTsE/268oMw4s3EtT0@ifi.uio.no>

On Wed, 21 May 2008, Linus Torvalds wrote:
>
> So merging on its own is not "wrong or evil" at all. Merging is a very
> good operation to do, but *mindless* merging is bad. That's really all
> that I'm really trying to argue against.

Btw, let me explain this another way.

As an "upstream" person, I do a *lot* of merges. Since 2.6.25, I've
done something like 247 merges (and that's not counting the fast-forward
ones). If you do

	git log v2.6.25.. --author=torvalds

you'll see pretty much just merges. It's simply what I do. I have a few
fairly trivial patches every once in a while (although you almost have to
add a "--no-merges" to filter out the merges to see them), but doing
merges is what I do most.
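
A minimal sketch of that merge-versus-patch comparison, assuming a toy
repository rather than a real kernel tree. The author name, the "v1.0" tag
and the "topic" branch are made up for illustration; the only git features
relied on are "git rev-list --count" with "--merges" and "--no-merges".

```shell
#!/bin/sh
set -e
# Toy repository standing in for an upstream tree (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Upstream Maintainer"
git config user.email "upstream@example.com"

git commit -q --allow-empty -m "base"
git tag v1.0
main_branch=$(git symbolic-ref --short HEAD)

# One direct patch on the main branch...
git commit -q --allow-empty -m "trivial fix"

# ...and one side branch that comes in through a real merge commit.
git checkout -q -b topic v1.0
git commit -q --allow-empty -m "topic work"
git checkout -q "$main_branch"
git merge -q --no-ff -m "Merge branch 'topic'" topic

# Commits since v1.0 by this author, split into merges and ordinary patches.
merges=$(git rev-list --count --merges --author="Upstream Maintainer" v1.0..HEAD)
patches=$(git rev-list --count --no-merges --author="Upstream Maintainer" v1.0..HEAD)
echo "merges=$merges patches=$patches"
```

In this toy history the script prints "merges=1 patches=2". On a real tree
the same two rev-list calls, with the real tag and author, give the counts
being talked about here.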

So why would I tell others to not merge, when I've done several hundred
merges in just the last month myself? Isn't that really hypocritical of
me?

The symmetry breaking comes from a few things:

 - the merges that "upstream" people do are generally smaller, but even
   when they are large, they have a "theme".

   Most merges I do are fairly small, but even when they aren't (eg the
   network layer merge of all the thousands of commits that were pending
   for when the merge window opened), they are _directed_.

   IOW, when upstream does a merge, it's hopefully (if the process works
   correctly) going to be about a specific sub-area, even if that area may
   be pretty big. So the merge has a very specific meaning: "I want to
   pull in the changes to subsystem 'xyz'"

   In contrast, a merge that goes the other way (subsystem merging
   upstream) generally doesn't get any particular directed development
   changes, it just gets "everything else".

   This also explains why I can do merge summaries, and downstream people
   generally can not. Look at my merges, and see how they say things like

      Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

      * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (21 commits)
        [CIFS] Remove debug statement
        Fix possible access to undefined memory region.
        [CIFS] Enable DFS support for Windows query path info
        [CIFS] Enable DFS support for Unix query path info
        [CIFS] add missing seq_printf to cifs_show_options for hard mount option
        ...

   iow, when upstream does a merge, it's simply different from when
   downstream does one - thanks exactly to having a specific purpose.

 - I hopefully only merge "release points".

   I don't pull from people at random points. I don't pull daily, or even
   weekly. I pull when a sub-maintainer asks me to pull, and I've tried to
   teach people to think of their pull requests as literally being
   "releases" of their tree. Because they effectively are!

   No, releases aren't perfect, and there will be bugs, but the same way
   that I argued that maintainers should generally aim to pull from me
   mainly at stable release points, I myself want to pull from downstream
   only when there is _some_ reason to believe that it's a stable point.

   So this isn't really a "broken symmetry", but it looks different
   because subsystem "releases" are smaller and happen more often than a
   "whole kernel release".

 - My tree isn't so much a "development tree" as it is an "integration
   tree".

   IOW, the biggest reason for my tree to exist in the first place is
   exactly the fact that it gives people a place to go to see what the
   "union" of the development is. In contrast, the reason people would go
   to the ALSA tree, or the networking tree, or any of the other specific
   trees is exactly because they don't want the union, but want to see
   what's recent in that particular area.

   If somebody fetches the tip of the ALSA tree, they may expect
   sound-related stuff to break, but you'd generally want to make sure
   that the rest is as stable as possible (while not being _entirely_
   stale, of course!)

   If somebody fetches my tree, they want it all, and they *expect* to see
   any breakage (and any new features) that everybody else could have
   caused.

So the above hopefully explains why I do 350+ merges per release, but
still have the gall to tell other people that they shouldn't merge too
much.

[ The corollary to this all is that when downstream does a merge, think
  about what the merge message can say. How would you describe the merge?

  Can you give a good description of what you merged, and why? That's one
  thing that merging with releases can give you: you can say "merge with
  release 'xyz'", and people actually understand the *meaning* of it. Your
  merge message makes sense - and that implies that the merge itself
  likely made sense.

  If you cannot explain what and why you merged, you probably shouldn't be
  merging - that's a good rule of thumb right there! Maybe that rule in
  itself should already be seen as sufficient ]
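
As one possible illustration of "merge with release 'xyz'", here is a small
sketch on a toy repository. The tag "v1.0", the branch "subsystem" and the
file names are assumptions made for the example, not anything from the
kernel tree; the point is only that merging a tagged release gives the
merge commit a message that means something.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Sub Maintainer"
git config user.email "sub@example.com"

echo base > core.txt
git add core.txt
git commit -q -m "base"
git branch subsystem

# Mainline development ends in a tagged release (lightweight tag for brevity).
echo "release work" >> core.txt
git commit -q -am "core: work that went into the release"
git tag v1.0

# Subsystem work that was based on the older, stable point.
git checkout -q subsystem
echo "sound work" > sound.txt
git add sound.txt
git commit -q -m "sound: new feature"

# Merge the release itself, not a random upstream commit of the day.
git merge -q -m "Merge with release v1.0" v1.0
subject=$(git log -1 --format=%s)
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "$subject ($parents parents)"
```

Anybody reading that history later can tell what was merged and why from
the subject line alone.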

				Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [alsa-devel] HG -> GIT migration
Date: Wed, 21 May 2008 18:40:18 UTC
Message-ID: <fa.mQHAQ8T1jF/E9Cw0XnBTtILt//o@ifi.uio.no>

On Wed, 21 May 2008, david@lang.hm wrote:
>
> one thing that you have missed in your explanation in this thread (although
> you have made the point in other threads) is that subsystem maintainers have
> the fear that there are other changes that will interfere with their stuff and
> want to catch it early.

Yes.

However, that's not just a "my tree" issue. In fact, quite often other
trees are more interesting from that angle: for driver subsystems like
sound, the changes in Greg's driver core git tree may actually be often
more relevant and give more of a heads-up than looking at my tree.

> per your instructions in prior threads, what they should do is to have a
> separate branch on their system that they use as a throw-away branch to pull
> from your tree, and from their tree to spot problems. As they find problems
> they can then address them (cherry pick, or whatever)

Yes. Doing throw-away merges is a great way to test not just whether there
might be actual merge conflicts, but also to just test that things work
together.

And even if you want to concentrate your *development* on just
ALSA-specific stuff, you may well want to also test all the changes that
have gone upstream from other projects (and often do that _together_ with
the changes you have developed yourself). And again, for this kind of
testing, doing a throw-away merge to see how it all works together is
fine.
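
A throw-away merge could look roughly like the sketch below. It assumes a
toy repository where the branch "upstream" stands in for mainline and
"for-linus" for the published subsystem branch; "scratch-test" and the
file names are likewise invented, and the one-line check is only a
placeholder for a real build and test run.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Sub Maintainer"
git config user.email "sub@example.com"

echo base > core.txt
git add core.txt
git commit -q -m "base release"
git branch upstream            # stands in for the mainline tree
git branch for-linus           # stands in for the published subsystem branch

# Mainline moves on in an unrelated area.
git checkout -q upstream
echo "core change" >> core.txt
git commit -q -am "core: unrelated upstream change"

# The subsystem does its own work on the stable base.
git checkout -q for-linus
echo "sound fix" > sound.txt
git add sound.txt
git commit -q -m "sound: subsystem fix"
published=$(git rev-parse for-linus)

# Throw-away merge: test the combination on a scratch branch only.
git checkout -q -b scratch-test for-linus
git merge -q -m "TEST ONLY: for-linus + upstream" upstream
test -f sound.txt && grep -q "core change" core.txt   # placeholder for a real test run

# Discard the scratch branch; the published branch never saw the merge.
git checkout -q for-linus
git branch -q -D scratch-test
after=$(git rev-parse for-linus)
merges_published=$(git rev-list --count --merges for-linus)
echo "for-linus moved: $([ "$published" = "$after" ] && echo no || echo yes)"
```

The combination gets tested, but the history that is later offered for
pulling contains no merge commit from the experiment.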

> so it's not that the ALSA people should only look at your tree at the merge
> points, it's that they shouldn't pollute their tree that they are going to
> publish to you with this checking.

Yes. In general, it's a great idea to have "test trees" that aren't really
for development, but for testing. That's obviously what 'linux-next' does,
but it's something any tester can do (and it doesn't even have to imply
any developer skills, although it would generally require at least some
comfort with git).

That said, at least as far as I'm concerned, when I pull from some
subsystem tree, the thing I really want to know is that the state of that
tree is stable on its own. IOW, if the merge itself introduces some subtle
bug, that is not only fairly unusual, but it's also something that should
not be seen as a bug from the tree I pulled - it's just bad luck.

So a submaintainer should care *most* about the fact that his/her tree is
stable on its own. Problems that happen when multiple development trees
are merged should be the secondary concern. I'd rather have people test
their _own_ code really well, than spending lots of time trying to test
every possible combination with other people's trees.

			Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [alsa-devel] HG -> GIT migration
Date: Wed, 21 May 2008 19:03:26 UTC
Message-ID: <fa.BJcWQdFAr0rh9nkZvyQSN/+cyMI@ifi.uio.no>

On Wed, 21 May 2008, Takashi Iwai wrote:
>
> >  - cherry-pick it. Is it a small, simple patch that you want, but that
> >    isn't really worth pulling in all the other stuff that you simply don't
> >    know?
> >
> >    This isn't wrong. It shouldn't be *common*, but it's not wrong to have
> >    the same patch in two different branches. It makes sense if it is
> >    something you really want, but it's still not important or complex
> >    enough to actually merge everything else!
>
> Hm, that's what I didn't consider seriously.  I thought cherry-picking
> patches may cause merge errors easily.

Cherry-picking can certainly cause merge errors, but not generally very
often.

Cherry-picking by definition will obviously apply the *same* patch to two
different branches, and as a result, when you merge, that merge will
generally be totally clean. So a trivial merge that succeeds without you
even noticing is actually the common case.

But you can certainly get merge failures where you then have to fix things
up if there were *other* changes to that same area. At that point, you end
up with two different branches that changed the same few lines
differently, and it doesn't matter if then _some_ of the changes were
identical - the fact that others were not is enough to cause a merge
conflict.
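
Both cases can be reproduced in a toy repository. The sketch below is only
an illustration under made-up names ("driver.c", "upstream", "subsystem",
"up2", "sub2"): first the same fix lands on both sides and the later merge
is clean, then each side also changes a neighbouring line differently and
the merge conflicts.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'line1\nline2\nline3\n' > driver.c
git add driver.c
git commit -q -m "base"
base=$(git rev-parse HEAD)
git branch upstream
git branch subsystem

# Upstream applies a small fix.
git checkout -q upstream
printf 'line1\nline2 fixed\nline3\n' > driver.c
git commit -q -am "fix line2"
fix=$(git rev-parse HEAD)

# Case 1: the subsystem cherry-picks the same fix, does unrelated work, merges.
git checkout -q subsystem
git cherry-pick "$fix" >/dev/null
echo notes > notes.txt
git add notes.txt
git commit -q -m "unrelated subsystem work"
if git merge -q -m "merge upstream" upstream >/dev/null 2>&1; then
    clean=yes
else
    clean=no
fi

# Case 2: same cherry-pick, but both sides also edit line3 differently.
git checkout -q -b up2 "$fix"
printf 'line1\nline2 fixed\nline3 upstream\n' > driver.c
git commit -q -am "upstream edit of line3"
git checkout -q -b sub2 "$base"
git cherry-pick "$fix" >/dev/null
printf 'line1\nline2 fixed\nline3 subsystem\n' > driver.c
git commit -q -am "subsystem edit of line3"
if git merge -q -m "merge up2" up2 >/dev/null 2>&1; then
    conflict=no
else
    conflict=yes
    git merge --abort
fi
echo "case 1 clean: $clean, case 2 conflict: $conflict"
```

The identical fix never causes the trouble by itself; it is the differing
edits next to it that git cannot reconcile automatically.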

If cherry-picking is an uncommon situation, the merge problems are not
going to show up (and when they do, they'll generally be simple to
resolve, especially if you limit cherry-picking to simple fixes). But if
you do a *lot* of cherry-picking, and you cherry-pick big changes, then
yes, you'll start hitting merge problems.

So cherry-picking is fine if you do it (a) fairly seldom and (b) just to
small patches, because then the upsides of cherry-picking (easy to get a
single fix without merging everything else) are bigger than the downsides
(the potential merge problems later).

IOW, think of cherry-picking as just another tool. It has upsides and
downsides. It's not "wrong" per se, but you can use it the wrong way. You
shouldn't use a hammer on a screw, and you shouldn't use cherry-picking
for big and complex stuff.

			Linus
