What is this about

The 500+ packages of the OpenStack team do not use pristine-tar, and all the packaging is done in a single branch. There are many reasons why it is done this way, and why this workflow is more efficient. This page describes why and how.

I've been notified (with harsh words) by someone that this wasn't documented. This will hopefully fill the gap.

If you don't care why pristine-tar has been avoided, and just wish to contribute, simply skip the section explaining why pristine-tar is bad and jump to "The upstream git tag workflow" below.

Why not use pristine-tar

Pristine-tar is broken by design

The original author of pristine-tar, Joey Hess, explained it himself. pristine-tar doesn't work, because it needs to generate tarballs in a reproducible way. But that doesn't work unless:

* one uses the exact same version of tar. Debian has been sending patches upstream and keeping tar behaving the same way just to keep pristine-tar from breaking. That's a bad idea, though.

* upstream files are always listed in the same order

* upstream timestamps are all set to EPOCH=1 (i.e. same timestamp, always)

Any attempt to do otherwise is at least risky, and sometimes just a waste of time, as it won't ever work. If you do not trust me, discuss this with the maintainers of pristine-tar themselves.

Pristine-tar is uselessly time consuming for package maintainers

The idea of working with upstream tarballs, and using them as the reference for an upstream release, may have been the right one 30 years ago. But since Git appeared, and in fact, since everyone moved to Git, it is now a very bad one. Why? Because it is a useless waste of time. Think about it. To keep this workflow, any package maintainer in Debian needs to do the following to release a new version:

- download the latest tarball from upstream

- copy it to the right folder (I personally use ../build-area, which means I need the tarball to be in .. and in ../build-area, relative to the root of a source package).

- import it into the Git repository used for packaging, in 2 different branches (i.e. gbp import-orig): the Debian branch and the upstream branch.

- commit it to the pristine-tar branch (i.e. pristine-tar commit ../<upstream-tarball>).

4 operations, just for a new upstream release, is already annoying. But that's not all.

Downloading the tarball may be slow. Think about doing this on the train, over flaky wifi, or using your mobile phone line. Compare this with getting a few dozen commits with git (i.e. git fetch upstream --tags), when the upstream tarball may be 150+ MB...

The pristine-tar commit, on a relatively big upstream tarball, is super resource intensive. For example, when I work on Ceph, and commit a new tarball with pristine-tar, my CPU fan starts spinning at its maximum speed, and it takes minutes to get it done. That's impressively stupid.

Using 3 branches doesn't work

The pristine-tar workflow assumes that you have a single upstream branch, and that upstream releases only ever increase monotonically. This is very naive. For OpenStack, I need to maintain 5 or 6 upstream releases per Debian stable release. Look at this table and you'll see them:

https://wiki.debian.org/OpenStack#OpenStack_releases_VS_Debian_releases

If I had to use pristine-tar, I'd have to keep 5 or 6 upstream branches too, and for each release, correctly document which upstream branch is being used in a debian/gbp.conf file. This is highly inefficient, especially when you consider that I'd need to do that for 500+ packages.

Using the workflow that is described below, each upstream release is mapped to exactly one Debian git branch, which is easy enough.

Why using the upstream tarball is wrong

Often, the upstream tarball is a worse source than Git. It may contain artifacts that upstream generated, for example, or it may omit files that are otherwise available (i.e. in the upstream git tree). In my experience: whenever possible, it's always best to use the upstream Git tree.

Using upstream Git is always nicer

Countless times, I have had to look at the git history of different branches, just to see in which branch a commit appeared, and sometimes, cherry-pick it from one upstream branch to another. Since the packaging branches are merges of the upstream branch + the Debian packaging commits, it's super easy to do so. This way, a "git blame" on the upstream code also works in the Debian packaging branch, which is what I often have to do (to find the correct patch in upstream Gerrit, and sometimes, backport it to an earlier version of OpenStack)!

The pristine-tar workflow doesn't allow me to do that: I'd have to clone the upstream git repository somewhere else, and do the above from there. Except that the upstream Git repository may disappear at some point (yes, this already happened to me)! Or some branches may be deleted, or you know what...

Also, a few times, embargoed security patches for OpenStack were done on top of the last upstream stable-branch commit, and wouldn't apply without some commits that are NOT in the matching last upstream tarball for that branch. Guess what? Using the git workflow, I don't have these troubles (or in fact, it is easier to deal with that kind of trouble).

Some exceptions

Unfortunately, there are exceptions that I haven't been able to deal with. For example, when upstream uses git submodules, it may be harder to use the upstream git tag based workflow (at least, I haven't found a sound solution). These exceptions are very rare though (I'd say, fewer than 5). Also, some packages are maintained by others (Canonical: thank you guys!), like OVN and OpenVSwitch, and for them, I agreed to use pristine-tar.

The upstream git tag workflow

Basic principles

Whenever upstream releases a new version, they create a tag matching the release (in the rare cases where upstream forgot about tagging, I have always been able to tag it myself instead). This tag is the base that's used for all operations.

There's only a single packaging branch to deal with, always. Of course: there may be more than one Debian branch, because we may need to maintain more than one version, for example, debian/bullseye, debian/bookworm, debian/trixie, debian/forky, for security or point release updates. And otherwise, the OpenStack team maintains one branch per OpenStack release.

gbp is not mandatory. One of the OpenStack team contributors simply invokes "sbuild" directly, which also works.

Fetching the upstream tag

One may simply do:

git fetch upstream --tags

Though I do:

./debian/rules fetch-upstream-remote

which automates adding the remote, either as declared with UPSTREAM_GIT := in debian/rules, *OR* by calculating it like this:

UPSTREAM_GIT    ?= https://github.com/openstack/$(DEBPKGNAME).git

as most (if not all) OpenStack projects are mirrored on github (the canonical upstream remote is normally https://opendev.org/openstack, but it is way slower than github, which always has an up-to-date mirror of all OpenStack git repositories).
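
As a rough sketch of what such automation amounts to (this is not the actual openstack-pkg-tools code: the function names are made up for illustration, and the source package name is assumed to match the upstream project name on github), one could write:

```shell
# Hypothetical helper: compute the github mirror URL from a project name.
upstream_git_url() {
    printf 'https://github.com/openstack/%s.git\n' "$1"
}

# Hypothetical helper: add the "upstream" remote if it is missing,
# then fetch all upstream tags. Run from the packaging repository.
fetch_upstream_remote() {
    if ! git remote get-url upstream >/dev/null 2>&1; then
        git remote add upstream "$(upstream_git_url "$1")"
    fi
    git fetch upstream --tags
}
```

For example, "fetch_upstream_remote nova" would add https://github.com/openstack/nova.git as the upstream remote on the first call, and only fetch on later calls.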

Importing in the Debian branch

Check out the branch matching the OpenStack release you want to package, and simply do:

git merge -X theirs <upstream-tag-name>
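
The "-X theirs" option makes git resolve conflicting hunks in favor of the upstream side. If one wants to double check the result, a small sanity check is possible. The sketch below is only an illustration: the function name is made up, and it assumes that all Debian-specific changes live under debian/ (patches being handled in debian/patches). It lists the files outside debian/ that differ from the upstream tag, so an empty output means the upstream part of the tree matches the tag exactly.

```shell
# Hypothetical helper: print files outside debian/ that differ between
# the given upstream tag and the current packaging branch (HEAD).
# Run from the root of the packaging repository.
upstream_drift() {
    git diff --name-only "$1" HEAD -- . ':(exclude)debian'
}
```

One would typically run it right after the merge, with the tag that was just merged as its only argument.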

Generating the upstream tarball

Since generating upstream tarballs using pristine-tar is broken, well, so is doing it using this workflow. But it doesn't matter. As a very strong rule, whenever you want to rebuild a package from Debian, do not use "gbp export-orig" to generate the tarball: fetch it from the Debian archive instead (with wget, wcurl or dget, for example). Otherwise, you risk generating a tarball that isn't the same as the one in the archive, and you'll get your upload rejected. Well, with this workflow, that's the same: fetching it from the Debian archive is best if it's not a new upstream release.

So, with this in mind, the only reason why you'd want to generate an upstream tarball is when there's a new upstream release (or when doing a new package). In such a case, simply do:

./debian/rules gen-orig-xz

Or you may do it by hand if you do not want to use the automation of openstack-pkg-tools:

git archive --prefix=DEBPKGNAME-VERSION/ GIT_TAG | xz >../DEBPKGNAME_VERSION.orig.tar.xz

I believe there's other tooling in Debian doing the same, but I got used to my own. Pick the one you want, nobody cares, as long as it's doing "git archive" the correct way.

Is that it?
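
To illustrate the manual variant, here is a minimal sketch that fills in DEBPKGNAME and VERSION from debian/changelog. It is not the openstack-pkg-tools implementation: all function names are hypothetical, and it assumes dpkg-parsechangelog, git and xz are installed.

```shell
# Strip the epoch ("2:") and the Debian revision ("-1") from a Debian version.
upstream_version() {
    printf '%s\n' "$1" | sed -e 's/^[0-9]*://' -e 's/-[^-]*$//'
}

# File name of the orig tarball for a source package and upstream version.
orig_tarball_name() {
    printf '%s_%s.orig.tar.xz\n' "$1" "$2"
}

# Hypothetical helper: archive an upstream tag into ../<pkg>_<version>.orig.tar.xz.
# Run from the root of the packaging repository, with the tag as argument.
gen_orig_by_hand() {
    pkg="$(dpkg-parsechangelog -SSource)"
    version="$(upstream_version "$(dpkg-parsechangelog -SVersion)")"
    git archive --prefix="${pkg}-${version}/" "$1" \
        | xz > "../$(orig_tarball_name "$pkg" "$version")"
}
```

Note that in plain sh the exit status of the pipeline is the one of xz, so a failing "git archive" (a mistyped tag, for example) may go unnoticed: checking the size of the resulting tarball is a good habit.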

Yes. That's all there is to it, and that's all there is to know. 3 commands:

./debian/rules fetch-upstream-remote
git merge -X theirs <upstream-tag-name>
./debian/rules gen-orig-xz

And if you only need to contribute without pushing a new version, then you don't even need them: fetching the current upstream tarball from the Debian archive is enough.
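
For that last case, here is one possible sketch, using dget from devscripts. The function names are made up, and the code assumes the package is in the "main" component and that the Debian version is given without its epoch (epochs do not appear in archive file names).

```shell
# Pool subdirectory: first letter of the source package name,
# or "lib" plus one letter for lib* source packages.
pool_prefix() {
    case "$1" in
        lib?*) printf '%s\n' "$1" | cut -c1-4 ;;
        *)     printf '%s\n' "$1" | cut -c1 ;;
    esac
}

# URL of the .dsc for a source package and a Debian version (no epoch).
dsc_url() {
    printf 'https://deb.debian.org/debian/pool/main/%s/%s/%s_%s.dsc\n' \
        "$(pool_prefix "$1")" "$1" "$1" "$2"
}

# Hypothetical helper: download the .dsc and the files it lists
# (including the orig tarball) into the parent directory, without unpacking.
fetch_from_archive() {
    (cd .. && dget --download-only "$(dsc_url "$1" "$2")")
}
```

Called as "fetch_from_archive nova 30.0.0-1" from the root of the packaging repository (package name and version being only examples), this would leave the tarball in "..", which is where the build tools look for it.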

Wrong sbuild and gbp default configuration

sbuild should have this by default:

$clean_source = 0;

Since it doesn't, one needs to write it in ~/.sbuildrc, otherwise it's a nightmare. In the same way, gbp needs some default configuration.

To avoid maintaining a debian/gbp.conf file in absolutely ALL OpenStack packages, one needs to have this in ~/.gbp.conf:

[DEFAULT]
cleaner = /bin/true
ignore-branch = True
pristine-tar = False
no-create-orig = True

No, it is NOT a problem to have the above in a ~/.gbp.conf, even when working with the pristine-tar workflow (which I often do within the Python team, because ... it's the rule there!). Trust me, the above is always good to have, in all cases. Let me explain:

ignore-branch = True: makes it possible to use whatever branch you like for the Debian packaging, without gbp complaining about it.

pristine-tar = False: tells gbp that we don't want the pristine-tar automations by default (that doesn't mean one can't work with the pristine-tar workflow when needed)

no-create-orig = True: tells gbp to never automatically second-guess how the upstream tarball should be generated. One then needs to do things manually, which, believe me, is a good thing.

Without the above (which should have been the default in gbp to begin with), it would have been necessary to maintain a debian/gbp.conf file in each and every Debian package of the OpenStack team. Thanks but ... no thanks! :)

Efficiency

The packaging workflow described above is WAY more efficient than using pristine-tar in my everyday packaging. And that's the main reason (above ALL the others described above) why I do it this way. If you think it should be done differently, I welcome you to join the OpenStack team and contribute. But since all members of the team are currently convinced of what's written in this page, and since the team mostly consists of a single active member (i.e. me, zigo), I get to decide unless the situation changes (in which case I'll be open to any discussion on this topic).