gc: add --expire-to option #1843

Open: wants to merge 1 commit into master
6 changes: 6 additions & 0 deletions Documentation/git-gc.txt
@@ -69,6 +69,12 @@ be performed as well.
the `--max-cruft-size` option of linkgit:git-repack[1] for
On the Git mailing list, ZheNing Hu wrote:

ZheNing Hu via GitGitGadget <[email protected]> wrote on Tue, Dec 31, 2024 at 10:18:
>
> From: ZheNing Hu <[email protected]>
>
> This commit extends the functionality of `git gc`
> by adding a new option, `--expire-to=<dir>`. Previously,
> this feature was implemented in `git repack` (see 91badeb),
> allowing users to specify a directory where unreachable and
> expired cruft packs are stored during garbage collection.
> However, users had to run `git repack --cruft --expire-to=<dir>`
> followed by `git prune` to achieve similar results within `git gc`.
>
> By introducing `--expire-to=<dir>` directly into `git gc`,
> we simplify the process for users who wish to manage their
> repository's cleanup more efficiently. This change involves
> passing the `--expire-to=<dir>` parameter through to `git repack`,
> making it easier for users to set up a backup location for cruft
> packs that will be pruned.
>
> Signed-off-by: ZheNing Hu <[email protected]>
> ---
>  Documentation/git-gc.txt | 6 ++++++
>  builtin/gc.c             | 6 +++++-
>  t/t6500-gc.sh            | 6 ++++++
>  3 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> index 370e22faaeb..b4c0cf02972 100644
> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -69,6 +69,12 @@ be performed as well.
>         the `--max-cruft-size` option of linkgit:git-repack[1] for
>         more.
>
> +--expire-to=<dir>::
> +       When packing unreachable objects into a cruft pack, write a cruft
> +       pack containing pruned objects (if any) to the directory `<dir>`.
> +       See the `--expire-to` option of linkgit:git-repack[1] for
> +       more.
> +
>  --prune=<date>::
>         Prune loose objects older than date (default is 2 weeks ago,
>         overridable by the config variable `gc.pruneExpire`).
> diff --git a/builtin/gc.c b/builtin/gc.c
> index d52735354c9..77904694c9f 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -136,6 +136,7 @@ struct gc_config {
>         char *prune_worktrees_expire;
>         char *repack_filter;
>         char *repack_filter_to;
> +       char *repack_expire_to;
>         unsigned long big_pack_threshold;
>         unsigned long max_delta_cache_size;
>  };
> @@ -441,6 +442,8 @@ static void add_repack_all_option(struct gc_config *cfg,
>                 if (cfg->max_cruft_size)
>                         strvec_pushf(&repack, "--max-cruft-size=%lu",
>                                      cfg->max_cruft_size);
> +               if (cfg->repack_expire_to)
> +                       strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);
>         } else {
>                 strvec_push(&repack, "-A");
>                 if (cfg->prune_expire)
> @@ -675,7 +678,6 @@ struct repository *repo UNUSED)
>         const char *prune_expire_sentinel = "sentinel";
>         const char *prune_expire_arg = prune_expire_sentinel;
>         int ret;
> -
>         struct option builtin_gc_options[] = {
>                 OPT__QUIET(&quiet, N_("suppress progress reporting")),
>                 { OPTION_STRING, 0, "prune", &prune_expire_arg, N_("date"),
> @@ -694,6 +696,8 @@ struct repository *repo UNUSED)
>                            PARSE_OPT_NOCOMPLETE),
>                 OPT_BOOL(0, "keep-largest-pack", &keep_largest_pack,
>                          N_("repack all other packs except the largest pack")),
> +               OPT_STRING(0, "expire-to", &cfg.repack_expire_to, N_("dir"),
> +                          N_("pack prefix to store a pack containing pruned objects")),
>                 OPT_END()
>         };
>
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index ee074b99b70..d4b0653a9b7 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -339,6 +339,12 @@ test_expect_success 'gc.maxCruftSize sets appropriate repack options' '
>         test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
>  '
>
> +test_expect_success '--expire-to sets appropriate repack options' '
> +       mkdir expired &&
> +       GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --expire-to=./expired/pack &&
> +       test_subcommand $cruft_max_size_opts --expire-to=./expired/pack <trace2.txt
> +'
> +
>  run_and_wait_for_gc () {
>         # We read stdout from gc for the side effect of waiting until the
>         # background gc process exits, closing its fd 9.  Furthermore, the
> --
> gitgitgadget
>

Hi Jeff King, could you take a look at this patch?
I would be very grateful if you have time.

ZheNing Hu


On the Git mailing list, ZheNing Hu wrote:

This patch has been sitting for weeks with no review. Does anyone want
to help take a look?


more.

--expire-to=<dir>::
When packing unreachable objects into a cruft pack, write a cruft
pack containing pruned objects (if any) to the directory `<dir>`.
See the `--expire-to` option of linkgit:git-repack[1] for
more.

--prune=<date>::
Prune loose objects older than date (default is 2 weeks ago,
overridable by the config variable `gc.pruneExpire`).
9 changes: 7 additions & 2 deletions builtin/gc.c
@@ -136,6 +136,7 @@ struct gc_config {
char *prune_worktrees_expire;
char *repack_filter;
char *repack_filter_to;
char *repack_expire_to;
unsigned long big_pack_threshold;
unsigned long max_delta_cache_size;
};
@@ -432,7 +433,8 @@ static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
static void add_repack_all_option(struct gc_config *cfg,

On the Git mailing list, Jeff King wrote:

On Tue, Dec 31, 2024 at 02:18:33AM +0000, ZheNing Hu via GitGitGadget wrote:

> diff --git a/builtin/gc.c b/builtin/gc.c
> index 77904694c9f..8656e1caff0 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -433,7 +433,8 @@ static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
>  static void add_repack_all_option(struct gc_config *cfg,
>  				  struct string_list *keep_pack)
>  {
> -	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now"))
> +	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
> +		&& !(cfg->cruft_packs && cfg->repack_expire_to))
>  		strvec_push(&repack, "-a");

I expected to see a mention of repack_expire_to here, but not
cfg->cruft_packs. These two are AND-ed together so we are only disabling
"repack -a" when both options ("--expire-to" and "--cruft") are passed.
Can we --expire-to without cruft? I.e., what should happen with:

  git gc --expire-to=some-path --prune=now --no-cruft

Looking at the underlying git-repack, it seems that we only respect
--expire-to at all when used with "--cruft", and don't otherwise
consider it. Which is what the manpage says ("Only useful with --cruft
-d").

But if we look at this proposed patch for example:

  https://lore.kernel.org/git/48438876fb42a889110e100a6c42ca84e93aac49.1733011259.git.me@ttaylorr.com/

then it is expanding how --expire-to is used during the pruning step.
OTOH, I think the way your patch 1 is structured means that we'd always
pass --expire-to to git-repack anyway, and I _think_ even with the patch
linked above that "repack -a -d --expire-to=whatever" would do the right
thing.

In which case the problem really is the combination of cruft packs and
expire-to. Just cruft packs by themselves do not need to override using
"-a" for "--prune=now" because we know that any such cruft pack would be
empty.

So I think this logic is correct. Taylor might have more thoughts,
though (and ideas on whether he intends to revisit that earlier patch).

I do think this change should probably be done as part of patch 1,
rather than introducing a buggy state and then fixing it in patch 2.

-Peff


On the Git mailing list, ZheNing Hu wrote:

Jeff King <[email protected]> wrote on Mon, Jan 13, 2025 at 17:17:
>
> On Tue, Dec 31, 2024 at 02:18:33AM +0000, ZheNing Hu via GitGitGadget wrote:
>
> > diff --git a/builtin/gc.c b/builtin/gc.c
> > index 77904694c9f..8656e1caff0 100644
> > --- a/builtin/gc.c
> > +++ b/builtin/gc.c
> > @@ -433,7 +433,8 @@ static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
> >  static void add_repack_all_option(struct gc_config *cfg,
> >                                 struct string_list *keep_pack)
> >  {
> > -     if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now"))
> > +     if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
> > +             && !(cfg->cruft_packs && cfg->repack_expire_to))
> >               strvec_push(&repack, "-a");
>
> I expected to see a mention of repack_expire_to here, but not
> cfg->cruft_packs. These two are AND-ed together so we are only disabling
> "repack -a" when both options ("--expire-to" and "--cruft") are passed.
> Can we --expire-to without cruft? I.e., what should happen with:
>
>   git gc --expire-to=some-path --prune=now --no-cruft
>
> Looking at the underlying git-repack, it seems that we only respect
> --expire-to at all when used with "--cruft", and don't otherwise
> consider it. Which is what the manpage says ("Only useful with --cruft
> -d").
>

Yes, this is the current state of git-repack: --expire-to only takes
effect together with --cruft, which is why I check
cruft_packs && repack_expire_to as a double safeguard.

With --no-cruft, --expire-to is irrelevant, so in that case it seems
reasonable to leave `git gc --prune=now` as is and pass -a to repack.

> But if we look at this proposed patch for example:
>
>   https://lore.kernel.org/git/48438876fb42a889110e100a6c42ca84e93aac49.1733011259.git.me@ttaylorr.com/
>
> then it is expanding how --expire-to is used during the pruning step.
> OTOH, I think the way your patch 1 is structured means that we'd always
> pass --expire-to to git-repack anyway, and I _think_ even with the patch
> linked above that "repack -a -d --expire-to=whatever" would do the right
> thing.
>

I've taken a look at the patch, and I believe Taylor's changes are primarily
aimed at extending the --expire-to functionality within the --cruft feature,
rather than expecting --expire-to to be used on its own.

> In which case the problem really is the combination of cruft packs and
> expire-to. Just cruft packs by themselves do not need to override using
> "-a" for "--prune=now" because we know that any such cruft pack would be
> empty.
>
> So I think this logic is correct. Taylor might have more thoughts,
> though (and ideas on whether he intends to revisit that earlier patch).
>
> I do think this change should probably be done as part of patch 1,
> rather than introducing a buggy state and then fixing it in patch 2.
>

Yes, I agree with that, and perhaps a single patch will suffice.

> -Peff

- ZheNing Hu


struct string_list *keep_pack)
{
-if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now"))
+if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
+&& !(cfg->cruft_packs && cfg->repack_expire_to))
strvec_push(&repack, "-a");
else if (cfg->cruft_packs) {
strvec_push(&repack, "--cruft");
@@ -441,6 +443,8 @@ static void add_repack_all_option(struct gc_config *cfg,
if (cfg->max_cruft_size)
strvec_pushf(&repack, "--max-cruft-size=%lu",
cfg->max_cruft_size);
if (cfg->repack_expire_to)
strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);
} else {
strvec_push(&repack, "-A");
if (cfg->prune_expire)
@@ -675,7 +679,6 @@ struct repository *repo UNUSED)
const char *prune_expire_sentinel = "sentinel";
const char *prune_expire_arg = prune_expire_sentinel;
int ret;

struct option builtin_gc_options[] = {
OPT__QUIET(&quiet, N_("suppress progress reporting")),
{ OPTION_STRING, 0, "prune", &prune_expire_arg, N_("date"),
@@ -694,6 +697,8 @@ struct repository *repo UNUSED)
PARSE_OPT_NOCOMPLETE),
OPT_BOOL(0, "keep-largest-pack", &keep_largest_pack,
N_("repack all other packs except the largest pack")),
OPT_STRING(0, "expire-to", &cfg.repack_expire_to, N_("dir"),
N_("pack prefix to store a pack containing pruned objects")),
OPT_END()
};

6 changes: 6 additions & 0 deletions t/t6500-gc.sh
@@ -339,6 +339,12 @@ test_expect_success 'gc.maxCruftSize sets appropriate repack options' '
test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
'

test_expect_success '--expire-to sets appropriate repack options' '
mkdir expired &&
GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --expire-to=./expired/pack &&
test_subcommand $cruft_max_size_opts --expire-to=./expired/pack <trace2.txt
'

run_and_wait_for_gc () {
# We read stdout from gc for the side effect of waiting until the
# background gc process exits, closing its fd 9. Furthermore, the