Auto-add of untracked files screws me up every time #323

Open
durin42 opened this issue May 16, 2022 · 147 comments · Fixed by #5138

@durin42
Contributor

durin42 commented May 16, 2022

Description

Initially I thought the auto-add of files was a neat idea, but in practice I just leave untracked files in my repo all the time, and tools like patch(1) assume they can drop .orig or similar files in the WC without it being a problem. I think every time I've used jj I've ended up getting grumpy at auto-adds and having to rip something out of a commit, sometimes after doing a push (when it was effectively emulating hg import for example).

Steps to Reproduce the Problem

  1. patch -p1 < some.diff (or similar)
  2. jj describe && jj close && jj git push
  3. Look at web view of diff, notice you committed a .orig file again (or similar)

Expected Behavior

I (still) don't expect auto-adds, and it really surprises me every time, plus it's super frustrating to have to ignore every file spec I might create as a temporary scratch file or what have you.

Actual Behavior

As above.

Specifications

@arxanas
Contributor

arxanas commented May 17, 2022

Agreed on that — I think automatically committing tracked files is fine, but untracked is probably bad:

  • They could be very big
  • They could contain secrets
  • Querying the working copy for only tracked files is probably more efficient in practice (like with git status -uno)

@martinvonz
Member

I mostly find the feature useful, but I also agree that it can be annoying and confusing. The worst case I've noticed is when you - perhaps accidentally - check out the root commit, where there's no .gitignore file containing target/. Then any command you run will try to commit GBs of data. I ran into that again just a few days ago, and was actually thinking of adding a config for the auto-add behavior.

When looking at an old version of the repo, you'll not see untracked files (e.g. jj --at-op=<some old operation> status), but that seems fine. I think the most annoying bit will be to add a way of providing that information to users who want to commit the working copy in the background (like we will probably do at Google). I'll probably skip that bit to start with when I implement this.

martinvonz added a commit that referenced this issue Jun 10, 2022
I plan to use this matcher for some future `jj add` command (for
#323). The idea is that we'll do a path-restricted walk of the working
copy based on the intersection of the sparse patterns and any patterns
specified by the user. However, I think it will be useful before that,
for @arxanas's fsmonitor feature (#362).
@tp-woven
Contributor

Just to throw my 2c in: With the exception of new files that are also added to .gitignore, I actually really like the automatic tracking. I also think the same "repro" from the first comment can be used as a reason to auto-track - if you don't jj st before pushing, you're just as likely to miss a file that you should have added as you are a file that you should have ignored. So my preference is that this would be configurable if possible, rather than just removed.

@elasticdog
Contributor

Just wanted to say that I really like the auto-tracking feature as well. After using jj for a while, going back to manually having to think about adding files to be tracked seems like a lot of extra work and possibly more error prone. I'm already used to checking jj status to make sure that I'm committing the expected files. That said, I can definitely see how not thinking about the appropriate ignores right away could be troublesome if the files are large, and I also acknowledge the root commit situation (although I don't know how often that would realistically come up in day-to-day usage).

@necauqua
Contributor

necauqua commented May 8, 2023

The thing with accidentally committing GBs of target when you forgot to ignore it or check out some old commit is that there's no GC at the moment, so those GBs are forever in the history.

This is especially likely if something like direnv creates it automatically the moment you check out a commit from before your .gitignore update. That has literally happened to me with the jj repo more than once.

The only way is to recreate the repo (losing the oplog) this time not forgetting to ignore stuff/not editing old commits - which is kind of meh as well.

Full GC of course means basically the same thing, only automated by a single command, but something like jj undo --harder [--at-op OP] [--and-immediate-gc-too] (are you sure [y/N]) to delete the last op/OP unrecoverably and gc things it referenced could be cool.

@ony

ony commented Jun 11, 2023

Maybe it is possible to track ignores in jj too. If you switch away from a commit where something was ignored to one where it is not, that information can probably be used. E.g. a sort of in-flight ignores that are visible in jj st, prompting you to either extend the explicit ignores or confirm the addition of the new paths to the current change.

P.S. Can think of how git checkout reacts when your untracked file is about to be overwritten by checked out files. In case of jj all files are tracked by default and thus even absence of the file is tracked.

@necauqua
Contributor

necauqua commented Jun 11, 2023

Hm, so auto-tracking when you are doing changes in a working commit is the main killer feature.

But for my issue with it, how about this:
What if jj considered .gitignore not just from the current commit, but from the entire history - or actually just descendant of the current commit?
And if you needed to actually add some file in the past you'd have to explicitly track it (instead of explicitly untracking it in the common case you don't need that).

I know this is kind of magical, but the more I think about it the more it makes sense, idk - and the autorebase is magical on its own, so this is somehow even kind of consistent in my head

@martinvonz
Member

martinvonz commented Jun 11, 2023

It would be expensive to find the gitignores from all commits, but we could probably index that information. I'm more concerned that it would be unexpected behavior. For example, if you check out a sibling commit where target/ is not ignored, then it would still not be ignored if we only consider descendants.

Maybe it's better to check if the gitignores changed between the old and the new commit and if any untracked files according to the new patterns match the old patterns. If that happens, we could just print a warning about it. We could additionally add the ignores to a per-workspace set of ignores (which we don't support yet). (EDIT: I think this is what @ony suggested.)
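
As a rough illustration of that check, here is a minimal Python sketch. It is not how jj works: `is_ignored` and `newly_unignored` are hypothetical helpers, and the matching is a deliberately simplified stand-in for real gitignore semantics (a trailing `/` covers a directory, anything else is a plain glob on the full path).

```python
from fnmatch import fnmatch


def is_ignored(path, patterns):
    """Simplified stand-in for gitignore matching (an assumption, not jj's matcher)."""
    for pat in patterns:
        if pat.endswith("/"):
            # A directory pattern covers everything underneath it.
            if path.startswith(pat):
                return True
        elif fnmatch(path, pat):
            return True
    return False


def newly_unignored(untracked, old_patterns, new_patterns):
    """Untracked files the old commit ignored but the new commit would snapshot."""
    return [
        p for p in untracked
        if is_ignored(p, old_patterns) and not is_ignored(p, new_patterns)
    ]


old = ["target/", "*.orig"]
new = ["*.orig"]  # e.g. a commit from before target/ was ignored
untracked = ["target/debug/app", "notes.orig", "src/new.rs"]

flagged = newly_unignored(untracked, old, new)
for p in flagged:
    print(f"warning: {p} was ignored in the previous commit but is not ignored here")
```

Only `target/debug/app` is flagged here: `notes.orig` is still ignored, and `src/new.rs` was never ignored, so it is an ordinary new file.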

@necauqua
Contributor

If that happens, we could just print a warning about it

The warning being "those differences are implicitly untracked for this WC, in case your direnv caused GBs of files to generate in target and/or .direnv - add them to gitignore here or explicitly track them, moving to another commit without them ignored (e.g. jj new) will cause them to be tracked in that commit"

^ this is a loose idea, could be refined, for example jj st will have that information ofc

@ilyagr
Contributor

ilyagr commented Aug 13, 2023

Following @ony's suggestion, perhaps when the working copy moves to a new commit, we could track "newly unignored" files by comparing the old .gitignore and the new .gitignore. For example, we could store an extra tree of "newly unignored" files.

Then, the UI could provide ways for dealing with these files. E.g. there could be a command like jj ignored --previously that lists them and jj ignored --previously --restore that gets rid of them. We'd also have to decide whether a modification to a "previously ignored" file makes it no longer "previously ignored".

@kevincliao
Contributor

kevincliao commented Jan 13, 2024

I ran into this today when trying to check out a different branch that doesn't contain node_modules. As people have mentioned above, at that point it's not possible to run any jj commands. I ended up creating a temporary .gitignore file before being able to jj op restore to a previous checkpoint. Is there a better way to recover when running into this? I wonder if it's possible to have jj op commands still work in this scenario.

@kevincliao
Contributor

Ahh ignore my comment, I think I got confused - once there is an option to not auto-add untracked files, jj op commands will work again.

@yuja
Contributor

yuja commented Jan 14, 2024

Is there a better way to recover when running into this? I wonder if it's possible to have jj op commands still work in this scenario.

You can pass --ignore-working-copy to these commands, but we don't have the last bit to reset the working copy without snapshotting yet.

jj op log --ignore-working-copy  # "jj op log" also works with the current main branch
jj op restore --ignore-working-copy @-
jj workspace update-stale --some-option-to-not-snapshot-before-resetting

@HadrienG2

HadrienG2 commented Jan 25, 2024

Overall, it seems there is no good automated way to handle untracked files when creating commits. On one hand, not tracking them leads to incomplete commits. On the other hand, auto-tracking them leads to committing unwanted files. So how would you feel about some variation of the following semi-automated design?

  1. By default, interactively prompt before auto-adding files, something like This command will add <list of files> to the current commit, proceed? [y/N].
    • Saying yes follows the current behavior.
    • Saying no aborts the command with an error return value and lets you use jj track and .gitignore as appropriate.
  2. Have a way to whitelist sets of files (e.g. source files) so that they are auto-added without a prompt, and not mentioned in prompts when they do occur.
    • This could take the form of a .jjadd file that uses the same glob syntax as .gitignore.
    • The aforementioned prompt would mention the possibility of configuring jj for auto-adding and ignoring.

I think this might strike a good balance between the following concerns:

  • Files which we want to track (like source files) eventually get auto-added silently as desired, avoiding incomplete commits and replicating the good parts of the current jj auto-add UX.
  • Files which we do not want to track (like object files, target/ directories...) eventually get ignored silently as desired, without undesirable creation of commits that will keep them in the history forever.
  • After a short initial configuration period, seeing the prompt becomes an exceptional event and thus leads the user to pause and think, as desired in this situation.

For scripted operation, there should be a way to provide a default answer to the prompt via CLI arguments.
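
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: `triage` and `decide` are hypothetical names, the allowlist stands in for the proposed `.jjadd` file, plain `fnmatch` globs stand in for gitignore syntax, and giving ignore rules priority over the allowlist is my own choice rather than something the proposal specifies.

```python
from fnmatch import fnmatch


def triage(new_files, auto_add_globs, ignore_globs):
    """Split new files into (auto_add, ignored, needs_prompt)."""
    auto_add, ignored, needs_prompt = [], [], []
    for path in new_files:
        if any(fnmatch(path, g) for g in ignore_globs):
            ignored.append(path)        # silently skipped
        elif any(fnmatch(path, g) for g in auto_add_globs):
            auto_add.append(path)       # silently added, never mentioned in prompts
        else:
            needs_prompt.append(path)   # the exceptional case worth a pause
    return auto_add, ignored, needs_prompt


def decide(needs_prompt, default_answer=None, ask=input):
    """Return True to proceed; default_answer models the scripted-mode CLI argument."""
    if not needs_prompt:
        return True
    if default_answer is not None:
        return default_answer
    files = ", ".join(needs_prompt)
    reply = ask(f"This command will add {files} to the current commit, proceed? [y/N] ")
    return reply.strip().lower() == "y"


auto_add, ignored, needs_prompt = triage(
    ["src/main.rs", "target/debug/app", "strace.log"],
    auto_add_globs=["src/*"],
    ignore_globs=["target/*"],
)
print(auto_add, ignored, needs_prompt)
```

With this split, only `strace.log` would ever reach the prompt, and a scripted caller could pass `default_answer=False` to abort instead of blocking on input.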

@martinvonz
Member

I'm personally quite happy with the current behavior (except for the behavior when updating to a commit with different .gitignore). It can be a bit annoying in the beginning, but once you've added the appropriate paths, I find that it works pretty well. Maybe others feel differently. But even if they don't, we may want to make it less annoying for new users by doing something like you suggest.

@ilyagr
Contributor

ilyagr commented Jan 25, 2024

I'm not very happy with the idea of the interactive prompt.

I think that if you edit a .gitignore, any subsequent jj command could trigger this prompt, including jj log. I tend to run an analogue of watch jj log in a tmux pane permanently, and I think this would work very badly with the interactive prompt. Firstly, I'll need to adjust the command to use the "scripted mode". In the "scripted" mode, if the default answer to the prompt is "yes", this goes back to users experiencing auto-add of untracked files. If it's "no", jj's view of the workspace could be out of sync with reality for a while (but, if we go with a prompt, I think this is the better option).

Other UIs will also do an analogue of jj log regularly. Every jj UI (e.g. VS Code plugin) would probably need to have a way of giving this prompt to the user, if we made this interactive.

@HadrienG2

Ah, yes, there's that. I knew that this design decision of having status commands modify the repository was fishy and going to cause problems someday...

@lf-

lf- commented Apr 12, 2024

This feature has sadly made me bounce off of jj immediately every time I try it, which is really unfortunate because I keep hearing such good things about it, and want to give it a genuine try.

Every single repository I work on regularly has various testing/strace-log/whatever files in its root. I actually don't mind auto tracking in src/, but in the root it just is not compatible with my workflow.

fwiw a workaround for this that I've not yet checked works on jj might be some kind of terrible thing like so in the .git/info/exclude:

/*
!src/
!Cargo.*

This workaround is quite bad indeed, and I would rather not have to reimplement the git index in .git/info/exclude to be able to use jj, though admittedly it would be with wildcards at least.

@ilyagr
Contributor

ilyagr commented Apr 12, 2024

I have no idea whether this would be helpful to you, but here's something that helped me a lot. I can't remember who had suggested it originally; it might be in the FAQ.

  • I added _ignore/* to ~/.config/git/ignore (~/.gitignore should also work).

    This is possibly not absolutely optimal (I have been wondering whether /_ignore/ would be better), but works well enough. I actually use _ilyaignore to make the name more unique.

  • Create an _ignore subdir in my repo

  • Save all weird logs and traces to it

@lf-

lf- commented Apr 12, 2024

Yup it is in the FAQ or something; I've seen it given as advice before. I just don't like it and it doesn't vibe with how I work, since it would be a whole bunch of extra typing. I could have it be i/ or something, I guess, to reduce typing, but I would still have to remember to do it every time, which feels kind of bad?

@ilyagr
Contributor

ilyagr commented Apr 18, 2024

Inspired by more feedback (#3528 (comment)), perhaps @dpc's suggestion from that post might work. Perhaps we could have a notion of "untracked" files, like Git, and default files to "untracked", while also auto-updating all the tracked files on each command?

jj status would certainly show any untracked files. jj log could too. It's not quite in the spirit of "everything is a commit" (untracked files would show up as a fake commit in some places), but might work.

One question is what jj diff would do. My first instinct would be to have it act on tracked files only, but complain loudly when there are untracked files. Perhaps each command would do that, I'm unsure.

This would be a huge change, so I almost certainly missed some important considerations.

@dpc

dpc commented Apr 18, 2024

This would be a huge change

Speaking out of ignorance, I'm guessing jj already needs to compare all worktree files against .gitignore. Right after that it could just compare them against files already tracked in the current change and ignore ones that are not. Plus a command to track a file. And that's kind of it, no? Showing untracked files, etc. seems like a nice-to-have. Deleting a tracked file could work as an "untrack", just like it already does.

Hmm... I guess mv <sometrackedfile> <newlocation> now requires explicitly calling "track" on a new location, which breaks the "immersion" a bit, but I think it's fine. And again - this behavior would be optional (but I would suggest making it the default for the sake of newcomers). People that have figured everything out could just opt into the current seamless behavior, which I find elegant and I'm sure I would eventually settle into it just fine, after making sure a given repo doesn't produce untracked trash, adding some ./tmp/ to .gitignore and remembering to create my debugging stuff inside it.
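
The opt-in flow described in this comment can be sketched as set operations. This is a toy model, not jj's snapshot code: `snapshot` and `track` are hypothetical helpers, and the ignored files are given as a precomputed set rather than derived from .gitignore.

```python
def snapshot(worktree_files, ignored, tracked):
    """Opt-in tracking: only files that are already tracked get snapshotted.

    A tracked file that has vanished from disk drops out of the result,
    which acts as an implicit "untrack".
    """
    return {p for p in worktree_files if p not in ignored and p in tracked}


def track(tracked, path):
    """Hypothetical explicit 'track' command: returns the extended tracked set."""
    return tracked | {path}


tracked = {"src/main.rs", "README.md"}
worktree = {"src/main.rs", "strace.log", "target/app"}  # README.md was deleted
ignored = {"target/app"}

tracked = snapshot(worktree, ignored, tracked)
print(sorted(tracked))  # ['src/main.rs']

tracked = snapshot(worktree, ignored, track(tracked, "strace.log"))
print(sorted(tracked))  # ['src/main.rs', 'strace.log']
```

Note that `strace.log` stays out of the snapshot until it is tracked explicitly, which is the behavior the untracked-files proponents are asking for.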

@martinvonz
Member

jj status would certainly show any untracked files. jj log could too. It's not quite in the spirit of "everything is a commit" (untracked files would show up as a fake commit in some places), but might work.

One question is what jj diff would do. My first instinct would be to have it act on tracked files only, but complain loudly when there are untracked files. Perhaps each command would do that, I'm unsure.

If we add support for untracked files, I think it should be pretty much only jj status that shows them. They would just be invisible to every other command. Would that work for the untracked-files proponents?

I guess mv <sometrackedfile> <newlocation> now requires explicitly calling "track" on a new location

I don't think so. Almost all commands, and probably also the future jj mv work on commits and just update the working copy to match afterwards.

@scott2000
Contributor

@ilyagr

If we are OK with the idea of "For efficiency, if all files in a directory are ignored, the directory path could be stored instead.", that would solve the problem of the .gitignore becoming a mile long

I think for .gitignore, this definitely isn't ok actually because they have different semantics. If I ignore dir/a.txt and it adds dir/ to the ignore file, then when I add "dir/b.txt" it would be surprising if it were ignored even though I didn't specify it as ignored. This same issue might also apply to storing the untracked file list in the working copy in some cases, so we'd have to be careful there as well.
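
A tiny Python sketch of that semantic difference, using a simplified matcher of my own (`ignored_by` is hypothetical: a rule ending in `/` covers the whole directory, any other rule matches one exact path):

```python
def ignored_by(path, rules):
    """Simplified matching (an assumption, not real gitignore syntax)."""
    return any(
        path.startswith(rule) if rule.endswith("/") else path == rule
        for rule in rules
    )


exact_rules = ["dir/a.txt"]   # what the user actually asked to ignore
collapsed_rules = ["dir/"]    # the "store the directory path instead" optimization

later_file = "dir/b.txt"      # created after the rules were written
print(ignored_by(later_file, exact_rules))      # False
print(ignored_by(later_file, collapsed_rules))  # True
```

Both rule sets agree at the moment they are written, when `dir/a.txt` is the only file in `dir/`, and they diverge as soon as a second file appears.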

Either way, I personally still don't feel very comfortable with jj automatically adding paths to the committed .gitignore file in general, because I like to keep my committed .gitignore file as clean as possible. For instance, I sometimes use comments to create section headers for files produced by different tools, and I like to add the simplest rule that can capture all of the necessary files, even if a longer list of files would have the same effect in practice.

I do see the value of such a command, but I would at least prefer for it to be something the user has to ask for explicitly (e.g. jj untrack --ignore <file> or jj ignore <file>) rather than something that could happen automatically during another operation.

@yuja
Contributor

yuja commented Oct 15, 2024

  • For the list of newly unignored files, I'd try to put some energy into making it work with .gitignore first, for reasons Joy described, but if there are insurmountable problems, we could store it in the working copy. I think that later rules always override earlier ones in .gitignore, which is the main reason I hope that storing things in .gitignore might work without the ability to parse .gitignore syntax.

There may be a negative pattern in sub directory's .gitignore. I don't remember the rule, but I think sub-dir rule is prioritized? That's another complexity I had in mind.

If we are OK with the idea of "For efficiency, if all files in a directory are ignored, the directory path could be stored instead.", that would solve the problem of the .gitignore becoming a mile long as in Auto-add of untracked files screws me up every time #323 (comment).

It would probably become simpler if the working copy tracked ignored directory/file paths. If the whole directory was previously ignored, and if it is now un-ignored, record it as new ignore path. However, this means the working-copy tracks ignored paths, so I think it's easier to accumulate ignored paths instead of updating user-managed .gitignore files. We'll need jj status support and some other commands to update/reset the cache of ignored paths, but we can instead get away from --ignore-policy options, which seems equally unintuitive.

@martinvonz
Member

There may be a negative pattern in sub directory's .gitignore. I don't remember the rule, but I think sub-dir rule is prioritized? That's another complexity I had in mind.

Yes, I think the rules are basically appended to the list from the parent and processed in reverse order, so later rules in a subdirectory or in an individual .gitignore file take priority.
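
To make that precedence concrete, here is a small sketch of "last matching rule wins" across a root and a subdirectory .gitignore. Scanning forward and keeping the last match is equivalent to scanning in reverse and stopping at the first match. The matcher is a simplification I am assuming for illustration: each pattern is a plain glob matched against the path relative to its .gitignore's directory, and many details of real gitignore syntax are not modeled.

```python
from fnmatch import fnmatch


def is_ignored(path, rule_files):
    """rule_files: list of (directory, patterns), ordered from the root down.

    Simplified model: '!' negates a pattern, and the last matching rule wins.
    """
    verdict = False
    for directory, patterns in rule_files:
        prefix = directory + "/" if directory else ""
        if not path.startswith(prefix):
            continue  # this .gitignore does not apply to the path
        rel = path[len(prefix):]
        for pat in patterns:
            negated = pat.startswith("!")
            glob = pat[1:] if negated else pat
            if fnmatch(rel, glob):
                verdict = not negated
    return verdict


rules = [
    ("", ["*.log"]),          # root .gitignore
    ("docs", ["!keep.log"]),  # docs/.gitignore re-includes one file
]
print(is_ignored("build.log", rules))       # True
print(is_ignored("docs/keep.log", rules))   # False
print(is_ignored("docs/other.log", rules))  # True
```

The subdirectory's negative pattern overrides the root rule only for `docs/keep.log`, which is the complexity @yuja was pointing at.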

@sheremetyev
Contributor

For one, what if the new commit has a tracked file where the working copy had an ignored file? This situation would already be a problem today, but I think it'll become far more common with your setup. E.g. it would be potentially problematic if you start on the main branch of jj, then jj new gh-pages, and then jj new main-.

@ilyagr that's a great example! I think it's reasonable to expect such file to remain untracked - because it's explicitly listed in the working copy's list of untracked files? Switching back to the main branch would put working copy in a good state (ignored file is still on disk). Intermediate situation on gh-pages branch is somewhat similar to the case where a file was committed previously, then user decided to stop tracking it but wants to keep the file on disk.

IIUC, to achieve such behaviour, the list of untracked files in the working copy should take precedence over the list of tracked files in the current commit - when determining tracked/untracked status for a file.
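
As a sketch of that precedence rule (purely illustrative: `is_tracked` and both sets are hypothetical, and jj has no such working-copy list today):

```python
def is_tracked(path, commit_tracked, wc_untracked):
    """The working copy's explicit untracked list wins over the commit's tracked files."""
    if path in wc_untracked:
        return False
    return path in commit_tracked


wc_untracked = {"index.html"}  # an ignored file left on disk from main

# On gh-pages the same path exists as a tracked file in the commit.
gh_pages_tracked = {"index.html", "style.css"}
print(is_tracked("index.html", gh_pages_tracked, wc_untracked))  # False
print(is_tracked("style.css", gh_pages_tracked, wc_untracked))   # True
```

Under this rule the on-disk `index.html` is left alone while on gh-pages, so switching back to main finds the ignored file still in place.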

It seems likely to me that, if we want the user to manually resolve cases when .gitignore does not match the actual ignore files (or a subset of them that is not marked as scratch files), we would be forced to create a jj state where it's not OK to jj new to another commit.

IMHO it's an important advantage of UX in Jujutsu that user can always switch to another commit - would be nice to preserve it

@robinst
Contributor

robinst commented Oct 18, 2024

Tip

Summary of current state for anyone landing here

If you use jj with git and want to not automatically track any files, since #4338 you can set this configuration:

jj config set --user 'snapshot.auto-track' 'none()'

Then to see untracked files:

git status

To add an untracked file:

jj file track myfile.txt

(The work remaining for this IIUC is to not require a git status to see untracked files.)

tmeijn pushed a commit to tmeijn/dotfiles that referenced this issue Oct 18, 2024
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [martinvonz/jj](https://github.com/martinvonz/jj) | minor | `v0.21.0` -> `v0.22.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>martinvonz/jj (martinvonz/jj)</summary>

### [`v0.22.0`](https://github.com/martinvonz/jj/releases/tag/v0.22.0)

[Compare Source](jj-vcs/jj@v0.21.0...v0.22.0)

##### Breaking changes

-   Fixing [#4239](jj-vcs/jj#4239) means the
    ordering of some messages has changed.

-   Invalid `ui.graph.style` configuration is now an error.

-   The builtin template `branch_list` has been renamed to `bookmark_list` as part
    of the `jj branch` deprecation.

##### Deprecations

-   `jj branch` has been deprecated in favor of `jj bookmark`.

    **Rationale:** Jujutsu's branches don't behave like Git branches, which
    confused many newcomers, as they expected a similar behavior given the name.
    We've renamed them to "bookmarks" to match the actual behavior, as we think
    that describes them better, and they also behave similar to Mercurial's
    bookmarks.

-   `jj obslog` is now called `jj evolution-log`/`jj evolog`. `jj obslog` remains
    as an alias.

-   `jj unsquash` has been deprecated in favor of `jj squash` and
    `jj diffedit --restore-descendants`.

    **Rationale:** `jj squash` can be used in interactive mode to pull
    changes from one commit to another, including from a parent commit
    to a child commit. For fine-grained dependent diffs, such as when
    the parent and the child commits must successively modify the same
    location in a file, `jj diffedit --restore-descendants` can be used
    to set the parent commit to the desired content without altering the
    content of the child commit.

-   The `git.push-branch-prefix` config has been deprecated in favor of
    `git.push-bookmark-prefix`.

-   `conflict()` and `file()` revsets have been renamed to `conflicts()` and `files()`
    respectively. The old names are still around and will be removed in a future
    release.

##### New features

-   The new config option `snapshot.auto-track` lets you automatically track only
    the specified paths (all paths by default). Use the new `jj file track`
    command to manually track paths that were not automatically tracked. There is
    no way to list untracked files yet. Use `git status` in a colocated workspace
    as a workaround.
    [#323](jj-vcs/jj#323)

-   `jj fix` now allows fixing unchanged files with the `--include-unchanged-files` flag. This
    can be used to more easily introduce automatic formatting changes in a new
    commit separate from other changes.

-   `jj workspace add` now accepts a `--sparse-patterns=<MODE>` option, which
    allows control of the sparse patterns for a newly created workspace: `copy`
    (inherit from parent; default), `full` (full working copy), or `empty` (the
    empty working copy).

-   New command `jj workspace rename` that can rename the current workspace.

-   `jj op log` gained an option to include operation diffs.

-   `jj git clone` now accepts a `--remote <REMOTE NAME>` option, which
    allows setting a name for the remote instead of using the default
    `origin`.

-   `jj op undo` now reports information on the operation that has been undone.

-   `jj squash`: the `-k` flag can be used as a shorthand for `--keep-emptied`.

-   CommitId / ChangeId template types now support `.normal_hex()`.

-   `jj commit` and `jj describe` now accept an `--author` option for quickly changing
    the author of the given commit.

-   `jj diffedit`, `jj abandon`, and `jj restore` now accept a `--restore-descendants`
    flag. When used, descendants of the edited or deleted commits will keep their original
    content.

-   `jj git fetch -b <remote-git-branch-name>` will now warn if the branch(es)
    can not be found in any of the specified/configured remotes.

-   `jj split` now lets the user select all changes in interactive mode. This may be used
    to keep all changes in the first commit while keeping the current commit
    description for the second commit (the newly created empty one).

-   Author and committer names are now yellow by default.

##### Fixed bugs

-   Update working copy before reporting changes. This prevents errors during reporting
    from leaving the working copy in a stale state.

-   Fixed panic when parsing invalid conflict markers of a particular form.
    ([#2611](jj-vcs/jj#2611))

-   Editing a hidden commit now makes it visible.

-   The `present()` revset now suppresses missing working copy error. For example,
    `present(@)` evaluates to `none()` if the current workspace has no
    working-copy commit.

##### Contributors

Thanks to the people who made this release happen!

-   Austin Seipp ([@thoughtpolice](https://github.com/thoughtpolice))
-   Danny Hooper ([@hooper](https://github.com/hooper))
-   Emily Shaffer ([@nasamuffin](https://github.com/nasamuffin))
-   Essien Ita Essien ([@essiene](https://github.com/essiene))
-   Ethan Brierley ([@eopb](https://github.com/eopb))
-   Ilya Grigoriev ([@ilyagr](https://github.com/ilyagr))
-   Kevin Liao ([@kevincliao](https://github.com/kevincliao))
-   Lukas Wirth ([@Veykril](https://github.com/Veykril))
-   Martin von Zweigbergk ([@martinvonz](https://github.com/martinvonz))
-   Mateusz Mikuła ([@mati865](https://github.com/mati865))
-   mlcui ([@mlcui-corp](https://github.com/mlcui-corp))
-   Philip Metzger ([@PhilipMetzger](https://github.com/PhilipMetzger))
-   Samuel Tardieu ([@samueltardieu](https://github.com/samueltardieu))
-   Stephen Jennings ([@jennings](https://github.com/jennings))
-   Tyler Goffinet ([@qubitz](https://github.com/qubitz))
-   Vamsi Avula ([@avamsi](https://github.com/avamsi))
-   Yuya Nishihara ([@yuja](https://github.com/yuja))

</details>

@simonmichael

simonmichael commented Oct 23, 2024

Thanks to @martinvonz and all the discussers for working on this. I'm able to use jj now because of the new setting.

@AngelEzquerra

For what it's worth, let me add a vote for changing the current behavior. While I understand that a lot of current users might really enjoy it, I agree with @durin42 that jujutsu's current default behavior is problematic in several ways (and in fact it tripped me up the very first time I tried jj). If you often create private files that you want to keep in your working directory without tracking them (i.e. files that you never want to share and which you don't want to disappear as you move to another changeset), the current default and the file untrack tooling are not very helpful (it should at least warn you when you untrack a file that is not already gitignored).
In addition to this, I worry that the current default is less safe than the alternative. It makes it way too easy to accidentally add a secret file to your repo without noticing. Yes, you can make that same mistake in git if you do git commit -a, but at least there you are doing a commit explicitly, so it makes sense to expect people to pay attention. In jujutsu it could happen implicitly when you do jj st or jj log, which is much easier and less explicit.
It is great that you can at least change the default, but defaults matter and I worry that this one might hamper jujutsu's future adoption.

@marc-h38

marc-h38 commented Jan 1, 2025

For what it's worth, let me add a vote for changing the current behavior.

I'm afraid you have not read the whole discussion. You should also read #4338.

This mega issue has now transitioned to discussing #5138. If you really want to start a "vote" to change the default (you do not), then create a new, specific issue. This issue has gone in too many directions and become too big and unreadable.

if you often create private files that you want to keep in your working directory without tracking them...

The consistent answer from some members of the jj community has been: (1) use .gitignore for temporary files with predictable paths; (2) if you absolutely need or want unpredictable file names, don't "pollute" your clone and move them outside. Keeping such files in the clone is seen as a "bad habit" that the software world as a whole should un-learn.

I'm not a fan of this either, but getting jj config set --user 'snapshot.auto-track' 'none()' already felt like a small miracle, so enjoy it and let's not push our luck. It's a single command that you run only once, no worse than git config --global user.email ...

@cstoitner
Contributor

cstoitner commented Jan 5, 2025

With #5138 jj status is still missing untracked files in untracked directories.

@cstoitner cstoitner reopened this Jan 5, 2025
@arxanas
Contributor

arxanas commented Jan 5, 2025

Revision of previous stance

I originally commented two years ago arguing against snapshotting untracked files. I no longer think that, for two reasons —

  • After having read through some of the papers and transcripts for Investigating & Archiving the Scholarly Git Experience, I now believe that auto-snapshotting untracked files is the correct default for new users.
    • Regarding representation: Many software engineers have provided their thoughts in this thread, but the class of novice VCS users simply won't be represented here.
    • At a deeper level: We ought to clarify the design principles for jj.
      • Is it supposed to target non-software-engineers?
      • That would determine what kinds of trade-offs we should make in the interest of accessibility.
      • I don't know if the jj design principles have been codified.
  • After having spent more time using jj, it doesn't seem like as severe a design problem as I originally thought.
    • Not snapshotting untracked files is a form of 'losing work', which is a worse sin for VCS in my opinion than 'publishing secrets'.
      • To this day, I somehow still manage to not track newly-added files in Git somewhat regularly. It tends to happen when I'm switching between a bunch of commits in a large stack and making changes.
      • The consequences are substantially worse for novices, who may take indefinitely long to realize that they didn't commit and publish everything they intended to. (Such users don't typically have a CI system at their disposal to automatically validate commits, for example.)
    • However: There are clear UX and performance issues which remain unsolved.

Design considerations

I skimmed through the thread and tried to collect the important design considerations:

  • Performance: handle large files
  • Performance: handle scanning many files
  • Correctness: track history of all important files
  • Correctness: transmit/push all important files
  • Correctness: don't leak secrets
  • Correctness: don't lose work when switching commits/merging with new working copy
  • Workflow: keep secrets
    • API keys
    • Encryption keys
    • Credentials
    • Notes
  • Workflow: ignore junk
    • Build artifacts
    • Editor temporary files
  • Workflow: preserve ‘precious’ files
    • Log files
    • Notes
    • Scripts
  • Workflow: list/query untracked files
  • Workflow: start tracking untracked files

Let me know if I missed anything.

Unifying designs

While I'm fine with the current set of workarounds (snapshot.auto-track, showing untracked files in jj status), I think we ought to adopt a more general design that handles untracked files along with other kinds of files, so that we don't have to invent a new design for every kind of file in the working copy. I previously collated such a list here: #4060

Example alternative design

To demonstrate similarity for a general design, I'll claim that untracked files aren't that different from LFS files:

  • Both involve not storing the content in the commit literally.
  • Instead, there's merely a handle to a place you can fetch the content, if desired.
    • In the case of LFS, that's probably some remote store.
    • In the case of untracked files, that's the current working copy.
  • It would be good to handle them using as many shared concepts and as much shared infrastructure as possible.

Here's one example of a way to generalize the design (but note that it's not a fully fleshed-out design). Suppose we added object types for each of the following (among possibly others from the previously-linked listing):

  • untracked files
  • ignored files
  • precious files
  • large files (like Git LFS)
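
To make the list above concrete, here is a minimal Python sketch of what such object types might look like. Everything in it is hypothetical: `EntryKind`, `TreeEntry`, and `content_id` are illustrative names, not part of jj or its backends, and the rule that only tracked entries carry content is an assumption of the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntryKind(Enum):
    """Hypothetical per-path states; none of these names come from jj."""
    TRACKED = "tracked"      # content stored in the commit as usual
    UNTRACKED = "untracked"  # present on disk, not yet categorized
    IGNORED = "ignored"      # junk such as build artifacts
    PRECIOUS = "precious"    # kept on disk, never shared
    LARGE = "large"          # content lives in a remote store (LFS-like)


@dataclass(frozen=True)
class TreeEntry:
    path: str
    kind: EntryKind
    # Assumption: only TRACKED entries carry content in this sketch; every
    # other kind is just a handle saying "the content lives elsewhere".
    content_id: Optional[str] = None

    def stores_content(self) -> bool:
        return self.kind is EntryKind.TRACKED


snapshot = [
    TreeEntry("src/main.rs", EntryKind.TRACKED, content_id="abc123"),
    TreeEntry("notes.txt", EntryKind.UNTRACKED),
    TreeEntry("target", EntryKind.IGNORED),
    TreeEntry(".env", EntryKind.PRECIOUS),
]
# Paths that appear in the snapshot only as handles, without their content.
handles = [e.path for e in snapshot if not e.stores_content()]
```

In this toy snapshot, `handles` collects the three paths whose content is not stored in the commit, which is the property the LFS comparison relies on.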

Evaluation

If untracked/ignored/precious files are represented similarly to normal files in the snapshots, then you also resolve most of the above design considerations:

  • Performance: if the file is large, you don't need to store its full contents, just a marker that it exists on disk
  • Performance: to avoid scanning too many files, you can mark a tree-style object as "untracked" rather than a blob-style object, indicating that it's a directory, and avoid descending into it.
  • Correctness: if we wanted, we could store at least the historical hashes of the files, although I don't see a specific reason to do so (so we might drop this criterion for untracked files).
  • Correctness: to ensure that users publish all work that they intended to, we can add a check to prevent pushing untracked files, in the same way we (presumably?) prevent pushing conflicts, and require the user to resolve the untracked files before proceeding.
  • Correctness: using the same workflow as the previous point, we can alert users to avoid pushing secrets.
  • Correctness: we get to resolve issues of switching commits and clobbering files "for free" (or at least with less work), because the attempt to switch commits would result in a real merge conflict (and for the case of conflicts with untracked files, we can always preserve the untracked file contents on disk, rather than attempting an auto-merge, to avoid losing information).
  • Workflow: we can keep secrets as per above, assuming the filename itself isn't confidential
  • Workflow: we can ignore junk files as per above
  • Workflow: we can preserve 'precious' files by leveraging the merging behavior when switching commits
  • Workflow: we can surface untracked files in jj status (if we want) by finding them in the snapshotted tree, rather than communicating them via an out-of-band mechanism in the implementation.
  • Workflow: we can start tracking an untracked file by changing the object type and re-snapshotting.
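
The scanning and jj status points above can be sketched roughly as follows, assuming a toy in-memory tree rather than jj's real tree objects. `Node`, its `kind` labels, and `untracked_paths` are hypothetical names. A directory marked untracked is recorded as a single leaf, so the walk never descends into it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    kind: str = "tracked"   # hypothetical labels: "tracked" or "untracked"
    is_dir: bool = False
    children: Dict[str, "Node"] = field(default_factory=dict)


def untracked_paths(node: Node, prefix: str = "") -> List[str]:
    """Collect untracked entries, treating an untracked directory as one leaf."""
    found: List[str] = []
    for name in sorted(node.children):
        child = node.children[name]
        path = prefix + name
        if child.kind == "untracked":
            # Report the marker itself and stop; its contents are never scanned.
            found.append(path + "/" if child.is_dir else path)
        elif child.is_dir:
            found.extend(untracked_paths(child, path + "/"))
    return found


root = Node(is_dir=True, children={
    "README.md": Node(),
    "src": Node(is_dir=True, children={
        "main.rs": Node(),
        "scratch.rs": Node(kind="untracked"),
    }),
    # A whole directory marked untracked: no children are recorded for it.
    "target": Node(kind="untracked", is_dir=True),
})
status = untracked_paths(root)
```

A status-like command could print `status` directly from the snapshotted tree, with no separate out-of-band list of untracked files.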

Versus other proposed designs

In contrast, I am less favorable towards a solution where we automatically update .gitignore or some other list of untracked files, because I'm not convinced it handles the design considerations. To give some examples: the semantics of switching commits when there are changes to .gitignore are quite complicated, and you still risk accidentally committing sensitive files in a way that can't be automatically surfaced for review later.

I'll also just remark that "untracked files" are pretty similar to "files which aren't checked out due to the sparse checkout configuration":

  • To stop tracking a file on disk, you could just remove it from your sparse checkout, so that the file's contents on disk aren't used for snapshotting.
  • I believe jj sparse checkouts don't support negative patterns currently, though.
  • From this perspective, the snapshot.auto-track configuration is pretty similar to specifying a sparse checkout via a fileset, so it seems unfortunate to have both systems (although I assume they're not identical).
  • An alternative to the above (fairly involved) design is to improve the features and usability of sparse checkouts so that you can use them effectively to preserve files in the working copy without jj trying to snapshot or clobber them. I would also be in favor of that over new configuration and files.

@sheremetyev
Contributor

@arxanas are you arguing for snapshotting of untracked files in the current commit or in the state of working copy? I think snapshotting in working copy would give a reasonable behaviour (#323 (comment))

@joyously

joyously commented Jan 5, 2025

If untracked/ignored/precious files are represented similarly to normal files in the snapshots, then you also resolve most of the above design considerations

Does this representation require different data than the existing backends save? If so, does that mean that data is only stored locally?
Can you give a definitive answer as to why there should be a difference between ignored and untracked? Is precious really a thing for a VCS to worry about?

@marc-h38

marc-h38 commented Jan 6, 2025

Deep and interesting thoughts, except maybe for this:

Not snapshotting untracked files is a form of 'losing work', which is a worse sin for VCS in my opinion than 'publishing secrets'.

Errr... no: "not shared/published yet" is definitely not the same as "losing work". That's a serious misrepresentation.

The funny thing is: this type of accident is absolutely nothing new. Whether it's the better or worse design choice, most (all?) VC systems have allowed it to happen so far. The only reason we're discussing it is because jj is (boldly) departing from this default behavior. So chances are, this type of accident has happened already at least once or twice to many people involved in this discussion. Which means they have a pretty good, practical knowledge and understanding of it and can't be misled about it by dramatization.

Regarding representation: Many software engineers have provided their thoughts in this thread, but the class of novice VCS users simply won't be represented here.

Very good point: with the (welcome) rise of markup languages and treating everything "as code", VC is becoming more popular than ever and reaching new types of users.

The consequences are substantially worse for novices, who may take indefinitely long to realize that they didn't commit and publish everything they intended to. (Such users don't typically have a CI system at their disposal to automatically validate commits, for example.)

Agreed: you could for instance publish a book accidentally missing a section. This requires a lack of automated checking of cross-references, a poor proof-reading process not even noticing that the table of contents doesn't match, etc. but none of these is hard to imagine.

For developers on the other hand, catching a forgotten source file does not even require any CI system in practice. If that new file serves some actual purpose (other than hijacking a shared repo to back up private resources...), then the miss is immediately noticed the moment ANYone or anything (not just CI) tries to compile or run tests etc[*]. Many developers have first-hand experience with this (= exactly why jj is trying to address it) and many treat it as an inconvenience that is very minor compared to the long list of autotracking drawbacks (not just sharing secrets). See, maybe there was a reason all systems before jj behaved this way. Maybe it was not just a lack of creativity, maybe it was because very many developers actually like it. We'll never know for sure.

So looking back, making this behavior configurable was practically necessary for jj to serve such a large spectrum of users. Well done.

There is also a profile of git commit -a developers who are already "auto" tracking everything "manually" and git push without looking. These will be delighted by the new convenience to accidentally share secrets and huge files :-)

[*] I'm not considering one-person projects that don't even have any sort of automated CI either. These have much, much bigger test issues than autotracking.

@necauqua
Contributor

necauqua commented Jan 6, 2025

So, since our discussions I've been using jj a lot (for everything), and I only recently - like a week or two recently - finally set the snapshot.auto-track = "none()" config.

And I've found the experience worse so far: I forgot to track files several times.
re: "the miss is immediately noticed" from the above comment - not on the local machine :)

I probably got used to autotrack and stopped having random files in my repos?

My personal issues were things like tracking a massive target/ folder or not having junk in oplog - for the first one I was saved multiple times by the 1mb heuristic (which I'm pretty sure I inspired here in gh comments somewhere), and the second one is just annoying ocd type of thing that I'm learning to ignore - same with jj snapshots of unfinished/messy work in general.

Also for massive target/ or other "incomplete gitignore"-type issues - they usually occur once and at the very worst you can nuke and reclone/recreate the repo.
Stuff like random logs or IDE files or crashdumps or other non-personal junk - should get gitignored too.

For the secrets that people keep bringing up - bruh I never in my life had secrets in source files, what (at least not those I'm ready to leak because they are for localhost env or smth)
Like it's always a gitignored .env file or private sub-config or something.
But anyway, you remove them from the working copy and resnapshot it - people do look at jj diff before/after they finalize or (especially) push a jj commit, right?
And if you don't want to have them in the oplog you nuke it too then, oh well.

I was a git add .-and-then-look-at-git status developer.
Actually I just thought that I would be immensely happy with a non-snapshotting jj st that had a category "files/changes to be snapshotted", but that sounds a bit silly.
Maybe an unofficial subcommand if/when/is-it-not-already the jj_lib/jj_cli crates are usable to make this 🙃

@arxanas
Contributor

arxanas commented Jan 6, 2025

To reiterate, in my previous comment, I was mainly proposing that we should establish a single unified design with a minimal number of additional concepts, rather than special-casing behavior for untracked files, as we've done so far; and then we should reuse the design for many other kinds of files.

In the comment below, I also point out what I consider to be remaining UX issues, which I should have raised first.


@arxanas are you arguing for snapshotting of untracked files in the current commit or in the state of working copy? I think snapshotting in working copy would give a reasonable behaviour (#323 (comment))

I'm not in favor of specializing the working copy to handle untracked files. I think it would be better to embed the concepts into commits generally. Some of the semantics (like when switching commits) will probably be the same regardless of whether we literally embed untracked files/handles into commits.

Does this representation require different data than the existing backends save? If so, does that mean that data is only stored locally?

I didn't flesh out the design too much, so there are probably multiple implementation routes:

  • In the simplest conception, it's literally a kind of object outside of Git's object model, so it can't be represented directly.
  • Therefore, you wouldn't be able to push it to a Git remote.
  • For untracked files, you could imagine having to 'resolve' untracked files before pushing commits that contain them.
    • This could entail either explicitly ignoring them or starting to track them.
    • Incidentally for the case of untracked files, marking them as tracked or ignored would convert the result into something representable in the Git object model, but not so for other kinds of files (such as Git LFS).
  • There are a number of other conceivable issues which I won't go into detail about here.
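
As a rough sketch of the "resolve before pushing" step described in this list, the Python below models a commit as a mapping from path to a kind label. All names (`GIT_REPRESENTABLE`, `unresolved`, `resolve`, `check_push`) and the kind labels are hypothetical, and the sketch assumes the simplest conception above, in which anything other than a tracked or ignored entry cannot be sent to a plain Git remote.

```python
from typing import Dict, List

# Assumption of this sketch: only these kinds can be expressed in Git's
# object model (tracked as a blob, ignored as simply absent).
GIT_REPRESENTABLE = {"tracked", "ignored"}


def unresolved(entries: Dict[str, str]) -> List[str]:
    """Paths whose kind cannot be sent to a plain Git remote."""
    return sorted(p for p, kind in entries.items() if kind not in GIT_REPRESENTABLE)


def resolve(entries: Dict[str, str], path: str, decision: str) -> Dict[str, str]:
    """Return a copy of `entries` with an untracked path tracked or ignored."""
    if decision not in GIT_REPRESENTABLE:
        raise ValueError(f"cannot resolve to {decision!r}")
    if entries.get(path) != "untracked":
        raise ValueError(f"{path!r} is not untracked")
    updated = dict(entries)
    updated[path] = decision
    return updated


def check_push(entries: Dict[str, str]) -> None:
    """Refuse to push while anything is still unresolved."""
    pending = unresolved(entries)
    if pending:
        raise RuntimeError("resolve before pushing: " + ", ".join(pending))


commit = {"src/lib.rs": "tracked", "notes.txt": "untracked"}
try:
    check_push(commit)
    blocked = False
except RuntimeError:
    blocked = True
resolved = resolve(commit, "notes.txt", "ignored")
check_push(resolved)  # no exception once every entry is categorized
```

The push is blocked until `notes.txt` is explicitly tracked or ignored, which defers the decision without letting it be skipped. A kind such as an LFS-like large file would stay unresolved in this model, matching the caveat that not every kind converts into something Git can represent.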

Can you give a definitive answer as to why there should be a difference between ignored and untracked? Is precious really a thing for a VCS to worry about?

Assuming that —

  • precious files are a useful workflow (I certainly think they are, and many others agree)
  • precious files need to live in the working copy
  • the VCS manages the working copy

then I don't see a way for the VCS to avoid worrying about precious files.

  • If filesystems in common use were more capable, and you could say "I want to operate on this directory, but with this set of overlaid files", then we might not have to worry about precious files in the VCS (or many classes of ignored and untracked files).
  • Sparse checkouts also address some of the same problems, as mentioned in my previous comment.
    • You could also just argue that sparse checkouts exist as a way to work around deficiencies in filesystems.

I'll admit that "untracked" is perhaps the weakest-justified variant of the kinds of files in the working copy:

  • I think it mainly reflects the workflow that there are extra files in the working copy, and they haven't been categorized into "tracked"/"ignored"/"precious"(/"large" if necessary)/etc. yet.
    • I didn't list that as one of the explicit workflows, but maybe I should.
  • The main argument for keeping a separate "untracked" state is probably that Git and jj have determined that neither of the following are good defaults:
    • When you choose to ignore/not track everything by default, then people can and do fail to commit and push important work.
    • When you choose to track everything by default, then people can and do commit and push large/secret files.
  • I think there is real value in being forced to explicitly choose whether a file should be tracked or not — at some point in the development lifecycle.
    • I think it should be later than it currently is with jj, but earlier than "never" as with Git (which doesn't force you to make a decision about untracked files at any point).

Errr... no: "not shared/published yet" is definitely not the same as "losing work". That's a serious misrepresentation.

I agree that a "not shared/published yet" state is definitely not the same as "losing work", but untracked files (in either Git or jj) do not represent a "not shared/published yet" state. They certainly should.

Actually I don't recall seeing a UI affordance in this thread that would help mitigate this issue? Maybe somebody already proposed this in the thread? I think the combination of

  • deferring the categorization of untracked files into more specific classes
  • prohibiting pushes containing untracked files

would help both users who prefer to auto-track and those who prefer not to auto-track. I believe the current set of solutions let users opt into or out of auto-tracking entirely, but it seems like a local maximum to unblock today's users, rather than the ideal UX.

The combination of the above is actually independent of the design and implementation, but in my previous post, I focused on the design and implementation rather than what I consider to be a remaining UX issue, which was probably a rhetorical mistake.

(There are some UX reasons to slightly prefer embedding untracked files in commits, rather than strictly as a property of the working copy. For example: when you later discover that you meant to track a file, and you've already created a stack of commits, then you can automatically add the file to the commit which logically introduced the file, rather than having to figure out which commit should contain it later.)
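
A minimal sketch of that stack example, assuming each commit's snapshot records untracked paths as markers. The stack is a list of path-to-kind mappings ordered oldest first; `Snapshot`, `introducing_commit`, and `track_from_introduction` are hypothetical names, not jj APIs.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, str]  # path -> hypothetical kind label


def introducing_commit(stack: List[Snapshot], path: str) -> Optional[int]:
    """Index of the oldest commit in the stack whose snapshot mentions `path`."""
    for index, snapshot in enumerate(stack):
        if path in snapshot:
            return index
    return None


def track_from_introduction(stack: List[Snapshot], path: str) -> List[Snapshot]:
    """Return a new stack where `path` is tracked from its introducing commit on."""
    start = introducing_commit(stack, path)
    updated = []
    for index, snapshot in enumerate(stack):
        copy = dict(snapshot)
        if start is not None and index >= start and copy.get(path) == "untracked":
            copy[path] = "tracked"
        updated.append(copy)
    return updated


stack = [
    {"lib.rs": "tracked"},
    {"lib.rs": "tracked", "helper.rs": "untracked"},
    {"lib.rs": "tracked", "helper.rs": "untracked", "main.rs": "tracked"},
]
first = introducing_commit(stack, "helper.rs")
fixed = track_from_introduction(stack, "helper.rs")
```

Because the marker is already present in the middle commit, the sketch can place `helper.rs` there without the user having to work out which commit should contain it.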

Agreed: you could for instance publish a book accidentally missing a section. This requires a lack of automated checking of cross-references, a poor proof-reading process not even noticing that the table of contents doesn't match, etc. but none of these is hard to imagine.

For developers on the other hand, catching a forgotten source file does not even require any CI system in practice. If that new file serves some actual purpose (other than hijacking a shared repo to back up private resources...), then the miss is immediately noticed the moment ANYone or anything (not just CI) tries to compile or run tests etc[*]. Many developers have first-hand experience with this (= exactly why jj is trying to address it) and many treat it as an inconvenience that is very minor compared to the long list of autotracking drawbacks (not just sharing secrets). See, maybe there was a reason all systems before jj behaved this way. Maybe it was not just a lack of creativity, maybe it was because very many developers actually like it. We'll never know for sure.

Perhaps you didn't intend it this way, but the wording here seems to underplay the severity of failing to push work.

  • I've had situations where I pushed from one machine and pulled from another and — certainly, I discovered that I failed to push what I intended, but that still means that I wasted a bunch of time, perhaps being unable to do any work on a long flight, when the situation was completely avoidable by earlier intervention via a better design.
  • In the worst cases, I've actually deleted worktrees or repos, for example, thinking that all the necessary files had been committed or pushed, only to discover at some later time that I had lost some important work.

@marc-h38

marc-h38 commented Jan 6, 2025

Perhaps you didn't intend it this way, but the wording here seems to underplay the severity of failing to push work.

I do intend to correct your dramatization.

I've had situations where I pushed from one machine and pulled from another and — certainly, I discovered that I failed to push what I intended, but that still means that I wasted a bunch of time, perhaps being unable to do any work on a long flight, when the situation was completely avoidable by earlier intervention via a better design.

No one disputes that this is an inconvenience. The real questions are "how often?" and "how bad?" I worked for a few decades with many different engineers and none complained like it was a serious issue. To be fair, none were exposed to autotracking yet and I bet some will like it (especially the git commit -a minority). But that does not mean they considered forgetting to push a file as something significantly affecting their work on a regular basis. Simply because it was a very rare event.

In the worst cases, I've actually deleted worktrees or repos, for example, thinking that all the necessary files had been committed or pushed, only to discover at some later time that I had lost some important work.

I wasn't going to reply except for this other, new misrepresentation. When you carelessly delete git clones, you lose much more than uncommitted files: you ALSO lose unpushed branches and stashes! These are 1) more likely to be present than uncommitted files 2) likely losing more work than uncommitted files. git clones MUST always be carefully inspected before deletion no matter what. Autotracking makes a very small difference to that. If you carelessly delete git clones then you get what you deserve - autotracking or not.

@emilazy
Contributor

emilazy commented Jan 6, 2025

I said this before, @marc-h38, but I really don’t think your tone here is appropriate. You’ve now accused multiple long‐term contributors and users who have engaged at length with the issues surrounding file tracking of arguing in bad faith and lacking experience. We’ve had extensive and fruitful discussions with people strongly opposed to automatic file tracking like @AngelEzquerra, and this is one of the most extensively‐discussed issues in Jujutsu, but I am disappointed that you have repeatedly chosen to turn up the temperature on the discussion.

To be clear, I don’t at all mind people having strong opinions on this matter; it’s comments like this I find unacceptable (and I am not saying that I think that none of these could ever be reasonably said – but I do think that they are unwarranted by the discussions here):

I'm afraid it's not possible to assume good faith anymore.

My problem is: pretending to help while actually doing the opposite. This is time-consuming, not productive and not respectful.

That's a serious misrepresentation.

Which means they have a pretty good, practical knowledge and understanding of it and can't be misled about it by dramatization.

If you carelessly delete git clones then you get what you deserve

I have no position of power in the project beyond the merge bit and am speaking solely for myself here; I just think it is harmful for the project if this kind of thing goes by unremarked upon, and stifles productive discussion of the options and trade‐offs. I think you’ve made plenty of useful and constructive contributions to the discussion here, but if you’re going to continue, please make a greater effort to be civil and not treat conversations as fights. FWIW, @arxanas, I really appreciate your detailed exploration of the design space as always, and think that explicitly representing untracked files may be a really nice way forward that gets much closer to satisfying everyone.

@marc-h38

marc-h38 commented Jan 6, 2025

Thanks @emilazy, I do not believe @arxanas's presentation of this particular issue was in bad faith; sorry if a bad choice of words gave that impression. I admit I'm getting tired of it being repeatedly presented as a scary and critical issue when it has actually been the (good or bad) routine for the last few decades. This tiredness has negatively affected my tone, apologies.

@necauqua
Contributor

necauqua commented Jan 6, 2025

"it has been the routine for the last few decades" is not the best argument imo.

jj explores new things without the burden of having decades of said legacy and so far the autotrack - while being a contentious point - has been overall a net positive in my opinion and in the opinion of lots of people, and as you might remember I wasn't the biggest fan.

I do agree that some people (cough, @PhilipMetzger, cough - if I'm not super-misremembering) did present no-autotrack as electronic satan at times, but this whole argument often devolves into philosophical battles that are unwinnable, because autotracking and not autotracking are both valid strategies with pros and cons, and I think in the context of jj autotracking makes more sense.

People who have established workflows that depend on not autotracking, or who are not ready for autotracking (as I feel I was), got the config option (that I argued strongly for).

I agree that a person who has never heard of a VCS before, starting with jj from scratch, would be better off with autotrack being the default state of things, and they'll learn the pitfalls - the only major one being accidental big files imo, and that's covered by the heuristic thing. And if they committed and pushed some secret, they'd do the same thing in git; I think I said on Discord that I see the chances of that happening as equal between jj and git.

@arxanas
Contributor

arxanas commented Jan 6, 2025

No one disputes that this is an inconvenience. The real questions are "how often?" and "how bad?" I worked for a few decades with many different engineers and none complained like it was a serious issue. To be fair, none were exposed to autotracking yet and I bet some will like it (especially the git commit -a minority). But that does not mean they considered forgetting to push a file as something significantly affecting their work on a regular basis. Simply because it was a very rare event.

To the questions of "how often?" and "how bad?", I propose the answers are "often" and "it concretely and noticeably impedes productivity".

Empirical evidence from Gap Analysis of the Scholarly Git Experience (Nguyen 2021) suggests that novices are constantly doing this, specifically that "all" participants (n = 44?) had trouble with add/commit/push workflows:

There is a clear disconnect between understanding of the Git terminology (i.e. mental model) and the commands and/or buttons provided in CLI and GUI. Adding to the feature request list, participants all admitted that they often forget to git pull in order to update their own version of the file, as well as git add, commit, and push to save their changes. This was most salient for minimal users who expect software connected to a cloud server or available on the internet will automatically save updates. Some participants suggested that git provide notifications to remind users to pull and commit.

It's clear from the available evidence that the default UX is not good, and simply doesn't match user expectations, and that these issues arise quite regularly in practice.

  • I believe this establishes frequency and severity to a reasonable degree.
  • [philosophical claim] The interviews were conducted on novices, but I claim that UX deficiencies do not disappear for experienced users simply because they have grown accustomed to them.
    • Given that experienced engineers continue to encounter the same issues (evidenced in this thread, although certainly at a significantly lesser frequency), I think it's clear that many of them are working around a UX deficiency.

In addition: Many experienced engineers leverage the existing Git behavior to implement 'secret'/'ignored'/'precious' workflows.

  • You won't see those workflows addressed in this specific body of research, since they're more advanced, and novices are unlikely to use them.
  • Nonetheless: I believe that most people in the thread agree that we should support these workflows somehow, while addressing the other UX problems.

In terms of how to resolve the UX problem, we can look at What’s Wrong with Git? A Conceptual Design Analysis (De Rosso, Jackson 2013), which provides ideas on how to construct a framework to evaluate potential solutions. To give an example:

Gitless eliminates the concept of an assumed unchanged file by making the concept of an untracked file more general. This addresses the rough edges described in Sec. 5.3 caused by the violation of the generality (and propriety) criteria.

[technical note] This specific quote is referring to collapsing "assume-unchanged" and "untracked" into the same state. I'm actually arguing that we don't do exactly that, but my preferred design still tries to preserve 'generality' by generalizing across various file states and 'propriety' by deferring categorization/state transitions until later.

[general note] We could adopt the same principles and evaluate our solutions under them, or establish our own. I proposed design principles and concrete workflows in my earlier comment, although at a lower level of abstraction than in this paper.

[specific note] Regarding auto-tracking in its current form (including snapshot.auto-track):

  • I claim that it violates at least one of my 'correctness' constraints (roughly corresponding to 'propriety' in the above paper).
  • I argue that any configuration at present will fail to support either the "secrets" workflow or the "don't lose work" workflow (or both).
  • I argue that my proposed solution UX (generalize and categorize untracked/ignored/secret/precious files + defer categorization of such files until later in the development process) meets the correctness constraints with better UX.

I wasn't going to reply except for this other, new misrepresentation. When you carelessly delete git clones, you lose much more than uncommitted files: you ALSO lose unpushed branches and stashes! These are 1) more likely to be present than uncommitted files 2) likely losing more work than uncommitted files. git clones MUST always be carefully inspected before deletion no matter what. Autotracking makes a very small difference to that. If you carelessly delete git clones then you get what you deserve - autotracking or not.

[micro-critique] I have many quibbles about the argumentation here but ended up removing them during editing of this comment.

[macro-critique] Basically, this viewpoint seems to boil down to "the operator should exercise more care". I don't think this is a productive way to approach improving operational safety.

I don't know if there's a name for this philosophy, but typically, I don't expect processes to improve simply by telling the operators that they should be more careful. Instead, I rely on improvements to the processes themselves, which is primarily what we're discussing in this thread.

Here I am proposing: "this is a specific way that I could avoid losing work in situations that happen to me personally and, I believe, to others".

  • It should improve operational safety and reduce the impact of operator error.
  • If you want to claim that such a design is not worth the implementation effort, conceptual complexity, or makes some other problematic trade-off, then go ahead. I certainly don't think that my design is without significant potential cost.

It's not that it's necessarily incorrect (well, I have quibbles) to say something like "you are being careless", but it's not useful, either.


I admit I'm getting tired of it being repeatedly presented as a scary and critical issue when it has actually been the (good or bad) routine for the last few decades.

From an argumentation perspective, I think there are two problems here:

  • There is no dichotomy between being a "scary and critical issue" and being a "routine for the last few decades".
    • An issue being routine does not automatically mean that it's not severe.
    • Treating it as such in this thread will surely exhaust you and your collaborators.
  • You cannot unilaterally establish for others whether an issue is "scary and critical".
    • I personally (along with novices) have the "scary and critical issue" of losing work in common workflows.
    • Others have the "scary and critical issue" of accidentally publishing secrets, which I rarely encounter or find to be severe.
    • Although I rarely personally encounter issues with the secrets workflow, I've assessed it as a realistic issue, and have tried to accommodate it in my design.
    • I think it's fair that you claim "this issue does not happen often in practice, and therefore the design should not optimize for it", for which I've tried to provide an empirical counter-argument above.
    • I don't think it's fair for you to claim that an issue is "not scary and critical" on someone else's behalf, regardless of what the issue is.

EDIT: By the way, statements like "I do intend to correct your dramatization." seem pretty hostile to me. As an example, you could instead say something like "I believe you're overstating the severity of this problem in practice." to communicate the same (?) information. Since I don't know you or your communication style personally, it's quite difficult for me to ascertain if you meant that statement to be less emphatic than I interpreted it to be.

@pfmooney

pfmooney commented Jan 6, 2025

In addition: Many experienced engineers leverage the existing Git behavior to implement 'secret'/'ignored'/'precious' workflows.

* You won't see those workflows addressed in this specific body of research, since they're more advanced, and novices are unlikely to use them.

* Nonetheless: I believe that most people in the thread agree that we should support these workflows somehow, while addressing the other UX problems.

Thanks for highlighting cases such as this.

I work in a repo which, for legacy reasons, has a galactic number of untracked files during/after a build (to the point of basically requiring status.showuntrackedfiles=no in the local config). Prior to the addition of auto-track = "none()", making use of jj for that work would have been a total non-starter. The nature of those untracked files is such that excluding them via .gitignore is not feasible.

Having to set that in the config is not an unreasonable burden, IMO. In the longer term, bringing the associated machinery up to par with git (forcing the untracked files back into visibility with git status -unormal, for example), and including more detail about that sort of workflow in the docs would probably make it a pretty smooth experience for those who need it.
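
For reference, a minimal sketch of the setup described above. It assumes a jj version recent enough to have the snapshot.auto-track setting and the jj file track command; the file path is a placeholder, not something from this repo:

```shell
# Per-repo setting: do not auto-track new files when snapshotting.
jj config set --repo snapshot.auto-track "none()"

# New files now have to be tracked explicitly (placeholder path).
jj file track path/to/new_file

# Git-side counterpart mentioned above: hide untracked files in `git status`.
git config status.showUntrackedFiles no
```

With this in place, build outputs stay out of the working-copy commit unless they are tracked by hand.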

@arxanas
Contributor

arxanas commented Jan 6, 2025

I work in a repo which, for legacy reasons, has a galactic number of untracked files during/after a build (to the point of basically requiring status.showuntrackedfiles=no in the local config). Prior to the addition of auto-track = "none()", making use of jj for that work would have been a total non-starter. The nature of those untracked files is such that excluding them via .gitignore is not feasible.

Thanks for pointing out your use-case. It sounds relevant for the "Performance: handle scanning many files" criterion. Some questions:

  • Do you know roughly how many tracked files are in the repo?
  • Do you know roughly how many untracked files are generated?
  • Are the untracked files scattered among tracked directories, or do they happen to be isolated to untracked directories (such as a build/ subdirectory for each source directory)?
  • Do you use Watchman in that repo?
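
Rough answers to the first three questions can probably be gathered with plain Git commands. A sketch, assuming a Git-backed checkout; the function names are made up for illustration:

```shell
# Hypothetical helpers; each takes the path to a Git work tree.

# Number of files tracked in the index.
count_tracked() {
  git -C "$1" ls-files | wc -l | tr -d ' '
}

# Number of untracked files, excluding anything matched by ignore rules.
count_untracked() {
  git -C "$1" ls-files --others --exclude-standard | wc -l | tr -d ' '
}

# Same, but a directory containing only untracked files counts once.
count_untracked_collapsed() {
  git -C "$1" ls-files --others --exclude-standard --directory | wc -l | tr -d ' '
}
```

If the collapsed count is much smaller than the full untracked count, the generated files mostly live in wholly untracked directories rather than being scattered among tracked ones.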

If I were to implement some "first-class untracked-files" solution like I was proposing, the necessity of status.showuntrackedfiles=no suggests that there may be so many files that trying to include them in the commit (even if not scanning their contents) could be problematic. (I think there might still be many ways to design for it, such as the existing auto-track config, but also via Watchman, "default-exclude-everything" .gitignore configuration, or "narrow" sparse checkouts.)
