Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many changes for improved flexibility #8

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

aspiers
Copy link
Collaborator

@aspiers aspiers commented May 8, 2013

This patch series is a collection of functionality enhancements, bug fixes, and refactorings. The highlights are:

  • --delete is now optional.
  • Arbitrary arguments can be passed to rsync.
  • File removal is handled separately via run_on_removals
  • The :excludes option is now an Array of Strings, not a Hash of Regexps. This is a backwards-incompatible change; however it should be possible to reintroduce support for Hashes prior to the next release.

This allows new use cases, in particular where multiple source trees are rsync'd to a single merged destination tree, e.g.:

crowbar/crowbar#1795

Previously the presence of --delete would have meant that any change in one of the source trees would cause files from all the other trees to get wiped from the destination.

Adam Spiers added 17 commits May 7, 2013 13:23
guard/guard@306bd4de deprecated the 'ignore_paths' directive
previously available in the Guardfile DSL, in favour of new 'ignore'
and 'filter' directives which accept regexp parameters:

    guard/guard@306bd4de

This means it is no longer practical to propagate the values to
rsync's exclude file, because there is no easy way to convert from
regexps to exclude patterns.

Even if there was, it would not be correct to do it, because 'ignore'
regexps are intended to control the guard listener, which is
independent to the expected behaviour of the rsync invocation.

For example, the user might want certain files only to be rsynced via
the 'run_all' action, and not when they individually change.  This
could be achieved by specifying them in an 'ignore' directive, but not
in guard-rsync's ':excludes' option.

Or the user might choose to compile .sass files to CSS via a
run_on_changes_begin callback:

    https://github.com/guard/guard/wiki/Hooks-and-callbacks

and then rsync only the resulting .css files to the target.  In this
case, the 'ignore' directive could ignore changes to .css files (or
the 'filter' directive could ignore all changes except those to .sass
files) and then the rsync would either exclude .sass files or only
include .css files.
This allows the rsync command to be forked without any quoting issues
which arise through the existence of an intermediate shell process.
The :excludes option expected a Hash whose keys were Regexps.  This
had a number of issues:

  1. It required iterating over the whole source tree to find which
     files matched; this could be costly if the tree is large.

  2. The values were expected to be nil, false, or Procs which would
     transform a matching file path into another file path to be
     excluded.  But this syntax is ugly when you just want to exclude
     a single thing, e.g.

       :excludes => { '.git' => false }

     and didn't actually buy anything useful, since the exclusions
     caused by calling the Proc could be equivalently achieved via
     another Regexp.

  3. It's counter-intuitive since most users already know rsync
     accepts *lists* of exclude patterns, not hashes.

Therefore we simply switch the exclude list to be an Array of native
rsync exclude patterns.  Later we can add back support for Regexps in
case the convenience afforded to the user outweighs the
Dir.glob('**/*') performance hit.
This well-known rsync behaviour can be very useful in many situations,
so it doesn't make sense to intentionally cripple the way guard-rsync
wraps around it.
It's a perfectly valid use case that the input path might be a single
file, or a shell-like expansion pattern (rsync will accept things like
"foo{1,2}").
This paves the way for graceful handling of removal events,
where the include/exclude list will be completely different.
In a guard group containing n Guard::Rsync guards, if
:run_group_on_start is true, at start time each guard not only runs
itself but also the n-l other Guard::Rsync guards via the
::Guard.guards.each loop, resulting in a total of n^2 invocations of
run_all.  This is easily fixed by skipping any Guard::Rsync guards
inside the loop.
run_all is called less frequently, and unlike run_on_change etc., it
is not triggered by a listener event.  Therefore the deltas are likely
to be larger and the consequent output verbosity too high if -v is
used.
If there are multiple guards, this makes it clearer which one failed.
@lukemelia
Copy link
Collaborator

Adam, thanks for the PR. There is a lot here, and @kselden and I would like to work through them in pieces. Do you mind breaking up the 4 bullet-pointed features in your description into 4 separate PRs and we can provide focused feedback and merge each when ready?

@krisselden
Copy link
Owner

Thanks for the PR.

Here is a couple quick points of feedback.

  • The reasoning behind ensure_no_trailing_slash(options[:input]) is because the difference between trailing slash and no trailing slash (start at directory then recurse vs start at the files in the directory then recurse) affects --delete and --exclude-from. The --delete flag only has an effect recursing into a directory so trailing slash means the root directory is unaffected. The root of absolute exclusions is also affected and if you don't start an exclusion with a slash it is then matched anywhere in the file.
  • By moving --delete to just run_on_removals then it can't catch up on deletes that happened while guard was not running (this is the point of run on start).
  • The Hash option of --excludes was so that it could ignore files covered by other guard tasks in the group, so you could for example exclude js files that came from coffee but include other js files.

Feel free to contact me via email or google chat to discuss further.

@aspiers
Copy link
Collaborator Author

aspiers commented May 9, 2013

[Deleted and reposted comment; apparently replying via e-mail disables Markdown - bah 👎]

Hi Kristofor,

  • The reasoning behind ensure_no_trailing_slash(options[:input]) is because the difference between trailing slash and no trailing slash (start at directory then recurse vs start at the files in the directory then recurse) affects --delete and --exclude-from. The --delete flag only has an effect recursing into a directory so trailing slash means the root directory is unaffected. The root of absolute exclusions is also affected and if you don't start an exclusion with a slash it is then matched anywhere in the file.
  • By moving --delete to just run_on_removals then it can't catch up on deletes that happened while guard was not running (this is the point of run on start).

Yes, I'm aware of those points. However, for some use cases (like the example I gave) the trailing slash does make sense, which is why I strongly believe the choice should be left up to the user. And it is very likely that someone who chooses to use guard-rsync already has good experience of using rsync, and is therefore aware of these subtleties. Therefore applying the Principle of Least Surprise, guard-rsync will provide the most intuitive user experience when it is an un-opinionated transparent wrapper around rsync and doesn't do any unexpected munging of the input parameters.

This is also a good justification for allowing extra flexibility in which options are passed to rsync. For example omitting --delete by default still allows the user to specify it, but does not force it down the user's throat. This is particularly important for --delete and other dangerous options which carry a risk of data loss.

In my use case, the omission of --delete is very specifically required for reasons already explained in my pull request's description, and the resulting inability to catch up on deletes which happened prior to guard start is a necessary and worthwhile trade-off.

  • The Hash option of --excludes was so that it could ignore files covered by other guard tasks in the group, so you could for example exclude js files that came from coffee but include other js files.

Ahhh, now I understand; that's nice. So my comment in the comment message for f8fead4 "[that] didn't actually buy anything useful" was wrong. However I still think that all my other points apply. I actually wanted to add back support for Hashes in --excludes in order to avoid breaking backwards compatibility, but didn't get around to that yet. But now I am thinking that this format would be best:

    guard('rsync', {
      :input => 'apps_src/my_app',
      :output => 'apps',
      :excludes => [
        '/.git/',
        { /(.+)\.coffee$/ => lambda { |m| "#{m[1]}.js" } },
        '*.bak',
      ],
    })

since it allows a mix'n'match approach, and means that the performance hit from Dir.glob('*/') is optional and only required if you want to pull the transform trick. How does that sound?

Thanks! :)

@krisselden
Copy link
Owner

I agree --delete should have been optional.

I don't think your changes to delete is in line with "un-opinionated transparent wrapper around rsync" it is as scenario solving as what I had before, and multiple varying srcs synced to one dest is not as common of a use case as just syncing a single src to a dest exactly.

That being said, one can use https://github.com/guard/guard-shell for a completely un-opinionated wrapper.

Being able to catch up to changes on start was one of my primary use cases which is the idea behind it running a group on start, because my src directory is under source control and I need to catch up to removals from other people as it seems fragile to have to remember I need guard running before I pull or removals will be lost.

Maybe there is a way to take it back to a thin wrapper, and provide enough hooks for the other use cases but at a minimum being able to do a perfect sync of a src to dest that can catch up on start seems in keeping with the principal least surprise.

@krisselden
Copy link
Owner

I like your mix'n'match proposal to exclusions.

@aspiers
Copy link
Collaborator Author

aspiers commented May 10, 2013

Thanks for the quick reply ...

I don't think your changes to delete is in line with "un-opinionated transparent wrapper around rsync"

Well, making --delete optional is certainly less opinionated than enforcing it. And silently adding an option the user didn't ask for doesn't strike me as particularly transparent either.

it is as scenario solving as what I had before, and multiple varying srcs synced to one dest is not as common of a use case as just syncing a single src to a dest exactly.

Agreed.

That being said, one can use https://github.com/guard/guard-shell for a completely un-opinionated wrapper.

guard-shell was the first thing I tried even before discovering guard-rsync. But it is no use because it triggers the shell action block once per changed file; this can result in rsync process storms if you do something like a recursive search and replace.

Being able to catch up to changes on start was one of my primary use cases which is the idea behind it running a group on start, because my src directory is under source control and I need to catch up to removals from other people as it seems fragile to have to remember I need guard running before I pull or removals will be lost.

Let's distinguish carefully between "catch up on changes to new/existing files at start" and "catch up on all changes including removals at start". I absolutely agree that the former will be one of the most common use cases, and this pull request provides that as the default behaviour. In contrast, whilst the latter is the current default behaviour, the potential for accidentally causing severe data loss is huge, and can happen merely by misconfiguring either the input or output parameter. IMHO this is not a sensible default. It's trivial for the user to add :extra => '--delete' to their options, but also hard enough that they could not do it without being aware of the risks - and to me that seems like just the right trade-off between convenience and giving the user enough rope.

Maybe there is a way to take it back to a thin wrapper, and provide enough hooks for the other use cases but at a minimum being able to do a perfect sync of a src to dest that can catch up on start seems in keeping with the principal least surprise.

My pull request already provides this, unless by "catch up on start" you actually mean "catch up on start with removals", in which case I think that's a bad idea for the reasons stated above (i.e. it's too opinionated and dangerous).

I like your mix'n'match proposal to exclusions.

Cool. I'll try to find time to amend the pull request to implement this. Thanks!

@aspiers
Copy link
Collaborator Author

aspiers commented Aug 26, 2015

Looks like https://github.com/pmcjury/guard-remote-sync has solved many of these issues and also appears to be maintained.

@aspiers aspiers removed their assignment Aug 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants