Filesets
Build workflows necessarily involve the filesystem because it's not practical to send file contents around in memory as function arguments, and because we want to be able to leverage existing JVM tooling that generally operates on things in the class path and not on in-memory data structures.
Complex `project.clj` files typically name numerous places on disk where plugins should either emit or expect files for various purposes. Unfortunately, because the places on disk are global to the build process and are possibly shared by independent destructive processes, configuring builds this way is brittle.
To aid in task composition, and to alleviate the difficulty inherent in coordinating globally addressable places, boot does things differently.
Boot is more than just a build tool. Boot is a framework for bootstrapping Clojure applications. Since Clojure programs run on the JVM, there is a lot of bootstrapping that needs to be done:
- Dependencies – fetch JARs from Maven (the immutable classpath).
- Classpath files – add directories to the classpath (the mutable classpath).
- Environment – prepare the Clojure environment to run the program.
Managing interactions with the filesystem is a key part of this process.
The bootstrapping process can be expressed in terms of input and output:
$$F_{user} \xrightarrow{\text{BOOTSTRAP}} J_{cp} + F_{cp} + F_{asset} + E$$

where

- $F_{user}$ – the user's project files (the `build.boot` file, sources, assets, etc.).
- $J_{cp}$ – JARs on the classpath (the immutable classpath).
- $F_{cp}$ – files in directories on the classpath (the mutable classpath).
- $F_{asset}$ – files in directories not on the classpath that the program might need.
- $E$ – the Clojure environment in memory.
Of course, the line between bootstrapping and computing proper isn't clearly delineated. Most of the time the bootstrapping process will be extended (via Tasks) to set things up and launch a specific application or build specific artifacts.
This process of extending the bootstrapping phase to create an artifact for deployment or distribution, or to launch an application can be described as a transformation:
$$J_{cp} + F_{cp} + F_{asset} + E \xrightarrow{\text{TASK}} J_{cp} + F'_{cp} + F'_{asset} + E'$$
In any case, the JARs on the classpath cannot be modified and the environment is simply the result of evaluating Clojure expressions; Lisp already provides the abstractions we need to manage those. Boot provides a fileset abstraction to frame the $F \to F'$ component of the transformation in a Lispy, functional way.
Boot provides a fileset record type to manage the parts of the process that involve interaction with the filesystem: an in-memory, immutable representation of the state of the files in $F_{cp}$ and $F_{asset}$ at a point in time.
The file-related parts of the process above can be described as:
$$F_{cp} + F_{asset} \xrightarrow{\text{FILESET OPERATIONS}} F'_{cp} + F'_{asset}$$
or, equivalently, as:
$$\text{fileset} \xrightarrow{\text{FILESET OPERATIONS}} \text{fileset}'$$
Operations on fileset values return new values.
Note: The underlying filesystem is not immutable. The fileset protocol provides a `commit!` method to sync a given immutable fileset with the underlying mutable filesystem. This overwrites the files on disk to reflect the state of the fileset object.
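The value semantics described here can be sketched in plain Clojure. This is a toy model, not boot's implementation: the fileset is assumed to be a simple map of relative path to contents, the "disk" is an atom, and `add-file` and this `commit!` are hypothetical stand-ins written for illustration.

```clojure
;; Toy model (assumption): a fileset is an immutable map of
;; relative path -> contents. Boot's real fileset is a record type.
(def fs0 {"src/a.lc" "hello"})

;; An operation on the value returns a new value; fs0 is left untouched.
(defn add-file [fs path contents]
  (assoc fs path contents))

(def fs1 (add-file fs0 "out/a.uc" "HELLO"))

;; A stand-in for the mutable filesystem.
(def disk (atom {}))

;; Hypothetical commit!: overwrite the mutable "disk" so that it mirrors
;; the given value, then return the value so it can be passed along.
(defn commit! [fs]
  (reset! disk fs)
  fs)

(commit! fs1)

fs0   ;; => {"src/a.lc" "hello"}
@disk ;; => {"src/a.lc" "hello", "out/a.uc" "HELLO"}
```

Because the old value `fs0` is still intact after the commit, a task could hold onto it and commit it again later to roll the "disk" back.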
The boot build process is essentially a pipeline of middleware, similar to the Ring servlet architecture, or Clojure transducers. Where Ring handlers take a request map and return a response map, task handlers take and return fileset objects.
The fileset lifecycle goes something like this:
- Receive – handler is passed an immutable fileset as its argument.
- Query – handler obtains a set of files to process from the fileset.
- Work – handler performs some operation, creating files in temp dirs.
- Add – add temp files to fileset, obtaining a new immutable value.
- Commit – sync the underlying filesystem dirs to the fileset.
- Next – call the next handler, passing it the new immutable fileset.
The middleware approach combined with the immutable fileset messaging between handlers provides a basis for creating powerful, composable modules.
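The shape of this pipeline can be sketched without boot. In the sketch below the fileset is again assumed to be a plain map of path to contents, and `upcase-task` and `count-task` are hypothetical tasks invented for illustration; only the task/middleware/handler nesting mirrors what the text describes.

```clojure
(defn upcase-task
  "Hypothetical task: returns middleware whose handler adds an upper-cased
  copy of every .lc entry under the corresponding .uc path."
  []
  (fn middleware [next-handler]
    (fn handler [fileset]
      (let [lc-entries (filter (fn [[path _]] (.endsWith ^String path ".lc"))
                               fileset)
            fileset'   (reduce (fn [fs [path contents]]
                                 (assoc fs
                                        (str (subs path 0 (- (count path) 3)) ".uc")
                                        (.toUpperCase ^String contents)))
                               fileset
                               lc-entries)]
        ;; Pass the new immutable value to the next handler.
        (next-handler fileset')))))

(defn count-task
  "Hypothetical task: records how many entries it received, then passes
  the fileset on unchanged."
  [seen]
  (fn middleware [next-handler]
    (fn handler [fileset]
      (reset! seen (count fileset))
      (next-handler fileset))))

(def seen (atom nil))

;; Composing middleware with comp makes the leftmost task's handler run first.
(def pipeline
  ((comp (upcase-task) (count-task seen)) identity))

(pipeline {"a.lc" "hello"}) ;; => {"a.lc" "hello", "a.uc" "HELLO"}
@seen                       ;; => 2
```

The second task sees two entries because it receives the value produced by the first, not the original input.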
The fileset abstraction has a number of desirable characteristics:
- Filesets are values, not places: they can be anonymous and scoped.
- Tasks can hold onto a value and `commit!` at a later time.
- The flow of files is a succession of values, with occasional `commit!`s.
These characteristics allow boot to exploit efficiencies such as the use of hard links, structural sharing, and copying without exposing implementation details to the rest of the program.
In developing the model for boot's treatment of the side-effecting nature of build tasks, it's helpful to map out the types of tasks and files that comprise a typical build process.
Boot build processes usually consist of two main types of tasks:
- Build
  - compile things, emit code, etc.
  - consume and produce intermediate or source files
- Package
  - create JAR files, executables, etc.
  - consume intermediate files to produce final artifacts
Note: tasks may perform activities of one or both types.
From a task's point of view files in the build set fulfill two principal roles. These roles express the creator's intent with respect to how tasks will use them:
- Input
  - may be compiled or processed
  - on the build class path
  - consumed by build tasks
  - created by build tasks
- Output
  - may be incorporated into final artifacts
  - emitted by boot to the target directory
  - consumed by packaging tasks
  - created by build or packaging tasks
Note: these roles are not mutually exclusive; they represent orthogonal concerns, and files may have either or both roles assigned.
Given that files relevant to the build process can be characterized by the two main roles listed above, we can divide the build fileset into four components, corresponding to the four combinations of having or lacking the input and output roles:
| type | input? | output? | example |
|---|---|---|---|
| resource | ✓ | ✓ | HTML files, Clojure source (without AOT) |
| source | ✓ | ✗ | Java source, Clojure source (with AOT) |
| asset | ✗ | ✓ | ?? |
| cache | ✗ | ✗ | Various files needed during build |
The relationship between roles and components is one of consumer and producer.
- Consumers – query the fileset for files of a given role, depending on the type of task.
- Producers – add files to the given component of the fileset to express their intent with respect to how subsequent tasks in the pipeline will use them.
Note: tasks are normally both consumer and producer; they consume files from the fileset and create artifacts of their own, adding them to the fileset.
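The mapping from components to roles can be written down as data. The sketch below transcribes the table into a map and shows how a consumer might query by role; the toy fileset, its path names, and the `files-with-role` helper are hypothetical and are not part of boot's API.

```clojure
;; Roles carried by each component, transcribed from the table above.
(def component-roles
  {:resource #{:input :output}
   :source   #{:input}
   :asset    #{:output}
   :cache    #{}})

;; Hypothetical toy fileset: path -> the component a producer added it to.
(def fileset
  {"some-resource" :resource
   "some-source"   :source
   "some-asset"    :asset
   "some-cache"    :cache})

(defn files-with-role
  "Returns the set of paths whose component carries the given role."
  [fs role]
  (set (for [[path component] fs
             :when (contains? (component-roles component) role)]
         path)))

(files-with-role fileset :input)  ;; => #{"some-resource" "some-source"}
(files-with-role fileset :output) ;; => #{"some-resource" "some-asset"}
```

A producer expresses intent by choosing the component; a consumer only ever asks for a role, which is why a resource shows up in both queries and a cache file in neither.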
An important principle of the boot build process is that tasks do not refer to named places in the filesystem. Tasks may only create files in managed temp directories provided by boot. These temp directories are:
- Anonymous – tasks do not specify the location of the temp dir.
- Local – tasks do not pass references to temp dirs to other tasks.
- Managed – temp dirs are cleaned up by boot as necessary.
In order to communicate files in these temp directories to the rest of the build process they must be added to the fileset object, described below.
Boot provides a record type, `TmpFileSet`, that coordinates interaction with the filesystem. The fileset object is:
- Immutable – operations on the fileset return new values.
- Snapshot – the fileset models the state of the filesystem at a point in time.
- Transactional – the filesystem can be synced to the fileset at any time.
- Overlay – files are identified by unique paths relative to the fileset root.
The functions that make up the temp dirs API are all in the boot.core namespace.
The only place where tasks are allowed to create or modify files is in temp directories provided by boot.
- `(temp-dir!)` – Returns a boot-managed temporary directory, as a `java.io.File`.
The fileset is a tree of `TmpFile` objects. The underlying files are read-only.

- `(tmppath f)` – Returns the path of `f` relative to the fileset root.
- `(tmpfile f)` – Returns the underlying `java.io.File` object for the temp file `f`.
Obtain sets of temp files from the fileset according to their roles.
- `(user-files fs)` – Returns a set of `TmpFile` objects corresponding to files in `fs` that were created by the user as part of the project. These are not the actual files from the project – they are temp files that boot keeps synced with the user's files.
- `(input-files fs)` – Returns a set of `TmpFile` objects corresponding to files in `fs` with the input role.
- `(output-files fs)` – Returns a set of `TmpFile` objects corresponding to files in `fs` with the output role.
It is also possible to obtain references to the underlying boot-managed temp directories where the fileset is persisted. These directories are read-only.
- `(user-dirs fs)` – Returns a set of `java.io.File` objects corresponding to the user's source, resource, and asset directories. These are not the actual user directories – they are temp dirs that boot keeps synced with the user's project directories.
- `(input-dirs fs)` – Returns a set of `java.io.File` objects corresponding to directories in `fs` containing files with the input role.
- `(output-dirs fs)` – Returns a set of `java.io.File` objects corresponding to directories in `fs` containing files with the output role.
Fileset operations return new immutable fileset objects. These functions may have hidden side effects. They are not intended to be used in STM transactions.
- `(add-resource fs ^File dir)` – Adds the contents of the `dir` directory to `fs` and assigns roles as defined for the resource component above. Paths of added files are relative to `dir`. Returns a new fileset object.
- `(add-source fs ^File dir)` – Adds the contents of the `dir` directory to `fs` and assigns roles as defined for the source component above. Paths of added files are relative to `dir`. Returns a new fileset object.
- `(add-asset fs ^File dir)` – Adds the contents of the `dir` directory to `fs` and assigns roles as defined for the asset component above. Paths of added files are relative to `dir`. Returns a new fileset object.
- `(rm fs tmpfiles)` – Removes the `TmpFile`s in `tmpfiles` from the fileset `fs`. Returns a new fileset object.
- `(cp fs ^File src-file ^TmpFile dest-tmpfile)` – Replaces the contents of `dest-tmpfile` with the contents of `src-file`. Returns a new fileset object.
The fileset may be "synced" to the filesystem at any time. This is the only way for tasks to effect mutation of the classpath or communicate files to other tasks in the pipeline.
- `(commit! fs)` – Syncs the underlying managed directories with the immutable fileset object `fs`, rebuilding the underlying directories according to its internal state. Returns the fileset object.
To demonstrate how filesets are used, consider a task that compiles files with the `.lc` extension to `.uc` by converting all lower case characters to upper case.
```clojure
(ns acme.boot-lc
  {:boot/export-tasks true}
  (:require
   [boot.core :as c]
   [clojure.java.io :as io]))

(defn- compile-lc!
  [in-file out-file]
  (doto out-file
    io/make-parents
    (spit (.toUpperCase (slurp in-file)))))

(defn- lc->uc
  [path]
  (.replaceAll path "\\.lc$" ".uc"))

(c/deftask lc
  "Compile .lc files."
  []
  (let [tmp (c/temp-dir!)]                               ; [1]
    (fn middleware [next-handler]                        ; [2]
      (fn handler [fileset]                              ; [3]
        (c/empty-dir! tmp)                               ; [4]
        (let [in-files (c/input-files fileset)           ; [5]
              lc-files (c/by-ext [".lc"] in-files)]      ; [6]
          (doseq [in lc-files]                           ; [7]
            (let [in-file  (c/tmpfile in)                ; [7.i]
                  in-path  (c/tmppath in)                ; [7.ii]
                  out-path (lc->uc in-path)              ; [7.iii]
                  out-file (io/file tmp out-path)]       ; [7.iv]
              (compile-lc! in-file out-file)))           ; [7.v]
          (-> fileset                                    ; [8]
              (c/add-resource tmp)                       ; [9]
              c/commit!                                  ; [10]
              next-handler))))))                         ; [11]
```
The first two functions are just helper functions, representing processes that might be running in Pods in a real-world task. The task definition is where the interesting stuff happens:
1. First, we obtain a temporary directory in which the task can create files. This is bound locally and closed over by the middleware the task returns, so the task can reuse the temp dir across build iterations.
2. Tasks return middleware (similar to Ring middleware).
3. Task middleware return handlers (similar to Ring handlers).
4. Inside the handler, the first thing we do is empty the temp dir, ensuring that stale files from previous builds are removed. A more sophisticated implementation could track dependencies and recompile only the source files that have changed, but for simplicity we will just rebuild everything.
5. We query the fileset, obtaining a set of input files. (This is a build-type task, so we consume files with the input role.) Note that this returns a set of `TmpFile` objects.
6. We then filter the input files, keeping only the `.lc` files – the sources we will be compiling. Note that this returns a set of `TmpFile` objects.
7. Then, we compile each of the filtered input files, producing output files in the temp dir.
   - (i) Get a reference to the underlying source file.
   - (ii) Get the path of the source file relative to the fileset root.
   - (iii) Compute the path of the output file relative to the temp dir.
   - (iv) Create an output file in the temp dir with the computed relative path.
   - (v) Invoke the compiler to compile the source file.
8. At this point the temp dir contains the compiled `.uc` files, but they are not yet incorporated into the fileset object.
9. We add the contents of the temp dir to the resources component of the fileset, obtaining a new fileset value.
10. We commit the fileset to disk, returning the fileset object. The output files are now on the classpath.
11. Finally, we pass the fileset to the next handler, returning the result to the previous task in the build pipeline.
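The two helper functions do not depend on boot, so they can be tried at a plain Clojure REPL. The sketch below redefines them as public functions and runs them against a scratch directory under the JVM's temp dir; the scratch layout and the `greet/hello.lc` path are assumptions made for illustration, standing in for the fileset directory and the task's temp dir.

```clojure
(require '[clojure.java.io :as io])

;; Same bodies as the private helpers in the task namespace above.
(defn compile-lc!
  [in-file out-file]
  (doto out-file
    io/make-parents
    (spit (.toUpperCase (slurp in-file)))))

(defn lc->uc
  [path]
  (.replaceAll path "\\.lc$" ".uc"))

;; Hypothetical scratch directory, unique per run.
(def scratch
  (io/file (System/getProperty "java.io.tmpdir")
           (str "lc-sketch-" (System/nanoTime))))

;; The input keeps its relative path; the output gets the rewritten one.
(def in-file  (io/file scratch "in" "greet/hello.lc"))
(def out-file (io/file scratch "out" (lc->uc "greet/hello.lc")))

(io/make-parents in-file)
(spit in-file "hello, world")

(compile-lc! in-file out-file)

(lc->uc "greet/hello.lc") ;; => "greet/hello.uc"
(slurp out-file)          ;; => "HELLO, WORLD"
```

Note that `lc->uc` only rewrites a trailing `.lc`, so a path such as `notes.lc.bak` is returned unchanged.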