2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,8 @@

### Maintenance

- Added immutable Git-share snapshot tags and non-mutating historical restores with `update --ref`, using CrawlKit for shared Git history mechanics.
- Moved FTS5 query escaping onto CrawlKit and refreshed Go dependencies.
- Updated crawlkit through 0.12.2 for shared runtime hardening, SQLite 1.52, and absolute Windows database paths.

## 0.7.2 - 2026-06-10
9 changes: 8 additions & 1 deletion README.md
@@ -190,7 +190,7 @@ Choose the path that matches your setup:
- `report` summarizes archive activity and git-share freshness without writing SQL
- `publish` exports the local SQLite archive into a git repo as compressed JSONL shards plus a manifest
- `subscribe` configures a git-backed reader that can run without Slack credentials
- `update` pulls and imports the latest git snapshot
- `update` pulls and imports the latest git snapshot, or restores a historical tag/ref without moving the share checkout
- `sync` performs a one-shot crawl from bot/API, MCP connector, wiretap/desktop, or both
- `import` imports a Slack export ZIP or extracted export directory
- `purge` previews or deletes messages and message-owned records older than a cutoff
@@ -380,10 +380,12 @@ stale_after = "15m"
Behavior:

- `publish` writes gzipped JSONL shards plus `manifest.json` into `repo_path`
- `publish --tag <name>` attaches an immutable lightweight tag to the committed snapshot
- cached non-DM/non-private file media is included by default; use `--no-media` to omit it
- `subscribe` writes a git-reader config, disables Slack API and desktop sources for that config, clones the repo, and imports the snapshot
- pass `--db` to `subscribe` when you want the reader archive to land in a non-default SQLite path
- `update` pulls and re-imports only when the manifest changes
- `update --ref <tag-or-commit>` imports that historical snapshot without checking it out
- `status`, `search`, `messages`, `mentions`, `sql`, `users`, `channels`, and `report` auto-refresh stale git snapshots before reading when `auto_update = true`
- `sync --source bot` and `sync --source all` warm from the git snapshot before hitting Slack when a share remote is configured
- `status` and `doctor` surface the current git-share repo, last import time, and whether the local snapshot is stale
@@ -395,6 +397,7 @@ Behavior:
```bash
go run ./cmd/slacrawl publish --remote /path/to/private/slacrawl-archive.git --push
go run ./cmd/slacrawl publish --repo ~/.slacrawl/share --branch main --message "archive: daily refresh" --push
go run ./cmd/slacrawl publish --tag backup-2026-06-19 --push
```

Relevant flags:
@@ -403,6 +406,7 @@ Relevant flags:
- `--remote` sets or overrides the git remote used for publish
- `--branch` chooses the target branch
- `--message` sets the git commit message
- `--tag` creates an immutable snapshot tag and requires a commit
- `--no-commit` exports files without creating a git commit
- `--push` pushes the new commit to `origin`
- `--no-media` omits cached media files from the snapshot
@@ -435,8 +439,11 @@ Relevant flags:
```bash
go run ./cmd/slacrawl update
go run ./cmd/slacrawl update --repo ~/.slacrawl/share --branch main
go run ./cmd/slacrawl update --ref backup-2026-06-19
```

`--ref` accepts a tag, branch, or commit. Historical imports read Git objects directly and leave the share repo's current branch and working tree unchanged.
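
As an illustration of reading a snapshot file from Git objects without a checkout, here is a minimal sketch using stock `git show <ref>:<path>`. It is not slacrawl's implementation: slacrawl delegates this to CrawlKit, whose mechanism may differ. The helper names `showArgs` and `readAtRef` are hypothetical, and reading `manifest.json` from the repo root is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// showArgs (hypothetical helper) builds the arguments for
// `git -C <repo> show <ref>:<path>`, which prints a blob straight from the
// object database and leaves HEAD, the index, and the working tree alone.
func showArgs(repoPath, ref, path string) ([]string, error) {
	ref = strings.TrimSpace(ref)
	// Reject empty refs and anything Git could parse as an option.
	if ref == "" || strings.HasPrefix(ref, "-") {
		return nil, fmt.Errorf("invalid ref %q", ref)
	}
	return []string{"-C", repoPath, "show", ref + ":" + path}, nil
}

// readAtRef (hypothetical helper) returns the content of path as recorded
// at ref, which may be a tag, branch, or commit.
func readAtRef(ctx context.Context, repoPath, ref, path string) ([]byte, error) {
	args, err := showArgs(repoPath, ref, path)
	if err != nil {
		return nil, err
	}
	return exec.CommandContext(ctx, "git", args...).Output()
}

func main() {
	// Assumption: the current directory is a share repo with manifest.json at its root.
	data, err := readAtRef(context.Background(), ".", "HEAD", "manifest.json")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```

Because nothing is checked out, a failed or invalid ref in this sketch cannot leave the share repo on a different commit.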

### `report`

`report` is the fastest human-readable archive summary and is especially handy in git-share mode because it shows the current archive footprint plus share freshness.
2 changes: 2 additions & 0 deletions SPEC.md
@@ -312,6 +312,8 @@ Share config:
- `[share].repo_path` is the local clone / working repo path used for publish and update
- `[share].branch` defaults to `main`
- `[share].auto_update` controls whether read commands import stale git snapshots before querying
- `publish --tag <name>` creates an immutable tag for a committed snapshot
- `update --ref <tag-or-commit>` restores a historical snapshot without changing the share checkout
- `[share].stale_after` defines how old the last successful import can be before auto-refresh runs
- share sync state should record both the last successful import time and the last imported manifest generation time
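
The interaction between `auto_update`, `stale_after`, and the recorded import time can be sketched as follows. This is an illustrative reading of the spec, not the project's code: the `shareState` type and `needsRefresh` function are hypothetical, and treating a never-imported archive as stale and using a strict "older than" comparison are both assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// shareState is a hypothetical record of the two timestamps the spec asks
// share sync state to keep.
type shareState struct {
	LastImport        time.Time // last successful import
	ManifestGenerated time.Time // generation time of the last imported manifest
}

// needsRefresh reports whether a read command should import the git
// snapshot before querying.
func needsRefresh(s shareState, autoUpdate bool, staleAfter time.Duration, now time.Time) bool {
	if !autoUpdate {
		return false
	}
	if s.LastImport.IsZero() {
		return true // assumption: an archive that was never imported counts as stale
	}
	// Assumption: stale means strictly older than stale_after.
	return now.Sub(s.LastImport) > staleAfter
}

func main() {
	staleAfter, err := time.ParseDuration("15m") // same format as stale_after = "15m"
	if err != nil {
		panic(err)
	}
	now := time.Date(2026, 6, 19, 12, 0, 0, 0, time.UTC)
	s := shareState{LastImport: now.Add(-20 * time.Minute)}
	fmt.Println(needsRefresh(s, true, staleAfter, now))
}
```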

10 changes: 5 additions & 5 deletions go.mod
@@ -5,7 +5,7 @@ go 1.26.4
require (
github.com/alecthomas/kong v1.15.0
github.com/golang/snappy v1.0.0
github.com/openclaw/crawlkit v0.12.2
github.com/openclaw/crawlkit v0.12.3-0.20260619112528-82bf1826da3f
github.com/slack-go/slack v0.26.0
github.com/stretchr/testify v1.11.1
github.com/syndtr/goleveldb v1.0.0
@@ -17,7 +17,7 @@ require (
github.com/aymanbagabas/go-osc52/v2 v2.0.1 // indirect
github.com/charmbracelet/bubbles v1.0.0 // indirect
github.com/charmbracelet/bubbletea v1.3.10 // indirect
github.com/charmbracelet/colorprofile v0.4.1 // indirect
github.com/charmbracelet/colorprofile v0.4.3 // indirect
github.com/charmbracelet/lipgloss v1.1.0 // indirect
github.com/charmbracelet/x/ansi v0.11.7 // indirect
github.com/charmbracelet/x/cellbuf v0.0.15 // indirect
@@ -32,19 +32,19 @@ require (
github.com/lucasb-eyer/go-colorful v1.4.0 // indirect
github.com/mattn/go-isatty v0.0.22 // indirect
github.com/mattn/go-localereader v0.0.1 // indirect
github.com/mattn/go-runewidth v0.0.23 // indirect
github.com/mattn/go-runewidth v0.0.24 // indirect
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 // indirect
github.com/muesli/cancelreader v0.2.2 // indirect
github.com/muesli/termenv v0.16.0 // indirect
github.com/ncruces/go-strftime v1.0.0 // indirect
github.com/pelletier/go-toml/v2 v2.3.1 // indirect
github.com/pelletier/go-toml/v2 v2.4.0 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e // indirect
golang.org/x/net v0.53.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
modernc.org/libc v1.72.3 // indirect
modernc.org/libc v1.73.4 // indirect
modernc.org/mathutil v1.7.1 // indirect
modernc.org/memory v1.11.0 // indirect
modernc.org/sqlite v1.52.0 // indirect
32 changes: 16 additions & 16 deletions go.sum
@@ -10,8 +10,8 @@ github.com/charmbracelet/bubbles v1.0.0 h1:12J8/ak/uCZEMQ6KU7pcfwceyjLlWsDLAxB5f
github.com/charmbracelet/bubbles v1.0.0/go.mod h1:9d/Zd5GdnauMI5ivUIVisuEm3ave1XwXtD1ckyV6r3E=
github.com/charmbracelet/bubbletea v1.3.10 h1:otUDHWMMzQSB0Pkc87rm691KZ3SWa4KUlvF9nRvCICw=
github.com/charmbracelet/bubbletea v1.3.10/go.mod h1:ORQfo0fk8U+po9VaNvnV95UPWA1BitP1E0N6xJPlHr4=
github.com/charmbracelet/colorprofile v0.4.1 h1:a1lO03qTrSIRaK8c3JRxJDZOvhvIeSco3ej+ngLk1kk=
github.com/charmbracelet/colorprofile v0.4.1/go.mod h1:U1d9Dljmdf9DLegaJ0nGZNJvoXAhayhmidOdcBwAvKk=
github.com/charmbracelet/colorprofile v0.4.3 h1:QPa1IWkYI+AOB+fE+mg/5/4HRMZcaXex9t5KX76i20Q=
github.com/charmbracelet/colorprofile v0.4.3/go.mod h1:/zT4BhpD5aGFpqQQqw7a+VtHCzu+zrQtt1zhMt9mR4Q=
github.com/charmbracelet/lipgloss v1.1.0 h1:vYXsiLHVkK7fp74RkV7b2kq9+zDLoEU4MZoFqR/noCY=
github.com/charmbracelet/lipgloss v1.1.0/go.mod h1:/6Q8FR2o+kj8rz4Dq0zQc3vYf7X+B0binUUBwA0aL30=
github.com/charmbracelet/x/ansi v0.11.7 h1:kzv1kJvjg2S3r9KHo8hDdHFQLEqn4RBCb39dAYC84jI=
@@ -55,8 +55,8 @@ github.com/mattn/go-isatty v0.0.22 h1:j8l17JJ9i6VGPUFUYoTUKPSgKe/83EYU2zBC7YNKMw
github.com/mattn/go-isatty v0.0.22/go.mod h1:ZXfXG4SQHsB/w3ZeOYbR0PrPwLy+n6xiMrJlRFqopa4=
github.com/mattn/go-localereader v0.0.1 h1:ygSAOl7ZXTx4RdPYinUpg6W99U8jWvWi9Ye2JC/oIi4=
github.com/mattn/go-localereader v0.0.1/go.mod h1:8fBrzywKY7BI3czFoHkuzRoWE9C+EiG4R1k4Cjx5p88=
github.com/mattn/go-runewidth v0.0.23 h1:7ykA0T0jkPpzSvMS5i9uoNn2Xy3R383f9HDx3RybWcw=
github.com/mattn/go-runewidth v0.0.23/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs=
github.com/mattn/go-runewidth v0.0.24 h1:cpokDiIn0MGnhdHwuWnJBITySJ20QyNGnY2kR/ay2DU=
github.com/mattn/go-runewidth v0.0.24/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs=
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 h1:ZK8zHtRHOkbHy6Mmr5D264iyp3TiX5OmNcI5cIARiQI=
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6/go.mod h1:CJlz5H+gyd6CUWT45Oy4q24RdLyn7Md9Vj2/ldJBSIo=
github.com/muesli/cancelreader v0.2.2 h1:3I4Kt4BQjOR54NavqnDogx/MIoWBFa0StPA8ELUXHmA=
@@ -70,10 +70,10 @@ github.com/onsi/ginkgo v1.7.0 h1:WSHQ+IS43OoUrWtD1/bbclrwK8TTH5hzp+umCiuxHgs=
github.com/onsi/ginkgo v1.7.0/go.mod h1:lLunBs/Ym6LB5Z9jYTR76FiuTmxDTDusOGeTQH+WWjE=
github.com/onsi/gomega v1.4.3 h1:RE1xgDvH7imwFD45h+u2SgIfERHlS2yNG4DObb5BSKU=
github.com/onsi/gomega v1.4.3/go.mod h1:ex+gbHU/CVuBBDIJjb2X0qEXbFg53c61hWP/1CpauHY=
github.com/openclaw/crawlkit v0.12.2 h1:KivYMOHfemLG9LrfKKI8A/FTDJpdFJyeOreCGbKCsXA=
github.com/openclaw/crawlkit v0.12.2/go.mod h1:+Z9vrCgH8BJ/+3MMoMfnDyhXC9ON7bEDduGvp5TmmuM=
github.com/pelletier/go-toml/v2 v2.3.1 h1:MYEvvGnQjeNkRF1qUuGolNtNExTDwct51yp7olPtrEc=
github.com/pelletier/go-toml/v2 v2.3.1/go.mod h1:2gIqNv+qfxSVS7cM2xJQKtLSTLUE9V8t9Stt+h56mCY=
github.com/openclaw/crawlkit v0.12.3-0.20260619112528-82bf1826da3f h1:U3pEzAcN0SZK++4A/UgbdUX3X+iAPj/r+/CdqE6jLks=
github.com/openclaw/crawlkit v0.12.3-0.20260619112528-82bf1826da3f/go.mod h1:zOJv5WPWO1AuuXO7zW8NRTxb/ZTkIQXYPrx3StmnMUI=
github.com/pelletier/go-toml/v2 v2.4.0 h1:Mwu0mAkUKbittDs3/ADDWXqMmq3EOK2VHiuCkV00Row=
github.com/pelletier/go-toml/v2 v2.4.0/go.mod h1:2gIqNv+qfxSVS7cM2xJQKtLSTLUE9V8t9Stt+h56mCY=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
@@ -117,20 +117,20 @@ gopkg.in/yaml.v2 v2.2.1 h1:mUhvW9EsL+naU5Q3cakzfE91YhliOondGd6ZrsDBHQE=
gopkg.in/yaml.v2 v2.2.1/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
modernc.org/cc/v4 v4.28.2 h1:3tQ0lf2ADtoby2EtSP+J7IE2SHwEJdP8ioR59wx7XpY=
modernc.org/cc/v4 v4.28.2/go.mod h1:OnovgIhbbMXMu1aISnJ0wvVD1KnW+cAUJkIrAWh+kVI=
modernc.org/ccgo/v4 v4.34.0 h1:yRLPFZieg532OT4rp4JFNIVcquwalMX26G95WQDqwCQ=
modernc.org/ccgo/v4 v4.34.0/go.mod h1:AS5WYMyBakQ+fhsHhtP8mWB82KTGPkNNJDGfGQCe0/A=
modernc.org/cc/v4 v4.28.4 h1:Hd/4Es+MBj+/7hSdZaisNyu6bv3V0Dp2MdllyfqaH+c=
modernc.org/cc/v4 v4.28.4/go.mod h1:OnovgIhbbMXMu1aISnJ0wvVD1KnW+cAUJkIrAWh+kVI=
modernc.org/ccgo/v4 v4.34.4 h1:OVnSOWQjVKOYkFxoHYB+qQmSHK5gqMqARM+K9DpR/Ws=
modernc.org/ccgo/v4 v4.34.4/go.mod h1:qdKqE8FNIYyysougB1RX9MxCzp5oJOcQXSobANJ4TuE=
modernc.org/fileutil v1.4.0 h1:j6ZzNTftVS054gi281TyLjHPp6CPHr2KCxEXjEbD6SM=
modernc.org/fileutil v1.4.0/go.mod h1:EqdKFDxiByqxLk8ozOxObDSfcVOv/54xDs/DUHdvCUU=
modernc.org/gc/v2 v2.6.5 h1:nyqdV8q46KvTpZlsw66kWqwXRHdjIlJOhG6kxiV/9xI=
modernc.org/gc/v2 v2.6.5/go.mod h1:YgIahr1ypgfe7chRuJi2gD7DBQiKSLMPgBQe9oIiito=
modernc.org/gc/v3 v3.1.2 h1:ZtDCnhonXSZexk/AYsegNRV1lJGgaNZJuKjJSWKyEqo=
modernc.org/gc/v3 v3.1.2/go.mod h1:HFK/6AGESC7Ex+EZJhJ2Gni6cTaYpSMmU/cT9RmlfYY=
modernc.org/gc/v3 v3.1.3 h1:6QAplYyVO+KdPW3pGnqmJDUxtkec8ooEWvks/hhU3lc=
modernc.org/gc/v3 v3.1.3/go.mod h1:HFK/6AGESC7Ex+EZJhJ2Gni6cTaYpSMmU/cT9RmlfYY=
modernc.org/goabi0 v0.2.0 h1:HvEowk7LxcPd0eq6mVOAEMai46V+i7Jrj13t4AzuNks=
modernc.org/goabi0 v0.2.0/go.mod h1:CEFRnnJhKvWT1c1JTI3Avm+tgOWbkOu5oPA8eH8LnMI=
modernc.org/libc v1.72.3 h1:ZnDF4tXn4NBXFutMMQC4vtbTFSXhhKzR73fv0beZEAU=
modernc.org/libc v1.72.3/go.mod h1:dn0dZNnnn1clLyvRxLxYExxiKRZIRENOfqQ8XEeg4Qs=
modernc.org/libc v1.73.4 h1:+ra4Ui8ngyt8HDcO1FTDPWlkAh6yOdaO2yAoh8MddQA=
modernc.org/libc v1.73.4/go.mod h1:DXZ3eO8qMCNn2SnmTNCiC71nJ9Rcq3PsnpU6Vc4rWK8=
modernc.org/mathutil v1.7.1 h1:GCZVGXdaN8gTqB1Mf/usp1Y/hSqgI2vAGGP4jZMCxOU=
modernc.org/mathutil v1.7.1/go.mod h1:4p5IwJITfppl0G4sUEDtCr4DthTaT47/N3aT6MhfgJg=
modernc.org/memory v1.11.0 h1:o4QC8aMQzmcwCK3t3Ux/ZHmwFPzE6hf2Y5LbkRs+hbI=
37 changes: 31 additions & 6 deletions internal/cli/app.go
@@ -1839,6 +1839,7 @@ func (a *App) runPublish(ctx context.Context, configPath string, args []string,
remote := fs.String("remote", cfg.Share.Remote, "git remote")
branch := fs.String("branch", cfg.Share.Branch, "git branch")
message := fs.String("message", "", "commit message")
tag := fs.String("tag", "", "immutable snapshot tag")
noCommit := fs.Bool("no-commit", false, "skip git commit")
push := fs.Bool("push", false, "push to origin")
noMedia := fs.Bool("no-media", !cfg.ShareMediaEnabled(), "omit cached media files")
@@ -1848,6 +1849,9 @@ func (a *App) runPublish(ctx context.Context, configPath string, args []string,
if fs.NArg() != 0 {
return errors.New("publish takes no positional arguments")
}
if *noCommit && strings.TrimSpace(*tag) != "" {
return errors.New("publish --tag requires a commit")
}
st, err := a.openStore(cfg)
if err != nil {
return err
@@ -1858,6 +1862,10 @@ func (a *App) runPublish(ctx context.Context, configPath string, args []string,
if err != nil {
return err
}
opts.Tag = strings.TrimSpace(*tag)
if err := share.ValidateTag(ctx, opts); err != nil {
return err
}
manifest, err := share.Export(ctx, st, opts)
if err != nil {
return err
@@ -1869,6 +1877,10 @@ func (a *App) runPublish(ctx context.Context, configPath string, args []string,
return err
}
}
createdTag, err := share.CreateImmutableTag(ctx, opts)
if err != nil {
return err
}
if *push {
if err := share.Push(ctx, opts); err != nil {
return err
@@ -1884,6 +1896,7 @@ func (a *App) runPublish(ctx context.Context, configPath string, args []string,
"tables": manifest.Tables,
"media": manifest.Media,
"committed": committed,
"tag": createdTag,
"pushed": *push,
}, format, true)
}
@@ -1977,6 +1990,7 @@ func (a *App) runUpdate(ctx context.Context, configPath string, args []string, f
repoPath := fs.String("repo", cfg.Share.RepoPath, "local clone path")
remote := fs.String("remote", cfg.Share.Remote, "git remote")
branch := fs.String("branch", cfg.Share.Branch, "git branch")
ref := fs.String("ref", "", "historical git ref to import")
noMedia := fs.Bool("no-media", !cfg.ShareMediaEnabled(), "skip restoring cached media")
if err := fs.Parse(args); err != nil {
return err
@@ -1993,12 +2007,22 @@ func (a *App) runUpdate(ctx context.Context, configPath string, args []string, f
if err != nil {
return err
}
if err := share.Pull(ctx, opts); err != nil {
return err
}
manifest, imported, err := share.ImportIfChanged(ctx, st, opts)
if err != nil {
return err
var manifest share.Manifest
var imported bool
if strings.TrimSpace(*ref) == "" {
if err := share.Pull(ctx, opts); err != nil {
return err
}
manifest, imported, err = share.ImportIfChanged(ctx, st, opts)
if err != nil {
return err
}
} else {
manifest, err = share.ImportAt(ctx, st, opts, *ref)

**P2: Avoid auto-updating before restoring refs**

When `share.auto_update` is enabled and the reader archive is stale, `a.openStore(cfg)` has already run `autoUpdateShare`, which calls `share.Pull` and imports the latest manifest, before this `--ref` branch runs. In that default configuration, `slacrawl update --ref <old-tag>` can fast-forward the share checkout first (and an invalid ref still leaves the DB and repo updated to latest), contradicting the advertised non-mutating historical restore. Open the store without auto-update, or bypass `autoUpdateShare` when `--ref` is set.
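
A minimal sketch of the decision this comment asks for, assuming the choice can be made before the store is opened. The `updatePlan` type and `planUpdate` function are hypothetical and do not exist in the PR; how `openStore` would actually be told to skip `autoUpdateShare` is left open.

```go
package main

import (
	"fmt"
	"strings"
)

// updatePlan is a hypothetical description of what `update` may touch.
type updatePlan struct {
	AutoUpdate bool   // allow the stale-snapshot refresh while opening the store
	Pull       bool   // fast-forward the share checkout
	Ref        string // historical ref to import; empty means latest
}

// planUpdate decides the side effects up front: a historical restore must
// neither auto-update nor pull, so the share checkout stays where it is.
func planUpdate(ref string, autoUpdateConfigured bool) updatePlan {
	ref = strings.TrimSpace(ref)
	if ref != "" {
		return updatePlan{Ref: ref}
	}
	return updatePlan{AutoUpdate: autoUpdateConfigured, Pull: true}
}

func main() {
	fmt.Printf("%+v\n", planUpdate("backup-2026-06-19", true))
	fmt.Printf("%+v\n", planUpdate("", true))
}
```

With a plan like this computed first, an invalid ref would fail before any pull or import has happened.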


if err != nil {
return err
}
imported = true
}
return a.writeOutput("Update", map[string]any{
"repo_path": opts.RepoPath,
@@ -2007,6 +2031,7 @@ func (a *App) runUpdate(ctx context.Context, configPath string, args []string, f
"tables": manifest.Tables,
"media": manifest.Media,
"imported": imported,
"ref": strings.TrimSpace(*ref),
}, format, true)
}

13 changes: 12 additions & 1 deletion internal/cli/app_test.go
@@ -611,7 +611,12 @@ func TestPublishSubscribeAndSearchGitArchive(t *testing.T) {

var stdout bytes.Buffer
app := &App{Stdout: &stdout, Stderr: &stdout}
require.NoError(t, app.Run(ctx, []string{"--config", publisherCfgPath, "--json", "publish", "--push"}))
require.NoError(t, app.Run(ctx, []string{"--config", publisherCfgPath, "--json", "publish", "--tag", "test-snapshot", "--push"}))
var publish map[string]any
require.NoError(t, json.Unmarshal(stdout.Bytes(), &publish))
require.Equal(t, "test-snapshot", publish["tag"])
require.Equal(t, true, publish["pushed"])
require.ErrorContains(t, app.Run(ctx, []string{"--config", publisherCfgPath, "publish", "--tag", "invalid", "--no-commit"}), "requires a commit")

readerCfgPath := filepath.Join(dir, "reader.toml")
stdout.Reset()
@@ -633,6 +638,12 @@ func TestPublishSubscribeAndSearchGitArchive(t *testing.T) {
require.NoError(t, json.Unmarshal(stdout.Bytes(), &rows))
require.Len(t, rows, 1)
require.Equal(t, "archive seed message", rows[0]["text"])

stdout.Reset()
require.NoError(t, app.Run(ctx, []string{"--config", readerCfgPath, "--json", "update", "--ref", "test-snapshot"}))
var update map[string]any
require.NoError(t, json.Unmarshal(stdout.Bytes(), &update))
require.Equal(t, "test-snapshot", update["ref"])
}

func TestSubscribePersistsNoMedia(t *testing.T) {