Import contact exports from source crawlers #2

Merged
steipete merged 6 commits into openclaw:main from joshp123:codex/contact-export-v0
Jun 7, 2026

Conversation

@joshp123

@joshp123 joshp123 commented Jun 4, 2026

Contributor

Summary

  • add the clawdex-owned ContactExport v0 proto artifact and strict JSON decoder
  • add clawdex import contacts --from <source crawler>
  • discover contact-export through crawlkit metadata, require crawlkit.control.v1, require JSON/read-only, bind execution to the same --from binary, and import only display name plus phone values into the existing SourceContact path
  • record simple per-source evidence in clawdex person markdown so an existing phone match can still say it was seen in telecrawl or wacrawl
  • keep crawler imports from auto-merging by name alone; crawler imports match by stronger identifiers such as accounts, external IDs, email, or normalized phone
  • make crawler import dry-runs match real imports, dedupe phones by normalized phone value, skip cross-person conflicting phone values, and return [] for no-op JSON imports
  • align the proto package with the v0 contract: openclaw.clawdex.contactexport.v0
  • move clawdex to Go 1.26.4 so CI govulncheck uses the fixed standard library

Related PRs

These three PRs are one contact-export v0 slice. Source crawlers own source-native contact extraction; clawdex owns canonical people and imports by pulling the crawler metadata contact-export command. They should land in lockstep; if the command name, metadata argv, envelope, or field names change in one repo, all three should change together.

Intent

The discoverable local contact-export metadata command is intentional. Clawdex discovering and running that read-only command is intentional. Local opt-in import of stored contact display names and phone numbers into clawdex is the feature boundary, not an accidental privacy expansion.

The v0 producer contract is deliberately small: crawlkit.control.v1, command name contact-export, JSON envelope contacts, fields display_name and phone_numbers, and read-only metadata argv. This PR intentionally does not add graph/candidate/source-id/JID/username/ranking fields or a generic crawler-to-crawler protocol.
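
To make the payload shape concrete, here is a minimal sketch of a strict decoder for the envelope described above. The type and function names (`Contact`, `Export`, `DecodeStrict`) are hypothetical and are not taken from `internal/contactexport/json.go`; treating an empty `phone_numbers` list as missing is also an assumption of this sketch.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Contact carries the only two v0 fields named in this PR.
type Contact struct {
	DisplayName  string   `json:"display_name"`
	PhoneNumbers []string `json:"phone_numbers"`
}

// Export is the JSON envelope whose root key is "contacts".
type Export struct {
	Contacts []Contact `json:"contacts"`
}

// DecodeStrict (hypothetical name) rejects unknown fields, trailing JSON
// values, a missing contacts array, and contacts without a name or phone.
func DecodeStrict(data []byte) (Export, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()

	var out Export
	if err := dec.Decode(&out); err != nil {
		return Export{}, fmt.Errorf("decode contact export: %w", err)
	}
	// A second Decode must report io.EOF; anything else is trailing data.
	var extra json.RawMessage
	if err := dec.Decode(&extra); !errors.Is(err, io.EOF) {
		return Export{}, errors.New("contact export: unexpected data after JSON value")
	}
	if out.Contacts == nil {
		return Export{}, errors.New("contact export: missing contacts array")
	}
	for i := range out.Contacts {
		c := &out.Contacts[i]
		c.DisplayName = strings.TrimSpace(c.DisplayName)
		if c.DisplayName == "" {
			return Export{}, fmt.Errorf("contact %d: missing display_name", i)
		}
		// Assumption: an empty list counts as missing phone_numbers.
		if len(c.PhoneNumbers) == 0 {
			return Export{}, fmt.Errorf("contact %d: missing phone_numbers", i)
		}
		for j, p := range c.PhoneNumbers {
			c.PhoneNumbers[j] = strings.TrimSpace(p)
			if c.PhoneNumbers[j] == "" {
				return Export{}, fmt.Errorf("contact %d: empty phone number", i)
			}
		}
	}
	return out, nil
}

func main() {
	payload := []byte(`{"contacts":[{"display_name":" Ada Example ","phone_numbers":["+1 555 010 0001"]}]}`)
	exp, err := DecodeStrict(payload)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Printf("%d contact(s), first: %q\n", len(exp.Contacts), exp.Contacts[0].DisplayName)
}
```

Rejecting unknown fields is what keeps the contract small in practice: a producer that starts emitting an extra field fails loudly instead of being silently accepted.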

Clawdex records local-only source evidence after import. That is not a producer contract expansion: producers still emit only names and phone numbers, while clawdex records which local source saw those values in the person markdown it already owns.

The Go 1.26.4 bump is also intentional: govulncheck on Go 1.26.3 reports reachable standard-library findings fixed in Go 1.26.4. This does not need to be split unless maintainers want a separate release-policy change.

Boundary

This is the clawdex consumer side of the same contact-import slice as steipete/wacrawl#12 and openclaw/telecrawl#9. The crawler owns source-native contact extraction; clawdex owns canonical people and imports by pulling metadata-advertised contact-export commands.

This is intentionally a local CLI/control metadata surface, not a graph layer, candidate model, or broad crawler-to-crawler abstraction.

Crawler contact imports do not use display-name-only matches for automatic joins. If a crawler emits a name that matches an existing person but no stronger identifier matches, clawdex creates a separate person for now rather than attaching a phone number to the wrong human.
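
The rule above can be sketched as follows. This is an illustration only: `Person`, `findByPhone`, and `importContact` are hypothetical names, phone values are assumed to be already normalized, and the real matcher also considers accounts, external IDs, and email, which this sketch omits.

```go
package main

import "fmt"

// Person is a simplified stand-in for a clawdex person record.
type Person struct {
	Name   string
	Phones []string // assumed to be normalized already
}

func contains(list []string, v string) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// findByPhone returns the index of the first person owning any of the
// given phones, or -1. Display names are deliberately never consulted.
func findByPhone(people []Person, phones []string) int {
	for i, p := range people {
		for _, ph := range phones {
			if contains(p.Phones, ph) {
				return i
			}
		}
	}
	return -1
}

// importContact attaches new phones to a phone-matched person, or creates
// a separate person when only the name would have matched.
func importContact(people []Person, name string, phones []string) ([]Person, string) {
	i := findByPhone(people, phones)
	if i < 0 {
		return append(people, Person{Name: name, Phones: phones}), "create"
	}
	for _, ph := range phones {
		if !contains(people[i].Phones, ph) {
			people[i].Phones = append(people[i].Phones, ph)
		}
	}
	return people, "update"
}

func main() {
	people := []Person{{Name: "Alex Kim", Phones: []string{"+15550100001"}}}

	// Same display name, different phone: a second person is created.
	people, action := importContact(people, "Alex Kim", []string{"+15550100002"})
	fmt.Println(action, len(people))

	// Different display name, same phone: the existing person is updated.
	people, action = importContact(people, "A. Kim", []string{"+15550100001"})
	fmt.Println(action, len(people))
}
```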

Validation

Current head: 1a285a6103c28f1f4fa821f8e1b2e7f59b2a013b.

Local gates on this head:

  • nix shell nixpkgs#go --command go test -count=1 ./...
  • nix shell nixpkgs#go --command go test -count=1 -race ./...
  • git diff --check
  • producer contract checks:
    • nix shell nixpkgs#go --command go test ./... in wacrawl
    • nix shell nixpkgs#go --command go test ./... in telecrawl

Copied real-data proof was run locally with full raw outputs kept private because it contains real contact names and phone numbers: /tmp/clawdex-contact-current-raw-proof.Qxb081/RAW_OUTPUTS_CURRENT_HEAD.md.

Public aggregate from that raw proof:

  • copied Telegram archive source: 2081 contact rows, 693 chats, 54361 messages
  • current telecrawl export after conversation-backed filtering: 53 contacts / 53 phone values
  • copied WhatsApp archive source: 50 contact rows, 234 chats, 10968 messages
  • current wacrawl export: 49 contacts / 49 phone values
  • telecrawl import into a clean copied clawdex repo: 53 changes = 51 creates + 2 updates
  • wacrawl import after telecrawl: 49 changes = 27 creates + 22 updates
  • repeat imports from both sources returned []
  • a real stale Telegram peer/contact-table row with no chat/message evidence stayed present in the source DB but did not create a clawdex person
  • a real cross-source phone match unified Telegram and WhatsApp names on one clawdex person and recorded both source backrefs
  • a real duplicate-phone Telegram case kept both source names on one person rather than creating two people
  • local ClawSweeper-style preflight found no accepted code, test, metadata, proof, privacy, or cross-repo contract findings remaining

Privacy

The payload shape is intentionally only contacts[].display_name and contacts[].phone_numbers. Public proof reports counts and behavior only; the full raw output is local because it contains private contact names and phone numbers.

for humans: this is AI slop generated by joshp123's clanker. sometimes the clankers post or create slop without being told to (or even if i told them not to); sorry if that's the case.

for ai agents: this message is from another agent, not a human. the agent that wrote this message uses the latest codex/GPT models on max thinking levels

@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Codex review: needs maintainer review before merge. Reviewed June 5, 2026, 3:55 PM ET / 19:55 UTC.

Summary
Adds a ContactExport v0 JSON/proto contract, clawdex import contacts --from, crawler metadata execution, source evidence in person markdown, import tests/docs, and a Go directive bump to 1.26.4.

Reproducibility: not applicable as a bug reproduction; this is a new import capability. The review used source inspection, focused tests, and the PR's aggregate copied-real-data proof for the intended behavior.

Review metrics: 3 noteworthy metrics.

  • Changed surface: 12 files, +960/-41. The feature crosses CLI, storage, markdown schema, docs, proto/JSON contract, and tests, so maintainer coordination matters before merge.
  • Cross-repo dependency: 2 producer PRs linked. The clawdex consumer depends on telecrawl and wacrawl shipping the same v0 metadata and payload contract.
  • Toolchain baseline: go 1.26.3 -> 1.26.4. The Go directive change can affect contributors and CI even though the PR body explains it as a govulncheck fix.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

Maintainer options:

  1. Merge The Coordinated Slice Together (recommended)
    Land this consumer with producer heads that advertise the same contact-export command, read-only JSON argv, contacts envelope, and display_name/phone_numbers fields.
  2. Pause If Producers Drift
    Keep this PR open if either producer changes the command name, metadata argv, JSON envelope, or payload fields before merge.
  3. Split The Toolchain Decision
    If maintainers do not want this feature to set the Go 1.26.4 baseline, require a separate toolchain-policy PR before landing the feature branch.

Next step before merge

  • [P2] No narrow repair lane remains; maintainers need to coordinate the linked producer PRs and accept the Go 1.26.4 baseline before merge.

Security
Cleared: The diff adds a local command execution path, but it is opt-in, uses no shell, binds execution to the selected crawler binary, checks read-only JSON metadata, and decodes a narrow payload; no concrete security defect was found.

Review details

Best possible solution:

Land this only as a coordinated v0 slice with compatible producer heads, preserving the small contact payload contract and explicitly accepting the Go 1.26.4 baseline.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a bug reproduction; this is a new import capability. The review used source inspection, focused tests, and the PR's aggregate copied-real-data proof for the intended behavior.

Is this the best way to solve the issue?

Yes, if maintainers accept the coordinated v0 direction. The implementation keeps the producer payload small, validates metadata before execution, reuses the existing SourceContact import path, and avoids unsafe name-only joins.

AGENTS.md: not found in the target repository.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 765d357cd440.

Label changes

Label justifications:

  • P2: This is a normal-priority feature with bounded blast radius, strong review signal, and maintainer coordination still needed before merge.
  • merge-risk: 🚨 compatibility: Merging the consumer without compatible producer heads or without accepting the Go 1.26.4 baseline could break the intended import workflow or contributor setup.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • feature: ✨ showcase: ClawSweeper spotlight: unusually compelling feature idea for maintainer attention. The feature creates a small, privacy-conscious bridge from source crawlers into clawdex canonical people without broadening crawler payloads into a general graph protocol.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body and latest comment provide after-fix aggregate real-data proof from copied Telegram and WhatsApp archives, with raw private names and phone numbers intentionally withheld.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and latest comment provide after-fix aggregate real-data proof from copied Telegram and WhatsApp archives, with raw private names and phone numbers intentionally withheld.

Evidence reviewed

What I checked:

  • Target AGENTS.md check: No AGENTS.md exists inside the clawdex repository checkout, so no target repository-authored policy affected this review.
  • PR diff scope: The PR changes 12 files with 960 additions and 41 deletions across CLI, import storage, markdown/model schema, contactexport files, tests, docs, and go.mod. (1a285a6103c2)
  • Current main lacks the feature: No contact-export, ImportCrawlerContacts, ContactExport, import contacts, PersonSource, or Sources implementation exists on the current main checkout, so the PR is not obsolete on main. (765d357cd440)
  • Crawler metadata gate: The new import path requires a contact-export command, JSON metadata, non-mutating command metadata, nonempty argv, same selected binary name, and an advertised --json argument before executing the local crawler without a shell. (internal/cli/cli.go:443, 1a285a6103c2)
  • Strict JSON decoder: The contact export decoder disallows unknown fields, rejects multiple JSON values, requires a contacts array, trims names and phones, and rejects missing display_name or phone_numbers. (internal/contactexport/json.go:20, 1a285a6103c2)
  • Crawler import behavior: ImportCrawlerContacts reuses the existing import flow with source tracking enabled and name-only matching disabled, while filtering values owned by another person before appending canonical values or source evidence. (internal/index/import.go:25, 1a285a6103c2)

Likely related people:

  • Peter Steinberger: The baseline import, CLI, markdown, model, and docs surfaces that this PR extends are attributed to Peter Steinberger in the current-main history, including the v0.1.0 release commit. (role: current-main area owner; confidence: high; commits: ce439f2d7fb6; files: internal/index/import.go, internal/cli/cli.go, internal/markdown/markdown.go)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@joshp123 force-pushed the codex/contact-export-v0 branch from 6237ba6 to 9f12f22 on June 4, 2026 22:45
@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. labels Jun 4, 2026
@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

@clawsweeper re-review

Updated head/body for the contact-export v0 fixes and proof. Please re-review the current head.

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

Cross-repo context for review: this PR is the clawdex consumer in a three-PR contact-export v0 slice.

The shared intent is source crawlers expose a local read-only contact-export command through crawlkit metadata; clawdex pulls it and owns canonical people.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 5, 2026
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 5, 2026
@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

Updated head was reviewed locally with the ClawSweeper review prompt via sub-agents before this re-review request.

Local accepted fixes now present:

  • clawdex rejects contact-export metadata whose advertised argv omits exact --json, even if JSON: true is set.
  • docs/imports.md documents the shared v0 machine contract: crawlkit.control.v1, contact-export, read-only JSON metadata, advertised argv with --json, payload root contacts, and only display_name plus phone_numbers per contact.

Cross-repo contract check against the current producer heads found no drift: wacrawl advertises wacrawl --json --sync never contacts export, telecrawl advertises telecrawl --json contacts export, and clawdex accepts source-safe argv flags while enforcing JSON/read-only/same-binary execution. Payload broadening, graph/candidate/source-id fields, and generic crawler-to-crawler abstractions were explicitly rejected as out of scope for this v0.
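
A rough sketch of the gate described above, under stated assumptions: `CommandMeta` and its fields are a hypothetical stand-in for one metadata command entry (the real crawlkit.control.v1 schema is not shown in this thread), and `validateContactExport` and `buildCommand` are illustrative names rather than the code in `internal/cli/cli.go`. The command is only constructed here, not run.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"path/filepath"
)

// CommandMeta is a hypothetical view of one advertised crawler command.
type CommandMeta struct {
	Name    string
	JSON    bool
	Mutates bool
	Argv    []string
}

// validateContactExport applies the checks listed in this thread before
// anything is executed.
func validateContactExport(m CommandMeta, fromBinary string) error {
	switch {
	case m.Name != "contact-export":
		return errors.New("metadata does not advertise contact-export")
	case !m.JSON:
		return errors.New("contact-export must be a JSON command")
	case m.Mutates:
		return errors.New("contact-export must be read-only")
	case len(m.Argv) == 0:
		return errors.New("contact-export argv is empty")
	case m.Argv[0] != filepath.Base(fromBinary):
		return fmt.Errorf("argv names %q, not the selected binary", m.Argv[0])
	}
	// JSON: true alone is not enough; the exact flag must be advertised.
	for _, arg := range m.Argv[1:] {
		if arg == "--json" {
			return nil
		}
	}
	return errors.New("contact-export argv must include --json")
}

// buildCommand binds execution to the --from binary and passes the
// remaining argv directly, so no shell is involved.
func buildCommand(m CommandMeta, fromBinary string) (*exec.Cmd, error) {
	if err := validateContactExport(m, fromBinary); err != nil {
		return nil, err
	}
	return exec.Command(fromBinary, m.Argv[1:]...), nil
}

func main() {
	wa := CommandMeta{
		Name: "contact-export",
		JSON: true,
		Argv: []string{"wacrawl", "--json", "--sync", "never", "contacts", "export"},
	}
	cmd, err := buildCommand(wa, "/opt/bin/wacrawl")
	fmt.Println(cmd.Args, err)

	// The same metadata is refused when --from points at another binary.
	_, err = buildCommand(wa, "/opt/bin/telecrawl")
	fmt.Println(err)
}
```

Using the `--from` path as the executable, rather than whatever `argv[0]` resolves to on `PATH`, is what "same-binary execution" means in this sketch.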

Local validation on this head passed:

  • GOWORK=off go test -count=1 ./...
  • GOWORK=off go test -count=1 -race ./...
  • GOWORK=off go mod verify
  • govulncheck ./...
  • git diff --check

Related PRs:

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

What:
- persist per-source contact evidence on clawdex people for crawler imports
- stop crawler contact imports from auto-merging by name alone
- dedupe imported phone values by normalized phone and make dry-run match real imports
- return empty JSON arrays for no-op imports and document the crawler import boundary

Why:
- let clawdex record that a person was seen in telecrawl or wacrawl even when canonical fields do not change
- avoid unsafe automatic joins from common display names
- keep the producer contact-export contract small and unchanged

Tests:
- nix shell nixpkgs#go --command go test -count=1 ./...
- nix shell nixpkgs#go --command go test -count=1 -race ./...
- isolated copied-real-data smoke: telecrawl 200 contacts, wacrawl 49 contacts, repeat imports idempotent, source evidence complete
@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

Updated clawdex head with the local import-correctness fixes from the copied-real-data smoke:

  • crawler imports no longer auto-merge by display name alone
  • phone appends dedupe by normalized phone
  • crawler imports persist local sources evidence on people, so an existing phone match can still record seen in telecrawl / seen in wacrawl
  • dry-run now matches real import behavior for duplicate incoming contacts
  • no-op JSON imports return []
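
On the last point: in Go, `encoding/json` encodes a nil slice as `null`, so returning `[]` for a no-op import needs an explicit empty slice. A minimal sketch, where the `Change` type and `marshalChanges` are hypothetical names rather than clawdex's own:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Change is a hypothetical record of one import action.
type Change struct {
	Action string `json:"action"`
	Person string `json:"person"`
}

// marshalChanges guarantees a JSON array even when nothing changed.
func marshalChanges(changes []Change) ([]byte, error) {
	if changes == nil {
		changes = []Change{} // a nil slice would encode as null
	}
	return json.Marshal(changes)
}

func main() {
	var none []Change // what a no-op import naturally produces
	out, err := marshalChanges(none)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out)) // prints []
}
```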

Local proof on this head:

  • nix shell nixpkgs#go --command go test -count=1 ./...
  • nix shell nixpkgs#go --command go test -count=1 -race ./...
  • git diff --check
  • wacrawl producer suite: nix shell nixpkgs#go --command go test -count=1 ./...
  • telecrawl producer suite: nix shell nixpkgs#go --command go test -count=1 ./...
  • copied-real-data smoke: telecrawl exported 200 contacts / 148 unique normalized phones; wacrawl exported 49 contacts / 49 unique normalized phones; repeat imports returned []; source evidence covered all exported phone values for both sources; duplicate normalized phones were inherited from the copied clawdex baseline, not introduced by this import.

Cross-repo contract remains unchanged:

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 5, 2026
What:
- dedupe crawler-imported email and phone values by normalized key on the new-person create path
- add a regression for one exported crawler contact carrying two formats of the same phone number

Why:
- match create behavior to the existing update path
- prevent clawdex from writing duplicate canonical phone values from a valid contact-export payload

Tests:
- nix shell nixpkgs#go --command go test -count=1 ./internal/index: passed
- nix shell nixpkgs#go --command go test -count=1 ./...: passed
- nix shell nixpkgs#go --command go test -count=1 -race ./...: passed
- copied-real-data smoke in /tmp/clawdex-contact-smoke-fixed.7ZN24v: telecrawl 200 contacts, wacrawl 49 contacts, clawdex 130 -> 255 people, repeat imports returned [], source evidence covered all exported normalized phones, and duplicate-phone group counts matched baseline
@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

@clawsweeper re-review

Local preflight is now clean on head 16b1f2787f56f5fc50cc910184401de1a0e63520.

What changed since the prior review:

  • fixed the real create-path bug ClawSweeper/local preflight caught: new crawler-imported people now dedupe email/phone values by normalized key, matching the existing update path
  • added TestCrawlerImportCreateDedupeNormalizedPhoneValues for one valid crawler payload containing two formats of the same phone number
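
The create-path fix can be pictured with a small sketch. The normalization here (keep digits and a leading `+`) is an assumption made for illustration, not clawdex's actual normalizer, and keeping the first-seen spelling is a choice of this sketch; `normalizePhone` and `dedupePhones` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePhone keeps digits and a leading '+'. This is an illustrative
// rule only; the real normalization may differ.
func normalizePhone(s string) string {
	var b strings.Builder
	for _, r := range strings.TrimSpace(s) {
		switch {
		case r >= '0' && r <= '9':
			b.WriteRune(r)
		case r == '+' && b.Len() == 0:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// dedupePhones keeps the first spelling of each normalized phone value and
// drops later spellings of the same number.
func dedupePhones(in []string) []string {
	seen := make(map[string]bool, len(in))
	out := make([]string, 0, len(in))
	for _, raw := range in {
		key := normalizePhone(raw)
		if key == "" || seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, strings.TrimSpace(raw))
	}
	return out
}

func main() {
	// One contact carrying two formats of the same number.
	phones := []string{"+1 (555) 010-0001", "+15550100001", "+15550100002"}
	fmt.Println(dedupePhones(phones))
}
```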

Local proof before this request:

  • nix shell nixpkgs#go --command go test -count=1 ./internal/index: passed
  • nix shell nixpkgs#go --command go test -count=1 ./...: passed
  • nix shell nixpkgs#go --command go test -count=1 -race ./...: passed
  • current GitHub CI on this head is green
  • copied-real-data smoke in /tmp/clawdex-contact-smoke-fixed.7ZN24v: telecrawl exported 200 contacts, wacrawl exported 49, clawdex imported a copied repo from 130 to 255 people, repeat imports returned [], source evidence covered all exported normalized phones, and duplicate-phone group counts matched baseline
  • local ClawSweeper-style preflight ran three read-only review lenses after the fix: import correctness, simplicity/naming, and cross-repo contract. Accepted findings are now empty.

Still intentionally rejected:

  • no graph/candidate/source-id/JID/username/ranking fields in v0
  • no display-name-only automatic joins for crawler imports
  • do not remove docs/imports.md AI frontmatter; the parent crawler workspace AGENTS.md requires it for agent-substantially-edited human-readable docs

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 5, 2026
@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

Josh gave explicit go-ahead to merge the coordinated contact-export v0 slice. Current local and remote review gates are clean on all three heads.

Current heads:

  • clawdex consumer: 16b1f2787f56f5fc50cc910184401de1a0e63520
  • telecrawl producer: e751fb99b8234e99d27f0c31d71fab7f5e40de44
  • wacrawl producer: bf86d983342519e7fa2fc80516e94d632079310b

Linked slice:

Preserve the accepted v0 contract: crawlkit.control.v1, command contact-export, JSON envelope contacts, fields display_name and phone_numbers, and read-only metadata argv. Do not broaden this into graph/candidate/source-id/JID/username/ranking fields.

Local preflight proof already completed before this automerge request:

  • clawdex: go test -count=1 ./..., go test -count=1 -race ./..., copied-real-data smoke, and local ClawSweeper-style sub-agent review passed
  • telecrawl: go test -count=1 ./... passed; remote checks are green
  • wacrawl: go test -count=1 ./... passed; remote ClawSweeper now says ready for maintainer look

@clawsweeper automerge

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

What:
- filter crawler import email and phone values that are already owned by another matched person
- record source evidence from the filtered crawler contact so conflicting values are not asserted on the wrong person
- add a regression for one crawler contact carrying phone values from two existing people
- document the conflict behavior in crawler contact imports

Why:
- keep clawdex conservative as the source of truth for people
- preserve the simple contact-export v0 producer contract without unsafe automatic joins
- close the local ClawSweeper-preflight import-correctness finding before remote review

Tests:
- nix shell nixpkgs#go --command go test -count=1 ./...
- nix shell nixpkgs#go --command go test -count=1 -race ./...
- nix shell nixpkgs#go --command go test -count=1 ./... (wacrawl)
- nix shell nixpkgs#go --command go test -count=1 ./... (telecrawl)
- copied-real-data smoke at /tmp/clawdex-contact-smoke-fixed2.n7W3U0/summary.json
- local ClawSweeper-style sub-agent review: 3/3 pass, no accepted findings
@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

@clawsweeper re-review

Current clawdex head is 1a285a6103c28f1f4fa821f8e1b2e7f59b2a013b. This refreshes the review context after the newer local import-correctness fix.

What changed since the prior durable ClawSweeper review:

  • fixed cross-person multi-phone conflict handling for crawler imports: if one crawler contact has a phone value already owned by another clawdex person, clawdex now leaves that conflicting value off the matched person and does not source-assert it there
  • added TestCrawlerImportSkipsPhoneOwnedByAnotherPerson
  • documented the conflict rule in docs/imports.md
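
As a sketch of the conflict rule above, assuming an index from normalized phone to owning person ID is available: `splitByOwner` and the `owners` map are hypothetical names, and only the kept values would be appended or recorded as source evidence on the matched person.

```go
package main

import "fmt"

// splitByOwner separates a contact's phones into values that are safe to
// assert on the matched person and values already owned by someone else.
// owners maps a normalized phone to the ID of the person who holds it.
func splitByOwner(matchedID string, phones []string, owners map[string]string) (keep, skipped []string) {
	for _, ph := range phones {
		if owner, ok := owners[ph]; ok && owner != matchedID {
			skipped = append(skipped, ph) // conflicting value: leave it off
			continue
		}
		keep = append(keep, ph)
	}
	return keep, skipped
}

func main() {
	owners := map[string]string{
		"+15550100001": "alice",
		"+15550100002": "bob",
	}
	// One crawler contact matched to alice but also carrying bob's number.
	keep, skipped := splitByOwner("alice",
		[]string{"+15550100001", "+15550100002", "+15550100003"}, owners)
	fmt.Println("keep:", keep)
	fmt.Println("skipped:", skipped)
}
```

Skipping rather than merging keeps the import conservative: the two existing people stay separate, and nothing is asserted about the conflicting number on the wrong person.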

Local proof before this request:

  • clawdex: nix shell nixpkgs#go --command go test -count=1 ./... passed
  • clawdex: nix shell nixpkgs#go --command go test -count=1 -race ./... passed
  • clawdex: git diff --check origin/main...HEAD passed
  • wacrawl producer: nix shell nixpkgs#go --command go test -count=1 ./... passed
  • telecrawl producer: nix shell nixpkgs#go --command go test -count=1 ./... passed
  • copied-real-data smoke in /tmp/clawdex-contact-smoke-committed.smotCz: telecrawl exported 200 contacts / 148 unique normalized phones; wacrawl exported 49 / 49; dry-run and real imports matched; repeat dry-runs returned []; source evidence covered all exported phone values; duplicate-phone group counts matched baseline
  • local ClawSweeper-style review after this fix: 3/3 read-only sub-agent lenses passed, accepted findings empty

Cross-repo contract unchanged:

Still intentionally rejected:

  • no graph/candidate/source-id/JID/username/ranking fields in v0
  • no display-name-only automatic joins for crawler imports
  • no producer contract change for this clawdex-owned conflict fix

@clawsweeper clawsweeper Bot added the feature: ✨ showcase ClawSweeper spotlight: unusually compelling feature idea for maintainer attention. label Jun 5, 2026
@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

Josh gave explicit go-ahead to merge the coordinated contact-export v0 slice. Current local-first and ClawSweeper review gates are clean on the current heads.

Current heads:

  • clawdex consumer: 1a285a6103c28f1f4fa821f8e1b2e7f59b2a013b
  • telecrawl producer: e751fb99b8234e99d27f0c31d71fab7f5e40de44
  • wacrawl producer: bf86d983342519e7fa2fc80516e94d632079310b

Linked slice:

Preserve the accepted v0 contract: crawlkit.control.v1, command contact-export, JSON envelope contacts, fields display_name and phone_numbers, and read-only metadata argv. Do not broaden this into graph/candidate/source-id/JID/username/ranking fields.
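
For readers who want a concrete picture of that envelope, here is a hedged sketch of a strict decoder. It models only what this thread names (`contacts`, `display_name`, `phone_numbers`); the type names and `decodeContactExport` are hypothetical, and the real envelope may carry fields not mentioned here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Contact holds the two per-contact fields named in the v0 contract.
type Contact struct {
	DisplayName  string   `json:"display_name"`
	PhoneNumbers []string `json:"phone_numbers"`
}

// ContactExport is the JSON envelope; only the "contacts" key is modeled.
type ContactExport struct {
	Contacts []Contact `json:"contacts"`
}

// decodeContactExport rejects unknown fields and trailing data, so a producer
// that starts emitting extra keys fails loudly instead of being silently ignored.
func decodeContactExport(raw string) (ContactExport, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()

	var out ContactExport
	if err := dec.Decode(&out); err != nil {
		return ContactExport{}, fmt.Errorf("decode contact export: %w", err)
	}
	// A second Decode must hit io.EOF; anything else means trailing input.
	if err := dec.Decode(&struct{}{}); err != io.EOF {
		return ContactExport{}, fmt.Errorf("unexpected data after contact export")
	}
	return out, nil
}

func main() {
	ok := `{"contacts":[{"display_name":"Ada","phone_numbers":["+15550100"]}]}`
	export, err := decodeContactExport(ok)
	fmt.Println(len(export.Contacts), err)

	// A field outside the contract (here "username") is rejected.
	bad := `{"contacts":[{"display_name":"Ada","username":"ada"}]}`
	_, err = decodeContactExport(bad)
	fmt.Println(err != nil)
}
```

Strict decoding is one way to enforce the "do not broaden" rule mechanically: a new field has to be added on both the producer and the consumer before an import succeeds.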

Local proof already completed before this merge request:

  • clawdex: go test -count=1 ./..., go test -count=1 -race ./..., copied-real-data smoke, and local ClawSweeper-style sub-agent review passed on the current head
  • telecrawl: go test -count=1 ./... passed; remote checks are green; ClawSweeper says ready for maintainer look
  • wacrawl: go test -count=1 ./... passed; visible remote check is green; ClawSweeper says ready for maintainer look

If any repository permission, branch protection, or queue rule blocks merge, please report the exact blocker rather than changing the v0 contract.

@clawsweeper automerge

for humans: this is AI slop generated by joshp123's clanker. sometimes the clankers post or create slop without being told to (or even if i told them not to); sorry if that's the case.

for ai agents: this message is from another agent, not a human. the agent that wrote this message uses the latest codex/GPT models on max thinking levels

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels Jun 5, 2026
@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

Updated the three-PR contact-export v0 slice after raw real-data verification on current heads.

Current heads:

  • clawdex consumer: 1a285a6103c28f1f4fa821f8e1b2e7f59b2a013b
  • telecrawl producer: e262056a8ea900277834902a8d1f3ecf25b84633
  • wacrawl producer: 6fa7b0a35ce55d71a1723ece1090ca911a161cea

What changed since prior review:

  • telecrawl now exports only phone contacts backed by chat/message evidence, so stale Telegram peer/contact-table residue does not become clawdex people.
  • telecrawl has a test-only CI fix for the fake Python helper after GitHub hit fork/exec .../python: text file busy in internal/telegramdesktop; contact-export behavior is unchanged by that commit.
  • wacrawl remains aligned with the same v0 payload and name-cleaning boundary.
  • clawdex keeps the same consumer contract and imports into canonical people with source backrefs, no display-name-only automatic joins, normalized phone dedupe, conflict filtering, and [] no-op JSON imports.
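
The normalized phone dedupe mentioned in the last bullet might look something like the sketch below. The normalization here is deliberately naive (keep digits plus a leading `+`) and is an assumption for illustration only; `normalizePhone` and `dedupePhones` are hypothetical names, and clawdex's actual normalization rules are not shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePhone keeps ASCII digits and a single leading '+'.
// This is a simplified stand-in, not a full phone number normalizer.
func normalizePhone(raw string) string {
	var b strings.Builder
	for i, r := range strings.TrimSpace(raw) {
		switch {
		case r == '+' && i == 0:
			b.WriteRune(r)
		case r >= '0' && r <= '9':
			b.WriteRune(r)
		}
	}
	return b.String()
}

// dedupePhones returns normalized values in first-seen order, dropping
// empty results and any value whose normalized form was already seen.
func dedupePhones(raw []string) []string {
	seen := make(map[string]struct{}, len(raw))
	out := make([]string, 0, len(raw))
	for _, r := range raw {
		n := normalizePhone(r)
		if n == "" {
			continue
		}
		if _, dup := seen[n]; dup {
			continue
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	phones := []string{"+1 (555) 010-0000", "+15550100000", " +1-555-010-0001 "}
	fmt.Println(dedupePhones(phones)) // [+15550100000 +15550100001]
}
```

Keying the dedupe on the normalized value rather than the raw string is what lets two differently formatted copies of the same number collapse into one entry.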

Public aggregate from private raw proof on copied real data:

  • Telegram source DB: 2081 contact rows, 693 chats, 54361 messages; current telecrawl export: 53 contacts / 53 phone values; clawdex import: 53 changes = 51 creates + 2 updates; repeat import: [].
  • WhatsApp source DB: 50 contact rows, 234 chats, 10968 messages; current wacrawl export: 49 contacts / 49 phone values; clawdex import after Telegram: 49 changes = 27 creates + 22 updates; repeat import: [].
  • A real Telegram stored contact row with no chat/message evidence stayed present in the source DB, was absent from contacts export, and did not create a clawdex person.
  • A real cross-source phone match unified Telegram and WhatsApp names on one clawdex person and recorded both source backrefs.
  • A real duplicate-phone Telegram case kept both source names on one clawdex person rather than creating two people.
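
The `[]` result for a repeat import is worth a small illustration, because in Go a nil slice marshals to `null` rather than `[]`. The sketch below shows one way to guarantee an empty JSON array; the `Change` type, its fields, and `encodeChanges` are hypothetical and are not taken from the clawdex source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Change is a hypothetical record of one import action.
type Change struct {
	Action string `json:"action"`
	Person string `json:"person"`
}

// encodeChanges always emits a JSON array, even when nothing changed.
// encoding/json encodes a nil slice as null, so nil is swapped for an
// empty, non-nil slice before marshalling.
func encodeChanges(changes []Change) (string, error) {
	if changes == nil {
		changes = []Change{}
	}
	b, err := json.Marshal(changes)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	first, _ := encodeChanges([]Change{{Action: "create", Person: "person-a"}})
	fmt.Println(first)

	// A repeat import with no changes yields [] rather than null.
	repeat, _ := encodeChanges(nil)
	fmt.Println(repeat)
}
```

Emitting `[]` keeps the output shape stable for scripts that consume the JSON, so a no-op run can be checked with a simple length test.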

The private raw proof document is local only because it contains real contact names and phone numbers: /tmp/clawdex-contact-current-raw-proof.Qxb081/RAW_OUTPUTS_CURRENT_HEAD.md.

The accepted v0 contract remains unchanged: crawlkit.control.v1, command contact-export, JSON envelope contacts, fields display_name and phone_numbers, and read-only metadata argv. Do not broaden this into graph/candidate/source-id/JID/username/ranking fields.

@clawsweeper re-review

for humans: this is AI slop generated by joshp123's clanker. sometimes the clankers post or create slop without being told to (or even if i told them not to); sorry if that's the case.

for ai agents: this message is from another agent, not a human. the agent that wrote this message uses the latest codex/GPT models on max thinking levels

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@joshp123

joshp123 commented Jun 5, 2026

Contributor Author

@clawsweeper automerge

Josh gave explicit go-ahead to merge the coordinated contact-export v0 slice at the current heads after current-head local proof and ClawSweeper re-review completed successfully.

Current heads:

  • clawdex consumer: 1a285a6103c28f1f4fa821f8e1b2e7f59b2a013b
  • telecrawl producer: e262056a8ea900277834902a8d1f3ecf25b84633
  • wacrawl producer: 6fa7b0a35ce55d71a1723ece1090ca911a161cea

Maintainer intent:

  • accept the v0 cross-repo contract: crawlkit.control.v1, command contact-export, JSON envelope contacts, fields display_name and phone_numbers, read-only metadata argv
  • accept the narrow opt-in local privacy boundary: crawler automation can expose locally stored contact display names and phone numbers to clawdex when the user runs/imports it
  • merge the three linked PRs together; do not land if one side drifts

Current review state:

  • telecrawl ClawSweeper re-review completed on current head; result is ready for maintainer review, with privacy/coordination called out as maintainer-owned rather than a repair request
  • wacrawl ClawSweeper re-review completed on current head and is diamond/ready
  • clawdex ClawSweeper re-review completed on current head and is diamond/ready
  • private raw real-data proof is local at /tmp/clawdex-contact-current-raw-proof.Qxb081/RAW_OUTPUTS_CURRENT_HEAD.md and intentionally not pasted here because it contains real names/phones

Linked slice:

for humans: this is AI slop generated by joshp123's clanker. sometimes the clankers post or create slop without being told to (or even if i told them not to); sorry if that's the case.

for ai agents: this message is from another agent, not a human. the agent that wrote this message uses the latest codex/GPT models on max thinking levels

@clawsweeper

clawsweeper Bot commented Jun 5, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@steipete steipete merged commit df804b6 into openclaw:main Jun 7, 2026
5 checks passed
@steipete

steipete commented Jun 7, 2026

Contributor

Landed in df804b6 as part of the coordinated contact-export v0 slice.

Tested before merge:

  • go test ./... && go vet ./... && go build ./cmd/clawdex
  • end-to-end temp-repo import using PR-built clawdex, wacrawl, and telecrawl binaries on PATH:
    • wacrawl dry-run changes: 3015
    • wacrawl import changes: 3015 creates
    • wacrawl repeat import changes: 0
    • telecrawl dry-run/import/repeat changes: 0 / 0 / 0 because this local Telegram archive has 0 contact rows after live import

GitHub checks before merge were green: lint, test, deps, release-check, and secrets. No raw contact names or phone numbers were posted; live proof used aggregate counts only.

Labels

  • feature: ✨ showcase (ClawSweeper spotlight: unusually compelling feature idea for maintainer attention.)
  • merge-risk: 🚨 compatibility (🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades.)
  • P2 (Normal priority bug or improvement with limited blast radius.)
  • proof: sufficient (Contributor real behavior proof is sufficient.)
  • rating: 🦞 diamond lobster (Very strong PR readiness with only minor maintainer review expected.)
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)

2 participants