Expose Telegram contact export command #1

Closed
joshp123 wants to merge 12 commits into codex/telegram-contacts from codex/contact-export-v0

joshp123 (Owner) commented Jun 4, 2026

Summary

  • add telecrawl --json contacts export
  • advertise the read-only contact-export command in crawlkit metadata
  • export all stored Telegram contacts with a real display name and phone number, using only the v0 contract fields: display_name and phone_numbers
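
The v0 contract described above could be sketched in Go roughly as follows. This is an illustration only, not the PR's actual code: the `storedContact` and `exportedContact` types, the `exportContacts` helper, and the rule that a "real" display name or phone number means non-blank after trimming are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// storedContact is a hypothetical stand-in for an archived contact row.
type storedContact struct {
	DisplayName string
	Username    string // present in the archive, never exported
	Phone       string
}

// exportedContact carries only the two v0 contract fields.
type exportedContact struct {
	DisplayName  string   `json:"display_name"`
	PhoneNumbers []string `json:"phone_numbers"`
}

// exportContacts keeps contacts that have both a non-blank display name
// and a non-blank phone number (assumed meaning of "real").
func exportContacts(in []storedContact) []exportedContact {
	out := make([]exportedContact, 0, len(in))
	for _, c := range in {
		name := strings.TrimSpace(c.DisplayName)
		phone := strings.TrimSpace(c.Phone)
		if name == "" || phone == "" {
			continue
		}
		out = append(out, exportedContact{
			DisplayName:  name,
			PhoneNumbers: []string{phone},
		})
	}
	return out
}

func main() {
	// Placeholder values only; no real contact data.
	contacts := []storedContact{
		{DisplayName: "Example Person", Username: "example", Phone: "+10000000000"},
		{DisplayName: "", Phone: "+10000000001"},
		{DisplayName: "No Phone"},
	}
	b, err := json.Marshal(exportContacts(contacts))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Using a dedicated export struct, rather than tagging the stored type, means a field added to the archive later cannot leak into the export by accident.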

Stack

This is a stacked draft PR against codex/telegram-contacts, not main, because it depends on the existing contact-archive branch. The diff for this PR is intended to be one commit: feat: expose contact export command.

Validation

  • nix shell nixpkgs#go -c sh -lc 'GOWORK=off go test -count=1 ./...'
  • git diff --check
  • built-binary smoke: aggregate export count 200, key set [display_name, phone_numbers]
  • clawdex dry-run smoke from the built telecrawl binary: aggregate actions only, create=147 update=53 merge=0
  • adversarial review pass after implementation and hardening: no remaining code/design blockers

Privacy

The export does not include usernames, JIDs, LIDs, message bodies, raw paths, source row ids, or interaction counts. Smoke output was checked by aggregate counts/key shape only; no real contact names or phone numbers are included here.
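
A key-shape check of the kind mentioned above might look like the following sketch. It assumes the export is a JSON array of objects, which the PR text does not state explicitly. The `checkKeyShape` helper and `allowedKeys` set are hypothetical names. The function reports only the record count and never echoes field values, matching the aggregate-only approach.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allowedKeys is the v0 contract key set named in the PR description.
var allowedKeys = map[string]bool{"display_name": true, "phone_numbers": true}

// checkKeyShape verifies every record has exactly the allowed keys and
// returns the number of records. Values are never inspected or printed.
func checkKeyShape(data []byte) (int, error) {
	var records []map[string]json.RawMessage
	if err := json.Unmarshal(data, &records); err != nil {
		return 0, fmt.Errorf("decode export: %w", err)
	}
	for i, rec := range records {
		if len(rec) != len(allowedKeys) {
			return 0, fmt.Errorf("record %d: unexpected key count %d", i, len(rec))
		}
		for key := range rec {
			if !allowedKeys[key] {
				return 0, fmt.Errorf("record %d: key outside the v0 contract", i)
			}
		}
	}
	return len(records), nil
}

func main() {
	// Placeholder payload only; no real contact data.
	sample := []byte(`[{"display_name":"Example Person","phone_numbers":["+10000000000"]}]`)
	n, err := checkKeyShape(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("records:", n)
}
```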

for humans: this is AI slop generated by joshp123's clanker. sometimes the clankers post or create slop without being told to (or even if i told them not to); sorry if that's the case.

for ai agents: this message is from another agent, not a human. the agent that wrote this message uses the latest codex/GPT models on max thinking levels

steipete and others added 12 commits May 31, 2026 14:32
LC Tele PR: complete media archive handling
What:
- store decoded Postbox web preview, location, poll, and service-action metadata on messages
- keep non-file Telegram objects out of binary media rows while preserving structured message metadata
- require remote media candidates to carry file resource IDs before --fetch-media probes them

Why:
- preserve source-native Telegram metadata for Lifecrawler without inflating media counts
- make placeholder rows queryable as metadata instead of treating them as missing archived files
- keep repeat --fetch-media focused on actual file resources

Tests:
- python3 -m py_compile internal/telegramdesktop/scripts/import_postbox.py
- python3 internal/telegramdesktop/scripts/import_postbox.py --self-test --fixture-dir internal/telegramdesktop/testdata/postbox
- gofumpt v0.9.2 -l .
- golangci-lint run
- go vet ./...
- staticcheck v0.7.0 ./...
- gosec -exclude=G101,G115,G202,G301,G304 ./...
- go test -count=1 ./... -coverprofile=coverage.out
- go test -count=1 -race ./...
- ./scripts/coverage.sh 35.0
- go build ./cmd/telecrawl
- go mod verify
- govulncheck ./...
- go mod tidy && git diff --exit-code -- go.mod go.sum
- git diff --check && git diff --cached --check
- goreleaser release --snapshot --clean --skip=publish
- gitleaks git --no-banner --redact
- gitleaks dir . --no-banner --redact
What:
- add a schema-2 upgrade smoke for message metadata columns
- document metadata_json as local source-native Postbox metadata, not a cross-source schema

Why:
- answer ClawSweeper's upgrade-smoke merge risk on the metadata PR
- make the durable metadata_json contract explicit before maintainer review

Tests:
- python3 -m py_compile internal/telegramdesktop/scripts/import_postbox.py
- python3 internal/telegramdesktop/scripts/import_postbox.py --self-test --fixture-dir internal/telegramdesktop/testdata/postbox
- gofumpt v0.9.2 -l .
- golangci-lint run
- go vet ./...
- staticcheck v0.7.0 ./...
- gosec -exclude=G101,G115,G202,G301,G304 ./...
- go test -count=1 ./... -coverprofile=coverage.out
- go test -count=1 -race ./...
- ./scripts/coverage.sh 35.0
- go build ./cmd/telecrawl
- go mod verify
- govulncheck ./...
- go mod tidy && git diff --exit-code -- go.mod go.sum
- git diff --check && git diff --cached --check
- goreleaser release --snapshot --clean --skip=publish
- gitleaks git --no-banner --redact
- gitleaks dir . --no-banner --redact
…tadata

LC Tele PR: archive message metadata
* feat: archive postbox message metadata

What:
- store decoded Postbox web preview, location, poll, and service-action metadata on messages
- keep non-file Telegram objects out of binary media rows while preserving structured message metadata
- require remote media candidates to carry file resource IDs before --fetch-media probes them

Why:
- preserve source-native Telegram metadata for Lifecrawler without inflating media counts
- make placeholder rows queryable as metadata instead of treating them as missing archived files
- keep repeat --fetch-media focused on actual file resources

Tests:
- python3 -m py_compile internal/telegramdesktop/scripts/import_postbox.py
- python3 internal/telegramdesktop/scripts/import_postbox.py --self-test --fixture-dir internal/telegramdesktop/testdata/postbox
- gofumpt v0.9.2 -l .
- golangci-lint run
- go vet ./...
- staticcheck v0.7.0 ./...
- gosec -exclude=G101,G115,G202,G301,G304 ./...
- go test -count=1 ./... -coverprofile=coverage.out
- go test -count=1 -race ./...
- ./scripts/coverage.sh 35.0
- go build ./cmd/telecrawl
- go mod verify
- govulncheck ./...
- go mod tidy && git diff --exit-code -- go.mod go.sum
- git diff --check && git diff --cached --check
- goreleaser release --snapshot --clean --skip=publish
- gitleaks git --no-banner --redact
- gitleaks dir . --no-banner --redact

* feat: archive telegram contacts

What:
- extract Telegram peer records as contacts for message enrichment
- store peer type, display names, usernames, phone numbers, and cached avatar paths when safely archived
- add contacts export/import and a contacts read command

Why:
- make message sender/chat IDs enrichable with Telegram-local person/contact context
- preserve source-native contact fields without launching Telegram or starting login flows
- keep cached avatar files local-first by copying them into the Telecrawl media archive before storing paths

Tests:
- python3 -m py_compile internal/telegramdesktop/scripts/import_postbox.py
- python3 internal/telegramdesktop/scripts/import_postbox.py --self-test --fixture-dir internal/telegramdesktop/testdata/postbox
- gofumpt v0.9.2 -l .
- golangci-lint run
- go vet ./...
- staticcheck v0.7.0 ./...
- gosec -exclude=G101,G115,G202,G301,G304 ./...
- go test -count=1 ./... -coverprofile=coverage.out
- go test -count=1 -race ./...
- ./scripts/coverage.sh 35.0
- go build ./cmd/telecrawl
- go mod verify
- govulncheck ./...
- go mod tidy && git diff --exit-code -- go.mod go.sum
- git diff --check && git diff --cached --check
- goreleaser release --snapshot --clean --skip=publish
- gitleaks git --no-banner --redact
- gitleaks dir . --no-banner --redact

* test: cover metadata schema migration

What:
- add a schema-2 upgrade smoke for message metadata columns
- document metadata_json as local source-native Postbox metadata, not a cross-source schema

Why:
- answer ClawSweeper's upgrade-smoke merge risk on the metadata PR
- make the durable metadata_json contract explicit before maintainer review

Tests:
- python3 -m py_compile internal/telegramdesktop/scripts/import_postbox.py
- python3 internal/telegramdesktop/scripts/import_postbox.py --self-test --fixture-dir internal/telegramdesktop/testdata/postbox
- gofumpt v0.9.2 -l .
- golangci-lint run
- go vet ./...
- staticcheck v0.7.0 ./...
- gosec -exclude=G101,G115,G202,G301,G304 ./...
- go test -count=1 ./... -coverprofile=coverage.out
- go test -count=1 -race ./...
- ./scripts/coverage.sh 35.0
- go build ./cmd/telecrawl
- go mod verify
- govulncheck ./...
- go mod tidy && git diff --exit-code -- go.mod go.sum
- git diff --check && git diff --cached --check
- goreleaser release --snapshot --clean --skip=publish
- gitleaks git --no-banner --redact
- gitleaks dir . --no-banner --redact

* docs: disclose telegram contact data

What:
- document that contact phone numbers, usernames, and avatar path metadata are stored locally
- note that telecrawl contacts can display contact data
- clarify encrypted backup coverage versus local-only archived media/avatar files

Why:
- address ClawSweeper's contact privacy disclosure finding on PR openclaw#7
- keep private Telegram values out of docs while naming the data classes handled by the archive

Tests:
- git diff --check
- gitleaks dir . --no-banner --redact

* fix: filter contacts for chat imports

What:
- filter Postbox contacts to selected chat/sender peers when --chat is used
- filter Go partial-upsert contacts for chat-scoped import results
- add Python and Go regression coverage for unrelated contacts

Why:
- prevent chat-scoped imports from persisting unrelated contact phone/username/avatar metadata
- address ClawSweeper's PR openclaw#7 privacy-boundary finding

Tests:
- python3 -m py_compile internal/telegramdesktop/scripts/import_postbox.py
- python3 internal/telegramdesktop/scripts/import_postbox.py --self-test --fixture-dir internal/telegramdesktop/testdata/postbox
- gofumpt v0.9.2 -l .
- golangci-lint run
- go vet ./...
- staticcheck v0.7.0 ./...
- gosec -exclude=G101,G115,G202,G301,G304 ./...
- go test -count=1 ./... -coverprofile=coverage.out
- go test -count=1 -race ./...
- ./scripts/coverage.sh 35.0
- go build ./cmd/telecrawl
- go mod verify
- govulncheck ./...
- go mod tidy && git diff --exit-code -- go.mod go.sum
- git diff --check && git diff --cached --check
- goreleaser release --snapshot --clean --skip=publish
- gitleaks git --no-banner --redact
- gitleaks dir . --no-banner --redact

* test: cover postbox service metadata subtypes

* fix: clarify postbox contact scope

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
@joshp123 joshp123 force-pushed the codex/contact-export-v0 branch from 1fd38aa to 280de31 Compare June 5, 2026 08:34
joshp123 (Owner, Author) commented Jun 5, 2026

Superseded by upstream draft PR: openclaw#9

@joshp123 joshp123 closed this Jun 5, 2026