
feat(linkedin): add recommended jobs adapter with GraphQL pagination support #51

Open
RickSanchez88E wants to merge 16 commits into nashsu:main from RickSanchez88E:main

Conversation

@RickSanchez88E

Description

Adds a new linkedin recommended command that crawls LinkedIn's personalized job-recommendation feed (the JYMBII, "Jobs You May Be Interested In", algorithm at /jobs/collections/recommended/). Unlike the existing linkedin search adapter, which uses the REST Voyager API, this endpoint is served over GraphQL (/voyager/api/graphql) and requires an authenticated browser session.

File: adapters/linkedin/recommended.yaml

Technical Details

  • API: LinkedIn uses GraphQL with queryId voyagerJobsDashJobCards.* (version-hashed, discovered dynamically via Performance API)
  • Auth: strategy: header with CSRF token extracted from JSESSIONID cookie
  • Pagination: Batches of 24 items, automatic multi-page crawl via start offset
  • Unlimited mode: --limit 0 crawls until the server returns an empty batch; with a positive limit, each request fetches min(limit - fetched, BATCH) items
  • Easy Apply detection: Checks footerItems[].type === "EASY_APPLY_TEXT" (not easyApplyUrl which doesn't exist in this API)
  • Workplace type: Parsed from the parentheses in secondaryDescription.text, e.g. "London (Hybrid)" → workplace_type: "Hybrid"
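
The batching and --limit 0 behavior above can be sketched as follows. This is a minimal sketch of the loop shape only; fetch_batch is a hypothetical stand-in for the adapter's actual GraphQL request step:

```python
# Sketch of the adapter's pagination loop (fetch_batch is a placeholder).
BATCH = 24  # LinkedIn returns recommended jobs in batches of 24


def crawl_recommended(fetch_batch, limit=200):
    """Fetch pages via the `start` offset until `limit` is reached.

    limit == 0 means unlimited: keep going until an empty batch.
    fetch_batch(start, count) -> list of job dicts (hypothetical helper).
    """
    results = []
    start = 0
    while True:
        if limit > 0:
            count = min(BATCH, limit - len(results))
            if count <= 0:
                break  # requested number of jobs collected
        else:
            count = BATCH
        batch = fetch_batch(start, count)
        if not batch:
            break  # server exhausted: stop even in unlimited mode
        results.extend(batch)
        start += len(batch)
    return results
```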

Output Columns

rank, title, company, location, workplace_type, salary, posted_time, applicant_count, easy_apply, url

Usage

# Default 200 results
autocli linkedin recommended -f json

# Specify count
autocli linkedin recommended --limit 50 -f json

# Unlimited (crawls all available)
autocli linkedin recommended --limit 0 -f json

# Table format
autocli linkedin recommended --limit 20

# CSV
autocli linkedin recommended --limit 100 -f csv

How to Test

Prerequisites: Chrome must be open with LinkedIn signed in, and the AutoCLI Chrome extension must be installed.

# Quick smoke test (5 results)
autocli linkedin recommended --limit 5

# Verify Easy Apply detection (should see "true" values)
autocli linkedin recommended --limit 24 -f json | grep easy_apply

# Verify pagination (should return exactly 50)
autocli linkedin recommended --limit 50 -f json | python3 -c "import json,sys; d=json.load(sys.stdin); print(f'{len(d)} results')"

# Diagnostic (verify auth & API discovery)
autocli linkedin recommended --limit 3 -f json

Known Quirks / Pitfalls

  1. GraphQL variable encoding: LinkedIn requires colons (:) and parentheses to remain raw (not URL-encoded) in GraphQL variables. Full encodeURIComponent causes HTTP 400. The adapter uses a partial-encode-then-decode approach.
  2. No total count: The API doesn't return a totalCount field. --limit 0 fetches incrementally until the server returns an empty batch.
  3. No applicant_count: Unlike the REST search API, this GraphQL endpoint's jobPostingCard doesn't include applicant count. Column is preserved but always returns "N/A".
  4. No easyApplyUrl field: Easy Apply detection uses footerItems type — verified via 200-job crawl with ~30% Easy Apply rate.
  5. Dynamic queryId: The GraphQL queryId includes a version hash that may change. The adapter discovers it dynamically via performance.getEntriesByType('resource'), so no hardcoded ID to maintain.
  6. Workplace type parsing: Workplace type is embedded in the location string in parentheses. Regex extracts On-site/Hybrid/Remote and strips it from the location field.
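
Quirks 1 and 6 can be illustrated with a small sketch. The exact safe-character set and the location-string format here are assumptions inferred from the notes above, not documented LinkedIn behavior:

```python
import re
from urllib.parse import quote


def encode_graphql_variables(raw: str) -> str:
    """Percent-encode a GraphQL variables string while leaving colons,
    parentheses, and commas raw (full encodeURIComponent-style encoding
    triggers HTTP 400). The safe-character set is an assumption based
    on the PR notes, not LinkedIn documentation.
    """
    return quote(raw, safe="():,")


def split_workplace(location: str):
    """Extract On-site/Hybrid/Remote from a trailing parenthetical,
    e.g. "London (Hybrid)" -> ("London", "Hybrid")."""
    m = re.match(r"^(.*?)\s*\((On-site|Hybrid|Remote)\)\s*$", location)
    if not m:
        return location, None
    return m.group(1), m.group(2)
```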

Rick Sanchez and others added 16 commits April 29, 2026 02:18
Adds `linkedin recommended` adapter for crawling LinkedIn JYMBII algorithm
recommended jobs via GraphQL API. Supports automatic pagination, Easy Apply
detection via footerItems EASY_APPLY_TEXT, workplace type parsing, and
unlimited mode (--limit 0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…quest signatures, pagination, and test commands
Local LLM (qwen3) → structured JSON → Supabase pipeline:
- 5-module Python pipeline: config, preprocess, LLM, db, orchestrator
- Grammar-constrained generation via llama.cpp json_schema
- 3-attempt retry at temp=0: standard → repair → minimal
- Atomic claim/upsert via Supabase RPC functions
- Stale processing reaper, dead-letter queue, extraction_runs tracking
- Per-run report: console summary + failed-jobs detail + JSON report
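
The 3-attempt retry ladder might look roughly like this. A sketch only: llm_call is a placeholder, and the real pipeline validates via llama.cpp's grammar-constrained generation rather than a plain json.loads check:

```python
import json

# The strategy names come from the commit message; everything else is
# illustrative.
ATTEMPTS = ["standard", "repair", "minimal"]


def extract_with_retry(llm_call, job_text):
    """Try each prompt strategy in order (all at temperature 0);
    return the first result that parses as JSON."""
    last_err = None
    for strategy in ATTEMPTS:
        raw = llm_call(job_text, strategy=strategy)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            last_err = e  # fall through to the next strategy
    raise RuntimeError(f"all attempts failed: {last_err}")
```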

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
linkedin recommended --limit 0 --with_jd triggers long-running commands
that scroll the full job list and fetch descriptions for each, which
can exceed the previous 30-second HTTP timeout.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add clean_linkedin_jobs.py pipeline that extracts URLs from multiple
fields, normalizes URLs, validates LinkedIn records (requiring easy_apply
or external_url), and maps apply_url/source_channel/apply_type correctly.

Includes:
- clean_linkedin_jobs.py: HTML cleaning, URL extraction cascade,
  salary parsing, batch dedup, dead letter queue
- sync_autocli_jobs.py: Supabase RPC upsert with source_channel/apply_type
- 23 unit tests with TDD (clean + sync + validation + URL mapping)
- 5 migrations: schema, url_hash, source_channel/apply_type,
  drop url_hash unique constraint, old data cleanup
- daemon health check wait in main.rs

bad_count invariant: 776 -> 0 (after cleanup + pipeline fix)
Chrome debugger can detach mid-command on SPA pages (e.g. LinkedIn),
returning "Detached while handling command". This error was not in the
retry list, causing the extension to give up immediately instead of
re-attaching and retrying.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extension `WINDOW_IDLE_TIMEOUT` (30s) would fire during evaluate steps
that run longer than the timeout (e.g. --limit 0 fetching all LinkedIn
recommended jobs). Added activeCommands counter per workspace so the
idle timer only starts when no commands are in-flight.
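
The gating amounts to a per-workspace counter. The extension itself is JavaScript; this Python sketch (with illustrative names) just shows the invariant that the idle timer may only arm when no commands are in flight:

```python
class Workspace:
    """Sketch of the per-workspace activeCommands gating (names are
    illustrative; the real logic lives in the JS extension)."""

    def __init__(self):
        self.active_commands = 0    # commands currently in flight
        self.idle_timer_armed = False

    def command_started(self):
        self.active_commands += 1
        self.idle_timer_armed = False  # cancel any pending idle timeout

    def command_finished(self):
        self.active_commands = max(0, self.active_commands - 1)
        if self.active_commands == 0:
            self.idle_timer_armed = True  # only now may the 30s timer start
```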

Added `scripts/autocli-baseline.sh` with 8 pre-flight checks (autocli
binary, Chrome process, daemon, extension, LinkedIn reachability, DNS,
output dir, disk space) with structured timestamped logging and --json
output. Includes 13-test suite at `scripts/test_baseline.sh`.
`check_extension_freshness` compares dist/background.js mtime against a
refresh marker file (.baseline-last-refresh). On first run (no marker)
it warns; when dist is newer than last refresh it fails with a clear
hint to use --refresh-extension.

`--refresh-extension` uses browser-harness CDP to navigate to
chrome://extensions, find the AutoCLI card, and click its reload button,
then updates the marker.

Test suite now has 15 tests covering all freshness scenarios.
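
The freshness check boils down to an mtime comparison; a minimal sketch, assuming the paths from the commit message and illustrative return values:

```python
import os


def check_extension_freshness(dist="dist/background.js",
                              marker=".baseline-last-refresh"):
    """Return 'warn' on first run (no marker yet), 'fail' if dist was
    rebuilt after the last recorded refresh, else 'ok'. Return values
    are illustrative; the real script emits structured log lines."""
    if not os.path.exists(marker):
        return "warn"   # first run: no refresh recorded yet
    if os.path.getmtime(dist) > os.path.getmtime(marker):
        return "fail"   # dist newer than last --refresh-extension
    return "ok"
```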
sync_autocli_jobs.py looked for "apply_type" key in raw records, but
LinkedIn raw data uses "easy_apply". Records from this pipeline were
silently defaulted to apply_type='unknown'. Added a fallback check for
the "easy_apply" field to correctly classify LinkedIn easy-apply jobs.

Also ran a SQL migration to fix 271 existing rows that were affected.
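
The fallback amounts to checking easy_apply when apply_type is absent. A sketch with illustrative classification labels (the field names come from the commit message; the real script's label strings may differ):

```python
def classify_apply_type(record: dict) -> str:
    """Classify a raw job record's apply type, falling back to the
    LinkedIn-specific "easy_apply" field when "apply_type" is missing."""
    if "apply_type" in record:
        return record["apply_type"]
    # LinkedIn raw data carries "easy_apply" instead of "apply_type"
    if record.get("easy_apply") in (True, "true", "True"):
        return "easy_apply"
    if "easy_apply" in record:
        return "external"  # present but falsy: an external-apply job
    return "unknown"       # neither field: previously the silent default
```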