
feat: olostep as alternative to tavily #148

Open
umerkay wants to merge 4 commits into Beever-AI:main from umerkay:main


@umerkay

umerkay commented May 4, 2026

Summary

Adds Olostep's /searches endpoint as an optional alternative to Tavily for web search in QA external knowledge retrieval. The provider is selectable via the WEB_SEARCH_PROVIDER environment variable (default: tavily for full backward compatibility).

  • What: New optional OLOSTEP_API_KEY and WEB_SEARCH_PROVIDER config vars; provider branching in search_external_knowledge()
  • Why: Customers may prefer Olostep's API or already have existing Olostep credentials; optionality preserves existing Tavily deployments
  • Backward compatible: Defaults to tavily; all existing installs unaffected unless explicitly reconfigured

Closes #

Test plan

  • make lint passes locally (Python syntax validation)
  • All modified files compile without errors
  • New config fields tested in test_config.py
  • Installer test updated to exercise new prompts and env var seeding
  • Tavily code path untouched — no regressions

Manual verification:

python -m py_compile src/beever_atlas/infra/config.py
python -m py_compile src/beever_atlas/agents/tools/external_tools.py
python -c "from beever_atlas.agents.tools.external_tools import search_with_olostep; print('OK')"

Screenshots (UI changes)

N/A — configuration-only feature.

Breaking changes

  • No breaking changes

WEB_SEARCH_PROVIDER defaults to 'tavily'. Existing .env files without this var continue to work unchanged.

Commit trailers checklist

  • Constraint: — Olostep SDK does not expose a /searches namespace; direct HTTP via httpx required
  • Rejected: — Considered pip-installing olostep package; rejected per requirements (HTTP-only integration)
  • Directive: — Provider branching added around, not inside, existing Tavily call; Tavily logic remains unchanged to reduce regression risk
  • Confidence: — high (straightforward provider pattern; no shared state or complex branching logic)
  • Scope-risk: — narrow (new code paths only execute when WEB_SEARCH_PROVIDER=olostep; all other code untouched)
  • Not-tested: — Live API calls to Olostep endpoints not tested (requires valid credentials); mocked in CI only
  • Every commit is signed off (git commit -s) per DCO

@jhkchan
Member

jhkchan commented May 5, 2026

Nice addition — the branching is clean and Tavily is genuinely backward-compatible. A few things worth tightening before merge:

1. Validate WEB_SEARCH_PROVIDER in the settings model

Today the value flows straight through:

provider = (settings.web_search_provider or "tavily").strip().lower()
if provider == "olostep":
    ...
# everything else (typos, garbage) silently falls through to Tavily

A typo like WEB_SEARCH_PROVIDER=olstep quietly uses Tavily with no warning. Would suggest typing it as a Literal["tavily", "olostep"] in infra/config.py so Pydantic rejects unknown values at startup, or at minimum emit a logger.warning when the value isn't one of the two recognized strings.
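
To make the two options concrete, here is a minimal standard-library sketch of the same check. It is not the project's code: resolve_provider, ALLOWED_PROVIDERS, and the strict flag are hypothetical names used only for illustration. In the real settings model, a Literal-typed Pydantic field would give the strict behaviour without a helper like this.

```python
import logging
from typing import Literal, Optional, get_args

logger = logging.getLogger(__name__)

# Hypothetical alias for illustration; the real field would live in infra/config.py.
WebSearchProvider = Literal["tavily", "olostep"]
ALLOWED_PROVIDERS = get_args(WebSearchProvider)  # ("tavily", "olostep")


def resolve_provider(raw: Optional[str], strict: bool = True) -> str:
    """Normalize the configured value and refuse to silently accept typos."""
    provider = (raw or "tavily").strip().lower()
    if provider in ALLOWED_PROVIDERS:
        return provider
    if strict:
        # Fail loudly at startup, like a Literal-typed settings field would.
        raise ValueError(
            f"WEB_SEARCH_PROVIDER={raw!r} is not one of {ALLOWED_PROVIDERS}"
        )
    # Softer alternative: keep the Tavily default but leave a trace in the logs.
    logger.warning(
        "unknown WEB_SEARCH_PROVIDER=%r, falling back to tavily", raw
    )
    return "tavily"
```

With strict=True the typo olstep raises at startup; with strict=False it still resolves to tavily, but the warning makes the fallback visible to operators.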

2. Log the selected provider

search_external_knowledge never logs which provider it used — operators can only tell from the response source field (external vs external_olostep), which isn't typically logged. One line at the top of the function would close the gap:

logger.info("web_search.provider=%s mode=%s", provider, mode)

3. No fallback on transient failure

If provider=olostep and Olostep returns a 5xx (or auth fails), the call returns an error even when TAVILY_API_KEY is also configured. Probably out of scope for this PR, but worth a follow-up WEB_SEARCH_FALLBACK=true opt-in for resilience — both providers are pay-per-call, so the cost concern is small if it only kicks in on errors.
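
For the follow-up, a rough sketch of what an opt-in fallback could look like. Everything here is an assumption for discussion: search_with_fallback, the providers mapping, and the fallback_enabled flag (standing in for WEB_SEARCH_FALLBACK) do not exist in the codebase, and the provider callables are placeholders for the real Tavily and Olostep search functions.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

# Placeholder signature: a provider takes a query and returns a list of results.
SearchFn = Callable[[str], List[dict]]


def search_with_fallback(
    query: str,
    primary: str,
    providers: Dict[str, SearchFn],
    fallback_enabled: bool = False,
) -> List[dict]:
    """Try the primary provider; try the others only if it fails and fallback is on."""
    order = [primary]
    if fallback_enabled:
        order += [name for name in providers if name != primary]

    last_error: Optional[Exception] = None
    for name in order:
        search = providers[name]  # KeyError here means the provider is not configured
        try:
            return search(query)
        except Exception as exc:  # e.g. a 5xx or an auth failure from the provider
            logger.warning("web_search.provider=%s failed: %s", name, exc)
            last_error = exc
    raise RuntimeError("all web search providers failed") from last_error
```

Because the second provider is only called after the first one raises, the extra per-call cost is limited to error cases, which matches the cost argument above.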

4. .env.example inline comments cause a parse bug in the installer

-OLOSTEP_API_KEY=    # Optional: Olostep web search (alternative to Tavily)
-WEB_SEARCH_PROVIDER=tavily  # Options: tavily, olostep
+# Optional: Olostep web search (alternative to Tavily)
+OLOSTEP_API_KEY=
+# Options: tavily, olostep
+WEB_SEARCH_PROVIDER=tavily

The atlas installer reads existing values with grep -E "^${key}=" .env | cut -d'=' -f2-, which captures the trailing comment as part of the value. So on a second run, prompt_external_key thinks OLOSTEP_API_KEY is set (to a string starting with whitespace + # Optional...) and offers "(Enter = keep)". python-dotenv itself does strip inline comments correctly, but the installer (and any other shell consumer doing the same grep | cut pattern) does not. Move the comments to their own lines to avoid the foot-gun. (Same fix applies to WEB_SEARCH_PROVIDER.)
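
The parse problem can be reproduced without the installer. The sketch below only mimics the grep | cut pattern in Python for a single key; read_env_value is a hypothetical helper, not the installer's code, and it returns the first matching line rather than all of them.

```python
def read_env_value(env_text: str, key: str) -> str:
    """Mimic `grep -E "^KEY=" .env | cut -d'=' -f2-` for the first matching line."""
    for line in env_text.splitlines():
        if line.startswith(f"{key}="):
            # Everything after the first '=' is kept, trailing comment included.
            return line.split("=", 1)[1]
    return ""


inline = "OLOSTEP_API_KEY=    # Optional: Olostep web search (alternative to Tavily)\n"
separate = (
    "# Optional: Olostep web search (alternative to Tavily)\n"
    "OLOSTEP_API_KEY=\n"
)

# Inline comment: the "value" is whitespace plus the comment, so the key looks set.
print(repr(read_env_value(inline, "OLOSTEP_API_KEY")))
# Comment on its own line: the value is empty, as intended.
print(repr(read_env_value(separate, "OLOSTEP_API_KEY")))
```

The first call returns a non-empty string, which is why the installer offers "(Enter = keep)" on a re-run; the second returns an empty string.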

Smaller nits

  • httpx.post in search_with_olostep is sync and is called via asyncio.to_thread. That works, but httpx.AsyncClient would match the rest of the file's style.
  • data.get("result", {}).get("links", [])[:max_results] — if Olostep's response shape ever changes (e.g. results instead of links), the function silently returns [] rather than logging the unexpected shape. A logger.warning("olostep response missing links: %s", list(data.keys())) would help future debugging.
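
A possible shape for that second nit, as a sketch only: extract_links is a hypothetical helper name, and the result/links structure is taken from the expression quoted above rather than from Olostep's documentation.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def extract_links(data: Dict[str, Any], max_results: int) -> List[Any]:
    """Return up to max_results links, warning when the expected keys are absent."""
    result = data.get("result")
    links = result.get("links") if isinstance(result, dict) else None
    if not isinstance(links, list):
        # Log only the top-level keys so the payload itself stays out of the logs.
        logger.warning("olostep response missing links: %s", list(data.keys()))
        return []
    return links[:max_results]
```

The return value is still [] for an unexpected payload, so callers behave as before; the difference is that the mismatch now shows up in the logs.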

Happy to land #2 and #4 myself in a follow-up if that's easier — the validation + fallback ones (#1, #3) are more architectural and worth your call.

@umerkay
Author

umerkay commented May 7, 2026

Thanks for the thorough review! I've pushed fixes addressing all the feedback:

  1. WEB_SEARCH_PROVIDER validation — now typed as Literal["tavily", "olostep"] in the Settings model. Pydantic rejects unknown values at startup, so typos like WEB_SEARCH_PROVIDER=olstep will fail loudly instead of silently falling back to Tavily.

  2. Provider logging — added logger.info("web_search.provider=%s mode=%s", provider, mode) so operators can see which provider was selected in the logs, not just infer it from the response source field.

  3. .env.example inline comments — moved trailing comments to separate lines to fix the installer grep | cut parsing bug. On re-run, prompt_external_key no longer captures the comment as part of the value.

  4. Unexpected response shape: added logger.warning("olostep response missing links: %s", list(data.keys())) to help with future debugging if Olostep's API contract changes.

Left out of scope (as you noted):

  • Transient failure fallback to Tavily: agreed, this is a follow-up feature (WEB_SEARCH_FALLBACK opt-in).
  • httpx async client: the sync httpx.post called via asyncio.to_thread already works and matches the existing file pattern.

Ready for merge when you are!

@umerkay
Author

umerkay commented May 12, 2026

@jhkchan The requested changes have been made. I hope the olostep integration can be merged now :)

@alan5543
Member

Hi @umerkay — thanks for the olostep PR!

The only failing check (Security Audit) is unrelated to your change — it's flagging CVEs in urllib3, mako, and python-multipart that were already patched on main in 6cfcf3b. A rebase onto current main should clear it:

git fetch upstream
git rebase upstream/main
git push --force-with-lease

Let me know if you'd like me to push the rebase for you instead — happy to.

@umerkay
Author

umerkay commented May 15, 2026

@alan5543 I've merged the main branch into my work. I resolved the conflicts that came up and re-ran the local checks. All looks good to merge. :)

