fix: normalize LLM output types in profile serialization#161

Open
ygh1254 wants to merge 1 commit into 666ghj:main from ygh1254:fix/profile-type-normalization

ygh1254 commented Mar 12, 2026

Summary

  • Add type coercion helpers (_coerce_to_str, _coerce_to_str_list) that convert dict/list LLM outputs into plain strings and flat string lists
  • Normalize fields at construction time via OasisAgentProfile.__post_init__ — covers bio, persona, country, profession, gender, interested_topics
  • Defensive coercion in serializers (_save_twitter_csv, _save_reddit_json) as a second safety net before string operations like [:150] and .replace()
  • Normalize in _generate_profile_with_llm() immediately after json.loads() to catch structured outputs before they reach the dataclass

Root Cause

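json.loads returns whatever JSON types the LLM emitted, so fields declared as str (e.g. bio, persona, country) can arrive as dict or list values. The serialization code then crashes when it applies string operations to them: slicing a dict with [:150] raises (a KeyError on Python 3.12+, as in the traceback below), and .replace() raises AttributeError on any dict or list.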

Traceback from the issue:

File "backend/app/services/oasis_profile_generator.py", line 1296, in _save_reddit_json
  "bio": profile.bio[:150] if profile.bio else f"{profile.name}",
KeyError: slice(None, 150, None)

Testing

The coercion logic handles:

  • bio as dict → extracts text via common text keys, falling back to json.dumps
  • persona as dict → same
  • country as list → joins into comma-separated string
  • interested_topics as nested dicts → flattens to list[str]

Closes #154

dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Mar 12, 2026
ygh1254 closed this Mar 12, 2026
ygh1254 reopened this Mar 12, 2026
ygh1254 force-pushed the fix/profile-type-normalization branch from edd5052 to 77870cc March 12, 2026 04:52
dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) and removed size:XXL Mar 12, 2026

Labels

size:M (This PR changes 30-99 lines, ignoring generated files.)

Development

Successfully merging this pull request may close these issues.

Profile serialization crashes when LLM returns structured bio/persona fields