[Bug]: error running workflow create_final_text_units: Could not convert <ArrowStringArray> #2177

@Green0wl

Description

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

The create_final_text_units workflow has been failing since yesterday's pandas==3.0.0 release; environments installed before the release work fine.
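
The error points at pandas 3.0's new default string dtype: list-valued columns such as entity_ids can end up holding ArrowStringArray objects instead of plain Python lists, and pyarrow cannot infer an Arrow type for those elements inside an object-dtype column. A minimal sketch of the suspected failure mode outside GraphRAG (hypothetical repro; the dtype="string[pyarrow]" construction stands in for whatever intermediate step produces the extension array):

import pandas as pd

# A one-row frame resembling the text_units table; entity_ids should be
# a list of ID strings per row.
df = pd.DataFrame({"id": ["t1"]})

# Simulate what pandas 3.0 appears to produce: the cell is an
# ArrowStringArray rather than a plain list of str.
df["entity_ids"] = [pd.array(["a", "b"], dtype="string[pyarrow]")]

# pyarrow does not recognize the extension-array element when inferring
# an Arrow type for the object column, so this raises
# pyarrow.lib.ArrowInvalid, matching the traceback below.
df.to_parquet("text_units.parquet")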

Steps to reproduce

python3 -m venv graphrag
source graphrag/bin/activate
pip install graphrag
graphrag index
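
A likely temporary workaround (untested here, inferred from the fact that pre-release installations work) is to pin pandas below 3.0 in the same environment, e.g. pip install "pandas<3.0", before running graphrag index.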

Expected Behavior

All pipeline stages complete successfully.

GraphRAG Config Used

defaults for Azure OpenAI:

### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

models:
  default_chat_model:
    type: azure_openai_chat
    model_provider: openai
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file, or remove if managed identity
    model: gpt-5-nano
    deployment_name: gpt-5-nano
    api_base: <url>    
    api_version: 2025-01-01-preview
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 20
    async_mode: threaded # or asyncio
    retry_strategy: exponential_backoff
    max_retries: 10
    tokens_per_minute: 50000
    requests_per_minute: 300
    max_completion_tokens: 10000
    temperature: 1
  default_embedding_model:
    type: azure_openai_embedding
    model_provider: openai
    auth_type: api_key
    api_key: ${GRAPHRAG_API_KEY}
    model: text-embedding-3-small
    deployment_name: text-embedding-3-small
    api_base: <url>
    api_version: 2025-01-01-preview
    concurrent_requests: 25
    async_mode: threaded # or asyncio
    retry_strategy: exponential_backoff
    max_retries: 10
    tokens_per_minute: null
    requests_per_minute: null
    max_tokens: 8000
    temperature: 1

### Input settings ###

input:
  storage:
    type: file # or blob
    base_dir: "input"
  file_type: text # [csv, text, json]

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Output/storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"
    
cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"

reporting:
  type: file # [file, blob]
  base_dir: "logs"

vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output/lancedb
    container_name: default

### Workflow settings ###

embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store

extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]
  async_mode: threaded # or asyncio

cluster_graph:
  max_cluster_size: 10

extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"

Logs and screenshots

2026-01-22 11:59:14.0063 - ERROR - graphrag.index.run.run_pipeline - error running workflow create_final_text_units
Traceback (most recent call last):
  File "/home/greenowl/Downloads/graphrag/lib/python3.12/site-packages/graphrag/index/run/run_pipeline.py", line 121, in _run_pipeline
    result = await workflow_function(config, context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/greenowl/Downloads/graphrag/lib/python3.12/site-packages/graphrag/index/workflows/create_final_text_units.py", line 49, in run_workflow
    await write_table_to_storage(output, "text_units", context.output_storage)
  File "/home/greenowl/Downloads/graphrag/lib/python3.12/site-packages/graphrag/utils/storage.py", line 34, in write_table_to_storage
    await storage.set(f"{name}.parquet", table.to_parquet())
                                         ^^^^^^^^^^^^^^^^^^
  File "/home/greenowl/Downloads/graphrag/lib/python3.12/site-packages/pandas/core/frame.py", line 3135, in to_parquet
    return to_parquet(
           ^^^^^^^^^^^
  File "/home/greenowl/Downloads/graphrag/lib/python3.12/site-packages/pandas/io/parquet.py", line 490, in to_parquet
    impl.write(
  File "/home/greenowl/Downloads/graphrag/lib/python3.12/site-packages/pandas/io/parquet.py", line 191, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 4796, in pyarrow.lib.Table.from_pandas
  File "/home/greenowl/Downloads/graphrag/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 651, in dataframe_to_arrays
    arrays = [convert_column(c, f)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/greenowl/Downloads/graphrag/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 639, in convert_column
    raise e
  File "/home/greenowl/Downloads/graphrag/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 633, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 365, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 91, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert <ArrowStringArray>\n['ab0ce889-2dd5-4842-ba4e-0cf150772c09',\n 'e196f731-e522-454b-9aa3-2d3fa4171124',\n '7e339fc2-6a41-48f1-9aef-0bf4e9574b8e',\n '7bb73f8b-90f8-4ff4-be04-e445f7b21a20',\n '8d57bd9e-2fd8-483c-ac7f-425097b390ed']\nLength: 5, dtype: str with type ArrowStringArray: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column entity_ids with type object')
2026-01-22 11:59:14.0064 - ERROR - graphrag.api.index - Workflow create_final_text_units completed with errors
2026-01-22 11:59:14.0065 - ERROR - graphrag.cli.index - Errors occurred during the pipeline run, see logs for more details.
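
The conversion fails on the entity_ids column: each cell is an ArrowStringArray, and pyarrow's type inference does not recognize that element type inside an object-dtype column. A possible user-side workaround sketch (hypothetical; normalize_list_column is not part of GraphRAG) is to coerce such cells back to plain Python lists before to_parquet is called:

import pandas as pd

def normalize_list_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    # Convert extension-array cells (e.g. ArrowStringArray) into plain
    # lists of str so pyarrow can infer list<string> for the column.
    df[column] = df[column].apply(lambda v: None if v is None else list(v))
    return df

# Applied to the failing table, this would run just before the
# write_table_to_storage call shown in the traceback, e.g.:
# normalize_list_column(text_units, "entity_ids")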

Additional Information

  • GraphRAG Version: graphrag==2.7.0
  • Pandas Version: pandas==3.0.0 (the regression trigger; earlier releases work, per the description)
  • Operating System: Ubuntu 24.04
  • Python Version: 3.12.3
  • Related Issues: None
