Description
Describe the bug
This may be expected behavior so feel free to close if this is intended. However, it is (potentially) different from postgres behavior and I figured I would mention it. The reproducer can probably explain the issue better than I can.
I'm able to work around the issue by renaming all fields on one of the inputs with a prefix but I didn't have to do this before and so I figured I'd report it and make sure the change is intentional.
To Reproduce
use arrow::array::{ArrayRef, Int32Array, RecordBatch};
use datafusion::prelude::*;
use std::sync::Arc;
#[tokio::main]
async fn main() {
let ctx = SessionContext::new();
let id: ArrayRef = Arc::new(Int32Array::from(vec![0, 1, 2]));
let value: ArrayRef = Arc::new(Int32Array::from(vec![0, 1, 2]));
let batch = RecordBatch::try_from_iter(vec![("id", id), ("value", value)]).unwrap();
ctx.register_batch("tes", batch).unwrap();
let id: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
let value: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
let batch = RecordBatch::try_from_iter(vec![("id", id), ("value", value)]).unwrap();
ctx.register_batch("tes2", batch).unwrap();
let tes = ctx.table("tes").await.unwrap();
let tes2 = ctx.table("tes2").await.unwrap();
// This succeeds (the two tables have different names and so the qualified names of the columns differ)
let joined = tes
.clone()
.join(tes2, JoinType::Full, &["id"], &["id"], None)
.unwrap();
joined.show().await.unwrap();
// This fails with the error:
//
// SchemaError(DuplicateQualifiedField { qualifier: Bare { table: "tes" }, name: "id" }, Some(""))
let tes_clone = tes.clone();
let joined = tes
.join(tes_clone, JoinType::Full, &["id"], &["id"], None)
.unwrap();
joined.show().await.unwrap();
}
Expected behavior
I would expect both joins to succeed (they did in version 42).
Additional context
In postgres the closest I get is:
CREATE TABLE tes (id int, val int);
INSERT INTO tes (id, val) VALUES (0, 0), (1, 1), (2, 2);
CREATE TABLE tes2 (id int, val int);
INSERT INTO tes2 (id, val) VALUES (1, 1), (2, 2), (3, 3);
SELECT * FROM tes FULL OUTER JOIN tes2 ON tes.id = tes2.id;
SELECT * FROM tes as t1 FULL OUTER JOIN tes as t2 ON t1.id = t2.id;
It's not exactly the same as I have to alias tes
. I will mention that my full motivation here is to support a join we do in lance during a merge_insert. We do a full outer join between the existing data (target table) and the new data (source table). Since these tables have the same schema and they are created with SessionContext::read_table
they have the same name.
An alternative (and maybe simpler) fix would be to introduce a SessionContext::read_table_with_alias
function which takes in an optional table name.