Batch Key not working with sc.pp.scrublet()

### Please make sure these conditions are met

- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the main branch of scanpy.

### What happened?

I'm trying to predict doublets with Scrublet through `sc.pp.scrublet`. The function works when I omit the `batch_key` parameter, but it fails as soon as I pass it.

I checked the following before submitting this issue:
- [X] Checked for NaN in the original AnnData and in the AnnData `adata_after_qc`
- [X] Checked for NaN in the AnnData with simulated doublets, `adata_sim`
- [X] It's not a problem with my `.obs`, because I already used it for batch-effect correction in other Scanpy functions and that worked
- [X] Ran the function with and without the parameter; it only works without the parameter
- [X] Ran the simulation within each batch independently
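
For reference, the NaN checks above can be done roughly as in the sketch below. The helper name `count_nan` is hypothetical (it is not part of scanpy or anndata), and the toy matrices only stand in for `adata.X` or a layer.

```python
import numpy as np
import scipy.sparse as sp


def count_nan(matrix) -> int:
    """Count NaN entries in a dense array or a SciPy sparse matrix (hypothetical helper)."""
    if sp.issparse(matrix):
        # In a sparse matrix only explicitly stored values can be NaN.
        return int(np.isnan(sp.csr_matrix(matrix).data).sum())
    return int(np.isnan(np.asarray(matrix, dtype=float)).sum())


# Toy stand-ins for `adata.X` / `adata.layers[...]`
dense = np.array([[1.0, np.nan], [0.0, 2.0]])
sparse = sp.csr_matrix(np.array([[0.0, 3.0], [4.0, 0.0]]))

n_dense = count_nan(dense)
n_sparse = count_nan(sparse)
print(n_dense, n_sparse)
```

With an AnnData object, the same helper would be called as `count_nan(adata.X)` or `count_nan(adata.layers["raw_counts"])`.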

I suspect the problem lies in how batches are handled when pre-simulated doublets from `sc.pp.scrublet_simulate_doublets()` are combined with the original AnnData and `batch_key`, because the AnnData returned by `sc.pp.scrublet_simulate_doublets()` no longer carries the `.obs` annotations.

### Minimal code sample

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "scanpy@git+https://github.com/scverse/scanpy.git@main",
# ]
# ///
#
# This script automatically imports the development branch of scanpy to check for issues

import anndata as ad
import numpy as np
import scanpy as sc
import scipy.sparse as sp

# TODO: create `adata_after_qc`
# Simulate an AnnData similar to the one in question
n_obs, n_vars = 4151, 8342
dtype = np.int64

# Generate a sparse random count matrix (Poisson-distributed)
X = sp.csr_matrix(np.random.poisson(1, size=(n_obs, n_vars)), dtype=dtype)

adata_dummy = ad.AnnData(X)

# Example metadata
adata_dummy.obs["batch"] = np.random.choice(["r1", "r2", "r3", "r4"], size=n_obs)
adata_dummy.var["gene_symbol"] = [f"gene_{i}" for i in range(n_vars)]

# Store raw counts layer
adata_dummy.layers["raw_counts"] = adata_dummy.X.copy()


# Select the gene that I will modify to have zero variance
zero_var_gene_idx = 13
zero_var_gene_name = adata_dummy.var_names[zero_var_gene_idx]

# Get indices of cells belonging to batch 'r1'
batch_mask = adata_dummy.obs["batch"] == "r1"
r1_idx = np.where(batch_mask)[0]

# Set that gene's values in r1 to a constant
X_dense = adata_dummy.X.toarray()
X_dense[r1_idx, zero_var_gene_idx] = 2  # Constant value of gene

# Convert back to csr sparse matrix
adata_dummy.X = sp.csr_matrix(X_dense)

adata_sim = sc.pp.scrublet_simulate_doublets(
    adata_dummy,
    # 1.0 == each doublet is created by adding the counts of two random cells
    synthetic_doublet_umi_subsampling=1.0,
    layer="raw_counts",
)

sc.pp.scrublet(
    adata_dummy,  # Original AnnData
    adata_sim=adata_sim,  # AnnData with simulated doublets and modified .X
    knn_dist_metric="euclidean",
    n_prin_comps=20,
    batch_key="batch",
)
```

### Error output

```pytb
ValueError: Input X contains NaN.
PCA does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values
```
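
One possible way such a NaN could arise, which I have not confirmed against the scanpy source: the reproducer makes one gene constant within batch `r1`, so if any step standardizes genes within each batch separately, that gene has zero standard deviation there and the division gives `0 / 0 = NaN`. The NumPy-only sketch below illustrates this under that assumption; the `zscore` helper and the toy matrix are hypothetical and do not reproduce scanpy's internals.

```python
import numpy as np

# Toy data: 6 cells x 3 genes, two hypothetical batches of 3 cells each.
X = np.array(
    [
        [1, 2, 0],
        [0, 2, 3],
        [2, 2, 1],
        [3, 0, 2],
        [1, 1, 0],
        [0, 3, 4],
    ],
    dtype=float,
)
batches = np.array(["r1", "r1", "r1", "r2", "r2", "r2"])
# Gene 1 is constant (2) within batch r1 but varies across the whole matrix.


def zscore(block: np.ndarray) -> np.ndarray:
    """Standardize each gene (column); a zero standard deviation yields NaN."""
    mean = block.mean(axis=0)
    std = block.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (block - mean) / std


global_scaled = zscore(X)
per_batch_scaled = np.vstack([zscore(X[batches == b]) for b in ("r1", "r2")])

nan_global = bool(np.isnan(global_scaled).any())
nan_per_batch = bool(np.isnan(per_batch_scaled).any())
print(nan_global, nan_per_batch)
```

If this is indeed the mechanism, it would explain why the call succeeds without `batch_key` (the gene still varies across all cells) and fails with it.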

### Versions

<details>

```
marimo	0.15.2
scanpy	1.11.4
anndata	0.12.2
seaborn	0.13.2
numpy	2.2.6
scipy	1.16.1
pooch	1.8.2 (v1.8.2)
pandas	2.3.2
----	----
h5py	3.14.0
Markdown	3.9
charset-normalizer	3.4.3
pymdown-extensions	10.16.1
cffi	2.0.0
typing_extensions	4.15.0
setuptools	80.9.0
msgpack	1.1.1
jedi	0.19.2
crc32c	2.7.1
websockets	15.0.1
parso	0.8.5
statsmodels	0.14.5
threadpoolctl	3.6.0
colorama	0.4.6
zarr	3.1.2
pillow	11.3.0
numba	0.61.2
llvmlite	0.44.0
session-info2	0.2.1
scikit-learn	1.7.2
psutil	7.0.0
six	1.17.0
narwhals	2.4.0
itsdangerous	2.2.0
anyio	4.10.0
Pygments	2.19.2
joblib	1.5.2
PyYAML	6.0.2
patsy	1.0.1
pyparsing	3.2.3
h11	0.16.0
packaging	25.0
matplotlib	3.10.6
legacy-api-wrap	1.4.1
uvicorn	0.35.0
kiwisolver	1.4.9
natsort	8.4.0
click	8.2.1
numcodecs	0.16.1
donfig	0.8.1.post1
python-dateutil	2.9.0.post0
docutils	0.22
platformdirs	4.5.0
pytz	2025.2
zstandard	0.25.0
tomlkit	0.13.3
tqdm	4.67.1
pycparser	2.22
sniffio	1.3.1
starlette	0.47.3
cycler	0.12.1
----	----
Python	3.13.7 | packaged by conda-forge | (main, Sep  3 2025, 14:30:35) [GCC 14.3.0]
OS	Linux-6.8.0-85-generic-x86_64-with-glibc2.39
CPU	128 logical CPU cores, x86_64
GPU	No GPU found
Updated	2025-10-17 15:04
```

</details>

