Oracle NUMBER type incorrectly inferred as double #3133

@jflipts

Description

dlt version

1.16.0

Describe the problem

The Oracle NUMBER type represents both integer and floating-point values, depending on how its precision and scale are configured. See https://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i22289 for more details.

SQLAlchemy also models this: sqlalchemy.dialects.oracle.NUMBER subclasses both sqlalchemy.types.Numeric and sqlalchemy.types.Integer. See https://docs.sqlalchemy.org/en/20/dialects/oracle.html#sqlalchemy.dialects.oracle.NUMBER

dlt assumes that all Oracle NUMBER types are of the Numeric variant and converts them to double or decimal. The relevant code even says so: https://github.com/dlt-hub/dlt/blob/devel/dlt/sources/sql_database/schema_types.py#L114

Expected behavior

Instead of assuming that all NUMBER types are of the Numeric variant, dlt should check whether a given NUMBER type is the Numeric variant or the Integer variant.
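
A rough sketch of what such a check could look like, using only precision and scale. This is not dlt's code: the function name `infer_number_type` and the 18-digit cutoff are my own assumptions for illustration, and unconstrained NUMBER columns are deliberately left as decimal.

```python
from typing import Optional

# Assumption: every decimal integer with at most 18 digits fits in a signed 64-bit int.
INT64_SAFE_DIGITS = 18


def infer_number_type(precision: Optional[int], scale: Optional[int]) -> str:
    """Hypothetical helper: map Oracle NUMBER(precision, scale) to a dlt data type name."""
    # A positive scale means digits after the decimal point, so not an integer.
    if scale is not None and scale > 0:
        return "decimal"
    # NUMBER without a precision can hold fractional values; stay conservative.
    if precision is None:
        return "decimal"
    # A negative scale rounds to the left of the decimal point and widens the value.
    integer_digits = precision - (scale or 0)
    if integer_digits <= INT64_SAFE_DIGITS:
        return "bigint"
    # Too wide for a 64-bit integer; decimal keeps the value exact.
    return "decimal"


print(infer_number_type(17, None))  # the NUMBER(17) column from this report
print(infer_number_type(10, 2))
```

With a rule like this, the NUMBER(17) column would come out as bigint instead of double.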

In most cases this isn't a big deal, but some large integer values cannot be represented exactly as a double even though they fit in a bigint. This becomes a problem during the conversion to Arrow:

>>> import pyarrow as pa
>>> pa.array([9309935020231023], type="int64")
<pyarrow.lib.Int64Array object at 0x7b16939b51e0>
[
  9309935020231023
]
>>> pa.array([9309935020231023], type="float64")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 375, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 45, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Integer value 9309935020231023 is outside of the range exactly representable by a IEEE 754 double precision value
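
The same limit can be shown without pyarrow. A double has a 53-bit significand, so above 2**53 not every integer is exactly representable. A minimal stdlib-only check (the helper name `is_exact_as_double` is made up for this example):

```python
def is_exact_as_double(value: int) -> bool:
    """Return True if the integer survives a round trip through float unchanged."""
    return int(float(value)) == value


big = 9309935020231023

print(big > 2**53)                 # True: beyond the range where every integer is exact
print(is_exact_as_double(big))     # False: the nearest double is a different integer
print(is_exact_as_double(2**53))   # True: powers of two are exactly representable
print(-(2**63) <= big < 2**63)     # True: the value still fits in a signed 64-bit integer
```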

Steps to reproduce

Successful pipeline:

  1. Create an Oracle table with a column of type NUMBER(17)
  2. Insert some integers into the column
  3. Run a pipeline to extract the table using the sql_table resource
  4. Explore the dlt schema and notice that the column's data type is double

With the pyarrow error on top:

  1. Create an Oracle table with a column of type NUMBER(17)
  2. Insert some integers into the column, including a very big number like 9309935020231023
  3. Run a pipeline to extract the table using the sql_table resource with the pyarrow backend
  4. The pipeline fails with the pyarrow.lib.ArrowInvalid exception shown above

Operating system

Linux

Runtime environment

Local

Python version

3.12

dlt data source

sql_database(
    credentials,
    schema="my_oracle_schema",
    defer_table_reflect=True,
    chunk_size=10000,
    table_names=["my_oracle_table"],
    backend="pyarrow",
    reflection_level="full_with_precision",
)

dlt destination

No response

Other deployment details

No response

Additional information

A workaround is to pass a simple type adapter to the source, but this should work out of the box:

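from sqlalchemy.dialects.oracle import NUMBER

def type_adapter_callback(sql_type):
    # _type_affinity is a private SQLAlchemy attribute; for NUMBER it
    # resolves to either the Integer or the Numeric variant
    if isinstance(sql_type, NUMBER):
        return sql_type._type_affinity
    return sql_type  # Use default detection for other types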

Projects

Status

In Progress
