Skip to content

INTPYTHON-624 Add PolymorphicEmbeddedModelField #327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions django_mongodb_backend/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -121,8 +121,10 @@ def _isnull_operator(a, b):
"gte": lambda a, b: {"$gte": [a, b]},
# MongoDB considers null less than zero. Exclude null values to match
# SQL behavior.
"lt": lambda a, b: {"$and": [{"$lt": [a, b]}, {"$ne": [a, None]}]},
"lte": lambda a, b: {"$and": [{"$lte": [a, b]}, {"$ne": [a, None]}]},
"lt": lambda a, b: {"$and": [{"$lt": [a, b]}, DatabaseWrapper._isnull_operator(a, False)]},
"lte": lambda a, b: {
"$and": [{"$lte": [a, b]}, DatabaseWrapper._isnull_operator(a, False)]
},
"in": lambda a, b: {"$in": [a, b]},
"isnull": _isnull_operator,
"range": lambda a, b: {
Expand Down
2 changes: 1 addition & 1 deletion django_mongodb_backend/compiler.py
Original file line number Diff line number Diff line change
Expand Up @@ -746,7 +746,7 @@ def execute_sql(self, result_type):
elif hasattr(value, "prepare_database_save"):
if field.remote_field:
value = value.prepare_database_save(field)
elif not hasattr(field, "embedded_model"):
elif not getattr(field, "stores_model_instance", False):
raise TypeError(
f"Tried to update field {field} with a model "
f"instance, {value!r}. Use a value compatible with "
Expand Down
6 changes: 3 additions & 3 deletions django_mongodb_backend/expressions.py
Original file line number Diff line number Diff line change
Expand Up @@ -190,9 +190,9 @@ def when(self, compiler, connection):

def value(self, compiler, connection): # noqa: ARG001
value = self.value
if isinstance(value, int):
# Wrap numbers in $literal to prevent ambiguity when Value appears in
# $project.
if isinstance(value, list | int):
# Wrap lists & numbers in $literal to prevent ambiguity when Value
# appears in $project.
return {"$literal": value}
if isinstance(value, Decimal):
return Decimal128(value)
Expand Down
2 changes: 2 additions & 0 deletions django_mongodb_backend/fields/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
from .embedded_model_array import EmbeddedModelArrayField
from .json import register_json_field
from .objectid import ObjectIdField
from .polymorphic_embedded_model import PolymorphicEmbeddedModelField

__all__ = [
"register_fields",
Expand All @@ -13,6 +14,7 @@
"EmbeddedModelField",
"ObjectIdAutoField",
"ObjectIdField",
"PolymorphicEmbeddedModelField",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive by comment. I am not trying to convince anyone of anything, but I would love to use this feature if it were called GenericEmbeddedModelField. Apologies but I just cannot with the Polymorphic… for two reasons:

  • What is Polymorphic? I have to look it up every single time.
  • Generic is shorter than Polymorphic
  • Generic has some context in Django

Unlike my concern about using Embedded… terminology, there may be some justification that Generic could be used instead of Polymorphic and still be appropriate (whereas with Embedded terminology we're purposely steering away from SQL terminology.)

Copy link
Contributor

@Jibola Jibola Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR -- Polymorphic is a more holistically accurate representation of this field and avoids confusion.

I'm against the use of GenericEmbeddedModelField for several reasons, but the most pertinent ones are:

  1. It's the incorrect usage of the term Generic

    • Our polymorphism (what's commonly viewed as polymorphism) abides by subtype polymorphism which focuses more on fields sharing commonality, whereas Generics abide by parametric polymorphism which emphasize a set of "actions" available to any object/value passed. The contents of that object mean much less than what it can actually do. See the blurb below:

We have already seen that object-oriented languages support subtype polymorphism; generics give us a different kind of polymorphism, parametric polymorphism. Recall that polymorphism means that something can be used at multiple types. Subtype polymorphism allows client code to be written with respect to a single type (as specified by, say, an interface or abstract class) while interoperating with multiple implementations of that type. Parametric polymorphism, also known as generics, solves the opposite problem, in a sense: it allows a single implementation to interoperate with multiple clients who want to use that implementation for different types. This goal is achieved by writing the implementation with respect to a type parameter, a variable ranging over types that can be instantiated with different concrete types. Support for generics is not an essentially object-oriented idea, and was not originally part of Java, having been first introduced in Java 5.

Source

  1. It would further confuse folks

    • In both Django Generic Relations and the, now defunct, Django-MongoDB-Engine Generic Embedding "Generics" implement the properties defined in the point above. Rather than a list of explicitly defined items like in our polymorphic design, it takes quite literally anything and will make it work in its system. Were we to also call it a Generic* it would confuse developers because they could expect those qualities instead of realizing things still need to be explicitly typed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

]


Expand Down
2 changes: 2 additions & 0 deletions django_mongodb_backend/fields/embedded_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@
class EmbeddedModelField(models.Field):
"""Field that stores a model instance."""

stores_model_instance = True

def __init__(self, embedded_model, *args, **kwargs):
"""
`embedded_model` is the model class of the instance to be stored.
Expand Down
2 changes: 2 additions & 0 deletions django_mongodb_backend/fields/embedded_model_array.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@


class EmbeddedModelArrayField(ArrayField):
value_is_model_instance = True

def __init__(self, embedded_model, **kwargs):
if "size" in kwargs:
raise ValueError("EmbeddedModelArrayField does not support size.")
Expand Down
195 changes: 195 additions & 0 deletions django_mongodb_backend/fields/polymorphic_embedded_model.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,195 @@
import contextlib

from django.core import checks
from django.core.exceptions import FieldDoesNotExist, ValidationError
from django.db import connections, models
from django.db.models.fields.related import lazy_related_operation

from .embedded_model import KeyTransformFactory


class PolymorphicEmbeddedModelField(models.Field):
"""Field that stores a model instance of varying type."""

stores_model_instance = True

def __init__(self, embedded_models, *args, **kwargs):
"""
`embedded_models` is a list of possible model classes to be stored.
Like other relational fields, each model may also be passed as a
string.
"""
self.embedded_models = embedded_models
kwargs["editable"] = False
super().__init__(*args, **kwargs)

def db_type(self, connection):
return "embeddedDocuments"

def check(self, **kwargs):
from ..models import EmbeddedModel

errors = super().check(**kwargs)
embedded_fields = {}
for model in self.embedded_models:
if not issubclass(model, EmbeddedModel):
return [
checks.Error(
"Embedded models must be a subclass of "
"django_mongodb_backend.models.EmbeddedModel.",
obj=self,
hint="{model} doesn't subclass EmbeddedModel.",
id="django_mongodb_backend.embedded_model.E002",
)
]
for field in model._meta.fields:
if field.remote_field:
errors.append(
checks.Error(
"Embedded models cannot have relational fields "
f"({model().__class__.__name__}.{field.name} "
f"is a {field.__class__.__name__}).",
obj=self,
id="django_mongodb_backend.embedded_model.E001",
)
)
field_name = field.name
if existing_field := embedded_fields.get(field.name):
connection = _get_mongodb_connection()
Copy link
Collaborator

@WaVEV WaVEV Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should run anytime? or only when a migration is applied on the database? 🤔
Given that make migrations does not pass the database, because it should be agnostic to that but when a migration is applied it should not be agnostic. So when do we want to check this? when a migration is created, or anytime?

Edit: change the way that field types are compared is viable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intended to run at the makemigrations stage in order to flag the issue as early as possible. I think it's better for all database-specific checks to run during makemigrations but this requires changes to Django as discussed in https://code.djangoproject.com/ticket/31286#comment:7.

if existing_field.db_type(connection) != field.db_type(connection):
errors.append(
checks.Warning(
f"Embedded models {existing_field.model._meta.label} "
f"and {field.model._meta.label} both have field "
f"'{field_name}' of different type.",
obj=self,
id="django_mongodb_backend.embedded_model.E003",
hint="It may be impossible to query both fields.",
)
)

else:
embedded_fields[field_name] = field
return errors

def deconstruct(self):
name, path, args, kwargs = super().deconstruct()
if path.startswith("django_mongodb_backend.fields.polymorphic_embedded_model"):
path = path.replace(
"django_mongodb_backend.fields.polymorphic_embedded_model",
"django_mongodb_backend.fields",
)
kwargs["embedded_models"] = self.embedded_models
del kwargs["editable"]
return name, path, args, kwargs

def get_internal_type(self):
return "PolymorphicEmbeddedModelField"

def _set_model(self, model):
"""
Resolve embedded model classes once the field knows the model it
belongs to. If any of the items in __init__()'s embedded_models
argument are strings, resolve each to the actual model class, similar
to relational fields.
"""
self._model = model
if model is not None:
for embedded_model in self.embedded_models:
if isinstance(embedded_model, str):

def _resolve_lookup(_, *resolved_models):
self.embedded_models = resolved_models

lazy_related_operation(_resolve_lookup, model, *self.embedded_models)

model = property(lambda self: self._model, _set_model)

def from_db_value(self, value, expression, connection):
return self.to_python(value)

def to_python(self, value):
"""
Pass embedded model fields' values through each field's to_python() and
reinstantiate the embedded instance.
"""
if value is None:
return None
if not isinstance(value, dict):
return value
model_class = self._get_model_from_label(value.pop("_label"))
instance = model_class(
**{
field.attname: field.to_python(value[field.attname])
for field in model_class._meta.fields
if field.attname in value
}
)
instance._state.adding = False
return instance

def get_db_prep_save(self, embedded_instance, connection):
"""
Apply pre_save() and get_db_prep_save() of embedded instance fields and
create the {field: value} dict to be saved.
"""
if embedded_instance is None:
return None
if not isinstance(embedded_instance, self.embedded_models):
raise TypeError(
f"Expected instance of type {self.embedded_models!r}, not "
f"{type(embedded_instance)!r}."
)
field_values = {}
add = embedded_instance._state.adding
for field in embedded_instance._meta.fields:
value = field.get_db_prep_save(
field.pre_save(embedded_instance, add), connection=connection
)
# Exclude unset primary keys (e.g. {'id': None}).
if field.primary_key and value is None:
continue
field_values[field.attname] = value
# Store the model's label to know the class to use for initializing
# upon retrieval.
field_values["_label"] = embedded_instance._meta.label
# This instance will exist in the database soon.
embedded_instance._state.adding = False
return field_values

def get_transform(self, name):
transform = super().get_transform(name)
if transform:
return transform
for model in self.embedded_models:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if multiple submodes has the same field name?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I called this out in the design doc. I think we'll have to use the system check framework to enforce that common field names at least use the same type, otherwise, we won't know how to prepare the lookup value (e.g. the DatabaseOperations.adapt_<field>_value() methods).

Thinking about it some more, there is also the possibility that nested embedded documents share a field name. In that case, we won't know which field to traverse for the nested lookups that follow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also the possibility that nested embedded documents share a field name

This would only matter in the case of nested polymorphic fields, right? I think we can maintain that polymorphic fields of the same name should also require the same embedded models to define them? This then keeps us in alignment with the requirement that all name conflicts "are of the same type".

class Colonial(EmbeddedModel):
    ...

class Townhome(EmbeddedModel):
    ...

class Condo(EmbeddedModel):
    ...
    
class AmericanAddress(EmbeddedModel):
    city = models.CharField(...)
    state = models.CharField(...)
    zip_code = models.CharField(...)
    home_specifications = PolymorphicEmbeddedModelField([Colonial, Townhome, Condo],...)
    

class CanadianAddress(EmbeddedModel):
    city = models.CharField(...)
    province = models.CharField(...)
    zip_code = models.CharField(...)
    home_specifications = PolymorphicEmbeddedModelField([Colonial, Townhome, Condo],...)
    
class Home(models.Model):
    address = PolymorphicEmbeddedModelField([AmericanAddress, CanadianAddress], ...)
    num_inhabitants = models.IntegerField(...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a reasonable restriction. Whether it's worth adding a system check for that at this stage, I'm doubtful. There are many other cases where the current system check falls short. For example, the db_type() of EmbeddededModelField and the polymorphic version are both "embeddedDocuments", but there could be an incompatibility there. It just seems like a lot of various cases and combinations to consider, and they are technically okay, except for the querying issue which the user may or may not care about (thus a system check warning rather than an error).

with contextlib.suppress(FieldDoesNotExist):
field = model._meta.get_field(name)
break
else:
raise FieldDoesNotExist(
f"The models of field '{self.name}' have no field named '{name}'."
)
return KeyTransformFactory(name, field)

def validate(self, value, model_instance):
super().validate(value, model_instance)
if not isinstance(value, self.embedded_models):
raise ValidationError(
f"Expected instance of type {self.embedded_models!r}, not {type(value)!r}."
)
for field in value._meta.fields:
attname = field.attname
field.validate(getattr(value, attname), model_instance)

def formfield(self, **kwargs):
raise NotImplementedError("PolymorphicEmbeddedModelField does not support forms.")

def _get_model_from_label(self, label):
return next(model for model in self.embedded_models if model._meta.label == label)


def _get_mongodb_connection():
for alias in connections:
if connections[alias].vendor == "mongodb":
return connections[alias]
return None
15 changes: 15 additions & 0 deletions django_mongodb_backend/operations.py
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,8 @@ def get_db_converters(self, expression):
)
elif internal_type == "JSONField":
converters.append(self.convert_jsonfield_value)
elif internal_type == "PolymorphicEmbeddedModelField":
converters.append(self.convert_polymorphicembeddedmodelfield_value)
elif internal_type == "TimeField":
# Trunc(... output_field="TimeField") values must remain datetime
# until Trunc.convert_value() so they can be converted from UTC
Expand Down Expand Up @@ -182,6 +184,19 @@ def convert_jsonfield_value(self, value, expression, connection):
"""
return json.dumps(value)

def convert_polymorphicembeddedmodelfield_value(self, value, expression, connection):
if value is not None:
model_class = expression.output_field._get_model_from_label(value["_label"])
# Apply database converters to each field of the embedded model.
for field in model_class._meta.fields:
field_expr = Expression(output_field=field)
converters = connection.ops.get_db_converters(
field_expr
) + field_expr.get_db_converters(connection)
for converter in converters:
value[field.attname] = converter(value[field.attname], field_expr, connection)
return value

def convert_timefield_value(self, value, expression, connection):
if value is not None:
value = value.time()
Expand Down
30 changes: 30 additions & 0 deletions docs/source/ref/models/fields.rst
Original file line number Diff line number Diff line change
Expand Up @@ -313,3 +313,33 @@ These indexes use 0-based indexing.
.. class:: ObjectIdField

Stores an :class:`~bson.objectid.ObjectId`.

``PolymorphicEmbeddedModelField``
---------------------------------

.. class:: PolymorphicEmbeddedModelField(embedded_models, **kwargs)

.. versionadded:: 5.2.0b2

Stores a model of one of the types in ``embedded_models``.

.. attribute:: embedded_models

This is a required argument that specifies a list of model classes
that may be embedded.

Each model class reference works just like
:attr:`.EmbeddedModelField.embedded_model`.

See :ref:`the embedded model topic guide
<polymorphic-embedded-model-field-example>` for more details and examples.

.. admonition:: Migrations support is limited

:djadmin:`makemigrations` does not yet detect changes to embedded models,
nor does it create indexes or constraints for embedded models referenced
by ``PolymorphicEmbeddedModelField``.

.. admonition:: Forms are not supported

``PolymorphicEmbeddedModelField``\s don't appear in model forms.
2 changes: 2 additions & 0 deletions docs/source/releases/5.2.x.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@ New features
- Added the ``options`` parameter to
:func:`~django_mongodb_backend.utils.parse_uri`.
- Added support for :ref:`database transactions <transactions>`.
- Added :class:`~.fields.PolymorphicEmbeddedModelField` for storing a model
instance that may be of more than one model class.

5.2.0 beta 1
============
Expand Down
Loading
Loading