Implementation of DLRM: Embedding Operations #4227

Sir-NoChill · 2024-09-24T20:27:51Z

Sir-NoChill
Sep 24, 2024

Hello community!

I am currently trying to reimplement Meta's DLRM algorithm, specifically the architecture discussed in this paper for profiling and performance research. I am having some trouble with writing a flax implementation of the sparse vector embedding code:

In Meta's implementation, they initialize a torch.EmbeddingBag as follows (see line in the original code):

Note: I abridge a lot of the code relating to the class fields, they initialize, then store, then access a sequence of these EmbeddingBag's throughout the course of the model

EE = nn.EmbeddingBag(n, m, mode="sum", sparse=True)

but they subsequently use it like this (refer to this line):

V = EE(
  sparse_index_group_batch,
  sparse_offset_group_batch,
  per_sample_weights=per_sample_weights,
)

However I cannot find a way to duplicate this functionality using the flax nnx.Embed or linen.Embed class. I am also relatively new to jax/flax so I apologize in advance for my further questions :) My current model is as follows (using nnx):

class DLRM(nnx.Module):
  """A clone of Meta's DLRM model written with Flax NNX."""

  def __init__(self, *, rngs: nnx.Rngs, embed_dims: jax.Array):
    # Dense linear layers
    self.linear_bot1 = nnx.Linear(13, 512, rngs=rngs)
    self.linear_bot2 = nnx.Linear(512, 256, rngs=rngs)
    self.linear_bot3 = nnx.Linear(256, 64, rngs=rngs)
    self.linear_bot4 = nnx.Linear(64, 16, rngs=rngs)

    self.linear_top1 = nnx.Linear(512, 256, rngs=rngs)
    self.linear_top2 = nnx.Linear(256, 1, rngs=rngs)

    emb_l = []
    v_W_l = []
    for i in range(0, embed_dims.size):
      n = embed_dims[i]

      EE = nnx.Embed(n, 16, rngs=rngs)  # TODO fix this to a parameter
      emb_l.append(EE)
      v_W_l.append(None)

    self.embed = emb_l

  def __call__(self, x, lS_o, lS_i):
    """Forward for the DLRM model
    x is Linear
    no clue what the other two are
    """
    # Dense linear on the interaction data
    x = nnx.relu(self.linear_bot1(x))
    x = nnx.relu(self.linear_bot2(x))
    x = nnx.relu(self.linear_bot3(x))
    x = nnx.relu(self.linear_bot4(x))

    # Sparse linears on the categorical data
    ly = []
    for k, sparse_index_group_batch in enumerate(lS_i):
        sparse_offset_group_batch = lS_o[k]
        y = self.embed[k](sparse_index_group_batch)  # <<<< HERE
        ly.append(y)

    # Interact the two sets of data
    batch_size, d = x.shape
    T = jnp.concat([x] + ly, dim=1).view((batch_size, -1, d))
    Z = jnp.dot(T, jnp.transpose(T, 1, 2))

    _, ni, nj = Z.shape
    li = jnp.Array([i for i in range(ni) for j in range(i)])
    lj = jnp.Array([j for i in range(nj) for j in range(i)])

    Zflat = Z[:, li, lj]
    R = jnp.concat([x] + [Zflat], dim=1)

    # Close it off with the top MLP
    ret = self.linear_top1(R)
    ret = self.linear_top2(ret)
    return ret

Sir-NoChill · 2024-10-04T02:11:59Z

Sir-NoChill
Oct 4, 2024
Author

Still working on this problem, I switched from nnx back to flax.linen just for more historical code examples. My current model looks like the following, which (I think) is correct:

class DLRM_Net(nn.Module):
    m_spa: int
    ln_emb: List[int]
    ln_bot: List[int]
    ln_top: List[int]
    arch_interaction_op: str
    arch_interaction_itself: bool = False
    sigmoid_bot: int = -1
    sigmoid_top: int = -1
    loss_threshold: float = 0.0
    weighted_pooling: Optional[str] = None

    def setup(self):
        self.embeddings = [nn.Embed(num_embeddings=n, features=self.m_spa) 
                           for n in self.ln_emb]

        self.bot_mlp = self.create_mlp(self.ln_bot, self.sigmoid_bot)
        self.top_mlp = self.create_mlp(self.ln_top, self.sigmoid_top)

    def create_mlp(self, ln, sigmoid_layer):
        layers = []
        for i in range(len(ln) - 1):
            layers.append(nn.Dense(features=ln[i + 1]))
            if i == sigmoid_layer:
                layers.append(nn.sigmoid)
            else:
                layers.append(nn.relu)
        return nn.Sequential(layers)

    def apply_embedding(self, lS_o, lS_i, embeddings):
        """Embeddings lookup for sparse features using jax.lax.gather"""
        ly = []
        for k in range(len(embeddings)):
            E = embeddings[k]
            embeds = E.apply(lS_i[k], lS_o[k])
            
            # Perform sum over the range of gathered embeddings specified by lS_o
            V = jnp.sum(embeds, axis=-1)
            ly.append(V)
        
        return ly

    def interact_features(self, x, ly):
        if self.arch_interaction_op == "dot":
            T = jnp.concatenate([x] + ly, axis=1).reshape(x.shape[0], -1, x.shape[1])
            Z = jnp.matmul(T, jnp.transpose(T, axes=(0, 2, 1)))
            offset = 1 if self.arch_interaction_itself else 0
            li = jnp.array([i for i in range(Z.shape[1]) for j in range(i + offset)])
            lj = jnp.array([j for i in range(Z.shape[2]) for j in range(i + offset)])
            Zflat = Z[:, li, lj]
            R = jnp.concatenate([x, Zflat], axis=1)
        elif self.arch_interaction_op == "cat":
            R = jnp.concatenate([x] + ly, axis=1)
        else:
            raise ValueError(f"Unsupported interaction op: {self.arch_interaction_op}")
        return R

    def __call__(self, dense_x, lS_o, lS_i):
        x = self.bot_mlp(dense_x)
        ly = self.apply_embedding(lS_o, lS_i, self.embeddings)
        z = self.interact_features(x, ly)
        p = self.top_mlp(z)

        if 0.0 < self.loss_threshold < 1.0:
            p = jnp.clip(p, self.loss_threshold, 1.0 - self.loss_threshold)

        return p

though I am having trouble with the model.init stage as the linen.Embed module take a dictionary for the offset argument. I don't quite know how to provide dummy data for that, so any pointers would be helpful on how to properly initialize the model with embedding offsets.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementation of DLRM: Embedding Operations #4227

{{title}}

Replies: 1 comment

{{title}}

Select a reply

Implementation of DLRM: Embedding Operations #4227

Sir-NoChill Sep 24, 2024

Replies: 1 comment

Sir-NoChill Oct 4, 2024 Author

Sir-NoChill
Sep 24, 2024

Sir-NoChill
Oct 4, 2024
Author