Skip to content

Type propagation failure in structs with less than two members #8956

@SamuelWAnderson45

Description

@SamuelWAnderson45

Summary
Pointers to structs with less than 2 members are overridden during type propagation, leading to bad decompilation. I provide an example since it is a little difficult to explain succinctly. I also have a "fix", but I'm worried it might not be correct.

Bug example
For a while now, I've encountered a bug in type inference with structs with only one defined member. This might seem contrived, but it actually comes up regularly when reverse engineering subclasses.

For example, the following code:

extern void* mymalloc(unsigned long sz);

#define NULL ((void*)0)

struct A {
  short a;
  char b;
};

struct B {
  struct A c;
  void *d;
  char e;
};

__attribute__((noinline)) struct A *initA(struct A *this, short a_param, char b_param) {
  if (this == NULL) {
    this = mymalloc(sizeof *this);
    if (this == NULL)
      return NULL;
  }
  this->a = a_param;
  this->b = b_param;
  return this;
}

struct B *initB(struct B *this, short a_param, char b_param, void *d_param, char e_param) {
  if (this == NULL) {
    this = mymalloc(sizeof *this);
    if (this == NULL)
      return NULL;
  }
  initA(&this->c, a_param, b_param);
  this->d = d_param;
  this->e = e_param;
  return this;
}

B is a subclass of A. B has an allocating initializer.

Say I've already REed A and initA, and defined its type.

Image Image

Now I go to look at B. I can tell by the call to malloc its size, 0x18. And I can tell from the call to initA that it is a subclass of A.

Image

So, naturally I create a struct B of size 0x18 with struct A as its first member.

Image

And then I set the first param and return value of initB to struct B*

Image

And the results aren't great. The assignment of param_5 seems to think param_1 must refer to struct A. The return statement is bizzare, the function should return param_1, not some offset into it. Usually the decompiler is at least good about offsets.
The more examples of these you play with, the more clear it becomes that the decompiler believes param_1 is actually a "struct A*", regardless of what we've told it.

There is a strange workaround. Define any other field in struct B. For example, I defined one of the padding bytes at the end to be undefined1 and all of the sudden:

Image

back to the great decompilation we're used to.

Tracing the bug
Why does this happen?

During the type propagation pass, we eventually reach a point in ActionInferTypes::propagateTypeEdge, where the algorithm must choose between struct A* and struct B*

  if (0>newtype->typeOrder(*outvn->getTempType())) {
#ifdef TYPEPROP_DEBUG
	propagationDebug(typegrp->getArch(),outvn,newtype,op,inslot,(Varnode *)0);
#endif
	outvn->setTempType(newtype);
	return !outvn->isMark();
  }

This ultimately leads to the check:

if (submeta != op.submeta) return (submeta < op.submeta) ? -1 : 1;

In this case, our struct B* has submeta "SUB_PTR", a generic pointer, and struct A* has submeta "SUB_PTR_STRUCT", a struct pointer. Since a stuct pointer is more specific, it "wins" the type propagation, which is what leads to the decompiler treating the pointer as a struct A*.

Why is struct B* a SUB_PTR instead of a SUB_PTR_STRUCT? After all, it is a pointer to a struct.

void TypePointer::calcSubmeta(void)
{
  type_metatype ptrtoMeta = ptrto->getMetatype();
  if (ptrtoMeta == TYPE_STRUCT) {
	if (ptrto->numDepend() > 1 || ptrto->isIncomplete())
	  submeta = SUB_PTR_STRUCT;
	else
	  submeta = SUB_PTR;
  }

if numDepend (the number of members in the struct) is greater than 1, it treats it as a struct pointer, else it's treated as a generic pointer. This explains why the workaround works, since it bumps the number of defined members to two.

The easy "fix" is:

diff --git a/Ghidra/Features/Decompiler/src/decompile/cpp/type.cc b/Ghidra/Features/Decompiler/src/decompile/cpp/type.cc
index 6dc548520b..bd5b52e1b9 100644
--- a/Ghidra/Features/Decompiler/src/decompile/cpp/type.cc
+++ b/Ghidra/Features/Decompiler/src/decompile/cpp/type.cc
@@ -1037,10 +1037,7 @@ void TypePointer::calcSubmeta(void)
 {
   type_metatype ptrtoMeta = ptrto->getMetatype();
   if (ptrtoMeta == TYPE_STRUCT) {
-    if (ptrto->numDepend() > 1 || ptrto->isIncomplete())
-      submeta = SUB_PTR_STRUCT;
-    else
-      submeta = SUB_PTR;
+    submeta = SUB_PTR_STRUCT;
   }
   else if (ptrtoMeta == TYPE_UNION) {
	 submeta = SUB_PTR_STRUCT;

and indeed that fixes my decompilation; it produces the same result as when using the workaround from earlier.

I'm hesitant to call my patch a "fix", because presumably that check exists for some reason. So deleting it may cause a regression. But I'm far out of my depth, and figure the decompiler experts might have more insight.

Environment (please complete the following information):

  • OS: Linux
  • Java Version: 21
  • Ghidra Version: 12.0.2 (and older, since at least 11.2 if not much older)
  • Ghidra Origin: official

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions