Potential false negative problem in type confinement #10

for-just-we · 2024-03-22T02:57:11Z

First, I would like to express my respect for this work. By reviewing the source code of MLTA and some test cases I found a potential case that may cause MLTA to produce additional false negatives than FLTA.

Below is an example from dovecot project. And iostream_pump_flush is an address-taken function used as an argument of call expression o_stream_set_flush_callback(pump->output, iostream_pump_flush, pump);.

I noticed MLTA does process the case that addr-taken function is used as a call argument for type confining. However, in this case, there exists an indirect call in the call in the call-chain, _stream->set_flush_callback(_stream, callback, context); call o_stream_default_set_flush_callback. Where type confinement happens in o_stream_default_set_flush_callback. Will this lead to insufficient confinement of iostream_pump_flush to field ostream::callbacks ?

void iostream_pump_start(struct iostream_pump *pump)
{
	i_assert(pump != NULL);
	i_assert(pump->callback != NULL);

	/* add flush handler */
	if (!pump->output->blocking) {
		o_stream_set_flush_callback(pump->output,
					    iostream_pump_flush, pump);
	}

	/* make IO objects */
	if (pump->input->blocking) {
		i_assert(!pump->output->blocking);
		o_stream_set_flush_pending(pump->output, TRUE);
	} else {
		pump->io = io_add_istream(pump->input,
					  iostream_pump_copy, pump);
		io_set_pending(pump->io);
	}
}


void o_stream_set_flush_callback(struct ostream *stream,
				 stream_flush_callback_t *callback,
				 void *context)
{
	struct ostream_private *_stream = stream->real_stream;

	_stream->set_flush_callback(_stream, callback, context);
}

// indirect invoked by _stream->set_flush_callback
static void
o_stream_default_set_flush_callback(struct ostream_private *_stream,
				    stream_flush_callback_t *callback,
				    void *context)
{
	if (_stream->parent != NULL)
		o_stream_set_flush_callback(_stream->parent, callback, context);

	_stream->callback = callback;
	_stream->context = context;
}

The text was updated successfully, but these errors were encountered:

YantingChi · 2024-03-23T03:29:33Z

Hi,
Thanks for your interest in MLTA. I looked into the dovecot/core, and your case mentioned sounds like a recursive function call.

From my understanding, your question is about whether indirect assignment between function and struct fields will be missed or not. More specifically, in your case the type-confinement happens in _stream->callback = callback; Where the callback function (namely iostream_pump_flush) is assigned to a field of _stream struct. However, when MLTA tries to solve the type hierarchy of the address-taken function, it might miss the assignment of the function iostream_pump_flush to a _stream struct. In that case, when we want to query the potential callee of a caller with a layer hierarchy _stream. callback, we might miss the iostream_pump_flush function.

This is an interesting limitation of MLTA, but it will not introduce FN but introduce some FP instead. In your case, MLTA will treat all the functions that are with type stream_flush_callback_t (or types that can be cast to this type) as potential right values in _stream->callback= callback , because we cannot solve the what function might be passed to o_stream_default_set_flush_callback. It will introduce a lot of FPs, but theoretically, MLTA will not miss the target.

I cannot build the core project (the install. md of the project seems not updated). Have you checked if MLTA will actually fail to capture the layered information for _stream->callback = callback; and thus FNs exist? It would be interesting to see if MLTA misses that target, and that might be a bug in the code implementation.

Hope my answer helps!

for-just-we · 2024-03-23T05:16:37Z

introduce a lot of FPs, but theoretically, MLTA will not miss the target.

I cannot build the core project (the install. md of the project seems not updated). Have you checked if MLTA will actually fail to capture the layered information for _stream->callback = callback; and thus FNs exist? It would be interesting to see if MLTA misses that target, and that might be a bug in the code implementation.

I build dovecot following oss-fuzz.

I actually don't run MLTA on the bitcode level. I tried to implement a source-code level MLTA (which of course cannot be as accurate as at the IR level) and found the issue when running it.

I understand that when MLTA identifies an escape type, it may downgrade to solve i-call by FLTA. And your answer seems to imply that in this case, MLTA will identify the type of _stream->callback that will escape. However, I don't understand how MLTA identifies it.

By reviewing the source code of MLTA, I speculate that if o_stream_default_set_flush_callback is called directly, MLTA will solve the type confine of _stream->callback = iostream_pump_flush; . (By replacing parameter callback with iostream_pump_flush).

However, when _stream->callback = iostream_pump_flush; is missing. And in other place a i-call fstream->ostream.callback(fstream->ostream.context); references the same field callback of struct ostream_private. How will MLTA know that there might be an escape-type.

YantingChi · 2024-03-23T23:02:38Z

Thanks for the feedback, as oss-fuzz typically will generate many exe files, could you tell me which specific executable in oss-fuzz/dovecot are you targeting?
In that case, I can check if there are real false negatives here.

for-just-we · 2024-03-24T04:36:06Z

Thanks for the feedback, as oss-fuzz typically will generate many exe files, could you tell me which specific executable in oss-fuzz/dovecot are you targeting? In that case, I can check if there are real false negatives here.

I actually forget that, but I implement 3 simple testcases with the code in dovecot. Although they cannot run, but can generate bitcode for MLTA to analyze.

testcase1

#define TRUE 1
#define bool _Bool
#define NULL 0

typedef void io_callback_t(void *context);
typedef void iostream_pump_callback_t(enum iostream_pump_status status,
				      void *context);
typedef int stream_flush_callback_t(void *context);

enum iostream_pump_status {
	/* pump succeeded - EOF received from istream and all output was
	   written successfully to ostream. */
	IOSTREAM_PUMP_STATUS_INPUT_EOF,
	/* pump failed - istream returned an error */
	IOSTREAM_PUMP_STATUS_INPUT_ERROR,
	/* pump failed - ostream returned an error */
	IOSTREAM_PUMP_STATUS_OUTPUT_ERROR,
};

struct ostream {
	/* Number of bytes sent via o_stream_send*() and similar functions.
	   This is counting the input data. For example with a compressed
	   ostream this is counting the uncompressed bytes. The compressed
	   bytes could be counted from the parent ostream's offset.

	   Seeking to a specified offset only makes sense if there is no
	   difference between input and output data sizes (e.g. there are no
	   wrapper ostreams changing the data). */
	unsigned int offset;

	/* errno for the last operation send/seek operation. cleared before
	   each call. */
	int stream_errno;

	/* overflow is set when some of the data given to send()
	   functions was neither sent nor buffered. It's never unset inside
	   ostream code. */
	bool overflow:1;
	/* o_stream_send() writes all the data or returns failure */
	bool blocking:1;
	bool closed:1;

	struct ostream_private *real_stream;
};

struct ostream_private {
    void (*set_flush_callback)(struct ostream_private *stream,
				   stream_flush_callback_t *callback,
				   void *context);
    
    struct ostream *parent;

    stream_flush_callback_t *callback;

    void *context;
};

struct iostream_pump {
	int refcount;
    struct io* io;
    struct ostream *output;
	iostream_pump_callback_t *callback;
	void *context;

	bool waiting_output;
	bool completed;
};

struct io {
	const char *source_filename;
	unsigned int source_linenum;
	/* trigger I/O callback even if OS doesn't think there is input
	   pending */
	bool pending;
	/* This IO event shouldn't be the only thing being waited on, because
	   it would just result in infinite wait. */
	bool never_wait_alone;

	io_callback_t *callback;
        void *context;

	struct ioloop *ioloop;
	struct ioloop_context *ctx;
};


void iostream_pump_start(struct iostream_pump *pump)
{
	o_stream_set_flush_callback(pump->output,
					    iostream_pump_flush, pump);
}

void o_stream_set_flush_callback(struct ostream *stream,
				 stream_flush_callback_t *callback,
				 void *context)
{
	struct ostream_private *_stream = stream->real_stream;
	_stream->set_flush_callback(_stream, callback, context);
}

// indirect invoked by _stream->set_flush_callback
static void
o_stream_default_set_flush_callback(struct ostream_private *_stream,
				    stream_flush_callback_t *callback,
				    void *context)
{
	if (_stream->parent != NULL)
		o_stream_set_flush_callback(_stream->parent, callback, context);

	_stream->callback = callback;
	_stream->context = context;
}

static int iostream_pump_flush(struct iostream_pump *pump)
{
	int ret = 0;
    return ret;
}

static void stream_send_io(struct ostream_private *stream) {
    // indirect-call
    stream->callback(stream->context);
}

In this case, the assignment of iostream_pump_flush involves two callsites including one indirect-callsite.

testcase2

#define TRUE 1
#define bool _Bool
#define NULL 0

typedef void io_callback_t(void *context);
typedef void iostream_pump_callback_t(enum iostream_pump_status status,
				      void *context);
typedef int stream_flush_callback_t(void *context);

enum iostream_pump_status {
	/* pump succeeded - EOF received from istream and all output was
	   written successfully to ostream. */
	IOSTREAM_PUMP_STATUS_INPUT_EOF,
	/* pump failed - istream returned an error */
	IOSTREAM_PUMP_STATUS_INPUT_ERROR,
	/* pump failed - ostream returned an error */
	IOSTREAM_PUMP_STATUS_OUTPUT_ERROR,
};

struct ostream {
	/* Number of bytes sent via o_stream_send*() and similar functions.
	   This is counting the input data. For example with a compressed
	   ostream this is counting the uncompressed bytes. The compressed
	   bytes could be counted from the parent ostream's offset.

	   Seeking to a specified offset only makes sense if there is no
	   difference between input and output data sizes (e.g. there are no
	   wrapper ostreams changing the data). */
	unsigned int offset;

	/* errno for the last operation send/seek operation. cleared before
	   each call. */
	int stream_errno;

	/* overflow is set when some of the data given to send()
	   functions was neither sent nor buffered. It's never unset inside
	   ostream code. */
	bool overflow:1;
	/* o_stream_send() writes all the data or returns failure */
	bool blocking:1;
	bool closed:1;

	struct ostream_private *real_stream;
};

struct ostream_private {
    void (*set_flush_callback)(struct ostream_private *stream,
				   stream_flush_callback_t *callback,
				   void *context);
    
    struct ostream *parent;

    stream_flush_callback_t *callback;

    void *context;
};

struct iostream_pump {
	int refcount;
    struct io* io;
    struct ostream *output;
	iostream_pump_callback_t *callback;
	void *context;

	bool waiting_output;
	bool completed;
};

struct io {
	const char *source_filename;
	unsigned int source_linenum;
	/* trigger I/O callback even if OS doesn't think there is input
	   pending */
	bool pending;
	/* This IO event shouldn't be the only thing being waited on, because
	   it would just result in infinite wait. */
	bool never_wait_alone;

	io_callback_t *callback;
        void *context;

	struct ioloop *ioloop;
	struct ioloop_context *ctx;
};

static int iostream_pump_flush(void *pump)
{
	int ret = 0;
    return ret;
}

void o_stream_set_flush_callback(struct ostream *stream,
				 stream_flush_callback_t *callback,
				 void *context);

// indirect invoked by _stream->set_flush_callback
static void
o_stream_default_set_flush_callback(struct ostream_private *_stream,
				    stream_flush_callback_t *callback,
				    void *context)
{
	_stream->callback = callback;
	_stream->context = context;
}

void o_stream_set_flush_callback(struct ostream *stream,
				 stream_flush_callback_t *callback,
				 void *context)
{
    struct ostream_private *_stream = stream->real_stream;
	o_stream_default_set_flush_callback(_stream, callback, context);
}

void iostream_pump_start(struct iostream_pump *pump)
{
	o_stream_set_flush_callback(pump->output,
					    iostream_pump_flush, pump);
}

static void stream_send_io(struct ostream_private *stream) {
    // indirect-call
    stream->callback(stream->context);
}

int main() {
    struct iostream_pump *pump;
    iostream_pump_start(pump);
    stream_send_io(pump->output);
    return 0;
}

In this case, the assignment of addr-taken function involves two direct callsite.

testcase3

#define TRUE 1
#define bool _Bool
#define NULL 0

typedef void io_callback_t(void *context);
typedef void iostream_pump_callback_t(enum iostream_pump_status status,
				      void *context);
typedef int stream_flush_callback_t(void *context);

enum iostream_pump_status {
	/* pump succeeded - EOF received from istream and all output was
	   written successfully to ostream. */
	IOSTREAM_PUMP_STATUS_INPUT_EOF,
	/* pump failed - istream returned an error */
	IOSTREAM_PUMP_STATUS_INPUT_ERROR,
	/* pump failed - ostream returned an error */
	IOSTREAM_PUMP_STATUS_OUTPUT_ERROR,
};

struct ostream {
	/* Number of bytes sent via o_stream_send*() and similar functions.
	   This is counting the input data. For example with a compressed
	   ostream this is counting the uncompressed bytes. The compressed
	   bytes could be counted from the parent ostream's offset.

	   Seeking to a specified offset only makes sense if there is no
	   difference between input and output data sizes (e.g. there are no
	   wrapper ostreams changing the data). */
	unsigned int offset;

	/* errno for the last operation send/seek operation. cleared before
	   each call. */
	int stream_errno;

	/* overflow is set when some of the data given to send()
	   functions was neither sent nor buffered. It's never unset inside
	   ostream code. */
	bool overflow:1;
	/* o_stream_send() writes all the data or returns failure */
	bool blocking:1;
	bool closed:1;

	struct ostream_private *real_stream;
};

struct ostream_private {
    void (*set_flush_callback)(struct ostream_private *stream,
				   stream_flush_callback_t *callback,
				   void *context);
    
    struct ostream *parent;

    stream_flush_callback_t *callback;

    void *context;
};

struct iostream_pump {
	int refcount;
    struct io* io;
    struct ostream *output;
	iostream_pump_callback_t *callback;
	void *context;

	bool waiting_output;
	bool completed;
};

struct io {
	const char *source_filename;
	unsigned int source_linenum;
	/* trigger I/O callback even if OS doesn't think there is input
	   pending */
	bool pending;
	/* This IO event shouldn't be the only thing being waited on, because
	   it would just result in infinite wait. */
	bool never_wait_alone;

	io_callback_t *callback;
        void *context;

	struct ioloop *ioloop;
	struct ioloop_context *ctx;
};

static int iostream_pump_flush(void *pump)
{
	int ret = 0;
    return ret;
}

// indirect invoked by _stream->set_flush_callback
static void
o_stream_default_set_flush_callback(struct ostream_private *_stream,
				    stream_flush_callback_t *callback,
				    void *context)
{
	_stream->callback = callback;
	_stream->context = context;
}

static void stream_send_io(struct ostream_private *stream) {
    // indirect-call
    stream->callback(stream->context);
}

int main() {
    struct ostream *stream;
    void* context;
    o_stream_default_set_flush_callback(stream, iostream_pump_flush, context);
    stream_send_io(stream);
    return 0;
}

In this case, the assignment of addr-taken function involves only one callsite.

result

However, I notice in all 3 cases MLTA fail to report iostream_pump_flush.

for-just-we · 2024-03-24T07:55:05Z

This also reminds me of an interesting extended question, which is "Can MLTA induce false negatives when it comes to complex data flows?" compared with FLTA.

In your paper, you seem to hypothesize that all address-taken functions referred to by structs fields are assigned straightforwardly. However, in the following cases. address-taken function addr_taken_func is assigned to Handler::func through a complex data-flow that involves an array funcs.

In my understanding, MLTA seems not to conduct such complex data-flow (which may also include inter-procedural complex data-flow). In this case, MLTA should regard Handler::func as an escape type. But I don't figure out how will MLTA identify the escape type. It seems that the paper doesn't discuss the details of the escape type.

void addr_taken_func();

...
func_ptr_t funcs[10];

funcs[5] = addr_taken_func;
...

struct Handler handler;
handler.func = funcs[5];

for-just-we · 2024-03-25T07:15:02Z

Hi, Thanks for your interest in MLTA. I looked into the dovecot/core, and your case mentioned sounds like a recursive function call.

From my understanding, your question is about whether indirect assignment between function and struct fields will be missed or not. More specifically, in your case the type-confinement happens in _stream->callback = callback; Where the callback function (namely iostream_pump_flush) is assigned to a field of _stream struct. However, when MLTA tries to solve the type hierarchy of the address-taken function, it might miss the assignment of the function iostream_pump_flush to a _stream struct. In that case, when we want to query the potential callee of a caller with a layer hierarchy _stream. callback, we might miss the iostream_pump_flush function.

This is an interesting limitation of MLTA, but it will not introduce FN but introduce some FP instead. In your case, MLTA will treat all the functions that are with type stream_flush_callback_t (or types that can be cast to this type) as potential right values in _stream->callback= callback , because we cannot solve the what function might be passed to o_stream_default_set_flush_callback. It will introduce a lot of FPs, but theoretically, MLTA will not miss the target.

I cannot build the core project (the install. md of the project seems not updated). Have you checked if MLTA will actually fail to capture the layered information for _stream->callback = callback; and thus FNs exist? It would be interesting to see if MLTA misses that target, and that might be a bug in the code implementation.

Hope my answer helps!

In your crix project. You seem to comment that such a case as: pool->free = free_fn; // free_fn is a function pointer from a function argument should be deemed as escaped. Does this mean in the case we discuss, when a func pointer field of struct load value from a function pointer variable (could be a function parameter or a simple func pointer global variable), we should simply regard it as escaped instead of doing inter-procedural analysis? Although it might induce more false positives. In this case, it seems that no false positive will be induced. But will lose precision on some simple stupid data-flow cases.

YantingChi · 2024-03-26T20:06:38Z

Thanks for the description and comments

In your crix project. You seem to comment that such a case as pool->free = free_fn; // free_fn is a function pointer from a function argument should be deemed as escaped. Does this mean in the case we discuss, when a func pointer field of struct load value from a function pointer variable (could be a function parameter or a simple func pointer global variable), we should simply regard it as escaped instead of doing inter-procedural analysis? Although it might induce more false positives. In this case, it seems that no false positive will be induced. But it will lose precision on some simple stupid data-flow cases.

I think you are mentioning there are no false negatives but more false positives here. I believe MLTA does not involve expensive inter-procedural analysis as it can quickly scanned through the Linux kernel. Involving inter-procedural analysis might increase the overhead.

I can double-check the code to make sure whether there are inter-procedural analyses in MLTA or not.

However, I notice in all 3 cases MLTA fails to report iostream_pump_flush.

Thanks for the cases! Excited to see MLTA actually misses some cases. I will go through the cases and get back to you when I find something interesting

for-just-we · 2024-03-27T02:49:47Z

Thanks for the description and comments

In your crix project. You seem to comment that such a case as pool->free = free_fn; // free_fn is a function pointer from a function argument should be deemed as escaped. Does this mean in the case we discuss, when a func pointer field of struct load value from a function pointer variable (could be a function parameter or a simple func pointer global variable), we should simply regard it as escaped instead of doing inter-procedural analysis? Although it might induce more false positives. In this case, it seems that no false positive will be induced. But it will lose precision on some simple stupid data-flow cases.

I think you are mentioning there are no false negatives but more false positives here. I believe MLTA does not involve expensive inter-procedural analysis as it can quickly scanned through the Linux kernel. Involving inter-procedural analysis might increase the overhead.

I can double-check the code to make sure whether there are inter-procedural analyses in MLTA or not.

However, I notice in all 3 cases MLTA fails to report iostream_pump_flush.

Thanks for the cases! Excited to see MLTA actually misses some cases. I will go through the cases and get back to you when I find something interesting

I guess the code is in MLTA::typeConfineInFunction(Function *F), where my understanding for this part of code is that they traversing all arguments of a callInst, if encounter a function being used as a argument. Try to deep into the callee to continue type confining. Well, maybe there are some mistakes in my understanding.

else if (CallInst *CI = dyn_cast<CallInst>(I)) {
		for (User::op_iterator OI = I->op_begin(), 
				OE = I->op_end();
				OI != OE; ++OI) {
			if (Function *F = dyn_cast<Function>(*OI)) {
				if (F->isIntrinsic())
					continue;
				if (CI->isIndirectCall()) {
					confineTargetFunction(*OI, F);
					continue;
				}
				Value *CV = CI->getCalledOperand();
				Function *CF = dyn_cast<Function>(CV);
				if (!CF)
					continue;
				if (CF->isDeclaration())
					CF = Ctx->GlobalFuncMap[CF->getGUID()];
				if (!CF)
					continue;
				if (Argument *Arg = getParamByArgNo(CF, OI->getOperandNo())) {
					for (auto U : Arg->users()) {
						if (isa<StoreInst>(U) || isa<BitCastOperator>(U)) {
							confineTargetFunction(U, F);
						}
					}
				}
				// TODO: track into the callee to avoid marking the
				// function type as a cap
			}
		}
	}

for-just-we · 2024-06-14T09:26:19Z

Hi, author. My recent work requires the use of MLta. I have been reading the code of MLTA and making some adjustments, but there are some parts of the code that I do not fully understand. I would like to seek your guidance on these.

1.Question 1

The first place is in MLTA::confineTargetFunction, where you mark a function-type as escaping. However, If I use fuzz type match by comparing the call arguments and function parameters one by one. Do I need to mark the escaped function-type? I give an example

typedef void (*fptr_long)(long);
typedef void (*fptr_int)(int);
typedef void (*fptr_ptr)(fptr_long);
void f1(long a) {}
void f2(long a) {}
void f3(long a) {}

void scene1_b(fptr_int f) { f(0); } // call1

void scene1_a () {
   fptr_int f = (fptr_int)&f1;
   scene1_b(f);
}

struct S {
    fptr_long one;
    fptr_int two;
};

void scene2_b (struct S* s) { s->one(0); } // call2

void scene2_a () {
    struct S s = {&f2 , 0};
    scene2_b(&s);
}

fptr_long callback;

void set_callback(fptr_long f) { callback = f; }

void scene3_b () { callback(0); } // call

void scene3_a () {
    fptr_ptr some_cb_target = &set_callback ;
    some_cb_target(&f3); // call3
}

int main(){
    scene1_a();
    scene2_a();
    scene3_a();
    return 0;
}

Where in this case, s->one(0); should be analyzed by MLTA with callee set of only f2. However, due to typedef void (*fptr_long)(long) escaped. It is analyzed by FLTA. Here I don't understand the necessity of marking escaped function-type.

2.Question 2

The following code in MLTA::typeConfineInFunction(Function *F).

else if (CallInst *CI = dyn_cast<CallInst>(I)) {
			for (User::op_iterator OI = I->op_begin(), 
					OE = I->op_end();
					OI != OE; ++OI) {
				if (Function *F = dyn_cast<Function>(*OI)) {
					if (F->isIntrinsic())
						continue;
					if (CI->isIndirectCall()) {
						confineTargetFunction(*OI, F);
						continue;
					}
					Value *CV = CI->getCalledOperand();
					Function *CF = dyn_cast<Function>(CV);
					if (!CF)
						continue;
					if (CF->isDeclaration())
						CF = Ctx->GlobalFuncMap[CF->getGUID()];
					if (!CF)
						continue;
					if (Argument *Arg = getParamByArgNo(CF, OI->getOperandNo())) {
						for (auto U : Arg->users()) {
							if (isa<StoreInst>(U) || isa<BitCastOperator>(U)) {
								confineTargetFunction(U, F);
							}
						}
					}
					// TODO: track into the callee to avoid marking the
					// function type as a cap
				}
			}
		}

I understand the process for StoreInst. But why should MLTA process CallInst in typeConfineInFunction. Because I don't see there are confinement to be analyzed in call expression.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential false negative problem in type confinement #10

Potential false negative problem in type confinement #10

for-just-we commented Mar 22, 2024

YantingChi commented Mar 23, 2024 •

edited

Loading

for-just-we commented Mar 23, 2024

YantingChi commented Mar 23, 2024

for-just-we commented Mar 24, 2024

for-just-we commented Mar 24, 2024

for-just-we commented Mar 25, 2024

YantingChi commented Mar 26, 2024

for-just-we commented Mar 27, 2024

for-just-we commented Jun 14, 2024

Potential false negative problem in type confinement #10

Potential false negative problem in type confinement #10

Comments

for-just-we commented Mar 22, 2024

YantingChi commented Mar 23, 2024 • edited Loading

for-just-we commented Mar 23, 2024

YantingChi commented Mar 23, 2024

for-just-we commented Mar 24, 2024

testcase1

testcase2

testcase3

result

for-just-we commented Mar 24, 2024

for-just-we commented Mar 25, 2024

YantingChi commented Mar 26, 2024

for-just-we commented Mar 27, 2024

for-just-we commented Jun 14, 2024

1.Question 1

2.Question 2

YantingChi commented Mar 23, 2024 •

edited

Loading