Skip to content

MDEV-36684 - main.mdl_sync fails under valgrind (test for Bug#42643) #4009

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 29, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions sql/sql_base.cc
Original file line number Diff line number Diff line change
Expand Up @@ -4460,6 +4460,7 @@ bool open_tables(THD *thd, const DDL_options_st &options,
goto error;

error= FALSE;
std::this_thread::yield();
goto restart;
}
goto error;
Expand Down Expand Up @@ -4522,6 +4523,7 @@ bool open_tables(THD *thd, const DDL_options_st &options,
goto error;

error= FALSE;
std::this_thread::yield();
goto restart;
Comment on lines 4525 to 4527
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicating the section of code 63 lines earlier. In fact, a wider section of the code block starting with if (unlikely(error)) is being duplicated. I think that a follow-up fix would be useful where we’d do the following:

  if (unlikely(error))
  {
    if (!(error= open_tables_handle_error(error, …)))
      goto restart;
    goto error;
  }

and there would be a new function ATTRIBUTE_COLD static bool open_tables_error_handler(bool, …) that would include the duplicated code, including the call to std::this_thread::yield().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. There was a change in 10.11 though, that makes it different by one line at least. I wouldn't dare doing it in GA versions though.

}
/*
Expand Down
1 change: 1 addition & 0 deletions sql/sql_class.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
/* Classes in mysql */

#include <atomic>
#include <thread>
#include "dur_prop.h"
#include <waiting_threads.h>
#include "sql_const.h"
Expand Down
1 change: 1 addition & 0 deletions sql/table_cache.cc
Original file line number Diff line number Diff line change
Expand Up @@ -901,6 +901,7 @@ TABLE_SHARE *tdc_acquire_share(THD *thd, TABLE_LIST *tl, uint flags,
{
mysql_mutex_unlock(&element->LOCK_table_share);
lf_hash_search_unpin(thd->tdc_hash_pins);
std::this_thread::yield();
goto retry;
}
lf_hash_search_unpin(thd->tdc_hash_pins);
Comment on lines 903 to 907
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the following be a correct optimization?

  mysql_mutex_lock(&element->LOCK_table_share);
  lf_hash_search_unpin(thd->tdc_hash_pins);
  if (!(share= element->share))
  {
    mysql_mutex_unlock(&element->LOCK_table_share);
    std::this_thread::yield();
    goto retry;
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, depending on what you're trying to achieve. LOCK_table_share is a per-share mutex, it is not locked for table DML, but still can become contested for views. The less we have under its protection the better. So, this will make things slightly worse in this regard, and that was the main reason to keep it out of LOCK_table_share.

Speaking of correctness: lf_hash_search_unpin() marks element unused. In current LF_HASH implementation it means element is available for reuse. Taking it into account as well as considering how table definition cache is using it, nothing bad is going to happen. But I'd say it is like relying on undefined behaviour. Once it is unpinned it becomes property of LF_HASH. If it happens before element->share check, in theory we can expect LF_HASH to free/corrupt its memory. So it is not good from correctness point of view either.

What we get if we do lf_hash_search_unpin() earlier? It means we mark element for reuse earlier, that is another thread may reuse it rather than allocating it anew.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we would gain is avoiding some code duplication and instruction cache bloat. Another variant would avoid the potential race condition:

  mysql_mutex_lock(&element->LOCK_table_share);
  share= element->share;
  lf_hash_search_unpin(thd->tdc_hash_pins);
  if (!share)
  {
    mysql_mutex_unlock(&element->LOCK_table_share);
    std::this_thread::yield();
    goto retry;
  }

Copy link
Contributor Author

@svoj svoj Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't work either: mysql_mutex_unlock(&element->LOCK_table_share) would access potentially corrupt memory.

I can foresee your question "Why it is safe to unpin right after this block?" Presence of element->share guarantees there is no concurrent delete ongoing. Or if there is, it is waiting on element->LOCK_table_share mutex and thus it is safe to unpin it immediately after this block.

I understand your concern, I'd have implemented it as you suggested originally (back 10+ years ago), but I didn't see the way to arrange code better. Nor do I see it now.

Expand Down
1 change: 1 addition & 0 deletions storage/innobase/trx/trx0purge.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1076,6 +1076,7 @@ static void trx_purge_close_tables(purge_node_t *node, THD *thd) noexcept

void purge_sys_t::wait_FTS(bool also_sys)
{
std::this_thread::yield();
for (const uint32_t mask= also_sys ? ~0U : ~PAUSED_SYS; m_FTS_paused & mask;)
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
Expand Down