
Scale frequency to suppress RCU CPU stall warning #67

Draft · Mes0903 wants to merge 9 commits into master from scale-frequency

Conversation


@Mes0903 Mes0903 commented Jan 11, 2025

Since the emulator currently operates using sequential emulation, the execution time for the boot process is relatively long, which can result in the generation of RCU CPU stall warnings.

To address this issue, there are several potential solutions:

  1. Scale the frequency to slow down emulator time during the boot process, thereby eliminating RCU CPU stall warnings.
  2. During the boot process, avoid using clock_gettime to update ticks and instead manage the tick increment relationship manually.
  3. Implement multi-threaded emulation to accelerate the emulator's execution speed.

For the third point, while implementing multi-threaded emulation can significantly accelerate the emulator's execution speed, it cannot guarantee that this issue will not reappear as the number of cores increases in the future. Therefore, a better approach is to use methods 1 and 2 to allow the emulator to set an expected time for completing the boot process.

The advantages and disadvantages of the scale method are as follows:

Advantages:

  • Simple implementation
  • Effectively sets the expected boot process completion time
  • Results have strong interpretability
  • Emulator time can be easily mapped back to real time

Disadvantages:

  • Slower execution speed

The advantages and disadvantages of the increment ticks method are as follows:

Advantages:

  • Faster execution speed
  • Effectively sets the expected boot process completion time

Disadvantages:

  • More complex implementation
  • Some results are difficult to interpret
  • Emulator time is difficult to map back to real time

Based on practical tests, the second method provides limited acceleration but introduces some significant drawbacks, such as difficulty in interpreting results and the complexity of managing the increment relationship. Therefore, this commit opts for the scale frequency method to address this issue.

This commit divides time into emulator time and real time. During the boot process, the timer uses scale frequency to slow down the growth of emulator time, eliminating RCU CPU stall warnings. After the boot process is complete, the growth of emulator time aligns with real time.
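
For illustration only, a minimal sketch (with assumed names, not the committed code) of such a two-phase timer read: during boot, emulator time advances at scale_factor times the host rate, and afterwards it tracks the host clock directly. How the scale factor is chosen is described below.

    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>

    /* Sketch only: boot_complete and scale_factor are maintained elsewhere;
     * the actual patch also rebases the counter when switching phases so
     * that emulator time stays monotonic. */
    extern bool boot_complete;
    extern double scale_factor; /* e.g. 0.05 maps a 200 s host boot onto 10 s of guest time */

    static uint64_t host_time_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
    }

    /* Emulator time in ns relative to start_ns: scaled while booting, real afterwards. */
    static uint64_t emu_time_ns(uint64_t start_ns)
    {
        uint64_t elapsed = host_time_ns() - start_ns;
        return boot_complete ? elapsed : (uint64_t) (elapsed * scale_factor);
    }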

To configure the scale frequency parameter, three pieces of information are required:

  1. The expected completion time of the boot process
  2. A reference point for estimating the boot process completion time
  3. The relationship between the reference point and the number of SMPs

According to the Linux kernel documentation:
https://docs.kernel.org/RCU/stallwarn.html#config-rcu-cpu-stall-timeout

The grace period for RCU CPU stalls is typically set to 21 seconds. By dividing this value by two as the expected completion time, we can provide a sufficient buffer to reduce the impact of errors and avoid RCU CPU stall warnings.

Using gprof for basic statistical analysis, it was found that semu_timer_clocksource accounts for approximately 10% of the boot process execution time. Since the logic within semu_timer_clocksource is relatively simple, its execution time can be assumed to be nearly equal to clock_gettime.
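
Since this per-call cost (SecPerCall in the formula below) is an input to the estimate, it can be roughly measured on the host with a small standalone snippet like the following (illustrative only, not part of the patch):

    #include <stdio.h>
    #include <time.h>

    /* Roughly measure the cost of a single clock_gettime() call. */
    int main(void)
    {
        const long iters = 10 * 1000 * 1000;
        struct timespec start, end, tmp;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iters; i++)
            clock_gettime(CLOCK_MONOTONIC, &tmp);
        clock_gettime(CLOCK_MONOTONIC, &end);

        double elapsed = (end.tv_sec - start.tv_sec) +
                         (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("sec_per_call ~= %.3e s\n", elapsed / iters);
        return 0;
    }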

Furthermore, by adding a counter to semu_timer_clocksource, it was observed that each time the number of SMPs increases by 1, the execution count of semu_timer_clocksource increases by approximately $2 \times 10^8$ (see the table below).

With this information, we can estimate the boot process completion time as

$$ \text{SecPerCall} \times \text{SMPs} \times 2 \times 10^8 \times \frac{100\%}{10\%} $$

seconds, and thereby calculate the scale frequency parameter. For instance, if the estimated time is 200 seconds and the target time is 10 seconds, the scaling factor would be 10 / 200.
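
As a rough sketch (not the actual patch), the estimate and the resulting scale factor could be computed as follows, where sec_per_call is the measured clock_gettime cost and the 10% share and 2 × 10^8 calls per hart are the figures above; SEMU_BOOT_TARGET_TIME mirrors the macro added by this change:

    #define SEMU_BOOT_TARGET_TIME 10.0   /* seconds: half of the 21 s RCU grace period */
    #define CALLS_PER_HART        2.0e8  /* observed clocksource calls added per hart */
    #define CLOCKSOURCE_SHARE     0.10   /* semu_timer_clocksource is ~10% of boot time */

    /* Estimated boot time: sec_per_call * SMPs * 2e8 * (100% / 10%) */
    static double estimate_boot_seconds(double sec_per_call, int n_harts)
    {
        return sec_per_call * n_harts * CALLS_PER_HART / CLOCKSOURCE_SHARE;
    }

    /* e.g. an estimate of 200 s and a target of 10 s give 10 / 200 = 0.05 */
    static double compute_scale_factor(double sec_per_call, int n_harts)
    {
        return SEMU_BOOT_TARGET_TIME / estimate_boot_seconds(sec_per_call, n_harts);
    }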

To calculate the proportion of semu_timer_clocksource using gprof, simply add the -pg option during compilation. Then, terminate the emulator's operation immediately after the first switch to U mode (when the boot process is complete). This approach allows for a rough estimation of the proportion occupied by semu_timer_clocksource.

Below is the gprof testing output:

  • CPU: 13th Gen Intel(R) Core(TM) i7-13700

     mes@DESKTOP-HLQ9F6A:~/semu$ gprof ./semu
     Flat profile:
     
     Each sample counts as 0.01 seconds.
       %   cumulative   self              self     total           
      time   seconds   seconds    calls  Ts/call  Ts/call  name    
      39.91     55.19    55.19                             vm_step
      10.55     69.78    14.59                             semu_timer_clocksource
       9.27     82.60    12.82                             semu_start
       6.49     91.57     8.97                             aclint_mswi_update_interrupts
       6.19    100.13     8.56                             mmu_store
       5.73    108.05     7.92                             mmu_translate
       3.98    113.56     5.51                             semu_timer_get
       3.33    118.16     4.60                             op_rv32i
       2.92    122.20     4.04                             aclint_mtimer_update_interrupts
       2.76    126.02     3.82                             aclint_sswi_update_interrupts
       2.10    128.93     2.91                             ram_read
       1.59    131.13     2.20                             mmu_load
       1.27    132.89     1.76                             mem_store
       1.19    134.54     1.65                             _init
       0.62    135.40     0.86                             mem_load
       0.46    136.04     0.64                             u8250_check_ready
       0.41    136.61     0.57                             ram_write
       0.39    137.15     0.54                             csr_read
       0.37    137.66     0.51                             virtio_net_refresh_queue
       0.28    138.05     0.39                             mem_fetch
       0.17    138.28     0.23                             op_csr_cs
    
  • CPU: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

     mes@Mes:~/semu$ gprof ./semu
     Flat profile:
     
     Each sample counts as 0.01 seconds.
       %   cumulative   self              self     total           
      time   seconds   seconds    calls  Ts/call  Ts/call  name    
      32.96     77.77    77.77                             vm_step
      15.96    115.43    37.66                             semu_timer_clocksource
      13.95    148.34    32.91                             semu_start
       6.10    162.74    14.40                             aclint_mswi_update_interrupts
       5.95    176.77    14.03                             mmu_store
       5.35    189.39    12.62                             mmu_translate
       3.85    198.47     9.08                             aclint_mtimer_update_interrupts
       2.73    204.91     6.44                             op_rv32i
       2.46    210.72     5.81                             ram_read
       2.11    215.70     4.98                             aclint_sswi_update_interrupts
       1.74    219.81     4.11                             mmu_load
       1.74    223.91     4.10                             semu_timer_get
       1.30    226.98     3.07                             mem_store
       0.92    229.14     2.16                             ram_write
       0.81    231.05     1.91                             _init
       0.61    232.48     1.43                             mem_load
       0.55    233.78     1.30                             csr_read
       0.35    234.60     0.82                             u8250_check_ready
       0.22    235.11     0.51                             op_csr_cs
       0.21    235.61     0.50                             virtio_net_refresh_queue
       0.14    235.95     0.34                             mem_fetch
       0.00    235.96     0.01                             semu_timer_init
    

And by adding a counter to semu_timer_clocksource, it becomes possible to calculate the relationship between SMPs and the number of times semu_timer_clocksource is called.

Below is the testing output (13th Gen Intel(R) Core(TM) i7-13700):

SMP    Times semu_timer_clocksource is called
1 239,937,385
2 410,377,969
3 600,190,253
4 825,078,230
5 1,007,098,511
6 1,213,304,632
7 1,419,857,500
8 1,627,353,803
9 1,835,991,887
10 2,056,837,458
11 2,269,651,319
12 2,485,299,111
13 2,703,809,852
14 2,917,880,774
15 3,134,960,385
16 3,450,041,050
17 3,587,880,492
18 3,810,315,132
19 4,052,935,937
20 4,274,307,156
21 4,503,423,577
22 4,742,850,891
23 4,973,145,039
24 5,213,953,075
25 5,597,276,263
26 5,696,829,697
27 5,935,553,504
28 6,173,571,960
29 6,620,649,951
30 6,683,572,727
31 6,942,616,048
32 7,180,650,711

Summary by Bito

Implementation of frequency scaling solution to address RCU CPU stall warnings during boot process. The solution divides time handling between emulator and real time, using scaled frequency during boot and switching to real-time after completion. Targets 10-second boot time with comprehensive testing infrastructure across SMP configurations.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 3

@jserv jserv requested review from chiangkd and ranvd January 11, 2025 16:51

#define SEMU_BOOT_TARGET_TIME 10
#endif

bool boot_complete = false;

@ranvd ranvd Jan 12, 2025


I suggest moving boot_complete variable into vm_t for a more conceptually accurate design.


@Mes0903 Mes0903 Jan 12, 2025


If we move boot_complete into vm_t, all existing functions for semu_timer_t would need an additional vm_t parameter. For example, semu_timer_get would change to:

semu_timer_get(vm_t *vm, semu_timer_t *timer)

This change would indirectly require the callers of this function to also pass in a vm_t parameter. For instance, since semu_timer_get is called within aclint_mtimer_update_interrupts, the API of aclint_mtimer_update_interrupts would also need to be updated to include vm_t.

As this pattern continues, the API changes would proliferate significantly. Perhaps we could introduce a static bool pointer pointing to boot_complete and assign its value during semu_timer_init. This way, we would only need to modify the parameters of semu_timer_init.
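
For illustration, a minimal sketch of that idea (the exact field names and the final parameter list are assumptions, not the committed code):

    #include <stdbool.h>
    #include <stdint.h>
    #include "utils.h" /* for semu_timer_t */

    /* Cache a pointer to the flag (e.g. &vm->boot_complete) once during
     * initialization so the rest of the timer API keeps its current
     * signatures; only semu_timer_init gains a parameter. */
    static bool *boot_complete_ptr;

    void semu_timer_init(semu_timer_t *timer, uint64_t freq, bool *boot_complete)
    {
        boot_complete_ptr = boot_complete;
        timer->freq = freq;
        /* ... existing initialization ... */
    }

    static bool booting(void)
    {
        /* e.g. semu_timer_get can call this without a vm_t argument */
        return !*boot_complete_ptr;
    }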


Mes0903 commented Jan 13, 2025

I noticed that the boot time on macOS was far from the target time. I suspect this is because one of the assumptions, "$SMP \times 2 \times 10^8$ calls of semu_timer_clocksource" or "semu_timer_clocksource is ~10% of boot overhead", does not hold on macOS. The logic of the function itself is implemented now, though.

@Mes0903 Mes0903 force-pushed the scale-frequency branch 2 times, most recently from bdf4a62 to cb265ae on January 15, 2025 03:55


Mes0903 commented Jan 15, 2025

Based on the output, it is clear that the assumptions "$SMP \times 2 \times 10^8$ calls of semu_timer_clocksource" and "semu_timer_clocksource is ~10% of boot overhead" do not hold on macOS. We need to re-evaluate the number of semu_timer_clocksource calls and its proportion on macOS to calculate the scale factor accurately.


@jserv jserv left a comment


Refine the git commit messages.

Change from

According to the Linux kernel documentation:
https://docs.kernel.org/RCU/stallwarn.html#config-rcu-cpu-stall-timeout

to

According to Using RCU’s CPU Stall Detector [1]
...
[1] https://docs.kernel.org/RCU/stallwarn.html#config-rcu-cpu-stall-timeout

Append Close #51 at the end of git commit messages.

Since the emulator currently operates using sequential emulation, the
execution time for the boot process is relatively long, which can result
in the generation of RCU CPU stall warnings.

To address this issue, there are several potential solutions:

1. Scale the frequency to slow down emulator time during the boot
   process, thereby eliminating RCU CPU stall warnings.
2. During the boot process, avoid using 'clock_gettime' to update ticks
   and instead manage the tick increment relationship manually.
3. Implement multi-threaded emulation to accelerate the emulator's
   execution speed.

For the third point, while implementing multi-threaded emulation can
significantly accelerate the emulator's execution speed, it cannot
guarantee that this issue will not reappear as the number of cores
increases in the future. Therefore, a better approach is to use methods
1 and 2 to allow the emulator to set an expected time for completing the
boot process.

The advantages and disadvantages of the scale method are as follows:

Advantages:
- Simple implementation
- Effectively sets the expected boot process completion time
- Results have strong interpretability
- Emulator time can be easily mapped back to real time

Disadvantages:
- Slower execution speed

The advantages and disadvantages of the increment ticks method are as
follows:

Advantages:
- Faster execution speed
- Effectively sets the expected boot process completion time

Disadvantages:
- More complex implementation
- Some results are difficult to interpret
- Emulator time is difficult to map back to real time

Based on practical tests, the second method provides limited
acceleration but introduces some significant drawbacks, such as
difficulty in interpreting results and the complexity of managing the
increment relationship. Therefore, this commit opts for the scale
frequency method to address this issue.

This commit divides time into emulator time and real time. During the
boot process, the timer uses scale frequency to slow down the growth of
emulator time, eliminating RCU CPU stall warnings. After the boot
process is complete, the growth of emulator time aligns with real time.

To configure the scale frequency parameter, three pieces of information
are required:

1. The expected completion time of the boot process
2. A reference point for estimating the boot process completion time
3. The relationship between the reference point and the number of SMPs

According to Using RCU’s CPU Stall Detector [1], the grace period for
RCU CPU stalls is typically set to 21 seconds. By dividing this value
by two as the expected completion time, we can provide a sufficient
buffer to reduce the impact of errors and avoid RCU CPU stall
warnings.

Using 'gprof' for basic statistical analysis, it was found that
'semu_timer_clocksource' accounts for approximately 10% of the boot
process execution time. Since the logic within 'semu_timer_clocksource'
is relatively simple, its execution time can be assumed to be nearly
equal to 'clock_gettime'.

Furthermore, by adding a counter to 'semu_timer_clocksource', it was
observed that each time the number of SMPs increases by 1, the execution
count of 'semu_timer_clocksource' increases by approximately '2 * 10^8'.

With this information, we can estimate the boot process completion time
as 'sec_per_call * SMPs * 2 * 10^8 * (100% / 10%)' seconds, and thereby
calculate the scale frequency parameter. For instance, if the estimated
time is 200 seconds and the target time is 10 seconds, the scaling
factor would be '10 / 200'.

Close sysprog21#51

[1] https://docs.kernel.org/RCU/stallwarn.html#config-rcu-cpu-stall-timeout

jserv commented Jan 23, 2025

Running on macOS + Apple M1, I got the following:

$ make SMP=8 check
...
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=32 to nr_cpu_ids=8.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
...
[   31.988673] rcu: INFO: rcu_sched self-detected stall on CPU
[   31.993102] rcu:     2-....: (5247 ticks this GP) idle=2e0c/1/0x40000002 softirq=517/517 fqs=2128
[   32.002647]  (t=5253 jiffies g=-891 q=11 ncpus=8)
[   32.007524] CPU: 2 PID: 8 Comm: kworker/u16:0 Not tainted 6.1.99 #1
[   32.013259] Hardware name: semu (DT)
[   32.016691] Workqueue: events_unbound async_run_entry_fn
[   32.022636] epc : do_reset+0x30/0x6c
[   32.027939]  ra : do_reset+0x38/0x6c
[   32.033080] epc : c03816d0 ra : c03816d8 sp : c0893e50
[   32.038049]  gp : c04de840 tp : c085af40 t0 : c0bfa5d0
[   32.042925]  t1 : 00000008 t2 : 6d2e8da2 s0 : c0893e60
[   32.047778]  s1 : c03a48b8 a0 : 00000001 a1 : 00000000
[   32.052489]  a2 : 00000000 a3 : 005ac9b1 a4 : 0025364e
[   32.057287]  a5 : 00000000 a6 : 00000000 a7 : 0000000c
[   32.061854]  s2 : 004725a7 s3 : 0038da58 s4 : 00000000
[   32.066685]  s5 : 00000030 s6 : 00000007 s7 : c03a496c
[   32.071414]  s8 : c03a48b8 s9 : c0381488 s10: c03a4968
[   32.076499]  s11: c0381894 t3 : c0a7e400 t4 : 00000003
[   32.081382]  t5 : 46f17671 t6 : c0893e08
[   32.085321] status: 00000120 badaddr: dfcac9b1 cause: 80000005
[   32.090664] [<c03816d0>] do_reset+0x30/0x6c
[   32.096556] [<c0381870>] write_buffer+0x40/0x64
[   32.102722] [<c038207c>] unpack_to_rootfs+0x1c4/0x300
[   32.109299] [<c0382848>] do_populate_rootfs+0x70/0xd8
[   32.114948] [<c002bd8c>] async_run_entry_fn+0x30/0xd0
[   32.120732] [<c0021990>] process_one_work+0x198/0x224
[   32.126716] [<c0021f18>] worker_thread+0x238/0x294
[   32.132557] [<c00289c0>] kthread+0xd4/0xd8
[   32.137496] [<c0002514>] ret_from_exception+0x0/0x1c
[   55.544062] Freeing initrd memory: 8188K


@jserv jserv left a comment


Rebase the latest master branch and resolve the conflicts.


bito-code-review bot commented Feb 8, 2025

Code Review Agent Run #6023d9

Actionable Suggestions - 10
  • Makefile - 1
    • Consider conditional profiling flag usage · Line 4-4
  • statistic/analyze_results.sh - 1
    • Consider making base directory configurable · Line 27-29
  • utils.c - 5
    • Consider making boot_complete variable static · Line 23-23
    • Consider using integer constant for nanoseconds · Line 58-58
    • Add error handling for time() call · Line 72-72
    • Consider removing abrupt program termination · Line 126-129
    • Consider defining magic number as constant · Line 151-151
  • main.c - 2
  • auto_test.sh - 1
    • Consider adding error handling for make · Line 1-19
Additional Suggestions - 4
  • mac_analyze.sh - 3
    • Consider making SMP range configurable · Line 27-27
    • Consider consistent data row formatting · Line 72-72
    • Consider improving header format alignment · Line 24-24
  • auto_test.sh - 1
    • Consider handling mkdir failure case · Line 4-4
Review Details
  • Files reviewed - 10 · Commit Range: 1f9c6a8..147bf47
    • Makefile
    • auto_test.sh
    • mac_analyze.sh
    • main.c
    • riscv.c
    • statistic/README
    • statistic/analyze_results.sh
    • statistic/draw.sh
    • utils.c
    • utils.h
  • Files skipped - 1
    • .gitignore - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • Fb Infer (Static Code Analysis) - ✖︎ Failed

AI Code Review powered by Bito


Changelist by Bito

This pull request implements the following key changes.

Key Change / Files Impacted

Feature Improvement - Timer Scaling Implementation for Boot Process
  • Makefile - Added profiling flag and boot target time configuration
  • main.c - Added time measurement initialization and clock configuration
  • utils.c - Implemented scaled timer functionality with real-time switching
  • utils.h - Added boot completion flag and updated timer initialization interface

Testing - Performance Analysis and Testing Scripts
  • auto_test.sh - Added script for automated SMP testing
  • mac_analyze.sh - Added script for analyzing macOS test results
  • analyze_results.sh - Added script for analyzing performance results
  • draw.sh - Added script for generating performance graphs
  • README - Added CPU specifications for testing environments

Comment on lines +27 to +29
BASEDIR="profile-2"
LOGDIR="logs-${TAG}" # Directory containing the log files
OUTFILE="$BASEDIR/results_summary-${TAG}.txt"


Consider making base directory configurable

Consider making the base directory path configurable through an environment variable or command line argument instead of hardcoding BASEDIR="profile-2". This would make the script more flexible and reusable across different environments.

Code suggestion
Check the AI-generated fix before applying
 -  BASEDIR="profile-2"
 -  LOGDIR="logs-${TAG}"
 -  OUTFILE="$BASEDIR/results_summary-${TAG}.txt"
 +  BASEDIR="${BASE_DIR:-profile-2}"
 +  LOGDIR="logs-${TAG}"
 +  OUTFILE="$BASEDIR/results_summary-${TAG}.txt"


@@ -19,6 +20,14 @@
#endif
#endif

bool boot_complete = false;


Consider making boot_complete variable static

Consider making boot_complete static since it appears to be an internal state variable that tracks boot completion status. Global variables should generally be avoided unless cross-module access is required.

Code suggestion
Check the AI-generated fix before applying
Suggested change
bool boot_complete = false;
static bool boot_complete = false;


#if defined(HAVE_POSIX_TIMER)
struct timespec ts;
clock_gettime(CLOCKID, &ts);
return (uint64_t) ts.tv_sec * 1e9 + (uint64_t) ts.tv_nsec;


Consider using integer constant for nanoseconds

The floating-point literal 1e9 used in nanosecond calculations might cause precision issues with integer operations. Consider using the integer constant 1000000000ULL instead.

Code suggestion
Check the AI-generated fix before applying
 -    return (uint64_t) ts.tv_sec * 1e9 + (uint64_t) ts.tv_nsec;
 +    return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
 @@ -72 +72 @@
 -    return (uint64_t) now_sec * 1e9;
 +    return (uint64_t) now_sec * 1000000000ULL;


#else
/* Fallback to non-HRT calls time(0) in seconds => convert to ns. */
time_t now_sec = time(0);
return (uint64_t) now_sec * 1e9;


Add error handling for time() call

Consider using time() with error checking since it can return -1 on failure. Additionally, the nanosecond conversion using floating point 1e9 could lead to precision issues with large timestamps.

Code suggestion
Check the AI-generated fix before applying
Suggested change
return (uint64_t) now_sec * 1e9;
if (now_sec == -1) {
fprintf(stderr, "Error getting time\n");
return 0;
}
return (uint64_t) now_sec * 1000000000ULL;


Comment on lines +126 to +129

exit(0);
}
return (uint64_t) ((int64_t) real_ticks - offset);


Consider removing abrupt program termination

Consider removing the exit(0) call after switching to real-time timer. This abrupt program termination could prevent proper cleanup and resource deallocation. Instead, consider using a return value or status flag to indicate the switch completion.

Code suggestion
Check the AI-generated fix before applying
Suggested change
exit(0);
}
return (uint64_t) ((int64_t) real_ticks - offset);
}
return (uint64_t) ((int64_t) real_ticks - offset);



timer->begin = mult_frac(host_time_ns(), timer->freq, 1e9);
boot_ticks = timer->begin;
ticks_increment = (SEMU_BOOT_TARGET_TIME * CLOCK_FREQ) / (2.0e8 * n_harts);


Consider defining magic number as constant

Consider defining the magic number 2.0e8 as a named constant to improve code maintainability and readability. The constant could be named something like SEMU_TIMER_SCALE_FACTOR to better document its purpose.

Code suggestion
Check the AI-generated fix before applying
 -static double ticks_increment;
 +#define SEMU_TIMER_SCALE_FACTOR 2.0e8
 +static double ticks_increment;
 @@ -151,1 +151,1 @@
 -    ticks_increment = (SEMU_BOOT_TARGET_TIME * CLOCK_FREQ) / (2.0e8 * n_harts);
 +    ticks_increment = (SEMU_BOOT_TARGET_TIME * CLOCK_FREQ) / (SEMU_TIMER_SCALE_FACTOR * n_harts);


Comment on lines +1 to +19
#!/usr/bin/env bash

# Create a directory to store logs (optional)
mkdir -p logs

for N in $(seq 1 32); do
echo "============================================="
echo "Starting experiment with SMP=$N"
echo "============================================="

echo "Building and running 'make check SMP=$N'..."
make check SMP=$N 2>&1 | tee "logs/emulator_SMP_${N}.log"

echo "Done with SMP=$N. Logs saved:"
echo " - logs/emulator_SMP_${N}.log"
echo
done

echo "All experiments complete!"


Consider adding error handling for make

Consider adding error handling for the make check command. If the command fails, it might be helpful to capture the exit status and handle failures appropriately. Could consider adding set -e at the beginning of the script to fail fast on any error.

Code suggestion
Check the AI-generated fix before applying
Suggested change
#!/usr/bin/env bash
# Create a directory to store logs (optional)
mkdir -p logs
for N in $(seq 1 32); do
echo "============================================="
echo "Starting experiment with SMP=$N"
echo "============================================="
echo "Building and running 'make check SMP=$N'..."
make check SMP=$N 2>&1 | tee "logs/emulator_SMP_${N}.log"
echo "Done with SMP=$N. Logs saved:"
echo " - logs/emulator_SMP_${N}.log"
echo
done
echo "All experiments complete!"
#!/usr/bin/env bash
set -e
# Create a directory to store logs (optional)
mkdir -p logs
failed_runs=()
for N in $(seq 1 32); do
echo "============================================="
echo "Starting experiment with SMP=$N"
echo "============================================="
echo "Building and running 'make check SMP=$N'..."
if ! make check SMP=$N 2>&1 | tee "logs/emulator_SMP_${N}.log"; then
failed_runs+=("$N")
fi
echo "Done with SMP=$N. Logs saved:"
echo " - logs/emulator_SMP_${N}.log"
echo
done
if [ ${#failed_runs[@]} -eq 0 ]; then
echo "All experiments completed successfully!"
else
echo "Experiments failed for SMP values: ${failed_runs[*]}"
exit 1
fi



@jserv jserv left a comment


Rebase the latest master branch and resolve the conflicts.


Mes0903 commented Feb 13, 2025

The current issue lies in the fact that the bogoMIPS measurement may have a significant discrepancy compared to the actual value. I suspect that this discrepancy arises because the instruction cache (i-cache) is in a warm state during the bogoMIPS measurement, whereas in real scenarios, this is not always the case. However, I have yet to come up with an effective method to measure and verify this hypothesis.

Regardless, if we continue to use the scale frequency method, the complexity of the program will undoubtedly increase. We would need to refine the measurement approach or implement additional corrections during the boot process to better align the values with real-world conditions.

Fundamentally, this complexity arises because the scale frequency method requires three key pieces of information to be implemented:

  • The cost of semu_timer_clocksource
  • The proportion of semu_timer_clocksource
  • The number of times semu_timer_clocksource is called

Among these, the third piece of information is highly consistent across different environments, approximately equal to SMP count * 2.15 * 1e8. However, the other two values vary across different environments, which led to the introduction of bogoMIPS in the first place, as we needed a way to measure these two variables. Unfortunately, our current measurement approach does not accurately reflect real-world conditions.

For example, in an M2 Pro Max environment with SMP set to 32, the relationship between the values is illustrated in the figure below:

[figure: bogoMIPS estimate vs. actual value, M2 Pro Max, SMP=32]

The red dashed line represents the value measured by bogoMIPS, while the blue line represents the actual value. The significant gap between the two results in the persistent occurrence of RCU CPU stall warnings.

I have re-evaluated the increment method. Since it only requires the number of semu_timer_clocksource calls for implementation and this value remains highly consistent across different environments:

[figure: semu_timer_clocksource call count vs. SMP count]

As expected, this method effectively suppresses RCU CPU stall warnings across various environments.

Initially, we chose the scale frequency method because:

  • Using a hardcoded scale_factor made implementation relatively simple.
  • It provided an intuitive way to estimate the actual boot time (e.g., if scale_factor was set to 0.01, the actual boot time would be the printed value multiplied by 100).

However, after a series of modifications, these advantages no longer hold. Additionally, the implementation of the scale frequency method has become more complex than the increment method. Furthermore, the increment method eliminates the need to call clock_gettime, providing another advantage.

Therefore, I think we should switch to the increment method to resolve the RCU CPU stall warning issue.
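
For reference, a rough sketch of what the increment approach could look like under the assumptions above (the target boot time and the roughly 2.15 × 10^8 clocksource calls per hart come from the measurements in this thread; the CLOCK_FREQ value and helper names are placeholders, not the final patch):

    #include <stdbool.h>
    #include <stdint.h>

    #define SEMU_BOOT_TARGET_TIME 10        /* target boot time in seconds */
    #define CLOCK_FREQ            65000000  /* assumed emulated clock frequency */
    #define CALLS_PER_HART        2.15e8    /* observed clocksource calls per hart */

    static double boot_ticks;      /* emulated ticks accumulated during boot */
    static double ticks_increment; /* ticks added per clocksource call */
    static bool boot_complete;

    static void boot_ticks_init(int n_harts, uint64_t start_ticks)
    {
        /* Spread SEMU_BOOT_TARGET_TIME seconds of emulated time evenly over
         * the expected number of clocksource calls, so boot finishes near the
         * target regardless of host speed and without calling clock_gettime. */
        boot_ticks = (double) start_ticks;
        ticks_increment = ((double) SEMU_BOOT_TARGET_TIME * CLOCK_FREQ) /
                          (CALLS_PER_HART * n_harts);
    }

    static uint64_t boot_clocksource(void)
    {
        if (!boot_complete)
            boot_ticks += ticks_increment; /* pure counting during boot */
        return (uint64_t) boot_ticks; /* after boot, switch to the host clock (not shown) */
    }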

@Mes0903 Mes0903 marked this pull request as draft February 14, 2025 14:12