-
First off, use strace -T -ttt to see timing information on the syscalls. Then
you can get profiling information on what kitty is doing, as described
here:
https://sw.kovidgoyal.net/kitty/performance/#instrumenting-kitty
Compare the profiles between Kakoune and your loop; that should give you
a good idea of the root cause.
-
Also, I am assuming that you double-checked that the termios settings in the two are exactly the same? Without that, you can see significant slowdowns in the kernel.
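One way to check the termios question is to dump the settings from inside each program right after it enters raw mode, then diff the two dumps. The following is only a sketch: `format_termios` and `dump_termios` are hypothetical helper names, not part of kitty or Kakoune, and the set of fields printed is an assumption about which ones matter here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

// Hypothetical helper: format the termios fields that raw-mode setup usually
// touches, one "name=value" pair per line, so two dumps can be diffed.
// Returns the number of characters written (excluding the NUL).
int format_termios(const struct termios *t, char *buf, size_t size) {
  assert(t != NULL && buf != NULL && size > 0);
  return snprintf(buf, size,
                  "iflag=0x%lx\noflag=0x%lx\ncflag=0x%lx\nlflag=0x%lx\n"
                  "VMIN=%d\nVTIME=%d\n",
                  (unsigned long)t->c_iflag, (unsigned long)t->c_oflag,
                  (unsigned long)t->c_cflag, (unsigned long)t->c_lflag,
                  (int)t->c_cc[VMIN], (int)t->c_cc[VTIME]);
}

// Hypothetical helper: print the current settings of the terminal on fd to
// stderr. Returns -1 if fd is not a terminal.
int dump_termios(int fd) {
  struct termios t;
  char buf[256];
  if (tcgetattr(fd, &t) != 0)
    return -1;
  format_termios(&t, buf, sizeof buf);
  fputs(buf, stderr);
  return 0;
}
```

Calling `dump_termios(STDIN_FILENO)` after the raw-mode setup, with stderr redirected to a file, gives a dump to compare. For a program you cannot easily patch, running `stty -a -F /dev/pts/N` from another shell reports the same state on Linux with GNU coreutils.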
-
Ok, so I did some extra investigation, and the problem is very weird. Basically, every time Kakoune prints to the screen, it also sets the title by [...] I also tried cutting up this [...] Very strange indeed.
-
Ok, another discovery: apparently, in order to achieve the performance improvements, the title has to change. You can't just set it to be the same thing as before. Patched version of my C code, which now sets the title:

```c
#define _GNU_SOURCE // asprintf is a GNU extension
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <sys/types.h>
#include <termios.h>
#include <unistd.h>

struct termios orig_attr;

void set_raw_mode() {
  struct termios attr = orig_attr;
  attr.c_iflag &=
      ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
  attr.c_oflag &= ~OPOST;
  attr.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
  attr.c_lflag |= NOFLSH;
  attr.c_cflag &= ~(CSIZE | PARENB);
  attr.c_cflag |= CS8;
  attr.c_cc[VMIN] = attr.c_cc[VTIME] = 0;
  tcsetattr(STDIN_FILENO, TCSANOW, &attr);
}

void setup_terminal() {
  static const char seq[] =
      "\033[?1049h\033[?1004h\033[>4;1m\033[>5u\033[22t\033[?"
      "25l\033=\033[?2004h";
  // sizeof - 1 so the terminating NUL is not sent to the terminal.
  write(STDOUT_FILENO, seq, sizeof(seq) - 1);
}

void restore_terminal() {
  static const char seq[] =
      "\033>\033[?25h\033[23t\033[<u\033[>4;0m\033[?1004l\033["
      "?1049l\033[?2004l\033[m";
  write(STDOUT_FILENO, seq, sizeof(seq) - 1);
}

int main() {
  if (!isatty(STDOUT_FILENO))
    return -1;

  struct winsize w;
  ioctl(STDOUT_FILENO, TIOCGWINSZ, &w);
  tcgetattr(STDIN_FILENO, &orig_attr);

  char hashes[] =
      "########################################################################"
      "########################################################################"
      "########################################################################"
      "##############################";

  // Fill the left third of every row, capped at the length of hashes.
  int cols = w.ws_col / 3;
  if (cols > (int)sizeof(hashes) - 1)
    cols = (int)sizeof(hashes) - 1;

  // One frame: begin synchronized update, draw every row, end the update.
  char *cooked_string = malloc(32 + (size_t)w.ws_row * (cols + 16));
  if (cooked_string == NULL)
    return 1;
  int len = 0;
  // sprintf returns the number of characters written, excluding the NUL.
  len += sprintf(cooked_string + len, "\x1b[?2026h");
  for (int j = 0; j < w.ws_row; j++) {
    len += sprintf(cooked_string + len, "\x1b[%dH", j + 1);
    memcpy(cooked_string + len, hashes, cols);
    len += cols;
  }
  len += sprintf(cooked_string + len, "\x1b[?2026l");

  setup_terminal();
  set_raw_mode();
  // Ask the terminal whether it supports mode 2026 (synchronized output).
  write(STDOUT_FILENO, "\x1b[?2026$p", 9);

  int i = 0;
  while (1) {
    fd_set rfds, wfds, efds;
    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_ZERO(&efds);
    FD_SET(STDIN_FILENO, &rfds);
    struct timeval timeout = {.tv_sec = 0, .tv_usec = 500000};
    int res = select(1, &rfds, &wfds, &efds, &timeout);
    if (res == 1) {
      char buf[50];
      read(STDIN_FILENO, buf, sizeof(buf));
      struct timeval no_timeout = {.tv_sec = 0, .tv_usec = 0};
      // Redraw only once there is no more input pending.
      if (select(1, &rfds, &wfds, &efds, &no_timeout) == 0) {
        // Title is being set here.
        char *title;
        if (asprintf(&title, "\x1b]2;My title here lmao %d\007", i) >= 0) {
          write(STDOUT_FILENO, title, strlen(title));
          free(title);
        }
        // By commenting this, the title stays the same, and performance stays bad.
        i++;
        int sent = 0;
        while (sent < len) {
          int to_send = 4096 < len - sent ? 4096 : len - sent;
          ssize_t n = write(STDOUT_FILENO, cooked_string + sent, to_send);
          if (n < 0)
            break;
          sent += (int)n; // write may send fewer bytes than requested
        }
      }
    }
  }
  // Not reached as written: the loop has no exit and ISIG is disabled, so
  // the process has to be killed from outside.
  tcsetattr(STDIN_FILENO, TCSANOW, &orig_attr);
  restore_terminal();
}
```

This version of my code performs as well as Kakoune. While I'm happy that I found a way to make my editor performant, it's veeeeeeery strange that setting the title on every update was the way to do it.
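The title-changing workaround described above can be isolated into a small helper that uses a fixed buffer instead of `asprintf`. This is only a sketch: `format_title` and `send_title` are hypothetical names, and nothing in this thread establishes why a changing title affects kitty's performance, so treat it as a way to reproduce the observation rather than a recommended fix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: build an OSC 2 "set window title" sequence whose text
// changes with the frame counter. Returns the sequence length, or -1 if buf
// is too small to hold it.
int format_title(char *buf, size_t size, const char *name, unsigned frame) {
  assert(buf != NULL && name != NULL);
  int n = snprintf(buf, size, "\x1b]2;%s %u\a", name, frame);
  return (n < 0 || (size_t)n >= size) ? -1 : n;
}

// Hypothetical helper: send the title for this frame to fd.
// Returns 0 on success, -1 on a formatting error or a failed/short write.
int send_title(int fd, const char *name, unsigned frame) {
  char buf[256];
  int n = format_title(buf, sizeof buf, name, frame);
  if (n < 0)
    return -1;
  return write(fd, buf, (size_t)n) == n ? 0 : -1;
}
```

In the redraw loop, `send_title(STDOUT_FILENO, "editor", i++)` before writing the frame would reproduce the changing-title behavior, while passing a constant instead of `i++` reproduces the unchanged-title case for comparison.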
-
It seems like what I thought was a performance problem in my text editor... might actually be a bug in kitty? I'm testing these patches in gnome-terminal, and there is no performance difference between the two versions there. Should I open an issue for this?
-
This might be relevant: I am using Hyprland, so the problem could be in the Wayland version of kitty? I do remember this being an issue back when I was on GNOME, but I was using kitty through XWayland back then. Also, my computer is a somewhat old, medium-low-end laptop, so the performance wasn't gonna be that great to begin with, but the title setting difference is real.
-
On Mon, Oct 06, 2025 at 03:23:05PM -0700, ahoyiski wrote:
> Maybe it could be relevant, but I am using hyprland, so the problem could be in the wayland version of kitty?

Set the following:
input_delay 0
repaint_delay 2
sync_to_monitor no
wayland_enable_ime no
in kitty.conf and see if it makes a difference.
-
TLDR: My program calls the same syscalls (particularly `write`) with less content than another program, with the same frequency, and yet the other program is much faster and smoother than mine, and I don't know why that could be.
Hello, I've been building a text editor, and in that process, one thing that was always in the back of my mind was the performance of the terminal. For whatever reason, my text editor always seemed slower than more established text editors (like neovim or, as the case study for the rest of this discussion, Kakoune). The performance of the editor itself was fine, but when it came to printing, the terminal seemed to be twice as slow (judging by the CPU usage of kitty in btop), and much more stuttery.
Recently I've been on a journey of trying to rectify that. Initially, I assumed that my editor was slow because it wasn't scrolling text using the ANSI scrolling escape sequences. That didn't turn out to be the culprit. Then I started analyzing the code of Kakoune, and it seemed remarkably simple: the editor simply prints the whole screen, with synchronization escape sequences before and after. This was surprising to me, since that was pretty much exactly what I was already doing.
So then I started to analyze the syscalls of both programs with `strace`. And again, there were very few surprises. The program was using the `write` syscall to print everything on screen on every frame (roughly 100 FPS). But there were also some additional syscalls, specifically `pselect` and `fcntl`. I later discovered that patching `fcntl` out of the loop had no effect, so I ended up thinking that the reason for the speed was the `pselect` syscall.
However, throughout the whole list of syscalls called by Kakoune, the only ones which involved `stdout` were `write` and `fcntl`. And as I mentioned, removing the calls to `fcntl` had no effect, so Kakoune was really just calling `write`, the same thing that I was doing, and yet it was twice as fast.
Fast forward a bit, I ended up writing a minimal version of Kakoune's printing loop in C, which made use of the same syscalls in the same way, but with much less text per `write`. Once again, Kakoune beat that experiment, which performed about as poorly as my own text editor (according to the CPU usage of kitty in btop).
The C code in question
I have measured the time taken in between each frame of Kakoune, and it is 10 ms, which is my keyboard repeat period. Interestingly, if I patch Kakoune's code and "bypass" the `pselect` calls by printing the same frame every 10 ms in a 10 second for loop, the performance drops to the same level as my text editor, even though the frequency of `write` syscalls stays unchanged.
I'm not going to lie, from what I've seen, Kakoune seems to essentially be doing magic to get better performance out of `write`. I tried really hard to look for how they did it, but even the internet was of no help. I know this isn't strictly speaking "related" to kitty, but I figured this would be a good place to ask, considering the contributors of a terminal emulator are probably knowledgeable about things like syscalls and how to write performant programs for them.
I also noticed that this is an issue on other modern TUI text editors, like helix, which seems to indicate a commonality for this problem.
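The 10 ms frame period mentioned above can also be measured from inside a program rather than read off `strace` timestamps. The following is a minimal sketch using the monotonic clock; `elapsed_ms` and `log_frame_gap` are hypothetical names, and logging to stderr assumes stderr is redirected to a file so the log does not land on the screen being drawn.

```c
#define _POSIX_C_SOURCE 200809L // for clock_gettime under strict -std modes
#include <assert.h>
#include <stdio.h>
#include <time.h>

// Milliseconds elapsed from *start to *end.
double elapsed_ms(const struct timespec *start, const struct timespec *end) {
  assert(start != NULL && end != NULL);
  return (double)(end->tv_sec - start->tv_sec) * 1e3 +
         (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

// Hypothetical helper: call once per frame, right before the frame is
// written. Logs the gap since the previous call to stderr.
void log_frame_gap(void) {
  static struct timespec prev;
  static int have_prev = 0;
  struct timespec now;
  clock_gettime(CLOCK_MONOTONIC, &now);
  if (have_prev)
    fprintf(stderr, "frame gap: %.3f ms\n", elapsed_ms(&prev, &now));
  prev = now;
  have_prev = 1;
}
```

Running the program as `./editor 2>gaps.log` and holding a key down should then show gaps close to the keyboard repeat period, which makes it easier to confirm that two programs really do redraw at the same frequency.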
Below, I've attached the `strace` output for Kakoune and my dummy C code. You can also find my patched version of Kakoune on (this link). I am also running all of this on Arch Linux, if that matters for this discussion.
strace of patched Kakoune
strace of the C test code
There is also some extra information on the `README.asciidoc` in the repository of my kakoune patch.