Debugging the Xserver
This mini-HOWTO attempts to explain how to debug the X server, particularly in the case where the server crashes. It assumes a basic familiarity with Unix and a willingness to risk deadlocking the machine.
Just as a warning, if you try this with a closed-source driver, the output is not likely to be very useful.
You'll really want to have a second machine around. It's very difficult to debug the X server from within itself; when it stops and returns control to the debugger, you won't be able to send events to the xterm running your debugger. ssh is your friend here. If you don't have a second machine, see the Debugging with one machine section, and good luck.
Your gdb needs to be reasonably recent; 5.3 or better is probably good.
And of course, you'll need a reproducible way of crashing the X server, but if you've read this far you've probably got that already. This is your testcase.
If you're debugging with a modern distribution, it probably already has 'debuginfo' packages available. These packages (usually quite large) include the debugging symbols for the software you have installed, which makes tools like gdb much more useful. Refer to your distro's documentation for details on how to install these. You'll probably want at least the debuginfo for the X server itself, and for the video driver you're using. For example, on a Fedora machine, you'd say:
debuginfo-install xorg-x11-server-Xorg xorg-x11-drv-ati
On Debian or Ubuntu you'd say:
apt-get install xserver-xorg-core-dbg xserver-xorg-video-ati-dbg
Otherwise, if you're building X yourself, you'll need to have built X with debugging information. To pass compiler flags in at build time, say:
CFLAGS='-O0 -g3' ./configure --prefix=...
All the normal configure options should work as expected. You may want to put your debuggable server in a different prefix. Be careful of ModulePath and other such path statements in your xorg.conf.
Remember that if you're trying to debug into a driver, you'll want to repeat this step for the driver as well as for the server core.
Start the server normally. Go over to your second machine and ssh into the first one. Become root (su root), and type one of the following:
gdb /opt/xorg-debug/Xorg $(pidof Xorg)
gdb /usr/bin/Xorg $(pidof X)
depending on your setup.
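If you are not sure which of the two names your server is running under, a small wrapper can try both. The following is only a minimal sketch: the function name find_server_pid is made up for illustration, and it assumes that pidof is installed (as it is on most Linux systems) and that the server binary lives at the path shown in the usage comment.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of the first running process whose
# name matches one of the arguments; return non-zero if none is running.
find_server_pid() {
    for name in "$@"; do
        # pidof -s prints a single PID and exits non-zero when nothing matches
        if pid=$(pidof -s "$name" 2>/dev/null) && [ -n "$pid" ]; then
            echo "$pid"
            return 0
        fi
    done
    return 1
}

# Usage (as root, over ssh); the binary path is an assumption, adjust it
# to match your install:
#   pid=$(find_server_pid Xorg X) && gdb /usr/bin/Xorg "$pid"
```

Because the gdb command only runs when a PID was found, a typo in the process name fails loudly instead of starting gdb without a process to attach to.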
Note that even when you debug over ssh, X might cripple the console. You can avoid this by passing the following option to the X server:
-keeptty don't detach controlling tty (for debugging only)
gdb will attach to the running server and spin for a while reading in symbols from all the drivers. Eventually you'll reach a (gdb) prompt. Notice that the X server has halted; type cont at the gdb prompt to continue executing.
Go back to the machine running X, and run your testcase. This time, instead of the server crashing, it should freeze, and gdb should tell you the server got a signal (usually SIGSEGV), as well as the function and line of code where the problem happened. An example looks like:
Program received signal SIGSEGV, Segmentation fault.
0x403245a3 in fbBlt (srcLine=0xc1a1c180, srcStride=59742, srcX=0, dstLine=0x4240cb6c, dstStride=1152, dstX=0, width=32960, height=764, alu=-1046602744, pm=1111538028, bpp=32, reverse=0, upsidedown=0) at fbblt.c:174
174             *dst++ = FbDoDestInvarientMergeRop(*src++);
This by itself is pretty helpful, but there's more info out there. At the gdb prompt, type bt f for a full stack backtrace. (Warning, this will be long!) This dumps out the full call chain of functions from main() on down, as well as the arguments they were called with and the value of all local variables. Keep hitting enter until you get back to the gdb prompt.
Get your mouse out, copy all the output from "Program received..." on down, and paste it into a file on your second machine. Type detach at the gdb prompt to detach gdb from the server and let it finish crashing. Go to http://bugs.freedesktop.org/ and file a new bug describing the testcase. Attach the gdb output to the bug (please don't just paste it into the comments section).
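If copying the output with the mouse is awkward, gdb can also run the same steps from a command file and write everything to a log. The following is only a sketch under stated assumptions: the function name write_gdb_commands, the file names under /tmp and the server path are illustrative, and it assumes a gdb recent enough to understand set pagination off.

```shell
#!/bin/sh
# Hypothetical helper: write a gdb command file that waits for the crash,
# dumps a full backtrace, and then lets the server finish crashing.
write_gdb_commands() {
    cat > "$1" << 'EOF'
set pagination off
handle SIGPIPE nostop
handle SIGUSR1 nostop
continue
bt full
detach
quit
EOF
}

# Usage on the second machine, as root (paths and PID lookup are assumptions):
#   write_gdb_commands /tmp/xcrash.gdb
#   gdb --quiet --command=/tmp/xcrash.gdb /usr/bin/Xorg "$(pidof Xorg)" > /tmp/xcrash.log 2>&1
```

gdb attaches to the given process before it reads the command file, so continue resumes the server and only returns once a signal such as SIGSEGV stops it. The resulting log is the file you would attach to the bug report.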
All the gdb commands you'll ever need
For any gdb command, you can say "help <command>" at the (gdb) prompt to get a (hopefully informative) explanation.
bt - Prints a stack backtrace. This shows all the functions that you are currently inside, from main() on down to the point of the crash, along with their arguments. Appending the word full (or just the letter f) also prints out the value of all the local variables within each function.
list - Prints the source around the current frame. When invoked multiple times, it will print the next lines, making it useful for quick code inspection. "list -" prints the source code backwards (starting from the current frame). This is useful to inspect the lines of code that led to an error.
break / clear - break sets a breakpoint. When execution reaches a breakpoint, the debugger will stop the program and return you to the gdb prompt. You can set breakpoints on functions, lines of code, or individual instructions; see the help text for details. clear, naturally, clears a breakpoint.
step / next - step and next allow you to manually advance the program's execution. next runs the program until you reach a different source line; step does the same thing, but also descends into called functions.
continue - continue the program normally until the next breakpoint is hit.
print - Prints the expression. You can specify variable names, registers, and absolute addresses, as well as more complex expressions (help print for details). Variable names have to be resolvable, which means they either have to be local variables within the current stack frame or global variables. Register names start with a $ sign, like print $eax. Addresses are specified as numbers, like print 0xdeadbeef.
Expressions can be fairly complex. For example, if you have a pointer to a structure named foo, print foo will print the memory address that foo points to, print *foo will print the structure being pointed to, and print foo->bar will print the bar member of the foo structure.
handle - Tells the debugger how to handle various signals. The defaults are mostly sensible, but there are two you may wish to change. SIGPIPE is generated when a client dies, which you may not always care about, and SIGUSR1 is generated on VT switch. By default, the debugger will halt the running process when it receives these signals; to change this, say handle SIGPIPE nostop and handle SIGUSR1 nostop. (Note: Don't use handle SIGUSR1 ignore or you can confuse things quite badly; for example, having multiple X servers simultaneously active on the same VT can be very confusing.)
set environment - Sets environment variables. The syntax is set environment name value; don't use an = sign like in bash, it won't do what you expect.
run - Runs the program. If you only specify a program name on the command line (and not a process ID or a core file), gdb will load the program but not start running it until you say so. Arguments to run are passed verbatim to the child process, e.g. run :0 -verbose -ac.
kill - Kills the program being debugged. Not always useful, you'd often rather say...
detach - which detaches the debugger from the running program, which can then shut down gracefully.
disassemble - Prints the assembly instructions being executed, starting at the current source line. You can also specify absolute memory references or function names to start disassembly somewhere other than the default. Only useful if you can read the assembly language of your CPU.
finish - Continue until exit of current function. Will also print the return value of the function (if applicable).
Note that most commands can be used in an abbreviated version (e.g. n instead of next). Just try it yourself!
Things that can go wrong
The biggest thing to watch out for is attempting to print memory contents when that memory is located on the video card. It won't work, on x86 anyway, for some not-very-interesting reasons. You'll know when you did it because the machine will deadlock and you'll have to reboot. See the DebuggingHints file (below) for workarounds.
Some issues with running X under gdb may be resolved by passing the -dumbSched option to the X server. This worked for me to resolve crashes of gdb 6.3 and strange loops in gdb 5.3. You'll know if you need this option because gdb will get very confused by SIGALRM. Even if gdb isn't misbehaving, the -dumbSched option can be very helpful to avoid the SIGALRM periodically interrupting your debugging session.
Likewise, some gdb versions crash when starting the X server when attempting to run xkbcomp. This is, amazingly enough, a bug in the kernel's DRM code for suppressing some signals; it should be fixed in 2.6.28 if not earlier. You can disable XKB by passing the -kb option on the server's command line; obviously if you're trying to debug XKB this may cause you some problems and you're probably better off attaching gdb to a running X instead. Alternatively, disable DRI, but again, if DRI is the thing you're trying to debug, that won't help.
When you compile with optimization, the values printed by bt can sometimes be confusing. Some variables can get optimized out of existence, some variables occupy the same position on the stack during different parts of a function's execution, and some functions might not show up on the stack at all. Also, single-stepping can be confusing because the function's statements might get executed in a different order than listed in the source if the compiler determines that's safe to do. gcc 4.0 seems to be much more aggressive at confusing the debugger than earlier versions, although it does emit more debugging information such that you'll at least know when variables have been optimized away. As always, lowering the optimization level improves debuggability.
There is a DebuggingHints file available online. It contains a lot of helpful (if slightly dated) information on how to debug the server, including how to dump PCI memory without deadlocking the machine. In particular, you'll want to read this if you're trying to debug a server older than 6.9.
Debugging with one machine
The script below lets you run the server under gdb and capture the gdb output in a file. You cannot control gdb interactively, but the X server will not be left stopped inside a debugger that you cannot reach from a terminal. Store the following script in some file (for example, /tmp/Xdbg):
#!/bin/sh
#GDB=...
#XSERVER=...
ARGS=$*
PID=$$

# Fall back to defaults when GDB or XSERVER are not set
test -z "$GDB" && GDB=gdb
test -z "$XSERVER" && XSERVER=/usr/bin/Xorg

# Write the gdb command file: run the server, dump a backtrace on a crash
cat > /tmp/.dbgfile.$PID << HERE
file $XSERVER
set confirm off
set args $ARGS
handle SIGUSR1 nostop
handle SIGUSR2 nostop
handle SIGPIPE nostop
run
bt full
cont
quit
HERE

# Send both stdout and stderr of gdb to the log file
$GDB --quiet --command=/tmp/.dbgfile.$PID > /tmp/gdb_log.$PID 2>&1

rm -f /tmp/.dbgfile.$PID
echo "Log written to: /tmp/gdb_log.$PID"
Then (as root) do:
chmod u+x /tmp/Xdbg
mv /usr/X11R6/bin/X /usr/X11R6/bin/X.org
ln -sf /tmp/Xdbg /usr/X11R6/bin/X
If you are using a module-aware debugger you should remove the comment sign # from the line starting with #GDB and add the full path to your debugging gdb. You can now start your X server as normal. Note that if you use startx you should do so as root. When the X server crashes, the output of the server should have been written to /tmp/gdb_log.<number> together with a backtrace. If your X server resides somewhere else, you can use the XSERVER environment variable to specify the path. To restore the previous setup do:
mv /usr/X11R6/bin/X.org /usr/X11R6/bin/X
If you only have one machine available, you might be able to pry some useful information from the server when it crashes. The downside is that it will probably halt your machine entirely rather than just crashing X.
Edit your xorg.conf file and find the ServerFlags section. Uncomment the
Option "NoTrapSignals" "true"
line (or add it if it doesn't exist). This will prevent the server from catching fatal signals, which should cause core dumps instead. (You need to make sure you have core dumps enabled for the server by removing the appropriate ulimit; see the ulimit command in the bash man page for details.)
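The ulimit step can be checked from the shell that will start the server. This is a minimal sketch rather than a required procedure; the function name core_dumps_enabled is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the current shell's soft core-file size
# limit allows core dumps (that is, it is anything other than 0).
core_dumps_enabled() {
    [ "$(ulimit -c)" != "0" ]
}

# Try to lift the limit. Raising it can fail when the hard limit is lower,
# so the error is silenced and the result is checked afterwards.
ulimit -c unlimited 2>/dev/null || true

if core_dumps_enabled; then
    echo "core dumps enabled (limit: $(ulimit -c))"
else
    echo "core dumps still disabled; check the hard limit with: ulimit -H -c"
fi
```

The limit is inherited by child processes, so it only helps if it is set in the same shell (or login session) from which the X server is then started.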
The problem here is the same as mentioned earlier; the core dump will attempt to include mmap()'d sections of card memory, which will make the machine freeze. Usually the core dump is informative enough to at least give a partial backtrace.
Once you've crashed the machine, find the core file and load it in gdb:
gdb `which Xorg` /path/to/core/file
and try to bt f like normal. Fortunately at this point you can't make the machine crash again.
Debugging with gdbserver
Run X on the target using gdbserver, listening on (for example) port 2500:
gdbserver :2500 /usr/bin/X
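Attach to the running process from gdb, running it from an environment in which you have Xorg installed. In my case, this is a chroot environment; if I try to debug the program from the host environment, without chrooting into my Xorg build environment, gdb cannot find the symbols correctly. The port given to target remote must be the one gdbserver is listening on; note that the session below was recorded against port 2401 rather than the 2500 used above.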
root:/usr/src/xc-build# gdb
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(gdb) file programs/Xserver/Xorg
Reading symbols from /usr/src/xc-build/programs/Xserver/Xorg...done.
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) target remote 192.168.0.134:2401
Remote debugging using 192.168.0.134:2401
0xb7fed7b0 in ?? ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb7a92524 in GXDisplayVideo (pScrni=0x828bd38, id=0xb7aa9490, offset=0x17, width=0x82a, height=0xe730, pitch=0xb7aa946c, x1=0x8289920, y1=0x0, x2=0x0, y2=0x0, dstBox=0x82ae680, src_w=0x82a, src_h=0xe794, drw_w=0x828, drw_h=0x8638) at amd_gx_video.c:849
849         GFX(set_video_enable(1));
(gdb)
Note in this example that I specify the program to be debugged with a gdb command to read the Xorg symbols:
(gdb) file programs/Xserver/Xorg
This is simply an alternative to running gdb like this: