OFJ Confluence will not be available on 6th of Nov 2024, from 13:00 CET until 14:00 CET
🎩🐇🃏🔮
This page contains many helpful suggestions for debugging issues with the Linux kernel.
wdt0: watchdog@12800800 { compatible = "renesas,r9a07g044-wdt", "renesas,rzg2l-wdt"; reg = <0 0x12800800 0 0x400>; clocks = <&cpg CPG_MOD R9A07G044_WDT0_PCLK>, <&cpg CPG_MOD R9A07G044_WDT0_CLK>; clock-names = "pclk", "oscclk"; interrupts = <GIC_SPI 49 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>; interrupt-names = "wdt", "perrout"; resets = <&cpg R9A07G044_WDT0_PRESETN>; power-domains = <&cpg>; status = "disabled"; };
$ grep -R "renesas,rzg2l-wdt" * arch/arm64/boot/dts/renesas/r9a07g044.dtsi: "renesas,rzg2l-wdt"; arch/arm64/boot/dts/renesas/r9a07g044.dtsi: "renesas,rzg2l-wdt"; arch/arm64/boot/dts/renesas/r9a07g044.dtsi: "renesas,rzg2l-wdt"; arch/arm64/boot/dts/renesas/r9a07g043.dtsi: "renesas,rzg2l-wdt"; arch/arm64/boot/dts/renesas/r9a07g043.dtsi: "renesas,rzg2l-wdt"; arch/arm64/boot/dts/renesas/r9a07g054.dtsi: "renesas,rzg2l-wdt"; arch/arm64/boot/dts/renesas/r9a07g054.dtsi: "renesas,rzg2l-wdt"; arch/arm64/boot/dts/renesas/r9a07g054.dtsi: "renesas,rzg2l-wdt"; Documentation/devicetree/bindings/watchdog/renesas,wdt.yaml: - const: renesas,rzg2l-wdt # RZ/G2L Documentation/devicetree/bindings/watchdog/renesas,wdt.yaml: - renesas,rzg2l-wdt drivers/watchdog/rzg2l_wdt.c: { .compatible = "renesas,rzg2l-wdt", }, scripts/dtc/include-prefixes/arm64/renesas/r9a07g044.dtsi: "renesas,rzg2l-wdt"; scripts/dtc/include-prefixes/arm64/renesas/r9a07g044.dtsi: "renesas,rzg2l-wdt"; scripts/dtc/include-prefixes/arm64/renesas/r9a07g044.dtsi: "renesas,rzg2l-wdt"; scripts/dtc/include-prefixes/arm64/renesas/r9a07g043.dtsi: "renesas,rzg2l-wdt"; scripts/dtc/include-prefixes/arm64/renesas/r9a07g043.dtsi: "renesas,rzg2l-wdt"; scripts/dtc/include-prefixes/arm64/renesas/r9a07g054.dtsi: "renesas,rzg2l-wdt"; scripts/dtc/include-prefixes/arm64/renesas/r9a07g054.dtsi: "renesas,rzg2l-wdt"; scripts/dtc/include-prefixes/arm64/renesas/r9a07g054.dtsi: "renesas,rzg2l-wdt";
static const struct of_device_id rzg2l_wdt_ids[] = { { .compatible = "renesas,rzg2l-wdt", }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, rzg2l_wdt_ids); static struct platform_driver rzg2l_wdt_driver = { .driver = { .name = "rzg2l_wdt", .of_match_table = rzg2l_wdt_ids, }, .probe = rzg2l_wdt_probe, }; module_platform_driver(rzg2l_wdt_driver); MODULE_DESCRIPTION("Renesas RZ/G2L WDT Watchdog Driver"); MODULE_AUTHOR("Biju Das "); MODULE_LICENSE("GPL v2");
$ cat /proc/interrupts
Sometimes you might have a case were u-boot loads the kernel image and device tree, but when you start to boot the kernel, nothing happens. The last message you see is "Starting kernel..."
For example:
Hit any key to stop autoboot: 0
19843584 bytes read in 827 ms (22.9 MiB/s)
59411 bytes read in 6 ms (9.4 MiB/s)
Moving Image from 0x48080000 to 0x48200000, end=49560000
## Flattened Device Tree blob at 48000000
Booting using the fdt blob at 0x48000000
Using Device Tree in place at 0000000048000000, end 0000000048011812
Starting kernel ...
The reason is that the kernel will start to run, but something makes it crash before the serial console is configured. So, you will never get to see the error messages. For example, if you have a mistake in your device tree for the memory configuration, the system will crash early before the serial console driver is loaded.
However, there is a way for your to print out those early messages. You can tell the kernel to continue to use the same serial connection that u-boot used since it is already configured and working. To use that serial port peripheral, the kernel only needs a very simple driver where each serial characters is sent out one at a time since there will not be any interrupts set up yet. This driver is already support for all the RZ devices.
This feature is called earlycon
To enable, add earlycon to your kernel boot arguments to print out the error messages.
=> setenv bootargs 'root=/dev/mmcblk1p2 rootwait earlycon'
=> booti 0x48080000 - 0x48000000'
Then, you will at least see some text come out that might be helpful to you.
Starting kernel ...
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd073]
[ 0.000000] Linux version 5.10.184-cip36-yocto-standard (oe-user@oe-host) (aarch64-poky-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.31.1) #1 SMP PREEMPT Tue Apr 5 23:00:00 UTC 2011
[ 0.000000] Machine model: HopeRun HiHope RZ/G2N with sub board
[ 0.000000] earlycon: scif0 at MMIO 0x00000000e6e88000 (options '115200n8')
[ 0.000000] printk: bootconsole [scif0] enabled
[ 0.000000] efi: UEFI not found.
[ 0.000000] Reserved memory: created CMA memory pool at 0x0000000058000000, size 512 MiB
[ 0.000000] OF: reserved mem: initialized node linux,cma@58000000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created CMA memory pool at 0x0000000078000000, size 256 MiB
[ 0.000000] OF: reserved mem: initialized node linux,multimedia@70000000, compatible id shared-dma-pool
For example, say that the system was saying it was playing audio, but nothing was coming out. Basically if aplay works and you don't see anything, then it's 1 of 2 issues:
1. The pins are not really configured correctly
2. You have the channel muted so the alsa subsystem doesn't even bother sending data to the driver.
The driver is sound/soc/sh/rz-rssi.c
The way alsa drivers work is, first configures the audio for the correct sample rate, then it breaks the data up into blocks and feeds them one at a time. Basically, here are the APIs:
static const struct snd_soc_dai_ops rz_ssi_dai_ops = {
.trigger = rz_ssi_dai_trigger,
.set_fmt = rz_ssi_dai_set_fmt,
.hw_params = rz_ssi_dai_hw_params,
};
The function rz_ssi_dai_trigger() is the one that the audio subsystems calls over an over again with blocks of data to send out.
By putting a printk in that function to see if it is really getting called or not.
That will at least point you in a direction to look next.
There are many different situations that may lead to a kernel panic, the one below is an example:
root@smarc-rzg2ul:~# mount -t nfs 192.168.10.118:/home/micbis/renesas/repos/vlp3/build/tmp/deploy/images/smarc-rzg2ul /mnt
root@smarc-rzg2ul:~# cp /mnt/core-image-weston-smarc-rzg2ul-20221207082338.rootfs.ext4 .
[ 40.178861] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000402
[ 40.187626] Mem abort info:
[ 40.190404] ESR = 0x96000004
[ 40.193443] EC = 0x25: DABT (current EL), IL = 32 bits
[ 40.198727] SET = 0, FnV = 0
[ 40.201763] EA = 0, S1PTW = 0
[ 40.204885] Data abort info:
[ 40.207748] ISV = 0, ISS = 0x00000004
[ 40.211562] CM = 0, WnR = 0
[ 40.214514] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000055024000
[ 40.220921] [0000000000000402] pgd=0000000000000000, p4d=0000000000000000
[ 40.227683] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 40.233227] Modules linked in: vspm_if(O) vspm(O) mmngrbuf(O) mmngr(O)
[ 40.239745] CPU: 0 PID: 237 Comm: cp Tainted: G O 5.10.131-cip13-yocto-standard #1
[ 40.248571] Hardware name: Evaluation board based on r9a07g043u11 (DT)
[ 40.255066] pstate: 00400085 (nzcv daIf +PAN -UAO -TCO BTYPE=--)
[ 40.261056] pc : ___slab_alloc.constprop.105+0x58/0x6f8
[ 40.266255] lr : ___slab_alloc.constprop.105+0x61c/0x6f8
[ 40.271538] sp : ffff80001a46b690
[ 40.274833] x29: ffff80001a46b690 x28: ffff80001a46bc58
[ 40.280122] x27: ffff000009413d00 x26: ffff00007fc10920
[ 40.285410] x25: 0000000000000000 x24: ffff800011059000
[ 40.290699] x23: 000000000007abf8 x22: 0000000000000dc0
[ 40.295987] x21: 0000000000000dc0 x20: fffffe000009edc0
[ 40.301276] x19: ffff800010da0c18 x18: 00000000000000c0
[ 40.306564] x17: 0000000000000010 x16: 0000000000000000
[ 40.311852] x15: 0000000000000001 x14: 0000000000000003
[ 40.317140] x13: 000000000004ff76 x12: dead000000000122
[ 40.322428] x11: 0000000000000000 x10: ffff00000f521400
[ 40.327716] x9 : 000000000000000e x8 : 0000000000000001
[ 40.333005] x7 : ffff80006ed13000 x6 : ffff800011190000
[ 40.338292] x5 : 0000000000440000 x4 : fffffe000009ee20
[ 40.343581] x3 : 0000000000000403 x2 : 000000000006c808
[ 40.348869] x1 : 0000000000000403 x0 : 0000000000000402
[ 40.354159] Call trace:
[ 40.356595] ___slab_alloc.constprop.105+0x58/0x6f8
[ 40.361450] __slab_alloc.isra.96.constprop.104+0x2c/0x58
[ 40.366822] kmem_cache_alloc+0x2d0/0x2f0
[ 40.370814] __nfs_create_request.part.20+0x58/0x190
[ 40.375754] nfs_create_request+0x7c/0xa0
[ 40.379744] readpage_async_filler+0x21c/0x250
[ 40.384168] read_cache_pages+0xcc/0x198
[ 40.388071] nfs_readpages+0xec/0x1f8
[ 40.391714] read_pages+0x188/0x288
[ 40.395186] page_cache_ra_unbounded+0x140/0x218
[ 40.399779] do_page_cache_ra+0x40/0x50
[ 40.403596] ondemand_readahead+0x128/0x2d0
[ 40.407758] page_cache_async_ra+0xe0/0xe8
[ 40.411833] generic_file_buffered_read+0x17c/0x9c8
[ 40.416686] generic_file_read_iter+0x100/0x1a0
[ 40.421193] nfs_file_read+0x78/0x100
[ 40.424839] new_sync_read+0x104/0x180
[ 40.428569] vfs_read+0x148/0x1d0
[ 40.431866] ksys_read+0x68/0xf0
[ 40.435077] __arm64_sys_read+0x1c/0x28
[ 40.438897] el0_svc_common.constprop.4+0x8c/0x180
[ 40.443663] do_el0_svc+0x24/0x90
[ 40.446963] el0_svc+0x14/0x20
[ 40.450000] el0_sync_handler+0x90/0xb8
[ 40.453817] el0_sync+0x160/0x180
[ 40.457120] Code: f9400681 d1000420 f240003f 9a941000 (f9400000)
[ 40.463186] ---[ end trace e6200b448c7a9881 ]---
Here's a breakdown of some key information in the first part of the log:
The important bit is that it was a data abort at the current Exception Level (so the kernel itself).
In this specific case, it was a read (WnR = 0), not a cache maintenance instruction (CM = 0), the Instruction Syndrome was not valid (ISV = 0) and ISS = 0x4 means Translation fault, level 0.
[0000000000000402] pgd=0000000000000000, p4d=0000000000000000: Specific information about the page global directory and page table entries.
Unless you are really developing the kernel, this is not usually very important.
Overall, by reading this first part of the log we can deduce that it was a problem probably related to memory, some wrong access due to something.
Continuing with the breakdown:
By far the most important part is the program counter. It tells us which function caused the crash, from the name we can deduce it was a specific type of malloc, consistent with the data abort previously signaled (more information in the call stack). The stack pointer (sp) may often give other hints. From the registers dump you can notice that many registers holds 64-bit values beginning with 0xffff, this is expected because this is the kernel address space, in contrast with user-space (0x0). What you may want to check is whether some registers hold weird 64-bit values, this is often an indication of something wrong. x0 is also something you may want to monitor closely: in the Arm64 Procedure Call Standard it is used to pass the first function parameter and for the return value.
The last part is the call stack. We can retrieve some useful information from it. Starting from the bottom:
To conclude, in this specific case this kernel crash was triggered by a innocuous network copy, experienced during the BSP porting on a custom board. We can reasonably exclude (at least initially) a kernel bug, so what we may want to look into is everything that may be related to memory. Eventually this was due to a silly mistake in the memory section of the device tree: 2GB defined whereas the custom board implemented 1GB only. Note that this may not be immediately visible, it occurred only when the kernel tried to allocate something in the non-existing gigabyte.
[ 2.039546] Call trace: [ 2.041986] rzg2l_mipi_dsi_attach+0x5c/0x244 <<< Where we put our WARN_ON [ 2.046336] drm_bridge_attach+0x64/0xc4 <<< Where we need to look next for the bad args [ 2.050254] rcar_du_encoder_init+0xac/0x140 [ 2.054514] rcar_du_modeset_init+0x3b4/0x4ac [ 2.058861] rcar_du_probe+0xb0/0x16c [ 2.062514] platform_drv_probe+0x50/0xa0 [ 2.066515] really_probe+0x260/0x3d0 [ 2.070169] driver_probe_device+0x54/0xf0 <<< Means this is all part of the initial driver loading at boot time [ 2.074256] __device_attach_driver+0xc0/0x110 [ 2.078689] bus_for_each_drv+0x78/0xd0 [ 2.082516] __device_attach+0xd8/0x174 [ 2.086344] device_initial_probe+0x10/0x20 [ 2.090517] bus_probe_device+0x90/0xa0 [ 2.094344] deferred_probe_work_func+0x6c/0xa0 [ 2.098867] process_one_work+0x1b8/0x304 [ 2.102867] worker_thread+0x23c/0x44c [ 2.106607] kthread+0x128/0x160 [ 2.109829] ret_from_fork+0x10/0x1c
=> setenv bootargs 'root=/dev/mmcblk1p2 rootwait initcall_debug'
=> booti 0x48080000 - 0x48000000'
In the Linux Kernel, many drivers and subsystems have support for a special file system called debugfs. This special file system is enabled and mounted by default with VLP:
mount /dev/mmcblk0p6 on / type ext4 (rw,relatime) [...] debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime) [...]
As usual you can use the cat command to view the content of this file system. For example clock:
cat /sys/kernel/debug/clk/clk_summary
enable prepare protect duty
clock count count count rate accuracy phase cycle
---------------------------------------------------------------------------------------------
32k 0 0 0 32768 0 0 50000
x1-clock 4 4 0 24000000 0 0 50000
ref 1 1 0 24000000 0 0 50000
pfd1 1 1 0 24000000 0 0 50000
pll1 2 2 0 600000000 0 0 50000
div2 1 1 0 12000000 0 0 50000
se3_mux 1 1 0 12000000 0 0 50000
se3 1 1 0 12000000 0 0 50000
div1_mux 1 1 0 600000000 0 0 50000
div1 1 1 0 25000000 0 0 50000
diff1_mux 1 1 0 25000000 0 0 50000
diff1 1 1 0 25000000 0 0 50000
pfd3_mux 1 1 0 24000000 0 0 50000
pfd3 1 1 0 960000 0 0 50000
pll3 1 1 0 564480000 0 0 50000
div5 2 2 0 11289600 0 0 50000
se1_mux 1 1 0 11289600 0 0 50000
[...]
GPIO:
cat /sys/kernel/debug/gpio
gpiochip0: GPIOs 120-511, parent: platform/11030000.pinctrl, 11030000.pinctrl:
gpio-120 (P0_0 )
gpio-121 (P0_1 )
gpio-122 (P0_2 )
[...]
gpio-149 (P3_5 )
gpio-150 (P3_6 )
gpio-151 (P3_7 )
gpio-152 (P4_0 )
gpio-153 (P4_1 |gpio_sd0_pwr_en ) out hi
gpio-154 (P4_2 )
[...]
Pin muxing:
root@smarc-rzg2l:~# cat /sys/kernel/debug/pinctrl/11030000.pinctrl-pinctrl-rzg2l/pinmux-pins
Pinmux settings per pin
Format: pin (name): mux_owner|gpio_owner (strict) hog?
pin 0 (P0_0): UNCLAIMED
pin 1 (P0_1): UNCLAIMED
pin 2 (P0_2): UNCLAIMED
pin 3 (P0_3): UNCLAIMED
[...]
pin 30 (P3_6): UNCLAIMED
pin 31 (P3_7): UNCLAIMED
pin 32 (P4_0): device 11c50200.usb-phy function usb0 group usb0
pin 33 (P4_1): GPIO 11030000.pinctrl:153
pin 34 (P4_2): UNCLAIMED
pin 35 (P4_3): UNCLAIMED
pin 36 (P4_4): UNCLAIMED
pin 37 (P4_5): UNCLAIMED
pin 38 (P4_6): UNCLAIMED
pin 39 (P4_7): UNCLAIMED
pin 40 (P5_0): device 11c50200.usb-phy function usb0 group usb0
pin 41 (P5_1): device 11c50200.usb-phy function usb0 group usb0
pin 42 (P5_2): UNCLAIMED
pin 43 (P5_3): UNCLAIMED
pin 44 (P5_4): UNCLAIMED
pin 45 (P5_5): UNCLAIMED
[...]
pin 68 (P8_4): UNCLAIMED
pin 69 (P8_5): UNCLAIMED
pin 70 (P8_6): UNCLAIMED
pin 71 (P8_7): UNCLAIMED
pin 72 (P9_0): device 10059000.adc function adc group adc
[...]
DRI:
cat /sys/kernel/debug/dri/0/state
plane[32]: plane-0
crtc=crtc-0
fb=46
allocated by = weston
refcount=2
format=XR24 little-endian (0x34325258)
modifier=0x0
size=1920x1080
layers:
size[0]=1920x1080
pitch[0]=7680
offset[0]=0
obj[0]:
name=0
refcount=3
start=00100384
size=8294400
imported=no
paddr=0x0000000058600000
vaddr=00000000fcba5467
crtc-pos=1920x1080+0+0
src-pos=1920.000000x1080.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
plane[34]: plane-1
crtc=(null)
fb=0
crtc-pos=0x0+0+0
src-pos=0.000000x0.000000+0.000000+0.000000
rotation=1
normalized-zpos=0
color-encoding=ITU-R BT.601 YCbCr
color-range=YCbCr limited range
crtc[37]: crtc-0
enable=1
active=1
self_refresh_active=0
planes_changed=0
mode_changed=0
active_changed=0
connectors_changed=0
color_mgmt_changed=0
plane_mask=1
connector_mask=1
encoder_mask=1
mode: "1920x1080": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x48 0x5
connector[39]: HDMI-A-1
crtc=crtc-0
self_refresh_aware=0
max_requested_bpc=0
connector[45]: Writeback-1
crtc=(null)
self_refresh_aware=0
max_requested_bpc=0
But there are much more useful information that can be found in there and of course you can use grep command to filter the output.
Table of Contents |
---|