

...

Sep 30 14:12:45 host14 kernel: [203230.540687] Oops: 0000 [#1] SMP 
Sep 30 14:12:45 host14 kernel: [203230.541997] CPU: 0 PID: 4211 Comm: drbd_w_bs Tainted: G           OE   4.6.0-0.bpo.1-amd64 #1 Debian 4.6.4-1~bpo8+1
Sep 30 14:12:45 host14 kernel: [203230.542186] RIP: 0010:[<ffffffff81320246>]  [<ffffffff81320246>] memcpy_erms+0x6/0x10
Sep 30 14:12:45 host14 kernel: [203230.542344] RDX: 00000000000003b0 RSI: 0000000000000003 RDI: ffff88080a616040
Sep 30 14:12:45 host14 kernel: [203230.542619] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 30 14:12:45 host14 kernel: [203230.542863]  00004000000005b4 00000000000005b4 0000000000000a70 0000000000000a00
Sep 30 14:12:45 host14 kernel: [203230.543108]  [<ffffffffc04fde49>] ? drbd_send+0xc9/0x1e0 [drbd]
Sep 30 14:12:45 host14 kernel: [203230.554230]  [<ffffffffc04fbf50>] ? drbd_destroy_connection+0xf0/0xf0 [drbd]
Sep 30 14:12:45 host14 kernel: [203230.564960]  [<ffffffff81099df0>] ? kthread_park+0x50/0x50
Sep 30 14:12:45 host14 kernel: [203230.584805] ---[ end trace 2335d6e97c28a203 ]---
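To check whether other hosts have hit the same crash, the kernel log can be scanned for Oops traces that pass through drbd_send. The following is a minimal Python sketch, not a tool we actually used. It assumes the syslog-style kern.log format shown above and treats everything from an "Oops:" line to the next "---[ end trace ... ]---" line as one trace. The function name find_drbd_send_oopses is hypothetical.

```python
import re

# Patterns written against the trace excerpt above (an assumption about the log format).
OOPS_RE = re.compile(r"kernel: \[\s*[\d.]+\] Oops: ")
COMM_RE = re.compile(r"Comm: (drbd_\S+)")
END_RE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def find_drbd_send_oopses(lines):
    """Return a list of (drbd thread name or None, first Oops line) pairs,
    one per trace that runs from an 'Oops:' line to '---[ end trace ... ]---'
    and mentions drbd_send somewhere in between."""
    hits = []
    in_trace = False
    comm = None
    first = None
    saw_send = False
    for line in lines:
        if OOPS_RE.search(line):
            # Start of a new trace: reset the per-trace state.
            in_trace, comm, first, saw_send = True, None, line.rstrip("\n"), False
            continue
        if not in_trace:
            continue
        m = COMM_RE.search(line)
        if m:
            comm = m.group(1)  # e.g. drbd_w_bs
        if "drbd_send+" in line:
            saw_send = True
        if END_RE.search(line):
            if saw_send:
                hits.append((comm, first))
            in_trace = False
    return hits
```

For example, open /var/log/kern.log (the Debian default location) with errors="replace" and pass the file object to find_drbd_send_oopses; each hit names the DRBD kernel thread that crashed.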

Cases 8-11

Same same.

Dismissed solution ideas (after case 4): DRBD9? Commercial support?

...

Unfortunately, kernel 3.16 fell victim to someone trying to "fix" VLAN encapsulation. That fix made the kernel drop packets occasionally, which was enough to render this kernel unusable.

Other ideas

We are out of ideas, more or less.

Maybe LXD?

Probable Solution

Fixing drbd_main.c for kernels 4.0+

We finally entrusted the problem to a kernel specialist, Richard Weinberger from Sigma-Star.at.

We believe that his patch 0001-drbd-Fix-kernel_sendmsg-usage.patch solves the problem for kernels 4.0 to 4.9, and we have included it in our drbd8-dkms package (see Debian jessie builds of DKMSed upstream DRBD8 Kernel Module and the Debian repository's pool directory http://deb.clazzes.org/debian/pool/jessie-drbdpkg-8/).
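This page does not reproduce the patch, so the following only illustrates the general rule its name points at. Our understanding (an assumption, not checked against the patch here) is that drbd_send() loops over partial sends but could ask kernel_sendmsg() for more bytes than the current buffer still held. The Python sketch below is a user-space analogue, not the kernel code: after a partial send, advance past the bytes already sent and request only what remains. send_all is a hypothetical helper; real Python code would simply call socket.sendall().

```python
import socket


def send_all(sock: socket.socket, data: bytes) -> int:
    """Send every byte of data, tolerating partial sends.

    Each send() call is given exactly the unsent remainder of the buffer,
    never a length larger than what is actually left in it.
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        n = sock.send(view[sent:])  # may send fewer bytes than requested
        if n == 0:
            raise ConnectionError("socket connection broken")
        sent += n  # advance past what the kernel accepted
    return sent
```

The memoryview slice avoids copying the remaining data on every iteration; the return value is the total number of bytes sent, which equals len(data) on success.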

Kernel 4.10 will include a rewrite of that code, which should solve the problem once and for all for everybody.

Conclusion

DRBD is faulty with kernels 4.0-4.9.
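The conclusion can be turned into a quick check of a host's kernel version. This Python sketch only compares version numbers: it flags 4.0 through 4.9 as affected and, following the expectation above that the 4.10 rewrite fixes the problem, treats 4.10 and later as unaffected. It cannot tell whether the patched drbd8-dkms module is loaded. in_affected_range and running_kernel_affected are hypothetical names.

```python
import platform
import re


def in_affected_range(release: str) -> bool:
    """True if a kernel release string such as '4.6.0-0.bpo.1-amd64' falls in
    the 4.0 to 4.9 range named on this page (version number only)."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    # Assumption from this page: 4.10 and later carry the rewritten code.
    return major == 4 and minor <= 9


def running_kernel_affected() -> bool:
    """Apply the check to the kernel this Linux host is currently running."""
    return in_affected_range(platform.release())
```

On host14 above (4.6.0-0.bpo.1-amd64) the check reports the kernel as affected; Debian jessie's stock 3.16 kernel is outside the range, though it has the VLAN problem described earlier.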

Linbit didn't believe it and didn't care.

We had a professional kernel developer fix it.