The rbtree-based wait queue consumes a lot of CPU time. Replace it with the ul2tree. Extensive cleanups and code reorganization also made it possible to shrink the task struct and simplify the code somewhat.