You have to pick one: implement it entirely in the kernel or entirely in user space, not a mix of both. In practice pure user space tends to be faster; just look at Snabb Switch: https://github.com/SnabbCo/snabbswitch/wiki
Snabb and DPDK aren't magic, though. Because they poll the NIC instead of waiting for interrupts, you have to dedicate a whole core to the vSwitch. Containers are also a different case from VMs: their packets start in the kernel TCP/IP stack, so to reach a user-space vSwitch they would first have to leave the kernel.
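
To make the polling cost concrete, here is a toy sketch of a poll-mode receive loop. It is not Snabb or DPDK code; `poll_mode_loop`, `rx_ring`, and `max_polls` are made-up names, and a real datapath would spin forever rather than stop after a fixed number of polls. The point is only that the loop never blocks, so the core stays busy even when there is nothing to receive.

```python
from collections import deque

def poll_mode_loop(rx_ring, max_polls):
    """Toy model of a busy-polling receive loop (hypothetical, for illustration).

    The loop never sleeps or blocks: every iteration checks the ring,
    whether or not a packet is waiting.
    """
    handled = 0
    empty_polls = 0
    for _ in range(max_polls):      # a real poll-mode loop would be `while True`
        if rx_ring:
            rx_ring.popleft()       # "process" one packet
            handled += 1
        else:
            empty_polls += 1        # nothing to do, but the core is still spinning
    return handled, empty_polls

# Three packets arrive, then the link goes idle.
ring = deque(["pkt0", "pkt1", "pkt2"])
handled, empty_polls = poll_mode_loop(ring, max_polls=1000)
print(handled, empty_polls)
```

In this toy run almost every iteration is an empty poll, which is why an idle polling vSwitch still shows a fully loaded core.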