perf: Use List's TotalEqKernel
This uses `List`'s variant of `TotalEqKernel` and implements broadcasting for
that kernel. This gives a significant speed-up (roughly 8x on the benchmark
below), reduces code bloat and reduces the chance of bugs.
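To make the broadcasting part concrete, here is a minimal pure-Python sketch of what a broadcasting list-equality kernel computes. It is not the Polars implementation, which is in Rust, and the function names are hypothetical. Two details are assumptions rather than facts from this PR: "total" equality is taken to mean that NaN compares equal to NaN, and a null row on either side produces a null result.

```python
import math
from typing import Optional, Sequence


def tot_eq_scalar(x: object, y: object) -> bool:
    # Hypothetical helper. Assumed "total" semantics: NaN equals NaN,
    # unlike plain IEEE-754 `==`.
    if isinstance(x, float) and isinstance(y, float) and math.isnan(x) and math.isnan(y):
        return True
    return x == y


def tot_eq_list(x: Sequence[object], y: Sequence[object]) -> bool:
    # Two lists are equal when they have the same length and every element matches.
    return len(x) == len(y) and all(tot_eq_scalar(u, v) for u, v in zip(x, y))


def broadcast_list_eq(
    lhs: Sequence[Optional[list]], rhs: Sequence[Optional[list]]
) -> list[Optional[bool]]:
    # Broadcasting: a length-1 side is compared against every row of the other side.
    if len(lhs) == 1:
        n = len(rhs)
    elif len(rhs) == 1 or len(lhs) == len(rhs):
        n = len(lhs)
    else:
        raise ValueError("lengths must match or one side must have length 1")

    out: list[Optional[bool]] = []
    for i in range(n):
        # Read the length-1 side at index 0 instead of copying it to length n.
        x = lhs[0] if len(lhs) == 1 else lhs[i]
        y = rhs[0] if len(rhs) == 1 else rhs[i]
        # Assumed null semantics: a null row on either side yields null.
        out.append(None if x is None or y is None else tot_eq_list(x, y))
    return out


print(broadcast_list_eq([[0, 1], None, [0, 1, 2]], [[0, 1]]))  # [True, None, False]
```

In this sketch the length-1 operand is read at index 0 on every row rather than being expanded to full length first, so comparing a whole column against a single list value costs no extra allocation.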
To measure how much faster this is, I ran a microbenchmark:
```python
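import polars as pl
import numpy as np
from timeit import timeit

# Index 0 is a null; index i (1..11) holds a list of length i - 1.
lists = [None] + [list(range(i)) for i in range(11)]
# Sample indices 0..10: a mix of nulls and lists of length 0 to 9.
a = [lists[length] for length in np.random.randint(0, 11, 1_000_000)]
b = [lists[length] for length in np.random.randint(0, 11, 1_000_000)]
a = pl.Series('a', a)
b = pl.Series('b', b)

# Time 100 element-wise comparisons of the two list columns.
t = timeit(lambda: a == b, number=100)
# `.2g` prints two significant digits in fixed notation, e.g. "16" or "1.9".
print(f"Time: {t:.2g}s")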
```
- Before: `Time: 16s`
- After: `Time: 1.9s`
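For a rough sense of scale, the two reported totals can be turned into a speed-up factor and a per-row cost. This is plain arithmetic on the numbers above, which are themselves rounded to two significant digits, so the results are approximate.

```python
# Reported totals for 100 comparisons of two 1,000,000-row series.
before_s = 16.0
after_s = 1.9
runs = 100
rows = 1_000_000

speedup = before_s / after_s
# Convert seconds per run into nanoseconds per compared row.
ns_per_row_before = before_s / runs / rows * 1e9
ns_per_row_after = after_s / runs / rows * 1e9

print(f"speed-up: {speedup:.1f}x")  # speed-up: 8.4x
print(f"before: {ns_per_row_before:.0f} ns/row, after: {ns_per_row_after:.0f} ns/row")
# before: 160 ns/row, after: 19 ns/row
```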
There is probably more that can be done to make this a lot faster, but I think
this is good enough for now.