Run on a single core of a Ryzen 3900X with a 12GB memory limit. The benchmarks in shouldRevert are solved if difficulty reversion is applied, whereas the opposite applies to those in Shouldn't revert: they are solved if difficulty reversion is not applied.

With the current classifier:

52/100 of the Shouldn't revert benchmarks are reverted. Ideally it'd be zero of those.
1243/1250 of the shouldRevert benchmarks are reverted. Ideally it'd be 1250.

So a perfect classifier would solve 59 more problems: the 52 that are wrongly reverted plus the 7 (1250 - 1243) that are wrongly left unreverted.
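
For reference, the arithmetic behind that figure can be sketched as below. The variable names are made up for illustration, and it assumes (as stated at the top) that every misclassified benchmark would be solved once it is classified correctly.

```python
# Counts copied from the classifier results above.
wrongly_reverted = 52        # reverted although ideally none would be (out of 100)
correctly_reverted = 1243    # reverted, as intended
should_be_reverted = 1250    # ideal number of reversions in that set

# Benchmarks that should have been reverted but were not.
missed_reversions = should_be_reverted - correctly_reverted

# Assumption: each misclassified benchmark becomes solved under a perfect classifier.
extra_solved = wrongly_reverted + missed_reversions

print(missed_reversions, extra_solved)  # prints: 7 59
```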

