This is not meant as a serious pull request so much as a way to kick off a discussion, with a concrete piece of code to point at.
I have a type of block problem in which the nonlinearity is constant, so its method solveLocalProblem can be const. In that case, because nothing is being written, I believe I can safely read all my data, including in parallel.
So I've made the Jacobi method do just that: it now solves the local problems in parallel. The speedup is not great. With 2 threads instead of 1, runtime drops by something on the order of 30%, and as more threads are added the reduction eventually levels off at 40%. Better than nothing. Of course, this doesn't get us very far, because the nonlinear smoother typically isn't the bottleneck in applications. But maybe I've already made enough mistakes here, and there are enough objections to how I've gone about such changes, that there's room for a discussion anyway...
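To make the idea concrete, here is a minimal sketch of a parallel Jacobi sweep. It assumes a C++ code base (suggested by solveLocalProblem, not confirmed here), and all names in it (ConstBlockProblem, parallelJacobiStep, parallelJacobi, maxDeviation) are hypothetical, not the classes touched by this PR. A scalar linear "block" of the tridiagonal system 4 x_i - x_{i-1} - x_{i+1} = b_i stands in for the real nonlinear local problem. The point is only the access pattern: the const local solve reads the shared old iterate, and every thread writes a disjoint range of the new one.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a block problem whose nonlinearity never changes.
// solveLocalProblem is const and reads only the previous iterate, so several
// threads may call it concurrently without locking.
// Each "block" is one unknown of 4 x_i - x_{i-1} - x_{i+1} = b_i.
struct ConstBlockProblem {
  std::vector<double> rhs;

  double solveLocalProblem(std::size_t i, const std::vector<double>& xOld) const {
    double r = rhs[i];
    if (i > 0) r += xOld[i - 1];
    if (i + 1 < xOld.size()) r += xOld[i + 1];
    return r / 4.0;  // exact solve of the scalar local problem
  }
};

// One Jacobi sweep: each thread updates a contiguous range of blocks.
// All threads read the shared xOld and write disjoint entries of xNew,
// so there is no data race.
void parallelJacobiStep(const ConstBlockProblem& problem,
                        const std::vector<double>& xOld,
                        std::vector<double>& xNew, unsigned numThreads) {
  assert(numThreads > 0 && xNew.size() == xOld.size());
  const std::size_t n = xOld.size();
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    workers.emplace_back([&, t] {
      const std::size_t begin = n * t / numThreads;
      const std::size_t end = n * (t + 1) / numThreads;
      for (std::size_t i = begin; i < end; ++i)
        xNew[i] = problem.solveLocalProblem(i, xOld);
    });
  }
  for (auto& w : workers) w.join();  // sweep is complete before returning
}

// Runs a fixed number of sweeps starting from a zero initial guess.
std::vector<double> parallelJacobi(const ConstBlockProblem& problem,
                                   unsigned iterations, unsigned numThreads) {
  std::vector<double> x(problem.rhs.size(), 0.0), xNew(x.size());
  for (unsigned k = 0; k < iterations; ++k) {
    parallelJacobiStep(problem, x, xNew, numThreads);
    x.swap(xNew);  // new iterate becomes the old one for the next sweep
  }
  return x;
}

// Largest absolute deviation of x from a given value.
double maxDeviation(const std::vector<double>& x, double value) {
  double d = 0.0;
  for (double xi : x) d = std::fmax(d, std::fabs(xi - value));
  return d;
}
```

With rhs set to 3 at both ends and 2 in between, the exact solution is all ones, and the result is bitwise identical for any thread count because each entry is computed by the same operations in the same order. Spawning threads on every sweep is the simplest choice, not necessarily the fastest; a persistent thread pool or an OpenMP `parallel for` over the blocks would avoid that overhead.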
Go, say something. Be brutal.