Two fixes for performance monitoring on X86:

    - Add recursion protection to another call chain invoked from
      x86_pmu_stop() which can recurse back into x86_pmu_stop(). The first
      attempt to fix this recursion missed this code path.

    - Use the already filtered status variable to check for PEBS counter
      overflow bits instead of the unfiltered full status read from
      IA32_PERF_GLOBAL_STATUS, which can have unrelated bits set that
      would be evaluated incorrectly.
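The recursion protection in the first fix follows a common guard-flag pattern. The sketch below is a minimal user-space illustration of that pattern, not the kernel code: `pmu_stop()`, `callback()`, `in_pmu_stop` and `stop_calls` are hypothetical names. It also assumes a single static flag is enough, whereas real code would need per-CPU (or per-event) state.

```c
#include <stdbool.h>

/* Hypothetical guard: true while pmu_stop() is executing. A single static
 * flag is an assumption of this sketch; real code would need per-CPU state. */
static bool in_pmu_stop;
static int stop_calls; /* how many times the guarded body actually ran */

static void pmu_stop(void);

/* Hypothetical call chain invoked from pmu_stop() that calls back into it. */
static void callback(void)
{
	pmu_stop();
}

static void pmu_stop(void)
{
	if (in_pmu_stop)	/* re-entered via callback(): bail out */
		return;
	in_pmu_stop = true;
	stop_calls++;
	callback();		/* would recurse without the guard above */
	in_pmu_stop = false;
}
```

Calling `pmu_stop()` once runs the guarded body exactly once. The nested call made from `callback()` sees the flag set and returns immediately. Missing one such call chain is the kind of gap the first fix closes.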
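The difference between the filtered and unfiltered status in the second fix can be shown with plain bit masks. This sketch assumes a made-up layout in which counters 0-3 may use PEBS. `filter_status()`, `pebs_overflow()`, `PEBS_COUNTER_MASK` and the "owned" mask are hypothetical illustrations, not the kernel's helpers or the real filtering rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this sketch: counters 0-3 are PEBS capable. */
#define PEBS_COUNTER_MASK UINT64_C(0xf)

/* Hypothetical filter: keep only the status bits this handler owns. */
static uint64_t filter_status(uint64_t raw_status, uint64_t owned_mask)
{
	return raw_status & owned_mask;
}

/* True if any PEBS-capable counter reports an overflow in @status. */
static bool pebs_overflow(uint64_t status)
{
	return (status & PEBS_COUNTER_MASK) != 0;
}
```

For example, take a raw status with bit 0 (a counter this handler does not own) and bit 63 set, and an owned mask of `0x6`. `pebs_overflow(raw)` reports an overflow. `pebs_overflow(filter_status(raw, 0x6))` correctly does not.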