  1. Jul 27, 2015
  2. Jul 25, 2015
  3. Jul 23, 2015
  4. Jul 22, 2015
  5. Jul 21, 2015
  6. Jul 20, 2015
  7. Jul 19, 2015
  8. Jul 18, 2015
  9. Jul 17, 2015
  10. Jul 16, 2015
      AMG: Extended earlier OpenCL-only AMG implementation to CUDA and OpenMP. · 016dfb91
      Karl Rupp authored
      Now provides the following:
       - coarsening: classical RS, aggregation
       - interpolation: direct, aggregation, smoothed aggregation
      All available for all three backends, no longer requiring uBLAS.
      Former RS0 and RS3 dropped due to a lack of fine-grained parallelism.
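
To illustrate the aggregation coarsening listed above, here is a minimal serial sketch in C++. It is not ViennaCL's code and not the parallel MIS-based scheme of Bell et al.; it is a plain greedy variant swapped in for readability. The names `Aggregation` and `greedy_aggregate` are hypothetical, and the matrix is assumed to be given only by its CSR sparsity pattern (`row_ptr`, `col_idx`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type (not part of ViennaCL).
struct Aggregation {
  std::vector<int> aggregate_of;  // aggregate index for each fine-level row
  int num_aggregates;             // number of coarse-level unknowns
};

// Greedy serial aggregation on the sparsity graph of a CSR matrix.
// Assumption: col_idx holds valid row indices of a square matrix.
Aggregation greedy_aggregate(const std::vector<std::size_t>& row_ptr,
                             const std::vector<int>& col_idx)
{
  assert(!row_ptr.empty());
  const std::size_t n = row_ptr.size() - 1;
  Aggregation result{std::vector<int>(n, -1), 0};
  std::vector<int>& agg = result.aggregate_of;

  // Pass 1: a row whose neighbors are all unassigned becomes a root
  // and absorbs its whole neighborhood into a new aggregate.
  for (std::size_t i = 0; i < n; ++i) {
    if (agg[i] != -1) continue;
    bool all_free = true;
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
      if (agg[col_idx[k]] != -1) { all_free = false; break; }
    if (!all_free) continue;
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
      agg[col_idx[k]] = result.num_aggregates;
    agg[i] = result.num_aggregates;
    ++result.num_aggregates;
  }

  // Pass 2: leftover rows join the aggregate of any assigned neighbor;
  // isolated rows form a singleton aggregate.
  for (std::size_t i = 0; i < n; ++i) {
    if (agg[i] != -1) continue;
    for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
      if (agg[col_idx[k]] != -1) { agg[i] = agg[col_idx[k]]; break; }
    }
    if (agg[i] == -1) agg[i] = result.num_aggregates++;
  }
  return result;
}
```

For a tridiagonal (1D Laplacian) pattern with six rows, this sketch produces two aggregates, {0, 1} and {2, 3, 4, 5}. The sequential dependency in pass 1 is the kind of coupling that fine-grained parallel schemes are designed to avoid.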
      
      Implementations mostly based on the paper
      "Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods"
      by Bell et al. Initial implementation in branch
       karlrupp/refurbish-amg
      which became too messy over time, hence this cleanup.
      
      New operations available via viennacl/linalg/amg_operations.hpp.
      This includes assign_to_dense() and amg_transpose(),
      which should at some point become generally available with a nicer API.
      
      Still to be added:
       - diagnostic information from preconditioner object
       - documentation in manual
       - more convenience for amg_tag
      MIC: Enhanced kernel parameters for BLAS levels 1, 2, 3. · d2ef9b25
      Karl Rupp authored
      Parameters not obtained from a full-fledged optimizer run,
      but from careful manual tweaking. Obtained memory bandwidths
      for BLAS levels 1 and 2 are about 70 GB/sec, which is okay given
      that vector operations on the Xeon Phi are slow with OpenCL.
      BLAS level 3 improves very mildly, peaks at about 40 GFLOP/sec.
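
The commit does not say how these figures were measured. As a rough sketch of the usual convention, effective bandwidth is bytes moved divided by run time, and GFLOP/sec for a dense matrix product counts 2·n³ floating-point operations. All function names below are hypothetical helpers, not ViennaCL API.

```cpp
#include <cassert>
#include <cstddef>

// Bytes moved by y = alpha * x + y on n entries:
// read x, read y, write y (assumption: no cache reuse).
double axpy_bytes(std::size_t n, std::size_t bytes_per_entry)
{
  return 3.0 * static_cast<double>(n) * static_cast<double>(bytes_per_entry);
}

// Effective memory bandwidth in GB/sec (1 GB = 1e9 bytes).
double gb_per_sec(double bytes, double seconds)
{
  assert(seconds > 0.0);
  return bytes / seconds / 1e9;
}

// GFLOP/sec of a dense C = A * B with square n x n matrices:
// n^3 multiplications plus n^3 additions.
double gemm_gflop_per_sec(std::size_t n, double seconds)
{
  assert(seconds > 0.0);
  const double nd = static_cast<double>(n);
  return 2.0 * nd * nd * nd / seconds / 1e9;
}
```

With these definitions, a timed kernel run gives its figure directly, for example `gb_per_sec(axpy_bytes(n, sizeof(double)), elapsed_seconds)` for a double-precision vector update.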
      
      Given that OpenCL for Xeon Phi (KNC) has limited use and that
      everybody is eager for KNL, further tuning efforts are suspended.
      
      Resolves #26.
  11. Jul 15, 2015