Tighter write race checking (#224)

* introduce loopy.Options.insert_additional_gbarriers
* implement WriteRaceChecker
* barrier insertion: use write race checker instead of overlap checker
* tests: gets rid of requirement of certain barriers when there weren't any access races
* should be either l.0.A != l.0.B or l.1.A != l.1.B
* do not intersect the map domains
  Intersecting would ignore some accesses when there's a grid mismatch => inaccurate.
* docs
* account for instructions missing hw axes
* "broadcast" statements along the unused hw axes
* fixup! implement WriteRaceChecker changes related to callables_table
* s/insert_additional_gbarriers/insert_gbarriers/g
* move WriteRaceChecker to loopy.schedule.tools
* extends test_no_barriers_for_non_overlapping_access
* uses markup in _check_for_access_races docs
* describe the special access map values
* do not handroll isl_obj.project_out_except
* adds a test from gh-578