# Events and Synchronization

[Free download link] pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms. Project address: https://gitcode.com/cann/pto-isa

PTO Tile Lib supports an explicit event model for expressing dependencies between operations without introducing a global barrier for every instruction.

This document describes the C++ event types used by `include/pto/common/pto_instr.hpp` and `include/pto/common/event.hpp`.

Note: the concrete `pto::Event<SrcOp, DstOp>` type is defined only for device builds (`__CCE_AICORE__`). The CPU simulator backend treats `TSYNC` as a no-op and relies on ordinary program order within a single thread.

## Key types

### `pto::Op`

`pto::Op` is an opcode-like enumeration used to classify operations. Each `Op` maps to a hardware pipeline (`PIPE_V`, `PIPE_MTE2`, ...).

### `pto::RecordEvent`

Many intrinsics (e.g., `TADD`, `TLOAD`, `TSTORE`) return `pto::RecordEvent`. This is a marker value that can be assigned to an `Event<SrcOp, DstOp>` to record a token after the op finishes.

### `pto::Event<SrcOp, DstOp>` (device-only)

On device builds (`__CCE_AICORE__`), `include/pto/common/event.hpp` defines:

```cpp
template <Op SrcOp, Op DstOp>
struct Event {
  void Wait();
  void Record();
  Event& operator=(RecordEvent);
};
```

- `Wait()` blocks until the producer-side token is satisfied.
- `Record()` sets a token on the producer pipeline.
- `evt = OP(...)` (assignment from `RecordEvent`) records automatically.

The template parameters encode the producer and consumer opcodes and are used to select the correct pipeline pair.

## `TSYNC<OpCode>()` (single-pipeline barrier)

`TSYNC<OpCode>()` is a single-op barrier implemented by `TSYNC_IMPL<OpCode>()`.

- On device, the current implementation restricts the single-op form to vector pipeline ops (`PIPE_V`).
- On the CPU simulator backend (`__CPU_SIM`), `TSYNC_IMPL` is a no-op.

## How `WaitEvents...` works in intrinsics

Most intrinsics in `include/pto/common/pto_instr.hpp` have a trailing `WaitEvents...`
`events` pack.

Pattern:

1. The intrinsic calls `TSYNC(events...)`.
2. `TSYNC(events...)` calls `WaitAllEvents(events...)`, which invokes `events.Wait()` on each event.
3. The instruction then executes, and the intrinsic returns a `RecordEvent`.

This enables a programming style where you:

- Keep event tokens as SSA-like C++ variables.
- Pass them into the next op to enforce ordering.
- Record a new token by assigning the returned `RecordEvent`.

## Ordering guidelines (abstract)

- Events are primarily used to express ordering **between pipeline classes** (for example, "a memory load must complete before a vector op consumes the tile").
- Operations without explicit data or event dependencies may execute out of order on the device.
- Operations linked by an event dependency must observe the `Wait()`/`Record()` order implied by the program.
- Instruction pages in `docs/isa/` indicate when ordering constraints matter for correctness.

## Minimal example

```cpp
#include <pto/pto-inst.hpp>

using namespace pto;

void pipeline(__gm__ float* in0, __gm__ float* in1, __gm__ float* out) {
  using TileT = Tile<TileType::Vec, float, 16, 16>;
  using GShape = Shape<1, 1, 1, 16, 16>;
  using GStride = BaseShape2D<float, 16, 16, Layout::ND>;
  using GT = GlobalTensor<float, GShape, GStride, Layout::ND>;

  GT gin0(in0), gin1(in1), gout(out);
  TileT a, b, c;

  // One event per producer/consumer pair.
  Event<Op::TLOAD, Op::TADD> e0;
  Event<Op::TLOAD, Op::TADD> e1;
  Event<Op::TADD, Op::TSTORE_VEC> e2;

  e0 = TLOAD(a, gin0);         // record e0 once tile a is loaded
  e1 = TLOAD(b, gin1);         // record e1 once tile b is loaded
  e2 = TADD(c, a, b, e0, e1);  // wait on both loads, then record e2
  TSTORE(gout, c, e2);         // wait on the add before storing
}
```
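The wait-then-execute-then-record pattern described under "How `WaitEvents...` works in intrinsics" can be tried on a host machine with a small mock. The sketch below is an illustration only: `MockEvent`, `MockRecordEvent`, `MockOp`, and `RunMockPipeline` are hypothetical names invented here, not part of PTO, and the mock is single-threaded, so `Wait()` merely checks that the token was already recorded instead of blocking. It assumes a C++17 compiler for the fold expression.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical host-side stand-ins for illustration only; these are NOT the
// real pto::RecordEvent / pto::Event types.
struct MockRecordEvent {};

struct MockEvent {
    bool recorded = false;  // has the producer set its token?
    int waitCount = 0;      // how many consumers waited on this event

    void Record() { recorded = true; }

    void Wait() {
        // A device event would block here; the single-threaded mock only
        // checks that the producer has already recorded.
        assert(recorded && "Wait() reached before Record()");
        ++waitCount;
    }

    // Assigning an op's return value records the token automatically.
    MockEvent& operator=(MockRecordEvent) {
        Record();
        return *this;
    }
};

// Mock op: wait on every event in the trailing pack, "execute", then return a
// marker that the caller can assign into an event.
template <typename... WaitEvents>
MockRecordEvent MockOp(std::vector<std::string>& trace, const std::string& name,
                       WaitEvents&... events) {
    (events.Wait(), ...);   // C++17 fold expression over the pack
    trace.push_back(name);  // stands in for issuing the instruction
    return MockRecordEvent{};
}

// Mirrors the load/load/add/store shape of the minimal example.
std::vector<std::string> RunMockPipeline() {
    std::vector<std::string> trace;
    MockEvent e0, e1, e2;
    e0 = MockOp(trace, "TLOAD a");
    e1 = MockOp(trace, "TLOAD b");
    e2 = MockOp(trace, "TADD", e0, e1);
    MockOp(trace, "TSTORE", e2);
    return trace;
}
```

Calling `RunMockPipeline()` from a `main` function returns the ops in the order they were issued. Passing an event that was never assigned (for example, a fresh `MockEvent` handed straight to `MockOp`) trips the assertion in `Wait()`, which is the host-side analogue of a consumer waiting on a token that no producer will ever record.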