# ROIAlign Operator API Description

> **[Free download]** cann-bench evaluates AI capability on CANN-domain code tasks, covering operator generation, operator optimization, and related areas. It supports model selection and training-effect evaluation, unifies quantitative evaluation standards, identifies Agent capability gaps, and builds a CANN-domain evaluation platform to drive the continued evolution of AI capability in the CANN domain. Project address: https://gitcode.com/cann/cann-bench

## 1. Operator Introduction

A pooling layer for feature maps of non-uniform input size. Main application scenarios:

- Feature extraction for candidate Regions of Interest (ROI) in object detection
- Feature alignment in two-stage detection frameworks such as Faster R-CNN and Mask R-CNN
- Mapping ROIs of different sizes to fixed-size feature representations in instance segmentation

Operator characteristics:

- Difficulty level: L3 (Fused/Composite)
- Two inputs (feature map `x` and ROI boxes `rois`), single output
- Supports bilinear interpolation mode

## 2. Operator Definition

Mathematical formula:

$$ y = \text{roi\_align}(x, \text{boxes}, \text{output\_size}) $$

For each box: map it to a region of the input feature map (scaling its coordinates by `spatial_scale`), divide that region into `outputHeight × outputWidth` bins, sample each bin via bilinear interpolation, and average-pool the samples to obtain a fixed-size output.

## 3. Interface Specification

Operator prototype:

```
cann_bench.roi_align(Tensor x, Tensor boxes, int outputHeight, int outputWidth,
                     float spatial_scale, int sampling_ratio, bool aligned) -> Tensor y
```

Input parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| x | Tensor | required | Input feature map, shape `[B, C, H, W]` |
| boxes | Tensor | required | ROI boxes, shape `[numBoxes, 5]`, columns `(batch_idx, x1, y1, x2, y2)` |
| outputHeight | int | required | Output height |
| outputWidth | int | required | Output width |
| spatial_scale | float | required | Spatial scaling factor that maps box coordinates onto the input feature map |
| sampling_ratio | int | -1 | Sampling ratio (computed automatically when -1 or 0) |
| aligned | bool | false | Whether to align (when `aligned=True`, box coordinates are shifted by -0.5 pixel) |

Output parameters:

| Output | Shape | dtype | Description |
| --- | --- | --- | --- |
| y | `[numBoxes, C, outputHeight, outputWidth]` | same as input `x` | Output tensor: the aligned result for `boxes` |

Data types:

| Input dtype | Output dtype |
| --- | --- |
| float32 | float32 |
| float16 | float16 |

Rules and constraints:

- The input feature map `x` has shape `[B, C, H, W]` (batch, channel, height, width)
- The ROI boxes `boxes` have shape `[numBoxes, 5]`, where the 5 columns are `(batch_idx, x1, y1, x2, y2)`
- `x` and `boxes` must share the same dtype
- `outputHeight` and `outputWidth` must be positive integers
- `spatial_scale` maps ROI coordinates from the original image scale to the feature-map scale
- When `sampling_ratio` is -1 or 0, the number of sampling points is computed automatically

## 4. Precision Requirements

Precision is validated against the ecosystem operator precision standard.

Error metrics:

Mean relative error (MERE), the mean relative error over sampled points:

$$ \text{MERE} = \operatorname{avg}\left(\frac{|\text{actual} - \text{golden}|}{|\text{golden}| + 10^{-7}}\right) $$

Maximum relative error (MARE), the maximum relative error over sampled points:

$$ \text{MARE} = \max\left(\frac{|\text{actual} - \text{golden}|}{|\text{golden}| + 10^{-7}}\right) $$

Pass thresholds:

| Data type | Threshold |
| --- | --- |
| FLOAT16 | 2^-10 |
| BFLOAT16 | 2^-7 |
| FLOAT32 | 2^-13 |
| HiFLOAT32 | 2^-11 |
| FLOAT8 E4M3 | 2^-3 |
| FLOAT8 E5M2 | 2^-2 |

A case passes when MERE < Threshold and MARE < 10 × Threshold.
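The pass criterion above can be sketched directly from the two formulas. This is a minimal illustration; `check_precision` is a hypothetical helper name, not part of the cann_bench API:

```python
import torch

def check_precision(actual: torch.Tensor, golden: torch.Tensor, threshold: float) -> bool:
    # Per-element relative error; the 1e-7 in the denominator guards against
    # division by zero, matching the MERE/MARE definitions above.
    rel = (actual - golden).abs() / (golden.abs() + 1e-7)
    mere = rel.mean().item()  # mean relative error (MERE)
    mare = rel.max().item()   # maximum relative error (MARE)
    return mere < threshold and mare < 10 * threshold

golden = torch.tensor([1.0, -2.0, 0.5, 3.0])
print(check_precision(golden * (1 + 1e-6), golden, 2 ** -13))  # True: ~1e-6 relative error passes the float32 threshold
print(check_precision(golden + 0.1, golden, 2 ** -13))         # False: a gross error fails
```

Note that the single threshold bounds the mean, while the maximum is allowed to exceed it by a factor of 10, so isolated outlier points do not fail an otherwise accurate kernel.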
## 5. Standard Golden Code

The golden reference prefers `torchvision.ops.roi_align`; when it is unavailable, a pure Python fallback is used.

```python
import torch

try:
    from torchvision.ops import roi_align as _tv_roi_align
    HAS_TORCHVISION = True
except Exception:
    HAS_TORCHVISION = False


def _bilinear_interpolate(input, roi_batch_ind, y, x, ymask, xmask):
    """Vectorized bilinear sampling of `input` at fractional (y, x) positions."""
    _, channels, height, width = input.size()
    # Clamp sampling positions that fall outside the feature map.
    y = y.clamp(min=0)
    x = x.clamp(min=0)
    y_low = y.int()
    x_low = x.int()
    y_high = torch.where(y_low >= height - 1, height - 1, y_low + 1)
    y_low = torch.where(y_low >= height - 1, height - 1, y_low)
    y = torch.where(y_low >= height - 1, y.to(input.dtype), y)
    x_high = torch.where(x_low >= width - 1, width - 1, x_low + 1)
    x_low = torch.where(x_low >= width - 1, width - 1, x_low)
    x = torch.where(x_low >= width - 1, x.to(input.dtype), x)
    ly = y - y_low
    lx = x - x_low
    hy = 1.0 - ly
    hx = 1.0 - lx

    def masked_index(y_idx, x_idx):
        if ymask is not None:
            y_idx = torch.where(ymask[:, None, :], y_idx, 0)
            x_idx = torch.where(xmask[:, None, :], x_idx, 0)
        return input[
            roi_batch_ind[:, None, None, None, None, None],
            torch.arange(channels, device=input.device)[None, :, None, None, None, None],
            y_idx[:, None, :, None, :, None],
            x_idx[:, None, None, :, None, :],
        ]

    v1 = masked_index(y_low, x_low)
    v2 = masked_index(y_low, x_high)
    v3 = masked_index(y_high, x_low)
    v4 = masked_index(y_high, x_high)

    def outer_prod(y_t, x_t):
        return y_t[:, None, :, None, :, None] * x_t[:, None, None, :, None, :]

    return (outer_prod(hy, hx) * v1 + outer_prod(hy, lx) * v2
            + outer_prod(ly, hx) * v3 + outer_prod(ly, lx) * v4)


def _roi_align_fallback(x, boxes, pooled_height, pooled_width,
                        spatial_scale, sample_ratio, aligned):
    orig_dtype = x.dtype
    x_fp32 = x.float()
    boxes_fp32 = boxes.float()
    _, channels, height, width = x_fp32.size()
    roi_batch_ind = boxes_fp32[:, 0].long()
    # aligned=True shifts box coordinates by half a pixel.
    offset = 0.5 if aligned else 0.0
    roi_start_w = boxes_fp32[:, 1] * spatial_scale - offset
    roi_start_h = boxes_fp32[:, 2] * spatial_scale - offset
    roi_end_w = boxes_fp32[:, 3] * spatial_scale - offset
    roi_end_h = boxes_fp32[:, 4] * spatial_scale - offset
    roi_width = roi_end_w - roi_start_w
    roi_height = roi_end_h - roi_start_h
    if not aligned:
        roi_width = roi_width.clamp(min=1.0)
        roi_height = roi_height.clamp(min=1.0)
    bin_size_h = roi_height / pooled_height
    bin_size_w = roi_width / pooled_width
    exact_sampling = sample_ratio > 0
    roi_bin_grid_h = sample_ratio if exact_sampling else torch.ceil(roi_height / pooled_height)
    roi_bin_grid_w = sample_ratio if exact_sampling else torch.ceil(roi_width / pooled_width)
    if exact_sampling:
        count = max(roi_bin_grid_h * roi_bin_grid_w, 1)
        iy = torch.arange(roi_bin_grid_h, device=x.device)
        ix = torch.arange(roi_bin_grid_w, device=x.device)
        ymask = None
        xmask = None
    else:
        # Adaptive sampling: mask out grid positions beyond each ROI's own grid size.
        count = torch.clamp(roi_bin_grid_h * roi_bin_grid_w, min=1)
        iy = torch.arange(height, device=x.device)
        ix = torch.arange(width, device=x.device)
        ymask = iy[None, :] < roi_bin_grid_h[:, None]
        xmask = ix[None, :] < roi_bin_grid_w[:, None]

    def from_K(t):
        return t[:, None, None]

    y = (from_K(roi_start_h)
         + torch.arange(pooled_height, device=x.device)[None, :, None] * from_K(bin_size_h)
         + (iy[None, None, :] + 0.5).to(x_fp32.dtype) * from_K(bin_size_h / roi_bin_grid_h))
    x_pos = (from_K(roi_start_w)
             + torch.arange(pooled_width, device=x.device)[None, :, None] * from_K(bin_size_w)
             + (ix[None, None, :] + 0.5).to(x_fp32.dtype) * from_K(bin_size_w / roi_bin_grid_w))
    val = _bilinear_interpolate(x_fp32, roi_batch_ind, y, x_pos, ymask, xmask)
    if not exact_sampling:
        val = torch.where(ymask[:, None, None, None, :, None], val, 0)
        val = torch.where(xmask[:, None, None, None, None, :], val, 0)
    output = val.sum((-1, -2))  # average pooling over the sampling grid
    if isinstance(count, torch.Tensor):
        output = output / count[:, None, None, None]
    else:
        output = output / count
    return output.to(orig_dtype)


def roi_align(
    x: torch.Tensor,
    boxes: torch.Tensor,
    pooled_height: int,
    pooled_width: int,
    spatial_scale: float = 1.0,
    sample_ratio: int = -1,
    aligned: bool = False,
) -> torch.Tensor:
    """Pooling layer for feature maps of non-uniform input size.

    Formula: y = roi_align(x, boxes, output_size)
    """
    if HAS_TORCHVISION:
        return _tv_roi_align(x, boxes, (pooled_height, pooled_width),
                             spatial_scale=spatial_scale,
                             sampling_ratio=sample_ratio, aligned=aligned)
    return _roi_align_fallback(x, boxes, pooled_height, pooled_width,
                               spatial_scale, sample_ratio, aligned)
```

## 6. Additional Information

Operator call example:

```python
import torch
import cann_bench

x = torch.randn(2, 256, 64, 64, dtype=torch.float32, device="npu")
boxes = torch.tensor([[0, 10.0, 10.0, 50.0, 50.0],
                      [1, 20.0, 20.0, 60.0, 60.0]],
                     dtype=torch.float32, device="npu")
y = cann_bench.roi_align(x, boxes, 7, 7, 0.0625, 2, False)
```

Authorship note: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
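As a final sanity check of the bin-and-sample procedure described in Section 2, the same computation can be written as a deliberately naive, loop-based sketch for a single box. `roi_align_naive` and `bilinear_sample` are illustrative names, not part of the cann_bench API, and the sketch assumes float CPU tensors:

```python
import math
import torch

def bilinear_sample(feat, y, x):
    # feat: [C, H, W]; returns the bilinearly interpolated value at (y, x) per channel.
    C, H, W = feat.shape
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feat[:, y0, x0] + (1 - ly) * lx * feat[:, y0, x1]
            + ly * (1 - lx) * feat[:, y1, x0] + ly * lx * feat[:, y1, x1])

def roi_align_naive(x, box, out_h, out_w, spatial_scale=1.0, sampling_ratio=-1, aligned=False):
    # box: (batch_idx, x1, y1, x2, y2) in image coordinates.
    b, bx1, by1, bx2, by2 = box
    offset = 0.5 if aligned else 0.0
    sw, sh = bx1 * spatial_scale - offset, by1 * spatial_scale - offset
    ew, eh = bx2 * spatial_scale - offset, by2 * spatial_scale - offset
    roi_w, roi_h = ew - sw, eh - sh
    if not aligned:  # legacy behavior: the ROI is at least 1x1
        roi_w, roi_h = max(roi_w, 1.0), max(roi_h, 1.0)
    bin_w, bin_h = roi_w / out_w, roi_h / out_h
    # sampling_ratio <= 0 means "choose the grid size automatically".
    gh = sampling_ratio if sampling_ratio > 0 else int(math.ceil(roi_h / out_h))
    gw = sampling_ratio if sampling_ratio > 0 else int(math.ceil(roi_w / out_w))
    feat = x[int(b)]
    C = feat.shape[0]
    out = torch.zeros(C, out_h, out_w)
    for ph in range(out_h):
        for pw in range(out_w):
            acc = torch.zeros(C)
            # Average gh x gw bilinear samples inside each output bin.
            for iy in range(gh):
                for ix in range(gw):
                    yy = sh + ph * bin_h + (iy + 0.5) * bin_h / gh
                    xx = sw + pw * bin_w + (ix + 0.5) * bin_w / gw
                    acc += bilinear_sample(feat, yy, xx)
            out[:, ph, pw] = acc / (gh * gw)
    return out

# On a ramp feature map whose value equals the column index, box (0,0)-(4,4)
# with a 2x2 output averages samples at columns {0.5, 1.5} and {2.5, 3.5}:
ramp = torch.arange(8.0).repeat(8, 1)[None, None]  # [1, 1, 8, 8]
out = roi_align_naive(ramp, (0, 0.0, 0.0, 4.0, 4.0), 2, 2, 1.0, 2, False)
print(out[0])  # every row is [1.0, 3.0]
```

The hand-computable ramp case makes the sampling positions easy to verify against the formula in Section 2, which is useful when debugging a kernel implementation against the golden code.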