
CUDA Study Notes 08: Atomic Reduction / Vector Sum

Source: original, 2024/5/20 5:16:10

References

CUDA编程模型系列一(核心函数) (CUDA Programming Model Series, Part 1: Core Functions), video on bilibili

Code

#include <iostream>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <stdio.h>
#include <stdlib.h>   // rand
#include <math.h>

#define N 10000000
#define BLOCK 256      // threads per block; must be a power of two
#define GRID_SIZE 32   // blocks per grid

__managed__ int source[N];
__managed__ int gpu_result[1] = { 0 };

__global__ void sum_gpu(int* in, int count, int* out)
{
    __shared__ int ken[BLOCK];

    // Grid-stride loop: each thread accumulates a private partial sum.
    int shared_tmp = 0;
    for (int idx = blockDim.x * blockIdx.x + threadIdx.x; idx < count; idx += blockDim.x * gridDim.x)
    {
        shared_tmp += in[idx];
    }
    ken[threadIdx.x] = shared_tmp;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active threads each step.
    // Active threads write only ken[0 .. total_threads-1] and read ken[total_threads ..],
    // so one barrier per step is enough to avoid a read/write race.
    for (int total_threads = BLOCK / 2; total_threads >= 1; total_threads /= 2)
    {
        if (threadIdx.x < total_threads)
        {
            ken[threadIdx.x] += ken[threadIdx.x + total_threads];
        }
        __syncthreads();
    }

    // The block sum is now in ken[0]; thread 0 adds it to the global result.
    if (blockIdx.x * blockDim.x < count)
    {
        if (threadIdx.x == 0)
        {
            atomicAdd(out, ken[0]);  // atomic read-modify-write on global memory
        }
    }
}

// Reduction test: compare the GPU sum against a CPU sum and time both.
void test01()
{
    int cpu_result = 0;

    /* Initialize the input */
    for (int i = 0; i < N; i++) {
        source[i] = rand() % 10;
    }

    cudaEvent_t start, stop_cpu, stop_gpu;
    cudaEventCreate(&start);
    cudaEventCreate(&stop_cpu);
    cudaEventCreate(&stop_gpu);

    cudaEventRecord(start);
    cudaEventSynchronize(start);

    // Run the kernel 20 times and report the average time.
    for (int i = 0; i < 20; i++) {
        gpu_result[0] = 0;
        sum_gpu<<<GRID_SIZE, BLOCK>>>(source, N, gpu_result);
        cudaDeviceSynchronize();
    }

    cudaEventRecord(stop_gpu);
    cudaEventSynchronize(stop_gpu);

    for (int i = 0; i < N; i++) {
        cpu_result += source[i];
    }

    cudaEventRecord(stop_cpu);
    cudaEventSynchronize(stop_cpu);

    float time_cpu = 0, time_gpu = 0;
    cudaEventElapsedTime(&time_cpu, stop_gpu, stop_cpu);
    cudaEventElapsedTime(&time_gpu, start, stop_gpu);  // fixed: the original wrote into time_cpu here

    // cudaEventElapsedTime reports milliseconds.
    printf("CPU time: %.2f ms\nGPU time: %.2f ms\n", time_cpu, time_gpu / 20);
    printf("Result: %s\nGPU_result: %d;\nCPU_result: %d;\n", (gpu_result[0] == cpu_result) ? "Pass" : "Error", gpu_result[0], cpu_result);

    cudaEventDestroy(start);
    cudaEventDestroy(stop_cpu);
    cudaEventDestroy(stop_gpu);
}

int main()
{
    test01();
    return 0;
}

This code runs on Windows.
