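Linux Shell Programming: Monitoring Process CPU Usage and Capturing High-CPU Process Information with perf

0. Overview

This article presents a shell script that monitors the CPU usage of a set of processes. When a process's CPU usage exceeds a threshold, the script uses perf to capture detailed information about that process.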

1. Brief flowchart of the shell script:


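2. Introduction to perf

perf is a powerful performance analysis tool that ships with the Linux kernel and is used to analyze and tune system performance. It supports many event types, such as CPU clock, cache hits and misses, and interrupts.

The script invokes perf as follows:

perf record -F 99 -e cpu-clock -p $pid -g -o "perf-$process_name.data" -- sleep $perf_sleep_time
  • -F 99: sample at a frequency of 99 times per second.
  • -e cpu-clock: sample on the cpu-clock event, a software timer event driven by the CPU clock (not the hardware cycle counter).
  • -p $pid: the ID of the process to sample.
  • -g: record call stacks, which helps locate performance bottlenecks.
  • -o "perf-$process_name.data": write the samples to the given file.
  • -- sleep $perf_sleep_time: keep sampling for as long as the sleep command runs, that is, $perf_sleep_time seconds (10 seconds in this article).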
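The options above can be assembled by a small helper. The following is a minimal sketch, not part of the original script: build_perf_cmd is a hypothetical function that only prints the command line, so it can be reviewed without perf being installed.

```shell
#!/bin/sh
# build_perf_cmd is a hypothetical helper (an assumption for illustration).
# It prints the perf command line instead of running it.
build_perf_cmd() {
    pid=$1            # target process ID
    process_name=$2   # used to name the output file
    duration=$3       # sampling duration in seconds
    printf 'perf record -F 99 -e cpu-clock -p %s -g -o perf-%s.data -- sleep %s\n' \
        "$pid" "$process_name" "$duration"
}

# Example values (PID 1234 and the name "top" are made up).
build_perf_cmd 1234 top 10
```

Once a capture has finished, the resulting file can be inspected with, for example, `perf report -i perf-top.data --stdio`.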



3. Shell script walkthrough

  1. Log file configuration

    # Log file location
    LOGFILE="process_monitor.log"
    # Redirect standard output and standard error to the log file
    exec 1>>"$LOGFILE"
    exec 2>>"$LOGFILE"


  2. Running in the background

    # Relaunch the script in the background unless it was started with "background"
    if [ "$1" != "background" ]; then
        "$0" background &
        exit 0
    fi


  3. Initializing the last-report-time file

    # Initialize last report time file
    touch "$last_report_time_file"


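  4. Function to get the total CPU time

    # Function to get the total CPU time from /proc/stat
    get_total_cpu_time() {
        awk '/^cpu / {print $2 + $3 + $4 + $5 + $6 + $7 + $8}' /proc/stat
    }

    This function reads the aggregate "cpu" line of /proc/stat and sums the user, nice, system, idle, iowait, irq, and softirq fields, all measured in clock ticks.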
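To make the field arithmetic concrete, here is a minimal sketch that applies the same awk expression to a fixed sample instead of the live /proc/stat. The function name sum_cpu_fields and the sample numbers are assumptions for illustration.

```shell
#!/bin/sh
# sum_cpu_fields is a hypothetical variant that reads stat data from stdin
# rather than /proc/stat, so the result can be checked by hand.
sum_cpu_fields() {
    awk '/^cpu / {print $2 + $3 + $4 + $5 + $6 + $7 + $8}'
}

# Made-up sample. Columns: user nice system idle iowait irq softirq steal ...
sample='cpu  100 20 30 400 5 6 7 0 0 0
cpu0 50 10 15 200 2 3 4 0 0 0'

# Only the aggregate "cpu " line matches: 100+20+30+400+5+6+7 = 568
printf '%s\n' "$sample" | sum_cpu_fields
```

The pattern `/^cpu /` requires a space after "cpu", so the per-core lines (cpu0, cpu1, ...) are skipped and only the system-wide total is summed.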

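  5. Function to get a process's CPU time

    # Function to get the process CPU time from /proc/[pid]/stat
    get_process_cpu_time() {
        pid=$1
        awk '{print $14 + $15 + $16 + $17}' /proc/$pid/stat
    }

    This function reads the CPU time of the given process from /proc/[pid]/stat. Fields 14 to 17 are utime, stime, cutime, and cstime, measured in clock ticks.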
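One caveat with counting whitespace-separated fields: the second field of /proc/[pid]/stat is the command name in parentheses, and it may contain spaces, which shifts every later field. The following sketch shows one possible workaround, stripping everything up to the last ") " before counting. The function name parse_process_cpu_time and the sample line are assumptions, not part of the original script.

```shell
#!/bin/sh
# parse_process_cpu_time is a hypothetical variant that reads one
# /proc/[pid]/stat line from stdin. After removing "pid (comm) ", the state
# field becomes field 1, so utime..cstime move from fields 14-17 to 12-15.
parse_process_cpu_time() {
    sed 's/^.*) //' | awk '{print $12 + $13 + $14 + $15}'
}

# Made-up sample line for a process whose name contains a space.
sample='1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 50 25 3 2 20 0 1 0 100 0 0'

# utime=50, stime=25, cutime=3, cstime=2, so the sum is 80
printf '%s\n' "$sample" | parse_process_cpu_time
```

With the plain `$14 + $15 + $16 + $17` expression, the same sample line would yield 78, because the space in "my proc" pushes each field one position to the right.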

  6. Function to calculate a process's CPU usage

    # Function to calculate CPU usage of a process
    calculate_cpu_usage() {
        pid=$1
        # First sample
        prev_process_time=$(get_process_cpu_time "$pid")
        prev_total_time=$(get_total_cpu_time)
        sleep 1
        # Second sample, one second later
        process_time=$(get_process_cpu_time "$pid")
        total_time=$(get_total_cpu_time)
        process_delta=$((process_time - prev_process_time))
        total_delta=$((total_time - prev_total_time))
        # Integer percentage of the total CPU time used by this process
        cpu_usage=$((100 * process_delta / total_delta))
        echo $cpu_usage
    }
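The percentage is computed from two samples taken one second apart. The sketch below isolates that arithmetic in a hypothetical helper named cpu_percent (an assumption for illustration) and adds a guard for the case where the total counter has not advanced.

```shell
#!/bin/sh
# cpu_percent is a hypothetical helper: it takes two samples of process time
# and total time (in clock ticks) and prints the integer percentage.
cpu_percent() {
    prev_process_time=$1
    process_time=$2
    prev_total_time=$3
    total_time=$4
    process_delta=$((process_time - prev_process_time))
    total_delta=$((total_time - prev_total_time))
    # Guard against division by zero if the total counter did not advance.
    if [ "$total_delta" -le 0 ]; then
        echo 0
        return
    fi
    echo $((100 * process_delta / total_delta))
}

# Made-up tick counts.
cpu_percent 1000 1050 20000 20100   # 100 * 50 / 100 = 50
cpu_percent 1000 1100 20000 20400   # 100 * 100 / 400 = 25
```

Because the total comes from the aggregate "cpu" line, it covers all cores. In the second call, a process that used 100 ticks while four cores together accumulated 400 ticks is reported as 25%, so a threshold such as 80 refers to 80% of the whole machine rather than of a single core.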


  7. Function to load the last report time

    # Function to load the last report time for a PID
    load_last_report_time() {
        pid=$1
        grep "^$pid=" "$last_report_time_file" | cut -d'=' -f2
    }


  8. Function to save the last report time

    # Function to save the last report time for a PID
    save_last_report_time() {
        pid=$1
        time=$2
        # Remove any existing entry for this PID, then append the new one
        sed -i "/^$pid=/d" "$last_report_time_file"
        echo "$pid=$time" >> "$last_report_time_file"
    }
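The two functions treat the file as a small "pid=timestamp" store. The following sketch shows a round trip through such a file. It is a variant rather than the original code: it assumes mktemp is available for a scratch file, and it replaces `sed -i` with a rewrite through a temporary copy for systems where in-place editing behaves differently.

```shell
#!/bin/sh
# Scratch file for the demonstration (assumption: mktemp is available).
last_report_time_file=$(mktemp)

load_last_report_time() {
    pid=$1
    grep "^$pid=" "$last_report_time_file" | cut -d'=' -f2
}

# Variant of save_last_report_time that avoids "sed -i".
save_last_report_time() {
    pid=$1
    time=$2
    # Copy every entry except this PID's; grep -v exits non-zero when
    # nothing is left to print, which is fine here.
    grep -v "^$pid=" "$last_report_time_file" > "$last_report_time_file.tmp" || true
    echo "$pid=$time" >> "$last_report_time_file.tmp"
    mv "$last_report_time_file.tmp" "$last_report_time_file"
}

# Made-up PIDs and timestamps.
save_last_report_time 1234 1700000000
save_last_report_time 1234 1700000060   # replaces the earlier entry
save_last_report_time 42 1700000005
load_last_report_time 1234              # prints 1700000060
```

A PID that has never been saved produces empty output, which is why the monitoring loop falls back to 0 with `${last_time:-0}`.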


  9. List of monitored processes

    # List of process names to monitor
    process_names="top systemd"


  10. Monitoring loop

    while true; do
        current_time=$(date +%s)
        for process_name in $process_names; do
            if [ -n "$DEBUG_ON" ]; then
                echo "Checking process: $process_name"
            fi
            # Find all matching process PIDs
            pids=$(ps aux | grep "$process_name" | grep -v grep | awk '{print $2}')
            for pid in $pids; do
                # Calculate CPU usage
                cpu_usage=$(calculate_cpu_usage "$pid")
                # Check if CPU usage exceeds $max_cpu_usage%
                if [ "$cpu_usage" -gt "$max_cpu_usage" ]; then
                    echo "High CPU usage detected for process '$process_name' (PID: $pid): $cpu_usage%"
                    # Load the last report time for this PID
                    last_time=$(load_last_report_time "$pid")
                    last_time=${last_time:-0}
                    time_diff=$((current_time - last_time))
                    # Check if the last report was at least 60 seconds ago
                    if [ "$time_diff" -ge 60 ]; then
                        echo "time_diff: $time_diff, perf record -F 99 -e cpu-clock -p $pid -g -o perf-$process_name.data -- sleep $perf_sleep_time"
                        ps -p "$pid" -o pid,ppid,cmd,%mem,%cpu >> "$LOGFILE"
                        perf record -F 99 -e cpu-clock -p $pid -g -o "perf-$process_name.data" -- sleep $perf_sleep_time
                        # Save the last report time for this PID
                        save_last_report_time "$pid" "$current_time"
                        # Sleep for 1 second
                        sleep 1
                    fi
                else
                    if [ -n "$DEBUG_ON" ]; then
                        echo "CPU usage for process '$process_name' (PID: $pid): $cpu_usage%"
                    fi
                fi
            done
        done
    done

This is the main monitoring loop. On every pass it checks the CPU usage of each listed process and, when the usage exceeds max_cpu_usage, runs perf on that process, at most once every 60 seconds per PID.
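The 60-second throttle is the part of the loop that is easiest to get wrong, so it can help to look at it in isolation. Here is a minimal sketch using a hypothetical helper named should_report (an assumption, not part of the original script) with made-up timestamps.

```shell
#!/bin/sh
# should_report is a hypothetical helper that isolates the throttling rule:
# it succeeds (exit status 0) when at least $interval seconds have passed
# since the last perf capture for a PID.
should_report() {
    current_time=$1
    last_time=${2:-0}     # an empty value means "never reported"
    interval=$3
    [ $((current_time - last_time)) -ge "$interval" ]
}

# 100 seconds since the last capture: capture again.
if should_report 1700000100 1700000000 60; then echo "capture"; else echo "skip"; fi
# Only 30 seconds since the last capture: skip.
if should_report 1700000030 1700000000 60; then echo "capture"; else echo "skip"; fi
# No earlier capture recorded: capture.
if should_report 1700000030 "" 60; then echo "capture"; else echo "skip"; fi
```

Keeping the rule in a function with explicit arguments makes it possible to check the boundary cases without waiting for real time to pass.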

4. Complete script


#!/bin/sh
# This script monitors the CPU usage of a list of processes
DEBUG_ON=1

# Log file location
LOGFILE="process_monitor.log"
# Redirect standard output and standard error to the log file
exec 1>>"$LOGFILE"
exec 2>>"$LOGFILE"

# Relaunch the script in the background unless it was started with "background"
if [ "$1" != "background" ]; then
    "$0" background &
    exit 0
fi

# File that stores the last report time per PID (the file name is an assumed example)
last_report_time_file="last_report_time.txt"
# Initialize last report time file
touch "$last_report_time_file"

# Function to get the total CPU time from /proc/stat
get_total_cpu_time() {
    awk '/^cpu / {print $2 + $3 + $4 + $5 + $6 + $7 + $8}' /proc/stat
}

# Function to get the process CPU time from /proc/[pid]/stat
get_process_cpu_time() {
    pid=$1
    awk '{print $14 + $15 + $16 + $17}' /proc/$pid/stat
}

# Function to calculate CPU usage of a process
calculate_cpu_usage() {
    pid=$1
    prev_process_time=$(get_process_cpu_time "$pid")
    prev_total_time=$(get_total_cpu_time)
    sleep 1
    process_time=$(get_process_cpu_time "$pid")
    total_time=$(get_total_cpu_time)
    process_delta=$((process_time - prev_process_time))
    total_delta=$((total_time - prev_total_time))
    cpu_usage=$((100 * process_delta / total_delta))
    echo $cpu_usage
}

# Function to load the last report time for a PID
load_last_report_time() {
    pid=$1
    grep "^$pid=" "$last_report_time_file" | cut -d'=' -f2
}

# Function to save the last report time for a PID
save_last_report_time() {
    pid=$1
    time=$2
    sed -i "/^$pid=/d" "$last_report_time_file"
    echo "$pid=$time" >> "$last_report_time_file"
}

# List of process names to monitor
process_names="top systemd"
echo "Monitoring CPU usage for processes: $process_names"

# Perf sampling duration in seconds
perf_sleep_time=10
# CPU usage threshold in percent
max_cpu_usage=80

# Monitoring loop
while true; do
    current_time=$(date +%s)
    for process_name in $process_names; do
        if [ -n "$DEBUG_ON" ]; then
            echo "Checking process: $process_name"
        fi
        # Find all matching process PIDs
        # pids=$(ps | grep "$process_name" | grep -v grep | awk '{print $1}')
        pids=$(ps aux | grep "$process_name" | grep -v grep | awk '{print $2}')
        for pid in $pids; do
            # Calculate CPU usage
            cpu_usage=$(calculate_cpu_usage "$pid")
            # Check if CPU usage exceeds $max_cpu_usage%
            if [ "$cpu_usage" -gt "$max_cpu_usage" ]; then
                echo "High CPU usage detected for process '$process_name' (PID: $pid): $cpu_usage%"
                # Load the last report time for this PID
                last_time=$(load_last_report_time "$pid")
                last_time=${last_time:-0}
                time_diff=$((current_time - last_time))
                # Check if the last report was at least 60 seconds ago
                if [ "$time_diff" -ge 60 ]; then
                    echo "time_diff: $time_diff, perf record -F 99 -e cpu-clock -p $pid -g -o perf-$process_name.data -- sleep $perf_sleep_time"
                    ps -p "$pid" -o pid,ppid,cmd,%mem,%cpu >> "$LOGFILE"
                    perf record -F 99 -e cpu-clock -p $pid -g -o "perf-$process_name.data" -- sleep $perf_sleep_time
                    # Save the last report time for this PID
                    save_last_report_time "$pid" "$current_time"
                    # Sleep for 1 second
                    sleep 1
                fi
            else
                if [ -n "$DEBUG_ON" ]; then
                    echo "CPU usage for process '$process_name' (PID: $pid): $cpu_usage%"
                fi
            fi
        done
    done
done

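In this way, we can effectively monitor processes with high CPU usage on an embedded system and use perf to collect detailed performance data for tuning and troubleshooting.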


