
# Mediapipe GPU

## General info

Mediapipe is a great library by Google that lets you run ready-made ML pipelines such as pose, hand, and face tracking, but it doesn't come with out-of-the-box support for jetson platforms. That's why it's necessary to build it from source.

Sadly for us developers, the build procedure changed substantially from version 0.8.x to version 0.10.x, so you have to match the mediapipe version you need with the python version you need.

If no combination suits your needs, you can try to mix some of the instructions provided, or open an issue on the GitHub repo of this documentation; I or some kind member of the community will try to address it.
But since it isn't an easy process, I suggest sticking to one of the combinations already present on this page πŸ˜ƒ

## Mediapipe wheels

| Jetpack (l4t) | Python | Mediapipe | Install guide |
|---|---|---|---|
| 4.6.1 (l4t-32.7.1) | 3.6.9 | 0.8.4 | go to page |
| 4.6.1 (l4t-32.7.1) | 3.6.9 | 0.8.5 | go to page |
| 4.6.1 (l4t-32.7.1) | 3.8.0 | 0.10.7 | go to page |
| 4.6.1 (l4t-32.7.1) | 3.10.11 | 0.10.7 | go to page |

## Docker images

> **INFO**
> To properly run docker images on jetson, make sure docker is correctly configured. Check out docker setup.

> **TIP**
> To run containers on jetson with display and GPU, look here.

Root image: ghcr.io/lanzani/mediapipe.

### Runtime images

Here you can find images with opencv and mediapipe pre-installed (a quick version check is sketched right after the table below).

#### Jetpack 4.6.1 (l4t-32.7.1)

| Python | OpenCV | Mediapipe | Image tag | Image source |
|---|---|---|---|---|
| 3.6.9 | 4.8.0 | 0.8.5 | l4t32.7.1-py3.6.9-ocv4.8.0-mp0.8.5 | Dockerfile |
| 3.8.0 | 4.8.0 | 0.10.7 | l4t32.7.1-py3.8.0-ocv4.8.0-mp0.10.7 | Dockerfile |
| 3.10.11 | 4.8.0 | 0.10.7 | l4t32.7.1-py3.10.11-ocv4.8.0-mp0.10.7 | Dockerfile |
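
To double-check that a container ships what its tag promises, a quick version check inside it is enough (a minimal sketch; nothing image-specific is assumed, just the standard import names):

```python
import cv2
import mediapipe as mp

# The printed versions should match the ones encoded in the image tag,
# e.g. ocv4.8.0 and mp0.10.7.
print("OpenCV:", cv2.__version__)
print("Mediapipe:", mp.__version__)
```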

### Build images

These are the images used to build mediapipe and get the wheel file.

#### Jetpack 4.6.1 (l4t-32.7.1)

| Python | OpenCV | Mediapipe | Image | Image source |
|---|---|---|---|---|
| 3.8.0 | 4.8.0 | 0.10.7 | l4t32.7.1-py3.8.0-ocv4.8.0-mp0.10.7-build | Dockerfile |
| 3.10.11 | 4.8.0 | 0.10.7 | l4t32.7.1-py3.10.11-ocv4.8.0-mp0.10.7-build | Dockerfile |

You can find all the available tags here.

## Test GPU support

To check whether mediapipe is using the GPU, run your own script or one of the examples below. If you see a log line printed in the terminal that says:

`Created TensorFlow Lite XNNPACK delegate for GPU.`

it means that mediapipe is using the GPU! Congratulations! πŸŽ‰
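
If you don't have a script at hand, a minimal sketch like the one below is enough to trigger the log: processing a single dummy frame forces the graph, and therefore its delegate, to initialize (this uses the legacy solutions API; the 0.10.x example further down selects the GPU delegate explicitly).

```python
import numpy as np
import mediapipe as mp

# A single black RGB frame is enough to initialize the pose graph
# and make the delegate log appear in the terminal.
with mp.solutions.pose.Pose() as pose:
    pose.process(np.zeros((480, 640, 3), dtype=np.uint8))
```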

## Mediapipe 0.8.x

### Live pose estimation

With an HD webcam I was able to obtain ~20 fps.

```python
import time

import cv2
import mediapipe as mp

video_source = "/dev/video0"  # Use a webcam
# video_source = "test_video.mp4"  # Path to video file

# Initialize MediaPipe Pose and Drawing utilities
mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils
pose = mp_pose.Pose()

# Open the video source (webcam or video file)
cap = cv2.VideoCapture(video_source)
time.sleep(2)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert the frame to RGB
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process the frame with MediaPipe Pose
    result = pose.process(frame_rgb)

    # Draw the pose landmarks on the frame
    if result.pose_landmarks:
        mp_drawing.draw_landmarks(frame, result.pose_landmarks, mp_pose.POSE_CONNECTIONS)

    # Display the frame
    cv2.imshow('MediaPipe Pose', frame)

    # Exit when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
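
The ~20 fps figure above can be checked by timing the loop yourself. `FPSCounter` below is a hypothetical helper, not part of mediapipe, that averages over one-second windows; create one before the capture loop and call `tick()` once per processed frame:

```python
import time


class FPSCounter:
    """Hypothetical helper: prints the average FPS once per second."""

    def __init__(self):
        self.frames = 0
        self.start = time.time()

    def tick(self):
        self.frames += 1
        elapsed = time.time() - self.start
        if elapsed >= 1.0:
            print(f"FPS: {self.frames / elapsed:.1f}")
            self.frames = 0
            self.start = time.time()
```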

## Mediapipe 0.10.x

With an HD webcam I was able to obtain ~20 fps.

> **WARNING**
> To run this script you need to download this model and put it in the same directory as the script (see the sketch below).
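
If you prefer fetching the model from a script, a sketch with urllib works. The URL below is an assumption about where Google hosts the full pose landmarker model, so verify it against the link above before relying on it:

```python
import urllib.request

# Assumed hosting path for the model -- verify against the link above.
MODEL_URL = ("https://storage.googleapis.com/mediapipe-models/pose_landmarker/"
             "pose_landmarker_full/float16/1/pose_landmarker_full.task")

urllib.request.urlretrieve(MODEL_URL, "pose_landmarker_full.task")
print("Saved pose_landmarker_full.task")
```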

```python
import cv2
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from mediapipe.framework.formats import landmark_pb2

model_path = "pose_landmarker_full.task"

video_source = "/dev/video0"

num_poses = 4
min_pose_detection_confidence = 0.5
min_pose_presence_confidence = 0.5
min_tracking_confidence = 0.5


def draw_landmarks_on_image(rgb_image, detection_result):
    pose_landmarks_list = detection_result.pose_landmarks
    annotated_image = np.copy(rgb_image)

    # Loop through the detected poses to visualize.
    for idx in range(len(pose_landmarks_list)):
        pose_landmarks = pose_landmarks_list[idx]

        pose_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
        pose_landmarks_proto.landmark.extend([
            landmark_pb2.NormalizedLandmark(
                x=landmark.x,
                y=landmark.y,
                z=landmark.z) for landmark in pose_landmarks
        ])
        mp.solutions.drawing_utils.draw_landmarks(
            annotated_image,
            pose_landmarks_proto,
            mp.solutions.pose.POSE_CONNECTIONS,
            mp.solutions.drawing_styles.get_default_pose_landmarks_style())
    return annotated_image


to_window = None
last_timestamp_ms = 0


def print_result(detection_result: vision.PoseLandmarkerResult, output_image: mp.Image,
                 timestamp_ms: int):
    global to_window
    global last_timestamp_ms
    if timestamp_ms < last_timestamp_ms:
        return
    last_timestamp_ms = timestamp_ms
    # print("pose landmarker result: {}".format(detection_result))
    to_window = cv2.cvtColor(
        draw_landmarks_on_image(output_image.numpy_view(), detection_result), cv2.COLOR_RGB2BGR)


base_options = python.BaseOptions(model_asset_path=model_path, delegate=python.BaseOptions.Delegate.GPU)
options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.LIVE_STREAM,
    num_poses=num_poses,
    min_pose_detection_confidence=min_pose_detection_confidence,
    min_pose_presence_confidence=min_pose_presence_confidence,
    min_tracking_confidence=min_tracking_confidence,
    output_segmentation_masks=False,
    result_callback=print_result
)

with vision.PoseLandmarker.create_from_options(options) as landmarker:
    # Use OpenCV’s VideoCapture to start capturing from the webcam.
    cap = cv2.VideoCapture(video_source)

    # Create a loop to read the latest frame from the camera using VideoCapture#read()
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            print("Image capture failed.")
            break

        # Convert the frame received from OpenCV to a MediaPipe’s Image object.
        mp_image = mp.Image(
            image_format=mp.ImageFormat.SRGB,
            data=cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        timestamp_ms = int(cv2.getTickCount() / cv2.getTickFrequency() * 1000)
        landmarker.detect_async(mp_image, timestamp_ms)

        if to_window is not None:
            cv2.imshow("MediaPipe Pose Landmark", to_window)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()
```
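
A note on the timestamps: in LIVE_STREAM mode, detect_async requires monotonically increasing timestamps, which is why the script derives them from cv2.getTickCount(). Results arrive asynchronously through result_callback, which is also why the frame shown with cv2.imshow comes from the to_window variable filled in by the callback rather than from a return value of detect_async.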