Edge AI in Action: Practical Approaches to Developing and Deploying Optimized Models

In this tutorial, we presented how to develop and deploy models for edge AI, using multimodal interaction applications as examples.


Summary

Edge AI refers to the application of artificial intelligence on edge devices, i.e., devices at the periphery of a network, such as smartphones, tablets, laptops, cameras, sensors, and drones. Edge AI enables these devices to perform AI tasks autonomously, without relying on a connection to the cloud or a central server. This brings benefits such as higher speed, lower latency, greater privacy, and lower power consumption.


However, edge AI also poses challenges for model development and deployment, addressed by techniques such as model size reduction, compression, quantization, and distillation. Edge AI also involves integration and communication between edge devices and the cloud or other devices, creating hybrid and distributed architectures.


In this tutorial, we provided clear and practical guidance on developing and deploying optimized models for edge AI. Our comprehensive approach covered both the theoretical and technical aspects, along with best practices and real-world case studies. Our primary focus was on computer vision and deep learning models, as they are highly relevant to edge AI applications. Throughout the tutorial, we demonstrated the utilization of various tools and frameworks, including TensorFlow, PyTorch, ONNX, OpenVINO, Google Mediapipe, and Qualcomm SNPE.


Additionally, we provided concrete examples of multimodal AI applications, including head pose estimation, body segmentation, hand gesture recognition, sound localization, and more. These applications leverage various input sources, such as images, videos, and sounds, to create highly interactive and immersive edge AI experiences. Our presentation encompassed the development and deployment of these multimodal AI models on Jabra collaboration business cameras. Furthermore, we explored integration possibilities with cloud services and other devices, such as AWS DeepLens, Luxonis OAK-1 MAX, and NVIDIA Jetson Nano Developer Kit.


Topics

The tutorial was intended for researchers and practitioners interested in learning more about edge AI and how to apply it in real-world scenarios. It assumed some basic knowledge of computer vision and deep learning, but did not require any prior experience with edge AI. The topics covered were:

• Introduction to Edge AI: Motivation, definition, challenges, and opportunities of edge AI. Comparison and trade-offs between edge AI and cloud AI. Overview of edge AI applications and use cases.
• Model Development for Edge AI: Techniques and methods for developing efficient and effective models for edge AI, such as model design, pruning, compression, quantization, and distillation. Evaluation, comparison, and best practices of different model development approaches.
• Model Deployment for Edge AI: Techniques and methods for deploying and running models on edge devices, such as model conversion, inference, and optimization. Overview of various tools and frameworks for edge AI. Demonstration and comparison of different model deployment approaches.
• Multimodal AI for Edge AI: Introduction to multimodal AI, which combines different types of inputs and outputs, such as images, videos, and sounds. Overview of multimodal AI applications and use cases, such as combining poses, gestures, gaze, and voice. Techniques and methods for developing and deploying multimodal AI models on edge devices. Demonstration and comparison of different multimodal AI approaches.
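To make one of the model-development techniques listed above concrete, here is a minimal sketch of post-training dynamic quantization using PyTorch's built-in `torch.ao.quantization.quantize_dynamic`. The model and layer sizes are illustrative placeholders, not material from the tutorial itself.

```python
import torch
import torch.nn as nn

# A small float32 network standing in for a trained model.
float_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
float_model.eval()

# Post-training dynamic quantization: Linear weights are stored as int8
# and activations are quantized on the fly at inference time, reducing
# model size and speeding up CPU inference on edge devices.
quantized_model = torch.ao.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement for inference.
logits = quantized_model(torch.randn(1, 128))
print(logits.shape)  # torch.Size([1, 10])
```

Dynamic quantization needs no calibration data, which makes it a common first step before heavier techniques such as static quantization or distillation.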

Schedule

The tutorial schedule is shown below. You can download the slides we presented from the following links.

Tutorial Video

Watch the video of our tutorial, "Edge AI in Action: Practical Approaches to Developing and Deploying Optimized Models," presented during the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024).

Organizers

Due to the diversity and complexity of the modalities involved, the tutorial required expertise from several domains, including computer vision, natural language processing, human-computer interaction, signal processing, and machine learning. We covered the breadth and depth of multimodal interaction research and provided a rich and diverse perspective to the audience. We hope that our tutorial inspired and motivated the CVPR community to explore and advance these exciting and challenging fields.

Fabricio Batista Narcizo
Research Scientist / Part-Time Lecturer
Jabra / IT University of Copenhagen

Elizabete Munzlinger
Industrial Ph.D. Student
Jabra / IT University of Copenhagen

Anuj Dutt
Senior Software Engineer for AI Systems
GN Audio A/S

Shan Ahmed Shaffi
AI/Cloud Engineer
GN Hearing A/S

Sai Narsi Reddy Donthi Reddy
AI/ML Researcher
Jabra