Edge AI in Action: Practical Approaches to Developing and Deploying Optimized Models
In this tutorial, we presented how to develop and deploy models for edge AI, using multimodal interaction applications as examples.


Summary
Edge AI refers to the application of artificial intelligence on edge devices, i.e., devices at the periphery of a network, such as smartphones, tablets, laptops, cameras, sensors, and drones, among others. Edge AI enables these devices to perform AI tasks autonomously, without relying on a connection to the cloud or a central server. This brings benefits such as lower latency, greater privacy, and lower power consumption.
However, edge AI also poses many challenges and opportunities for model development and deployment, such as the need to reduce model size through compression, quantization, and distillation. Edge AI also involves integration and communication between edge devices and the cloud or other devices, creating hybrid and distributed architectures.
In this tutorial, we provided clear and practical guidance on developing and deploying optimized models for edge AI. Our comprehensive approach covered both the theoretical and technical aspects, along with best practices and real-world case studies. Our primary focus was on computer vision and deep learning models, as they are highly relevant to edge AI applications. Throughout the tutorial, we demonstrated the utilization of various tools and frameworks, including TensorFlow, PyTorch, ONNX, OpenVINO, Google Mediapipe, and Qualcomm SNPE.
Additionally, we provided concrete examples of multimodal AI applications, including head pose estimation, body segmentation, hand gesture recognition, sound localization, and more. These applications leverage various input sources, such as images, videos, and sounds, to create highly interactive and immersive edge AI experiences. Our presentation encompassed the development and deployment of these multimodal AI models on Jabra collaboration business cameras. Furthermore, we explored integration possibilities with cloud services and other devices, such as AWS DeepLens, Luxonis OAK-1 MAX, and NVIDIA Jetson Nano Developer Kit.
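As a small illustration of the optimization techniques mentioned above, the sketch below applies post-training dynamic quantization in PyTorch, one of the frameworks used in the tutorial. The model here is a hypothetical stand-in, not one of the tutorial's actual models.

```python
import torch
import torch.nn as nn

# A tiny example model standing in for a real edge model (hypothetical).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Post-training dynamic quantization: Linear weights are stored as int8,
# and activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model runs the same forward pass with a smaller footprint.
x = torch.randn(1, 128)
y = quantized(x)
print(y.shape)
```

Dynamic quantization is attractive on edge devices because it needs no calibration data, though static quantization or quantization-aware training can recover more accuracy when activations are also quantized.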

Topics
The tutorial was intended for researchers and practitioners interested in learning more about edge AI and how to apply it in real-world scenarios. It assumed some basic knowledge of computer vision and deep learning, but no prior experience with edge AI. The topics covered were:
- Introduction to Edge AI: Motivation, definition, challenges, and opportunities of edge AI. Comparison and trade-offs between edge AI and cloud AI. Overview of edge AI applications and use cases.
- Model Development for Edge AI: Techniques and methods for developing efficient and effective models for edge AI, such as model design, pruning, compression, quantization, and distillation. Evaluation, comparison, and best practices of different model development approaches.
- Model Deployment for Edge AI: Techniques and methods for deploying and running models on edge devices, such as model conversion, inference, and optimization. Overview of various tools and frameworks for edge AI. Demonstration and comparison of different model deployment approaches.
- Multimodal AI for Edge AI: Introduction to multimodal AI, which combines different types of inputs and outputs, such as images, videos, and sounds. Overview of multimodal AI applications and use cases, such as combining poses, gestures, gaze and voice. Techniques and methods for developing and deploying multimodal AI models on edge devices. Demonstration and comparison of different multimodal AI approaches.
Schedule
The tutorial schedule is shown below. You can download the slides from the links next to each session.
2:00 pm (PST) - 2:15 pm (PST) - Opening by Fabricio Batista Narcizo [Slides]
2:15 pm (PST) - 2:45 pm (PST) - Introduction to Edge AI by Anuj Dutt [Slides]
2:45 pm (PST) - 3:15 pm (PST) - Model Development for Edge AI by Sai Narsi Reddy Donthi Reddy [Slides]
3:15 pm (PST) - 3:45 pm (PST) - Break
3:45 pm (PST) - 5:00 pm (PST) - Model Deployment for Edge AI by Anuj Dutt and Fabricio Batista Narcizo [Slides]
5:00 pm (PST) - 5:50 pm (PST) - Multimodal AI for Edge AI by Fabricio Batista Narcizo, Elizabete Munzlinger and Shan Ahmed Shaffi [Slides]
5:50 pm (PST) - 6:00 pm (PST) - Closing Remarks and Joint Q&A
Tutorial Video
Watch the video of our tutorial Edge AI in Action: Practical Approaches to Developing and Deploying Optimized Models presented during the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024).
Organizers
Due to the diversity and complexity of the modalities involved, the tutorial required expertise from different domains, such as computer vision, natural language processing, human-computer interaction, signal processing, and machine learning. Our team covered the breadth and depth of multimodal interaction research and provided a rich and diverse perspective to the audience. We hope our tutorial inspired and motivated the CVPR community to explore and advance these exciting and challenging fields.
Fabricio Batista Narcizo
Research Scientist / Part-Time Lecturer
Jabra / IT University of Copenhagen
Elizabete Munzlinger
Industrial Ph.D. Student
Jabra / IT University of Copenhagen
Anuj Dutt
Senior Software Engineer for AI Systems
GN Audio A/S
Shan Ahmed Shaffi
AI/Cloud Engineer
GN Hearing A/S
Sai Narsi Reddy Donthi Reddy
AI/ML Researcher
Jabra