Chapter 2: Conversational Robotics

Duration: Week 13 · Hardware Tier: Tier 2-3 · Lessons: 4

Coming Soon

This chapter is currently in outline form. Full content will be available in a future update.

Chapter Overview

"Clean the room." Four syllables that require understanding intent, decomposing into subtasks, perceiving the environment, and executing dozens of coordinated actions. This chapter teaches you how to build robots that understand and act on natural language.

Learning Objectives

  • Integrate OpenAI Whisper for voice command processing
  • Build speech recognition and NLU pipelines
  • Design multi-modal interaction systems
  • Use LLMs for cognitive planning and task decomposition
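The objectives above form a pipeline: speech is transcribed (e.g. by Whisper), then an NLU stage classifies the intent and extracts entities before any action is planned. As a minimal sketch of that NLU stage, here is a rule-based intent classifier; the intent names, phrases, and entity slots are illustrative placeholders, not the chapter's actual vocabulary:

```python
import re

# Hypothetical intent patterns for a home robot. In a full system these
# rules would be replaced or backed by a learned NLU model.
INTENT_PATTERNS = {
    "navigate": re.compile(r"\b(?:go to|move to|drive to)\s+(?P<location>\w+)"),
    "pick": re.compile(r"\b(?:pick up|grab|get)\s+the\s+(?P<object>\w+)"),
    "clean": re.compile(r"\b(?:clean|tidy)\s+the\s+(?P<area>\w+)"),
}

def parse_command(text: str) -> dict:
    """Classify the intent of a transcribed utterance and extract entities."""
    text = text.lower().strip()
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Entities are the named groups that matched (location, object, ...).
            entities = {k: v for k, v in match.groupdict().items() if v}
            return {"intent": intent, "entities": entities}
    return {"intent": "unknown", "entities": {}}
```

In a deployed pipeline, `parse_command` would receive Whisper's transcript and its output would feed the planner, e.g. `parse_command("pick up the cup")` yields the `pick` intent with `object = "cup"`.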

Lessons (Outline)

#     Lesson                                Duration   Status
2.1   Voice-to-Action with OpenAI Whisper   75 min     📝 Outline
2.2   Speech Recognition and NLU            60 min     📝 Outline
2.3   Multi-Modal Interaction               75 min     📝 Outline
2.4   Cognitive Planning with LLMs          90 min     📝 Outline

Key Topics Covered

  • Whisper integration with ROS 2
  • Intent classification and entity extraction
  • Gesture recognition with MediaPipe
  • LLM-based task decomposition
  • Action primitive libraries
  • Grounding language in robot capabilities
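The last two topics work together: an LLM proposes a step list for a task, and grounding filters that list against the robot's action primitive library so only executable steps survive. The sketch below stubs the LLM call with a canned plan; the primitive names and the plan contents are invented for illustration, and a real system would prompt an actual model and validate its output the same way:

```python
# Hypothetical primitive library: the actions this robot can execute.
ACTION_PRIMITIVES = {"navigate_to", "detect_objects", "pick", "place", "wipe"}

def llm_plan(task: str) -> list:
    """Stand-in for an LLM call that decomposes a task into steps.

    A real implementation would send `task` to a language model and
    parse the returned step list.
    """
    canned = {
        "clean the room": [
            "navigate_to(room)",
            "detect_objects()",
            "pick(toy)",
            "place(toy, bin)",
            "wipe(table)",
        ],
    }
    return canned.get(task, [])

def ground_plan(steps: list) -> list:
    """Keep only steps whose primitive the robot can actually execute."""
    grounded = []
    for step in steps:
        primitive = step.split("(")[0]
        if primitive in ACTION_PRIMITIVES:
            grounded.append(step)
        # Steps with unknown primitives would be rejected or re-planned.
    return grounded

plan = ground_plan(llm_plan("clean the room"))
```

Grounding is what keeps hallucinated steps out of the executor: a step like `fly(ceiling)` is dropped because `fly` is not in the primitive library.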