Chapter 2: Conversational Robotics

Duration: Week 13 · Hardware Tier: Tier 2-3 · Lessons: 4

Coming Soon

This chapter is currently in outline form. Full content will be available in a future update.

Chapter Overview

"Clean the room." Four syllables that require understanding intent, decomposing into subtasks, perceiving the environment, and executing dozens of coordinated actions. This chapter teaches you how to build robots that understand and act on natural language.

Learning Objectives

  • Integrate OpenAI Whisper for voice command processing
  • Build speech recognition and NLU pipelines
  • Design multi-modal interaction systems
  • Use LLMs for cognitive planning and task decomposition
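The objectives above form a pipeline: speech is transcribed (e.g. by Whisper), then an NLU stage classifies the intent and extracts entities before any action is planned. As a minimal sketch of that NLU stage, here is a rule-based intent classifier; the intent names, phrases, and entity slots are illustrative placeholders, not the chapter's actual vocabulary:

```python
import re

# Hypothetical intent patterns for a home robot. In a full system these
# rules would be replaced or backed by a learned NLU model.
INTENT_PATTERNS = {
    "navigate": re.compile(r"\b(?:go to|move to|drive to)\s+(?P<location>\w+)"),
    "pick": re.compile(r"\b(?:pick up|grab|get)\s+the\s+(?P<object>\w+)"),
    "clean": re.compile(r"\b(?:clean|tidy)\s+the\s+(?P<area>\w+)"),
}

def parse_command(text: str) -> dict:
    """Classify the intent of a transcribed utterance and extract entities."""
    text = text.lower().strip()
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Entities are the named groups that matched (location, object, ...).
            entities = {k: v for k, v in match.groupdict().items() if v}
            return {"intent": intent, "entities": entities}
    return {"intent": "unknown", "entities": {}}
```

In a deployed pipeline, `parse_command` would receive Whisper's transcript and its output would feed the planner, e.g. `parse_command("pick up the cup")` yields the `pick` intent with `object = "cup"`.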

Lessons (Outline)

#     Lesson                                Duration   Status
2.1   Voice-to-Action with OpenAI Whisper   75 min     📝 Outline
2.2   Speech Recognition and NLU            60 min     📝 Outline
2.3   Multi-Modal Interaction               75 min     📝 Outline
2.4   Cognitive Planning with LLMs          90 min     📝 Outline

Key Topics Covered

  • Whisper integration with ROS 2
  • Intent classification and entity extraction
  • Gesture recognition with MediaPipe
  • LLM-based task decomposition
  • Action primitive libraries
  • Grounding language in robot capabilities
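The last two topics work together: an LLM proposes a step list for a task, and grounding filters that list against the robot's action primitive library so only executable steps survive. The sketch below stubs the LLM call with a canned plan; the primitive names and the plan contents are invented for illustration, and a real system would prompt an actual model and validate its output the same way:

```python
# Hypothetical primitive library: the actions this robot can execute.
ACTION_PRIMITIVES = {"navigate_to", "detect_objects", "pick", "place", "wipe"}

def llm_plan(task: str) -> list:
    """Stand-in for an LLM call that decomposes a task into steps.

    A real implementation would send `task` to a language model and
    parse the returned step list.
    """
    canned = {
        "clean the room": [
            "navigate_to(room)",
            "detect_objects()",
            "pick(toy)",
            "place(toy, bin)",
            "wipe(table)",
        ],
    }
    return canned.get(task, [])

def ground_plan(steps: list) -> list:
    """Keep only steps whose primitive the robot can actually execute."""
    grounded = []
    for step in steps:
        primitive = step.split("(")[0]
        if primitive in ACTION_PRIMITIVES:
            grounded.append(step)
        # Steps with unknown primitives would be rejected or re-planned.
    return grounded

plan = ground_plan(llm_plan("clean the room"))
```

Grounding is what keeps hallucinated steps out of the executor: a step like `fly(ceiling)` is dropped because `fly` is not in the primitive library.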