Lesson 2.1: ROS 2 Core Concepts
Duration: 45 minutes | Hardware Tier: Tier 1 (Laptop + Cloud) | Layer: L1 (Manual Foundation)
Learning Objectives
By the end of this lesson, you will be able to:
- Explain the ROS 2 distributed architecture and its advantages over centralized systems
- Describe how DDS middleware enables peer-to-peer communication between nodes
- Identify appropriate Quality of Service settings for different robot communication patterns
- Visualize the computation graph to understand node relationships
Why ROS 2 Architecture Matters
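Consider a warehouse robot whose central coordination process fails. In ROS 1, losing the master meant nodes could no longer discover each other or set up new connections, so recovering usually required restarting the whole system. In ROS 2 there is no master: nodes discover and communicate with each other directly, so the rest of the robot keeps operating. This architectural shift from centralized to distributed systems makes modern robots more resilient, scalable, and production-ready.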
What is ROS 2 Architecture
ROS 2 (Robot Operating System 2) is a distributed middleware framework that breaks complex robot systems into small, independent programs called nodes. Each node performs one specific task, such as reading sensor data, processing images, or controlling motors. Unlike traditional monolithic programs, ROS 2 nodes run as separate processes that communicate through well-defined channels.
The architecture follows three core principles. First, nodes are independent executables that can start, stop, and restart without affecting other parts of the system. Second, communication happens through standardized message types, ensuring different nodes can work together even if written by different teams. Third, the system is language-agnostic, supporting Python, C++, and other languages through client libraries.
ROS 2 Humble Hawksbill is a Long-Term Support (LTS) release, supported until May 2027. Its primary platforms are Ubuntu 22.04 and Windows 10; macOS is supported on a best-effort basis and typically requires building from source. The Humble release focuses on real-time performance, security improvements, and production deployment features that were missing in ROS 1.
This modular design means you can develop a camera node on your laptop, test a navigation node in simulation, and deploy both to a physical robot without rewriting code. Each node remains a black box to others, exposing only its inputs and outputs through topics, services, and actions.
DDS Middleware and Peer-to-Peer Communication
DDS (Data Distribution Service) is the communication layer that powers ROS 2. ROS 1 required a central master to register nodes and broker every new connection between them. DDS instead enables fully peer-to-peer communication, where nodes discover and connect to each other automatically. This eliminates the single point of failure that plagued ROS 1 systems.
When you start a ROS 2 node, DDS announces it on the local network, by default using multicast. Other nodes receive this announcement and learn about the new node's publishers, subscribers, services, and actions. Within seconds, nodes establish direct connections without human intervention. This process is called automatic discovery.
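To make the discovery idea concrete, here is a toy in-memory model in plain Python. It is not how DDS is implemented and it uses no ROS 2 API: the `Participant` class and its methods are hypothetical names invented for this sketch. It only illustrates that each participant builds its own view of its peers from their announcements instead of asking a central registry.

```python
class Participant:
    """Toy stand-in for a DDS participant (hypothetical, not a ROS 2 API)."""

    def __init__(self, name, publishes=(), subscribes=()):
        self.name = name
        self.publishes = set(publishes)
        self.subscribes = set(subscribes)
        self.known_peers = {}  # peer name -> topics that peer publishes

    def announce(self, network):
        """Tell every other participant on the network what we publish."""
        for peer in network:
            if peer is not self:
                peer.known_peers[self.name] = set(self.publishes)

    def matched_publishers(self):
        """Names of peers publishing a topic this participant subscribes to."""
        return sorted(
            peer for peer, topics in self.known_peers.items()
            if topics & self.subscribes
        )


camera = Participant("camera_node", publishes=["/camera/image"])
vision = Participant("vision_node", subscribes=["/camera/image"],
                     publishes=["/detected_objects"])
network = [camera, vision]

# Each participant announces itself; no central registry is consulted.
for participant in network:
    participant.announce(network)

print(vision.matched_publishers())  # ['camera_node']
```

Because every participant holds its own peer table, removing one participant from the list does not invalidate what the others already know.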
DDS handles the low-level networking details that would otherwise require hundreds of lines of socket programming. It manages message serialization, network transport, congestion control, and reliability guarantees. As a robotics developer, you work with high-level ROS 2 APIs while DDS handles the complexity underneath.
Multiple DDS implementations exist, including Fast DDS (default in Humble), Cyclone DDS, and RTI Connext. ROS 2 abstracts these differences through a common interface called rmw (ROS middleware). You can switch between installed DDS implementations by setting the RMW_IMPLEMENTATION environment variable before starting your nodes, allowing you to choose the best performance for your hardware.
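As a small sketch of the environment-variable switch, the helper below builds an environment with `RMW_IMPLEMENTATION` set. The variable name and the three rmw package names are the real ones. The `env_with_rmw` helper and the short vendor keys are assumptions made for this example, and the chosen rmw package must already be installed for the setting to take effect.

```python
import os

# rmw package names for the three vendors mentioned above.
KNOWN_RMW = {
    "fastdds": "rmw_fastrtps_cpp",      # default in Humble
    "cyclonedds": "rmw_cyclonedds_cpp",
    "connext": "rmw_connextdds",
}


def env_with_rmw(vendor, base_env=None):
    """Return a copy of the environment with RMW_IMPLEMENTATION set.

    `vendor` is one of the hypothetical short keys in KNOWN_RMW.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["RMW_IMPLEMENTATION"] = KNOWN_RMW[vendor]
    return env


env = env_with_rmw("cyclonedds", base_env={"PATH": "/usr/bin"})
print(env["RMW_IMPLEMENTATION"])  # rmw_cyclonedds_cpp
```

An environment built this way could be passed to `subprocess.run(..., env=env)` when launching a node from a script. In a shell, exporting the variable before running the node has the same effect.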
The peer-to-peer model scales better than centralized architectures. A robot with 50 nodes doesn't bottleneck through a single master. If one node crashes, others continue operating. This resilience is critical for production robots in warehouses, hospitals, and factories where downtime costs money.
Quality of Service Settings for Reliable Communication
Quality of Service (QoS) settings control how messages travel between nodes. ROS 2 provides fine-grained control over reliability, durability, and delivery guarantees through QoS profiles. Choosing the right QoS prevents data loss in critical systems while allowing best-effort delivery for high-frequency sensor data.
The reliability setting has two options. Reliable delivery guarantees every message arrives, retransmitting lost packets until acknowledged. Use this for command messages like motor control where missing a stop command could damage hardware. Best-effort delivery sends messages once without confirmation. Use this for sensor streams like camera images where the next frame makes old data obsolete.
The durability setting determines whether late-joining subscribers receive old messages. Transient local durability stores recent messages so new subscribers catch up on missed data. This works well for robot state information. Volatile durability discards old messages, appropriate for real-time sensor feeds where only current data matters.
The history setting controls message queuing. A keep last policy with depth 10 stores the 10 most recent messages. If a subscriber falls behind, it skips to the newest messages rather than processing a backlog. A keep all policy never discards messages, risking memory exhaustion if subscribers can't keep up.
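The difference between the two history policies can be sketched with plain Python containers. This is only an analogy, not ROS 2 code: a bounded `deque` stands in for a keep-last queue and an unbounded list for keep-all. The depth of 3 and the message numbers are arbitrary.

```python
from collections import deque

DEPTH = 3

# Keep last: a bounded queue silently drops the oldest entry when full.
keep_last = deque(maxlen=DEPTH)

# Keep all: an unbounded queue grows for as long as nothing is consumed.
keep_all = []

# Seven messages arrive while the subscriber is busy and reads nothing.
for sequence_number in range(1, 8):
    keep_last.append(sequence_number)
    keep_all.append(sequence_number)

print(list(keep_last))  # [5, 6, 7]
print(len(keep_all))    # 7
```

The slow subscriber with keep-last resumes at message 5 and never sees messages 1 to 4, while keep-all retains everything at the cost of memory that grows with the backlog.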
Here's how different robot systems use QoS profiles:
| System Component | Reliability | Durability | History | Reason |
|---|---|---|---|---|
| Motor commands | Reliable | Volatile | Keep last 1 | Must arrive, only latest matters |
| Camera stream | Best-effort | Volatile | Keep last 5 | High frequency, old frames useless |
| Robot state | Reliable | Transient local | Keep last 10 | New nodes need current state |
| Emergency stop | Reliable | Volatile | Keep last 1 | Critical, must not be lost |
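Incompatible QoS between publisher and subscriber prevents communication. The rule is that a subscriber cannot request a stronger guarantee than the publisher offers: a best-effort publisher won't connect to a reliable subscriber, and a volatile publisher won't connect to a transient local subscriber. The reverse pairings do connect, using the weaker guarantee. ROS 2 enforces these rules to prevent silent data loss that could cause robot failures.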
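The compatibility rule for reliability and durability can be written down as a small check. This is a plain-Python sketch, not the rclpy API: the `Qos` dataclass and `compatible` function are hypothetical names, and only two of the QoS policies are modeled. In rclpy the real settings live in `rclpy.qos.QoSProfile`. The rule encoded here is that the subscriber may not request more than the publisher offers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qos:
    """Simplified QoS settings (hypothetical model, not rclpy.qos.QoSProfile)."""
    reliability: str  # "reliable" or "best_effort"
    durability: str   # "transient_local" or "volatile"


def compatible(pub, sub):
    """True if the publisher's offer satisfies the subscriber's request."""
    reliability_ok = not (
        pub.reliability == "best_effort" and sub.reliability == "reliable"
    )
    durability_ok = not (
        pub.durability == "volatile" and sub.durability == "transient_local"
    )
    return reliability_ok and durability_ok


camera_pub = Qos("best_effort", "volatile")
strict_sub = Qos("reliable", "volatile")
state_pub = Qos("reliable", "transient_local")
relaxed_sub = Qos("best_effort", "volatile")

print(compatible(camera_pub, strict_sub))  # False: sub asks for more than offered
print(compatible(state_pub, relaxed_sub))  # True: sub asks for less than offered
```

When a connection you expect is missing, `ros2 topic info --verbose <topic>` lists the QoS of each endpoint so the two sides can be compared.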
The Computation Graph Visualizes Node Relationships
The computation graph is a visual representation of all active nodes and their communication channels. In rqt_graph, nodes appear as ovals, and topics appear either as labeled arrows between nodes or as rectangles, depending on the selected view. This graph helps you understand data flow, debug connection issues, and verify your system architecture matches your design.
Every ROS 2 system has an implicit graph structure. When you run ros2 node list, you see all active nodes. When you run ros2 topic list, you see all communication channels. The rqt_graph tool combines this information into an interactive diagram where you can click nodes to inspect their connections.
Consider a simple mobile robot with three nodes. A camera_node publishes images to the /camera/image topic. A vision_node subscribes to that topic and publishes detected objects to /detected_objects. A navigation_node subscribes to detected objects and publishes velocity commands to /cmd_vel. The computation graph shows this pipeline visually, making the data flow obvious.
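The three-node pipeline can be sketched as data to show how graph edges follow from each node's publishers and subscribers. This is a plain-Python illustration rather than a ROS 2 API. The node and topic names come from the example above, while the dictionary layout and the `graph_edges` helper are assumptions made for the sketch.

```python
# Each node lists the topics it publishes and subscribes to.
nodes = {
    "camera_node": {"pub": ["/camera/image"], "sub": []},
    "vision_node": {"pub": ["/detected_objects"], "sub": ["/camera/image"]},
    "navigation_node": {"pub": ["/cmd_vel"], "sub": ["/detected_objects"]},
}


def graph_edges(nodes):
    """Return (publisher, topic, subscriber) for every matched topic."""
    edges = []
    for pub_name, pub_info in nodes.items():
        for topic in pub_info["pub"]:
            for sub_name, sub_info in nodes.items():
                if topic in sub_info["sub"]:
                    edges.append((pub_name, topic, sub_name))
    return edges


for publisher, topic, subscriber in graph_edges(nodes):
    print(f"{publisher} --{topic}--> {subscriber}")
# camera_node --/camera/image--> vision_node
# vision_node --/detected_objects--> navigation_node
```

No edge appears for /cmd_vel because nothing in this sketch subscribes to it. On a real robot a motor driver node would close that part of the graph.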
The graph reflects nodes as they start and stop; in rqt_graph you may need to click the refresh button to redraw it. If the vision node crashes, the graph shows the broken connection between camera and navigation. This immediate feedback helps you diagnose problems faster than reading log files. You can also filter the graph to show only specific nodes or topics, reducing clutter in complex systems.
Understanding the computation graph is essential for debugging. If your robot isn't responding to commands, check whether the command publisher and motor subscriber are connected. If sensor data seems stale, verify the sensor node is publishing at the expected rate. The graph makes these invisible connections visible.
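One common cause of a missing connection is a topic name that differs between publisher and subscriber. The sketch below shows the check in plain Python. The `unmatched_topics` helper, the `motor_node` name, and the `/cmd_vel_raw` topic are hypothetical and exist only to illustrate the mismatch.

```python
def unmatched_topics(publishers, subscribers):
    """Report topics that have a publisher or a subscriber, but not both."""
    published = set(publishers)
    subscribed = set(subscribers)
    return {
        "no_subscriber": sorted(published - subscribed),
        "no_publisher": sorted(subscribed - published),
    }


# topic -> node names; motor_node listens on a misspelled topic (hypothetical).
publishers = {
    "/camera/image": ["camera_node"],
    "/cmd_vel": ["navigation_node"],
}
subscribers = {
    "/camera/image": ["vision_node"],
    "/cmd_vel_raw": ["motor_node"],
}

report = unmatched_topics(publishers, subscribers)
print(report["no_subscriber"])  # ['/cmd_vel']
print(report["no_publisher"])   # ['/cmd_vel_raw']
```

On a live system, `ros2 topic info <topic>` reports the publisher and subscription counts for a topic, and `ros2 topic hz <topic>` measures the publishing rate.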
Creating Your First ROS 2 Node
Let's build a minimal ROS 2 node to see these concepts in action. This node will initialize the ROS 2 runtime, create a node object, and log a message to demonstrate the basic structure every node follows.
What we're building: A simple ROS 2 node that demonstrates initialization and logging.
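```python
# minimal_node.py
import rclpy
from rclpy.node import Node


class MinimalNode(Node):
    def __init__(self):
        super().__init__('minimal_node')  # node name in the graph
        self.get_logger().info('Minimal node has started')


def main(args=None):
    rclpy.init(args=args)      # start the ROS 2 runtime
    node = MinimalNode()
    try:
        rclpy.spin(node)       # process callbacks until interrupted
    except KeyboardInterrupt:
        pass                   # Ctrl+C: exit without a traceback
    finally:
        node.destroy_node()
        if rclpy.ok():         # the context may already be shut down
            rclpy.shutdown()


if __name__ == '__main__':
    main()
```
Expected output (the second field is a timestamp that varies per run):
```
[INFO] [<timestamp>] [minimal_node]: Minimal node has started
```
What's happening:
- `import rclpy`: rclpy is the Python client library for ROS 2, providing all core functionality.
- `from rclpy.node import Node`: Node is the base class that all ROS 2 nodes inherit from.
- `super().__init__('minimal_node')` registers this node with the name 'minimal_node' in the computation graph.
- `get_logger().info()` writes to the ROS 2 logging system, which timestamps and categorizes messages.
- `rclpy.init()` initializes the ROS 2 runtime and connects to DDS middleware.
- `rclpy.spin()` keeps the node running and processing callbacks until interrupted.
- `except KeyboardInterrupt` lets Ctrl+C end the spin without printing a traceback.
- `node.destroy_node()` and `rclpy.shutdown()` cleanly disconnect from DDS and release resources. The `rclpy.ok()` check skips the shutdown call if the context has already been shut down.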
This pattern appears in every ROS 2 node you'll write. The initialization, spin, and shutdown sequence ensures proper resource management and clean exits when you press Ctrl+C.
Save this code as minimal_node.py and run it with python3 minimal_node.py. In another terminal, run ros2 node list to see your node in the computation graph. Press Ctrl+C to stop the node cleanly.
Key Takeaways
- ROS 2 uses a distributed architecture where nodes communicate peer-to-peer through DDS middleware, eliminating the single point of failure from ROS 1's centralized master.
- DDS provides automatic node discovery, allowing nodes to find and connect to each other without manual configuration.
- Quality of Service settings control message delivery guarantees, with reliable delivery for critical commands and best-effort for high-frequency sensor data.
- The computation graph visualizes all nodes and their communication channels, making system architecture and data flow visible for debugging.
- Every ROS 2 node follows the same pattern: initialize the runtime, create a node object, spin to process callbacks, and shutdown cleanly.
Check Your Understanding
- What advantage does ROS 2's peer-to-peer architecture provide over ROS 1's centralized master node?
- A robot's emergency stop button publishes to the `/emergency_stop` topic. Should this use reliable or best-effort QoS? Explain your reasoning.
- You run `ros2 node list` and see five nodes, but `rqt_graph` shows only three nodes connected. What does this tell you about your system?
- Compare transient local and volatile durability. When would you choose each for a robot's battery status topic?
- In the minimal node example, what would happen if you removed the `rclpy.spin(node)` line? Why?
Next Steps
Now that you understand ROS 2's distributed architecture and communication patterns, the next lesson explores nodes in depth. You'll learn about node lifecycle management, parameters, and how to structure nodes for maintainability and testing.
Hardware Tier 1 Note: Use The Construct for a browser-based ROS 2 environment with a pre-configured Humble installation, where you can run the minimal node example and explore the computation graph. Alternatively, Foxglove Studio provides visualization tools without requiring a local ROS 2 installation, but it only displays data from a ROS 2 system running elsewhere and does not run nodes itself.