Config Wars - Chapter 1: Intro to Schemas

Jul 08, 2025

Welcome to Config Wars! This series is designed to help scaling robotics teams understand why schemas are useful for configuration management and decide which schema language is best suited for their stack.

Setting the Stage

You’ve done it! You’ve officially gone from 0 to 1. You started with a crazy idea for a robot that could automate a task for a customer, and after countless iterations, you built a prototype. After more testing and tuning, it worked, not just in the lab, but in the field. You shipped it to your first customers, and they love it.

Now, you’ve set your ambitions higher. Instead of just a few robots in production, you want 10s, 50s, or 100s across customer sites around the world. It’s time to scale.

To support the scale-up, the entire team will have their hands full.

The hardware team will need to finalize the BOM and establish QA for a growing fleet.

The ML team needs to generalize their models across new customers and environments.

And the software infra team? They’ll be tackling today’s topic: configuration!

At the scale of just a few robots, most teams use vanilla JSON/YAML for their configs. They’re iterating quickly, constantly experimenting and manually tuning parameters. The robot fleet and codebase are small enough that SWEs can remember which configs do what and why specific fields have been modified, and can manage the fleet’s configuration by hand. With some light version control, the fleet’s configuration usually stays in a manageable state.

However, to support your scaling efforts, you’ll want to move to a configuration system defined by a schema language.

In my experience, teams made up of engineers from the embedded and robotics world find it challenging to select the right schema language. Historically, schema languages have been used in cloud software, either for SaaS applications or large-scale backend infrastructure, so our friends in robotics often don’t have much first-hand experience with them.

In this blog series, we’ll give you all the information you need to choose the right schema language for your production fleet. We’ll walk through what a schema language is, why it’s useful, our main criteria for choosing one, and finally, a few popular options in robotics.

Chapter 1 is all about laying down the basics:

What is a config? Why do they matter in robotics?
Why YAML/JSON don’t scale
What is a schema language? Why is it helpful?
Criteria for choosing a schema language

By the end, you’ll have the knowledge and confidence to evaluate a schema for your own use case.

Let’s dive in!

What is a Config?

In software systems, a configuration (or config) is a set of parameters that define how the system behaves. The goal is to decouple application logic from runtime values, allowing behavior to be modified without needing to change code.

Robotics follows the same principle. Configs are parameters, typically injected at runtime, that define how a robot operates. They make the system adaptable: we can swap out our configs to adjust for different hardware versions, environments, and use cases.

This enables modularity. Instead of having to rewrite code every time we swap out a sensor, we can change the parameter values for the new serial number and calibrations. Rather than having a completely different perception module for our picking and packing robots, we can toggle a boolean depending on our use case.
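
As a rough illustration of that decoupling, here is a minimal Python sketch. The config keys (`camera`, `serial_number`, `packing_mode`) and the `describe_pipeline` function are hypothetical names made up for this example, not part of any particular robotics stack.

```python
import json

# Hypothetical config for one robot; in practice this would be read from a file.
CONFIG_TEXT = """
{
  "camera": {"serial_number": "CAM-0042", "focal_length_px": 615.3},
  "perception": {"packing_mode": false}
}
"""


def describe_pipeline(config: dict) -> str:
    """Choose behavior from config values instead of hardcoding it."""
    camera = config["camera"]
    mode = "packing" if config["perception"]["packing_mode"] else "picking"
    return f"{mode} pipeline using camera {camera['serial_number']}"


config = json.loads(CONFIG_TEXT)
print(describe_pipeline(config))
```

Swapping the sensor or the use case means editing the config values, while `describe_pipeline` stays untouched.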

Configs are particularly important in robotics because of the complexity and diversity of the systems we build. Subsystems like firmware, perception, motion control, and ML are developed independently but must interoperate reliably. Configs are the shared interface that lets them coordinate without hardcoding any assumptions.

Customers we work with often have thousands of config parameters per robot (one notably has 10,000 on a single system). This makes managing these configs for your fleet one of the most important infrastructure mandates as your fleet scales.

Failure Mode: Why YAML / JSON Don’t Scale

One of the most common failure modes in robotics config management is going schema-less.

At the scale of a few robots, using raw JSON or YAML is often the right call. Some particularly daring teams even use .txt files! At that point, we’re iterating quickly, constantly experimenting, and manually tuning parameters. The robot fleet and codebase are small enough that SWEs can remember which configs do what and why specific fields have been modified, and can manage the fleet’s configuration by hand. With some light version control, the fleet’s configuration usually stays in a manageable state.

But once the fleet starts scaling, the flaws of this approach become evident.

An unstructured config means that any fat-fingered value or typo can crash your application. If someone sets max_speed: -1 or misspells controller_mode, you won’t know until it’s deployed and running on your robot. By then, it’s too late.
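
To make that failure mode concrete, here is a small Python sketch of a schema-less loader. The `load_settings` function and its default values are assumptions invented for illustration; the point is only that the parser accepts both mistakes without complaint.

```python
import json


def load_settings(raw: str) -> tuple:
    """Schema-less loading: parse the text and read keys with fallbacks."""
    config = json.loads(raw)  # JSON has no notion of valid ranges or known keys
    max_speed = config.get("max_speed", 1.0)
    # A misspelled key is simply absent, so the default is used silently.
    controller_mode = config.get("controller_mode", "position")
    return max_speed, controller_mode


# Two mistakes: a negative speed and a misspelled "controller_mode" key.
raw = '{"max_speed": -1, "controler_mode": "velocity"}'
max_speed, controller_mode = load_settings(raw)
print(max_speed, controller_mode)
```

Nothing raises here. The negative speed passes straight through, and the intended `"velocity"` mode is dropped in favor of the fallback.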

I’ve met a team that burned $50k because they misconfigured one of their robot arms, and it crashed into a wall. These mistakes are expensive!

Config drift is also an issue. Each robot in the fleet will have a different permutation of configs, varying with hardware version, business logic (such as a customer’s safety settings), sensor calibrations, and so on. To account for this, we’d like to layer robot-specific values over a base configuration. But JSON has no native mechanism for this kind of inheritance, and YAML’s anchors and merge keys only work within a single file. In practice, that leaves us copying and pasting files every time we need to update a config, which is slow and brittle.

Finally, going schema-less means missing out on the rich tooling ecosystem that comes with schema languages. That tooling validates configs, generates structs and classes from your schema definitions, and provides other useful utilities for your CI/CD pipeline.
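
The layering described above is something teams often end up hand-rolling. A minimal sketch in Python, assuming configs are plain nested dictionaries (`merge_overrides` and the sample keys are hypothetical names for this example):

```python
from copy import deepcopy


def merge_overrides(base: dict, override: dict) -> dict:
    """Return a new config with override values layered over base, recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested blocks: merge them field by field.
            merged[key] = merge_overrides(merged[key], value)
        else:
            # Scalars, lists, and new keys replace whatever the base had.
            merged[key] = deepcopy(value)
    return merged


base_config = {
    "navigation": {"max_speed": 2.0, "mode": "indoor"},
    "camera": {"serial_number": "CAM-0001"},
}
robot_override = {
    "navigation": {"max_speed": 1.0},
    "camera": {"serial_number": "CAM-0042"},
}
robot_config = merge_overrides(base_config, robot_override)
print(robot_config)
```

This works, but it is custom logic with no validation of the result, which is part of what a schema language with built-in overrides would take off your hands.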

So What Is a Schema Language? Why Is It Helpful?

Let’s talk about how schemas save you from those aforementioned growing pains.

Schema languages define the structure, types, and rules your configuration data must follow. In practice, they describe things like:

What fields are required
What type each field should be
What values are allowed (enums, ranges, patterns)
Depending on the language, what defaults or logic apply

navigation: {
  max_speed:     float & >=0 & <=5      // meters per second
  min_speed:     float & >=0 & < max_speed
  safety_margin: float | *0.2           // default to 0.2 meters
  mode:          "indoor" | "outdoor"   // enum
  localization: {
    method:      "ekf" | "fastslam"
    enabled:     bool | *true
  }
}

In this example of a navigation module config, we can see how a schema is useful. It constrains speed values, sets defaults, uses enums for mode, and nests a localization block with its own rules.
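
For comparison, here is roughly what enforcing the same rules by hand could look like in Python. This is a sketch under assumptions: `validate_navigation` is a hypothetical helper, and unlike the schema it accepts integers as well as floats for the numeric fields.

```python
MODES = ("indoor", "outdoor")
METHODS = ("ekf", "fastslam")


def is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_navigation(nav: dict) -> dict:
    """Return the config with defaults applied, or raise ValueError listing every problem."""
    errors = []
    max_speed = nav.get("max_speed")
    min_speed = nav.get("min_speed")
    safety_margin = nav.get("safety_margin", 0.2)  # default from the schema
    localization = nav.get("localization", {})
    if not isinstance(localization, dict):
        errors.append("localization must be a block")
        localization = {}
    enabled = localization.get("enabled", True)  # default from the schema

    max_ok = is_number(max_speed) and 0 <= max_speed <= 5
    if not max_ok:
        errors.append("max_speed must be a number between 0 and 5")
    if not is_number(min_speed) or min_speed < 0:
        errors.append("min_speed must be a number >= 0")
    elif max_ok and min_speed >= max_speed:
        errors.append("min_speed must be less than max_speed")
    if not is_number(safety_margin):
        errors.append("safety_margin must be a number")
    if nav.get("mode") not in MODES:
        errors.append('mode must be "indoor" or "outdoor"')
    if localization.get("method") not in METHODS:
        errors.append('localization.method must be "ekf" or "fastslam"')
    if not isinstance(enabled, bool):
        errors.append("localization.enabled must be a boolean")

    if errors:
        raise ValueError("; ".join(errors))
    return {
        "max_speed": max_speed,
        "min_speed": min_speed,
        "safety_margin": safety_margin,
        "mode": nav["mode"],
        "localization": {"method": localization["method"], "enabled": enabled},
    }
```

Ten lines of schema turn into several dozen lines of imperative checks, and every new field means more code to write and keep in sync.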

This becomes important as we scale and start changing these values for each robot. No fat-fingered value makes it to production!

The most useful schema languages also solve scaling problems by making configs easier to write, validate, version, and generate code from. They can:

Catch errors early with static validation
Auto-generate boilerplate code or documentation
Compose and override configs across environments
Enable runtime introspection

What to Look For in a Schema Language

Not all schema languages were built for the same purpose. Some came from web APIs, others from serialization, and others were designed with configs in mind. Each comes with tradeoffs.

When picking one, these are the dimensions we see robotics teams care about most:

Core Dimensions:

Validation Model:

When does the schema catch errors? Runtime? Compile time?
Does it support required fields, type checks, value constraints, etc.?

Code Generation:

Can you generate code from your schema, such as structs or classes, in the languages robotics teams care about (C++, Python, or Rust)?
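
To give a feel for what code generation means, here is a toy Python sketch that turns a field-to-type mapping into dataclass source code. The `SCHEMA` mapping and `generate_dataclass` function are hypothetical; real schema toolchains ship their own generators with far richer type handling.

```python
# Hypothetical, heavily simplified "schema": field name -> Python type name.
SCHEMA = {"max_speed": "float", "mode": "str", "enabled": "bool"}


def generate_dataclass(name: str, fields: dict) -> str:
    """Emit Python source for a typed config class with one attribute per field."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {name}:",
    ]
    lines += [f"    {field}: {type_name}" for field, type_name in fields.items()]
    return "\n".join(lines) + "\n"


source = generate_dataclass("NavigationConfig", SCHEMA)
print(source)
```

The generated source would normally be written to a file and checked in or built as part of CI, so application code reads typed attributes instead of indexing into raw dictionaries.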

Composability / Overrides:

Can you override or extend configs cleanly across environments?
How easy are these overrides to create? What kind of logic do we write?
How powerful are these overrides? What level of granularity can we unlock?

Templating / Logic / Computation:

Can you express conditional logic, like if statements?
Can you use variables, functions, or computed values in your config?
Is the language Turing-complete? If so, what are the potential side effects?

Self-Documentation / Readability:

Does the schema support built-in descriptions and field-level metadata?
Can a new engineer understand the config by reading it? Is it easy to edit and debug?
Can it double as internal documentation?
Will your team hate you for choosing it?
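
As a sketch of what field-level metadata doubling as documentation can look like, the Python example below attaches descriptions and units to dataclass fields and renders them as text. The `doc` and `unit` metadata keys and the `render_docs` helper are conventions assumed for this example, not a standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class NavigationConfig:
    max_speed: float = field(
        metadata={"doc": "Upper bound on commanded speed", "unit": "m/s"}
    )
    safety_margin: float = field(
        default=0.2,
        metadata={"doc": "Minimum clearance from obstacles", "unit": "m"},
    )
    mode: str = field(default="indoor", metadata={"doc": "Operating environment"})


def render_docs(config_cls) -> str:
    """Build one documentation line per field from its type and metadata."""
    lines = []
    for f in fields(config_cls):
        type_name = getattr(f.type, "__name__", str(f.type))
        unit = f.metadata.get("unit")
        suffix = f" [{unit}]" if unit else ""
        doc = f.metadata.get("doc", "undocumented")
        lines.append(f"{f.name} ({type_name}){suffix}: {doc}")
    return "\n".join(lines)


print(render_docs(NavigationConfig))
```

Because the descriptions live next to the field definitions, the rendered docs cannot drift from the structure the code actually uses.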

Tooling Ecosystem:

How production-grade and well-maintained are its tools?
Is there an official validation library that works well?
Can you integrate schema validation into your CI/CD pipeline?

Alongside these core features, here are some nice-to-haves worth mentioning:

Bonus Dimensions:

Change Safety / Drift Detection:

Can you detect when a config change happens and perform taint tracking?
Can you diff between config versions?
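
Diffing config versions can be approximated even without dedicated tooling. The Python sketch below flattens nested configs into dotted paths and reports what changed; `flatten`, `diff_configs`, and the `"<missing>"` placeholder are hypothetical names chosen for this example.

```python
MISSING = "<missing>"  # placeholder for a field absent from one version


def flatten(config: dict, prefix: str = "") -> dict:
    """Turn nested blocks into a flat mapping of dotted paths to values."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(old: dict, new: dict) -> dict:
    """Map each changed path to a (before, after) pair."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes = {}
    for path in sorted(old_flat.keys() | new_flat.keys()):
        before = old_flat.get(path, MISSING)
        after = new_flat.get(path, MISSING)
        if before != after:
            changes[path] = (before, after)
    return changes


old_version = {"navigation": {"max_speed": 2.0, "mode": "indoor"}}
new_version = {"navigation": {"max_speed": 1.5, "mode": "indoor", "safety_margin": 0.3}}
print(diff_configs(old_version, new_version))
```

A field-level diff like this is easier to review than a raw text diff, since reordered keys or reformatted files do not show up as changes.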

Declarative vs Imperative Semantics:

Is the language declarative (describe what the config should be) or imperative (describe how to compute it)?
How much complexity is acceptable for your config system?

Schema vs Instance:

Does the schema live in a separate file from the config instance?
Or is it all blended into a single file (logic, constraints, and data)?

Summary:

At a small scale, YAML and JSON are acceptable solutions for robotics configs. But as the fleet grows, going schema-less becomes a problem that’s too painful to ignore.

Schema languages give your configuration much-needed structure. They define what’s required, what values are valid, and how configs can vary across the fleet. This allows for config validation, overrides, and codegen of typed config objects.

Instead of relying on manual, brittle processes, schemas and their tooling help teams scale their config management solution alongside their fleet.

Now that we know what configs are, why unstructured formats don’t scale, and what schema languages can offer, the next step is choosing the right one. In the next post, we’ll dive into JSON Schema, why it’s widely adopted, what it gets right, and where it falls short.
