I worked at Wabtec from July 2024 to December 2025 on controls and embedded edge systems for off-highway industrial equipment. The public details have to stay careful. The lessons do not. Fielded industrial systems are not impressed by elegant models. They care about degraded operation, validation discipline, conservative interfaces, diagnostic evidence, and whether another engineer can explain what happened after the machine has already made the expensive noise.
Context
I joined Wabtec in Erie, Pennsylvania in July 2024 as a controls engineer. The work spanned off-highway vehicle drive systems, liquid-cooling controls, diagnostics, high-voltage safety, electrical validation, MATLAB and Simulink models, test plans, and correlation between lab behavior and simulated duty cycles. I left on December 26, 2025 to move back to India and build JouleBridge full-time.
Wabtec is not a startup pretending every problem began last Tuesday. It is a heavy industrial company with real customers, real equipment, real installed base, and real consequences when a system behaves badly. Public reporting for 2024 put Wabtec at $10.39 billion in annual sales. That scale changes the engineering culture. A clever idea is not enough. The idea has to survive process, test, supplier reality, service reality, customer reality, and the fact that industrial equipment can outlive the fashionable software stack around it.
Quality comes not from inspection, but from the improvement of the production process.
Deming's sentence is the industrial version of the controls lesson. Inspection matters, but the system improves when the process makes bad states harder to produce and easier to detect. Scale is not a spreadsheet event. Scale is a validation burden.
What industrial rail actually is
Most people hear rail and picture a passenger train. Industrial rail is wider: freight locomotives, mining and off-highway equipment, transit systems, brakes, propulsion, signaling, yard automation, remote monitoring, diagnostics, thermal systems, high-voltage subsystems, and customer-specific duty cycles that make clean abstractions sweat.
The software in this world does not live above hardware. It lives inside it, beside it, and sometimes underneath a protective cover that a field technician removes while everyone hopes the system was documented by an adult. A control loop is not only math. It is math connected to sensors, actuators, wiring, power electronics, thermal mass, communication buses, test benches, calibration procedures, and the person who has to diagnose the issue after a shift.
That is the first thing Wabtec changed in me. It made the word "software" feel too small.
[Figure: Wabtec public operating scale]
Three lessons that stayed
Three lessons stayed with me. The first is that the field is the source of truth: when the model and the machine disagree, the machine gets the last word, and the work becomes a disciplined investigation of why they diverged. The second is that degraded operation is the default, not the exception. Sensors drift, harnesses age, operators override, and a duty cycle that looked reasonable in simulation turns rude in the lab. The third is that the audit trail is the product. When money, safety, warranty, or customer trust is involved, the question after a fault is not what we think happened. It is what we can prove.
The control loop that survives the field
A model is only one part of the system. Diagnostics and evidence are product surfaces.
- Read the machine, including drift and missing values.
- Apply state, limits, mode, and timing.
- Move only inside the approved envelope.
- Record the fault path, not only the success path.
- Leave enough state for the next engineer.
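The steps above can be sketched as one tick of a loop. This is a minimal Python illustration, not any real Wabtec logic; the signal names, the degraded-mode cap, and the pump limits are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Reading:
    value: Optional[float]   # None when the sensor is unavailable
    stale: bool              # True when the value is older than its clock allows

@dataclass
class LoopState:
    mode: str = "RUN"
    last_command: float = 0.0
    events: list = field(default_factory=list)  # evidence for the next engineer

PUMP_MIN, PUMP_MAX = 0.0, 1.0  # the approved command envelope

def control_step(temp: Reading, proposed: float, state: LoopState) -> float:
    # 1. Read the machine, including drift and missing values.
    if temp.value is None or temp.stale:
        state.events.append(("FAULT", "coolant_temp unavailable", state.mode))
        state.mode = "DEGRADED"
        return state.last_command  # hold the last safe output, do not guess

    # 2. Apply state, limits, mode, and timing.
    if state.mode == "DEGRADED":
        proposed = min(proposed, 0.5)  # conservative cap while degraded

    # 3. Move only inside the approved envelope.
    command = max(PUMP_MIN, min(PUMP_MAX, proposed))

    # 4. Record the fault path, not only the success path.
    if command != proposed:
        state.events.append(("CLAMPED", proposed, command))

    # 5. Leave enough state for the next engineer.
    state.last_command = command
    return command
```

Note that the missing-sensor branch does not try to be clever: it holds the last safe output, degrades the mode, and writes the event down.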
What controls means here
Controls engineering is the place where clean equations meet equipment with opinions. Feedback loops, state machines, fault logic, thermal response, calibration, and diagnostics all matter. But none of them live alone.
A cooling loop, for example, is not only a control algorithm. It is a physical loop with inertia, delay, sensor placement, actuator behavior, ambient conditions, software timing, and test coverage. A drive system is not only torque logic. It is command arbitration, fault handling, mode management, safety interlocks, and the awkward business of making sure the machine behaves under conditions that did not ask politely before arriving.
This is why I get impatient with "we use AI" pitches that skip the control boundary. A model that proposes an action is not a control system. A control system has state, limits, interlocks, diagnostics, review paths, and consequences. The model can be an input. It is not the authority.
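The boundary can be stated in a few lines: the model proposes, deterministic rules decide, and a rejection carries its reason. A minimal sketch, with invented mode names and limits:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"

@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    command: float
    reason: str  # rejections carry evidence too

def arbitrate(model_proposal: float, mode: str, interlock_open: bool,
              lo: float, hi: float) -> Decision:
    """The model is an input. The control boundary is the authority."""
    if interlock_open:
        return Decision(Verdict.REJECTED, 0.0, "safety interlock open")
    if mode != "RUN":
        return Decision(Verdict.REJECTED, 0.0, f"mode {mode} does not accept commands")
    if not (lo <= model_proposal <= hi):
        return Decision(Verdict.REJECTED, 0.0,
                        f"proposal {model_proposal} outside [{lo}, {hi}]")
    return Decision(Verdict.ACCEPTED, model_proposal, "within envelope")
```

The important property is that nothing the model emits can bypass the interlock, mode, or range checks, and every refusal is a named reason rather than a silent drop.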
Controls work also teaches respect for time. A ten-millisecond loop, a sensor filter, a thermal response, and a human service procedure all live on different clocks. The design has to say which clock owns which decision. If timing is vague, the system will eventually make two correct local choices that compose into one bad global behavior.
That is where validation becomes more than a final test. The useful question is not only whether the controller reached the expected output. It is whether the team can explain the path from input to state to command to observed machine behavior. A plot is useful. A procedure is useful. A fault log is useful. The value comes from the fact that another person can replay the reasoning without trusting the original engineer's memory.
What edge systems means here
An edge system is the compute and control surface near the equipment. It is close enough to read real state, enforce local rules, and keep operating when the cloud is absent or irrelevant. In heavy industrial systems, edge design is not a branding category. It is the only sane place for many decisions to happen.
The tension is between determinism and integration. The local controller has to behave predictably. The larger product wants telemetry, configuration, service workflows, remote diagnostics, and sometimes fleet-level coordination. Too much isolation and the equipment becomes opaque. Too much integration and the equipment becomes a distributed systems accident with wheels.
The same tension shows up wherever software starts to govern equipment it cannot afford to misunderstand. Locality is not a deployment style. It is where the evidence has to live.
[Figure: Industrial edge design pressure]
The algorithm was never the point
I thought algorithms were the main event. Wabtec made that assumption smaller.
The algorithm matters, but the production envelope around it often matters more: validation, calibration, diagnostic coverage, fault isolation, serviceability, EMC behavior, test reports, and the exact wording of a procedure another person has to run. ISO 13766-style electromagnetic compatibility requirements are a good reminder that machines do not operate in a clean software universe. They operate around noise, cables, power electronics, and hardware that makes software people suddenly rediscover physics.
I also became less romantic about rewrites. Industrial systems contain old decisions because old decisions survived. Some are ugly. Some are scars. Some are genuinely wise. The reflex to throw everything away and start fresh is often the reflex of someone who has not yet paid the cost of missing a hidden behavior.
It is harder to read code than to write it.
That line explains a lot of industrial conservatism. Reading an old system is hard because it contains decisions, defects, repairs, customer cases, timing assumptions, field fixes, and institutional memory. Rewriting it is tempting because the new version has not yet been humiliated by reality.
A controls model can look clean in isolation. Then it meets scheduling, sensor quality, actuator limits, communication timing, calibration drift, firmware versions, lab equipment, operator procedures, and test coverage. The model is still important, but it is no longer the whole thing.
That lesson now shapes how I judge every engineering product. If a system cannot describe its operating envelope, it is immature. If it cannot explain failure, it is immature. If it cannot be tested by someone other than the author, it is immature. If the only proof is a demo video, it is marketing.
Validation is not bureaucracy
The worst way to read industrial process is to treat it as paperwork invented by people who hate velocity. Some of it is slow. Some of it is annoying. Some of it is there because the system has already learned what happens when nobody writes down the boundary.
Validation is not bureaucracy when the machine is expensive, safety-relevant, or customer-facing. It is the part of engineering that makes a claim transferable. A test plan says what was checked. A report says what happened. A diagnostic says what the system saw. A calibration procedure says how to reproduce the state. These documents are not separate from the system. They are how the system travels across people.
That point changed how I think about founder speed. Moving fast is useful only if the artifacts survive outside the founder's head. If the only person who understands the system is the person sprinting, the company is not fast. It is fragile.
Diagnostics are product design
Before Wabtec, I thought of diagnostics as something that came after the real system. The control algorithm does the work, then diagnostics help when something goes wrong. That is the amateur ordering.
In industrial systems, diagnostics are part of the product surface. A fault code, threshold, event log, or test report is not only for engineering. It is for service, warranty, customer trust, field triage, and the next design review. A machine that fails silently is not merely inconvenient. It forces every later person into detective work with incomplete evidence.
Good diagnostics have to be designed with the same seriousness as control behavior. What exactly did the system observe? Was the value outside a limit or was the sensor unavailable? Was the command rejected because of a safety interlock, a mode mismatch, a range violation, or missing state? Did the fault clear itself? Did it latch? Did the operator override it? Did the system keep enough history to reconstruct the sequence?
Those questions are why rejected commands matter. A system that only records successful actions is flattering itself. The failed attempts, denied proposals, and weird edge cases are where the design becomes inspectable.
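Those questions translate directly into the shape of a fault record. A sketch of the distinctions the record has to carry; the field and enum names are placeholders, not any real schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class FaultKind(Enum):
    OUT_OF_LIMIT = "out_of_limit"               # a value was seen, outside a limit
    SENSOR_UNAVAILABLE = "sensor_unavailable"   # no value was seen at all
    INTERLOCK = "interlock"                     # command refused by a safety interlock
    MODE_MISMATCH = "mode_mismatch"             # command refused by the active mode
    RANGE_VIOLATION = "range_violation"         # command outside its allowed range

@dataclass(frozen=True)
class FaultRecord:
    timestamp_ms: int
    kind: FaultKind
    signal: str
    observed: Optional[float]   # None means nothing was observed, never a guess
    limit: Optional[float]      # the boundary that was crossed, if any
    latched: bool               # did the fault latch, or clear itself?
    operator_override: bool     # did a human push past it?
```

The point of the structure is that "out of limit" and "sensor unavailable" are different facts, and a record that collapses them forces the next person into guesswork.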
The diagnostic surface also changes behavior before a fault occurs. When engineers know that a bad state will be visible later, they design the boundary more carefully. When a technician can see why a command was refused, the refusal becomes part of trust rather than a mysterious blockage. When a customer can see which state the machine believed, the conversation moves from accusation to evidence.
The same lesson applies to websites and pitch decks, oddly enough. If a founder says "our system is reliable" but cannot show the diagnostic path, the sentence is empty. Reliability is not an adjective. It is a trail of tests, limits, failures, recovery behavior, and evidence.
Customer constraints change the shape of engineering
Industrial customers do not buy the same way early software adopters buy. They do not care that the repo is elegant if the machine cannot be serviced. They do not care that the demo worked if the validation boundary is unclear. They do not care that the model is smart if a technician cannot diagnose it at 2 AM.
This changes the product shape. It pushes engineering toward conservative interfaces, explicit state, boring recovery paths, readable logs, clear versioning, and documentation that assumes the original author is unavailable. That assumption is healthy. Every serious system should be understandable after its author leaves the room.
It also changes how you think about "innovation." The startup version often treats innovation as novelty. Industrial customers usually care more about risk reduction. A new control strategy, gateway, diagnostic layer, or dispatch system is only valuable if it reduces some real operating pain without creating a larger unknown downstream.
This is why I do not want JouleBridge to sell itself as magic. The buyer does not need magic. The buyer needs a way to prove meter reads, charger commands, policy decisions, and billing-period evidence when the normal systems disagree. That is less cinematic than "AI for energy." It is also more useful.
What did not transfer
Not every Wabtec lesson maps cleanly to startups. Large industrial companies can afford process that would suffocate an early company. A startup cannot run every decision through a mature enterprise validation stack before it has a customer. A founder who copies big-company process without understanding why it exists will build a museum, not a company.
The useful transfer is selective. Keep the respect for field evidence. Keep the discipline around diagnostics. Keep the habit of naming limits. Keep the bias toward artifacts another person can inspect. Drop the ceremony that exists only because a large organization needs coordination overhead to survive its own scale.
That distinction matters for JouleBridge. The first pilot should not look like a full utility certification program. It should look like a narrow, measurable field trial with strong evidence: site boundary, adapter path, signed records, known failure modes, verifier output, and a human operator who can say whether the artifact reduced pain. Small scope, serious evidence.
This is the startup version of the industrial lesson. Move fast, but make the claim checkable.
The part I still keep from the industrial environment is the suspicion of unsupported certainty. A clean dashboard can be useful, but it is never the source of truth by itself. A green status light can be useful, but it should have a path back to the conditions that made it green. A model output can be useful, but it needs the surrounding state that makes the output safe to use.
That habit is slower than pure demo culture. It also saves time later. The first time a fault appears, the team either has a record or starts archaeology. The first time a customer asks what happened, the company either has an answer or starts telling a story. Wabtec made that difference feel less philosophical and more operational.
The best version of speed keeps that record small and real. It names the boundary, runs the narrow test, records the result, and lets the next engineer inspect it without a meeting.
Limits of this paper
I am not publishing confidential Wabtec design details. No internal architectures, customer data, schematics, test reports, control logic, or supplier specifics appear here. That is not a coy omission. It is the boundary.
The useful public artifact is pattern transfer. Wabtec gave me a fielded-systems education at industrial scale. It taught me to distrust unverified data, respect conservative interfaces, treat degraded operation as normal, and make evidence a first-class product surface. Those lessons now show up in the systems I build.
Closing
The lesson from Wabtec is not that heavy industry is slow and startups are fast. That is the childish version.
The lesson is that fielded systems make a deal with reality. Reality gets to be messy. The system still has to behave, the evidence still has to exist, and the next engineer still has to understand what happened. That is the bar I am trying to keep when I build edge software now.
Sources
- Wabtec annual reports
- Wabtec 2024 results release
- Wabtec InnoTrans 2024 public product context
- Joel Spolsky, The Joel Test
- Joel Spolsky, Things You Should Never Do, Part I
- ISO 13766-1 electromagnetic compatibility standard page
- W. Edwards Deming Institute quote source index