Embedded Systems Engineering Interview Questions

32 questions. Each one covers what the interviewer is really probing for and how to structure a strong answer.

  1. What is the difference between a microcontroller and a microprocessor, and when would you choose one over the other for a product?

    Foundational

    What they’re really asking

    They want to confirm you understand the basic system-on-chip versus discrete CPU tradeoff and can reason about it at a product level, not just recite a definition.

    Strong answer structure

    Microcontroller integrates CPU, RAM, flash, and peripherals (timers, ADC, UART) on one die; microprocessor is just the CPU and needs external memory and peripherals. Choose an MCU for cost-sensitive, low-power, deterministic control tasks (motor control, sensors). Choose an MPU when you need an OS, high compute, lots of RAM, or a rich UI (Linux gateway, camera). Mention BOM cost, board complexity, boot time, and power as decision drivers.

    Likely follow-ups

    • How does boot time differ between an MCU running bare metal and an MPU booting Linux?
    • Where does an application-class MCU with an MMU fit in this picture?
  2. Explain what the volatile keyword does in C and give a concrete embedded scenario where omitting it causes a bug.

    Foundational

    What they’re really asking

    They want to know you understand how the compiler can optimize away memory accesses and why hardware registers and ISR-shared variables must defeat that.

    Strong answer structure

    volatile tells the compiler the value can change outside the normal program flow, so every read and write in the source must actually be performed: the value is re-read from memory on each access instead of being cached in a register, and volatile accesses are neither eliminated nor reordered relative to one another. Scenario: polling a status register bit in a while loop, or a flag set inside an ISR and read in main; without volatile the compiler may hoist the read out of the loop, so you spin forever or never see the flag. Note volatile does NOT provide atomicity or memory ordering across cores.
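    A minimal sketch of the ISR-flag case, assuming a byte-oriented UART whose receive interrupt is wired up elsewhere; uart_rx_isr and wait_for_byte are hypothetical names, and on a host build the "ISR" is just a function you call. Removing volatile from rx_ready is exactly what lets an optimizing compiler load the flag once and turn the wait loop into an infinite loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the ISR and the main loop. volatile forces a fresh load of
 * rx_ready on every loop iteration instead of one load hoisted out of it. */
static volatile bool rx_ready = false;
static volatile uint8_t rx_byte;

/* Hypothetical UART RX handler: on hardware it would read the data register
 * and be installed in the vector table. */
void uart_rx_isr(uint8_t byte_from_hw)
{
    rx_byte = byte_from_hw;
    rx_ready = true;
}

/* Main-loop side: busy-waits until the ISR has delivered a byte. */
uint8_t wait_for_byte(void)
{
    while (!rx_ready) {
        /* without volatile this may compile to "if (!rx_ready) for (;;);" */
    }
    uint8_t b = rx_byte;
    rx_ready = false;
    return b;
}
```

    This is a single-slot handoff kept small for illustration; a real driver would feed a ring buffer so back-to-back bytes are not lost.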

    Likely follow-ups

    • Does volatile make a multi-byte variable safe to share between an ISR and main? Why not?
    • When would you use volatile and a memory barrier together?
  3. Walk me through what happens, step by step, from a hardware interrupt firing to your ISR code executing on an ARM Cortex-M.

    Intermediate

    What they’re really asking

    They want to see whether you actually understand the interrupt mechanism end to end: the NVIC, vector table, context save, and priority handling.

    Strong answer structure

    Peripheral asserts an interrupt line to the NVIC. NVIC checks the interrupt is enabled and its priority beats the current execution priority (and BASEPRI/PRIMASK masks). Cortex-M automatically stacks R0-R3, R12, LR, PC, xPSR onto the current stack. It loads the handler address from the vector table at the exception number's slot, sets LR to an EXC_RETURN magic value, and branches. Your ISR runs, ideally short: clear the interrupt flag, do minimal work, signal a task. On return, the magic LR triggers automatic unstacking and resumes the interrupted code. Mention tail-chaining and late-arrival optimizations.

    Likely follow-ups

    • Why might the ISR keep firing immediately after you return from it?
    • What is EXC_RETURN and how does the core know which stack to unstack from?
  4. How do you safely share a multi-byte variable, like a 32-bit counter, between an ISR and your main loop on an 8-bit or 16-bit MCU?

    Intermediate

    What they’re really asking

    They are probing your understanding of atomicity and tearing on small architectures, and whether you reach for the right mechanism without over-engineering.

    Strong answer structure

    On an 8/16-bit core a 32-bit read or write takes several instructions, so an interrupt mid-access produces a torn value. Options: briefly disable interrupts (critical section) around the main-loop access; for a monotonic counter, read it twice and retry until the two reads match; or use a hardware atomic if available. Keep the critical section as short as possible to bound interrupt latency. Mark the variable volatile so the compiler does not cache it. On Cortex-M, LDREX/STREX (ARMv7-M and later; Cortex-M0/M0+ lack them) or masking via PRIMASK are alternatives.
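    A sketch of the double-read option for a counter that only the ISR increments; timer_isr and tick_count_read are illustrative names. With avr-libc, the critical-section alternative would instead wrap a single read in ATOMIC_BLOCK(ATOMIC_RESTORESTATE) from <util/atomic.h>.

```c
#include <stdint.h>

/* Incremented only by the timer ISR. On an 8/16-bit core both the
 * increment and a plain 32-bit read take several instructions. */
static volatile uint32_t tick_count;

void timer_isr(void)
{
    tick_count++;
}

/* Read twice and retry until both reads agree. A read torn or overtaken by
 * the ISR will not match the other one; for a counter that only moves
 * forward, two equal reads form a consistent snapshot. */
uint32_t tick_count_read(void)
{
    uint32_t first;
    uint32_t second;
    do {
        first = tick_count;
        second = tick_count;
    } while (first != second);
    return second;
}
```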

    Likely follow-ups

    • What is the downside of disabling interrupts for too long?
    • How does this differ on a 32-bit Cortex-M where the access is naturally aligned?
  5. Compare SPI, I2C, and UART. Given a design, how do you pick between them?

    Foundational

    What they’re really asking

    They want a clear grasp of the three workhorse serial buses and the practical tradeoffs that drive a real selection.

    Strong answer structure

    UART: asynchronous, no clock, point-to-point, two signal wires (TX/RX) plus ground, needs matched baud rate, good for logs/modems. I2C: synchronous, two wires (SDA/SCL), multi-drop with 7/10-bit addresses, open-drain with pull-ups, slower (100k/400k/1M), great for many low-speed sensors sharing a bus. SPI: synchronous, full-duplex, three shared lines (SCLK, MOSI, MISO) plus one chip-select per device, fast (tens of MHz), no addressing so wiring grows with devices. Pick by speed, pin count, number of devices, duplex needs, and whether you need addressing versus raw throughput.

    Likely follow-ups

    • Why does I2C need pull-up resistors and how do you size them?
    • How does SPI handle multiple devices on the same bus?
  6. Describe how I2C clock stretching works and a situation where it causes a hard-to-find bug.

    Advanced

    What they’re really asking

    They want deep I2C protocol knowledge and awareness of real-world interop failures, signaling senior-level field experience.

    Strong answer structure

    A slave that needs more time holds SCL low after the master drives it low; when the master releases SCL the line stays low, and the master must wait until the slave releases it. Bug: some master peripherals (or bit-banged masters) do not support clock stretching, or implement it incorrectly, so a slow slave's stretch is misread as a stuck bus or a timeout, causing corrupted transactions. Also a slave can get stuck holding SDA low after a glitch or a reset mid-transfer, hanging the bus. Mitigations: check the master's stretching support, add a bus-recovery routine that clocks SCL up to nine times until the slave releases SDA and then issues a STOP, use timeouts, and verify with a scope/logic analyzer.

    Likely follow-ups

    • How do you recover a bus where a slave is holding SDA low?
    • How would you detect on a scope that clock stretching is happening?
  7. What is switch bounce, and walk me through both a hardware and a software approach to debouncing.

    Foundational

    What they’re really asking

    They want to confirm you understand a fundamental real-world input problem and can solve it pragmatically in both domains.

    Strong answer structure

    Mechanical contacts make and break several times over a few milliseconds, producing spurious edges. Hardware: RC filter plus Schmitt trigger, or an SR latch with a SPDT switch. Software: sample the pin periodically (e.g., every 5-10 ms) and only register a state change after N consecutive stable samples, or take an edge then ignore further edges for a debounce window. Mention that ISR-on-every-edge is the naive trap; prefer a timer-based sampling approach or debounce inside the ISR plus a lockout.
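    A sketch of the sample-and-count approach, assuming debounce_update is called from a periodic timer tick (say every 5 ms) with the raw pin level; the names and the 4-sample threshold are illustrative. Each button needs only one small struct, so many buttons can share a single timer by looping over an array of them.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_SAMPLES 4u   /* e.g. 4 samples x 5 ms tick = 20 ms of stability */

typedef struct {
    bool    stable;   /* last accepted (debounced) level */
    uint8_t count;    /* consecutive raw samples that disagree with stable */
} debounce_t;

/* Returns true exactly once per accepted state change. */
bool debounce_update(debounce_t *d, bool raw)
{
    if (raw == d->stable) {
        d->count = 0;            /* contact bounced back: restart the window */
        return false;
    }
    if (++d->count >= DEBOUNCE_SAMPLES) {
        d->stable = raw;         /* new level held long enough: accept it */
        d->count = 0;
        return true;
    }
    return false;
}
```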

    Likely follow-ups

    • Why is triggering an interrupt on every edge a poor debouncing strategy?
    • How would you debounce 16 buttons efficiently without 16 timers?
  8. Explain memory-mapped I/O. How would you write a driver to set a specific bit in a 32-bit peripheral control register in C?

    Intermediate

    What they’re really asking

    They want to see you comfortable with the pointer-to-hardware idiom, volatile, and read-modify-write semantics.

    Strong answer structure

    Peripheral registers live at fixed physical addresses in the same address space as memory. Access via a volatile pointer: define the register address, cast to volatile uint32_t*, and do read-modify-write: *REG |= (1u << BIT) to set, *REG &= ~(1u << BIT) to clear. Stress volatile so reads/writes are not optimized away. Note read-modify-write is non-atomic, so an ISR touching the same register can corrupt it; use bit-band or set/clear registers (BSRR on STM32) where available. Prefer a struct overlay matching the register map for clean drivers.
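    A sketch of set/clear helpers plus a struct overlay; PERIPH_BASE and the CR/SR/DR layout are made-up placeholders, not any real part's register map. The cast to uint32_t before shifting matters on 8/16-bit targets, where a plain 1u is only 16 bits wide and 1u << 20 would be undefined.

```c
#include <stdint.h>

/* Hypothetical register map for an imaginary peripheral. */
typedef struct {
    volatile uint32_t CR;   /* 0x00: control */
    volatile uint32_t SR;   /* 0x04: status  */
    volatile uint32_t DR;   /* 0x08: data    */
} periph_regs_t;

/* On target this would point at the base address from the reference manual:
 *   #define PERIPH ((periph_regs_t *)PERIPH_BASE)   (placeholder address) */

static inline void reg_set_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg |= (uint32_t)1u << bit;        /* read-modify-write: not atomic */
}

static inline void reg_clear_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg &= ~((uint32_t)1u << bit);
}
```

    On hardware the call would be reg_set_bit(&PERIPH->CR, 3); for a host test, any periph_regs_t instance in RAM stands in for the peripheral.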

    Likely follow-ups

    • Why are dedicated set and clear registers (like STM32 BSRR) safer than read-modify-write?
    • How does a packed struct overlay of the register map help, and what alignment risks exist?
  9. What is the difference between a preemptive RTOS scheduler and a cooperative scheduler, and what determines task priority assignment?

    Intermediate

    What they’re really asking

    They want to gauge your RTOS fundamentals and whether you understand the responsiveness-versus-complexity tradeoff.

    Strong answer structure

    Preemptive: the scheduler can interrupt a running task when a higher-priority task becomes ready, giving bounded response latency but requiring careful synchronization for shared data. Cooperative: tasks run until they yield, which is simpler and avoids mid-task preemption races, but a misbehaving task starves others. For hard real-time, priorities are typically assigned rate-monotonically (shorter period gets higher priority) or deadline-monotonically (shorter relative deadline gets higher priority), balanced against avoiding starvation and priority inversion. Mention deadline-driven reasoning and keeping high-priority tasks short.
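    A sketch of the classic Liu and Layland utilization test for rate-monotonic priorities, assuming independent periodic tasks with deadline equal to period and no blocking; the struct and task values are illustrative. The bound n(2^(1/n) - 1) is tabulated (rounded down) to avoid pulling in libm.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    double wcet;     /* worst-case execution time C_i */
    double period;   /* period T_i; deadline assumed equal to period */
} rm_task_t;

/* n(2^(1/n) - 1), rounded down, for n = 0..8; beyond that, ln 2 is used,
 * which is below every finite-n bound and therefore still safe. */
static double rm_bound(size_t n)
{
    static const double bound[] = {
        1.0, 1.0, 0.828427, 0.779763, 0.756828,
        0.743491, 0.734772, 0.728626, 0.724061
    };
    return n < sizeof bound / sizeof bound[0] ? bound[n] : 0.693147;
}

/* Sufficient, not necessary: true means schedulable under rate-monotonic
 * priorities; false only means this quick test is inconclusive. */
bool rm_utilization_ok(const rm_task_t *tasks, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        u += tasks[i].wcet / tasks[i].period;
    }
    return u <= rm_bound(n);
}
```

    For example, tasks with (C, T) of (1, 4), (1, 5), (2, 10) give U = 0.65 and pass. The pair (2, 4), (2, 5) gives U = 0.9 and fails the bound, yet exact response-time analysis shows the lower-priority task still finishes at 4, within its period of 5, so a failing result calls for deeper analysis rather than rejection.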

    Likely follow-ups

    • What is rate-monotonic scheduling and what assumptions does it rely on?
    • How does an RTOS tick interact with the scheduler?
  10. Explain priority inversion and how a priority inheritance mutex solves it.

    Advanced

    What they’re really asking

    They want to see if you know the classic RTOS failure mode (famously the Mars Pathfinder bug) and its standard mitigation.

    Strong answer structure

    A high-priority task blocks on a mutex held by a low-priority task; a medium-priority task then preempts the low-priority holder, so the high-priority task is indirectly blocked by a medium task that should not outrank it. Priority inheritance: while the low-priority task holds a mutex a higher-priority task wants, it temporarily inherits that higher priority so it runs, releases quickly, and is then restored. Mention the Mars Pathfinder watchdog resets, and alternatives like the priority ceiling protocol. Note that inheritance relies on the mutex having an owner whose priority can be raised; a plain binary semaphore has no owner, so there is no task to boost.

    Likely follow-ups

    • Why does a binary semaphore not protect against priority inversion the way a mutex does?
    • What is the priority ceiling protocol and how does it differ?
  11. How do you communicate between an ISR and a task safely in an RTOS? What can and cannot be called from interrupt context?

    Intermediate

    What they’re really asking

    They want to confirm you understand deferred processing, FromISR API variants, and the constraints of interrupt context.

    Strong answer structure

    Keep ISRs short: do the minimum, then signal a task to do the heavy work (deferred interrupt handling / bottom half). Use ISR-safe APIs like FreeRTOS xQueueSendFromISR or a task notification, and honor the higher-priority-task-woken flag by requesting a context switch on exit (portYIELD_FROM_ISR). You cannot block in an ISR or call APIs that may sleep, and you should generally avoid malloc, printf, and floating point unless the context saves FPU state. Any ISR that calls RTOS APIs must run at a logical priority at or below the RTOS max-syscall threshold (configMAX_SYSCALL_INTERRUPT_PRIORITY in FreeRTOS).

    Likely follow-ups

    • What is the FreeRTOS configMAX_SYSCALL_INTERRUPT_PRIORITY and why does it matter?
    • Why might calling printf in an ISR be dangerous?
  12. Walk me through the memory layout of a typical bare-metal firmware: where do .text, .data, .bss, the stack, and the heap live, and what does the startup code do before main?

    Intermediate

    What they’re really asking

    They want to verify you understand the link map, what the linker script controls, and what runtime initialization actually happens.

    Strong answer structure

    .text (code) and .rodata in flash; .data (initialized globals) has its initial values stored in flash but lives in RAM; .bss (zero-initialized globals) in RAM. Startup: on Cortex-M the core itself loads the initial stack pointer from the first vector-table entry (on other cores the startup assembly sets it), then the reset handler performs early system setup such as clock configuration (for example, CMSIS SystemInit), copies .data from flash to RAM, zeros .bss, runs libc init / constructors, and calls main. The stack typically grows down from the top of RAM while the heap grows up; a collision is the classic stack overflow into heap. The linker script defines the memory regions and section placement.
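    A sketch of the two C-runtime steps of the reset handler, written as a function taking section bounds so it can run on a host. On a target those bounds come from linker-script symbols whose names vary by toolchain and script (_sidata, _sdata, _edata, _sbss, and _ebss are common in STM32-generated scripts); treat them as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy .data from its load address in flash to RAM, then zero .bss.
 * Nothing may rely on initialized or zeroed globals before this runs. */
void crt_init_sections(const uint32_t *data_load,   /* .data image in flash */
                       uint32_t *data_start, uint32_t *data_end,
                       uint32_t *bss_start, uint32_t *bss_end)
{
    while (data_start < data_end) {
        *data_start++ = *data_load++;
    }
    while (bss_start < bss_end) {
        *bss_start++ = 0u;
    }
}
```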

    Likely follow-ups

    • How would you detect a stack overflow at runtime on a Cortex-M?
    • Why are initialized global values stored in flash but copied to RAM?
  13. What are the main differences between using DMA versus interrupt-driven versus polled I/O for moving data from a peripheral?

    Intermediate

    What they’re really asking

    They want to see you can reason about CPU load, latency, and throughput when choosing a data-transfer strategy.

    Strong answer structure

    Polled: CPU busy-waits, simple but wastes cycles and scales poorly. Interrupt-driven: CPU does other work and is notified per byte/event, good for moderate rates but per-interrupt overhead hurts at high throughput. DMA: hardware moves data between peripheral and memory without per-byte CPU involvement, ideal for high-rate or bulk transfers (ADC streams, SPI displays, audio), typically raising only a half-transfer and a transfer-complete interrupt per buffer. Tradeoffs: DMA frees the CPU but adds setup complexity, cache-coherency concerns, and bus contention.

    Likely follow-ups

    • What cache coherency problem arises with DMA on a Cortex-M7 and how do you fix it?
    • How do double-buffering / ping-pong DMA help with continuous streams?
  14. You have a firmware bug that only appears after several hours of running in the field and never on your bench. How do you debug it?

    Advanced

    What they’re really asking

    They want to see a structured debugging methodology for intermittent, hard-to-reproduce embedded failures, not just guesses.

    Strong answer structure

    Characterize: what fails, how often, under what conditions (temperature, load, timing). Hypothesize common slow-burn causes: memory leaks/heap fragmentation, stack overflow, counter or timer overflow/wraparound, race conditions, watchdog interactions, ESD/brownout, or accumulating sensor drift. Instrument without changing timing too much: add a circular RAM log / trace buffer, capture min/max stack high-water mark, log free heap, enable fault handlers that dump registers and the stacked PC. Reproduce by accelerating (stress, raise temperature, speed up time base). Use a debugger with non-stop trace (ITM/SWO, ETM) or a watchdog that captures state. Bisect recent changes.
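    One slow-burn cause from the list, counter wraparound, is easy to show concretely. This sketch assumes a free-running 32-bit millisecond tick, which wraps after 2^32 ms (about 49.7 days); the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy: compares absolute times, so it misfires when start + timeout wraps. */
bool timeout_expired_buggy(uint32_t now, uint32_t start, uint32_t timeout)
{
    return now >= start + timeout;
}

/* Wrap-safe: unsigned subtraction gives elapsed time modulo 2^32, which is
 * correct as long as the interval itself is shorter than ~49.7 days. */
bool timeout_expired(uint32_t now, uint32_t start, uint32_t timeout)
{
    return (uint32_t)(now - start) >= timeout;
}
```

    With start = 0xFFFFFF00 and a 1000 ms timeout, the naive version reports expiry at now = 0xFFFFFF10, only 16 ms in, while the subtraction version waits the full 1000 ms across the wrap.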

    Likely follow-ups

    • How would a 32-bit millisecond counter overflow cause a once-in-49-days bug?
    • How do you capture the CPU state at the moment of a hard fault?
  15. Explain the difference between the low-power sleep modes on a typical MCU and how you decide which to use.

    Intermediate

    What they’re really asking

    They want practical power-management knowledge: the tradeoff between current draw, retained state, and wake latency.

    Strong answer structure

    Modes trade current against what stays alive and how fast you wake. Sleep/idle: CPU clock gated, peripherals and RAM live, fast wake, microamps-to-milliamps. Stop/deep-sleep: most clocks off, RAM retained, wake from a few sources (RTC, EXTI), longer wake latency, low microamps. Standby/shutdown: most of the chip powered down, RAM usually lost, wake resets the part, nanoamps-to-microamps. Decide by the duty cycle: how long you sleep, how fast you must respond, what state you must retain, and the wake source available. Quantify with the average current over one period: I_avg = (I_active * t_on + I_sleep * t_off) / (t_on + t_off).
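    The average-current calculation as code, with a rough battery-life estimate; the function names and figures are illustrative, and the model ignores wake-up transients, battery self-discharge, and regulator losses.

```c
/* Time-weighted average current over one active/sleep period. */
double avg_current_ma(double i_active_ma, double t_on_ms,
                      double i_sleep_ma, double t_off_ms)
{
    return (i_active_ma * t_on_ms + i_sleep_ma * t_off_ms) / (t_on_ms + t_off_ms);
}

/* Ideal battery life: capacity divided by average draw. */
double battery_life_hours(double capacity_mah, double i_avg_ma)
{
    return capacity_mah / i_avg_ma;
}
```

    For example, 10 mA for 10 ms followed by 2 µA (0.002 mA) for 990 ms averages about 0.102 mA, so a 220 mAh cell would last roughly 2,160 hours (about 90 days) before those losses. Note how the short active burst dominates: the sleep current contributes under 2% of the average here.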

    Likely follow-ups

    • How do you wake from the deepest mode and what state survives?
    • How would you measure actual sleep current on real hardware?
  16. What is a watchdog timer, and how do you use it correctly without masking real bugs?

    Intermediate

    What they’re really asking

    They want to know you understand watchdogs as a safety net and the anti-patterns that defeat their purpose.

    Strong answer structure

    A hardware timer that resets the MCU if not periodically kicked, recovering from hangs/lockups. Correct use: kick it from a single trusted place (e.g., a supervisor task) only after verifying all critical tasks have checked in, so a hung task is actually caught. Anti-pattern: kicking it in an ISR or a tight loop unconditionally, which keeps petting it even when the system is dead. Capture the reset cause and log a watchdog reset so you can diagnose. Consider a windowed watchdog, which also resets if it is kicked too early (for example by a runaway loop), not only too late. Save state before reset if possible.
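    A sketch of the check-in pattern, assuming three monitored tasks and a supervisor that runs well inside the watchdog timeout; the task IDs and hw_watchdog_kick are hypothetical placeholders for the real tasks and the part's reload-register write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative task IDs: each monitored task owns one check-in bit. */
enum { TASK_SENSOR = 0, TASK_COMMS = 1, TASK_CONTROL = 2, NUM_TASKS = 3 };
#define ALL_TASKS_MASK (((uint32_t)1u << NUM_TASKS) - 1u)

static volatile uint32_t checkin_mask;
static unsigned hw_kick_count;

/* Placeholder for the real reload write (e.g. the reload key written to the
 * STM32 IWDG key register); here it only counts kicks. */
static void hw_watchdog_kick(void)
{
    hw_kick_count++;
}

/* Each task calls this once per loop iteration while healthy. With a
 * preemptive RTOS, wrap the read-modify-write in a critical section. */
void wdg_checkin(unsigned task_id)
{
    checkin_mask |= (uint32_t)1u << task_id;
}

/* A single supervisor calls this periodically. It kicks the hardware only
 * if every task checked in since the last call; otherwise it stays silent
 * and the watchdog eventually resets the MCU. */
bool wdg_supervise(void)
{
    if (checkin_mask != ALL_TASKS_MASK) {
        return false;
    }
    checkin_mask = 0u;
    hw_watchdog_kick();
    return true;
}
```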

    Likely follow-ups

    • Why is kicking the watchdog inside a timer ISR a bad idea?
    • What is a windowed watchdog and what failure does it additionally catch?
  17. How do you debug a hard fault on an ARM Cortex-M? What information is available and how do you find the offending instruction?

    Advanced

    What they’re really asking

    They want hands-on Cortex-M fault analysis skill: reading fault status registers and reconstructing the faulting context.

    Strong answer structure

    In the HardFault handler, inspect the fault status registers: HFSR, CFSR (with UFSR/BFSR/MMFSR sub-fields), and BFAR/MMFAR for the faulting address. Recover the stacked frame: check bit 2 of EXC_RETURN (in LR on handler entry) to see whether MSP (0) or PSP (1) was active, then read the stacked R0-R3, R12, LR, PC, xPSR; the stacked PC points at the offending instruction for precise faults, or somewhere after it for imprecise bus faults. Look up that address in the .map / disassembly. Common causes: null/wild pointer dereference, unaligned access, stack overflow, dividing by zero with the trap enabled, calling a function pointer that is null or has its LSB clear (Thumb bit not set). Use a debugger to break in the handler.
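    A sketch of the frame-recovery step, written so it can run on a host: the caller passes the EXC_RETURN value and both stack pointers. On target, a short assembly shim in the HardFault handler would read LR, MSP, and PSP (e.g. with MRS) and pass them in; that shim and the function name are assumptions. The layout shown is the basic frame without floating-point state.

```c
#include <stdint.h>

/* CFSR bits worth checking first (UFSR/BFSR/MMFSR sub-fields). */
#define CFSR_MMARVALID  (1u << 7)   /* MMFAR holds the faulting address */
#define CFSR_BFARVALID  (1u << 15)  /* BFAR holds the faulting address */
#define CFSR_INVSTATE   (1u << 17)  /* executed with the Thumb bit clear */
#define CFSR_DIVBYZERO  (1u << 25)  /* divide by zero with trap enabled */

/* Basic exception frame pushed by the core, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

/* Bit 2 of EXC_RETURN selects the stack that received the frame:
 * 0 = MSP, 1 = PSP. */
exc_frame_t fault_frame(uint32_t exc_return,
                        const uint32_t *msp, const uint32_t *psp)
{
    const uint32_t *sp = (exc_return & 0x4u) ? psp : msp;
    exc_frame_t f;
    f.r0 = sp[0];  f.r1 = sp[1];  f.r2 = sp[2];  f.r3 = sp[3];
    f.r12 = sp[4]; f.lr = sp[5];  f.pc = sp[6];  f.xpsr = sp[7];
    return f;
}
```

    The returned f.pc is the address to look up in the map file or disassembly.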

    Likely follow-ups

    • How do you know whether the fault context used MSP or PSP?
    • Why does an even (non-Thumb) function pointer address cause a fault?
  18. Explain the difference between a pointer and the value it points to in C, and what these declarations mean: const char *p, char *const p, const char *const p.

    Intermediate

    What they’re really asking

    They want airtight C pointer and const-correctness understanding, which separates careful embedded engineers from sloppy ones.

    Strong answer structure

    A pointer holds an address; dereferencing reads/writes the pointed-to object. const char *p: pointer to const char, you can change p but not *p (data is read-only through this pointer, useful for ROM strings). char *const p: const pointer to char, you cannot change p (it always points at the same place) but can modify *p. const char *const p: cannot change either. Read right-to-left or use the spiral rule. This matters for putting data in flash, for API contracts, and to let the compiler catch accidental writes.

    Likely follow-ups

    • Where would const char *const be the right choice for a string in flash?
    • What is the difference between a const pointer and a pointer to volatile?
  19. What is endianness, and how does it bite you when sending a multi-byte value over SPI or serializing a struct to a network?

    Intermediate

    What they’re really asking

    They want to confirm you handle byte ordering correctly across heterogeneous systems and buses.

    Strong answer structure

    Endianness is the byte order of a multi-byte value in memory: little-endian stores the least significant byte first, big-endian the most significant first. It bites when two systems with different endianness exchange raw bytes, or when you cast a byte buffer to a multi-byte type. Fixes: define a wire byte order (network/big-endian is common), serialize/deserialize byte by byte or with htons/ntohl-style helpers, and never memcpy a struct across the wire and assume layout. Also watch struct padding and alignment in addition to endianness.
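    A sketch of explicit byte-order serialization, assuming the wire format is big-endian; the reading_t message and helper names are hypothetical. Because every byte is placed by shifting, the same code produces identical output on little- and big-endian hosts, and struct padding never reaches the wire.

```c
#include <stddef.h>
#include <stdint.h>

static void put_be16(uint8_t *buf, uint16_t v)
{
    buf[0] = (uint8_t)(v >> 8);
    buf[1] = (uint8_t)v;
}

static void put_be32(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Casts to uint32_t before shifting: a uint8_t promotes to int, and
 * shifting 0x80 or more left by 24 would overflow a 32-bit int. */
static uint32_t get_be32(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Hypothetical message: on most 32-bit targets this struct has 2 bytes of
 * padding after id, but on the wire it is exactly 6 bytes. */
typedef struct {
    uint16_t id;
    uint32_t value;
} reading_t;

size_t reading_serialize(uint8_t out[6], const reading_t *r)
{
    put_be16(&out[0], r->id);
    put_be32(&out[2], r->value);
    return 6;
}
```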

    Likely follow-ups

    • Why is memcpy-ing a struct directly onto the wire risky beyond endianness?
    • How do you write portable code that works on both endiannesses?
  20. How does an analog-to-digital converter (ADC) work conceptually, and what do resolution, sampling rate, and reference voltage mean for your measurement?

    Intermediate

    What they’re really asking

    They want practical mixed-signal literacy: how to interpret ADC specs and avoid common measurement mistakes.

    Strong answer structure

    An ADC samples an analog voltage and quantizes it to a digital code. Resolution (bits) sets the number of code steps; the LSB voltage = Vref / 2^N, so 12-bit at 3.3V Vref gives ~0.8 mV per step. Vref sets full scale and directly affects accuracy and noise; a noisy or wrong reference scales your reading. The sampling rate must exceed twice the highest frequency present at the ADC input (Nyquist) or higher frequencies alias into the band; since real signals and noise extend beyond the band of interest, an analog anti-alias filter is needed ahead of the ADC. Also account for input impedance/sample-and-hold settling time, INL/DNL, and oversampling/averaging to gain effective bits.
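    A sketch of the LSB arithmetic above plus the oversample-and-decimate technique; the function names are illustrative. Summing 4^k samples and shifting right by k yields k extra bits only if the input carries roughly an LSB or more of random noise, so successive samples actually differ.

```c
#include <stdint.h>

/* Raw code to millivolts using LSB = Vref / 2^N. The 64-bit intermediate
 * avoids overflow for high-resolution converters. */
uint32_t adc_code_to_mv(uint32_t code, uint32_t vref_mv, unsigned bits)
{
    return (uint32_t)(((uint64_t)code * vref_mv) >> bits);
}

/* Sum 4^k samples and shift right by k: the result has N + k bits. */
uint32_t adc_oversample(const uint16_t *samples, unsigned k)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < (1u << (2u * k)); i++) {
        sum += samples[i];
    }
    return sum >> k;
}
```

    For instance, four 12-bit samples alternating 1000 and 1001 give the 13-bit value 2001, resolving the half-step between them.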

    Likely follow-ups

    • What is aliasing and how do you prevent it?
    • How does oversampling and averaging improve effective resolution?
  21. Explain how PWM works and how you would use it to dim an LED versus control a servo or a motor.

    Foundational

    What they’re really asking

    They want to know you understand duty cycle versus frequency and can apply PWM across different actuators.

    Strong answer structure

    PWM produces a square wave with a fixed period and a variable duty cycle. For LED dimming, average power scales with duty cycle; pick a frequency above flicker perception (>~200 Hz, often kHz) and the eye integrates it. For a hobby servo, the information is in the pulse width (typically 1-2 ms within a 20 ms / 50 Hz frame), not the average. For a DC motor, PWM into an H-bridge controls average voltage/speed; choose a frequency above the audible range to avoid whine and consider the winding inductance. Mention duty-cycle resolution: the timer counts f_timer / f_PWM ticks per period, so a 72 MHz timer at 20 kHz gives 3600 steps (about 11.8 bits), and raising the PWM frequency costs resolution.
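    A sketch of turning a frequency and duty cycle into timer counts, assuming an up-counting timer whose period is period_counts ticks (ARR + 1 on STM32-style timers) and whose output is high while the count is below compare; names are illustrative and prescaler selection is left out.

```c
#include <stdint.h>

typedef struct {
    uint32_t period_counts;  /* timer ticks per PWM period */
    uint32_t compare;        /* ticks the output stays high */
} pwm_setting_t;

/* LED / motor: duty given in per-mille (0..1000) to stay in integer math. */
pwm_setting_t pwm_from_duty(uint32_t timer_hz, uint32_t pwm_hz, uint32_t duty_permille)
{
    pwm_setting_t s;
    s.period_counts = timer_hz / pwm_hz;
    s.compare = (uint32_t)(((uint64_t)s.period_counts * duty_permille) / 1000u);
    return s;
}

/* Servo: the information is the absolute pulse width, not the ratio. */
pwm_setting_t pwm_from_pulse_us(uint32_t timer_hz, uint32_t pwm_hz, uint32_t pulse_us)
{
    pwm_setting_t s;
    s.period_counts = timer_hz / pwm_hz;
    s.compare = (uint32_t)(((uint64_t)timer_hz * pulse_us) / 1000000u);
    return s;
}
```

    With the timer prescaled to 1 MHz, a 50 Hz servo frame is 20000 counts and a 1.5 ms center pulse is a compare value of 1500, regardless of the resulting 7.5% duty.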

    Likely follow-ups

    • Why does PWM frequency choice differ for an LED versus a motor?
    • How does a hardware timer generate PWM without CPU intervention?
  22. What is a circular (ring) buffer, why is it ubiquitous in embedded UART drivers, and how do you implement one that is safe between an ISR producer and a main-loop consumer?

    Intermediate

    What they’re really asking

    They want to see you implement a core data structure correctly with concurrency in mind, without needing locks.

    Strong answer structure

    A fixed-size buffer with a head (write) index and a tail (read) index that wrap around, giving O(1) enqueue/dequeue with no dynamic allocation, ideal for streaming UART RX/TX. For a single producer (ISR) and single consumer (main), you can make it lock-free: the ISR only writes the head index, the consumer only writes the tail index, and each side only reads the other's index. Use a power-of-two size for cheap masking, mark indices volatile, and ensure index updates are atomic on your architecture (or use a memory barrier). Distinguish full from empty by leaving one slot unused or keeping a separate count.
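    A sketch of the single-producer/single-consumer buffer, assuming a single-core MCU where the ISR and main loop share memory directly; names and the 64-byte size are illustrative. Indices are 8-bit so each update is a single store even on an 8-bit core, and the data array is volatile so the compiler cannot move the byte store after the head update. On multi-core or weakly ordered systems, C11 atomics or explicit barriers would replace volatile.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 64u                  /* must be a power of two */
#define RB_MASK (RB_SIZE - 1u)

typedef struct {
    volatile uint8_t data[RB_SIZE];
    volatile uint8_t head;           /* written only by the producer (ISR) */
    volatile uint8_t tail;           /* written only by the consumer (main) */
} ring_t;

/* Producer side, e.g. the UART RX ISR. One slot stays empty so that
 * head == tail always means "empty"; capacity is RB_SIZE - 1. */
bool rb_put(ring_t *rb, uint8_t byte)
{
    uint8_t next = (uint8_t)((rb->head + 1u) & RB_MASK);
    if (next == rb->tail) {
        return false;                /* full: drop the byte, count an overrun */
    }
    rb->data[rb->head] = byte;
    rb->head = next;                 /* publish only after the data is stored */
    return true;
}

/* Consumer side, e.g. the main loop. */
bool rb_get(ring_t *rb, uint8_t *byte)
{
    if (rb->tail == rb->head) {
        return false;                /* empty */
    }
    *byte = rb->data[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) & RB_MASK);
    return true;
}
```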

    Likely follow-ups

    • Why does the single-producer-single-consumer case avoid needing a lock?
    • How do you distinguish a full buffer from an empty one with just two indices?
  23. How would you design a robust firmware update (OTA / bootloader) mechanism for a deployed device so a failed update never bricks it?

    Advanced

    What they’re really asking

    They want system-level design thinking around field updates, atomicity, rollback, and security.

    Strong answer structure

    Use a small immutable bootloader plus dual application banks (A/B) or a download-then-swap scheme. Download new image to a spare slot, verify integrity (CRC/hash) and authenticity (signature) before activation, then atomically switch the active bank by flipping a flag. Keep the previous image so a failed boot triggers automatic rollback; use a boot-count/watchdog confirmation handshake so an image must prove it runs before being marked good. Handle power loss at every step (interrupted download, mid-erase). Encrypt the image in transit, never trust unsigned firmware, and protect the bootloader with read-out protection.
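    A sketch of the bootloader's slot-selection step, assuming per-slot metadata (validity, confirmation flag, version, attempt counter) persisted in flash the bootloader owns; the struct and MAX_BOOT_ATTEMPTS are hypothetical. Download, verification, and the power-safe metadata write are out of scope here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BOOT_ATTEMPTS 3u

/* Hypothetical per-slot metadata. */
typedef struct {
    bool     valid;       /* image passed CRC + signature check */
    bool     confirmed;   /* application reported a healthy boot */
    uint32_t version;
    uint8_t  attempts;    /* unconfirmed boots so far */
} slot_meta_t;

/* Prefer the newest bootable image, but give an unconfirmed one only
 * MAX_BOOT_ATTEMPTS tries before falling back. Returns 0 or 1, or -1 if
 * nothing is bootable. */
int boot_select(slot_meta_t slot[2])
{
    int best = -1;
    for (int i = 0; i < 2; i++) {
        const slot_meta_t *s = &slot[i];
        bool bootable = s->valid && (s->confirmed || s->attempts < MAX_BOOT_ATTEMPTS);
        if (bootable && (best < 0 || s->version > slot[best].version)) {
            best = i;
        }
    }
    if (best >= 0 && !slot[best].confirmed) {
        slot[best].attempts++;   /* must be persisted before jumping to the app */
    }
    return best;
}
```

    The application sets confirmed once its self-checks pass; an image that keeps crashing before that exhausts its attempts, and the bootloader falls back to the older confirmed slot.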

    Likely follow-ups

    • How does an A/B scheme guarantee atomic switchover under power loss?
    • Why is signature verification in the bootloader essential, and where is the public key stored?
  24. What is the difference between flash, EEPROM, and RAM in an embedded system, and how does flash wear affect how you store frequently changing data?

    Intermediate

    What they’re really asking

    They want memory-technology fundamentals and awareness of endurance limits driving real storage design.

    Strong answer structure

    RAM is volatile, fast, byte-writable, used for runtime data. Flash is non-volatile, holds code and constants, erased in blocks/sectors and written in pages, with limited erase/write endurance (often ~10k-100k cycles). EEPROM is non-volatile and byte-erasable with higher endurance, used for small config. Flash wear: rewriting the same sector repeatedly wears it out, so use wear leveling, journaling, or write-counters, and avoid erase-on-every-update patterns. For frequently changing values use on-chip or external EEPROM, FRAM, or an emulated-EEPROM library that rotates writes across flash pages.

    Likely follow-ups

    • How does flash emulation of EEPROM achieve wear leveling?
    • What is FRAM and when would you choose it over flash or EEPROM?
  25. Your UART is receiving garbled bytes at higher baud rates but works fine slowly. How do you diagnose and fix it?

    Intermediate

    What they’re really asking

    They want a systematic hardware/firmware troubleshooting flow for a very common serial problem.

    Strong answer structure

    Garbling that scales with baud usually points to clock/timing or overrun. Check baud-rate error: the peripheral clock and divisor may not produce an accurate baud at high speed (>~2-3% error corrupts framing); verify with a scope on the bit period. Check for RX overrun: at high rates the ISR/DMA may not keep up, so move to DMA or a bigger FIFO/ring buffer. Verify signal integrity: scope the line for rounded edges, reflections, noise, ground bounce, or missing common ground. Confirm framing/parity/stop-bit settings match both ends. Look at framing error flags. Fix by adjusting clock source, enabling DMA, shortening/terminating the cable, or lowering baud.
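    A sketch of the baud-error calculation, assuming a UART with 16x oversampling and an integer divisor rounded to the nearest value; many parts have fractional dividers or 8x modes, so the exact formula comes from the reference manual. With a 16 MHz clock, 115200 baud lands about 3.5% slow, enough to corrupt framing, while 9600 baud is about 0.2% fast.

```c
#include <stdint.h>

/* Percentage error between the achievable and requested baud rate.
 * Assumes fclk_hz >= 8 * baud so the rounded divisor is nonzero. */
double uart_baud_error_pct(uint32_t fclk_hz, uint32_t baud)
{
    uint32_t div = (fclk_hz + 8u * baud) / (16u * baud);  /* round to nearest */
    double actual = (double)fclk_hz / (16.0 * div);
    return (actual - baud) * 100.0 / baud;
}
```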

    Likely follow-ups

    • How do you calculate baud-rate error from the peripheral clock and divisor?
    • What does an overrun error flag tell you and how do you eliminate it?
  26. Explain race conditions in embedded systems and walk me through a critical section. When is disabling interrupts the wrong tool?

    Intermediate

    What they’re really asking

    They want solid concurrency reasoning and judgment about when heavy-handed interrupt disabling is inappropriate.

    Strong answer structure

    A race occurs when two contexts (ISR and main, or two tasks) access shared state and the outcome depends on timing, e.g. a non-atomic read-modify-write interrupted halfway. A critical section protects the shared access; disabling interrupts is the simplest option on bare metal but must be short because it raises worst-case interrupt latency and can break real-time guarantees. It is the wrong tool when the contenders are only tasks under an RTOS (use a mutex so interrupts, including higher-priority ones, keep running), when the section is long, or on multi-core (disabling interrupts on one core does not stop the other; you need a spinlock or hardware semaphore). Conversely, when an ISR is the contender a mutex cannot help, since an ISR cannot block; there a brief interrupt mask (ideally masking only up to that interrupt's priority level, e.g. via BASEPRI on Cortex-M, or disabling just that interrupt source) is the right tool. Prefer atomics or lock-free designs where possible.

    Likely follow-ups

    • Why does disabling interrupts not protect shared data on a dual-core MCU?
    • How do you protect data shared between a task and an ISR in an RTOS?
  27. Tell me about a time you had to debug a particularly difficult hardware or firmware issue. How did you approach it?

    Intermediate

    What they’re really asking

    They want evidence of methodical debugging, persistence, and the ability to work across the hardware/software boundary under uncertainty.

    Strong answer structure

    STAR. Situation: a specific elusive bug (e.g., intermittent resets in the field, or sensor data corruption). Task: root-cause and fix it without a reliable repro. Action: describe forming hypotheses, isolating variables, instrumenting with a scope/logic analyzer/trace buffer, reading datasheets/errata, and bisecting changes; show you didn't just shotgun-debug. Result: identified the true cause (e.g., a brownout, a missed pull-up, a stack overflow), the fix, and a preventive measure (added monitoring, errata workaround, test). Emphasize what you learned and how you de-risked future designs.

    Likely follow-ups

    • What tool ended up being most decisive, and why?
    • How did you make sure the same class of bug could not recur?
  28. Describe a situation where you had to ship firmware under a tight deadline with a known limitation or technical debt. How did you handle the tradeoff?

    Intermediate

    What they’re really asking

    They want to see pragmatic engineering judgment, risk communication, and ownership rather than either reckless shipping or perfectionism.

    Strong answer structure

    STAR. Situation: a deadline (e.g., a customer demo or production run) with an unfinished or imperfect feature. Task: decide what is safe to ship. Action: assess the risk of the limitation, contain it (feature flag, conservative defaults, extra validation, documented known issue), communicate the tradeoff clearly to stakeholders, and ensure a safe fallback (e.g., OTA path to patch later). Result: shipped on time without compromising safety/reliability, then paid down the debt on a planned schedule. Emphasize transparent communication and that you never hid the risk.

    Likely follow-ups

    • How did you decide the limitation was acceptable to ship versus a blocker?
    • How did you ensure the technical debt actually got addressed later?
  29. How do you measure and reduce interrupt latency in a real-time system, and why does it matter?

    Advanced

    What they’re really asking

    They want deep real-time analysis skill: understanding what contributes to worst-case latency and how to bound it.

    Strong answer structure

    Interrupt latency is the time from the hardware event to the first instruction of your ISR (and full response includes ISR execution and any deferred task). Contributors: longest period interrupts are disabled (critical sections), higher-or-equal priority ISRs running, context save/stacking time, and on some cores wait states/flash latency. Measure with a GPIO toggle at event and at ISR entry on a scope, or use cycle counters (DWT CYCCNT) and trace. Reduce by minimizing and bounding critical sections, raising the priority of the critical interrupt, keeping ISRs short with deferred processing, avoiding long atomic blocks, and using zero-wait-state RAM for hot handlers. It matters because missing a hard deadline can mean data loss or a safety failure.

    Likely follow-ups

    • What is the difference between interrupt latency and interrupt response time?
    • How do nested interrupts and interrupt priorities affect worst-case latency?
  30. What is stack overflow in an embedded context, how is it different from heap exhaustion, and how do you detect and prevent it?

    Intermediate

    What they’re really asking

    They want awareness of constrained-memory failure modes and concrete detection/prevention techniques.

    Strong answer structure

    Stack overflow: the call stack grows past its allotted region (deep recursion, large local arrays, deep ISR nesting) and corrupts adjacent memory, often .bss/heap or another task's stack. Heap exhaustion: malloc fails or fragmentation prevents an allocation. Detection: paint the stack with a known pattern and check the high-water mark; use an MPU guard region or a redzone; enable RTOS stack-overflow hooks; on Cortex-M use the PSPLIM/MSPLIM stack-limit registers (v8-M) or a MemManage fault. Prevention: size stacks from worst-case analysis, avoid recursion and large stack buffers, prefer static allocation, measure high-water marks in test, and avoid dynamic allocation in long-running firmware to dodge fragmentation.
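    A sketch of stack painting, using an array to stand in for a task stack so it runs on a host; the pattern value and names are illustrative. It assumes a full-descending stack (as on Cortex-M), so unused words stay at the low end of the region and the high-water mark runs from the first overwritten word to the top.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu

/* Fill the stack region with a known pattern. On target this must run
 * before the region is in use (or skip the part that is live). */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* The stack grows down from base + words, so scan up from the bottom for
 * the first word that no longer holds the pattern. Returns peak usage in
 * words; it can under-report if a used word happens to equal the pattern. */
size_t stack_high_water_words(const uint32_t *base, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && base[untouched] == STACK_PAINT) {
        untouched++;
    }
    return words - untouched;
}
```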

    Likely follow-ups

    • How does stack painting let you measure worst-case stack usage?
    • Why do many embedded teams forbid dynamic allocation after init?
  31. Why might you avoid dynamic memory allocation (malloc/free) in long-running embedded firmware, and what do you use instead?

    Intermediate

    What they’re really asking

    They want to see you understand determinism, fragmentation, and the reliability culture of embedded/safety-critical code.

    Strong answer structure

    Problems: heap fragmentation over time can make an allocation fail even with enough total free memory; malloc has non-deterministic timing (bad for real-time); failure handling is awkward in deeply embedded code; and it complicates worst-case memory analysis. Alternatives: static/global allocation sized at build time, fixed-size memory pools / block allocators, stack allocation for short-lived data, and arena allocators. Many standards (e.g., MISRA, automotive/aerospace) restrict or forbid dynamic allocation after initialization. If you must allocate, do it once at startup.
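    A sketch of a fixed-size block pool, assuming every request fits in one 32-byte block; sizes and names are illustrative. Blocks live in a static array and free blocks are threaded onto a singly linked free list, so allocation and release are O(1) and deterministic, and any freed block can satisfy any later request. This version is not ISR-safe; shared use would need a critical section around the list operations.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  32u
#define POOL_NUM_BLOCKS  8u

/* The union gives each block pointer alignment and reuses the payload
 * bytes as the free-list link while the block is free. */
typedef union block {
    union block *next;
    uint8_t payload[POOL_BLOCK_SIZE];
} block_t;

static block_t pool_storage[POOL_NUM_BLOCKS];
static block_t *free_list;

void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_NUM_BLOCKS; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;                 /* NULL when exhausted: caller must handle it */
}

void pool_free(void *p)
{
    block_t *b = p;
    b->next = free_list;
    free_list = b;
}
```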

    Likely follow-ups

    • How does a fixed-size block pool avoid fragmentation?
    • If you must use malloc, how would you make it safer in firmware?
  32. Explain what a logic analyzer and an oscilloscope each tell you, and how you would use them together to debug an I2C sensor that returns wrong data.

    Intermediate

    What they’re really asking

    They want practical bench-debugging fluency and the judgment to pick the right instrument for a problem.

    Strong answer structure

    An oscilloscope shows the analog shape of a signal over time: voltage levels, rise/fall times, ringing, noise, and signal integrity. A logic analyzer shows many digital lines as decoded protocol over time: it can decode I2C transactions, addresses, ACK/NACK, and data, great for protocol-level bugs. Workflow: first use the logic analyzer to decode the I2C exchange, confirm the right slave address, register pointer, ACKs, and whether the data bytes match expectations or a NACK appears. If the protocol looks valid but data is wrong, or you see NACKs, switch to the scope to check pull-up strength, rise times, voltage levels, and noise on SDA/SCL. Together they separate protocol/firmware bugs from electrical/integrity bugs.

    Likely follow-ups

    • What on a scope would indicate your I2C pull-ups are too weak?
    • How would the logic analyzer distinguish a wrong register address from a wiring fault?