Tools, Morse Code, Developer Tools·Mar 4, 2026

Building a Morse Code Decoder: Line Classification, Greedy Boundaries, and Graceful Errors

How we designed a deterministic Morse decoder that handles mixed input, inconsistent separators, and invalid tokens without silently failing.

Han Chee

The Problem With Existing Morse Decoders

Most online Morse code decoders share a common failure mode: they assume. They assume you will use exactly one space between letters and exactly three spaces (or a /) between words. They assume your input contains nothing but valid Morse sequences. When any of those assumptions breaks — a stray comma, an extra space, a line of plaintext mixed in — they either produce garbage silently or refuse to process the input at all.

The goal for this decoder was different: build something deterministic, lenient on input format, and transparent about failure. A decoder that processes what it can, clearly marks what it cannot, and never silently discards information. The design decisions that follow from that goal are less obvious than they appear.

The Line-Level Classification Decision

The most fundamental design question was where to classify "is this Morse or not?" Three options exist: classify at the character level (per symbol), at the token level (per space-delimited sequence), or at the line level (per newline-delimited segment).

Character-level classification is tempting but creates ambiguity. The characters . and - appear in plaintext too — an ellipsis, a dash in prose, a file path containing ./. Trying to classify individual characters produces false positives constantly.

Token-level classification is more precise but creates a different problem: what do you do with a line that is half-Morse and half-plaintext? Decode the Morse tokens and emit the plaintext tokens verbatim alongside them? The output becomes unreadable: a mixture of decoded characters and raw fragments that represents neither the original nor the decoded message.

Line-level classification gives users a clear mental model: each line is either a Morse line or a passthrough line. If a line contains only dots (.), dashes (-), slashes (/), pipes (|), commas (,), spaces, and tabs, it is decoded. Everything else passes through unchanged. Empty lines and whitespace-only lines are treated as Morse lines, producing blank output lines and preserving the vertical structure of the input.

This model has an important predictability property: you always know, just by looking at a line, how the decoder will treat it. That predictability is worth more than marginal precision gains from finer-grained classification. It also means mixed-format files — say, a Morse puzzle sheet that includes a header row of instructions — decode correctly without any configuration.

The set of Morse-valid characters is defined precisely rather than permissively. The separators /, |, and , are included because they are all commonly used as word delimiters in practice. Tab is included because pasted content from spreadsheets or terminals often contains tabs rather than spaces. No other characters are included. The definition is closed: if a character is not in the set, the line is not a Morse line.
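The line-level rule can be sketched in a few lines of Python. This is a minimal illustration rather than the decoder's actual source: the names MORSE_LINE_CHARS and is_morse_line are hypothetical, and only the character set itself comes from the description above.

```python
# Closed set of characters allowed on a Morse line:
# dot, dash, the three word separators, space, and tab.
# (Names here are illustrative, not taken from the decoder's source.)
MORSE_LINE_CHARS = frozenset(".-/|, \t")


def is_morse_line(line: str) -> bool:
    """Return True if every character on the line is in the closed set.

    Empty and whitespace-only lines contain no disallowed characters,
    so they count as Morse lines and decode to blank output lines.
    """
    return all(ch in MORSE_LINE_CHARS for ch in line)


if __name__ == "__main__":
    for sample in [".- / -...", "Decode the lines below:", "", "./notes.txt"]:
        kind = "morse" if is_morse_line(sample) else "passthrough"
        print(f"{sample!r}: {kind}")
```

One consequence worth noting: under this rule a line consisting only of an ellipsis (...) is a Morse line, because the decision looks at the characters on the line and nothing else.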

Greedy Word Boundary Normalization

Once a line is classified as Morse, the decoder normalises all word boundary conventions into a single internal marker before splitting. The /, |, and , characters are all converted to a null byte, and any run of two or more consecutive spaces is also converted to the same null byte. The line is then split on null bytes to produce word segments, and each word segment is split on single spaces to produce individual letter tokens.

This greedy approach means the decoder accepts all common separator conventions without requiring the user to configure which convention they are using. A sequence copied from a web page might use /, one typed by hand might use two consecutive spaces, and one exported from another tool might use |. The decoder handles all of them identically.

A potential concern is that the comma character itself has a Morse encoding, the sequence --..--. There is no actual conflict here, because inside a Morse line a literal , always sits between Morse sequences and is never part of a dot-dash sequence itself. Morse sequences consist exclusively of . and -; every other character is structural. So treating , as a word boundary at the structural level is correct, and it cannot be mistaken for part of a Morse encoding.

The consequence of collapsing all separators greedily is that multiple consecutive word boundaries produce a single word gap in the output, which is the correct semantic. A sequence like .- / / -... (two slashes in a row, perhaps from a copy-paste artifact) decodes as A B with a single space between the letters, rather than with two spaces or with an empty word in between.
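The normalisation step described in this section might look roughly like the following Python sketch. The function name tokenize and the constant WORD_GAP are hypothetical. The text above does not say how tabs are handled inside a Morse line, so treating a tab as a single space is an assumption made only for this example.

```python
import re

# Internal word-boundary marker: the null byte described above.
WORD_GAP = "\0"


def tokenize(line: str) -> list[list[str]]:
    """Split a Morse line into words, each a list of letter tokens."""
    # Assumption (not stated in the article): a tab acts like one space.
    line = line.replace("\t", " ")
    # Explicit separators /, | and , all become the same marker.
    line = re.sub(r"[/|,]", WORD_GAP, line)
    # Any run of two or more spaces becomes the marker too.
    line = re.sub(r" {2,}", WORD_GAP, line)

    words = []
    for segment in line.split(WORD_GAP):
        # Single spaces separate letters; drop empty strings left by
        # spaces that sat next to a separator.
        tokens = [tok for tok in segment.split(" ") if tok]
        if tokens:  # consecutive boundaries collapse into one gap
            words.append(tokens)
    return words


if __name__ == "__main__":
    print(tokenize(".- / / -..."))
    print(tokenize(".- -...|-.-."))
```

Dropping empty segments is what makes consecutive boundaries collapse: the two slashes in .- / / -... leave an empty segment between them, and it never becomes a word.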

Invalid Token Handling

After word and letter boundaries are resolved, each letter token is looked up in the Morse table. If the token matches a known sequence, the corresponding character is emitted. If it does not match — either because it is a valid dot-dash sequence that simply is not in the standard table, or because it contains unexpected characters — it is emitted in bracket notation: [token].

This is the most deliberate design choice in the decoder, and the one most at odds with how other tools handle errors. The alternatives are:

Silent drop: skip the invalid token and continue. This is the most common approach in other tools, and it is the worst. The output looks clean, but it is wrong — it silently omits information, and the user has no way to know what was discarded.

Replacement character: substitute ? or * for each invalid token. This is better than dropping, but it collapses all failures into a single undifferentiated marker. You lose the original sequence, which makes it impossible to debug why the token was invalid.

Block on error: refuse to produce output if any token is invalid. This is maximally unhelpful — it prioritises the decoder's cleanliness over the user's ability to see any output at all.

Bracket notation preserves the original failing sequence and positions it exactly where it appeared in the decoded output. If you decode .- ...... -... and get A[......]B, you immediately know that the middle token was six dots — an invalid Morse sequence — and it appeared between A and B. The output is a faithful transcript of what the decoder saw. You can correct the input and re-run, and you know exactly what to look for.
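A minimal sketch of the bracket-notation lookup, in Python. The table here is a three-letter excerpt for illustration only, and the names MORSE_EXCERPT and decode_tokens are hypothetical rather than taken from the decoder.

```python
# Excerpt of the Morse table, just enough for the example.
# A real table would cover the full alphabet, digits, and punctuation.
MORSE_EXCERPT = {
    ".-": "A",
    "-...": "B",
    "-.-.": "C",
}


def decode_tokens(tokens: list[str]) -> str:
    """Decode the letter tokens of one word.

    Known sequences map to their character. Unknown sequences are kept
    verbatim inside square brackets, at the position where they occurred.
    """
    return "".join(MORSE_EXCERPT.get(tok, f"[{tok}]") for tok in tokens)


if __name__ == "__main__":
    print(decode_tokens([".-", "......", "-..."]))
```

Because the fallback wraps the original token rather than substituting a fixed marker, no input is lost: stripping the brackets recovers exactly what the decoder was given.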

The bracket notation does mean the output is not "pure" decoded text when errors are present. That is the right trade-off. A decoder that hides errors is less useful than one that exposes them, even at the cost of some visual noise.

The Demo Input Choice

The initial input pre-loaded in the decoder is .-- .... .- - / .... .- - .... / --. --- -.. / .-- .-. --- ..- --. .... -, which decodes to WHAT HATH GOD WROUGHT.
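As a check on the demo input, here is a compact end-to-end sketch in Python that combines line classification, boundary normalisation, and bracket notation. It is an illustration under assumptions, not the decoder's source: the names decode_text, LETTERS, and MORSE_LINE_CHARS are hypothetical, the table covers only the letters A to Z, and tabs are assumed to behave like single spaces.

```python
import re

# International Morse code for A-Z only (digits and punctuation omitted).
LETTERS = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
}
MORSE_LINE_CHARS = frozenset(".-/|, \t")


def decode_text(text: str) -> str:
    """Decode Morse lines and pass every other line through unchanged."""
    out = []
    for line in text.split("\n"):
        if not set(line) <= MORSE_LINE_CHARS:
            out.append(line)  # passthrough line
            continue
        # Assumption: a tab acts like one space. Separators and runs of
        # two or more spaces all become the null-byte word marker.
        marked = re.sub(r"[/|,]| {2,}", "\0", line.replace("\t", " "))
        words = []
        for segment in marked.split("\0"):
            tokens = [tok for tok in segment.split(" ") if tok]
            if tokens:
                # Unknown tokens are kept in bracket notation.
                words.append("".join(LETTERS.get(t, f"[{t}]") for t in tokens))
        out.append(" ".join(words))
    return "\n".join(out)


DEMO = ".-- .... .- - / .... .- - .... / --. --- -.. / .-- .-. --- ..- --. .... -"

if __name__ == "__main__":
    print(decode_text(DEMO))
    print(decode_text("Decode the line below:\n.- / -..."))
```

The second call shows the mixed-format case: the instruction line passes through untouched while the Morse line beneath it is decoded.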

This is not SOS. The choice was deliberate.

SOS (... --- ...) is the most recognisable Morse sequence, and nearly every Morse tool on the web uses it as a demo. It is three short sequences that happen to be easy to type. As a demonstration input it is fine, but it carries no meaning beyond itself.

"WHAT HATH GOD WROUGHT" is the first official message sent over Morse's Washington-to-Baltimore telegraph line, transmitted by Samuel Morse from the United States Capitol in Washington, D.C. to Baltimore on May 24, 1844. The phrase is a quotation from the Book of Numbers, chosen by Annie Ellsworth, the daughter of the Commissioner of Patents, who had been granted the honour of selecting the message. Morse considered it an acknowledgement that the invention was larger than its inventors.

Using this message as the demo grounds the tool in the history it is decoding. When someone opens the tool and sees the Morse sequence decode to those words, there is context: something to connect the abstract dot-dash notation to the actual moment it entered the world. That historical grounding makes the demo more interesting than any arbitrary test string, and more honest than SOS, which was adopted as an international distress signal only in 1906, about sixty years after that first message.
