An ELS is pure arithmetic: starting from an initial position, take one letter every d positions. If the text has a single letter too many or too few before your sequence, every position shifts and the finding appears, disappears or moves. That is why, before searching for anything, one question must be answered with absolute precision: how many letters does the Torah have, and which ones are they?
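
The arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the search engine's code: the function name `els`, its parameters, and the toy Latin-alphabet text are all assumptions made for the example.

```python
def els(text: str, start: int, skip: int, length: int) -> str:
    """Return `length` letters taken at start, start + skip, start + 2*skip, ..."""
    last = start + (length - 1) * skip
    in_range = 0 <= start < len(text) and 0 <= last < len(text)
    if skip == 0 or length < 1 or not in_range:
        raise ValueError("sequence falls outside the text")
    return "".join(text[start + i * skip] for i in range(length))


# Toy letter stream standing in for the Hebrew text (hypothetical data).
text = "xxTxxxOxxxRxxxAxxxHxx"
print(els(text, start=2, skip=4, length=5))     # TORAH

# A single extra letter before the sequence shifts every later position.
shifted = "q" + text
print(els(shifted, start=2, skip=4, length=5))  # xxxxx: the finding is gone
print(els(shifted, start=3, skip=4, length=5))  # TORAH: it moved by one position
```

With the same start and skip, one inserted letter is enough to lose the sequence; it only reappears if the start is moved along with the text.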

There is no "the" number — there are editions

The question "how many letters does the Torah have?" has no single answer; it has one answer per textual tradition. The most authoritative Masoretic texts differ from one another by anywhere from a few letters to a few dozen, almost all of them plene or defective spelling variants (malé/chaser: the presence or omission of a ו or a י as a mater lectionis), which change neither the reading nor the meaning but do change the count:

  • Koren edition / textus receptus — 304,805 letters. The text of the standard scrolls and of the classic printed editions. It is the text used by Witztum, Rips and Rosenberg in the experiment published in Statistical Science (1994), and the one used by virtually all codes research since.
  • Miqra According to the Masorah (MAM) — 304,801 letters. A digital edition based on the Aleppo Codex (Ben Asher, 10th c.), the most authoritative Masoretic manuscript. It is developed openly on Hebrew Wikisource under a CC-BY-SA license.
  • Leningrad Codex — 304,850 letters. The oldest complete manuscript, the basis of the academic Biblia Hebraica. It differs from the previous two by dozens of letters.

Four letters between Koren and MAM. Forty-five between Koren and Leningrad. For reading the text, irrelevant. For a fixed skip across thousands of positions, decisive.

Our decision: Koren for the Torah

We adopted the Koren edition (304,805 letters) as the search engine's canonical text for the five books of the Torah, for three reasons:

  • It is the standard of the discipline. Virtually every published finding in the ELS literature, from Weissmandl to WRR and their critics, was computed on this text. To reproduce a finding with its exact skip, you must search the same text.
  • It is stable. The textus receptus has been fixed for centuries and is not under active editorial revision. MAM, by contrast, is a living project that keeps receiving corrections: an editorial virtue, but a risk for the reproducibility of an arithmetic search.
  • It is verifiable. Its count (304,805) is the publicly documented number against which anyone can audit our corpus.

The 9 differences, documented one by one

Our corpus starts from the MAM edition (Hebrew Wikisource, open CC-BY-SA license) and applies the 9 documented variants that separate it from the Koren text. They are listed below, and there are no others:

# | Verse | MAM (Aleppo) | Koren | Type
1 | Genesis 4:13 | מנשא | מנשוא | +1 (plene)
2 | Genesis 7:11 | מעינת | מעינות | +1 (plene)
3 | Genesis 9:29 | ויהיו | ויהי | −1
4 | Exodus 25:31 | תעשה | תיעשה | +1 (plene)
5 | Exodus 28:26 | האפד | האפוד | +1 (plene)
6 | Numbers 1:17 | בשמת | בשמות | +1 (plene)
7 | Numbers 10:10 | חדשיכם | חדשכם | −1 (defective)
8 | Numbers 22:5 | בער | בעור | +1 (plene)
9 | Deuteronomy 23:2 | דכא | דכה | 0 (substitution)

Balance: +6 −2 = +4 letters → 304,801 + 4 = 304,805. There is also a tenth difference that changes no letter at all: in Koren, the words ויהי אחרי המגפה form the verse Numbers 25:19; in MAM they are the beginning of Numbers 26:1. Same Torah, different numbering — the Koren Torah has 5,847 verses.
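
The balance can be recomputed from the table itself. The sketch below is only an illustration: the `VARIANTS` list is copied from the table, while the variable names and the idea of deriving each delta from the spelling lengths are assumptions of the example, not a description of our build.

```python
# (verse, MAM spelling, Koren spelling), copied from the table of 9 variants.
VARIANTS = [
    ("Genesis 4:13", "מנשא", "מנשוא"),
    ("Genesis 7:11", "מעינת", "מעינות"),
    ("Genesis 9:29", "ויהיו", "ויהי"),
    ("Exodus 25:31", "תעשה", "תיעשה"),
    ("Exodus 28:26", "האפד", "האפוד"),
    ("Numbers 1:17", "בשמת", "בשמות"),
    ("Numbers 10:10", "חדשיכם", "חדשכם"),
    ("Numbers 22:5", "בער", "בעור"),
    ("Deuteronomy 23:2", "דכא", "דכה"),
]

MAM_TOTAL = 304_801  # letter count of the MAM Torah, as stated in this article

# The spellings carry no vowel points, so len() counts letters.
deltas = [len(koren) - len(mam) for _, mam, koren in VARIANTS]
gained = sum(d for d in deltas if d > 0)   # letters Koren adds
lost = -sum(d for d in deltas if d < 0)    # letters Koren drops
koren_total = MAM_TOTAL + sum(deltas)

print(f"+{gained} -{lost} -> {koren_total:,}")  # +6 -2 -> 304,805
```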

How we verify it (and how you can verify it yourself)

  • Letter-by-letter collation against Rips's text. We compared our complete corpus, all 5,847 verses, against the Koren text distributed by the TorahBibleCodes project (a file derived from the text used by Eliyahu Rips). Result: complete identity, 304,805 letters out of 304,805.
  • Frozen checksums. The letter stream of each book has a SHA-256 fingerprint recorded in the repository. Every build of the site re-verifies all 39 books of the Tanakh against those fingerprints; if a single letter changed, the site would not be published.
  • Canonical counts as an invariant. Genesis 78,064 · Exodus 63,529 · Leviticus 44,790 · Numbers 63,530 · Deuteronomy 54,892. Any deviation halts the build.
  • Ketiv, not qere. Where the Masorah prescribes writing one thing and reading another, the corpus contains exclusively what is written in the scroll — the convention of all ELS research.
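
The count and checksum checks can be sketched as follows. This is a hedged illustration rather than our build script: the function names, the choice to hash the letter stream as UTF-8 bytes, and the toy "book" are assumptions; only the per-book counts and the total come from the list above.

```python
import hashlib

# Per-book letter counts, as listed above.
CANONICAL_COUNTS = {
    "Genesis": 78_064,
    "Exodus": 63_529,
    "Leviticus": 44_790,
    "Numbers": 63_530,
    "Deuteronomy": 54_892,
}
TORAH_TOTAL = 304_805


def fingerprint(letters: str) -> str:
    # Assumption: the stream is hashed as the UTF-8 bytes of the bare letters.
    return hashlib.sha256(letters.encode("utf-8")).hexdigest()


def verify_book(name: str, letters: str, expected_count: int, expected_sha256: str) -> None:
    """Raise ValueError (halting a build) if the count or the fingerprint deviates."""
    if len(letters) != expected_count:
        raise ValueError(f"{name}: {len(letters)} letters, expected {expected_count}")
    if fingerprint(letters) != expected_sha256:
        raise ValueError(f"{name}: SHA-256 mismatch")


# The five per-book counts must add up to the Torah total.
if sum(CANONICAL_COUNTS.values()) != TORAH_TOTAL:
    raise ValueError("per-book counts do not add up to the Torah total")

# Toy demonstration with a six-letter stand-in for a book (hypothetical data).
toy = "בראשית"
frozen = fingerprint(toy)
verify_book("toy", toy, len(toy), frozen)           # passes silently
try:
    verify_book("toy", "בראשיה", len(toy), frozen)  # same length, one letter changed
except ValueError as err:
    print(err)                                      # toy: SHA-256 mismatch
```

The count check catches an added or dropped letter; the fingerprint also catches a substitution that leaves the count unchanged, such as variant 9 in the table.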

What about the rest of the Tanakh?

Outside the Torah there is no "Koren reference text" in the codes literature (the WRR experiment was performed on Genesis). For Nevi'im and Ketuvim we use the frozen MAM edition — same open source, same checksums, same guarantee of reproducibility. The complete table of counts per book is in our article on the numbers of the Tanakh.

The scroll you see on screen

The Sefer Torah viewer (the amudim with the traditional stichography) uses data from tikkun.io, whose text follows the Ben Asher tradition. Our alignment index walks both texts letter by letter and verifies on every build that they differ in exactly the 9 variants in the table — not one more. At those 9 points, the highlighting anchors to the neighboring letter of the scroll; at the other 304,796 letters, the correspondence is exact.
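
One way to picture such an alignment index is sketched below. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for the letter-by-letter walk; that substitution, the function names, and the toy strings are assumptions of the example. On full-length texts a purpose-built walk, or a verse-by-verse diff, would likely be faster.

```python
from difflib import SequenceMatcher


def align(corpus: str, scroll: str) -> tuple[list[int], int]:
    """Map every corpus index to a scroll index and count the variant points."""
    matcher = SequenceMatcher(None, corpus, scroll, autojunk=False)
    mapping = [0] * len(corpus)
    variants = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for k in range(i2 - i1):
                mapping[i1 + k] = j1 + k            # exact correspondence
            continue
        variants += 1
        for i in range(i1, i2):
            # No exact counterpart: anchor to a neighboring scroll letter.
            anchor = min(j1 + (i - i1), max(j1, j2 - 1))
            mapping[i] = min(anchor, len(scroll) - 1)
    return mapping, variants


def verify_alignment(corpus: str, scroll: str, expected_variants: int) -> list[int]:
    mapping, variants = align(corpus, scroll)
    if variants != expected_variants:
        raise ValueError(f"{variants} variant points, expected {expected_variants}")
    return mapping


# Toy texts (hypothetical): one omitted letter, one extra letter, one substitution.
corpus = "ABCDEFGHIJKL"
scroll = "ABDEFGxHIJKM"
mapping = verify_alignment(corpus, scroll, expected_variants=3)
print(mapping)
```

In this toy case the corpus letter `C` has no counterpart in the scroll, so its highlight falls on the neighboring `D`; every letter inside an equal run maps exactly.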