Every quantitative study of the Tanakh — and Equidistant Letter Sequences are exactly that — stands or falls on a prior question: precisely which text are you measuring? This page answers that question for our search engine with single-letter precision: we document the full source chain, declare the counting rules, publish the complete table of all 39 books, and validate the total against the number the soferim have guarded for centuries.
The source, documented link by link
Our search engine does not use "some Hebrew text from the internet". It uses a specific edition with a public editorial history:
- Edition: Miqra According to the Masorah (MAM, מקרא על פי המסורה) — a digital edition of the Tanakh based on the Aleppo Codex (כתר ארם צובא, the most authoritative Masoretic manuscript, vocalized by Aharon ben Asher, 10th c.) and related manuscripts for the missing portions. Every editorial decision in MAM is publicly documented.
- Where it lives: MAM is developed and maintained at Hebrew Wikisource, under an open CC-BY-SA license.
- How it reaches us: Sefaria distributes MAM as its standard Hebrew version of the Tanakh. This is not an assumption: Sefaria's own API declares it in its metadata (`heVersionTitle`: "Miqra according to the Masorah" and `heVersionSource` pointing to the project page on Wikisource). Anyone can check the raw API response.
- Our search engine loads that text, book by book, directly from the Sefaria API, so the chain is the same from end to end.
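The metadata check can be sketched in a few lines of Python. This is a minimal illustration, not the search engine's actual code: the URL pattern and the helper names `fetch_chapter` and `is_mam` are assumptions, and only the field names `heVersionTitle` and `heVersionSource` come from the description above.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: URL pattern of Sefaria's public text endpoint; check the current API docs.
API_URL = "https://www.sefaria.org/api/texts/{ref}?context=0"

EXPECTED_TITLE = "miqra according to the masorah"


def fetch_chapter(ref: str) -> dict:
    """Download the raw JSON for a reference such as 'Genesis.1' (needs network access)."""
    url = API_URL.format(ref=urllib.parse.quote(ref))
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def is_mam(payload: dict) -> bool:
    """True when the payload's metadata names MAM and points to Wikisource."""
    title = str(payload.get("heVersionTitle", "")).lower()
    source = str(payload.get("heVersionSource", "")).lower()
    return title.startswith(EXPECTED_TITLE) and "wikisource" in source
```

Calling `is_mam(fetch_chapter("Genesis.1"))` should return `True` for as long as the chain described here holds. If it returns `False`, the default Hebrew version has changed and the counts on this page would need to be rechecked.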
Counting methodology (the exact rules)
Tanakh counts are only comparable when the rules are declared. Ours, applied identically in this table and in the search engine's corpus:
- Letter: any character of the Hebrew alphabet (Unicode U+05D0 to U+05EA: the 22 letters plus the 5 final forms ך ם ן ף ץ). Vocalization (nikkud) and cantillation (te'amim) marks are not letters and are not counted.
- Written text (ketiv), not read text (qere): where the Masorah records a word written one way and read another, we count exclusively the ketiv — what is physically written in the scroll. This is the convention of classical ELS research.
- No editorial apparatus: MAM's footnotes (manuscript variants), the section markers {פ}/{ס} and all signaling elements are excluded: they are not letters of the Tanakh.
- Word: a sequence of Hebrew letters delimited by whitespace or maqaf (־). That is, words joined by maqaf count separately.
- Verse: each verse of the edition's standard Masoretic division.
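The letter and word rules above can be sketched as follows. This is a hedged illustration rather than the engine's real implementation: the markup patterns (`FOOTNOTE`, `HTML_TAG`) are assumptions about how the apparatus appears in the downloaded text, and the removal of the duplicated qere is left out because it depends entirely on how the edition marks it.

```python
import re

HEBREW_LETTER = re.compile(r"[\u05D0-\u05EA]")  # 22 letters plus 5 final forms
MAQAF = "\u05BE"

# ASSUMPTIONS about the markup of the apparatus; adapt to the text you receive.
FOOTNOTE = re.compile(r'<i class="footnote">.*?</i>', re.DOTALL)
HTML_TAG = re.compile(r"<[^>]+>")
SECTION_MARKER = re.compile(r"\{[\u05E4\u05E1]\}")  # {פ} and {ס}


def clean(verse: str) -> str:
    """Drop footnotes, remaining HTML tags and section markers."""
    verse = FOOTNOTE.sub("", verse)
    # Tags are removed without inserting a space so a tag inside a word does not split it.
    verse = HTML_TAG.sub("", verse)
    return SECTION_MARKER.sub(" ", verse)


def count_letters(verse: str) -> int:
    """Count Hebrew letters only; nikkud and te'amim fall outside the range."""
    return len(HEBREW_LETTER.findall(clean(verse)))


def count_words(verse: str) -> int:
    """Split on whitespace or maqaf and keep tokens containing a Hebrew letter."""
    tokens = re.split(rf"[\s{MAQAF}]+", clean(verse))
    return sum(1 for token in tokens if HEBREW_LETTER.search(token))


genesis_1_1 = "בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ׃"
print(count_letters(genesis_1_1), count_words(genesis_1_1))  # 28 7
```

Requiring at least one Hebrew letter per token means that stand-alone punctuation such as a paseq never counts as a word.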
The complete table
| # | Book | Verses | Words | Letters |
|---|---|---|---|---|
| 1 | Genesis בראשית | 1,533 | 20,612 | 78,063 |
| 2 | Exodus שמות | 1,210 | 16,713 | 63,527 |
| 3 | Leviticus ויקרא | 859 | 11,950 | 44,790 |
| 4 | Numbers במדבר | 1,288 | 16,408 | 63,529 |
| 5 | Deuteronomy דברים | 956 | 14,294 | 54,892 |
| | Torah (5 books) | 5,846 | 79,977 | 304,801 |
| 6 | Joshua יהושע | 656 | 10,031 | 39,730 |
| 7 | Judges שופטים | 618 | 9,885 | 38,952 |
| 8 | I Samuel שמואל א | 811 | 13,261 | 51,357 |
| 9 | II Samuel שמואל ב | 695 | 11,033 | 42,179 |
| 10 | I Kings מלכים א | 817 | 13,140 | 50,625 |
| 11 | II Kings מלכים ב | 719 | 12,273 | 47,822 |
| 12 | Isaiah ישעיהו | 1,291 | 16,925 | 66,874 |
| 13 | Jeremiah ירמיהו | 1,364 | 21,831 | 84,899 |
| 14 | Ezekiel יחזקאל | 1,273 | 18,730 | 74,511 |
| 15 | Hosea הושע | 197 | 2,381 | 9,389 |
| 16 | Joel יואל | 73 | 957 | 3,872 |
| 17 | Amos עמוס | 146 | 2,042 | 8,034 |
| 18 | Obadiah עובדיה | 21 | 291 | 1,119 |
| 19 | Jonah יונה | 48 | 688 | 2,700 |
| 20 | Micah מיכה | 105 | 1,396 | 5,571 |
| 21 | Nahum נחום | 47 | 558 | 2,255 |
| 22 | Habakkuk חבקוק | 56 | 671 | 2,596 |
| 23 | Zephaniah צפניה | 53 | 767 | 2,995 |
| 24 | Haggai חגי | 38 | 600 | 2,336 |
| 25 | Zechariah זכריה | 211 | 3,127 | 12,433 |
| 26 | Malachi מלאכי | 55 | 876 | 3,450 |
| | Neviʼim, Prophets (21 books) | 9,294 | 141,463 | 553,699 |
| 27 | Psalms תהלים | 2,527 | 19,583 | 78,822 |
| 28 | Proverbs משלי | 915 | 6,915 | 26,500 |
| 29 | Job איוב | 1,070 | 8,340 | 31,851 |
| 30 | Song of Songs שיר השירים | 117 | 1,250 | 5,141 |
| 31 | Ruth רות | 85 | 1,294 | 4,949 |
| 32 | Lamentations איכה | 154 | 1,542 | 5,974 |
| 33 | Ecclesiastes קהלת | 222 | 2,987 | 10,968 |
| 34 | Esther אסתר | 167 | 3,045 | 12,110 |
| 35 | Daniel דניאל | 357 | 5,923 | 24,280 |
| 36 | Ezra עזרא | 280 | 3,754 | 15,762 |
| 37 | Nehemiah נחמיה | 405 | 5,312 | 22,507 |
| 38 | I Chronicles דברי הימים א | 943 | 10,740 | 44,559 |
| 39 | II Chronicles דברי הימים ב | 822 | 13,315 | 54,917 |
| | Ketuvim, Writings (13 books) | 8,064 | 84,000 | 338,340 |
| | COMPLETE TANAKH (39 books) | 23,204 | 305,440 | 1,196,840 |
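The internal consistency of the table is easy to check. The short sketch below copies the Torah rows and the three section subtotals from the table and confirms that they add up; the helper name `column_sums` is ours and purely illustrative.

```python
# (verses, words, letters), copied from the table above
TORAH_BOOKS = {
    "Genesis": (1533, 20612, 78063),
    "Exodus": (1210, 16713, 63527),
    "Leviticus": (859, 11950, 44790),
    "Numbers": (1288, 16408, 63529),
    "Deuteronomy": (956, 14294, 54892),
}
SUBTOTALS = {
    "Torah": (5846, 79977, 304801),
    "Neviim": (9294, 141463, 553699),
    "Ketuvim": (8064, 84000, 338340),
}
GRAND_TOTAL = (23204, 305440, 1196840)


def column_sums(rows):
    """Add up (verses, words, letters) tuples column by column."""
    return tuple(sum(column) for column in zip(*rows))


torah_ok = column_sums(TORAH_BOOKS.values()) == SUBTOTALS["Torah"]
total_ok = column_sums(SUBTOTALS.values()) == GRAND_TOTAL
print(torah_ok, total_ok)  # True True
```

The same function works for the Neviʼim and Ketuvim rows if you add them in the same shape.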
The validation: why these numbers are trustworthy
Any table can be copied; a scientific table is validated. The soferic tradition (the scribes who copy scrolls letter by letter) has guarded the Sefer Torah count for centuries: 304,805 letters. Our computed Torah count is 304,801, a difference of exactly 4 letters, or about 0.0013%.
And here is the decisive point: that difference is not an error — it is a signature of authenticity. The number 304,805 corresponds to the orthography of today's standard scrolls (a tradition consolidated by the late printed editions), while Ben Asher's Aleppo Codex differs from those scrolls in a handful of documented plene/defective spelling cases (מלא/חסר). A corrupt or careless digital text would drift by hundreds or thousands of letters; a critical edition faithful to the Aleppo Codex drifts from the standard scroll by exactly this tiny, explainable order of magnitude. The validations converge:
- Torah letters: 304,801 (MAM/Aleppo) vs 304,805 (standard scrolls) — Δ of 4 letters, consistent with the documented plene/defective differences between traditions.
- Torah words: 79,977 — the commonly cited reference count is 79,976 (Δ = 1, attributable to a borderline word-division case between editions).
- Torah verses: 5,846 — exactly the count of modern Masoretic editions.
- Tanakh verses: 23,204 — within the transmitted Masoretic range (~23,200).
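The arithmetic behind the letter and word comparisons is simple enough to reproduce. The sketch below uses only figures quoted on this page; the function name `delta` is illustrative.

```python
SCROLL_LETTERS = 304_805   # traditional Sefer Torah count cited above
MAM_LETTERS = 304_801      # Torah subtotal from the table
REFERENCE_WORDS = 79_976   # commonly cited word count
MAM_WORDS = 79_977         # Torah subtotal from the table


def delta(computed: int, reference: int) -> tuple[int, float]:
    """Return the signed difference and its size as a percentage of the reference."""
    diff = computed - reference
    return diff, abs(diff) / reference * 100


letter_diff, letter_pct = delta(MAM_LETTERS, SCROLL_LETTERS)
word_diff, word_pct = delta(MAM_WORDS, REFERENCE_WORDS)
print(letter_diff, f"{letter_pct:.4f}%")  # -4 0.0013%
print(word_diff, f"{word_pct:.4f}%")      # 1 0.0013%
```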
Why do other sites publish different numbers?
Compare "Bible letter count" tables online and you will find variation. It is almost always explained by four rarely-declared factors: the base edition (Aleppo, Leningrad, Koren, printed editions — they differ in plene/defective spelling), the treatment of ketiv/qere (do they count what is written, what is read, or both?), the word rule (does maqaf join or separate?), and silent contamination (editorial notes, markers and formatting characters counted as text). Our table declares all four decisions — which is why every number is defensible and reproducible.
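The word rule alone is enough to change a count. As a small illustration (the function and its `maqaf_separates` flag are hypothetical, written only for this example), the same three-word phrase from Genesis 1:2 yields two different totals depending on how maqaf is treated:

```python
import re

MAQAF = "\u05BE"
HEBREW_LETTER = re.compile(r"[\u05D0-\u05EA]")


def count_words(text: str, maqaf_separates: bool) -> int:
    """Count words, treating maqaf either as a separator or as part of the word."""
    separators = rf"[\s{MAQAF}]+" if maqaf_separates else r"\s+"
    tokens = re.split(separators, text)
    return sum(1 for token in tokens if HEBREW_LETTER.search(token))


phrase = "על־פני תהום"  # from Genesis 1:2, written here without vocalization

print(count_words(phrase, maqaf_separates=True))   # 3 (the rule used on this page)
print(count_words(phrase, maqaf_separates=False))  # 2
```

Applied across a whole book, this single choice shifts the word total by the number of maqaf-joined pairs, while the letter total is unaffected.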
In fact, preparing this study led us to improve the search engine itself: we detected that MAM's editorial footnotes and the duplicated qere were inflating the corpus by about 5,600 letters (~0.5%), and we fixed it. The current corpus is exactly the written text — the figures in this table.
Reproduce it yourself
Don't take our word for it: open the search engine, load any book (or the complete Tanakh) and compare the total letters reported in the information panel with this table. They match because they are the same count over the same text. The full procedure — source, cleaning, rules — is described above, and any programmer can replicate it against Sefaria's public API in an afternoon.
Note: the counts correspond to the MAM edition as distributed by the Sefaria API on the publication date of this article. MAM is a living edition with documented editorial corrections; future changes would be on the order of individual letters.