Every quantitative study of the Tanakh (and research on Equidistant Letter Sequences, or ELS, is exactly that) stands or falls on a prior question: precisely which text are you measuring? This page answers that question for our search engine with single-letter precision: we document the full source chain, declare the counting rules, publish the complete table of all 39 books, and validate the Torah total against the number the soferim have guarded for centuries.

The source, documented link by link

Our search engine does not use "some Hebrew text from the internet". It uses a specific edition with a public editorial history:

  • Edition: Miqra According to the Masorah (MAM, מקרא על פי המסורה) — a digital edition of the Tanakh based on the Aleppo Codex (כתר ארם צובא, the most authoritative Masoretic manuscript, vocalized by Aharon ben Asher, 10th c.) and related manuscripts for the missing portions. Every editorial decision in MAM is publicly documented.
  • Where it lives: MAM is developed and maintained at Hebrew Wikisource, under an open CC-BY-SA license.
  • How it reaches us: Sefaria distributes MAM as its standard Hebrew version of the Tanakh. This is not an assumption: Sefaria's own API declares it in its metadata (heVersionTitle: "Miqra according to the Masorah" and heVersionSource pointing to the project page on Wikisource). Anyone can check the raw API response.
  • Our search engine loads that text, book by book, directly from the Sefaria API — the same chain end to end.
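The metadata check described above can be scripted. The sketch below is a minimal illustration, assuming Sefaria's v1 texts endpoint (`https://www.sefaria.org/api/texts/{ref}`) and the `heVersionTitle`/`heVersionSource` fields named on this page. The helper names `fetch_text` and `is_mam` are hypothetical, and the endpoint path should be confirmed against Sefaria's current API documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint path and field names follow this page's description of
# the Sefaria metadata; verify them against Sefaria's API documentation.
SEFARIA_TEXTS_URL = "https://www.sefaria.org/api/texts/{ref}"
EXPECTED_TITLE = "Miqra according to the Masorah"


def fetch_text(ref: str) -> dict:
    """Download the raw JSON Sefaria returns for a reference such as 'Genesis.1'."""
    url = SEFARIA_TEXTS_URL.format(ref=urllib.parse.quote(ref))
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def is_mam(payload: dict) -> bool:
    """True if the response declares MAM, sourced from Wikisource, as its Hebrew version."""
    title_ok = payload.get("heVersionTitle") == EXPECTED_TITLE
    source_ok = "wikisource.org" in (payload.get("heVersionSource") or "")
    return title_ok and source_ok
```

For example, `is_mam(fetch_text("Genesis.1"))` should return `True` if the chain is as described. Only `is_mam` can be exercised offline; `fetch_text` needs network access.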

Counting methodology (the exact rules)

Tanakh counts are only comparable when the rules are declared. Ours, applied identically in this table and in the search engine's corpus:

  • Letter: any character of the Hebrew alphabet (Unicode U+05D0 to U+05EA: the 22 letters plus the 5 final forms ך ם ן ף ץ). Vocalization (nikkud) and cantillation (te'amim) marks are not letters and are not counted.
  • Written text (ketiv), not read text (qere): where the Masorah records a word written one way and read another, we count exclusively the ketiv — what is physically written in the scroll. This is the convention of classical ELS research.
  • No editorial apparatus: MAM's footnotes (manuscript variants), the section markers {פ}/{ס} (whose פ and ס would otherwise be counted as letters) and all other editorial markup are excluded; they are not letters of the Tanakh.
  • Word: a sequence of Hebrew letters delimited by whitespace or maqaf (־). That is, words joined by maqaf count separately.
  • Verse: each verse of the edition's standard Masoretic division.
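The rules above translate almost directly into code. The following Python sketch applies them to a single verse string. The letter range and the maqaf rule come straight from this page; the HTML patterns for footnotes, and the assumption that the qere appears in [square brackets] while the ketiv stays in (parentheses), are hypothetical and should be checked against a raw Sefaria response before use. All function names are illustrative.

```python
import html
import re

# Hypothetical markup patterns: the exact HTML Sefaria uses for MAM footnotes
# and ketiv/qere may differ; adjust after inspecting a raw API response.
FOOTNOTE_RE = re.compile(r'<i class="footnote">.*?</i>', re.DOTALL)
MARKER_RE = re.compile(r"<sup[^>]*>.*?</sup>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
QERE_RE = re.compile(r"\[[^\]]*\]")              # assumption: qere in [brackets]
SECTION_RE = re.compile(r"\{[\u05E4\u05E1]\}")   # the {פ} and {ס} section markers
LETTER_RE = re.compile(r"[\u05D0-\u05EA]")       # 22 letters + 5 final forms
WORD_SPLIT_RE = re.compile(r"[\s\u05BE]+")       # whitespace or maqaf


def clean_verse(raw: str) -> str:
    """Reduce one raw verse to its written text (ketiv), without apparatus."""
    text = FOOTNOTE_RE.sub("", raw)   # footnote bodies contain Hebrew: drop first
    text = MARKER_RE.sub("", text)    # footnote markers
    text = TAG_RE.sub("", text)       # any remaining HTML tags
    text = html.unescape(text)        # entities such as &nbsp;
    text = QERE_RE.sub("", text)      # keep only the ketiv
    return SECTION_RE.sub("", text)   # {פ}/{ס} would otherwise add letters


def count_letters(text: str) -> int:
    # Nikkud and te'amim (U+0591..U+05C7) fall outside the range and are skipped.
    return len(LETTER_RE.findall(text))


def count_words(text: str) -> int:
    # A token counts as a word only if it contains at least one Hebrew letter,
    # so isolated punctuation such as paseq is not counted.
    return sum(1 for tok in WORD_SPLIT_RE.split(text) if LETTER_RE.search(tok))
```

Removing the {פ}/{ס} markers before counting matters because פ and ס fall inside the counted Unicode range; leaving them in would add one letter per section break.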

The complete table

# | Book | Verses | Words | Letters
1 | Genesis בראשית | 1,533 | 20,612 | 78,063
2 | Exodus שמות | 1,210 | 16,713 | 63,527
3 | Leviticus ויקרא | 859 | 11,950 | 44,790
4 | Numbers במדבר | 1,288 | 16,408 | 63,529
5 | Deuteronomy דברים | 956 | 14,294 | 54,892
Total | Torah (5 books) | 5,846 | 79,977 | 304,801
6 | Joshua יהושע | 656 | 10,031 | 39,730
7 | Judges שופטים | 618 | 9,885 | 38,952
8 | I Samuel שמואל א | 811 | 13,261 | 51,357
9 | II Samuel שמואל ב | 695 | 11,033 | 42,179
10 | I Kings מלכים א | 817 | 13,140 | 50,625
11 | II Kings מלכים ב | 719 | 12,273 | 47,822
12 | Isaiah ישעיהו | 1,291 | 16,925 | 66,874
13 | Jeremiah ירמיהו | 1,364 | 21,831 | 84,899
14 | Ezekiel יחזקאל | 1,273 | 18,730 | 74,511
15 | Hosea הושע | 197 | 2,381 | 9,389
16 | Joel יואל | 73 | 957 | 3,872
17 | Amos עמוס | 146 | 2,042 | 8,034
18 | Obadiah עובדיה | 21 | 291 | 1,119
19 | Jonah יונה | 48 | 688 | 2,700
20 | Micah מיכה | 105 | 1,396 | 5,571
21 | Nahum נחום | 47 | 558 | 2,255
22 | Habakkuk חבקוק | 56 | 671 | 2,596
23 | Zephaniah צפניה | 53 | 767 | 2,995
24 | Haggai חגי | 38 | 600 | 2,336
25 | Zechariah זכריה | 211 | 3,127 | 12,433
26 | Malachi מלאכי | 55 | 876 | 3,450
Total | Neviʼim (Prophets, 21 books) | 9,294 | 141,463 | 553,699
27 | Psalms תהלים | 2,527 | 19,583 | 78,822
28 | Proverbs משלי | 915 | 6,915 | 26,500
29 | Job איוב | 1,070 | 8,340 | 31,851
30 | Song of Songs שיר השירים | 117 | 1,250 | 5,141
31 | Ruth רות | 85 | 1,294 | 4,949
32 | Lamentations איכה | 154 | 1,542 | 5,974
33 | Ecclesiastes קהלת | 222 | 2,987 | 10,968
34 | Esther אסתר | 167 | 3,045 | 12,110
35 | Daniel דניאל | 357 | 5,923 | 24,280
36 | Ezra עזרא | 280 | 3,754 | 15,762
37 | Nehemiah נחמיה | 405 | 5,312 | 22,507
38 | I Chronicles דברי הימים א | 943 | 10,740 | 44,559
39 | II Chronicles דברי הימים ב | 822 | 13,315 | 54,917
Total | Ketuvim (Writings, 13 books) | 8,064 | 84,000 | 338,340
Total | COMPLETE TANAKH (39 books) | 23,204 | 305,440 | 1,196,840
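Because each subtotal is just a column sum, the table can check itself. This sketch transcribes the per-book figures and asserts that the books add up to their section subtotals and that the three sections add up to the grand total. The variable and function names are illustrative.

```python
# (book, verses, words, letters), transcribed from the table above.
BOOKS = {
    "Torah": [
        ("Genesis", 1533, 20612, 78063), ("Exodus", 1210, 16713, 63527),
        ("Leviticus", 859, 11950, 44790), ("Numbers", 1288, 16408, 63529),
        ("Deuteronomy", 956, 14294, 54892),
    ],
    "Neviim": [
        ("Joshua", 656, 10031, 39730), ("Judges", 618, 9885, 38952),
        ("I Samuel", 811, 13261, 51357), ("II Samuel", 695, 11033, 42179),
        ("I Kings", 817, 13140, 50625), ("II Kings", 719, 12273, 47822),
        ("Isaiah", 1291, 16925, 66874), ("Jeremiah", 1364, 21831, 84899),
        ("Ezekiel", 1273, 18730, 74511), ("Hosea", 197, 2381, 9389),
        ("Joel", 73, 957, 3872), ("Amos", 146, 2042, 8034),
        ("Obadiah", 21, 291, 1119), ("Jonah", 48, 688, 2700),
        ("Micah", 105, 1396, 5571), ("Nahum", 47, 558, 2255),
        ("Habakkuk", 56, 671, 2596), ("Zephaniah", 53, 767, 2995),
        ("Haggai", 38, 600, 2336), ("Zechariah", 211, 3127, 12433),
        ("Malachi", 55, 876, 3450),
    ],
    "Ketuvim": [
        ("Psalms", 2527, 19583, 78822), ("Proverbs", 915, 6915, 26500),
        ("Job", 1070, 8340, 31851), ("Song of Songs", 117, 1250, 5141),
        ("Ruth", 85, 1294, 4949), ("Lamentations", 154, 1542, 5974),
        ("Ecclesiastes", 222, 2987, 10968), ("Esther", 167, 3045, 12110),
        ("Daniel", 357, 5923, 24280), ("Ezra", 280, 3754, 15762),
        ("Nehemiah", 405, 5312, 22507), ("I Chronicles", 943, 10740, 44559),
        ("II Chronicles", 822, 13315, 54917),
    ],
}
SUBTOTALS = {
    "Torah": (5846, 79977, 304801),
    "Neviim": (9294, 141463, 553699),
    "Ketuvim": (8064, 84000, 338340),
}
GRAND_TOTAL = (23204, 305440, 1196840)


def column_sums(rows):
    """Add up each numeric column of an iterable of equal-length tuples."""
    return tuple(sum(col) for col in zip(*rows))


def check_table():
    for section, rows in BOOKS.items():
        computed = column_sums([row[1:] for row in rows])
        assert computed == SUBTOTALS[section], (section, computed)
    assert sum(len(rows) for rows in BOOKS.values()) == 39
    assert column_sums(SUBTOTALS.values()) == GRAND_TOTAL


check_table()
print("all subtotals and the grand total are consistent")
```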

The validation: why these numbers are trustworthy

Any table can be copied; a scientific table must be validated. The soferic tradition (the scribes who copy scrolls letter by letter) has guarded the Sefer Torah letter count for centuries: 304,805 letters. Our computed Torah count is 304,801, a difference of exactly 4 letters (about 0.0013%).

And here is the decisive point: that difference is not an error; it is a signature of authenticity. The number 304,805 corresponds to the orthography of today's standard scrolls (a tradition consolidated by the later printed editions), while Ben Asher's Aleppo Codex differs from those scrolls in a handful of documented plene/defective spelling cases (מלא/חסר). A corrupt or carelessly processed digital text would be expected to drift by hundreds or thousands of letters; a critical edition faithful to the Aleppo Codex drifts from the standard scroll by exactly this tiny, explainable order of magnitude. The validations converge:

  • Torah letters: 304,801 (MAM/Aleppo) vs 304,805 (standard scrolls) — Δ of 4 letters, consistent with the documented plene/defective differences between traditions.
  • Torah words: 79,977 — the commonly cited reference count is 79,976 (Δ = 1, attributable to a borderline word-division case between editions).
  • Torah verses: 5,846 — exactly the count of modern Masoretic editions.
  • Tanakh verses: 23,204 — within the transmitted Masoretic range (~23,200).
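The deltas quoted in this list are simple differences against the reference counts. A minimal sketch that recomputes them, using only the figures given on this page (the helper names are illustrative):

```python
# (ours, reference): "ours" are the MAM/Aleppo figures, references as quoted above.
COMPARISONS = {
    "Torah letters": (304_801, 304_805),
    "Torah words": (79_977, 79_976),
    "Torah verses": (5_846, 5_846),
}


def delta(ours: int, reference: int) -> int:
    """Signed difference in units (letters, words or verses)."""
    return ours - reference


def relative_diff_percent(ours: int, reference: int) -> float:
    """Signed difference as a percentage of the reference count."""
    return (ours - reference) / reference * 100


for label, (ours, ref) in COMPARISONS.items():
    print(f"{label}: {delta(ours, ref):+d} ({relative_diff_percent(ours, ref):+.4f}%)")
```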

Why do other sites publish different numbers?

Compare "Bible letter count" tables online and you will find variation. It is almost always explained by four rarely declared factors:

  • Base edition: Aleppo, Leningrad, Koren and the printed editions differ in plene/defective spelling.
  • Treatment of ketiv/qere: do they count what is written, what is read, or both?
  • Word rule: does maqaf join or separate words?
  • Silent contamination: editorial notes, markers and formatting characters counted as text.

Our table declares all four decisions, which is why every number is defensible and reproducible.
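Two of these four factors can be demonstrated on a single fragment. The sketch below counts an invented fragment (not a real verse) under the ketiv, qere and both-readings conventions, and with maqaf either separating or joining words. It assumes, hypothetically, that the ketiv is marked with (parentheses) and the qere with [square brackets]; the function names are illustrative.

```python
import re

LETTER = re.compile(r"[\u05D0-\u05EA]")
KETIV = re.compile(r"\([^)]*\)")   # assumption: ketiv marked in (parentheses)
QERE = re.compile(r"\[[^\]]*\]")   # assumption: qere marked in [brackets]


def select_reading(text: str, mode: str) -> str:
    """Keep the ketiv, the qere, or both, according to the chosen convention."""
    if mode == "ketiv":
        return QERE.sub("", text)
    if mode == "qere":
        return KETIV.sub("", text)
    if mode == "both":
        return text
    raise ValueError(f"unknown mode: {mode}")


def count_letters(text: str) -> int:
    return len(LETTER.findall(text))


def count_words(text: str, maqaf_separates: bool = True) -> int:
    # With maqaf_separates=False, only whitespace delimits words.
    pattern = r"[\s\u05BE]+" if maqaf_separates else r"\s+"
    return sum(1 for tok in re.split(pattern, text) if LETTER.search(tok))


sample = "(ויהי) [ויהיו] כל\u05BEהעם"  # invented fragment, not a real verse
for mode in ("ketiv", "qere", "both"):
    chosen = select_reading(sample, mode)
    print(mode, count_letters(chosen),
          count_words(chosen, maqaf_separates=True),
          count_words(chosen, maqaf_separates=False))
```

Even on this short fragment the letter count moves between 9, 10 and 14, and the word count between 2 and 4, depending on the conventions chosen.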

In fact, preparing this study led us to improve the search engine itself: we found that MAM's editorial footnotes and the duplicated qere were inflating the corpus by about 5,600 letters (~0.5%), and we removed them. The current corpus is exactly the written text, and its counts are the figures in this table.

Reproduce it yourself

Don't take our word for it: open the search engine, load any book (or the complete Tanakh) and compare the total letters reported in the information panel with this table. They match because they are the same count over the same text. The full procedure — source, cleaning, rules — is described above, and any programmer can replicate it against Sefaria's public API in an afternoon.
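For programmers replicating the count, the last step is aggregation: summing verses, words and letters over a whole book and comparing the result with the published row. This sketch assumes the book has already been fetched and cleaned into a list of chapters, each a list of verse strings (a hypothetical shape; adapt it to however you store the API output). `book_totals` and `diff_against_table` are illustrative names.

```python
import re

LETTER_RE = re.compile(r"[\u05D0-\u05EA]")   # 22 letters + 5 final forms
WORD_SPLIT_RE = re.compile(r"[\s\u05BE]+")   # whitespace or maqaf


def book_totals(chapters):
    """Return (verses, words, letters) for a book given as chapters of cleaned verses."""
    verses = words = letters = 0
    for chapter in chapters:
        for verse in chapter:
            verses += 1
            letters += len(LETTER_RE.findall(verse))
            words += sum(1 for tok in WORD_SPLIT_RE.split(verse) if LETTER_RE.search(tok))
    return verses, words, letters


def diff_against_table(computed, published):
    """Per-column difference, computed minus published; all zeros means a match."""
    return tuple(c - p for c, p in zip(computed, published))
```

With Genesis loaded this way, `diff_against_table(book_totals(chapters), (1533, 20612, 78063))` returning `(0, 0, 0)` would mean your count reproduces the table row exactly.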

Note: the counts correspond to the MAM edition as distributed by the Sefaria API on the publication date of this article. MAM is a living edition with documented editorial corrections; future changes would be on the order of individual letters.