The goal of this test was practical: to check whether a complex PDF document can be turned into a working LaTeX file of the book using DeepSeek OCR 2 locally, without external APIs.
The source document, a book on ship charters, had a dense structure: many headings and subheadings, and a large number of references and footnotes. The key question was not just whether the text could be recognized, but whether the structure could be preserved and the footnotes restored in a format suitable for further editing and translation.
If you need to digitize complex documents locally (books, regulations, contract materials) and get a working result for your business rather than "raw OCR", you can send us your task through the brief.
What the experiment checked
- Whether pages of a complex book can be recognized reliably in a local environment.
- Whether the OCR results can be merged so that both the structure and the footnotes are preserved.
- Whether the output can be a single LaTeX file of the book, ready for the next stage of work.
Result
The test succeeded: the final LaTeX file of the book was produced.
The hypothesis was confirmed: for this class of documents, a usable LaTeX file can be obtained through a local OCR pipeline plus post-processing.
The next stage of the same case is automated translation of the resulting LaTeX into Russian with a local Qwen 2.5 32B model, including a quality gate and hybrid final polishing; it is covered in a separate post on translation.
Machine and resources
- Machine: MacBook Pro M1, 32 GB RAM.
- Execution: inside Docker, in CPU mode.
- Inference mode: cpu, float32, attn_impl=eager.
Performance according to the run log
In the terminal log fragment for pages 0178 and 0179, processing a single page took roughly 18 to 21 minutes, and loading the model weights before processing took about 39 to 44 seconds.
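To put these per-page numbers in perspective, here is a small back-of-the-envelope sketch. The page count of 200 is purely hypothetical (the post does not state the length of the book), and the function name `estimate_hours` is introduced only for illustration. It assumes pages are processed sequentially and the weights are loaded once per run unless stated otherwise.

```python
def estimate_hours(pages: int, min_per_page: float,
                   load_seconds: float = 0.0,
                   reload_every_page: bool = False) -> float:
    """Rough wall-clock estimate, in hours, for a sequential page-by-page run."""
    # Weights are loaded once per run, or once per page if the process restarts each time.
    loads = pages if reload_every_page else 1
    total_seconds = pages * min_per_page * 60 + loads * load_seconds
    return total_seconds / 3600


PAGES = 200  # hypothetical page count, not taken from the log

low = estimate_hours(PAGES, 18, load_seconds=39)   # fastest observed page and load time
high = estimate_hours(PAGES, 21, load_seconds=44)  # slowest observed page and load time
print(f"{PAGES} pages: about {low:.0f} to {high:.0f} hours")
# prints: 200 pages: about 60 to 70 hours
```

At this speed the weight-loading time is negligible next to the per-page inference time, so the total scales almost linearly with the number of pages.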
Technical details
The document was processed page by page. Two OCR passes were run for each page:
- a structural pass (gives a cleaner text hierarchy), with the prompt: <image> <|grounding|> Convert the document to markdown.
- a full OCR pass (better at extracting the complete text, including footnotes), with the prompt: <image> Free OCR.
The results of the two passes were then merged: the structure was taken from the first, while footnote texts and lost fragments were pulled in from the second. After that, a single LaTeX file was generated, along with a separate list of places for manual review.
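The merge step could be sketched roughly as follows. This is not the actual post-processing code from the experiment, only a minimal illustration under stated assumptions: footnote markers look like `\( ^{12} \)`, footnote bodies are lines that start with a referenced number, `##` headings map to `\section*`, and the names `extract_footnotes` and `merge_page` are hypothetical.

```python
import re

# Footnote marker as emitted by both passes: \( ^{12} \) or \(^{12}\)
MARKER = re.compile(r"\s*\\\(\s*\^\{(\d+)\}\s*\\\)")
# Candidate footnote body: a line that starts with a short number, then text
FOOTNOTE_LINE = re.compile(r"^(\d{1,3})\s+(\S.*)$")
SPECIALS = re.compile(r"([&%#_$])")


def tex_escape(text: str) -> str:
    """Escape a few LaTeX special characters (a deliberately minimal set)."""
    return SPECIALS.sub(r"\\\1", text)


def extract_footnotes(free_ocr: str) -> dict[int, str]:
    """Collect footnote bodies from the Free OCR pass, keyed by number."""
    referenced = {int(n) for n in MARKER.findall(free_ocr)}
    notes = {}
    for line in free_ocr.splitlines():
        m = FOOTNOTE_LINE.match(line.strip())
        # Heuristic: only numbers that are referenced by a marker count as footnotes.
        if m and int(m.group(1)) in referenced:
            body = tex_escape(m.group(2))
            notes[int(m.group(1))] = re.sub(r"\*(.+?)\*", r"\\emph{\1}", body)
    return notes


def merge_page(structured: str, free_ocr: str) -> tuple[str, list[str]]:
    """Take structure from the grounded pass and footnote text from Free OCR."""
    notes = extract_footnotes(free_ocr)
    review, out = [], []

    def to_footnote(m: re.Match) -> str:
        n = int(m.group(1))
        if n in notes:
            return r"\footnote{" + notes[n] + "}"
        review.append(f"footnote {n}: marker found, text missing")
        return m.group(0)  # leave the marker untouched for manual review

    for line in structured.splitlines():
        line = line.strip()
        if not line or line.startswith("<|ref|>"):
            continue  # skip blank lines and grounding headers
        m = FOOTNOTE_LINE.match(line)
        if m and int(m.group(1)) in notes:
            continue  # footnote body is attached inline instead
        if line.startswith("## "):
            out.append(r"\section*{" + tex_escape(line[3:]) + "}")
        else:
            out.append(MARKER.sub(to_footnote, tex_escape(line)))
    return "\n\n".join(out), review


structured = "\n".join([
    "<|ref|>text<|/ref|><|det|>[[102, 79, 887, 145]]<|/det|>",
    r"As Lord Blackburn said in Rossiter v. Miller \( ^{13} \) :",
    "<|ref|>sub_title<|/ref|><|det|>[[103, 450, 810, 483]]<|/det|>",
    "## Matters which must be agreed",
])
free_ocr = "\n".join([
    r"As Lord Blackburn said in Rossiter v. Miller\(^{13}\):",
    "13 (1878) 3 App. Cas. 1124, 1151.",
])
latex, review = merge_page(structured, free_ocr)
print(latex)
print(review)
```

The number-at-line-start heuristic can misfire on body text that happens to begin with a referenced number, which is one reason a separate manual review list is useful.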
Model and exact revision: deepseek-ai/DeepSeek-OCR-2 @ aaa02f3811945a91062062994c5c4a3f4c0af2b0.
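Pinning the revision makes a run reproducible even if the repository is updated later. A minimal sketch of how such a pinned CPU load might look with Hugging Face Transformers is shown below. It assumes that the post's `attn_impl=eager` corresponds to the `attn_implementation="eager"` argument of `from_pretrained`; the helper names `load_kwargs` and `load_model` are hypothetical, and the sketch does not verify that the repository's custom code runs on CPU without modification.

```python
MODEL_ID = "deepseek-ai/DeepSeek-OCR-2"
REVISION = "aaa02f3811945a91062062994c5c4a3f4c0af2b0"  # exact commit from the post


def load_kwargs() -> dict:
    """Arguments shared by the pinned model load (mapping of attn_impl is an assumption)."""
    return {
        "revision": REVISION,           # pin the exact commit, not a moving branch
        "trust_remote_code": True,      # the repository ships its own model code
        "attn_implementation": "eager",
    }


def load_model():
    """Load tokenizer and model on CPU in float32. Heavy: downloads the weights."""
    # Imported lazily so the module can be inspected without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, revision=REVISION, trust_remote_code=True
    )
    model = AutoModel.from_pretrained(
        MODEL_ID, torch_dtype=torch.float32, **load_kwargs()
    )
    return tokenizer, model.eval().to("cpu")
```

Calling `load_model()` is what triggers the weight loading step mentioned in the run log.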
OCR output examples
Below are two fragments of actual recognition output for one of the pages: the structural Markdown output and the full Free OCR output.
Markdown recognition example
Comment: this mode preserves the document structure (headings and sections) better, but individual elements such as footnotes may be lost or simplified.
<|ref|>text<|/ref|><|det|>[[102, 79, 887, 145]]<|/det|>
not be underplayed. In Papas Olio JSC v. Grains & Fourrages, \( ^{12} \) Toulson L.J. said that, in most cases, the recap fulfils a dual function of confirming evidently the making of the oral agreement and also superseding the oral agreement by providing a document to which the parties can then look as the expression of their bargain. As Lord Blackburn said in Rossiter v. Miller \( ^{13} \) :
<|ref|>text<|/ref|><|det|>[[138, 156, 887, 315]]<|/det|>
It is a necessary part of the plaintiff’s case to show that the two parties had come to a final and complete agreement, for, if not, there was no contract. So long as they are only in negotiation either party may retract; and though the parties may have agreed on all the cardinal points of the intended contract, yet, if some particulars essential to the agreement still remain to be settled afterwards, there is no contract. The parties, in such a case, are still only in negotiation. But the mere fact that the parties have expressly stipulated that there shall afterwards be a formal agreement prepared, embodying the terms, which shall be signed by the parties does not, by itself, show that they continue merely in negotiation. It is a matter to be taken into account in construing the evidence and determining whether the parties have really come to a final agreement or not. But as soon as the fact is established of the final mutual assent of the parties so that those who draw up the formal agreement have not the power to vary the terms already settled, I think the contract is completed.
<|ref|>text<|/ref|><|det|>[[103, 327, 886, 360]]<|/det|>
1.4 Those particulars that are “essential to the agreement” and that must therefore be settled before a binding contract exists, may fall into two categories, namely:
<|ref|>text<|/ref|><|det|>[[103, 361, 885, 393]]<|/det|>
(i) terms that, if not settled, render the entire agreement unworkable, or void for uncertainty, with the result that the court is unable to enforce it, whatever the parties may have intended;
<|ref|>text<|/ref|><|det|>[[103, 394, 885, 427]]<|/det|>
(ii) terms, the agreement upon which is regarded by the parties themselves as an essential prerequisite of the making of a binding contract. \( ^{14} \)
<|ref|>sub_title<|/ref|><|det|>[[103, 450, 810, 483]]<|/det|>
## Matters which must be agreed if the contract is not to be unworkable or void for uncertainty
<|ref|>text<|/ref|><|det|>[[102, 491, 886, 557]]<|/det|>
1.5 As Bingham J. said in Pagnan v. Feed Products, \( ^{15} \) “the parties are to be regarded as masters of their contractual fate”, and it is primarily up to them whether agreement upon any particular matter is to be a prerequisite of a binding contract. However, the issue is not subjective, as noted by Lord Clarke \( ^{16} \) :
<|ref|>text<|/ref|><|det|>[[137, 568, 887, 686]]<|/det|>
The general principles are not in doubt. Whether there is a binding contract between the parties and if so, upon what terms depends on what they have agreed. It depends not upon their subjective state of mind, but upon a consideration of what was communicated between them by words or conduct and whether that leads objectively to a conclusion that they intended to create legal relations and had agreed upon all the terms which they regard or the law requires as essential for the formation of legally binding relations. Even if certain terms of economic or other significance to the parties have not been finalised, an objective appraisal of their words and conduct may lead to the conclusion that they did not intend agreement of such terms to be a pre-condition to a concluded and legally binding agreement.
<|ref|>text<|/ref|><|det|>[[104, 697, 844, 714]]<|/det|>
As Andrew Smith J. expressed it in Bear Stearns Bank plc v. Forum Global Equity Ltd \( ^{17} \) :
<|ref|>text<|/ref|><|det|>[[103, 729, 885, 755]]<|/det|>
12 [2010] 2 Lloyd’s Rep. 152, at para. 28 and see also TTMI Sarl v. Statoil ASA (The Sibolelle) [2011] 2 Lloyd’s Rep. 220, at para. 31.
<|ref|>text<|/ref|><|det|>[[130, 755, 366, 768]]<|/det|>
13 (1878) 3 App. Cas. 1124, 1151.
<|ref|>text<|/ref|><|det|>[[103, 768, 885, 793]]<|/det|>
14 See Pagnan v. Feed Products [1987] 2 Lloyd’s Rep. 601, 619; Spectra International v. Tiscali [2002] All E.R.(D) 209.
<|ref|>text<|/ref|><|det|>[[103, 794, 886, 845]]<|/det|>
15 Ibid. at p. 611. This is a description which the courts have repeatedly adopted: see, e.g., RTS Flexible Systems Ltd v. Molenski Alois Muller GmbH & Co. [2010] 1 W.L.R. 753 and Air Studios (Lyndhurst) Ltd v. Lombard North Central (T/A Air Entertainment Group) [2013] 1 Lloyd’s Rep. 63, where Males J. set out the principles concerning the present issue with great clarity at paras 5–12.
<|ref|>text<|/ref|><|det|>[[103, 845, 885, 871]]<|/det|>
16 RTS Flexible Systems Ltd v. Molenski Alois Muller GmbH & Co. [2011] 1 W.L.R. 753; and see Barbudev v. Eurocom Cable Management Bulgaria EOOD [2012] 2 All E.R. (Comm) 963.
<|ref|>text<|/ref|><|det|>[[103, 871, 885, 897]]<|/det|>
17 [2007] EWHC 1576 (Comm), at para. 171; and the same judge in Macro Volatility Master Fund v. Rouvray [2009] 1 Lloyd’s Rep. 475, at para. 223.
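The grounded output above alternates a header line (a label plus a bounding box) with the recognized text. A minimal sketch of parsing it into records is shown below; the `Block` class and `parse_grounded` function are hypothetical names, and the format is inferred only from the sample above, not from the model's documentation.

```python
import json
import re
from dataclasses import dataclass

# Matches one header line such as:
# <|ref|>text<|/ref|><|det|>[[102, 79, 887, 145]]<|/det|>
HEADER = re.compile(
    r"<\|ref\|>(?P<label>.*?)<\|/ref\|><\|det\|>(?P<boxes>\[\[.*?\]\])<\|/det\|>"
)


@dataclass
class Block:
    label: str   # e.g. "text" or "sub_title"
    boxes: list  # list of [x1, y1, x2, y2] boxes as emitted by the model
    text: str    # recognized content that follows the header line


def parse_grounded(raw: str) -> list[Block]:
    """Split grounded output into blocks: each header line starts a new block."""
    blocks: list[Block] = []
    current = None
    for line in raw.splitlines():
        line = line.strip()
        m = HEADER.match(line)
        if m:
            current = Block(m["label"], json.loads(m["boxes"]), "")
            blocks.append(current)
        elif current is not None and line:
            # Content lines are appended to the most recent block.
            current.text = (current.text + "\n" + line).strip()
    return blocks


sample = "\n".join([
    "<|ref|>sub_title<|/ref|><|det|>[[103, 450, 810, 483]]<|/det|>",
    "## Matters which must be agreed",
    "<|ref|>text<|/ref|><|det|>[[130, 755, 366, 768]]<|/det|>",
    "13 (1878) 3 App. Cas. 1124, 1151.",
])
blocks = parse_grounded(sample)
for b in blocks:
    print(b.label, b.boxes[0], b.text)
```

Keeping the label and box next to the text makes it possible to tell headings from body text and, for example, to treat blocks near the bottom of the page as footnote candidates.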
Free OCR recognition example
Comment: this mode extracts the full text better, including footnotes, but the structure (hierarchy of headings and sections) is usually less tidy.
not be underplayed. In Papas Olio JSC v. Grains & Fourrages,\(^{12}\) Toulson L.J. said that, in most cases, the recap fulfils a dual function of confirming evidentially the making of the oral agreement and also superseding the oral agreement by providing a document to which the parties can then look as the expression of their bargain. As Lord Blackburn said in Rossiter v. Miller\(^{13}\):
It is a necessary part of the plaintiff’s case to show that the two parties had come to a final and complete agreement, for, if not, there was no contract. So long as they are only in negotiation either party may retract; and though the parties may have agreed on all the cardinal points of the intended contract, yet, if some particulars essential to the agreement still remain to be settled afterwards, there is no contract. The parties, in such a case, are still only in negotiation. But the mere fact that the parties have expressly stipulated that there shall afterwards be a formal agreement prepared, embodying the terms, which shall be signed by the parties does not, by itself, show that they continue merely in negotiation. It is a matter to be taken into account in construing the evidence and determining whether the parties have really come to a final agreement or not. But as soon as the fact is established of the final mutual assent of the parties so that those who draw up the formal agreement have not the power to vary the terms already settled, I think the contract is completed.
1.4 Those particulars that are “essential to the agreement” and that must therefore be settled before a binding contract exists, may fall into two categories, namely:
(i) terms that, if not settled, render the entire agreement unworkable, or void for uncertainty, with the result that the court is unable to enforce it, whatever the parties may have intended;
(ii) terms, the agreement upon which is regarded by the parties themselves as an essential prerequisite of the making of a binding contract.\(^{14}\)
**Matters which must be agreed if the contract is not to be unworkable or void for uncertainty**
1.5 As Bingham J. said in *Pagnan v. Feed Products*,\(^{15}\) “the parties are to be regarded as masters of their contractual fate”, and it is primarily up to them whether agreement upon any particular matter is to be a prerequisite of a binding contract. However, the issue is not subjective, as noted by Lord Clarke\(^{16}\):
The general principles are not in doubt. Whether there is a binding contract between the parties and if so, upon what terms depends on what they have agreed. It depends not upon their subjective state of mind, but upon a consideration of what was communicated between them by words or conduct and whether that leads objectively to a conclusion that they intended to create legal relations and had agreed upon all the terms which they regard or the law requires as essential for the formation of legally binding relations. Even if certain terms of economic or other significance to the parties have not been finalised, an objective appraisal of their words and conduct may lead to the conclusion that they did not intend agreement of such terms to be a pre-condition to a concluded and legally binding agreement.
As Andrew Smith J. expressed it in *Bear Stearns Bank plc v. Forum Global Equity Ltd\(^{17}\)*:
12 [2010] 2 Lloyd’s Rep. 152, at para. 28 and see also *TTMI Sarl v. Statoil ASA (The Sibolelle)* [2011] 2 Lloyd’s Rep. 220, at para. 31.
13 (1878) 3 App. Cas. 1124, 1151.
14 See *Pagnan v. Feed Products* [1987] 2 Lloyd’s Rep. 601, 619; *Spectra International v. Tiscali* [2002] All E.R.(D) 209.
15 *Ibid.* at p. 611. This is a description which the courts have repeatedly adopted: see, e.g., *RTS Flexible Systems Ltd v. Molenski Alois Muller GmbH & Co.* [2010] 1 W.L.R. 753 and *Air Studios (Lyndhurst) Ltd v. Lombard North Central (T/A Air Entertainment Group)* [2013] 1 Lloyd’s Rep. 63, where Males J. set out the principles concerning the present issue with great clarity at paras 5–12.
16 *RTS Flexible Systems Ltd v. Molenski Alois Muller GmbH & Co.* [2009] 1 W.L.R. 753; and see *Barbudev v. Eurocom Cable Management Bulgaria EOOD* [2012] 2 All E.R. (Comm) 963.
17 [2007] EWHC 1576 (Comm), at para. 171; and the same judge in *Macro Volatility Master Fund v. Rouvray* [2009] 1 Lloyd’s Rep. 475, at para. 223.
- Attempts to improve the result by experimenting with prompts had no noticeable effect.
- In practice, two modes worked reliably: "convert to markdown" and "recognize everything" (Free OCR).
- The best practical result came not from a single "ideal" pass but from merging these two outputs.
- The resulting LaTeX is suitable as a base for the next stage: manual proofreading and translation into Russian.
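The two sample outputs for the same page disagree in a few words, for example "evidently" versus "evidentially" and "[2011]" versus "[2009]" in footnote 16. One possible way to feed such places into a manual review list is a word-level diff of the two passes. The sketch below uses `difflib` from the standard library; the function names `words` and `disagreements` are hypothetical, and this is not necessarily how the review list was built in the experiment.

```python
import difflib
import re


def words(text: str) -> list[str]:
    """Tokenize into words after removing formatting differences between passes."""
    # Unify footnote markers: \( ^{12} \) and \(^{12}\) both become [fn12]
    text = re.sub(r"\\\(\s*\^\{(\d+)\}\s*\\\)", r" [fn\1] ", text)
    # Drop markdown emphasis and heading characters
    text = re.sub(r"[*#]", "", text)
    return text.split()


def disagreements(a: str, b: str) -> list[tuple[str, str]]:
    """Return (pass A, pass B) word spans where the two OCR passes differ."""
    wa, wb = words(a), words(b)
    sm = difflib.SequenceMatcher(a=wa, b=wb, autojunk=False)
    return [
        (" ".join(wa[i1:i2]), " ".join(wb[j1:j2]))
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


structured = "16 RTS Flexible Systems Ltd v. Molenski Alois Muller GmbH & Co. [2011] 1 W.L.R. 753"
free_ocr = "16 *RTS Flexible Systems Ltd v. Molenski Alois Muller GmbH & Co.* [2009] 1 W.L.R. 753"
print(disagreements(structured, free_ocr))
# prints: [('[2011]', '[2009]')]
```

A disagreement only shows that at least one pass is wrong, not which one. In the samples above, footnote 15 cites the same case as [2010] in both passes, so a place like this still needs a human check against the source page.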