Calibre Fails to Convert PDF to AZW3 for Kindle

I am on KDE Plasma and have Calibre installed. I have successfully converted lots of PDF files and moved them to my Kindle.

Recently, every attempt to convert a PDF fails!

Does anybody know what can be done?
If I uninstall Calibre and then reinstall it, would that fix it, or might some leftover config file cause the same problem?

Thank you for your help.

I use neither Calibre nor Kindle, so I am not that well-suited to help you troubleshoot the issue.

A quick search reveals that Calibre has a CLI as well: ebook-convert

Just for the sake of testing, try that and see if it works:

ebook-convert some_file.pdf some_file.azw3


Thanks @pebcak
I tried the command but got:

  File "/usr/lib/calibre/calibre/ebooks/oeb/polish/container.py", line 42, in <module>
    from calibre.ebooks.oeb.polish.parsing import parse as parse_html_tweak
  File "/usr/lib/calibre/calibre/ebooks/oeb/polish/parsing.py", line 10, in <module>
    import html5_parser
  File "/usr/lib/python3.10/site-packages/html5_parser/__init__.py", line 31, in <module>
    raise RuntimeError(
RuntimeError: html5-parser and lxml are using different versions of libxml2. This happens commonly when using pip installed versions of lxml. Use pip install --no-binary lxml lxml instead. libxml2 versions: html5-parser: (2, 10, 3) != lxml: (2, 9, 14)
[limo@asus Downloads]$ 

I will see if I can fix it.
I did what the error message suggested, but got:

[limo@asus Downloads]$ pip install --no-binary lxml lxml
Defaulting to user installation because normal site-packages is not writeable
DEPRECATION: --no-binary currently disables reading from the cache of locally built wheels. In the future --no-binary will not influence the wheel cache. pip 23.1 will enforce this behaviour change. A possible replacement is to use the --no-cache-dir option. You can use the flag --use-feature=no-binary-enable-wheel-cache to test the upcoming behaviour. Discussion can be found at https://github.com/pypa/pip/issues/11453
Requirement already satisfied: lxml in /home/limo/.local/lib/python3.10/site-packages (4.9.2)
[limo@asus Downloads]$ 
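
The "Requirement already satisfied" line points at /home/limo/.local/lib/python3.10/site-packages, which suggests that a per-user pip copy of lxml is what Python picks up, and that pip did not rebuild anything. A minimal Python sketch for checking where a module would be loaded from, using only the standard library. The helper name `module_origin` is made up for illustration:

```python
import importlib.util
import site


def module_origin(name):
    """Return (path, is_user_install) for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # Per-user pip installs live under the user site-packages directory.
    user_site = site.getusersitepackages()
    return spec.origin, spec.origin.startswith(user_site)


if __name__ == "__main__":
    for name in ("lxml", "html5_parser"):
        print(name, "->", module_origin(name))
```

If the second value is True for lxml but not for html5_parser, the two modules come from different places, which would fit the version mismatch in the error.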

Launch Calibre from the terminal, then convert a file and check the output in the terminal.

The output might help some forum mate pin the issue down.

[limo@asus Downloads]$ calibre
Convert book 1 of 1 (Buddhist Economic Thoughts Buddhist Economic Thoughts ( PDFDrive ))
Conversion options changed from defaults:
  read_metadata_from_opf: '/tmp/calibre_6.15.1_tmp_si6lga1q/5ivi8zau.opf'
  cover: '/tmp/calibre_6.15.1_tmp_si6lga1q/fcas27tl.jpeg'
  verbose: 2
Resolved conversion options
calibre version: 6.15.1
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0.0,
 'book_producer': None,
 'change_justification': 'original',
 'chapter': "//*[((name()='h1' or name()='h2') and re:test(., "
            "'\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', "
            "'i')) or @class = 'chapter']",
 'chapter_mark': 'pagebreak',
 'comments': None,
 'cover': '/tmp/calibre_6.15.1_tmp_si6lga1q/fcas27tl.jpeg',
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,

'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: PDF Input running
on /tmp/calibre_6.15.1_tmp_si6lga1q/no4j_cx3.pdf
Converting file to html...
pdftohtml log:
Page-1
Page-2
Page-3
Page-4
Page-5
Page-6

Retrieving document metadata...
Generating manifest...
Rendering manifest...
Parsing all content...
Parsing index.html ...
Generating default TOC from spine...
Traceback (most recent call last):
  File "/usr/bin/calibre-parallel", line 21, in <module>
    sys.exit(main())
  File "/usr/lib/calibre/calibre/utils/ipc/worker.py", line 215, in main
    result = func(*args, **kwargs)
  File "/usr/lib/calibre/calibre/gui2/convert/gui_conversion.py", line 38, in gui_convert_override
    gui_convert(input, output, recommendations, notification=notification,
  File "/usr/lib/calibre/calibre/gui2/convert/gui_conversion.py", line 25, in gui_convert
    plumber.run()
  File "/usr/lib/calibre/calibre/ebooks/conversion/plumber.py", line 1143, in run
    from calibre.ebooks.html_transform_rules import transform_conversion_book
  File "/usr/lib/calibre/calibre/ebooks/html_transform_rules.py", line 7, in <module>
    from html5_parser import parse
  File "/usr/lib/python3.10/site-packages/html5_parser/__init__.py", line 31, in <module>
    raise RuntimeError(
RuntimeError: html5-parser and lxml are using different versions of libxml2. This happens commonly when using pip installed versions of lxml. Use pip install --no-binary lxml lxml instead. libxml2 versions: html5-parser: (2, 10, 3) != lxml: (2, 9, 14)
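
The last sentence of the RuntimeError carries both libxml2 versions. A small sketch, assuming the message keeps the format shown in this thread, that pulls the two tuples out and compares them. The function name is hypothetical, and this is not html5-parser's own check:

```python
import re

# Tail of the error message, copied from the traceback in this thread.
MESSAGE = "libxml2 versions: html5-parser: (2, 10, 3) != lxml: (2, 9, 14)"


def libxml2_versions(message):
    """Extract the (html5-parser, lxml) libxml2 version tuples from the error text."""
    found = re.findall(r"(html5-parser|lxml): \((\d+), (\d+), (\d+)\)", message)
    versions = {name: tuple(int(part) for part in parts) for name, *parts in found}
    return versions.get("html5-parser"), versions.get("lxml")


html5_version, lxml_version = libxml2_versions(MESSAGE)
print("html5-parser reports libxml2", html5_version)
print("lxml reports libxml2", lxml_version)
print("match:", html5_version == lxml_version)
```

Here html5-parser reports libxml2 2.10.3 while lxml reports 2.9.14, so the two modules are not using the same libxml2.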


What is the output of:

python -m pip list

?


OMG! It is a long, long list!

[limo@asus ~]$ python -m pip list
Package                      Version
---------------------------- ------------------------------------
absl-py                      1.4.0
aiohttp                      3.8.4
aiosignal                    1.3.1
anyio                        3.6.2
appdirs                      1.4.4
apsw                         3.40.0.0
argcomplete                  1.10.3
aspell-python-py3            1.15
astunparse                   1.6.3
async-generator              1.10
async-timeout                4.0.2
attrs                        22.2.0
autocommand                  2.2.2
bcrypt                       4.0.1
beautifulsoup4               4.8.2
blis                         0.7.9
breadability                 0.1.20
Brotli                       1.0.9
brotlicffi                   1.0.9.2
bs4                          0.0.1
btrfsutil                    6.2.2
build                        0.10.0
cachetools                   5.3.0
catalogue                    2.0.8
cchardet                     2.1.7
certifi                      2022.12.7
cffi                         1.15.1
charade                      1.0.3
chardet                      3.0.4
charset-normalizer           3.1.0
chatbot                      1.5.2b0
click                        8.1.3
cmake                        3.26.1
colorama                     0.4.6
coloredlogs                  15.0.1
compressed-rtf               1.0.6
confection                   0.0.4
cryptography                 40.0.1
css-parser                   1.0.8
cssselect                    1.2.0
cupshelpers                  1.0
cymem                        2.0.7
Cython                       0.29.34
dbus-python                  1.3.2
defusedxml                   0.7.1
deprecation                  2.1.0
dill                         0.3.6
distro                       1.8.0
dnspython                    2.3.0
docopt                       0.6.2
docx2txt                     0.8
ebcdic                       1.1.1
en-core-web-lg               3.5.0
en-core-web-sm               3.5.0
exceptiongroup               1.1.0
extract-msg                  0.28.7
fake-useragent               1.1.3
fastjsonschema               2.16.3
feedfinder2                  0.0.4
feedparser                   6.0.10
filelock                     3.11.0
fire                         0.5.0
Flask                        2.2.3
Flask-Cors                   3.0.10
flatbuffers                  23.3.3
frozendict                   2.3.5
frozenlist                   1.3.3
fsspec                       2023.3.0
ftfy                         6.1.1
future                       0.18.2
gast                         0.4.0
gensim                       4.3.1
Glances                      3.3.1.1
google-api-core              2.11.0
google-auth                  2.16.2
google-auth-oauthlib         1.0.0
google-pasta                 0.2.0
googleapis-common-protos     1.59.0
goose3                       3.1.13
gpg                          1.19.0
gpt-2-simple                 0.8.1
greenlet                     2.0.2
grpcio                       1.51.3
h11                          0.14.0
h5py                         3.8.0
html2text                    2020.1.16
html5-parser                 0.4.10
html5lib                     1.1
huggingface-hub              0.13.4
humanfriendly                10.0
idna                         2.8
ifaddr                       0.2.0
IMAPClient                   2.1.0
img2pdf                      0.4.4
importlib-metadata           6.1.0
importlib-resources          5.12.0
inflate64                    0.3.1
inflect                      6.0.4
installer                    0.7.0
itsdangerous                 2.1.2
jaraco.classes               3.2.3
jaraco.context               4.3.0
jaraco.functools             3.6.0
jaraco.text                  3.11.1
jax                          0.4.8
jeepney                      0.8.0
jieba3k                      0.35.1
Jinja2                       3.1.2
joblib                       1.2.0
JPype1                       1.4.1
keras                        2.12.0
keyring                      23.11.0
langcodes                    3.3.0
langdetect                   1.0.9
libclang                     16.0.0
libcomps                     0.1.19
libtorrent                   2.0.8
lit                          16.0.0
lxml                         4.9.2
Markdown                     3.4.3
MarkupSafe                   2.1.2
mechanize                    0.4.8
ml-dtypes                    0.0.4
mod-wsgi                     4.9.4
more-itertools               9.1.0
mpmath                       1.3.0
msgpack                      1.0.4
multidict                    6.0.4
multiprocess                 0.70.14
multitasking                 0.0.11
multivolumefile              0.2.3
murmurhash                   1.0.9
netifaces                    0.11.0
netsnmp-python               1.0a1
networkx                     3.0
newspaper3k                  0.2.8
nftables                     0.1
nltk                         3.8.1
numpy                        1.23.5
nvidia-cublas-cu11           11.10.3.66
nvidia-cuda-cupti-cu11       11.7.101
nvidia-cuda-nvrtc-cu11       11.7.99
nvidia-cuda-runtime-cu11     11.7.99
nvidia-cudnn-cu11            8.5.0.96
nvidia-cufft-cu11            10.9.0.58
nvidia-curand-cu11           10.2.10.91
nvidia-cusolver-cu11         11.4.0.1
nvidia-cusparse-cu11         11.7.4.91
nvidia-nccl-cu11             2.14.3
nvidia-nvtx-cu11             11.7.91
oauthlib                     3.2.2
ocrmypdf                     14.1.0
olefile                      0.46
opt-einsum                   3.3.0
ordered-set                  4.1.0
outcome                      1.2.0
packaging                    23.0
pandas                       1.5.3
parse                        1.19.0
pathy                        0.10.1
pdf2image                    1.16.3
pdfminer                     20191125
pdfminer.six                 20191110
pdftotext                    2.2.2
pikepdf                      7.2.0
Pillow                       9.4.0
pip                          23.0.1
platformdirs                 3.2.0
pluggy                       1.0.0
ply                          3.11
preshed                      3.0.8
proto-plus                   1.22.2
protobuf                     4.22.1
psutil                       5.9.4
py7zr                        0.20.4
pyahocorasick                2.0.0
pyaml                        21.10.1
pyarrow                      11.0.0
pyasn1                       0.4.8
pyasn1-modules               0.2.8
pybcj                        1.0.1
pybind11                     2.10.4
pycairo                      1.23.0
pychm                        0.8.6
pycountry                    22.3.5
pycparser                    2.21
pycryptodome                 3.17
pycryptodomex                3.12.0
pycups                       2.0.1
pycurl                       7.45.2
pydantic                     1.10.7
pyee                         8.2.2
Pygments                     2.14.0
PyGObject                    3.44.1
pyOpenSSL                    23.1.1
pyparsing                    3.0.9
PyPDF2                       3.0.1
pyperclip                    1.8.2
pyppeteer                    1.0.2
pyppmd                       1.0.0
pyproject_hooks              1.0.0
PyQt5                        5.15.9
PyQt5-sip                    12.12.0
PyQt6                        6.5.0
PyQt6-sip                    13.5.0
PyQt6-WebEngine              6.5.0
pyquery                      2.0.0
PySocks                      1.7.1
pytesseract                  0.3.10
python-dateutil              2.8.2
python-docx                  0.8.11
python-dotenv                1.0.0
python-gnupg                 0.5.0
python-magic                 0.4.27
python-pptx                  0.6.21
pythondialog                 3.5.3
pytz                         2022.7.1
pytz-deprecation-shim        0.1.0.post0
pyxdg                        0.28
PyYAML                       6.0
pyzstd                       0.15.6
Recoll                       1.34.0
recollchm                    0.8.4.1+git
Reflector                    2021.11.20.2.41.3
regex                        2023.3.23
reportlab                    3.6.12
requests                     2.21.0
requests-file                1.5.1
requests-oauthlib            1.3.1
responses                    0.18.0
rpm                          4.18.1
rsa                          4.9
sacremoses                   0.0.53
scikit-learn                 1.2.2
scipy                        1.10.1
SecretStorage                3.3.3
selenium                     4.8.3
sentencepiece                0.1.97
setuptools                   67.6.0
setuptools-scm               7.1.0
sgmllib3k                    1.0.0
shtab                        1.5.8
six                          1.12.0
smart-open                   6.3.0
sniffio                      1.3.0
sortedcontainers             2.4.0
soupsieve                    2.4
spacy                        3.5.1
spacy-legacy                 3.0.12
spacy-loggers                1.0.4
SpeechRecognition            3.8.1
SQLAlchemy                   2.0.9
srsly                        2.4.6
starlette                    0.26.1
summa                        1.2.0
summarizer                   0.0.6
sumy                         0.11.0
sympy                        1.11.1
systemd-python               235
team                         1.0
tensorboard                  2.12.1
tensorboard-data-server      0.7.0
tensorboard-plugin-wit       1.8.1
tensorflow                   2.12.0
tensorflow-estimator         2.12.0
tensorflow-io-gcs-filesystem 0.32.0
termcolor                    2.2.0
textract                     1.6.5
texttable                    1.6.7
thinc                        8.1.9
threadpoolctl                3.1.0
tinysegmenter                0.3
tldextract                   3.4.0
tldr                         3.1.0
tokenizers                   0.13.3
tomli                        2.0.1
toposort                     1.10
torch                        2.0.0
torchaudio                   2.0.1
torchvision                  0.15.1
tqdm                         4.62.3
transformers                 4.28.1
trio                         0.22.0
trio-websocket               0.9.2
triton                       2.0.0
trove-classifiers            2023.3.10
typer                        0.7.0
typing_extensions            4.5.0
tzdata                       2023.3
tzlocal                      4.3
ujson                        5.7.0
Unidecode                    1.3.6
unrardll                     0.1.5
urllib3                      1.24.3
validate-pyproject           0.12.2.post1.dev0+g2940279.d20230328
w3lib                        2.1.1
wasabi                       1.1.1
wcwidth                      0.2.6
webencodings                 0.5.1
websockets                   10.4
Werkzeug                     2.2.3
wheel                        0.40.0
wrapt                        1.14.1
wsproto                      1.2.0
xlrd                         1.2.0
XlsxWriter                   3.0.9
xxhash                       3.2.0
yarl                         1.8.2
zeroconf                     0.39.4
zipp                         3.15.0
[limo@asus ~]$ 
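
In a list this long, the packages named in the error are easy to miss. A sketch, assuming the two-column `pip list` layout shown above, that turns the text into a dictionary so specific packages can be looked up. The sample rows are copied from the output above, and `parse_pip_list` is an illustrative name:

```python
SAMPLE = """\
Package                      Version
---------------------------- ------------------------------------
html5-parser                 0.4.10
html5lib                     1.1
lxml                         4.9.2
"""


def parse_pip_list(text):
    """Map package name -> version from `pip list` column output."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0]] = parts[1]
    return packages


installed = parse_pip_list(SAMPLE)
for name in ("lxml", "html5-parser", "html5lib"):
    print(name, installed.get(name, "not listed"))
```

The plain listing does not say where each package lives. Running `python -m pip list --user` narrows it to packages installed in the user site, which is where the earlier pip output located lxml.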


Make sure your system is updated:

sudo pacman -Syu

Uninstall this one:

pip uninstall lxml

Then run:

sudo pacman -S python-html5-parser python-html5lib python-lxml
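
The idea behind these steps is presumably to drop the pip-installed lxml from the home directory so that the distribution's python-lxml is used instead. A hedged way to check the result afterwards is to ask lxml which libxml2 it reports. This sketch assumes nothing beyond lxml's documented `LIBXML_VERSION` and `LIBXML_COMPILED_VERSION` attributes, and the function name is made up:

```python
def lxml_libxml2_versions():
    """Return lxml's (runtime, compiled) libxml2 versions, or None if lxml is absent."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return etree.LIBXML_VERSION, etree.LIBXML_COMPILED_VERSION


result = lxml_libxml2_versions()
if result is None:
    print("lxml is not importable from this interpreter")
else:
    runtime, compiled = result
    print("libxml2 in use by lxml:", runtime)
    print("libxml2 lxml was compiled against:", compiled)
```

If the runtime version now shares its major and minor numbers with the html5-parser version from the error message, the import that failed in the traceback should no longer complain.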

Hope this helps!


Absolutely genius, @pebcak!
You said you use neither Calibre nor Kindle, but what you suggested actually solved it 100%, both when running the ebook-convert command and through the Calibre app as before!
I wonder what you could do when helping with something you are experienced with!
An amazing community, as usual.

Thank you


Glad you got the issue resolved!

I just used some search-fu :blush:


I did search previously, but what I found was mostly blah blah blah. You simply pinpointed the problem and solved it in one shot. In my search I never came across any suggestion to run Calibre from the command line to pinpoint the problem.

Thank you @pebcak
