Web Scraping and Character Encoding

DromundKaas · November 17, 2023, 4:03pm

This looks to me more like a development problem than an OS/installation issue. From a development perspective, I can only tell you how I would start debugging it:

Get a dump of the website you’re scraping, using either curl or wget. Then, inside the scraping framework, I would inspect the page as it is held in memory and check whether its byte representation matches the dump. If that is not possible, write the scraper’s raw bytes to a file, then do a binary diff between that file and the original wget/curl version.
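If you don't have a binary diff tool handy, the comparison can be sketched in a few lines of Python. This is only a rough illustration, not something specific to your scraping framework: the function names and the file names in the usage note are made up, and it assumes you have already saved both versions to disk as raw bytes.

```python
from pathlib import Path


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # Common prefix is identical; any difference is a length mismatch.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def hex_context(data: bytes, offset: int, width: int = 8) -> str:
    """Hex dump of up to `width` bytes on either side of `offset`."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")


def compare_dumps(reference_path: str, scraper_path: str) -> str:
    """Compare two dumps byte for byte and describe the first mismatch."""
    reference = Path(reference_path).read_bytes()  # curl/wget version
    scraped = Path(scraper_path).read_bytes()      # scraper version
    offset = first_difference(reference, scraped)
    if offset is None:
        return "identical"
    return (
        f"first difference at byte {offset}\n"
        f"reference: {hex_context(reference, offset)}\n"
        f"scraper:   {hex_context(scraped, offset)}"
    )
```

Usage would look like print(compare_dumps("page_curl.html", "page_scraper.bin")), with both file names being placeholders. If the bytes around the first mismatch look like c3 a9 on one side and e9 on the other, the same character (é) has been encoded as UTF-8 in one file and as Latin-1 in the other, which points at a decode/encode step in the scraper rather than at the server.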

Good luck.