Web Scraping and Character Encoding

This looks more like a development problem than an OS or installation issue to me. From a development perspective, I can only tell you how I would start debugging it:

First, get a raw dump of the page you're scraping with curl or wget. That gives you a reference copy of the bytes the server actually sends. Then, from inside the scraping framework, inspect the response in memory and check whether its representation is the same as the reference. If that is not possible, have the scraper write the raw bytes it received to a file, then run a binary diff between that file and the curl/wget version to see where the two start to diverge.
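If you don't have a binary diff tool handy, a few lines of Python can do the comparison. This is only a minimal sketch: the script name `bytediff.py` and the file names below are hypothetical, and it assumes the scraper side saves the bytes it received *before* decoding them to text. It reports the first differing byte offset and a hex dump of the surrounding bytes from each file.

```python
import sys
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a prefix of the other: they diverge at the shorter length.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def hex_context(data: bytes, offset: int, width: int = 8) -> str:
    """Hex dump of up to `width` bytes on either side of `offset`."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")


def main(reference_path: str, scraped_path: str) -> int:
    # Read both files as raw bytes so no decoding step can hide the problem.
    with open(reference_path, "rb") as f:
        reference = f.read()
    with open(scraped_path, "rb") as f:
        scraped = f.read()

    offset = first_difference(reference, scraped)
    if offset is None:
        print("Files are byte-for-byte identical.")
        return 0
    print(f"First difference at byte offset {offset}")
    print(f"  reference: {hex_context(reference, offset)}")
    print(f"  scraped:   {hex_context(scraped, offset)}")
    return 1


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: bytediff.py REFERENCE SCRAPED")
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

For example, save the page with `curl -o reference.html <URL>`, have the scraper write its bytes to `scraped.html`, then run `python bytediff.py reference.html scraped.html`. An encoding mix-up tends to show up as a pattern like `c3 a9` (UTF-8 for "é") on one side versus `e9` (Latin-1 for "é") on the other. On most Unix systems, `cmp reference.html scraped.html` will also report the first differing byte.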

Good luck.