July 11th, 2008, 06:27 PM
Character Set Encoding/Decoding
Hey guys... long time no see, how is everyone? I see a lot of the same faces around... and a few new ones.
Well, here's my question. I wrote a spider in Perl that grabs 230K pages of content and puts it into a database for later use on our own site... and like a ... you know ... I forgot to make sure the database was set to the same charset. The original content was in UTF-8 but the database was in latin-1.
After realizing my mistake I told the database it should be in UTF-8, but that didn't fix it (it's MySQL, btw). Actually, some Perl programmers think that might have even compounded the problem.
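For anyone following along, here is a rough Python sketch of how this kind of mix-up usually produces garbage. It assumes the simplest case: UTF-8 bytes get read as if they were latin-1 and the result is later stored as UTF-8 again. The function name is made up for illustration, and this may not be exactly what MySQL did to my table.

```python
def simulate_misread(text: str) -> str:
    """Hypothetical helper: encode text as UTF-8, then misread the bytes as latin-1."""
    raw = text.encode("utf-8")      # the bytes the spider fetched
    return raw.decode("latin-1")    # what a latin-1 reader thinks they mean


original = "\u03a9"                 # GREEK CAPITAL LETTER OMEGA
garbled = simulate_misread(original)

print(original.encode("utf-8").hex())   # UTF-8 bytes of the original
print(garbled)                          # two latin-1 characters instead of one
print(garbled.encode("utf-8").hex())    # bytes after re-encoding the garbage as UTF-8
```

If the table was then converted to UTF-8, each of those wrong characters would be encoded again, which is why switching the charset afterwards can make things worse rather than better.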
Basically, how do I get back to my original data? I have PHP, Perl, C#, C++, Python and a few others under my belt, so tools aren't much of a problem, but the ones that I've tried are getting me nowhere... and I think I have a faulty view of encoding and decoding.
I looked at the hex and know it definitely converted the UTF-8 to latin-1 on the way in, and I think I only need to go one step backwards. Any thoughts?
e.g. I had the char Ω on the site and it comes out as Ω in the db.
hex: ce a9 -> e2 84 7c
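The "one step backwards" I have in mind would look roughly like this in Python. It is only a sketch, assuming the damage is a single round of UTF-8 bytes being read as latin-1. The function name is hypothetical, and MySQL's latin1 is really closer to cp1252, so the codec name might need changing. Whether it matches the bytes actually sitting in my table is something I'd have to check on a copy first.

```python
def undo_one_misread(garbled: str) -> str:
    """Hypothetical helper: reverse one 'UTF-8 bytes read as latin-1' step.

    Returns the input unchanged when it does not look like that kind of damage.
    """
    try:
        # Turn the wrong characters back into the bytes they came from,
        # then decode those bytes the way they should have been decoded.
        return garbled.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in latin-1, or not valid UTF-8 afterwards:
        # leave the value alone rather than damage it further.
        return garbled


print(undo_one_misread("\u00ce\u00a9"))   # the two-character garbage for omega
print(undo_one_misread("plain ascii"))    # untouched
```

The try/except is there so rows that were never damaged pass through as-is instead of raising or being mangled.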
if God was willing to live all out for us, why aren't we willing to live all out for Him? God bless,