July 11th, 2008, 06:27 PM
Character Set Encoding/Decoding
Hey guys... long time no see, how is everyone? I see a lot of the same faces around... and a few new ones.
Well, here's my question. I wrote a spider in Perl that grabs 230K pages of content and puts them into a database for later use on our own site... and, like a... you know... I forgot to make sure the database was set to the same charset. The original content was in UTF-8, but the database was in latin-1.
After realizing my mistake I told the database it should be in UTF-8, but that didn't fix it (it's MySQL, btw). Actually, some Perl programmers think that may have even compounded the problem.
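For anyone following along, here is a minimal sketch of how this kind of mix-up typically plays out. It assumes MySQL's `latin1` behaves like Windows-1252 (`cp1252`), which is what MySQL actually uses for `latin1`, and that the later switch to UTF-8 re-encoded the existing rows (as `ALTER TABLE ... CONVERT TO CHARACTER SET` does). Under those assumptions the charset change stacks a second layer of UTF-8 on top of the damage instead of undoing it. The exact bytes in your table depend on how it was converted and how your client reads them, so they may not match these.

```python
# Reproduce the mix-up, assuming MySQL "latin1" == Python's cp1252 codec.

original = "Ω"                          # U+03A9 GREEK CAPITAL LETTER OMEGA
utf8_bytes = original.encode("utf-8")   # what the spider fetched

# Step 1: UTF-8 bytes stored in a latin1 column; each byte becomes its own character.
as_latin1 = utf8_bytes.decode("cp1252")

# Step 2: converting the column to utf8 re-encodes those characters,
# so the stored bytes now carry a second layer of UTF-8.
double_encoded = as_latin1.encode("utf-8")

print(utf8_bytes.hex(" "))      # ce a9
print(as_latin1)                # Î©
print(double_encoded.hex(" "))  # c3 8e c2 a9
```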
Basically, how do I get back to my original data? I have PHP, Perl, C#, C++, Python and a few others under my belt, so tools aren't much of a problem, but the ones I've tried are getting me nowhere... and I think I have a faulty understanding of encoding and decoding.
I looked at the hex and know it definitely converted the UTF-8 to latin-1 on the way in, and I think I only need to go one step backwards. Any thoughts?
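Going "one step backwards" on text you have already read out of the database as a string usually means turning each character back into the single byte it came from, then decoding those bytes as UTF-8. A sketch of that, again assuming MySQL's `latin1` is `cp1252`; `fix_mojibake` is a hypothetical helper name:

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes being read as cp1252 characters.

    Strings that cannot be reversed this way (already-correct Unicode,
    or characters cp1252 leaves undefined) are returned unchanged.
    """
    try:
        # cp1252 characters -> original bytes -> decode those bytes as UTF-8
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return text

print(fix_mojibake("Î©"))          # Ω
print(fix_mojibake("plain ascii"))  # plain ascii
print(fix_mojibake("Ω"))           # Ω  (already correct, left alone)
```

Plain ASCII round-trips untouched, and most correctly stored accented text fails the UTF-8 decode and is left alone, which makes it fairly safe to run over a whole column. It is a heuristic, though: a legitimate string that happens to look like mojibake would be "fixed" too.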
Example: I had the character Ω on the site, and it comes out as Ω in the db.
hex: ce a9 -> e2 84 7c
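Since the hex is where the truth is, it can help to check which stage each stored value is at before converting 230K rows. A rough classifier over the raw bytes pulled from the table might look like this. `classify` is a hypothetical name, and the sketch assumes the only possible damage is one extra round of UTF-8 → cp1252 → UTF-8:

```python
def classify(raw: bytes) -> str:
    """Guess whether a stored value is fine, double-encoded, or not UTF-8 at all."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"          # e.g. raw single-byte latin1 data
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return "ok"                 # cannot be peeled back, so treat as correct
    return "double-encoded" if repaired != text else "ok"

for sample in (b"\xce\xa9", b"\xc3\x8e\xc2\xa9", b"plain"):
    print(sample.hex(" "), "->", classify(sample))
```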
July 16th, 2008, 09:48 PM
Hmmm, this might help?
I seem to recall going through this exercise a few years back, when MySQL changed the default character set to UTF-8 and you wanted to migrate/upgrade.
What version of MySQL is it?
I might be old-fashioned, but I have never been a fan of one-step updates. I always like an intermediate file that I can check before proceeding and go back to if the update fails.
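One way to follow the intermediate-file approach: export the affected table to a file, write a repaired copy next to it, and look over the count and the diff before loading anything back. The sketch below assumes a UTF-8, tab-separated export with one record per line; the file names, the format, and the helper names `fix_mojibake`/`repair_file` are all placeholders, not anything MySQL provides.

```python
import csv
import tempfile
from pathlib import Path

def fix_mojibake(text: str) -> str:
    # Undo one round of UTF-8 misread as cp1252; leave anything else unchanged.
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return text

def repair_file(src: Path, dst: Path) -> int:
    """Write a repaired copy of a TSV dump; return how many fields changed."""
    changed = 0
    with src.open(encoding="utf-8", newline="") as fin, \
         dst.open("w", encoding="utf-8", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        for row in reader:
            fixed = [fix_mojibake(field) for field in row]
            changed += sum(a != b for a, b in zip(row, fixed))
            writer.writerow(fixed)
    return changed

if __name__ == "__main__":
    # Tiny demo with a throwaway directory; the original dump is never modified.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "dump.tsv", Path(tmp) / "dump_fixed.tsv"
        src.write_text("1\tÎ©\n2\tplain\n", encoding="utf-8")
        print(repair_file(src, dst), "field(s) changed")
        print(dst.read_text(encoding="utf-8"))
```

Because the original dump is never touched, a bad run costs nothing: delete the repaired copy and try again.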