[imc-sf-active] utf-8

mark burdett mark at indymedia.org
Thu Jul 28 12:14:03 PDT 2005


seems like now might be a good time to "force" all sf-active
sites & databases to be utf8, or at least to make utf8 an
easily configurable option.

i changed radio.indymedia.org to be a utf-8 site, it was easy.

i wanted to change the db to utf8 as well so that e.g. 
searches and everything work seamlessly:
http://radio.indymedia.org/news/?keyword=peri%C3%B3dico
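
a quick way to see where the %C3%B3 in that url comes from: the "ó" in "periódico" is two bytes in utf-8 but a single byte in latin1, so the same keyword is percent-encoded differently depending on the charset of the page that submits it. a minimal python sketch (python is only used for illustration here, sf-active itself is php):

```python
from urllib.parse import quote, unquote

keyword = "periódico"

# utf-8 page: "ó" is sent as the two bytes C3 B3
utf8_query = quote(keyword, encoding="utf-8")
# latin1 page: "ó" is sent as the single byte F3
latin1_query = quote(keyword, encoding="latin-1")

print(utf8_query)    # peri%C3%B3dico
print(latin1_query)  # peri%F3dico

# decoding the utf-8 form gives back the original keyword
roundtrip = unquote(utf8_query, encoding="utf-8")
print(roundtrip == keyword)
```

if the site sends utf-8 but the db columns are still latin1, the search bytes and the stored bytes won't match, which is the reason for converting the db too.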

run a sql command for each table:
ALTER TABLE webcast
   MODIFY COLUMN article text CHARACTER SET utf8,
   MODIFY COLUMN summary text CHARACTER SET utf8 [etc.]
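
with many tables it may be easier to generate these statements than to type them. a rough python sketch, assuming you fill in the table and column list yourself (only webcast.article and webcast.summary come from this post, the `schema` dict and the function name are made up for illustration):

```python
def alter_statement(table, text_columns, charset="utf8"):
    """Build one ALTER TABLE statement converting the given columns.

    text_columns maps column name -> column type, e.g. {"article": "text"}.
    """
    clauses = [
        f"MODIFY COLUMN {name} {coltype} CHARACTER SET {charset}"
        for name, coltype in text_columns.items()
    ]
    return f"ALTER TABLE {table} " + ", ".join(clauses) + ";"


# hypothetical schema listing; extend it with your own tables and columns
schema = {
    "webcast": {"article": "text", "summary": "text"},
}

statements = [alter_statement(t, cols) for t, cols in schema.items()]
for stmt in statements:
    print(stmt)
```

note that MODIFY COLUMN restates the whole column definition, so if a column is e.g. NOT NULL or has a default, put that in the type string as well or it will be dropped.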

note: this assumes the data in your database really is latin1.
if it holds anything else, the conversion probably won't work
right and the text will get garbled.  there are various docs
online to help you deal w/ this, so at least make sure you
back up the old database first!
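
one way to sanity-check a few rows before converting is to look at the raw bytes. this is only a heuristic sketch in python: `classify` is a made-up helper, and how you get the raw column bytes out of your db is left open.

```python
def classify(raw: bytes) -> str:
    """Rough guess at what a column's raw bytes contain."""
    if all(b < 0x80 for b in raw):
        # plain ascii is identical in latin1 and utf-8, safe either way
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # high bytes that are not valid utf-8: plausibly latin1
        # (or some other 8-bit charset, this check can't tell)
        return "not utf-8"
    # already utf-8: converting it as if it were latin1 would garble it
    return "looks like utf-8"


print(classify(b"periodico"))
print(classify("periódico".encode("latin-1")))
print(classify("periódico".encode("utf-8")))
```

rows that come back as "looks like utf-8" in a latin1 column are the ones that would get garbled by a straight latin1 to utf8 conversion.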

you also might as well set the default character set to utf8 
for each table and the database as a whole, so if something 
is added in the future it will be utf8.
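
the defaults can be generated the same way. a small python sketch, where the database name "sfactive" and the function name are placeholders, not anything sf-active defines:

```python
def default_charset_statements(database, tables, charset="utf8"):
    """Statements that set the default charset for the db and each table."""
    stmts = [f"ALTER DATABASE {database} DEFAULT CHARACTER SET {charset};"]
    stmts += [
        f"ALTER TABLE {table} DEFAULT CHARACTER SET {charset};"
        for table in tables
    ]
    return stmts


# "sfactive" is a placeholder database name
defaults = default_charset_statements("sfactive", ["webcast"])
for stmt in defaults:
    print(stmt)
```

these only change the default used for columns added later; existing columns still need the per-column conversion.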

add this to get_connection() in db_class.inc:
   mysql_query('SET CHARACTER SET utf8');
and you could also add
   mysql_query('SET NAMES utf8');
   mysql_query("SET COLLATION_CONNECTION='utf8_general_ci'");

change the <meta> header for your web and admin pages to
charset=utf-8

put this in your httpd.conf: 
AddDefaultCharset UTF-8
so your server is sending out the correct header.

remove any utf8_encode() calls from the syndication classes.
also remove any translation being done for windows characters
in posts.  all of this will already be utf-8 now, so the extra
conversion isn't needed (and would garble the text).
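
to see why the encode calls have to go: php's utf8_encode() assumes its input is iso-8859-1, so running it over text that is already utf-8 encodes it a second time. a python sketch that emulates that behaviour (the python function is a stand-in, not the php function itself):

```python
def utf8_encode(data: bytes) -> bytes:
    """Emulates php's utf8_encode(): treat input as iso-8859-1, emit utf-8."""
    return data.decode("latin-1").encode("utf-8")


latin1_post = "periódico".encode("latin-1")
utf8_post = "periódico".encode("utf-8")

# old setup: latin1 in, correct utf-8 out
ok = utf8_encode(latin1_post).decode("utf-8")
# new setup: utf-8 in, double-encoded text out
garbled = utf8_encode(utf8_post).decode("utf-8")

print(ok)       # periódico
print(garbled)  # periÃ³dico
```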

if you are aggregating feeds onto your site, remove any
utf8_decode() calls from the feed handling code.
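
the decode side loses data rather than doubling it: php's utf8_decode() converts utf-8 to iso-8859-1 and replaces anything latin1 can't represent with "?". a python emulation for valid utf-8 input (again a stand-in, with a made-up feed title):

```python
def utf8_decode(data: bytes) -> bytes:
    """Emulates php's utf8_decode(): utf-8 in, iso-8859-1 out, '?' for the rest."""
    return data.decode("utf-8").encode("latin-1", errors="replace")


# made-up feed title mixing a latin1 character with non-latin1 ones
title = "periódico 新聞".encode("utf-8")
decoded = utf8_decode(title)

print(decoded)  # b'peri\xf3dico ??'
```

on a utf-8 site the result is wrong twice over: the non-latin1 characters are gone, and the remaining latin1 byte for "ó" is not valid utf-8 either.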

that's it.  well of course there are some other considerations 
re: multibyte text but this is basic functionality.
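
one example of such a consideration: byte length and character length are no longer the same, so cutting a string at a byte offset can split a character. a python sketch, assuming valid utf-8 input (`truncate_utf8` is a made-up helper):

```python
text = "periódico"
raw = text.encode("utf-8")

print(len(text))  # 9 characters
print(len(raw))   # 10 bytes, because "ó" takes two


def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut to at most max_bytes without leaving half a character at the end."""
    cut = data[:max_bytes]
    # errors="ignore" drops the incomplete trailing sequence, if any
    return cut.decode("utf-8", errors="ignore").encode("utf-8")


print(truncate_utf8(raw, 5))  # b'peri'
print(truncate_utf8(raw, 6))  # b'peri\xc3\xb3'
```

anything in the code that truncates summaries or measures string length by bytes would need a similar look.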

this stuff can easily be placed in "if" statements based on 
sfactive.cfg but might be worth forcing utf8 on everyone at 
some point...
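
the shape of such a switch, sketched in python rather than php: the `db_charset` key is hypothetical (sfactive.cfg has no such setting that i know of) and the config is modelled as a plain dict.

```python
def session_statements(cfg: dict) -> list:
    """Return the per-connection sql statements for the configured charset."""
    # "db_charset" is a hypothetical config key, defaulting to the old behaviour
    if cfg.get("db_charset", "latin1").lower() != "utf8":
        return []  # leave existing latin1 sites untouched
    return [
        "SET CHARACTER SET utf8",
        "SET NAMES utf8",
        "SET COLLATION_CONNECTION='utf8_general_ci'",
    ]


print(session_statements({"db_charset": "utf8"}))
print(session_statements({}))
```

the same flag could then drive the <meta> charset and the encode/decode calls, so a site switches everything at once or not at all.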

--mark B.

