SMF Online Manual

UTF-8 Readme
What is UTF-8?
UTF-8 is an encoding standard that can represent all Unicode characters. This makes it possible to display almost any writing system in the world.
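
As a rough illustration (a Python sketch that is not part of SMF; the sample words and the helper name utf8_length are assumptions made up for this example), a single encoding can hold text from unrelated writing systems, using between one and four bytes per character:

```python
# Sample words in several scripts (illustrative only, not taken from SMF).
samples = {
    "English": "forum",
    "Russian": "форум",
    "Japanese": "掲示板",
}


def utf8_length(text: str) -> int:
    """Return the number of bytes `text` occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


for language, word in samples.items():
    # Character count and byte count differ for non-ASCII scripts.
    print(f"{language}: {len(word)} characters, {utf8_length(word)} bytes in UTF-8")
```

ASCII letters still take one byte each, which is why English text looks the same before and after a conversion.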

What's new in SMF 1.1 RC3 with regard to UTF-8?
SMF has always supported multiple character sets. Each language package was written in a specific character set. Support for those character sets has not changed, but additional support for UTF-8 has been added. It is possible to convert your forum to UTF-8 or (in the case of a new forum) to install it with UTF-8 support. If your forum is in UTF-8 mode, both the database and the website will use UTF-8.

The following character sets are currently used for SMF's language packages (both 1.0.x and 1.1):

Character set   Language
big5            Chinese (traditional)
gbk             Chinese (simplified)
ISO-8859-1      Albanian, Brazilian, Catalan, Danish, Dutch, English, Finnish, French, German, Portuguese, Norwegian, Spanish, Swedish
ISO-8859-2      Croatian, Hungarian, Polish, Romanian
ISO-8859-9      Turkish
tis-620         Thai
UTF-8           Chinese (simplified), Chinese (traditional), Japanese, Persian
windows-1256    Arabic
windows-1251    Bulgarian, Russian
windows-1253    Greek
windows-1255    Hebrew

As of SMF 1.1 RC3, each of those language packages can also be downloaded in the UTF-8 character set (Download -> Language packs).


Why would I need UTF-8?
There are a few reasons you might need UTF-8:
  • If you want to support multiple languages that use different character sets on your forum. For instance, if you want to support both Russian and Turkish, you will need a character set that supports both, which makes UTF-8 a logical choice.
  • If the software integrating with SMF uses UTF-8. In some cases such an integration can require character sets to match.
  • If you need better search results or improved sorting. In some cases searching and sorting by the database can be improved by choosing UTF-8 as your character set.
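
The first reason can be checked with a small Python sketch. It is an illustration only, not SMF code: the helper can_encode and the sample greetings are assumptions, and cp1251 and iso8859_9 are the Python codec names for the windows-1251 and ISO-8859-9 character sets listed above.

```python
# Python codec names for the character sets this readme lists for
# Russian (windows-1251) and Turkish (ISO-8859-9).
RUSSIAN_CHARSET = "cp1251"
TURKISH_CHARSET = "iso8859_9"


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


russian = "Привет"      # sample Russian greeting
turkish = "Günaydın"    # sample Turkish greeting (contains the dotless ı)
mixed = russian + " / " + turkish

for charset in (RUSSIAN_CHARSET, TURKISH_CHARSET, "utf-8"):
    # Report which of the three strings each character set can hold.
    print(charset, can_encode(russian, charset),
          can_encode(turkish, charset), can_encode(mixed, charset))
```

Each legacy character set can hold its own language but not the other one, so a post mixing both only fits in UTF-8.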

Why would I NOT need UTF-8?
In all other cases, UTF-8 will probably not be of much use to you. It is also a bit slower.

Also keep in mind that, if you are using MySQL as your database, you need at least MySQL 4.1 and SMF 1.1 RC3 to be able to use UTF-8 as the default character set.
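
The MySQL requirement can be checked against the version string your server reports (for example the result of SELECT VERSION()). The following Python sketch is only an illustration; the function name mysql_supports_utf8 and the sample version strings are assumptions, not part of SMF.

```python
import re

MINIMUM_MYSQL = (4, 1)  # minimum MySQL version stated in this readme


def mysql_supports_utf8(version: str) -> bool:
    """Check a MySQL version string such as '4.1.22' or '5.0.51a-log'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles both the major and the minor number.
    return (major, minor) >= MINIMUM_MYSQL


for reported in ("4.0.27", "4.1.22", "5.0.51a-log"):
    print(reported, mysql_supports_utf8(reported))
```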

How to convert to UTF-8?
  • Start with a backup of your database(!) Character set conversions are anything but guaranteed to go right.
  • Go to 'Forum Maintenance' -> 'Convert the database and data to UTF-8'
  • Select the character set your current data is in. The default setting for this is based on the character set of your default language file.
  • After pressing proceed, your database will be converted. Depending on the size of your database, the conversion process might stop temporarily from time to time to avoid overloading the server. If that was successful, your forum should be converted to UTF-8.
  • You will need a new set of language files. All language files need to be UTF-8 compatible. Luckily, all language packs for 1.1 RC3 are available in both the original character set and UTF-8, so simply download them and you should be ready to go.
  • Once all the UTF-8 language packs have been installed, convert the language settings of each user by running the following query:
Code:
UPDATE smf_members
SET lngfile = CONCAT(lngfile, '-utf8')
WHERE lngfile != '';
  • Also, change the default language in your admin center (Admin -> Server Settings)
  • Check to see if all your data was properly converted
  • If any of your posts contain HTML entities, you will want to convert those to UTF-8 as well...
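
Two of the steps above, selecting the correct source character set and converting HTML entities, can be sketched in Python. This only illustrates the idea and is not how SMF's converter is implemented; the function names convert_to_utf8 and entities_to_utf8 and the sample data are assumptions made up for this example.

```python
import html


def convert_to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode bytes stored in `source_charset` as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


def entities_to_utf8(text: str) -> bytes:
    """Replace HTML entities such as &#1055; with the characters they name."""
    return html.unescape(text).encode("utf-8")


# A Russian word as a windows-1251 forum would store it (sample data).
stored = "Привет".encode("cp1251")

right = convert_to_utf8(stored, "cp1251")      # correct source character set
wrong = convert_to_utf8(stored, "iso8859_1")   # wrong source character set

print(right.decode("utf-8"))   # readable Russian text
print(wrong.decode("utf-8"))   # no error is raised, but the text is garbled
print(entities_to_utf8("&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;").decode("utf-8"))
```

Note that choosing the wrong source character set does not fail; it silently produces wrong data, which is why the backup and the final check matter.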



Comments:
SwapsRulez made the following comment on June 03, 2008, 10:46:44 PM:

Thanks... that is too informational. :)

SyEss made the following comment on July 26, 2008, 05:31:09 PM:

If your dates (containing native chars) display incorrectly after successfully converting to UTF-8, try the following:
In "\Themes\default\languages" open the following file:
index.[yourNativeLanguageHere]-utf8.php
Add ".UTF-8" to the language locale setting (in my example it's Hungarian):
$txt['lang_locale'] = 'hu_HU';
so it becomes:
$txt['lang_locale'] = 'hu_HU.UTF-8';
This solved my problem.

This error is caused by PHP's strftime() function, which might return your date string in the server's default (native) encoding instead of UTF-8. strftime() should be set up properly using the setlocale() function, but it is not.

abram made the following comment on July 26, 2009, 11:00:14 AM:

Amendment, Character set  Language for Hebrew -  windows-1255.
http://en.wikipedia.org/wiki/Windows-1255

China Expats made the following comment on August 02, 2009, 01:00:06 AM:

Hi, I guess I am missing something here - and support for Chinese language is critical for my Forum

Quote
Go to 'Forum Maintenance' -> 'Convert the database and data to UTF-8'


Well, I have converted HTML to UTF-8 with no change, and still have stupid ANSI support for all posts (have cleared cache, renewed browser, etc).

The quoted above option does not exist in the version I am using (SMF V1.1.10), and yes - I am in Forum Maintenance as an administrator with full permissions. Here is a screenshot of my admin panel:

http://www.china-expats.com/Forum/UTF-8_SMF_Query001.gif

You will note that the
Quote
'Convert the database and data to UTF-8'

does not exist

Please try again, and supply the correct information; or tell me how to include this option within my installation package

Otherwise, thank you for providing such an excellent Forum package!

best wishes
Jonno

B made the following comment on August 02, 2009, 03:37:33 AM:

Quote from: abram on July 26, 2009, 11:00:14 AM
Amendment, Character set  Language for Hebrew -  windows-1255.
http://en.wikipedia.org/wiki/Windows-1255


Thanks, I have updated the document.

B

B made the following comment on August 02, 2009, 03:44:25 AM:

Quote from: China Expats on August 02, 2009, 01:00:06 AM
You will note that the
Quote
'Convert the database and data to UTF-8'

does not exist


That probably means your database is already in UTF-8, which would also explain being able to convert HTML entities to UTF-8 characters. Have you tried installing and switching languages?

B


  • Powered by SMF 2.0 RC2 | SMF © 2006–2009, Simple Machines LLC