Unicode and Asian Character Sets for Testers

Establishing the Proper Foundation for Your Global Product

When you internationalize your product in order to produce localized versions for Asia or the rest of the world, your first decision will be about how to represent your data.

Will you use Unicode, local character sets, or both? What laws can influence your decision? What issues arise when using Unicode? Which encoding should you use? What are the benefits and costs?

This uniquely unbiased workshop answers these questions by presenting Unicode’s strong and weak points in a clear, incisive manner.

  • The workshop begins with a short history of character sets leading up to Unicode, from Unicode 1.0 to the current Unicode version. The basic Unicode encodings (UTF-8, UTF-16, UTF-32), as well as other representations (CESU-8, SCSU, BOCU-1), are presented along with their pros and cons.
  • Next, the four normalization forms and their applications are discussed. Unicode implementation issues are presented, notably concerning text manipulation and fonts.
  • The workshop then covers the latest Asian character sets: Korea’s KS X 1001 versions, China’s GB18030, Hong Kong’s HKSCS and Japan’s JIS X 0213 and their implications, both technical and legal (including conformance requirements).
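
As a rough illustration of the encoding and normalization topics above, the following Python sketch compares how many bytes a character occupies in the three basic encodings and shows how one string differs across the four normalization forms. The helper names (encoded_lengths, normalized_forms) and the sample characters are assumptions chosen for this example; they are not taken from the workshop material.

```python
import unicodedata

# Little-endian variants are used so that no byte order mark is counted.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def encoded_lengths(ch):
    """Return the byte length of a string in each basic Unicode encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


def normalized_forms(text):
    """Return the text converted to each of the four normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


# Hypothetical samples: ASCII letter, accented Latin letter,
# BMP ideograph, supplementary-plane ideograph.
for ch in ("A", "\u00e9", "\u4e2d", "\U00020000"):
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))

# U+00E9 (precomposed e-acute) followed by U+FB01 (the "fi" ligature).
for form, value in normalized_forms("\u00e9\ufb01").items():
    print(form, [f"U+{ord(c):04X}" for c in value])
```

The supplementary-plane character is the interesting test case: it needs four bytes in all three encodings and a surrogate pair in UTF-16, which is where text manipulation code that assumes one code unit per character tends to fail.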

You will learn

  • About various code sets
  • About multiple encodings
  • How to build data sets for multilingual testing: FIGS, CCJK and BiDi
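
One possible starting point for such a data set is sketched below in Python. It assumes FIGS means French, Italian, German and Spanish, CCJK means Simplified Chinese, Traditional Chinese, Japanese and Korean, and BiDi is represented here by Arabic and Hebrew. The sample strings, the TEST_DATA layout and the helper names are hypothetical and would be replaced by data relevant to the product under test.

```python
import unicodedata

# Hypothetical sample strings, grouped by test category and language code.
TEST_DATA = {
    "FIGS": {
        "fr": "Ça coûte déjà",
        "it": "Perché no?",
        "de": "Größe ändern",
        "es": "¿Dónde está el niño?",
    },
    "CCJK": {
        "zh-Hans": "简体中文测试",
        "zh-Hant": "繁體中文測試",
        "ja": "日本語テスト",
        "ko": "한국어 테스트",
    },
    "BiDi": {
        "ar": "اختبار",
        "he": "בדיקה",
    },
}


def round_trips(text, codec):
    """Return True if text survives encoding to codec and decoding back."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False


def has_rtl(text):
    """Return True if any character has a right-to-left bidirectional class."""
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in text)


# Report which samples each codec can represent without loss.
for group, samples in TEST_DATA.items():
    for lang, text in samples.items():
        result = {c: round_trips(text, c) for c in ("utf-8", "gb18030", "latin-1")}
        print(group, lang, result, "RTL" if has_rtl(text) else "LTR")
```

Checking each sample against both Unicode encodings and local character sets gives a quick view of where data loss would occur if a component silently falls back to a legacy code page.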