21.2. Character Set Support

The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multibyte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code. Clients can use all supported character sets transparently, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected when the PostgreSQL database cluster is initialized with initdb. It can be overridden when you create a database, so you can have multiple databases, each with a different character set.
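As a rough illustration of the single-byte versus multibyte distinction, the following Python sketch encodes a few characters with several codecs and counts the resulting bytes. It relies on Python's standard codecs, which are assumed here to correspond to the similarly named PostgreSQL encodings (latin-1 for LATIN1, euc_kr for EUC_KR, utf-8 for UTF8); it does not call PostgreSQL itself.

```python
# Count how many bytes one character occupies in several encodings.
# Assumption: Python codec names stand in for PostgreSQL encoding names.

def bytes_per_char(ch: str, codec: str) -> int:
    """Return the number of bytes needed to store ch in the given codec."""
    return len(ch.encode(codec))

samples = [
    ("A", "latin-1"),   # ASCII letter in a single-byte set
    ("é", "latin-1"),   # accented letter, still one byte in LATIN1
    ("é", "utf-8"),     # the same letter in UTF8
    ("한", "euc_kr"),    # Hangul syllable in EUC_KR
    ("한", "utf-8"),     # the same syllable in UTF8
]

for ch, codec in samples:
    print(f"{ch!r} in {codec}: {bytes_per_char(ch, codec)} byte(s)")
```

The same character can therefore take a different amount of storage depending on the encoding chosen for the database.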

21.2.1. Supported Character Sets

Table 21-1 shows the character sets available for use in PostgreSQL.

Table 21-1. PostgreSQL Character Sets

 Name          | Description                | Language                       | Server? | Bytes/Char | Aliases
---------------+----------------------------+--------------------------------+---------+------------+--------------------------------------
 BIG5          | Big Five                   | Traditional Chinese            | No      | 1-2        | WIN950, Windows950
 EUC_CN        | Extended UNIX Code-CN      | Simplified Chinese             | Yes     | 1-3        |
 EUC_JP        | Extended UNIX Code-JP      | Japanese                       | Yes     | 1-3        |
 EUC_KR        | Extended UNIX Code-KR      | Korean                         | Yes     | 1-3        |
 EUC_TW        | Extended UNIX Code-TW      | Traditional Chinese, Taiwanese | Yes     | 1-3        |
 GB18030       | National Standard          | Chinese                        | No      | 1-2        |
 GBK           | Extended National Standard | Simplified Chinese             | No      | 1-2        | WIN936, Windows936
 ISO_8859_5    | ISO 8859-5, ECMA 113       | Latin/Cyrillic                 | Yes     | 1          |
 ISO_8859_6    | ISO 8859-6, ECMA 114       | Latin/Arabic                   | Yes     | 1          |
 ISO_8859_7    | ISO 8859-7, ECMA 118       | Latin/Greek                    | Yes     | 1          |
 ISO_8859_8    | ISO 8859-8, ECMA 121       | Latin/Hebrew                   | Yes     | 1          |
 JOHAB         | JOHAB                      | Korean (Hangul)                | Yes     | 1-3        |
 KOI8          | KOI8-R(U)                  | Cyrillic                       | Yes     | 1          | KOI8R
 LATIN1        | ISO 8859-1, ECMA 94        | Western European               | Yes     | 1          | ISO88591
 LATIN2        | ISO 8859-2, ECMA 94        | Central European               | Yes     | 1          | ISO88592
 LATIN3        | ISO 8859-3, ECMA 94        | South European                 | Yes     | 1          | ISO88593
 LATIN4        | ISO 8859-4, ECMA 94        | North European                 | Yes     | 1          | ISO88594
 LATIN5        | ISO 8859-9, ECMA 128       | Turkish                        | Yes     | 1          | ISO88599
 LATIN6        | ISO 8859-10, ECMA 144      | Nordic                         | Yes     | 1          | ISO885910
 LATIN7        | ISO 8859-13                | Baltic                         | Yes     | 1          | ISO885913
 LATIN8        | ISO 8859-14                | Celtic                         | Yes     | 1          | ISO885914
 LATIN9        | ISO 8859-15                | LATIN1 with Euro and accents   | Yes     | 1          | ISO885915
 LATIN10       | ISO 8859-16, ASRO SR 14111 | Romanian                       | Yes     | 1          | ISO885916
 MULE_INTERNAL | Mule internal code         | Multilingual Emacs             | Yes     | 1-4        |
 SJIS          | Shift JIS                  | Japanese                       | No      | 1-2        | Mskanji, ShiftJIS, WIN932, Windows932
 SQL_ASCII     | unspecified (see text)     | any                            | Yes     | 1          |
 UHC           | Unified Hangul Code        | Korean                         | No      | 1-2        | WIN949, Windows949
 UTF8          | Unicode, 8-bit             | all                            | Yes     | 1-4        | Unicode
 WIN866        | Windows CP866              | Cyrillic                       | Yes     | 1          | ALT
 WIN874        | Windows CP874              | Thai                           | Yes     | 1          |
 WIN1250       | Windows CP1250             | Central European               | Yes     | 1          |
 WIN1251       | Windows CP1251             | Cyrillic                       | Yes     | 1          | WIN
 WIN1252       | Windows CP1252             | Western European               | Yes     | 1          |
 WIN1253       | Windows CP1253             | Greek                          | Yes     | 1          |
 WIN1254       | Windows CP1254             | Turkish                        | Yes     | 1          |
 WIN1255       | Windows CP1255             | Hebrew                         | Yes     | 1          |
 WIN1256       | Windows CP1256             | Arabic                         | Yes     | 1          |
 WIN1257       | Windows CP1257             | Baltic                         | Yes     | 1          |
 WIN1258       | Windows CP1258             | Vietnamese                     | Yes     | 1          | ABC, TCVN, TCVN5712, VSCII

Not all APIs support all the character sets listed above. For example, the PostgreSQL JDBC driver does not support MULE_INTERNAL, LATIN6, LATIN8, and LATIN10.

The SQL_ASCII setting behaves considerably differently from the other settings. When the server character set is SQL_ASCII, the server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. No encoding conversion is done when the setting is SQL_ASCII. This setting is therefore not so much a declaration that a specific encoding is in use as a declaration that the encoding is unknown. In most cases, if you are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting, because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.
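The risk can be sketched outside PostgreSQL. In the Python example below, the same four bytes are read under three different encodings; when no encoding is declared, which is the situation SQL_ASCII leaves you in, nothing records which reading was intended. The codec names are Python's and are assumed to correspond to EUC_KR, LATIN1, and UTF8.

```python
# The same stored bytes mean different things under different encodings.
# Assumption: Python codecs stand in for the PostgreSQL encodings.

raw = "한글".encode("euc_kr")  # four bytes, all in the range 128-255

def interpret(data: bytes, codec: str) -> str:
    """Decode data with codec, or report that the bytes are invalid there."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return "<invalid byte sequence>"

as_euc_kr = interpret(raw, "euc_kr")    # the text that was meant
as_latin1 = interpret(raw, "latin-1")   # four unrelated Latin-1 characters
as_utf8 = interpret(raw, "utf-8")       # not valid UTF-8 at all

print(as_euc_kr, as_latin1, as_utf8, sep=" | ")
```

With a declared server encoding, the server can reject or convert such bytes; with SQL_ASCII it stores them as they arrive.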

21.2.2. Setting the Character Set

initdb defines the default character set for a PostgreSQL cluster. For example:

initdb -E EUC_JP

This sets the default character set (encoding) to EUC_JP (Extended Unix Code for Japanese). If you prefer longer option names, you can use --encoding instead of -E. If neither -E nor --encoding is given, initdb attempts to determine the appropriate encoding from the specified or default locale.

You can create a database with a different character set:

createdb -E EUC_KR korean

This creates a database named korean that uses the character set EUC_KR. The same can be done with this SQL command:

CREATE DATABASE korean WITH ENCODING 'EUC_KR';

The encoding for a database is stored in the pg_database system catalog. You can see it by using the -l option or the \l command of psql.

$ psql -l
            List of databases
   Database    |  Owner  |   Encoding    
---------------+---------+---------------
 euc_cn        | t-ishii | EUC_CN
 euc_jp        | t-ishii | EUC_JP
 euc_kr        | t-ishii | EUC_KR
 euc_tw        | t-ishii | EUC_TW
 mule_internal | t-ishii | MULE_INTERNAL
 postgres      | t-ishii | EUC_JP
 regression    | t-ishii | SQL_ASCII
 template1     | t-ishii | EUC_JP
 test          | t-ishii | EUC_JP
 utf8          | t-ishii | UTF8
(10 rows)

Important: Although you can specify any encoding you want for a database, it is unwise to choose an encoding that is not the one expected by the locale you have selected. The LC_COLLATE and LC_CTYPE settings imply a particular encoding, and locale-dependent operations (such as sorting) are likely to misinterpret data that is in an incompatible encoding.
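To see why a mismatch matters, consider a deliberately simplified Python sketch. It stores "é" as UTF-8 bytes and then applies a case conversion as if the bytes were LATIN1, which is roughly what a character-classification setting tied to a different encoding would do. This is an analogy built on Python's codecs, not PostgreSQL's actual locale handling.

```python
# Case-convert UTF-8 data while (wrongly) treating it as LATIN1.
# Assumption: this mimics a locale that expects a different encoding.

stored = "é".encode("utf-8")  # two bytes: 0xC3 0xA9

# The "locale" sees two LATIN1 characters instead of one UTF-8 character.
misread = stored.decode("latin-1")
lowered = misread.lower().encode("latin-1")

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(len(misread))            # 2: the single character was read as two
print(is_valid_utf8(stored))   # True
print(is_valid_utf8(lowered))  # False: the conversion corrupted the data
```

The lowercase operation changed the first byte from 0xC3 to 0xE3, so the result is no longer a valid UTF-8 sequence.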

Because these locale settings are frozen by initdb, the apparent flexibility to use different encodings in different databases of a cluster is more theoretical than real. These mechanisms are likely to be revisited in future versions of PostgreSQL.

One way to use multiple encodings safely is to set the locale to C or POSIX during initdb, which disables any real locale awareness.

21.2.3. Automatic Character Set Conversion Between Server and Client

PostgreSQL supports automatic character set conversion between server and client for certain character set combinations. The conversion information is stored in the pg_conversion system catalog. PostgreSQL comes with a number of predefined conversions, as shown in Table 21-2. You can create a new conversion with the SQL command CREATE CONVERSION.

Table 21-2. Client/Server Character Set Conversions

 Server Character Set | Available Client Character Sets
----------------------+------------------------------------------------------------
 BIG5                 | not supported as a server encoding
 EUC_CN               | EUC_CN, MULE_INTERNAL, UTF8
 EUC_JP               | EUC_JP, MULE_INTERNAL, SJIS, UTF8
 EUC_KR               | EUC_KR, MULE_INTERNAL, UTF8
 EUC_TW               | EUC_TW, BIG5, MULE_INTERNAL, UTF8
 GB18030              | not supported as a server encoding
 GBK                  | not supported as a server encoding
 ISO_8859_5           | ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866, WIN1251
 ISO_8859_6           | ISO_8859_6, UTF8
 ISO_8859_7           | ISO_8859_7, UTF8
 ISO_8859_8           | ISO_8859_8, UTF8
 JOHAB                | JOHAB, UTF8
 KOI8                 | KOI8, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251
 LATIN1               | LATIN1, MULE_INTERNAL, UTF8
 LATIN2               | LATIN2, MULE_INTERNAL, UTF8, WIN1250
 LATIN3               | LATIN3, MULE_INTERNAL, UTF8
 LATIN4               | LATIN4, MULE_INTERNAL, UTF8
 LATIN5               | LATIN5, UTF8
 LATIN6               | LATIN6, UTF8
 LATIN7               | LATIN7, UTF8
 LATIN8               | LATIN8, UTF8
 LATIN9               | LATIN9, UTF8
 LATIN10              | LATIN10, UTF8
 MULE_INTERNAL        | MULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW, ISO_8859_5, KOI8, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251
 SJIS                 | not supported as a server encoding
 SQL_ASCII            | any (no conversion will be performed)
 UHC                  | not supported as a server encoding
 UTF8                 | all supported encodings
 WIN866               | WIN866, ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN1251
 WIN874               | WIN874, UTF8
 WIN1250              | WIN1250, LATIN2, MULE_INTERNAL, UTF8
 WIN1251              | WIN1251, ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866
 WIN1252              | WIN1252, UTF8
 WIN1253              | WIN1253, UTF8
 WIN1254              | WIN1254, UTF8
 WIN1255              | WIN1255, UTF8
 WIN1256              | WIN1256, UTF8
 WIN1257              | WIN1257, UTF8
 WIN1258              | WIN1258, UTF8
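
Conceptually, each conversion in Table 21-2 re-encodes text from one character set into another. The Python sketch below imitates this with a hypothetical helper, convert(), built on Python's standard codecs (euc_jp, shift_jis, and latin-1 are assumed to correspond to EUC_JP, SJIS, and LATIN1). It is not PostgreSQL's conversion code, but it shows both a successful conversion and the failure that occurs when the target set lacks a character.

```python
# Re-encode text between character sets, as a client/server conversion would.
# convert() is a hypothetical helper; codec names are Python's, not PostgreSQL's.

def convert(data: bytes, source: str, target: str) -> bytes:
    """Decode data from source and encode it in target.

    Raises UnicodeEncodeError if a character has no representation in target.
    """
    return data.decode(source).encode(target)

server_bytes = "日本語".encode("euc_jp")  # text as stored in an EUC_JP database

# EUC_JP -> SJIS works: both sets contain these characters.
sjis_bytes = convert(server_bytes, "euc_jp", "shift_jis")
print(sjis_bytes.decode("shift_jis"))

# EUC_JP -> LATIN1 fails: LATIN1 has no Japanese characters.
try:
    convert(server_bytes, "euc_jp", "latin-1")
    failed = False
except UnicodeEncodeError:
    failed = True
print("conversion failed:", failed)
```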

To enable automatic character set conversion, you have to tell PostgreSQL the character set (encoding) you would like to use in the client. There are several ways to accomplish this.

If a particular character cannot be converted, an error is reported. For example, if you chose EUC_JP for the server and LATIN1 for the client, some Japanese characters have no representation in LATIN1.

If the client character set is defined as SQL_ASCII, encoding conversion is disabled, regardless of the server's character set. Just as for the server, use of SQL_ASCII is unwise unless you are working with all-ASCII data.

21.2.4. Further Reading

These are good sources to start learning about various kinds of encoding systems.

http://www.i18ngurus.com/docs/984813247.html

An extensive collection of documents about character sets, encodings, and code pages.

ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

Section 3.2 contains detailed explanations of EUC_JP, EUC_CN, EUC_KR, and EUC_TW.

http://www.unicode.org/

The web site of the Unicode Consortium.

RFC 2044

UTF-8 is defined here. (RFC 2044 has since been obsoleted; the current definition of UTF-8 is RFC 3629.)