Distinguishing Traditional from Simplified Chinese - Delphi K.Top forum

Status: not yet resolved

lin11112 (junior member) #1, posted 2008-11-25 16:36:29
Could anyone tell me how to determine whether a string is Traditional Chinese, Simplified Chinese, or English?

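aftcast (deputy site admin) #2, posted 2008-11-25 23:43:40
What kind of string is it? Unicode? ANSI? Either way, the two are hard to tell apart. My guess is that you need this for installing on different operating systems; if so, the thing to look into is how to detect which locale the OS uses. Or do you really want to know how to tell them apart? @@ That I don't know how to do!
------
蕭沖 --All ideas are worthless unless implemented-- C++ Builder Delphi Taiwan G+ community http://bit.ly/cbtaiwan
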
pceyes (honored member) #3, posted 2008-11-26 06:48:38
汉字与区位码(2) - 分析 (Chinese characters and row-cell codes (2): analysis) http://www.cnblogs.com/del/archive/2008/11/19/1336467.html
------
Hard work brings you closer to success

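For readers unfamiliar with the term in the linked title: a row-cell (区位) code is the four-digit position of a character in the GB2312 table, obtained by subtracting 0xA0 from each of its two bytes. The following is only a minimal sketch of that arithmetic, not a reproduction of the linked article; the function name `quwei` is made up for illustration, and Python's built-in `gb2312` codec is assumed as the source of the byte values.

```python
def quwei(ch: str) -> str:
    """Return the 4-digit GB2312 row-cell code of one character.

    Hypothetical helper for illustration only.
    """
    raw = ch.encode("gb2312")  # raises UnicodeEncodeError if not in GB2312
    if len(raw) != 2:
        raise ValueError("not a double-byte GB2312 character")
    # Each byte is the row (or cell) number offset by 0xA0.
    row, cell = raw[0] - 0xA0, raw[1] - 0xA0
    return f"{row:02d}{cell:02d}"


print(quwei("啊"))  # 1601: bytes B0 A1 -> row 16, cell 01
```

In GB2312 the Chinese characters occupy rows 16 through 87, so a row number in that range indicates a hanzi rather than a symbol or letter.
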
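lin11112 (junior member) #4, posted 2008-11-26 13:30:47
What I want is to be able to tell, when reading in a text file, whether its content is Traditional or Simplified Chinese.
=================== Quoting aftcast ===================
What kind of string is it? Unicode? ANSI? Either way, the two are hard to tell apart. My guess is that you need this for installing on different operating systems; if so, the thing to look into is how to detect which locale the OS uses. Or do you really want to know how to tell them apart? @@ That I don't know how to do!
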
aftcast (deputy site admin) #5, posted 2008-11-26 14:09:30
Since you seem really set on this, I'll say a bit more! First, here is an entry from the Unicode Consortium's FAQ:

Q: How can I recognize from the 32 bit value of a Unicode character if this is a Chinese, Korean or Japanese character?
A: It's basically impossible and largely meaningless. It's the equivalent of asking if "a" is an English letter or a French one. There are some characters where one can guess based on the source information in Unihan.txt that it's traditional Chinese, simplified Chinese, Japanese, Korean, or Vietnamese, but there are too many exceptions to make this really reliable. (For example, one particularly nasty obscenity in Cantonese would probably have never been encoded for Cantonese, but has made it in for the sake of Korean, where one hopes it isn't nearly as obscene.) The phonetic data in Unihan.txt should not be used for this purpose. A blank in the phonetic data means that nobody's supplied a reading, not that a reading doesn't exist. Because updating the Unihan database is an ongoing process, these fields will be increasingly filled out as time goes on, but they should never be taken as absolutely complete. In particular, there are obscure characters where it is known that there is a reading, but since the character does not occur in standard dictionaries, we are unable to supply it (e.g., 䃟 U+40DF in Cantonese). A better solution is to look at the text as a whole: if there's a fair amount of kana, it's probably Japanese, and if there's a fair amount of hangul, it's probably Korean. The only proper mechanism is, as for determining whether "chat" is spelled correctly in English or French, is to use a higher-level protocol.

I hope you can follow that. Having said that, I thought it over and came up with an idea that might work! It is a fairly big job, though, and I don't have time to test it, so I'll just describe the idea. I have a Big5-to-Unicode character mapping table. Load it into a database, say Access. Before reading the string you need to know whether the file is Unicode or ANSI. There are a few ways to check, such as testing for a BOM at the start of the file, but that is not always accurate, and the details are hard to explain in a few words. If you already know up front whether the file is Unicode or plain ANSI, you can skip this step!
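
The BOM check mentioned above could look roughly like the sketch below. It is an illustration only, not the poster's code: the function name `sniff_bom` is made up, and Python is used purely for brevity. As the post warns, the absence of a BOM proves nothing, since Unicode files are often saved without one.

```python
import codecs

# Known byte-order marks. The UTF-32 entries come first because the
# UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the Unicode encoding implied by a leading BOM, or None.

    Hypothetical helper: None means "no BOM found", which covers both
    ANSI files (Big5, GB2312, ...) and Unicode files saved without a BOM.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading only the first four bytes of the file is enough for this test, for example `sniff_bom(open(path, "rb").read(4))`, where `path` is whatever file is being examined.
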
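Next, there are two cases:
1/ If the file is ANSI, take a random stretch of the text into an AnsiString and use IsLeadByte to decide whether a character is Chinese rather than English. If it is Chinese, convert that character to its hex code (that is, its internal byte code; this takes a separate technique), then convert it to Unicode with a MultiByteToWideChar-style call that specifies Big5 as the source encoding, and convert the resulting Unicode value to hex as well. Then look up this pair of values (the byte-code hex and the Unicode hex) in the database described above. If the pair matches, the character is Big5; if not, it is Simplified.
2/ If the file is Unicode, the process runs the other way: compute the Unicode hex first, convert to Big5 with a WideCharToMultiByte-style call that specifies Big5, convert the result to hex, and look the pair up in the database.
In principle this should be quite workable, but it really does take a fair amount of skill!
PS: I know character encodings quite well and find them interesting, but I am short on time right now and cannot write an implementation for you yet. Maybe in a while... or perhaps a senior member who understands the algorithm I described can implement it or post some of the code for you.
=================== Quoting lin11112 ===================
What I want is to be able to tell, when reading in a text file, whether its content is Traditional or Simplified Chinese.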
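
Case 2 above can be sketched as follows. This is an adaptation under stated assumptions, not the poster's implementation: Python's built-in `big5` and `gb2312` codecs stand in for the mapping table in the database, the GB2312 test is an addition that the post does not mention, and the name `classify_script` is made up. The sketch works on already-decoded Unicode text because case 1 is harder than it looks: many GB2312 byte pairs also happen to be valid Big5 byte pairs, so a successful byte-level conversion does not by itself prove the text is Big5.

```python
def _encodable(ch: str, codec: str) -> bool:
    """True if the single character can be encoded in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def classify_script(text: str) -> str:
    """Guess 'traditional', 'simplified', 'ascii' or 'unknown'.

    Hypothetical helper: counts ideographs that exist in only one of the
    two legacy character sets and lets the larger count decide.
    """
    trad_only = simp_only = cjk = 0
    for ch in text:
        if not ("\u4e00" <= ch <= "\u9fff"):
            continue  # only CJK Unified Ideographs are counted
        cjk += 1
        in_big5 = _encodable(ch, "big5")
        in_gb = _encodable(ch, "gb2312")
        if in_big5 and not in_gb:
            trad_only += 1
        elif in_gb and not in_big5:
            simp_only += 1
        # characters in both sets (or neither) give no evidence
    if cjk == 0:
        return "ascii" if text.isascii() else "unknown"
    if trad_only > simp_only:
        return "traditional"
    if simp_only > trad_only:
        return "simplified"
    return "unknown"


print(classify_script("漢語拼音"))  # traditional
print(classify_script("汉语拼音"))  # simplified
```

In line with the FAQ quoted earlier, the result is a guess about the text as a whole: a short string made up only of characters shared by both sets, such as 中文, comes back as "unknown".
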
