How do I convert Chinese characters to UTF-8? - Delphi K.Top forum


Status: not yet closed

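#1 1995 (regular member), posted 2004-10-20 10:10:52

I want to use TWebBrowser to connect to a search site and build a simple search feature, but Chinese strings come out garbled once they are sent. I later found that Chinese text must first be converted to UTF-8 (for "工具", the URL-escaped bytes %E5%B7%A5%E5%85%B7), and then it works. But how do I convert Chinese to UTF-8 in Delphi? I checked the related articles on this site, but none of them seems to do this. Can anyone help?

PS: I know some search sites accept a charset parameter for this purpose, but I would still like to understand how to convert Chinese to UTF-8. Thanks!
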
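#2 wameng (moderator), posted 2004-10-20 10:27:24

In the UTF-16 encoding, every character in the range 0 to 65535 is represented with 16 bits (characters above that range use a pair of 16-bit units). When processing a file in this encoding, you read two bytes for each character and combine them, following the file's byte order, into one 16-bit number, which is the character's value.

In the UTF-8 encoding, the basic unit is an eight-bit number (one byte). Depending on the character's position in the Unicode character set, that is, its numeric value, a character in the range 0x0000 to 0xFFFF is encoded as one, two, or three bytes (characters above 0xFFFF take four). The rules are:

  0x0000 to 0x007F (the 128 ASCII codes): one byte, [0vvvvvvv]. The first bit is 0 and the remaining seven bits are the value bits. This is identical to ASCII, so no special handling is needed.

  0x0080 to 0x07FF: two bytes, [110vvvvv] [10vvvvvv]. The first byte starts with the three bits 110, followed by five value bits; the second byte starts with the two bits 10, followed by six value bits. To decode, extract the value bits of both bytes and join them into an 11-bit number, which is the character's value.

  0x0800 to 0xFFFF: three bytes, [1110vvvv] [10vvvvvv] [10vvvvvv]. The three bytes carry 4, 6, and 6 value bits, which are joined into a 16-bit value.
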
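The decoding steps described above can be sketched as a small console program. This is only an illustration under stated assumptions: the names DecodeSequence and IsContinuation are made up for this example, and only the one-, two-, and three-byte layouts listed above are handled. The sample bytes E5 B7 A5 are the UTF-8 form of "工" (U+5DE5).

```pascal
program DecodeOneSequence;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// True when B has the form [10vvvvvv].
function IsContinuation(B: Byte): Boolean;
begin
  Result := (B and $C0) = $80;
end;

// Hypothetical helper: decodes the UTF-8 sequence starting at Bytes[0].
// Returns the character value and stores the number of bytes used in Len.
// Returns -1 with Len = 0 when the sequence is not a valid 1..3 byte form.
function DecodeSequence(const Bytes: array of Byte; out Len: Integer): Integer;
begin
  Result := -1;
  Len := 0;
  if Length(Bytes) = 0 then
    Exit;
  if (Bytes[0] and $80) = 0 then
  begin
    // [0vvvvvvv]: seven value bits
    Result := Bytes[0];
    Len := 1;
  end
  else if ((Bytes[0] and $E0) = $C0) and (Length(Bytes) >= 2)
    and IsContinuation(Bytes[1]) then
  begin
    // [110vvvvv] [10vvvvvv]: 5 + 6 value bits
    Result := (Integer(Bytes[0] and $1F) shl 6) or Integer(Bytes[1] and $3F);
    Len := 2;
  end
  else if ((Bytes[0] and $F0) = $E0) and (Length(Bytes) >= 3)
    and IsContinuation(Bytes[1]) and IsContinuation(Bytes[2]) then
  begin
    // [1110vvvv] [10vvvvvv] [10vvvvvv]: 4 + 6 + 6 value bits
    Result := (Integer(Bytes[0] and $0F) shl 12)
           or (Integer(Bytes[1] and $3F) shl 6)
           or Integer(Bytes[2] and $3F);
    Len := 3;
  end;
end;

var
  Len: Integer;
  Value: Integer;
begin
  Value := DecodeSequence([$E5, $B7, $A5], Len);
  WriteLn('U+', IntToHex(Value, 4), ' from ', Len, ' bytes'); // U+5DE5 from 3 bytes
end.
```

The lead byte alone tells you how many bytes follow, which is why the function tests its top bits first and only then masks out the value bits.
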
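#3 1995 (regular member), posted 2004-10-20 12:26:47

So all I have to do is convert the character into the specific byte format you described and then write it out in hexadecimal? How do I work out whether a character falls in 0x0000 to 0x007F, 0x0080 to 0x07FF, or 0x0800 to 0xFFFF? And how are the value bits calculated? Could you give a concrete example?
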
#4 wameng (moderator), posted 2004-10-20 12:36:46

Please see http://fundementals.sourceforge.net/cUnicodeCodecs.html

#5 1995 (regular member), posted 2004-10-20 12:53:00

There is a lot in there; I will have to study it carefully. I did notice that one of the data formats is UCS4. How does that differ from AnsiString and WideString? Thank you.

#6 wameng (moderator), posted 2004-10-20 12:58:04

WideString --> Unicode

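To make the difference between the types asked about above more concrete, here is a minimal sketch that prints the size of one element of each type and compares the length of a two-character WideString with the length of its UTF-8 form. It assumes Delphi 6 or later (or Free Pascal), where UCS4Char and the built-in UTF8Encode are available in the System unit; the program name is made up.

```pascal
program StringTypes;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  W: WideString;
  U: UTF8String;
begin
  // Size of one element in each string type
  WriteLn('AnsiChar: ', SizeOf(AnsiChar)); // 1 byte, meaning depends on the code page
  WriteLn('WideChar: ', SizeOf(WideChar)); // 2 bytes, one 16-bit Unicode unit
  WriteLn('UCS4Char: ', SizeOf(UCS4Char)); // 4 bytes, one full Unicode code point

  // "工具" written as numeric values so the source file encoding does not matter
  SetLength(W, 2);
  W[1] := WideChar($5DE5);
  W[2] := WideChar($5177);

  U := UTF8Encode(W);
  WriteLn('WideString length: ', Length(W)); // 2 (16-bit units)
  WriteLn('UTF-8 length:      ', Length(U)); // 6 (bytes, three per character)
end.
```

An AnsiString would store the same two characters as Big5 bytes on a Traditional Chinese system, which is why it cannot be sent to a site that expects UTF-8 without conversion.
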
#7 1995 (regular member), posted 2004-10-20 13:19:23

Found it: the Delphi help has a definition for UCS4, under "UTF conversion scheme data type UCS4". I also found an example, apparently written in JavaScript. Can anyone convert it to Delphi?

    // UTF-8: converts code point array into UTF-8 code unit array
    function toUTF8(cpArray, cuArray, parseResult) {
        parseResult.set();
        var u = 0;   // number of code units written so far
        var bu;      // write index, filled backwards within one character
        var lastPoint = 0;
        cuArray.length = 0;
        for (var p = 0; p < cpArray.length; p++) {
            // take the previous item from the input, because point is shifted below
            lastPoint = p > 0 ? cpArray[p - 1] : 0;
            var point = cpArray[p];
            if (point < 0) {
                parseResult.set("illegal code point - out of bounds: ", p, point);
                return;
            } else if (point <= 0x7F) {
                cuArray[u++] = point;
            } else if (point <= 0x7FF) {
                u += 2; bu = u;
                cuArray[--bu] = 0x80 | (point & 0x3F); point >>= 6;
                cuArray[--bu] = 0xC0 | (point & 0x1F);
            } else if (point <= 0xFFFF) {
                if (0xD800 <= lastPoint && lastPoint <= 0xDFFF
                        && 0xDC00 <= point && point <= 0xDFFF) {
                    parseResult.set("illegal code point - surrogate pair: ", p, point);
                    return;
                }
                u += 3; bu = u;
                cuArray[--bu] = 0x80 | (point & 0x3F); point >>= 6;
                cuArray[--bu] = 0x80 | (point & 0x3F); point >>= 6;
                cuArray[--bu] = 0xE0 | (point & 0x0F);
            } else if (point <= 0x10FFFF) {
                u += 4; bu = u;
                cuArray[--bu] = 0x80 | (point & 0x3F); point >>= 6;
                cuArray[--bu] = 0x80 | (point & 0x3F); point >>= 6;
                cuArray[--bu] = 0x80 | (point & 0x3F); point >>= 6;
                cuArray[--bu] = 0xF0 | (point & 0x07);
            } else {
                parseResult.set("illegal code point - out of bounds: ", p, point);
                return;
            }
        }
        cuArray.length = u;
    }

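One possible Delphi rendering of the JavaScript above is sketched below. It is not a line-by-line port: the names CodePointsToUtf8 and ToHex are made up, errors are reported through a Boolean result and an index instead of a parseResult object, and every surrogate value ($D800..$DFFF) is rejected rather than only a pair, which is what the UTF-8 standard requires for code point input.

```pascal
program CodePointsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical counterpart of toUTF8: encodes code points as UTF-8 bytes.
// Returns False and sets ErrorIndex when an item is a surrogate or > $10FFFF.
function CodePointsToUtf8(const Points: array of LongWord;
  out Bytes: AnsiString; out ErrorIndex: Integer): Boolean;
var
  Buf: AnsiString;
  I: Integer;
  P: LongWord;

  procedure Put(B: LongWord);
  begin
    Buf := Buf + AnsiChar(Byte(B));
  end;

begin
  Result := False;
  Bytes := '';
  ErrorIndex := -1;
  Buf := '';
  for I := Low(Points) to High(Points) do
  begin
    P := Points[I];
    if ((P >= $D800) and (P <= $DFFF)) or (P > $10FFFF) then
    begin
      ErrorIndex := I;
      Exit;
    end;
    if P <= $7F then
      Put(P)                              // [0vvvvvvv]
    else if P <= $7FF then
    begin
      Put($C0 or (P shr 6));              // [110vvvvv]
      Put($80 or (P and $3F));            // [10vvvvvv]
    end
    else if P <= $FFFF then
    begin
      Put($E0 or (P shr 12));             // [1110vvvv]
      Put($80 or ((P shr 6) and $3F));
      Put($80 or (P and $3F));
    end
    else
    begin
      Put($F0 or (P shr 18));             // [11110vvv]
      Put($80 or ((P shr 12) and $3F));
      Put($80 or ((P shr 6) and $3F));
      Put($80 or (P and $3F));
    end;
  end;
  Bytes := Buf;
  Result := True;
end;

// Formats bytes as space-separated hexadecimal for display.
function ToHex(const S: AnsiString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 2);
  end;
end;

var
  Encoded: AnsiString;
  ErrIdx: Integer;
begin
  // "工", "具", and U+1F600 (outside the 16-bit range)
  if CodePointsToUtf8([$5DE5, $5177, $1F600], Encoded, ErrIdx) then
    WriteLn(ToHex(Encoded)) // E5 B7 A5 E5 85 B7 F0 9F 98 80
  else
    WriteLn('illegal code point at index ', ErrIdx);
end.
```

The range tests also answer the earlier question of how to tell which byte layout applies: compare the character value against $7F, $7FF, and $FFFF in that order.
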
#8 nulifes (regular member), posted 2004-11-22 21:57:59

I found the approach used in Indy's URI code. One function is enough, with no extra declarations:

    // Percent-escapes unsafe, control, and non-ASCII bytes for use in a URL.
    // It escapes the bytes it is given and does not itself convert to UTF-8.
    function UTF8Encode(const ASrc: string): string;
    const
      UnsafeChars = ['*', '#', '%', '<', '>', '+', ' ']; {do not localize}
    var
      i: Integer;
    begin
      Result := ''; {Do not Localize}
      for i := 1 to Length(ASrc) do
      begin
        if (ASrc[i] in UnsafeChars) or (ASrc[i] >= #$80) or (ASrc[i] < #32) then
        begin
          Result := Result + '%' + IntToHex(Ord(ASrc[i]), 2); {do not localize}
        end
        else
        begin
          Result := Result + ASrc[i];
        end;
      end;
    end;

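Because the function above escapes whatever bytes it receives, a Big5 AnsiString passed to it yields escaped Big5 rather than UTF-8. A minimal sketch of the two-step pipeline the original question needs follows: first convert a WideString to UTF-8 bytes with the built-in UTF8Encode (assumed available, Delphi 6 or later), then percent-escape those bytes. The names PercentEncode and UrlEncodeUtf8 are made up for this example, and the set of characters left unescaped is a conservative choice, not the one Indy uses.

```pascal
program UrlEncodeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: escapes every byte except letters, digits, and - _ . ~
function PercentEncode(const Bytes: UTF8String): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(Bytes) do
    if Bytes[I] in ['A'..'Z', 'a'..'z', '0'..'9', '-', '_', '.', '~'] then
      Result := Result + Char(Bytes[I])
    else
      Result := Result + '%' + IntToHex(Ord(Bytes[I]), 2);
end;

// Hypothetical helper: Unicode text -> UTF-8 bytes -> URL-escaped text.
function UrlEncodeUtf8(const Text: WideString): string;
begin
  Result := PercentEncode(UTF8Encode(Text));
end;

var
  W: WideString;
begin
  // "工具" written as numeric values so the source file encoding does not matter
  SetLength(W, 2);
  W[1] := WideChar($5DE5);
  W[2] := WideChar($5177);
  WriteLn(UrlEncodeUtf8(W)); // %E5%B7%A5%E5%85%B7
end.
```

If the text starts out in an AnsiString, assigning it to a WideString variable first lets Delphi convert it from the system code page to Unicode before the UTF-8 step.
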
#9 WY.GZ (regular member), posted 2005-03-04 17:38:43

    unit utf8util;

    interface

    // RFC 2279
    // String is assumed to be the single-byte AnsiString of Delphi 7.
    function EncodeUTF8 (S: WideString): String;
    function DecodeUTF8 (S: String): WideString;

    // Created:
    //   October 2. 2004
    //   By R.M. Tegel
    //
    // Description:
    //   UTF8 Encode and Decode functions
    //   Encode and Decode UTF to and from WideString
    //
    // Limitations:
    //   4-byte UTF decoding not supported.
    //   No effort is done to mask 4-byte UTF character to two-byte WideChar
    //   4-byte characters will be replaced by space (#32)
    //   This should not affect further decoding.
    //
    // Background:
    //   Created as independent UTF8 unit to support libsql
    //   Targeted to be more effective than borland's implementation in D7+
    //   especially on large strings.
    //
    // License:
    //   Modified Artistic License
    //   The MAL license is compatible with almost any open-source license
    //   Especially including but not limited to GPL, LGPL and BSD
    //   Main issues about this license:
    //   You may use this unit for any legal purpose you see fit
    //   You may freely modify and redistribute this unit as long as
    //   you leave author(s) name(s) as contributor / original creator
    //   You may use this unit in closed-source commercial applications
    //   You may include this unit in your projects' source distribution
    //   You may re-license this unit as GPL or BSD if needed for your project
    //
    // Happy Programming ;)
    // Rene

    implementation

    function EncodeUTF8(S: WideString): String;
    var rl: Integer;
      procedure Plus (c: byte);
      begin
        inc (rl);
        //pre-allocation to improve performance:
        if rl>length(Result) then
          SetLength (Result, length(Result)+2048);
        Result[rl] := char(c);
      end;
    var i: Integer;
        c: Word;
    begin
      //alter this to length(S) * 2 if you expect a _lot_ ('only') of non-ascii
      //for max speed.
      SetLength (Result, 20+round (length(S) * 1.2));
      rl := 0;
      for i:=1 to length (S) do
        begin
          c := Word(S[i]);
          if c<=$7F then //single byte in valid ascii range
            Plus (c)
          else if c<=$7FF then //two-byte unicode needed ($7FF itself still fits)
            begin
              Plus ($C0 or (c shr 6));
              Plus ($80 or (c and $3F));
            end
          else
            begin
              //three byte unicode needed
              //Note: widestring specifies only 2 bytes
              //so, there is no need for encoding up to 4 bytes.
              Plus ($E0 or (c shr 12));
              Plus ($80 or ((c and $FFF) shr 6));
              Plus ($80 or (c and $3F));
            end;
        end;
      SetLength (Result, rl);
    end;

    function DecodeUTF8 (S: String): WideString;
    var rl: Integer;
      procedure Plus (c: word);
      begin
        inc (rl);
        if (rl>length(Result)) then
          //alloc some extra mem
          SetLength (Result, length(Result)+512);
        Result[rl] := WideChar(c);
      end;
    var b,c,d: byte;
        i,l: Integer;
    begin
      SetLength (Result, length(S));
      rl := 0;
      i := 1;
      l := length(S);
      while i<=l do
        begin
          b := byte(S[i]);
          if (b and $80)=0 then //7-bit
            Plus (b)
          else if (b and $E0)=$C0 then //11-bit
            begin
              //the posted text was garbled here; the bounds check is reconstructed
              if i<l then c := byte(S[i+1]) else c := $80;
              if (c and $C0)<>$80 then c := $80; //error. tag with zero. sorry.
              b := b and $1F;
              c := c and $3F;
              Plus ((b shl 6) or c);
              inc (i);
            end
          else if (b and $F0)=$E0 then //16-bit
            begin
              //the posted text was garbled here; the bounds check is reconstructed
              if i<l-1 then
                begin
                  c := byte(S[i+1]);
                  d := byte(S[i+2]);
                end
              else
                begin
                  c := $80;
                  d := $80;
                end;
              if (c and $C0)<>$80 then c := $80; //error. tag with zero. sorry.
              if (d and $C0)<>$80 then d := $80; //error. tag with zero. sorry.
              b := b and $0F;
              c := c and $3F;
              d := d and $3F;
              Plus ((b shl 12) or (c shl 6) or d);
              inc (i,2);
            end
          else
            begin
              //we have a problem here. a value > 16 bit was encoded
              //obviously, this doesn't fit in a widestring..
              //fix: leave blank ('space'). sorry.
              Plus (ord(' '));
              //skip one byte for each leading 1 bit of the lead byte
              b := (b shl 1) and $FF;
              repeat
                b := (b shl 1) and $FF;
                inc (i);
              until (b and $80)=0;
            end;
          inc (i);
        end;
      SetLength (Result, rl);
    end;

    end.

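The unit's stated limitation concerns characters above $FFFF, which a WideString stores as two 16-bit units (a surrogate pair). The sketch below shows one way such a pair could be recognized and combined into a single code point before encoding. NextCodePoint is a made-up name and is not part of the unit above.

```pascal
program SurrogateDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: reads the code point that starts at S[Index] and
// advances Index by one unit, or by two when a valid surrogate pair is found.
// An unpaired surrogate is returned unchanged.
function NextCodePoint(const S: WideString; var Index: Integer): LongWord;
var
  Hi, Lo: Word;
begin
  Hi := Word(S[Index]);
  Inc(Index);
  Result := Hi;
  // High surrogates are $D800..$DBFF, low surrogates are $DC00..$DFFF
  if (Hi >= $D800) and (Hi <= $DBFF) and (Index <= Length(S)) then
  begin
    Lo := Word(S[Index]);
    if (Lo >= $DC00) and (Lo <= $DFFF) then
    begin
      Result := $10000 + ((LongWord(Hi) - $D800) shl 10) + (LongWord(Lo) - $DC00);
      Inc(Index);
    end;
  end;
end;

var
  S: WideString;
  I: Integer;
begin
  // "工" followed by U+1F600 stored as the pair $D83D $DE00
  SetLength(S, 3);
  S[1] := WideChar($5DE5);
  S[2] := WideChar($D83D);
  S[3] := WideChar($DE00);

  I := 1;
  while I <= Length(S) do
    WriteLn('U+', IntToHex(Integer(NextCodePoint(S, I)), 4)); // U+5DE5, then U+1F600
end.
```

A code point obtained this way needs the four-byte form [11110vvv] [10vvvvvv] [10vvvvvv] [10vvvvvv], which EncodeUTF8 above never produces because it encodes each 16-bit unit on its own.
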
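#10 Ktop_Robot (deputy site administrator), posted 2007-04-26 13:51:02

Dear asker: have the replies above answered your question satisfactorily? If so, please close the thread within one week. Otherwise, please reply within one week to say what remains unresolved. Failing that, the moderator (or, on boards without a moderator, the deputy site administrator or site administrator) will use their own judgment to pick the most suitable answer and close the thread. The chosen answerer will still receive bonus points, and the asker will be penalized 1 point. We apologize for any inconvenience. Questions, answers, and closure make for healthy interaction, and a good discussion environment depends on everyone. Thank you for your cooperation.
------
I am a robot. I do not accept messages.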
