[Question] How do I write programs that use the SSE instruction set in BCB?

 
gps (Member)
#1 Posted: 2006-07-05 14:30:03

Can BCB, like VC, use Intel's compiler to write programs that use instruction sets such as MMX, SSE, and SSE2?

haman (Intermediate Member)
#2 Posted: 2006-07-05 22:18:28
http://www.ccrun.com/other/go.asp?i=117&d=tdblw6

http://qc.borland.com/wc/qcmain.aspx?d=6579

Take a look at these.
gps (Member)
#3 Posted: 2006-07-06 13:05:02

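Hello,

Thank you for the reply, but I had already read both of these.

Neither really answers the question. The mainland China article, for example, looks like it was written for VC.

BCB does not seem to have a declaration like __declspec(align(16)).

The foreign article is more about letting you see the values of the MMX and SSE registers while debugging.

Still, thank you very much for taking the trouble to reply.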

gps (Member)
#4 Posted: 2006-07-06 13:08:59

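A supplementary note. The following passage is excerpted from the article:

1.4 Compiler support
As mentioned earlier, only Intel's C/C++ compiler and Microsoft's Macro Assembler support the new SSE instruction set. The Intel compiler has been integrated into Microsoft's Visual Studio IDE, and Visual Studio can be configured to use the Intel compiler to compile an entire project or individual files within a project.

I have heard that Microsoft obtained a license, which is why VC can use it.

I wonder whether BCB can work with Intel's compiler.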

axsoft (Moderator)
#5 Posted: 2006-07-14 18:13:49
Visual C++ C Run-Time Samples
CPUID Sample: Determines CPU Capabilities
The CPUID sample provides a routine that uses the CPUID instruction to determine the capabilities of the CPU being run.

The sample provides the function int _cpuid(_p_info *pinfo), which returns data about the CPU. The int return value is a bitmask of flags for major processor features. The bits that might be set are:

    • #define _CPU_FEATURE_MMX 0x0001
    • #define _CPU_FEATURE_SSE 0x0002
    • #define _CPU_FEATURE_SSE2 0x0004
    • #define _CPU_FEATURE_3DNOW 0x0008
Building and Running the Sample

To build and run this sample

  1. Open the solution cpuid.sln.
  2. From the Build menu, click Build.
  3. From the Debug menu, select Start Without Debugging.
Example Program Output
The sample includes a test.cpp file that trivially calls _cpuid and outputs the values in the resulting _p_info struct. For example, on a Pentium III computer that supports MMX and SSE, the program output would look something like this:
C:\work\cpuid>test
v_name:         GenuineIntel
model:          INTEL Pentium-III
family:         6
model:          8
stepping:       3
feature:        00000003
        yes     _CPU_FEATURE_MMX
        yes     _CPU_FEATURE_SSE
        no      _CPU_FEATURE_SSE2
        no      _CPU_FEATURE_3DNOW
os_support:     00000003
        yes     _CPU_FEATURE_MMX
        yes     _CPU_FEATURE_SSE
        no      _CPU_FEATURE_SSE2
        no      _CPU_FEATURE_3DNOW
checks:         0000000f
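As a small illustration of how the `feature` and `os_support` masks in the output above can be read, here is a minimal sketch that turns such a bitmask into a list of names. The bit values are the ones from the sample; the constant names and `describe_features` are hypothetical helpers and are not part of the MSDN code.

```cpp
#include <string>

// Same bit values as the _CPU_FEATURE_* macros in cpuid.h. The k... names are
// hypothetical, chosen so they do not clash with the sample's macros.
const int kFeatureMMX   = 0x0001;
const int kFeatureSSE   = 0x0002;
const int kFeatureSSE2  = 0x0004;
const int kFeature3DNow = 0x0008;

// Returns the names of all features whose bit is set in mask,
// separated by single spaces (empty string if no bit is set).
std::string describe_features(int mask)
{
    static const struct { int bit; const char *name; } table[] = {
        { kFeatureMMX,   "MMX"    },
        { kFeatureSSE,   "SSE"    },
        { kFeatureSSE2,  "SSE2"   },
        { kFeature3DNow, "3DNow!" },
    };

    std::string out;
    for (const auto &entry : table) {
        if (mask & entry.bit) {
            if (!out.empty()) out += ' ';
            out += entry.name;
        }
    }
    return out;
}
```

With the Pentium III value shown above, `describe_features(0x00000003)` yields `"MMX SSE"`.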
See Also
cpuid.h
#ifndef _INC_CPUID
#define _INC_CPUID
#define _CPU_FEATURE_MMX    0x0001
#define _CPU_FEATURE_SSE 0x0002
#define _CPU_FEATURE_SSE2 0x0004
#define _CPU_FEATURE_3DNOW 0x0008
#define _MAX_VNAME_LEN  13
#define _MAX_MNAME_LEN 30
typedef struct _processor_info {
char v_name[_MAX_VNAME_LEN]; // vendor name
char model_name[_MAX_MNAME_LEN]; // name of model
// e.g. Intel Pentium-Pro
int family; // family of the processor
// e.g. 6 = Pentium-Pro architecture
int model; // model of processor
// e.g. 1 = Pentium-Pro for family = 6
int stepping; // processor revision number
int feature; // processor feature
// (same as return value from _cpuid)
int os_support; // does OS Support the feature?
int checks; // mask of checked bits in feature
// and os_support fields
} _p_info;
#ifdef __cplusplus
extern "C"
#endif
int _cpuid (_p_info *);
#endif
cpuid.cpp
#include <windows.h>
#include <string.h>
#include "cpuid.h"

// These are the bit flags that get set on calling cpuid
// with register eax set to 1
#define _MMX_FEATURE_BIT 0x00800000
#define _SSE_FEATURE_BIT 0x02000000
#define _SSE2_FEATURE_BIT 0x04000000
// This bit is set when cpuid is called with
// register set to 80000001h (only applicable to AMD)
#define _3DNOW_FEATURE_BIT 0x80000000
int IsCPUID()
{
__try {
_asm {
xor eax, eax
cpuid
}
}
__except (EXCEPTION_EXECUTE_HANDLER) {
return 0;
}
return 1;
}

/***
* int _os_support(int feature)
* - Checks if OS Supports the capablity or not
*
* Entry:
* feature: the feature we want to check if OS supports it.
*
* Exit:
* Returns 1 if OS support exist and 0 when OS doesn't support it.
*
****************************************************************/
int _os_support(int feature)
{
__try {
switch (feature) {
case _CPU_FEATURE_SSE:
__asm {
xorps xmm0, xmm0 // executing SSE instruction
}
break;
case _CPU_FEATURE_SSE2:
__asm {
xorpd xmm0, xmm0 // executing SSE2 instruction
}
break;
case _CPU_FEATURE_3DNOW:
__asm {
pfrcp mm0, mm0 // executing 3DNow! instruction
emms
}
break;
case _CPU_FEATURE_MMX:
__asm {
pxor mm0, mm0 // executing MMX instruction
emms
}
break;
}
}
__except (EXCEPTION_EXECUTE_HANDLER) {
if (_exception_code() == STATUS_ILLEGAL_INSTRUCTION) {
return 0;
}
return 0;
}
return 1;
}

/***
*
* void map_mname(int, int, const char *, char *)
* - Maps family and model to processor name
*
****************************************************/

void map_mname(int family, int model, const char *v_name, char *m_name)
{
// Default to name not known
m_name[0] = '\0';
    if (!strncmp("AuthenticAMD", v_name, 12)) {
switch (family) { // extract family code
case 4: // Am486/AM5x86
strcpy (m_name, "AMD Am486");
break;
        case 5: // K6
switch (model) { // extract model code
case 0:
case 1:
case 2:
case 3:
strcpy (m_name, "AMD K5");
break;
case 6:
case 7:
strcpy (m_name, "AMD K6");
break;
case 8:
strcpy (m_name, "AMD K6-2");
break;
case 9:
case 10:
case 11:
case 12:
case 13:
case 14:
case 15:
strcpy (m_name, "AMD K6-3");
break;
}
break;
        case 6: // Athlon
// No model numbers are currently defined
strcpy (m_name, "AMD ATHLON");
break;
}
}
else if (!strncmp("GenuineIntel", v_name, 12)) {
switch (family) { // extract family code
case 4:
switch (model) { // extract model code
case 0:
case 1:
strcpy (m_name, "INTEL 486DX");
break;
case 2:
strcpy (m_name, "INTEL 486SX");
break;
case 3:
strcpy (m_name, "INTEL 486DX2");
break;
case 4:
strcpy (m_name, "INTEL 486SL");
break;
case 5:
strcpy (m_name, "INTEL 486SX2");
break;
case 7:
strcpy (m_name, "INTEL 486DX2E");
break;
case 8:
strcpy (m_name, "INTEL 486DX4");
break;
}
break;
        case 5:
switch (model) { // extract model code
case 1:
case 2:
case 3:
strcpy (m_name, "INTEL Pentium");
break;
case 4:
strcpy (m_name, "INTEL Pentium-MMX");
break;
}
break;
        case 6:
switch (model) { // extract model code
case 1:
strcpy (m_name, "INTEL Pentium-Pro");
break;
case 3:
case 5:
strcpy (m_name, "INTEL Pentium-II");
break; // actual differentiation depends on cache settings
case 6:
strcpy (m_name, "INTEL Celeron");
break;
case 7:
case 8:
case 10:
strcpy (m_name, "INTEL Pentium-III");
break; // actual differentiation depends on cache settings
}
break;
        case 15 | (0x00 << 4): // family 15, extended family 0x00
switch (model) {
case 0:
strcpy (m_name, "INTEL Pentium-4");
break;
}
break;
}
}
else if (!strncmp("CyrixInstead", v_name, 12)) {
strcpy (m_name, "Cyrix");
}
else if (!strncmp("CentaurHauls", v_name, 12)) {
strcpy (m_name, "Centaur");
}
    if (!m_name[0]) {
strcpy (m_name, "Unknown");
}
}

/***
*
* int _cpuid (_p_info *pinfo)
*
* Entry:
*
* pinfo: pointer to _p_info.
*
* Exit:
*
* Returns int with capablity bit set even if pinfo = NULL
*
****************************************************/

int _cpuid (_p_info *pinfo)
{
DWORD dwStandard = 0;
DWORD dwFeature = 0;
DWORD dwMax = 0;
DWORD dwExt = 0;
int feature = 0;
int os_support = 0;
union {
char cBuf[12+1];
struct {
DWORD dw0;
DWORD dw1;
DWORD dw2;
} s;
} Ident;
    if (!IsCPUID()) {
return 0;
}
    _asm {
push ebx
push ecx
push edx
        // get the vendor string
xor eax, eax
cpuid
mov dwMax, eax
mov Ident.s.dw0, ebx
mov Ident.s.dw1, edx
mov Ident.s.dw2, ecx
        // get the Standard bits
mov eax, 1
cpuid
mov dwStandard, eax
mov dwFeature, edx
        // get AMD-specials
mov eax, 80000000h
cpuid
cmp eax, 80000000h
jc notamd
mov eax, 80000001h
cpuid
mov dwExt, edx
notamd:
pop edx // restore in the reverse order of the pushes above
pop ecx
pop ebx
}
    if (dwFeature & _MMX_FEATURE_BIT) {
feature |= _CPU_FEATURE_MMX;
if (_os_support(_CPU_FEATURE_MMX))
os_support |= _CPU_FEATURE_MMX;
}
if (dwExt & _3DNOW_FEATURE_BIT) {
feature |= _CPU_FEATURE_3DNOW;
if (_os_support(_CPU_FEATURE_3DNOW))
os_support |= _CPU_FEATURE_3DNOW;
}
if (dwFeature & _SSE_FEATURE_BIT) {
feature |= _CPU_FEATURE_SSE;
if (_os_support(_CPU_FEATURE_SSE))
os_support |= _CPU_FEATURE_SSE;
}
if (dwFeature & _SSE2_FEATURE_BIT) {
feature |= _CPU_FEATURE_SSE2;
if (_os_support(_CPU_FEATURE_SSE2))
os_support |= _CPU_FEATURE_SSE2;
}
    if (pinfo) {
memset(pinfo, 0, sizeof(_p_info));
        pinfo->os_support = os_support;
pinfo->feature = feature;
pinfo->family = (dwStandard >> 8) & 0xF; // retrieve family
if (pinfo->family == 15) { // retrieve extended family
pinfo->family |= (dwStandard >> 16) & 0xFF0;
}
pinfo->model = (dwStandard >> 4) & 0xF; // retrieve model
if (pinfo->model == 15) { // retrieve extended model
pinfo->model |= (dwStandard >> 12) & 0xF;
}
pinfo->stepping = (dwStandard) & 0xF; // retrieve stepping
        Ident.cBuf[12] = 0;
strcpy(pinfo->v_name, Ident.cBuf);
        map_mname(pinfo->family, 
pinfo->model,
pinfo->v_name,
pinfo->model_name);
        pinfo->checks = _CPU_FEATURE_MMX |
_CPU_FEATURE_SSE |
_CPU_FEATURE_SSE2 |
_CPU_FEATURE_3DNOW;
}
    return feature;
}
 
Test.cpp
#include <stdio.h>
#include "cpuid.h"
void expand(int avail, int mask)
{
if (mask & _CPU_FEATURE_MMX) {
printf("\t%s\t_CPU_FEATURE_MMX\n",
avail & _CPU_FEATURE_MMX ? "yes" : "no");
}
if (mask & _CPU_FEATURE_SSE) {
printf("\t%s\t_CPU_FEATURE_SSE\n",
avail & _CPU_FEATURE_SSE ? "yes" : "no");
}
if (mask & _CPU_FEATURE_SSE2) {
printf("\t%s\t_CPU_FEATURE_SSE2\n",
avail & _CPU_FEATURE_SSE2 ? "yes" : "no");
}
if (mask & _CPU_FEATURE_3DNOW) {
printf("\t%s\t_CPU_FEATURE_3DNOW\n",
avail & _CPU_FEATURE_3DNOW ? "yes" : "no");
}
}
int main(void)
{
_p_info info;
    _cpuid(&info);
    printf("v_name:\t\t%s\n", info.v_name);
printf("model:\t\t%s\n", info.model_name);
printf("family:\t\t%d\n", info.family);
printf("model:\t\t%d\n", info.model);
printf("stepping:\t%d\n", info.stepping);
printf("feature:\t%08x\n", info.feature);
expand(info.feature, info.checks);
printf("os_support:\t%08x\n", info.os_support);
expand(info.os_support, info.checks);
printf("checks:\t\t%08x\n", info.checks);
}
The material above is from Microsoft MSDN.
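To make the bit arithmetic in `_cpuid` easier to follow, here is a minimal sketch that decodes values as CPUID leaf 1 returns them in EAX and EDX. It covers only the basic family, model, and stepping fields and the three EDX feature bits used in the sample. `CpuSignature`, `decode_signature`, and the `has_*` helpers are hypothetical names, and the register values are assumed to have been read elsewhere.

```cpp
#include <cstdint>

// Basic fields of the processor signature (CPUID leaf 1, EAX).
struct CpuSignature {
    int family;
    int model;
    int stepping;
};

// Splits EAX into stepping (bits 0-3), model (bits 4-7) and family (bits 8-11).
// Extended family/model fields are deliberately not handled in this sketch.
CpuSignature decode_signature(std::uint32_t eax)
{
    CpuSignature s;
    s.stepping = static_cast<int>(eax & 0xF);
    s.model    = static_cast<int>((eax >> 4) & 0xF);
    s.family   = static_cast<int>((eax >> 8) & 0xF);
    return s;
}

// Feature bits in EDX of CPUID leaf 1, matching the masks in cpuid.cpp:
// 0x00800000 (bit 23), 0x02000000 (bit 25), 0x04000000 (bit 26).
bool has_mmx(std::uint32_t edx)  { return ((edx >> 23) & 1u) != 0; }
bool has_sse(std::uint32_t edx)  { return ((edx >> 25) & 1u) != 0; }
bool has_sse2(std::uint32_t edx) { return ((edx >> 26) & 1u) != 0; }
```

For the Pentium III in the sample output (family 6, model 8, stepping 3), the signature would be `0x683`.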
axsoft (Moderator)
#6 Posted: 2006-07-14 18:19:04
The following example compiles without errors in BCB6/BDS2006, so the Microsoft MSDN sample for VC in the previous post should also compile without errors under the BCB compiler.
CPUInfo.h
//-----------------------------------------------------------------
// CPUInfo.h
// by kkm1982
//----------------------------------------------------------------
class CCPUInfo
{
public:
int GetTypeName(char *type);
int GetName(char *name);
int GetFamily();
bool withMMX();
bool hasFPU();
int GetSpeed();
CCPUInfo();
virtual ~CCPUInfo();
private:
bool FPU;
char * Name;
bool MMX;
int iFamily;
};
//-----------------------------------------------------------------
//////////////////////////////////////////////////////////////////////
// Construction/Destruction
//////////////////////////////////////////////////////////////////////
//---------------------------------------------------------------------------
CCPUInfo::CCPUInfo()
{
char OEMString[13];
int iEAXValue,iEBXValue,iECXValue,iEDXValue;
//鳳CPU腔ⅲ齪ㄩ
_asm
{
mov eax,0
cpuid
mov DWORD PTR OEMString,ebx
mov DWORD PTR OEMString 4,edx
mov DWORD PTR OEMString 8,ecx
mov BYTE PTR OEMString 12,0
}
Name=new char[15];
strcpy(Name,OEMString);
// More CPU information:
_asm
{
mov eax,1
cpuid
mov iEAXValue,eax
mov iEBXValue,ebx
mov iECXValue,ecx
mov iEDXValue,edx
}
MMX=bool(iEDXValue & 0x800000);
iFamily=(0xf00 & iEAXValue)>>8;
FPU=bool(iEDXValue & 0x1);
}
//---------------------------------------------------------------------------
CCPUInfo::~CCPUInfo()
{
delete []Name;
}
//---------------------------------------------------------------------------
int CCPUInfo::GetSpeed()
{
int PriorityClass, Priority;
HANDLE hThread,hProcess;
hThread=GetCurrentThread();
hProcess=GetCurrentProcess();
PriorityClass = GetPriorityClass(hProcess);
Priority = GetThreadPriority(hThread);
SetPriorityClass(hProcess, REALTIME_PRIORITY_CLASS);
SetThreadPriority(hThread,THREAD_PRIORITY_TIME_CRITICAL);
long lEAXValue,lEDXValue;
SleepEx(50,false);
_asm
{
xor eax,eax
rdtsc
mov lEAXValue,eax
mov lEDXValue,edx
}
if(SleepEx(500,false)==0)
{
_asm
{
xor eax,eax
rdtsc
sub eax,lEAXValue
sbb edx,lEDXValue
mov lEAXValue, eax
mov lEDXValue, edx
}
}
SetThreadPriority(hThread, Priority);
SetPriorityClass(hProcess, PriorityClass);
return lEAXValue/(1000.0*500);
}
//---------------------------------------------------------------------------
bool CCPUInfo::withMMX()
{
return MMX;
}
//---------------------------------------------------------------------------
int CCPUInfo::GetFamily()
{
return iFamily;
}
//---------------------------------------------------------------------------
int CCPUInfo::GetName(char *name)
{
if(name==NULL) return -1;
strcpy(name,Name);
return 0;
}
//---------------------------------------------------------------------------
int CCPUInfo::GetTypeName(char *type)
{
// Bits 8 to 11 of EAX indicate the x86 family:
// 3 - 386
// 4 - i486
// 5 - Pentium
// 6 - Pentium Pro Pentium II
// 2 - Dual Processors
if(type==NULL) return -1;
switch(iFamily)
{
case 2: strcpy(type,"Dual Processors");break;
case 3: strcpy(type,"386");break;
case 4: strcpy(type,"486");break;
case 5: strcpy(type,"Pentium");break;
case 6: strcpy(type,"P2,celeron,Pentium Pro");break;
default: strcpy(type,"Unknown Type");
}
return 0;
}
//---------------------------------------------------------------------------
bool CCPUInfo::hasFPU()
{
return FPU;
}
//--------------------------------------------------------------
UNIT1.H
//---------------------------------------------------------------------------
#ifndef Unit1H
#define Unit1H
//---------------------------------------------------------------------------
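#include <Classes.hpp>
#include <Controls.hpp>
#include <StdCtrls.hpp>
#include <Forms.hpp>
#include "cpuinfo.h"
// (one more #include followed here; its header name is missing from the post)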
//---------------------------------------------------------------------------
class TForm1 : public TForm
{
__published: // IDE-managed Components
TButton *Button1;
TGroupBox *GroupBox1;
TLabel *Label1;
TLabel *Label2;
TEdit *EditCPUSpeed;
TCheckBox *CheckBoxMMX;
TCheckBox *CheckBoxHasFPU;
TEdit *EditCPUName;
TEdit *EditCPUType;
TLabel *Label3;
void __fastcall Button1Click(TObject *Sender);
private: // User declarations
public: // User declarations
__fastcall TForm1(TComponent* Owner);
};
//---------------------------------------------------------------------------
extern PACKAGE TForm1 *Form1;
//---------------------------------------------------------------------------
#endif
UNIT1.cpp
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
#include "Unit1.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner)
: TForm(Owner)
{
}
//---------------------------------------------------------------------------
void __fastcall TForm1::Button1Click(TObject *Sender)
{
char cpuname[128],temp[128];
CCPUInfo *MyCpu=new CCPUInfo();
MyCpu->GetName(cpuname);
EditCPUName->Text=String(cpuname);
EditCPUSpeed->Text=MyCpu->GetSpeed();
MyCpu->GetTypeName(temp);
EditCPUType->Text=String(temp);
CheckBoxHasFPU->Checked=MyCpu->hasFPU();
CheckBoxMMX->Checked=MyCpu->withMMX();
delete MyCpu;
}
//---------------------------------------------------------------------------
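`GetSpeed()` above uses only the low 32 bits of the cycle difference (`lEAXValue`, a signed `long`) when it computes the result. A hedged sketch of the same arithmetic on the full 64-bit counter follows. `make_tsc` and `tsc_to_mhz` are hypothetical helper names, and the EDX:EAX pairs are assumed to have been read with RDTSC elsewhere.

```cpp
#include <cstdint>

// Joins the two 32-bit halves that RDTSC leaves in EDX (high) and EAX (low).
std::uint64_t make_tsc(std::uint32_t edx, std::uint32_t eax)
{
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}

// Average clock rate in MHz between two counter readings taken
// interval_ms milliseconds apart: cycles / (1000 * ms).
// Returns 0.0 for a zero-length interval instead of dividing by zero.
double tsc_to_mhz(std::uint64_t start, std::uint64_t end, unsigned interval_ms)
{
    if (interval_ms == 0) return 0.0;
    return static_cast<double>(end - start) / (1000.0 * interval_ms);
}
```

For example, 1,500,000,000 cycles counted over the 500 ms sleep used above correspond to 3000 MHz.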
gps (Member)
#7 Posted: 2006-07-17 10:49:02

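Hello,

Thank you very much for the reply, but your code retrieves information about the CPU

rather than showing how to write MMX or SSE code. I have actually tried writing MMX code directly in assembly.

That compiles, but SSE code does not. I do not know whether that is because TASM lacks support.

Also, Intel provides a function library for C, but only the Intel and Microsoft compilers can use it.

Once again, thank you for your reply.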

bugmans (Senior Member)
#8 Posted: 2007-09-16 10:07:43
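This question looked interesting. I spent a week searching Google and have some findings to share.
Introduction to SSE
(Chinese) http://www.csie.ntu.edu.tw/~r89004/hive/sse/page_1.html
Introduction to the Streaming SIMD Extensions in the Pentium III
http://www.x86.org/articles/articles.htm#sse_pt1
After reading these two articles you should have a basic understanding of SSE. There are three ways to use it: inline asm, intrinsics, and the F32vec4 class.
Implementing them in BCB6, however, runs into problems.
1. The asm approach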
[code cpp]
float _SSE_Sqrt(float x)
{ // compute a square root with the SSE instruction set
float root = 0.f;
_asm
{
sqrtss xmm0, x // xmm0 through xmm7 are the SSE registers
movss root, xmm0
}
return root;
}
int main(int argc, char* argv[])
{
float r=_SSE_Sqrt(2.0f);
return 0;
}
[/code]
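So far this is the only example I have run successfully. Note that sqrtss computes only one value at a time, whereas sqrtps, another instruction with the same function,
computes the square roots of four different numbers at once. What I am stuck on now is passing an array of four values into the asm{} block.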
[code cpp]
#pragma pack(push, 16) // intended as a substitute for __declspec(align(16))
float x[4];
#pragma pack(pop)
int main(int argc, char* argv[])
{
_asm
{push esi
mov esi , x
movaps xmm0 , [esi] // try to load array x into xmm0; execution fails here with an Access Violation
pop esi
}
return 0;
}
[/code]
If the ps family of instructions cannot be used, you lose the speed advantage of SSE.
Note: for ss versus ps, see http://www.x86.org/articles/sse_pt1/simd1.htm (5.1.1 Data Packing)
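To make the ss/ps distinction concrete, here is a minimal sketch written with the `xmmintrin.h` intrinsics rather than BCB inline asm, so it assumes a compiler that provides them (such as VC or gcc). `sqrt_ss4` and `sqrt_ps4` are hypothetical helper names; a plain scalar fallback is included for targets without SSE.

```cpp
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_SQRT_HAS_SSE 1
#else
#define SKETCH_SQRT_HAS_SSE 0
#endif

// Scalar form (what sqrtss does): only element 0 is square-rooted,
// elements 1..3 are passed through unchanged.
void sqrt_ss4(const float *in, float *out)
{
#if SKETCH_SQRT_HAS_SSE
    __m128 v = _mm_loadu_ps(in);           // unaligned load, no 16-byte requirement
    _mm_storeu_ps(out, _mm_sqrt_ss(v));
#else
    out[0] = std::sqrt(in[0]);
    for (int i = 1; i < 4; ++i) out[i] = in[i];
#endif
}

// Packed form (what sqrtps does): all four elements are square-rooted at once.
void sqrt_ps4(const float *in, float *out)
{
#if SKETCH_SQRT_HAS_SSE
    _mm_storeu_ps(out, _mm_sqrt_ps(_mm_loadu_ps(in)));
#else
    for (int i = 0; i < 4; ++i) out[i] = std::sqrt(in[i]);
#endif
}
```

Given the input {4, 9, 16, 25}, `sqrt_ss4` produces {2, 9, 16, 25} while `sqrt_ps4` produces {2, 3, 4, 5}.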
暗黑破壞神 (Moderator)
#9 Posted: 2007-09-16 17:22:32
I ran an experiment.

[code cpp]
main()
{
int i = 0;
int j = 0;
for (i = 0; i < 100; i++)
{
j += 1;
}
printf("%d", j);
}
[/code]

This is the original code.
After modification it becomes this.


[code cpp]
main()
{
int i = 0;
int j = 0;
for (i = 0; i < 100; i++)
{
_asm
{
db 0x8B, 0x55, 0xCC
db 0x01, 0x55, 0xC8
db 0xFF, 0x45, 0xCC
db 0x83, 0x7D, 0xCC, 0x64
db 0x7C, 0xF1
}
}
printf("%d", j);
}
[/code]

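Both programs produce the same result.
There is nothing special about the method.
I simply supply the machine code directly,
bypassing the assembler (TASM),
so it does not matter whether the assembler supports these instructions.
As for how to pass data in and out,
that works the same way as in ordinary asm.
Give it a try, and post again if you run into problems.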
bugmans (Senior Member)
#10 Posted: 2007-09-22 06:33:08
[code cpp]
#pragma pack(push, 16)
float x[4];
#pragma pack(pop)
#include <stdio.h>
int main(int argc, char* argv[])
{
printf("%p %p %p %p",(void *)&x[0],(void *)&x[1],(void *)&x[2],(void *)&x[3]);
}

[/code]
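The test shows that array x does not land on a multiple of 16 at all, so no wonder movaps raises an Access Violation.
Since the directive is not supported, I did it by hand: malloc a char array,
find an address inside it that is a multiple of 16, and free the array when it is no longer needed.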
[code cpp]
#include <stdio.h>
#include <stdlib.h>
int main()
{
char *buf;
float *a[4];
buf=(char *)malloc(200);
a[0]=(float *)(((int)buf+15) & -16);
*a[0]=0.2f;
a[1]=a[0]+4; *a[1]=0.1f;
a[2]=a[1]+4; *a[2]=0.2f;
a[3]=a[2]+4; *a[3]=0.3f;
printf("%p %p %p %p",(void *)a[0],(void *)a[1],(void *)a[2],(void *)a[3]);
_asm
{mov eax ,[a]
movaps xmm0,[eax]
}
free(buf);
}
[/code]
movaps no longer raises an error, but I still cannot get the original data back out of xmm0.
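One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the pointer increments in the listing are `+4` on a `float *`, each step advances four elements (16 bytes), so the four values are not adjacent and a single 16-byte load starting at the first pointer sees only the first of them. The sketch below rounds a `malloc` result up to a 16-byte boundary and stores the four floats in adjacent elements. `align_up_16` and `make_packed4` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstdlib>

// Rounds an address up to the next multiple of 16 (unchanged if already aligned).
float *align_up_16(void *p)
{
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    addr = (addr + 15) & ~static_cast<std::uintptr_t>(15);
    return reinterpret_cast<float *>(addr);
}

// Allocates room for four floats plus alignment slack and stores the values
// in ADJACENT elements, so that one 16-byte load would see all four.
// *raw_out receives the pointer that must later be passed to std::free().
float *make_packed4(float v0, float v1, float v2, float v3, void **raw_out)
{
    void *raw = std::malloc(4 * sizeof(float) + 15);
    *raw_out = raw;
    if (!raw) return nullptr;

    float *v = align_up_16(raw);
    v[0] = v0;   // v + 1 is 4 bytes further on; v + 4 would be 16 bytes further
    v[1] = v1;
    v[2] = v2;
    v[3] = v3;
    return v;
}
```

The returned pointer is what an aligned load such as movaps would take as its address, while the raw pointer is kept only so the block can be freed.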

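If a variable is not aligned to a multiple of 16, movups can be used instead. Performance is somewhat worse, but this is the example that currently runs successfully.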

[code cpp]
typedef struct m128
{float a[4];
}m128;

m128 add(m128 n1,m128 n2)
{
m128 r;
_asm // ebp+0: location of the return value r, sizeof=16
{movups xmm0,[ebp+16] // n1
movups xmm1,[ebp+32] // n2
addps xmm0,xmm1
movups r,xmm0
}
return r;
}
void main()
{
m128 a={0.1f,0.2f,0.3f,0.4f},b={0.4f,0.3f,0.2f,0.1f},c;
c=add(a,b);//c={0.5f,0.5f,0.5f,0.5f}
}

[/code]
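I have spent a lot of time lately testing which of the material found online actually runs in BCB.
So my advice to anyone who reads this thread later and needs to write SSE code:
I strongly recommend using Intel's compiler or Microsoft's VC. At the very least, far more material is available online for them.
Edit history
bugmans edited on 2007-09-22 06:34:56, note: none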
aftcast (Deputy Site Administrator)
#11 Posted: 2007-09-22 13:39:06
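Seeing 暗黑破壞神 pull out this most "brute-force" approach left me stunned!

It really does let you ignore whether the compiler supports an instruction, but it is brutal. Writing inline assembly this way takes serious skill. Let me add a few notes on how it can be done.
Method 1 (the superhuman way): know the x86 opcode specification well enough to look up the table and write the opcode directly, for example mov ecx, eax = 8BC8. This amounts to writing machine language; anyone with that skill could probably write a disassembler too! ^_^
Method 2: write the routine first with an assembler that supports SSE, such as MASM, then disassemble it and copy the opcodes out. This is more humane, but still tedious.

In my view, the brute-force method suits individual instructions that involve no addressing, such as RDTSC, whose opcode is db 0x0F, 0x31.

For the problem at hand, I suggest writing a PROC in MASM, assembling it to OMF format, and linking it with BCB. That is probably the more humane and more readable approach.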
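As an illustration of Method 1, here is a minimal sketch that assembles the bytes for three SSE instructions with a simple register-indirect memory operand, the form `xmmN, [reg]`. The opcodes (0F 28, 0F 29, 0F 58) are taken from the Intel instruction reference; `Reg32`, `modrm_indirect`, and `encode_sse_mem` are hypothetical names, and ESP and EBP are excluded because their encodings select other addressing forms. The output should be checked against a real assembler or disassembler before being pasted into `db` lines.

```cpp
#include <cstdint>
#include <vector>

// 32-bit register numbers as used in the ModRM r/m field. ESP (4) and EBP (5)
// are left out on purpose: with mod = 00 they select SIB and disp32 forms.
enum Reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESI = 6, EDI = 7 };

// Second opcode byte, following the 0x0F escape byte.
const std::uint8_t kMovapsLoad  = 0x28;  // movaps xmmN, [reg]
const std::uint8_t kMovapsStore = 0x29;  // movaps [reg], xmmN
const std::uint8_t kAddps       = 0x58;  // addps  xmmN, [reg]

// ModRM byte with mod = 00 (register-indirect, no displacement):
// bits 5..3 hold the xmm register number, bits 2..0 the base register.
std::uint8_t modrm_indirect(unsigned xmm, Reg32 base)
{
    return static_cast<std::uint8_t>(((xmm & 7u) << 3) |
                                     (static_cast<unsigned>(base) & 7u));
}

// Returns the three machine-code bytes for one of the instructions above.
std::vector<std::uint8_t> encode_sse_mem(std::uint8_t opcode, unsigned xmm, Reg32 base)
{
    std::vector<std::uint8_t> bytes;
    bytes.push_back(0x0F);                        // two-byte opcode escape
    bytes.push_back(opcode);
    bytes.push_back(modrm_indirect(xmm, base));
    return bytes;
}
```

For instance, `encode_sse_mem(kMovapsLoad, 0, EAX)` gives 0F 28 00 for `movaps xmm0, [eax]`, and `encode_sse_mem(kAddps, 0, EDX)` gives 0F 58 02 for `addps xmm0, [edx]`.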


------


蕭沖
--All ideas are worthless unless implemented--

C++ Builder Delphi Taiwan G+ community
http://bit.ly/cbtaiwan
暗黑破壞神 (Moderator)
#12 Posted: 2007-09-22 22:39:29
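Ah, I'm touched. Someone actually understands what I was saying.     ^________^

There is a reason I wrote it this way: I used parameters, and parameters are tied to
the locations where they are stored.
Writing it this way lets the code access them.
The db approach was only meant to show the original poster that even an unsupported instruction
can be used: look up its machine code in the opcode table and write out those one or two lines directly.

I showed off a bit too much, and yet someone still understood.
That makes me really happy.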
bugmans (Senior Member)
#13 Posted: 2007-10-05 15:36:08
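I really do not understand machine language, so I will set that part aside for now.
2. The intrinsics approach
Because assembly is hard to read, instructions such as sqrtss are wrapped in easier-to-read functions such as _mm_sqrt_ss. All of these functions are declared in xmmintrin.h.
But after including xmmintrin.h in BCB,
the linker still reports: [linker error]Unresolved external '_mm_sqrt_ss' referenced from ....
The cause is that BCB cannot find the implementation of _mm_sqrt_ss, and I could not find it in the include or lib directories either.
I even downloaded Intel C++ Compiler 10, hoping to find a callable DLL through one of its lib files, but that failed as well.
The following example runs successfully in Visual C++ 2005 Express Edition.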
[code cpp]
#include "stdafx.h"
#include <xmmintrin.h>
int _tmain(int argc, _TCHAR* argv[])
{
__m128 a, b, c;
a = _mm_set_ps(4, 3, 2, 1);
b = _mm_set_ps(4, 3, 2, 1);
c = _mm_set_ps(0, 0, 0, 0);
c = _mm_mul_ps(a, b); // multiplies four values at once; result c = 16, 9, 4, 1
return 0;
}
[/code]
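3. The F32vec4 class
The operations are further wrapped in a C++ class, whose definition can be found in fvec.h.
The following example has only been tested successfully in Visual C++ 2005 Express Edition.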
[code cpp]
#include "stdafx.h"
#include <fvec.h>
int _tmain(int argc, _TCHAR* argv[])
{
F32vec4 a(4,3,2,1),b(4,3,2,1),c(0,0,0,0);
c=a*b;//c(16,9,4,1)
return 0;
}
[/code]
bugmans (Senior Member)
#14 Posted: 2007-10-25 20:22:40

[code cpp]
/*
__mingw_aligned_malloc and friends, implemented using Microsoft's public
interfaces and with the help of the algorithm description provided
by Wu Yongwei: http://sourceforge.net/mailarchive/message.php?msg_id=3847075
I hereby place this implementation in the public domain.
-- Steven G. Johnson (stevenj@alum.mit.edu)
*/
#include <stdlib.h>
#include <errno.h>
#include <stddef.h> /* ptrdiff_t */
#include <string.h> /* memmove */
#ifdef HAVE_STDINT_H
# include <stdint.h> /* uintptr_t */
#else
# define uintptr_t size_t
#endif
#define NOT_POWER_OF_TWO(n) (((n) & ((n) - 1)))
#define UI(p) ((uintptr_t) (p))
#define CP(p) ((char *) p)
#define PTR_ALIGN(p0, alignment, offset) \
((void *) (((UI(p0) + (alignment + sizeof(void*)) + offset) \
& (~UI(alignment - 1))) \
- offset))
/* Pointer must sometimes be aligned; assume sizeof(void*) is a power of two. */
#define ORIG_PTR(p) (*(((void **) (UI(p) & (~UI(sizeof(void*) - 1)))) - 1))
void *
__mingw_aligned_offset_malloc (size_t size, size_t alignment, size_t offset)
{
void *p0, *p;
if (NOT_POWER_OF_TWO (alignment))
{
errno = EINVAL;
return ((void *) 0);
}
if (size == 0)
return ((void *) 0);
if (alignment < sizeof (void *))
alignment = sizeof (void *);
/* Including the extra sizeof(void*) is overkill on a 32-bit
machine, since malloc is already 8-byte aligned, as long
as we enforce alignment >= 8 ...but oh well. */
p0 = malloc (size + (alignment + sizeof (void *)));
if (!p0)
return ((void *) 0);
p = PTR_ALIGN (p0, alignment, offset);
ORIG_PTR (p) = p0;
return p;
}
void *
__mingw_aligned_malloc (size_t size, size_t alignment)
{
return __mingw_aligned_offset_malloc (size, alignment, 0);
}
void
__mingw_aligned_free (void *memblock)
{
if (memblock)
free (ORIG_PTR (memblock));
}
void *
__mingw_aligned_offset_realloc (void *memblock, size_t size,
size_t alignment, size_t offset)
{
void *p0, *p;
ptrdiff_t shift;
if (!memblock)
return __mingw_aligned_offset_malloc (size, alignment, offset);
if (NOT_POWER_OF_TWO (alignment))
goto bad;
if (size == 0)
{
__mingw_aligned_free (memblock);
return ((void *) 0);
}
if (alignment < sizeof (void *))
alignment = sizeof (void *);
p0 = ORIG_PTR (memblock);
/* It is an error for the alignment to change. */
if (memblock != PTR_ALIGN (p0, alignment, offset))
goto bad;
shift = CP (memblock) - CP (p0);
p0 = realloc (p0, size + (alignment + sizeof (void *)));
if (!p0)
return ((void *) 0);
p = PTR_ALIGN (p0, alignment, offset);
/* Relative shift of actual data may be different from before, ugh. */
if (shift != CP (p) - CP (p0))
/* ugh, moves more than necessary if size is increased. */
memmove (CP (p), CP (p0) + shift, size);
ORIG_PTR (p) = p0;
return p;
bad:
errno = EINVAL;
return ((void *) 0);
}
void *
__mingw_aligned_realloc (void *memblock, size_t size, size_t alignment)
{
return __mingw_aligned_offset_realloc (memblock, size, alignment, 0);
}
//---------------------------------------------------------------------------
#include <stdio.h>
int main(int argc, char* argv[])
{
float *a=(float *)__mingw_aligned_malloc(sizeof(float)*4,16);
float *b=(float *)__mingw_aligned_malloc(sizeof(float)*4,16);
a[0]=0.1f; a[1]=0.2f; a[2]=0.3f; a[3]=0.4f;
b[0]=0.4f; b[1]=0.3f; b[2]=0.2f; b[3]=0.1f;
_asm
{mov eax,[a]
mov edx,[b]
movaps xmm0,[eax]
addps xmm0,[edx]
movaps [eax],xmm0
}
printf("%f %f %f %f",a[0],a[1],a[2],a[3]);//0.5 0.5 0.5 0.5
__mingw_aligned_free(a);
__mingw_aligned_free(b);
return 0;
}
[/code]

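Earlier I mentioned being unable to obtain a memory address that is a multiple of 16. Today I happened to find ready-made code for this on Google:
http://www.koders.com/c/fid98DD89C9A7D444032542F612D8A341C209CE9F6A.aspx
With it I also got an example that adds two sets of floats (addps) working. Just create a Console project in BCB and paste the code in to run it.
Edit history
bugmans edited on 2007-10-25 20:25:27, note: none
bugmans edited on 2007-10-25 20:33:35, note: none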
bugmans (Senior Member)
#15 Posted: 2008-02-15 17:45:47

[code cpp]
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
static __inline void*
_mm_malloc (size_t size, size_t align)
{
void * malloc_ptr;
void * aligned_ptr;
/* Error if align is not a power of two. */
if (align & (align - 1))
{
errno = EINVAL;
return ((void*) 0);
}
if (size == 0)
return ((void *) 0);
/* Assume malloc'd pointer is aligned at least to sizeof (void*).
If necessary, add another sizeof (void*) to store the value
returned by malloc. Effectively this enforces a minimum alignment
of sizeof double. */
if (align < 2 * sizeof (void *))
align = 2 * sizeof (void *);
malloc_ptr = malloc (size + align);
if (!malloc_ptr)
return ((void *) 0);
/* Align We have at least sizeof (void *) space below malloc'd ptr. */
aligned_ptr = (void *) (((size_t) malloc_ptr + align)
& ~((size_t) (align) - 1));
/* Store the original pointer just before p. */
((void **) aligned_ptr) [-1] = malloc_ptr;
return aligned_ptr;
}
static __inline void
_mm_free (void * aligned_ptr)
{
if (aligned_ptr)
free (((void **) aligned_ptr) [-1]);
}
int main()
{
float *a=(float *)_mm_malloc(sizeof(float)*4,16);
float *b=(float *)_mm_malloc(sizeof(float)*4,16);
a[0]=0.1f; a[1]=0.2f; a[2]=0.3f; a[3]=0.4f;
b[0]=0.4f; b[1]=0.3f; b[2]=0.2f; b[3]=0.1f;
_asm
{mov eax,[a]
mov edx,[b]
movaps xmm0,[eax]
addps xmm0,[edx]
movaps [eax],xmm0
}
printf("%f %f %f %f",a[0],a[1],a[2],a[3]);
_mm_free(a);
_mm_free(b);
}

[/code]

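Following gcc's mm_malloc.h, I used _mm_malloc to allocate memory at an address that is a multiple of 16:
http://www.koders.com/c/fid734CB65BA142B297562A83D149492EFB419EBD8D.aspx
This completes the example that adds two sets of floats (addps). Just create a Console project in BCB and paste the code in to compile and run it.

gcc also supports Intel's intrinsics, such as _mm_load_ps and _mm_add_ps.
See http://www.tuleriit.ee/progs/rexample.php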
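For comparison with the inline asm version, the same four-float addition can be sketched with those intrinsics. This assumes a compiler that ships `xmmintrin.h` with working implementations (gcc or VC, not BCB6, according to the earlier posts). `add4_aligned` is a hypothetical helper name, and a scalar fallback is included for targets without SSE.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_ADD_HAS_SSE 1
#else
#define SKETCH_ADD_HAS_SSE 0
#endif

// Computes out[i] = a[i] + b[i] for i = 0..3.
// All three pointers must be 16-byte aligned, because _mm_load_ps and
// _mm_store_ps map to aligned moves (movaps), like the asm version above.
void add4_aligned(const float *a, const float *b, float *out)
{
#if SKETCH_ADD_HAS_SSE
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));   // one addps covers all four lanes
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];   // portable fallback
#endif
}
```

A caller could obtain suitably aligned storage from `_mm_malloc(sizeof(float)*4, 16)` as above, or, on a C++11 compiler, by declaring the arrays with `alignas(16)`.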
bugmans (Senior Member)
#16 Posted: 2008-02-15 17:52:44
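BCB and Delphi users can also use Exentia:
http://www.tommesani.com/ExentiaFeatures.html
After declaring a TFVector object, you can use its methods, such as Add, Sub, Mul, Divide, and Sqrt, to make use of SSE instructions.
Thanks to Justmade for the information: http://delphi.ktop.com.tw/board.php?cid=30&fid=74&tid=31475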