GPGPU using OpenCL

cayoenrique

Senior Member
Messages
478
@C0der
Listen, I have not forgotten that you are out there. And you do not need the CSA stuff: a BC-only kernel or Colibri's csa.cu kernel can be done. Like I said, the speed should come out almost the same, since OpenCL is like CUDA. What does make a difference in speed is changing the way we compute the key schedule or the way we read from the SBOX and Permute tables. As they say, trial and error, or rather "trial and speed test". But that is more of an advanced math project. The problem is that if I try that, I will have even less time for the learning stuff.

Dale once posted that an encrypted pack transponder, like in Abertis, produces a very special situation: instead of the normally required 1 stream + 1 block cypher per TS, we can find a solution with 2 block cyphers from two different TS and no stream, or with 2 streams and no block. Let me see if I can remember and refresh those formulas, and I may try to explain the case later.

Listen, guys: I do not have much time, but I will do my best to speed things up. I will post something tomorrow.
 

cayoenrique

Senior Member
Messages
478
OK I hope you appreciate this. I spent the whole night working on this.
OCLBiss_012.zip (21.34 KB)
Code:
https://workupload.com/file/tgKfLKUqDX9
pass:www.sat-universe.com

Stopping it is somewhat tricky. As the GPU gets busy, the program also becomes a little unresponsive. To stop it in Windows, long-press the [ESC] key until it asks if you really want to stop, then press "Y". In Linux I had to substitute [ESC], as it does not work well, so in Linux we press [CTRL]+C and then "Y".

Now, for proper brute-forcing you should have 2 GPUs: one to brute-force and the other to drive your screen. Most of you will have a GPU in your Intel CPU.

Now, if you have ONLY one GPU, you need to limit the amount of time spent on the GPU. If you force it, the screen will hang for a few seconds because the GPU is too busy to refresh it. Then my program will break as Windows complains.

To adjust or maximize speed you play with OCLBiss.cfg. In it you will find:
Code:
#Parallel OpenCL Parameters
SINGLETHREADENABLE:0
LOOPSPERTHREAD:1024 16384   #128 1024 2048 4096 8192 16384 32768 65536
LOCALTHREADS:0 64 0 256
GLOBALTHREADS:0 1536

Again, do not forget: the config accepts ONLY the first value after the ":" and before the first space.

So if you see LOCALTHREADS:0 64 0 256, this means 0. Because there is a space, the 64 0 256 are ignored. It is like a scrapbook where you leave the most-used values so you remember them.
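As a rough illustration of that rule, here is a minimal C sketch that reads only the first value after the colon. The helper name cfg_first_value and its fallback argument are made up for this example; the real OCLBiss parser may work differently.

```c
#include <stdlib.h>
#include <string.h>

/* Return the first integer after ':' on a config line, ignoring anything
 * that follows the first space. Returns `fallback` if the line has no ':'
 * or no number. cfg_first_value is a hypothetical helper, not OCLBiss code. */
long cfg_first_value(const char *line, long fallback)
{
    const char *p = strchr(line, ':');
    if (p == NULL)
        return fallback;
    p++;                              /* step past ':' */
    char *end;
    long v = strtol(p, &end, 10);     /* stops at the first non-digit, e.g. a space */
    if (end == p)
        return fallback;              /* nothing numeric right after ':' */
    return v;
}
```

With the line LOCALTHREADS:0 64 0 256 this returns 0, and the trailing 64 0 256 never get looked at.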

Now, if you input 0 for LOCALTHREADS and/or GLOBALTHREADS, it will use the default values it gets from the GPU. That is why you see I use 0. But you can play with that later.
LOOPSPERTHREAD is where most of the action is. It sets how much time a thread, once loaded, spends on the GPU. Again, a very big number may hang your GPU. Start increasing it slowly. As you see, I use powers of 2.

Do not play with SINGLETHREADENABLE:0; most likely it will not work. This is an attempt to run a single thread so that I can try to print some values when I debug. It has no other use. And I am not sure if it works in Windows, as I did not test it.

I did not test SPEEDTESTENABLE and SPEEDTESTKEY. They are in fact useless, as you can test your speed in normal operation.

Now, here is where, for now, we enter the SB0+SB1 and the start key:

STARTKEY: 0000385F624F0000
CSADECTYPE:0
SB01: 3CEBDC173C2BD64F651688F258D59705

Leave CSADECTYPE:0 as is; the program is unfinished. Maybe later it will have some use.
SB01: 3CEBDC173C2BD64F651688F258D59705 is the 16 bytes we are trying to decrypt.
STARTKEY: 0000385F624F0000 is the 6-byte CSA key to start from, as in 38 5F 62 XX 4F 00 00 XX. So yes, you should write 0000 at the beginning as a reminder.

The real KEY I used is 0000385F624FCABC as in 38 5F 62 XX 4F CA BC XX.
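The XX positions do not need to be searched, because in a DVB-CSA control word the 4th and 8th bytes are normally the low byte of the sum of the three bytes before them. A minimal C sketch of that expansion follows; expand_cw is a made-up helper name, not a function from OCLBiss.

```c
#include <stdint.h>

/* Expand a 6-byte key (k0 k1 k2 k4 k5 k6) into the 8-byte control word
 * k0 k1 k2 XX k4 k5 k6 XX, where each XX is the low byte of the sum of
 * the three bytes before it. expand_cw is a hypothetical name. */
void expand_cw(const uint8_t key6[6], uint8_t cw[8])
{
    cw[0] = key6[0];
    cw[1] = key6[1];
    cw[2] = key6[2];
    cw[3] = (uint8_t)(key6[0] + key6[1] + key6[2]);  /* first checksum byte */
    cw[4] = key6[3];
    cw[5] = key6[4];
    cw[6] = key6[5];
    cw[7] = (uint8_t)(key6[3] + key6[4] + key6[5]);  /* second checksum byte */
}
```

With the 6 bytes 38 5F 62 4F CA BC this gives 38 5F 62 F9 4F CA BC D5.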

Now, as the program is incomplete and tests ONLY one 16-byte SB0+SB1, it will produce multiple FAKE keys. This is why we should have at least 2, and in Cudabiss I guess they used three 16-byte blocks.

I hope you appreciate my effort. Enjoy, as the sun is rising and I have not gone to bed. ;)
 

dvlajkovic

Senior Member
Messages
499
Thank you, Enrique.
Compiled program runs but it's not stopping on the real key 38 5F 62 F9 4F CA BC D5.
Most likely that's because we use just one 16-byte block.
Yes, cudabiss uses 16 bytes from each of 3 PUSI packets and stops only when all 3 are decoded with the same key.
I've put LOOPSPERTHREAD:65536
and it's not hanging gpu - dragon runs it easily while he's watching youtube clip in background and me typing this text (see screenshot below)

92Ddlmc.jpeg
 

cayoenrique

Senior Member
Messages
478
The program is not built to stop at a POSSIBLE key. Instead it continues on to the next POSSIBLE key.

In the future, the CPU part should receive each POSSIBLE key and do a quick test with it on the other two 16-byte blocks. If all three PASS with 00 00 01, then it is a winner. In fact, most of the time ALL 8 bytes turn out the same if they were picked from the same VPID or APID.
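The CPU-side check described here could look roughly like the following C sketch. It assumes the candidate key has already been used to decrypt the first 8 bytes of each sample; the function names are hypothetical and the decryption itself is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a decrypted 8-byte block begins with the 00 00 01 prefix. */
bool starts_with_pes_prefix(const uint8_t db[8])
{
    return db[0] == 0x00 && db[1] == 0x00 && db[2] == 0x01;
}

/* True only if there is at least one block and every decrypted block
 * passes the 00 00 01 test. One failing block rejects the candidate. */
bool candidate_passes(const uint8_t blocks[][8], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!starts_with_pes_prefix(blocks[i]))
            return false;
    }
    return count > 0;
}
```

A key that passes on one block but fails on either of the other two would be discarded as a fake.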

Now, everyone has posted questions about the SPEED. Most of them I may ignore. But since you have one of the BEST GPUs, I wonder what your result is. You did not post a clear view of the program output. Can you look at log.txt and paste the output here? For example, on my laptop with a 480-core AMD GPU I get:
Code:
enrique@live:$ make
Detected OS = Linux
CFLAGS = -Wall -g -fopenmp -std=gnu11 
INC_DIRS = 
LIB_DIRS = 
LIBS = -lgomp -lOpenCL
CC = gcc
STRIP = strip
RM = rm -f
RRM = rm -f r
srcfiles = OCLBiss.c OCLB_toolbox.c
objects = OCLBiss.o OCLB_toolbox.o
BIN = OCLBiss
gcc -Wall -g -fopenmp -std=gnu11  -c OCLBiss.c OCLB_toolbox.c
CC	OCLBiss.c
g++  -Wall -g -fopenmp -std=gnu11  OCLBiss.o OCLB_toolbox.o   -lgomp -lOpenCL  -o OCLBiss
strip OCLBiss
enrique@live:$ ./OCLBiss
Reading from OCLBiss.cfg...Done...

Today is Sun Sep 10 08:33:46 2023
Connected to device:AMD => AMD TURKS (DRM 2.50.0 / 4.19.0-6-amd64, LLVM 7.0.1)

Device Kernel properties:
	number of cores: 				6
	recommended work group size (local threads): 	64
	max work group size:				256


Number Loops per thread: 				1024
Number of keys per thread(x32/4096): 			1024
Local threads: 						256
Global threads: 					1536
Keys per kernel: 					0x180000(1572864)

		*****Important: To stop program and save current status******* 
				PRESS a LONG [ESC] Key and then say 'Y'

		Scroll Lock LED in KeyBoard will be used to Signal Key Found event.
			Please ensure it is off before continuing

		File Log from Last Searched present:				11C16EC2C380101
		Do you what to continue from here?
		If you type 'N' or 'n' will start from config.ini value:	0000385F624F0000
		Press Enter to Continue (Y)n

BruteForcing for:
SB01:		3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05

Range: 0000000000180000

Loop             From             To                kernel Time  Keys per seconds        #Keys Found 

00000001 00385F624F0000 00385F6266FFFF [...] 04:34:36  kps:1.383236618756e+006 00000001         Key 001:38 5F 62 F9 4F CA BC D5  Total 1:
00000002 00385F62670000 00385F627EFFFF [...] 04:34:38  kps:1.386672127620e+006 00000000 Total 1:
00000003 00385F627F0000 00385F6296FFFF [...] 04:34:39  kps:1.386292069851e+006 00000000 Total 1:

So my slow laptop's 480-core AMD GPU does kps:1.386292069851e+006

Now, what about starting "GPU-Z.2.54.0.exe"? On the first tab select your NVIDIA GPU, then move to the "SENSORS" tab.
As soon as you run OCLBiss.exe you should see it loading and the temperature rising. What is the % GPU LOAD?
 

cayoenrique

Senior Member
Messages
478
In any case, this is the worst case: it has no GPU optimization. That is what you guys need to do: changing the key schedule, and changing the way we read the SBOX and Permute tables for the block cypher. The stream cypher S-boxes can easily be replaced with their Karnaugh map equivalents, so that no memory reads are performed. We could even split block and stream into a two-part kernel. And many, many other tests. This is like a game: you keep changing things until one day you find the faster kernel.
 

dvlajkovic

Senior Member
Messages
499
Code:
Today is Sun Sep 10 15:37:46 2023
Connected to device:NVIDIA Corporation => NVIDIA GeForce RTX 4090

Device Kernel properties:
    number of cores:                 128
    recommended work group size (local threads):     32
    max work group size:                256


Number Loops per thread:                 65536
Number of keys per thread(x32):             65536
Local threads:                         256
Global threads:                     32768
Keys per kernel:                     0x80000000(2147483648)

BruteForcing for:
SB01:       

Range: 0000000080000000

Loop             From             To                kernel Time  Keys per seconds        #Keys Found

0000003B 00387C624F0000 00387CE24EFFFF [...] 12:39:28  kps:5.711649249351e+008 00000026         Key 001:38 7C AE 62 4F 05 3D 91 Key 002:38 7C 8F 43 52 08 13 6D Key 003:38 7C D2 86 7E 08 2E B4 Key 004:38 7C D2 86 57 70 99 60 Key 005:38 7C D3 87 10 76 45 CB Key 006:38 7C B1 65 F1 F3 98 7C Key 007:38 7C 67 1B 2E F3 92 B3 Key 008:38 7C 70 24 3A D0 57 61 Key 009:38 7C 6E 22 C4 E9 C1 6E Key 010:38 7C 80 34 BB 97 2A 7C Key 011:38 7C C2 76 8D DB E1 49 Key 012:38 7C B2 66 98 DC A5 19 Key 013:38 7C B2 66 F1 FD CC BA Key 014:38 7C CA 7E B8 E0 CE 66 Key 015:38 7C B5 69 FD F9 1B 11 Key 016:38 7C B0 64 52 E9 FA 35 Key 017:38 7C BE 72 90 EA FD 77 Key 018:38 7C BF 73 16 FD 5E 71 Key 019:38 7C BD 71 42 C8 78 82 Key 020:38 7C D4 88 B4 EE 2A CC Key 021:38 7C 6C 20 24 EF 94 A7 Key 022:38 7C AC 60 62 F2 09 5D Key 023:38 7C DE 92 B6 E3 EF 88 Key 024:38 7C 89 3D 1D CC B7 A0 Key 025:38 7C 93 47 21 D8 46 3F Key 026:38 7C 6A 1E 70 FD D8 45
0000003C 00387CE24F0000 00387D624EFFFF [...] 12:39:32  kps:5.696851920859e+008 00000014         Key 001:38 7D 01 B6 B9 00 36 EF Key 002:38 7D 33 E8 9B 03 14 B2 Key 003:38 7D 1E D3 58 04 2A 86 Key 004:38 7D 1E D3 72 48 33 ED Key 005:38 7D 1F D4 32 B6 05 ED Key 006:38 7C F0 A4 F9 51 D1 1B Key 007:38 7D 3A EF 99 C1 EA 44 Key 008:38 7C E5 99 27 C4 FB E6 Key 009:38 7C F5 A9 88 D9 74 D5 Key 010:38 7D 12 C7 A1 F0 F2 83 Key 011:38 7D 0A BF C6 AC F5 67 Key 012:38 7D 10 C5 3D E4 00 21 Key 013:38 7D 19 CE 2D E3 FE 0E Key 014:38 7D 27 DC 43 FD 8F CF
0000003D 00387D624F0000 00387DE24EFFFF [...] 12:39:36  kps:5.710754169933e+008 00000010         Key 001:38 7D 71 26 63 00 94 F7 Key 002:38 7D A0 55 9B 05 9F 3F Key 003:38 7D A0 55 9E 05 BD 60 Key 004:38 7D A0 55 C4 D1 89 1E Key 005:38 7D A0 55 B3 F9 F2 9E Key 006:38 7D AD 62 A9 F7 B6 56 Key 007:38 7D B8 6D DB F9 03 D7 Key 008:38 7D C6 7B AD 6B 82 9A Key 009:38 7D C6 7B 6B B0 8D A8 Key 010:38 7D D2 87 7B FB EA 60

As for the Current GPU Load, it goes from low to 100%. then drops for a short period and goes back to 100%.
Average GPU Load will require running the search for a longer period.

lFu2uJo.png
z10huro.gif
 

cayoenrique

Senior Member
Messages
478
BADDDDddddd news.

The PC where I have my good AMD GPU broke: its power supply does not work. And as I said, my economic situation is poor. I can do the basics on my laptop, but this means it will be hard for me to really test what can improve the speed. I can suggest but not test. Life sucks.
 

cayoenrique

Senior Member
Messages
478
I need a few good ones among you to join and test, so that I can feel useful.

Last time we introduced our first OCLBiss and discovered that it is slow!!! dvlajkovic's results show:
Code:
0000003B 00387C624F0000 00387CE24EFFFF [...] 12:39:28  kps:5.711649249351e+008 00000026         Key 001:38 7C AE 62 4F 05 3D 91 Key 002:38 7C 8F 43 52 08 13 6D Key 003:38 7C D2 86 7E 08 2E B4 Key 004:38 7C D2 86 57 70 99 60 Key 005:38 7C D3 87 10 76 45 CB Key 006:38 7C B1 65 F1 F3 98 7C Key 007:38 7C 67 1B 2E F3 92 B3 Key 008:38 7C 70 24 3A D0 57 61 Key 009:38 7C 6E 22 C4 E9 C1 6E Key 010:38 7C 80 34 BB 97 2A 7C Key 011:38 7C C2 76 8D DB E1 49 Key 012:38 7C B2 66 98 DC A5 19 Key 013:38 7C B2 66 F1 FD CC BA Key 014:38 7C CA 7E B8 E0 CE 66 Key 015:38 7C B5 69 FD F9 1B 11 Key 016:38 7C B0 64 52 E9 FA 35 Key 017:38 7C BE 72 90 EA FD 77 Key 018:38 7C BF 73 16 FD 5E 71 Key 019:38 7C BD 71 42 C8 78 82 Key 020:38 7C D4 88 B4 EE 2A CC Key 021:38 7C 6C 20 24 EF 94 A7 Key 022:38 7C AC 60 62 F2 09 5D Key 023:38 7C DE 92 B6 E3 EF 88 Key 024:38 7C 89 3D 1D CC B7 A0 Key 025:38 7C 93 47 21 D8 46 3F Key 026:38 7C 6A 1E 70 FD D8 45
0000003C 00387CE24F0000 00387D624EFFFF [...] 12:39:32  kps:5.696851920859e+008 00000014         Key 001:38 7D 01 B6 B9 00 36 EF Key 002:38 7D 33 E8 9B 03 14 B2 Key 003:38 7D 1E D3 58 04 2A 86 Key 004:38 7D 1E D3 72 48 33 ED Key 005:38 7D 1F D4 32 B6 05 ED Key 006:38 7C F0 A4 F9 51 D1 1B Key 007:38 7D 3A EF 99 C1 EA 44 Key 008:38 7C E5 99 27 C4 FB E6 Key 009:38 7C F5 A9 88 D9 74 D5 Key 010:38 7D 12 C7 A1 F0 F2 83 Key 011:38 7D 0A BF C6 AC F5 67 Key 012:38 7D 10 C5 3D E4 00 21 Key 013:38 7D 19 CE 2D E3 FE 0E Key 014:38 7D 27 DC 43 FD 8F CF
0000003D 00387D624F0000 00387DE24EFFFF [...] 12:39:36  kps:5.710754169933e+008 00000010         Key 001:38 7D 71 26 63 00 94 F7 Key 002:38 7D A0 55 9B 05 9F 3F Key 003:38 7D A0 55 9E 05 BD 60 Key 004:38 7D A0 55 C4 D1 89 1E Key 005:38 7D A0 55 B3 F9 F2 9E Key 006:38 7D AD 62 A9 F7 B6 56 Key 007:38 7D B8 6D DB F9 03 D7 Key 008:38 7D C6 7B AD 6B 82 9A Key 009:38 7D C6 7B 6B B0 8D A8 Key 010:38 7D D2 87 7B FB EA 60

He seems to be doing 5.71e+008. For those who wonder, the e+008 is scientific (E) notation.
e+008 == 10^8. So KPS = 5.71 * 10^8, a bit over half a billion keys per second.


Now look at his timing, { 12:39:28 12:39:32 12:39:36 }: one round every 4 seconds. He decided to go with the maximum. I guess his OCLBiss.cfg has
LOOPSPERTHREAD:65536 #128 256 512 1024 2048 4096 8192 16384 32768 65536
But as a reference I want YOU ALL to have a cadence of 1 second. This way your GPU has time to recover and do whatever else it has to do.
And it is simple: you know that your previous attempt takes 1 second per kernel round. If you see it doing more, let's say 2 rounds per second, you know right away that it is faster.
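To pick a LOOPSPERTHREAD value for a given cadence, the arithmetic can be sketched in C as below. It assumes, as the logs in this thread suggest, that keys per kernel equals loops per thread times global threads. Both function names are made up for the example.

```c
#include <stdint.h>

/* Keys tested by one kernel launch: every global thread runs the loop
 * loops_per_thread times (assumption taken from the logged figures). */
uint64_t keys_per_kernel(uint64_t loops_per_thread, uint64_t global_threads)
{
    return loops_per_thread * global_threads;
}

/* Estimated seconds per kernel launch at a measured keys-per-second rate. */
double seconds_per_kernel(uint64_t keys, double kps)
{
    return (double)keys / kps;
}
```

For 65536 loops and 32768 global threads at 5.71e+008 kps this gives about 3.76 seconds per kernel, which matches the 4-second steps in the log. With 16384 loops it drops to about 0.94 seconds.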


OK, where are we going to get IDEAS? We have our master, Colibri. He posted some sources.

Code:
http://colibri.bplaced.net/csa_rainbow_table.htm

In Linux we can do:
Code:
wget http://colibri.bplaced.net/CSA-Rainbow-Table-Tool_V1.03.zip
unzip CSA-Rainbow-Table-Tool_V1.03.zip "Source Code.zip"
unzip -j "Source Code.zip" "CSA-Rainbow-Table-Tool - V1.03/CSA-Rainbow-Table-Tool/cuda/cuda.cu"


I know you are Linux haters, so I put a copy of the sources in the next file.

OCL_TEST_01.zip (846.84 KB)
Code:
https://workupload.com/file/qKmTwVD4qCT

Pass: www.sat-universe.com

1) In it you will find Colibri's "Source Code.zip", and inside that "CSA-Rainbow-Table-Tool - V1.03/CSA-Rainbow-Table-Tool/cuda/cuda.cu".
For now we will use his __device__ void KeySchedule07_Xor(BYTE *ksfull, const BYTE *Cw8)

2) To test new IDEAS, you do not START with OpenCL!! No, it is hard to debug. Instead you test new IDEAS in plain C code. So
you will find csa_decrypt_1block_013; the changes are in csa.c at lines #65 to #133.
At line #65 you find
Code:
#define KS_00
This tells the compiler to use Colibri's new key_schedule.

But if you put two slashes // in front, the line becomes a comment, and then the OLD key_schedule is used. Easy, no? It will look GREY:
//#define KS_00
So test it and see what you get.
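For anyone new to the preprocessor, the switch works roughly like this minimal C sketch. The function key_schedule_name is made up for illustration; the real csa.c selects between two key schedule implementations rather than two strings.

```c
#define KS_00   /* put // in front of this line to select the old path */

/* Report which branch the preprocessor kept. Only one of the two return
 * statements is ever compiled, depending on whether KS_00 is defined. */
const char *key_schedule_name(void)
{
#ifdef KS_00
    return "new";
#else
    return "old";
#endif
}
```

The unused branch is dropped before compilation, so it costs nothing at run time.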

My personal results in a laptop.
Code:
root@mumble:# make
Detected OS = Linux
CFLAGS = -Wall -g -fopenmp -std=gnu11 -DCL_TARGET_OPENCL_VERSION=110 -D__OPENCL_VERSION__=110
INC_DIRS = 
LIB_DIRS = 
LIBS = -lgomp -lOpenCL
CC = gcc
STRIP = strip
RM = rm -f
RRM = rm -f r
srcfiles = ./csa.c ./csa_decrypt_1block.c
objects =  ./csa.o  ./csa_decrypt_1block.o
BIN = csa_decrypt_1block
gcc -Wall -g -fopenmp -std=gnu11 -DCL_TARGET_OPENCL_VERSION=110 -D__OPENCL_VERSION__=110 -c ./csa.c ./csa_decrypt_1block.c
CC	csa.c
g++   ./csa.o  ./csa_decrypt_1block.o   -lgomp -lOpenCL  -o csa_decrypt_1block
strip csa_decrypt_1block
root@mumble:# ./csa_decrypt_1block

encrypted
0000 - 47 41 00 1A 3C EB DC 17 3C 2B D6 4F 65 16 88 F2 
0010 - 58 D5 97 05 


Key				 38 5F 62 F9 4F CA BC D5          8_b.O... 

key schedule, kk[0]..kk[55] = 
 70 F8 7D A8 CC 8D 9D BF A6 6E EF E8 3D 93 AB 6D          p........n..=..m 
 D5 28 25 31 CB 3D 0D 9D 13 EE AC EF 0F ED BE F2          .(%1.=.......... 
 2B 4F A2 7F AF 99 BE 22 13 AB 33 22 E9 EC 42 FE          +O....."..3"..B. 
 3E 59 64 FF 49 CC BA D3


SB[ 0]  = 3C EB DC 17 3C 2B D6 4F	65 16 88 F2 58 D5 97 05

IB[ 0]  = 3C EB DC 17 3C 2B D6 4F 
stream  = 				63 C9 50 DD C4 0A E8 F3 
IB[ 1]  = 				06 DF D8 2F 9C DF 7F F6 
block   = 				06 DF D9 CF 9C DF FF 36 
DB[0] = 				00 00 01 E0 00 00 80 C0 
root@mumble:#

Do not forget: what we are looking for is 00 00 01.
 

cayoenrique

Senior Member
Messages
478
3) And then you get the NEW OCLBiss implementation.
Where do you think the changes will be? Easy: in the OpenCL kernel. The file is called csa_decrypt_1block.cl, and you will see almost the same changes at line #100.
Again, you can go back to the old style just by placing the comment marker // in front, as in //#define KS_00.

4) You can find my results in Enroque_log.txt. I know, a typo. That is life.

My old speed, on my old laptop with a 480-core AMD GPU, was
00000001 00385F624F0000 00385F6266FFFF [...] 21:30:19 kps:1.55e+06 00000001 Key 001:38 5F 62 F9 4F CA BC D5 Total 1:
00000002 00385F62670000 00385F627EFFFF [...] 21:30:21 kps:1.56e+06 00000000 Total 1:

With my new kernel it is
00000001 00385F624F0000 00385F6266FFFF [...] 21:32:20 kps:5.22e+06 00000001 Key 001:38 5F 62 F9 4F CA BC D5 Total 1:
00000002 00385F62670000 00385F627EFFFF [...] 21:32:21 kps:5.41e+06 00000000 Total 1:

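This means that with Colibri's new key_schedule the gain is
( 5.22e+06 - 1.55e+06 ) / 1.55e+06 = 3.67 / 1.55 = 2.37, a gain of about 237%. Put as a ratio, 5.22 / 1.55 = 3.37, so the new kernel is about 3.37 times as fast.

See how, little by little, we can improve the speed.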
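When comparing two kps figures it helps to keep the ratio and the relative gain apart, since they differ by exactly 1. A minimal C sketch, with both function names made up for the example:

```c
/* Ratio new/old: how many times as fast the new kernel is. */
double speed_ratio(double old_kps, double new_kps)
{
    return new_kps / old_kps;
}

/* Relative gain (new - old) / old: the extra speed as a fraction of the old. */
double relative_gain(double old_kps, double new_kps)
{
    return (new_kps - old_kps) / old_kps;
}
```

For 1.55e+06 and 5.22e+06 the ratio is about 3.37 and the relative gain about 2.37.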

Now I need @dvlajkovic to do some tests.

1) Delete the log.txt, if any.
2) Go into OCLBiss.cfg and change your
LOOPSPERTHREAD:65536 #128 256 512 1024 2048 4096 8192 16384 32768 65536
This gave you 1 result every 4 seconds. Too slow. I want it 4 times faster, so I guess you will set it to
LOOPSPERTHREAD:16384 #128 256 512 1024 2048 4096 8192 16384 32768 65536
3) Run the new OCLBiss and let it run at least 20 times. You should see it running about 2-3 times per second. Maybe more.
4) Go to csa_decrypt_1block.cl and comment out line 100 so that it reads //#define KS_00. It should turn grey. Then run. This time the old kernel should update about once per second. Let it run for 20 cycles.

PLEASE post the two results. I can no longer test on a good GPU, so you are now the standard. The idea is to have your two speed values to see how we are improving. I will wait for your results. Anyone can join. Do not worry, you can ask all you want. I know it is a new thing.
 

dvlajkovic

Senior Member
Messages
499
3) Run the new OCLBiss and let it run at least 20 times. You should see it running about 2-3 times per second. Maybe more.
untouched_OCLBiss.exe
Code:
C:\Apps\home\OCLBiss_013_test>untouched_OCLBiss.exe
Reading from OCLBiss.cfg...Done...

Today is Tue Sep 12 10:11:52 2023
Connected to device:NVIDIA Corporation => NVIDIA GeForce RTX 4090
3 warnings generated.

Device Kernel properties:
        number of cores:                                128
        recommended work group size (local threads):    32
        max work group size:                            256


Number Loops per thread:                                16384
Number of keys per thread:                              16384
Local threads:                                          256
Global threads:                                         32768
Keys per kernel:                                        0x20000000(536870912)

                *****Important: To stop program and save current status*******
                                PRESS a LONG [ESC] Key and then say 'Y'

                Scroll Lock LED in KeyBoard will be used to Signal Key Found event.
                        Please ensure it is off before continuing

                File Log from Last Searched present:                            11C16EC6EF80101
                Do you what to continue from here?
                If you type 'N' or 'n' will start from config.ini value:        0000385F624F0000
                Press Enter to Continue (Y)y

 base = 385F66FF0000
 Looking for 0x385F62F94FCABCD5

BruteForcing for:
SB01:           3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05

Range: 0000000020000000

Loop             From             To                kernel Time  Keys per seconds        #Keys Found
00000001 0000385F66FF0000 0000385F86FEFFFF [...] 10:12:24 kps:4.57e+008 00000027          Key 001:38 5F 80 17 E4 C4 85 2D
00000002 0000385F86FF0000 0000385FA6FEFFFF [...] 10:12:24 kps:5.70e+008 00000022          Key 001:38 5F 9C 33 8A C0 9E E8
00000003 0000385FA6FF0000 0000385FC6FEFFFF [...] 10:12:25 kps:5.71e+008 00000011          Key 001:38 5F C6 5D 2C 43 70 DF
00000004 0000385FC6FF0000 0000385FE6FEFFFF [...] 10:12:26 kps:5.70e+008 00000024          Key 001:38 5F DC 73 16 40 AF 05
00000005 0000385FE6FF0000 0000386006FEFFFF [...] 10:12:27 kps:5.71e+008 00000024          Key 001:38 5F F9 90 87 01 32 BA
00000006 0000386006FF0000 0000386026FEFFFF [...] 10:12:28 kps:5.71e+008 00000025          Key 001:38 60 1F B7 DC C0 F1 8D
00000007 0000386026FF0000 0000386046FEFFFF [...] 10:12:29 kps:5.69e+008 00000013          Key 001:38 60 36 CE 70 84 CC C0
00000008 0000386046FF0000 0000386066FEFFFF [...] 10:12:30 kps:5.70e+008 00000026          Key 001:38 60 48 E0 E6 01 DB C2
00000009 0000386066FF0000 0000386086FEFFFF [...] 10:12:31 kps:5.70e+008 00000020          Key 001:38 60 86 1E AA C0 08 72
0000000A 0000386086FF0000 00003860A6FEFFFF [...] 10:12:32 kps:5.71e+008 00000010          Key 001:38 60 91 29 9B 80 1F 3A
0000000B 00003860A6FF0000 00003860C6FEFFFF [...] 10:12:33 kps:5.72e+008 00000025          Key 001:38 60 B3 4B 5E 82 CE AE
0000000C 00003860C6FF0000 00003860E6FEFFFF [...] 10:12:34 kps:5.71e+008 00000029          Key 001:38 60 CB 63 4B 40 C1 4C
0000000D 00003860E6FF0000 0000386106FEFFFF [...] 10:12:35 kps:5.71e+008 00000027          Key 001:38 60 F7 8F 44 06 61 AB
0000000E 0000386106FF0000 0000386126FEFFFF [...] 10:12:36 kps:5.71e+008 00000024          Key 001:38 61 24 BD D0 03 0B DE
0000000F 0000386126FF0000 0000386146FEFFFF [...] 10:12:37 kps:5.71e+008 00000035          Key 001:38 61 31 CA BE C0 29 A7
00000010 0000386146FF0000 0000386166FEFFFF [...] 10:12:38 kps:5.69e+008 00000035          Key 001:38 61 4B E4 2B C0 54 3F
00000011 0000386166FF0000 0000386186FEFFFF [...] 10:12:39 kps:5.71e+008 00000025          Key 001:38 61 6B 04 05 00 0B 10
00000012 0000386186FF0000 00003861A6FEFFFF [...] 10:12:40 kps:5.71e+008 00000019          Key 001:38 61 A6 3F 38 C0 64 5C
00000013 00003861A6FF0000 00003861C6FEFFFF [...] 10:12:41 kps:5.71e+008 00000029          Key 001:38 61 A9 42 00 01 9D 9E
00000014 00003861C6FF0000 00003861E6FEFFFF [...] 10:12:42 kps:5.71e+008 00000019          Key 001:38 61 C7 60 C3 41 0C 10
00000015 00003861E6FF0000 0000386206FEFFFF [...] 10:12:43 kps:5.68e+008 00000017          Key 001:38 61 F2 8B 75 82 51 48
00000016 0000386206FF0000 0000386226FEFFFF [...] 10:12:44 kps:5.67e+008 00000019          Key 001:38 62 09 A3 70 44 12 C6
00000017 0000386226FF0000 0000386246FEFFFF [...] 10:12:45 kps:5.70e+008 00000027          Key 001:38 62 41 DB 5C 83 4E 2D
00000018 0000386246FF0000 0000386266FEFFFF [...] 10:12:45 kps:5.68e+008 00000028          Key 001:38 62 47 E1 0E 00 4F 5D
00000019 0000386266FF0000 0000386286FEFFFF [...] 10:12:46 kps:5.70e+008 00000009          Key 001:38 62 6D 07 64 C2 5A 80
0000001A 0000386286FF0000 00003862A6FEFFFF [...] 10:12:47 kps:5.70e+008 00000024          Key 001:38 62 87 21 2D 80 CD 7A
0000001B 00003862A6FF0000 00003862C6FEFFFF [...] 10:12:48 kps:5.70e+008 00000025          Key 001:38 62 B8 52 70 85 43 38
0000001C 00003862C6FF0000 00003862E6FEFFFF [...] 10:12:49 kps:5.70e+008 00000018          Key 001:38 62 D1 6B 50 81 05 D6
0000001D 00003862E6FF0000 0000386306FEFFFF [...] 10:12:50 kps:5.69e+008 00000031          Key 001:38 63 05 A0 BF 40 36 35
0000001E 0000386306FF0000 0000386326FEFFFF [...] 10:12:51 kps:5.68e+008 00000012          Key 001:38 63 24 BF 08 40 F9 41
0000001F 0000386326FF0000 0000386346FEFFFF [...] 10:12:52 kps:5.69e+008 00000019          Key 001:38 63 3A D5 F9 41 3D 77

4) Go to csa_decrypt_1block.cl and comment out line 100 so that it reads //#define KS_00. It should turn grey. Then run. This time the old kernel should update about once per second. Let it run for 20 cycles.
greyed_OCLBiss.exe
Code:
C:\Apps\home\OCLBiss_013_test>greyed_OCLBiss.exe
Reading from OCLBiss.cfg...Done...

Today is Tue Sep 12 10:29:30 2023
Connected to device:NVIDIA Corporation => NVIDIA GeForce RTX 4090
3 warnings generated.

Device Kernel properties:
        number of cores:                                128
        recommended work group size (local threads):    32
        max work group size:                            256


Number Loops per thread:                                16384
Number of keys per thread:                              16384
Local threads:                                          256
Global threads:                                         32768
Keys per kernel:                                        0x20000000(536870912)

                *****Important: To stop program and save current status*******
                                PRESS a LONG [ESC] Key and then say 'Y'

                Scroll Lock LED in KeyBoard will be used to Signal Key Found event.
                        Please ensure it is off before continuing

                File Log from Last Searched present:                            11C1A086EF80101
                Do you what to continue from here?
                If you type 'N' or 'n' will start from config.ini value:        0000385F624F0000
                Press Enter to Continue (Y)y

 base = 386846FF0000
 Looking for 0x385F62F94FCABCD5

BruteForcing for:
SB01:           3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05

Range: 0000000020000000

Loop             From             To                kernel Time  Keys per seconds        #Keys Found
00000001 0000386846FF0000 0000386866FEFFFF [...] 10:29:38 kps:5.61e+008 00000028          Key 001:38 68 5C FC 65 40 F7 9C
00000002 0000386866FF0000 0000386886FEFFFF [...] 10:29:39 kps:5.68e+008 00000029          Key 001:38 68 7B 1B 73 C0 21 54
00000003 0000386886FF0000 00003868A6FEFFFF [...] 10:29:40 kps:5.69e+008 00000025          Key 001:38 68 8D 2D 3C 81 A1 5E
00000004 00003868A6FF0000 00003868C6FEFFFF [...] 10:29:41 kps:5.68e+008 00000007          Key 001:38 68 B0 50 28 02 69 93
00000005 00003868C6FF0000 00003868E6FEFFFF [...] 10:29:42 kps:5.68e+008 00000022          Key 001:38 68 D8 78 58 41 77 10
00000006 00003868E6FF0000 0000386906FEFFFF [...] 10:29:42 kps:5.68e+008 00000025          Key 001:38 68 ED 8D E8 C0 2C D4
00000007 0000386906FF0000 0000386926FEFFFF [...] 10:29:43 kps:5.68e+008 00000017          Key 001:38 69 24 C5 73 00 3C AF
00000008 0000386926FF0000 0000386946FEFFFF [...] 10:29:44 kps:5.68e+008 00000015          Key 001:38 69 29 CA 36 40 04 7A
00000009 0000386946FF0000 0000386966FEFFFF [...] 10:29:45 kps:5.68e+008 00000016          Key 001:38 69 4D EE 6F 04 A2 15
0000000A 0000386966FF0000 0000386986FEFFFF [...] 10:29:46 kps:5.68e+008 00000024          Key 001:38 69 7F 20 0C 00 16 22
0000000B 0000386986FF0000 00003869A6FEFFFF [...] 10:29:47 kps:5.69e+008 00000019          Key 001:38 69 93 34 59 01 B3 0D
0000000C 00003869A6FF0000 00003869C6FEFFFF [...] 10:29:48 kps:5.68e+008 00000028          Key 001:38 69 AD 4E ED 40 B3 E0
0000000D 00003869C6FF0000 00003869E6FEFFFF [...] 10:29:49 kps:5.68e+008 00000020          Key 001:38 69 C8 69 63 83 A4 8A
0000000E 00003869E6FF0000 0000386A06FEFFFF [...] 10:29:50 kps:5.68e+008 00000013          Key 001:38 69 EB 8C 1E C1 1A F9
0000000F 0000386A06FF0000 0000386A26FEFFFF [...] 10:29:51 kps:5.68e+008 00000025          Key 001:38 6A 12 B4 12 C5 4E 25
00000010 0000386A26FF0000 0000386A46FEFFFF [...] 10:29:52 kps:5.68e+008 00000023          Key 001:38 6A 43 E5 E2 83 3C A1
00000011 0000386A46FF0000 0000386A66FEFFFF [...] 10:29:53 kps:5.68e+008 00000017          Key 001:38 6A 59 FB 96 41 17 EE
00000012 0000386A66FF0000 0000386A86FEFFFF [...] 10:29:54 kps:5.69e+008 00000025          Key 001:38 6A 72 14 18 C0 28 00
00000013 0000386A86FF0000 0000386AA6FEFFFF [...] 10:29:55 kps:5.69e+008 00000026          Key 001:38 6A 94 36 16 C0 6C 42
00000014 0000386AA6FF0000 0000386AC6FEFFFF [...] 10:29:56 kps:5.68e+008 00000032          Key 001:38 6A BA 5C FE C0 86 44
00000015 0000386AC6FF0000 0000386AE6FEFFFF [...] 10:29:57 kps:5.68e+008 00000027          Key 001:38 6A C8 6A 5D 81 0B E9
00000016 0000386AE6FF0000 0000386B06FEFFFF [...] 10:29:58 kps:5.68e+008 00000028          Key 001:38 6A F0 92 DF 03 75 57
00000017 0000386B06FF0000 0000386B26FEFFFF [...] 10:29:59 kps:5.55e+008 00000028          Key 001:38 6B 17 BA F1 80 07 78
00000018 0000386B26FF0000 0000386B46FEFFFF [...] 10:30:00 kps:5.61e+008 00000033          Key 001:38 6B 28 CB 38 83 65 20
00000019 0000386B46FF0000 0000386B66FEFFFF [...] 10:30:01 kps:5.63e+008 00000022          Key 001:38 6B 57 FA 0E 41 6C BB
0000001A 0000386B66FF0000 0000386B86FEFFFF [...] 10:30:02 kps:5.66e+008 00000019          Key 001:38 6B 71 14 B4 44 DB D3
0000001B 0000386B86FF0000 0000386BA6FEFFFF [...] 10:30:03 kps:5.64e+008 00000030          Key 001:38 6B 9D 40 83 C0 12 55
0000001C 0000386BA6FF0000 0000386BC6FEFFFF [...] 10:30:04 kps:5.65e+008 00000034          Key 001:38 6B B7 5A 1D 01 C9 E7
0000001D 0000386BC6FF0000 0000386BE6FEFFFF [...] 10:30:05 kps:5.65e+008 00000041          Key 001:38 6B CD 70 7E C1 5B 9A
0000001E 0000386BE6FF0000 0000386C06FEFFFF [...] 10:30:06 kps:5.66e+008 00000022          Key 001:38 6B EE 91 E6 41 2F 56
0000001F 0000386C06FF0000 0000386C26FEFFFF [...] 10:30:07 kps:5.66e+008 00000036          Key 001:38 6C 1D C1 31 00 D8 09
 

dvlajkovic

Senior Member
Messages
499
No, I just changed the name :giggle:
Of course I have compiled it before the 2nd run, that's how I got 2 files.
Note that before the 2nd run I've placed // at the beginning of line 100
//#define KS_00
but the code that follows it does not become greyed out (see pic below).

N3AaIzF.png


C0der, could you post your results? Do they differ much from one another?
 

cayoenrique

Senior Member
Messages
478
@C0der, there is no need to compile. Why, you may ask? Because the changes are in the kernel, not in the C program.

See main.c. When it runs, it takes care of the so-called building of the kernel from source. This is in fact a compilation of the OpenCL kernel. So every single time you execute main.c, in this case inside OCLBiss.exe, the kernel gets compiled.
You may ask again, why? It is OpenCL's way of guaranteeing that the code is built for the selected GPU.

Now, what seems to be happening is that he did change the //#define KS_00 line, and it automatically changed color even though it was not saved. BUT not SAVED means that OCLBiss.exe will see the same old csa_decrypt_1block.cl and always compile the same .cl.
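Because the kernel is built from the .cl file on every run, what counts is the text saved on disk at that moment. As a minimal sketch of the host side, the plain C helper below reads a kernel file into memory. The name read_kernel_source is made up, and it is only an assumption that OCLBiss loads its kernel in a similar way.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole text file into a NUL-terminated heap buffer.
 * Returns NULL on failure; the caller frees the result.
 * read_kernel_source is a hypothetical helper, not OCLBiss code. */
char *read_kernel_source(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(f); return NULL; }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    buf[got] = '\0';
    if (len_out != NULL)
        *len_out = got;
    /* In an OpenCL host program this buffer would then be handed to
     * clCreateProgramWithSource() and built with clBuildProgram(). */
    return buf;
}
```

An edit that is still sitting unsaved in the editor never reaches this buffer, so the build sees the old source.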

@dvlajkovic I see that by changing Number Loops per thread to 16384 you get a cadence of one round per second on the old kernel. This is what we expected.
But we expected the new one to be around twice as fast, so the cadence should be 2 rounds per second.


Now I am building the kernel with the DVBCSA files, just as seems to have been Colibri's intention. I would imagine that it should speed things up a lot, as the stream cypher uses about a tenth of the registers and fewer loops. We will see. I hope to post that in a few hours.
 

cayoenrique

Senior Member
Messages
478
Changing the name of the program is not needed. The C program is ALWAYS the same.
What is different is csa_decrypt_1block.cl. But every time you edit this ONE file, you need to save it. If you do not save it, then OCLBiss.exe sees the same file.

I will continue working on the new version. Hope to post soon.
 

cayoenrique

Senior Member
Messages
478
OCL_TEST_02.zip (55.06 KB)
Code:
https://workupload.com/file/QQbyQ6fSVb6

OK, this one uses dvbcsa. It is faster.

Now, to avoid commenting the define in and out, I will use OCLBiss.cfg, and you only have to modify
Code:
#PROGRAM_FILE:"csa_decrypt_1block_000.cl"
#PROGRAM_FILE:"csa_decrypt_1block_001.cl"
#PROGRAM_FILE:"csa_decrypt_1block_002.cl"
PROGRAM_FILE:"csa_decrypt_1block_003.cl"

# means commented out. In any case, I believe that if the same variable is set more than once, I guess it will remember the last one!


PLEASE leave a comment on the speed.

On my slow laptop #002 is the fastest, but not as fast as expected!

I tried to unroll some loops but it got slower. But maybe you will see something different.
I guess we need to split the kernel: block saves to global memory, and then we ask stream to XOR its result with the saved global results.
 

dvlajkovic

Senior Member
Messages
499
Changing the name of the program is not needed. The C program is ALWAYS the same.
What is different is csa_decrypt_1block.cl. But every time you edit this ONE file, you need to save it. If you do not save it, then OCLBiss.exe sees the same file.
I did save it. Why it's not as fast as you'd expect, I don't know.
What I know is that no one else is even trying this at home like it's nuclear physics.
Not a programmer, you all said... well, I'm no gynecologist but I can surely take a look.
 

cayoenrique

Senior Member
Messages
478
Code:
What I know is that no one else is even trying this at home like it's nuclear physics.
@dvlajkovic do not worry, soon they will show up. I have done many of these teaching threads and it is always the same. People read but do not comment. Wait until you say it is as fast as CUDABISS, or even just a little faster, and you will see comments. Well, I hope to at least match its speed.

In general, different GPUs, Nvidia or AMD, will not show equal speed improvements. It depends on the GPU IC family, as some have more registers and others have more fast private memory.
Now, something is weird and I hope to find the reason soon. My laptop shows a gain of 2X with 001 and a gain of 5X with 002.

@Window Users
Now, if you are a Windows fanatic and do not want to use the GNU compiler, fine. Just install the Nvidia driver and tools. Install your favorite flavor of Visual C. And make sure you can run OpenCL the Nvidia way. Then you can test these programs. Just do:
Code:
make
OCLBiss.exe
 

moonbase

VIP
Donating Member
Messages
556
@Window Users
Now, if you are a Windows fanatic and do not want to use the GNU compiler, fine. Just install the Nvidia driver and tools. Install your favorite flavor of Visual C. And make sure you can run OpenCL the Nvidia way. Then you can test these programs. Just do:
Code:
make
OCLBiss.exe

Windows fan boy here, I have nvidia drivers installed and a version of Visual C, no other stuff is installed.
I downloaded the OCL_TEST_02.zip and extracted it, what to do now to run the test?

What to do and where to type instructions of
make
OCLBiss.exe
 