#11




First and most important: you do not have to apologize.
As I said, the only smart guy here is you, the one asking. Those who do not ask will never find the answer to what they do not know. Now, I did not go into basic C just for you, but for the many who may be reading silently in the background. This is the reason I split this part off into a different thread. Now, I am not sure I have answered your question: Quote:
The original key1 has 8 bytes, and each byte has 8 bits. This means 8 * 8 = 64 bits in total in the original key. The table is a way to represent how we are going to move each one of those bits. As you saw, we need 56 key bytes in total. The last 8 are the original key bytes, kk[48] to kk[55]. So in fact we only need 48 new ones ( 56 - 8 = 48 ). How do we get them? We use the table to know where each bit position moves. This is done 48 times to get kk[0] to kk[47]. So the table is just a way to move each of the 64 bits: it provides a new position for each bit, thus generating a new byte.
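To make the idea of "a table that gives each bit a new position" concrete, here is a minimal C sketch. Please note the assumptions: the table built by make_demo_table() is a made-up permutation for illustration only, NOT the real CSA table, and the names permute_bits() and expand_key() are mine, not taken from the attached code. It only shows the mechanism: keep the original 8 bytes in kk[48] to kk[55] and derive the other 48 bytes by moving bits around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-entry permutation, for illustration only.
   This is NOT the real CSA table. Because 5 is coprime with 64,
   every destination position 0..63 is used exactly once. */
void make_demo_table(uint8_t table[64]) {
    for (int i = 0; i < 64; i++)
        table[i] = (uint8_t)((i * 5 + 3) % 64);
}

/* Move bit i of the 8-byte input to bit table[i] of the 8-byte output.
   The input and output buffers must not overlap. */
void permute_bits(const uint8_t in[8], uint8_t out[8], const uint8_t table[64]) {
    for (int b = 0; b < 8; b++)
        out[b] = 0;
    for (int i = 0; i < 64; i++) {
        unsigned bit = (in[i / 8] >> (i % 8)) & 1u;
        unsigned dst = table[i];
        assert(dst < 64);                       /* table must hold valid bit positions */
        out[dst / 8] |= (uint8_t)(bit << (dst % 8));
    }
}

/* Fill a 56-byte schedule: the last 8 bytes are the original key,
   each earlier 8-byte block is the permuted copy of the block after it. */
void expand_key(const uint8_t key[8], uint8_t kk[56], const uint8_t table[64]) {
    for (int b = 0; b < 8; b++)
        kk[48 + b] = key[b];
    for (int block = 5; block >= 0; block--)
        permute_bits(&kk[(block + 1) * 8], &kk[block * 8], table);
}
```

To try it, build the table with make_demo_table(), then call expand_key(key, kk, table) with an 8-byte key and a 56-byte kk buffer. Since the table only moves bits around, every 8-byte block of kk has the same number of set bits as the original key.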
#12




Now I want you to realize what the developer had in mind when he added this simple 64-entry table.
Let's take colibri's subroutine and estimate how many PU operations and PU cycles it requires. There are 48 operations like this one. Code:
ksfull[0] = 0 ^ ( ( eq1 )  ( eq2 )  ( eq3 )  ( eq4 )  ( eq5 )  ( eq6 )  ( eq7 )  ( eq8 ) );
Each eq term has the following form. Code:
((cw[x] & bit) << pos)
& bit => 1 PU op = 1 cycle
<< pos => 1 PU op = 1 cycle
So each term requires 12 cycles (the two ops above plus, presumably, about 10 cycles to read cw[x] from local memory). Because there are 8 terms per line, the total is 8 * 12 = 96 cycles.
The 8 terms are joined by 7 operators => 7 more cycles.
0 ^ => We cannot forget this hidden op, 1 more cycle.
ksfull[0] = => Last, we need to store the result in the array, in local memory. So 10 more cycles. (Note: storing may require more cycles than reading, normally twice as many. Let's ignore that.)

So what is the total wait time per line?
Total cycles = 96 + 7 + 1 + 10 = 114 cycles per byte.
To make it simple, let's say every line costs the same. So 56 bytes requiring 114 cycles each give a grand total = 56 * 114 = 6384 cycles.

See, when the developer included that one table, it not only prevented a smart math boy from using equations. The developer also added a wasted time per PU of 6384 cycles if we try to use local memory.

What will happen if we use the original subroutine instead? Let's ignore the other operations and concentrate on just reading and writing.
56 * ( read from global memory + write to global memory )
56 * ( 100 cycles + 100 cycles )
Total = 11200 cycles.

And let me remind you that, because of randomness, we can imagine that at least 1 of every 10 accesses will try to read the same memory location, because it needs the same value. This means 63 PUs wait while 1 reads.
11200 / 10 * 63 = 70560 cycles. A lot!!
But because we are only imagining it, let's say it is only 1/10 of that, just to be safe.
11200 / 10 * 63 / 10 = 7056 cycles.
So the total time spent waiting on global memory is 11200 + 7056 = 18256 cycles.

These 18256 cycles do not include the operations needed to compute the values. You know, those loops and sums that you see in my original subroutine. See, just by adding a table the developer took on the GPU and slowed it down by about 20,000 cycles if we do use that table... makes you think, uuhuuu!!!

Now, in time this is not much. Let's use a moderate GPU: GTX 1070, 1920 PU cores @ 1506 MHz.
20000 cycles / 1920 PU / 1506 cycles per microsecond per PU = 0.00691 microseconds.
Yes, as you see, 20000 is big, but in time it is only about 1/150 of a microsecond for the GPU.

I know, more confusion. Sorry. Here is the important thing to remember:
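For anyone who wants to re-run the numbers, here is a small C sketch of the cycle estimate. Everything in it is the rough cost model from this post (12 cycles per term, 10 cycles for a local store, 1 in 10 of the "63 PUs wait" case), not measured hardware data, and the function names are only mine. The global memory cost is a parameter: with 100 cycles per access it reproduces the 11200 and 18256 totals, and with 1000 cycles per access every global figure scales by 10. The time helper divides by the clock in MHz, which is cycles per microsecond, so its result is in microseconds.

```c
#include <assert.h>

/* Rough per-operation costs from the estimate in this post.
   These are assumptions of the cost model, not measured values. */
enum {
    TERMS_PER_LINE  = 8,   /* eq1..eq8 in one ksfull[] line            */
    CYCLES_PER_TERM = 12,  /* one ((cw[x] & bit) << pos) term          */
    JOIN_OPS        = 7,   /* operators joining the 8 terms            */
    HIDDEN_XOR      = 1,   /* the leading "0 ^"                        */
    LOCAL_STORE     = 10,  /* writing ksfull[n] to local memory        */
    SCHEDULE_BYTES  = 56   /* lines counted in the estimate            */
};

/* Cycles for one line when everything stays in local memory. */
unsigned cycles_per_line(void) {
    return TERMS_PER_LINE * CYCLES_PER_TERM + JOIN_OPS + HIDDEN_XOR + LOCAL_STORE;
}

/* Cycles for the whole schedule in local memory. */
unsigned local_total(void) {
    return SCHEDULE_BYTES * cycles_per_line();
}

/* Cycles for the table version: one read and one write to global memory
   per byte, plus the assumed contention (1 in 10 accesses makes 63 PUs
   wait, then scaled down by another factor of 10 to stay conservative). */
unsigned global_total(unsigned access_cycles) {
    unsigned base = SCHEDULE_BYTES * (access_cycles + access_cycles);
    unsigned contention = base / 10 * 63 / 10;
    return base + contention;
}

/* Cycles spread over all cores, converted to microseconds.
   clock_mhz is cycles per microsecond. */
double amortized_us(double cycles, unsigned cores, double clock_mhz) {
    assert(cores > 0 && clock_mhz > 0.0);
    return cycles / cores / clock_mhz;
}
```

For example, amortized_us(20000, 1920, 1506.0) gives the GTX 1070 figure used in this post, and global_total(1000) shows what happens with a slower global memory assumption.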
Developers of encryption will use substitution tables as much as they can, as it will slow us hackers down in our attempt to brute-force.
Last edited by cayoenrique; 15-01-2022 at 16:52.
#13




2nd: CSA Stream Cypher
As you see, I am splitting everything up so that the newbie can look at and understand only the new code. At the end I will provide the full CSA_Core program. See attachment.
#14




OK, here is the Block Cypher. Clearly it also has the Key Schedule, as we need it.


