Encryption Projects as SU group

sdrfgs

Registered
Messages
111
But these settings should make a difference.
Is "optimize for compute performance" on or off? And is "prefer maximum performance" selected?

Driver versions did make a difference with the original cudabiss51264.exe, especially after using nvclean to remove all the NVIDIA driver bloatware. There were speed benefits to be had.



@Lak7 are you going to post your own results?
 

sdrfgs

Registered
Messages
111
@Lak7 did you try feeding the source code into ChatGPT or any other AI coding engine that could suggest improvements?
 

moonbase

VIP
Donating Member
Messages
583
I am getting some puzzling test results with CudaBISS v12_3_VS22.

Tests with 3080 Ti and 3090 cards are running slower than previous CudaBISS versions on the same GPUs in the same PCs.
Other testers are reporting speed increases with 3060 Ti and 3070 GPUs.

Is there a possibility that the higher number of CUDA cores in the 3080 Ti and 3090 is not being addressed correctly?
CUDA cores per card type are:

3060Ti = 4864
3070 = 5888
3080Ti = 10240
3090 = 10496

There is a huge jump in CUDA core count for the 3080 Ti and 3090, which are the cards running slower than expected with v12_3_VS22.

Also, the same version of CudaBISS does not always run at the same speed if it completes and is then restarted. It is as though there is some legacy impact from the previous test.
This is intermittent; it is not a consistent speed reduction between sessions. It never occurred with versions 102464 and earlier, only with versions later than v102464.
 

moonbase

VIP
Donating Member
Messages
583
Just a guess: do some GPUs lower the clock when the temperature gets higher?

Good point.

However, the GPUs are in open-air frames with lots of ventilation in a cool area. The same GPUs in the same open-air frames run faster on earlier versions of CudaBISS.
The only variable is the version of CudaBISS.
 

moonbase

VIP
Donating Member
Messages
583
Are you still using the GPU overclock? Remember what I said about not overclocking the VRAM.

No overclocking; the GPUs and CPUs are all running at stock speeds.
Everything is identical apart from the version of CudaBISS.

It would be useful if another 3080 Ti/3090/4090 tester could provide some results with the various recent versions of CudaBISS and how they compare to older versions, especially with multiple simultaneous instances.
 

moonbase

VIP
Donating Member
Messages
583
Update:

I think I found the reason: the PSU is possibly on the way out. There is an acrid smell coming from it, so it is probably not powering the cards at full throttle.
I had better swap it out before it blows a GPU or the board.
 

Lak7

Registered
Messages
38
@Lak7 did you try feeding the source code into ChatGPT or any other AI coding engine that could suggest improvements?
I used the NVIDIA profiler. It shows that the memory calls are uncoalesced, and coalescing them could possibly provide big improvements.
I have played with ChatGPT; it only provides vague answers. The source code was originally pulled from a couple of different sources and is not written very well by today's standards.
I think it has a hard time following what the code is doing. I did not write the original code, I just compiled it and learned a few things along the way.
What is another AI coding engine?
 

sdrfgs

Registered
Messages
111
I did - RTX 3070 - 3.8bkps 1 instance, 4+bkps 2 instances (max)
Yes, but what are your PC specs? RAM type, RAM speed, chipset, etc.?
Also, which input.txt did you use? I will test with the same settings.

I think a 3070 only has about 1,000 more cores than a 3060 Ti? Which make and model is the card, and what VRAM size? Some GPUs are slightly factory overclocked.

My speed was around 3100+ bkps with a single copy on a 3060 Ti 8 GB.
 

sdrfgs

Registered
Messages
111
I used the NVIDIA profiler. It shows that the memory calls are uncoalesced, and coalescing them could possibly provide big improvements.
I have played with ChatGPT; it only provides vague answers. The source code was originally pulled from a couple of different sources and is not written very well by today's standards.
I think it has a hard time following what the code is doing. I did not write the original code, I just compiled it and learned a few things along the way.
What is another AI coding engine?
I'll try to find some; there are some listed here



ChatGPT told me this:

Uncoalesced memory calls in your source code can lead to inefficient memory access patterns, which can slow down your program. Here are some strategies to fix this issue:

  1. Memory Coalescing: In CUDA, a coalesced memory transaction is one in which all of the threads in a half-warp access global memory at the same time. The correct way to do it is just have consecutive threads access consecutive memory addresses. For example, if threads 0, 1, 2, and 3 read global memory 0x0, 0x4, 0x8, and 0xc, it should be a coalesced read.
  2. Matrix Access: In a matrix example, keep in mind that you want your matrix to reside linearly in memory. Your memory access should reflect how your matrix is laid out. For example, if you have a 3x4 matrix, you could access it row after row, so that (r,c) maps to memory (r*4 + c).
  3. Profiling: Write your kernel and then profile it to see if you have non-coalesced global loads and stores. You can use tools like the Visual Profiler or nvprof in command line mode.
  4. Avoiding Indirect Access: If your program is suffering from un-coalesced global memory access due to indirect access, you might need to find some patterns of the data to make this a direct sequential access.
Remember, these are general guidelines and the best approach can depend on the specifics of your code and the architecture of your GPU. It’s always a good idea to profile your code to understand its behavior and performance characteristics.
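
To make points 1 and 2 a bit more concrete, here is a toy Python model. It is not CUDA code and it is not taken from the CudaBISS source; it only counts how many memory segments one group of threads touches for a given access pattern. The 128-byte segment size, the 4-byte word size, and the function names are assumptions made for the illustration. It uses a full warp of 32 threads, since as far as I know recent NVIDIA GPUs group accesses per warp rather than per half-warp.

```python
# Toy model (not a profiler): count how many memory segments one warp
# touches when every thread loads a single 4-byte word.
WARP_SIZE = 32        # threads per warp
WORD_BYTES = 4        # size of each load (assumed)
SEGMENT_BYTES = 128   # assumed transaction granularity for this sketch


def segments_touched(stride_words, base_address=0):
    """Distinct segments hit when thread t reads word number t * stride_words."""
    addresses = [base_address + t * stride_words * WORD_BYTES
                 for t in range(WARP_SIZE)]
    return len({addr // SEGMENT_BYTES for addr in addresses})


def row_major_index(r, c, num_cols):
    """Linear offset of element (r, c) in a matrix stored row after row."""
    return r * num_cols + c


# Consecutive threads reading consecutive words share one segment.
coalesced = segments_touched(stride_words=1)
# A stride of 32 words puts every thread in a segment of its own.
strided = segments_touched(stride_words=32)
print("segments for stride 1:", coalesced)
print("segments for stride 32:", strided)

# Point 2: in a 3x4 matrix, element (r, c) lives at offset r*4 + c.
print("offset of (2, 3):", row_major_index(2, 3, num_cols=4))
```

The fewer segments a warp touches per load, the fewer memory transactions the hardware has to issue, which is the effect the profiler's "uncoalesced" warning points at. Whether the CudaBISS kernels can be rearranged this way depends on code that is not shown in this thread.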
 

Lak7

Registered
Messages
38
Yes, but what are your PC specs? RAM type, RAM speed, chipset, etc.?
Also, which input.txt did you use? I will test with the same settings.

I think a 3070 only has about 1,000 more cores than a 3060 Ti? Which make and model is the card, and what VRAM size? Some GPUs are slightly factory overclocked.

My speed was around 3100+ bkps with a single copy on a 3060 Ti 8 GB.
I only run old stuff, usually whatever is left over at work: Intel i5-8xxx, H370 motherboard, and it has RAM.
The RTX 3070 is a Gigabyte; I'd have to see if I still have the box for the make and model.
I don't use a specific input file, just a known winner. For speed, I only look at the first 2 digits. It's not that critical; it's more about whether your card sees any improvement.
As a side note on speeds: I was told not to use the first run of a new release, as it's always a little slower that first time.
 

sdrfgs

Registered
Messages
111
Yes, but if you post the input.txt you use, then it's better for comparing with others,
so we are all testing the same input.

Testing something that takes 10 minutes to run against someone testing a short range that takes 2 minutes or less isn't good for comparing.

Things will average out better on longer runs.

Memory bandwidth does make a difference, as tests have shown, e.g. 2133 MHz RAM vs 3200 MHz RAM.
 

moonbase

VIP
Donating Member
Messages
583
Testing something that takes 10 minutes to run against someone testing a short range that takes 2 minutes or less isn't good for comparing.
Things will average out better on longer runs.


The tests I generally run cover a single first-two-character range.
For example, 110000000000 to 11FFFFFFFFFF checks for a CW across the whole of the 11 range.
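
For anyone comparing run times, the size of that range is easy to work out. The sketch below is plain Python arithmetic on the two hex bounds quoted above; the search rate used at the end is a made-up round number for illustration, not a measured CudaBISS result.

```python
# Size of the search range 110000000000 .. 11FFFFFFFFFF (inclusive).
start = int("110000000000", 16)
end = int("11FFFFFFFFFF", 16)
keys = end - start + 1  # ten free hex digits, i.e. 16**10 candidates


def seconds_to_scan(key_count, keys_per_second):
    """Time to exhaust key_count candidates at a constant search rate."""
    return key_count / keys_per_second


print("keys in range:", keys)

# Hypothetical rate of one billion keys per second, purely for illustration.
hypothetical_rate = 1_000_000_000
print("minutes at the hypothetical rate:",
      round(seconds_to_scan(keys, hypothetical_rate) / 60, 1))
```

Because every first-two-character range has the same number of candidates, full-range runs are directly comparable between testers as long as the search has to cover the whole range.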
 

moonbase

VIP
Donating Member
Messages
583
As a side note on speeds: I was told not to use the first run of a new release, as it's always a little slower that first time.


I have noticed similar, there is a visible lag in the command prompt window before a new release of CudaBISS activates the search.
The second time the same version is run on that PC the lag is absent.

If I am testing a new release I take the results from the 2nd activation onwards.
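
The "ignore the first run" habit can be built into a small timing helper. This is a minimal sketch in Python, assuming the workload can be wrapped in a function; `benchmark` and `run_once` are names invented for the example, and the stand-in workload is just a placeholder because the actual CudaBISS command line is not given in this thread.

```python
import statistics
import time


def benchmark(run_once, repeats=5, warmup=1):
    """Call run_once() warmup + repeats times and time each call.

    The first `warmup` calls are discarded; the median and the list of
    the remaining timings (in seconds) are returned.
    """
    timings = []
    for i in range(warmup + repeats):
        t0 = time.perf_counter()
        run_once()
        elapsed = time.perf_counter() - t0
        if i >= warmup:  # skip the slower first activation(s)
            timings.append(elapsed)
    return statistics.median(timings), timings


# Stand-in workload; a real test would launch the search program here,
# for example through subprocess.run with whatever arguments you use.
def placeholder_workload():
    sum(range(100_000))


median_seconds, kept = benchmark(placeholder_workload, repeats=5, warmup=1)
print("timed runs kept:", len(kept))
print("median seconds:", median_seconds)
```

Reporting the median of several runs also softens the intermittent run-to-run variation mentioned earlier in the thread, and it makes results from different testers easier to compare.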
 

manic01

Super VIP
Messages
2,707
I still have a ways to go of non-rubbish posting before I can use private messaging again. :(

You can make a forum donation and get PM access.

The donation will give you access to the VIP/SVIP team area.
New members will also gain the following.

Forum Donating Member badge shown on your posts and your profile.
Private Messaging. Including attachments.
Editing of your Own Posts & Threads.
Unlimited Downloads.
Signature options not available to new members.
Shoutbox.

And a nice fuzzy feeling: you are helping SU with its running costs to enable the forum to stay online.
 

moonbase

VIP
Donating Member
Messages
583
CudaBISS v12_3_VS22 with an RTX 4090 card runs slower than v12_3_64 on different PC platforms.
My only previous observation of this was on a Z690 board with PCIe 4 capability.

I have since tested the RTX 4090 card using v12_3_64 and v12_3_VS22 on an X99 board with PCIe 3 capability.
Again, v12_3_VS22 was the slower of the two versions.

Conclusion:
The PC platform makes no difference. An RTX 4090 card runs CudaBISS slower with v12_3_VS22 than with v12_3_64.
 