Is it possible to effectively parallelise a brute-force attack on 4 different password patterns?

2017-12-31

An answer to this question on Stack Overflow.

Question

In the context of my homework task I need to smartly brute-force a set of passwords. Every password in the set has one of four possible masks:

%%@@

@@%%

@%%@

%@@%

( @ - a numeric character, % - a lowercase alpha character ).

At this point I am doing something like this to run over only one pattern ( the 1st one ) in multithreading:

// Compile: $ gcc test.c -o test -fopenmp -O3 -std=c99
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <omp.h>
int main() {
    const char alp[26] = "abcdefghijklmnopqrstuvwxyz";
    const char num[10] = "0123456789";
    register int i;
    char pass[4];
    #pragma omp parallel for private(pass)
    for (i = 0; i < 67600; i++) {
        pass[3] = num[i        % 10];
        pass[2] = num[i /   10 % 10];
        pass[1] = alp[i /  100 % 26];
        pass[0] = alp[i / 2600 % 26];
        /* Slow password processing here */
    }
    return 0;
}

But, unfortunately, that technique does not extend to searching for passwords with different patterns.

So my question is:
Is there a way to construct an effective set of parallel for instructions in order to run the attack simultaneously on each password pattern?

Help is much appreciated.

Answer

The trick here is to note that all four password options are simply rotations/shifts of each other.

That is, for the example password qr34 and the patterns you mention, you are looking at:

qr34 %%@@ #Original potential password
4qr3 @%%@ #Rotate 1 place right
34qr @@%% #Rotate 2 places right
r34q %@@% #Rotate 3 places right

Given this, you can use the same generation technique as in your question.

For each potential password generated, check the potential password as well as the next three shifts of that password.

Note that the following code relies on short-circuit evaluation in C/C++: the operands of || are evaluated left to right, and evaluation stops as soon as the truth value of the whole expression is known. That is, given the statement if(A || B || C), if A is false, then B must be evaluated; however, if B is true, then C is never evaluated.

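This means that we can have A=PassCheck(pass), B=PassCheck(RotateString(pass,4)), C=PassCheck(RotateString(pass,4)), and a fourth operand of the same form, with the guarantee that the password will only be rotated as many times as necessary. Because RotateString modifies the string in place, each successive operand sees the password rotated one more place.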

Note that this scheme requires that each thread have its own, private copy of the potential password.

//Compile with, e.g.: gcc -O3 temp.c -std=c99 -fopenmp
#include <stdio.h>
#include <unistd.h>
#include <string.h>
int PassCheck(char *pass){
  return strncmp(pass, "4qr3", 4)==0;
}
//Rotate string one character to the right
char* RotateString(char *str, int len){
  char lastchr = str[len-1];
  for(int i=len-1;i>0;i--)
    str[i]=str[i-1];
  str[0] = lastchr;
  return str;
}
int main(){
  const char alph[27] = "abcdefghijklmnopqrstuvwxyz";
  const char num[11]  = "0123456789";
  char goodpass[4]    = "----"; //Provide a default password to indicate an error state
  #pragma omp parallel for collapse(4)
  for(int i = 0; i < 26; i++)
  for(int j = 0; j < 26; j++)
  for(int m = 0; m < 10; m++)
  for(int n = 0; n < 10; n++){
    char pass[4] = {alph[i],alph[j],num[m],num[n]};
    if(
      PassCheck(pass)                 ||
      PassCheck(RotateString(pass,4)) ||
      PassCheck(RotateString(pass,4)) ||
      PassCheck(RotateString(pass,4))
    ){
      //It is good practice to use `critical` here in case two
      //passwords are somehow both valid. This won't arise in
      //your code, but is worth thinking about.
      #pragma omp critical
      {
        memcpy(goodpass, pass, 4);
        //#pragma omp cancel for //Escape for loops!
      }
    }
  }
  printf("Password was '%.4s'.\n",goodpass);
  return 0;
}

I notice that you are generating your password using

pass[3] = num[i        % 10];
pass[2] = num[i /   10 % 10];
pass[1] = alp[i /  100 % 26];
pass[0] = alp[i / 2600 % 26];

This sort of technique is occasionally useful, especially in scientific programming, but usually only for addressing convenience and memory locality.

For instance, an array of arrays where an element is accessed as a[y][x] can be written as a flat-array with elements accessed as a[y*width+x]. This gives a speed gain, but only because the memory is contiguous.

In your case, this indexing does not produce any speed gains, but does make it more difficult to reason about how your program works. I would avoid it for this reason.

It's been said that "premature optimization is the root of all evil". This is especially true of micro-optimizations such as the one you're trying here. The biggest speed gains come from high-level algorithmic decisions, not from fiddly stuff. The -O3 compilation flag does most of what you'll need in terms of making your code fast at this level.

Micro-optimizations assume that doing something convoluted in your high-level code will somehow enable you to out-smart the compiler. This is not a good assumption since the compiler is often quite smart and will be even smarter tomorrow. Your time is very valuable: don't use it on this stuff unless you have a clear justification. (Further discussion of "premature optimization" is here.)