

To calculate the first 6-bit part, the program shifts the 24-bit block to the right by 18 bits, so that the desired 6 bits occupy the rightmost 6 bits of the block. You'll see in our example that the last 6 bits of the shifted block become 011100. Running the AND operator between this and the decimal number 63 (which is …111111 in binary) is not necessary this time, because all of the bits in front of the shifted block are 0s.

Now, the second part is more interesting. The program shifts the block right by 12 bits, so the second 6 bits get to the far-right position that we need. But we still have the first 6 bits in the shifted block. How can we discard the first, unneeded 011100 part? The algorithm simply uses the bitwise AND operator to accomplish it. Running AND on the corresponding bits of the shifted block and the number 63 (again, all zeroes, plus 111111) will do just this for us. Think about it: performing AND between anything and 0 will produce 0. On the contrary, the last six 1s of the mask will leave the corresponding bits of the shifted block intact. The same logic applies to the other two parts as well.

After getting the numeric value of each 6-bit part of the block, the program maps that value to a character in the base64chars array. Simply put, our first Base64 number is 28, so it will use the 29th character in the base64chars array, which is c. It is the 29th because indexing starts with 0, not 1, meaning base64chars[0] is A, base64chars[1] will retrieve B, etc. It does the same with all the other characters, adding them to the “base64” string, which will contain our final encoded text.

There are a couple of problems with this. First, the initial use of base64.b64encode() requires a byte string, so prefixing a ‘b’ to the string makes it work.

Second, the algorithm fails to match the Linux base64 program (usually found in /usr/bin/base64). It works for strings whose length is exactly divisible by 3, but fails to match on any other length. For instance, storing the string “abc” in the file named “key” gives the proper result for both: “YWJj”. But running the algorithm on “abcd” gives the incorrect result “YWJjZEFB=” instead of the correct “YWJjZA==”. Likewise, running the algorithm on “abcde” gives “YWJjZGVB=” instead of the correct “YWJjZGU=”. The algorithm pads the input with one or two “A”s, which doesn’t properly zero out the last 8 or 16 bits. Having padded on extra “A”s, it then also adds “=” signs, lengthening the string unnecessarily.

I pretty much used lambda expressions to apply the hash and the encoding to the entire column of Emails. I was having trouble looping the functions over the entire field because of attribute errors and because hashlib was not iterable; the best I could do was operate on one record at a time by indexing which record I wanted to look at.

The next issue I faced was base64 encoding the hash correctly. Checking with online hash generators, I realized that the base64 encoding tools in Alteryx kept interpreting the hash as UTF-8 text when encoding to base64, which produced a result different from what I expected. What I really needed was for the encoder to read it as hex before converting to base64. I couldn't figure out a way to make that work in Alteryx, so I had to outsource the labor to Python. This first code hashes the entire Email column for me:

The 2nd code converts the hash from hex to base64:

df = df.apply(

That gives me the base64 encoded SHA256 I needed.
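The original snippets for this step are cut off, so what follows is only a minimal sketch of the idea, not the author's code. It assumes the goal is a SHA-256 hash of each email that is decoded from hex to raw bytes before base64 encoding, and it uses only the standard library. The function names and the sample addresses are made up for illustration.

```python
import base64
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hex_to_base64(hex_digest: str) -> str:
    """Decode the hex string to raw bytes, then base64-encode those bytes."""
    raw = bytes.fromhex(hex_digest)
    return base64.b64encode(raw).decode("ascii")


def hex_text_to_base64(hex_digest: str) -> str:
    """The mismatch described above: base64 of the hex characters read as UTF-8 text."""
    return base64.b64encode(hex_digest.encode("utf-8")).decode("ascii")


# Hypothetical sample values standing in for the Email column.
emails = ["alice@example.com", "bob@example.com"]
hashes = [sha256_hex(e) for e in emails]
encoded = [hex_to_base64(h) for h in hashes]

for email, value in zip(emails, encoded):
    print(email, value)
```

The two helper functions give different strings for the same hash: decoding the hex first yields 44 characters, while encoding the hex text directly yields 88. With a pandas DataFrame, the same helpers could presumably be applied to a column through a lambda, which matches the approach described above, but the column name and exact call would depend on the data.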

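Returning to the shift-and-mask walkthrough and the padding mismatch discussed in this section, here is a rough sketch of a hand-written encoder that pads with zero bytes rather than the letter “A”. It is not the original algorithm, which is not shown in full here. The name encode_base64 is hypothetical, and BASE64_CHARS stands in for the base64chars array mentioned in the text.

```python
import base64

# Standard Base64 alphabet: index 0 is "A", index 1 is "B", index 28 is "c".
BASE64_CHARS = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64(data: bytes) -> str:
    """Encode bytes to Base64 by hand, three input bytes (24 bits) at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short of a full block
        # Fill the 24-bit block with zero bytes, not with the character "A".
        block = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Shift each 6-bit part to the far right, then mask with 63 (0b111111).
        indices = [(block >> shift) & 63 for shift in (18, 12, 6, 0)]
        chars = [BASE64_CHARS[n] for n in indices]
        # Output characters that come only from padding are replaced by "=".
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


for sample in (b"abc", b"abcd", b"abcde"):
    # Compare the hand-written result with the standard library.
    print(sample, encode_base64(sample), base64.b64encode(sample).decode("ascii"))
```

Masking with 63 on the first part is redundant, as the walkthrough notes, but applying it uniformly keeps the loop simple. Note that base64.b64encode() is called with byte strings here, which is the first problem mentioned above.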