eLSB Stegano+ Asymmetric Backdoors in Kleptography ~ Learning To Hack

It's been quite some time since I last posted something stego-oriented, so here it goes. In this one I'm gonna explain a bit of in-depth steganography (LSB shifting is usually considered novice-level, but not the way it's used in this tutorial) and sketchily go over some basic aspects of kleptography and cryptovirology as well. The tutorial is somewhat related to my old MLE one, but it implements more advanced concepts. I had a notion of combining everything into a single unique tool, or rather a virus, but that's too complex for me, so I'll just break the information down into different topics and sum it up in a single project that includes everything listed below. Here's what I'm gonna lay out in it:

[*] — Multiple hybrid layers (concealment)
[*] — Steganography (enhanced LSB shifting)
[*] — Kleptography (asymmetric backdoors)
[*] — Cryptovirology (cryptotrojans)
[*] — Python
[*] — Esoteric language (Brainfuck/INTERCAL)
[*] — Vigenere cipher
[*] — Self-modifying source
[*] — Polymorphism
[*] — 2D models & Spectrums
[*] — Temporal frequency analyses
[*] — Graphical schemes (graphs)
[*] — Datamatrix/QRcodes

Unlike the previous tutorial on this, here we're gonna develop a quintuple layer. The idea is to cover the items listed above, and especially the kleptographic attack, where we're gonna spawn an asymmetric backdoor. It's in fact a monolayer, but I'm gonna divide it into several sub-levels. We're gonna plant our cryptotrojan in a subliminal channel within the last sub-layer; the next layer will then take care of any eavesdropping or detection as a whole, followed by three more for extra security, finishing with a noised multi-pixel image. Here's a scheme of what each of them will represent:

So before we start, lemme just mention that LSB stands for least significant bit (basically the last, rightmost bit in the sequence). I wanted to cover LSB steganography in this tutorial mainly because it's quite often mistaken for quantum steganography or for an image that simply has a high level of noise. I myself had trouble solving a mission on HTS that uses the same layer. The method I'm gonna apply is known as enhanced LSB shifting. The idea here is that each pixel is replaced with a different value, which makes the image totally unrecognizable. It is called enhanced because we've eliminated the high-level bits of every pixel and kept only the last (LSB) one. This is also how we can most often evaluate the layer: look into the structure of the image and follow, let's say, an IDAT of 9 blocks. The last LSB will be either smaller than or equal to the previous bits (rarely equal, in fact), which means that the previous ones have been altered and there's literally no room for the last LSB.
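To make the enhancement step a bit more concrete, here's a minimal Python sketch. It assumes the image has already been loaded as a list of (R, G, B) tuples; the function name `enhance_lsb` and the sample pixels are made up for illustration, and this is just one way of writing it, not the code of any particular tool.

```python
def enhance_lsb(pixels):
    """Set every channel to 255 if its least significant bit is 1, else to 0."""
    return [
        tuple(255 if (channel & 1) else 0 for channel in pixel)
        for pixel in pixels
    ]


# Hypothetical sample pixels, not taken from a real image.
sample = [(254, 13, 128), (1, 2, 3)]
print(enhance_lsb(sample))  # [(0, 255, 0), (255, 0, 255)]
```

Everything except the last bit of each channel is thrown away, so whatever pattern lives in the LSB plane becomes visible as pure black or pure white per channel.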

One of the few techniques that can be used to detect eLSB steganography (and actually differentiate it from quantum steganography) is statistical analysis, the most famous form of which is the chi-square attack. There are a few tools that perform such an analysis. I'm gonna use a Java implementation of one of them (not written by me). The first chi-square graph is the output of the analysis made on the enhanced image and the one below it is the output for the non-enhanced one. For the record, the chi-square is calculated on 24-bit .BMPs. Detecting hidden data this way is known as steganalysis.
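For the curious, here's roughly what the chi-square attack boils down to, as a hedged Python sketch. It assumes the usual "pairs of values" formulation: values 2k and 2k+1 form a pair, and LSB embedding tends to even out the counts inside each pair. The name `pov_chi_square` is mine, and the sketch only returns the statistic and the degrees of freedom. It does not compute the probability curve the Java tool plots.

```python
def pov_chi_square(values):
    """Chi-square statistic over pairs of values (2k, 2k+1) of 8-bit samples."""
    hist = [0] * 256
    for value in values:
        hist[value] += 1

    statistic = 0.0
    categories = 0
    for k in range(128):
        even, odd = hist[2 * k], hist[2 * k + 1]
        expected = (even + odd) / 2  # what we'd expect if LSBs were random
        if expected > 0:
            statistic += (even - expected) ** 2 / expected
            categories += 1
    return statistic, max(categories - 1, 0)


# Balanced pairs give a statistic of 0, all-even values give a larger one.
print(pov_chi_square([0, 1, 2, 3]))  # (0.0, 1)
print(pov_chi_square([0, 0, 2, 2]))  # (2.0, 1)
```

A statistic close to zero means the pairs are suspiciously balanced, which is what a randomly looking embedded message produces.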

Lemme quote the author of the tool as to what the different curves represent:

The program will output a graph with two curves. The first one in red is the result of the chi-square test. If it's close to one, then the probability for a random embedded message is high. So, if there is a random message embedded, this green curve will stay around 0.5. On the graph network, every vertical blue line represents 1 kilobyte of embedded data.

This is a sample representation of how the LSBs are enhanced and set to either 255 or 0. Basically, the noise level depends on how much data we want to hide and, of course, on the size of the image, the color capacity, etc.

Below is a histogram of the image. The blue channel starts at a high value and then settles to a certain level. Whenever something like this shows up in your analysis, you can be nearly certain that there is something hidden. Roughly speaking, in 70%+ of the cases where a histogram analysis gave such results (especially the heightened presence of one of the RGB values), I did find something in the image.
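If you want to build such a histogram yourself instead of relying on a tool, a per-channel count is all it takes. The sketch below assumes the pixels are already available as (R, G, B) tuples; `channel_histograms` is a made-up name and the sample data is hypothetical.

```python
def channel_histograms(pixels):
    """Return three 256-entry lists with the value counts for R, G and B."""
    red, green, blue = [0] * 256, [0] * 256, [0] * 256
    for r, g, b in pixels:
        red[r] += 1
        green[g] += 1
        blue[b] += 1
    return red, green, blue


# Hypothetical pixels where blue is stuck at its maximum value.
red, green, blue = channel_histograms([(0, 0, 255), (0, 1, 255)])
print(red[0], green[0], green[1], blue[255])  # 2 1 1 2
```

Plot the three lists with whatever you like and look for one channel spiking the way the blue one does here.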

Here's a Java chi-square class. I've used this one to produce the output of the steganalysis above. Here is a direct link to the .jar if you just need it for testing and don't want to go through the code.

Code:

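// Excerpt only: pov is an integer array field declared elsewhere in the class.
// It counts how often each value occurs; entries 2i and 2i+1 form one pair.

// Expected count for each pair: the average of its two observed counts.
public double[] getExpected() {
    double[] result = new double[pov.length / 2];
    for (int i = 0; i < result.length; i++) {
        // Divide by 2.0 so that integer counts are not truncated.
        double avg = (pov[2 * i] + pov[2 * i + 1]) / 2.0;
        result[i] = avg;
    }
    return result;
}

// Count one more occurrence of value i.
public void incPov(int i) {
    pov[i]++;
}

// Observed counts of the odd member of each pair.
public long[] getPov() {
    long[] result = new long[pov.length / 2];
    for (int i = 0; i < result.length; i++) {
        result[i] = pov[2 * i + 1];
    }
    return result;
}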

Frankly speaking, there is a lot of software for embedding and extracting data, but none of it is really efficient when it comes to reversing the process. In this case the only possible way to reverse it would be to pull the least significant bit from every pixel channel, group the bits into words of 8 and convert them back to text, but that would only be possible if we had any clue as to which pixels have been altered (which we don't).
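Here's what that reversal looks like in the simplest possible case, as a Python sketch. It assumes the message was written sequentially from the very first channel, most significant bit first, as plain 8-bit characters. That is exactly the knowledge we normally don't have, so treat this as an illustration only. Both function names are hypothetical, and the embed helper is only there so the round trip can be shown.

```python
def embed_lsb_text(pixels, text):
    """Write the bits of text into the LSBs of consecutive channels."""
    bits = [(ord(ch) >> shift) & 1 for ch in text for shift in range(7, -1, -1)]
    flat = [channel for pixel in pixels for channel in pixel]
    if len(bits) > len(flat):
        raise ValueError("cover image too small for the message")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | bit  # clear the LSB, then set it to our bit
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def extract_lsb_text(pixels, num_chars):
    """Pull the LSB of every channel, group by 8 and convert back to text."""
    bits = [channel & 1 for pixel in pixels for channel in pixel]
    chars = []
    for start in range(0, num_chars * 8, 8):
        byte_bits = bits[start:start + 8]
        if len(byte_bits) < 8:
            break  # ran out of pixels
        value = 0
        for bit in byte_bits:
            value = (value << 1) | bit
        chars.append(chr(value))
    return "".join(chars)


# Hypothetical 6-pixel cover: 18 channels, enough for two characters.
cover = [(10, 20, 30)] * 6
stego = embed_lsb_text(cover, "hi")
print(extract_lsb_text(stego, 2))  # hi
```

Once the embedding order is scrambled or only some pixels are used, the extractor above returns garbage, which is the whole problem described in the paragraph.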

But let's say there is some sound or other audio file meddled in. If you're good enough with steganography, you could mix both eLSB and the audio rendering of an image and come up with an incredibly secure layer. Say we have a file called fuke.wav which has been altered and has some data within it. One of the ways to check for anything specific is to put the file under a frequency analysis and see whether there is something worth pursuing. First let's see a temporal analysis alongside a TFFT. The difference is that a plain FFT only tells you which frequencies are present in the signal as a whole, while the TFFT studies both the time and the frequency of the signal (in other words, we need to produce a spectrum over time, a spectrogram, in order to see the temporal frequencies).
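To show the time-versus-frequency idea without any audio library, here's a small pure-Python sketch. It slides a window over the samples and takes a naive DFT of every frame, which is the basic recipe behind a spectrogram. The window and hop sizes, the function names and the synthetic two-tone signal are all assumptions of mine, and a real tool would use a proper FFT and a window function.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each non-negative frequency bin of one frame (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, window=64, hop=32):
    """Slide a window over the signal and take the DFT of every frame."""
    return [
        dft_magnitudes(samples[start:start + window])
        for start in range(0, len(samples) - window + 1, hop)
    ]


def peak_bin(magnitudes):
    """Index of the strongest frequency bin."""
    return max(range(len(magnitudes)), key=magnitudes.__getitem__)


# Hypothetical signal: a low tone in the first half, a higher one in the second.
samples = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
samples += [math.sin(2 * math.pi * 12 * t / 64) for t in range(256)]

frames = spectrogram(samples)
print(peak_bin(frames[0]), peak_bin(frames[-1]))  # 4 12
```

A single DFT over all 512 samples would show both tones at once, with no hint of which came first. The per-frame version tells you when each one is playing.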

If that doesn't suit you, you can use sox on Linux boxes and generate a similar spectrum. I'm feeding sox a .wav file here, which is pretty much the extension that most software worships (support for other formats depends on how your sox was built). Now to output a spectrum we do the following:

sox fuke.wav -n spectrogram

You may have to use a converter like ffmpeg or similar to convert the file if what you previously generated isn't a .wav. And so we end up with the following:

The following two are similar to that spectrogram. The first one uses a dBV^2 scale with a 1024-sample window at roughly 85%, and the second one a linear scale with a 2048-sample window at 90%+ and a log bin. Much easier to read, as you can plainly see. The same would apply if we managed to scale the sox spectrogram and manipulate it as best we can, but I don't think sox offers such possibilities.

Ok, so much for our first layer - four more to go. Now let's get to the next one which is the datamatrix.

Let's start off with a bit of a challenge walkthrough so that we can get to the essence of this layer. Consider the following PHP file:

http://pastebin.com/raw.php?i=vVzem66H

So here we are supposed to have something embedded. First of all this is a normal image file just formatted into PHP, so let's save it as .PNG for instance and see what we come up with.

Using a converter we end up with the following string, which is in fact a polyalphabetic substitution cipher (Vigenere in this case).

dtsfwqutisvqtesymkuvabbujwhfecuvlshwopcyeghguywjvlaibflcacyahckyqvypjntfhihgtvyxeqakjwouldltuiuhbhjumgkxuugqahvwhotduqtahcknheypjetxpvlhxtlrpjagyjzcgijgfjmcupsslkzpuxegaillytlfbygeptzjtuzlvlwkzdznxqwpabbe

There are lots of applications for 1D barcodes, PDF417, datamatrixes, QRs and whatnot. So far this one does the job the best for me.

Let's start off with the source. So far it's not polymorphic, just basic encryption functions:

http://pastebin.com/raw.php?i=GErb6dm9

A little bit of background. Vigenere is a polyalphabetic cipher, meaning that all attacks against such substitutions apply to it as well. I won't go into detail on how to crack a Vigenere since that's quite easy to find on the web. Anyway, one of the key things is to estimate the key length. Below are the steps you need to take to crack a Vigenere manually.

1. Find long repeated sequences

2. Find often repeated sequences

3. Count the distances between the repeats

4. Work out which factors are common among these distances

5. Start by assuming a plausible keyword length and write down every letter that falls on the same keyword position

6. Analyze those letters as if they were a Caesar's Shift

7. The current letter of the keyword will be the first letter of the Caesar's Shift cipher alphabet used

8. Repeat steps 5 to 7 with the next position until you have processed all letters or can make out the keyword
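The first five steps can be sketched in Python like this. It's a rough illustration that assumes lowercase a-z text only; all function names are mine, and the short strings in the demo are toy inputs, not the ciphertext from the challenge.

```python
from collections import Counter


def vigenere(text, key, decrypt=False):
    """Shift each lowercase letter of text by the matching key letter."""
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - ord("a")
        out.append(chr((ord(ch) - ord("a") + sign * shift) % 26 + ord("a")))
    return "".join(out)


def repeat_distances(ciphertext, length=3):
    """Steps 1-3: distances between repeats of every chunk of the given length."""
    last_seen = {}
    distances = []
    for i in range(len(ciphertext) - length + 1):
        chunk = ciphertext[i:i + length]
        if chunk in last_seen:
            distances.append(i - last_seen[chunk])
        last_seen[chunk] = i
    return distances


def candidate_key_lengths(distances, max_length=20):
    """Step 4: count how many distances each possible key length divides."""
    votes = Counter()
    for distance in distances:
        for k in range(2, max_length + 1):
            if distance % k == 0:
                votes[k] += 1
    return votes.most_common()


def columns(ciphertext, key_length):
    """Step 5: group the letters that were encrypted with the same key letter."""
    return [ciphertext[i::key_length] for i in range(key_length)]


print(vigenere("attackatdawn", "lemon"))         # lxfopvefrnhr
print(repeat_distances("abcdefabc"))             # [6]
print(candidate_key_lengths([6, 9])[0])          # (3, 2)
print(columns("abcdef", 2))                      # ['ace', 'bdf']
```

Each string returned by `columns` is then an ordinary Caesar shift, so steps 6 and 7 come down to plain frequency analysis per column.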

The steps above include the Kasiski examination along with other cryptanalytic techniques. I won't be paraphrasing Phizo's words, so here is what he advised me when I sought his help:

Polymorphism is dependent on the key alone, in your example. Notice that the ciphertext will not produce polymorphic results when the key consists of nothing but the same characters. Also notice that the actual key will be visible in the plaintext if the first character is used for the entire key-length. Imagine the decimal value of a character in your plaintext, multiplied by a completely different random decimal value of a character. To make the key of any importance, you're going to want to make it depend on the key (such as the one in the example you've shown), but with changes that will prevent the issues I pointed out earlier. For example, having the key be an operand in the multiplication along with random values, making the key and the resulting ciphertext polymorphic.

On to the esoteric languages. We'll start off with my favourite one, Malbolge. Practically, esoteric languages shouldn't be viewed so much from a programming angle as from a cryptographic one. They are intentionally made convoluted so that they hinder the writing of any code and obfuscate it immensely. Initially, they were invented to see how far programming can be stretched and how many different ways of writing code there really are. Malbolge is considered the hardest to write in simply because it's the most perplexing of them all. Only a handful of programs have been written in it since it appeared in 1998, so yeah. We're not gonna write another one, simply because we won't be able to; the point here is to output the Vigenere (or whatever) ciphertext we need using an esoteric language. So first off, the 'accessories':

Interpreter:

http://pastebin.com/raw.php?i=nR5Ey90r

Assembler --> here

Basic Malbolge generator:
http://www.matthias-ernst.eu/malbolge/stringout.c


Monday, June 16, 2014