How to find hidden files inside image files (Jpg/Gif/Png) [closed]

2024/10/13 5:23:06

I came across a link that shows how to hide number of files inside an image file: http://lifehacker.com/282119/hide-files-inside-of-jpeg-images more discussion on detection here: http://ask.metafilter.com/119943/How-to-detect-RARsEXEs-hidden-in-JPGs

I'm trying to find out what is a good way to programmatically detect whether an image file has other files hidden inside it? Should I try unzipping the file to see if other files come out of it?

I'm not bound programmatically but something that works well on the JVM would be great.

Update

One Approach:

Would something like this work (suggested by someone on metafilter)

$ cat orig.jpg test.zip > stacked.jpg
$ file stacked.jpg 
stacked.jpg: JPEG image data, JFIF standard 1.01
$ convert stacked.jpg stripped.jpg  # this is an ImageMagick command
$ ls -l11483 orig.jpg
322399 stacked.jpg11484 stripped.jpg
310916 test.zip

I could use JMagick for this approach.

Answer

Great question!

If all you want to check for is a RAR or ZIP file appended to the end of an image file, then running it through the unrar or unzip command is the easiest way to do it.

If you want a faster but less exact check, you can check for some of the special file format signatures that indicate certain types of files. The usual UNIX tool to identify file format is file. It uses a database of binary file signatures, whose format is defined in the magic(5) man page. It won’t find a RAR file for you at the end of a JPEG, because it only looks at the start of files to try to identify them quickly, but you might be able to modify its source code to do what you want. You could also reuse its database of file signatures. If you look at the archive file part of its database in the Rar files section, it shows this:

# RAR archiver (Greg Roelofs, [email protected])
0   string      Rar!        RAR archive data,

which indicates that if your JPEG file contains the four bytes Rar! that would be suspicious. But you would have to examine the Rar file format spec in detail to check whether more of the Rar file structure is present to avoid false positives—this web page also contains the four bytes Rar! but there are no hidden files attached to it :P

But if someone knows the details of your automated checks, they could easily work around them. The simplest workaround would be to reverse all the bytes of the files before appending them to the JPEG. Then none of your signatures would catch the reversed version of the file.


If someone really wants to hide a file inside an image, there are all sorts of ways to do that that you won’t be able to detect easily. The general term for this is “steganography.” The Wikipedia page, for example, shows a picture of trees that has a picture of a cat hidden inside it. For simpler steganographic methods, there are statistical tests that can indicate something funny has been done to a picture, but if someone spends a lot of time to come up with their own method to hide other files inside images, you won’t be able to detect it.

https://en.xdnf.cn/q/69569.html

Related Q&A

How to open a simple image using streams in Pillow-Python

from PIL import Imageimage = Image.open("image.jpg")file_path = io.BytesIO();image.save(file_path,JPEG);image2 = Image.open(file_path.getvalue());I get this error TypeError: embedded NUL char…

SyntaxError: Non-UTF-8 code starting with \x82 [duplicate]

This question already has answers here:"SyntaxError: Non-ASCII character ..." or "SyntaxError: Non-UTF-8 code starting with ..." trying to use non-ASCII text in a Python script(7 an…

How to identify the CPU core ID of a process on Python multiprocessing?

I am testing Pythons multiprocessing module on a cluster with SLURM. I want to make absolutely sure that each of my tasks are actually running on separate cpu cores as I intend. Due to the many possibi…

Finding highest values in each row in a data frame for python

Id like to find the highest values in each row and return the column header for the value in python. For example, Id like to find the top two in each row:df = A B C D 5 9 8 2 4 …

Using pytest_addoptions in a non-root conftest.py

I have a project that has the following structure: Project/ | +-- src/ | | | +-- proj/ | | | +-- __init__.py | +-- code.py | +-- tests/ | | | +-- __init_…

How to count distinct values in a combination of columns while grouping by in pandas?

I have a pandas data frame. I want to group it by using one combination of columns and count distinct values of another combination of columns.For example I have the following data frame:a b c …

Set Environmental Variables in Python with Popen

I want to set an environmental variable in linux terminal through a python script. I seem to be able to set environmental variables when using os.environ[BLASTDB] = /path/to/directory .However I was in…

Python - pandas - Append Series into Blank DataFrame

Say I have two pandas Series in python:import pandas as pd h = pd.Series([g,4,2,1,1]) g = pd.Series([1,6,5,4,"abc"])I can create a DataFrame with just h and then append g to it:df = pd.DataFr…

How to retrieve values from a function run in parallel processes?

The Multiprocessing module is quite confusing for python beginners specially for those who have just migrated from MATLAB and are made lazy with its parallel computing toolbox. I have the following fun…

SignalR Alternative for Python

What would be an alternative for SignalR in Python world?To be precise, I am using tornado with python 2.7.6 on Windows 8; and I found sockjs-tornado (Python noob; sorry for any inconveniences). But s…