Signatures

Question 1

I'm trying to solve this problem and I'm not sure what to do next.

Link to the problem
Problem statement:
Suppose that some preliminary image preprocessing was already done and you have data in form of coordinates of stars on two pictures. These pictures are about 100x100 millimeters, and coordinates are also given in millimeters relative to their center. Look at the schematic reprsentation below: enter image description here You see that in both pictures stars are shown in roughly circular area (think of it as of apperture of our telescope), and you can find out that they represent the same piece of the sky - thoulgh slightly rotated and slightly shifted.

You also can see that one of the stars (marked with red arrows) have changed its position relative to others.

Your task is to find out such a "wandering star" for it could be comet or asteroid with high probability.

Note that some stars which are close to the edge could be absent from one of pictures (due to shift) - but the "wandering star" is not that far from the center and therefore is presented on both of images.

Input data contains two sections corresponding to two images. Each sequence starts with a single integer - amount of stars listed. Then the coordinates (X and Y) of stars follow.

Answer should give two indexes (0-based) of the wandering star in the first and second section respectively.

The example is the same as pictures above. The star #29 from the first section with coordinates (-18.2, 11.1) is the same as the star #3 from the second section with coordinates (-19.7, 6.9). Example input data:
94 # section 1 contains 94 stars
-47.5 -10.4
19.1 25.9
18.9 -10.4
-2.1 -47.6
...
...
92 # section 2 contains 92 stars
-14.8 10.9
18.8 -0.1
-11.3 5.7
-19.7 6.9
-11.5 -16.7
-45.4 -15.3
6.0 -46.9
-24.1 -26.3
30.2 27.4
...
...

The problem I'm facing
The problem is that the vectors do no match, and don't even have the same size. So for example the first vector in the first section doesn't match the first in the second section, so I can't calculate the rotation matrix based on that. I also tried calculating it based on the centroids of each section, but some points on the edge might be absent, so they will have different centroids(I tried including only the vectors who's length is < 40, the size still doesn't match).

So my question is what should I base my calculations on? How do I find some matching vectors so I can calculate the rotation matrix on them? I need a push in the right direction.

What I did is implement the functions to find the rotation matrix between two given vectors. The formula I'm using:
transformed_vector = R * original_vector + t
where R is the rotation matrix and since the vector also moves along the axis a bit I'm also adding t
Now all I need is two vectors to base my calculations on.

Edit: I should probably mention that I'm actually given two arrays of vectors, one for each image, I'm not actually given the images. I need to find the star that moved based on those vectors.

Thanks!

Question 2

First, for simplicity let's initially pretend that there is no wandering star, and that every star that is in one image is in the other -- that is, that image B has been produced from image A by applying just a (possibly zero) shift and a (possibly zero) rotation.

Signatures

I think the best way to approach this is to "forget about" the position of each star in each image, and instead record for each star only a signature: some information that will not be changed by a shift (translation) or rotation. Then you can look at every signature in image A, and try to match it with a signature in image B. If we make the signatures in a sensible way, then they should do a good job of distinguishing the stars, so that no two stars in the same image get the same or highly similar signatures.

For a given star, something that is not changed by shifting or rotation is its distance to any other star in the same image. So you could use the complete set of distances to all the other n-1 stars as a star's signature (stored as an unordered set, since we don't know "which stars are which"). But it's probably sufficient (and faster, and ultimately more robust) to just use a subset of these, and in that case the distances to its nearest neighbours are probably the most informative (unless you have repeating patterns of stars appearing). How many to use? I would start by calculating the distance of each star to its nearest neighbour in the same image; if in image A, all of these these distances are "different enough" (i.e., have difference below some arbitrary threshold that you choose), and likewise in image B, then you're done -- that is, the signatures that you have calculated already distinguish the stars well enough -- otherwise go back and for each star insert the distance to its second-nearest neighbour into its signature, repeating this until "uniqueness" of signatures is achieved.

Comparing signatures

This brings up the question of how to decide whether two signatures are "different enough" when a signature contains more than one distance. Now you are comparing two sets of numbers instead of just two numbers. One way would be to sort the distances in each signature, and then compare them by calculating some measure of distance between points like the sum of squared differences (e.g., if each signature contains 3 distances, this would be (u[1]-v[1])^2 + (u[2]-v[2])^2 + (u[3]-v[3])^2 for two stars u and v, with u[i] being the distance from star u to its ith-nearest neighbour in the same image, and likewise for v.) On your example dataset, I would guess that probably 3-4 distances per star would be enough.

Robust and efficient signature comparison

However this will turn out to be not particularly robust once wandering and missing stars are considered: Suppose that some star u in image A has the wandering star v as its nearest neighbour, and in image B, v has moved away so that it is now the 5th-nearest neighbour to u. Then if we compare u's signature in image A with its signature in image B, we will get an incorrectly large distance, because we will be "blindly" comparing uA[1] with uB[1] and so on, instead of comparing uA[2] with uB[1], uA[3] with uB[2] and so on as we would like to (since these distances will be equal). So a better way to compare two signatures would be by taking the ordered list of distances of each one and allowing the numbers to "slide along until they fit" the numbers in the other one.

Happily this can be done in linear time with a list merge. Given two sorted lists of numbers X and Y, we perform the usual list merge strategy of choosing the smallest element from either list, removing it from there and writing it out, but we also keep track of two candidate partial solutions: Bm, the score of the best partial solution in which the most recently output number is matched with an earlier number, and Bf, the score of the best partial solution in which it is free (not matched with anything yet). Choose a penalty cost p to assign to numbers that are not matched with any number in the other list, and a scoring function score(a, b) that assigns a value of 0 when a = b and higher values the more different a and b are (e.g., score(a, b) = (a - b)^2). Then the optimal score sigDiff(X, Y) can be computed using:

Set Bm = Bf = 0.
If X and Y are both empty, then halt. The final score is min(Bf + p, Bm).
Call the element at the front of X x, and the element at the front of Y y. Let z be the smaller of x and y, and set whichList to the ID of the list (X or Y) that z came from.
Remove the first element (z) from the list named by whichList.
If whichList == prevWhichList (i.e. if the previous smallest number, prev, was also from the same list as z), or if this is the first iteration, then set newBm = min(Bf + p, Bm) + p.
Otherwise, it's possible to match z with the previous number written out, so set newBm = Bf + score(z, prev).
Set Bf = min(Bf + p, Bm).
Set Bm = newBm.
Set prev = z and prevWhichList = whichList.
Goto 2.

For example, suppose we have the two distance lists (i.e., two signatures) X = [10, 20, 30, 40] and Y = [11, 18, 41, 50], the penalty is 20, and score() is the sum of squared differences as above. The above algorithm will "align" the two sequences as follows:

X        10          20   30   40
matches    \        /            \
Y           11    18              41   50
cost         1     4      20       1   20      Total cost: 46

By contrast, the "naive" sum-of-squares method would give:

X        10   20   30   40
matches   |    |    |    |
Y        11   18   41   50
cost      1    4  121  100                     Total cost: 226

Matching signatures between the two images

We have built all this machinery so that we can try to match stars between the two images by matching their signatures. Arguably the "right" way to do this would be to solve the bipartite maximum weighted matching problem on the graph in which the vertices in one side are the signatures of the stars in image A, the vertices in the other side are the signatures of the stars in image B, and there is an edge from every vertex X in one side to every vertex Y in the other side having weight -sigDiff(X, Y) (negated because we want to minimise the total difference).

However, solving this might take a while, and it's also possible that this algorithm will find some incorrect "approximate" matches in an attempt to get an overall minimum cost. We would prefer to focus on just those pairs of stars which we are certain correspond to one another. That can be done much faster with a heuristic based on the idea that if, for a star X in image A whose closest star in image B (based on sigDiff()) is Y, it turns out that Y's closest star in image A is also X (i.e., if the best match of X's best match is X), then we can be confident that X and Y really match each other. These confident matching pairs can be found quickly, and used to estimate an affine transform from image A to image B using least squares.

Ignoring missing stars

We first compute the convex hull of image B's stars. Then, using the transform determined from the confident star pairs, we transform every star in image A to its estimated corresponding location in image B, and take the convex hull of this transformed image. We then intersect these two convex hulls: every star in either image that falls inside this intersection is a star that we expect to find in both images. Finally we throw away all stars from either image that are outside this intersected convex hull, and start again!

After running everything again (maybe it's not necessary to rerun everything, actually), we should find that there is exactly 1 star in image A and 1 star in image B that have poor similarity to all other stars in the other image (as measured by sigDiff(), as usual). These "two" stars are of course the single "wandering" star :)

Wandering star - codeabbey task

Signatures

Comparing signatures

Robust and efficient signature comparison

Matching signatures between the two images

Ignoring missing stars

Related Q&A

Find delimiter in txt to convert to csv using Python

Assert mocked function called with json string in python

read certificate(.crt) and key(.key) file in python

Admin FileField current url incorrect

Difference between generator expression and generator function

Django performance testing suite thatll report on metrics (db queries etc.)

dev_appserver.py Opens a Text File, Does Not Deploy

How to pass a list from a view to template in django

Probing/sampling/interpolating VTK data using python TVTK or MayaVi

Make Sphinx generate RST class documentation from pydoc