1 Overview
- Step back in time with Sergei Prokudin-Gorskii, a visionary photographer who captured the vast Russian Empire in vibrant color over a century ago! His innovatively captured three separate grey-scale images (red, green, and blue) on glass plates but no equipment was available to recreate that colorful world until over 40 years later.
- Our mission? To develop algorithms that automatically align and combine the three glass plates, breathing life back into these historical snapshots!

2 Naive Way: A First Glimpse
- Cutting and Stacking: The Naive Approach
- We begin by simply dividing each glass plate into three equal vertical sections, representing the red, green, and blue channels.
- Stacking these sections directly results in a misaligned, blurry mess - a far cry from the vivid scenes Prokudin-Gorskii intended to capture...
- Clearly, we need a smarter approach to align these channels!

3 Starting Small: Alignment on Low-Resolution Images
3.1 Approach: Single-Scale Exhaustive Search
- We start with the smaller
.jpg
images (likemonastery.jpg
andcathedral.jpg
) to test our alignment algorithms. - Our initial strategy is an exhaustive search: we systematically shift the green and red channels relative to the blue channel, evaluating the alignment at each step.
- To measure the goodness of alignment, we experiment with two metrics:
- Euclidean Distance (L2 Norm): \(\sum \sum ( A - B )^2 \), where A matrix = Img_1 and B matrix = Img_2. This is an equivalent formula since the optimization result won't change.
- Normalized Cross-Correlation (NCC): \(\frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \|\vec{b}\|} \), where a = Img_1 and b = Img_2. This is just a dot product between normalized image vectors to calculate the similarity.
- Key Implementation Detail:
- We crop the images to focus on the central regions, avoiding potential misalignment at the edges.
- We always perform
np.roll
on the original image, not the cropped one; otherwise, the obvious misaligned edges would appear due to the roll function.
3.2 Result: Promising Beginnings!
- First row of images use L2 metric. Second row uses NCC.
- Coordinates mean displacement (x, y).
- +x means the image is rolled from left to right for x pixels. +y means from top to bottom.
- We achieve near-perfect alignment, laying a solid foundation for tackling the larger, high-resolution plates.

Cathedral L2
Green Channel: (2, 5)
Red Channel: (3, 12)
Run-time: 0.4s

Monastery L2
Green Channel: (2, -3)
Red Channel: (2, 3)
Run-time: 0.4s

Tobolsk L2
Green Channel: (3, 3)
Red Channel: (3, 7)
Run-time: 0.4s

Cathedral NCC
Green Channel: (2, 5)
Red Channel: (3, 12)
Run-time: 0.3s

Monastery NCC
Green Channel: (2, -3)
Red Channel: (2, 3)
Run-time: 0.3s

Tobolsk NCC
Green Channel: (3, 3)
Red Channel: (3, 7)
Run-time: 0.3s
4 Going Big: Scaling Up with Image Pyramids
4.1 Approach: Efficient Pyramid Algorithm
- The full-resolution
.tif
images are massive (around 3600x3600 pixels)! A simple exhaustive search would be incredibly slow if we want the search window to be big enough eg.[-200, 200]. - Enter the image pyramid: we create a series of downsampled versions of the original image, starting with a very coarse representation (around 400x400 px).
- We align the coarsest images first (using a large search window like [-15, 15] without loss of efficiency), then progressively refine the alignment as we move to finer scales (with a smaller search window like [-3, 3])
- In this way, we create a total search window of around: \(30\times8 + 30\times4 + 30\times2 + 6\times1 = 426\), which is generally enough for all the images we test here! (30 is the length of [-15, 15], 8/4/2/1 are the scaling factors.)
- Intuitive Analogy: This is much like manually aligning two large images - you start with big, rough adjustments, then fine-tune as you get closer. This is also similar to how the learning rate is auto-adjusted in neural network training.
4.2 Result: Near-Perfect Alignment on High-Resolution Images
- All images here use NCC metric.
- Our pyramid approach successfully aligns even the largest images, bringing Prokudin-Gorskii's scenes to life in stunning detail.
- Except for the Emir.jpg! Clear red misalignment can be detected. So we still need to work on that.

Church
Green Channel: (4, 25)
Red Channel: (-4, 58)
Run-time: 15.3s

Harvesters
Green Channel: (17, 59)
Red Channel: (15, 123)
Run-time: 15.8s

Icon
Green Channel: (18, 41)
Red Channel: (23, 90)
Run-time: 15.9s

Lady
Green Channel: (8, 54)
Red Channel: (11, 115)
Run-time: 15.6s

Melons
Green Channel: (10, 82)
Red Channel: (13, 179)
Run-time: 16.7s

Onion Church
Green Channel: (27, 51)
Red Channel: (37, 108)
Run-time: 15.9s

Sculpture
Green Channel: (-11, 33)
Red Channel: (-27, 140)
Run-time: 16.1s

Self Portrait
Green Channel: (29, 78)
Red Channel: (37, 175)
Run-time: 15.8s

Three Generations
Green Channel: (14, 51)
Red Channel: (12, 110)
Run-time: 15.0s

Train
Green Channel: (6, 42)
Red Channel: (32, 86)
Run-time: 16.7s

Emir
Green Channel: (24, 48)
Red Channel: (70, 117)
Run-time: 15.6s
5 Bells and Whistles
5.1 Better Features: Gradient Edge Detection
- Due to different brightness in the orginal glass plates, L2 and NCC metrics are not effective anymore.
- Therefore, we choose to not use the raw pixel data for alignment, instead, we choose to detect edges since this feature is less affected by brightness.
- So, we choose to use Sobel Filter that convolve the original image with a sobel kernel to generate a gradient graph. $$ \text{sobel\_x} = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \quad \text{sobel\_y} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} $$ $$ $$
- Then, we use the same pyramid algorithm to align the gradient images to get the total displacement values.
- Finally, we apply the total displacement values to the original images to get the final aligned images!

And, here is the final result!

Emir (w/o edge detection)
Green Channel: (24, 48)
Red Channel: (70, 117)
Run-time: 15.6s

Emir (w/ edge detection)
Green Channel: (24, 49)
Red Channel: (41, 107)
Run-time: 17.5s
6 Gallery
- All source images from Library Of Congress
- No special sricks spplied: just normal cutting, edge-detecting, aligning, stacking and cropping!
- Prudkin was really a genius. Just enjoy!









