1 Overview

Step back in time with Sergei Prokudin-Gorskii, a visionary photographer who captured the vast Russian Empire in vibrant color over a century ago! His innovatively captured three separate grey-scale images (red, green, and blue) on glass plates but no equipment was available to recreate that colorful world until over 40 years later.
Our mission? To develop algorithms that automatically align and combine the three glass plates, breathing life back into these historical snapshots!

2 Naive Way: A First Glimpse

Cutting and Stacking: The Naive Approach
We begin by simply dividing each glass plate into three equal vertical sections, representing the red, green, and blue channels.
Stacking these sections directly results in a misaligned, blurry mess - a far cry from the vivid scenes Prokudin-Gorskii intended to capture...
Clearly, we need a smarter approach to align these channels!

3 Starting Small: Alignment on Low-Resolution Images

3.1 Approach: Single-Scale Exhaustive Search

We start with the smaller .jpg images (like monastery.jpg and cathedral.jpg) to test our alignment algorithms.
Our initial strategy is an exhaustive search: we systematically shift the green and red channels relative to the blue channel, evaluating the alignment at each step.
To measure the goodness of alignment, we experiment with two metrics:
- Euclidean Distance (L2 Norm): \(\sum \sum ( A - B )^2 \), where A matrix = Img_1 and B matrix = Img_2. This is an equivalent formula since the optimization result won't change.
- Normalized Cross-Correlation (NCC): \(\frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \|\vec{b}\|} \), where a = Img_1 and b = Img_2. This is just a dot product between normalized image vectors to calculate the similarity.
Key Implementation Detail:
- We crop the images to focus on the central regions, avoiding potential misalignment at the edges.
- We always perform np.roll on the original image, not the cropped one; otherwise, the obvious misaligned edges would appear due to the roll function.

3.2 Result: Promising Beginnings!

First row of images use L2 metric. Second row uses NCC.
Coordinates mean displacement (x, y).
+x means the image is rolled from left to right for x pixels. +y means from top to bottom.
We achieve near-perfect alignment, laying a solid foundation for tackling the larger, high-resolution plates.

Cathedral L2

Green Channel: (2, 5)

Red Channel: (3, 12)

Run-time: 0.4s

Monastery L2

Green Channel: (2, -3)

Red Channel: (2, 3)

Run-time: 0.4s

Tobolsk L2

Green Channel: (3, 3)

Red Channel: (3, 7)

Run-time: 0.4s

Cathedral NCC

Green Channel: (2, 5)

Red Channel: (3, 12)

Run-time: 0.3s

Monastery NCC

Green Channel: (2, -3)

Red Channel: (2, 3)

Run-time: 0.3s

Tobolsk NCC

Green Channel: (3, 3)

Red Channel: (3, 7)

Run-time: 0.3s

4 Going Big: Scaling Up with Image Pyramids

4.1 Approach: Efficient Pyramid Algorithm

The full-resolution .tif images are massive (around 3600x3600 pixels)! A simple exhaustive search would be incredibly slow if we want the search window to be big enough eg.[-200, 200].
Enter the image pyramid: we create a series of downsampled versions of the original image, starting with a very coarse representation (around 400x400 px).
We align the coarsest images first (using a large search window like [-15, 15] without loss of efficiency), then progressively refine the alignment as we move to finer scales (with a smaller search window like [-3, 3])
In this way, we create a total search window of around: \(30\times8 + 30\times4 + 30\times2 + 6\times1 = 426\), which is generally enough for all the images we test here! (30 is the length of [-15, 15], 8/4/2/1 are the scaling factors.)
Intuitive Analogy: This is much like manually aligning two large images - you start with big, rough adjustments, then fine-tune as you get closer. This is also similar to how the learning rate is auto-adjusted in neural network training.

4.2 Result: Near-Perfect Alignment on High-Resolution Images

All images here use NCC metric.
Our pyramid approach successfully aligns even the largest images, bringing Prokudin-Gorskii's scenes to life in stunning detail.
Except for the Emir.jpg! Clear red misalignment can be detected. So we still need to work on that.

Church

Green Channel: (4, 25)

Red Channel: (-4, 58)

Run-time: 15.3s

Harvesters

Green Channel: (17, 59)

Red Channel: (15, 123)

Run-time: 15.8s

Icon

Green Channel: (18, 41)

Red Channel: (23, 90)

Run-time: 15.9s

Lady

Green Channel: (8, 54)

Red Channel: (11, 115)

Run-time: 15.6s

Melons

Green Channel: (10, 82)

Red Channel: (13, 179)

Run-time: 16.7s

Onion Church

Green Channel: (27, 51)

Red Channel: (37, 108)

Run-time: 15.9s

Sculpture

Green Channel: (-11, 33)

Red Channel: (-27, 140)

Run-time: 16.1s

Self Portrait

Green Channel: (29, 78)

Red Channel: (37, 175)

Run-time: 15.8s

Three Generations

Green Channel: (14, 51)

Red Channel: (12, 110)

Run-time: 15.0s

Train

Green Channel: (6, 42)

Red Channel: (32, 86)

Run-time: 16.7s

Emir

Green Channel: (24, 48)

Red Channel: (70, 117)

Run-time: 15.6s

5 Bells and Whistles

5.1 Better Features: Gradient Edge Detection

Due to different brightness in the orginal glass plates, L2 and NCC metrics are not effective anymore.
Therefore, we choose to not use the raw pixel data for alignment, instead, we choose to detect edges since this feature is less affected by brightness.
So, we choose to use Sobel Filter that convolve the original image with a sobel kernel to generate a gradient graph.
Then, we use the same pyramid algorithm to align the gradient images to get the total displacement values.
Finally, we apply the total displacement values to the original images to get the final aligned images!

And, here is the final result!

Emir (w/o edge detection)

Green Channel: (24, 48)

Red Channel: (70, 117)

Run-time: 15.6s

Emir (w/ edge detection)

Green Channel: (24, 49)

Red Channel: (41, 107)

Run-time: 17.5s

Reviving the Past:

Automatic Colorization of Prokudin-Gorskii's Russia

1 Overview

2 Naive Way: A First Glimpse

3 Starting Small: Alignment on Low-Resolution Images

3.1 Approach: Single-Scale Exhaustive Search

3.2 Result: Promising Beginnings!

4 Going Big: Scaling Up with Image Pyramids

4.1 Approach: Efficient Pyramid Algorithm

4.2 Result: Near-Perfect Alignment on High-Resolution Images

5 Bells and Whistles

5.1 Better Features: Gradient Edge Detection

6 Gallery