1 Overview
- Step back in time with Sergei Prokudin-Gorskii, a visionary photographer who captured the vast Russian Empire in vibrant color over a century ago! He innovatively captured each scene as three separate grayscale images (red, green, and blue) on glass plates, but no equipment was available to recreate that colorful world until over 40 years later.
- Our mission? To develop algorithms that automatically align and combine the three glass plates, breathing life back into these historical snapshots!
data:image/s3,"s3://crabby-images/3aaa6/3aaa676a27f4e38682886d706fc347b39852410c" alt="Section 1 Image"
2 Naive Way: A First Glimpse
- Cutting and Stacking: The Naive Approach
- We begin by simply dividing each glass plate into three equal vertical sections, representing the red, green, and blue channels.
- Stacking these sections directly results in a misaligned, blurry mess - a far cry from the vivid scenes Prokudin-Gorskii intended to capture...
- Clearly, we need a smarter approach to align these channels!
data:image/s3,"s3://crabby-images/b94ef/b94ef3aa98af56ed91f03a72e0d53c5f8ec14dd7" alt="Section 2 Image"
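The cut-and-stack step can be sketched in a few lines of NumPy. This is only an illustration: the function names are made up for this example, and it assumes the scan is a 2-D array with the blue, green, and red exposures ordered from top to bottom.

```python
import numpy as np

def split_plate(plate):
    """Cut a scanned plate (2-D array) into three equal-height channels."""
    height = plate.shape[0] // 3  # leftover rows at the bottom are dropped
    # Assumption: the exposures run blue, green, red from top to bottom.
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red

def naive_stack(plate):
    """Stack the three sections as an RGB image without any alignment."""
    blue, green, red = split_plate(plate)
    return np.dstack([red, green, blue])
```

Because nothing is shifted before stacking, any offset between the three exposures shows up directly as color fringes in the result.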
3 Starting Small: Alignment on Low-Resolution Images
3.1 Approach: Single-Scale Exhaustive Search
- We start with the smaller `.jpg` images (like `monastery.jpg` and `cathedral.jpg`) to test our alignment algorithms.
- Our initial strategy is an exhaustive search: we systematically shift the green and red channels relative to the blue channel, evaluating the alignment at each step.
- To measure the goodness of alignment, we experiment with two metrics:
- Euclidean Distance (L2 Norm): \(\sum \sum ( A - B )^2 \), where A = Img_1 and B = Img_2 as matrices. This omits the square root of the true L2 norm, which is equivalent here because the square root does not change which displacement gives the minimum.
- Normalized Cross-Correlation (NCC): \(\frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \|\vec{b}\|} \), where a = Img_1 and b = Img_2 flattened into vectors. This is just a dot product between normalized image vectors, used as a similarity score (higher is better).
- Key Implementation Details:
- We crop the images to focus on the central regions, avoiding potential misalignment at the edges.
- We always perform `np.roll` on the original image, not the cropped one; otherwise the pixels that `np.roll` wraps around to the opposite side would show up as obviously misaligned edges.
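The two metrics and the search loop can be sketched as follows. Treat this as one possible arrangement rather than the exact code behind the results: the helper names, the 10% crop, and the default search radius are assumptions made for the example.

```python
import numpy as np

def l2_score(a, b):
    """Sum of squared differences; lower is better."""
    diff = a.astype(float) - b.astype(float)
    return np.sum(diff ** 2)

def ncc_score(a, b):
    """Dot product of the two images scaled to unit length; higher is better."""
    a = a.ravel() / np.linalg.norm(a)
    b = b.ravel() / np.linalg.norm(b)
    return float(np.dot(a, b))

def crop_center(img, frac=0.1):
    """Trim a fraction of the height and width from every side (assumed 10%)."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def align_exhaustive(channel, reference, radius=15, use_ncc=True):
    """Return the (x, y) shift of `channel` that best matches `reference`."""
    ref = crop_center(reference)
    best_shift, best_score = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Roll the full image first, then crop, so wrapped pixels are cut away.
            shifted = crop_center(np.roll(channel, (dy, dx), axis=(0, 1)))
            # Negate L2 so that "larger is better" holds for both metrics.
            score = ncc_score(shifted, ref) if use_ncc else -l2_score(shifted, ref)
            if best_score is None or score > best_score:
                best_shift, best_score = (dx, dy), score
    return best_shift
```

With the blue channel as the reference, `align_exhaustive(green, blue)` and `align_exhaustive(red, blue)` would each return one displacement pair.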
3.2 Result: Promising Beginnings!
- The first row of images uses the L2 metric; the second row uses NCC.
- Coordinates give the displacement as (x, y).
- +x means the image is rolled x pixels from left to right; +y means it is rolled y pixels from top to bottom.
- We achieve near-perfect alignment, laying a solid foundation for tackling the larger, high-resolution plates.
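To make the (x, y) convention concrete, here is a tiny example of applying a reported displacement with `np.roll`. The helper name `apply_shift` is made up for illustration.

```python
import numpy as np

def apply_shift(img, shift):
    """Roll `img` by shift = (x, y): +x moves content right, +y moves it down."""
    x, y = shift
    # axis 0 indexes rows (the y direction), axis 1 indexes columns (x).
    return np.roll(img, (y, x), axis=(0, 1))

img = np.zeros((4, 5), dtype=int)
img[0, 0] = 1                     # mark the top-left pixel
moved = apply_shift(img, (2, 1))  # 2 px to the right, 1 px down
print(np.argwhere(moved == 1))    # prints [[1 2]], i.e. row 1, column 2
```

Under this convention, a reported red displacement of (3, 12) would mean rolling the red channel 3 pixels to the right and 12 pixels down before stacking.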
data:image/s3,"s3://crabby-images/934af/934af306e6115dea84105837da66e31c59e2f759" alt="cathedral_L2"
Cathedral L2
Green Channel: (2, 5)
Red Channel: (3, 12)
Run-time: 0.4s
data:image/s3,"s3://crabby-images/28f67/28f6756c58e3801f4d69496350fd502412c103fd" alt="monastery_L2"
Monastery L2
Green Channel: (2, -3)
Red Channel: (2, 3)
Run-time: 0.4s
data:image/s3,"s3://crabby-images/bf3e2/bf3e2c56edcaa1ed2d69505ba90ba9d79d84e0e5" alt="tobolsk_L2"
Tobolsk L2
Green Channel: (3, 3)
Red Channel: (3, 7)
Run-time: 0.4s
data:image/s3,"s3://crabby-images/b1b82/b1b82ba43f422f1c08c1e2bb30a0671c2de4be9d" alt="cathedral_NCC"
Cathedral NCC
Green Channel: (2, 5)
Red Channel: (3, 12)
Run-time: 0.3s
data:image/s3,"s3://crabby-images/09459/094595a81b920c8cb42e51b64b9aa5094fed2bca" alt="monastery_NCC"
Monastery NCC
Green Channel: (2, -3)
Red Channel: (2, 3)
Run-time: 0.3s
data:image/s3,"s3://crabby-images/398c3/398c3fc6ef0913c69066ac8ba88fc35ef3988712" alt="tobolsk_NCC"
Tobolsk NCC
Green Channel: (3, 3)
Red Channel: (3, 7)
Run-time: 0.3s
4 Going Big: Scaling Up with Image Pyramids
4.1 Approach: Efficient Pyramid Algorithm
- The full-resolution `.tif` images are massive (around 3600x3600 pixels)! A simple exhaustive search would be incredibly slow if we want the search window to be big enough, e.g. [-200, 200].
- Enter the image pyramid: we create a series of downsampled versions of the original image, starting with a very coarse representation (around 400x400 px).
- We align the coarsest images first (using a large search window like [-15, 15] without loss of efficiency), then progressively refine the alignment as we move to finer scales (with a smaller search window like [-3, 3]).
- In this way, we create a total search window of around: \(30\times8 + 30\times4 + 30\times2 + 6\times1 = 426\), which is generally enough for all the images we test here! (30 is the length of [-15, 15], 8/4/2/1 are the scaling factors.)
- Intuitive Analogy: This is much like manually aligning two large images - you start with big, rough adjustments, then fine-tune as you get closer. This is also similar to how the learning rate is auto-adjusted in neural network training.
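The coarse-to-fine idea can be sketched recursively. This is a minimal sketch under stated assumptions, not the exact implementation behind the timings: the function names, the 2x2 block-average downsampling, the 10% crop, and the per-level radii are all choices made for the example.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def crop_center(img, frac=0.1):
    """Trim a fraction of the height and width from every side."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def search(channel, reference, center, radius):
    """Exhaustive search over (x, y) shifts within `radius` of `center`."""
    cx, cy = center
    ref = crop_center(reference)
    best, best_score = center, -np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = crop_center(np.roll(channel, (dy, dx), axis=(0, 1)))
            score = ncc(shifted, ref)
            if score > best_score:
                best, best_score = (dx, dy), score
    return best

def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (one possible choice)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(float)
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def align_pyramid(channel, reference, levels=3, coarse_radius=15, fine_radius=3):
    """Return the (x, y) shift at full resolution using `levels` halvings."""
    if levels == 0:
        # Coarsest level: a wide window is cheap because the image is small.
        return search(channel, reference, (0, 0), coarse_radius)
    # Solve the half-resolution problem first, then double its answer.
    x, y = align_pyramid(downsample(channel), downsample(reference),
                         levels - 1, coarse_radius, fine_radius)
    # Refine around the upscaled estimate with a small window.
    return search(channel, reference, (2 * x, 2 * y), fine_radius)
```

With `levels=3`, a 3600x3600 image is searched first at roughly 450x450, and each finer level only corrects the doubled estimate by a few pixels. Note that a 10% crop only hides wrapped pixels for shifts smaller than the crop margin, so the margin should be chosen with the expected displacement in mind.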
4.2 Result: Near-Perfect Alignment on High-Resolution Images
- All images here use the NCC metric.
- Our pyramid approach successfully aligns even the largest images, bringing Prokudin-Gorskii's scenes to life in stunning detail.
- The one exception is the Emir image: the red channel is clearly misaligned, so we still need to work on that.
data:image/s3,"s3://crabby-images/a4739/a473975b1371b733695fbaf824dda45023563542" alt="church_NCC"
Church
Green Channel: (4, 25)
Red Channel: (-4, 58)
Run-time: 15.3s
data:image/s3,"s3://crabby-images/e716b/e716b00675bf8dce43ba08221d5e983dd89aae2b" alt="harvesters_NCC"
Harvesters
Green Channel: (17, 59)
Red Channel: (15, 123)
Run-time: 15.8s
data:image/s3,"s3://crabby-images/02f92/02f92e1097aa74d9a70ac47f8c355f70ba36d1a7" alt="icon_NCC"
Icon
Green Channel: (18, 41)
Red Channel: (23, 90)
Run-time: 15.9s
data:image/s3,"s3://crabby-images/466ec/466ecc2078d03f121ab22f1ffe0f5a6121c80277" alt="lady_NCC"
Lady
Green Channel: (8, 54)
Red Channel: (11, 115)
Run-time: 15.6s
data:image/s3,"s3://crabby-images/30258/3025802857d7580f449fc34378f39eb7c8e7dbbf" alt="melons_NCC"
Melons
Green Channel: (10, 82)
Red Channel: (13, 179)
Run-time: 16.7s
data:image/s3,"s3://crabby-images/d860a/d860aa857c0cdc7face44eae646aa88e1b53263b" alt="onion_church_NCC"
Onion Church
Green Channel: (27, 51)
Red Channel: (37, 108)
Run-time: 15.9s
data:image/s3,"s3://crabby-images/0e052/0e05238ca5bb1a9ce812861e05be01239607755f" alt="sculpture_NCC"
Sculpture
Green Channel: (-11, 33)
Red Channel: (-27, 140)
Run-time: 16.1s
data:image/s3,"s3://crabby-images/523ac/523acf76d701a5679df7edabfad7cfd3aee363db" alt="self_portrait_NCC"
Self Portrait
Green Channel: (29, 78)
Red Channel: (37, 175)
Run-time: 15.8s
data:image/s3,"s3://crabby-images/0f1ed/0f1ed50c60603fce20f890233fad1d738d883383" alt="three_generations_NCC"
Three Generations
Green Channel: (14, 51)
Red Channel: (12, 110)
Run-time: 15.0s
data:image/s3,"s3://crabby-images/fc2ee/fc2ee92dc945e4b68b86ee7bbdf12af9c6844d8a" alt="train_NCC"
Train
Green Channel: (6, 42)
Red Channel: (32, 86)
Run-time: 16.7s
data:image/s3,"s3://crabby-images/a7689/a7689b1905d8e95774c9b727432dd87c92eccf8a" alt="emir_NCC"
Emir
Green Channel: (24, 48)
Red Channel: (70, 117)
Run-time: 15.6s
5 Bells and Whistles
5.1 Better Features: Gradient Edge Detection
- Because brightness differs across the original glass plates, the L2 and NCC metrics on raw pixel values are no longer effective.
- Therefore, instead of aligning on the raw pixel data, we detect edges, since this feature is less affected by brightness.
- We use a Sobel filter, which convolves the original image with Sobel kernels to generate a gradient image. $$ \text{sobel\_x} = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \quad \text{sobel\_y} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} $$
- Then, we use the same pyramid algorithm to align the gradient images to get the total displacement values.
- Finally, we apply the total displacement values to the original images to get the final aligned images!
data:image/s3,"s3://crabby-images/d4639/d46394de7b0f4caf79cb8ff0aaf1eeafeb8288c7" alt="Edge Detection"
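The edge-detection step can be sketched with plain NumPy using the two kernels above. Combining the two responses into a gradient magnitude is an assumption of this example, as are the function names and the edge-replicated padding.

```python
import numpy as np

SOBEL_X = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
SOBEL_Y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def filter3x3(img, kernel):
    """Slide a 3x3 kernel over `img` with edge-replicated padding.

    This is cross-correlation (no kernel flip). For Sobel kernels a flip only
    changes the sign of the response, so the gradient magnitude is unaffected.
    """
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def gradient_magnitude(img):
    """Combine horizontal and vertical Sobel responses into one edge map."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return np.hypot(gx, gy)
```

Adding a constant brightness offset to an image leaves its edge map unchanged, which is one reason edges are a more robust feature here than raw intensities. The displacement found on the edge maps is then applied to the original channels.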
And, here is the final result!
data:image/s3,"s3://crabby-images/a7689/a7689b1905d8e95774c9b727432dd87c92eccf8a" alt="emir_NCC"
Emir (w/o edge detection)
Green Channel: (24, 48)
Red Channel: (70, 117)
Run-time: 15.6s
data:image/s3,"s3://crabby-images/b81a6/b81a67595b33485830ce77af09666c9cf8d76370" alt="emir_NCC_grad"
Emir (w/ edge detection)
Green Channel: (24, 49)
Red Channel: (41, 107)
Run-time: 17.5s
6 Gallery
- All source images are from the Library of Congress.
- No special tricks applied: just normal cutting, edge-detecting, aligning, stacking and cropping!
- Prokudin-Gorskii was really a genius. Just enjoy!
data:image/s3,"s3://crabby-images/3e5c0/3e5c032efd85067d7b379cffb863ffd97bfad788" alt="church_NCC"
data:image/s3,"s3://crabby-images/f628f/f628fd998ab5dd167db0128d176eebcd56bf5211" alt="1"
data:image/s3,"s3://crabby-images/544ec/544ecd6f13580ce552dba8825f1462bd2bd7c107" alt="2"
data:image/s3,"s3://crabby-images/47a9d/47a9d66d5b936b6273f1000ab3937f77546fb629" alt="3"
data:image/s3,"s3://crabby-images/75b8d/75b8da53c79aacb4a63726329e14a82f833f1af6" alt="1"
data:image/s3,"s3://crabby-images/c50bc/c50bc19fc007e05e246aecd0cb0d47ac169abc41" alt="2"
data:image/s3,"s3://crabby-images/5bf8e/5bf8eb9ec43b9cb02ada20def5c90c78e47d4622" alt="3"
data:image/s3,"s3://crabby-images/0cc0d/0cc0d1542dd9067af191d4372186575ad360f745" alt="1"
data:image/s3,"s3://crabby-images/4ee77/4ee77101baa1d2c86148705ae0c4c5752f8c2fbb" alt="2"
data:image/s3,"s3://crabby-images/644c1/644c1b1f342e6166ac7f8db45257cb47a2be3ce3" alt="3"