Cover Image

Reviving the Past:

Automatic Colorization of Prokudin-Gorskii's Russia

1 Overview

  • Step back in time with Sergei Prokudin-Gorskii, a visionary photographer who captured the vast Russian Empire in vibrant color over a century ago! His innovatively captured three separate grey-scale images (red, green, and blue) on glass plates but no equipment was available to recreate that colorful world until over 40 years later.
  • Our mission? To develop algorithms that automatically align and combine the three glass plates, breathing life back into these historical snapshots!

  

Section 1 Image

2 Naive Way: A First Glimpse

  • Cutting and Stacking: The Naive Approach
  • We begin by simply dividing each glass plate into three equal vertical sections, representing the red, green, and blue channels.
  • Stacking these sections directly results in a misaligned, blurry mess - a far cry from the vivid scenes Prokudin-Gorskii intended to capture...
  • Clearly, we need a smarter approach to align these channels!
Section 2 Image

3 Starting Small: Alignment on Low-Resolution Images

3.1 Approach: Single-Scale Exhaustive Search

  • We start with the smaller .jpg images (like monastery.jpg and cathedral.jpg) to test our alignment algorithms.
  • Our initial strategy is an exhaustive search: we systematically shift the green and red channels relative to the blue channel, evaluating the alignment at each step.
  • To measure the goodness of alignment, we experiment with two metrics:
    • Euclidean Distance (L2 Norm): \(\sum \sum ( A - B )^2 \), where A matrix = Img_1 and B matrix = Img_2. This is an equivalent formula since the optimization result won't change.
    • Normalized Cross-Correlation (NCC): \(\frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \|\vec{b}\|} \), where a = Img_1 and b = Img_2. This is just a dot product between normalized image vectors to calculate the similarity.
  • Key Implementation Detail:
    • We crop the images to focus on the central regions, avoiding potential misalignment at the edges.
    • We always perform np.roll on the original image, not the cropped one; otherwise, the obvious misaligned edges would appear due to the roll function.

3.2 Result: Promising Beginnings!

  • First row of images use L2 metric. Second row uses NCC.
  • Coordinates mean displacement (x, y).
  • +x means the image is rolled from left to right for x pixels. +y means from top to bottom.
  • We achieve near-perfect alignment, laying a solid foundation for tackling the larger, high-resolution plates.
cathedral_L2

Cathedral L2

Green Channel: (2, 5)

Red Channel: (3, 12)

Run-time: 0.4s

monastery_L2

Monastery L2

Green Channel: (2, -3)

Red Channel: (2, 3)

Run-time: 0.4s

tobolsk_L2

Tobolsk L2

Green Channel: (3, 3)

Red Channel: (3, 7)

Run-time: 0.4s

cathedral_NCC

Cathedral NCC

Green Channel: (2, 5)

Red Channel: (3, 12)

Run-time: 0.3s

monastery_NCC

Monastery NCC

Green Channel: (2, -3)

Red Channel: (2, 3)

Run-time: 0.3s

tobolsk_NCC

Tobolsk NCC

Green Channel: (3, 3)

Red Channel: (3, 7)

Run-time: 0.3s


4 Going Big: Scaling Up with Image Pyramids

4.1 Approach: Efficient Pyramid Algorithm

  • The full-resolution .tif images are massive (around 3600x3600 pixels)! A simple exhaustive search would be incredibly slow if we want the search window to be big enough eg.[-200, 200].
  • Enter the image pyramid: we create a series of downsampled versions of the original image, starting with a very coarse representation (around 400x400 px).
  • We align the coarsest images first (using a large search window like [-15, 15] without loss of efficiency), then progressively refine the alignment as we move to finer scales (with a smaller search window like [-3, 3])
  • In this way, we create a total search window of around: \(30\times8 + 30\times4 + 30\times2 + 6\times1 = 426\), which is generally enough for all the images we test here! (30 is the length of [-15, 15], 8/4/2/1 are the scaling factors.)
  • Intuitive Analogy: This is much like manually aligning two large images - you start with big, rough adjustments, then fine-tune as you get closer. This is also similar to how the learning rate is auto-adjusted in neural network training.

4.2 Result: Near-Perfect Alignment on High-Resolution Images

  • All images here use NCC metric.
  • Our pyramid approach successfully aligns even the largest images, bringing Prokudin-Gorskii's scenes to life in stunning detail.
  • Except for the Emir.jpg! Clear red misalignment can be detected. So we still need to work on that.
church_NCC

Church

Green Channel: (4, 25)

Red Channel: (-4, 58)

Run-time: 15.3s

harvesters_NCC

Harvesters

Green Channel: (17, 59)

Red Channel: (15, 123)

Run-time: 15.8s

icon_NCC

Icon

Green Channel: (18, 41)

Red Channel: (23, 90)

Run-time: 15.9s

lady_NCC

Lady

Green Channel: (8, 54)

Red Channel: (11, 115)

Run-time: 15.6s

melons_NCC

Melons

Green Channel: (10, 82)

Red Channel: (13, 179)

Run-time: 16.7s

onion_church_NCC

Onion Church

Green Channel: (27, 51)

Red Channel: (37, 108)

Run-time: 15.9s

sculpture_NCC

Sculpture

Green Channel: (-11, 33)

Red Channel: (-27, 140)

Run-time: 16.1s

self_portrait_NCC

Self Portrait

Green Channel: (29, 78)

Red Channel: (37, 175)

Run-time: 15.8s

three_generations_NCC

Three Generations

Green Channel: (14, 51)

Red Channel: (12, 110)

Run-time: 15.0s

train_NCC

Train

Green Channel: (6, 42)

Red Channel: (32, 86)

Run-time: 16.7s

emir_NCC

Emir

Green Channel: (24, 48)

Red Channel: (70, 117)

Run-time: 15.6s


5 Bells and Whistles

5.1 Better Features: Gradient Edge Detection

  • Due to different brightness in the orginal glass plates, L2 and NCC metrics are not effective anymore.
  • Therefore, we choose to not use the raw pixel data for alignment, instead, we choose to detect edges since this feature is less affected by brightness.
  • So, we choose to use Sobel Filter that convolve the original image with a sobel kernel to generate a gradient graph.
  • $$ \text{sobel\_x} = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \quad \text{sobel\_y} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} $$ $$ $$
  • Then, we use the same pyramid algorithm to align the gradient images to get the total displacement values.
  • Finally, we apply the total displacement values to the original images to get the final aligned images!

Edge Detection

And, here is the final result!

emir_NCC

Emir (w/o edge detection)

Green Channel: (24, 48)

Red Channel: (70, 117)

Run-time: 15.6s

emir_NCC_grad

Emir (w/ edge detection)

Green Channel: (24, 49)

Red Channel: (41, 107)

Run-time: 17.5s


6 Gallery

church_NCC
1
2
3
1
2
3
1
2
3