Cropping a crossword puzzle with computer vision
Say you want to extract a crossword puzzle out of the following image:
How would you do it?
Well, let’s start by converting the image to grayscale:
import cv2  # img is assumed to be a BGR image, e.g. one loaded with cv2.imread
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Let’s crop out the header, footer, and side margins to narrow down our region of interest. Why? Because we’re eventually going to run a contour-detection algorithm, and we want to filter out as much noise as possible beforehand:
height, width = gray.shape
roi_y1, roi_y2 = int(height*0.30), int(height*0.85)
roi_x1, roi_x2 = int(width*0.20), int(width*0.85)
gray_roi = gray[roi_y1:roi_y2, roi_x1:roi_x2]
Great! Now let’s binarize the image:
binarized_roi = cv2.adaptiveThreshold(
    gray_roi, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 31, 10
)
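If you're wondering what that call does to each pixel, here's a rough pure-Python sketch of inverted mean adaptive thresholding. The function name and the tiny `page` image are made up for illustration, and the border handling is simplified (the window is just clipped at the edges), so treat it as an approximation of the idea rather than what OpenCV does internally:

```python
def adaptive_mean_threshold_inv(pixels, block_size, c, max_value=255):
    """Illustrative inverted mean adaptive threshold on a list of rows.

    Simplification: the window is clipped at the image border, so edge
    pixels may not match OpenCV's output exactly.
    """
    height, width = len(pixels), len(pixels[0])
    radius = block_size // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gather the neighborhood around (x, y), clipped to the image.
            window = [
                pixels[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            threshold = sum(window) / len(window) - c
            # Inverted: pixels at or below the local threshold turn white.
            row.append(max_value if pixels[y][x] <= threshold else 0)
        out.append(row)
    return out


# A bright page (200) with one dark grid line (40) down the middle column.
page = [[200, 200, 40, 200, 200] for _ in range(5)]
mask = adaptive_mean_threshold_inv(page, block_size=3, c=10)
```

The dark line comes out white and the paper comes out black, which is the polarity the later steps expect. The constant `c` is what keeps flat paper from ending up white: a pixel equal to its local mean is still above the mean minus `c`.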
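Okay! Now let’s run a long, thin horizontal kernel over the image to filter away everything that isn’t a long, thin horizontal line. Erosion wipes out anything shorter than the kernel, and the dilation that follows restores the surviving lines to their original length:
scale = 15  # adjust per image size
horiz_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (gray_roi.shape[1]//scale, 1))
# iterations must be passed by keyword: the third positional argument is dst
horiz = cv2.dilate(cv2.erode(binarized_roi, horiz_kernel, iterations=1), horiz_kernel, iterations=1)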
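To see why erosion followed by dilation keeps only long lines, here's a minimal one-dimensional sketch in pure Python. The helper names (`erode_row`, `dilate_row`, `open_row`) and the sample row are invented for this example, and the window is simply clipped at the row ends, which is an assumption of the sketch:

```python
def erode_row(row, kernel_width):
    """Keep a pixel only if every pixel under the centered kernel is set."""
    left = kernel_width // 2
    right = kernel_width - left - 1
    n = len(row)
    return [
        1 if all(row[j] for j in range(max(0, i - left), min(n, i + right + 1))) else 0
        for i in range(n)
    ]


def dilate_row(row, kernel_width):
    """Set a pixel if any pixel under the centered kernel is set."""
    left = kernel_width // 2
    right = kernel_width - left - 1
    n = len(row)
    return [
        1 if any(row[j] for j in range(max(0, i - left), min(n, i + right + 1))) else 0
        for i in range(n)
    ]


def open_row(row, kernel_width):
    """Erode, then dilate with the same kernel."""
    return dilate_row(erode_row(row, kernel_width), kernel_width)


# Runs of length 2, 7 and 1; only the 7-pixel run is at least as long as the kernel.
row = [0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
opened = open_row(row, kernel_width=5)
```

The two short runs (think letters and specks) disappear, while the long run (think grid line) comes back at its full length.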
Now do the same thing with a long, thin vertical kernel, again eroding and then dilating:
scale = 15  # adjust per image size
vert_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, gray_roi.shape[0]//scale))
# iterations must be passed by keyword: the third positional argument is dst
vert = cv2.dilate(cv2.erode(binarized_roi, vert_kernel, iterations=1), vert_kernel, iterations=1)
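Now let’s add the horizontal and vertical bitmaps back together. Pixels that appear in only one of the two bitmaps end up at roughly half intensity, but they’re still nonzero, which is all the contour detection in the next step needs: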
grid = cv2.addWeighted(horiz, 0.5, vert, 0.5, 0)
Now it’s time for the magic - let’s run a contour-detection algorithm over the bitmap. This basically finds each blob and traces its outer perimeter:
contours, _ = cv2.findContours(grid, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Looks like there are a few contours, including the crossword contour! We need some way to rank them - let’s do it by area and width/height ratio:
candidates = []
for contour in contours:
    x, y, contour_width, contour_height = cv2.boundingRect(contour)
    contour_area = contour_width * contour_height
    width_height_contour_ratio = contour_width / float(contour_height)
    if contour_area > 15000 and 0.8 < width_height_contour_ratio < 1.2:  # large and roughly square
        candidates.append((contour_area, x, y, contour_width, contour_height))
top_candidate = max(candidates)  # tuples compare by area first, so this picks the largest candidate
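The ranking logic is easy to try without any image at all. Here's a small sketch that works on plain `(x, y, width, height)` boxes; the function name `pick_crossword_box` and the sample boxes are hypothetical, and the thresholds just mirror the ones above. It also returns `None` when nothing passes the filter, which is probably what you'd want on a page with no puzzle:

```python
def pick_crossword_box(boxes, min_area=15000, min_ratio=0.8, max_ratio=1.2):
    """Return the largest roughly square box as (x, y, w, h), or None."""
    candidates = []
    for x, y, w, h in boxes:
        area = w * h
        if area > min_area and min_ratio < w / h < max_ratio:
            candidates.append((area, x, y, w, h))
    if not candidates:
        return None
    # Tuples compare element by element, so max() ranks by area first.
    _, x, y, w, h = max(candidates)
    return x, y, w, h


boxes = [
    (10, 10, 800, 30),      # large but far too wide (e.g. a rule under a headline)
    (50, 80, 60, 60),       # square but too small
    (120, 200, 300, 310),   # large and roughly square
    (500, 200, 150, 140),   # passes the filter, but smaller
]
best = pick_crossword_box(boxes)
```

Two boxes pass the filter here, and the larger one wins.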
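Looks good! Finally, let’s use the top candidate to crop the crossword from the original image. The bounding box is relative to the region of interest, so we add the ROI offsets back in:
_, x, y, contour_width, contour_height = top_candidate
crop = img[roi_y1 + y:roi_y1 + y + contour_height, roi_x1 + x:roi_x1 + x + contour_width]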
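One thing that's easy to trip over: the contours were found on the cropped region of interest, so their coordinates are relative to that crop rather than to the full image. Here's a small sketch of the bookkeeping, with a made-up helper name (`roi_box_to_image_slices`) and made-up numbers:

```python
def roi_box_to_image_slices(box, roi_origin):
    """Convert an ROI-relative (x, y, w, h) box into full-image slices.

    roi_origin is the (x, y) of the ROI's top-left corner in the full image.
    """
    x, y, w, h = box
    roi_x1, roi_y1 = roi_origin
    top, left = roi_y1 + y, roi_x1 + x
    return slice(top, top + h), slice(left, left + w)


# Example: the ROI starts at x=160, y=300 in the full image.
rows, cols = roi_box_to_image_slices((120, 200, 300, 310), roi_origin=(160, 300))
# With a NumPy image you could then write: crop = img[rows, cols]
```

Rows come first because image arrays are indexed as `[y, x]`.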
And that’s how you crop a crossword with computer vision!
Credits
Thanks to ChatGPT for coming up with the pipeline and filling in the gaps.
Also thanks to OpenCV for their documentation on Erosion and Dilation - it’s very well written!