A Complete Python Tutorial on the EAST (Efficient and Accurate Scene Text Detector) Text Detection Algorithm

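📌 Tutorial Goals

Load the pre-trained EAST model (frozen_east_text_detection.pb) through OpenCV's dnn module
Load an image and run text region detection on it
Draw a rectangle around each detected text region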

✅ Environment Requirements

Component	Description
Python 3.x	3.8 or later recommended
OpenCV	Install `opencv-contrib-python`
Model file	`frozen_east_text_detection.pb`

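Install the required packages

Install only one OpenCV wheel: `opencv-python` and `opencv-contrib-python` both provide the `cv2` module and should not be installed side by side.

pip install opencv-contrib-python numpy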
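
To confirm that both packages can be imported before going further, a small check like the one below may help. This is a minimal sketch using only the standard library; the helper name `check_packages` is hypothetical and not part of the original tutorial.

```python
import importlib.util


def check_packages(names=("cv2", "numpy")):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'OK' if found else 'missing'}")
```

If `cv2` is reported as missing, rerun the pip command in the same Python environment you use to run the script.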

📁 Download the EAST Model File

You can download it with the following command:

wget https://github.com/oyyd/frozen_east_text_detection.pb/raw/master/frozen_east_text_detection.pb

Alternatively, download it manually from GitHub - EAST pretrained model.
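
If `wget` is not available (for example on Windows), the same download can be sketched in Python. The URL is the one from the command above; the function name `ensure_model` is hypothetical, and the sketch assumes that any non-empty file at the target path is a valid copy of the model.

```python
import os
import urllib.request

# URL taken from the wget command in this tutorial.
MODEL_URL = ("https://github.com/oyyd/frozen_east_text_detection.pb"
             "/raw/master/frozen_east_text_detection.pb")


def ensure_model(path="frozen_east_text_detection.pb", url=MODEL_URL):
    """Download the model to `path` unless a non-empty file is already there."""
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return path  # already present, skip the download
    urllib.request.urlretrieve(url, path)
    return path
```

Calling `ensure_model()` once before `cv2.dnn.readNet(...)` would avoid downloading the file again on every run.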

🧪 Python Implementation

Save the following as east_text_detect.py:

import cv2
import numpy as np

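# Load the pre-trained EAST model (a TensorFlow frozen graph)
net = cv2.dnn.readNet("frozen_east_text_detection.pb")

# Load the image; cv2.imread returns None if the file cannot be read
image = cv2.imread("test.webp")
if image is None:
    raise FileNotFoundError("Could not read test.webp")
orig = image.copy()
(H, W) = image.shape[:2]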

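# Network input size (width and height must both be multiples of 32)
newW, newH = (320, 320)
# Scale factors used later to map boxes back onto the original image
rW = W / float(newW)
rH = H / float(newH)
resized = cv2.resize(image, (newW, newH))
# Subtract the per-channel mean and swap BGR to RGB
blob = cv2.dnn.blobFromImage(resized, 1.0, (newW, newH),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)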

# Output layers: per-cell text confidence scores and box geometry
outputLayers = ["feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"]
net.setInput(blob)
(scores, geometry) = net.forward(outputLayers)

# Decoder: extract boxes and confidences from the model outputs
def decode(scores, geometry, confThreshold):
    (numRows, numCols) = scores.shape[2:4]
    boxes = []
    confidences = []

    for y in range(numRows):
        scoresData = scores[0, 0, y]
        # Channels 0-3: distances from the cell to the top, right,
        # bottom and left edges of the box; channel 4: rotation angle
        x0 = geometry[0, 0, y]
        x1 = geometry[0, 1, y]
        x2 = geometry[0, 2, y]
        x3 = geometry[0, 3, y]
        anglesData = geometry[0, 4, y]
        for x in range(numCols):
            # Skip cells below the confidence threshold
            if scoresData[x] < confThreshold:
                continue

            # The output map is 4x smaller than the network input
            offsetX, offsetY = x * 4.0, y * 4.0
            angle = anglesData[x]
            cos = np.cos(angle)
            sin = np.sin(angle)
            h = x0[x] + x2[x]
            w = x1[x] + x3[x]
            # Axis-aligned approximation: the angle is only used to
            # locate the bottom-right corner of the box
            endX = int(offsetX + cos * x1[x] + sin * x2[x])
            endY = int(offsetY - sin * x1[x] + cos * x2[x])
            startX = int(endX - w)
            startY = int(endY - h)

            boxes.append([startX, startY, endX, endY])
            confidences.append(float(scoresData[x]))

    return boxes, confidences

boxes, confidences = decode(scores, geometry, confThreshold=0.5)

# Convert to [x, y, w, h], the format expected by NMSBoxes
rects = []
for (startX, startY, endX, endY) in boxes:
    rects.append([startX, startY, endX - startX, endY - startY])

# Non-maximum suppression
indices = cv2.dnn.NMSBoxes(rects, confidences, score_threshold=0.5, nms_threshold=0.4)

# Draw the results, scaled back to the original image size
if len(indices) > 0:
    for i in indices.flatten():
        (startX, startY, endX, endY) = boxes[i]
        startX = int(startX * rW)
        startY = int(startY * rH)
        endX = int(endX * rW)
        endY = int(endY * rH)
        cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 2)

# Show the result
cv2.imshow("Text Detection", orig)
cv2.waitKey(0)
cv2.destroyAllWindows()
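
The script hard-codes a 320x320 input, which satisfies the multiple-of-32 rule but squeezes non-square images. One possible alternative is to derive the input size from the image itself. The sketch below is an illustration only: the helper names `round_to_multiple` and `east_input_size` and the `max_side=640` default are assumptions, not part of the original script.

```python
def round_to_multiple(value, base=32):
    """Round `value` to the nearest multiple of `base`, never below `base`."""
    return max(base, int(round(value / base)) * base)


def east_input_size(width, height, max_side=640):
    """Scale (width, height) so the longer side is about `max_side`,
    then snap both sides to multiples of 32."""
    scale = max_side / float(max(width, height))
    new_w = round_to_multiple(width * scale)
    new_h = round_to_multiple(height * scale)
    return new_w, new_h


# Example with a 1280x720 image (hypothetical dimensions)
W, H = 1280, 720
newW, newH = east_input_size(W, H)
rW, rH = W / float(newW), H / float(newH)
print(newW, newH)  # 640 352
```

The resulting `newW`, `newH`, `rW` and `rH` could replace the fixed values in the script. A larger input generally costs more inference time, so `max_side` is a trade-off to tune.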

📷 Test Image

Place an image containing clearly visible text in the same folder as east_text_detect.py and name it test.webp.
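
Note that `cv2.imread("test.webp")` resolves the name against the current working directory, not the script's folder, so the image is only found when the script is launched from that folder. A small sketch of one way around this, using `pathlib` (the helper name `sibling_path` is hypothetical):

```python
from pathlib import Path


def sibling_path(script_file, name="test.webp"):
    """Return the path of `name` in the same folder as `script_file`."""
    return Path(script_file).resolve().parent / name


# Inside east_text_detect.py this could be used as:
#   image = cv2.imread(str(sibling_path(__file__)))
```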

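✅ Key Points Recap

Step	Key point
1️⃣ Download the EAST `.pb` model	Must be a TensorFlow frozen graph
2️⃣ Load it with `cv2.dnn.readNet()`	The network input width and height must be multiples of 32
3️⃣ Run `forward()` and decode the geometry	Yields text positions and angles
4️⃣ Apply non-maximum suppression to overlapping boxes	`cv2.dnn.NMSBoxes`
5️⃣ Display the result	Draw the boxes with OpenCV `rectangle`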
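
To make step 4 more concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression on `[x, y, w, h]` boxes. It illustrates the general idea only and is not OpenCV's actual implementation, so edge cases may be handled differently; the function names `iou` and `greedy_nms` and the sample boxes are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(rects, scores, score_threshold=0.5, nms_threshold=0.4):
    """Keep the highest-scoring boxes, dropping any box that overlaps
    an already kept box by more than `nms_threshold` IoU."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(rects[i], rects[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values)
rects = [[10, 10, 100, 40], [12, 12, 100, 40], [200, 50, 80, 30]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(rects, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.87, well above the 0.4 threshold, so only the higher-scoring of the two survives.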

liusming

劉老師的跨域創想工坊