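📌 Tutorial goals
- Use the EAST text detector (frozen_east_text_detection.pb) through OpenCV's dnn module
- Load an image and run text-region detection
- Draw a rectangle around each detected text region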
✅ Environment requirements
| Component | Notes |
|---|---|
| Python 3.x | 3.8 or later recommended |
| OpenCV | Install opencv-contrib-python |
| Model file | frozen_east_text_detection.pb |
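Install the required packages
(Install only one OpenCV wheel: opencv-python and opencv-contrib-python should not be installed side by side in the same environment.)
pip install opencv-contrib-python numpy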
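To confirm the installation before going further, something like the following sketch can help. It only checks whether the modules can be found, using the standard library; the helper name check_packages is made up for this illustration.

```python
import importlib.util

def check_packages(names):
    """Return {module_name: True/False} depending on whether each module can be found.

    Hypothetical helper for this tutorial; it does not import the modules,
    it only asks the import system whether they are installed.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Example: check_packages(["cv2", "numpy"])
```

If either entry comes back False, rerun the pip command above in the same Python environment you use to run the script.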
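📁 Download the EAST model file
You can download it with the following command:
wget https://github.com/oyyd/frozen_east_text_detection.pb/raw/master/frozen_east_text_detection.pb
Alternatively, download it manually from the same GitHub repository (EAST pretrained model).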
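On systems without wget, a rough equivalent in Python might look like this. It is a minimal sketch using only the standard library: the function name fetch_model is an assumption made for illustration, and the URL is the one given above.

```python
import os
import urllib.request

# Same URL as the wget command above
MODEL_URL = ("https://github.com/oyyd/frozen_east_text_detection.pb/"
             "raw/master/frozen_east_text_detection.pb")

def fetch_model(path="frozen_east_text_detection.pb", url=MODEL_URL):
    """Download the model to `path` unless a non-empty file is already there.

    Hypothetical helper; it performs no checksum or integrity validation.
    """
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return path  # already present, skip the download
    urllib.request.urlretrieve(url, path)
    return path
```

Calling fetch_model() once before running the detection script leaves the .pb file in the current folder, which is where the script expects it.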
🧪 Python implementation
Save the following as east_text_detect.py:
import cv2
import numpy as np

# Load the pretrained EAST model
net = cv2.dnn.readNet("frozen_east_text_detection.pb")

# Load the image and fail early if it cannot be read
image = cv2.imread("test.webp")
if image is None:
    raise FileNotFoundError("Could not read test.webp")
orig = image.copy()
(H, W) = image.shape[:2]

# Network input size (width and height must be multiples of 32)
newW, newH = (320, 320)
rW = W / float(newW)  # ratios used later to map boxes back to the original image
rH = H / float(newH)
resized = cv2.resize(image, (newW, newH))
blob = cv2.dnn.blobFromImage(resized, 1.0, (newW, newH),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)

# Output layers: text confidence scores and box geometry
outputLayers = ["feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"]
net.setInput(blob)
(scores, geometry) = net.forward(outputLayers)

# Decode function: turn the model output into boxes and confidences
def decode(scores, geometry, confThreshold):
    (numRows, numCols) = scores.shape[2:4]
    boxes = []
    confidences = []
    for y in range(numRows):
        scoresData = scores[0, 0, y]
        x0 = geometry[0, 0, y]
        x1 = geometry[0, 1, y]
        x2 = geometry[0, 2, y]
        x3 = geometry[0, 3, y]
        anglesData = geometry[0, 4, y]
        for x in range(numCols):
            # Skip cells with low text confidence
            if scoresData[x] < confThreshold:
                continue
            # The output maps are 4x smaller than the network input
            offsetX, offsetY = x * 4.0, y * 4.0
            angle = anglesData[x]
            cos = np.cos(angle)
            sin = np.sin(angle)
            h = x0[x] + x2[x]
            w = x1[x] + x3[x]
            endX = int(offsetX + cos * x1[x] + sin * x2[x])
            endY = int(offsetY - sin * x1[x] + cos * x2[x])
            startX = int(endX - w)
            startY = int(endY - h)
            boxes.append([startX, startY, endX, endY])
            confidences.append(float(scoresData[x]))
    return boxes, confidences

boxes, confidences = decode(scores, geometry, confThreshold=0.5)

# Convert to [x, y, w, h] format, as expected by NMSBoxes
rects = []
for (startX, startY, endX, endY) in boxes:
    rects.append([startX, startY, endX - startX, endY - startY])

# Non-maximum suppression
indices = cv2.dnn.NMSBoxes(rects, confidences, score_threshold=0.5, nms_threshold=0.4)

# Draw the results, scaled back to the original image size
if len(indices) > 0:
    # np.array(...).flatten() handles both the Nx1 and 1-D index layouts
    for i in np.array(indices).flatten():
        (startX, startY, endX, endY) = boxes[i]
        startX = int(startX * rW)
        startY = int(startY * rH)
        endX = int(endX * rW)
        endY = int(endY * rH)
        cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 2)

# Show the result
cv2.imshow("Text Detection", orig)
cv2.waitKey(0)
cv2.destroyAllWindows()
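The arithmetic inside decode() is easier to follow on a single cell. The sketch below repeats the same formulas in plain Python for one grid position. It assumes the four geometry channels are the distances from the cell to the top, right, bottom and left edges of the text box, with the fifth channel holding the rotation angle; the function name decode_cell and its parameter names are invented for this illustration.

```python
import math

def decode_cell(x, y, d_top, d_right, d_bottom, d_left, angle):
    """Decode one score-map cell into (startX, startY, endX, endY).

    Mirrors the per-cell arithmetic of decode() above. The meaning of the
    four distances (top, right, bottom, left) is an assumption.
    """
    # Each cell of the output map covers a 4x4 area of the network input
    offset_x, offset_y = x * 4.0, y * 4.0
    cos, sin = math.cos(angle), math.sin(angle)
    h = d_top + d_bottom
    w = d_right + d_left
    end_x = int(offset_x + cos * d_right + sin * d_bottom)
    end_y = int(offset_y - sin * d_right + cos * d_bottom)
    return int(end_x - w), int(end_y - h), end_x, end_y

# With no rotation, cell (10, 5) sits at input pixel (40, 20):
print(decode_cell(10, 5, d_top=8, d_right=20, d_bottom=8, d_left=12, angle=0.0))
# prints (28, 12, 60, 28)
```

With a zero angle the box simply extends 12 px left, 20 px right, 8 px up and 8 px down from the cell position. For rotated text the script still draws an axis-aligned rectangle, so the box is only an approximation of the true text outline.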
📷 Test image
Put an image containing clearly visible text in the same folder as east_text_detect.py and name it test.webp.
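The script resizes every image to a fixed 320 x 320 input, which distorts images that are far from square. One possible alternative, sketched below, derives an input size from the image itself while keeping both sides a multiple of 32. The helper east_input_size and its max_side parameter are assumptions for illustration and are not part of the script above.

```python
def east_input_size(width, height, max_side=640):
    """Pick a network input size close to the image's aspect ratio.

    Returns (newW, newH, rW, rH): both sizes are multiples of 32 (at least 32),
    and rW, rH map detected boxes back to the original image.
    Hypothetical helper; max_side is an arbitrary cap on the longer side.
    """
    scale = min(1.0, max_side / max(width, height))
    new_w = max(32, int(round(width * scale / 32)) * 32)
    new_h = max(32, int(round(height * scale / 32)) * 32)
    return new_w, new_h, width / new_w, height / new_h

# Example: a 1280 x 720 photo
print(east_input_size(1280, 720)[:2])  # prints (640, 352)
```

The returned values could replace the hard-coded newW, newH, rW and rH in the script. Larger inputs tend to find smaller text at the cost of slower inference.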
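✅ Key points recap
| Step | Key point |
|---|---|
| 1️⃣ Download the EAST .pb model | It must be a TensorFlow frozen graph |
| 2️⃣ Load it with cv2.dnn.readNet() | The network input width and height must be multiples of 32 |
| 3️⃣ Run forward() and decode geometry | Yields the text positions and angles |
| 4️⃣ Apply non-maximum suppression to overlapping boxes | cv2.dnn.NMSBoxes for the axis-aligned boxes used here (cv2.dnn.NMSBoxesRotated is the variant for rotated boxes) |
| 5️⃣ Show the result | Draw the text boxes with cv2.rectangle |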
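To make step 4 less of a black box, here is a minimal pure-Python sketch of greedy non-maximum suppression on [x, y, w, h] boxes. It illustrates the general idea only and is not OpenCV's implementation; the function names iou and greedy_nms are invented for this example, and the thresholds simply mirror the values passed to cv2.dnn.NMSBoxes in the script.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(rects, scores, score_threshold=0.5, nms_threshold=0.4):
    """Return indices of the boxes to keep, highest score first."""
    # Drop low-confidence boxes, then visit the rest from best to worst
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too much
        if all(iou(rects[i], rects[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep

rects = [[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 10, 10]]
print(greedy_nms(rects, [0.9, 0.8, 0.7]))  # prints [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, which is above the 0.4 threshold, so it is suppressed; the third box does not overlap anything and is kept.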
