What is OCR? How can a beginner use it with Azure Cognitive Services?
What is OCR?
OCR stands for optical character recognition: the identification of alphanumeric characters within an image. The OCR process can be broken down into these steps:
Get an image
Perform pre-processing of the image
Use an algorithm to recognize characters
Images can come from scanners or cameras, and both can preserve the page layout when used correctly. However, in camera-based images, parallax can distort the positions and dimensions of characters and words. Pre-processing can help alleviate these issues.
Figure 1: Printed text offers a simpler image to work with
Pre-processing is done mainly to make it easier for a computer system to identify the characters in an image. Many pre-processing algorithms are available, and the right choice depends on your requirements. Some of these algorithms include:
Pre-processing is primarily used to enhance an image, but applying these filters incorrectly can corrupt the data, for example by erasing thin strokes or merging neighboring characters.
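Binarization (thresholding) is one common pre-processing step: it separates dark ink from a light background. The sketch below uses Otsu's method, which picks the threshold that best separates the two groups of pixel values. It assumes the image has already been converted to a 2-D list of 8-bit grayscale values; the function names are hypothetical, and in practice a library such as OpenCV would usually do this for you. A badly chosen threshold is exactly the kind of misapplied filter that can erase thin strokes.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grayscale values (0 = black, 255 = white)


def otsu_threshold(gray: Gray) -> int:
    """Return the threshold (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_dark, sum_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]          # pixels with value <= t (likely ink)
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark  # pixels with value > t (likely background)
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: Gray) -> Gray:
    """Map dark pixels (ink) to 0 and light pixels (background) to 255."""
    t = otsu_threshold(gray)
    return [[0 if p <= t else 255 for p in row] for row in gray]


page = [[10, 12, 200], [11, 210, 205]]
print(binarize(page))  # [[0, 0, 255], [0, 255, 255]]
```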
After pre-processing, the next step is character recognition. Pattern matching is one of the most fundamental character recognition algorithms: the image of each character is compared pixel by pixel with a stored glyph. This approach breaks down for handwritten text, because handwritten characters rarely line up with a stored glyph pixel for pixel.
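Pixel-by-pixel pattern matching can be sketched in a few lines. This is a toy version under stated assumptions: each character has already been isolated and scaled to the same size as the stored glyphs, and bitmaps are hypothetical lists of strings where `#` marks ink. The score is simply the fraction of pixels that agree.

```python
from typing import Dict, List, Tuple

Bitmap = List[str]  # one string per row; '#' = ink, '.' = background

# Hypothetical 3x3 glyph templates; real systems store many glyphs per font.
GLYPHS: Dict[str, Bitmap] = {
    "I": [".#.", ".#.", ".#."],
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
}


def pixel_similarity(a: Bitmap, b: Bitmap) -> float:
    """Fraction of positions where two equally sized bitmaps agree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    same = sum(ca == cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))
    return same / total


def match_character(image: Bitmap, glyphs: Dict[str, Bitmap]) -> Tuple[str, float]:
    """Return the label of the best-matching glyph and its similarity score."""
    best_label, best_score = "", -1.0
    for label, glyph in glyphs.items():
        score = pixel_similarity(image, glyph)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# A slightly noisy "T": one extra ink pixel in the bottom row.
print(match_character(["###", ".#.", ".##"], GLYPHS))  # label 'T' with score 8/9
```

The brittleness is easy to see: a handwritten "T" that is slightly slanted or shifted disagrees with the template on many pixels even though its shape is the same.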
Feature-based algorithms compare features, such as lines, closed loops, line directions, and intersections, rather than raw pixels, which helps overcome the variability of handwritten text. Feature extraction also reduces dimensionality, which improves efficiency.
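Feature matching can be illustrated with a simple, commonly taught feature called zoning: the ink density in each region of the character. This is a swapped-in illustrative technique, not necessarily what any particular OCR engine uses, and the bitmaps and prototypes below are hypothetical. Note how each 4x4 bitmap (16 pixels) is reduced to 4 features before the nearest prototype is chosen.

```python
import math
from typing import Dict, List, Sequence, Tuple

Bitmap = List[str]  # one string per row; '#' = ink, '.' = background


def zone_features(bitmap: Bitmap, zones: int = 2) -> List[float]:
    """Split the bitmap into zones x zones cells and return the ink density of each."""
    h, w = len(bitmap), len(bitmap[0])
    features = []
    for zr in range(zones):
        r0, r1 = zr * h // zones, (zr + 1) * h // zones
        for zc in range(zones):
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cells = [bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(ch == "#" for ch in cells) / max(len(cells), 1))
    return features


def nearest_label(features: Sequence[float],
                  prototypes: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the prototype label closest in Euclidean distance, and that distance."""
    label, proto = min(prototypes.items(), key=lambda kv: math.dist(features, kv[1]))
    return label, math.dist(features, proto)


# Hypothetical clean glyphs used to build prototype feature vectors.
L_GLYPH = ["#...", "#...", "#...", "####"]
T_GLYPH = ["####", ".#..", ".#..", ".#.."]
PROTOTYPES = {"L": zone_features(L_GLYPH), "T": zone_features(T_GLYPH)}

# A "handwritten" L whose vertical stroke drifts one pixel to the right.
sample = ["#...", ".#..", ".#..", ".###"]
print(nearest_label(zone_features(sample), PROTOTYPES))  # ('L', 0.25)
```

Because the features summarize regions instead of exact pixel positions, small shifts in the stroke change them only slightly.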
Figure 2: Patterns in handwritten text can be very diverse, which poses a problem for machine learning algorithms
The confidence level measures how certain the algorithm is about each prediction. Using standard fonts and font sizes tends to increase the confidence level. These steps are the core of the OCR process, but you can also improve accuracy with application-specific optimizations.
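One common application-specific optimization (offered as an example, not necessarily one the author used) is to correct low-confidence words against a domain vocabulary using edit distance. The word/confidence pairs, the vocabulary, and the 0.8 threshold below are all hypothetical.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def correct_low_confidence(words: Sequence[Tuple[str, float]],
                           vocabulary: Sequence[str],
                           threshold: float = 0.8) -> List[str]:
    """Keep confident words; replace the others with the closest vocabulary entry."""
    result = []
    for text, confidence in words:
        if confidence >= threshold or not vocabulary:
            result.append(text)
        else:
            result.append(min(vocabulary, key=lambda v: levenshtein(text.lower(), v.lower())))
    return result


ocr_words = [("Invoice", 0.98), ("T0tal", 0.55), ("Amount", 0.91)]
print(correct_low_confidence(ocr_words, ["invoice", "total", "amount", "date"]))
# ['Invoice', 'total', 'Amount']
```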
OCR on Azure
Implementing OCR yourself is a tedious and complex task that requires extensive domain knowledge. I come from a computer science background, but I had no specialized experience with OCR.
Azure Cognitive Services makes OCR accessible to novice and experienced programmers alike. The service exposes a simple REST interface, which feels familiar and is easy to use.
On Azure, OCR is available as a sub-service of the Computer Vision API. To use Microsoft's OCR service, you need an Azure subscription key; for testing, you can easily obtain a trial key.
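A call to the OCR operation is an ordinary HTTPS POST that carries the key in the `Ocp-Apim-Subscription-Key` header. The sketch below uses only the Python standard library. It assumes the v3.2 route `/vision/v3.2/ocr` and a regional endpoint such as `https://westus2.api.cognitive.microsoft.com`; API versions and endpoint formats change, so verify both against the documentation for your own resource. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumption: API version v3.2; check the docs for the version your resource supports.
API_PATH = "/vision/v3.2/ocr"


def build_ocr_request(endpoint: str, key: str, image_url: str) -> urllib.request.Request:
    """Build a POST request asking the OCR operation to read a publicly reachable image."""
    url = f"{endpoint.rstrip('/')}{API_PATH}?language=unk&detectOrientation=true"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,  # the key from your Azure resource
            "Content-Type": "application/json",
        },
    )


def extract_lines(result: dict) -> List[str]:
    """Flatten the regions -> lines -> words response into plain text lines."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(word["text"] for word in line.get("words", [])))
    return lines


def run_ocr(endpoint: str, key: str, image_url: str) -> List[str]:
    """Send the request (needs network access and a valid key) and return text lines."""
    with urllib.request.urlopen(build_ocr_request(endpoint, key, image_url)) as resp:
        return extract_lines(json.load(resp))
```

With a real key, `run_ocr("https://westus2.api.cognitive.microsoft.com", KEY, image_url)` would return the recognized lines; keeping request building and response parsing separate makes both easy to check without calling the service.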
Like the LUIS service, Computer Vision is not available in all regions. The following regions offer Computer Vision:
East US 2
South Central US
West Central US
West US 2
Unlike LUIS, Computer Vision is available in a variety of pricing tiers:
Tier | Features | Unit | Price
Free | - | Transactions | 5k transactions/month
S1 | 10 transactions/s | Transactions | 0-1 mil: INR 66/1k transactions; 1-5 mil: INR 52.8/1k transactions; >5 mil: INR 42.97/1k transactions
S2 | 10 transactions/s | Transactions | 0-1 mil: INR 99.15/1k transactions; 1-5 mil: INR 66.10/1k transactions; >5 mil: INR 42.97/1k transactions
S3 | 10 transactions/s | Transactions | INR 165.25/1k transactions

The Computer Vision API differentiates between OCR for printed text and OCR for handwritten text, and a different nested route is used for each. As discussed earlier, printed text is much easier to analyze, because standard fonts provide a clear distinction between the foreground (the characters) and the background.
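In recent versions of the Computer Vision API, handwritten (and printed) text is handled by the asynchronous Read operation: you POST the image, receive an `Operation-Location` response header, and poll that URL until the status is `succeeded`. The sketch assumes the v3.2 route `/vision/v3.2/read/analyze` and the response fields `status`, `analyzeResult`, `readResults`, and `lines`; older API versions used different routes, so treat these as assumptions to verify. Polling takes a fetch function as a parameter so the logic can be tested without a network call, and all helper names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, List

# Assumption: the v3.2 Read route; route names differ between API versions.
READ_PATH = "/vision/v3.2/read/analyze"


def submit_read(endpoint: str, key: str, image_url: str) -> str:
    """Start an asynchronous Read operation and return the URL to poll."""
    req = urllib.request.Request(
        endpoint.rstrip("/") + READ_PATH,
        data=json.dumps({"url": image_url}).encode("utf-8"),
        method="POST",
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Operation-Location"]


def make_getter(operation_url: str, key: str) -> Callable[[], dict]:
    """Return a function that fetches the current state of the operation."""
    def get_json() -> dict:
        req = urllib.request.Request(operation_url, headers={"Ocp-Apim-Subscription-Key": key})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return get_json


def poll_read_result(get_json: Callable[[], dict],
                     interval: float = 1.0, max_tries: int = 30) -> List[str]:
    """Poll until the operation succeeds, then return the recognized lines."""
    for _ in range(max_tries):
        result = get_json()
        status = result.get("status")
        if status == "succeeded":
            pages = result["analyzeResult"]["readResults"]
            return [line["text"] for page in pages for line in page["lines"]]
        if status == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(interval)  # status is still "notStarted" or "running"
    raise TimeoutError("Read operation did not finish in time")
```

Against a real resource, the pieces combine as `poll_read_result(make_getter(submit_read(endpoint, key, image_url), key))`.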