Every once in a while a project comes along with that magical power to consume your time and attention for many months. When you finally complete it, you feel almost sorry that there is nothing more left to do.
What is so special about this Bingo ball reader? It may seem like an ordinary OCR project at first glance: a video camera captures the image and OCR software recognizes the number. Easy as that. And it works without problems, like every simple gadget should.
But then again, maybe it’s not that simple. Numbers are scattered all over the ball, so they have to be located first, and the best candidate for reading has to be selected. Then, numbers are painted onto a sphere rather than a flat surface, in some cases deforming them to the point where their shape has to be recovered first. Also, the angle of reading is not fixed but can be anywhere on a 360° scale. And then we have the glare problem to boot, as Bingo balls are so shiny that every light source reflects as a saturated bright spot.
So, is that all of it? Well, almost. The task is expected to be carried out by an embedded microcontroller, with limited speed and memory, yet the recognition process for one ball has to be fast — 500 ms at worst. But that’s just one part of the process. The project includes the pipelined mechanism which accepts the ball, transports it to be scanned by the OCR and then shot by the public broadcast video camera before it gets dumped. And finally, if the reading was not reliable enough, the ball has to be rotated slightly so that the numbers are repositioned for another reading attempt.
Despite these challenges I did manage to build this system. It’s fast and reliable, and I discovered some very interesting tricks along the way. Take a look at the quick demo video below to get a feel for the speed and what the system “sees”, then join me after the break to dive into the details of this interesting embedded build.
Initially, I thought I would have to employ a neural network for the recognition process, but it turned out that recognition is actually the simplest part of the project, and that it would be much simpler and faster to do it algorithmically. The difficult part is identifying what’s what on the whole image: locating the best number, the line under it, and measuring how much it has to be rotated. Starting with nothing more than a bitmap image, the processor has to do a lot of math even before it can be sure whether the number consists of one or two digits.
Simplified schematic of OCR section
VGA during Development
To make development, maintenance and adjustments easier, the same MCU is used for VGA signal generation in addition to image fetching and processing. It displays not only the scanned image, but also some current parameters and RAM contents. The controller board has a VGA connector, but it is not meant to be used during normal operation of the unit. The VGA monitor has nothing in common with the broadcast monitors in the Bingo hall, as there are two independent cameras and lighting systems.
VGA signal generation consumes a lot of processor time, so it is switched off during image fetching and processing, which takes about 500 ms during each ball reading cycle. Sync signals are transparently generated by the internal PWM peripherals and remain active all the time, so the picture is restored quickly once the RGB signal is re-established.
In this case, the 16-bit microcontroller PIC24EP512GP806 was used, with 536/52 K of program/data memory and a 60 MIPS execution speed.
Fetching the Image
An inexpensive analog “cube” video camera was used in the first phase of development, but it was later replaced by a digital cube camera. Both were similar in price and performance, but the latter came with a longer focal length lens, so the camera could be placed farther away and still see a large area of the ball.
For such a small object, the best light source would be white LEDs, but the glare off the shiny ball surface was quite bad. I carried out some experiments with diffusers, but with no luck. The eventual solution came from a quite different approach: keep the very bright and sharp reflections, but use a double exposure with two different light sources. During the second image fetch, the MCU keeps the lower of the two values for every pixel.
As the hotspots never coincide, they cancel out, and the resulting image (the third photo from the left) is evenly lit and glare-free. As an added bonus, the background light reflections are canceled in the process as well.
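The glare cancellation itself boils down to a per-pixel minimum of the two exposures. Here is a minimal sketch of that idea in C; the 220×220 buffer size comes from the article, while the function and variable names are my own:

```c
#include <stdint.h>

#define IMG_W 220
#define IMG_H 220

/* Combine two exposures taken under different LED groups: keeping the
   darker of the two values removes the hotspots, because a glare spot
   never appears at the same place in both frames. */
void combine_exposures(uint8_t frame[IMG_H][IMG_W],
                       const uint8_t second[IMG_H][IMG_W])
{
    for (int y = 0; y < IMG_H; y++)
        for (int x = 0; x < IMG_W; x++)
            if (second[y][x] < frame[y][x])
                frame[y][x] = second[y][x];
}
```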
Please note that the system is embedded, without a screenshot function, so the images come from a VGA monitor being shot with a camera.
The light source is composed of 16 white LEDs, with eight LEDs active at a time. The image on the far right-hand side represents the LED arrangement from the camera’s point of view. LEDs are colored red and blue here to help differentiate between the groups for the first and second exposure.
This makes the process significantly slower, as now we have not only two exposures, but also a dummy frame time between the two exposures, to allow the CMOS sensor to recover and accommodate after the lighting changes. That’s why the whole imaging process takes nearly 100 ms.
The resolution of the scanned image is 220×220 pixels, with 8-bit pixel depth. The analog grayscale image uses only six bits, with the remaining two bits used for blue and red color representation on the monitor, as the grayscale is actually greenscale. Those extra bits are used as special flag bits between processing steps, visible in single-step mode as blue and red areas. This turned out to be very helpful during program development and debugging.
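For reference, that pixel layout can be summed up with a few masks (the macro names are mine, not from the firmware):

```c
#define GRAY_MASK  0x3F  /* bits 0-5: 6-bit grayscale, shown as green  */
#define FLAG_BLUE  0x40  /* bit 6: processing/debug flag, shown blue   */
#define FLAG_RED   0x80  /* bit 7: processing/debug flag, shown red    */
```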
The whole process is divided into 17 steps, which can also be performed in single-step mode for development and debugging purposes. The step number is displayed at the top left corner of the screen (see below), and the current state of the stopwatch with 1 ms resolution at the top right. This way it was easy to follow the execution time and optimize each step.
Ball Location and Stretching
To locate the ball precisely, the X, Y coordinates of the centroid (geometric center) are calculated, using the formulas Cx = ∑xiAi / ∑Ai and Cy = ∑yiAi / ∑Ai, where xi, yi are the coordinates and Ai is the value of pixel i. As the background is predominantly black before this step, Cx, Cy will be roughly at the center of the ball. Then the whole frame buffer is moved as a 2D block, so that the centroid lands at coordinates X=110, Y=110, the center of the frame. The center is marked with 2×2 red pixels (bit 7) only for the developer’s convenience, as the processing firmware in many cases ignores bits 6 and 7.
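As a rough illustration, the centroid formula above translates almost directly into C; the function name is mine, and masking to the lower six bits follows the pixel layout described earlier:

```c
#include <stdint.h>

#define IMG_W 220
#define IMG_H 220

/* Weighted centroid of the frame: on a mostly black background,
   the brighter ball pixels pull Cx, Cy toward the ball's center. */
void find_centroid(const uint8_t img[IMG_H][IMG_W], int *cx, int *cy)
{
    uint32_t sum = 0, sum_x = 0, sum_y = 0;

    for (int y = 0; y < IMG_H; y++)
        for (int x = 0; x < IMG_W; x++) {
            uint8_t a = img[y][x] & 0x3F;   /* grayscale bits only */
            sum   += a;
            sum_x += (uint32_t)a * (uint32_t)x;
            sum_y += (uint32_t)a * (uint32_t)y;
        }

    *cx = sum ? (int)(sum_x / sum) : IMG_W / 2;
    *cy = sum ? (int)(sum_y / sum) : IMG_H / 2;
}
```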
Next, the ball diameter is measured by calculating the average pixel value on the perimeter for different diameters. Then, the background (every pixel outside the diameter) is set to “white” or, more specifically, green (value 0x3F), to ensure better isolation of black areas. The background will be set to white or black a few more times during processing, each time a selection of black (ink) or white (paper) areas is required.
Perfectly transforming a sphere to a flat surface is impossible, but the shape can be improved if the image is non-linearly deformed, as in the step 3 image. Small 16-bit microcontrollers don’t have an arithmetic coprocessor, and using standard trigonometric libraries would consume too much processor time. That’s why trigonometric lookup tables were used, and you can see on the stopwatch (top right blue digits) that, in this case, the execution time for the stretching procedure was only 11 ms. You can also see that the central part of the ball is mostly unchanged, while the edges are non-linearly stretched so that spherical deformations are minimized.
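I can only guess at the exact mapping used, but the general trick of baking the trigonometry into a lookup table so the per-pixel loop stays cheap might look roughly like this; the sin()-based curve, the table size, and the separate output buffer are all my assumptions (the real firmware works with far less RAM and without runtime float math):

```c
#include <math.h>
#include <stdint.h>

#define IMG_W 220
#define IMG_H 220
#define CX    110
#define CY    110

/* For each destination radius, the source radius to sample from.
   Built once per ball, so the trigonometry runs only 110 times. */
static int16_t radius_lut[110];

void build_radius_lut(int ball_r)
{
    for (int r = 0; r < 110; r++) {
        double theta = (double)r / ball_r;        /* arc position on the sphere */
        double src   = (theta < 1.5707963) ? ball_r * sin(theta) : (double)ball_r;
        radius_lut[r] = (int16_t)(src + 0.5);
    }
}

/* The per-pixel remap then needs only the table and integer scaling.
   (This sketch still calls sqrt(); the principle is the same either way.) */
void unwrap(const uint8_t in[IMG_H][IMG_W], uint8_t out[IMG_H][IMG_W])
{
    for (int y = 0; y < IMG_H; y++)
        for (int x = 0; x < IMG_W; x++) {
            int dx = x - CX, dy = y - CY;
            int r  = (int)(sqrt((double)(dx * dx + dy * dy)) + 0.5);
            if (r == 0 || r >= 110) { out[y][x] = in[y][x]; continue; }
            int sr = radius_lut[r];               /* sr <= r, so indices stay in range */
            out[y][x] = in[CY + dy * sr / r][CX + dx * sr / r];
        }
}
```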
In step 4, similarly to the Unsharp Mask function in Photoshop, a new, blurred image is created. As there is not enough RAM for another full frame buffer, this is carried out on an auxiliary image scaled down to a resolution of 44×44. The unsharp mask function is very important, as it ensures better selection of “ink” pixels relative to “paper” pixels. Selecting here means “setting bit 7”, which results in red areas on the VGA screen.
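In essence this is an adaptive threshold against a heavily downscaled copy of the frame. A rough sketch of the idea, with the 44×44 auxiliary buffer from the article but an illustrative threshold margin of my own:

```c
#include <stdint.h>

#define IMG_W    220
#define IMG_H    220
#define AUX_W     44   /* 220 / 5 */
#define FLAG_INK 0x80

/* Build a 44x44 "blurred" image by averaging 5x5 blocks, then mark as
   ink (bit 7) every pixel that is noticeably darker than its local
   background.  MARGIN is an illustrative constant, not the real one. */
void mark_ink(uint8_t img[IMG_H][IMG_W])
{
    static uint8_t aux[AUX_W][AUX_W];
    const int MARGIN = 8;

    for (int by = 0; by < AUX_W; by++)
        for (int bx = 0; bx < AUX_W; bx++) {
            unsigned sum = 0;
            for (int y = 0; y < 5; y++)
                for (int x = 0; x < 5; x++)
                    sum += img[by * 5 + y][bx * 5 + x] & 0x3F;
            aux[by][bx] = (uint8_t)(sum / 25);
        }

    for (int y = 0; y < IMG_H; y++)
        for (int x = 0; x < IMG_W; x++) {
            uint8_t g = img[y][x] & 0x3F;
            if (g + MARGIN < aux[y / 5][x / 5])
                img[y][x] |= FLAG_INK;
        }
}
```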
Now there are two images in the same frame buffer: the grayscale one (bits 0-5) and the binary one (bit 7). The latter is used in preprocessing step 6, where small holes and scratches are eliminated. The selected image is first expanded and contracted, and then the process is repeated with the order of operations reversed — which results in the edges being smoothly rounded and garbage-free.
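The expand/contract passes are plain 3×3 morphological operations on the bit-7 mask. A compact sketch of the technique (the packed scratch mask is my own shortcut for clarity; the firmware may well do this differently):

```c
#include <stdint.h>

#define IMG_W 220
#define IMG_H 220
#define INK   0x80

/* One 3x3 morphological pass over the bit-7 "ink" mask.
   grow = 1 expands (dilates) the selection, grow = 0 contracts (erodes) it. */
static void morph3x3(uint8_t img[IMG_H][IMG_W], int grow)
{
    static uint8_t mask[IMG_H][IMG_W / 8 + 1];      /* packed snapshot of bit 7 */

    for (int y = 0; y < IMG_H; y++)
        for (int x = 0; x < IMG_W; x++)
            if (img[y][x] & INK) mask[y][x >> 3] |=  (uint8_t)(1 << (x & 7));
            else                 mask[y][x >> 3] &= (uint8_t)~(1 << (x & 7));

    for (int y = 0; y < IMG_H; y++)
        for (int x = 0; x < IMG_W; x++) {
            int set = !grow;                         /* OR for dilate, AND for erode */
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    int v = (yy >= 0 && yy < IMG_H && xx >= 0 && xx < IMG_W)
                          ? (mask[yy][xx >> 3] >> (xx & 7)) & 1 : 0;
                    if (grow) set |= v; else set &= v;
                }
            if (set) img[y][x] |= INK;
            else     img[y][x] &= (uint8_t)~INK;
        }
}

/* Expand then contract fills small holes; the reversed order then
   removes small specks -- the two passes described above. */
void clean_mask(uint8_t img[IMG_H][IMG_W])
{
    morph3x3(img, 1); morph3x3(img, 0);
    morph3x3(img, 0); morph3x3(img, 1);
}
```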
Component Manipulation
After a couple more preprocessing steps, the major operations take place. The first one is known as “connected components”, where isolated areas are identified and parameters for each one are acquired. These include X and Y dimensions, X and Y center coordinates, the number of pixels selected, and the Euclidean distance from the center of the frame. This helps classify each component as a digit, the big circle, the underline, or background. At this stage, it also becomes clear whether the number consists of one or two digits.
This step takes a lot of processing time, about 200 ms. Another problem was that the standard algorithm for connected components requires an auxiliary frame buffer of the same size, so I had to create a new algorithm which works within the same frame buffer, plus a small table for temporary coordinates.
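For illustration, the per-component record might look something like this; only the list of parameters comes from the article, while the field names and sizes are mine:

```c
#include <stdint.h>

/* Parameters gathered for every connected component, as listed above. */
typedef struct {
    uint8_t  min_x, max_x, min_y, max_y;  /* X and Y extents (dimensions)            */
    uint8_t  cx, cy;                      /* X and Y center coordinates              */
    uint16_t pixel_count;                 /* number of selected pixels               */
    uint16_t dist_from_center;            /* Euclidean distance from frame center    */
    uint8_t  kind;                        /* classified later: digit, circle, underline, background */
} component_t;
```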
At this point it is easy for the processor to choose the best candidate for recognition — it is the circle with the smallest Euclidean distance from the center of the ball. Connected components inside this circle are taken into account, and everything else is discarded.
The balls in question are special OCR balls with underlined numbers, so that the angle of rotation can be measured. Now that the center of the circle is known, the program rotates a virtual “T” shape, which corresponds to the underline, in 512 steps around the full 360°, counting how many “ink” pixels it covers. The highest count dictates the angle of rotation; then the 2D block of the frame buffer is moved to the bottom right corner of the image (step 12 on the leftmost image), and the rotation is performed, moving the bitmap to the opposite corner of the frame buffer. Thanks to trigonometric lookup tables, this group of operations takes only 50 ms.
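A sketch of the angle search alone (not the subsequent block rotation) could look like the following. The 512 steps and the ink-pixel counting come from the article; the Q14 fixed-point format and the template geometry are my guesses, and only the bar of the “T” is traced here:

```c
#include <stdint.h>
#include <math.h>

#define IMG_W 220
#define IMG_H 220
#define INK   0x80
#define STEPS 512

/* Sine table in Q14 fixed point; cosine is read with a quarter-turn offset. */
static int16_t sin_q14[STEPS];

void build_sin_lut(void)
{
    for (int i = 0; i < STEPS; i++)
        sin_q14[i] = (int16_t)(sin(6.28318530718 * i / STEPS) * 16384.0);
}

/* Count "ink" pixels under the underline bar of the rotated template for
   one angle step.  Bar length and offset from the circle center are
   illustrative values. */
static int score_angle(const uint8_t img[IMG_H][IMG_W], int cx, int cy, int step)
{
    int s = sin_q14[step];
    int c = sin_q14[(step + STEPS / 4) % STEPS];
    int hits = 0;

    for (int t = -20; t <= 20; t++) {
        int x = cx + (t * c - 28 * s) / 16384;   /* rotate the point (t, 28) by the angle */
        int y = cy + (t * s + 28 * c) / 16384;
        if ((unsigned)x < IMG_W && (unsigned)y < IMG_H && (img[y][x] & INK))
            hits++;
    }
    return hits;
}

/* The best angle is simply the step with the highest ink count. */
int find_rotation(const uint8_t img[IMG_H][IMG_W], int cx, int cy)
{
    int best_step = 0, best_hits = -1;

    for (int step = 0; step < STEPS; step++) {
        int h = score_angle(img, cx, cy, step);
        if (h > best_hits) { best_hits = h; best_step = step; }
    }
    return best_step;   /* 0..511; multiply by 360/512 for degrees */
}
```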
It keeps getting better with every step. Digits are marked with different colours, then one digit is moved to a safe distance, and then each digit is scaled to a known resolution of 30×46.
Recognition
As this reader was my first OCR project, I naively thought the recognition process would be the toughest part to solve. After each step was fully debugged and checked one by one, I reached the 17th and final step. As I already pointed out, my initial plan was to opt for a neural network, but then I tried a simple algorithm and played around with it. I tested it with a few balls and you can’t imagine how surprised I was when I saw it work perfectly! At last the bitmap was correctly converted to two ASCII digits.
The algorithm is quite simple. The bitmap for each digit was virtually divided into three parts, first horizontally and then vertically. Then active pixels were counted in every column or row, and histograms were created. There is also an additional 7th histogram, which is slanted to help detect the crosscut lines of digits 4 and 7.
It took only 3 ms to build the seven histograms for each digit, compare them with prerecorded tables, calculate the sum of squared errors, and sort the results. To make development and debugging easier, all histograms are plotted on the screen.
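To give a flavor of this step, here is a sketch of one plain column-projection histogram and the squared-error comparison. The actual firmware builds seven such profiles (the thirds of the digit plus the slanted one); all names below are mine:

```c
#include <stdint.h>

#define DIGIT_W 30
#define DIGIT_H 46
#define INK     0x80

/* Column projection: count of ink pixels per column of a normalized digit. */
void project_cols(const uint8_t d[DIGIT_H][DIGIT_W], uint8_t h[DIGIT_W])
{
    for (int x = 0; x < DIGIT_W; x++) {
        h[x] = 0;
        for (int y = 0; y < DIGIT_H; y++)
            if (d[y][x] & INK) h[x]++;
    }
}

/* Squared-error distance between a measured profile and one of the
   prerecorded reference tables; the smallest total error wins. */
uint32_t profile_error(const uint8_t h[], const uint8_t ref[], int n)
{
    uint32_t e = 0;
    for (int i = 0; i < n; i++) {
        int d = (int)h[i] - (int)ref[i];
        e += (uint32_t)(d * d);
    }
    return e;
}
```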
After the results of the comparison are sorted, we get the winner for each digit (in this case 8 and 5), but our job isn’t done until one more thing takes place. The quality of the reading has to be rated, so that the controller can estimate whether the result is reliable enough.
If the number on the ball has only one digit, the table of errors for each digit (0…9) is sorted and the “winner” is compared with the second one (the runner-up). If the ratio is high, that means the recognition is successful. In our case it was 147%, which means that the second ranked candidate had 147% more errors than the best one. For instance, the first one had 100 “error units” and the second one had 247. This is a good rating, although many ratings are north of 300%. Generally, ratings higher than 80% can be considered safe enough.
But what if there are two digits? A chain is only as strong as its weakest link, so the program will disregard the more successfully recognized digit (the one with the higher ratio) and use the weaker one to make the final decision on success.
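Put together, the rating logic might be sketched like this; the 100-versus-247 example and the weakest-link rule come from the article, while the names and types are my own illustration:

```c
#include <stdint.h>

/* err[] holds the total matching error for each of the ten digits,
   already computed from the histogram comparison.  The rating is how
   much worse the runner-up is than the winner, in percent. */
int rate_digit(const uint32_t err[10], int *winner)
{
    int best = 0, second = 1;
    if (err[1] < err[0]) { best = 1; second = 0; }

    for (int i = 2; i < 10; i++) {
        if (err[i] < err[best])        { second = best; best = i; }
        else if (err[i] < err[second]) { second = i; }
    }
    *winner = best;

    /* e.g. 100 vs 247 error units -> (247 - 100) * 100 / 100 = 147 % */
    return (int)(((uint64_t)(err[second] - err[best]) * 100) /
                 (err[best] ? err[best] : 1));
}

/* Two-digit numbers: the overall confidence is that of the weaker digit. */
int rate_number(int rating_tens, int rating_ones)
{
    return rating_tens < rating_ones ? rating_tens : rating_ones;
}
```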
The controller has two basic modes of operation. In the fast mode, there is only one reading, which gets repeated (after the ball rotation) only if the first reading wasn’t rated well enough. In the slower (“safe”) mode, there are two readings whose results should match.
The reader was tested in Belgrade, at the Eleks-M company, which produces casino equipment. The test was carried out with an additional still camera which automatically recorded every ball reading; then the images (with filenames which contained nothing but the recognized ball number) were sorted alphabetically and the final check was performed manually.
The entire test lasted for 240 hours, which also helped stress-test the durability of the Bingo blower in addition to the reader. After 10 days and 115,000 balls read, there was only one erroneous reading (ball 37 was read as 7), with the reader set to fast mode. Testing in safe mode would be meaningless, as an error would probably never occur.
Mechanical Concept
The physical path for Bingo balls is c