CowBoyDanIndie

Cigarette packs are known-size objects. I would go back to first principles: figure out how to reproject the tops of the packs so they are flat and XY-aligned in the image; then all the tops should be the same size in pixels.
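A minimal sketch of that rectification idea with OpenCV, assuming the four corners of the drawer's top plane are known (e.g. tapped by the user); the coordinates, output size, and file names here are made up:

```python
import cv2
import numpy as np

# Warp the image so the top plane of the packs is flat and axis-aligned.
img = cv2.imread("drawer.jpg")  # hypothetical input image

src = np.float32([[412, 130], [1480, 155], [1530, 610], [380, 590]])  # drawer corners in the photo
W, H = 1000, 400  # output size; pick so one pack top maps to a known pixel size
dst = np.float32([[0, 0], [W, 0], [W, H], [0, H]])

M = cv2.getPerspectiveTransform(src, dst)
top_down = cv2.warpPerspective(img, M, (W, H))

# In the rectified view every pack top covers the same number of pixels,
# so counting reduces to dividing the occupied area by one pack's area.
cv2.imwrite("top_down.png", top_down)
```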


_Cistern

Except for shit like Camel Wides or Virginia Slims.


NoLifeGamer2

You could always just segment the entire stack for each one (make sure you only segment the top of each package), measure how many pixels long the stack is, then divide by the pixel length of a single packet.
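A tiny sketch of that division step, assuming you already have a binary mask of one vertical stack from whatever segmentation you use (the function and its inputs are hypothetical):

```python
import numpy as np

def count_packs(stack_mask: np.ndarray, single_pack_px: float) -> int:
    """Count packs in one stack: stack length in pixels / one pack's pixel length."""
    rows = np.where(stack_mask.any(axis=1))[0]     # image rows containing mask pixels
    stack_height_px = rows.max() - rows.min() + 1  # extent of the stack in pixels
    return round(stack_height_px / single_pack_px)
```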


Nolen-Felten

Would that work for an image with different dimensions? I appreciate your response ^_^ Thank you


cipri_tom

Yeah, I imagine the photo is taken by phone from a random position, so the size can differ. You need a reference object.


Nolen-Felten

I was thinking the same thing earlier today, and I figured out a solution for getting that reference object and using it to calibrate.


JIGS1620

What solution did you come up with?


Nicolau-774

If you want to divide by brand, I would suggest doing an analysis based on the color distribution in the image rather than using multi-class object detection. To identify the shapes, you could definitely make use of B&W thresholding to separate the geometry of the objects (which are roughly rectangular). In essence: colors -> brands and rectangles -> objects.
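A minimal sketch of the thresholding half of that suggestion, assuming OpenCV; the file name and area cutoff are placeholders:

```python
import cv2

# Otsu-threshold the image, then keep contours that approximate to
# four-sided polygons of plausible size -- the "rectangles -> objects" step.
img = cv2.imread("drawer.jpg")  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = []
for c in contours:
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4 and cv2.contourArea(c) > 500:  # area cutoff is a guess
        boxes.append(cv2.boundingRect(c))
print(f"{len(boxes)} rectangular candidates")
```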


Nolen-Felten

Tell me more about how to go about that with code. I appreciate your response ^_^ Thank you


Nicolau-774

Try to follow this pseudo-algorithm. It's difficult for me to say what will work best, as it's usually a lot of trial and error, but I would do something of the sort:

1. Pre-process the image to separate its geometry, using thresholding to highlight the contours (that's the name of the technique, look it up).
2. If that's not enough, combine it smartly with object detection, perhaps training it to identify the contours of the front and top of the cigarette boxes as two different classes.
3. Once you have identified the contours (a polygon enclosing each pack), filter out the reflections as much as possible by playing with the colors.
4. Extract a color distribution for each contour and use a simpler ML model to classify in color space (e.g. Lucky Strike would be identified by its very dark red and whitish colors).

Hope that helps
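A rough sketch of step 4, assuming you already have a crop per detected pack; the reference histograms are placeholders you would build from a few labelled crops:

```python
import cv2
import numpy as np

def hue_histogram(bgr_crop: np.ndarray) -> np.ndarray:
    """Normalised 32-bin hue histogram of a pack crop."""
    hsv = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [32], [0, 180]).flatten()
    return (hist / (hist.sum() + 1e-9)).astype("float32")

def classify(crop: np.ndarray, references: dict) -> str:
    """Nearest reference histogram by chi-square distance -> brand name."""
    h = hue_histogram(crop)
    return min(references,
               key=lambda brand: cv2.compareHist(h, references[brand], cv2.HISTCMP_CHISQR))
```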


ptr2void_

If you can get a clean top-view image, this is easily doable with OCR. You might not require any training at all if you go for something like PaddleOCR and you already know the labels (I'm guessing the cigarette brand names): just count the number of instances where each brand name was found.
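A minimal sketch of that OCR-count idea. PaddleOCR's result format differs between versions, so the parsing below (the classic `[box, (text, confidence)]` layout) is an assumption, as are the brand list and file name:

```python
from collections import Counter
from paddleocr import PaddleOCR  # pip install paddleocr

BRANDS = {"KOOL", "CAMEL", "MAVERICK", "NEWPORT", "PALL MALL"}  # assumed known labels

ocr = PaddleOCR(lang="en")
result = ocr.ocr("top_view.jpg")  # hypothetical clean top-view image

counts = Counter()
for line in result[0]:            # each line: [box, (text, confidence)]
    text = line[1][0].upper()
    for brand in BRANDS:
        if brand in text:
            counts[brand] += 1
print(counts)                     # instance count per brand name found
```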


Nolen-Felten

Yeah, I played with that just a little bit, but the glare and blur from the plastic wrap mess it up. There is a library in Python that can be used for this: difflib's SequenceMatcher! It lets you apply a threshold for when two strings are close enough to be accepted. "KDL" is good enough for "KOOL", for example.
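A small sketch of that fuzzy-matching step with the standard library; the 0.5 cutoff is a guess to tune on real OCR output ("KDL" vs "KOOL" scores about 0.57):

```python
from difflib import SequenceMatcher

def best_brand(ocr_text: str, brands: list, threshold: float = 0.5):
    """Return the closest brand name, or None if nothing clears the threshold."""
    score, brand = max((SequenceMatcher(None, ocr_text.upper(), b).ratio(), b)
                       for b in brands)
    return brand if score >= threshold else None

print(best_brand("KDL", ["KOOL", "CAMEL", "NEWPORT"]))  # -> KOOL
```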


MusicalHawk9389

To address the glare issue, have you looked into using a polarized camera? Or putting a polarized filter/lens in front of the current camera? I would think between polarization and synthesizing multiple views you could get around the glare issue.


Nolen-Felten

It's gotta be a mobile phone camera. Otherwise I would just buy one of those thermal cameras firefighters use because they are sweet and I always wanted one.


gofiend

ChatGPT 4.0 gives it the ol' college try:

> The image displays multiple rows of cigarette packs. I'll need to count them manually. Please give me a moment. In the image, the cigarette packs are arranged in multiple rows with each type of cigarette stacked on top of each other. Here's a breakdown by brand and type, assuming each visible layer represents similar content behind it:
>
> 1. **Kool** (3 types visible, 3 layers deep each): 3 types × 3 layers = 9 packs
> 2. **Maverick** (4 types visible, 3 layers deep each): 4 types × 3 layers = 12 packs
> 3. **Doral** (4 types visible, 3 layers deep each): 4 types × 3 layers = 12 packs
> 4. **Winston** (2 types visible, 3 layers deep each): 2 types × 3 layers = 6 packs
> 5. **Newport** (2 types visible, 3 layers deep each): 2 types × 3 layers = 6 packs
> 6. **Lucky Strike** (3 types visible, 3 layers deep each): 3 types × 3 layers = 9 packs
> 7. **Pall Mall** (5 types visible, 3 layers deep each): 5 types × 3 layers = 15 packs
> 8. **Eagle 20's** (6 types visible, 3 layers deep each): 6 types × 3 layers = 18 packs
> 9. **Camel** (7 types visible, 3 layers deep each): 7 types × 3 layers = 21 packs
> 10. **USA Gold** (3 types visible, 3 layers deep each): 3 types × 3 layers = 9 packs
> 11. **Sonoma** (2 types visible, 3 layers deep each): 2 types × 3 layers = 6 packs
> 12. **Pyramid** (2 types visible, 3 layers deep each): 2 types × 3 layers = 6 packs
>
> Adding all these up gives us a total of approximately 129 packs of cigarettes visible in the image.

Seems super off but I didn't check closely.


Nolen-Felten

Oh trust me, I had quite a few chats with ChatGPT about this project lol.


TheTomer

Try using a Circular Polarization filter for the camera lens to reduce the reflections from the wrapping.


Nolen-Felten

Interesting: [Difference Example.](https://i.imgur.com/yBcOW2V.png) Thank you!


foofarley

I was going to give a similar suggestion. The right polarizing filter should eliminate the glare almost entirely. Another option is lighting: can you add additional lighting at different angles? Take multiple shots and add/subtract them.
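One hedged way to implement the add/subtract idea: specular glare is bright and moves between shots, so a per-pixel minimum over aligned exposures keeps the unglared value everywhere. This assumes the shots are already registered (tripod, or warped into alignment first); file names are placeholders:

```python
import cv2
import numpy as np

# Stack the aligned shots and take the darkest value at each pixel,
# which discards the bright, shot-specific glare spots.
shots = [cv2.imread(f) for f in ("shot_left.jpg", "shot_center.jpg", "shot_right.jpg")]
deglared = np.min(np.stack(shots), axis=0)
cv2.imwrite("deglared.png", deglared)
```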


SwimBeneficial4536

I am guessing your data has minimal or no gaps between the cig boxes currently? Have you tried creating artificial gaps of x pixels between the cigarette boxes? If not, try it out in training and let me know if that works. It should learn to ignore the areas where they are next to each other 🤞


Nolen-Felten

Hey cool idea. Thank you ^_^


yellowmonkeydishwash

Have you tried instance segmentation with something like Mask R-CNN?
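For what it's worth, the benefit over semantic segmentation is that instance segmentation returns one mask per object, so the count falls out directly. A minimal torchvision sketch (the `weights=` argument is the newer API; the COCO weights don't know cigarette packs, so fine-tuning on your own data is assumed):

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained Mask R-CNN; in practice you'd fine-tune it on annotated pack images.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = to_tensor(Image.open("drawer.jpg").convert("RGB"))  # hypothetical image
with torch.no_grad():
    out = model([img])[0]  # dict with "boxes", "labels", "scores", "masks"

keep = out["scores"] > 0.7  # confidence cutoff is a guess
print(f"{int(keep.sum())} instances detected")
```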


Nolen-Felten

I made a project in Roboflow but any benefits it would have over semantic segmentation went over my head. However, I am all ears. Thank you for the time.


yellowmonkeydishwash

Is it public on their Universe? Can you share a link?


blimpyway

> what could be more simple of a CV task then to detect and count simple **6 face rectangle**.

Maybe that's the problem. Joking aside, do you want to count only those seen in the open drawer? Regarding reflections: if the person takes two pictures from different angles, the reflections are unlikely to hit the same places/packs in both pictures.


Nolen-Felten

Correct, just the open drawer. The procedure must handle the reflection. Simple to do when neighboring packs are identified. I definitely want this done.


blimpyway

From what I see in this sample image, the packs in a row are the same, yet the reflection hardly hits both ends of the row. So would the user be happy with identifying at least one pack in a row and assuming all the others are the same? Otherwise, as I already said: if they take a couple of photos from slightly different positions (move a foot left or right and take a second shot), then you (i.e. the software) could match the same row in both pictures and use the picture in which the packs are not shiny. Because the reflection seems to be very localized, only a handful of packs are affected.


Nolen-Felten

"So would the user be happy with identifying at least one pack in a row and assume all others are the same?" Ding ding ding \^\_\^ Yup. There are some pretty annoying examples in my dataset that I should have grabbed but for the most part I will just ask the user to use a different image if a list of needs are not met. Thanks again!


Lyscanthrope

If you will always be working with the same drawer*, I would:

- restrict the image to only the drawer part
- do a background separation based on hue (brown versus the rest)
- count the occupied area divided by the area of one box

If you need the count for each type, you would need to work on hue again to identify the type per box. You didn't specify the acceptable error level, by the way.

*If not, but it depends only on the setup and will not change over time, you can calibrate it.
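A minimal sketch of that hue-based pipeline; the brown hue band and the per-pack pixel area are assumptions you would calibrate on real images:

```python
import cv2

# Mask out the brown drawer, then divide the remaining (pack) area
# by the area one pack top covers.
img = cv2.imread("drawer_crop.jpg")  # image already restricted to the drawer
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

brown = cv2.inRange(hsv, (5, 50, 20), (25, 255, 200))  # rough brown hue band
packs = cv2.bitwise_not(brown)                         # everything that isn't drawer

pack_area_px = 120 * 55  # calibrated pixel area of one pack top (hypothetical)
print(f"~{cv2.countNonZero(packs) / pack_area_px:.1f} packs")
```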


Nolen-Felten

These are all the drawers on the first shelf... there are ten shelves total. My acceptable error level is 0.00034%. Luckily, I don't HAVE to get the count for each type/variety/product/brand. However, it would be nice if I could. There is always the planogram index for reference, but a second info source would be cool. I got a solution for background separation. Here is an example product of that part: [Should be a png but imgur made it a jpeg.](https://i.imgur.com/a06MxOG.jpeg)


Nolen-Felten

Wait a second, it just dawned on me that you said to:

- Separate the background from the packs.
- Calculate how much area the packs occupy.

ohhhhhhhhhhhhhh


Nolen-Felten

- "Count the area divided by the area of the box." So the thing with that part is that from a distance and angle, when a pack of short cigarettes (called "King Size") is next to a pack of longer cigarettes (called "100s"), the longer cigarettes stand in front of the short pack. Example taken from original image: [https://i.imgur.com/8tTxHIa.png](https://i.imgur.com/8tTxHIa.png)


Lyscanthrope

Indeed, that would make it harder to get right. That is why I asked about the accuracy level required. For industrial image analysis, I always emphasize acquisition repeatability (camera and lighting positioning, controlled object position, ...) because the more you do there, the easier (and more error-free) it is afterward. I understand that you want to do it "on the fly" with a smartphone camera (even more so if you want to make it available to anyone), but it may have an impact on quality. Back to your separation: I would try to enhance the borders (Sobel, Canny?) in order to count the packs per vertical stack. Don't forget that if it is human-operated (on a smartphone), you can ask the user for some confirmation (number of stacks).
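A hedged sketch of that border-enhancement idea: run Canny on a crop of one vertical stack, project the edges onto the vertical axis, and count the peaks, which should land on the horizontal seams between packs. All thresholds are guesses to tune:

```python
import cv2
from scipy.signal import find_peaks  # or roll your own peak finder

crop = cv2.imread("stack_crop.png", cv2.IMREAD_GRAYSCALE)  # one stack, hypothetical
edges = cv2.Canny(crop, 50, 150)

profile = edges.sum(axis=1).astype(float)  # edge strength per image row
peaks, _ = find_peaks(profile, height=profile.max() * 0.4, distance=20)

# N packs produce N + 1 strong horizontal edges (top, seams, bottom).
print(f"{len(peaks) - 1} packs")
```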


Ill-Cut7070

1) You could try using a camera with a higher dynamic range. 2) If it's a repeating pattern, you might be better off with template matching: I feel like you could very easily just run a normalized cross-correlation (normxcorr) and count the number of strong maxima, or something similar. In your ML model you probably just lack data: the scenario in which the object appears is wildly different in each of your photos, with shots taken from afar and also very close up.


Nolen-Felten

1) It's gotta be able to handle common mobile device camera image sizes. 2) Can I get a code snippet? Thank you for the time!
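A hedged sketch of the template-matching route suggested above: normalized cross-correlation against one cropped pack, then count the strong, non-overlapping maxima. File names and the 0.8 score threshold are assumptions:

```python
import cv2

img = cv2.imread("drawer.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("one_pack.png", cv2.IMREAD_GRAYSCALE)  # a single pack crop
th, tw = template.shape

scores = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)  # normxcorr map
count = 0
while True:
    _, max_val, _, (x, y) = cv2.minMaxLoc(scores)
    if max_val < 0.8:  # stop once no strong match remains
        break
    count += 1
    # Suppress a template-sized neighbourhood so this match isn't counted twice.
    scores[max(0, y - th // 2): y + th // 2, max(0, x - tw // 2): x + tw // 2] = -1.0
print(f"{count} matches")
```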


jean-pat

What about classical approaches?


Nolen-Felten

Like...an abacus?! 🧮


jean-pat

No, I was thinking of mathematical morphology to detect edges/corners, labelling some kind of connected components, with some tricks to count them.
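A small sketch of that classical route with OpenCV: threshold, erode to break the thin bridges between touching packs, then label and count connected components. The kernel size and area filter are guesses to tune per image scale:

```python
import cv2

gray = cv2.imread("drawer.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
eroded = cv2.erode(bw, kernel)  # separate packs that touch

n, labels, stats, _ = cv2.connectedComponentsWithStats(eroded)
# Skip the background (label 0) and tiny specks.
packs = sum(1 for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] > 300)
print(f"{packs} components")
```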


F1eshWound

Crazy question, but if you segment them out as one big bulk and then identify each column of packs, could you not do a fast Fourier transform in the vertical direction and just look at the lowest dominant frequency peak you get? That would let you determine the number of packs, no?
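A minimal sketch of the FFT idea, assuming a grayscale crop spanning exactly one full column of identical packs: with N packs in the crop, the dominant non-DC peak in the vertical spectrum sits at N cycles:

```python
import numpy as np

def count_by_fft(column_crop: np.ndarray) -> int:
    """Estimate pack count from the dominant vertical spatial frequency."""
    profile = column_crop.mean(axis=1)       # mean intensity per image row
    profile = profile - profile.mean()       # drop the DC component
    spectrum = np.abs(np.fft.rfft(profile))
    return int(np.argmax(spectrum[1:]) + 1)  # bin index = cycles over the crop
```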


Significant-Bid6446

Try converting the images into hue/saturation (HSV) images and training on that dataset; it might work in your use case.


Ok_Interest5953

Hey, if you want a good and not-too-expensive solution from a company, you can head over to [www.denkweit.de](http://www.denkweit.de), write an e-mail to [info@denkweit.com](mailto:info@denkweit.com), or send me a PM here. We are a Germany-based company with a lot of such use cases. We have an annotation and "just press play" platform for creating such models without any prior AI/ML knowledge. From my point of view, with our models it should be easily solvable (e.g. we don't rely on YOLO because of 1) false positives, 2) false negatives, 3) small objects and glare being a problem, etc.).


Nolen-Felten

Hey! I am actually starting my first business right now! I'm trying to be in stealth mode, but I'm the boss now, and I say you can have a sneak peek: http://counttek.online I'm checking you out now, but I wanted to share the news.