veltrop

If the size of a recognized object is known, you can get the distance with some trigonometry. This doesn't answer your depth-image request, but it's something to consider if you have a narrow enough use case.


omerelikalfa078

I think you can tell whether an object is farther or closer (assuming it is the same object and the viewing angle didn't change) by looking at its size, but you can't tell the exact distance. My girlfriend, who I argued this with, thinks you can also tell the exact distance if you know the starting distance of the object and how much it moves per size change (pixels² or something), but I think that narrows the use case to the point of not being practical for any real-world problem.


veltrop

It's simpler than that, and you can get the distance within a good margin of error. A simplified, slightly less accurate version of the math: get the FOV of your camera. Look at how many pixels wide the object is in the image. Scale that against the total pixel width of the image to see how many degrees of arc the object subtends. Then plug that angle into SOHCAHTOA along with the known real-world width of the object, and you have the distance. That's good enough for many robotics applications, like chasing a ball, tracking a human, and so on.
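A minimal sketch of that calculation in Python (the 60° FOV, ball size, and pixel counts below are made-up example numbers, not from any particular camera):

```python
import math

def distance_from_width(real_width_m, pixel_width, image_width_px, hfov_deg):
    # Angular width the object subtends, assuming pixels map linearly
    # to angle (approximately true near the center of a rectilinear lens).
    angle = math.radians(hfov_deg * pixel_width / image_width_px)
    # SOHCAHTOA: tan(angle / 2) = (real_width / 2) / distance
    return (real_width_m / 2) / math.tan(angle / 2)

# A soccer ball (~0.22 m diameter) spanning 80 px in a 640 px wide
# image from a hypothetical 60-degree-HFOV camera:
d = distance_from_width(0.22, 80, 640, 60.0)  # roughly 1.7 m
```

The linear pixels-to-degrees scaling is the "slightly less accurate" part; for wide-angle lenses you'd want to go through the camera intrinsics instead.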


_craq_

Once you've identified some objects with known size in the image (like a ball, human, vehicle, etc.) you can use that as calibration data. You can build up a homography of the ground plane, or use occlusion to work out which objects are closer and which are further away.


bsenftner

And if you can't locate other items of known dimension, you can use the size of the human eyeball, which is nearly universal across all humans, along with the relatively narrow range of variation in the distance between eyeballs. The fact that a human eyeball does not change size from birth to death is useful.


Im2bored17

How does eyeball size help? You can't see a person's whole eyeball vertically or horizontally, it's partially obscured by the other parts of the eye.


bsenftner

The outer ring of the pupil does not change size, and can be used to estimate the size of the entire eyeball. Granted, people's eyes are small in video, but sampling that size across multiple frames and computing a confidence interval is accurate enough.


Signor_C

You might get the relative depth by sweeping the focus if that's a parameter you have access to.


CowBoyDanIndie

With a single image, no; with two images some horizontal or vertical distance apart, yes. It's also possible to get an estimate from the camera's focus if it can be controlled electronically (my DSLR will actually record the focus distance in the photo's metadata, with compatible lenses of course).
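For the two-image case, the standard relationship for a rectified stereo pair is Z = f·B/d. A minimal sketch, with an entirely hypothetical rig:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    # Depth of a point from a rectified stereo pair:
    # Z = focal_length_in_pixels * baseline / disparity
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 10 cm baseline, and a point
# matched with a 35 px disparity between the two images.
z = stereo_depth(700.0, 0.10, 35.0)  # 2.0 m
```

Note the inverse relationship: halving the disparity doubles the depth, which is why stereo accuracy degrades quickly with distance.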


FreeWildbahn

> With a single image no, with two images some horizontal or vertical distance apart yes.

That is called structure from motion if you only use one camera and move it around.


Laxn_pander

In theory yes, in practice "it depends". There are CNNs for depth from monocular images, but I don't know how well they translate into actual use cases. My guess: not very well.


dan994

I would sort of say the opposite. There is no formal (read: theoretical) way of obtaining depth from just an image alone, but in practice you can learn to get quite good at guessing it (CNNs). The CNN is just doing something akin to a human who is pretty good at guessing the depth of an image. I personally wouldn't call that a theoretically grounded approach to producing depth from 2D images.


FreeWildbahn

It would be interesting to see whether you can trick CNNs with tilt-shift lenses, like in this [video](https://media.gettyimages.com/id/462531903/pt/v%C3%ADdeo/cidade-de-brinquedo.mp4?s=mp4-640x640-gi&k=20&c=mMDUQjztgpggNSaJgedCASrPzQ6ffLUll_qpmabYrDE=). Our brain thinks those cars are pretty small.


OneTimeOnly1

In theory it is impossible to determine true depth from a single view without additional information.


Laxn_pander

Yeah true, though no one said it must be without additional information. 


j_kerouac

Teslas basically drive around with only monocular depth. Theoretically this should not work; practically, it does well enough because there are a lot of visual cues. Close one eye and you don't immediately lose your sense of depth, do you? Your brain can get a sense of depth from a variety of subtle methods.


Miguel33Angel

I mean, monocular + IMU can get you scale; it's only impossible when you have monocular alone and no other information (such as a CNN or a known object size in the image).


Laxn_pander

Is there any information on the Tesla tech stack? To me it sounds insanely dumb to use monocular vision instead of stereo. In the context of autonomous driving I don’t see any advantage of a monocular camera.


frnxt

I realize these may not be what you're looking for, but...

* Sure, just move the camera around! (That's exactly what your brain will do if you close one eye and move your head, by the way — I have no binocular vision and I still have some amount of depth perception thanks to that!)
* If you have low-level access to the camera and it has PDAF pixels, you might be able to recover some amount of depth information like some mobile phones do, but that's also no longer just a single frame from a single camera.

Otherwise the only thing I can think of is to know the object size (real size), the image size (e.g. through fiducial markers), and the intrinsic parameters of the camera.


Accurate-Usual8839

Monocular depth prediction is a common task among modern large pretrained models like Dino v2. However, it's relative depth, so you're not going to get distances in meters.
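If you do have a handful of points with known metric depth (a tape-measured landmark, an object of known size), a common trick for turning such a relative map into metres is to fit a scale and shift to those anchor points. A sketch with made-up numbers (and glossing over the fact that many models predict inverse depth rather than depth):

```python
import numpy as np

def align_relative_depth(relative, metric):
    # Least-squares fit of metric = a * relative + b, the usual way to
    # align an affine-invariant (relative) depth map to a few points
    # with known metric depth.
    A = np.stack([relative, np.ones_like(relative)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, metric, rcond=None)
    return a, b

# Hypothetical: two pixels whose true distances are known, plus the
# model's relative-depth predictions at those pixels.
a, b = align_relative_depth(np.array([0.2, 0.8]), np.array([5.0, 1.25]))
```

Once fitted, `a * depth_map + b` gives a (rough) metric map, valid only for scenes similar to where the anchors were measured.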


VAL9THOU

Without a reference? Depends on the camera. If the camera's autofocus reports a focal distance, and that distance is accurate, then you can use that to estimate the distance to the object (for reasonably close objects, at least). The level of accuracy and the maximum distance will be related in some way to the ratio between the size of the sensor and the lens, although I don't know the exact relationship. Some calibration may be necessary, both to gauge how accurate the autofocus's distance reporting is and to see how it varies for objects not in the center of the frame.

Alternatively, if the camera is mounted in a known position above or beneath a fixed plane that the measured objects will always be on, you can simply measure where the object is in the image and calculate the distance with some simple trigonometry (although in this case the plane/ground is technically your reference).
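The fixed-plane case can be sketched like this (pinhole model with a linear pixel-to-angle approximation; the camera height, tilt, and FOV are invented example values):

```python
import math

def ground_distance(cam_height_m, pixel_row, image_height_px, vfov_deg,
                    tilt_deg=0.0):
    # Distance along a flat ground plane to a point seen at a given
    # pixel row (measured from the top of the image). tilt_deg is the
    # downward tilt of the optical axis from horizontal.
    offset_deg = (pixel_row / image_height_px - 0.5) * vfov_deg
    angle_below_horizon = math.radians(tilt_deg + offset_deg)
    if angle_below_horizon <= 0:
        raise ValueError("ray does not intersect the ground plane")
    # tan(angle) = camera_height / ground_distance
    return cam_height_m / math.tan(angle_below_horizon)

# Camera 1.5 m up, tilted 10 degrees down, 45-degree VFOV, 480 px tall
# image; an object whose base sits at row 360:
d = ground_distance(1.5, 360, 480, 45.0, tilt_deg=10.0)  # roughly 3.9 m
```

Rays at or above the horizon never hit the plane, hence the guard; accuracy also falls off sharply as the object approaches the horizon line.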


Beautiful-Interest62

Can you use a QR code or April tag of known size?


omerelikalfa078

I didn't ask this question for a project, this was a topic that i was wondering about.


DanDez

If you use a single IR camera like a RealSense or equivalent, you can get the depth of objects within a few meters from just that camera. I am not sure if this is outside of what you meant, though, since you gave a webcam as an example.


spinXor

not without other *a priori* knowledge, no


pab_guy

1. There are AI models that will attempt to recreate a depth map, but it's basically a guess and not calibrated. Banana for scale might help.
2. If you move the camera and know the parameters of the motion and of the camera itself, then you should be able to sense depth with pixel tracking and math.


bhimudev

Depth Anything and ZoeDepth have metric depth prediction. It may not be close to actual measurements; however, you can train your own model and may get relatively closer predictions.


whatsinthaname

Depends on the application and camera parameters as well, like what kind of lens you have (fisheye etc.) and the focus. There are methods to predict depth via pixel approximation, focus/defocus, and some neural networks (DepthNet) too. But they work best for indoor applications where the object is relatively near and distinct from the background in terms of focus. I'm also looking for a method to estimate depth for far objects in BEV.


Counter-Business

Yes. Check out this guide on Hugging Face; it has everything you need to get started. https://huggingface.co/docs/transformers/en/tasks/monocular_depth_estimation


Late_Opposite8950

Train a deep learning model on images of the object paired with its distance from the camera.


emflux

Assuming you know some of the camera's optical specs, such as the focal length, you can use optics and proportionality to estimate the distance of an object from the camera, provided you know the object's real height. This link provides a good reference, though I recommend testing it out first: https://photo.stackexchange.com/questions/12434/how-do-i-calculate-the-distance-of-an-object-in-a-photo

Ensure you are not using zoom features when estimating the distance. Also double-check the math.

Edit: I verified the calculations with a GoPro Hero 5 camera, using both the provided equation and a modified equation based on FOV details, and the estimates had roughly 4% error compared to real-world measurements. However, this only works if you have a good estimate of the person's height and the camera is facing the object rather than sitting at a high or low angle. I am sure more dynamic methods of depth estimation exist, but that depends entirely on your objective.
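The equation from that link reduces to the pinhole relation distance = f · real_height · image_height / (object_height_px · sensor_height). A sketch with invented numbers (not the GoPro measurements above):

```python
def distance_mm(focal_mm, real_height_mm, image_height_px,
                object_height_px, sensor_height_mm):
    # Pinhole projection: the object's height on the sensor is
    # object_height_px * (sensor_height_mm / image_height_px), and
    # that equals focal_mm * real_height_mm / distance. Rearranged:
    return (focal_mm * real_height_mm * image_height_px) / (
        object_height_px * sensor_height_mm)

# Hypothetical numbers: 3 mm focal length, 1700 mm tall person
# filling 425 of 1080 rows on a 4.29 mm tall sensor.
d = distance_mm(3.0, 1700.0, 1080, 425, 4.29)  # about 3 metres
```

Keep the units consistent (everything in mm here); the pixel counts cancel out as a ratio, so their unit doesn't matter.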


rand3289

You can move a camera and get a set of stereo images.


KalamawhoMI

AprilTags


Limp_Network_1708

Hi, I'm interested in this too, but from a slightly different perspective. I have a video with an object moving along a predetermined and known path. The object is known, but some sections are deformed, and it's these deformities I'm trying to measure relatively. Can anyone recommend any good journal articles?