camera.py
import cv2
import numpy as np
from visualize import reconstruct
'''
The goal of camera.py is to test concepts of monocular camera-based mapping.
For a monocular camera system, 3D reconstruction is tricky: since there is only
one perspective, the robot must rely on its own motion (egomotion) to get 3D
data.

How it works:
camera.py records and stores images as the "robot" moves. Each time a frame is
captured, camera.py computes a 3D reconstruction of the scene using previously
seen images. camera.py also attempts to localize the robot by returning
information about its movement (epipolar motion) between frames. The basics of
epipolar geometry are a solved problem; the challenge is building an efficient
implementation.

Big questions to answer:
Which images are used for 3D reconstruction? Is reconstruction done every frame?
Frame-to-frame reconstruction seems like a bad solution because the errors
accumulate. It is a good idea to refer to images "long in the past" and compute
the geometry against those. However, the extreme of that idea is also a bad
solution: we could compute a 3D reconstruction from the current frame to every
previous frame, which would give maximum information about the environment, but
it would also crash within seconds because of the time and memory cost of
storing, and then doing online epipolar math on, a set of image pairs that grows
with every frame ever captured.

A solution is somewhere in between: the algorithm could choose to compute only
after a set "period" of frames (a minimal sketch of this gating appears right
after this docstring). This has a few advantages. The robot can accrue
significant motion before it tries to do 3D reconstruction: larger optical flow
provides a larger virtual baseline, which gives better range information. It
also gives the computer time to finish the epipolar math, which can be slow.
The downside is that the period becomes a tunable hyperparameter that creates
its own problems. How far should the robot go before computing? What if the
robot makes a sudden turn that puts the previous image completely out of view?
Then no correspondences can be computed and the robot loses its localization.
To maintain high-accuracy state estimates, the robot must therefore constrain
its movements or keep the period very low. Even then, this approach is still
drifty: errors accumulate over time.

Another possible solution is to be deliberate about which images are used for
3D reconstruction. The previous frame might not be the best image to compare
against. Instead, the robot could store a set of "key" images and pick whichever
one is most relevant (a sketch of this idea follows pickImage() below). This
helps correct for drift in the estimates: if the robot can still match features
found in the very first image, for example, it might eliminate drift entirely
(up to image-processing error).
'''
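
# A minimal sketch of the "period" idea from the docstring: only trigger a
# reconstruction every RECONSTRUCTION_PERIOD frames so the robot can accrue a
# baseline first. The constant name and value are illustrative assumptions, not
# part of the original design.
RECONSTRUCTION_PERIOD = 30  # frames between reconstruction attempts (tunable)

def shouldReconstruct(frameIndex, period=RECONSTRUCTION_PERIOD):
    # Skip frame 0 (no motion yet), then reconstruct once every full period.
    return frameIndex > 0 and frameIndex % period == 0
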
# Cache of previously seen frames, stacked along the last axis. It starts
# empty; frames are meant to be appended as the robot moves.
imageCache = np.zeros((0, 0, 0))

def pickImage():
    # Placeholder policy: always compare against the oldest cached frame.
    return imageCache[:, :, 0]
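
# Hedged sketch of the "key image" idea from the docstring: keep a small list
# of keyframes and pick the one that shares the most ORB features with the
# current frame. The names, limit, and scoring below are assumptions made for
# illustration, not part of the original design.
KEYFRAME_LIMIT = 10
keyFrames = []

def addKeyFrameSketch(frame):
    # Bound memory by keeping only the most recent KEYFRAME_LIMIT key images.
    keyFrames.append(frame)
    if len(keyFrames) > KEYFRAME_LIMIT:
        keyFrames.pop(0)

def pickKeyImageSketch(currentFrame):
    # Score each stored keyframe by how many ORB descriptors it shares with the
    # current frame and return the best-overlapping one (None if no match).
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    _, desCur = orb.detectAndCompute(currentFrame, None)
    best, bestScore = None, -1
    for key in keyFrames:
        _, desKey = orb.detectAndCompute(key, None)
        if desCur is None or desKey is None:
            continue
        score = len(matcher.match(desKey, desCur))
        if score > bestScore:
            best, bestScore = key, score
    return best
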
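# The real reconstruct() is imported from visualize at the top of this file and
# its implementation is not shown here. The helper below is only a hedged
# sketch of what a two-view reconstruction with the same (pts1, pts2, K1, K2)
# signature could look like, using OpenCV's built-in epipolar routines; the
# function name is an assumption for illustration.
def twoViewReconstructSketch(pts1, pts2, K1, K2):
    # Estimate the essential matrix from point correspondences (RANSAC for robustness).
    E, mask = cv2.findEssentialMat(pts1, pts2, K1, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Recover the relative pose; the cheirality check picks the one valid (R, t) pair.
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K1, mask=mask)
    # Triangulate the correspondences into 3D points (up to an unknown scale).
    P1 = K1 @ np.hstack((np.eye(3), np.zeros((3, 1))))
    P2 = K2 @ np.hstack((R, t))
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    points3d = (pts4d[:3] / pts4d[3]).T
    return points3d, E
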
def retrievePose(E):
    # Decompose the essential matrix E = U * diag(1, 1, 0) * V^T.
    u, s, vt = np.linalg.svd(E)
    # The translation direction is the last column of U, known only up to sign
    # (and up to scale, which a monocular camera cannot recover).
    t = u[:, -1]
    # Two candidate rotations, built from +/-90 degree rotations about the
    # optical axis.
    possibleR = np.zeros((2, 3, 3))
    rotationPos90 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
    rotationNeg90 = rotationPos90.T
    validIndex = 0
    for i in range(4):
        if i // 2 == 1:
            rotation = rotationPos90
        else:
            rotation = rotationNeg90
        candidate = ((-1) ** i) * u @ rotation @ vt
        # Keep only proper rotations (det = +1); exactly two of the four
        # candidates qualify.
        if np.linalg.det(candidate) > 0:
            possibleR[validIndex, :, :] = candidate
            validIndex += 1
    return t, possibleR
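
# Hedged sketch: findCorrespondences() is only referenced in a TODO inside the
# main loop and is not defined in this file. One plausible implementation,
# shown purely for illustration, matches ORB features between the two images
# with a brute-force Hamming matcher and returns the matched pixel coordinates.
def findCorrespondencesSketch(img1, img2, maxMatches=200):
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return None, None  # not enough texture to match
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:maxMatches]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    return pts1, pts2
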
# Get a video feed from the laptop's camera
vc = cv2.VideoCapture(0) #0 -> index of camera
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False
cv2.namedWindow("Kevin's face")
# Load intrinsic matrix of the camera
# K = loadIntrinsic()
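# Hedged sketch: loadIntrinsic() is not defined in this file, so a placeholder
# intrinsic matrix is built here instead. The focal lengths and principal point
# below are made-up values; real ones come from calibrating the camera
# (e.g. with cv2.calibrateCamera).
fx, fy = 600.0, 600.0   # assumed focal lengths in pixels
cx, cy = 320.0, 240.0   # assumed principal point for a 640x480 feed
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])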
while rval:
    # TODO: Pick the image to be used for comparison
    # img1 = pickImage()
    # TODO: Find correspondences
    # (pts1, pts2) = findCorrespondences(img1, frame)
    # Do 3D reconstruction: get a set of 3D points and the essential matrix
    # (left commented out until the TODOs above provide pts1 and pts2).
    # points, E = reconstruct(pts1, pts2, K, K)
    # TODO: Cluster to find objects: want number of objects and distance to each
    # objects = cluster(points)
    # Get the pose change from the 3D reconstruction
    # (retrievePose returns the translation direction and candidate rotations).
    # ts, Rs = retrievePose(E)
    # TODO: Use odometry or some other metric to filter out one of the rotations
    # R, t = filterRT(Rs, ts, stateEstimate)
    # TODO: Update the state estimate of the robot
    # stateEstimate = fuse(stateEstimate, R, t)
    # TODO: Update the map
    # updateMap(map, objects)
    # Get a new frame, show it, and break on ESC
    rval, frame = vc.read()
    if not rval:
        break
    cv2.imshow("Kevin's face", frame)
    key = cv2.waitKey(20)
    if key == 27:  # 27 is the key code for escape
        break
cv2.destroyWindow("Kevin's face")
vc.release()