Interestingly, in image 2, the filtered data seems to be worse than the actual noisy data?
Sure, the large spikes in the sensor data were reduced - the blue line up north is a good example - but seemingly at the cost of the more accurate tracks. We do have some "ground truth" here, namely the roads on the map. If the source of the tracks is someone moving on a road (in a car etc.), it is safe to assume that the road is the most likely place to find them. In that image, it looks like we're seeing the tracks of some object moving along the road.
EDIT: But nice work anyway. I work a lot with noisy GPS data for vessels, where there are no roads - only shipping routes/paths - and increased GPS jamming in some areas makes prediction models more useful.
n4r9 12 hours ago [-]
Yeah, this sounds like a way to "smooth" the GPS trail to remove anomalies quickly, without paying attention to the road network.
The problem of snapping a noisy GPS trail to the road network is known as map-matching. Good map-matching algorithms tend to use hidden Markov models, which are sort of like discrete Kalman filters. The state of the model is something like "which road segment is the truck on", and the predictive step employs routing algorithms to calculate transition probabilities between states. This is a dynamic algorithm that can be done on the fly - i.e. as each GPS point comes in - but I'd be very reluctant to do it in postgres.
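To make the HMM idea concrete, here's a toy sketch (my own illustration, not any real map-matching library): states are road segments, emissions score how close each GPS fix lies to a segment, and transitions penalise routes that are expensive to drive between segments. The `dist` and `route_cost` callbacks stand in for real geometry and routing.

```python
import math

def viterbi(gps_points, segments, dist, route_cost, sigma=10.0, beta=0.5):
    """Return the most likely road segment per GPS point (toy sketch).

    In practice you'd work in log-space to avoid underflow on long tracks,
    and only consider segments near each point."""
    def emission(p, s):
        # Gaussian-ish likelihood of observing GPS point p from segment s
        return math.exp(-dist(p, s) ** 2 / (2 * sigma ** 2))

    def transition(a, b):
        # Cheaper routes between segments are more likely transitions
        return math.exp(-beta * route_cost(a, b))

    # probs[s] = (best score for a path ending in s, that path)
    probs = {s: (emission(gps_points[0], s), [s]) for s in segments}
    for p in gps_points[1:]:
        probs = {
            s: max(
                ((score * transition(prev, s) * emission(p, s), path + [s])
                 for prev, (score, path) in probs.items()),
                key=lambda t: t[0],
            )
            for s in segments
        }
    return max(probs.values(), key=lambda t: t[0])[1]
```

With segments reduced to points on a line and straight-line "routing", a drifting track snaps to the nearest plausible sequence of segments - and since each step only needs the previous step's scores, it can indeed run online as points arrive.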
foota 3 hours ago [-]
So apps like Google maps do this? I'm always surprised when it jumps between roads. Like... You knew I've been on this road for the last ten minutes, you think I'm going to teleport into the tunnel beneath me?
n4r9 2 hours ago [-]
I'm not sure about Maps to be honest, but that sort of glitch is a strong indicator that they're just snapping to the nearest current road rather than doing proper routing calculations.
My Toyota has a speed limit symbol on the dashboard which will occasionally show the speed of a slip-road going onto the motorway I'm already on. I'm guessing it's a similar phenomenon.
whilenot-dev 11 hours ago [-]
I share the confusion. It depends on the measuring intent I guess, and it'd have been nice to say something about that and include some kind of indicator for these outliers. Here's the thing in Google Maps: https://www.google.com/maps/@47.1745904,7.2745602,14z/data=!...
From looking at the company website[0] I'd assume the goal could've been to get a better estimate of the total distance travelled during tracking analysis? With that goal in mind, the error from the outliers was reduced significantly without causing too much disturbance to the accurate data. Nonetheless, feeding further measurements from the speedometer and odometer into the sensor fusion at certain intervals would make this filtering redundant and provide an even better estimate.
This is unfortunately limited to 2-dimensional states/measurements. In this case the covariance matrix is only 3 numbers, so the required linear algebra can easily be done in a loop. The generic Kalman filter handles arbitrary dimensions, but requires general matrix multiplication and inversion, which are not easy to implement in Postgres.
Still, 2d is a useful special case, and if it addresses the problem at hand, there's no need to overbuild. (Even the 1d Kalman filter, which often boils down to exponential smoothing, is a useful special case.)
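To illustrate the 1d point: with a random-walk state model and constant noise parameters, the whole filter collapses to a few scalar operations, and the gain converges to a constant - at which point the update is exactly exponential smoothing. A minimal sketch (parameter names `q`/`r` are the usual process/measurement noise variances):

```python
def kalman_1d(measurements, q=0.01, r=1.0):
    """1-D Kalman filter with a random-walk state model.

    q: process noise variance, r: measurement noise variance."""
    x, p = measurements[0], r    # initialise from the first measurement
    out = [x]
    for z in measurements[1:]:
        p = p + q                # predict: uncertainty grows
        k = p / (p + r)          # gain: how much to trust the measurement
        x = x + k * (z - x)      # update: same shape as exp. smoothing
        p = (1 - k) * p          # uncertainty shrinks after the update
        out.append(x)
    return out
```

No matrices, no inversions - which is why the 1d (and, with 3 covariance numbers, the 2d) case is so much more SQL-friendly than the general filter.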
fifilura 11 hours ago [-]
I'd imagine 90% of the Kalman filters out there are for 2 or maybe 3 dimensions, since the use case is mostly this: determining a position.
The filter fails when there is not a single "true" answer to aim for but many true answers. A position is clearly defined, as long as it is not quantum physics.
thekoma 9 hours ago [-]
Yeah. Using the Kalman filter just to determine the position from noisy position measurements really undercuts the capability of the filter to use system physics to estimate the true state.
In one of the most common applications of Kalman filters, autonomous robots (e.g., a robot vacuum or a commercial drone), the filters are around 9 to 12 dimensions.
em500 8 hours ago [-]
Right, in addition to the position you usually want the velocity, and sometimes also the acceleration, in all dimensions. More ambitious (or optimistic) practitioners could add more sensor measurements, like gyroscopes.
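That's where the 9 dimensions come from: 3-D position, velocity and acceleration. A constant-acceleration state transition makes the structure obvious (this is a generic textbook-style sketch, not any specific robot's filter):

```python
import numpy as np

def transition_matrix(dt):
    """9x9 constant-acceleration transition for a [pos, vel, acc] 3-D state.

    Block form: p' = p + v*dt + 0.5*a*dt^2;  v' = v + a*dt;  a' = a."""
    i = np.eye(3)
    z = np.zeros((3, 3))
    return np.block([
        [i, dt * i, 0.5 * dt**2 * i],
        [z, i,      dt * i],
        [z, z,      i],
    ])

# Example state: moving 1 m/s along x, accelerating at -9.8 m/s^2 in z
state = np.array([0, 0, 0,  1, 0, 0,  0, 0, -9.8])
f = transition_matrix(0.1)
state = f @ state  # one 0.1 s prediction step
```

The measurement side then fuses whichever sensors observe parts of this state - GPS sees position, an IMU sees acceleration, and so on.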
fifilura 5 hours ago [-]
You are right of course and I was out of my depth. I wonder if the vector types now being added to databases for ML/AI stuff could help with this.
tech_ken 7 hours ago [-]
Wow this is extremely cool/impressive, but if my manager asked me to implement this I'd quit lol. The "state" headaches alone seem like a nightmare, never mind all the wacky linear algebra you're going to have to hand-roll. (Like, does Postgres even have a matrix type?? Did you have to implement matrix inversion in SQL from scratch?? I get nauseous just thinking about it.)
edit: I guess in 2D a lot of this becomes simpler than in general high-dimensions.
fifilura 12 hours ago [-]
I have done this with AWS Athena. At the end of the day a Kalman filter is just a number of multiplications and divisions.
My version would calculate one step at a time, so it is a bit simplified (that was a requirement: processing one measurement of incoming data daily). And it was only in one dimension (here it is two).
For the offline version (calculating many steps in a chunk), I'd imagine I'd use the array functions in Athena. But it may very well be possible to recreate it using window functions. The state is just more columns, after all.
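To make the "state is just more columns" point concrete: the chunked filter is a left fold over the measurements, where the carried accumulator (estimate + variance) is exactly the extra columns a window-function formulation would thread through. A sketch of that framing (my own illustration, in Python rather than SQL):

```python
from functools import reduce

def step(state, z, q=0.01, r=1.0):
    """One Kalman update: carries (estimate, variance) - the 'extra columns'."""
    x, p = state
    p += q                      # predict
    k = p / (p + r)             # gain
    return (x + k * (z - x), (1 - k) * p)  # update

def run_chunk(measurements, init=(0.0, 1.0)):
    """Offline version: fold the whole chunk through the filter."""
    return reduce(step, measurements, init)

final_estimate, final_var = run_chunk([1.0, 1.2, 0.9, 1.1])
```

The awkward part in SQL is that window functions don't naturally express this kind of stateful fold - each step depends on the previous step's output - which is why a recursive CTE or array-based approach tends to be the workaround.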
[0]: https://traconiq.ch/