Calculon part 2: costing profiles, performance, and maneuvers
- 8 minutes read - 1592 wordsPart 1 covered the basics: six crates, Valhalla’s tile format, a driving cost model, bidirectional A*, many-to-many matrix, and an Axum HTTP server. Everything worked for the driving case on the Monaco dataset.
Since then, the engine gained bicycle and pedestrian costing, memory-mapped tiles, a comparison webapp, turn-by-turn directions in 34 languages, and enough performance work to make it usable on France-scale data. This is what changed.
Bicycle and pedestrian costing
Two new cost models implement the DynamicCost trait alongside the existing AutoCost.
Bicycle defaults to 18 kph (hybrid bike). The model factors in grade, road stress, surface quality, and turn costs, all lighter than auto because a cyclist doesn’t mind a left turn as much as a driver crossing oncoming traffic.
Grade uses a Tobler-style speed table. Flat terrain is 1.0x; steep downhill goes up to 2.2x; steep uphill drops to 0.3x. Hill avoidance is a separate preference (use_hills) that applies an additional penalty on grades regardless of the speed adjustment.
The interesting part is road stress, how comfortable a road is to cycle on:
fn accommodation_factor(&self, edge: &DirectedEdge) -> f32 {
let use_type = edge.use_type();
if use_type == Use::Cycleway as u32 || use_type == Use::MountainBike as u32 {
return 0.0;
}
match edge.cycle_lane() {
CycleLane::Separated => 0.0,
CycleLane::Dedicated => 0.1,
CycleLane::Shared => 0.3,
CycleLane::None => {
if edge.bike_network() { 0.2 }
else if edge.shoulder() { 0.5 }
else { 1.0 }
}
}
}A separated cycle lane costs nothing extra. A shared lane adds 0.3. No accommodation at all on a road that’s not part of any bike network costs 1.0, scaled by the use_roads preference. Combined with ROAD_CLASS_FACTOR that penalizes trunk roads (0.8) vs residential (0.0), the engine steers cyclists toward quieter streets with bike infrastructure.
Pedestrian defaults to 5.1 kph. It adds SAC hiking scale (7-tier trail difficulty), a step penalty for stairs, lighting preference, and use-type bonuses for footways and sidewalks:
const GRADE_SPEED_FACTOR: [f32; 16] = [
1.5, 1.4, 1.3, 1.15, 1.05, 1.0, // downhill: slightly faster
1.0, 1.0, // flat
0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.4, 0.35, // uphill: slower
];Both profiles make a critical architectural decision: they disable hierarchy entirely. The A* hierarchy transitions work by “jumping up” to highways as the search moves away from endpoints. That makes sense for driving: a 200km route shouldn’t expand residential streets in the middle. But for cycling and walking, the shortcut edges that power hierarchy are built for driving and would produce wrong routes. So bicycle and pedestrian stay on level 2 where all edges are accessible, trading search speed for correctness.
Memory-mapped tiles
The tile reader now uses mmap(2) instead of heap-allocated reads:
enum TileData {
Owned(Vec<u8>), // Tests
Mapped(Mmap), // Production
}
pub fn from_mmap(mmap: Mmap) -> Result<Self, TileError> {
let base_ll = Self::validate(&mmap)?;
Ok(GraphTile { data: TileData::Mapped(mmap), base_ll })
}In the graph reader:
let file = std::fs::File::open(&path).map_err(TileError::Io)?;
let mmap = unsafe { Mmap::map(&file) }.map_err(TileError::Io)?;
let tile = Arc::new(GraphTile::from_mmap(mmap)?);No heap allocation, no copy. The OS page cache handles eviction. The old application-level LRU cache became unnecessary. Tiles are cheap Arc pointers to mapped pages.
Thread-local tile cache
Routing makes thousands of get_tile calls per search. Even a RwLock on a HashMap adds up. The solution is a three-level cache:
thread_local! {
static TL_CACHE: RefCell<FxHashMap<u32, Arc<GraphTile>>> =
RefCell::new(FxHashMap::default());
}
pub fn get_tile(&self, id: &GraphId) -> Result<Arc<GraphTile>, TileError> {
let tile_key = id.tile_value();
// Level 1: thread-local (no synchronization, ~20ns)
let hit = TL_CACHE.with(|c| c.borrow().get(&tile_key).cloned());
if let Some(tile) = hit { return Ok(tile); }
// Level 2: shared cache (RwLock, parallel readers)
// ... insert into thread-local on hit
// Level 3: mmap from disk, populate both caches
// ...
}The hot path (thread-local hit) is a plain HashMap lookup in a RefCell. No lock, no contention, no Arc::clone from the shared cache. FxHash (from rustc_hash) replaces std::HashMap everywhere, optimized for integer keys that don’t need cryptographic hashing.
Distance approximator
Edge snapping calls distance computations in a tight loop, projecting a point onto every nearby road segment. Haversine uses sin, cos, asin, sqrt per call. For relative comparisons within a small area, that’s overkill.
The DistanceApproximator pre-computes cos(lat) once and then uses only multiplications:
pub struct DistanceApproximator {
lng_scale: f64,
meters_per_lng_degree: f64,
}
impl DistanceApproximator {
pub fn new(ll: &PointLL) -> Self {
let lng_scale = Self::lng_scale_per_lat(ll.lat);
Self {
lng_scale,
meters_per_lng_degree: lng_scale * METERS_PER_DEGREE_LAT,
}
}
pub fn distance_squared(&self, ll: &PointLL) -> f64 {
let lat_m = (ll.lat - self.center_lat) * METERS_PER_DEGREE_LAT;
let lng_m = (ll.lng - self.center_lng) * self.meters_per_lng_degree;
lat_m * lat_m + lng_m * lng_m
}
}Construct one for the search point, reuse it across all segment projections. 26% faster in edge snapping benchmarks. The accuracy loss is negligible within a few kilometers, well within the range of any snap search.
Distance-adaptive hierarchy
Short urban routes were sometimes jumping to highways unnecessarily. A 2km trip through a city shouldn’t consider the nearby motorway just because the hierarchy said “expand to level 0 after 5,000 meters.”
The fix scales hierarchy limits proportionally to route distance:
fn adjust_limits_for_distance(limits: &mut [HierarchyLimits], distance: f32) {
// 5km route: expand_within_dist = 2500m, max_up_transitions = 11
// 1km route: expand_within_dist = 2000m, max_up_transitions = 5
// 20km+ route: no adjustment
}A 5km route caps level-2 expansion at 2,500m and allows only 11 upward transitions (from 100 default). A 1km route drops to 5 transitions with a 2,000m cap. Routes above 15km keep the defaults. The search still finds the optimal path, it just doesn’t waste time on irrelevant highway segments for short trips.
A* heuristics in the cost matrix
The matrix previously used pure Dijkstra for each source/target pair: expand uniformly in all directions. Now each gets an A* heuristic aimed at its closest counterpart:
let closest = targets.iter()
.min_by(|a, b| {
let da = src.distance(a);
let db = src.distance(b);
da.partial_cmp(&db).unwrap_or(std::cmp::Ordering::Equal)
})
.unwrap_or(&targets[0]);
h.init(closest, cost_factor);
self.heuristics_forward.push(h);This steers each search toward likely connection points rather than expanding uniformly. Combined with Valhalla-style hierarchy and a path_dist_threshold set to 2x the max source-target distance, the matrix handles 9x9 urban grids without exploring the entire region.
Turn-by-turn maneuvers
The first version returned routes with “Start” and “Arrive” stubs. Now there’s a proper maneuver builder that groups consecutive edges into instructions.
The builder triggers a new maneuver on name changes, significant turns, road type transitions (roundabouts, ramps, ferries):
fn should_start_new_maneuver(&self, in_roundabout: bool) -> bool {
self.entering_roundabout
|| self.exiting_roundabout
|| self.entering_ramp
|| self.exiting_ramp
|| self.entering_ferry
|| self.exiting_ferry
|| self.turn_type == TurnType::Reverse
|| (self.name_changed && !in_roundabout)
|| (self.turn_type != TurnType::Straight && self.name_changed)
}Narrative generation uses rust-i18n with 34 locale files. Each maneuver type maps to a template:
start.5: "Drive %{cardinal_direction} on %{street_names}."
turn.1: "Turn %{relative_direction} onto %{street_names}."
enter_roundabout.2: "Enter the roundabout and take the %{ordinal_value} exit onto %{roundabout_exit_street_names}."
destination.0: "You have arrived at your destination."Roundabouts track exit counts. Cardinal directions derive from heading. Ordinals support up to the 10th exit. The locale files include French, German, Spanish, Italian, and (yes) pirate English.
Edge snapping improvements
Snapping a coordinate to the road network was one of the weaker points. Two fixes:
Multi-tile search. When a point falls near a tile boundary, the search now checks neighboring tiles in all 8 directions:
let near_south = point.lat - bbox.min_pt.lat < TILE_BOUNDARY_MARGIN;
let near_north = bbox.max_pt.lat - point.lat < TILE_BOUNDARY_MARGIN;
// ... check all 8 directions
for (i, &(dlng, dlat)) in offsets.iter().enumerate() {
if !flags[i] { continue; }
let neighbor_id = TileHierarchy::get_graph_id(&neighbor_pt, level);
if let Ok(tile) = reader.get_tile(&neighbor_id) {
let candidates = tile.find_closest_edges(point, access_mask, ...);
all.extend(candidates);
}
}Smarter candidate pruning. Post-collection, candidates beyond min_distance * 3.0 (minimum 25m threshold) are pruned, then select_nth_unstable_by does an O(N) partition instead of a full sort. Only the top K candidates get sorted.
All My Circuits
A Lit + TypeScript + MapLibre GL webapp for side-by-side comparison with Valhalla. Route and matrix modes, all three costing profiles, Valhalla comparison with percentage diffs on time and distance, resizable sidebar, URL state persistence, clickable matrix routes in verbose mode, and links to Google Maps and HERE for sanity checking.
The debug mode loads per-edge data (speed, density, road class, costs) so you can see exactly why the engine chose one road over another. The inspect endpoint visualizes tile-level node and edge data for when things really go wrong.
Criterion benchmarks
Benchmarks run against France tiles across four crates:
- Tiles: cache hit (~20ns thread-local), cold load (mmap), edge traversal
- Costing: single edge cost, transition cost, access check
- Routing: 5km urban, 55km regional, 240km long-distance
- Matrix: 1x1, 4x4 coastal, 9x9 urban
A separate bench runner does HTTP-level comparison against Valhalla, reporting p50/p95/p99 latencies and route cost/distance diffs. Useful for regression testing and for validating that the cost model matches Valhalla’s output within acceptable margins.
Where it stands
The engine handles France-scale data with three costing profiles. Routes match Valhalla on the scenarios I’ve tested. Turn-by-turn directions work in 34 languages. The matrix scales to 9x9 with A* heuristics. Performance is good enough for a single-server deployment.
What’s new since part 1:
- Bicycle and pedestrian cost models
- Memory-mapped tiles with thread-local cache
- FxHash everywhere, DistanceApproximator for snapping (26% faster)
- Distance-adaptive hierarchy limits
- A* heuristics in the cost matrix
- Turn-by-turn maneuvers with localized narrative in 34 languages
- Multi-tile edge snapping
- Criterion benchmarks and a Valhalla comparison webapp
What’s still ahead:
- Isochrones: reachability polygons from a point within a time/distance budget
- Contraction hierarchies: the hierarchy transitions help but true CH preprocessing would unlock cross-Europe routes at acceptable latency
- Via points: multi-leg routes with intermediate stops