PR: trap-configmap-watcher
Branch: jessegoodier:opencost:trap-configmap-watcher
Comparison: https://github.com/opencost/opencost/compare/develop...jessegoodier:opencost:trap-configmap-watcher
Reviewer: Claude
Date: 2026-01-21
This PR adds proactive RBAC validation to the ConfigMap watcher component and includes comprehensive documentation on failure modes in OpenCost. The changes convert silent RBAC failures into explicit, actionable errors that help users diagnose permission issues.
Recommendation: Approve with minor changes
| File | Change Type | Lines Added | Lines Removed |
|---|---|---|---|
pkg/util/watcher/configwatchers.go |
Modified | ~35 | ~2 |
docs/AREAS_WE_SHOULD_FAIL_FAST.md |
Added | 567 | 0 |
docs/RBAC_VALIDATION_ANALYSIS.md |
Added | 389 | 0 |
pkg/util/watcher/configwatchers.gofmt, apierrorsNewConfigMapWatchers(): Tests ConfigMap list permission at initializationWatch(): Distinguishes between Forbidden, NotFound, and other errors// In NewConfigMapWatchers() - NEW CODE
if kubeClientset != nil {
// Validate RBAC permissions before starting the watcher
// This prevents silent failures and provides clear error messages
_, err := kubeClientset.CoreV1().ConfigMaps(namespace).List(
context.Background(),
metav1.ListOptions{Limit: 1})
if err != nil {
if apierrors.IsForbidden(err) {
panic(fmt.Sprintf(
"FATAL: Missing RBAC permissions to list ConfigMaps in namespace '%s'.\n"+
"The service account does not have the required permissions.\n"+
"Please ensure the service account has a Role/ClusterRole with "+
"'get', 'list', and 'watch' permissions for ConfigMaps.\n"+
"Error: %v",
namespace, err))
}
// For other errors (e.g., API server unreachable), log but don't panic
log.Warnf("Unable to verify ConfigMap access in namespace '%s': %v", namespace, err)
}
// ... rest of initialization
}
// In Watch() - MODIFIED CODE
for cw := range cmw.watchers {
configs, err := cmw.kubeClientset.CoreV1().ConfigMaps(cmw.namespace).Get(
context.Background(), cw, metav1.GetOptions{})
if err != nil {
if apierrors.IsForbidden(err) {
panic(fmt.Sprintf(
"FATAL: Missing RBAC permissions to access ConfigMap '%s' in namespace '%s'.\n"+
"The service account does not have the required permissions.\n"+
"Please ensure the service account has a Role/ClusterRole with "+
"'get', 'list', and 'watch' permissions for ConfigMaps.\n"+
"Error: %v",
cw, cmw.namespace, err))
}
if apierrors.IsNotFound(err) {
log.Infof("ConfigMap %s not found in namespace %s at install time, "+
"using existing configs", cw, cmw.namespace)
} else {
log.Warnf("Error accessing ConfigMap %s in namespace %s: %v",
cw, cmw.namespace, err)
}
} else {
log.Infof("Found configmap %s, watching...", configs.Name)
watchConfigFunc(configs)
}
}
| Aspect | Assessment |
|---|---|
| Problem identification | Correctly identifies that silent RBAC failures cause user confusion |
| Error classification | Properly uses apierrors.IsForbidden() and apierrors.IsNotFound() |
| Error messages | Clear, actionable messages with namespace context and remediation steps |
| Performance | Uses Limit: 1 to minimize permission check overhead |
| Graceful degradation | Only panics on permission issues; logs warnings for transient errors |
Description: The same Forbidden handling pattern is repeated in both NewConfigMapWatchers() and Watch().
Current Code:
// Appears twice with slight variations
if apierrors.IsForbidden(err) {
panic(fmt.Sprintf(
"FATAL: Missing RBAC permissions to list ConfigMaps in namespace '%s'.\n"+
// ...
))
}
Recommendation: Extract into a helper function:
func panicOnForbidden(err error, resource, namespace string) {
if apierrors.IsForbidden(err) {
log.Fatalf(
"FATAL: Missing RBAC permissions to access %s in namespace '%s'.\n"+
"The service account does not have the required permissions.\n"+
"Please ensure the service account has a Role/ClusterRole with "+
"'get', 'list', and 'watch' permissions for %s.\n"+
"Error: %v",
resource, namespace, resource, err)
}
}
Description: The PR uses panic() for fatal errors, but the existing codebase uses log.Fatalf() for similar situations.
Existing pattern in codebase (pkg/costmodel/router.go):
kubeClientset, err := kubeconfig.LoadKubeClient("")
if err != nil {
log.Fatalf("Failed to build Kubernetes client: %s", err.Error())
}
New pattern in this PR:
panic(fmt.Sprintf("FATAL: Missing RBAC permissions..."))
Recommendation: Use log.Fatalf() for consistency:
if apierrors.IsForbidden(err) {
log.Fatalf(
"FATAL: Missing RBAC permissions to list ConfigMaps in namespace '%s'.\n"+
"The service account does not have the required permissions.\n"+
"Please ensure the service account has a Role/ClusterRole with "+
"'get', 'list', and 'watch' permissions for ConfigMaps.\n"+
"Error: %v",
namespace, err)
}
Benefits:
Description: The RBAC check in NewConfigMapWatchers() uses List, but Watch() uses Get. If RBAC is configured to allow list but not get (unusual but possible), the startup check would pass but Watch() would fail.
Recommendation: Document that both get and list permissions are required, or add a note in the error message.
Description: No unit tests are visible in the diff for the new error handling paths.
Recommendation: Add tests covering:
Example test structure:
func TestNewConfigMapWatchers_ForbiddenError(t *testing.T) {
// Mock client that returns Forbidden on List
// Assert that panic/fatal occurs with expected message
}
func TestWatch_NotFoundError(t *testing.T) {
// Mock client that returns NotFound on Get
// Assert that no panic occurs and info is logged
}
Description: The initial check validates permissions at startup, but if permissions are revoked while OpenCost is running, the underlying watchController will still fail silently via the Kubernetes reflector.
Note: This is acknowledged in the documentation but not addressed in the code. This is acceptable for the scope of this PR but should be tracked for future work.
docs/AREAS_WE_SHOULD_FAIL_FAST.md (567 lines)Content: Comprehensive analysis of areas where OpenCost should fail fast rather than continue with degraded functionality.
Strengths:
Concerns:
docs/RBAC_VALIDATION_ANALYSIS.md (389 lines)Content: Detailed analysis of RBAC permission validation gaps with implementation recommendations.
Strengths:
Concerns:
| Category | Rating | Notes |
|---|---|---|
| Correctness | ✅ Good | Properly detects and handles RBAC errors |
| Error Messages | ✅ Good | Clear, actionable, includes context |
| Code Quality | ⚠️ Fair | Some duplication; inconsistent error pattern |
| Consistency | ⚠️ Fair | Uses panic instead of log.Fatalf |
| Test Coverage | ❌ Missing | No tests for new error paths |
| Documentation | ⚠️ Mixed | Thorough but possibly over-scoped |
panic() with log.Fatalf() for consistency with existing codebase patternsget + list + watch all being requiredThis PR addresses a real user pain point: silent RBAC failures that make troubleshooting difficult. The core approach of validating permissions at startup and failing fast with clear error messages is sound.
The implementation is functional but would benefit from minor refinements for consistency with the existing codebase. The documentation is comprehensive but may be better suited for a separate tracking mechanism given its scope.
Final Recommendation: Approve with the changes noted above.
Report generated by Claude Code Review