docs: add sharded Prometheus best practices and warnings for global e… (#3244)

Signed-off-by: shiavm006 <shivammittal42006@gmail.com>
Co-authored-by: Matt Bolt <mbolt35@gmail.com>
Shivam Mittal, 9 months ago
parent
commit
6343e2c9a9
3 changed files with 23 additions and 1 deletion
  1. README.md (+3, -0)
  2. modules/prometheus-source/README.md (+17, -1)
  3. modules/prometheus-source/pkg/env/promenv.go (+3, -0)

+ 3 - 0
README.md

@@ -33,6 +33,9 @@ You can deploy OpenCost on any Kubernetes 1.20+ cluster in a matter of minutes,
 
 Visit the full documentation for [recommended installation options](https://www.opencost.io/docs/installation/install).
 
+> **Note for sharded Prometheus users:**
+> If you run Prometheus in a sharded (HA) setup, set `PROMETHEUS_SERVER_ENDPOINT` to a global query endpoint (e.g., Thanos Query, Cortex, or Mimir). Pointing to a single Prometheus pod may result in incomplete or intermittent export results. See the [Prometheus integration docs](https://www.opencost.io/docs/installation/prometheus) for details.
+
 ## Usage
 
 - [Cost APIs](https://www.opencost.io/docs/integrations/api)

+ 17 - 1
modules/prometheus-source/README.md

@@ -1,3 +1,19 @@
 # OpenCost Data Sources - Prometheus
 
-The OpenCost Prometheus data source is an implementation which provides OpenCost with the metrics and metadata required to calculate cost allocation. Prometheus provides longer retention periods and more detailed metrics than the OpenCost Collector, which is useful for historical analysis and cost forecasting.
+The OpenCost Prometheus data source is an implementation which provides OpenCost with the metrics and metadata required to calculate cost allocation. Prometheus provides longer retention periods and more detailed metrics than the OpenCost Collector, which is useful for historical analysis and cost forecasting.
+
+# Sharded Prometheus Best Practices
+
+**If you are running Prometheus in a sharded (HA) setup:**
+
+- Each Prometheus pod only scrapes a subset of targets. If OpenCost is configured to query a single Prometheus pod, it will only see partial data, and export jobs may fail or return incomplete results.
+- To ensure complete and reliable cost data, set `PROMETHEUS_SERVER_ENDPOINT` to a global query endpoint that aggregates all shards, such as [Thanos Query](https://thanos.io/tip/components/query.md/), [Cortex Query Frontend](https://cortexmetrics.io/docs/architecture/), or [Mimir Query Frontend](https://grafana.com/docs/mimir/latest/operations/query-frontend/).
+- If you do not use a global endpoint, you may experience intermittent failures or missing data in OpenCost exports.
+
+**Example:**
+
+```
+export PROMETHEUS_SERVER_ENDPOINT="http://thanos-query-frontend:9090"
+```
+
+For more details, see the [OpenCost documentation](https://www.opencost.io/docs/installation/prometheus) and the documentation for your query aggregator.
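
A minimal sketch of how a startup check might validate the endpoint before any queries are issued. It is not part of this commit or of OpenCost's code: `validateEndpoint` and the `endpointEnvVar` constant are hypothetical names introduced for illustration. The check only confirms that the value is an absolute `http` or `https` URL; it cannot tell whether the target actually aggregates all shards.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// endpointEnvVar is the environment variable named in the README text above.
const endpointEnvVar = "PROMETHEUS_SERVER_ENDPOINT"

// validateEndpoint is a hypothetical helper (not OpenCost's implementation).
// It accepts only absolute http(s) URLs that include a host.
func validateEndpoint(raw string) (*url.URL, error) {
	if raw == "" {
		return nil, fmt.Errorf("%s is not set", endpointEnvVar)
	}
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parsing %s: %w", endpointEnvVar, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("%s must use http or https, got scheme %q", endpointEnvVar, u.Scheme)
	}
	if u.Host == "" {
		return nil, fmt.Errorf("%s has no host", endpointEnvVar)
	}
	return u, nil
}

func main() {
	u, err := validateEndpoint(os.Getenv(endpointEnvVar))
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Println("using Prometheus-compatible endpoint at", u.Host)
}
```

A value such as `thanos-query-frontend:9090` with no scheme is rejected by this sketch, which catches one common misconfiguration early.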

+ 3 - 0
modules/prometheus-source/pkg/env/promenv.go

@@ -39,6 +39,9 @@ const (
 	KubecostJobNameEnvVar = "KUBECOST_JOB_NAME"
 )
 
+// In sharded Prometheus setups, PROMETHEUS_SERVER_ENDPOINT should point to a global query endpoint (e.g., Thanos Query, Cortex, or Mimir)
+// to ensure OpenCost receives complete data. Pointing to a single Prometheus pod may result in incomplete or intermittent export results.
+
 // IsPrometheusRetryOnRateLimitResponse will attempt to retry if a 429 response is received OR a 400 with a body containing
 // ThrottleException (common in AWS services like AMP)
 func IsPrometheusRetryOnRateLimitResponse() bool {
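
The comment above describes the retry condition: a 429 response, or a 400 whose body contains `ThrottleException`. The function body is outside this hunk, so the following is only a sketch of how such a condition might be expressed. `isRateLimited` is a hypothetical name and is not taken from the OpenCost source.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// isRateLimited is a hypothetical classifier (not the OpenCost implementation).
// It reports true for HTTP 429, or for HTTP 400 when the response body
// mentions ThrottleException, as the comment in the diff describes.
func isRateLimited(statusCode int, body string) bool {
	if statusCode == http.StatusTooManyRequests {
		return true
	}
	return statusCode == http.StatusBadRequest &&
		strings.Contains(body, "ThrottleException")
}

func main() {
	fmt.Println(isRateLimited(429, ""))
	fmt.Println(isRateLimited(400, `{"message":"ThrottleException: rate exceeded"}`))
	fmt.Println(isRateLimited(400, "bad query"))
}
```

A caller would presumably consult a classifier like this only when the retry option reported by `IsPrometheusRetryOnRateLimitResponse` is enabled.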