Partial UPSERTs in Apache Pinot
An often underestimated feature you shouldn’t overlook when choosing a real-time analytics database, for precision and simplicity.
Recently, I had the opportunity to dive deep into the partial UPSERT capabilities of Apache Pinot, and I was absolutely amazed by the flexibility and power of the available strategies. These strategies unlock countless possibilities when building real-time analytics applications. What struck me most is how much complexity they take off your plate—no more worrying about writing intricate application logic to handle updates and merges. The database takes care of it for you, automatically and efficiently.
In this blog, I’ll showcase these strategies with simple examples across different industries, illustrating how they can simplify real-time analytics development.
What are UPSERTs in Apache Pinot / StarTree ?
In traditional databases, an UPSERT (a combination of “update” and “insert”) ensures that existing records are updated, or new ones are inserted when needed. Apache Pinot takes this concept further by supporting partial updates, allowing you to update specific columns in a record instead of overwriting the entire row. This feature is essential for optimizing performance, storage, and flexibility in real-time analytics.
Partial UPSERTs in Apache Pinot
Partial UPSERTs enable column-level updates using configurable strategies that dictate how new values interact with existing data. Whether you’re incrementing counters, appending to lists, or maintaining minimum and maximum values, Pinot offers precise tools to handle these scenarios efficiently. These strategies provide an elegant way to manage complex use cases without having to write custom logic in your application.
Apache Pinot supports seven partial UPSERT strategies, each tailored to specific use cases across different industries. Let’s explore each strategy in detail, paired with practical examples and corresponding table structures.
1. OVERWRITE Strategy
The OVERWRITE strategy replaces the value of a column in the last record with the new value. This is useful when only the most recent data matters, such as updating the latest bid value for an ad campaign.
Example Industry Use Case: Ad Tech (Updating Ad Campaign Bids)
2. INCREMENT Strategy
The INCREMENT strategy adds the incoming value to the existing value. This is ideal for cumulative metrics, such as tracking the total number of items sold for a product.
Example Industry Use Case: E-commerce (Updating Total Items Sold for a Product)
3. APPEND Strategy
The APPEND strategy adds the new item to an unordered set, even if it already exists. This is commonly used for logging or maintaining event histories.
Example Industry Use Case: Logistics (Tracking Delivery Checkpoints)
4. UNION Strategy
The UNION strategy adds the new item to an unordered set only if it doesn’t already exist. This is perfect for deduplicating values.
Example Industry Use Case: Social Media (Adding Unique User IDs to a Post)
5. IGNORE Strategy
The IGNORE strategy retains the existing value, ignoring incoming updates. This is useful when updates need to be handled manually or under specific conditions.
Example Industry Use Case: Finance (Risk Score Updates Requiring Manual Review)
6. MAX Strategy
The MAX strategy keeps the larger value between the existing value and the incoming value. This is ideal for tracking maximum readings.
Example Industry Use Case: IoT (Tracking Maximum Temperature Recorded by a Sensor)
7. MIN Strategy
The MIN strategy keeps the smaller value between the existing value and the incoming value. This is useful for identifying the lowest values.
Example Industry Use Case: Supply Chain (Tracking Minimum Shipping Cost for a Route)
Furthermore, the most powerful aspect of Apache Pinot’s partial UPSERT capabilities is the column-level granularity it offers. You can apply different strategies to different columns within the same table—for example, INCREMENT for one column, IGNORE for another, and OVERWRITE for yet another. Alternatively, you can set a default partial UPSERT strategy across all columns. This flexibility makes it incredibly versatile and suitable for a wide range of real-time analytics use cases.
...
"upsertConfig": {
"mode": "PARTIAL",
"partialUpsertStrategies": {
"column1": "INCREMENT",
"column2": "IGNORE",
"column3": "OVERWRITE"
},
"defaultPartialUpsertStrategy": "OVERWRITE"
}
This config demonstrates how you can mix and match strategies at the column level while setting a default strategy for any unspecified columns.
I recently tried this myself in StarTree’s free tier, where you can spin up a workspace and test it out too. It’s a simple way to explore these strategies and see their impact in action.