Object Storage

Object storage is a storage architecture, delivered as hardware, software, or both, that stores data in units called objects. Each object combines the data itself, its associated metadata, and a unique identifier. Clients read and write objects through RESTful HTTP APIs, which also helps with scalability: capacity can grow to petabyte scale simply by adding nodes. Data and metadata are distributed, processed, and replicated over the network across the nodes of a cluster, and erasure coding is supported for recovery. Because objects carry rich metadata, there is no need for the hierarchical folder/subfolder structure used in file storage. The trade-off is tail latency: average response times are fast, but latency at p99 and above can be significant. Because the system is distributed, some requests can be very slow.

Object Storage Hedging

Send multiple copies of the same request simultaneously and use whichever response arrives first → significantly reduces p99 latency, at the cost of extra requests.
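A minimal sketch of request hedging with Python's `asyncio`, under stated assumptions: `fetch_object` is a hypothetical stand-in for a GET against one replica (it only simulates latency), and `hedge_after` and `max_attempts` are illustrative parameters, not part of any real client library. With `hedge_after=0` all copies go out at essentially the same time, as described above. A positive value sends the extra copy only when the first one is slow, which is a common way to limit the added cost.

```python
import asyncio
import random


async def fetch_object(key: str, replica: int) -> str:
    """Hypothetical GET: usually fast, occasionally very slow (the tail)."""
    delay = 0.5 if random.random() < 0.05 else 0.01
    await asyncio.sleep(delay)
    return f"{key}@replica{replica}"


async def hedged_get(key, fetch=fetch_object, hedge_after=0.05, max_attempts=2):
    """Return the first response; send another copy if none arrives in time."""
    tasks = []
    try:
        for attempt in range(max_attempts):
            tasks.append(asyncio.create_task(fetch(key, attempt)))
            # Wait at most hedge_after before hedging again; after the last
            # attempt, wait without a timeout for whichever copy finishes first.
            timeout = hedge_after if attempt < max_attempts - 1 else None
            done, _ = await asyncio.wait(
                tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            if done:
                # Sketch only: an exception from the first finisher propagates.
                return done.pop().result()
    finally:
        # Cancel the copies that lost the race so they stop consuming resources.
        for task in tasks:
            task.cancel()


if __name__ == "__main__":
    print(asyncio.run(hedged_get("bucket/report.parquet")))
```

The extra cost is visible in the code: every hedge is one more billed request, so `hedge_after` is usually set near the observed p95 or p99 latency rather than zero.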

Object Storage Systems

FUSE

MinIO

Taking out the Trash: Garbage Collection of Object Storage at Massive Scale

Files that have been deleted in metadata must eventually be physically removed from object storage. Simple bucket policies and synchronous deletion both have limitations, however: they can interfere with live queries, and they make recovery impossible when files are deleted by mistake.

Distributed systems built on object storage all have one common problem: removing files that have been logically deleted either due to data expiry or compaction. We review the pros and cons of five ways to solve this problem.
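As an illustration of the problem only (not one of the article's five approaches, which are not described here), the sketch below shows deferred deletion with a grace period. Logically deleted keys are tombstoned first and physically removed later in small batches, which keeps a window for recovery and avoids one large synchronous delete. `TrashCollector`, `grace_seconds`, and `batch_size` are hypothetical names, and a plain `dict` stands in for the object store.

```python
from dataclasses import dataclass, field


@dataclass
class TrashCollector:
    store: dict[str, bytes]          # stand-in for the real object store
    grace_seconds: float = 3600.0    # how long a deletion stays recoverable
    batch_size: int = 100            # cap on physical deletes per sweep
    _tombstones: dict[str, float] = field(default_factory=dict, init=False)

    def mark_deleted(self, key: str, now: float) -> None:
        # Logical delete: record when the key was dropped from metadata.
        self._tombstones.setdefault(key, now)

    def restore(self, key: str) -> bool:
        # Undo a mistaken delete while the object still physically exists.
        return self._tombstones.pop(key, None) is not None

    def sweep(self, now: float) -> list[str]:
        # Physical delete: remove only keys whose grace period has elapsed,
        # in bounded batches so live queries are not starved.
        expired = sorted(
            key for key, deleted_at in self._tombstones.items()
            if now - deleted_at >= self.grace_seconds
        )[: self.batch_size]
        for key in expired:
            self.store.pop(key, None)
            del self._tombstones[key]
        return expired
```

Passing `now` explicitly keeps the sketch deterministic. A real collector would read a clock and persist its tombstones somewhere durable.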