The Drawbacks of Using AWS SageMaker Feature Store

Published: Nov 25, 2025

Introduction

When designing a system, it's important to carefully evaluate the tools and services you plan to integrate. These choices shape the long-term architecture, and the wrong fit can introduce technical debt that slows future development or, in the worst case, impacts the product itself.

Identifying potential drawbacks early helps you make informed decisions and avoid pitfalls that can turn project maintenance into a long-term burden.

With that in mind, if you're considering AWS SageMaker Feature Store for managing and serving ML features, here are several important factors to review before committing to the service.

Batch ingestion

There is no native batch ingestion support in SageMaker Feature Store; records can only be written one at a time.

For writes, the SageMakerFeatureStoreRuntime.Client exposes only the put_record API, which accepts a single Record (a list of FeatureValue objects) per request.
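
For illustration, here is what a single put_record call looks like through boto3; the feature group and feature names are hypothetical:

import boto3

runtime = boto3.client("sagemaker-featurestore-runtime")

# One Record per call: a list of FeatureValue dicts describing a single row.
runtime.put_record(
    FeatureGroupName="customers",
    Record=[
        {"FeatureName": "customer_id", "ValueAsString": "42"},
        {"FeatureName": "age", "ValueAsString": "31"},
        {"FeatureName": "EventTime", "ValueAsString": "2025-11-25T00:00:00Z"},
    ],
)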

This limitation can severely impact ingestion throughput and overall efficiency, especially when working with many feature groups or high-volume data.

Quota:

Maximum Transactions per second (TPS) per API per AWS account: Soft limit of 10000 TPS per API excluding the BatchGetRecord API call, which has a soft limit of 500 TPS.

Ref: AWS

The SageMaker Feature Store SDK has the following ingest function:

from sagemaker.feature_store.feature_group import FeatureGroup

# Assumes an existing SageMaker session and a pandas DataFrame of features.
feature_group = FeatureGroup(name="my-feature-group", sagemaker_session=session)
feature_group.ingest(
    data_frame=feature_data,  # one put_record call will be issued per row
    max_workers=3,            # number of threads writing in parallel
    wait=True,                # block until every row has been written
)

This call spawns max_workers threads; each thread processes a slice of the DataFrame and issues a separate put_record call for every row.

For details on the exact implementation, refer to the linked source code.

When you run multiple batch or streaming ingestion jobs across many feature groups, it becomes easy to hit the request quota. At that point, you'll need to implement backoff logic, which increases overall job duration, and request a soft-limit increase from AWS.
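
A minimal sketch of such backoff logic, assuming a boto3 runtime client; the retry budget and error-code check are illustrative:

import time
import boto3
from botocore.exceptions import ClientError

runtime = boto3.client("sagemaker-featurestore-runtime")

def put_record_with_backoff(feature_group_name, record, max_retries=5):
    """Retry put_record with exponential backoff when throttled."""
    for attempt in range(max_retries):
        try:
            return runtime.put_record(
                FeatureGroupName=feature_group_name,
                Record=record,
            )
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("put_record still throttled after all retries")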

Updating records

You cannot update a record directly or partially. You can only add new records to the feature group or overwrite them using the PutRecord API.

This means that if you want to update the latest record with additional features, you have to:

  • Use GetRecord to retrieve the latest record
  • Update the returned record
  • Use PutRecord to update feature values

PutRecord performs a full overwrite of the record's FeatureValue objects, which is why you have to keep everything from the returned record and change only the selected features.

Retrieving the record first also gives you the exact EventTime of the latest record, which matters because of how writes are routed:

If a new record's EventTime is greater, the new record is written to both the OnlineStore and OfflineStore. Otherwise, the record is a historic record and it is written only to the OfflineStore.

Ref: AWS

This also increases the total number of requests, as every record must be written separately.
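
Put together, the read-modify-write cycle looks roughly like this; the feature group, identifier, and feature names are hypothetical, and EventTime is assumed to be stored as a unix epoch Fractional:

import time
import boto3

runtime = boto3.client("sagemaker-featurestore-runtime")

# 1. Retrieve the latest record, including its EventTime.
response = runtime.get_record(
    FeatureGroupName="customers",
    RecordIdentifierValueAsString="42",
)
record = {fv["FeatureName"]: fv["ValueAsString"] for fv in response["Record"]}

# 2. Change only the features you care about; keep everything else.
record["churn_score"] = "0.87"

# 3. Use a newer EventTime so the OnlineStore is updated as well.
record["EventTime"] = str(time.time())

# 4. Write the full record back; put_record overwrites all feature values.
runtime.put_record(
    FeatureGroupName="customers",
    Record=[{"FeatureName": k, "ValueAsString": v} for k, v in record.items()],
)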

Limited dtypes

AWS SageMaker Feature Store supports three data types:

  • String
  • Fractional (IEEE 64-bit floating point value)
  • Integer (Int64 - 64 bit signed integral value)

The EventTime feature can be provided as a String (ISO-8601) or Fractional (unix epoch).

This means that if a column in your dataset does not have an integer or float dtype, it defaults to String in your feature store.

This is evident in the SageMaker SDK source code, where the load_feature_definitions utility function determines the feature value type:

_INTEGER_TYPES = ["int_", "int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", "uint64"]
_FLOAT_TYPES = ["float_", "float16", "float32", "float64"]

DTYPE_TO_FEATURE_DEFINITION_CLS_MAP: Dict[str, FeatureTypeEnum] = {
    type: FeatureTypeEnum.INTEGRAL for type in _INTEGER_TYPES
}
DTYPE_TO_FEATURE_DEFINITION_CLS_MAP.update(
    {type: FeatureTypeEnum.FRACTIONAL for type in _FLOAT_TYPES}
)
DTYPE_TO_FEATURE_DEFINITION_CLS_MAP["string"] = FeatureTypeEnum.STRING
DTYPE_TO_FEATURE_DEFINITION_CLS_MAP["object"] = FeatureTypeEnum.STRING

_FEATURE_TYPE_TO_DDL_DATA_TYPE_MAP = {
    FeatureTypeEnum.INTEGRAL.value: "INT",
    FeatureTypeEnum.FRACTIONAL.value: "FLOAT",
    FeatureTypeEnum.STRING.value: "STRING",
}

In practice, this means you'll need to implement your own serialization and deserialization logic for any non-primitive objects, ensuring they are safely converted to and from strings before being written to or read from the Feature Store.
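For example, a nested object can be round-tripped through JSON on its way in and out of the store; a minimal sketch, with an illustrative metadata column:

import json
import pandas as pd

df = pd.DataFrame({
    "customer_id": ["42"],
    "metadata": [{"plan": "pro", "seats": 3}],
})

# Serialize non-primitive columns to strings before ingestion ...
df["metadata"] = df["metadata"].apply(json.dumps)

# ... and parse them back after reading the record.
restored = df["metadata"].apply(json.loads)
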

Besides these data types, you can use collection types, which provide a way to organize and structure data for efficient retrieval and analytics. They are groupings of elements in which each element within the collection must have the same feature type: string, fractional, or integer.

However, collection types are supported only by the InMemory Online Store, which itself does not support an Offline Store.

Note that the InMemory tier currently supports online feature groups only, not online+offline feature groups, so there is no replication between online and offline stores for the InMemory tier. Also, the InMemory tier does not currently support customer managed KMS keys.

Ref: AWS

Schema evolution

You can add features to a feature group, but you cannot remove them.

You can add features for your feature group using the FeatureAdditions request parameter. Features cannot be removed from a feature group.

Ref: AWS

You can only remove a feature group in its entirety, using DeleteFeatureGroup. The data in the OnlineStore becomes immediately unavailable, but data already written to the OfflineStore is not deleted.

UpdateFeatureGroup supports only adding new feature definitions or updating the online store configuration. This means that once you create a feature group, you cannot change its existing schema; you can only add new features to the group.
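
A minimal sketch of the one schema change that is allowed, adding a feature definition; the group and feature names are illustrative:

import boto3

sagemaker = boto3.client("sagemaker")

# Adding a feature is supported; renaming, retyping, or removing one is not.
sagemaker.update_feature_group(
    FeatureGroupName="customers",
    FeatureAdditions=[
        {"FeatureName": "churn_score", "FeatureType": "Fractional"},
    ],
)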

Others also raised this limitation on Stack Overflow.

Nulls not supported

You cannot insert null values into a feature store.

The FeatureValue object contains:

  • FeatureName - The name of the feature
  • ValueAsString - String representation of all three types: String, Fractional, and Integer
  • ValueAsStringList - List of values in string format, if collection type is used

This means that if you need to explicitly ingest a null value, you must implement logic to translate it into an accepted placeholder. For example, using string tokens like NaN or NA for String features, or numerical values such as 0 or -1 for Integer and Fractional features.

Otherwise, the null value has to be omitted from the Record before put_record is called.

The SageMaker Feature Store SDK implements this logic in the _ingest_row method:

record = [
    (
        FeatureValue(
            feature_name=data_frame.columns[index - 1],
            value_as_string_list=IngestionManagerPandas._covert_feature_to_string_list(
                row[index]
            ),
        )
        if IngestionManagerPandas._is_feature_collection_type(
            feature_name=data_frame.columns[index - 1],
            feature_definitions=feature_definitions,
        )
        else FeatureValue(
            feature_name=data_frame.columns[index - 1], value_as_string=str(row[index])
        )
    )
    for index in range(1, len(row))
    if IngestionManagerPandas._feature_value_is_not_none(feature_value=row[index])  # <-- null values are dropped here
]

put_record_params = {
    "FeatureGroupName": feature_group_name,
    "Record": [value.to_dict() for value in record],
}
if target_stores:
    put_record_params["TargetStores"] = [
        target_store.value for target_store in target_stores
    ]

sagemaker_fs_runtime_client.put_record(**put_record_params)

If we disregard collection types, each value is written as value_as_string=str(row[index]), and the value is skipped entirely when row[index] is null.
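
If dropping nulls is not acceptable, one workaround is to map them to agreed-upon sentinels before ingestion; a minimal sketch, with illustrative placeholder values:

import pandas as pd

df = pd.DataFrame({
    "customer_id": ["1", "2"],
    "plan": ["pro", None],        # String feature
    "age": [31.0, float("nan")],  # Fractional feature
})

# Every consumer of the feature group must know about these sentinels.
df["plan"] = df["plan"].fillna("NA")
df["age"] = df["age"].fillna(-1.0)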

This Stack Overflow post is a bit dated, but it clearly illustrates this limitation, along with a common workaround.

Single Record Identifier

Record: Collection of values for features for a single record identifier. A combination of record identifier and event time values uniquely identify a record within a feature group.

Ref: AWS

When defining a Feature Group, the RecordIdentifierFeatureName must correspond to one of the existing feature definitions, such as customer_id.

This field must be a single feature; composite identifiers spanning multiple features (i.e., columns) are not supported.

If you need a multi-feature identifier, you'll have to create a new feature that combines those values and use it as the record identifier.
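
A minimal sketch of deriving such a combined identifier, with illustrative column names:

import pandas as pd

df = pd.DataFrame({"customer_id": ["42"], "region": ["eu-west-1"]})

# Concatenate the parts into one feature and use it as the
# RecordIdentifierFeatureName when creating the feature group.
df["customer_region_id"] = df["customer_id"] + "#" + df["region"]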

For more information, please see CreateFeatureGroup API.