You can implement this in Python. Here is an example:
data = [
    {'doc_id': '123', 'key': '1'},
    {'doc_id': '123', 'key': '2'},
    {'doc_id': 'abc', 'key': '3'},
    {'doc_id': '123', 'key': '4'},
    {'doc_id': 'abc', 'key': '5'},
]
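def filter_documents(data, limits):
    """Keep at most limits[doc_id] items per doc_id, preserving input order."""
    result = []
    counts = {}  # per-doc_id counter of items kept so far
    for item in data:
        doc_id = item['doc_id']
        if doc_id not in counts:
            counts[doc_id] = 0
        # A doc_id missing from limits defaults to 0, so its items are dropped.
        if counts[doc_id] < limits.get(doc_id, 0):
            result.append(item)
            counts[doc_id] += 1
    return result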
# Maximum number of items to keep for each doc_id
limits = {
    '123': 2,
    'abc': 1,
}
filtered_data = filter_documents(data, limits)
print(filtered_data)
Running this code prints:
[{'doc_id': '123', 'key': '1'}, {'doc_id': '123', 'key': '2'}, {'doc_id': 'abc', 'key': '3'}]
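The code maintains a counter for each doc_id and keeps an item only while that counter is below the limit given for its doc_id. Items are kept in their original order, and any doc_id that does not appear in limits is treated as having a limit of 0, so its items are dropped.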
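If you would rather keep items whose doc_id has no explicit limit, one possible variation is sketched below. It is only an illustration of the same counting idea: the name iter_limited and the default_limit parameter are hypothetical additions, not part of the code above. It uses collections.Counter for the bookkeeping and yields items lazily instead of building a list.

```python
from collections import Counter

def iter_limited(items, limits, default_limit=0):
    """Yield items in order, keeping at most limits[doc_id] per doc_id.

    default_limit (a hypothetical parameter) applies to any doc_id that
    is not listed in limits; 0 reproduces the behaviour of dropping them.
    """
    kept = Counter()  # number of items already yielded for each doc_id
    for item in items:
        doc_id = item['doc_id']
        if kept[doc_id] < limits.get(doc_id, default_limit):
            kept[doc_id] += 1
            yield item

sample = [
    {'doc_id': '123', 'key': '1'},
    {'doc_id': 'xyz', 'key': '2'},
    {'doc_id': '123', 'key': '3'},
    {'doc_id': 'xyz', 'key': '4'},
]
# '123' is capped at 1 explicitly; 'xyz' falls back to default_limit=1.
result = list(iter_limited(sample, {'123': 1}, default_limit=1))
print(result)
```

Because it is a generator, wrap the call in list(...) when you need the full result at once, as in the usage lines above.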