3.22. MongoDB aggregation

Published: 2025-10-25 13:34:21 UTC


Aggregation in MongoDB is mainly used to process data (for example, computing averages or sums) and return the computed results.

It is somewhat similar to count(*) in an SQL statement.

3.22.1. aggregate() Method

In MongoDB, aggregation is performed with the aggregate() method.

Syntax

The basic syntax of the aggregate() method is as follows:

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION) 

Example

The data in the collection is as follows:

{
   _id: ObjectId(7df78ad8902c),
   title: 'MongoDB Overview',
   description: 'MongoDB is no sql database',
   by_user: 'runoob.com',
   url: 'http://www.runoob.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d),
   title: 'NoSQL Overview',
   description: 'No sql database is very fast',
   by_user: 'runoob.com',
   url: 'http://www.runoob.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e),
   title: 'Neo4j Overview',
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
}

Using the collection above, we can count the number of articles written by each author with aggregate(). The result is as follows:

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
   "result" : [
      {
         "_id" : "runoob.com",
         "num_tutorial" : 2
      },
      {
         "_id" : "Neo4j",
         "num_tutorial" : 1
      }
   ],
   "ok" : 1
}
>

The example above is similar to the SQL statement:

select by_user, count(*) from mycol group by by_user 

In the example above, we group the documents by the by_user field and count how many documents share each by_user value ({$sum : 1} adds 1 for every document in a group).
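The plain JavaScript sketch below makes the grouping concrete. It runs with Node.js and needs no MongoDB server. It mimics what this $group stage computes over the three sample documents, which are held in an in-memory array. The helper name countByUser is hypothetical. The code illustrates the semantics only; it is not how MongoDB implements $group.

```javascript
// The three sample documents from the mycol collection (only the fields used here).
const mycol = [
  { title: "MongoDB Overview", by_user: "runoob.com", likes: 100 },
  { title: "NoSQL Overview", by_user: "runoob.com", likes: 10 },
  { title: "Neo4j Overview", by_user: "Neo4j", likes: 750 },
];

// Hypothetical helper mimicking
// { $group : { _id : "$by_user", num_tutorial : { $sum : 1 } } }:
// every document adds 1 to the counter of its by_user group.
function countByUser(docs) {
  const groups = new Map(); // keys kept in the order they are first seen
  for (const doc of docs) {
    groups.set(doc.by_user, (groups.get(doc.by_user) ?? 0) + 1);
  }
  return [...groups].map(([_id, num_tutorial]) => ({ _id, num_tutorial }));
}

console.log(countByUser(mycol)); // runoob.com has 2 articles, Neo4j has 1
```

MongoDB itself does not guarantee the order of $group output, so add a $sort stage if the order of the groups matters.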

The following table lists some common aggregation expressions:

Expression | Description | Example
--- | --- | ---
$sum | Calculates the sum. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg | Calculates the average. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min | Gets the minimum value among the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max | Gets the maximum value among the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push | Adds each value to an array without checking for duplicates. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push : "$url"}}}])
$addToSet | Adds a value to an array only if it is not already present, so the array contains no duplicates. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first | Gets the value from the first document in each group, according to the order of the input documents. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last | Gets the value from the last document in each group, according to the order of the input documents. | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
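To show how these accumulators differ, here is a plain JavaScript sketch that applies stand-ins for each one to the sample documents grouped by by_user. The names groupBy, sum, avg, min, max, push, addToSet, first and last are hypothetical helpers, not MongoDB APIs. The sketch assumes every document has a numeric likes field. Real $sum, by contrast, ignores non-numeric values.

```javascript
// Sample documents (only the fields used by the accumulators).
const mycol = [
  { by_user: "runoob.com", url: "http://www.runoob.com", likes: 100 },
  { by_user: "runoob.com", url: "http://www.runoob.com", likes: 10 },
  { by_user: "Neo4j", url: "http://www.neo4j.com", likes: 750 },
];

// Hypothetical helper: groups docs by by_user, then applies each accumulator
// to the documents of the group, kept in input order.
function groupBy(docs, accumulators) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.by_user)) groups.set(doc.by_user, []);
    groups.get(doc.by_user).push(doc);
  }
  return [...groups].map(([_id, members]) => {
    const out = { _id };
    for (const [name, acc] of Object.entries(accumulators)) out[name] = acc(members);
    return out;
  });
}

// Hypothetical plain-JavaScript stand-ins for the accumulators in the table.
const sum = (field) => (docs) => docs.reduce((s, d) => s + d[field], 0);
const avg = (field) => (docs) => sum(field)(docs) / docs.length;
const min = (field) => (docs) => Math.min(...docs.map((d) => d[field]));
const max = (field) => (docs) => Math.max(...docs.map((d) => d[field]));
const push = (field) => (docs) => docs.map((d) => d[field]); // keeps duplicates
const addToSet = (field) => (docs) => [...new Set(docs.map((d) => d[field]))]; // drops duplicates
const first = (field) => (docs) => docs[0][field];
const last = (field) => (docs) => docs[docs.length - 1][field];

const result = groupBy(mycol, {
  total_likes: sum("likes"),
  avg_likes: avg("likes"),
  min_likes: min("likes"),
  max_likes: max("likes"),
  urls_push: push("url"),
  urls_set: addToSet("url"),
  first_likes: first("likes"),
  last_likes: last("likes"),
});
console.log(result);
```

For runoob.com, $push keeps both copies of the URL while $addToSet keeps one. $first and $last depend on input order, which is why they are normally used after a $sort stage.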

3.22.2. The concept of the pipeline

In Unix and Linux, pipes are commonly used to pass the output of one command as the input of the next command.

MongoDB's aggregation pipeline works the same way: once one stage has finished processing the documents, it passes its results on to the next stage. The same stage can appear more than once in a pipeline.

Expressions process the input documents and produce output. Expressions are stateless: they can only evaluate the current document in the aggregation pipeline, not other documents.
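One way to picture the pipeline is as function composition. Each stage takes an array of documents and returns a new array, which becomes the next stage's input. The plain JavaScript sketch below models this with Array.prototype.reduce. runPipeline, matchPopular and projectTitle are hypothetical names, not MongoDB APIs. The arrow functions inside each stage look only at the current document d, which mirrors the point that expressions are stateless.

```javascript
// Hypothetical model: a stage is a function from an array of documents to a new array.
function runPipeline(docs, stages) {
  // The output of each stage becomes the input of the next, like a Unix pipe.
  return stages.reduce((input, stage) => stage(input), docs);
}

const docs = [
  { title: "MongoDB Overview", likes: 100 },
  { title: "NoSQL Overview", likes: 10 },
  { title: "Neo4j Overview", likes: 750 },
];

// Roughly like { $match : { likes : { $gte : 50 } } }
const matchPopular = (input) => input.filter((d) => d.likes >= 50);
// Roughly like { $project : { _id : 0, title : 1 } }
const projectTitle = (input) => input.map((d) => ({ title: d.title }));

const out = runPipeline(docs, [matchPopular, projectTitle]);
console.log(out); // titles "MongoDB Overview" and "Neo4j Overview"
```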

Here are several stages commonly used in the aggregation framework:

  • $project: modifies the structure of the input documents. It can rename, add, or remove fields, and can also create computed values and nested documents.

  • $match: filters the data and outputs only the documents that meet the criteria. $match uses MongoDB's standard query operators.

  • $limit: limits the number of documents passed on through the aggregation pipeline.

  • $skip: skips the specified number of documents in the aggregation pipeline and passes on the remaining documents.

  • $unwind: splits an array field of a document into multiple documents, each containing one value from the array.

  • $group: groups documents by a specified key and can compute statistics (counts, sums, averages, and so on) for each group.

  • $sort: sorts the input documents and outputs them in the specified order.

  • $geoNear: outputs documents near a given geographic point, ordered from nearest to farthest.
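Of these stages, $unwind is the least obvious. The plain JavaScript sketch below mimics its default behaviour (without the preserveNullAndEmptyArrays option) on one of the sample documents. The helper name unwind is hypothetical. It is a sketch of the semantics, not MongoDB's implementation.

```javascript
// A document with an array field, shaped like the sample documents above.
const doc = { title: "MongoDB Overview", tags: ["mongodb", "database", "NoSQL"] };

// Hypothetical helper mimicking { $unwind : "$tags" } with default options:
// one output document per array element, with the array replaced by that element.
function unwind(docs, field) {
  return docs.flatMap((d) => {
    const value = d[field];
    if (Array.isArray(value)) return value.map((v) => ({ ...d, [field]: v })); // [] yields nothing
    if (value === undefined || value === null) return []; // missing or null: document dropped
    return [d]; // a non-array value is treated as a one-element array
  });
}

const unwound = unwind([doc], "tags");
console.log(unwound); // three documents, with tags "mongodb", "database" and "NoSQL"
```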

Pipeline operator examples

1. $project example

db.article.aggregate([ { $project : { title : 1, author : 1 } } ]);

The result then contains only three fields: _id, title, and author. The _id field is included by default. To exclude _id, write:

db.article.aggregate([ { $project : { _id : 0, title : 1, author : 1 } } ]);
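The plain JavaScript sketch below illustrates the _id rule. It mimics an inclusion-only projection on an in-memory array. The helper name project and the article document are hypothetical, and the field values are made up. The sketch handles only the 1 and 0 flags used above, not computed fields or other projection forms.

```javascript
// Hypothetical helper mimicking an inclusion projection such as
// { $project : { title : 1, author : 1 } } or { $project : { _id : 0, title : 1, author : 1 } }.
function project(docs, spec) {
  return docs.map((doc) => {
    const out = {};
    // _id is kept unless the specification explicitly sets _id to 0.
    if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
    for (const [field, flag] of Object.entries(spec)) {
      if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
    }
    return out;
  });
}

// Hypothetical article document; the values are made up for illustration.
const articles = [{ _id: 1, title: "MongoDB Overview", author: "runoob", likes: 100 }];

console.log(project(articles, { title: 1, author: 1 }));         // keeps _id, title, author
console.log(project(articles, { _id: 0, title: 1, author: 1 })); // keeps title, author
```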

2. $match example

db.articles.aggregate( [ { $match : { score : { $gt : 70, $lte : 90 } } }, { $group: { _id: null, count: { $sum: 1 } } } ] ); 

$match selects the documents whose score is greater than 70 and less than or equal to 90. It then passes the matching documents to the next stage, the $group operator, which counts them.
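The two-stage pipeline above can be traced in plain JavaScript. This sketch filters, then counts, over an in-memory array. The articles and their scores are hypothetical, made up only to exercise the range boundaries, and matchThenCount is a hypothetical helper name. One detail worth knowing: when $group receives no documents, it emits no output document at all, rather than a count of 0.

```javascript
// Hypothetical articles with made-up scores, just to exercise the range filter.
const articles = [
  { title: "a", score: 65 },
  { title: "b", score: 71 },
  { title: "c", score: 90 },
  { title: "d", score: 95 },
];

// Hypothetical helper tracing the pipeline in the example above.
function matchThenCount(docs) {
  // Stage 1, like { $match : { score : { $gt : 70, $lte : 90 } } }.
  const matched = docs.filter((a) => a.score > 70 && a.score <= 90);
  // $group emits nothing when its input is empty.
  if (matched.length === 0) return [];
  // Stage 2, like { $group : { _id : null, count : { $sum : 1 } } }:
  // _id : null puts every remaining document into one group; each adds 1.
  return [{ _id: null, count: matched.reduce((total) => total + 1, 0) }];
}

console.log(matchThenCount(articles)); // count is 2 (scores 71 and 90)
```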

3. $skip example

db.article.aggregate([ { $skip : 5 } ]);

After the $skip stage, the first five documents are skipped and the remaining documents are passed on to the next stage.
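The plain JavaScript sketch below shows what $skip does to a stream of documents, together with $limit from the list above. It uses Array.prototype.slice on ten hypothetical documents. The skip and limit helpers are hypothetical stand-ins, not MongoDB APIs.

```javascript
// Ten hypothetical documents numbered 1..10.
const docs = Array.from({ length: 10 }, (_, i) => ({ n: i + 1 }));

// Like { $skip : count }: drop the first `count` documents, pass on the rest.
const skip = (input, count) => input.slice(count);
// Like { $limit : count }: pass on at most `count` documents.
const limit = (input, count) => input.slice(0, count);

const afterSkip = skip(docs, 5);  // documents 6..10
const page = limit(afterSkip, 3); // documents 6, 7, 8
console.log(afterSkip.map((d) => d.n), page.map((d) => d.n));
```

Which documents count as "the first five" depends on the order in which documents reach the stage. In practice, $skip is therefore usually placed after a $sort stage.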
