3.39. MongoDB Map Reduce

Published: 2025-10-25 13:34:21 UTC


Map-Reduce is a computing model that breaks a large job (data) into smaller pieces (MAP) and then merges the partial results into a final result (REDUCE).

The Map-Reduce provided by MongoDB is very flexible and useful for large-scale data analysis.

3.39.1. MapReduce command

The following is the basic syntax of MapReduce:

>db.collection.mapReduce(
   function() { emit(key, value); },                 // map function
   function(key, values) { return reduceFunction },  // reduce function
   {
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)

Using MapReduce requires implementing two functions, the map function and the reduce function. The map function is applied to every document in the collection that matches the query and calls emit(key, value); the emitted values are grouped by key and passed to the reduce function for processing.

The map function must call emit(key, value) to return a key-value pair.
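The flow described above can be modelled in plain JavaScript, outside MongoDB. This is only an illustrative sketch: the helper `runMapReduce` and the sample documents are hypothetical, and a real server may call reduce several times per key on partial results, which this model does not do.

```javascript
// Illustrative in-memory model of the map -> group -> reduce flow.
// runMapReduce is a hypothetical helper, not part of MongoDB.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values

  // The shell provides emit() as a global; mimic that for the map function.
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc); // the document is bound to `this`
  delete globalThis.emit;

  // Turn each key's array of values into a single value.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Hypothetical sample documents.
const docs = [
  { tag: "a", n: 2 },
  { tag: "b", n: 5 },
  { tag: "a", n: 3 },
];

const totals = runMapReduce(
  docs,
  function () { emit(this.tag, this.n); },
  function (key, values) { return values.reduce((sum, v) => sum + v, 0); }
);
console.log(totals);
```

The reduce function here sums with `Array.prototype.reduce` because `Array.sum`, used later in this section, is a helper of the mongo shell rather than standard JavaScript.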

Parameter description:

  • map: the mapping function. It generates a sequence of key-value pairs that serve as input to the reduce function.

  • reduce: the aggregation function. Its task is to turn each key and its array of values into a key with a single value.

  • out: the collection in which the results are stored. Current MongoDB releases require this option.

  • query: a filter condition. Only documents that meet the criteria are passed to the map function. (query, limit, and sort can be combined freely.)

  • sort: sorts the documents before they are sent to the map function. Combined with limit, it can optimize the grouping mechanism.

  • limit: the upper limit on the number of documents sent to the map function. (Without limit, using sort alone is of little use.)
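As a rough illustration of how query, sort, and limit together decide which documents reach the map function, the following plain JavaScript sketch applies them in that order to an in-memory array. The helper `selectInput` and the sample data are hypothetical, and only equality filters and single-field sorts are modelled.

```javascript
// Hypothetical helper modelling how query, sort and limit choose the
// documents that are handed to the map function.
function selectInput(docs, { query = {}, sort = null, limit = Infinity } = {}) {
  // query: keep documents whose fields equal every value in the filter
  const selected = docs.filter((doc) =>
    Object.entries(query).every(([field, expected]) => doc[field] === expected)
  );

  // sort: order by one field, 1 = ascending, -1 = descending
  if (sort) {
    const [[field, direction]] = Object.entries(sort);
    selected.sort((a, b) =>
      a[field] < b[field] ? -direction : a[field] > b[field] ? direction : 0
    );
  }

  // limit: cap the number of documents passed on to map
  return selected.slice(0, limit);
}

// Hypothetical sample documents.
const sampleOrders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

const input = selectInput(sampleOrders, {
  query: { status: "A" },
  sort: { amount: -1 },
  limit: 2,
});
console.log(input.map((order) => order.amount)); // [ 500, 250 ]
```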

The following example finds the documents with status: “A” in the orders collection, groups them by cust_id, and calculates the sum of amount for each group.

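[Figure: map-reduce on the orders collection. Documents with status: “A” are mapped to (cust_id, amount) pairs, which are reduced to one total per cust_id.]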
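The three stages of this example can be traced in plain JavaScript. This is a sketch under assumptions: the sample orders are hypothetical, and the output collection name `order_totals` in the comment is an illustrative choice, not something given in the text.

```javascript
// In the mongo shell, the operation described above would look roughly like:
//   db.orders.mapReduce(
//      function() { emit(this.cust_id, this.amount); },
//      function(key, values) { return Array.sum(values) },
//      { query: { status: "A" }, out: "order_totals" }   // out name assumed
//   )
// The code below traces the same stages on an in-memory array.

// Hypothetical sample data for the orders collection.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// Stage 1 (query): keep only orders with status "A".
const matched = orders.filter((order) => order.status === "A");

// Stage 2 (map): produce one (cust_id, amount) pair per matching order.
const pairs = matched.map((order) => [order.cust_id, order.amount]);

// Stage 3 (reduce): sum the amounts that share a cust_id.
const totalsByCustomer = {};
for (const [custId, amount] of pairs) {
  totalsByCustomer[custId] = (totalsByCustomer[custId] || 0) + amount;
}

console.log(totalsByCustomer); // { A123: 750, B212: 200 }
```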

3.39.2. Using MapReduce

Consider the following documents, which store users’ articles. Each document holds the user’s user_name and the article’s status field:

>db.posts.insert({ "post_text": "Runoob Tutorial, the most complete technical documentation.", "user_name": "mark", "status": "active" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "Runoob Tutorial, the most complete technical documentation.", "user_name": "mark", "status": "active" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "Runoob Tutorial, the most complete technical documentation.", "user_name": "mark", "status": "active" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "Runoob Tutorial, the most complete technical documentation.", "user_name": "mark", "status": "active" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "Runoob Tutorial, the most complete technical documentation.", "user_name": "mark", "status": "disabled" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "Runoob Tutorial, the most complete technical documentation.", "user_name": "runoob", "status": "disabled" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "Runoob Tutorial, the most complete technical documentation.", "user_name": "runoob", "status": "disabled" })
WriteResult({ "nInserted" : 1 })
>db.posts.insert({ "post_text": "Runoob Tutorial, the most complete technical documentation.", "user_name": "runoob", "status": "active" })
WriteResult({ "nInserted" : 1 })

Now we will use the mapReduce function on the posts collection to select the published articles (status: “active”), group them by user_name, and calculate the number of articles per user:

>db.posts.mapReduce(
   function() { emit(this.user_name, 1); },
   function(key, values) { return Array.sum(values) },
   {
      query: { status: "active" },
      out: "post_total"
   }
)

The above mapReduce output is as follows:

{
   "result" : "post_total",
   "timeMillis" : 23,
   "counts" : {
      "input" : 5,
      "emit" : 5,
      "reduce" : 1,
      "output" : 2
   },
   "ok" : 1
}

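The results show that five documents match the query (status: “active”), so the map function emits five key-value pairs. The reduce function then combines the values that share a key, leaving two groups, one per user. The reduce count is 1 because only mark has more than one emitted value; MongoDB does not call reduce for a key with a single value, such as runoob here.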
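To see where these counters come from, the following plain JavaScript sketch tallies them for the same eight posts. It is a simplified model: `countMapReduce` and `post` are hypothetical helpers, only equality filters are supported, and reduce is called at most once per key, whereas a real server may call it repeatedly on partial results.

```javascript
// Hypothetical model that tallies the counters reported by mapReduce.
function countMapReduce(docs, query, map, reduce) {
  const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
  const groups = new Map(); // key -> array of emitted values

  // Mimic the shell's global emit() and count every call.
  globalThis.emit = (key, value) => {
    counts.emit += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    const matches = Object.entries(query).every(([f, v]) => doc[f] === v);
    if (!matches) continue;
    counts.input += 1; // document passed to the map function
    map.call(doc);
  }
  delete globalThis.emit;

  for (const [key, values] of groups) {
    // reduce is skipped for keys that have a single value
    if (values.length > 1) {
      reduce(key, values);
      counts.reduce += 1;
    }
    counts.output += 1; // one result document per key
  }
  return counts;
}

// The eight posts from the example, without the post_text field.
const post = (user_name, status) => ({ user_name, status });
const posts = [
  post("mark", "active"), post("mark", "active"),
  post("mark", "active"), post("mark", "active"),
  post("mark", "disabled"),
  post("runoob", "disabled"), post("runoob", "disabled"),
  post("runoob", "active"),
];

const counts = countMapReduce(
  posts,
  { status: "active" },
  function () { emit(this.user_name, 1); },
  function (key, values) { return values.reduce((sum, v) => sum + v, 0); }
);
console.log(counts); // { input: 5, emit: 5, reduce: 1, output: 2 }
```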

  • result: the name of the collection that stores the results, here post_total. Because out names a collection, the results remain available after the connection is closed.

  • timeMillis: the time taken to execute, in milliseconds

  • input: the number of documents that meet the condition and are sent to the map function

  • emit: the number of times emit is called in the map function, that is, the total number of key-value pairs produced

  • reduce: the number of times the reduce function is called

  • output: the number of documents in the result set (the counts are very helpful for debugging)

  • ok: whether the operation succeeded; 1 means success

  • err: if the operation fails, the reason may be reported here, although in practice the message is often vague and of little use.

Use the find operator to view the results of the mapReduce query:

> var map = function() { emit(this.user_name, 1); }
> var reduce = function(key, values) { return Array.sum(values) }
> var options = { query: { status: "active" }, out: "post_total" }
> db.posts.mapReduce(map, reduce, options)
{ "result" : "post_total", "ok" : 1 }
> db.post_total.find();

The above query shows the following results:

{ "_id" : "mark", "value" : 4 }
{ "_id" : "runoob", "value" : 1 }

In a similar manner, MapReduce can be used to build large and complex aggregate queries.

The map and reduce functions are written in JavaScript, which makes MapReduce very flexible and powerful.
