3.39. MongoDB Map Reduce

Map-Reduce is a computing model that breaks a large amount of work (data) into pieces (MAP) and then merges the partial results into a final result (REDUCE).

MongoDB's Map-Reduce is flexible and well suited to large-scale data analysis.

3.39.1. MapReduce command

The following is the basic syntax of MapReduce:

>db.collection.mapReduce(
   function() { emit(key, value); },                 // map function
   function(key, values) { return reduceFunction },  // reduce function
   {
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)

To use MapReduce you implement two functions: the Map function and the Reduce function. MongoDB runs the Map function on every document in the collection; the Map function calls emit(key, value), and the emitted values are grouped by key and passed to the Reduce function for processing.

The Map function returns its key-value pairs by calling emit(key, value); it may call emit any number of times for a document.
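
The data flow described above can be sketched in plain JavaScript, outside MongoDB. This is only an illustration under simplifying assumptions: simulateMapReduce is a hypothetical helper rather than a MongoDB API, emit is passed in as an argument instead of being a global, and the sample documents are invented.

```javascript
// Hypothetical helper (not a MongoDB API): mimics map -> group by key -> reduce.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values

  // emit() collects the key-value pairs produced by the map function.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // As in MongoDB, map runs with `this` bound to the current document.
  for (const doc of docs) map.call(doc, emit);

  // reduce collapses each values array into a single value.
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduce(key, values) });
  }
  return results;
}

// Sample data invented for this sketch.
const docs = [
  { word: "a", n: 1 },
  { word: "b", n: 2 },
  { word: "a", n: 3 },
];

const totals = simulateMapReduce(
  docs,
  function (emit) { emit(this.word, this.n); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(totals); // [ { _id: 'a', value: 4 }, { _id: 'b', value: 2 } ]
```

The two values emitted under the key "a" arrive at reduce together as one array, which is the grouping step that MongoDB performs between the two functions.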

Parameter description:

map The mapping function (generates a sequence of key-value pairs that serve as input to the reduce function).
reduce The statistics function. Its task is to turn a key with its array of values (key, values) into a single key-value pair, that is, to collapse the values array into a single value.
out The collection in which the results are stored (in early MongoDB versions, if it was not specified, a temporary collection was used and automatically deleted when the client disconnected; current versions expect out to be given).
query A filter condition; only documents that match it are passed to the map function (query, limit, and sort can be combined freely).
sort Sorts the documents before they are sent to the map function; combined with limit, it can optimize the grouping mechanism.
limit The maximum number of documents sent to the map function (without limit, sort on its own is of little use).
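
How query, sort, and limit narrow the input before the map function runs can be sketched as follows. This is a plain-JavaScript approximation, not MongoDB code: selectInput is a hypothetical helper, the query is assumed to be a flat equality filter, the sort is assumed to use a single field, and the likes field is invented for the example.

```javascript
// Hypothetical helper: selects the documents that would reach the map function.
// Assumes a flat equality query and a single-field sort (1 = ascending, -1 = descending).
function selectInput(docs, { query = {}, sort = null, limit = Infinity } = {}) {
  // 1. query: keep only documents whose fields equal the requested values.
  let selected = docs.filter((doc) =>
    Object.entries(query).every(([field, expected]) => doc[field] === expected)
  );

  // 2. sort: order the remaining documents before they are handed to map.
  if (sort) {
    const [[field, direction]] = Object.entries(sort);
    selected = [...selected].sort((a, b) =>
      a[field] < b[field] ? -direction : a[field] > b[field] ? direction : 0
    );
  }

  // 3. limit: cap the number of documents sent to map.
  return selected.slice(0, limit);
}

// Sample data invented for this sketch.
const posts = [
  { user_name: "mark", status: "active", likes: 3 },
  { user_name: "runoob", status: "active", likes: 9 },
  { user_name: "mark", status: "disabled", likes: 7 },
  { user_name: "mark", status: "active", likes: 5 },
];

const input = selectInput(posts, {
  query: { status: "active" },
  sort: { likes: -1 },
  limit: 2,
});
console.log(input.map((p) => p.likes)); // [ 9, 5 ]
```

With sort and limit together, only the two most-liked active posts reach map; with sort alone, every active post would still be processed, just in a different order.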

A typical example selects the documents with status: "A" in the orders collection, groups them by cust_id, and calculates the sum of amount for each group.
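
The orders scenario can be traced step by step in plain JavaScript. The sample orders below are invented, and the shell call in the trailing comment is only a guess at how the job would look, with "order_totals" as a hypothetical output collection name.

```javascript
// Sample orders invented for this sketch; field names follow the description above.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// query: { status: "A" } -> only matching documents reach the map step.
const matching = orders.filter((order) => order.status === "A");

// map: emit(this.cust_id, this.amount) -> collect the amounts per customer.
const amountsByCustomer = new Map();
for (const order of matching) {
  const amounts = amountsByCustomer.get(order.cust_id) ?? [];
  amounts.push(order.amount);
  amountsByCustomer.set(order.cust_id, amounts);
}

// reduce: sum each customer's amounts into a single value.
const orderTotals = [...amountsByCustomer].map(([custId, amounts]) => ({
  _id: custId,
  value: amounts.reduce((sum, amount) => sum + amount, 0),
}));

console.log(orderTotals); // [ { _id: 'A123', value: 750 }, { _id: 'B212', value: 200 } ]

// In the mongo shell, the same job would presumably be written as:
//
// db.orders.mapReduce(
//    function() { emit(this.cust_id, this.amount); },
//    function(key, values) { return Array.sum(values); },
//    { query: { status: "A" }, out: "order_totals" }
// )
```

The order with status "D" never reaches the map step, so it does not contribute to the total for A123.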

Use MapReduce

Consider the following documents, which store users' articles; each document records the author's user_name and the article's status:

>db.posts.insert({
   "post_text": "Runoob tutorial, the most complete technical documentation.",
   "user_name": "mark",
   "status": "active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob tutorial, the most complete technical documentation.",
   "user_name": "mark",
   "status": "active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob tutorial, the most complete technical documentation.",
   "user_name": "mark",
   "status": "active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob tutorial, the most complete technical documentation.",
   "user_name": "mark",
   "status": "active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob tutorial, the most complete technical documentation.",
   "user_name": "mark",
   "status": "disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob tutorial, the most complete technical documentation.",
   "user_name": "runoob",
   "status": "disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob tutorial, the most complete technical documentation.",
   "user_name": "runoob",
   "status": "disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob tutorial, the most complete technical documentation.",
   "user_name": "runoob",
   "status": "active"
})
WriteResult({ "nInserted" : 1 })

Now we use the mapReduce function on the posts collection to select the published articles (status: "active"), group them by user_name, and count the number of articles per user:

>db.posts.mapReduce(
   function() { emit(this.user_name, 1); },
   function(key, values) { return Array.sum(values) },
   {
      query: { status: "active" },
      out: "post_total"
   }
)

The above mapReduce output is as follows:

{
   "result" : "post_total",
   "timeMillis" : 23,
   "counts" : {
      "input" : 5,
      "emit" : 5,
      "reduce" : 1,
      "output" : 2
   },
   "ok" : 1
}

The result shows that five documents match the query (status: "active"), so the map function generated five key-value pairs. These pairs were then grouped by key into two groups, one per user_name, which is why output is 2.

result: the name of the collection that stores the results, here post_total as given in out. A collection named through out persists after the connection is closed; only a temporary result collection is deleted automatically at that point.
timeMillis: the execution time in milliseconds.
input: the number of documents that matched the query and were sent to the map function.
emit: the number of times emit was called in the map function, that is, the total number of key-value pairs generated.
reduce: the number of times the reduce function was called.
output: the number of documents in the result collection (the counts are very helpful for debugging).
ok: whether the operation succeeded; 1 means success.
err: on failure, this may contain the reason, although in practice the messages tend to be vague and of little use.
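
The counts reported above can be reproduced by hand with a small plain-JavaScript sketch over the eight posts inserted earlier. It assumes that reduce runs once for a key with several values and is skipped for a key with a single value; on a real server reduce may be called more often for large inputs, so the reduce count in particular is only indicative.

```javascript
// The eight posts inserted above, reduced to the fields the job uses.
const posts = [
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "disabled" },
  { user_name: "runoob", status: "disabled" },
  { user_name: "runoob", status: "disabled" },
  { user_name: "runoob", status: "active" },
];

const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
const groups = new Map(); // user_name -> array of emitted values

for (const post of posts) {
  if (post.status !== "active") continue; // query: { status: "active" }
  counts.input += 1; // document sent to map

  // map: emit(this.user_name, 1)
  const values = groups.get(post.user_name) ?? [];
  values.push(1);
  groups.set(post.user_name, values);
  counts.emit += 1;
}

const results = [];
for (const [key, values] of groups) {
  let value = values[0];
  if (values.length > 1) {
    // Assumption of this sketch: reduce is only needed when a key has several values.
    value = values.reduce((sum, v) => sum + v, 0);
    counts.reduce += 1;
  }
  results.push({ _id: key, value });
  counts.output += 1;
}

console.log(counts);  // { input: 5, emit: 5, reduce: 1, output: 2 }
console.log(results); // [ { _id: 'mark', value: 4 }, { _id: 'runoob', value: 1 } ]
```

Under that assumption, reduce is 1 because only mark has more than one emitted value; the single value emitted for runoob is stored as is.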

Use find to view the results of the mapReduce query:

> var map = function() { emit(this.user_name, 1); }
> var reduce = function(key, values) { return Array.sum(values) }
> var options = { query: { status: "active" }, out: "post_total" }
> db.posts.mapReduce(map, reduce, options)
{ "result" : "post_total", "ok" : 1 }
> db.post_total.find();

The above query shows the following results:

{ "_id" : "mark", "value" : 4 }
{ "_id" : "runoob", "value" : 1 }

In a similar manner, MapReduce can be used to build large and complex aggregate queries.

The Map and Reduce functions are written in JavaScript, which makes MapReduce very flexible and powerful.
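
That flexibility comes with one constraint worth illustrating: MongoDB may call reduce more than once for the same key and feed an earlier result back in as one of the values, so the returned value should have the same shape as the emitted values. The sketch below is plain JavaScript with invented sample values; reduceStats is a hypothetical reduce function that tracks a count and a total so that an average can be derived afterwards.

```javascript
// Hypothetical reduce function: returns a value of the same shape as the
// emitted values, so its output can safely appear in `values` on a later call.
function reduceStats(key, values) {
  const merged = { count: 0, total: 0 };
  for (const v of values) {
    merged.count += v.count;
    merged.total += v.total;
  }
  return merged;
}

// Values that map might emit for one key (invented sample data).
const emitted = [
  { count: 1, total: 10 },
  { count: 1, total: 20 },
  { count: 1, total: 60 },
];

// Reducing everything at once...
const allAtOnce = reduceStats("mark", emitted);

// ...should give the same answer as reducing in two stages.
const partial = reduceStats("mark", emitted.slice(0, 2));
const inStages = reduceStats("mark", [partial, emitted[2]]);

console.log(allAtOnce, inStages);
console.log(allAtOnce.total / allAtOnce.count); // average derived after reducing
```

A reduce function that returned the average directly would not survive the two-stage call, because an average of averages generally differs from the average of the original values.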