Notes on Simulating MapReduce with Python

2017-12-16 10:20:32 Source: oschina Author: ZhangYuXia


This example is simple, but it is a quick way to get a feel for what Map and Reduce actually do.


The goal is to find the maximum value (the temp field extracted from each record) for each year.


[zhangyuxia@hadoop234 ~]$ cat test.dat
19500515070049999999N9+00001+99999999999
19500515120049999999N9+00221+99999999999
19500515180049999999N9-00111+99999999999
19490324120040500001N9+01111+99999999999
19490324180040500001N9+00781+99999999999
19500515070049999999N9+00001+99999999999
19500515120049999999N9+00221+99999999999
19500515180049999999N9-00111+99999999999
19490324120040500001N9+01111+99999999999
19490324180040500001N9+00781+99999999999


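[zhangyuxia@hadoop234 ~]$ cat map.py
import sys

for line in sys.stdin:  # read the input line by line
    val = line.strip()
    # this extraction runs once for every line:
    # characters 0-3 are the year, characters 23-27 are the value
    (year, temp) = (val[0:4], val[23:28])
    # emit the key and the value separated by a tab
    print("%s\t%s" % (year, temp))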


[zhangyuxia@hadoop234 ~]$ cat test.dat|python map.py
1950	00001
1950	00221
1950	00111
1949	01111
1949	00781
1950	00001
1950	00221
1950	00111
1949	01111
1949	00781
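The slicing used by map.py can be checked on a single record. The sketch below wraps the same two slices in a function; the name parse_record is hypothetical and is not part of the original script. The value slice starts at index 23, just after the sign character at index 22, so the sign is not carried over: a reading of -00111 comes out as 00111, which matches the third line of the output above.

```python
def parse_record(line):
    """Return (year, temp) using the same slices as map.py.

    parse_record is a hypothetical helper name used only for illustration.
    """
    val = line.strip()
    year = val[0:4]    # first four characters: the year
    temp = val[23:28]  # five digits that follow the sign at index 22
    return year, temp


# One record copied from test.dat; note the '-' sign at index 22.
sample = "19500515180049999999N9-00111+99999999999"
year, temp = parse_record(sample)
print("%s\t%s" % (year, temp))
```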


Sort the map output so that all records with the same year end up next to each other


[zhangyuxia@hadoop234 ~]$ cat test.dat|python map.py|sort
1949	00781
1949	00781
1949	01111
1949	01111
1950	00001
1950	00001
1950	00111
1950	00111
1950	00221
1950	00221
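Why the sort matters can be shown with itertools.groupby from the standard library, which, like a reducer that compares each key with the previous one, only groups consecutive equal keys. This is a minimal sketch: the helper name max_per_run is made up for this illustration, and the pairs list reproduces the ten lines of unsorted map output shown earlier.

```python
from itertools import groupby


def max_per_run(pairs):
    """Return (key, max value) for each run of consecutive equal keys.

    max_per_run is a hypothetical helper, not part of the original scripts.
    """
    return [(key, max(int(value) for _, value in group))
            for key, group in groupby(pairs, key=lambda kv: kv[0])]


# The five distinct map outputs, repeated twice as in test.dat.
pairs = [("1950", "00001"), ("1950", "00221"), ("1950", "00111"),
         ("1949", "01111"), ("1949", "00781")] * 2

# Unsorted: each year appears in two separate runs, so it is reported twice.
print(max_per_run(pairs))

# Sorted: each year forms exactly one run, so it is reported once.
print(max_per_run(sorted(pairs)))
```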


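[zhangyuxia@hadoop234 ~]$ cat reduce.py
import sys

(last_key, max_val) = (None, 0)
for line in sys.stdin:  # read the sorted map output line by line
    (key, val) = line.strip().split('\t')
    if last_key and last_key != key:
        # the key changed: print the finished year and start a new one
        print('%s\t%s' % (last_key, max_val))
        (last_key, max_val) = (key, int(val))
    else:
        # same key (or the very first line): keep the running maximum
        (last_key, max_val) = (key, max(max_val, int(val)))
# print the result for the last year
if last_key:
    print('%s\t%s' % (last_key, max_val))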


[zhangyuxia@hadoop234 ~]$ cat test.dat|python map.py|sort|python reduce.py
1949	1111
1950	221
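The same pipeline can also be simulated inside a single Python process, which makes the three stages explicit. This is only a sketch under stated assumptions: the function names map_phase, shuffle_sort and reduce_phase are hypothetical, the records list holds the five distinct lines of test.dat, and sorted() stands in for the shell sort command.

```python
def map_phase(lines):
    """Yield (year, temp) for each record, using the slices from map.py."""
    for line in lines:
        val = line.strip()
        yield val[0:4], val[23:28]


def shuffle_sort(pairs):
    """Stand-in for the `sort` step: bring equal keys next to each other."""
    return sorted(pairs)


def reduce_phase(sorted_pairs):
    """Yield (year, max value), following the last_key logic of reduce.py."""
    last_key, max_val = None, 0
    for key, val in sorted_pairs:
        if last_key and last_key != key:
            # key changed: emit the finished year, start the next one
            yield last_key, max_val
            last_key, max_val = key, int(val)
        else:
            last_key, max_val = key, max(max_val, int(val))
    if last_key:
        yield last_key, max_val


# The five distinct records from test.dat.
records = [
    "19500515070049999999N9+00001+99999999999",
    "19500515120049999999N9+00221+99999999999",
    "19500515180049999999N9-00111+99999999999",
    "19490324120040500001N9+01111+99999999999",
    "19490324180040500001N9+00781+99999999999",
]

for year, max_temp in reduce_phase(shuffle_sort(map_phase(records))):
    print("%s\t%s" % (year, max_temp))
```

Each stage only consumes the output of the previous one, which mirrors how the shell pipeline chains map.py, sort and reduce.py.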
