Querying git log by specific conditions and collecting statistics on changed lines of code

(Editor: jimmy  Date: 2024/11/13)

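Preface

With age and accumulated knowledge, I have lately often had moments of sudden clarity, or a new understanding of something that has been around for a long time. Take the word "statistics": I ran into it long ago and never thought about what it means. It appears in the title of this write-up, and my first instinct was that it means something like summarizing and recording. When I later looked up the definition, it said much the same. I never deliberately learned the word, yet daily life had quietly taught me what it means.

Statistics need a data source, and git log provides exactly that. git log is a command that displays the log, and the log records how the repository has changed, much like a "history book" of the code base. Describing history takes a lot of data; to count changed lines of code, we only need to find the part of the history we want to measure.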

git log

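Before collecting statistics we need to organize the data. Messy data can still be counted, but the calculation is more troublesome, so the data should be normalized first. That is why we start with the relevant git log operations.

We use the open-source redis repository as the example: switch to the 6.0 branch, with HEAD at commit 7bf665f125a4771db095c83a7ad6ed46692cd314. With that as the data source, let's go through the common ways of querying git log. Querying with different conditions is really the process of organizing and classifying the data.

git log can be used in many ways; we mainly care about two categories: condition filtering and display format.

Condition filtering

git log has a great many filtering options. They narrow the range of commits shown, so you can find exactly the commits you want to display.

Query the most recent log entries

Use the -<number> option (here -3) to show only the most recent commits: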

$ git log -3
commit 7bf665f125a4771db095c83a7ad6ed46692cd314 (HEAD -> 6.0, tag: 6.0.6, origin/6.0)
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 14:00:20 2020 +0300

 Redis 6.0.6.

commit a5696bdf4f2687ab45f633ccb7cdc4ee9c2f957d
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 15:33:21 2020 +0300

 Run daily CI on PRs to release a branch

commit e15528bf1da1f1232fd08801ad382c915be94662
Author: Itamar Haber <itamar@redislabs.com>
Date: Thu Jul 16 21:31:36 2020 +0300

 Adds SHA256SUM to redis-stable tarball upload

 (cherry picked from commit 5df0a64d30e7815c0a4a75a80f165fdee0bd1db6)

Query commits by a specific author

Use the --author option to show commits by a specific author:

$ git log -2 --author='Oran Agra'
commit 7bf665f125a4771db095c83a7ad6ed46692cd314 (HEAD -> 6.0, tag: 6.0.6, origin/6.0)
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 14:00:20 2020 +0300

 Redis 6.0.6.

commit a5696bdf4f2687ab45f633ccb7cdc4ee9c2f957d
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 15:33:21 2020 +0300

 Run daily CI on PRs to release a branch
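
The value given to --author is a pattern matched against the author's "Name <email>" line, so part of a name or part of an email address both work. Here is a minimal sketch of that, run in a throwaway repository rather than in redis; the helper commit_as, the author names and the email addresses are all made up for illustration.

```shell
# Build a throwaway repository so the example does not touch real history.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

# Hypothetical helper: create an empty commit with a given author identity.
commit_as() {
  git -c user.name="$1" -c user.email="$2" commit -q --allow-empty -m "$3"
}
commit_as "Alice Example" "alice@example.com" "first change"
commit_as "Bob Example"   "bob@example.org"   "second change"
commit_as "Alice Example" "alice@example.com" "third change"

# Match on part of the name.
by_name=$(git log --author='Alice' --format=%s)
# Match on part of the email address (the pattern is a regular expression).
by_mail=$(git log --author='@example\.org' --format=%s)

printf 'by name:\n%s\n' "$by_name"
printf 'by mail:\n%s\n' "$by_mail"
```

Matching on the email address is handy when the same person has committed under several spellings of their name.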

Query the log for a time range

There are several options here, such as --since, --until, --before and --after, and their names make the usage easy to guess:

Commits from 2020-01-01 to 2020-04-01

$ git log -2 --after=2020-01-01 --before=2020-04-01
commit 957e917a84ac9979f18145a4d0b53386f5ce4fd9 (tag: 6.0-rc3)
Author: antirez <antirez@gmail.com>
Date: Tue Mar 31 17:56:04 2020 +0200

 Redis 6.0-RC3.

commit ef1b1f01a84e969ea368e7fdbaf0d10615743269
Author: antirez <antirez@gmail.com>
Date: Tue Mar 31 17:41:23 2020 +0200

 cast raxSize() to avoid warning with format spec.

We happened to catch commits by the original author~

Commits from more than 1 year ago

$ git log -2 --until=1.year.ago
commit 86aade9a024c3582665903d0cc0c5692c6677cfd
Merge: 89ad0ca56 3bfcae247
Author: Salvatore Sanfilippo <antirez@gmail.com>
Date: Thu Sep 5 13:30:26 2019 +0200

 Merge pull request #6364 from oranagra/fix_module_aux_when

 Fix to module aux data rdb format for backwards compatibility with old check-rdb

commit 3bfcae247a1c51788940bd4d2f32751ead451e42
Author: Oran Agra <oran@redislabs.com>
Date: Thu Sep 5 14:11:37 2019 +0300

 Fix to module aux data rdb format for backwards compatibility with old check-rdb

 When implementing the code that saves and loads these aux fields we used rdb
 format that was added for that in redis 5.0, but then we added the 'when' field
 which meant that the old redis-check-rdb won't be able to skip these.
 this fix adds an opcode as if that 'when' is part of the module data.
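
--since is a synonym of --after and --until is a synonym of --before, and the date they compare against is the commit (committer) date. The sketch below checks the window filter in a throwaway repository with fixed dates; the helper commit_on and the commit messages are made up for illustration, and the dates are deliberately far from the window boundaries.

```shell
# Build a throwaway repository so the example does not touch real history.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

# Hypothetical helper: empty commit with a fixed author and committer date.
commit_on() {
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
    git -c user.name="Example" -c user.email="dev@example.com" \
        commit -q --allow-empty -m "$2"
}
commit_on "2019-12-15 12:00:00 +0000" "before the window"
commit_on "2020-02-10 12:00:00 +0000" "inside the window"
commit_on "2020-05-20 12:00:00 +0000" "after the window"

# Same meaning as --after=2020-01-01 --before=2020-04-01.
in_window=$(git log --since=2020-01-01 --until=2020-04-01 --format=%s)
# Everything older than the start of the window.
older=$(git log --until=2020-01-01 --format=%s)

printf 'in window: %s\n' "$in_window"
printf 'older:     %s\n' "$older"
```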

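Query commits whose message contains specific text

Use the --grep option to filter for commits that contain the given text. The match is made against the commit message; for example, to find commits whose message contains client: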

$ git log -2 --grep='client'
commit 0f75036c07db48dfcf605e090216a4447edc38fc
Author: Wen Hui <wen.hui.ware@gmail.com>
Date: Wed Jul 15 05:38:47 2020 -0400

 correct error msg for num connections reaching maxclients in cluster mode (#7444)


 (cherry picked from commit d85af4d6f5fbe9cb9787b81583627cd74b47f838)

commit f89f50dbd06247677b8cb3927cbb88c1b5384061
Author: Oran Agra <oran@redislabs.com>
Date: Tue Jul 14 20:21:59 2020 +0300

 diskless master disconnect replicas when rdb child failed (#7518)

 in case the rdb child failed, crashed or terminated unexpectedly redis
 would have marked the replica clients with repl_put_online_on_ack and
 then kill them only after a minute when no ack was received.

 it would not stream anything to these connections, so the only effect of
 this bug is a delay of 1 minute in the replicas attempt to re-connect.

 (cherry picked from commit a176cb56a3c0235adddde33fcbaee2369a5af73e)
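
By default --grep is case-sensitive, which is why client matches maxclients but would miss Client. Adding -i (long form --regexp-ignore-case) relaxes that. A small sketch in a throwaway repository, with a made-up helper commit_msg and made-up commit messages:

```shell
# Build a throwaway repository so the example does not touch real history.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

# Hypothetical helper: empty commit with the given message.
commit_msg() {
  git -c user.name="Example" -c user.email="dev@example.com" \
      commit -q --allow-empty -m "$1"
}
commit_msg "Fix Client timeout handling"
commit_msg "Update build scripts"
commit_msg "correct error msg for maxclients"

# Case-sensitive: only the lowercase "client" inside "maxclients" matches.
case_sensitive=$(git log --grep='client' --format=%s)
# Case-insensitive: "Client" matches as well.
ignore_case=$(git log -i --grep='client' --format=%s)

printf 'case-sensitive:\n%s\n' "$case_sensitive"
printf 'ignore case:\n%s\n' "$ignore_case"
```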

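Query commits on a specific branch

By default git log shows the commits of the current branch. To query another branch, just append the branch name to the command; for example, commits on the arm branch: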

$ git log -2 arm
commit 7329cc39818a05c168e7d1e791afb03c089f1933 (origin/arm, arm)
Author: Salvatore Sanfilippo <antirez@gmail.com>
Date: Sun Feb 19 15:07:08 2017 +0000

 ARM: Avoid fast path for BITOP.

 GCC will produce certain unaligned multi load-store instructions
 that will be trapped by the Linux kernel since ARM v6 cannot
 handle them with unaligned addresses. Better to use the slower
 but safer implementation instead of generating the exception which
 should be anyway very slow.

commit 4e9cf4cc7ed4b732fc4bb592f19ceb41d132954e
Author: Salvatore Sanfilippo <antirez@gmail.com>
Date: Sun Feb 19 15:02:37 2017 +0000

 ARM: Use libc malloc by default.

 I'm not sure how much test Jemalloc gets on ARM, moreover
 compiling Redis with Jemalloc support in not very powerful
 devices, like most ARMs people will build Redis on, is extremely
 slow. It is possible to enable Jemalloc build anyway if needed
 by using "make MALLOC=jemalloc".

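In git, branch names, commit ids and tags all ultimately name a commit, so in many situations they are interchangeable. Appending a branch name to git log shows that branch's commits; appending a commit id shows that commit and the history leading up to it; appending a tag shows the tagged commit and the history leading up to it. Let's try a commit id: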

$ git log -2 7329cc39818a05c168e7d1e791afb03c089f1933
commit 7329cc39818a05c168e7d1e791afb03c089f1933 (origin/arm, arm)
Author: Salvatore Sanfilippo <antirez@gmail.com>
Date: Sun Feb 19 15:07:08 2017 +0000

 ARM: Avoid fast path for BITOP.

 GCC will produce certain unaligned multi load-store instructions
 that will be trapped by the Linux kernel since ARM v6 cannot
 handle them with unaligned addresses. Better to use the slower
 but safer implementation instead of generating the exception which
 should be anyway very slow.

commit 4e9cf4cc7ed4b732fc4bb592f19ceb41d132954e
Author: Salvatore Sanfilippo <antirez@gmail.com>
Date: Sun Feb 19 15:02:37 2017 +0000

 ARM: Use libc malloc by default.

 I'm not sure how much test Jemalloc gets on ARM, moreover
 compiling Redis with Jemalloc support in not very powerful
 devices, like most ARMs people will build Redis on, is extremely
 slow. It is possible to enable Jemalloc build anyway if needed
 by using "make MALLOC=jemalloc".

Because this commit id is the latest commit on the arm branch, the command is equivalent to git log -2 arm.
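
One way to see why the two commands are equivalent is git rev-parse, which prints the commit id that a name resolves to. The sketch below uses a throwaway repository; the branch name main, the tag v1 and the helper commit_msg are assumptions for the example, and the tag is a lightweight one (an annotated tag would resolve to a tag object first).

```shell
# Build a throwaway repository so the example does not touch real history.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"

# Hypothetical helper: empty commit with the given message.
commit_msg() {
  git -c user.name="Example" -c user.email="dev@example.com" \
      commit -q --allow-empty -m "$1"
}
commit_msg "first"
commit_msg "second"
git tag v1                              # lightweight tag on the latest commit

# A branch name, a tag and HEAD all resolve to the same commit id here.
branch_id=$(git rev-parse main)
tag_id=$(git rev-parse v1)
head_id=$(git rev-parse HEAD)

# So git log gives the same history whichever name is used.
log_by_branch=$(git log --format=%H main)
log_by_id=$(git log --format=%H "$branch_id")

printf 'main -> %s\nv1   -> %s\nHEAD -> %s\n' "$branch_id" "$tag_id" "$head_id"
```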

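Query commits between two specific commits

To query the commits between two commits, put the two commit ids after the command, joined by .., in the form git log commit1..commit2. Note that the resulting list does not include commit1: it lists the commits reachable from commit2 but not from commit1, in other words the changes committed after commit1.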

$ git log e15528bf1da1f1232fd08801ad382c915be94662..7bf665f125a4771db095c83a7ad6ed46692cd314
commit 7bf665f125a4771db095c83a7ad6ed46692cd314 (HEAD -> 6.0, tag: 6.0.6, origin/6.0)
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 14:00:20 2020 +0300

 Redis 6.0.6.

commit a5696bdf4f2687ab45f633ccb7cdc4ee9c2f957d
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 15:33:21 2020 +0300

 Run daily CI on PRs to release a branch

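One use of this feature is checking, before a merge, which commits the merge would bring in. git log A..B lists the commits that are on B but not on A, so git log feature..dev lists the commits that dev would contribute if it were merged into feature; for the opposite direction, merging feature into dev, use git log dev..feature.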

$ git log 6.0..unstable
commit 324e22accf457edc996971bc97f5474349cd7c4c (unstable)
Author: antirez <antirez@gmail.com>
Date: Fri Dec 20 12:29:02 2019 +0100

 Fix ip and missing mode in RM_GetClusterNodeInfo().
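
The direction of the two dots is easy to get backwards, so here is a minimal sketch in a throwaway repository. The branch names main and feature, the helper commit_msg and the commit messages are made up for illustration.

```shell
# Build a throwaway repository so the example does not touch real history.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"

# Hypothetical helper: empty commit with the given message.
commit_msg() {
  git -c user.name="Example" -c user.email="dev@example.com" \
      commit -q --allow-empty -m "$1"
}
commit_msg "base"
git checkout -q -b feature
commit_msg "feature work 1"
commit_msg "feature work 2"
git checkout -q main
commit_msg "main work"

# Commits on feature that main does not have:
# what merging feature into main would bring in.
only_feature=$(git log --format=%s main..feature)
# Commits on main that feature does not have.
only_main=$(git log --format=%s feature..main)

printf 'main..feature:\n%s\n' "$only_feature"
printf 'feature..main:\n%s\n' "$only_main"
```

The shared commit "base" appears in neither list, because it is reachable from both branches.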

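Query commits that touch a specific file

To query a file's commits you can usually just append the file name to git log, but to avoid ambiguity with a branch name, the file name is normally preceded by --. The -- separator exists to prevent exactly this confusion: what comes before it is a branch name (or any other revision), and what comes after it is a file name. The same convention applies in other commands as well, such as git checkout.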

$ git log -2 -- redis.conf
commit 7a536c2912be1fd9f62b26b7022a00644c88ef8b
Author: Yossi Gottlieb <yossigo@users.noreply.github.com>
Date: Fri Jul 10 11:33:47 2020 +0300

 TLS: Session caching configuration support. (#7420)

 * TLS: Session caching configuration support.
 * TLS: Remove redundant config initialization.

 (cherry picked from commit 3e6f2b1a45176ac3d81b95cb6025f30d7aaa1393)

commit 8312aa27d47c0befcf69eb74d0a5dc19745ffd32
Author: antirez <antirez@gmail.com>
Date: Mon Jun 22 11:21:21 2020 +0200

 Clarify maxclients and cluster in conf. Remove myself too.

 (cherry picked from commit 59fd178014c7cca1b0c668b30ab0d991dd3030f3)
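
The ambiguity is real: when a branch and a file share a name, git refuses to guess. The sketch below sets up that situation in a throwaway repository; the name notes (used for both the file and the branch) and the helper commit are made up for illustration.

```shell
# Build a throwaway repository so the example does not touch real history.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

# Hypothetical helper: commit with a fixed identity, passing other options through.
commit() {
  git -c user.name="Example" -c user.email="dev@example.com" commit -q "$@"
}
echo "v1" > notes
git add notes
commit -m "add notes file"
commit --allow-empty -m "unrelated change"
git branch notes                        # a branch with the same name as the file

# Without "--" git cannot tell the branch from the file and exits with an error.
if git log --format=%s notes >/dev/null 2>&1; then
  ambiguous=no
else
  ambiguous=yes
fi

# After "--": a path. Only commits that touched the file are listed.
file_log=$(git log --format=%s -- notes)
# Before "--": a revision. The whole history of the branch is listed.
branch_log=$(git log --format=%s notes --)

printf 'ambiguous without --: %s\n' "$ambiguous"
printf 'file:\n%s\nbranch:\n%s\n' "$file_log" "$branch_log"
```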

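Display format

Besides filtering commits, git log can also control the display format. With no extra options it shows the commit id, author, email, date and commit message:

$ git log -1
commit 7bf665f125a4771db095c83a7ad6ed46692cd314 (HEAD -> 6.0, tag: 6.0.6, origin/6.0)
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 14:00:20 2020 +0300

 Redis 6.0.6.

Adding options controls and changes the display format. Here are a few common ones.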

Show one line per commit

git log shows several lines per commit by default. With --oneline each commit takes a single line, so more of the history fits on one screen:

$ git log -10 --oneline
7bf665f12 (HEAD -> 6.0, tag: 6.0.6, origin/6.0) Redis 6.0.6.
a5696bdf4 Run daily CI on PRs to release a branch
e15528bf1 Adds SHA256SUM to redis-stable tarball upload
e28aa99af Support passing stack allocated module strings to moduleCreateArgvFromUserFormat (#7528)
305143004 Send null for invalidate on flush (#7469)
29b20fd52 Notify systemd on sentinel startup (#7168)
5b3668121 Add registers dump support for Apple silicon (#7453)
0f75036c0 correct error msg for num connections reaching maxclients in cluster mode (#7444)
b1a01fda9 Fix command help for unexpected options (#7476)
83f55f61a Refactor RM_KeyType() by using macro. (#7486)
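
--oneline is shorthand for --pretty=oneline --abbrev-commit. When one line per commit is wanted but with different fields, --pretty=format: accepts placeholders such as %h (abbreviated hash), %an (author name), %ad (author date) and %s (subject). A small sketch in a throwaway repository, with a made-up author and commit message:

```shell
# Build a throwaway repository so the example does not touch real history.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

# One empty commit with a fixed date so the formatted output is predictable.
GIT_AUTHOR_DATE="2020-07-19 14:00:20 +0300" \
GIT_COMMITTER_DATE="2020-07-19 14:00:20 +0300" \
  git -c user.name="Example Dev" -c user.email="dev@example.com" \
      commit -q --allow-empty -m "Release notes"

# Author name, short date and subject on a single line.
line=$(git log -1 --date=short --pretty=format:'%an | %ad | %s')
printf '%s\n' "$line"

# The same idea with the abbreviated hash in front, similar to --oneline.
git log -1 --pretty=format:'%h %s'
```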

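Show the changed line counts per file and a summary for each commit

Use the --stat option to show, for each commit, the number of changed lines in every modified file plus a summary of the totals: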

$ git log -2 --stat
commit 7bf665f125a4771db095c83a7ad6ed46692cd314 (HEAD -> 6.0, tag: 6.0.6, origin/6.0)
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 14:00:20 2020 +0300

 Redis 6.0.6.

 00-RELEASENOTES | 245 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/help.h | 4 +-
 src/version.h | 2 +-
 3 files changed, 248 insertions(+), 3 deletions(-)

commit a5696bdf4f2687ab45f633ccb7cdc4ee9c2f957d
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 15:33:21 2020 +0300

 Run daily CI on PRs to release a branch

 .github/workflows/daily.yml | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

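Show the added and deleted line counts per file for each commit

The --numstat option splits the combined change count shown by --stat into the number of added lines and the number of deleted lines: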

$ git log -2 --numstat
commit 7bf665f125a4771db095c83a7ad6ed46692cd314 (HEAD -> 6.0, tag: 6.0.6, origin/6.0)
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 14:00:20 2020 +0300

 Redis 6.0.6.

245 0 00-RELEASENOTES
2 2 src/help.h
1 1 src/version.h

commit a5696bdf4f2687ab45f633ccb7cdc4ee9c2f957d
Author: Oran Agra <oran@redislabs.com>
Date: Sun Jul 19 15:33:21 2020 +0300

 Run daily CI on PRs to release a branch

4 2 .github/workflows/daily.yml

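List the added and deleted line counts of every file, commit after commit

This needs the --pretty=tformat: --numstat options. The empty tformat: suppresses the commit headers and leaves only the numstat lines, a format that is easy to total: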

$ git log -2 --pretty=tformat: --numstat
245 0 00-RELEASENOTES
2 2 src/help.h
1 1 src/version.h
4 2 .github/workflows/daily.yml
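
Each numstat line is "added, deleted, path" separated by tab characters, and binary files show - for both counts. Because the format is that regular, it can also be grouped per file instead of summed as a whole. The sketch below is one possible way to do it; numstat_per_file is a hypothetical helper, and the sample input reuses the four lines above plus two made-up lines (a second change to src/help.h and a binary logo.png) so that it runs without a repository.

```shell
# Hypothetical helper: per-file totals from numstat lines read on stdin.
numstat_per_file() {
  awk -F'\t' '
    NF == 3 {
      # Binary files report "-" for both counts; awk treats that as 0.
      added[$3]   += $1
      removed[$3] += $2
    }
    END {
      for (f in added) printf "%s\t+%d\t-%d\n", f, added[f], removed[f]
    }' | LC_ALL=C sort
}

# Sample input: the last two lines are invented for this example.
sample=$(printf '%s\n' \
  "$(printf '245\t0\t00-RELEASENOTES')" \
  "$(printf '2\t2\tsrc/help.h')" \
  "$(printf '1\t1\tsrc/version.h')" \
  "$(printf '4\t2\t.github/workflows/daily.yml')" \
  "$(printf '3\t1\tsrc/help.h')" \
  "$(printf -- '-\t-\tlogo.png')")

printf '%s\n' "$sample" | numstat_per_file
```

In a real repository the input would come from a pipe, for example git log -2 --pretty=tformat: --numstat | numstat_per_file. Splitting on tabs rather than on any whitespace keeps paths that contain spaces intact.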

Count the changed lines of code

With the groundwork above, counting the changed lines is easy; just combine the output with awk:

$ git log -2 --pretty=tformat: --numstat | awk '{adds += $1; subs += $2; diffs += $1 - $2} END {printf "added lines: %s removed lines: %s, diff lines: %s\n", adds, subs, diffs}'
added lines: 252 removed lines: 5, diff lines: 247

You can also total the lines changed by the commits that separate two branches:

$ git log 6.0..unstable --pretty=tformat: --numstat | awk '{adds += $1; subs += $2; diffs += $1 - $2} END {printf "added lines: %s removed lines: %s, diff lines: %s\n", adds, subs, diffs}'
added lines: 5 removed lines: 2, diff lines: 3
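
Since the awk part never changes, it can be wrapped in a small shell function and fed by whatever git log filter is needed. This is only a sketch: sum_numstat is a hypothetical helper name, and it prints with %d instead of %s so that an empty result shows 0 rather than blank fields. The sample input is the four numstat lines from the -2 example above, so the block runs without a repository.

```shell
# Hypothetical helper: total the added and removed lines from numstat input on stdin.
sum_numstat() {
  awk 'NF >= 3 { adds += $1; subs += $2 }
       END {
         printf "added lines: %d removed lines: %d, diff lines: %d\n",
                adds, subs, adds - subs
       }'
}

# The four numstat lines shown earlier for the two most recent commits.
printf '245\t0\t00-RELEASENOTES\n2\t2\tsrc/help.h\n1\t1\tsrc/version.h\n4\t2\t.github/workflows/daily.yml\n' \
  | sum_numstat

# In a repository, any filter from the first half of the article can feed it, e.g.:
#   git log --author='Oran Agra' --since=2020-01-01 --pretty=tformat: --numstat | sum_numstat
```

For the sample input this prints the same totals as the command above: 252 added, 5 removed, 247 net.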

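At this point everything covered earlier comes together: change the filtering options and the result changes, so the options can be adjusted to fit whatever you need to measure.

Summary

  • git log is the "history book" of a code base: every past change is on record and can be looked up
  • git log options fall into filtering options and format options; filtering options select the range of commits, and format options control how they are displayed
  • Statistics means aggregating data according to some rule; organizing the data before aggregating makes the aggregation go much more smoothly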