上学时代某呆觉得课堂记笔记太分神,通常情况是拿个烂笔头随便画几个线,或者预习非常好,大致框架已经了然,随便记几个小问题就好了。 之前爬虫中,某呆自不量力试图阅读爬下和谷哥推送的所有文献,实践证明是徒劳。 事实上,在海量文献累积,到处玩大数据的时代,人脑读文献只能是有限几个。 在不知道往哪里走的起步阶段,更不可能阅读所有下载文献。 更多情况是通过谷哥协助,泛泛查一查信息,这时候需要权衡一把“好记性不如烂笔头”。 通常,需要复印式记一堆乱七八糟信息:文章名称,重点内容和一些评论。 某呆用记事本记录不一会就会烦。
最近某呆需要查大量文献,搜到了 org-capture,这个东西会自动复印,省掉一堆乱七八糟的操作。
如下设置org-protocol和模板
(require 'org-protocol)
(defun transform-square-brackets-to-round-ones(string-to-transform)
"Transforms [ into ( and ] into ), other chars left unchanged."
(concat
(mapcar #'(lambda (c) (if (equal c ?[) ?\( (if (equal c ?]) ?\) c))) string-to-transform)))
(setq org-capture-templates
'(("p" "Protocol" entry (file+headline "~/Dropbox/agenda/emacs/else/work.org" "Temporary task")
"* %^{Title}\nSource: %u [[%:link][%(transform-square-brackets-to-round-ones \"%:description\")]]\n#+BEGIN_QUOTE\n%i\n#+END_QUOTE\n\n\n")
("L" "Protocol Link" entry (file+headline "~/Dropbox/agenda/emacs/else/work.org" "Temporary task")
"* %? [[%:link][%(transform-square-brackets-to-round-ones \"%:description\")]]\n")
("i" "Idea item from clipboard" plain (file+headline "~/Dropbox/agenda/emacs/else/work.org" "Paper")
"- %^{Title}\n#+begin_src\n%x\n#+end_src\n" :jump-to-captured t)
("t" "Task" entry (file "") "* TODO %^{Title}\n%i\n%a" :prepend t :jump-to-captured t)))
再设置一下refile
(setq org-refile-targets '((org-agenda-files :maxlevel . 5)))
(setq org-refile-use-outline-path 'file)
(setq org-outline-path-complete-in-steps nil)
(setq org-refile-allow-creating-parent-nodes 'confirm)
火狐下载这个org-capture 插件后,下一步就可任意地、写实地、抽象地在网页抓信息了。 整个操作过程是,鼠标选中文献网页中一段文字,然后按一下这个按钮就可以打开emacsclient,然后填写一点评论,然后refile到任何目标。
以上,某完美主义者还是不爽,想到如何抓pdf中信息?直接复制到剪切版,或者在emacs中阅读pdf能够自动抓取内容,但那样太慢且pdf阅读功能不强大。 所以再来一个剪切版剪内容,
("i" "Idea item from clipboard" plain (file+headline "~/Dropbox/agenda/emacs/else/work.org" "Paper")
"- %^{Title}\n#+begin_src\n%x\n#+end_src\n" :jump-to-captured t)
不需要剪切版时候,可以放个任务
("t" "Task" entry (file "") "* TODO %^{Title}\n%i\n%a" :prepend t :jump-to-captured t)
以上,可做进一步改进,即对于有些网页内容,排版往往与常见的代码宽度(81列)不一致, 这里可以做一下简单的重新排版的小操作。
(defun replace-in-string (what with in)
(replace-regexp-in-string (regexp-quote what) with in nil 'literal))
(defun format-i-string ()
(let ((v-i (plist-get org-store-link-plist :initial)))
(shell-command-to-string (format "~/bin/pfmt.py --string \"%s\""
(replace-in-string "\"" "\\\"" v-i)
))))
(defun format-x-string ()
(let ((v-x (current-kill 0)))
(shell-command-to-string (format "~/bin/pfmt.py --string \"%s\""
(replace-in-string "\"" "\\\"" v-x)
))))
(setq org-capture-templates
'(("p" "Protocol" entry (file+headline "~/Dropbox/agenda/emacs/else/work.org" "Temporary task")
"* %^{Title}\nSource: %u [[%:link][%(transform-square-brackets-to-round-ones \"%:description\")]]\n#+BEGIN_QUOTE\n%(format-i-string)#+END_QUOTE" :jump-to-captured t :empty-lines-after 1)
("L" "Protocol Link" entry (file+headline "~/Dropbox/agenda/emacs/else/work.org" "Temporary task")
"* %? [[%:link][%(transform-square-brackets-to-round-ones \"%:description\")]]" :empty-lines-after 1 :jump-to-captured t)
("i" "Idea item from clipboard" plain (file+headline "~/Dropbox/agenda/emacs/else/work.org" "Paper")
"- %^{Title}\n#+begin_src\n%(format-x-string)#+end_src\n" :jump-to-captured t)
("t" "Task" entry (file "") "* TODO %^{Title}\n%i\n%a" :prepend t :jump-to-captured t)))
目前linux主流发行版上没有中英文字符串混合重新排版的命令,所以用python3.9中现有cjkwrap包来收拾一把中英文混合排版:
#!/usr/bin/env python
import argparse
import cjkwrap
from itertools import groupby
def paragraph(lines) :
for group_separator, line_iteration in groupby(lines.splitlines(True),
key = str.isspace) :
if not group_separator :
yield ''.join(line_iteration)
def main():
parser = argparse.ArgumentParser(description = 'format string vector')
parser.add_argument('--string', dest='inString', help='input string')
args = parser.parse_args()
inString = args.inString.strip(" \n")
reString = ""
for p in paragraph(inString):
p = p.strip(" \n")
reString = "{0}\n\n{1}".format(reString, cjkwrap.fill(p, 81))
reString = reString.strip("\n")
print(reString)
if __name__ == '__main__':
main()
这样capture的内容的格式就更加漂亮了。