2009-02-08 15:00:28 -05:00
#+OPTIONS : H:3 num:nil toc:t \n:nil @:t ::t |:t ^:t -:t f:t *:t TeX:t LaTeX:t skip:nil d:(HIDE) tags:not-in-toc
#+TITLE : rorg --- Code evaluation in org-mode, with an emphasis on R
2009-02-06 18:45:59 -05:00
#+SEQ_TODO : TODO PROPOSED | DONE DROPPED MAYBE
2009-02-08 13:11:22 -05:00
#+STARTUP : oddeven
2009-03-31 10:24:15 -04:00
* Tasks
2009-03-31 19:00:23 -04:00
2009-03-31 10:24:15 -04:00
** TODO evaluation of shell code as background process? [DED]
After C-c C-c on an R code block, the process may appear to block,
but C-g can be used to reclaim control of the .org buffer, without
2009-03-31 14:54:19 -04:00
interrupting the R evalution. However I believe this is not true of
bash/sh evaluation. [Haven't tried other languages] Perhaps a
solution is just to background the individual shell
commands.
2009-03-31 19:00:23 -04:00
2009-03-31 10:24:15 -04:00
** litorgy-R
*** TODO ability to select which of multiple R sessions is being used
(like ess-switch-process in .R buffers)
2009-03-31 19:00:23 -04:00
2009-04-01 18:46:55 -04:00
** DONE a header argument specifying silent evaluation (no output)
2009-03-31 19:00:23 -04:00
This would be useful across all types of source block. Currently
there is a =:replace t= option to control output, this could be
generalized to an =:output= option which could take the following
options (maybe more)
- =t= :: this would be the default, and would simply insert the
results after the source block
2009-04-01 18:46:55 -04:00
- =replace= :: to replace any results which may already be there
- =silent= :: this would inhibit any insertion of the results
This is now implemented see the example in the [[* silent evaluation ][sandbox ]]
2009-03-31 19:00:23 -04:00
2009-04-01 18:59:33 -04:00
** TODO insert 2-D R results as tables
2009-04-03 19:50:57 -04:00
everything is working but R and shell
*** TODO shells
*** TODO R
This has already been tackled by Dan in [[file:existing_tools/org-R.el::defconst%20org%20R%20write%20org%20table%20def ][org-R:check-dimensions ]]. The
functions there should be useful in combination with [[http://cran.r-project.org/doc/manuals/R-data.html#Export-to-text-files ][R-export-to-csv ]]
as a means of converting multidimensional R objects to emacs lisp.
It may be as simple as first checking if the data is multidimensional,
and then, if so using =write= to write the data out to a temporary
file from which emacs can read the data in using =org-table-import= .
Looking into this further, is seems that there is no such thing as a
scalar in R [[http://tolstoy.newcastle.edu.au/R/help/03a/3733.html ][R-scalar-vs-vector ]]. In that light I am not sure how to
deal with trivial vectors (scalars) in R. I'm tempted to just treat
them as vectors, but then that would lead to a proliferation of
trivial 1-cell tables...
2009-04-01 18:59:33 -04:00
** TODO allow variable initialization from source blocks
Currently it is possible to initialize a variable from an org-mode
table with a block argument like =table=sandbox= (note that the
variable doesn't have to named =table= ) as in the following example
#+TBLNAME : sandbox
| 1 | 2 | 3 |
| 4 | schulte | 6 |
#+begin_src emacs-lisp :var table=sandbox :results replace
(message (format "table = %S" table))
#+end_src
: "table = ((1 2 3) (4 \"schulte\" 6))"
It would be good to allow initialization of variables from the results
of other source blocks in the same manner. This would probably
require the addition of =#+SRCNAME: example= lines for the naming of
source blocks, also the =table=sandbox= syntax may have to be expanded
to specify whether the target is a source code block or a table
(alternately we could just match the first one with the given name
whether it's a table or a source code block).
2009-04-02 20:58:04 -04:00
At least initially I'll try to implement this so that there is no need
to specify whether the reference is to a table or a source-code block.
That seems to be simpler both in terms of use and implementation.
2009-04-02 23:02:56 -04:00
This is now working for emacs-lisp, ruby and python (and mixtures of
the three) source blocks. See the examples in the [[* (sandbox) referencing other source blocks ][sandbox ]].
2009-04-02 20:58:04 -04:00
This is currently working only with emacs lisp as in the following
example in the [[* emacs lisp source reference ][emacs lisp source reference ]].
2009-03-31 10:24:15 -04:00
* Bugs
** Args out of range error
The following block resulted in the error below [DED]. It ran without
error directly in the shell.
#+begin_src sh
cd ~/work/genopca
for platf in ill aff ; do
for pop in CEU YRI ASI ; do
rm -f $platf/hapmap-genos-$pop-all $platf/hapmap-rs-all
cat $platf/hapmap-genos-$pop-* > $platf/hapmap-genos-$pop-all
cat $platf/hapmap-rs-* > $platf/hapmap-rs-all
done
done
#+end_src
executing source block with sh...
finished executing source block
string-equal: Args out of range: "", -1, 0
* Sandbox
This is a place for code examples
** litorgy.el beginning functionality
After evaluating litorgy.el and litorgy-script.el, you should be able
to evaluate the following blocks of code by pressing =\C-c\C-c= on the
header lines. *Note* : your version of org-mode must be at least 6.23
or later.
To run these examples open both [[file:litorgy/litorgy.el ][litorgy.el ]], [[file:litorgy/litorgy-script.el ][litorgy-script.el ]] and
evaluate them with =M-x eval-buffer=
2009-04-01 18:59:33 -04:00
#+begin_src sh :results replace
2009-03-31 10:24:15 -04:00
date
#+end_src
#+begin_src ruby
puts Time.now
#+end_src
#+begin_src python
print "Hello world!"
#+end_src
** litorgy-R
To run these examples open both [[file:litorgy/litorgy.el ][litorgy.el ]], [[file:litorgy/litorgy-R.el ][litorgy-R.el ]] and evaluate
them with =M-x eval-buffer=
2009-04-01 18:59:33 -04:00
#+begin_src R :results replace
2009-03-31 10:24:15 -04:00
a <- 9
b <- 17
a + b
#+end_src
2009-04-04 14:55:36 -04:00
: 26
2009-03-31 14:54:19 -04:00
#+begin_src R
hist(rgamma(20,3,3))
#+end_src
2009-03-31 10:24:15 -04:00
** free variables
First assign the variable with some sort of interpreted line
- this is independent of any particular type of source code
- this could use references to table ranges
** resource reference example
2009-03-31 19:11:17 -04:00
*Note* : this example is largely *defunct* , see the
[[* litorgy plays with tables ][litorgy-plays-with-tables ]] section below.
2009-03-31 10:24:15 -04:00
This block holds an array of information written in [[http://www.yaml.org ][YAML ]]
#name: yaml-array
#+begin_src yaml
---
- 1
- 2
- 3
- 4
- 5
#+end_src
This next block saves the information in the YAML array into the ruby
variable =ya= and then in ruby it multiplies each variable in the =ya=
by 2.
#name: ruby-array
#assign: ya = yaml-array
#+begin_src ruby
ya.map{ |e| e * 2 }
#+end_src
This final block takes the output of the ruby block, and writes it to
cell =0,0= through =0,3= of the table
#name: example-table
#assign: self[0, (1..3)] = ruby-array
| example results |
|-----------------|
| |
| |
| |
** litorgy plays with tables
Alright, this should demonstrate both the ability of litorgy to read
tables into a lisp source code block, and to then convert the results
of the source code block into an org table. It's using the classic
"lisp is elegant" demonstration transpose function. To try this
out...
1. evaluate [[file:litorgy/init.el ]] to load litorgy and friends
2. evaluate the transpose definition =\C-u \C-c\C-c= on the beginning of
the source block (prefix arg to inhibit output)
3. evaluate the next source code block, this should read in the table
because of the =:var table=previous= , then transpose the table, and
finally it should insert the transposed table into the buffer
immediately following the block
*** Emacs lisp
#+begin_src emacs-lisp
(defun transpose (table)
(apply #'mapcar* #'list table))
#+end_src
#+TBLNAME : sandbox
| 1 | 2 | 3 |
| 4 | schulte | 6 |
2009-04-01 18:59:33 -04:00
#+begin_src emacs-lisp :var table=previous :results replace
2009-03-31 10:24:15 -04:00
(transpose table)
#+end_src
2009-04-01 18:59:33 -04:00
#+begin_src emacs-lisp :var table=sandbox :results replace
2009-03-31 10:24:15 -04:00
(transpose table)
#+end_src
*** Ruby and Python
2009-04-01 18:59:33 -04:00
#+begin_src ruby :var table=sandbox :results replace
2009-03-31 10:24:15 -04:00
table.first.join(" - ")
#+end_src
: "1 - 2 - 3"
2009-04-01 18:59:33 -04:00
#+begin_src python :var table=sandbox :results replace
2009-03-31 10:24:15 -04:00
table[0]
#+end_src
| 1 | 2 | 3 |
2009-04-01 18:59:33 -04:00
#+begin_src ruby :var table=sandbox :results replace
2009-03-31 10:24:15 -04:00
table
#+end_src
| 1 | 2 | 3 |
| 4 | "schulte" | 6 |
2009-04-01 18:59:33 -04:00
#+begin_src python :var table=sandbox :results replace
2009-03-31 10:24:15 -04:00
table
#+end_src
| 1 | 2 | 3 |
| 4 | "schulte" | 6 |
*** R
2009-04-01 18:59:33 -04:00
#+begin_src R :results replace
2009-03-31 10:24:15 -04:00
a <- 9
b <- 8
2009-04-03 18:56:02 -04:00
b*a
2009-03-31 10:24:15 -04:00
#+end_src
2009-04-03 18:56:02 -04:00
: 72
2009-04-01 18:59:33 -04:00
#+begin_src R :results replace
2009-03-31 10:24:15 -04:00
x <- c(rnorm(10, mean=-3, sd=1), rnorm(10, mean=3, sd=1))
x
#+end_src
: -2.059712 -1.299807 -2.518628 -4.319525 -1.944779 -5.345708 -3.921314
: -2.841109 -0.963475 -2.465979 4.092037 1.299202 1.476687 2.128594
: 3.200629 1.990952 1.888890 3.561541 3.818319 1.969161
2009-04-01 18:46:55 -04:00
** silent evaluation
#+begin_src ruby
:im_the_results
#+end_src
#+begin_src ruby :results silent
:im_the_results
#+end_src
#+begin_src ruby :results replace
:im_the_results
#+end_src
2009-04-02 23:02:56 -04:00
** (sandbox) referencing other source blocks
2009-04-02 20:58:04 -04:00
Doing this in emacs-lisp first because it's trivial to convert
emacs-lisp results to and from emacs-lisp.
*** emacs lisp source reference
This first example performs a calculation in the first source block
named =top= , the results of this calculation are then saved into the
variable =first= by the header argument =:var first=top= , and it is
used in the calculations of the second source block.
#+SRCNAME : top
#+begin_src emacs-lisp
(+ 4 2)
#+end_src
#+begin_src emacs-lisp :var first=top :results replace
(* first 3)
#+end_src
This example is the same as the previous only the variable being
passed through is a table rather than a number.
#+begin_src emacs-lisp :results silent
(defun transpose (table)
(apply #'mapcar* #'list table))
#+end_src
#+TBLNAME : top_table
| 1 | 2 | 3 |
| 4 | schulte | 6 |
#+SRCNAME : second_src_example
#+begin_src emacs-lisp :var table=top_table
(transpose table)
#+end_src
2009-04-02 23:02:56 -04:00
#+begin_src emacs-lisp :var table=second_src_example :results replace
2009-04-02 20:58:04 -04:00
(transpose table)
#+end_src
2009-04-02 23:02:56 -04:00
*** ruby python
Now working for ruby
#+srcname : start
#+begin_src ruby
89
#+end_src
#+begin_src ruby :var other=start :results replace
2 * other
#+end_src
and for python
#+SRCNAME : start_two
#+begin_src python
98
#+end_src
#+begin_src python :var another=start_two :results replace
2009-04-03 18:56:02 -04:00
another*3
2009-04-02 23:02:56 -04:00
#+end_src
2009-04-03 18:56:02 -04:00
: 294
2009-04-02 23:02:56 -04:00
*** mixed languages
Since all variables are converted into Emacs Lisp it is no problem to
reference variables specified in another language.
#+SRCNAME : ruby-block
#+begin_src ruby
2
#+end_src
#+SRCNAME : lisp_block
#+begin_src emacs-lisp :var ruby-variable=ruby-block
(* ruby-variable 8)
#+end_src
2009-04-02 20:58:04 -04:00
2009-04-02 23:02:56 -04:00
#+begin_src python :var lisp_var=lisp_block
lisp_var + 4
#+end_src
*** R
not yet implemented
2009-04-02 20:58:04 -04:00
2009-04-03 18:56:02 -04:00
#+srcname : first_r
#+begin_src R :results replace
a <- 9
a
#+end_src
: 9
#+begin_src R :var other=first_r :results replace
other + 2
#+end_src
: 11
2009-04-02 20:58:04 -04:00
2009-03-31 10:24:15 -04:00
* COMMENT Commentary
I'm seeing this as like commit notes, and a place for less formal
communication of the goals of our changes.
** Eric <2009-02-06 Fri 15:41>
I think we're getting close to a comprehensive set of objectives
(although since you two are the real R user's I leave that decision up
to you). Once we've agreed on a set of objectives and agreed on at
least to broad strokes of implementation, I think we should start
listing out and assigning tasks.
** Eric <2009-02-09 Mon 14:25>
I've done a fairly destructive edit of this file. The main goal was
to enforce a structure on the document that we can use moving forward,
so that any future objective changes are all made to the main
objective list.
I apologize for removing sections written by other people. I did this
when they were redundant or it was not clear how to fit them into this
structure. Rest assured if the previous text wasn't persisted in git
I would have been much more cautious about removing it.
I hope that this outline structure should be able to remain stable
through the process of fleshing out objectives, and cashing those
objectives out into tasks. That said, please feel free to make any
changes that you see fit.
** Dan <2009-02-12 Thu 10:23>
Good job Eric with major works on this file.
** Eric <2009-02-22 Sun 13:17>
So I skipped ahead and got started on the fun part. Namely stubbing
out some of the basic functionality. Please don't take any of the
decisions I've made so far (on things like names, functionality,
design etc...) as final decisions, I'm of course open to and hoping
for improvement.
So far [[file:litorgy/litorgy.el ][litorgy.el ]] and [[file:litorgy/litorgy-script.el ][litorgy-script.el ]] can be used to evaluate source
code blocks of simple scripting languages. It shouldn't be too hard
(any takers) to write a litorgy-R.el modeled after litorgy-script.el
to use for evaluating R code files.
See the [[* litorgy.el beginning functionality ][Sandbox ]] for evaluable examples.
** Eric <2009-02-23 Mon 15:12>
While thinking about how to implement the transfer of data between
source blocks and the containing org-mode file, I decided it *might*
be useful to explicitly support the existence of variables which exist
independent of source blocks or tables. I'd appreciate any
feedback... (see [[free explicit variables ][free explicit variables ]])
** Eric <2009-02-23 Mon 17:53>
So as I start populating this file with source code blocks I figure I
should share this... I don't know if you guys use [[http://code.google.com/p/smart-snippet/ ][yasnippet ]] at all,
but if you do you might find this [[file:block ][block-snippet ]] org-mode snippet
useful (I use it all the time).
2009-03-31 19:00:23 -04:00
2009-02-09 17:41:31 -05:00
* Overview
2009-02-08 13:11:22 -05:00
This project is basically about putting source code into org
files. This isn't just code to look pretty as a source code example,
but code to be evaluated. Org files have 3 main export targets: org,
2009-02-09 17:41:31 -05:00
html and latex. Once we have implemented a smooth bi-directional flow
of data between org-mode formats (including tables, and maybe lists
and property values) and source-code blocks, we will be able to use
2009-02-11 11:08:34 -05:00
org-mode's built in export to publish the results of evaluated source
code in any org-supported format using org-mode as an intermediate
format. We have a current focus on R code, but we are regarding that
more as a working example than as a defining feature of the project.
2009-02-08 13:11:22 -05:00
2009-02-09 17:41:31 -05:00
The main objectives of this project are...
# Lets start with this list and make changes as appropriate. Please
# try to make changes to this list, rather than starting any new
# lists.
- [[* evaluation of embedded source code ][evaluation of embedded source code ]]
- [[* execution on demand and on export ][execution on demand and on export ]]
2009-02-11 11:08:34 -05:00
- [[* source blocks ][source blocks ]]
2009-02-23 01:19:34 -05:00
- [[* header arguments ][header arguments ]]
2009-02-09 17:41:31 -05:00
- [[* inline source evaluation ][inline source evaluation ]]
2009-02-11 11:08:34 -05:00
- [[* included source file evaluation ][included source file evaluation ]] ?? maybe
- [[* caching of evaluation ][caching of evaluation ]]
2009-02-09 17:41:31 -05:00
- [[* interaction with the source-code's process ][interaction with the source-code's process ]]
- [[* output of code evaluation ][output of code evaluation ]]
- [[* textual/numeric output ][textual/numeric output ]]
- [[* graphical output ][graphical output ]]
- [[* file creation ][non-graphics file creation ]]
- [[* side effects ][side effects ]]
2009-02-11 11:08:34 -05:00
- [[* reference to data and evaluation results ][reference to data and evaluation results ]]
2009-02-09 17:41:31 -05:00
- [[* reference format ][reference format ]]
- [[* source-target pairs ][source-target pairs ]]
- [[* source block output from org tables ][source block output from org tables ]]
- [[* source block outpt from other source block ][source block outpt from other source block ]]
- [[* source block output from org list ][source block output from org list ]] ?? maybe
- [[* org table from source block ][org table from source block ]]
- [[* org table from org table ][org table from org table ]]
- [[* org properties from source block ][org properties from source block ]]
- [[* org properties from org table ][org properties from org table ]]
- [[* export ][export ]]
2009-02-08 15:00:28 -05:00
2009-02-10 17:21:06 -05:00
* Objectives and Specs
2009-02-08 13:11:22 -05:00
2009-02-09 17:41:31 -05:00
** evaluation of embedded source code
2009-02-08 13:11:22 -05:00
2009-02-09 17:41:31 -05:00
*** execution on demand and on export
2009-02-08 15:00:28 -05:00
Let's use an asterisk to indicate content which includes the
*result* of code evaluation, rather than the code itself. Clearly
we have a requirement for the following transformation:
org \to org*
Let's say this transformation is effected by a function
`org-eval-buffer'. This transformation is necessary when the
target format is org (say you want to update the values in an org
table, or generate a plot and create an org link to it), and it
can also be used as the first step by which to reach html and
latex:
org \to org* \to html
2009-02-08 13:11:22 -05:00
2009-02-08 15:00:28 -05:00
org \to org* \to latex
2009-02-08 13:11:22 -05:00
2009-02-08 15:00:28 -05:00
Thus in principle we can reach our 3 target formats with
`org-eval-buffer', `org-export-as-latex' and `org-export-as-html'.
An extra transformation that we might want is
org \to latex
I.e. export to latex without evaluation of code, in such a way that R
code can subsequently be evaluated using
=Sweave(driver=RweaveLatex)= , which is what the R community is
used to. This would provide a `bail out' avenue where users can
escape org mode and enter a workflow in which the latex/noweb file
is treated as source.
**** How do we implement `org-eval-buffer'?
AIUI The following can all be viewed as implementations of
org-eval-buffer for R code:
2009-02-23 19:07:07 -05:00
(see this question again posed in [[file:litorgy/litorgy-R.el::Maybe%20the%20following%20be%20replaced%20with%20a%20method%20using%20ess%20execute ][litorgy-R.el ]])
2009-02-08 15:00:28 -05:00
***** org-eval-light
This is the beginnings of a general evaluation mechanism, that
could evaluate python, ruby, shell, perl, in addition to R.
2009-02-09 17:41:31 -05:00
The header says it's based on org-eval
what is org-eval??
org-eval was written by Carsten. It lives in the
org/contrib/lisp directory because it is too dangerous to
include in the base. Unlike org-eval-light org-eval evaluates
all source blocks in an org-file when the file is first opened,
which could be a security nightmare for example if someone
emailed you a pernicious file.
2009-02-08 15:00:28 -05:00
***** org-R
This accomplishes org \to org* in elisp by visiting code blocks
and evaluating code using ESS.
***** RweaveOrg
This accomplishes org \to org* using R via
2009-02-08 13:11:22 -05:00
2009-02-08 15:00:28 -05:00
: Sweave("file-with-unevaluated-code.org", driver=RweaveOrg, syntax=SweaveSyntaxOrg)
***** org-exp-blocks.el
Like org-R, this achieves org \to org* in elisp by visiting code
blocks and using ESS to evaluate R code.
2009-02-11 11:08:34 -05:00
*** source blocks
2009-02-09 17:41:31 -05:00
(see [[* Special editing and evaluation of source code ][Special editing and evaluation of source code ]])
2009-02-08 15:00:28 -05:00
2009-02-23 01:19:34 -05:00
*** header arguments
(see [[* block headers/parameters ][block headers/parameters ]])
There are going to be many cases where we want to use header arguments
to change the evaluation options of source code, to pass external
information to a block of source code and control the inclusion of
evaluation results.
2009-02-09 17:41:31 -05:00
*** inline source evaluation
2009-02-11 11:08:34 -05:00
*** included source file evaluation
It may be nice to be able to include an entire external file of source
code, and then evaluate and export that code as if it were in the
file. The format for such a file inclusion could optionally look like
the following
: #+include_src filename header_arguments
*** caching of evaluation
Any kind of code that can have a block evaluated could optionally define
a function to write the output to a file, or to serialize the output of
the function. If a document or block is configured to cache input,
write all cached blocks to their own files and either a) hash them, or
b) let git and org-attach track them. Before a block gets eval'd, we
check to see if it has changed. If a document or block is configured to
cache output and a print/serialize function is available, write the
output of each cached block to its own file. When the file is eval'd
and some sort of display is called for, only update the display if the
output has changed. Each of these would have an override, presumably
something like (... & force) that could be triggered with a prefix arg
to the eval or export function.
For R, I would say
#+begin_src emacs-lisp
;; fake code that only pretends to work
(add-hook 'rorg-store-output-hook
'("r" lambda (block-environment block-label)
(ess-exec (concat "save.image("
block-environment
", file = " block-label
".Rdata, compress=TRUE)"))))
#+end_src
The idea being that for r blocks that get eval'd, if output needs to be
stored, you should write the entire environment that was created in that
block to an Rdata file.
(see [[* block scoping ][block scoping ]])
2009-02-06 18:45:59 -05:00
2009-02-09 17:41:31 -05:00
** interaction with the source-code's process
We should settle on a uniform API for sending code and receiving
output from a source process. Then to add a new language all we need
to do is implement this API.
2009-02-06 18:45:59 -05:00
2009-02-09 17:41:31 -05:00
for related notes see ([[* Interaction with the R process ][Interaction with the R process ]])
2009-02-05 18:04:09 -05:00
2009-02-09 17:41:31 -05:00
** output of code evaluation
*** textual/numeric output
We (optionally) incorporate the text output as text in the target
document
*** graphical output
We either link to the graphics or (html/latex) include them
inline.
I would say, if the block is being evaluated interactively then
lets pop up the image in a new window, and if it is being exported
then we can just include a link to the file which will be exported
appropriately by org-mode.
*** non-graphics files
? We link to other file output
*** side effects
If we are using a continuous process in (for example an R process
handled by ESS) then any side effects of the process (for example
setting values of R variables) will be handled automatically
Are there side-effects which need to be considered aside from those
internal to the source-code evaluation process?
** reference to data and evaluation results
2009-02-12 11:07:25 -05:00
I think this will be very important. I would suggest that since we
are using lisp we use lists as our medium of exchange. Then all we
need are functions going converting all of our target formats to and
from lists. These functions are already provided by for org tables.
2009-02-09 17:41:31 -05:00
2009-02-12 11:07:25 -05:00
It would be a boon both to org users and R users to allow org tables
to be manipulated with the R programming language. Org tables give R
users an easy way to enter and display data; R gives org users a
powerful way to perform vector operations, statistical tests, and
visualization on their tables.
2009-02-09 17:41:31 -05:00
2009-02-12 11:07:25 -05:00
This means that we will need to consider unique id's for source
blocks, as well as for org tables, and for any other data source or
target.
2009-02-05 18:54:46 -05:00
*** Implementations
**** naive
Naive implementation would be to use =(org-export-table "tmp.csv")=
2009-02-05 21:15:09 -05:00
and =(ess-execute "read.csv('tmp.csv')")= .
2009-02-05 18:47:20 -05:00
**** org-R
2009-02-05 21:15:09 -05:00
org-R passes data to R from two sources: org tables, or csv
files. Org tables are first exported to a temporary csv file
using [[file:existing_tools/org-R.el::defun%20org%20R%20export%20to%20csv%20csv%20file%20options ][org-R-export-to-csv ]].
2009-02-05 18:47:20 -05:00
**** org-exp-blocks
2009-02-12 11:07:25 -05:00
org-exp-blocks uses [[org-interblock-R-command-to-string ]] to send
commands to an R process running in a comint buffer through ESS.
org-exp-blocks has no support for dumping table data to R process, or
vice versa.
2009-02-06 18:45:59 -05:00
2009-02-05 18:47:20 -05:00
**** RweaveOrg
NA
2009-02-09 17:41:31 -05:00
*** reference format
2009-02-12 11:07:25 -05:00
This will be tricky, Dan has already come up with a solution for R, I
need to look more closely at that and we should try to come up with a
formats for referencing data from source-code in such a way that it
will be as source-code-language independent as possible.
2009-03-23 20:01:11 -04:00
Org tables already have a sophisticated reference system in place
that allows referencing table ranges in other files, as well as
specifying constants in the header arguments of a table. This is
described in [[info:org:References ]].
2009-02-12 11:07:25 -05:00
**** Dan: thinking aloud re: referencing data from R
Suppose in some R code, we want to reference data in an org
table. I think that requires the use of 'header arguments', since
otherwise, under pure evaluation of a code block without header
args, R has no way to locate the data in the org buffer. So that
suggests a mechanism like that used by org-R whereby table names
or unique entry IDs are used to reference org tables (and indeed
potentially row/column ranges within org tables, although that
subsetting could also be done in R).
Specifically what org-R does is write the table to a temp csv
file, and tell R the name of that file. However:
1. We are not limited to a single source of input; the same sort
of thing could be done for several sources of input
2. I don't think we even have to use temp files. An alternative
would be to have org pass the table contents as a csv-format
string to textConnection() in R, thus creating an arbitrary
number of input objects in the appropriate R environment
(scope) from which the R code can read data when necessary.
That suggests a header option syntax something like
#+begin_src emacs-lisp
'(:R-obj-name-1 tbl-name-or-id-1 :R-obj-name-2 tbl-name-or-id-2)
#+end_src emacs-lisp
As a result of passing that option, the code would be able to access
the data referenced by table-name-or-id-2 via read.table(R-obj-name-1).
An extension of that idea would be to allow remote files to be used as
data sources. In this case one might need just the remote file (if
it's a csv file), or if it's an org file then the name of the file
plus a table reference within that org file. Thus maybe something like
#+begin_src emacs-lisp
'((R-obj-name-1 . (:tblref tbl-name-or-id-1 :file file-1))
(R-obj-name-2 . (:tblref tbl-name-or-id-2 :file file-2)))
#+end_src emacs-lisp
2009-03-04 20:22:28 -05:00
**** Eric: referencing data in general
So here's some thoughts for referencing data (henceforth referred to
as *resources* ). I think this is the next thing we need to tackle for
implementation to move forward. We don't need to implement everything
below right off the bat, but I'd like to get these lists as full as
possible so we don't make any implementation assumptions which
preclude real needs.
We need to reference resources of the following types...
- table (list)
- output from a source code block (list or hash)
- property values of an outline header (hash)
- list (list)
- description list (hash)
- more?...
All of these resources will live in org files which could be
- the current file (default)
- another file on the same system (path)
- another file on the web (url)
- another file in a git repo (file and commit hash)
What information should each of these resources be able to supply?
I'm thinking (again not that we'll implement all of these but just to
think of them)...
- ranges or points of vector data
- key/value pairs from a hash
- when the object was last modified
- commit info (author, date, message, sha, etc...)
- pointers to the resources upon which the resource relies
So we need a referencing syntax powerful enough to handle all of these
alternatives. Maybe something like =path:sha:name:range= where
- path :: is empty for the current file, is a path for files on the
same system, and is a url otherwise
- sha :: is an option git commit indicator
- name :: is the table/header/source-block name or id for location
inside of the org file (this would not be optional)
- range :: would indicate which information is requested from the
resource, so it could be a range to access parts of a
table, or the names of properties to be referenced from an
outline header
Once we agree on how this should work, I'll try to stub out some code,
so that we can get some simple subset of this functionality working,
hopefully something complex enough to do the following...
- [[* resource reference example ][resource-reference-example ]]
***** questions
****** multiple outputs
Do we want things like a source code block to leave multiple outputs,
or do we only want them to be able to output one vector or hash?
****** environment (state and side-effects)
This design assumes that any changes will explicitly pass data in a
functional programming style. This makes no assumptions about things
like source code blocks changing state (in general state changes lead
to more difficult debugging).
- Do we want to take steps so ensure we do things like execute
consecutive R blocks in different environment, or do we want to
allow state changes?
- Does this matter?
2009-03-04 21:14:42 -05:00
****** passing arguments to resources
So I(eric) may be getting ahead of myself here, but what do you think
about the ability to pass arguments to resources. I'm having visions
of google map-reduce, processes spread out across multiple machines.
Maybe we could do this by allowing the arguments to be specified?
2009-02-09 17:41:31 -05:00
*** source-target pairs
2009-02-06 18:45:59 -05:00
2009-02-12 11:07:25 -05:00
The following can be used for special considerations based on
source-target pairs
2009-02-09 17:41:31 -05:00
2009-02-12 11:07:25 -05:00
Dan: I don't quite understand this subtree; Eric -- could you give
a little more explanation of this and of your comment above
regarding using [[lists as our medium of exchange ]]?
2009-02-09 17:41:31 -05:00
**** source block output from org tables
**** source block outpt from other source block
**** source block output from org list
**** org table from source block
**** org table from org table
**** org properties from source block
**** org properties from org table
2009-02-12 11:07:25 -05:00
2009-02-09 17:41:31 -05:00
** export
2009-02-12 11:07:25 -05:00
once the previous objectives are met export should be fairly simple.
Basically it will consist of triggering the evaluation of source code
blocks with the org-export-preprocess-hook.
2009-02-06 18:45:59 -05:00
2009-02-12 11:07:25 -05:00
This block export evaluation will be aware of the target format
through the htmlp and latexp variables, and can then create quoted
=#+begin_html= and =#+begin_latex= blocks appropriately.
2009-02-06 18:45:59 -05:00
2009-02-05 20:07:09 -05:00
* Notes
2009-02-11 11:08:34 -05:00
** Block Formats
2009-02-05 20:07:09 -05:00
Unfortunately org-mode how two different block types, both useful.
In developing RweaveOrg, a third was introduced.
Eric is leaning towards using the =#+begin_src= blocks, as that is
2009-02-05 21:15:09 -05:00
really what these blocks contain: source code. Austin believes
2009-02-05 20:07:09 -05:00
that specifying export options at the beginning of a block is
useful functionality, to be preserved if possible.
Note that upper and lower case are not relevant in block headings.
2009-02-11 11:08:34 -05:00
*** PROPOSED block format
2009-02-06 18:45:59 -05:00
I (Eric) propose that we use the syntax of source code blocks as they
currently exist in org-mode with the addition of *evaluation* ,
*header-arguments* , *exportation* , *single-line-blocks* , and
*references-to-table-data* .
1) *evaluation* : These blocks can be evaluated through =\C-c\C-c= with
a slight addition to the code already present and working in
[[file:existing_tools/org-eval-light.el ][org-eval-light.el ]]. All we should need to add for R support would
be an appropriate entry in [[org-eval-light-interpreters ]] with a
corresponding evaluation function. For an example usinga
org-eval-light see [[* src block evaluation w/org-eval-light ]].
2) *header-arguments* : These can be implemented along the lines of
Austin's header arguments in [[file:existing_tools/RweaveOrg/org-sweave.el ][org-sweave.el ]].
3) *exportation* : Should be as similar as possible to that done by
Sweave, and hopefully can re-use some of the code currently present
in [[file:existing_tools/exp-blocks/org-exp-blocks.el ][org-exp-blocks.el ]].
4) *single-line-blocks* : It seems that it is useful to be able to
place a single line of R code on a line by itself. Should we add
2009-02-10 20:18:20 -05:00
syntax for this similar to Dan's =#+RR:= lines? I would lean
2009-02-06 18:45:59 -05:00
towards something here that can be re-used for any type of source
code in the same manner as the =#+begin_src R= blocks, maybe
2009-02-10 20:18:20 -05:00
=#+src_R= ? Dan: I'm fine with this, but don't think single-line
blocks are a priority. My =#+R= lines were something totally
different: an attempt to have users specify R code implicitly,
using org-mode option syntax.
2009-02-06 18:45:59 -05:00
5) *references-to-table-data* : I get this impression that this is
vital to the efficient use of R code in an org file, so we should
come up with a way to reference table data from a single-line-block
or from an R source-code block. It looks like Dan has already done
this in [[file:existing_tools/org-R.el ][org-R.el ]].
2009-02-10 17:21:06 -05:00
Syntax
Multi-line Block
: #+begin_src lang header-arguments
: body
: #+end
- lang :: the language of the block (R, shell, elisp, etc...)
2009-02-10 20:18:20 -05:00
- header-arguments :: a list of optional arguments which control how
2009-02-10 17:21:06 -05:00
the block is evaluated and exported, and how the results are handled
- body :: the actual body of the block
Single-line Block
: #+begin_src lang body
- It's not clear how/if we would include header-arguments into a
2009-02-10 20:18:20 -05:00
single line block. Suggestions? Can we just leave them out? Dan:
I'm not too worried about single line blocks to start off
with. Their main advantage seems to be that they save 2 lines.
2009-02-11 11:08:34 -05:00
Eric: Fair enough, lets not worry about this now, also I would guess
that any code simple enough to fit on one line wouldn't need header
arguments anyways.
2009-02-10 17:21:06 -05:00
Include Block
: #+include_src lang filename header-arguments
2009-02-10 20:18:20 -05:00
- I think this would be useful, and should be much more work (Dan:
2009-02-11 11:08:34 -05:00
didn't get the meaning of that last clause!?). Eric: scratch that,
I meant "*shouldn't* be too much work" :) That way whole external
files of source code could be evaluated as if they were an inline
block. Dan: again I'd say not a massive priority, as I think all the
languages we have in mind have facilities for doing this natively,
thus I think the desired effect can often be achieved from within a
#+begin_src block. Eric: Agreed, while this would be a nice thing
to include we shouldn't wast too much effort on it in the beginning.
2009-02-10 17:21:06 -05:00
2009-02-06 18:45:59 -05:00
What do you think? Does this accomplish everything we want to be able
to do with embedded R source code blocks?
2009-02-09 17:41:31 -05:00
***** src block evaluation w/org-eval-light
2009-02-06 18:45:59 -05:00
here's an example using org-eval-light.el
first load the org-eval-light.el file
[[elisp:(load (expand-file-name "org-eval-light.el" (expand-file-name "existing_tools" (file-name-directory buffer-file-name)))) ]]
then press =\C-c\C-c= inside of the following src code snippet. The
results should appear in a comment immediately following the source
code block. It shouldn't be too hard to add R support to this
function through the `org-eval-light-interpreters' variable.
2009-02-08 15:00:28 -05:00
(Dan: The following causes error on export to HTML hence spaces inserted at bol)
#+begin_src shell
2009-02-06 18:45:59 -05:00
date
2009-02-08 15:00:28 -05:00
#+end_src
2009-02-06 18:45:59 -05:00
2009-02-11 11:08:34 -05:00
*** existing formats
2009-02-09 17:41:31 -05:00
**** Source code blocks
2009-02-05 20:07:09 -05:00
Org has an extremely useful method of editing source code and
examples in their native modes. In the case of R code, we want to
be able to use the full functionality of ESS mode, including
interactive evaluation of code.
Source code blocks look like the following and allow for the
special editing of code inside of the block through
`org-edit-special'.
2009-02-05 19:37:17 -05:00
#+BEGIN_SRC r
,## hit C-c ' within this block to enter a temporary buffer in r-mode.
,## while in the temporary buffer, hit C-c C-c on this comment to
,## evaluate this block
a <- 3
a
,## hit C-c ' to exit the temporary buffer
#+END_SRC
2009-02-05 19:00:05 -05:00
2009-02-09 17:41:31 -05:00
**** dblocks
2009-02-05 20:07:09 -05:00
dblocks are useful because org-mode will automatically call
2009-02-05 19:59:19 -05:00
`org-dblock-write:dblock-type' where dblock-type is the string
following the =#+BEGIN:= portion of the line.
2009-02-05 18:04:09 -05:00
2009-02-05 20:07:09 -05:00
dblocks look like the following and allow for evaluation of the
code inside of the block by calling =\C-c\C-c= on the header of
the block.
2009-02-05 19:00:05 -05:00
#+BEGIN : dblock-type
#+END :
2009-02-05 18:04:09 -05:00
2009-02-09 17:41:31 -05:00
**** R blocks
In developing RweaveOrg, Austin created [[file:existing_tools/RweaveOrg/org-sweave.el ][org-sweave.el ]]. This
allows for the kind of blocks shown in [[file:existing_tools/RweaveOrg/testing.Rorg ][testing.Rorg ]]. These blocks
have the advantage of accepting options to the Sweave preprocessor
following the #+BEGIN_R declaration.
2009-02-05 20:07:09 -05:00
2009-02-10 17:21:06 -05:00
*** block headers/parameters
2009-02-12 11:07:25 -05:00
Regardless of the syntax/format chosen for the source blocks, we will
2009-02-10 17:21:06 -05:00
need to be able to pass a list of parameters to these blocks. These
should include (but should certainly not be limited to)
2009-02-11 11:08:34 -05:00
- label or id :: Label of the block, should we provide facilities for
automatically generating a unique one of these?
- file :: names of file to which graphical/textual/numerical/tabular output
should be written. Do we need this, or should this be controlled
through the source code itself?
2009-02-23 01:19:34 -05:00
- results :: indication of where the results should be placed, maybe
the following values...
- append :: *default* meaning just append to the current buffer
immediately following the current source block
- replace :: like append, but replace any results currently there
- file :: save the results in a new file, and place a link to the
file into the current buffer immediately following the
source code block
- table :: save the results into a table, maybe use a table id:range
to identify which table and where therein
- nil :: meaning just discard the results
2009-02-11 11:08:34 -05:00
- not sure of a good name here :: flags for when/if the block should
be evaluated (on export etc...)
- again can't thing of a concise name :: flags for how the results of
the export should be displayed/included
- scope :: flag indicating whether the block should have a local or
global scope
2009-02-10 17:21:06 -05:00
- flags specific to the language of the source block
- etc...
2009-02-23 01:19:34 -05:00
I think fleshing out this list is an important next step.
2009-02-06 18:45:59 -05:00
** Interaction with the R process
We should take care to implement this in such a way that all of the
different components which have to interactive with R including:
- evaluation of source code blocks
- automatic evaluation on export
- evaluation of \R{} snippets
- evaluation of single source code lines
2009-02-11 11:08:34 -05:00
- evaluation of included source code files
2009-02-08 13:11:22 -05:00
- sending/receiving vector data
2009-02-06 18:45:59 -05:00
2009-02-08 15:00:28 -05:00
I think we currently have two implementations of interaction with R
2009-02-06 18:45:59 -05:00
processes; [[file:existing_tools/org-R.el ][org-R.el ]] and [[file:existing_tools/exp-blocks/org-exp-blocks.el ][org-exp-blocks.el ]]. We should be sure to take
the best of each of these approaches.
2009-02-11 11:08:34 -05:00
More on the exchange of data at between org-mode and source code
blocks at [[* reference to data and evaluation results ][reference to data and evaluation results ]].
** block scoping
(see [[* caching of evaluation ][caching of evaluation ]])
This inadvertently raises the issue of scoping. The pretend function
pretends that we will create a block-local scope, and that we can save
just the things in that scope. Sweave takes the make-everything-global
approach. I can see advantages either way. If we make block-local
scopes, we can save each one independently, and generally speaking it
seems like more granularity==more control. If we make everything
global, we can refer to entities declared in earlier blocks without
having to explicitly import those entities into the current block. I
think this counts in the "need to think about it early on" category.
If we did want block-local scopes, in R we can start every eval with
something like
;; fake code that pretends to create a new, empty environment
(ess-exec (concat block-env " <- new.env()"))
(ess-exec (concat "eval(" block-contents ", envir=" block-env ")"))
If we decide we want block-scoping, I'm sure Dan and I can figure out
the right way to do this in R, if he hasn't already. I haven't thought
at all about how these scope issues generalize to, say, bash blocks.
Maybe this is something that should be controlled by a header
argument?
2009-02-22 16:19:32 -05:00
** =\C-c\C-c= evaluation
With org-mode version at least 6.23, see the documentation for
[[info:org:Context-sensitive%20commands ][info:org:Context-sensitive commands ]].
2009-02-23 18:15:12 -05:00
** free explicit variables
Maybe we should have some idea of variables independent of any
particular type of source code or source block. These could be
variables that have a value inside of the scope of the org-mode file,
and they could be used as a transport mechanism for information
transfer between org-tables, org-lists, and different source-blocks.
Each type of source code (and org-mode types like tables, lists,
etc...) would need to implement functions for converting different
types of data to and from these variables (which would be elisp
variables).
So for example say we want to read the values from a table into an R
block, perform some calculations, and then write the results back into
the table. We could
1) assign the table to a variable
- the table would be converted into a lisp vector (list of lists)
- the vector would be saved in the variable
2) an R source block would reference the variable
- the variable would be instantiated into an R variable (through
mechanisms mentioned [[* Dan: thinking aloud re: referencing data from R ][elsewhere ]])
- the R code is executed, and the value of the variable *inside of
R* is updated
- when the R block finished the value of the variable *globally in
the org buffer* would be updated
3) optionally the global value of the variable would be converted back
into an org-mode table and would be used to overwrite the existing
table.
What do you think?
This might not be too different from what we were already talking
about, but I think the introduction of the idea of having variables
existing independently of any tables or source code blocks is novel
and probably has some advantages (and probably shortfalls).
2009-02-06 18:45:59 -05:00
2009-02-09 17:41:31 -05:00
* Buffer Dictionary
2009-03-04 21:14:42 -05:00
LocalWords: DBlocks dblocks litorgy el eric