rorg — Code evaluation in org-mode, with an emphasis on R
Tasks [8/16]
TODO share litorgy
how should we share litorgy?
- post to org-mode and ess mailing lists
- create a litorgy page on worg
- create a short screencast demonstrating litorgy in action
examples
we need to think up some good examples
interactive tutorials
This could be a place to use litorgical assertions.
for example, the first step of a tutorial could assert that the version of the software package (or whatever) is equal to some value; source-code blocks could then be used with confidence in (and executed directly from) the rest of the tutorial.
answering a text-book question w/code example
litorgy is an ideal environment enabling both the development and demonstration of the code snippets required as answers to many text-book questions.
something using tables
maybe something along the lines of calculations from collected grades
file sizes
Maybe something like the following which outputs sizes of directories
under the home directory, and then instead of the trivial emacs-lisp
block we could use an R block to create a nice pie chart of the
results.
du -sc ~/*
(mapcar #'car sizes)
TODO litorgy tests litorgy [0/1]
since we are accumulating this nice collection of source-code blocks in the sandbox section we should make use of them as unit tests. What's more, we should be able to actually use litorgy to run these tests.
We would just need to cycle over every source code block under the sandbox, run it, and assert that the return value is equal to what we expect.
I have the feeling that this should be possible using only litorgical functions with minimal or no additional elisp. It would be very cool for litorgy to be able to test itself.
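Purely as a sketch of what such a test runner might look like (the block names, the expected values, and the helper litorgy-execute-src-block-by-name are all hypothetical; nothing like this exists yet):
#+begin_src emacs-lisp
;; Hypothetical sketch: evaluate each named sandbox block and compare
;; its result against an expected value.
(defvar litorgy-test-expectations
  '(("sandbox-transpose" . ((1 4) (2 "schulte") (3 6))))
  "Alist mapping source block names to their expected results.")

(defun litorgy-run-sandbox-tests ()
  "Run every sandbox test and report any failures.
Assumes `litorgy-execute-src-block-by-name' evaluates the named block
and returns its result as a lisp value."
  (interactive)
  (let (failures)
    (dolist (test litorgy-test-expectations)
      (unless (equal (litorgy-execute-src-block-by-name (car test))
                     (cdr test))
        (push (car test) failures)))
    (if failures
        (message "FAILED: %s" (mapconcat #'identity failures ", "))
      (message "all sandbox tests passed"))))
#+end_src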
TODO litorgical assertions
These could be used to make assertions about the results of a source-code block. If the assertion fails then the point could be moved to the block, and error messages and highlighting etc… could ensue
TODO inline source code blocks [3/5]
Like \R{ code } blocks. Not sure what the format should be; maybe just something simple like src_lang[]{}, where lang is the name of the source code language to be evaluated, [] is optional and contains any header arguments, and {} contains the code.
(see the-sandbox)
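To make the proposal concrete, recognizing such blocks might come down to a single regexp along these lines (illustrative only, since the src_lang[]{} syntax is not settled):
#+begin_src emacs-lisp
;; Illustrative matcher for inline blocks of the form src_lang[headers]{body},
;; where the [headers] part is optional.
(defconst litorgy-inline-src-block-regexp
  "src_\\([a-zA-Z0-9-]+\\)\\(?:\\[\\([^]]*\\)\\]\\)?{\\([^}]*\\)}"
  "Group 1 is the language, group 2 the header arguments, group 3 the body.")

(defun litorgy-inline-src-block-at-point ()
  "Return (LANG HEADERS BODY) when point is at an inline source block."
  (when (looking-at litorgy-inline-src-block-regexp)
    (list (match-string 1) (match-string 2) (match-string 3))))
#+end_src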
DONE evaluation with \C-c\C-c
Putting aside the header argument issue for now, we can just run these with the following default header arguments:
- :results silent
- :exports results
DONE inline exportation
Need to add an interblock hook (or some such) through org-exp-blocks
DONE header arguments
We should make it possible to use header arguments.
TODO fontification
we should color these blocks differently
TODO refine html exportation
should use a span class, and should show original source in tool-tip
TODO figure out how to handle graphic output
This is listed under graphical output in our objectives.
How should this work for R? For example, how are files included with Sweave? Would/should we just mimic the behavior of Sweave, with the addition of support for popping up graphics during live evaluation of a source code block?
I think the best way to approach this would be to start with an example R source-code block and then work up from there.
TODO ability to select which of multiple sessions is being used
Increasingly it is looking like we're going to want to run all source code blocks in comint buffers (sessions), which will have the benefits of
- allowing background execution
- maintaining state between source blocks
- allowing inline blocks w/o header arguments
R sessions
(like ess-switch-process in .R buffers)
Maybe this could be packaged into a header argument, something like :R_session, which could accept either the name of the session to use, or the string prompt, in which case we could use the ess-switch-process command to select a new process.
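A rough sketch of how such a header argument might be resolved; the argument name and the helper are assumptions, while ess-switch-process and ess-current-process-name come from ESS:
#+begin_src emacs-lisp
;; Hypothetical resolution of an :R_session header argument.
(defun litorgy-R-resolve-session (session-arg)
  "Return the name of the R session to use.
SESSION-ARG is the value of the :R_session header argument: a process
name, the string \"prompt\", or nil for the default session."
  (cond
   ((null session-arg) "R")                   ; default ESS process
   ((string= session-arg "prompt")
    (call-interactively 'ess-switch-process)  ; let ESS prompt the user
    ess-current-process-name)
   (t session-arg)))
#+end_src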
TODO evaluation of shell code as background process?
After C-c C-c on an R code block, the process may appear to block, but C-g can be used to reclaim control of the .org buffer without interrupting the R evaluation. However, I believe this is not true of bash/sh evaluation. [Haven't tried other languages] Perhaps a solution is just to background the individual shell commands.
The other languages (aside from emacs lisp) are run through the shell, so if we find a shell solution it should work for them as well.
Adding an ampersand seems to be a supported way to run commands in the background (see external-commands), although a more extensible solution may involve the use of the call-process-region function.
Going to try this out in a new file, litorgy-proc.el. This should contain functions for asynchronously running generic shell commands in the background, and then returning their output.
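A first sketch of what litorgy-proc.el could contain, using Emacs' asynchronous process primitives (the function name and callback convention are assumptions, and the sentinel relies on lexical binding):
#+begin_src emacs-lisp
;; -*- lexical-binding: t -*-
;; Sketch: run a shell command in the background and hand the collected
;; output to CALLBACK when the process finishes.
(defun litorgy-proc-run-shell (command callback)
  "Run COMMAND asynchronously, then call CALLBACK with its output string."
  (let ((proc (start-process-shell-command
               "litorgy-proc" (generate-new-buffer " *litorgy-proc*") command)))
    (set-process-sentinel
     proc
     (lambda (process event)
       (when (string-match "finished" event)
         (with-current-buffer (process-buffer process)
           (funcall callback (buffer-string))
           (kill-buffer)))))))

;; Example: report directory sizes without blocking the org buffer.
;; (litorgy-proc-run-shell "du -sc ~/*" (lambda (out) (message "%s" out)))
#+end_src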
partial update of org-mode buffer
The sleekest solution to this may be using a comint buffer, and then defining a filter function which would incrementally interpret the results as they are returned, including insertion into the org-mode buffer. This may actually cause more problems than it is worth, what with the complexities of identifying the types of incrementally returned results, and the need for maintenance of a process marker in the org buffer.
'working' spinner
It may be nice and not too difficult to place a spinner on/near the evaluating source code block
TODO conversion of output from interactive shell, R (and python) sessions to litorgy buffers
[DED] This would be a nice feature I think. Although a litorgy purist would say that it's working the wrong way round… After some interactive work in a R buffer, you save the buffer, maybe edit out some lines, and then convert it to litorgy format for posterity. Same for a shell session either in a shell buffer, or pasted from another terminal emulator. And python of course.
PROPOSED re-implement helper functions from org-R
Much of the power of org-R seems to be in its helper functions for the quick graphing of tables. Should we try to re-implement these functions on top of litorgy?
I'm thinking this may be useful both to add features to litorgy-R and also to potentially suggest extensions of the framework. For example, one that comes to mind is the ability to treat a source-code block like a function which accepts arguments and returns results. Actually this can be its own TODO (see source blocks as functions).
DONE integration with org tables
We should make it easy to call litorgy source blocks from org-mode table formulas. This is practical now that it is possible to pass arguments to litorgical source blocks.
See the related sandbox header for tests/examples.
digging in org-table.el
In the past org-table.el has proven difficult to work with.
Should be a hook in org-table-eval-formula.
Looks like I need to change this if statement (line 2239) into a cond expression.
DONE source blocks as functions
Allow source code blocks to be called like functions, with arguments specified. We are already able to call a source-code block and assign its return result to a variable. This would just add the ability to specify the values of the arguments to the source code block, assuming any exist. For an example see
When a variable appears in a header argument, how do we differentiate between its value being a reference or a literal value? I guess this could work just like a programming language: if it's escaped or in quotes, then we count it as a literal; otherwise we try to look it up and evaluate it.
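A sketch of that literal-versus-reference rule (illustrative; the real check will live wherever header arguments get parsed):
#+begin_src emacs-lisp
;; Sketch: decide whether a :var value is a literal or a reference.
(defun litorgy-var-literal-p (value)
  "Return non-nil if VALUE (a string) should be taken as a literal.
Quoted strings, plain numbers, and backslash-escaped values count as
literals; anything else is treated as a reference to look up."
  (or (string-match "\\`\".*\"\\'" value)    ; "quoted string"
      (string-match "\\`-?[0-9.]+\\'" value) ; plain number
      (string-match "\\`\\\\" value)))       ; \escaped
#+end_src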
DONE folding of code blocks? [2/2]
[DED] In similar way to using outline-minor-mode for folding function bodies, can we fold code blocks? #+begin whatever statements are pretty ugly, and in any case when you're thinking about the overall game plan you don't necessarily want to see the code for each Step.
DONE folding of source code block
Sounds good, and wasn't too hard to implement. Code blocks should now be fold-able in the same manner as headlines (by pressing TAB on the first line).
REJECTED folding of results
So, let's do a three-stage tab cycle… first fold the src block, then fold the results, then unfold.
There's no way to tell if the results are a table or not w/o actually executing the block, which would be too expensive an operation.
DONE selective export of text, code, figures
[DED] The litorgy buffer contains everything (code, headings and notes/prose describing what you're up to, textual/numeric/graphical code output, etc). However on export to html / LaTeX one might want to include only a subset of that content. For example you might want to create a presentation of what you've done which omits the code.
[EMS] So I think this should be implemented as a property which can be set globally or on the outline header level (I need to review the mechanics of org-mode properties), and then as a source block header argument which will apply only to a specific source code block. A header argument of :export with values of
- code: just show the code in the source code block
- none: don't show the code or the results of the evaluation
- results: just show the results of the code evaluation (don't show the actual code)
- both: show both the source code and the results
this will be done in (sandbox) selective export.
DONE a header argument specifying silent evaluation (no output)
This would be useful across all types of source block. Currently there is a :replace t option to control output; this could be generalized to an :output option which could take the following options (maybe more):
- t: the default; simply insert the results after the source block
- replace: replace any results which may already be there
- silent: inhibit any insertion of the results
This is now implemented; see the example in the sandbox.
DONE assign variables from tables in R
This is now working (see (sandbox-table)-R). Although it's not that impressive until we are able to print table results from R.
DONE insert 2-D R results as tables
everything is working but R and shell
DONE shells
DONE R
This has already been tackled by Dan in org-R:check-dimensions. The functions there should be useful in combination with R-export-to-csv as a means of converting multidimensional R objects to emacs lisp.
It may be as simple as first checking if the data is multidimensional, and then, if so, using write to write the data out to a temporary file from which emacs can read the data in using org-table-import.
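One possible shape for that temporary-file round trip, shown as a hedged sketch (the wrapper name and the exact R code written are assumptions; ess-execute, org-table-import and org-table-to-lisp are the existing functions mentioned above):
#+begin_src emacs-lisp
;; Sketch: ask R to write OBJECT to a temp file, then read that file
;; back into emacs as a lisp table.
(defun litorgy-R-object-to-table (object)
  "Return the R variable named OBJECT as a list of rows."
  (let ((tmp (make-temp-file "litorgy-R-" nil ".tsv")))
    (ess-execute
     (format "write.table(%s, file=\"%s\", sep=\"\\t\", row.names=FALSE, col.names=FALSE)"
             object tmp))
    (with-temp-buffer
      (org-table-import tmp nil)   ; insert the file and convert it to an org table
      (goto-char (point-min))
      (org-table-to-lisp))))
#+end_src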
Looking into this further, it seems that there is no such thing as a scalar in R (see R-scalar-vs-vector). In that light I am not sure how to deal with trivial vectors (scalars) in R. I'm tempted to just treat them as vectors, but then that would lead to a proliferation of trivial 1-cell tables…
DONE allow variable initialization from source blocks
Currently it is possible to initialize a variable from an org-mode table with a block argument like table=sandbox (note that the variable doesn't have to be named table), as in the following example:
1 | 2 | 3 |
4 | schulte | 6 |
(message (format "table = %S" table))
"table = ((1 2 3) (4 \"schulte\" 6))"
It would be good to allow initialization of variables from the results of other source blocks in the same manner. This would probably require the addition of #+SRCNAME: example lines for the naming of source blocks; also the table=sandbox syntax may have to be expanded to specify whether the target is a source code block or a table (alternately we could just match the first one with the given name, whether it's a table or a source code block).
At least initially I'll try to implement this so that there is no need to specify whether the reference is to a table or a source-code block. That seems to be simpler both in terms of use and implementation.
This is now working for emacs-lisp, ruby and python (and mixtures of the three) source blocks. See the examples in the sandbox.
(Earlier this was working only with emacs lisp, as in the example in the emacs lisp source reference.)
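As a sketch of the lookup this implies (the #+SRCNAME: keyword is the proposal above and #+TBLNAME: is org's existing table label; the function name is provisional):
#+begin_src emacs-lisp
;; Sketch: find the first table or source block labeled NAME,
;; whichever comes first in the buffer.
(defun litorgy-find-named-reference (name)
  "Return a marker just after a #+TBLNAME: NAME or #+SRCNAME: NAME line."
  (save-excursion
    (goto-char (point-min))
    (when (re-search-forward
           (format "^[ \t]*#\\+\\(TBLNAME\\|SRCNAME\\):[ \t]*%s[ \t]*$"
                   (regexp-quote name))
           nil t)
      (forward-line 1)
      (point-marker))))
#+end_src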
Bugs [2/2]
RESOLVED Args out of range error
The following block resulted in the error below [DED]. It ran without error directly in the shell.
cd ~/work/genopca
for platf in ill aff ; do
for pop in CEU YRI ASI ; do
rm -f $platf/hapmap-genos-$pop-all $platf/hapmap-rs-all
cat $platf/hapmap-genos-$pop-* > $platf/hapmap-genos-$pop-all
cat $platf/hapmap-rs-* > $platf/hapmap-rs-all
done
done
executing source block with sh… finished executing source block string-equal: Args out of range: "", -1, 0
The error string-equal: Args out of range: "", -1, 0 looks like what used to be output when the block returned an empty results string. This should be fixed in the current version; you should now see the following message: no result returned by source block.
RESOLVED ruby arrays not recognized as such
Something is wrong in /ndwarshuis/org-mode/src/commit/e680407493eb9c8047842a66e83e6063fa20219b/litorgy/litorgy-script.el related to the recognition of ruby arrays as such.
[1, 2, 3, 4]
1 | 2 | 3 | 4 |
[1, 2, 3, 4]
1 | 2 | 3 | 4 |
Sandbox
To run these examples evaluate litorgy-init.el
litorgy.el beginning functionality
date
Sun Apr 5 10:10:05 PDT 2009
Time.now
Sat May 09 18:18:33 -0700 2009
"Hello World"
Hello World
litorgy-R
a <- 9
b <- 17
a + b
26
hist(rgamma(20,3,3))
litorgy plays with tables
Alright, this should demonstrate both the ability of litorgy to read tables into a lisp source code block, and to then convert the results of the source code block into an org table. It's using the classic "lisp is elegant" demonstration transpose function. To try this out…
- evaluate /ndwarshuis/org-mode/src/commit/e680407493eb9c8047842a66e83e6063fa20219b/litorgy/init.el to load litorgy and friends
- evaluate the transpose definition with \C-u \C-c\C-c on the beginning of the source block (prefix arg to inhibit output)
- evaluate the next source code block; this should read in the table because of the :var table=previous, then transpose the table, and finally insert the transposed table into the buffer immediately following the block
Emacs lisp
(defun transpose (table)
(apply #'mapcar* #'list table))
1 | 2 | 3 |
4 | schulte | 6 |
(transpose table)
1 | 4 |
2 | "schulte" |
3 | 6 |
'(1 2 3 4 5)
1 | 2 | 3 | 4 | 5 |
Ruby and Python
table.first.join(" - ")
"1 - 2 - 3"
table[0]
1 | 2 | 3 |
table
1 | 2 | 3 |
4 | "schulte" | 6 |
table
1 | 2 | 3 |
4 | "schulte" | 6 |
(sandbox table) R
1 | 2 | 3 |
4 | schulte | 6 |
x <- c(rnorm(10, mean=-3, sd=1), rnorm(10, mean=3, sd=1))
x
tabel
1 | 2 | 3 |
4 | "schulte" | 6 |
shell
Now shell commands are converted to tables using org-table-import
and if these tables are non-trivial (i.e. have multiple elements) then
they are imported as org-mode tables…
ls -l
"total" | 224 | "" | "" | "" | "" | "" | "" | "" |
"-rw-r–r–" | 1 | "eschulte" | "staff" | 35147 | "Apr" | 15 | 14 | "COPYING" |
"-rw-r–r–" | 1 | "eschulte" | "staff" | 277 | "Apr" | 15 | 14 | "README.markdown" |
"-rw-r–r–" | 1 | "eschulte" | "staff" | 57 | "Apr" | 15 | 14 | "block" |
"drwxr-xr-x" | 6 | "eschulte" | "staff" | 204 | "Apr" | 15 | 14 | "existing_tools" |
"drwxr-xr-x" | 12 | "eschulte" | "staff" | 408 | "May" | 9 | 18 | "litorgy" |
"-rw-r–r–" | 1 | "eschulte" | "staff" | 790 | "May" | 6 | 6 | "litorgy.org" |
"-rw-r–r–" | 1 | "eschulte" | "staff" | 49904 | "May" | 9 | 18 | "rorg.org" |
"-rw-r–r–" | 1 | "eschulte" | "staff" | 5469 | "Apr" | 26 | 13 | "test-export.html" |
"-rw-r–r–" | 1 | "eschulte" | "staff" | 972 | "Apr" | 26 | 13 | "test-export.org" |
silent evaluation
:im_the_results
:im_the_results
:im_the_results
(sandbox) referencing other source blocks
Doing this in emacs-lisp first because it's trivial to convert emacs-lisp results to and from emacs-lisp.
emacs lisp source reference
This first example performs a calculation in the first source block named top; the results of this calculation are then saved into the variable first by the header argument :var first=top, and it is used in the calculations of the second source block.
(+ 4 2)
(* first 3)
This example is the same as the previous only the variable being passed through is a table rather than a number.
(defun transpose (table)
(apply #'mapcar* #'list table))
1 | 2 | 3 |
4 | schulte | 6 |
(transpose table)
(transpose table)
ruby python
Now working for ruby
89
2 * other
178
and for python
98
another*3
294
mixed languages
Since all variables are converted into Emacs Lisp it is no problem to reference variables specified in another language.
2
(* ruby-variable 8)
lisp_var + 4
20
R
a <- 9
a
9
other + 2
11
(sandbox) selective export
For exportation tests and examples (including exportation of inline source code blocks) see /ndwarshuis/org-mode/src/commit/e680407493eb9c8047842a66e83e6063fa20219b/test-export.org
(sandbox) source blocks as functions
5
(* 3 n)
15
result
6
The following just demonstrates the ability to assign variables to literal values, which was not implemented until recently.
num+" schulte"
"eric schulte"
(sandbox) inline source blocks
This is an inline source code block
1 + 6
This is an inline source code block with header arguments.
n
(sandbox) integration w/org tables
(defun fibbd (n) (if (< n 2) 1 (+ (fibbd (- n 1)) (fibbd (- n 2)))))
(fibbd n)
(mapcar #'fibbd '(0 1 2 3 4 5 6 7 8))
Something is not working here. The function `sbe' works fine when called from outside of the table (see the source block below), but produces an error when called from inside the table. I think there must be some narrowing going on during intra-table emacs-lisp evaluation.
| original | fibbd |
|----------+-------|
|        0 |     1 |
|        1 |     1 |
|        2 |     2 |
|        3 |     3 |
|        4 |     5 |
|        5 |     8 |
|        6 |    13 |
|        7 |    21 |
|        8 |    34 |
|        9 |    55 |
silent-result
(sbe 'fibbd (n "8"))
COMMENT Commentary
I'm seeing this as like commit notes, and a place for less formal communication of the goals of our changes.
Eric <2009-02-06 Fri 15:41>
I think we're getting close to a comprehensive set of objectives (although since you two are the real R users I leave that decision up to you). Once we've agreed on a set of objectives and agreed on at least the broad strokes of implementation, I think we should start listing out and assigning tasks.
Eric <2009-02-09 Mon 14:25>
I've done a fairly destructive edit of this file. The main goal was to enforce a structure on the document that we can use moving forward, so that any future objective changes are all made to the main objective list.
I apologize for removing sections written by other people. I did this when they were redundant or it was not clear how to fit them into this structure. Rest assured if the previous text wasn't persisted in git I would have been much more cautious about removing it.
I hope that this outline structure should be able to remain stable through the process of fleshing out objectives, and cashing those objectives out into tasks. That said, please feel free to make any changes that you see fit.
Dan <2009-02-12 Thu 10:23>
Good job Eric with major works on this file.
Eric <2009-02-22 Sun 13:17>
So I skipped ahead and got started on the fun part, namely stubbing out some of the basic functionality. Please don't take any of the decisions I've made so far (on things like names, functionality, design etc…) as final; I'm of course open to and hoping for improvement.
So far litorgy.el and litorgy-script.el can be used to evaluate source code blocks of simple scripting languages. It shouldn't be too hard (any takers?) to write a litorgy-R.el modeled after litorgy-script.el to use for evaluating R code files.
See the Sandbox for evaluable examples.
Eric <2009-02-23 Mon 15:12>
While thinking about how to implement the transfer of data between source blocks and the containing org-mode file, I decided it might be useful to explicitly support the existence of variables which exist independent of source blocks or tables. I'd appreciate any feedback… (see free explicit variables)
Eric <2009-02-23 Mon 17:53>
So as I start populating this file with source code blocks I figure I should share this… I don't know if you guys use yasnippet at all, but if you do you might find this block-snippet org-mode snippet useful (I use it all the time).
Overview
This project is basically about putting source code into org files. This isn't just code to look pretty as a source code example, but code to be evaluated. Org files have 3 main export targets: org, html and latex. Once we have implemented a smooth bi-directional flow of data between org-mode formats (including tables, and maybe lists and property values) and source-code blocks, we will be able to use org-mode's built in export to publish the results of evaluated source code in any org-supported format using org-mode as an intermediate format. We have a current focus on R code, but we are regarding that more as a working example than as a defining feature of the project.
The main objectives of this project are…
Objectives and Specs
evaluation of embedded source code
execution on demand and on export
Let's use an asterisk to indicate content which includes the result of code evaluation, rather than the code itself. Clearly we have a requirement for the following transformation:
org → org*
Let's say this transformation is effected by a function `org-eval-buffer'. This transformation is necessary when the target format is org (say you want to update the values in an org table, or generate a plot and create an org link to it), and it can also be used as the first step by which to reach html and latex:
org → org* → html
org → org* → latex
Thus in principle we can reach our 3 target formats with `org-eval-buffer', `org-export-as-latex' and `org-export-as-html'.
An extra transformation that we might want is
org → latex
I.e. export to latex without evaluation of code, in such a way that R
code can subsequently be evaluated using
Sweave(driver=RweaveLatex)
, which is what the R community is
used to. This would provide a `bail out' avenue where users can
escape org mode and enter a workflow in which the latex/noweb file
is treated as source.
How do we implement `org-eval-buffer'?
AIUI, the following can all be viewed as implementations of org-eval-buffer for R code:
(see this question again posed in litorgy-R.el)
org-eval-light
This is the beginnings of a general evaluation mechanism that could evaluate python, ruby, shell and perl in addition to R. The header says it's based on org-eval.
what is org-eval??
org-eval was written by Carsten. It lives in the org/contrib/lisp directory because it is too dangerous to include in the base. Unlike org-eval-light, org-eval evaluates all source blocks in an org-file when the file is first opened, which could be a security nightmare, for example if someone emailed you a pernicious file.
org-R
This accomplishes org → org* in elisp by visiting code blocks and evaluating code using ESS.
RweaveOrg
This accomplishes org → org* using R via
Sweave("file-with-unevaluated-code.org", driver=RweaveOrg, syntax=SweaveSyntaxOrg)
org-exp-blocks.el
Like org-R, this achieves org → org* in elisp by visiting code blocks and using ESS to evaluate R code.
source blocks
header arguments
(see block headers/parameters)
There are going to be many cases where we want to use header arguments to change the evaluation options of source code, to pass external information to a block of source code and control the inclusion of evaluation results.
inline source evaluation
included source file evaluation
It may be nice to be able to include an entire external file of source code, and then evaluate and export that code as if it were in the file. The format for such a file inclusion could optionally look like the following
#+include_src filename header_arguments
caching of evaluation
Any kind of code that can have a block evaluated could optionally define a function to write the output to a file, or to serialize the output of the function. If a document or block is configured to cache input, write all cached blocks to their own files and either a) hash them, or b) let git and org-attach track them. Before a block gets eval'd, we check to see if it has changed. If a document or block is configured to cache output and a print/serialize function is available, write the output of each cached block to its own file. When the file is eval'd and some sort of display is called for, only update the display if the output has changed. Each of these would have an override, presumably something like (… & force) that could be triggered with a prefix arg to the eval or export function.
For R, I would say
;; fake code that only pretends to work
(add-hook 'rorg-store-output-hook
'("r" lambda (block-environment block-label)
(ess-exec (concat "save.image("
block-environment
", file = " block-label
".Rdata, compress=TRUE)"))))
The idea being that for r blocks that get eval'd, if output needs to be stored, you should write the entire environment that was created in that block to an Rdata file.
(see block scoping)
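On the emacs side, the "check whether a block has changed before eval'ing it" step could be as simple as hashing the block body with the built-in md5 function; a sketch with assumed names and storage:
#+begin_src emacs-lisp
;; Sketch: skip re-evaluation when a block's body hasn't changed.
(defvar litorgy-cache (make-hash-table :test 'equal)
  "Map block labels to (HASH . CACHED-RESULT) pairs.")

(defun litorgy-eval-with-cache (label body eval-fn &optional force)
  "Evaluate BODY with EVAL-FN unless the cached hash for LABEL matches.
With FORCE non-nil (e.g. from a prefix arg), always re-evaluate."
  (let* ((hash (md5 body))
         (cached (gethash label litorgy-cache)))
    (if (and (not force) cached (string= hash (car cached)))
        (cdr cached)                    ; unchanged: reuse the stored result
      (let ((result (funcall eval-fn body)))
        (puthash label (cons hash result) litorgy-cache)
        result))))
#+end_src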
interaction with the source-code's process
We should settle on a uniform API for sending code and receiving output from a source process. Then to add a new language all we need to do is implement this API.
for related notes see (Interaction with the R process)
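One way that uniform API could look is a per-language registry of just a few functions, so the generic machinery never talks to a process directly; the names below are placeholders rather than an agreed interface:
#+begin_src emacs-lisp
;; Sketch: each language registers :initiate, :evaluate and :read functions.
(defvar litorgy-interpreters nil
  "Alist of (LANG . PLIST) where PLIST holds :initiate, :evaluate, :read.")

(defun litorgy-register-interpreter (lang &rest plist)
  "Register the functions implementing litorgy support for LANG."
  (push (cons lang plist) litorgy-interpreters))

(defun litorgy-evaluate (lang body)
  "Send BODY to a LANG session and return its output converted to lisp."
  (let ((impl (cdr (assoc lang litorgy-interpreters))))
    (unless impl (error "No litorgy support for %s" lang))
    (funcall (plist-get impl :read)       ; raw output -> lisp value
             (funcall (plist-get impl :evaluate)
                      (funcall (plist-get impl :initiate)) ; session handle
                      body))))
#+end_src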
output of code evaluation
textual/numeric output
We (optionally) incorporate the text output as text in the target document
graphical output
We either link to the graphics or (html/latex) include them inline.
I would say, if the block is being evaluated interactively then lets pop up the image in a new window, and if it is being exported then we can just include a link to the file which will be exported appropriately by org-mode.
non-graphics files
? We link to other file output
side effects
If we are using a continuous process (for example an R process handled by ESS) then any side effects of the process (for example setting values of R variables) will be handled automatically.
Are there side-effects which need to be considered aside from those internal to the source-code evaluation process?
reference to data and evaluation results
I think this will be very important. I would suggest that since we are using lisp we use lists as our medium of exchange. Then all we need are functions converting each of our target formats to and from lists. These functions are already provided for org tables.
It would be a boon both to org users and R users to allow org tables to be manipulated with the R programming language. Org tables give R users an easy way to enter and display data; R gives org users a powerful way to perform vector operations, statistical tests, and visualization on their tables.
This means that we will need to consider unique id's for source blocks, as well as for org tables, and for any other data source or target.
Implementations
naive
A naive implementation would be to use (org-export-table "tmp.csv") and (ess-execute "read.csv('tmp.csv')").
org-R
org-R passes data to R from two sources: org tables, or csv files. Org tables are first exported to a temporary csv file using org-R-export-to-csv.
org-exp-blocks
org-exp-blocks uses /ndwarshuis/org-mode/src/commit/e680407493eb9c8047842a66e83e6063fa20219b/org-interblock-R-command-to-string to send commands to an R process running in a comint buffer through ESS. org-exp-blocks has no support for dumping table data to R process, or vice versa.
RweaveOrg
NA
reference format
This will be tricky. Dan has already come up with a solution for R; I need to look more closely at that, and we should try to come up with a format for referencing data from source-code in such a way that it will be as source-code-language independent as possible.
Org tables already have a sophisticated reference system in place that allows referencing table ranges in other files, as well as specifying constants in the header arguments of a table. This is described in info:org:References.
Dan: thinking aloud re: referencing data from R
Suppose in some R code, we want to reference data in an org table. I think that requires the use of 'header arguments', since otherwise, under pure evaluation of a code block without header args, R has no way to locate the data in the org buffer. So that suggests a mechanism like that used by org-R whereby table names or unique entry IDs are used to reference org tables (and indeed potentially row/column ranges within org tables, although that subsetting could also be done in R).
Specifically what org-R does is write the table to a temp csv file, and tell R the name of that file. However:
- We are not limited to a single source of input; the same sort of thing could be done for several sources of input
- I don't think we even have to use temp files. An alternative would be to have org pass the table contents as a csv-format string to textConnection() in R, thus creating an arbitrary number of input objects in the appropriate R environment (scope) from which the R code can read data when necessary.
That suggests a header option syntax something like
'(:R-obj-name-1 tbl-name-or-id-1 :R-obj-name-2 tbl-name-or-id-2)
As a result of passing that option, the code would be able to access the data referenced by tbl-name-or-id-1 via read.table(R-obj-name-1).
An extension of that idea would be to allow remote files to be used as data sources. In this case one might need just the remote file (if it's a csv file), or if it's an org file then the name of the file plus a table reference within that org file. Thus maybe something like
'((R-obj-name-1 . (:tblref tbl-name-or-id-1 :file file-1))
(R-obj-name-2 . (:tblref tbl-name-or-id-2 :file file-2)))
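For the textConnection() route, the org side would only need to turn a lisp table into a csv string and splice it into the R code it sends; a sketch (the function names and the exact R template are assumptions):
#+begin_src emacs-lisp
;; Sketch: serialize a lisp table as csv and build R code that reads it
;; back through textConnection(), avoiding temp files.
(defun litorgy-csv-cell (cell)
  (if (numberp cell) (number-to-string cell) (format "%S" cell)))

(defun litorgy-table-to-csv (table)
  "Return TABLE, a list of row lists, as a csv-formatted string."
  (mapconcat (lambda (row) (mapconcat #'litorgy-csv-cell row ","))
             table "\n"))

(defun litorgy-R-assign-from-table (r-obj-name table)
  "Return R code that binds TABLE to R-OBJ-NAME via a textConnection."
  (format "%s <- read.csv(textConnection(%S), header=FALSE)"
          r-obj-name (litorgy-table-to-csv table)))
#+end_src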
Eric: referencing data in general
So here's some thoughts for referencing data (henceforth referred to as resources). I think this is the next thing we need to tackle for implementation to move forward. We don't need to implement everything below right off the bat, but I'd like to get these lists as full as possible so we don't make any implementation assumptions which preclude real needs.
We need to reference resources of the following types…
- table (list)
- output from a source code block (list or hash)
- property values of an outline header (hash)
- list (list)
- description list (hash)
- more?…
All of these resources will live in org files which could be
- the current file (default)
- another file on the same system (path)
- another file on the web (url)
- another file in a git repo (file and commit hash)
What information should each of these resources be able to supply? I'm thinking (again not that we'll implement all of these but just to think of them)…
- ranges or points of vector data
- key/value pairs from a hash
- when the object was last modified
- commit info (author, date, message, sha, etc…)
- pointers to the resources upon which the resource relies
So we need a referencing syntax powerful enough to handle all of these alternatives. Maybe something like path:sha:name:range, where
- path: empty for the current file, a path for files on the same system, and a url otherwise
- sha: an optional git commit indicator
- name: the table/header/source-block name or id for location inside of the org file (this would not be optional)
- range: indicates which information is requested from the resource, so it could be a range to access parts of a table, or the names of properties to be referenced from an outline header
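Just to make that syntax concrete, a minimal parser might look like the following (nothing about the format is settled, and real paths containing colons would need smarter splitting):
#+begin_src emacs-lisp
;; Sketch: split a resource reference of the form path:sha:name:range
;; into its four components; any component except name may be empty.
(defun litorgy-parse-reference (ref)
  "Return a plist (:path :sha :name :range) parsed from REF."
  (let ((parts (split-string ref ":")))
    (unless (= (length parts) 4)
      (error "Malformed reference: %s" ref))
    (list :path (nth 0 parts) :sha (nth 1 parts)
          :name (nth 2 parts) :range (nth 3 parts))))

;; (litorgy-parse-reference "::sandbox-table:B2..B9")
;; => (:path "" :sha "" :name "sandbox-table" :range "B2..B9")
#+end_src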
Once we agree on how this should work, I'll try to stub out some code, so that we can get some simple subset of this functionality working, hopefully something complex enough to do the following…
questions
Do we want things like a source code block to leave multiple outputs, or do we only want them to be able to output one vector or hash?
This design assumes that any changes will explicitly pass data in a functional programming style. This makes no assumptions about things like source code blocks changing state (in general state changes lead to more difficult debugging).
- Do we want to take steps to ensure we do things like execute consecutive R blocks in different environments, or do we want to allow state changes?
- Does this matter?
So I (Eric) may be getting ahead of myself here, but what do you think about the ability to pass arguments to resources? I'm having visions of google map-reduce, processes spread out across multiple machines.
Maybe we could do this by allowing the arguments to be specified?
source-target pairs
The following can be used for special considerations based on source-target pairs
Dan: I don't quite understand this subtree; Eric – could you give a little more explanation of this and of your comment above regarding using /ndwarshuis/org-mode/src/commit/e680407493eb9c8047842a66e83e6063fa20219b/lists%20as%20our%20medium%20of%20exchange?
source block output from org tables
source block output from other source block
source block output from org list
org table from source block
org table from org table
org properties from source block
org properties from org table
export
Once the previous objectives are met, export should be fairly simple. Basically it will consist of triggering the evaluation of source code blocks with the org-export-preprocess-hook.
This block export evaluation will be aware of the target format
through the htmlp and latexp variables, and can then create quoted
#+begin_html
and #+begin_latex
blocks appropriately.
There will also need to be a set of header arguments related to code export. These would be similar to the results header arguments but would apply to how to handle execution and results during export.
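A sketch of the wiring (org-export-preprocess-hook, htmlp and latexp are the existing exporter pieces; litorgy-exp-replace-block is a hypothetical function that would evaluate the block at point and splice in the appropriate quoted block):
#+begin_src emacs-lisp
;; Sketch: during export preprocessing, replace each source block with
;; its evaluated results formatted for the target backend.
(defun litorgy-export-preprocess (&rest ignore)
  "Evaluate the source blocks in the buffer being exported."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "^[ \t]*#\\+begin_src" nil t)
      ;; hypothetical: evaluates the block at point and inserts
      ;; #+begin_html / #+begin_latex output depending on htmlp / latexp
      (litorgy-exp-replace-block))))

(add-hook 'org-export-preprocess-hook 'litorgy-export-preprocess)
#+end_src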
Notes
Block Formats
Unfortunately org-mode has two different block types, both useful. In developing RweaveOrg, a third was introduced.
Eric is leaning towards using the #+begin_src
blocks, as that is
really what these blocks contain: source code. Austin believes
that specifying export options at the beginning of a block is
useful functionality, to be preserved if possible.
Note that upper and lower case are not relevant in block headings.
PROPOSED block format
I (Eric) propose that we use the syntax of source code blocks as they currently exist in org-mode with the addition of evaluation, header-arguments, exportation, single-line-blocks, and references-to-table-data.
- evaluation: These blocks can be evaluated through \C-c\C-c with a slight addition to the code already present and working in org-eval-light.el. All we should need to add for R support would be an appropriate entry in /ndwarshuis/org-mode/src/commit/e680407493eb9c8047842a66e83e6063fa20219b/org-eval-light-interpreters with a corresponding evaluation function. For an example using org-eval-light see /ndwarshuis/org-mode/src/commit/e680407493eb9c8047842a66e83e6063fa20219b/%2A%20src%20block%20evaluation%20w/org-eval-light.
- header-arguments: These can be implemented along the lines of Austin's header arguments in org-sweave.el.
- exportation: Should be as similar as possible to that done by Sweave, and hopefully can re-use some of the code currently present in org-exp-blocks.el.
- single-line-blocks: It seems that it is useful to be able to place a single line of R code on a line by itself. Should we add syntax for this similar to Dan's #+RR: lines? I would lean towards something here that can be re-used for any type of source code in the same manner as the #+begin_src R blocks, maybe #+src_R? Dan: I'm fine with this, but don't think single-line blocks are a priority. My #+R lines were something totally different: an attempt to have users specify R code implicitly, using org-mode option syntax.
- references-to-table-data: I get the impression that this is vital to the efficient use of R code in an org file, so we should come up with a way to reference table data from a single-line-block or from an R source-code block. It looks like Dan has already done this in org-R.el.
Syntax
Multi-line Block
#+begin_src lang header-arguments
body
#+end_src
- lang: the language of the block (R, shell, elisp, etc…)
- header-arguments: a list of optional arguments which control how the block is evaluated and exported, and how the results are handled
- body: the actual body of the block
Single-line Block
#+begin_src lang body
- It's not clear how/if we would include header-arguments into a single line block. Suggestions? Can we just leave them out? Dan: I'm not too worried about single line blocks to start off with. Their main advantage seems to be that they save 2 lines. Eric: Fair enough, lets not worry about this now, also I would guess that any code simple enough to fit on one line wouldn't need header arguments anyways.
Include Block
#+include_src lang filename header-arguments
- I think this would be useful, and should be much more work (Dan: didn't get the meaning of that last clause!?). Eric: scratch that, I meant "shouldn't be too much work" :) That way whole external files of source code could be evaluated as if they were an inline block. Dan: again I'd say not a massive priority, as I think all the languages we have in mind have facilities for doing this natively, thus I think the desired effect can often be achieved from within a block; since there are already language-native ways to include, we shouldn't waste too much effort on it in the beginning. What do you think? Does this accomplish everything we want to be able to do with embedded R source code blocks?
src block evaluation w/org-eval-light
Here's an example using org-eval-light.el. First load the org-eval-light.el file [[elisp:(load (expand-file-name "org-eval-light.el" (expand-file-name "existing_tools" (file-name-directory buffer-file-name))))]], then press \C-c\C-c inside of the following src code snippet. The results should appear in a comment immediately following the source code block. It shouldn't be too hard to add R support to this function through the `org-eval-light-interpreters' variable. (Dan: The following causes error on export to HTML hence spaces inserted at bol)
date
existing formats
Source code blocks
Org has an extremely useful method of editing source code and examples in their native modes. In the case of R code, we want to be able to use the full functionality of ESS mode, including interactive evaluation of code.
Source code blocks look like the following and allow for the special editing of code inside of the block through `org-edit-special'.
,## hit C-c ' within this block to enter a temporary buffer in r-mode.
,## while in the temporary buffer, hit C-c C-c on this comment to
,## evaluate this block
a <- 3
a
,## hit C-c ' to exit the temporary buffer
dblocks
dblocks are useful because org-mode will automatically call
`org-dblock-write:dblock-type' where dblock-type is the string
following the #+BEGIN:
portion of the line.
dblocks look like the following and allow for evaluation of the
code inside of the block by calling \C-c\C-c
on the header of
the block.
R blocks
In developing RweaveOrg, Austin created org-sweave.el. This allows for the kind of blocks shown in testing.Rorg. These blocks have the advantage of accepting options to the Sweave preprocessor following the #+BEGIN_R declaration.
block headers/parameters
Regardless of the syntax/format chosen for the source blocks, we will need to be able to pass a list of parameters to these blocks. These should include (but should certainly not be limited to)
- label or id: label of the block; should we provide facilities for automatically generating a unique one of these?
- file: names of file to which graphical/textual/numerical/tabular output should be written. Do we need this, or should this be controlled through the source code itself?
- results: indication of where the results should be placed, maybe the following values…
  - append: default, meaning just append to the current buffer immediately following the current source block
  - replace: like append, but replace any results currently there
  - file: save the results in a new file, and place a link to the file into the current buffer immediately following the source code block
  - table: save the results into a table, maybe use a table id:range to identify which table and where therein
  - nil: meaning just discard the results
- not sure of a good name here: flags for when/if the block should be evaluated (on export etc…)
- again can't think of a concise name: flags for how the results of the export should be displayed/included
- scope: flag indicating whether the block should have a local or global scope
- ?: flags specific to the language of the source block
- ?: etc…
I think fleshing out this list is an important next step.
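To help flesh it out, here is one possible way to parse a block's header-argument string into a plist whose keys would be exactly the parameters above (the parsing details are an assumption):
#+begin_src emacs-lisp
;; Sketch: turn ":results replace :file out.png" into a plist.
(defun litorgy-parse-header-arguments (header-string)
  "Parse HEADER-STRING into a plist of keyword/value pairs."
  (let ((tokens (split-string header-string "[ \t]+" t))
        params)
    (dolist (tok tokens (nreverse params))
      (if (string-match "\\`:" tok)
          ;; start a new :keyword with an empty value slot
          (setq params (cons nil (cons (intern tok) params)))
        ;; otherwise append the token to the current keyword's value
        (when params
          (setcar params
                  (if (car params) (concat (car params) " " tok) tok)))))))

;; (litorgy-parse-header-arguments ":results replace :file out.png")
;; => (:results "replace" :file "out.png")
#+end_src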
Interaction with the R process
We should take care to implement this in such a way that all of the different components which have to interact with R are covered, including:
- evaluation of source code blocks
- automatic evaluation on export
- evaluation of \R{} snippets
- evaluation of single source code lines
- evaluation of included source code files
- sending/receiving vector data
I think we currently have two implementations of interaction with R processes; org-R.el and org-exp-blocks.el. We should be sure to take the best of each of these approaches.
More on the exchange of data between org-mode and source code blocks at reference to data and evaluation results.
block scoping
(see caching of evaluation)
This inadvertently raises the issue of scoping. The pretend function pretends that we will create a block-local scope, and that we can save just the things in that scope. Sweave takes the make-everything-global approach. I can see advantages either way. If we make block-local scopes, we can save each one independently, and generally speaking it seems like more granularity==more control. If we make everything global, we can refer to entities declared in earlier blocks without having to explicitly import those entities into the current block. I think this counts in the "need to think about it early on" category.
If we did want block-local scopes, in R we can start every eval with something like
;; fake code that pretends to create a new, empty environment
(ess-exec (concat block-env " <- new.env()"))
(ess-exec (concat "eval(" block-contents ", envir=" block-env ")"))
If we decide we want block-scoping, I'm sure Dan and I can figure out the right way to do this in R, if he hasn't already. I haven't thought at all about how these scope issues generalize to, say, bash blocks.
Maybe this is something that should be controlled by a header argument?
\C-c\C-c
evaluation
With org-mode version at least 6.23, see the documentation for info:org:Context-sensitive commands.
free explicit variables
Maybe we should have some idea of variables independent of any particular type of source code or source block. These could be variables that have a value inside of the scope of the org-mode file, and they could be used as a transport mechanism for information transfer between org-tables, org-lists, and different source-blocks.
Each type of source code (and org-mode types like tables, lists, etc…) would need to implement functions for converting different types of data to and from these variables (which would be elisp variables).
So for example say we want to read the values from a table into an R block, perform some calculations, and then write the results back into the table. We could
- assign the table to a variable
  - the table would be converted into a lisp vector (list of lists)
  - the vector would be saved in the variable
- have an R source block reference the variable
  - the variable would be instantiated into an R variable (through mechanisms mentioned elsewhere)
  - the R code is executed, and the value of the variable inside of R is updated
  - when the R block finishes, the value of the variable globally in the org buffer would be updated
  - optionally the global value of the variable would be converted back into an org-mode table and would be used to overwrite the existing table.
What do you think?
This might not be too different from what we were already talking about, but I think the introduction of the idea of having variables existing independently of any tables or source code blocks is novel and probably has some advantages (and probably shortfalls).
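On the elisp side, such free variables might amount to no more than a named store plus get/set helpers; the per-language conversion functions would plug in around something like this (names and storage are assumptions):
#+begin_src emacs-lisp
;; Sketch: free variables scoped to the org buffer, independent of any
;; particular table or source block.
(defvar litorgy-free-variables (make-hash-table :test 'equal)
  "Store for free litorgy variables (could be made buffer-local).")

(defun litorgy-set-free-variable (name value)
  "Store VALUE (any lisp object, e.g. a list of lists) under NAME."
  (puthash name value litorgy-free-variables))

(defun litorgy-get-free-variable (name)
  "Return the value stored under NAME, or nil if unset."
  (gethash name litorgy-free-variables))

;; Example round trip:
;; (litorgy-set-free-variable "grades" '((1 2 3) (4 5 6)))
;; (litorgy-get-free-variable "grades")
#+end_src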
Buffer Dictionary
LocalWords: DBlocks dblocks litorgy el eric litorgical fontification