The Documentation Project

From BioPerl


Introduction (maj)

It has been clear for some time (evidently much longer than I've been around here) that the BioPerl documentation bolus is due for a major overhaul. The major issue is not so much one of content (though there are content gaps), but of organization and accessibility. There have been several attempts to improve documentation, both in automated ways (the Deobfuscator, the pdoc system) and human-driven ways (the Scrapbook, the wiki itself). These have been generally successful in themselves, but rather than making it easier for the user, particularly the new user, to find what she is seeking, they create a meta-search problem: which tool do you use to find the thing you're looking for? We enter the old disconnect between developers and users: developers put a premium on adding features; users on, well, usability, which in the first instance means simplicity. We might be tempted to say "RTFM", but it isn't surprising when the user responds "WFM?"

For a large project like BioPerl, it seems to me that documentation is as much an efficiency issue as object overhead or fast argument parsing. Reducing the entry barrier to BioPerl via well-organized and well-targeted documentation can significantly reduce both analysis time for users and development time for devs. Currently, it seems that the mailing list is the best place to get audience-targeted documentation; devs often respond more as wiki search engines than as gurus. A serious documentation improvement effort could save hundreds of hours of real time without a single line of code spilled, integrated over a large component of our user base: those who want to use BioPerl like they use Perl, to just get something done. Despite hints (see thread) that BioPerl is going the way of COBOL as people turn back to C/C++ to handle next-generation-sized datasets, I think a documentation overhaul could extend the useful life of the project by many years by catering to this group of users, a group to which we all belong from time to time.

A Discussion

Here are the basic points emerging from a recent list discussion (see thread):

  • The foundation of BioPerl documentation lies in the POD and the wiki HOWTOs. Of these, POD is the more fundamental.
  • The wiki and the POD are synergistic, with the wiki being the standard place to expand on POD, providing use cases and worked examples.
  • A telling statement: "Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble." (see thread)

  • A meta-point: only 7 of the dozens of regular list users commented in the thread.
And gentlemen in England now-a-bed
Shall think themselves accurs'd they were not here,
And hold their manhoods cheap whiles any speaks
That fought with us upon Saint Crispin's day.
-- Henry V, IV, iii

Ponder and answer

In order to establish a workable, sustainable, and modular plan, I think it's worth giving some thought to the following questions. I will be doing so, and I invite readers to add their answers here.


Who is the intended audience for
  • the POD?
  • the HOWTOs?
  • the Pdoc?
  • CPAN docs?
  • the Deobfuscator?
  • the Scrapbook?
  • bioperl-l?
  • the FAQ?
Can the intended audience for any documentation component easily
  • find that component?
  • search that component?
  • understand the content?
  • find alternative resources?


What is the resource of first resort? Of last resort? Do the current answers to these questions invite or repel newbies, users, developers?
  • Perhaps the expected resource of first resort is "the wiki". This is too broad. Many list questions are prefaced by "I looked everywhere, and after [time period] I decided to post my question..." When I answer these questions, I often go to the wiki, so I evidently look where the user did not... --Majensen 14:33, 24 August 2009 (UTC)
  • The objective is to create documentation that has a single starting point for at least 50% of the questions asked on the mailing list. We've achieved this for certain topics, like SearchIO. In the old days you'd get a query a week about doing something with Blast, and we'd repeat something written the previous week, week after week. Then we wrote some HOWTOs, so just about any question on Features or SearchIO was answered by "See the HOWTO". --Brian (via list)
  • One starting page for every single reasonably general question, like "See the Installation page". --Brian (via list)

What improvements are necessary to make those expectations reasonable?
  • How easy is it, for example, to search the mailing list? Is it easy for a newbie to find an appropriate search utility? Is the list organized to be search-friendly, or can its metadata (headers, primarily) be organized automatically to make it so?
What work is the developer expected to do?
  • We expect POD and tests from developers. Should we also expect a HOWTO?
What work is the Core expected to do?
  • If an automated update of Pdoc fails, is there an identified ball-handler to take care of that? Is there a backup person? Is an email sent to these people on failure?

Doc life cycle

What is current, out-of-date, deprecated?
What SOPs or utilities can be put in place to monitor the answers to this?

What works elsewhere?

What makes Perl documentation itself work?
  • perldoc provides access in one line to functions, module POD, and language articles. I use it all the time. CPAN provides the same thing online (TMTOWTDI). I use it all the time. --Majensen 14:33, 24 August 2009 (UTC)
What about documentation formats?
  • The installer spends a lot of time making HTML. Is it easily accessible? Do people not "in-the-know" know about it?
  • Is documentation (say, POD) worth compiling into other formats, like .info? If bioperl.lisp provides a de facto standard of code formatting, shouldn't BioPerl provide its docs in Emacs's de facto standard format? Can the installer do this automatically?
  • Should we provide/solicit templating for IDEs besides Emacs?

Other doc-related issues

Where in the module files should POD be placed? Should all modules be updated?
What subset of Perl Best Practices, if any, should BioPerl "officially" adopt?
Which modules should be brought up to date with such practices, and which should be "grandfathered"?
More automation, or less? Where?
How can we improve searchability on the wiki?

Open-source community involvement and attribution issues

Should we become more active and up-to-date in services such as Ohloh?


Objective | Action Items | Altruist(s) | Timeline | Date Added | Date Completed | Thread/Info Link
Define objectives | Solicit comments via wiki/list | maj | by 24 Aug 09 | 14:33, 24 August 2009 (UTC) | 24 August 2009 | [1]
Rationalize doc entry points | Seriously prune the Main Page | cjf,maj | | 16:48, 27 August 2009 (UTC) | 26 September 2009 | [2]
Rationalize doc entry points | Merge FAQ and Scrapbook | | | 16:48, 27 August 2009 (UTC) | | [3]
Rationalize doc entry points | Condense/streamline install docs | | | 16:48, 27 August 2009 (UTC) | | [4]
Provide missing docs | Write Align/AlignIO HOWTO | | | 16:48, 27 August 2009 (UTC) | | [5]
Provide missing docs | Podify key wiki docs for distribution | | | 01:53, 28 August 2009 (UTC) | | [6]
Revise key docs/doc access | Revise the SeqIO HOWTO | | | 16:48, 27 August 2009 (UTC) | | [7]
Revise key docs/doc access | Single official API access on wiki | | | 01:53, 28 August 2009 (UTC) | | [8]
Revise key docs/doc access | Rationalize/revise Feature-Annotation docs | | | 16:48, 27 August 2009 (UTC) | | [9]
Documentation front-ends | Fix and spiff Deobfuscator | dm | | 11:28, 28 August 2009 (UTC) | |
Revisions under the hood | Push POD to end of modules | | | 01:53, 28 August 2009 (UTC) | | [10]
Revisions under the hood | Add Status: tags to method pod | | | 01:53, 28 August 2009 (UTC) | | [11]
Revisions under the hood | Improve <biblio> caching | | | 12:56, 28 September 2009 (UTC) | | [12]