Bell Eapen MD, PhD.

Bringing Digital health & Gen AI research to life!

Word to LaTeX: How paperajcli Bridges Two Academic Worlds

Academic writing often lives in two incompatible ecosystems. Microsoft Word is where collaboration happens—tracked changes, inline comments, and committee feedback. LaTeX is where publication happens—precise typesetting, journal templates, and mathematical formatting. Moving between these worlds has traditionally been frustrating, especially when Pandoc alone can’t handle template integration or citation workflows smoothly. As the repository notes, this process “often becomes cumbersome” when using Pandoc directly.

Word to LaTeX with paperajcli

Image credit: Petar Milošević, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

paperajcli is a lightweight command‑line tool designed to solve this problem. It lets you write collaboratively in Word while producing clean, modular LaTeX files ready for any journal or thesis template. It’s a simple idea with a big impact: mark sections in Word, export them as LaTeX, and drop them into your template with zero fuss.


Why Word and LaTeX Still Need Each Other

Word remains the universal tool for drafting manuscripts with co‑authors, especially those who prefer not to touch LaTeX. It excels at:

  • Commenting and tracking changes
  • Quick edits
  • Committee and multi‑author workflows

LaTeX, on the other hand, is essential for:

  • Journal and thesis templates
  • Bibliography control
  • Mathematical typesetting
  • Figure and table environments
  • Cross‑referencing

The challenge is getting from one world to the other without losing structure, citations, or formatting. paperajcli provides a structured bridge.


What paperajcli Does

The tool detects custom delimiters inside a .docx file and, in the repository's words, “exports each marked section into its own LaTeX file”.

Example

If your Word document contains:

<paperaj-introduction>
Introduction text…
</paperaj-introduction>

<paperaj-methods>
Methods text…
</paperaj-methods>

paperajcli produces:

  • introduction.tex
  • methods.tex

as clean, modular LaTeX files.
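
To make the delimiter idea concrete, here is a minimal Python sketch of how text could be split on <paperaj-NAME> markers. It is an illustration only, not paperajcli's actual implementation: the function name split_sections and the regular expression are assumptions, and the sketch works on plain text rather than on a real .docx file.

```python
import re

# Hypothetical pattern: an opening <paperaj-NAME> tag, the section body,
# and a closing tag that repeats the same NAME.
SECTION_RE = re.compile(r"<paperaj-([\w-]+)>(.*?)</paperaj-\1>", re.DOTALL)


def split_sections(text: str) -> dict[str, str]:
    """Map '<name>.tex' to the stripped body of each marked section."""
    return {
        f"{name}.tex": body.strip()
        for name, body in SECTION_RE.findall(text)
    }


sample = """
<paperaj-introduction>
Introduction text…
</paperaj-introduction>

<paperaj-methods>
Methods text…
</paperaj-methods>
"""

sections = split_sections(sample)
for filename in sections:
    print(filename)
```

Text outside any delimiter pair is simply ignored by this sketch; how the real tool treats such text is not covered here.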

Headings are preserved—Word’s H1 becomes \section{}, H2 becomes \subsection{}—ensuring your structure remains intact.

These files can then be included in any LaTeX template using:

\input{myfolder/methods.tex}
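
If a document has many sections, the \input lines can be generated rather than typed. The helper below is a small sketch under stated assumptions: input_lines is a hypothetical name, the folder and file names are placeholders, and the order of sections is whatever order you pass in.

```python
from pathlib import PurePosixPath


def input_lines(folder: str, filenames: list[str]) -> str:
    """Return one \\input{...} line per file, in the order given."""
    return "\n".join(
        # LaTeX paths use forward slashes on every platform.
        "\\input{" + str(PurePosixPath(folder) / name) + "}"
        for name in filenames
    )


print(input_lines("myfolder", ["introduction.tex", "methods.tex"]))
```

The printed lines can be pasted into the main file of the template at the point where the sections belong.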

Native LaTeX Commands Inside Word

One of the most powerful features is that paperajcli preserves LaTeX commands written directly in Word. The repository confirms that commands like \cite{}, \href{}, \ref{}, \label{}, and math environments are “automatically un‑escaped during conversion”.

This means you can write:

“As shown in Figure \ref{fig:architecture}…”

directly in Word, and the LaTeX output will behave exactly as expected.
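
The idea of un-escaping can be sketched in a few lines of Python. This is not the tool's code. It assumes that a converter has escaped a typed command as \textbackslash followed by the command name and \{ \} braces, and the function name unescape_commands and the command list are chosen for illustration.

```python
import re

# Commands to restore; taken from the list mentioned in the text above.
COMMANDS = ("cite", "href", "ref", "label")

# Assumed escaped form: \textbackslash (optionally followed by {} or a space),
# the command name, then one or more \{...\} argument groups.
ESCAPED_RE = re.compile(
    r"\\textbackslash(?:\{\}|\s)?("
    + "|".join(COMMANDS)
    + r")((?:\\\{[^{}]*?\\\})+)"
)


def unescape_commands(latex: str) -> str:
    """Turn escaped commands back into real LaTeX commands."""
    def restore(match: re.Match) -> str:
        args = match.group(2).replace(r"\{", "{").replace(r"\}", "}")
        return "\\" + match.group(1) + args

    return ESCAPED_RE.sub(restore, latex)


escaped = r"As shown in Figure \textbackslash ref\{fig:architecture\}"
print(unescape_commands(escaped))
```

Characters that are escaped inside the arguments, such as underscores in a label, are left untouched by this sketch.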

For citations, the tool is compatible with Zotero and other BibTeX‑based managers. The repository even includes a CSL file to ensure Pandoc citation compatibility.


Figures, Tables, and Cross‑References

Figures and tables are often the hardest part of Word‑to‑LaTeX conversion. paperajcli includes thoughtful post‑processing to make this seamless:

  • Figure captions written as
    Figure 1: Caption text
    are converted into proper LaTeX figure environments.
  • Add TWOCOLUMN in Word to trigger figure* environments.
  • Add LATEXROTATE to generate rotated figures via sidewaysfigure.
  • Cross‑references like Figure_1 or Table_2 are automatically converted to \ref{} commands.

All of these behaviours are documented in the repository’s post‑processing section.
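
As a rough illustration of the cross-reference step, the Python sketch below rewrites Figure_1 and Table_2 style markers as \ref{} commands. The label scheme (fig:1, tab:2) and the function name convert_xrefs are assumptions made for this example; the labels paperajcli actually emits may differ.

```python
import re

# Hypothetical label prefixes; the real tool may use a different scheme.
PREFIXES = {"Figure": "fig", "Table": "tab"}
XREF_RE = re.compile(r"\b(Figure|Table)_(\d+)\b")


def convert_xrefs(text: str) -> str:
    """Replace Figure_N / Table_N markers with \\ref{prefix:N}."""
    def to_ref(match: re.Match) -> str:
        kind, number = match.group(1), match.group(2)
        return f"\\ref{{{PREFIXES[kind]}:{number}}}"

    return XREF_RE.sub(to_ref, text)


print(convert_xrefs("See Figure_1 and Table_2."))
```

Ordinary mentions such as "Figure 1" (with a space) are left alone, since only the underscore form is treated as a marker here.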


A Clean, Reproducible Workflow

The repository outlines a recommended workflow that blends Word, LaTeX, Zotero, and Overleaf smoothly:

  1. Clone a LaTeX template from Overleaf using Git.
  2. Run paperajcli to export Word sections into a directory inside the template.
  3. Insert each .tex file using \input{}.
  4. Manage citations in Zotero and export a .bib file.
  5. Add the .bib file to your project and compile.

This workflow “keeps the collaborative convenience of Word while giving you the precision and template‑compatibility of LaTeX”.


How to Use the CLI

The primary command is:

npx paperajcli latex <input-file> <output-directory>

Arguments

  • file (<input-file> in the command above): path to the .docx file.
  • outputDir (<output-directory> in the command above): where .tex files and media will be saved.

Useful Flags

  • --dry-run to preview actions without writing files.
  • --extract-media / --no-extract-media to control image extraction.
  • --help for documentation.
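
For scripted use, the command can be assembled programmatically. The Python sketch below only builds and prints the command line; build_command is a hypothetical helper, the file and directory names are placeholders, and it assumes the flags may follow the two positional arguments.

```python
import shlex


def build_command(docx, output_dir, dry_run=False, extract_media=None):
    """Build the argument list for 'npx paperajcli latex ...'."""
    cmd = ["npx", "paperajcli", "latex", docx, output_dir]
    if dry_run:
        cmd.append("--dry-run")
    # None means: leave the tool's default media behaviour untouched.
    if extract_media is True:
        cmd.append("--extract-media")
    elif extract_media is False:
        cmd.append("--no-extract-media")
    return cmd


cmd = build_command("manuscript.docx", "template/sections", dry_run=True)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Starting with --dry-run is a cautious way to preview what would be written before touching the template directory.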

Prerequisites

  • Node.js 18+.
  • Pandoc installed and available in PATH.
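
A quick way to confirm the prerequisites is to check that the executables are on PATH. The Python sketch below does only that; missing_tools is a hypothetical helper, and it does not verify the Node.js version.

```python
import shutil


def missing_tools(tools=("node", "npx", "pandoc"), which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("node, npx and pandoc were all found on PATH")
```

The which parameter exists only so the lookup can be swapped out in tests; in normal use the default is enough.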

Where paperajcli Fits in the Writing Ecosystem

Pandoc

Pandoc is powerful, but on its own it does not split a document into modular sections or act on custom delimiters, and fitting its output into a journal template takes extra work. paperajcli adds structure and workflow on top of Pandoc.

Zotero + Better BibTeX

Zotero remains the easiest way to manage references. Exporting a .bib file ensures compatibility with LaTeX citation packages like natbib or biblatex.

Overleaf

Overleaf is the natural destination for collaborative LaTeX editing. With paperajcli, you can maintain a hybrid workflow:

  • Draft in Word
  • Convert with paperajcli
  • Finalize in Overleaf

GitHub + CI

Because paperajcli outputs modular .tex files, it integrates well with:

  • Git version control
  • Automated LaTeX builds
  • Continuous integration pipelines

Real‑World Use Cases

Graduate Theses

Committees often insist on Word drafts. Universities often require LaTeX templates. paperajcli bridges the two without manual rewriting.

Multi‑Author Manuscripts

When co‑authors refuse to use LaTeX, you can still maintain a LaTeX‑based submission pipeline.

Scientific Reports

Figures, tables, and equations survive the transition intact.

Institutional Templates

Many institutions provide rigid LaTeX templates. With paperajcli, you can drop in modular sections without restructuring everything.


Why This Workflow Matters

The academic writing process is rarely linear. Drafts move between collaborators, supervisors, editors, and reviewers. Word is the lingua franca of collaboration; LaTeX is the lingua franca of publication. paperajcli respects both worlds.

It gives researchers:

  • A clean separation between drafting and typesetting
  • A reproducible, template‑friendly workflow
  • A way to preserve citations, math, figures, and structure
  • A modular LaTeX output that plays nicely with Git and Overleaf

It’s a small tool that solves a big, persistent problem.

Pragmatic Research That Builds and Travels

I have noticed a steady shift from abstract theorizing toward pragmatic research, resulting in tangible, reusable artifacts across many areas. These artifacts are not just code; they are models, methods, algorithms, datasets, and tools that solve real operational problems. In areas where generative AI is already changing workflows, the value of such pragmatic research is becoming unmistakable.

Image credit: Justmee3001, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Why building matters now

The catalyst is twofold. First, the technical maturity of generative AI and related toolchains has lowered the cost of moving from idea to prototype. Second, health systems and organizations are asking for systems that integrate with workflows and regulatory constraints rather than for more conceptual frameworks. In practice, this means moving upstream in the research lifecycle: designing artifacts with deployability, explainability, and governance in mind, and creating reproducible stacks that others can use.

Open-source availability plays a special role. When models, algorithms, and tools are shared openly, they invite scrutiny, rapid iteration, and safer deployment, especially in high-stakes domains like healthcare, where transparency aids validation and trust. Open artifacts accelerate safe, community-driven improvements and reduce single-vendor lock-in, improving the odds that a research output will see real-world use.

How evaluation and impact change

Traditional academic success metrics emphasize conceptual novelty and citation counts. For pragmatic research, those metrics are necessary but insufficient. The new signals of value include artifact availability, adoption, downloads, forks, integration reports, and even social engagement that indicates uptake and practitioner interest. Empirical evaluation will increasingly combine:

  • Classical metrics from peer review and controlled experiments.
  • Community signals (downloads, GitHub stars/forks, package installs).
  • Operational outcomes (reduced task time, fewer errors, improved throughput).
  • Policy and governance readiness (documentation, auditing hooks, monitoring plans).

As researchers build usable systems, journals and conferences will need to evolve their review criteria to assess reproducibility and real-world applications, not just the strength of theoretical claims.

Sharing, incentives, and scholarly credit

Open-source distribution is central to the pragmatic approach because it enables external validation and iterative refinement. But scholarship must also evolve to reward the labor of engineering, documentation, and maintenance. Practical contributions (well-documented software and model releases, replicable deployment recipes, and usable toolkits) should become first-class scholarly outputs. Peer communities should value artifacts that show measurable use in the wild, not just theoretical elegance.

Risks and guardrails

A pragmatic focus raises important risks: rushed or poorly validated tools entering clinical environments, fragile artifacts that break in new settings, and overreliance on usage metrics that can be gamed. Academic conferences and funders must insist on transparency: open validation datasets (where privacy allows), clear documentation of model limitations, and post-deployment evaluation plans.

What this means for MIS and health informatics researchers

For MIS researchers, the pragmatic paradigm reframes scholarship as product plus evidence. Studies should connect organizational processes, human factors, and deployed systems, measuring how an artifact changes decisions, coordination, or resource allocation. For health informatics scholars, the emphasis on safety, explainability, and auditability becomes non-negotiable; artifacts must be designed with clinical oversight, privacy-preserving techniques, and regulatory constraints in mind.

Practically, scholars will benefit from adopting engineering best practices: continuous integration for models, packaged reproducible environments, clear APIs, and user-centered design. Collaboration across disciplinary boundaries, with clinical partners, product engineers, ethicists, and implementation scientists, will be essential to translate artifacts into impact.

Research that travels

The pragmatic paradigm restores a simple promise: research should travel beyond the page. When MIS and health informatics scholars build artifacts designed for real settings and share them openly, scholarship becomes a living conversation, one of iterative improvements, operational learning, and measurable benefits. Publication will no longer be the last step in the journey; it will be a milestone on the route to adoption, where downloads, forks, deployment stories, and measurable outcomes tell the fuller story of impact. In an era powered by generative AI, the most consequential research will be the kind that people can pick up, run, and improve: research that travels beyond the lab or paper into real-world settings.


The art of taking online help:

I am not a big-time researcher with lots of international experience. However, I would like to suggest a few guidelines for young Indian bioinformaticians seeking online help for a project or showcasing their profile online.

How should you address a researcher online? In the research community, people generally do not care much about shows of respect, so “sir”, “respected sir”, “the most adorable”, and the like can come across as a lack of confidence or as excessive submissiveness. It is appropriate to address anybody by their surname with the appropriate title. Using just the first name is also OK. However, titles are often taken seriously, and addressing a Dr/Prof as Mr is a cardinal sin even if you add a liberal dose of sir/almighty to it.

Career guidance is often given face to face, over the phone, or through forums dedicated to that purpose. Before posting a career guidance question to a forum, search it for similar questions, unless your profile is unique. A question like “I am going to finish my kindergarten. What should I do next to become a successful bioinformatician?” is unlikely to fetch many answers. If you don't have enough time to search the forum, don't expect anybody else to send you a personal two-page letter.

The same applies to very broad, open-ended questions. A question like “How is bioinformatics important for clinical medicine?” is unlikely to get much attention. Be as specific as possible. Do not expect others to serve complete answers on a platter. Answers will mostly be very short, incomplete, and often cryptic (because you may not know what the other person is talking about). Be ready to do some background research on the answer rather than asking for more information.

Posting your profile in online forums is also an art. Bioinformatics is a very broad field, and employers look for specific skills that you may not always have. I often see sequence analysis, genomics, proteomics, computer programming, PERL, RUBY, EMERALD, systems biology, drug designing, structural and molecular biology, talking, reading and sleeping in the skill set, with everyone competing to make the most complete list. In reality, nobody can be a complete bioinformatician, and it is better to showcase your core competency, substantiated by your projects or publications.

Please post your comments / criticisms / suggestions here.