Emacs User Survey Analysis
An exploration of the 2020 Emacs User Survey results
1. Basic
Frequently throughout this document I will refer to “Emacsers” or “Doomers” or “Vanilla users”. In every such case imagine a little extra “(that responded to this survey)” caveat: sampling bias is no joke.
The respondent pool looks fairly diverse though, so that’s rather nice 😃.
This is an analysis of the publicly available data from the 2020 Emacs User Survey.
This analysis was done with the intent of helping the Emacs community understand itself better, and which aspects of Emacs could benefit the most from development effort.
Feeling lazy? Jump to Conclusions.
1.1. Univariate breakdowns
Pairwise incidence matrix: adds 1 to [x,y] for every result containing both x and y.
1.1.1. Survey respondents
1.1.2. Languages
Throughout this analysis, I will use a number of pairwise incidence matrix plots. This is useful when examining variables which may hold multiple values, seeing which pairs of values do and don’t appear together.
Each square has \(1\) added to it for each response in which the values of its row and column are both present. So, the \((x, y)\) cell contains the number of instances where both \(x\) and \(y\) were present. The diagonal \((x, x)\) simply gives the number of times \(x\) appears.
In the proportional pairwise incidence matrix, the values of each row are divided by the diagonal \((x, x)\). Now the \((x, y)\) entry gives the proportion of the time that \(y\) is present when \(x\) already is.
This visualisation is also applied to correlation matrices further on.
The proportional pairwise incidence matrix simply divides each [x,y] entry by [x,x].
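To make the construction concrete, here is a minimal sketch in base R. The `responses` list is made up for illustration (it is not survey data), and this is just one way to build such a matrix, not necessarily how the plots here were produced.

```r
# Hypothetical multi-select responses: one character vector per respondent.
responses <- list(
  c("Python", "Bash"),
  c("Python", "Haskell"),
  c("Python", "Bash", "C"),
  c("C")
)

values <- sort(unique(unlist(responses)))

# Indicator matrix: one row per respondent, one column per value.
indicator <- t(sapply(responses, function(r) as.integer(values %in% r)))
colnames(indicator) <- values

# Pairwise incidence: cell [x, y] counts responses containing both x and y,
# and the diagonal [x, x] counts responses containing x.
incidence <- crossprod(indicator)

# Proportional version: divide each row by its diagonal entry, so [x, y]
# is the share of responses with x that also contain y.
proportional <- incidence / diag(incidence)
```

Note that the proportional matrix is not symmetric: here every Bash response also lists Python, but only two thirds of Python responses list Bash.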
It could be interesting to consider how the Emacs community’s use of languages differs from programmers in general. I considered using the TIOBE index, but the StackOverflow developer survey better mirrors the style of question used in this survey (list languages you use vs. primary language).
1.1.2.1. Observations
- The ’usual suspects’ seem popular (Python, Bash, HTML, Javascript, etc.)
- Haskell seems more popular than one would expect (similar to Go)
- Given that this is an Emacs survey, I would have expected more Lisp. Perhaps this relates to the low proportion of responses that rate their elisp proficiency above “simple functions”, or people are simply assuming elisp and only checking “lisp” when they use other lisps.
- Python, Bash, and Javascript can be thought of as “the big three” — they’re consistently used a lot in combination with other languages
- C, C++, and Assembly really like each other
- The distribution of people using Haskell with other languages is unusually flat / uniform.
- Julia users seem exceptionally monogamous, only dallying with a little Haskell and a pinch of R
- Python is nigh-universally popular, with the two exceptions of Haskell (but consistent with Haskellers’ general use of other languages) and, oddly, TypeScript.
- C users seem to tend to use a large number of other languages as well.
1.1.3. Packages
1.1.3.1. Observations
- The top 4 packages are clear: Magit, Org-mode, Projectile, and LSP-mode
- Magit seems to have uniquely broad appeal; no other package comes close in the package proportional pairwise incidence matrix
1.1.4. Emacs use cases
1.1.4.1. Observations
- Use cases are very mixed.
- The vast majority of responses consisted of software development and writing
- Software developers mix uses the most, followed by writers
1.1.5. Disabled UI elements
1.1.5.1. Observations
- Barely anyone likes the tool bar, almost everyone likes the modeline
- What’s with these modeline people? Disable the modeline but keep everything else. They’re crazy.
- Tool and scroll bar dislike are most closely linked
- Splash screen haters seem to dislike everything else (except the modeline) pretty evenly
1.2. Breakdowns by category
In the following graphics, to make it easier to compare the proportion of users who match a criterion within each framework, the user counts are normalised. To gain an intuition for the overall situation, mix the Custom column with a pinch of Doom, Vanilla, and Spacemacs.
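As a rough sketch of what this normalisation means, the base R snippet below divides the counts in each framework’s column by that framework’s total. The `survey` data frame and its values are invented for illustration only.

```r
# Hypothetical respondent-level data; the values are made up for illustration.
survey <- data.frame(
  framework = c("Doom", "Doom", "Doom", "Vanilla", "Vanilla", "Custom",
                "Custom", "Custom", "Custom", "Custom"),
  os = c("Linux", "Linux", "MacOS", "Linux", "BSD", "Linux",
         "MacOS", "Windows", "Linux", "Linux")
)

counts <- table(survey$os, survey$framework)

# margin = 2 divides each column (framework) by its total, so every
# framework's column sums to 1 regardless of how many users it has.
within_framework <- prop.table(counts, margin = 2)
```

With this scaling a small framework and a large one can be compared side by side, at the cost of hiding how many users each column actually represents.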
1.2.1. Observations
Oh wow, a lot to unpack. Let’s pick some highlights.
1.2.1.1. Purpose
- Work usage high across frameworks, and all have a decent slice of hobbyists
- Custom almost entirely work + hobby
- Doom less popular with tinkerers, more with students and hobbyists
1.2.1.2. Use case
- Vanilla particularly popular for writing
- Doom is more popular for research writing, but outdone by “Other” which is highest (of the frameworks) in research writing and other.
1.2.1.3. Version
- Doom users are the most up-to-date
- Vanilla users use older versions the most
1.2.1.4. OS
- Doom and Prelude have the fewest Windows users
- Vanilla has the most BSD users
- Prelude has the most Mac users
1.2.1.5. Run mode
- Vanilla users use the daemon the least
1.2.1.6. GUI/TUI
- TUI is massively more popular with Vanilla users
- GUI very slightly more popular with Doom/Spacemacs than the rest
1.2.1.7. Keybindings, now + initial
- Doom and Spacemacs
- Starts as half Vim half Emacs
- another third converts from Emacs to Vim bindings from “initial” to “now”
- Vim keybindings are popular, and well-received by these users
- CUA least popular
- Everything else
- around 80% Emacs, but more like 90% for Prelude
- Not much change between “initial” and “now”
- Custom users grab some other keybindings
1.2.1.8. Previous editor
- Doom slightly more popular than Spacemacs for ex-Vimmers
- Doom twice as popular as the next most (Spacemacs) for VSCode users
- None of the others differ notably
1.2.1.9. Org usage
- Doom users use Org the most, but not by much
- However, their rate of “not using org” is the lowest by a fair bit
- Across frameworks, around half use Org daily, and 80% use Org
1.2.1.10. Completion
- Doomers like Ivy, Spacers like Helm
- Half of Vanillans don’t like completion it seems
- but those that do, use ido as much as ivy/helm
- Other frameworks have a pretty consistent ~15% on ido
1.2.1.11. Elisp package management
- use-package rules, and a lot of other people like package.el
- Spacemacs does its own thing mostly
1.2.1.12. Elisp package source
- Melpa dominates
- Doom users grab packages from source much more than anyone else
- Prelude and Spacemacs seem to avoid source
1.2.1.13. Theme
- Prelude users like zenburn a fair bit
- Doomers like doom-one
1.2.1.14. Error checking
- Most vanilla users don’t make mistakes 😉
- Everybody else is fairly similar (mostly flycheck, some rather confident individuals, and a small slice of flymake)
1.2.1.15. TRAMP
- Consistently around 50/50 usage
1.2.1.16. Terminal emulator
- Doomers love vterm
- Eshell is generally pretty popular (quarter of users)
1.2.1.17. Mail client
- Quarter of people do mail in Emacs it seems
- Mu4e dominates in Doom and Custom, semi-even split between Mu4e/Notmuch elsewhere
1.2.1.18. Elisp proficiency
- Consistently, half of people feel confident with simple functions, and most of the remainder with copy and paste
- Custom users are the most confident about package writing by far
1.3. Breakdown by Emacs experience
1.3.1. Framework
Let’s now look at the distribution of years of Emacs experience, by framework, normalised by the total users of each framework.
Now normalising by total Emacs usage instead:
1.3.2. Observations
- Consistent preferences throughout the 10-30 year experience range
- Only one dip as users become more recent, which is from ~2000–2005
- dot com bubble?
- The few 30+ year users are almost all on Custom + Vanilla
- Spacemacs has a ~5 year wide peak of ~15% centred on users with 3 years of experience
- Prelude has a ~10 year wide peak of ~3% of users centred on users with 15 years of experience
- Doom’s popularity looks like a trumpet bell, almost half of new Emacs users (who are involved in the community) seem to be using Doom.
1.4. Prior Editor/IDE
1.4.1. Observations
- Initially, the majority of users were ’fresh’ to Emacs (no prior editor/IDE)
- Vim has semi-consistently been a source for around a quarter of new users, though that’s been increasing to almost half as of late
- Eclipse, Notepad++, and Sublime have all ’peaked’
- The proportion of users coming from VSCode has risen rapidly, from 5% to 30% over 5 years.
2. Text mining
Here, four techniques are applied:
- Word clouds, for an indication of which words are most prevalent
- Association graphs, where links are made between words that appear together a lot
- Cluster dendrogram, a hierarchical tree of words
- Response ’representativeness’
- We have word frequency data
- Responses are given points equal to the number of times a word is seen in the corpus for each of the 100 most frequent words (the same words seen in the word clouds) to create a ’representativeness score’
- We plot the distribution of response points, and provide the top responses to examine
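The representativeness score described above can be sketched in base R as follows. The example responses are invented, the tokeniser is deliberately crude, and I am assuming a word counts once per response however often it is repeated; the real analysis may differ on all three points.

```r
# Hypothetical free-text responses, made up for illustration.
responses <- c(
  "note taking and task management",
  "task management at work",
  "note taking",
  "literate programming"
)

# Tokenise: lower-case, split on anything that is not a letter.
tokens <- lapply(strsplit(tolower(responses), "[^a-z]+"),
                 function(w) w[nzchar(w)])

# Corpus-wide word frequencies, most frequent first.
freq <- sort(table(unlist(tokens)), decreasing = TRUE)

# Keep the n most frequent words (the text uses 100; 4 suits this tiny corpus).
n_top <- 4
top <- freq[seq_len(min(n_top, length(freq)))]

# Score: for each top word present in a response, add its corpus frequency.
# Assumption: a word counts once per response, however often it is repeated.
score <- vapply(tokens,
                function(w) sum(top[names(top) %in% w]),
                numeric(1))

# Responses ordered from most to least representative.
top_responses <- responses[order(score, decreasing = TRUE)]
```

A response built from common words scores highly, while one on a rarer topic (here “literate programming”) scores zero, which is what makes the top-scoring responses a reasonable summary of the typical answer.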
2.1. Org mode purpose
2.1.1. Sentiments
- Org is used for all kinds of writing, primarily note taking
- A lot of people use it with task management, todo list management, … helping themselves get organised (see the L3 chunk of the dendrogram)
- People who reference writing with Org tend to mention
- research
- literate programming
- The use of the task management facilities of Org is split between personal and work settings (see association graph)
2.2. Emacs improvements
2.2.1. Sentiments
- Performance, speed improvements are popular
- Seems to be some hope that gccemacs and multithreading may be good for this
- Talk about:
- A more modern GUI
- Better defaults
- LSP support
- New users may struggle in getting Emacs to “just work” (emacs–new–easier–make–work)
2.3. Emacs strengths
2.3.1. Sentiments
- Extensibility, Extensibility, Extensibility
- Oh, and flexibility, configuration, customisation, …
- Emacs lisp can do anything I want
- It’s free software
- Great community, who have created a good package ecosystem
- Magit and Org being standout examples, which “just work”
- Use one editor for everything text
- good programming language support
2.4. Emacs learning difficulties
2.4.1. Sentiments
- Keybindings are the main stumbling block
- Elisp is hard to get into, looks really strange at first
- lots of people didn’t understand it, some still don’t
- It takes a lot of time to get comfortable with the ’basics’
- Not enough help getting started. Interested in a good tutorial.
2.5. Emacs, one thing to do differently
2.5.1. Sentiments
- Better defaults / language support
- Modern defaults
- Need to “just work” better
3. Multivariate analysis
To perform multivariate analysis, I’ll examine the subset of questions and responses that I feel can be (sensibly) placed on a numeric scale.
R
os_score <- # unix-ness
  match_scorer(os_matcher,
               c("BSD"=0, "Linux"=1, "MacOS"=2, "WSL"=3, "Windows"=3, "Other"=NA))
usecase_score <- # how much coding
  match_scorer(usecase_matcher,
               c("Software Development"=0, "Data Science"=1, "Research Writing"=2,
                 "Writing"=3, "Other"=NA))
version_score <-
  match_scorer(version_matcher,
               c("25"=25, "26"=26, "27"=27, "28"=28, "gcc"=28, "Other"=NA))
keybindings_score <- # how far from defaults
  match_scorer(keybindings_matcher,
               c("CUA"=0, "Emacs"=1, "Vim"=2, "Other"=3))
usage_score <- # how frequent
  match_scorer(usage_matcher,
               c("daily"=4, "weekly"=3, "monthly"=2, "time to time"=1,
                 "don't use"=0, "no"=0, "Other"=NA))
package_repo_score <- # how walled-garden
  match_scorer(package_repo_matcher,
               c("elpa"=0, "melpa"=1, "source"=2, "Other"=NA))
elisp_skill_score <- # how proficient
  match_scorer(elisp_skill_matcher,
               c("packages"=3, "simple functions"=2, "copy paste"=1, "none"=0,
                 "no"=0, "Other"=NA))
contribution_score <- # how much contributing
  match_scorer(contribution_matcher,
               c("maintainer"=3, "regularly"=2, "time to time"=1, "no"=0, "Other"=NA))
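The `match_scorer` helper and the `*_matcher` objects are not shown in this section, so I can’t reproduce them here. Purely to illustrate the idea of placing categories on a numeric scale, here is a minimal sketch using a named-vector lookup; `score_lookup` is a hypothetical name of my own and should not be read as the actual implementation.

```r
# Hypothetical helper: map category labels to numeric scores via a named
# vector. Labels without an entry in `scores` become NA.
score_lookup <- function(labels, scores) {
  unname(scores[as.character(labels)])
}

# Same scale as the elisp proficiency scoring above (subset of labels).
elisp_scores <- c("packages" = 3, "simple functions" = 2,
                  "copy paste" = 1, "none" = 0, "Other" = NA)

example_scores <- score_lookup(
  c("packages", "none", "copy paste", "unlisted"),
  elisp_scores
)
```

Mapping “Other” (and anything unrecognised) to `NA` keeps such responses out of the numeric analysis instead of forcing them onto a scale where they have no natural position.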
3.1. Pairwise correlation
How’s a pairwise correlation matrix look?
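Since some responses score as `NA`, a correlation matrix over these variables has to decide what to do with missing values. The sketch below uses base R’s `cor` with `use = "pairwise.complete.obs"` and the default Pearson coefficient; both choices are my assumptions, and the numbers are made up.

```r
# Hypothetical scored responses (made-up values); NA marks an unscored answer.
scores <- data.frame(
  elisp_skill  = c(3, 2, 1, 0, 2, NA),
  contribution = c(3, 1, 0, 0, 1, 2),
  usage        = c(4, 4, 3, NA, 4, 2)
)

# "pairwise.complete.obs" computes each correlation from the rows where
# both variables are present, rather than dropping every row with any NA.
cor_matrix <- cor(scores, use = "pairwise.complete.obs")
```

The trade-off is that each cell may be based on a different subset of respondents, which is worth keeping in mind when comparing weak correlations.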
3.1.1. Observations
These variables exhibit a high degree of independence, with few exceptions.
It is interesting that MELPA contribution is more strongly correlated with elisp proficiency than contribution to the Emacs core.
3.2. PCA
This decline in contribution to total variance is rather slow. Let’s look at the first few PCs.
So far, this direction of analysis does not look very promising.
This has been good for establishing the independence between these factors, and it is interesting to see the scree plot and loadings.
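For readers who want to reproduce a scree plot and loadings on their own data, here is a minimal sketch with base R’s `prcomp`. The input is random noise standing in for the scored survey columns, and centring plus scaling is my assumption about sensible preprocessing, not a statement of what was done here.

```r
# Made-up numeric scores for illustration; a real run would use the
# scored survey columns, with incomplete rows handled first.
set.seed(1)
scores <- data.frame(
  a = rnorm(50),
  b = rnorm(50),
  c = rnorm(50)
)

# Centre and scale so that differently ranged scores are comparable.
pca <- prcomp(scores, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component; a scree plot
# shows these values in decreasing order.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Loadings: how each original variable contributes to each component.
loadings <- pca$rotation
```

With independent inputs like these, the variance is spread fairly evenly over the components, which is the same slow decline pattern described above.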
4. Conclusions
This survey was, in many respects, very successful. It had 7344 respondents, from a mix of sources.
Unsurprisingly, the respondents seem to be heavily biased towards more community-involved users. For instance, scaling the total number of respondents by the ratio of MELPA packages (\(\sim\,\)4,800) to self-reported MELPA maintainers (394) would suggest a mere 90,000 Emacs users globally. By contrast, the last StackOverflow survey that polled Development Environments indicated StackOverflow sees around 2 million Emacs users monthly.
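The back-of-the-envelope estimate can be written out as below. It assumes roughly one maintainer per MELPA package, and that maintainers would make up the same share of all Emacs users as they do of respondents; the mismatch with the StackOverflow figure suggests the second assumption does not hold.

```r
respondents       <- 7344  # total survey respondents
melpa_maintainers <- 394   # respondents who report maintaining a MELPA package
melpa_packages    <- 4800  # approximate number of MELPA packages

# Assumption: one maintainer per package, so ~4,800 maintainers overall.
maintainer_share <- melpa_maintainers / respondents  # about 5.4% of respondents

# If maintainers were that share of all users, the implied user count is:
estimated_users <- melpa_packages / maintainer_share

# One significant figure, matching the rough number quoted in the text.
rough_estimate <- signif(estimated_users, 1)
```

The implied population is more than twenty times smaller than the 2 million monthly figure, so maintainers (and by extension community-involved users) are heavily over-represented among respondents.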
4.1. The Current State of Affairs
The single most apparent result of this survey is the diversity. There is no good ’average’ respondent. Emacs is used primarily for programming, however only 27% of respondents listed Software Development as their sole use of Emacs. It’s a similar story when it comes to languages, where there are half as many people using Emacs for Haskell as C++. It is impossible to make an accurate generalisation about the nature of Emacs’ use.
However, it is possible to make generalisations about what Emacs users like. In a word: “Extensibility” (and to the surprise of no one). Related terms like “Versatility”, “Flexibility”, “Customisation”, etc. come up frequently in the responses. I doubt the apparent diversity of use cases, and the headline strength of Emacs being “Extensibility” are a coincidence.
The respondents are predominantly on Linux (65%), with most of the rest on MacOS (25%), then a sliver on Windows (10%) / BSD (2%). This is a large departure from developers in general: relative to the 2020 StackOverflow Survey, BSD is 20x as prevalent, Linux 2.5x, MacOS 1x, and Windows 0.15x.
4.2. Trends
As this is the first Emacs User Survey, one can hope that future instances of this survey (this would be fantastic annually, or biannually) will provide the ability to examine trends within the community.
Without historical data to compare to, the best that can be done is to look to decisions that are rarely changed. We can then examine these alongside reported years of Emacs experience to get an idea of shifts in the community.
The first prominent choice is which starter-kit/framework people choose to build their configuration on. The most striking shift is the arrival of the Doom Emacs framework, which appears exceptionally attractive to newer users. If one were to treat this as a growth rate, 40% of the growth in Emacs users would be from Doom users alone. While Spacemacs has also been quite popular, it appears to have peaked among individuals who have been using Emacs for two years.
The second prominent choice is choosing Emacs in the first place. Over the past four decades there has been a monumental shift in the background of new Emacsers. Early on around 60% of new Emacs users had no prior experience with text editors / IDEs. Now, that only applies to 3% of new users. As other editors have faded into the dust, the majority of new users stem from two sources:
- 45% from Vim, up from 30% a decade ago (moderate increase)
- 30% from VSCode, up from 5% five years ago (massive increase)
One can suspect that the appearance of frameworks like Doom and Spacemacs may have played a role in both the increase of users, and the increase coming from Vim/VSCode. However, we have no way of investigating whether they’re a cause or effect from this survey.
Concerningly, the apparent growth in Emacs users indicated does not seem to be mirrored in development of Emacs itself. The commit frequency seems to have peaked in the late 2000s (but hasn’t dropped much since), and the number of first-time contributors peaked in 2012.
4.3. Pain points (new users)
With this section it’s worth keeping in mind there is likely a strong survivor bias at play: only those that persevered through any difficulties they faced would still be using Emacs and answering this survey.
Three topics consistently appeared as off-putting factors:
- Keybindings
- Four decades ago the keyboard / CUA landscape was very different
- 12% of all respondents mentioned keybindings when discussing learning difficulties
- Lack of a good tutorial
- Without anything, Emacs can be overwhelming
- When completely new to Emacs, the manual can also be overwhelming
- Elisp
- Hard to work out where to start (see: Tutorial)
- The non-elisp way of customising Emacs is not as obvious and smooth (to use) as it should be
4.4. Desired improvements
Bearing in mind the apparent bias towards Emacs-developers discussed earlier, the three most-mentioned topics seem to be:
- Improved performance
- Improved threading / async / coroutines
- Better defaults, OOTB language functionality
- Oh, and inclusion of some generally useful tools like company/magit
4.5. Final comments
All in all, I think this paints a rather positive picture for the state of Emacs and its community. Interest in Emacs seems on the rise, likely helped by the popularisation of Emacs starter kits / frameworks — which are exploring ways to make Emacs more accessible to certain segments of the population (ex-Vimmers for instance).
Some of the lesser pain-points, and a few major desired improvements are actively being addressed as I write this (thanks to gccemacs and pgtk), and LSP is unlocking a fantastic amount of work on language-specific functionality. I am optimistic that with time other prominent concerns/desires will also be addressed, and with luck future surveys will be able to interrogate the community about their involvement with Emacs development.
To everyone that participated in the survey, thank you! It is my hope that these results, and (with luck) those of future surveys will help us better understand the Emacs community, and inform development.