The lights dim, the national anthem plays, and a certificate from the Central Board of Film Certification (CBFC) flashes on the screen. Most viewers note only the “U/A” or “A” rating. Hidden in that certificate, however, is a QR code that once revealed every cut the board demanded before clearing the film.
Since 2017, these cut lists have lived on e-Cinepramaan—the CBFC’s digital certification portal where filmmakers file applications and pay fees. Though designed for industry use, the portal inadvertently allowed public access to censorship records through an overlooked feature: each certificate’s web address ended with an 18-digit number that anyone could modify to view other certificates in sequence.
That transparency ended in June 2025. After entering “maintenance” in late May, the portal returned with random alphanumeric strings replacing the sequential numbers. The change broke eight years of QR codes and blocked systematic access to censorship records.
Fortunately, two Bengaluru developers had already begun preserving the data months earlier. Aman Bhargava, 24, and Vivek Matthew, 26—software engineers who run Diagram Chasing, a data journalism blog—downloaded and structured over 1,00,000 cut records from more than 17,000 films before the portal locked down. Their work now exists as CBFC Watch—India’s first searchable public archive of film censorship.
“This data should be accessible to anyone interested,” said Bhargava, who champions open-source principles. The pair tackle such projects beyond their day jobs, having previously analysed everything from India’s Time Use Survey to Bengaluru’s flood drainage patterns.
Their preservation effort began in December 2024, when they wrote programs to crawl through every URL on e-Cinepramaan. By modifying the sequential numbers in web addresses, they collected modification logs stretching back to 2017. “We managed to collect data for all certificates posted on this portal since 2017,” Bhargava said, noting the records extended through June 2025, when the portal transformed.
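For readers curious about the mechanics, the idea of stepping through sequential certificate numbers can be sketched in a few lines of Python. This is only an illustration, not the developers' actual crawler: the URL template, the function name `candidate_urls`, and the starting number are all hypothetical, since the article gives only the fact that each address ended in an 18-digit number. The sketch builds the list of addresses and makes no network requests.

```python
from typing import Iterator

# Hypothetical URL template: the real e-Cinepramaan address format is not
# given in the article, so a placeholder domain is used here.
BASE_URL = "https://example.invalid/certificate?id={cert_id}"


def candidate_urls(start_id: int, count: int, width: int = 18) -> Iterator[str]:
    """Yield URLs for `count` consecutive certificate numbers from `start_id`.

    Each number is zero-padded to `width` digits (18, per the article).
    """
    for offset in range(count):
        cert_id = str(start_id + offset).zfill(width)
        yield BASE_URL.format(cert_id=cert_id)


# Made-up starting number, purely for illustration.
urls = list(candidate_urls(100000000000000001, 3))
for url in urls:
    print(url)
```

Once the portal switched to random alphanumeric strings, this kind of enumeration no longer works, because the next valid address cannot be derived from the previous one.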
The raw data they extracted proved chaotic. “The modification records are arbitrary. A word might appear in Hindi, Malayalam, or English. There’s no standardised format,” Bhargava told Frontline.
Method out of madness
To transform this mess into usable information, they employed a language model that restructured sentences without altering meaning. This allowed them to classify cuts into clear categories: political references, religious references, violent scenes, and abusive language. The cleaning process alone consumed four months.
Rather than dump raw data online, they built an explorer page where users can search specific films or browse patterns. They also published completely cleaned datasets for researchers to analyse independently.
The CBFC logs themselves contained minimal information—no cast names, no directors, just bare certification details. To create meaningful records, Bhargava and Matthew matched each entry with IMDb data, adding posters, cast lists, and synopses. “E-Cinepramaan lacks details on cast, crew, or directors. We had to bring in another dataset,” Bhargava explained.
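The article does not say how the matching was done. One common approach, shown here purely as an assumption, is to join the two datasets on a normalised title plus release year. The field names (`title`, `year`, `director`) and the helper functions below are hypothetical, and the normalisation handles only Latin-script titles.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lower-case a title, strip accents and punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def match_records(cbfc_rows, imdb_rows):
    """Attach a director to each CBFC row when title and year both match.

    Rows are plain dicts; unmatched CBFC rows get director=None.
    """
    index = {(normalize_title(r["title"]), r["year"]): r for r in imdb_rows}
    merged = []
    for row in cbfc_rows:
        extra = index.get((normalize_title(row["title"]), row["year"]))
        merged.append({**row, "director": extra["director"] if extra else None})
    return merged
```

A join this simple would miss remakes, alternate spellings, and titles in other scripts, so any real pipeline would likely need fuzzy matching or manual review on top of it.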
Initially, the developers envisioned CBFC Watch as a live tracker with biweekly updates pulled directly from e-Cinepramaan. The portal’s transformation killed that plan. “This is primarily an archive and dataset for research now,” Bhargava said. While a “contribute” page allows filmgoers to submit new certificate links for processing, updates will be irregular without automated access.
Despite these limitations, the project has already gained traction. The site recorded 5,000 views within 24 hours of its September 14 launch, with users sharing it widely across social media.
The project carries forward an earlier effort by journalist Aroon Deep of The Hindu, who regularly posted censorship information on social media under the handle “CBFC Watch”. With Deep’s permission, Bhargava and Matthew adopted the name for their comprehensive archive.
Their timing proved crucial. Had they started even a few months later, eight years of censorship data would have vanished behind the CBFC’s new system. Instead, two developers with curiosity and persistence preserved what the government—intentionally or not—made inaccessible.
“This is just data. All the analysis, charts, and code are visible. Nothing’s hidden,” Bhargava said. CBFC Watch doesn’t campaign against censorship—it simply ensures public records remain public, even when official portals close their doors.