[China] Draft Measures for Labelling of AI-Generated Synthetic Content, draft Labelling Method for AI-Generated Content
What
On 14 Sep 2024, the Cyberspace Administration of China (CAC) issued for public consultation:
the draft Measures for the Labelling of AI-Generated Synthetic Content
the draft Cybersecurity Technology - Labelling Method for Content Generated by AI, together with Explanatory Notes.
I think the two should be read together, since each gives context to the other, so this post discusses both instruments, starting with the Measures.
Note: Technical terms are challenging to translate, and different machine translations sometimes use different terms. In this post I have used the translations that I thought were closest to the intent, and have retained most of the direct translations rather than refining the grammar, so as to preserve intent and nuance.
Draft Measures for the Labelling of AI-Generated Synthetic Content
Note:
The date at the bottom of the draft Measures indicates that the Measures may come into force this year.
Some machine translations of the Measures render the obligations as optional rather than mandatory, i.e., they translate “应当” as “should” and not “must”. Given the nature of the Measures (they are not guidelines but are meant to be complied with), I think the more accurate English translation is “must”.
Instead of the word “labelling”, some machine translations use the word “identification”. When read in the context of China’s three AI laws and the draft Cybersecurity Technology - Labelling Method for Content Generated by AI, I think the more accurate English translation is “labelling”.
Who this applies to1
Internet information service providers (“IISPs”) that carry out the labelling of AI-generated synthetic content in accordance with the:
Internet Information Service Algorithm Recommendation Management Provisions
Internet Information Service Deep Synthesis Management Provisions
Interim Measures for the Management of Generative Artificial Intelligence Services
The Measures do not apply to the following entities that develop or apply AI-generated and synthesis technologies but do not provide services to the domestic public: industry organisations, enterprises, educational and research institutions, public cultural institutions, and relevant professional institutions.2
Terminology3
“AI-generated synthetic content” refers to text, images, audio, video, and other information created, generated, or synthesized using AI technology.
Labelling of AI-generated synthetic content includes explicit and implicit labels.
Explicit labels refer to labels added to the generated synthetic content or interactive scene interface, presented in a way that can be clearly perceived by users, such as text or sound.
Implicit labels refer to labels added to the data of the generated synthetic content file through technical measures, which are not easily perceived by users. [DC: This refers to metadata, as you will see below.]
When must explicit labels be added to generated synthetic content?4
Explicit labels must be added if the generated synthetic service falls under Article 17 para 1 of the Internet Information Service Deep Synthesis Management Provisions:
Article 17: Where deep synthesis service providers provide the following deep synthesis services which might cause confusion or mislead the public, they shall make a conspicuous label in a reasonable position or location on information content they generate or edit, alerting the public of the deep synthesis generation:
(1) services such as smart dialogue or smart writing, etc., which simulate natural persons to generate or edit texts;
(2) speech generation services such as voice synthesis and imitations or editing services that significantly change personal identification characteristics;
(3) services that generate images or video of virtual persons such as face generation, face swapping, face manipulation, and gesture manipulation, or editing services that significantly change personal identification characteristics;
(4) generation or editing services such as realistic immersive scenes;
(5) other services that have functions that generate or significantly alter information content.
The explicit labels must adhere to these requirements:
For text, add text prompts, universal symbol prompts, or other labels at the beginning, end, or appropriate positions in the middle, or add prominent prompt labels in the interactive scene interface or around the text.
For audio, add voice prompts, audio rhythm prompts, or other labels at the beginning, end, or appropriate positions in the middle, or add prominent prompt labels in the interactive scene interface.
For images, add prominent prompt labels at appropriate positions.
For videos, add prominent prompt labels at appropriate positions on the initial screen and around the video playback area, and may add prominent prompt labels at the end and appropriate positions in the middle of the video.
When presenting virtual scenes, add prominent prompt labels at appropriate positions on the initial screen, and may add prominent prompt labels at appropriate positions during the continuous service of the virtual scene.
For other generated synthetic service scenarios, add explicit labels with prominent prompting effects according to their specific application characteristics.
IISPs that allow downloading, copying, exporting, or other means of accessing generated synthetic content must ensure that the files contain explicit labels that meet these requirements.
When must implicit labelling be added to generated synthetic content?5
IISPs must add implicit labelling to the metadata of the generated synthetic content file in accordance with Article 16 of the Internet Information Service Deep Synthesis Management Provisions:
Article 16: Deep synthesis service providers shall employ technical measures to attach symbols to information content produced or edited by their services' users that do not impact users' usage, and store log information in accordance with laws, administrative regulations, and relevant state provisions.
Metadata of the file refers to descriptive information embedded in the file header according to a specific encoding format, used to record information such as the file’s source, attributes, purpose, and copyright.
IISPs are encouraged to add implicit labelling in the form of digital watermarks to the generated synthetic content.
IISPs that provide internet information content dissemination platform services must regulate the dissemination of generated synthetic content6
They must take these measures:
Verify whether the file metadata contains implicit labels. If so, add prominent prompt labels around the published content in an appropriate manner to clearly remind users that the content is generated synthetic content.
If the file metadata contains no implicit labels but the user declares the content as generated synthetic content, add prominent prompt labels around the published content to remind users that the content may be generated synthetic content.
If the file metadata contains no implicit labels and the user does not declare the content as generated synthetic content, but the platform detects explicit labels or other traces of generated synthesis, the content may be identified as suspected generated synthetic content, and prominent prompt labels should be added around the published content to remind users that the content is suspected to be generated synthetic content.
For confirmed, possible, and suspected generated synthetic content, add information on the attributes of the generated synthetic content, the name or code of the dissemination platform, the content number, and other dissemination element information to the file metadata.
Provide necessary labelling functions and remind users to actively declare whether the published content contains generated synthetic content.
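The order of checks in the first three measures can be sketched as a small function. This is only an illustration of the tiering described above, not something prescribed by the Measures: the names `Upload`, `Status`, and `classify` are hypothetical, and the `NONE` outcome is an assumption, since the Measures do not say what happens when none of the three conditions is met.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # The value is the wording of the prompt shown to users (illustrative only).
    CONFIRMED = "generated synthetic content"
    POSSIBLE = "may be generated synthetic content"
    SUSPECTED = "suspected generated synthetic content"
    NONE = "no prompt label"  # assumption: not addressed by the Measures


@dataclass
class Upload:
    has_implicit_label: bool  # implicit label found in the file metadata
    user_declared: bool       # uploader declared the content as generated
    traces_detected: bool     # platform detected explicit labels or other traces


def classify(upload: Upload) -> Status:
    """Apply the three checks in the order the Measures list them."""
    if upload.has_implicit_label:
        return Status.CONFIRMED
    if upload.user_declared:
        return Status.POSSIBLE
    if upload.traces_detected:
        return Status.SUSPECTED
    return Status.NONE
```

The checks are ordered, so a file with an implicit label is treated as confirmed even if the user also declared it.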
Obligations of internet application distribution platforms
When they review the listing or launch of an application, they must verify whether the IISP has provided the required labelling functions for generated synthetic content.7
Obligations between IISPs and users
IISPs must clearly explain the methods, styles, and other specifications of the labelling of generated synthetic content in the user service agreement and remind users to carefully read and understand the relevant labelling management requirements.8
If users require the IISP to provide generated synthetic content without explicit labels, it can be provided if the user agreement clearly defines the user’s labelling obligations and usage responsibilities. Relevant logs should be retained for no less than six months.9
When users upload generated synthetic content to an IISP that provides internet information content dissemination platform services, they should actively declare and use the labelling functions provided by the platform for labelling.10
No organisation or individual shall maliciously delete, tamper with, forge, or conceal the labels of generated synthetic content as stipulated in these Measures, provide tools or services for others to carry out the above malicious actions, or harm the legitimate rights and interests of others through improper labelling methods.11
IISPs should conduct labelling in accordance with the requirements of relevant mandatory national standards.12 [DC: This would refer to the Cybersecurity Technology - Labelling Method for Content Generated by AI - see below.]
Draft Cybersecurity Technology - Labelling Method for Content Generated by AI
This is a national standard, compliance with which is intended to be mandatory.
According to the accompanying Explanatory Notes, a drafting group was established in Nov 2023. It conducted extensive research in drafting the standard. Companies and experts were consulted in refining the draft standard. Multiple rounds of discussions were held with research institutions, enterprises, and experts to ensure that the technical requirements in the standard are verifiable and operational.
What this applies to
Labelling activities carried out by generative synthesis service providers and content dissemination service providers.13
Requirements relating to explicit labels14
Key points:
“Explicit Label” is defined as a label added to AI-generated synthetic content or interactive interface scenarios, presented in text, sound, graphics, etc., and directly perceivable by users.15
Section 5.1 lists the requirements for explicit labelling of text content, e.g., the label must contain the words “artificial intelligence” or “AI” to indicate the use of AI technology, and must be placed at the beginning or end of the text, or at an appropriate position in the middle of the text.
Section 5.2 lists the requirements for explicit labelling of image content, e.g., the label must also contain the words mentioned above, and the text height should be no less than 5% of the shortest side length of the image.
Section 5.3 lists the requirements for explicit labelling of audio content, e.g., the label must also contain the words mentioned above and be placed at the beginning or end of the audio, or at an appropriate position in the middle of the audio; voice labels should use normal speech speed.
Section 5.4 lists the requirements for explicit labelling of video content, e.g., the label must also contain the words mentioned above and should be located at the edge or corner of the video frame, and the text height should be no less than 5% of the shortest side length of the video frame.
Section 5.5 lists the requirements for explicit labelling of interactive scenario interfaces, e.g., the label must also contain the words mentioned above, and prompt text must be continuously displayed near the content display area.
Appendices C and D contain examples of how the labelling should be done. The examples are in Chinese so you may need some help with translation.
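To make the 5% rule for images and video frames concrete, here is a minimal sketch of the arithmetic. The function name and the choice to round up to a whole pixel are assumptions for illustration; the draft Method only states the 5% threshold.

```python
def min_label_text_height(width_px: int, height_px: int) -> int:
    """Smallest whole-pixel text height that is at least 5% of the shortest side."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("dimensions must be positive")
    shortest = min(width_px, height_px)
    # Integer ceiling of shortest * 5 / 100, avoiding floating-point rounding.
    return (shortest * 5 + 99) // 100
```

For example, a 1920 x 1080 video frame has a shortest side of 1080 pixels, so the label text would need to be at least 54 pixels high.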
Requirements relating to implicit labels16
Key points:
Relevant definitions:
“Implicit Label” is defined as a label embedded in the AI-generated synthetic content data, not directly perceivable by users but extractable through technical means.
“Metadata Implicit Label” is defined as an implicit label located within the file metadata of AI-generated synthetic content.
“File Metadata” is defined as descriptive information embedded in the file header according to a specific encoding format, used to record file source, attributes, usage, copyright, etc.
While section 4 of the Method says that implicit labelling methods include metadata implicit labelling and content implicit labelling, the section on Implicit Labelling only talks about Metadata Implicit Labelling. Section A.2 in Appendix A says that “content implicit labelling” refers to digital watermarks and other labels added to AI-generated synthetic content data [DC: note that the term is not defined in the definitions section], and clarifies that these are not required by the Method.
Section 6.1 sets out what implicit labels should include e.g., name or code of the generation and synthesis service provider or content distribution service provider.
The format of metadata implicit labelling must comply with Appendix E. Examples of such labelling are in Appendix F.
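As a rough illustration of what a metadata implicit label might carry, the sketch below assembles the kinds of elements mentioned in this post (content attribute, provider name or code, content number) into a single record. The field names, the example values, and the JSON encoding are all placeholders of my choosing and are not the Appendix E format, which should be consulted directly.

```python
import json


def build_implicit_label(content_attribute: str, provider_code: str, content_number: str) -> str:
    """Serialise a hypothetical implicit label record as a JSON string."""
    label = {
        "content_attribute": content_attribute,  # e.g. confirmed / possible / suspected
        "provider_code": provider_code,          # name or code of the service provider
        "content_number": content_number,        # provider-assigned content identifier
    }
    # ensure_ascii=False keeps Chinese provider names readable in the output.
    return json.dumps(label, ensure_ascii=False, sort_keys=True)


def read_implicit_label(raw: str) -> dict:
    """Parse the record back, as a dissemination platform might when verifying it."""
    return json.loads(raw)
```

A generation service would write such a record into the file metadata, and a dissemination platform would read it back when checking whether an upload carries an implicit label.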
Key points from Explanatory Notes
This standard was developed to implement the relevant requirements of China’s three AI laws mentioned above. It regulates the methods for labelling AI-generated synthetic content in order to prevent security risks arising from such content, enhance AI security, and promote the safe development of the AI industry.
Section IV of the Explanatory Notes notes other laws, regulations, and standards around the world that require labelling, e.g., the EU AI Act, the EU Digital Services Act, the US Executive Order on the Safe, Secure, and Trustworthy Development and Use of AI, Singapore’s Model AI Governance Framework, and Canada’s Generative AI Code of Conduct.
It then says that the Method as a standard can lead the standardised and safe development of AI technology, and provide support for China’s contribution to international AI labelling standards. It says that compared with existing international policies and standards, the Method clearly defines the form of identification, is technically complete, and has strong applicability. The Method does not conflict with existing international policies and standards, and its promulgation and implementation will not create barriers to international trade.
Original Chinese draft Measures for the Labelling of AI-Generated Synthetic Content
Machine-translated English version
Original Chinese draft Cybersecurity Technology - Labelling Method for Content Generated by AI and accompanying Explanatory Notes
Machine-translated English versions
(Note: Some Chinese text remains in the Method because it appears in images, which machine translators are unequipped to process.)
1. Article 2 of the Measures.
2. Article 2 of the Measures.
3. See Article 3 of the Measures.
4. See Article 4 of the Measures.
5. See Article 5 of the Measures.
6. See Article 6 of the Measures.
7. Article 7 of the Measures.
8. Article 8 of the Measures.
9. Article 9 of the Measures.
10. Article 10 of the Measures.
11. Article 10 of the Measures.
12. Article 11 of the Measures.
13. Section 1 of the Method.
14. See section 5 of the Method.
15. Section 3.3 of the Method.
16. See section 6.1 of the Method.