The rapid evolution of Large Language Models (LLMs) like DeepSeek and Lucy is reshaping the landscape of both legal technology and cybersecurity. As stakeholders straddling these two critical domains, it’s crucial to understand the implications of these advancements and their impact on data privacy, security, and legal compliance. Far from willing to surf on the…

By

AI’s Double-Edged Sword: Navigating LLM Advancements

The rapid evolution of Large Language Models (LLMs) like DeepSeek and Lucy is reshaping the landscape of both legal technology and cybersecurity. As stakeholders straddling these two critical domains, it’s crucial to understand the implications of these advancements and their impact on data privacy, security, and legal compliance. Far from willing to surf on the vibe of DeepSeek, the purpose here is to give a reading grid of the latest events and take a step aside to understand what lies under the hood. I encourage you to comment on this article in order to share your point of view.

The Rise of DeepSeek and Its Implications

DeepSeek, recently made waves with its DeepSeek-R1 model, sparking debates across tech and policy circles1. While some hail it as AI’s “Sputnik moment,” others express grave concerns about its cybersecurity vulnerabilities. The French data protection authority, CNIL, has announced an investigation into DeepSeek over potential privacy risks, highlighting the growing scrutiny of AI systems in Europe2.

Security concerns

DeepSeek’s cybersecurity failures have been alarming. Researchers discovered a publicly exposed database that leaked chat histories, API keys, and back-end details3. This basic security lapse underscores the importance of robust cybersecurity measures in AI development. Moreover, DeepSeek’s guardrails have proven susceptible to jailbreaking, allowing the extraction of malware scripts and other malicious content4.

Legal and compliance issues

From a legal standpoint, DeepSeek’s practices raise significant GDPR compliance concerns. The indiscriminate data scraping used to train LLMs like DeepSeek conflicts with GDPR principles of data minimization and purpose limitation5. Furthermore, the inability of AI companies to comply with existing data rights due to the nature of LLM operations poses a significant legal challenge.

The AI French touch ?

In contrast to DeepSeek, French AI company Mistral AI represents a potentially more secure and compliant alternative for European markets6. Choosing a French model over DeepSeek could offer several advantages:

  1. GDPR Compliance: European-based companies are more likely to adhere to strict GDPR standards.
  2. Data Sovereignty: Keeping data within EU borders aligns with data localisation requirements.
  3. Transparent Practices: EU companies often provide clearer insights into their data handling processes as stated in the terms of use of Mistral AI 7.

Balancing innovation and security

Last week I’ve told you about the choices one often has to make between speed and quality. Here and other dilemma is pointed out. Indeed while the allure of cutting-edge AI capabilities is strong, stakeholders must take into account security and compliance. There are a lot of parameters that will weight on the capacity of people and companies to balance those two elements. It has been shown that risk mitigation is something that is highly influenced by cultural background 8. We will assess here key considerations for implementing LLMs in legal tech and cybersecurity:

Implementing Data Privacy Measures

1. Data Anonymisation Techniques
  • Use k-anonymity: Replace specific identifiers with generalised categories. For example, replace exact ages with age ranges (e.g., 25-30, 31-35).
  • Implement t-closeness: Ensure the distribution of sensitive attributes within any group is close to the overall distribution. Use tools like ARX Data Anonymisation Tool to achieve this 9.
  • Soon, you’ll be able to implement MAPA (Multilingual Anonymisation toolkit for Public Administrators): This EU-funded tool uses AI to identify and anonymise personal details in 24 EU languages, ensuring GDPR compliance. It’s a bit like wearing online gloves 😉 10.
2. Prompt injection prevention
  • Use a content filtering system to detect and block potentially malicious prompts
3. Input validation

Input validation is the process of verifying that user-supplied data meets specific criteria before processing it. There are several types of input validation:

  1. Data Type Validation: Ensures input is of the correct data type (e.g., numbers for numerical fields).
  2. Range Validation: Checks if input falls within a specified range.
  3. Format Validation: Verifies input adheres to a specific format (e.g., email addresses).
  4. Consistency Validation: Ensures input is consistent with other related data.
Input Sanitization

Input sanitization involves cleaning or modifying user input to remove potentially harmful elements. Common sanitization strategies include:

  1. Whitelisting: Allows only specific characters or patterns.
  2. Blacklisting: Blocks certain characters or patterns.
  3. Escaping: Converts special characters to their encoded equivalents.
Here are some Best Practices

To implement effective input validation and sanitisation. Define clear input requirements before implementation.

Validate and sanitise all user inputs, including those from APIs and external sources.

Use specialised libraries for validation, such as Pydantic for Python or Joi for JavaScript.

An other level of security can be reached by using Models on private cloud infrastucture.  Running an LLM on a private server refers to hosting and operating the language model on your own infrastructure rather than using a cloud-based service. This approach offers several benefits:

  1. Data privacy: Your data doesn’t leave your controlled environment. Though one needs to secure what is going outside the model and server to ensure no backdoor is seated up.
  2. Customisation: You can fine-tune the model for specific use cases.
  3. Cost control: For high-volume usage, self-hosting can be more cost-effective.

However, hosting a private LLM holds some cons as well such as:

  1. Significant computational resources
  2. Technical expertise.

Take home message:

LLMs are a not so new but are more than ever taking a direct benefit from our data and can be a threat to one’s security and privacy. It is of crucial importance to handle data fed to them with care and set up measures.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.