MachineLearningStories

ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission) 42001:2023 (Artificial Intelligence Management System) & ISO/IEC 23894 (AI Risk Management)

2026-02-09T20:56:00.000-08:00

ISO/IEC 42001

It provides guidelines and a framework for managing AI systems in an organization. The table below compares it with other AI regulations and AI risk management frameworks.

Framework	Primary Nature	Scope of Coverage	Who It’s For
NIST AI RMF	Risk framework	End-to-end AI risks (technical, societal, operational)	Builders, deployers
EU AI Act	Regulation (law)	High-risk AI obligations & prohibitions	Anyone operating in the EU
ISO/IEC 42001	Management system	Organizational AI governance & controls	Enterprises, auditors
OECD AI Principles	Policy principles	Ethical & societal values	Governments, enterprises
ISO/IEC 23894	Risk standard	AI risk identification & treatment	Risk & compliance teams

Clause 1 – Scope

Defining what your AI governance covers

Clause 1 clarifies:

What types of AI systems are included
Which organizational units are in scope
Whether AI is developed, procured, or used

Why it matters:
Without a clearly defined scope, AI governance becomes vague and unenforceable. This clause prevents organizations from saying “we follow responsible AI” without defining where and how.

Clause 2 – Normative References

The standards you must align with

This clause lists other ISO standards that are required for applying ISO 42001 correctly.

Why it matters:
It ensures consistency across standards, especially if your organization already follows ISO 27001, ISO 9001, or similar frameworks.

Clause 3 – Terms and Definitions

Creating a common AI language

AI terminology is often overloaded and misunderstood. Clause 3 defines:

AI system
AI lifecycle
Risk, harm, impact, oversight

Why it matters:
Governance fails when teams interpret AI terms differently. This clause ensures legal, technical, and business teams speak the same language.

Clause 4 – Context of the Organization

Understanding where AI fits in your business

This is one of the most important clauses.

It requires organizations to:

Identify internal and external factors affecting AI, such as those specific to a healthcare organization

Understand stakeholders (users, customers, regulators, society), again taking a healthcare organization as the example

Define AI use cases and boundaries, which for healthcare amounts to scoping the use cases

Why it matters:
Clause 4 prevents “one-size-fits-all” AI policies. AI used for credit scoring, medical diagnosis, or marketing personalization does not carry the same risk, and this clause makes that explicit.

Clause 5 – Leadership

Making AI a business responsibility

Clause 5 shifts AI accountability to top management.

It requires:

Executive ownership of AI risks
Clear roles and responsibilities
An AI policy aligned with business strategy

AI Ethics committee

Why it matters:
Regulators and auditors increasingly ask:

Who is accountable when AI causes harm?

This clause ensures AI governance is not just an IT or data science problem—it is a leadership responsibility.

Clause 6 – Planning

Turning AI principles into action

Clause 6 is the operational brain of ISO 42001.

It covers:

AI risk assessment and treatment

Ethical, legal, and societal risks
Objectives, KPIs, and mitigation plans
Planning for change in AI systems
AI lifecycle

AI strategy and roadmap

Why it matters:
This clause converts Responsible AI from high-level values into measurable, auditable controls.

Clause 7 – Support

Enabling people, skills, and documentation

Clause 7 focuses on:

Skills and competence

Training and awareness
Communication
Documentation and records
Tech Support

Why it matters:
Even the best AI policies fail if teams don’t understand them. This clause ensures people building and using AI are properly trained and informed.

Clause 8 – Operation

Governing the AI lifecycle

This is where AI is actually built and used.

Clause 8 covers:

Data management

Model development and deployment

Human oversight mechanisms
Monitoring and incident handling

Why it matters:
Clause 8 ensures responsible AI is embedded throughout the AI lifecycle, not just reviewed at the end.

Clause 9 – Performance Evaluation

Proving your AI governance works

This clause requires:

Monitoring and measurement

Internal audits
Management reviews

Why it matters:
You must be able to demonstrate effectiveness, not just claim compliance. This is critical for regulators, customers, and certification audits.

Clause 10 – Improvement

Keeping pace with evolving AI risks

AI systems evolve—and so must governance.

Clause 10 covers:

Handling nonconformities
Corrective actions
Continuous improvement

A simple way to remember ISO/IEC 42001:

Clause	Title	What it Covers	Why it Matters
1	Scope	Defines applicability, boundaries, and AI activities covered	Prevents vague or selective AI governance
2	Normative References	Lists other standards required for compliance	Ensures alignment with ISO ecosystem
3	Terms and Definitions	Standard AI terminology and concepts	Creates a shared language across teams
4	Context of the Organization	Business context, stakeholders, AI use cases, scope of AIMS	Foundation for risk-based AI governance
5	Leadership	Executive accountability, AI policy, roles & responsibilities	Makes AI a business responsibility
6	Planning	AI risk assessment, objectives, mitigation plans	Turns principles into executable controls
7	Support	Skills, training, communication, documentation	Enables people and process to work
8	Operation	AI lifecycle controls, data, models, monitoring, oversight	Where AI is actually governed in practice
9	Performance Evaluation	KPIs, audits, management review	Proves governance effectiveness
10	Improvement	Nonconformities, corrective actions, continuous improvement	Keeps governance relevant as AI evolves

ISO/IEC 23894

Published in 2023, ISO/IEC 23894 provides guidance on how to identify, assess, treat, monitor, and communicate AI risks across the AI lifecycle. It is not a certification standard — it is a deep risk playbook that complements governance standards like ISO/IEC 42001 and regulatory regimes like the EU AI Act.

This article explains each clause in detail, what it expects, and how it should be used in practice.

Clause 1 – Scope

What this standard is (and is not)

Clause 1 defines the purpose and applicability of ISO/IEC 23894.

It clarifies that the standard:

Applies to organizations that develop, deploy, operate, or use AI systems
Covers AI-related risks across the full lifecycle
Provides guidance, not mandatory requirements

Why this clause matters

Many organizations misunderstand ISO standards as compliance checklists. Clause 1 makes it clear that ISO/IEC 23894 is:

Not a regulatory standard
Not certifiable on its own
Not limited to technical AI risks

Instead, it is intended to help organizations design their own AI risk management processes, aligned with their context, industry, and maturity.

Practical takeaway

Clause 1 sets expectations:

Use this standard to design how you manage AI risk — not to prove compliance by ticking boxes.

Clause 2 – Normative References

Standing on the shoulders of enterprise risk management

This clause references ISO 31000 (Risk Management – Guidelines) as the foundational risk standard.

Why this clause matters

AI risk should not exist in isolation from enterprise risk management (ERM). This clause ensures:

AI risk uses consistent risk language
AI risk integrates with existing governance, legal, and operational risk processes

Practical takeaway

If your organization already follows ISO 31000:

ISO/IEC 23894 becomes a specialized extension, not a new framework
If not:
Your AI risk processes will lack coherence with enterprise decision-making

Clause 3 – Terms and Definitions

Creating a shared understanding of AI risk

Clause 3 defines AI-specific interpretations of:

Risk
Harm
AI system
Lifecycle
Stakeholders
Controls

Why this clause matters

AI risk discussions often break down because:

Legal teams think in terms of compliance
Engineers think in terms of model performance
Business leaders think in terms of impact and reputation

Clause 3 aligns these perspectives by establishing shared definitions.

Practical takeaway

Before building risk registers or controls:

Make sure everyone uses the same vocabulary — or risk management will fail in execution.

Clause 4 – Principles of AI Risk Management

The philosophy behind the process

This clause adapts ISO 31000 principles for the AI context.

Typical principles include:

Integrated into governance and operations
Structured and comprehensive
Adaptive to change
Human-centric
Transparent and traceable
Inclusive of stakeholders

Why this clause matters

AI systems are:

Probabilistic
Adaptive
Context-dependent

Static risk assessments are insufficient. Clause 4 establishes that AI risk management must be continuous and evolving.

Practical takeaway

These principles guide how risk decisions should be made — especially when trade-offs are unavoidable.

Clause 5 – Framework for Managing AI Risk

Building the organizational backbone

Clause 5 explains how to embed AI risk management into the organization, not just into projects.

5.1 Leadership and Commitment

Senior management accountability
Clear risk ownership

Why it matters:
AI risk is a business risk, not a technical side task.

5.2 Integration

Embed AI risk into ERM, product, legal, and compliance workflows

Why it matters:
Isolated AI risk processes get ignored under delivery pressure.

5.3 Design of the Framework

Define scope, context, risk appetite
Align with organizational objectives

Why it matters:
Risk tolerance varies by use case (e.g., medical AI vs marketing AI).

5.4 Implementation

Deploy risk processes across AI lifecycle stages

5.5 Evaluation

Assess whether controls actually reduce risk

5.6 Improvement

Continuously improve the framework as AI evolves

Practical takeaway

Clause 5 ensures AI risk management is systemic, not ad-hoc.

Clause 6 – AI Risk Management Process

The heart of ISO/IEC 23894

This clause defines the end-to-end AI risk process.

6.1 Communication and Consultation

Engage internal and external stakeholders
Consider affected individuals and groups

Why it matters:
Many AI harms are social, contextual, or downstream — not obvious from model metrics.

6.2 Establishing the Context

Organizational context
AI system purpose and intended use
Stakeholders
Risk criteria and thresholds

Key insight:
Risk depends on how AI is used, not just how it performs.

6.3 AI Risk Assessment

6.3.1 Risk Identification

Examples:

Bias and discrimination
Model drift
Explainability gaps
Automation bias
Misuse or abuse
Legal and reputational harm

6.3.2 Risk Analysis

Likelihood
Severity of impact
Affected stakeholders

6.3.3 Risk Evaluation

Compare against risk criteria
Decide accept, mitigate, escalate, or stop

6.4 AI Risk Treatment

Options include:

Avoidance
Reduction (technical, procedural, human controls)
Sharing
Acceptance

Key insight:
Not all AI risks can be eliminated — but all must be consciously decided.

6.5 Monitoring and Review

Performance drift
Context changes
Emerging risks

Why it matters:
AI risk is dynamic and degrades over time.

6.6 Recording and Reporting

Risk registers
Decision logs
Audit trails

Key insight:
If risk decisions are not documented, they do not exist during audits.
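As a rough illustration of 6.3 through 6.6, a risk register entry can be modeled as a small record holding the analysis inputs, the evaluation decision, and the chosen treatment. This sketch is not taken from the standard: the class names, field names, sample entries, and the 1 to 5 scales are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Decision(Enum):
    """Evaluation outcomes, mirroring the list under 6.3.3."""
    ACCEPT = "accept"
    MITIGATE = "mitigate"
    ESCALATE = "escalate"
    STOP = "stop"


class Treatment(Enum):
    """Treatment options, mirroring the list under 6.4."""
    AVOIDANCE = "avoidance"
    REDUCTION = "reduction"
    SHARING = "sharing"
    ACCEPTANCE = "acceptance"


@dataclass
class RiskEntry:
    # Every field name and the 1-5 scales are hypothetical.
    risk_id: str
    description: str
    lifecycle_stage: str
    likelihood: int
    severity: int
    affected_stakeholders: list[str]
    decision: Decision
    treatment: Treatment
    owner: str
    decided_on: date
    rationale: str

    def __post_init__(self) -> None:
        # Reject out-of-range scores so entries stay comparable.
        for name in ("likelihood", "severity"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def open_items(register: list[RiskEntry]) -> list[RiskEntry]:
    """Return entries that still need action (anything not simply accepted)."""
    return [entry for entry in register if entry.decision is not Decision.ACCEPT]


# Illustrative entries only; they do not describe a real system.
register = [
    RiskEntry("R-001", "Model drift degrades prediction quality", "Operation",
              4, 4, ["end users"], Decision.MITIGATE, Treatment.REDUCTION,
              "Head of Data Science", date(2026, 2, 9),
              "Add drift monitoring and a retraining trigger"),
    RiskEntry("R-002", "Minor explainability gap in internal tool", "Deployment",
              2, 1, ["analysts"], Decision.ACCEPT, Treatment.ACCEPTANCE,
              "Product Owner", date(2026, 2, 9),
              "Low impact; revisit at next review"),
]

for entry in open_items(register):
    print(entry.risk_id, entry.decision.value, entry.owner)
```

Keeping the owner, date, and rationale on the same record as the decision is what turns the register into a usable decision log and audit trail.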

Clause 7 – AI Lifecycle Considerations

Risk is not uniform across stages

This clause emphasizes that risks appear differently across:

Design
Data collection
Training
Testing
Deployment
Operation
Decommissioning

Why this clause matters

Many organizations assess AI risk once, usually before deployment. Clause 7 makes it clear that:

Risk must be reassessed as the system evolves.

Annexes – Practical Gold (Informative)

While not mandatory, annexes provide:

Common AI risk sources
Typical impacts
Lifecycle mappings
Example treatments

These are invaluable when creating:

Risk taxonomies
Assessment templates
Control libraries

How ISO/IEC 23894 Is Used in Practice

Leading organizations use it to:

Design AI risk registers
Support EU AI Act risk documentation
Feed risk inputs into ISO/IEC 42001
Create defensible AI risk decisions

EU AI Act

The European Union introduced the EU AI Act, the world’s first comprehensive, legally binding framework for regulating AI.

This article explains what the EU AI Act is, why it exists, how it works, and why it matters globally, even if you are not based in Europe.

Why the EU Created the AI Act

The EU AI Act was introduced to solve a clear problem:
AI systems were being deployed faster than governments could regulate them, often without clarity on who is accountable when harm occurs.

The Act aims to:

Protect fundamental rights (privacy, non-discrimination, safety)
Reduce harmful or manipulative AI practices
Create trust in AI systems
Provide legal certainty for businesses
Enable responsible innovation, not ban AI

In simple terms, the EU AI Act treats AI like a regulated product, similar to medical devices or automobiles.

The Core Idea: Risk-Based Regulation

The EU AI Act does not regulate all AI equally.
Instead, it classifies AI systems based on the level of risk they pose to people and society.

The higher the risk, the stricter the obligations.

This risk-based approach is the foundation of the entire law.

The Four AI Risk Categories

1️⃣ Unacceptable Risk — Banned AI

These are AI practices considered a clear threat to fundamental rights.

Examples include:

Social scoring by governments
AI systems that manipulate human behavior
Certain forms of real-time biometric surveillance in public spaces

These systems are prohibited outright.

2️⃣ High Risk — Strictly Regulated AI

These AI systems directly affect people’s rights, safety, or access to essential services.

Examples:

Creditworthiness and loan approval systems
Recruitment, hiring, and promotion tools
Medical devices using AI
AI used in education, law enforcement, or border control
Biometric identification systems

High-risk AI systems are allowed, but only if they meet mandatory compliance requirements.

3️⃣ Limited Risk — Transparency Obligations

AI systems that interact with humans or generate content.

Examples:

Chatbots
Deepfake generation
Emotion recognition tools

Requirement:

Users must be clearly informed that they are interacting with AI.

4️⃣ Minimal Risk — Mostly Unregulated

Everyday AI applications with low societal risk.

Examples:

AI in games
Photo filters
Spam detection
Recommendation engines

📌 These systems are largely exempt from regulation.

What Does the EU AI Act Require for High-Risk AI?

For high-risk AI systems, the EU AI Act mandates a full risk management and governance approach, including:

A risk management system
High-quality, representative training data
Bias detection and mitigation
Human oversight mechanisms
Robustness, accuracy, and cybersecurity
Detailed technical documentation
Post-market monitoring
Incident reporting to authorities

This is where standards like ISO/IEC 23894 and ISO/IEC 42001 become extremely valuable.

Who Must Comply with the EU AI Act?

The EU AI Act applies to:

Organizations based in the EU
Organizations outside the EU whose AI systems affect people inside the EU

📌 If your AI system touches EU users — this law applies to you, regardless of where your company is headquartered.

This “extra-territorial reach” is similar to GDPR.

Enforcement and Penalties

For the most serious violations (prohibited AI practices), non-compliance can result in fines of up to:

€35 million, or
7% of global annual turnover, whichever is higher

The penalties place the EU AI Act in the same seriousness category as GDPR.
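To make the two figures concrete, here is a minimal sketch of the fine ceiling. It assumes the ceiling is the higher of the two amounts, which is how the Act frames its top penalty tier; the function and constant names are hypothetical, and lower tiers for other violations are not modeled.

```python
FIXED_CAP_EUR = 35_000_000      # fixed amount from the top penalty tier
TURNOVER_SHARE_PERCENT = 7      # share of global annual turnover


def max_fine_eur(global_annual_turnover_eur: float) -> float:
    """Upper bound of the fine, assuming 'whichever is higher' applies."""
    turnover_based = global_annual_turnover_eur * TURNOVER_SHARE_PERCENT / 100
    return max(FIXED_CAP_EUR, turnover_based)


# A company with EUR 100 million turnover: 7% is EUR 7 million, so the fixed cap applies.
print(max_fine_eur(100_000_000))
# A company with EUR 1 billion turnover: 7% is EUR 70 million, which exceeds the fixed cap.
print(max_fine_eur(1_000_000_000))
```

For large companies the turnover-based figure dominates, which is why the exposure scales with company size rather than stopping at the fixed amount.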

How the EU AI Act Fits with Other AI Frameworks

The EU AI Act does not operate in isolation. In practice, organizations combine it with other frameworks:

EU AI Act → Legal obligations (what you must do)
NIST AI RMF → Risk categories and measurement (what can go wrong)
ISO/IEC 23894 → AI risk assessment process (how to manage risk)
ISO/IEC 42001 → Governance and auditability (how to sustain control)

How to Map an AI Use Case to the EU AI Act Risk Categories

The core principle: use-case over technology

The EU AI Act asks four fundamental questions:

What is the AI system used for?
Who is affected by its decisions?
Does it impact fundamental rights or safety?
Could harm occur at scale or without human control?

The answers determine the category — not model size, accuracy, or novelty.

Step 1: Check if the AI practice is explicitly prohibited

➝ Unacceptable Risk

The first question regulators ask is very simple:

Is this AI use case fundamentally incompatible with EU values?

If yes, the AI system is banned, regardless of safeguards.

Typical indicators

Manipulates human behavior
Exploits vulnerable groups (for example, children or people with disabilities)
Enables social scoring by public authorities
Certain forms of real-time biometric surveillance

📌 If the answer is yes → Unacceptable Risk → Stop here

Step 2: Check if the AI is listed as high-risk

➝ High Risk

If it’s not banned, the next question is:

Is this AI used in a domain that affects people’s rights, safety, or access to essential services?

The EU AI Act explicitly lists high-risk use cases, such as:

Recruitment and employee evaluation
Credit scoring and loan approvals
Medical diagnosis or treatment
Educational assessment
Law enforcement and border control
Biometric identification

Key test

If an AI system:

Makes or supports decisions about people at scale, and
Errors could cause serious harm or discrimination

👉 It is High Risk, even if humans are “in the loop”.

Step 3: Check if transparency alone is sufficient

➝ Limited Risk

If the AI does not decide rights or safety outcomes, ask:

Does this AI interact directly with humans or generate synthetic content that could mislead them?

Typical cases:

Chatbots
Voice assistants
Deepfake or synthetic media generation
Emotion-recognition systems

Requirement

These systems are allowed, but users must be:

Clearly informed they are interacting with AI
Informed when content is AI-generated

📌 No deep risk controls — just transparency obligations.

Step 4: Everything else

➝ Minimal Risk

If the AI:

Does not impact rights or safety
Does not manipulate or mislead users
Does not operate in sensitive domains

Then it falls into Minimal Risk.

Examples:

AI in games
Photo enhancement
Recommendation engines
Spam filters

📌 These systems are largely unregulated.
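The four steps above form an ordered decision cascade, which can be sketched as a small triage function. This is an internal screening aid under stated assumptions, not a legal determination: the `UseCase` fields are hypothetical yes/no answers that a reviewer would supply after reading the Act, and the example use cases are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class RiskCategory(Enum):
    UNACCEPTABLE = "Unacceptable Risk"
    HIGH = "High Risk"
    LIMITED = "Limited Risk"
    MINIMAL = "Minimal Risk"


@dataclass(frozen=True)
class UseCase:
    # Hypothetical reviewer-supplied answers, one per step.
    name: str
    is_prohibited_practice: bool = False          # Step 1
    in_listed_high_risk_domain: bool = False      # Step 2
    interacts_or_generates_content: bool = False  # Step 3


def triage(use_case: UseCase) -> RiskCategory:
    """Apply the steps in order; the first match wins."""
    if use_case.is_prohibited_practice:
        return RiskCategory.UNACCEPTABLE
    if use_case.in_listed_high_risk_domain:
        return RiskCategory.HIGH
    if use_case.interacts_or_generates_content:
        return RiskCategory.LIMITED
    return RiskCategory.MINIMAL


examples = [
    UseCase("Social scoring by a public authority", is_prohibited_practice=True),
    UseCase("CV screening for recruitment", in_listed_high_risk_domain=True,
            interacts_or_generates_content=True),
    UseCase("Customer support chatbot", interacts_or_generates_content=True),
    UseCase("Spam filter"),
]

for case in examples:
    print(f"{case.name}: {triage(case).value}")
```

The ordering matters: a recruitment tool that also chats with candidates is still High Risk, because the stricter category is checked first.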

From Gas to Clean Fuels: The GTL Value Chain — and Where AI Changes the Game

2026-02-03T11:00:00.000-08:00

Gas-to-Liquids (GTL) sounds exotic, but the idea is simple:

Take natural gas → convert it into clean liquid fuels and specialty products.

1. Feed Gas Reception & Pre-Treatment

“Clean the gas before anything else breaks”

What’s happening (plain English)

Natural gas arrives from underground reservoirs.
Before it can be used, unwanted stuff must be removed — otherwise expensive equipment gets damaged.

Key activities

Removing water to prevent corrosion
Removing sulphur and CO₂ that poison catalysts
Filtering trace metals like mercury
Monitoring gas composition continuously

Purpose

Protect downstream units and keep the plant running smoothly.

Value created

Fewer shutdowns
Longer equipment life
Stable operations

AI / ML / GenAI interventions

Time-series ML to predict sudden feed-gas quality changes
Anomaly detection on gas composition sensors
Root-cause models linking upstream field behavior to plant upsets
GenAI copilots that explain why gas quality is drifting in plain language
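As one simple illustration of anomaly detection on a gas composition sensor, the sketch below flags readings that deviate strongly from a rolling window of recent values (a rolling z-score). It is a baseline technique chosen for the example, not a description of any plant's system; the window size, threshold, and sample CO₂ readings are assumptions.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings far from the mean of the previous `window` values."""
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical CO2 mole-percent readings with one spike at index 6.
co2_percent = [2.0, 2.1, 1.9, 2.0, 2.1, 2.0, 5.0, 2.0, 2.1]
print(rolling_zscore_anomalies(co2_percent))
```

Note that the spike itself enters the window afterwards and inflates the standard deviation for a few samples, so a production detector would typically exclude flagged points from the baseline or use a more robust statistic.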

2. Syngas Generation

“Convert gas into chemical building blocks”

What’s happening

Clean gas is heated and reacted to create syngas (hydrogen + carbon monoxide).
This step consumes massive energy and is one of the costliest parts of GTL.

Key activities

High-temperature reforming
Oxygen production using air separation units
Fine-tuning hydrogen-to-carbon monoxide ratios
Managing heat recovery and energy balance

Purpose

Create the exact chemical mix needed for fuel synthesis.

Value created

Energy efficiency
Stable downstream conversion
Cost control

AI / ML / GenAI interventions

Reinforcement learning for optimal control strategies
Energy optimization models (steam, fuel gas, power)
Physics-informed ML to respect thermodynamics
What-if simulators for operators (“What if feed gas changes?”)
GenAI to summarize complex control-room situations

3. Fischer–Tropsch (FT) Synthesis

“Grow liquid hydrocarbons from gas”

What’s happening

Syngas is passed over catalysts, forming long-chain hydrocarbons — like growing wax molecules from gas.

Key activities

Operating FT reactors
Managing catalyst activity and replacement
Controlling temperature to avoid runaway reactions
Recovering heat for steam generation

Purpose

Create synthetic hydrocarbons — the heart of GTL.

Value created

Product yield
Plant throughput
Safety and stability

AI / ML / GenAI interventions

Catalyst life prediction models
Early fault detection using multivariate ML
Soft sensors estimating unmeasured reaction variables
Digital twins for reactor behavior
GenAI for shift handover summaries and incident explanation

4. Hydrocarbon Finishing

“Turn synthetic wax into premium fuels”

What’s happening

The wax-like product is carefully reshaped into diesel, naphtha, base oils, and specialty waxes.

Key activities

Breaking long molecules (hydrocracking)
Re-shaping molecules for cold-weather performance
Separating products by boiling point
Quality testing and certification

Purpose

Design fuels with exact properties — not just “acceptable” ones.

Value created

High margins
Premium, ultra-clean products
Market flexibility

AI / ML / GenAI interventions

Multi-objective optimization (yield vs quality vs energy)
Hydrogen consumption optimization models
Product-slate recommendation engines
GenAI for operator decision support (“Best cut-point strategy today?”)
ML-based quality prediction before lab results arrive

5. Utilities & Offsites

“Everything that keeps the plant alive”

What’s happening

Power, water, steam, storage, and logistics — none of these make fuel, but without them nothing works.

Key activities

Power and steam generation
Cooling water & desalination
Tank-farm operations
Marine loading and export logistics

Purpose

Enable continuous, safe, energy-efficient production.

Value created

Uptime
Cost control
ESG performance

AI / ML / Vision interventions

Energy demand forecasting
Load-balancing optimization
Tank-level prediction & inventory optimization
Computer vision for leaks, spills, unsafe access
GenAI for compliance & reporting automation

6. Storage, Blending & Export

“Deliver exactly what customers paid for”

What’s happening

Products are blended to precise specs and shipped worldwide.

Key activities

Blending recipe calculation
Quality assurance
Export scheduling

Purpose

Meet customer specifications and delivery commitments.

Value created

Working capital efficiency
Customer trust

AI / ML / GenAI interventions

Demand forecasting
Blending optimization algorithms
Supply-chain risk models
GenAI for contracts, shipping docs, regulatory filings

7. Lifecycle Operations, Maintenance & Safety

“Protect billions over decades”

What’s happening

Assets age, corrode, and fail — unless constantly monitored.

Key activities

Inspections
Preventive and predictive maintenance
Turnarounds
Safety monitoring

Value created

Avoided unplanned shutdowns
Worker safety
Asset life extension

AI / Robotics / AR-VR interventions

Drones + vision AI for corrosion & flare inspection
Predictive maintenance ML on rotating equipment
Robotics for confined-space inspections
AR/VR for operator training and remote expert guidance
GenAI copilots for maintenance troubleshooting

AI Risk Management Framework

2026-01-29T07:03:00.000-08:00

The U.S. government has rescinded the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, which previously aimed to establish a comprehensive framework for responsible AI development. The order addressed critical areas such as ensuring the safety and security of AI technologies, promoting innovation and competition, supporting the workforce, advancing equity and civil rights, protecting consumers, patients, passengers, and students, safeguarding privacy, and accelerating the adoption of AI across federal agencies.

While the intention behind reducing regulatory constraints is to foster innovation, leaving AI largely unregulated is not without risks. Artificial intelligence systems can produce incorrect or misleading outcomes, influence critical decisions that directly affect people’s lives, and reinforce or amplify existing biases. Moreover, AI technologies can be misused for manipulation, misinformation, and cybercrime, creating new vectors for security threats. Without appropriate governance mechanisms, these risks can scale rapidly as AI systems become more powerful and widely deployed.

Drawing on the EU’s “Trustworthy AI” principles, several key risk categories have been identified, highlighting the need for structured governance and risk management approaches in AI development and deployment.

https://www.pwc.com/jp/en/knowledge/column/generative-ai-regulation09.html

Further explanation of these follows.

Many frameworks define risk principles, but policies are set at the institution level (there is no global standard).

The NIST AI Risk Management Framework is a voluntary framework designed to help organizations manage risks associated with artificial intelligence while enabling innovation and value creation. It focuses on identifying, assessing, and mitigating the potential negative impacts of AI systems, while also helping organizations protect themselves from technical, legal, and reputational risks.

The framework is technology- and vendor-agnostic, making it applicable across industries, sectors, and AI technologies. It defines the characteristics of a trustworthy AI system, providing guidance on how organizations can design, develop, deploy, and govern AI responsibly.

The NIST framework consists of two key components:

Foundational concepts, which define core principles, risk taxonomy, and characteristics of trustworthy AI.
Core functions (Govern, Map, Measure, and Manage), which provide a structured approach to AI risk management through governance, risk identification, measurement, and mitigation.

NIST AI Risk Management Framework: Core

For an AI system to be considered trustworthy under the NIST AI RMF, it must be valid and reliable (robust), safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed. These characteristics ensure that AI systems operate consistently, make responsible decisions, and can be understood and governed effectively.

If these qualities are absent, AI systems can become harmful—causing negative impacts on individuals, organizations, and the broader ecosystem. Such risks may include incorrect or biased decisions, loss of trust, legal and reputational damage, and systemic societal harm.

How to Implement the NIST AI Risk Management Framework (AI RMF)

Implementing the NIST AI Risk Management Framework requires a structured, step-by-step approach that integrates governance, risk assessment, and organizational change. The process begins with understanding the framework and culminates in embedding responsible AI practices into organizational culture.

Step-by-Step Implementation Approach

1. Understand the Framework and Current AI Landscape

Organizations must first build awareness of the NIST AI RMF through education and training. This includes assessing existing AI use cases, documenting an AI inventory, identifying potential risks, and consulting key stakeholders across business, technology, legal, and compliance teams.

2. Govern AI Usage

Establish governance mechanisms to guide AI adoption. This involves developing AI policies aligned with organizational strategy, defining roles and responsibilities, ensuring accountability, and aligning stakeholders around responsible AI objectives.

3. Map Risks and Benefits

Identify and document the risks and benefits associated with each AI system. This step requires cross-functional collaboration among data scientists, IT teams, product owners, legal teams, and business leaders to evaluate impacts on individuals, organizations, and society.

4. Measure AI Risks

Define a risk assessment methodology and metrics to evaluate AI trustworthiness based on Responsible AI principles. Calculate risk scores, conduct periodic assessments, and establish mechanisms for continuous monitoring and feedback.

5. Manage AI Risks

Implement risk mitigation strategies such as model retraining, data governance enhancements, security controls, and process improvements. Allocate resources effectively, communicate risks transparently, and regularly review and update AI policies and controls.

6. Embed Responsible AI Culture

Drive cultural transformation through organization-wide training programs and awareness initiatives. Ensure that responsible AI becomes a shared responsibility across all levels of the organization.

Case Study: Implementing NIST AI RMF in a Mid-Sized Organization

Step 1: Awareness and Training

Expert-led training sessions were conducted to educate employees about the NIST AI RMF, preparing teams for subsequent implementation steps.

Step 2: AI Use Case Assessment

An audit of existing AI systems was performed to document their purpose, risks, and potential impacts.

Step 3: AI Governance Setup

The organization developed AI policies aligned with its strategic goals, defined roles and responsibilities, and established accountability mechanisms for AI risk management.

Step 4: Risk and Benefit Mapping

A cross-functional team assessed the risks and benefits of each AI system, focusing on impacts on people, business operations, and society. Findings were systematically documented.

Step 5: Risk Measurement

Metrics were defined to evaluate the trustworthiness of AI systems based on Responsible AI principles. Regular assessments were conducted to track risk levels.

Step 6: Risk Management

Risk mitigation strategies were implemented, including model retraining, enhanced data security, and process improvements. Risk management strategies were periodically updated.

Step 7: Responsible AI Training

Organization-wide training programs were launched to promote responsible AI usage across all departments.

Outcomes of AI RMF Implementation

Implementing the NIST AI RMF leads to:

Enhanced AI risk management
Increased trustworthiness of AI systems
Improved regulatory and compliance alignment
Future-ready AI governance capabilities
More strategically aligned AI use cases

Types of harm and related risks

Challenges in risk measurement according to NIST

AI Risk Tolerance

The AI Risk Tolerance Model defines how much risk an organization is willing to accept across different AI risk categories, use cases, and impact levels. It helps organizations decide whether an AI system can be deployed, mitigated, or rejected.

Organizations must define risk tolerance based on:

context
stakeholders
societal impact
legal requirements
use case criticality

This naturally leads to separate tolerance thresholds per risk dimension.
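A minimal sketch of per-dimension tolerance thresholds follows. The dimensions, the 0 to 5 scoring scale, and the threshold values are hypothetical; each organization would set its own based on the factors listed above. The overall decision here is simply the strictest outcome across dimensions, which is one possible policy rather than a prescribed one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    accept_up_to: float  # scores at or below this can be deployed as-is
    reject_above: float  # scores above this are outside tolerance


# Hypothetical thresholds on a 0-5 risk score; stricter for safety than explainability.
TOLERANCES = {
    "safety": Tolerance(accept_up_to=1.0, reject_above=3.0),
    "privacy": Tolerance(accept_up_to=2.0, reject_above=4.0),
    "explainability": Tolerance(accept_up_to=3.0, reject_above=4.5),
}

SEVERITY_ORDER = ["deploy", "mitigate", "reject"]


def decide(scores: dict[str, float]) -> tuple[str, dict[str, str]]:
    """Return the overall decision and the per-dimension outcomes."""
    per_dimension = {}
    for dimension, score in scores.items():
        tolerance = TOLERANCES[dimension]
        if score > tolerance.reject_above:
            per_dimension[dimension] = "reject"
        elif score > tolerance.accept_up_to:
            per_dimension[dimension] = "mitigate"
        else:
            per_dimension[dimension] = "deploy"
    # The strictest per-dimension outcome drives the overall decision.
    overall = max(per_dimension.values(), key=SEVERITY_ORDER.index)
    return overall, per_dimension


overall, detail = decide({"safety": 0.5, "privacy": 2.5, "explainability": 1.0})
print(overall, detail)
```

The same score of 2.5 would be acceptable for explainability but requires mitigation for privacy, which is the practical effect of keeping tolerances separate per dimension.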

Integrated AI Risk Prioritization Model

Step 1: Questionnaire → Impact Score

Example: You assess each risk category using questions.

Risk Category	Questionnaire Score (0–5)	Meaning
Safety Risk	4.5	High safety risk
Privacy Risk	3.8	Medium-high privacy risk
Bias & Fairness Risk	4.2	High bias risk
Security Risk	2.5	Moderate security risk
Explainability Risk	3.0	Moderate explainability risk

👉 This becomes your Impact Score.
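The text does not say how individual answers are combined into a category score, so the sketch below assumes the simplest option: each question is answered on the same 0 to 5 scale and the category score is the mean of its answers. The function name and the sample answers are hypothetical.

```python
from statistics import mean


def impact_scores(answers: dict[str, list[float]]) -> dict[str, float]:
    """Average per-question answers (0-5) into one impact score per category."""
    scores = {}
    for category, values in answers.items():
        if not values:
            raise ValueError(f"no answers recorded for {category}")
        if any(not 0 <= value <= 5 for value in values):
            raise ValueError(f"answers for {category} must be between 0 and 5")
        scores[category] = round(float(mean(values)), 1)
    return scores


# Hypothetical questionnaire answers for two of the categories.
answers = {
    "Safety Risk": [5, 4, 5, 4],
    "Security Risk": [2, 3, 2, 3],
}
print(impact_scores(answers))
```

A weighted mean or a "worst answer wins" rule would be equally defensible; the important part is that the aggregation rule is fixed and documented before scoring begins.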

Step 2: Likelihood Score (optional but recommended)

Likelihood can come from:

historical incidents
expert judgment
system exposure
usage scale

Example:

Risk Category	Likelihood Score (1–5)
Safety Risk	3
Privacy Risk	4
Bias & Fairness Risk	4
Security Risk	2
Explainability Risk	3

Step 3: Risk Category Weight (fixed)

From earlier model:

Risk Category	Weight
Safety Risk	3.0
Privacy Risk	3.0
Bias & Fairness Risk	2.5
Security Risk	2.5
Legal & Compliance Risk	2.5
Reputational Risk	2.0
Operational Risk	1.8
Explainability Risk	1.5
Performance Risk	1.2
Innovation Risk	1.0

Step 4: Final Priority Score Formula

$\textbf{Priority Score} = \text{Questionnaire Score (Impact)} \times \text{Likelihood} \times \text{Weight}$

Step 5: Example Calculation (with your questionnaire)

Risk Category	Impact (Questionnaire)	Likelihood	Weight	Priority Score
Safety Risk	4.5	3	3.0	40.5
Privacy Risk	3.8	4	3.0	45.6
Bias & Fairness Risk	4.2	4	2.5	42.0
Security Risk	2.5	2	2.5	12.5
Explainability Risk	3.0	3	1.5	13.5

Step 6: Priority Interpretation

Priority Score	Priority Level	Action
> 50	Critical	Stop deployment / immediate mitigation
30 – 50	High	Mitigate before deployment
15 – < 30	Medium	Monitor and control
< 15	Low	Acceptable risk

Risk assessment, risk tolerance, and prioritization can be summarized as:

Questionnaire Score → Impact

Impact × Likelihood × Weight → Priority Score

Priority Score vs Risk Tolerance → Decision
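The scoring steps above can be sketched in a few lines of Python. The inputs are copied from the example tables; the function names and the handling of band edges (a score of exactly 30 counts as High, exactly 15 as Medium) are assumptions made for this illustration.

```python
# (impact, likelihood, weight) per risk category, taken from the example tables.
RISKS = {
    "Safety Risk": (4.5, 3, 3.0),
    "Privacy Risk": (3.8, 4, 3.0),
    "Bias & Fairness Risk": (4.2, 4, 2.5),
    "Security Risk": (2.5, 2, 2.5),
    "Explainability Risk": (3.0, 3, 1.5),
}


def priority_score(impact, likelihood, weight):
    """Priority Score = Impact x Likelihood x Weight."""
    return impact * likelihood * weight


def priority_level(score):
    """Map a score to a priority band (band edges are an assumption)."""
    if score > 50:
        return "Critical"
    if score >= 30:
        return "High"
    if score >= 15:
        return "Medium"
    return "Low"


def prioritize(risks):
    """Return (category, score, level) tuples, highest score first."""
    rows = [(name, round(priority_score(*vals), 1)) for name, vals in risks.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(name, score, priority_level(score)) for name, score in rows]


for name, score, level in prioritize(RISKS):
    print(f"{name:22s} {score:5.1f}  {level}")
```

Sorting by score puts Privacy Risk (45.6) first in this example, which matches the calculation table.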

AI risk management should also be part of the Enterprise Risk Management function, alongside cybersecurity, privacy, and operational process risk.

What is a trustworthy AI system according to NIST:

Functions of the AI Risk Management Framework (deep dive), from the NIST AI RMF Playbook:

Layered Architecture for AI Risk Framework

Layer 1: Evidence Layer (Questionnaire)

Purpose: Collect factual evidence about AI systems.
Question types:
Governance questions (policy, roles, accountability)
RAI principle questions (fairness, transparency, privacy, safety, etc.)
Risk-related questions (bias, privacy, security, explainability, etc.)
Mitigation/control questions (monitoring, retraining, incident handling)
Output: Raw evidence for analysis.

Layer 2: NIST Framework Mapping Layer

Purpose: Structure evidence using the NIST AI Risk Management Framework (AI RMF).
Output: NIST-aligned classification of questions and evidence.

NIST Function	Meaning
GOVERN	Governance, policy, risk tolerance, accountability
MAP	Risk identification, context, stakeholder impact
MEASURE	Risk assessment, testing, metrics
MANAGE	Risk mitigation, controls, monitoring

Layer 3: Analytical Views Layer (Three Parallel Dimensions)

This layer interprets the same evidence through three complementary lenses.

View A: NIST Process Maturity View

Key question:
How mature is the organization’s AI governance process?
Reference:
https://airc.nist.gov/airmf-resources/playbook/

NIST Function	Score Output
GOVERN	Governance maturity score
MAP	Risk identification capability score
MEASURE	Risk measurement capability score
MANAGE	Mitigation capability score

View B: Responsible AI (RAI) Principle Quality View

Key question:
How trustworthy is the AI system?

RAI Principle	Score Output
Fairness	Fairness score
Transparency	Transparency score
Privacy	Privacy score
Safety	Safety score
Accountability	Accountability score
Security	Security score
Robustness	Robustness score

View C: AI Risk Impact View

Key question:
How risky is the AI system?

Risk Category	Risk Score
Bias & Discrimination Risk	Bias risk score
Privacy Risk	Privacy risk score
Safety Risk	Safety risk score
Security Risk	Cyber risk score
Legal & Compliance Risk	Legal risk score
Operational Risk	Operational risk score
Reputational Risk	Reputation risk score
Societal Risk	Societal risk score
Transparency / Explainability Risk	Transparency risk score
Performance Risk	Performance risk score

Layer 4: Cross-View Insight Layer

Purpose: Interpret relationships between governance maturity, principles, and risks.

Pattern Observed	Insight
Low GOVERN score + High risk score	Governance weakness driving risk
Low transparency score + High transparency risk	Opaque AI decision-making
High RAI scores + High risk scores	Ethical design but operational gaps
Low MEASURE score	Risk scores may be unreliable
Low MANAGE score + High risk score	Weak mitigation capability

Layer 5: Decision and Action Layer

Purpose: Drive governance and risk decisions.

Combined Insight	Action
Low risk + high maturity + high RAI	Accept
Medium risk + moderate maturity	Mitigate
High risk + low maturity	Redesign or restrict
Critical risk	Stop deployment

Example: Applying the AI Governance Framework Step-by-Step

To illustrate how the layered AI governance framework works in practice, we demonstrate it using 12 sample questions and walk through each layer of analysis.

Step 1: Sample Questions (Evidence Layer)

We start with 12 representative questions from an AI risk and governance questionnaire.

Q.No	Question
Q1	Do we have an AI governance policy?
Q2	Are roles and responsibilities for AI defined?
Q3	Have AI use cases been documented?
Q4	Have stakeholders impacted by AI been identified?
Q5	Has bias risk been identified in the model?
Q6	Are privacy risks assessed for AI systems?
Q7	Are fairness metrics measured?
Q8	Are explainability techniques applied to models?
Q9	Are security tests conducted on AI systems?
Q10	Are AI risks scored using defined metrics?
Q11	Are mitigation actions defined for identified risks?
Q12	Are AI models retrained when risks increase?

Responses are scored on a 5-point scale (1 = very weak, 5 = very strong).

Step 2: Raw Scores (Evidence)

Assume the following scores:

Q.No	Score (1–5)
Q1	4
Q2	3
Q3	5
Q4	4
Q5	2
Q6	3
Q7	2
Q8	1
Q9	3
Q10	2
Q11	3
Q12	2

These scores represent the raw evidence collected from the organization.

Step 3: Mapping to NIST RMF, RAI Principles, and Risk Categories

Each question is mapped to:

NIST AI RMF function,
Responsible AI (RAI) principle,
AI risk category.

Q.No	NIST Function	RAI Principle	Risk Category
Q1	GOVERN	Accountability	Governance Risk
Q2	GOVERN	Accountability	Governance Risk
Q3	MAP	–	Operational Risk
Q4	MAP	–	Societal Risk
Q5	MAP	Fairness	Bias Risk
Q6	MAP	Privacy	Privacy Risk
Q7	MEASURE	Fairness	Bias Risk
Q8	MEASURE	Transparency	Transparency Risk
Q9	MEASURE	Security	Security Risk
Q10	MEASURE	–	General Risk
Q11	MANAGE	Safety	Safety Risk
Q12	MANAGE	Robustness	Performance Risk

Step 4A: NIST Process Maturity View

Scores are aggregated by NIST function.

GOVERN Score
(Q1 + Q2) / 2 = (4 + 3) / 2 = 3.5

MAP Score
(Q3 + Q4 + Q5 + Q6) / 4 = (5 + 4 + 2 + 3) / 4 = 3.5

MEASURE Score
(Q7 + Q8 + Q9 + Q10) / 4 = (2 + 1 + 3 + 2) / 4 = 2.0

MANAGE Score
(Q11 + Q12) / 2 = (3 + 2) / 2 = 2.5

Output: NIST Maturity Profile

NIST Function	Score
GOVERN	3.5
MAP	3.5
MEASURE	2.0
MANAGE	2.5

Insight: Governance and risk identification are moderate, but risk measurement and mitigation are weak.
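The aggregation in this step is a plain group-by-and-average. A minimal sketch, using the scores and mappings from Steps 2 and 3 (the `aggregate` helper is a name chosen for this example):

```python
from collections import defaultdict

# Raw scores from Step 2.
SCORES = {"Q1": 4, "Q2": 3, "Q3": 5, "Q4": 4, "Q5": 2, "Q6": 3,
          "Q7": 2, "Q8": 1, "Q9": 3, "Q10": 2, "Q11": 3, "Q12": 2}

# Question-to-function mapping from Step 3.
NIST_FUNCTION = {
    "Q1": "GOVERN", "Q2": "GOVERN",
    "Q3": "MAP", "Q4": "MAP", "Q5": "MAP", "Q6": "MAP",
    "Q7": "MEASURE", "Q8": "MEASURE", "Q9": "MEASURE", "Q10": "MEASURE",
    "Q11": "MANAGE", "Q12": "MANAGE",
}


def aggregate(scores, mapping):
    """Average the question scores that share the same label."""
    groups = defaultdict(list)
    for question, label in mapping.items():
        groups[label].append(scores[question])
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


nist_profile = aggregate(SCORES, NIST_FUNCTION)
print(nist_profile)
```

The same helper can be reused for the RAI principle view by passing a question-to-principle mapping instead.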

Step 4B: RAI Principle Quality View

Scores are aggregated by Responsible AI principles.

RAI Principle	Score
Fairness	(Q5 + Q7)/2 = 2.0
Transparency	Q8 = 1.0
Privacy	Q6 = 3.0
Security	Q9 = 3.0
Accountability	(Q1 + Q2)/2 = 3.5
Robustness	Q12 = 2.0

Output: RAI Principle Profile

Principle	Score
Fairness	2.0
Transparency	1.0
Privacy	3.0
Security	3.0
Accountability	3.5
Robustness	2.0

Insight: Transparency and fairness are major weaknesses in the AI system.

Step 4C: AI Risk Impact View

Risk scores are derived from principle weaknesses (lower principle score → higher risk).

Example formula:
Risk Score = 5 − Principle Score

Risk Category	Risk Score
Bias Risk	5 − 2.0 = 3.0
Transparency Risk	5 − 1.0 = 4.0
Privacy Risk	5 − 3.0 = 2.0
Security Risk	5 − 3.0 = 2.0
Governance Risk	5 − 3.5 = 1.5
Performance Risk	5 − 2.0 = 3.0

Output: Risk Heatmap

Risk Level
High Risk: Transparency
Medium Risk: Bias, Performance
Low Risk: Privacy, Security
Very Low Risk: Governance
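The inversion formula and the heatmap can be reproduced with the sketch below. The principle scores come from Step 4B; the cut-offs in `risk_level` are assumptions chosen so that the output matches the heatmap above, since the text does not state explicit thresholds.

```python
# Principle scores from Step 4B.
PRINCIPLE_SCORES = {
    "Fairness": 2.0, "Transparency": 1.0, "Privacy": 3.0,
    "Security": 3.0, "Accountability": 3.5, "Robustness": 2.0,
}

# Principle-to-risk pairing used in the Step 4C table.
PRINCIPLE_TO_RISK = {
    "Fairness": "Bias Risk", "Transparency": "Transparency Risk",
    "Privacy": "Privacy Risk", "Security": "Security Risk",
    "Accountability": "Governance Risk", "Robustness": "Performance Risk",
}


def risk_level(score):
    """Assumed cut-offs: >=4 High, >=3 Medium, >=2 Low, otherwise Very Low."""
    if score >= 4.0:
        return "High"
    if score >= 3.0:
        return "Medium"
    if score >= 2.0:
        return "Low"
    return "Very Low"


# Risk Score = 5 - Principle Score
risk_scores = {PRINCIPLE_TO_RISK[p]: 5 - s for p, s in PRINCIPLE_SCORES.items()}
heatmap = {risk: risk_level(score) for risk, score in risk_scores.items()}
print(heatmap)
```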

Step 5: Cross-View Insights

By combining NIST maturity, RAI principles, and risk scores, we derive deeper insights.

Observation	Interpretation
Low MEASURE score (2.0)	Risk assessment capability is weak
Very low Transparency score (1.0)	AI decisions are opaque
High Transparency risk (4.0)	Explainability must be prioritized
Low Fairness score (2.0)	Bias risk is significant
Moderate GOVERN score (3.5)	Governance exists but is not strong
Low MANAGE score (2.5)	Mitigation capability is insufficient

Step 6: Decision & Action Layer

Based on combined insights:

Condition	Decision
High transparency risk + weak measurement capability	Mitigate immediately
Medium bias risk	Implement bias mitigation controls
Moderate governance maturity	Strengthen AI policies and accountability

Final Decision

The AI system should not be fully approved in its current state.
Priority actions include improving explainability, strengthening fairness assessment, and enhancing risk mitigation mechanisms.


LLM Observability: Getting Started with LangSmith and LangChain

2025-11-25T21:11:00.000-08:00

Introduction

Building AI applications often involves experimenting, debugging, and optimizing responses. LangSmith helps developers track and trace interactions with language models, making it easier to debug and monitor their applications.

In this blog, we’ll demonstrate how to set up LangSmith with LangChain to track and trace requests while invoking OpenAI's ChatOpenAI model.

Prerequisites

Before running the code, ensure you have:

An OpenAI API Key
A LangSmith API Key
Installed the required Python libraries:

pip install langchain langchain-openai python-dotenv

Loading Environment Variables

The first step is to load environment variables from a .env file. This helps keep sensitive information secure and separate from the code.

1. Create a .env file:

LANGCHAIN_API_KEY=your_langchain_api_key
LANGCHAIN_PROJECT=your_project_name
OPENAI_API_KEY=your_openai_api_key

2. Load Environment Variables in Python:

import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

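# Verify the expected variables are loaded (without printing secret values)
for name in ("LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT", "OPENAI_API_KEY"):
    print(name, "is set" if os.getenv(name) else "is MISSING")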

The load_dotenv() function loads the variables into os.environ, making them accessible throughout the script.

Enabling LangSmith Tracking and Tracing

LangSmith enables tracking of prompts and responses sent to the LLM. We can enable it by setting the following environment variables:

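# Setting up LangSmith for tracking
# load_dotenv() has already placed the two keys below in os.environ; the explicit
# assignments raise TypeError if either one is missing from the .env file.
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = os.getenv("LANGCHAIN_PROJECT")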

These settings allow LangSmith to trace interactions and group them under a specified project.
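A small guard can catch a missing key before any model call is made. The sketch below uses only the standard library; `missing_settings` and `enable_tracing` are hypothetical helper names for this example, not part of LangSmith or LangChain.

```python
import os

# Variable names used in the .env file from this tutorial.
REQUIRED_VARS = ("LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT", "OPENAI_API_KEY")


def missing_settings(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def enable_tracing(env=os.environ):
    """Switch the tracing flag on only when every required variable is present."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    env["LANGCHAIN_TRACING_V2"] = "true"
```

Calling `enable_tracing()` right after `load_dotenv()` fails fast with a readable message instead of an authentication error later in the run.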

Invoking the Language Model

Now, let’s instantiate the ChatOpenAI model from LangChain and invoke it with a query.

from langchain_openai import ChatOpenAI

# Initialize the model
llm = ChatOpenAI(model="gpt-3.5-turbo")

# Print model details
print(llm)

# Invoke the model with a query
response = llm.invoke("What is Agentic AI?")
print(response)

The above code:

Instantiates OpenAI’s ChatOpenAI model.
Invokes it with the query “What is Agentic AI?”.
Prints the response from the model.

With LangSmith enabled, all requests and responses are logged, allowing you to analyze them via LangSmith’s dashboard.

LangSmith Dashboard

Why Use LangSmith with LangChain?

Improved Debugging: Tracks prompts and responses to help diagnose unexpected outputs.
Better Observability: Logs request-response pairs for auditing and optimization.
Performance Analysis: Helps identify issues like token overuse or slow response times.
Experiment Tracking: Organizes LLM experiments by grouping them under projects.

Conclusion

In this tutorial, we covered how to:

Load environment variables securely.
Enable LangSmith for tracking and tracing.
Invoke OpenAI's ChatOpenAI model using LangChain.

Enhancing AI Knowledge Retrieval with LangChain, Deepseek-r1-70b (Groq) and FAISS

2025-11-25T21:05:00.000-08:00

In the evolving landscape of artificial intelligence, retrieving relevant information from vast sources has become a crucial aspect of building intelligent applications. This blog demonstrates how to harness the power of LangChain, FAISS, and Groq’s deep learning models to extract and process information from web sources like Times of India.

Introduction

Retrieving relevant and up-to-date information is essential for AI-driven applications, whether it be in journalism, research, or customer service. This guide will show you how to scrape, process, and query web-based data using LangChain and FAISS, with Groq’s LLM as the backbone of intelligent responses.

Prerequisites

Before running the code, ensure you have the following installed:

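langchain
langchain-community
langchain-groq
faiss-cpu
groq
python-dotenv
requests
beautifulsoup4

You can install them using pip:

pip install langchain langchain-community langchain-groq faiss-cpu groq python-dotenv requests beautifulsoup4

Additionally, you need a GROQ API Key, which should be stored in a .env file as GROQ_API_KEY. The embedding step uses OllamaEmbeddings, so a local Ollama server with the mxbai-embed-large model pulled must also be running.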

Code Breakdown

1. Setting Up Environment and LLM

We begin by loading environment variables and initializing Groq’s LLM for text generation.

import os
from dotenv import load_dotenv
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_groq import ChatGroq
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

# Load API keys and environment variables
load_dotenv()

if "GROQ_API_KEY" not in os.environ:
    raise ValueError("GROQ_API_KEY not found in environment variables. Please add it to your .env file.")

# Initialize the language model
llm = ChatGroq(model="deepseek-r1-distill-llama-70b", groq_api_key=os.getenv("GROQ_API_KEY"))

2. Scraping and Processing Web Content

Using LangChain’s WebBaseLoader, we scrape content from Times of India.

# Load web content from Times of India
loader = WebBaseLoader("https://timesofindia.indiatimes.com/")
document = loader.load()

# Split content into manageable chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
documents = text_splitter.split_documents(document)
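To see what chunk_size and chunk_overlap mean, here is a deliberately simplified sliding-window splitter in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that class splits recursively on separators such as paragraphs and sentences); `sliding_chunks` is a hypothetical helper that only illustrates the two parameters.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
        start += step
    return chunks


demo = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo)
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.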

3. Embedding and Vector Store Creation

To efficiently store and retrieve relevant information, we use FAISS along with OllamaEmbeddings.

# Create document embeddings and store in FAISS
embeddings = OllamaEmbeddings(model='mxbai-embed-large')
vectorstore = FAISS.from_documents(documents, embeddings)

4. Creating Retrieval and Query Chains

We define the structure for document retrieval and querying.

# Define prompt structure
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:
    <context>
    {context}
    </context>
    Question: {input}
    """
)

# Create document chain
document_chain = create_stuff_documents_chain(llm, prompt)

# Set up retrieval mechanism
retriever = vectorstore.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

5. Querying the Model

Finally, we can ask questions based on the retrieved knowledge.

# Example query
result = retrieval_chain.invoke({"input": "What are the latest news headlines?"})
print(result['answer'])

Conclusion

With this approach, we can effectively scrape, process, and query data from Times of India, allowing for real-time, AI-driven information retrieval. This method can be extended to other sources, making it valuable for various business applications, including journalism, market research, and automated content summarization.

By integrating LangChain, FAISS, and Groq’s LLM, we enhance the way AI interacts with real-world information, providing accurate and contextually relevant responses.

Building a Multi-Source Search Engine with LangChain Agents

2025-11-25T21:03:00.000-08:00

Overview

In today’s world of vast digital information, having an efficient search engine that integrates multiple sources is crucial. This blog explores how to build a powerful search engine using LangChain, Wikipedia, Arxiv, and a custom retriever tool. The system is orchestrated by an AI agent powered by Meta's Llama 3 8B model (Llama3-8b-8192), served through Groq, making it capable of fetching relevant information seamlessly.

1. Importing Necessary Libraries

To get started, we import the required libraries for integrating search functionalities from Wikipedia and Arxiv.

from langchain_community.tools import ArxivQueryRun, WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper, ArxivAPIWrapper

2. Setting Up Wikipedia Tool

We create a Wikipedia API wrapper that retrieves results with a maximum of 250 characters.

api_wrapper_wiki = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=250)
wiki = WikipediaQueryRun(api_wrapper=api_wrapper_wiki)
print(wiki.name)  # Outputs: 'wikipedia'

This tool enables quick access to summarized Wikipedia content.

3. Setting Up Arxiv Tool

Arxiv is a repository for research papers. We configure an Arxiv API wrapper to fetch concise results from scientific articles.

api_wrapper_arxiv = ArxivAPIWrapper(top_k_results=1, doc_content_chars_max=250)
arxiv = ArxivQueryRun(api_wrapper=api_wrapper_arxiv)
print(arxiv.name)  # Outputs: 'arxiv'

Both Wikipedia and Arxiv tools are combined into a list:

tools = [wiki, arxiv]

4. Creating a Custom Retriever Tool

To enhance search capabilities, we integrate a custom retriever tool using FAISS for vector-based search. The idea is that any query related to LangSmith should be routed to this tool.

from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

We load documents from a specific URL, split them into smaller chunks, and create a retriever tool.

loader = WebBaseLoader("https://docs.smith.langchain.com/")
docs = loader.load()
documents = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)
vectordb = FAISS.from_documents(documents, OpenAIEmbeddings())
retriever = vectordb.as_retriever()

Creating and adding the retriever tool:

from langchain.tools.retriever import create_retriever_tool
retriever_tool = create_retriever_tool(retriever, "langsmith-search", "Search any information about Langsmith")
tools.append(retriever_tool)

5. Setting Up the AI Model and Agent

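We configure ChatGroq as the AI model and load environment variables for API keys. OpenAIEmbeddings in the previous step reads OPENAI_API_KEY from the environment, so when running everything as one script, call load_dotenv() before building the vector store.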

from langchain_groq import ChatGroq
from dotenv import load_dotenv
import openai
import os

load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")
openai.api_key = os.getenv("OPENAI_API_KEY")
llm = ChatGroq(groq_api_key=groq_api_key, model_name="Llama3-8b-8192")

6. Creating the AI Agent

We pull a pre-defined prompt template from LangChain’s hub to guide the AI agent’s responses.

from langchain import hub
prompt = hub.pull("hwchase17/openai-functions-agent")

Next, we create an AI agent that integrates the tools, language model, and prompt.

from langchain.agents import create_openai_tools_agent
agent = create_openai_tools_agent(llm, tools, prompt)

To execute the agent, we set up an AgentExecutor.

from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

7. Executing Search Queries

The agent can now be invoked to fetch information from Wikipedia, Arxiv, or the custom retriever tool.

agent_executor.invoke({"input": "Tell me about Langsmith"})

Additional example queries:

agent_executor.invoke({"input": "What is machine learning"})
agent_executor.invoke({"input": "What's the paper 1706.03762 about?"})

Conclusion

This multi-source AI-powered search engine is an effective tool for retrieving information from Wikipedia, Arxiv, and a custom document retriever. The combination of LangChain, FAISS, and the Llama3-8b-8192 model served through ChatGroq creates a dynamic and scalable search system.

Understanding LangChain: A Deep Dive into LLM Chaining in Chatbot with OOP Concepts

2025-11-25T21:01:00.000-08:00

Introduction

LangChain is a powerful framework that helps developers integrate Large Language Models (LLMs) into applications with structured workflows, memory handling, and chaining mechanisms. In this blog, we will explore an example code that sets up an LLM chain with memory, breaking it down into Python concepts and OOP principles.

Code Walkthrough of basic chatbot with OOP and Python Concepts

1. Importing Required Libraries

# Only the classes used below are imported; ChatOllama is the model class in this example
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_ollama import ChatOllama

Encapsulation: These imports bring in various modules, encapsulating functionalities like LLM handling, memory storage, and prompt management into separate classes.

2. Initializing the Language Model (LLM)

llm = ChatOllama(
    model="deepseek-r1:1.5b",
    base_url="http://localhost:11434",
    temperature=0.3
)

Class Instantiation: ChatOllama is a class that we instantiate with specific parameters like model name, API base URL, and temperature.
Encapsulation & Abstraction: The underlying details of how the model communicates with the API are abstracted away inside the ChatOllama class.

3. Creating a Memory Object

memory = ConversationBufferMemory(memory_key="chat_history")

State Management: The ConversationBufferMemory class maintains chat history, allowing the LLM to retain context across interactions.
Encapsulation: The class hides internal memory handling, exposing only necessary methods for use.

4. Defining a Prompt Template

template = """You are a helpful assistant. Here is the conversation history:
{chat_history}
Human: {human_input}
AI:"""

prompt = PromptTemplate(input_variables=["chat_history", "human_input"], template=template)

Template Design Pattern: PromptTemplate acts as a structured format for inputs to the LLM.
Encapsulation: The PromptTemplate class encapsulates the logic of handling dynamic variables inside the prompt.

5. Creating an LLM Chain with Memory

llm_chain = LLMChain(llm=llm, prompt=prompt, memory=memory)

Chaining Pattern: LLMChain enables sequential execution of LLM calls while retaining memory.
Composition: LLMChain is composed of llm, prompt, and memory, demonstrating OOP’s composition principle.
Encapsulation & Abstraction: LLMChain hides the complexities of managing multiple interactions with memory, exposing an easy-to-use interface.

6. Generating Responses

response1 = llm_chain.predict(human_input="Hello, who are you?")
print(response1)

response2 = llm_chain.predict(human_input="What can you do?")
print(response2)

response3 = llm_chain.predict(human_input="Tell me a joke.")
print(response3)

Method Calls: predict() is a method in LLMChain that interacts with the LLM using the provided prompt and memory.
Polymorphism: LLMChain calls the model through LangChain's common model interface, so ChatOllama could be replaced by another chat model class without changing the predict() calls.
Encapsulation: The details of token processing and inference are hidden inside LLMChain and ChatOllama.

Summary of OOP Concepts Used

OOP Concept	Explanation
Encapsulation	Hides the internal workings of classes, such as memory management in ConversationBufferMemory and inference handling in ChatOllama.
Abstraction	The complexity of interacting with the LLM is hidden behind the ChatOllama and LLMChain classes, providing a simple interface.
Composition	LLMChain is composed of multiple objects (llm, prompt, memory) to form a functional workflow.
Polymorphism	LLMChain works with any model class that implements LangChain's common model interface, so ChatOllama can be swapped for another model without changing the calling code.
State Management	ConversationBufferMemory maintains chat history, allowing the AI to retain context.
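To make the composition, polymorphism, and state rows concrete, here is a toy sketch in plain Python. `MiniChain`, `EchoModel`, and `ShoutModel` are hypothetical stand-ins invented for this illustration; they are not LangChain classes and do not call any model.

```python
class EchoModel:
    """Stand-in model: returns the last line of the prompt unchanged."""

    def invoke(self, prompt):
        return prompt.splitlines()[-1]


class ShoutModel:
    """Second stand-in with the same interface but different behavior."""

    def invoke(self, prompt):
        return prompt.splitlines()[-1].upper()


class MiniChain:
    """Composed of a model and a template; keeps its own chat history (state)."""

    def __init__(self, model, template):
        self.model = model
        self.template = template
        self.history = []

    def predict(self, human_input):
        prompt = self.template.format(
            chat_history="\n".join(self.history), human_input=human_input
        )
        reply = self.model.invoke(prompt)  # works for any object with invoke()
        self.history += [f"Human: {human_input}", f"AI: {reply}"]
        return reply


TEMPLATE = "{chat_history}\nHuman: {human_input}"
print(MiniChain(EchoModel(), TEMPLATE).predict("hi"))
print(MiniChain(ShoutModel(), TEMPLATE).predict("hi"))
```

Swapping `EchoModel` for `ShoutModel` changes the behavior without touching `MiniChain`, which is the sense in which a chain is polymorphic over its model.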

Conclusion

This example showcases various OOP principles like encapsulation, composition, and polymorphism in action. LangChain abstracts the complexities of managing LLM interactions, making it easy for developers to integrate AI models efficiently into applications. By leveraging memory and structured prompts, developers can create context-aware AI applications effortlessly.

Evaluating Text-to-SQL LLMs: Key Metrics and Their Significance

2025-11-25T20:52:00.000-08:00

Text-to-SQL LLMs bridge the gap between natural language queries and structured databases by translating user queries into SQL statements. To ensure accuracy, we need robust evaluation metrics at each stage. This blog explores the key metrics used to assess keyword extraction, retrieval, LLM response quality, and final SQL query generation.

1. Keyword Extraction Metrics

Extracting the right keywords ensures the retrieval step fetches the most relevant database metadata.

Out-of-Vocabulary (OOV)

Definition: Measures the percentage of extracted keywords that are not present in the predefined vocabulary. Example: If the query is "Find top 10 employees by salary" and the extracted keywords are {"Find", "top", "salary"}, but "Find" is not in the vocabulary, the OOV rate is 33%.
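A minimal sketch of the OOV rate, using the example above. The vocabulary contents and the case-insensitive comparison are assumptions made for the illustration.

```python
def oov_rate(keywords, vocabulary):
    """Fraction of extracted keywords that are absent from the vocabulary."""
    keywords = list(keywords)
    if not keywords:
        return 0.0
    vocab = {word.lower() for word in vocabulary}
    missing = [k for k in keywords if k.lower() not in vocab]
    return len(missing) / len(keywords)


# "Find" is not in this (made-up) vocabulary, so one keyword of three is OOV.
rate = oov_rate(["Find", "top", "salary"], {"top", "salary", "employees"})
print(f"{rate:.0%}")
```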

Average Edit Distance

Definition: Computes the average Levenshtein distance (number of edits required to match words) between extracted and ground-truth keywords. Example: If "emploees" is extracted instead of "employees", the edit distance is 1.
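The Levenshtein distance can be computed with a standard dynamic-programming table. The sketch below keeps only one previous row at a time; `average_edit_distance` assumes the extracted and ground-truth keywords have already been paired up, which is a simplification.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def average_edit_distance(pairs):
    """Mean distance over (extracted, ground_truth) keyword pairs."""
    pairs = list(pairs)
    return sum(levenshtein(x, y) for x, y in pairs) / len(pairs)


print(levenshtein("emploees", "employees"))
```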

Cosine Similarity

Definition: Measures vector similarity between extracted and expected keywords. Example: If "salary" and "income" are close in an embedding space, cosine similarity would be high.
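Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below works on plain Python lists; the toy vectors are made up and are not real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional vectors standing in for keyword embeddings.
salary = [0.9, 0.1, 0.3]
income = [0.8, 0.2, 0.4]
print(round(cosine_similarity(salary, income), 3))
```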

Lexical Overlap

Definition: Calculates the overlap between extracted and reference keywords. Example: Extracted keywords: {"employees", "salary"}; Ground truth: {"salary", "wages"}. One of the two reference keywords was extracted, so Overlap = 1/2 (50%).
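Two common ways to normalize the overlap are sketched below: dividing by the size of the reference set (which gives the 50% in the example) and the Jaccard index, which divides by the union. Which one to report is a design choice; the function names are chosen for this example.

```python
def reference_overlap(extracted, reference):
    """Share of reference keywords that were extracted."""
    extracted, reference = set(extracted), set(reference)
    if not reference:
        return 0.0
    return len(extracted & reference) / len(reference)


def jaccard(extracted, reference):
    """Intersection over union of the two keyword sets."""
    extracted, reference = set(extracted), set(reference)
    union = extracted | reference
    if not union:
        return 0.0
    return len(extracted & reference) / len(union)


extracted = {"employees", "salary"}
truth = {"salary", "wages"}
print(reference_overlap(extracted, truth), jaccard(extracted, truth))
```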

2. Retrieval Metrics

Once keywords are extracted, relevant metadata (table/column names) and few-shot examples must be retrieved.

Cosine Similarity

Same as in keyword extraction but applied to retrieved database schema elements.

Cumulative Gain (CG)

Definition: Measures the total relevance of items in a retrieved list, without considering their positions.

Discounted Cumulative Gain (DCG)

Definition: Improves upon CG by accounting for item positions, reducing the contribution of later items.

Normalized Discounted Cumulative Gain (NDCG)

Definition: Normalizes DCG by comparing it to the ideal DCG (IDCG), ensuring fair comparisons.
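CG, DCG, and NDCG build on each other, which a short sketch makes visible. It uses the linear-gain form with a log2 position discount; some tools use 2**rel - 1 as the gain instead. The ideal ranking is taken as the same retrieved list sorted by relevance, which is an assumption of this sketch.

```python
import math


def cumulative_gain(relevances):
    """Sum of relevance scores, ignoring position."""
    return sum(relevances)


def dcg(relevances):
    """Relevance discounted by log2(rank + 1), with ranks starting at 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Graded relevance of retrieved schema elements, in retrieved order.
print(round(ndcg([1, 3, 2]), 3))
```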

Precision@k

Definition: Measures the proportion of relevant results in the top-k retrieved items.

Recall@k

Definition: Measures the proportion of relevant items retrieved out of all possible relevant items.
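Both metrics count the same hits in the top-k list and differ only in the denominator. A minimal sketch follows; the table and column names are made up for the example.

```python
def precision_at_k(retrieved, relevant, k):
    """Relevant items among the top k, divided by k."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Relevant items among the top k, divided by all relevant items."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical retrieved schema elements and the ones the query really needs.
retrieved = ["employees.salary", "employees.name", "departments.id", "orders.total"]
relevant = {"employees.salary", "departments.id", "employees.id"}
print(precision_at_k(retrieved, relevant, 2), recall_at_k(retrieved, relevant, 3))
```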

Mean Reciprocal Rank (MRR)

Definition: Averages, over a set of queries, the reciprocal rank (1/rank) of the first relevant result for each query.

Mean Average Precision (MAP)

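Definition: Computes the average precision for each query (precision measured at each rank where a relevant item appears) and then takes the mean over all queries.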
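MRR and MAP are both means over queries; they differ in what is computed per query. The sketch below assumes each query is given as a (retrieved list, set of relevant items) pair, and that average precision is normalized by the number of relevant items.

```python
def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1 / rank
    return 0.0


def average_precision(retrieved, relevant):
    """Mean of precision values at the ranks where relevant items appear."""
    hits, total = 0, 0.0
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_over_queries(metric, queries):
    """Apply a per-query metric and average the results."""
    return sum(metric(ret, rel) for ret, rel in queries) / len(queries)


queries = [(["a", "b", "c"], {"b"}), (["x", "y", "z"], {"x", "z"})]
mrr = mean_over_queries(reciprocal_rank, queries)
mean_ap = mean_over_queries(average_precision, queries)
print(mrr, round(mean_ap, 3))
```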

3. LLM Evaluation Metrics

LLM-generated SQL queries should be contextually relevant and faithful to the input information.

Faithfulness

Definition: Measures whether the LLM output is supported by the given context without adding extraneous information.

Calculation: Faithfulness is determined by extracting statements from the generated output and checking whether each is verifiable by the provided context. An LLM assists in extracting statements and identifying supported vs. unsupported claims.

Context Relevance

Definition: The contextual relevancy metric measures the quality of your RAG pipeline's retriever by evaluating the overall relevance of the information presented in your retrieval_context for a given input.

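Calculation: Contextual Relevancy = Number of Relevant Statements / Total Number of Statements, where the statements are extracted from the retrieval_context.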

Answer Relevance

Definition: The answer relevancy metric measures the quality of your RAG pipeline's generator by evaluating how relevant the actual_output of your LLM application is compared to the provided input.

Calculation:

The Answer Relevancy Metric first uses an LLM to extract all statements made in the actual output. Then, the same LLM classifies whether each statement is relevant to the input query.

G-Eval (Custom Criteria)

Definition: A custom metric defined based on domain-specific SQL evaluation needs.

Hallucination Rate

Definition: The hallucination metric determines whether your LLM generates factually correct information by comparing the actual_output to the provided context.

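Calculation: Hallucination = Number of Contradicted Contexts / Total Number of Contexts. A lower score is better.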

4. Text-to-SQL Metrics

The final SQL output needs to be syntactically and semantically correct.

Exact Match

Definition: Checks if the generated SQL exactly matches a ground-truth SQL statement.

ROUGE Score

Definition: Measures n-gram overlap between generated and reference SQL queries.

BLEU Score

Definition: Computes n-gram precision between the generated SQL and the reference SQL, with a brevity penalty for outputs that are too short.

Semantic Correctness

Definition: Ensures the SQL query retrieves the expected results even if the structure differs.

Syntax Correctness

Definition: Checks if the SQL query is valid and executable.
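Semantic and syntax correctness can both be checked by running queries against a test database. The sketch below uses Python's built-in sqlite3 module with a made-up employees table; the helper names are chosen for this example, and comparing sorted rows deliberately ignores row order, which would be wrong for queries where ORDER BY matters.

```python
import sqlite3


def make_demo_db():
    """Small in-memory database used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)",
                     [("Ana", 120), ("Bo", 90), ("Cy", 150)])
    return conn


def is_valid_sql(conn, query):
    """Syntax check: SQLite must be able to compile the query."""
    try:
        conn.execute(f"EXPLAIN {query}")
        return True
    except sqlite3.Error:
        return False


def same_results(conn, generated, reference):
    """Semantic check: both queries return the same rows (order ignored)."""
    got = sorted(conn.execute(generated).fetchall())
    want = sorted(conn.execute(reference).fetchall())
    return got == want


conn = make_demo_db()
generated = "SELECT e.name FROM employees AS e WHERE NOT e.salary <= 100"
reference = "SELECT name FROM employees WHERE salary > 100"
print(is_valid_sql(conn, generated), same_results(conn, generated, reference))
```

The two queries differ as text, so Exact Match would fail, yet they return the same rows, which is the case Semantic Correctness is meant to capture.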

Query Execution Time

Definition: Measures how long the generated query takes to execute, impacting performance.

Evaluating a text-to-SQL pipeline requires a comprehensive approach covering keyword extraction, retrieval, LLM responses, and SQL correctness. Using these metrics ensures robustness, efficiency, and accuracy in generating SQL queries from natural language.

References

https://docs.confident-ai.com/docs/metrics-hallucination

https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/

https://www.evidentlyai.com/ranking-metrics/ndcg-metric

Explainable Reinforcement Learning (XRL): A Literature Survey

2025-11-25T20:48:00.000-08:00

Introduction

Reinforcement Learning (RL) has shown significant advancements in solving complex tasks, but the lack of interpretability limits its adoption in high-stakes applications. Explainable Reinforcement Learning (XRL) aims to bridge this gap by providing insights into model behavior, state transitions, reward structures, and policy decisions. This blog explores various approaches to XRL, categorized into model explanation, state explanation, reward explanation, and task explanation.

Model Explanation

Model explanation focuses on generating interpretable policies and decision-making processes.

Method	Explanation Technique
SHAP – Deep Explainer	Utilizes SHAP (SHapley Additive exPlanations) values to explain model outputs by assigning importance to each feature.
Autonomous Policy Explanation	Summarizes policies using structured causal models to elucidate decision-making.
Policy Summarization	Generates concise summaries and allows query-based explanations of policies.
Dot to Dot	Constructs deep symbolic policy representations for better interpretability.
Self-Explainable LMUT	Employs Linear Model U-Trees and decision trees to visualize and explain policies.

Limitations:

Existing methods often require curated datasets and specific use cases.
The trade-off between interpretability and performance is not always well understood.

State Explanation

State explanation aims to provide insights into why an agent takes specific actions given a state.

Method	Explanation Technique
History Trajectory Analysis	Examines past actions and their influences on current decisions.
Object Saliency Maps	Highlights important objects in the environment that affect decision-making.
Future Prediction	Forecasts future states to justify current actions.
Contrastive Explanation via ESP	Offers contrastive justifications for different actions to explain why certain decisions were made over others.

Limitations:

Requires extensive trajectory analysis.
Contextual saliency may not always align with human intuition.

Reward Explanation

Understanding reward structures is essential for interpreting RL behavior.

Method	Explanation Technique
Reward Decomposition	Breaks down rewards into interpretable components to clarify their contributions.
Shapley Q-values	Applies Shapley values for fair credit assignment among agents in multi-agent settings.
COMA Shapley Credit Assignment	Allocates reward contributions in cooperative multi-agent scenarios.
Reward Shaping	Modifies reward signals to enhance learning and interpretability.
ELLA	Enhances reward explanations using causal analysis techniques.
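To make the Shapley-based credit assignment idea more concrete, here is a minimal sketch of exact Shapley values for a tiny cooperative game. It is plain Shapley value computation, not the Shapley Q-value algorithm itself, and the two agents and their reward table are hypothetical numbers chosen for illustration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


# Hypothetical team reward for every subset of a two-agent team.
TEAM_REWARD = {
    frozenset(): 0.0,
    frozenset({"agent_a"}): 4.0,
    frozenset({"agent_b"}): 2.0,
    frozenset({"agent_a", "agent_b"}): 10.0,
}

credit = shapley_values(["agent_a", "agent_b"], TEAM_REWARD.__getitem__)
print(credit)
```

The credits always add up to the full team reward, which is the property that makes Shapley values attractive for fair credit assignment. The exact computation enumerates every ordering, so it only scales to a handful of agents; practical methods approximate it.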

Limitations:

Requires knowledge of underlying reward functions.
Reward shaping may influence learning dynamics in unintended ways.

Task Explanation

Task-level explanations focus on hierarchical decomposition and zero-shot learning.

Method	Explanation Technique
Whole Top-Down Structure	Explains tasks hierarchically to show the breakdown of complex tasks into simpler subtasks.
Zero-shot Composition	Demonstrates how agents generalize to new tasks without prior specific training.
Hierarchical Policy	Structures policies into interpretable sub-policies for clarity.
Simple Task Division	Decomposes complex tasks into simpler, manageable steps.
MARL Explainers (CARE)	Provides explanations for policies in multi-agent reinforcement learning environments.

Limitations:

Hard to generalize across different environments.
Requires well-defined task hierarchies.

Conclusion

Explainable RL is a crucial research area aimed at making RL models more interpretable and trustworthy. While significant progress has been made in model, state, reward, and task explanations, challenges remain in generalizability, dataset dependencies, and balancing interpretability with performance. Future work should focus on standardizing evaluation metrics and improving human-centered explanations.

Building a Simple Chatbot Using LangGraph and LangChain

2025-11-25T20:45:00.000-08:00

Introduction

In this blog, we'll explore how to build a chatbot using LangGraph and LangChain with the Gemma2-9b-it model served through Groq. We'll cover key concepts like defining a State, using reducer functions, and structuring a graph-based workflow for managing chat interactions. By the end, you'll understand how to construct an AI-powered conversational agent efficiently.

Why Use LangGraph?

LangGraph extends LangChain by allowing developers to define stateful workflows using graphs. Instead of executing sequential chains, you can manage complex decision trees and multi-step interactions with a structured state management system.

Key Benefits:

Graph-based execution: Define flexible, modular workflows.
State management: Track conversation history efficiently.
Reducer functions: Append messages instead of overwriting previous state data.

Step 1: Defining the State

When defining a graph, the first step is to define its State. The State acts as the schema for our chatbot’s conversation history.

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]

What’s Happening Here?

State as a TypedDict: The State dictionary holds the conversation’s messages.
Annotated[list, add_messages]: The add_messages function ensures new responses are appended instead of overwriting previous messages.

Step 2: Creating the Chatbot Node

We create a chatbot node using LangGraph's StateGraph, which defines the structure of our conversational workflow.

from langgraph.graph import StateGraph
from langchain_groq import ChatGroq

# Initialize the LLM (Gemma2-9b-it)
llm = ChatGroq(model="gemma2-9b-it")

# Create a graph builder with our State schema
graph_builder = StateGraph(State)

# Define chatbot function
def chatbot(state: State):
    return {"messages": [llm.invoke(state["messages"])]}

# Add the chatbot node to the graph
graph_builder.add_node("chatbot", chatbot)

Explanation:

StateGraph(State): Initializes the workflow with our defined State.
chatbot(state: State): The chatbot function receives the conversation history and calls the LLM for a response.
add_node("chatbot", chatbot): Adds a node to handle conversation updates.

Step 3: Setting Entry and Finish Points

After defining the chatbot node, we specify where the conversation starts and ends.

# Set entry and finish points
graph_builder.set_entry_point("chatbot")
graph_builder.set_finish_point("chatbot")

# Compile the graph
graph = graph_builder.compile()

Why Is This Needed?

set_entry_point("chatbot"): Ensures every session starts with the chatbot.
set_finish_point("chatbot"): Defines where the graph execution ends.
compile(): Converts our graph definition into an executable workflow.

Step 4: Streaming Chatbot Responses

To make the chatbot interactive, we define a function that streams updates from the graph in real-time.

def stream_graph_updates(user_input: str):
    for event in graph.stream({"messages": [{"role": "user", "content": user_input}]}):
        for value in event.values():
            print("Assistant:", value["messages"][-1].content)


while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ["quit", "exit", "q"]:
            print("Goodbye!")
            break

        stream_graph_updates(user_input)
    except Exception:
        # Fallback if input() is not available
        user_input = "What do you know about LangGraph?"
        print("User: " + user_input)
        stream_graph_updates(user_input)
        break

snapshot of chat communication

Explanation:

stream_graph_updates(user_input: str):
- Sends user input to the graph for processing.
- Streams responses back in real-time.
- Extracts and prints the assistant's reply from the message history.
while True:: Runs an interactive chat loop.
Handles graceful exit (quit, exit, q) to end the conversation.
Includes a fallback mechanism to ensure at least one response is generated.

Understanding Reducer Functions in Python

A reducer function in Python is a function that accumulates a sequence of values into a single result.

In our chatbot implementation, the add_messages reducer ensures new responses are added without overwriting existing chat history.

Example of a Simple Reducer Function:

from functools import reduce

def sum_numbers(numbers):
    return reduce(lambda x, y: x + y, numbers)

print(sum_numbers([1, 2, 3, 4]))  # Output: 10

In our chatbot:

The chat state acts as the list.
The add_messages reducer appends new LLM responses to the list.
Unlike direct assignment (state["messages"] = new_messages), it preserves chat history by appending new interactions.

Reducer Function in LangGraph

In LangGraph, reducers help manage state updates efficiently. Here’s how it works:

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]

add_messages ensures new messages are appended, not overwritten.
Without a reducer, assigning state["messages"] = new_messages would replace the old conversation.
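To show the idea without any LangGraph dependency, here is a minimal pure-Python sketch of how a reducer attached through Annotated can drive state updates. The `append_messages` reducer, the `ChatState` schema, and the `apply_update` helper are hypothetical stand-ins written for this illustration; they are not LangGraph's actual implementation.

```python
from typing import Annotated, TypedDict, get_type_hints


def append_messages(existing: list, new: list) -> list:
    """Hypothetical stand-in for add_messages: old messages plus new ones."""
    return existing + new


class ChatState(TypedDict):
    messages: Annotated[list, append_messages]  # reducer attached as metadata
    turn_count: int  # no reducer, so updates overwrite the old value


def apply_update(schema, state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state, using reducers where declared."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in merged:
            # First metadata item is treated as the reducer: reducer(old, new).
            merged[key] = metadata[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": [("user", "Hi")], "turn_count": 1}
state = apply_update(
    ChatState, state, {"messages": [("assistant", "Hello!")], "turn_count": 2}
)
print(state)
```

The `messages` key grows because it has a reducer, while `turn_count` is simply replaced, which mirrors the append-versus-overwrite distinction described above.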

Conclusion

Using LangGraph, we structured a chatbot with a stateful conversation flow, leveraging TypedDict, state reducers, and graph-based execution. This setup ensures efficient state management and a more scalable chatbot architecture.

Model Distillation: Simplifying AI Models Without Losing Accuracy

2025-11-25T20:41:00.000-08:00

Model compression refers to techniques that reduce the size and computational cost of deep learning models while maintaining accuracy. This is crucial for deploying AI on edge devices, mobile applications, and low-power environments.

1. Knowledge Distillation

A small student model is trained to imitate a larger teacher model by learning from its soft labels. This reduces model complexity while preserving performance. Example: BERT → DistilBERT.

2. Quantization

Weights and activations are converted from 32-bit floating point (FP32) to lower precision formats like INT8 or FP16, reducing model size and speeding up inference. Used in mobile AI applications.
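As an illustration of the idea, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. The weight values are made up, and real toolchains typically add per-channel scales, zero points, and calibration on top of this.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0  # one float scale for the whole tensor
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical FP32 weights, for illustration only.
weights = [0.42, -1.27, 0.05, 0.9, -0.33]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is now stored as one small integer plus a shared scale, and the rounding error per weight is at most half a quantization step.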

3. Pruning

Unimportant weights or entire neurons are removed from the network, reducing redundancy. It can be structured (entire neurons, channels, or layers) or unstructured (individual weights). Often combined with quantization.
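A minimal sketch of unstructured magnitude pruning, assuming the common heuristic that the smallest-magnitude weights are the least important. The `magnitude_prune` helper and the weight values are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune weights with the smallest absolute value.
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


# Hypothetical layer weights, for illustration only.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

Zeroed weights only save memory or compute when the storage format or hardware can exploit sparsity, which is one reason pruning is usually followed by fine-tuning and paired with other techniques.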

4. Low-Rank Factorization

Weight matrices are decomposed into smaller matrices using techniques like Singular Value Decomposition (SVD), reducing computation while keeping model performance stable.
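The following sketch shows the SVD-based idea with NumPy on a synthetic weight matrix. The matrix is constructed to be close to rank 4, which is an assumption made so the effect is easy to see; real layers vary in how compressible they are.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 64x32 weight matrix that is approximately rank 4 plus small noise.
W = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32)) + 0.01 * rng.normal(size=(64, 32))


def low_rank_factors(W, rank):
    """Approximate W by A @ B using the top singular values and vectors."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # shape (m, rank), singular values folded in
    B = Vt[:rank, :]            # shape (rank, n)
    return A, B


A, B = low_rank_factors(W, rank=4)
relative_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
params_before = W.size
params_after = A.size + B.size
print(params_before, params_after, relative_error)
```

One layer computing `x @ W` becomes two smaller layers computing `(x @ A) @ B`, so both the parameter count and the multiply-accumulate count drop when the rank is small relative to the matrix dimensions.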

5. Weight Sharing

Similar weights are grouped together and stored efficiently, minimizing redundancy without significantly affecting model accuracy.

Choosing the Right Technique

The best approach depends on the use case. Distillation is great for smaller models, quantization is ideal for faster inference, and pruning helps in deploying lightweight models. A combination of these techniques often yields the best results.

How Model Distillation Works

Train a large, powerful model (Teacher Model).
Use the teacher model’s soft predictions (probabilities instead of hard labels) as additional training data.
Train a smaller model (Student Model) to mimic the teacher’s behavior using these soft predictions.
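To see what "soft predictions" look like, here is a small sketch of a softmax with a temperature parameter T, which is the usual way to soften a teacher's output distribution. The logits are made-up values for a three-class problem.

```python
import math


def softmax(logits, T=1.0):
    """Softmax with temperature: larger T gives a flatter distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical teacher logits for one input.
teacher_logits = [6.0, 2.0, 1.0]
hard = softmax(teacher_logits, T=1.0)
soft = softmax(teacher_logits, T=3.0)
print(hard)
print(soft)
```

At T=1 almost all probability mass sits on the top class; at T=3 the other classes receive noticeably more mass. That extra information about how the teacher ranks the wrong classes is what the student learns from.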

Benefits of Model Distillation

✅ Faster inference – Smaller models run more quickly.

✅ Lower resource usage – Ideal for edge devices and mobile applications.

✅ Improved generalization – Helps reduce overfitting in smaller models.

Example in NLP

A large language model such as BERT can be distilled into a smaller model such as DistilBERT, which retains most of the performance but is much faster.

DistilBERT takes less time at inference and costs less in compute resources, which makes it better suited to real-time applications such as chat.

Types of Distillation based on availability of ground truth

1. Supervised Distillation (with Ground Truth)

The student model learns from both the teacher model’s soft labels and the true labels from the dataset.
This is useful when labeled data is available and helps balance between generalization and accuracy.
Example: Distilling a BERT model into a DistilBERT model while training on a labeled NLP dataset.

2. Self-Distillation / Unsupervised Distillation (No Ground Truth)

The student only learns from the teacher’s soft outputs, without any ground-truth labels.
This is useful when labeled data is scarce or unavailable.
The assumption is that the teacher’s predictions carry useful knowledge even without explicit labels.
Example: Using a large LLaMA model to train a smaller one for chat-based tasks without labeled responses.

Knowledge Distillation in PyTorch

This section walks through an implementation of knowledge distillation in PyTorch, with a focus on key PyTorch concepts like no_grad(), model.eval(), and model.train().

🔑 Key Idea

Instead of training the student model only on hard labels (ground truth), we also train it on the soft labels provided by the teacher model. This helps the student generalize better.

Step 1: Import Required Libraries

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms

🔍 Explanation

torch – Core PyTorch library.
torch.nn – Helps in defining models and layers.
torch.optim – Provides optimization algorithms.
torch.nn.functional (F) – Provides activation functions and loss functions.
torchvision.datasets & transforms – Handles dataset loading and preprocessing.

Step 2: Load & Preprocess the Dataset

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

🔍 Explanation

transforms.ToTensor() – Converts images to tensors.
transforms.Normalize() – Normalizes pixel values.
DataLoader() – Efficiently loads batches of data for training/testing.

Step 3: Define Teacher & Student Models

Teacher Model (Larger)

class TeacherModel(nn.Module):
    def __init__(self):
        super(TeacherModel, self).__init__()
        self.fc1 = nn.Linear(28*28, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 10)
    
    def forward(self, x):
        x = x.view(-1, 28*28)  # Flatten image
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Student Model (Smaller)

class StudentModel(nn.Module):
    def __init__(self):
        super(StudentModel, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = x.view(-1, 28*28)  # Flatten image
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

🔍 Explanation

Fully connected layers (nn.Linear()) – Define layers of the neural network.
F.relu() – ReLU activation function.
x.view(-1, 28*28) – Flattens images from 28×28 to 784 pixels.

Step 4: Define Loss & Optimizer

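teacher = TeacherModel()
student = StudentModel()

criterion = nn.CrossEntropyLoss()
# One optimizer per model: the teacher is trained first, then the student
teacher_optimizer = optim.Adam(teacher.parameters(), lr=0.001)
optimizer = optim.Adam(student.parameters(), lr=0.001)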

🔍 Explanation

CrossEntropyLoss() – Standard classification loss function.
Adam() – Adaptive learning rate optimizer.

Step 5: Train Teacher Model

def train_teacher(model, train_loader, optimizer, criterion, epochs=5):
    model.train()
    for epoch in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

🔍 Key PyTorch Concepts

model.train() – Enables training mode (activates dropout/batch norm).
optimizer.zero_grad() – Clears old gradients.
loss.backward() – Computes gradients for backpropagation.
optimizer.step() – Updates model weights.

Step 6: Train Student Using Knowledge Distillation

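def train_student(teacher, student, train_loader, optimizer, criterion, alpha=0.5, T=3):
    teacher.eval()   # teacher is frozen: inference mode only
    student.train()  # student is the model being trained
    # "batchmean" matches the mathematical definition of KL divergence
    kl_div = nn.KLDivLoss(reduction="batchmean")

    for images, labels in train_loader:
        optimizer.zero_grad()

        with torch.no_grad():
            teacher_outputs = teacher(images)

        student_outputs = student(images)

        hard_loss = criterion(student_outputs, labels)
        # T * T keeps the soft-loss gradients on a scale comparable to the hard loss
        soft_loss = kl_div(F.log_softmax(student_outputs / T, dim=1),
                           F.softmax(teacher_outputs / T, dim=1)) * (T * T)

        loss = alpha * hard_loss + (1 - alpha) * soft_loss
        loss.backward()
        optimizer.step()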

🔍 Key PyTorch Concepts

teacher.eval() – Sets teacher model to inference mode.
torch.no_grad() – Prevents gradient computation (saves memory & speeds up inference).
Soft Targets (Temperature Scaling):
- F.softmax(teacher_outputs / T, dim=1) – Applies softmax with temperature.
- F.log_softmax(student_outputs / T, dim=1) – Log softmax before KL divergence loss.
- KLDivLoss() – Measures the divergence between student & teacher predictions.
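Written out, the combined objective has the general form below, where z_s and z_t are the student and teacher logits, y is the true label, σ is the softmax, α weights the hard loss, and T is the temperature. The T² factor is the scaling suggested by Hinton et al. to keep the soft-term gradients comparable in size to the hard-term gradients; treat it as an optional refinement rather than a required part of the method.

```latex
\mathcal{L}_{\text{student}}
  = \alpha \,\mathrm{CE}\big(\sigma(z_s),\, y\big)
  + (1-\alpha)\, T^{2}\, \mathrm{KL}\big(\sigma(z_t / T) \,\|\, \sigma(z_s / T)\big)
```

The KL term measures how far the student's softened distribution is from the teacher's, which matches the argument order of KLDivLoss: log-probabilities of the student first, probabilities of the teacher second.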

Step 7: Evaluate Student Model

def evaluate(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for images, labels in test_loader:
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    print(f'Accuracy: {100 * correct / total:.2f}%')

🔍 Key PyTorch Concepts

model.eval() – Ensures proper inference behavior.
torch.no_grad() – Disables gradients for efficiency.
torch.max(outputs, 1) – Finds highest probability class.

Conclusion

By using knowledge distillation, we successfully trained a smaller, efficient student model that mimics a larger teacher model. This approach is widely used in real-world applications like deploying lightweight models on mobile devices.

Getting Started with LangSmith and LangChain: A Quick Demo

2025-09-06T09:53:00.000-07:00

Introduction

In this blog, we’ll demonstrate how to set up LangSmith with LangChain to track and trace requests while invoking OpenAI's ChatOpenAI model.

Prerequisites

Before running the code, ensure you have:

An OpenAI API Key
A LangSmith API Key
Installed the required Python libraries:

pip install langchain langchain-openai python-dotenv

Loading Environment Variables

The first step is to load environment variables from a .env file. This helps keep sensitive information secure and separate from the code.

1. Create a .env file:

LANGCHAIN_API_KEY=your_langchain_api_key
LANGCHAIN_PROJECT=your_project_name
OPENAI_API_KEY=your_openai_api_key

2. Load Environment Variables in Python:

import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

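# Verify the required variables are loaded (without printing secret values)
for name in ("LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT", "OPENAI_API_KEY"):
    print(name, "is set" if os.getenv(name) else "is MISSING")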

The load_dotenv() function loads the variables into os.environ, making them accessible throughout the script.

Enabling LangSmith Tracking and Tracing

LangSmith enables tracking of prompts and responses sent to the LLM. We can enable it by setting the following environment variables:

# Setting up LangSmith for tracking
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = os.getenv("LANGCHAIN_PROJECT")

These settings allow LangSmith to trace interactions and group them under a specified project.
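One practical pitfall: assigning `os.environ[...] = os.getenv(...)` raises a TypeError when the variable is missing, because `os.getenv` returns None. A small pre-flight check can make that failure easier to read. The sketch below is one possible approach; the helper names `missing_env_vars` and `masked` are hypothetical and not part of LangChain or LangSmith.

```python
import os

REQUIRED_VARS = ("LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT", "OPENAI_API_KEY")


def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def masked(value, visible=4):
    """Show only the last few characters of a secret, for safe logging."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]


# Example with a made-up environment instead of the real os.environ.
example_env = {"LANGCHAIN_API_KEY": "ls-key-123456", "OPENAI_API_KEY": ""}
print(missing_env_vars(example_env))
print(masked("sk-abcdef1234"))
```

Calling `missing_env_vars()` with no arguments checks the real environment, so it can run right after `load_dotenv()` and before any tracing settings are applied.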

Invoking the Language Model

Now, let’s instantiate the ChatOpenAI model from LangChain and invoke it with a query.

from langchain_openai import ChatOpenAI

# Initialize the model
llm = ChatOpenAI(model="gpt-3.5-turbo")

# Print model details
print(llm)

# Invoke the model with a query
response = llm.invoke("What is Agentic AI?")
print(response)

The above code:

Instantiates OpenAI’s ChatOpenAI model.
Invokes it with the query “What is Agentic AI?”.
Prints the response from the model.

With LangSmith enabled, all requests and responses are logged, allowing you to analyze them via LangSmith’s dashboard.

Langsmith Dashboard

Why Use LangSmith with LangChain?

Improved Debugging: Tracks prompts and responses to help diagnose unexpected outputs.
Better Observability: Logs request-response pairs for auditing and optimization.
Performance Analysis: Helps identify issues like token overuse or slow response times.
Experiment Tracking: Organizes LLM experiments by grouping them under projects.

Conclusion

In this tutorial, we covered how to:

Load environment variables securely.
Enable LangSmith for tracking and tracing.
Invoke OpenAI's ChatOpenAI model using LangChain.

Python: Audio to text conversion

2021-04-04T21:20:00.003-07:00

This is a simple program to convert audio into text. I have used the speech_recognition library to do it.

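import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Speak Anything :")
    audio = r.listen(source)  # record from the microphone until a pause
try:
    # Uses Google's web speech API, so an internet connection is required
    text = r.recognize_google(audio)
    print("You said : {}".format(text))
except sr.UnknownValueError:
    print("Sorry could not recognize what you said")
except sr.RequestError as e:
    print("Could not reach the speech service: {}".format(e))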

Output-
Speak Anything :
You said : internet is required to access the Google API if there is no internet you will not be able to convert voice into text

To install SpeechRecognition-
https://pypi.org/project/SpeechRecognition/

One of the dependencies is PyAudio, which at the time of writing could not be installed directly with pip on Python versions after 3.6.
For Python versions above 3.6, one needs to download a wheel (https://python101.pythonlibrary.org/chapter39_wheels.html) and install PyAudio separately.

https://stackoverflow.com/questions/54998028/how-do-i-install-pyaudio-on-python-3-7

Exhaustive Literature Study on XAI

2020-06-03T00:50:00.004-07:00

All Frameworks of Explainable AI-

Framework	Algorithms used
AIX360	ProtoDash, DIP-VAE, CEM, CEM-MAF, TED, BRCG, GLRM
Alibi	CEM, counterfactual explanations, anchors, counterfactual explanations (prototype-guided)
Dalex
Eli5	LIME, Grad-CAM
H2O	Shapley values, K-LIME, PDP, LOCO, SDT, disparate impact analysis
Google Explainable AI	Integrated gradients, Shapley
MS Azure explainability	SHAP, mimic, HAN
Captum	IG, DeepLift (for PyTorch)
Skater	LIME, PDP
Lucid (TensorFlow)	Set of tools (NN explainability)
InterpretML	Surrogate model building

All Algorithms of Explainable AI-

Algorithm	Explainability	Type of Data	Mechanism	Links
ACE( Automatic concept based explanation)	Global (G)	Any		https://arxiv.org/pdf/1902.03129.pdf
Anchor	Local (L)	Structured*	Optimization	https://homes.cs.washington.edu/~marcotcr/aaai18.pdf
Autoencoder	L	Any
CAM / GradCAM / GradCAM++	L	Image	Sensitivity
Permutation Importance	G	Structured		https://scikit-learn.org/dev/modules/permutation_importance.html
Decision Trees	L / G	Structured
DeepLift	L / G	Any	Decomposition	https://arxiv.org/pdf/1704.02685.pdf
GAM/GA2M	L / G	Structured
GEF( Generative Explanation framework)	L	Text		https://arxiv.org/pdf/1811.00196.pdf ( no code available)
ICE	L / G	Structured	Visualization	http://savvastjortjoglou.com/intrepretable-machine-learning-nfl-combine.html
Integrated Gradients	L	Any	Invariance	https://arxiv.org/pdf/1703.01365.pdf
LIME	L	Any	Optimization	https://arxiv.org/pdf/1602.04938.pdf
LRP	L / G	Any	Decomposition	https://arxiv.org/pdf/1604.00825.pdf
LOCO	L	Structured*		https://arxiv.org/pdf/1604.04173.pdf
LSTMVis	G	Text		https://arxiv.org/pdf/1811.00196.pdf
MMD-critic			Prototypes and Criticisms	https://people.csail.mit.edu/beenkim/papers/KIM2016NIPS_MMD.pdf
PDP	G	Structured	Visualization
PCA		Any	Correlation
SHAP	L / G	Any		https://arxiv.org/pdf/1705.07874.pdf
TCAV		Any	Sensitivity	https://arxiv.org/pdf/1711.11279.pdf
treeinterpreter	L / G	Structured	Optimization
T-SNE		Any	Clustering
XRAI	L	Image		https://arxiv.org/pdf/1906.02825.pdf

Here is the link if you want to see how to use some of the above algorithms on the Iris dataset- XAI on iris dataset

Stripping the Iris Dataset with 6 Explainability Algorithms

2020-05-27T21:35:00.010-07:00

Explainable AI (XAI) refers to methods and techniques in the application of AI such that the results of the solution can be understood by human experts. It contrasts with the concept of the 'black box' in machine learning, where even the designers cannot explain why the AI arrived at a specific decision. XAI is an implementation of the social right to explanation.

Here I have taken the iris dataset to build a Random Forest. My focus is on providing data explainability, model explainability (aka global explainability), and prediction explainability (aka local explainability).

Here is the Iris dataset-

## loading data set

import numpy as np

import pandas as pd

from sklearn import datasets

iris = datasets.load_iris()

X = iris.data

Y = iris.target

X_df = pd.DataFrame(X, columns=iris.feature_names)
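The snippets in the following sections refer to a fitted `model` and to `X_test`, which are not defined in this post. A minimal sketch of how they could be created is shown here; the 70/30 stratified split, the 100 trees, and the fixed random seed are assumptions for illustration, not the settings used for the results described below.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, Y = iris.data, iris.target

# Assumed split: 70% train, 30% test, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42, stratify=Y
)

# Assumed hyperparameters; any fitted tree ensemble would work for the explainers.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

With `model` and `X_test` in place, the explainers below can be run as written.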

1) Data explainability through IBM AIX 360's Protodash- (https://arxiv.org/abs/1707.01212)-

This algorithm provides prototypes (samples) of the original dataset, so a handful of data points can represent an entire dataset, even one with a million observations.

Here I want only 10 data points as representatives of the entire dataset.

from aix360.algorithms.protodash import ProtodashExplainer, get_Gaussian_Data

explainer = ProtodashExplainer()

(W, S, _) = explainer.explain(X, X, m=10)

# Display the prototypes along with their computed weights

inc_prototypes = X_df.iloc[S, :].copy()

# Compute normalized importance weights for prototypes

inc_prototypes["Weights of Prototypes"] = np.around(W/np.sum(W), 2)

inc_prototypes

The data is represented using 10 observations.
These 10 data points come from the input data itself; see the row index.
The weight of a prototype represents the share of similar data points in the entire data. These weights should sum to 1.

-----------------------------------------------------------------------------------------------

2) Global Explainability through SHAP's TreeExplainer- (https://arxiv.org/pdf/1705.07874.pdf)

SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.

Global explanation refers to explaining overall feature importance in classification. It gives the most important variables in model building.

import shap

explainer = shap.TreeExplainer(model, data=X_df)

shap_values = explainer.shap_values(X_df, check_additivity=False)

shap.summary_plot(shap_values, X_df, plot_type="bar")

Colors in the bars represent the relative contribution to classifying each of the three classes.
Overall, petal length and width are more important than sepal length and width.
This is the kind of variable importance we get from many algorithms.

-----------------------------------------------------------------------------------------------

3) Global explainability through IBM AIX360 LRR (Logistic Rule Regression)- (Wei et al., 2019)

Logistic Rule Regression is a directly interpretable supervised learning method that performs logistic regression on rule-based features.

# Generalized Linear Rule Models

from aix360.algorithms.rbm import FeatureBinarizer

from aix360.algorithms.rbm import LogisticRuleRegression

from sklearn.metrics import accuracy_score

fb = FeatureBinarizer(negations=True, returnOrd=True)

dfTrain, dfTrainStd = fb.fit_transform(X_df)

lrr = LogisticRuleRegression(lambda0=0.005, lambda1=0.001, useOrd=True, maxSolverIter=10000)

lrr.fit(dfTrain, Y, dfTrainStd)

print('Training accuracy:', accuracy_score(Y, lrr.predict(dfTrain, dfTrainStd)))

print('where z is a linear combination of the following rules/numerical features:')

lrr.explain()

So LRR builds a surrogate model and provides the importance of rules created from the features.
This matters because the overall importance of a variable can hide the fact that its importance varies across different ranges of that same variable.
In the above example, sepal width <= 3 is a better classifier than sepal width <= 3.2.

-----------------------------------------------------------------------------------------------

4) Local explanation through LIME Tabular- https://arxiv.org/pdf/1602.04938.pdf

As we saw with LRR, a variable may be important in classifying most of the instances but not all of them (instance = data point).

To see the feature importance for a particular prediction, we should look at local explainability.

A lot of work has already been done in this area. I am presenting a few methods-

import lime.lime_tabular

explainer = lime.lime_tabular.LimeTabularExplainer(X, feature_names=X_df.columns, class_names=iris.target_names, discretize_continuous=True)

exp = explainer.explain_instance(X_df.iloc[1,:], model.predict_proba, num_features=10, top_labels=1)

exp.show_in_notebook(show_table=True, show_all=False)

Predicted probabilities are the output class probabilities.
The horizontal bar plot shows the features contributing to the output class, which is setosa in the above example. The coefficients .49 and .41 represent the relative importance of the features.
The feature values for that instance are given in the third table (the feature-value table).
The explanation is for the data point X_df.iloc[1,:].

-----------------------------------------------------------------------------------------------

5) Local explanation through treeinterpreter- (https://pypi.org/project/treeinterpreter/)

TreeInterpreter decomposes the predictions into the bias term (which is just the trainset mean) and individual feature contributions, so one can see which features contributed to the difference and by how much.

[line 2 in below code]

from treeinterpreter import treeinterpreter as ti

prediction, bias, contributions = ti.predict(model, X_test)

# converting 3 d to 2 d, 1 instance at a time-

contributions= contributions[0]

pd_contribution= pd.DataFrame(contributions)

pd_contribution.columns= iris.target_names

pd_contribution.index= iris.feature_names

pd_contribution['Overall Importance']=abs(pd_contribution['setosa'])+abs(pd_contribution['versicolor'])+abs(pd_contribution['virginica'])

pd_contribution.sort_values('Overall Importance', ascending=False, inplace=True)

print(pd_contribution)

The table shows the importance of features in classifying all the classes for one data point (the first data point here, since I have taken contributions = contributions[0]).
Overall importance shows how much a feature contributes across all the classes.

-----------------------------------------------------------------------------------------------

6) Local explanation through SHAP Kernel Explainer-

https://github.com/slundberg/shap/blob/master/README.md

SHAP has 7 different explainability algorithms, and Kernel SHAP is one of them. It uses a specially weighted local linear regression to estimate SHAP values for any model, so it is model agnostic (it works for any black-box model).

import shap

explainer = shap.KernelExplainer(model.predict_proba, X, link="logit")

x_test_instance= X[149,:]

shap_values = explainer.shap_values(X, nsamples=100)

shap_values[2][149,:]

shap.force_plot(explainer.expected_value[2], shap_values[2][149,:], x_test_instance, iris.feature_names,

link="logit")

The above explanation shows three features, each contributing to push the model output from the base value (.333), which is the average model output over the training dataset we passed, towards zero.
Features pushing the class label higher are shown in red.

-----------------------------------------------------------------------------------------------

There are many frameworks available for explainability, such as-

AIX360	Alibi	Dalex	Eli5
H2O	Google Explainable AI	Skater	Lucid (TensorFlow)
Captum	MS Azure explainability	InterpretML	LIME/SHAP

Do try these and kill the dataset next time.

Another article on what to include from all of the above in any ML model-

Explainability in Data Science:- Data, Model & Prediction

Forecasting total deaths from Coronavirus

2020-04-04T02:46:00.003-07:00

Almost 60,000 people have died of Coronavirus, and we have not yet reached the peak of the expected distribution of deaths. The expected distribution is a roughly bell-shaped curve, like the plot of China's deaths from Coronavirus-

Here I have taken a machine-learning based approach to forecast total deaths from coronavirus.

Python (3.6) code for the analysis-

Step 1 - Loading the required dataset (Johns Hopkins dataset from GitHub)

import pandas as pd

def load_data():
    # Global time series of cumulative deaths, one column per date
    url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
    corona_data = pd.read_csv(url, sep=',')
    return corona_data

The data is updated daily, so one gets information on cases up to the current date.

Step 2 data preprocessing- visit github page-

Pre Processing code

Step 3 forecasting deaths-

from statsmodels.tsa.holtwinters import ExponentialSmoothing  # import needed by both fits

Multiplicative ETS model-
fit2_mul = ExponentialSmoothing(plot_dealts1, seasonal_periods=None, trend='mul', seasonal=None).fit(use_boxcox=True)

Additive ETS model-
fit2_add = ExponentialSmoothing(plot_dealts1, seasonal_periods=None, trend='add', seasonal=None).fit(use_boxcox=True)

Step 4- This is not an exact time-series analysis, because the actual distribution of deaths is a bell-shaped curve. We need to assume that the peak will come in x days. We then calculate the number of deaths up to the peak and multiply it by 2 to get total deaths, since a symmetric bell curve has half of its area on each side of the peak.
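
The doubling rule of Step 4 can be sketched in a few lines. The function name `estimate_total_deaths` and the forecast numbers below are hypothetical and only illustrate the arithmetic; in the analysis the cumulative forecast would come from the fitted ETS model.

```python
def estimate_total_deaths(cumulative_forecast, days_to_peak):
    """Total deaths under the symmetric bell-curve assumption:
    twice the cumulative count reached on the assumed peak day."""
    if not 1 <= days_to_peak <= len(cumulative_forecast):
        raise ValueError("days_to_peak must fall inside the forecast horizon")
    deaths_at_peak = cumulative_forecast[days_to_peak - 1]
    return 2 * deaths_at_peak

# hypothetical cumulative-death forecast for the next 5 days (not the article's data)
forecast = [61000, 66000, 72000, 79000, 87000]
total_if_peak_in_3_days = estimate_total_deaths(forecast, 3)
print(total_if_peak_in_3_days)  # 144000
```

The result is very sensitive to the assumed peak day, which is why the totals are reported for several values of x.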

The plot for this step shows the expected number of deaths if Coronavirus takes 15 more days to reach its peak (additive ETS model).

Total deaths with the additive ETS model, assuming it takes 10 days to reach the peak- 162996

Total deaths with the additive ETS model, assuming it takes 15 days to reach the peak- 280068

Total deaths with the additive ETS model, assuming it takes 20 days to reach the peak- 404939

Total deaths with the multiplicative ETS model, assuming it takes 10 days to reach the peak- 185368

Total deaths with the multiplicative ETS model, assuming it takes 15 days to reach the peak- 335326

Total deaths with the multiplicative ETS model, assuming it takes 20 days to reach the peak- 614004

In the optimistic scenario, if we reach the peak in just 5 days, total deaths would be 99408 with the additive and 103785 with the multiplicative model. The total death trend will be-

Complete code is present at Python Code Corona Virus Deaths

To know about Explainable AI- X_AI

Explainability in Data Science:- Data, Model & Prediction

2019-12-04T19:47:00.000-08:00

XAI (Explainable AI) is grabbing the limelight in machine learning. How can we be sure that an image classification algorithm is learning faces and not the background? A customer wants to know why a loan was disapproved. A globally important variable might not be responsible or important for an individual prediction. Here XAI comes to the rescue-

We have taken data from classification_data

This has some sensor values and an output class.

A) Data Explainability- the basic understanding required from the data perspective.

1) Identify missing values, collinear features, feature interactions, zero-importance features, low-importance features and single-value features; then handle missing values and remove/handle features accordingly.

2) Missing values- there are none, according to the data description.

3) No strong correlation between variables- this can be seen from the correlation plots.

4) Feature interaction- tree-based models (CART) approximate feature interactions on their own.

5) Zero importance, low importance, single feature value- handled through RFE and by the models (RF, XGBoost) themselves.

6) The distribution and sampling of both the class and the features are also examined, as the choice of model depends on the data distribution. Data with a lot of categorical variables is likely to be more suitable for tree-based models.

7) A box plot by itself can identify important features for classification. We can see that sensors 3, 8 and 6 look important, whereas 5 and 7 may not have good predictive power.

B) Other Approaches- Feature selection/engineering-

1) Univariate feature selection using the chi-square test (select k best).

2) Recursive Feature Elimination (RFE)- selects n specific features based on the underlying model used. (used)

3) PCA- reduces correlated features by linear transformation (not needed)

4) Autoencoders- non-linear transformation of features if needed (it would be overkill here)

5) Feature importance from Random Forest, DT (in terms of rules) and other tree ensemble models like CatBoost and XGBoost- used in our scenario
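
The first two approaches in the list can be sketched with scikit-learn. This is only an illustration under assumptions: the iris dataset stands in for the sensor data (which is not reproduced here), and logistic regression is an arbitrary choice of underlying model for RFE.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

# iris is a stand-in dataset; chi2 requires non-negative feature values
X, y = load_iris(return_X_y=True)

# 1) univariate selection: keep the 2 features with the highest chi-square score
selector = SelectKBest(score_func=chi2, k=2)
X_best = selector.fit_transform(X, y)
chi2_columns = selector.get_support(indices=True).tolist()
print("chi-square keeps columns:", chi2_columns)

# 2) recursive feature elimination: repeatedly drop the weakest feature of the model
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)
rfe_columns = [i for i, keep in enumerate(rfe.support_) if keep]
print("RFE keeps columns:", rfe_columns)
```

Univariate selection scores every feature independently of the others, while RFE depends on the model it wraps, so the two methods do not have to agree on the subset they keep.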

C) Feature Importance on sensor data (Global)- In practice I take feature importance from the domain/business people. In our scenario, sensor 7 (one of the least important features) might be the electric current in a steel mixture plant, and to see the impact of current on anomalies/faults it has to be sampled at a higher rate (micro/milliseconds), unlike temperature. We would then be missing an important feature only because the data collection rate is not right. Such understanding can only come from domain experts, so business understanding and ML are equally important for feature engineering.

There are white-box models like DT and Random Forest that give feature importance from the model itself. In our case we took the coefficients of logistic regression in the beginning (see the comparison of all the algorithms at github- link). Here we are relying on the models that have maximum accuracy- RF and XGBoost.

Overall we can say that features 8, 6, 4, 0, 1 and 3 look important for the classification model. Feature 7 seems to have no importance in XGBoost, as its classification power is captured by other features. This importance of features was visible in the box plot also.

Recursive Feature Elimination is useful for selecting a subset of features, as it tells us the top features to keep for modeling.

D) Feature Importance on sensor data ( Local)-

With the advancement of ML and deep learning, global importance alone is not enough. Business people and data scientists are looking for local explanations too. In our analysis, we have used the IBM AIX360 framework to get the importance of rules on the features (importance of a feature based on the values of the feature and the output value). The packages/frameworks one can choose from are-

AIX360	https://github.com/IBM/AIX360/blob/master/examples/tutorials/HELOC.ipynb
Skater	https://github.com/oracle/Skater
ELI5	https://github.com/TeamHG-Memex/eli5
Alibi	https://github.com/SeldonIO/alibi
H2O	https://www.h2o.ai/products-dai-mli/
MS Azure Explainability	https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-machine-learning-interpretability
DALEX	https://github.com/ModelOriented/DrWhy

The above image shows that feature 8 is the most important overall, but for specific predictions a subset of feature 6 values seems more important. We can get good insights from such rules, e.g. sensor 6 in the 1st and 4th quadrants has less importance compared to its very strong importance in quadrants 2 and 3. If we know the exact feature names we can get a lot of valuable insights.

E) SHAP Values explanation-

In the above plot, 10 data points from class 1 are selected. We can clearly see that for these data points feature 6 is more important, and the importance of feature 8 changes with the values of the features. At the same time, features 1, 2, 3, 5 and 7 are of almost no use for the prediction. (One series represents one observation.)

The above plot has 10 observations from class -1. It shows that for class -1, feature 0 is also important for a few predictions, and instead of 8 and 6, features 7 and 9 are more important.

Such findings are more important when we have scenarios like multiple fault prediction or anomaly classification in industrial applications. Once we know the actual names of the signals we will get very insightful information.

The above plot shows that signal 6 is useful in most predictions, but there are many instances where it has no importance for the predicted value. Also, feature 6 has more classifying power for class 1 than for class -1. A similar analysis can be done on the other features for a better and more exhaustive understanding of feature importance.

Detailed code is present on Github- link to github code

Automation of customer-care tickets resolution using NLP

2019-11-23T21:25:00.000-08:00

When we call customer care, they keep connecting us to different departments like the technical department, the billing department, etc. What if they could suggest some quick fixes, even though they are not experts in providing the solution?

Keeping this in mind, let's build a simple solution recommendation system for internet service providers based on the cosine similarity of earlier questions with the present question. The higher the similarity, the more likely the solution is the same. The assumption is that questions with similar titles and content will have similar solutions.

The following steps are required to implement the solution in Python 3.0+-

1) Load the NLP libraries.

2) Create dummy data with some questions and answers.

3) Create a function that calculates the cosine similarity of a new ticket with all existing ticket titles.

4) Show the solution/answer of the ticket that has the maximum similarity with the present ticket.

Step 1 - import required libraries-

import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('punkt') # tokenizer models used by word_tokenize to split text into tokens
nltk.download('stopwords') # stop words such as articles, prepositions and pronouns

Step 2 -create a dummy dataset-

question_ans_data= pd.DataFrame()

question_ans_data['question']= ['there is no internet','no ineternet since last 2 days','net speed is slow','wrong bill','too much charge']

question_ans_data['answer']= ['restart router, check if lights blinking','technician will be sent, check lights, restart router','technician will be sent','will get back to you','will get back to you']

have a look at data-

Step 3- create a function (set_con) to do text pre-processing and calculate the cosine similarity between 2 strings-

def set_con(X, Y):
    X_list = word_tokenize(X)  # convert strings into word tokens
    Y_list = word_tokenize(Y)
    sw = stopwords.words('english')
    l1 = []; l2 = []
    X_set = {w for w in X_list if w not in sw}  # remove stop words
    Y_set = {w for w in Y_list if w not in sw}

    # union of the keywords of both strings, used as the vector space
    # (cosine similarity can also be calculated with sklearn.metrics)
    rvector = X_set.union(Y_set)
    for w in rvector:
        l1.append(1 if w in X_set else 0)  # binary vector for X
        l2.append(1 if w in Y_set else 0)  # binary vector for Y

    # cosine formula: dot product divided by the product of the vector norms
    c = 0
    for i in range(len(rvector)):
        c += l1[i] * l2[i]
    denominator = float((sum(l1) * sum(l2)) ** 0.5)
    if denominator == 0:  # a string with no keywords left after stop-word removal
        return 0.0
    cosine = c / denominator
    return cosine
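
As the comment in set_con notes, the same similarity can be computed with scikit-learn. The following is a sketch under assumptions, not a drop-in replacement: `stop_words='english'` uses scikit-learn's built-in stop-word list, which differs from the NLTK list, and the default tokenizer drops single-character tokens, so individual scores may not match set_con exactly.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

questions = ['there is no internet', 'no ineternet since last 2 days',
             'net speed is slow', 'wrong bill', 'too much charge']
input_ticket = 'broadband internet not working'

# binary=True gives 0/1 keyword vectors, like the sets used in set_con;
# fitting on questions plus ticket mirrors the union of keywords
vectorizer = CountVectorizer(binary=True, stop_words='english')
vectorizer.fit(questions + [input_ticket])
question_vectors = vectorizer.transform(questions)
ticket_vector = vectorizer.transform([input_ticket])

# one similarity score per existing question
similarities = cosine_similarity(ticket_vector, question_vectors)[0]
best_match = int(similarities.argmax())
print(questions[best_match], similarities[best_match])
```

With many tickets this vectorized form avoids re-tokenizing every existing question for each new ticket, because the question matrix can be built once and reused.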

Step 4 create a subject/title of input ticket as a string-

input_ticket= 'broadband internet not working' # input ticket

Step 5- find the most similar ticket title(s) among the existing tickets-

question_ans_data['cosine_similiarity']= [set_con(x ,input_ticket) for x in question_ans_data['question']] # calculating cosine similarity with existing tickets

sorted_main_df=question_ans_data.sort_values(by=['cosine_similiarity'], ascending=False)

output_dataset= sorted_main_df[sorted_main_df['cosine_similiarity'] == max(sorted_main_df['cosine_similiarity'])] # most similar tickets based on similarity of questions

output_dataset

So for the input ticket 'broadband internet not working', the closest existing question is 'there is no internet', and the suggested solution is to restart the router and check whether the lights are blinking. Given a large dataset with many tickets and possible solutions, this can be a great help for customer care executives.

product recommendation approach in retail industry-

Product Recommendation using MBA

Sentiment Analysis using NLTK and Sklearn in Python

2019-09-25T19:54:00.000-07:00

Data can be downloaded from -

http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz

Step 1 - loading required libraries

import os # to check working path
from sklearn.datasets import load_files # load_files automatically labels classes when input data is present in different folders
import re # for regular expressions
import nltk # for nlp
from nltk.stem import WordNetLemmatizer # lemmatizer backed by the WordNet dataset
nltk.download('wordnet')
from sklearn.feature_extraction.text import TfidfVectorizer # get tf-idf values
from sklearn.model_selection import train_test_split # to split test and train dataset
from sklearn.ensemble import RandomForestClassifier # for classification
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import pickle # to save model

Step 2 - loading data

movie_data = load_files("C:\\D\\Learning\\Sentiment Analysis usinf sklearn\\txt_sentoken")
X,y=movie_data.data, movie_data.target

Step 3- data preprocessing and converting into tf-idf values (documents are converted into an array holding the tf-idf value of every word in every document)

stemmer = WordNetLemmatizer()  # created once, outside the loop
new_X = []
for data in X:
    data1 = str(data)
    data2 = re.sub(r'[^\w]', " ", data1)  # replaces all special characters with spaces
    data3 = re.sub(r'[\s+\W+\s]', " ", data2)  # replaces any remaining whitespace or non-word character with a space
    data4 = re.sub(r'[ ][ ]+', " ", data3)  # collapses multiple spaces
    data5 = re.sub(r'^b\s+', '', data4)  # removes the leading b left by the bytes prefix
    document = re.sub(r'\s+[a-zA-Z]\s+', ' ', data5)  # removes single-letter words
    document_splitted = document.lower().split()  # lowercase, then split: lemmatization is done word by word
    stemmed_doc = [stemmer.lemmatize(word) for word in document_splitted]
    stemmed_str = " ".join(stemmed_doc)  # converting list back to str
    new_X.append(stemmed_str)  # creating list of documents
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(new_X)
X_arr = X.toarray()

Step 4- Getting train and test set and fitting classification

X_train, X_test, y_train, y_test = train_test_split(X_arr, y, test_size= .2)

classifier = RandomForestClassifier(n_estimators=1000, random_state=0)

classifier.fit(X_train, y_train)

Step 5- model Evaluation-

# model evaluation on train data
y_predicted= classifier.predict(X_train)
cf= confusion_matrix(y_train, y_predicted)

print(classification_report(y_train, y_predicted))
# model evaluation on test data
y_test_predicted= classifier.predict(X_test)
print(confusion_matrix(y_test, y_test_predicted))

print(classification_report(y_test, y_test_predicted))

Step 6- storing and loading model again-

with open('text_classifier', 'wb') as picklefile:
    pickle.dump(classifier, picklefile)
with open('text_classifier', 'rb') as mfile:
    model = pickle.load(mfile)

Step 7- test on new document

with open("nerw_review.txt", "r") as file1:
    data_file = file1.readlines()

X1 = vectorizer.transform(data_file)  # vectorizer.transform converts the new document into tf-idf using the fitted vocabulary
predict_review = classifier.predict(X1)

print(predict_review)
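
Step 7 only works while the fitted vectorizer is still in memory, because the pickle file from Step 6 contains the classifier alone. One possible way around this, sketched below under assumptions, is to wrap both steps in a scikit-learn Pipeline and serialize the pipeline as a whole. The tiny corpus, the labels and the name `text_clf` are made up for illustration and are not the movie review data.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# tiny hypothetical corpus, only to make the sketch runnable
docs = ["a wonderful and moving film", "great acting and a great story",
        "boring plot and terrible acting", "a dull and awful movie"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])
text_clf.fit(docs, labels)

# serializing the pipeline stores the fitted vocabulary together with the classifier
restored = pickle.loads(pickle.dumps(text_clf))
new_review = ["terrible and boring movie"]
print(restored.predict(new_review))
```

The restored object accepts raw text directly, so a new session needs no separate vectorizer to score a review.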

Deep Learning with H2O in Python

2019-07-09T04:36:00.002-07:00

H2O.ai is focused on bringing AI to businesses through software. Its flagship product is H2O, the leading open source platform that makes it easy for financial services, insurance companies, and healthcare companies to deploy AI and deep learning to solve complex problems. More than 9,000 organizations and 80,000+ data scientists depend on H2O for critical applications like predictive maintenance and operational intelligence. The company – which was recently named to the CB Insights AI 100 – is used by 169 Fortune 500 enterprises, including 8 of the world’s 10 largest banks, 7 of the 10 largest insurance companies, and 4 of the top 10 healthcare companies. Notable customers include Capital One, Progressive Insurance, Transamerica, Comcast, Nielsen Catalina Solutions, Macy’s, Walgreens, and Kaiser Permanente.

Using in-memory compression, H2O handles billions of data rows in-memory, even with a small cluster. To make it easier for non-engineers to create complete analytic workflows, H2O’s platform includes interfaces for R, Python, Scala, Java, JSON, and CoffeeScript/JavaScript, as well as a built-in web interface, Flow. H2O is designed to run in standalone mode, on Hadoop, or within a Spark Cluster, and typically deploys within minutes.

H2O includes many common machine learning algorithms, such as generalized linear modeling (linear regression, logistic regression, etc.), Naïve Bayes, principal components analysis, k-means clustering, and word2vec. H2O implements best-in-class algorithms at scale, such as distributed random forest, gradient boosting, and deep learning. H2O also includes a Stacked Ensembles method, which finds the optimal combination of a collection of prediction algorithms using a process known as "stacking." With H2O, customers can build thousands of models and compare the results to get the best predictions.

Here is an example of using H2O deep learning in Python-

Step 1- First of all, we need to install the H2O package in Python.

On the Anaconda prompt:
pip install h2o

Step 2- Initialize and start the cluster -

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

Step 3- load train and test data set-

train = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

Step 4- Creating test and train data set using split-

splits = train.split_frame(ratios=[0.75], seed=1234)

Step 5- Configuring the model-

model = H2ODeepLearningEstimator(distribution = "AUTO",activation = "RectifierWithDropout",hidden = [32,32],input_dropout_ratio = 0.2,l1 = 1e-5,epochs = 10)

Step 6- train (fit the model)-

# x is the list of predictor columns, y is the name of the response column
model.train(x=["petal_len"], y="sepal_len", training_frame=splits[0])

Step 7- predicting using the trained model and creating a new column in the test data-

splits[1]['predicted_sepal_len'] = model.predict(splits[1])

One can compare the sepal_len (actual) and predicted_sepal_len (forecasted) values.

How to survive in data science and the first steps

2019-07-04T07:26:00.001-07:00

A few years ago I read this article, and it made sense in 2012-2017-

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

Gone are the days when IT organizations were looking for a core data science profile, which involves doing research and completing the POC. There is a lot of hype around data science, and in the very near future this profile will become obsolete. People in the data science profile know it. It looks fancy to other IT profiles because a lot of material is pushed out by training institutes and start-ups. Current demand is short term (organizations are in the exploration phase, working out what to do with their data and delivering POCs). Most organizations are now looking for the ML engineer profile, which is a combination of 3 profiles: data engineer, data scientist and someone who can deploy to production (in the cloud most of the time).

The sooner the better. So-called data scientists should move into data engineering and embrace the cloud. Here I have given a small introduction on how to start working on Azure Databricks so that people like me can become better hiring material.

Step-1 Create Azure trial account, Databricks Workspace and launch the workspace

https://docs.azuredatabricks.net/getting-started/try-databricks.html

Step-2 Data bricks quick start-

https://docs.azuredatabricks.net/getting-started/quick-start.html#quick-start

Step-3 Why not try Keras-

https://keras.io/

A) The Sequential model is a data structure provided by Keras. One adds layers to it according to the NN model-

from keras.models import Sequential

model = Sequential()

B) add the layers according to structure of neural network-

from keras.layers import Dense

model.add(Dense(units=4, activation='relu', input_dim=2))

model.add(Dense(units=1, activation='linear'))

C) configure the model by passing arguments-

model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mae', 'mape'])

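D) creating X and Y values-

import numpy as np
import pandas as pd

x1 = np.random.randn(10000, 2)

dataframe_X = pd.DataFrame(x1)

dataframe_X.columns = ['x1','x2']

Y1 = np.random.randn(10000, 1)

# keep the last 2000 rows aside for testing
x_train, x_test = x1[:8000], x1[8000:]
y_train, y_test = Y1[:8000], Y1[8000:]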

E) fitting the model by calling model.fit

model.fit(x_train, y_train, epochs=5, batch_size=32)

F) model evaluation-

evaluation_metrics= model.evaluate(x_test, y_test)

G) use model for prediction

predicted_value = model.predict(dataframe_X)

H) testing on the same data-

predicted_vals = model.predict(x_test, batch_size=32)

Although this code is written in Python, we have now run our first ML program on Databricks. One should start replacing Python commands with PySpark commands and make it a habit over time.

In production, this notebook will read run-time data through a scheduled job (how to schedule a job in Databricks), and from the notebook one can save the predicted values to any database, from where they can be read by a visualization tool or another application.

Data scientists should come out of the pure research, statistics, R/Python profile to stay relevant in the IT industry. Remember the golden words by Charles Darwin-

Easiest and most effective way of detecting abnormalities/outliers in time-series data

2018-12-14T02:32:00.000-08:00

We have read many blogs on various anomaly detection algorithms. Many times, we don't need any algorithm to detect abnormality in a system.

Different machine learning approaches to detect abnormality in system .

Data scientists use everything from multi-angle PCA to auto-encoders to detect abnormality in time-series data. There are other complex techniques like ABOD, used for high-dimensional data, and CBOF, used when density-based algorithms fail. These techniques are effective only if you know the properties of the expected abnormality in the system.

The most effective approach, as mentioned in Anomaly detection approaches, is to build an expected rule from the variables involved; any deviation from this rule is an indication of abnormality in the time series. One can use an auto-encoder, PCA or regression to build such rules. We are using regression so that the audience understands the concept and doesn't get bogged down by the related algorithms.

We can take any home appliance as an example, like an electric fan. Let's say we know the temperature of the fan's motor and the current going into it.

from sklearn import datasets, linear_model
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# take data values from normal running scenario, hopefully there is no issue in motor now. Generally this is the time when fan is just installed -

# creating a dummy data
data = [[352,88],[350,90],[350,89],[400,95],[400,94], [390,92], [400,93], [352,88],[350,90],[352,91],[400,95],[400,94], [390,92], [400,93],[350,90],[350,89],[400,95]]

df = pd.DataFrame(data,columns=['Current','Temp'],dtype=float)

# taking independent ( current) and dependent variable ( temperature) for relation ( to build using regression )
X= df['Current']
X1= X.values.reshape(X.size,1)
Y= df['Temp']
Y1 = Y.values.reshape(Y.size,1)

# fitting the regression model
regr = linear_model.LinearRegression()
regr.fit(X1, Y1)
predictions =regr.predict(X1)

# plotting error and analyzing it
error =Y1- predictions
plt.plot(error)

The plot shows that the errors lie randomly around y=0 and stay within about ±1.5, which seems a good fit. Thus we get a relation between the current and the temperature of the motor. If we know the actual current, we can predict the temperature with some accuracy. Now the concept is: if the actual temperature is far higher than it should be (as predicted from the current values), then there might be some thermal abnormality in the motor. Let's extend our example further-

Taking run-time data (run-time values of current and temperature) from the fan now:

test_X= np.array([400,380,370,355, 370,370,350, 360, 355,352,350,350,400,400,390,400,400,380,400,380,390,400,350,350])
test_Y= np.array([96,94,93, 92, 93,98,97, 98,97,88,90,89,95,94,92,94,96,94,96,93,92,94,90,90])

# predicting temperature for the present values of current ( at run time)
test_X1= test_X.reshape(test_X.size,1)
test_Y1 = test_Y.reshape(test_Y.size,1)
run_time_predictions =regr.predict(test_X1)

# plotting the errors
plt.plot(test_Y1- run_time_predictions)

The error seems high for a few minutes (between 5 and 9). Let's combine both the train and test error values-

# combining train and test errors to include longer period of time in analysis
X_values= np.concatenate((X1, test_X1), axis=0)
Y_values= np.concatenate((Y1, test_Y1))
prections_values= np.concatenate((predictions,run_time_predictions))
Error_values= Y_values- prections_values
plt.plot(Error_values)

The Y values (errors) near x=20 show that the temperature is far higher than expected for the given amount of current flow. This has to be investigated further (the coolant might not be working, sparking might be happening, etc.). After 23-24, the motor is running fine again, as the error is randomly distributed around y=0.

Thus, at run time, a high error (positive, i.e. actual more than expected) is an indicator of an abnormal system. I don't know how it came out looking like somebody showing a middle finger, but exactly that middle finger is the abnormality here. Haha!!
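
Reading the abnormal stretch off a plot can be turned into an explicit rule. The sketch below refits the same current-to-temperature line with NumPy and flags run-time readings whose positive error exceeds a cut-off. The cut-off of 3 standard deviations of the training error is an assumption chosen for illustration, not something derived in this post, and the variable names are mine.

```python
import numpy as np

# normal-operation data (same values as the dummy data above)
train_current = np.array([352, 350, 350, 400, 400, 390, 400, 352, 350, 352,
                          400, 400, 390, 400, 350, 350, 400], dtype=float)
train_temp = np.array([88, 90, 89, 95, 94, 92, 93, 88, 90, 91,
                       95, 94, 92, 93, 90, 89, 95], dtype=float)

# run-time data (same values as test_X and test_Y above)
test_current = np.array([400, 380, 370, 355, 370, 370, 350, 360, 355, 352, 350, 350,
                         400, 400, 390, 400, 400, 380, 400, 380, 390, 400, 350, 350], dtype=float)
test_temp = np.array([96, 94, 93, 92, 93, 98, 97, 98, 97, 88, 90, 89,
                      95, 94, 92, 94, 96, 94, 96, 93, 92, 94, 90, 90], dtype=float)

# expected rule: temperature as a linear function of current
slope, intercept = np.polyfit(train_current, train_temp, deg=1)
train_error = train_temp - (slope * train_current + intercept)

# assumed cut-off: 3 standard deviations of the error seen in normal operation
threshold = 3 * train_error.std()

run_time_error = test_temp - (slope * test_current + intercept)
abnormal_idx = np.where(run_time_error > threshold)[0]  # only positive deviations
print(abnormal_idx)
```

Only positive deviations are flagged, because a motor that runs cooler than predicted is not the thermal fault this rule is looking for. On this data the flagged readings are indices 5 to 8, the same stretch that stands out in the run-time error plot.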

The Github link for the same is present at - Python_Regression_Anomaly_Detection

Read about the mother of all time series algorithms here- ForecastHybrid

Religious demographics of India in future: A Machine Learning View

2018-10-06T12:47:00.000-07:00

According to the Sachar Committee (ref-1) report in 2005, the religious demographics of India for the next 100 years are below-

We took a machine learning approach and built different time series to show the demographics (of the 2 major religions) in the coming years. The data is taken from Wikipedia (2011 Census of India; ref 2). The data used is given below-

The above image clearly shows that Hinduism is the major religion, followed by Islam. Let's create a new variable, the ratio of 'Hinduism to Islam', for the censuses from 1951 to 2011-

For 1951 the ratio is 84.1/9.8, which is 8.581633; similarly for the other decades-

8.581633, 7.806361, 7.380018, 7.004255, 6.465504, 5.991065, 5.607871,

So Hinduism, which was 8.5 times Islam in 1951, is 5.6 times Islam in 2011.

Now, let's build an ARIMA time series on the ratio variable-

library(forecast)  # provides auto.arima() and forecast()
comman_ratio <- auto.arima(ratio)
forecasted_ratio <- forecast(comman_ratio, 10)

The above table and image show that around 2100, Islam and Hinduism will have an equal number of followers. Is this forecast correct??

Let's build another time series with a different ratio: now the variable is the ratio of the Islam population to the Hinduism population. This variable gives the size of Islam relative to the Hinduism population in India.

0.1165279, 0.1281007 ,0.1355010, 0.1427704, 0.1546670, 0.1669152, 0.1783208 ( ratio1)

In 1951, Islam was about 11.7% of the Hinduism population, and in 2011 it is about 17.8% of the Hinduism population in India.

comman_ratio1 <- auto.arima(ratio1)
forecasted_ratio <- forecast(comman_ratio1, 80)  # forecast from the ratio1 model

qq <- c(ratio1, forecasted_ratio$mean)
year <- seq(from = 1951, to = 2811, by = 10)
df <- data.frame(percentage_of_islam_compare_to_hinduism = qq, year = year)
library(ggplot2)
ggplot(df, aes(year, percentage_of_islam_compare_to_hinduism)) + geom_line()

So this forecast says that Islam is not going to be equal to Hinduism but 28% of it, and that with the current growth rate it would take 800 years for Islam to become equal to Hinduism in terms of followers.

So what is the correct composition of demographics in 2100? Machine learning gives different results depending on the variable taken. Plus, 7 data points are not sufficient to forecast the next 70 values. ☺☺ The results might be different if we had taken only the populations of the religions, not the ratios.
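
The disagreement between the two forecasts does not need ARIMA to show up. The sketch below swaps in a plain straight-line trend (fitted with np.polyfit, which is not what auto.arima does) for both series and extrapolates each to the point where the ratio equals 1, i.e. equal numbers of followers. The function name `year_of_parity` is hypothetical.

```python
import numpy as np

years = np.arange(1951, 2012, 10)  # census years 1951, 1961, ..., 2011
ratio = np.array([8.581633, 7.806361, 7.380018, 7.004255,
                  6.465504, 5.991065, 5.607871])           # Hinduism / Islam
ratio1 = np.array([0.1165279, 0.1281007, 0.1355010, 0.1427704,
                   0.1546670, 0.1669152, 0.1783208])       # Islam / Hinduism

def year_of_parity(values):
    """Year at which a straight-line trend through the series reaches 1."""
    slope, intercept = np.polyfit(years, values, deg=1)
    return (1.0 - intercept) / slope

parity_from_ratio = year_of_parity(ratio)
parity_from_ratio1 = year_of_parity(ratio1)
print(round(parity_from_ratio), round(parity_from_ratio1))
```

The two series carry exactly the same information, since one is the reciprocal of the other, but a linear trend in one is not a linear trend in the other. Extrapolating far beyond 7 data points therefore gives answers that are centuries apart, which is the point made above.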

ref:-

1) Sachar_Committee
2) 2011_Census_of_India
3) https://www.quora.com/What-was-the-Muslim-population-in-India-in-1947-and-now-in-2016

Connectivity Based Outlier Detection and its implementation in R

2018-09-25T11:33:00.001-07:00

Identifying abnormality in any industrial process, banking fraud, ad clicks, etc. is one of the major challenges for a data scientist. There are many ways of detecting an abnormality.

different ways of detecting abnormalities through machine learning

There are many outlier detection techniques. One of these is connectivity based outlier factor. It is an improved version of LOF (local outlier factor) technique.

A data point lying away from a linear set of data points should have been picked as an outlier

The idea of the connectivity-based outlier algorithm is to assign a degree of outlierness to each data point. This degree is called the connectivity-based outlier factor (COF) of the data point. A high COF value represents a high probability of the data point being an outlier.

Let’s understand COF step by step with an example.

The diagram below shows 9 data points in the plane. As we can see, there are 2 data points, P1 and P2, which are away from the trend line and seem to be outliers. The COF values for P1 and P2 should be higher than those of the other data points on the trend line. Here we take the k=5 nearest neighbors for the COF calculation.

The following steps compute the COF value for the data point P1.

1) Find the k nearest neighbors (k-NN) of the data point P1 (k=5).

N₅(P1) = {P2, P5, P4, P7, P6}: the set of the k data points nearest to P1.

2) Find the set-based nearest (SBN) path: it represents the k nearest data points in order, s = {P₁, P₂, ..., P_k}.

SBN path = {P1, P2, P5, P4, P6, P7}. Arrange the data points so that they form a path: P2 is the nearest data point to P1, then P5 is the nearest data point to P2, then either P6 or P4 can be chosen as the nearest data point to P5, then P7 is the nearest data point to P6. All chosen data points must belong to the nearest-neighbor set N₅(P1).

3) Find the set-based nearest (SBN) trail: it represents the sequence of edges based on the SBN path, e = {e₁, e₂, ..., e_k}. SBN trail = {(P1, P2), (P2, P5), (P5, P4), (P5, P6), (P6, P7)}, the pairs of data points for the edges e1, e2, e3, e4, e5 respectively.

4) Find the cost of the SBN trail: it represents the distance between the 2 data points of each edge (edge value). Cost description = {3, 2, 1, 1, 1}, the weight of each edge.

5) Find Average chaining distance of the data point

dist(e_i) denotes distance between 2 data points, an edge, ex-

As for P1, find the average chaining distance for all 5 nearest neighbors P2, P4, P5, P6 and P7.

Formula Explanation:

Total no of edges = {(P1,P2),(P1,P4),(P1,P5),(P1,P6),(P1,P7),(P2,P4),(P2,P5),(P2,P6), (P2,P7),(P4,P5),(P4,P6),(P4,P7),(P5,P6),(P5,P7),(P6,P7)} =15

k(k+1)/2 = 5(5+1)/2=15

Sum of the edge weights accumulated while traversing the nearest data points:

- Edge weight from P1 to P2 = 3

- Edge weight from P1 to P5 = 3+2 = 5

- Edge weight from P1 to P4 = 3+2+1 = 6

- Edge weight from P1 to P6 = 3+2+1+1 = 7

- Edge weight from P1 to P7 = 3+2+1+1+1 = 8

Total edge weight = (3+5+6+7+8) = 29

ac-dist(P1) = 29/15 = 1.933
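
The arithmetic above matches the usual closed form for the average chaining distance, ac-dist = sum over i of [2(k+1-i) / (k(k+1))] * dist(e_i), in which earlier edges of the trail get larger weights. A minimal sketch, assuming the trail costs are already known (the function name is mine):

```python
def average_chaining_distance(trail_costs):
    """ac-dist for one point: weighted sum of the SBN trail costs,
    with weight 2 * (k + 1 - i) / (k * (k + 1)) for the i-th edge."""
    k = len(trail_costs)
    total = 0.0
    for i, cost in enumerate(trail_costs, start=1):
        weight = 2 * (k + 1 - i) / (k * (k + 1))
        total += weight * cost
    return total

# trail costs of P1 from step 4
ac_dist_p1 = average_chaining_distance([3, 2, 1, 1, 1])
print(round(ac_dist_p1, 3))  # 1.933
```

Summing the running totals 3, 5, 6, 7, 8 and dividing by k(k+1)/2 = 15 is the same calculation, because the first edge appears in all five running totals, the second in four, and so on.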

6) Find COF value of the data point-

COF is the ratio of the average chaining distance of a data point to the mean of the average chaining distances of its k nearest neighbors.

As for COF(P1), find the COF for all the data points in the diagram; the data points with high COF values are considered outliers.
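
The ratio in step 6 can be sketched as follows. The ac-dist of P1 comes from the worked example, but the neighbor values are hypothetical, since the post does not list the average chaining distances of P2, P4, P5, P6 and P7.

```python
def connectivity_outlier_factor(ac_dist_point, ac_dist_neighbors):
    """COF: ac-dist of the point divided by the mean ac-dist of its k nearest neighbors."""
    mean_neighbor = sum(ac_dist_neighbors) / len(ac_dist_neighbors)
    return ac_dist_point / mean_neighbor

ac_dist_p1 = 29 / 15  # from the worked example above
# made-up ac-dist values for the neighbors P2, P4, P5, P6, P7
hypothetical_neighbor_ac_dists = [1.6, 1.2, 1.1, 1.0, 1.1]

cof_p1 = connectivity_outlier_factor(ac_dist_p1, hypothetical_neighbor_ac_dists)
print(round(cof_p1, 3))
```

A value close to 1 means the point is connected to its neighborhood about as tightly as its neighbors are to theirs; values well above 1 mark the point as an outlier candidate.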

The darker data points are the strongest outliers. One can compare CBOF with the angle-based outlier detection technique (ABOD).

click to know about angle based outlier detection algorithm ABOD-

Anomaly Detection in High Dimensional data :- Angle based outlier detection technique

2018-08-13T10:43:00.000-07:00

Angle-Based Outlier Detection (ABOD)

Before starting with the ABOD method, let’s try to understand what an outlier is, the different types of methods used to detect outliers, and how ABOD differs from other outlier detection methods.

As per Hawkins' definition, “An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism”

There are mainly 3 types of methods:-

1) Statistical or model-based methods: these include parametric and non-parametric approaches.

2) Proximity-based methods: these can be classified into 3 categories

Cluster based methods

Distance based methods

Density based methods

3) Angle based methods

Statistical models are a relatively simple way of identifying an abnormal data point. Abnormal data points are outliers that can be identified even with a box plot, extreme values in a normal distribution, etc.

Model-based and proximity-based approaches, however, are based on an assessment of distances in the full-dimensional Euclidean data space. In high-dimensional data, these approaches are bound to deteriorate due to the notorious “curse of dimensionality”. You can read this article to know more about it- Distance & Density Based Clustering

The notion of the ABOD algorithm is to find outliers based on the variance of the angles between the difference vectors of data objects in the dataset. This way, the effects of the “curse of dimensionality” are alleviated compared to purely distance-based approaches.

In the above figure, for an outlier point P, the angle between PX and PY for any two points X and Y from the database is substantially smaller than the corresponding angles for other points Q and R. The angle seen from a far-away data point is smaller than the angle seen from a nearby data point. If you think deeper, the variance (of all the possible angles to the rest of the data points) is lower for the far-away data points than for the nearby ones. Thus the data point with a low variance of angles is considered an outlier.

Angles are more stable than distances in high-dimensional spaces

• Object o is an outlier if most other objects are located in similar directions (lower variance of angles)

• Object o is not an outlier if many other objects are located in varying directions (higher variance of angles)

In the actual implementation, not just the angle is used: the cosine is also divided by the distances between the points, so that distance is taken into account as well (nearby points may also see a very small angle but might not be outliers). So the angular distance is built from:

<AB, AC> - dot product of the vectors AB and AC

|AB|, |AC| - distance between A and B, and between A and C

So cosine = <AB, AC> / (|AB| * |AC|)

cosine / distances = <AB, AC> / (|AB|^2 * |AC|^2)

To calculate the angle-based outlier factor of A, the variance of cosine/distances over all possible pairs (B, C) is taken. A lower value means more outlier-ness.
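
The factor described above can be sketched in a few lines of NumPy. This follows the plain variance described in this post; it is a brute-force illustration under assumptions (the function name `abof` and the toy points are mine), not the code of the abodOutlier package, and it assumes there are no duplicate points.

```python
from itertools import combinations

import numpy as np

def abof(points, index):
    """Angle-based outlier factor of points[index]: variance of
    <AB, AC> / (|AB|^2 * |AC|^2) over all pairs (B, C) of other points."""
    a = points[index]
    others = np.delete(points, index, axis=0)
    values = []
    for b, c in combinations(others, 2):
        ab, ac = b - a, c - a
        values.append(np.dot(ab, ac) / (np.dot(ab, ab) * np.dot(ac, ac)))
    return np.var(values)

# toy data: a 3x3 grid of points plus one far-away point at (10, 10)
grid = [(x, y) for x in range(3) for y in range(3)]
points = np.array(grid + [(10, 10)], dtype=float)

scores = np.array([abof(points, i) for i in range(len(points))])
most_outlying = int(scores.argmin())  # the lowest factor marks the strongest outlier
print(most_outlying)  # 9, the index of (10, 10)
```

Seen from (10, 10), all other points lie in nearly the same direction and far away, so every term is small and similar and the variance is tiny. Enumerating all pairs for every point costs O(n^3), which is why approximate variants are used on larger datasets.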

Implementation of the ABOD method in R

# Sub-setting the data

iris_dataset <- iris[,1:4]

# Running ABOD code

angular_distance <- abodOutlier::abod(iris_dataset, method = "complete")

# plotting the data

library(ggplot2)

gg <- ggplot(data = iris_dataset, aes(x=Sepal.Length, y= Sepal.Width)) + geom_point(aes(col=angular_distance))

plot(gg)

Here the darker points (smaller angular distance) are clearly visible as outliers (abnormal data points).

Connectivity based Outlier Detection method

Read more about other interesting ML topics-

Hierarchical Clustering and performance parameters of clustering.

Text Classification Algorithms